* [PATCH 0/2] lib: add TCP IPv4 GRO support
@ 2017-03-22  9:32 Jiayu Hu
  2017-03-22  9:32 ` [PATCH 1/2] lib: add Generic Receive Offload support for TCP IPv4 packets Jiayu Hu
                   ` (3 more replies)
  0 siblings, 4 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-03-22  9:32 UTC (permalink / raw)
  To: dev; +Cc: yuanhan.liu, Jiayu Hu

Generic Receive Offload (GRO) is a widely used SW offloading technique
that reassembles small packets into larger ones to reduce the per-packet
processing overhead for upper-layer applications on the receiving side,
such as the networking stack. Therefore, we propose to add GRO support in
DPDK.

DPDK GRO is implemented as a standalone library, which provides GRO
functions for various protocols. In the design of DPDK GRO, each protocol
has its own reassembly function. Applications should explicitly invoke the
reassembly function that matches the packet type.

This patchset provides TCP IPv4 GRO reassembly functions, and
demonstrates the usage of these functions in app/testpmd.
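
Below is a minimal usage sketch of the two APIs. It is illustrative only:
rx_loop, BURST_SZ, and the table name are made-up, and port/queue setup is
assumed to happen elsewhere; only rte_gro_tcp4_tbl_create and
rte_gro_tcp4_reassemble_burst come from this patchset.

	#include <rte_lcore.h>
	#include <rte_ethdev.h>
	#include <rte_gro_tcp.h>

	#define BURST_SZ 32

	/* per-lcore RX loop: create one lookup table per lcore, then
	 * merge each received burst before further processing */
	static void
	rx_loop(uint8_t port_id, uint16_t queue_id)
	{
		struct rte_mbuf *pkts[BURST_SZ];
		struct rte_hash *gro_tbl;
		uint16_t nb_rx;
		uint32_t nb_pkts;

		/* 64 entries, the default used by the testpmd patch */
		gro_tbl = rte_gro_tcp4_tbl_create("gro_tcp4_example", 64,
				rte_socket_id());
		if (gro_tbl == NULL)
			return;

		for (;;) {
			nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts,
					BURST_SZ);
			if (nb_rx == 0)
				continue;
			/* merge TCP IPv4 packets; non-TCP IPv4 packets and
			 * packets that can't be merged stay in pkts[] */
			nb_pkts = rte_gro_tcp4_reassemble_burst(gro_tbl,
					pkts, nb_rx);
			/* forward/process the first nb_pkts mbufs */
		}
	}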

We perform two iperf tests (with DPDK GRO and without DPDK GRO) to see
the performance gains from DPDK GRO. Specifically, the experiment
environment is:
	a. Two 10Gbps physical ports (p0 and p1) on one host are linked
	together, and are in two network namespaces (ns1 and ns2);
	b. the iperf client runs on p0 and is in charge of sending TCP IPv4
	packets; testpmd runs on p1. In addition, testpmd connects to a
	virtual machine (VM) via vhost-user and virtio-kernel. The VM runs
	the iperf server, whose IP is 1.1.2.4;
	c. p0 turns on TSO; VM turns off kernel GRO; testpmd runs in iofwd
	mode;
	d. iperf client and server use the following commands:
	- iperf client: ip netns exec ns1 iperf -c 1.1.2.4 -i2 -t 60 -f g -m
	- iperf server: iperf -s -f g
Two test cases are:
	a. w/o DPDK GRO: disable TCP IPv4 GRO on testpmd
	b. w/ DPDK GRO: enable TCP IPv4 GRO on testpmd
Test results:
	a. w/o DPDK GRO: 5.5 Gbits/sec
	b. w/ DPDK GRO: 8.46 Gbits/sec
As we can see, the throughput improvement from DPDK GRO is around 54%
(8.46/5.5 ≈ 1.54).

Jiayu Hu (2):
  lib: add Generic Receive Offload support for TCP IPv4 packets
  app/testpmd: provide TCP IPv4 GRO function in iofwd mode

 app/test-pmd/cmdline.c       |  48 +++++++
 app/test-pmd/config.c        |  59 +++++++++
 app/test-pmd/iofwd.c         |   7 +
 app/test-pmd/testpmd.c       |  10 ++
 app/test-pmd/testpmd.h       |   6 +
 config/common_base           |   5 +
 lib/Makefile                 |   1 +
 lib/librte_gro/Makefile      |  50 +++++++
 lib/librte_gro/rte_gro_tcp.c | 301 +++++++++++++++++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro_tcp.h | 114 ++++++++++++++++
 mk/rte.app.mk                |   1 +
 11 files changed, 602 insertions(+)
 create mode 100644 lib/librte_gro/Makefile
 create mode 100644 lib/librte_gro/rte_gro_tcp.c
 create mode 100644 lib/librte_gro/rte_gro_tcp.h

-- 
2.7.4


* [PATCH 1/2] lib: add Generic Receive Offload support for TCP IPv4 packets
  2017-03-22  9:32 [PATCH 0/2] lib: add TCP IPv4 GRO support Jiayu Hu
@ 2017-03-22  9:32 ` Jiayu Hu
  2017-03-22  9:32 ` [PATCH 2/2] app/testpmd: provide TCP IPv4 GRO function in iofwd mode Jiayu Hu
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-03-22  9:32 UTC (permalink / raw)
  To: dev; +Cc: yuanhan.liu, Jiayu Hu

Introduce two new functions to support TCP IPv4 GRO:
- rte_gro_tcp4_tbl_create: create a lookup table for TCP IPv4 GRO.
- rte_gro_tcp4_reassemble_burst: reassemble a bulk of TCP IPv4 packets
at a time.

rte_gro_tcp4_reassemble_burst works in burst mode: it processes a bulk of
packets at a time. That is, applications are in charge of classifying and
accumulating TCP IPv4 packets before calling it, e.g. as sketched below.
If applications pass non-TCP IPv4 packets to
rte_gro_tcp4_reassemble_burst, it won't process them.
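
A hedged sketch of such classification, assuming the PMD fills in
mbuf->packet_type (collect_tcp4 is an illustrative helper, not part of
this patch):

	#include <rte_mbuf.h>

	/* split a burst into TCP IPv4 candidates and everything else */
	static uint16_t
	collect_tcp4(struct rte_mbuf **in, uint16_t nb_in,
			struct rte_mbuf **tcp4, struct rte_mbuf **others,
			uint16_t *nb_others)
	{
		uint16_t i, n = 0;

		*nb_others = 0;
		for (i = 0; i < nb_in; i++) {
			if (RTE_ETH_IS_IPV4_HDR(in[i]->packet_type) &&
					(in[i]->packet_type &
					 RTE_PTYPE_L4_MASK) ==
					RTE_PTYPE_L4_TCP)
				tcp4[n++] = in[i];
			else
				others[(*nb_others)++] = in[i];
		}
		return n;
	}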

Before using rte_gro_tcp4_reassemble_burst, applications need to create
TCP IPv4 lookup tables via rte_gro_tcp4_tbl_create; the tables are then
used by rte_gro_tcp4_reassemble_burst. A TCP IPv4 lookup table is a cuckoo
hash table whose keys are the rules for merging TCP IPv4 packets and whose
values point to item-lists. Each item-list contains items that share the
same key.

To process an incoming packet, the following four steps are performed:
a. Check if the packet should be processed. TCP IPv4 GRO doesn't
process the following types of packets:
	-   non TCP-IPv4 packets
	-   packets without data
	-   packets with wrong checksums
	-   fragmented packets
b. Look up the hash table to find an item-list, which stores packets that
may be merged with the incoming packet.
c. If the lookup succeeds, check all items in the item-list. If one of
them is a neighbor of the incoming packet (e.g. a stored packet with
sequence number 1000 and 500 payload bytes is a neighbor of an incoming
packet with sequence number 1500), chain the two packets together and
update the packet header fields and mbuf metadata; if no neighbor is
found, allocate a new item for the incoming packet and insert it into the
item-list.
d. If no item-list is found, allocate a new item-list for the incoming
packet and insert it into the hash table.

After processing all packets, update checksums for the merged ones, and
clear the content of the lookup table.

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
---
 config/common_base           |   5 +
 lib/Makefile                 |   1 +
 lib/librte_gro/Makefile      |  50 +++++++
 lib/librte_gro/rte_gro_tcp.c | 301 +++++++++++++++++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro_tcp.h | 114 ++++++++++++++++
 mk/rte.app.mk                |   1 +
 6 files changed, 472 insertions(+)
 create mode 100644 lib/librte_gro/Makefile
 create mode 100644 lib/librte_gro/rte_gro_tcp.c
 create mode 100644 lib/librte_gro/rte_gro_tcp.h

diff --git a/config/common_base b/config/common_base
index 37aa1e1..29475ad 100644
--- a/config/common_base
+++ b/config/common_base
@@ -609,6 +609,11 @@ CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
 CONFIG_RTE_LIBRTE_PMD_VHOST=n
 
 #
+# Compile GRO library
+#
+CONFIG_RTE_LIBRTE_GRO=y
+
+#
 #Compile Xen domain0 support
 #
 CONFIG_RTE_LIBRTE_XEN_DOM0=n
diff --git a/lib/Makefile b/lib/Makefile
index 4178325..0665f58 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -59,6 +59,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_TABLE) += librte_table
 DIRS-$(CONFIG_RTE_LIBRTE_PIPELINE) += librte_pipeline
 DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
 DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
+DIRS-$(CONFIG_RTE_LIBRTE_GRO) += librte_gro
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
new file mode 100644
index 0000000..71bdb04
--- /dev/null
+++ b/lib/librte_gro/Makefile
@@ -0,0 +1,50 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_gro.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+
+EXPORT_MAP := rte_gro_version.map
+
+LIBABIVER := 1
+
+#source files
+SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro_tcp.c
+
+# install this header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro_tcp.h
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_gro/rte_gro_tcp.c b/lib/librte_gro/rte_gro_tcp.c
new file mode 100644
index 0000000..9fd3efe
--- /dev/null
+++ b/lib/librte_gro/rte_gro_tcp.c
@@ -0,0 +1,301 @@
+#include "rte_gro_tcp.h"
+
+struct rte_hash *
+rte_gro_tcp4_tbl_create(char *name,
+		uint32_t nb_entries, uint16_t socket_id)
+{
+	struct rte_hash_parameters ht_param = {
+		.entries = nb_entries,
+		.name = name,
+		.key_len = sizeof(struct gro_tcp4_pre_rules),
+		.hash_func = rte_jhash,
+		.hash_func_init_val = 0,
+		.socket_id = socket_id,
+	};
+	struct rte_hash *tbl;
+
+	tbl = rte_hash_create(&ht_param);
+	if (tbl == NULL)
+		printf("GRO TCP4: allocate hash table fail\n");
+	return tbl;
+}
+
+/* update TCP IPv4 checksum */
+static void
+gro_tcp4_cksum_update(struct rte_mbuf *pkt)
+{
+	uint32_t len, offset, cksum;
+	struct ether_hdr *eth_hdr;
+	struct ipv4_hdr *ipv4_hdr;
+	struct tcp_hdr *tcp_hdr;
+	uint16_t ipv4_ihl, cksum_pld;
+
+	if (pkt == NULL)
+		return;
+
+	len = pkt->pkt_len;
+	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
+	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
+	ipv4_ihl = IPv4_HDR_LEN(ipv4_hdr);
+	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl);
+
+	offset = sizeof(struct ether_hdr) + ipv4_ihl;
+	len -= offset;
+
+	/* TCP cksum without IP pseudo header */
+	ipv4_hdr->hdr_checksum = 0;
+	tcp_hdr->cksum = 0;
+	if (rte_raw_cksum_mbuf(pkt, offset, len, &cksum_pld) < 0) {
+		printf("invalid param for raw_cksum_mbuf\n");
+		return;
+	}
+	/* IP pseudo header cksum */
+	cksum = cksum_pld;
+	cksum += rte_ipv4_phdr_cksum(ipv4_hdr, 0);
+
+	/* combine TCP checksum and IP pseudo header checksum */
+	cksum = ((cksum & 0xffff0000) >> 16) + (cksum & 0xffff);
+	cksum = (~cksum) & 0xffff;
+	cksum = (cksum == 0) ? 0xffff : cksum;
+	tcp_hdr->cksum = cksum;
+
+	/* update IP header cksum */
+	ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);
+}
+
+/**
+ * This function traverses the item-list to find one item that can be
+ * merged with the incoming packet. If the merge succeeds, the two
+ * packets are chained together; if not, the incoming packet is inserted
+ * into the item-list.
+ */
+static uint64_t
+gro_tcp4_reassemble(struct gro_tcp_item_list *list,
+		struct rte_mbuf *pkt,
+		uint32_t pkt_sent_seq,
+		uint32_t pkt_idx)
+{
+	struct gro_tcp_item *items;
+	struct ipv4_hdr *ipv4_hdr1;
+	struct tcp_hdr *tcp_hdr1;
+	uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1;
+
+	items = list->items;
+	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, struct
+				ether_hdr *) + 1);
+	ipv4_ihl1 = IPv4_HDR_LEN(ipv4_hdr1);
+	tcp_hdr1 = (struct tcp_hdr *)((char *)ipv4_hdr1 + ipv4_ihl1);
+	tcp_hl1 = TCP_HDR_LEN(tcp_hdr1);
+	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) - ipv4_ihl1
+		- tcp_hl1;
+
+	for (uint16_t i = 0; i < list->nb_item; i++) {
+		/* check if the two packets are neighbors */
+		if ((pkt_sent_seq ^ items[i].next_sent_seq) == 0) {
+			struct ipv4_hdr *ipv4_hdr2;
+			struct tcp_hdr *tcp_hdr2;
+			uint16_t ipv4_ihl2, tcp_hl2;
+			struct rte_mbuf *tail;
+
+			ipv4_hdr2 = (struct ipv4_hdr *) (rte_pktmbuf_mtod(
+						items[i].segment, struct ether_hdr *)
+					+ 1);
+
+			/* check if the option fields equal */
+			if (tcp_hl1 > sizeof(struct tcp_hdr)) {
+				ipv4_ihl2 = IPv4_HDR_LEN(ipv4_hdr2);
+				tcp_hdr2 = (struct tcp_hdr *)
+					((char *)ipv4_hdr2 + ipv4_ihl2);
+				tcp_hl2 = TCP_HDR_LEN(tcp_hdr2);
+				if ((tcp_hl1 != tcp_hl2) ||
+						(memcmp(tcp_hdr1 + 1, tcp_hdr2 + 1,
+								tcp_hl2 - sizeof(struct tcp_hdr))
+						 != 0))
+					continue;
+			}
+			/* check if the packet length will be beyond 64K */
+			if (items[i].segment->pkt_len + tcp_dl1 > UINT16_MAX)
+				goto merge_fail;
+
+			/* remove the header of the incoming packet */
+			rte_pktmbuf_adj(pkt, sizeof(struct ether_hdr) +
+					ipv4_ihl1 + tcp_hl1);
+			/* chain the two packets together */
+			tail = rte_pktmbuf_lastseg(items[i].segment);
+			tail->next = pkt;
+
+			/* update IP header for the merged packet */
+			ipv4_hdr2->total_length = rte_cpu_to_be_16(
+					rte_be_to_cpu_16(ipv4_hdr2->total_length)
+					+ tcp_dl1);
+
+			/* update the next expected sequence number */
+			items[i].next_sent_seq += tcp_dl1;
+
+			/* update mbuf metadata for the merged packet */
+			items[i].segment->nb_segs++;
+			items[i].segment->pkt_len += pkt->pkt_len;
+
+			return items[i].segment_idx + 1;
+		}
+	}
+
+merge_fail:
+	/* merge failed; insert the incoming packet into the item-list */
+	items[list->nb_item].next_sent_seq = pkt_sent_seq + tcp_dl1;
+	items[list->nb_item].segment = pkt;
+	items[list->nb_item].segment_idx = pkt_idx;
+	list->nb_item++;
+
+	return 0;
+}
+
+uint32_t
+rte_gro_tcp4_reassemble_burst(struct rte_hash *hash_tbl,
+		struct rte_mbuf **pkts,
+		const uint32_t nb_pkts)
+{
+	struct ether_hdr *eth_hdr;
+	struct ipv4_hdr *ipv4_hdr;
+	struct tcp_hdr *tcp_hdr;
+	uint16_t ipv4_ihl, tcp_hl, tcp_dl, tcp_cksum, ip_cksum;
+	uint32_t sent_seq;
+	struct gro_tcp4_pre_rules key;
+	struct gro_tcp_item_list *list;
+
+	/* preallocated items. Each packet has nb_pkts items */
+	struct gro_tcp_item items_pool[nb_pkts * nb_pkts];
+
+	struct gro_tcp_info gro_infos[nb_pkts];
+	uint64_t ol_flags, idx;
+	int ret, is_performed_gro = 0;
+	uint32_t nb_after_gro = nb_pkts;
+
+	if (hash_tbl == NULL || pkts == NULL || nb_pkts == 0) {
+		printf("GRO TCP4: invalid parameters\n");
+		goto end;
+	}
+	memset(&key, 0, sizeof(struct gro_tcp4_pre_rules));
+
+	for (uint32_t i = 0; i < nb_pkts; i++) {
+		gro_infos[i].nb_merged_pkts = 1;
+
+		eth_hdr = rte_pktmbuf_mtod(pkts[i], struct ether_hdr *);
+		ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
+		ipv4_ihl = IPv4_HDR_LEN(ipv4_hdr);
+
+		/* 1. check if the packet should be processed */
+		if (ipv4_ihl < sizeof(struct ipv4_hdr))
+			continue;
+		if (ipv4_hdr->next_proto_id != IPPROTO_TCP)
+			continue;
+		if ((ipv4_hdr->fragment_offset &
+					rte_cpu_to_be_16(IPV4_HDR_DF_MASK))
+				== 0)
+			continue;
+
+		tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl);
+		tcp_hl = TCP_HDR_LEN(tcp_hdr);
+		tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - ipv4_ihl
+			- tcp_hl;
+		if (tcp_dl == 0)
+			continue;
+
+		ol_flags = pkts[i]->ol_flags;
+		/**
+		 * 2. if HW rx checksum offload isn't enabled, recalculate the
+		 * checksum in SW. Then, check if the checksum is correct
+		 */
+		if ((ol_flags & PKT_RX_IP_CKSUM_MASK) !=
+				PKT_RX_IP_CKSUM_UNKNOWN) {
+			if (ol_flags == PKT_RX_IP_CKSUM_BAD)
+				continue;
+		} else {
+			ip_cksum = ipv4_hdr->hdr_checksum;
+			ipv4_hdr->hdr_checksum = 0;
+			ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);
+			if (ipv4_hdr->hdr_checksum ^ ip_cksum)
+				continue;
+		}
+
+		if ((ol_flags & PKT_RX_L4_CKSUM_MASK) !=
+				PKT_RX_L4_CKSUM_UNKNOWN) {
+			if (ol_flags == PKT_RX_L4_CKSUM_BAD)
+				continue;
+		} else {
+			tcp_cksum = tcp_hdr->cksum;
+			tcp_hdr->cksum = 0;
+			tcp_hdr->cksum = rte_ipv4_udptcp_cksum
+				(ipv4_hdr, tcp_hdr);
+			if (tcp_hdr->cksum ^ tcp_cksum)
+				continue;
+		}
+
+		/* 3. search for the corresponding item-list for the packet */
+		key.eth_saddr = eth_hdr->s_addr;
+		key.eth_daddr = eth_hdr->d_addr;
+		key.ip_src_addr = rte_be_to_cpu_32(ipv4_hdr->src_addr);
+		key.ip_dst_addr = rte_be_to_cpu_32(ipv4_hdr->dst_addr);
+		key.src_port = rte_be_to_cpu_16(tcp_hdr->src_port);
+		key.dst_port = rte_be_to_cpu_16(tcp_hdr->dst_port);
+		key.recv_ack = rte_be_to_cpu_32(tcp_hdr->recv_ack);
+		key.tcp_flags = tcp_hdr->tcp_flags;
+
+		sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq);
+		ret = rte_hash_lookup_data(hash_tbl, &key, (void **)&list);
+
+		/* try to reassemble the packet */
+		if (ret >= 0) {
+			idx = gro_tcp4_reassemble(list, pkts[i], sent_seq, i);
+			/* merge successfully, update gro_info */
+			if (idx > 0) {
+				gro_infos[i].nb_merged_pkts = 0;
+				gro_infos[--idx].nb_merged_pkts++;
+				nb_after_gro--;
+			}
+		} else {
+			/**
+			 * failed to find an item-list. Allocate a new item-list
+			 * for the incoming packet and insert it into the hash
+			 * table.
+			 */
+			list = &(gro_infos[i].item_list);
+			list->items = &(items_pool[nb_pkts * i]);
+			list->nb_item = 1;
+			list->items[0].next_sent_seq = sent_seq + tcp_dl;
+			list->items[0].segment = pkts[i];
+			list->items[0].segment_idx = i;
+
+			if (unlikely(rte_hash_add_key_data(hash_tbl, &key, list)
+						!= 0))
+				printf("GRO TCP hash insert fail.\n");
+
+			is_performed_gro = 1;
+		}
+	}
+
+	/**
+	 * if any packets have been merged, update their checksums,
+	 * and remove stale packet addresses from the packet array
+	 */
+	if (nb_after_gro < nb_pkts) {
+		struct rte_mbuf *tmp[nb_pkts];
+
+		memset(tmp, 0, sizeof(struct rte_mbuf *) * nb_pkts);
+		/* update checksum */
+		for (uint32_t i = 0, j = 0; i < nb_pkts; i++) {
+			if (gro_infos[i].nb_merged_pkts > 1)
+				gro_tcp4_cksum_update(pkts[i]);
+			if (gro_infos[i].nb_merged_pkts != 0)
+				tmp[j++] = pkts[i];
+		}
+		/* update the packet array */
+		rte_memcpy(pkts, tmp, nb_pkts * sizeof(struct rte_mbuf *));
+	}
+
+	/* if GRO is performed, reset the hash table */
+	if (is_performed_gro)
+		rte_hash_reset(hash_tbl);
+end:
+	return nb_after_gro;
+}
diff --git a/lib/librte_gro/rte_gro_tcp.h b/lib/librte_gro/rte_gro_tcp.h
new file mode 100644
index 0000000..aa99a06
--- /dev/null
+++ b/lib/librte_gro/rte_gro_tcp.h
@@ -0,0 +1,114 @@
+#ifndef _RTE_GRO_TCP_H_
+#define _RTE_GRO_TCP_H_
+
+#include <rte_ethdev.h>
+#include <rte_ip.h>
+#include <rte_tcp.h>
+#include <rte_hash.h>
+#include <rte_jhash.h>
+#include <rte_malloc.h>
+
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+#define TCP_HDR_LEN(tcph) \
+	((tcph->data_off >> 4) * 4)
+#define IPv4_HDR_LEN(iph) \
+	((iph->version_ihl & 0x0f) * 4)
+#else
+#define TCP_DATAOFF_MASK 0x0f
+#define TCP_HDR_LEN(tcph) \
+	((tcph->data_off & TCP_DATAOFF_MASK) * 4)
+#define IPv4_HDR_LEN(iph) \
+	((iph->version_ihl >> 4) * 4)
+#endif
+
+#define IPV4_HDR_DF_SHIFT 14
+#define IPV4_HDR_DF_MASK (1 << IPV4_HDR_DF_SHIFT)
+
+#define RTE_GRO_TCP_HASH_ENTRIES_MIN RTE_HASH_BUCKET_ENTRIES
+#define RTE_GRO_TCP_HASH_ENTRIES_MAX RTE_HASH_ENTRIES_MAX
+
+/**
+ * Key structure of the TCP IPv4 hash table. It describes the
+ * prerequisite rules for merging packets.
+ */
+struct gro_tcp4_pre_rules {
+	struct ether_addr eth_saddr;
+	struct ether_addr eth_daddr;
+	uint32_t ip_src_addr;
+	uint32_t ip_dst_addr;
+
+	uint32_t recv_ack;	/**< acknowledgment sequence number. */
+	uint16_t src_port;
+	uint16_t dst_port;
+	uint8_t tcp_flags;	/**< TCP flags. */
+
+	uint8_t padding[3];
+};
+
+/**
+ * Item structure
+ */
+struct gro_tcp_item {
+	struct rte_mbuf *segment;	/**< packet address. */
+	uint32_t next_sent_seq;	/**< sequence number of the next packet. */
+	uint32_t segment_idx;	/**< packet index. */
+} __rte_cache_aligned;
+
+/**
+ * Item-list structure, which is the value in the TCP ipv4 hash table.
+ */
+struct gro_tcp_item_list {
+	struct gro_tcp_item *items;	/**< items array */
+	uint32_t nb_item;	/**< item number */
+};
+
+/**
+ * Local data structure. Every packet has an object of this structure,
+ * which is used for reassembling.
+ */
+struct gro_tcp_info {
+	struct gro_tcp_item_list item_list;	/**< preallocated item-list */
+	uint32_t nb_merged_pkts;	/**< the number of merged packets */
+};
+
+/**
+ * Create a new TCP ipv4 GRO lookup table.
+ *
+ * @param name
+ *	Lookup table name
+ * @param nb_entries
+ *  Number of lookup table entries. The value should be greater than or
+ *  equal to RTE_GRO_TCP_HASH_ENTRIES_MIN, less than or equal to
+ *  RTE_GRO_TCP_HASH_ENTRIES_MAX, and a power of two.
+ * @param socket_id
+ *  socket id
+ * @return
+ *  lookup table address
+ */
+struct rte_hash *
+rte_gro_tcp4_tbl_create(char *name, uint32_t nb_entries,
+		uint16_t socket_id);
+/**
+ * This function reassembles a bulk of TCP IPv4 packets. Non-TCP IPv4
+ * packets are not processed.
+ *
+ * @param hash_tbl
+ *	Lookup table used to reassemble packets. It stores key-value pairs.
+ *	The key describes the prerequisite rules to merge two TCP IPv4 packets;
+ *	the value is a pointer to an item-list, which contains
+ *	packets that satisfy the same prerequisite TCP IPv4 rules. Note that
+ *	applications need to guarantee that the hash_tbl is clean when first
+ *	calling this function.
+ * @param pkts
+ *	Packets to reassemble.
+ * @param nb_pkts
+ *	The number of packets to reassemble.
+ * @return
+ *	The number of packets after GRO. If any packets were merged, the value
+ *	is less than nb_pkts; if not, it is equal to nb_pkts.
+ */
+uint32_t
+rte_gro_tcp4_reassemble_burst(struct rte_hash *hash_tbl,
+		struct rte_mbuf **pkts,
+		const uint32_t nb_pkts);
+#endif
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 0e0b600..521d20e 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -99,6 +99,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_RING)           += -lrte_ring
 _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
 _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER)        += -lrte_reorder
+_LDLIBS-$(CONFIG_RTE_LIBRTE_GRO)        	+= -lrte_gro
 
 ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),n)
 # plugins (link only if static libraries)
-- 
2.7.4


* [PATCH 2/2] app/testpmd: provide TCP IPv4 GRO function in iofwd mode
  2017-03-22  9:32 [PATCH 0/2] lib: add TCP IPv4 GRO support Jiayu Hu
  2017-03-22  9:32 ` [PATCH 1/2] lib: add Generic Receive Offload support for TCP IPv4 packets Jiayu Hu
@ 2017-03-22  9:32 ` Jiayu Hu
       [not found] ` <1B893F1B-4DA8-4F88-9583-8C0BAA570832@intel.com>
  2017-04-04 12:31 ` [PATCH v2 0/3] support GRO in DPDK Jiayu Hu
  3 siblings, 0 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-03-22  9:32 UTC (permalink / raw)
  To: dev; +Cc: yuanhan.liu, Jiayu Hu

This patch demonstrates the usage of the TCP IPv4 GRO library in testpmd.
Currently, only the iofwd mode supports this feature. By default, TCP
IPv4 GRO is turned off. The command "gro tcp4 on" turns on this
feature; the command "gro tcp4 off" turns it off.

Once the feature is turned on, the TCP IPv4 GRO procedure is performed on
all received packets before they are forwarded.
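
For example, an illustrative testpmd session (forwarding must be stopped
before toggling, as enforced by setup_gro_tcp4 in this patch):

	testpmd> stop
	testpmd> gro tcp4 on
	testpmd> start
	...
	testpmd> stop
	testpmd> gro tcp4 off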

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
---
 app/test-pmd/cmdline.c | 48 ++++++++++++++++++++++++++++++++++++++++
 app/test-pmd/config.c  | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++
 app/test-pmd/iofwd.c   |  7 ++++++
 app/test-pmd/testpmd.c | 10 +++++++++
 app/test-pmd/testpmd.h |  6 +++++
 5 files changed, 130 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 47f935d..618b9da 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -76,6 +76,7 @@
 #include <rte_devargs.h>
 #include <rte_eth_ctrl.h>
 #include <rte_flow.h>
+#include <rte_gro_tcp.h>
 
 #include <cmdline_rdline.h>
 #include <cmdline_parse.h>
@@ -396,6 +397,9 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"tso show (portid)"
 			"    Display the status of TCP Segmentation Offload.\n\n"
 
+			"gro tcp4 (on|off)"
+			"    Enable or disable TCP IPv4 Receive Offload.\n\n"
+
 			"set fwd (%s)\n"
 			"    Set packet forwarding mode.\n\n"
 
@@ -3784,6 +3788,49 @@ cmdline_parse_inst_t cmd_tunnel_tso_show = {
 	},
 };
 
+/* *** SET TCP IPv4 Receive Offload FOR RX PKTS *** */
+struct cmd_gro_result {
+	cmdline_fixed_string_t cmd_keyword;
+	cmdline_fixed_string_t protocol;
+	cmdline_fixed_string_t mode;
+};
+
+static void
+cmd_set_gro_parsed(void *parsed_result,
+		__attribute__((unused)) struct cmdline *cl,
+		__attribute__((unused)) void *data)
+{
+	struct cmd_gro_result *res;
+
+	res = parsed_result;
+	if (strcmp(res->protocol, "tcp4") == 0)
+		setup_gro_tcp4(res->mode);
+	else
+		printf("unsupported GRO protocol\n");
+}
+
+cmdline_parse_token_string_t cmd_gro_keyword =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_result,
+			cmd_keyword, "gro");
+cmdline_parse_token_string_t cmd_gro_protocol =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_result,
+			protocol, NULL);
+cmdline_parse_token_string_t cmd_gro_mode =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_result,
+			mode, NULL);
+
+cmdline_parse_inst_t cmd_set_gro = {
+	.f = cmd_set_gro_parsed,
+	.data = NULL,
+	.help_str = "gro tcp4 on|off",
+	.tokens = {
+		(void *)&cmd_gro_keyword,
+		(void *)&cmd_gro_protocol,
+		(void *)&cmd_gro_mode,
+		NULL,
+	},
+};
+
 /* *** ENABLE/DISABLE FLUSH ON RX STREAMS *** */
 struct cmd_set_flush_rx {
 	cmdline_fixed_string_t set;
@@ -12464,6 +12511,7 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_tso_show,
 	(cmdline_parse_inst_t *)&cmd_tunnel_tso_set,
 	(cmdline_parse_inst_t *)&cmd_tunnel_tso_show,
+	(cmdline_parse_inst_t *)&cmd_set_gro,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_rx,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_tx,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 80491fc..b4144a3 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -97,6 +97,7 @@
 #ifdef RTE_LIBRTE_IXGBE_PMD
 #include <rte_pmd_ixgbe.h>
 #endif
+#include <rte_gro_tcp.h>
 
 #include "testpmd.h"
 
@@ -2415,6 +2416,64 @@ set_tx_pkt_segments(unsigned *seg_lengths, unsigned nb_segs)
 	tx_pkt_nb_segs = (uint8_t) nb_segs;
 }
 
+void
+setup_gro_tcp4(const char *mode)
+{
+	lcoreid_t lc_id;
+	streamid_t sm_id;
+	uint64_t nb_entries = 64;	/* lookup table entry number */
+
+	if (strcmp(mode, "on") == 0) {
+		if (test_done == 0) {
+			printf("Before enable TCP IPv4 GRO,"
+					" please stop forwarding first\n");
+			return;
+		}
+		if (enable_gro_tcp4 == 1) {
+			printf("GRO TCP IPv4 has been turned on\n");
+			return;
+		}
+		for (lc_id = 0; lc_id < cur_fwd_config.nb_fwd_lcores; lc_id++) {
+			char name[20];
+
+			snprintf(name, sizeof(name), "GRO_TCP4_%u", lc_id);
+			if (gro_tcp4_tbls[lc_id])
+				rte_hash_free(gro_tcp4_tbls[lc_id]);
+
+			gro_tcp4_tbls[lc_id] = rte_gro_tcp4_tbl_create(
+					name,
+					nb_entries,
+					rte_lcore_to_socket_id
+					(fwd_lcores_cpuids[lc_id]));
+			if (gro_tcp4_tbls[lc_id] == NULL) {
+				enable_gro_tcp4 = 0;
+				return;
+			}
+			for (sm_id = fwd_lcores[lc_id]->stream_idx; sm_id <
+					fwd_lcores[lc_id]->stream_idx +
+					fwd_lcores[lc_id]->stream_nb; sm_id++) {
+				fwd_streams[sm_id]->tbl_idx = lc_id;
+			}
+		}
+		enable_gro_tcp4 = 1;
+	} else if (strcmp(mode, "off") == 0) {
+		if (test_done == 0) {
+			printf("Before disable TCP IPv4 GRO,"
+					" please stop forwarding first\n");
+			return;
+		}
+		if (enable_gro_tcp4 == 0) {
+			printf("GRO TCP IPv4 has been turned off\n");
+			return;
+		}
+		for (lc_id = 0; lc_id < cur_fwd_config.nb_fwd_lcores; lc_id++) {
+			rte_hash_free(gro_tcp4_tbls[lc_id]);
+			gro_tcp4_tbls[lc_id] = NULL;
+		}
+		enable_gro_tcp4 = 0;
+	}
+}
+
 char*
 list_pkt_forwarding_modes(void)
 {
diff --git a/app/test-pmd/iofwd.c b/app/test-pmd/iofwd.c
index 15cb4a2..ec05d6f 100644
--- a/app/test-pmd/iofwd.c
+++ b/app/test-pmd/iofwd.c
@@ -65,6 +65,7 @@
 #include <rte_ethdev.h>
 #include <rte_string_fns.h>
 #include <rte_flow.h>
+#include <rte_gro_tcp.h>
 
 #include "testpmd.h"
 
@@ -99,6 +100,12 @@ pkt_burst_io_forward(struct fwd_stream *fs)
 			pkts_burst, nb_pkt_per_burst);
 	if (unlikely(nb_rx == 0))
 		return;
+	if (enable_gro_tcp4) {
+		nb_rx = rte_gro_tcp4_reassemble_burst(
+				gro_tcp4_tbls[fs->tbl_idx],
+				pkts_burst,
+				nb_rx);
+	}
 	fs->rx_packets += nb_rx;
 
 #ifdef RTE_TEST_PMD_RECORD_BURST_STATS
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index e04e215..caf8a61 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -273,6 +273,16 @@ uint32_t bypass_timeout = RTE_BYPASS_TMT_OFF;
 #endif
 
 /*
+ * TCP IPv4 lookup tables. Each lcore has a lookup table.
+ */
+struct rte_hash *gro_tcp4_tbls[RTE_MAX_LCORE];
+
+/*
+ * TCP IPv4 GRO enable/disable flag.
+ */
+uint8_t enable_gro_tcp4 = 0;	/* turn off by default */
+
+/*
  * Ethernet device configuration.
  */
 struct rte_eth_rxmode rx_mode = {
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 8cf2860..bfd1e52 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -109,6 +109,8 @@ struct fwd_stream {
 	queueid_t  tx_queue;  /**< TX queue to send forwarded packets */
 	streamid_t peer_addr; /**< index of peer ethernet address of packets */
 
+	uint16_t tbl_idx;	/**< TCP IPv4 GRO lookup table index */
+
 	unsigned int retry_enabled;
 
 	/* "read-write" results */
@@ -420,6 +422,9 @@ extern struct ether_addr peer_eth_addrs[RTE_MAX_ETHPORTS];
 extern uint32_t burst_tx_delay_time; /**< Burst tx delay time(us) for mac-retry. */
 extern uint32_t burst_tx_retry_num;  /**< Burst tx retry number for mac-retry. */
 
+extern struct rte_hash *gro_tcp4_tbls[RTE_MAX_LCORE];
+extern uint8_t enable_gro_tcp4;
+
 static inline unsigned int
 lcore_num(void)
 {
@@ -616,6 +621,7 @@ void get_2tuple_filter(uint8_t port_id, uint16_t index);
 void get_5tuple_filter(uint8_t port_id, uint16_t index);
 int rx_queue_id_is_invalid(queueid_t rxq_id);
 int tx_queue_id_is_invalid(queueid_t txq_id);
+void setup_gro_tcp4(const char *mode);
 
 /* Functions to manage the set of filtered Multicast MAC addresses */
 void mcast_addr_add(uint8_t port_id, struct ether_addr *mc_addr);
-- 
2.7.4


* Re: [PATCH 0/2] lib: add TCP IPv4 GRO support
       [not found]             ` <2601191342CEEE43887BDE71AB9772583FAD410A@IRSMSX109.ger.corp.intel.com>
@ 2017-03-24  2:23               ` Jiayu Hu
  2017-03-24  6:18                 ` Wiles, Keith
  0 siblings, 1 reply; 141+ messages in thread
From: Jiayu Hu @ 2017-03-24  2:23 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: Richardson, Bruce, Stephen Hemminger, Wiles, Keith, Yuanhan Liu,
	Yigit, Ferruh, dev

On Thu, Mar 23, 2017 at 09:12:56PM +0800, Ananyev, Konstantin wrote:
> Hi everyone,
> 
> > >
> > >
> > > > -----Original Message-----
> > > > From: Hu, Jiayu
> > > > Sent: Thursday, March 23, 2017 6:25 AM
> > > > To: Wiles, Keith <keith.wiles@intel.com>
> > > > Cc: Yuanhan Liu <yuanhan.liu@linux.intel.com>; Richardson, Bruce
> > > > <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>;
> > > > Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > > Subject: Re: [dpdk-dev] [PATCH 0/2] lib: add TCP IPv4 GRO support
> > > >
> > > > On Thu, Mar 23, 2017 at 01:29:06PM +0800, Wiles, Keith wrote:
> > > > >
> > > > > > On Mar 22, 2017, at 9:15 PM, Hu, Jiayu <jiayu.hu@intel.com> wrote:
> > > > > >
> > > > > > On Wed, Mar 22, 2017 at 10:19:41PM +0800, Wiles, Keith wrote:
> > > > > >> Off list.
> > > > > >>
> > > > > >> This support for GRO, seems it needs to be a feature for all ethernet
> > > > devices and some way the developer can enable this feature like the other
> > > > offloads in DPDK. The GRO support should be set by the developer and then
> > > > the apis are called within ethdev or the PMD to process the packets. The
> > > > code is generic and creating a library is not the best overall solution
> > > > IMO.
> > > > > >
> > > > > > Indeed, in this patchset, GRO is just proposed as a application-used
> > > > library.
> > > > > > But actually, these GRO APIs can also be used by ethdev and PMD. For
> > > > > > example, register rte_gro_tcp4_reassemble_burst as a rx callback.
> > > > > > Therefore, maybe we can support GRO in two ways. The one is a
> > > > > > application-used library, the other is a device feature. Applications
> > > > decide which one to use.
> > > > > >
> > > > > > How do you think of it?
> > > > >
> > > > > I would prefer to use it only in a offload design, meaning the GRO is
> > > > just another ethernet offload the user can turn on. Using something like a
> > > > RX callback to handle the GRO for the developer. This way he just turns it
> > > > on in via a ethdev offload support feature and then setup the RX callback
> > > > via ethdev. The developer only needs to enable the feature and never calls
> > > > GRO APIs.
> > > >
> > > > The advantage of providing an application-used GRO library can enable more
> > > > flexibility to applications. Applications can flexibly use GRO functions
> > > > according to own realistic scenario. Therefore, I think it makes sense to
> > > > provide an application-used GRO library.
> > > > >
> > > > > Adding a new GRO library may not get much support and having a whole
> > > > library for GRO seems a bit odd.
> > > >
> > > > In my opinion, we just need to provide one GRO library. But it can be used
> > > > by ethernet devices and applications at the same time. Ethernet devices
> > > > use it to provide an offload feature. If applications want more
> > > > flexibility, they can just turn off this device feature, and use GRO APIs
> > > > directly.
> > > >
> > > > +Konstantin
> > > >
> > >
> > > [Apologies for the basic questions, I haven't studied the patchset in detail]
> > > Rather than adding a whole new library for it, can it just fit into librte_net or an existing lib? Are we planning a sample to show off
> > tighter integration with ethdev or changes to the ethdev library to transparently use the library when needed?
> > 
> > Currently, we have an individual library, librte_ip_frag, which provides IP fragment
> > and ressembly abilities. Similiarly, DPDK GRO will provide reassembly ability for
> > various of protocols, not only TCP. So I think it's good to make a new library for
> > this feature.
> > 
> > About GRO, we had a discussion two monthes ago. You can see it in
> > http://dpdk.org/ml/archives/dev/2017-January/056276.html
> > In that discussion, we agree to support GRO in two steps. The first is to implement
> > GRO as a standalone library, and see how much performance we can get. The second
> > step is to discuss how to integrate GRO into DPDK. Therefore, if we agree to support
> > GRO as a device feature, we need to discuss how to enable/disable this device feature.
> > Once we reach an agreement, there will be a sample to demonstrate the integration.
> 
> I think that having a separate library for GRO is a step in a right direction.
> From my perspective - it provides a clean and flexible way to use that feature.
> If later someone would like to put GRO into ethdev layer (or particular PMD),
> he can use existing librte_gro for that.

Agree. I think introducing more flexibility is an important thing for applications.

> I didn't  have a closer look yet, but I think that caught my attention:
> API fir the lib seems too IPv4/TCP oriented -
> though I understand that the most common case and should be implemented first.
> I wonder can we have it a bit more generic and extendable, so user can specify what combination of protocols
> he is interested in (let say: ipv4/tcp,  ipv6/tcp, etc.).
> Even if right now we'll have it implemented only for ipv4/tcp.
> Then internally we can have some check is that supported or not and if yes setup things accordingly.

Indeed, the current APIs are too specific, which is not very friendly to
applications. Maybe we can use macros to define the protocol combinations,
like GRO_TCP_IPV4 and GRO_UDP_IPV6, and provide a generic setup function
and reassembly function. Both of them would perform different operations
according to the macro value passed in by the application.
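
For instance, a purely hypothetical sketch (none of these names exist yet;
they only illustrate the direction):

	/* hypothetical protocol-combination macros */
	#define GRO_TCP_IPV4 (1ULL << 0)
	#define GRO_UDP_IPV6 (1ULL << 1)

	/* hypothetical generic setup and reassembly entry points */
	struct rte_gro_tbl *
	rte_gro_tbl_create(uint64_t gro_types, uint32_t nb_entries,
			uint16_t socket_id);

	uint32_t
	rte_gro_reassemble_burst(struct rte_gro_tbl *tbl,
			struct rte_mbuf **pkts, uint32_t nb_pkts);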

> BTW, that's for 17.08, right?

Yes, it's for 17.08.

Jiayu
> 
> Konstantin
>   
>  


* Re: [PATCH 0/2] lib: add TCP IPv4 GRO support
  2017-03-24  2:23               ` [PATCH 0/2] lib: add TCP IPv4 GRO support Jiayu Hu
@ 2017-03-24  6:18                 ` Wiles, Keith
  2017-03-24  7:22                   ` Yuanhan Liu
  0 siblings, 1 reply; 141+ messages in thread
From: Wiles, Keith @ 2017-03-24  6:18 UTC (permalink / raw)
  To: Hu, Jiayu
  Cc: Ananyev, Konstantin, Richardson, Bruce, Stephen Hemminger,
	Yuanhan Liu, Yigit, Ferruh, dev


> On Mar 23, 2017, at 9:23 PM, Hu, Jiayu <jiayu.hu@intel.com> wrote:
> 
> On Thu, Mar 23, 2017 at 09:12:56PM +0800, Ananyev, Konstantin wrote:
>> Hi everyone,
>> 
>>>> 
>>>> 
>>>>> -----Original Message-----
>>>>> From: Hu, Jiayu
>>>>> Sent: Thursday, March 23, 2017 6:25 AM
>>>>> To: Wiles, Keith <keith.wiles@intel.com>
>>>>> Cc: Yuanhan Liu <yuanhan.liu@linux.intel.com>; Richardson, Bruce
>>>>> <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>;
>>>>> Ananyev, Konstantin <konstantin.ananyev@intel.com>
>>>>> Subject: Re: [dpdk-dev] [PATCH 0/2] lib: add TCP IPv4 GRO support
>>>>> 
>>>>> On Thu, Mar 23, 2017 at 01:29:06PM +0800, Wiles, Keith wrote:
>>>>>> 
>>>>>>> On Mar 22, 2017, at 9:15 PM, Hu, Jiayu <jiayu.hu@intel.com> wrote:
>>>>>>> 
>>>>>>> On Wed, Mar 22, 2017 at 10:19:41PM +0800, Wiles, Keith wrote:
>>>>>>>> Off list.
>>>>>>>> 
>>>>>>>> This support for GRO, seems it needs to be a feature for all ethernet
>>>>> devices and some way the developer can enable this feature like the other
>>>>> offloads in DPDK. The GRO support should be set by the developer and then
>>>>> the apis are called within ethdev or the PMD to process the packets. The
>>>>> code is generic and creating a library is not the best overall solution
>>>>> IMO.
>>>>>>> 
>>>>>>> Indeed, in this patchset, GRO is just proposed as a application-used
>>>>> library.
>>>>>>> But actually, these GRO APIs can also be used by ethdev and PMD. For
>>>>>>> example, register rte_gro_tcp4_reassemble_burst as a rx callback.
>>>>>>> Therefore, maybe we can support GRO in two ways. The one is a
>>>>>>> application-used library, the other is a device feature. Applications
>>>>> decide which one to use.
>>>>>>> 
>>>>>>> How do you think of it?
>>>>>> 
>>>>>> I would prefer to use it only in a offload design, meaning the GRO is
>>>>> just another ethernet offload the user can turn on. Using something like a
>>>>> RX callback to handle the GRO for the developer. This way he just turns it
>>>>> on in via a ethdev offload support feature and then setup the RX callback
>>>>> via ethdev. The developer only needs to enable the feature and never calls
>>>>> GRO APIs.
>>>>> 
>>>>> The advantage of providing an application-used GRO library can enable more
>>>>> flexibility to applications. Applications can flexibly use GRO functions
>>>>> according to own realistic scenario. Therefore, I think it makes sense to
>>>>> provide an application-used GRO library.
>>>>>> 
>>>>>> Adding a new GRO library may not get much support and having a whole
>>>>> library for GRO seems a bit odd.
>>>>> 
>>>>> In my opinion, we just need to provide one GRO library. But it can be used
>>>>> by ethernet devices and applications at the same time. Ethernet devices
>>>>> use it to provide an offload feature. If applications want more
>>>>> flexibility, they can just turn off this device feature, and use GRO APIs
>>>>> directly.
>>>>> 
>>>>> +Konstantin
>>>>> 
>>>> 
>>>> [Apologies for the basic questions, I haven't studied the patchset in detail]
>>>> Rather than adding a whole new library for it, can it just fit into librte_net or an existing lib? Are we planning a sample to show off
>>> tighter integration with ethdev or changes to the ethdev library to transparently use the library when needed?
>>> 
>>> Currently, we have an individual library, librte_ip_frag, which provides IP fragment
>>> and ressembly abilities. Similiarly, DPDK GRO will provide reassembly ability for
>>> various of protocols, not only TCP. So I think it's good to make a new library for
>>> this feature.
>>> 
>>> About GRO, we had a discussion two monthes ago. You can see it in
>>> http://dpdk.org/ml/archives/dev/2017-January/056276.html
>>> In that discussion, we agree to support GRO in two steps. The first is to implement
>>> GRO as a standalone library, and see how much performance we can get. The second
>>> step is to discuss how to integrate GRO into DPDK. Therefore, if we agree to support
>>> GRO as a device feature, we need to discuss how to enable/disable this device feature.
>>> Once we reach an agreement, there will be a sample to demonstrate the integration.
>> 
>> I think that having a separate library for GRO is a step in a right direction.
>>> From my perspective - it provides a clean and flexible way to use that feature.
>> If later someone would like to put GRO into ethdev layer (or particular PMD),
>> he can use existing librte_gro for that.
> 
> Agree. I think introducing more flexibility is an important thing for applications.

Creating a new library just for GRO is not a reasonable solution, but adding that support to an existing library like librte_net would be cleaner and not create yet another library.

Creating more flexibility is not the best goal, as we really want to make GRO easy and simple for the developer to use on any device, without having to change his applications to take advantage of the feature. Sometimes providing more flexibility just means making it more complex, with more APIs the developer needs to understand. Providing GRO as an offload feature is the better direction, as it makes it simple for an application to use.

If we provide GRO as a standard offload, similar to the other offloads we currently have, it is easy for the developer. The best goal for a feature is the best performance for the application without having the application make even more API calls, while staying simple and easy to use.

> 
>> I didn't  have a closer look yet, but I think that caught my attention:
>> API fir the lib seems too IPv4/TCP oriented -
>> though I understand that the most common case and should be implemented first.
>> I wonder can we have it a bit more generic and extendable, so user can specify what combination of protocols
>> he is interested in (let say: ipv4/tcp,  ipv6/tcp, etc.).
>> Even if right now we'll have it implemented only for ipv4/tcp.
>> Then internally we can have some check is that supported or not and if yes setup things accordingly.
> 
> Indeed, current apis are too specific. It's not very friendly to applications.
> Maybe we can use macro to define the combination of protocols, like GRO_TCP_IPV4
> and GRO_UDP_IPV6; and provide a generic setup function and reassembly function.
> Both of them perform different operations according to the macro value inputted
> by the application.
> 
>> BTW, that's for 17.08, right?
> 
> Yes, it's for 17.08.
> 
> Jiayu
>> 
>> Konstantin
>> 
>> 

Regards,
Keith


* Re: [PATCH 0/2] lib: add TCP IPv4 GRO support
  2017-03-24  6:18                 ` Wiles, Keith
@ 2017-03-24  7:22                   ` Yuanhan Liu
  2017-03-24  8:06                     ` Jiayu Hu
  0 siblings, 1 reply; 141+ messages in thread
From: Yuanhan Liu @ 2017-03-24  7:22 UTC (permalink / raw)
  To: Wiles, Keith
  Cc: Hu, Jiayu, Ananyev, Konstantin, Richardson, Bruce,
	Stephen Hemminger, Yigit, Ferruh, dev, Liang, Cunming,
	Thomas Monjalon

On Fri, Mar 24, 2017 at 06:18:48AM +0000, Wiles, Keith wrote:
> >> I think that having a separate library for GRO is a step in a right direction.
> >>> From my perspective - it provides a clean and flexible way to use that feature.
> >> If later someone would like to put GRO into ethdev layer (or particular PMD),
> >> he can use existing librte_gro for that.
> > 
> > Agree. I think introducing more flexibility is an important thing for applications.
> 
> Creating a new library just for GRO is not a reasonable solution, but adding that support to an existing library like librte_net would be cleaner and not create yet another library.

Librte_net seems like a good suggestion to me, especially when we are
considering adding GSO in the future. The only concern to me is that "net"
may be too generic. It may be kind of hard to decide which should be in
librte_net, and which should be added as a standalone lib. For example,
shouldn't 'lpm' and 'ip_frag' also belong to librte_net?

> Creating more flexibility is not the best goal as we really want to make GRO easy and simple for the developer to use for any device without having to change his applications to take advantage of the feature. Some times providing more flexibility just means making it more complexed and more APIs the developer needs to understand. Providing GRO as a offload feature is the better direction as it makes it simple for an application to use.
> 
> If we provide GRO as a standard offload similar to the other offloads we currently have makes it easy for the developer. The best goal for a feature is the best performance for the application without having the application make even more APIs calls along with simple and easy to use.

In general, I'd agree with you, if no one objects to adding a short piece
of code at the end of rte_eth_rx_burst:

     +       if (eth_gro_is_enabled(dev))
     +               nb_rx = rte_net_gro(...);
     +
             return nb_rx;
      }

Objections?

But one way or another, we need to put the GRO code somewhere and we need
to introduce a generic API for it. It could be librte_net as you
suggested. So the good thing is that we have all at least come to an
agreement that it should be implemented in a lib, right? The only
controversy is whether we should export it to applications and let them
invoke it, or hide it inside rte_eth_rx_burst.

Though it may take some time for all of us to come to an agreement on
that, the good thing is that it would be a very trivial change once it's
done. Agree?

Thus I'd suggest Jiayu focus on the GRO code development, such as making
it generic enough and adding support for other protocols. And I would
like to ask you guys to help review them. Makes sense to all?

Thanks.

	--yliu


> >> I didn't  have a closer look yet, but I think that caught my attention:
> >> API fir the lib seems too IPv4/TCP oriented -
> >> though I understand that the most common case and should be implemented first.
> >> I wonder can we have it a bit more generic and extendable, so user can specify what combination of protocols
> >> he is interested in (let say: ipv4/tcp,  ipv6/tcp, etc.).
> >> Even if right now we'll have it implemented only for ipv4/tcp.
> >> Then internally we can have some check is that supported or not and if yes setup things accordingly.
> > 
> > Indeed, current apis are too specific. It's not very friendly to applications.
> > Maybe we can use macro to define the combination of protocols, like GRO_TCP_IPV4
> > and GRO_UDP_IPV6; and provide a generic setup function and reassembly function.
> > Both of them perform different operations according to the macro value inputted
> > by the application.
> > 
> >> BTW, that's for 17.08, right?
> > 
> > Yes, it's for 17.08.
> > 
> > Jiayu
> >> 
> >> Konstantin
> >> 
> >> 
> 
> Regards,
> Keith


* Re: [PATCH 0/2] lib: add TCP IPv4 GRO support
  2017-03-24  7:22                   ` Yuanhan Liu
@ 2017-03-24  8:06                     ` Jiayu Hu
  2017-03-24 11:43                       ` Ananyev, Konstantin
  0 siblings, 1 reply; 141+ messages in thread
From: Jiayu Hu @ 2017-03-24  8:06 UTC (permalink / raw)
  To: Yuanhan Liu
  Cc: Wiles, Keith, Ananyev, Konstantin, Richardson, Bruce,
	Stephen Hemminger, Yigit, Ferruh, dev, Liang, Cunming,
	Thomas Monjalon

On Fri, Mar 24, 2017 at 03:22:30PM +0800, Yuanhan Liu wrote:
> On Fri, Mar 24, 2017 at 06:18:48AM +0000, Wiles, Keith wrote:
> > >> I think that having a separate library for GRO is a step in a right direction.
> > >>> From my perspective - it provides a clean and flexible way to use that feature.
> > >> If later someone would like to put GRO into ethdev layer (or particular PMD),
> > >> he can use existing librte_gro for that.
> > > 
> > > Agree. I think introducing more flexibility is an important thing for applications.
> > 
> > Creating a new library just for GRO is not a reasonable solution, but adding that support to an existing library like librte_net would be cleaner and not create yet another library.
> 
> Librte_net seems like a good suggestion to me, especially when we are
> considering to add GSO in future. The only concern to me is "net" may
> be too generic. It maybe kind of hard to decide which should be in
> librte_net, and which should be added as a standalone lib. For example,
> shouldn't 'lpm' and 'ip_frag' also belong to librte_net?
> 
> > Creating more flexibility is not the best goal as we really want to make GRO easy and simple for the developer to use for any device without having to change his applications to take advantage of the feature. Some times providing more flexibility just means making it more complexed and more APIs the developer needs to understand. Providing GRO as a offload feature is the better direction as it makes it simple for an application to use.
> > 
> > If we provide GRO as a standard offload similar to the other offloads we currently have makes it easy for the developer. The best goal for a feature is the best performance for the application without having the application make even more APIs calls along with simple and easy to use.
> 
> In general, I'd agree with you, if no one is object to add a short piece
> of code at the end of rte_eth_rx_burst:
> 
>      +       if (eth_gro_is_enabled(dev))
>      +               nb_rx = rte_net_gro(...);
>      +
>              return nb_rx;
>       }
> 
> Objections?
> 
> But one way or another, we need put the gro code at somewhere and we
> need introduce a generic API for that. It could be librte_net as you
> suggested. So the good thing is that we all at least come an agreement
> that it should be implemented in lib, right? The only controversy is
> should we export it to application and let them to invoke it, or hide
> it inside rte_eth_rx_burst.
> 
> Though it may take some time for all of us to come an agreement on that,
> but the good thing is that it would be a very trivial change once it's
> done. Agree?

Agree.

> 
> Thus I'd suggest Jiayu to focus on the the GRO code developement, such
> as making it generic enough and adding other protocols support. And I
> would like to ask you guys to help review them. Makes sense to all?
> 

Agree again. No matter where the GRO code ends up, the APIs should be
generic and extensible. And more protocols should be supported.

Thanks,
Jiayu

> Thanks.
> 
> 	--yliu
> 
> 
> > >> I didn't  have a closer look yet, but I think that caught my attention:
> > >> API fir the lib seems too IPv4/TCP oriented -
> > >> though I understand that the most common case and should be implemented first.
> > >> I wonder can we have it a bit more generic and extendable, so user can specify what combination of protocols
> > >> he is interested in (let say: ipv4/tcp,  ipv6/tcp, etc.).
> > >> Even if right now we'll have it implemented only for ipv4/tcp.
> > >> Then internally we can have some check is that supported or not and if yes setup things accordingly.
> > > 
> > > Indeed, current apis are too specific. It's not very friendly to applications.
> > > Maybe we can use macro to define the combination of protocols, like GRO_TCP_IPV4
> > > and GRO_UDP_IPV6; and provide a generic setup function and reassembly function.
> > > Both of them perform different operations according to the macro value inputted
> > > by the application.
> > > 
> > >> BTW, that's for 17.08, right?
> > > 
> > > Yes, it's for 17.08.
> > > 
> > > Jiayu
> > >> 
> > >> Konstantin
> > >> 
> > >> 
> > 
> > Regards,
> > Keith


* Re: [PATCH 0/2] lib: add TCP IPv4 GRO support
  2017-03-24  8:06                     ` Jiayu Hu
@ 2017-03-24 11:43                       ` Ananyev, Konstantin
  2017-03-24 14:37                         ` Wiles, Keith
  2017-03-29 10:47                         ` Morten Brørup
  0 siblings, 2 replies; 141+ messages in thread
From: Ananyev, Konstantin @ 2017-03-24 11:43 UTC (permalink / raw)
  To: Hu, Jiayu, Yuanhan Liu
  Cc: Wiles, Keith, Richardson, Bruce, Stephen Hemminger, Yigit,
	Ferruh, dev, Liang, Cunming, Thomas Monjalon



> -----Original Message-----
> From: Hu, Jiayu
> Sent: Friday, March 24, 2017 8:07 AM
> To: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> Cc: Wiles, Keith <keith.wiles@intel.com>; Ananyev, Konstantin <konstantin.ananyev@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; Stephen Hemminger <stephen@networkplumber.org>; Yigit, Ferruh <ferruh.yigit@intel.com>;
> dev@dpdk.org; Liang, Cunming <cunming.liang@intel.com>; Thomas Monjalon <thomas.monjalon@6wind.com>
> Subject: Re: [dpdk-dev] [PATCH 0/2] lib: add TCP IPv4 GRO support
> 
> On Fri, Mar 24, 2017 at 03:22:30PM +0800, Yuanhan Liu wrote:
> > On Fri, Mar 24, 2017 at 06:18:48AM +0000, Wiles, Keith wrote:
> > > >> I think that having a separate library for GRO is a step in a right direction.
> > > >>> From my perspective - it provides a clean and flexible way to use that feature.
> > > >> If later someone would like to put GRO into ethdev layer (or particular PMD),
> > > >> he can use existing librte_gro for that.
> > > >
> > > > Agree. I think introducing more flexibility is an important thing for applications.
> > >
> > > Creating a new library just for GRO is not a reasonable solution, but adding that support to an existing library like librte_net would
> be cleaner and not create yet another library.
> >
> > Librte_net seems like a good suggestion to me, especially when we are
> > considering to add GSO in future. The only concern to me is "net" may
> > be too generic. It maybe kind of hard to decide which should be in
> > librte_net, and which should be added as a standalone lib. For example,
> > shouldn't 'lpm' and 'ip_frag' also belong to librte_net?

About librte_gro vs librte_net:

Right now librte_net is quite a lightweight one - it mostly contains net protocol definitions
plus some extra helper functions: to parse the l2/l3 headers to determine ptype, to calculate cksum, etc.
The GRO code is quite different - it has to allocate and manage hash table(s), etc.
Again, my understanding is that it would keep growing (with new proto support).
And, as mentioned above, if GRO should go into librte_net, then librte_ipfrag and a future GSO should also be there,
which would create quite a monstrous library.
So I think it is better to keep librte_net small and tidy and put the GRO functionality into the new library.

> >
> > > Creating more flexibility is not the best goal as we really want to make GRO easy and simple for the developer to use for any
> device without having to change his applications to take advantage of the feature. Some times providing more flexibility just means
> making it more complexed and more APIs the developer needs to understand. Providing GRO as a offload feature is the better
> direction as it makes it simple for an application to use.
> > >
> > > If we provide GRO as a standard offload similar to the other offloads we currently have makes it easy for the developer. The best
> goal for a feature is the best performance for the application without having the application make even more APIs calls along with
> simple and easy to use.
> >
> > In general, I'd agree with you, if no one is object to add a short piece
> > of code at the end of rte_eth_rx_burst:
> >
> >      +       if (eth_gro_is_enabled(dev))
> >      +               nb_rx = rte_net_gro(...);
> >      +
> >              return nb_rx;
> >       }
> >
> > Objections?

I'd rather not open that door.
If we allow that for GRO, we'll have to allow it for every other SW feature:
- IP reassembly
- l3/l4 cksum calculation if the underlying HW doesn't support it
- SW ptype recognition
- etc.

Adding all these things into rx_burst() would most likely slow things down
(remember, it is the data-path) and would pretty soon turn rx_burst() into
a messy monster that would be hard to maintain and debug.

My preference would be to keep the rte_ethdev data-path as small and tidy as possible.
If in the future we'd really like to introduce all these SW things into the dev layer,
my preference would be to create some sort of new abstraction on top of the current ethdev:
rte_eth_smartdev or so.
So it would be:
rte_eth_smartdev_rx_burst(....)
{
   nb_rx =  rte_eth_rx_burst(...);
   /* apply GRO, reassembly, etc. */
  ...
} 

Something similar to what 6Wind is trying to introduce with their failsafe dev concept.

> >
> > But one way or another, we need to put the GRO code somewhere and we
> > need to introduce a generic API for that. It could be librte_net as you
> > suggested. So the good thing is that we all at least agree
> > that it should be implemented in a lib, right? The only controversy is
> > whether we should export it to applications and let them invoke it, or hide
> > it inside rte_eth_rx_burst.
> > 
> > It may take some time for all of us to come to an agreement on that,
> > but the good thing is that it would be a very trivial change once it's
> > done. Agree?
> 
> Agree.
> 
> >
> > Thus I'd suggest Jiayu focus on the GRO code development, such
> > as making it generic enough and adding support for other protocols. And I
> > would like to ask you guys to help review them. Makes sense to all?
> >
> 
> Agree again. No matter where the GRO code ends up, the APIs should be generic
> and extensible. And more protocols should be supported.

Yep, that's been my take from the beginning:
Let's develop librte_gro first and make it successful, then we can think about
whether (and how) to put it into the ethdev layer.

Konstantin

> 
> Thanks,
> Jiayu
> 
> > Thanks.
> >
> > 	--yliu
> >
> >
> > > >> I didn't have a closer look yet, but one thing caught my attention:
> > > >> the API for the lib seems too IPv4/TCP oriented -
> > > >> though I understand that's the most common case and should be implemented first.
> > > >> I wonder whether we can have it a bit more generic and extendable, so the user can specify what combination of protocols
> > > >> he is interested in (let's say: ipv4/tcp, ipv6/tcp, etc.).
> > > >> Even if right now we'll have it implemented only for ipv4/tcp.
> > > >> Then internally we can have some check whether that is supported or not and, if yes, set things up accordingly.
> > > >
> > > > Indeed, the current APIs are too specific. They're not very friendly to applications.
> > > > Maybe we can use macros to define the combinations of protocols, like GRO_TCP_IPV4
> > > > and GRO_UDP_IPV6, and provide a generic setup function and reassembly function.
> > > > Both of them would perform different operations according to the macro value passed in
> > > > by the application.
> > > >
> > > >> BTW, that's for 17.08, right?
> > > >
> > > > Yes, it's for 17.08.
> > > >
> > > > Jiayu
> > > >>
> > > >> Konstantin
> > > >>
> > > >>
> > >
> > > Regards,
> > > Keith

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH 0/2] lib: add TCP IPv4 GRO support
  2017-03-24 11:43                       ` Ananyev, Konstantin
@ 2017-03-24 14:37                         ` Wiles, Keith
  2017-03-24 14:59                           ` Olivier Matz
  2017-03-29 10:47                         ` Morten Brørup
  1 sibling, 1 reply; 141+ messages in thread
From: Wiles, Keith @ 2017-03-24 14:37 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: Hu, Jiayu, Yuanhan Liu, Richardson, Bruce, Stephen Hemminger,
	Yigit, Ferruh, dev, Liang, Cunming, Thomas Monjalon


> On Mar 24, 2017, at 6:43 AM, Ananyev, Konstantin <konstantin.ananyev@intel.com> wrote:
> 
> 
> 
>> -----Original Message-----
>> From: Hu, Jiayu
>> Sent: Friday, March 24, 2017 8:07 AM
>> To: Yuanhan Liu <yuanhan.liu@linux.intel.com>
>> Cc: Wiles, Keith <keith.wiles@intel.com>; Ananyev, Konstantin <konstantin.ananyev@intel.com>; Richardson, Bruce
>> <bruce.richardson@intel.com>; Stephen Hemminger <stephen@networkplumber.org>; Yigit, Ferruh <ferruh.yigit@intel.com>;
>> dev@dpdk.org; Liang, Cunming <cunming.liang@intel.com>; Thomas Monjalon <thomas.monjalon@6wind.com>
>> Subject: Re: [dpdk-dev] [PATCH 0/2] lib: add TCP IPv4 GRO support
>> 
>> On Fri, Mar 24, 2017 at 03:22:30PM +0800, Yuanhan Liu wrote:
>>> On Fri, Mar 24, 2017 at 06:18:48AM +0000, Wiles, Keith wrote:
>>>>>> I think that having a separate library for GRO is a step in the right direction.
>>>>>>> From my perspective - it provides a clean and flexible way to use that feature.
>>>>>> If later someone would like to put GRO into ethdev layer (or particular PMD),
>>>>>> he can use existing librte_gro for that.
>>>>> 
>>>>> Agree. I think introducing more flexibility is an important thing for applications.
>>>> 
>>>> Creating a new library just for GRO is not a reasonable solution, but adding that support to an existing library like librte_net would
>> be cleaner and not create yet another library.
>>> 
>>> Librte_net seems like a good suggestion to me, especially when we are
>>> considering adding GSO in the future. The only concern to me is "net" may
>>> be too generic. It may be kind of hard to decide which should be in
>>> librte_net, and which should be added as a standalone lib. For example,
>>> shouldn't 'lpm' and 'ip_frag' also belong to librte_net?
> 
> About librte_gro vs librte_net:
> 
> Right now librte_net is quite a lightweight one - it mostly contains net protocol definitions
> plus some extra helper functions: to parse the l2/l3 headers to determine ptype, to calculate cksum, etc.
> GRO code is quite different - it has to allocate and manage hash table(s), etc.
> Again, my understanding is that it would keep growing (with new proto support).
> Again, as mentioned above, if GRO goes into librte_net, then librte_ip_frag and future GSO should also go there.
> Which would create quite a monstrous library.
> So I think it is better to keep librte_net small and tidy and put GRO functionality into the new library.

The size of a library is not a concern; we have some pretty big ones already.

-rw-rw-r-- 2 rkwiles rkwiles 1.2M Mar 21 15:22 librte_acl.a
-rw-rw-r-- 1 rkwiles rkwiles  39K Mar 21 15:22 librte_cfgfile.a
-rw-rw-r-- 1 rkwiles rkwiles 292K Mar 21 15:22 librte_cmdline.a
-rw-rw-r-- 1 rkwiles rkwiles 211K Mar 21 15:23 librte_cryptodev.a
-rw-rw-r-- 1 rkwiles rkwiles  75K Mar 21 15:22 librte_distributor.a
-rw-rw-r-- 1 rkwiles rkwiles 1.4M Mar 21 15:22 librte_eal.a
-rw-rw-r-- 1 rkwiles rkwiles 323K Mar 21 15:23 librte_efd.a
-rw-rw-r-- 1 rkwiles rkwiles 374K Mar 22 09:31 librte_ethdev.a
-rw-rw-r-- 1 rkwiles rkwiles 675K Mar 21 15:22 librte_hash.a
-rw-rw-r-- 1 rkwiles rkwiles 366K Mar 21 15:23 librte_ip_frag.a
-rw-rw-r-- 1 rkwiles rkwiles  29K Mar 21 15:22 librte_jobstats.a
-rw-rw-r-- 1 rkwiles rkwiles 167K Mar 21 15:23 librte_kni.a
-rw-rw-r-- 1 rkwiles rkwiles  19K Mar 21 15:22 librte_kvargs.a
-rw-rw-r-- 1 rkwiles rkwiles 309K Mar 21 15:22 librte_lpm.a
-rw-rw-r-- 1 rkwiles rkwiles  93K Mar 21 15:22 librte_mbuf.a
-rw-rw-r-- 1 rkwiles rkwiles 270K Mar 21 15:22 librte_mempool.a
-rw-rw-r-- 1 rkwiles rkwiles  17K Mar 21 15:22 librte_meter.a
-rw-rw-r-- 1 rkwiles rkwiles  71K Mar 21 15:22 librte_net.a
-rw-rw-r-- 1 rkwiles rkwiles 211K Mar 21 15:23 librte_pdump.a
-rw-rw-r-- 1 rkwiles rkwiles 140K Mar 21 15:23 librte_pipeline.a
-rw-rw-r-- 1 rkwiles rkwiles 1.4M Mar 21 15:23 librte_port.a
-rw-rw-r-- 1 rkwiles rkwiles 122K Mar 21 15:22 librte_power.a
-rw-rw-r-- 1 rkwiles rkwiles 154K Mar 21 15:22 librte_reorder.a
-rw-rw-r-- 1 rkwiles rkwiles  63K Mar 21 15:22 librte_ring.a
-rw-rw-r-- 1 rkwiles rkwiles 377K Mar 21 15:23 librte_sched.a
-rw-rw-r-- 1 rkwiles rkwiles 1.4M Mar 21 15:23 librte_table.a
-rw-rw-r-- 1 rkwiles rkwiles  53K Mar 21 15:22 librte_timer.a
-rw-rw-r-- 1 rkwiles rkwiles 609K Mar 21 15:23 librte_vhost.a

I removed the PMD archives; also, ls is not a great way to get the true size compared to using ‘size’.

If you look at the size values, the output of ‘size *.a’ is interesting too.

The size of librte_net (71K) plus ip_frag (366K) is pretty small compared to a few others. I would assume GRO is pretty small too, so adding GRO into librte_net is very reasonable. We could leave ip_frag out, as currently it is a standalone lib, but continue to add GSO to librte_net. I would not assume the size would be that large, and it seems like the best place to put the code.

If you still want to create a gso lib then I guess you can; it just seems unreasonable to me.

> 
>>> 
>>>> Creating more flexibility is not the best goal as we really want to make GRO easy and simple for the developer to use for any
>> device without having to change his applications to take advantage of the feature. Sometimes providing more flexibility just means
>> making it more complex and adding more APIs the developer needs to understand. Providing GRO as an offload feature is the better
>> direction as it makes it simple for an application to use.
>>>> 
>>>> If we provide GRO as a standard offload, similar to the other offloads we currently have, it makes things easy for the developer. The best
>> goal for a feature is the best performance for the application without having the application make even more API calls, while staying
>> simple and easy to use.
>>> 
>>> In general, I'd agree with you, if no one objects to adding a short piece
>>> of code at the end of rte_eth_rx_burst:
>>> 
>>>     +       if (eth_gro_is_enabled(dev))
>>>     +               nb_rx = rte_net_gro(...);
>>>     +
>>>             return nb_rx;
>>>      }
>>> 
>>> Objections?

Why do we need to modify every driver? Why not use the RX callback feature, so that no drivers are harmed in this patch :-)
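
Just to illustrate, a minimal sketch of that RX-callback idea (gro_reassemble,
gro_rx_callback and gro_enable_queue are made-up names for this sketch, and
the reassembly body is stubbed out):

#include <rte_ethdev.h>
#include <rte_mbuf.h>

/* Hypothetical per-burst reassembly step; a real version would merge
 * TCP flows here. This stub just passes packets through unmerged. */
static uint16_t
gro_reassemble(struct rte_mbuf **pkts, uint16_t nb_pkts, void *gro_state)
{
	(void)pkts;
	(void)gro_state;
	return nb_pkts;
}

/* Matches rte_rx_callback_fn: runs inside rte_eth_rx_burst() after the
 * PMD has filled pkts[], so no PMD code needs to change. */
static uint16_t
gro_rx_callback(uint8_t port __rte_unused, uint16_t queue __rte_unused,
		struct rte_mbuf *pkts[], uint16_t nb_pkts,
		uint16_t max_pkts __rte_unused, void *user_param)
{
	return gro_reassemble(pkts, nb_pkts, user_param);
}

/* Attach the callback to one RX queue when enabling GRO on it. */
static void
gro_enable_queue(uint8_t port, uint16_t queue, void *gro_state)
{
	rte_eth_add_rx_callback(port, queue, gro_rx_callback, gro_state);
}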

> 
> I'd rather not open that door.
> If we allow that for GRO, we'll have to allow it for all sorts of other stuff:
> - ip reassembly
> - l3/l4 cksum calculation if underlying HW doesn't support it
> - SW ptype recognition
> - etc.
> 
> Adding all these things into rx_burst() would most likely slow things down
> (remember it is a data-path) and pretty soon would bring rx_burst() into
> a messy monster that would be hard to maintain and debug.
> 
> My preference would be to keep rte_ethdev data-path as small and tidy as possible.
> If in future we'd really like to introduce all these SW things into dev layer -
> my preference would be to create some sort of new abstraction on top of current ethdev:
> rte_eth_smartdev or so.
> So it would be:
> rte_eth_smartdev_rx_burst(....)
> {
>   nb_rx =  rte_eth_rx_burst(...);
>   /* apply GRO, reassembly, etc. */
>  ...
> } 

Adding a new API is still not required, as the RX callback code is already in place for these types of post-processing of mbufs.

> 
> Something similar to what 6Wind is trying to introduce with their failsafe dev concept.

The 6Wind case is a different story here, and not a post-processing of RX mbufs.

> 
>>> 
>>> But one way or another, we need to put the GRO code somewhere and we
>>> need to introduce a generic API for that. It could be librte_net as you
>>> suggested. So the good thing is that we all at least agree
>>> that it should be implemented in a lib, right? The only controversy is
>>> whether we should export it to applications and let them invoke it, or hide
>>> it inside rte_eth_rx_burst.
>>> 
>>> It may take some time for all of us to come to an agreement on that,
>>> but the good thing is that it would be a very trivial change once it's
>>> done. Agree?
>> 
>> Agree.
>> 
>>> 
>>> Thus I'd suggest Jiayu focus on the GRO code development, such
>>> as making it generic enough and adding support for other protocols. And I
>>> would like to ask you guys to help review them. Makes sense to all?
>>> 
>> 
>> Agree again. No matter where the GRO code ends up, the APIs should be generic
>> and extensible. And more protocols should be supported.
> 
> Yep, that's been my take from the beginning:
> Let's develop librte_gro first and make it successful, then we can think about
> whether (and how) to put it into the ethdev layer.

Let's not create a GRO library and instead put the code into librte_net, as size is not a concern yet and it is the best place to put the code. As for ip_frag, someone can move it into librte_net if someone writes the patch.

> 
> Konstantin
> 
>> 
>> Thanks,
>> Jiayu
>> 
>>> Thanks.
>>> 
>>> 	--yliu
>>> 
>>> 
>>>>>> I didn't have a closer look yet, but one thing caught my attention:
>>>>>> the API for the lib seems too IPv4/TCP oriented -
>>>>>> though I understand that's the most common case and should be implemented first.
>>>>>> I wonder whether we can have it a bit more generic and extendable, so the user can specify what combination of protocols
>>>>>> he is interested in (let's say: ipv4/tcp, ipv6/tcp, etc.).
>>>>>> Even if right now we'll have it implemented only for ipv4/tcp.
>>>>>> Then internally we can have some check whether that is supported or not and, if yes, set things up accordingly.
>>>>> 
>>>>> Indeed, the current APIs are too specific. They're not very friendly to applications.
>>>>> Maybe we can use macros to define the combinations of protocols, like GRO_TCP_IPV4
>>>>> and GRO_UDP_IPV6, and provide a generic setup function and reassembly function.
>>>>> Both of them would perform different operations according to the macro value passed in
>>>>> by the application.
>>>>> 
>>>>>> BTW, that's for 17.08, right?
>>>>> 
>>>>> Yes, it's for 17.08.
>>>>> 
>>>>> Jiayu
>>>>>> 
>>>>>> Konstantin
>>>>>> 
>>>>>> 
>>>> 
>>>> Regards,
>>>> Keith

Regards,
Keith


^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH 0/2] lib: add TCP IPv4 GRO support
  2017-03-24 14:37                         ` Wiles, Keith
@ 2017-03-24 14:59                           ` Olivier Matz
  2017-03-24 15:07                             ` Wiles, Keith
  0 siblings, 1 reply; 141+ messages in thread
From: Olivier Matz @ 2017-03-24 14:59 UTC (permalink / raw)
  To: Wiles, Keith
  Cc: Ananyev, Konstantin, Hu, Jiayu, Yuanhan Liu, Richardson, Bruce,
	Stephen Hemminger, Yigit, Ferruh, dev, Liang, Cunming,
	Thomas Monjalon

On Fri, 24 Mar 2017 14:37:04 +0000, "Wiles, Keith" <keith.wiles@intel.com> wrote:
> > On Mar 24, 2017, at 6:43 AM, Ananyev, Konstantin <konstantin.ananyev@intel.com> wrote:
> > 
> > 
> >   

[...]

> > Yep, that's been my take from the beginning:
> > Let's develop librte_gro first and make it successful, then we can think about
> > whether (and how) to put it into the ethdev layer.
> 
> Let's not create a GRO library and instead put the code into librte_net, as size is not a concern yet and it is the best place to put the code. As for ip_frag, someone can move it into librte_net if someone writes the patch.

The size of a library _is_ an argument. Not the binary size in bytes, but
its API, because that's what the developer sees. Today, librte_net contains
protocol header definitions and some network helpers, and the API surface
is already quite big (look at the number of lines of .h files).

I really like having a library name which matches its content.
The answer to "what can I find in librte_gro?" is quite obvious.


Regards
Olivier

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH 0/2] lib: add TCP IPv4 GRO support
  2017-03-24 14:59                           ` Olivier Matz
@ 2017-03-24 15:07                             ` Wiles, Keith
  2017-03-28 13:40                               ` Wiles, Keith
  0 siblings, 1 reply; 141+ messages in thread
From: Wiles, Keith @ 2017-03-24 15:07 UTC (permalink / raw)
  To: Olivier Matz
  Cc: Ananyev, Konstantin, Hu, Jiayu, Yuanhan Liu, Richardson, Bruce,
	Stephen Hemminger, Yigit, Ferruh, dev, Liang, Cunming,
	Thomas Monjalon


> On Mar 24, 2017, at 9:59 AM, Olivier Matz <olivier.matz@6wind.com> wrote:
> 
> On Fri, 24 Mar 2017 14:37:04 +0000, "Wiles, Keith" <keith.wiles@intel.com> wrote:
>>> On Mar 24, 2017, at 6:43 AM, Ananyev, Konstantin <konstantin.ananyev@intel.com> wrote:
>>> 
>>> 
>>> 
> 
> [...]
> 
>>> Yep, that's been my take from the beginning:
>>> Let's develop librte_gro first and make it successful, then we can think about
>>> whether (and how) to put it into the ethdev layer.
>> 
>> Let's not create a GRO library and instead put the code into librte_net, as size is not a concern yet and it is the best place to put the code. As for ip_frag, someone can move it into librte_net if someone writes the patch.
> 
> The size of a library _is_ an argument. Not the binary size in bytes, but
> its API, because that's what the developer sees. Today, librte_net contains
> protocol header definitions and some network helpers, and the API surface
> is already quite big (look at the number of lines of .h files).
> 
> I really like having a library name which matches its content.
> The answer to "what can I find in librte_gro?" is quite obvious.

If we are going to talk about API surface area, let's talk about ethdev then :-)

Ok, let's create a new librte_gro, but I am not convinced it is reasonable. Maybe a more generic name is needed if we are going to add GSO to the library too. So a new name for the lib would be better than librte_gro, unless you are going to create another library for GSO.

I still think the design needs to be integrated in as a real offload, as I stated before, and that is not something I am willing to let drop.

> 
> 
> Regards
> Olivier

Regards,
Keith

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH 0/2] lib: add TCP IPv4 GRO support
  2017-03-24 15:07                             ` Wiles, Keith
@ 2017-03-28 13:40                               ` Wiles, Keith
  2017-03-28 13:57                                 ` Hu, Jiayu
  0 siblings, 1 reply; 141+ messages in thread
From: Wiles, Keith @ 2017-03-28 13:40 UTC (permalink / raw)
  To: Olivier Matz
  Cc: Ananyev, Konstantin, Hu, Jiayu, Yuanhan Liu, Richardson, Bruce,
	Stephen Hemminger, Yigit, Ferruh, dev, Liang, Cunming,
	Thomas Monjalon


> On Mar 24, 2017, at 10:07 AM, Wiles, Keith <keith.wiles@intel.com> wrote:
> 
>> 
>> On Mar 24, 2017, at 9:59 AM, Olivier Matz <olivier.matz@6wind.com> wrote:
>> 
>> On Fri, 24 Mar 2017 14:37:04 +0000, "Wiles, Keith" <keith.wiles@intel.com> wrote:
>>>> On Mar 24, 2017, at 6:43 AM, Ananyev, Konstantin <konstantin.ananyev@intel.com> wrote:
>>>> 
>>>> 
>>>> 
>> 
>> [...]
>> 
>>>> Yep, that's been my take from the beginning:
>>>> Let's develop librte_gro first and make it successful, then we can think about
>>>> whether (and how) to put it into the ethdev layer.
>>> 
>>> Let's not create a GRO library and instead put the code into librte_net, as size is not a concern yet and it is the best place to put the code. As for ip_frag, someone can move it into librte_net if someone writes the patch.
>> 
>> The size of a library _is_ an argument. Not the binary size in bytes, but
>> its API, because that's what the developer sees. Today, librte_net contains
>> protocol header definitions and some network helpers, and the API surface
>> is already quite big (look at the number of lines of .h files).
>> 
>> I really like having a library name which matches its content.
>> The answer to "what can I find in librte_gro?" is quite obvious.
> 
> If we are going to talk about API surface area, let's talk about ethdev then :-)
> 
> Ok, let's create a new librte_gro, but I am not convinced it is reasonable. Maybe a more generic name is needed if we are going to add GSO to the library too. So a new name for the lib would be better than librte_gro, unless you are going to create another library for GSO.
> 
> I still think the design needs to be integrated in as a real offload, as I stated before, and that is not something I am willing to let drop.

I guess we agree to create the library librte_gro, and the current code needs to be updated to be included as real offload support in DPDK, as I see no real conclusion to this topic.

> 
>> 
>> 
>> Regards
>> Olivier
> 
> Regards,
> Keith

Regards,
Keith

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH 0/2] lib: add TCP IPv4 GRO support
  2017-03-28 13:40                               ` Wiles, Keith
@ 2017-03-28 13:57                                 ` Hu, Jiayu
  2017-03-28 16:06                                   ` Wiles, Keith
  0 siblings, 1 reply; 141+ messages in thread
From: Hu, Jiayu @ 2017-03-28 13:57 UTC (permalink / raw)
  To: Wiles, Keith, Olivier Matz
  Cc: Ananyev, Konstantin, Yuanhan Liu, Richardson, Bruce,
	Stephen Hemminger, Yigit, Ferruh, dev, Liang, Cunming,
	Thomas Monjalon



> -----Original Message-----
> From: Wiles, Keith
> Sent: Tuesday, March 28, 2017 9:40 PM
> To: Olivier Matz <olivier.matz@6wind.com>
> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Hu, Jiayu
> <jiayu.hu@intel.com>; Yuanhan Liu <yuanhan.liu@linux.intel.com>;
> Richardson, Bruce <bruce.richardson@intel.com>; Stephen Hemminger
> <stephen@networkplumber.org>; Yigit, Ferruh <ferruh.yigit@intel.com>;
> dev@dpdk.org; Liang, Cunming <cunming.liang@intel.com>; Thomas
> Monjalon <thomas.monjalon@6wind.com>
> Subject: Re: [dpdk-dev] [PATCH 0/2] lib: add TCP IPv4 GRO support
> 
> 
> > On Mar 24, 2017, at 10:07 AM, Wiles, Keith <keith.wiles@intel.com> wrote:
> >
> >>
> >> On Mar 24, 2017, at 9:59 AM, Olivier Matz <olivier.matz@6wind.com>
> wrote:
> >>
> >> On Fri, 24 Mar 2017 14:37:04 +0000, "Wiles, Keith"
> <keith.wiles@intel.com> wrote:
> >>>> On Mar 24, 2017, at 6:43 AM, Ananyev, Konstantin
> <konstantin.ananyev@intel.com> wrote:
> >>>>
> >>>>
> >>>>
> >>
> >> [...]
> >>
> >>>> Yep, that's been my take from the beginning:
> >>>> Let's develop librte_gro first and make it successful, then we can think
> >>>> about whether (and how) to put it into the ethdev layer.
> >>>
> >>> Let's not create a GRO library and instead put the code into librte_net, as size is not
> a concern yet and it is the best place to put the code. As for ip_frag, someone
> can move it into librte_net if someone writes the patch.
> >>
> >> The size of a library _is_ an argument. Not the binary size in bytes, but
> >> its API, because that's what the developer sees. Today, librte_net
> contains
> >> protocol header definitions and some network helpers, and the API
> surface
> >> is already quite big (look at the number of lines of .h files).
> >>
> >> I really like having a library name which matches its content.
> >> The answer to "what can I find in librte_gro?" is quite obvious.
> >
> > If we are going to talk about API surface area, let's talk about ethdev then :-)
> >
> > Ok, let's create a new librte_gro, but I am not convinced it is reasonable.
> Maybe a more generic name is needed if we are going to add GSO to the
> library too. So a new name for the lib would be better than librte_gro, unless you are
> going to create another library for GSO.
> >
> > I still think the design needs to be integrated in as a real offload, as I stated
> before, and that is not something I am willing to let drop.
> 
> I guess we agree to create the library librte_gro, and the current code needs
> to be updated to be included as real offload support in DPDK, as I see no real
> conclusion to this topic.

OK, I understand your opinions, and I agree with you. I will provide a real offloading
example to demonstrate the usage of librte_gro in the next patch.

Thanks,
Jiayu
> 
> >
> >>
> >>
> >> Regards
> >> Olivier
> >
> > Regards,
> > Keith
> 
> Regards,
> Keith

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH 0/2] lib: add TCP IPv4 GRO support
  2017-03-28 13:57                                 ` Hu, Jiayu
@ 2017-03-28 16:06                                   ` Wiles, Keith
  0 siblings, 0 replies; 141+ messages in thread
From: Wiles, Keith @ 2017-03-28 16:06 UTC (permalink / raw)
  To: Hu, Jiayu
  Cc: Olivier Matz, Ananyev, Konstantin, Yuanhan Liu, Richardson,
	Bruce, Stephen Hemminger, Yigit, Ferruh, dev, Liang, Cunming,
	Thomas Monjalon


> On Mar 28, 2017, at 8:57 AM, Hu, Jiayu <jiayu.hu@intel.com> wrote:
> 
> 
> 
>> -----Original Message-----
>> From: Wiles, Keith
>> Sent: Tuesday, March 28, 2017 9:40 PM
>> To: Olivier Matz <olivier.matz@6wind.com>
>> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Hu, Jiayu
>> <jiayu.hu@intel.com>; Yuanhan Liu <yuanhan.liu@linux.intel.com>;
>> Richardson, Bruce <bruce.richardson@intel.com>; Stephen Hemminger
>> <stephen@networkplumber.org>; Yigit, Ferruh <ferruh.yigit@intel.com>;
>> dev@dpdk.org; Liang, Cunming <cunming.liang@intel.com>; Thomas
>> Monjalon <thomas.monjalon@6wind.com>
>> Subject: Re: [dpdk-dev] [PATCH 0/2] lib: add TCP IPv4 GRO support
>> 
>> 
>>> On Mar 24, 2017, at 10:07 AM, Wiles, Keith <keith.wiles@intel.com> wrote:
>>> 
>>>> 
>>>> On Mar 24, 2017, at 9:59 AM, Olivier Matz <olivier.matz@6wind.com>
>> wrote:
>>>> 
>>>> On Fri, 24 Mar 2017 14:37:04 +0000, "Wiles, Keith"
>> <keith.wiles@intel.com> wrote:
>>>>>> On Mar 24, 2017, at 6:43 AM, Ananyev, Konstantin
>> <konstantin.ananyev@intel.com> wrote:
>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>>>> [...]
>>>> 
>>>>>> Yep, that's been my take from the beginning:
>>>>>> Let's develop librte_gro first and make it successful, then we can think
>>>>>> about whether (and how) to put it into the ethdev layer.
>>>>> 
>>>>> Let's not create a GRO library and instead put the code into librte_net, as size is not
>> a concern yet and it is the best place to put the code. As for ip_frag, someone
>> can move it into librte_net if someone writes the patch.
>>>> 
>>>> The size of a library _is_ an argument. Not the binary size in bytes, but
>>>> its API, because that's what the developer sees. Today, librte_net
>> contains
>>>> protocol header definitions and some network helpers, and the API
>> surface
>>>> is already quite big (look at the number of lines of .h files).
>>>> 
>>>> I really like having a library name which matches its content.
>>>> The answer to "what can I find in librte_gro?" is quite obvious.
>>> 
>>> If we are going to talk about API surface area, let's talk about ethdev then :-)
>>> 
>>> Ok, let's create a new librte_gro, but I am not convinced it is reasonable.
>> Maybe a more generic name is needed if we are going to add GSO to the
>> library too. So a new name for the lib would be better than librte_gro, unless you are
>> going to create another library for GSO.
>>> 
>>> I still think the design needs to be integrated in as a real offload, as I stated
>> before, and that is not something I am willing to let drop.
>> 
>> I guess we agree to create the library librte_gro, and the current code needs
>> to be updated to be included as real offload support in DPDK, as I see no real
>> conclusion to this topic.
> 
> OK, I understand your opinions, and I agree with you. I will provide a real offloading
> example to demonstrate the usage of librte_gro in the next patch.

Thanks

> 
> Thanks,
> Jiayu
>> 
>>> 
>>>> 
>>>> 
>>>> Regards
>>>> Olivier
>>> 
>>> Regards,
>>> Keith
>> 
>> Regards,
>> Keith

Regards,
Keith

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH 0/2] lib: add TCP IPv4 GRO support
  2017-03-24 11:43                       ` Ananyev, Konstantin
  2017-03-24 14:37                         ` Wiles, Keith
@ 2017-03-29 10:47                         ` Morten Brørup
  2017-03-29 12:12                           ` Wiles, Keith
  1 sibling, 1 reply; 141+ messages in thread
From: Morten Brørup @ 2017-03-29 10:47 UTC (permalink / raw)
  To: Ananyev, Konstantin, Hu, Jiayu, Yuanhan Liu, Wiles, Keith,
	Richardson, Bruce, Stephen Hemminger, Yigit, Ferruh, Liang,
	Cunming, Thomas Monjalon
  Cc: dev

I have two points to this discussion:

GRO/LRO must be disabled by default! Truly transparent network appliances, such as classic Layer 3 routers and Layer 2 switches (and our SmartShare StraightShaper appliance), may not want to merge multiple smaller packets into one larger packet, regardless of any processing performance benefit. Also refer to Wikipedia (https://en.wikipedia.org/wiki/Large_receive_offload), especially reference 9 (https://bugzilla.redhat.com/show_bug.cgi?id=772317). And if any of you developers don't intuitively agree, just ask any old network engineer about all the mess he had to deal with "back in the days" when packet sizes were important... flaky path MTU discovery, MSS clamping for PPPoE and VPN tunnels, packet reassembly causing performance degradation and packet loss on underpowered VPN routers, etc.

We should consider librte_net a library for miscellaneous utilities, i.e. mainly stateless functions. Functions for packet merging (of multiple IP packets or actual IP fragments), which clearly require a lot of memory and statefulness, do not belong here. This is worth noting, not only for the GRO/GSO library, but for similar future discussions. (The size of the compiled library is irrelevant - only its purpose matters.)


Med venlig hilsen / kind regards

Morten Brørup
CTO


SmartShare Systems A/S
Tonsbakken 16-18
DK-2740 Skovlunde
Denmark

Office      +45 70 20 00 93
Direct      +45 89 93 50 22
Mobile     +45 25 40 82 12

mb@smartsharesystems.com
www.smartsharesystems.com

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH 0/2] lib: add TCP IPv4 GRO support
  2017-03-29 10:47                         ` Morten Brørup
@ 2017-03-29 12:12                           ` Wiles, Keith
  0 siblings, 0 replies; 141+ messages in thread
From: Wiles, Keith @ 2017-03-29 12:12 UTC (permalink / raw)
  To: Morten Brørup
  Cc: Ananyev, Konstantin, Hu, Jiayu, Yuanhan Liu, Richardson, Bruce,
	Stephen Hemminger, Yigit, Ferruh, Liang, Cunming,
	Thomas Monjalon, dev


> On Mar 29, 2017, at 5:47 AM, Morten Brørup <mb@smartsharesystems.com> wrote:
> 
> I have two points to this discussion:
> 
> GRO/LRO must be disabled by default! Truly transparent network appliances, such as classic Layer 3 routers and Layer 2 switches (and our SmartShare StraightShaper appliance), may not want to merge multiple smaller packets into one larger packet, regardless of any processing performance benefit. Also refer to Wikipedia (https://en.wikipedia.org/wiki/Large_receive_offload), especially reference 9 (https://bugzilla.redhat.com/show_bug.cgi?id=772317). And if any of you developers don't intuitively agree, just ask any old network engineer about all the mess he had to deal with "back in the days" when packet sizes were important... flaky path MTU discovery, MSS clamping for PPPoE and VPN tunnels, packet reassembly causing performance degradation and packet loss on underpowered VPN routers, etc.

I agree we should not have any offload enabled by default, and I assumed that was not the case here unless I missed something. My point was to make GRO more transparent when enabled, instead of the application having to use a different API for GRO, when we can put this feature inline with the current rx_burst APIs. Using the RX callbacks or some other method is fine, as long as the performance of the current rx_burst APIs is not affected when GRO is not enabled.

> 
> We should consider librte_net a library for miscellaneous utilities, i.e. mainly stateless functions. Functions for packet merging (of multiple IP packets or actual IP fragments), which clearly require a lot of memory and statefulness, do not belong here. This is worth noting, not only for the GRO/GSO library, but for similar future discussions. (The size of the compiled library is irrelevant - only its purpose matters.)

I already decided to accept the new library, but the name is not great, as other features, like LRO, could be added to the new library. If we name it GRO then it can only be GRO.

> 
> 
> Med venlig hilsen / kind regards
> 
> Morten Brørup
> CTO
> 
> 
> SmartShare Systems A/S
> Tonsbakken 16-18
> DK-2740 Skovlunde
> Denmark
> 
> Office      +45 70 20 00 93
> Direct      +45 89 93 50 22
> Mobile     +45 25 40 82 12
> 
> mb@smartsharesystems.com
> www.smartsharesystems.com

Regards,
Keith


^ permalink raw reply	[flat|nested] 141+ messages in thread

* [PATCH v2 0/3] support GRO in DPDK
  2017-03-22  9:32 [PATCH 0/2] lib: add TCP IPv4 GRO support Jiayu Hu
                   ` (2 preceding siblings ...)
       [not found] ` <1B893F1B-4DA8-4F88-9583-8C0BAA570832@intel.com>
@ 2017-04-04 12:31 ` Jiayu Hu
  2017-04-04 12:31   ` [PATCH v2 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
                     ` (3 more replies)
  3 siblings, 4 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-04-04 12:31 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, keith.wiles, yuanhan.liu, stephen, Jiayu Hu

Generic Receive Offload (GRO) is a widely used SW-based offloading
technique to reduce per-packet processing overhead. It gains performance
by reassembling small packets into large ones. Therefore, we propose to
add GRO support in DPDK.

DPDK GRO is designed as a device ability, which is turned off by default.
The unit for enabling/disabling GRO is the port. Once GRO is enabled on a
port, all of its queues will reassemble packets as much as possible.

For applications, the procedure of merging packets is entirely invisible.
To use GRO, they just need to decide which ports do and don't need GRO, and
invoke the GRO enabling/disabling functions for those ports. For a port with
GRO enabled, one generic reassembly function is registered as an RX
callback for all of its queues. That is, the reassembly procedure is
performed inside rte_eth_rx_burst.
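
As a sketch, the application-side flow is then just (port_id and socket_id
below are placeholders):

	/* after configuring all ethernet devices and their RX queues */
	rte_gro_init();
	/* register the reassembly function as an RX callback on every
	 * queue of the port */
	rte_gro_enable(port_id, socket_id);
	/* ... rte_eth_rx_burst() on this port now returns merged packets ... */
	/* remove the callbacks and free the reassembly tables */
	rte_gro_disable(port_id);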

This patchset is to support GRO in DPDK. The first patch provides a
GRO API framework, which enables applications to use the GRO ability and
enables developers to add GRO support for specific protocols. The second
patch supports TCP/IPv4 GRO. The last patch demonstrates how to use the GRO
ability in app/testpmd.

We perform two iperf tests (with DPDK GRO and without DPDK GRO) to see
the performance gains from DPDK GRO. Specifically, the experiment
environment is:
a. Two 10Gbps physical ports (p0 and p1) on one host are linked together;
b. p0 is in network namespace ns1, whose IP is 1.1.2.3. The iperf client
runs on p0, which sends TCP/IPv4 packets;
c. testpmd runs on p1. Besides, testpmd has a vdev which connects to a
VM via vhost-user and virtio-kernel. The VM runs iperf server, whose IP
is 1.1.2.4;
d. p0 turns on TSO; VM turns off kernel GRO; testpmd runs in iofwd mode.
iperf client and server use the following commands:
	- client: ip netns exec ns1 iperf -c 1.1.2.4 -i2 -t 60 -f g -m
	- server: iperf -s -f g
Two test cases are:
a. w/o DPDK GRO: run testpmd without GRO
b. w DPDK GRO: testpmd enables GRO for p1
Result:
With GRO, the throughput improvement is around 50%.

Change log
==========
v2:
- provide generic reassembly function;
- implement GRO as a device ability:
add APIs for devices to support GRO;
add APIs for applications to enable/disable GRO;
- update testpmd example. 

Jiayu Hu (3):
  lib: add Generic Receive Offload API framework
  lib/gro: add TCP/IPv4 GRO support
  app/testpmd: enable GRO feature

 app/test-pmd/cmdline.c          |  45 ++++++
 app/test-pmd/config.c           |  26 ++++
 app/test-pmd/iofwd.c            |   1 +
 app/test-pmd/testpmd.c          |   5 +
 app/test-pmd/testpmd.h          |   3 +
 config/common_base              |   5 +
 lib/Makefile                    |   1 +
 lib/librte_gro/Makefile         |  51 +++++++
 lib/librte_gro/rte_gro.c        | 293 ++++++++++++++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro.h        |  29 ++++
 lib/librte_gro/rte_gro_common.h |  77 +++++++++++
 lib/librte_gro/rte_gro_tcp.c    | 270 ++++++++++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro_tcp.h    |  95 +++++++++++++
 mk/rte.app.mk                   |   1 +
 14 files changed, 902 insertions(+)
 create mode 100644 lib/librte_gro/Makefile
 create mode 100644 lib/librte_gro/rte_gro.c
 create mode 100644 lib/librte_gro/rte_gro.h
 create mode 100644 lib/librte_gro/rte_gro_common.h
 create mode 100644 lib/librte_gro/rte_gro_tcp.c
 create mode 100644 lib/librte_gro/rte_gro_tcp.h

-- 
2.7.4

^ permalink raw reply	[flat|nested] 141+ messages in thread

* [PATCH v2 1/3] lib: add Generic Receive Offload API framework
  2017-04-04 12:31 ` [PATCH v2 0/3] support GRO in DPDK Jiayu Hu
@ 2017-04-04 12:31   ` Jiayu Hu
  2017-04-04 12:31   ` [PATCH v2 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-04-04 12:31 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, keith.wiles, yuanhan.liu, stephen, Jiayu Hu

In DPDK, GRO is a device ability. The unit for enabling/disabling GRO is
the port. To support GRO, this patch implements a GRO API framework, which
includes two parts. One is the external functions provided to applications
to use the GRO ability; the other is a generic reassembly function provided
to devices.

For applications, DPDK GRO provides three external functions to
enable/disable GRO:
- rte_gro_init: initialize GRO environment;
- rte_gro_enable: enable GRO for all queues of a given port;
- rte_gro_disable: disable GRO for all queues of a given port.
Before using GRO, applications should explicitly call rte_gro_init to
initialize the GRO environment. After that, applications can call
rte_gro_enable to enable GRO and call rte_gro_disable to disable GRO for
specific ports.

DPDK GRO has a generic reassembly function, rte_gro_reassemble_burst,
which processes all input packets in burst mode. If GRO is enabled on a
port, rte_gro_reassemble_burst is registered as an RX callback for
all queues of this port; when the port disables GRO, all the
callbacks of its queues are removed. Therefore, the GRO procedure is
performed in the ethdev layer.

In DPDK GRO, we name GRO types according to packet types, like TCP/IPv4
GRO. Each GRO type has a reassembly function, which is in charge of
processing packets of its own type. Each reassembly function uses a hashing
table to merge packets. The hashing table structures differ among GRO
types; that is, each GRO type defines its own hashing table structure.
rte_gro_reassemble_burst calls these specific reassembly functions
according to packet types, and packets with unsupported protocol types
are not processed.

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
---
 config/common_base              |   5 +
 lib/Makefile                    |   1 +
 lib/librte_gro/Makefile         |  50 ++++++++++
 lib/librte_gro/rte_gro.c        | 216 ++++++++++++++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro.h        |  29 ++++++
 lib/librte_gro/rte_gro_common.h |  75 ++++++++++++++
 mk/rte.app.mk                   |   1 +
 7 files changed, 377 insertions(+)
 create mode 100644 lib/librte_gro/Makefile
 create mode 100644 lib/librte_gro/rte_gro.c
 create mode 100644 lib/librte_gro/rte_gro.h
 create mode 100644 lib/librte_gro/rte_gro_common.h

diff --git a/config/common_base b/config/common_base
index 41191c8..720dbc4 100644
--- a/config/common_base
+++ b/config/common_base
@@ -612,6 +612,11 @@ CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
 CONFIG_RTE_LIBRTE_PMD_VHOST=n
 
 #
+# Compile GRO library
+#
+CONFIG_RTE_LIBRTE_GRO=y
+
+#
 #Compile Xen domain0 support
 #
 CONFIG_RTE_LIBRTE_XEN_DOM0=n
diff --git a/lib/Makefile b/lib/Makefile
index 531b162..74637c7 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -98,6 +98,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
 DEPDIRS-librte_reorder := librte_eal librte_mempool librte_mbuf
 DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
 DEPDIRS-librte_pdump := librte_eal librte_mempool librte_mbuf librte_ether
+DIRS-$(CONFIG_RTE_LIBRTE_GRO) += librte_gro
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
new file mode 100644
index 0000000..fb3a36c
--- /dev/null
+++ b/lib/librte_gro/Makefile
@@ -0,0 +1,50 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_gro.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+
+EXPORT_MAP := rte_gro_version.map
+
+LIBABIVER := 1
+
+#source files
+SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
+
+# install this header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
new file mode 100644
index 0000000..9b1df53
--- /dev/null
+++ b/lib/librte_gro/rte_gro.c
@@ -0,0 +1,216 @@
+#include <rte_ethdev.h>
+#include <rte_mbuf.h>
+#include <rte_hash.h>
+#include <stdint.h>
+#include <rte_malloc.h>
+
+#include "rte_gro.h"
+#include "rte_gro_common.h"
+
+gro_reassemble_fn reassemble_functions[GRO_TYPE_MAX_NB] = {NULL};
+gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB] = {NULL};
+
+struct rte_gro_status *gro_status;
+
+/**
+ * Internal function. It creates one hashing table for each
+ * DPDK-supported GRO type, and all of them are stored in an object
+ * of struct rte_gro_tbl.
+ *
+ * @param name
+ *  Name for GRO lookup table
+ * @param nb_entries
+ *  Element number of each hashing table
+ * @param socket_id
+ *  socket id
+ * @param gro_tbl
+ *  gro_tbl points to a rte_gro_tbl object, which will be initialized
+ *  inside rte_gro_tbl_setup.
+ * @return
+ *  On success, return a positive value; on failure, return
+ *  a negative value.
+ */
+static int
+rte_gro_tbl_setup(char *name, uint32_t nb_entries,
+		uint16_t socket_id, struct rte_gro_tbl *gro_tbl)
+{
+	gro_tbl_create_fn create_tbl_fn;
+	const uint32_t len = strlen(name) + 10;
+	char tbl_name[len];
+
+	for (int i = 0; i < GRO_SUPPORT_TYPE_NB; i++) {
+		sprintf(tbl_name, "%s_%u", name, i);
+		create_tbl_fn = tbl_create_functions[i];
+		if (create_tbl_fn && (create_tbl_fn(tbl_name,
+						nb_entries,
+						socket_id,
+						&(gro_tbl->
+							lkp_tbls[i].hash_tbl))
+					< 0)) {
+			return -1;
+		}
+		gro_tbl->lkp_tbls[i].gro_type = i;
+	}
+	return 1;
+}
+
+/**
+ * Internal function. It frees all the hashing tables stored in
+ * the given struct rte_gro_tbl object.
+ */
+static void
+rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl)
+{
+	if (gro_tbl == NULL)
+		return;
+	for (int i = 0; i < GRO_SUPPORT_TYPE_NB; i++) {
+		rte_hash_free(gro_tbl->lkp_tbls[i].hash_tbl);
+		gro_tbl->lkp_tbls[i].hash_tbl = NULL;
+		gro_tbl->lkp_tbls[i].gro_type = GRO_EMPTY_TYPE;
+	}
+}
+
+/**
+ * Internal function. It performs all supported GRO types on the input
+ * packets. For example, if current DPDK GRO supports TCP/IPv4 and
+ * TCP/IPv6 GRO, this function just reassembles TCP/IPv4 and TCP/IPv6
+ * packets. Packets of unsupported GRO types won't be processed. For
+ * ethernet devices that want to support GRO, this function is
+ * registered as an RX callback for all queues.
+ *
+ * @param pkts
+ *  Packets to reassemble.
+ * @param nb_pkts
+ *  The number of packets to reassemble.
+ * @param gro_tbl
+ *  pointer to an object of struct rte_gro_tbl, which has been
+ *  initialized by rte_gro_tbl_setup.
+ * @return
+ *  The packet number after GRO. If reassembly succeeds, the value is
+ *  less than nb_pkts; if not, the value is equal to nb_pkts. If the
+ *  parameters are invalid, return 0.
+ */
+static uint16_t
+rte_gro_reassemble_burst(uint8_t port __rte_unused,
+		uint16_t queue __rte_unused,
+		struct rte_mbuf **pkts,
+		uint16_t nb_pkts,
+		uint16_t max_pkts __rte_unused,
+		void *gro_tbl)
+{
+	if ((gro_tbl == NULL) || (pkts == NULL)) {
+		printf("invalid parameters for GRO.\n");
+		return 0;
+	}
+	uint16_t nb_after_gro = nb_pkts;
+
+	return nb_after_gro;
+}
+
+void
+rte_gro_init(void)
+{
+	uint8_t nb_port;
+	uint16_t nb_queue;
+	struct rte_eth_dev_info dev_info;
+
+	/* if init already, return immediately */
+	if (gro_status) {
+		printf("repeatedly init GRO environment\n");
+		return;
+	}
+
+	gro_status = (struct rte_gro_status *)rte_zmalloc(
+			NULL,
+			sizeof(struct rte_gro_status),
+			0);
+
+	nb_port = rte_eth_dev_count();
+	gro_status->ports = (struct gro_port_status *)rte_zmalloc(
+			NULL,
+			nb_port * sizeof(struct gro_port_status),
+			0);
+	gro_status->nb_port = nb_port;
+
+	for (uint8_t i = 0; i < nb_port; i++) {
+		rte_eth_dev_info_get(i, &dev_info);
+		nb_queue = dev_info.nb_rx_queues;
+		gro_status->ports[i].gro_tbls =
+			(struct rte_gro_tbl **)rte_zmalloc(
+					NULL,
+					nb_queue * sizeof(struct rte_gro_tbl *),
+					0);
+		gro_status->ports[i].gro_cbs =
+			(struct rte_eth_rxtx_callback **)
+			rte_zmalloc(
+					NULL,
+					nb_queue *
+					sizeof(struct rte_eth_rxtx_callback *),
+					0);
+	}
+}
+
+void
+rte_gro_enable(uint8_t port_id, uint16_t socket_id)
+{
+	if (gro_status->ports[port_id].gro_enable) {
+		printf("port %u has enabled GRO\n", port_id);
+		return;
+	}
+	uint16_t nb_queue;
+	struct rte_eth_dev_info dev_info;
+	char tbl_name[20];
+
+	rte_eth_dev_info_get(port_id, &dev_info);
+	nb_queue = dev_info.nb_rx_queues;
+
+	for (uint16_t i = 0; i < nb_queue; i++) {
+		struct rte_gro_tbl *gro_tbl;
+
+		/* allocate hashing tables for this port */
+		sprintf(tbl_name, "GRO_TBL_%u", port_id);
+		gro_tbl = (struct rte_gro_tbl *)rte_malloc
+			(NULL, sizeof(struct rte_gro_tbl), 0);
+		rte_gro_tbl_setup(tbl_name,
+				GRO_DEFAULT_LOOKUP_TABLE_ENTRY_NB,
+				socket_id,
+				gro_tbl);
+		gro_status->ports[port_id].gro_tbls[i] = gro_tbl;
+		/**
+		 * register GRO reassembly function as a rx callback for each
+		 * queue of this port.
+		 */
+		gro_status->ports[port_id].gro_cbs[i] =
+			rte_eth_add_rx_callback
+			(port_id, i,
+			 rte_gro_reassemble_burst,
+			 gro_tbl);
+	}
+	gro_status->ports[port_id].gro_enable = 1;
+}
+
+void
+rte_gro_disable(uint8_t port_id)
+{
+	if (gro_status->ports[port_id].gro_enable == 0) {
+		printf("port %u has disabled GRO\n", port_id);
+		return;
+	}
+	uint16_t nb_queue;
+	struct rte_eth_dev_info dev_info;
+
+	rte_eth_dev_info_get(port_id, &dev_info);
+	nb_queue = dev_info.nb_rx_queues;
+
+	for (uint16_t i = 0; i < nb_queue; i++) {
+		/* free all hashing tables */
+		rte_gro_tbl_destroy(gro_status->ports[port_id].gro_tbls[i]);
+		gro_status->ports[port_id].gro_tbls[i] = NULL;
+
+		/* remove GRO rx callback */
+		rte_eth_remove_rx_callback(port_id, i,
+				gro_status->ports[port_id].gro_cbs[i]);
+		gro_status->ports[port_id].gro_cbs[i] = NULL;
+	}
+	gro_status->ports[port_id].gro_enable = 0;
+}
diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
new file mode 100644
index 0000000..c84378e
--- /dev/null
+++ b/lib/librte_gro/rte_gro.h
@@ -0,0 +1,29 @@
+#ifndef _RTE_GRO_H_
+#define _RTE_GRO_H_
+
+/**
+ * Initialize GRO environment for all ports. It should be called after
+ * configuring all ethernet devices, and should be called just once.
+ */
+void
+rte_gro_init(void);
+
+/**
+ * Enable GRO for a given port.
+ * @param port_id
+ *  The id of the port that is to enable GRO.
+ * @param socket_id
+ *  The NUMA socket id to which the ethernet device is connected.
+ *  By default, its value is SOCKET_ID_ANY.
+ */
+void
+rte_gro_enable(uint8_t port_id, uint16_t socket_id);
+
+/**
+ * Disable GRO for a given port.
+ * @param port_id
+ *  The id of the port that disables GRO.
+ */
+void
+rte_gro_disable(uint8_t port_id);
+#endif
diff --git a/lib/librte_gro/rte_gro_common.h b/lib/librte_gro/rte_gro_common.h
new file mode 100644
index 0000000..611d833
--- /dev/null
+++ b/lib/librte_gro/rte_gro_common.h
@@ -0,0 +1,75 @@
+#ifndef _GRO_COMMON_H_
+#define _GRO_COMMON_H_
+
+/**
+ * the maximum number of supported GRO types
+ */
+#define GRO_TYPE_MAX_NB 256
+/**
+ * flag indicates empty GRO type
+ */
+#define GRO_EMPTY_TYPE 255
+/**
+ * current supported GRO types number
+ */
+#define GRO_SUPPORT_TYPE_NB 0
+
+/**
+ * default element number of the hashing table
+ */
+#define GRO_DEFAULT_LOOKUP_TABLE_ENTRY_NB 64
+
+/**
+ * Structure to store addresses of all hashing tables.
+ */
+struct rte_gro_lkp_tbl {
+	struct rte_hash *hash_tbl;
+	uint8_t gro_type;
+};
+struct rte_gro_tbl {
+	struct rte_gro_lkp_tbl lkp_tbls[GRO_SUPPORT_TYPE_NB];
+};
+
+/**
+ * Item-list structure.
+ */
+struct gro_item_list {
+	void *items;	/**< item array */
+	uint16_t nb_item;	/**< item number */
+};
+
+/**
+ * Each packet has an object of gro_info, which records the GRO
+ * information related to this packet.
+ */
+struct gro_info {
+	struct gro_item_list item_list;	/**< pre-allocated item-list */
+	/**< number of packets that have been merged with it */
+	uint16_t nb_merged_packets;
+	uint8_t gro_type;	/**< GRO type performed on the packet */
+};
+
+/**
+ * Record GRO information for each port.
+ */
+struct gro_port_status {
+	struct rte_gro_tbl **gro_tbls;
+	struct rte_eth_rxtx_callback **gro_cbs;
+	uint8_t gro_enable;	/* flag indicates if the port enables GRO */
+};
+
+struct rte_gro_status {
+	struct gro_port_status *ports;
+	uint8_t nb_port;
+};
+
+typedef int (*gro_tbl_create_fn)(
+		char *name,
+		uint32_t nb_entries,
+		uint16_t socket_id,
+		struct rte_hash **hash_tbl);
+
+typedef int32_t (*gro_reassemble_fn)(
+		struct rte_hash *hash_tbl,
+		struct gro_item_list *item_list);
+#endif
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 336e448..d143def 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -98,6 +98,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_RING)           += -lrte_ring
 _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
 _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER)        += -lrte_reorder
+_LDLIBS-$(CONFIG_RTE_LIBRTE_GRO)        	+= -lrte_gro
 
 ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),n)
 # plugins (link only if static libraries)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 141+ messages in thread

* [PATCH v2 2/3] lib/gro: add TCP/IPv4 GRO support
  2017-04-04 12:31 ` [PATCH v2 0/3] support GRO in DPDK Jiayu Hu
  2017-04-04 12:31   ` [PATCH v2 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
@ 2017-04-04 12:31   ` Jiayu Hu
  2017-04-04 12:31   ` [PATCH v2 3/3] app/testpmd: enable GRO feature Jiayu Hu
  2017-04-24  8:09   ` [PATCH v3 0/3] support GRO in DPDK Jiayu Hu
  3 siblings, 0 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-04-04 12:31 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, keith.wiles, yuanhan.liu, stephen, Jiayu Hu

Introduce three new functions to support TCP/IPv4 GRO.
- rte_gro_tcp4_tbl_create: create a TCP/IPv4 hashing table;
- rte_gro_tcp4_reassemble: try to reassemble an incoming TCP/IPv4 packet
    with existing TCP/IPv4 packets;
- rte_gro_tcp4_cksum_update: update TCP and IPv4 checksums.

rte_gro_tcp4_reassemble uses a TCP/IPv4 hashing table to implement packet
reassembly. The TCP/IPv4 hashing table is a cuckoo hashing table, whose
keys are the rules for merging TCP/IPv4 packets, and whose values point to
item-lists. An item-list contains items, which point to packets with
the same key.

rte_gro_tcp4_reassemble processes an incoming packet in four steps:
a. check if the packet should be processed. TCP/IPv4 GRO doesn't process
packets of the following types:
	- packets without data;
	- packets with wrong checksums;
	- fragmented packets.
b. look up the hashing table to find an item-list, which stores packets that
may be able to merge with the incoming one;
c. if an item-list is found, check all of its packets. If one is found that
is the neighbor of the incoming packet, chain them together and update the
packet length and mbuf metadata; if none is found, allocate a new item for
the incoming packet and insert it into the item-list;
d. if no item-list is found, allocate a new item-list for the incoming
packet and insert it into the hash table.
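
As a rough illustration only (the actual gro_tcp4_pre_rules layout is not
shown in this message), such a merge-rule key could carry the fields that
must match before two TCP/IPv4 packets may be merged:

	#include <stdint.h>

	/* Hypothetical sketch, not necessarily the struct used by this patch. */
	struct tcp4_merge_key_sketch {
		uint32_t ip_src;	/* IPv4 source address */
		uint32_t ip_dst;	/* IPv4 destination address */
		uint16_t port_src;	/* TCP source port */
		uint16_t port_dst;	/* TCP destination port */
		uint32_t recv_ack;	/* TCP ACK number; assumed to match for merging */
	};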

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
---
 lib/librte_gro/Makefile         |   1 +
 lib/librte_gro/rte_gro.c        |  81 +++++++++++-
 lib/librte_gro/rte_gro_common.h |   4 +-
 lib/librte_gro/rte_gro_tcp.c    | 270 ++++++++++++++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro_tcp.h    |  95 ++++++++++++++
 5 files changed, 448 insertions(+), 3 deletions(-)
 create mode 100644 lib/librte_gro/rte_gro_tcp.c
 create mode 100644 lib/librte_gro/rte_gro_tcp.h

diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
index fb3a36c..c45f4f2 100644
--- a/lib/librte_gro/Makefile
+++ b/lib/librte_gro/Makefile
@@ -43,6 +43,7 @@ LIBABIVER := 1
 
 #source files
 SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
+SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro_tcp.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
index 9b1df53..e1ac062 100644
--- a/lib/librte_gro/rte_gro.c
+++ b/lib/librte_gro/rte_gro.c
@@ -6,9 +6,12 @@
 
 #include "rte_gro.h"
 #include "rte_gro_common.h"
+#include "rte_gro_tcp.h"
 
-gro_reassemble_fn reassemble_functions[GRO_TYPE_MAX_NB] = {NULL};
-gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB] = {NULL};
+gro_reassemble_fn reassemble_functions[GRO_TYPE_MAX_NB] = {
+	rte_gro_tcp4_reassemble, NULL};
+gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB] = {
+	rte_gro_tcp4_tbl_create, NULL};
 
 struct rte_gro_status *gro_status;
 
@@ -102,7 +105,81 @@ rte_gro_reassemble_burst(uint8_t port __rte_unused,
 		printf("invalid parameters for GRO.\n");
 		return 0;
 	}
+	struct ether_hdr *eth_hdr;
+	struct ipv4_hdr *ipv4_hdr;
+	uint16_t l3proc_type;
+
+	/* record packet GRO info */
+	struct gro_info gro_infos[nb_pkts];
+	struct rte_gro_lkp_tbl *lkp_tbls = ((struct rte_gro_tbl *)
+			gro_tbl)->lkp_tbls;
+	int32_t ret;
 	uint16_t nb_after_gro = nb_pkts;
+	uint8_t dirty_tbls[GRO_SUPPORT_TYPE_NB] = {0};
+
+	/* pre-allocate tcp items for TCP GRO */
+	struct gro_tcp_item tcp_items[nb_pkts * nb_pkts];
+
+	for (uint16_t i = 0; i < nb_pkts; i++) {
+		gro_infos[i].nb_merged_packets = 1;	/* initial value */
+		eth_hdr = rte_pktmbuf_mtod(pkts[i], struct ether_hdr *);
+		l3proc_type = rte_be_to_cpu_16(eth_hdr->ether_type);
+		if (l3proc_type == ETHER_TYPE_IPv4) {
+			ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
+			if (ipv4_hdr->next_proto_id == IPPROTO_TCP) {
+				gro_infos[i].gro_type = GRO_TCP_IPV4;
+				/* allocate an item-list for the packet */
+				gro_infos[i].item_list.items =
+					&tcp_items[i * nb_pkts];
+				gro_infos[i].item_list.nb_item = 1;
+				/**
+				 * fill the packet information into the first
+				 * item of the item-list
+				 */
+				tcp_items[i * nb_pkts].segment = pkts[i];
+				tcp_items[i * nb_pkts].segment_idx = i;
+
+				ret = rte_gro_tcp4_reassemble(
+						lkp_tbls[GRO_TCP_IPV4].hash_tbl,
+						&gro_infos[i].item_list);
+				if (ret > 0) {
+					gro_infos[i].nb_merged_packets = 0;
+					gro_infos[--ret].nb_merged_packets++;
+					nb_after_gro--;
+				}
+				dirty_tbls[GRO_TCP_IPV4] = ret >= 0 ? 1 : 0;
+			}
+		}
+	}
+	/**
+	 * if any packets have been merged, update their headers,
+	 * and remove useless packet addresses from the input
+	 * packet array.
+	 */
+	if (nb_after_gro < nb_pkts) {
+		struct rte_mbuf *tmp[nb_pkts];
+
+		memset(tmp, 0,
+				sizeof(struct rte_mbuf *) * nb_pkts);
+		for (uint16_t i = 0, j = 0; i < nb_pkts; i++) {
+			if (gro_infos[i].nb_merged_packets > 1) {
+				switch (gro_infos[i].gro_type) {
+				case GRO_TCP_IPV4:
+					gro_tcp4_cksum_update(pkts[i]);
+					break;
+				}
+			}
+			if (gro_infos[i].nb_merged_packets != 0)
+				tmp[j++] = pkts[i];
+		}
+		rte_memcpy(pkts, tmp,
+				nb_pkts * sizeof(struct rte_mbuf *));
+	}
+
+	/* if GRO is performed, reset the hash table */
+	for (uint16_t i = 0; i < GRO_SUPPORT_TYPE_NB; i++)
+		if (dirty_tbls[i])
+			rte_hash_reset(lkp_tbls[i].hash_tbl);
 
 	return nb_after_gro;
 }
diff --git a/lib/librte_gro/rte_gro_common.h b/lib/librte_gro/rte_gro_common.h
index 611d833..7b5d9ec 100644
--- a/lib/librte_gro/rte_gro_common.h
+++ b/lib/librte_gro/rte_gro_common.h
@@ -12,7 +12,9 @@
 /**
  * current supported GRO types number
  */
-#define GRO_SUPPORT_TYPE_NB 0
+#define GRO_SUPPORT_TYPE_NB 1
+
+#define GRO_TCP_IPV4 0	/**< TCP/IPv4 GRO */
 
 /**
  * default element number of the hashing table
diff --git a/lib/librte_gro/rte_gro_tcp.c b/lib/librte_gro/rte_gro_tcp.c
new file mode 100644
index 0000000..f17d9f5
--- /dev/null
+++ b/lib/librte_gro/rte_gro_tcp.c
@@ -0,0 +1,270 @@
+#include "rte_gro_tcp.h"
+
+int
+rte_gro_tcp4_tbl_create(char *name,
+		uint32_t nb_entries, uint16_t socket_id,
+		struct rte_hash **hash_tbl)
+{
+	struct rte_hash_parameters ht_param = {
+		.entries = nb_entries,
+		.name = name,
+		.key_len = sizeof(struct gro_tcp4_pre_rules),
+		.hash_func = rte_jhash,
+		.hash_func_init_val = 0,
+		.socket_id = socket_id,
+	};
+
+	*hash_tbl = rte_hash_create(&ht_param);
+	if (likely(*hash_tbl != NULL))
+		return 0;
+	return -1;
+}
+
+/* update TCP IPv4 checksum */
+void
+rte_gro_tcp4_cksum_update(struct rte_mbuf *pkt)
+{
+	uint32_t len, offset, cksum;
+	struct ether_hdr *eth_hdr;
+	struct ipv4_hdr *ipv4_hdr;
+	struct tcp_hdr *tcp_hdr;
+	uint16_t ipv4_ihl, cksum_pld;
+
+	if (pkt == NULL)
+		return;
+
+	len = pkt->pkt_len;
+	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
+	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
+	ipv4_ihl = IPv4_HDR_LEN(ipv4_hdr);
+	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl);
+
+	offset = sizeof(struct ether_hdr) + ipv4_ihl;
+	len -= offset;
+
+	/* TCP cksum without IP pseudo header */
+	ipv4_hdr->hdr_checksum = 0;
+	tcp_hdr->cksum = 0;
+	if (rte_raw_cksum_mbuf(pkt, offset, len, &cksum_pld) < 0) {
+		printf("invalid param for raw_cksum_mbuf\n");
+		return;
+	}
+	/* IP pseudo header cksum */
+	cksum = cksum_pld;
+	cksum += rte_ipv4_phdr_cksum(ipv4_hdr, 0);
+
+	/* combine TCP checksum and IP pseudo header checksum */
+	cksum = ((cksum & 0xffff0000) >> 16) + (cksum & 0xffff);
+	cksum = (~cksum) & 0xffff;
+	cksum = (cksum == 0) ? 0xffff : cksum;
+	tcp_hdr->cksum = cksum;
+
+	/* update IP header cksum */
+	ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);
+}
+
+/**
+ * This function traverses the item-list to find one item that can be
+ * merged with the incoming packet. If the merge succeeds, the two
+ * packets are chained together; if not, the incoming packet is
+ * inserted into the item-list.
+ */
+static int32_t
+gro_tcp4_reassemble(struct rte_mbuf *pkt,
+		uint16_t pkt_idx,
+		uint32_t pkt_sent_seq,
+		struct gro_item_list *list)
+{
+	struct gro_tcp_item *items;
+	struct ipv4_hdr *ipv4_hdr1;
+	struct tcp_hdr *tcp_hdr1;
+	uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1;
+
+	items = (struct gro_tcp_item *)list->items;
+	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, struct
+				ether_hdr *) + 1);
+	ipv4_ihl1 = IPv4_HDR_LEN(ipv4_hdr1);
+	tcp_hdr1 = (struct tcp_hdr *)((char *)ipv4_hdr1 + ipv4_ihl1);
+	tcp_hl1 = TCP_HDR_LEN(tcp_hdr1);
+	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) - ipv4_ihl1
+		- tcp_hl1;
+
+	for (uint16_t i = 0; i < list->nb_item; i++) {
+		/* check if the two packets are neighbor */
+		if ((pkt_sent_seq ^ items[i].next_sent_seq) == 0) {
+			struct ipv4_hdr *ipv4_hdr2;
+			struct tcp_hdr *tcp_hdr2;
+			uint16_t ipv4_ihl2, tcp_hl2;
+			struct rte_mbuf *tail;
+
+			ipv4_hdr2 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(
+						items[i].segment,
+						struct ether_hdr *)
+					+ 1);
+
+			/* check if the option fields equal */
+			if (tcp_hl1 > sizeof(struct tcp_hdr)) {
+				ipv4_ihl2 = IPv4_HDR_LEN(ipv4_hdr2);
+				tcp_hdr2 = (struct tcp_hdr *)
+					((char *)ipv4_hdr2 + ipv4_ihl2);
+				tcp_hl2 = TCP_HDR_LEN(tcp_hdr2);
+				if ((tcp_hl1 != tcp_hl2) ||
+						(memcmp(tcp_hdr1 + 1,
+								tcp_hdr2 + 1,
+								tcp_hl2 - sizeof
+								(struct tcp_hdr))
+						 != 0))
+					continue;
+			}
+			/* check if the packet length will be beyond 64K */
+			if (items[i].segment->pkt_len + tcp_dl1 > UINT16_MAX)
+				goto merge_fail;
+
+			/* remove the header of the incoming packet */
+			rte_pktmbuf_adj(pkt, sizeof(struct ether_hdr) +
+					ipv4_ihl1 + tcp_hl1);
+			/* chain the two packet together */
+			tail = rte_pktmbuf_lastseg(items[i].segment);
+			tail->next = pkt;
+
+			/* update IP header for the merged packet */
+			ipv4_hdr2->total_length = rte_cpu_to_be_16(
+					rte_be_to_cpu_16(
+						ipv4_hdr2->total_length)
+					+ tcp_dl1);
+
+			/* update the next expected sequence number */
+			items[i].next_sent_seq += tcp_dl1;
+
+			/* update mbuf metadata for the merged packet */
+			items[i].segment->nb_segs++;
+			items[i].segment->pkt_len += pkt->pkt_len;
+
+			return items[i].segment_idx + 1;
+		}
+	}
+
+merge_fail:
+	/* fail to merge. Insert the incoming packet into the item-list */
+	items[list->nb_item].next_sent_seq = pkt_sent_seq + tcp_dl1;
+	items[list->nb_item].segment = pkt;
+	items[list->nb_item].segment_idx = pkt_idx;
+	list->nb_item++;
+
+	return 0;
+}
+
+/**
+ * Traverse the item-list to find a packet to merge with the incoming
+ * one.
+ * @param hash_tbl
+ *  TCP IPv4 lookup table
+ * @param item_list
+ *  Pre-allocated item-list, in which the first item stores the packet
+ *  to process.
+ * @return
+ *  If the incoming packet merges with one packet successfully, return
+ *  the index + 1 of the merged packet; if the incoming packet is not
+ *  processed by GRO, return -1; if it is processed but fails to merge
+ *  with any existing packet, return 0.
+ */
+int32_t
+rte_gro_tcp4_reassemble(struct rte_hash *hash_tbl,
+		struct gro_item_list *item_list)
+{
+	struct ether_hdr *eth_hdr;
+	struct ipv4_hdr *ipv4_hdr;
+	struct tcp_hdr *tcp_hdr;
+	uint16_t ipv4_ihl, tcp_hl, tcp_dl, tcp_cksum, ip_cksum;
+	struct gro_tcp4_pre_rules key = {0};
+	struct gro_item_list *list;
+	uint64_t ol_flags;
+	uint32_t sent_seq;
+	int32_t ret = -1;
+
+	/* get the packet to process */
+	struct gro_tcp_item *items = item_list->items;
+	struct rte_mbuf *pkt = items[0].segment;
+	uint32_t pkt_idx = items[0].segment_idx;
+
+	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
+	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
+	ipv4_ihl = IPv4_HDR_LEN(ipv4_hdr);
+
+	/* 1. check if the packet should be processed */
+	if (ipv4_ihl < sizeof(struct ipv4_hdr))
+		goto end;
+	if (ipv4_hdr->next_proto_id != IPPROTO_TCP)
+		goto end;
+	if ((ipv4_hdr->fragment_offset &
+				rte_cpu_to_be_16(IPV4_HDR_DF_MASK))
+			== 0)
+		goto end;
+
+	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl);
+	tcp_hl = TCP_HDR_LEN(tcp_hdr);
+	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - ipv4_ihl
+		- tcp_hl;
+	if (tcp_dl == 0)
+		goto end;
+
+	ol_flags = pkt->ol_flags;
+	/**
+	 * 2. if HW rx checksum offload isn't enabled, recalculate the
+	 * checksum in SW. Then, check if the checksum is correct
+	 */
+	if ((ol_flags & PKT_RX_IP_CKSUM_MASK) !=
+			PKT_RX_IP_CKSUM_UNKNOWN) {
+		if ((ol_flags & PKT_RX_IP_CKSUM_MASK) == PKT_RX_IP_CKSUM_BAD)
+			goto end;
+	} else {
+		ip_cksum = ipv4_hdr->hdr_checksum;
+		ipv4_hdr->hdr_checksum = 0;
+		ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);
+		if (ipv4_hdr->hdr_checksum ^ ip_cksum)
+			goto end;
+	}
+
+	if ((ol_flags & PKT_RX_L4_CKSUM_MASK) !=
+			PKT_RX_L4_CKSUM_UNKNOWN) {
+		if ((ol_flags & PKT_RX_L4_CKSUM_MASK) == PKT_RX_L4_CKSUM_BAD)
+			goto end;
+	} else {
+		tcp_cksum = tcp_hdr->cksum;
+		tcp_hdr->cksum = 0;
+		tcp_hdr->cksum = rte_ipv4_udptcp_cksum
+			(ipv4_hdr, tcp_hdr);
+		if (tcp_hdr->cksum ^ tcp_cksum)
+			goto end;
+	}
+
+	/* 3. search for the corresponding item-list for the packet */
+	key.eth_saddr = eth_hdr->s_addr;
+	key.eth_daddr = eth_hdr->d_addr;
+	key.ip_src_addr = rte_be_to_cpu_32(ipv4_hdr->src_addr);
+	key.ip_dst_addr = rte_be_to_cpu_32(ipv4_hdr->dst_addr);
+	key.src_port = rte_be_to_cpu_16(tcp_hdr->src_port);
+	key.dst_port = rte_be_to_cpu_16(tcp_hdr->dst_port);
+	key.recv_ack = rte_be_to_cpu_32(tcp_hdr->recv_ack);
+	key.tcp_flags = tcp_hdr->tcp_flags;
+
+	sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq);
+
+	if (rte_hash_lookup_data(hash_tbl, &key, (void **)&list) >= 0) {
+		ret = gro_tcp4_reassemble(pkt, pkt_idx, sent_seq, list);
+	} else {
+		/**
+		 * failed to find an item-list. Record the expected sequence
+		 * number of the incoming packet's neighbor in its item_list,
+		 * and insert it into the hash table.
+		 */
+		items[0].next_sent_seq = sent_seq + tcp_dl;
+		if (unlikely(rte_hash_add_key_data(hash_tbl, &key, item_list)
+					!= 0))
+			printf("GRO TCP hash insert fail.\n");
+		else
+			ret = 0;
+	}
+end:
+	return ret;
+}
diff --git a/lib/librte_gro/rte_gro_tcp.h b/lib/librte_gro/rte_gro_tcp.h
new file mode 100644
index 0000000..52be9cd
--- /dev/null
+++ b/lib/librte_gro/rte_gro_tcp.h
@@ -0,0 +1,95 @@
+#ifndef _RTE_GRO_TCP_H_
+#define _RTE_GRO_TCP_H_
+
+#include <rte_ethdev.h>
+#include <rte_ip.h>
+#include <rte_tcp.h>
+#include <rte_hash.h>
+#include <rte_jhash.h>
+
+#include "rte_gro_common.h"
+
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+#define TCP_HDR_LEN(tcph) \
+	((tcph->data_off >> 4) * 4)
+#define IPv4_HDR_LEN(iph) \
+	((iph->version_ihl & 0x0f) * 4)
+#else
+#define TCP_DATAOFF_MASK 0x0f
+#define TCP_HDR_LEN(tcph) \
+	((tcph->data_off & TCP_DATAOFF_MASK) * 4)
+#define IPv4_HDR_LEN(iph) \
+	((iph->version_ihl >> 4) * 4)
+#endif
+
+#define IPV4_HDR_DF_SHIFT 14
+#define IPV4_HDR_DF_MASK (1 << IPV4_HDR_DF_SHIFT)
+
+
+/**
+ * key structure of the TCP IPv4 hash table. It describes the
+ * prerequisite rules for merging packets.
+ */
+struct gro_tcp4_pre_rules {
+	struct ether_addr eth_saddr;
+	struct ether_addr eth_daddr;
+	uint32_t ip_src_addr;
+	uint32_t ip_dst_addr;
+
+	uint32_t recv_ack;	/**< acknowledgment sequence number. */
+	uint16_t src_port;
+	uint16_t dst_port;
+	uint8_t tcp_flags;	/**< TCP flags. */
+
+	uint8_t padding[3];
+};
+
+/**
+ * TCP item structure
+ */
+struct gro_tcp_item {
+	struct rte_mbuf *segment;	/**< packet address. */
+	uint32_t next_sent_seq;	/**< sequence number of the next packet. */
+	uint16_t segment_idx;	/**< packet index. */
+};
+
+void
+rte_gro_tcp4_cksum_update(struct rte_mbuf *pkt);
+
+/**
+ * Create a new TCP IPv4 GRO lookup table.
+ *
+ * @param name
+ *	Lookup table name
+ * @param nb_entries
+ *  Number of lookup table entries, which should be larger than or
+ *  equal to RTE_GRO_HASH_ENTRIES_MIN, less than or equal to
+ *  RTE_GRO_HASH_ENTRIES_MAX, and a power of two.
+ * @param socket_id
+ *  socket id
+ * @param hash_tbl
+ *  Output parameter that holds the address of the created lookup table
+ * @return
+ *  0 on success; -1 on failure
+ */
+int
+rte_gro_tcp4_tbl_create(char *name, uint32_t nb_entries,
+		uint16_t socket_id, struct rte_hash **hash_tbl);
+/**
+ * This function reassembles one incoming TCP IPv4 packet with the
+ * packets recorded in the lookup table. Non-TCP IPv4 packets are not
+ * processed.
+ *
+ * @param hash_tbl
+ *	Lookup table used to reassemble packets. It stores key-value pairs.
+ *	The key describes the prerequisite rules to merge two TCP IPv4
+ *	packets; the value is a pointer to an item-list, which contains
+ *	packets that match the same rules. Note that applications need to
+ *	guarantee the hash_tbl is clean when first calling this function.
+ * @param item_list
+ *	Pre-allocated item-list, whose first item stores the packet to
+ *	process.
+ * @return
+ *	If the incoming packet merges with one packet successfully, return
+ *	the index + 1 of the merged packet; if the incoming packet is not
+ *	processed by GRO, return -1; if it is processed but fails to
+ *	merge, return 0.
+ */
+int32_t
+rte_gro_tcp4_reassemble(struct rte_hash *hash_tbl,
+		struct gro_item_list *item_list);
+
+#endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 141+ messages in thread

* [PATCH v2 3/3] app/testpmd: enable GRO feature
  2017-04-04 12:31 ` [PATCH v2 0/3] support GRO in DPDK Jiayu Hu
  2017-04-04 12:31   ` [PATCH v2 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
  2017-04-04 12:31   ` [PATCH v2 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
@ 2017-04-04 12:31   ` Jiayu Hu
  2017-04-24  8:09   ` [PATCH v3 0/3] support GRO in DPDK Jiayu Hu
  3 siblings, 0 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-04-04 12:31 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, keith.wiles, yuanhan.liu, stephen, Jiayu Hu

This patch demonstrates the usage of DPDK GRO in testpmd. By default,
GRO is turned off. Command, "gro on (port_id)", turns on GRO for the
given port; command, "gro off (port_id)", turns off GRO for the given
port. Note that current GRO only supports TCP IPv4 packets.

Once the feature is turned on, all received packets go through the GRO
procedure before being returned from rte_eth_rx_burst.
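
For example, to enable GRO on port 0 (an arbitrary port id here;
forwarding must be stopped while toggling GRO):

	testpmd> gro on 0
	testpmd> start
	...
	testpmd> stop
	testpmd> gro off 0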

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
---
 app/test-pmd/cmdline.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 app/test-pmd/config.c  | 26 ++++++++++++++++++++++++++
 app/test-pmd/iofwd.c   |  1 +
 app/test-pmd/testpmd.c |  5 +++++
 app/test-pmd/testpmd.h |  3 +++
 5 files changed, 80 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 47f935d..e4a9075 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -76,6 +76,7 @@
 #include <rte_devargs.h>
 #include <rte_eth_ctrl.h>
 #include <rte_flow.h>
+#include <rte_gro.h>
 
 #include <cmdline_rdline.h>
 #include <cmdline_parse.h>
@@ -396,6 +397,9 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"tso show (portid)"
 			"    Display the status of TCP Segmentation Offload.\n\n"
 
+			"gro (on|off) (port_id)"
+			"    Enable or disable Generic Receive Offload.\n\n"
+
 			"set fwd (%s)\n"
 			"    Set packet forwarding mode.\n\n"
 
@@ -3784,6 +3788,46 @@ cmdline_parse_inst_t cmd_tunnel_tso_show = {
 	},
 };
 
+/* *** SET GRO FOR A PORT *** */
+struct cmd_gro_result {
+	cmdline_fixed_string_t cmd_keyword;
+	cmdline_fixed_string_t mode;
+	uint8_t port_id;
+};
+
+static void
+cmd_set_gro_parsed(void *parsed_result,
+		__attribute__((unused)) struct cmdline *cl,
+		__attribute__((unused)) void *data)
+{
+	struct cmd_gro_result *res;
+
+	res = parsed_result;
+	setup_gro(res->mode, res->port_id);
+}
+
+cmdline_parse_token_string_t cmd_gro_keyword =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_result,
+			cmd_keyword, "gro");
+cmdline_parse_token_string_t cmd_gro_mode =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_result,
+			mode, NULL);
+cmdline_parse_token_num_t cmd_gro_pid =
+	TOKEN_NUM_INITIALIZER(struct cmd_gro_result,
+			port_id, UINT8);
+
+cmdline_parse_inst_t cmd_set_gro = {
+	.f = cmd_set_gro_parsed,
+	.data = NULL,
+	.help_str = "gro (on|off) (port_id)",
+	.tokens = {
+		(void *)&cmd_gro_keyword,
+		(void *)&cmd_gro_mode,
+		(void *)&cmd_gro_pid,
+		NULL,
+	},
+};
+
 /* *** ENABLE/DISABLE FLUSH ON RX STREAMS *** */
 struct cmd_set_flush_rx {
 	cmdline_fixed_string_t set;
@@ -12464,6 +12508,7 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_tso_show,
 	(cmdline_parse_inst_t *)&cmd_tunnel_tso_set,
 	(cmdline_parse_inst_t *)&cmd_tunnel_tso_show,
+	(cmdline_parse_inst_t *)&cmd_set_gro,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_rx,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_tx,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 80491fc..1d210d9 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -97,6 +97,7 @@
 #ifdef RTE_LIBRTE_IXGBE_PMD
 #include <rte_pmd_ixgbe.h>
 #endif
+#include <rte_gro.h>
 
 #include "testpmd.h"
 
@@ -2415,6 +2416,31 @@ set_tx_pkt_segments(unsigned *seg_lengths, unsigned nb_segs)
 	tx_pkt_nb_segs = (uint8_t) nb_segs;
 }
 
+void
+setup_gro(const char *mode, uint8_t port_id)
+{
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		printf("invalid port id %u\n", port_id);
+		return;
+	}
+	if (strcmp(mode, "on") == 0) {
+		if (test_done == 0) {
+			printf("before enable GRO,"
+					" please stop forwarding first\n");
+			return;
+		}
+		rte_gro_enable(port_id, rte_eth_dev_socket_id(port_id));
+	} else if (strcmp(mode, "off") == 0) {
+		if (test_done == 0) {
+			printf("before disable GRO,"
+					" please stop forwarding first\n");
+			return;
+		}
+		rte_gro_disable(port_id);
+	} else
+		printf("unsupported GRO mode\n");
+}
+
 char*
 list_pkt_forwarding_modes(void)
 {
diff --git a/app/test-pmd/iofwd.c b/app/test-pmd/iofwd.c
index 15cb4a2..d9be390 100644
--- a/app/test-pmd/iofwd.c
+++ b/app/test-pmd/iofwd.c
@@ -65,6 +65,7 @@
 #include <rte_ethdev.h>
 #include <rte_string_fns.h>
 #include <rte_flow.h>
+#include <rte_gro.h>
 
 #include "testpmd.h"
 
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index e04e215..d019d0f 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -79,6 +79,7 @@
 #include <rte_pdump.h>
 #endif
 #include <rte_flow.h>
+#include <rte_gro.h>
 
 #include "testpmd.h"
 
@@ -1455,6 +1456,10 @@ start_port(portid_t pid)
 	else if (need_check_link_status == 0)
 		printf("Please stop the ports first\n");
 
+	/* initialize GRO environment */
+	if (pid == (portid_t)RTE_PORT_ALL)
+		rte_gro_init();
+
 	printf("Done\n");
 	return 0;
 }
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 8cf2860..c3f3fc2 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -109,6 +109,8 @@ struct fwd_stream {
 	queueid_t  tx_queue;  /**< TX queue to send forwarded packets */
 	streamid_t peer_addr; /**< index of peer ethernet address of packets */
 
+	uint16_t tbl_idx;	/**< TCP IPv4 GRO lookup table index */
+
 	unsigned int retry_enabled;
 
 	/* "read-write" results */
@@ -616,6 +618,7 @@ void get_2tuple_filter(uint8_t port_id, uint16_t index);
 void get_5tuple_filter(uint8_t port_id, uint16_t index);
 int rx_queue_id_is_invalid(queueid_t rxq_id);
 int tx_queue_id_is_invalid(queueid_t txq_id);
+void setup_gro(const char *mode, uint8_t port_id);
 
 /* Functions to manage the set of filtered Multicast MAC addresses */
 void mcast_addr_add(uint8_t port_id, struct ether_addr *mc_addr);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 141+ messages in thread

* [PATCH v3 0/3] support GRO in DPDK
  2017-04-04 12:31 ` [PATCH v2 0/3] support GRO in DPDK Jiayu Hu
                     ` (2 preceding siblings ...)
  2017-04-04 12:31   ` [PATCH v2 3/3] app/testpmd: enable GRO feature Jiayu Hu
@ 2017-04-24  8:09   ` Jiayu Hu
  2017-04-24  8:09     ` [PATCH v3 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
                       ` (3 more replies)
  3 siblings, 4 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-04-24  8:09 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, keith.wiles, yuanhan.liu, Jiayu Hu

Generic Receive Offload (GRO) is a widely used SW-based offloading
technique to reduce per-packet processing overhead. It gains performance
by reassembling small packets into large ones. Therefore, we propose to
add GRO support in DPDK.

DPDK GRO is designed as a device ability, which is turned off by default.
GRO is enabled or disabled per port. Once GRO is enabled on a port,
all of its queues reassemble received packets whenever possible.

For applications, the procedure of merging packets is entirely invisible.
To use GRO, they just need to decide which ports need GRO and invoke
the GRO enabling/disabling functions for those ports. If GRO is
enabled on a port, one generic reassembly function is registered as
an RX callback for all of its queues. That is, the reassembly
procedure is performed inside rte_eth_rx_burst.
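
From the application's point of view, the flow looks like this (an
illustrative sketch only; port_id, socket_id, queue_id, pkts and
MAX_PKT_BURST are application-defined, and error handling is omitted):

	rte_gro_init();                      /* once, after device setup */
	rte_gro_enable(port_id, socket_id);  /* registers the RX callback */

	/* rx_burst now returns reassembled packets; nb_rx may be smaller
	 * than the number of packets the port actually received */
	nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts, MAX_PKT_BURST);

	rte_gro_disable(port_id);            /* removes the RX callback */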

This patchset is to support GRO in DPDK. The first patch is to provide a
GRO API framework, which enables applications to use GRO ability and
enable developers to add GRO supports for specific protocols. The second
patch supports TCP/IPv4 GRO. The last patch demonstrates how to use GRO
ability in app/testpmd.

We perform two iperf tests (with DPDK GRO and without DPDK GRO) to see
the performance gains from DPDK GRO. Specifically, the experiment
environment is:
a. Two 10Gbps physical ports (p0 and p1) on one host are linked together;
b. p0 is in networking namespace ns1, whose IP is 1.1.2.3. Iperf client
runs on p0, which sends TCP/IPv4 packets. Compile DPDK with optimization
level -O3;
c. testpmd runs on p1. Besides, testpmd has a vdev which connects to a
VM via vhost-user and virtio-kernel. The VM runs iperf server, whose IP
is 1.1.2.4;
d. p0 turns on TSO; VM turns off kernel GRO; testpmd runs in iofwd mode.
iperf client and server use the following commands:
	- client: ip netns exec ns1 iperf -c 1.1.2.4 -i2 -t 60 -f g -m
	- server: iperf -s -f g
Two test cases are:
a. w/o DPDK GRO: run testpmd without GRO
b. w DPDK GRO: testpmd enables GRO for p1
Result:
With GRO, the throughput improvement is around 50%.

Change log
==========
v3:
- fix compilation issues.
v2:
- provide generic reassembly function;
- implement GRO as a device ability:
add APIs for devices to support GRO;
add APIs for applications to enable/disable GRO;
- update testpmd example. 

Jiayu Hu (3):
  lib: add Generic Receive Offload API framework
  lib/gro: add TCP/IPv4 GRO support
  app/testpmd: enable GRO feature

 app/test-pmd/cmdline.c          |  45 ++++++
 app/test-pmd/config.c           |  26 ++++
 app/test-pmd/iofwd.c            |   1 +
 app/test-pmd/testpmd.c          |   5 +
 app/test-pmd/testpmd.h          |   3 +
 config/common_base              |   5 +
 lib/Makefile                    |   1 +
 lib/librte_gro/Makefile         |  51 +++++++
 lib/librte_gro/rte_gro.c        | 297 ++++++++++++++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro.h        |  29 ++++
 lib/librte_gro/rte_gro_common.h |  77 +++++++++++
 lib/librte_gro/rte_gro_tcp.c    | 270 ++++++++++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro_tcp.h    |  95 +++++++++++++
 mk/rte.app.mk                   |   1 +
 14 files changed, 906 insertions(+)
 create mode 100644 lib/librte_gro/Makefile
 create mode 100644 lib/librte_gro/rte_gro.c
 create mode 100644 lib/librte_gro/rte_gro.h
 create mode 100644 lib/librte_gro/rte_gro_common.h
 create mode 100644 lib/librte_gro/rte_gro_tcp.c
 create mode 100644 lib/librte_gro/rte_gro_tcp.h

-- 
2.7.4

^ permalink raw reply	[flat|nested] 141+ messages in thread

* [PATCH v3 1/3] lib: add Generic Receive Offload API framework
  2017-04-24  8:09   ` [PATCH v3 0/3] support GRO in DPDK Jiayu Hu
@ 2017-04-24  8:09     ` Jiayu Hu
  2017-05-22  9:19       ` Ananyev, Konstantin
  2017-04-24  8:09     ` [PATCH v3 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 141+ messages in thread
From: Jiayu Hu @ 2017-04-24  8:09 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, keith.wiles, yuanhan.liu, Jiayu Hu

Generic Receive Offload (GRO) is a widely used SW-based offloading
technique to reduce per-packet processing overhead. It gains performance
by reassembling small packets into large ones. This patchset is to
support GRO in DPDK. To support GRO, this patch implements a GRO API
framework.

DPDK GRO is designed as a device ability, in which the reassembly
procedure is transparent to applications. GRO is enabled or disabled
per port. To use the DPDK GRO ability, applications just need to
explicitly invoke the GRO enabling/disabling functions for specific
ports. When GRO is enabled or disabled for a port, all of its queues
will or won't perform GRO on received packets.

DPDK GRO is implemented as a new library, which includes two parts. One
is external functions provided to applications to use GRO ability; the
other is reassembly functions to reassemble packets for various
protocols.

For applications, DPDK GRO provides three external functions to
enable/disable GRO:
- rte_gro_init: initialize GRO environment;
- rte_gro_enable: enable GRO for a given port;
- rte_gro_disable: disable GRO for a given port.
Before using GRO, applications should explicitly call rte_gro_init to
initialize the GRO environment. After that, applications can call
rte_gro_enable to enable GRO and call rte_gro_disable to disable GRO for
specific ports.
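
A minimal ordering sketch (illustrative only; nb_rxq, nb_txq and
port_conf are application-defined). Since rte_gro_init queries the
configured RX queue count of every port, it must run after device
configuration:

	rte_eth_dev_configure(port_id, nb_rxq, nb_txq, &port_conf);
	/* ... rte_eth_rx_queue_setup() for each RX queue ... */
	rte_gro_init();
	rte_gro_enable(port_id, rte_eth_dev_socket_id(port_id));
	/* ... receive and forward traffic ... */
	rte_gro_disable(port_id);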

DPDK GRO has a generic reassembly function, which processes all input
packets in burst mode. When a port enables GRO, this generic
reassembly function is registered as an RX callback for all queues of
this port; when the port disables GRO, all the callbacks of its
queues are removed. Therefore, the GRO procedure is performed in the
ethdev layer.

In DPDK GRO, each specific protocol type has a corresponding reassembly
function, which tries to merge packets of its type. For example, TCP/IPv4
reassembly function is in charge of processing TCP/IPv4 packets. The
generic reassembly function calls these specific reassembly functions
according to packet types, and packets with unsupported protocol types
are not processed.
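
The dispatch idea can be sketched with the names defined in
rte_gro_common.h (illustrative only; deriving gro_type from the packet
headers is not shown):

	/* e.g. gro_type == GRO_TCP_IPV4 for a TCP/IPv4 packet */
	gro_reassemble_fn fn = reassemble_functions[gro_type];

	if (fn != NULL)
		fn(lkp_tbls[gro_type].hash_tbl, &item_list);
	/* packets of unsupported types are left untouched */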

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
---
 config/common_base              |   5 +
 lib/Makefile                    |   1 +
 lib/librte_gro/Makefile         |  50 +++++++++
 lib/librte_gro/rte_gro.c        | 219 ++++++++++++++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro.h        |  29 ++++++
 lib/librte_gro/rte_gro_common.h |  75 ++++++++++++++
 mk/rte.app.mk                   |   1 +
 7 files changed, 380 insertions(+)
 create mode 100644 lib/librte_gro/Makefile
 create mode 100644 lib/librte_gro/rte_gro.c
 create mode 100644 lib/librte_gro/rte_gro.h
 create mode 100644 lib/librte_gro/rte_gro_common.h

diff --git a/config/common_base b/config/common_base
index 0b4297c..92e97ef 100644
--- a/config/common_base
+++ b/config/common_base
@@ -709,6 +709,11 @@ CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
 CONFIG_RTE_LIBRTE_PMD_VHOST=n
 
 #
+# Compile GRO library
+#
+CONFIG_RTE_LIBRTE_GRO=y
+
+#
 #Compile Xen domain0 support
 #
 CONFIG_RTE_LIBRTE_XEN_DOM0=n
diff --git a/lib/Makefile b/lib/Makefile
index 07e1fd0..e253053 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -106,6 +106,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
 DEPDIRS-librte_reorder := librte_eal librte_mempool librte_mbuf
 DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
 DEPDIRS-librte_pdump := librte_eal librte_mempool librte_mbuf librte_ether
+DIRS-$(CONFIG_RTE_LIBRTE_GRO) += librte_gro
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
new file mode 100644
index 0000000..fb3a36c
--- /dev/null
+++ b/lib/librte_gro/Makefile
@@ -0,0 +1,50 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_gro.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+
+EXPORT_MAP := rte_gro_version.map
+
+LIBABIVER := 1
+
+#source files
+SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
+
+# install this header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
new file mode 100644
index 0000000..996b382
--- /dev/null
+++ b/lib/librte_gro/rte_gro.c
@@ -0,0 +1,219 @@
+#include <rte_ethdev.h>
+#include <rte_mbuf.h>
+#include <rte_hash.h>
+#include <stdint.h>
+#include <rte_malloc.h>
+
+#include "rte_gro.h"
+#include "rte_gro_common.h"
+
+gro_reassemble_fn reassemble_functions[GRO_TYPE_MAX_NB] = {NULL};
+gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB] = {NULL};
+
+struct rte_gro_status *gro_status;
+
+/**
+ * Internal function. It creates one hashing table for each
+ * DPDK-supported GRO type, and all of them are stored in an object
+ * of struct rte_gro_tbl.
+ *
+ * @param name
+ *  Name for GRO lookup table
+ * @param nb_entries
+ *  Element number of each hashing table
+ * @param socket_id
+ *  socket id
+ * @param gro_tbl
+ *  gro_tbl points to a rte_gro_tbl object, which will be initialized
+ *  inside rte_gro_tbl_setup.
+ * @return
+ *  If all tables are created successfully, return a positive value;
+ *  if not, return a negative value.
+ */
+static int
+rte_gro_tbl_setup(char *name, uint32_t nb_entries,
+		uint16_t socket_id, struct rte_gro_tbl *gro_tbl)
+{
+	gro_tbl_create_fn create_tbl_fn;
+	const uint32_t len = strlen(name) + 10;
+	char tbl_name[len];
+	int i;
+
+	for (i = 0; i < GRO_SUPPORT_TYPE_NB; i++) {
+		sprintf(tbl_name, "%s_%u", name, i);
+		create_tbl_fn = tbl_create_functions[i];
+		if (create_tbl_fn && (create_tbl_fn(tbl_name,
+						nb_entries,
+						socket_id,
+						&(gro_tbl->
+							lkp_tbls[i].hash_tbl))
+					< 0)) {
+			return -1;
+		}
+		gro_tbl->lkp_tbls[i].gro_type = i;
+	}
+	return 1;
+}
+
+/**
+ * Internal function. It frees all the hashing tables stored in
+ * the given struct rte_gro_tbl object.
+ */
+static void
+rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl)
+{
+	int i;
+
+	if (gro_tbl == NULL)
+		return;
+	for (i = 0; i < GRO_SUPPORT_TYPE_NB; i++) {
+		rte_hash_free(gro_tbl->lkp_tbls[i].hash_tbl);
+		gro_tbl->lkp_tbls[i].hash_tbl = NULL;
+		gro_tbl->lkp_tbls[i].gro_type = GRO_EMPTY_TYPE;
+	}
+}
+
+/**
+ * Internal function. It performs all supported GRO types on input
+ * packets. For example, if current DPDK GRO supports TCP/IPv4 and
+ * TCP/IPv6 GRO, this function reassembles TCP/IPv4 and TCP/IPv6
+ * packets. Packets of unsupported GRO types won't be processed. For
+ * ethernet devices that want to support GRO, this function is
+ * registered as an RX callback for all of their queues.
+ *
+ * @param pkts
+ *  Packets to reassemble.
+ * @param nb_pkts
+ *  The number of packets to reassemble.
+ * @param gro_tbl
+ *  pointer points to an object of struct rte_gro_tbl, which has been
+ *  initialized by rte_gro_tbl_setup.
+ * @return
+ *  Packet number after GRO. If reassemble successfully, the value is
+ *  less than nb_pkts; if not, the value is equal to nb_pkts. If the
+ *  parameters are invalid, return 0.
+ */
+static uint16_t
+rte_gro_reassemble_burst(uint8_t port __rte_unused,
+		uint16_t queue __rte_unused,
+		struct rte_mbuf **pkts,
+		uint16_t nb_pkts,
+		uint16_t max_pkts __rte_unused,
+		void *gro_tbl)
+{
+	if ((gro_tbl == NULL) || (pkts == NULL)) {
+		printf("invalid parameters for GRO.\n");
+		return 0;
+	}
+	uint16_t nb_after_gro = nb_pkts;
+
+	return nb_after_gro;
+}
+
+void
+rte_gro_init(void)
+{
+	uint8_t nb_port, i;
+	uint16_t nb_queue;
+	struct rte_eth_dev_info dev_info;
+
+	/* if init already, return immediately */
+	if (gro_status) {
+		printf("repeatly init GRO environment\n");
+		return;
+	}
+
+	gro_status = (struct rte_gro_status *)rte_zmalloc(
+			NULL,
+			sizeof(struct rte_gro_status),
+			0);
+
+	nb_port = rte_eth_dev_count();
+	gro_status->ports = (struct gro_port_status *)rte_zmalloc(
+			NULL,
+			nb_port * sizeof(struct gro_port_status),
+			0);
+	gro_status->nb_port = nb_port;
+
+	for (i = 0; i < nb_port; i++) {
+		rte_eth_dev_info_get(i, &dev_info);
+		nb_queue = dev_info.nb_rx_queues;
+		gro_status->ports[i].gro_tbls =
+			(struct rte_gro_tbl **)rte_zmalloc(
+					NULL,
+					nb_queue * sizeof(struct rte_gro_tbl *),
+					0);
+		gro_status->ports[i].gro_cbs =
+			(struct rte_eth_rxtx_callback **)
+			rte_zmalloc(
+					NULL,
+					nb_queue *
+					sizeof(struct rte_eth_rxtx_callback *),
+					0);
+	}
+}
+
+void
+rte_gro_enable(uint8_t port_id, uint16_t socket_id)
+{
+	if (gro_status->ports[port_id].gro_enable) {
+		printf("port %u has enabled GRO\n", port_id);
+		return;
+	}
+	uint16_t nb_queue, i;
+	struct rte_eth_dev_info dev_info;
+	char tbl_name[20];
+
+	rte_eth_dev_info_get(port_id, &dev_info);
+	nb_queue = dev_info.nb_rx_queues;
+
+	for (i = 0; i < nb_queue; i++) {
+		struct rte_gro_tbl *gro_tbl;
+
+		/* allocate hashing tables for this port */
+		sprintf(tbl_name, "GRO_TBL_%u_%u", port_id, i);
+		gro_tbl = (struct rte_gro_tbl *)rte_malloc
+			(NULL, sizeof(struct rte_gro_tbl), 0);
+		rte_gro_tbl_setup(tbl_name,
+				GRO_DEFAULT_LOOKUP_TABLE_ENTRY_NB,
+				socket_id,
+				gro_tbl);
+		gro_status->ports[port_id].gro_tbls[i] = gro_tbl;
+		/**
+		 * register GRO reassembly function as a rx callback for each
+		 * queue of this port.
+		 */
+		gro_status->ports[port_id].gro_cbs[i] =
+			rte_eth_add_rx_callback
+			(port_id, i,
+			 rte_gro_reassemble_burst,
+			 gro_tbl);
+	}
+	gro_status->ports[port_id].gro_enable = 1;
+}
+
+void
+rte_gro_disable(uint8_t port_id)
+{
+	if (gro_status->ports[port_id].gro_enable == 0) {
+		printf("port %u has disabled GRO\n", port_id);
+		return;
+	}
+	uint16_t nb_queue, i;
+	struct rte_eth_dev_info dev_info;
+
+	rte_eth_dev_info_get(port_id, &dev_info);
+	nb_queue = dev_info.nb_rx_queues;
+
+	for (i = 0; i < nb_queue; i++) {
+		/* free all hashing tables */
+		rte_gro_tbl_destroy(gro_status->ports[port_id].gro_tbls[i]);
+		gro_status->ports[port_id].gro_tbls[i] = NULL;
+
+		/* remove GRO rx callback */
+		rte_eth_remove_rx_callback(port_id, i,
+				gro_status->ports[port_id].gro_cbs[i]);
+		gro_status->ports[port_id].gro_cbs[i] = NULL;
+	}
+	gro_status->ports[port_id].gro_enable = 0;
+}
diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
new file mode 100644
index 0000000..c84378e
--- /dev/null
+++ b/lib/librte_gro/rte_gro.h
@@ -0,0 +1,29 @@
+#ifndef _RTE_GRO_H_
+#define _RTE_GRO_H_
+
+/**
+ * Initialize GRO environment for all ports. It should be called after
+ * configuring all ethernet devices, and should be called just once.
+ */
+void
+rte_gro_init(void);
+
+/**
+ * Enable GRO for a given port.
+ * @param port_id
+ *  The ID of the port on which GRO is to be enabled.
+ * @param socket_id
+ *  The NUMA socket id to which the ethernet device is connected.
+ *  By default, its value is SOCKET_ID_ANY.
+ */
+void
+rte_gro_enable(uint8_t port_id, uint16_t socket_id);
+
+/**
+ * Disable GRO for a given port.
+ * @param port_id
+ *  The ID of the port on which GRO is to be disabled.
+ */
+void
+rte_gro_disable(uint8_t port_id);
+#endif
diff --git a/lib/librte_gro/rte_gro_common.h b/lib/librte_gro/rte_gro_common.h
new file mode 100644
index 0000000..611d833
--- /dev/null
+++ b/lib/librte_gro/rte_gro_common.h
@@ -0,0 +1,75 @@
+#ifndef _GRO_COMMON_H_
+#define _GRO_COMMON_H_
+
+/**
+ * the maximum number of supported GRO types
+ */
+#define GRO_TYPE_MAX_NB 256
+/**
+ * flag indicates empty GRO type
+ */
+#define GRO_EMPTY_TYPE 255
+/**
+ * current supported GRO types number
+ */
+#define GRO_SUPPORT_TYPE_NB 0
+
+/**
+ * default element number of the hashing table
+ */
+#define GRO_DEFAULT_LOOKUP_TABLE_ENTRY_NB 64
+
+/**
+ * Structure to store addresses of all hashing tables.
+ */
+struct rte_gro_lkp_tbl {
+	struct rte_hash *hash_tbl;
+	uint8_t gro_type;
+};
+struct rte_gro_tbl {
+	struct rte_gro_lkp_tbl lkp_tbls[GRO_SUPPORT_TYPE_NB];
+};
+
+/**
+ * Item-list structure.
+ */
+struct gro_item_list {
+	void *items;	/**< item array */
+	uint16_t nb_item;	/**< item number */
+};
+
+/**
+ * Each packet has an object of gro_info, which records the GRO
+ * information related to this packet.
+ */
+struct gro_info {
+	struct gro_item_list item_list;	/**< pre-allocated item-list */
+	/** number of packets that have been merged into this one */
+	uint16_t nb_merged_packets;
+	uint8_t gro_type;	/**< GRO type that the packet is performed */
+};
+
+/**
+ * Record GRO information for each port.
+ */
+struct gro_port_status {
+	struct rte_gro_tbl **gro_tbls;
+	struct rte_eth_rxtx_callback **gro_cbs;
+	uint8_t gro_enable;	/* flag indicates if the port enables GRO */
+};
+
+struct rte_gro_status {
+	struct gro_port_status *ports;
+	uint8_t nb_port;
+};
+
+typedef int (*gro_tbl_create_fn)(
+		char *name,
+		uint32_t nb_entries,
+		uint16_t socket_id,
+		struct rte_hash **hash_tbl);
+
+typedef int32_t (*gro_reassemble_fn)(
+		struct rte_hash *hash_tbl,
+		struct gro_item_list *item_list);
+#endif
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index b5215c0..8956821 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -98,6 +98,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_RING)           += -lrte_ring
 _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
 _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER)        += -lrte_reorder
_LDLIBS-$(CONFIG_RTE_LIBRTE_GRO)            += -lrte_gro
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 _LDLIBS-$(CONFIG_RTE_LIBRTE_KNI)            += -lrte_kni
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 141+ messages in thread

* [PATCH v3 2/3] lib/gro: add TCP/IPv4 GRO support
  2017-04-24  8:09   ` [PATCH v3 0/3] support GRO in DPDK Jiayu Hu
  2017-04-24  8:09     ` [PATCH v3 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
@ 2017-04-24  8:09     ` Jiayu Hu
  2017-04-24  8:09     ` [PATCH v3 3/3] app/testpmd: enable GRO feature Jiayu Hu
  2017-06-07 11:08     ` [PATCH v4 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
  3 siblings, 0 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-04-24  8:09 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, keith.wiles, yuanhan.liu, Jiayu Hu

Introduce three new functions to support TCP/IPv4 GRO.
- rte_gro_tcp4_tbl_create: create a TCP/IPv4 hashing table;
- rte_gro_tcp4_reassemble: try to reassemble an incoming TCP/IPv4 packet
    with existing TCP/IPv4 packets;
- rte_gro_tcp4_cksum_update: update TCP and IPv4 checksums.

rte_gro_tcp4_reassemble uses a TCP/IPv4 hashing table to implement packet
reassembly. The TCP/IPv4 hashing table is a cuckoo hashing table, whose
keys are the rules for merging TCP/IPv4 packets, and whose values point
to item-lists. An item-list contains items, each of which points to a
packet that matches the same key.

rte_gro_tcp4_reassemble processes an incoming packet in four steps
(a sketch of steps b-d follows the list):
a. check if the packet should be processed. TCP/IPv4 GRO doesn't process
packets of the following types:
- packets without data;
- packets with wrong checksums;
- fragmented packets.
b. look up the hashing table to find an item-list, which stores packets
that may be able to merge with the incoming one;
c. if an item-list is found, check all of its packets. If one of them is
the neighbor of the incoming packet, chain the two together and update
the packet length and mbuf metadata; if none is found, allocate a new
item for the incoming packet and insert it into the item-list;
d. if no item-list is found, allocate a new item-list for the incoming
packet and insert it into the hash table.
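
Steps b-d follow the usual merge-or-insert pattern of librte_hash; a
sketch with simplified arguments (the real code first builds the key
from the packet headers and fills the first item of item_list):

	struct gro_item_list *list;

	if (rte_hash_lookup_data(hash_tbl, &key, (void **)&list) >= 0) {
		/* steps b/c: a list exists; merge or append a new item */
		ret = gro_tcp4_reassemble(pkt, pkt_idx, sent_seq, list);
	} else {
		/* step d: first packet with this key; insert its list */
		ret = rte_hash_add_key_data(hash_tbl, &key, item_list);
	}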

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
---
 lib/librte_gro/Makefile         |   1 +
 lib/librte_gro/rte_gro.c        |  82 +++++++++++-
 lib/librte_gro/rte_gro_common.h |   4 +-
 lib/librte_gro/rte_gro_tcp.c    | 270 ++++++++++++++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro_tcp.h    |  95 ++++++++++++++
 5 files changed, 449 insertions(+), 3 deletions(-)
 create mode 100644 lib/librte_gro/rte_gro_tcp.c
 create mode 100644 lib/librte_gro/rte_gro_tcp.h

diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
index fb3a36c..c45f4f2 100644
--- a/lib/librte_gro/Makefile
+++ b/lib/librte_gro/Makefile
@@ -43,6 +43,7 @@ LIBABIVER := 1
 
 #source files
 SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
+SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro_tcp.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
index 996b382..7851ac6 100644
--- a/lib/librte_gro/rte_gro.c
+++ b/lib/librte_gro/rte_gro.c
@@ -6,9 +6,12 @@
 
 #include "rte_gro.h"
 #include "rte_gro_common.h"
+#include "rte_gro_tcp.h"
 
-gro_reassemble_fn reassemble_functions[GRO_TYPE_MAX_NB] = {NULL};
-gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB] = {NULL};
+gro_reassemble_fn reassemble_functions[GRO_TYPE_MAX_NB] = {
+	rte_gro_tcp4_reassemble, NULL};
+gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB] = {
+	rte_gro_tcp4_tbl_create, NULL};
 
 struct rte_gro_status *gro_status;
 
@@ -105,7 +108,82 @@ rte_gro_reassemble_burst(uint8_t port __rte_unused,
 		printf("invalid parameters for GRO.\n");
 		return 0;
 	}
+	struct ether_hdr *eth_hdr;
+	struct ipv4_hdr *ipv4_hdr;
+	uint16_t l3proc_type, i;
+
+	/* record packet GRO info */
+	struct gro_info gro_infos[nb_pkts];
+	struct rte_gro_lkp_tbl *lkp_tbls = ((struct rte_gro_tbl *)
+			gro_tbl)->lkp_tbls;
+	int32_t ret;
 	uint16_t nb_after_gro = nb_pkts;
+	uint8_t dirty_tbls[GRO_SUPPORT_TYPE_NB] = {0};
+
+	/* pre-allocate tcp items for TCP GRO */
+	struct gro_tcp_item tcp_items[nb_pkts * nb_pkts];
+
+	for (i = 0; i < nb_pkts; i++) {
+		gro_infos[i].nb_merged_packets = 1;	/* initial value */
+		eth_hdr = rte_pktmbuf_mtod(pkts[i], struct ether_hdr *);
+		l3proc_type = rte_be_to_cpu_16(eth_hdr->ether_type);
+		if (l3proc_type == ETHER_TYPE_IPv4) {
+			ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
+			if (ipv4_hdr->next_proto_id == IPPROTO_TCP) {
+				gro_infos[i].gro_type = GRO_TCP_IPV4;
+				/* allocate an item-list for the packet */
+				gro_infos[i].item_list.items =
+					&tcp_items[i * nb_pkts];
+				gro_infos[i].item_list.nb_item = 1;
+				/**
+				 * fill the packet information into the first
+				 * item of the item-list
+				 */
+				tcp_items[i * nb_pkts].segment = pkts[i];
+				tcp_items[i * nb_pkts].segment_idx = i;
+
+				ret = rte_gro_tcp4_reassemble(
+						lkp_tbls[GRO_TCP_IPV4].hash_tbl,
+						&gro_infos[i].item_list);
+				if (ret > 0) {
+					gro_infos[i].nb_merged_packets = 0;
+					gro_infos[--ret].nb_merged_packets++;
+					nb_after_gro--;
+				}
+				dirty_tbls[GRO_TCP_IPV4] |= (ret >= 0);
+			}
+		}
+	}
+	/**
+	 * if any packets have been merged, update their headers, and
+	 * remove the addresses of merged-away packets from the input
+	 * packet array.
+	 */
+	if (nb_after_gro < nb_pkts) {
+		struct rte_mbuf *tmp[nb_pkts];
+		uint16_t j;
+
+		memset(tmp, 0,
+				sizeof(struct rte_mbuf *) * nb_pkts);
+		for (i = 0, j = 0; i < nb_pkts; i++) {
+			if (gro_infos[i].nb_merged_packets > 1) {
+				switch (gro_infos[i].gro_type) {
+				case GRO_TCP_IPV4:
+					rte_gro_tcp4_cksum_update(pkts[i]);
+					break;
+				}
+			}
+			if (gro_infos[i].nb_merged_packets != 0)
+				tmp[j++] = pkts[i];
+		}
+		rte_memcpy(pkts, tmp,
+				nb_pkts * sizeof(struct rte_mbuf *));
+	}
+
+	/* if GRO is performed, reset the hash table */
+	for (i = 0; i < GRO_SUPPORT_TYPE_NB; i++)
+		if (dirty_tbls[i])
+			rte_hash_reset(lkp_tbls[i].hash_tbl);
 
 	return nb_after_gro;
 }
diff --git a/lib/librte_gro/rte_gro_common.h b/lib/librte_gro/rte_gro_common.h
index 611d833..7b5d9ec 100644
--- a/lib/librte_gro/rte_gro_common.h
+++ b/lib/librte_gro/rte_gro_common.h
@@ -12,7 +12,9 @@
 /**
  * current supported GRO types number
  */
-#define GRO_SUPPORT_TYPE_NB 0
+#define GRO_SUPPORT_TYPE_NB 1
+
+#define GRO_TCP_IPV4 0	/**< TCP/IPv4 GRO */
 
 /**
  * default element number of the hashing table
diff --git a/lib/librte_gro/rte_gro_tcp.c b/lib/librte_gro/rte_gro_tcp.c
new file mode 100644
index 0000000..f17d9f5
--- /dev/null
+++ b/lib/librte_gro/rte_gro_tcp.c
@@ -0,0 +1,270 @@
+#include "rte_gro_tcp.h"
+
+int
+rte_gro_tcp4_tbl_create(char *name,
+		uint32_t nb_entries, uint16_t socket_id,
+		struct rte_hash **hash_tbl)
+{
+	struct rte_hash_parameters ht_param = {
+		.entries = nb_entries,
+		.name = name,
+		.key_len = sizeof(struct gro_tcp4_pre_rules),
+		.hash_func = rte_jhash,
+		.hash_func_init_val = 0,
+		.socket_id = socket_id,
+	};
+
+	*hash_tbl = rte_hash_create(&ht_param);
+	if (likely(*hash_tbl != NULL))
+		return 0;
+	return -1;
+}
+
+/* update TCP IPv4 checksum */
+void
+rte_gro_tcp4_cksum_update(struct rte_mbuf *pkt)
+{
+	uint32_t len, offset, cksum;
+	struct ether_hdr *eth_hdr;
+	struct ipv4_hdr *ipv4_hdr;
+	struct tcp_hdr *tcp_hdr;
+	uint16_t ipv4_ihl, cksum_pld;
+
+	if (pkt == NULL)
+		return;
+
+	len = pkt->pkt_len;
+	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
+	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
+	ipv4_ihl = IPv4_HDR_LEN(ipv4_hdr);
+	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl);
+
+	offset = sizeof(struct ether_hdr) + ipv4_ihl;
+	len -= offset;
+
+	/* TCP cksum without IP pseudo header */
+	ipv4_hdr->hdr_checksum = 0;
+	tcp_hdr->cksum = 0;
+	if (rte_raw_cksum_mbuf(pkt, offset, len, &cksum_pld) < 0) {
+		printf("invalid param for raw_cksum_mbuf\n");
+		return;
+	}
+	/* IP pseudo header cksum */
+	cksum = cksum_pld;
+	cksum += rte_ipv4_phdr_cksum(ipv4_hdr, 0);
+
+	/* combine TCP checksum and IP pseudo header checksum */
+	cksum = ((cksum & 0xffff0000) >> 16) + (cksum & 0xffff);
+	cksum = (~cksum) & 0xffff;
+	cksum = (cksum == 0) ? 0xffff : cksum;
+	tcp_hdr->cksum = cksum;
+
+	/* update IP header cksum */
+	ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);
+}
+
+/**
+ * This function traverses the item-list to find one item that can be
+ * merged with the incoming packet. If the merge succeeds, the two
+ * packets are chained together; if not, the incoming packet is
+ * inserted into the item-list.
+ */
+static int32_t
+gro_tcp4_reassemble(struct rte_mbuf *pkt,
+		uint16_t pkt_idx,
+		uint32_t pkt_sent_seq,
+		struct gro_item_list *list)
+{
+	struct gro_tcp_item *items;
+	struct ipv4_hdr *ipv4_hdr1;
+	struct tcp_hdr *tcp_hdr1;
+	uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1;
+
+	items = (struct gro_tcp_item *)list->items;
+	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, struct
+				ether_hdr *) + 1);
+	ipv4_ihl1 = IPv4_HDR_LEN(ipv4_hdr1);
+	tcp_hdr1 = (struct tcp_hdr *)((char *)ipv4_hdr1 + ipv4_ihl1);
+	tcp_hl1 = TCP_HDR_LEN(tcp_hdr1);
+	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) - ipv4_ihl1
+		- tcp_hl1;
+
+	for (uint16_t i = 0; i < list->nb_item; i++) {
+		/* check if the two packets are neighbor */
+		if ((pkt_sent_seq ^ items[i].next_sent_seq) == 0) {
+			struct ipv4_hdr *ipv4_hdr2;
+			struct tcp_hdr *tcp_hdr2;
+			uint16_t ipv4_ihl2, tcp_hl2;
+			struct rte_mbuf *tail;
+
+			ipv4_hdr2 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(
+						items[i].segment,
+						struct ether_hdr *)
+					+ 1);
+
+			/* check if the option fields equal */
+			if (tcp_hl1 > sizeof(struct tcp_hdr)) {
+				ipv4_ihl2 = IPv4_HDR_LEN(ipv4_hdr2);
+				tcp_hdr2 = (struct tcp_hdr *)
+					((char *)ipv4_hdr2 + ipv4_ihl2);
+				tcp_hl2 = TCP_HDR_LEN(tcp_hdr2);
+				if ((tcp_hl1 != tcp_hl2) ||
+						(memcmp(tcp_hdr1 + 1,
+								tcp_hdr2 + 1,
+								tcp_hl2 - sizeof
+								(struct tcp_hdr))
+						 != 0))
+					continue;
+			}
+			/* check if the packet length will be beyond 64K */
+			if (items[i].segment->pkt_len + tcp_dl1 > UINT16_MAX)
+				goto merge_fail;
+
+			/* remove the header of the incoming packet */
+			rte_pktmbuf_adj(pkt, sizeof(struct ether_hdr) +
+					ipv4_ihl1 + tcp_hl1);
+			/* chain the two packet together */
+			tail = rte_pktmbuf_lastseg(items[i].segment);
+			tail->next = pkt;
+
+			/* update IP header for the merged packet */
+			ipv4_hdr2->total_length = rte_cpu_to_be_16(
+					rte_be_to_cpu_16(
+						ipv4_hdr2->total_length)
+					+ tcp_dl1);
+
+			/* update the next expected sequence number */
+			items[i].next_sent_seq += tcp_dl1;
+
+			/* update mbuf metadata for the merged packet */
+			items[i].segment->nb_segs++;
+			items[i].segment->pkt_len += pkt->pkt_len;
+
+			return items[i].segment_idx + 1;
+		}
+	}
+
+merge_fail:
+	/* fail to merge. Insert the incoming packet into the item-list */
+	items[list->nb_item].next_sent_seq = pkt_sent_seq + tcp_dl1;
+	items[list->nb_item].segment = pkt;
+	items[list->nb_item].segment_idx = pkt_idx;
+	list->nb_item++;
+
+	return 0;
+}
+
+/**
+ * Traverse the item-list to find a packet to merge with the incoming
+ * one.
+ * @param hash_tbl
+ *  TCP IPv4 lookup table
+ * @param item_list
+ *  Pre-allocated item-list, in which the first item stores the packet
+ *  to process.
+ * @return
+ *  If the incoming packet merges with one packet successfully, return
+ *  the index + 1 of the merged packet; if the incoming packet is not
+ *  processed by GRO, return -1; if it is processed but fails to merge
+ *  with any existing packet, return 0.
+ */
+int32_t
+rte_gro_tcp4_reassemble(struct rte_hash *hash_tbl,
+		struct gro_item_list *item_list)
+{
+	struct ether_hdr *eth_hdr;
+	struct ipv4_hdr *ipv4_hdr;
+	struct tcp_hdr *tcp_hdr;
+	uint16_t ipv4_ihl, tcp_hl, tcp_dl, tcp_cksum, ip_cksum;
+	struct gro_tcp4_pre_rules key = {0};
+	struct gro_item_list *list;
+	uint64_t ol_flags;
+	uint32_t sent_seq;
+	int32_t ret = -1;
+
+	/* get the packet to process */
+	struct gro_tcp_item *items = item_list->items;
+	struct rte_mbuf *pkt = items[0].segment;
+	uint32_t pkt_idx = items[0].segment_idx;
+
+	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
+	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
+	ipv4_ihl = IPv4_HDR_LEN(ipv4_hdr);
+
+	/* 1. check if the packet should be processed */
+	if (ipv4_ihl < sizeof(struct ipv4_hdr))
+		goto end;
+	if (ipv4_hdr->next_proto_id != IPPROTO_TCP)
+		goto end;
+	if ((ipv4_hdr->fragment_offset &
+				rte_cpu_to_be_16(IPV4_HDR_DF_MASK))
+			== 0)
+		goto end;
+
+	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl);
+	tcp_hl = TCP_HDR_LEN(tcp_hdr);
+	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - ipv4_ihl
+		- tcp_hl;
+	if (tcp_dl == 0)
+		goto end;
+
+	ol_flags = pkt->ol_flags;
+	/**
+	 * 2. if HW rx checksum offload isn't enabled, recalculate the
+	 * checksum in SW. Then, check if the checksum is correct
+	 */
+	if ((ol_flags & PKT_RX_IP_CKSUM_MASK) !=
+			PKT_RX_IP_CKSUM_UNKNOWN) {
+		if ((ol_flags & PKT_RX_IP_CKSUM_MASK) == PKT_RX_IP_CKSUM_BAD)
+			goto end;
+	} else {
+		ip_cksum = ipv4_hdr->hdr_checksum;
+		ipv4_hdr->hdr_checksum = 0;
+		ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);
+		if (ipv4_hdr->hdr_checksum ^ ip_cksum)
+			goto end;
+	}
+
+	if ((ol_flags & PKT_RX_L4_CKSUM_MASK) !=
+			PKT_RX_L4_CKSUM_UNKNOWN) {
+		if ((ol_flags & PKT_RX_L4_CKSUM_MASK) == PKT_RX_L4_CKSUM_BAD)
+			goto end;
+	} else {
+		tcp_cksum = tcp_hdr->cksum;
+		tcp_hdr->cksum = 0;
+		tcp_hdr->cksum = rte_ipv4_udptcp_cksum
+			(ipv4_hdr, tcp_hdr);
+		if (tcp_hdr->cksum ^ tcp_cksum)
+			goto end;
+	}
+
+	/* 3. search for the corresponding item-list for the packet */
+	key.eth_saddr = eth_hdr->s_addr;
+	key.eth_daddr = eth_hdr->d_addr;
+	key.ip_src_addr = rte_be_to_cpu_32(ipv4_hdr->src_addr);
+	key.ip_dst_addr = rte_be_to_cpu_32(ipv4_hdr->dst_addr);
+	key.src_port = rte_be_to_cpu_16(tcp_hdr->src_port);
+	key.dst_port = rte_be_to_cpu_16(tcp_hdr->dst_port);
+	key.recv_ack = rte_be_to_cpu_32(tcp_hdr->recv_ack);
+	key.tcp_flags = tcp_hdr->tcp_flags;
+
+	sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq);
+
+	if (rte_hash_lookup_data(hash_tbl, &key, (void **)&list) >= 0) {
+		ret = gro_tcp4_reassemble(pkt, pkt_idx, sent_seq, list);
+	} else {
+		/**
+		 * failed to find an item-list. Record the expected sequence
+		 * number of the incoming packet's neighbor in its item_list,
+		 * and insert it into the hash table.
+		 */
+		items[0].next_sent_seq = sent_seq + tcp_dl;
+		if (unlikely(rte_hash_add_key_data(hash_tbl, &key, item_list)
+					!= 0))
+			printf("GRO TCP hash insert fail.\n");
+		else
+			ret = 0;
+	}
+end:
+	return ret;
+}
diff --git a/lib/librte_gro/rte_gro_tcp.h b/lib/librte_gro/rte_gro_tcp.h
new file mode 100644
index 0000000..52be9cd
--- /dev/null
+++ b/lib/librte_gro/rte_gro_tcp.h
@@ -0,0 +1,95 @@
+#ifndef _RTE_GRO_TCP_H_
+#define _RTE_GRO_TCP_H_
+
+#include <rte_ethdev.h>
+#include <rte_ip.h>
+#include <rte_tcp.h>
+#include <rte_hash.h>
+#include <rte_jhash.h>
+
+#include "rte_gro_common.h"
+
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+#define TCP_HDR_LEN(tcph) \
+	((tcph->data_off >> 4) * 4)
+#define IPv4_HDR_LEN(iph) \
+	((iph->version_ihl & 0x0f) * 4)
+#else
+#define TCP_DATAOFF_MASK 0x0f
+#define TCP_HDR_LEN(tcph) \
+	((tcph->data_off & TCP_DATAOFF_MASK) * 4)
+#define IPv4_HDR_LEN(iph) \
+	((iph->version_ihl >> 4) * 4)
+#endif
+
+#define IPV4_HDR_DF_SHIFT 14
+#define IPV4_HDR_DF_MASK (1 << IPV4_HDR_DF_SHIFT)
+
+
+/**
+ * key structure of the TCP IPv4 hash table. It describes the
+ * prerequisite rules for merging packets.
+ */
+struct gro_tcp4_pre_rules {
+	struct ether_addr eth_saddr;
+	struct ether_addr eth_daddr;
+	uint32_t ip_src_addr;
+	uint32_t ip_dst_addr;
+
+	uint32_t recv_ack;	/**< acknowledgment sequence number. */
+	uint16_t src_port;
+	uint16_t dst_port;
+	uint8_t tcp_flags;	/**< TCP flags. */
+
+	uint8_t padding[3];
+};
+
+/**
+ * TCP item structure
+ */
+struct gro_tcp_item {
+	struct rte_mbuf *segment;	/**< packet address. */
+	uint32_t next_sent_seq;	/**< sequence number of the next packet. */
+	uint16_t segment_idx;	/**< packet index. */
+};
+
+void
+rte_gro_tcp4_cksum_update(struct rte_mbuf *pkt);
+
+/**
+ * Create a new TCP IPv4 GRO lookup table.
+ *
+ * @param name
+ *	Lookup table name
+ * @param nb_entries
+ *  Number of lookup table entries, which should be larger than or
+ *  equal to RTE_GRO_HASH_ENTRIES_MIN, less than or equal to
+ *  RTE_GRO_HASH_ENTRIES_MAX, and a power of two.
+ * @param socket_id
+ *  socket id
+ * @param hash_tbl
+ *  Output parameter that holds the address of the created lookup table
+ * @return
+ *  0 on success; -1 on failure
+ */
+int
+rte_gro_tcp4_tbl_create(char *name, uint32_t nb_entries,
+		uint16_t socket_id, struct rte_hash **hash_tbl);
+/**
+ * This function reassembles one incoming TCP IPv4 packet with the
+ * packets recorded in the lookup table. Non-TCP IPv4 packets are not
+ * processed.
+ *
+ * @param hash_tbl
+ *	Lookup table used to reassemble packets. It stores key-value pairs.
+ *	The key describes the prerequisite rules to merge two TCP IPv4
+ *	packets; the value is a pointer to an item-list, which contains
+ *	packets that match the same rules. Note that applications need to
+ *	guarantee the hash_tbl is clean when first calling this function.
+ * @param item_list
+ *	Pre-allocated item-list, whose first item stores the packet to
+ *	process.
+ * @return
+ *	If the incoming packet merges with one packet successfully, return
+ *	the index + 1 of the merged packet; if the incoming packet is not
+ *	processed by GRO, return -1; if it is processed but fails to
+ *	merge, return 0.
+ */
+int32_t
+rte_gro_tcp4_reassemble(struct rte_hash *hash_tbl,
+		struct gro_item_list *item_list);
+
+#endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 141+ messages in thread

* [PATCH v3 3/3] app/testpmd: enable GRO feature
  2017-04-24  8:09   ` [PATCH v3 0/3] support GRO in DPDK Jiayu Hu
  2017-04-24  8:09     ` [PATCH v3 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
  2017-04-24  8:09     ` [PATCH v3 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
@ 2017-04-24  8:09     ` Jiayu Hu
  2017-06-07  9:24       ` Wu, Jingjing
  2017-06-07 11:08     ` [PATCH v4 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
  3 siblings, 1 reply; 141+ messages in thread
From: Jiayu Hu @ 2017-04-24  8:09 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, keith.wiles, yuanhan.liu, Jiayu Hu

This patch demonstrates the usage of GRO library in testpmd. By default,
GRO is turned off. Command, "gro on (port_id)", turns on GRO for the
given port; command, "gro off (port_id)", turns off GRO for the given
port. Note that current GRO only supports TCP IPv4 packets.

Once the feature is turned on, all received packets go through the GRO
procedure before being returned from rte_eth_rx_burst.

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
---
 app/test-pmd/cmdline.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 app/test-pmd/config.c  | 26 ++++++++++++++++++++++++++
 app/test-pmd/iofwd.c   |  1 +
 app/test-pmd/testpmd.c |  5 +++++
 app/test-pmd/testpmd.h |  3 +++
 5 files changed, 80 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index f6bd75b..200ac3c 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -76,6 +76,7 @@
 #include <rte_devargs.h>
 #include <rte_eth_ctrl.h>
 #include <rte_flow.h>
+#include <rte_gro.h>
 
 #include <cmdline_rdline.h>
 #include <cmdline_parse.h>
@@ -420,6 +421,9 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"tso show (portid)"
 			"    Display the status of TCP Segmentation Offload.\n\n"
 
+			"gro (on|off) (port_id)"
+			"    Enable or disable Generic Receive Offload.\n\n"
+
 			"set fwd (%s)\n"
 			"    Set packet forwarding mode.\n\n"
 
@@ -3825,6 +3829,46 @@ cmdline_parse_inst_t cmd_tunnel_tso_show = {
 	},
 };
 
+/* *** SET GRO FOR A PORT *** */
+struct cmd_gro_result {
+	cmdline_fixed_string_t cmd_keyword;
+	cmdline_fixed_string_t mode;
+	uint8_t port_id;
+};
+
+static void
+cmd_set_gro_parsed(void *parsed_result,
+		__attribute__((unused)) struct cmdline *cl,
+		__attribute__((unused)) void *data)
+{
+	struct cmd_gro_result *res;
+
+	res = parsed_result;
+	setup_gro(res->mode, res->port_id);
+}
+
+cmdline_parse_token_string_t cmd_gro_keyword =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_result,
+			cmd_keyword, "gro");
+cmdline_parse_token_string_t cmd_gro_mode =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_result,
+			mode, NULL);
+cmdline_parse_token_num_t cmd_gro_pid =
+	TOKEN_NUM_INITIALIZER(struct cmd_gro_result,
+			port_id, UINT8);
+
+cmdline_parse_inst_t cmd_set_gro = {
+	.f = cmd_set_gro_parsed,
+	.data = NULL,
+	.help_str = "gro (on|off) (port_id)",
+	.tokens = {
+		(void *)&cmd_gro_keyword,
+		(void *)&cmd_gro_mode,
+		(void *)&cmd_gro_pid,
+		NULL,
+	},
+};
+
 /* *** ENABLE/DISABLE FLUSH ON RX STREAMS *** */
 struct cmd_set_flush_rx {
 	cmdline_fixed_string_t set;
@@ -13592,6 +13636,7 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_tso_show,
 	(cmdline_parse_inst_t *)&cmd_tunnel_tso_set,
 	(cmdline_parse_inst_t *)&cmd_tunnel_tso_show,
+	(cmdline_parse_inst_t *)&cmd_set_gro,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_rx,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_tx,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index ef07925..525f83b 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -97,6 +97,7 @@
 #ifdef RTE_LIBRTE_IXGBE_PMD
 #include <rte_pmd_ixgbe.h>
 #endif
+#include <rte_gro.h>
 
 #include "testpmd.h"
 
@@ -2423,6 +2424,31 @@ set_tx_pkt_segments(unsigned *seg_lengths, unsigned nb_segs)
 	tx_pkt_nb_segs = (uint8_t) nb_segs;
 }
 
+void
+setup_gro(const char *mode, uint8_t port_id)
+{
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		printf("invalid port id %u\n", port_id);
+		return;
+	}
+	if (strcmp(mode, "on") == 0) {
+		if (test_done == 0) {
+			printf("before enabling GRO,"
+					" please stop forwarding first\n");
+			return;
+		}
+		rte_gro_enable(port_id, rte_eth_dev_socket_id(port_id));
+	} else if (strcmp(mode, "off") == 0) {
+		if (test_done == 0) {
+			printf("before disabling GRO,"
+					" please stop forwarding first\n");
+			return;
+		}
+		rte_gro_disable(port_id);
+	} else
+		printf("unsupported GRO mode\n");
+}
+
 char*
 list_pkt_forwarding_modes(void)
 {
diff --git a/app/test-pmd/iofwd.c b/app/test-pmd/iofwd.c
index 15cb4a2..d9be390 100644
--- a/app/test-pmd/iofwd.c
+++ b/app/test-pmd/iofwd.c
@@ -65,6 +65,7 @@
 #include <rte_ethdev.h>
 #include <rte_string_fns.h>
 #include <rte_flow.h>
+#include <rte_gro.h>
 
 #include "testpmd.h"
 
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 3a57348..39b80b4 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -87,6 +87,7 @@
 #ifdef RTE_LIBRTE_LATENCY_STATS
 #include <rte_latencystats.h>
 #endif
+#include <rte_gro.h>
 
 #include "testpmd.h"
 
@@ -1518,6 +1519,10 @@ start_port(portid_t pid)
 	else if (need_check_link_status == 0)
 		printf("Please stop the ports first\n");
 
+	/* initialize GRO environment */
+	if (pid == (portid_t)RTE_PORT_ALL)
+		rte_gro_init();
+
 	printf("Done\n");
 	return 0;
 }
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index a9ff07e..1fa7e74 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -109,6 +109,8 @@ struct fwd_stream {
 	queueid_t  tx_queue;  /**< TX queue to send forwarded packets */
 	streamid_t peer_addr; /**< index of peer ethernet address of packets */
 
+	uint16_t tbl_idx;	/**< TCP IPv4 GRO lookup table index */
+
 	unsigned int retry_enabled;
 
 	/* "read-write" results */
@@ -615,6 +617,7 @@ void get_2tuple_filter(uint8_t port_id, uint16_t index);
 void get_5tuple_filter(uint8_t port_id, uint16_t index);
 int rx_queue_id_is_invalid(queueid_t rxq_id);
 int tx_queue_id_is_invalid(queueid_t txq_id);
+void setup_gro(const char *mode, uint8_t port_id);
 
 /* Functions to manage the set of filtered Multicast MAC addresses */
 void mcast_addr_add(uint8_t port_id, struct ether_addr *mc_addr);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 141+ messages in thread

* Re: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
  2017-04-24  8:09     ` [PATCH v3 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
@ 2017-05-22  9:19       ` Ananyev, Konstantin
  2017-05-23 10:31         ` Jiayu Hu
  0 siblings, 1 reply; 141+ messages in thread
From: Ananyev, Konstantin @ 2017-05-22  9:19 UTC (permalink / raw)
  To: Hu, Jiayu, dev; +Cc: Wiles, Keith, yuanhan.liu

Hi Jiayu,
My comments/questions below.
Konstantin

> 
> For applications, DPDK GRO provides three external functions to
> enable/disable GRO:
> - rte_gro_init: initialize GRO environment;
> - rte_gro_enable: enable GRO for a given port;
> - rte_gro_disable: disable GRO for a given port.
> Before using GRO, applications should explicitly call rte_gro_init to
> initialize GRO environment. After that, applications can call
> rte_gro_enable to enable GRO and call rte_gro_disable to disable GRO for
> specific ports.

I think this is too restrictive and wouldn't meet various users' needs.
User might want to:
- enable/disable GRO for particular RX queue
- or even setup different GRO types for different RX queues,
   i.e, - GRO over IPV4/TCP for queue 0, and  GRO over IPV6/TCP for queue 1, etc.
- User might prefer not to use RX callbacks for various reasons,
  but invoke gro() manually at some point in his code.
- Many users would like to control size (number of flows/items per flow),
  max allowed packet size, max timeout, etc., for different GRO tables.
- User would need a way to flush all or only timed-out packets from particular GRO tables.

So I think that API needs to be extended to become much more fine-grained.
Something like that:

struct rte_gro_tbl_param {
   int32_t socket_id;
   size_t max_flows;
   size_t max_items_per_flow;
   size_t max_pkt_size;
   uint64_t packet_timeout_cycles;
   <desired GRO types (IPV4_TCP | IPV6_TCP, ...)>
  <probably type specific params>
  ...
};

struct rte_gro_tbl;
struct rte_gro_tbl *rte_gro_tbl_create(const struct rte_gro_tbl_param *param);

void rte_gro_tbl_destroy(struct rte_gro_tbl *tbl);

/*
 * process packets, might store some packets inside the GRO table,
 * returns number of filled entries in pkt[]
 */
uint32_t rte_gro_tbl_process(struct rte_gro_tbl *tbl, struct rte_mbuf *pkt[], uint32_t num);

/*
  * retrieves up to num timed-out packets from the table.
  */
uint32_t rte_gro_tbl_timeout(struct rte_gro_tbl *tbl, uint64_t tmt, struct rte_mbuf *pkt[], uint32_t num);
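
For example, the application RX path might then look like this (just a
sketch, using the hypothetical API above):

struct rte_gro_tbl_param param = {
	.socket_id = SOCKET_ID_ANY,
	.max_flows = 1024,
	.max_items_per_flow = 32,
	/* desired GRO types, packet_timeout_cycles, etc. */
};
struct rte_gro_tbl *tbl = rte_gro_tbl_create(&param);

/* in the RX loop: merge what we can, some packets may stay in tbl */
nb_rx = rte_eth_rx_burst(port, queue, pkts, BURST_SZ);
nb_rx = rte_gro_tbl_process(tbl, pkts, nb_rx);
/* deliver pkts[0..nb_rx-1] to the upper layer */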

And if you like to keep them as helper functions:

int rte_gro_port_enable(uint8_t port, struct rte_gro_tbl_param *param);
void rte_gro_port_disable(uint8_t port);

Though from my perspective, it is out of scope of that library,
and I'd leave it to the upper layer to decide which callbacks and
in what order have to be installed on a particular port. 

....

> diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
> new file mode 100644
> index 0000000..996b382
> --- /dev/null
> +++ b/lib/librte_gro/rte_gro.c
> @@ -0,0 +1,219 @@
> +#include <rte_ethdev.h>
> +#include <rte_mbuf.h>
> +#include <rte_hash.h>
> +#include <stdint.h>
> +#include <rte_malloc.h>
> +
> +#include "rte_gro.h"
> +#include "rte_gro_common.h"
> +
> +gro_reassemble_fn reassemble_functions[GRO_TYPE_MAX_NB] = {NULL};
> +gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB] = {NULL};

Why not initialize these arrays properly at build time (here)?

> +
> +struct rte_gro_status *gro_status;
> +
> +/**
> + * Internal function. It creates one hashing table for all
> + * DPDK-supported GRO types, and all of them are stored in an object
> + * of struct rte_gro_tbl.
> + *
> + * @param name
> + *  Name for GRO lookup table
> + * @param nb_entries
> + *  Element number of each hashing table
> + * @param socket_id
> + *  socket id
> + * @param gro_tbl
> + *  gro_tbl points to a rte_gro_tbl object, which will be initalized
> + *  inside rte_gro_tbl_setup.
> + * @return
> + *  If create successfully, return a positive value; if not, return
> + *  a negative value.
> + */
> +static int
> +rte_gro_tbl_setup(char *name, uint32_t nb_entries,
> +		uint16_t socket_id, struct rte_gro_tbl *gro_tbl)
> +{
> +	gro_tbl_create_fn create_tbl_fn;
> +	const uint32_t len = strlen(name) + 10;
> +	char tbl_name[len];
> +	int i;
> +
> +	for (i = 0; i < GRO_SUPPORT_TYPE_NB; i++) {
> +		sprintf(tbl_name, "%s_%u", name, i);
> +		create_tbl_fn = tbl_create_functions[i];
> +		if (create_tbl_fn && (create_tbl_fn(name,
> +						nb_entries,
> +						socket_id,
> +						&(gro_tbl->
> +							lkp_tbls[i].hash_tbl))
> +					< 0)) {

Shouldn't you destroy already created tables here?

> +			return -1;
> +		}
> +		gro_tbl->lkp_tbls[i].gro_type = i;
> +	}
> +	return 1;
> +}
> +
> +/**
> + * Internal function. It frees all the hashing tables stored in
> + * the given struct rte_gro_tbl object.
> + */
> +static void
> +rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl)
> +{
> +	int i;
> +
> +	if (gro_tbl == NULL)
> +		return;
> +	for (i = 0; i < GRO_SUPPORT_TYPE_NB; i++) {
> +		rte_hash_free(gro_tbl->lkp_tbls[i].hash_tbl);
> +		gro_tbl->lkp_tbls[i].hash_tbl = NULL;
> +		gro_tbl->lkp_tbls[i].gro_type = GRO_EMPTY_TYPE;
> +	}
> +}
> +
> +/**
> + * Internal function. It performs all supported GRO types on inputted
> + * packets. For example, if current DPDK GRO supports TCP/IPv4 and
> + * TCP/IPv6 GRO, this functions just reassembles TCP/IPv4 and TCP/IPv6
> + * packets. Packets of unsupported GRO types won't be processed. For
> + * ethernet devices, which want to support GRO, this function is used to
> + * registered as RX callback for all queues.
> + *
> + * @param pkts
> + *  Packets to reassemble.
> + * @param nb_pkts
> + *  The number of packets to reassemble.
> + * @param gro_tbl
> + *  pointer points to an object of struct rte_gro_tbl, which has been
> + *  initialized by rte_gro_tbl_setup.
> + * @return
> + *  Packet number after GRO. If reassemble successfully, the value is
> + *  less than nb_pkts; if not, the value is equal to nb_pkts. If the
> + *  parameters are invalid, return 0.
> + */
> +static uint16_t
> +rte_gro_reassemble_burst(uint8_t port __rte_unused,
> +		uint16_t queue __rte_unused,
> +		struct rte_mbuf **pkts,
> +		uint16_t nb_pkts,
> +		uint16_t max_pkts __rte_unused,
> +		void *gro_tbl)
> +{
> +	if ((gro_tbl == NULL) || (pkts == NULL)) {
> +		printf("invalid parameters for GRO.\n");
> +		return 0;
> +	}
> +	uint16_t nb_after_gro = nb_pkts;

Here and everywhere - please move variable definitions to the top of the block.

> +
> +	return nb_after_gro;
> +}
> +
> +void
> +rte_gro_init(void)
> +{
> +	uint8_t nb_port, i;
> +	uint16_t nb_queue;
> +	struct rte_eth_dev_info dev_info;
> +
> +	/* if init already, return immediately */
> +	if (gro_status) {
> +		printf("repeatly init GRO environment\n");
> +		return;
> +	}
> +
> +	gro_status = (struct rte_gro_status *)rte_zmalloc(
> +			NULL,
> +			sizeof(struct rte_gro_status),
> +			0);
> +
> +	nb_port = rte_eth_dev_count();

Number of ports and number of configured queues per port can change dynamically.
In fact, I don't think we need that function and global gro_status at all.
To add/delete an RX callback for a particular port/queue, you can just scan over existing callbacks,
searching for particular cb_func and cb_arg values. 

> +	gro_status->ports = (struct gro_port_status *)rte_zmalloc(
> +			NULL,
> +			nb_port * sizeof(struct gro_port_status),
> +			0);
> +	gro_status->nb_port = nb_port;
> +
> +	for (i = 0; i < nb_port; i++) {
> +		rte_eth_dev_info_get(i, &dev_info);
> +		nb_queue = dev_info.nb_rx_queues;
> +		gro_status->ports[i].gro_tbls =
> +			(struct rte_gro_tbl **)rte_zmalloc(
> +					NULL,
> +					nb_queue * sizeof(struct rte_gro_tbl *),
> +					0);
> +		gro_status->ports[i].gro_cbs =
> +			(struct rte_eth_rxtx_callback **)
> +			rte_zmalloc(
> +					NULL,
> +					nb_queue *
> +					sizeof(struct rte_eth_rxtx_callback *),
> +					0);
> +	}
> +}
> +
> +}
....
> diff --git a/lib/librte_gro/rte_gro_common.h b/lib/librte_gro/rte_gro_common.h
....

> +
> +typedef int (*gro_tbl_create_fn)(
> +		char *name,
> +		uint32_t nb_entries,
> +		uint16_t socket_id,
> +		struct rte_hash **hash_tbl);
> +

If you have create_fn, then you'll probably need a destroy_fn too.
Second thing: why always assume that a particular GRO implementation would
always use rte_hash to store its data?
Probably better to hide that inside something neutral, like 'struct gro_data;' or so. 
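
E.g. something like (a sketch, the names are illustrative):

struct gro_data;	/* opaque, implementation specific */

typedef int (*gro_tbl_create_fn)(const char *name, uint32_t nb_entries,
		uint16_t socket_id, struct gro_data **data);
typedef void (*gro_tbl_destroy_fn)(struct gro_data *data);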

> +typedef int32_t (*gro_reassemble_fn)(
> +		struct rte_hash *hash_tbl,
> +		struct gro_item_list *item_list);
> +#endif
> diff --git a/mk/rte.app.mk b/mk/rte.app.mk
> index b5215c0..8956821 100644
> --- a/mk/rte.app.mk
> +++ b/mk/rte.app.mk
> @@ -98,6 +98,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_RING)           += -lrte_ring
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER)        += -lrte_reorder
> +_LDLIBS-$(CONFIG_RTE_LIBRTE_GRO)        	+= -lrte_gro
> 
>  ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_KNI)            += -lrte_kni
> --
> 2.7.4

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
  2017-05-22  9:19       ` Ananyev, Konstantin
@ 2017-05-23 10:31         ` Jiayu Hu
  2017-05-24 12:38           ` Ananyev, Konstantin
  0 siblings, 1 reply; 141+ messages in thread
From: Jiayu Hu @ 2017-05-23 10:31 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev, Wiles, Keith, yuanhan.liu

Hi Konstantin,

Thanks for your comments. My replies/questions are below.

BRs,
Jiayu

On Mon, May 22, 2017 at 05:19:19PM +0800, Ananyev, Konstantin wrote:
> Hi Jiayu,
> My comments/questions below.
> Konstantin
> 
> > 
> > For applications, DPDK GRO provides three external functions to
> > enable/disable GRO:
> > - rte_gro_init: initialize GRO environment;
> > - rte_gro_enable: enable GRO for a given port;
> > - rte_gro_disable: disable GRO for a given port.
> > Before using GRO, applications should explicitly call rte_gro_init to
> > initialize GRO environment. After that, applications can call
> > rte_gro_enable to enable GRO and call rte_gro_disable to disable GRO for
> > specific ports.
> 
> I think this is too restrictive and wouldn't meet various users' needs.
> User might want to:
> - enable/disable GRO for particular RX queue
> - or even setup different GRO types for different RX queues,
>    i.e, - GRO over IPV4/TCP for queue 0, and  GRO over IPV6/TCP for queue 1, etc.

The reason for enabling/disabling GRO per-port instead of per-queue is that LINUX
controls GRO per-port. Controlling GRO per-queue can indeed provide more flexibility
to applications. But are there any scenarios in which different queues of a port
require different GRO controls (i.e. GRO types and enabling/disabling GRO)?

> - User might prefer not to use RX callbacks for various reasons,
>   but invoke gro() manually at some point in his code.

An application-called GRO library can give more flexibility to applications. Besides,
when GRO is performed in the ethdev layer or inside PMD drivers, there is the issue of
whether rte_eth_rx_burst returns the number of actually received packets or the number
of GROed packets. The same issue happens in GSO, and even more seriously, because
applications like VPP always rely on the return value of rte_eth_tx_burst to decide
further operations. If applications can directly call the GRO/GSO libraries, this issue
won't exist. Also, DPDK is a library, not a holistic system like LINUX, so we don't
need to do the same as LINUX. Therefore, maybe it's a better idea to directly provide
SW segmentation/reassembly libraries to applications.

> - Many users would like to control size (number of flows/items per flow),
>   max allowed packet size, max timeout, etc., for different GRO tables.
> - User would need a way to flush all or only timed-out packets from particular GRO tables.
> 
> So I think that API needs to be extended to become much more fine-grained.
> Something like that:
> 
> struct rte_gro_tbl_param {
>    int32_t socket_id;
>    size_t max_flows;
>    size_t max_items_per_flow;
>    size_t max_pkt_size;
>    uint64_t packet_timeout_cycles;
>    <desired GRO types (IPV4_TCP | IPV6_TCP, ...)>
>   <probably type specific params>
>   ...
> };
> 
> struct rte_gro_tbl;
> struct rte_gro_tbl *rte_gro_tbl_create(const struct rte_gro_tbl_param *param);
> 
> void rte_gro_tbl_destroy(struct rte_gro_tbl *tbl);

Yes, I agree with you. It's necessary to provide more fine-grained control APIs to
applications. But what's 'packet_timeout_cycles' used for? Is it for TCP packets?

> 
> /*
>  * process packets, might store some packets inside the GRO table,
>  * returns number of filled entries in pkt[]
>  */
> uint32_t rte_gro_tbl_process(struct rte_gro_tbl *tbl, struct rte_mbuf *pkt[], uint32_t num);
> 
> /*
>   * retrieves up to num timed-out packets from the table.
>   */
> uint32_t rte_gro_tbl_timeout(struct rte_gro_tbl *tbl, uint64_t tmt, struct rte_mbuf *pkt[], uint32_t num);

Currently, we implement GRO as an RX callback, whose processing logic is simple:
receive a burst of packets -> perform GRO -> return to the application. GRO stops
after finishing processing the received packets. If we provide rte_gro_tbl_timeout,
when and by whom will it be called?
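
(For reference, the current model is roughly the sketch below, where gro_tbl
has been set up beforehand and rte_gro_reassemble_burst is the RX callback
from this patch set:

uint16_t q;

for (q = 0; q < nb_rx_queues; q++)
	rte_eth_add_rx_callback(port_id, q,
			rte_gro_reassemble_burst, gro_tbl);

so GRO runs implicitly inside every rte_eth_rx_burst call.)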

> 
> And if you like to keep them as helper functions:
> 
> int rte_gro_port_enable(uint8_t port, struct rte_gro_tbl_param *param);
> void rte_gro_port_disable(uint8_t port);
> 
> Though from my perspective, it is out of scope of that library,
> and I'd leave it to the upper layer to decide which callbacks and
> in what order have to be installed on a particular port.

In my opinion, the GRO library includes two parts. One is in charge of reassembling
packets; the other is in charge of implementing GRO as an RX callback. And
rte_gro_port_enable/disable belongs to the second part. You mean we should let
applications register the RX callback themselves, and the GRO library just provides
reassembly-related functions (e.g. rte_gro_tbl_process and rte_gro_tbl_create)?

> 
> ....
> 
> > diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
> > new file mode 100644
> > index 0000000..996b382
> > --- /dev/null
> > +++ b/lib/librte_gro/rte_gro.c
> > @@ -0,0 +1,219 @@
> > +#include <rte_ethdev.h>
> > +#include <rte_mbuf.h>
> > +#include <rte_hash.h>
> > +#include <stdint.h>
> > +#include <rte_malloc.h>
> > +
> > +#include "rte_gro.h"
> > +#include "rte_gro_common.h"
> > +
> > +gro_reassemble_fn reassemble_functions[GRO_TYPE_MAX_NB] = {NULL};
> > +gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB] = {NULL};
> 
> Why not initialize these arrays properly at build time (here)?

Yes, I should declare them as static variables.

> 
> > +
> > +struct rte_gro_status *gro_status;
> > +
> > +/**
> > + * Internal function. It creates one hashing table for all
> > + * DPDK-supported GRO types, and all of them are stored in an object
> > + * of struct rte_gro_tbl.
> > + *
> > + * @param name
> > + *  Name for GRO lookup table
> > + * @param nb_entries
> > + *  Element number of each hashing table
> > + * @param socket_id
> > + *  socket id
> > + * @param gro_tbl
> > + *  gro_tbl points to a rte_gro_tbl object, which will be initalized
> > + *  inside rte_gro_tbl_setup.
> > + * @return
> > + *  If create successfully, return a positive value; if not, return
> > + *  a negative value.
> > + */
> > +static int
> > +rte_gro_tbl_setup(char *name, uint32_t nb_entries,
> > +		uint16_t socket_id, struct rte_gro_tbl *gro_tbl)
> > +{
> > +	gro_tbl_create_fn create_tbl_fn;
> > +	const uint32_t len = strlen(name) + 10;
> > +	char tbl_name[len];
> > +	int i;
> > +
> > +	for (i = 0; i < GRO_SUPPORT_TYPE_NB; i++) {
> > +		sprintf(tbl_name, "%s_%u", name, i);
> > +		create_tbl_fn = tbl_create_functions[i];
> > +		if (create_tbl_fn && (create_tbl_fn(name,
> > +						nb_entries,
> > +						socket_id,
> > +						&(gro_tbl->
> > +							lkp_tbls[i].hash_tbl))
> > +					< 0)) {
> 
> Shouldn't you destroy already created tables here?

Yes, I should add code to destroy created tables before creating new ones.

> 
> > +			return -1;
> > +		}
> > +		gro_tbl->lkp_tbls[i].gro_type = i;
> > +	}
> > +	return 1;
> > +}
> > +
> > +/**
> > + * Internal function. It frees all the hashing tables stored in
> > + * the given struct rte_gro_tbl object.
> > + */
> > +static void
> > +rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl)
> > +{
> > +	int i;
> > +
> > +	if (gro_tbl == NULL)
> > +		return;
> > +	for (i = 0; i < GRO_SUPPORT_TYPE_NB; i++) {
> > +		rte_hash_free(gro_tbl->lkp_tbls[i].hash_tbl);
> > +		gro_tbl->lkp_tbls[i].hash_tbl = NULL;
> > +		gro_tbl->lkp_tbls[i].gro_type = GRO_EMPTY_TYPE;
> > +	}
> > +}
> > +
> > +/**
> > + * Internal function. It performs all supported GRO types on inputted
> > + * packets. For example, if current DPDK GRO supports TCP/IPv4 and
> > + * TCP/IPv6 GRO, this functions just reassembles TCP/IPv4 and TCP/IPv6
> > + * packets. Packets of unsupported GRO types won't be processed. For
> > + * ethernet devices, which want to support GRO, this function is used to
> > + * registered as RX callback for all queues.
> > + *
> > + * @param pkts
> > + *  Packets to reassemble.
> > + * @param nb_pkts
> > + *  The number of packets to reassemble.
> > + * @param gro_tbl
> > + *  pointer points to an object of struct rte_gro_tbl, which has been
> > + *  initialized by rte_gro_tbl_setup.
> > + * @return
> > + *  Packet number after GRO. If reassemble successfully, the value is
> > + *  less than nb_pkts; if not, the value is equal to nb_pkts. If the
> > + *  parameters are invalid, return 0.
> > + */
> > +static uint16_t
> > +rte_gro_reassemble_burst(uint8_t port __rte_unused,
> > +		uint16_t queue __rte_unused,
> > +		struct rte_mbuf **pkts,
> > +		uint16_t nb_pkts,
> > +		uint16_t max_pkts __rte_unused,
> > +		void *gro_tbl)
> > +{
> > +	if ((gro_tbl == NULL) || (pkts == NULL)) {
> > +		printf("invalid parameters for GRO.\n");
> > +		return 0;
> > +	}
> > +	uint16_t nb_after_gro = nb_pkts;
> 
> Here and everywhere - please move variable definitions to the top of the block.

Thanks. I will modify them in next version.
 
> 
> > +
> > +	return nb_after_gro;
> > +}
> > +
> > +void
> > +rte_gro_init(void)
> > +{
> > +	uint8_t nb_port, i;
> > +	uint16_t nb_queue;
> > +	struct rte_eth_dev_info dev_info;
> > +
> > +	/* if init already, return immediately */
> > +	if (gro_status) {
> > +		printf("repeatly init GRO environment\n");
> > +		return;
> > +	}
> > +
> > +	gro_status = (struct rte_gro_status *)rte_zmalloc(
> > +			NULL,
> > +			sizeof(struct rte_gro_status),
> > +			0);
> > +
> > +	nb_port = rte_eth_dev_count();
> 
> Number of ports and number of configured queues per port can change dynamically.
> In fact, I don't think we need that function and global gro_status at all.
> To add/delete an RX callback for a particular port/queue, you can just scan over existing callbacks,
> searching for particular cb_func and cb_arg values. 

Yes, it's better to store these parameters (e.g. hashing table pointers) as cb_arg. But
we can't remove an RX callback by searching for particular cb_func inside the GRO library,
since these operations need locking protection and the lock variable (i.e. rte_eth_rx_cb_lock)
is unavailable to the GRO library. To my knowledge, the only way is to provide a new API in
lib/librte_ether/rte_ethdev.c to support removing an RX callback via cb_func or cb_arg values.
You mean we need to add this API?
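
Such a new API might look roughly like the following (the name and signature
are purely hypothetical):

/* remove all RX callbacks on a queue whose cb_arg matches the given value */
int rte_eth_remove_rx_callback_by_arg(uint8_t port_id, uint16_t queue_id,
		void *cb_arg);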

> 
> > +	gro_status->ports = (struct gro_port_status *)rte_zmalloc(
> > +			NULL,
> > +			nb_port * sizeof(struct gro_port_status),
> > +			0);
> > +	gro_status->nb_port = nb_port;
> > +
> > +	for (i = 0; i < nb_port; i++) {
> > +		rte_eth_dev_info_get(i, &dev_info);
> > +		nb_queue = dev_info.nb_rx_queues;
> > +		gro_status->ports[i].gro_tbls =
> > +			(struct rte_gro_tbl **)rte_zmalloc(
> > +					NULL,
> > +					nb_queue * sizeof(struct rte_gro_tbl *),
> > +					0);
> > +		gro_status->ports[i].gro_cbs =
> > +			(struct rte_eth_rxtx_callback **)
> > +			rte_zmalloc(
> > +					NULL,
> > +					nb_queue *
> > +					sizeof(struct rte_eth_rxtx_callback *),
> > +					0);
> > +	}
> > +}
> > +
> > +}
> ....
> > diff --git a/lib/librte_gro/rte_gro_common.h b/lib/librte_gro/rte_gro_common.h
> ....
> 
> > +
> > +typedef int (*gro_tbl_create_fn)(
> > +		char *name,
> > +		uint32_t nb_entries,
> > +		uint16_t socket_id,
> > +		struct rte_hash **hash_tbl);
> > +
> 
> If you have create_fn, then you'll probably need a destroy_fn too.
> Second thing: why always assume that a particular GRO implementation would
> always use rte_hash to store its data?
> Probably better to hide that inside something neutral, like 'struct gro_data;' or so. 

Agree. I will modify it.

> 
> > +typedef int32_t (*gro_reassemble_fn)(
> > +		struct rte_hash *hash_tbl,
> > +		struct gro_item_list *item_list);
> > +#endif
> > diff --git a/mk/rte.app.mk b/mk/rte.app.mk
> > index b5215c0..8956821 100644
> > --- a/mk/rte.app.mk
> > +++ b/mk/rte.app.mk
> > @@ -98,6 +98,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_RING)           += -lrte_ring
> >  _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
> >  _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
> >  _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER)        += -lrte_reorder
> > +_LDLIBS-$(CONFIG_RTE_LIBRTE_GRO)        	+= -lrte_gro
> > 
> >  ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
> >  _LDLIBS-$(CONFIG_RTE_LIBRTE_KNI)            += -lrte_kni
> > --
> > 2.7.4

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
  2017-05-23 10:31         ` Jiayu Hu
@ 2017-05-24 12:38           ` Ananyev, Konstantin
  2017-05-26  7:26             ` Jiayu Hu
  0 siblings, 1 reply; 141+ messages in thread
From: Ananyev, Konstantin @ 2017-05-24 12:38 UTC (permalink / raw)
  To: Hu, Jiayu; +Cc: dev, Wiles, Keith, yuanhan.liu


Hi Jiayu,

> 
> Hi Konstantin,
> 
> Thanks for your comments. My replies/questions are below.
> 
> BRs,
> Jiayu
> 
> On Mon, May 22, 2017 at 05:19:19PM +0800, Ananyev, Konstantin wrote:
> > Hi Jiayu,
> > My comments/questions below.
> > Konstantin
> >
> > >
> > > For applications, DPDK GRO provides three external functions to
> > > enable/disable GRO:
> > > - rte_gro_init: initialize GRO environment;
> > > - rte_gro_enable: enable GRO for a given port;
> > > - rte_gro_disable: disable GRO for a given port.
> > > Before using GRO, applications should explicitly call rte_gro_init to
> > > initialize GRO environment. After that, applications can call
> > > rte_gro_enable to enable GRO and call rte_gro_disable to disable GRO for
> > > specific ports.
> >
> > I think this is too restrictive and wouldn't meet various users' needs.
> > User might want to:
> > - enable/disable GRO for particular RX queue
> > - or even setup different GRO types for different RX queues,
> >    i.e, - GRO over IPV4/TCP for queue 0, and  GRO over IPV6/TCP for queue 1, etc.
> 
> The reason for enabling/disabling GRO per-port instead of per-queue is that LINUX
> controls GRO per-port. Controlling GRO per-queue can indeed provide more flexibility
> to applications. But are there any scenarios in which different queues of a port
> require different GRO controls (i.e. GRO types and enabling/disabling GRO)?

I think yes.

> 
> > - User might prefer not to use RX callbacks for various reasons,
> >   but invoke gro() manually at some point in his code.
> 
> An application-called GRO library can give more flexibility to applications. Besides,
> when GRO is performed in the ethdev layer or inside PMD drivers, there is the issue of
> whether rte_eth_rx_burst returns the number of actually received packets or the number
> of GROed packets. The same issue happens in GSO, and even more seriously, because
> applications like VPP always rely on the return value of rte_eth_tx_burst to decide
> further operations. If applications can directly call the GRO/GSO libraries, this issue
> won't exist. Also, DPDK is a library, not a holistic system like LINUX, so we don't
> need to do the same as LINUX. Therefore, maybe it's a better idea to directly provide
> SW segmentation/reassembly libraries to applications.
> 
> > - Many users would like to control size (number of flows/items per flow),
> >   max allowed packet size, max timeout, etc., for different GRO tables.
> > - User would need a way to flush all or only timed-out packets from particular GRO tables.
> >
> > So I think that API needs to be extended to become much more fine-grained.
> > Something like that:
> >
> > struct rte_gro_tbl_param {
> >    int32_t socket_id;
> >    size_t max_flows;
> >    size_t max_items_per_flow;
> >    size_t max_pkt_size;
> >    uint64_t packet_timeout_cycles;
> >    <desired GRO types (IPV4_TCP | IPV6_TCP, ...)>
> >   <probably type specific params>
> >   ...
> > };
> >
> > struct rte_gro_tbl;
> > struct rte_gro_tbl *rte_gro_tbl_create(const struct rte_gro_tbl_param *param);
> >
> > void rte_gro_tbl_destroy(struct rte_gro_tbl *tbl);
> 
> Yes, I agree with you. It's necessary to provide more fine-grained control APIs to
> applications. But what's 'packet_timeout_cycles' used for? Is it for TCP packets?

For any packet that sits in the gro_table for too long.

> 
> >
> > /*
> >  * process packets, might store some packets inside the GRO table,
> >  * returns number of filled entries in pkt[]
> >  */
> > uint32_t rte_gro_tbl_process(struct rte_gro_tbl *tbl, struct rte_mbuf *pkt[], uint32_t num);
> >
> > /*
> >   * retrieves up to num timed-out packets from the table.
> >   */
> > uint32_t rte_gro_tbl_timeout(struct rte_gro_tbl *tbl, uint64_t tmt, struct rte_mbuf *pkt[], uint32_t num);
> 
> Currently, we implement GRO as an RX callback, whose processing logic is simple:
> receive a burst of packets -> perform GRO -> return to the application. GRO stops
> after finishing processing the received packets. If we provide rte_gro_tbl_timeout,
> when and by whom will it be called?

I mean the following scenario:
We receive a packet, find it is eligible for GRO, and put it into the gro_table
in the expectation that there would be more packets for the same flow.
But it could happen that we never (or only after a long time) receive
any new packets for that flow.
So the first packet would never be delivered to the upper layer,
or would be delivered too late.
So we need a mechanism to extract such unmerged packets
and push them to the upper layer.
My thought is that it would be the application's responsibility to call it from time to time.
That's actually another reason why I think we shouldn't force applications
to always use RX callbacks for GRO.
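
E.g. the application main loop could do something like below (a rough sketch
with the hypothetical API above; deliver() and flush_cycles are application
specific):

n = rte_eth_rx_burst(port, queue, pkts, BURST_SZ);
n = rte_gro_tbl_process(tbl, pkts, n);
deliver(pkts, n);

now = rte_rdtsc();
if (now - last_flush > flush_cycles) {
	n = rte_gro_tbl_timeout(tbl, now, pkts, BURST_SZ);
	deliver(pkts, n);
	last_flush = now;
}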

> 
> >
> > And if you like to keep them as helper functions:
> >
> > int rte_gro_port_enable(uint8_t port, struct rte_gro_tbl_param *param);
> > void rte_gro_port_disable(uint8_t port);
> >
> > Though from my perspective, it is out of scope of that library,
> > and I'd leave it to the upper layer to decide which callbacks and
> > in what order have to be installed on a particular port.
> 
> In my opinion, the GRO library includes two parts. One is in charge of reassembling
> packets; the other is in charge of implementing GRO as an RX callback. And
> rte_gro_port_enable/disable belongs to the second part. You mean we should let
> applications register the RX callback themselves, and the GRO library just provides
> reassembly-related functions (e.g. rte_gro_tbl_process and rte_gro_tbl_create)?

Yes, that would be my preference.
Let the application developer decide how/when to
call GRO functions (callback, direct call, some combination).

> 
> >
> > ....
> >
> > > diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
> > > new file mode 100644
> > > index 0000000..996b382
> > > --- /dev/null
> > > +++ b/lib/librte_gro/rte_gro.c
> > > @@ -0,0 +1,219 @@
> > > +#include <rte_ethdev.h>
> > > +#include <rte_mbuf.h>
> > > +#include <rte_hash.h>
> > > +#include <stdint.h>
> > > +#include <rte_malloc.h>
> > > +
> > > +#include "rte_gro.h"
> > > +#include "rte_gro_common.h"
> > > +
> > > +gro_reassemble_fn reassemble_functions[GRO_TYPE_MAX_NB] = {NULL};
> > > +gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB] = {NULL};
> >
> > Why not initialize these arrays properly at build time (here)?
> 
> Yes, I should declare them as static variables.
> 
> >
> > > +
> > > +struct rte_gro_status *gro_status;
> > > +
> > > +/**
> > > + * Internal function. It creates one hashing table for all
> > > + * DPDK-supported GRO types, and all of them are stored in an object
> > > + * of struct rte_gro_tbl.
> > > + *
> > > + * @param name
> > > + *  Name for GRO lookup table
> > > + * @param nb_entries
> > > + *  Element number of each hashing table
> > > + * @param socket_id
> > > + *  socket id
> > > + * @param gro_tbl
> > > + *  gro_tbl points to a rte_gro_tbl object, which will be initalized
> > > + *  inside rte_gro_tbl_setup.
> > > + * @return
> > > + *  If create successfully, return a positive value; if not, return
> > > + *  a negative value.
> > > + */
> > > +static int
> > > +rte_gro_tbl_setup(char *name, uint32_t nb_entries,
> > > +		uint16_t socket_id, struct rte_gro_tbl *gro_tbl)
> > > +{
> > > +	gro_tbl_create_fn create_tbl_fn;
> > > +	const uint32_t len = strlen(name) + 10;
> > > +	char tbl_name[len];
> > > +	int i;
> > > +
> > > +	for (i = 0; i < GRO_SUPPORT_TYPE_NB; i++) {
> > > +		sprintf(tbl_name, "%s_%u", name, i);
> > > +		create_tbl_fn = tbl_create_functions[i];
> > > +		if (create_tbl_fn && (create_tbl_fn(name,
> > > +						nb_entries,
> > > +						socket_id,
> > > +						&(gro_tbl->
> > > +							lkp_tbls[i].hash_tbl))
> > > +					< 0)) {
> >
> > Shouldn't you destroy already created tables here?
> 
> Yes, I should add code to destroy created tables before creating new ones.
> 
> >
> > > +			return -1;
> > > +		}
> > > +		gro_tbl->lkp_tbls[i].gro_type = i;
> > > +	}
> > > +	return 1;
> > > +}
> > > +
> > > +/**
> > > + * Internal function. It frees all the hashing tables stored in
> > > + * the given struct rte_gro_tbl object.
> > > + */
> > > +static void
> > > +rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl)
> > > +{
> > > +	int i;
> > > +
> > > +	if (gro_tbl == NULL)
> > > +		return;
> > > +	for (i = 0; i < GRO_SUPPORT_TYPE_NB; i++) {
> > > +		rte_hash_free(gro_tbl->lkp_tbls[i].hash_tbl);
> > > +		gro_tbl->lkp_tbls[i].hash_tbl = NULL;
> > > +		gro_tbl->lkp_tbls[i].gro_type = GRO_EMPTY_TYPE;
> > > +	}
> > > +}
> > > +
> > > +/**
> > > + * Internal function. It performs all supported GRO types on inputted
> > > + * packets. For example, if current DPDK GRO supports TCP/IPv4 and
> > > + * TCP/IPv6 GRO, this functions just reassembles TCP/IPv4 and TCP/IPv6
> > > + * packets. Packets of unsupported GRO types won't be processed. For
> > > + * ethernet devices, which want to support GRO, this function is used to
> > > + * registered as RX callback for all queues.
> > > + *
> > > + * @param pkts
> > > + *  Packets to reassemble.
> > > + * @param nb_pkts
> > > + *  The number of packets to reassemble.
> > > + * @param gro_tbl
> > > + *  pointer points to an object of struct rte_gro_tbl, which has been
> > > + *  initialized by rte_gro_tbl_setup.
> > > + * @return
> > > + *  Packet number after GRO. If reassemble successfully, the value is
> > > + *  less than nb_pkts; if not, the value is equal to nb_pkts. If the
> > > + *  parameters are invalid, return 0.
> > > + */
> > > +static uint16_t
> > > +rte_gro_reassemble_burst(uint8_t port __rte_unused,
> > > +		uint16_t queue __rte_unused,
> > > +		struct rte_mbuf **pkts,
> > > +		uint16_t nb_pkts,
> > > +		uint16_t max_pkts __rte_unused,
> > > +		void *gro_tbl)
> > > +{
> > > +	if ((gro_tbl == NULL) || (pkts == NULL)) {
> > > +		printf("invalid parameters for GRO.\n");
> > > +		return 0;
> > > +	}
> > > +	uint16_t nb_after_gro = nb_pkts;
> >
> > Here and everywhere - please move variable definitions to the top of the block.
> 
> Thanks. I will modify them in next version.
> 
> >
> > > +
> > > +	return nb_after_gro;
> > > +}
> > > +
> > > +void
> > > +rte_gro_init(void)
> > > +{
> > > +	uint8_t nb_port, i;
> > > +	uint16_t nb_queue;
> > > +	struct rte_eth_dev_info dev_info;
> > > +
> > > +	/* if init already, return immediately */
> > > +	if (gro_status) {
> > > +		printf("repeatly init GRO environment\n");
> > > +		return;
> > > +	}
> > > +
> > > +	gro_status = (struct rte_gro_status *)rte_zmalloc(
> > > +			NULL,
> > > +			sizeof(struct rte_gro_status),
> > > +			0);
> > > +
> > > +	nb_port = rte_eth_dev_count();
> >
> > Number of ports and number of configured queues per port can change dynamically.
> > In fact, I don't think we need that function and global gro_status at all.
> > To add/delete an RX callback for a particular port/queue, you can just scan over existing callbacks,
> > searching for particular cb_func and cb_arg values.
> 
> Yes, it's better to store these parameters (e.g. hashing table pointers) as cb_arg. But
> we can't remove an RX callback by searching for particular cb_func inside the GRO library,
> since these operations need locking protection and the lock variable (i.e. rte_eth_rx_cb_lock)
> is unavailable to the GRO library. To my knowledge, the only way is to provide a new API in
> lib/librte_ether/rte_ethdev.c to support removing an RX callback via cb_func or cb_arg values.
> You mean we need to add this API?

Yes, my thought was to add something like:
rte_eth_find_rx_callback()       /* find a specific callback */
or rte_eth_get_rx_callbacks()    /* get a copy of the list of all currently installed callbacks */

But if we move installing GRO callbacks out of the scope of the library, we probably wouldn't need it.

> 
> >
> > > +	gro_status->ports = (struct gro_port_status *)rte_zmalloc(
> > > +			NULL,
> > > +			nb_port * sizeof(struct gro_port_status),
> > > +			0);
> > > +	gro_status->nb_port = nb_port;
> > > +
> > > +	for (i = 0; i < nb_port; i++) {
> > > +		rte_eth_dev_info_get(i, &dev_info);
> > > +		nb_queue = dev_info.nb_rx_queues;
> > > +		gro_status->ports[i].gro_tbls =
> > > +			(struct rte_gro_tbl **)rte_zmalloc(
> > > +					NULL,
> > > +					nb_queue * sizeof(struct rte_gro_tbl *),
> > > +					0);
> > > +		gro_status->ports[i].gro_cbs =
> > > +			(struct rte_eth_rxtx_callback **)
> > > +			rte_zmalloc(
> > > +					NULL,
> > > +					nb_queue *
> > > +					sizeof(struct rte_eth_rxtx_callback *),
> > > +					0);
> > > +	}
> > > +}
> > > +
> > > +}
> > ....
> > > diff --git a/lib/librte_gro/rte_gro_common.h b/lib/librte_gro/rte_gro_common.h
> > ....
> >
> > > +
> > > +typedef int (*gro_tbl_create_fn)(
> > > +		char *name,
> > > +		uint32_t nb_entries,
> > > +		uint16_t socket_id,
> > > +		struct rte_hash **hash_tbl);
> > > +
> >
> > If you have create_fn, then you'll probably need a destroy_fn too.
> > Second thing: why always assume that a particular GRO implementation would
> > always use rte_hash to store its data?
> > Probably better to hide that inside something neutral, like 'struct gro_data;' or so.
> 
> Agree. I will modify it.
> 
> >
> > > +typedef int32_t (*gro_reassemble_fn)(
> > > +		struct rte_hash *hash_tbl,
> > > +		struct gro_item_list *item_list);
> > > +#endif
> > > diff --git a/mk/rte.app.mk b/mk/rte.app.mk
> > > index b5215c0..8956821 100644
> > > --- a/mk/rte.app.mk
> > > +++ b/mk/rte.app.mk
> > > @@ -98,6 +98,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_RING)           += -lrte_ring
> > >  _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
> > >  _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
> > >  _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER)        += -lrte_reorder
> > > +_LDLIBS-$(CONFIG_RTE_LIBRTE_GRO)        	+= -lrte_gro
> > >
> > >  ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
> > >  _LDLIBS-$(CONFIG_RTE_LIBRTE_KNI)            += -lrte_kni
> > > --
> > > 2.7.4

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
  2017-05-24 12:38           ` Ananyev, Konstantin
@ 2017-05-26  7:26             ` Jiayu Hu
  2017-05-26 23:10               ` Ananyev, Konstantin
  0 siblings, 1 reply; 141+ messages in thread
From: Jiayu Hu @ 2017-05-26  7:26 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev, Wiles, Keith, yuanhan.liu

Hi Konstantin,

On Wed, May 24, 2017 at 08:38:25PM +0800, Ananyev, Konstantin wrote:
> 
> Hi Jiayu,
> 
> > 
> > Hi Konstantin,
> > 
> > Thanks for your comments. My replies/questions are below.
> > 
> > BRs,
> > Jiayu
> > 
> > On Mon, May 22, 2017 at 05:19:19PM +0800, Ananyev, Konstantin wrote:
> > > Hi Jiayu,
> > > My comments/questions below.
> > > Konstantin
> > >
> > > >
> > > > For applications, DPDK GRO provides three external functions to
> > > > enable/disable GRO:
> > > > - rte_gro_init: initialize GRO environment;
> > > > - rte_gro_enable: enable GRO for a given port;
> > > > - rte_gro_disable: disable GRO for a given port.
> > > > Before using GRO, applications should explicitly call rte_gro_init to
> > > > initialize GRO environment. After that, applications can call
> > > > rte_gro_enable to enable GRO and call rte_gro_disable to disable GRO for
> > > > specific ports.
> > >
> > > I think this is too restrictive and wouldn't meet various users' needs.
> > > User might want to:
> > > - enable/disable GRO for particular RX queue
> > > - or even setup different GRO types for different RX queues,
> > >    i.e, - GRO over IPV4/TCP for queue 0, and  GRO over IPV6/TCP for queue 1, etc.
> > 
> > The reason for enabling/disabling GRO per-port instead of per-queue is that LINUX
> > controls GRO per-port. Controlling GRO per-queue can indeed provide more flexibility
> > to applications. But are there any scenarios in which different queues of a port
> > require different GRO controls (i.e. GRO types and enabling/disabling GRO)?
> 
> I think yes.
> 
> > 
> > > - User might prefer not to use RX callbacks for various reasons,
> > >   but invoke gro() manually at some point in his code.
> > 
> > An application-called GRO library can give more flexibility to applications. Besides,
> > when GRO is performed in the ethdev layer or inside PMD drivers, there is the issue of
> > whether rte_eth_rx_burst returns the number of actually received packets or the number
> > of GROed packets. The same issue happens in GSO, and even more seriously, because
> > applications like VPP always rely on the return value of rte_eth_tx_burst to decide
> > further operations. If applications can directly call the GRO/GSO libraries, this issue
> > won't exist. Also, DPDK is a library, not a holistic system like LINUX, so we don't
> > need to do the same as LINUX. Therefore, maybe it's a better idea to directly provide
> > SW segmentation/reassembly libraries to applications.
> > 
> > > - Many users would like to control size (number of flows/items per flow),
> > >   max allowed packet size, max timeout, etc., for different GRO tables.
> > > - User would need a way to flush all or only timed-out packets from particular GRO tables.
> > >
> > > So I think that API needs to be extended to become much more fine-grained.
> > > Something like that:
> > >
> > > struct rte_gro_tbl_param {
> > >    int32_t socket_id;
> > >    size_t max_flows;
> > >    size_t max_items_per_flow;
> > >    size_t max_pkt_size;
> > >    uint64_t packet_timeout_cycles;
> > >    <desired GRO types (IPV4_TCP | IPV6_TCP, ...)>
> > >   <probably type specific params>
> > >   ...
> > > };
> > >
> > > struct rte_gro_tbl;
> > > struct rte_gro_tbl *rte_gro_tbl_create(const struct rte_gro_tbl_param *param);
> > >
> > > void rte_gro_tbl_destroy(struct rte_gro_tbl *tbl);
> > 
> > Yes, I agree with you. It's necessary to provide more fine-grained control APIs to
> > applications. But what's 'packet_timeout_cycles' used for? Is it for TCP packets?
> 
> For any packet that sits in the gro_table for too long.
> 
> > 
> > >
> > > /*
> > >  * process packets, might store some packets inside the GRO table,
> > >  * returns number of filled entries in pkt[]
> > >  */
> > > uint32_t rte_gro_tbl_process(struct rte_gro_tbl *tbl, struct rte_mbuf *pkt[], uint32_t num);
> > >
> > > /*
> > >   * retrieves up to num timed-out packets from the table.
> > >   */
> > > uint32_t rte_gro_tbl_timeout(struct rte_gro_tbl *tbl, uint64_t tmt, struct rte_mbuf *pkt[], uint32_t num);
> > 
> > Currently, we implement GRO as an RX callback, whose processing logic is simple:
> > receive a burst of packets -> perform GRO -> return to the application. GRO stops
> > after finishing processing the received packets. If we provide rte_gro_tbl_timeout,
> > when and by whom will it be called?
> 
> I mean the following scenario:
> We receive a packet, find it is eligible for GRO, and put it into the gro_table
> in the expectation that there would be more packets for the same flow.
> But it could happen that we never (or only after a long time) receive
> any new packets for that flow.
> So the first packet would never be delivered to the upper layer,
> or would be delivered too late.
> So we need a mechanism to extract such unmerged packets
> and push them to the upper layer.
> My thought is that it would be the application's responsibility to call it from time to time.
> That's actually another reason why I think we shouldn't force applications
> to always use RX callbacks for GRO.

Currently, we only provide one reassembly function, rte_gro_reassemble_burst,
which merges N input packets at a time. After finishing processing these
packets, it returns all of them and resets the hashing tables. Therefore, there
are no packets left in the hashing tables after rte_gro_reassemble_burst returns.

If we provide rte_gro_tbl_timeout, we also need to provide another reassembly
function, like rte_gro_reassemble, which processes one given packet at a
time and doesn't reset the hashing tables. Applications decide when to flush the
packets in the hashing tables, and rte_gro_tbl_timeout is one of the ways that
can be used to flush them. Is that what you mean?
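
I.e. roughly the following two modes (a sketch; the function names here are
only tentative):

/* lightweight mode: merge a burst, everything is flushed back at once */
nb = rte_gro_reassemble_burst(pkts, nb, tbl);

/* heavyweight mode: a packet may be kept inside the table, and the
 * application extracts it later via the timeout call */
rte_gro_reassemble(pkt, tbl);
nb = rte_gro_tbl_timeout(tbl, rte_rdtsc(), out_pkts, BURST_SZ);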
> 
> > 
> > >
> > > And if you like to keep them as helper functions:
> > >
> > > int rte_gro_port_enable(uint8_t port, struct rte_gro_tbl_param *param);
> > > void rte_gro_port_disable(uint8_t port);
> > >
> >
> > > Though from my perspective, it is out of scope of that library,
> > > and I'd leave it to the upper layer to decide which callbacks and
> > > in what order have to be installed on a particular port.
> > 
> > In my opinion, the GRO library includes two parts. One is in charge of reassembling
> > packets; the other is in charge of implementing GRO as an RX callback. And
> > rte_gro_port_enable/disable belongs to the second part. You mean we should let
> > applications register the RX callback themselves, and the GRO library just provides
> > reassembly-related functions (e.g. rte_gro_tbl_process and rte_gro_tbl_create)?
> 
> Yes, that would be my preference.
> Let the application developer decide how/when to
> call GRO functions (callback, direct call, some combination).
> 
> > 
> > >
> > > ....
> > >
> > > > diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
> > > > new file mode 100644
> > > > index 0000000..996b382
> > > > --- /dev/null
> > > > +++ b/lib/librte_gro/rte_gro.c
> > > > @@ -0,0 +1,219 @@
> > > > +#include <rte_ethdev.h>
> > > > +#include <rte_mbuf.h>
> > > > +#include <rte_hash.h>
> > > > +#include <stdint.h>
> > > > +#include <rte_malloc.h>
> > > > +
> > > > +#include "rte_gro.h"
> > > > +#include "rte_gro_common.h"
> > > > +
> > > > +gro_reassemble_fn reassemble_functions[GRO_TYPE_MAX_NB] = {NULL};
> > > > +gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB] = {NULL};
> > >
> > > Why not initialize these arrays properly at build time (here)?
> > 
> > Yes, I should declare them as static variables.
> > 
> > >
> > > > +
> > > > +struct rte_gro_status *gro_status;
> > > > +
> > > > +/**
> > > > + * Internal function. It creates one hashing table for all
> > > > + * DPDK-supported GRO types, and all of them are stored in an object
> > > > + * of struct rte_gro_tbl.
> > > > + *
> > > > + * @param name
> > > > + *  Name for GRO lookup table
> > > > + * @param nb_entries
> > > > + *  Element number of each hashing table
> > > > + * @param socket_id
> > > > + *  socket id
> > > > + * @param gro_tbl
> > > > + *  gro_tbl points to a rte_gro_tbl object, which will be initalized
> > > > + *  inside rte_gro_tbl_setup.
> > > > + * @return
> > > > + *  If create successfully, return a positive value; if not, return
> > > > + *  a negative value.
> > > > + */
> > > > +static int
> > > > +rte_gro_tbl_setup(char *name, uint32_t nb_entries,
> > > > +		uint16_t socket_id, struct rte_gro_tbl *gro_tbl)
> > > > +{
> > > > +	gro_tbl_create_fn create_tbl_fn;
> > > > +	const uint32_t len = strlen(name) + 10;
> > > > +	char tbl_name[len];
> > > > +	int i;
> > > > +
> > > > +	for (i = 0; i < GRO_SUPPORT_TYPE_NB; i++) {
> > > > +		sprintf(tbl_name, "%s_%u", name, i);
> > > > +		create_tbl_fn = tbl_create_functions[i];
> > > > +		if (create_tbl_fn && (create_tbl_fn(name,
> > > > +						nb_entries,
> > > > +						socket_id,
> > > > +						&(gro_tbl->
> > > > +							lkp_tbls[i].hash_tbl))
> > > > +					< 0)) {
> > >
> > > Shouldn't you destroy already created tables here?
> > 
> > Yes, I should add code to destroy created tables before creating new ones.
> > 
> > >
> > > > +			return -1;
> > > > +		}
> > > > +		gro_tbl->lkp_tbls[i].gro_type = i;
> > > > +	}
> > > > +	return 1;
> > > > +}
> > > > +
> > > > +/**
> > > > + * Internal function. It frees all the hashing tables stored in
> > > > + * the given struct rte_gro_tbl object.
> > > > + */
> > > > +static void
> > > > +rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl)
> > > > +{
> > > > +	int i;
> > > > +
> > > > +	if (gro_tbl == NULL)
> > > > +		return;
> > > > +	for (i = 0; i < GRO_SUPPORT_TYPE_NB; i++) {
> > > > +		rte_hash_free(gro_tbl->lkp_tbls[i].hash_tbl);
> > > > +		gro_tbl->lkp_tbls[i].hash_tbl = NULL;
> > > > +		gro_tbl->lkp_tbls[i].gro_type = GRO_EMPTY_TYPE;
> > > > +	}
> > > > +}
> > > > +
> > > > +/**
> > > > + * Internal function. It performs all supported GRO types on inputted
> > > > + * packets. For example, if current DPDK GRO supports TCP/IPv4 and
> > > > + * TCP/IPv6 GRO, this functions just reassembles TCP/IPv4 and TCP/IPv6
> > > > + * packets. Packets of unsupported GRO types won't be processed. For
> > > > + * ethernet devices, which want to support GRO, this function is used to
> > > > + * registered as RX callback for all queues.
> > > > + *
> > > > + * @param pkts
> > > > + *  Packets to reassemble.
> > > > + * @param nb_pkts
> > > > + *  The number of packets to reassemble.
> > > > + * @param gro_tbl
> > > > + *  pointer points to an object of struct rte_gro_tbl, which has been
> > > > + *  initialized by rte_gro_tbl_setup.
> > > > + * @return
> > > > + *  Packet number after GRO. If reassemble successfully, the value is
> > > > + *  less than nb_pkts; if not, the value is equal to nb_pkts. If the
> > > > + *  parameters are invalid, return 0.
> > > > + */
> > > > +static uint16_t
> > > > +rte_gro_reassemble_burst(uint8_t port __rte_unused,
> > > > +		uint16_t queue __rte_unused,
> > > > +		struct rte_mbuf **pkts,
> > > > +		uint16_t nb_pkts,
> > > > +		uint16_t max_pkts __rte_unused,
> > > > +		void *gro_tbl)
> > > > +{
> > > > +	if ((gro_tbl == NULL) || (pkts == NULL)) {
> > > > +		printf("invalid parameters for GRO.\n");
> > > > +		return 0;
> > > > +	}
> > > > +	uint16_t nb_after_gro = nb_pkts;
> > >
> > > Here and everywhere - please move variable definitions to the top of the block.
> > 
> > Thanks. I will modify them in next version.
> > 
> > >
> > > > +
> > > > +	return nb_after_gro;
> > > > +}
> > > > +
> > > > +void
> > > > +rte_gro_init(void)
> > > > +{
> > > > +	uint8_t nb_port, i;
> > > > +	uint16_t nb_queue;
> > > > +	struct rte_eth_dev_info dev_info;
> > > > +
> > > > +	/* if init already, return immediately */
> > > > +	if (gro_status) {
> > > > +		printf("repeatly init GRO environment\n");
> > > > +		return;
> > > > +	}
> > > > +
> > > > +	gro_status = (struct rte_gro_status *)rte_zmalloc(
> > > > +			NULL,
> > > > +			sizeof(struct rte_gro_status),
> > > > +			0);
> > > > +
> > > > +	nb_port = rte_eth_dev_count();
> > >
> > > Number of ports and number of configured queues per port can change dynamically.
> > > In fact, I don't think we need that function and global gro_status at all.
> > > To add/delete an RX callback for a particular port/queue, you can just scan over existing callbacks,
> > > searching for particular cb_func and cb_arg values.
> > 
> > Yes, it's better to store these parameters (e.g. hashing table pointers) as cb_arg. But
> > we can't remove an RX callback by searching for particular cb_func inside the GRO library,
> > since these operations need locking protection and the lock variable (i.e. rte_eth_rx_cb_lock)
> > is unavailable to the GRO library. To my knowledge, the only way is to provide a new API in
> > lib/librte_ether/rte_ethdev.c to support removing an RX callback via cb_func or cb_arg values.
> > You mean we need to add this API?
> 
> Yes, my thought was to add something like:
> rte_eth_find_rx_callback()       /* find a specific callback */
> or rte_eth_get_rx_callbacks()    /* get a copy of the list of all currently installed callbacks */
> 
> But if we move installing GRO callbacks out of the scope of the library, we probably wouldn't need it.
> 
> > 
> > >
> > > > +	gro_status->ports = (struct gro_port_status *)rte_zmalloc(
> > > > +			NULL,
> > > > +			nb_port * sizeof(struct gro_port_status),
> > > > +			0);
> > > > +	gro_status->nb_port = nb_port;
> > > > +
> > > > +	for (i = 0; i < nb_port; i++) {
> > > > +		rte_eth_dev_info_get(i, &dev_info);
> > > > +		nb_queue = dev_info.nb_rx_queues;
> > > > +		gro_status->ports[i].gro_tbls =
> > > > +			(struct rte_gro_tbl **)rte_zmalloc(
> > > > +					NULL,
> > > > +					nb_queue * sizeof(struct rte_gro_tbl *),
> > > > +					0);
> > > > +		gro_status->ports[i].gro_cbs =
> > > > +			(struct rte_eth_rxtx_callback **)
> > > > +			rte_zmalloc(
> > > > +					NULL,
> > > > +					nb_queue *
> > > > +					sizeof(struct rte_eth_rxtx_callback *),
> > > > +					0);
> > > > +	}
> > > > +}
> > > > +
> > > > +}
> > > ....
> > > > diff --git a/lib/librte_gro/rte_gro_common.h b/lib/librte_gro/rte_gro_common.h
> > > ....
> > >
> > > > +
> > > > +typedef int (*gro_tbl_create_fn)(
> > > > +		char *name,
> > > > +		uint32_t nb_entries,
> > > > +		uint16_t socket_id,
> > > > +		struct rte_hash **hash_tbl);
> > > > +
> > >
> > > If you have create_fn, then you'll probably need a destroy_fn too.
> > > Second thing: why always assume that a particular GRO implementation would
> > > always use rte_hash to store its data?
> > > Probably better to hide that inside something neutral, like 'struct gro_data;' or so.
> > 
> > Agree. I will modify it.
> > 
> > >
> > > > +typedef int32_t (*gro_reassemble_fn)(
> > > > +		struct rte_hash *hash_tbl,
> > > > +		struct gro_item_list *item_list);
> > > > +#endif
> > > > diff --git a/mk/rte.app.mk b/mk/rte.app.mk
> > > > index b5215c0..8956821 100644
> > > > --- a/mk/rte.app.mk
> > > > +++ b/mk/rte.app.mk
> > > > @@ -98,6 +98,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_RING)           += -lrte_ring
> > > >  _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
> > > >  _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
> > > >  _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER)        += -lrte_reorder
> > > > +_LDLIBS-$(CONFIG_RTE_LIBRTE_GRO)        	+= -lrte_gro
> > > >
> > > >  ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
> > > >  _LDLIBS-$(CONFIG_RTE_LIBRTE_KNI)            += -lrte_kni
> > > > --
> > > > 2.7.4

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
  2017-05-26  7:26             ` Jiayu Hu
@ 2017-05-26 23:10               ` Ananyev, Konstantin
  2017-05-27  3:41                 ` Jiayu Hu
  0 siblings, 1 reply; 141+ messages in thread
From: Ananyev, Konstantin @ 2017-05-26 23:10 UTC (permalink / raw)
  To: Hu, Jiayu; +Cc: dev, Wiles, Keith, yuanhan.liu

Hi Jiayu,

> -----Original Message-----
> From: Hu, Jiayu
> Sent: Friday, May 26, 2017 8:26 AM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>; yuanhan.liu@linux.intel.com
> Subject: Re: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
> 
> Hi Konstantin,
> 
> On Wed, May 24, 2017 at 08:38:25PM +0800, Ananyev, Konstantin wrote:
> >
> > Hi Jiayu,
> >
> > >
> > > Hi Konstantin,
> > >
> > > Thanks for your comments. My replies/questions are below.
> > >
> > > BRs,
> > > Jiayu
> > >
> > > On Mon, May 22, 2017 at 05:19:19PM +0800, Ananyev, Konstantin wrote:
> > > > Hi Jiayu,
> > > > My comments/questions below.
> > > > Konstantin
> > > >
> > > > >
> > > > > For applications, DPDK GRO provides three external functions to
> > > > > enable/disable GRO:
> > > > > - rte_gro_init: initialize GRO environment;
> > > > > - rte_gro_enable: enable GRO for a given port;
> > > > > - rte_gro_disable: disable GRO for a given port.
> > > > > Before using GRO, applications should explicitly call rte_gro_init to
> > > > > initialize GRO environment. After that, applications can call
> > > > > rte_gro_enable to enable GRO and call rte_gro_disable to disable GRO for
> > > > > specific ports.
> > > >
> > > > I think this is too restrictive and wouldn't meet various users' needs.
> > > > User might want to:
> > > > - enable/disable GRO for particular RX queue
> > > > - or even setup different GRO types for different RX queues,
> > > >    i.e, - GRO over IPV4/TCP for queue 0, and  GRO over IPV6/TCP for queue 1, etc.
> > >
> > > The reason for enabling/disabling GRO per-port instead of per-queue is that LINUX
> > > controls GRO per-port. To control GRO per-queue indeed can provide more flexibility
> > > to applications. But are there any scenarios that different queues of a port may
> > > require different GRO control (i.e. GRO types and enable/disable GRO)?
> >
> > I think yes.
> >
> > >
> > > > - For various reasons, user might prefer not to use RX callbacks for various reasons,
> > > >   But invoke gro() manually at somepoint in his code.
> > >
> > > An application-used GRO library can enable more flexibility to applications. Besides,
> > > when perform GRO in ethdev layer or inside PMD drivers, it is an issue that
> > > rte_eth_rx_burst returns actually received packet number or GROed packet number. And
> > > the same issue happens in GSO, and even more seriously. This is because applications
> > > , like VPP, always rely on the return value of rte_eth_tx_burst to decide further
> > > operations. If applications can direcly call GRO/GSO libraries, this issue won't exist.
> > > And DPDK is a library, which is not a holistic system like LINUX. We don't need to do
> > > the same as LINUX. Therefore, maybe it's a better idea to directly provide SW
> > > segmentation/reassembling libraries to applications.
> > >
> > > > - Many users would like to control size (number of flows/items per flow),
> > > >   max allowed packet size, max timeout, etc., for different GRO tables.
> > > > - User would need a way to flush all or only timeout packets from particular GRO tables.
> > > >
> > > > So I think that API needs to extended to become be much more fine-grained.
> > > > Something like that:
> > > >
> > > > struct rte_gro_tbl_param {
> > > >    int32_t socket_id;
> > > >    size_t max_flows;
> > > >    size_t max_items_per_flow;
> > > >    size_t max_pkt_size;
> > > >    uint64_t packet_timeout_cycles;
> > > >    <desired GRO types (IPV4_TCP | IPV6_TCP, ...)>
> > > >   <probably type specific params>
> > > >   ...
> > > > };
> > > >
> > > > struct rte_gro_tbl;
> > > > strct rte_gro_tbl *rte_gro_tbl_create(const struct rte_gro_tbl_param *param);
> > > >
> > > > void rte_gro_tbl_destroy(struct rte_gro_tbl *tbl);
> > >
> > > Yes, I agree with you. It's necessary to provide more fine-grained control APIs to
> > > applications. But what's 'packet_timeout_cycles' used for? Is it for TCP packets?
> >
> > For any packets that sits in the gro_table for too long.
> >
> > >
> > > >
> > > > /*
> > > >  * process packets, might store some packets inside the GRO table,
> > > >  * returns number of filled entries in pkt[]
> > > >  */
> > > > uint32_t rte_gro_tbl_process(struct rte_gro_tbl *tbl, struct rte_mbuf *pkt[], uint32_t num);
> > > >
> > > > /*
> > > >   * retirieves up to num timeouted packets from the table.
> > > >   */
> > > > uint32_t rtre_gro_tbl_timeout(struct rte_gro_tbl *tbl, uint64_t tmt, struct rte_mbuf *pkt[], uint32_t num);
> > >
> > > Currently, we implement GRO as RX callback, whose processing logic is simple:
> > > receive burst packets -> perform GRO -> return to application. GRO stops after
> > > finishing processing received packets. If we provide rte_gro_tbl_timeout, when
> > > and who will call it?
> >
> > I mean the following scenario:
> > We receive a packet, find it is eligible for GRO and put it into gro_table
> > in expectation - there would be more packets for the same flow.
> > But it could happen that we would never (or for some long time) receive
> > any new packets for that flow.
> > So the first packet would never be delivered to the upper layer,
> > or delivered too late.
> > So we need a mechanism to extract such not merged packets
> > and push them to the upper layer.
> > My thought it would be application responsibility to call it from time to time.
> > That's actually another reason why I think we shouldn't use application
> > to always use RX callbacks for GRO.
> 
> Currently, we only provide one reassembly function, rte_gro_reassemble_burst,
> which merges N input packets at a time. After finishing processing these
> packets, it returns all of them and resets the hashing tables. Therefore,
> there are no packets left in the hashing tables after rte_gro_reassemble_burst
> returns.

Ok, sorry, I missed that part with rte_hash_reset().
So rte_gro_reassemble_burst() performs only inline processing on the current
input packets and doesn't try to save packets for future merging, correct?
Such an approach is indeed much more lightweight and doesn't require any extra
timeouts and flush().
So my opinion: let's keep it like that - nice and simple.
BTW, I think in that case we don't need any hash tables (or any other
persistent structures) at all. What we need is just the set of GROs (tcp4,
tcp6, etc.) we want to perform on a given array of packets.

> 
> If we provide rte_gro_tbl_timeout, we also need to provide another reassembly
> function, like rte_gro_reassemble, which processes one given packet at a
> time and won't reset the hashing tables. Applications decide when to flush
> packets in the hashing tables. And rte_gro_tbl_timeout is one of the ways
> that can be used to flush the packets. Do you mean that?

Yes, that's what I meant, but as I said above - I think your approach is
probably preferable - it is much simpler and more lightweight.
Konstantin
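
To make the difference between the two models concrete, here is a hypothetical
RX path for each. The table-mode prototypes follow the proposal quoted above
(with the 'rtre_' typo corrected); the burst-mode signature is a guess, and
none of this is a final API:

#include <stdint.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST 32

struct rte_gro_tbl;
uint16_t rte_gro_reassemble_burst(struct rte_mbuf *pkts[], uint16_t nb_pkts);
uint32_t rte_gro_tbl_process(struct rte_gro_tbl *tbl,
		struct rte_mbuf *pkt[], uint32_t num);
uint32_t rte_gro_tbl_timeout(struct rte_gro_tbl *tbl, uint64_t tmt,
		struct rte_mbuf *pkt[], uint32_t num);

/* Mode 1: stateless burst GRO -- everything is merged or handed back
 * within one call; nothing is retained between bursts. */
static uint16_t
rx_gro_stateless(uint16_t port, uint16_t queue, struct rte_mbuf *pkts[])
{
	uint16_t nb_rx = rte_eth_rx_burst(port, queue, pkts, BURST);

	return rte_gro_reassemble_burst(pkts, nb_rx);
}

/* Mode 2: stateful table GRO -- packets may be held back inside the
 * table, so the application must also drain timed-out packets. */
static uint32_t
rx_gro_stateful(uint16_t port, uint16_t queue, struct rte_gro_tbl *tbl,
		struct rte_mbuf *pkts[], uint64_t timeout_cycles)
{
	uint16_t nb_rx = rte_eth_rx_burst(port, queue, pkts, BURST);
	uint32_t nb_out = rte_gro_tbl_process(tbl, pkts, nb_rx);

	/* Flush packets that waited too long for a merge partner. */
	nb_out += rte_gro_tbl_timeout(tbl, timeout_cycles,
			pkts + nb_out, BURST - nb_out);
	return nb_out;
}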
 

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
  2017-05-26 23:10               ` Ananyev, Konstantin
@ 2017-05-27  3:41                 ` Jiayu Hu
  2017-05-27 11:12                   ` Ananyev, Konstantin
  0 siblings, 1 reply; 141+ messages in thread
From: Jiayu Hu @ 2017-05-27  3:41 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev, Wiles, Keith, yuanhan.liu

On Sat, May 27, 2017 at 07:10:21AM +0800, Ananyev, Konstantin wrote:
> Hi Jiayu,
> 
> > -----Original Message-----
> > From: Hu, Jiayu
> > Sent: Friday, May 26, 2017 8:26 AM
> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>; yuanhan.liu@linux.intel.com
> > Subject: Re: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
> > 
> > Hi Konstantin,
> > 
> > On Wed, May 24, 2017 at 08:38:25PM +0800, Ananyev, Konstantin wrote:
> > >
> > > Hi Jiayu,
> > >
> > > >
> > > > Hi Konstantin,
> > > >
> > > > Thanks for your comments. My replies/questions are below.
> > > >
> > > > BRs,
> > > > Jiayu
> > > >
> > > > On Mon, May 22, 2017 at 05:19:19PM +0800, Ananyev, Konstantin wrote:
> > > > > Hi Jiayu,
> > > > > My comments/questions below.
> > > > > Konstantin
> > > > >
> > > > > >
> > > > > > For applications, DPDK GRO provides three external functions to
> > > > > > enable/disable GRO:
> > > > > > - rte_gro_init: initialize GRO environment;
> > > > > > - rte_gro_enable: enable GRO for a given port;
> > > > > > - rte_gro_disable: disable GRO for a given port.
> > > > > > Before using GRO, applications should explicitly call rte_gro_init to
> > > > > > initizalize GRO environment. After that, applications can call
> > > > > > rte_gro_enable to enable GRO and call rte_gro_disable to disable GRO for
> > > > > > specific ports.
> > > > >
> > > > > I think this is too restrictive and wouldn't meet various user's needs.
> > > > > User might want to:
> > > > > - enable/disable GRO for particular RX queue
> > > > > - or even setup different GRO types for different RX queues,
> > > > >    i.e, - GRO over IPV4/TCP for queue 0, and  GRO over IPV6/TCP for queue 1, etc.
> > > >
> > > > The reason for enabling/disabling GRO per-port instead of per-queue is that LINUX
> > > > controls GRO per-port. To control GRO per-queue indeed can provide more flexibility
> > > > to applications. But are there any scenarios that different queues of a port may
> > > > require different GRO control (i.e. GRO types and enable/disable GRO)?
> > >
> > > I think yes.
> > >
> > > >
> > > > > - For various reasons, user might prefer not to use RX callbacks for various reasons,
> > > > >   But invoke gro() manually at somepoint in his code.
> > > >
> > > > An application-used GRO library can enable more flexibility to applications. Besides,
> > > > when perform GRO in ethdev layer or inside PMD drivers, it is an issue that
> > > > rte_eth_rx_burst returns actually received packet number or GROed packet number. And
> > > > the same issue happens in GSO, and even more seriously. This is because applications
> > > > , like VPP, always rely on the return value of rte_eth_tx_burst to decide further
> > > > operations. If applications can direcly call GRO/GSO libraries, this issue won't exist.
> > > > And DPDK is a library, which is not a holistic system like LINUX. We don't need to do
> > > > the same as LINUX. Therefore, maybe it's a better idea to directly provide SW
> > > > segmentation/reassembling libraries to applications.
> > > >
> > > > > - Many users would like to control size (number of flows/items per flow),
> > > > >   max allowed packet size, max timeout, etc., for different GRO tables.
> > > > > - User would need a way to flush all or only timeout packets from particular GRO tables.
> > > > >
> > > > > So I think that API needs to extended to become be much more fine-grained.
> > > > > Something like that:
> > > > >
> > > > > struct rte_gro_tbl_param {
> > > > >    int32_t socket_id;
> > > > >    size_t max_flows;
> > > > >    size_t max_items_per_flow;
> > > > >    size_t max_pkt_size;
> > > > >    uint64_t packet_timeout_cycles;
> > > > >    <desired GRO types (IPV4_TCP | IPV6_TCP, ...)>
> > > > >   <probably type specific params>
> > > > >   ...
> > > > > };
> > > > >
> > > > > struct rte_gro_tbl;
> > > > > strct rte_gro_tbl *rte_gro_tbl_create(const struct rte_gro_tbl_param *param);
> > > > >
> > > > > void rte_gro_tbl_destroy(struct rte_gro_tbl *tbl);
> > > >
> > > > Yes, I agree with you. It's necessary to provide more fine-grained control APIs to
> > > > applications. But what's 'packet_timeout_cycles' used for? Is it for TCP packets?
> > >
> > > For any packets that sits in the gro_table for too long.
> > >
> > > >
> > > > >
> > > > > /*
> > > > >  * process packets, might store some packets inside the GRO table,
> > > > >  * returns number of filled entries in pkt[]
> > > > >  */
> > > > > uint32_t rte_gro_tbl_process(struct rte_gro_tbl *tbl, struct rte_mbuf *pkt[], uint32_t num);
> > > > >
> > > > > /*
> > > > >   * retirieves up to num timeouted packets from the table.
> > > > >   */
> > > > > uint32_t rtre_gro_tbl_timeout(struct rte_gro_tbl *tbl, uint64_t tmt, struct rte_mbuf *pkt[], uint32_t num);
> > > >
> > > > Currently, we implement GRO as RX callback, whose processing logic is simple:
> > > > receive burst packets -> perform GRO -> return to application. GRO stops after
> > > > finishing processing received packets. If we provide rte_gro_tbl_timeout, when
> > > > and who will call it?
> > >
> > > I mean the following scenario:
> > > We receive a packet, find it is eligible for GRO and put it into gro_table
> > > in expectation - there would be more packets for the same flow.
> > > But it could happen that we would never (or for some long time) receive
> > > any new packets for that flow.
> > > So the first packet would never be delivered to the upper layer,
> > > or delivered too late.
> > > So we need a mechanism to extract such not merged packets
> > > and push them to the upper layer.
> > > My thought it would be application responsibility to call it from time to time.
> > > That's actually another reason why I think we shouldn't use application
> > > to always use RX callbacks for GRO.
> > 
> > Currently, we only provide one reassembly function, rte_gro_reassemble_burst,
> > which merges N inputted packets at a time. After finishing processing these
> > packets, it returns all of them and reset hashing tables. Therefore, there
> > are no packets in hashing tables after rte_gro_reassemble_burst returns.
> 
> Ok, sorry I missed that part with rte_hash_reset().
> So gro_ressemble_burst() performs only inline processing on current input packets
> and doesn't try to save packets for future merging, correct?

Yes.

> Such approach indeed is much lightweight and doesn't require any extra timeouts and flush().
> So my opinion let's keep it like that - nice and simple.
> BTW, I think in that case we don't need any hashtables (or any other persistent strucures)at all.
> What we need is just a set of GROs (tcp4, tpc6, etc.) we want to perform on given array of packets. 

Besides the GRO types we want to perform, maybe it also needs max_pkt_size and
some GRO-type-specific information?

> 
> > 
> > If we provide rte_gro_tbl_timeout, we also need to provide another reassmebly
> > function, like rte_gro_reassemble, which processes one given packet at a
> > time and won't reset hashing tables. Applications decide when to flush packets
> > in hashing tables. And rte_gro_tbl_timeout is one of the ways that can be used
> > to flush the packets. Do you mean that?
> 
> Yes, that's what I meant, but as I said above - I think your approach is probably
> more preferable - it is much simpler and lightweight.
> Konstantin
>  

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
  2017-05-27  3:41                 ` Jiayu Hu
@ 2017-05-27 11:12                   ` Ananyev, Konstantin
  2017-05-27 14:09                     ` Jiayu Hu
  0 siblings, 1 reply; 141+ messages in thread
From: Ananyev, Konstantin @ 2017-05-27 11:12 UTC (permalink / raw)
  To: Hu, Jiayu; +Cc: dev, Wiles, Keith, yuanhan.liu



> -----Original Message-----
> From: Hu, Jiayu
> Sent: Saturday, May 27, 2017 4:42 AM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>; yuanhan.liu@linux.intel.com
> Subject: Re: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
> 
> On Sat, May 27, 2017 at 07:10:21AM +0800, Ananyev, Konstantin wrote:
> > Hi Jiayu,
> >
> > > -----Original Message-----
> > > From: Hu, Jiayu
> > > Sent: Friday, May 26, 2017 8:26 AM
> > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>; yuanhan.liu@linux.intel.com
> > > Subject: Re: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
> > >
> > > Hi Konstantin,
> > >
> > > On Wed, May 24, 2017 at 08:38:25PM +0800, Ananyev, Konstantin wrote:
> > > >
> > > > Hi Jiayu,
> > > >
> > > > >
> > > > > Hi Konstantin,
> > > > >
> > > > > Thanks for your comments. My replies/questions are below.
> > > > >
> > > > > BRs,
> > > > > Jiayu
> > > > >
> > > > > On Mon, May 22, 2017 at 05:19:19PM +0800, Ananyev, Konstantin wrote:
> > > > > > Hi Jiayu,
> > > > > > My comments/questions below.
> > > > > > Konstantin
> > > > > >
> > > > > > >
> > > > > > > For applications, DPDK GRO provides three external functions to
> > > > > > > enable/disable GRO:
> > > > > > > - rte_gro_init: initialize GRO environment;
> > > > > > > - rte_gro_enable: enable GRO for a given port;
> > > > > > > - rte_gro_disable: disable GRO for a given port.
> > > > > > > Before using GRO, applications should explicitly call rte_gro_init to
> > > > > > > initizalize GRO environment. After that, applications can call
> > > > > > > rte_gro_enable to enable GRO and call rte_gro_disable to disable GRO for
> > > > > > > specific ports.
> > > > > >
> > > > > > I think this is too restrictive and wouldn't meet various user's needs.
> > > > > > User might want to:
> > > > > > - enable/disable GRO for particular RX queue
> > > > > > - or even setup different GRO types for different RX queues,
> > > > > >    i.e, - GRO over IPV4/TCP for queue 0, and  GRO over IPV6/TCP for queue 1, etc.
> > > > >
> > > > > The reason for enabling/disabling GRO per-port instead of per-queue is that LINUX
> > > > > controls GRO per-port. To control GRO per-queue indeed can provide more flexibility
> > > > > to applications. But are there any scenarios that different queues of a port may
> > > > > require different GRO control (i.e. GRO types and enable/disable GRO)?
> > > >
> > > > I think yes.
> > > >
> > > > >
> > > > > > - For various reasons, user might prefer not to use RX callbacks for various reasons,
> > > > > >   But invoke gro() manually at somepoint in his code.
> > > > >
> > > > > An application-used GRO library can enable more flexibility to applications. Besides,
> > > > > when perform GRO in ethdev layer or inside PMD drivers, it is an issue that
> > > > > rte_eth_rx_burst returns actually received packet number or GROed packet number. And
> > > > > the same issue happens in GSO, and even more seriously. This is because applications
> > > > > , like VPP, always rely on the return value of rte_eth_tx_burst to decide further
> > > > > operations. If applications can direcly call GRO/GSO libraries, this issue won't exist.
> > > > > And DPDK is a library, which is not a holistic system like LINUX. We don't need to do
> > > > > the same as LINUX. Therefore, maybe it's a better idea to directly provide SW
> > > > > segmentation/reassembling libraries to applications.
> > > > >
> > > > > > - Many users would like to control size (number of flows/items per flow),
> > > > > >   max allowed packet size, max timeout, etc., for different GRO tables.
> > > > > > - User would need a way to flush all or only timeout packets from particular GRO tables.
> > > > > >
> > > > > > So I think that API needs to extended to become be much more fine-grained.
> > > > > > Something like that:
> > > > > >
> > > > > > struct rte_gro_tbl_param {
> > > > > >    int32_t socket_id;
> > > > > >    size_t max_flows;
> > > > > >    size_t max_items_per_flow;
> > > > > >    size_t max_pkt_size;
> > > > > >    uint64_t packet_timeout_cycles;
> > > > > >    <desired GRO types (IPV4_TCP | IPV6_TCP, ...)>
> > > > > >   <probably type specific params>
> > > > > >   ...
> > > > > > };
> > > > > >
> > > > > > struct rte_gro_tbl;
> > > > > > strct rte_gro_tbl *rte_gro_tbl_create(const struct rte_gro_tbl_param *param);
> > > > > >
> > > > > > void rte_gro_tbl_destroy(struct rte_gro_tbl *tbl);
> > > > >
> > > > > Yes, I agree with you. It's necessary to provide more fine-grained control APIs to
> > > > > applications. But what's 'packet_timeout_cycles' used for? Is it for TCP packets?
> > > >
> > > > For any packets that sits in the gro_table for too long.
> > > >
> > > > >
> > > > > >
> > > > > > /*
> > > > > >  * process packets, might store some packets inside the GRO table,
> > > > > >  * returns number of filled entries in pkt[]
> > > > > >  */
> > > > > > uint32_t rte_gro_tbl_process(struct rte_gro_tbl *tbl, struct rte_mbuf *pkt[], uint32_t num);
> > > > > >
> > > > > > /*
> > > > > >   * retirieves up to num timeouted packets from the table.
> > > > > >   */
> > > > > > uint32_t rtre_gro_tbl_timeout(struct rte_gro_tbl *tbl, uint64_t tmt, struct rte_mbuf *pkt[], uint32_t num);
> > > > >
> > > > > Currently, we implement GRO as RX callback, whose processing logic is simple:
> > > > > receive burst packets -> perform GRO -> return to application. GRO stops after
> > > > > finishing processing received packets. If we provide rte_gro_tbl_timeout, when
> > > > > and who will call it?
> > > >
> > > > I mean the following scenario:
> > > > We receive a packet, find it is eligible for GRO and put it into gro_table
> > > > in expectation - there would be more packets for the same flow.
> > > > But it could happen that we would never (or for some long time) receive
> > > > any new packets for that flow.
> > > > So the first packet would never be delivered to the upper layer,
> > > > or delivered too late.
> > > > So we need a mechanism to extract such not merged packets
> > > > and push them to the upper layer.
> > > > My thought it would be application responsibility to call it from time to time.
> > > > That's actually another reason why I think we shouldn't use application
> > > > to always use RX callbacks for GRO.
> > >
> > > Currently, we only provide one reassembly function, rte_gro_reassemble_burst,
> > > which merges N inputted packets at a time. After finishing processing these
> > > packets, it returns all of them and reset hashing tables. Therefore, there
> > > are no packets in hashing tables after rte_gro_reassemble_burst returns.
> >
> > Ok, sorry I missed that part with rte_hash_reset().
> > So gro_ressemble_burst() performs only inline processing on current input packets
> > and doesn't try to save packets for future merging, correct?
> 
> Yes.
> 
> > Such approach indeed is much lightweight and doesn't require any extra timeouts and flush().
> > So my opinion let's keep it like that - nice and simple.
> > BTW, I think in that case we don't need any hashtables (or any other persistent strucures)at all.
> > What we need is just a set of GROs (tcp4, tpc6, etc.) we want to perform on given array of packets.
> 
> Beside GRO types that are desired to perform, maybe it also needs max_pkt_size and
> some GRO type specific information?

Yes, but we don't need the actual hash tables, etc. inside.
Passing something like struct gro_param seems enough.

> 
> >
> > >
> > > If we provide rte_gro_tbl_timeout, we also need to provide another reassmebly
> > > function, like rte_gro_reassemble, which processes one given packet at a
> > > time and won't reset hashing tables. Applications decide when to flush packets
> > > in hashing tables. And rte_gro_tbl_timeout is one of the ways that can be used
> > > to flush the packets. Do you mean that?
> >
> > Yes, that's what I meant, but as I said above - I think your approach is probably
> > more preferable - it is much simpler and lightweight.
> > Konstantin
> >

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
  2017-05-27 11:12                   ` Ananyev, Konstantin
@ 2017-05-27 14:09                     ` Jiayu Hu
  2017-05-27 16:51                       ` Ananyev, Konstantin
  0 siblings, 1 reply; 141+ messages in thread
From: Jiayu Hu @ 2017-05-27 14:09 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev, Wiles, Keith, yuanhan.liu

Hi Konstantin,

On Sat, May 27, 2017 at 07:12:16PM +0800, Ananyev, Konstantin wrote:
> 
> 
> > -----Original Message-----
> > From: Hu, Jiayu
> > Sent: Saturday, May 27, 2017 4:42 AM
> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>; yuanhan.liu@linux.intel.com
> > Subject: Re: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
> > 
> > On Sat, May 27, 2017 at 07:10:21AM +0800, Ananyev, Konstantin wrote:
> > > Hi Jiayu,
> > >
> > > > -----Original Message-----
> > > > From: Hu, Jiayu
> > > > Sent: Friday, May 26, 2017 8:26 AM
> > > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > > Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>; yuanhan.liu@linux.intel.com
> > > > Subject: Re: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
> > > >
> > > > Hi Konstantin,
> > > >
> > > > On Wed, May 24, 2017 at 08:38:25PM +0800, Ananyev, Konstantin wrote:
> > > > >
> > > > > Hi Jiayu,
> > > > >
> > > > > >
> > > > > > Hi Konstantin,
> > > > > >
> > > > > > Thanks for your comments. My replies/questions are below.
> > > > > >
> > > > > > BRs,
> > > > > > Jiayu
> > > > > >
> > > > > > On Mon, May 22, 2017 at 05:19:19PM +0800, Ananyev, Konstantin wrote:
> > > > > > > Hi Jiayu,
> > > > > > > My comments/questions below.
> > > > > > > Konstantin
> > > > > > >
> > > > > > > >
> > > > > > > > For applications, DPDK GRO provides three external functions to
> > > > > > > > enable/disable GRO:
> > > > > > > > - rte_gro_init: initialize GRO environment;
> > > > > > > > - rte_gro_enable: enable GRO for a given port;
> > > > > > > > - rte_gro_disable: disable GRO for a given port.
> > > > > > > > Before using GRO, applications should explicitly call rte_gro_init to
> > > > > > > > initizalize GRO environment. After that, applications can call
> > > > > > > > rte_gro_enable to enable GRO and call rte_gro_disable to disable GRO for
> > > > > > > > specific ports.
> > > > > > >
> > > > > > > I think this is too restrictive and wouldn't meet various user's needs.
> > > > > > > User might want to:
> > > > > > > - enable/disable GRO for particular RX queue
> > > > > > > - or even setup different GRO types for different RX queues,
> > > > > > >    i.e, - GRO over IPV4/TCP for queue 0, and  GRO over IPV6/TCP for queue 1, etc.
> > > > > >
> > > > > > The reason for enabling/disabling GRO per-port instead of per-queue is that LINUX
> > > > > > controls GRO per-port. To control GRO per-queue indeed can provide more flexibility
> > > > > > to applications. But are there any scenarios that different queues of a port may
> > > > > > require different GRO control (i.e. GRO types and enable/disable GRO)?
> > > > >
> > > > > I think yes.
> > > > >
> > > > > >
> > > > > > > - For various reasons, user might prefer not to use RX callbacks for various reasons,
> > > > > > >   But invoke gro() manually at somepoint in his code.
> > > > > >
> > > > > > An application-used GRO library can enable more flexibility to applications. Besides,
> > > > > > when perform GRO in ethdev layer or inside PMD drivers, it is an issue that
> > > > > > rte_eth_rx_burst returns actually received packet number or GROed packet number. And
> > > > > > the same issue happens in GSO, and even more seriously. This is because applications
> > > > > > , like VPP, always rely on the return value of rte_eth_tx_burst to decide further
> > > > > > operations. If applications can direcly call GRO/GSO libraries, this issue won't exist.
> > > > > > And DPDK is a library, which is not a holistic system like LINUX. We don't need to do
> > > > > > the same as LINUX. Therefore, maybe it's a better idea to directly provide SW
> > > > > > segmentation/reassembling libraries to applications.
> > > > > >
> > > > > > > - Many users would like to control size (number of flows/items per flow),
> > > > > > >   max allowed packet size, max timeout, etc., for different GRO tables.
> > > > > > > - User would need a way to flush all or only timeout packets from particular GRO tables.
> > > > > > >
> > > > > > > So I think that API needs to extended to become be much more fine-grained.
> > > > > > > Something like that:
> > > > > > >
> > > > > > > struct rte_gro_tbl_param {
> > > > > > >    int32_t socket_id;
> > > > > > >    size_t max_flows;
> > > > > > >    size_t max_items_per_flow;
> > > > > > >    size_t max_pkt_size;
> > > > > > >    uint64_t packet_timeout_cycles;
> > > > > > >    <desired GRO types (IPV4_TCP | IPV6_TCP, ...)>
> > > > > > >   <probably type specific params>
> > > > > > >   ...
> > > > > > > };
> > > > > > >
> > > > > > > struct rte_gro_tbl;
> > > > > > > strct rte_gro_tbl *rte_gro_tbl_create(const struct rte_gro_tbl_param *param);
> > > > > > >
> > > > > > > void rte_gro_tbl_destroy(struct rte_gro_tbl *tbl);
> > > > > >
> > > > > > Yes, I agree with you. It's necessary to provide more fine-grained control APIs to
> > > > > > applications. But what's 'packet_timeout_cycles' used for? Is it for TCP packets?
> > > > >
> > > > > For any packets that sits in the gro_table for too long.
> > > > >
> > > > > >
> > > > > > >
> > > > > > > /*
> > > > > > >  * process packets, might store some packets inside the GRO table,
> > > > > > >  * returns number of filled entries in pkt[]
> > > > > > >  */
> > > > > > > uint32_t rte_gro_tbl_process(struct rte_gro_tbl *tbl, struct rte_mbuf *pkt[], uint32_t num);
> > > > > > >
> > > > > > > /*
> > > > > > >   * retirieves up to num timeouted packets from the table.
> > > > > > >   */
> > > > > > > uint32_t rtre_gro_tbl_timeout(struct rte_gro_tbl *tbl, uint64_t tmt, struct rte_mbuf *pkt[], uint32_t num);
> > > > > >
> > > > > > Currently, we implement GRO as RX callback, whose processing logic is simple:
> > > > > > receive burst packets -> perform GRO -> return to application. GRO stops after
> > > > > > finishing processing received packets. If we provide rte_gro_tbl_timeout, when
> > > > > > and who will call it?
> > > > >
> > > > > I mean the following scenario:
> > > > > We receive a packet, find it is eligible for GRO and put it into gro_table
> > > > > in expectation - there would be more packets for the same flow.
> > > > > But it could happen that we would never (or for some long time) receive
> > > > > any new packets for that flow.
> > > > > So the first packet would never be delivered to the upper layer,
> > > > > or delivered too late.
> > > > > So we need a mechanism to extract such not merged packets
> > > > > and push them to the upper layer.
> > > > > My thought it would be application responsibility to call it from time to time.
> > > > > That's actually another reason why I think we shouldn't use application
> > > > > to always use RX callbacks for GRO.
> > > >
> > > > Currently, we only provide one reassembly function, rte_gro_reassemble_burst,
> > > > which merges N inputted packets at a time. After finishing processing these
> > > > packets, it returns all of them and reset hashing tables. Therefore, there
> > > > are no packets in hashing tables after rte_gro_reassemble_burst returns.
> > >
> > > Ok, sorry I missed that part with rte_hash_reset().
> > > So gro_ressemble_burst() performs only inline processing on current input packets
> > > and doesn't try to save packets for future merging, correct?
> > 
> > Yes.
> > 
> > > Such approach indeed is much lightweight and doesn't require any extra timeouts and flush().
> > > So my opinion let's keep it like that - nice and simple.
> > > BTW, I think in that case we don't need any hashtables (or any other persistent strucures)at all.
> > > What we need is just a set of GROs (tcp4, tpc6, etc.) we want to perform on given array of packets.
> > 
> > Beside GRO types that are desired to perform, maybe it also needs max_pkt_size and
> > some GRO type specific information?
> 
> Yes, but we don't need the actual hash-tables, etc. inside.
> Passing something like struct gro_param seems enough.

Yes, we can just pass gro_param and allocate hashing tables
inside rte_gro_reassemble_burst. If so, the hashing tables of
the desired GRO types are created and freed in each invocation
of rte_gro_reassemble_burst. In the GRO library, hashing tables
are created by the GRO-type-specific gro_tbl_create_fn. These
gro_tbl_create_fn callbacks may allocate hashing table space via
malloc (or rte_malloc). Therefore, we would call malloc/free on
every invocation of rte_gro_reassemble_burst. In my opinion, that
will degrade GRO performance greatly.

But if we ask applications to pass in the hashing tables, all we
need to do is reset them after we finish using them in
rte_gro_reassemble_burst, rather than malloc and free them each
time. Therefore, I think this way is more efficient. What do you
think?

> 
> > 
> > >
> > > >
> > > > If we provide rte_gro_tbl_timeout, we also need to provide another reassmebly
> > > > function, like rte_gro_reassemble, which processes one given packet at a
> > > > time and won't reset hashing tables. Applications decide when to flush packets
> > > > in hashing tables. And rte_gro_tbl_timeout is one of the ways that can be used
> > > > to flush the packets. Do you mean that?
> > >
> > > Yes, that's what I meant, but as I said above - I think your approach is probably
> > > more preferable - it is much simpler and lightweight.
> > > Konstantin
> > >
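
A sketch of the reuse pattern argued for above: the application creates the
table once at initialization and the library only resets it per burst, so no
malloc/free appears on the data path. The prototypes are shaped after the
gro_tbl_create_fn typedef quoted earlier in this thread and are illustrative,
not a final API:

#include <stdint.h>
#include <rte_ethdev.h>
#include <rte_hash.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>

#define BURST 32

int rte_gro_tcp4_tbl_create(char *name, uint32_t nb_entries,
		uint16_t socket_id, struct rte_hash **hash_tbl);
uint16_t rte_gro_reassemble_burst(struct rte_hash *tbl,
		struct rte_mbuf *pkts[], uint16_t nb_pkts);

static void
rx_loop(uint16_t port, uint16_t queue)
{
	struct rte_mbuf *pkts[BURST];
	struct rte_hash *tbl;

	/* Create the hashing table once, outside the data path. */
	if (rte_gro_tcp4_tbl_create("gro_tcp4", 1024,
			rte_socket_id(), &tbl) < 0)
		return;

	for (;;) {
		uint16_t nb_rx = rte_eth_rx_burst(port, queue, pkts, BURST);

		/* Merges what it can, then ends with rte_hash_reset(tbl),
		 * which empties the table without freeing it. */
		nb_rx = rte_gro_reassemble_burst(tbl, pkts, nb_rx);
		(void)nb_rx; /* forward the merged packets here */
	}
}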

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
  2017-05-27 14:09                     ` Jiayu Hu
@ 2017-05-27 16:51                       ` Ananyev, Konstantin
  2017-05-29 10:22                         ` Hu, Jiayu
  0 siblings, 1 reply; 141+ messages in thread
From: Ananyev, Konstantin @ 2017-05-27 16:51 UTC (permalink / raw)
  To: Hu, Jiayu; +Cc: dev, Wiles, Keith, yuanhan.liu


Hi Jiayu,

> 
> Hi Konstantin,
> 
> On Sat, May 27, 2017 at 07:12:16PM +0800, Ananyev, Konstantin wrote:
> >
> >
> > > -----Original Message-----
> > > From: Hu, Jiayu
> > > Sent: Saturday, May 27, 2017 4:42 AM
> > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>; yuanhan.liu@linux.intel.com
> > > Subject: Re: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
> > >
> > > On Sat, May 27, 2017 at 07:10:21AM +0800, Ananyev, Konstantin wrote:
> > > > Hi Jiayu,
> > > >
> > > > > -----Original Message-----
> > > > > From: Hu, Jiayu
> > > > > Sent: Friday, May 26, 2017 8:26 AM
> > > > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > > > Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>; yuanhan.liu@linux.intel.com
> > > > > Subject: Re: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
> > > > >
> > > > > Hi Konstantin,
> > > > >
> > > > > On Wed, May 24, 2017 at 08:38:25PM +0800, Ananyev, Konstantin wrote:
> > > > > >
> > > > > > Hi Jiayu,
> > > > > >
> > > > > > >
> > > > > > > Hi Konstantin,
> > > > > > >
> > > > > > > Thanks for your comments. My replies/questions are below.
> > > > > > >
> > > > > > > BRs,
> > > > > > > Jiayu
> > > > > > >
> > > > > > > On Mon, May 22, 2017 at 05:19:19PM +0800, Ananyev, Konstantin wrote:
> > > > > > > > Hi Jiayu,
> > > > > > > > My comments/questions below.
> > > > > > > > Konstantin
> > > > > > > >
> > > > > > > > >
> > > > > > > > > For applications, DPDK GRO provides three external functions to
> > > > > > > > > enable/disable GRO:
> > > > > > > > > - rte_gro_init: initialize GRO environment;
> > > > > > > > > - rte_gro_enable: enable GRO for a given port;
> > > > > > > > > - rte_gro_disable: disable GRO for a given port.
> > > > > > > > > Before using GRO, applications should explicitly call rte_gro_init to
> > > > > > > > > initizalize GRO environment. After that, applications can call
> > > > > > > > > rte_gro_enable to enable GRO and call rte_gro_disable to disable GRO for
> > > > > > > > > specific ports.
> > > > > > > >
> > > > > > > > I think this is too restrictive and wouldn't meet various user's needs.
> > > > > > > > User might want to:
> > > > > > > > - enable/disable GRO for particular RX queue
> > > > > > > > - or even setup different GRO types for different RX queues,
> > > > > > > >    i.e, - GRO over IPV4/TCP for queue 0, and  GRO over IPV6/TCP for queue 1, etc.
> > > > > > >
> > > > > > > The reason for enabling/disabling GRO per-port instead of per-queue is that LINUX
> > > > > > > controls GRO per-port. To control GRO per-queue indeed can provide more flexibility
> > > > > > > to applications. But are there any scenarios that different queues of a port may
> > > > > > > require different GRO control (i.e. GRO types and enable/disable GRO)?
> > > > > >
> > > > > > I think yes.
> > > > > >
> > > > > > >
> > > > > > > > - For various reasons, user might prefer not to use RX callbacks for various reasons,
> > > > > > > >   But invoke gro() manually at somepoint in his code.
> > > > > > >
> > > > > > > An application-used GRO library can enable more flexibility to applications. Besides,
> > > > > > > when perform GRO in ethdev layer or inside PMD drivers, it is an issue that
> > > > > > > rte_eth_rx_burst returns actually received packet number or GROed packet number. And
> > > > > > > the same issue happens in GSO, and even more seriously. This is because applications
> > > > > > > , like VPP, always rely on the return value of rte_eth_tx_burst to decide further
> > > > > > > operations. If applications can direcly call GRO/GSO libraries, this issue won't exist.
> > > > > > > And DPDK is a library, which is not a holistic system like LINUX. We don't need to do
> > > > > > > the same as LINUX. Therefore, maybe it's a better idea to directly provide SW
> > > > > > > segmentation/reassembling libraries to applications.
> > > > > > >
> > > > > > > > - Many users would like to control size (number of flows/items per flow),
> > > > > > > >   max allowed packet size, max timeout, etc., for different GRO tables.
> > > > > > > > - User would need a way to flush all or only timeout packets from particular GRO tables.
> > > > > > > >
> > > > > > > > So I think that API needs to extended to become be much more fine-grained.
> > > > > > > > Something like that:
> > > > > > > >
> > > > > > > > struct rte_gro_tbl_param {
> > > > > > > >    int32_t socket_id;
> > > > > > > >    size_t max_flows;
> > > > > > > >    size_t max_items_per_flow;
> > > > > > > >    size_t max_pkt_size;
> > > > > > > >    uint64_t packet_timeout_cycles;
> > > > > > > >    <desired GRO types (IPV4_TCP | IPV6_TCP, ...)>
> > > > > > > >   <probably type specific params>
> > > > > > > >   ...
> > > > > > > > };
> > > > > > > >
> > > > > > > > struct rte_gro_tbl;
> > > > > > > > strct rte_gro_tbl *rte_gro_tbl_create(const struct rte_gro_tbl_param *param);
> > > > > > > >
> > > > > > > > void rte_gro_tbl_destroy(struct rte_gro_tbl *tbl);
> > > > > > >
> > > > > > > Yes, I agree with you. It's necessary to provide more fine-grained control APIs to
> > > > > > > applications. But what's 'packet_timeout_cycles' used for? Is it for TCP packets?
> > > > > >
> > > > > > For any packets that sits in the gro_table for too long.
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > /*
> > > > > > > >  * process packets, might store some packets inside the GRO table,
> > > > > > > >  * returns number of filled entries in pkt[]
> > > > > > > >  */
> > > > > > > > uint32_t rte_gro_tbl_process(struct rte_gro_tbl *tbl, struct rte_mbuf *pkt[], uint32_t num);
> > > > > > > >
> > > > > > > > /*
> > > > > > > >   * retirieves up to num timeouted packets from the table.
> > > > > > > >   */
> > > > > > > > uint32_t rtre_gro_tbl_timeout(struct rte_gro_tbl *tbl, uint64_t tmt, struct rte_mbuf *pkt[], uint32_t num);
> > > > > > >
> > > > > > > Currently, we implement GRO as RX callback, whose processing logic is simple:
> > > > > > > receive burst packets -> perform GRO -> return to application. GRO stops after
> > > > > > > finishing processing received packets. If we provide rte_gro_tbl_timeout, when
> > > > > > > and who will call it?
> > > > > >
> > > > > > I mean the following scenario:
> > > > > > We receive a packet, find it is eligible for GRO and put it into gro_table
> > > > > > in expectation - there would be more packets for the same flow.
> > > > > > But it could happen that we would never (or for some long time) receive
> > > > > > any new packets for that flow.
> > > > > > So the first packet would never be delivered to the upper layer,
> > > > > > or delivered too late.
> > > > > > So we need a mechanism to extract such not merged packets
> > > > > > and push them to the upper layer.
> > > > > > My thought it would be application responsibility to call it from time to time.
> > > > > > That's actually another reason why I think we shouldn't use application
> > > > > > to always use RX callbacks for GRO.
> > > > >
> > > > > Currently, we only provide one reassembly function, rte_gro_reassemble_burst,
> > > > > which merges N inputted packets at a time. After finishing processing these
> > > > > packets, it returns all of them and reset hashing tables. Therefore, there
> > > > > are no packets in hashing tables after rte_gro_reassemble_burst returns.
> > > >
> > > > Ok, sorry I missed that part with rte_hash_reset().
> > > > So gro_ressemble_burst() performs only inline processing on current input packets
> > > > and doesn't try to save packets for future merging, correct?
> > >
> > > Yes.
> > >
> > > > Such approach indeed is much lightweight and doesn't require any extra timeouts and flush().
> > > > So my opinion let's keep it like that - nice and simple.
> > > > BTW, I think in that case we don't need any hashtables (or any other persistent strucures)at all.
> > > > What we need is just a set of GROs (tcp4, tpc6, etc.) we want to perform on given array of packets.
> > >
> > > Beside GRO types that are desired to perform, maybe it also needs max_pkt_size and
> > > some GRO type specific information?
> >
> > Yes, but we don't need the actual hash-tables, etc. inside.
> > Passing something like struct gro_param seems enough.
> 
> Yes, we can just pass gro_param and allocate hashing tables
> inside rte_gro_reassemble_burst. If so, hashing tables of
> desired GRO types are created and freed in each invocation
> of rte_gro_reassemble_burst. In GRO library, hashing tables
> are created by GRO type specific gro_tbl_create_fn. These
> gro_tbl_create_fn may allocate hashing table space via malloc
> (or rte_malloc). Therefore, we may frequently call malloc/free
> when using rte_gro_reassemble_burst. In my opinion, it will
> degrade GRO performance greatly.

I don't understand why we need to put/extract each packet into/from a hash
table at all.
We have N input packets that need to be grouped/sorted by some criteria.
Surely that can be done without any hash table involved.
What is the need for a hash table and all the overhead it brings here?
Konstantin

> 
> But if we ask applications to input hashing tables, what we
> need to do is to reset them after finishing using in
> rte_gro_reassemble_burst, rather than to malloc and free each
> time. Therefore, I think this way is more efficient. How do you
> think?
> 
> >
> > >
> > > >
> > > > >
> > > > > If we provide rte_gro_tbl_timeout, we also need to provide another reassmebly
> > > > > function, like rte_gro_reassemble, which processes one given packet at a
> > > > > time and won't reset hashing tables. Applications decide when to flush packets
> > > > > in hashing tables. And rte_gro_tbl_timeout is one of the ways that can be used
> > > > > to flush the packets. Do you mean that?
> > > >
> > > > Yes, that's what I meant, but as I said above - I think your approach is probably
> > > > more preferable - it is much simpler and lightweight.
> > > > Konstantin
> > > >
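
For a burst of a few dozen mbufs, the hash-free grouping suggested above can
be a simple pairwise scan; a sketch, with same_flow() and try_merge() as
assumed placeholder helpers rather than anything from the patchset:

#include <stdbool.h>
#include <stdint.h>
#include <rte_mbuf.h>

/* Placeholder helpers: flow-key comparison and TCP merge attempt. */
bool same_flow(struct rte_mbuf *a, struct rte_mbuf *b);
bool try_merge(struct rte_mbuf *head, struct rte_mbuf *pkt);

static uint16_t
gro_burst_no_hash(struct rte_mbuf *pkts[], uint16_t nb)
{
	uint16_t i, j, out = 0;

	for (i = 0; i < nb; i++) {
		struct rte_mbuf *pkt = pkts[i];

		for (j = 0; j < out; j++) {
			if (same_flow(pkts[j], pkt) &&
					try_merge(pkts[j], pkt))
				break;	/* merged into pkts[j] */
		}
		if (j == out)
			pkts[out++] = pkt; /* kept as its own head */
	}
	/* O(N^2) comparisons overall, which is cheap for N <= 32..64;
	 * Jiayu's reply below weighs this against a hash lookup. */
	return out;
}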

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
  2017-05-27 16:51                       ` Ananyev, Konstantin
@ 2017-05-29 10:22                         ` Hu, Jiayu
  2017-05-29 12:18                           ` Bruce Richardson
  2017-05-29 12:51                           ` Ananyev, Konstantin
  0 siblings, 2 replies; 141+ messages in thread
From: Hu, Jiayu @ 2017-05-29 10:22 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev, Wiles, Keith, yuanhan.liu

Hi Konstantin,

> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Sunday, May 28, 2017 12:51 AM
> To: Hu, Jiayu <jiayu.hu@intel.com>
> Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>;
> yuanhan.liu@linux.intel.com
> Subject: RE: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
> 
> 
> Hi Jiayu,
> 
> >
> > Hi Konstantin,
> >
> > On Sat, May 27, 2017 at 07:12:16PM +0800, Ananyev, Konstantin wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Hu, Jiayu
> > > > Sent: Saturday, May 27, 2017 4:42 AM
> > > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > > Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>;
> yuanhan.liu@linux.intel.com
> > > > Subject: Re: [PATCH v3 1/3] lib: add Generic Receive Offload API
> framework
> > > >
> > > > On Sat, May 27, 2017 at 07:10:21AM +0800, Ananyev, Konstantin wrote:
> > > > > Hi Jiayu,
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Hu, Jiayu
> > > > > > Sent: Friday, May 26, 2017 8:26 AM
> > > > > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > > > > Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>;
> yuanhan.liu@linux.intel.com
> > > > > > Subject: Re: [PATCH v3 1/3] lib: add Generic Receive Offload API
> framework
> > > > > >
> > > > > > Hi Konstantin,
> > > > > >
> > > > > > On Wed, May 24, 2017 at 08:38:25PM +0800, Ananyev, Konstantin
> wrote:
> > > > > > >
> > > > > > > Hi Jiayu,
> > > > > > >
> > > > > > > >
> > > > > > > > Hi Konstantin,
> > > > > > > >
> > > > > > > > Thanks for your comments. My replies/questions are below.
> > > > > > > >
> > > > > > > > BRs,
> > > > > > > > Jiayu
> > > > > > > >
> > > > > > > > On Mon, May 22, 2017 at 05:19:19PM +0800, Ananyev,
> Konstantin wrote:
> > > > > > > > > Hi Jiayu,
> > > > > > > > > My comments/questions below.
> > > > > > > > > Konstantin
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > For applications, DPDK GRO provides three external functions
> to
> > > > > > > > > > enable/disable GRO:
> > > > > > > > > > - rte_gro_init: initialize GRO environment;
> > > > > > > > > > - rte_gro_enable: enable GRO for a given port;
> > > > > > > > > > - rte_gro_disable: disable GRO for a given port.
> > > > > > > > > > Before using GRO, applications should explicitly call
> rte_gro_init to
> > > > > > > > > > initizalize GRO environment. After that, applications can call
> > > > > > > > > > rte_gro_enable to enable GRO and call rte_gro_disable to
> disable GRO for
> > > > > > > > > > specific ports.
> > > > > > > > >
> > > > > > > > > I think this is too restrictive and wouldn't meet various user's
> needs.
> > > > > > > > > User might want to:
> > > > > > > > > - enable/disable GRO for particular RX queue
> > > > > > > > > - or even setup different GRO types for different RX queues,
> > > > > > > > >    i.e, - GRO over IPV4/TCP for queue 0, and  GRO over
> IPV6/TCP for queue 1, etc.
> > > > > > > >
> > > > > > > > The reason for enabling/disabling GRO per-port instead of per-
> queue is that LINUX
> > > > > > > > controls GRO per-port. To control GRO per-queue indeed can
> provide more flexibility
> > > > > > > > to applications. But are there any scenarios that different
> queues of a port may
> > > > > > > > require different GRO control (i.e. GRO types and enable/disable
> GRO)?
> > > > > > >
> > > > > > > I think yes.
> > > > > > >
> > > > > > > >
> > > > > > > > > - For various reasons, user might prefer not to use RX callbacks
> for various reasons,
> > > > > > > > >   But invoke gro() manually at somepoint in his code.
> > > > > > > >
> > > > > > > > An application-used GRO library can enable more flexibility to
> applications. Besides,
> > > > > > > > when perform GRO in ethdev layer or inside PMD drivers, it is an
> issue that
> > > > > > > > rte_eth_rx_burst returns actually received packet number or
> GROed packet number. And
> > > > > > > > the same issue happens in GSO, and even more seriously. This is
> because applications
> > > > > > > > , like VPP, always rely on the return value of rte_eth_tx_burst to
> decide further
> > > > > > > > operations. If applications can direcly call GRO/GSO libraries,
> this issue won't exist.
> > > > > > > > And DPDK is a library, which is not a holistic system like LINUX.
> We don't need to do
> > > > > > > > the same as LINUX. Therefore, maybe it's a better idea to
> directly provide SW
> > > > > > > > segmentation/reassembling libraries to applications.
> > > > > > > >
> > > > > > > > > - Many users would like to control size (number of flows/items
> per flow),
> > > > > > > > >   max allowed packet size, max timeout, etc., for different GRO
> tables.
> > > > > > > > > - User would need a way to flush all or only timeout packets
> from particular GRO tables.
> > > > > > > > >
> > > > > > > > > So I think that API needs to extended to become be much more
> fine-grained.
> > > > > > > > > Something like that:
> > > > > > > > >
> > > > > > > > > struct rte_gro_tbl_param {
> > > > > > > > >    int32_t socket_id;
> > > > > > > > >    size_t max_flows;
> > > > > > > > >    size_t max_items_per_flow;
> > > > > > > > >    size_t max_pkt_size;
> > > > > > > > >    uint64_t packet_timeout_cycles;
> > > > > > > > >    <desired GRO types (IPV4_TCP | IPV6_TCP, ...)>
> > > > > > > > >   <probably type specific params>
> > > > > > > > >   ...
> > > > > > > > > };
> > > > > > > > >
> > > > > > > > > struct rte_gro_tbl;
> > > > > > > > > strct rte_gro_tbl *rte_gro_tbl_create(const struct
> rte_gro_tbl_param *param);
> > > > > > > > >
> > > > > > > > > void rte_gro_tbl_destroy(struct rte_gro_tbl *tbl);
> > > > > > > >
> > > > > > > > Yes, I agree with you. It's necessary to provide more fine-
> grained control APIs to
> > > > > > > > applications. But what's 'packet_timeout_cycles' used for? Is it
> for TCP packets?
> > > > > > >
> > > > > > > For any packets that sits in the gro_table for too long.
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > /*
> > > > > > > > >  * process packets, might store some packets inside the GRO
> table,
> > > > > > > > >  * returns number of filled entries in pkt[]
> > > > > > > > >  */
> > > > > > > > > uint32_t rte_gro_tbl_process(struct rte_gro_tbl *tbl, struct
> rte_mbuf *pkt[], uint32_t num);
> > > > > > > > >
> > > > > > > > > /*
> > > > > > > > >   * retirieves up to num timeouted packets from the table.
> > > > > > > > >   */
> > > > > > > > > uint32_t rtre_gro_tbl_timeout(struct rte_gro_tbl *tbl, uint64_t
> tmt, struct rte_mbuf *pkt[], uint32_t num);
> > > > > > > >
> > > > > > > > Currently, we implement GRO as RX callback, whose processing
> logic is simple:
> > > > > > > > receive burst packets -> perform GRO -> return to application.
> GRO stops after
> > > > > > > > finishing processing received packets. If we provide
> rte_gro_tbl_timeout, when
> > > > > > > > and who will call it?
> > > > > > >
> > > > > > > I mean the following scenario:
> > > > > > > We receive a packet, find it is eligible for GRO and put it into
> gro_table
> > > > > > > in expectation - there would be more packets for the same flow.
> > > > > > > But it could happen that we would never (or for some long time)
> receive
> > > > > > > any new packets for that flow.
> > > > > > > So the first packet would never be delivered to the upper layer,
> > > > > > > or delivered too late.
> > > > > > > So we need a mechanism to extract such not merged packets
> > > > > > > and push them to the upper layer.
> > > > > > > My thought it would be application responsibility to call it from
> time to time.
> > > > > > > That's actually another reason why I think we shouldn't use
> application
> > > > > > > to always use RX callbacks for GRO.
> > > > > >
> > > > > > Currently, we only provide one reassembly function,
> rte_gro_reassemble_burst,
> > > > > > which merges N inputted packets at a time. After finishing
> processing these
> > > > > > packets, it returns all of them and reset hashing tables. Therefore,
> there
> > > > > > are no packets in hashing tables after rte_gro_reassemble_burst
> returns.
> > > > >
> > > > > Ok, sorry I missed that part with rte_hash_reset().
> > > > > So gro_ressemble_burst() performs only inline processing on current
> input packets
> > > > > and doesn't try to save packets for future merging, correct?
> > > >
> > > > Yes.
> > > >
> > > > > Such approach indeed is much lightweight and doesn't require any
> extra timeouts and flush().
> > > > > So my opinion let's keep it like that - nice and simple.
> > > > > BTW, I think in that case we don't need any hashtables (or any other
> persistent strucures)at all.
> > > > > What we need is just a set of GROs (tcp4, tpc6, etc.) we want to
> perform on given array of packets.
> > > >
> > > > Beside GRO types that are desired to perform, maybe it also needs
> max_pkt_size and
> > > > some GRO type specific information?
> > >
> > > Yes, but we don't need the actual hash-tables, etc. inside.
> > > Passing something like struct gro_param seems enough.
> >
> > Yes, we can just pass gro_param and allocate hashing tables
> > inside rte_gro_reassemble_burst. If so, hashing tables of
> > desired GRO types are created and freed in each invocation
> > of rte_gro_reassemble_burst. In GRO library, hashing tables
> > are created by GRO type specific gro_tbl_create_fn. These
> > gro_tbl_create_fn may allocate hashing table space via malloc
> > (or rte_malloc). Therefore, we may frequently call malloc/free
> > when using rte_gro_reassemble_burst. In my opinion, it will
> > degrade GRO performance greatly.
> 
> I don't' understand why do we need to put/extract each packet into/from
> hash table at all.
> We have N input packets that need to be grouped/sorted  by some criteria.
> Surely that can be done without any hash-table involved.
> What is the need for hash table and all the overhead it brings here?

In the current design, I assume all GRO types use hash tables to merge
packets. The key of the hash table is the criteria for merging packets.
So the main difference between different GRO types' hash tables is the
key definition.

And the reason for using hash tables is to speed up reassembly. Given
N input TCP packets, the simplest way to process packets[i] is to
traverse the already processed packets[0]~packets[i-1] and try to find
one to merge with. In the worst case, we need to check all of
packets[0]~packets[i-1], so the time complexity of processing N packets
is O(N^2). If we use a hash table, whose key is the criteria for merging
two packets, the time to find a packet that may be merged with
packets[i] is O(1).

Do you think it's too complicated?

Jiayu

> Konstantin
> 
> >
> > But if we ask applications to input hashing tables, what we
> > need to do is to reset them after finishing using in
> > rte_gro_reassemble_burst, rather than to malloc and free each
> > time. Therefore, I think this way is more efficient. How do you
> > think?
> >
> > >
> > > >
> > > > >
> > > > > >
> > > > > > If we provide rte_gro_tbl_timeout, we also need to provide another
> reassmebly
> > > > > > function, like rte_gro_reassemble, which processes one given
> packet at a
> > > > > > time and won't reset hashing tables. Applications decide when to
> flush packets
> > > > > > in hashing tables. And rte_gro_tbl_timeout is one of the ways that
> can be used
> > > > > > to flush the packets. Do you mean that?
> > > > >
> > > > > Yes, that's what I meant, but as I said above - I think your approach is
> probably
> > > > > more preferable - it is much simpler and lightweight.
> > > > > Konstantin
> > > > >
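
What "the key of the hash table is the criteria for merging packets" could
mean for TCP/IPv4 -- an assumed key layout for illustration, not the
patchset's actual definition:

#include <stdint.h>

struct gro_tcp4_key {
	uint32_t ip_src;
	uint32_t ip_dst;
	uint16_t port_src;
	uint16_t port_dst;
	/* Two packets land in the same bucket only if all four fields
	 * match; sequence-number checks then decide whether the new
	 * packet can actually be merged, in O(1) instead of a scan. */
};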

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
  2017-05-29 10:22                         ` Hu, Jiayu
@ 2017-05-29 12:18                           ` Bruce Richardson
  2017-05-30 14:10                             ` Hu, Jiayu
  2017-05-29 12:51                           ` Ananyev, Konstantin
  1 sibling, 1 reply; 141+ messages in thread
From: Bruce Richardson @ 2017-05-29 12:18 UTC (permalink / raw)
  To: Hu, Jiayu; +Cc: Ananyev, Konstantin, dev, Wiles, Keith, yuanhan.liu

On Mon, May 29, 2017 at 10:22:57AM +0000, Hu, Jiayu wrote:
> Hi Konstantin,
> 
> > -----Original Message-----
> > From: Ananyev, Konstantin
> > Sent: Sunday, May 28, 2017 12:51 AM
> > To: Hu, Jiayu <jiayu.hu@intel.com>
> > Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>;
> > yuanhan.liu@linux.intel.com
> > Subject: RE: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
> > 
> > 
> > Hi Jiayu,
> > 
> > >
> > > Hi Konstantin,
> > >
> > > On Sat, May 27, 2017 at 07:12:16PM +0800, Ananyev, Konstantin wrote:
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Hu, Jiayu
> > > > > Sent: Saturday, May 27, 2017 4:42 AM
> > > > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > > > Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>;
> > yuanhan.liu@linux.intel.com
> > > > > Subject: Re: [PATCH v3 1/3] lib: add Generic Receive Offload API
> > framework
> > > > >
> > > > > On Sat, May 27, 2017 at 07:10:21AM +0800, Ananyev, Konstantin wrote:
> > > > > > Hi Jiayu,
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Hu, Jiayu
> > > > > > > Sent: Friday, May 26, 2017 8:26 AM
> > > > > > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > > > > > Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>;
> > yuanhan.liu@linux.intel.com
> > > > > > > Subject: Re: [PATCH v3 1/3] lib: add Generic Receive Offload API
> > framework
> > > > > > >
> > > > > > > Hi Konstantin,
> > > > > > >
> > > > > > > On Wed, May 24, 2017 at 08:38:25PM +0800, Ananyev, Konstantin
> > wrote:
> > > > > > > >
> > > > > > > > Hi Jiayu,
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Hi Konstantin,
> > > > > > > > >
> > > > > > > > > Thanks for your comments. My replies/questions are below.
> > > > > > > > >
> > > > > > > > > BRs,
> > > > > > > > > Jiayu
> > > > > > > > >
> > > > > > > > > On Mon, May 22, 2017 at 05:19:19PM +0800, Ananyev,
> > Konstantin wrote:
> > > > > > > > > > Hi Jiayu,
> > > > > > > > > > My comments/questions below.
> > > > > > > > > > Konstantin
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > For applications, DPDK GRO provides three external functions
> > to
> > > > > > > > > > > enable/disable GRO:
> > > > > > > > > > > - rte_gro_init: initialize GRO environment;
> > > > > > > > > > > - rte_gro_enable: enable GRO for a given port;
> > > > > > > > > > > - rte_gro_disable: disable GRO for a given port.
> > > > > > > > > > > Before using GRO, applications should explicitly call
> > rte_gro_init to
> > > > > > > > > > > initialize GRO environment. After that, applications can call
> > > > > > > > > > > rte_gro_enable to enable GRO and call rte_gro_disable to
> > disable GRO for
> > > > > > > > > > > specific ports.
> > > > > > > > > >
> > > > > > > > > > I think this is too restrictive and wouldn't meet various users'
> > needs.
> > > > > > > > > > User might want to:
> > > > > > > > > > - enable/disable GRO for particular RX queue
> > > > > > > > > > - or even setup different GRO types for different RX queues,
> > > > > > > > > >    i.e, - GRO over IPV4/TCP for queue 0, and  GRO over
> > IPV6/TCP for queue 1, etc.
> > > > > > > > >
> > > > > > > > > The reason for enabling/disabling GRO per-port instead of per-
> > queue is that LINUX
> > > > > > > > > controls GRO per-port. Controlling GRO per-queue can indeed
> > provide more flexibility
> > > > > > > > > to applications. But are there any scenarios in which different
> > queues of a port may
> > > > > > > > > require different GRO control (i.e. GRO types and enable/disable
> > GRO)?
> > > > > > > >
> > > > > > > > I think yes.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > - For various reasons, user might prefer not to use RX callbacks,
> > > > > > > > > >   but invoke gro() manually at some point in his code.
> > > > > > > > >
> > > > > > > > > An application-level GRO library can give applications more
> > flexibility. Besides,
> > > > > > > > > when performing GRO in the ethdev layer or inside PMD drivers, there is
> > > > > > > > > an issue of whether rte_eth_rx_burst returns the number of actually
> > > > > > > > > received packets or the number of GROed packets. And
> > > > > > > > > the same issue happens in GSO, and even more seriously. This is
> > because applications,
> > > > > > > > > like VPP, always rely on the return value of rte_eth_tx_burst to
> > decide further
> > > > > > > > > operations. If applications can directly call GRO/GSO libraries,
> > this issue won't exist.
> > > > > > > > > And DPDK is a library, which is not a holistic system like LINUX.
> > We don't need to do
> > > > > > > > > the same as LINUX. Therefore, maybe it's a better idea to
> > directly provide SW
> > > > > > > > > segmentation/reassembling libraries to applications.
> > > > > > > > >
> > > > > > > > > > - Many users would like to control size (number of flows/items
> > per flow),
> > > > > > > > > >   max allowed packet size, max timeout, etc., for different GRO
> > tables.
> > > > > > > > > > - User would need a way to flush all or only timeout packets
> > from particular GRO tables.
> > > > > > > > > >
> > > > > > > > > > So I think that API needs to be extended to become much more
> > fine-grained.
> > > > > > > > > > Something like that:
> > > > > > > > > >
> > > > > > > > > > struct rte_gro_tbl_param {
> > > > > > > > > >    int32_t socket_id;
> > > > > > > > > >    size_t max_flows;
> > > > > > > > > >    size_t max_items_per_flow;
> > > > > > > > > >    size_t max_pkt_size;
> > > > > > > > > >    uint64_t packet_timeout_cycles;
> > > > > > > > > >    <desired GRO types (IPV4_TCP | IPV6_TCP, ...)>
> > > > > > > > > >   <probably type specific params>
> > > > > > > > > >   ...
> > > > > > > > > > };
> > > > > > > > > >
> > > > > > > > > > struct rte_gro_tbl;
> > > > > > > > > > struct rte_gro_tbl *rte_gro_tbl_create(const struct
> > rte_gro_tbl_param *param);
> > > > > > > > > >
> > > > > > > > > > void rte_gro_tbl_destroy(struct rte_gro_tbl *tbl);
> > > > > > > > >
> > > > > > > > > Yes, I agree with you. It's necessary to provide more fine-
> > grained control APIs to
> > > > > > > > > applications. But what's 'packet_timeout_cycles' used for? Is it
> > for TCP packets?
> > > > > > > >
> > > > > > > > For any packet that sits in the gro_table for too long.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > /*
> > > > > > > > > >  * process packets, might store some packets inside the GRO
> > table,
> > > > > > > > > >  * returns number of filled entries in pkt[]
> > > > > > > > > >  */
> > > > > > > > > > uint32_t rte_gro_tbl_process(struct rte_gro_tbl *tbl, struct
> > rte_mbuf *pkt[], uint32_t num);
> > > > > > > > > >
> > > > > > > > > > /*
> > > > > > > > > >   * retrieves up to num timed-out packets from the table.
> > > > > > > > > >   */
> > > > > > > > > > uint32_t rte_gro_tbl_timeout(struct rte_gro_tbl *tbl, uint64_t
> > tmt, struct rte_mbuf *pkt[], uint32_t num);
> > > > > > > > >
> > > > > > > > > Currently, we implement GRO as RX callback, whose processing
> > logic is simple:
> > > > > > > > > receive burst packets -> perform GRO -> return to application.
> > GRO stops after
> > > > > > > > > finishing processing received packets. If we provide
> > rte_gro_tbl_timeout, when
> > > > > > > > > and who will call it?
> > > > > > > >
> > > > > > > > I mean the following scenario:
> > > > > > > > We receive a packet, find it is eligible for GRO and put it into
> > gro_table
> > > > > > > > in expectation that there would be more packets for the same flow.
> > > > > > > > But it could happen that we would never (or for some long time)
> > receive
> > > > > > > > any new packets for that flow.
> > > > > > > > So the first packet would never be delivered to the upper layer,
> > > > > > > > or delivered too late.
> > > > > > > > So we need a mechanism to extract such not merged packets
> > > > > > > > and push them to the upper layer.
> > > > > > > > My thought is it would be the application's responsibility to call it
> > from time to time.
> > > > > > > > That's actually another reason why I think we shouldn't force
> > > > > > > > applications to always use RX callbacks for GRO.
> > > > > > >
> > > > > > > Currently, we only provide one reassembly function,
> > rte_gro_reassemble_burst,
> > > > > > > which merges N input packets at a time. After finishing
> > processing these
> > > > > > > packets, it returns all of them and resets hashing tables. Therefore,
> > there
> > > > > > > are no packets in hashing tables after rte_gro_reassemble_burst
> > returns.
> > > > > >
> > > > > > Ok, sorry I missed that part with rte_hash_reset().
> > > > > > So gro_reassemble_burst() performs only inline processing on current
> > input packets
> > > > > > and doesn't try to save packets for future merging, correct?
> > > > >
> > > > > Yes.
> > > > >
> > > > > > Such an approach indeed is much more lightweight and doesn't require any
> > extra timeouts and flush().
> > > > > > So in my opinion let's keep it like that - nice and simple.
> > > > > > BTW, I think in that case we don't need any hash tables (or any other
> > persistent structures) at all.
> > > > > > What we need is just a set of GROs (tcp4, tcp6, etc.) we want to
> > perform on a given array of packets.
> > > > >
> > > > > Besides the GRO types that we want to perform, maybe it also needs
> > max_pkt_size and
> > > > > some GRO type specific information?
> > > >
> > > > Yes, but we don't need the actual hash-tables, etc. inside.
> > > > Passing something like struct gro_param seems enough.
> > >
> > > Yes, we can just pass gro_param and allocate hashing tables
> > > inside rte_gro_reassemble_burst. If so, hashing tables of
> > > desired GRO types are created and freed in each invocation
> > > of rte_gro_reassemble_burst. In GRO library, hashing tables
> > > are created by GRO type specific gro_tbl_create_fn. These
> > > gro_tbl_create_fn may allocate hashing table space via malloc
> > > (or rte_malloc). Therefore, we may frequently call malloc/free
> > > when using rte_gro_reassemble_burst. In my opinion, it will
> > > degrade GRO performance greatly.
> > 
> > I don't understand why we need to put/extract each packet into/from
> > a hash table at all.
> > We have N input packets that need to be grouped/sorted by some criteria.
> > Surely that can be done without any hash table involved.
> > What is the need for a hash table and all the overhead it brings here?
> 
> In the current design, I assume all GRO types use hash tables to merge
> packets. The key of the hash table is the criteria to merge packets.
> So the main difference for different GRO types' hash tables is the
> key definition.
> 
> And the reason for using hash tables is to speed up reassembly. Given
> there are N input TCP packets, the simplest way to process packets[i]
> is to traverse the processed packets[0]~packets[i-1] and try to find one
> to merge. In the worst case, we need to check all of packets[0~i-1].
> In this case, the time complexity of processing N packets is O(N^2).
> If we use a hash table, whose key is the criteria to merge two packets,
> the time to find a packet that may be merged with packets[i] is O(1).
> 
> Do you think it's too complicated?
> 
> Jiayu
> 
How big is your expected burst size? If you are expecting 32 or 64
packets per call, then N is small and the overhead of the hash table
seems a bit much. Perhaps you need different code paths for bigger and
smaller bursts?

/Bruce
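
Bruce's point suggests a size-based dispatch. A minimal sketch, where
gro_tcp4_try_merge(), gro_tcp4_burst_hash() and the cutover threshold are
all assumptions for illustration, not code from the patchset:

#include <stdint.h>
#include <rte_mbuf.h>

/* hypothetical: returns non-zero if pkt was merged into head */
int gro_tcp4_try_merge(struct rte_mbuf *head, struct rte_mbuf *pkt);
/* hypothetical: the hash-table-based path for large bursts */
uint16_t gro_tcp4_burst_hash(struct rte_mbuf *pkts[], uint16_t nb_pkts);

#define GRO_LINEAR_SCAN_MAX 64	/* assumed cutover point, needs measurement */

static uint16_t
gro_tcp4_burst(struct rte_mbuf *pkts[], uint16_t nb_pkts)
{
	uint16_t i, j, kept = 0;

	if (nb_pkts > GRO_LINEAR_SCAN_MAX)
		return gro_tcp4_burst_hash(pkts, nb_pkts);

	/* small burst: O(N^2) pairwise scan, no table maintenance at all */
	for (i = 0; i < nb_pkts; i++) {
		for (j = 0; j < kept; j++)
			if (gro_tcp4_try_merge(pkts[j], pkts[i]))
				break;	/* absorbed into an earlier packet */
		if (j == kept)
			pkts[kept++] = pkts[i];	/* no match, keep as-is */
	}
	return kept;
}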

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
  2017-05-29 10:22                         ` Hu, Jiayu
  2017-05-29 12:18                           ` Bruce Richardson
@ 2017-05-29 12:51                           ` Ananyev, Konstantin
  2017-05-30  5:29                             ` Hu, Jiayu
  1 sibling, 1 reply; 141+ messages in thread
From: Ananyev, Konstantin @ 2017-05-29 12:51 UTC (permalink / raw)
  To: Hu, Jiayu; +Cc: dev, Wiles, Keith, yuanhan.liu

Hi Jiayu,

> -----Original Message-----
> From: Hu, Jiayu
> Sent: Monday, May 29, 2017 11:23 AM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>; yuanhan.liu@linux.intel.com
> Subject: RE: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
> 
> Hi Konstantin,
> 
> > -----Original Message-----
> > From: Ananyev, Konstantin
> > Sent: Sunday, May 28, 2017 12:51 AM
> > To: Hu, Jiayu <jiayu.hu@intel.com>
> > Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>;
> > yuanhan.liu@linux.intel.com
> > Subject: RE: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
> >
> >
> > Hi Jiayu,
> >
> > >
> > > Hi Konstantin,
> > >
> > > On Sat, May 27, 2017 at 07:12:16PM +0800, Ananyev, Konstantin wrote:
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Hu, Jiayu
> > > > > Sent: Saturday, May 27, 2017 4:42 AM
> > > > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > > > Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>;
> > yuanhan.liu@linux.intel.com
> > > > > Subject: Re: [PATCH v3 1/3] lib: add Generic Receive Offload API
> > framework
> > > > >
> > > > > On Sat, May 27, 2017 at 07:10:21AM +0800, Ananyev, Konstantin wrote:
> > > > > > Hi Jiayu,
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Hu, Jiayu
> > > > > > > Sent: Friday, May 26, 2017 8:26 AM
> > > > > > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > > > > > Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>;
> > yuanhan.liu@linux.intel.com
> > > > > > > Subject: Re: [PATCH v3 1/3] lib: add Generic Receive Offload API
> > framework
> > > > > > >
> > > > > > > Hi Konstantin,
> > > > > > >
> > > > > > > On Wed, May 24, 2017 at 08:38:25PM +0800, Ananyev, Konstantin
> > wrote:
> > > > > > > >
> > > > > > > > Hi Jiayu,
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Hi Konstantin,
> > > > > > > > >
> > > > > > > > > Thanks for your comments. My replies/questions are below.
> > > > > > > > >
> > > > > > > > > BRs,
> > > > > > > > > Jiayu
> > > > > > > > >
> > > > > > > > > On Mon, May 22, 2017 at 05:19:19PM +0800, Ananyev,
> > Konstantin wrote:
> > > > > > > > > > Hi Jiayu,
> > > > > > > > > > My comments/questions below.
> > > > > > > > > > Konstantin
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > For applications, DPDK GRO provides three external functions
> > to
> > > > > > > > > > > enable/disable GRO:
> > > > > > > > > > > - rte_gro_init: initialize GRO environment;
> > > > > > > > > > > - rte_gro_enable: enable GRO for a given port;
> > > > > > > > > > > - rte_gro_disable: disable GRO for a given port.
> > > > > > > > > > > Before using GRO, applications should explicitly call
> > rte_gro_init to
> > > > > > > > > > > initialize GRO environment. After that, applications can call
> > > > > > > > > > > rte_gro_enable to enable GRO and call rte_gro_disable to
> > disable GRO for
> > > > > > > > > > > specific ports.
> > > > > > > > > >
> > > > > > > > > > I think this is too restrictive and wouldn't meet various users'
> > needs.
> > > > > > > > > > User might want to:
> > > > > > > > > > - enable/disable GRO for particular RX queue
> > > > > > > > > > - or even setup different GRO types for different RX queues,
> > > > > > > > > >    i.e, - GRO over IPV4/TCP for queue 0, and  GRO over
> > IPV6/TCP for queue 1, etc.
> > > > > > > > >
> > > > > > > > > The reason for enabling/disabling GRO per-port instead of per-
> > queue is that LINUX
> > > > > > > > > controls GRO per-port. Controlling GRO per-queue can indeed
> > provide more flexibility
> > > > > > > > > to applications. But are there any scenarios in which different
> > queues of a port may
> > > > > > > > > require different GRO control (i.e. GRO types and enable/disable
> > GRO)?
> > > > > > > >
> > > > > > > > I think yes.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > - For various reasons, user might prefer not to use RX callbacks,
> > > > > > > > > >   but invoke gro() manually at some point in his code.
> > > > > > > > >
> > > > > > > > > An application-level GRO library can give applications more
> > flexibility. Besides,
> > > > > > > > > when performing GRO in the ethdev layer or inside PMD drivers, there is
> > > > > > > > > an issue of whether rte_eth_rx_burst returns the number of actually
> > > > > > > > > received packets or the number of GROed packets. And
> > > > > > > > > the same issue happens in GSO, and even more seriously. This is
> > because applications,
> > > > > > > > > like VPP, always rely on the return value of rte_eth_tx_burst to
> > decide further
> > > > > > > > > operations. If applications can directly call GRO/GSO libraries,
> > this issue won't exist.
> > > > > > > > > And DPDK is a library, which is not a holistic system like LINUX.
> > We don't need to do
> > > > > > > > > the same as LINUX. Therefore, maybe it's a better idea to
> > directly provide SW
> > > > > > > > > segmentation/reassembling libraries to applications.
> > > > > > > > >
> > > > > > > > > > - Many users would like to control size (number of flows/items
> > per flow),
> > > > > > > > > >   max allowed packet size, max timeout, etc., for different GRO
> > tables.
> > > > > > > > > > - User would need a way to flush all or only timeout packets
> > from particular GRO tables.
> > > > > > > > > >
> > > > > > > > > > So I think that API needs to be extended to become much more
> > fine-grained.
> > > > > > > > > > Something like that:
> > > > > > > > > >
> > > > > > > > > > struct rte_gro_tbl_param {
> > > > > > > > > >    int32_t socket_id;
> > > > > > > > > >    size_t max_flows;
> > > > > > > > > >    size_t max_items_per_flow;
> > > > > > > > > >    size_t max_pkt_size;
> > > > > > > > > >    uint64_t packet_timeout_cycles;
> > > > > > > > > >    <desired GRO types (IPV4_TCP | IPV6_TCP, ...)>
> > > > > > > > > >   <probably type specific params>
> > > > > > > > > >   ...
> > > > > > > > > > };
> > > > > > > > > >
> > > > > > > > > > struct rte_gro_tbl;
> > > > > > > > > > struct rte_gro_tbl *rte_gro_tbl_create(const struct
> > rte_gro_tbl_param *param);
> > > > > > > > > >
> > > > > > > > > > void rte_gro_tbl_destroy(struct rte_gro_tbl *tbl);
> > > > > > > > >
> > > > > > > > > Yes, I agree with you. It's necessary to provide more fine-
> > grained control APIs to
> > > > > > > > > applications. But what's 'packet_timeout_cycles' used for? Is it
> > for TCP packets?
> > > > > > > >
> > > > > > > > For any packet that sits in the gro_table for too long.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > /*
> > > > > > > > > >  * process packets, might store some packets inside the GRO
> > table,
> > > > > > > > > >  * returns number of filled entries in pkt[]
> > > > > > > > > >  */
> > > > > > > > > > uint32_t rte_gro_tbl_process(struct rte_gro_tbl *tbl, struct
> > rte_mbuf *pkt[], uint32_t num);
> > > > > > > > > >
> > > > > > > > > > /*
> > > > > > > > > >   * retrieves up to num timed-out packets from the table.
> > > > > > > > > >   */
> > > > > > > > > > uint32_t rte_gro_tbl_timeout(struct rte_gro_tbl *tbl, uint64_t
> > tmt, struct rte_mbuf *pkt[], uint32_t num);
> > > > > > > > >
> > > > > > > > > Currently, we implement GRO as RX callback, whose processing
> > logic is simple:
> > > > > > > > > receive burst packets -> perform GRO -> return to application.
> > GRO stops after
> > > > > > > > > finishing processing received packets. If we provide
> > rte_gro_tbl_timeout, when
> > > > > > > > > and who will call it?
> > > > > > > >
> > > > > > > > I mean the following scenario:
> > > > > > > > We receive a packet, find it is eligible for GRO and put it into
> > gro_table
> > > > > > > > in expectation that there would be more packets for the same flow.
> > > > > > > > But it could happen that we would never (or for some long time)
> > receive
> > > > > > > > any new packets for that flow.
> > > > > > > > So the first packet would never be delivered to the upper layer,
> > > > > > > > or delivered too late.
> > > > > > > > So we need a mechanism to extract such not merged packets
> > > > > > > > and push them to the upper layer.
> > > > > > > > My thought is it would be the application's responsibility to call it
> > from time to time.
> > > > > > > > That's actually another reason why I think we shouldn't force
> > > > > > > > applications to always use RX callbacks for GRO.
> > > > > > >
> > > > > > > Currently, we only provide one reassembly function,
> > rte_gro_reassemble_burst,
> > > > > > > which merges N input packets at a time. After finishing
> > processing these
> > > > > > > packets, it returns all of them and resets hashing tables. Therefore,
> > there
> > > > > > > are no packets in hashing tables after rte_gro_reassemble_burst
> > returns.
> > > > > >
> > > > > > Ok, sorry I missed that part with rte_hash_reset().
> > > > > > So gro_reassemble_burst() performs only inline processing on current
> > input packets
> > > > > > and doesn't try to save packets for future merging, correct?
> > > > >
> > > > > Yes.
> > > > >
> > > > > > Such an approach indeed is much more lightweight and doesn't require any
> > extra timeouts and flush().
> > > > > > So in my opinion let's keep it like that - nice and simple.
> > > > > > BTW, I think in that case we don't need any hash tables (or any other
> > persistent structures) at all.
> > > > > > What we need is just a set of GROs (tcp4, tcp6, etc.) we want to
> > perform on a given array of packets.
> > > > >
> > > > > Besides the GRO types that we want to perform, maybe it also needs
> > max_pkt_size and
> > > > > some GRO type specific information?
> > > >
> > > > Yes, but we don't need the actual hash-tables, etc. inside.
> > > > Passing something like struct gro_param seems enough.
> > >
> > > Yes, we can just pass gro_param and allocate hashing tables
> > > inside rte_gro_reassemble_burst. If so, hashing tables of
> > > desired GRO types are created and freed in each invocation
> > > of rte_gro_reassemble_burst. In GRO library, hashing tables
> > > are created by GRO type specific gro_tbl_create_fn. These
> > > gro_tbl_create_fn may allocate hashing table space via malloc
> > > (or rte_malloc). Therefore, we may frequently call malloc/free
> > > when using rte_gro_reassemble_burst. In my opinion, it will
> > > degrade GRO performance greatly.
> >
> > I don't understand why we need to put/extract each packet into/from
> > a hash table at all.
> > We have N input packets that need to be grouped/sorted by some criteria.
> > Surely that can be done without any hash table involved.
> > What is the need for a hash table and all the overhead it brings here?
> 
> In the current design, I assume all GRO types use hash tables to merge
> packets. The key of the hash table is the criteria to merge packets.
> So the main difference for different GRO types' hash tables is the
> key definition.
> 
> And the reason for using hash tables is to speed up reassembly. Given
> there are N input TCP packets, the simplest way to process packets[i]
> is to traverse the processed packets[0]~packets[i-1] and try to find one
> to merge. In the worst case, we need to check all of packets[0~i-1].
> In this case, the time complexity of processing N packets is O(N^2).
> If we use a hash table, whose key is the criteria to merge two packets,
> the time to find a packet that may be merged with packets[i] is O(1).

I understand that, but add/search inside the hash table,
plus resetting it for every burst of packets, doesn't come for free either.
I think that for the most common burst sizes (< 100 packets)
the hash overhead would significantly outweigh the price of
the worst-case O(N^2) scan.
Also, if such a worst case really worries you, you can pre-sort the
input packets first before starting the actual reassembly for them.
That might help to bring the price down.
Konstantin  

> 
> Do you think it's too complicated?
> 
> Jiayu
> 
> > Konstantin
> >
> > >
> > > But if we ask applications to input hashing tables, what we
> > > need to do is to reset them after we finish using them in
> > > rte_gro_reassemble_burst, rather than to malloc and free each
> > > time. Therefore, I think this way is more efficient. What do you
> > > think?
> > >
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > If we provide rte_gro_tbl_timeout, we also need to provide another
> > reassembly
> > > > > > > function, like rte_gro_reassemble, which processes one given
> > packet at a
> > > > > > > time and won't reset hashing tables. Applications decide when to
> > flush packets
> > > > > > > in hashing tables. And rte_gro_tbl_timeout is one of the ways that
> > can be used
> > > > > > > to flush the packets. Do you mean that?
> > > > > >
> > > > > > Yes, that's what I meant, but as I said above - I think your approach is
> > probably
> > > > > > more preferable - it is much simpler and lightweight.
> > > > > > Konstantin
> > > > > >
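
Konstantin's pre-sort suggestion can be sketched with a plain qsort() over
the mbuf array: after sorting, packets of the same flow are adjacent, so
the reassembly loop only has to compare each packet with its predecessor.
The key struct and gro_flow_key_of() are assumptions for illustration.

#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <rte_mbuf.h>

struct tcp4_flow_key {		/* hypothetical merge criteria */
	uint32_t ip_src, ip_dst;
	uint16_t port_src, port_dst;
};

/* hypothetical: fill the key from the packet's IPv4/TCP headers */
void gro_flow_key_of(const struct rte_mbuf *m, struct tcp4_flow_key *k);

static int
flow_key_cmp(const void *a, const void *b)
{
	const struct rte_mbuf *ma = *(const struct rte_mbuf * const *)a;
	const struct rte_mbuf *mb = *(const struct rte_mbuf * const *)b;
	struct tcp4_flow_key ka, kb;

	gro_flow_key_of(ma, &ka);
	gro_flow_key_of(mb, &kb);
	return memcmp(&ka, &kb, sizeof(ka));
}

static void
gro_presort(struct rte_mbuf *pkts[], uint16_t nb_pkts)
{
	/* same-flow packets become neighbours, collapsing the worst-case
	 * scan over all earlier packets to a single comparison */
	qsort(pkts, nb_pkts, sizeof(pkts[0]), flow_key_cmp);
}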

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
  2017-05-29 12:51                           ` Ananyev, Konstantin
@ 2017-05-30  5:29                             ` Hu, Jiayu
  2017-05-30 11:56                               ` Ananyev, Konstantin
  0 siblings, 1 reply; 141+ messages in thread
From: Hu, Jiayu @ 2017-05-30  5:29 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev, Wiles, Keith, yuanhan.liu

Hi Konstantin,

> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Monday, May 29, 2017 8:52 PM
> To: Hu, Jiayu <jiayu.hu@intel.com>
> Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>;
> yuanhan.liu@linux.intel.com
> Subject: RE: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
> 
> Hi Jiayu,
> 
> > -----Original Message-----
> > From: Hu, Jiayu
> > Sent: Monday, May 29, 2017 11:23 AM
> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>;
> yuanhan.liu@linux.intel.com
> > Subject: RE: [PATCH v3 1/3] lib: add Generic Receive Offload API
> framework
> >
> > Hi Konstantin,
> >
> > > -----Original Message-----
> > > From: Ananyev, Konstantin
> > > Sent: Sunday, May 28, 2017 12:51 AM
> > > To: Hu, Jiayu <jiayu.hu@intel.com>
> > > Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>;
> > > yuanhan.liu@linux.intel.com
> > > Subject: RE: [PATCH v3 1/3] lib: add Generic Receive Offload API
> framework
> > >
> > >
> > > Hi Jiayu,
> > >
> > > >
> > > > Hi Konstantin,
> > > >
> > > > On Sat, May 27, 2017 at 07:12:16PM +0800, Ananyev, Konstantin wrote:
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Hu, Jiayu
> > > > > > Sent: Saturday, May 27, 2017 4:42 AM
> > > > > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > > > > Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>;
> > > yuanhan.liu@linux.intel.com
> > > > > > Subject: Re: [PATCH v3 1/3] lib: add Generic Receive Offload API
> > > framework
> > > > > >
> > > > > > On Sat, May 27, 2017 at 07:10:21AM +0800, Ananyev, Konstantin
> wrote:
> > > > > > > Hi Jiayu,
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Hu, Jiayu
> > > > > > > > Sent: Friday, May 26, 2017 8:26 AM
> > > > > > > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > > > > > > Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>;
> > > yuanhan.liu@linux.intel.com
> > > > > > > > Subject: Re: [PATCH v3 1/3] lib: add Generic Receive Offload API
> > > framework
> > > > > > > >
> > > > > > > > Hi Konstantin,
> > > > > > > >
> > > > > > > > On Wed, May 24, 2017 at 08:38:25PM +0800, Ananyev,
> Konstantin
> > > wrote:
> > > > > > > > >
> > > > > > > > > Hi Jiayu,
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Hi Konstantin,
> > > > > > > > > >
> > > > > > > > > > Thanks for your comments. My replies/questions are below.
> > > > > > > > > >
> > > > > > > > > > BRs,
> > > > > > > > > > Jiayu
> > > > > > > > > >
> > > > > > > > > > On Mon, May 22, 2017 at 05:19:19PM +0800, Ananyev,
> > > Konstantin wrote:
> > > > > > > > > > > Hi Jiayu,
> > > > > > > > > > > My comments/questions below.
> > > > > > > > > > > Konstantin
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > For applications, DPDK GRO provides three external
> functions
> > > to
> > > > > > > > > > > > enable/disable GRO:
> > > > > > > > > > > > - rte_gro_init: initialize GRO environment;
> > > > > > > > > > > > - rte_gro_enable: enable GRO for a given port;
> > > > > > > > > > > > - rte_gro_disable: disable GRO for a given port.
> > > > > > > > > > > > Before using GRO, applications should explicitly call
> > > rte_gro_init to
> > > > > > > > > > > > initialize GRO environment. After that, applications can
> call
> > > > > > > > > > > > rte_gro_enable to enable GRO and call rte_gro_disable
> to
> > > disable GRO for
> > > > > > > > > > > > specific ports.
> > > > > > > > > > >
> > > > > > > > > > > I think this is too restrictive and wouldn't meet various
> users'
> > > needs.
> > > > > > > > > > > User might want to:
> > > > > > > > > > > - enable/disable GRO for particular RX queue
> > > > > > > > > > > - or even setup different GRO types for different RX queues,
> > > > > > > > > > >    i.e, - GRO over IPV4/TCP for queue 0, and  GRO over
> > > IPV6/TCP for queue 1, etc.
> > > > > > > > > >
> > > > > > > > > > The reason for enabling/disabling GRO per-port instead of
> per-
> > > queue is that LINUX
> > > > > > > > > > controls GRO per-port. Controlling GRO per-queue can indeed
> > > provide more flexibility
> > > > > > > > > > to applications. But are there any scenarios in which different
> > > queues of a port may
> > > > > > > > > > require different GRO control (i.e. GRO types and
> enable/disable
> > > GRO)?
> > > > > > > > >
> > > > > > > > > I think yes.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > - For various reasons, user might prefer not to use RX
> callbacks,
> > > > > > > > > > >   but invoke gro() manually at some point in his code.
> > > > > > > > > >
> > > > > > > > > > An application-level GRO library can give applications more
> > > flexibility. Besides,
> > > > > > > > > > when performing GRO in the ethdev layer or inside PMD drivers,
> > > > > > > > > > there is an issue of whether rte_eth_rx_burst returns the number of
> > > > > > > > > > actually received packets or the number of GROed packets. And
> > > > > > > > > > the same issue happens in GSO, and even more seriously.
> This is
> > > because applications,
> > > > > > > > > > like VPP, always rely on the return value of rte_eth_tx_burst
> to
> > > decide further
> > > > > > > > > > operations. If applications can directly call GRO/GSO libraries,
> > > this issue won't exist.
> > > > > > > > > > And DPDK is a library, which is not a holistic system like
> LINUX.
> > > We don't need to do
> > > > > > > > > > the same as LINUX. Therefore, maybe it's a better idea to
> > > directly provide SW
> > > > > > > > > > segmentation/reassembling libraries to applications.
> > > > > > > > > >
> > > > > > > > > > > - Many users would like to control size (number of
> flows/items
> > > per flow),
> > > > > > > > > > >   max allowed packet size, max timeout, etc., for different
> GRO
> > > tables.
> > > > > > > > > > > - User would need a way to flush all or only timeout
> packets
> > > from particular GRO tables.
> > > > > > > > > > >
> > > > > > > > > > > So I think that API needs to be extended to become much
> more
> > > fine-grained.
> > > > > > > > > > > Something like that:
> > > > > > > > > > >
> > > > > > > > > > > struct rte_gro_tbl_param {
> > > > > > > > > > >    int32_t socket_id;
> > > > > > > > > > >    size_t max_flows;
> > > > > > > > > > >    size_t max_items_per_flow;
> > > > > > > > > > >    size_t max_pkt_size;
> > > > > > > > > > >    uint64_t packet_timeout_cycles;
> > > > > > > > > > >    <desired GRO types (IPV4_TCP | IPV6_TCP, ...)>
> > > > > > > > > > >   <probably type specific params>
> > > > > > > > > > >   ...
> > > > > > > > > > > };
> > > > > > > > > > >
> > > > > > > > > > > struct rte_gro_tbl;
> > > > > > > > > > > struct rte_gro_tbl *rte_gro_tbl_create(const struct
> > > rte_gro_tbl_param *param);
> > > > > > > > > > >
> > > > > > > > > > > void rte_gro_tbl_destroy(struct rte_gro_tbl *tbl);
> > > > > > > > > >
> > > > > > > > > > Yes, I agree with you. It's necessary to provide more fine-
> > > grained control APIs to
> > > > > > > > > > applications. But what's 'packet_timeout_cycles' used for? Is
> it
> > > for TCP packets?
> > > > > > > > >
> > > > > > > > > For any packet that sits in the gro_table for too long.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > /*
> > > > > > > > > > >  * process packets, might store some packets inside the
> GRO
> > > table,
> > > > > > > > > > >  * returns number of filled entries in pkt[]
> > > > > > > > > > >  */
> > > > > > > > > > > uint32_t rte_gro_tbl_process(struct rte_gro_tbl *tbl, struct
> > > rte_mbuf *pkt[], uint32_t num);
> > > > > > > > > > >
> > > > > > > > > > > /*
> > > > > > > > > > >   * retrieves up to num timed-out packets from the table.
> > > > > > > > > > >   */
> > > > > > > > > > > uint32_t rte_gro_tbl_timeout(struct rte_gro_tbl *tbl,
> uint64_t
> > > tmt, struct rte_mbuf *pkt[], uint32_t num);
> > > > > > > > > >
> > > > > > > > > > Currently, we implement GRO as RX callback, whose
> processing
> > > logic is simple:
> > > > > > > > > > receive burst packets -> perform GRO -> return to application.
> > > GRO stops after
> > > > > > > > > > finishing processing received packets. If we provide
> > > rte_gro_tbl_timeout, when
> > > > > > > > > > and who will call it?
> > > > > > > > >
> > > > > > > > > I mean the following scenario:
> > > > > > > > > We receive a packet, find it is eligible for GRO and put it into
> > > gro_table
> > > > > > > > > in expectation that there would be more packets for the same
> flow.
> > > > > > > > > But it could happen that we would never (or for some long
> time)
> > > receive
> > > > > > > > > any new packets for that flow.
> > > > > > > > > So the first packet would never be delivered to the upper layer,
> > > > > > > > > or delivered too late.
> > > > > > > > > So we need a mechanism to extract such not merged packets
> > > > > > > > > and push them to the upper layer.
> > > > > > > > > My thought is it would be the application's responsibility to call it
> > > from time to time.
> > > > > > > > > That's actually another reason why I think we shouldn't force
> > > > > > > > > applications to always use RX callbacks for GRO.
> > > > > > > >
> > > > > > > > Currently, we only provide one reassembly function,
> > > rte_gro_reassemble_burst,
> > > > > > > > which merges N input packets at a time. After finishing
> > > processing these
> > > > > > > > packets, it returns all of them and resets hashing tables.
> Therefore,
> > > there
> > > > > > > > are no packets in hashing tables after rte_gro_reassemble_burst
> > > returns.
> > > > > > >
> > > > > > > Ok, sorry I missed that part with rte_hash_reset().
> > > > > > > So gro_reassemble_burst() performs only inline processing on
> current
> > > input packets
> > > > > > > and doesn't try to save packets for future merging, correct?
> > > > > >
> > > > > > Yes.
> > > > > >
> > > > > > > Such an approach indeed is much more lightweight and doesn't require any
> > > extra timeouts and flush().
> > > > > > > So in my opinion let's keep it like that - nice and simple.
> > > > > > > BTW, I think in that case we don't need any hash tables (or any
> other
> > > persistent structures) at all.
> > > > > > > What we need is just a set of GROs (tcp4, tcp6, etc.) we want to
> > > perform on a given array of packets.
> > > > > >
> > > > > > Besides the GRO types that we want to perform, maybe it also needs
> > > max_pkt_size and
> > > > > > some GRO type specific information?
> > > > >
> > > > > Yes, but we don't need the actual hash-tables, etc. inside.
> > > > > Passing something like struct gro_param seems enough.
> > > >
> > > > Yes, we can just pass gro_param and allocate hashing tables
> > > > inside rte_gro_reassemble_burst. If so, hashing tables of
> > > > desired GRO types are created and freed in each invocation
> > > > of rte_gro_reassemble_burst. In GRO library, hashing tables
> > > > are created by GRO type specific gro_tbl_create_fn. These
> > > > gro_tbl_create_fn may allocate hashing table space via malloc
> > > > (or rte_malloc). Therefore, we may frequently call malloc/free
> > > > when using rte_gro_reassemble_burst. In my opinion, it will
> > > > degrade GRO performance greatly.
> > >
> > > I don't understand why we need to put/extract each packet into/from
> > > a hash table at all.
> > > We have N input packets that need to be grouped/sorted by some
> criteria.
> > > Surely that can be done without any hash table involved.
> > > What is the need for a hash table and all the overhead it brings here?
> >
> > In the current design, I assume all GRO types use hash tables to merge
> > packets. The key of the hash table is the criteria to merge packets.
> > So the main difference for different GRO types' hash tables is the
> > key definition.
> >
> > And the reason for using hash tables is to speed up reassembly. Given
> > there are N input TCP packets, the simplest way to process packets[i]
> > is to traverse the processed packets[0]~packets[i-1] and try to find one
> > to merge. In the worst case, we need to check all of packets[0~i-1].
> > In this case, the time complexity of processing N packets is O(N^2).
> > If we use a hash table, whose key is the criteria to merge two packets,
> > the time to find a packet that may be merged with packets[i] is O(1).
> 
> I understand that, but add/search inside the hash table,
> plus resetting it for every burst of packets, doesn't come for free either.
> I think that for the most common burst sizes (< 100 packets)
> the hash overhead would significantly outweigh the price of
> the worst-case O(N^2) scan.

Yes, the packet burst size is always less than 100, which may amplify
the cost of using hash tables. To avoid the high price, maybe a simpler
structure, like rte_ip_frag_tbl in the IP reassembly library, is better. And
actually, I have never tried other designs. In the next version, I will use
a simpler structure to merge TCP packets and compare performance
results. Thanks.

> Also, if such a worst case really worries you, you can pre-sort the
> input packets first before starting the actual reassembly for them.

Suppose the N input packets consist of N1 TCP packets and N2 UDP
packets. If we pre-sort them, what should they look like after sorting?

> That might help to bring the price down.
> Konstantin

Jiayu

> 
> >
> > Do you think it's too complicated?
> >
> > Jiayu
> >
> > > Konstantin
> > >
> > > >
> > > > But if we ask applications to input hashing tables, what we
> > > > need to do is to reset them after we finish using them in
> > > > rte_gro_reassemble_burst, rather than to malloc and free each
> > > > time. Therefore, I think this way is more efficient. What do you
> > > > think?
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > If we provide rte_gro_tbl_timeout, we also need to provide
> another
> > > reassembly
> > > > > > > > function, like rte_gro_reassemble, which processes one given
> > > packet at a
> > > > > > > > time and won't reset hashing tables. Applications decide when to
> > > flush packets
> > > > > > > > in hashing tables. And rte_gro_tbl_timeout is one of the ways
> that
> > > can be used
> > > > > > > > to flush the packets. Do you mean that?
> > > > > > >
> > > > > > > Yes, that's what I meant, but as I said above - I think your
> approach is
> > > probably
> > > > > > > more preferable - it is much simpler and lightweight.
> > > > > > > Konstantin
> > > > > > >
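
The "simpler structure" Jiayu plans to try could be as small as a flat
array of flow slots searched linearly, in the spirit of rte_ip_frag_tbl.
A sketch, with every name and size assumed rather than taken from any
DPDK release:

#include <stdint.h>
#include <string.h>
#include <rte_mbuf.h>

#define GRO_TCP4_MAX_FLOWS 64	/* assumed: about one slot per burst packet */

struct tcp4_flow_key {		/* hypothetical merge criteria */
	uint32_t ip_src, ip_dst;
	uint16_t port_src, port_dst;
};

struct gro_tcp4_flow {
	struct tcp4_flow_key key;
	struct rte_mbuf *head;	/* packet being grown by merges */
};

struct gro_tcp4_tbl {
	struct gro_tcp4_flow flows[GRO_TCP4_MAX_FLOWS];
	uint16_t nb_flows;
};

/* linear lookup: no hashing, and the table is reset by zeroing nb_flows */
static struct gro_tcp4_flow *
gro_tcp4_find_flow(struct gro_tcp4_tbl *tbl, const struct tcp4_flow_key *key)
{
	uint16_t i;

	for (i = 0; i < tbl->nb_flows; i++)
		if (memcmp(&tbl->flows[i].key, key, sizeof(*key)) == 0)
			return &tbl->flows[i];
	return NULL;
}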

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
  2017-05-30  5:29                             ` Hu, Jiayu
@ 2017-05-30 11:56                               ` Ananyev, Konstantin
  0 siblings, 0 replies; 141+ messages in thread
From: Ananyev, Konstantin @ 2017-05-30 11:56 UTC (permalink / raw)
  To: Hu, Jiayu; +Cc: dev, Wiles, Keith, yuanhan.liu

Hi Jiayu,

> -----Original Message-----
> From: Hu, Jiayu
> Sent: Tuesday, May 30, 2017 6:29 AM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>; yuanhan.liu@linux.intel.com
> Subject: RE: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
> 
> Hi Konstantin,
> 
> > -----Original Message-----
> > From: Ananyev, Konstantin
> > Sent: Monday, May 29, 2017 8:52 PM
> > To: Hu, Jiayu <jiayu.hu@intel.com>
> > Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>;
> > yuanhan.liu@linux.intel.com
> > Subject: RE: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
> >
> > Hi Jiayu,
> >
> > > -----Original Message-----
> > > From: Hu, Jiayu
> > > Sent: Monday, May 29, 2017 11:23 AM
> > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>;
> > yuanhan.liu@linux.intel.com
> > > Subject: RE: [PATCH v3 1/3] lib: add Generic Receive Offload API
> > framework
> > >
> > > Hi Konstantin,
> > >
> > > > -----Original Message-----
> > > > From: Ananyev, Konstantin
> > > > Sent: Sunday, May 28, 2017 12:51 AM
> > > > To: Hu, Jiayu <jiayu.hu@intel.com>
> > > > Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>;
> > > > yuanhan.liu@linux.intel.com
> > > > Subject: RE: [PATCH v3 1/3] lib: add Generic Receive Offload API
> > framework
> > > >
> > > >
> > > > Hi Jiayu,
> > > >
> > > > >
> > > > > Hi Konstantin,
> > > > >
> > > > > On Sat, May 27, 2017 at 07:12:16PM +0800, Ananyev, Konstantin wrote:
> > > > > >
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Hu, Jiayu
> > > > > > > Sent: Saturday, May 27, 2017 4:42 AM
> > > > > > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > > > > > Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>;
> > > > yuanhan.liu@linux.intel.com
> > > > > > > Subject: Re: [PATCH v3 1/3] lib: add Generic Receive Offload API
> > > > framework
> > > > > > >
> > > > > > > On Sat, May 27, 2017 at 07:10:21AM +0800, Ananyev, Konstantin
> > wrote:
> > > > > > > > Hi Jiayu,
> > > > > > > >
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: Hu, Jiayu
> > > > > > > > > Sent: Friday, May 26, 2017 8:26 AM
> > > > > > > > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > > > > > > > Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>;
> > > > yuanhan.liu@linux.intel.com
> > > > > > > > > Subject: Re: [PATCH v3 1/3] lib: add Generic Receive Offload API
> > > > framework
> > > > > > > > >
> > > > > > > > > Hi Konstantin,
> > > > > > > > >
> > > > > > > > > On Wed, May 24, 2017 at 08:38:25PM +0800, Ananyev,
> > Konstantin
> > > > wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Jiayu,
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Hi Konstantin,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for your comments. My replies/questions are below.
> > > > > > > > > > >
> > > > > > > > > > > BRs,
> > > > > > > > > > > Jiayu
> > > > > > > > > > >
> > > > > > > > > > > On Mon, May 22, 2017 at 05:19:19PM +0800, Ananyev,
> > > > Konstantin wrote:
> > > > > > > > > > > > Hi Jiayu,
> > > > > > > > > > > > My comments/questions below.
> > > > > > > > > > > > Konstantin
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > For applications, DPDK GRO provides three external
> > functions
> > > > to
> > > > > > > > > > > > > enable/disable GRO:
> > > > > > > > > > > > > - rte_gro_init: initialize GRO environment;
> > > > > > > > > > > > > - rte_gro_enable: enable GRO for a given port;
> > > > > > > > > > > > > - rte_gro_disable: disable GRO for a given port.
> > > > > > > > > > > > > Before using GRO, applications should explicitly call
> > > > rte_gro_init to
> > > > > > > > > > > > > initialize GRO environment. After that, applications can
> > call
> > > > > > > > > > > > > rte_gro_enable to enable GRO and call rte_gro_disable
> > to
> > > > disable GRO for
> > > > > > > > > > > > > specific ports.
> > > > > > > > > > > >
> > > > > > > > > > > > I think this is too restrictive and wouldn't meet various
> > users'
> > > > needs.
> > > > > > > > > > > > User might want to:
> > > > > > > > > > > > - enable/disable GRO for particular RX queue
> > > > > > > > > > > > - or even setup different GRO types for different RX queues,
> > > > > > > > > > > >    i.e, - GRO over IPV4/TCP for queue 0, and  GRO over
> > > > IPV6/TCP for queue 1, etc.
> > > > > > > > > > >
> > > > > > > > > > > The reason for enabling/disabling GRO per-port instead of
> > per-
> > > > queue is that LINUX
> > > > > > > > > > > controls GRO per-port. Controlling GRO per-queue can indeed
> > > > provide more flexibility
> > > > > > > > > > > to applications. But are there any scenarios in which different
> > > > queues of a port may
> > > > > > > > > > > require different GRO control (i.e. GRO types and
> > enable/disable
> > > > GRO)?
> > > > > > > > > >
> > > > > > > > > > I think yes.
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > - For various reasons, user might prefer not to use RX
> > callbacks,
> > > > > > > > > > > >   but invoke gro() manually at some point in his code.
> > > > > > > > > > >
> > > > > > > > > > > An application-level GRO library can give applications more
> > > > flexibility. Besides,
> > > > > > > > > > > when performing GRO in the ethdev layer or inside PMD drivers,
> > > > > > > > > > > there is an issue of whether rte_eth_rx_burst returns the number of
> > > > > > > > > > > actually received packets or the number of GROed packets. And
> > > > > > > > > > > the same issue happens in GSO, and even more seriously.
> > This is
> > > > because applications,
> > > > > > > > > > > like VPP, always rely on the return value of rte_eth_tx_burst
> > to
> > > > decide further
> > > > > > > > > > > operations. If applications can directly call GRO/GSO libraries,
> > > > this issue won't exist.
> > > > > > > > > > > And DPDK is a library, which is not a holistic system like
> > LINUX.
> > > > We don't need to do
> > > > > > > > > > > the same as LINUX. Therefore, maybe it's a better idea to
> > > > directly provide SW
> > > > > > > > > > > segmentation/reassembling libraries to applications.
> > > > > > > > > > >
> > > > > > > > > > > > - Many users would like to control size (number of
> > flows/items
> > > > per flow),
> > > > > > > > > > > >   max allowed packet size, max timeout, etc., for different
> > GRO
> > > > tables.
> > > > > > > > > > > > - User would need a way to flush all or only timeout
> > packets
> > > > from particular GRO tables.
> > > > > > > > > > > >
> > > > > > > > > > > > So I think that API needs to be extended to become much
> > more
> > > > fine-grained.
> > > > > > > > > > > > Something like that:
> > > > > > > > > > > >
> > > > > > > > > > > > struct rte_gro_tbl_param {
> > > > > > > > > > > >    int32_t socket_id;
> > > > > > > > > > > >    size_t max_flows;
> > > > > > > > > > > >    size_t max_items_per_flow;
> > > > > > > > > > > >    size_t max_pkt_size;
> > > > > > > > > > > >    uint64_t packet_timeout_cycles;
> > > > > > > > > > > >    <desired GRO types (IPV4_TCP | IPV6_TCP, ...)>
> > > > > > > > > > > >   <probably type specific params>
> > > > > > > > > > > >   ...
> > > > > > > > > > > > };
> > > > > > > > > > > >
> > > > > > > > > > > > struct rte_gro_tbl;
> > > > > > > > > > > > struct rte_gro_tbl *rte_gro_tbl_create(const struct
> > > > rte_gro_tbl_param *param);
> > > > > > > > > > > >
> > > > > > > > > > > > void rte_gro_tbl_destroy(struct rte_gro_tbl *tbl);
> > > > > > > > > > >
> > > > > > > > > > > Yes, I agree with you. It's necessary to provide more fine-
> > > > grained control APIs to
> > > > > > > > > > > applications. But what's 'packet_timeout_cycles' used for? Is
> > it
> > > > for TCP packets?
> > > > > > > > > >
> > > > > > > > > > For any packet that sits in the gro_table for too long.
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > /*
> > > > > > > > > > > >  * process packets, might store some packets inside the
> > GRO
> > > > table,
> > > > > > > > > > > >  * returns number of filled entries in pkt[]
> > > > > > > > > > > >  */
> > > > > > > > > > > > uint32_t rte_gro_tbl_process(struct rte_gro_tbl *tbl, struct
> > > > rte_mbuf *pkt[], uint32_t num);
> > > > > > > > > > > >
> > > > > > > > > > > > /*
> > > > > > > > > > > >   * retrieves up to num timed-out packets from the table.
> > > > > > > > > > > >   */
> > > > > > > > > > > > uint32_t rte_gro_tbl_timeout(struct rte_gro_tbl *tbl,
> > uint64_t
> > > > tmt, struct rte_mbuf *pkt[], uint32_t num);
> > > > > > > > > > >
> > > > > > > > > > > Currently, we implement GRO as RX callback, whose
> > processing
> > > > logic is simple:
> > > > > > > > > > > receive burst packets -> perform GRO -> return to application.
> > > > GRO stops after
> > > > > > > > > > > finishing processing received packets. If we provide
> > > > rte_gro_tbl_timeout, when
> > > > > > > > > > > and who will call it?
> > > > > > > > > >
> > > > > > > > > > I mean the following scenario:
> > > > > > > > > > We receive a packet, find it is eligible for GRO and put it into
> > > > gro_table
> > > > > > > > > > in expectation that there would be more packets for the same
> > flow.
> > > > > > > > > > But it could happen that we would never (or for some long
> > time)
> > > > receive
> > > > > > > > > > any new packets for that flow.
> > > > > > > > > > So the first packet would never be delivered to the upper layer,
> > > > > > > > > > or delivered too late.
> > > > > > > > > > So we need a mechanism to extract such not merged packets
> > > > > > > > > > and push them to the upper layer.
> > > > > > > > > > My thought is it would be the application's responsibility to call it
> > > > from time to time.
> > > > > > > > > > That's actually another reason why I think we shouldn't force
> > > > > > > > > > applications to always use RX callbacks for GRO.
> > > > > > > > >
> > > > > > > > > Currently, we only provide one reassembly function,
> > > > rte_gro_reassemble_burst,
> > > > > > > > > which merges N input packets at a time. After finishing
> > > > processing these
> > > > > > > > > packets, it returns all of them and resets hashing tables.
> > Therefore,
> > > > there
> > > > > > > > > are no packets in hashing tables after rte_gro_reassemble_burst
> > > > returns.
> > > > > > > >
> > > > > > > > Ok, sorry I missed that part with rte_hash_reset().
> > > > > > > > So gro_reassemble_burst() performs only inline processing on
> > current
> > > > input packets
> > > > > > > > and doesn't try to save packets for future merging, correct?
> > > > > > >
> > > > > > > Yes.
> > > > > > >
> > > > > > > > Such an approach indeed is much more lightweight and doesn't require any
> > > > extra timeouts and flush().
> > > > > > > > So in my opinion let's keep it like that - nice and simple.
> > > > > > > > BTW, I think in that case we don't need any hash tables (or any
> > other
> > > > persistent structures) at all.
> > > > > > > > What we need is just a set of GROs (tcp4, tcp6, etc.) we want to
> > > > perform on a given array of packets.
> > > > > > >
> > > > > > > Besides the GRO types that we want to perform, maybe it also needs
> > > > max_pkt_size and
> > > > > > > some GRO type specific information?
> > > > > >
> > > > > > Yes, but we don't need the actual hash-tables, etc. inside.
> > > > > > Passing something like struct gro_param seems enough.
> > > > >
> > > > > Yes, we can just pass gro_param and allocate hashing tables
> > > > > inside rte_gro_reassemble_burst. If so, hashing tables of
> > > > > desired GRO types are created and freed in each invocation
> > > > > of rte_gro_reassemble_burst. In GRO library, hashing tables
> > > > > are created by GRO type specific gro_tbl_create_fn. These
> > > > > gro_tbl_create_fn may allocate hashing table space via malloc
> > > > > (or rte_malloc). Therefore, we may frequently call malloc/free
> > > > > when using rte_gro_reassemble_burst. In my opinion, it will
> > > > > degrade GRO performance greatly.
> > > >
> > > > I don't understand why we need to put/extract each packet into/from
> > > > a hash table at all.
> > > > We have N input packets that need to be grouped/sorted by some
> > criteria.
> > > > Surely that can be done without any hash table involved.
> > > > What is the need for a hash table and all the overhead it brings here?
> > >
> > > In the current design, I assume all GRO types use hash tables to merge
> > > packets. The key of the hash table is the criteria to merge packets.
> > > So the main difference for different GRO types' hash tables is the
> > > key definition.
> > >
> > > And the reason for using hash tables is to speed up reassembly. Given
> > > there are N input TCP packets, the simplest way to process packets[i]
> > > is to traverse the processed packets[0]~packets[i-1] and try to find one
> > > to merge. In the worst case, we need to check all of packets[0~i-1].
> > > In this case, the time complexity of processing N packets is O(N^2).
> > > If we use a hash table, whose key is the criteria to merge two packets,
> > > the time to find a packet that may be merged with packets[i] is O(1).
> >
> > I understand that, but add/search inside the hash table,
> > plus resetting it for every burst of packets, doesn't come for free either.
> > I think that for the most common burst sizes (< 100 packets)
> > the hash overhead would significantly outweigh the price of
> > the worst-case O(N^2) scan.
> 
> Yes, the packet burst size is always less than 100, which may amplify
> the cost of using hash tables. To avoid the high price, maybe a simpler
> structure, like rte_ip_frag_tbl in the IP reassembly library, is better. And
> actually, I have never tried other designs. In the next version, I will use
> a simpler structure to merge TCP packets and compare performance
> results. Thanks.
> 
> > Also, if such a worst case really worries you, you can pre-sort the
> > input packets first before starting the actual reassembly for them.
> 
> Suppose the N input packets consist of N1 TCP packets and N2 UDP
> packets. If we pre-sort them, what should they look like after sorting?
>

My thought was something like this:
<tcp_flow0 pkts>...<tcp_flowX pkts> <udp_flow0 pkts>...<udp_flowY pkts>
|------- N1 -------------------------------|--------- N2 -------------------------------|

Konstantin

 
> > That might help to bring the price down.
> > Konstantin
> 
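
The layout in Konstantin's diagram falls out of a comparator that orders
by L4 protocol first and by flow tuple second. A sketch, with the 5-tuple
struct and its extractor assumed for illustration:

#include <stdint.h>
#include <rte_mbuf.h>

struct flow5 {			/* hypothetical 5-tuple */
	uint8_t proto;		/* IPPROTO_TCP (6) sorts before IPPROTO_UDP (17) */
	uint32_t ip_src, ip_dst;
	uint16_t port_src, port_dst;
};

/* hypothetical: fill the 5-tuple from the packet headers */
void flow5_of(const struct rte_mbuf *m, struct flow5 *f);

static int
proto_flow_cmp(const void *a, const void *b)
{
	const struct rte_mbuf *ma = *(const struct rte_mbuf * const *)a;
	const struct rte_mbuf *mb = *(const struct rte_mbuf * const *)b;
	struct flow5 fa, fb;

	flow5_of(ma, &fa);
	flow5_of(mb, &fb);
	if (fa.proto != fb.proto)	/* all TCP flows group before all UDP flows */
		return fa.proto < fb.proto ? -1 : 1;
	/* same protocol: order by the rest of the tuple to group each flow */
	if (fa.ip_src != fb.ip_src)
		return fa.ip_src < fb.ip_src ? -1 : 1;
	if (fa.ip_dst != fb.ip_dst)
		return fa.ip_dst < fb.ip_dst ? -1 : 1;
	if (fa.port_src != fb.port_src)
		return fa.port_src < fb.port_src ? -1 : 1;
	if (fa.port_dst != fb.port_dst)
		return fa.port_dst < fb.port_dst ? -1 : 1;
	return 0;
}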

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
  2017-05-29 12:18                           ` Bruce Richardson
@ 2017-05-30 14:10                             ` Hu, Jiayu
  0 siblings, 0 replies; 141+ messages in thread
From: Hu, Jiayu @ 2017-05-30 14:10 UTC (permalink / raw)
  To: Richardson, Bruce; +Cc: Ananyev, Konstantin, dev, Wiles, Keith, yuanhan.liu

Hi Bruce,

> -----Original Message-----
> From: Richardson, Bruce
> Sent: Monday, May 29, 2017 8:19 PM
> To: Hu, Jiayu <jiayu.hu@intel.com>
> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; dev@dpdk.org;
> Wiles, Keith <keith.wiles@intel.com>; yuanhan.liu@linux.intel.com
> Subject: Re: [dpdk-dev] [PATCH v3 1/3] lib: add Generic Receive Offload API
> framework
> 
> On Mon, May 29, 2017 at 10:22:57AM +0000, Hu, Jiayu wrote:
> > Hi Konstantin,
> >
> > > -----Original Message-----
> > > From: Ananyev, Konstantin
> > > Sent: Sunday, May 28, 2017 12:51 AM
> > > To: Hu, Jiayu <jiayu.hu@intel.com>
> > > Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>;
> > > yuanhan.liu@linux.intel.com
> > > Subject: RE: [PATCH v3 1/3] lib: add Generic Receive Offload API
> framework
> > >
> > >
> > > Hi Jiayu,
> > >
> > > >
> > > > Hi Konstantin,
> > > >
> > > > On Sat, May 27, 2017 at 07:12:16PM +0800, Ananyev, Konstantin wrote:
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Hu, Jiayu
> > > > > > Sent: Saturday, May 27, 2017 4:42 AM
> > > > > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > > > > Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>;
> > > yuanhan.liu@linux.intel.com
> > > > > > Subject: Re: [PATCH v3 1/3] lib: add Generic Receive Offload API
> > > framework
> > > > > >
> > > > > > On Sat, May 27, 2017 at 07:10:21AM +0800, Ananyev, Konstantin
> wrote:
> > > > > > > Hi Jiayu,
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Hu, Jiayu
> > > > > > > > Sent: Friday, May 26, 2017 8:26 AM
> > > > > > > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > > > > > > Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>;
> > > yuanhan.liu@linux.intel.com
> > > > > > > > Subject: Re: [PATCH v3 1/3] lib: add Generic Receive Offload API
> > > framework
> > > > > > > >
> > > > > > > > Hi Konstantin,
> > > > > > > >
> > > > > > > > On Wed, May 24, 2017 at 08:38:25PM +0800, Ananyev,
> Konstantin
> > > wrote:
> > > > > > > > >
> > > > > > > > > Hi Jiayu,
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Hi Konstantin,
> > > > > > > > > >
> > > > > > > > > > Thanks for your comments. My replies/questions are below.
> > > > > > > > > >
> > > > > > > > > > BRs,
> > > > > > > > > > Jiayu
> > > > > > > > > >
> > > > > > > > > > On Mon, May 22, 2017 at 05:19:19PM +0800, Ananyev,
> > > Konstantin wrote:
> > > > > > > > > > > Hi Jiayu,
> > > > > > > > > > > My comments/questions below.
> > > > > > > > > > > Konstantin
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > For applications, DPDK GRO provides three external
> functions
> > > to
> > > > > > > > > > > > enable/disable GRO:
> > > > > > > > > > > > - rte_gro_init: initialize GRO environment;
> > > > > > > > > > > > - rte_gro_enable: enable GRO for a given port;
> > > > > > > > > > > > - rte_gro_disable: disable GRO for a given port.
> > > > > > > > > > > > Before using GRO, applications should explicitly call
> > > rte_gro_init to
> > > > > > > > > > > > initialize GRO environment. After that, applications can
> call
> > > > > > > > > > > > rte_gro_enable to enable GRO and call rte_gro_disable
> to
> > > disable GRO for
> > > > > > > > > > > > specific ports.
> > > > > > > > > > >
> > > > > > > > > > > I think this is too restrictive and wouldn't meet various
> users'
> > > needs.
> > > > > > > > > > > User might want to:
> > > > > > > > > > > - enable/disable GRO for particular RX queue
> > > > > > > > > > > - or even setup different GRO types for different RX queues,
> > > > > > > > > > >    i.e, - GRO over IPV4/TCP for queue 0, and  GRO over
> > > IPV6/TCP for queue 1, etc.
> > > > > > > > > >
> > > > > > > > > > The reason for enabling/disabling GRO per-port instead of
> per-
> > > queue is that LINUX
> > > > > > > > > > controls GRO per-port. To control GRO per-queue indeed can
> > > provide more flexibility
> > > > > > > > > > to applications. But are there any scenarios that different
> > > queues of a port may
> > > > > > > > > > require different GRO control (i.e. GRO types and
> enable/disable
> > > GRO)?
> > > > > > > > >
> > > > > > > > > I think yes.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > - For various reasons, user might prefer not to use RX
> > > > > > > > > > >   callbacks, but invoke gro() manually at some point in
> > > > > > > > > > >   his code.
> > > > > > > > > >
> > > > > > > > > > An application-facing GRO library can give more flexibility
> > > > > > > > > > to applications. Besides, when GRO is performed in the ethdev
> > > > > > > > > > layer or inside PMD drivers, it becomes unclear whether
> > > > > > > > > > rte_eth_rx_burst returns the number of packets actually
> > > > > > > > > > received or the number of packets after GRO. The same issue
> > > > > > > > > > happens with GSO, and even more seriously, because
> > > > > > > > > > applications like VPP always rely on the return value of
> > > > > > > > > > rte_eth_tx_burst to decide further operations. If
> > > > > > > > > > applications can directly call GRO/GSO libraries, this issue
> > > > > > > > > > won't exist. And DPDK is a library, not a holistic system
> > > > > > > > > > like Linux; we don't need to do the same as Linux. Therefore,
> > > > > > > > > > maybe it's a better idea to directly provide SW
> > > > > > > > > > segmentation/reassembly libraries to applications.
> > > > > > > > > >
> > > > > > > > > > > - Many users would like to control size (number of
> > > > > > > > > > >   flows/items per flow), max allowed packet size, max
> > > > > > > > > > >   timeout, etc., for different GRO tables.
> > > > > > > > > > > - User would need a way to flush all or only timed-out
> > > > > > > > > > >   packets from particular GRO tables.
> > > > > > > > > > >
> > > > > > > > > > > So I think that API needs to be extended to become much
> > > > > > > > > > > more fine-grained.
> > > > > > > > > > > Something like that:
> > > > > > > > > > >
> > > > > > > > > > > struct rte_gro_tbl_param {
> > > > > > > > > > >    int32_t socket_id;
> > > > > > > > > > >    size_t max_flows;
> > > > > > > > > > >    size_t max_items_per_flow;
> > > > > > > > > > >    size_t max_pkt_size;
> > > > > > > > > > >    uint64_t packet_timeout_cycles;
> > > > > > > > > > >    <desired GRO types (IPV4_TCP | IPV6_TCP, ...)>
> > > > > > > > > > >   <probably type specific params>
> > > > > > > > > > >   ...
> > > > > > > > > > > };
> > > > > > > > > > >
> > > > > > > > > > > struct rte_gro_tbl;
> > > > > > > > > > > struct rte_gro_tbl *rte_gro_tbl_create(const struct
> > > > > > > > > > > rte_gro_tbl_param *param);
> > > > > > > > > > >
> > > > > > > > > > > void rte_gro_tbl_destroy(struct rte_gro_tbl *tbl);
> > > > > > > > > >
> > > > > > > > > > Yes, I agree with you. It's necessary to provide more fine-
> > > grained control APIs to
> > > > > > > > > > applications. But what's 'packet_timeout_cycles' used for? Is
> it
> > > for TCP packets?
> > > > > > > > >
> > > > > > > > > For any packet that sits in the gro_table for too long.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > /*
> > > > > > > > > > >  * process packets, might store some packets inside the
> GRO
> > > table,
> > > > > > > > > > >  * returns number of filled entries in pkt[]
> > > > > > > > > > >  */
> > > > > > > > > > > uint32_t rte_gro_tbl_process(struct rte_gro_tbl *tbl, struct
> > > rte_mbuf *pkt[], uint32_t num);
> > > > > > > > > > >
> > > > > > > > > > > /*
> > > > > > > > > > >   * retrieves up to num timed-out packets from the table.
> > > > > > > > > > >   */
> > > > > > > > > > > uint32_t rte_gro_tbl_timeout(struct rte_gro_tbl *tbl,
> > > > > > > > > > > 		uint64_t tmt, struct rte_mbuf *pkt[], uint32_t num);
> > > > > > > > > >
> > > > > > > > > > Currently, we implement GRO as an RX callback, whose
> > > > > > > > > > processing logic is simple: receive burst packets -> perform
> > > > > > > > > > GRO -> return to application. GRO stops after finishing
> > > > > > > > > > processing the received packets. If we provide
> > > > > > > > > > rte_gro_tbl_timeout, who will call it, and when?
> > > > > > > > >
> > > > > > > > > I mean the following scenario:
> > > > > > > > > We receive a packet, find it is eligible for GRO and put it into
> > > gro_table
> > > > > > > > > in expectation - there would be more packets for the same
> flow.
> > > > > > > > > But it could happen that we would never (or for some long
> time)
> > > receive
> > > > > > > > > any new packets for that flow.
> > > > > > > > > So the first packet would never be delivered to the upper layer,
> > > > > > > > > or delivered too late.
> > > > > > > > > So we need a mechanism to extract such not merged packets
> > > > > > > > > and push them to the upper layer.
> > > > > > > > > My thought is it would be the application's responsibility to
> > > > > > > > > call it from time to time.
> > > > > > > > > That's actually another reason why I think we shouldn't force
> > > > > > > > > applications to always use RX callbacks for GRO.
> > > > > > > >
> > > > > > > > Currently, we only provide one reassembly function,
> > > > > > > > rte_gro_reassemble_burst, which merges N input packets at a
> > > > > > > > time. After finishing processing these packets, it returns all
> > > > > > > > of them and resets the hashing tables. Therefore, there are no
> > > > > > > > packets in the hashing tables after rte_gro_reassemble_burst
> > > > > > > > returns.
> > > > > > >
> > > > > > > Ok, sorry I missed that part with rte_hash_reset().
> > > > > > > So gro_reassemble_burst() performs only inline processing on
> > > > > > > current input packets and doesn't try to save packets for future
> > > > > > > merging, correct?
> > > > > >
> > > > > > Yes.
> > > > > >
> > > > > > > Such an approach indeed is very lightweight and doesn't require
> > > > > > > any extra timeouts and flush().
> > > > > > > So in my opinion let's keep it like that - nice and simple.
> > > > > > > BTW, I think in that case we don't need any hash tables (or any
> > > > > > > other persistent structures) at all.
> > > > > > > What we need is just a set of GROs (tcp4, tcp6, etc.) we want to
> > > > > > > perform on a given array of packets.
> > > > > >
> > > > > > Besides the GRO types to perform, maybe it also needs
> > > > > > max_pkt_size and some GRO-type-specific information?
> > > > >
> > > > > Yes, but we don't need the actual hash-tables, etc. inside.
> > > > > Passing something like struct gro_param seems enough.
> > > >
> > > > Yes, we can just pass gro_param and allocate hashing tables
> > > > inside rte_gro_reassemble_burst. If so, hashing tables of
> > > > desired GRO types are created and freed in each invocation
> > > > of rte_gro_reassemble_burst. In GRO library, hashing tables
> > > > are created by GRO type specific gro_tbl_create_fn. These
> > > > gro_tbl_create_fn may allocate hashing table space via malloc
> > > > (or rte_malloc). Therefore, we may frequently call malloc/free
> > > > when using rte_gro_reassemble_burst. In my opinion, it will
> > > > degrade GRO performance greatly.
> > >
> > > I don't understand why we need to put/extract each packet into/from
> > > a hash table at all.
> > > We have N input packets that need to be grouped/sorted by some
> > > criteria.
> > > Surely that can be done without any hash table involved.
> > > What is the need for a hash table and all the overhead it brings here?
> >
> > In current design, I assume all GRO types use hash tables to merge
> > packets. The key of the hash table is the criteria to merge packets.
> > So the main difference for different GRO types' hash tables is the
> > key definition.
> >
> > And the reason for using hash tables is to speed up reassembly. Given
> > there are N TCP packets as input, the simplest way to process packets[i]
> > is to traverse the processed packets[0]~packets[i-1] and try to find one
> > to merge. In the worst case, we need to check all of packets[0~i-1].
> > In this case, the time complexity of processing N packets is O(N^2).
> > If we use a hash table, whose key is the criteria to merge two packets,
> > the time to find a packet that may be merged with packets[i] is O(1).
> >
> > Do you think it's too complicated?
> >
> > Jiayu
> >
> How big is your expected burst size? If you are expecting 32 or 64

In the current design, if the packet burst size is always small (e.g. less
than 32 or 64), applications can use a small hash table to reduce the
overhead from the hash table. But it's not a good solution, since the cost
is still high when the burst size is small.

> packets per call, then N is small and the overhead of the hash table
> seems a bit much. Perhaps you need different code paths for bigger and
> smaller bursts?

Yes, if hash tables can bring much better performance for scenarios where
N is large, maybe we can consider using two code paths, as sketched below.
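
For illustration, a rough sketch of what such a split might look like
(everything below is hypothetical and not part of any patch; pkt,
flow_key_t and the helper functions are stand-ins for the real
structures and merge logic):

	#include <stdint.h>
	#include <string.h>

	#define SMALL_BURST_THRESHOLD 64 /* tuning point; needs measurement */

	struct pkt;                          /* opaque packet handle */
	typedef struct { uint64_t v[2]; } flow_key_t;

	extern void get_flow_key(const struct pkt *p, flow_key_t *k);
	extern int try_merge(struct pkt *dst, struct pkt *src);
	extern uint32_t hash_key(const flow_key_t *k);

	/* O(N^2) path: no setup cost, cheap for small bursts */
	static uint16_t
	merge_burst_linear(struct pkt **pkts, uint16_t n)
	{
		uint16_t i, j, out = n;
		flow_key_t ki, kj;

		for (i = 1; i < n; i++) {
			get_flow_key(pkts[i], &ki);
			for (j = 0; j < i; j++) {
				if (pkts[j] == NULL)
					continue;
				get_flow_key(pkts[j], &kj);
				if (memcmp(&ki, &kj, sizeof(ki)) == 0 &&
						try_merge(pkts[j], pkts[i]) > 0) {
					pkts[i] = NULL; /* absorbed by pkts[j] */
					out--;
					break;
				}
			}
		}
		return out;
	}

	/* hashed path: one lookup per packet, pays off for large N */
	static uint16_t
	merge_burst_hashed(struct pkt **pkts, uint16_t n)
	{
		struct slot { flow_key_t key; struct pkt *head; } slots[256];
		uint16_t i, out = n;
		flow_key_t k;
		uint32_t h;

		memset(slots, 0, sizeof(slots));
		for (i = 0; i < n; i++) {
			get_flow_key(pkts[i], &k);
			h = hash_key(&k) & 255;
			/* linear probing; a real table must handle overflow
			 * and keep per-flow packet chains */
			while (slots[h].head != NULL &&
					memcmp(&slots[h].key, &k, sizeof(k)) != 0)
				h = (h + 1) & 255;
			if (slots[h].head == NULL) {
				slots[h].key = k;
				slots[h].head = pkts[i];
			} else if (try_merge(slots[h].head, pkts[i]) > 0) {
				pkts[i] = NULL;
				out--;
			}
		}
		return out;
	}

	uint16_t
	merge_burst(struct pkt **pkts, uint16_t n)
	{
		return n <= SMALL_BURST_THRESHOLD ?
			merge_burst_linear(pkts, n) : merge_burst_hashed(pkts, n);
	}

The threshold would have to come from measurement; the point is only
that the table setup cost is paid once per burst, and only when the
burst is large enough to amortize it.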

Jiayu 

> 
> /Bruce

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v3 3/3] app/testpmd: enable GRO feature
  2017-04-24  8:09     ` [PATCH v3 3/3] app/testpmd: enable GRO feature Jiayu Hu
@ 2017-06-07  9:24       ` Wu, Jingjing
  0 siblings, 0 replies; 141+ messages in thread
From: Wu, Jingjing @ 2017-06-07  9:24 UTC (permalink / raw)
  To: Hu, Jiayu, dev; +Cc: Ananyev, Konstantin, Wiles, Keith, yuanhan.liu, Hu, Jiayu



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jiayu Hu
> Sent: Monday, April 24, 2017 4:10 PM
> To: dev@dpdk.org
> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Wiles, Keith
> <keith.wiles@intel.com>; yuanhan.liu@linux.intel.com; Hu, Jiayu <jiayu.hu@intel.com>
> Subject: [dpdk-dev] [PATCH v3 3/3] app/testpmd: enable GRO feature
> 
> This patch demonstrates the usage of the GRO library in testpmd. By
> default, GRO is turned off. The command "gro on (port_id)" turns on GRO
> for the given port; the command "gro off (port_id)" turns it off. Note
> that current GRO only supports TCP IPv4 packets.
> 
> Once the feature is turned on, all received packets go through the GRO
> procedure before being returned from rte_eth_rx_burst.
> 
> Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
> ---
>  app/test-pmd/cmdline.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
>  app/test-pmd/config.c  | 26 ++++++++++++++++++++++++++
>  app/test-pmd/iofwd.c   |  1 +
>  app/test-pmd/testpmd.c |  5 +++++
>  app/test-pmd/testpmd.h |  3 +++
>  5 files changed, 80 insertions(+)
> 
> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
> index f6bd75b..200ac3c 100644
> --- a/app/test-pmd/cmdline.c
> +++ b/app/test-pmd/cmdline.c
> @@ -76,6 +76,7 @@
>  #include <rte_devargs.h>
>  #include <rte_eth_ctrl.h>
>  #include <rte_flow.h>
> +#include <rte_gro.h>
> 
>  #include <cmdline_rdline.h>
>  #include <cmdline_parse.h>
> @@ -420,6 +421,9 @@ static void cmd_help_long_parsed(void *parsed_result,
>  			"tso show (portid)"
>  			"    Display the status of TCP Segmentation Offload.\n\n"
> 
> +			"gro (on|off) (port_id)"
> +			"    Enable or disable Generic Receive Offload.\n\n"
> +
>  			"set fwd (%s)\n"
>  			"    Set packet forwarding mode.\n\n"
> 
> @@ -3825,6 +3829,46 @@ cmdline_parse_inst_t cmd_tunnel_tso_show = {
>  	},
>  };
> 
> +/* *** SET GRO FOR A PORT *** */
> +struct cmd_gro_result {
> +	cmdline_fixed_string_t cmd_keyword;
> +	cmdline_fixed_string_t mode;
> +	uint8_t port_id;
> +};
> +
> +static void
> +cmd_set_gro_parsed(void *parsed_result,
> +		__attribute__((unused)) struct cmdline *cl,
> +		__attribute__((unused)) void *data)
> +{
> +	struct cmd_gro_result *res;
> +
> +	res = parsed_result;
> +	setup_gro(res->mode, res->port_id);
> +}
> +
> +cmdline_parse_token_string_t cmd_gro_keyword =
> +	TOKEN_STRING_INITIALIZER(struct cmd_gro_result,
> +			cmd_keyword, "gro");
> +cmdline_parse_token_string_t cmd_gro_mode =
> +	TOKEN_STRING_INITIALIZER(struct cmd_gro_result,
> +			mode, NULL);
You can define gro_mode like
cmdline_parse_token_string_t cmd_gro_mode =
	TOKEN_STRING_INITIALIZER(struct cmd_gro_result,
			mode, "on#off");
Then only two options are valid in this command.

> +cmdline_parse_token_num_t cmd_gro_pid =
> +	TOKEN_NUM_INITIALIZER(struct cmd_gro_result,
> +			port_id, UINT8);
> +
> +cmdline_parse_inst_t cmd_set_gro = {
> +	.f = cmd_set_gro_parsed,
> +	.data = NULL,
> +	.help_str = "gro (on|off) (port_id)",
> +	.tokens = {
> +		(void *)&cmd_gro_keyword,
> +		(void *)&cmd_gro_mode,
> +		(void *)&cmd_gro_pid,
> +		NULL,
> +	},
> +};
> +
>  /* *** ENABLE/DISABLE FLUSH ON RX STREAMS *** */
>  struct cmd_set_flush_rx {
>  	cmdline_fixed_string_t set;
> @@ -13592,6 +13636,7 @@ cmdline_parse_ctx_t main_ctx[] = {
>  	(cmdline_parse_inst_t *)&cmd_tso_show,
>  	(cmdline_parse_inst_t *)&cmd_tunnel_tso_set,
>  	(cmdline_parse_inst_t *)&cmd_tunnel_tso_show,
> +	(cmdline_parse_inst_t *)&cmd_set_gro,
>  	(cmdline_parse_inst_t *)&cmd_link_flow_control_set,
>  	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_rx,
>  	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_tx,
> diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
> index ef07925..525f83b 100644
> --- a/app/test-pmd/config.c
> +++ b/app/test-pmd/config.c
> @@ -97,6 +97,7 @@
>  #ifdef RTE_LIBRTE_IXGBE_PMD
>  #include <rte_pmd_ixgbe.h>
>  #endif
> +#include <rte_gro.h>
> 
>  #include "testpmd.h"
> 
> @@ -2423,6 +2424,31 @@ set_tx_pkt_segments(unsigned *seg_lengths, unsigned nb_segs)
>  	tx_pkt_nb_segs = (uint8_t) nb_segs;
>  }
> 
> +void
> +setup_gro(const char *mode, uint8_t port_id)
> +{
> +	if (!rte_eth_dev_is_valid_port(port_id)) {
> +		printf("invalid port id %u\n", port_id);
> +		return;
> +	}
> +	if (strcmp(mode, "on") == 0) {
> +		if (test_done == 0) {
> +			printf("before enabling GRO,"
> +					" please stop forwarding first\n");
> +			return;
> +		}
> +		rte_gro_enable(port_id, rte_eth_dev_socket_id(port_id));
> +	} else if (strcmp(mode, "off") == 0) {
> +		if (test_done == 0) {
> +			printf("before disabling GRO,"
> +					" please stop forwarding first\n");
> +			return;
> +		}
> +		rte_gro_disable(port_id);
> +	} else
> +		printf("unsupported GRO mode\n");

When you change the gro mode token to "on#off", the else branch is not needed.
And you can move the forwarding check to the beginning of this function.

Lastly, you also need to update the testpmd doc (doc/guides/testpmd_app_ug/) for the new command.

Thanks
Jingjing
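
For reference, since setup_gro() refuses to change the GRO state while
forwarding is running, the intended command sequence would presumably be
(port 0 is just an example):

	testpmd> stop
	testpmd> gro on 0
	testpmd> start

with "gro off 0" following the same stop/start pattern.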

^ permalink raw reply	[flat|nested] 141+ messages in thread

* [PATCH v4 0/3] Support TCP/IPv4 GRO in DPDK
  2017-04-24  8:09   ` [PATCH v3 0/3] support GRO in DPDK Jiayu Hu
                       ` (2 preceding siblings ...)
  2017-04-24  8:09     ` [PATCH v3 3/3] app/testpmd: enable GRO feature Jiayu Hu
@ 2017-06-07 11:08     ` Jiayu Hu
  2017-06-07 11:08       ` [PATCH v4 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
                         ` (3 more replies)
  3 siblings, 4 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-06-07 11:08 UTC (permalink / raw)
  To: dev
  Cc: konstantin.ananyev, keith.wiles, yliu, jianfeng.tan, tiwei.bie, Jiayu Hu

Generic Receive Offload (GRO) is a widely used SW-based offloading
technique to reduce per-packet processing overhead. It gains performance
by reassembling small packets into large ones. Therefore, we propose to
support GRO in DPDK.

To give more flexibility to applications, DPDK GRO is implemented as
a user library. Applications explicitly use the GRO library to merge
small packets into large ones. DPDK GRO provides two reassembly modes:
one is called lightweight mode, the other heavyweight mode.
If applications want to merge packets in a simple way, they can use
lightweight mode. If applications need more fine-grained control,
they can choose heavyweight mode.

This patchset is to support TCP/IPv4 GRO in DPDK. The first patch is to
provide a GRO API framework. The second patch is to support TCP/IPv4 GRO.
The last patch demonstrates how to use GRO library in app/testpmd.

We perform two iperf tests (with DPDK GRO and without DPDK GRO) to see
the performance gains from DPDK GRO. Specifically, the experiment
environment is:
a. Two 10Gbps physical ports (p0 and p1) on one host are linked together;
b. p0 is in network namespace ns1, whose IP is 1.1.2.3. The iperf client
runs on p0 and sends TCP/IPv4 packets;
c. testpmd runs on p1. Besides, testpmd has a vdev which connects to a
VM via vhost-user and virtio-net. The VM runs the iperf server, whose IP
is 1.1.2.4. The OS in the VM is Ubuntu 14.04;
d. p0 turns on TSO; VM turns off kernel GRO; testpmd runs in iofwd mode.
iperf client and server use the following commands:
	- client: ip netns exec ns1 iperf -c 1.1.2.4 -i2 -t 60 -f g -m
	- server: iperf -s -f g
Two test cases are:
a. w/o DPDK GRO: run testpmd without GRO
b. w DPDK GRO: testpmd enables GRO for p1
Result:
With GRO, the throughput improvement is around 40%.

Change log
==========
v4:
- implement DPDK GRO as a library used directly by applications
- introduce lightweight and heavyweight working modes to enable
	fine-grained control for applications
- replace cuckoo hash tables with simpler table structure
v3:
- fix compilation issues.
v2:
- provide generic reassembly function;
- implement GRO as a device ability:
add APIs for devices to support GRO;
add APIs for applications to enable/disable GRO;
- update testpmd example. 

Jiayu Hu (3):
  lib: add Generic Receive Offload API framework
  lib/gro: add TCP/IPv4 GRO support
  app/testpmd: enable TCP/IPv4 GRO

 app/test-pmd/cmdline.c       |  45 ++++
 app/test-pmd/config.c        |  29 +++
 app/test-pmd/iofwd.c         |   6 +
 app/test-pmd/testpmd.c       |   3 +
 app/test-pmd/testpmd.h       |  11 +
 config/common_base           |   5 +
 lib/Makefile                 |   1 +
 lib/librte_gro/Makefile      |  51 +++++
 lib/librte_gro/rte_gro.c     | 243 +++++++++++++++++++++
 lib/librte_gro/rte_gro.h     | 216 ++++++++++++++++++
 lib/librte_gro/rte_gro_tcp.c | 509 +++++++++++++++++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro_tcp.h | 206 +++++++++++++++++
 mk/rte.app.mk                |   1 +
 13 files changed, 1326 insertions(+)
 create mode 100644 lib/librte_gro/Makefile
 create mode 100644 lib/librte_gro/rte_gro.c
 create mode 100644 lib/librte_gro/rte_gro.h
 create mode 100644 lib/librte_gro/rte_gro_tcp.c
 create mode 100644 lib/librte_gro/rte_gro_tcp.h

-- 
2.7.4

^ permalink raw reply	[flat|nested] 141+ messages in thread

* [PATCH v4 1/3] lib: add Generic Receive Offload API framework
  2017-06-07 11:08     ` [PATCH v4 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
@ 2017-06-07 11:08       ` Jiayu Hu
  2017-06-07 11:08       ` [PATCH v4 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-06-07 11:08 UTC (permalink / raw)
  To: dev
  Cc: konstantin.ananyev, keith.wiles, yliu, jianfeng.tan, tiwei.bie, Jiayu Hu

Generic Receive Offload (GRO) is a widely used SW-based offloading
technique to reduce per-packet processing overhead. It gains performance
by reassembling small packets into large ones. This patchset supports
GRO in DPDK; this patch implements the GRO API framework.

To give more flexibility to applications, DPDK GRO is implemented as
a user library. Applications explicitly use the GRO library to merge
small packets into large ones. DPDK GRO provides two reassembly modes:
one is called lightweight mode, the other heavyweight mode.
If applications want to merge packets in a simple way, they can use
lightweight mode. If applications need more fine-grained control,
they can choose heavyweight mode.

rte_gro_reassemble_burst is the main reassembly API used in
lightweight mode; it processes N packets at a time. For applications,
performing GRO in lightweight mode is simple. They just need to invoke
rte_gro_reassemble_burst. Applications can get GROed packets as soon as
rte_gro_reassemble_burst returns.

rte_gro_reassemble is the main reassembly API used in heavyweight
mode; it processes one packet at a time. For applications, performing
GRO in heavyweight mode is relatively complicated. Before performing
GRO, applications need to create a GRO table with rte_gro_tbl_create.
Then they can use rte_gro_reassemble to process packets one by one.
The processed packets are kept in the GRO table. If applications want
to get them, they need to flush them manually via the flush APIs, as
the sketch below illustrates.
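
To make the two modes concrete, below is a minimal usage sketch against
the APIs declared in this patch. GRO_TCP_IPV4 comes from the next patch
in the series; the table sizes, the timeout value, and the assumption
that max_timeout_cycles is expressed in TSC cycles are all illustrative:

	#include <rte_mbuf.h>
	#include <rte_lcore.h>
	#include <rte_cycles.h>
	#include "rte_gro.h"

	/* lightweight mode: one call per RX burst; merged packets are
	 * available in pkts[] as soon as the call returns */
	static uint16_t
	gro_light(struct rte_mbuf **pkts, uint16_t nb_pkts)
	{
		struct rte_gro_param param = {
			.max_flow_num = 16,
			.max_item_per_flow = 32,
			.desired_gro_types = GRO_TCP_IPV4,
			.max_packet_size = 65535,
		};

		return rte_gro_reassemble_burst(pkts, nb_pkts, param);
	}

	/* heavyweight mode: packets accumulate in a long-lived table and
	 * must be flushed explicitly (assumes out_sz >= nb_pkts) */
	static uint16_t
	gro_heavy(struct rte_gro_tbl *tbl, struct rte_mbuf **pkts,
			uint16_t nb_pkts, struct rte_mbuf **out,
			uint16_t out_sz)
	{
		uint16_t i, nb_out = 0;

		for (i = 0; i < nb_pkts; i++)
			if (rte_gro_reassemble(pkts[i], tbl) < 0)
				/* error: pass the packet through untouched */
				out[nb_out++] = pkts[i];
		/* push out whatever has waited too long in the table */
		nb_out += rte_gro_timeout_flush(tbl, GRO_TCP_IPV4,
				&out[nb_out], out_sz - nb_out);
		return nb_out;
	}

where tbl would be created once at setup time, e.g.:

	struct rte_gro_tbl *tbl = rte_gro_tbl_create(rte_socket_id(),
			16, 32, 65535, rte_get_tsc_hz() / 1000,
			GRO_TCP_IPV4);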

In DPDK GRO, different GRO types define their own reassembly tables.
When a GRO table is created, it keeps the reassembly tables of the
desired GRO types. To process one packet, we first locate the
corresponding reassembly table according to the packet type. Then we
search that reassembly table for an existing packet to merge with. If
one is found, the two packets are chained together. If not, the packet
is inserted into the reassembly table. If the packet has a wrong
checksum, is fragmented, etc., an error happens and the reassembly
function stops processing the packet.

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
---
 config/common_base       |   5 ++
 lib/Makefile             |   1 +
 lib/librte_gro/Makefile  |  50 +++++++++++
 lib/librte_gro/rte_gro.c | 125 ++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro.h | 212 +++++++++++++++++++++++++++++++++++++++++++++++
 mk/rte.app.mk            |   1 +
 6 files changed, 394 insertions(+)
 create mode 100644 lib/librte_gro/Makefile
 create mode 100644 lib/librte_gro/rte_gro.c
 create mode 100644 lib/librte_gro/rte_gro.h

diff --git a/config/common_base b/config/common_base
index c858769..c237187 100644
--- a/config/common_base
+++ b/config/common_base
@@ -718,6 +718,11 @@ CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
 CONFIG_RTE_LIBRTE_PMD_VHOST=n
 
 #
+# Compile GRO library
+#
+CONFIG_RTE_LIBRTE_GRO=y
+
+#
 #Compile Xen domain0 support
 #
 CONFIG_RTE_LIBRTE_XEN_DOM0=n
diff --git a/lib/Makefile b/lib/Makefile
index 07e1fd0..e253053 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -106,6 +106,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
 DEPDIRS-librte_reorder := librte_eal librte_mempool librte_mbuf
 DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
 DEPDIRS-librte_pdump := librte_eal librte_mempool librte_mbuf librte_ether
+DIRS-$(CONFIG_RTE_LIBRTE_GRO) += librte_gro
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
new file mode 100644
index 0000000..9f4063a
--- /dev/null
+++ b/lib/librte_gro/Makefile
@@ -0,0 +1,50 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_gro.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+
+EXPORT_MAP := rte_gro_version.map
+
+LIBABIVER := 1
+
+# source files
+SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
+
+# install this header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
new file mode 100644
index 0000000..ca6b0d2
--- /dev/null
+++ b/lib/librte_gro/rte_gro.c
@@ -0,0 +1,125 @@
+/*-
+ *
+ *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+
+#include "rte_gro.h"
+
+static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB];
+static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NB];
+
+struct rte_gro_tbl *rte_gro_tbl_create(uint16_t socket_id,
+		uint16_t max_flow_num,
+		uint16_t max_item_per_flow,
+		uint32_t max_packet_size,
+		uint64_t max_timeout_cycles,
+		uint64_t desired_gro_types)
+{
+	gro_tbl_create_fn create_tbl_fn;
+	struct rte_gro_tbl *gro_tbl;
+	uint64_t gro_type_flag = 0;
+	uint8_t i;
+
+	gro_tbl = rte_zmalloc_socket(__func__,
+			sizeof(struct rte_gro_tbl),
+			RTE_CACHE_LINE_SIZE,
+			socket_id);
+	gro_tbl->max_packet_size = max_packet_size;
+	gro_tbl->max_timeout_cycles = max_timeout_cycles;
+	gro_tbl->desired_gro_types = desired_gro_types;
+
+	for (i = 0; i < GRO_TYPE_MAX_NB; i++) {
+		gro_type_flag = 1 << i;
+		if (desired_gro_types & gro_type_flag) {
+			create_tbl_fn = tbl_create_functions[i];
+			if (create_tbl_fn)
+				gro_tbl->tbls[i] = create_tbl_fn(socket_id,
+						max_flow_num,
+						max_item_per_flow);
+			else
+				gro_tbl->tbls[i] = NULL;
+		}
+	}
+	return gro_tbl;
+}
+
+void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl)
+{
+	gro_tbl_destroy_fn destroy_tbl_fn;
+	uint64_t gro_type_flag;
+	uint8_t i;
+
+	if (gro_tbl == NULL)
+		return;
+	for (i = 0; i < GRO_TYPE_MAX_NB; i++) {
+		gro_type_flag = 1 << i;
+		if (gro_tbl->desired_gro_types & gro_type_flag) {
+			destroy_tbl_fn = tbl_destroy_functions[i];
+			if (destroy_tbl_fn)
+				destroy_tbl_fn(gro_tbl->tbls[i]);
+			gro_tbl->tbls[i] = NULL;
+		}
+	}
+	rte_free(gro_tbl);
+}
+
+uint16_t
+rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
+		const uint16_t nb_pkts,
+		const struct rte_gro_param param __rte_unused)
+{
+	return nb_pkts;
+}
+
+int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
+		struct rte_gro_tbl *gro_tbl __rte_unused)
+{
+	return -1;
+}
+
+uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
+		uint64_t desired_gro_types __rte_unused,
+		uint16_t flush_num __rte_unused,
+		struct rte_mbuf **out __rte_unused,
+		const uint16_t max_nb_out __rte_unused)
+{
+	return 0;
+}
+
+uint16_t
+rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
+		uint64_t desired_gro_types __rte_unused,
+		struct rte_mbuf **out __rte_unused,
+		const uint16_t max_nb_out __rte_unused)
+{
+	return 0;
+}
diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
new file mode 100644
index 0000000..7fe11a6
--- /dev/null
+++ b/lib/librte_gro/rte_gro.h
@@ -0,0 +1,212 @@
+/*-
+ *
+ *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_GRO_H_
+#define _RTE_GRO_H_
+
+/* maximum number of supported GRO types */
+#define GRO_TYPE_MAX_NB 64
+#define GRO_TYPE_SUPPORT_NB 0	/**< current supported GRO num */
+
+/**
+ * GRO table structure. DPDK GRO uses GRO tables to reassemble
+ * packets. In heavyweight mode, applications must create GRO tables
+ * before performing GRO. However, in lightweight mode, applications
+ * don't need to create GRO tables.
+ *
+ * A GRO table object stores many reassembly tables of desired
+ * GRO types.
+ */
+struct rte_gro_tbl {
+	/* table addresses of desired GRO types */
+	void *tbls[GRO_TYPE_MAX_NB];
+	uint64_t desired_gro_types;	/**< GRO types to perform */
+	/**
+	 * the maximum time of packets staying in GRO tables, measured in
+	 * nanosecond.
+	 */
+	uint64_t max_timeout_cycles;
+	/* the maximum length of merged packet, measured in byte */
+	uint32_t max_packet_size;
+};
+
+/**
+ * In lightweight mode, applications use this structure to pass the
+ * needed parameters to rte_gro_reassemble_burst.
+ */
+struct rte_gro_param {
+	uint16_t max_flow_num;	/**< max flow number */
+	uint16_t max_item_per_flow;	/**< max item number per flow */
+	/**
+	 * It indicates the GRO types that applications want to perform,
+	 * whose value is the result of OR operation on GRO type flags.
+	 */
+	uint64_t desired_gro_types;
+	/* the maximum packet size after merging */
+	uint32_t max_packet_size;
+};
+
+typedef void *(*gro_tbl_create_fn)(uint16_t socket_id,
+		uint16_t max_flow_num,
+		uint16_t max_item_per_flow);
+typedef void (*gro_tbl_destroy_fn)(void *tbl);
+
+/**
+ * This function creates a GRO table, which is used to merge packets.
+ *
+ * @param socket_id
+ *  socket index where the Ethernet port connects to.
+ * @param max_flow_num
+ *  the maximum flow number in the GRO table
+ * @param max_item_per_flow
+ *  the maximum packet number per flow
+ * @param max_packet_size
+ *  the maximum size of merged packets, which is measured in bytes.
+ * @param max_timeout_cycles
+ *  the maximum time that a packet can stay in the GRO table.
+ * @param desired_gro_types
+ *  GRO types that applications want to perform. Its value is the
+ *  result of an OR operation on the desired GRO type flags.
+ * @return
+ *  On success, return a pointer to the GRO table. Otherwise, return
+ *  NULL.
+ */
+struct rte_gro_tbl *rte_gro_tbl_create(uint16_t socket_id,
+		uint16_t max_flow_num,
+		uint16_t max_item_per_flow,
+		uint32_t max_packet_size,
+		uint64_t max_timeout_cycles,
+		uint64_t desired_gro_types);
+/**
+ * This function destroys a GRO table.
+ */
+void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl);
+
+/**
+ * This is the main reassembly API used in lightweight mode, which
+ * merges a burst of packets at a time. After it returns, applications
+ * can get GROed packets immediately. Applications don't need to
+ * flush packets manually. In lightweight mode, applications just need
+ * to tell the reassembly API what rules should be applied when merging
+ * packets. Therefore, applications can perform GRO in a very simple
+ * way.
+ *
+ * To process one packet, we find its corresponding reassembly table
+ * according to the packet type. Then we search that reassembly table
+ * for a packet to merge with. If one is found, the two packets are
+ * chained together; if not, the input packet is inserted into the
+ * reassembly table. Merging two packets simply chains them together;
+ * no memory copy is needed. Before rte_gro_reassemble_burst returns,
+ * header checksums of merged packets are re-calculated.
+ *
+ * @param pkts
+ *  a pointer array which points to the packets to reassemble. After
+ *  GRO, it is also used to keep GROed packets.
+ * @param nb_pkts
+ *  the number of packets to reassemble.
+ * @param param
+ *  Applications use param to tell rte_gro_reassemble_burst what rules
+ *  are required.
+ * @return
+ *  the number of packets after GRO.
+ */
+uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
+		const uint16_t nb_pkts __rte_unused,
+		const struct rte_gro_param param __rte_unused);
+
+/**
+ * This is the main reassembly API used in heavyweight mode, which
+ * merges one packet at a time. The procedure of merging one packet is
+ * similar to rte_gro_reassemble_burst's. But rte_gro_reassemble will
+ * not update header checksums. Header checksums of merged packets are
+ * re-calculated in flush APIs.
+ *
+ * If an error happens, e.g. the packet has a bad checksum or an
+ * unsupported GRO type, the input packet won't be stored in the GRO
+ * table. If no error happens, the packet is either merged with an
+ * existing packet, or inserted into its corresponding reassembly table.
+ * Applications can retrieve packets from the GRO table via the flush APIs.
+ *
+ * @param pkt
+ *  packet to reassemble.
+ * @param gro_tbl
+ *  a pointer to a GRO table.
+ * @return
+ *  if the packet is merged successfully, return a positive value; if it
+ *  is stored without merging, return zero; on error, a negative value.
+ */
+int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
+		struct rte_gro_tbl *gro_tbl __rte_unused);
+
+/**
+ * This function flushes packets of desired GRO types from their
+ * corresponding reassembly tables.
+ *
+ * @param gro_tbl
+ *  a pointer to a GRO table object.
+ * @param desired_gro_types
+ *  GRO types whose packets will be flushed.
+ * @param flush_num
+ *  the number of packets that need flushing.
+ * @param out
+ *  a pointer array that is used to keep flushed packets.
+ * @param max_nb_out
+ *  the size of out.
+ * @return
+ *  the number of flushed packets. If no packets are flushed, return 0.
+ */
+uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
+		uint64_t desired_gro_types __rte_unused,
+		uint16_t flush_num __rte_unused,
+		struct rte_mbuf **out __rte_unused,
+		const uint16_t max_nb_out __rte_unused);
+
+/**
+ * This function flushes the timeout packets from reassembly tables of
+ * desired GRO types.
+ *
+ * @param gro_tbl
+ *  a pointer to a GRO table object.
+ * @param desired_gro_types
+ *  rte_gro_timeout_flush only processes packets which belong to the
+ *  GRO types specified by desired_gro_types.
+ * @param out
+ *  a pointer array that is used to keep flushed packets.
+ * @param max_nb_out
+ *  the size of out.
+ * @return
+ *  the number of flushed packets. If no packets are flushed, return 0.
+ */
+uint16_t rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
+		uint64_t desired_gro_types __rte_unused,
+		struct rte_mbuf **out __rte_unused,
+		const uint16_t max_nb_out __rte_unused);
+#endif
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index bcaf1b3..fc3776d 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -98,6 +98,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_RING)           += -lrte_ring
 _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
 _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER)        += -lrte_reorder
+_LDLIBS-$(CONFIG_RTE_LIBRTE_GRO)        	+= -lrte_gro
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 _LDLIBS-$(CONFIG_RTE_LIBRTE_KNI)            += -lrte_kni
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 141+ messages in thread

* [PATCH v4 2/3] lib/gro: add TCP/IPv4 GRO support
  2017-06-07 11:08     ` [PATCH v4 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
  2017-06-07 11:08       ` [PATCH v4 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
@ 2017-06-07 11:08       ` Jiayu Hu
  2017-06-07 11:08       ` [PATCH v4 3/3] app/testpmd: enable TCP/IPv4 GRO Jiayu Hu
  2017-06-18  7:21       ` [PATCH v5 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
  3 siblings, 0 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-06-07 11:08 UTC (permalink / raw)
  To: dev
  Cc: konstantin.ananyev, keith.wiles, yliu, jianfeng.tan, tiwei.bie, Jiayu Hu

In this patch, we introduce six APIs to support TCP/IPv4 GRO.
- gro_tcp_tbl_create: create a TCP reassembly table, which is used to
    merge packets.
- gro_tcp_tbl_destroy: free memory space of a TCP reassembly table.
- gro_tcp_tbl_flush: flush packets in the TCP reassembly table.
- gro_tcp_tbl_timeout_flush: flush timeout packets in the TCP
    reassembly table.
- gro_tcp4_reassemble: merge an input packet.
- gro_tcp4_tbl_cksum_update: update TCP and IPv4 header checksums for
    all merged packets in the TCP reassembly table.

In TCP GRO, we use a table structure, called TCP reassembly table, to
reassemble packets. Both TCP/IPv4 and TCP/IPv6 GRO use the same table
structure. A TCP reassembly table includes a flow array and an item array,
where the flow array is used to record flow information and the item
array is used to record packet information.

Each element in the flow array records the information of one flow,
which includes two parts:
- key: the criteria of the same flow. If packets have the same key
    value, they belong to the same flow.
- start_index: the index of the first incoming packet of this flow in
    the item array. With start_index, we can locate the first incoming
    packet of this flow.
Each element in the item array records the information of one packet.
It mainly includes two parts (see the layout sketch after this list):
- pkt: packet address
- next_pkt_index: index of the next packet of the same flow in the item
    array. All packets of the same flow are chained by next_pkt_index.
    With next_pkt_index, we can locate all packets of the same flow
    one by one.
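
In data-structure terms, the description above corresponds to roughly
the following layout (a condensed sketch; the exact definitions live in
rte_gro_tcp.h, and gro_tcp_flow_key holds the merge criteria):

	struct gro_tcp_flow {
		struct gro_tcp_flow_key key; /* merge criteria of the flow */
		uint32_t start_index;        /* first packet of the flow in
		                              * the item array */
		uint8_t is_valid;
	};

	struct gro_tcp_item {
		struct rte_mbuf *pkt;     /* packet address */
		uint32_t next_pkt_index;  /* next packet of the same flow;
		                           * INVALID_ITEM_INDEX ends the chain */
		uint8_t is_valid;
		uint8_t is_groed;         /* merged; checksums need a refresh
		                           * at flush time */
	};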

To process an incoming packet, we need three steps:
a. check if the packet should be processed. Packets with the following
    properties won't be processed:
	- packets without data;
	- packets with wrong checksums;
	- fragmented packets.
b. traverse the flow array to find a flow which the packet belongs to.
    If not found, insert a new flow and store the packet into the item
    array.
c. locate the first packet of this flow in the item array via
    start_index. Then traverse all packets of this flow one by one via
    next_pkt_index. If a packet is found to merge with the incoming one,
    merge them, but without updating checksums. If not, allocate one item
    in the item array to store the incoming packet and update the
    next_pkt_index value.

For better performance, we don't update header checksums as soon as
two packets are merged. The header checksums are updated only when
packets are flushed from TCP reassembly tables. A condensed sketch of
the whole per-packet flow follows.
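
Condensed into code, the per-packet flow looks roughly like this (a
sketch with hypothetical helper names; the real gro_tcp4_reassemble
below also performs the detailed header and checksum validation):

	struct pkt;
	struct flow;
	struct tbl;

	extern int pkt_is_mergeable(const struct pkt *p);
	extern struct flow *flow_lookup(struct tbl *t, const struct pkt *p);
	extern void flow_insert(struct tbl *t, const struct pkt *p);
	extern struct pkt *flow_first(const struct flow *f);
	extern struct pkt *flow_next(const struct flow *f, const struct pkt *q);
	extern int neighbors(const struct pkt *a, const struct pkt *b);
	extern void chain(struct pkt *head, struct pkt *tail);
	extern void item_insert(struct tbl *t, struct flow *f, struct pkt *p);

	int
	tcp4_reassemble_sketch(struct tbl *t, struct pkt *p)
	{
		struct flow *f;
		struct pkt *q;

		if (!pkt_is_mergeable(p)) /* a: no data/bad cksum/fragment */
			return -1;
		f = flow_lookup(t, p);    /* b: find the packet's flow */
		if (f == NULL) {
			flow_insert(t, p);
			return 0;         /* stored, not merged */
		}
		for (q = flow_first(f); q != NULL; q = flow_next(f, q))
			if (neighbors(q, p)) {
				chain(q, p); /* c: merge; checksums are
					      * deferred to flush time */
				return 1;
			}
		item_insert(t, f, p);
		return 0;
	}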

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
---
 lib/librte_gro/Makefile      |   1 +
 lib/librte_gro/rte_gro.c     | 150 +++++++++++--
 lib/librte_gro/rte_gro.h     |  34 +--
 lib/librte_gro/rte_gro_tcp.c | 509 +++++++++++++++++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro_tcp.h | 206 +++++++++++++++++
 5 files changed, 869 insertions(+), 31 deletions(-)
 create mode 100644 lib/librte_gro/rte_gro_tcp.c
 create mode 100644 lib/librte_gro/rte_gro_tcp.h

diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
index 9f4063a..3495dfc 100644
--- a/lib/librte_gro/Makefile
+++ b/lib/librte_gro/Makefile
@@ -43,6 +43,7 @@ LIBABIVER := 1
 
 # source files
 SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
+SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro_tcp.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
index ca6b0d2..f2defbd 100644
--- a/lib/librte_gro/rte_gro.c
+++ b/lib/librte_gro/rte_gro.c
@@ -31,11 +31,17 @@
 
 #include <rte_malloc.h>
 #include <rte_mbuf.h>
+#include <rte_ethdev.h>
+#include <rte_ip.h>
+#include <rte_tcp.h>
 
 #include "rte_gro.h"
+#include "rte_gro_tcp.h"
 
-static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB];
-static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NB];
+static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB] = {
+	gro_tcp_tbl_create, NULL};
+static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NB] = {
+	gro_tcp_tbl_destroy, NULL};
 
 struct rte_gro_tbl *rte_gro_tbl_create(uint16_t socket_id,
 		uint16_t max_flow_num,
@@ -93,33 +99,145 @@ void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl)
 }
 
 uint16_t
-rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
+rte_gro_reassemble_burst(struct rte_mbuf **pkts,
 		const uint16_t nb_pkts,
-		const struct rte_gro_param param __rte_unused)
+		const struct rte_gro_param param)
 {
-	return nb_pkts;
+	struct ether_hdr *eth_hdr;
+	struct ipv4_hdr *ipv4_hdr;
+	uint16_t l3proc_type, i;
+	uint16_t nb_after_gro = nb_pkts;
+	const uint64_t item_num = nb_pkts >
+		param.max_flow_num * param.max_item_per_flow ?
+		param.max_flow_num * param.max_item_per_flow :
+		nb_pkts;
+	const uint32_t flow_num = nb_pkts > param.max_flow_num ?
+		param.max_flow_num : nb_pkts;
+
+	/* allocate respective GRO tables for all supported GRO types */
+	struct gro_tcp_tbl tcp_tbl;
+	struct gro_tcp_flow tcp_flows[flow_num];
+	struct gro_tcp_item tcp_items[item_num];
+	struct gro_tcp_rule tcp_rule;
+
+	struct rte_mbuf *unprocess_pkts[nb_pkts];
+	uint16_t unprocess_num = 0;
+	int32_t ret;
+
+	if (unlikely(nb_pkts <= 1))
+		return nb_pkts;
+
+	memset(tcp_flows, 0, sizeof(struct gro_tcp_flow) *
+			flow_num);
+	memset(tcp_items, 0, sizeof(struct gro_tcp_item) *
+			item_num);
+	tcp_tbl.flows = tcp_flows;
+	tcp_tbl.items = tcp_items;
+	tcp_tbl.flow_num = 0;
+	tcp_tbl.item_num = 0;
+	tcp_tbl.max_flow_num = flow_num;
+	tcp_tbl.max_item_num = item_num;
+	tcp_rule.max_packet_size = param.max_packet_size;
+
+	for (i = 0; i < nb_pkts; i++) {
+		eth_hdr = rte_pktmbuf_mtod(pkts[i], struct ether_hdr *);
+		l3proc_type = rte_be_to_cpu_16(eth_hdr->ether_type);
+		if (l3proc_type == ETHER_TYPE_IPv4) {
+			ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
+			if (ipv4_hdr->next_proto_id == IPPROTO_TCP &&
+					(param.desired_gro_types &
+					 GRO_TCP_IPV4)) {
+				ret = gro_tcp4_reassemble(pkts[i],
+						&tcp_tbl,
+						&tcp_rule);
+				if (ret > 0)
+					nb_after_gro--;
+				else if (ret < 0)
+					unprocess_pkts[unprocess_num++] =
+						pkts[i];
+			} else
+				unprocess_pkts[unprocess_num++] =
+					pkts[i];
+		} else
+			unprocess_pkts[unprocess_num++] =
+				pkts[i];
+	}
+
+	if (nb_after_gro < nb_pkts) {
+		/* update packets headers and re-arrange GROed packets */
+		if (param.desired_gro_types & GRO_TCP_IPV4) {
+			gro_tcp4_tbl_cksum_update(&tcp_tbl);
+			for (i = 0; i < tcp_tbl.item_num; i++)
+				pkts[i] = tcp_tbl.items[i].pkt;
+		}
+		if (unprocess_num > 0) {
+			memcpy(&pkts[i], unprocess_pkts,
+					sizeof(struct rte_mbuf *) *
+					unprocess_num);
+			i += unprocess_num;
+		}
+		if (nb_pkts > i)
+			memset(&pkts[i], 0,
+					sizeof(struct rte_mbuf *) *
+					(nb_pkts - i));
+	}
+	return nb_after_gro;
 }
 
-int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
-		struct rte_gro_tbl *gro_tbl __rte_unused)
+int rte_gro_reassemble(struct rte_mbuf *pkt,
+		struct rte_gro_tbl *gro_tbl)
 {
+	struct ether_hdr *eth_hdr;
+	struct ipv4_hdr *ipv4_hdr;
+	uint16_t l3proc_type;
+	struct gro_tcp_rule tcp_rule;
+
+	if (pkt == NULL)
+		return -1;
+	tcp_rule.max_packet_size = gro_tbl->max_packet_size;
+	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
+	l3proc_type = rte_be_to_cpu_16(eth_hdr->ether_type);
+	if (l3proc_type == ETHER_TYPE_IPv4) {
+		ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
+		if (ipv4_hdr->next_proto_id == IPPROTO_TCP &&
+				(gro_tbl->desired_gro_types & GRO_TCP_IPV4)) {
+			return gro_tcp4_reassemble(pkt,
+					gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
+					&tcp_rule);
+		}
+	}
 	return -1;
 }
 
-uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
-		uint64_t desired_gro_types __rte_unused,
-		uint16_t flush_num __rte_unused,
-		struct rte_mbuf **out __rte_unused,
-		const uint16_t max_nb_out __rte_unused)
+uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl,
+		uint64_t desired_gro_types,
+		uint16_t flush_num,
+		struct rte_mbuf **out,
+		const uint16_t max_nb_out)
 {
+	desired_gro_types = desired_gro_types &
+		gro_tbl->desired_gro_types;
+	if (desired_gro_types & GRO_TCP_IPV4)
+		return gro_tcp_tbl_flush(
+				gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
+				flush_num,
+				out,
+				max_nb_out);
 	return 0;
 }
 
 uint16_t
-rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
-		uint64_t desired_gro_types __rte_unused,
-		struct rte_mbuf **out __rte_unused,
-		const uint16_t max_nb_out __rte_unused)
+rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl,
+		uint64_t desired_gro_types,
+		struct rte_mbuf **out,
+		const uint16_t max_nb_out)
 {
+	desired_gro_types = desired_gro_types &
+		gro_tbl->desired_gro_types;
+	if (desired_gro_types & GRO_TCP_IPV4)
+		return gro_tcp_tbl_timeout_flush(
+				gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
+				gro_tbl->max_timeout_cycles,
+				out, max_nb_out);
 	return 0;
 }
diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
index 7fe11a6..24e3e34 100644
--- a/lib/librte_gro/rte_gro.h
+++ b/lib/librte_gro/rte_gro.h
@@ -34,7 +34,11 @@
 
 /* maximum number of supported GRO types */
 #define GRO_TYPE_MAX_NB 64
-#define GRO_TYPE_SUPPORT_NB 0	/**< current supported GRO num */
+#define GRO_TYPE_SUPPORT_NB 1	/**< number of supported GRO types */
+
+/* TCP/IPv4 GRO flag */
+#define GRO_TCP_IPV4_INDEX 0
+#define GRO_TCP_IPV4 (1ULL << GRO_TCP_IPV4_INDEX)
 
 /**
  * GRO table structure. DPDK GRO uses GRO table to reassemble
@@ -138,9 +142,9 @@ void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl);
  * @return
  *  the number of packets after GRO.
  */
-uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
-		const uint16_t nb_pkts __rte_unused,
-		const struct rte_gro_param param __rte_unused);
+uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts,
+		const uint16_t nb_pkts,
+		const struct rte_gro_param param);
 
 /**
  * This is the main reassembly API used in heavyweight mode, which
@@ -163,8 +167,8 @@ uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
  *  if the packet is merged successfully, return a positive value; if it
  *  is stored without merging, return zero; on error, a negative value.
  */
-int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
-		struct rte_gro_tbl *gro_tbl __rte_unused);
+int rte_gro_reassemble(struct rte_mbuf *pkt,
+		struct rte_gro_tbl *gro_tbl);
 
 /**
  * This function flushed packets of desired GRO types from their
@@ -183,11 +187,11 @@ int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
  * @return
  *  the number of flushed packets. If no packets are flushed, return 0.
  */
-uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
-		uint64_t desired_gro_types __rte_unused,
-		uint16_t flush_num __rte_unused,
-		struct rte_mbuf **out __rte_unused,
-		const uint16_t max_nb_out __rte_unused);
+uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl,
+		uint64_t desired_gro_types,
+		uint16_t flush_num,
+		struct rte_mbuf **out,
+		const uint16_t max_nb_out);
 
 /**
  * This function flushes the timeout packets from reassembly tables of
@@ -205,8 +209,8 @@ uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
  * @return
  *  the number of flushed packets. If no packets are flushed, return 0.
  */
-uint16_t rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
-		uint64_t desired_gro_types __rte_unused,
-		struct rte_mbuf **out __rte_unused,
-		const uint16_t max_nb_out __rte_unused);
+uint16_t rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl,
+		uint64_t desired_gro_types,
+		struct rte_mbuf **out,
+		const uint16_t max_nb_out);
 #endif
diff --git a/lib/librte_gro/rte_gro_tcp.c b/lib/librte_gro/rte_gro_tcp.c
new file mode 100644
index 0000000..15f28f4
--- /dev/null
+++ b/lib/librte_gro/rte_gro_tcp.c
@@ -0,0 +1,509 @@
+/*-
+ *
+ *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+#include <rte_cycles.h>
+#include <rte_ethdev.h>
+#include <rte_ip.h>
+#include <rte_tcp.h>
+
+#include "rte_gro_tcp.h"
+
+void *gro_tcp_tbl_create(uint16_t socket_id,
+		uint16_t max_flow_num,
+		uint16_t max_item_per_flow)
+{
+	size_t size;
+	uint32_t entries_num;
+	struct gro_tcp_tbl *tbl;
+
+	entries_num = max_flow_num * max_item_per_flow;
+	if (entries_num == 0 || max_flow_num == 0)
+		return NULL;
+
+	tbl = (struct gro_tcp_tbl *)rte_zmalloc_socket(
+			__func__,
+			sizeof(struct gro_tcp_tbl),
+			RTE_CACHE_LINE_SIZE,
+			socket_id);
+
+	size = sizeof(struct gro_tcp_item) * entries_num;
+	tbl->items = (struct gro_tcp_item *)rte_zmalloc_socket(
+			__func__,
+			size,
+			RTE_CACHE_LINE_SIZE,
+			socket_id);
+	tbl->max_item_num = entries_num;
+
+	size = sizeof(struct gro_tcp_flow) * max_flow_num;
+	tbl->flows = (struct gro_tcp_flow *)rte_zmalloc_socket(
+			__func__,
+			size, RTE_CACHE_LINE_SIZE,
+			socket_id);
+	tbl->max_flow_num = max_flow_num;
+	return tbl;
+}
+
+void gro_tcp_tbl_destroy(void *tbl)
+{
+	struct gro_tcp_tbl *tcp_tbl = (struct gro_tcp_tbl *)tbl;
+
+	if (tcp_tbl) {
+		if (tcp_tbl->items)
+			rte_free(tcp_tbl->items);
+		if (tcp_tbl->flows)
+			rte_free(tcp_tbl->flows);
+		rte_free(tcp_tbl);
+	}
+}
+
+/* update TCP header and IPv4 header checksum */
+static void
+gro_tcp4_cksum_update(struct rte_mbuf *pkt)
+{
+	uint32_t len, offset, cksum;
+	struct ether_hdr *eth_hdr;
+	struct ipv4_hdr *ipv4_hdr;
+	struct tcp_hdr *tcp_hdr;
+	uint16_t ipv4_ihl, cksum_pld;
+
+	if (pkt == NULL)
+		return;
+
+	len = pkt->pkt_len;
+	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
+	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
+	ipv4_ihl = IPv4_HDR_LEN(ipv4_hdr);
+	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl);
+
+	offset = sizeof(struct ether_hdr) + ipv4_ihl;
+	len -= offset;
+
+	/* TCP cksum without IP pseudo header */
+	ipv4_hdr->hdr_checksum = 0;
+	tcp_hdr->cksum = 0;
+	if (rte_raw_cksum_mbuf(pkt, offset, len, &cksum_pld) < 0) {
+		printf("invalid param for raw_cksum_mbuf\n");
+		return;
+	}
+	/* IP pseudo header cksum */
+	cksum = cksum_pld;
+	cksum += rte_ipv4_phdr_cksum(ipv4_hdr, 0);
+
+	/* combine TCP checksum and IP pseudo header checksum */
+	cksum = ((cksum & 0xffff0000) >> 16) + (cksum & 0xffff);
+	cksum = (~cksum) & 0xffff;
+	cksum = (cksum == 0) ? 0xffff : cksum;
+	tcp_hdr->cksum = cksum;
+
+	/* update IP header cksum */
+	ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);
+}
+
+void gro_tcp4_tbl_cksum_update(struct gro_tcp_tbl *tbl)
+{
+	uint64_t i;
+
+	for (i = 0; i < tbl->item_num; i++) {
+		if (tbl->items[i].is_groed)
+			gro_tcp4_cksum_update(tbl->items[i].pkt);
+	}
+}
+
+/**
+ * merge two TCP/IPv4 packets without updating header checksums.
+ */
+static int
+merge_two_tcp4_packets(struct rte_mbuf *pkt_src,
+		struct rte_mbuf *pkt,
+		struct gro_tcp_rule *rule)
+{
+	struct ipv4_hdr *ipv4_hdr1, *ipv4_hdr2;
+	struct tcp_hdr *tcp_hdr1;
+	uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1;
+	struct rte_mbuf *tail;
+
+	/* parse the given packet */
+	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
+				struct ether_hdr *) + 1);
+	ipv4_ihl1 = IPv4_HDR_LEN(ipv4_hdr1);
+	tcp_hdr1 = (struct tcp_hdr *)((char *)ipv4_hdr1 + ipv4_ihl1);
+	tcp_hl1 = TCP_HDR_LEN(tcp_hdr1);
+	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) - ipv4_ihl1
+		- tcp_hl1;
+
+	/* parse the original packet */
+	ipv4_hdr2 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt_src,
+				struct ether_hdr *) + 1);
+
+	/* check reassembly rules */
+	if (pkt_src->pkt_len + tcp_dl1 > rule->max_packet_size)
+		return -1;
+
+	/* remove the header of the incoming packet */
+	rte_pktmbuf_adj(pkt, sizeof(struct ether_hdr) +
+			ipv4_ihl1 + tcp_hl1);
+
+	/* chain the two packets together */
+	tail = rte_pktmbuf_lastseg(pkt_src);
+	tail->next = pkt;
+
+	/* update IP header */
+	ipv4_hdr2->total_length = rte_cpu_to_be_16(
+			rte_be_to_cpu_16(
+				ipv4_hdr2->total_length)
+			+ tcp_dl1);
+
+	/* update mbuf metadata for the merged packet */
+	pkt_src->nb_segs++;
+	pkt_src->pkt_len += pkt->pkt_len;
+	return 1;
+}
+
+static int
+check_seq_option(struct rte_mbuf *pkt,
+		struct tcp_hdr *tcp_hdr,
+		uint16_t tcp_hl)
+{
+	struct ipv4_hdr *ipv4_hdr1;
+	struct tcp_hdr *tcp_hdr1;
+	uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1;
+	uint32_t sent_seq1, sent_seq;
+	int ret = -1;
+
+	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
+				struct ether_hdr *) + 1);
+	ipv4_ihl1 = IPv4_HDR_LEN(ipv4_hdr1);
+	tcp_hdr1 = (struct tcp_hdr *)((char *)ipv4_hdr1 + ipv4_ihl1);
+	tcp_hl1 = TCP_HDR_LEN(tcp_hdr1);
+	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) - ipv4_ihl1
+		- tcp_hl1;
+	sent_seq1 = rte_be_to_cpu_32(tcp_hdr1->sent_seq) + tcp_dl1;
+	sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq);
+
+	/* check if the two packets are neighbors */
+	if ((sent_seq ^ sent_seq1) == 0) {
+		/* the option fields must have the same length and
+		 * content for the packets to be merged
+		 */
+		if ((tcp_hl1 == tcp_hl) &&
+				(memcmp(tcp_hdr1 + 1,
+						tcp_hdr + 1,
+						tcp_hl - sizeof
+						(struct tcp_hdr))
+				 == 0))
+			ret = 1;
+	}
+	return ret;
+}
+
+static uint32_t
+find_an_empty_item(struct gro_tcp_tbl *tbl)
+{
+	uint32_t i;
+
+	for (i = 0; i < tbl->max_item_num; i++)
+		if (tbl->items[i].is_valid == 0)
+			return i;
+	return INVALID_ITEM_INDEX;
+}
+
+static uint16_t
+find_an_empty_flow(struct gro_tcp_tbl *tbl)
+{
+	uint16_t i;
+
+	for (i = 0; i < tbl->max_flow_num; i++)
+		if (tbl->flows[i].is_valid == 0)
+			return i;
+	return INVALID_FLOW_INDEX;
+}
+
+int32_t
+gro_tcp4_reassemble(struct rte_mbuf *pkt,
+		struct gro_tcp_tbl *tbl,
+		struct gro_tcp_rule *rule)
+{
+	struct ether_hdr *eth_hdr;
+	struct ipv4_hdr *ipv4_hdr;
+	struct tcp_hdr *tcp_hdr;
+	uint16_t ipv4_ihl, tcp_hl, tcp_dl, tcp_cksum, ip_cksum;
+
+	struct gro_tcp_flow_key key;
+	uint64_t ol_flags;
+	uint32_t cur_idx, prev_idx, item_idx;
+	uint16_t i, flow_idx;
+
+	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
+	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
+	ipv4_ihl = IPv4_HDR_LEN(ipv4_hdr);
+
+	/* 1. check if the packet should be processed */
+	if (ipv4_ihl < sizeof(struct ipv4_hdr))
+		goto fail;
+	if (ipv4_hdr->next_proto_id != IPPROTO_TCP)
+		goto fail;
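+	/* require the DF bit to be set, so fragmented packets are skipped */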
+	if ((ipv4_hdr->fragment_offset &
+				rte_cpu_to_be_16(IPV4_HDR_DF_MASK))
+			== 0)
+		goto fail;
+
+	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl);
+	tcp_hl = TCP_HDR_LEN(tcp_hdr);
+	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - ipv4_ihl
+		- tcp_hl;
+	if (tcp_dl == 0)
+		goto fail;
+
+	/**
+	 * 2. if HW rx checksum offload isn't enabled, recalculate the
+	 * checksum in SW. Then, check if the checksum is correct
+	 */
+	ol_flags = pkt->ol_flags;
+	if ((ol_flags & PKT_RX_IP_CKSUM_MASK) !=
+			PKT_RX_IP_CKSUM_UNKNOWN) {
+		if ((ol_flags & PKT_RX_IP_CKSUM_MASK) ==
+				PKT_RX_IP_CKSUM_BAD)
+			goto fail;
+	} else {
+		ip_cksum = ipv4_hdr->hdr_checksum;
+		ipv4_hdr->hdr_checksum = 0;
+		ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);
+		if (ipv4_hdr->hdr_checksum != ip_cksum)
+			goto fail;
+	}
+
+	if ((ol_flags & PKT_RX_L4_CKSUM_MASK) !=
+			PKT_RX_L4_CKSUM_UNKNOWN) {
+		if ((ol_flags & PKT_RX_L4_CKSUM_MASK) ==
+				PKT_RX_L4_CKSUM_BAD)
+			goto fail;
+	} else {
+		tcp_cksum = tcp_hdr->cksum;
+		tcp_hdr->cksum = 0;
+		tcp_hdr->cksum = rte_ipv4_udptcp_cksum
+			(ipv4_hdr, tcp_hdr);
+		if (tcp_hdr->cksum != tcp_cksum)
+			goto fail;
+	}
+
+	/**
+	 * 3. search for a flow and traverse all packets in the flow
+	 * to find one to merge with the given packet.
+	 */
+	key.eth_saddr = eth_hdr->s_addr;
+	key.eth_daddr = eth_hdr->d_addr;
+	key.ip_src_addr[0] = rte_be_to_cpu_32(ipv4_hdr->src_addr);
+	key.ip_dst_addr[0] = rte_be_to_cpu_32(ipv4_hdr->dst_addr);
+	key.src_port = rte_be_to_cpu_16(tcp_hdr->src_port);
+	key.dst_port = rte_be_to_cpu_16(tcp_hdr->dst_port);
+	key.recv_ack = rte_be_to_cpu_32(tcp_hdr->recv_ack);
+	key.tcp_flags = tcp_hdr->tcp_flags;
+
+	for (i = 0; i < tbl->max_flow_num; i++) {
+		/* search all packets in a valid flow. */
+		if (tbl->flows[i].is_valid &&
+				(memcmp(&(tbl->flows[i].key), &key,
+						sizeof(struct gro_tcp_flow_key))
+				 == 0)) {
+			cur_idx = tbl->flows[i].start_index;
+			prev_idx = cur_idx;
+			while (cur_idx != INVALID_ITEM_INDEX) {
+				if (check_seq_option(tbl->items[cur_idx].pkt,
+							tcp_hdr,
+							tcp_hl) > 0) {
+					if (merge_two_tcp4_packets(
+								tbl->items[cur_idx].pkt,
+								pkt,
+								rule) > 0) {
+						/* successfully merge two packets */
+						tbl->items[cur_idx].is_groed = 1;
+						return 1;
+					}
+					/**
+					 * failed to merge the two packets
+					 * since merging would break the
+					 * rules; add the packet to the flow.
+					 */
+					goto insert_to_existed_flow;
+				} else {
+					prev_idx = cur_idx;
+					cur_idx = tbl->items[cur_idx].next_pkt_idx;
+				}
+			}
+			/**
+			 * failed to merge the given packet into an existing
+			 * flow; add it to the flow as a new item.
+			 */
+insert_to_existed_flow:
+			item_idx = find_an_empty_item(tbl);
+			/* the item number is beyond the maximum value */
+			if (item_idx == INVALID_ITEM_INDEX)
+				return -1;
+			tbl->items[prev_idx].next_pkt_idx = item_idx;
+			tbl->items[item_idx].pkt = pkt;
+			tbl->items[item_idx].is_groed = 0;
+			tbl->items[item_idx].next_pkt_idx = INVALID_ITEM_INDEX;
+			tbl->items[item_idx].is_valid = 1;
+			tbl->items[item_idx].start_time = rte_rdtsc();
+			tbl->item_num++;
+			return 0;
+		}
+	}
+
+	/**
+	 * merging failed as the given packet belongs to a new flow.
+	 * Therefore, insert a new flow.
+	 */
+	item_idx = find_an_empty_item(tbl);
+	flow_idx = find_an_empty_flow(tbl);
+	/**
+	 * if the flow or item number is beyond the maximum value,
+	 * the input packet won't be processed.
+	 */
+	if (item_idx == INVALID_ITEM_INDEX ||
+			flow_idx == INVALID_FLOW_INDEX)
+		return -1;
+	tbl->items[item_idx].pkt = pkt;
+	tbl->items[item_idx].next_pkt_idx = INVALID_ITEM_INDEX;
+	tbl->items[item_idx].is_groed = 0;
+	tbl->items[item_idx].is_valid = 1;
+	tbl->items[item_idx].start_time = rte_rdtsc();
+	tbl->item_num++;
+
+	memcpy(&(tbl->flows[flow_idx].key),
+			&key, sizeof(struct gro_tcp_flow_key));
+	tbl->flows[flow_idx].start_index = item_idx;
+	tbl->flows[flow_idx].is_valid = 1;
+	tbl->flow_num++;
+
+	return 0;
+fail:
+	return -1;
+}
+
+uint16_t gro_tcp_tbl_flush(struct gro_tcp_tbl *tbl,
+		uint16_t flush_num,
+		struct rte_mbuf **out,
+		const uint16_t nb_out)
+{
+	uint16_t num, k;
+	uint16_t i;
+	uint32_t j;
+
+	k = 0;
+	num = tbl->item_num > flush_num ? flush_num : tbl->item_num;
+	num = num > nb_out ? nb_out : num;
+	if (num == 0)
+		return 0;
+
+	for (i = 0; i < tbl->max_flow_num; i++) {
+		if (tbl->flows[i].is_valid) {
+			j = tbl->flows[i].start_index;
+			while (j != INVALID_ITEM_INDEX) {
+				/* update checksum for GROed packet */
+				if (tbl->items[j].is_groed)
+					gro_tcp4_cksum_update(tbl->items[j].pkt);
+
+				out[k++] = tbl->items[j].pkt;
+				tbl->items[j].is_valid = 0;
+				tbl->item_num--;
+				j = tbl->items[j].next_pkt_idx;
+
+				if (k == num) {
+					/* delete the flow */
+					if (j == INVALID_ITEM_INDEX) {
+						tbl->flows[i].is_valid = 0;
+						tbl->flow_num--;
+					} else
+						/* update flow information */
+						tbl->flows[i].start_index = j;
+					goto end;
+				}
+			}
+			/* delete the flow, as all of its packets are flushed */
+			tbl->flows[i].is_valid = 0;
+			tbl->flow_num--;
+		}
+	}
+end:
+	return num;
+}
+
+uint16_t
+gro_tcp_tbl_timeout_flush(struct gro_tcp_tbl *tbl,
+		uint64_t timeout_cycles,
+		struct rte_mbuf **out,
+		const uint16_t nb_out)
+{
+	uint16_t k;
+	uint16_t i;
+	uint32_t j;
+	uint64_t current_time;
+
+	if (nb_out == 0)
+		return 0;
+	k = 0;
+	current_time = rte_rdtsc();
+
+	for (i = 0; i < tbl->max_flow_num; i++) {
+		if (tbl->flows[i].is_valid) {
+			j = tbl->flows[i].start_index;
+			while (j != INVALID_ITEM_INDEX) {
+				if (current_time - tbl->items[j].start_time <
+						timeout_cycles)
+					/**
+					 * packets are chained in arrival
+					 * order, so the rest of the flow
+					 * hasn't timed out either
+					 */
+					break;
+				/* update checksum for GROed packet */
+				if (tbl->items[j].is_groed)
+					gro_tcp4_cksum_update(tbl->items[j].pkt);
+
+				out[k++] = tbl->items[j].pkt;
+				tbl->items[j].is_valid = 0;
+				tbl->item_num--;
+				j = tbl->items[j].next_pkt_idx;
+
+				if (k == nb_out) {
+					if (j == INVALID_ITEM_INDEX) {
+						/* delete the flow */
+						tbl->flows[i].is_valid = 0;
+						tbl->flow_num--;
+					} else
+						tbl->flows[i].start_index = j;
+					goto end;
+				}
+			}
+			if (j == INVALID_ITEM_INDEX) {
+				/* all packets of the flow are flushed */
+				tbl->flows[i].is_valid = 0;
+				tbl->flow_num--;
+			} else
+				/* keep the un-flushed packets in the flow */
+				tbl->flows[i].start_index = j;
+		}
+	}
+end:
+	return k;
+}
diff --git a/lib/librte_gro/rte_gro_tcp.h b/lib/librte_gro/rte_gro_tcp.h
new file mode 100644
index 0000000..76b2107
--- /dev/null
+++ b/lib/librte_gro/rte_gro_tcp.h
@@ -0,0 +1,206 @@
+/*-
+ *
+ *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_GRO_TCP_H_
+#define _RTE_GRO_TCP_H_
+
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+#define TCP_HDR_LEN(tcph) \
+	((tcph->data_off >> 4) * 4)
+#define IPv4_HDR_LEN(iph) \
+	((iph->version_ihl & 0x0f) * 4)
+#else
+#define TCP_DATAOFF_MASK 0x0f
+#define TCP_HDR_LEN(tcph) \
+	((tcph->data_off & TCP_DATAOFF_MASK) * 4)
+#define IPv4_HDR_LEN(iph) \
+	((iph->version_ihl >> 4) * 4)
+#endif
+
+#define IPV4_HDR_DF_SHIFT 14
+#define IPV4_HDR_DF_MASK (1 << IPV4_HDR_DF_SHIFT)
+
+#define INVALID_FLOW_INDEX 0xffffUL
+#define INVALID_ITEM_INDEX 0xffffffffULL
+
+/* criteria for merging packets */
+struct gro_tcp_flow_key {
+	struct ether_addr eth_saddr;
+	struct ether_addr eth_daddr;
+	uint32_t ip_src_addr[4];	/**< IPv4 uses the first 4B */
+	uint32_t ip_dst_addr[4];
+
+	uint32_t recv_ack;	/**< acknowledgment sequence number. */
+	uint16_t src_port;
+	uint16_t dst_port;
+	uint8_t tcp_flags;	/**< TCP flags. */
+};
+
+struct gro_tcp_flow {
+	struct gro_tcp_flow_key key;
+	uint32_t start_index;	/**< the first packet index of the flow */
+	uint8_t is_valid;
+};
+
+struct gro_tcp_item {
+	struct rte_mbuf *pkt;	/**< packet address. */
+	/* the time when the packet is added into the table */
+	uint64_t start_time;
+	uint32_t next_pkt_idx;	/**< next packet index. */
+	/* flag to indicate if the packet is GROed */
+	uint8_t is_groed;
+	uint8_t is_valid;	/**< flag indicates if the item is valid */
+};
+
+/**
+ * TCP reassembly table. Both TCP/IPv4 and TCP/IPv6 use the same table
+ * structure.
+ */
+struct gro_tcp_tbl {
+	struct gro_tcp_item *items;	/**< item array */
+	struct gro_tcp_flow *flows;	/**< flow array */
+	uint32_t item_num;	/**< current item number */
+	uint16_t flow_num;	/**< current flow num */
+	uint32_t max_item_num;	/**< item array size */
+	uint16_t max_flow_num;	/**< flow array size */
+};
+
+/* rules to reassemble TCP packets, which are decided by applications */
+struct gro_tcp_rule {
+	/* the maximum packet length after merging */
+	uint32_t max_packet_size;
+};
+
+/**
+ * This function is to update TCP and IPv4 header checksums
+ * for merged packets in the TCP reassembly table.
+ */
+void gro_tcp4_tbl_cksum_update(struct gro_tcp_tbl *tbl);
+
+/**
+ * This function creates a TCP reassembly table.
+ *
+ * @param socket_id
+ *  socket index where the Ethernet port connects to.
+ * @param max_flow_num
+ *  the maximum number of flows in the TCP GRO table
+ * @param max_item_per_flow
+ *  the maximum packet number per flow.
+ * @return
+ *  if created successfully, return a pointer which points to the
+ *  created TCP GRO table. Otherwise, return NULL.
+ */
+void *gro_tcp_tbl_create(uint16_t socket_id,
+		uint16_t max_flow_num,
+		uint16_t max_item_per_flow);
+
+/**
+ * This function destroys a TCP reassembly table.
+ * @param tbl
+ *  a pointer points to the TCP reassembly table.
+ */
+void gro_tcp_tbl_destroy(void *tbl);
+
+/**
+ * This function searches for a packet in the TCP reassembly table to
+ * merge with the input one. Merging two packets simply chains them
+ * together and updates the packet headers. Note that this function
+ * won't re-calculate IPv4 and TCP checksums.
+ *
+ * If the packet doesn't have data, has wrong checksums, or is
+ * fragmented etc., an error happens and gro_tcp4_reassemble returns
+ * immediately. If no errors happen, the packet is either merged, or
+ * inserted into the reassembly table.
+ *
+ * If applications want to get packets in the reassembly table, they
+ * need to manually flush the packets.
+ *
+ * @param pkt
+ *  packet to reassemble.
+ * @param tbl
+ *  a pointer that points to a TCP reassembly table.
+ * @param rule
+ *  TCP reassembly criteria defined by applications.
+ * @return
+ *  if the input packet is merged successfully, return a positive
+ *  value. If the packet isn't merged but is inserted into the TCP
+ *  reassembly table, return 0. If errors happen, return a negative
+ *  value and the packet won't be inserted into the reassembly table.
+ */
+int32_t
+gro_tcp4_reassemble(struct rte_mbuf *pkt,
+		struct gro_tcp_tbl *tbl,
+		struct gro_tcp_rule *rule);
+
+/**
+ * This function flushes the packets in a TCP reassembly table to
+ * applications. Before returning the packets, it will update TCP and
+ * IPv4 header checksums.
+ *
+ * @param tbl
+ *  a pointer that points to a TCP GRO table.
+ * @param flush_num
+ *  the number of packets that applications want to flush.
+ * @param out
+ *  pointer array which is used to keep flushed packets.
+ * @param nb_out
+ *  the maximum element number of out.
+ * @return
+ *  the number of packets that are finally flushed.
+ */
+uint16_t
+gro_tcp_tbl_flush(struct gro_tcp_tbl *tbl,
+		uint16_t flush_num,
+		struct rte_mbuf **out,
+		const uint16_t nb_out);
+
+/**
+ * This function flushes timeout packets in a TCP reassembly table to
+ * applications. Before returning the packets, it updates TCP and IPv4
+ * header checksums.
+ *
+ * @param tbl
+ *  a pointer that points to a TCP GRO table.
+ * @param timeout_cycles
+ *  the maximum time that packets can stay in the table.
+ * @param out
+ *  pointer array which is used to keep flushed packets.
+ * @param nb_out
+ *  the maximum element number of out.
+ * @return
+ *  It returns the number of packets that are finally flushed.
+ */
+uint16_t
+gro_tcp_tbl_timeout_flush(struct gro_tcp_tbl *tbl,
+		uint64_t timeout_cycles,
+		struct rte_mbuf **out,
+		const uint16_t nb_out);
+#endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 141+ messages in thread

* [PATCH v4 3/3] app/testpmd: enable TCP/IPv4 GRO
  2017-06-07 11:08     ` [PATCH v4 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
  2017-06-07 11:08       ` [PATCH v4 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
  2017-06-07 11:08       ` [PATCH v4 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
@ 2017-06-07 11:08       ` Jiayu Hu
  2017-06-18  7:21       ` [PATCH v5 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
  3 siblings, 0 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-06-07 11:08 UTC (permalink / raw)
  To: dev
  Cc: konstantin.ananyev, keith.wiles, yliu, jianfeng.tan, tiwei.bie, Jiayu Hu

This patch demonstrates the usage of the GRO library in testpmd. By
default, GRO is turned off. The command "gro on (port_id)" turns on GRO
for the given port; the command "gro off (port_id)" turns it off.
Currently, GRO only supports processing TCP/IPv4 packets and only works
in IO forward mode. Besides, only the GRO lightweight mode is enabled.
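
A plausible interactive session (using port 0 as an example; note that
setup_gro requires forwarding to be stopped before GRO can be
reconfigured):

	testpmd> stop
	testpmd> gro on 0
	testpmd> start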

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
---
 app/test-pmd/cmdline.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 app/test-pmd/config.c  | 29 +++++++++++++++++++++++++++++
 app/test-pmd/iofwd.c   |  6 ++++++
 app/test-pmd/testpmd.c |  3 +++
 app/test-pmd/testpmd.h | 11 +++++++++++
 5 files changed, 94 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 0afac68..24cb4ff 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -76,6 +76,7 @@
 #include <rte_devargs.h>
 #include <rte_eth_ctrl.h>
 #include <rte_flow.h>
+#include <rte_gro.h>
 
 #include <cmdline_rdline.h>
 #include <cmdline_parse.h>
@@ -420,6 +421,9 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"tso show (portid)"
 			"    Display the status of TCP Segmentation Offload.\n\n"
 
+			"gro (on|off) (port_id)"
+			"    Enable or disable Generic Receive Offload.\n\n"
+
 			"set fwd (%s)\n"
 			"    Set packet forwarding mode.\n\n"
 
@@ -3825,6 +3829,46 @@ cmdline_parse_inst_t cmd_tunnel_tso_show = {
 	},
 };
 
+/* *** SET GRO FOR A PORT *** */
+struct cmd_gro_result {
+	cmdline_fixed_string_t cmd_keyword;
+	cmdline_fixed_string_t mode;
+	uint8_t port_id;
+};
+
+static void
+cmd_set_gro_parsed(void *parsed_result,
+		__attribute__((unused)) struct cmdline *cl,
+		__attribute__((unused)) void *data)
+{
+	struct cmd_gro_result *res;
+
+	res = parsed_result;
+	setup_gro(res->mode, res->port_id);
+}
+
+cmdline_parse_token_string_t cmd_gro_keyword =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_result,
+			cmd_keyword, "gro");
+cmdline_parse_token_string_t cmd_gro_mode =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_result,
+			mode, "on#off");
+cmdline_parse_token_num_t cmd_gro_pid =
+	TOKEN_NUM_INITIALIZER(struct cmd_gro_result,
+			port_id, UINT8);
+
+cmdline_parse_inst_t cmd_set_gro = {
+	.f = cmd_set_gro_parsed,
+	.data = NULL,
+	.help_str = "gro (on|off) (port_id)",
+	.tokens = {
+		(void *)&cmd_gro_keyword,
+		(void *)&cmd_gro_mode,
+		(void *)&cmd_gro_pid,
+		NULL,
+	},
+};
+
 /* *** ENABLE/DISABLE FLUSH ON RX STREAMS *** */
 struct cmd_set_flush_rx {
 	cmdline_fixed_string_t set;
@@ -13629,6 +13673,7 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_tso_show,
 	(cmdline_parse_inst_t *)&cmd_tunnel_tso_set,
 	(cmdline_parse_inst_t *)&cmd_tunnel_tso_show,
+	(cmdline_parse_inst_t *)&cmd_set_gro,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_rx,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_tx,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 14be759..e8a665f 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -68,6 +68,7 @@
 #ifdef RTE_LIBRTE_IXGBE_PMD
 #include <rte_pmd_ixgbe.h>
 #endif
+#include <rte_gro.h>
 
 #include "testpmd.h"
 
@@ -2393,6 +2394,34 @@ set_tx_pkt_segments(unsigned *seg_lengths, unsigned nb_segs)
 	tx_pkt_nb_segs = (uint8_t) nb_segs;
 }
 
+void
+setup_gro(const char *mode, uint8_t port_id)
+{
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		printf("invalid port id %u\n", port_id);
+		return;
+	}
+	if (strcmp(mode, "on") == 0) {
+		if (test_done == 0) {
+			printf("Before enabling GRO,"
+					" please stop forwarding first\n");
+			return;
+		}
+		gro_ports[port_id].enable = 1;
+		gro_ports[port_id].param.max_flow_num = 4;
+		gro_ports[port_id].param.max_item_per_flow = 32;
+		gro_ports[port_id].param.desired_gro_types = GRO_TCP_IPV4;
+		gro_ports[port_id].param.max_packet_size = UINT16_MAX;
+	} else if (strcmp(mode, "off") == 0) {
+		if (test_done == 0) {
+			printf("Before disabling GRO,"
+					" please stop forwarding first\n");
+			return;
+		}
+		gro_ports[port_id].enable = 0;
+	}
+}
+
 char*
 list_pkt_forwarding_modes(void)
 {
diff --git a/app/test-pmd/iofwd.c b/app/test-pmd/iofwd.c
index 15cb4a2..d9ec528 100644
--- a/app/test-pmd/iofwd.c
+++ b/app/test-pmd/iofwd.c
@@ -65,6 +65,7 @@
 #include <rte_ethdev.h>
 #include <rte_string_fns.h>
 #include <rte_flow.h>
+#include <rte_gro.h>
 
 #include "testpmd.h"
 
@@ -99,6 +100,11 @@ pkt_burst_io_forward(struct fwd_stream *fs)
 			pkts_burst, nb_pkt_per_burst);
 	if (unlikely(nb_rx == 0))
 		return;
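+	/* if GRO is enabled on the RX port, merge the received burst first */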
+	if (unlikely(gro_ports[fs->rx_port].enable)) {
+		nb_rx = rte_gro_reassemble_burst(pkts_burst,
+				nb_rx,
+				gro_ports[fs->rx_port].param);
+	}
 	fs->rx_packets += nb_rx;
 
 #ifdef RTE_TEST_PMD_RECORD_BURST_STATS
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index d1041af..8d7a6a6 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -87,6 +87,7 @@
 #ifdef RTE_LIBRTE_LATENCY_STATS
 #include <rte_latencystats.h>
 #endif
+#include <rte_gro.h>
 
 #include "testpmd.h"
 
@@ -375,6 +376,8 @@ lcoreid_t bitrate_lcore_id;
 uint8_t bitrate_enabled;
 #endif
 
+struct gro_status gro_ports[RTE_MAX_ETHPORTS];
+
 /* Forward function declarations */
 static void map_port_queue_stats_mapping_registers(uint8_t pi, struct rte_port *port);
 static void check_all_ports_link_status(uint32_t port_mask);
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index e6c43ba..4c6ed03 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -34,6 +34,8 @@
 #ifndef _TESTPMD_H_
 #define _TESTPMD_H_
 
+#include <rte_gro.h>
+
 #define RTE_PORT_ALL            (~(portid_t)0x0)
 
 #define RTE_TEST_RX_DESC_MAX    2048
@@ -109,6 +111,8 @@ struct fwd_stream {
 	queueid_t  tx_queue;  /**< TX queue to send forwarded packets */
 	streamid_t peer_addr; /**< index of peer ethernet address of packets */
 
+	uint16_t tbl_idx;	/**< TCP IPv4 GRO lookup table index */
+
 	unsigned int retry_enabled;
 
 	/* "read-write" results */
@@ -428,6 +432,12 @@ extern struct ether_addr peer_eth_addrs[RTE_MAX_ETHPORTS];
 extern uint32_t burst_tx_delay_time; /**< Burst tx delay time(us) for mac-retry. */
 extern uint32_t burst_tx_retry_num;  /**< Burst tx retry number for mac-retry. */
 
+struct gro_status {
+	struct rte_gro_param param;
+	uint8_t enable;
+};
+extern struct gro_status gro_ports[RTE_MAX_ETHPORTS];
+
 static inline unsigned int
 lcore_num(void)
 {
@@ -625,6 +635,7 @@ void get_2tuple_filter(uint8_t port_id, uint16_t index);
 void get_5tuple_filter(uint8_t port_id, uint16_t index);
 int rx_queue_id_is_invalid(queueid_t rxq_id);
 int tx_queue_id_is_invalid(queueid_t txq_id);
+void setup_gro(const char *mode, uint8_t port_id);
 
 /* Functions to manage the set of filtered Multicast MAC addresses */
 void mcast_addr_add(uint8_t port_id, struct ether_addr *mc_addr);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 141+ messages in thread

* [PATCH v5 0/3] Support TCP/IPv4 GRO in DPDK
  2017-06-07 11:08     ` [PATCH v4 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
                         ` (2 preceding siblings ...)
  2017-06-07 11:08       ` [PATCH v4 3/3] app/testpmd: enable TCP/IPv4 GRO Jiayu Hu
@ 2017-06-18  7:21       ` Jiayu Hu
  2017-06-18  7:21         ` [PATCH v5 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
                           ` (4 more replies)
  3 siblings, 5 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-06-18  7:21 UTC (permalink / raw)
  To: dev
  Cc: konstantin.ananyev, yliu, keith.wiles, jianfeng.tan, tiwei.bie,
	lei.a.yao, Jiayu Hu

Generic Receive Offload (GRO) is a widely used SW-based offloading
technique to reduce per-packet processing overhead. It gains performance
by reassembling small packets into large ones. Therefore, we propose to
support GRO in DPDK.

To enable more flexibility to applications, DPDK GRO is implemented as
a user library. Applications explicitly use the GRO library to merge
small packets into large ones. DPDK GRO provides two reassembly modes.
One is called lightweight mode, the other is called heavyweight mode.
If applications want to merge packets in a simple way, they can use
the lightweight mode. If applications need more fine-grained control,
they can choose the heavyweight mode.

This patchset is to support TCP/IPv4 GRO in DPDK. The first patch is to
provide a GRO API framework. The second patch is to support TCP/IPv4 GRO.
The last patch demonstrates how to use GRO library in app/testpmd.

We perform two iperf tests (with DPDK GRO and without DPDK GRO) to see
the performance gains from DPDK GRO. Specifically, the experiment
environment is:
a. Two 10Gbps physical ports (p0 and p1) on one host are linked together;
b. p0 is in network namespace ns1, whose IP is 1.1.2.3. The iperf client
runs on p0 and sends TCP/IPv4 packets. The OS in the VM is Ubuntu 14.04;
c. testpmd runs on p1. Besides, testpmd has a vdev which connects to a
VM via vhost-user and virtio-net. The VM runs iperf server, whose IP
is 1.1.2.4;
d. p0 turns on TSO; VM turns off kernel GRO; testpmd runs in iofwd mode.
iperf client and server use the following commands:
	- client: ip netns exec ns1 iperf -c 1.1.2.4 -i2 -t 60 -f g -m
	- server: iperf -s -f g
Two test cases are:
a. w/o DPDK GRO: run testpmd without GRO
b. w DPDK GRO: testpmd enables GRO for p1
Result:
With GRO, the throughput improvement is around 40%.

Change log
==========
v5:
- fix some bugs
- fix coding style issues
v4:
- implement DPDK GRO as an application-used library
- introduce lightweight and heavyweight working modes to enable
	fine-grained controls to applications
- replace cuckoo hash tables with simpler table structure
v3:
- fix compilation issues.
v2:
- provide generic reassembly function;
- implement GRO as a device ability:
add APIs for devices to support GRO;
add APIs for applications to enable/disable GRO;
- update testpmd example. 

Jiayu Hu (3):
  lib: add Generic Receive Offload API framework
  lib/gro: add TCP/IPv4 GRO support
  app/testpmd: enable TCP/IPv4 GRO

 app/test-pmd/cmdline.c       |  45 ++++
 app/test-pmd/config.c        |  29 +++
 app/test-pmd/iofwd.c         |   6 +
 app/test-pmd/testpmd.c       |   3 +
 app/test-pmd/testpmd.h       |  11 +
 config/common_base           |   5 +
 lib/Makefile                 |   1 +
 lib/librte_gro/Makefile      |  51 +++++
 lib/librte_gro/rte_gro.c     | 248 ++++++++++++++++++++
 lib/librte_gro/rte_gro.h     | 217 ++++++++++++++++++
 lib/librte_gro/rte_gro_tcp.c | 527 +++++++++++++++++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro_tcp.h | 210 +++++++++++++++++
 mk/rte.app.mk                |   1 +
 13 files changed, 1354 insertions(+)
 create mode 100644 lib/librte_gro/Makefile
 create mode 100644 lib/librte_gro/rte_gro.c
 create mode 100644 lib/librte_gro/rte_gro.h
 create mode 100644 lib/librte_gro/rte_gro_tcp.c
 create mode 100644 lib/librte_gro/rte_gro_tcp.h

-- 
2.7.4

^ permalink raw reply	[flat|nested] 141+ messages in thread

* [PATCH v5 1/3] lib: add Generic Receive Offload API framework
  2017-06-18  7:21       ` [PATCH v5 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
@ 2017-06-18  7:21         ` Jiayu Hu
  2017-06-19  4:03           ` Tiwei Bie
                             ` (2 more replies)
  2017-06-18  7:21         ` [PATCH v5 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
                           ` (3 subsequent siblings)
  4 siblings, 3 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-06-18  7:21 UTC (permalink / raw)
  To: dev
  Cc: konstantin.ananyev, yliu, keith.wiles, jianfeng.tan, tiwei.bie,
	lei.a.yao, Jiayu Hu

Generic Receive Offload (GRO) is a widely used SW-based offloading
technique to reduce per-packet processing overhead. It gains
performance by reassembling small packets into large ones. This
patchset is to support GRO in DPDK. To support GRO, this patch
implements a GRO API framework.

To enable more flexibility to applications, DPDK GRO is implemented as
a user library. Applications explicitly use the GRO library to merge
small packets into large ones. DPDK GRO provides two reassembly modes.
One is called lightweight mode, the other is called heavyweight mode.
If applications want to merge packets in a simple way, they can use
the lightweight mode. If applications need more fine-grained control,
they can choose the heavyweight mode.

rte_gro_reassemble_burst is the main reassembly API which is used in
lightweight mode and processes N packets at a time. For applications,
performing GRO in lightweight mode is simple. They just need to invoke
rte_gro_reassemble_burst. Applications can get GROed packets as soon as
rte_gro_reassemble_burst returns.
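
A minimal sketch of the lightweight mode (the GRO_TCP_IPV4 flag comes
from the next patch, and port_id, pkts and MAX_PKT_BURST are assumed to
exist; the parameter values are borrowed from testpmd's setup_gro):

	struct rte_gro_param param = {
		.max_flow_num = 4,
		.max_item_per_flow = 32,
		.desired_gro_types = GRO_TCP_IPV4,
		.max_packet_size = UINT16_MAX,
	};
	uint16_t nb_rx = rte_eth_rx_burst(port_id, 0, pkts, MAX_PKT_BURST);
	/* merge the burst in place; returns the packet count after GRO */
	nb_rx = rte_gro_reassemble_burst(pkts, nb_rx, param);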

rte_gro_reassemble is the main reassembly API which is used in
heavyweight mode and processes one packet at a time. For applications,
performing GRO in heavyweight mode is relatively complicated. Before
performing GRO, applications need to create a GRO table by
rte_gro_tbl_create. Then they can use rte_gro_reassemble to process
packets one by one. The processed packets are in the GRO table. If
applications want to get them, applications need to manually flush
them by flush APIs.
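
A minimal heavyweight-mode sketch (socket_id, pkts, out and the flow
and timeout values are illustrative; GRO_TCP_IPV4 comes from the next
patch):

	struct rte_gro_tbl *tbl = rte_gro_tbl_create(socket_id,
			4, 32,			/* max_flow_num, max_item_per_flow */
			UINT16_MAX,		/* max_packet_size */
			rte_get_tsc_hz(),	/* max_timeout_cycles, about 1s */
			GRO_TCP_IPV4);
	for (i = 0; i < nb_rx; i++)
		rte_gro_reassemble(pkts[i], tbl);	/* merge or insert */
	/* later: drain packets that have stayed beyond the timeout */
	nb_out = rte_gro_timeout_flush(tbl, GRO_TCP_IPV4, out, MAX_PKT_BURST);
	rte_gro_tbl_destroy(tbl);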

In DPDK GRO, different GRO types define their own reassembly tables.
When a GRO table is created, it keeps the reassembly tables of the
desired GRO types. To process one packet, we first search for the
corresponding reassembly table according to the packet type. Then we
search that reassembly table for an existing packet to merge with. If
one is found, the two packets are chained together. If not, the packet
is inserted into the reassembly table. If the packet has a wrong
checksum, or is fragmented etc., an error happens and the reassembly
function stops processing the packet.

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
---
 config/common_base       |   5 ++
 lib/Makefile             |   1 +
 lib/librte_gro/Makefile  |  50 +++++++++++
 lib/librte_gro/rte_gro.c | 126 ++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro.h | 213 +++++++++++++++++++++++++++++++++++++++++++++++
 mk/rte.app.mk            |   1 +
 6 files changed, 396 insertions(+)
 create mode 100644 lib/librte_gro/Makefile
 create mode 100644 lib/librte_gro/rte_gro.c
 create mode 100644 lib/librte_gro/rte_gro.h

diff --git a/config/common_base b/config/common_base
index f6aafd1..167f5ef 100644
--- a/config/common_base
+++ b/config/common_base
@@ -712,6 +712,11 @@ CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
 CONFIG_RTE_LIBRTE_PMD_VHOST=n
 
 #
+# Compile GRO library
+#
+CONFIG_RTE_LIBRTE_GRO=y
+
+#
 #Compile Xen domain0 support
 #
 CONFIG_RTE_LIBRTE_XEN_DOM0=n
diff --git a/lib/Makefile b/lib/Makefile
index 07e1fd0..e253053 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -106,6 +106,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
 DEPDIRS-librte_reorder := librte_eal librte_mempool librte_mbuf
 DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
 DEPDIRS-librte_pdump := librte_eal librte_mempool librte_mbuf librte_ether
+DIRS-$(CONFIG_RTE_LIBRTE_GRO) += librte_gro
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
new file mode 100644
index 0000000..9f4063a
--- /dev/null
+++ b/lib/librte_gro/Makefile
@@ -0,0 +1,50 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_gro.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+
+EXPORT_MAP := rte_gro_version.map
+
+LIBABIVER := 1
+
+# source files
+SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
+
+# install this header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
new file mode 100644
index 0000000..1bc53a2
--- /dev/null
+++ b/lib/librte_gro/rte_gro.c
@@ -0,0 +1,126 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+
+#include "rte_gro.h"
+
+static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB];
+static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NB];
+
+struct rte_gro_tbl *rte_gro_tbl_create(uint16_t socket_id,
+		uint16_t max_flow_num,
+		uint16_t max_item_per_flow,
+		uint32_t max_packet_size,
+		uint64_t max_timeout_cycles,
+		uint64_t desired_gro_types)
+{
+	gro_tbl_create_fn create_tbl_fn;
+	struct rte_gro_tbl *gro_tbl;
+	uint64_t gro_type_flag = 0;
+	uint8_t i;
+
+	gro_tbl = rte_zmalloc_socket(__func__,
+			sizeof(struct rte_gro_tbl),
+			RTE_CACHE_LINE_SIZE,
+			socket_id);
+	if (gro_tbl == NULL)
+		return NULL;
+	gro_tbl->max_packet_size = max_packet_size;
+	gro_tbl->max_timeout_cycles = max_timeout_cycles;
+	gro_tbl->desired_gro_types = desired_gro_types;
+
+	for (i = 0; i < GRO_TYPE_MAX_NB; i++) {
+		gro_type_flag = 1 << i;
+		if (desired_gro_types & gro_type_flag) {
+			create_tbl_fn = tbl_create_functions[i];
+			if (create_tbl_fn)
+				gro_tbl->tbls[i] = create_tbl_fn(socket_id,
+						max_flow_num,
+						max_item_per_flow);
+			else
+				gro_tbl->tbls[i] = NULL;
+		}
+	}
+	return gro_tbl;
+}
+
+void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl)
+{
+	gro_tbl_destroy_fn destroy_tbl_fn;
+	uint64_t gro_type_flag;
+	uint8_t i;
+
+	if (gro_tbl == NULL)
+		return;
+	for (i = 0; i < GRO_TYPE_MAX_NB; i++) {
+		gro_type_flag = 1 << i;
+		if (gro_tbl->desired_gro_types & gro_type_flag) {
+			destroy_tbl_fn = tbl_destroy_functions[i];
+			if (destroy_tbl_fn)
+				destroy_tbl_fn(gro_tbl->tbls[i]);
+			gro_tbl->tbls[i] = NULL;
+		}
+	}
+	rte_free(gro_tbl);
+}
+
+uint16_t
+rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
+		const uint16_t nb_pkts,
+		const struct rte_gro_param param __rte_unused)
+{
+	return nb_pkts;
+}
+
+int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
+		struct rte_gro_tbl *gro_tbl __rte_unused)
+{
+	return -1;
+}
+
+uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
+		uint64_t desired_gro_types __rte_unused,
+		uint16_t flush_num __rte_unused,
+		struct rte_mbuf **out __rte_unused,
+		const uint16_t max_nb_out __rte_unused)
+{
+	return 0;
+}
+
+uint16_t
+rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
+		uint64_t desired_gro_types __rte_unused,
+		struct rte_mbuf **out __rte_unused,
+		const uint16_t max_nb_out __rte_unused)
+{
+	return 0;
+}
diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
new file mode 100644
index 0000000..67bd90d
--- /dev/null
+++ b/lib/librte_gro/rte_gro.h
@@ -0,0 +1,213 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_GRO_H_
+#define _RTE_GRO_H_
+
+/* maximum number of supported GRO types */
+#define GRO_TYPE_MAX_NB 64
+#define GRO_TYPE_SUPPORT_NB 0	/**< number of currently supported GRO types */
+
+/**
+ * GRO table structure. DPDK GRO uses GRO tables to reassemble
+ * packets. In heavyweight mode, applications must create GRO tables
+ * before performing GRO. However, in lightweight mode, applications
+ * don't need to create GRO tables.
+ *
+ * A GRO table object stores many reassembly tables of desired
+ * GRO types.
+ */
+struct rte_gro_tbl {
+	/* table addresses of desired GRO types */
+	void *tbls[GRO_TYPE_MAX_NB];
+	uint64_t desired_gro_types;	/**< GRO types to perform */
+	/**
+	 * the maximum time that packets can stay in GRO tables,
+	 * measured in CPU cycles.
+	 */
+	uint64_t max_timeout_cycles;
+	/* the maximum length of merged packet, measured in byte */
+	uint32_t max_packet_size;
+};
+
+/**
+ * In lightweight mode, applications use this structure to pass the
+ * needed parameters to rte_gro_reassemble_burst.
+ */
+struct rte_gro_param {
+	uint16_t max_flow_num;	/**< max flow number */
+	uint16_t max_item_per_flow;	/**< max item number per flow */
+	/**
+	 * It indicates the GRO types that applications want to perform,
+	 * whose value is the result of OR operation on GRO type flags.
+	 */
+	uint64_t desired_gro_types;
+	/* the maximum packet size after being merged */
+	uint32_t max_packet_size;
+};
+
+typedef void *(*gro_tbl_create_fn)(uint16_t socket_id,
+		uint16_t max_flow_num,
+		uint16_t max_item_per_flow);
+typedef void (*gro_tbl_destroy_fn)(void *tbl);
+
+/**
+ * This function creates a GRO table, which is used to merge packets.
+ *
+ * @param socket_id
+ *  socket index where the Ethernet port connects to.
+ * @param max_flow_num
+ *  the maximum flow number in the GRO table
+ * @param max_item_per_flow
+ *  the maximum packet number per flow
+ * @param max_packet_size
+ *  the maximum size of merged packets, which is measured in byte.
+ * @param max_timeout_cycles
+ *  the maximum time that a packet can stay in the GRO table.
+ * @param desired_gro_types
+ *  GRO types that applications want to perform. Its value is the
+ *  result of OR operation on desired GRO type flags.
+ * @return
+ *  If created successfully, return a pointer which points to the GRO
+ *  table. Otherwise, return NULL.
+ */
+struct rte_gro_tbl *rte_gro_tbl_create(uint16_t socket_id,
+		uint16_t max_flow_num,
+		uint16_t max_item_per_flow,
+		uint32_t max_packet_size,
+		uint64_t max_timeout_cycles,
+		uint64_t desired_gro_types);
+/**
+ * This function destroys a GRO table.
+ */
+void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl);
+
+/**
+ * This is the main reassembly API used in lightweight mode, which
+ * merges a burst of packets at a time. After it returns, applications
+ * can get GROed packets immediately. Applications don't need to
+ * flush packets manually. In lightweight mode, applications just need
+ * to tell the reassembly API what rules should be applied when merging
+ * packets. Therefore, applications can perform GRO in a very simple
+ * way.
+ *
+ * To process one packet, we find its corresponding reassembly table
+ * according to the packet type. Then we search that reassembly table
+ * for one packet to merge with. If one is found, the two packets are
+ * chained together. If not, the input packet is inserted into the
+ * reassembly table. Merging two packets simply chains them together; no
+ * memory copy is needed. Before rte_gro_reassemble_burst returns,
+ * header checksums of merged packets are re-calculated.
+ *
+ * @param pkts
+ *  a pointer array which points to the packets to reassemble. After
+ *  GRO, it is also used to keep GROed packets.
+ * @param nb_pkts
+ *  the number of packets to reassemble.
+ * @param param
+ *  Applications use param to tell rte_gro_reassemble_burst what rules
+ *  are demanded.
+ * @return
+ *  the number of packets after GRO.
+ */
+uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
+		const uint16_t nb_pkts __rte_unused,
+		const struct rte_gro_param param __rte_unused);
+
+/**
+ * This is the main reassembly API used in heavyweight mode, which
+ * merges one packet at a time. The procedure of merging one packet is
+ * similar to that of rte_gro_reassemble_burst, but rte_gro_reassemble will
+ * not update header checksums. Header checksums of merged packets are
+ * re-calculated in flush APIs.
+ *
+ * If an error happens, e.g. the packet has a wrong checksum or an
+ * unsupported GRO type, the input packet won't be stored in the GRO
+ * table. If no errors happen, the packet is either merged with an
+ * existing packet, or inserted into its corresponding reassembly table.
+ * Applications can get packets in the GRO table via flush APIs.
+ *
+ * @param pkt
+ *  packet to reassemble.
+ * @param gro_tbl
+ *  a pointer that points to a GRO table.
+ * @return
+ *  if the packet is merged successfully, return a positive value. If it
+ *  isn't merged, return zero. If errors happen, return a negative value.
+ */
+int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
+		struct rte_gro_tbl *gro_tbl __rte_unused);
+
+/**
+ * This function flushes packets of desired GRO types from their
+ * corresponding reassembly tables.
+ *
+ * @param gro_tbl
+ *  a pointer that points to a GRO table object.
+ * @param desired_gro_types
+ *  GRO types whose packets will be flushed.
+ * @param flush_num
+ *  the number of packets that need flushing.
+ * @param out
+ *  a pointer array that is used to keep flushed packets.
+ * @param max_nb_out
+ *  the maximum element number of out.
+ * @return
+ *  the number of flushed packets. If no packets are flushed, return 0.
+ */
+uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
+		uint64_t desired_gro_types __rte_unused,
+		uint16_t flush_num __rte_unused,
+		struct rte_mbuf **out __rte_unused,
+		const uint16_t max_nb_out __rte_unused);
+
+/**
+ * This function flushes the timeout packets from reassembly tables of
+ * desired GRO types.
+ *
+ * @param gro_tbl
+ *  a pointer that points to a GRO table object.
+ * @param desired_gro_types
+ *  rte_gro_timeout_flush only processes packets which belong to the
+ *  GRO types specified by desired_gro_types.
+ * @param out
+ *  a pointer array that is used to keep flushed packets.
+ * @param max_nb_out
+ *  the maximum element number of out.
+ * @return
+ *  the number of flushed packets. If no packets are flushed, return 0.
+ */
+uint16_t rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
+		uint64_t desired_gro_types __rte_unused,
+		struct rte_mbuf **out __rte_unused,
+		const uint16_t max_nb_out __rte_unused);
+#endif
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index bcaf1b3..fc3776d 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -98,6 +98,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_RING)           += -lrte_ring
 _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
 _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER)        += -lrte_reorder
+_LDLIBS-$(CONFIG_RTE_LIBRTE_GRO)        	+= -lrte_gro
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 _LDLIBS-$(CONFIG_RTE_LIBRTE_KNI)            += -lrte_kni
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 141+ messages in thread

* [PATCH v5 2/3] lib/gro: add TCP/IPv4 GRO support
  2017-06-18  7:21       ` [PATCH v5 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
  2017-06-18  7:21         ` [PATCH v5 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
@ 2017-06-18  7:21         ` Jiayu Hu
  2017-06-19 15:43           ` Tan, Jianfeng
  2017-06-18  7:21         ` [PATCH v5 3/3] app/testpmd: enable TCP/IPv4 GRO Jiayu Hu
                           ` (2 subsequent siblings)
  4 siblings, 1 reply; 141+ messages in thread
From: Jiayu Hu @ 2017-06-18  7:21 UTC (permalink / raw)
  To: dev
  Cc: konstantin.ananyev, yliu, keith.wiles, jianfeng.tan, tiwei.bie,
	lei.a.yao, Jiayu Hu

In this patch, we introduce six APIs to support TCP/IPv4 GRO.
- gro_tcp_tbl_create: create a TCP reassembly table, which is used to
    merge packets.
- gro_tcp_tbl_destroy: free memory space of a TCP reassembly table.
- gro_tcp_tbl_flush: flush packets in the TCP reassembly table.
- gro_tcp_tbl_timeout_flush: flush timeout packets in the TCP
    reassembly table.
- gro_tcp4_reassemble: merge an input packet.
- gro_tcp4_tbl_cksum_update: update TCP and IPv4 header checksums for
    all merged packets in the TCP reassembly table.

In TCP GRO, we use a table structure, called TCP reassembly table, to
reassemble packets. Both TCP/IPv4 and TCP/IPv6 GRO use the same table
structure. A TCP reassembly table includes a flow array and an item array,
where the flow array is used to record flow information and the item
array is used to record packets information.

Each element in the flow array records the information of one flow,
which includes two parts:
- key: the criteria of the same flow. If packets have the same key
    value, they belong to the same flow.
- start_index: the index of the first incoming packet of this flow in
    the item array. With start_index, we can locate the first incoming
    packet of this flow.
Each element in the item array records the information of one packet. It mainly
includes two parts:
- pkt: packet address
- next_pkt_index: index of the next packet of the same flow in the item
    array. All packets of the same flow are chained by next_pkt_index.
    With next_pkt_index, we can locate all packets of the same flow
    one by one.

To process an incoming packet, we need three steps:
a. check if the packet should be processed. Packets with the following
    properties won't be processed:
	- packets without data;
	- packets with wrong checksums;
	- fragmented packets.
b. traverse the flow array to find the flow which the packet belongs to.
    If no flow is found, insert a new flow and store the packet into the
    item array.
c. locate the first packet of this flow in the item array via
    start_index. Then traverse all packets of this flow one by one via
    next_pkt_index. If one of them can be merged with the incoming
    packet, merge them but without updating checksums. If none can,
    allocate one item in the item array to store the incoming packet
    and update the next_pkt_index value (see the sketch below).

For better performance, we don't update header checksums once two
packets are merged. The header checksums are updated only when packets
are flushed from TCP reassembly tables.
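
To make steps b and c concrete, a sketch of the table walk (field names
follow the reassembly table structures of this patchset; the merge
attempt itself is elided):

	/* step b: linear search of the flow array for the packet's flow */
	for (i = 0; i < tbl->max_flow_num; i++)
		if (tbl->flows[i].is_valid &&
				memcmp(&tbl->flows[i].key, &key,
					sizeof(key)) == 0)
			break;
	/* step c: walk the flow's packet chain via next_pkt_idx */
	for (j = tbl->flows[i].start_index; j != INVALID_ITEM_INDEX;
			j = tbl->items[j].next_pkt_idx) {
		/* try to merge the incoming packet with tbl->items[j].pkt */
	}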

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
---
 lib/librte_gro/Makefile      |   1 +
 lib/librte_gro/rte_gro.c     | 154 +++++++++++--
 lib/librte_gro/rte_gro.h     |  34 +--
 lib/librte_gro/rte_gro_tcp.c | 527 +++++++++++++++++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro_tcp.h | 210 +++++++++++++++++
 5 files changed, 895 insertions(+), 31 deletions(-)
 create mode 100644 lib/librte_gro/rte_gro_tcp.c
 create mode 100644 lib/librte_gro/rte_gro_tcp.h

diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
index 9f4063a..3495dfc 100644
--- a/lib/librte_gro/Makefile
+++ b/lib/librte_gro/Makefile
@@ -43,6 +43,7 @@ LIBABIVER := 1
 
 # source files
 SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
+SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro_tcp.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
index 1bc53a2..2620ef6 100644
--- a/lib/librte_gro/rte_gro.c
+++ b/lib/librte_gro/rte_gro.c
@@ -32,11 +32,17 @@
 
 #include <rte_malloc.h>
 #include <rte_mbuf.h>
+#include <rte_ethdev.h>
+#include <rte_ip.h>
+#include <rte_tcp.h>
 
 #include "rte_gro.h"
+#include "rte_gro_tcp.h"
 
-static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB];
-static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NB];
+static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB] = {
+	gro_tcp_tbl_create, NULL};
+static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NB] = {
+	gro_tcp_tbl_destroy, NULL};
 
 struct rte_gro_tbl *rte_gro_tbl_create(uint16_t socket_id,
 		uint16_t max_flow_num,
@@ -94,33 +100,149 @@ void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl)
 }
 
 uint16_t
-rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
+rte_gro_reassemble_burst(struct rte_mbuf **pkts,
 		const uint16_t nb_pkts,
-		const struct rte_gro_param param __rte_unused)
+		const struct rte_gro_param param)
 {
-	return nb_pkts;
+	struct ether_hdr *eth_hdr;
+	struct ipv4_hdr *ipv4_hdr;
+	uint16_t l3proc_type, i;
+	uint16_t nb_after_gro = nb_pkts;
+	uint16_t flow_num = nb_pkts < param.max_flow_num ?
+		nb_pkts : param.max_flow_num;
+	uint32_t item_num = nb_pkts <
+		flow_num * param.max_item_per_flow ?
+		nb_pkts :
+		flow_num * param.max_item_per_flow;
+
+	/* allocate a reassembly table for TCP/IPv4 GRO */
+	uint16_t tcp_flow_num = flow_num <= GRO_TCP_TBL_MAX_FLOW_NUM ?
+		flow_num : GRO_TCP_TBL_MAX_FLOW_NUM;
+	uint32_t tcp_item_num = item_num <= GRO_TCP_TBL_MAX_ITEM_NUM ?
+		item_num : GRO_TCP_TBL_MAX_ITEM_NUM;
+	struct gro_tcp_tbl tcp_tbl;
+	struct gro_tcp_flow tcp_flows[tcp_flow_num];
+	struct gro_tcp_item tcp_items[tcp_item_num];
+	struct gro_tcp_rule tcp_rule;
+
+	struct rte_mbuf *unprocess_pkts[nb_pkts];
+	uint16_t unprocess_num = 0;
+	int32_t ret;
+
+	if (unlikely(nb_pkts <= 1))
+		return nb_pkts;
+
+	memset(tcp_flows, 0, sizeof(struct gro_tcp_flow) *
+			tcp_flow_num);
+	memset(tcp_items, 0, sizeof(struct gro_tcp_item) *
+			tcp_item_num);
+	tcp_tbl.flows = tcp_flows;
+	tcp_tbl.items = tcp_items;
+	tcp_tbl.flow_num = 0;
+	tcp_tbl.item_num = 0;
+	tcp_tbl.max_flow_num = tcp_flow_num;
+	tcp_tbl.max_item_num = tcp_item_num;
+	tcp_rule.max_packet_size = param.max_packet_size;
+
+	for (i = 0; i < nb_pkts; i++) {
+		eth_hdr = rte_pktmbuf_mtod(pkts[i], struct ether_hdr *);
+		l3proc_type = rte_be_to_cpu_16(eth_hdr->ether_type);
+		if (l3proc_type == ETHER_TYPE_IPv4) {
+			ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
+			if (ipv4_hdr->next_proto_id == IPPROTO_TCP &&
+					(param.desired_gro_types &
+					 GRO_TCP_IPV4)) {
+				ret = gro_tcp4_reassemble(pkts[i],
+						&tcp_tbl,
+						&tcp_rule);
+				if (ret > 0)
+					nb_after_gro--;
+				else if (ret < 0)
+					unprocess_pkts[unprocess_num++] =
+						pkts[i];
+			} else
+				unprocess_pkts[unprocess_num++] =
+					pkts[i];
+		} else
+			unprocess_pkts[unprocess_num++] =
+				pkts[i];
+	}
+
+	if (nb_after_gro < nb_pkts) {
+		/* update packets headers and re-arrange GROed packets */
+		if (param.desired_gro_types & GRO_TCP_IPV4) {
+			gro_tcp4_tbl_cksum_update(&tcp_tbl);
+			for (i = 0; i < tcp_tbl.item_num; i++)
+				pkts[i] = tcp_tbl.items[i].pkt;
+		}
+		if (unprocess_num > 0) {
+			memcpy(&pkts[i], unprocess_pkts,
+					sizeof(struct rte_mbuf *) *
+					unprocess_num);
+			i += unprocess_num;
+		}
+		if (nb_pkts > i)
+			memset(&pkts[i], 0,
+					sizeof(struct rte_mbuf *) *
+					(nb_pkts - i));
+	}
+	return nb_after_gro;
 }
 
-int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
-		struct rte_gro_tbl *gro_tbl __rte_unused)
+int rte_gro_reassemble(struct rte_mbuf *pkt,
+		struct rte_gro_tbl *gro_tbl)
 {
+	struct ether_hdr *eth_hdr;
+	struct ipv4_hdr *ipv4_hdr;
+	uint16_t l3proc_type;
+	struct gro_tcp_rule tcp_rule;
+
+	if (pkt == NULL)
+		return -1;
+	tcp_rule.max_packet_size = gro_tbl->max_packet_size;
+	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
+	l3proc_type = rte_be_to_cpu_16(eth_hdr->ether_type);
+	if (l3proc_type == ETHER_TYPE_IPv4) {
+		ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
+		if (ipv4_hdr->next_proto_id == IPPROTO_TCP &&
+				(gro_tbl->desired_gro_types & GRO_TCP_IPV4)) {
+			return gro_tcp4_reassemble(pkt,
+					gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
+					&tcp_rule);
+		}
+	}
 	return -1;
 }
 
-uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
-		uint64_t desired_gro_types __rte_unused,
-		uint16_t flush_num __rte_unused,
-		struct rte_mbuf **out __rte_unused,
-		const uint16_t max_nb_out __rte_unused)
+uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl,
+		uint64_t desired_gro_types,
+		uint16_t flush_num,
+		struct rte_mbuf **out,
+		const uint16_t max_nb_out)
 {
+	desired_gro_types = desired_gro_types &
+		gro_tbl->desired_gro_types;
+	if (desired_gro_types & GRO_TCP_IPV4)
+		return gro_tcp_tbl_flush(
+				gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
+				flush_num,
+				out,
+				max_nb_out);
 	return 0;
 }
 
 uint16_t
-rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
-		uint64_t desired_gro_types __rte_unused,
-		struct rte_mbuf **out __rte_unused,
-		const uint16_t max_nb_out __rte_unused)
+rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl,
+		uint64_t desired_gro_types,
+		struct rte_mbuf **out,
+		const uint16_t max_nb_out)
 {
+	desired_gro_types = desired_gro_types &
+		gro_tbl->desired_gro_types;
+	if (desired_gro_types & GRO_TCP_IPV4)
+		return gro_tcp_tbl_timeout_flush(
+				gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
+				gro_tbl->max_timeout_cycles,
+				out, max_nb_out);
 	return 0;
 }
diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
index 67bd90d..e26aa5b 100644
--- a/lib/librte_gro/rte_gro.h
+++ b/lib/librte_gro/rte_gro.h
@@ -35,7 +35,11 @@
 
 /* maximum number of supported GRO types */
 #define GRO_TYPE_MAX_NB 64
-#define GRO_TYPE_SUPPORT_NB 0	/**< number of currently supported GRO types */
+#define GRO_TYPE_SUPPORT_NB 1	/**< number of currently supported GRO types */
+
+/* TCP/IPv4 GRO flag */
+#define GRO_TCP_IPV4_INDEX 0
+#define GRO_TCP_IPV4 (1ULL << GRO_TCP_IPV4_INDEX)
 
 /**
  * GRO table structure. DPDK GRO uses GRO table to reassemble
@@ -139,9 +143,9 @@ void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl);
  * @return
 *  the number of packets after GRO.
  */
-uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
-		const uint16_t nb_pkts __rte_unused,
-		const struct rte_gro_param param __rte_unused);
+uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts,
+		const uint16_t nb_pkts,
+		const struct rte_gro_param param);
 
 /**
  * This is the main reassembly API used in heavyweight mode, which
@@ -164,8 +168,8 @@ uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
  *  if the packet is merged successfully, return a positive value. If it
  *  fails to merge, return zero. If errors happen, return a negative value.
  */
-int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
-		struct rte_gro_tbl *gro_tbl __rte_unused);
+int rte_gro_reassemble(struct rte_mbuf *pkt,
+		struct rte_gro_tbl *gro_tbl);
 
 /**
  * This function flushes packets of desired GRO types from their
@@ -184,11 +188,11 @@ int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
  * @return
  *  the number of flushed packets. If no packets are flushed, return 0.
  */
-uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
-		uint64_t desired_gro_types __rte_unused,
-		uint16_t flush_num __rte_unused,
-		struct rte_mbuf **out __rte_unused,
-		const uint16_t max_nb_out __rte_unused);
+uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl,
+		uint64_t desired_gro_types,
+		uint16_t flush_num,
+		struct rte_mbuf **out,
+		const uint16_t max_nb_out);
 
 /**
  * This function flushes the timeout packets from reassembly tables of
@@ -206,8 +210,8 @@ uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
  * @return
  *  the number of flushed packets. If no packets are flushed, return 0.
  */
-uint16_t rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
-		uint64_t desired_gro_types __rte_unused,
-		struct rte_mbuf **out __rte_unused,
-		const uint16_t max_nb_out __rte_unused);
+uint16_t rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl,
+		uint64_t desired_gro_types,
+		struct rte_mbuf **out,
+		const uint16_t max_nb_out);
 #endif
diff --git a/lib/librte_gro/rte_gro_tcp.c b/lib/librte_gro/rte_gro_tcp.c
new file mode 100644
index 0000000..86743cd
--- /dev/null
+++ b/lib/librte_gro/rte_gro_tcp.c
@@ -0,0 +1,527 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+#include <rte_cycles.h>
+
+#include <rte_ethdev.h>
+#include <rte_ip.h>
+#include <rte_tcp.h>
+
+#include "rte_gro_tcp.h"
+
+void *gro_tcp_tbl_create(uint16_t socket_id,
+		uint16_t max_flow_num,
+		uint16_t max_item_per_flow)
+{
+	size_t size;
+	uint32_t entries_num;
+	struct gro_tcp_tbl *tbl;
+
+	max_flow_num = max_flow_num > GRO_TCP_TBL_MAX_FLOW_NUM ?
+		GRO_TCP_TBL_MAX_FLOW_NUM : max_flow_num;
+
+	entries_num = max_flow_num * max_item_per_flow;
+	entries_num = entries_num > GRO_TCP_TBL_MAX_ITEM_NUM ?
+		GRO_TCP_TBL_MAX_ITEM_NUM : entries_num;
+
+	if (entries_num == 0 || max_flow_num == 0)
+		return NULL;
+
+	tbl = (struct gro_tcp_tbl *)rte_zmalloc_socket(
+			__func__,
+			sizeof(struct gro_tcp_tbl),
+			RTE_CACHE_LINE_SIZE,
+			socket_id);
+	if (tbl == NULL)
+		return NULL;
+
+	size = sizeof(struct gro_tcp_item) * entries_num;
+	tbl->items = (struct gro_tcp_item *)rte_zmalloc_socket(
+			__func__,
+			size,
+			RTE_CACHE_LINE_SIZE,
+			socket_id);
+	tbl->max_item_num = entries_num;
+
+	size = sizeof(struct gro_tcp_flow) * max_flow_num;
+	tbl->flows = (struct gro_tcp_flow *)rte_zmalloc_socket(
+			__func__,
+			size, RTE_CACHE_LINE_SIZE,
+			socket_id);
+	tbl->max_flow_num = max_flow_num;
+	return tbl;
+}
+
+void gro_tcp_tbl_destroy(void *tbl)
+{
+	struct gro_tcp_tbl *tcp_tbl = (struct gro_tcp_tbl *)tbl;
+
+	if (tcp_tbl) {
+		if (tcp_tbl->items)
+			rte_free(tcp_tbl->items);
+		if (tcp_tbl->flows)
+			rte_free(tcp_tbl->flows);
+		rte_free(tcp_tbl);
+	}
+}
+
+/* update TCP header and IPv4 header checksum */
+static void
+gro_tcp4_cksum_update(struct rte_mbuf *pkt)
+{
+	uint32_t len, offset, cksum;
+	struct ether_hdr *eth_hdr;
+	struct ipv4_hdr *ipv4_hdr;
+	struct tcp_hdr *tcp_hdr;
+	uint16_t ipv4_ihl, cksum_pld;
+
+	if (pkt == NULL)
+		return;
+
+	len = pkt->pkt_len;
+	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
+	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
+	ipv4_ihl = IPv4_HDR_LEN(ipv4_hdr);
+	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl);
+
+	offset = sizeof(struct ether_hdr) + ipv4_ihl;
+	len -= offset;
+
+	/* TCP cksum without IP pseudo header */
+	ipv4_hdr->hdr_checksum = 0;
+	tcp_hdr->cksum = 0;
+	rte_raw_cksum_mbuf(pkt, offset, len, &cksum_pld);
+
+	/* IP pseudo header cksum */
+	cksum = cksum_pld;
+	cksum += rte_ipv4_phdr_cksum(ipv4_hdr, 0);
+
+	/* combine TCP checksum and IP pseudo header checksum */
+	cksum = ((cksum & 0xffff0000) >> 16) + (cksum & 0xffff);
+	cksum = (~cksum) & 0xffff;
+	cksum = (cksum == 0) ? 0xffff : cksum;
+	tcp_hdr->cksum = cksum;
+
+	/* update IP header cksum */
+	ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);
+}
+
+void gro_tcp4_tbl_cksum_update(struct gro_tcp_tbl *tbl)
+{
+	uint32_t i;
+	uint32_t item_num = tbl->item_num;
+
+	for (i = 0; i < tbl->max_item_num; i++) {
+		if (tbl->items[i].is_valid) {
+			item_num--;
+			if (tbl->items[i].is_groed)
+				gro_tcp4_cksum_update(tbl->items[i].pkt);
+		}
+		if (unlikely(item_num == 0))
+			break;
+	}
+}
+
+/**
+ * merge two TCP/IPv4 packets without updating header checksums.
+ */
+static int
+merge_two_tcp4_packets(struct rte_mbuf *pkt_src,
+		struct rte_mbuf *pkt,
+		struct gro_tcp_rule *rule)
+{
+	struct ipv4_hdr *ipv4_hdr1, *ipv4_hdr2;
+	struct tcp_hdr *tcp_hdr1;
+	uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1;
+	struct rte_mbuf *tail;
+
+	/* parse the given packet */
+	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
+				struct ether_hdr *) + 1);
+	ipv4_ihl1 = IPv4_HDR_LEN(ipv4_hdr1);
+	tcp_hdr1 = (struct tcp_hdr *)((char *)ipv4_hdr1 + ipv4_ihl1);
+	tcp_hl1 = TCP_HDR_LEN(tcp_hdr1);
+	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) - ipv4_ihl1
+		- tcp_hl1;
+
+	/* parse the original packet */
+	ipv4_hdr2 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt_src,
+				struct ether_hdr *) + 1);
+
+	/* check reassembly rules */
+	if (pkt_src->pkt_len + tcp_dl1 > rule->max_packet_size)
+		return -1;
+
+	/* remove the header of the incoming packet */
+	rte_pktmbuf_adj(pkt, sizeof(struct ether_hdr) +
+			ipv4_ihl1 + tcp_hl1);
+
+	/* chain the two packets together */
+	tail = rte_pktmbuf_lastseg(pkt_src);
+	tail->next = pkt;
+
+	/* update IP header */
+	ipv4_hdr2->total_length = rte_cpu_to_be_16(
+			rte_be_to_cpu_16(
+				ipv4_hdr2->total_length)
+			+ tcp_dl1);
+
+	/* update mbuf metadata for the merged packet */
+	pkt_src->nb_segs += pkt->nb_segs;
+	pkt_src->pkt_len += pkt->pkt_len;
+	return 1;
+}
+
+static int
+check_seq_option(struct rte_mbuf *pkt,
+		struct tcp_hdr *tcp_hdr,
+		uint16_t tcp_hl)
+{
+	struct ipv4_hdr *ipv4_hdr1;
+	struct tcp_hdr *tcp_hdr1;
+	uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1;
+	uint32_t sent_seq1, sent_seq;
+	int ret = -1;
+
+	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
+				struct ether_hdr *) + 1);
+	ipv4_ihl1 = IPv4_HDR_LEN(ipv4_hdr1);
+	tcp_hdr1 = (struct tcp_hdr *)((char *)ipv4_hdr1 + ipv4_ihl1);
+	tcp_hl1 = TCP_HDR_LEN(tcp_hdr1);
+	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) - ipv4_ihl1
+		- tcp_hl1;
+	sent_seq1 = rte_be_to_cpu_32(tcp_hdr1->sent_seq) + tcp_dl1;
+	sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq);
+
+	/**
+	 * check if the two packets are neighbors: the incoming packet's
+	 * seq number must equal the stored packet's seq number plus its
+	 * payload length
+	 */
+	if ((sent_seq ^ sent_seq1) == 0) {
+		/* check if the TCP option fields are equal */
+		if ((tcp_hl1 == tcp_hl) &&
+				((tcp_hl == sizeof(struct tcp_hdr)) ||
+				 (memcmp(tcp_hdr1 + 1,
+						 tcp_hdr + 1,
+						 tcp_hl - sizeof
+						 (struct tcp_hdr))
+				  == 0)))
+			ret = 1;
+	}
+	return ret;
+}
+
+static uint32_t
+find_an_empty_item(struct gro_tcp_tbl *tbl)
+{
+	uint32_t i;
+
+	for (i = 0; i < tbl->max_item_num; i++)
+		if (tbl->items[i].is_valid == 0)
+			return i;
+	return INVALID_ITEM_INDEX;
+}
+
+static uint16_t
+find_an_empty_flow(struct gro_tcp_tbl *tbl)
+{
+	uint16_t i;
+
+	for (i = 0; i < tbl->max_flow_num; i++)
+		if (tbl->flows[i].is_valid == 0)
+			return i;
+	return INVALID_FLOW_INDEX;
+}
+
+int32_t
+gro_tcp4_reassemble(struct rte_mbuf *pkt,
+		struct gro_tcp_tbl *tbl,
+		struct gro_tcp_rule *rule)
+{
+	struct ether_hdr *eth_hdr;
+	struct ipv4_hdr *ipv4_hdr;
+	struct tcp_hdr *tcp_hdr;
+	uint16_t ipv4_ihl, tcp_hl, tcp_dl, tcp_cksum, ip_cksum;
+
+	struct gro_tcp_flow_key key;
+	uint64_t ol_flags;
+	uint32_t cur_idx, prev_idx, item_idx;
+	uint16_t i, flow_idx;
+
+	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
+	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
+	ipv4_ihl = IPv4_HDR_LEN(ipv4_hdr);
+
+	/* 1. check if the packet should be processed */
+	if (ipv4_ihl < sizeof(struct ipv4_hdr))
+		goto fail;
+	if (ipv4_hdr->next_proto_id != IPPROTO_TCP)
+		goto fail;
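+	/* only process packets with the DF bit set */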
+	if ((ipv4_hdr->fragment_offset &
+				rte_cpu_to_be_16(IPV4_HDR_DF_MASK))
+			== 0)
+		goto fail;
+
+	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl);
+	tcp_hl = TCP_HDR_LEN(tcp_hdr);
+	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - ipv4_ihl
+		- tcp_hl;
+	if (tcp_dl == 0)
+		goto fail;
+
+	/**
+	 * 2. if HW rx checksum offload isn't enabled, recalculate the
+	 * checksum in SW. Then, check if the checksum is correct
+	 */
+	ol_flags = pkt->ol_flags;
+	if ((ol_flags & PKT_RX_IP_CKSUM_MASK) !=
+			PKT_RX_IP_CKSUM_UNKNOWN) {
+		if ((ol_flags & PKT_RX_IP_CKSUM_MASK) ==
+				PKT_RX_IP_CKSUM_BAD)
+			goto fail;
+	} else {
+		ip_cksum = ipv4_hdr->hdr_checksum;
+		ipv4_hdr->hdr_checksum = 0;
+		ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);
+		if (ipv4_hdr->hdr_checksum ^ ip_cksum)
+			goto fail;
+	}
+
+	if ((ol_flags & PKT_RX_L4_CKSUM_MASK) !=
+			PKT_RX_L4_CKSUM_UNKNOWN) {
+		if ((ol_flags & PKT_RX_L4_CKSUM_MASK) ==
+				PKT_RX_L4_CKSUM_BAD)
+			goto fail;
+	} else {
+		tcp_cksum = tcp_hdr->cksum;
+		tcp_hdr->cksum = 0;
+		tcp_hdr->cksum = rte_ipv4_udptcp_cksum
+			(ipv4_hdr, tcp_hdr);
+		if (tcp_hdr->cksum ^ tcp_cksum)
+			goto fail;
+	}
+
+	/**
+	 * 3. search for a flow and traverse all packets in the flow
+	 * to find one to merge with the given packet.
+	 */
+	key.eth_saddr = eth_hdr->s_addr;
+	key.eth_daddr = eth_hdr->d_addr;
+	key.ip_src_addr[0] = rte_be_to_cpu_32(ipv4_hdr->src_addr);
+	key.ip_dst_addr[0] = rte_be_to_cpu_32(ipv4_hdr->dst_addr);
+	key.src_port = rte_be_to_cpu_16(tcp_hdr->src_port);
+	key.dst_port = rte_be_to_cpu_16(tcp_hdr->dst_port);
+	key.recv_ack = rte_be_to_cpu_32(tcp_hdr->recv_ack);
+	key.tcp_flags = tcp_hdr->tcp_flags;
+
+	for (i = 0; i < tbl->max_flow_num; i++) {
+		/* search all packets in a valid flow. */
+		if (tbl->flows[i].is_valid &&
+				(memcmp(&(tbl->flows[i].key), &key,
+						sizeof(struct gro_tcp_flow_key))
+				 == 0)) {
+			cur_idx = tbl->flows[i].start_index;
+			prev_idx = cur_idx;
+			while (cur_idx != INVALID_ITEM_INDEX) {
+				if (check_seq_option(tbl->items[cur_idx].pkt,
+							tcp_hdr,
+							tcp_hl) > 0) {
+					if (merge_two_tcp4_packets(
+								tbl->items[cur_idx].pkt,
+								pkt,
+								rule) > 0) {
+						/* successfully merge two packets */
+						tbl->items[cur_idx].is_groed = 1;
+						return 1;
+					}
+					/**
+					 * fail to merge the two packets
+					 * since a rule is broken; add the
+					 * packet into the flow.
+					 */
+					goto insert_to_existed_flow;
+				} else {
+					prev_idx = cur_idx;
+					cur_idx = tbl->items[cur_idx].next_pkt_idx;
+				}
+			}
+			/**
+			 * fail to merge the given packet into an existing
+			 * flow, so add it into the flow.
+			 */
+insert_to_existed_flow:
+			item_idx = find_an_empty_item(tbl);
+			/* the item number is beyond the maximum value */
+			if (item_idx == INVALID_ITEM_INDEX)
+				return -1;
+			tbl->items[prev_idx].next_pkt_idx = item_idx;
+			tbl->items[item_idx].pkt = pkt;
+			tbl->items[item_idx].is_groed = 0;
+			tbl->items[item_idx].next_pkt_idx = INVALID_ITEM_INDEX;
+			tbl->items[item_idx].is_valid = 1;
+			tbl->items[item_idx].start_time = rte_rdtsc();
+			tbl->item_num++;
+			return 0;
+		}
+	}
+
+	/**
+	 * the merge failed because the given packet belongs to a new
+	 * flow. Therefore, insert a new flow.
+	 */
+	item_idx = find_an_empty_item(tbl);
+	flow_idx = find_an_empty_flow(tbl);
+	/**
+	 * if the flow or item number is beyond the maximum value,
+	 * the input packet won't be processed.
+	 */
+	if (item_idx == INVALID_ITEM_INDEX ||
+			flow_idx == INVALID_FLOW_INDEX)
+		return -1;
+	tbl->items[item_idx].pkt = pkt;
+	tbl->items[item_idx].next_pkt_idx = INVALID_ITEM_INDEX;
+	tbl->items[item_idx].is_groed = 0;
+	tbl->items[item_idx].is_valid = 1;
+	tbl->items[item_idx].start_time = rte_rdtsc();
+	tbl->item_num++;
+
+	memcpy(&(tbl->flows[flow_idx].key),
+			&key, sizeof(struct gro_tcp_flow_key));
+	tbl->flows[flow_idx].start_index = item_idx;
+	tbl->flows[flow_idx].is_valid = 1;
+	tbl->flow_num++;
+
+	return 0;
+fail:
+	return -1;
+}
+
+uint16_t gro_tcp_tbl_flush(struct gro_tcp_tbl *tbl,
+		uint16_t flush_num,
+		struct rte_mbuf **out,
+		const uint16_t nb_out)
+{
+	uint16_t num, k;
+	uint16_t i;
+	uint32_t j;
+
+	k = 0;
+	num = tbl->item_num > flush_num ? flush_num : tbl->item_num;
+	num = num > nb_out ? nb_out : num;
+	if (unlikely(num == 0))
+		return 0;
+
+	for (i = 0; i < tbl->max_flow_num; i++) {
+		if (tbl->flows[i].is_valid) {
+			j = tbl->flows[i].start_index;
+			while (j != INVALID_ITEM_INDEX) {
+				/* update checksum for GROed packet */
+				if (tbl->items[j].is_groed)
+					gro_tcp4_cksum_update(tbl->items[j].pkt);
+
+				out[k++] = tbl->items[j].pkt;
+				tbl->items[j].is_valid = 0;
+				tbl->item_num--;
+				j = tbl->items[j].next_pkt_idx;
+
+				if (k == num) {
+					/* delete the flow */
+					if (j == INVALID_ITEM_INDEX) {
+						tbl->flows[i].is_valid = 0;
+						tbl->flow_num--;
+					} else
+						/* update flow information */
+						tbl->flows[i].start_index = j;
+					goto end;
+				}
+			}
+			/* delete the flow, as all of its packets are flushed */
+			tbl->flows[i].is_valid = 0;
+			tbl->flow_num--;
+		}
+		if (tbl->flow_num == 0)
+			goto end;
+	}
+end:
+	return num;
+}
+
+uint16_t
+gro_tcp_tbl_timeout_flush(struct gro_tcp_tbl *tbl,
+		uint64_t timeout_cycles,
+		struct rte_mbuf **out,
+		const uint16_t nb_out)
+{
+	uint16_t k;
+	uint16_t i;
+	uint32_t j;
+	uint64_t current_time;
+
+	if (nb_out == 0)
+		return 0;
+	k = 0;
+	current_time = rte_rdtsc();
+
+	for (i = 0; i < tbl->max_flow_num; i++) {
+		if (tbl->flows[i].is_valid) {
+			j = tbl->flows[i].start_index;
+			while (j != INVALID_ITEM_INDEX) {
+				/**
+				 * stop at the first packet that hasn't
+				 * timed out, as items are chained in
+				 * insertion order
+				 */
+				if (current_time - tbl->items[j].start_time <
+						timeout_cycles)
+					break;
+				/* update checksum for GROed packet */
+				if (tbl->items[j].is_groed)
+					gro_tcp4_cksum_update(tbl->items[j].pkt);
+
+				out[k++] = tbl->items[j].pkt;
+				tbl->items[j].is_valid = 0;
+				tbl->item_num--;
+				j = tbl->items[j].next_pkt_idx;
+
+				if (k == nb_out)
+					break;
+			}
+			if (j == INVALID_ITEM_INDEX) {
+				/**
+				 * delete the flow, as all of its packets
+				 * are flushed
+				 */
+				tbl->flows[i].is_valid = 0;
+				tbl->flow_num--;
+			} else
+				/* update flow information */
+				tbl->flows[i].start_index = j;
+			if (k == nb_out)
+				goto end;
+		}
+		if (tbl->flow_num == 0)
+			goto end;
+	}
+end:
+	return k;
+}
diff --git a/lib/librte_gro/rte_gro_tcp.h b/lib/librte_gro/rte_gro_tcp.h
new file mode 100644
index 0000000..551efc4
--- /dev/null
+++ b/lib/librte_gro/rte_gro_tcp.h
@@ -0,0 +1,210 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_GRO_TCP_H_
+#define _RTE_GRO_TCP_H_
+
+/**
+ * data_off and version_ihl are single-byte fields, so their layout
+ * doesn't depend on the CPU byte order.
+ */
+#define TCP_HDR_LEN(tcph) \
+	((tcph->data_off >> 4) * 4)
+#define IPv4_HDR_LEN(iph) \
+	((iph->version_ihl & 0x0f) * 4)
+
+#define IPV4_HDR_DF_SHIFT 14
+#define IPV4_HDR_DF_MASK (1 << IPV4_HDR_DF_SHIFT)
+
+#define INVALID_FLOW_INDEX 0xffffU
+#define INVALID_ITEM_INDEX 0xffffffffUL
+
+#define GRO_TCP_TBL_MAX_FLOW_NUM (UINT16_MAX - 1)
+#define GRO_TCP_TBL_MAX_ITEM_NUM (UINT32_MAX - 1)
+
+/* criteria for merging packets */
+struct gro_tcp_flow_key {
+	struct ether_addr eth_saddr;
+	struct ether_addr eth_daddr;
+	uint32_t ip_src_addr[4];	/**< IPv4 uses the first 4B */
+	uint32_t ip_dst_addr[4];
+
+	uint32_t recv_ack;	/**< acknowledgment sequence number. */
+	uint16_t src_port;
+	uint16_t dst_port;
+	uint8_t tcp_flags;	/**< TCP flags. */
+};
+
+struct gro_tcp_flow {
+	struct gro_tcp_flow_key key;
+	uint32_t start_index;	/**< the first packet index of the flow */
+	uint8_t is_valid;
+};
+
+struct gro_tcp_item {
+	struct rte_mbuf *pkt;	/**< packet address. */
+	/* the time when the packet is added into the table */
+	uint64_t start_time;
+	uint32_t next_pkt_idx;	/**< next packet index. */
+	/* flag to indicate if the packet is GROed */
+	uint8_t is_groed;
+	uint8_t is_valid;	/**< flag indicates if the item is valid */
+};
+
+/**
+ * TCP reassembly table. Both TCP/IPv4 and TCP/IPv6 use the same table
+ * structure.
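+ * Packets of one flow are chained via the next_pkt_idx field of
+ * struct gro_tcp_item, starting from the flow's start_index.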
+ */
+struct gro_tcp_tbl {
+	struct gro_tcp_item *items;	/**< item array */
+	struct gro_tcp_flow *flows;	/**< flow array */
+	uint32_t item_num;	/**< current item number */
+	uint16_t flow_num;	/**< current flow num */
+	uint32_t max_item_num;	/**< item array size */
+	uint16_t max_flow_num;	/**< flow array size */
+};
+
+/* rules to reassemble TCP packets, which are decided by applications */
+struct gro_tcp_rule {
+	/* the maximum length of a merged packet */
+	uint32_t max_packet_size;
+};
+
+/**
+ * This function updates TCP and IPv4 header checksums
+ * for merged packets in the TCP reassembly table.
+ */
+void gro_tcp4_tbl_cksum_update(struct gro_tcp_tbl *tbl);
+
+/**
+ * This function creates a TCP reassembly table.
+ *
+ * @param socket_id
+ *  socket index where the Ethernet port connects to.
+ * @param max_flow_num
+ *  the maximum number of flows in the TCP GRO table
+ * @param max_item_per_flow
+ *  the maximum packet number per flow.
+ * @return
+ *  if created successfully, return a pointer to the
+ *  created TCP GRO table. Otherwise, return NULL.
+ */
+void *gro_tcp_tbl_create(uint16_t socket_id,
+		uint16_t max_flow_num,
+		uint16_t max_item_per_flow);
+
+/**
+ * This function destroys a TCP reassembly table.
+ * @param tbl
+ *  a pointer that points to the TCP reassembly table.
+ */
+void gro_tcp_tbl_destroy(void *tbl);
+
+/**
+ * This function searches for a packet in the TCP reassembly table to
+ * merge with the inputted one. To merge two packets is to chain them
+ * together and update packet headers. Note that this function won't
+ * re-calculate IPv4 and TCP checksums.
+ *
+ * If the packet doesn't have data, or with wrong checksums, or is
+ * fragmented etc., errors happen and gro_tcp4_reassemble returns
+ * immediately. If no errors happen, the packet is either merged, or
+ * inserted into the reassembly table.
+ *
+ * If applications want to get packets in the reassembly table, they
+ * need to manually flush the packets.
+ *
+ * @param pkt
+ *  packet to reassemble.
+ * @param tbl
+ *  a pointer that points to a TCP reassembly table.
+ * @param rule
+ *  TCP reassembly criteria defined by applications.
+ * @return
+ *  if the input packet is merged successfully, return a positive
+ *  value. If the packet isn't merged with any packet in the TCP
+ *  reassembly table, return zero and insert it into the table. If
+ *  errors happen, return a negative value and the packet won't be
+ *  inserted into the reassembly table.
+ */
+int32_t
+gro_tcp4_reassemble(struct rte_mbuf *pkt,
+		struct gro_tcp_tbl *tbl,
+		struct gro_tcp_rule *rule);
+
+/**
+ * This function flushes the packets in a TCP reassembly table to
+ * applications. Before returning the packets, it will update TCP and
+ * IPv4 header checksums.
+ *
+ * @param tbl
+ *  a pointer that points to a TCP GRO table.
+ * @param flush_num
+ *  the number of packets that applications want to flush.
+ * @param out
+ *  pointer array which is used to keep flushed packets.
+ * @param nb_out
+ *  the maximum element number of out.
+ * @return
+ *  the number of packets that are actually flushed.
+ */
+uint16_t
+gro_tcp_tbl_flush(struct gro_tcp_tbl *tbl,
+		uint16_t flush_num,
+		struct rte_mbuf **out,
+		const uint16_t nb_out);
+
+/**
+ * This function flushes timeout packets in a TCP reassembly table to
+ * applications. Before returning the packets, it updates TCP and IPv4
+ * header checksums.
+ *
+ * @param tbl
+ *  a pointer that points to a TCP GRO table.
+ * @param timeout_cycles
+ *  the maximum time that packets can stay in the table.
+ * @param out
+ *  pointer array which is used to keep flushed packets.
+ * @param nb_out
+ *  the maximum element number of out.
+ * @return
+ *  It returns the number of packets that are actually flushed.
+ */
+uint16_t
+gro_tcp_tbl_timeout_flush(struct gro_tcp_tbl *tbl,
+		uint64_t timeout_cycles,
+		struct rte_mbuf **out,
+		const uint16_t nb_out);
+#endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 141+ messages in thread

* [PATCH v5 3/3] app/testpmd: enable TCP/IPv4 GRO
  2017-06-18  7:21       ` [PATCH v5 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
  2017-06-18  7:21         ` [PATCH v5 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
  2017-06-18  7:21         ` [PATCH v5 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
@ 2017-06-18  7:21         ` Jiayu Hu
  2017-06-19  1:24           ` Yao, Lei A
  2017-06-19  2:27           ` Wu, Jingjing
  2017-06-19  1:39         ` [PATCH v5 0/3] Support TCP/IPv4 GRO in DPDK Tan, Jianfeng
  2017-06-23 14:43         ` [PATCH v6 " Jiayu Hu
  4 siblings, 2 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-06-18  7:21 UTC (permalink / raw)
  To: dev
  Cc: konstantin.ananyev, yliu, keith.wiles, jianfeng.tan, tiwei.bie,
	lei.a.yao, Jiayu Hu

This patch demonstrates the usage of GRO library in testpmd. By default,
GRO is turned off. Command, "gro on (port_id)", turns on GRO for the
given port; command, "gro off (port_id)", turns off GRO for the given
port. Currently, GRO only supports processing TCP/IPv4 packets and works
only in IO forward mode. Besides, only the GRO lightweight mode is enabled.
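
For example, a typical interactive sequence to enable GRO for port 0 is
expected to look like the following (if forwarding is running, it must be
stopped first):

	testpmd> stop
	testpmd> gro on 0
	testpmd> start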

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
---
 app/test-pmd/cmdline.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 app/test-pmd/config.c  | 29 +++++++++++++++++++++++++++++
 app/test-pmd/iofwd.c   |  6 ++++++
 app/test-pmd/testpmd.c |  3 +++
 app/test-pmd/testpmd.h | 11 +++++++++++
 5 files changed, 94 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 105c71f..d1ca8df 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -76,6 +76,7 @@
 #include <rte_devargs.h>
 #include <rte_eth_ctrl.h>
 #include <rte_flow.h>
+#include <rte_gro.h>
 
 #include <cmdline_rdline.h>
 #include <cmdline_parse.h>
@@ -423,6 +424,9 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"tso show (portid)"
 			"    Display the status of TCP Segmentation Offload.\n\n"
 
+			"gro (on|off) (port_id)"
+			"    Enable or disable Generic Receive Offload.\n\n"
+
 			"set fwd (%s)\n"
 			"    Set packet forwarding mode.\n\n"
 
@@ -3831,6 +3835,46 @@ cmdline_parse_inst_t cmd_tunnel_tso_show = {
 	},
 };
 
+/* *** SET GRO FOR A PORT *** */
+struct cmd_gro_result {
+	cmdline_fixed_string_t cmd_keyword;
+	cmdline_fixed_string_t mode;
+	uint8_t port_id;
+};
+
+static void
+cmd_set_gro_parsed(void *parsed_result,
+		__attribute__((unused)) struct cmdline *cl,
+		__attribute__((unused)) void *data)
+{
+	struct cmd_gro_result *res;
+
+	res = parsed_result;
+	setup_gro(res->mode, res->port_id);
+}
+
+cmdline_parse_token_string_t cmd_gro_keyword =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_result,
+			cmd_keyword, "gro");
+cmdline_parse_token_string_t cmd_gro_mode =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_result,
+			mode, "on#off");
+cmdline_parse_token_num_t cmd_gro_pid =
+	TOKEN_NUM_INITIALIZER(struct cmd_gro_result,
+			port_id, UINT8);
+
+cmdline_parse_inst_t cmd_set_gro = {
+	.f = cmd_set_gro_parsed,
+	.data = NULL,
+	.help_str = "gro (on|off) (port_id)",
+	.tokens = {
+		(void *)&cmd_gro_keyword,
+		(void *)&cmd_gro_mode,
+		(void *)&cmd_gro_pid,
+		NULL,
+	},
+};
+
 /* *** ENABLE/DISABLE FLUSH ON RX STREAMS *** */
 struct cmd_set_flush_rx {
 	cmdline_fixed_string_t set;
@@ -13710,6 +13754,7 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_tso_show,
 	(cmdline_parse_inst_t *)&cmd_tunnel_tso_set,
 	(cmdline_parse_inst_t *)&cmd_tunnel_tso_show,
+	(cmdline_parse_inst_t *)&cmd_set_gro,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_rx,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_tx,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 3cd4f31..858342d 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -71,6 +71,7 @@
 #ifdef RTE_LIBRTE_BNXT_PMD
 #include <rte_pmd_bnxt.h>
 #endif
+#include <rte_gro.h>
 
 #include "testpmd.h"
 
@@ -2414,6 +2415,34 @@ set_tx_pkt_segments(unsigned *seg_lengths, unsigned nb_segs)
 	tx_pkt_nb_segs = (uint8_t) nb_segs;
 }
 
+void
+setup_gro(const char *mode, uint8_t port_id)
+{
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		printf("invalid port id %u\n", port_id);
+		return;
+	}
+	if (strcmp(mode, "on") == 0) {
+		if (test_done == 0) {
+			printf("before enable GRO,"
+					" please stop forwarding first\n");
+			return;
+		}
+		gro_ports[port_id].enable = 1;
+		gro_ports[port_id].param.max_flow_num = 4;
+		gro_ports[port_id].param.max_item_per_flow = 32;
+		gro_ports[port_id].param.desired_gro_types = GRO_TCP_IPV4;
+		gro_ports[port_id].param.max_packet_size = UINT16_MAX;
+	} else if (strcmp(mode, "off") == 0) {
+		if (test_done == 0) {
+			printf("before disable GRO,"
+					" please stop forwarding first\n");
+			return;
+		}
+		gro_ports[port_id].enable = 0;
+	}
+}
+
 char*
 list_pkt_forwarding_modes(void)
 {
diff --git a/app/test-pmd/iofwd.c b/app/test-pmd/iofwd.c
index 15cb4a2..d9ec528 100644
--- a/app/test-pmd/iofwd.c
+++ b/app/test-pmd/iofwd.c
@@ -65,6 +65,7 @@
 #include <rte_ethdev.h>
 #include <rte_string_fns.h>
 #include <rte_flow.h>
+#include <rte_gro.h>
 
 #include "testpmd.h"
 
@@ -99,6 +100,11 @@ pkt_burst_io_forward(struct fwd_stream *fs)
 			pkts_burst, nb_pkt_per_burst);
 	if (unlikely(nb_rx == 0))
 		return;
+	if (unlikely(gro_ports[fs->rx_port].enable)) {
+		nb_rx = rte_gro_reassemble_burst(pkts_burst,
+				nb_rx,
+				gro_ports[fs->rx_port].param);
+	}
 	fs->rx_packets += nb_rx;
 
 #ifdef RTE_TEST_PMD_RECORD_BURST_STATS
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index b29328a..ed27c7a 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -90,6 +90,7 @@
 #ifdef RTE_LIBRTE_LATENCY_STATS
 #include <rte_latencystats.h>
 #endif
+#include <rte_gro.h>
 
 #include "testpmd.h"
 
@@ -378,6 +379,8 @@ lcoreid_t bitrate_lcore_id;
 uint8_t bitrate_enabled;
 #endif
 
+struct gro_status gro_ports[RTE_MAX_ETHPORTS];
+
 /* Forward function declarations */
 static void map_port_queue_stats_mapping_registers(uint8_t pi, struct rte_port *port);
 static void check_all_ports_link_status(uint32_t port_mask);
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 364502d..0471e99 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -34,6 +34,8 @@
 #ifndef _TESTPMD_H_
 #define _TESTPMD_H_
 
+#include <rte_gro.h>
+
 #define RTE_PORT_ALL            (~(portid_t)0x0)
 
 #define RTE_TEST_RX_DESC_MAX    2048
@@ -109,6 +111,8 @@ struct fwd_stream {
 	queueid_t  tx_queue;  /**< TX queue to send forwarded packets */
 	streamid_t peer_addr; /**< index of peer ethernet address of packets */
 
+	uint16_t tbl_idx;	/**< TCP IPv4 GRO lookup table index */
+
 	unsigned int retry_enabled;
 
 	/* "read-write" results */
@@ -428,6 +432,12 @@ extern struct ether_addr peer_eth_addrs[RTE_MAX_ETHPORTS];
 extern uint32_t burst_tx_delay_time; /**< Burst tx delay time(us) for mac-retry. */
 extern uint32_t burst_tx_retry_num;  /**< Burst tx retry number for mac-retry. */
 
+struct gro_status {
+	struct rte_gro_param param;
+	uint8_t enable;
+};
+extern struct gro_status gro_ports[RTE_MAX_ETHPORTS];
+
 static inline unsigned int
 lcore_num(void)
 {
@@ -626,6 +636,7 @@ void get_2tuple_filter(uint8_t port_id, uint16_t index);
 void get_5tuple_filter(uint8_t port_id, uint16_t index);
 int rx_queue_id_is_invalid(queueid_t rxq_id);
 int tx_queue_id_is_invalid(queueid_t txq_id);
+void setup_gro(const char *mode, uint8_t port_id);
 
 /* Functions to manage the set of filtered Multicast MAC addresses */
 void mcast_addr_add(uint8_t port_id, struct ether_addr *mc_addr);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 141+ messages in thread

* Re: [PATCH v5 3/3] app/testpmd: enable TCP/IPv4 GRO
  2017-06-18  7:21         ` [PATCH v5 3/3] app/testpmd: enable TCP/IPv4 GRO Jiayu Hu
@ 2017-06-19  1:24           ` Yao, Lei A
  2017-06-19  2:27           ` Wu, Jingjing
  1 sibling, 0 replies; 141+ messages in thread
From: Yao, Lei A @ 2017-06-19  1:24 UTC (permalink / raw)
  To: Hu, Jiayu, dev
  Cc: Ananyev, Konstantin, yliu, Wiles, Keith, Tan, Jianfeng, Bie, Tiwei



> -----Original Message-----
> From: Hu, Jiayu
> Sent: Sunday, June 18, 2017 3:21 PM
> To: dev@dpdk.org
> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> yliu@fridaylinux.org; Wiles, Keith <keith.wiles@intel.com>; Tan, Jianfeng
> <jianfeng.tan@intel.com>; Bie, Tiwei <tiwei.bie@intel.com>; Yao, Lei A
> <lei.a.yao@intel.com>; Hu, Jiayu <jiayu.hu@intel.com>
> Subject: [PATCH v5 3/3] app/testpmd: enable TCP/IPv4 GRO
> 
> This patch demonstrates the usage of GRO library in testpmd. By default,
> GRO is turned off. Command, "gro on (port_id)", turns on GRO for the
> given port; command, "gro off (port_id)", turns off GRO for the given
> port. Currently, GRO only supports to process TCP/IPv4 packets and works
> in IO forward mode. Besides, only GRO lightweight mode is enabled.
> 
> Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>

Tested-by: Lei Yao<lei.a.yao@intel.com>
This patch set has been tested on my bench using iperf. In
some scenarios, the performance gain can reach about 50%.
The performance gain will depend on the rx burst size in real
usage.
OS: Ubuntu 16.04
CPU: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz 


> ---
>  app/test-pmd/cmdline.c | 45
> +++++++++++++++++++++++++++++++++++++++++++++
>  app/test-pmd/config.c  | 29 +++++++++++++++++++++++++++++
>  app/test-pmd/iofwd.c   |  6 ++++++
>  app/test-pmd/testpmd.c |  3 +++
>  app/test-pmd/testpmd.h | 11 +++++++++++
>  5 files changed, 94 insertions(+)
> 
> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
> index 105c71f..d1ca8df 100644
> --- a/app/test-pmd/cmdline.c
> +++ b/app/test-pmd/cmdline.c
> @@ -76,6 +76,7 @@
>  #include <rte_devargs.h>
>  #include <rte_eth_ctrl.h>
>  #include <rte_flow.h>
> +#include <rte_gro.h>
> 
>  #include <cmdline_rdline.h>
>  #include <cmdline_parse.h>
> @@ -423,6 +424,9 @@ static void cmd_help_long_parsed(void
> *parsed_result,
>  			"tso show (portid)"
>  			"    Display the status of TCP Segmentation
> Offload.\n\n"
> 
> +			"gro (on|off) (port_id)"
> +			"    Enable or disable Generic Receive Offload.\n\n"
> +
>  			"set fwd (%s)\n"
>  			"    Set packet forwarding mode.\n\n"
> 
> @@ -3831,6 +3835,46 @@ cmdline_parse_inst_t cmd_tunnel_tso_show = {
>  	},
>  };
> 
> +/* *** SET GRO FOR A PORT *** */
> +struct cmd_gro_result {
> +	cmdline_fixed_string_t cmd_keyword;
> +	cmdline_fixed_string_t mode;
> +	uint8_t port_id;
> +};
> +
> +static void
> +cmd_set_gro_parsed(void *parsed_result,
> +		__attribute__((unused)) struct cmdline *cl,
> +		__attribute__((unused)) void *data)
> +{
> +	struct cmd_gro_result *res;
> +
> +	res = parsed_result;
> +	setup_gro(res->mode, res->port_id);
> +}
> +
> +cmdline_parse_token_string_t cmd_gro_keyword =
> +	TOKEN_STRING_INITIALIZER(struct cmd_gro_result,
> +			cmd_keyword, "gro");
> +cmdline_parse_token_string_t cmd_gro_mode =
> +	TOKEN_STRING_INITIALIZER(struct cmd_gro_result,
> +			mode, "on#off");
> +cmdline_parse_token_num_t cmd_gro_pid =
> +	TOKEN_NUM_INITIALIZER(struct cmd_gro_result,
> +			port_id, UINT8);
> +
> +cmdline_parse_inst_t cmd_set_gro = {
> +	.f = cmd_set_gro_parsed,
> +	.data = NULL,
> +	.help_str = "gro (on|off) (port_id)",
> +	.tokens = {
> +		(void *)&cmd_gro_keyword,
> +		(void *)&cmd_gro_mode,
> +		(void *)&cmd_gro_pid,
> +		NULL,
> +	},
> +};
> +
>  /* *** ENABLE/DISABLE FLUSH ON RX STREAMS *** */
>  struct cmd_set_flush_rx {
>  	cmdline_fixed_string_t set;
> @@ -13710,6 +13754,7 @@ cmdline_parse_ctx_t main_ctx[] = {
>  	(cmdline_parse_inst_t *)&cmd_tso_show,
>  	(cmdline_parse_inst_t *)&cmd_tunnel_tso_set,
>  	(cmdline_parse_inst_t *)&cmd_tunnel_tso_show,
> +	(cmdline_parse_inst_t *)&cmd_set_gro,
>  	(cmdline_parse_inst_t *)&cmd_link_flow_control_set,
>  	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_rx,
>  	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_tx,
> diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
> index 3cd4f31..858342d 100644
> --- a/app/test-pmd/config.c
> +++ b/app/test-pmd/config.c
> @@ -71,6 +71,7 @@
>  #ifdef RTE_LIBRTE_BNXT_PMD
>  #include <rte_pmd_bnxt.h>
>  #endif
> +#include <rte_gro.h>
> 
>  #include "testpmd.h"
> 
> @@ -2414,6 +2415,34 @@ set_tx_pkt_segments(unsigned *seg_lengths,
> unsigned nb_segs)
>  	tx_pkt_nb_segs = (uint8_t) nb_segs;
>  }
> 
> +void
> +setup_gro(const char *mode, uint8_t port_id)
> +{
> +	if (!rte_eth_dev_is_valid_port(port_id)) {
> +		printf("invalid port id %u\n", port_id);
> +		return;
> +	}
> +	if (strcmp(mode, "on") == 0) {
> +		if (test_done == 0) {
> +			printf("before enable GRO,"
> +					" please stop forwarding first\n");
> +			return;
> +		}
> +		gro_ports[port_id].enable = 1;
> +		gro_ports[port_id].param.max_flow_num = 4;
> +		gro_ports[port_id].param.max_item_per_flow = 32;
> +		gro_ports[port_id].param.desired_gro_types =
> GRO_TCP_IPV4;
> +		gro_ports[port_id].param.max_packet_size = UINT16_MAX;
> +	} else if (strcmp(mode, "off") == 0) {
> +		if (test_done == 0) {
> +			printf("before disable GRO,"
> +					" please stop forwarding first\n");
> +			return;
> +		}
> +		gro_ports[port_id].enable = 0;
> +	}
> +}
> +
>  char*
>  list_pkt_forwarding_modes(void)
>  {
> diff --git a/app/test-pmd/iofwd.c b/app/test-pmd/iofwd.c
> index 15cb4a2..d9ec528 100644
> --- a/app/test-pmd/iofwd.c
> +++ b/app/test-pmd/iofwd.c
> @@ -65,6 +65,7 @@
>  #include <rte_ethdev.h>
>  #include <rte_string_fns.h>
>  #include <rte_flow.h>
> +#include <rte_gro.h>
> 
>  #include "testpmd.h"
> 
> @@ -99,6 +100,11 @@ pkt_burst_io_forward(struct fwd_stream *fs)
>  			pkts_burst, nb_pkt_per_burst);
>  	if (unlikely(nb_rx == 0))
>  		return;
> +	if (unlikely(gro_ports[fs->rx_port].enable)) {
> +		nb_rx = rte_gro_reassemble_burst(pkts_burst,
> +				nb_rx,
> +				gro_ports[fs->rx_port].param);
> +	}
>  	fs->rx_packets += nb_rx;
> 
>  #ifdef RTE_TEST_PMD_RECORD_BURST_STATS
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> index b29328a..ed27c7a 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -90,6 +90,7 @@
>  #ifdef RTE_LIBRTE_LATENCY_STATS
>  #include <rte_latencystats.h>
>  #endif
> +#include <rte_gro.h>
> 
>  #include "testpmd.h"
> 
> @@ -378,6 +379,8 @@ lcoreid_t bitrate_lcore_id;
>  uint8_t bitrate_enabled;
>  #endif
> 
> +struct gro_status gro_ports[RTE_MAX_ETHPORTS];
> +
>  /* Forward function declarations */
>  static void map_port_queue_stats_mapping_registers(uint8_t pi, struct
> rte_port *port);
>  static void check_all_ports_link_status(uint32_t port_mask);
> diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
> index 364502d..0471e99 100644
> --- a/app/test-pmd/testpmd.h
> +++ b/app/test-pmd/testpmd.h
> @@ -34,6 +34,8 @@
>  #ifndef _TESTPMD_H_
>  #define _TESTPMD_H_
> 
> +#include <rte_gro.h>
> +
>  #define RTE_PORT_ALL            (~(portid_t)0x0)
> 
>  #define RTE_TEST_RX_DESC_MAX    2048
> @@ -109,6 +111,8 @@ struct fwd_stream {
>  	queueid_t  tx_queue;  /**< TX queue to send forwarded packets */
>  	streamid_t peer_addr; /**< index of peer ethernet address of
> packets */
> 
> +	uint16_t tbl_idx;	/**< TCP IPv4 GRO lookup tale index */
> +
>  	unsigned int retry_enabled;
> 
>  	/* "read-write" results */
> @@ -428,6 +432,12 @@ extern struct ether_addr
> peer_eth_addrs[RTE_MAX_ETHPORTS];
>  extern uint32_t burst_tx_delay_time; /**< Burst tx delay time(us) for mac-
> retry. */
>  extern uint32_t burst_tx_retry_num;  /**< Burst tx retry number for mac-
> retry. */
> 
> +struct gro_status {
> +	struct rte_gro_param param;
> +	uint8_t enable;
> +};
> +extern struct gro_status gro_ports[RTE_MAX_ETHPORTS];
> +
>  static inline unsigned int
>  lcore_num(void)
>  {
> @@ -626,6 +636,7 @@ void get_2tuple_filter(uint8_t port_id, uint16_t
> index);
>  void get_5tuple_filter(uint8_t port_id, uint16_t index);
>  int rx_queue_id_is_invalid(queueid_t rxq_id);
>  int tx_queue_id_is_invalid(queueid_t txq_id);
> +void setup_gro(const char *mode, uint8_t port_id);
> 
>  /* Functions to manage the set of filtered Multicast MAC addresses */
>  void mcast_addr_add(uint8_t port_id, struct ether_addr *mc_addr);
> --
> 2.7.4

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v5 0/3] Support TCP/IPv4 GRO in DPDK
  2017-06-18  7:21       ` [PATCH v5 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
                           ` (2 preceding siblings ...)
  2017-06-18  7:21         ` [PATCH v5 3/3] app/testpmd: enable TCP/IPv4 GRO Jiayu Hu
@ 2017-06-19  1:39         ` Tan, Jianfeng
  2017-06-19  3:07           ` Jiayu Hu
  2017-06-23 14:43         ` [PATCH v6 " Jiayu Hu
  4 siblings, 1 reply; 141+ messages in thread
From: Tan, Jianfeng @ 2017-06-19  1:39 UTC (permalink / raw)
  To: Jiayu Hu, dev; +Cc: konstantin.ananyev, yliu, keith.wiles, tiwei.bie, lei.a.yao

Hi Jiayu,

You need to update the documentation:
- Release note file: release_17_08.rst.
- A howto doc is welcome.


On 6/18/2017 3:21 PM, Jiayu Hu wrote:
> Generic Receive Offload (GRO) is a widely used SW-based offloading
> technique to reduce per-packet processing overhead. It gains performance
> by reassembling small packets into large ones. Therefore, we propose to
> support GRO in DPDK.
>
> To enable more flexibility to applications, DPDK GRO is implemented as
> a user library. Applications explicitly use the GRO library to merge
> small packets into large ones. DPDK GRO provides two reassembly modes.
> One is called lightweight mode, the other is called heavyweight mode.
> If applications want to merge packets in a simple way, they can use
> lightweight mode. If applications need more fine-grained controls,
> they can choose heavyweight mode.

So what's the real difference between the two modes? Maybe an example
is a good way to clarify.

>
> This patchset is to support TCP/IPv4 GRO in DPDK. The first patch is to
> provide a GRO API framework. The second patch is to support TCP/IPv4 GRO.
> The last patch demonstrates how to use GRO library in app/testpmd.

In which mode?

>
> We perform two iperf tests (with DPDK GRO and without DPDK GRO) to see
> the performance gains from DPDK GRO. Specifically, the experiment
> environment is:
> a. Two 10Gbps physical ports (p0 and p1) on one host are linked together;
> b. p0 is in networking namespace ns1, whose IP is 1.1.2.3. Iperf client
> runs on p0, which sends TCP/IPv4 packets. The OS in VM is ubuntu 14.04;
> c. testpmd runs on p1. Besides, testpmd has a vdev which connects to a
> VM via vhost-user and virtio-net. The VM runs iperf server, whose IP
> is 1.1.2.4;
> d. p0 turns on TSO; VM turns off kernel GRO; testpmd runs in iofwd mode.
> iperf client and server use the following commands:
> 	- client: ip netns exec ns1 iperf -c 1.1.2.4 -i2 -t 60 -f g -m
> 	- server: iperf -s -f g
> Two test cases are:
> a. w/o DPDK GRO: run testpmd without GRO
> b. w DPDK GRO: testpmd enables GRO for p1
> Result:
> With GRO, the throughput improvement is around 40%.

Have you tried running several pairs of iperf-s and iperf-c tests (on 40Gb
NICs)? That can prove not only the performance, but also the functional
correctness.

Thanks,
Jianfeng

>
> Change log
> ==========
> v5:
> - fix some bugs
> - fix coding style issues
> v4:
> - implement DPDK GRO as an application-used library
> - introduce lightweight and heavyweight working modes to enable
> 	fine-grained controls to applications
> - replace cuckoo hash tables with simpler table structure
> v3:
> - fix compilation issues.
> v2:
> - provide generic reassembly function;
> - implement GRO as a device ability:
> add APIs for devices to support GRO;
> add APIs for applications to enable/disable GRO;
> - update testpmd example.
>
> Jiayu Hu (3):
>    lib: add Generic Receive Offload API framework
>    lib/gro: add TCP/IPv4 GRO support
>    app/testpmd: enable TCP/IPv4 GRO
>
>   app/test-pmd/cmdline.c       |  45 ++++
>   app/test-pmd/config.c        |  29 +++
>   app/test-pmd/iofwd.c         |   6 +
>   app/test-pmd/testpmd.c       |   3 +
>   app/test-pmd/testpmd.h       |  11 +
>   config/common_base           |   5 +
>   lib/Makefile                 |   1 +
>   lib/librte_gro/Makefile      |  51 +++++
>   lib/librte_gro/rte_gro.c     | 248 ++++++++++++++++++++
>   lib/librte_gro/rte_gro.h     | 217 ++++++++++++++++++
>   lib/librte_gro/rte_gro_tcp.c | 527 +++++++++++++++++++++++++++++++++++++++++++
>   lib/librte_gro/rte_gro_tcp.h | 210 +++++++++++++++++
>   mk/rte.app.mk                |   1 +
>   13 files changed, 1354 insertions(+)
>   create mode 100644 lib/librte_gro/Makefile
>   create mode 100644 lib/librte_gro/rte_gro.c
>   create mode 100644 lib/librte_gro/rte_gro.h
>   create mode 100644 lib/librte_gro/rte_gro_tcp.c
>   create mode 100644 lib/librte_gro/rte_gro_tcp.h
>

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v5 3/3] app/testpmd: enable TCP/IPv4 GRO
  2017-06-18  7:21         ` [PATCH v5 3/3] app/testpmd: enable TCP/IPv4 GRO Jiayu Hu
  2017-06-19  1:24           ` Yao, Lei A
@ 2017-06-19  2:27           ` Wu, Jingjing
  2017-06-19  3:22             ` Jiayu Hu
  1 sibling, 1 reply; 141+ messages in thread
From: Wu, Jingjing @ 2017-06-19  2:27 UTC (permalink / raw)
  To: Hu, Jiayu, dev
  Cc: Ananyev, Konstantin, yliu, Wiles, Keith, Tan, Jianfeng, Bie,
	Tiwei, Yao, Lei A, Hu, Jiayu



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jiayu Hu
> Sent: Sunday, June 18, 2017 3:21 PM
> To: dev@dpdk.org
> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; yliu@fridaylinux.org;
> Wiles, Keith <keith.wiles@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>;
> Bie, Tiwei <tiwei.bie@intel.com>; Yao, Lei A <lei.a.yao@intel.com>; Hu, Jiayu
> <jiayu.hu@intel.com>
> Subject: [dpdk-dev] [PATCH v5 3/3] app/testpmd: enable TCP/IPv4 GRO
> 
> This patch demonstrates the usage of GRO library in testpmd. By default, GRO
> is turned off. Command, "gro on (port_id)", turns on GRO for the given port;
> command, "gro off (port_id)", turns off GRO for the given port. Currently, GRO
> only supports to process TCP/IPv4 packets and works in IO forward mode.
> Besides, only GRO lightweight mode is enabled.
> 
> Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
> ---
>  app/test-pmd/cmdline.c | 45
> +++++++++++++++++++++++++++++++++++++++++++++
>  app/test-pmd/config.c  | 29 +++++++++++++++++++++++++++++
>  app/test-pmd/iofwd.c   |  6 ++++++
>  app/test-pmd/testpmd.c |  3 +++
>  app/test-pmd/testpmd.h | 11 +++++++++++
>  5 files changed, 94 insertions(+)
> 
> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index
> 105c71f..d1ca8df 100644
> --- a/app/test-pmd/cmdline.c
> +++ b/app/test-pmd/cmdline.c
> @@ -76,6 +76,7 @@
>  #include <rte_devargs.h>
>  #include <rte_eth_ctrl.h>
>  #include <rte_flow.h>
> +#include <rte_gro.h>
> 
>  #include <cmdline_rdline.h>
>  #include <cmdline_parse.h>
> @@ -423,6 +424,9 @@ static void cmd_help_long_parsed(void *parsed_result,
>  			"tso show (portid)"
>  			"    Display the status of TCP Segmentation
> Offload.\n\n"
> 
> +			"gro (on|off) (port_id)"
> +			"    Enable or disable Generic Receive Offload.\n\n"
> +
>  			"set fwd (%s)\n"
>  			"    Set packet forwarding mode.\n\n"
> 
> @@ -3831,6 +3835,46 @@ cmdline_parse_inst_t cmd_tunnel_tso_show = {
>  	},
>  };
> 
> +/* *** SET GRO FOR A PORT *** */
> +struct cmd_gro_result {
> +	cmdline_fixed_string_t cmd_keyword;
> +	cmdline_fixed_string_t mode;
> +	uint8_t port_id;
> +};
> +
> +static void
> +cmd_set_gro_parsed(void *parsed_result,
> +		__attribute__((unused)) struct cmdline *cl,
> +		__attribute__((unused)) void *data)
> +{
> +	struct cmd_gro_result *res;
> +
> +	res = parsed_result;
> +	setup_gro(res->mode, res->port_id);
> +}
> +
> +cmdline_parse_token_string_t cmd_gro_keyword =
> +	TOKEN_STRING_INITIALIZER(struct cmd_gro_result,
> +			cmd_keyword, "gro");
> +cmdline_parse_token_string_t cmd_gro_mode =
> +	TOKEN_STRING_INITIALIZER(struct cmd_gro_result,
> +			mode, "on#off");
> +cmdline_parse_token_num_t cmd_gro_pid =
> +	TOKEN_NUM_INITIALIZER(struct cmd_gro_result,
> +			port_id, UINT8);
> +
> +cmdline_parse_inst_t cmd_set_gro = {
> +	.f = cmd_set_gro_parsed,
> +	.data = NULL,
> +	.help_str = "gro (on|off) (port_id)",
> +	.tokens = {
> +		(void *)&cmd_gro_keyword,
> +		(void *)&cmd_gro_mode,
> +		(void *)&cmd_gro_pid,
> +		NULL,
> +	},
> +};
> +
>  /* *** ENABLE/DISABLE FLUSH ON RX STREAMS *** */  struct
> cmd_set_flush_rx {
>  	cmdline_fixed_string_t set;
> @@ -13710,6 +13754,7 @@ cmdline_parse_ctx_t main_ctx[] = {
>  	(cmdline_parse_inst_t *)&cmd_tso_show,
>  	(cmdline_parse_inst_t *)&cmd_tunnel_tso_set,
>  	(cmdline_parse_inst_t *)&cmd_tunnel_tso_show,
> +	(cmdline_parse_inst_t *)&cmd_set_gro,
>  	(cmdline_parse_inst_t *)&cmd_link_flow_control_set,
>  	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_rx,
>  	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_tx, diff --git
> a/app/test-pmd/config.c b/app/test-pmd/config.c index 3cd4f31..858342d
> 100644
> --- a/app/test-pmd/config.c
> +++ b/app/test-pmd/config.c
> @@ -71,6 +71,7 @@
>  #ifdef RTE_LIBRTE_BNXT_PMD
>  #include <rte_pmd_bnxt.h>
>  #endif
> +#include <rte_gro.h>
> 
>  #include "testpmd.h"
> 
> @@ -2414,6 +2415,34 @@ set_tx_pkt_segments(unsigned *seg_lengths,
> unsigned nb_segs)
>  	tx_pkt_nb_segs = (uint8_t) nb_segs;
>  }
> 
> +void
> +setup_gro(const char *mode, uint8_t port_id) {
> +	if (!rte_eth_dev_is_valid_port(port_id)) {
> +		printf("invalid port id %u\n", port_id);
> +		return;
> +	}
> +	if (strcmp(mode, "on") == 0) {
> +		if (test_done == 0) {
> +			printf("before enable GRO,"
> +					" please stop forwarding first\n");
> +			return;
> +		}
> +		gro_ports[port_id].enable = 1;
> +		gro_ports[port_id].param.max_flow_num = 4;
> +		gro_ports[port_id].param.max_item_per_flow = 32;
Are 4 and 32 the default values for GRO? If so, how about defining them as macros in rte_gro.h?

> +		gro_ports[port_id].param.desired_gro_types = GRO_TCP_IPV4;
> +		gro_ports[port_id].param.max_packet_size = UINT16_MAX;
> +	} else if (strcmp(mode, "off") == 0) {

"else" is enough, no need to compare it with "off".

> +		if (test_done == 0) {
> +			printf("before disable GRO,"
> +					" please stop forwarding first\n");
> +			return;
> +		}

How about moving the test_done check before the mode checking?

> +		gro_ports[port_id].enable = 0;
> +	}
> +}
> +
>  char*
>  list_pkt_forwarding_modes(void)
>  {
> diff --git a/app/test-pmd/iofwd.c b/app/test-pmd/iofwd.c index
> 15cb4a2..d9ec528 100644
> --- a/app/test-pmd/iofwd.c
> +++ b/app/test-pmd/iofwd.c
> @@ -65,6 +65,7 @@
>  #include <rte_ethdev.h>
>  #include <rte_string_fns.h>
>  #include <rte_flow.h>
> +#include <rte_gro.h>
> 
>  #include "testpmd.h"
> 
> @@ -99,6 +100,11 @@ pkt_burst_io_forward(struct fwd_stream *fs)
>  			pkts_burst, nb_pkt_per_burst);
>  	if (unlikely(nb_rx == 0))
>  		return;
> +	if (unlikely(gro_ports[fs->rx_port].enable)) {
> +		nb_rx = rte_gro_reassemble_burst(pkts_burst,
> +				nb_rx,
> +				gro_ports[fs->rx_port].param);
> +	}
>  	fs->rx_packets += nb_rx;
> 
>  #ifdef RTE_TEST_PMD_RECORD_BURST_STATS
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
> b29328a..ed27c7a 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -90,6 +90,7 @@
>  #ifdef RTE_LIBRTE_LATENCY_STATS
>  #include <rte_latencystats.h>
>  #endif
> +#include <rte_gro.h>
> 
>  #include "testpmd.h"
> 
> @@ -378,6 +379,8 @@ lcoreid_t bitrate_lcore_id;  uint8_t bitrate_enabled;
> #endif
> 
> +struct gro_status gro_ports[RTE_MAX_ETHPORTS];
> +
>  /* Forward function declarations */
>  static void map_port_queue_stats_mapping_registers(uint8_t pi, struct
> rte_port *port);  static void check_all_ports_link_status(uint32_t port_mask);
> diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h index
> 364502d..0471e99 100644
> --- a/app/test-pmd/testpmd.h
> +++ b/app/test-pmd/testpmd.h
> @@ -34,6 +34,8 @@
>  #ifndef _TESTPMD_H_
>  #define _TESTPMD_H_
> 
> +#include <rte_gro.h>
> +
>  #define RTE_PORT_ALL            (~(portid_t)0x0)
> 
>  #define RTE_TEST_RX_DESC_MAX    2048
> @@ -109,6 +111,8 @@ struct fwd_stream {
>  	queueid_t  tx_queue;  /**< TX queue to send forwarded packets */
>  	streamid_t peer_addr; /**< index of peer ethernet address of packets
> */
> 
> +	uint16_t tbl_idx;	/**< TCP IPv4 GRO lookup tale index */
> +
>  	unsigned int retry_enabled;
> 
>  	/* "read-write" results */
> @@ -428,6 +432,12 @@ extern struct ether_addr
> peer_eth_addrs[RTE_MAX_ETHPORTS];  extern uint32_t burst_tx_delay_time;
> /**< Burst tx delay time(us) for mac-retry. */  extern uint32_t
> burst_tx_retry_num;  /**< Burst tx retry number for mac-retry. */
> 
> +struct gro_status {
> +	struct rte_gro_param param;
> +	uint8_t enable;
> +};
> +extern struct gro_status gro_ports[RTE_MAX_ETHPORTS];
> +
>  static inline unsigned int
>  lcore_num(void)
>  {
> @@ -626,6 +636,7 @@ void get_2tuple_filter(uint8_t port_id, uint16_t index);
> void get_5tuple_filter(uint8_t port_id, uint16_t index);  int
> rx_queue_id_is_invalid(queueid_t rxq_id);  int
> tx_queue_id_is_invalid(queueid_t txq_id);
> +void setup_gro(const char *mode, uint8_t port_id);
> 
>  /* Functions to manage the set of filtered Multicast MAC addresses */  void
> mcast_addr_add(uint8_t port_id, struct ether_addr *mc_addr);
> --
> 2.7.4

Looks fine to me, just don't forget the doc update in "doc/guides/testpmd_app_ug/" due to the new command line.

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v5 0/3] Support TCP/IPv4 GRO in DPDK
  2017-06-19  1:39         ` [PATCH v5 0/3] Support TCP/IPv4 GRO in DPDK Tan, Jianfeng
@ 2017-06-19  3:07           ` Jiayu Hu
  2017-06-19  5:12             ` Jiayu Hu
  0 siblings, 1 reply; 141+ messages in thread
From: Jiayu Hu @ 2017-06-19  3:07 UTC (permalink / raw)
  To: Tan, Jianfeng
  Cc: dev, konstantin.ananyev, yliu, keith.wiles, tiwei.bie, lei.a.yao

On Mon, Jun 19, 2017 at 09:39:11AM +0800, Tan, Jianfeng wrote:
> Hi Jiayu,
> 
> You need to update the document:
> - Release note file: release_17_08.rst.
> - A howto doc is welcomed.

Thanks. I will update them in the next patch.

> 
> 
> On 6/18/2017 3:21 PM, Jiayu Hu wrote:
> > Generic Receive Offload (GRO) is a widely used SW-based offloading
> > technique to reduce per-packet processing overhead. It gains performance
> > by reassembling small packets into large ones. Therefore, we propose to
> > support GRO in DPDK.
> > 
> > To enable more flexibility to applications, DPDK GRO is implemented as
> > a user library. Applications explicitly use the GRO library to merge
> > small packets into large ones. DPDK GRO provides two reassembly modes.
> > One is called lightweigth mode, the other is called heavyweight mode.
> > If applications want merge packets in a simple way, they can use
> > lightweigth mode. If applications need more fine-grained controls,
> > they can choose heavyweigth mode.
> 
> So what's the real difference between the two modes? Might be an example is
> good way to clarify.

The lightweight mode merges packets in a burst manner. Applications just need
to give N packets to the lightweight mode API, rte_gro_reassemble_burst.
After rte_gro_reassemble_burst returns, the packets are merged together. For
applications, using the lightweight mode is very simple, and they don't
need to allocate any GRO tables beforehand. The heavyweight mode enables more
flexibility for applications. Applications need to create a GRO table before
invoking the heavyweight mode API, rte_gro_reassemble, to merge packets.
Besides, rte_gro_reassemble just processes one packet at a time. Whether
the packet is merged successfully or not, it's stored in the GRO table.
When applications want these processed packets, they need to manually flush
them from the GRO table. You can see more details in the next patch
'add Generic Receive Offload API framework'.
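
A rough sketch of the two modes (based on the v5 API in this series;
MAX_BURST, MAX_OUT and the surrounding variables are illustrative, and
the GRO table is assumed to be created by the table-creation counterpart
of rte_gro_tbl_destroy in patch 1/3):

	/* lightweight mode: no table management by the application */
	struct rte_gro_param param = {
		.desired_gro_types = GRO_TCP_IPV4,
		.max_flow_num = 4,
		.max_item_per_flow = 32,
		.max_packet_size = UINT16_MAX,
	};
	nb_rx = rte_eth_rx_burst(port, queue, pkts, MAX_BURST);
	/* pkts[] is updated in place; the return value shrinks when
	 * packets are merged */
	nb_rx = rte_gro_reassemble_burst(pkts, nb_rx, param);

	/* heavyweight mode: the application owns a GRO table */
	struct rte_gro_tbl *gro_tbl;	/* created beforehand */
	for (i = 0; i < nb_rx; i++)
		rte_gro_reassemble(pkts[i], gro_tbl);
	/* merged and unmerged packets stay in the table until they
	 * are explicitly flushed */
	nb_out = rte_gro_timeout_flush(gro_tbl, GRO_TCP_IPV4,
			out, MAX_OUT);
	rte_gro_tbl_destroy(gro_tbl);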

> 
> > 
> > This patchset is to support TCP/IPv4 GRO in DPDK. The first patch is to
> > provide a GRO API framework. The second patch is to support TCP/IPv4 GRO.
> > The last patch demonstrates how to use GRO library in app/testpmd.
> 
> In which mode?

Testpmd just demonstrates the usage of the lightweight mode.

> 
> > 
> > We perform two iperf tests (with DPDK GRO and without DPDK GRO) to see
> > the performance gains from DPDK GRO. Specifically, the experiment
> > environment is:
> > a. Two 10Gbps physical ports (p0 and p1) on one host are linked together;
> > b. p0 is in networking namespace ns1, whose IP is 1.1.2.3. Iperf client
> > runs on p0, which sends TCP/IPv4 packets. The OS in VM is ubuntu 14.04;
> > c. testpmd runs on p1. Besides, testpmd has a vdev which connects to a
> > VM via vhost-user and virtio-net. The VM runs iperf server, whose IP
> > is 1.1.2.4;
> > d. p0 turns on TSO; VM turns off kernel GRO; testpmd runs in iofwd mode.
> > iperf client and server use the following commands:
> > 	- client: ip netns exec ns1 iperf -c 1.1.2.4 -i2 -t 60 -f g -m
> > 	- server: iperf -s -f g
> > Two test cases are:
> > a. w/o DPDK GRO: run testpmd without GRO
> > b. w DPDK GRO: testpmd enables GRO for p1
> > Result:
> > With GRO, the throughput improvement is around 40%.
> 
> Do you try running several pairs of iperf-s and iperf-c tests (on 40Gb
> NICs)? It can not only prove the performance, but also the functionality
> correctness.

Besides the one-pair scenario, I have just tried two pairs of iperf-s and iperf-c.
Thanks for your advice, and I will do more tests in the next patch.


Thanks,
Jiayu

> 
> Thanks,
> Jianfeng
> 
> > 
> > Change log
> > ==========
> > v5:
> > - fix some bugs
> > - fix coding style issues
> > v4:
> > - implement DPDK GRO as an application-used library
> > - introduce lightweight and heavyweight working modes to enable
> > 	fine-grained controls to applications
> > - replace cuckoo hash tables with simpler table structure
> > v3:
> > - fix compilation issues.
> > v2:
> > - provide generic reassembly function;
> > - implement GRO as a device ability:
> > add APIs for devices to support GRO;
> > add APIs for applications to enable/disable GRO;
> > - update testpmd example.
> > 
> > Jiayu Hu (3):
> >    lib: add Generic Receive Offload API framework
> >    lib/gro: add TCP/IPv4 GRO support
> >    app/testpmd: enable TCP/IPv4 GRO
> > 
> >   app/test-pmd/cmdline.c       |  45 ++++
> >   app/test-pmd/config.c        |  29 +++
> >   app/test-pmd/iofwd.c         |   6 +
> >   app/test-pmd/testpmd.c       |   3 +
> >   app/test-pmd/testpmd.h       |  11 +
> >   config/common_base           |   5 +
> >   lib/Makefile                 |   1 +
> >   lib/librte_gro/Makefile      |  51 +++++
> >   lib/librte_gro/rte_gro.c     | 248 ++++++++++++++++++++
> >   lib/librte_gro/rte_gro.h     | 217 ++++++++++++++++++
> >   lib/librte_gro/rte_gro_tcp.c | 527 +++++++++++++++++++++++++++++++++++++++++++
> >   lib/librte_gro/rte_gro_tcp.h | 210 +++++++++++++++++
> >   mk/rte.app.mk                |   1 +
> >   13 files changed, 1354 insertions(+)
> >   create mode 100644 lib/librte_gro/Makefile
> >   create mode 100644 lib/librte_gro/rte_gro.c
> >   create mode 100644 lib/librte_gro/rte_gro.h
> >   create mode 100644 lib/librte_gro/rte_gro_tcp.c
> >   create mode 100644 lib/librte_gro/rte_gro_tcp.h
> > 

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v5 3/3] app/testpmd: enable TCP/IPv4 GRO
  2017-06-19  2:27           ` Wu, Jingjing
@ 2017-06-19  3:22             ` Jiayu Hu
  0 siblings, 0 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-06-19  3:22 UTC (permalink / raw)
  To: Wu, Jingjing
  Cc: dev, Ananyev, Konstantin, yliu, Wiles, Keith, Tan, Jianfeng, Bie,
	Tiwei, Yao, Lei A

On Mon, Jun 19, 2017 at 10:27:05AM +0800, Wu, Jingjing wrote:
> 
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jiayu Hu
> > Sent: Sunday, June 18, 2017 3:21 PM
> > To: dev@dpdk.org
> > Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; yliu@fridaylinux.org;
> > Wiles, Keith <keith.wiles@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>;
> > Bie, Tiwei <tiwei.bie@intel.com>; Yao, Lei A <lei.a.yao@intel.com>; Hu, Jiayu
> > <jiayu.hu@intel.com>
> > Subject: [dpdk-dev] [PATCH v5 3/3] app/testpmd: enable TCP/IPv4 GRO
> > 
> > This patch demonstrates the usage of GRO library in testpmd. By default, GRO
> > is turned off. Command, "gro on (port_id)", turns on GRO for the given port;
> > command, "gro off (port_id)", turns off GRO for the given port. Currently, GRO
> > only supports processing TCP/IPv4 packets and works in IO forward mode.
> > Besides, only GRO lightweight mode is enabled.
> > 
> > Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
> > ---
> >  app/test-pmd/cmdline.c | 45
> > +++++++++++++++++++++++++++++++++++++++++++++
> >  app/test-pmd/config.c  | 29 +++++++++++++++++++++++++++++
> >  app/test-pmd/iofwd.c   |  6 ++++++
> >  app/test-pmd/testpmd.c |  3 +++
> >  app/test-pmd/testpmd.h | 11 +++++++++++
> >  5 files changed, 94 insertions(+)
> > 
> > diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index
> > 105c71f..d1ca8df 100644
> > --- a/app/test-pmd/cmdline.c
> > +++ b/app/test-pmd/cmdline.c
> > @@ -76,6 +76,7 @@
> >  #include <rte_devargs.h>
> >  #include <rte_eth_ctrl.h>
> >  #include <rte_flow.h>
> > +#include <rte_gro.h>
> > 
> >  #include <cmdline_rdline.h>
> >  #include <cmdline_parse.h>
> > @@ -423,6 +424,9 @@ static void cmd_help_long_parsed(void *parsed_result,
> >  			"tso show (portid)"
> >  			"    Display the status of TCP Segmentation
> > Offload.\n\n"
> > 
> > +			"gro (on|off) (port_id)"
> > +			"    Enable or disable Generic Receive Offload.\n\n"
> > +
> >  			"set fwd (%s)\n"
> >  			"    Set packet forwarding mode.\n\n"
> > 
> > @@ -3831,6 +3835,46 @@ cmdline_parse_inst_t cmd_tunnel_tso_show = {
> >  	},
> >  };
> > 
> > +/* *** SET GRO FOR A PORT *** */
> > +struct cmd_gro_result {
> > +	cmdline_fixed_string_t cmd_keyword;
> > +	cmdline_fixed_string_t mode;
> > +	uint8_t port_id;
> > +};
> > +
> > +static void
> > +cmd_set_gro_parsed(void *parsed_result,
> > +		__attribute__((unused)) struct cmdline *cl,
> > +		__attribute__((unused)) void *data)
> > +{
> > +	struct cmd_gro_result *res;
> > +
> > +	res = parsed_result;
> > +	setup_gro(res->mode, res->port_id);
> > +}
> > +
> > +cmdline_parse_token_string_t cmd_gro_keyword =
> > +	TOKEN_STRING_INITIALIZER(struct cmd_gro_result,
> > +			cmd_keyword, "gro");
> > +cmdline_parse_token_string_t cmd_gro_mode =
> > +	TOKEN_STRING_INITIALIZER(struct cmd_gro_result,
> > +			mode, "on#off");
> > +cmdline_parse_token_num_t cmd_gro_pid =
> > +	TOKEN_NUM_INITIALIZER(struct cmd_gro_result,
> > +			port_id, UINT8);
> > +
> > +cmdline_parse_inst_t cmd_set_gro = {
> > +	.f = cmd_set_gro_parsed,
> > +	.data = NULL,
> > +	.help_str = "gro (on|off) (port_id)",
> > +	.tokens = {
> > +		(void *)&cmd_gro_keyword,
> > +		(void *)&cmd_gro_mode,
> > +		(void *)&cmd_gro_pid,
> > +		NULL,
> > +	},
> > +};
> > +
> >  /* *** ENABLE/DISABLE FLUSH ON RX STREAMS *** */  struct
> > cmd_set_flush_rx {
> >  	cmdline_fixed_string_t set;
> > @@ -13710,6 +13754,7 @@ cmdline_parse_ctx_t main_ctx[] = {
> >  	(cmdline_parse_inst_t *)&cmd_tso_show,
> >  	(cmdline_parse_inst_t *)&cmd_tunnel_tso_set,
> >  	(cmdline_parse_inst_t *)&cmd_tunnel_tso_show,
> > +	(cmdline_parse_inst_t *)&cmd_set_gro,
> >  	(cmdline_parse_inst_t *)&cmd_link_flow_control_set,
> >  	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_rx,
> >  	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_tx, diff --git
> > a/app/test-pmd/config.c b/app/test-pmd/config.c index 3cd4f31..858342d
> > 100644
> > --- a/app/test-pmd/config.c
> > +++ b/app/test-pmd/config.c
> > @@ -71,6 +71,7 @@
> >  #ifdef RTE_LIBRTE_BNXT_PMD
> >  #include <rte_pmd_bnxt.h>
> >  #endif
> > +#include <rte_gro.h>
> > 
> >  #include "testpmd.h"
> > 
> > @@ -2414,6 +2415,34 @@ set_tx_pkt_segments(unsigned *seg_lengths,
> > unsigned nb_segs)
> >  	tx_pkt_nb_segs = (uint8_t) nb_segs;
> >  }
> > 
> > +void
> > +setup_gro(const char *mode, uint8_t port_id) {
> > +	if (!rte_eth_dev_is_valid_port(port_id)) {
> > +		printf("invalid port id %u\n", port_id);
> > +		return;
> > +	}
> > +	if (strcmp(mode, "on") == 0) {
> > +		if (test_done == 0) {
> > +			printf("before enable GRO,"
> > +					" please stop forwarding first\n");
> > +			return;
> > +		}
> > +		gro_ports[port_id].enable = 1;
> > +		gro_ports[port_id].param.max_flow_num = 4;
> > +		gro_ports[port_id].param.max_item_per_flow = 32;
> Are 4 and 32 the default values for GRO? If so, how about defining the MACRO in rte_gro.h

The values of param.max_flow_num and param.max_item_per_flow should
depend on the specific usage scenario. In the testpmd case, the number
of packets returned by rte_eth_rx_burst each time is always less than
or equal to 32, so I simply hard-code that value. Besides, I assume
each queue receives fewer than 4 flows, which is where the value of
param.max_flow_num comes from. Maybe I should add commands in testpmd
to let users set these two values dynamically, something like the
sketch below. What do you think?
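
For example, a hypothetical command (names are illustrative only, not
part of this patchset) could look like:

	gro set (max_flow_num) (max_item_num_per_flow) (port_id)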

> 
> > +		gro_ports[port_id].param.desired_gro_types = GRO_TCP_IPV4;
> > +		gro_ports[port_id].param.max_packet_size = UINT16_MAX;
> > +	} else if (strcmp(mode, "off") == 0) {
> 
> "else" is enough, no need to compare it with "off".

Thanks. I will modify it in the next patch.

> 
> > +		if (test_done == 0) {
> > +			printf("before disable GRO,"
> > +					" please stop forwarding first\n");
> > +			return;
> > +		}
> 
> How about moving the test_done check before the mode checking?

Thanks. I will modify it in the next patch.

> 
> > +		gro_ports[port_id].enable = 0;
> > +	}
> > +}
> > +
> >  char*
> >  list_pkt_forwarding_modes(void)
> >  {
> > diff --git a/app/test-pmd/iofwd.c b/app/test-pmd/iofwd.c index
> > 15cb4a2..d9ec528 100644
> > --- a/app/test-pmd/iofwd.c
> > +++ b/app/test-pmd/iofwd.c
> > @@ -65,6 +65,7 @@
> >  #include <rte_ethdev.h>
> >  #include <rte_string_fns.h>
> >  #include <rte_flow.h>
> > +#include <rte_gro.h>
> > 
> >  #include "testpmd.h"
> > 
> > @@ -99,6 +100,11 @@ pkt_burst_io_forward(struct fwd_stream *fs)
> >  			pkts_burst, nb_pkt_per_burst);
> >  	if (unlikely(nb_rx == 0))
> >  		return;
> > +	if (unlikely(gro_ports[fs->rx_port].enable)) {
> > +		nb_rx = rte_gro_reassemble_burst(pkts_burst,
> > +				nb_rx,
> > +				gro_ports[fs->rx_port].param);
> > +	}
> >  	fs->rx_packets += nb_rx;
> > 
> >  #ifdef RTE_TEST_PMD_RECORD_BURST_STATS
> > diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
> > b29328a..ed27c7a 100644
> > --- a/app/test-pmd/testpmd.c
> > +++ b/app/test-pmd/testpmd.c
> > @@ -90,6 +90,7 @@
> >  #ifdef RTE_LIBRTE_LATENCY_STATS
> >  #include <rte_latencystats.h>
> >  #endif
> > +#include <rte_gro.h>
> > 
> >  #include "testpmd.h"
> > 
> > @@ -378,6 +379,8 @@ lcoreid_t bitrate_lcore_id;  uint8_t bitrate_enabled;
> > #endif
> > 
> > +struct gro_status gro_ports[RTE_MAX_ETHPORTS];
> > +
> >  /* Forward function declarations */
> >  static void map_port_queue_stats_mapping_registers(uint8_t pi, struct
> > rte_port *port);  static void check_all_ports_link_status(uint32_t port_mask);
> > diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h index
> > 364502d..0471e99 100644
> > --- a/app/test-pmd/testpmd.h
> > +++ b/app/test-pmd/testpmd.h
> > @@ -34,6 +34,8 @@
> >  #ifndef _TESTPMD_H_
> >  #define _TESTPMD_H_
> > 
> > +#include <rte_gro.h>
> > +
> >  #define RTE_PORT_ALL            (~(portid_t)0x0)
> > 
> >  #define RTE_TEST_RX_DESC_MAX    2048
> > @@ -109,6 +111,8 @@ struct fwd_stream {
> >  	queueid_t  tx_queue;  /**< TX queue to send forwarded packets */
> >  	streamid_t peer_addr; /**< index of peer ethernet address of packets
> > */
> > 
> > +	uint16_t tbl_idx;	/**< TCP IPv4 GRO lookup table index */
> > +
> >  	unsigned int retry_enabled;
> > 
> >  	/* "read-write" results */
> > @@ -428,6 +432,12 @@ extern struct ether_addr
> > peer_eth_addrs[RTE_MAX_ETHPORTS];  extern uint32_t burst_tx_delay_time;
> > /**< Burst tx delay time(us) for mac-retry. */  extern uint32_t
> > burst_tx_retry_num;  /**< Burst tx retry number for mac-retry. */
> > 
> > +struct gro_status {
> > +	struct rte_gro_param param;
> > +	uint8_t enable;
> > +};
> > +extern struct gro_status gro_ports[RTE_MAX_ETHPORTS];
> > +
> >  static inline unsigned int
> >  lcore_num(void)
> >  {
> > @@ -626,6 +636,7 @@ void get_2tuple_filter(uint8_t port_id, uint16_t index);
> > void get_5tuple_filter(uint8_t port_id, uint16_t index);  int
> > rx_queue_id_is_invalid(queueid_t rxq_id);  int
> > tx_queue_id_is_invalid(queueid_t txq_id);
> > +void setup_gro(const char *mode, uint8_t port_id);
> > 
> >  /* Functions to manage the set of filtered Multicast MAC addresses */  void
> > mcast_addr_add(uint8_t port_id, struct ether_addr *mc_addr);
> > --
> > 2.7.4
> 
> Looks fine to me, just don't forget the doc update "doc/guides/testpmd_app_ug/" due to the new command line

Thanks for your comments. I will update it.

BRs,
Jiayu

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v5 1/3] lib: add Generic Receive Offload API framework
  2017-06-18  7:21         ` [PATCH v5 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
@ 2017-06-19  4:03           ` Tiwei Bie
  2017-06-19  5:16             ` Jiayu Hu
  2017-06-19 15:43           ` Tan, Jianfeng
  2017-06-19 15:55           ` Stephen Hemminger
  2 siblings, 1 reply; 141+ messages in thread
From: Tiwei Bie @ 2017-06-19  4:03 UTC (permalink / raw)
  To: Jiayu Hu
  Cc: dev, konstantin.ananyev, yliu, keith.wiles, jianfeng.tan, lei.a.yao

On Sun, Jun 18, 2017 at 03:21:07PM +0800, Jiayu Hu wrote:
> Generic Receive Offload (GRO) is a widely used SW-based offloading
> technique to reduce per-packet processing overhead. It gains
> performance by reassembling small packets into large ones. This
> patchset is to support GRO in DPDK. To support GRO, this patch
> implements a GRO API framework.
> 
> To enable more flexibility to applications, DPDK GRO is implemented as
> a user library. Applications explicitly use the GRO library to merge
> small packets into large ones. DPDK GRO provides two reassembly modes.
> One is called lightweigth mode, the other is called heavyweight mode.
> If applications want merge packets in a simple way, they can use
> lightweigth mode. If applications need more fine-grained controls,
> they can choose heavyweigth mode.
> 
> rte_gro_reassemble_burst is the main reassembly API which is used in
> lightweigth mode and processes N packets at a time. For applications,
> performing GRO in lightweigth mode is simple. They just need to invoke
> rte_gro_reassemble_burst. Applications can get GROed packets as soon as
> rte_gro_reassemble_burst returns.
> 
> rte_gro_reassemble is the main reassembly API which is used in
> lightweigth mode and processes one packet at a time. For applications,

Should be heavyweight here?

> performing GRO in heavyweigth mode is relatively complicated. Before
> performing GRO, applications need to create a GRO table by
> rte_gro_tbl_create. Then they can use rte_gro_reassemble to process
> packets one by one. The processed packets are in the GRO table. If
> applications want to get them, applications need to manually flush
> them by flush APIs.
> 

Few typos:

s/lightweigth/lightweight/g
s/heavyweigth/heavyweight/g

Besides, it's a bit confusing. Based on your comments in the mail below:

http://dpdk.org/ml/archives/dev/2017-June/068113.html

The rte_gro_reassemble_burst() is used in heavyweight mode. But it
is inconsistent with the description in the above commit message.

Best regards,
Tiwei Bie

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v5 0/3] Support TCP/IPv4 GRO in DPDK
  2017-06-19  3:07           ` Jiayu Hu
@ 2017-06-19  5:12             ` Jiayu Hu
  0 siblings, 0 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-06-19  5:12 UTC (permalink / raw)
  To: Tan, Jianfeng, tiwei.bie
  Cc: dev, konstantin.ananyev, yliu, keith.wiles, lei.a.yao

Hi Jianfeng,

Sorry for some typos. I have corrected them below.

On Mon, Jun 19, 2017 at 11:07:34AM +0800, Jiayu Hu wrote:
> On Mon, Jun 19, 2017 at 09:39:11AM +0800, Tan, Jianfeng wrote:
> > Hi Jiayu,
> > 
> > You need to update the document:
> > - Release note file: release_17_08.rst.
> > - A howto doc is welcomed.
> 
> Thanks. I will update them in next patch.
> 
> > 
> > 
> > On 6/18/2017 3:21 PM, Jiayu Hu wrote:
> > > Generic Receive Offload (GRO) is a widely used SW-based offloading
> > > technique to reduce per-packet processing overhead. It gains performance
> > > by reassembling small packets into large ones. Therefore, we propose to
> > > support GRO in DPDK.
> > > 
> > > To enable more flexibility to applications, DPDK GRO is implemented as
> > > a user library. Applications explicitly use the GRO library to merge
> > > small packets into large ones. DPDK GRO provides two reassembly modes.
> > > One is called lightweigth mode, the other is called heavyweight mode.
> > > If applications want merge packets in a simple way, they can use
> > > lightweigth mode. If applications need more fine-grained controls,
> > > they can choose heavyweigth mode.
> > 
> > So what's the real difference between the two modes? Might be an example is
> > good way to clarify.
> 
> The heavyweight mode merges packets in a burst-mode. Applications just need

Sorry for the typo. It should be 'lightweight mode'.

> to give N packets to the heavyweight mode API, rte_gro_reassemble_burst.

Sorry for the typo. It should be 'lightweight mode'.

> After rte_gro_reassemble_burst returns, packets are merged together. For
> applications, to use the heavyweight mode is very simple and they don't

Sorry for the typo. It should be 'lightweight mode'.

> need to allocate any GRO tables before. The lightweight mode enables more

Sorry for the typo. It should be 'heavyweight mode'.

> flexibility to applications. Applications need to create a GRO table before
> invoking the lightweight mode API, rte_gro_reassemble, to merge packets.

Sorry for the typo. It should be 'heavyweight mode'.

> Besides, rte_gro_reassemble just processes one packet at a time. No matter
> if the packet is merged successfully or not, it's stored in the GRO table.
> When applications want these processed packets, they need to manually flush
> them from the GRO table. You can see more details in the next patch
> 'add Generic Receive Offload API framework'.
> 
> > 
> > > 
> > > This patchset is to support TCP/IPv4 GRO in DPDK. The first patch is to
> > > provide a GRO API framework. The second patch is to support TCP/IPv4 GRO.
> > > The last patch demonstrates how to use GRO library in app/testpmd.
> > 
> > In which mode?
> 
> Testpmd just demonstrates the usage of the lightweight mode.
> 
> > 
> > > 
> > > We perform two iperf tests (with DPDK GRO and without DPDK GRO) to see
> > > the performance gains from DPDK GRO. Specifically, the experiment
> > > environment is:
> > > a. Two 10Gbps physical ports (p0 and p1) on one host are linked together;
> > > b. p0 is in networking namespace ns1, whose IP is 1.1.2.3. Iperf client
> > > runs on p0, which sends TCP/IPv4 packets. The OS in VM is ubuntu 14.04;
> > > c. testpmd runs on p1. Besides, testpmd has a vdev which connects to a
> > > VM via vhost-user and virtio-net. The VM runs iperf server, whose IP
> > > is 1.1.2.4;
> > > d. p0 turns on TSO; VM turns off kernel GRO; testpmd runs in iofwd mode.
> > > iperf client and server use the following commands:
> > > 	- client: ip netns exec ns1 iperf -c 1.1.2.4 -i2 -t 60 -f g -m
> > > 	- server: iperf -s -f g
> > > Two test cases are:
> > > a. w/o DPDK GRO: run testpmd without GRO
> > > b. w DPDK GRO: testpmd enables GRO for p1
> > > Result:
> > > With GRO, the throughput improvement is around 40%.
> > 
> > Do you try running several pairs of iperf-s and iperf-c tests (on 40Gb
> > NICs)? It can not only prove the performance, but also the functionality
> > correctness.
> 
> Besides the one-pair scenario, I have just tried two pairs of iperf-s and iperf-c.
> Thanks for your advice, and I will do more tests in the next patch.
> 
> 
> Thanks,
> Jiayu
> 
> > 
> > Thanks,
> > Jianfeng
> > 
> > > 
> > > Change log
> > > ==========
> > > v5:
> > > - fix some bugs
> > > - fix coding style issues
> > > v4:
> > > - implement DPDK GRO as an application-used library
> > > - introduce lightweight and heavyweight working modes to enable
> > > 	fine-grained controls to applications
> > > - replace cuckoo hash tables with simpler table structure
> > > v3:
> > > - fix compilation issues.
> > > v2:
> > > - provide generic reassembly function;
> > > - implement GRO as a device ability:
> > > add APIs for devices to support GRO;
> > > add APIs for applications to enable/disable GRO;
> > > - update testpmd example.
> > > 
> > > Jiayu Hu (3):
> > >    lib: add Generic Receive Offload API framework
> > >    lib/gro: add TCP/IPv4 GRO support
> > >    app/testpmd: enable TCP/IPv4 GRO
> > > 
> > >   app/test-pmd/cmdline.c       |  45 ++++
> > >   app/test-pmd/config.c        |  29 +++
> > >   app/test-pmd/iofwd.c         |   6 +
> > >   app/test-pmd/testpmd.c       |   3 +
> > >   app/test-pmd/testpmd.h       |  11 +
> > >   config/common_base           |   5 +
> > >   lib/Makefile                 |   1 +
> > >   lib/librte_gro/Makefile      |  51 +++++
> > >   lib/librte_gro/rte_gro.c     | 248 ++++++++++++++++++++
> > >   lib/librte_gro/rte_gro.h     | 217 ++++++++++++++++++
> > >   lib/librte_gro/rte_gro_tcp.c | 527 +++++++++++++++++++++++++++++++++++++++++++
> > >   lib/librte_gro/rte_gro_tcp.h | 210 +++++++++++++++++
> > >   mk/rte.app.mk                |   1 +
> > >   13 files changed, 1354 insertions(+)
> > >   create mode 100644 lib/librte_gro/Makefile
> > >   create mode 100644 lib/librte_gro/rte_gro.c
> > >   create mode 100644 lib/librte_gro/rte_gro.h
> > >   create mode 100644 lib/librte_gro/rte_gro_tcp.c
> > >   create mode 100644 lib/librte_gro/rte_gro_tcp.h
> > > 

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v5 1/3] lib: add Generic Receive Offload API framework
  2017-06-19  4:03           ` Tiwei Bie
@ 2017-06-19  5:16             ` Jiayu Hu
  0 siblings, 0 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-06-19  5:16 UTC (permalink / raw)
  To: Tiwei Bie
  Cc: dev, konstantin.ananyev, yliu, keith.wiles, jianfeng.tan, lei.a.yao

On Mon, Jun 19, 2017 at 12:03:44PM +0800, Tiwei Bie wrote:
> On Sun, Jun 18, 2017 at 03:21:07PM +0800, Jiayu Hu wrote:
> > Generic Receive Offload (GRO) is a widely used SW-based offloading
> > technique to reduce per-packet processing overhead. It gains
> > performance by reassembling small packets into large ones. This
> > patchset is to support GRO in DPDK. To support GRO, this patch
> > implements a GRO API framework.
> > 
> > To enable more flexibility to applications, DPDK GRO is implemented as
> > a user library. Applications explicitly use the GRO library to merge
> > small packets into large ones. DPDK GRO provides two reassembly modes.
> > One is called lightweigth mode, the other is called heavyweight mode.
> > If applications want merge packets in a simple way, they can use
> > lightweigth mode. If applications need more fine-grained controls,
> > they can choose heavyweigth mode.
> > 
> > rte_gro_reassemble_burst is the main reassembly API which is used in
> > lightweigth mode and processes N packets at a time. For applications,
> > performing GRO in lightweigth mode is simple. They just need to invoke
> > rte_gro_reassemble_burst. Applications can get GROed packets as soon as
> > rte_gro_reassemble_burst returns.
> > 
> > rte_gro_reassemble is the main reassembly API which is used in
> > lightweigth mode and processes one packet at a time. For applications,
> 
> Should be heavyweight here?

Sorry for the typo. It should be 'heavyweight' here.

> 
> > performing GRO in heavyweigth mode is relatively complicated. Before
> > performing GRO, applications need to create a GRO table by
> > rte_gro_tbl_create. Then they can use rte_gro_reassemble to process
> > packets one by one. The processed packets are in the GRO table. If
> > applications want to get them, applications need to manually flush
> > them by flush APIs.
> > 
> 
> Few typos:
> 
> s/lightweigth/lightweight/g
> s/heavyweigth/heavyweight/g
> 
> Besides, it's a bit confusing. Based on your comments in the mail below:
> 
> http://dpdk.org/ml/archives/dev/2017-June/068113.html
> 
> The rte_gro_reassemble_burst() is used in heavyweight mode. But it
> is inconsistent with the description in the above commit message.

I am so sorry for these typos. rte_gro_reassemble_burst is used
in lightweight mode and rte_gro_reassemble is used in heavyweight mode.
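
In code, the two call flows would look roughly like this (a sketch
against the v5 API in this patchset; socket_id, timeout_cycles, out[],
flush_num and the other surrounding variables are assumed to be set up
by the caller):

	/* lightweight mode: a single burst call, no table management */
	struct rte_gro_param param = {
		.max_flow_num = 4,
		.max_item_per_flow = 32,
		.desired_gro_types = GRO_TCP_IPV4,
		.max_packet_size = UINT16_MAX,
	};
	nb_rx = rte_gro_reassemble_burst(pkts, nb_rx, param);

	/* heavyweight mode: per-packet merging plus explicit flushes */
	struct rte_gro_tbl *gro_tbl = rte_gro_tbl_create(socket_id,
			4, 32, UINT16_MAX, timeout_cycles, GRO_TCP_IPV4);
	for (i = 0; i < nb_rx; i++)
		rte_gro_reassemble(pkts[i], gro_tbl);
	nb_out = rte_gro_flush(gro_tbl, GRO_TCP_IPV4, flush_num,
			out, max_nb_out);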

Thanks,
Jiayu
> 
> Best regards,
> Tiwei Bie

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v5 1/3] lib: add Generic Receive Offload API framework
  2017-06-18  7:21         ` [PATCH v5 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
  2017-06-19  4:03           ` Tiwei Bie
@ 2017-06-19 15:43           ` Tan, Jianfeng
  2017-06-19 15:55           ` Stephen Hemminger
  2 siblings, 0 replies; 141+ messages in thread
From: Tan, Jianfeng @ 2017-06-19 15:43 UTC (permalink / raw)
  To: Jiayu Hu, dev; +Cc: konstantin.ananyev, yliu, keith.wiles, tiwei.bie, lei.a.yao



On 6/18/2017 3:21 PM, Jiayu Hu wrote:
> Generic Receive Offload (GRO) is a widely used SW-based offloading
> technique to reduce per-packet processing overhead. It gains
> performance by reassembling small packets into large ones. This
> patchset is to support GRO in DPDK. To support GRO, this patch
> implements a GRO API framework.
>
> To enable more flexibility to applications, DPDK GRO is implemented as
> a user library. Applications explicitly use the GRO library to merge
> small packets into large ones. DPDK GRO provides two reassembly modes.
> One is called lightweigth mode, the other is called heavyweight mode.
> If applications want merge packets in a simple way, they can use
> lightweigth mode. If applications need more fine-grained controls,
> they can choose heavyweigth mode.
>
> rte_gro_reassemble_burst is the main reassembly API which is used in
> lightweigth mode and processes N packets at a time. For applications,
> performing GRO in lightweigth mode is simple. They just need to invoke
> rte_gro_reassemble_burst. Applications can get GROed packets as soon as
> rte_gro_reassemble_burst returns.
>
> rte_gro_reassemble is the main reassembly API which is used in
> lightweigth mode and processes one packet at a time. For applications,
> performing GRO in heavyweigth mode is relatively complicated. Before
> performing GRO, applications need to create a GRO table by
> rte_gro_tbl_create. Then they can use rte_gro_reassemble to process
> packets one by one. The processed packets are in the GRO table. If
> applications want to get them, applications need to manually flush
> them by flush APIs.

For these two APIs, I suppose they will try their best to reassemble
the packets with the supported GRO engines. So we need to call the GRO
engines according to the ptype of each packet, and this dispatch
framework should be implemented in this file. A rough sketch follows.
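
Something along these lines (assumed names only, mirroring the existing
tbl_create_functions/tbl_destroy_functions arrays, and assuming the
per-type reassemble functions are first unified to one signature):

	typedef int32_t (*gro_reassemble_fn)(struct rte_mbuf *pkt, void *tbl);

	/* indexed by GRO type, e.g. GRO_TCP_IPV4_INDEX */
	static gro_reassemble_fn reassemble_functions[GRO_TYPE_MAX_NB] = {
		gro_tcp4_reassemble, NULL };

	/* in rte_gro_reassemble(): derive the GRO type index from the
	 * packet headers (or mbuf->packet_type), then dispatch */
	fn = reassemble_functions[gro_type_index];
	if (fn != NULL)
		ret = fn(pkt, gro_tbl->tbls[gro_type_index]);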

>
> In DPDK GRO, different GRO types define their own reassembly tables. When
> creating a GRO table, it keeps the reassembly tables of desired GRO types.
> To process one packet, we search for the corresponding reassembly table
> according to the packet type first. Then search for the reassembly table
> to find an existed packet to merge. If find, chain the two packets
> together. If not find, insert the packet into the reassembly table. If
> the packet has a wrong checksum, or is fragmented, etc., an error happens.
> The reassembly function will stop processing the packet.
>
> Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
> ---
>   config/common_base       |   5 ++
>   lib/Makefile             |   1 +
>   lib/librte_gro/Makefile  |  50 +++++++++++
>   lib/librte_gro/rte_gro.c | 126 ++++++++++++++++++++++++++++
>   lib/librte_gro/rte_gro.h | 213 +++++++++++++++++++++++++++++++++++++++++++++++
>   mk/rte.app.mk            |   1 +
>   6 files changed, 396 insertions(+)
>   create mode 100644 lib/librte_gro/Makefile
>   create mode 100644 lib/librte_gro/rte_gro.c
>   create mode 100644 lib/librte_gro/rte_gro.h

If we expose some APIs, we always add a version.map file (here it would 
be rte_gro_version.map) in that directory.
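
For reference, such a map file looks roughly like this (symbol names
taken from this patch; the release name is my assumption):

	DPDK_17.08 {
		global:

		rte_gro_tbl_create;
		rte_gro_tbl_destroy;
		rte_gro_reassemble_burst;
		rte_gro_reassemble;
		rte_gro_flush;
		rte_gro_timeout_flush;

		local: *;
	};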

>
> diff --git a/config/common_base b/config/common_base
> index f6aafd1..167f5ef 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -712,6 +712,11 @@ CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
>   CONFIG_RTE_LIBRTE_PMD_VHOST=n
>   
>   #
> +# Compile GRO library
> +#
> +CONFIG_RTE_LIBRTE_GRO=y
> +
> +#
>   #Compile Xen domain0 support
>   #
>   CONFIG_RTE_LIBRTE_XEN_DOM0=n
> diff --git a/lib/Makefile b/lib/Makefile
> index 07e1fd0..e253053 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -106,6 +106,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
>   DEPDIRS-librte_reorder := librte_eal librte_mempool librte_mbuf
>   DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
>   DEPDIRS-librte_pdump := librte_eal librte_mempool librte_mbuf librte_ether
> +DIRS-$(CONFIG_RTE_LIBRTE_GRO) += librte_gro
>   
>   ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
>   DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
> diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
> new file mode 100644
> index 0000000..9f4063a
> --- /dev/null
> +++ b/lib/librte_gro/Makefile
> @@ -0,0 +1,50 @@
> +#   BSD LICENSE
> +#
> +#   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
> +#   All rights reserved.
> +#
> +#   Redistribution and use in source and binary forms, with or without
> +#   modification, are permitted provided that the following conditions
> +#   are met:
> +#
> +#     * Redistributions of source code must retain the above copyright
> +#       notice, this list of conditions and the following disclaimer.
> +#     * Redistributions in binary form must reproduce the above copyright
> +#       notice, this list of conditions and the following disclaimer in
> +#       the documentation and/or other materials provided with the
> +#       distribution.
> +#     * Neither the name of Intel Corporation nor the names of its
> +#       contributors may be used to endorse or promote products derived
> +#       from this software without specific prior written permission.
> +#
> +#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> +#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> +#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> +#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> +#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> +#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> +#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> +#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> +#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> +#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> +#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> +
> +include $(RTE_SDK)/mk/rte.vars.mk
> +
> +# library name
> +LIB = librte_gro.a
> +
> +CFLAGS += -O3
> +CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
> +
> +EXPORT_MAP := rte_gro_version.map
> +
> +LIBABIVER := 1
> +
> +# source files
> +SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
> +
> +# install this header file
> +SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
> +
> +include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
> new file mode 100644
> index 0000000..1bc53a2
> --- /dev/null
> +++ b/lib/librte_gro/rte_gro.c
> @@ -0,0 +1,126 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.

The year should be 2017. The same applies to the other files.

> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#include <rte_malloc.h>
> +#include <rte_mbuf.h>
> +
> +#include "rte_gro.h"
> +
> +static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB];

> +static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NB];
> +
> +struct rte_gro_tbl *rte_gro_tbl_create(uint16_t socket_id,
> +		uint16_t max_flow_num,
> +		uint16_t max_item_per_flow,
> +		uint32_t max_packet_size,
> +		uint64_t max_timeout_cycles,
> +		uint64_t desired_gro_types)
> +{
> +	gro_tbl_create_fn create_tbl_fn;
> +	struct rte_gro_tbl *gro_tbl;
> +	uint64_t gro_type_flag = 0;
> +	uint8_t i;
> +
> +	gro_tbl = rte_zmalloc_socket(__func__,
> +			sizeof(struct rte_gro_tbl),
> +			RTE_CACHE_LINE_SIZE,
> +			socket_id);
> +	gro_tbl->max_packet_size = max_packet_size;
> +	gro_tbl->max_timeout_cycles = max_timeout_cycles;
> +	gro_tbl->desired_gro_types = desired_gro_types;
> +
> +	for (i = 0; i < GRO_TYPE_MAX_NB; i++) {
> +		gro_type_flag = 1 << i;
> +		if (desired_gro_types & gro_type_flag) {
> +			create_tbl_fn = tbl_create_functions[i];
> +			if (create_tbl_fn)
> +				create_tbl_fn(socket_id,
> +						max_flow_num,
> +						max_item_per_flow);
> +			else
> +				gro_tbl->tbls[i] = NULL;
> +		}
> +	}
> +	return gro_tbl;
> +}
> +
> +void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl)
> +{
> +	gro_tbl_destroy_fn destroy_tbl_fn;
> +	uint64_t gro_type_flag;
> +	uint8_t i;
> +
> +	if (gro_tbl == NULL)
> +		return;
> +	for (i = 0; i < GRO_TYPE_MAX_NB; i++) {
> +		gro_type_flag = 1 << i;
> +		if (gro_tbl->desired_gro_types & gro_type_flag) {
> +			destroy_tbl_fn = tbl_destroy_functions[i];
> +			if (destroy_tbl_fn)
> +				destroy_tbl_fn(gro_tbl->tbls[i]);
> +			gro_tbl->tbls[i] = NULL;
> +		}
> +	}
> +	rte_free(gro_tbl);
> +}
> +
> +uint16_t
> +rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
> +		const uint16_t nb_pkts,
> +		const struct rte_gro_param param __rte_unused)
> +{
> +	return nb_pkts;
> +}
> +
> +int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
> +		struct rte_gro_tbl *gro_tbl __rte_unused)
> +{
> +	return -1;
> +}
> +
> +uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> +		uint64_t desired_gro_types __rte_unused,
> +		uint16_t flush_num __rte_unused,
> +		struct rte_mbuf **out __rte_unused,
> +		const uint16_t max_nb_out __rte_unused)
> +{
> +	return 0;
> +}
> +
> +uint16_t
> +rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> +		uint64_t desired_gro_types __rte_unused,
> +		struct rte_mbuf **out __rte_unused,
> +		const uint16_t max_nb_out __rte_unused)
> +{
> +	return 0;
> +}
> diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
> new file mode 100644
> index 0000000..67bd90d
> --- /dev/null
> +++ b/lib/librte_gro/rte_gro.h
> @@ -0,0 +1,213 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef _RTE_GRO_H_
> +#define _RTE_GRO_H_
> +
> +/* maximum number of supported GRO types */
> +#define GRO_TYPE_MAX_NB 64
> +#define GRO_TYPE_SUPPORT_NB 0	/**< current supported GRO num */
> +
> +/**
> + * GRO table structure. DPDK GRO uses GRO table to reassemble
> + * packets. In heavyweight mode, applications must create GRO tables
> + * before performing GRO. However, in lightweight mode, applications
> + * don't need to create GRO tables.
> + *
> + * A GRO table object stores many reassembly tables of desired
> + * GRO types.
> + */
> +struct rte_gro_tbl {
> +	/* table addresses of desired GRO types */
> +	void *tbls[GRO_TYPE_MAX_NB];
> +	uint64_t desired_gro_types;	/**< GRO types that want to perform */
> +	/**
> +	 * the maximum time of packets staying in GRO tables, measured in
> +	 * nanoseconds.
> +	 */
> +	uint64_t max_timeout_cycles;
> +	/* the maximum length of merged packet, measured in byte */
> +	uint32_t max_packet_size;
> +};
> +
> +/**
> + * In lightweight mode, applications use this structure to pass the
> + * needed parameters to rte_gro_reassemble_burst.
> + */
> +struct rte_gro_param {
> +	uint16_t max_flow_num;	/**< max flow number */
> +	uint16_t max_item_per_flow;	/**< max item number per flow */
> +	/**
> +	 * It indicates the GRO types that applications want to perform,
> +	 * whose value is the result of OR operation on GRO type flags.
> +	 */
> +	uint64_t desired_gro_types;
> +	/* the maximum packet size after being merged */
> +	uint32_t max_packet_size;
> +};
> +
> +typedef void *(*gro_tbl_create_fn)(uint16_t socket_id,
> +		uint16_t max_flow_num,
> +		uint16_t max_item_per_flow);
> +typedef void (*gro_tbl_destroy_fn)(void *tbl);
> +
> +/**
> + * This function creates a GRO table, which is used to merge packets.
> + *
> + * @param socket_id
> + *  socket index where the Ethernet port connects to.
> + * @param max_flow_num
> + *  the maximum flow number in the GRO table
> + * @param max_item_per_flow
> + *  the maximum packet number per flow
> + * @param max_packet_size
> + *  the maximum size of merged packets, which is measured in byte.
> + * @param max_timeout_cycles
> + *  the maximum time that a packet can stay in the GRO table.
> + * @param desired_gro_types
> + *  GRO types that applications want to perform. Its value is the
> + *  result of OR operation on desired GRO type flags.
> + * @return
> + *  If create successfully, return a pointer which points to the GRO
> + *  table. Otherwise, return NULL.
> + */
> +struct rte_gro_tbl *rte_gro_tbl_create(uint16_t socket_id,
> +		uint16_t max_flow_num,
> +		uint16_t max_item_per_flow,
> +		uint32_t max_packet_size,
> +		uint64_t max_timeout_cycles,
> +		uint64_t desired_gro_types);

Strange, I did not see anywhere you use this API. What's more, is it 
really necessary to make these two, create and destroy the table, 
APIs?

> +/**
> + * This function destroys a GRO table.
> + */
> +void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl);

If this is an API to be used by users, please clarify how and when to use it.

> +
> +/**
> + * This is the main reassembly API used in lightweight mode, which
> + * merges numbers of packets at a time. After it returns, applications
> + * can get GROed packets immediately. Applications don't need to
> + * flush packets manually. In lightweight mode, applications just need
> + * to tell the reassembly API what rules should be applied when merging
> + * packets. Therefore, applications can perform GRO in a very simple
> + * way.
> + *
> + * To process one packet, we find its corresponding reassembly table
> + * according to the packet type. Then search the reassembly table
> + * for one packet to merge with. If found, chain the two packets together.
> + * If not, insert the input packet into the reassembly table.
> + * Besides, merging two packets just chains them together. No
> + * memory copy is needed. Before rte_gro_reassemble_burst returns,
> + * header checksums of merged packets are re-calculated.
> + *
> + * @param pkts
> + *  a pointer array which points to the packets to reassemble. After
> + *  GRO, it is also used to keep GROed packets.
> + * @param nb_pkts
> + *  the number of packets to reassemble.
> + * @param param
> + *  Applications use param to tell rte_gro_reassemble_burst what rules
> + *  are demanded.
> + * @return
> + *  the number of packets after GROed.
> + */
> +uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
> +		const uint16_t nb_pkts __rte_unused,
> +		const struct rte_gro_param param __rte_unused);
> +
> +/**
> + * This is the main reassembly API used in heavyweight mode, which
> + * merges one packet at a time. The procedure of merging one packet is
> + * similar to rte_gro_reassemble_burst. But rte_gro_reassemble will
> + * not update header checksums. Header checksums of merged packets are
> + * re-calculated in flush APIs.
> + *
> + * If an error happens, e.g. a packet with a wrong checksum or an
> + * unsupported GRO type, the input packet won't be stored in the GRO
> + * table. If no error happens, the packet is either merged with an
> + * existing packet, or inserted into its corresponding reassembly table.
> + * Applications can get packets in the GRO table by flush APIs.
> + *
> + * @param pkt
> + *  packet to reassemble.
> + * @param gro_tbl
> + *  a pointer points to a GRO table.
> + * @return
> + *  if the packet is merged successfully, return a positive value. If it
> + *  fails to merge, return zero. If an error happens, return a negative value.
> + */
> +int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
> +		struct rte_gro_tbl *gro_tbl __rte_unused);
> +
> +/**
> + * This function flushes packets of desired GRO types from their
> + * corresponding reassembly tables.
> + *
> + * @param gro_tbl
> + *  a pointer points to a GRO table object.
> + * @param desired_gro_types
> + *  GRO types whose packets will be flushed.
> + * @param flush_num
> + *  the number of packets that need flushing.
> + * @param out
> + *  a pointer array that is used to keep flushed packets.
> + * @param max_nb_out
> + *  the size of out.
> + * @return
> + *  the number of flushed packets. If no packets are flushed, return 0.
> + */
> +uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> +		uint64_t desired_gro_types __rte_unused,
> +		uint16_t flush_num __rte_unused,
> +		struct rte_mbuf **out __rte_unused,
> +		const uint16_t max_nb_out __rte_unused);

Still, I don't see this function called anywhere. How can we make sure it 
is correct then?

> +
> +/**
> + * This function flushes the timeout packets from reassembly tables of
> + * desired GRO types.
> + *
> + * @param gro_tbl
> + *  a pointer points to a GRO table object.
> + * @param desired_gro_types
> + * rte_gro_timeout_flush only processes packets which belong to the
> + * GRO types specified by desired_gro_types.
> + * @param out
> + *  a pointer array that is used to keep flushed packets.
> + * @param max_nb_out
> + *  the size of out.
> + * @return
> + *  the number of flushed packets. If no packets are flushed, return 0.
> + */
> +uint16_t rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> +		uint64_t desired_gro_types __rte_unused,
> +		struct rte_mbuf **out __rte_unused,
> +		const uint16_t max_nb_out __rte_unused);
> +#endif
> diff --git a/mk/rte.app.mk b/mk/rte.app.mk
> index bcaf1b3..fc3776d 100644
> --- a/mk/rte.app.mk
> +++ b/mk/rte.app.mk
> @@ -98,6 +98,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_RING)           += -lrte_ring
>   _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
>   _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
>   _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER)        += -lrte_reorder
> +_LDLIBS-$(CONFIG_RTE_LIBRTE_GRO)        	+= -lrte_gro
>   
>   ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
>   _LDLIBS-$(CONFIG_RTE_LIBRTE_KNI)            += -lrte_kni

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v5 2/3] lib/gro: add TCP/IPv4 GRO support
  2017-06-18  7:21         ` [PATCH v5 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
@ 2017-06-19 15:43           ` Tan, Jianfeng
  2017-06-20  3:22             ` Jiayu Hu
  2017-06-22  8:18             ` Jiayu Hu
  0 siblings, 2 replies; 141+ messages in thread
From: Tan, Jianfeng @ 2017-06-19 15:43 UTC (permalink / raw)
  To: Jiayu Hu, dev; +Cc: konstantin.ananyev, yliu, keith.wiles, tiwei.bie, lei.a.yao



On 6/18/2017 3:21 PM, Jiayu Hu wrote:
> In this patch, we introduce six APIs to support TCP/IPv4 GRO.

Those functions are not used outside of this library. Don't make them 
externally visible.

> - gro_tcp_tbl_create: create a TCP reassembly table, which is used to
>      merge packets.

Will tcp6 share the same function as tcp4? If not, please rename it to 
gro_tcp4_tbl_create

> - gro_tcp_tbl_destroy: free memory space of a TCP reassembly table.
> - gro_tcp_tbl_flush: flush packets in the TCP reassembly table.
> - gro_tcp_tbl_timeout_flush: flush timeout packets in the TCP
>      reassembly table.
> - gro_tcp4_reassemble: merge an inputted packet.
> - gro_tcp4_tbl_cksum_update: update TCP and IPv4 header checksums for
>      all merged packets in the TCP reassembly table.
>
> In TCP GRO, we use a table structure, called TCP reassembly table, to
> reassemble packets. Both TCP/IPv4 and TCP/IPv6 GRO use the same table
> structure. A TCP reassembly table includes a flow array and an item array,
> where the flow array is used to record flow information and the item
> array is used to record packets information.
>
> Each element in the flow array records the information of one flow,
> which includes two parts:
> - key: the criteria of the same flow. If packets have the same key
>      value, they belong to the same flow.
> - start_index: the index of the first incoming packet of this flow in
>      the item array. With start_index, we can locate the first incoming
>      packet of this flow.
> Each element in the item array records one packet information. It mainly
> includes two parts:
> - pkt: packet address
> - next_pkt_index: index of the next packet of the same flow in the item
>      array. All packets of the same flow are chained by next_pkt_index.
>      With next_pkt_index, we can locate all packets of the same flow
>      one by one.
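
To make that layout concrete, the two arrays would look roughly like
this (struct names from this patch, fields inferred from the
description above; a sketch, not the actual header):

	struct gro_tcp_flow {
		struct gro_tcp_flow_key key; /* match criteria (type name assumed) */
		uint32_t start_index;    /* first packet of this flow in items[] */
	};

	struct gro_tcp_item {
		struct rte_mbuf *pkt;      /* stored packet */
		uint32_t next_pkt_index;   /* next packet of the same flow */
	};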
>
> To process an incoming packet, we need three steps:
> a. check if the packet should be processed. Packets with the following
>      properties won't be processed:
> 	- packets without data;
> 	- packets with wrong checksums;

Why do we care to check this kind of error? Can we just assume the 
applications have already dropped the packets with a wrong cksum?

> 	- fragmented packets.

IP fragmented? I don't think we need to check it here either. It's the 
application's responsibility to call librte_ip_frag first to 
reassemble IP-fragmented packets, and then call this GRO library to 
merge TCP packets. And this procedure should be shown in an example for 
other users to refer to; a rough sketch follows.
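
Something like this (hypothetical glue code; frag_tbl, death_row and
param are assumed to be created elsewhere, and the mbuf l2_len/l3_len
fields must be set for the frag library):

	for (i = 0; i < nb_rx; i++) {
		struct ether_hdr *eth_hdr = rte_pktmbuf_mtod(pkts[i],
				struct ether_hdr *);
		struct ipv4_hdr *ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);

		if (rte_ipv4_frag_pkt_is_fragmented(ipv4_hdr)) {
			pkts[i] = rte_ipv4_frag_reassemble_packet(frag_tbl,
					&death_row, pkts[i], rte_rdtsc(),
					ipv4_hdr);
			if (pkts[i] == NULL)
				continue; /* more fragments still expected */
		}
	}
	/* compact out the NULL slots, then merge the TCP packets */
	nb_rx = rte_gro_reassemble_burst(pkts, nb_rx, param);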

> b. traverse the flow array to find a flow which the packet belongs to.
>      If not find, insert a new flow and store the packet into the item
>      array.

You do not store the packet now. "store the packet into the item array" 
-> "then go to step c".

> c. locate the first packet of this flow in the item array via
>      start_index. Then traverse all packets of this flow one by one via
>      next_pkt_index. If find one packet to merge with the incoming packet,
>      merge them but without updating checksums. If not, allocate one item
>      in the item array to store the incoming packet and update
>      next_pkt_index value.
>
> For better performance, we don't update header checksums once two
> packets are merged. The header checksums are updated only when packets
> are flushed from TCP reassembly tables.

Why do we care to recalculate the L4 checksum when flushing? How about 
just keeping the wrong cksum, and letting the applications handle that?
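
If the library left the checksums stale, an application could recompute
them itself before transmitting, e.g. with the existing rte_ip.h
helpers (sketch; ipv4_hdr and tcp_hdr point into the merged packet):

	ipv4_hdr->hdr_checksum = 0;
	tcp_hdr->cksum = 0;
	tcp_hdr->cksum = rte_ipv4_udptcp_cksum(ipv4_hdr, tcp_hdr);
	ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);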


>
> Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
> ---
>   lib/librte_gro/Makefile      |   1 +
>   lib/librte_gro/rte_gro.c     | 154 +++++++++++--
>   lib/librte_gro/rte_gro.h     |  34 +--
>   lib/librte_gro/rte_gro_tcp.c | 527 +++++++++++++++++++++++++++++++++++++++++++
>   lib/librte_gro/rte_gro_tcp.h | 210 +++++++++++++++++
>   5 files changed, 895 insertions(+), 31 deletions(-)
>   create mode 100644 lib/librte_gro/rte_gro_tcp.c
>   create mode 100644 lib/librte_gro/rte_gro_tcp.h
>
> diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
> index 9f4063a..3495dfc 100644
> --- a/lib/librte_gro/Makefile
> +++ b/lib/librte_gro/Makefile
> @@ -43,6 +43,7 @@ LIBABIVER := 1
>   
>   # source files
>   SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
> +SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro_tcp.c

Again, if it's just for tcp4, please use the name rte_gro_tcp4.c.

>   
>   # install this header file
>   SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
> diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
> index 1bc53a2..2620ef6 100644
> --- a/lib/librte_gro/rte_gro.c
> +++ b/lib/librte_gro/rte_gro.c
> @@ -32,11 +32,17 @@
>   
>   #include <rte_malloc.h>
>   #include <rte_mbuf.h>
> +#include <rte_ethdev.h>
> +#include <rte_ip.h>
> +#include <rte_tcp.h>
>   
>   #include "rte_gro.h"
> +#include "rte_gro_tcp.h"
>   
> -static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB];
> -static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NB];
> +static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB] = {
> +	gro_tcp_tbl_create, NULL};
> +static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NB] = {
> +	gro_tcp_tbl_destroy, NULL};
>   
>   struct rte_gro_tbl *rte_gro_tbl_create(uint16_t socket_id,
>   		uint16_t max_flow_num,
> @@ -94,33 +100,149 @@ void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl)
>   }
>   
>   uint16_t
> -rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
> +rte_gro_reassemble_burst(struct rte_mbuf **pkts,
>   		const uint16_t nb_pkts,
> -		const struct rte_gro_param param __rte_unused)
> +		const struct rte_gro_param param)
>   {
> -	return nb_pkts;
> +	struct ether_hdr *eth_hdr;
> +	struct ipv4_hdr *ipv4_hdr;
> +	uint16_t l3proc_type, i;

I did not catch the variable definition here: l3proc_type -> l3_proto?

> +	uint16_t nb_after_gro = nb_pkts;
> +	uint16_t flow_num = nb_pkts < param.max_flow_num ?
> +		nb_pkts : param.max_flow_num;
> +	uint32_t item_num = nb_pkts <
> +		flow_num * param.max_item_per_flow ?
> +		nb_pkts :
> +		flow_num * param.max_item_per_flow;
> +
> +	/* allocate a reassembly table for TCP/IPv4 GRO */
> +	uint16_t tcp_flow_num = flow_num <= GRO_TCP_TBL_MAX_FLOW_NUM ?
> +		flow_num : GRO_TCP_TBL_MAX_FLOW_NUM;
> +	uint32_t tcp_item_num = item_num <= GRO_TCP_TBL_MAX_ITEM_NUM ?
> +		item_num : GRO_TCP_TBL_MAX_ITEM_NUM;

The tcp4-specific logic below should be in rte_gro_tcp4.c; here, as per 
my previous comment, we should iterate over the ptypes of the packets 
to invoke all supported GRO engines.

> +	struct gro_tcp_tbl tcp_tbl;
> +	struct gro_tcp_flow tcp_flows[tcp_flow_num];
> +	struct gro_tcp_item tcp_items[tcp_item_num];
> +	struct gro_tcp_rule tcp_rule;
> +
> +	struct rte_mbuf *unprocess_pkts[nb_pkts];
> +	uint16_t unprocess_num = 0;
> +	int32_t ret;
> +
> +	if (unlikely(nb_pkts <= 1))
> +		return nb_pkts;
> +
> +	memset(tcp_flows, 0, sizeof(struct gro_tcp_flow) *
> +			tcp_flow_num);
> +	memset(tcp_items, 0, sizeof(struct gro_tcp_item) *
> +			tcp_item_num);
> +	tcp_tbl.flows = tcp_flows;
> +	tcp_tbl.items = tcp_items;
> +	tcp_tbl.flow_num = 0;
> +	tcp_tbl.item_num = 0;
> +	tcp_tbl.max_flow_num = tcp_flow_num;
> +	tcp_tbl.max_item_num = tcp_item_num;
> +	tcp_rule.max_packet_size = param.max_packet_size;
> +
> +	for (i = 0; i < nb_pkts; i++) {
> +		eth_hdr = rte_pktmbuf_mtod(pkts[i], struct ether_hdr *);
> +		l3proc_type = rte_be_to_cpu_16(eth_hdr->ether_type);
> +		if (l3proc_type == ETHER_TYPE_IPv4) {
> +			ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
> +			if (ipv4_hdr->next_proto_id == IPPROTO_TCP &&
> +					(param.desired_gro_types &
> +					 GRO_TCP_IPV4)) {
> +				ret = gro_tcp4_reassemble(pkts[i],
> +						&tcp_tbl,
> +						&tcp_rule);
> +				if (ret > 0)
> +					nb_after_gro--;
> +				else if (ret < 0)
> +					unprocess_pkts[unprocess_num++] =
> +						pkts[i];
> +			} else
> +				unprocess_pkts[unprocess_num++] =
> +					pkts[i];
> +		} else
> +			unprocess_pkts[unprocess_num++] =
> +				pkts[i];
> +	}
> +
> +	if (nb_after_gro < nb_pkts) {
> +		/* update packets headers and re-arrange GROed packets */
> +		if (param.desired_gro_types & GRO_TCP_IPV4) {
> +			gro_tcp4_tbl_cksum_update(&tcp_tbl);
> +			for (i = 0; i < tcp_tbl.item_num; i++)
> +				pkts[i] = tcp_tbl.items[i].pkt;
> +		}
> +		if (unprocess_num > 0) {
> +			memcpy(&pkts[i], unprocess_pkts,
> +					sizeof(struct rte_mbuf *) *
> +					unprocess_num);
> +			i += unprocess_num;
> +		}
> +		if (nb_pkts > i)
> +			memset(&pkts[i], 0,
> +					sizeof(struct rte_mbuf *) *
> +					(nb_pkts - i));
> +	}
> +	return nb_after_gro;
>   }
>   
> -int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
> -		struct rte_gro_tbl *gro_tbl __rte_unused)
> +int rte_gro_reassemble(struct rte_mbuf *pkt,
> +		struct rte_gro_tbl *gro_tbl)
>   {
> +	struct ether_hdr *eth_hdr;
> +	struct ipv4_hdr *ipv4_hdr;
> +	uint16_t l3proc_type;
> +	struct gro_tcp_rule tcp_rule;
> +
> +	if (pkt == NULL)
> +		return -1;
> +	tcp_rule.max_packet_size = gro_tbl->max_packet_size;
> +	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
> +	l3proc_type = rte_be_to_cpu_16(eth_hdr->ether_type);
> +	if (l3proc_type == ETHER_TYPE_IPv4) {
> +		ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
> +		if (ipv4_hdr->next_proto_id == IPPROTO_TCP &&
> +				(gro_tbl->desired_gro_types & GRO_TCP_IPV4)) {
> +			return gro_tcp4_reassemble(pkt,
> +					gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
> +					&tcp_rule);
> +		}
> +	}
>   	return -1;
>   }
>   
> -uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> -		uint64_t desired_gro_types __rte_unused,
> -		uint16_t flush_num __rte_unused,
> -		struct rte_mbuf **out __rte_unused,
> -		const uint16_t max_nb_out __rte_unused)
> +uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl,
> +		uint64_t desired_gro_types,
> +		uint16_t flush_num,
> +		struct rte_mbuf **out,
> +		const uint16_t max_nb_out)
>   {

Ditto.

> +	desired_gro_types = desired_gro_types &
> +		gro_tbl->desired_gro_types;
> +	if (desired_gro_types & GRO_TCP_IPV4)
> +		return gro_tcp_tbl_flush(
> +				gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
> +				flush_num,
> +				out,
> +				max_nb_out);
>   	return 0;
>   }
>   
>   uint16_t
> -rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> -		uint64_t desired_gro_types __rte_unused,
> -		struct rte_mbuf **out __rte_unused,
> -		const uint16_t max_nb_out __rte_unused)
> +rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl,
> +		uint64_t desired_gro_types,
> +		struct rte_mbuf **out,
> +		const uint16_t max_nb_out)
>   {
> +	desired_gro_types = desired_gro_types &
> +		gro_tbl->desired_gro_types;
> +	if (desired_gro_types & GRO_TCP_IPV4)
> +		return gro_tcp_tbl_timeout_flush(
> +				gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
> +				gro_tbl->max_timeout_cycles,
> +				out, max_nb_out);
>   	return 0;
>   }
> diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
> index 67bd90d..e26aa5b 100644
> --- a/lib/librte_gro/rte_gro.h
> +++ b/lib/librte_gro/rte_gro.h
> @@ -35,7 +35,11 @@
>   
>   /* maximum number of supported GRO types */
>   #define GRO_TYPE_MAX_NB 64
> -#define GRO_TYPE_SUPPORT_NB 0	/**< current supported GRO num */
> +#define GRO_TYPE_SUPPORT_NB 1	/**< supported GRO types number */
> +
> +/* TCP/IPv4 GRO flag */
> +#define GRO_TCP_IPV4_INDEX 0
> +#define GRO_TCP_IPV4 (1ULL << GRO_TCP_IPV4_INDEX)
>   
>   /**
>    * GRO table structure. DPDK GRO uses GRO table to reassemble
> @@ -139,9 +143,9 @@ void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl);
>    * @return
>    *  the number of packets after GRO.
>    */
> -uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
> -		const uint16_t nb_pkts __rte_unused,
> -		const struct rte_gro_param param __rte_unused);
> +uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts,
> +		const uint16_t nb_pkts,
> +		const struct rte_gro_param param);
>   
>   /**
>    * This is the main reassembly API used in heavyweight mode, which
> @@ -164,8 +168,8 @@ uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
>    *  if the packet is merged successfully, return a positive value. If it
>    *  fails to merge, return zero. If errors happen, return a negative value.
>    */
> -int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
> -		struct rte_gro_tbl *gro_tbl __rte_unused);
> +int rte_gro_reassemble(struct rte_mbuf *pkt,
> +		struct rte_gro_tbl *gro_tbl);
>   
>   /**
>    * This function flushes packets of desired GRO types from their
> @@ -184,11 +188,11 @@ int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
>    * @return
>    *  the number of flushed packets. If no packets are flushed, return 0.
>    */
> -uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> -		uint64_t desired_gro_types __rte_unused,
> -		uint16_t flush_num __rte_unused,
> -		struct rte_mbuf **out __rte_unused,
> -		const uint16_t max_nb_out __rte_unused);
> +uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl,
> +		uint64_t desired_gro_types,
> +		uint16_t flush_num,
> +		struct rte_mbuf **out,
> +		const uint16_t max_nb_out);
>   
>   /**
>    * This function flushes the timeout packets from reassembly tables of
> @@ -206,8 +210,8 @@ uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
>    * @return
>    *  the number of flushed packets. If no packets are flushed, return 0.
>    */
> -uint16_t rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> -		uint64_t desired_gro_types __rte_unused,
> -		struct rte_mbuf **out __rte_unused,
> -		const uint16_t max_nb_out __rte_unused);
> +uint16_t rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl,
> +		uint64_t desired_gro_types,
> +		struct rte_mbuf **out,
> +		const uint16_t max_nb_out);

Do you have any cases to test this API? I don't see the following example 
use this API. That means we are exposing an API that is never tested. I 
don't know if we can add some experimental flag to this API. Let's seek 
advice from others.

>   #endif
> diff --git a/lib/librte_gro/rte_gro_tcp.c b/lib/librte_gro/rte_gro_tcp.c
> new file mode 100644
> index 0000000..86743cd
> --- /dev/null
> +++ b/lib/librte_gro/rte_gro_tcp.c
> @@ -0,0 +1,527 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#include <rte_malloc.h>
> +#include <rte_mbuf.h>
> +#include <rte_cycles.h>
> +
> +#include <rte_ethdev.h>
> +#include <rte_ip.h>
> +#include <rte_tcp.h>
> +
> +#include "rte_gro_tcp.h"
> +
> +void *gro_tcp_tbl_create(uint16_t socket_id,

Define it as "static". The same applies to the other functions.

> +		uint16_t max_flow_num,
> +		uint16_t max_item_per_flow)
> +{
> +	size_t size;
> +	uint32_t entries_num;
> +	struct gro_tcp_tbl *tbl;
> +
> +	max_flow_num = max_flow_num > GRO_TCP_TBL_MAX_FLOW_NUM ?
> +		GRO_TCP_TBL_MAX_FLOW_NUM : max_flow_num;
> +
> +	entries_num = max_flow_num * max_item_per_flow;
> +	entries_num = entries_num > GRO_TCP_TBL_MAX_ITEM_NUM ?
> +		GRO_TCP_TBL_MAX_ITEM_NUM : entries_num;
> +
> +	if (entries_num == 0 || max_flow_num == 0)
> +		return NULL;
> +
> +	tbl = (struct gro_tcp_tbl *)rte_zmalloc_socket(
> +			__func__,
> +			sizeof(struct gro_tcp_tbl),
> +			RTE_CACHE_LINE_SIZE,
> +			socket_id);
> +
> +	size = sizeof(struct gro_tcp_item) * entries_num;
> +	tbl->items = (struct gro_tcp_item *)rte_zmalloc_socket(
> +			__func__,
> +			size,
> +			RTE_CACHE_LINE_SIZE,
> +			socket_id);
> +	tbl->max_item_num = entries_num;
> +
> +	size = sizeof(struct gro_tcp_flow) * max_flow_num;
> +	tbl->flows = (struct gro_tcp_flow *)rte_zmalloc_socket(
> +			__func__,
> +			size, RTE_CACHE_LINE_SIZE,
> +			socket_id);
> +	tbl->max_flow_num = max_flow_num;
> +	return tbl;
> +}
> +
> +void gro_tcp_tbl_destroy(void *tbl)
> +{
> +	struct gro_tcp_tbl *tcp_tbl = (struct gro_tcp_tbl *)tbl;
> +
> +	if (tcp_tbl) {
> +		if (tcp_tbl->items)
> +			rte_free(tcp_tbl->items);
> +		if (tcp_tbl->flows)
> +			rte_free(tcp_tbl->flows);
> +		rte_free(tcp_tbl);
> +	}
> +}
> +
> +/* update TCP header and IPv4 header checksum */
> +static void
> +gro_tcp4_cksum_update(struct rte_mbuf *pkt)
> +{
> +	uint32_t len, offset, cksum;
> +	struct ether_hdr *eth_hdr;
> +	struct ipv4_hdr *ipv4_hdr;
> +	struct tcp_hdr *tcp_hdr;
> +	uint16_t ipv4_ihl, cksum_pld;
> +
> +	if (pkt == NULL)
> +		return;
> +
> +	len = pkt->pkt_len;
> +	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
> +	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
> +	ipv4_ihl = IPv4_HDR_LEN(ipv4_hdr);
> +	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl);
> +
> +	offset = sizeof(struct ether_hdr) + ipv4_ihl;
> +	len -= offset;
> +
> +	/* TCP cksum without IP pseudo header */
> +	ipv4_hdr->hdr_checksum = 0;
> +	tcp_hdr->cksum = 0;
> +	rte_raw_cksum_mbuf(pkt, offset, len, &cksum_pld);
> +
> +	/* IP pseudo header cksum */
> +	cksum = cksum_pld;
> +	cksum += rte_ipv4_phdr_cksum(ipv4_hdr, 0);
> +
> +	/* combine TCP checksum and IP pseudo header checksum */
> +	cksum = ((cksum & 0xffff0000) >> 16) + (cksum & 0xffff);
> +	cksum = (~cksum) & 0xffff;
> +	cksum = (cksum == 0) ? 0xffff : cksum;
> +	tcp_hdr->cksum = cksum;
> +
> +	/* update IP header cksum */
> +	ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);
> +}
> +
> +void gro_tcp4_tbl_cksum_update(struct gro_tcp_tbl *tbl)
> +{
> +	uint32_t i;
> +	uint32_t item_num = tbl->item_num;
> +
> +	for (i = 0; i < tbl->max_item_num; i++) {
> +		if (tbl->items[i].is_valid) {
> +			item_num--;
> +			if (tbl->items[i].is_groed)
> +				gro_tcp4_cksum_update(tbl->items[i].pkt);
> +		}
> +		if (unlikely(item_num == 0))
> +			break;
> +	}
> +}
> +
> +/**
> + * merge two TCP/IPv4 packets without updating header checksums.
> + */
> +static int
> +merge_two_tcp4_packets(struct rte_mbuf *pkt_src,
> +		struct rte_mbuf *pkt,
> +		struct gro_tcp_rule *rule)
> +{
> +	struct ipv4_hdr *ipv4_hdr1, *ipv4_hdr2;
> +	struct tcp_hdr *tcp_hdr1;
> +	uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1;
> +	struct rte_mbuf *tail;
> +
> +	/* parse the given packet */
> +	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
> +				struct ether_hdr *) + 1);
> +	ipv4_ihl1 = IPv4_HDR_LEN(ipv4_hdr1);
> +	tcp_hdr1 = (struct tcp_hdr *)((char *)ipv4_hdr1 + ipv4_ihl1);
> +	tcp_hl1 = TCP_HDR_LEN(tcp_hdr1);
> +	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) - ipv4_ihl1
> +		- tcp_hl1;
> +
> +	/* parse the original packet */
> +	ipv4_hdr2 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt_src,
> +				struct ether_hdr *) + 1);
> +
> +	/* check reassembly rules */
> +	if (pkt_src->pkt_len + tcp_dl1 > rule->max_packet_size)
> +		return -1;
> +
> +	/* remove the header of the incoming packet */
> +	rte_pktmbuf_adj(pkt, sizeof(struct ether_hdr) +
> +			ipv4_ihl1 + tcp_hl1);
> +
> +	/* chain the two packet together */
> +	tail = rte_pktmbuf_lastseg(pkt_src);
> +	tail->next = pkt;
> +
> +	/* update IP header */
> +	ipv4_hdr2->total_length = rte_cpu_to_be_16(
> +			rte_be_to_cpu_16(
> +				ipv4_hdr2->total_length)
> +			+ tcp_dl1);
> +
> +	/* update mbuf metadata for the merged packet */
> +	pkt_src->nb_segs++;
> +	pkt_src->pkt_len += pkt->pkt_len;
> +	return 1;
> +}
> +
> +static int
> +check_seq_option(struct rte_mbuf *pkt,
> +		struct tcp_hdr *tcp_hdr,
> +		uint16_t tcp_hl)
> +{
> +	struct ipv4_hdr *ipv4_hdr1;
> +	struct tcp_hdr *tcp_hdr1;
> +	uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1;
> +	uint32_t sent_seq1, sent_seq;
> +	int ret = -1;
> +
> +	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
> +				struct ether_hdr *) + 1);
> +	ipv4_ihl1 = IPv4_HDR_LEN(ipv4_hdr1);
> +	tcp_hdr1 = (struct tcp_hdr *)((char *)ipv4_hdr1 + ipv4_ihl1);
> +	tcp_hl1 = TCP_HDR_LEN(tcp_hdr1);
> +	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) - ipv4_ihl1
> +		- tcp_hl1;
> +	sent_seq1 = rte_be_to_cpu_32(tcp_hdr1->sent_seq) + tcp_dl1;
> +	sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq);
> +
> +	/* check if the two packets are neighbor */
> +	if ((sent_seq ^ sent_seq1) == 0) {
> +		/* check if TCP option field equals */
> +		if (tcp_hl1 > sizeof(struct tcp_hdr)) {
> +			if ((tcp_hl1 != tcp_hl) ||
> +					(memcmp(tcp_hdr1 + 1,
> +							tcp_hdr + 1,
> +							tcp_hl - sizeof
> +							(struct tcp_hdr))
> +					 == 0))
> +				ret = 1;
> +		}
> +	}
> +	return ret;
> +}
> +
> +static uint32_t
> +find_an_empty_item(struct gro_tcp_tbl *tbl)
> +{
> +	uint32_t i;
> +
> +	for (i = 0; i < tbl->max_item_num; i++)
> +		if (tbl->items[i].is_valid == 0)
> +			return i;
> +	return INVALID_ITEM_INDEX;
> +}
> +
> +static uint16_t
> +find_an_empty_flow(struct gro_tcp_tbl *tbl)
> +{
> +	uint16_t i;
> +
> +	for (i = 0; i < tbl->max_flow_num; i++)
> +		if (tbl->flows[i].is_valid == 0)
> +			return i;
> +	return INVALID_FLOW_INDEX;
> +}
> +
> +int32_t
> +gro_tcp4_reassemble(struct rte_mbuf *pkt,
> +		struct gro_tcp_tbl *tbl,
> +		struct gro_tcp_rule *rule)
> +{
> +	struct ether_hdr *eth_hdr;
> +	struct ipv4_hdr *ipv4_hdr;
> +	struct tcp_hdr *tcp_hdr;
> +	uint16_t ipv4_ihl, tcp_hl, tcp_dl, tcp_cksum, ip_cksum;
> +
> +	struct gro_tcp_flow_key key;
> +	uint64_t ol_flags;
> +	uint32_t cur_idx, prev_idx, item_idx;
> +	uint16_t i, flow_idx;
> +
> +	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
> +	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
> +	ipv4_ihl = IPv4_HDR_LEN(ipv4_hdr);
> +
> +	/* 1. check if the packet should be processed */
> +	if (ipv4_ihl < sizeof(struct ipv4_hdr))
> +		goto fail;
> +	if (ipv4_hdr->next_proto_id != IPPROTO_TCP)
> +		goto fail;
> +	if ((ipv4_hdr->fragment_offset &
> +				rte_cpu_to_be_16(IPV4_HDR_DF_MASK))
> +			== 0)
> +		goto fail;
> +
> +	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl);
> +	tcp_hl = TCP_HDR_LEN(tcp_hdr);
> +	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - ipv4_ihl
> +		- tcp_hl;
> +	if (tcp_dl == 0)
> +		goto fail;
> +
> +	/**
> +	 * 2. if HW rx checksum offload isn't enabled, recalculate the
> +	 * checksum in SW. Then, check if the checksum is correct
> +	 */
> +	ol_flags = pkt->ol_flags;
> +	if ((ol_flags & PKT_RX_IP_CKSUM_MASK) !=
> +			PKT_RX_IP_CKSUM_UNKNOWN) {
> +		if (ol_flags == PKT_RX_IP_CKSUM_BAD)
> +			goto fail;
> +	} else {
> +		ip_cksum = ipv4_hdr->hdr_checksum;
> +		ipv4_hdr->hdr_checksum = 0;
> +		ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);
> +		if (ipv4_hdr->hdr_checksum ^ ip_cksum)
> +			goto fail;
> +	}
> +
> +	if ((ol_flags & PKT_RX_L4_CKSUM_MASK) !=
> +			PKT_RX_L4_CKSUM_UNKNOWN) {
> +		if (ol_flags == PKT_RX_L4_CKSUM_BAD)
> +			goto fail;
> +	} else {
> +		tcp_cksum = tcp_hdr->cksum;
> +		tcp_hdr->cksum = 0;
> +		tcp_hdr->cksum = rte_ipv4_udptcp_cksum
> +			(ipv4_hdr, tcp_hdr);
> +		if (tcp_hdr->cksum ^ tcp_cksum)
> +			goto fail;
> +	}
> +
> +	/**
> +	 * 3. search for a flow and traverse all packets in the flow
> +	 * to find one to merge with the given packet.
> +	 */
> +	key.eth_saddr = eth_hdr->s_addr;
> +	key.eth_daddr = eth_hdr->d_addr;
> +	key.ip_src_addr[0] = rte_be_to_cpu_32(ipv4_hdr->src_addr);
> +	key.ip_dst_addr[0] = rte_be_to_cpu_32(ipv4_hdr->dst_addr);
> +	key.src_port = rte_be_to_cpu_16(tcp_hdr->src_port);
> +	key.dst_port = rte_be_to_cpu_16(tcp_hdr->dst_port);
> +	key.recv_ack = rte_be_to_cpu_32(tcp_hdr->recv_ack);
> +	key.tcp_flags = tcp_hdr->tcp_flags;
> +
> +	for (i = 0; i < tbl->max_flow_num; i++) {
> +		/* search all packets in a valid flow. */
> +		if (tbl->flows[i].is_valid &&
> +				(memcmp(&(tbl->flows[i].key), &key,
> +						sizeof(struct gro_tcp_flow_key))
> +				 == 0)) {
> +			cur_idx = tbl->flows[i].start_index;
> +			prev_idx = cur_idx;
> +			while (cur_idx != INVALID_ITEM_INDEX) {
> +				if (check_seq_option(tbl->items[cur_idx].pkt,
> +							tcp_hdr,
> +							tcp_hl) > 0) {
> +					if (merge_two_tcp4_packets(
> +								tbl->items[cur_idx].pkt,
> +								pkt,
> +								rule) > 0) {
> +						/* successfully merge two packets */
> +						tbl->items[cur_idx].is_groed = 1;
> +						return 1;
> +					}
> +					/**
> +					 * failed to merge the two packets as
> +					 * merging would break the rules; add
> +					 * the packet into the flow.
> +					 */
> +					goto insert_to_existed_flow;
> +				} else {
> +					prev_idx = cur_idx;
> +					cur_idx = tbl->items[cur_idx].next_pkt_idx;
> +				}
> +			}
> +			/**
> +			 * failed to merge the given packet into an existing flow,
> +			 * add it into the flow.
> +			 */
> +insert_to_existed_flow:
> +			item_idx = find_an_empty_item(tbl);
> +			/* the item number is beyond the maximum value */
> +			if (item_idx == INVALID_ITEM_INDEX)
> +				return -1;
> +			tbl->items[prev_idx].next_pkt_idx = item_idx;
> +			tbl->items[item_idx].pkt = pkt;
> +			tbl->items[item_idx].is_groed = 0;
> +			tbl->items[item_idx].next_pkt_idx = INVALID_ITEM_INDEX;
> +			tbl->items[item_idx].is_valid = 1;
> +			tbl->items[item_idx].start_time = rte_rdtsc();
> +			tbl->item_num++;
> +			return 0;
> +		}
> +	}
> +
> +	/**
> +	 * the merge failed as the given packet belongs to a new flow.
> +	 * Therefore, insert a new flow.
> +	 */
> +	item_idx = find_an_empty_item(tbl);
> +	flow_idx = find_an_empty_flow(tbl);
> +	/**
> +	 * if the flow or item numbers are beyond the maximum values,
> +	 * the inputted packet won't be processed.
> +	 */
> +	if (item_idx == INVALID_ITEM_INDEX ||
> +			flow_idx == INVALID_FLOW_INDEX)
> +		return -1;
> +	tbl->items[item_idx].pkt = pkt;
> +	tbl->items[item_idx].next_pkt_idx = INVALID_ITEM_INDEX;
> +	tbl->items[item_idx].is_groed = 0;
> +	tbl->items[item_idx].is_valid = 1;
> +	tbl->items[item_idx].start_time = rte_rdtsc();
> +	tbl->item_num++;
> +
> +	memcpy(&(tbl->flows[flow_idx].key),
> +			&key, sizeof(struct gro_tcp_flow_key));
> +	tbl->flows[flow_idx].start_index = item_idx;
> +	tbl->flows[flow_idx].is_valid = 1;
> +	tbl->flow_num++;
> +
> +	return 0;
> +fail:
> +	return -1;
> +}
> +
> +uint16_t gro_tcp_tbl_flush(struct gro_tcp_tbl *tbl,
> +		uint16_t flush_num,
> +		struct rte_mbuf **out,
> +		const uint16_t nb_out)
> +{
> +	uint16_t num, k;
> +	uint16_t i;
> +	uint32_t j;
> +
> +	k = 0;
> +	num = tbl->item_num > flush_num ? flush_num : tbl->item_num;
> +	num = num > nb_out ? nb_out : num;
> +	if (unlikely(num == 0))
> +		return 0;
> +
> +	for (i = 0; i < tbl->max_flow_num; i++) {
> +		if (tbl->flows[i].is_valid) {
> +			j = tbl->flows[i].start_index;
> +			while (j != INVALID_ITEM_INDEX) {
> +				/* update checksum for GROed packet */
> +				if (tbl->items[j].is_groed)
> +					gro_tcp4_cksum_update(tbl->items[j].pkt);
> +
> +				out[k++] = tbl->items[j].pkt;
> +				tbl->items[j].is_valid = 0;
> +				tbl->item_num--;
> +				j = tbl->items[j].next_pkt_idx;
> +
> +				if (k == num) {
> +					/* delete the flow */
> +					if (j == INVALID_ITEM_INDEX) {
> +						tbl->flows[i].is_valid = 0;
> +						tbl->flow_num--;
> +					} else
> +						/* update flow information */
> +						tbl->flows[i].start_index = j;
> +					goto end;
> +				}
> +			}
> +			/* delete the flow, as all of its packets are flushed */
> +			tbl->flows[i].is_valid = 0;
> +			tbl->flow_num--;
> +		}
> +		if (tbl->flow_num == 0)
> +			goto end;
> +	}
> +end:
> +	return num;
> +}
> +
> +uint16_t
> +gro_tcp_tbl_timeout_flush(struct gro_tcp_tbl *tbl,
> +		uint64_t timeout_cycles,
> +		struct rte_mbuf **out,
> +		const uint16_t nb_out)
> +{
> +	uint16_t k;
> +	uint16_t i;
> +	uint32_t j;
> +	uint64_t current_time;
> +
> +	if (nb_out == 0)
> +		return 0;
> +	k = 0;
> +	current_time = rte_rdtsc();
> +
> +	for (i = 0; i < tbl->max_flow_num; i++) {
> +		if (tbl->flows[i].is_valid) {
> +			j = tbl->flows[i].start_index;
> +			while (j != INVALID_ITEM_INDEX) {
> +				if (current_time - tbl->items[j].start_time >=
> +						timeout_cycles) {
> +					/* update checksum for GROed packet */
> +					if (tbl->items[j].is_groed)
> +						gro_tcp4_cksum_update(tbl->items[j].pkt);
> +
> +					out[k++] = tbl->items[j].pkt;
> +					tbl->items[j].is_valid = 0;
> +					tbl->item_num--;
> +					j = tbl->items[j].next_pkt_idx;
> +
> +					if (k == nb_out &&
> +							j == INVALID_ITEM_INDEX) {
> +						/* delete the flow */
> +						tbl->flows[i].is_valid = 0;
> +						tbl->flow_num--;
> +						goto end;
> +					} else if (k == nb_out &&
> +							j != INVALID_ITEM_INDEX) {
> +						tbl->flows[i].start_index = j;
> +						goto end;
> +					}
> +				}
> +			}
> +			/* delete the flow, as all of its packets are flushed */
> +			tbl->flows[i].is_valid = 0;
> +			tbl->flow_num--;
> +		}
> +		if (tbl->flow_num == 0)
> +			goto end;
> +	}
> +end:
> +	return k;
> +}
> diff --git a/lib/librte_gro/rte_gro_tcp.h b/lib/librte_gro/rte_gro_tcp.h
> new file mode 100644
> index 0000000..551efc4
> --- /dev/null
> +++ b/lib/librte_gro/rte_gro_tcp.h
> @@ -0,0 +1,210 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef _RTE_GRO_TCP_H_
> +#define _RTE_GRO_TCP_H_
> +
> +#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
> +#define TCP_HDR_LEN(tcph) \
> +	((tcph->data_off >> 4) * 4)
> +#define IPv4_HDR_LEN(iph) \
> +	((iph->version_ihl & 0x0f) * 4)
> +#else
> +#define TCP_DATAOFF_MASK 0x0f
> +#define TCP_HDR_LEN(tcph) \
> +	((tcph->data_off & TCP_DATAOFF_MASK) * 4)
> +#define IPv4_HDR_LEN(iph) \
> +	((iph->version_ihl >> 4) * 4)
> +#endif
> +
> +#define IPV4_HDR_DF_SHIFT 14
> +#define IPV4_HDR_DF_MASK (1 << IPV4_HDR_DF_SHIFT)
> +
> +#define INVALID_FLOW_INDEX 0xffffU
> +#define INVALID_ITEM_INDEX 0xffffffffUL
> +
> +#define GRO_TCP_TBL_MAX_FLOW_NUM (UINT16_MAX - 1)
> +#define GRO_TCP_TBL_MAX_ITEM_NUM (UINT32_MAX - 1)
> +
> +/* criteria for merging packets */
> +struct gro_tcp_flow_key {
> +	struct ether_addr eth_saddr;
> +	struct ether_addr eth_daddr;
> +	uint32_t ip_src_addr[4];	/**< IPv4 uses the first 8B */
> +	uint32_t ip_dst_addr[4];
> +
> +	uint32_t recv_ack;	/**< acknowledgment sequence number. */
> +	uint16_t src_port;
> +	uint16_t dst_port;
> +	uint8_t tcp_flags;	/**< TCP flags. */
> +};
> +
> +struct gro_tcp_flow {
> +	struct gro_tcp_flow_key key;
> +	uint32_t start_index;	/**< the first packet index of the flow */
> +	uint8_t is_valid;
> +};
> +
> +struct gro_tcp_item {
> +	struct rte_mbuf *pkt;	/**< packet address. */
> +	/* the time when the packet is added into the table */
> +	uint64_t start_time;
> +	uint32_t next_pkt_idx;	/**< next packet index. */
> +	/* flag to indicate if the packet is GROed */
> +	uint8_t is_groed;
> +	uint8_t is_valid;	/**< flag indicates if the item is valid */
> +};
> +
> +/**
> + * TCP reassembly table. Both TCP/IPv4 and TCP/IPv6 use the same table
> + * structure.
> + */
> +struct gro_tcp_tbl {
> +	struct gro_tcp_item *items;	/**< item array */
> +	struct gro_tcp_flow *flows;	/**< flow array */
> +	uint32_t item_num;	/**< current item number */
> +	uint16_t flow_num;	/**< current flow num */
> +	uint32_t max_item_num;	/**< item array size */
> +	uint16_t max_flow_num;	/**< flow array size */
> +};
> +
> +/* rules to reassemble TCP packets, which are decided by applications */
> +struct gro_tcp_rule {
> +	/* the maximum packet length after merged */
> +	uint32_t max_packet_size;
> +};

Are there any other rules? If not, I prefer to use max_packet_size directly.
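
For illustration, that would shrink the reassembly prototype to
something like this sketch:

	int32_t
	gro_tcp4_reassemble(struct rte_mbuf *pkt,
			struct gro_tcp_tbl *tbl,
			uint32_t max_packet_size);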

> +
> +/**
> + * This function is to update TCP and IPv4 header checksums
> + * for merged packets in the TCP reassembly table.
> + */
> +void gro_tcp4_tbl_cksum_update(struct gro_tcp_tbl *tbl);
> +
> +/**
> + * This function creates a TCP reassembly table.
> + *
> + * @param socket_id
> + *  socket index where the Ethernet port connects to.
> + * @param max_flow_num
> + *  the maximum number of flows in the TCP GRO table
> + * @param max_item_per_flow
> + *  the maximum packet number per flow.
> + * @return
> + *  if created successfully, return a pointer to the created TCP GRO
> + *  table. Otherwise, return NULL.
> + */
> +void *gro_tcp_tbl_create(uint16_t socket_id,
> +		uint16_t max_flow_num,
> +		uint16_t max_item_per_flow);
> +
> +/**
> + * This function destroys a TCP reassembly table.
> + * @param tbl
> + *  a pointer points to the TCP reassembly table.
> + */
> +void gro_tcp_tbl_destroy(void *tbl);
> +
> +/**
> + * This function searches for a packet in the TCP reassembly table to
> + * merge with the inputted one. To merge two packets is to chain them
> + * together and update packet headers. Note that this function won't
> + * re-calculate IPv4 and TCP checksums.
> + *
> + * If the packet doesn't have data, or with wrong checksums, or is
> + * fragmented etc., errors happen and gro_tcp4_reassemble returns
> + * immediately. If no errors happen, the packet is either merged, or
> + * inserted into the reassembly table.
> + *
> + * If applications want to get packets in the reassembly table, they
> + * need to manually flush the packets.
> + *
> + * @param pkt
> + *  packet to reassemble.
> + * @param tbl
> + *  a pointer that points to a TCP reassembly table.
> + * @param rule
> + *  TCP reassembly criteria defined by applications.
> + * @return
> + *  if the inputted packet is merged successfully, return a positive
> + *  value. If the packet hasn't been merged with any packet in the TCP
> + *  reassembly table, return zero. If errors happen, return a negative
> + *  value and the packet won't be inserted into the reassembly table.
> + */
> +int32_t
> +gro_tcp4_reassemble(struct rte_mbuf *pkt,
> +		struct gro_tcp_tbl *tbl,
> +		struct gro_tcp_rule *rule);
> +
> +/**
> + * This function flushes the packets in a TCP reassembly table to
> + * applications. Before returning the packets, it will update TCP and
> + * IPv4 header checksums.
> + *
> + * @param tbl
> + *  a pointer that points to a TCP GRO table.
> + * @param flush_num
> + *  the number of packets that applications want to flush.
> + * @param out
> + *  pointer array which is used to keep flushed packets.
> + * @param nb_out
> + *  the maximum element number of out.
> + * @return
> + *  the number of packets that are flushed finally.
> + */
> +uint16_t
> +gro_tcp_tbl_flush(struct gro_tcp_tbl *tbl,
> +		uint16_t flush_num,
> +		struct rte_mbuf **out,
> +		const uint16_t nb_out);
> +
> +/**
> + * This function flushes timeout packets in a TCP reassembly table to
> + * applications. Before returning the packets, it updates TCP and IPv4
> + * header checksums.
> + *
> + * @param tbl
> + *  a pointer that points to a TCP GRO table.
> + * @param timeout_cycles
> + *  the maximum time that packets can stay in the table.
> + * @param out
> + *  pointer array which is used to keep flushed packets.
> + * @param nb_out
> + *  the maximum element number of out.
> + * @return
> + *  It returns the number of packets that are flushed finally.
> + */
> +uint16_t
> +gro_tcp_tbl_timeout_flush(struct gro_tcp_tbl *tbl,
> +		uint64_t timeout_cycles,
> +		struct rte_mbuf **out,
> +		const uint16_t nb_out);
> +#endif

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v5 1/3] lib: add Generic Receive Offload API framework
  2017-06-18  7:21         ` [PATCH v5 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
  2017-06-19  4:03           ` Tiwei Bie
  2017-06-19 15:43           ` Tan, Jianfeng
@ 2017-06-19 15:55           ` Stephen Hemminger
  2017-06-20  1:48             ` Jiayu Hu
  2 siblings, 1 reply; 141+ messages in thread
From: Stephen Hemminger @ 2017-06-19 15:55 UTC (permalink / raw)
  To: Jiayu Hu
  Cc: dev, konstantin.ananyev, yliu, keith.wiles, jianfeng.tan,
	tiwei.bie, lei.a.yao

On Sun, 18 Jun 2017 15:21:07 +0800
Jiayu Hu <jiayu.hu@intel.com> wrote:

> +/**
> + * This is the main reassembly API used in lightweight mode, which
> + * merges a number of packets at a time. After it returns, applications
> + * can get GROed packets immediately. Applications don't need to
> + * flush packets manually. In lightweight mode, applications just need
> + * to tell the reassembly API what rules should be applied when merging
> + * packets. Therefore, applications can perform GRO in a very simple
> + * way.
> + *
> + * To process one packet, we find its corresponding reassembly table
> + * according to the packet type, then search that table for a packet
> + * to merge with. If one is found, chain the two packets together;
> + * if not, insert the inputted packet into the reassembly table. Note
> + * that merging two packets just chains them together; no memory copy
> + * is needed. Before rte_gro_reassemble_burst returns, header checksums
> + * of merged packets are re-calculated.
> + *
> + * @param pkts
> + *  a pointer array which points to the packets to reassemble. After
> + *  GRO, it is also used to keep GROed packets.
> + * @param nb_pkts
> + *  the number of packets to reassemble.
> + * @param param
> + *  Applications use param to tell rte_gro_reassemble_burst what rules
> + *  are demanded.
> + * @return
> + *  the number of packets after GRO.
> + */
> +uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
> +		const uint16_t nb_pkts __rte_unused,
> +		const struct rte_gro_param param __rte_unused);

I think the __rte_unused attribute should be on the function definition,
not on the prototype. I think GCC ignores it on function prototypes.
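
A minimal sketch of that placement, reusing the stub body from the
framework patch:

	/* prototype in rte_gro.h: no attribute needed */
	uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts,
			const uint16_t nb_pkts,
			const struct rte_gro_param param);

	/* definition in rte_gro.c: mark only the parameters the
	 * stub really ignores
	 */
	uint16_t
	rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
			const uint16_t nb_pkts,
			const struct rte_gro_param param __rte_unused)
	{
		return nb_pkts;
	}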

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v5 1/3] lib: add Generic Receive Offload API framework
  2017-06-19 15:55           ` Stephen Hemminger
@ 2017-06-20  1:48             ` Jiayu Hu
  0 siblings, 0 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-06-20  1:48 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: dev, konstantin.ananyev, yliu, keith.wiles, jianfeng.tan,
	tiwei.bie, lei.a.yao

On Mon, Jun 19, 2017 at 08:55:00AM -0700, Stephen Hemminger wrote:
> On Sun, 18 Jun 2017 15:21:07 +0800
> Jiayu Hu <jiayu.hu@intel.com> wrote:
> 
> > +/**
> > + * This is the main reassembly API used in lightweight mode, which
> > + * merges a number of packets at a time. After it returns, applications
> > + * can get GROed packets immediately. Applications don't need to
> > + * flush packets manually. In lightweight mode, applications just need
> > + * to tell the reassembly API what rules should be applied when merging
> > + * packets. Therefore, applications can perform GRO in a very simple
> > + * way.
> > + *
> > + * To process one packet, we find its corresponding reassembly table
> > + * according to the packet type, then search that table for a packet
> > + * to merge with. If one is found, chain the two packets together;
> > + * if not, insert the inputted packet into the reassembly table. Note
> > + * that merging two packets just chains them together; no memory copy
> > + * is needed. Before rte_gro_reassemble_burst returns, header checksums
> > + * of merged packets are re-calculated.
> > + *
> > + * @param pkts
> > + *  a pointer array which points to the packets to reassemble. After
> > + *  GRO, it is also used to keep GROed packets.
> > + * @param nb_pkts
> > + *  the number of packets to reassemble.
> > + * @param param
> > + *  Applications use param to tell rte_gro_reassemble_burst what rules
> > + *  are demanded.
> > + * @return
> > + *  the number of packets after GRO.
> > + */
> > +uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
> > +		const uint16_t nb_pkts __rte_unused,
> > +		const struct rte_gro_param param __rte_unused);
> 
> I think the __rte_unused attribute should be on the function definition,
> not on the prototype. I think GCC ignores it on function prototypes.

Thanks. I will modify it in the next patch.


BRs,
Jiayu

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v5 2/3] lib/gro: add TCP/IPv4 GRO support
  2017-06-19 15:43           ` Tan, Jianfeng
@ 2017-06-20  3:22             ` Jiayu Hu
  2017-06-20 15:15               ` Ananyev, Konstantin
                                 ` (2 more replies)
  2017-06-22  8:18             ` Jiayu Hu
  1 sibling, 3 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-06-20  3:22 UTC (permalink / raw)
  To: Tan, Jianfeng
  Cc: dev, konstantin.ananyev, yliu, keith.wiles, tiwei.bie, lei.a.yao

Hi Jianfeng,

On Mon, Jun 19, 2017 at 11:43:20PM +0800, Tan, Jianfeng wrote:
> 
> 
> On 6/18/2017 3:21 PM, Jiayu Hu wrote:
> > In this patch, we introduce six APIs to support TCP/IPv4 GRO.
> 
> Those functions are not used outside of this library. Don't make them
> externally visible.

But they are called by functions in rte_gro.c, which is a different
file. If we define these functions as static, how can they be called
by functions in another file?

> 
> > - gro_tcp_tbl_create: create a TCP reassembly table, which is used to
> >      merge packets.
> 
> Will tcp6 share the same function as tcp4? If not, please rename it to
> gro_tcp4_tbl_create

In the TCP GRO design, TCP4 and TCP6 will share the same table structure,
but they will have different reassembly functions. Therefore, I use
gro_tcp_tbl_create instead of gro_tcp4_tbl_create here.
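
Roughly, the split looks like below; the tcp6 function is only for
illustration, it is not implemented yet:

	/* one table structure shared by TCP4 and TCP6 */
	void *gro_tcp_tbl_create(uint16_t socket_id,
			uint16_t max_flow_num,
			uint16_t max_item_per_flow);

	/* per-protocol reassembly functions operating on that table */
	int32_t gro_tcp4_reassemble(struct rte_mbuf *pkt,
			struct gro_tcp_tbl *tbl,
			struct gro_tcp_rule *rule);

	/* hypothetical IPv6 counterpart, sharing struct gro_tcp_tbl */
	int32_t gro_tcp6_reassemble(struct rte_mbuf *pkt,
			struct gro_tcp_tbl *tbl,
			struct gro_tcp_rule *rule);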

> 
> > - gro_tcp_tbl_destroy: free memory space of a TCP reassembly table.
> > - gro_tcp_tbl_flush: flush packets in the TCP reassembly table.
> > - gro_tcp_tbl_timeout_flush: flush timeout packets in the TCP
> >      reassembly table.
> > - gro_tcp4_reassemble: merge an inputted packet.
> > - gro_tcp4_tbl_cksum_update: update TCP and IPv4 header checksums for
> >      all merged packets in the TCP reassembly table.
> > 
> > In TCP GRO, we use a table structure, called TCP reassembly table, to
> > reassemble packets. Both TCP/IPv4 and TCP/IPv6 GRO use the same table
> > structure. A TCP reassembly table includes a flow array and a item array,
> > where the flow array is used to record flow information and the item
> > array is used to record packets information.
> > 
> > Each element in the flow array records the information of one flow,
> > which includes two parts:
> > - key: the criteria of the same flow. If packets have the same key
> >      value, they belong to the same flow.
> > - start_index: the index of the first incoming packet of this flow in
> >      the item array. With start_index, we can locate the first incoming
> >      packet of this flow.
> > Each element in the item array records one packet information. It mainly
> > includes two parts:
> > - pkt: packet address
> > - next_pkt_index: index of the next packet of the same flow in the item
> >      array. All packets of the same flow are chained by next_pkt_index.
> >      With next_pkt_index, we can locate all packets of the same flow
> >      one by one.
> > 
> > To process an incoming packet, we need three steps:
> > a. check if the packet should be processed. Packets with the following
> >      properties won't be processed:
> > 	- packets without data;
> > 	- packets with wrong checksums;
> 
> Why do we care to check for this kind of error? Can we just assume the
> applications have already dropped the packets with wrong checksums?

Indeed, if we assume all input packets are correct, we can avoid the
checksum checking overhead. But for a library, I think a more flexible
way is to let applications tell the GRO API whether checksum checking
is needed. For example, we can add a flag to struct rte_gro_tbl
and struct rte_gro_param which indicates whether checksum checking
is needed. If applications set this flag, the reassembly functions won't
check packet checksums. Otherwise, we check the checksums. What do you
think? A rough sketch of the idea follows.
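
The check_cksum field and its name are illustrative only; the other
fields are those referenced in this patch:

	struct rte_gro_param {
		uint64_t desired_gro_types;
		uint32_t max_packet_size;
		uint16_t max_flow_num;
		uint16_t max_item_per_flow;
		uint8_t check_cksum;	/* 1: verify checksums in SW */
	};

The reassembly functions would then run the SW checksum validation in
step 2 only when check_cksum is set.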

> 
> > 	- fragmented packets.
> 
> IP fragmented? I don't think we need to check that here either. It's the
> application's responsibility to call librte_ip_frag first to reassemble
> IP-fragmented packets, and then call this GRO library to merge TCP packets.
> And this procedure should be shown in an example for other users to refer to.
> 
> > b. traverse the flow array to find a flow which the packet belongs to.
> >      If not found, insert a new flow and store the packet into the item
> >      array.
> 
> You do not store the packet now. "store the packet into the item array" ->
> "then go to step c".

Thanks, I will update it in next patch.

> 
> > c. locate the first packet of this flow in the item array via
> >      start_index. Then traverse all packets of this flow one by one via
> >      next_pkt_index. If find one packet to merge with the incoming packet,
> >      merge them but without updating checksums. If not, allocate one item
> >      in the item array to store the incoming packet and update
> >      next_pkt_index value.
> > 
> > For better performance, we don't update header checksums once two
> > packets are merged. The header checksums are updated only when packets
> > are flushed from TCP reassembly tables.
> 
> Why do we care to recalculate the L4 checksum when flushing? How about just
> keeping the wrong checksum and letting the applications handle that?

Not all applications want GROed packets with wrong checksums. So I think
a more reasonable way is to give applications a flag to tell the GRO API
whether to calculate checksums when packets are flushed from the GRO
table. What do you think?
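
For instance, sketched against the current flush loop, where calc_cksum
would be a new flag on the table (the name is illustrative only):

	/* in gro_tcp_tbl_flush: fix up checksums only on request */
	if (tbl->calc_cksum && tbl->items[j].is_groed)
		gro_tcp4_cksum_update(tbl->items[j].pkt);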

> 
> 
> > 
> > Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
> > ---
> >   lib/librte_gro/Makefile      |   1 +
> >   lib/librte_gro/rte_gro.c     | 154 +++++++++++--
> >   lib/librte_gro/rte_gro.h     |  34 +--
> >   lib/librte_gro/rte_gro_tcp.c | 527 +++++++++++++++++++++++++++++++++++++++++++
> >   lib/librte_gro/rte_gro_tcp.h | 210 +++++++++++++++++
> >   5 files changed, 895 insertions(+), 31 deletions(-)
> >   create mode 100644 lib/librte_gro/rte_gro_tcp.c
> >   create mode 100644 lib/librte_gro/rte_gro_tcp.h
> > 
> > diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
> > index 9f4063a..3495dfc 100644
> > --- a/lib/librte_gro/Makefile
> > +++ b/lib/librte_gro/Makefile
> > @@ -43,6 +43,7 @@ LIBABIVER := 1
> >   # source files
> >   SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
> > +SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro_tcp.c
> 
> Again, if it's just for tcp4, please use the name rte_gro_tcp4.c.

TCP4 and TCP6 reassembly functions will be placed in the same file,
rte_gro_tcp.c. But currently, we don't support TCP6 GRO.

> 
> >   # install this header file
> >   SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
> > diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
> > index 1bc53a2..2620ef6 100644
> > --- a/lib/librte_gro/rte_gro.c
> > +++ b/lib/librte_gro/rte_gro.c
> > @@ -32,11 +32,17 @@
> >   #include <rte_malloc.h>
> >   #include <rte_mbuf.h>
> > +#include <rte_ethdev.h>
> > +#include <rte_ip.h>
> > +#include <rte_tcp.h>
> >   #include "rte_gro.h"
> > +#include "rte_gro_tcp.h"
> > -static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB];
> > -static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NB];
> > +static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB] = {
> > +	gro_tcp_tbl_create, NULL};
> > +static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NB] = {
> > +	gro_tcp_tbl_destroy, NULL};
> >   struct rte_gro_tbl *rte_gro_tbl_create(uint16_t socket_id,
> >   		uint16_t max_flow_num,
> > @@ -94,33 +100,149 @@ void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl)
> >   }
> >   uint16_t
> > -rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
> > +rte_gro_reassemble_burst(struct rte_mbuf **pkts,
> >   		const uint16_t nb_pkts,
> > -		const struct rte_gro_param param __rte_unused)
> > +		const struct rte_gro_param param)
> >   {
> > -	return nb_pkts;
> > +	struct ether_hdr *eth_hdr;
> > +	struct ipv4_hdr *ipv4_hdr;
> > +	uint16_t l3proc_type, i;
> 
> I did not catch the variable definition here: l3proc_type -> l3_proto?

You can see it in line 158 and line 159.

> 
> > +	uint16_t nb_after_gro = nb_pkts;
> > +	uint16_t flow_num = nb_pkts < param.max_flow_num ?
> > +		nb_pkts : param.max_flow_num;
> > +	uint32_t item_num = nb_pkts <
> > +		flow_num * param.max_item_per_flow ?
> > +		nb_pkts :
> > +		flow_num * param.max_item_per_flow;
> > +
> > +	/* allocate a reassembly table for TCP/IPv4 GRO */
> > +	uint16_t tcp_flow_num = flow_num <= GRO_TCP_TBL_MAX_FLOW_NUM ?
> > +		flow_num : GRO_TCP_TBL_MAX_FLOW_NUM;
> > +	uint32_t tcp_item_num = item_num <= GRO_TCP_TBL_MAX_ITEM_NUM ?
> > +		item_num : GRO_TCP_TBL_MAX_ITEM_NUM;
> 
> The tcpv4-specific logic below should be in rte_gro_tcp4.c; here, as in my
> previous comment, we should iterate over all ptypes of the packets to invoke
> all supported GRO engines.

Sorry, I don't get the point. The table created here is used by
gro_tcp4_reassemble when it merges packets. If we don't create the table
here, what would gro_tcp4_reassemble use to merge packets?

> 
> > +	struct gro_tcp_tbl tcp_tbl;
> > +	struct gro_tcp_flow tcp_flows[tcp_flow_num];
> > +	struct gro_tcp_item tcp_items[tcp_item_num];
> > +	struct gro_tcp_rule tcp_rule;
> > +
> > +	struct rte_mbuf *unprocess_pkts[nb_pkts];
> > +	uint16_t unprocess_num = 0;
> > +	int32_t ret;
> > +
> > +	if (unlikely(nb_pkts <= 1))
> > +		return nb_pkts;
> > +
> > +	memset(tcp_flows, 0, sizeof(struct gro_tcp_flow) *
> > +			tcp_flow_num);
> > +	memset(tcp_items, 0, sizeof(struct gro_tcp_item) *
> > +			tcp_item_num);
> > +	tcp_tbl.flows = tcp_flows;
> > +	tcp_tbl.items = tcp_items;
> > +	tcp_tbl.flow_num = 0;
> > +	tcp_tbl.item_num = 0;
> > +	tcp_tbl.max_flow_num = tcp_flow_num;
> > +	tcp_tbl.max_item_num = tcp_item_num;
> > +	tcp_rule.max_packet_size = param.max_packet_size;
> > +
> > +	for (i = 0; i < nb_pkts; i++) {
> > +		eth_hdr = rte_pktmbuf_mtod(pkts[i], struct ether_hdr *);
> > +		l3proc_type = rte_be_to_cpu_16(eth_hdr->ether_type);
> > +		if (l3proc_type == ETHER_TYPE_IPv4) {
> > +			ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
> > +			if (ipv4_hdr->next_proto_id == IPPROTO_TCP &&
> > +					(param.desired_gro_types &
> > +					 GRO_TCP_IPV4)) {
> > +				ret = gro_tcp4_reassemble(pkts[i],
> > +						&tcp_tbl,
> > +						&tcp_rule);
> > +				if (ret > 0)
> > +					nb_after_gro--;
> > +				else if (ret < 0)
> > +					unprocess_pkts[unprocess_num++] =
> > +						pkts[i];
> > +			} else
> > +				unprocess_pkts[unprocess_num++] =
> > +					pkts[i];
> > +		} else
> > +			unprocess_pkts[unprocess_num++] =
> > +				pkts[i];
> > +	}
> > +
> > +	if (nb_after_gro < nb_pkts) {
> > +		/* update packets headers and re-arrange GROed packets */
> > +		if (param.desired_gro_types & GRO_TCP_IPV4) {
> > +			gro_tcp4_tbl_cksum_update(&tcp_tbl);
> > +			for (i = 0; i < tcp_tbl.item_num; i++)
> > +				pkts[i] = tcp_tbl.items[i].pkt;
> > +		}
> > +		if (unprocess_num > 0) {
> > +			memcpy(&pkts[i], unprocess_pkts,
> > +					sizeof(struct rte_mbuf *) *
> > +					unprocess_num);
> > +			i += unprocess_num;
> > +		}
> > +		if (nb_pkts > i)
> > +			memset(&pkts[i], 0,
> > +					sizeof(struct rte_mbuf *) *
> > +					(nb_pkts - i));
> > +	}
> > +	return nb_after_gro;
> >   }
> > -int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
> > -		struct rte_gro_tbl *gro_tbl __rte_unused)
> > +int rte_gro_reassemble(struct rte_mbuf *pkt,
> > +		struct rte_gro_tbl *gro_tbl)
> >   {
> > +	struct ether_hdr *eth_hdr;
> > +	struct ipv4_hdr *ipv4_hdr;
> > +	uint16_t l3proc_type;
> > +	struct gro_tcp_rule tcp_rule;
> > +
> > +	if (pkt == NULL)
> > +		return -1;
> > +	tcp_rule.max_packet_size = gro_tbl->max_packet_size;
> > +	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
> > +	l3proc_type = rte_be_to_cpu_16(eth_hdr->ether_type);
> > +	if (l3proc_type == ETHER_TYPE_IPv4) {
> > +		ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
> > +		if (ipv4_hdr->next_proto_id == IPPROTO_TCP &&
> > +				(gro_tbl->desired_gro_types & GRO_TCP_IPV4)) {
> > +			return gro_tcp4_reassemble(pkt,
> > +					gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
> > +					&tcp_rule);
> > +		}
> > +	}
> >   	return -1;
> >   }
> > -uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> > -		uint64_t desired_gro_types __rte_unused,
> > -		uint16_t flush_num __rte_unused,
> > -		struct rte_mbuf **out __rte_unused,
> > -		const uint16_t max_nb_out __rte_unused)
> > +uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl,
> > +		uint64_t desired_gro_types,
> > +		uint16_t flush_num,
> > +		struct rte_mbuf **out,
> > +		const uint16_t max_nb_out)
> >   {
> 
> Ditto.
> 
> > +	desired_gro_types = desired_gro_types &
> > +		gro_tbl->desired_gro_types;
> > +	if (desired_gro_types & GRO_TCP_IPV4)
> > +		return gro_tcp_tbl_flush(
> > +				gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
> > +				flush_num,
> > +				out,
> > +				max_nb_out);
> >   	return 0;
> >   }
> >   uint16_t
> > -rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> > -		uint64_t desired_gro_types __rte_unused,
> > -		struct rte_mbuf **out __rte_unused,
> > -		const uint16_t max_nb_out __rte_unused)
> > +rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl,
> > +		uint64_t desired_gro_types,
> > +		struct rte_mbuf **out,
> > +		const uint16_t max_nb_out)
> >   {
> > +	desired_gro_types = desired_gro_types &
> > +		gro_tbl->desired_gro_types;
> > +	if (desired_gro_types & GRO_TCP_IPV4)
> > +		return gro_tcp_tbl_timeout_flush(
> > +				gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
> > +				gro_tbl->max_timeout_cycles,
> > +				out, max_nb_out);
> >   	return 0;
> >   }
> > diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
> > index 67bd90d..e26aa5b 100644
> > --- a/lib/librte_gro/rte_gro.h
> > +++ b/lib/librte_gro/rte_gro.h
> > @@ -35,7 +35,11 @@
> >   /* maximum number of supported GRO types */
> >   #define GRO_TYPE_MAX_NB 64
> > -#define GRO_TYPE_SUPPORT_NB 0	/**< current supported GRO num */
> > +#define GRO_TYPE_SUPPORT_NB 1	/**< supported GRO types number */
> > +
> > +/* TCP/IPv4 GRO flag */
> > +#define GRO_TCP_IPV4_INDEX 0
> > +#define GRO_TCP_IPV4 (1ULL << GRO_TCP_IPV4_INDEX)
> >   /**
> >    * GRO table structure. DPDK GRO uses GRO table to reassemble
> > @@ -139,9 +143,9 @@ void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl);
> >    * @return
> >    *  the number of packets after GROed.
> >    */
> > -uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
> > -		const uint16_t nb_pkts __rte_unused,
> > -		const struct rte_gro_param param __rte_unused);
> > +uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts,
> > +		const uint16_t nb_pkts,
> > +		const struct rte_gro_param param);
> >   /**
> >    * This is the main reassembly API used in heavyweight mode, which
> > @@ -164,8 +168,8 @@ uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
> >    *  if the packet is merged successfully, return a positive value. If it
> >    *  fails to merge, return zero. If errors happen, return a negative value.
> >    */
> > -int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
> > -		struct rte_gro_tbl *gro_tbl __rte_unused);
> > +int rte_gro_reassemble(struct rte_mbuf *pkt,
> > +		struct rte_gro_tbl *gro_tbl);
> >   /**
> >    * This function flushes packets of desired GRO types from their
> > @@ -184,11 +188,11 @@ int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
> >    * @return
> >    *  the number of flushed packets. If no packets are flushed, return 0.
> >    */
> > -uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> > -		uint64_t desired_gro_types __rte_unused,
> > -		uint16_t flush_num __rte_unused,
> > -		struct rte_mbuf **out __rte_unused,
> > -		const uint16_t max_nb_out __rte_unused);
> > +uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl,
> > +		uint64_t desired_gro_types,
> > +		uint16_t flush_num,
> > +		struct rte_mbuf **out,
> > +		const uint16_t max_nb_out);
> >   /**
> >    * This function flushes the timeout packets from reassembly tables of
> > @@ -206,8 +210,8 @@ uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> >    * @return
> >    *  the number of flushed packets. If no packets are flushed, return 0.
> >    */
> > -uint16_t rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> > -		uint64_t desired_gro_types __rte_unused,
> > -		struct rte_mbuf **out __rte_unused,
> > -		const uint16_t max_nb_out __rte_unused);
> > +uint16_t rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl,
> > +		uint64_t desired_gro_types,
> > +		struct rte_mbuf **out,
> > +		const uint16_t max_nb_out);
> 
> Do you have any cases to test this API? I don't see the following example
> use this API. That means we are exposing an API that is never tested. I
> don't know if we can add some experimental flag to this API. Let's seek
> advice from others.

These flush APIs are used in heavyweight mode. But testpmd is not a good
case for heavyweight mode. What do you think about using some unit tests
to test them?
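
For instance, a rough unit-test sketch for the heavyweight path; packet
construction and table setup are elided, and the helper name is made up:

	static int
	test_gro_heavyweight_flush(struct rte_gro_tbl *gro_tbl,
			struct rte_mbuf **pkts, uint16_t nb_pkts)
	{
		struct rte_mbuf *out[32];
		uint16_t i, nb_flushed;

		for (i = 0; i < nb_pkts; i++)
			rte_gro_reassemble(pkts[i], gro_tbl);

		/* flush at most 32 TCP/IPv4 packets back to the caller */
		nb_flushed = rte_gro_flush(gro_tbl, GRO_TCP_IPV4, 32,
				out, 32);
		return nb_flushed <= nb_pkts ? 0 : -1;
	}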

> 
> >   #endif
> > diff --git a/lib/librte_gro/rte_gro_tcp.c b/lib/librte_gro/rte_gro_tcp.c
> > new file mode 100644
> > index 0000000..86743cd
> > --- /dev/null
> > +++ b/lib/librte_gro/rte_gro_tcp.c
> > @@ -0,0 +1,527 @@
> > +/*-
> > + *   BSD LICENSE
> > + *
> > + *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
> > + *
> > + *   Redistribution and use in source and binary forms, with or without
> > + *   modification, are permitted provided that the following conditions
> > + *   are met:
> > + *
> > + *     * Redistributions of source code must retain the above copyright
> > + *       notice, this list of conditions and the following disclaimer.
> > + *     * Redistributions in binary form must reproduce the above copyright
> > + *       notice, this list of conditions and the following disclaimer in
> > + *       the documentation and/or other materials provided with the
> > + *       distribution.
> > + *     * Neither the name of Intel Corporation nor the names of its
> > + *       contributors may be used to endorse or promote products derived
> > + *       from this software without specific prior written permission.
> > + *
> > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > + */
> > +
> > +#include <rte_malloc.h>
> > +#include <rte_mbuf.h>
> > +#include <rte_cycles.h>
> > +
> > +#include <rte_ethdev.h>
> > +#include <rte_ip.h>
> > +#include <rte_tcp.h>
> > +
> > +#include "rte_gro_tcp.h"
> > +
> > +void *gro_tcp_tbl_create(uint16_t socket_id,
> 
> Define it as "static". The same applies to the other functions.
> 
> > +		uint16_t max_flow_num,
> > +		uint16_t max_item_per_flow)
> > +{
> > +	size_t size;
> > +	uint32_t entries_num;
> > +	struct gro_tcp_tbl *tbl;
> > +
> > +	max_flow_num = max_flow_num > GRO_TCP_TBL_MAX_FLOW_NUM ?
> > +		GRO_TCP_TBL_MAX_FLOW_NUM : max_flow_num;
> > +
> > +	entries_num = max_flow_num * max_item_per_flow;
> > +	entries_num = entries_num > GRO_TCP_TBL_MAX_ITEM_NUM ?
> > +		GRO_TCP_TBL_MAX_ITEM_NUM : entries_num;
> > +
> > +	if (entries_num == 0 || max_flow_num == 0)
> > +		return NULL;
> > +
> > +	tbl = (struct gro_tcp_tbl *)rte_zmalloc_socket(
> > +			__func__,
> > +			sizeof(struct gro_tcp_tbl),
> > +			RTE_CACHE_LINE_SIZE,
> > +			socket_id);
> > +
> > +	size = sizeof(struct gro_tcp_item) * entries_num;
> > +	tbl->items = (struct gro_tcp_item *)rte_zmalloc_socket(
> > +			__func__,
> > +			size,
> > +			RTE_CACHE_LINE_SIZE,
> > +			socket_id);
> > +	tbl->max_item_num = entries_num;
> > +
> > +	size = sizeof(struct gro_tcp_flow) * max_flow_num;
> > +	tbl->flows = (struct gro_tcp_flow *)rte_zmalloc_socket(
> > +			__func__,
> > +			size, RTE_CACHE_LINE_SIZE,
> > +			socket_id);
> > +	tbl->max_flow_num = max_flow_num;
> > +	return tbl;
> > +}
> > +
> > +void gro_tcp_tbl_destroy(void *tbl)
> > +{
> > +	struct gro_tcp_tbl *tcp_tbl = (struct gro_tcp_tbl *)tbl;
> > +
> > +	if (tcp_tbl) {
> > +		if (tcp_tbl->items)
> > +			rte_free(tcp_tbl->items);
> > +		if (tcp_tbl->flows)
> > +			rte_free(tcp_tbl->flows);
> > +		rte_free(tcp_tbl);
> > +	}
> > +}
> > +
> > +/* update TCP header and IPv4 header checksum */
> > +static void
> > +gro_tcp4_cksum_update(struct rte_mbuf *pkt)
> > +{
> > +	uint32_t len, offset, cksum;
> > +	struct ether_hdr *eth_hdr;
> > +	struct ipv4_hdr *ipv4_hdr;
> > +	struct tcp_hdr *tcp_hdr;
> > +	uint16_t ipv4_ihl, cksum_pld;
> > +
> > +	if (pkt == NULL)
> > +		return;
> > +
> > +	len = pkt->pkt_len;
> > +	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
> > +	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
> > +	ipv4_ihl = IPv4_HDR_LEN(ipv4_hdr);
> > +	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl);
> > +
> > +	offset = sizeof(struct ether_hdr) + ipv4_ihl;
> > +	len -= offset;
> > +
> > +	/* TCP cksum without IP pseudo header */
> > +	ipv4_hdr->hdr_checksum = 0;
> > +	tcp_hdr->cksum = 0;
> > +	rte_raw_cksum_mbuf(pkt, offset, len, &cksum_pld);
> > +
> > +	/* IP pseudo header cksum */
> > +	cksum = cksum_pld;
> > +	cksum += rte_ipv4_phdr_cksum(ipv4_hdr, 0);
> > +
> > +	/* combine TCP checksum and IP pseudo header checksum */
> > +	cksum = ((cksum & 0xffff0000) >> 16) + (cksum & 0xffff);
> > +	cksum = (~cksum) & 0xffff;
> > +	cksum = (cksum == 0) ? 0xffff : cksum;
> > +	tcp_hdr->cksum = cksum;
> > +
> > +	/* update IP header cksum */
> > +	ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);
> > +}
> > +
> > +void gro_tcp4_tbl_cksum_update(struct gro_tcp_tbl *tbl)
> > +{
> > +	uint32_t i;
> > +	uint32_t item_num = tbl->item_num;
> > +
> > +	for (i = 0; i < tbl->max_item_num; i++) {
> > +		if (tbl->items[i].is_valid) {
> > +			item_num--;
> > +			if (tbl->items[i].is_groed)
> > +				gro_tcp4_cksum_update(tbl->items[i].pkt);
> > +		}
> > +		if (unlikely(item_num == 0))
> > +			break;
> > +	}
> > +}
> > +
> > +/**
> > + * merge two TCP/IPv4 packets without updating header checksums.
> > + */
> > +static int
> > +merge_two_tcp4_packets(struct rte_mbuf *pkt_src,
> > +		struct rte_mbuf *pkt,
> > +		struct gro_tcp_rule *rule)
> > +{
> > +	struct ipv4_hdr *ipv4_hdr1, *ipv4_hdr2;
> > +	struct tcp_hdr *tcp_hdr1;
> > +	uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1;
> > +	struct rte_mbuf *tail;
> > +
> > +	/* parse the given packet */
> > +	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
> > +				struct ether_hdr *) + 1);
> > +	ipv4_ihl1 = IPv4_HDR_LEN(ipv4_hdr1);
> > +	tcp_hdr1 = (struct tcp_hdr *)((char *)ipv4_hdr1 + ipv4_ihl1);
> > +	tcp_hl1 = TCP_HDR_LEN(tcp_hdr1);
> > +	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) - ipv4_ihl1
> > +		- tcp_hl1;
> > +
> > +	/* parse the original packet */
> > +	ipv4_hdr2 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt_src,
> > +				struct ether_hdr *) + 1);
> > +
> > +	/* check reassembly rules */
> > +	if (pkt_src->pkt_len + tcp_dl1 > rule->max_packet_size)
> > +		return -1;
> > +
> > +	/* remove the header of the incoming packet */
> > +	rte_pktmbuf_adj(pkt, sizeof(struct ether_hdr) +
> > +			ipv4_ihl1 + tcp_hl1);
> > +
> > +	/* chain the two packet together */
> > +	tail = rte_pktmbuf_lastseg(pkt_src);
> > +	tail->next = pkt;
> > +
> > +	/* update IP header */
> > +	ipv4_hdr2->total_length = rte_cpu_to_be_16(
> > +			rte_be_to_cpu_16(
> > +				ipv4_hdr2->total_length)
> > +			+ tcp_dl1);
> > +
> > +	/* update mbuf metadata for the merged packet */
> > +	pkt_src->nb_segs++;
> > +	pkt_src->pkt_len += pkt->pkt_len;
> > +	return 1;
> > +}
> > +
> > +static int
> > +check_seq_option(struct rte_mbuf *pkt,
> > +		struct tcp_hdr *tcp_hdr,
> > +		uint16_t tcp_hl)
> > +{
> > +	struct ipv4_hdr *ipv4_hdr1;
> > +	struct tcp_hdr *tcp_hdr1;
> > +	uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1;
> > +	uint32_t sent_seq1, sent_seq;
> > +	int ret = -1;
> > +
> > +	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
> > +				struct ether_hdr *) + 1);
> > +	ipv4_ihl1 = IPv4_HDR_LEN(ipv4_hdr1);
> > +	tcp_hdr1 = (struct tcp_hdr *)((char *)ipv4_hdr1 + ipv4_ihl1);
> > +	tcp_hl1 = TCP_HDR_LEN(tcp_hdr1);
> > +	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) - ipv4_ihl1
> > +		- tcp_hl1;
> > +	sent_seq1 = rte_be_to_cpu_32(tcp_hdr1->sent_seq) + tcp_dl1;
> > +	sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq);
> > +
> > +	/* check if the two packets are neighbor */
> > +	if ((sent_seq ^ sent_seq1) == 0) {
> > +		/* check if TCP option field equals */
> > +		if (tcp_hl1 > sizeof(struct tcp_hdr)) {
> > +			if ((tcp_hl1 != tcp_hl) ||
> > +					(memcmp(tcp_hdr1 + 1,
> > +							tcp_hdr + 1,
> > +							tcp_hl - sizeof
> > +							(struct tcp_hdr))
> > +					 == 0))
> > +				ret = 1;
> > +		}
> > +	}
> > +	return ret;
> > +}
> > +
> > +static uint32_t
> > +find_an_empty_item(struct gro_tcp_tbl *tbl)
> > +{
> > +	uint32_t i;
> > +
> > +	for (i = 0; i < tbl->max_item_num; i++)
> > +		if (tbl->items[i].is_valid == 0)
> > +			return i;
> > +	return INVALID_ITEM_INDEX;
> > +}
> > +
> > +static uint16_t
> > +find_an_empty_flow(struct gro_tcp_tbl *tbl)
> > +{
> > +	uint16_t i;
> > +
> > +	for (i = 0; i < tbl->max_flow_num; i++)
> > +		if (tbl->flows[i].is_valid == 0)
> > +			return i;
> > +	return INVALID_FLOW_INDEX;
> > +}
> > +
> > +int32_t
> > +gro_tcp4_reassemble(struct rte_mbuf *pkt,
> > +		struct gro_tcp_tbl *tbl,
> > +		struct gro_tcp_rule *rule)
> > +{
> > +	struct ether_hdr *eth_hdr;
> > +	struct ipv4_hdr *ipv4_hdr;
> > +	struct tcp_hdr *tcp_hdr;
> > +	uint16_t ipv4_ihl, tcp_hl, tcp_dl, tcp_cksum, ip_cksum;
> > +
> > +	struct gro_tcp_flow_key key;
> > +	uint64_t ol_flags;
> > +	uint32_t cur_idx, prev_idx, item_idx;
> > +	uint16_t i, flow_idx;
> > +
> > +	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
> > +	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
> > +	ipv4_ihl = IPv4_HDR_LEN(ipv4_hdr);
> > +
> > +	/* 1. check if the packet should be processed */
> > +	if (ipv4_ihl < sizeof(struct ipv4_hdr))
> > +		goto fail;
> > +	if (ipv4_hdr->next_proto_id != IPPROTO_TCP)
> > +		goto fail;
> > +	if ((ipv4_hdr->fragment_offset &
> > +				rte_cpu_to_be_16(IPV4_HDR_DF_MASK))
> > +			== 0)
> > +		goto fail;
> > +
> > +	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl);
> > +	tcp_hl = TCP_HDR_LEN(tcp_hdr);
> > +	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - ipv4_ihl
> > +		- tcp_hl;
> > +	if (tcp_dl == 0)
> > +		goto fail;
> > +
> > +	/**
> > +	 * 2. if HW rx checksum offload isn't enabled, recalculate the
> > +	 * checksum in SW. Then, check if the checksum is correct
> > +	 */
> > +	ol_flags = pkt->ol_flags;
> > +	if ((ol_flags & PKT_RX_IP_CKSUM_MASK) !=
> > +			PKT_RX_IP_CKSUM_UNKNOWN) {
> > +		if (ol_flags == PKT_RX_IP_CKSUM_BAD)
> > +			goto fail;
> > +	} else {
> > +		ip_cksum = ipv4_hdr->hdr_checksum;
> > +		ipv4_hdr->hdr_checksum = 0;
> > +		ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);
> > +		if (ipv4_hdr->hdr_checksum ^ ip_cksum)
> > +			goto fail;
> > +	}
> > +
> > +	if ((ol_flags & PKT_RX_L4_CKSUM_MASK) !=
> > +			PKT_RX_L4_CKSUM_UNKNOWN) {
> > +		if (ol_flags == PKT_RX_L4_CKSUM_BAD)
> > +			goto fail;
> > +	} else {
> > +		tcp_cksum = tcp_hdr->cksum;
> > +		tcp_hdr->cksum = 0;
> > +		tcp_hdr->cksum = rte_ipv4_udptcp_cksum
> > +			(ipv4_hdr, tcp_hdr);
> > +		if (tcp_hdr->cksum ^ tcp_cksum)
> > +			goto fail;
> > +	}
> > +
> > +	/**
> > +	 * 3. search for a flow and traverse all packets in the flow
> > +	 * to find one to merge with the given packet.
> > +	 */
> > +	key.eth_saddr = eth_hdr->s_addr;
> > +	key.eth_daddr = eth_hdr->d_addr;
> > +	key.ip_src_addr[0] = rte_be_to_cpu_32(ipv4_hdr->src_addr);
> > +	key.ip_dst_addr[0] = rte_be_to_cpu_32(ipv4_hdr->dst_addr);
> > +	key.src_port = rte_be_to_cpu_16(tcp_hdr->src_port);
> > +	key.dst_port = rte_be_to_cpu_16(tcp_hdr->dst_port);
> > +	key.recv_ack = rte_be_to_cpu_32(tcp_hdr->recv_ack);
> > +	key.tcp_flags = tcp_hdr->tcp_flags;
> > +
> > +	for (i = 0; i < tbl->max_flow_num; i++) {
> > +		/* search all packets in a valid flow. */
> > +		if (tbl->flows[i].is_valid &&
> > +				(memcmp(&(tbl->flows[i].key), &key,
> > +						sizeof(struct gro_tcp_flow_key))
> > +				 == 0)) {
> > +			cur_idx = tbl->flows[i].start_index;
> > +			prev_idx = cur_idx;
> > +			while (cur_idx != INVALID_ITEM_INDEX) {
> > +				if (check_seq_option(tbl->items[cur_idx].pkt,
> > +							tcp_hdr,
> > +							tcp_hl) > 0) {
> > +					if (merge_two_tcp4_packets(
> > +								tbl->items[cur_idx].pkt,
> > +								pkt,
> > +								rule) > 0) {
> > +						/* successfully merge two packets */
> > +						tbl->items[cur_idx].is_groed = 1;
> > +						return 1;
> > +					}
> > +					/**
> > +					 * failed to merge the two packets
> > +					 * since merging breaks the rules;
> > +					 * add the packet into the flow.
> > +					 */
> > +					goto insert_to_existed_flow;
> > +				} else {
> > +					prev_idx = cur_idx;
> > +					cur_idx = tbl->items[cur_idx].next_pkt_idx;
> > +				}
> > +			}
> > +			/**
> > +			 * failed to merge the given packet with any packet in
> > +			 * the existing flow; add it to the flow.
> > +			 */
> > +insert_to_existed_flow:
> > +			item_idx = find_an_empty_item(tbl);
> > +			/* the item number is beyond the maximum value */
> > +			if (item_idx == INVALID_ITEM_INDEX)
> > +				return -1;
> > +			tbl->items[prev_idx].next_pkt_idx = item_idx;
> > +			tbl->items[item_idx].pkt = pkt;
> > +			tbl->items[item_idx].is_groed = 0;
> > +			tbl->items[item_idx].next_pkt_idx = INVALID_ITEM_INDEX;
> > +			tbl->items[item_idx].is_valid = 1;
> > +			tbl->items[item_idx].start_time = rte_rdtsc();
> > +			tbl->item_num++;
> > +			return 0;
> > +		}
> > +	}
> > +
> > +	/**
> > +	 * merging failed, as the given packet belongs to a new flow.
> > +	 * Therefore, insert a new flow.
> > +	 */
> > +	item_idx = find_an_empty_item(tbl);
> > +	flow_idx = find_an_empty_flow(tbl);
> > +	/**
> > +	 * if the flow or item number is beyond the maximum value,
> > +	 * the input packet won't be processed.
> > +	 */
> > +	if (item_idx == INVALID_ITEM_INDEX ||
> > +			flow_idx == INVALID_FLOW_INDEX)
> > +		return -1;
> > +	tbl->items[item_idx].pkt = pkt;
> > +	tbl->items[item_idx].next_pkt_idx = INVALID_ITEM_INDEX;
> > +	tbl->items[item_idx].is_groed = 0;
> > +	tbl->items[item_idx].is_valid = 1;
> > +	tbl->items[item_idx].start_time = rte_rdtsc();
> > +	tbl->item_num++;
> > +
> > +	memcpy(&(tbl->flows[flow_idx].key),
> > +			&key, sizeof(struct gro_tcp_flow_key));
> > +	tbl->flows[flow_idx].start_index = item_idx;
> > +	tbl->flows[flow_idx].is_valid = 1;
> > +	tbl->flow_num++;
> > +
> > +	return 0;
> > +fail:
> > +	return -1;
> > +}
> > +
> > +uint16_t gro_tcp_tbl_flush(struct gro_tcp_tbl *tbl,
> > +		uint16_t flush_num,
> > +		struct rte_mbuf **out,
> > +		const uint16_t nb_out)
> > +{
> > +	uint16_t num, k;
> > +	uint16_t i;
> > +	uint32_t j;
> > +
> > +	k = 0;
> > +	num = tbl->item_num > flush_num ? flush_num : tbl->item_num;
> > +	num = num > nb_out ? nb_out : num;
> > +	if (unlikely(num == 0))
> > +		return 0;
> > +
> > +	for (i = 0; i < tbl->max_flow_num; i++) {
> > +		if (tbl->flows[i].is_valid) {
> > +			j = tbl->flows[i].start_index;
> > +			while (j != INVALID_ITEM_INDEX) {
> > +				/* update checksum for GROed packet */
> > +				if (tbl->items[j].is_groed)
> > +					gro_tcp4_cksum_update(tbl->items[j].pkt);
> > +
> > +				out[k++] = tbl->items[j].pkt;
> > +				tbl->items[j].is_valid = 0;
> > +				tbl->item_num--;
> > +				j = tbl->items[j].next_pkt_idx;
> > +
> > +				if (k == num) {
> > +					/* delete the flow */
> > +					if (j == INVALID_ITEM_INDEX) {
> > +						tbl->flows[i].is_valid = 0;
> > +						tbl->flow_num--;
> > +					} else
> > +						/* update flow information */
> > +						tbl->flows[i].start_index = j;
> > +					goto end;
> > +				}
> > +			}
> > +			/* delete the flow, as all of its packets are flushed */
> > +			tbl->flows[i].is_valid = 0;
> > +			tbl->flow_num--;
> > +		}
> > +		if (tbl->flow_num == 0)
> > +			goto end;
> > +	}
> > +end:
> > +	return num;
> > +}
> > +
> > +uint16_t
> > +gro_tcp_tbl_timeout_flush(struct gro_tcp_tbl *tbl,
> > +		uint64_t timeout_cycles,
> > +		struct rte_mbuf **out,
> > +		const uint16_t nb_out)
> > +{
> > +	uint16_t k;
> > +	uint16_t i;
> > +	uint32_t j;
> > +	uint64_t current_time;
> > +
> > +	if (nb_out == 0)
> > +		return 0;
> > +	k = 0;
> > +	current_time = rte_rdtsc();
> > +
> > +	for (i = 0; i < tbl->max_flow_num; i++) {
> > +		if (tbl->flows[i].is_valid) {
> > +			j = tbl->flows[i].start_index;
> > +			while (j != INVALID_ITEM_INDEX) {
> > +				if (current_time - tbl->items[j].start_time >=
> > +						timeout_cycles) {
> > +					/* update checksum for GROed packet */
> > +					if (tbl->items[j].is_groed)
> > +						gro_tcp4_cksum_update(tbl->items[j].pkt);
> > +
> > +					out[k++] = tbl->items[j].pkt;
> > +					tbl->items[j].is_valid = 0;
> > +					tbl->item_num--;
> > +					j = tbl->items[j].next_pkt_idx;
> > +
> > +					if (k == nb_out &&
> > +							j == INVALID_ITEM_INDEX) {
> > +						/* delete the flow */
> > +						tbl->flows[i].is_valid = 0;
> > +						tbl->flow_num--;
> > +						goto end;
> > +					} else if (k == nb_out &&
> > +							j != INVALID_ITEM_INDEX) {
> > +						tbl->flows[i].start_index = j;
> > +						goto end;
> > +					}
> > +				}
> > +			}
> > +			/* delete the flow, as all of its packets are flushed */
> > +			tbl->flows[i].is_valid = 0;
> > +			tbl->flow_num--;
> > +		}
> > +		if (tbl->flow_num == 0)
> > +			goto end;
> > +	}
> > +end:
> > +	return k;
> > +}
> > diff --git a/lib/librte_gro/rte_gro_tcp.h b/lib/librte_gro/rte_gro_tcp.h
> > new file mode 100644
> > index 0000000..551efc4
> > --- /dev/null
> > +++ b/lib/librte_gro/rte_gro_tcp.h
> > @@ -0,0 +1,210 @@
> > +/*-
> > + *   BSD LICENSE
> > + *
> > + *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
> > + *
> > + *   Redistribution and use in source and binary forms, with or without
> > + *   modification, are permitted provided that the following conditions
> > + *   are met:
> > + *
> > + *     * Redistributions of source code must retain the above copyright
> > + *       notice, this list of conditions and the following disclaimer.
> > + *     * Redistributions in binary form must reproduce the above copyright
> > + *       notice, this list of conditions and the following disclaimer in
> > + *       the documentation and/or other materials provided with the
> > + *       distribution.
> > + *     * Neither the name of Intel Corporation nor the names of its
> > + *       contributors may be used to endorse or promote products derived
> > + *       from this software without specific prior written permission.
> > + *
> > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > + */
> > +
> > +#ifndef _RTE_GRO_TCP_H_
> > +#define _RTE_GRO_TCP_H_
> > +
> > +#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
> > +#define TCP_HDR_LEN(tcph) \
> > +	((tcph->data_off >> 4) * 4)
> > +#define IPv4_HDR_LEN(iph) \
> > +	((iph->version_ihl & 0x0f) * 4)
> > +#else
> > +#define TCP_DATAOFF_MASK 0x0f
> > +#define TCP_HDR_LEN(tcph) \
> > +	((tcph->data_off & TCP_DATAOFF_MASK) * 4)
> > +#define IPv4_HDR_LEN(iph) \
> > +	((iph->version_ihl >> 4) * 4)
> > +#endif
> > +
> > +#define IPV4_HDR_DF_SHIFT 14
> > +#define IPV4_HDR_DF_MASK (1 << IPV4_HDR_DF_SHIFT)
> > +
> > +#define INVALID_FLOW_INDEX 0xffffU
> > +#define INVALID_ITEM_INDEX 0xffffffffUL
> > +
> > +#define GRO_TCP_TBL_MAX_FLOW_NUM (UINT16_MAX - 1)
> > +#define GRO_TCP_TBL_MAX_ITEM_NUM (UINT32_MAX - 1)
> > +
> > +/* criteria of merging packets */
> > +struct gro_tcp_flow_key {
> > +	struct ether_addr eth_saddr;
> > +	struct ether_addr eth_daddr;
> > +	uint32_t ip_src_addr[4];	/**< IPv4 uses only the first 4B */
> > +	uint32_t ip_dst_addr[4];
> > +
> > +	uint32_t recv_ack;	/**< acknowledgment sequence number. */
> > +	uint16_t src_port;
> > +	uint16_t dst_port;
> > +	uint8_t tcp_flags;	/**< TCP flags. */
> > +};
> > +
> > +struct gro_tcp_flow {
> > +	struct gro_tcp_flow_key key;
> > +	uint32_t start_index;	/**< the first packet index of the flow */
> > +	uint8_t is_valid;
> > +};
> > +
> > +struct gro_tcp_item {
> > +	struct rte_mbuf *pkt;	/**< packet address. */
> > +	/* the time when the packet is added into the table */
> > +	uint64_t start_time;
> > +	uint32_t next_pkt_idx;	/**< next packet index. */
> > +	/* flag to indicate if the packet is GROed */
> > +	uint8_t is_groed;
> > +	uint8_t is_valid;	/**< flag indicates if the item is valid */
> > +};
> > +
> > +/**
> > + * TCP reassembly table. Both TCP/IPv4 and TCP/IPv6 use the same table
> > + * structure.
> > + */
> > +struct gro_tcp_tbl {
> > +	struct gro_tcp_item *items;	/**< item array */
> > +	struct gro_tcp_flow *flows;	/**< flow array */
> > +	uint32_t item_num;	/**< current item number */
> > +	uint16_t flow_num;	/**< current flow num */
> > +	uint32_t max_item_num;	/**< item array size */
> > +	uint16_t max_flow_num;	/**< flow array size */
> > +};
> > +
> > +/* rules to reassemble TCP packets, which are decided by applications */
> > +struct gro_tcp_rule {
> > +	/* the maximum packet length after merging */
> > +	uint32_t max_packet_size;
> > +};
> 
> Are there any other rules? If not, I prefer to use max_packet_size directly.

If we agree to use a flag to indicate whether to check checksums, this
structure would be used to keep that flag.
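
For instance, just to sketch the idea (the flag name below is hypothetical,
not final):

struct gro_tcp_rule {
	/* the maximum packet length after merging */
	uint32_t max_packet_size;
	/* if set, gro_tcp4_reassemble() skips the SW checksum
	 * validation and trusts the input packets */
	uint8_t no_cksum_check;
};

gro_tcp4_reassemble() would then test rule->no_cksum_check before
recalculating the IPv4/TCP checksums.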

> 
> > +
> > +/**
> > + * This function updates TCP and IPv4 header checksums
> > + * for merged packets in the TCP reassembly table.
> > + */
> > +void gro_tcp4_tbl_cksum_update(struct gro_tcp_tbl *tbl);
> > +
> > +/**
> > + * This function creates a TCP reassembly table.
> > + *
> > + * @param socket_id
> > + *  NUMA socket index that the Ethernet port is attached to.
> > + * @param max_flow_num
> > + *  the maximum number of flows in the TCP GRO table
> > + * @param max_item_per_flow
> > + *  the maximum packet number per flow.
> > + * @return
> > + *  on success, return a pointer to the created TCP GRO table.
> > + *  Otherwise, return NULL.
> > + */
> > +void *gro_tcp_tbl_create(uint16_t socket_id,
> > +		uint16_t max_flow_num,
> > +		uint16_t max_item_per_flow);
> > +
> > +/**
> > + * This function destroys a TCP reassembly table.
> > + * @param tbl
> > + *  a pointer that points to the TCP reassembly table.
> > + */
> > +void gro_tcp_tbl_destroy(void *tbl);
> > +
> > +/**
> > + * This function searches for a packet in the TCP reassembly table to
> > + * merge with the input one. Merging two packets means chaining them
> > + * together and updating the packet headers. Note that this function won't
> > + * re-calculate IPv4 and TCP checksums.
> > + *
> > + * If the packet has no data, has wrong checksums, or is
> > + * fragmented etc., an error occurs and gro_tcp4_reassemble returns
> > + * immediately. If no errors happen, the packet is either merged, or
> > + * inserted into the reassembly table.
> > + *
> > + * If applications want to get packets in the reassembly table, they
> > + * need to manually flush the packets.
> > + *
> > + * @param pkt
> > + *  packet to reassemble.
> > + * @param tbl
> > + *  a pointer that points to a TCP reassembly table.
> > + * @param rule
> > + *  TCP reassembly criteria defined by applications.
> > + * @return
> > + *  if the input packet is merged successfully, return a positive
> > + *  value. If the packet hasn't been merged with any packet in the TCP
> > + *  reassembly table, return 0. If errors happen, return a negative
> > + *  value and the packet won't be inserted into the reassembly table.
> > + */
> > +int32_t
> > +gro_tcp4_reassemble(struct rte_mbuf *pkt,
> > +		struct gro_tcp_tbl *tbl,
> > +		struct gro_tcp_rule *rule);
> > +
> > +/**
> > + * This function flushes the packets in a TCP reassembly table to
> > + * applications. Before returning the packets, it will update TCP and
> > + * IPv4 header checksums.
> > + *
> > + * @param tbl
> > + *  a pointer that points to a TCP GRO table.
> > + * @param flush_num
> > + *  the number of packets that applications want to flush.
> > + * @param out
> > + *  pointer array which is used to keep flushed packets.
> > + * @param nb_out
> > + *  the maximum element number of out.
> > + * @return
> > + *  the number of packets that are actually flushed.
> > + */
> > +uint16_t
> > +gro_tcp_tbl_flush(struct gro_tcp_tbl *tbl,
> > +		uint16_t flush_num,
> > +		struct rte_mbuf **out,
> > +		const uint16_t nb_out);
> > +
> > +/**
> > + * This function flushes timeout packets in a TCP reassembly table to
> > + * applications. Before returning the packets, it updates TCP and IPv4
> > + * header checksums.
> > + *
> > + * @param tbl
> > + *  a pointer that points to a TCP GRO table.
> > + * @param timeout_cycles
> > + *  the maximum time that packets can stay in the table.
> > + * @param out
> > + *  pointer array which is used to keep flushed packets.
> > + * @param nb_out
> > + *  the maximum element number of out.
> > + * @return
> > + *  It returns the number of packets that are actually flushed.
> > + */
> > +uint16_t
> > +gro_tcp_tbl_timeout_flush(struct gro_tcp_tbl *tbl,
> > +		uint64_t timeout_cycles,
> > +		struct rte_mbuf **out,
> > +		const uint16_t nb_out);
> > +#endif


* Re: [PATCH v5 2/3] lib/gro: add TCP/IPv4 GRO support
  2017-06-20  3:22             ` Jiayu Hu
@ 2017-06-20 15:15               ` Ananyev, Konstantin
  2017-06-20 16:16                 ` Jiayu Hu
  2017-06-20 15:21               ` Ananyev, Konstantin
  2017-06-20 23:30               ` Tan, Jianfeng
  2 siblings, 1 reply; 141+ messages in thread
From: Ananyev, Konstantin @ 2017-06-20 15:15 UTC (permalink / raw)
  To: Hu, Jiayu, Tan, Jianfeng; +Cc: dev, yliu, Wiles, Keith, Bie, Tiwei, Yao, Lei A

Hi Jiayu,

> -----Original Message-----
> From: Hu, Jiayu
> Sent: Tuesday, June 20, 2017 4:22 AM
> To: Tan, Jianfeng <jianfeng.tan@intel.com>
> Cc: dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>; yliu@fridaylinux.org; Wiles, Keith <keith.wiles@intel.com>; Bie,
> Tiwei <tiwei.bie@intel.com>; Yao, Lei A <lei.a.yao@intel.com>
> Subject: Re: [PATCH v5 2/3] lib/gro: add TCP/IPv4 GRO support
> 
> Hi Jianfeng,
> 
> On Mon, Jun 19, 2017 at 11:43:20PM +0800, Tan, Jianfeng wrote:
> >
> >
> > On 6/18/2017 3:21 PM, Jiayu Hu wrote:
> > > In this patch, we introduce six APIs to support TCP/IPv4 GRO.
> >
> > Those functions are not used outside of this library. Don't make them
> > externally visible.
> 
> But they are called by functions in rte_gro.c, which is a different
> file. If we define these functions as static, how can they be called by
> functions in a different file?
> 
> >
> > > - gro_tcp_tbl_create: create a TCP reassembly table, which is used to
> > >      merge packets.
> >
> > Will tcp6 share the same function with tcp4? If not, please rename it to
> > gro_tcp4_tbl_create
> 
> In the TCP GRO design, TCP4 and TCP6 will share the same table structure,
> but they will have different reassembly functions. Therefore, I use
> gro_tcp_tbl_create instead of gro_tcp4_tbl_create here.
> 
> >
> > > - gro_tcp_tbl_destroy: free memory space of a TCP reassembly table.
> > > - gro_tcp_tbl_flush: flush packets in the TCP reassembly table.
> > > - gro_tcp_tbl_timeout_flush: flush timeout packets in the TCP
> > >      reassembly table.
> > > - gro_tcp4_reassemble: merge an inputted packet.
> > > - gro_tcp4_tbl_cksum_update: update TCP and IPv4 header checksums for
> > >      all merged packets in the TCP reassembly table.
> > >
> > > In TCP GRO, we use a table structure, called TCP reassembly table, to
> > > reassemble packets. Both TCP/IPv4 and TCP/IPv6 GRO use the same table
> > > structure. A TCP reassembly table includes a flow array and an item array,
> > > where the flow array is used to record flow information and the item
> > > array is used to record packets information.
> > >
> > > Each element in the flow array records the information of one flow,
> > > which includes two parts:
> > > - key: the criteria of the same flow. If packets have the same key
> > >      value, they belong to the same flow.
> > > - start_index: the index of the first incoming packet of this flow in
> > >      the item array. With start_index, we can locate the first incoming
> > >      packet of this flow.
> > > Each element in the item array records the information of one packet. It
> > > mainly includes two parts:
> > > - pkt: packet address
> > > - next_pkt_index: index of the next packet of the same flow in the item
> > >      array. All packets of the same flow are chained by next_pkt_index.
> > >      With next_pkt_index, we can locate all packets of the same flow
> > >      one by one.
> > >
> > > To process an incoming packet, we need three steps:
> > > a. check if the packet should be processed. Packets with the following
> > >      properties won't be processed:
> > > 	- packets without data;
> > > 	- packets with wrong checksums;
> >
> > Why do we care to check this kind of error? Can't we just assume the
> > applications have already dropped the packets with wrong cksums?
> 
> Indeed, if we assume all input packets are correct, we can avoid the
> checksum checking overhead. But as a library, I think a more flexible
> way is to enable applications to tell the GRO API whether checksum
> checking is needed. For example, we can add a flag to struct rte_gro_tbl
> and struct rte_gro_param, which indicates whether checksum checking
> is needed. If applications set this flag, the reassembly function won't
> check packet checksums. Otherwise, we check the checksums. What do you
> think?
> 
> >
> > > 	- fragmented packets.
> >
> > IP fragmented? I don't think we need to check it here either. It's the
> > application's responsibility to call librte_ip_frag first to reassemble
> > IP-fragmented packets, and then call this gro library to merge TCP packets.
> > And this procedure should be shown in an example for other users to refer to.
> >
> > > b. traverse the flow array to find a flow which the packet belongs to.
> > >      If not found, insert a new flow and store the packet into the
> > >      array.
> >
> > You do not store the packet now. "store the packet into the item array" ->
> > "then go to step c".
> 
> Thanks, I will update it in the next patch.
> 
> >
> > > c. locate the first packet of this flow in the item array via
> > >      start_index. Then traverse all packets of this flow one by one via
> > >      next_pkt_index. If we find one packet to merge with the incoming packet,
> > >      merge them but without updating checksums. If not, allocate one item
> > >      in the item array to store the incoming packet and update
> > >      next_pkt_index value.
> > >
> > > For better performance, we don't update header checksums once two
> > > packets are merged. The header checksums are updated only when packets
> > > are flushed from TCP reassembly tables.
> >
> > Why do we care to recalculate the L4 checksum when flushing? How about just
> > keeping the wrong cksum, and letting the applications handle that?
> 
> Not all applications want GROed packets with wrong checksums. So I think a
> more reasonable way is to give applications a flag to tell the GRO API
> whether it needs to calculate checksums when flushing packets from the GRO
> table. What do you think?

I didn't look closely into the latest patch yet, but would echo Jianfeng's comment:
I think TCP cksum calculation/validation should be out of scope of that library.
The user can do that before/after doing the actual GRO, or it might be done in HW.
Also, I'd suggest that inside the library we add the expectation that the RX cksum
flags plus the ptype and l2/l3/l4 header length fields inside the mbuf are already
set up correctly by the caller.
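
Roughly, an illustrative sketch of that caller-side contract (not a
proposed diff; pkt is a struct rte_mbuf *):

	struct tcp_hdr *tcp_hdr;

	/* the PMD (or the app) has already classified the packet and
	 * filled the header length fields, so the library can locate
	 * the TCP header without re-parsing or re-validating checksums
	 */
	if ((pkt->packet_type & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP &&
			(pkt->ol_flags & PKT_RX_L4_CKSUM_MASK) !=
			PKT_RX_L4_CKSUM_BAD) {
		tcp_hdr = rte_pktmbuf_mtod_offset(pkt, struct tcp_hdr *,
				pkt->l2_len + pkt->l3_len);
		/* ... proceed with the merge logic ... */
	}
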
Konstantin

> 
> >
> >
> > >
> > > Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
> > > ---
> > >   lib/librte_gro/Makefile      |   1 +
> > >   lib/librte_gro/rte_gro.c     | 154 +++++++++++--
> > >   lib/librte_gro/rte_gro.h     |  34 +--
> > >   lib/librte_gro/rte_gro_tcp.c | 527 +++++++++++++++++++++++++++++++++++++++++++
> > >   lib/librte_gro/rte_gro_tcp.h | 210 +++++++++++++++++
> > >   5 files changed, 895 insertions(+), 31 deletions(-)
> > >   create mode 100644 lib/librte_gro/rte_gro_tcp.c
> > >   create mode 100644 lib/librte_gro/rte_gro_tcp.h
> > >
> > > diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
> > > index 9f4063a..3495dfc 100644
> > > --- a/lib/librte_gro/Makefile
> > > +++ b/lib/librte_gro/Makefile
> > > @@ -43,6 +43,7 @@ LIBABIVER := 1
> > >   # source files
> > >   SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
> > > +SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro_tcp.c
> >
> > Again, if it's just for tcp4, please use the name rte_gro_tcp4.c.
> 
> TCP4 and TCP6 reassembly functions will be placed in the same file,
> rte_gro_tcp.c. But currently, we don't support TCP6 GRO.
> 
> >
> > >   # install this header file
> > >   SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
> > > diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
> > > index 1bc53a2..2620ef6 100644
> > > --- a/lib/librte_gro/rte_gro.c
> > > +++ b/lib/librte_gro/rte_gro.c
> > > @@ -32,11 +32,17 @@
> > >   #include <rte_malloc.h>
> > >   #include <rte_mbuf.h>
> > > +#include <rte_ethdev.h>
> > > +#include <rte_ip.h>
> > > +#include <rte_tcp.h>
> > >   #include "rte_gro.h"
> > > +#include "rte_gro_tcp.h"
> > > -static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB];
> > > -static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NB];
> > > +static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB] = {
> > > +	gro_tcp_tbl_create, NULL};
> > > +static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NB] = {
> > > +	gro_tcp_tbl_destroy, NULL};
> > >   struct rte_gro_tbl *rte_gro_tbl_create(uint16_t socket_id,
> > >   		uint16_t max_flow_num,
> > > @@ -94,33 +100,149 @@ void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl)
> > >   }
> > >   uint16_t
> > > -rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
> > > +rte_gro_reassemble_burst(struct rte_mbuf **pkts,
> > >   		const uint16_t nb_pkts,
> > > -		const struct rte_gro_param param __rte_unused)
> > > +		const struct rte_gro_param param)
> > >   {
> > > -	return nb_pkts;
> > > +	struct ether_hdr *eth_hdr;
> > > +	struct ipv4_hdr *ipv4_hdr;
> > > +	uint16_t l3proc_type, i;
> >
> > I did not catch the variable definition here: l3proc_type -> l3_proto?
> 
> You can see it in line 158 and line 159.
> 
> >
> > > +	uint16_t nb_after_gro = nb_pkts;
> > > +	uint16_t flow_num = nb_pkts < param.max_flow_num ?
> > > +		nb_pkts : param.max_flow_num;
> > > +	uint32_t item_num = nb_pkts <
> > > +		flow_num * param.max_item_per_flow ?
> > > +		nb_pkts :
> > > +		flow_num * param.max_item_per_flow;
> > > +
> > > +	/* allocate a reassembly table for TCP/IPv4 GRO */
> > > +	uint16_t tcp_flow_num = flow_num <= GRO_TCP_TBL_MAX_FLOW_NUM ?
> > > +		flow_num : GRO_TCP_TBL_MAX_FLOW_NUM;
> > > +	uint32_t tcp_item_num = item_num <= GRO_TCP_TBL_MAX_ITEM_NUM ?
> > > +		item_num : GRO_TCP_TBL_MAX_ITEM_NUM;
> >
> > The tcpv4-specific logic below should be in rte_gro_tcp4.c; here, as in my
> > previous comment, we should iterate over all ptypes of the packets to invoke
> > all supported GRO engines.
> 
> Sorry, I don't get the point. The table which is created here is used by
> gro_tcp4_reassemble when it merges packets. If we don't create the table
> here, what does gro_tcp4_reassemble use to merge packets?
> 
> >
> > > +	struct gro_tcp_tbl tcp_tbl;
> > > +	struct gro_tcp_flow tcp_flows[tcp_flow_num];
> > > +	struct gro_tcp_item tcp_items[tcp_item_num];
> > > +	struct gro_tcp_rule tcp_rule;
> > > +
> > > +	struct rte_mbuf *unprocess_pkts[nb_pkts];
> > > +	uint16_t unprocess_num = 0;
> > > +	int32_t ret;
> > > +
> > > +	if (unlikely(nb_pkts <= 1))
> > > +		return nb_pkts;
> > > +
> > > +	memset(tcp_flows, 0, sizeof(struct gro_tcp_flow) *
> > > +			tcp_flow_num);
> > > +	memset(tcp_items, 0, sizeof(struct gro_tcp_item) *
> > > +			tcp_item_num);
> > > +	tcp_tbl.flows = tcp_flows;
> > > +	tcp_tbl.items = tcp_items;
> > > +	tcp_tbl.flow_num = 0;
> > > +	tcp_tbl.item_num = 0;
> > > +	tcp_tbl.max_flow_num = tcp_flow_num;
> > > +	tcp_tbl.max_item_num = tcp_item_num;
> > > +	tcp_rule.max_packet_size = param.max_packet_size;
> > > +
> > > +	for (i = 0; i < nb_pkts; i++) {
> > > +		eth_hdr = rte_pktmbuf_mtod(pkts[i], struct ether_hdr *);
> > > +		l3proc_type = rte_be_to_cpu_16(eth_hdr->ether_type);
> > > +		if (l3proc_type == ETHER_TYPE_IPv4) {
> > > +			ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
> > > +			if (ipv4_hdr->next_proto_id == IPPROTO_TCP &&
> > > +					(param.desired_gro_types &
> > > +					 GRO_TCP_IPV4)) {
> > > +				ret = gro_tcp4_reassemble(pkts[i],
> > > +						&tcp_tbl,
> > > +						&tcp_rule);
> > > +				if (ret > 0)
> > > +					nb_after_gro--;
> > > +				else if (ret < 0)
> > > +					unprocess_pkts[unprocess_num++] =
> > > +						pkts[i];
> > > +			} else
> > > +				unprocess_pkts[unprocess_num++] =
> > > +					pkts[i];
> > > +		} else
> > > +			unprocess_pkts[unprocess_num++] =
> > > +				pkts[i];
> > > +	}
> > > +
> > > +	if (nb_after_gro < nb_pkts) {
> > > +		/* update packet headers and re-arrange GROed packets */
> > > +		if (param.desired_gro_types & GRO_TCP_IPV4) {
> > > +			gro_tcp4_tbl_cksum_update(&tcp_tbl);
> > > +			for (i = 0; i < tcp_tbl.item_num; i++)
> > > +				pkts[i] = tcp_tbl.items[i].pkt;
> > > +		}
> > > +		if (unprocess_num > 0) {
> > > +			memcpy(&pkts[i], unprocess_pkts,
> > > +					sizeof(struct rte_mbuf *) *
> > > +					unprocess_num);
> > > +			i += unprocess_num;
> > > +		}
> > > +		if (nb_pkts > i)
> > > +			memset(&pkts[i], 0,
> > > +					sizeof(struct rte_mbuf *) *
> > > +					(nb_pkts - i));
> > > +	}
> > > +	return nb_after_gro;
> > >   }
> > > -int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
> > > -		struct rte_gro_tbl *gro_tbl __rte_unused)
> > > +int rte_gro_reassemble(struct rte_mbuf *pkt,
> > > +		struct rte_gro_tbl *gro_tbl)
> > >   {
> > > +	struct ether_hdr *eth_hdr;
> > > +	struct ipv4_hdr *ipv4_hdr;
> > > +	uint16_t l3proc_type;
> > > +	struct gro_tcp_rule tcp_rule;
> > > +
> > > +	if (pkt == NULL)
> > > +		return -1;
> > > +	tcp_rule.max_packet_size = gro_tbl->max_packet_size;
> > > +	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
> > > +	l3proc_type = rte_be_to_cpu_16(eth_hdr->ether_type);
> > > +	if (l3proc_type == ETHER_TYPE_IPv4) {
> > > +		ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
> > > +		if (ipv4_hdr->next_proto_id == IPPROTO_TCP &&
> > > +				(gro_tbl->desired_gro_types & GRO_TCP_IPV4)) {
> > > +			return gro_tcp4_reassemble(pkt,
> > > +					gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
> > > +					&tcp_rule);
> > > +		}
> > > +	}
> > >   	return -1;
> > >   }
> > > -uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> > > -		uint64_t desired_gro_types __rte_unused,
> > > -		uint16_t flush_num __rte_unused,
> > > -		struct rte_mbuf **out __rte_unused,
> > > -		const uint16_t max_nb_out __rte_unused)
> > > +uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl,
> > > +		uint64_t desired_gro_types,
> > > +		uint16_t flush_num,
> > > +		struct rte_mbuf **out,
> > > +		const uint16_t max_nb_out)
> > >   {
> >
> > Ditto.
> >
> > > +	desired_gro_types = desired_gro_types &
> > > +		gro_tbl->desired_gro_types;
> > > +	if (desired_gro_types & GRO_TCP_IPV4)
> > > +		return gro_tcp_tbl_flush(
> > > +				gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
> > > +				flush_num,
> > > +				out,
> > > +				max_nb_out);
> > >   	return 0;
> > >   }
> > >   uint16_t
> > > -rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> > > -		uint64_t desired_gro_types __rte_unused,
> > > -		struct rte_mbuf **out __rte_unused,
> > > -		const uint16_t max_nb_out __rte_unused)
> > > +rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl,
> > > +		uint64_t desired_gro_types,
> > > +		struct rte_mbuf **out,
> > > +		const uint16_t max_nb_out)
> > >   {
> > > +	desired_gro_types = desired_gro_types &
> > > +		gro_tbl->desired_gro_types;
> > > +	if (desired_gro_types & GRO_TCP_IPV4)
> > > +		return gro_tcp_tbl_timeout_flush(
> > > +				gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
> > > +				gro_tbl->max_timeout_cycles,
> > > +				out, max_nb_out);
> > >   	return 0;
> > >   }
> > > diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
> > > index 67bd90d..e26aa5b 100644
> > > --- a/lib/librte_gro/rte_gro.h
> > > +++ b/lib/librte_gro/rte_gro.h
> > > @@ -35,7 +35,11 @@
> > >   /* maximum number of supported GRO types */
> > >   #define GRO_TYPE_MAX_NB 64
> > > -#define GRO_TYPE_SUPPORT_NB 0	/**< current supported GRO num */
> > > +#define GRO_TYPE_SUPPORT_NB 1	/**< supported GRO types number */
> > > +
> > > +/* TCP/IPv4 GRO flag */
> > > +#define GRO_TCP_IPV4_INDEX 0
> > > +#define GRO_TCP_IPV4 (1ULL << GRO_TCP_IPV4_INDEX)
> > >   /**
> > >    * GRO table structure. DPDK GRO uses GRO table to reassemble
> > > @@ -139,9 +143,9 @@ void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl);
> > >    * @return
> > >    *  the number of packets after GROed.
> > >    */
> > > -uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
> > > -		const uint16_t nb_pkts __rte_unused,
> > > -		const struct rte_gro_param param __rte_unused);
> > > +uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts,
> > > +		const uint16_t nb_pkts,
> > > +		const struct rte_gro_param param);
> > >   /**
> > >    * This is the main reassembly API used in heavyweight mode, which
> > > @@ -164,8 +168,8 @@ uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
> > >    *  if merge the packet successfully, return a positive value. If fail
> > >    *  to merge, return zero. If errors happen, return a negative value.
> > >    */
> > > -int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
> > > -		struct rte_gro_tbl *gro_tbl __rte_unused);
> > > +int rte_gro_reassemble(struct rte_mbuf *pkt,
> > > +		struct rte_gro_tbl *gro_tbl);
> > >   /**
> > >    * This function flushed packets of desired GRO types from their
> > > @@ -184,11 +188,11 @@ int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
> > >    * @return
> > >    *  the number of flushed packets. If no packets are flushed, return 0.
> > >    */
> > > -uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> > > -		uint64_t desired_gro_types __rte_unused,
> > > -		uint16_t flush_num __rte_unused,
> > > -		struct rte_mbuf **out __rte_unused,
> > > -		const uint16_t max_nb_out __rte_unused);
> > > +uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl,
> > > +		uint64_t desired_gro_types,
> > > +		uint16_t flush_num,
> > > +		struct rte_mbuf **out,
> > > +		const uint16_t max_nb_out);
> > >   /**
> > >    * This function flushes the timeout packets from reassembly tables of
> > > @@ -206,8 +210,8 @@ uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> > >    * @return
> > >    *  the number of flushed packets. If no packets are flushed, return 0.
> > >    */
> > > -uint16_t rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> > > -		uint64_t desired_gro_types __rte_unused,
> > > -		struct rte_mbuf **out __rte_unused,
> > > -		const uint16_t max_nb_out __rte_unused);
> > > +uint16_t rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl,
> > > +		uint64_t desired_gro_types,
> > > +		struct rte_mbuf **out,
> > > +		const uint16_t max_nb_out);
> >
> > Do you have any cases to test this API? I don't see the following example
> > use this API. That means we are exposing an API that is never tested. I don't
> > know if we can add some experimental flag to this API. Let's seek advice
> > from others.
> 
> These flush APIs are used in heavyweight mode. But testpmd is not a good case
> for using heavyweight mode. What do you think about using some unit tests
> to test them?
> 
> >
> > >   #endif
> > > diff --git a/lib/librte_gro/rte_gro_tcp.c b/lib/librte_gro/rte_gro_tcp.c
> > > new file mode 100644
> > > index 0000000..86743cd
> > > --- /dev/null
> > > +++ b/lib/librte_gro/rte_gro_tcp.c
> > > @@ -0,0 +1,527 @@
> > > +/*-
> > > + *   BSD LICENSE
> > > + *
> > > + *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
> > > + *
> > > + *   Redistribution and use in source and binary forms, with or without
> > > + *   modification, are permitted provided that the following conditions
> > > + *   are met:
> > > + *
> > > + *     * Redistributions of source code must retain the above copyright
> > > + *       notice, this list of conditions and the following disclaimer.
> > > + *     * Redistributions in binary form must reproduce the above copyright
> > > + *       notice, this list of conditions and the following disclaimer in
> > > + *       the documentation and/or other materials provided with the
> > > + *       distribution.
> > > + *     * Neither the name of Intel Corporation nor the names of its
> > > + *       contributors may be used to endorse or promote products derived
> > > + *       from this software without specific prior written permission.
> > > + *
> > > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > > + */
> > > +
> > > +#include <rte_malloc.h>
> > > +#include <rte_mbuf.h>
> > > +#include <rte_cycles.h>
> > > +
> > > +#include <rte_ethdev.h>
> > > +#include <rte_ip.h>
> > > +#include <rte_tcp.h>
> > > +
> > > +#include "rte_gro_tcp.h"
> > > +
> > > +void *gro_tcp_tbl_create(uint16_t socket_id,
> >
> > Define it as "static". Similar to other functions.
> >
> > > +		uint16_t max_flow_num,
> > > +		uint16_t max_item_per_flow)
> > > +{
> > > +	size_t size;
> > > +	uint32_t entries_num;
> > > +	struct gro_tcp_tbl *tbl;
> > > +
> > > +	max_flow_num = max_flow_num > GRO_TCP_TBL_MAX_FLOW_NUM ?
> > > +		GRO_TCP_TBL_MAX_FLOW_NUM : max_flow_num;
> > > +
> > > +	entries_num = max_flow_num * max_item_per_flow;
> > > +	entries_num = entries_num > GRO_TCP_TBL_MAX_ITEM_NUM ?
> > > +		GRO_TCP_TBL_MAX_ITEM_NUM : entries_num;
> > > +
> > > +	if (entries_num == 0 || max_flow_num == 0)
> > > +		return NULL;
> > > +
> > > +	tbl = (struct gro_tcp_tbl *)rte_zmalloc_socket(
> > > +			__func__,
> > > +			sizeof(struct gro_tcp_tbl),
> > > +			RTE_CACHE_LINE_SIZE,
> > > +			socket_id);
> > > +
> > > +	size = sizeof(struct gro_tcp_item) * entries_num;
> > > +	tbl->items = (struct gro_tcp_item *)rte_zmalloc_socket(
> > > +			__func__,
> > > +			size,
> > > +			RTE_CACHE_LINE_SIZE,
> > > +			socket_id);
> > > +	tbl->max_item_num = entries_num;
> > > +
> > > +	size = sizeof(struct gro_tcp_flow) * max_flow_num;
> > > +	tbl->flows = (struct gro_tcp_flow *)rte_zmalloc_socket(
> > > +			__func__,
> > > +			size, RTE_CACHE_LINE_SIZE,
> > > +			socket_id);
> > > +	tbl->max_flow_num = max_flow_num;
> > > +	return tbl;
> > > +}
> > > +
> > > +void gro_tcp_tbl_destroy(void *tbl)
> > > +{
> > > +	struct gro_tcp_tbl *tcp_tbl = (struct gro_tcp_tbl *)tbl;
> > > +
> > > +	if (tcp_tbl) {
> > > +		if (tcp_tbl->items)
> > > +			rte_free(tcp_tbl->items);
> > > +		if (tcp_tbl->flows)
> > > +			rte_free(tcp_tbl->flows);
> > > +		rte_free(tcp_tbl);
> > > +	}
> > > +}
> > > +
> > > +/* update TCP header and IPv4 header checksum */
> > > +static void
> > > +gro_tcp4_cksum_update(struct rte_mbuf *pkt)
> > > +{
> > > +	uint32_t len, offset, cksum;
> > > +	struct ether_hdr *eth_hdr;
> > > +	struct ipv4_hdr *ipv4_hdr;
> > > +	struct tcp_hdr *tcp_hdr;
> > > +	uint16_t ipv4_ihl, cksum_pld;
> > > +
> > > +	if (pkt == NULL)
> > > +		return;
> > > +
> > > +	len = pkt->pkt_len;
> > > +	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
> > > +	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
> > > +	ipv4_ihl = IPv4_HDR_LEN(ipv4_hdr);
> > > +	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl);
> > > +
> > > +	offset = sizeof(struct ether_hdr) + ipv4_ihl;
> > > +	len -= offset;
> > > +
> > > +	/* TCP cksum without IP pseudo header */
> > > +	ipv4_hdr->hdr_checksum = 0;
> > > +	tcp_hdr->cksum = 0;
> > > +	rte_raw_cksum_mbuf(pkt, offset, len, &cksum_pld);
> > > +
> > > +	/* IP pseudo header cksum */
> > > +	cksum = cksum_pld;
> > > +	cksum += rte_ipv4_phdr_cksum(ipv4_hdr, 0);
> > > +
> > > +	/* combine TCP checksum and IP pseudo header checksum */
> > > +	cksum = ((cksum & 0xffff0000) >> 16) + (cksum & 0xffff);
> > > +	cksum = (~cksum) & 0xffff;
> > > +	cksum = (cksum == 0) ? 0xffff : cksum;
> > > +	tcp_hdr->cksum = cksum;
> > > +
> > > +	/* update IP header cksum */
> > > +	ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);
> > > +}
> > > +
> > > +void gro_tcp4_tbl_cksum_update(struct gro_tcp_tbl *tbl)
> > > +{
> > > +	uint32_t i;
> > > +	uint32_t item_num = tbl->item_num;
> > > +
> > > +	for (i = 0; i < tbl->max_item_num; i++) {
> > > +		if (tbl->items[i].is_valid) {
> > > +			item_num--;
> > > +			if (tbl->items[i].is_groed)
> > > +				gro_tcp4_cksum_update(tbl->items[i].pkt);
> > > +		}
> > > +		if (unlikely(item_num == 0))
> > > +			break;
> > > +	}
> > > +}
> > > +
> > > +/**
> > > + * merge two TCP/IPv4 packets without updating header checksums.
> > > + */
> > > +static int
> > > +merge_two_tcp4_packets(struct rte_mbuf *pkt_src,
> > > +		struct rte_mbuf *pkt,
> > > +		struct gro_tcp_rule *rule)
> > > +{
> > > +	struct ipv4_hdr *ipv4_hdr1, *ipv4_hdr2;
> > > +	struct tcp_hdr *tcp_hdr1;
> > > +	uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1;
> > > +	struct rte_mbuf *tail;
> > > +
> > > +	/* parse the given packet */
> > > +	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
> > > +				struct ether_hdr *) + 1);
> > > +	ipv4_ihl1 = IPv4_HDR_LEN(ipv4_hdr1);
> > > +	tcp_hdr1 = (struct tcp_hdr *)((char *)ipv4_hdr1 + ipv4_ihl1);
> > > +	tcp_hl1 = TCP_HDR_LEN(tcp_hdr1);
> > > +	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) - ipv4_ihl1
> > > +		- tcp_hl1;
> > > +
> > > +	/* parse the original packet */
> > > +	ipv4_hdr2 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt_src,
> > > +				struct ether_hdr *) + 1);
> > > +
> > > +	/* check reassembly rules */
> > > +	if (pkt_src->pkt_len + tcp_dl1 > rule->max_packet_size)
> > > +		return -1;
> > > +
> > > +	/* remove the header of the incoming packet */
> > > +	rte_pktmbuf_adj(pkt, sizeof(struct ether_hdr) +
> > > +			ipv4_ihl1 + tcp_hl1);
> > > +
> > > +	/* chain the two packets together */
> > > +	tail = rte_pktmbuf_lastseg(pkt_src);
> > > +	tail->next = pkt;
> > > +
> > > +	/* update IP header */
> > > +	ipv4_hdr2->total_length = rte_cpu_to_be_16(
> > > +			rte_be_to_cpu_16(
> > > +				ipv4_hdr2->total_length)
> > > +			+ tcp_dl1);
> > > +
> > > +	/* update mbuf metadata for the merged packet */
> > > +	pkt_src->nb_segs++;
> > > +	pkt_src->pkt_len += pkt->pkt_len;
> > > +	return 1;
> > > +}
> > > +
> > > +static int
> > > +check_seq_option(struct rte_mbuf *pkt,
> > > +		struct tcp_hdr *tcp_hdr,
> > > +		uint16_t tcp_hl)
> > > +{
> > > +	struct ipv4_hdr *ipv4_hdr1;
> > > +	struct tcp_hdr *tcp_hdr1;
> > > +	uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1;
> > > +	uint32_t sent_seq1, sent_seq;
> > > +	int ret = -1;
> > > +
> > > +	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
> > > +				struct ether_hdr *) + 1);
> > > +	ipv4_ihl1 = IPv4_HDR_LEN(ipv4_hdr1);
> > > +	tcp_hdr1 = (struct tcp_hdr *)((char *)ipv4_hdr1 + ipv4_ihl1);
> > > +	tcp_hl1 = TCP_HDR_LEN(tcp_hdr1);
> > > +	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) - ipv4_ihl1
> > > +		- tcp_hl1;
> > > +	sent_seq1 = rte_be_to_cpu_32(tcp_hdr1->sent_seq) + tcp_dl1;
> > > +	sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq);
> > > +
> > > +	/* check if the two packets are neighbors */
> > > +	if ((sent_seq ^ sent_seq1) == 0) {
> > > +		/* check if the TCP option fields are equal */
> > > +		if (tcp_hl1 > sizeof(struct tcp_hdr)) {
> > > +			if ((tcp_hl1 != tcp_hl) ||
> > > +					(memcmp(tcp_hdr1 + 1,
> > > +							tcp_hdr + 1,
> > > +							tcp_hl - sizeof
> > > +							(struct tcp_hdr))
> > > +					 == 0))
> > > +				ret = 1;
> > > +		}
> > > +	}
> > > +	return ret;
> > > +}
> > > +
> > > +static uint32_t
> > > +find_an_empty_item(struct gro_tcp_tbl *tbl)
> > > +{
> > > +	uint32_t i;
> > > +
> > > +	for (i = 0; i < tbl->max_item_num; i++)
> > > +		if (tbl->items[i].is_valid == 0)
> > > +			return i;
> > > +	return INVALID_ITEM_INDEX;
> > > +}
> > > +
> > > +static uint16_t
> > > +find_an_empty_flow(struct gro_tcp_tbl *tbl)
> > > +{
> > > +	uint16_t i;
> > > +
> > > +	for (i = 0; i < tbl->max_flow_num; i++)
> > > +		if (tbl->flows[i].is_valid == 0)
> > > +			return i;
> > > +	return INVALID_FLOW_INDEX;
> > > +}
> > > +
> > > +int32_t
> > > +gro_tcp4_reassemble(struct rte_mbuf *pkt,
> > > +		struct gro_tcp_tbl *tbl,
> > > +		struct gro_tcp_rule *rule)
> > > +{
> > > +	struct ether_hdr *eth_hdr;
> > > +	struct ipv4_hdr *ipv4_hdr;
> > > +	struct tcp_hdr *tcp_hdr;
> > > +	uint16_t ipv4_ihl, tcp_hl, tcp_dl, tcp_cksum, ip_cksum;
> > > +
> > > +	struct gro_tcp_flow_key key;
> > > +	uint64_t ol_flags;
> > > +	uint32_t cur_idx, prev_idx, item_idx;
> > > +	uint16_t i, flow_idx;
> > > +
> > > +	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
> > > +	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
> > > +	ipv4_ihl = IPv4_HDR_LEN(ipv4_hdr);
> > > +
> > > +	/* 1. check if the packet should be processed */
> > > +	if (ipv4_ihl < sizeof(struct ipv4_hdr))
> > > +		goto fail;
> > > +	if (ipv4_hdr->next_proto_id != IPPROTO_TCP)
> > > +		goto fail;
> > > +	if ((ipv4_hdr->fragment_offset &
> > > +				rte_cpu_to_be_16(IPV4_HDR_DF_MASK))
> > > +			== 0)
> > > +		goto fail;
> > > +
> > > +	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl);
> > > +	tcp_hl = TCP_HDR_LEN(tcp_hdr);
> > > +	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - ipv4_ihl
> > > +		- tcp_hl;
> > > +	if (tcp_dl == 0)
> > > +		goto fail;
> > > +
> > > +	/**
> > > +	 * 2. if HW rx checksum offload isn't enabled, recalculate the
> > > +	 * checksum in SW. Then, check if the checksum is correct
> > > +	 */
> > > +	ol_flags = pkt->ol_flags;
> > > +	if ((ol_flags & PKT_RX_IP_CKSUM_MASK) !=
> > > +			PKT_RX_IP_CKSUM_UNKNOWN) {
> > > +		if (ol_flags == PKT_RX_IP_CKSUM_BAD)
> > > +			goto fail;
> > > +	} else {
> > > +		ip_cksum = ipv4_hdr->hdr_checksum;
> > > +		ipv4_hdr->hdr_checksum = 0;
> > > +		ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);
> > > +		if (ipv4_hdr->hdr_checksum ^ ip_cksum)
> > > +			goto fail;
> > > +	}
> > > +
> > > +	if ((ol_flags & PKT_RX_L4_CKSUM_MASK) !=
> > > +			PKT_RX_L4_CKSUM_UNKNOWN) {
> > > +		if (ol_flags == PKT_RX_L4_CKSUM_BAD)
> > > +			goto fail;
> > > +	} else {
> > > +		tcp_cksum = tcp_hdr->cksum;
> > > +		tcp_hdr->cksum = 0;
> > > +		tcp_hdr->cksum = rte_ipv4_udptcp_cksum
> > > +			(ipv4_hdr, tcp_hdr);
> > > +		if (tcp_hdr->cksum ^ tcp_cksum)
> > > +			goto fail;
> > > +	}
> > > +
> > > +	/**
> > > +	 * 3. search for a flow and traverse all packets in the flow
> > > +	 * to find one to merge with the given packet.
> > > +	 */
> > > +	key.eth_saddr = eth_hdr->s_addr;
> > > +	key.eth_daddr = eth_hdr->d_addr;
> > > +	key.ip_src_addr[0] = rte_be_to_cpu_32(ipv4_hdr->src_addr);
> > > +	key.ip_dst_addr[0] = rte_be_to_cpu_32(ipv4_hdr->dst_addr);
> > > +	key.src_port = rte_be_to_cpu_16(tcp_hdr->src_port);
> > > +	key.dst_port = rte_be_to_cpu_16(tcp_hdr->dst_port);
> > > +	key.recv_ack = rte_be_to_cpu_32(tcp_hdr->recv_ack);
> > > +	key.tcp_flags = tcp_hdr->tcp_flags;
> > > +
> > > +	for (i = 0; i < tbl->max_flow_num; i++) {
> > > +		/* search all packets in a valid flow. */
> > > +		if (tbl->flows[i].is_valid &&
> > > +				(memcmp(&(tbl->flows[i].key), &key,
> > > +						sizeof(struct gro_tcp_flow_key))
> > > +				 == 0)) {
> > > +			cur_idx = tbl->flows[i].start_index;
> > > +			prev_idx = cur_idx;
> > > +			while (cur_idx != INVALID_ITEM_INDEX) {
> > > +				if (check_seq_option(tbl->items[cur_idx].pkt,
> > > +							tcp_hdr,
> > > +							tcp_hl) > 0) {
> > > +					if (merge_two_tcp4_packets(
> > > +								tbl->items[cur_idx].pkt,
> > > +								pkt,
> > > +								rule) > 0) {
> > > +						/* successfully merge two packets */
> > > +						tbl->items[cur_idx].is_groed = 1;
> > > +						return 1;
> > > +					}
> > > +					/**
> > > +					 * failed to merge the two packets
> > > +					 * since merging breaks the rules;
> > > +					 * add the packet into the flow.
> > > +					 */
> > > +					goto insert_to_existed_flow;
> > > +				} else {
> > > +					prev_idx = cur_idx;
> > > +					cur_idx = tbl->items[cur_idx].next_pkt_idx;
> > > +				}
> > > +			}
> > > +			/**
> > > +			 * failed to merge the given packet with any packet in
> > > +			 * the existing flow; add it to the flow.
> > > +			 */
> > > +insert_to_existed_flow:
> > > +			item_idx = find_an_empty_item(tbl);
> > > +			/* the item number is beyond the maximum value */
> > > +			if (item_idx == INVALID_ITEM_INDEX)
> > > +				return -1;
> > > +			tbl->items[prev_idx].next_pkt_idx = item_idx;
> > > +			tbl->items[item_idx].pkt = pkt;
> > > +			tbl->items[item_idx].is_groed = 0;
> > > +			tbl->items[item_idx].next_pkt_idx = INVALID_ITEM_INDEX;
> > > +			tbl->items[item_idx].is_valid = 1;
> > > +			tbl->items[item_idx].start_time = rte_rdtsc();
> > > +			tbl->item_num++;
> > > +			return 0;
> > > +		}
> > > +	}
> > > +
> > > +	/**
> > > +	 * merging failed, as the given packet belongs to a new flow.
> > > +	 * Therefore, insert a new flow.
> > > +	 */
> > > +	item_idx = find_an_empty_item(tbl);
> > > +	flow_idx = find_an_empty_flow(tbl);
> > > +	/**
> > > +	 * if the flow or item number is beyond the maximum value,
> > > +	 * the input packet won't be processed.
> > > +	 */
> > > +	if (item_idx == INVALID_ITEM_INDEX ||
> > > +			flow_idx == INVALID_FLOW_INDEX)
> > > +		return -1;
> > > +	tbl->items[item_idx].pkt = pkt;
> > > +	tbl->items[item_idx].next_pkt_idx = INVALID_ITEM_INDEX;
> > > +	tbl->items[item_idx].is_groed = 0;
> > > +	tbl->items[item_idx].is_valid = 1;
> > > +	tbl->items[item_idx].start_time = rte_rdtsc();
> > > +	tbl->item_num++;
> > > +
> > > +	memcpy(&(tbl->flows[flow_idx].key),
> > > +			&key, sizeof(struct gro_tcp_flow_key));
> > > +	tbl->flows[flow_idx].start_index = item_idx;
> > > +	tbl->flows[flow_idx].is_valid = 1;
> > > +	tbl->flow_num++;
> > > +
> > > +	return 0;
> > > +fail:
> > > +	return -1;
> > > +}
> > > +
> > > +uint16_t gro_tcp_tbl_flush(struct gro_tcp_tbl *tbl,
> > > +		uint16_t flush_num,
> > > +		struct rte_mbuf **out,
> > > +		const uint16_t nb_out)
> > > +{
> > > +	uint16_t num, k;
> > > +	uint16_t i;
> > > +	uint32_t j;
> > > +
> > > +	k = 0;
> > > +	num = tbl->item_num > flush_num ? flush_num : tbl->item_num;
> > > +	num = num > nb_out ? nb_out : num;
> > > +	if (unlikely(num == 0))
> > > +		return 0;
> > > +
> > > +	for (i = 0; i < tbl->max_flow_num; i++) {
> > > +		if (tbl->flows[i].is_valid) {
> > > +			j = tbl->flows[i].start_index;
> > > +			while (j != INVALID_ITEM_INDEX) {
> > > +				/* update checksum for GROed packet */
> > > +				if (tbl->items[j].is_groed)
> > > +					gro_tcp4_cksum_update(tbl->items[j].pkt);
> > > +
> > > +				out[k++] = tbl->items[j].pkt;
> > > +				tbl->items[j].is_valid = 0;
> > > +				tbl->item_num--;
> > > +				j = tbl->items[j].next_pkt_idx;
> > > +
> > > +				if (k == num) {
> > > +					/* delete the flow */
> > > +					if (j == INVALID_ITEM_INDEX) {
> > > +						tbl->flows[i].is_valid = 0;
> > > +						tbl->flow_num--;
> > > +					} else
> > > +						/* update flow information */
> > > +						tbl->flows[i].start_index = j;
> > > +					goto end;
> > > +				}
> > > +			}
> > > +			/* delete the flow, as all of its packets are flushed */
> > > +			tbl->flows[i].is_valid = 0;
> > > +			tbl->flow_num--;
> > > +		}
> > > +		if (tbl->flow_num == 0)
> > > +			goto end;
> > > +	}
> > > +end:
> > > +	return num;
> > > +}
> > > +
> > > +uint16_t
> > > +gro_tcp_tbl_timeout_flush(struct gro_tcp_tbl *tbl,
> > > +		uint64_t timeout_cycles,
> > > +		struct rte_mbuf **out,
> > > +		const uint16_t nb_out)
> > > +{
> > > +	uint16_t k;
> > > +	uint16_t i;
> > > +	uint32_t j;
> > > +	uint64_t current_time;
> > > +
> > > +	if (nb_out == 0)
> > > +		return 0;
> > > +	k = 0;
> > > +	current_time = rte_rdtsc();
> > > +
> > > +	for (i = 0; i < tbl->max_flow_num; i++) {
> > > +		if (tbl->flows[i].is_valid) {
> > > +			j = tbl->flows[i].start_index;
> > > +			while (j != INVALID_ITEM_INDEX) {
> > > +				if (current_time - tbl->items[j].start_time >=
> > > +						timeout_cycles) {
> > > +					/* update checksum for GROed packet */
> > > +					if (tbl->items[j].is_groed)
> > > +						gro_tcp4_cksum_update(tbl->items[j].pkt);
> > > +
> > > +					out[k++] = tbl->items[j].pkt;
> > > +					tbl->items[j].is_valid = 0;
> > > +					tbl->item_num--;
> > > +					j = tbl->items[j].next_pkt_idx;
> > > +
> > > +					if (k == nb_out &&
> > > +							j == INVALID_ITEM_INDEX) {
> > > +						/* delete the flow */
> > > +						tbl->flows[i].is_valid = 0;
> > > +						tbl->flow_num--;
> > > +						goto end;
> > > +					} else if (k == nb_out &&
> > > +							j != INVALID_ITEM_INDEX) {
> > > +						tbl->flows[i].start_index = j;
> > > +						goto end;
> > > +					}
> > > +				}
> > > +			}
> > > +			/* delete the flow, as all of its packets are flushed */
> > > +			tbl->flows[i].is_valid = 0;
> > > +			tbl->flow_num--;
> > > +		}
> > > +		if (tbl->flow_num == 0)
> > > +			goto end;
> > > +	}
> > > +end:
> > > +	return k;
> > > +}
> > > diff --git a/lib/librte_gro/rte_gro_tcp.h b/lib/librte_gro/rte_gro_tcp.h
> > > new file mode 100644
> > > index 0000000..551efc4
> > > --- /dev/null
> > > +++ b/lib/librte_gro/rte_gro_tcp.h
> > > @@ -0,0 +1,210 @@
> > > +/*-
> > > + *   BSD LICENSE
> > > + *
> > > + *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
> > > + *
> > > + *   Redistribution and use in source and binary forms, with or without
> > > + *   modification, are permitted provided that the following conditions
> > > + *   are met:
> > > + *
> > > + *     * Redistributions of source code must retain the above copyright
> > > + *       notice, this list of conditions and the following disclaimer.
> > > + *     * Redistributions in binary form must reproduce the above copyright
> > > + *       notice, this list of conditions and the following disclaimer in
> > > + *       the documentation and/or other materials provided with the
> > > + *       distribution.
> > > + *     * Neither the name of Intel Corporation nor the names of its
> > > + *       contributors may be used to endorse or promote products derived
> > > + *       from this software without specific prior written permission.
> > > + *
> > > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > > + */
> > > +
> > > +#ifndef _RTE_GRO_TCP_H_
> > > +#define _RTE_GRO_TCP_H_
> > > +
> > > +#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
> > > +#define TCP_HDR_LEN(tcph) \
> > > +	((tcph->data_off >> 4) * 4)
> > > +#define IPv4_HDR_LEN(iph) \
> > > +	((iph->version_ihl & 0x0f) * 4)
> > > +#else
> > > +#define TCP_DATAOFF_MASK 0x0f
> > > +#define TCP_HDR_LEN(tcph) \
> > > +	((tcph->data_off & TCP_DATAOFF_MASK) * 4)
> > > +#define IPv4_HDR_LEN(iph) \
> > > +	((iph->version_ihl >> 4) * 4)
> > > +#endif
> > > +
> > > +#define IPV4_HDR_DF_SHIFT 14
> > > +#define IPV4_HDR_DF_MASK (1 << IPV4_HDR_DF_SHIFT)
> > > +
> > > +#define INVALID_FLOW_INDEX 0xffffU
> > > +#define INVALID_ITEM_INDEX 0xffffffffUL
> > > +
> > > +#define GRO_TCP_TBL_MAX_FLOW_NUM (UINT16_MAX - 1)
> > > +#define GRO_TCP_TBL_MAX_ITEM_NUM (UINT32_MAX - 1)
> > > +
> > > +/* criteria of merging packets */
> > > +struct gro_tcp_flow_key {
> > > +	struct ether_addr eth_saddr;
> > > +	struct ether_addr eth_daddr;
> > > +	uint32_t ip_src_addr[4];	/**< IPv4 uses only the first 4B */
> > > +	uint32_t ip_dst_addr[4];
> > > +
> > > +	uint32_t recv_ack;	/**< acknowledgment sequence number. */
> > > +	uint16_t src_port;
> > > +	uint16_t dst_port;
> > > +	uint8_t tcp_flags;	/**< TCP flags. */
> > > +};
> > > +
> > > +struct gro_tcp_flow {
> > > +	struct gro_tcp_flow_key key;
> > > +	uint32_t start_index;	/**< the first packet index of the flow */
> > > +	uint8_t is_valid;
> > > +};
> > > +
> > > +struct gro_tcp_item {
> > > +	struct rte_mbuf *pkt;	/**< packet address. */
> > > +	/* the time when the packet is added into the table */
> > > +	uint64_t start_time;
> > > +	uint32_t next_pkt_idx;	/**< next packet index. */
> > > +	/* flag to indicate if the packet is GROed */
> > > +	uint8_t is_groed;
> > > +	uint8_t is_valid;	/**< flag indicates if the item is valid */
> > > +};
> > > +
> > > +/**
> > > + * TCP reassembly table. Both TCP/IPv4 and TCP/IPv6 use the same table
> > > + * structure.
> > > + */
> > > +struct gro_tcp_tbl {
> > > +	struct gro_tcp_item *items;	/**< item array */
> > > +	struct gro_tcp_flow *flows;	/**< flow array */
> > > +	uint32_t item_num;	/**< current item number */
> > > +	uint16_t flow_num;	/**< current flow num */
> > > +	uint32_t max_item_num;	/**< item array size */
> > > +	uint16_t max_flow_num;	/**< flow array size */
> > > +};
> > > +
> > > +/* rules to reassemble TCP packets, which are decided by applications */
> > > +struct gro_tcp_rule {
> > > +	/* the maximum length of a merged packet */
> > > +	uint32_t max_packet_size;
> > > +};
> >
> > Are there any other rules? If not, I prefer to use max_packet_size directly.
> 
> If we agree to use a flag to indicate whether to check checksums, this
> structure can be used to keep that flag.
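> 
> As a rough illustration, the rule struct could then carry that flag
> (the "check_csum" name below is only a placeholder, not in this patch):
> 
> 	/* rules to reassemble TCP packets, decided by applications */
> 	struct gro_tcp_rule {
> 		/* the maximum length of a merged packet */
> 		uint32_t max_packet_size;
> 		/* hypothetical: 0 = trust input checksums, 1 = verify them */
> 		uint8_t check_csum;
> 	};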
> 
> >
> > > +
> > > +/**
> > > + * This function updates TCP and IPv4 header checksums
> > > + * for merged packets in the TCP reassembly table.
> > > + */
> > > +void gro_tcp4_tbl_cksum_update(struct gro_tcp_tbl *tbl);
> > > +
> > > +/**
> > > + * This function creates a TCP reassembly table.
> > > + *
> > > + * @param socket_id
> > > + *  socket index where the Ethernet port connects to.
> > > + * @param max_flow_num
> > > + *  the maximum number of flows in the TCP GRO table
> > > + * @param max_item_per_flow
> > > + *  the maximum packet number per flow.
> > > + * @return
> > > + *  on success, return a pointer to the created TCP GRO table.
> > > + *  Otherwise, return NULL.
> > > + */
> > > +void *gro_tcp_tbl_create(uint16_t socket_id,
> > > +		uint16_t max_flow_num,
> > > +		uint16_t max_item_per_flow);
> > > +
> > > +/**
> > > + * This function destroys a TCP reassembly table.
> > > + * @param tbl
> > > + *  a pointer that points to the TCP reassembly table.
> > > + */
> > > +void gro_tcp_tbl_destroy(void *tbl);
> > > +
> > > +/**
> > > + * This function searches for a packet in the TCP reassembly table to
> > > + * merge with the input one. Merging two packets means chaining them
> > > + * together and updating the packet headers. Note that this function
> > > + * won't re-calculate IPv4 and TCP checksums.
> > > + *
> > > + * If the packet has no data, has wrong checksums, or is fragmented
> > > + * etc., an error happens and gro_tcp4_reassemble returns
> > > + * immediately. If no errors happen, the packet is either merged, or
> > > + * inserted into the reassembly table.
> > > + *
> > > + * If applications want to get packets in the reassembly table, they
> > > + * need to manually flush the packets.
> > > + *
> > > + * @param pkt
> > > + *  packet to reassemble.
> > > + * @param tbl
> > > + *  a pointer that points to a TCP reassembly table.
> > > + * @param rule
> > > + *  TCP reassembly criteria defined by applications.
> > > + * @return
> > > + *  if the input packet is merged successfully, return a positive
> > > + *  value. If the packet hasn't been merged with any packet in the TCP
> > > + *  reassembly table, return zero. If errors happen, return a negative
> > > + *  value and the packet won't be inserted into the reassembly table.
> > > + */
> > > +int32_t
> > > +gro_tcp4_reassemble(struct rte_mbuf *pkt,
> > > +		struct gro_tcp_tbl *tbl,
> > > +		struct gro_tcp_rule *rule);
> > > +
> > > +/**
> > > + * This function flushes the packets in a TCP reassembly table to
> > > + * applications. Before returning the packets, it will update TCP and
> > > + * IPv4 header checksums.
> > > + *
> > > + * @param tbl
> > > + *  a pointer that points to a TCP GRO table.
> > > + * @param flush_num
> > > + *  the number of packets that applications want to flush.
> > > + * @param out
> > > + *  pointer array which is used to keep flushed packets.
> > > + * @param nb_out
> > > + *  the maximum element number of out.
> > > + * @return
> > > + *  the number of packets that are actually flushed.
> > > + */
> > > +uint16_t
> > > +gro_tcp_tbl_flush(struct gro_tcp_tbl *tbl,
> > > +		uint16_t flush_num,
> > > +		struct rte_mbuf **out,
> > > +		const uint16_t nb_out);
> > > +
> > > +/**
> > > + * This function flushes timeout packets in a TCP reassembly table to
> > > + * applications. Before returning the packets, it updates TCP and IPv4
> > > + * header checksums.
> > > + *
> > > + * @param tbl
> > > + *  a pointer that points to a TCP GRO table.
> > > + * @param timeout_cycles
> > > + *  the maximum time that packets can stay in the table.
> > > + * @param out
> > > + *  pointer array which is used to keep flushed packets.
> > > + * @param nb_out
> > > + *  the maximum element number of out.
> > > + * @return
> > > + *  the number of packets that are actually flushed.
> > > + */
> > > +uint16_t
> > > +gro_tcp_tbl_timeout_flush(struct gro_tcp_tbl *tbl,
> > > +		uint64_t timeout_cycles,
> > > +		struct rte_mbuf **out,
> > > +		const uint16_t nb_out);
> > > +#endif

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v5 2/3] lib/gro: add TCP/IPv4 GRO support
  2017-06-20  3:22             ` Jiayu Hu
  2017-06-20 15:15               ` Ananyev, Konstantin
@ 2017-06-20 15:21               ` Ananyev, Konstantin
  2017-06-20 23:30               ` Tan, Jianfeng
  2 siblings, 0 replies; 141+ messages in thread
From: Ananyev, Konstantin @ 2017-06-20 15:21 UTC (permalink / raw)
  To: Hu, Jiayu, Tan, Jianfeng; +Cc: dev, yliu, Wiles, Keith, Bie, Tiwei, Yao, Lei A



> -----Original Message-----
> From: Hu, Jiayu
> Sent: Tuesday, June 20, 2017 4:22 AM
> To: Tan, Jianfeng <jianfeng.tan@intel.com>
> Cc: dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>; yliu@fridaylinux.org; Wiles, Keith <keith.wiles@intel.com>; Bie,
> Tiwei <tiwei.bie@intel.com>; Yao, Lei A <lei.a.yao@intel.com>
> Subject: Re: [PATCH v5 2/3] lib/gro: add TCP/IPv4 GRO support
> 
> Hi Jianfeng,
> 
> On Mon, Jun 19, 2017 at 11:43:20PM +0800, Tan, Jianfeng wrote:
> >
> >
> > On 6/18/2017 3:21 PM, Jiayu Hu wrote:
> > > In this patch, we introduce six APIs to support TCP/IPv4 GRO.
> >
> > Those functions are not used outside of this library. Don't make them
> > externally visible.
> 
> But they are called by functions in rte_gro.c, which are in a different
> file. If we define these functions as static, how can they be called by
> functions in another file?

Declare them in a non-public header?
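
Something along these lines, for example (the file name is only
illustrative; the point is that the header is not listed in
SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include, so it is never installed):

	/* lib/librte_gro/gro_tcp_internal.h: hypothetical private header */
	#ifndef _GRO_TCP_INTERNAL_H_
	#define _GRO_TCP_INTERNAL_H_

	void *gro_tcp_tbl_create(uint16_t socket_id,
			uint16_t max_flow_num,
			uint16_t max_item_per_flow);

	void gro_tcp_tbl_destroy(void *tbl);

	#endif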

> 
> >
> > > - gro_tcp_tbl_create: create a TCP reassembly table, which is used to
> > >      merge packets.
> >
> > Will tcp6 share the same function with tcp4? If not, please rename it to
> > gro_tcp4_tbl_create
> 
> In the TCP GRO design, TCP4 and TCP6 will share the same table structure,
> but they will have different reassembly functions. Therefore, I use
> gro_tcp_tbl_create instead of gro_tcp4_tbl_create here.
> 
> >
> > > - gro_tcp_tbl_destroy: free memory space of a TCP reassembly table.
> > > - gro_tcp_tbl_flush: flush packets in the TCP reassembly table.
> > > - gro_tcp_tbl_timeout_flush: flush timeout packets in the TCP
> > >      reassembly table.
> > > - gro_tcp4_reassemble: merge an inputted packet.
> > > - gro_tcp4_tbl_cksum_update: update TCP and IPv4 header checksums for
> > >      all merged packets in the TCP reassembly table.
> > >
> > > In TCP GRO, we use a table structure, called TCP reassembly table, to
> > > reassemble packets. Both TCP/IPv4 and TCP/IPv6 GRO use the same table
> > > structure. A TCP reassembly table includes a flow array and an item
> > > array, where the flow array is used to record flow information and the
> > > item array is used to record packet information.
> > >
> > > Each element in the flow array records the information of one flow,
> > > which includes two parts:
> > > - key: the criteria of the same flow. If packets have the same key
> > >      value, they belong to the same flow.
> > > - start_index: the index of the first incoming packet of this flow in
> > >      the item array. With start_index, we can locate the first incoming
> > >      packet of this flow.
> > > Each element in the item array records one packet information. It mainly
> > > includes two parts:
> > > - pkt: packet address
> > > - next_pkt_index: index of the next packet of the same flow in the item
> > >      array. All packets of the same flow are chained by next_pkt_index.
> > >      With next_pkt_index, we can locate all packets of the same flow
> > >      one by one.
> > >
> > > To process an incoming packet, we need three steps:
> > > a. check if the packet should be processed. Packets with the following
> > >      properties won't be processed:
> > > 	- packets without data;
> > > 	- packets with wrong checksums;
> >
> > Why do we care to check this kind of error? Can we just assume the
> > applications have already dropped the packets with wrong cksums?
> 
> Indeed, if we assume all input packets are correct, we can avoid the
> checksum checking overhead. But as a library, I think a more flexible
> way is to enable applications to tell the GRO API whether checksum
> checking is needed. For example, we can add a flag to struct rte_gro_tbl
> and struct rte_gro_param which indicates whether checksum checking is
> needed. If applications set this flag, the reassembly function won't
> check packet checksums. Otherwise, we check the checksums. What do you
> think?
> 
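> A minimal sketch of the idea (the "check_csum" field is only a
> placeholder name, not part of the current patch; the other fields are
> abridged from it):
> 
> 	struct rte_gro_param {
> 		uint64_t desired_gro_types;
> 		uint32_t max_packet_size;
> 		uint16_t max_flow_num;
> 		uint16_t max_item_per_flow;
> 		/* hypothetical: nonzero = verify checksums in reassembly */
> 		uint8_t check_csum;
> 	};
> 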
> >
> > > 	- fragmented packets.
> >
> > IP fragmented? I don't think we need to check it here either. It's the
> > application's responsibility to call librte_ip_frag first to reassemble
> > IP-fragmented packets, and then call this GRO library to merge TCP packets.
> > And this procedure should be shown in an example for other users to refer to.
> >
> > > b. traverse the flow array to find a flow which the packet belongs to.
> > >      If not find, insert a new flow and store the packet into the item
> > >      array.
> >
> > You do not store the packet now. "store the packet into the item array" ->
> > "then go to step c".
> 
> Thanks, I will update it in next patch.
> 
> >
> > > c. locate the first packet of this flow in the item array via
> > >      start_index. Then traverse all packets of this flow one by one via
> > >      next_pkt_index. If find one packet to merge with the incoming packet,
> > >      merge them but without updating checksums. If not, allocate one item
> > >      in the item array to store the incoming packet and update
> > >      next_pkt_index value.
> > >
> > > For better performance, we don't update header checksums once two
> > > packets are merged. The header checksums are updated only when packets
> > > are flushed from TCP reassembly tables.
> >
> > Why do we care to recalculate the L4 checksum when flushing? How about just
> > keeping the wrong cksum and letting the applications handle that?
> 
> Not all applications want GROed packets with wrong checksums. So I think
> a more reasonable way is to give applications a flag to tell the GRO API
> whether it needs to calculate checksums when flushing packets from the
> GRO table. What do you think?
> 
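> At flush time that would boil down to roughly the following (sketch;
> "check_csum" is the same placeholder flag as above):
> 
> 	/* update checksums of merged packets only when asked to */
> 	if (tbl->items[j].is_groed && check_csum)
> 		gro_tcp4_cksum_update(tbl->items[j].pkt);
> 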
> >
> >
> > >
> > > Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
> > > ---
> > >   lib/librte_gro/Makefile      |   1 +
> > >   lib/librte_gro/rte_gro.c     | 154 +++++++++++--
> > >   lib/librte_gro/rte_gro.h     |  34 +--
> > >   lib/librte_gro/rte_gro_tcp.c | 527 +++++++++++++++++++++++++++++++++++++++++++
> > >   lib/librte_gro/rte_gro_tcp.h | 210 +++++++++++++++++
> > >   5 files changed, 895 insertions(+), 31 deletions(-)
> > >   create mode 100644 lib/librte_gro/rte_gro_tcp.c
> > >   create mode 100644 lib/librte_gro/rte_gro_tcp.h
> > >
> > > diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
> > > index 9f4063a..3495dfc 100644
> > > --- a/lib/librte_gro/Makefile
> > > +++ b/lib/librte_gro/Makefile
> > > @@ -43,6 +43,7 @@ LIBABIVER := 1
> > >   # source files
> > >   SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
> > > +SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro_tcp.c
> >
> > Again, if it's just for tcp4, please use the name rte_gro_tcp4.c.
> 
> TCP4 and TCP6 reassembly functions will be placed in the same file,
> rte_gro_tcp.c. But currently, we don't support TCP6 GRO.
> 
> >
> > >   # install this header file
> > >   SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
> > > diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
> > > index 1bc53a2..2620ef6 100644
> > > --- a/lib/librte_gro/rte_gro.c
> > > +++ b/lib/librte_gro/rte_gro.c
> > > @@ -32,11 +32,17 @@
> > >   #include <rte_malloc.h>
> > >   #include <rte_mbuf.h>
> > > +#include <rte_ethdev.h>
> > > +#include <rte_ip.h>
> > > +#include <rte_tcp.h>
> > >   #include "rte_gro.h"
> > > +#include "rte_gro_tcp.h"
> > > -static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB];
> > > -static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NB];
> > > +static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB] = {
> > > +	gro_tcp_tbl_create, NULL};
> > > +static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NB] = {
> > > +	gro_tcp_tbl_destroy, NULL};
> > >   struct rte_gro_tbl *rte_gro_tbl_create(uint16_t socket_id,
> > >   		uint16_t max_flow_num,
> > > @@ -94,33 +100,149 @@ void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl)
> > >   }
> > >   uint16_t
> > > -rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
> > > +rte_gro_reassemble_burst(struct rte_mbuf **pkts,
> > >   		const uint16_t nb_pkts,
> > > -		const struct rte_gro_param param __rte_unused)
> > > +		const struct rte_gro_param param)
> > >   {
> > > -	return nb_pkts;
> > > +	struct ether_hdr *eth_hdr;
> > > +	struct ipv4_hdr *ipv4_hdr;
> > > +	uint16_t l3proc_type, i;
> >
> > I did not catch the variable definition here: l3proc_type -> l3_proto?
> 
> You can see it in line 158 and line 159.
> 
> >
> > > +	uint16_t nb_after_gro = nb_pkts;
> > > +	uint16_t flow_num = nb_pkts < param.max_flow_num ?
> > > +		nb_pkts : param.max_flow_num;
> > > +	uint32_t item_num = nb_pkts <
> > > +		flow_num * param.max_item_per_flow ?
> > > +		nb_pkts :
> > > +		flow_num * param.max_item_per_flow;
> > > +
> > > +	/* allocate a reassembly table for TCP/IPv4 GRO */
> > > +	uint16_t tcp_flow_num = flow_num <= GRO_TCP_TBL_MAX_FLOW_NUM ?
> > > +		flow_num : GRO_TCP_TBL_MAX_FLOW_NUM;
> > > +	uint32_t tcp_item_num = item_num <= GRO_TCP_TBL_MAX_ITEM_NUM ?
> > > +		item_num : GRO_TCP_TBL_MAX_ITEM_NUM;
> >
> > Below, the tcp4-specific logic should be in rte_gro_tcp4.c; here, as in
> > my previous comment, we iterate over the ptypes of the packets to
> > dispatch to all supported GRO engines.
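> > 
> > Roughly what I have in mind (sketch only; it assumes packet_type has
> > been filled in by the PMD or by rte_net_get_ptype()):
> > 
> > 	for (i = 0; i < nb_pkts; i++) {
> > 		uint32_t ptype = pkts[i]->packet_type;
> > 
> > 		if (RTE_ETH_IS_IPV4_HDR(ptype) &&
> > 				(ptype & RTE_PTYPE_L4_TCP) &&
> > 				(param.desired_gro_types & GRO_TCP_IPV4))
> > 			ret = gro_tcp4_reassemble(pkts[i], &tcp_tbl, &tcp_rule);
> > 		else
> > 			unprocess_pkts[unprocess_num++] = pkts[i];
> > 	}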
> 
> Sorry, I don't get the point. The table created here is used by
> gro_tcp4_reassemble when merging packets. If we don't create the table
> here, what would gro_tcp4_reassemble use to merge packets?
> 
> >
> > > +	struct gro_tcp_tbl tcp_tbl;
> > > +	struct gro_tcp_flow tcp_flows[tcp_flow_num];
> > > +	struct gro_tcp_item tcp_items[tcp_item_num];
> > > +	struct gro_tcp_rule tcp_rule;
> > > +
> > > +	struct rte_mbuf *unprocess_pkts[nb_pkts];
> > > +	uint16_t unprocess_num = 0;
> > > +	int32_t ret;
> > > +
> > > +	if (unlikely(nb_pkts <= 1))
> > > +		return nb_pkts;
> > > +
> > > +	memset(tcp_flows, 0, sizeof(struct gro_tcp_flow) *
> > > +			tcp_flow_num);
> > > +	memset(tcp_items, 0, sizeof(struct gro_tcp_item) *
> > > +			tcp_item_num);
> > > +	tcp_tbl.flows = tcp_flows;
> > > +	tcp_tbl.items = tcp_items;
> > > +	tcp_tbl.flow_num = 0;
> > > +	tcp_tbl.item_num = 0;
> > > +	tcp_tbl.max_flow_num = tcp_flow_num;
> > > +	tcp_tbl.max_item_num = tcp_item_num;
> > > +	tcp_rule.max_packet_size = param.max_packet_size;
> > > +
> > > +	for (i = 0; i < nb_pkts; i++) {
> > > +		eth_hdr = rte_pktmbuf_mtod(pkts[i], struct ether_hdr *);
> > > +		l3proc_type = rte_be_to_cpu_16(eth_hdr->ether_type);
> > > +		if (l3proc_type == ETHER_TYPE_IPv4) {
> > > +			ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
> > > +			if (ipv4_hdr->next_proto_id == IPPROTO_TCP &&
> > > +					(param.desired_gro_types &
> > > +					 GRO_TCP_IPV4)) {
> > > +				ret = gro_tcp4_reassemble(pkts[i],
> > > +						&tcp_tbl,
> > > +						&tcp_rule);
> > > +				if (ret > 0)
> > > +					nb_after_gro--;
> > > +				else if (ret < 0)
> > > +					unprocess_pkts[unprocess_num++] =
> > > +						pkts[i];
> > > +			} else
> > > +				unprocess_pkts[unprocess_num++] =
> > > +					pkts[i];
> > > +		} else
> > > +			unprocess_pkts[unprocess_num++] =
> > > +				pkts[i];
> > > +	}
> > > +
> > > +	if (nb_after_gro < nb_pkts) {
> > > +		/* update packets headers and re-arrange GROed packets */
> > > +		if (param.desired_gro_types & GRO_TCP_IPV4) {
> > > +			gro_tcp4_tbl_cksum_update(&tcp_tbl);
> > > +			for (i = 0; i < tcp_tbl.item_num; i++)
> > > +				pkts[i] = tcp_tbl.items[i].pkt;
> > > +		}
> > > +		if (unprocess_num > 0) {
> > > +			memcpy(&pkts[i], unprocess_pkts,
> > > +					sizeof(struct rte_mbuf *) *
> > > +					unprocess_num);
> > > +			i += unprocess_num;
> > > +		}
> > > +		if (nb_pkts > i)
> > > +			memset(&pkts[i], 0,
> > > +					sizeof(struct rte_mbuf *) *
> > > +					(nb_pkts - i));
> > > +	}
> > > +	return nb_after_gro;
> > >   }
> > > -int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
> > > -		struct rte_gro_tbl *gro_tbl __rte_unused)
> > > +int rte_gro_reassemble(struct rte_mbuf *pkt,
> > > +		struct rte_gro_tbl *gro_tbl)
> > >   {
> > > +	struct ether_hdr *eth_hdr;
> > > +	struct ipv4_hdr *ipv4_hdr;
> > > +	uint16_t l3proc_type;
> > > +	struct gro_tcp_rule tcp_rule;
> > > +
> > > +	if (pkt == NULL)
> > > +		return -1;
> > > +	tcp_rule.max_packet_size = gro_tbl->max_packet_size;
> > > +	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
> > > +	l3proc_type = rte_be_to_cpu_16(eth_hdr->ether_type);
> > > +	if (l3proc_type == ETHER_TYPE_IPv4) {
> > > +		ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
> > > +		if (ipv4_hdr->next_proto_id == IPPROTO_TCP &&
> > > +				(gro_tbl->desired_gro_types & GRO_TCP_IPV4)) {
> > > +			return gro_tcp4_reassemble(pkt,
> > > +					gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
> > > +					&tcp_rule);
> > > +		}
> > > +	}
> > >   	return -1;
> > >   }
> > > -uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> > > -		uint64_t desired_gro_types __rte_unused,
> > > -		uint16_t flush_num __rte_unused,
> > > -		struct rte_mbuf **out __rte_unused,
> > > -		const uint16_t max_nb_out __rte_unused)
> > > +uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl,
> > > +		uint64_t desired_gro_types,
> > > +		uint16_t flush_num,
> > > +		struct rte_mbuf **out,
> > > +		const uint16_t max_nb_out)
> > >   {
> >
> > Ditto.
> >
> > > +	desired_gro_types = desired_gro_types &
> > > +		gro_tbl->desired_gro_types;
> > > +	if (desired_gro_types & GRO_TCP_IPV4)
> > > +		return gro_tcp_tbl_flush(
> > > +				gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
> > > +				flush_num,
> > > +				out,
> > > +				max_nb_out);
> > >   	return 0;
> > >   }
> > >   uint16_t
> > > -rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> > > -		uint64_t desired_gro_types __rte_unused,
> > > -		struct rte_mbuf **out __rte_unused,
> > > -		const uint16_t max_nb_out __rte_unused)
> > > +rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl,
> > > +		uint64_t desired_gro_types,
> > > +		struct rte_mbuf **out,
> > > +		const uint16_t max_nb_out)
> > >   {
> > > +	desired_gro_types = desired_gro_types &
> > > +		gro_tbl->desired_gro_types;
> > > +	if (desired_gro_types & GRO_TCP_IPV4)
> > > +		return gro_tcp_tbl_timeout_flush(
> > > +				gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
> > > +				gro_tbl->max_timeout_cycles,
> > > +				out, max_nb_out);
> > >   	return 0;
> > >   }
> > > diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
> > > index 67bd90d..e26aa5b 100644
> > > --- a/lib/librte_gro/rte_gro.h
> > > +++ b/lib/librte_gro/rte_gro.h
> > > @@ -35,7 +35,11 @@
> > >   /* maximum number of supported GRO types */
> > >   #define GRO_TYPE_MAX_NB 64
> > > -#define GRO_TYPE_SUPPORT_NB 0	/**< current supported GRO num */
> > > +#define GRO_TYPE_SUPPORT_NB 1	/**< supported GRO types number */
> > > +
> > > +/* TCP/IPv4 GRO flag */
> > > +#define GRO_TCP_IPV4_INDEX 0
> > > +#define GRO_TCP_IPV4 (1ULL << GRO_TCP_IPV4_INDEX)
> > >   /**
> > >    * GRO table structure. DPDK GRO uses GRO table to reassemble
> > > @@ -139,9 +143,9 @@ void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl);
> > >    * @return
> > >    *  the number of packets after GROed.
> > >    */
> > > -uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
> > > -		const uint16_t nb_pkts __rte_unused,
> > > -		const struct rte_gro_param param __rte_unused);
> > > +uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts,
> > > +		const uint16_t nb_pkts,
> > > +		const struct rte_gro_param param);
> > >   /**
> > >    * This is the main reassembly API used in heavyweight mode, which
> > > @@ -164,8 +168,8 @@ uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
> > >    *  if the packet is merged successfully, return a positive value. If it
> > >    *  fails to merge, return zero. If errors happen, return a negative value.
> > >    */
> > > -int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
> > > -		struct rte_gro_tbl *gro_tbl __rte_unused);
> > > +int rte_gro_reassemble(struct rte_mbuf *pkt,
> > > +		struct rte_gro_tbl *gro_tbl);
> > >   /**
> > >    * This function flushes packets of desired GRO types from their
> > > @@ -184,11 +188,11 @@ int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
> > >    * @return
> > >    *  the number of flushed packets. If no packets are flushed, return 0.
> > >    */
> > > -uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> > > -		uint64_t desired_gro_types __rte_unused,
> > > -		uint16_t flush_num __rte_unused,
> > > -		struct rte_mbuf **out __rte_unused,
> > > -		const uint16_t max_nb_out __rte_unused);
> > > +uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl,
> > > +		uint64_t desired_gro_types,
> > > +		uint16_t flush_num,
> > > +		struct rte_mbuf **out,
> > > +		const uint16_t max_nb_out);
> > >   /**
> > >    * This function flushes the timeout packets from reassembly tables of
> > > @@ -206,8 +210,8 @@ uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> > >    * @return
> > >    *  the number of flushed packets. If no packets are flushed, return 0.
> > >    */
> > > -uint16_t rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> > > -		uint64_t desired_gro_types __rte_unused,
> > > -		struct rte_mbuf **out __rte_unused,
> > > -		const uint16_t max_nb_out __rte_unused);
> > > +uint16_t rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl,
> > > +		uint64_t desired_gro_types,
> > > +		struct rte_mbuf **out,
> > > +		const uint16_t max_nb_out);
> >
> > Do you have any cases to test this API? I don't see the following example
> > use this API. That means we are exposing an API that is never tested. I
> > don't know if we can add some experimental flag to this API. Let's seek
> > advice from others.
> 
> These flush APIs are used in heavyweight mode. But testpmd is not a good
> case for heavyweight mode. What do you think about using some unit tests
> to test them?
> 
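> A rough shape of such a unit test, assuming the test framework hands us
> a burst of valid TCP/IPv4 test mbufs (sketch only):
> 
> 	static int
> 	test_tcp_gro_flush(struct rte_mbuf **pkts, uint16_t nb_pkts)
> 	{
> 		struct gro_tcp_rule rule = { .max_packet_size = 65535 };
> 		struct gro_tcp_tbl *tbl;
> 		struct rte_mbuf *out[64];
> 		uint16_t i, nb;
> 
> 		tbl = gro_tcp_tbl_create(rte_socket_id(), 4, 8);
> 		if (tbl == NULL)
> 			return -1;
> 		for (i = 0; i < nb_pkts; i++)
> 			gro_tcp4_reassemble(pkts[i], tbl, &rule);
> 		/* a timeout of 0 cycles flushes everything in the table */
> 		nb = gro_tcp_tbl_timeout_flush(tbl, 0, out, 64);
> 		gro_tcp_tbl_destroy(tbl);
> 		return (nb > 0 && nb <= nb_pkts) ? 0 : -1;
> 	}
> 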
> >
> > >   #endif
> > > diff --git a/lib/librte_gro/rte_gro_tcp.c b/lib/librte_gro/rte_gro_tcp.c
> > > new file mode 100644
> > > index 0000000..86743cd
> > > --- /dev/null
> > > +++ b/lib/librte_gro/rte_gro_tcp.c
> > > @@ -0,0 +1,527 @@
> > > +/*-
> > > + *   BSD LICENSE
> > > + *
> > > + *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
> > > + *
> > > + *   Redistribution and use in source and binary forms, with or without
> > > + *   modification, are permitted provided that the following conditions
> > > + *   are met:
> > > + *
> > > + *     * Redistributions of source code must retain the above copyright
> > > + *       notice, this list of conditions and the following disclaimer.
> > > + *     * Redistributions in binary form must reproduce the above copyright
> > > + *       notice, this list of conditions and the following disclaimer in
> > > + *       the documentation and/or other materials provided with the
> > > + *       distribution.
> > > + *     * Neither the name of Intel Corporation nor the names of its
> > > + *       contributors may be used to endorse or promote products derived
> > > + *       from this software without specific prior written permission.
> > > + *
> > > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > > + */
> > > +
> > > +#include <rte_malloc.h>
> > > +#include <rte_mbuf.h>
> > > +#include <rte_cycles.h>
> > > +
> > > +#include <rte_ethdev.h>
> > > +#include <rte_ip.h>
> > > +#include <rte_tcp.h>
> > > +
> > > +#include "rte_gro_tcp.h"
> > > +
> > > +void *gro_tcp_tbl_create(uint16_t socket_id,
> >
> > Define it as "static". Similar to other functions.
> >
> > > +		uint16_t max_flow_num,
> > > +		uint16_t max_item_per_flow)
> > > +{
> > > +	size_t size;
> > > +	uint32_t entries_num;
> > > +	struct gro_tcp_tbl *tbl;
> > > +
> > > +	max_flow_num = max_flow_num > GRO_TCP_TBL_MAX_FLOW_NUM ?
> > > +		GRO_TCP_TBL_MAX_FLOW_NUM : max_flow_num;
> > > +
> > > +	entries_num = max_flow_num * max_item_per_flow;
> > > +	entries_num = entries_num > GRO_TCP_TBL_MAX_ITEM_NUM ?
> > > +		GRO_TCP_TBL_MAX_ITEM_NUM : entries_num;
> > > +
> > > +	if (entries_num == 0 || max_flow_num == 0)
> > > +		return NULL;
> > > +
> > > +	tbl = (struct gro_tcp_tbl *)rte_zmalloc_socket(
> > > +			__func__,
> > > +			sizeof(struct gro_tcp_tbl),
> > > +			RTE_CACHE_LINE_SIZE,
> > > +			socket_id);
> > > +
> > > +	size = sizeof(struct gro_tcp_item) * entries_num;
> > > +	tbl->items = (struct gro_tcp_item *)rte_zmalloc_socket(
> > > +			__func__,
> > > +			size,
> > > +			RTE_CACHE_LINE_SIZE,
> > > +			socket_id);
> > > +	tbl->max_item_num = entries_num;
> > > +
> > > +	size = sizeof(struct gro_tcp_flow) * max_flow_num;
> > > +	tbl->flows = (struct gro_tcp_flow *)rte_zmalloc_socket(
> > > +			__func__,
> > > +			size, RTE_CACHE_LINE_SIZE,
> > > +			socket_id);
> > > +	tbl->max_flow_num = max_flow_num;
> > > +	return tbl;
> > > +}
> > > +
> > > +void gro_tcp_tbl_destroy(void *tbl)
> > > +{
> > > +	struct gro_tcp_tbl *tcp_tbl = (struct gro_tcp_tbl *)tbl;
> > > +
> > > +	if (tcp_tbl) {
> > > +		if (tcp_tbl->items)
> > > +			rte_free(tcp_tbl->items);
> > > +		if (tcp_tbl->flows)
> > > +			rte_free(tcp_tbl->flows);
> > > +		rte_free(tcp_tbl);
> > > +	}
> > > +}
> > > +
> > > +/* update TCP header and IPv4 header checksum */
> > > +static void
> > > +gro_tcp4_cksum_update(struct rte_mbuf *pkt)
> > > +{
> > > +	uint32_t len, offset, cksum;
> > > +	struct ether_hdr *eth_hdr;
> > > +	struct ipv4_hdr *ipv4_hdr;
> > > +	struct tcp_hdr *tcp_hdr;
> > > +	uint16_t ipv4_ihl, cksum_pld;
> > > +
> > > +	if (pkt == NULL)
> > > +		return;
> > > +
> > > +	len = pkt->pkt_len;
> > > +	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
> > > +	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
> > > +	ipv4_ihl = IPv4_HDR_LEN(ipv4_hdr);
> > > +	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl);
> > > +
> > > +	offset = sizeof(struct ether_hdr) + ipv4_ihl;
> > > +	len -= offset;
> > > +
> > > +	/* TCP cksum without IP pseudo header */
> > > +	ipv4_hdr->hdr_checksum = 0;
> > > +	tcp_hdr->cksum = 0;
> > > +	rte_raw_cksum_mbuf(pkt, offset, len, &cksum_pld);
> > > +
> > > +	/* IP pseudo header cksum */
> > > +	cksum = cksum_pld;
> > > +	cksum += rte_ipv4_phdr_cksum(ipv4_hdr, 0);
> > > +
> > > +	/* combine TCP checksum and IP pseudo header checksum */
> > > +	cksum = ((cksum & 0xffff0000) >> 16) + (cksum & 0xffff);
> > > +	cksum = (~cksum) & 0xffff;
> > > +	cksum = (cksum == 0) ? 0xffff : cksum;
> > > +	tcp_hdr->cksum = cksum;
> > > +
> > > +	/* update IP header cksum */
> > > +	ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);
> > > +}
> > > +
> > > +void gro_tcp4_tbl_cksum_update(struct gro_tcp_tbl *tbl)
> > > +{
> > > +	uint32_t i;
> > > +	uint32_t item_num = tbl->item_num;
> > > +
> > > +	for (i = 0; i < tbl->max_item_num; i++) {
> > > +		if (tbl->items[i].is_valid) {
> > > +			item_num--;
> > > +			if (tbl->items[i].is_groed)
> > > +				gro_tcp4_cksum_update(tbl->items[i].pkt);
> > > +		}
> > > +		if (unlikely(item_num == 0))
> > > +			break;
> > > +	}
> > > +}
> > > +
> > > +/**
> > > + * merge two TCP/IPv4 packets without updating header checksums.
> > > + */
> > > +static int
> > > +merge_two_tcp4_packets(struct rte_mbuf *pkt_src,
> > > +		struct rte_mbuf *pkt,
> > > +		struct gro_tcp_rule *rule)
> > > +{
> > > +	struct ipv4_hdr *ipv4_hdr1, *ipv4_hdr2;
> > > +	struct tcp_hdr *tcp_hdr1;
> > > +	uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1;
> > > +	struct rte_mbuf *tail;
> > > +
> > > +	/* parse the given packet */
> > > +	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
> > > +				struct ether_hdr *) + 1);
> > > +	ipv4_ihl1 = IPv4_HDR_LEN(ipv4_hdr1);
> > > +	tcp_hdr1 = (struct tcp_hdr *)((char *)ipv4_hdr1 + ipv4_ihl1);
> > > +	tcp_hl1 = TCP_HDR_LEN(tcp_hdr1);
> > > +	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) - ipv4_ihl1
> > > +		- tcp_hl1;
> > > +
> > > +	/* parse the original packet */
> > > +	ipv4_hdr2 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt_src,
> > > +				struct ether_hdr *) + 1);
> > > +
> > > +	/* check reassembly rules */
> > > +	if (pkt_src->pkt_len + tcp_dl1 > rule->max_packet_size)
> > > +		return -1;
> > > +
> > > +	/* remove the header of the incoming packet */
> > > +	rte_pktmbuf_adj(pkt, sizeof(struct ether_hdr) +
> > > +			ipv4_ihl1 + tcp_hl1);
> > > +
> > > +	/* chain the two packets together */
> > > +	tail = rte_pktmbuf_lastseg(pkt_src);
> > > +	tail->next = pkt;
> > > +
> > > +	/* update IP header */
> > > +	ipv4_hdr2->total_length = rte_cpu_to_be_16(
> > > +			rte_be_to_cpu_16(
> > > +				ipv4_hdr2->total_length)
> > > +			+ tcp_dl1);
> > > +
> > > +	/* update mbuf metadata for the merged packet */
> > > +	pkt_src->nb_segs++;
> > > +	pkt_src->pkt_len += pkt->pkt_len;
> > > +	return 1;
> > > +}
> > > +
> > > +static int
> > > +check_seq_option(struct rte_mbuf *pkt,
> > > +		struct tcp_hdr *tcp_hdr,
> > > +		uint16_t tcp_hl)
> > > +{
> > > +	struct ipv4_hdr *ipv4_hdr1;
> > > +	struct tcp_hdr *tcp_hdr1;
> > > +	uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1;
> > > +	uint32_t sent_seq1, sent_seq;
> > > +	int ret = -1;
> > > +
> > > +	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
> > > +				struct ether_hdr *) + 1);
> > > +	ipv4_ihl1 = IPv4_HDR_LEN(ipv4_hdr1);
> > > +	tcp_hdr1 = (struct tcp_hdr *)((char *)ipv4_hdr1 + ipv4_ihl1);
> > > +	tcp_hl1 = TCP_HDR_LEN(tcp_hdr1);
> > > +	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) - ipv4_ihl1
> > > +		- tcp_hl1;
> > > +	sent_seq1 = rte_be_to_cpu_32(tcp_hdr1->sent_seq) + tcp_dl1;
> > > +	sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq);
> > > +
> > > +	/* check if the two packets are neighbors */
> > > +	if ((sent_seq ^ sent_seq1) == 0) {
> > > +		/* check if the TCP option fields are equal */
> > > +		if ((tcp_hl1 == tcp_hl) &&
> > > +				((tcp_hl == sizeof(struct tcp_hdr)) ||
> > > +				 (memcmp(tcp_hdr1 + 1,
> > > +						 tcp_hdr + 1,
> > > +						 tcp_hl - sizeof
> > > +						 (struct tcp_hdr))
> > > +				  == 0)))
> > > +			ret = 1;
> > > +	}
> > > +	return ret;
> > > +}
> > > +
> > > +static uint32_t
> > > +find_an_empty_item(struct gro_tcp_tbl *tbl)
> > > +{
> > > +	uint32_t i;
> > > +
> > > +	for (i = 0; i < tbl->max_item_num; i++)
> > > +		if (tbl->items[i].is_valid == 0)
> > > +			return i;
> > > +	return INVALID_ITEM_INDEX;
> > > +}
> > > +
> > > +static uint16_t
> > > +find_an_empty_flow(struct gro_tcp_tbl *tbl)
> > > +{
> > > +	uint16_t i;
> > > +
> > > +	for (i = 0; i < tbl->max_flow_num; i++)
> > > +		if (tbl->flows[i].is_valid == 0)
> > > +			return i;
> > > +	return INVALID_FLOW_INDEX;
> > > +}
> > > +
> > > +int32_t
> > > +gro_tcp4_reassemble(struct rte_mbuf *pkt,
> > > +		struct gro_tcp_tbl *tbl,
> > > +		struct gro_tcp_rule *rule)
> > > +{
> > > +	struct ether_hdr *eth_hdr;
> > > +	struct ipv4_hdr *ipv4_hdr;
> > > +	struct tcp_hdr *tcp_hdr;
> > > +	uint16_t ipv4_ihl, tcp_hl, tcp_dl, tcp_cksum, ip_cksum;
> > > +
> > > +	struct gro_tcp_flow_key key;
> > > +	uint64_t ol_flags;
> > > +	uint32_t cur_idx, prev_idx, item_idx;
> > > +	uint16_t i, flow_idx;
> > > +
> > > +	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
> > > +	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
> > > +	ipv4_ihl = IPv4_HDR_LEN(ipv4_hdr);
> > > +
> > > +	/* 1. check if the packet should be processed */
> > > +	if (ipv4_ihl < sizeof(struct ipv4_hdr))
> > > +		goto fail;
> > > +	if (ipv4_hdr->next_proto_id != IPPROTO_TCP)
> > > +		goto fail;
> > > +	if ((ipv4_hdr->fragment_offset &
> > > +				rte_cpu_to_be_16(IPV4_HDR_DF_MASK))
> > > +			== 0)
> > > +		goto fail;
> > > +
> > > +	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl);
> > > +	tcp_hl = TCP_HDR_LEN(tcp_hdr);
> > > +	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - ipv4_ihl
> > > +		- tcp_hl;
> > > +	if (tcp_dl == 0)
> > > +		goto fail;
> > > +
> > > +	/**
> > > +	 * 2. if HW rx checksum offload isn't enabled, recalculate the
> > > +	 * checksum in SW. Then, check if the checksum is correct
> > > +	 */
> > > +	ol_flags = pkt->ol_flags;
> > > +	if ((ol_flags & PKT_RX_IP_CKSUM_MASK) !=
> > > +			PKT_RX_IP_CKSUM_UNKNOWN) {
> > > +		if ((ol_flags & PKT_RX_IP_CKSUM_MASK) ==
> > > +				PKT_RX_IP_CKSUM_BAD)
> > > +			goto fail;
> > > +	} else {
> > > +		ip_cksum = ipv4_hdr->hdr_checksum;
> > > +		ipv4_hdr->hdr_checksum = 0;
> > > +		ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);
> > > +		if (ipv4_hdr->hdr_checksum ^ ip_cksum)
> > > +			goto fail;
> > > +	}
> > > +
> > > +	if ((ol_flags & PKT_RX_L4_CKSUM_MASK) !=
> > > +			PKT_RX_L4_CKSUM_UNKNOWN) {
> > > +		if ((ol_flags & PKT_RX_L4_CKSUM_MASK) ==
> > > +				PKT_RX_L4_CKSUM_BAD)
> > > +			goto fail;
> > > +	} else {
> > > +		tcp_cksum = tcp_hdr->cksum;
> > > +		tcp_hdr->cksum = 0;
> > > +		tcp_hdr->cksum = rte_ipv4_udptcp_cksum
> > > +			(ipv4_hdr, tcp_hdr);
> > > +		if (tcp_hdr->cksum ^ tcp_cksum)
> > > +			goto fail;
> > > +	}
> > > +
> > > +	/**
> > > +	 * 3. search for a flow and traverse all packets in the flow
> > > +	 * to find one to merge with the given packet.
> > > +	 */
> > > +	key.eth_saddr = eth_hdr->s_addr;
> > > +	key.eth_daddr = eth_hdr->d_addr;
> > > +	key.ip_src_addr[0] = rte_be_to_cpu_32(ipv4_hdr->src_addr);
> > > +	key.ip_dst_addr[0] = rte_be_to_cpu_32(ipv4_hdr->dst_addr);
> > > +	key.src_port = rte_be_to_cpu_16(tcp_hdr->src_port);
> > > +	key.dst_port = rte_be_to_cpu_16(tcp_hdr->dst_port);
> > > +	key.recv_ack = rte_be_to_cpu_32(tcp_hdr->recv_ack);
> > > +	key.tcp_flags = tcp_hdr->tcp_flags;
> > > +
> > > +	for (i = 0; i < tbl->max_flow_num; i++) {
> > > +		/* search all packets in a valid flow. */
> > > +		if (tbl->flows[i].is_valid &&
> > > +				(memcmp(&(tbl->flows[i].key), &key,
> > > +						sizeof(struct gro_tcp_flow_key))
> > > +				 == 0)) {
> > > +			cur_idx = tbl->flows[i].start_index;
> > > +			prev_idx = cur_idx;
> > > +			while (cur_idx != INVALID_ITEM_INDEX) {
> > > +				if (check_seq_option(tbl->items[cur_idx].pkt,
> > > +							tcp_hdr,
> > > +							tcp_hl) > 0) {
> > > +					if (merge_two_tcp4_packets(
> > > +								tbl->items[cur_idx].pkt,
> > > +								pkt,
> > > +								rule) > 0) {
> > > +						/* successfully merge two packets */
> > > +						tbl->items[cur_idx].is_groed = 1;
> > > +						return 1;
> > > +					}
> > > +					/**
> > > +					 * failed to merge the two packets
> > > +					 * since merging would break the
> > > +					 * rules; add the packet into the
> > > +					 * flow.
> > > +					 */
> > > +					goto insert_to_existed_flow;
> > > +				} else {
> > > +					prev_idx = cur_idx;
> > > +					cur_idx = tbl->items[cur_idx].next_pkt_idx;
> > > +				}
> > > +			}
> > > +			/**
> > > +			 * failed to merge the given packet into an existing
> > > +			 * flow; add it into the flow.
> > > +			 */
> > > +insert_to_existed_flow:
> > > +			item_idx = find_an_empty_item(tbl);
> > > +			/* the item number is beyond the maximum value */
> > > +			if (item_idx == INVALID_ITEM_INDEX)
> > > +				return -1;
> > > +			tbl->items[prev_idx].next_pkt_idx = item_idx;
> > > +			tbl->items[item_idx].pkt = pkt;
> > > +			tbl->items[item_idx].is_groed = 0;
> > > +			tbl->items[item_idx].next_pkt_idx = INVALID_ITEM_INDEX;
> > > +			tbl->items[item_idx].is_valid = 1;
> > > +			tbl->items[item_idx].start_time = rte_rdtsc();
> > > +			tbl->item_num++;
> > > +			return 0;
> > > +		}
> > > +	}
> > > +
> > > +	/**
> > > +	 * the merge failed as the given packet belongs to a new flow.
> > > +	 * Therefore, insert a new flow.
> > > +	 */
> > > +	item_idx = find_an_empty_item(tbl);
> > > +	flow_idx = find_an_empty_flow(tbl);
> > > +	/**
> > > +	 * if the flow or item number is beyond the maximum value,
> > > +	 * the input packet won't be processed.
> > > +	 */
> > > +	if (item_idx == INVALID_ITEM_INDEX ||
> > > +			flow_idx == INVALID_FLOW_INDEX)
> > > +		return -1;
> > > +	tbl->items[item_idx].pkt = pkt;
> > > +	tbl->items[item_idx].next_pkt_idx = INVALID_ITEM_INDEX;
> > > +	tbl->items[item_idx].is_groed = 0;
> > > +	tbl->items[item_idx].is_valid = 1;
> > > +	tbl->items[item_idx].start_time = rte_rdtsc();
> > > +	tbl->item_num++;
> > > +
> > > +	memcpy(&(tbl->flows[flow_idx].key),
> > > +			&key, sizeof(struct gro_tcp_flow_key));
> > > +	tbl->flows[flow_idx].start_index = item_idx;
> > > +	tbl->flows[flow_idx].is_valid = 1;
> > > +	tbl->flow_num++;
> > > +
> > > +	return 0;
> > > +fail:
> > > +	return -1;
> > > +}
> > > +
> > > +uint16_t gro_tcp_tbl_flush(struct gro_tcp_tbl *tbl,
> > > +		uint16_t flush_num,
> > > +		struct rte_mbuf **out,
> > > +		const uint16_t nb_out)
> > > +{
> > > +	uint16_t num, k;
> > > +	uint16_t i;
> > > +	uint32_t j;
> > > +
> > > +	k = 0;
> > > +	num = tbl->item_num > flush_num ? flush_num : tbl->item_num;
> > > +	num = num > nb_out ? nb_out : num;
> > > +	if (unlikely(num == 0))
> > > +		return 0;
> > > +
> > > +	for (i = 0; i < tbl->max_flow_num; i++) {
> > > +		if (tbl->flows[i].is_valid) {
> > > +			j = tbl->flows[i].start_index;
> > > +			while (j != INVALID_ITEM_INDEX) {
> > > +				/* update checksum for GROed packet */
> > > +				if (tbl->items[j].is_groed)
> > > +					gro_tcp4_cksum_update(tbl->items[j].pkt);
> > > +
> > > +				out[k++] = tbl->items[j].pkt;
> > > +				tbl->items[j].is_valid = 0;
> > > +				tbl->item_num--;
> > > +				j = tbl->items[j].next_pkt_idx;
> > > +
> > > +				if (k == num) {
> > > +					/* delete the flow */
> > > +					if (j == INVALID_ITEM_INDEX) {
> > > +						tbl->flows[i].is_valid = 0;
> > > +						tbl->flow_num--;
> > > +					} else
> > > +						/* update flow information */
> > > +						tbl->flows[i].start_index = j;
> > > +					goto end;
> > > +				}
> > > +			}
> > > +			/* delete the flow, as all of its packets are flushed */
> > > +			tbl->flows[i].is_valid = 0;
> > > +			tbl->flow_num--;
> > > +		}
> > > +		if (tbl->flow_num == 0)
> > > +			goto end;
> > > +	}
> > > +end:
> > > +	return num;
> > > +}
> > > +
> > > +uint16_t
> > > +gro_tcp_tbl_timeout_flush(struct gro_tcp_tbl *tbl,
> > > +		uint64_t timeout_cycles,
> > > +		struct rte_mbuf **out,
> > > +		const uint16_t nb_out)
> > > +{
> > > +	uint16_t k;
> > > +	uint16_t i;
> > > +	uint32_t j;
> > > +	uint64_t current_time;
> > > +
> > > +	if (nb_out == 0)
> > > +		return 0;
> > > +	k = 0;
> > > +	current_time = rte_rdtsc();
> > > +
> > > +	for (i = 0; i < tbl->max_flow_num; i++) {
> > > +		if (tbl->flows[i].is_valid) {
> > > +			j = tbl->flows[i].start_index;
> > > +			while (j != INVALID_ITEM_INDEX) {
> > > +				if (current_time - tbl->items[j].start_time >=
> > > +						timeout_cycles) {
> > > +					/* update checksum for GROed packet */
> > > +					if (tbl->items[j].is_groed)
> > > +						gro_tcp4_cksum_update(tbl->items[j].pkt);
> > > +
> > > +					out[k++] = tbl->items[j].pkt;
> > > +					tbl->items[j].is_valid = 0;
> > > +					tbl->item_num--;
> > > +					j = tbl->items[j].next_pkt_idx;
> > > +
> > > +					if (k == nb_out &&
> > > +							j == INVALID_ITEM_INDEX) {
> > > +						/* delete the flow */
> > > +						tbl->flows[i].is_valid = 0;
> > > +						tbl->flow_num--;
> > > +						goto end;
> > > +					} else if (k == nb_out &&
> > > +							j != INVALID_ITEM_INDEX) {
> > > +						tbl->flows[i].start_index = j;
> > > +						goto end;
> > > +					}
> > > +				} else {
> > > +					/* the rest of this flow hasn't
> > > +					 * timed out; keep it in the table
> > > +					 */
> > > +					tbl->flows[i].start_index = j;
> > > +					break;
> > > +				}
> > > +			}
> > > +			/* delete the flow, as all of its packets are flushed */
> > > +			if (j == INVALID_ITEM_INDEX) {
> > > +				tbl->flows[i].is_valid = 0;
> > > +				tbl->flow_num--;
> > > +			}
> > > +		}
> > > +		if (tbl->flow_num == 0)
> > > +			goto end;
> > > +	}
> > > +end:
> > > +	return k;
> > > +}
> > > diff --git a/lib/librte_gro/rte_gro_tcp.h b/lib/librte_gro/rte_gro_tcp.h
> > > new file mode 100644
> > > index 0000000..551efc4
> > > --- /dev/null
> > > +++ b/lib/librte_gro/rte_gro_tcp.h
> > > @@ -0,0 +1,210 @@
> > > +/*-
> > > + *   BSD LICENSE
> > > + *
> > > + *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
> > > + *
> > > + *   Redistribution and use in source and binary forms, with or without
> > > + *   modification, are permitted provided that the following conditions
> > > + *   are met:
> > > + *
> > > + *     * Redistributions of source code must retain the above copyright
> > > + *       notice, this list of conditions and the following disclaimer.
> > > + *     * Redistributions in binary form must reproduce the above copyright
> > > + *       notice, this list of conditions and the following disclaimer in
> > > + *       the documentation and/or other materials provided with the
> > > + *       distribution.
> > > + *     * Neither the name of Intel Corporation nor the names of its
> > > + *       contributors may be used to endorse or promote products derived
> > > + *       from this software without specific prior written permission.
> > > + *
> > > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > > + */
> > > +
> > > +#ifndef _RTE_GRO_TCP_H_
> > > +#define _RTE_GRO_TCP_H_
> > > +
> > > +/* version_ihl and data_off are single bytes, so their nibble
> > > + * layout does not depend on the host byte order
> > > + */
> > > +#define TCP_HDR_LEN(tcph) \
> > > +	((tcph->data_off >> 4) * 4)
> > > +#define IPv4_HDR_LEN(iph) \
> > > +	((iph->version_ihl & 0x0f) * 4)
> > > +
> > > +#define IPV4_HDR_DF_SHIFT 14
> > > +#define IPV4_HDR_DF_MASK (1 << IPV4_HDR_DF_SHIFT)
> > > +
> > > +#define INVALID_FLOW_INDEX 0xffffU
> > > +#define INVALID_ITEM_INDEX 0xffffffffUL
> > > +
> > > +#define GRO_TCP_TBL_MAX_FLOW_NUM (UINT16_MAX - 1)
> > > +#define GRO_TCP_TBL_MAX_ITEM_NUM (UINT32_MAX - 1)
> > > +
> > > +/* criteria for merging packets */
> > > +struct gro_tcp_flow_key {
> > > +	struct ether_addr eth_saddr;
> > > +	struct ether_addr eth_daddr;
> > > +	uint32_t ip_src_addr[4];	/**< IPv4 uses the first 8B */
> > > +	uint32_t ip_dst_addr[4];
> > > +
> > > +	uint32_t recv_ack;	/**< acknowledgment sequence number. */
> > > +	uint16_t src_port;
> > > +	uint16_t dst_port;
> > > +	uint8_t tcp_flags;	/**< TCP flags. */
> > > +};
> > > +
> > > +struct gro_tcp_flow {
> > > +	struct gro_tcp_flow_key key;
> > > +	uint32_t start_index;	/**< the first packet index of the flow */
> > > +	uint8_t is_valid;
> > > +};
> > > +
> > > +struct gro_tcp_item {
> > > +	struct rte_mbuf *pkt;	/**< packet address. */
> > > +	/* the time when the packet is added into the table */
> > > +	uint64_t start_time;
> > > +	uint32_t next_pkt_idx;	/**< next packet index. */
> > > +	/* flag to indicate if the packet is GROed */
> > > +	uint8_t is_groed;
> > > +	uint8_t is_valid;	/**< flag indicates if the item is valid */
> > > +};
> > > +
> > > +/**
> > > + * TCP reassembly table. Both TCP/IPv4 and TCP/IPv6 use the same table
> > > + * structure.
> > > + */
> > > +struct gro_tcp_tbl {
> > > +	struct gro_tcp_item *items;	/**< item array */
> > > +	struct gro_tcp_flow *flows;	/**< flow array */
> > > +	uint32_t item_num;	/**< current item number */
> > > +	uint16_t flow_num;	/**< current flow num */
> > > +	uint32_t max_item_num;	/**< item array size */
> > > +	uint16_t max_flow_num;	/**< flow array size */
> > > +};
> > > +
> > > +/* rules to reassemble TCP packets, which are decided by applications */
> > > +struct gro_tcp_rule {
> > > +	/* the maximum length of a merged packet */
> > > +	uint32_t max_packet_size;
> > > +};
> >
> > Are there any other rules? If not, I prefer to use max_packet_size directly.
> 
> If we agree to use a flag to indicate whether to check checksums, this
> structure can be used to keep that flag.
> 
> >
> > > +
> > > +/**
> > > + * This function updates TCP and IPv4 header checksums
> > > + * for merged packets in the TCP reassembly table.
> > > + */
> > > +void gro_tcp4_tbl_cksum_update(struct gro_tcp_tbl *tbl);
> > > +
> > > +/**
> > > + * This function creates a TCP reassembly table.
> > > + *
> > > + * @param socket_id
> > > + *  socket index where the Ethernet port connects to.
> > > + * @param max_flow_num
> > > + *  the maximum number of flows in the TCP GRO table
> > > + * @param max_item_per_flow
> > > + *  the maximum packet number per flow.
> > > + * @return
> > > + *  on success, return a pointer to the created TCP GRO table.
> > > + *  Otherwise, return NULL.
> > > + */
> > > +void *gro_tcp_tbl_create(uint16_t socket_id,
> > > +		uint16_t max_flow_num,
> > > +		uint16_t max_item_per_flow);
> > > +
> > > +/**
> > > + * This function destroys a TCP reassembly table.
> > > + * @param tbl
> > > + *  a pointer that points to the TCP reassembly table.
> > > + */
> > > +void gro_tcp_tbl_destroy(void *tbl);
> > > +
> > > +/**
> > > + * This function searches for a packet in the TCP reassembly table to
> > > + * merge with the input one. Merging two packets means chaining them
> > > + * together and updating the packet headers. Note that this function
> > > + * won't re-calculate IPv4 and TCP checksums.
> > > + *
> > > + * If the packet has no data, has wrong checksums, or is fragmented
> > > + * etc., an error happens and gro_tcp4_reassemble returns
> > > + * immediately. If no errors happen, the packet is either merged, or
> > > + * inserted into the reassembly table.
> > > + *
> > > + * If applications want to get packets in the reassembly table, they
> > > + * need to manually flush the packets.
> > > + *
> > > + * @param pkt
> > > + *  packet to reassemble.
> > > + * @param tbl
> > > + *  a pointer that points to a TCP reassembly table.
> > > + * @param rule
> > > + *  TCP reassembly criteria defined by applications.
> > > + * @return
> > > + *  if the input packet is merged successfully, return a positive
> > > + *  value. If the packet hasn't been merged with any packet in the TCP
> > > + *  reassembly table, return zero. If errors happen, return a negative
> > > + *  value and the packet won't be inserted into the reassembly table.
> > > + */
> > > +int32_t
> > > +gro_tcp4_reassemble(struct rte_mbuf *pkt,
> > > +		struct gro_tcp_tbl *tbl,
> > > +		struct gro_tcp_rule *rule);
> > > +
> > > +/**
> > > + * This function flushes the packets in a TCP reassembly table to
> > > + * applications. Before returning the packets, it will update TCP and
> > > + * IPv4 header checksums.
> > > + *
> > > + * @param tbl
> > > + *  a pointer that points to a TCP GRO table.
> > > + * @param flush_num
> > > + *  the number of packets that applications want to flush.
> > > + * @param out
> > > + *  pointer array which is used to keep flushed packets.
> > > + * @param nb_out
> > > + *  the maximum element number of out.
> > > + * @return
> > > + *  the number of packets that are actually flushed.
> > > + */
> > > +uint16_t
> > > +gro_tcp_tbl_flush(struct gro_tcp_tbl *tbl,
> > > +		uint16_t flush_num,
> > > +		struct rte_mbuf **out,
> > > +		const uint16_t nb_out);
> > > +
> > > +/**
> > > + * This function flushes timeout packets in a TCP reassembly table to
> > > + * applications. Before returning the packets, it updates TCP and IPv4
> > > + * header checksums.
> > > + *
> > > + * @param tbl
> > > + *  a pointer that points to a TCP GRO table.
> > > + * @param timeout_cycles
> > > + *  the maximum time that packets can stay in the table.
> > > + * @param out
> > > + *  pointer array which is used to keep flushed packets.
> > > + * @param nb_out
> > > + *  the maximum element number of out.
> > > + * @return
> > > + *  the number of packets that are actually flushed.
> > > + */
> > > +uint16_t
> > > +gro_tcp_tbl_timeout_flush(struct gro_tcp_tbl *tbl,
> > > +		uint64_t timeout_cycles,
> > > +		struct rte_mbuf **out,
> > > +		const uint16_t nb_out);
> > > +#endif

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v5 2/3] lib/gro: add TCP/IPv4 GRO support
  2017-06-20 15:15               ` Ananyev, Konstantin
@ 2017-06-20 16:16                 ` Jiayu Hu
  0 siblings, 0 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-06-20 16:16 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: Tan, Jianfeng, dev, yliu, Wiles, Keith, Bie, Tiwei, Yao, Lei A

On Tue, Jun 20, 2017 at 11:15:58PM +0800, Ananyev, Konstantin wrote:

Hi Konstantin,

> Hi Jiayu,
> 
> > -----Original Message-----
> > From: Hu, Jiayu
> > Sent: Tuesday, June 20, 2017 4:22 AM
> > To: Tan, Jianfeng <jianfeng.tan@intel.com>
> > Cc: dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>; yliu@fridaylinux.org; Wiles, Keith <keith.wiles@intel.com>; Bie,
> > Tiwei <tiwei.bie@intel.com>; Yao, Lei A <lei.a.yao@intel.com>
> > Subject: Re: [PATCH v5 2/3] lib/gro: add TCP/IPv4 GRO support
> > 
> > Hi Jianfeng,
> > 
> > On Mon, Jun 19, 2017 at 11:43:20PM +0800, Tan, Jianfeng wrote:
> > >
> > >
> > > On 6/18/2017 3:21 PM, Jiayu Hu wrote:
> > > > In this patch, we introduce six APIs to support TCP/IPv4 GRO.
> > >
> > > Those functions are not used outside of this library. Don't make them
> > > externally visible.
> > 
> > But they are called by functions in rte_gro.c, which are in a different
> > file. If we define these functions as static, how can they be called by
> > functions in another file?
> > 
> > >
> > > > - gro_tcp_tbl_create: create a TCP reassembly table, which is used to
> > > >      merge packets.
> > >
> > > Will tcp6 share the same function with tcp4? If not, please rename it to
> > > gro_tcp4_tbl_create
> > 
> > In the TCP GRO design, TCP4 and TCP6 will share the same table structure,
> > but they will have different reassembly functions. Therefore, I use
> > gro_tcp_tbl_create instead of gro_tcp4_tbl_create here.
> > 
> > >
> > > > - gro_tcp_tbl_destroy: free memory space of a TCP reassembly table.
> > > > - gro_tcp_tbl_flush: flush packets in the TCP reassembly table.
> > > > - gro_tcp_tbl_timeout_flush: flush timeout packets in the TCP
> > > >      reassembly table.
> > > > - gro_tcp4_reassemble: merge an inputted packet.
> > > > - gro_tcp4_tbl_cksum_update: update TCP and IPv4 header checksums for
> > > >      all merged packets in the TCP reassembly table.
> > > >
> > > > In TCP GRO, we use a table structure, called TCP reassembly table, to
> > > > reassemble packets. Both TCP/IPv4 and TCP/IPv6 GRO use the same table
> > > > structure. A TCP reassembly table includes a flow array and an item
> > > > array, where the flow array is used to record flow information and the
> > > > item array is used to record packet information.
> > > >
> > > > Each element in the flow array records the information of one flow,
> > > > which includes two parts:
> > > > - key: the criteria of the same flow. If packets have the same key
> > > >      value, they belong to the same flow.
> > > > - start_index: the index of the first incoming packet of this flow in
> > > >      the item array. With start_index, we can locate the first incoming
> > > >      packet of this flow.
> > > > Each element in the item array records one packet information. It mainly
> > > > includes two parts:
> > > > - pkt: packet address
> > > > - next_pkt_index: index of the next packet of the same flow in the item
> > > >      array. All packets of the same flow are chained by next_pkt_index.
> > > >      With next_pkt_index, we can locate all packets of the same flow
> > > >      one by one.
> > > >
> > > > To process an incoming packet, we need three steps:
> > > > a. check if the packet should be processed. Packets with the following
> > > >      properties won't be processed:
> > > > 	- packets without data;
> > > > 	- packets with wrong checksums;
> > >
> > > Why do we care to check this kind of error? Can we just assume the
> > > applications have already dropped the packets with wrong cksums?
> > 
> > Indeed, if we assume all input packets are correct, we can avoid the
> > checksum checking overhead. But as a library, I think a more flexible
> > way is to enable applications to tell the GRO API whether checksum
> > checking is needed. For example, we can add a flag to struct rte_gro_tbl
> > and struct rte_gro_param which indicates whether checksum checking is
> > needed. If applications set this flag, the reassembly function won't
> > check packet checksums. Otherwise, we check the checksums. What do you
> > think?
> > 
> > >
> > > > 	- fragmented packets.
> > >
> > > IP fragmented? I don't think we need to check it here either. It's the
> > > application's responsibility to call librte_ip_frag first to reassemble
> > > IP-fragmented packets, and then call this GRO library to merge TCP packets.
> > > And this procedure should be shown in an example for other users to refer to.
> > >
> > > > b. traverse the flow array to find a flow which the packet belongs to.
> > > >      If not find, insert a new flow and store the packet into the item
> > > >      array.
> > >
> > > You do not store the packet now. "store the packet into the item array" ->
> > > "then go to step c".
> > 
> > Thanks, I will update it in next patch.
> > 
> > >
> > > > c. locate the first packet of this flow in the item array via
> > > >      start_index. Then traverse all packets of this flow one by one via
> > > >      next_pkt_index. If find one packet to merge with the incoming packet,
> > > >      merge them but without updating checksums. If not, allocate one item
> > > >      in the item array to store the incoming packet and update
> > > >      next_pkt_index value.
> > > >
> > > > For better performance, we don't update header checksums once two
> > > > packets are merged. The header checksums are updated only when packets
> > > > are flushed from TCP reassembly tables.
> > >
> > > Why do we care to recalculate the L4 checksum when flushing? How about just
> > > keeping the wrong cksum and letting the applications handle that?
> > 
> > Not all applications want GROed packets with wrong checksums. So I think
> > a more reasonable way is to give applications a flag to tell the GRO API
> > whether it needs to calculate checksums when flushing packets from the
> > GRO table. What do you think?
> 
> I didn't look closely into the latest patch yet, but would echo Jianfeng's comment:
> I think TCP cksum calculation/validation should be out of the scope of this library.
> The user can do that before/after doing the actual GRO, or it might be done in HW.
> Also, I'd suggest that inside the library we add the expectation that the RX cksum flags,
> plus the ptype and l2/l3/l4 hdr len fields inside the mbuf, are already set up correctly by the caller.
> Konstantin

OK, since both you and Jianfeng think that checksum validation and
calculation are out of the scope of the GRO library, I will take this
suggestion. Thanks.
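
For reference, my understanding of the caller-side contract you describe
is something like below (just a sketch, not part of the patch; ipv4_ihl
and tcp_hl are assumed to be the parsed header lengths):

	/* guaranteed by the caller before invoking GRO APIs */
	mbuf->packet_type = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4 |
		RTE_PTYPE_L4_TCP;
	mbuf->l2_len = sizeof(struct ether_hdr);
	mbuf->l3_len = ipv4_ihl;	/* parsed IPv4 header length */
	mbuf->l4_len = tcp_hl;		/* parsed TCP header length */
	/* ol_flags already carry PKT_RX_IP_CKSUM_* / PKT_RX_L4_CKSUM_*
	 * as reported by the PMD or by a SW pre-check
	 */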


BRs,
Jiayu

> 
> > 
> > >
> > >
> > > >
> > > > Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
> > > > ---
> > > >   lib/librte_gro/Makefile      |   1 +
> > > >   lib/librte_gro/rte_gro.c     | 154 +++++++++++--
> > > >   lib/librte_gro/rte_gro.h     |  34 +--
> > > >   lib/librte_gro/rte_gro_tcp.c | 527 +++++++++++++++++++++++++++++++++++++++++++
> > > >   lib/librte_gro/rte_gro_tcp.h | 210 +++++++++++++++++
> > > >   5 files changed, 895 insertions(+), 31 deletions(-)
> > > >   create mode 100644 lib/librte_gro/rte_gro_tcp.c
> > > >   create mode 100644 lib/librte_gro/rte_gro_tcp.h
> > > >
> > > > diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
> > > > index 9f4063a..3495dfc 100644
> > > > --- a/lib/librte_gro/Makefile
> > > > +++ b/lib/librte_gro/Makefile
> > > > @@ -43,6 +43,7 @@ LIBABIVER := 1
> > > >   # source files
> > > >   SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
> > > > +SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro_tcp.c
> > >
> > > Again, if it's just for tcp4, please use the name rte_gro_tcp4.c.
> > 
> > TCP4 and TCP6 reassembly functions will be placed in the same file,
> > rte_gro_tcp.c. But currently, we don't support TCP6 GRO.
> > 
> > >
> > > >   # install this header file
> > > >   SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
> > > > diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
> > > > index 1bc53a2..2620ef6 100644
> > > > --- a/lib/librte_gro/rte_gro.c
> > > > +++ b/lib/librte_gro/rte_gro.c
> > > > @@ -32,11 +32,17 @@
> > > >   #include <rte_malloc.h>
> > > >   #include <rte_mbuf.h>
> > > > +#include <rte_ethdev.h>
> > > > +#include <rte_ip.h>
> > > > +#include <rte_tcp.h>
> > > >   #include "rte_gro.h"
> > > > +#include "rte_gro_tcp.h"
> > > > -static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB];
> > > > -static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NB];
> > > > +static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB] = {
> > > > +	gro_tcp_tbl_create, NULL};
> > > > +static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NB] = {
> > > > +	gro_tcp_tbl_destroy, NULL};
> > > >   struct rte_gro_tbl *rte_gro_tbl_create(uint16_t socket_id,
> > > >   		uint16_t max_flow_num,
> > > > @@ -94,33 +100,149 @@ void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl)
> > > >   }
> > > >   uint16_t
> > > > -rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
> > > > +rte_gro_reassemble_burst(struct rte_mbuf **pkts,
> > > >   		const uint16_t nb_pkts,
> > > > -		const struct rte_gro_param param __rte_unused)
> > > > +		const struct rte_gro_param param)
> > > >   {
> > > > -	return nb_pkts;
> > > > +	struct ether_hdr *eth_hdr;
> > > > +	struct ipv4_hdr *ipv4_hdr;
> > > > +	uint16_t l3proc_type, i;
> > >
> > > I did not catch the variable definition here: l3proc_type -> l3_proto?
> > 
> > You can see it in line 158 and line 159.
> > 
> > >
> > > > +	uint16_t nb_after_gro = nb_pkts;
> > > > +	uint16_t flow_num = nb_pkts < param.max_flow_num ?
> > > > +		nb_pkts : param.max_flow_num;
> > > > +	uint32_t item_num = nb_pkts <
> > > > +		flow_num * param.max_item_per_flow ?
> > > > +		nb_pkts :
> > > > +		flow_num * param.max_item_per_flow;
> > > > +
> > > > +	/* allocate a reassembly table for TCP/IPv4 GRO */
> > > > +	uint16_t tcp_flow_num = flow_num <= GRO_TCP_TBL_MAX_FLOW_NUM ?
> > > > +		flow_num : GRO_TCP_TBL_MAX_FLOW_NUM;
> > > > +	uint32_t tcp_item_num = item_num <= GRO_TCP_TBL_MAX_ITEM_NUM ?
> > > > +		item_num : GRO_TCP_TBL_MAX_ITEM_NUM;
> > >
> > > The tcpv4-specific logic below should be in rte_gro_tcp4.c; here, as in
> > > my previous comment, we iterate over the ptypes of the packets to invoke
> > > all supported GRO engines.
> > 
> > Sorry, I don't get the point. The table which is created here is used by
> > gro_tcp4_reassemble when it merges packets. If we don't create the table
> > here, what does gro_tcp4_reassemble use to merge packets?
> > 
> > >
> > > > +	struct gro_tcp_tbl tcp_tbl;
> > > > +	struct gro_tcp_flow tcp_flows[tcp_flow_num];
> > > > +	struct gro_tcp_item tcp_items[tcp_item_num];
> > > > +	struct gro_tcp_rule tcp_rule;
> > > > +
> > > > +	struct rte_mbuf *unprocess_pkts[nb_pkts];
> > > > +	uint16_t unprocess_num = 0;
> > > > +	int32_t ret;
> > > > +
> > > > +	if (unlikely(nb_pkts <= 1))
> > > > +		return nb_pkts;
> > > > +
> > > > +	memset(tcp_flows, 0, sizeof(struct gro_tcp_flow) *
> > > > +			tcp_flow_num);
> > > > +	memset(tcp_items, 0, sizeof(struct gro_tcp_item) *
> > > > +			tcp_item_num);
> > > > +	tcp_tbl.flows = tcp_flows;
> > > > +	tcp_tbl.items = tcp_items;
> > > > +	tcp_tbl.flow_num = 0;
> > > > +	tcp_tbl.item_num = 0;
> > > > +	tcp_tbl.max_flow_num = tcp_flow_num;
> > > > +	tcp_tbl.max_item_num = tcp_item_num;
> > > > +	tcp_rule.max_packet_size = param.max_packet_size;
> > > > +
> > > > +	for (i = 0; i < nb_pkts; i++) {
> > > > +		eth_hdr = rte_pktmbuf_mtod(pkts[i], struct ether_hdr *);
> > > > +		l3proc_type = rte_be_to_cpu_16(eth_hdr->ether_type);
> > > > +		if (l3proc_type == ETHER_TYPE_IPv4) {
> > > > +			ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
> > > > +			if (ipv4_hdr->next_proto_id == IPPROTO_TCP &&
> > > > +					(param.desired_gro_types &
> > > > +					 GRO_TCP_IPV4)) {
> > > > +				ret = gro_tcp4_reassemble(pkts[i],
> > > > +						&tcp_tbl,
> > > > +						&tcp_rule);
> > > > +				if (ret > 0)
> > > > +					nb_after_gro--;
> > > > +				else if (ret < 0)
> > > > +					unprocess_pkts[unprocess_num++] =
> > > > +						pkts[i];
> > > > +			} else
> > > > +				unprocess_pkts[unprocess_num++] =
> > > > +					pkts[i];
> > > > +		} else
> > > > +			unprocess_pkts[unprocess_num++] =
> > > > +				pkts[i];
> > > > +	}
> > > > +
> > > > +	if (nb_after_gro < nb_pkts) {
> > > > +		/* update packets headers and re-arrange GROed packets */
> > > > +		if (param.desired_gro_types & GRO_TCP_IPV4) {
> > > > +			gro_tcp4_tbl_cksum_update(&tcp_tbl);
> > > > +			for (i = 0; i < tcp_tbl.item_num; i++)
> > > > +				pkts[i] = tcp_tbl.items[i].pkt;
> > > > +		}
> > > > +		if (unprocess_num > 0) {
> > > > +			memcpy(&pkts[i], unprocess_pkts,
> > > > +					sizeof(struct rte_mbuf *) *
> > > > +					unprocess_num);
> > > > +			i += unprocess_num;
> > > > +		}
> > > > +		if (nb_pkts > i)
> > > > +			memset(&pkts[i], 0,
> > > > +					sizeof(struct rte_mbuf *) *
> > > > +					(nb_pkts - i));
> > > > +	}
> > > > +	return nb_after_gro;
> > > >   }
> > > > -int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
> > > > -		struct rte_gro_tbl *gro_tbl __rte_unused)
> > > > +int rte_gro_reassemble(struct rte_mbuf *pkt,
> > > > +		struct rte_gro_tbl *gro_tbl)
> > > >   {
> > > > +	struct ether_hdr *eth_hdr;
> > > > +	struct ipv4_hdr *ipv4_hdr;
> > > > +	uint16_t l3proc_type;
> > > > +	struct gro_tcp_rule tcp_rule;
> > > > +
> > > > +	if (pkt == NULL)
> > > > +		return -1;
> > > > +	tcp_rule.max_packet_size = gro_tbl->max_packet_size;
> > > > +	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
> > > > +	l3proc_type = rte_be_to_cpu_16(eth_hdr->ether_type);
> > > > +	if (l3proc_type == ETHER_TYPE_IPv4) {
> > > > +		ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
> > > > +		if (ipv4_hdr->next_proto_id == IPPROTO_TCP &&
> > > > +				(gro_tbl->desired_gro_types & GRO_TCP_IPV4)) {
> > > > +			return gro_tcp4_reassemble(pkt,
> > > > +					gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
> > > > +					&tcp_rule);
> > > > +		}
> > > > +	}
> > > >   	return -1;
> > > >   }
> > > > -uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> > > > -		uint64_t desired_gro_types __rte_unused,
> > > > -		uint16_t flush_num __rte_unused,
> > > > -		struct rte_mbuf **out __rte_unused,
> > > > -		const uint16_t max_nb_out __rte_unused)
> > > > +uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl,
> > > > +		uint64_t desired_gro_types,
> > > > +		uint16_t flush_num,
> > > > +		struct rte_mbuf **out,
> > > > +		const uint16_t max_nb_out)
> > > >   {
> > >
> > > Ditto.
> > >
> > > > +	desired_gro_types = desired_gro_types &
> > > > +		gro_tbl->desired_gro_types;
> > > > +	if (desired_gro_types & GRO_TCP_IPV4)
> > > > +		return gro_tcp_tbl_flush(
> > > > +				gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
> > > > +				flush_num,
> > > > +				out,
> > > > +				max_nb_out);
> > > >   	return 0;
> > > >   }
> > > >   uint16_t
> > > > -rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> > > > -		uint64_t desired_gro_types __rte_unused,
> > > > -		struct rte_mbuf **out __rte_unused,
> > > > -		const uint16_t max_nb_out __rte_unused)
> > > > +rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl,
> > > > +		uint64_t desired_gro_types,
> > > > +		struct rte_mbuf **out,
> > > > +		const uint16_t max_nb_out)
> > > >   {
> > > > +	desired_gro_types = desired_gro_types &
> > > > +		gro_tbl->desired_gro_types;
> > > > +	if (desired_gro_types & GRO_TCP_IPV4)
> > > > +		return gro_tcp_tbl_timeout_flush(
> > > > +				gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
> > > > +				gro_tbl->max_timeout_cycles,
> > > > +				out, max_nb_out);
> > > >   	return 0;
> > > >   }
> > > > diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
> > > > index 67bd90d..e26aa5b 100644
> > > > --- a/lib/librte_gro/rte_gro.h
> > > > +++ b/lib/librte_gro/rte_gro.h
> > > > @@ -35,7 +35,11 @@
> > > >   /* maximum number of supported GRO types */
> > > >   #define GRO_TYPE_MAX_NB 64
> > > > -#define GRO_TYPE_SUPPORT_NB 0	/**< current supported GRO num */
> > > > +#define GRO_TYPE_SUPPORT_NB 1	/**< supported GRO types number */
> > > > +
> > > > +/* TCP/IPv4 GRO flag */
> > > > +#define GRO_TCP_IPV4_INDEX 0
> > > > +#define GRO_TCP_IPV4 (1ULL << GRO_TCP_IPV4_INDEX)
> > > >   /**
> > > >    * GRO table structure. DPDK GRO uses GRO table to reassemble
> > > > @@ -139,9 +143,9 @@ void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl);
> > > >    * @return
> > > >    *  the number of packets after GROed.
> > > >    */
> > > > -uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
> > > > -		const uint16_t nb_pkts __rte_unused,
> > > > -		const struct rte_gro_param param __rte_unused);
> > > > +uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts,
> > > > +		const uint16_t nb_pkts,
> > > > +		const struct rte_gro_param param);
> > > >   /**
> > > >    * This is the main reassembly API used in heavyweight mode, which
> > > > @@ -164,8 +168,8 @@ uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
> > > >    *  if merge the packet successfully, return a positive value. If fail
> > > >    *  to merge, return zero. If errors happen, return a negative value.
> > > >    */
> > > > -int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
> > > > -		struct rte_gro_tbl *gro_tbl __rte_unused);
> > > > +int rte_gro_reassemble(struct rte_mbuf *pkt,
> > > > +		struct rte_gro_tbl *gro_tbl);
> > > >   /**
> > > >    * This function flushed packets of desired GRO types from their
> > > > @@ -184,11 +188,11 @@ int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
> > > >    * @return
> > > >    *  the number of flushed packets. If no packets are flushed, return 0.
> > > >    */
> > > > -uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> > > > -		uint64_t desired_gro_types __rte_unused,
> > > > -		uint16_t flush_num __rte_unused,
> > > > -		struct rte_mbuf **out __rte_unused,
> > > > -		const uint16_t max_nb_out __rte_unused);
> > > > +uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl,
> > > > +		uint64_t desired_gro_types,
> > > > +		uint16_t flush_num,
> > > > +		struct rte_mbuf **out,
> > > > +		const uint16_t max_nb_out);
> > > >   /**
> > > >    * This function flushes the timeout packets from reassembly tables of
> > > > @@ -206,8 +210,8 @@ uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> > > >    * @return
> > > >    *  the number of flushed packets. If no packets are flushed, return 0.
> > > >    */
> > > > -uint16_t rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> > > > -		uint64_t desired_gro_types __rte_unused,
> > > > -		struct rte_mbuf **out __rte_unused,
> > > > -		const uint16_t max_nb_out __rte_unused);
> > > > +uint16_t rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl,
> > > > +		uint64_t desired_gro_types,
> > > > +		struct rte_mbuf **out,
> > > > +		const uint16_t max_nb_out);
> > >
> > > Do you have any cases to test this API? I don't see the following
> > > example use this API. That means we are exposing an API that is never
> > > tested. I don't know if we can add some experimental flag to this API.
> > > Let's seek advice from others.
> > 
> > These flush APIs are used in heavyweight mode. But testpmd is not a good
> > case for heavyweight mode. What do you think about using some unit tests
> > to test them?
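> > For instance, a skeleton like this (all names hypothetical):
> > 
> > 	static int
> > 	test_gro_tcp_tbl_timeout_flush(void)
> > 	{
> > 		struct rte_mbuf *out[BURST_SIZE];
> > 		uint16_t n;
> > 
> > 		/* feed crafted TCP/IPv4 packets via rte_gro_reassemble() */
> > 		...
> > 		rte_delay_us(TIMEOUT_US);	/* let the packets time out */
> > 		n = rte_gro_timeout_flush(gro_tbl, GRO_TCP_IPV4,
> > 				out, BURST_SIZE);
> > 		return n == expected_num ? 0 : -1;
> > 	}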
> > 
> > >
> > > >   #endif
> > > > diff --git a/lib/librte_gro/rte_gro_tcp.c b/lib/librte_gro/rte_gro_tcp.c
> > > > new file mode 100644
> > > > index 0000000..86743cd
> > > > --- /dev/null
> > > > +++ b/lib/librte_gro/rte_gro_tcp.c
> > > > @@ -0,0 +1,527 @@
> > > > +/*-
> > > > + *   BSD LICENSE
> > > > + *
> > > > + *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
> > > > + *
> > > > + *   Redistribution and use in source and binary forms, with or without
> > > > + *   modification, are permitted provided that the following conditions
> > > > + *   are met:
> > > > + *
> > > > + *     * Redistributions of source code must retain the above copyright
> > > > + *       notice, this list of conditions and the following disclaimer.
> > > > + *     * Redistributions in binary form must reproduce the above copyright
> > > > + *       notice, this list of conditions and the following disclaimer in
> > > > + *       the documentation and/or other materials provided with the
> > > > + *       distribution.
> > > > + *     * Neither the name of Intel Corporation nor the names of its
> > > > + *       contributors may be used to endorse or promote products derived
> > > > + *       from this software without specific prior written permission.
> > > > + *
> > > > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > > > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > > > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > > > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > > > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > > > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > > > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > > > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > > > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > > > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > > > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > > > + */
> > > > +
> > > > +#include <rte_malloc.h>
> > > > +#include <rte_mbuf.h>
> > > > +#include <rte_cycles.h>
> > > > +
> > > > +#include <rte_ethdev.h>
> > > > +#include <rte_ip.h>
> > > > +#include <rte_tcp.h>
> > > > +
> > > > +#include "rte_gro_tcp.h"
> > > > +
> > > > +void *gro_tcp_tbl_create(uint16_t socket_id,
> > >
> > > Define it as "static". Similar to other functions.
> > >
> > > > +		uint16_t max_flow_num,
> > > > +		uint16_t max_item_per_flow)
> > > > +{
> > > > +	size_t size;
> > > > +	uint32_t entries_num;
> > > > +	struct gro_tcp_tbl *tbl;
> > > > +
> > > > +	max_flow_num = max_flow_num > GRO_TCP_TBL_MAX_FLOW_NUM ?
> > > > +		GRO_TCP_TBL_MAX_FLOW_NUM : max_flow_num;
> > > > +
> > > > +	entries_num = max_flow_num * max_item_per_flow;
> > > > +	entries_num = entries_num > GRO_TCP_TBL_MAX_ITEM_NUM ?
> > > > +		GRO_TCP_TBL_MAX_ITEM_NUM : entries_num;
> > > > +
> > > > +	if (entries_num == 0 || max_flow_num == 0)
> > > > +		return NULL;
> > > > +
> > > > +	tbl = (struct gro_tcp_tbl *)rte_zmalloc_socket(
> > > > +			__func__,
> > > > +			sizeof(struct gro_tcp_tbl),
> > > > +			RTE_CACHE_LINE_SIZE,
> > > > +			socket_id);
> > > > +
> > > > +	size = sizeof(struct gro_tcp_item) * entries_num;
> > > > +	tbl->items = (struct gro_tcp_item *)rte_zmalloc_socket(
> > > > +			__func__,
> > > > +			size,
> > > > +			RTE_CACHE_LINE_SIZE,
> > > > +			socket_id);
> > > > +	tbl->max_item_num = entries_num;
> > > > +
> > > > +	size = sizeof(struct gro_tcp_flow) * max_flow_num;
> > > > +	tbl->flows = (struct gro_tcp_flow *)rte_zmalloc_socket(
> > > > +			__func__,
> > > > +			size, RTE_CACHE_LINE_SIZE,
> > > > +			socket_id);
> > > > +	tbl->max_flow_num = max_flow_num;
> > > > +	return tbl;
> > > > +}
> > > > +
> > > > +void gro_tcp_tbl_destroy(void *tbl)
> > > > +{
> > > > +	struct gro_tcp_tbl *tcp_tbl = (struct gro_tcp_tbl *)tbl;
> > > > +
> > > > +	if (tcp_tbl) {
> > > > +		if (tcp_tbl->items)
> > > > +			rte_free(tcp_tbl->items);
> > > > +		if (tcp_tbl->flows)
> > > > +			rte_free(tcp_tbl->flows);
> > > > +		rte_free(tcp_tbl);
> > > > +	}
> > > > +}
> > > > +
> > > > +/* update TCP header and IPv4 header checksum */
> > > > +static void
> > > > +gro_tcp4_cksum_update(struct rte_mbuf *pkt)
> > > > +{
> > > > +	uint32_t len, offset, cksum;
> > > > +	struct ether_hdr *eth_hdr;
> > > > +	struct ipv4_hdr *ipv4_hdr;
> > > > +	struct tcp_hdr *tcp_hdr;
> > > > +	uint16_t ipv4_ihl, cksum_pld;
> > > > +
> > > > +	if (pkt == NULL)
> > > > +		return;
> > > > +
> > > > +	len = pkt->pkt_len;
> > > > +	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
> > > > +	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
> > > > +	ipv4_ihl = IPv4_HDR_LEN(ipv4_hdr);
> > > > +	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl);
> > > > +
> > > > +	offset = sizeof(struct ether_hdr) + ipv4_ihl;
> > > > +	len -= offset;
> > > > +
> > > > +	/* TCP cksum without IP pseudo header */
> > > > +	ipv4_hdr->hdr_checksum = 0;
> > > > +	tcp_hdr->cksum = 0;
> > > > +	rte_raw_cksum_mbuf(pkt, offset, len, &cksum_pld);
> > > > +
> > > > +	/* IP pseudo header cksum */
> > > > +	cksum = cksum_pld;
> > > > +	cksum += rte_ipv4_phdr_cksum(ipv4_hdr, 0);
> > > > +
> > > > +	/* combine TCP checksum and IP pseudo header checksum */
> > > > +	cksum = ((cksum & 0xffff0000) >> 16) + (cksum & 0xffff);
> > > > +	cksum = (~cksum) & 0xffff;
> > > > +	cksum = (cksum == 0) ? 0xffff : cksum;
> > > > +	tcp_hdr->cksum = cksum;
> > > > +
> > > > +	/* update IP header cksum */
> > > > +	ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);
> > > > +}
> > > > +
> > > > +void gro_tcp4_tbl_cksum_update(struct gro_tcp_tbl *tbl)
> > > > +{
> > > > +	uint32_t i;
> > > > +	uint32_t item_num = tbl->item_num;
> > > > +
> > > > +	for (i = 0; i < tbl->max_item_num; i++) {
> > > > +		if (tbl->items[i].is_valid) {
> > > > +			item_num--;
> > > > +			if (tbl->items[i].is_groed)
> > > > +				gro_tcp4_cksum_update(tbl->items[i].pkt);
> > > > +		}
> > > > +		if (unlikely(item_num == 0))
> > > > +			break;
> > > > +	}
> > > > +}
> > > > +
> > > > +/**
> > > > + * merge two TCP/IPv4 packets without update header checksum.
> > > > + */
> > > > +static int
> > > > +merge_two_tcp4_packets(struct rte_mbuf *pkt_src,
> > > > +		struct rte_mbuf *pkt,
> > > > +		struct gro_tcp_rule *rule)
> > > > +{
> > > > +	struct ipv4_hdr *ipv4_hdr1, *ipv4_hdr2;
> > > > +	struct tcp_hdr *tcp_hdr1;
> > > > +	uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1;
> > > > +	struct rte_mbuf *tail;
> > > > +
> > > > +	/* parse the given packet */
> > > > +	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
> > > > +				struct ether_hdr *) + 1);
> > > > +	ipv4_ihl1 = IPv4_HDR_LEN(ipv4_hdr1);
> > > > +	tcp_hdr1 = (struct tcp_hdr *)((char *)ipv4_hdr1 + ipv4_ihl1);
> > > > +	tcp_hl1 = TCP_HDR_LEN(tcp_hdr1);
> > > > +	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) - ipv4_ihl1
> > > > +		- tcp_hl1;
> > > > +
> > > > +	/* parse the original packet */
> > > > +	ipv4_hdr2 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt_src,
> > > > +				struct ether_hdr *) + 1);
> > > > +
> > > > +	/* check reassembly rules */
> > > > +	if (pkt_src->pkt_len + tcp_dl1 > rule->max_packet_size)
> > > > +		return -1;
> > > > +
> > > > +	/* remove the header of the incoming packet */
> > > > +	rte_pktmbuf_adj(pkt, sizeof(struct ether_hdr) +
> > > > +			ipv4_ihl1 + tcp_hl1);
> > > > +
> > > > +	/* chain the two packet together */
> > > > +	tail = rte_pktmbuf_lastseg(pkt_src);
> > > > +	tail->next = pkt;
> > > > +
> > > > +	/* update IP header */
> > > > +	ipv4_hdr2->total_length = rte_cpu_to_be_16(
> > > > +			rte_be_to_cpu_16(
> > > > +				ipv4_hdr2->total_length)
> > > > +			+ tcp_dl1);
> > > > +
> > > > +	/* update mbuf metadata for the merged packet */
> > > > +	pkt_src->nb_segs++;
> > > > +	pkt_src->pkt_len += pkt->pkt_len;
> > > > +	return 1;
> > > > +}
> > > > +
> > > > +static int
> > > > +check_seq_option(struct rte_mbuf *pkt,
> > > > +		struct tcp_hdr *tcp_hdr,
> > > > +		uint16_t tcp_hl)
> > > > +{
> > > > +	struct ipv4_hdr *ipv4_hdr1;
> > > > +	struct tcp_hdr *tcp_hdr1;
> > > > +	uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1;
> > > > +	uint32_t sent_seq1, sent_seq;
> > > > +	int ret = -1;
> > > > +
> > > > +	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
> > > > +				struct ether_hdr *) + 1);
> > > > +	ipv4_ihl1 = IPv4_HDR_LEN(ipv4_hdr1);
> > > > +	tcp_hdr1 = (struct tcp_hdr *)((char *)ipv4_hdr1 + ipv4_ihl1);
> > > > +	tcp_hl1 = TCP_HDR_LEN(tcp_hdr1);
> > > > +	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) - ipv4_ihl1
> > > > +		- tcp_hl1;
> > > > +	sent_seq1 = rte_be_to_cpu_32(tcp_hdr1->sent_seq) + tcp_dl1;
> > > > +	sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq);
> > > > +
> > > > +	/* check if the two packets are neighbor */
> > > > +	if ((sent_seq ^ sent_seq1) == 0) {
> > > > +		/* check if TCP option field equals */
> > > > +		if (tcp_hl1 > sizeof(struct tcp_hdr)) {
> > > > +			if ((tcp_hl1 != tcp_hl) ||
> > > > +					(memcmp(tcp_hdr1 + 1,
> > > > +							tcp_hdr + 1,
> > > > +							tcp_hl - sizeof
> > > > +							(struct tcp_hdr))
> > > > +					 == 0))
> > > > +				ret = 1;
> > > > +		}
> > > > +	}
> > > > +	return ret;
> > > > +}
> > > > +
> > > > +static uint32_t
> > > > +find_an_empty_item(struct gro_tcp_tbl *tbl)
> > > > +{
> > > > +	uint32_t i;
> > > > +
> > > > +	for (i = 0; i < tbl->max_item_num; i++)
> > > > +		if (tbl->items[i].is_valid == 0)
> > > > +			return i;
> > > > +	return INVALID_ITEM_INDEX;
> > > > +}
> > > > +
> > > > +static uint16_t
> > > > +find_an_empty_flow(struct gro_tcp_tbl *tbl)
> > > > +{
> > > > +	uint16_t i;
> > > > +
> > > > +	for (i = 0; i < tbl->max_flow_num; i++)
> > > > +		if (tbl->flows[i].is_valid == 0)
> > > > +			return i;
> > > > +	return INVALID_FLOW_INDEX;
> > > > +}
> > > > +
> > > > +int32_t
> > > > +gro_tcp4_reassemble(struct rte_mbuf *pkt,
> > > > +		struct gro_tcp_tbl *tbl,
> > > > +		struct gro_tcp_rule *rule)
> > > > +{
> > > > +	struct ether_hdr *eth_hdr;
> > > > +	struct ipv4_hdr *ipv4_hdr;
> > > > +	struct tcp_hdr *tcp_hdr;
> > > > +	uint16_t ipv4_ihl, tcp_hl, tcp_dl, tcp_cksum, ip_cksum;
> > > > +
> > > > +	struct gro_tcp_flow_key key;
> > > > +	uint64_t ol_flags;
> > > > +	uint32_t cur_idx, prev_idx, item_idx;
> > > > +	uint16_t i, flow_idx;
> > > > +
> > > > +	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
> > > > +	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
> > > > +	ipv4_ihl = IPv4_HDR_LEN(ipv4_hdr);
> > > > +
> > > > +	/* 1. check if the packet should be processed */
> > > > +	if (ipv4_ihl < sizeof(struct ipv4_hdr))
> > > > +		goto fail;
> > > > +	if (ipv4_hdr->next_proto_id != IPPROTO_TCP)
> > > > +		goto fail;
> > > > +	if ((ipv4_hdr->fragment_offset &
> > > > +				rte_cpu_to_be_16(IPV4_HDR_DF_MASK))
> > > > +			== 0)
> > > > +		goto fail;
> > > > +
> > > > +	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl);
> > > > +	tcp_hl = TCP_HDR_LEN(tcp_hdr);
> > > > +	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - ipv4_ihl
> > > > +		- tcp_hl;
> > > > +	if (tcp_dl == 0)
> > > > +		goto fail;
> > > > +
> > > > +	/**
> > > > +	 * 2. if HW rx checksum offload isn't enabled, recalculate the
> > > > +	 * checksum in SW. Then, check if the checksum is correct
> > > > +	 */
> > > > +	ol_flags = pkt->ol_flags;
> > > > +	if ((ol_flags & PKT_RX_IP_CKSUM_MASK) !=
> > > > +			PKT_RX_IP_CKSUM_UNKNOWN) {
> > > > +		if (ol_flags == PKT_RX_IP_CKSUM_BAD)
> > > > +			goto fail;
> > > > +	} else {
> > > > +		ip_cksum = ipv4_hdr->hdr_checksum;
> > > > +		ipv4_hdr->hdr_checksum = 0;
> > > > +		ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);
> > > > +		if (ipv4_hdr->hdr_checksum ^ ip_cksum)
> > > > +			goto fail;
> > > > +	}
> > > > +
> > > > +	if ((ol_flags & PKT_RX_L4_CKSUM_MASK) !=
> > > > +			PKT_RX_L4_CKSUM_UNKNOWN) {
> > > > +		if (ol_flags == PKT_RX_L4_CKSUM_BAD)
> > > > +			goto fail;
> > > > +	} else {
> > > > +		tcp_cksum = tcp_hdr->cksum;
> > > > +		tcp_hdr->cksum = 0;
> > > > +		tcp_hdr->cksum = rte_ipv4_udptcp_cksum
> > > > +			(ipv4_hdr, tcp_hdr);
> > > > +		if (tcp_hdr->cksum ^ tcp_cksum)
> > > > +			goto fail;
> > > > +	}
> > > > +
> > > > +	/**
> > > > +	 * 3. search for a flow and traverse all packets in the flow
> > > > +	 * to find one to merge with the given packet.
> > > > +	 */
> > > > +	key.eth_saddr = eth_hdr->s_addr;
> > > > +	key.eth_daddr = eth_hdr->d_addr;
> > > > +	key.ip_src_addr[0] = rte_be_to_cpu_32(ipv4_hdr->src_addr);
> > > > +	key.ip_dst_addr[0] = rte_be_to_cpu_32(ipv4_hdr->dst_addr);
> > > > +	key.src_port = rte_be_to_cpu_16(tcp_hdr->src_port);
> > > > +	key.dst_port = rte_be_to_cpu_16(tcp_hdr->dst_port);
> > > > +	key.recv_ack = rte_be_to_cpu_32(tcp_hdr->recv_ack);
> > > > +	key.tcp_flags = tcp_hdr->tcp_flags;
> > > > +
> > > > +	for (i = 0; i < tbl->max_flow_num; i++) {
> > > > +		/* search all packets in a valid flow. */
> > > > +		if (tbl->flows[i].is_valid &&
> > > > +				(memcmp(&(tbl->flows[i].key), &key,
> > > > +						sizeof(struct gro_tcp_flow_key))
> > > > +				 == 0)) {
> > > > +			cur_idx = tbl->flows[i].start_index;
> > > > +			prev_idx = cur_idx;
> > > > +			while (cur_idx != INVALID_ITEM_INDEX) {
> > > > +				if (check_seq_option(tbl->items[cur_idx].pkt,
> > > > +							tcp_hdr,
> > > > +							tcp_hl) > 0) {
> > > > +					if (merge_two_tcp4_packets(
> > > > +								tbl->items[cur_idx].pkt,
> > > > +								pkt,
> > > > +								rule) > 0) {
> > > > +						/* successfully merge two packets */
> > > > +						tbl->items[cur_idx].is_groed = 1;
> > > > +						return 1;
> > > > +					}
> > > > +					/**
> > > > +					 * fail to merge two packets since
> > > > +					 * break the rules, add the packet
> > > > +					 * into the flow.
> > > > +					 */
> > > > +					goto insert_to_existed_flow;
> > > > +				} else {
> > > > +					prev_idx = cur_idx;
> > > > +					cur_idx = tbl->items[cur_idx].next_pkt_idx;
> > > > +				}
> > > > +			}
> > > > +			/**
> > > > +			 * fail to merge the given packet into an existed flow,
> > > > +			 * add it into the flow.
> > > > +			 */
> > > > +insert_to_existed_flow:
> > > > +			item_idx = find_an_empty_item(tbl);
> > > > +			/* the item number is beyond the maximum value */
> > > > +			if (item_idx == INVALID_ITEM_INDEX)
> > > > +				return -1;
> > > > +			tbl->items[prev_idx].next_pkt_idx = item_idx;
> > > > +			tbl->items[item_idx].pkt = pkt;
> > > > +			tbl->items[item_idx].is_groed = 0;
> > > > +			tbl->items[item_idx].next_pkt_idx = INVALID_ITEM_INDEX;
> > > > +			tbl->items[item_idx].is_valid = 1;
> > > > +			tbl->items[item_idx].start_time = rte_rdtsc();
> > > > +			tbl->item_num++;
> > > > +			return 0;
> > > > +		}
> > > > +	}
> > > > +
> > > > +	/**
> > > > +	 * merge fail as the given packet is a new flow. Therefore,
> > > > +	 * insert a new flow.
> > > > +	 */
> > > > +	item_idx = find_an_empty_item(tbl);
> > > > +	flow_idx = find_an_empty_flow(tbl);
> > > > +	/**
> > > > +	 * if the flow or item number are beyond the maximum values,
> > > > +	 * the inputted packet won't be processed.
> > > > +	 */
> > > > +	if (item_idx == INVALID_ITEM_INDEX ||
> > > > +			flow_idx == INVALID_FLOW_INDEX)
> > > > +		return -1;
> > > > +	tbl->items[item_idx].pkt = pkt;
> > > > +	tbl->items[item_idx].next_pkt_idx = INVALID_ITEM_INDEX;
> > > > +	tbl->items[item_idx].is_groed = 0;
> > > > +	tbl->items[item_idx].is_valid = 1;
> > > > +	tbl->items[item_idx].start_time = rte_rdtsc();
> > > > +	tbl->item_num++;
> > > > +
> > > > +	memcpy(&(tbl->flows[flow_idx].key),
> > > > +			&key, sizeof(struct gro_tcp_flow_key));
> > > > +	tbl->flows[flow_idx].start_index = item_idx;
> > > > +	tbl->flows[flow_idx].is_valid = 1;
> > > > +	tbl->flow_num++;
> > > > +
> > > > +	return 0;
> > > > +fail:
> > > > +	return -1;
> > > > +}
> > > > +
> > > > +uint16_t gro_tcp_tbl_flush(struct gro_tcp_tbl *tbl,
> > > > +		uint16_t flush_num,
> > > > +		struct rte_mbuf **out,
> > > > +		const uint16_t nb_out)
> > > > +{
> > > > +	uint16_t num, k;
> > > > +	uint16_t i;
> > > > +	uint32_t j;
> > > > +
> > > > +	k = 0;
> > > > +	num = tbl->item_num > flush_num ? flush_num : tbl->item_num;
> > > > +	num = num > nb_out ? nb_out : num;
> > > > +	if (unlikely(num == 0))
> > > > +		return 0;
> > > > +
> > > > +	for (i = 0; i < tbl->max_flow_num; i++) {
> > > > +		if (tbl->flows[i].is_valid) {
> > > > +			j = tbl->flows[i].start_index;
> > > > +			while (j != INVALID_ITEM_INDEX) {
> > > > +				/* update checksum for GROed packet */
> > > > +				if (tbl->items[j].is_groed)
> > > > +					gro_tcp4_cksum_update(tbl->items[j].pkt);
> > > > +
> > > > +				out[k++] = tbl->items[j].pkt;
> > > > +				tbl->items[j].is_valid = 0;
> > > > +				tbl->item_num--;
> > > > +				j = tbl->items[j].next_pkt_idx;
> > > > +
> > > > +				if (k == num) {
> > > > +					/* delete the flow */
> > > > +					if (j == INVALID_ITEM_INDEX) {
> > > > +						tbl->flows[i].is_valid = 0;
> > > > +						tbl->flow_num--;
> > > > +					} else
> > > > +						/* update flow information */
> > > > +						tbl->flows[i].start_index = j;
> > > > +					goto end;
> > > > +				}
> > > > +			}
> > > > +			/* delete the flow, as all of its packets are flushed */
> > > > +			tbl->flows[i].is_valid = 0;
> > > > +			tbl->flow_num--;
> > > > +		}
> > > > +		if (tbl->flow_num == 0)
> > > > +			goto end;
> > > > +	}
> > > > +end:
> > > > +	return num;
> > > > +}
> > > > +
> > > > +uint16_t
> > > > +gro_tcp_tbl_timeout_flush(struct gro_tcp_tbl *tbl,
> > > > +		uint64_t timeout_cycles,
> > > > +		struct rte_mbuf **out,
> > > > +		const uint16_t nb_out)
> > > > +{
> > > > +	uint16_t k;
> > > > +	uint16_t i;
> > > > +	uint32_t j;
> > > > +	uint64_t current_time;
> > > > +
> > > > +	if (nb_out == 0)
> > > > +		return 0;
> > > > +	k = 0;
> > > > +	current_time = rte_rdtsc();
> > > > +
> > > > +	for (i = 0; i < tbl->max_flow_num; i++) {
> > > > +		if (tbl->flows[i].is_valid) {
> > > > +			j = tbl->flows[i].start_index;
> > > > +			while (j != INVALID_ITEM_INDEX) {
> > > > +				if (current_time - tbl->items[j].start_time >=
> > > > +						timeout_cycles) {
> > > > +					/* update checksum for GROed packet */
> > > > +					if (tbl->items[j].is_groed)
> > > > +						gro_tcp4_cksum_update(tbl->items[j].pkt);
> > > > +
> > > > +					out[k++] = tbl->items[j].pkt;
> > > > +					tbl->items[j].is_valid = 0;
> > > > +					tbl->item_num--;
> > > > +					j = tbl->items[j].next_pkt_idx;
> > > > +
> > > > +					if (k == nb_out &&
> > > > +							j == INVALID_ITEM_INDEX) {
> > > > +						/* delete the flow */
> > > > +						tbl->flows[i].is_valid = 0;
> > > > +						tbl->flow_num--;
> > > > +						goto end;
> > > > +					} else if (k == nb_out &&
> > > > +							j != INVALID_ITEM_INDEX) {
> > > > +						tbl->flows[i].start_index = j;
> > > > +						goto end;
> > > > +					}
> > > > +				}
> > > > +			}
> > > > +			/* delete the flow, as all of its packets are flushed */
> > > > +			tbl->flows[i].is_valid = 0;
> > > > +			tbl->flow_num--;
> > > > +		}
> > > > +		if (tbl->flow_num == 0)
> > > > +			goto end;
> > > > +	}
> > > > +end:
> > > > +	return k;
> > > > +}
> > > > diff --git a/lib/librte_gro/rte_gro_tcp.h b/lib/librte_gro/rte_gro_tcp.h
> > > > new file mode 100644
> > > > index 0000000..551efc4
> > > > --- /dev/null
> > > > +++ b/lib/librte_gro/rte_gro_tcp.h
> > > > @@ -0,0 +1,210 @@
> > > > +/*-
> > > > + *   BSD LICENSE
> > > > + *
> > > > + *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
> > > > + *
> > > > + *   Redistribution and use in source and binary forms, with or without
> > > > + *   modification, are permitted provided that the following conditions
> > > > + *   are met:
> > > > + *
> > > > + *     * Redistributions of source code must retain the above copyright
> > > > + *       notice, this list of conditions and the following disclaimer.
> > > > + *     * Redistributions in binary form must reproduce the above copyright
> > > > + *       notice, this list of conditions and the following disclaimer in
> > > > + *       the documentation and/or other materials provided with the
> > > > + *       distribution.
> > > > + *     * Neither the name of Intel Corporation nor the names of its
> > > > + *       contributors may be used to endorse or promote products derived
> > > > + *       from this software without specific prior written permission.
> > > > + *
> > > > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > > > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > > > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > > > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > > > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > > > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > > > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > > > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > > > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > > > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > > > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > > > + */
> > > > +
> > > > +#ifndef _RTE_GRO_TCP_H_
> > > > +#define _RTE_GRO_TCP_H_
> > > > +
> > > > +#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
> > > > +#define TCP_HDR_LEN(tcph) \
> > > > +	((tcph->data_off >> 4) * 4)
> > > > +#define IPv4_HDR_LEN(iph) \
> > > > +	((iph->version_ihl & 0x0f) * 4)
> > > > +#else
> > > > +#define TCP_DATAOFF_MASK 0x0f
> > > > +#define TCP_HDR_LEN(tcph) \
> > > > +	((tcph->data_off & TCP_DATAOFF_MASK) * 4)
> > > > +#define IPv4_HDR_LEN(iph) \
> > > > +	((iph->version_ihl >> 4) * 4)
> > > > +#endif
> > > > +
> > > > +#define IPV4_HDR_DF_SHIFT 14
> > > > +#define IPV4_HDR_DF_MASK (1 << IPV4_HDR_DF_SHIFT)
> > > > +
> > > > +#define INVALID_FLOW_INDEX 0xffffU
> > > > +#define INVALID_ITEM_INDEX 0xffffffffUL
> > > > +
> > > > +#define GRO_TCP_TBL_MAX_FLOW_NUM (UINT16_MAX - 1)
> > > > +#define GRO_TCP_TBL_MAX_ITEM_NUM (UINT32_MAX - 1)
> > > > +
> > > > +/* criteria of mergeing packets */
> > > > +struct gro_tcp_flow_key {
> > > > +	struct ether_addr eth_saddr;
> > > > +	struct ether_addr eth_daddr;
> > > > +	uint32_t ip_src_addr[4];	/**< IPv4 uses the first 8B */
> > > > +	uint32_t ip_dst_addr[4];
> > > > +
> > > > +	uint32_t recv_ack;	/**< acknowledgment sequence number. */
> > > > +	uint16_t src_port;
> > > > +	uint16_t dst_port;
> > > > +	uint8_t tcp_flags;	/**< TCP flags. */
> > > > +};
> > > > +
> > > > +struct gro_tcp_flow {
> > > > +	struct gro_tcp_flow_key key;
> > > > +	uint32_t start_index;	/**< the first packet index of the flow */
> > > > +	uint8_t is_valid;
> > > > +};
> > > > +
> > > > +struct gro_tcp_item {
> > > > +	struct rte_mbuf *pkt;	/**< packet address. */
> > > > +	/* the time when the packet in added into the table */
> > > > +	uint64_t start_time;
> > > > +	uint32_t next_pkt_idx;	/**< next packet index. */
> > > > +	/* flag to indicate if the packet is GROed */
> > > > +	uint8_t is_groed;
> > > > +	uint8_t is_valid;	/**< flag indicates if the item is valid */
> > > > +};
> > > > +
> > > > +/**
> > > > + * TCP reassembly table. Both TCP/IPv4 and TCP/IPv6 use the same table
> > > > + * structure.
> > > > + */
> > > > +struct gro_tcp_tbl {
> > > > +	struct gro_tcp_item *items;	/**< item array */
> > > > +	struct gro_tcp_flow *flows;	/**< flow array */
> > > > +	uint32_t item_num;	/**< current item number */
> > > > +	uint16_t flow_num;	/**< current flow num */
> > > > +	uint32_t max_item_num;	/**< item array size */
> > > > +	uint16_t max_flow_num;	/**< flow array size */
> > > > +};
> > > > +
> > > > +/* rules to reassemble TCP packets, which are decided by applications */
> > > > +struct gro_tcp_rule {
> > > > +	/* the maximum packet length after merged */
> > > > +	uint32_t max_packet_size;
> > > > +};
> > >
> > > Are there any other rules? If not, I prefer to use max_packet_size directly.
> > 
> > If we agree to use a flag to indicate whether to check checksums, this
> > structure would be the place to keep that flag.
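> > That is, something like (the flag name is hypothetical):
> > 
> > 	struct gro_tcp_rule {
> > 		/* the maximum packet length after merged */
> > 		uint32_t max_packet_size;
> > 		/* hypothetical: check checksums before merging if non-zero */
> > 		uint8_t check_checksum;
> > 	};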
> > 
> > >
> > > > +
> > > > +/**
> > > > + * This function is to update TCP and IPv4 header checksums
> > > > + * for merged packets in the TCP reassembly table.
> > > > + */
> > > > +void gro_tcp4_tbl_cksum_update(struct gro_tcp_tbl *tbl);
> > > > +
> > > > +/**
> > > > + * This function creates a TCP reassembly table.
> > > > + *
> > > > + * @param socket_id
> > > > + *  socket index where the Ethernet port connects to.
> > > > + * @param max_flow_num
> > > > + *  the maximum number of flows in the TCP GRO table
> > > > + * @param max_item_per_flow
> > > > + *  the maximum packet number per flow.
> > > > + * @return
> > > > + *  if create successfully, return a pointer which points to the
> > > > + *  created TCP GRO table. Otherwise, return NULL.
> > > > + */
> > > > +void *gro_tcp_tbl_create(uint16_t socket_id,
> > > > +		uint16_t max_flow_num,
> > > > +		uint16_t max_item_per_flow);
> > > > +
> > > > +/**
> > > > + * This function destroys a TCP reassembly table.
> > > > + * @param tbl
> > > > + *  a pointer points to the TCP reassembly table.
> > > > + */
> > > > +void gro_tcp_tbl_destroy(void *tbl);
> > > > +
> > > > +/**
> > > > + * This function searches for a packet in the TCP reassembly table to
> > > > + * merge with the inputted one. To merge two packets is to chain them
> > > > + * together and update packet headers. Note that this function won't
> > > > + * re-calculate IPv4 and TCP checksums.
> > > > + *
> > > > + * If the packet doesn't have data, or with wrong checksums, or is
> > > > + * fragmented etc., errors happen and gro_tcp4_reassemble returns
> > > > + * immediately. If no errors happen, the packet is either merged, or
> > > > + * inserted into the reassembly table.
> > > > + *
> > > > + * If applications want to get packets in the reassembly table, they
> > > > + * need to manually flush the packets.
> > > > + *
> > > > + * @param pkt
> > > > + *  packet to reassemble.
> > > > + * @param tbl
> > > > + *  a pointer that points to a TCP reassembly table.
> > > > + * @param rule
> > > > + *  TCP reassembly criteria defined by applications.
> > > > + * @return
> > > > + *  if the inputted packet is merged successfully, return a positive
> > > > + *  value. If the packet hasn't been merged with any packet, it is
> > > > + *  inserted into the TCP reassembly table and zero is returned. If
> > > > + *  errors happen, return a negative value and the packet won't be
> > > > + *  inserted into the reassembly table.
> > > > + */
> > > > +int32_t
> > > > +gro_tcp4_reassemble(struct rte_mbuf *pkt,
> > > > +		struct gro_tcp_tbl *tbl,
> > > > +		struct gro_tcp_rule *rule);
> > > > +
> > > > +/**
> > > > + * This function flushes the packets in a TCP reassembly table to
> > > > + * applications. Before returning the packets, it will update TCP and
> > > > + * IPv4 header checksums.
> > > > + *
> > > > + * @param tbl
> > > > + *  a pointer that points to a TCP GRO table.
> > > > + * @param flush_num
> > > > + *  the number of packets that applications want to flush.
> > > > + * @param out
> > > > + *  pointer array which is used to keep flushed packets.
> > > > + * @param nb_out
> > > > + *  the maximum element number of out.
> > > > + * @return
> > > > + *  the number of packets that are flushed finally.
> > > > + */
> > > > +uint16_t
> > > > +gro_tcp_tbl_flush(struct gro_tcp_tbl *tbl,
> > > > +		uint16_t flush_num,
> > > > +		struct rte_mbuf **out,
> > > > +		const uint16_t nb_out);
> > > > +
> > > > +/**
> > > > + * This function flushes timeout packets in a TCP reassembly table to
> > > > + * applications. Before returning the packets, it updates TCP and IPv4
> > > > + * header checksums.
> > > > + *
> > > > + * @param tbl
> > > > + *  a pointer that points to a TCP GRO table.
> > > > + * @param timeout_cycles
> > > > + *  the maximum time that packets can stay in the table.
> > > > + * @param out
> > > > + *  pointer array which is used to keep flushed packets.
> > > > + * @param nb_out
> > > > + *  the maximum element number of out.
> > > > + * @return
> > > > + *  It returns the number of packets that are flushed finally.
> > > > + */
> > > > +uint16_t
> > > > +gro_tcp_tbl_timeout_flush(struct gro_tcp_tbl *tbl,
> > > > +		uint64_t timeout_cycles,
> > > > +		struct rte_mbuf **out,
> > > > +		const uint16_t nb_out);
> > > > +#endif

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v5 2/3] lib/gro: add TCP/IPv4 GRO support
  2017-06-20  3:22             ` Jiayu Hu
  2017-06-20 15:15               ` Ananyev, Konstantin
  2017-06-20 15:21               ` Ananyev, Konstantin
@ 2017-06-20 23:30               ` Tan, Jianfeng
  2017-06-20 23:55                 ` Stephen Hemminger
  2017-06-22  7:39                 ` Jiayu Hu
  2 siblings, 2 replies; 141+ messages in thread
From: Tan, Jianfeng @ 2017-06-20 23:30 UTC (permalink / raw)
  To: Jiayu Hu; +Cc: dev, konstantin.ananyev, yliu, keith.wiles, tiwei.bie, lei.a.yao

Hi Jiayu,


On 6/20/2017 11:22 AM, Jiayu Hu wrote:
> Hi Jianfeng,
>
> On Mon, Jun 19, 2017 at 11:43:20PM +0800, Tan, Jianfeng wrote:
>>
>> On 6/18/2017 3:21 PM, Jiayu Hu wrote:
>>> In this patch, we introduce six APIs to support TCP/IPv4 GRO.
>> Those functions are not used outside of this library. Don't make it as
>> extern visible.
> But they are called by functions in rte_gro.c, which are in a different
> file. If we define these functions as static, how can they be called by
> functions in a different file?

We can define some ops for GRO engines. In each GRO engine, tcp4 in
this case, we just need to register those ops; then we can iterate over
all GRO engines in rte_gro.c. It's a better way for other developers to
contribute other GRO engines.
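
A rough sketch of what I have in mind (all names hypothetical; the two
function pointer types are the ones the patch already defines):

	struct gro_engine_ops {
		gro_tbl_create_fn tbl_create;
		gro_tbl_destroy_fn tbl_destroy;
		int32_t (*reassemble)(struct rte_mbuf *pkt, void *tbl);
	};

	static const struct gro_engine_ops *gro_engines[GRO_TYPE_MAX_NB];

	/* each engine registers itself, e.g. from rte_gro_tcp.c */
	void gro_register_engine(uint8_t gro_type,
			const struct gro_engine_ops *ops)
	{
		gro_engines[gro_type] = ops;
	}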

>
>>> - gro_tcp_tbl_create: create a TCP reassembly table, which is used to
>>>       merge packets.
>> Will tcp6 share the same function with tcp4? If not, please rename it to
>> gro_tcp4_tbl_create
> In the TCP GRO design, TCP4 and TCP6 will share the same table structure,
> but they will have different reassembly functions. Therefore, I use
> gro_tcp_tbl_create instead of gro_tcp4_tbl_create here.

Then as far as I can see, we are going to call this function for all GRO
engines, except that different flow structures are allocated for different
GRO engines. So I suggest we put this function into rte_gro.c.

>
>>> - gro_tcp_tbl_destroy: free memory space of a TCP reassembly table.
>>> - gro_tcp_tbl_flush: flush packets in the TCP reassembly table.
>>> - gro_tcp_tbl_timeout_flush: flush timeout packets in the TCP
>>>       reassembly table.
>>> - gro_tcp4_reassemble: merge an inputted packet.
>>> - gro_tcp4_tbl_cksum_update: update TCP and IPv4 header checksums for
>>>       all merged packets in the TCP reassembly table.
>>>
>>> In TCP GRO, we use a table structure, called TCP reassembly table, to
>>> reassemble packets. Both TCP/IPv4 and TCP/IPv6 GRO use the same table
>>> structure. A TCP reassembly table includes a flow array and a item array,
>>> where the flow array is used to record flow information and the item
>>> array is used to record packets information.
>>>
>>> Each element in the flow array records the information of one flow,
>>> which includes two parts:
>>> - key: the criteria of the same flow. If packets have the same key
>>>       value, they belong to the same flow.
>>> - start_index: the index of the first incoming packet of this flow in
>>>       the item array. With start_index, we can locate the first incoming
>>>       packet of this flow.
>>> Each element in the item array records one packet information. It mainly
>>> includes two parts:
>>> - pkt: packet address
>>> - next_pkt_index: index of the next packet of the same flow in the item
>>>       array. All packets of the same flow are chained by next_pkt_index.
>>>       With next_pkt_index, we can locate all packets of the same flow
>>>       one by one.
>>>
>>> To process an incoming packet, we need three steps:
>>> a. check if the packet should be processed. Packets with the following
>>>       properties won't be processed:
>>> 	- packets without data;
>>> 	- packets with wrong checksums;
>> Why do we care to check this kind of error? Can we just suppose the
>> applications have already dropped the packets with wrong cksum?
> Indeed, if we assume all inputted packets are correct, we can avoid
> the checksum checking overhead. But as a library, I think a more
> flexible way is to let applications tell the GRO API whether checksum
> checking is needed. For example, we can add a flag to struct
> rte_gro_tbl and struct rte_gro_param which indicates whether checksum
> checking is needed. If applications set this flag, the reassembly
> function won't check packet checksums. Otherwise, we check them.
> What do you think?

My opinion is to keep the library focused on what it does, and make its
dependencies clear. This flag will differ for different GRO engines,
which makes it a little too complicated to me.

>
>>> 	- fragmented packets.
>> IP fragmented? I don't think we need to check it here either. It's the
>> application's responsibility to call librte_ip_frag first to reassemble
>> IP-fragmented packets, and then call this gro library to merge TCP packets.
>> And this procedure should be shown in an example for other users to
>> refer to.
>>
>>> b. traverse the flow array to find a flow which the packet belongs to.
>>>       If not find, insert a new flow and store the packet into the item
>>>       array.
>> You do not store the packet now. "store the packet into the item array" ->
>> "then go to step c".
> Thanks, I will update it in next patch.
>
>>> c. locate the first packet of this flow in the item array via
>>>       start_index. Then traverse all packets of this flow one by one via
>>>       next_pkt_index. If find one packet to merge with the incoming packet,
>>>       merge them but without updating checksums. If not, allocate one item
>>>       in the item array to store the incoming packet and update
>>>       next_pkt_index value.
>>>
>>> For better performance, we don't update header checksums once two
>>> packets are merged. The header checksums are updated only when packets
>>> are flushed from TCP reassembly tables.
>> Why do we care to recalculate the L4 checksum when flushing? How about
>> just keeping the wrong cksum, and letting the applications handle that?
> Not all applications want GROed packets with wrong checksums. So I think
> a more reasonable way is to give applications a flag to tell the GRO API
> whether it needs to calculate checksums when flushing packets from the
> GRO table. What do you think?

There are two main directions: (1) the packets are sent out from a
physical NIC; (2) they are sent out from a vhost port. In both cases it
is very easy for applications to take care of the wrong checksum.
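
For the physical NIC case, for example, the application can simply rely
on TX checksum offload (a sketch, assuming ipv4_ihl and tcp_hl were
parsed from the headers):

	pkt->l2_len = sizeof(struct ether_hdr);
	pkt->l3_len = ipv4_ihl;
	pkt->l4_len = tcp_hl;
	pkt->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM;
	ipv4_hdr->hdr_checksum = 0;
	/* seed the TCP cksum with the pseudo-header cksum for HW offload */
	tcp_hdr->cksum = rte_ipv4_phdr_cksum(ipv4_hdr, pkt->ol_flags);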

>
>>
>>> Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
>>> ---
>>>    lib/librte_gro/Makefile      |   1 +
>>>    lib/librte_gro/rte_gro.c     | 154 +++++++++++--
>>>    lib/librte_gro/rte_gro.h     |  34 +--
>>>    lib/librte_gro/rte_gro_tcp.c | 527 +++++++++++++++++++++++++++++++++++++++++++
>>>    lib/librte_gro/rte_gro_tcp.h | 210 +++++++++++++++++
>>>    5 files changed, 895 insertions(+), 31 deletions(-)
>>>    create mode 100644 lib/librte_gro/rte_gro_tcp.c
>>>    create mode 100644 lib/librte_gro/rte_gro_tcp.h
>>>
>>> diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
>>> index 9f4063a..3495dfc 100644
>>> --- a/lib/librte_gro/Makefile
>>> +++ b/lib/librte_gro/Makefile
>>> @@ -43,6 +43,7 @@ LIBABIVER := 1
>>>    # source files
>>>    SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
>>> +SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro_tcp.c
>> Again, if it's just for tcp4, please use the name rte_gro_tcp4.c.
> TCP4 and TCP6 reassembly functions will be placed in the same file,
> rte_gro_tcp.c. But currently, we don't support TCP6 GRO.

That's OK with me. But then we have to have different struct
gro_tcp_flow definitions for tcp4 and tcp6.
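
For example (hypothetical):

	/* a v4-only key, instead of one shared with tcp6 */
	struct gro_tcp4_flow_key {
		struct ether_addr eth_saddr;
		struct ether_addr eth_daddr;
		uint32_t ip_src_addr;	/* IPv4 address only */
		uint32_t ip_dst_addr;
		uint32_t recv_ack;
		uint16_t src_port;
		uint16_t dst_port;
		uint8_t tcp_flags;
	};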

>
>>>    # install this header file
>>>    SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
>>> diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
>>> index 1bc53a2..2620ef6 100644
>>> --- a/lib/librte_gro/rte_gro.c
>>> +++ b/lib/librte_gro/rte_gro.c
>>> @@ -32,11 +32,17 @@
>>>    #include <rte_malloc.h>
>>>    #include <rte_mbuf.h>
>>> +#include <rte_ethdev.h>
>>> +#include <rte_ip.h>
>>> +#include <rte_tcp.h>
>>>    #include "rte_gro.h"
>>> +#include "rte_gro_tcp.h"
>>> -static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB];
>>> -static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NB];
>>> +static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB] = {
>>> +	gro_tcp_tbl_create, NULL};
>>> +static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NB] = {
>>> +	gro_tcp_tbl_destroy, NULL};
>>>    struct rte_gro_tbl *rte_gro_tbl_create(uint16_t socket_id,
>>>    		uint16_t max_flow_num,
>>> @@ -94,33 +100,149 @@ void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl)
>>>    }
>>>    uint16_t
>>> -rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
>>> +rte_gro_reassemble_burst(struct rte_mbuf **pkts,
>>>    		const uint16_t nb_pkts,
>>> -		const struct rte_gro_param param __rte_unused)
>>> +		const struct rte_gro_param param)
>>>    {
>>> -	return nb_pkts;
>>> +	struct ether_hdr *eth_hdr;
>>> +	struct ipv4_hdr *ipv4_hdr;
>>> +	uint16_t l3proc_type, i;
>> I did not catch the variable definition here: l3proc_type -> l3_proto?
> You can see it in line 158 and line 159.

I'm not asking for the reference; I mean the variable name is not that clear.

>
>>> +	uint16_t nb_after_gro = nb_pkts;
>>> +	uint16_t flow_num = nb_pkts < param.max_flow_num ?
>>> +		nb_pkts : param.max_flow_num;
>>> +	uint32_t item_num = nb_pkts <
>>> +		flow_num * param.max_item_per_flow ?
>>> +		nb_pkts :
>>> +		flow_num * param.max_item_per_flow;
>>> +
>>> +	/* allocate a reassembly table for TCP/IPv4 GRO */
>>> +	uint16_t tcp_flow_num = flow_num <= GRO_TCP_TBL_MAX_FLOW_NUM ?
>>> +		flow_num : GRO_TCP_TBL_MAX_FLOW_NUM;
>>> +	uint32_t tcp_item_num = item_num <= GRO_TCP_TBL_MAX_ITEM_NUM ?
>>> +		item_num : GRO_TCP_TBL_MAX_ITEM_NUM;
>> The tcpv4-specific logic below should be in rte_gro_tcp4.c; here, as in my
>> previous comment, we iterate over the ptypes of the packets to invoke all
>> supported GRO engines.
> Sorry, I don't get the point. The table which is created here is used by
> gro_tcp4_reassemble when it merges packets. If we don't create the table
> here, what does gro_tcp4_reassemble use to merge packets?

Too much tcp* code here. If we add another GRO engine, take udp as an
example, shall we add more udp* code here? Not a good idea to me. In
fact, gro_tcp4_reassemble is defined in rte_gro_tcp.c instead of this
file. For better modularity, we'd better put this tcp-related code into
rte_gro_tcp.c.
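
That is, rte_gro.c would only dispatch on the packet type, e.g. (a
sketch, assuming the PMD has filled in packet_type as discussed):

	for (i = 0; i < nb_pkts; i++) {
		if (RTE_ETH_IS_IPV4_HDR(pkts[i]->packet_type) &&
				((pkts[i]->packet_type & RTE_PTYPE_L4_MASK) ==
				 RTE_PTYPE_L4_TCP) &&
				(param.desired_gro_types & GRO_TCP_IPV4))
			ret = gro_tcp4_reassemble(pkts[i], &tcp_tbl, &tcp_rule);
		else
			unprocess_pkts[unprocess_num++] = pkts[i];
	}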


>
>>> +	struct gro_tcp_tbl tcp_tbl;
>>> +	struct gro_tcp_flow tcp_flows[tcp_flow_num];
>>> +	struct gro_tcp_item tcp_items[tcp_item_num];
>>> +	struct gro_tcp_rule tcp_rule;
>>> +
>>> +	struct rte_mbuf *unprocess_pkts[nb_pkts];
>>> +	uint16_t unprocess_num = 0;
>>> +	int32_t ret;
>>> +
>>> +	if (unlikely(nb_pkts <= 1))
>>> +		return nb_pkts;
>>> +
>>> +	memset(tcp_flows, 0, sizeof(struct gro_tcp_flow) *
>>> +			tcp_flow_num);
>>> +	memset(tcp_items, 0, sizeof(struct gro_tcp_item) *
>>> +			tcp_item_num);
>>> +	tcp_tbl.flows = tcp_flows;
>>> +	tcp_tbl.items = tcp_items;
>>> +	tcp_tbl.flow_num = 0;
>>> +	tcp_tbl.item_num = 0;
>>> +	tcp_tbl.max_flow_num = tcp_flow_num;
>>> +	tcp_tbl.max_item_num = tcp_item_num;
>>> +	tcp_rule.max_packet_size = param.max_packet_size;
>>> +
>>> +	for (i = 0; i < nb_pkts; i++) {
>>> +		eth_hdr = rte_pktmbuf_mtod(pkts[i], struct ether_hdr *);
>>> +		l3proc_type = rte_be_to_cpu_16(eth_hdr->ether_type);
>>> +		if (l3proc_type == ETHER_TYPE_IPv4) {
>>> +			ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
>>> +			if (ipv4_hdr->next_proto_id == IPPROTO_TCP &&
>>> +					(param.desired_gro_types &
>>> +					 GRO_TCP_IPV4)) {
>>> +				ret = gro_tcp4_reassemble(pkts[i],
>>> +						&tcp_tbl,
>>> +						&tcp_rule);
>>> +				if (ret > 0)
>>> +					nb_after_gro--;
>>> +				else if (ret < 0)
>>> +					unprocess_pkts[unprocess_num++] =
>>> +						pkts[i];
>>> +			} else
>>> +				unprocess_pkts[unprocess_num++] =
>>> +					pkts[i];
>>> +		} else
>>> +			unprocess_pkts[unprocess_num++] =
>>> +				pkts[i];
>>> +	}
>>> +
>>> +	if (nb_after_gro < nb_pkts) {
>>> +		/* update packets headers and re-arrange GROed packets */
>>> +		if (param.desired_gro_types & GRO_TCP_IPV4) {
>>> +			gro_tcp4_tbl_cksum_update(&tcp_tbl);
>>> +			for (i = 0; i < tcp_tbl.item_num; i++)
>>> +				pkts[i] = tcp_tbl.items[i].pkt;
>>> +		}
>>> +		if (unprocess_num > 0) {
>>> +			memcpy(&pkts[i], unprocess_pkts,
>>> +					sizeof(struct rte_mbuf *) *
>>> +					unprocess_num);
>>> +			i += unprocess_num;
>>> +		}
>>> +		if (nb_pkts > i)
>>> +			memset(&pkts[i], 0,
>>> +					sizeof(struct rte_mbuf *) *
>>> +					(nb_pkts - i));
>>> +	}
>>> +	return nb_after_gro;
>>>    }
>>> -int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
>>> -		struct rte_gro_tbl *gro_tbl __rte_unused)
>>> +int rte_gro_reassemble(struct rte_mbuf *pkt,
>>> +		struct rte_gro_tbl *gro_tbl)
>>>    {
>>> +	struct ether_hdr *eth_hdr;
>>> +	struct ipv4_hdr *ipv4_hdr;
>>> +	uint16_t l3proc_type;
>>> +	struct gro_tcp_rule tcp_rule;
>>> +
>>> +	if (pkt == NULL)
>>> +		return -1;
>>> +	tcp_rule.max_packet_size = gro_tbl->max_packet_size;
>>> +	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
>>> +	l3proc_type = rte_be_to_cpu_16(eth_hdr->ether_type);
>>> +	if (l3proc_type == ETHER_TYPE_IPv4) {
>>> +		ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
>>> +		if (ipv4_hdr->next_proto_id == IPPROTO_TCP &&
>>> +				(gro_tbl->desired_gro_types & GRO_TCP_IPV4)) {
>>> +			return gro_tcp4_reassemble(pkt,
>>> +					gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
>>> +					&tcp_rule);
>>> +		}
>>> +	}
>>>    	return -1;
>>>    }
>>> -uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
>>> -		uint64_t desired_gro_types __rte_unused,
>>> -		uint16_t flush_num __rte_unused,
>>> -		struct rte_mbuf **out __rte_unused,
>>> -		const uint16_t max_nb_out __rte_unused)
>>> +uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl,
>>> +		uint64_t desired_gro_types,
>>> +		uint16_t flush_num,
>>> +		struct rte_mbuf **out,
>>> +		const uint16_t max_nb_out)
>>>    {
>> Ditto.
>>
>>> +	desired_gro_types = desired_gro_types &
>>> +		gro_tbl->desired_gro_types;
>>> +	if (desired_gro_types & GRO_TCP_IPV4)
>>> +		return gro_tcp_tbl_flush(
>>> +				gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
>>> +				flush_num,
>>> +				out,
>>> +				max_nb_out);
>>>    	return 0;
>>>    }
>>>    uint16_t
>>> -rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
>>> -		uint64_t desired_gro_types __rte_unused,
>>> -		struct rte_mbuf **out __rte_unused,
>>> -		const uint16_t max_nb_out __rte_unused)
>>> +rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl,
>>> +		uint64_t desired_gro_types,
>>> +		struct rte_mbuf **out,
>>> +		const uint16_t max_nb_out)
>>>    {
>>> +	desired_gro_types = desired_gro_types &
>>> +		gro_tbl->desired_gro_types;
>>> +	if (desired_gro_types & GRO_TCP_IPV4)
>>> +		return gro_tcp_tbl_timeout_flush(
>>> +				gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
>>> +				gro_tbl->max_timeout_cycles,
>>> +				out, max_nb_out);
>>>    	return 0;
>>>    }
>>> diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
>>> index 67bd90d..e26aa5b 100644
>>> --- a/lib/librte_gro/rte_gro.h
>>> +++ b/lib/librte_gro/rte_gro.h
>>> @@ -35,7 +35,11 @@
>>>    /* maximum number of supported GRO types */
>>>    #define GRO_TYPE_MAX_NB 64
>>> -#define GRO_TYPE_SUPPORT_NB 0	/**< current supported GRO num */
>>> +#define GRO_TYPE_SUPPORT_NB 1	/**< supported GRO types number */
>>> +
>>> +/* TCP/IPv4 GRO flag */
>>> +#define GRO_TCP_IPV4_INDEX 0
>>> +#define GRO_TCP_IPV4 (1ULL << GRO_TCP_IPV4_INDEX)
>>>    /**
>>>     * GRO table structure. DPDK GRO uses GRO table to reassemble
>>> @@ -139,9 +143,9 @@ void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl);
>>>     * @return
>>>    *  the number of packets after GRO is performed.
>>>     */
>>> -uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
>>> -		const uint16_t nb_pkts __rte_unused,
>>> -		const struct rte_gro_param param __rte_unused);
>>> +uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts,
>>> +		const uint16_t nb_pkts,
>>> +		const struct rte_gro_param param);
>>>    /**
>>>     * This is the main reassembly API used in heavyweight mode, which
>>> @@ -164,8 +168,8 @@ uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
>>>    *  if the packet is merged successfully, return a positive value. If it
>>>    *  fails to merge, return zero. If errors happen, return a negative value.
>>>     */
>>> -int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
>>> -		struct rte_gro_tbl *gro_tbl __rte_unused);
>>> +int rte_gro_reassemble(struct rte_mbuf *pkt,
>>> +		struct rte_gro_tbl *gro_tbl);
>>>    /**
>>>     * This function flushes packets of desired GRO types from their
>>> @@ -184,11 +188,11 @@ int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
>>>     * @return
>>>     *  the number of flushed packets. If no packets are flushed, return 0.
>>>     */
>>> -uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
>>> -		uint64_t desired_gro_types __rte_unused,
>>> -		uint16_t flush_num __rte_unused,
>>> -		struct rte_mbuf **out __rte_unused,
>>> -		const uint16_t max_nb_out __rte_unused);
>>> +uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl,
>>> +		uint64_t desired_gro_types,
>>> +		uint16_t flush_num,
>>> +		struct rte_mbuf **out,
>>> +		const uint16_t max_nb_out);
>>>    /**
>>>     * This function flushes the timeout packets from reassembly tables of
>>> @@ -206,8 +210,8 @@ uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
>>>     * @return
>>>     *  the number of flushed packets. If no packets are flushed, return 0.
>>>     */
>>> -uint16_t rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
>>> -		uint64_t desired_gro_types __rte_unused,
>>> -		struct rte_mbuf **out __rte_unused,
>>> -		const uint16_t max_nb_out __rte_unused);
>>> +uint16_t rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl,
>>> +		uint64_t desired_gro_types,
>>> +		struct rte_mbuf **out,
>>> +		const uint16_t max_nb_out);
>> Do you have any cases to test this API? I don't see the following example
>> use this API. That means we are exposing an API that is never tested. I
>> don't know if we can add some experimental flag to this API. Let's seek
>> advice from others.
> These flush APIs are used in heavyweight mode. But testpmd is not a good case
> for using heavyweight mode. What do you think about using some unit tests to
> test them?

I think the vhost example is a good place to implement heavyweight mode.
There is a timeout mechanism in the vhost example which can call this flush
API. Feel free to ping yuanhan and Maxime for suggestions.
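
For example, the flush path there could look roughly like the sketch below.
This is a minimal sketch against the signatures in this patch; port_id,
queue_id, the burst size of 32, and app_send_burst() are hypothetical:

	/* Sketch: heavyweight mode with a periodic timeout flush */
	struct rte_mbuf *pkts[32], *out[32];
	uint16_t nb_rx, nb_flushed, i;

	nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts, 32);
	for (i = 0; i < nb_rx; i++) {
		/* on success, the packet is held in (or merged into) the table */
		if (rte_gro_reassemble(pkts[i], gro_tbl) < 0)
			app_send_burst(&pkts[i], 1); /* error: pass through */
	}

	/* run periodically, e.g. from the vhost example's timeout path */
	nb_flushed = rte_gro_timeout_flush(gro_tbl, GRO_TCP_IPV4, out, 32);
	if (nb_flushed > 0)
		app_send_burst(out, nb_flushed);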

>
>>>    #endif
>>> diff --git a/lib/librte_gro/rte_gro_tcp.c b/lib/librte_gro/rte_gro_tcp.c
>>> new file mode 100644
>>> index 0000000..86743cd
>>> --- /dev/null
>>> +++ b/lib/librte_gro/rte_gro_tcp.c
>>> @@ -0,0 +1,527 @@
>>> +/*-
>>> + *   BSD LICENSE
>>> + *
>>> + *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
>>> + *
>>> + *   Redistribution and use in source and binary forms, with or without
>>> + *   modification, are permitted provided that the following conditions
>>> + *   are met:
>>> + *
>>> + *     * Redistributions of source code must retain the above copyright
>>> + *       notice, this list of conditions and the following disclaimer.
>>> + *     * Redistributions in binary form must reproduce the above copyright
>>> + *       notice, this list of conditions and the following disclaimer in
>>> + *       the documentation and/or other materials provided with the
>>> + *       distribution.
>>> + *     * Neither the name of Intel Corporation nor the names of its
>>> + *       contributors may be used to endorse or promote products derived
>>> + *       from this software without specific prior written permission.
>>> + *
>>> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
>>> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
>>> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
>>> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
>>> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>>> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>>> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
>>> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
>>> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
>>> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
>>> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>>> + */
>>> +
>>> +#include <rte_malloc.h>
>>> +#include <rte_mbuf.h>
>>> +#include <rte_cycles.h>
>>> +
>>> +#include <rte_ethdev.h>
>>> +#include <rte_ip.h>
>>> +#include <rte_tcp.h>
>>> +
>>> +#include "rte_gro_tcp.h"
>>> +
>>> +void *gro_tcp_tbl_create(uint16_t socket_id,
>> Define it as "static". Similar to other functions.
>>
>>> +		uint16_t max_flow_num,
>>> +		uint16_t max_item_per_flow)
>>> +{
>>> +	size_t size;
>>> +	uint32_t entries_num;
>>> +	struct gro_tcp_tbl *tbl;
>>> +
>>> +	max_flow_num = max_flow_num > GRO_TCP_TBL_MAX_FLOW_NUM ?
>>> +		GRO_TCP_TBL_MAX_FLOW_NUM : max_flow_num;
>>> +
>>> +	entries_num = max_flow_num * max_item_per_flow;
>>> +	entries_num = entries_num > GRO_TCP_TBL_MAX_ITEM_NUM ?
>>> +		GRO_TCP_TBL_MAX_ITEM_NUM : entries_num;
>>> +
>>> +	if (entries_num == 0 || max_flow_num == 0)
>>> +		return NULL;
>>> +
>>> +	tbl = (struct gro_tcp_tbl *)rte_zmalloc_socket(
>>> +			__func__,
>>> +			sizeof(struct gro_tcp_tbl),
>>> +			RTE_CACHE_LINE_SIZE,
>>> +			socket_id);
>>> +
>>> +	size = sizeof(struct gro_tcp_item) * entries_num;
>>> +	tbl->items = (struct gro_tcp_item *)rte_zmalloc_socket(
>>> +			__func__,
>>> +			size,
>>> +			RTE_CACHE_LINE_SIZE,
>>> +			socket_id);
>>> +	tbl->max_item_num = entries_num;
>>> +
>>> +	size = sizeof(struct gro_tcp_flow) * max_flow_num;
>>> +	tbl->flows = (struct gro_tcp_flow *)rte_zmalloc_socket(
>>> +			__func__,
>>> +			size, RTE_CACHE_LINE_SIZE,
>>> +			socket_id);
>>> +	tbl->max_flow_num = max_flow_num;
>>> +	return tbl;
>>> +}
>>> +
>>> +void gro_tcp_tbl_destroy(void *tbl)
>>> +{
>>> +	struct gro_tcp_tbl *tcp_tbl = (struct gro_tcp_tbl *)tbl;
>>> +
>>> +	if (tcp_tbl) {
>>> +		if (tcp_tbl->items)
>>> +			rte_free(tcp_tbl->items);
>>> +		if (tcp_tbl->flows)
>>> +			rte_free(tcp_tbl->flows);
>>> +		rte_free(tcp_tbl);
>>> +	}
>>> +}
>>> +
>>> +/* update TCP header and IPv4 header checksum */
>>> +static void
>>> +gro_tcp4_cksum_update(struct rte_mbuf *pkt)
>>> +{
>>> +	uint32_t len, offset, cksum;
>>> +	struct ether_hdr *eth_hdr;
>>> +	struct ipv4_hdr *ipv4_hdr;
>>> +	struct tcp_hdr *tcp_hdr;
>>> +	uint16_t ipv4_ihl, cksum_pld;
>>> +
>>> +	if (pkt == NULL)
>>> +		return;
>>> +
>>> +	len = pkt->pkt_len;
>>> +	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
>>> +	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
>>> +	ipv4_ihl = IPv4_HDR_LEN(ipv4_hdr);
>>> +	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl);
>>> +
>>> +	offset = sizeof(struct ether_hdr) + ipv4_ihl;
>>> +	len -= offset;
>>> +
>>> +	/* TCP cksum without IP pseudo header */
>>> +	ipv4_hdr->hdr_checksum = 0;
>>> +	tcp_hdr->cksum = 0;
>>> +	rte_raw_cksum_mbuf(pkt, offset, len, &cksum_pld);
>>> +
>>> +	/* IP pseudo header cksum */
>>> +	cksum = cksum_pld;
>>> +	cksum += rte_ipv4_phdr_cksum(ipv4_hdr, 0);
>>> +
>>> +	/* combine TCP checksum and IP pseudo header checksum */
>>> +	cksum = ((cksum & 0xffff0000) >> 16) + (cksum & 0xffff);
>>> +	cksum = (~cksum) & 0xffff;
>>> +	cksum = (cksum == 0) ? 0xffff : cksum;
>>> +	tcp_hdr->cksum = cksum;
>>> +
>>> +	/* update IP header cksum */
>>> +	ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);
>>> +}
>>> +
>>> +void gro_tcp4_tbl_cksum_update(struct gro_tcp_tbl *tbl)
>>> +{
>>> +	uint32_t i;
>>> +	uint32_t item_num = tbl->item_num;
>>> +
>>> +	for (i = 0; i < tbl->max_item_num; i++) {
>>> +		if (tbl->items[i].is_valid) {
>>> +			item_num--;
>>> +			if (tbl->items[i].is_groed)
>>> +				gro_tcp4_cksum_update(tbl->items[i].pkt);
>>> +		}
>>> +		if (unlikely(item_num == 0))
>>> +			break;
>>> +	}
>>> +}
>>> +
>>> +/**
>>> + * merge two TCP/IPv4 packets without updating the header checksums.
>>> + */
>>> +static int
>>> +merge_two_tcp4_packets(struct rte_mbuf *pkt_src,
>>> +		struct rte_mbuf *pkt,
>>> +		struct gro_tcp_rule *rule)
>>> +{
>>> +	struct ipv4_hdr *ipv4_hdr1, *ipv4_hdr2;
>>> +	struct tcp_hdr *tcp_hdr1;
>>> +	uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1;
>>> +	struct rte_mbuf *tail;
>>> +
>>> +	/* parse the given packet */
>>> +	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
>>> +				struct ether_hdr *) + 1);
>>> +	ipv4_ihl1 = IPv4_HDR_LEN(ipv4_hdr1);
>>> +	tcp_hdr1 = (struct tcp_hdr *)((char *)ipv4_hdr1 + ipv4_ihl1);
>>> +	tcp_hl1 = TCP_HDR_LEN(tcp_hdr1);
>>> +	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) - ipv4_ihl1
>>> +		- tcp_hl1;
>>> +
>>> +	/* parse the original packet */
>>> +	ipv4_hdr2 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt_src,
>>> +				struct ether_hdr *) + 1);
>>> +
>>> +	/* check reassembly rules */
>>> +	if (pkt_src->pkt_len + tcp_dl1 > rule->max_packet_size)
>>> +		return -1;
>>> +
>>> +	/* remove the header of the incoming packet */
>>> +	rte_pktmbuf_adj(pkt, sizeof(struct ether_hdr) +
>>> +			ipv4_ihl1 + tcp_hl1);
>>> +
>>> +	/* chain the two packet together */
>>> +	tail = rte_pktmbuf_lastseg(pkt_src);
>>> +	tail->next = pkt;
>>> +
>>> +	/* update IP header */
>>> +	ipv4_hdr2->total_length = rte_cpu_to_be_16(
>>> +			rte_be_to_cpu_16(
>>> +				ipv4_hdr2->total_length)
>>> +			+ tcp_dl1);
>>> +
>>> +	/* update mbuf metadata for the merged packet */
>>> +	pkt_src->nb_segs++;
>>> +	pkt_src->pkt_len += pkt->pkt_len;
>>> +	return 1;
>>> +}
>>> +
>>> +static int
>>> +check_seq_option(struct rte_mbuf *pkt,
>>> +		struct tcp_hdr *tcp_hdr,
>>> +		uint16_t tcp_hl)
>>> +{
>>> +	struct ipv4_hdr *ipv4_hdr1;
>>> +	struct tcp_hdr *tcp_hdr1;
>>> +	uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1;
>>> +	uint32_t sent_seq1, sent_seq;
>>> +	int ret = -1;
>>> +
>>> +	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
>>> +				struct ether_hdr *) + 1);
>>> +	ipv4_ihl1 = IPv4_HDR_LEN(ipv4_hdr1);
>>> +	tcp_hdr1 = (struct tcp_hdr *)((char *)ipv4_hdr1 + ipv4_ihl1);
>>> +	tcp_hl1 = TCP_HDR_LEN(tcp_hdr1);
>>> +	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) - ipv4_ihl1
>>> +		- tcp_hl1;
>>> +	sent_seq1 = rte_be_to_cpu_32(tcp_hdr1->sent_seq) + tcp_dl1;
>>> +	sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq);
>>> +
>>> +	/* check if the two packets are neighbor */
>>> +	if ((sent_seq ^ sent_seq1) == 0) {
>>> +		/* check if TCP option field equals */
>>> +		if (tcp_hl1 > sizeof(struct tcp_hdr)) {
>>> +			if ((tcp_hl1 != tcp_hl) ||
>>> +					(memcmp(tcp_hdr1 + 1,
>>> +							tcp_hdr + 1,
>>> +							tcp_hl - sizeof
>>> +							(struct tcp_hdr))
>>> +					 == 0))
>>> +				ret = 1;
>>> +		}
>>> +	}
>>> +	return ret;
>>> +}
>>> +
>>> +static uint32_t
>>> +find_an_empty_item(struct gro_tcp_tbl *tbl)
>>> +{
>>> +	uint32_t i;
>>> +
>>> +	for (i = 0; i < tbl->max_item_num; i++)
>>> +		if (tbl->items[i].is_valid == 0)
>>> +			return i;
>>> +	return INVALID_ITEM_INDEX;
>>> +}
>>> +
>>> +static uint16_t
>>> +find_an_empty_flow(struct gro_tcp_tbl *tbl)
>>> +{
>>> +	uint16_t i;
>>> +
>>> +	for (i = 0; i < tbl->max_flow_num; i++)
>>> +		if (tbl->flows[i].is_valid == 0)
>>> +			return i;
>>> +	return INVALID_FLOW_INDEX;
>>> +}
>>> +
>>> +int32_t
>>> +gro_tcp4_reassemble(struct rte_mbuf *pkt,
>>> +		struct gro_tcp_tbl *tbl,
>>> +		struct gro_tcp_rule *rule)
>>> +{
>>> +	struct ether_hdr *eth_hdr;
>>> +	struct ipv4_hdr *ipv4_hdr;
>>> +	struct tcp_hdr *tcp_hdr;
>>> +	uint16_t ipv4_ihl, tcp_hl, tcp_dl, tcp_cksum, ip_cksum;
>>> +
>>> +	struct gro_tcp_flow_key key;
>>> +	uint64_t ol_flags;
>>> +	uint32_t cur_idx, prev_idx, item_idx;
>>> +	uint16_t i, flow_idx;
>>> +
>>> +	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
>>> +	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
>>> +	ipv4_ihl = IPv4_HDR_LEN(ipv4_hdr);
>>> +
>>> +	/* 1. check if the packet should be processed */
>>> +	if (ipv4_ihl < sizeof(struct ipv4_hdr))
>>> +		goto fail;
>>> +	if (ipv4_hdr->next_proto_id != IPPROTO_TCP)
>>> +		goto fail;
>>> +	if ((ipv4_hdr->fragment_offset &
>>> +				rte_cpu_to_be_16(IPV4_HDR_DF_MASK))
>>> +			== 0)
>>> +		goto fail;
>>> +
>>> +	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl);
>>> +	tcp_hl = TCP_HDR_LEN(tcp_hdr);
>>> +	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - ipv4_ihl
>>> +		- tcp_hl;
>>> +	if (tcp_dl == 0)
>>> +		goto fail;
>>> +
>>> +	/**
>>> +	 * 2. if HW rx checksum offload isn't enabled, recalculate the
>>> +	 * checksum in SW. Then, check if the checksum is correct
>>> +	 */
>>> +	ol_flags = pkt->ol_flags;
>>> +	if ((ol_flags & PKT_RX_IP_CKSUM_MASK) !=
>>> +			PKT_RX_IP_CKSUM_UNKNOWN) {
>>> +		if (ol_flags == PKT_RX_IP_CKSUM_BAD)
>>> +			goto fail;
>>> +	} else {
>>> +		ip_cksum = ipv4_hdr->hdr_checksum;
>>> +		ipv4_hdr->hdr_checksum = 0;
>>> +		ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);
>>> +		if (ipv4_hdr->hdr_checksum ^ ip_cksum)
>>> +			goto fail;
>>> +	}
>>> +
>>> +	if ((ol_flags & PKT_RX_L4_CKSUM_MASK) !=
>>> +			PKT_RX_L4_CKSUM_UNKNOWN) {
>>> +		if (ol_flags == PKT_RX_L4_CKSUM_BAD)
>>> +			goto fail;
>>> +	} else {
>>> +		tcp_cksum = tcp_hdr->cksum;
>>> +		tcp_hdr->cksum = 0;
>>> +		tcp_hdr->cksum = rte_ipv4_udptcp_cksum
>>> +			(ipv4_hdr, tcp_hdr);
>>> +		if (tcp_hdr->cksum ^ tcp_cksum)
>>> +			goto fail;
>>> +	}
>>> +
>>> +	/**
>>> +	 * 3. search for a flow and traverse all packets in the flow
>>> +	 * to find one to merge with the given packet.
>>> +	 */
>>> +	key.eth_saddr = eth_hdr->s_addr;
>>> +	key.eth_daddr = eth_hdr->d_addr;
>>> +	key.ip_src_addr[0] = rte_be_to_cpu_32(ipv4_hdr->src_addr);
>>> +	key.ip_dst_addr[0] = rte_be_to_cpu_32(ipv4_hdr->dst_addr);
>>> +	key.src_port = rte_be_to_cpu_16(tcp_hdr->src_port);
>>> +	key.dst_port = rte_be_to_cpu_16(tcp_hdr->dst_port);
>>> +	key.recv_ack = rte_be_to_cpu_32(tcp_hdr->recv_ack);
>>> +	key.tcp_flags = tcp_hdr->tcp_flags;
>>> +
>>> +	for (i = 0; i < tbl->max_flow_num; i++) {
>>> +		/* search all packets in a valid flow. */
>>> +		if (tbl->flows[i].is_valid &&
>>> +				(memcmp(&(tbl->flows[i].key), &key,
>>> +						sizeof(struct gro_tcp_flow_key))
>>> +				 == 0)) {
>>> +			cur_idx = tbl->flows[i].start_index;
>>> +			prev_idx = cur_idx;
>>> +			while (cur_idx != INVALID_ITEM_INDEX) {
>>> +				if (check_seq_option(tbl->items[cur_idx].pkt,
>>> +							tcp_hdr,
>>> +							tcp_hl) > 0) {
>>> +					if (merge_two_tcp4_packets(
>>> +								tbl->items[cur_idx].pkt,
>>> +								pkt,
>>> +								rule) > 0) {
>>> +						/* successfully merge two packets */
>>> +						tbl->items[cur_idx].is_groed = 1;
>>> +						return 1;
>>> +					}
>>> +					/**
>>> +					 * failed to merge the two packets
>>> +					 * since it breaks the rules; add the
>>> +					 * packet into the flow.
>>> +					 */
>>> +					goto insert_to_existed_flow;
>>> +				} else {
>>> +					prev_idx = cur_idx;
>>> +					cur_idx = tbl->items[cur_idx].next_pkt_idx;
>>> +				}
>>> +			}
>>> +			/**
>>> +			 * failed to merge the given packet into an existing
>>> +			 * flow; add it into the flow.
>>> +			 */
>>> +insert_to_existed_flow:
>>> +			item_idx = find_an_empty_item(tbl);
>>> +			/* the item number is beyond the maximum value */
>>> +			if (item_idx == INVALID_ITEM_INDEX)
>>> +				return -1;
>>> +			tbl->items[prev_idx].next_pkt_idx = item_idx;
>>> +			tbl->items[item_idx].pkt = pkt;
>>> +			tbl->items[item_idx].is_groed = 0;
>>> +			tbl->items[item_idx].next_pkt_idx = INVALID_ITEM_INDEX;
>>> +			tbl->items[item_idx].is_valid = 1;
>>> +			tbl->items[item_idx].start_time = rte_rdtsc();
>>> +			tbl->item_num++;
>>> +			return 0;
>>> +		}
>>> +	}
>>> +
>>> +	/**
>>> +	 * the merge failed as the given packet belongs to a new flow.
>>> +	 * Therefore, insert a new flow.
>>> +	 */
>>> +	item_idx = find_an_empty_item(tbl);
>>> +	flow_idx = find_an_empty_flow(tbl);
>>> +	/**
>>> +	 * if the flow or item number is beyond the maximum value,
>>> +	 * the input packet won't be processed.
>>> +	 */
>>> +	if (item_idx == INVALID_ITEM_INDEX ||
>>> +			flow_idx == INVALID_FLOW_INDEX)
>>> +		return -1;
>>> +	tbl->items[item_idx].pkt = pkt;
>>> +	tbl->items[item_idx].next_pkt_idx = INVALID_ITEM_INDEX;
>>> +	tbl->items[item_idx].is_groed = 0;
>>> +	tbl->items[item_idx].is_valid = 1;
>>> +	tbl->items[item_idx].start_time = rte_rdtsc();
>>> +	tbl->item_num++;
>>> +
>>> +	memcpy(&(tbl->flows[flow_idx].key),
>>> +			&key, sizeof(struct gro_tcp_flow_key));
>>> +	tbl->flows[flow_idx].start_index = item_idx;
>>> +	tbl->flows[flow_idx].is_valid = 1;
>>> +	tbl->flow_num++;
>>> +
>>> +	return 0;
>>> +fail:
>>> +	return -1;
>>> +}
>>> +
>>> +uint16_t gro_tcp_tbl_flush(struct gro_tcp_tbl *tbl,
>>> +		uint16_t flush_num,
>>> +		struct rte_mbuf **out,
>>> +		const uint16_t nb_out)
>>> +{
>>> +	uint16_t num, k;
>>> +	uint16_t i;
>>> +	uint32_t j;
>>> +
>>> +	k = 0;
>>> +	num = tbl->item_num > flush_num ? flush_num : tbl->item_num;
>>> +	num = num > nb_out ? nb_out : num;
>>> +	if (unlikely(num == 0))
>>> +		return 0;
>>> +
>>> +	for (i = 0; i < tbl->max_flow_num; i++) {
>>> +		if (tbl->flows[i].is_valid) {
>>> +			j = tbl->flows[i].start_index;
>>> +			while (j != INVALID_ITEM_INDEX) {
>>> +				/* update checksum for GROed packet */
>>> +				if (tbl->items[j].is_groed)
>>> +					gro_tcp4_cksum_update(tbl->items[j].pkt);
>>> +
>>> +				out[k++] = tbl->items[j].pkt;
>>> +				tbl->items[j].is_valid = 0;
>>> +				tbl->item_num--;
>>> +				j = tbl->items[j].next_pkt_idx;
>>> +
>>> +				if (k == num) {
>>> +					/* delete the flow */
>>> +					if (j == INVALID_ITEM_INDEX) {
>>> +						tbl->flows[i].is_valid = 0;
>>> +						tbl->flow_num--;
>>> +					} else
>>> +						/* update flow information */
>>> +						tbl->flows[i].start_index = j;
>>> +					goto end;
>>> +				}
>>> +			}
>>> +			/* delete the flow, as all of its packets are flushed */
>>> +			tbl->flows[i].is_valid = 0;
>>> +			tbl->flow_num--;
>>> +		}
>>> +		if (tbl->flow_num == 0)
>>> +			goto end;
>>> +	}
>>> +end:
>>> +	return num;
>>> +}
>>> +
>>> +uint16_t
>>> +gro_tcp_tbl_timeout_flush(struct gro_tcp_tbl *tbl,
>>> +		uint64_t timeout_cycles,
>>> +		struct rte_mbuf **out,
>>> +		const uint16_t nb_out)
>>> +{
>>> +	uint16_t k;
>>> +	uint16_t i;
>>> +	uint32_t j;
>>> +	uint64_t current_time;
>>> +
>>> +	if (nb_out == 0)
>>> +		return 0;
>>> +	k = 0;
>>> +	current_time = rte_rdtsc();
>>> +
>>> +	for (i = 0; i < tbl->max_flow_num; i++) {
>>> +		if (tbl->flows[i].is_valid) {
>>> +			j = tbl->flows[i].start_index;
>>> +			while (j != INVALID_ITEM_INDEX) {
>>> +				if (current_time - tbl->items[j].start_time >=
>>> +						timeout_cycles) {
>>> +					/* update checksum for GROed packet */
>>> +					if (tbl->items[j].is_groed)
>>> +						gro_tcp4_cksum_update(tbl->items[j].pkt);
>>> +
>>> +					out[k++] = tbl->items[j].pkt;
>>> +					tbl->items[j].is_valid = 0;
>>> +					tbl->item_num--;
>>> +					j = tbl->items[j].next_pkt_idx;
>>> +
>>> +					if (k == nb_out &&
>>> +							j == INVALID_ITEM_INDEX) {
>>> +						/* delete the flow */
>>> +						tbl->flows[i].is_valid = 0;
>>> +						tbl->flow_num--;
>>> +						goto end;
>>> +					} else if (k == nb_out &&
>>> +							j != INVALID_ITEM_INDEX) {
>>> +						tbl->flows[i].start_index = j;
>>> +						goto end;
>>> +					}
>>> +				}
>>> +			}
>>> +			/* delete the flow, as all of its packets are flushed */
>>> +			tbl->flows[i].is_valid = 0;
>>> +			tbl->flow_num--;
>>> +		}
>>> +		if (tbl->flow_num == 0)
>>> +			goto end;
>>> +	}
>>> +end:
>>> +	return k;
>>> +}
>>> diff --git a/lib/librte_gro/rte_gro_tcp.h b/lib/librte_gro/rte_gro_tcp.h
>>> new file mode 100644
>>> index 0000000..551efc4
>>> --- /dev/null
>>> +++ b/lib/librte_gro/rte_gro_tcp.h
>>> @@ -0,0 +1,210 @@
>>> +/*-
>>> + *   BSD LICENSE
>>> + *
>>> + *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
>>> + *
>>> + *   Redistribution and use in source and binary forms, with or without
>>> + *   modification, are permitted provided that the following conditions
>>> + *   are met:
>>> + *
>>> + *     * Redistributions of source code must retain the above copyright
>>> + *       notice, this list of conditions and the following disclaimer.
>>> + *     * Redistributions in binary form must reproduce the above copyright
>>> + *       notice, this list of conditions and the following disclaimer in
>>> + *       the documentation and/or other materials provided with the
>>> + *       distribution.
>>> + *     * Neither the name of Intel Corporation nor the names of its
>>> + *       contributors may be used to endorse or promote products derived
>>> + *       from this software without specific prior written permission.
>>> + *
>>> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
>>> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
>>> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
>>> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
>>> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>>> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>>> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
>>> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
>>> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
>>> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
>>> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>>> + */
>>> +
>>> +#ifndef _RTE_GRO_TCP_H_
>>> +#define _RTE_GRO_TCP_H_
>>> +
>>> +#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
>>> +#define TCP_HDR_LEN(tcph) \
>>> +	((tcph->data_off >> 4) * 4)
>>> +#define IPv4_HDR_LEN(iph) \
>>> +	((iph->version_ihl & 0x0f) * 4)
>>> +#else
>>> +#define TCP_DATAOFF_MASK 0x0f
>>> +#define TCP_HDR_LEN(tcph) \
>>> +	((tcph->data_off & TCP_DATAOFF_MASK) * 4)
>>> +#define IPv4_HDR_LEN(iph) \
>>> +	((iph->version_ihl >> 4) * 4)
>>> +#endif
>>> +
>>> +#define IPV4_HDR_DF_SHIFT 14
>>> +#define IPV4_HDR_DF_MASK (1 << IPV4_HDR_DF_SHIFT)
>>> +
>>> +#define INVALID_FLOW_INDEX 0xffffU
>>> +#define INVALID_ITEM_INDEX 0xffffffffUL
>>> +
>>> +#define GRO_TCP_TBL_MAX_FLOW_NUM (UINT16_MAX - 1)
>>> +#define GRO_TCP_TBL_MAX_ITEM_NUM (UINT32_MAX - 1)
>>> +
>>> +/* criteria for merging packets */
>>> +struct gro_tcp_flow_key {
>>> +	struct ether_addr eth_saddr;
>>> +	struct ether_addr eth_daddr;
>>> +	uint32_t ip_src_addr[4];	/**< IPv4 uses the first 8B */
>>> +	uint32_t ip_dst_addr[4];
>>> +
>>> +	uint32_t recv_ack;	/**< acknowledgment sequence number. */
>>> +	uint16_t src_port;
>>> +	uint16_t dst_port;
>>> +	uint8_t tcp_flags;	/**< TCP flags. */
>>> +};
>>> +
>>> +struct gro_tcp_flow {
>>> +	struct gro_tcp_flow_key key;
>>> +	uint32_t start_index;	/**< the first packet index of the flow */
>>> +	uint8_t is_valid;
>>> +};
>>> +
>>> +struct gro_tcp_item {
>>> +	struct rte_mbuf *pkt;	/**< packet address. */
>>> +	/* the time when the packet is added into the table */
>>> +	uint64_t start_time;
>>> +	uint32_t next_pkt_idx;	/**< next packet index. */
>>> +	/* flag to indicate if the packet is GROed */
>>> +	uint8_t is_groed;
>>> +	uint8_t is_valid;	/**< flag indicates if the item is valid */
>>> +};
>>> +
>>> +/**
>>> + * TCP reassembly table. Both TCP/IPv4 and TCP/IPv6 use the same table
>>> + * structure.
>>> + */
>>> +struct gro_tcp_tbl {
>>> +	struct gro_tcp_item *items;	/**< item array */
>>> +	struct gro_tcp_flow *flows;	/**< flow array */
>>> +	uint32_t item_num;	/**< current item number */
>>> +	uint16_t flow_num;	/**< current flow num */
>>> +	uint32_t max_item_num;	/**< item array size */
>>> +	uint16_t max_flow_num;	/**< flow array size */
>>> +};
>>> +
>>> +/* rules to reassemble TCP packets, which are decided by applications */
>>> +struct gro_tcp_rule {
>>> +	/* the maximum packet length after merging */
>>> +	uint32_t max_packet_size;
>>> +};
>> Are there any other rules? If not, I prefer to use max_packet_size directly.
> If we agree to use a flag to indicate whether to check checksums, this
> structure should be used to keep that flag.
>
>>> +
>>> +/**
>>> + * This function is to update TCP and IPv4 header checksums
>>> + * for merged packets in the TCP reassembly table.
>>> + */
>>> +void gro_tcp4_tbl_cksum_update(struct gro_tcp_tbl *tbl);
>>> +
>>> +/**
>>> + * This function creates a TCP reassembly table.
>>> + *
>>> + * @param socket_id
>>> + *  socket index where the Ethernet port connects to.
>>> + * @param max_flow_num
>>> + *  the maximum number of flows in the TCP GRO table
>>> + * @param max_item_per_flow
>>> + *  the maximum packet number per flow.
>>> + * @return
>>> + *  if create successfully, return a pointer which points to the
>>> + *  created TCP GRO table. Otherwise, return NULL.
>>> + */
>>> +void *gro_tcp_tbl_create(uint16_t socket_id,
>>> +		uint16_t max_flow_num,
>>> +		uint16_t max_item_per_flow);
>>> +
>>> +/**
>>> + * This function destroys a TCP reassembly table.
>>> + * @param tbl
>>> + *  a pointer points to the TCP reassembly table.
>>> + */
>>> +void gro_tcp_tbl_destroy(void *tbl);
>>> +
>>> +/**
>>> + * This function searches for a packet in the TCP reassembly table to
>>> + * merge with the input one. Merging two packets means chaining them
>>> + * together and updating the packet headers. Note that this function
>>> + * won't re-calculate IPv4 and TCP checksums.
>>> + *
>>> + * If the packet has no data, has wrong checksums, or is fragmented,
>>> + * etc., an error happens and gro_tcp4_reassemble returns
>>> + * immediately. If no errors happen, the packet is either merged or
>>> + * inserted into the reassembly table.
>>> + *
>>> + * If applications want to get packets in the reassembly table, they
>>> + * need to manually flush the packets.
>>> + *
>>> + * @param pkt
>>> + *  packet to reassemble.
>>> + * @param tbl
>>> + *  a pointer that points to a TCP reassembly table.
>>> + * @param rule
>>> + *  TCP reassembly criteria defined by applications.
>>> + * @return
>>> + *  if the input packet is merged successfully, return a positive
>>> + *  value. If the packet hasn't been merged with any packet in the TCP
>>> + *  reassembly table, return zero. If errors happen, return a negative
>>> + *  value and the packet won't be inserted into the reassembly table.
>>> + */
>>> +int32_t
>>> +gro_tcp4_reassemble(struct rte_mbuf *pkt,
>>> +		struct gro_tcp_tbl *tbl,
>>> +		struct gro_tcp_rule *rule);
>>> +
>>> +/**
>>> + * This function flushes the packets in a TCP reassembly table to
>>> + * applications. Before returning the packets, it will update TCP and
>>> + * IPv4 header checksums.
>>> + *
>>> + * @param tbl
>>> + *  a pointer that points to a TCP GRO table.
>>> + * @param flush_num
>>> + *  the number of packets that applications want to flush.
>>> + * @param out
>>> + *  pointer array which is used to keep flushed packets.
>>> + * @param nb_out
>>> + *  the maximum element number of out.
>>> + * @return
>>> + *  the number of packets that are flushed finally.
>>> + */
>>> +uint16_t
>>> +gro_tcp_tbl_flush(struct gro_tcp_tbl *tbl,
>>> +		uint16_t flush_num,
>>> +		struct rte_mbuf **out,
>>> +		const uint16_t nb_out);
>>> +
>>> +/**
>>> + * This function flushes timeout packets in a TCP reassembly table to
>>> + * applications. Before returning the packets, it updates TCP and IPv4
>>> + * header checksums.
>>> + *
>>> + * @param tbl
>>> + *  a pointer that points to a TCP GRO table.
>>> + * @param timeout_cycles
>>> + *  the maximum time that packets can stay in the table.
>>> + * @param out
>>> + *  pointer array which is used to keep flushed packets.
>>> + * @param nb_out
>>> + *  the maximum element number of out.
>>> + * @return
>>> + *  It returns the number of packets that are flushed finally.
>>> + */
>>> +uint16_t
>>> +gro_tcp_tbl_timeout_flush(struct gro_tcp_tbl *tbl,
>>> +		uint64_t timeout_cycles,
>>> +		struct rte_mbuf **out,
>>> +		const uint16_t nb_out);
>>> +#endif

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v5 2/3] lib/gro: add TCP/IPv4 GRO support
  2017-06-20 23:30               ` Tan, Jianfeng
@ 2017-06-20 23:55                 ` Stephen Hemminger
  2017-06-22  7:39                 ` Jiayu Hu
  1 sibling, 0 replies; 141+ messages in thread
From: Stephen Hemminger @ 2017-06-20 23:55 UTC (permalink / raw)
  To: Tan, Jianfeng
  Cc: Jiayu Hu, dev, konstantin.ananyev, yliu, keith.wiles, tiwei.bie,
	lei.a.yao

On Wed, 21 Jun 2017 07:30:08 +0800
"Tan, Jianfeng" <jianfeng.tan@intel.com> wrote:

> >>> To process an incoming packet, we need three steps:
> >>> a. check if the packet should be processed. Packets with the following
> >>>       properties won't be processed:
> >>> 	- packets without data;
> >>> 	- packets with wrong checksums;  
> >> Why do we care to check this kind of error? Can we just suppose the
> >> applications have already dropped the packets with wrong cksum?  
> > Indeed, if we assume all input packets are correct, we can avoid
> > checksum checking overhead. But as a library, I think a more flexible
> > way is to enable applications to tell the GRO API whether checksum
> > checking is needed. For example, we can add a flag to struct rte_gro_tbl
> > and struct rte_gro_param, which indicates whether checksum checking
> > is needed. If applications set this flag, the reassembly function won't
> > check packet checksums. Otherwise, we check the checksums. What do you
> > think?
> 
> My opinion is to keep the library focused on what it does, and make
> its dependencies clear. This flag thing will differ across GRO
> engines, which makes it a little complicated to me.

As long as it is documented behavior, GIGO is fine.
IMHO a well-designed library does as little as possible and nothing more.

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v5 2/3] lib/gro: add TCP/IPv4 GRO support
  2017-06-20 23:30               ` Tan, Jianfeng
  2017-06-20 23:55                 ` Stephen Hemminger
@ 2017-06-22  7:39                 ` Jiayu Hu
  1 sibling, 0 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-06-22  7:39 UTC (permalink / raw)
  To: Tan, Jianfeng
  Cc: dev, konstantin.ananyev, yliu, keith.wiles, tiwei.bie, lei.a.yao

Hi Jianfeng,

On Wed, Jun 21, 2017 at 07:30:08AM +0800, Tan, Jianfeng wrote:
> Hi Jiayu,
> 
> 
> On 6/20/2017 11:22 AM, Jiayu Hu wrote:
> > Hi Jianfeng,
> > 
> > On Mon, Jun 19, 2017 at 11:43:20PM +0800, Tan, Jianfeng wrote:
> > > 
> > > On 6/18/2017 3:21 PM, Jiayu Hu wrote:
> > > > In this patch, we introduce six APIs to support TCP/IPv4 GRO.
> > > Those functions are not used outside of this library. Don't make them
> > > externally visible.
> > But they are called by functions in rte_gro.c, which is a different file.
> > If we define these functions as static, how can they be called by
> > functions in a different file?
> 
> We can define some ops for GRO engines. In each GRO engine, tcp4 in this
> case, we just need to register those ops; then we can iterate over all GRO
> engines in rte_gro.c. It's a better way for other developers to contribute
> other GRO engines.
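
If I understand the suggestion correctly, the ops registration would look
roughly like the sketch below (struct gro_engine_ops and gro_tcp4_ops are
hypothetical names, just to illustrate the idea):

	/* Sketch: per-engine ops registered with the GRO framework */
	struct gro_engine_ops {
		gro_tbl_create_fn tbl_create;
		gro_tbl_destroy_fn tbl_destroy;
		int32_t (*reassemble)(struct rte_mbuf *pkt, void *tbl);
	};

	/* rte_gro.c keeps one slot per GRO type and iterates over them */
	static const struct gro_engine_ops *gro_engines[GRO_TYPE_MAX_NB] = {
		[GRO_TCP_IPV4_INDEX] = &gro_tcp4_ops, /* set by the tcp4 engine */
	};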
> 
> > 
> > > > - gro_tcp_tbl_create: create a TCP reassembly table, which is used to
> > > >       merge packets.
> > > Will tcp6 share the same function with tcp4? If not, please rename it to
> > > gro_tcp4_tbl_create
> > In the TCP GRO design, TCP4 and TCP6 will share the same table structure,
> > but they will have different reassembly functions. Therefore, I use
> > gro_tcp_tbl_create instead of gro_tcp4_tbl_create here.
> 
> Then as far as I can see, we are going to call this function for all GRO
> engines, except that different flow structures are allocated for different
> engines. So I suggest we put this function into rte_gro.c.
> 
> > 
> > > > - gro_tcp_tbl_destroy: free memory space of a TCP reassembly table.
> > > > - gro_tcp_tbl_flush: flush packets in the TCP reassembly table.
> > > > - gro_tcp_tbl_timeout_flush: flush timeout packets in the TCP
> > > >       reassembly table.
> > > > - gro_tcp4_reassemble: merge an inputted packet.
> > > > - gro_tcp4_tbl_cksum_update: update TCP and IPv4 header checksums for
> > > >       all merged packets in the TCP reassembly table.
> > > > 
> > > > In TCP GRO, we use a table structure, called TCP reassembly table, to
> > > > reassemble packets. Both TCP/IPv4 and TCP/IPv6 GRO use the same table
> > > > structure. A TCP reassembly table includes a flow array and a item array,
> > > > where the flow array is used to record flow information and the item
> > > > array is used to record packets information.
> > > > 
> > > > Each element in the flow array records the information of one flow,
> > > > which includes two parts:
> > > > - key: the criteria of the same flow. If packets have the same key
> > > >       value, they belong to the same flow.
> > > > - start_index: the index of the first incoming packet of this flow in
> > > >       the item array. With start_index, we can locate the first incoming
> > > >       packet of this flow.
> > > > Each element in the item array records one packet's information. It mainly
> > > > includes two parts:
> > > > - pkt: packet address
> > > > - next_pkt_index: index of the next packet of the same flow in the item
> > > >       array. All packets of the same flow are chained by next_pkt_index.
> > > >       With next_pkt_index, we can locate all packets of the same flow
> > > >       one by one.
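
To illustrate the chaining described above, walking all packets of one flow
looks roughly like this sketch (using the structures in this patch):

	/* Sketch: visit every packet of one flow via next_pkt_idx */
	uint32_t idx = tbl->flows[flow_idx].start_index;

	while (idx != INVALID_ITEM_INDEX) {
		struct rte_mbuf *p = tbl->items[idx].pkt;
		/* ... try to merge the incoming packet with p ... */
		idx = tbl->items[idx].next_pkt_idx;
	}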
> > > > 
> > > > To process an incoming packet, we need three steps:
> > > > a. check if the packet should be processed. Packets with the following
> > > >       properties won't be processed:
> > > > 	- packets without data;
> > > > 	- packets with wrong checksums;
> > > Why do we care to check this kind of error? Can we just suppose the
> > > applications have already dropped the packets with wrong cksum?
> > Indeed, if we assume all input packets are correct, we can avoid
> > checksum checking overhead. But as a library, I think a more flexible
> > way is to enable applications to tell the GRO API whether checksum
> > checking is needed. For example, we can add a flag to struct rte_gro_tbl
> > and struct rte_gro_param, which indicates whether checksum checking
> > is needed. If applications set this flag, the reassembly function won't
> > check packet checksums. Otherwise, we check the checksums. What do you
> > think?
> 
> My opinion is to keep the library focused on what it does, and make its
> dependencies clear. This flag thing will differ across GRO engines, which
> makes it a little complicated to me.

Makes sense. Not all packet types need checksum handling, so it's not a good
idea to use this flag for all GRO engines. I will not do checksum calculation
and validation in the next version.
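
If an application does want to validate checksums in SW before calling GRO, a
minimal sketch looks like the following (a fragment, assuming a single-segment
Ethernet/IPv4/TCP mbuf; IPv4_HDR_LEN is the helper macro from this patch):

	/* Sketch: drop packets with bad IPv4/TCP checksums before GRO */
	struct ipv4_hdr *ip = rte_pktmbuf_mtod_offset(pkt, struct ipv4_hdr *,
			sizeof(struct ether_hdr));
	struct tcp_hdr *tcp = (struct tcp_hdr *)((char *)ip + IPv4_HDR_LEN(ip));
	uint16_t saved;

	saved = ip->hdr_checksum;
	ip->hdr_checksum = 0;
	if (rte_ipv4_cksum(ip) != saved) {
		rte_pktmbuf_free(pkt);	/* bad IP header checksum */
		return;
	}
	ip->hdr_checksum = saved;

	saved = tcp->cksum;
	tcp->cksum = 0;
	if (rte_ipv4_udptcp_cksum(ip, tcp) != saved) {
		rte_pktmbuf_free(pkt);	/* bad TCP checksum */
		return;
	}
	tcp->cksum = saved;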

> 
> > 
> > > > 	- fragmented packets.
> > > IP fragmented? I don't think we need to check it here either. It's the
> > > application's responsibility to call librte_ip_frag first to reassemble
> > > IP-fragmented packets, and then call this GRO library to merge TCP packets.
> > > This procedure should be shown in an example for other users to refer to.
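
For reference, that ordering would look roughly like the sketch below;
frag_tbl and death_row are the application's own librte_ip_frag state, and
the mbuf's l2_len/l3_len are assumed to be set:

	/* Sketch: reassemble IP fragments first, then hand the packet to GRO */
	if (rte_ipv4_frag_pkt_is_fragmented(ipv4_hdr)) {
		pkt = rte_ipv4_frag_reassemble_packet(frag_tbl, &death_row,
				pkt, rte_rdtsc(), ipv4_hdr);
		/* rte_ip_frag_free_death_row() should be called periodically */
		if (pkt == NULL)
			return;	/* more fragments are still missing */
	}
	rte_gro_reassemble(pkt, gro_tbl);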
> > > 
> > > > b. traverse the flow array to find a flow which the packet belongs to.
> > > >       If not found, insert a new flow and store the packet into the item
> > > >       array.
> > > You do not store the packet now. "store the packet into the item array" ->
> > > "then go to step c".
> > Thanks, I will update it in next patch.
> > 
> > > > c. locate the first packet of this flow in the item array via
> > > >       start_index. Then traverse all packets of this flow one by one via
> > > >       next_pkt_index. If find one packet to merge with the incoming packet,
> > > >       merge them but without updating checksums. If not, allocate one item
> > > >       in the item array to store the incoming packet and update
> > > >       next_pkt_index value.
> > > > 
> > > > For better performance, we don't update header checksums once two
> > > > packets are merged. The header checksums are updated only when packets
> > > > are flushed from TCP reassembly tables.
> > > Why do we care to recalculate the L4 checksum when flushing? How about just
> > > keeping the wrong cksum and letting the applications handle that?
> > Not all applications want GROed packets with wrong checksums. So I think a
> > more reasonable way is to give applications a flag to tell the GRO API
> > whether it needs to calculate checksums when flushing packets from the GRO
> > table. What do you think?
> 
> There are two main directions: (1) to be sent out from a physical NIC; (2) to
> be sent out from a vhost port. In either case, it is very easy for
> applications to take care of the wrong checksum.

Ditto.
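
For example, for direction (1) the application can simply ask the NIC to fix
up the checksums on TX instead of recomputing them in SW. A minimal sketch,
assuming the TX port supports IP/TCP checksum offload and that ipv4_ihl holds
this packet's IPv4 header length:

	/* Sketch: let HW fix up the checksums of a merged packet on TX */
	pkt->l2_len = sizeof(struct ether_hdr);
	pkt->l3_len = ipv4_ihl;
	pkt->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM;
	ipv4_hdr->hdr_checksum = 0;
	/* HW expects the pseudo-header checksum to be pre-filled */
	tcp_hdr->cksum = rte_ipv4_phdr_cksum(ipv4_hdr, pkt->ol_flags);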

> 
> > 
> > > 
> > > > Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
> > > > ---
> > > >    lib/librte_gro/Makefile      |   1 +
> > > >    lib/librte_gro/rte_gro.c     | 154 +++++++++++--
> > > >    lib/librte_gro/rte_gro.h     |  34 +--
> > > >    lib/librte_gro/rte_gro_tcp.c | 527 +++++++++++++++++++++++++++++++++++++++++++
> > > >    lib/librte_gro/rte_gro_tcp.h | 210 +++++++++++++++++
> > > >    5 files changed, 895 insertions(+), 31 deletions(-)
> > > >    create mode 100644 lib/librte_gro/rte_gro_tcp.c
> > > >    create mode 100644 lib/librte_gro/rte_gro_tcp.h
> > > > 
> > > > diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
> > > > index 9f4063a..3495dfc 100644
> > > > --- a/lib/librte_gro/Makefile
> > > > +++ b/lib/librte_gro/Makefile
> > > > @@ -43,6 +43,7 @@ LIBABIVER := 1
> > > >    # source files
> > > >    SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
> > > > +SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro_tcp.c
> > > Again, if it's just for tcp4, please use the name rte_gro_tcp4.c.
> > TCP4 and TCP6 reassembly functions will be placed in the same file,
> > rte_gro_tcp.c. But currently, we don't support TCP6 GRO.
> 
> That's OK with me. But then we have to have different struct gro_tcp_flow
> definitions for tcp4 and tcp6.

For TCP4 and TCP6, the criteria to merge packets are almost the same, except
for the different IP address lengths. So I think they can share the same
gro_tcp_flow structure.
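
For example, filling the key could look like the sketch below (the is_ipv4
flag and the IPv6 branch are hypothetical, since TCP6 GRO is not implemented
yet):

	/* Sketch: one key layout serves both address families */
	memset(key.ip_src_addr, 0, sizeof(key.ip_src_addr));
	memset(key.ip_dst_addr, 0, sizeof(key.ip_dst_addr));
	if (is_ipv4) {
		key.ip_src_addr[0] = rte_be_to_cpu_32(ipv4_hdr->src_addr);
		key.ip_dst_addr[0] = rte_be_to_cpu_32(ipv4_hdr->dst_addr);
	} else {
		memcpy(key.ip_src_addr, ipv6_hdr->src_addr, 16);
		memcpy(key.ip_dst_addr, ipv6_hdr->dst_addr, 16);
	}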

> 
> > 
> > > >    # install this header file
> > > >    SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
> > > > diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
> > > > index 1bc53a2..2620ef6 100644
> > > > --- a/lib/librte_gro/rte_gro.c
> > > > +++ b/lib/librte_gro/rte_gro.c
> > > > @@ -32,11 +32,17 @@
> > > >    #include <rte_malloc.h>
> > > >    #include <rte_mbuf.h>
> > > > +#include <rte_ethdev.h>
> > > > +#include <rte_ip.h>
> > > > +#include <rte_tcp.h>
> > > >    #include "rte_gro.h"
> > > > +#include "rte_gro_tcp.h"
> > > > -static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB];
> > > > -static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NB];
> > > > +static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB] = {
> > > > +	gro_tcp_tbl_create, NULL};
> > > > +static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NB] = {
> > > > +	gro_tcp_tbl_destroy, NULL};
> > > >    struct rte_gro_tbl *rte_gro_tbl_create(uint16_t socket_id,
> > > >    		uint16_t max_flow_num,
> > > > @@ -94,33 +100,149 @@ void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl)
> > > >    }
> > > >    uint16_t
> > > > -rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
> > > > +rte_gro_reassemble_burst(struct rte_mbuf **pkts,
> > > >    		const uint16_t nb_pkts,
> > > > -		const struct rte_gro_param param __rte_unused)
> > > > +		const struct rte_gro_param param)
> > > >    {
> > > > -	return nb_pkts;
> > > > +	struct ether_hdr *eth_hdr;
> > > > +	struct ipv4_hdr *ipv4_hdr;
> > > > +	uint16_t l3proc_type, i;
> > > I did not catch the variable definition here: l3proc_type -> l3_proto?
> > You can see it in line 158 and line 159.
> 
> It's not about the reference. I mean the variable name is not that clear.

Thanks, I will change the variable name.

> 
> > 
> > > > +	uint16_t nb_after_gro = nb_pkts;
> > > > +	uint16_t flow_num = nb_pkts < param.max_flow_num ?
> > > > +		nb_pkts : param.max_flow_num;
> > > > +	uint32_t item_num = nb_pkts <
> > > > +		flow_num * param.max_item_per_flow ?
> > > > +		nb_pkts :
> > > > +		flow_num * param.max_item_per_flow;
> > > > +
> > > > +	/* allocate a reassembly table for TCP/IPv4 GRO */
> > > > +	uint16_t tcp_flow_num = flow_num <= GRO_TCP_TBL_MAX_FLOW_NUM ?
> > > > +		flow_num : GRO_TCP_TBL_MAX_FLOW_NUM;
> > > > +	uint32_t tcp_item_num = item_num <= GRO_TCP_TBL_MAX_ITEM_NUM ?
> > > > +		item_num : GRO_TCP_TBL_MAX_ITEM_NUM;
> > > The TCPv4-specific logic below should be in rte_gro_tcp4.c; here, as in my
> > > previous comment, we should iterate over all ptypes of the packets to
> > > invoke all supported GRO engines.
> > Sorry, I don't get the point. The table which is created here is used by
> > gro_tcp4_reassemble when it merges packets. If we don't create the table
> > here, what does gro_tcp4_reassemble use to merge packets?
> 
> Too much tcp* code here. If we add another GRO engine, take udp as an
> example, shall we add more udp* code here? Not a good idea to me. In fact,
> gro_tcp4_reassemble is defined in rte_gro_tcp.c instead of this file. For
> better modularity, we'd better put this tcp-related code into
> rte_gro_tcp.c.

If we move this code to rte_gro_tcp.c, we have to use malloc (or rte_malloc)
to allocate the table. That is, in each invocation of rte_gro_reassemble_burst,
there will be one malloc and one free. But frequent mallocs will be a big
overhead. So for better performance, I allocate the table on the stack here.

I will compare the performance difference with and without using rte_malloc.
If the overhead is not so big, I will use gro_tcp_tbl_create to allocate
the table. If not, I prefer to allocate space on the stack. What do you think?
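
A rough way to measure that cost is the sketch below (the iteration count and
the item count of 64 are arbitrary):

	/* Sketch: cycles spent on one rte_malloc/rte_free pair per burst */
	uint64_t start = rte_rdtsc();
	int n;

	for (n = 0; n < 100000; n++) {
		void *p = rte_malloc(NULL,
				sizeof(struct gro_tcp_item) * 64,
				RTE_CACHE_LINE_SIZE);
		rte_free(p);
	}
	printf("cycles per alloc/free pair: %lu\n",
			(unsigned long)((rte_rdtsc() - start) / 100000));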

> 
> 
> > 
> > > > +	struct gro_tcp_tbl tcp_tbl;
> > > > +	struct gro_tcp_flow tcp_flows[tcp_flow_num];
> > > > +	struct gro_tcp_item tcp_items[tcp_item_num];
> > > > +	struct gro_tcp_rule tcp_rule;
> > > > +
> > > > +	struct rte_mbuf *unprocess_pkts[nb_pkts];
> > > > +	uint16_t unprocess_num = 0;
> > > > +	int32_t ret;
> > > > +
> > > > +	if (unlikely(nb_pkts <= 1))
> > > > +		return nb_pkts;
> > > > +
> > > > +	memset(tcp_flows, 0, sizeof(struct gro_tcp_flow) *
> > > > +			tcp_flow_num);
> > > > +	memset(tcp_items, 0, sizeof(struct gro_tcp_item) *
> > > > +			tcp_item_num);
> > > > +	tcp_tbl.flows = tcp_flows;
> > > > +	tcp_tbl.items = tcp_items;
> > > > +	tcp_tbl.flow_num = 0;
> > > > +	tcp_tbl.item_num = 0;
> > > > +	tcp_tbl.max_flow_num = tcp_flow_num;
> > > > +	tcp_tbl.max_item_num = tcp_item_num;
> > > > +	tcp_rule.max_packet_size = param.max_packet_size;
> > > > +
> > > > +	for (i = 0; i < nb_pkts; i++) {
> > > > +		eth_hdr = rte_pktmbuf_mtod(pkts[i], struct ether_hdr *);
> > > > +		l3proc_type = rte_be_to_cpu_16(eth_hdr->ether_type);
> > > > +		if (l3proc_type == ETHER_TYPE_IPv4) {
> > > > +			ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
> > > > +			if (ipv4_hdr->next_proto_id == IPPROTO_TCP &&
> > > > +					(param.desired_gro_types &
> > > > +					 GRO_TCP_IPV4)) {
> > > > +				ret = gro_tcp4_reassemble(pkts[i],
> > > > +						&tcp_tbl,
> > > > +						&tcp_rule);
> > > > +				if (ret > 0)
> > > > +					nb_after_gro--;
> > > > +				else if (ret < 0)
> > > > +					unprocess_pkts[unprocess_num++] =
> > > > +						pkts[i];
> > > > +			} else
> > > > +				unprocess_pkts[unprocess_num++] =
> > > > +					pkts[i];
> > > > +		} else
> > > > +			unprocess_pkts[unprocess_num++] =
> > > > +				pkts[i];
> > > > +	}
> > > > +
> > > > +	if (nb_after_gro < nb_pkts) {
> > > > +		/* update packet headers and re-arrange GROed packets */
> > > > +		if (param.desired_gro_types & GRO_TCP_IPV4) {
> > > > +			gro_tcp4_tbl_cksum_update(&tcp_tbl);
> > > > +			for (i = 0; i < tcp_tbl.item_num; i++)
> > > > +				pkts[i] = tcp_tbl.items[i].pkt;
> > > > +		}
> > > > +		if (unprocess_num > 0) {
> > > > +			memcpy(&pkts[i], unprocess_pkts,
> > > > +					sizeof(struct rte_mbuf *) *
> > > > +					unprocess_num);
> > > > +			i += unprocess_num;
> > > > +		}
> > > > +		if (nb_pkts > i)
> > > > +			memset(&pkts[i], 0,
> > > > +					sizeof(struct rte_mbuf *) *
> > > > +					(nb_pkts - i));
> > > > +	}
> > > > +	return nb_after_gro;
> > > >    }
> > > > -int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
> > > > -		struct rte_gro_tbl *gro_tbl __rte_unused)
> > > > +int rte_gro_reassemble(struct rte_mbuf *pkt,
> > > > +		struct rte_gro_tbl *gro_tbl)
> > > >    {
> > > > +	struct ether_hdr *eth_hdr;
> > > > +	struct ipv4_hdr *ipv4_hdr;
> > > > +	uint16_t l3proc_type;
> > > > +	struct gro_tcp_rule tcp_rule;
> > > > +
> > > > +	if (pkt == NULL)
> > > > +		return -1;
> > > > +	tcp_rule.max_packet_size = gro_tbl->max_packet_size;
> > > > +	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
> > > > +	l3proc_type = rte_be_to_cpu_16(eth_hdr->ether_type);
> > > > +	if (l3proc_type == ETHER_TYPE_IPv4) {
> > > > +		ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
> > > > +		if (ipv4_hdr->next_proto_id == IPPROTO_TCP &&
> > > > +				(gro_tbl->desired_gro_types & GRO_TCP_IPV4)) {
> > > > +			return gro_tcp4_reassemble(pkt,
> > > > +					gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
> > > > +					&tcp_rule);
> > > > +		}
> > > > +	}
> > > >    	return -1;
> > > >    }
> > > > -uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> > > > -		uint64_t desired_gro_types __rte_unused,
> > > > -		uint16_t flush_num __rte_unused,
> > > > -		struct rte_mbuf **out __rte_unused,
> > > > -		const uint16_t max_nb_out __rte_unused)
> > > > +uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl,
> > > > +		uint64_t desired_gro_types,
> > > > +		uint16_t flush_num,
> > > > +		struct rte_mbuf **out,
> > > > +		const uint16_t max_nb_out)
> > > >    {
> > > Ditto.
> > > 
> > > > +	desired_gro_types = desired_gro_types &
> > > > +		gro_tbl->desired_gro_types;
> > > > +	if (desired_gro_types & GRO_TCP_IPV4)
> > > > +		return gro_tcp_tbl_flush(
> > > > +				gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
> > > > +				flush_num,
> > > > +				out,
> > > > +				max_nb_out);
> > > >    	return 0;
> > > >    }
> > > >    uint16_t
> > > > -rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> > > > -		uint64_t desired_gro_types __rte_unused,
> > > > -		struct rte_mbuf **out __rte_unused,
> > > > -		const uint16_t max_nb_out __rte_unused)
> > > > +rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl,
> > > > +		uint64_t desired_gro_types,
> > > > +		struct rte_mbuf **out,
> > > > +		const uint16_t max_nb_out)
> > > >    {
> > > > +	desired_gro_types = desired_gro_types &
> > > > +		gro_tbl->desired_gro_types;
> > > > +	if (desired_gro_types & GRO_TCP_IPV4)
> > > > +		return gro_tcp_tbl_timeout_flush(
> > > > +				gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
> > > > +				gro_tbl->max_timeout_cycles,
> > > > +				out, max_nb_out);
> > > >    	return 0;
> > > >    }
> > > > diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
> > > > index 67bd90d..e26aa5b 100644
> > > > --- a/lib/librte_gro/rte_gro.h
> > > > +++ b/lib/librte_gro/rte_gro.h
> > > > @@ -35,7 +35,11 @@
> > > >    /* maximum number of supported GRO types */
> > > >    #define GRO_TYPE_MAX_NB 64
> > > > -#define GRO_TYPE_SUPPORT_NB 0	/**< current supported GRO num */
> > > > +#define GRO_TYPE_SUPPORT_NB 1	/**< supported GRO types number */
> > > > +
> > > > +/* TCP/IPv4 GRO flag */
> > > > +#define GRO_TCP_IPV4_INDEX 0
> > > > +#define GRO_TCP_IPV4 (1ULL << GRO_TCP_IPV4_INDEX)
> > > >    /**
> > > >     * GRO table structure. DPDK GRO uses GRO table to reassemble
> > > > @@ -139,9 +143,9 @@ void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl);
> > > >     * @return
> > > >     *  the number of packets after GRO is performed.
> > > >     */
> > > > -uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
> > > > -		const uint16_t nb_pkts __rte_unused,
> > > > -		const struct rte_gro_param param __rte_unused);
> > > > +uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts,
> > > > +		const uint16_t nb_pkts,
> > > > +		const struct rte_gro_param param);
> > > >    /**
> > > >     * This is the main reassembly API used in heavyweight mode, which
> > > > @@ -164,8 +168,8 @@ uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
> > > >     *  if the packet is merged successfully, return a positive value. If it
> > > >     *  fails to merge, return zero. If errors happen, return a negative value.
> > > >     */
> > > > -int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
> > > > -		struct rte_gro_tbl *gro_tbl __rte_unused);
> > > > +int rte_gro_reassemble(struct rte_mbuf *pkt,
> > > > +		struct rte_gro_tbl *gro_tbl);
> > > >    /**
> > > >     * This function flushes packets of desired GRO types from their
> > > > @@ -184,11 +188,11 @@ int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
> > > >     * @return
> > > >     *  the number of flushed packets. If no packets are flushed, return 0.
> > > >     */
> > > > -uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> > > > -		uint64_t desired_gro_types __rte_unused,
> > > > -		uint16_t flush_num __rte_unused,
> > > > -		struct rte_mbuf **out __rte_unused,
> > > > -		const uint16_t max_nb_out __rte_unused);
> > > > +uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl,
> > > > +		uint64_t desired_gro_types,
> > > > +		uint16_t flush_num,
> > > > +		struct rte_mbuf **out,
> > > > +		const uint16_t max_nb_out);
> > > >    /**
> > > >     * This function flushes the timeout packets from reassembly tables of
> > > > @@ -206,8 +210,8 @@ uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> > > >     * @return
> > > >     *  the number of flushed packets. If no packets are flushed, return 0.
> > > >     */
> > > > -uint16_t rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> > > > -		uint64_t desired_gro_types __rte_unused,
> > > > -		struct rte_mbuf **out __rte_unused,
> > > > -		const uint16_t max_nb_out __rte_unused);
> > > > +uint16_t rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl,
> > > > +		uint64_t desired_gro_types,
> > > > +		struct rte_mbuf **out,
> > > > +		const uint16_t max_nb_out);
> > > Do you have any cases to test this API? I don't see the following example
> > > use this API. That means we are exposing an API that is never tested. I
> > > don't know if we can add some experimental flag to this API. Let's seek
> > > advice from others.
> > These flush APIs are used in heavyweight mode. But testpmd is not a good case
> > for using heavyweight mode. What do you think about using some unit tests to
> > test them?
> 
> I think the vhost example is a good place to implement heavyweight mode.
> There is a timeout mechanism in the vhost example which can call this flush
> API. Feel free to ping yuanhan and Maxime for suggestions.

Thanks for your suggestion. It's a good idea.

> 
> > 
> > > >    #endif
> > > > diff --git a/lib/librte_gro/rte_gro_tcp.c b/lib/librte_gro/rte_gro_tcp.c
> > > > new file mode 100644
> > > > index 0000000..86743cd
> > > > --- /dev/null
> > > > +++ b/lib/librte_gro/rte_gro_tcp.c
> > > > @@ -0,0 +1,527 @@
> > > > +/*-
> > > > + *   BSD LICENSE
> > > > + *
> > > > + *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
> > > > + *
> > > > + *   Redistribution and use in source and binary forms, with or without
> > > > + *   modification, are permitted provided that the following conditions
> > > > + *   are met:
> > > > + *
> > > > + *     * Redistributions of source code must retain the above copyright
> > > > + *       notice, this list of conditions and the following disclaimer.
> > > > + *     * Redistributions in binary form must reproduce the above copyright
> > > > + *       notice, this list of conditions and the following disclaimer in
> > > > + *       the documentation and/or other materials provided with the
> > > > + *       distribution.
> > > > + *     * Neither the name of Intel Corporation nor the names of its
> > > > + *       contributors may be used to endorse or promote products derived
> > > > + *       from this software without specific prior written permission.
> > > > + *
> > > > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > > > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > > > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > > > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > > > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > > > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > > > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > > > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > > > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > > > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > > > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > > > + */
> > > > +
> > > > +#include <rte_malloc.h>
> > > > +#include <rte_mbuf.h>
> > > > +#include <rte_cycles.h>
> > > > +
> > > > +#include <rte_ethdev.h>
> > > > +#include <rte_ip.h>
> > > > +#include <rte_tcp.h>
> > > > +
> > > > +#include "rte_gro_tcp.h"
> > > > +
> > > > +void *gro_tcp_tbl_create(uint16_t socket_id,
> > > Define it as "static". Similar to other functions.
> > > 
> > > > +		uint16_t max_flow_num,
> > > > +		uint16_t max_item_per_flow)
> > > > +{
> > > > +	size_t size;
> > > > +	uint32_t entries_num;
> > > > +	struct gro_tcp_tbl *tbl;
> > > > +
> > > > +	max_flow_num = max_flow_num > GRO_TCP_TBL_MAX_FLOW_NUM ?
> > > > +		GRO_TCP_TBL_MAX_FLOW_NUM : max_flow_num;
> > > > +
> > > > +	entries_num = max_flow_num * max_item_per_flow;
> > > > +	entries_num = entries_num > GRO_TCP_TBL_MAX_ITEM_NUM ?
> > > > +		GRO_TCP_TBL_MAX_ITEM_NUM : entries_num;
> > > > +
> > > > +	if (entries_num == 0 || max_flow_num == 0)
> > > > +		return NULL;
> > > > +
> > > > +	tbl = (struct gro_tcp_tbl *)rte_zmalloc_socket(
> > > > +			__func__,
> > > > +			sizeof(struct gro_tcp_tbl),
> > > > +			RTE_CACHE_LINE_SIZE,
> > > > +			socket_id);
> > > > +
> > > > +	size = sizeof(struct gro_tcp_item) * entries_num;
> > > > +	tbl->items = (struct gro_tcp_item *)rte_zmalloc_socket(
> > > > +			__func__,
> > > > +			size,
> > > > +			RTE_CACHE_LINE_SIZE,
> > > > +			socket_id);
> > > > +	tbl->max_item_num = entries_num;
> > > > +
> > > > +	size = sizeof(struct gro_tcp_flow) * max_flow_num;
> > > > +	tbl->flows = (struct gro_tcp_flow *)rte_zmalloc_socket(
> > > > +			__func__,
> > > > +			size, RTE_CACHE_LINE_SIZE,
> > > > +			socket_id);
> > > > +	tbl->max_flow_num = max_flow_num;
> > > > +	return tbl;
> > > > +}
> > > > +
> > > > +void gro_tcp_tbl_destroy(void *tbl)
> > > > +{
> > > > +	struct gro_tcp_tbl *tcp_tbl = (struct gro_tcp_tbl *)tbl;
> > > > +
> > > > +	if (tcp_tbl) {
> > > > +		if (tcp_tbl->items)
> > > > +			rte_free(tcp_tbl->items);
> > > > +		if (tcp_tbl->flows)
> > > > +			rte_free(tcp_tbl->flows);
> > > > +		rte_free(tcp_tbl);
> > > > +	}
> > > > +}
> > > > +
> > > > +/* update TCP header and IPv4 header checksum */
> > > > +static void
> > > > +gro_tcp4_cksum_update(struct rte_mbuf *pkt)
> > > > +{
> > > > +	uint32_t len, offset, cksum;
> > > > +	struct ether_hdr *eth_hdr;
> > > > +	struct ipv4_hdr *ipv4_hdr;
> > > > +	struct tcp_hdr *tcp_hdr;
> > > > +	uint16_t ipv4_ihl, cksum_pld;
> > > > +
> > > > +	if (pkt == NULL)
> > > > +		return;
> > > > +
> > > > +	len = pkt->pkt_len;
> > > > +	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
> > > > +	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
> > > > +	ipv4_ihl = IPv4_HDR_LEN(ipv4_hdr);
> > > > +	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl);
> > > > +
> > > > +	offset = sizeof(struct ether_hdr) + ipv4_ihl;
> > > > +	len -= offset;
> > > > +
> > > > +	/* TCP cksum without IP pseudo header */
> > > > +	ipv4_hdr->hdr_checksum = 0;
> > > > +	tcp_hdr->cksum = 0;
> > > > +	rte_raw_cksum_mbuf(pkt, offset, len, &cksum_pld);
> > > > +
> > > > +	/* IP pseudo header cksum */
> > > > +	cksum = cksum_pld;
> > > > +	cksum += rte_ipv4_phdr_cksum(ipv4_hdr, 0);
> > > > +
> > > > +	/* combine TCP checksum and IP pseudo header checksum */
> > > > +	cksum = ((cksum & 0xffff0000) >> 16) + (cksum & 0xffff);
> > > > +	cksum = (~cksum) & 0xffff;
> > > > +	cksum = (cksum == 0) ? 0xffff : cksum;
> > > > +	tcp_hdr->cksum = cksum;
> > > > +
> > > > +	/* update IP header cksum */
> > > > +	ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);
> > > > +}
> > > > +
> > > > +void gro_tcp4_tbl_cksum_update(struct gro_tcp_tbl *tbl)
> > > > +{
> > > > +	uint32_t i;
> > > > +	uint32_t item_num = tbl->item_num;
> > > > +
> > > > +	for (i = 0; i < tbl->max_item_num; i++) {
> > > > +		if (tbl->items[i].is_valid) {
> > > > +			item_num--;
> > > > +			if (tbl->items[i].is_groed)
> > > > +				gro_tcp4_cksum_update(tbl->items[i].pkt);
> > > > +		}
> > > > +		if (unlikely(item_num == 0))
> > > > +			break;
> > > > +	}
> > > > +}
> > > > +
> > > > +/**
> > > > + * merge two TCP/IPv4 packets without updating header checksums.
> > > > + */
> > > > +static int
> > > > +merge_two_tcp4_packets(struct rte_mbuf *pkt_src,
> > > > +		struct rte_mbuf *pkt,
> > > > +		struct gro_tcp_rule *rule)
> > > > +{
> > > > +	struct ipv4_hdr *ipv4_hdr1, *ipv4_hdr2;
> > > > +	struct tcp_hdr *tcp_hdr1;
> > > > +	uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1;
> > > > +	struct rte_mbuf *tail;
> > > > +
> > > > +	/* parse the given packet */
> > > > +	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
> > > > +				struct ether_hdr *) + 1);
> > > > +	ipv4_ihl1 = IPv4_HDR_LEN(ipv4_hdr1);
> > > > +	tcp_hdr1 = (struct tcp_hdr *)((char *)ipv4_hdr1 + ipv4_ihl1);
> > > > +	tcp_hl1 = TCP_HDR_LEN(tcp_hdr1);
> > > > +	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) - ipv4_ihl1
> > > > +		- tcp_hl1;
> > > > +
> > > > +	/* parse the original packet */
> > > > +	ipv4_hdr2 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt_src,
> > > > +				struct ether_hdr *) + 1);
> > > > +
> > > > +	/* check reassembly rules */
> > > > +	if (pkt_src->pkt_len + tcp_dl1 > rule->max_packet_size)
> > > > +		return -1;
> > > > +
> > > > +	/* remove the headers of the incoming packet */
> > > > +	rte_pktmbuf_adj(pkt, sizeof(struct ether_hdr) +
> > > > +			ipv4_ihl1 + tcp_hl1);
> > > > +
> > > > +	/* chain the two packets together */
> > > > +	tail = rte_pktmbuf_lastseg(pkt_src);
> > > > +	tail->next = pkt;
> > > > +
> > > > +	/* update IP header */
> > > > +	ipv4_hdr2->total_length = rte_cpu_to_be_16(
> > > > +			rte_be_to_cpu_16(
> > > > +				ipv4_hdr2->total_length)
> > > > +			+ tcp_dl1);
> > > > +
> > > > +	/* update mbuf metadata for the merged packet */
> > > > +	pkt_src->nb_segs++;
> > > > +	pkt_src->pkt_len += pkt->pkt_len;
> > > > +	return 1;
> > > > +}
> > > > +
> > > > +static int
> > > > +check_seq_option(struct rte_mbuf *pkt,
> > > > +		struct tcp_hdr *tcp_hdr,
> > > > +		uint16_t tcp_hl)
> > > > +{
> > > > +	struct ipv4_hdr *ipv4_hdr1;
> > > > +	struct tcp_hdr *tcp_hdr1;
> > > > +	uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1;
> > > > +	uint32_t sent_seq1, sent_seq;
> > > > +	int ret = -1;
> > > > +
> > > > +	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
> > > > +				struct ether_hdr *) + 1);
> > > > +	ipv4_ihl1 = IPv4_HDR_LEN(ipv4_hdr1);
> > > > +	tcp_hdr1 = (struct tcp_hdr *)((char *)ipv4_hdr1 + ipv4_ihl1);
> > > > +	tcp_hl1 = TCP_HDR_LEN(tcp_hdr1);
> > > > +	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) - ipv4_ihl1
> > > > +		- tcp_hl1;
> > > > +	sent_seq1 = rte_be_to_cpu_32(tcp_hdr1->sent_seq) + tcp_dl1;
> > > > +	sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq);
> > > > +
> > > > +	/* check if the incoming packet directly follows the stored one */
> > > > +	if ((sent_seq ^ sent_seq1) == 0) {
> > > > +		/* check if the TCP option fields are equal */
> > > > +		if (tcp_hl1 > sizeof(struct tcp_hdr)) {
> > > > +			if ((tcp_hl1 != tcp_hl) ||
> > > > +					(memcmp(tcp_hdr1 + 1,
> > > > +							tcp_hdr + 1,
> > > > +							tcp_hl - sizeof
> > > > +							(struct tcp_hdr))
> > > > +					 == 0))
> > > > +				ret = 1;
> > > > +		}
> > > > +	}
> > > > +	return ret;
> > > > +}
> > > > +
> > > > +static uint32_t
> > > > +find_an_empty_item(struct gro_tcp_tbl *tbl)
> > > > +{
> > > > +	uint32_t i;
> > > > +
> > > > +	for (i = 0; i < tbl->max_item_num; i++)
> > > > +		if (tbl->items[i].is_valid == 0)
> > > > +			return i;
> > > > +	return INVALID_ITEM_INDEX;
> > > > +}
> > > > +
> > > > +static uint16_t
> > > > +find_an_empty_flow(struct gro_tcp_tbl *tbl)
> > > > +{
> > > > +	uint16_t i;
> > > > +
> > > > +	for (i = 0; i < tbl->max_flow_num; i++)
> > > > +		if (tbl->flows[i].is_valid == 0)
> > > > +			return i;
> > > > +	return INVALID_FLOW_INDEX;
> > > > +}
> > > > +
> > > > +int32_t
> > > > +gro_tcp4_reassemble(struct rte_mbuf *pkt,
> > > > +		struct gro_tcp_tbl *tbl,
> > > > +		struct gro_tcp_rule *rule)
> > > > +{
> > > > +	struct ether_hdr *eth_hdr;
> > > > +	struct ipv4_hdr *ipv4_hdr;
> > > > +	struct tcp_hdr *tcp_hdr;
> > > > +	uint16_t ipv4_ihl, tcp_hl, tcp_dl, tcp_cksum, ip_cksum;
> > > > +
> > > > +	struct gro_tcp_flow_key key;
> > > > +	uint64_t ol_flags;
> > > > +	uint32_t cur_idx, prev_idx, item_idx;
> > > > +	uint16_t i, flow_idx;
> > > > +
> > > > +	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
> > > > +	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
> > > > +	ipv4_ihl = IPv4_HDR_LEN(ipv4_hdr);
> > > > +
> > > > +	/* 1. check if the packet should be processed */
> > > > +	if (ipv4_ihl < sizeof(struct ipv4_hdr))
> > > > +		goto fail;
> > > > +	if (ipv4_hdr->next_proto_id != IPPROTO_TCP)
> > > > +		goto fail;
> > > > +	if ((ipv4_hdr->fragment_offset &
> > > > +				rte_cpu_to_be_16(IPV4_HDR_DF_MASK))
> > > > +			== 0)
> > > > +		goto fail;
> > > > +
> > > > +	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl);
> > > > +	tcp_hl = TCP_HDR_LEN(tcp_hdr);
> > > > +	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - ipv4_ihl
> > > > +		- tcp_hl;
> > > > +	if (tcp_dl == 0)
> > > > +		goto fail;
> > > > +
> > > > +	/**
> > > > +	 * 2. if HW rx checksum offload isn't enabled, recalculate the
> > > > +	 * checksum in SW. Then, check if the checksum is correct
> > > > +	 */
> > > > +	ol_flags = pkt->ol_flags;
> > > > +	if ((ol_flags & PKT_RX_IP_CKSUM_MASK) !=
> > > > +			PKT_RX_IP_CKSUM_UNKNOWN) {
> > > > +		if (ol_flags == PKT_RX_IP_CKSUM_BAD)
> > > > +			goto fail;
> > > > +	} else {
> > > > +		ip_cksum = ipv4_hdr->hdr_checksum;
> > > > +		ipv4_hdr->hdr_checksum = 0;
> > > > +		ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);
> > > > +		if (ipv4_hdr->hdr_checksum ^ ip_cksum)
> > > > +			goto fail;
> > > > +	}
> > > > +
> > > > +	if ((ol_flags & PKT_RX_L4_CKSUM_MASK) !=
> > > > +			PKT_RX_L4_CKSUM_UNKNOWN) {
> > > > +		if (ol_flags == PKT_RX_L4_CKSUM_BAD)
> > > > +			goto fail;
> > > > +	} else {
> > > > +		tcp_cksum = tcp_hdr->cksum;
> > > > +		tcp_hdr->cksum = 0;
> > > > +		tcp_hdr->cksum = rte_ipv4_udptcp_cksum
> > > > +			(ipv4_hdr, tcp_hdr);
> > > > +		if (tcp_hdr->cksum ^ tcp_cksum)
> > > > +			goto fail;
> > > > +	}
> > > > +
> > > > +	/**
> > > > +	 * 3. search for a flow and traverse all packets in the flow
> > > > +	 * to find one to merge with the given packet.
> > > > +	 */
> > > > +	key.eth_saddr = eth_hdr->s_addr;
> > > > +	key.eth_daddr = eth_hdr->d_addr;
> > > > +	key.ip_src_addr[0] = rte_be_to_cpu_32(ipv4_hdr->src_addr);
> > > > +	key.ip_dst_addr[0] = rte_be_to_cpu_32(ipv4_hdr->dst_addr);
> > > > +	key.src_port = rte_be_to_cpu_16(tcp_hdr->src_port);
> > > > +	key.dst_port = rte_be_to_cpu_16(tcp_hdr->dst_port);
> > > > +	key.recv_ack = rte_be_to_cpu_32(tcp_hdr->recv_ack);
> > > > +	key.tcp_flags = tcp_hdr->tcp_flags;
> > > > +
> > > > +	for (i = 0; i < tbl->max_flow_num; i++) {
> > > > +		/* search all packets in a valid flow. */
> > > > +		if (tbl->flows[i].is_valid &&
> > > > +				(memcmp(&(tbl->flows[i].key), &key,
> > > > +						sizeof(struct gro_tcp_flow_key))
> > > > +				 == 0)) {
> > > > +			cur_idx = tbl->flows[i].start_index;
> > > > +			prev_idx = cur_idx;
> > > > +			while (cur_idx != INVALID_ITEM_INDEX) {
> > > > +				if (check_seq_option(tbl->items[cur_idx].pkt,
> > > > +							tcp_hdr,
> > > > +							tcp_hl) > 0) {
> > > > +					if (merge_two_tcp4_packets(
> > > > +								tbl->items[cur_idx].pkt,
> > > > +								pkt,
> > > > +								rule) > 0) {
> > > > +						/* successfully merge two packets */
> > > > +						tbl->items[cur_idx].is_groed = 1;
> > > > +						return 1;
> > > > +					}
> > > > +					/**
> > > > +					 * failed to merge the two packets,
> > > > +					 * since doing so would break the
> > > > +					 * rules; add the packet to the flow.
> > > > +					 */
> > > > +					goto insert_to_existed_flow;
> > > > +				} else {
> > > > +					prev_idx = cur_idx;
> > > > +					cur_idx = tbl->items[cur_idx].next_pkt_idx;
> > > > +				}
> > > > +			}
> > > > +			/**
> > > > +			 * failed to merge the given packet with any packet
> > > > +			 * in this existing flow; add it to the flow.
> > > > +			 */
> > > > +insert_to_existed_flow:
> > > > +			item_idx = find_an_empty_item(tbl);
> > > > +			/* the item number is beyond the maximum value */
> > > > +			if (item_idx == INVALID_ITEM_INDEX)
> > > > +				return -1;
> > > > +			tbl->items[prev_idx].next_pkt_idx = item_idx;
> > > > +			tbl->items[item_idx].pkt = pkt;
> > > > +			tbl->items[item_idx].is_groed = 0;
> > > > +			tbl->items[item_idx].next_pkt_idx = INVALID_ITEM_INDEX;
> > > > +			tbl->items[item_idx].is_valid = 1;
> > > > +			tbl->items[item_idx].start_time = rte_rdtsc();
> > > > +			tbl->item_num++;
> > > > +			return 0;
> > > > +		}
> > > > +	}
> > > > +
> > > > +	/**
> > > > +	 * the given packet belongs to a new flow, so merging failed.
> > > > +	 * Therefore, insert a new flow.
> > > > +	 */
> > > > +	item_idx = find_an_empty_item(tbl);
> > > > +	flow_idx = find_an_empty_flow(tbl);
> > > > +	/**
> > > > +	 * if the flow or item number is beyond the maximum value,
> > > > +	 * the input packet won't be processed.
> > > > +	 */
> > > > +	if (item_idx == INVALID_ITEM_INDEX ||
> > > > +			flow_idx == INVALID_FLOW_INDEX)
> > > > +		return -1;
> > > > +	tbl->items[item_idx].pkt = pkt;
> > > > +	tbl->items[item_idx].next_pkt_idx = INVALID_ITEM_INDEX;
> > > > +	tbl->items[item_idx].is_groed = 0;
> > > > +	tbl->items[item_idx].is_valid = 1;
> > > > +	tbl->items[item_idx].start_time = rte_rdtsc();
> > > > +	tbl->item_num++;
> > > > +
> > > > +	memcpy(&(tbl->flows[flow_idx].key),
> > > > +			&key, sizeof(struct gro_tcp_flow_key));
> > > > +	tbl->flows[flow_idx].start_index = item_idx;
> > > > +	tbl->flows[flow_idx].is_valid = 1;
> > > > +	tbl->flow_num++;
> > > > +
> > > > +	return 0;
> > > > +fail:
> > > > +	return -1;
> > > > +}
> > > > +
> > > > +uint16_t gro_tcp_tbl_flush(struct gro_tcp_tbl *tbl,
> > > > +		uint16_t flush_num,
> > > > +		struct rte_mbuf **out,
> > > > +		const uint16_t nb_out)
> > > > +{
> > > > +	uint16_t num, k;
> > > > +	uint16_t i;
> > > > +	uint32_t j;
> > > > +
> > > > +	k = 0;
> > > > +	num = tbl->item_num > flush_num ? flush_num : tbl->item_num;
> > > > +	num = num > nb_out ? nb_out : num;
> > > > +	if (unlikely(num == 0))
> > > > +		return 0;
> > > > +
> > > > +	for (i = 0; i < tbl->max_flow_num; i++) {
> > > > +		if (tbl->flows[i].is_valid) {
> > > > +			j = tbl->flows[i].start_index;
> > > > +			while (j != INVALID_ITEM_INDEX) {
> > > > +				/* update checksum for GROed packet */
> > > > +				if (tbl->items[j].is_groed)
> > > > +					gro_tcp4_cksum_update(tbl->items[j].pkt);
> > > > +
> > > > +				out[k++] = tbl->items[j].pkt;
> > > > +				tbl->items[j].is_valid = 0;
> > > > +				tbl->item_num--;
> > > > +				j = tbl->items[j].next_pkt_idx;
> > > > +
> > > > +				if (k == num) {
> > > > +					/* delete the flow */
> > > > +					if (j == INVALID_ITEM_INDEX) {
> > > > +						tbl->flows[i].is_valid = 0;
> > > > +						tbl->flow_num--;
> > > > +					} else
> > > > +						/* update flow information */
> > > > +						tbl->flows[i].start_index = j;
> > > > +					goto end;
> > > > +				}
> > > > +			}
> > > > +			/* delete the flow, as all of its packets are flushed */
> > > > +			tbl->flows[i].is_valid = 0;
> > > > +			tbl->flow_num--;
> > > > +		}
> > > > +		if (tbl->flow_num == 0)
> > > > +			goto end;
> > > > +	}
> > > > +end:
> > > > +	return num;
> > > > +}
> > > > +
> > > > +uint16_t
> > > > +gro_tcp_tbl_timeout_flush(struct gro_tcp_tbl *tbl,
> > > > +		uint64_t timeout_cycles,
> > > > +		struct rte_mbuf **out,
> > > > +		const uint16_t nb_out)
> > > > +{
> > > > +	uint16_t k;
> > > > +	uint16_t i;
> > > > +	uint32_t j;
> > > > +	uint64_t current_time;
> > > > +
> > > > +	if (nb_out == 0)
> > > > +		return 0;
> > > > +	k = 0;
> > > > +	current_time = rte_rdtsc();
> > > > +
> > > > +	for (i = 0; i < tbl->max_flow_num; i++) {
> > > > +		if (tbl->flows[i].is_valid) {
> > > > +			j = tbl->flows[i].start_index;
> > > > +			while (j != INVALID_ITEM_INDEX) {
> > > > +				if (current_time - tbl->items[j].start_time >=
> > > > +						timeout_cycles) {
> > > > +					/* update checksum for GROed packet */
> > > > +					if (tbl->items[j].is_groed)
> > > > +						gro_tcp4_cksum_update(tbl->items[j].pkt);
> > > > +
> > > > +					out[k++] = tbl->items[j].pkt;
> > > > +					tbl->items[j].is_valid = 0;
> > > > +					tbl->item_num--;
> > > > +					j = tbl->items[j].next_pkt_idx;
> > > > +
> > > > +					if (k == nb_out &&
> > > > +							j == INVALID_ITEM_INDEX) {
> > > > +						/* delete the flow */
> > > > +						tbl->flows[i].is_valid = 0;
> > > > +						tbl->flow_num--;
> > > > +						goto end;
> > > > +					} else if (k == nb_out &&
> > > > +							j != INVALID_ITEM_INDEX) {
> > > > +						tbl->flows[i].start_index = j;
> > > > +						goto end;
> > > > +					}
> > > > +				}
> > > > +			}
> > > > +			/* delete the flow, as all of its packets are flushed */
> > > > +			tbl->flows[i].is_valid = 0;
> > > > +			tbl->flow_num--;
> > > > +		}
> > > > +		if (tbl->flow_num == 0)
> > > > +			goto end;
> > > > +	}
> > > > +end:
> > > > +	return k;
> > > > +}
> > > > diff --git a/lib/librte_gro/rte_gro_tcp.h b/lib/librte_gro/rte_gro_tcp.h
> > > > new file mode 100644
> > > > index 0000000..551efc4
> > > > --- /dev/null
> > > > +++ b/lib/librte_gro/rte_gro_tcp.h
> > > > @@ -0,0 +1,210 @@
> > > > +/*-
> > > > + *   BSD LICENSE
> > > > + *
> > > > + *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
> > > > + *
> > > > + *   Redistribution and use in source and binary forms, with or without
> > > > + *   modification, are permitted provided that the following conditions
> > > > + *   are met:
> > > > + *
> > > > + *     * Redistributions of source code must retain the above copyright
> > > > + *       notice, this list of conditions and the following disclaimer.
> > > > + *     * Redistributions in binary form must reproduce the above copyright
> > > > + *       notice, this list of conditions and the following disclaimer in
> > > > + *       the documentation and/or other materials provided with the
> > > > + *       distribution.
> > > > + *     * Neither the name of Intel Corporation nor the names of its
> > > > + *       contributors may be used to endorse or promote products derived
> > > > + *       from this software without specific prior written permission.
> > > > + *
> > > > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > > > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > > > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > > > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > > > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > > > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > > > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > > > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > > > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > > > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > > > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > > > + */
> > > > +
> > > > +#ifndef _RTE_GRO_TCP_H_
> > > > +#define _RTE_GRO_TCP_H_
> > > > +
> > > > +#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
> > > > +#define TCP_HDR_LEN(tcph) \
> > > > +	((tcph->data_off >> 4) * 4)
> > > > +#define IPv4_HDR_LEN(iph) \
> > > > +	((iph->version_ihl & 0x0f) * 4)
> > > > +#else
> > > > +#define TCP_DATAOFF_MASK 0x0f
> > > > +#define TCP_HDR_LEN(tcph) \
> > > > +	((tcph->data_off & TCP_DATAOFF_MASK) * 4)
> > > > +#define IPv4_HDR_LEN(iph) \
> > > > +	((iph->version_ihl >> 4) * 4)
> > > > +#endif
> > > > +
> > > > +#define IPV4_HDR_DF_SHIFT 14
> > > > +#define IPV4_HDR_DF_MASK (1 << IPV4_HDR_DF_SHIFT)
> > > > +
> > > > +#define INVALID_FLOW_INDEX 0xffffU
> > > > +#define INVALID_ITEM_INDEX 0xffffffffUL
> > > > +
> > > > +#define GRO_TCP_TBL_MAX_FLOW_NUM (UINT16_MAX - 1)
> > > > +#define GRO_TCP_TBL_MAX_ITEM_NUM (UINT32_MAX - 1)
> > > > +
> > > > +/* criteria for merging packets */
> > > > +struct gro_tcp_flow_key {
> > > > +	struct ether_addr eth_saddr;
> > > > +	struct ether_addr eth_daddr;
> > > > +	uint32_t ip_src_addr[4];	/**< IPv4 uses the first 8B */
> > > > +	uint32_t ip_dst_addr[4];
> > > > +
> > > > +	uint32_t recv_ack;	/**< acknowledgment sequence number. */
> > > > +	uint16_t src_port;
> > > > +	uint16_t dst_port;
> > > > +	uint8_t tcp_flags;	/**< TCP flags. */
> > > > +};
> > > > +
> > > > +struct gro_tcp_flow {
> > > > +	struct gro_tcp_flow_key key;
> > > > +	uint32_t start_index;	/**< the first packet index of the flow */
> > > > +	uint8_t is_valid;
> > > > +};
> > > > +
> > > > +struct gro_tcp_item {
> > > > +	struct rte_mbuf *pkt;	/**< packet address. */
> > > > +	/* the time when the packet is added into the table */
> > > > +	uint64_t start_time;
> > > > +	uint32_t next_pkt_idx;	/**< next packet index. */
> > > > +	/* flag to indicate if the packet is GROed */
> > > > +	uint8_t is_groed;
> > > > +	uint8_t is_valid;	/**< flag indicates if the item is valid */
> > > > +};
> > > > +
> > > > +/**
> > > > + * TCP reassembly table. Both TCP/IPv4 and TCP/IPv6 use the same table
> > > > + * structure.
> > > > + */
> > > > +struct gro_tcp_tbl {
> > > > +	struct gro_tcp_item *items;	/**< item array */
> > > > +	struct gro_tcp_flow *flows;	/**< flow array */
> > > > +	uint32_t item_num;	/**< current item number */
> > > > +	uint16_t flow_num;	/**< current flow num */
> > > > +	uint32_t max_item_num;	/**< item array size */
> > > > +	uint16_t max_flow_num;	/**< flow array size */
> > > > +};
> > > > +
> > > > +/* rules to reassemble TCP packets, which are decided by applications */
> > > > +struct gro_tcp_rule {
> > > > +	/* the maximum packet length after merging */
> > > > +	uint32_t max_packet_size;
> > > > +};
> > > Are there any other rules? If not, I prefer to use max_packet_size directly.
> > If we agree to use a flag to indicate whether to check checksums, this
> > structure should be used to keep that flag.
> > 
> > > > +
> > > > +/**
> > > > + * This function updates TCP and IPv4 header checksums
> > > > + * for merged packets in the TCP reassembly table.
> > > > + */
> > > > +void gro_tcp4_tbl_cksum_update(struct gro_tcp_tbl *tbl);
> > > > +
> > > > +/**
> > > > + * This function creates a TCP reassembly table.
> > > > + *
> > > > + * @param socket_id
> > > > + *  index of the NUMA socket which the Ethernet port connects to.
> > > > + * @param max_flow_num
> > > > + *  the maximum number of flows in the TCP GRO table
> > > > + * @param max_item_per_flow
> > > > + *  the maximum packet number per flow.
> > > > + * @return
> > > > + *  on success, return a pointer to the created TCP GRO table.
> > > > + *  Otherwise, return NULL.
> > > > + */
> > > > +void *gro_tcp_tbl_create(uint16_t socket_id,
> > > > +		uint16_t max_flow_num,
> > > > +		uint16_t max_item_per_flow);
> > > > +
> > > > +/**
> > > > + * This function destroys a TCP reassembly table.
> > > > + * @param tbl
> > > > + *  a pointer that points to the TCP reassembly table.
> > > > + */
> > > > +void gro_tcp_tbl_destroy(void *tbl);
> > > > +
> > > > +/**
> > > > + * This function searches for a packet in the TCP reassembly table to
> > > > + * merge with the input one. Merging two packets means chaining them
> > > > + * together and updating the packet headers. Note that this function won't
> > > > + * re-calculate IPv4 and TCP checksums.
> > > > + *
> > > > + * If the packet has no data, has wrong checksums, or is
> > > > + * fragmented etc., an error occurs and gro_tcp4_reassemble returns
> > > > + * immediately. If no errors happen, the packet is either merged, or
> > > > + * inserted into the reassembly table.
> > > > + *
> > > > + * If applications want to get packets in the reassembly table, they
> > > > + * need to manually flush the packets.
> > > > + *
> > > > + * @param pkt
> > > > + *  packet to reassemble.
> > > > + * @param tbl
> > > > + *  a pointer that points to a TCP reassembly table.
> > > > + * @param rule
> > > > + *  TCP reassembly criteria defined by applications.
> > > > + * @return
> > > > + *  if the input packet is merged successfully, return a positive
> > > > + *  value. If the packet isn't merged with any packet in the TCP
> > > > + *  reassembly table, return 0 and insert it into the table. If errors
> > > > + *  happen, return a negative value and the packet won't be inserted
> > > > + *  into the reassembly table.
> > > > + */
> > > > +int32_t
> > > > +gro_tcp4_reassemble(struct rte_mbuf *pkt,
> > > > +		struct gro_tcp_tbl *tbl,
> > > > +		struct gro_tcp_rule *rule);
> > > > +
> > > > +/**
> > > > + * This function flushes the packets in a TCP reassembly table to
> > > > + * applications. Before returning the packets, it will update TCP and
> > > > + * IPv4 header checksums.
> > > > + *
> > > > + * @param tbl
> > > > + *  a pointer that points to a TCP GRO table.
> > > > + * @param flush_num
> > > > + *  the number of packets that applications want to flush.
> > > > + * @param out
> > > > + *  pointer array which is used to keep flushed packets.
> > > > + * @param nb_out
> > > > + *  the maximum element number of out.
> > > > + * @return
> > > > + *  the number of packets that are finally flushed.
> > > > + */
> > > > +uint16_t
> > > > +gro_tcp_tbl_flush(struct gro_tcp_tbl *tbl,
> > > > +		uint16_t flush_num,
> > > > +		struct rte_mbuf **out,
> > > > +		const uint16_t nb_out);
> > > > +
> > > > +/**
> > > > + * This function flushes timeout packets in a TCP reassembly table to
> > > > + * applications. Before returning the packets, it updates TCP and IPv4
> > > > + * header checksums.
> > > > + *
> > > > + * @param tbl
> > > > + *  a pointer that points to a TCP GRO table.
> > > > + * @param timeout_cycles
> > > > + *  the maximum time that packets can stay in the table.
> > > > + * @param out
> > > > + *  pointer array which is used to keep flushed packets.
> > > > + * @param nb_out
> > > > + *  the maximum element number of out.
> > > > + * @return
> > > > + *  It returns the number of packets that are finally flushed.
> > > > + */
> > > > +uint16_t
> > > > +gro_tcp_tbl_timeout_flush(struct gro_tcp_tbl *tbl,
> > > > +		uint64_t timeout_cycles,
> > > > +		struct rte_mbuf **out,
> > > > +		const uint16_t nb_out);
> > > > +#endif

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v5 2/3] lib/gro: add TCP/IPv4 GRO support
  2017-06-19 15:43           ` Tan, Jianfeng
  2017-06-20  3:22             ` Jiayu Hu
@ 2017-06-22  8:18             ` Jiayu Hu
  2017-06-22  9:35               ` Tan, Jianfeng
  1 sibling, 1 reply; 141+ messages in thread
From: Jiayu Hu @ 2017-06-22  8:18 UTC (permalink / raw)
  To: Tan, Jianfeng
  Cc: dev, konstantin.ananyev, yliu, keith.wiles, tiwei.bie, lei.a.yao

On Mon, Jun 19, 2017 at 11:43:20PM +0800, Tan, Jianfeng wrote:
> 
> 
> On 6/18/2017 3:21 PM, Jiayu Hu wrote:
> > 
> > Each element in the flow array records the information of one flow,
> > which includes two parts:
> > - key: the criteria of the same flow. If packets have the same key
> >      value, they belong to the same flow.
> > - start_index: the index of the first incoming packet of this flow in
> >      the item array. With start_index, we can locate the first incoming
> >      packet of this flow.
> > Each element in the item array records one packet's information. It mainly
> > includes two parts:
> > - pkt: packet address
> > - next_pkt_index: index of the next packet of the same flow in the item
> >      array. All packets of the same flow are chained by next_pkt_index.
> >      With next_pkt_index, we can locate all packets of the same flow
> >      one by one.
> > 
> > To process an incoming packet, we need three steps:
> > a. check if the packet should be processed. Packets with the following
> >      properties won't be processed:
> > 	- packets without data;
> > 	- packets with wrong checksums;
> 
> Why do we care to check this kind of error? Can we just suppose the
> applications have already dropped the packets with wrong cksum?
> 
> > 	- fragmented packets.
> 
> IP fragmented? I don't think we need to check it here either. It's the
> application's responsibility to call librte_ip_frag first to reassemble
> IP-fragmented packets, and then call this gro library to merge TCP packets.
> This procedure should be shown in an example for other users to refer to.

If we leave this check to applications, they have to call librte_ip_frag
first and then call the GRO library. If they call the GRO library first,
an error happens; but if GRO checks whether the packet is fragmented, no
error happens. IMO, it's more flexible. Besides, it's very simple for the
GRO library to check whether a packet is fragmented: checking whether the
DF bit is set is enough. So I still think the GRO library needs to check it.
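
For reference, the check is the same one-line test already used in the
patch above (a sketch reusing the IPV4_HDR_DF_MASK definition from
rte_gro_tcp.h; DF not set means the packet may be an IP fragment):

	if ((ipv4_hdr->fragment_offset &
			rte_cpu_to_be_16(IPV4_HDR_DF_MASK)) == 0)
		return -1;	/* leave the packet to the application */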

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v5 2/3] lib/gro: add TCP/IPv4 GRO support
  2017-06-22  8:18             ` Jiayu Hu
@ 2017-06-22  9:35               ` Tan, Jianfeng
  2017-06-22 13:55                 ` Jiayu Hu
  0 siblings, 1 reply; 141+ messages in thread
From: Tan, Jianfeng @ 2017-06-22  9:35 UTC (permalink / raw)
  To: Hu, Jiayu
  Cc: dev, Ananyev, Konstantin, yliu, Wiles, Keith, Bie, Tiwei, Yao, Lei A

Hi Jiayu,

> -----Original Message-----
> From: Hu, Jiayu
> Sent: Thursday, June 22, 2017 4:18 PM
> To: Tan, Jianfeng
> Cc: dev@dpdk.org; Ananyev, Konstantin; yliu@fridaylinux.org; Wiles, Keith;
> Bie, Tiwei; Yao, Lei A
> Subject: Re: [PATCH v5 2/3] lib/gro: add TCP/IPv4 GRO support
> 
> On Mon, Jun 19, 2017 at 11:43:20PM +0800, Tan, Jianfeng wrote:
> >
> >
> > On 6/18/2017 3:21 PM, Jiayu Hu wrote:
> > >
> > > Each element in the flow array records the information of one flow,
> > > which includes two parts:
> > > - key: the criteria of the same flow. If packets have the same key
> > >      value, they belong to the same flow.
> > > - start_index: the index of the first incoming packet of this flow in
> > >      the item array. With start_index, we can locate the first incoming
> > >      packet of this flow.
> > > Each element in the item array records one packet's information. It mainly
> > > includes two parts:
> > > - pkt: packet address
> > > - next_pkt_index: index of the next packet of the same flow in the item
> > >      array. All packets of the same flow are chained by next_pkt_index.
> > >      With next_pkt_index, we can locate all packets of the same flow
> > >      one by one.
> > >
> > > To process an incoming packet, we need three steps:
> > > a. check if the packet should be processed. Packets with the following
> > >      properties won't be processed:
> > > 	- packets without data;
> > > 	- packets with wrong checksums;
> >
> > Why do we care to check this kind of error? Can we just suppose the
> > applications have already dropped the packets with wrong cksum?
> >
> > > 	- fragmented packets.
> >
> > IP fragmented? I don't think we need to check it here either. It's the
> > application's responsibility to call librte_ip_frag first to reassemble
> > IP-fragmented packets, and then call this gro library to merge TCP packets.
> > This procedure should be shown in an example for other users to refer to.
> 
> If we leave this check to applications, they have to call librte_ip_frag
> first and then call the GRO library. If they call the GRO library first,
> an error happens; but if GRO checks whether the packet is fragmented, no
> error happens. IMO, it's more flexible. Besides, it's very simple for the
> GRO library to check whether a packet is fragmented: checking whether the
> DF bit is set is enough. So I still think the GRO library needs to check it.

I would expect IP fragment reassembly to also be a GRO engine (there might be two, for IPv4 and IPv6).
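
In the GRO framework patches, such an engine would plug into the
per-type function tables in rte_gro.c. A rough sketch, where the index
value and the ip4_frag_tbl_create/destroy helpers are hypothetical:

	#define GRO_IPV4_FRAG_INDEX 1
	tbl_create_functions[GRO_IPV4_FRAG_INDEX] = ip4_frag_tbl_create;
	tbl_destroy_functions[GRO_IPV4_FRAG_INDEX] = ip4_frag_tbl_destroy;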

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v5 2/3] lib/gro: add TCP/IPv4 GRO support
  2017-06-22  9:35               ` Tan, Jianfeng
@ 2017-06-22 13:55                 ` Jiayu Hu
  0 siblings, 0 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-06-22 13:55 UTC (permalink / raw)
  To: Tan, Jianfeng
  Cc: dev, Ananyev, Konstantin, yliu, Wiles, Keith, Bie, Tiwei, Yao,
	Lei A, stephen

Hi Jianfeng,

On Thu, Jun 22, 2017 at 05:35:23PM +0800, Tan, Jianfeng wrote:
> Hi Jiayu,
> 
> > -----Original Message-----
> > From: Hu, Jiayu
> > Sent: Thursday, June 22, 2017 4:18 PM
> > To: Tan, Jianfeng
> > Cc: dev@dpdk.org; Ananyev, Konstantin; yliu@fridaylinux.org; Wiles, Keith;
> > Bie, Tiwei; Yao, Lei A
> > Subject: Re: [PATCH v5 2/3] lib/gro: add TCP/IPv4 GRO support
> > 
> > On Mon, Jun 19, 2017 at 11:43:20PM +0800, Tan, Jianfeng wrote:
> > >
> > >
> > > On 6/18/2017 3:21 PM, Jiayu Hu wrote:
> > > >
> > > > Each element in the flow array records the information of one flow,
> > > > which includes two parts:
> > > > - key: the criteria of the same flow. If packets have the same key
> > > >      value, they belong to the same flow.
> > > > - start_index: the index of the first incoming packet of this flow in
> > > >      the item array. With start_index, we can locate the first incoming
> > > >      packet of this flow.
> > > > Each element in the item array records one packet's information. It mainly
> > > > includes two parts:
> > > > - pkt: packet address
> > > > - next_pkt_index: index of the next packet of the same flow in the item
> > > >      array. All packets of the same flow are chained by next_pkt_index.
> > > >      With next_pkt_index, we can locate all packets of the same flow
> > > >      one by one.
> > > >
> > > > To process an incoming packet, we need three steps:
> > > > a. check if the packet should be processed. Packets with the following
> > > >      properties won't be processed:
> > > > 	- packets without data;
> > > > 	- packets with wrong checksums;
> > >
> > > Why do we care to check this kind of error? Can we just suppose the
> > > applications have already dropped the packets with wrong cksum?
> > >
> > > > 	- fragmented packets.
> > >
> > > IP fragmented? I don't think we need to check it here either. It's the
> > > application's responsibility to call librte_ip_frag first to reassemble
> > > IP-fragmented packets, and then call this gro library to merge TCP packets.
> > > This procedure should be shown in an example for other users to refer to.
> > 
> > If we leave this check to applications, they have to call librte_ip_frag
> > first and then call the GRO library. If they call the GRO library first,
> > an error happens; but if GRO checks whether the packet is fragmented, no
> > error happens. IMO, it's more flexible. Besides, it's very simple for the
> > GRO library to check whether a packet is fragmented: checking whether the
> > DF bit is set is enough. So I still think the GRO library needs to check it.
> 
> I would expect IP fragment reassembly to also be a GRO engine (there might be two, for IPv4 and IPv6).

The case you mentioned is that packets are TSOed and also
IP fragmented, but current GRO doesn't support this case.
If we want to add an IP engine to perform IP reassembly
and cooperate with the TCP engine to handle this case, I
think it's very hard to implement in current GRO, because
the correct logic for processing these packets is: first
perform IP reassembly on N packets, and then feed the IP
reassembled packets to TCP GRO. So if we want to support
this case, I would prefer to ask applications to call
librte_ip_frag first and then call the GRO library,
rather than to add this IP engine.
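
A minimal sketch of that order, assuming the librte_ip_frag API and the
GRO reassembly API from this patchset (fragment-table and death-row
setup omitted):

	/* 1. reassemble IP fragments first; NULL means more fragments
	 * of this packet are still expected */
	mbuf = rte_ipv4_frag_reassemble_packet(frag_tbl, &death_row,
			mbuf, rte_rdtsc(), ipv4_hdr);
	if (mbuf == NULL)
		return;

	/* 2. only complete packets (with L4 headers) enter TCP GRO */
	rte_gro_reassemble(mbuf, gro_tbl);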

Besides, to support this case, we also need to assume all
input packets are already IP reassembled; fragments which
don't have an L4 header shouldn't be passed in. What do you think?

Thanks,
Jiayu

^ permalink raw reply	[flat|nested] 141+ messages in thread

* [PATCH v6 0/3] Support TCP/IPv4 GRO in DPDK
  2017-06-18  7:21       ` [PATCH v5 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
                           ` (3 preceding siblings ...)
  2017-06-19  1:39         ` [PATCH v5 0/3] Support TCP/IPv4 GRO in DPDK Tan, Jianfeng
@ 2017-06-23 14:43         ` Jiayu Hu
  2017-06-23 14:43           ` [PATCH v6 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
                             ` (4 more replies)
  4 siblings, 5 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-06-23 14:43 UTC (permalink / raw)
  To: dev
  Cc: konstantin.ananyev, jianfeng.tan, stephen, yliu, keith.wiles,
	tiwei.bie, lei.a.yao, Jiayu Hu

Generic Receive Offload (GRO) is a widely used SW-based offloading
technique to reduce per-packet processing overhead. It gains performance
by reassembling small packets into large ones. Therefore, we propose to
support GRO in DPDK.

To enable more flexibility to applications, DPDK GRO is implemented as
a user library. Applications explicitly use the GRO library to merge
small packets into large ones. DPDK GRO provides two reassembly modes:
lightweight mode and heavyweight mode. If applications want to merge
packets in a simple way, they can select the lightweight mode API. If
applications need more fine-grained control, they can select the
heavyweight mode API.
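
For illustration, a lightweight-mode receive loop can be as small as the
sketch below. This is only a sketch: the GRO_TCP_IPV4 type flag, the
parameter values, and the port/queue variables are assumptions for the
example.

	struct rte_mbuf *pkts[32];
	struct rte_gro_param param = {
		.desired_gro_types = GRO_TCP_IPV4,	/* assumed flag name */
		.max_packet_size = 65535,
		.max_flow_num = 64,
		.max_item_per_flow = 32,
	};
	uint16_t nb_rx, nb_gro;

	nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts, 32);
	/* GROed packets are returned in-place in pkts[] */
	nb_gro = rte_gro_reassemble_burst(pkts, nb_rx, param);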

This patchset is to support TCP/IPv4 GRO in DPDK. The first patch is to
provide a GRO API framework. The second patch is to support TCP/IPv4 GRO.
The last patch is to enable TCP/IPv4 GRO in testpmd.

We perform many iperf tests to see the performance gains from DPDK GRO.

The test environment is:
a. two 25Gbps physical ports (p0 and p1) are linked together. Assign p0
	to one networking namespace and assign p1 to DPDK;
b. enable TSO for p0. Run iperf client on p0;
c. launch testpmd with p1 and a vhost-user port, and run it in csum
	forwarding mode. Select TCP HW checksum calculation for the
	vhost-user port in csum forwarding engine. And for better
	performance, we select IPv4 and TCP HW checksum calculation for p1
	too;
d. launch a VM with one CPU core and a virtio-net port. The VM OS is
	ubuntu 16.04, whose virtio-net driver supports GRO. Enable RX csum
	offloading and mrg_rxbuf for the VM. Iperf server runs in the VM;
e. to run iperf tests, we need to prevent the csum forwarding engine
	from compulsorily changing packet MAC addresses. So in our tests, we
	comment that code out (line 701 ~ line 704 in csumonly.c).

In each test, we run iperf with the following three configurations:
	- single flow and single TCP stream
	- multiple flows and single TCP stream
	- single flow and parallel TCP streams

We run the above iperf tests on three scenarios:
	s1: disabling kernel GRO and enabling DPDK GRO
	s2: disabling kernel GRO and disabling DPDK GRO
	s3: enabling kernel GRO and disabling DPDK GRO
Comparing the throughput of s1 with s2, we can see the performance gains
from DPDK GRO. Comparing the throughput of s1 and s3, we can compare DPDK
GRO performance with kernel GRO performance.

Test results:
	- DPDK GRO throughput is almost 2 times the throughput with neither
		DPDK GRO nor kernel GRO;
	- DPDK GRO throughput is almost 1.2 times the throughput of
		kernel GRO.

Change log
==========
v6:
- avoid checksum validation and calculation
- enable processing of IP fragmented packets
- add a command in testpmd
- update documents
- modify rte_gro_timeout_flush and rte_gro_reassemble_burst
- rename variables
v5:
- fix some bugs
- fix coding style issues
v4:
- implement DPDK GRO as an application-used library
- introduce lightweight and heavyweight working modes to enable
	fine-grained controls to applications
- replace cuckoo hash tables with simpler table structure
v3:
- fix compilation issues.
v2:
- provide generic reassembly function;
- implement GRO as a device ability:
add APIs for devices to support GRO;
add APIs for applications to enable/disable GRO;
- update testpmd example. 

Jiayu Hu (3):
  lib: add Generic Receive Offload API framework
  lib/gro: add TCP/IPv4 GRO support
  app/testpmd: enable TCP/IPv4 GRO

 app/test-pmd/cmdline.c                      | 125 +++++++++
 app/test-pmd/config.c                       |  37 +++
 app/test-pmd/csumonly.c                     |   5 +
 app/test-pmd/testpmd.c                      |   3 +
 app/test-pmd/testpmd.h                      |  11 +
 config/common_base                          |   5 +
 doc/guides/rel_notes/release_17_08.rst      |   7 +
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  34 +++
 lib/Makefile                                |   2 +
 lib/librte_gro/Makefile                     |  51 ++++
 lib/librte_gro/rte_gro.c                    | 221 ++++++++++++++++
 lib/librte_gro/rte_gro.h                    | 195 ++++++++++++++
 lib/librte_gro/rte_gro_tcp.c                | 393 ++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro_tcp.h                | 188 +++++++++++++
 lib/librte_gro/rte_gro_version.map          |  12 +
 mk/rte.app.mk                               |   1 +
 16 files changed, 1290 insertions(+)
 create mode 100644 lib/librte_gro/Makefile
 create mode 100644 lib/librte_gro/rte_gro.c
 create mode 100644 lib/librte_gro/rte_gro.h
 create mode 100644 lib/librte_gro/rte_gro_tcp.c
 create mode 100644 lib/librte_gro/rte_gro_tcp.h
 create mode 100644 lib/librte_gro/rte_gro_version.map

-- 
2.7.4

^ permalink raw reply	[flat|nested] 141+ messages in thread

* [PATCH v6 1/3] lib: add Generic Receive Offload API framework
  2017-06-23 14:43         ` [PATCH v6 " Jiayu Hu
@ 2017-06-23 14:43           ` Jiayu Hu
  2017-06-25 16:53             ` Tan, Jianfeng
  2017-06-23 14:43           ` [PATCH v6 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
                             ` (3 subsequent siblings)
  4 siblings, 1 reply; 141+ messages in thread
From: Jiayu Hu @ 2017-06-23 14:43 UTC (permalink / raw)
  To: dev
  Cc: konstantin.ananyev, jianfeng.tan, stephen, yliu, keith.wiles,
	tiwei.bie, lei.a.yao, Jiayu Hu

Generic Receive Offload (GRO) is a widely used SW-based offloading
technique to reduce per-packet processing overhead. It gains
performance by reassembling small packets into large ones. This
patchset is to support GRO in DPDK. To support GRO, this patch
implements a GRO API framework.

To enable more flexibility to applications, DPDK GRO is implemented as
a user library. Applications explicitly use the GRO library to merge
small packets into large ones. DPDK GRO provides two reassembly modes.
One is called lightweight mode, the other is called heavyweight mode.
If applications want to merge packets in a simple way, they can use
lightweight mode. If applications need more fine-grained control,
they can choose heavyweight mode.

rte_gro_reassemble_burst is the main reassembly API which is used in
lightweight mode and processes N packets at a time. For applications,
performing GRO in lightweight mode is simple. They just need to invoke
rte_gro_reassemble_burst. Applications can get GROed packets as soon as
rte_gro_reassemble_burst returns.

rte_gro_reassemble is the main reassembly API which is used in
heavyweight mode and processes one packet at a time. For applications,
performing GRO in heavyweight mode is relatively complicated. Before
performing GRO, applications need to create a GRO table with
rte_gro_tbl_create. Then they can use rte_gro_reassemble to process
packets one by one. The processed packets stay in the GRO table. If
applications want to get them, they need to manually flush them via
the flush APIs, as sketched below.
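
A minimal heavyweight-mode sketch (the GRO_TCP_IPV4 flag, the table
sizes, and the cycles-based TTL are assumptions for the example; error
handling and the actual forwarding path are omitted):

	struct rte_gro_tbl *tbl;
	struct rte_mbuf *flushed[32];
	uint16_t nb;

	/* one table per lcore: 64 flows x 32 packets, 64KB max merged
	 * packet size, ~1ms TTL expressed in TSC cycles */
	tbl = rte_gro_tbl_create(rte_socket_id(), 64, 32, 65535,
			rte_get_tsc_hz() / 1000, GRO_TCP_IPV4);

	/* >0: merged, 0: inserted into the table, <0: not processed */
	if (rte_gro_reassemble(pkt, tbl) < 0)
		handle_unprocessed(pkt);	/* hypothetical helper */

	/* periodically flush packets that exceeded the TTL */
	nb = rte_gro_timeout_flush(tbl, GRO_TCP_IPV4, flushed, 32);

	rte_gro_tbl_destroy(tbl);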

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
---
 config/common_base                 |   5 +
 lib/Makefile                       |   2 +
 lib/librte_gro/Makefile            |  50 ++++++++++
 lib/librte_gro/rte_gro.c           | 125 ++++++++++++++++++++++++
 lib/librte_gro/rte_gro.h           | 191 +++++++++++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro_version.map |  12 +++
 mk/rte.app.mk                      |   1 +
 7 files changed, 386 insertions(+)
 create mode 100644 lib/librte_gro/Makefile
 create mode 100644 lib/librte_gro/rte_gro.c
 create mode 100644 lib/librte_gro/rte_gro.h
 create mode 100644 lib/librte_gro/rte_gro_version.map

diff --git a/config/common_base b/config/common_base
index f6aafd1..167f5ef 100644
--- a/config/common_base
+++ b/config/common_base
@@ -712,6 +712,11 @@ CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
 CONFIG_RTE_LIBRTE_PMD_VHOST=n
 
 #
+# Compile GRO library
+#
+CONFIG_RTE_LIBRTE_GRO=y
+
+#
 #Compile Xen domain0 support
 #
 CONFIG_RTE_LIBRTE_XEN_DOM0=n
diff --git a/lib/Makefile b/lib/Makefile
index 07e1fd0..ac1c2f6 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -106,6 +106,8 @@ DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
 DEPDIRS-librte_reorder := librte_eal librte_mempool librte_mbuf
 DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
 DEPDIRS-librte_pdump := librte_eal librte_mempool librte_mbuf librte_ether
+DIRS-$(CONFIG_RTE_LIBRTE_GRO) += librte_gro
+DEPDIRS-librte_gro := librte_eal librte_mbuf librte_ether librte_net
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
new file mode 100644
index 0000000..7e0f128
--- /dev/null
+++ b/lib/librte_gro/Makefile
@@ -0,0 +1,50 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2017 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_gro.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+
+EXPORT_MAP := rte_gro_version.map
+
+LIBABIVER := 1
+
+# source files
+SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
+
+# install this header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
new file mode 100644
index 0000000..ebc545f
--- /dev/null
+++ b/lib/librte_gro/rte_gro.c
@@ -0,0 +1,125 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+
+#include "rte_gro.h"
+
+static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB];
+static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NB];
+
+struct rte_gro_tbl *rte_gro_tbl_create(uint16_t socket_id,
+		uint16_t max_flow_num,
+		uint16_t max_item_per_flow,
+		uint32_t max_packet_size,
+		uint64_t max_timeout_cycles,
+		uint64_t desired_gro_types)
+{
+	gro_tbl_create_fn create_tbl_fn;
+	struct rte_gro_tbl *gro_tbl;
+	uint64_t gro_type_flag = 0;
+	uint8_t i;
+
+	gro_tbl = rte_zmalloc_socket(__func__,
+			sizeof(struct rte_gro_tbl),
+			RTE_CACHE_LINE_SIZE,
+			socket_id);
+	gro_tbl->max_packet_size = max_packet_size;
+	gro_tbl->max_timeout_cycles = max_timeout_cycles;
+	gro_tbl->desired_gro_types = desired_gro_types;
+
+	for (i = 0; i < GRO_TYPE_MAX_NB; i++) {
+		gro_type_flag = 1 << i;
+		if (desired_gro_types & gro_type_flag) {
+			create_tbl_fn = tbl_create_functions[i];
+			if (create_tbl_fn)
+				gro_tbl->tbls[i] = create_tbl_fn(socket_id,
+						max_flow_num,
+						max_item_per_flow);
+			else
+				gro_tbl->tbls[i] = NULL;
+		}
+	}
+	return gro_tbl;
+}
+
+void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl)
+{
+	gro_tbl_destroy_fn destroy_tbl_fn;
+	uint64_t gro_type_flag;
+	uint8_t i;
+
+	if (gro_tbl == NULL)
+		return;
+	for (i = 0; i < GRO_TYPE_MAX_NB; i++) {
+		gro_type_flag = 1 << i;
+		if (gro_tbl->desired_gro_types & gro_type_flag) {
+			destroy_tbl_fn = tbl_destroy_functions[i];
+			if (destroy_tbl_fn)
+				destroy_tbl_fn(gro_tbl->tbls[i]);
+			gro_tbl->tbls[i] = NULL;
+		}
+	}
+	rte_free(gro_tbl);
+}
+
+uint16_t
+rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
+		const uint16_t nb_pkts,
+		const struct rte_gro_param param __rte_unused)
+{
+	return nb_pkts;
+}
+
+int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
+		struct rte_gro_tbl *gro_tbl __rte_unused)
+{
+	return -1;
+}
+
+uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
+		uint64_t desired_gro_types __rte_unused,
+		struct rte_mbuf **out __rte_unused,
+		const uint16_t max_nb_out __rte_unused)
+{
+	return 0;
+}
+
+uint16_t
+rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
+		uint64_t desired_gro_types __rte_unused,
+		struct rte_mbuf **out __rte_unused,
+		const uint16_t max_nb_out __rte_unused)
+{
+	return 0;
+}
diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
new file mode 100644
index 0000000..2c547fa
--- /dev/null
+++ b/lib/librte_gro/rte_gro.h
@@ -0,0 +1,191 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_GRO_H_
+#define _RTE_GRO_H_
+
+/* max number of supported GRO types */
+#define GRO_TYPE_MAX_NB 64
+#define GRO_TYPE_SUPPORT_NB 0	/**< current supported GRO num */
+
+/**
+ * GRO table, which is used to merge packets. It keeps many reassembly
+ * tables of desired GRO types. Applications need to create GRO tables
+ * before using rte_gro_reassemble to perform GRO.
+ */
+struct rte_gro_tbl {
+	uint64_t desired_gro_types;	/**< GRO types to perform */
+	/* max TTL for a packet, measured in CPU cycles */
+	uint64_t max_timeout_cycles;
+	/* max length of a merged packet, measured in bytes */
+	uint32_t max_packet_size;
+	/* reassembly tables of desired GRO types */
+	void *tbls[GRO_TYPE_MAX_NB];
+};
+
+struct rte_gro_param {
+	uint64_t desired_gro_types;	/**< desired GRO types */
+	uint32_t max_packet_size;	/**< max length of merged packets */
+	uint16_t max_flow_num;	/**< max flow number */
+	uint16_t max_item_per_flow;	/**< max packet number per flow */
+};
+
+typedef void *(*gro_tbl_create_fn)(uint16_t socket_id,
+		uint16_t max_flow_num,
+		uint16_t max_item_per_flow);
+typedef void (*gro_tbl_destroy_fn)(void *tbl);
+
+/**
+ * This function creates a GRO table, which is used to merge packets.
+ *
+ * @param socket_id
+ *  index of the NUMA socket which the Ethernet port connects to.
+ * @param max_flow_num
+ *  max number of flows in the GRO table.
+ * @param max_item_per_flow
+ *  max packet number per flow. We use the value of (max_flow_num *
+ *  max_item_per_flow) to calculate the table size.
+ * @param max_packet_size
+ *  max length of merged packets. Measured in byte.
+ * @param max_timeout_cycles
+ *  max TTL for a packet in the GRO table, measured in CPU cycles.
+ * @param desired_gro_types
+ *  GRO types to perform.
+ * @return
+ *  on success, return a pointer to the GRO table. Otherwise,
+ *  return NULL.
+ */
+struct rte_gro_tbl *rte_gro_tbl_create(uint16_t socket_id,
+		uint16_t max_flow_num,
+		uint16_t max_item_per_flow,
+		uint32_t max_packet_size,
+		uint64_t max_timeout_cycles,
+		uint64_t desired_gro_types);
+/**
+ * This function destroys a GRO table.
+ */
+void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl);
+
+/**
+ * This is one of the main reassembly APIs, which merges a number of
+ * packets at a time. It assumes that all input packets have
+ * correct checksums. That is, applications should guarantee all
+ * input packets are correct. Besides, it doesn't re-calculate
+ * checksums for merged packets. If input packets are IP fragmented,
+ * this function assumes they are complete (i.e. with L4 headers). After
+ * finishing processing, it returns all GROed packets to applications
+ * immediately.
+ *
+ * @param pkts
+ *  a pointer array which points to the packets to reassemble. Besides,
+ *  it keeps addresses of GROed packets.
+ * @param nb_pkts
+ *  the number of packets to reassemble.
+ * @param param
+ *  applications use it to tell rte_gro_reassemble_burst which rules
+ *  to apply.
+ * @return
+ *  the number of packets after GRO is performed.
+ */
+uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts,
+		const uint16_t nb_pkts,
+		const struct rte_gro_param param);
+
+/**
+ * Reassembly function, which tries to merge the input packet with
+ * one packet in a given GRO table. This function assumes the input
+ * packet has correct checksums, and it won't update checksums if
+ * two packets are merged. Besides, if the input packet is IP
+ * fragmented, this function assumes it's a complete packet (i.e. with
+ * L4 header).
+ *
+ * If the input packet doesn't have data or its GRO type is unsupported,
+ * the function returns immediately. Otherwise, the input packet is
+ * either merged or inserted into the table. If applications want to get
+ * packets in the table, they need to call the flush APIs.
+ *
+ * @param pkt
+ *  packet to reassemble.
+ * @param gro_tbl
+ *  a pointer that points to a GRO table.
+ * @return
+ *  if the packet is merged successfully, return a positive value. If it
+ *  fails to merge, return zero. If the packet doesn't have data, or its GRO
+ *  type is unsupported, return a negative value.
+ */
+int rte_gro_reassemble(struct rte_mbuf *pkt,
+		struct rte_gro_tbl *gro_tbl);
+
+/**
+ * This function flushes packets from reassembly tables of desired GRO
+ * types. It won't re-calculate checksums for merged packets in the
+ * tables. That is, the returned packets may have wrong checksums.
+ *
+ * @param gro_tbl
+ *  a pointer that points to a GRO table object.
+ * @param desired_gro_types
+ *  GRO types whose packets will be flushed.
+ * @param out
+ *  a pointer array that is used to keep flushed packets.
+ * @param max_nb_out
+ *  the size of out.
+ * @return
+ *  the number of flushed packets. If no packets are flushed, return 0.
+ */
+uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl,
+		uint64_t desired_gro_types,
+		struct rte_mbuf **out,
+		const uint16_t max_nb_out);
+
+/**
+ * This function flushes the timeout packets from reassembly tables of
+ * desired GRO types. It won't re-calculate checksums for merged packets
+ * in the tables. That is, the returned packets may be with wrong
+ * checksums.
+ *
+ * @param gro_tbl
+ *  a pointer that points to a GRO table object.
+ * @param desired_gro_types
+ *  rte_gro_timeout_flush only processes packets which belong to the
+ *  GRO types specified by desired_gro_types.
+ * @param out
+ *  a pointer array that is used to keep flushed packets.
+ * @param max_nb_out
+ *  the size of out.
+ * @return
+ *  the number of flushed packets. If no packets are flushed, return 0.
+ */
+uint16_t rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl,
+		uint64_t desired_gro_types,
+		struct rte_mbuf **out,
+		const uint16_t max_nb_out);
+#endif
diff --git a/lib/librte_gro/rte_gro_version.map b/lib/librte_gro/rte_gro_version.map
new file mode 100644
index 0000000..827596b
--- /dev/null
+++ b/lib/librte_gro/rte_gro_version.map
@@ -0,0 +1,12 @@
+DPDK_17.08 {
+	global:
+
+	rte_gro_tbl_create;
+	rte_gro_tbl_destroy;
+	rte_gro_reassemble_burst;
+	rte_gro_reassemble;
+	rte_gro_flush;
+	rte_gro_timeout_flush;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index bcaf1b3..fc3776d 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -98,6 +98,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_RING)           += -lrte_ring
 _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
 _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER)        += -lrte_reorder
+_LDLIBS-$(CONFIG_RTE_LIBRTE_GRO)        	+= -lrte_gro
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 _LDLIBS-$(CONFIG_RTE_LIBRTE_KNI)            += -lrte_kni
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 141+ messages in thread

* [PATCH v6 2/3] lib/gro: add TCP/IPv4 GRO support
  2017-06-23 14:43         ` [PATCH v6 " Jiayu Hu
  2017-06-23 14:43           ` [PATCH v6 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
@ 2017-06-23 14:43           ` Jiayu Hu
  2017-06-25 16:53             ` Tan, Jianfeng
  2017-06-23 14:43           ` [PATCH v6 3/3] app/testpmd: enable TCP/IPv4 GRO Jiayu Hu
                             ` (2 subsequent siblings)
  4 siblings, 1 reply; 141+ messages in thread
From: Jiayu Hu @ 2017-06-23 14:43 UTC (permalink / raw)
  To: dev
  Cc: konstantin.ananyev, jianfeng.tan, stephen, yliu, keith.wiles,
	tiwei.bie, lei.a.yao, Jiayu Hu

In this patch, we introduce five APIs to support TCP/IPv4 GRO.
- gro_tcp_tbl_create: create a TCP reassembly table, which is used to
    merge packets.
- gro_tcp_tbl_destroy: free memory space of a TCP reassembly table.
- gro_tcp_tbl_flush: flush all packets from a TCP reassembly table.
- gro_tcp_tbl_timeout_flush: flush timeout packets from a TCP
    reassembly table.
- gro_tcp4_reassemble: reassemble an input TCP/IPv4 packet.

TCP/IPv4 GRO API assumes all input packets have correct IPv4 and TCP
checksums, and it doesn't update IPv4 and TCP checksums for merged
packets. If input packets are IP fragmented, TCP/IPv4 GRO API assumes
they are complete packets (i.e. with L4 headers).

In TCP GRO, we use a table structure, called TCP reassembly table, to
reassemble packets. Both TCP/IPv4 and TCP/IPv6 GRO use the same table
structure. A TCP reassembly table includes a key array and an item array,
where the key array keeps the criteria to merge packets and the item
array keeps packet information.

One key in the key array points to an item group, which consists of
packets which have the same criteria value. If two packets are able to
merge, they must be in the same item group. Each key in the key array
includes two parts:
- criteria: the criteria of merging packets. If two packets can be
    merged, they must have the same criteria value.
- start_index: the index of the first incoming packet of the item group.

Each element in the item array keeps the information of one packet. It
mainly includes two parts:
- pkt: packet address
- next_pkt_index: the index of the next packet in the same item group.
    All packets in the same item group are chained by next_pkt_index.
    With next_pkt_index, we can locate all packets in the same item
    group one by one.

Processing an incoming packet takes three steps:
a. check if the packet should be processed. Packets with the following
    properties won't be processed:
	- packets without data (e.g. SYN, SYN-ACK)
b. traverse the key array to find a key which has the same criteria
    value as the incoming packet. If found, go to step c. Otherwise,
    insert a new key and insert the packet into the item array.
c. locate the first packet in the item group via the start_index in the
    key. Then traverse all packets in the item group via next_pkt_index.
    If one packet is found that can merge with the incoming one, merge
    them together. If none is found, insert the packet into this item
    group.
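
To make the neighbor check in step c concrete: two TCP segments are
neighbors when the first one's sequence number plus its payload length
equals the second one's sequence number. A standalone illustration (not
the library code; the names are made up for the example):

	#include <stdint.h>
	#include <stdio.h>

	/* segment B directly follows segment A in the byte stream when
	 * A's sequence number plus A's payload length equals B's
	 * sequence number
	 */
	static int is_neighbor(uint32_t seq_a, uint16_t payload_a,
			uint32_t seq_b)
	{
		return seq_a + payload_a == seq_b;
	}

	int main(void)
	{
		/* A: seq=1000 with 1460B payload, so the next in-order
		 * segment must start at seq 2460
		 */
		printf("%d\n", is_neighbor(1000, 1460, 2460)); /* 1 */
		printf("%d\n", is_neighbor(1000, 1460, 3000)); /* 0 */
		return 0;
	}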

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
---
 doc/guides/rel_notes/release_17_08.rst |   7 +
 lib/librte_gro/Makefile                |   1 +
 lib/librte_gro/rte_gro.c               | 126 +++++++++--
 lib/librte_gro/rte_gro.h               |   6 +-
 lib/librte_gro/rte_gro_tcp.c           | 393 +++++++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro_tcp.h           | 188 ++++++++++++++++
 6 files changed, 705 insertions(+), 16 deletions(-)
 create mode 100644 lib/librte_gro/rte_gro_tcp.c
 create mode 100644 lib/librte_gro/rte_gro_tcp.h

diff --git a/doc/guides/rel_notes/release_17_08.rst b/doc/guides/rel_notes/release_17_08.rst
index 842f46f..f067247 100644
--- a/doc/guides/rel_notes/release_17_08.rst
+++ b/doc/guides/rel_notes/release_17_08.rst
@@ -75,6 +75,13 @@ New Features
 
   Added support for firmwares with multiple Ethernet ports per physical port.
 
+* **Added Generic Receive Offload API support.**
+
+  The Generic Receive Offload (GRO) API supports reassembling TCP/IPv4
+  packets. The GRO API assumes all input packets have correct
+  checksums, and it doesn't update checksums for merged packets. If
+  input packets are IP fragmented, the GRO API assumes they are
+  complete packets (i.e. with L4 headers).
 
 Resolved Issues
 ---------------
diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
index 7e0f128..e89344d 100644
--- a/lib/librte_gro/Makefile
+++ b/lib/librte_gro/Makefile
@@ -43,6 +43,7 @@ LIBABIVER := 1
 
 # source files
 SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
+SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro_tcp.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
index ebc545f..ae800f9 100644
--- a/lib/librte_gro/rte_gro.c
+++ b/lib/librte_gro/rte_gro.c
@@ -32,11 +32,15 @@
 
 #include <rte_malloc.h>
 #include <rte_mbuf.h>
+#include <rte_ethdev.h>
 
 #include "rte_gro.h"
+#include "rte_gro_tcp.h"
 
-static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB];
-static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NB];
+static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB] = {
+	gro_tcp_tbl_create, NULL};
+static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NB] = {
+	gro_tcp_tbl_destroy, NULL};
 
 struct rte_gro_tbl *rte_gro_tbl_create(uint16_t socket_id,
 		uint16_t max_flow_num,
@@ -94,32 +98,124 @@ void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl)
 }
 
 uint16_t
-rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
+rte_gro_reassemble_burst(struct rte_mbuf **pkts,
 		const uint16_t nb_pkts,
-		const struct rte_gro_param param __rte_unused)
+		const struct rte_gro_param param)
 {
-	return nb_pkts;
+	uint16_t i;
+	uint16_t nb_after_gro = nb_pkts;
+	uint32_t item_num = nb_pkts <
+		param.max_flow_num * param.max_item_per_flow ?
+		nb_pkts :
+		param.max_flow_num * param.max_item_per_flow;
+
+	/* allocate a reassembly table for TCP/IPv4 GRO */
+	uint32_t tcp_item_num = item_num <= GRO_TCP_TBL_MAX_ITEM_NUM ?
+		item_num : GRO_TCP_TBL_MAX_ITEM_NUM;
+	struct gro_tcp_tbl tcp_tbl;
+	struct gro_tcp_key tcp_keys[tcp_item_num];
+	struct gro_tcp_item tcp_items[tcp_item_num];
+
+	struct rte_mbuf *unprocess_pkts[nb_pkts];
+	uint16_t unprocess_num = 0;
+	int32_t ret;
+
+	memset(tcp_keys, 0, sizeof(struct gro_tcp_key) *
+			tcp_item_num);
+	memset(tcp_items, 0, sizeof(struct gro_tcp_item) *
+			tcp_item_num);
+	tcp_tbl.keys = tcp_keys;
+	tcp_tbl.items = tcp_items;
+	tcp_tbl.key_num = 0;
+	tcp_tbl.item_num = 0;
+	tcp_tbl.max_key_num = tcp_item_num;
+	tcp_tbl.max_item_num = tcp_item_num;
+
+	for (i = 0; i < nb_pkts; i++) {
+		if (RTE_ETH_IS_IPV4_HDR(pkts[i]->packet_type)) {
+			if ((pkts[i]->packet_type & RTE_PTYPE_L4_TCP) &&
+				(param.desired_gro_types &
+					 GRO_TCP_IPV4)) {
+				ret = gro_tcp4_reassemble(pkts[i],
+						&tcp_tbl,
+						param.max_packet_size);
+				/* merge successfully */
+				if (ret > 0)
+					nb_after_gro--;
+				else if (ret < 0)
+					unprocess_pkts[unprocess_num++] =
+						pkts[i];
+			} else
+				unprocess_pkts[unprocess_num++] =
+					pkts[i];
+		} else
+			unprocess_pkts[unprocess_num++] =
+				pkts[i];
+	}
+
+	/* re-arrange GROed packets */
+	if (nb_after_gro < nb_pkts) {
+		if (param.desired_gro_types & GRO_TCP_IPV4)
+			i = gro_tcp_tbl_flush(&tcp_tbl, pkts, nb_pkts);
+		if (unprocess_num > 0) {
+			memcpy(&pkts[i], unprocess_pkts,
+					sizeof(struct rte_mbuf *) *
+					unprocess_num);
+			i += unprocess_num;
+		}
+		if (nb_pkts > i)
+			memset(&pkts[i], 0,
+					sizeof(struct rte_mbuf *) *
+					(nb_pkts - i));
+	}
+	return nb_after_gro;
 }
 
-int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
-		struct rte_gro_tbl *gro_tbl __rte_unused)
+int rte_gro_reassemble(struct rte_mbuf *pkt,
+		struct rte_gro_tbl *gro_tbl)
 {
+	if (unlikely(pkt == NULL))
+		return -1;
+
+	if (RTE_ETH_IS_IPV4_HDR(pkt->packet_type)) {
+		if ((pkt->packet_type & RTE_PTYPE_L4_TCP) &&
+				(gro_tbl->desired_gro_types &
+				 GRO_TCP_IPV4))
+			return gro_tcp4_reassemble(pkt,
+					gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
+					gro_tbl->max_packet_size);
+	}
+
 	return -1;
 }
 
-uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
-		uint64_t desired_gro_types __rte_unused,
-		struct rte_mbuf **out __rte_unused,
-		const uint16_t max_nb_out __rte_unused)
+uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl,
+		uint64_t desired_gro_types,
+		struct rte_mbuf **out,
+		const uint16_t max_nb_out)
 {
+	desired_gro_types = desired_gro_types &
+		gro_tbl->desired_gro_types;
+	if (desired_gro_types & GRO_TCP_IPV4)
+		return gro_tcp_tbl_flush(
+				gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
+				out,
+				max_nb_out);
 	return 0;
 }
 
 uint16_t
-rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
-		uint64_t desired_gro_types __rte_unused,
-		struct rte_mbuf **out __rte_unused,
-		const uint16_t max_nb_out __rte_unused)
+rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl,
+		uint64_t desired_gro_types,
+		struct rte_mbuf **out,
+		const uint16_t max_nb_out)
 {
+	desired_gro_types = desired_gro_types &
+		gro_tbl->desired_gro_types;
+	if (desired_gro_types & GRO_TCP_IPV4)
+		return gro_tcp_tbl_timeout_flush(
+				gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
+				gro_tbl->max_timeout_cycles,
+				out, max_nb_out);
 	return 0;
 }
diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
index 2c547fa..41cd51a 100644
--- a/lib/librte_gro/rte_gro.h
+++ b/lib/librte_gro/rte_gro.h
@@ -35,7 +35,11 @@
 
 /* max number of supported GRO types */
 #define GRO_TYPE_MAX_NB 64
-#define GRO_TYPE_SUPPORT_NB 0	/**< current supported GRO num */
+#define GRO_TYPE_SUPPORT_NB 1	/**< number of supported GRO types */
+
+/* TCP/IPv4 GRO flag */
+#define GRO_TCP_IPV4_INDEX 0
+#define GRO_TCP_IPV4 (1ULL << GRO_TCP_IPV4_INDEX)
 
 /**
  * GRO table, which is used to merge packets. It keeps many reassembly
diff --git a/lib/librte_gro/rte_gro_tcp.c b/lib/librte_gro/rte_gro_tcp.c
new file mode 100644
index 0000000..cfcd89e
--- /dev/null
+++ b/lib/librte_gro/rte_gro_tcp.c
@@ -0,0 +1,393 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+#include <rte_cycles.h>
+
+#include <rte_ethdev.h>
+#include <rte_ip.h>
+#include <rte_tcp.h>
+
+#include "rte_gro_tcp.h"
+
+void *gro_tcp_tbl_create(uint16_t socket_id,
+		uint16_t max_flow_num,
+		uint16_t max_item_per_flow)
+{
+	size_t size;
+	uint32_t entries_num;
+	struct gro_tcp_tbl *tbl;
+
+	entries_num = max_flow_num * max_item_per_flow;
+	entries_num = entries_num > GRO_TCP_TBL_MAX_ITEM_NUM ?
+		GRO_TCP_TBL_MAX_ITEM_NUM : entries_num;
+
+	if (entries_num == 0)
+		return NULL;
+
+	tbl = (struct gro_tcp_tbl *)rte_zmalloc_socket(
+			__func__,
+			sizeof(struct gro_tcp_tbl),
+			RTE_CACHE_LINE_SIZE,
+			socket_id);
+	/* bail out on allocation failure to avoid NULL dereference */
+	if (tbl == NULL)
+		return NULL;
+
+	size = sizeof(struct gro_tcp_item) * entries_num;
+	tbl->items = (struct gro_tcp_item *)rte_zmalloc_socket(
+			__func__,
+			size,
+			RTE_CACHE_LINE_SIZE,
+			socket_id);
+	if (tbl->items == NULL) {
+		rte_free(tbl);
+		return NULL;
+	}
+	tbl->max_item_num = entries_num;
+
+	size = sizeof(struct gro_tcp_key) * entries_num;
+	tbl->keys = (struct gro_tcp_key *)rte_zmalloc_socket(
+			__func__,
+			size, RTE_CACHE_LINE_SIZE,
+			socket_id);
+	if (tbl->keys == NULL) {
+		rte_free(tbl->items);
+		rte_free(tbl);
+		return NULL;
+	}
+	tbl->max_key_num = entries_num;
+	return tbl;
+}
+
+void gro_tcp_tbl_destroy(void *tbl)
+{
+	struct gro_tcp_tbl *tcp_tbl = (struct gro_tcp_tbl *)tbl;
+
+	if (tcp_tbl) {
+		if (tcp_tbl->items)
+			rte_free(tcp_tbl->items);
+		if (tcp_tbl->keys)
+			rte_free(tcp_tbl->keys);
+		rte_free(tcp_tbl);
+	}
+}
+
+/**
+ * merge two TCP/IPv4 packets without updating checksums.
+ */
+static int
+merge_two_tcp4_packets(struct rte_mbuf *pkt_src,
+		struct rte_mbuf *pkt,
+		uint32_t max_packet_size)
+{
+	struct ipv4_hdr *ipv4_hdr1, *ipv4_hdr2;
+	struct tcp_hdr *tcp_hdr1;
+	uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1;
+	struct rte_mbuf *tail;
+
+	/* parse the incoming packet */
+	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
+				struct ether_hdr *) + 1);
+	ipv4_ihl1 = IPv4_HDR_LEN(ipv4_hdr1);
+	tcp_hdr1 = (struct tcp_hdr *)((char *)ipv4_hdr1 + ipv4_ihl1);
+	tcp_hl1 = TCP_HDR_LEN(tcp_hdr1);
+	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) - ipv4_ihl1
+		- tcp_hl1;
+
+	/* parse the stored packet that we try to merge into */
+	ipv4_hdr2 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt_src,
+				struct ether_hdr *) + 1);
+
+	if (pkt_src->pkt_len + tcp_dl1 > max_packet_size)
+		return -1;
+
+	/* remove the header of the incoming packet */
+	rte_pktmbuf_adj(pkt, sizeof(struct ether_hdr) +
+			ipv4_ihl1 + tcp_hl1);
+
+	/* chain the two packets together */
+	tail = rte_pktmbuf_lastseg(pkt_src);
+	tail->next = pkt;
+
+	/* update IP header */
+	ipv4_hdr2->total_length = rte_cpu_to_be_16(
+			rte_be_to_cpu_16(
+				ipv4_hdr2->total_length)
+			+ tcp_dl1);
+
+	/* update mbuf metadata for the merged packet */
+	pkt_src->nb_segs++;
+	pkt_src->pkt_len += pkt->pkt_len;
+	return 1;
+}
+
+static int
+check_seq_option(struct rte_mbuf *pkt,
+		struct tcp_hdr *tcp_hdr,
+		uint16_t tcp_hl)
+{
+	struct ipv4_hdr *ipv4_hdr1;
+	struct tcp_hdr *tcp_hdr1;
+	uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1;
+	uint32_t sent_seq1, sent_seq;
+	int ret = -1;
+
+	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
+				struct ether_hdr *) + 1);
+	ipv4_ihl1 = IPv4_HDR_LEN(ipv4_hdr1);
+	tcp_hdr1 = (struct tcp_hdr *)((char *)ipv4_hdr1 + ipv4_ihl1);
+	tcp_hl1 = TCP_HDR_LEN(tcp_hdr1);
+	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) - ipv4_ihl1
+		- tcp_hl1;
+	sent_seq1 = rte_be_to_cpu_32(tcp_hdr1->sent_seq) + tcp_dl1;
+	sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq);
+
+	/* check if the two packets are neighbors */
+	if (sent_seq == sent_seq1) {
+		/* the two packets can merge only if their TCP header
+		 * lengths match and their TCP option fields (if any)
+		 * are equal
+		 */
+		if ((tcp_hl1 == tcp_hl) &&
+				((tcp_hl1 == sizeof(struct tcp_hdr)) ||
+				 (memcmp(tcp_hdr1 + 1,
+						 tcp_hdr + 1,
+						 tcp_hl - sizeof
+						 (struct tcp_hdr))
+				  == 0)))
+			ret = 1;
+	}
+	return ret;
+}
+
+static uint32_t
+find_an_empty_item(struct gro_tcp_tbl *tbl)
+{
+	uint32_t i;
+
+	for (i = 0; i < tbl->max_item_num; i++)
+		if (tbl->items[i].is_valid == 0)
+			return i;
+	return INVALID_ARRAY_INDEX;
+}
+
+static uint32_t
+find_an_empty_key(struct gro_tcp_tbl *tbl)
+{
+	uint32_t i;
+
+	for (i = 0; i < tbl->max_key_num; i++)
+		if (tbl->keys[i].is_valid == 0)
+			return i;
+	return INVALID_ARRAY_INDEX;
+}
+
+int32_t
+gro_tcp4_reassemble(struct rte_mbuf *pkt,
+		struct gro_tcp_tbl *tbl,
+		uint32_t max_packet_size)
+{
+	struct ether_hdr *eth_hdr;
+	struct ipv4_hdr *ipv4_hdr;
+	struct tcp_hdr *tcp_hdr;
+	uint16_t ipv4_ihl, tcp_hl, tcp_dl;
+
+	struct tcp_key key;
+	uint32_t cur_idx, prev_idx, item_idx;
+	uint32_t i, key_idx;
+
+	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
+	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
+	ipv4_ihl = IPv4_HDR_LEN(ipv4_hdr);
+
+	/* check if the packet should be processed */
+	if (ipv4_ihl < sizeof(struct ipv4_hdr))
+		goto fail;
+	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl);
+	tcp_hl = TCP_HDR_LEN(tcp_hdr);
+	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - ipv4_ihl
+		- tcp_hl;
+	if (tcp_dl == 0)
+		goto fail;
+
+	/* find a key and traverse all packets in its item group */
+	key.eth_saddr = eth_hdr->s_addr;
+	key.eth_daddr = eth_hdr->d_addr;
+	key.ip_src_addr[0] = rte_be_to_cpu_32(ipv4_hdr->src_addr);
+	key.ip_dst_addr[0] = rte_be_to_cpu_32(ipv4_hdr->dst_addr);
+	key.src_port = rte_be_to_cpu_16(tcp_hdr->src_port);
+	key.dst_port = rte_be_to_cpu_16(tcp_hdr->dst_port);
+	key.recv_ack = rte_be_to_cpu_32(tcp_hdr->recv_ack);
+	key.tcp_flags = tcp_hdr->tcp_flags;
+
+	for (i = 0; i < tbl->max_key_num; i++) {
+		if (tbl->keys[i].is_valid &&
+				(memcmp(&(tbl->keys[i].key), &key,
+						sizeof(struct tcp_key))
+				 == 0)) {
+			cur_idx = tbl->keys[i].start_index;
+			prev_idx = cur_idx;
+			while (cur_idx != INVALID_ARRAY_INDEX) {
+				if (check_seq_option(tbl->items[cur_idx].pkt,
+							tcp_hdr,
+							tcp_hl) > 0) {
+					if (merge_two_tcp4_packets(
+								tbl->items[cur_idx].pkt,
+								pkt,
+								max_packet_size) > 0) {
+						/* successfully merge two packets */
+						tbl->items[cur_idx].is_groed = 1;
+						return 1;
+					}
+					/**
+					 * failed to merge the two packets
+					 * as the merged one would exceed
+					 * the max packet length. Insert
+					 * it into the item group.
+					 */
+					goto insert_to_item_group;
+				} else {
+					prev_idx = cur_idx;
+					cur_idx = tbl->items[cur_idx].next_pkt_idx;
+				}
+			}
+			/**
+			 * found a corresponding item group but no packet
+			 * to merge with. Insert it into this item group.
+			 */
+insert_to_item_group:
+			item_idx = find_an_empty_item(tbl);
+			/* the item number is beyond the maximum value */
+			if (item_idx == INVALID_ARRAY_INDEX)
+				return -1;
+			tbl->items[prev_idx].next_pkt_idx = item_idx;
+			tbl->items[item_idx].pkt = pkt;
+			tbl->items[item_idx].is_groed = 0;
+			tbl->items[item_idx].next_pkt_idx = INVALID_ARRAY_INDEX;
+			tbl->items[item_idx].is_valid = 1;
+			tbl->items[item_idx].start_time = rte_rdtsc();
+			tbl->item_num++;
+			return 0;
+		}
+	}
+
+	/**
+	 * no matching key was found for the given packet,
+	 * so insert a new key and a new item.
+	 */
+	item_idx = find_an_empty_item(tbl);
+	key_idx = find_an_empty_key(tbl);
+	/**
+	 * if the key or item number is beyond the maximum
+	 * value, the input packet won't be processed.
+	 */
+	if (item_idx == INVALID_ARRAY_INDEX ||
+			key_idx == INVALID_ARRAY_INDEX)
+		return -1;
+	tbl->items[item_idx].pkt = pkt;
+	tbl->items[item_idx].next_pkt_idx = INVALID_ARRAY_INDEX;
+	tbl->items[item_idx].is_groed = 0;
+	tbl->items[item_idx].is_valid = 1;
+	tbl->items[item_idx].start_time = rte_rdtsc();
+	tbl->item_num++;
+
+	memcpy(&(tbl->keys[key_idx].key),
+			&key, sizeof(struct tcp_key));
+	tbl->keys[key_idx].start_index = item_idx;
+	tbl->keys[key_idx].is_valid = 1;
+	tbl->key_num++;
+
+	return 0;
+fail:
+	return -1;
+}
+
+uint16_t gro_tcp_tbl_flush(struct gro_tcp_tbl *tbl,
+		struct rte_mbuf **out,
+		const uint16_t nb_out)
+{
+	uint32_t i, num = 0;
+
+	if (nb_out < tbl->item_num)
+		return 0;
+
+	for (i = 0; i < tbl->max_item_num; i++) {
+		if (tbl->items[i].is_valid) {
+			out[num++] = tbl->items[i].pkt;
+			tbl->items[i].is_valid = 0;
+			tbl->item_num--;
+		}
+	}
+	memset(tbl->keys, 0, sizeof(struct gro_tcp_key) *
+			tbl->max_key_num);
+	tbl->key_num = 0;
+
+	return num;
+}
+
+uint16_t
+gro_tcp_tbl_timeout_flush(struct gro_tcp_tbl *tbl,
+		uint64_t timeout_cycles,
+		struct rte_mbuf **out,
+		const uint16_t nb_out)
+{
+	uint16_t k;
+	uint32_t i, j;
+	uint64_t current_time;
+
+	if (nb_out == 0)
+		return 0;
+	k = 0;
+	current_time = rte_rdtsc();
+
+	for (i = 0; i < tbl->max_key_num; i++) {
+		if (tbl->keys[i].is_valid) {
+			j = tbl->keys[i].start_index;
+			while (j != INVALID_ARRAY_INDEX) {
+				/* items of a flow are chained in arrival
+				 * order, so stop at the first one that
+				 * hasn't timed out yet
+				 */
+				if (current_time - tbl->items[j].start_time <
+						timeout_cycles)
+					break;
+				out[k++] = tbl->items[j].pkt;
+				tbl->items[j].is_valid = 0;
+				tbl->item_num--;
+				j = tbl->items[j].next_pkt_idx;
+
+				if (k == nb_out)
+					break;
+			}
+			if (j == INVALID_ARRAY_INDEX) {
+				/* delete the key, as all of its packets
+				 * are flushed
+				 */
+				tbl->keys[i].is_valid = 0;
+				tbl->key_num--;
+			} else
+				/* some packets remain; update the first
+				 * item index of the group
+				 */
+				tbl->keys[i].start_index = j;
+			if (k == nb_out)
+				goto end;
+		}
+		if (tbl->key_num == 0)
+			goto end;
+	}
+end:
+	return k;
+}
diff --git a/lib/librte_gro/rte_gro_tcp.h b/lib/librte_gro/rte_gro_tcp.h
new file mode 100644
index 0000000..4c4f9c7
--- /dev/null
+++ b/lib/librte_gro/rte_gro_tcp.h
@@ -0,0 +1,188 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_GRO_TCP_H_
+#define _RTE_GRO_TCP_H_
+
+/**
+ * data_off and version_ihl are single bytes, so the nibble layout
+ * doesn't depend on the host byte order.
+ */
+#define TCP_HDR_LEN(tcph) \
+	((tcph->data_off >> 4) * 4)
+#define IPv4_HDR_LEN(iph) \
+	((iph->version_ihl & 0x0f) * 4)
+
+#define INVALID_ARRAY_INDEX 0xffffffffUL
+#define GRO_TCP_TBL_MAX_ITEM_NUM (UINT32_MAX - 1)
+
+/* criteria for merging packets */
+struct tcp_key {
+	struct ether_addr eth_saddr;
+	struct ether_addr eth_daddr;
+	uint32_t ip_src_addr[4];	/**< IPv4 uses the first 4B */
+	uint32_t ip_dst_addr[4];
+
+	uint32_t recv_ack;	/**< acknowledgment sequence number. */
+	uint16_t src_port;
+	uint16_t dst_port;
+	uint8_t tcp_flags;	/**< TCP flags. */
+};
+
+struct gro_tcp_key {
+	struct tcp_key key;
+	uint32_t start_index;	/**< the first packet index of the flow */
+	uint8_t is_valid;
+};
+
+struct gro_tcp_item {
+	struct rte_mbuf *pkt;	/**< packet address. */
+	/* the time when the packet is added into the table */
+	uint64_t start_time;
+	uint32_t next_pkt_idx;	/**< next packet index. */
+	/* flag to indicate if the packet is GROed */
+	uint8_t is_groed;
+	uint8_t is_valid;	/**< flag indicates if the item is valid */
+};
+
+/**
+ * TCP reassembly table. Both TCP/IPv4 and TCP/IPv6 use the same table
+ * structure.
+ */
+struct gro_tcp_tbl {
+	struct gro_tcp_item *items;	/**< item array */
+	struct gro_tcp_key *keys;	/**< key array */
+	uint32_t item_num;	/**< current item number */
+	uint32_t key_num;	/**< current key num */
+	uint32_t max_item_num;	/**< item array size */
+	uint32_t max_key_num;	/**< key array size */
+};
+
+/**
+ * This function creates a TCP reassembly table.
+ *
+ * @param socket_id
+ *  socket index where the Ethernet port connects to.
+ * @param max_flow_num
+ *  the maximum number of flows in the TCP GRO table
+ * @param max_item_per_flow
+ *  the maximum packet number per flow.
+ * @return
+ *  if create successfully, return a pointer which points to the
+ *  created TCP GRO table. Otherwise, return NULL.
+ */
+void *gro_tcp_tbl_create(uint16_t socket_id,
+		uint16_t max_flow_num,
+		uint16_t max_item_per_flow);
+
+/**
+ * This function destroys a TCP reassembly table.
+ * @param tbl
+ *  a pointer points to the TCP reassembly table.
+ */
+void gro_tcp_tbl_destroy(void *tbl);
+
+/**
+ * This function searches for a packet in the TCP reassembly table to
+ * merge with the input one. Merging two packets chains them together
+ * and updates the packet headers. If the packet has no data (e.g. a
+ * SYN or SYN-ACK packet), this function returns immediately.
+ * Otherwise, the packet is either merged or inserted into the table.
+ *
+ * This function assumes the input packet has correct IPv4 and TCP
+ * checksums. If two packets are merged, it won't re-calculate IPv4
+ * and TCP checksums. Besides, if the input packet is IP fragmented,
+ * it assumes the packet is complete (i.e. with the TCP header).
+ *
+ * @param pkt
+ *  packet to reassemble.
+ * @param tbl
+ *  a pointer that points to a TCP reassembly table.
+ * @param max_packet_size
+ *  the max length of a merged packet
+ * @return
+ *  if the packet doesn't have data, return a negative value. If the
+ *  packet is merged successfully, return a positive value. If the
+ *  packet is inserted into the table, return 0.
+ */
+int32_t
+gro_tcp4_reassemble(struct rte_mbuf *pkt,
+		struct gro_tcp_tbl *tbl,
+		uint32_t max_packet_size);
+
+/**
+ * This function flushes all packets in a TCP reassembly table to
+ * applications, without updating checksums for merged packets. If the
+ * array used to keep flushed packets is not large enough, this
+ * function flushes nothing and returns immediately.
+ *
+ * @param tbl
+ *  a pointer that points to a TCP GRO table.
+ * @param out
+ *  pointer array which is used to keep flushed packets. Applications
+ *  should guarantee it's large enough to hold all packets in the table.
+ * @param nb_out
+ *  the element number of out.
+ * @return
+ *  the number of flushed packets. If out is not large enough to hold
+ *  all packets in the table, return 0.
+ */
+uint16_t
+gro_tcp_tbl_flush(struct gro_tcp_tbl *tbl,
+		struct rte_mbuf **out,
+		const uint16_t nb_out);
+
+/**
+ * This function flushes timeout packets in a TCP reassembly table to
+ * applications, without updating checksums for merged packets.
+ *
+ * @param tbl
+ *  a pointer that points to a TCP GRO table.
+ * @param timeout_cycles
+ *  the maximum time that packets can stay in the table.
+ * @param out
+ *  pointer array which is used to keep flushed packets.
+ * @param nb_out
+ *  the element number of out.
+ * @return
+ *  the number of packets that are returned.
+ */
+uint16_t
+gro_tcp_tbl_timeout_flush(struct gro_tcp_tbl *tbl,
+		uint64_t timeout_cycles,
+		struct rte_mbuf **out,
+		const uint16_t nb_out);
+#endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 141+ messages in thread

* [PATCH v6 3/3] app/testpmd: enable TCP/IPv4 GRO
  2017-06-23 14:43         ` [PATCH v6 " Jiayu Hu
  2017-06-23 14:43           ` [PATCH v6 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
  2017-06-23 14:43           ` [PATCH v6 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
@ 2017-06-23 14:43           ` Jiayu Hu
  2017-06-24  8:01             ` Yao, Lei A
  2017-06-25 16:03           ` [PATCH v6 0/3] Support TCP/IPv4 GRO in DPDK Tan, Jianfeng
  2017-06-26  6:43           ` [PATCH v7 " Jiayu Hu
  4 siblings, 1 reply; 141+ messages in thread
From: Jiayu Hu @ 2017-06-23 14:43 UTC (permalink / raw)
  To: dev
  Cc: konstantin.ananyev, jianfeng.tan, stephen, yliu, keith.wiles,
	tiwei.bie, lei.a.yao, Jiayu Hu

This patch enables the TCP/IPv4 GRO library in the csum forwarding
engine. By default, GRO is turned off. Users can use the command "gro
(on|off) (port_id)" to enable or disable GRO for a given port. If GRO
is enabled for a port, all TCP/IPv4 packets received from that port
are processed by GRO. Besides, users can set the max flow number and
the max packet number per flow with the command "gro set
(max_flow_num) (max_item_num_per_flow) (port_id)".

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
---
 app/test-pmd/cmdline.c                      | 125 ++++++++++++++++++++++++++++
 app/test-pmd/config.c                       |  37 ++++++++
 app/test-pmd/csumonly.c                     |   5 ++
 app/test-pmd/testpmd.c                      |   3 +
 app/test-pmd/testpmd.h                      |  11 +++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  34 ++++++++
 6 files changed, 215 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index ff8ffd2..cb359e1 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -76,6 +76,7 @@
 #include <rte_devargs.h>
 #include <rte_eth_ctrl.h>
 #include <rte_flow.h>
+#include <rte_gro.h>
 
 #include <cmdline_rdline.h>
 #include <cmdline_parse.h>
@@ -419,6 +420,14 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"tso show (portid)"
 			"    Display the status of TCP Segmentation Offload.\n\n"
 
+			"gro (on|off) (port_id)"
+			"    Enable or disable Generic Receive Offload in io"
+			" forward engine.\n\n"
+
+			"gro set (max_flow_num) (max_item_num_per_flow) (port_id)\n"
+			"    Set max flow number and max packet number per-flow"
+			" for GRO.\n\n"
+
 			"set fwd (%s)\n"
 			"    Set packet forwarding mode.\n\n"
 
@@ -3827,6 +3836,120 @@ cmdline_parse_inst_t cmd_tunnel_tso_show = {
 	},
 };
 
+/* *** SET GRO FOR A PORT *** */
+struct cmd_gro_result {
+	cmdline_fixed_string_t cmd_keyword;
+	cmdline_fixed_string_t mode;
+	uint8_t port_id;
+};
+
+static void
+cmd_enable_gro_parsed(void *parsed_result,
+		__attribute__((unused)) struct cmdline *cl,
+		__attribute__((unused)) void *data)
+{
+	struct cmd_gro_result *res;
+
+	res = parsed_result;
+	setup_gro(res->mode, res->port_id);
+}
+
+cmdline_parse_token_string_t cmd_gro_keyword =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_result,
+			cmd_keyword, "gro");
+cmdline_parse_token_string_t cmd_gro_mode =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_result,
+			mode, "on#off");
+cmdline_parse_token_num_t cmd_gro_pid =
+	TOKEN_NUM_INITIALIZER(struct cmd_gro_result,
+			port_id, UINT8);
+
+cmdline_parse_inst_t cmd_enable_gro = {
+	.f = cmd_enable_gro_parsed,
+	.data = NULL,
+	.help_str = "gro (on|off) (port_id)",
+	.tokens = {
+		(void *)&cmd_gro_keyword,
+		(void *)&cmd_gro_mode,
+		(void *)&cmd_gro_pid,
+		NULL,
+	},
+};
+
+/* *** SET MAX FLOW NUMBER AND ITEM NUM PER FLOW FOR GRO *** */
+struct cmd_gro_set_result {
+	cmdline_fixed_string_t gro;
+	cmdline_fixed_string_t mode;
+	uint16_t flow_num;
+	uint16_t item_num_per_flow;
+	uint8_t port_id;
+};
+
+static void
+cmd_gro_set_parsed(void *parsed_result,
+		       __attribute__((unused)) struct cmdline *cl,
+		       __attribute__((unused)) void *data)
+{
+	struct cmd_gro_set_result *res = parsed_result;
+
+	if (port_id_is_invalid(res->port_id, ENABLED_WARN))
+		return;
+	if (test_done == 0) {
+		printf("Before set GRO flow_num and item_num_per_flow,"
+				" please stop forwarding first\n");
+		return;
+	}
+
+	if (!strcmp(res->mode, "set")) {
+		if (res->flow_num == 0)
+			printf("Invalid flow number. Revert to default value:"
+					" %u.\n", GRO_DEFAULT_FLOW_NUM);
+		else
+			gro_ports[res->port_id].param.max_flow_num =
+				res->flow_num;
+
+		if (res->item_num_per_flow == 0)
+			printf("Invalid item number per-flow. Revert"
+					" to default value:%u.\n",
+					GRO_DEFAULT_ITEM_NUM_PER_FLOW);
+		else
+			gro_ports[res->port_id].param.max_item_per_flow =
+				res->item_num_per_flow;
+	}
+}
+
+cmdline_parse_token_string_t cmd_gro_set_gro =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_set_result,
+				gro, "gro");
+cmdline_parse_token_string_t cmd_gro_set_mode =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_set_result,
+				mode, "set");
+cmdline_parse_token_num_t cmd_gro_set_flow_num =
+	TOKEN_NUM_INITIALIZER(struct cmd_gro_set_result,
+				flow_num, UINT16);
+cmdline_parse_token_num_t cmd_gro_set_item_num_per_flow =
+	TOKEN_NUM_INITIALIZER(struct cmd_gro_set_result,
+				item_num_per_flow, UINT16);
+cmdline_parse_token_num_t cmd_gro_set_portid =
+	TOKEN_NUM_INITIALIZER(struct cmd_gro_set_result,
+				port_id, UINT8);
+
+cmdline_parse_inst_t cmd_gro_set = {
+	.f = cmd_gro_set_parsed,
+	.data = NULL,
+	.help_str = "gro set <max_flow_num> <max_item_num_per_flow> "
+		"<port_id>: set max flow number and max packet number per-flow "
+		"for GRO",
+	.tokens = {
+		(void *)&cmd_gro_set_gro,
+		(void *)&cmd_gro_set_mode,
+		(void *)&cmd_gro_set_flow_num,
+		(void *)&cmd_gro_set_item_num_per_flow,
+		(void *)&cmd_gro_set_portid,
+		NULL,
+	},
+};
+
 /* *** ENABLE/DISABLE FLUSH ON RX STREAMS *** */
 struct cmd_set_flush_rx {
 	cmdline_fixed_string_t set;
@@ -13732,6 +13855,8 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_tso_show,
 	(cmdline_parse_inst_t *)&cmd_tunnel_tso_set,
 	(cmdline_parse_inst_t *)&cmd_tunnel_tso_show,
+	(cmdline_parse_inst_t *)&cmd_enable_gro,
+	(cmdline_parse_inst_t *)&cmd_gro_set,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_rx,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_tx,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index b0b340e..2a33a63 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -71,6 +71,7 @@
 #ifdef RTE_LIBRTE_BNXT_PMD
 #include <rte_pmd_bnxt.h>
 #endif
+#include <rte_gro.h>
 
 #include "testpmd.h"
 
@@ -2414,6 +2415,42 @@ set_tx_pkt_segments(unsigned *seg_lengths, unsigned nb_segs)
 	tx_pkt_nb_segs = (uint8_t) nb_segs;
 }
 
+void
+setup_gro(const char *mode, uint8_t port_id)
+{
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		printf("invalid port id %u\n", port_id);
+		return;
+	}
+	if (test_done == 0) {
+		printf("Before enable/disable GRO,"
+				" please stop forwarding first\n");
+		return;
+	}
+	if (strcmp(mode, "on") == 0) {
+		if (gro_ports[port_id].enable) {
+			printf("port %u has enabled GRO\n", port_id);
+			return;
+		}
+		gro_ports[port_id].enable = 1;
+		gro_ports[port_id].param.desired_gro_types = GRO_TCP_IPV4;
+		gro_ports[port_id].param.max_packet_size = UINT16_MAX;
+
+		if (gro_ports[port_id].param.max_flow_num == 0)
+			gro_ports[port_id].param.max_flow_num =
+				GRO_DEFAULT_FLOW_NUM;
+		if (gro_ports[port_id].param.max_item_per_flow == 0)
+			gro_ports[port_id].param.max_item_per_flow =
+				GRO_DEFAULT_ITEM_NUM_PER_FLOW;
+	} else {
+		if (gro_ports[port_id].enable == 0) {
+			printf("port %u has disabled GRO\n", port_id);
+			return;
+		}
+		gro_ports[port_id].enable = 0;
+	}
+}
+
 char*
 list_pkt_forwarding_modes(void)
 {
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 66fc9a0..430bd8b 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -71,6 +71,7 @@
 #include <rte_prefetch.h>
 #include <rte_string_fns.h>
 #include <rte_flow.h>
+#include <rte_gro.h>
 #include "testpmd.h"
 
 #define IP_DEFTTL  64   /* from RFC 1340. */
@@ -658,6 +659,10 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 				 nb_pkt_per_burst);
 	if (unlikely(nb_rx == 0))
 		return;
+	if (unlikely(gro_ports[fs->rx_port].enable))
+		nb_rx = rte_gro_reassemble_burst(pkts_burst,
+				nb_rx,
+				gro_ports[fs->rx_port].param);
 
 #ifdef RTE_TEST_PMD_RECORD_BURST_STATS
 	fs->rx_burst_stats.pkt_burst_spread[nb_rx]++;
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index b29328a..ed27c7a 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -90,6 +90,7 @@
 #ifdef RTE_LIBRTE_LATENCY_STATS
 #include <rte_latencystats.h>
 #endif
+#include <rte_gro.h>
 
 #include "testpmd.h"
 
@@ -378,6 +379,8 @@ lcoreid_t bitrate_lcore_id;
 uint8_t bitrate_enabled;
 #endif
 
+struct gro_status gro_ports[RTE_MAX_ETHPORTS];
+
 /* Forward function declarations */
 static void map_port_queue_stats_mapping_registers(uint8_t pi, struct rte_port *port);
 static void check_all_ports_link_status(uint32_t port_mask);
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 364502d..377d933 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -34,6 +34,8 @@
 #ifndef _TESTPMD_H_
 #define _TESTPMD_H_
 
+#include <rte_gro.h>
+
 #define RTE_PORT_ALL            (~(portid_t)0x0)
 
 #define RTE_TEST_RX_DESC_MAX    2048
@@ -428,6 +430,14 @@ extern struct ether_addr peer_eth_addrs[RTE_MAX_ETHPORTS];
 extern uint32_t burst_tx_delay_time; /**< Burst tx delay time(us) for mac-retry. */
 extern uint32_t burst_tx_retry_num;  /**< Burst tx retry number for mac-retry. */
 
+#define GRO_DEFAULT_FLOW_NUM 4
+#define GRO_DEFAULT_ITEM_NUM_PER_FLOW DEF_PKT_BURST
+struct gro_status {
+	struct rte_gro_param param;
+	uint8_t enable;
+};
+extern struct gro_status gro_ports[RTE_MAX_ETHPORTS];
+
 static inline unsigned int
 lcore_num(void)
 {
@@ -626,6 +636,7 @@ void get_2tuple_filter(uint8_t port_id, uint16_t index);
 void get_5tuple_filter(uint8_t port_id, uint16_t index);
 int rx_queue_id_is_invalid(queueid_t rxq_id);
 int tx_queue_id_is_invalid(queueid_t txq_id);
+void setup_gro(const char *mode, uint8_t port_id);
 
 /* Functions to manage the set of filtered Multicast MAC addresses */
 void mcast_addr_add(uint8_t port_id, struct ether_addr *mc_addr);
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 2b9a1ea..528c833 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -884,6 +884,40 @@ Display the status of TCP Segmentation Offload::
 
    testpmd> tso show (port_id)
 
+gro
+~~~~~~~~
+
+Enable or disable GRO in ``csum`` forwarding engine::
+
+   testpmd> gro (on|off) (port_id)
+
+If enabled, the csum forwarding engine will perform GRO on the TCP/IPv4
+packets received from the given port.
+
+If disabled, packets received from the given port won't be processed by
+GRO. By default, GRO is disabled for all ports.
+
+.. note::
+
+   When GRO is enabled for a port, TCP/IPv4 packets received from that
+   port are processed by GRO. After GRO, the merged packets have multiple
+   segments. But the csum forwarding engine doesn't support calculating
+   TCP checksums for multi-segment packets in SW. So please select TCP HW
+   checksum calculation for the port which GROed packets are transmitted
+   to.
+
+gro set
+~~~~~~~~
+
+Set max flow number and max packet number per-flow for GRO::
+
+   testpmd> gro set (max_flow_num) (max_item_num_per_flow) (port_id)
+
+The product of ``max_flow_num`` and ``max_item_num_per_flow`` is the max
+number of packets a GRO table can store.
+
+If the current packet number is greater than or equal to the max value,
+GRO will stop processing incoming packets.
+
 mac_addr add
 ~~~~~~~~~~~~
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 141+ messages in thread

* Re: [PATCH v6 3/3] app/testpmd: enable TCP/IPv4 GRO
  2017-06-23 14:43           ` [PATCH v6 3/3] app/testpmd: enable TCP/IPv4 GRO Jiayu Hu
@ 2017-06-24  8:01             ` Yao, Lei A
  0 siblings, 0 replies; 141+ messages in thread
From: Yao, Lei A @ 2017-06-24  8:01 UTC (permalink / raw)
  To: Hu, Jiayu, dev
  Cc: Ananyev, Konstantin, Tan, Jianfeng, stephen, yliu, Wiles, Keith,
	Bie, Tiwei



> -----Original Message-----
> From: Hu, Jiayu
> Sent: Friday, June 23, 2017 10:43 PM
> To: dev@dpdk.org
> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Tan, Jianfeng
> <jianfeng.tan@intel.com>; stephen@networkplumber.org;
> yliu@fridaylinux.org; Wiles, Keith <keith.wiles@intel.com>; Bie, Tiwei
> <tiwei.bie@intel.com>; Yao, Lei A <lei.a.yao@intel.com>; Hu, Jiayu
> <jiayu.hu@intel.com>
> Subject: [PATCH v6 3/3] app/testpmd: enable TCP/IPv4 GRO
> 
> This patch enables the TCP/IPv4 GRO library in the csum forwarding
> engine. By default, GRO is turned off. Users can use the command "gro
> (on|off) (port_id)" to enable or disable GRO for a given port. If GRO
> is enabled for a port, all TCP/IPv4 packets received from that port
> are processed by GRO. Besides, users can set the max flow number and
> the max packet number per flow with the command "gro set
> (max_flow_num) (max_item_num_per_flow) (port_id)".
> 
> Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
This patch is tested on the following test bench:
OS: Ubuntu 16.04
CPU: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
NIC: XXV710 25G
We can see the iperf result improve a lot after enabling GRO. The data
flow is NIC1->NIC2->testpmd(GRO on/off)->vhost->virtio-net (in VM).

> [...]

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v6 0/3] Support TCP/IPv4 GRO in DPDK
  2017-06-23 14:43         ` [PATCH v6 " Jiayu Hu
                             ` (2 preceding siblings ...)
  2017-06-23 14:43           ` [PATCH v6 3/3] app/testpmd: enable TCP/IPv4 GRO Jiayu Hu
@ 2017-06-25 16:03           ` Tan, Jianfeng
  2017-06-26  1:35             ` Jiayu Hu
  2017-06-26  6:43           ` [PATCH v7 " Jiayu Hu
  4 siblings, 1 reply; 141+ messages in thread
From: Tan, Jianfeng @ 2017-06-25 16:03 UTC (permalink / raw)
  To: Jiayu Hu, dev
  Cc: konstantin.ananyev, stephen, yliu, keith.wiles, tiwei.bie, lei.a.yao



On 6/23/2017 10:43 PM, Jiayu Hu wrote:
> Generic Receive Offload (GRO) is a widely used SW-based offloading
> technique to reduce per-packet processing overhead. It gains performance
> by reassembling small packets into large ones. Therefore, we propose to
> support GRO in DPDK.
>
> To enable more flexibility to applications, DPDK GRO is implemented as
> a user library. Applications explicitly use the GRO library to merge
> small packets into large ones. DPDK GRO provides two reassembly modes:
> lightweight mode and heavyweight mode. If applications want to merge
> packets in a simple way, they can select lightweight mode API. If
> applications need more fine-grained controls, they can select heavyweight
> mode API.
>
> This patchset is to support TCP/IPv4 GRO in DPDK. The first patch is to
> provide a GRO API framework. The second patch is to support TCP/IPv4 GRO.
> The last patch is to enable TCP/IPv4 GRO in testpmd.
>
> We perform many iperf tests to see the performance gains from DPDK GRO.
>
> The test environment is:
> a. two 25Gbps physical ports (p0 and p1) are linked together. Assign p0
> 	to one networking namespace and assign p1 to DPDK;
> b. enable TSO for p0. Run iperf client on p0;
> c. launch testpmd with p1 and a vhost-user port, and run it in csum
> 	forwarding mode. Select TCP HW checksum calculation for the
> 	vhost-user port in csum forwarding engine. And for better
> 	performance, we select IPv4 and TCP HW checksum calculation for p1
> 	too;
> d. launch a VM with one CPU core and a virtio-net port. The VM OS is
> 	ubuntu 16.04 whose virtio-net driver supports GRO. Enables RX csum
> 	offloading and mrg_rxbuf for the VM. Iperf server runs in the VM;
> e. to run iperf tests, we need to stop the csum forwarding engine
> 	from compulsorily changing packet mac addresses. So in our tests,
> 	we comment these codes out (line701 ~ line704 in csumonly.c).
>
> In each test, we run iperf with the following three configurations:
> 	- single flow and single TCP stream
> 	- multiple flows and single TCP stream
> 	- single flow and parallel TCP streams

To me, flow == TCP stream; so could you explain what "flow" means here?

>
> We run the above iperf tests on three scenarios:
> 	s1: disabling kernel GRO and enabling DPDK GRO
> 	s2: disabling kernel GRO and disabling DPDK GRO
> 	s3: enabling kernel GRO and disabling DPDK GRO
> Comparing the throughput of s1 with s2, we can see the performance gains
> from DPDK GRO. Comparing the throughput of s1 and s3, we can compare DPDK
> GRO performance with kernel GRO performance.
>
> Test results:
> 	- DPDK GRO throughput is almost 2 times than the throughput of no
> 		DPDK GRO and no kernel GRO;
> 	- DPDK GRO throughput is almost 1.2 times than the throughput of
> 		kernel GRO.
>
> Change log
> ==========
> v6:
> - avoid checksum validation and calculation
> - enable to process IP fragmented packets
> - add a command in testpmd
> - update documents
> - modify rte_gro_timeout_flush and rte_gro_reassemble_burst
> - rename veriable name
> v5:
> - fix some bugs
> - fix coding style issues
> v4:
> - implement DPDK GRO as an application-used library
> - introduce lightweight and heavyweight working modes to enable
> 	fine-grained controls to applications
> - replace cuckoo hash tables with simpler table structure
> v3:
> - fix compilation issues.
> v2:
> - provide generic reassembly function;
> - implement GRO as a device ability:
> add APIs for devices to support GRO;
> add APIs for applications to enable/disable GRO;
> - update testpmd example.
>
> Jiayu Hu (3):
>    lib: add Generic Receive Offload API framework
>    lib/gro: add TCP/IPv4 GRO support
>    app/testpmd: enable TCP/IPv4 GRO
>
>   app/test-pmd/cmdline.c                      | 125 +++++++++
>   app/test-pmd/config.c                       |  37 +++
>   app/test-pmd/csumonly.c                     |   5 +
>   app/test-pmd/testpmd.c                      |   3 +
>   app/test-pmd/testpmd.h                      |  11 +
>   config/common_base                          |   5 +
>   doc/guides/rel_notes/release_17_08.rst      |   7 +
>   doc/guides/testpmd_app_ug/testpmd_funcs.rst |  34 +++
>   lib/Makefile                                |   2 +
>   lib/librte_gro/Makefile                     |  51 ++++
>   lib/librte_gro/rte_gro.c                    | 221 ++++++++++++++++
>   lib/librte_gro/rte_gro.h                    | 195 ++++++++++++++
>   lib/librte_gro/rte_gro_tcp.c                | 393 ++++++++++++++++++++++++++++
>   lib/librte_gro/rte_gro_tcp.h                | 188 +++++++++++++
>   lib/librte_gro/rte_gro_version.map          |  12 +
>   mk/rte.app.mk                               |   1 +
>   16 files changed, 1290 insertions(+)
>   create mode 100644 lib/librte_gro/Makefile
>   create mode 100644 lib/librte_gro/rte_gro.c
>   create mode 100644 lib/librte_gro/rte_gro.h
>   create mode 100644 lib/librte_gro/rte_gro_tcp.c
>   create mode 100644 lib/librte_gro/rte_gro_tcp.h
>   create mode 100644 lib/librte_gro/rte_gro_version.map
>

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v6 2/3] lib/gro: add TCP/IPv4 GRO support
  2017-06-23 14:43           ` [PATCH v6 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
@ 2017-06-25 16:53             ` Tan, Jianfeng
  2017-06-26  1:58               ` Jiayu Hu
  0 siblings, 1 reply; 141+ messages in thread
From: Tan, Jianfeng @ 2017-06-25 16:53 UTC (permalink / raw)
  To: Jiayu Hu, dev
  Cc: konstantin.ananyev, stephen, yliu, keith.wiles, tiwei.bie, lei.a.yao

Hi Jiayu,


On 6/23/2017 10:43 PM, Jiayu Hu wrote:
> In this patch, we introduce five APIs to support TCP/IPv4 GRO.
> - gro_tcp_tbl_create: create a TCP reassembly table, which is used to
>      merge packets.
> - gro_tcp_tbl_destroy: free memory space of a TCP reassembly table.
> - gro_tcp_tbl_flush: flush all packets from a TCP reassembly table.
> - gro_tcp_tbl_timeout_flush: flush timeout packets from a TCP
>      reassembly table.
> - gro_tcp4_reassemble: reassemble an inputted TCP/IPv4 packet.
>
> TCP/IPv4 GRO API assumes all inputted packets have correct IPv4
> and TCP checksums, and it doesn't update IPv4 and TCP checksums
> for merged packets. If inputted packets are IP fragmented,
> TCP/IPv4 GRO API assumes they are complete packets (i.e. with L4
> headers).
>
> In TCP GRO, we use a table structure, called TCP reassembly table, to
> reassemble packets. Both TCP/IPv4 and TCP/IPv6 GRO use the same table
> structure. A TCP reassembly table includes a key array and an item array,
> where the key array keeps the criteria to merge packets and the item
> array keeps packet information.
>
> One key in the key array points to an item group, which consists of
> packets which have the same criteria value. If two packets are able to
> merge, they must be in the same item group. Each key in the key array
> includes two parts:
> - criteria: the criteria of merging packets. If two packets can be
>      merged, they must have the same criteria value.
> - start_index: the index of the first incoming packet of the item group.
>
> Each element in the item array keeps the information of one packet. It
> mainly includes two parts:
> - pkt: packet address
> - next_pkt_index: the index of the next packet in the same item group.
>      All packets in the same item group are chained by next_pkt_index.
>      With next_pkt_index, we can locate all packets in the same item
>      group one by one.
>
> Processing an incoming packet takes three steps (see the condensed
> sketch below):
> a. check if the packet should be processed. Packets with the following
>      properties won't be processed:
> 	- packets without data (e.g. SYN, SYN-ACK)
> b. traverse the key array to find a key with the same criteria
>      value as the incoming packet. If one is found, go to step c.
>      Otherwise, insert a new key and insert the packet into the item
>      array.
> c. locate the first packet in the item group via the start_index in the
>      key, then traverse all packets in the item group via next_pkt_index.
>      If a packet that can merge with the incoming one is found, merge
>      them; otherwise, insert the packet into this item group.
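A condensed sketch of these three steps (a paraphrase of gro_tcp4_reassemble
in the diff below; tcp_payload_length(), extract_key(), key_equal(),
neighbor(), insert_item() and insert_new_key() are shorthand for inline code
in the patch, not real functions):

	/* step a: packets without data (e.g. SYN, SYN-ACK) aren't processed */
	if (tcp_payload_length(pkt) == 0)
		return -1;
	key = extract_key(pkt);	/* MACs, IPs, ports, ack number, TCP flags */

	/* step b: search the key array for this packet's criteria value */
	for (i = 0; i < tbl->max_key_num; i++) {
		if (!tbl->keys[i].is_valid ||
				!key_equal(&tbl->keys[i].key, &key))
			continue;
		/* step c: walk the item group chained from this key */
		for (j = tbl->keys[i].start_index; j != INVALID_ARRAY_INDEX;
				j = tbl->items[j].next_pkt_idx) {
			if (neighbor(tbl->items[j].pkt, pkt) &&
					merge_two_tcp4_packets(
						tbl->items[j].pkt, pkt,
						max_packet_size) > 0)
				return 1;	/* merged */
		}
		return insert_item(tbl, i, pkt);	/* stored in this group */
	}
	return insert_new_key(tbl, &key, pkt);	/* new flow, new key */
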
>
> Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
> ---
>   doc/guides/rel_notes/release_17_08.rst |   7 +
>   lib/librte_gro/Makefile                |   1 +
>   lib/librte_gro/rte_gro.c               | 126 +++++++++--
>   lib/librte_gro/rte_gro.h               |   6 +-
>   lib/librte_gro/rte_gro_tcp.c           | 393 +++++++++++++++++++++++++++++++++
>   lib/librte_gro/rte_gro_tcp.h           | 188 ++++++++++++++++
>   6 files changed, 705 insertions(+), 16 deletions(-)
>   create mode 100644 lib/librte_gro/rte_gro_tcp.c
>   create mode 100644 lib/librte_gro/rte_gro_tcp.h
>
> diff --git a/doc/guides/rel_notes/release_17_08.rst b/doc/guides/rel_notes/release_17_08.rst
> index 842f46f..f067247 100644
> --- a/doc/guides/rel_notes/release_17_08.rst
> +++ b/doc/guides/rel_notes/release_17_08.rst
> @@ -75,6 +75,13 @@ New Features
>   
>     Added support for firmwares with multiple Ethernet ports per physical port.
>   
> +* **Add Generic Receive Offload API support.**
> +
> +  The Generic Receive Offload (GRO) API supports reassembling TCP/IPv4
> +  packets. The GRO API assumes all inputted packets have correct
> +  checksums, and it doesn't update checksums for merged packets. If
> +  inputted packets are IP fragmented, the GRO API assumes they are
> +  complete packets (i.e. with L4 headers).
>   
>   Resolved Issues
>   ---------------
> diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
> index 7e0f128..e89344d 100644
> --- a/lib/librte_gro/Makefile
> +++ b/lib/librte_gro/Makefile
> @@ -43,6 +43,7 @@ LIBABIVER := 1
>   
>   # source files
>   SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
> +SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro_tcp.c
>   
>   # install this header file
>   SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
> diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
> index ebc545f..ae800f9 100644
> --- a/lib/librte_gro/rte_gro.c
> +++ b/lib/librte_gro/rte_gro.c
> @@ -32,11 +32,15 @@
>   
>   #include <rte_malloc.h>
>   #include <rte_mbuf.h>
> +#include <rte_ethdev.h>
>   
>   #include "rte_gro.h"
> +#include "rte_gro_tcp.h"
>   
> -static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB];
> -static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NB];
> +static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB] = {
> +	gro_tcp_tbl_create, NULL};
> +static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NB] = {
> +	gro_tcp_tbl_destroy, NULL};
>   
>   struct rte_gro_tbl *rte_gro_tbl_create(uint16_t socket_id,
>   		uint16_t max_flow_num,
> @@ -94,32 +98,124 @@ void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl)
>   }
>   
>   uint16_t
> -rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
> +rte_gro_reassemble_burst(struct rte_mbuf **pkts,
>   		const uint16_t nb_pkts,
> -		const struct rte_gro_param param __rte_unused)
> +		const struct rte_gro_param param)
>   {
> -	return nb_pkts;
> +	uint16_t i;
> +	uint16_t nb_after_gro = nb_pkts;
> +	uint32_t item_num = nb_pkts <
> +		param.max_flow_num * param.max_item_per_flow ?
> +		nb_pkts :
> +		param.max_flow_num * param.max_item_per_flow;
> +
> +	/* allocate a reassembly table for TCP/IPv4 GRO */
> +	uint32_t tcp_item_num = item_num <= GRO_TCP_TBL_MAX_ITEM_NUM ?
> +		item_num : GRO_TCP_TBL_MAX_ITEM_NUM;

This is a bad check, as GRO_TCP_TBL_MAX_ITEM_NUM is defined as
(UINT32_MAX - 1), and we cannot allocate such a big array on the stack
(at roughly 24 bytes per item, that is on the order of 100 GB for the
item array alone).

What's more, I still don't think we should put any TCP-specific code
here. As we discussed offline, the reason you did this is to make the
allocation as fast as possible. I suggest defining two macros,
GRO_TCP_TBL_MAX_FLOWS and GRO_TCP_TBL_MAX_ITEMS_PER_FLOW, and allocating
the memory when the library is loaded (sketched below). This can even
save users from assigning the rte_gro_param. If there are more flows
than GRO_TCP_TBL_MAX_FLOWS, we can just stop adding new flows.
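For illustration, the suggested shape might look like this (the macro names
are the ones proposed above; the numeric bounds are placeholders, not values
from this thread):

	#define GRO_TCP_TBL_MAX_FLOWS          1024	/* placeholder bound */
	#define GRO_TCP_TBL_MAX_ITEMS_PER_FLOW 32	/* placeholder bound */
	#define GRO_TCP_TBL_MAX_ITEMS \
		(GRO_TCP_TBL_MAX_FLOWS * GRO_TCP_TBL_MAX_ITEMS_PER_FLOW)

	/* allocated once, when the library is loaded, instead of as a VLA
	 * on the stack of rte_gro_reassemble_burst()
	 */
	static struct gro_tcp_key  tcp_keys[GRO_TCP_TBL_MAX_FLOWS];
	static struct gro_tcp_item tcp_items[GRO_TCP_TBL_MAX_ITEMS];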

> +	struct gro_tcp_tbl tcp_tbl;
> +	struct gro_tcp_key tcp_keys[tcp_item_num];
> +	struct gro_tcp_item tcp_items[tcp_item_num];
> +
> +	struct rte_mbuf *unprocess_pkts[nb_pkts];
> +	uint16_t unprocess_num = 0;
> +	int32_t ret;
> +
> +	memset(tcp_keys, 0, sizeof(struct gro_tcp_key) *
> +			tcp_item_num);
> +	memset(tcp_items, 0, sizeof(struct gro_tcp_item) *
> +			tcp_item_num);
> +	tcp_tbl.keys = tcp_keys;
> +	tcp_tbl.items = tcp_items;
> +	tcp_tbl.key_num = 0;
> +	tcp_tbl.item_num = 0;
> +	tcp_tbl.max_key_num = tcp_item_num;
> +	tcp_tbl.max_item_num = tcp_item_num;
> +
> +	for (i = 0; i < nb_pkts; i++) {
> +		if (RTE_ETH_IS_IPV4_HDR(pkts[i]->packet_type)) {
> +			if ((pkts[i]->packet_type & RTE_PTYPE_L4_TCP) &&
> +				(param.desired_gro_types &
> +					 GRO_TCP_IPV4)) {
> +				ret = gro_tcp4_reassemble(pkts[i],
> +						&tcp_tbl,
> +						param.max_packet_size);
> +				/* merge successfully */
> +				if (ret > 0)
> +					nb_after_gro--;
> +				else if (ret < 0)
> +					unprocess_pkts[unprocess_num++] =
> +						pkts[i];
> +			} else
> +				unprocess_pkts[unprocess_num++] =
> +					pkts[i];
> +		} else
> +			unprocess_pkts[unprocess_num++] =
> +				pkts[i];
> +	}
> +
> +	/* re-arrange GROed packets */
> +	if (nb_after_gro < nb_pkts) {
> +		if (param.desired_gro_types & GRO_TCP_IPV4)
> +			i = gro_tcp_tbl_flush(&tcp_tbl, pkts, nb_pkts);
> +		if (unprocess_num > 0) {
> +			memcpy(&pkts[i], unprocess_pkts,
> +					sizeof(struct rte_mbuf *) *
> +					unprocess_num);
> +			i += unprocess_num;
> +		}
> +		if (nb_pkts > i)
> +			memset(&pkts[i], 0,
> +					sizeof(struct rte_mbuf *) *
> +					(nb_pkts - i));
> +	}
> +	return nb_after_gro;
>   }
>   
> -int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
> -		struct rte_gro_tbl *gro_tbl __rte_unused)
> +int rte_gro_reassemble(struct rte_mbuf *pkt,
> +		struct rte_gro_tbl *gro_tbl)
>   {
> +	if (unlikely(pkt == NULL))
> +		return -1;
> +
> +	if (RTE_ETH_IS_IPV4_HDR(pkt->packet_type)) {
> +		if ((pkt->packet_type & RTE_PTYPE_L4_TCP) &&
> +				(gro_tbl->desired_gro_types &
> +				 GRO_TCP_IPV4))
> +			return gro_tcp4_reassemble(pkt,
> +					gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
> +					gro_tbl->max_packet_size);
> +	}
> +
>   	return -1;
>   }
>   
> -uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> -		uint64_t desired_gro_types __rte_unused,
> -		struct rte_mbuf **out __rte_unused,
> -		const uint16_t max_nb_out __rte_unused)
> +uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl,
> +		uint64_t desired_gro_types,
> +		struct rte_mbuf **out,
> +		const uint16_t max_nb_out)
>   {
> +	desired_gro_types = desired_gro_types &
> +		gro_tbl->desired_gro_types;
> +	if (desired_gro_types & GRO_TCP_IPV4)
> +		return gro_tcp_tbl_flush(
> +				gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
> +				out,
> +				max_nb_out);
>   	return 0;
>   }
>   
>   uint16_t
> -rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> -		uint64_t desired_gro_types __rte_unused,
> -		struct rte_mbuf **out __rte_unused,
> -		const uint16_t max_nb_out __rte_unused)
> +rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl,
> +		uint64_t desired_gro_types,
> +		struct rte_mbuf **out,
> +		const uint16_t max_nb_out)
>   {
> +	desired_gro_types = desired_gro_types &
> +		gro_tbl->desired_gro_types;
> +	if (desired_gro_types & GRO_TCP_IPV4)
> +		return gro_tcp_tbl_timeout_flush(
> +				gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
> +				gro_tbl->max_timeout_cycles,
> +				out, max_nb_out);
>   	return 0;
>   }
> diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
> index 2c547fa..41cd51a 100644
> --- a/lib/librte_gro/rte_gro.h
> +++ b/lib/librte_gro/rte_gro.h
> @@ -35,7 +35,11 @@
>   
>   /* max number of supported GRO types */
>   #define GRO_TYPE_MAX_NB 64
> -#define GRO_TYPE_SUPPORT_NB 0	/**< current supported GRO num */
> +#define GRO_TYPE_SUPPORT_NB 1	/**< supported GRO types number */
> +
> +/* TCP/IPv4 GRO flag */
> +#define GRO_TCP_IPV4_INDEX 0
> +#define GRO_TCP_IPV4 (1ULL << GRO_TCP_IPV4_INDEX)
>   
>   /**
>    * GRO table, which is used to merge packets. It keeps many reassembly
> diff --git a/lib/librte_gro/rte_gro_tcp.c b/lib/librte_gro/rte_gro_tcp.c
> new file mode 100644
> index 0000000..cfcd89e
> --- /dev/null
> +++ b/lib/librte_gro/rte_gro_tcp.c
> @@ -0,0 +1,393 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#include <rte_malloc.h>
> +#include <rte_mbuf.h>
> +#include <rte_cycles.h>
> +
> +#include <rte_ethdev.h>
> +#include <rte_ip.h>
> +#include <rte_tcp.h>
> +
> +#include "rte_gro_tcp.h"
> +
> +void *gro_tcp_tbl_create(uint16_t socket_id,
> +		uint16_t max_flow_num,
> +		uint16_t max_item_per_flow)
> +{
> +	size_t size;
> +	uint32_t entries_num;
> +	struct gro_tcp_tbl *tbl;
> +
> +	entries_num = max_flow_num * max_item_per_flow;
> +	entries_num = entries_num > GRO_TCP_TBL_MAX_ITEM_NUM ?
> +		GRO_TCP_TBL_MAX_ITEM_NUM : entries_num;
> +
> +	if (entries_num == 0)
> +		return NULL;
> +
> +	tbl = (struct gro_tcp_tbl *)rte_zmalloc_socket(
> +			__func__,
> +			sizeof(struct gro_tcp_tbl),
> +			RTE_CACHE_LINE_SIZE,
> +			socket_id);
> +
> +	size = sizeof(struct gro_tcp_item) * entries_num;
> +	tbl->items = (struct gro_tcp_item *)rte_zmalloc_socket(
> +			__func__,
> +			size,
> +			RTE_CACHE_LINE_SIZE,
> +			socket_id);
> +	tbl->max_item_num = entries_num;
> +
> +	size = sizeof(struct gro_tcp_key) * entries_num;
> +	tbl->keys = (struct gro_tcp_key *)rte_zmalloc_socket(
> +			__func__,
> +			size, RTE_CACHE_LINE_SIZE,
> +			socket_id);
> +	tbl->max_key_num = entries_num;
> +	return tbl;
> +}
> +
> +void gro_tcp_tbl_destroy(void *tbl)
> +{
> +	struct gro_tcp_tbl *tcp_tbl = (struct gro_tcp_tbl *)tbl;
> +
> +	if (tcp_tbl) {
> +		if (tcp_tbl->items)
> +			rte_free(tcp_tbl->items);
> +		if (tcp_tbl->keys)
> +			rte_free(tcp_tbl->keys);
> +		rte_free(tcp_tbl);
> +	}
> +}
> +
> +/**
> + * merge two TCP/IPv4 packets without updating checksums.
> + */
> +static int
> +merge_two_tcp4_packets(struct rte_mbuf *pkt_src,
> +		struct rte_mbuf *pkt,
> +		uint32_t max_packet_size)
> +{
> +	struct ipv4_hdr *ipv4_hdr1, *ipv4_hdr2;
> +	struct tcp_hdr *tcp_hdr1;
> +	uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1;
> +	struct rte_mbuf *tail;
> +
> +	/* parse the given packet */
> +	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
> +				struct ether_hdr *) + 1);
> +	ipv4_ihl1 = IPv4_HDR_LEN(ipv4_hdr1);
> +	tcp_hdr1 = (struct tcp_hdr *)((char *)ipv4_hdr1 + ipv4_ihl1);
> +	tcp_hl1 = TCP_HDR_LEN(tcp_hdr1);
> +	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) - ipv4_ihl1
> +		- tcp_hl1;
> +
> +	/* parse the original packet */
> +	ipv4_hdr2 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt_src,
> +				struct ether_hdr *) + 1);
> +
> +	if (pkt_src->pkt_len + tcp_dl1 > max_packet_size)
> +		return -1;
> +
> +	/* remove the header of the incoming packet */
> +	rte_pktmbuf_adj(pkt, sizeof(struct ether_hdr) +
> +			ipv4_ihl1 + tcp_hl1);
> +
> +	/* chain the two packets together */
> +	tail = rte_pktmbuf_lastseg(pkt_src);
> +	tail->next = pkt;
> +
> +	/* update IP header */
> +	ipv4_hdr2->total_length = rte_cpu_to_be_16(
> +			rte_be_to_cpu_16(
> +				ipv4_hdr2->total_length)
> +			+ tcp_dl1);
> +
> +	/* update mbuf metadata for the merged packet */
> +	pkt_src->nb_segs++;
> +	pkt_src->pkt_len += pkt->pkt_len;
> +	return 1;
> +}
> +
> +static int
> +check_seq_option(struct rte_mbuf *pkt,
> +		struct tcp_hdr *tcp_hdr,
> +		uint16_t tcp_hl)
> +{
> +	struct ipv4_hdr *ipv4_hdr1;
> +	struct tcp_hdr *tcp_hdr1;
> +	uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1;
> +	uint32_t sent_seq1, sent_seq;
> +	int ret = -1;
> +
> +	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
> +				struct ether_hdr *) + 1);
> +	ipv4_ihl1 = IPv4_HDR_LEN(ipv4_hdr1);
> +	tcp_hdr1 = (struct tcp_hdr *)((char *)ipv4_hdr1 + ipv4_ihl1);
> +	tcp_hl1 = TCP_HDR_LEN(tcp_hdr1);
> +	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) - ipv4_ihl1
> +		- tcp_hl1;
> +	sent_seq1 = rte_be_to_cpu_32(tcp_hdr1->sent_seq) + tcp_dl1;
> +	sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq);
> +
> +	/* check if the two packets are neighbors */
> +	if ((sent_seq ^ sent_seq1) == 0) {
> +		/* merge only if the TCP option fields are equal */
> +		if ((tcp_hl1 == tcp_hl) &&
> +				((tcp_hl == sizeof(struct tcp_hdr)) ||
> +				 (memcmp(tcp_hdr1 + 1,
> +						 tcp_hdr + 1,
> +						 tcp_hl - sizeof
> +						 (struct tcp_hdr))
> +				  == 0)))
> +			ret = 1;
> +	}
> +	return ret;
> +}
> +
> +static uint32_t
> +find_an_empty_item(struct gro_tcp_tbl *tbl)
> +{
> +	uint32_t i;
> +
> +	for (i = 0; i < tbl->max_item_num; i++)
> +		if (tbl->items[i].is_valid == 0)
> +			return i;
> +	return INVALID_ARRAY_INDEX;
> +}
> +
> +static uint32_t
> +find_an_empty_key(struct gro_tcp_tbl *tbl)
> +{
> +	uint32_t i;
> +
> +	for (i = 0; i < tbl->max_key_num; i++)
> +		if (tbl->keys[i].is_valid == 0)
> +			return i;
> +	return INVALID_ARRAY_INDEX;
> +}
> +
> +int32_t
> +gro_tcp4_reassemble(struct rte_mbuf *pkt,
> +		struct gro_tcp_tbl *tbl,
> +		uint32_t max_packet_size)
> +{
> +	struct ether_hdr *eth_hdr;
> +	struct ipv4_hdr *ipv4_hdr;
> +	struct tcp_hdr *tcp_hdr;
> +	uint16_t ipv4_ihl, tcp_hl, tcp_dl;
> +
> +	struct tcp_key key;
> +	uint32_t cur_idx, prev_idx, item_idx;
> +	uint32_t i, key_idx;
> +
> +	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
> +	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
> +	ipv4_ihl = IPv4_HDR_LEN(ipv4_hdr);
> +
> +	/* check if the packet should be processed */
> +	if (ipv4_ihl < sizeof(struct ipv4_hdr))
> +		goto fail;
> +	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl);
> +	tcp_hl = TCP_HDR_LEN(tcp_hdr);
> +	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - ipv4_ihl
> +		- tcp_hl;
> +	if (tcp_dl == 0)
> +		goto fail;
> +
> +	/* find a key and traverse all packets in its item group */
> +	key.eth_saddr = eth_hdr->s_addr;
> +	key.eth_daddr = eth_hdr->d_addr;
> +	key.ip_src_addr[0] = rte_be_to_cpu_32(ipv4_hdr->src_addr);
> +	key.ip_dst_addr[0] = rte_be_to_cpu_32(ipv4_hdr->dst_addr);
> +	key.src_port = rte_be_to_cpu_16(tcp_hdr->src_port);
> +	key.dst_port = rte_be_to_cpu_16(tcp_hdr->dst_port);
> +	key.recv_ack = rte_be_to_cpu_32(tcp_hdr->recv_ack);
> +	key.tcp_flags = tcp_hdr->tcp_flags;
> +
> +	for (i = 0; i < tbl->max_key_num; i++) {
> +		if (tbl->keys[i].is_valid &&
> +				(memcmp(&(tbl->keys[i].key), &key,
> +						sizeof(struct tcp_key))
> +				 == 0)) {
> +			cur_idx = tbl->keys[i].start_index;
> +			prev_idx = cur_idx;
> +			while (cur_idx != INVALID_ARRAY_INDEX) {
> +				if (check_seq_option(tbl->items[cur_idx].pkt,
> +							tcp_hdr,
> +							tcp_hl) > 0) {
> +					if (merge_two_tcp4_packets(
> +								tbl->items[cur_idx].pkt,
> +								pkt,
> +								max_packet_size) > 0) {
> +						/* successfully merge two packets */
> +						tbl->items[cur_idx].is_groed = 1;
> +						return 1;
> +					}
> +					/**
> +					 * fail to merge two packets since
> +					 * it's beyond the max packet length.
> +					 * Insert it into the item group.
> +					 */
> +					goto insert_to_item_group;
> +				} else {
> +					prev_idx = cur_idx;
> +					cur_idx = tbl->items[cur_idx].next_pkt_idx;
> +				}
> +			}
> +			/**
> +			 * found a matching item group, but no packet to merge
> +			 * with; insert the packet into this item group.
> +			 */
> +insert_to_item_group:
> +			item_idx = find_an_empty_item(tbl);
> +			/* the item number is beyond the maximum value */
> +			if (item_idx == INVALID_ARRAY_INDEX)
> +				return -1;
> +			tbl->items[prev_idx].next_pkt_idx = item_idx;
> +			tbl->items[item_idx].pkt = pkt;
> +			tbl->items[item_idx].is_groed = 0;
> +			tbl->items[item_idx].next_pkt_idx = INVALID_ARRAY_INDEX;
> +			tbl->items[item_idx].is_valid = 1;
> +			tbl->items[item_idx].start_time = rte_rdtsc();
> +			tbl->item_num++;
> +			return 0;
> +		}
> +	}
> +
> +	/**
> +	 * merge failed, as the given packet has a
> +	 * new key, so insert a new key.
> +	 */
> +	item_idx = find_an_empty_item(tbl);
> +	key_idx = find_an_empty_key(tbl);
> +	/**
> +	 * if the key or item number is beyond the maximum
> +	 * value, the inputted packet won't be processed.
> +	 */
> +	if (item_idx == INVALID_ARRAY_INDEX ||
> +			key_idx == INVALID_ARRAY_INDEX)
> +		return -1;
> +	tbl->items[item_idx].pkt = pkt;
> +	tbl->items[item_idx].next_pkt_idx = INVALID_ARRAY_INDEX;
> +	tbl->items[item_idx].is_groed = 0;
> +	tbl->items[item_idx].is_valid = 1;
> +	tbl->items[item_idx].start_time = rte_rdtsc();
> +	tbl->item_num++;
> +
> +	memcpy(&(tbl->keys[key_idx].key),
> +			&key, sizeof(struct tcp_key));
> +	tbl->keys[key_idx].start_index = item_idx;
> +	tbl->keys[key_idx].is_valid = 1;
> +	tbl->key_num++;
> +
> +	return 0;
> +fail:
> +	return -1;
> +}
> +
> +uint16_t gro_tcp_tbl_flush(struct gro_tcp_tbl *tbl,
> +		struct rte_mbuf **out,
> +		const uint16_t nb_out)
> +{
> +	uint32_t i, num = 0;
> +
> +	if (nb_out < tbl->item_num)
> +		return 0;
> +
> +	for (i = 0; i < tbl->max_item_num; i++) {
> +		if (tbl->items[i].is_valid) {
> +			out[num++] = tbl->items[i].pkt;
> +			tbl->items[i].is_valid = 0;
> +			tbl->item_num--;
> +		}
> +	}
> +	memset(tbl->keys, 0, sizeof(struct gro_tcp_key) *
> +			tbl->max_key_num);
> +	tbl->key_num = 0;
> +
> +	return num;
> +}
> +
> +uint16_t
> +gro_tcp_tbl_timeout_flush(struct gro_tcp_tbl *tbl,
> +		uint64_t timeout_cycles,
> +		struct rte_mbuf **out,
> +		const uint16_t nb_out)
> +{
> +	uint16_t k;
> +	uint32_t i, j;
> +	uint64_t current_time;
> +
> +	if (nb_out == 0)
> +		return 0;
> +	k = 0;
> +	current_time = rte_rdtsc();
> +
> +	for (i = 0; i < tbl->max_key_num; i++) {
> +		if (tbl->keys[i].is_valid) {
> +			j = tbl->keys[i].start_index;
> +			while (j != INVALID_ARRAY_INDEX) {
> +				if (current_time - tbl->items[j].start_time <
> +						timeout_cycles) {
> +					/**
> +					 * items are chained in arrival order,
> +					 * so the remaining ones are newer and
> +					 * haven't timed out yet
> +					 */
> +					tbl->keys[i].start_index = j;
> +					break;
> +				}
> +				out[k++] = tbl->items[j].pkt;
> +				tbl->items[j].is_valid = 0;
> +				tbl->item_num--;
> +				j = tbl->items[j].next_pkt_idx;
> +
> +				if (k == nb_out &&
> +						j == INVALID_ARRAY_INDEX) {
> +					/* delete the key */
> +					tbl->keys[i].is_valid = 0;
> +					tbl->key_num--;
> +					goto end;
> +				} else if (k == nb_out &&
> +						j != INVALID_ARRAY_INDEX) {
> +					/* update the first item index */
> +					tbl->keys[i].start_index = j;
> +					goto end;
> +				}
> +			}
> +			if (j == INVALID_ARRAY_INDEX) {
> +				/* delete the key: all its packets are flushed */
> +				tbl->keys[i].is_valid = 0;
> +				tbl->key_num--;
> +			}
> +		}
> +		if (tbl->key_num == 0)
> +			goto end;
> +	}
> +end:
> +	return k;
> +}
> diff --git a/lib/librte_gro/rte_gro_tcp.h b/lib/librte_gro/rte_gro_tcp.h
> new file mode 100644
> index 0000000..4c4f9c7
> --- /dev/null
> +++ b/lib/librte_gro/rte_gro_tcp.h
> @@ -0,0 +1,188 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef _RTE_GRO_TCP_H_
> +#define _RTE_GRO_TCP_H_
> +
> +#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
> +#define TCP_HDR_LEN(tcph) \
> +	((tcph->data_off >> 4) * 4)
> +#define IPv4_HDR_LEN(iph) \
> +	((iph->version_ihl & 0x0f) * 4)
> +#else
> +#define TCP_DATAOFF_MASK 0x0f
> +#define TCP_HDR_LEN(tcph) \
> +	((tcph->data_off & TCP_DATAOFF_MASK) * 4)
> +#define IPv4_HDR_LEN(iph) \
> +	((iph->version_ihl >> 4) * 4)
> +#endif
> +
> +#define INVALID_ARRAY_INDEX 0xffffffffUL
> +#define GRO_TCP_TBL_MAX_ITEM_NUM (UINT32_MAX - 1)

Defining such a big number does not make any sense.

> +
> +/* criteria of mergeing packets */
> +struct tcp_key {
> +	struct ether_addr eth_saddr;
> +	struct ether_addr eth_daddr;
> +	uint32_t ip_src_addr[4];	/**< IPv4 uses the first 8B */
> +	uint32_t ip_dst_addr[4];
> +
> +	uint32_t recv_ack;	/**< acknowledgment sequence number. */
> +	uint16_t src_port;
> +	uint16_t dst_port;
> +	uint8_t tcp_flags;	/**< TCP flags. */
> +};
> +
> +struct gro_tcp_key {
> +	struct tcp_key key;
> +	uint32_t start_index;	/**< the first packet index of the flow */
> +	uint8_t is_valid;
> +};
> +
> +struct gro_tcp_item {
> +	struct rte_mbuf *pkt;	/**< packet address. */
> +	/* the time when the packet in added into the table */
> +	uint64_t start_time;
> +	uint32_t next_pkt_idx;	/**< next packet index. */
> +	/* flag to indicate if the packet is GROed */
> +	uint8_t is_groed;
> +	uint8_t is_valid;	/**< flag indicates if the item is valid */
> +};
> +
> +/**
> + * TCP reassembly table. Both TCP/IPv4 and TCP/IPv6 use the same table
> + * structure.
> + */
> +struct gro_tcp_tbl {
> +	struct gro_tcp_item *items;	/**< item array */
> +	struct gro_tcp_key *keys;	/**< key array */
> +	uint32_t item_num;	/**< current item number */
> +	uint32_t key_num;	/**< current key num */
> +	uint32_t max_item_num;	/**< item array size */
> +	uint32_t max_key_num;	/**< key array size */
> +};
> +
> +/**
> + * This function creates a TCP reassembly table.
> + *
> + * @param socket_id
> + *  socket index where the Ethernet port connects to.
> + * @param max_flow_num
> + *  the maximum number of flows in the TCP GRO table
> + * @param max_item_per_flow
> + *  the maximum packet number per flow.
> + * @return
> + *  if created successfully, return a pointer which points to the
> + *  created TCP GRO table. Otherwise, return NULL.
> + */
> +void *gro_tcp_tbl_create(uint16_t socket_id,
> +		uint16_t max_flow_num,
> +		uint16_t max_item_per_flow);
> +
> +/**
> + * This function destroys a TCP reassembly table.
> + * @param tbl
> + *  a pointer points to the TCP reassembly table.
> + */
> +void gro_tcp_tbl_destroy(void *tbl);
> +
> +/**
> + * This function searches for a packet in the TCP reassembly table to
> + * merge with the inputted one. To merge two packets is to chain them
> + * together and update packet headers. If the packet has no data
> + * (e.g. a SYN or SYN-ACK packet), this function returns immediately.
> + * Otherwise, the packet is either merged, or inserted into the table.
> + *
> + * This function assumes the inputted packet has correct IPv4 and
> + * TCP checksums. And if two packets are merged, it won't re-calculate
> + * IPv4 and TCP checksums. Besides, if the inputted packet is IP
> + * fragmented, it assumes the packet is complete (with TCP header).
> + *
> + * @param pkt
> + *  packet to reassemble.
> + * @param tbl
> + *  a pointer that points to a TCP reassembly table.
> + * @param max_packet_size
> + *  max packet length after merged
> + * @return
> + *  if the packet doesn't have data, return a negative value. If the
> + *  packet is merged successfully, return a positive value. If the
> + *  packet is inserted into the table, return 0.
> + */
> +int32_t
> +gro_tcp4_reassemble(struct rte_mbuf *pkt,
> +		struct gro_tcp_tbl *tbl,
> +		uint32_t max_packet_size);
> +
> +/**
> + * This function flushes all packets in a TCP reassembly table to
> + * applications, and without updating checksums for merged packets.
> + * If the array which is used to keep flushed packets is not large
> + * enough, an error occurs and this function returns immediately.
> + *
> + * @param tbl
> + *  a pointer that points to a TCP GRO table.
> + * @param out
> + *  pointer array which is used to keep flushed packets. Applications
> + *  should guarantee it's large enough to hold all packets in the table.
> + * @param nb_out
> + *  the element number of out.
> + * @return
> + *  the number of flushed packets. If out is not large enough to hold
> + *  all packets in the table, return 0.
> + */
> +uint16_t
> +gro_tcp_tbl_flush(struct gro_tcp_tbl *tbl,
> +		struct rte_mbuf **out,
> +		const uint16_t nb_out);
> +
> +/**
> + * This function flushes timeout packets in a TCP reassembly table to
> + * applications, and without updating checksums for merged packets.
> + *
> + * @param tbl
> + *  a pointer that points to a TCP GRO table.
> + * @param timeout_cycles
> + *  the maximum time that packets can stay in the table.
> + * @param out
> + *  pointer array which is used to keep flushed packets.
> + * @param nb_out
> + *  the element number of out.
> + * @return
> + *  the number of packets that are returned.
> + */
> +uint16_t
> +gro_tcp_tbl_timeout_flush(struct gro_tcp_tbl *tbl,
> +		uint64_t timeout_cycles,
> +		struct rte_mbuf **out,
> +		const uint16_t nb_out);
> +#endif

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v6 1/3] lib: add Generic Receive Offload API framework
  2017-06-23 14:43           ` [PATCH v6 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
@ 2017-06-25 16:53             ` Tan, Jianfeng
  0 siblings, 0 replies; 141+ messages in thread
From: Tan, Jianfeng @ 2017-06-25 16:53 UTC (permalink / raw)
  To: Jiayu Hu, dev
  Cc: konstantin.ananyev, stephen, yliu, keith.wiles, tiwei.bie, lei.a.yao

Hi Jiayu,


On 6/23/2017 10:43 PM, Jiayu Hu wrote:
> Generic Receive Offload (GRO) is a widely used SW-based offloading
> technique to reduce per-packet processing overhead. It gains
> performance by reassembling small packets into large ones. This
> patchset is to support GRO in DPDK. To support GRO, this patch
> implements a GRO API framework.
>
> To enable more flexibility to applications, DPDK GRO is implemented as
> a user library. Applications explicitly use the GRO library to merge
> small packets into large ones. DPDK GRO provides two reassembly modes.
> One is called lightweigth mode, the other is called heavyweight mode.
> If applications want merge packets in a simple way, they can use
> lightweigth mode. If applications need more fine-grained controls,
> they can choose heavyweigth mode.
>
> rte_gro_reassemble_burst is the main reassembly API which is used in
> lightweight mode and processes N packets at a time. For applications,
> performing GRO in lightweight mode is simple. They just need to invoke
> rte_gro_reassemble_burst. Applications can get GROed packets as soon as
> rte_gro_reassemble_burst returns.
>
> rte_gro_reassemble is the main reassembly API which is used in
> heavyweight mode and processes one packet at a time. For applications,
> performing GRO in heavyweight mode is relatively complicated. Before
> performing GRO, applications need to create a GRO table by
> rte_gro_tbl_create. Then they can use rte_gro_reassemble to process
> packets one by one. The processed packets are in the GRO table. If
> applications want to get them, they need to manually flush them
> via the flush APIs (see the usage sketch below).
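A minimal usage sketch of the two modes, pieced together from the prototypes
in this patchset (GRO_TCP_IPV4 is added by patch 2/3; the burst size, table
dimensions and the rte_eth_rx_burst() glue are illustrative assumptions, and
error handling is omitted):

	#include <rte_ethdev.h>
	#include "rte_gro.h"

	#define BURST 32

	/* lightweight mode: one call per RX burst; merged packets are
	 * available in pkts[] as soon as the call returns
	 */
	static void rx_gro_lightweight(uint8_t port, uint16_t queue)
	{
		struct rte_mbuf *pkts[BURST];
		struct rte_gro_param param = {
			.desired_gro_types = GRO_TCP_IPV4,
			.max_packet_size = 65535,
			.max_flow_num = 4,
			.max_item_per_flow = 8,
		};
		uint16_t nb = rte_eth_rx_burst(port, queue, pkts, BURST);

		nb = rte_gro_reassemble_burst(pkts, nb, param);
		/* forward the first nb entries of pkts[] ... */
	}

	/* heavyweight mode: insert packets one by one, then flush the GRO
	 * table explicitly; tbl comes from an earlier rte_gro_tbl_create()
	 */
	static void rx_gro_heavyweight(uint8_t port, uint16_t queue,
			struct rte_gro_tbl *tbl)
	{
		struct rte_mbuf *pkts[BURST], *out[BURST];
		uint16_t i, nb = rte_eth_rx_burst(port, queue, pkts, BURST);

		for (i = 0; i < nb; i++) {
			if (rte_gro_reassemble(pkts[i], tbl) < 0) {
				/* no data or unsupported GRO type:
				 * forward pkts[i] as-is
				 */
			}
		}
		/* drain packets that stayed in the table beyond the timeout */
		nb = rte_gro_timeout_flush(tbl, GRO_TCP_IPV4, out, BURST);
		/* forward the first nb entries of out[] ... */
	}
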
>
> Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
> ---
>   config/common_base                 |   5 +
>   lib/Makefile                       |   2 +
>   lib/librte_gro/Makefile            |  50 ++++++++++
>   lib/librte_gro/rte_gro.c           | 125 ++++++++++++++++++++++++
>   lib/librte_gro/rte_gro.h           | 191 +++++++++++++++++++++++++++++++++++++
>   lib/librte_gro/rte_gro_version.map |  12 +++
>   mk/rte.app.mk                      |   1 +
>   7 files changed, 386 insertions(+)
>   create mode 100644 lib/librte_gro/Makefile
>   create mode 100644 lib/librte_gro/rte_gro.c
>   create mode 100644 lib/librte_gro/rte_gro.h
>   create mode 100644 lib/librte_gro/rte_gro_version.map
>
> diff --git a/config/common_base b/config/common_base
> index f6aafd1..167f5ef 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -712,6 +712,11 @@ CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
>   CONFIG_RTE_LIBRTE_PMD_VHOST=n
>   
>   #
> +# Compile GRO library
> +#
> +CONFIG_RTE_LIBRTE_GRO=y
> +
> +#
>   #Compile Xen domain0 support
>   #
>   CONFIG_RTE_LIBRTE_XEN_DOM0=n
> diff --git a/lib/Makefile b/lib/Makefile
> index 07e1fd0..ac1c2f6 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -106,6 +106,8 @@ DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
>   DEPDIRS-librte_reorder := librte_eal librte_mempool librte_mbuf
>   DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
>   DEPDIRS-librte_pdump := librte_eal librte_mempool librte_mbuf librte_ether
> +DIRS-$(CONFIG_RTE_LIBRTE_GRO) += librte_gro
> +DEPDIRS-librte_gro := librte_eal librte_mbuf librte_ether librte_net
>   
>   ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
>   DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
> diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
> new file mode 100644
> index 0000000..7e0f128
> --- /dev/null
> +++ b/lib/librte_gro/Makefile
> @@ -0,0 +1,50 @@
> +#   BSD LICENSE
> +#
> +#   Copyright(c) 2017 Intel Corporation. All rights reserved.
> +#   All rights reserved.
> +#
> +#   Redistribution and use in source and binary forms, with or without
> +#   modification, are permitted provided that the following conditions
> +#   are met:
> +#
> +#     * Redistributions of source code must retain the above copyright
> +#       notice, this list of conditions and the following disclaimer.
> +#     * Redistributions in binary form must reproduce the above copyright
> +#       notice, this list of conditions and the following disclaimer in
> +#       the documentation and/or other materials provided with the
> +#       distribution.
> +#     * Neither the name of Intel Corporation nor the names of its
> +#       contributors may be used to endorse or promote products derived
> +#       from this software without specific prior written permission.
> +#
> +#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> +#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> +#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> +#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> +#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> +#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> +#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> +#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> +#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> +#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> +#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> +
> +include $(RTE_SDK)/mk/rte.vars.mk
> +
> +# library name
> +LIB = librte_gro.a
> +
> +CFLAGS += -O3
> +CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
> +
> +EXPORT_MAP := rte_gro_version.map
> +
> +LIBABIVER := 1
> +
> +# source files
> +SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
> +
> +# install this header file
> +SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
> +
> +include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
> new file mode 100644
> index 0000000..ebc545f
> --- /dev/null
> +++ b/lib/librte_gro/rte_gro.c
> @@ -0,0 +1,125 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#include <rte_malloc.h>
> +#include <rte_mbuf.h>
> +
> +#include "rte_gro.h"
> +
> +static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB];
> +static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NB];
> +
> +struct rte_gro_tbl *rte_gro_tbl_create(uint16_t socket_id,
> +		uint16_t max_flow_num,
> +		uint16_t max_item_per_flow,
> +		uint32_t max_packet_size,
> +		uint64_t max_timeout_cycles,
> +		uint64_t desired_gro_types)
> +{
> +	gro_tbl_create_fn create_tbl_fn;
> +	struct rte_gro_tbl *gro_tbl;
> +	uint64_t gro_type_flag = 0;
> +	uint8_t i;
> +
> +	gro_tbl = rte_zmalloc_socket(__func__,
> +			sizeof(struct rte_gro_tbl),
> +			RTE_CACHE_LINE_SIZE,
> +			socket_id);
> +	gro_tbl->max_packet_size = max_packet_size;
> +	gro_tbl->max_timeout_cycles = max_timeout_cycles;
> +	gro_tbl->desired_gro_types = desired_gro_types;
> +
> +	for (i = 0; i < GRO_TYPE_MAX_NB; i++) {
> +		gro_type_flag = 1 << i;
> +		if (desired_gro_types & gro_type_flag) {
> +			create_tbl_fn = tbl_create_functions[i];
> +			if (create_tbl_fn)
> +				gro_tbl->tbls[i] = create_tbl_fn(socket_id,
> +						max_flow_num,
> +						max_item_per_flow);
> +			else
> +				gro_tbl->tbls[i] = NULL;
> +		}
> +	}
> +	return gro_tbl;
> +}
> +
> +void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl)
> +{
> +	gro_tbl_destroy_fn destroy_tbl_fn;
> +	uint64_t gro_type_flag;
> +	uint8_t i;
> +
> +	if (gro_tbl == NULL)
> +		return;
> +	for (i = 0; i < GRO_TYPE_MAX_NB; i++) {
> +		gro_type_flag = 1 << i;
> +		if (gro_tbl->desired_gro_types & gro_type_flag) {
> +			destroy_tbl_fn = tbl_destroy_functions[i];
> +			if (destroy_tbl_fn)
> +				destroy_tbl_fn(gro_tbl->tbls[i]);
> +			gro_tbl->tbls[i] = NULL;
> +		}
> +	}
> +	rte_free(gro_tbl);
> +}
> +
> +uint16_t
> +rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
> +		const uint16_t nb_pkts,
> +		const struct rte_gro_param param __rte_unused)
> +{
> +	return nb_pkts;
> +}
> +
> +int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
> +		struct rte_gro_tbl *gro_tbl __rte_unused)
> +{
> +	return -1;
> +}
> +
> +uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> +		uint64_t desired_gro_types __rte_unused,
> +		struct rte_mbuf **out __rte_unused,
> +		const uint16_t max_nb_out __rte_unused)
> +{
> +	return 0;
> +}
> +
> +uint16_t
> +rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> +		uint64_t desired_gro_types __rte_unused,
> +		struct rte_mbuf **out __rte_unused,
> +		const uint16_t max_nb_out __rte_unused)
> +{
> +	return 0;
> +}
> diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
> new file mode 100644
> index 0000000..2c547fa
> --- /dev/null
> +++ b/lib/librte_gro/rte_gro.h
> @@ -0,0 +1,191 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef _RTE_GRO_H_
> +#define _RTE_GRO_H_

Below code snippet is missing:
     #ifdef __cplusplus
     extern "C" {
     #endif

> +
> +/* max number of supported GRO types */
> +#define GRO_TYPE_MAX_NB 64
> +#define GRO_TYPE_SUPPORT_NB 0	/**< current supported GRO num */

I prefer _NUM over _NB. Not a strong objection.

> +
> +/**
> + * GRO table, which is used to merge packets. It keeps many reassembly
> + * tables of desired GRO types. Applications need to create GRO tables
> + * before using rte_gro_reassemble to perform GRO.
> + */
> +struct rte_gro_tbl {
> +	uint64_t desired_gro_types;	/**< GRO types to perform */
> +	/* max TTL measured in nanosecond */
> +	uint64_t max_timeout_cycles;
> +	/* max length of merged packet measured in byte */
> +	uint32_t max_packet_size;
> +	/* reassembly tables of desired GRO types */
> +	void *tbls[GRO_TYPE_MAX_NB];
> +};
> +
> +struct rte_gro_param {
> +	uint64_t desired_gro_types;	/**< desired GRO types */
> +	uint32_t max_packet_size;	/**< max length of merged packets */
> +	uint16_t max_flow_num;	/**< max flow number */
> +	uint16_t max_item_per_flow;	/**< max packet number per flow */
> +};
> +
> +typedef void *(*gro_tbl_create_fn)(uint16_t socket_id,
> +		uint16_t max_flow_num,
> +		uint16_t max_item_per_flow);
> +typedef void (*gro_tbl_destroy_fn)(void *tbl);
> +
> +/**
> + * This function creates a GRO table, which is used to merge packets.
> + *
> + * @param socket_id
> + *  socket index where the Ethernet port connects to.
> + * @param max_flow_num
> + *  max number of flows in the GRO table.
> + * @param max_item_per_flow
> + *  max packet number per flow. We use the value of (max_flow_num *
> + *  max_item_per_flow) to calculate table size.
> + * @param max_packet_size
> + *  max length of merged packets. Measured in byte.
> + * @param max_timeout_cycles
> + *  max TTL for a packet in the GRO table. It's measured in nanosecond.
> + * @param desired_gro_types
> + *  GRO types to perform.
> + * @return
> + *  if created successfully, return a pointer which points to the GRO
> + *  table. Otherwise, return NULL.
> + */
> +struct rte_gro_tbl *rte_gro_tbl_create(uint16_t socket_id,
> +		uint16_t max_flow_num,
> +		uint16_t max_item_per_flow,
> +		uint32_t max_packet_size,
> +		uint64_t max_timeout_cycles,
> +		uint64_t desired_gro_types);
> +/**
> + * This function destroys a GRO table.
> + */
> +void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl);
> +
> +/**
> + * This is one of the main reassembly APIs, which merges a burst of
> + * packets at a time. It assumes that all inputted packets have
> + * correct checksums. That is, applications should guarantee all
> + * inputted packets are correct. Besides, it doesn't re-calculate
> + * checksums for merged packets. If inputted packets are IP fragmented,
> + * this function assumes they are complete (i.e. with L4 header). After
> + * finishing processing, it returns all GROed packets to applications
> + * immediately.
> + *
> + * @param pkts
> + *  a pointer array which points to the packets to reassemble. Besides,
> + *  it keeps addresses of GROed packets.
> + * @param nb_pkts
> + *  the number of packets to reassemble.
> + * @param param
> + *  applications use it to tell rte_gro_reassemble_burst what rules
> + *  are demanded.
> + * @return
> + *  the number of packets after GROed.
> + */
> +uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts,
> +		const uint16_t nb_pkts,
> +		const struct rte_gro_param param);
> +
> +/**
> + * Reassembly function, which tries to merge the inputted packet with
> + * one packet in a given GRO table. This function assumes the inputted
> + * packet is with correct checksums. And it won't update checksums if
> + * two packets are merged. Besides, if the inputted packet is IP
> + * fragmented, this function assumes it's a complete packet (i.e. with
> + * L4 header).
> + *
> + * If the inputted packet doesn't have data or is of an unsupported GRO
> + * type, the function returns immediately. Otherwise, the inputted packet
> + * is either merged or inserted into the table. If applications want to
> + * get packets in the table, they need to call the flush APIs.
> + *
> + * @param pkt
> + *  packet to reassemble.
> + * @param gro_tbl
> + *  a pointer points to a GRO table.
> + * @return
> + *  if the packet is merged successfully, return a positive value. If it
> + *  fails to merge, return zero. If the packet doesn't have data, or its
> + *  GRO type is unsupported, return a negative value.
> + */
> +int rte_gro_reassemble(struct rte_mbuf *pkt,
> +		struct rte_gro_tbl *gro_tbl);
> +
> +/**
> + * This function flushes packets from reassembly tables of desired GRO
> + * types. It won't re-calculate checksums for merged packets in the
> + * tables. That is, the returned packets may be with wrong checksums.
> + *
> + * @param gro_tbl
> + *  a pointer points to a GRO table object.
> + * @param desired_gro_types
> + *  GRO types whose packets will be flushed.
> + * @param out
> + *  a pointer array that is used to keep flushed packets.
> + * @param nb_out
> + *  the size of out.
> + * @return
> + *  the number of flushed packets. If no packets are flushed, return 0.
> + */
> +uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl,
> +		uint64_t desired_gro_types,
> +		struct rte_mbuf **out,
> +		const uint16_t max_nb_out);
> +
> +/**
> + * This function flushes the timeout packets from reassembly tables of
> + * desired GRO types. It won't re-calculate checksums for merged packets
> + * in the tables. That is, the returned packets may be with wrong
> + * checksums.
> + *
> + * @param gro_tbl
> + *  a pointer points to a GRO table object.
> + * @param desired_gro_types
> + * rte_gro_timeout_flush only processes packets which belong to the
> + * GRO types specified by desired_gro_types.
> + * @param out
> + *  a pointer array that is used to keep flushed packets.
> + * @param nb_out
> + *  the size of out.
> + * @return
> + *  the number of flushed packets. If no packets are flushed, return 0.
> + */
> +uint16_t rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl,
> +		uint64_t desired_gro_types,
> +		struct rte_mbuf **out,
> +		const uint16_t max_nb_out);
> +#endif
> diff --git a/lib/librte_gro/rte_gro_version.map b/lib/librte_gro/rte_gro_version.map
> new file mode 100644
> index 0000000..827596b
> --- /dev/null
> +++ b/lib/librte_gro/rte_gro_version.map
> @@ -0,0 +1,12 @@
> +DPDK_17.08 {
> +	global:
> +
> +	rte_gro_tbl_create;
> +	rte_gro_tbl_destroy;
> +	rte_gro_reassemble_burst;
> +	rte_gro_reassemble;
> +	rte_gro_flush;
> +	rte_gro_timeout_flush;
> +
> +	local: *;
> +};
> diff --git a/mk/rte.app.mk b/mk/rte.app.mk
> index bcaf1b3..fc3776d 100644
> --- a/mk/rte.app.mk
> +++ b/mk/rte.app.mk
> @@ -98,6 +98,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_RING)           += -lrte_ring
>   _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
>   _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
>   _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER)        += -lrte_reorder
> +_LDLIBS-$(CONFIG_RTE_LIBRTE_GRO)        	+= -lrte_gro
>   
>   ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
>   _LDLIBS-$(CONFIG_RTE_LIBRTE_KNI)            += -lrte_kni

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v6 0/3] Support TCP/IPv4 GRO in DPDK
  2017-06-25 16:03           ` [PATCH v6 0/3] Support TCP/IPv4 GRO in DPDK Tan, Jianfeng
@ 2017-06-26  1:35             ` Jiayu Hu
  0 siblings, 0 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-06-26  1:35 UTC (permalink / raw)
  To: Tan, Jianfeng
  Cc: dev, konstantin.ananyev, stephen, yliu, keith.wiles, tiwei.bie,
	lei.a.yao

Hi Jianfeng,

On Mon, Jun 26, 2017 at 12:03:33AM +0800, Tan, Jianfeng wrote:
> 
> 
> On 6/23/2017 10:43 PM, Jiayu Hu wrote:
> > Generic Receive Offload (GRO) is a widely used SW-based offloading
> > technique to reduce per-packet processing overhead. It gains performance
> > by reassembling small packets into large ones. Therefore, we propose to
> > support GRO in DPDK.
> > 
> > To enable more flexibility to applications, DPDK GRO is implemented as
> > a user library. Applications explicitly use the GRO library to merge
> > small packets into large ones. DPDK GRO provides two reassembly modes:
> > lightweight mode and heavyweight mode. If applications want to merge
> > packets in a simple way, they can select the lightweight mode API. If
> > applications need more fine-grained control, they can select the
> > heavyweight mode API.
> > 
> > This patchset is to support TCP/IPv4 GRO in DPDK. The first patch is to
> > provide a GRO API framework. The second patch is to support TCP/IPv4 GRO.
> > The last patch is to enable TCP/IPv4 GRO in testpmd.
> > 
> > We perform many iperf tests to see the performance gains from DPDK GRO.
> > 
> > The test environment is:
> > a. two 25Gbps physical ports (p0 and p1) are linked together. Assign p0
> > 	to one networking namespace and assign p1 to DPDK;
> > b. enable TSO for p0. Run iperf client on p0;
> > c. launch testpmd with p1 and a vhost-user port, and run it in csum
> > 	forwarding mode. Select TCP HW checksum calculation for the
> > 	vhost-user port in csum forwarding engine. And for better
> > 	performance, we select IPv4 and TCP HW checksum calculation for p1
> > 	too;
> > d. launch a VM with one CPU core and a virtio-net port. The VM OS is
> > 	Ubuntu 16.04, whose virtio-net driver supports GRO. Enable RX csum
> > 	offloading and mrg_rxbuf for the VM. The iperf server runs in the VM;
> > e. to run iperf tests, we need to stop the csum forwarding engine
> > 	from compulsorily changing packet MAC addresses. So in our tests,
> > 	we comment those lines out (line 701 ~ line 704 in csumonly.c).
> > 
> > In each test, we run iperf with the following three configurations:
> > 	- single flow and single TCP stream
> > 	- multiple flows and single TCP stream
> > 	- single flow and parallel TCP streams
> 
> To  me, flow == TCP stream; so could you explain what does flow mean?

Sorry, I used inappropriate terms. 'flow' means a TCP connection here, and
'multiple TCP streams' means parallel iperf-client threads.
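
For concreteness (the exact commands are not given in the thread), 'parallel
TCP streams' maps naturally onto iperf's -P option, e.g.:

	iperf -c <server_ip> -t 60 -P 4

while 'multiple flows' would be several concurrent iperf client invocations,
i.e. several distinct TCP connections.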

Thanks,
Jiayu

> 
> > 
> > We run the above iperf tests on three scenarios:
> > 	s1: disabling kernel GRO and enabling DPDK GRO
> > 	s2: disabling kernel GRO and disabling DPDK GRO
> > 	s3: enabling kernel GRO and disabling DPDK GRO
> > Comparing the throughput of s1 with s2, we can see the performance gains
> > from DPDK GRO. Comparing the throughput of s1 and s3, we can compare DPDK
> > GRO performance with kernel GRO performance.
> > 
> > Test results:
> > 	- DPDK GRO throughput is almost 2 times the throughput with
> > 		neither DPDK GRO nor kernel GRO;
> > 	- DPDK GRO throughput is almost 1.2 times the throughput of
> > 		kernel GRO.
> > 
> > Change log
> > ==========
> > v6:
> > - avoid checksum validation and calculation
> > - enable to process IP fragmented packets
> > - add a command in testpmd
> > - update documents
> > - modify rte_gro_timeout_flush and rte_gro_reassemble_burst
> > - rename variables
> > v5:
> > - fix some bugs
> > - fix coding style issues
> > v4:
> > - implement DPDK GRO as an application-used library
> > - introduce lightweight and heavyweight working modes to enable
> > 	fine-grained controls to applications
> > - replace cuckoo hash tables with simpler table structure
> > v3:
> > - fix compilation issues.
> > v2:
> > - provide generic reassembly function;
> > - implement GRO as a device ability:
> > add APIs for devices to support GRO;
> > add APIs for applications to enable/disable GRO;
> > - update testpmd example.
> > 
> > Jiayu Hu (3):
> >    lib: add Generic Receive Offload API framework
> >    lib/gro: add TCP/IPv4 GRO support
> >    app/testpmd: enable TCP/IPv4 GRO
> > 
> >   app/test-pmd/cmdline.c                      | 125 +++++++++
> >   app/test-pmd/config.c                       |  37 +++
> >   app/test-pmd/csumonly.c                     |   5 +
> >   app/test-pmd/testpmd.c                      |   3 +
> >   app/test-pmd/testpmd.h                      |  11 +
> >   config/common_base                          |   5 +
> >   doc/guides/rel_notes/release_17_08.rst      |   7 +
> >   doc/guides/testpmd_app_ug/testpmd_funcs.rst |  34 +++
> >   lib/Makefile                                |   2 +
> >   lib/librte_gro/Makefile                     |  51 ++++
> >   lib/librte_gro/rte_gro.c                    | 221 ++++++++++++++++
> >   lib/librte_gro/rte_gro.h                    | 195 ++++++++++++++
> >   lib/librte_gro/rte_gro_tcp.c                | 393 ++++++++++++++++++++++++++++
> >   lib/librte_gro/rte_gro_tcp.h                | 188 +++++++++++++
> >   lib/librte_gro/rte_gro_version.map          |  12 +
> >   mk/rte.app.mk                               |   1 +
> >   16 files changed, 1290 insertions(+)
> >   create mode 100644 lib/librte_gro/Makefile
> >   create mode 100644 lib/librte_gro/rte_gro.c
> >   create mode 100644 lib/librte_gro/rte_gro.h
> >   create mode 100644 lib/librte_gro/rte_gro_tcp.c
> >   create mode 100644 lib/librte_gro/rte_gro_tcp.h
> >   create mode 100644 lib/librte_gro/rte_gro_version.map
> > 

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v6 2/3] lib/gro: add TCP/IPv4 GRO support
  2017-06-25 16:53             ` Tan, Jianfeng
@ 2017-06-26  1:58               ` Jiayu Hu
  0 siblings, 0 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-06-26  1:58 UTC (permalink / raw)
  To: Tan, Jianfeng
  Cc: dev, konstantin.ananyev, stephen, yliu, keith.wiles, tiwei.bie,
	lei.a.yao

Hi Jianfeng,

On Mon, Jun 26, 2017 at 12:53:31AM +0800, Tan, Jianfeng wrote:
> Hi Jiayu,
> 
> 
> On 6/23/2017 10:43 PM, Jiayu Hu wrote:
> > In this patch, we introduce five APIs to support TCP/IPv4 GRO.
> > - gro_tcp_tbl_create: create a TCP reassembly table, which is used to
> >      merge packets.
> > - gro_tcp_tbl_destroy: free memory space of a TCP reassembly table.
> > - gro_tcp_tbl_flush: flush all packets from a TCP reassembly table.
> > - gro_tcp_tbl_timeout_flush: flush timeout packets from a TCP
> >      reassembly table.
> > - gro_tcp4_reassemble: reassemble an inputted TCP/IPv4 packet.
> > 
> > TCP/IPv4 GRO API assumes all inputted packets are with correct IPv4
> > and TCP checksums. And TCP/IPv4 GRO API doesn't update IPv4 and TCP
> > checksums for merged packets. If inputted packets are IP fragmented,
> > TCP/IPv4 GRO API assumes they are complete packets (i.e. with L4
> > headers).
> > 
> > In TCP GRO, we use a table structure, called TCP reassembly table, to
> > reassemble packets. Both TCP/IPv4 and TCP/IPv6 GRO use the same table
> > structure. A TCP reassembly table includes a key array and a item array,
> > where the key array keeps the criteria to merge packets and the item
> > array keeps packet information.
> > 
> > One key in the key array points to an item group, which consists of
> > packets which have the same criteria value. If two packets are able to
> > merge, they must be in the same item group. Each key in the key array
> > includes two parts:
> > - criteria: the criteria of merging packets. If two packets can be
> >      merged, they must have the same criteria value.
> > - start_index: the index of the first incoming packet of the item group.
> > 
> > Each element in the item array keeps the information of one packet. It
> > mainly includes two parts:
> > - pkt: packet address
> > - next_pkt_index: the index of the next packet in the same item group.
> >      All packets in the same item group are chained by next_pkt_index.
> >      With next_pkt_index, we can locate all packets in the same item
> >      group one by one.
> > 
> > Processing an incoming packet needs three steps:
> > a. check if the packet should be processed. Packets with the following
> >      properties won't be processed:
> > 	- packets without data (e.g. SYN, SYN-ACK)
> > b. traverse the key array to find a key which has the same criteria
> >     value as the incoming packet. If found, go to step c. Otherwise,
> >      insert a new key and insert the packet into the item array.
> > c. locate the first packet in the item group via the start_index in the
> >      key. Then traverse all packets in the item group via next_pkt_index.
> >     If we find one packet that can merge with the incoming one, merge them
> >     together. If we can't, insert the packet into this item group.
> > 
> > Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
> > ---
> >   doc/guides/rel_notes/release_17_08.rst |   7 +
> >   lib/librte_gro/Makefile                |   1 +
> >   lib/librte_gro/rte_gro.c               | 126 +++++++++--
> >   lib/librte_gro/rte_gro.h               |   6 +-
> >   lib/librte_gro/rte_gro_tcp.c           | 393 +++++++++++++++++++++++++++++++++
> >   lib/librte_gro/rte_gro_tcp.h           | 188 ++++++++++++++++
> >   6 files changed, 705 insertions(+), 16 deletions(-)
> >   create mode 100644 lib/librte_gro/rte_gro_tcp.c
> >   create mode 100644 lib/librte_gro/rte_gro_tcp.h
> > 
> > diff --git a/doc/guides/rel_notes/release_17_08.rst b/doc/guides/rel_notes/release_17_08.rst
> > index 842f46f..f067247 100644
> > --- a/doc/guides/rel_notes/release_17_08.rst
> > +++ b/doc/guides/rel_notes/release_17_08.rst
> > @@ -75,6 +75,13 @@ New Features
> >     Added support for firmwares with multiple Ethernet ports per physical port.
> > +* **Add Generic Receive Offload API support.**
> > +
> > +  Generic Receive Offload (GRO) API supports to reassemble TCP/IPv4
> > +  packets. GRO API assumes all inputted packets are with correct
> > +  checksums. GRO API doesn't update checksums for merged packets. If
> > +  inputted packets are IP fragmented, GRO API assumes they are complete
> > +  packets (i.e. with L4 headers).
> >   Resolved Issues
> >   ---------------
> > diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
> > index 7e0f128..e89344d 100644
> > --- a/lib/librte_gro/Makefile
> > +++ b/lib/librte_gro/Makefile
> > @@ -43,6 +43,7 @@ LIBABIVER := 1
> >   # source files
> >   SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
> > +SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro_tcp.c
> >   # install this header file
> >   SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
> > diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
> > index ebc545f..ae800f9 100644
> > --- a/lib/librte_gro/rte_gro.c
> > +++ b/lib/librte_gro/rte_gro.c
> > @@ -32,11 +32,15 @@
> >   #include <rte_malloc.h>
> >   #include <rte_mbuf.h>
> > +#include <rte_ethdev.h>
> >   #include "rte_gro.h"
> > +#include "rte_gro_tcp.h"
> > -static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB];
> > -static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NB];
> > +static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB] = {
> > +	gro_tcp_tbl_create, NULL};
> > +static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NB] = {
> > +	gro_tcp_tbl_destroy, NULL};
> >   struct rte_gro_tbl *rte_gro_tbl_create(uint16_t socket_id,
> >   		uint16_t max_flow_num,
> > @@ -94,32 +98,124 @@ void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl)
> >   }
> >   uint16_t
> > -rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
> > +rte_gro_reassemble_burst(struct rte_mbuf **pkts,
> >   		const uint16_t nb_pkts,
> > -		const struct rte_gro_param param __rte_unused)
> > +		const struct rte_gro_param param)
> >   {
> > -	return nb_pkts;
> > +	uint16_t i;
> > +	uint16_t nb_after_gro = nb_pkts;
> > +	uint32_t item_num = nb_pkts <
> > +		param.max_flow_num * param.max_item_per_flow ?
> > +		nb_pkts :
> > +		param.max_flow_num * param.max_item_per_flow;
> > +
> > +	/* allocate a reassembly table for TCP/IPv4 GRO */
> > +	uint32_t tcp_item_num = item_num <= GRO_TCP_TBL_MAX_ITEM_NUM ?
> > +		item_num : GRO_TCP_TBL_MAX_ITEM_NUM;
> 
> This is a bad check here as GRO_TCP_TBL_MAX_ITEM_NUM is defined as
> (UINT32_MAX - 1), and we cannot allocate such a big array on the stack.

Thanks a lot. I will define this macro with a smaller value. And to avoid
stack overflow, I will recommend that applications use the heavyweight mode
when the packet number is greater than that macro. The clamp sketched below
is the idea.
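
A minimal sketch of that clamp (the helper name is mine; v7 adds the
GRO_MAX_BURST_ITEM_NUM macro to rte_gro.h for this purpose):

	#include <rte_common.h>	/* RTE_MIN */
	#include "rte_gro.h"	/* struct rte_gro_param, GRO_MAX_BURST_ITEM_NUM */

	/* bound the arrays that rte_gro_reassemble_burst puts on the stack */
	static inline uint32_t
	burst_item_num(uint16_t nb_pkts, const struct rte_gro_param *param)
	{
		uint32_t item_num = RTE_MIN((uint32_t)nb_pkts,
				(uint32_t)param->max_flow_num *
				param->max_item_per_flow);

		return RTE_MIN(item_num, (uint32_t)GRO_MAX_BURST_ITEM_NUM);
	}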

> 
> What's more, I still don't think we should put any TCP-specific code here.
> As we discussed offline, the reason you did this is to make the allocation
> as early as possible. I suggest defining macros named GRO_TCP_TBL_MAX_FLOWS
> and GRO_TCP_TBL_MAX_ITEMS_PER_FLOW, and allocating memory when the
> library is loaded. This can even save the users from assigning the
> rte_gro_param. If there are more flows than GRO_TCP_TBL_MAX_FLOWS, we can
> just stop adding new flows.

As we discussed offline, if a process has many threads and they use the GRO
library at the same time, this global array will be accessed by them
simultaneously. To guarantee safe accesses, we would need locks etc., and
the code would become very complicated. IMO, we would prefer to recommend
that applications use the heavyweight mode API when the packet number is
large, and the lightweight mode API when the number is small, like 32.
What do you think?

> 
> > +	struct gro_tcp_tbl tcp_tbl;
> > +	struct gro_tcp_key tcp_keys[tcp_item_num];
> > +	struct gro_tcp_item tcp_items[tcp_item_num];
> > +
> > +	struct rte_mbuf *unprocess_pkts[nb_pkts];
> > +	uint16_t unprocess_num = 0;
> > +	int32_t ret;
> > +
> > +	memset(tcp_keys, 0, sizeof(struct gro_tcp_key) *
> > +			tcp_item_num);
> > +	memset(tcp_items, 0, sizeof(struct gro_tcp_item) *
> > +			tcp_item_num);
> > +	tcp_tbl.keys = tcp_keys;
> > +	tcp_tbl.items = tcp_items;
> > +	tcp_tbl.key_num = 0;
> > +	tcp_tbl.item_num = 0;
> > +	tcp_tbl.max_key_num = tcp_item_num;
> > +	tcp_tbl.max_item_num = tcp_item_num;
> > +
> > +	for (i = 0; i < nb_pkts; i++) {
> > +		if (RTE_ETH_IS_IPV4_HDR(pkts[i]->packet_type)) {
> > +			if ((pkts[i]->packet_type & RTE_PTYPE_L4_TCP) &&
> > +				(param.desired_gro_types &
> > +					 GRO_TCP_IPV4)) {
> > +				ret = gro_tcp4_reassemble(pkts[i],
> > +						&tcp_tbl,
> > +						param.max_packet_size);
> > +				/* merge successfully */
> > +				if (ret > 0)
> > +					nb_after_gro--;
> > +				else if (ret < 0)
> > +					unprocess_pkts[unprocess_num++] =
> > +						pkts[i];
> > +			} else
> > +				unprocess_pkts[unprocess_num++] =
> > +					pkts[i];
> > +		} else
> > +			unprocess_pkts[unprocess_num++] =
> > +				pkts[i];
> > +	}
> > +
> > +	/* re-arrange GROed packets */
> > +	if (nb_after_gro < nb_pkts) {
> > +		if (param.desired_gro_types & GRO_TCP_IPV4)
> > +			i = gro_tcp_tbl_flush(&tcp_tbl, pkts, nb_pkts);
> > +		if (unprocess_num > 0) {
> > +			memcpy(&pkts[i], unprocess_pkts,
> > +					sizeof(struct rte_mbuf *) *
> > +					unprocess_num);
> > +			i += unprocess_num;
> > +		}
> > +		if (nb_pkts > i)
> > +			memset(&pkts[i], 0,
> > +					sizeof(struct rte_mbuf *) *
> > +					(nb_pkts - i));
> > +	}
> > +	return nb_after_gro;
> >   }
> > -int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
> > -		struct rte_gro_tbl *gro_tbl __rte_unused)
> > +int rte_gro_reassemble(struct rte_mbuf *pkt,
> > +		struct rte_gro_tbl *gro_tbl)
> >   {
> > +	if (unlikely(pkt == NULL))
> > +		return -1;
> > +
> > +	if (RTE_ETH_IS_IPV4_HDR(pkt->packet_type)) {
> > +		if ((pkt->packet_type & RTE_PTYPE_L4_TCP) &&
> > +				(gro_tbl->desired_gro_types &
> > +				 GRO_TCP_IPV4))
> > +			return gro_tcp4_reassemble(pkt,
> > +					gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
> > +					gro_tbl->max_packet_size);
> > +	}
> > +
> >   	return -1;
> >   }
> > -uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> > -		uint64_t desired_gro_types __rte_unused,
> > -		struct rte_mbuf **out __rte_unused,
> > -		const uint16_t max_nb_out __rte_unused)
> > +uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl,
> > +		uint64_t desired_gro_types,
> > +		struct rte_mbuf **out,
> > +		const uint16_t max_nb_out)
> >   {
> > +	desired_gro_types = desired_gro_types &
> > +		gro_tbl->desired_gro_types;
> > +	if (desired_gro_types & GRO_TCP_IPV4)
> > +		return gro_tcp_tbl_flush(
> > +				gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
> > +				out,
> > +				max_nb_out);
> >   	return 0;
> >   }
> >   uint16_t
> > -rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> > -		uint64_t desired_gro_types __rte_unused,
> > -		struct rte_mbuf **out __rte_unused,
> > -		const uint16_t max_nb_out __rte_unused)
> > +rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl,
> > +		uint64_t desired_gro_types,
> > +		struct rte_mbuf **out,
> > +		const uint16_t max_nb_out)
> >   {
> > +	desired_gro_types = desired_gro_types &
> > +		gro_tbl->desired_gro_types;
> > +	if (desired_gro_types & GRO_TCP_IPV4)
> > +		return gro_tcp_tbl_timeout_flush(
> > +				gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
> > +				gro_tbl->max_timeout_cycles,
> > +				out, max_nb_out);
> >   	return 0;
> >   }
> > diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
> > index 2c547fa..41cd51a 100644
> > --- a/lib/librte_gro/rte_gro.h
> > +++ b/lib/librte_gro/rte_gro.h
> > @@ -35,7 +35,11 @@
> >   /* max number of supported GRO types */
> >   #define GRO_TYPE_MAX_NB 64
> > -#define GRO_TYPE_SUPPORT_NB 0	/**< current supported GRO num */
> > +#define GRO_TYPE_SUPPORT_NB 1	/**< supported GRO types number */
> > +
> > +/* TCP/IPv4 GRO flag */
> > +#define GRO_TCP_IPV4_INDEX 0
> > +#define GRO_TCP_IPV4 (1ULL << GRO_TCP_IPV4_INDEX)
> >   /**
> >    * GRO table, which is used to merge packets. It keeps many reassembly
> > diff --git a/lib/librte_gro/rte_gro_tcp.c b/lib/librte_gro/rte_gro_tcp.c
> > new file mode 100644
> > index 0000000..cfcd89e
> > --- /dev/null
> > +++ b/lib/librte_gro/rte_gro_tcp.c
> > @@ -0,0 +1,393 @@
> > +/*-
> > + *   BSD LICENSE
> > + *
> > + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> > + *
> > + *   Redistribution and use in source and binary forms, with or without
> > + *   modification, are permitted provided that the following conditions
> > + *   are met:
> > + *
> > + *     * Redistributions of source code must retain the above copyright
> > + *       notice, this list of conditions and the following disclaimer.
> > + *     * Redistributions in binary form must reproduce the above copyright
> > + *       notice, this list of conditions and the following disclaimer in
> > + *       the documentation and/or other materials provided with the
> > + *       distribution.
> > + *     * Neither the name of Intel Corporation nor the names of its
> > + *       contributors may be used to endorse or promote products derived
> > + *       from this software without specific prior written permission.
> > + *
> > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > + */
> > +
> > +#include <rte_malloc.h>
> > +#include <rte_mbuf.h>
> > +#include <rte_cycles.h>
> > +
> > +#include <rte_ethdev.h>
> > +#include <rte_ip.h>
> > +#include <rte_tcp.h>
> > +
> > +#include "rte_gro_tcp.h"
> > +
> > +void *gro_tcp_tbl_create(uint16_t socket_id,
> > +		uint16_t max_flow_num,
> > +		uint16_t max_item_per_flow)
> > +{
> > +	size_t size;
> > +	uint32_t entries_num;
> > +	struct gro_tcp_tbl *tbl;
> > +
> > +	entries_num = max_flow_num * max_item_per_flow;
> > +	entries_num = entries_num > GRO_TCP_TBL_MAX_ITEM_NUM ?
> > +		GRO_TCP_TBL_MAX_ITEM_NUM : entries_num;
> > +
> > +	if (entries_num == 0)
> > +		return NULL;
> > +
> > +	tbl = (struct gro_tcp_tbl *)rte_zmalloc_socket(
> > +			__func__,
> > +			sizeof(struct gro_tcp_tbl),
> > +			RTE_CACHE_LINE_SIZE,
> > +			socket_id);
> > +
> > +	size = sizeof(struct gro_tcp_item) * entries_num;
> > +	tbl->items = (struct gro_tcp_item *)rte_zmalloc_socket(
> > +			__func__,
> > +			size,
> > +			RTE_CACHE_LINE_SIZE,
> > +			socket_id);
> > +	tbl->max_item_num = entries_num;
> > +
> > +	size = sizeof(struct gro_tcp_key) * entries_num;
> > +	tbl->keys = (struct gro_tcp_key *)rte_zmalloc_socket(
> > +			__func__,
> > +			size, RTE_CACHE_LINE_SIZE,
> > +			socket_id);
> > +	tbl->max_key_num = entries_num;
> > +	return tbl;
> > +}
> > +
> > +void gro_tcp_tbl_destroy(void *tbl)
> > +{
> > +	struct gro_tcp_tbl *tcp_tbl = (struct gro_tcp_tbl *)tbl;
> > +
> > +	if (tcp_tbl) {
> > +		if (tcp_tbl->items)
> > +			rte_free(tcp_tbl->items);
> > +		if (tcp_tbl->keys)
> > +			rte_free(tcp_tbl->keys);
> > +		rte_free(tcp_tbl);
> > +	}
> > +}
> > +
> > +/**
> > + * merge two TCP/IPv4 packets without update checksums.
> > + */
> > +static int
> > +merge_two_tcp4_packets(struct rte_mbuf *pkt_src,
> > +		struct rte_mbuf *pkt,
> > +		uint32_t max_packet_size)
> > +{
> > +	struct ipv4_hdr *ipv4_hdr1, *ipv4_hdr2;
> > +	struct tcp_hdr *tcp_hdr1;
> > +	uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1;
> > +	struct rte_mbuf *tail;
> > +
> > +	/* parse the given packet */
> > +	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
> > +				struct ether_hdr *) + 1);
> > +	ipv4_ihl1 = IPv4_HDR_LEN(ipv4_hdr1);
> > +	tcp_hdr1 = (struct tcp_hdr *)((char *)ipv4_hdr1 + ipv4_ihl1);
> > +	tcp_hl1 = TCP_HDR_LEN(tcp_hdr1);
> > +	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) - ipv4_ihl1
> > +		- tcp_hl1;
> > +
> > +	/* parse the original packet */
> > +	ipv4_hdr2 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt_src,
> > +				struct ether_hdr *) + 1);
> > +
> > +	if (pkt_src->pkt_len + tcp_dl1 > max_packet_size)
> > +		return -1;
> > +
> > +	/* remove the header of the incoming packet */
> > +	rte_pktmbuf_adj(pkt, sizeof(struct ether_hdr) +
> > +			ipv4_ihl1 + tcp_hl1);
> > +
> > +	/* chain the two packet together */
> > +	tail = rte_pktmbuf_lastseg(pkt_src);
> > +	tail->next = pkt;
> > +
> > +	/* update IP header */
> > +	ipv4_hdr2->total_length = rte_cpu_to_be_16(
> > +			rte_be_to_cpu_16(
> > +				ipv4_hdr2->total_length)
> > +			+ tcp_dl1);
> > +
> > +	/* update mbuf metadata for the merged packet */
> > +	pkt_src->nb_segs++;
> > +	pkt_src->pkt_len += pkt->pkt_len;
> > +	return 1;
> > +}
> > +
> > +static int
> > +check_seq_option(struct rte_mbuf *pkt,
> > +		struct tcp_hdr *tcp_hdr,
> > +		uint16_t tcp_hl)
> > +{
> > +	struct ipv4_hdr *ipv4_hdr1;
> > +	struct tcp_hdr *tcp_hdr1;
> > +	uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1;
> > +	uint32_t sent_seq1, sent_seq;
> > +	int ret = -1;
> > +
> > +	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
> > +				struct ether_hdr *) + 1);
> > +	ipv4_ihl1 = IPv4_HDR_LEN(ipv4_hdr1);
> > +	tcp_hdr1 = (struct tcp_hdr *)((char *)ipv4_hdr1 + ipv4_ihl1);
> > +	tcp_hl1 = TCP_HDR_LEN(tcp_hdr1);
> > +	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) - ipv4_ihl1
> > +		- tcp_hl1;
> > +	sent_seq1 = rte_be_to_cpu_32(tcp_hdr1->sent_seq) + tcp_dl1;
> > +	sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq);
> > +
> > +	/* check if the two packets are neighbor */
> > +	if ((sent_seq ^ sent_seq1) == 0) {
> > +		/* check if TCP option field equals */
> > +		if (tcp_hl1 > sizeof(struct tcp_hdr)) {
> > +			if ((tcp_hl1 != tcp_hl) ||
> > +					(memcmp(tcp_hdr1 + 1,
> > +							tcp_hdr + 1,
> > +							tcp_hl - sizeof
> > +							(struct tcp_hdr))
> > +					 == 0))
> > +				ret = 1;
> > +		}
> > +	}
> > +	return ret;
> > +}
> > +
> > +static uint32_t
> > +find_an_empty_item(struct gro_tcp_tbl *tbl)
> > +{
> > +	uint32_t i;
> > +
> > +	for (i = 0; i < tbl->max_item_num; i++)
> > +		if (tbl->items[i].is_valid == 0)
> > +			return i;
> > +	return INVALID_ARRAY_INDEX;
> > +}
> > +
> > +static uint32_t
> > +find_an_empty_key(struct gro_tcp_tbl *tbl)
> > +{
> > +	uint32_t i;
> > +
> > +	for (i = 0; i < tbl->max_key_num; i++)
> > +		if (tbl->keys[i].is_valid == 0)
> > +			return i;
> > +	return INVALID_ARRAY_INDEX;
> > +}
> > +
> > +int32_t
> > +gro_tcp4_reassemble(struct rte_mbuf *pkt,
> > +		struct gro_tcp_tbl *tbl,
> > +		uint32_t max_packet_size)
> > +{
> > +	struct ether_hdr *eth_hdr;
> > +	struct ipv4_hdr *ipv4_hdr;
> > +	struct tcp_hdr *tcp_hdr;
> > +	uint16_t ipv4_ihl, tcp_hl, tcp_dl;
> > +
> > +	struct tcp_key key;
> > +	uint32_t cur_idx, prev_idx, item_idx;
> > +	uint32_t i, key_idx;
> > +
> > +	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
> > +	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
> > +	ipv4_ihl = IPv4_HDR_LEN(ipv4_hdr);
> > +
> > +	/* check if the packet should be processed */
> > +	if (ipv4_ihl < sizeof(struct ipv4_hdr))
> > +		goto fail;
> > +	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl);
> > +	tcp_hl = TCP_HDR_LEN(tcp_hdr);
> > +	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - ipv4_ihl
> > +		- tcp_hl;
> > +	if (tcp_dl == 0)
> > +		goto fail;
> > +
> > +	/* find a key and traverse all packets in its item group */
> > +	key.eth_saddr = eth_hdr->s_addr;
> > +	key.eth_daddr = eth_hdr->d_addr;
> > +	key.ip_src_addr[0] = rte_be_to_cpu_32(ipv4_hdr->src_addr);
> > +	key.ip_dst_addr[0] = rte_be_to_cpu_32(ipv4_hdr->dst_addr);
> > +	key.src_port = rte_be_to_cpu_16(tcp_hdr->src_port);
> > +	key.dst_port = rte_be_to_cpu_16(tcp_hdr->dst_port);
> > +	key.recv_ack = rte_be_to_cpu_32(tcp_hdr->recv_ack);
> > +	key.tcp_flags = tcp_hdr->tcp_flags;
> > +
> > +	for (i = 0; i < tbl->max_key_num; i++) {
> > +		if (tbl->keys[i].is_valid &&
> > +				(memcmp(&(tbl->keys[i].key), &key,
> > +						sizeof(struct tcp_key))
> > +				 == 0)) {
> > +			cur_idx = tbl->keys[i].start_index;
> > +			prev_idx = cur_idx;
> > +			while (cur_idx != INVALID_ARRAY_INDEX) {
> > +				if (check_seq_option(tbl->items[cur_idx].pkt,
> > +							tcp_hdr,
> > +							tcp_hl) > 0) {
> > +					if (merge_two_tcp4_packets(
> > +								tbl->items[cur_idx].pkt,
> > +								pkt,
> > +								max_packet_size) > 0) {
> > +						/* successfully merge two packets */
> > +						tbl->items[cur_idx].is_groed = 1;
> > +						return 1;
> > +					}
> > +					/**
> > +					 * fail to merge two packets since
> > +					 * it's beyond the max packet length.
> > +					 * Insert it into the item group.
> > +					 */
> > +					goto insert_to_item_group;
> > +				} else {
> > +					prev_idx = cur_idx;
> > +					cur_idx = tbl->items[cur_idx].next_pkt_idx;
> > +				}
> > +			}
> > +			/**
> > +			 * find a corresponding item group but fails to find
> > +			 * one packet to merge. Insert it into this item group.
> > +			 */
> > +insert_to_item_group:
> > +			item_idx = find_an_empty_item(tbl);
> > +			/* the item number is beyond the maximum value */
> > +			if (item_idx == INVALID_ARRAY_INDEX)
> > +				return -1;
> > +			tbl->items[prev_idx].next_pkt_idx = item_idx;
> > +			tbl->items[item_idx].pkt = pkt;
> > +			tbl->items[item_idx].is_groed = 0;
> > +			tbl->items[item_idx].next_pkt_idx = INVALID_ARRAY_INDEX;
> > +			tbl->items[item_idx].is_valid = 1;
> > +			tbl->items[item_idx].start_time = rte_rdtsc();
> > +			tbl->item_num++;
> > +			return 0;
> > +		}
> > +	}
> > +
> > +	/**
> > +	 * merge fail as the given packet has a
> > +	 * new key. So insert a new key.
> > +	 */
> > +	item_idx = find_an_empty_item(tbl);
> > +	key_idx = find_an_empty_key(tbl);
> > +	/**
> > +	 * if the key or item number is beyond the maximum
> > +	 * value, the inputted packet won't be processed.
> > +	 */
> > +	if (item_idx == INVALID_ARRAY_INDEX ||
> > +			key_idx == INVALID_ARRAY_INDEX)
> > +		return -1;
> > +	tbl->items[item_idx].pkt = pkt;
> > +	tbl->items[item_idx].next_pkt_idx = INVALID_ARRAY_INDEX;
> > +	tbl->items[item_idx].is_groed = 0;
> > +	tbl->items[item_idx].is_valid = 1;
> > +	tbl->items[item_idx].start_time = rte_rdtsc();
> > +	tbl->item_num++;
> > +
> > +	memcpy(&(tbl->keys[key_idx].key),
> > +			&key, sizeof(struct tcp_key));
> > +	tbl->keys[key_idx].start_index = item_idx;
> > +	tbl->keys[key_idx].is_valid = 1;
> > +	tbl->key_num++;
> > +
> > +	return 0;
> > +fail:
> > +	return -1;
> > +}
> > +
> > +uint16_t gro_tcp_tbl_flush(struct gro_tcp_tbl *tbl,
> > +		struct rte_mbuf **out,
> > +		const uint16_t nb_out)
> > +{
> > +	uint32_t i, num = 0;
> > +
> > +	if (nb_out < tbl->item_num)
> > +		return 0;
> > +
> > +	for (i = 0; i < tbl->max_item_num; i++) {
> > +		if (tbl->items[i].is_valid) {
> > +			out[num++] = tbl->items[i].pkt;
> > +			tbl->items[i].is_valid = 0;
> > +			tbl->item_num--;
> > +		}
> > +	}
> > +	memset(tbl->keys, 0, sizeof(struct gro_tcp_key) *
> > +			tbl->max_key_num);
> > +	tbl->key_num = 0;
> > +
> > +	return num;
> > +}
> > +
> > +uint16_t
> > +gro_tcp_tbl_timeout_flush(struct gro_tcp_tbl *tbl,
> > +		uint64_t timeout_cycles,
> > +		struct rte_mbuf **out,
> > +		const uint16_t nb_out)
> > +{
> > +	uint16_t k;
> > +	uint32_t i, j;
> > +	uint64_t current_time;
> > +
> > +	if (nb_out == 0)
> > +		return 0;
> > +	k = 0;
> > +	current_time = rte_rdtsc();
> > +
> > +	for (i = 0; i < tbl->max_key_num; i++) {
> > +		if (tbl->keys[i].is_valid) {
> > +			j = tbl->keys[i].start_index;
> > +			while (j != INVALID_ARRAY_INDEX) {
> > +				if (current_time - tbl->items[j].start_time >=
> > +						timeout_cycles) {
> > +					out[k++] = tbl->items[j].pkt;
> > +					tbl->items[j].is_valid = 0;
> > +					tbl->item_num--;
> > +					j = tbl->items[j].next_pkt_idx;
> > +
> > +					if (k == nb_out &&
> > +							j == INVALID_ARRAY_INDEX) {
> > +						/* delete the key */
> > +						tbl->keys[i].is_valid = 0;
> > +						tbl->key_num--;
> > +						goto end;
> > +					} else if (k == nb_out &&
> > +							j != INVALID_ARRAY_INDEX) {
> > +						/* update the first item index */
> > +						tbl->keys[i].start_index = j;
> > +						goto end;
> > +					}
> > +				}
> > +			}
> > +			/* delete the key, as all of its packets are flushed */
> > +			tbl->keys[i].is_valid = 0;
> > +			tbl->key_num--;
> > +		}
> > +		if (tbl->key_num == 0)
> > +			goto end;
> > +	}
> > +end:
> > +	return k;
> > +}
> > diff --git a/lib/librte_gro/rte_gro_tcp.h b/lib/librte_gro/rte_gro_tcp.h
> > new file mode 100644
> > index 0000000..4c4f9c7
> > --- /dev/null
> > +++ b/lib/librte_gro/rte_gro_tcp.h
> > @@ -0,0 +1,188 @@
> > +/*-
> > + *   BSD LICENSE
> > + *
> > + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> > + *
> > + *   Redistribution and use in source and binary forms, with or without
> > + *   modification, are permitted provided that the following conditions
> > + *   are met:
> > + *
> > + *     * Redistributions of source code must retain the above copyright
> > + *       notice, this list of conditions and the following disclaimer.
> > + *     * Redistributions in binary form must reproduce the above copyright
> > + *       notice, this list of conditions and the following disclaimer in
> > + *       the documentation and/or other materials provided with the
> > + *       distribution.
> > + *     * Neither the name of Intel Corporation nor the names of its
> > + *       contributors may be used to endorse or promote products derived
> > + *       from this software without specific prior written permission.
> > + *
> > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > + */
> > +
> > +#ifndef _RTE_GRO_TCP_H_
> > +#define _RTE_GRO_TCP_H_
> > +
> > +#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
> > +#define TCP_HDR_LEN(tcph) \
> > +	((tcph->data_off >> 4) * 4)
> > +#define IPv4_HDR_LEN(iph) \
> > +	((iph->version_ihl & 0x0f) * 4)
> > +#else
> > +#define TCP_DATAOFF_MASK 0x0f
> > +#define TCP_HDR_LEN(tcph) \
> > +	((tcph->data_off & TCP_DATAOFF_MASK) * 4)
> > +#define IPv4_HDR_LEN(iph) \
> > +	((iph->version_ihl >> 4) * 4)
> > +#endif
> > +
> > +#define INVALID_ARRAY_INDEX 0xffffffffUL
> > +#define GRO_TCP_TBL_MAX_ITEM_NUM (UINT32_MAX - 1)
> 
> Defining such a big number does not make any sense.
> 
> > +
> > +/* criteria of mergeing packets */
> > +struct tcp_key {
> > +	struct ether_addr eth_saddr;
> > +	struct ether_addr eth_daddr;
> > +	uint32_t ip_src_addr[4];	/**< IPv4 uses the first 8B */
> > +	uint32_t ip_dst_addr[4];
> > +
> > +	uint32_t recv_ack;	/**< acknowledgment sequence number. */
> > +	uint16_t src_port;
> > +	uint16_t dst_port;
> > +	uint8_t tcp_flags;	/**< TCP flags. */
> > +};
> > +
> > +struct gro_tcp_key {
> > +	struct tcp_key key;
> > +	uint32_t start_index;	/**< the first packet index of the flow */
> > +	uint8_t is_valid;
> > +};
> > +
> > +struct gro_tcp_item {
> > +	struct rte_mbuf *pkt;	/**< packet address. */
> > +	/* the time when the packet in added into the table */
> > +	uint64_t start_time;
> > +	uint32_t next_pkt_idx;	/**< next packet index. */
> > +	/* flag to indicate if the packet is GROed */
> > +	uint8_t is_groed;
> > +	uint8_t is_valid;	/**< flag indicates if the item is valid */
> > +};
> > +
> > +/**
> > + * TCP reassembly table. Both TCP/IPv4 and TCP/IPv6 use the same table
> > + * structure.
> > + */
> > +struct gro_tcp_tbl {
> > +	struct gro_tcp_item *items;	/**< item array */
> > +	struct gro_tcp_key *keys;	/**< key array */
> > +	uint32_t item_num;	/**< current item number */
> > +	uint32_t key_num;	/**< current key num */
> > +	uint32_t max_item_num;	/**< item array size */
> > +	uint32_t max_key_num;	/**< key array size */
> > +};
> > +
> > +/**
> > + * This function creates a TCP reassembly table.
> > + *
> > + * @param socket_id
> > + *  socket index where the Ethernet port connects to.
> > + * @param max_flow_num
> > + *  the maximum number of flows in the TCP GRO table
> > + * @param max_item_per_flow
> > + *  the maximum packet number per flow.
> > + * @return
> > + *  if create successfully, return a pointer which points to the
> > + *  created TCP GRO table. Otherwise, return NULL.
> > + */
> > +void *gro_tcp_tbl_create(uint16_t socket_id,
> > +		uint16_t max_flow_num,
> > +		uint16_t max_item_per_flow);
> > +
> > +/**
> > + * This function destroys a TCP reassembly table.
> > + * @param tbl
> > + *  a pointer points to the TCP reassembly table.
> > + */
> > +void gro_tcp_tbl_destroy(void *tbl);
> > +
> > +/**
> > + * This function searches for a packet in the TCP reassembly table to
> > + * merge with the inputted one. To merge two packets is to chain them
> > + * together and update packet headers. If the packet is without data
> > + * (e.g. SYN, SYN-ACK packet), this function returns immediately.
> > + * Otherwise, the packet is either merged, or inserted into the table.
> > + *
> > + * This function assumes the inputted packet is with correct IPv4 and
> > + * TCP checksums. And if two packets are merged, it won't re-calculate
> > + * IPv4 and TCP checksums. Besides, if the inputted packet is IP
> > + * fragmented, it assumes the packet is complete (with TCP header).
> > + *
> > + * @param pkt
> > + *  packet to reassemble.
> > + * @param tbl
> > + *  a pointer that points to a TCP reassembly table.
> > + * @param max_packet_size
> > + *  max packet length after merged
> > + * @return
> > + *  if the packet doesn't have data, return a negative value. If the
> > + *  packet is merged successfully, return an positive value. If the
> > + *  packet is inserted into the table, return 0.
> > + */
> > +int32_t
> > +gro_tcp4_reassemble(struct rte_mbuf *pkt,
> > +		struct gro_tcp_tbl *tbl,
> > +		uint32_t max_packet_size);
> > +
> > +/**
> > + * This function flushes all packets in a TCP reassembly table to
> > + * applications, and without updating checksums for merged packets.
> > + * If the array which is used to keep flushed packets is not large
> > + * enough, error happens and this function returns immediately.
> > + *
> > + * @param tbl
> > + *  a pointer that points to a TCP GRO table.
> > + * @param out
> > + *  pointer array which is used to keep flushed packets. Applications
> > + *  should guarantee it's large enough to hold all packets in the table.
> > + * @param nb_out
> > + *  the element number of out.
> > + * @return
> > + *  the number of flushed packets. If out is not large enough to hold
> > + *  all packets in the table, return 0.
> > + */
> > +uint16_t
> > +gro_tcp_tbl_flush(struct gro_tcp_tbl *tbl,
> > +		struct rte_mbuf **out,
> > +		const uint16_t nb_out);
> > +
> > +/**
> > + * This function flushes timeout packets in a TCP reassembly table to
> > + * applications, and without updating checksums for merged packets.
> > + *
> > + * @param tbl
> > + *  a pointer that points to a TCP GRO table.
> > + * @param timeout_cycles
> > + *  the maximum time that packets can stay in the table.
> > + * @param out
> > + *  pointer array which is used to keep flushed packets.
> > + * @param nb_out
> > + *  the element number of out.
> > + * @return
> > + *  the number of packets that are returned.
> > + */
> > +uint16_t
> > +gro_tcp_tbl_timeout_flush(struct gro_tcp_tbl *tbl,
> > +		uint64_t timeout_cycles,
> > +		struct rte_mbuf **out,
> > +		const uint16_t nb_out);
> > +#endif

^ permalink raw reply	[flat|nested] 141+ messages in thread

* [PATCH v7 0/3] Support TCP/IPv4 GRO in DPDK
  2017-06-23 14:43         ` [PATCH v6 " Jiayu Hu
                             ` (3 preceding siblings ...)
  2017-06-25 16:03           ` [PATCH v6 0/3] Support TCP/IPv4 GRO in DPDK Tan, Jianfeng
@ 2017-06-26  6:43           ` Jiayu Hu
  2017-06-26  6:43             ` [PATCH v7 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
                               ` (3 more replies)
  4 siblings, 4 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-06-26  6:43 UTC (permalink / raw)
  To: dev
  Cc: jianfeng.tan, konstantin.ananyev, stephen, yliu, jingjing.wu,
	lei.a.yao, keith.wiles, tiwei.bie, Jiayu Hu

Generic Receive Offload (GRO) is a widely used SW-based offloading
technique to reduce per-packet processing overhead. It gains performance
by reassembling small packets into large ones. Therefore, we propose to
support GRO in DPDK.

To give applications more flexibility, DPDK GRO is implemented as
a user library. Applications explicitly use the GRO library to merge
small packets into large ones. DPDK GRO provides two reassembly modes:
lightweight mode and heavyweight mode. If applications want to merge
packets in a simple way, they can select the lightweight mode API. If
applications need more fine-grained controls, they can select the
heavyweight mode API.

This patchset supports TCP/IPv4 GRO in DPDK. The first patch provides
a GRO API framework; the second adds TCP/IPv4 GRO support; the last
enables TCP/IPv4 GRO in testpmd.

We perform many iperf tests to see the performance gains from DPDK GRO.
The test environment is:
a. two 25Gbps physical ports (p0 and p1) are linked together. Assign p0
	to one networking namespace and assign p1 to DPDK;
b. enable TSO for p0. Run iperf client on p0;
c. launch testpmd with p1 and a vhost-user port, and run it in csum
	forwarding mode. Select TCP HW checksum calculation for the
	vhost-user port in the csum forwarding engine. For better
	performance, we select IPv4 and TCP HW checksum calculation for
	p1 too;
d. launch a VM with one CPU core and a virtio-net port. The VM OS is
	Ubuntu 16.04, whose virtio-net driver supports GRO. Enable RX csum
	offloading and mrg_rxbuf for the VM. The iperf server runs in the VM;
e. to run iperf tests, we need to prevent the csum forwarding engine
	from forcibly changing packet MAC addresses. So in our tests, we
	comment those lines out (line 701 ~ line 704 in csumonly.c). An
	illustrative command sequence for this setup is sketched below.
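
The following sketch only shows the shape of the commands we use. The
vdev/iface names, core and memory options, namespace name, port numbering
(physical port assumed 0, vhost-user port assumed 1) and the VM IP are
placeholders, and the GRO command is the one added by patch 3/3:

	# launch testpmd with p1 and a vhost-user port
	./testpmd -l 1-2 -n 4 --vdev 'net_vhost0,iface=/tmp/vhost0' -- -i
	testpmd> port stop all
	testpmd> set fwd csum
	testpmd> csum set ip hw 0        # IPv4 HW csum on p1
	testpmd> csum set tcp hw 0       # TCP HW csum on p1
	testpmd> csum set tcp hw 1       # TCP HW csum on the vhost-user port
	testpmd> gro on 0                # enable TCP/IPv4 GRO on p1 (patch 3/3)
	testpmd> port start all
	testpmd> start
	# iperf client in p0's namespace; iperf server in the VM
	ip netns exec ns1 iperf -c <VM_IP> -i2 -t 60 -f g -m
	iperf -s -f g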

In each test, we run iperf with the following three configurations:
	- single flow and single TCP client thread 
	- multiple flows and single TCP client thread
	- single flow and parallel TCP client threads

We run the above iperf tests on three scenarios:
	s1: disabling kernel GRO and enabling DPDK GRO
	s2: disabling kernel GRO and disabling DPDK GRO
	s3: enabling kernel GRO and disabling DPDK GRO
Comparing the throughput of s1 with s2, we can see the performance gains
from DPDK GRO. Comparing the throughput of s1 and s3, we can compare DPDK
GRO performance with kernel GRO performance.

Test results:
	- DPDK GRO throughput is almost 2 times the throughput with
		neither DPDK GRO nor kernel GRO;
	- DPDK GRO throughput is almost 1.2 times the throughput of
		kernel GRO.

Change log
==========
v7:
- add a macro 'GRO_MAX_BURST_ITEM_NUM' to avoid stack overflow in
	rte_gro_reassemble_burst
- change macro name (_NB to _NUM)
- add '#ifdef __cplusplus ...' in rte_gro.h
v6:
- avoid checksum validation and calculation
- enable to process IP fragmented packets
- add a command in testpmd
- update documents
- modify rte_gro_timeout_flush and rte_gro_reassemble_burst
- rename variable names
v5:
- fix some bugs
- fix coding style issues
v4:
- implement DPDK GRO as an application-used library
- introduce lightweight and heavyweight working modes to enable
	fine-grained controls to applications
- replace cuckoo hash tables with simpler table structure
v3:
- fix compilation issues.
v2:
- provide generic reassembly function;
- implement GRO as a device ability:
add APIs for devices to support GRO;
add APIs for applications to enable/disable GRO;
- update testpmd example. 

Jiayu Hu (3):
  lib: add Generic Receive Offload API framework
  lib/gro: add TCP/IPv4 GRO support
  app/testpmd: enable TCP/IPv4 GRO

 app/test-pmd/cmdline.c                      | 125 +++++++++
 app/test-pmd/config.c                       |  37 +++
 app/test-pmd/csumonly.c                     |   5 +
 app/test-pmd/testpmd.c                      |   3 +
 app/test-pmd/testpmd.h                      |  11 +
 config/common_base                          |   5 +
 doc/guides/rel_notes/release_17_08.rst      |   7 +
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  34 +++
 lib/Makefile                                |   2 +
 lib/librte_gro/Makefile                     |  51 ++++
 lib/librte_gro/rte_gro.c                    | 218 +++++++++++++++
 lib/librte_gro/rte_gro.h                    | 209 +++++++++++++++
 lib/librte_gro/rte_gro_tcp.c                | 394 ++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro_tcp.h                | 191 ++++++++++++++
 lib/librte_gro/rte_gro_version.map          |  12 +
 mk/rte.app.mk                               |   1 +
 16 files changed, 1305 insertions(+)
 create mode 100644 lib/librte_gro/Makefile
 create mode 100644 lib/librte_gro/rte_gro.c
 create mode 100644 lib/librte_gro/rte_gro.h
 create mode 100644 lib/librte_gro/rte_gro_tcp.c
 create mode 100644 lib/librte_gro/rte_gro_tcp.h
 create mode 100644 lib/librte_gro/rte_gro_version.map

-- 
2.7.4

^ permalink raw reply	[flat|nested] 141+ messages in thread

* [PATCH v7 1/3] lib: add Generic Receive Offload API framework
  2017-06-26  6:43           ` [PATCH v7 " Jiayu Hu
@ 2017-06-26  6:43             ` Jiayu Hu
  2017-06-27 23:42               ` Ananyev, Konstantin
  2017-06-26  6:43             ` [PATCH v7 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
                               ` (2 subsequent siblings)
  3 siblings, 1 reply; 141+ messages in thread
From: Jiayu Hu @ 2017-06-26  6:43 UTC (permalink / raw)
  To: dev
  Cc: jianfeng.tan, konstantin.ananyev, stephen, yliu, jingjing.wu,
	lei.a.yao, keith.wiles, tiwei.bie, Jiayu Hu

Generic Receive Offload (GRO) is a widely used SW-based offloading
technique to reduce per-packet processing overhead. It gains
performance by reassembling small packets into large ones. This
patchset supports GRO in DPDK; as its first step, this patch
implements a GRO API framework.

To give applications more flexibility, DPDK GRO is implemented as
a user library. Applications explicitly use the GRO library to merge
small packets into large ones. DPDK GRO provides two reassembly modes.
One is called lightweight mode, the other heavyweight mode.
If applications want to merge packets in a simple way and the number
of packets is relatively small, they can use the lightweight mode.
If applications need more fine-grained controls, they can choose the
heavyweight mode.

rte_gro_reassemble_burst is the main reassembly API used in
lightweight mode; it processes N packets at a time. For applications,
performing GRO in lightweight mode is simple: they just invoke
rte_gro_reassemble_burst, and get the GROed packets as soon as
it returns.

rte_gro_reassemble is the main reassembly API used in heavyweight
mode; it processes one packet at a time. For applications, performing
GRO in heavyweight mode is relatively complicated. Before performing
GRO, applications need to create a GRO table with rte_gro_tbl_create.
Then they can use rte_gro_reassemble to process packets one by one.
The processed packets stay in the GRO table; if applications want to
get them, they need to flush them manually via the flush APIs. A
minimal usage sketch of both modes follows.
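
Below is a minimal usage sketch of the two modes (not part of this
patch). It assumes the GRO_TCP_IPV4 flag added by patch 2/3; the port,
queue, burst size and table dimensions are placeholder values, and a
real application would pick one mode rather than chain both:

	#include <rte_ethdev.h>
	#include "rte_gro.h"

	#define BURST 32

	static uint16_t
	gro_modes_sketch(uint8_t port, uint16_t queue, uint16_t socket_id,
			uint64_t timeout_cycles)
	{
		struct rte_mbuf *pkts[BURST], *out[BURST];
		uint16_t nb_rx, i;

		/* lightweight mode: one call merges a whole burst in place */
		struct rte_gro_param param = {
			.desired_gro_types = GRO_TCP_IPV4,
			.max_packet_size = UINT16_MAX,
			.max_flow_num = 4,
			.max_item_per_flow = 8,
		};
		nb_rx = rte_eth_rx_burst(port, queue, pkts, BURST);
		nb_rx = rte_gro_reassemble_burst(pkts, nb_rx, param);

		/* heavyweight mode: feed packets one by one into a
		 * long-lived table, then flush merged packets explicitly */
		struct rte_gro_tbl *tbl = rte_gro_tbl_create(socket_id,
				4, 8, UINT16_MAX, timeout_cycles,
				GRO_TCP_IPV4);
		if (tbl == NULL)
			return 0;
		for (i = 0; i < nb_rx; i++)
			rte_gro_reassemble(pkts[i], tbl);
		nb_rx = rte_gro_timeout_flush(tbl, GRO_TCP_IPV4, out, BURST);
		rte_gro_tbl_destroy(tbl);

		return nb_rx;	/* number of packets flushed into out[] */
	}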

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
---
 config/common_base                 |   5 +
 lib/Makefile                       |   2 +
 lib/librte_gro/Makefile            |  50 +++++++++
 lib/librte_gro/rte_gro.c           | 125 ++++++++++++++++++++++
 lib/librte_gro/rte_gro.h           | 205 +++++++++++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro_version.map |  12 +++
 mk/rte.app.mk                      |   1 +
 7 files changed, 400 insertions(+)
 create mode 100644 lib/librte_gro/Makefile
 create mode 100644 lib/librte_gro/rte_gro.c
 create mode 100644 lib/librte_gro/rte_gro.h
 create mode 100644 lib/librte_gro/rte_gro_version.map

diff --git a/config/common_base b/config/common_base
index f6aafd1..167f5ef 100644
--- a/config/common_base
+++ b/config/common_base
@@ -712,6 +712,11 @@ CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
 CONFIG_RTE_LIBRTE_PMD_VHOST=n
 
 #
+# Compile GRO library
+#
+CONFIG_RTE_LIBRTE_GRO=y
+
+#
 #Compile Xen domain0 support
 #
 CONFIG_RTE_LIBRTE_XEN_DOM0=n
diff --git a/lib/Makefile b/lib/Makefile
index 07e1fd0..ac1c2f6 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -106,6 +106,8 @@ DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
 DEPDIRS-librte_reorder := librte_eal librte_mempool librte_mbuf
 DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
 DEPDIRS-librte_pdump := librte_eal librte_mempool librte_mbuf librte_ether
+DIRS-$(CONFIG_RTE_LIBRTE_GRO) += librte_gro
+DEPDIRS-librte_gro := librte_eal librte_mbuf librte_ether librte_net
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
new file mode 100644
index 0000000..7e0f128
--- /dev/null
+++ b/lib/librte_gro/Makefile
@@ -0,0 +1,50 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2017 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_gro.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+
+EXPORT_MAP := rte_gro_version.map
+
+LIBABIVER := 1
+
+# source files
+SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
+
+# install this header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
new file mode 100644
index 0000000..33275e8
--- /dev/null
+++ b/lib/librte_gro/rte_gro.c
@@ -0,0 +1,125 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+
+#include "rte_gro.h"
+
+static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NUM];
+static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NUM];
+
+struct rte_gro_tbl *rte_gro_tbl_create(uint16_t socket_id,
+		uint16_t max_flow_num,
+		uint16_t max_item_per_flow,
+		uint32_t max_packet_size,
+		uint64_t max_timeout_cycles,
+		uint64_t desired_gro_types)
+{
+	gro_tbl_create_fn create_tbl_fn;
+	struct rte_gro_tbl *gro_tbl;
+	uint64_t gro_type_flag = 0;
+	uint8_t i;
+
+	gro_tbl = rte_zmalloc_socket(__func__,
+			sizeof(struct rte_gro_tbl),
+			RTE_CACHE_LINE_SIZE,
+			socket_id);
+	gro_tbl->max_packet_size = max_packet_size;
+	gro_tbl->max_timeout_cycles = max_timeout_cycles;
+	gro_tbl->desired_gro_types = desired_gro_types;
+
+	for (i = 0; i < GRO_TYPE_MAX_NUM; i++) {
+		gro_type_flag = 1 << i;
+		if (desired_gro_types & gro_type_flag) {
+			create_tbl_fn = tbl_create_functions[i];
+			if (create_tbl_fn)
+				create_tbl_fn(socket_id,
+						max_flow_num,
+						max_item_per_flow);
+			else
+				gro_tbl->tbls[i] = NULL;
+		}
+	}
+	return gro_tbl;
+}
+
+void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl)
+{
+	gro_tbl_destroy_fn destroy_tbl_fn;
+	uint64_t gro_type_flag;
+	uint8_t i;
+
+	if (gro_tbl == NULL)
+		return;
+	for (i = 0; i < GRO_TYPE_MAX_NUM; i++) {
+		gro_type_flag = 1 << i;
+		if (gro_tbl->desired_gro_types & gro_type_flag) {
+			destroy_tbl_fn = tbl_destroy_functions[i];
+			if (destroy_tbl_fn)
+				destroy_tbl_fn(gro_tbl->tbls[i]);
+			gro_tbl->tbls[i] = NULL;
+		}
+	}
+	rte_free(gro_tbl);
+}
+
+uint16_t
+rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
+		const uint16_t nb_pkts,
+		const struct rte_gro_param param __rte_unused)
+{
+	return nb_pkts;
+}
+
+int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
+		struct rte_gro_tbl *gro_tbl __rte_unused)
+{
+	return -1;
+}
+
+uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
+		uint64_t desired_gro_types __rte_unused,
+		struct rte_mbuf **out __rte_unused,
+		const uint16_t max_nb_out __rte_unused)
+{
+	return 0;
+}
+
+uint16_t
+rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
+		uint64_t desired_gro_types __rte_unused,
+		struct rte_mbuf **out __rte_unused,
+		const uint16_t max_nb_out __rte_unused)
+{
+	return 0;
+}
diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
new file mode 100644
index 0000000..f9d36e8
--- /dev/null
+++ b/lib/librte_gro/rte_gro.h
@@ -0,0 +1,205 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_GRO_H_
+#define _RTE_GRO_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * the max packets number that rte_gro_reassemble_burst can
+ * process in each invocation.
+ */
+#define GRO_MAX_BURST_ITEM_NUM 1024UL
+
+/* max number of supported GRO types */
+#define GRO_TYPE_MAX_NUM 64
+#define GRO_TYPE_SUPPORT_NUM 0	/**< current supported GRO num */
+
+/**
+ * GRO table, which is used to merge packets. It keeps many reassembly
+ * tables of desired GRO types. Applications need to create GRO tables
+ * before using rte_gro_reassemble to perform GRO.
+ */
+struct rte_gro_tbl {
+	uint64_t desired_gro_types;	/**< GRO types to perform */
+	/* max TTL measured in nanosecond */
+	uint64_t max_timeout_cycles;
+	/* max length of merged packet measured in byte */
+	uint32_t max_packet_size;
+	/* reassebly tables of desired GRO types */
+	void *tbls[GRO_TYPE_MAX_NUM];
+};
+
+struct rte_gro_param {
+	uint64_t desired_gro_types;	/**< desired GRO types */
+	uint32_t max_packet_size;	/**< max length of merged packets */
+	uint16_t max_flow_num;	/**< max flow number */
+	uint16_t max_item_per_flow;	/**< max packet number per flow */
+};
+
+typedef void *(*gro_tbl_create_fn)(uint16_t socket_id,
+		uint16_t max_flow_num,
+		uint16_t max_item_per_flow);
+typedef void (*gro_tbl_destroy_fn)(void *tbl);
+
+/**
+ * This function create a GRO table, which is used to merge packets.
+ *
+ * @param socket_id
+ *  socket index where the Ethernet port connects to.
+ * @param max_flow_num
+ *  max number of flows in the GRO table.
+ * @param max_item_per_flow
+ *  max packet number per flow. We use the value of (max_flow_num *
+ *  max_item_per_fow) to calculate table size.
+ * @param max_packet_size
+ *  max length of merged packets. Measured in byte.
+ * @param max_timeout_cycles
+ *  max TTL for a packet in the GRO table. It's measured in nanosecond.
+ * @param desired_gro_types
+ *  GRO types to perform.
+ * @return
+ *  if create successfully, return a pointer which points to the GRO
+ *  table. Otherwise, return NULL.
+ */
+struct rte_gro_tbl *rte_gro_tbl_create(uint16_t socket_id,
+		uint16_t max_flow_num,
+		uint16_t max_item_per_flow,
+		uint32_t max_packet_size,
+		uint64_t max_timeout_cycles,
+		uint64_t desired_gro_types);
+/**
+ * This function destroys a GRO table.
+ */
+void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl);
+
+/**
+ * This is one of the main reassembly APIs, which merges numbers of
+ * packets at a time. It assumes that all inputted packets are with
+ * correct checksums. That is, applications should guarantee all
+ * inputted packets are correct. Besides, it doesn't re-calculate
+ * checksums for merged packets. If inputted packets are IP fragmented,
+ * this function assumes them are complete (i.e. with L4 header). After
+ * finishing processing, it returns all GROed packets to  applications
+ * immediately.
+ *
+ * @param pkts
+ *  a pointer array which points to the packets to reassemble. Besides,
+ *  it keeps addresses of GROed packets.
+ * @param nb_pkts
+ *  the number of packets to reassemble.
+ * @param param
+ *  applications use it to tell rte_gro_reassemble_burst what rules
+ *  are demanded.
+ * @return
+ *  the number of packets after GROed.
+ */
+uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts,
+		const uint16_t nb_pkts,
+		const struct rte_gro_param param);
+
+/**
+ * Reassembly function, which tries to merge the inputted packet with
+ * one packet in a given GRO table. This function assumes the inputted
+ * packet is with correct checksums. And it won't update checksums if
+ * two packets are merged. Besides, if the inputted packet is IP
+ * fragmented, this function assumes it's a complete packet (i.e. with
+ * L4 header).
+ *
+ * If the inputted packet doesn't have data or it's with unsupported GRO
+ * type, function returns immediately. Otherwise, the inputted packet is
+ * either merged or inserted into the table. If applications want get
+ * packets in the table, they need to call flush APIs.
+ *
+ * @param pkt
+ *  packet to reassemble.
+ * @param gro_tbl
+ *  a pointer points to a GRO table.
+ * @return
+ *  if merge the packet successfully, return a positive value. If fail
+ *  to merge, return zero. If the packet doesn't have data, or its GRO
+ *  type is unsupported, return a negative value.
+ */
+int rte_gro_reassemble(struct rte_mbuf *pkt,
+		struct rte_gro_tbl *gro_tbl);
+
+/**
+ * This function flushed packets from reassembly tables of desired GRO
+ * types. It won't re-calculate checksums for merged packets in the
+ * tables. That is, the returned packets may be with wrong checksums.
+ *
+ * @param gro_tbl
+ *  a pointer points to a GRO table object.
+ * @param desired_gro_types
+ *  GRO types whose packets will be flushed.
+ * @param out
+ *  a pointer array that is used to keep flushed packets.
+ * @param nb_out
+ *  the size of out.
+ * @return
+ *  the number of flushed packets. If no packets are flushed, return 0.
+ */
+uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl,
+		uint64_t desired_gro_types,
+		struct rte_mbuf **out,
+		const uint16_t max_nb_out);
+
+/**
+ * This function flushes the timeout packets from reassembly tables of
+ * desired GRO types. It won't re-calculate checksums for merged packets
+ * in the tables. That is, the returned packets may be with wrong
+ * checksums.
+ *
+ * @param gro_tbl
+ *  a pointer points to a GRO table object.
+ * @param desired_gro_types
+ * rte_gro_timeout_flush only processes packets which belong to the
+ * GRO types specified by desired_gro_types.
+ * @param out
+ *  a pointer array that is used to keep flushed packets.
+ * @param nb_out
+ *  the size of out.
+ * @return
+ *  the number of flushed packets. If no packets are flushed, return 0.
+ */
+uint16_t rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl,
+		uint64_t desired_gro_types,
+		struct rte_mbuf **out,
+		const uint16_t max_nb_out);
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/lib/librte_gro/rte_gro_version.map b/lib/librte_gro/rte_gro_version.map
new file mode 100644
index 0000000..827596b
--- /dev/null
+++ b/lib/librte_gro/rte_gro_version.map
@@ -0,0 +1,12 @@
+DPDK_17.08 {
+	global:
+
+	rte_gro_tbl_create;
+	rte_gro_tbl_destroy;
+	rte_gro_reassemble_burst;
+	rte_gro_reassemble;
+	rte_gro_flush;
+	rte_gro_timeout_flush;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index bcaf1b3..fc3776d 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -98,6 +98,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_RING)           += -lrte_ring
 _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
 _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER)        += -lrte_reorder
+_LDLIBS-$(CONFIG_RTE_LIBRTE_GRO)            += -lrte_gro
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 _LDLIBS-$(CONFIG_RTE_LIBRTE_KNI)            += -lrte_kni
-- 
2.7.4

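For illustration only (not part of the patch), a minimal sketch of the
heavyweight-mode flow built from the APIs above. GRO_TCP_IPV4 is the
type flag added by the TCP/IPv4 patch; pkts/nb_rx come from an assumed
rte_eth_rx_burst() loop, and return values are left unchecked:

	struct rte_gro_tbl *gro_tbl;
	struct rte_mbuf *out[32];
	uint16_t i, n;

	/* socket 0; 4 flows with 16 packets each; merged packets are
	 * capped at 64KB; TTL of roughly 10ms expressed in TSC cycles
	 */
	gro_tbl = rte_gro_tbl_create(0, 4, 16, UINT16_MAX,
			rte_get_tsc_hz() / 100, GRO_TCP_IPV4);

	for (i = 0; i < nb_rx; i++)
		rte_gro_reassemble(pkts[i], gro_tbl);	/* merge or insert */

	/* periodically drain expired packets ... */
	n = rte_gro_timeout_flush(gro_tbl, GRO_TCP_IPV4, out, 32);

	/* ... and drain everything at teardown */
	n = rte_gro_flush(gro_tbl, GRO_TCP_IPV4, out, 32);
	rte_gro_tbl_destroy(gro_tbl);
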
^ permalink raw reply related	[flat|nested] 141+ messages in thread

* [PATCH v7 2/3] lib/gro: add TCP/IPv4 GRO support
  2017-06-26  6:43           ` [PATCH v7 " Jiayu Hu
  2017-06-26  6:43             ` [PATCH v7 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
@ 2017-06-26  6:43             ` Jiayu Hu
  2017-06-28 23:56               ` Ananyev, Konstantin
  2017-06-26  6:43             ` [PATCH v7 3/3] app/testpmd: enable TCP/IPv4 GRO Jiayu Hu
  2017-06-29 10:58             ` [PATCH v8 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
  3 siblings, 1 reply; 141+ messages in thread
From: Jiayu Hu @ 2017-06-26  6:43 UTC (permalink / raw)
  To: dev
  Cc: jianfeng.tan, konstantin.ananyev, stephen, yliu, jingjing.wu,
	lei.a.yao, keith.wiles, tiwei.bie, Jiayu Hu

In this patch, we introduce five APIs to support TCP/IPv4 GRO.
- gro_tcp_tbl_create: create a TCP reassembly table, which is used to
    merge packets.
- gro_tcp_tbl_destroy: free memory space of a TCP reassembly table.
- gro_tcp_tbl_flush: flush all packets from a TCP reassembly table.
- gro_tcp_tbl_timeout_flush: flush timeout packets from a TCP
    reassembly table.
- gro_tcp4_reassemble: reassemble an input TCP/IPv4 packet.

The TCP/IPv4 GRO API assumes all input packets have correct IPv4
and TCP checksums, and it doesn't update IPv4 and TCP checksums for
merged packets. If input packets are IP fragmented, the API assumes
they are complete packets (i.e. with L4 headers).

In TCP GRO, we use a table structure, called TCP reassembly table, to
reassemble packets. Both TCP/IPv4 and TCP/IPv6 GRO use the same table
structure. A TCP reassembly table includes a key array and an item array,
where the key array keeps the criteria to merge packets and the item
array keeps packet information.

One key in the key array points to an item group, which consists of
packets which have the same criteria value. If two packets are able to
merge, they must be in the same item group. Each key in the key array
includes two parts:
- criteria: the criteria of merging packets. If two packets can be
    merged, they must have the same criteria value.
- start_index: the index of the first incoming packet of the item group.

Each element in the item array keeps the information of one packet. It
mainly includes two parts:
- pkt: packet address
- next_pkt_index: the index of the next packet in the same item group.
    All packets in the same item group are chained by next_pkt_index.
    With next_pkt_index, we can locate all packets in the same item
    group one by one.

Processing an incoming packet takes three steps (a brief usage sketch
of the five functions follows the list):
a. check if the packet should be processed. Packets with the following
    properties won't be processed:
	- packets without data (e.g. SYN, SYN-ACK)
b. traverse the key array to find a key which has the same criteria
    value as the incoming packet. If one is found, go to step c.
    Otherwise, insert a new key and insert the packet into the item
    array.
c. locate the first packet in the item group via the start_index in the
    key. Then traverse all packets in the item group via next_pkt_index.
    If a packet that can merge with the incoming one is found, merge
    them together. Otherwise, insert the incoming packet into this item
    group.

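For illustration only (not part of the patch), a minimal sketch of how
the five functions cooperate. The rx loop, pkts/nb_rx, timeout_cycles
and handle_unprocessed() are assumed context; error handling is omitted:

	struct gro_tcp_tbl *tbl;
	struct rte_mbuf *flushed[GRO_MAX_BURST_ITEM_NUM];
	uint16_t i, n;

	/* socket 0; at most 4 flows with 16 packets per flow */
	tbl = gro_tcp_tbl_create(0, 4, 16);

	for (i = 0; i < nb_rx; i++) {
		/* >0: merged into a stored packet; 0: inserted as a new
		 * item; <0: no payload or no space, handle it directly
		 */
		if (gro_tcp4_reassemble(pkts[i], tbl, UINT16_MAX) < 0)
			handle_unprocessed(pkts[i]);
	}

	/* hand over packets that stayed in the table for more than
	 * timeout_cycles TSC cycles ...
	 */
	n = gro_tcp_tbl_timeout_flush(tbl, timeout_cycles, flushed,
			GRO_MAX_BURST_ITEM_NUM);

	/* ... or flush everything, e.g. at the end of an rx burst */
	n = gro_tcp_tbl_flush(tbl, flushed, GRO_MAX_BURST_ITEM_NUM);

	gro_tcp_tbl_destroy(tbl);
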
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
---
 doc/guides/rel_notes/release_17_08.rst |   7 +
 lib/librte_gro/Makefile                |   1 +
 lib/librte_gro/rte_gro.c               | 123 ++++++++--
 lib/librte_gro/rte_gro.h               |   6 +-
 lib/librte_gro/rte_gro_tcp.c           | 394 +++++++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro_tcp.h           | 191 ++++++++++++++++
 6 files changed, 706 insertions(+), 16 deletions(-)
 create mode 100644 lib/librte_gro/rte_gro_tcp.c
 create mode 100644 lib/librte_gro/rte_gro_tcp.h

diff --git a/doc/guides/rel_notes/release_17_08.rst b/doc/guides/rel_notes/release_17_08.rst
index 842f46f..f067247 100644
--- a/doc/guides/rel_notes/release_17_08.rst
+++ b/doc/guides/rel_notes/release_17_08.rst
@@ -75,6 +75,13 @@ New Features
 
   Added support for firmwares with multiple Ethernet ports per physical port.
 
+* **Add Generic Receive Offload API support.**
+
+  The Generic Receive Offload (GRO) API supports reassembling TCP/IPv4
+  packets. The GRO API assumes all input packets have correct
+  checksums, and it doesn't update checksums for merged packets. If
+  input packets are IP fragmented, the GRO API assumes they are
+  complete packets (i.e. with L4 headers).
 
 Resolved Issues
 ---------------
diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
index 7e0f128..e89344d 100644
--- a/lib/librte_gro/Makefile
+++ b/lib/librte_gro/Makefile
@@ -43,6 +43,7 @@ LIBABIVER := 1
 
 # source files
 SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
+SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro_tcp.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
index 33275e8..5b89928 100644
--- a/lib/librte_gro/rte_gro.c
+++ b/lib/librte_gro/rte_gro.c
@@ -32,11 +32,15 @@
 
 #include <rte_malloc.h>
 #include <rte_mbuf.h>
+#include <rte_ethdev.h>
 
 #include "rte_gro.h"
+#include "rte_gro_tcp.h"
 
-static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NUM];
-static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NUM];
+static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NUM] = {
+	gro_tcp_tbl_create, NULL};
+static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NUM] = {
+	gro_tcp_tbl_destroy, NULL};
 
 struct rte_gro_tbl *rte_gro_tbl_create(uint16_t socket_id,
 		uint16_t max_flow_num,
@@ -94,32 +98,121 @@ void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl)
 }
 
 uint16_t
-rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
+rte_gro_reassemble_burst(struct rte_mbuf **pkts,
 		const uint16_t nb_pkts,
-		const struct rte_gro_param param __rte_unused)
+		const struct rte_gro_param param)
 {
-	return nb_pkts;
+	uint16_t i;
+	uint16_t nb_after_gro = nb_pkts;
+	uint32_t item_num = RTE_MIN(nb_pkts, param.max_flow_num *
+			param.max_item_per_flow);
+
+	/* allocate a reassembly table for TCP/IPv4 GRO */
+	uint32_t tcp_item_num = RTE_MIN(item_num, GRO_MAX_BURST_ITEM_NUM);
+	struct gro_tcp_tbl tcp_tbl;
+	struct gro_tcp_key tcp_keys[tcp_item_num];
+	struct gro_tcp_item tcp_items[tcp_item_num];
+
+	struct rte_mbuf *unprocess_pkts[nb_pkts];
+	uint16_t unprocess_num = 0;
+	int32_t ret;
+
+	memset(tcp_keys, 0, sizeof(struct gro_tcp_key) *
+			tcp_item_num);
+	memset(tcp_items, 0, sizeof(struct gro_tcp_item) *
+			tcp_item_num);
+	tcp_tbl.keys = tcp_keys;
+	tcp_tbl.items = tcp_items;
+	tcp_tbl.key_num = 0;
+	tcp_tbl.item_num = 0;
+	tcp_tbl.max_key_num = tcp_item_num;
+	tcp_tbl.max_item_num = tcp_item_num;
+
+	for (i = 0; i < nb_pkts; i++) {
+		if (RTE_ETH_IS_IPV4_HDR(pkts[i]->packet_type)) {
+			if ((pkts[i]->packet_type & RTE_PTYPE_L4_TCP) &&
+				(param.desired_gro_types &
+					 GRO_TCP_IPV4)) {
+				ret = gro_tcp4_reassemble(pkts[i],
+						&tcp_tbl,
+						param.max_packet_size);
+				/* merge successfully */
+				if (ret > 0)
+					nb_after_gro--;
+				else if (ret < 0)
+					unprocess_pkts[unprocess_num++] =
+						pkts[i];
+			} else
+				unprocess_pkts[unprocess_num++] =
+					pkts[i];
+		} else
+			unprocess_pkts[unprocess_num++] =
+				pkts[i];
+	}
+
+	/* re-arrange GROed packets */
+	if (nb_after_gro < nb_pkts) {
+		if (param.desired_gro_types & GRO_TCP_IPV4)
+			i = gro_tcp_tbl_flush(&tcp_tbl, pkts, nb_pkts);
+		if (unprocess_num > 0) {
+			memcpy(&pkts[i], unprocess_pkts,
+					sizeof(struct rte_mbuf *) *
+					unprocess_num);
+			i += unprocess_num;
+		}
+		if (nb_pkts > i)
+			memset(&pkts[i], 0,
+					sizeof(struct rte_mbuf *) *
+					(nb_pkts - i));
+	}
+	return nb_after_gro;
 }
 
-int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
-		struct rte_gro_tbl *gro_tbl __rte_unused)
+int rte_gro_reassemble(struct rte_mbuf *pkt,
+		struct rte_gro_tbl *gro_tbl)
 {
+	if (unlikely(pkt == NULL))
+		return -1;
+
+	if (RTE_ETH_IS_IPV4_HDR(pkt->packet_type)) {
+		if ((pkt->packet_type & RTE_PTYPE_L4_TCP) &&
+				(gro_tbl->desired_gro_types &
+				 GRO_TCP_IPV4))
+			return gro_tcp4_reassemble(pkt,
+					gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
+					gro_tbl->max_packet_size);
+	}
+
 	return -1;
 }
 
-uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
-		uint64_t desired_gro_types __rte_unused,
-		struct rte_mbuf **out __rte_unused,
-		const uint16_t max_nb_out __rte_unused)
+uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl,
+		uint64_t desired_gro_types,
+		struct rte_mbuf **out,
+		const uint16_t max_nb_out)
 {
+	desired_gro_types = desired_gro_types &
+		gro_tbl->desired_gro_types;
+	if (desired_gro_types & GRO_TCP_IPV4)
+		return gro_tcp_tbl_flush(
+				gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
+				out,
+				max_nb_out);
 	return 0;
 }
 
 uint16_t
-rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
-		uint64_t desired_gro_types __rte_unused,
-		struct rte_mbuf **out __rte_unused,
-		const uint16_t max_nb_out __rte_unused)
+rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl,
+		uint64_t desired_gro_types,
+		struct rte_mbuf **out,
+		const uint16_t max_nb_out)
 {
+	desired_gro_types = desired_gro_types &
+		gro_tbl->desired_gro_types;
+	if (desired_gro_types & GRO_TCP_IPV4)
+		return gro_tcp_tbl_timeout_flush(
+				gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
+				gro_tbl->max_timeout_cycles,
+				out, max_nb_out);
 	return 0;
 }
diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
index f9d36e8..a30b1c6 100644
--- a/lib/librte_gro/rte_gro.h
+++ b/lib/librte_gro/rte_gro.h
@@ -45,7 +45,11 @@ extern "C" {
 
 /* max number of supported GRO types */
 #define GRO_TYPE_MAX_NUM 64
-#define GRO_TYPE_SUPPORT_NUM 0	/**< current supported GRO num */
+#define GRO_TYPE_SUPPORT_NUM 1	/**< current supported GRO num */
+
+/* TCP/IPv4 GRO flag */
+#define GRO_TCP_IPV4_INDEX 0
+#define GRO_TCP_IPV4 (1ULL << GRO_TCP_IPV4_INDEX)
 
 /**
  * GRO table, which is used to merge packets. It keeps many reassembly
diff --git a/lib/librte_gro/rte_gro_tcp.c b/lib/librte_gro/rte_gro_tcp.c
new file mode 100644
index 0000000..c0eaa45
--- /dev/null
+++ b/lib/librte_gro/rte_gro_tcp.c
@@ -0,0 +1,394 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+#include <rte_cycles.h>
+
+#include <rte_ethdev.h>
+#include <rte_ip.h>
+#include <rte_tcp.h>
+
+#include "rte_gro_tcp.h"
+
+void *gro_tcp_tbl_create(uint16_t socket_id,
+		uint16_t max_flow_num,
+		uint16_t max_item_per_flow)
+{
+	size_t size;
+	uint32_t entries_num;
+	struct gro_tcp_tbl *tbl;
+
+	entries_num = max_flow_num * max_item_per_flow;
+	entries_num = entries_num > GRO_TCP_TBL_MAX_ITEM_NUM ?
+		GRO_TCP_TBL_MAX_ITEM_NUM : entries_num;
+
+	if (entries_num == 0)
+		return NULL;
+
+	tbl = (struct gro_tcp_tbl *)rte_zmalloc_socket(
+			__func__,
+			sizeof(struct gro_tcp_tbl),
+			RTE_CACHE_LINE_SIZE,
+			socket_id);
+	if (tbl == NULL)
+		return NULL;
+
+	size = sizeof(struct gro_tcp_item) * entries_num;
+	tbl->items = (struct gro_tcp_item *)rte_zmalloc_socket(
+			__func__,
+			size,
+			RTE_CACHE_LINE_SIZE,
+			socket_id);
+	if (tbl->items == NULL) {
+		rte_free(tbl);
+		return NULL;
+	}
+	tbl->max_item_num = entries_num;
+
+	size = sizeof(struct gro_tcp_key) * entries_num;
+	tbl->keys = (struct gro_tcp_key *)rte_zmalloc_socket(
+			__func__,
+			size, RTE_CACHE_LINE_SIZE,
+			socket_id);
+	if (tbl->keys == NULL) {
+		rte_free(tbl->items);
+		rte_free(tbl);
+		return NULL;
+	}
+	tbl->max_key_num = entries_num;
+	return tbl;
+}
+
+void gro_tcp_tbl_destroy(void *tbl)
+{
+	struct gro_tcp_tbl *tcp_tbl = (struct gro_tcp_tbl *)tbl;
+
+	if (tcp_tbl) {
+		if (tcp_tbl->items)
+			rte_free(tcp_tbl->items);
+		if (tcp_tbl->keys)
+			rte_free(tcp_tbl->keys);
+		rte_free(tcp_tbl);
+	}
+}
+
+/**
+ * merge two TCP/IPv4 packets without updating checksums.
+ */
+static int
+merge_two_tcp4_packets(struct rte_mbuf *pkt_src,
+		struct rte_mbuf *pkt,
+		uint32_t max_packet_size)
+{
+	struct ipv4_hdr *ipv4_hdr1, *ipv4_hdr2;
+	struct tcp_hdr *tcp_hdr1;
+	uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1;
+	struct rte_mbuf *tail;
+
+	/* parse the given packet */
+	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
+				struct ether_hdr *) + 1);
+	ipv4_ihl1 = IPv4_HDR_LEN(ipv4_hdr1);
+	tcp_hdr1 = (struct tcp_hdr *)((char *)ipv4_hdr1 + ipv4_ihl1);
+	tcp_hl1 = TCP_HDR_LEN(tcp_hdr1);
+	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) - ipv4_ihl1
+		- tcp_hl1;
+
+	/* parse the original packet */
+	ipv4_hdr2 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt_src,
+				struct ether_hdr *) + 1);
+
+	if (pkt_src->pkt_len + tcp_dl1 > max_packet_size)
+		return -1;
+
+	/* remove the header of the incoming packet */
+	rte_pktmbuf_adj(pkt, sizeof(struct ether_hdr) +
+			ipv4_ihl1 + tcp_hl1);
+
+	/* chain the two packets together */
+	tail = rte_pktmbuf_lastseg(pkt_src);
+	tail->next = pkt;
+
+	/* update IP header */
+	ipv4_hdr2->total_length = rte_cpu_to_be_16(
+			rte_be_to_cpu_16(
+				ipv4_hdr2->total_length)
+			+ tcp_dl1);
+
+	/* update mbuf metadata for the merged packet */
+	pkt_src->nb_segs++;
+	pkt_src->pkt_len += pkt->pkt_len;
+	return 1;
+}
+
+static int
+check_seq_option(struct rte_mbuf *pkt,
+		struct tcp_hdr *tcp_hdr,
+		uint16_t tcp_hl)
+{
+	struct ipv4_hdr *ipv4_hdr1;
+	struct tcp_hdr *tcp_hdr1;
+	uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1;
+	uint32_t sent_seq1, sent_seq;
+	int ret = -1;
+
+	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
+				struct ether_hdr *) + 1);
+	ipv4_ihl1 = IPv4_HDR_LEN(ipv4_hdr1);
+	tcp_hdr1 = (struct tcp_hdr *)((char *)ipv4_hdr1 + ipv4_ihl1);
+	tcp_hl1 = TCP_HDR_LEN(tcp_hdr1);
+	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) - ipv4_ihl1
+		- tcp_hl1;
+	sent_seq1 = rte_be_to_cpu_32(tcp_hdr1->sent_seq) + tcp_dl1;
+	sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq);
+
+	/* check if the two packets are neighbors */
+	if (sent_seq == sent_seq1) {
+		/* check if the TCP option fields are equal */
+		if (tcp_hl1 == tcp_hl) {
+			if ((tcp_hl1 == sizeof(struct tcp_hdr)) ||
+					(memcmp(tcp_hdr1 + 1,
+							tcp_hdr + 1,
+							tcp_hl - sizeof
+							(struct tcp_hdr))
+					 == 0))
+				ret = 1;
+		}
+	}
+	return ret;
+}
+
+static uint32_t
+find_an_empty_item(struct gro_tcp_tbl *tbl)
+{
+	uint32_t i;
+
+	for (i = 0; i < tbl->max_item_num; i++)
+		if (tbl->items[i].is_valid == 0)
+			return i;
+	return INVALID_ARRAY_INDEX;
+}
+
+static uint32_t
+find_an_empty_key(struct gro_tcp_tbl *tbl)
+{
+	uint32_t i;
+
+	for (i = 0; i < tbl->max_key_num; i++)
+		if (tbl->keys[i].is_valid == 0)
+			return i;
+	return INVALID_ARRAY_INDEX;
+}
+
+int32_t
+gro_tcp4_reassemble(struct rte_mbuf *pkt,
+		struct gro_tcp_tbl *tbl,
+		uint32_t max_packet_size)
+{
+	struct ether_hdr *eth_hdr;
+	struct ipv4_hdr *ipv4_hdr;
+	struct tcp_hdr *tcp_hdr;
+	uint16_t ipv4_ihl, tcp_hl, tcp_dl;
+
+	struct tcp_key key;
+	uint32_t cur_idx, prev_idx, item_idx;
+	uint32_t i, key_idx;
+
+	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
+	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
+	ipv4_ihl = IPv4_HDR_LEN(ipv4_hdr);
+
+	/* check if the packet should be processed */
+	if (ipv4_ihl < sizeof(struct ipv4_hdr))
+		goto fail;
+	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl);
+	tcp_hl = TCP_HDR_LEN(tcp_hdr);
+	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - ipv4_ihl
+		- tcp_hl;
+	if (tcp_dl == 0)
+		goto fail;
+
+	/* find a key and traverse all packets in its item group */
+	key.eth_saddr = eth_hdr->s_addr;
+	key.eth_daddr = eth_hdr->d_addr;
+	key.ip_src_addr[0] = rte_be_to_cpu_32(ipv4_hdr->src_addr);
+	key.ip_dst_addr[0] = rte_be_to_cpu_32(ipv4_hdr->dst_addr);
+	key.src_port = rte_be_to_cpu_16(tcp_hdr->src_port);
+	key.dst_port = rte_be_to_cpu_16(tcp_hdr->dst_port);
+	key.recv_ack = rte_be_to_cpu_32(tcp_hdr->recv_ack);
+	key.tcp_flags = tcp_hdr->tcp_flags;
+
+	for (i = 0; i < tbl->max_key_num; i++) {
+		if (tbl->keys[i].is_valid &&
+				(memcmp(&(tbl->keys[i].key), &key,
+						sizeof(struct tcp_key))
+				 == 0)) {
+			cur_idx = tbl->keys[i].start_index;
+			prev_idx = cur_idx;
+			while (cur_idx != INVALID_ARRAY_INDEX) {
+				if (check_seq_option(tbl->items[cur_idx].pkt,
+							tcp_hdr,
+							tcp_hl) > 0) {
+					if (merge_two_tcp4_packets(
+								tbl->items[cur_idx].pkt,
+								pkt,
+								max_packet_size) > 0) {
+						/* successfully merge two packets */
+						tbl->items[cur_idx].is_groed = 1;
+						return 1;
+					}
+					/**
+					 * fail to merge two packets since
+					 * it's beyond the max packet length.
+					 * Insert it into the item group.
+					 */
+					goto insert_to_item_group;
+				} else {
+					prev_idx = cur_idx;
+					cur_idx = tbl->items[cur_idx].next_pkt_idx;
+				}
+			}
+			/**
+			 * find a corresponding item group but fails to find
+			 * one packet to merge. Insert it into this item group.
+			 */
+insert_to_item_group:
+			item_idx = find_an_empty_item(tbl);
+			/* the item number is greater than the max value */
+			if (item_idx == INVALID_ARRAY_INDEX)
+				return -1;
+			tbl->items[prev_idx].next_pkt_idx = item_idx;
+			tbl->items[item_idx].pkt = pkt;
+			tbl->items[item_idx].is_groed = 0;
+			tbl->items[item_idx].next_pkt_idx = INVALID_ARRAY_INDEX;
+			tbl->items[item_idx].is_valid = 1;
+			tbl->items[item_idx].start_time = rte_rdtsc();
+			tbl->item_num++;
+			return 0;
+		}
+	}
+
+	/**
+	 * merge fail as the given packet has a new key.
+	 * So insert a new key.
+	 */
+	item_idx = find_an_empty_item(tbl);
+	key_idx = find_an_empty_key(tbl);
+	/**
+	 * if current key or item number is greater than the max
+	 * value, don't insert the packet into the table and return
+	 * immediately.
+	 */
+	if (item_idx == INVALID_ARRAY_INDEX ||
+			key_idx == INVALID_ARRAY_INDEX)
+		return -1;
+	tbl->items[item_idx].pkt = pkt;
+	tbl->items[item_idx].next_pkt_idx = INVALID_ARRAY_INDEX;
+	tbl->items[item_idx].is_groed = 0;
+	tbl->items[item_idx].is_valid = 1;
+	tbl->items[item_idx].start_time = rte_rdtsc();
+	tbl->item_num++;
+
+	memcpy(&(tbl->keys[key_idx].key),
+			&key, sizeof(struct tcp_key));
+	tbl->keys[key_idx].start_index = item_idx;
+	tbl->keys[key_idx].is_valid = 1;
+	tbl->key_num++;
+
+	return 0;
+fail:
+	return -1;
+}
+
+uint16_t gro_tcp_tbl_flush(struct gro_tcp_tbl *tbl,
+		struct rte_mbuf **out,
+		const uint16_t nb_out)
+{
+	uint32_t i, num = 0;
+
+	if (nb_out < tbl->item_num)
+		return 0;
+
+	for (i = 0; i < tbl->max_item_num; i++) {
+		if (tbl->items[i].is_valid) {
+			out[num++] = tbl->items[i].pkt;
+			tbl->items[i].is_valid = 0;
+			tbl->item_num--;
+		}
+	}
+	memset(tbl->keys, 0, sizeof(struct gro_tcp_key) *
+			tbl->max_key_num);
+	tbl->key_num = 0;
+
+	return num;
+}
+
+uint16_t
+gro_tcp_tbl_timeout_flush(struct gro_tcp_tbl *tbl,
+		uint64_t timeout_cycles,
+		struct rte_mbuf **out,
+		const uint16_t nb_out)
+{
+	uint16_t k;
+	uint32_t i, j;
+	uint64_t current_time;
+
+	if (nb_out == 0)
+		return 0;
+	k = 0;
+	current_time = rte_rdtsc();
+
+	for (i = 0; i < tbl->max_key_num; i++) {
+		if (tbl->keys[i].is_valid) {
+			j = tbl->keys[i].start_index;
+			while (j != INVALID_ARRAY_INDEX) {
+				if (current_time - tbl->items[j].start_time <
+						timeout_cycles) {
+					/* this packet hasn't timed out. The
+					 * remaining packets in the item group
+					 * are newer, so keep them and update
+					 * the first item index.
+					 */
+					tbl->keys[i].start_index = j;
+					break;
+				}
+				out[k++] = tbl->items[j].pkt;
+				tbl->items[j].is_valid = 0;
+				tbl->item_num--;
+				j = tbl->items[j].next_pkt_idx;
+
+				if (k == nb_out &&
+						j == INVALID_ARRAY_INDEX) {
+					/* delete the key */
+					tbl->keys[i].is_valid = 0;
+					tbl->key_num--;
+					goto end;
+				} else if (k == nb_out &&
+						j != INVALID_ARRAY_INDEX) {
+					/* update the first item index */
+					tbl->keys[i].start_index = j;
+					goto end;
+				}
+			}
+			if (j == INVALID_ARRAY_INDEX) {
+				/* delete the key, as all of its packets
+				 * are flushed
+				 */
+				tbl->keys[i].is_valid = 0;
+				tbl->key_num--;
+			}
+		}
+		if (tbl->key_num == 0)
+			goto end;
+	}
+end:
+	return k;
+}
diff --git a/lib/librte_gro/rte_gro_tcp.h b/lib/librte_gro/rte_gro_tcp.h
new file mode 100644
index 0000000..a9a7aca
--- /dev/null
+++ b/lib/librte_gro/rte_gro_tcp.h
@@ -0,0 +1,191 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_GRO_TCP_H_
+#define _RTE_GRO_TCP_H_
+
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+#define TCP_HDR_LEN(tcph) \
+	((tcph->data_off >> 4) * 4)
+#define IPv4_HDR_LEN(iph) \
+	((iph->version_ihl & 0x0f) * 4)
+#else
+#define TCP_DATAOFF_MASK 0x0f
+#define TCP_HDR_LEN(tcph) \
+	((tcph->data_off & TCP_DATAOFF_MASK) * 4)
+#define IPv4_HDR_LEN(iph) \
+	((iph->version_ihl >> 4) * 4)
+#endif
+
+#define INVALID_ARRAY_INDEX 0xffffffffUL
+#define GRO_TCP_TBL_MAX_ITEM_NUM (UINT32_MAX - 1)
+
+/* criteria for merging packets */
+struct tcp_key {
+	struct ether_addr eth_saddr;
+	struct ether_addr eth_daddr;
+	uint32_t ip_src_addr[4];	/**< IPv4 uses the first 4B */
+	uint32_t ip_dst_addr[4];
+
+	uint32_t recv_ack;	/**< acknowledgment sequence number. */
+	uint16_t src_port;
+	uint16_t dst_port;
+	uint8_t tcp_flags;	/**< TCP flags. */
+};
+
+struct gro_tcp_key {
+	struct tcp_key key;
+	uint32_t start_index;	/**< the first packet index of the flow */
+	uint8_t is_valid;
+};
+
+struct gro_tcp_item {
+	struct rte_mbuf *pkt;	/**< packet address. */
+	/* the time when the packet is added into the table */
+	uint64_t start_time;
+	uint32_t next_pkt_idx;	/**< next packet index. */
+	/* flag to indicate if the packet is GROed */
+	uint8_t is_groed;
+	uint8_t is_valid;	/**< flag indicates if the item is valid */
+};
+
+/**
+ * TCP reassembly table. Both TCP/IPv4 and TCP/IPv6 use the same table
+ * structure.
+ */
+struct gro_tcp_tbl {
+	struct gro_tcp_item *items;	/**< item array */
+	struct gro_tcp_key *keys;	/**< key array */
+	uint32_t item_num;	/**< current item number */
+	uint32_t key_num;	/**< current key num */
+	uint32_t max_item_num;	/**< item array size */
+	uint32_t max_key_num;	/**< key array size */
+};
+
+/**
+ * This function creates a TCP reassembly table.
+ *
+ * @param socket_id
+ *  socket index where the Ethernet port connects to.
+ * @param max_flow_num
+ *  the maximum number of flows in the TCP GRO table
+ * @param max_item_per_flow
+ *  the maximum packet number per flow.
+ * @return
+ *  if create successfully, return a pointer which points to the
+ *  created TCP GRO table. Otherwise, return NULL.
+ */
+void *gro_tcp_tbl_create(uint16_t socket_id,
+		uint16_t max_flow_num,
+		uint16_t max_item_per_flow);
+
+/**
+ * This function destroys a TCP reassembly table.
+ * @param tbl
+ *  a pointer points to the TCP reassembly table.
+ */
+void gro_tcp_tbl_destroy(void *tbl);
+
+/**
+ * This function searches for a packet in the TCP reassembly table to
+ * merge with the input one. To merge two packets is to chain them
+ * together and update packet headers. If the packet is without data
+ * (e.g. a SYN or SYN-ACK packet), this function returns immediately.
+ * Otherwise, the packet is either merged, or inserted into the table.
+ * Likewise, if there is no space available to insert the packet, this
+ * function returns immediately.
+ *
+ * This function assumes the input packet has correct IPv4 and
+ * TCP checksums, and if two packets are merged, it won't re-calculate
+ * IPv4 and TCP checksums. Besides, if the input packet is IP
+ * fragmented, it assumes the packet is complete (i.e. with the TCP header).
+ *
+ * @param pkt
+ *  packet to reassemble.
+ * @param tbl
+ *  a pointer that points to a TCP reassembly table.
+ * @param max_packet_size
+ *  max packet length after merged
+ * @return
+ *  if the packet doesn't have data, or there is no space available
+ *  in the table to insert a new item or a new key, return a negative
+ *  value. If the packet is merged successfully, return a positive
+ *  value. If the packet is inserted into the table, return 0.
+ */
+int32_t
+gro_tcp4_reassemble(struct rte_mbuf *pkt,
+		struct gro_tcp_tbl *tbl,
+		uint32_t max_packet_size);
+
+/**
+ * This function flushes all packets in a TCP reassembly table to
+ * applications without updating checksums for merged packets.
+ * If the array which is used to keep flushed packets is not large
+ * enough, this function flushes nothing and returns 0 immediately.
+ *
+ * @param tbl
+ *  a pointer that points to a TCP GRO table.
+ * @param out
+ *  pointer array which is used to keep flushed packets. Applications
+ *  should guarantee it's large enough to hold all packets in the table.
+ * @param nb_out
+ *  the element number of out.
+ * @return
+ *  the number of flushed packets. If out is not large enough to hold
+ *  all packets in the table, return 0.
+ */
+uint16_t
+gro_tcp_tbl_flush(struct gro_tcp_tbl *tbl,
+		struct rte_mbuf **out,
+		const uint16_t nb_out);
+
+/**
+ * This function flushes timeout packets in a TCP reassembly table to
+ * applications without updating checksums for merged packets.
+ *
+ * @param tbl
+ *  a pointer that points to a TCP GRO table.
+ * @param timeout_cycles
+ *  the maximum time that packets can stay in the table.
+ * @param out
+ *  pointer array which is used to keep flushed packets.
+ * @param nb_out
+ *  the element number of out.
+ * @return
+ *  the number of packets that are returned.
+ */
+uint16_t
+gro_tcp_tbl_timeout_flush(struct gro_tcp_tbl *tbl,
+		uint64_t timeout_cycles,
+		struct rte_mbuf **out,
+		const uint16_t nb_out);
+#endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 141+ messages in thread

* [PATCH v7 3/3] app/testpmd: enable TCP/IPv4 GRO
  2017-06-26  6:43           ` [PATCH v7 " Jiayu Hu
  2017-06-26  6:43             ` [PATCH v7 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
  2017-06-26  6:43             ` [PATCH v7 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
@ 2017-06-26  6:43             ` Jiayu Hu
  2017-06-29 10:58             ` [PATCH v8 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
  3 siblings, 0 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-06-26  6:43 UTC (permalink / raw)
  To: dev
  Cc: jianfeng.tan, konstantin.ananyev, stephen, yliu, jingjing.wu,
	lei.a.yao, keith.wiles, tiwei.bie, Jiayu Hu

This patch enables the TCP/IPv4 GRO library in the csum forwarding
engine. By default, GRO is turned off. Users can use the command "gro
(on|off) (port_id)" to enable or disable GRO for a given port. If GRO
is enabled for a port, GRO is performed on all TCP/IPv4 packets
received from that port. Besides, users can set the max flow number and
the max packet number per flow with the command "gro set (max_flow_num)
(max_item_num_per_flow) (port_id)".

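For reference, a typical session might look as follows (port 0; the
numbers are arbitrary examples):

	testpmd> stop
	testpmd> gro set 32 64 0
	testpmd> gro on 0
	testpmd> start
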
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
---
 app/test-pmd/cmdline.c                      | 125 ++++++++++++++++++++++++++++
 app/test-pmd/config.c                       |  37 ++++++++
 app/test-pmd/csumonly.c                     |   5 ++
 app/test-pmd/testpmd.c                      |   3 +
 app/test-pmd/testpmd.h                      |  11 +++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  34 ++++++++
 6 files changed, 215 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index ff8ffd2..cb359e1 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -76,6 +76,7 @@
 #include <rte_devargs.h>
 #include <rte_eth_ctrl.h>
 #include <rte_flow.h>
+#include <rte_gro.h>
 
 #include <cmdline_rdline.h>
 #include <cmdline_parse.h>
@@ -419,6 +420,14 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"tso show (portid)"
 			"    Display the status of TCP Segmentation Offload.\n\n"
 
+			"gro (on|off) (port_id)"
+			"    Enable or disable Generic Receive Offload in io"
+			" forward engine.\n\n"
+
+			"gro set (max_flow_num) (max_item_num_per_flow) (port_id)\n"
+			"    Set max flow number and max packet number per-flow"
+			" for GRO.\n\n"
+
 			"set fwd (%s)\n"
 			"    Set packet forwarding mode.\n\n"
 
@@ -3827,6 +3836,120 @@ cmdline_parse_inst_t cmd_tunnel_tso_show = {
 	},
 };
 
+/* *** SET GRO FOR A PORT *** */
+struct cmd_gro_result {
+	cmdline_fixed_string_t cmd_keyword;
+	cmdline_fixed_string_t mode;
+	uint8_t port_id;
+};
+
+static void
+cmd_enable_gro_parsed(void *parsed_result,
+		__attribute__((unused)) struct cmdline *cl,
+		__attribute__((unused)) void *data)
+{
+	struct cmd_gro_result *res;
+
+	res = parsed_result;
+	setup_gro(res->mode, res->port_id);
+}
+
+cmdline_parse_token_string_t cmd_gro_keyword =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_result,
+			cmd_keyword, "gro");
+cmdline_parse_token_string_t cmd_gro_mode =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_result,
+			mode, "on#off");
+cmdline_parse_token_num_t cmd_gro_pid =
+	TOKEN_NUM_INITIALIZER(struct cmd_gro_result,
+			port_id, UINT8);
+
+cmdline_parse_inst_t cmd_enable_gro = {
+	.f = cmd_enable_gro_parsed,
+	.data = NULL,
+	.help_str = "gro (on|off) (port_id)",
+	.tokens = {
+		(void *)&cmd_gro_keyword,
+		(void *)&cmd_gro_mode,
+		(void *)&cmd_gro_pid,
+		NULL,
+	},
+};
+
+/* *** SET MAX FLOW NUMBER AND ITEM NUM PER FLOW FOR GRO *** */
+struct cmd_gro_set_result {
+	cmdline_fixed_string_t gro;
+	cmdline_fixed_string_t mode;
+	uint16_t flow_num;
+	uint16_t item_num_per_flow;
+	uint8_t port_id;
+};
+
+static void
+cmd_gro_set_parsed(void *parsed_result,
+		       __attribute__((unused)) struct cmdline *cl,
+		       __attribute__((unused)) void *data)
+{
+	struct cmd_gro_set_result *res = parsed_result;
+
+	if (port_id_is_invalid(res->port_id, ENABLED_WARN))
+		return;
+	if (test_done == 0) {
+		printf("Before set GRO flow_num and item_num_per_flow,"
+				" please stop forwarding first\n");
+		return;
+	}
+
+	if (!strcmp(res->mode, "set")) {
+		if (res->flow_num == 0)
+			printf("Invalid flow number. Revert to default value:"
+					" %u.\n", GRO_DEFAULT_FLOW_NUM);
+		else
+			gro_ports[res->port_id].param.max_flow_num =
+				res->flow_num;
+
+		if (res->item_num_per_flow == 0)
+			printf("Invalid item number per-flow. Revert"
+					" to default value:%u.\n",
+					GRO_DEFAULT_ITEM_NUM_PER_FLOW);
+		else
+			gro_ports[res->port_id].param.max_item_per_flow =
+				res->item_num_per_flow;
+	}
+}
+
+cmdline_parse_token_string_t cmd_gro_set_gro =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_set_result,
+				gro, "gro");
+cmdline_parse_token_string_t cmd_gro_set_mode =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_set_result,
+				mode, "set");
+cmdline_parse_token_num_t cmd_gro_set_flow_num =
+	TOKEN_NUM_INITIALIZER(struct cmd_gro_set_result,
+				flow_num, UINT16);
+cmdline_parse_token_num_t cmd_gro_set_item_num_per_flow =
+	TOKEN_NUM_INITIALIZER(struct cmd_gro_set_result,
+				item_num_per_flow, UINT16);
+cmdline_parse_token_num_t cmd_gro_set_portid =
+	TOKEN_NUM_INITIALIZER(struct cmd_gro_set_result,
+				port_id, UINT8);
+
+cmdline_parse_inst_t cmd_gro_set = {
+	.f = cmd_gro_set_parsed,
+	.data = NULL,
+	.help_str = "gro set <max_flow_num> <max_item_num_per_flow> "
+		"<port_id>: set max flow number and max packet number per-flow "
+		"for GRO",
+	.tokens = {
+		(void *)&cmd_gro_set_gro,
+		(void *)&cmd_gro_set_mode,
+		(void *)&cmd_gro_set_flow_num,
+		(void *)&cmd_gro_set_item_num_per_flow,
+		(void *)&cmd_gro_set_portid,
+		NULL,
+	},
+};
+
 /* *** ENABLE/DISABLE FLUSH ON RX STREAMS *** */
 struct cmd_set_flush_rx {
 	cmdline_fixed_string_t set;
@@ -13732,6 +13855,8 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_tso_show,
 	(cmdline_parse_inst_t *)&cmd_tunnel_tso_set,
 	(cmdline_parse_inst_t *)&cmd_tunnel_tso_show,
+	(cmdline_parse_inst_t *)&cmd_enable_gro,
+	(cmdline_parse_inst_t *)&cmd_gro_set,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_rx,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_tx,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index b0b340e..2a33a63 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -71,6 +71,7 @@
 #ifdef RTE_LIBRTE_BNXT_PMD
 #include <rte_pmd_bnxt.h>
 #endif
+#include <rte_gro.h>
 
 #include "testpmd.h"
 
@@ -2414,6 +2415,42 @@ set_tx_pkt_segments(unsigned *seg_lengths, unsigned nb_segs)
 	tx_pkt_nb_segs = (uint8_t) nb_segs;
 }
 
+void
+setup_gro(const char *mode, uint8_t port_id)
+{
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		printf("invalid port id %u\n", port_id);
+		return;
+	}
+	if (test_done == 0) {
+		printf("Before enable/disable GRO,"
+				" please stop forwarding first\n");
+		return;
+	}
+	if (strcmp(mode, "on") == 0) {
+		if (gro_ports[port_id].enable) {
+			printf("port %u has enabled GRO\n", port_id);
+			return;
+		}
+		gro_ports[port_id].enable = 1;
+		gro_ports[port_id].param.desired_gro_types = GRO_TCP_IPV4;
+		gro_ports[port_id].param.max_packet_size = UINT16_MAX;
+
+		if (gro_ports[port_id].param.max_flow_num == 0)
+			gro_ports[port_id].param.max_flow_num =
+				GRO_DEFAULT_FLOW_NUM;
+		if (gro_ports[port_id].param.max_item_per_flow == 0)
+			gro_ports[port_id].param.max_item_per_flow =
+				GRO_DEFAULT_ITEM_NUM_PER_FLOW;
+	} else {
+		if (gro_ports[port_id].enable == 0) {
+			printf("port %u has disabled GRO\n", port_id);
+			return;
+		}
+		gro_ports[port_id].enable = 0;
+	}
+}
+
 char*
 list_pkt_forwarding_modes(void)
 {
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 66fc9a0..430bd8b 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -71,6 +71,7 @@
 #include <rte_prefetch.h>
 #include <rte_string_fns.h>
 #include <rte_flow.h>
+#include <rte_gro.h>
 #include "testpmd.h"
 
 #define IP_DEFTTL  64   /* from RFC 1340. */
@@ -658,6 +659,10 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 				 nb_pkt_per_burst);
 	if (unlikely(nb_rx == 0))
 		return;
+	if (unlikely(gro_ports[fs->rx_port].enable))
+		nb_rx = rte_gro_reassemble_burst(pkts_burst,
+				nb_rx,
+				gro_ports[fs->rx_port].param);
 
 #ifdef RTE_TEST_PMD_RECORD_BURST_STATS
 	fs->rx_burst_stats.pkt_burst_spread[nb_rx]++;
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index b29328a..ed27c7a 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -90,6 +90,7 @@
 #ifdef RTE_LIBRTE_LATENCY_STATS
 #include <rte_latencystats.h>
 #endif
+#include <rte_gro.h>
 
 #include "testpmd.h"
 
@@ -378,6 +379,8 @@ lcoreid_t bitrate_lcore_id;
 uint8_t bitrate_enabled;
 #endif
 
+struct gro_status gro_ports[RTE_MAX_ETHPORTS];
+
 /* Forward function declarations */
 static void map_port_queue_stats_mapping_registers(uint8_t pi, struct rte_port *port);
 static void check_all_ports_link_status(uint32_t port_mask);
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 364502d..377d933 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -34,6 +34,8 @@
 #ifndef _TESTPMD_H_
 #define _TESTPMD_H_
 
+#include <rte_gro.h>
+
 #define RTE_PORT_ALL            (~(portid_t)0x0)
 
 #define RTE_TEST_RX_DESC_MAX    2048
@@ -428,6 +430,14 @@ extern struct ether_addr peer_eth_addrs[RTE_MAX_ETHPORTS];
 extern uint32_t burst_tx_delay_time; /**< Burst tx delay time(us) for mac-retry. */
 extern uint32_t burst_tx_retry_num;  /**< Burst tx retry number for mac-retry. */
 
+#define GRO_DEFAULT_FLOW_NUM 4
+#define GRO_DEFAULT_ITEM_NUM_PER_FLOW DEF_PKT_BURST
+struct gro_status {
+	struct rte_gro_param param;
+	uint8_t enable;
+};
+extern struct gro_status gro_ports[RTE_MAX_ETHPORTS];
+
 static inline unsigned int
 lcore_num(void)
 {
@@ -626,6 +636,7 @@ void get_2tuple_filter(uint8_t port_id, uint16_t index);
 void get_5tuple_filter(uint8_t port_id, uint16_t index);
 int rx_queue_id_is_invalid(queueid_t rxq_id);
 int tx_queue_id_is_invalid(queueid_t txq_id);
+void setup_gro(const char *mode, uint8_t port_id);
 
 /* Functions to manage the set of filtered Multicast MAC addresses */
 void mcast_addr_add(uint8_t port_id, struct ether_addr *mc_addr);
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 2b9a1ea..528c833 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -884,6 +884,40 @@ Display the status of TCP Segmentation Offload::
 
    testpmd> tso show (port_id)
 
+gro
+~~~~~~~~
+
+Enable or disable GRO in ``csum`` forwarding engine::
+
+   testpmd> gro (on|off) (port_id)
+
+If enabled, the csum forwarding engine will perform GRO on the TCP/IPv4
+packets received from the given port.
+
+If disabled, GRO won't be performed on packets received from the given
+port. By default, GRO is disabled for all ports.
+
+.. note::
+
+   When GRO is enabled for a port, GRO will be performed on the TCP/IPv4
+   packets received from that port. After GRO, the merged packets are
+   multi-segment. But the csum forwarding engine doesn't support
+   calculating TCP checksums for multi-segment packets in SW. So please
+   select TCP HW checksum calculation for the port which the GROed
+   packets are transmitted to.
+
+gro set
+~~~~~~~~
+
+Set max flow number and max packet number per-flow for GRO::
+
+   testpmd> gro set (max_flow_num) (max_item_num_per_flow) (port_id)
+
+The product of ``max_flow_num`` and ``max_item_num_per_flow`` is the max
+number of packets a GRO table can store.
+
+If the current packet number is greater than or equal to the max value,
+GRO stops processing incoming packets.
+
 mac_addr add
 ~~~~~~~~~~~~
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 141+ messages in thread

* Re: [PATCH v7 1/3] lib: add Generic Receive Offload API framework
  2017-06-26  6:43             ` [PATCH v7 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
@ 2017-06-27 23:42               ` Ananyev, Konstantin
  2017-06-28  2:17                 ` Jiayu Hu
  0 siblings, 1 reply; 141+ messages in thread
From: Ananyev, Konstantin @ 2017-06-27 23:42 UTC (permalink / raw)
  To: Hu, Jiayu, dev
  Cc: Tan, Jianfeng, stephen, yliu, Wu, Jingjing, Yao, Lei A, Wiles,
	Keith, Bie, Tiwei

Hi Jiayu,

> -----Original Message-----
> From: Hu, Jiayu
> Sent: Monday, June 26, 2017 7:44 AM
> To: dev@dpdk.org
> Cc: Tan, Jianfeng <jianfeng.tan@intel.com>; Ananyev, Konstantin <konstantin.ananyev@intel.com>; stephen@networkplumber.org;
> yliu@fridaylinux.org; Wu, Jingjing <jingjing.wu@intel.com>; Yao, Lei A <lei.a.yao@intel.com>; Wiles, Keith <keith.wiles@intel.com>; Bie,
> Tiwei <tiwei.bie@intel.com>; Hu, Jiayu <jiayu.hu@intel.com>
> Subject: [PATCH v7 1/3] lib: add Generic Receive Offload API framework
> 
> Generic Receive Offload (GRO) is a widely used SW-based offloading
> technique to reduce per-packet processing overhead. It gains
> performance by reassembling small packets into large ones. This
> patchset is to support GRO in DPDK. To support GRO, this patch
> implements a GRO API framework.
> 
> To give more flexibility to applications, DPDK GRO is implemented as
> a user library. Applications explicitly use the GRO library to merge
> small packets into large ones. DPDK GRO provides two reassembly modes.
> One is called lightweight mode, the other is called heavyweight mode.
> If applications want to merge packets in a simple way and the number
> of packets is relatively small, they can use the lightweight mode.
> If applications need more fine-grained control, they can choose the
> heavyweight mode.
> 
> rte_gro_reassemble_burst is the main reassembly API which is used in
> lightweight mode and processes N packets at a time. For applications,
> performing GRO in lightweight mode is simple. They just need to invoke
> rte_gro_reassemble_burst. Applications can get GROed packets as soon as
> rte_gro_reassemble_burst returns.
> 
> rte_gro_reassemble is the main reassembly API which is used in
> heavyweight mode and processes one packet at a time. For applications,
> performing GRO in heavyweight mode is relatively complicated. Before
> performing GRO, applications need to create a GRO table by
> rte_gro_tbl_create. Then they can use rte_gro_reassemble to process
> packets one by one. The processed packets are in the GRO table. If
> applications want to get them, applications need to manually flush
> them by flush APIs.
> 
> Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
> ---
>  config/common_base                 |   5 +
>  lib/Makefile                       |   2 +
>  lib/librte_gro/Makefile            |  50 +++++++++
>  lib/librte_gro/rte_gro.c           | 125 ++++++++++++++++++++++
>  lib/librte_gro/rte_gro.h           | 205 +++++++++++++++++++++++++++++++++++++
>  lib/librte_gro/rte_gro_version.map |  12 +++
>  mk/rte.app.mk                      |   1 +
>  7 files changed, 400 insertions(+)
>  create mode 100644 lib/librte_gro/Makefile
>  create mode 100644 lib/librte_gro/rte_gro.c
>  create mode 100644 lib/librte_gro/rte_gro.h
>  create mode 100644 lib/librte_gro/rte_gro_version.map
> 
> diff --git a/config/common_base b/config/common_base
> index f6aafd1..167f5ef 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -712,6 +712,11 @@ CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
>  CONFIG_RTE_LIBRTE_PMD_VHOST=n
> 
>  #
> +# Compile GRO library
> +#
> +CONFIG_RTE_LIBRTE_GRO=y
> +
> +#
>  #Compile Xen domain0 support
>  #
>  CONFIG_RTE_LIBRTE_XEN_DOM0=n
> diff --git a/lib/Makefile b/lib/Makefile
> index 07e1fd0..ac1c2f6 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -106,6 +106,8 @@ DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
>  DEPDIRS-librte_reorder := librte_eal librte_mempool librte_mbuf
>  DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
>  DEPDIRS-librte_pdump := librte_eal librte_mempool librte_mbuf librte_ether
> +DIRS-$(CONFIG_RTE_LIBRTE_GRO) += librte_gro
> +DEPDIRS-librte_gro := librte_eal librte_mbuf librte_ether librte_net
> 
>  ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
>  DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
> diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
> new file mode 100644
> index 0000000..7e0f128
> --- /dev/null
> +++ b/lib/librte_gro/Makefile
> @@ -0,0 +1,50 @@
> +#   BSD LICENSE
> +#
> +#   Copyright(c) 2017 Intel Corporation. All rights reserved.
> +#   All rights reserved.
> +#
> +#   Redistribution and use in source and binary forms, with or without
> +#   modification, are permitted provided that the following conditions
> +#   are met:
> +#
> +#     * Redistributions of source code must retain the above copyright
> +#       notice, this list of conditions and the following disclaimer.
> +#     * Redistributions in binary form must reproduce the above copyright
> +#       notice, this list of conditions and the following disclaimer in
> +#       the documentation and/or other materials provided with the
> +#       distribution.
> +#     * Neither the name of Intel Corporation nor the names of its
> +#       contributors may be used to endorse or promote products derived
> +#       from this software without specific prior written permission.
> +#
> +#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> +#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> +#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> +#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> +#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> +#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> +#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> +#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> +#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> +#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> +#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> +
> +include $(RTE_SDK)/mk/rte.vars.mk
> +
> +# library name
> +LIB = librte_gro.a
> +
> +CFLAGS += -O3
> +CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
> +
> +EXPORT_MAP := rte_gro_version.map
> +
> +LIBABIVER := 1
> +
> +# source files
> +SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
> +
> +# install this header file
> +SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
> +
> +include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
> new file mode 100644
> index 0000000..33275e8
> --- /dev/null
> +++ b/lib/librte_gro/rte_gro.c
> @@ -0,0 +1,125 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#include <rte_malloc.h>
> +#include <rte_mbuf.h>
> +
> +#include "rte_gro.h"
> +
> +static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NUM];
> +static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NUM];
> +
> +struct rte_gro_tbl *rte_gro_tbl_create(uint16_t socket_id,
> +		uint16_t max_flow_num,
> +		uint16_t max_item_per_flow,
> +		uint32_t max_packet_size,
> +		uint64_t max_timeout_cycles,
> +		uint64_t desired_gro_types)
> +{
> +	gro_tbl_create_fn create_tbl_fn;
> +	struct rte_gro_tbl *gro_tbl;
> +	uint64_t gro_type_flag = 0;
> +	uint8_t i;
> +
> +	gro_tbl = rte_zmalloc_socket(__func__,
> +			sizeof(struct rte_gro_tbl),
> +			RTE_CACHE_LINE_SIZE,
> +			socket_id);
> +	gro_tbl->max_packet_size = max_packet_size;
> +	gro_tbl->max_timeout_cycles = max_timeout_cycles;
> +	gro_tbl->desired_gro_types = desired_gro_types;
> +
> +	for (i = 0; i < GRO_TYPE_MAX_NUM; i++) {
> +		gro_type_flag = 1 << i;
> +		if (desired_gro_types & gro_type_flag) {
> +			create_tbl_fn = tbl_create_functions[i];
> +			if (create_tbl_fn)
> +				create_tbl_fn(socket_id,
> +						max_flow_num,
> +						max_item_per_flow);

As I understand, create_tbl_fn() can fail.
You should handle such a situation correctly.

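Also, the value returned by create_tbl_fn() is never stored into
gro_tbl->tbls[i]. I'd expect something like (just a sketch):

	void *ret;

	ret = create_tbl_fn(socket_id, max_flow_num,
			max_item_per_flow);
	if (ret == NULL) {
		rte_gro_tbl_destroy(gro_tbl);
		return NULL;
	}
	gro_tbl->tbls[i] = ret;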

> +			else
> +				gro_tbl->tbls[i] = NULL;
> +		}
> +	}
> +	return gro_tbl;
> +}
> +
> +void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl)
> +{
> +	gro_tbl_destroy_fn destroy_tbl_fn;
> +	uint64_t gro_type_flag;
> +	uint8_t i;
> +
> +	if (gro_tbl == NULL)
> +		return;
> +	for (i = 0; i < GRO_TYPE_MAX_NUM; i++) {
> +		gro_type_flag = 1 << i;
> +		if (gro_tbl->desired_gro_types & gro_type_flag) {
> +			destroy_tbl_fn = tbl_destroy_functions[i];
> +			if (destroy_tbl_fn)
> +				destroy_tbl_fn(gro_tbl->tbls[i]);
> +			gro_tbl->tbls[i] = NULL;
> +		}
> +	}
> +	rte_free(gro_tbl);
> +}
> +
> +uint16_t
> +rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
> +		const uint16_t nb_pkts,
> +		const struct rte_gro_param param __rte_unused)
> +{
> +	return nb_pkts;
> +}
> +
> +int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
> +		struct rte_gro_tbl *gro_tbl __rte_unused)
> +{
> +	return -1;
> +}
> +
> +uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> +		uint64_t desired_gro_types __rte_unused,
> +		struct rte_mbuf **out __rte_unused,
> +		const uint16_t max_nb_out __rte_unused)
> +{
> +	return 0;
> +}
> +
> +uint16_t
> +rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> +		uint64_t desired_gro_types __rte_unused,
> +		struct rte_mbuf **out __rte_unused,
> +		const uint16_t max_nb_out __rte_unused)
> +{
> +	return 0;
> +}
> diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
> new file mode 100644
> index 0000000..f9d36e8
> --- /dev/null
> +++ b/lib/librte_gro/rte_gro.h
> @@ -0,0 +1,205 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef _RTE_GRO_H_
> +#define _RTE_GRO_H_
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +/**
> + * the max packets number that rte_gro_reassemble_burst can
> + * process in each invocation.
> + */
> +#define GRO_MAX_BURST_ITEM_NUM 1024UL
> +
> +/* max number of supported GRO types */
> +#define GRO_TYPE_MAX_NUM 64
> +#define GRO_TYPE_SUPPORT_NUM 0	/**< current supported GRO num */

Here and everywhere, public macros should start with the RTE_ prefix to
follow the DPDK coding style.

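I.e. something like:

	#define RTE_GRO_TYPE_MAX_NUM		64
	#define RTE_GRO_TYPE_SUPPORT_NUM	0
	#define RTE_GRO_MAX_BURST_ITEM_NUM	1024UL
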
> +
> +/**
> + * GRO table, which is used to merge packets. It keeps many reassembly
> + * tables of desired GRO types. Applications need to create GRO tables
> + * before using rte_gro_reassemble to perform GRO.
> + */
> +struct rte_gro_tbl {
> +	uint64_t desired_gro_types;	/**< GRO types to perform */
> +	/* max TTL measured in nanosecond */
> +	uint64_t max_timeout_cycles;
> +	/* max length of merged packet measured in byte */
> +	uint32_t max_packet_size;
> +	/* reassebly tables of desired GRO types */
> +	void *tbls[GRO_TYPE_MAX_NUM];
> +};

Not sure why you need to define that structure here.
As I understand, it is internal to the library.
A declaration alone should be enough.

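I.e. keep only an opaque declaration in the public header and move the
full definition into an internal header (a sketch; the internal header
name is just an example):

	/* rte_gro.h */
	struct rte_gro_tbl;

	/* full definition goes to e.g. gro_internal.h */
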
> +
> +struct rte_gro_param {
> +	uint64_t desired_gro_types;	/**< desired GRO types */
> +	uint32_t max_packet_size;	/**< max length of merged packets */
> +	uint16_t max_flow_num;	/**< max flow number */
> +	uint16_t max_item_per_flow;	/**< max packet number per flow */
> +};
> +
> +typedef void *(*gro_tbl_create_fn)(uint16_t socket_id,
> +		uint16_t max_flow_num,
> +		uint16_t max_item_per_flow);
> +typedef void (*gro_tbl_destroy_fn)(void *tbl);

Same here - users probably shouldn't see these typedefs,
so better to hide them inside the library.

> +
> +/**
> + * This function create a GRO table, which is used to merge packets.
> + *
> + * @param socket_id
> + *  socket index where the Ethernet port connects to.
> + * @param max_flow_num
> + *  max number of flows in the GRO table.
> + * @param max_item_per_flow
> + *  max packet number per flow. We use the value of (max_flow_num *
> + *  max_item_per_fow) to calculate table size.
> + * @param max_packet_size
> + *  max length of merged packets. Measured in byte.
> + * @param max_timeout_cycles
> + *  max TTL for a packet in the GRO table. It's measured in nanosecond.
> + * @param desired_gro_types
> + *  GRO types to perform.
> + * @return
> + *  if create successfully, return a pointer which points to the GRO
> + *  table. Otherwise, return NULL.
> + */
> +struct rte_gro_tbl *rte_gro_tbl_create(uint16_t socket_id,
> +		uint16_t max_flow_num,
> +		uint16_t max_item_per_flow,
> +		uint32_t max_packet_size,
> +		uint64_t max_timeout_cycles,
> +		uint64_t desired_gro_types);

Hm, couldn't we have a struct rte_gro_tbl_param * here instead of half a dozen arguments?

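E.g. (a sketch, with the fields copied from the current argument list):

	struct rte_gro_tbl_param {
		uint64_t desired_gro_types;
		uint64_t max_timeout_cycles;
		uint32_t max_packet_size;
		uint16_t socket_id;
		uint16_t max_flow_num;
		uint16_t max_item_per_flow;
	};

	struct rte_gro_tbl *rte_gro_tbl_create(
			const struct rte_gro_tbl_param *param);
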
> +/**
> + * This function destroys a GRO table.
> + */
> +void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl);
> +
> +/**
> + * This is one of the main reassembly APIs, which merges numbers of
> + * packets at a time. It assumes that all inputted packets are with
> + * correct checksums. That is, applications should guarantee all
> + * inputted packets are correct. Besides, it doesn't re-calculate
> + * checksums for merged packets. If inputted packets are IP fragmented,
> + * this function assumes them are complete (i.e. with L4 header). After
> + * finishing processing, it returns all GROed packets to  applications
> + * immediately.
> + *
> + * @param pkts
> + *  a pointer array which points to the packets to reassemble. Besides,
> + *  it keeps addresses of GROed packets.
> + * @param nb_pkts
> + *  the number of packets to reassemble.
> + * @param param
> + *  applications use it to tell rte_gro_reassemble_burst what rules
> + *  are demanded.
> + * @return
> + *  the number of packets after GROed.
> + */
> +uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts,
> +		const uint16_t nb_pkts,

Here and everywhere - not much point in defining an integer input parameter
(or any other that is passed by value) as const.

> +		const struct rte_gro_param param);

You probably meant 'const struct rte_gro_param *param' here?

> +
> +/**
> + * Reassembly function, which tries to merge the inputted packet with
> + * one packet in a given GRO table. This function assumes the inputted
> + * packet has correct checksums. And it won't update checksums if
> + * two packets are merged. Besides, if the inputted packet is IP
> + * fragmented, this function assumes it's a complete packet (i.e. with
> + * L4 header).
> + *
> + * If the inputted packet doesn't have data or it's with unsupported GRO
> + * type, function returns immediately. Otherwise, the inputted packet is
> + * either merged or inserted into the table. If applications want to get
> + * packets in the table, they need to call flush APIs.
> + *
> + * @param pkt
> + *  packet to reassemble.
> + * @param gro_tbl
> + *  a pointer points to a GRO table.
> + * @return
> + *  if the packet is merged successfully, return a positive value. If it
> + *  fails to merge, return zero. If the packet doesn't have data, or its GRO
> + *  type is unsupported, return a negative value.
> + */
> +int rte_gro_reassemble(struct rte_mbuf *pkt,
> +		struct rte_gro_tbl *gro_tbl);


Ok, and why can't the tbl one do bursts?


> +
> +/**
> + * This function flushes packets from reassembly tables of desired GRO
> + * types. It won't re-calculate checksums for merged packets in the
> + * tables. That is, the returned packets may be with wrong checksums.
> + *
> + * @param gro_tbl
> + *  a pointer points to a GRO table object.
> + * @param desired_gro_types
> + *  GRO types whose packets will be flushed.
> + * @param out
> + *  a pointer array that is used to keep flushed packets.
> + * @param nb_out
> + *  the size of out.
> + * @return
> + *  the number of flushed packets. If no packets are flushed, return 0.
> + */
> +uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl,
> +		uint64_t desired_gro_types,
> +		struct rte_mbuf **out,
> +		const uint16_t max_nb_out);
> +
> +/**
> + * This function flushes the timeout packets from reassembly tables of
> + * desired GRO types. It won't re-calculate checksums for merged packets
> + * in the tables. That is, the returned packets may be with wrong
> + * checksums.
> + *
> + * @param gro_tbl
> + *  a pointer points to a GRO table object.
> + * @param desired_gro_types
> + * rte_gro_timeout_flush only processes packets which belong to the
> + * GRO types specified by desired_gro_types.
> + * @param out
> + *  a pointer array that is used to keep flushed packets.
> + * @param nb_out
> + *  the size of out.
> + * @return
> + *  the number of flushed packets. If no packets are flushed, return 0.
> + */
> +uint16_t rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl,
> +		uint64_t desired_gro_types,
> +		struct rte_mbuf **out,
> +		const uint16_t max_nb_out);

No point in having 2 flush() functions.
I suggest merging them together.

> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif
> diff --git a/lib/librte_gro/rte_gro_version.map b/lib/librte_gro/rte_gro_version.map
> new file mode 100644
> index 0000000..827596b
> --- /dev/null
> +++ b/lib/librte_gro/rte_gro_version.map
> @@ -0,0 +1,12 @@
> +DPDK_17.08 {
> +	global:
> +
> +	rte_gro_tbl_create;
> +	rte_gro_tbl_destroy;
> +	rte_gro_reassemble_burst;
> +	rte_gro_reassemble;
> +	rte_gro_flush;
> +	rte_gro_timeout_flush;
> +
> +	local: *;
> +};
> diff --git a/mk/rte.app.mk b/mk/rte.app.mk
> index bcaf1b3..fc3776d 100644
> --- a/mk/rte.app.mk
> +++ b/mk/rte.app.mk
> @@ -98,6 +98,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_RING)           += -lrte_ring
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER)        += -lrte_reorder
> +_LDLIBS-$(CONFIG_RTE_LIBRTE_GRO)        	+= -lrte_gro
> 
>  ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_KNI)            += -lrte_kni
> --
> 2.7.4

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v7 1/3] lib: add Generic Receive Offload API framework
  2017-06-27 23:42               ` Ananyev, Konstantin
@ 2017-06-28  2:17                 ` Jiayu Hu
  2017-06-28 17:41                   ` Ananyev, Konstantin
  0 siblings, 1 reply; 141+ messages in thread
From: Jiayu Hu @ 2017-06-28  2:17 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: dev, Tan, Jianfeng, stephen, yliu, Wu, Jingjing, Yao, Lei A,
	Wiles, Keith, Bie, Tiwei

Hi Konstantin,

On Wed, Jun 28, 2017 at 07:42:01AM +0800, Ananyev, Konstantin wrote:
> Hi Jiayu,
> 
> > -----Original Message-----
> > From: Hu, Jiayu
> > Sent: Monday, June 26, 2017 7:44 AM
> > To: dev@dpdk.org
> > Cc: Tan, Jianfeng <jianfeng.tan@intel.com>; Ananyev, Konstantin <konstantin.ananyev@intel.com>; stephen@networkplumber.org;
> > yliu@fridaylinux.org; Wu, Jingjing <jingjing.wu@intel.com>; Yao, Lei A <lei.a.yao@intel.com>; Wiles, Keith <keith.wiles@intel.com>; Bie,
> > Tiwei <tiwei.bie@intel.com>; Hu, Jiayu <jiayu.hu@intel.com>
> > Subject: [PATCH v7 1/3] lib: add Generic Receive Offload API framework
> > 
> > Generic Receive Offload (GRO) is a widely used SW-based offloading
> > technique to reduce per-packet processing overhead. It gains
> > performance by reassembling small packets into large ones. This
> > patchset is to support GRO in DPDK. To support GRO, this patch
> > implements a GRO API framework.
> > 
> > To enable more flexibility to applications, DPDK GRO is implemented as
> > a user library. Applications explicitly use the GRO library to merge
> > small packets into large ones. DPDK GRO provides two reassembly modes.
> > One is called lightweight mode, the other is called heavyweight mode.
> > If applications want to merge packets in a simple way and the number
> > of packets is relatively small, they can use the lightweight mode.
> > If applications need more fine-grained control, they can choose the
> > heavyweight mode.
> > 
> > rte_gro_reassemble_burst is the main reassembly API which is used in
> > lightweight mode and processes N packets at a time. For applications,
> > performing GRO in lightweight mode is simple. They just need to invoke
> > rte_gro_reassemble_burst. Applications can get GROed packets as soon as
> > rte_gro_reassemble_burst returns.
> > 
> > rte_gro_reassemble is the main reassembly API which is used in
> > heavyweight mode and processes one packet at a time. For applications,
> > performing GRO in heavyweight mode is relatively complicated. Before
> > performing GRO, applications need to create a GRO table by
> > rte_gro_tbl_create. Then they can use rte_gro_reassemble to process
> > packets one by one. The processed packets are in the GRO table. If
> > applications want to get them, applications need to manually flush
> > them by flush APIs.
> > 
> > Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
> > ---
> >  config/common_base                 |   5 +
> >  lib/Makefile                       |   2 +
> >  lib/librte_gro/Makefile            |  50 +++++++++
> >  lib/librte_gro/rte_gro.c           | 125 ++++++++++++++++++++++
> >  lib/librte_gro/rte_gro.h           | 205 +++++++++++++++++++++++++++++++++++++
> >  lib/librte_gro/rte_gro_version.map |  12 +++
> >  mk/rte.app.mk                      |   1 +
> >  7 files changed, 400 insertions(+)
> >  create mode 100644 lib/librte_gro/Makefile
> >  create mode 100644 lib/librte_gro/rte_gro.c
> >  create mode 100644 lib/librte_gro/rte_gro.h
> >  create mode 100644 lib/librte_gro/rte_gro_version.map
> > 
> > diff --git a/config/common_base b/config/common_base
> > index f6aafd1..167f5ef 100644
> > --- a/config/common_base
> > +++ b/config/common_base
> > @@ -712,6 +712,11 @@ CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
> >  CONFIG_RTE_LIBRTE_PMD_VHOST=n
> > 
> >  #
> > +# Compile GRO library
> > +#
> > +CONFIG_RTE_LIBRTE_GRO=y
> > +
> > +#
> >  #Compile Xen domain0 support
> >  #
> >  CONFIG_RTE_LIBRTE_XEN_DOM0=n
> > diff --git a/lib/Makefile b/lib/Makefile
> > index 07e1fd0..ac1c2f6 100644
> > --- a/lib/Makefile
> > +++ b/lib/Makefile
> > @@ -106,6 +106,8 @@ DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
> >  DEPDIRS-librte_reorder := librte_eal librte_mempool librte_mbuf
> >  DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
> >  DEPDIRS-librte_pdump := librte_eal librte_mempool librte_mbuf librte_ether
> > +DIRS-$(CONFIG_RTE_LIBRTE_GRO) += librte_gro
> > +DEPDIRS-librte_gro := librte_eal librte_mbuf librte_ether librte_net
> > 
> >  ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
> >  DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
> > diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
> > new file mode 100644
> > index 0000000..7e0f128
> > --- /dev/null
> > +++ b/lib/librte_gro/Makefile
> > @@ -0,0 +1,50 @@
> > +#   BSD LICENSE
> > +#
> > +#   Copyright(c) 2017 Intel Corporation. All rights reserved.
> > +#   All rights reserved.
> > +#
> > +#   Redistribution and use in source and binary forms, with or without
> > +#   modification, are permitted provided that the following conditions
> > +#   are met:
> > +#
> > +#     * Redistributions of source code must retain the above copyright
> > +#       notice, this list of conditions and the following disclaimer.
> > +#     * Redistributions in binary form must reproduce the above copyright
> > +#       notice, this list of conditions and the following disclaimer in
> > +#       the documentation and/or other materials provided with the
> > +#       distribution.
> > +#     * Neither the name of Intel Corporation nor the names of its
> > +#       contributors may be used to endorse or promote products derived
> > +#       from this software without specific prior written permission.
> > +#
> > +#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > +#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > +#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > +#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > +#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > +#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > +#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > +#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > +#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > +#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > +#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > +
> > +include $(RTE_SDK)/mk/rte.vars.mk
> > +
> > +# library name
> > +LIB = librte_gro.a
> > +
> > +CFLAGS += -O3
> > +CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
> > +
> > +EXPORT_MAP := rte_gro_version.map
> > +
> > +LIBABIVER := 1
> > +
> > +# source files
> > +SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
> > +
> > +# install this header file
> > +SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
> > +
> > +include $(RTE_SDK)/mk/rte.lib.mk
> > diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
> > new file mode 100644
> > index 0000000..33275e8
> > --- /dev/null
> > +++ b/lib/librte_gro/rte_gro.c
> > @@ -0,0 +1,125 @@
> > +/*-
> > + *   BSD LICENSE
> > + *
> > + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> > + *
> > + *   Redistribution and use in source and binary forms, with or without
> > + *   modification, are permitted provided that the following conditions
> > + *   are met:
> > + *
> > + *     * Redistributions of source code must retain the above copyright
> > + *       notice, this list of conditions and the following disclaimer.
> > + *     * Redistributions in binary form must reproduce the above copyright
> > + *       notice, this list of conditions and the following disclaimer in
> > + *       the documentation and/or other materials provided with the
> > + *       distribution.
> > + *     * Neither the name of Intel Corporation nor the names of its
> > + *       contributors may be used to endorse or promote products derived
> > + *       from this software without specific prior written permission.
> > + *
> > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > + */
> > +
> > +#include <rte_malloc.h>
> > +#include <rte_mbuf.h>
> > +
> > +#include "rte_gro.h"
> > +
> > +static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NUM];
> > +static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NUM];
> > +
> > +struct rte_gro_tbl *rte_gro_tbl_create(uint16_t socket_id,
> > +		uint16_t max_flow_num,
> > +		uint16_t max_item_per_flow,
> > +		uint32_t max_packet_size,
> > +		uint64_t max_timeout_cycles,
> > +		uint64_t desired_gro_types)
> > +{
> > +	gro_tbl_create_fn create_tbl_fn;
> > +	struct rte_gro_tbl *gro_tbl;
> > +	uint64_t gro_type_flag = 0;
> > +	uint8_t i;
> > +
> > +	gro_tbl = rte_zmalloc_socket(__func__,
> > +			sizeof(struct rte_gro_tbl),
> > +			RTE_CACHE_LINE_SIZE,
> > +			socket_id);
> > +	gro_tbl->max_packet_size = max_packet_size;
> > +	gro_tbl->max_timeout_cycles = max_timeout_cycles;
> > +	gro_tbl->desired_gro_types = desired_gro_types;
> > +
> > +	for (i = 0; i < GRO_TYPE_MAX_NUM; i++) {
> > +		gro_type_flag = 1 << i;
> > +		if (desired_gro_types & gro_type_flag) {
> > +			create_tbl_fn = tbl_create_functions[i];
> > +			if (create_tbl_fn)
> > +				create_tbl_fn(socket_id,
> > +						max_flow_num,
> > +						max_item_per_flow);
> 
> As I understand, create_tbl_fn() can fail.
> You should handle such a situation correctly.

Thanks, I will add a failure check.
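
For instance (a rough sketch; it assumes the created table should also be
stored in tbls[i], and that everything allocated so far is rolled back on
failure):

	gro_tbl->tbls[i] = create_tbl_fn(socket_id,
			max_flow_num, max_item_per_flow);
	if (gro_tbl->tbls[i] == NULL) {
		/* creation failed: free the tables created so far */
		rte_gro_tbl_destroy(gro_tbl);
		return NULL;
	}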

> 
> 
> > +			else
> > +				gro_tbl->tbls[i] = NULL;
> > +		}
> > +	}
> > +	return gro_tbl;
> > +}
> > +
> > +void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl)
> > +{
> > +	gro_tbl_destroy_fn destroy_tbl_fn;
> > +	uint64_t gro_type_flag;
> > +	uint8_t i;
> > +
> > +	if (gro_tbl == NULL)
> > +		return;
> > +	for (i = 0; i < GRO_TYPE_MAX_NUM; i++) {
> > +		gro_type_flag = 1 << i;
> > +		if (gro_tbl->desired_gro_types & gro_type_flag) {
> > +			destroy_tbl_fn = tbl_destroy_functions[i];
> > +			if (destroy_tbl_fn)
> > +				destroy_tbl_fn(gro_tbl->tbls[i]);
> > +			gro_tbl->tbls[i] = NULL;
> > +		}
> > +	}
> > +	rte_free(gro_tbl);
> > +}
> > +
> > +uint16_t
> > +rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
> > +		const uint16_t nb_pkts,
> > +		const struct rte_gro_param param __rte_unused)
> > +{
> > +	return nb_pkts;
> > +}
> > +
> > +int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
> > +		struct rte_gro_tbl *gro_tbl __rte_unused)
> > +{
> > +	return -1;
> > +}
> > +
> > +uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> > +		uint64_t desired_gro_types __rte_unused,
> > +		struct rte_mbuf **out __rte_unused,
> > +		const uint16_t max_nb_out __rte_unused)
> > +{
> > +	return 0;
> > +}
> > +
> > +uint16_t
> > +rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> > +		uint64_t desired_gro_types __rte_unused,
> > +		struct rte_mbuf **out __rte_unused,
> > +		const uint16_t max_nb_out __rte_unused)
> > +{
> > +	return 0;
> > +}
> > diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
> > new file mode 100644
> > index 0000000..f9d36e8
> > --- /dev/null
> > +++ b/lib/librte_gro/rte_gro.h
> > @@ -0,0 +1,205 @@
> > +/*-
> > + *   BSD LICENSE
> > + *
> > + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> > + *
> > + *   Redistribution and use in source and binary forms, with or without
> > + *   modification, are permitted provided that the following conditions
> > + *   are met:
> > + *
> > + *     * Redistributions of source code must retain the above copyright
> > + *       notice, this list of conditions and the following disclaimer.
> > + *     * Redistributions in binary form must reproduce the above copyright
> > + *       notice, this list of conditions and the following disclaimer in
> > + *       the documentation and/or other materials provided with the
> > + *       distribution.
> > + *     * Neither the name of Intel Corporation nor the names of its
> > + *       contributors may be used to endorse or promote products derived
> > + *       from this software without specific prior written permission.
> > + *
> > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > + */
> > +
> > +#ifndef _RTE_GRO_H_
> > +#define _RTE_GRO_H_
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +/**
> > + * the max number of packets that rte_gro_reassemble_burst can
> > + * process in each invocation.
> > + */
> > +#define GRO_MAX_BURST_ITEM_NUM 1024UL
> > +
> > +/* max number of supported GRO types */
> > +#define GRO_TYPE_MAX_NUM 64
> > +#define GRO_TYPE_SUPPORT_NUM 0	/**< current supported GRO num */
> 
> Here and everywhere: public macros should start with the RTE_ prefix to follow
> the DPDK coding style.

Thanks, I will change the names.

> 
> > +
> > +/**
> > + * GRO table, which is used to merge packets. It keeps many reassembly
> > + * tables of desired GRO types. Applications need to create GRO tables
> > + * before using rte_gro_reassemble to perform GRO.
> > + */
> > +struct rte_gro_tbl {
> > +	uint64_t desired_gro_types;	/**< GRO types to perform */
> > +	/* max TTL measured in nanosecond */
> > +	uint64_t max_timeout_cycles;
> > +	/* max length of merged packet measured in byte */
> > +	uint32_t max_packet_size;
> > +	/* reassembly tables of desired GRO types */
> > +	void *tbls[GRO_TYPE_MAX_NUM];
> > +};
> 
> Not sure why you need to define that structure here.
> As I understand it, it is internal to the library.
> Just a declaration should be enough.

This structure defines a GRO table, which is used by rte_gro_reassemble
to merge packets. Applications need to create this table before calling
rte_gro_reassemble. So I define it in rte_gro.h.

> 
> > +
> > +struct rte_gro_param {
> > +	uint64_t desired_gro_types;	/**< desired GRO types */
> > +	uint32_t max_packet_size;	/**< max length of merged packets */
> > +	uint16_t max_flow_num;	/**< max flow number */
> > +	uint16_t max_item_per_flow;	/**< max packet number per flow */
> > +};
> > +
> > +typedef void *(*gro_tbl_create_fn)(uint16_t socket_id,
> > +		uint16_t max_flow_num,
> > +		uint16_t max_item_per_flow);
> > +typedef void (*gro_tbl_destroy_fn)(void *tbl);
> 
> Same here - the user probably shouldn't see these typedefs,
> so better to hide them inside.

Thanks, I will hide them inside.

> 
> > +
> > +/**
> > + * This function creates a GRO table, which is used to merge packets.
> > + *
> > + * @param socket_id
> > + *  socket index where the Ethernet port connects to.
> > + * @param max_flow_num
> > + *  max number of flows in the GRO table.
> > + * @param max_item_per_flow
> > + *  max packet number per flow. We use the value of (max_flow_num *
> > + *  max_item_per_flow) to calculate table size.
> > + * @param max_packet_size
> > + *  max length of merged packets. Measured in byte.
> > + * @param max_timeout_cycles
> > + *  max TTL for a packet in the GRO table. It's measured in nanosecond.
> > + * @param desired_gro_types
> > + *  GRO types to perform.
> > + * @return
> > + *  if create successfully, return a pointer which points to the GRO
> > + *  table. Otherwise, return NULL.
> > + */
> > +struct rte_gro_tbl *rte_gro_tbl_create(uint16_t socket_id,
> > +		uint16_t max_flow_num,
> > +		uint16_t max_item_per_flow,
> > +		uint32_t max_packet_size,
> > +		uint64_t max_timeout_cycles,
> > +		uint64_t desired_gro_types);
> 
> Hm, couldn't we have a struct rte_gro_tbl_param * here instead of a dozen arguments?

Thanks, I will change it.

> 
> > +/**
> > + * This function destroys a GRO table.
> > + */
> > +void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl);
> > +
> > +/**
> > + * This is one of the main reassembly APIs, which merges numbers of
> > + * packets at a time. It assumes that all inputted packets have
> > + * correct checksums. That is, applications should guarantee all
> > + * inputted packets are correct. Besides, it doesn't re-calculate
> > + * checksums for merged packets. If inputted packets are IP fragmented,
> > + * this function assumes they are complete (i.e. with L4 header). After
> > + * finishing processing, it returns all GROed packets to applications
> > + * immediately.
> > + *
> > + * @param pkts
> > + *  a pointer array which points to the packets to reassemble. Besides,
> > + *  it keeps addresses of GROed packets.
> > + * @param nb_pkts
> > + *  the number of packets to reassemble.
> > + * @param param
> > + *  applications use it to tell rte_gro_reassemble_burst what rules
> > + *  are demanded.
> > + * @return
> > + *  the number of packets after GROed.
> > + */
> > +uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts,
> > +		const uint16_t nb_pkts,
> 
> Here and everywhere - not much point in defining an integer input parameter
> (or any other that is passed by value) as const.

Thanks. I will modify it.

> 
> > +		const struct rte_gro_param param);
> 
> You probably meant 'const struct rte_gro_param *param' here?

'const struct rte_gro_param *param' is better. I will modify it.
Thanks.

> 
> > +
> > +/**
> > + * Reassembly function, which tries to merge the inputted packet with
> > + * one packet in a given GRO table. This function assumes the inputted
> > + * packet is with correct checksums. And it won't update checksums if
> > + * two packets are merged. Besides, if the inputted packet is IP
> > + * fragmented, this function assumes it's a complete packet (i.e. with
> > + * L4 header).
> > + *
> > + * If the inputted packet doesn't have data or it's with unsupported GRO
> > + * type, function returns immediately. Otherwise, the inputted packet is
> > + * either merged or inserted into the table. If applications want to get
> > + * packets in the table, they need to call flush APIs.
> > + *
> > + * @param pkt
> > + *  packet to reassemble.
> > + * @param gro_tbl
> > + *  a pointer points to a GRO table.
> > + * @return
> > + *  if the packet is merged successfully, return a positive value. If it
> > + *  fails to merge, return zero. If the packet doesn't have data, or its GRO
> > + *  type is unsupported, return a negative value.
> > + */
> > +int rte_gro_reassemble(struct rte_mbuf *pkt,
> > +		struct rte_gro_tbl *gro_tbl);
> 
> 
> Ok, and why can't the tbl one do bursts?

In the current design, if applications want to do bursts, they don't need
to create a gro_tbl. rte_gro_reassemble_burst will create a temporary table
on the stack. So when doing bursts (we call it lightweight mode), the
operation for applications is very simple: calling rte_gro_reassemble_burst.
And after rte_gro_reassemble_burst returns, applications can get all merged
packets. rte_gro_reassemble is the API for the other mode, called heavyweight
mode. The gro_tbl is only used by rte_gro_reassemble, which processes one
packet at a time.

So you mean: we should enable rte_gro_reassemble to merge N inputted
packets with the packets in a given gro_tbl?

> 
> 
> > +
> > +/**
> > + * This function flushes packets from reassembly tables of desired GRO
> > + * types. It won't re-calculate checksums for merged packets in the
> > + * tables. That is, the returned packets may be with wrong checksums.
> > + *
> > + * @param gro_tbl
> > + *  a pointer points to a GRO table object.
> > + * @param desired_gro_types
> > + *  GRO types whose packets will be flushed.
> > + * @param out
> > + *  a pointer array that is used to keep flushed packets.
> > + * @param nb_out
> > + *  the size of out.
> > + * @return
> > + *  the number of flushed packets. If no packets are flushed, return 0.
> > + */
> > +uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl,
> > +		uint64_t desired_gro_types,
> > +		struct rte_mbuf **out,
> > +		const uint16_t max_nb_out);
> > +
> > +/**
> > + * This function flushes the timeout packets from reassembly tables of
> > + * desired GRO types. It won't re-calculate checksums for merged packets
> > + * in the tables. That is, the returned packets may be with wrong
> > + * checksums.
> > + *
> > + * @param gro_tbl
> > + *  a pointer points to a GRO table object.
> > + * @param desired_gro_types
> > + * rte_gro_timeout_flush only processes packets which belong to the
> > + * GRO types specified by desired_gro_types.
> > + * @param out
> > + *  a pointer array that is used to keep flushed packets.
> > + * @param nb_out
> > + *  the size of out.
> > + * @return
> > + *  the number of flushed packets. If no packets are flushed, return 0.
> > + */
> > +uint16_t rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl,
> > +		uint64_t desired_gro_types,
> > +		struct rte_mbuf **out,
> > +		const uint16_t max_nb_out);
> 
> No point in having 2 flush() functions.
> I suggest merging them together.

rte_gro_flush flushes all packets from the table, but rte_gro_timeout_flush
only flushes timeout packets. They perform different operations. But if we
merge them together, should we flush all packets or only the timeout ones?

> 
> > +#ifdef __cplusplus
> > +}
> > +#endif
> > +
> > +#endif
> > diff --git a/lib/librte_gro/rte_gro_version.map b/lib/librte_gro/rte_gro_version.map
> > new file mode 100644
> > index 0000000..827596b
> > --- /dev/null
> > +++ b/lib/librte_gro/rte_gro_version.map
> > @@ -0,0 +1,12 @@
> > +DPDK_17.08 {
> > +	global:
> > +
> > +	rte_gro_tbl_create;
> > +	rte_gro_tbl_destroy;
> > +	rte_gro_reassemble_burst;
> > +	rte_gro_reassemble;
> > +	rte_gro_flush;
> > +	rte_gro_timeout_flush;
> > +
> > +	local: *;
> > +};
> > diff --git a/mk/rte.app.mk b/mk/rte.app.mk
> > index bcaf1b3..fc3776d 100644
> > --- a/mk/rte.app.mk
> > +++ b/mk/rte.app.mk
> > @@ -98,6 +98,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_RING)           += -lrte_ring
> >  _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
> >  _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
> >  _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER)        += -lrte_reorder
> > +_LDLIBS-$(CONFIG_RTE_LIBRTE_GRO)        	+= -lrte_gro
> > 
> >  ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
> >  _LDLIBS-$(CONFIG_RTE_LIBRTE_KNI)            += -lrte_kni
> > --
> > 2.7.4

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v7 1/3] lib: add Generic Receive Offload API framework
  2017-06-28  2:17                 ` Jiayu Hu
@ 2017-06-28 17:41                   ` Ananyev, Konstantin
  2017-06-29  1:19                     ` Jiayu Hu
  0 siblings, 1 reply; 141+ messages in thread
From: Ananyev, Konstantin @ 2017-06-28 17:41 UTC (permalink / raw)
  To: Hu, Jiayu
  Cc: dev, Tan, Jianfeng, stephen, yliu, Wu, Jingjing, Yao, Lei A,
	Wiles, Keith, Bie, Tiwei


Hi Jiayu,

> 
> >
> > > +
> > > +/**
> > > + * GRO table, which is used to merge packets. It keeps many reassembly
> > > + * tables of desired GRO types. Applications need to create GRO tables
> > > + * before using rte_gro_reassemble to perform GRO.
> > > + */
> > > +struct rte_gro_tbl {
> > > +	uint64_t desired_gro_types;	/**< GRO types to perform */
> > > +	/* max TTL measured in nanosecond */
> > > +	uint64_t max_timeout_cycles;
> > > +	/* max length of merged packet measured in byte */
> > > +	uint32_t max_packet_size;
> > > +	/* reassembly tables of desired GRO types */
> > > +	void *tbls[GRO_TYPE_MAX_NUM];
> > > +};
> >
> > Not sure why do you need to define that structure here.
> > As I understand it is internal to the library.
> > Just declaration should be enough.
> 
> This structure defines a GRO table, which is used by rte_gro_reassemble
> to merge packets. Applications need to create this table before calling
> rte_gro_reassemble. So I define it in rte_gro.h.

Yes, the application has to call rte_gro_tbl_create().
But the application doesn't need to access the contents of struct rte_gro_tbl,
which means it can (and should) treat it as an opaque pointer.
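
I.e. the header would only carry a forward declaration, e.g. (sketch):

	/* rte_gro.h */
	struct rte_gro_tbl;

	/* rte_gro.c - the full definition stays internal */
	struct rte_gro_tbl {
		uint64_t desired_gro_types;
		uint64_t max_timeout_cycles;
		uint32_t max_packet_size;
		void *tbls[GRO_TYPE_MAX_NUM];
	};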

> > > +
> > > +/**
> > > + * Reassembly function, which tries to merge the inputted packet with
> > > + * one packet in a given GRO table. This function assumes the inputted
> > > + * packet is with correct checksums. And it won't update checksums if
> > > + * two packets are merged. Besides, if the inputted packet is IP
> > > + * fragmented, this function assumes it's a complete packet (i.e. with
> > > + * L4 header).
> > > + *
> > > + * If the inputted packet doesn't have data or it's with unsupported GRO
> > > + * type, function returns immediately. Otherwise, the inputted packet is
> > > + * either merged or inserted into the table. If applications want to get
> > > + * packets in the table, they need to call flush APIs.
> > > + *
> > > + * @param pkt
> > > + *  packet to reassemble.
> > > + * @param gro_tbl
> > > + *  a pointer points to a GRO table.
> > > + * @return
> > > + *  if the packet is merged successfully, return a positive value. If it
> > > + *  fails to merge, return zero. If the packet doesn't have data, or its GRO
> > > + *  type is unsupported, return a negative value.
> > > + */
> > > +int rte_gro_reassemble(struct rte_mbuf *pkt,
> > > +		struct rte_gro_tbl *gro_tbl);
> >
> >
> > Ok, and why tbl one can't do bursts?
> 
> In the current design, if applications want to do bursts, they don't need
> to create a gro_tbl. rte_gro_reassemble_burst will create a temporary table
> on the stack. So when doing bursts (we call it lightweight mode), the
> operation for applications is very simple: calling rte_gro_reassemble_burst.
> And after rte_gro_reassemble_burst returns, applications can get all merged
> packets. rte_gro_reassemble is the API for the other mode, called heavyweight
> mode. The gro_tbl is only used by rte_gro_reassemble, which processes one
> packet at a time.
> 
> So you mean: we should enable rte_gro_reassemble to merge N inputted
> packets with the packets in a given gro_tbl?

Yes, I suppose that will be faster.
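
Something like this, mirroring rte_gro_reassemble_burst (hypothetical
signature, just to illustrate):

	/* merge nb_pkts packets with the packets held in gro_tbl */
	uint16_t rte_gro_reassemble(struct rte_mbuf **pkts,
			uint16_t nb_pkts,
			struct rte_gro_tbl *gro_tbl);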

> 
> >
> >
> > > +
> > > +/**
> > > + * This function flushes packets from reassembly tables of desired GRO
> > > + * types. It won't re-calculate checksums for merged packets in the
> > > + * tables. That is, the returned packets may be with wrong checksums.
> > > + *
> > > + * @param gro_tbl
> > > + *  a pointer points to a GRO table object.
> > > + * @param desired_gro_types
> > > + *  GRO types whose packets will be flushed.
> > > + * @param out
> > > + *  a pointer array that is used to keep flushed packets.
> > > + * @param nb_out
> > > + *  the size of out.
> > > + * @return
> > > + *  the number of flushed packets. If no packets are flushed, return 0.
> > > + */
> > > +uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl,
> > > +		uint64_t desired_gro_types,
> > > +		struct rte_mbuf **out,
> > > +		const uint16_t max_nb_out);
> > > +
> > > +/**
> > > + * This function flushes the timeout packets from reassembly tables of
> > > + * desired GRO types. It won't re-calculate checksums for merged packets
> > > + * in the tables. That is, the returned packets may be with wrong
> > > + * checksums.
> > > + *
> > > + * @param gro_tbl
> > > + *  a pointer points to a GRO table object.
> > > + * @param desired_gro_types
> > > + * rte_gro_timeout_flush only processes packets which belong to the
> > > + * GRO types specified by desired_gro_types.
> > > + * @param out
> > > + *  a pointer array that is used to keep flushed packets.
> > > + * @param nb_out
> > > + *  the size of out.
> > > + * @return
> > > + *  the number of flushed packets. If no packets are flushed, return 0.
> > > + */
> > > +uint16_t rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl,
> > > +		uint64_t desired_gro_types,
> > > +		struct rte_mbuf **out,
> > > +		const uint16_t max_nb_out);
> >
> > No point to have 2 flush() functions.
> > I suggest to merge them together.
> 
> rte_gro_flush flushes all packets from the table, but rte_gro_timeout_flush
> only flushes timeout packets. They perform different operations. But if we
> merge them together, should we flush all packets or only the timeout ones?

We can specify that if the timeout is zero (or less than the current time),
then we flush all packets.
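
E.g. one function with the timeout as an argument (sketch, not the final API):

	/*
	 * Flush packets that stayed in the tables longer than timeout_cycles;
	 * timeout_cycles == 0 means flush everything.
	 */
	uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl,
			uint64_t desired_gro_types,
			uint64_t timeout_cycles,
			struct rte_mbuf **out,
			uint16_t max_nb_out);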

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v7 2/3] lib/gro: add TCP/IPv4 GRO support
  2017-06-26  6:43             ` [PATCH v7 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
@ 2017-06-28 23:56               ` Ananyev, Konstantin
  2017-06-29  2:26                 ` Jiayu Hu
  0 siblings, 1 reply; 141+ messages in thread
From: Ananyev, Konstantin @ 2017-06-28 23:56 UTC (permalink / raw)
  To: Hu, Jiayu, dev
  Cc: Tan, Jianfeng, stephen, yliu, Wu, Jingjing, Yao, Lei A, Wiles,
	Keith, Bie, Tiwei

Hi Jiayu,

> 
> In this patch, we introduce five APIs to support TCP/IPv4 GRO.
> - gro_tcp_tbl_create: create a TCP reassembly table, which is used to
>     merge packets.
> - gro_tcp_tbl_destroy: free memory space of a TCP reassembly table.
> - gro_tcp_tbl_flush: flush all packets from a TCP reassembly table.
> - gro_tcp_tbl_timeout_flush: flush timeout packets from a TCP
>     reassembly table.
> - gro_tcp4_reassemble: reassemble an inputted TCP/IPv4 packet.
> 
> The TCP/IPv4 GRO API assumes all inputted packets have correct IPv4
> and TCP checksums, and it doesn't update IPv4 and TCP checksums for
> merged packets. If inputted packets are IP fragmented, the TCP/IPv4
> GRO API assumes they are complete packets (i.e. with L4 headers).
> 
> In TCP GRO, we use a table structure, called TCP reassembly table, to
> reassemble packets. Both TCP/IPv4 and TCP/IPv6 GRO use the same table
> structure. A TCP reassembly table includes a key array and a item array,
> where the key array keeps the criteria to merge packets and the item
> array keeps packet information.
> 
> One key in the key array points to an item group, which consists of
> packets which have the same criteria value. If two packets are able to
> merge, they must be in the same item group. Each key in the key array
> includes two parts:
> - criteria: the criteria of merging packets. If two packets can be
>     merged, they must have the same criteria value.
> - start_index: the index of the first incoming packet of the item group.
> 
> Each element in the item array keeps the information of one packet. It
> mainly includes two parts:
> - pkt: packet address
> - next_pkt_index: the index of the next packet in the same item group.
>     All packets in the same item group are chained by next_pkt_index.
>     With next_pkt_index, we can locate all packets in the same item
>     group one by one.
> 
> Processing an incoming packet needs three steps:
> a. check if the packet should be processed. Packets with the following
>     properties won't be processed:
> 	- packets without data (e.g. SYN, SYN-ACK)
> b. traverse the key array to find a key which has the same criteria
>     value as the incoming packet. If found, go to step c. Otherwise,
>     insert a new key and insert the packet into the item array.
> c. locate the first packet in the item group via the start_index in the
>     key. Then traverse all packets in the item group via next_pkt_index.
>     If we find one packet which can be merged with the incoming one, merge
>     them together. Otherwise, insert the packet into this item group.
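
For reference, the key/item layout described above corresponds to structures
roughly like the following - the field names here are assumed, see
rte_gro_tcp.h in this patch for the actual definitions:

	struct gro_tcp_key {
		struct tcp_key key;	/* criteria to merge packets */
		uint32_t start_index;	/* first item of the item group */
	};

	struct gro_tcp_item {
		struct rte_mbuf *pkt;		/* packet address */
		uint32_t next_pkt_index;	/* next item in the group */
	};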
> 
> Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
> ---
>  doc/guides/rel_notes/release_17_08.rst |   7 +
>  lib/librte_gro/Makefile                |   1 +
>  lib/librte_gro/rte_gro.c               | 123 ++++++++--
>  lib/librte_gro/rte_gro.h               |   6 +-
>  lib/librte_gro/rte_gro_tcp.c           | 394 +++++++++++++++++++++++++++++++++
>  lib/librte_gro/rte_gro_tcp.h           | 191 ++++++++++++++++
>  6 files changed, 706 insertions(+), 16 deletions(-)
>  create mode 100644 lib/librte_gro/rte_gro_tcp.c
>  create mode 100644 lib/librte_gro/rte_gro_tcp.h
> 
> diff --git a/doc/guides/rel_notes/release_17_08.rst b/doc/guides/rel_notes/release_17_08.rst
> index 842f46f..f067247 100644
> --- a/doc/guides/rel_notes/release_17_08.rst
> +++ b/doc/guides/rel_notes/release_17_08.rst
> @@ -75,6 +75,13 @@ New Features
> 
>    Added support for firmwares with multiple Ethernet ports per physical port.
> 
> +* **Add Generic Receive Offload API support.**
> +
> +  The Generic Receive Offload (GRO) API supports reassembling TCP/IPv4
> +  packets. The GRO API assumes all inputted packets have correct
> +  checksums and doesn't update checksums for merged packets. If
> +  inputted packets are IP fragmented, the GRO API assumes they are
> +  complete packets (i.e. with L4 headers).
> 
>  Resolved Issues
>  ---------------
> diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
> index 7e0f128..e89344d 100644
> --- a/lib/librte_gro/Makefile
> +++ b/lib/librte_gro/Makefile
> @@ -43,6 +43,7 @@ LIBABIVER := 1
> 
>  # source files
>  SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
> +SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro_tcp.c
> 
>  # install this header file
>  SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
> diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
> index 33275e8..5b89928 100644
> --- a/lib/librte_gro/rte_gro.c
> +++ b/lib/librte_gro/rte_gro.c
> @@ -32,11 +32,15 @@
> 
>  #include <rte_malloc.h>
>  #include <rte_mbuf.h>
> +#include <rte_ethdev.h>
> 
>  #include "rte_gro.h"
> +#include "rte_gro_tcp.h"
> 
> -static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NUM];
> -static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NUM];
> +static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NUM] = {
> +	gro_tcp_tbl_create, NULL};
> +static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NUM] = {
> +	gro_tcp_tbl_destroy, NULL};
> 
>  struct rte_gro_tbl *rte_gro_tbl_create(uint16_t socket_id,
>  		uint16_t max_flow_num,
> @@ -94,32 +98,121 @@ void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl)
>  }
> 
>  uint16_t
> -rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
> +rte_gro_reassemble_burst(struct rte_mbuf **pkts,
>  		const uint16_t nb_pkts,
> -		const struct rte_gro_param param __rte_unused)
> +		const struct rte_gro_param param)
>  {
> -	return nb_pkts;
> +	uint16_t i;
> +	uint16_t nb_after_gro = nb_pkts;
> +	uint32_t item_num = RTE_MIN(nb_pkts, param.max_flow_num *
> +			param.max_item_per_flow);
> +
> +	/* allocate a reassembly table for TCP/IPv4 GRO */
> +	uint32_t tcp_item_num = RTE_MIN(item_num, GRO_MAX_BURST_ITEM_NUM);
> +	struct gro_tcp_tbl tcp_tbl;
> +	struct gro_tcp_key tcp_keys[tcp_item_num];
> +	struct gro_tcp_item tcp_items[tcp_item_num];
> +
> +	struct rte_mbuf *unprocess_pkts[nb_pkts];
> +	uint16_t unprocess_num = 0;
> +	int32_t ret;
> +
> +	memset(tcp_keys, 0, sizeof(struct gro_tcp_key) *
> +			tcp_item_num);
> +	memset(tcp_items, 0, sizeof(struct gro_tcp_item) *
> +			tcp_item_num);
> +	tcp_tbl.keys = tcp_keys;
> +	tcp_tbl.items = tcp_items;
> +	tcp_tbl.key_num = 0;
> +	tcp_tbl.item_num = 0;
> +	tcp_tbl.max_key_num = tcp_item_num;
> +	tcp_tbl.max_item_num = tcp_item_num;
> +
> +	for (i = 0; i < nb_pkts; i++) {
> +		if (RTE_ETH_IS_IPV4_HDR(pkts[i]->packet_type)) {

Why not just use && for these 2 conditions?

> +			if ((pkts[i]->packet_type & RTE_PTYPE_L4_TCP) &&
> +				(param.desired_gro_types &
> +					 GRO_TCP_IPV4)) {

No need to check param.desired_gro_types inside the loop.
You can do that before the loop.
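
E.g., combining this with the && comment above, the loop body could look
roughly like (sketch):

	int do_tcp4 = (param.desired_gro_types & GRO_TCP_IPV4) != 0;

	for (i = 0; i < nb_pkts; i++) {
		if (do_tcp4 &&
				RTE_ETH_IS_IPV4_HDR(pkts[i]->packet_type) &&
				(pkts[i]->packet_type & RTE_PTYPE_L4_TCP)) {
			ret = gro_tcp4_reassemble(pkts[i], &tcp_tbl,
					param.max_packet_size);
			if (ret > 0)
				/* merged into an existing packet */
				nb_after_gro--;
			else if (ret < 0)
				unprocess_pkts[unprocess_num++] = pkts[i];
		} else
			unprocess_pkts[unprocess_num++] = pkts[i];
	}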

> +				ret = gro_tcp4_reassemble(pkts[i],
> +						&tcp_tbl,
> +						param.max_packet_size);
> +				/* merge successfully */
> +				if (ret > 0)
> +					nb_after_gro--;
> +				else if (ret < 0)
> +					unprocess_pkts[unprocess_num++] =
> +						pkts[i];
> +			} else
> +				unprocess_pkts[unprocess_num++] =
> +					pkts[i];
> +		} else
> +			unprocess_pkts[unprocess_num++] =
> +				pkts[i];
> +	}
> +
> +	/* re-arrange GROed packets */
> +	if (nb_after_gro < nb_pkts) {
> +		if (param.desired_gro_types & GRO_TCP_IPV4)
> +			i = gro_tcp_tbl_flush(&tcp_tbl, pkts, nb_pkts);
> +		if (unprocess_num > 0) {
> +			memcpy(&pkts[i], unprocess_pkts,
> +					sizeof(struct rte_mbuf *) *
> +					unprocess_num);
> +			i += unprocess_num;
> +		}
> +		if (nb_pkts > i)
> +			memset(&pkts[i], 0,
> +					sizeof(struct rte_mbuf *) *
> +					(nb_pkts - i));
> +	}

Why do you need to zero the remaining pkts[]?

> +	return nb_after_gro;
>  }
> 
> -int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
> -		struct rte_gro_tbl *gro_tbl __rte_unused)
> +int rte_gro_reassemble(struct rte_mbuf *pkt,
> +		struct rte_gro_tbl *gro_tbl)
>  {
> +	if (unlikely(pkt == NULL))
> +		return -1;
> +
> +	if (RTE_ETH_IS_IPV4_HDR(pkt->packet_type)) {
> +		if ((pkt->packet_type & RTE_PTYPE_L4_TCP) &&
> +				(gro_tbl->desired_gro_types &
> +				 GRO_TCP_IPV4))
> +			return gro_tcp4_reassemble(pkt,
> +					gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
> +					gro_tbl->max_packet_size);
> +	}
> +
>  	return -1;
>  }
> 
> -uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> -		uint64_t desired_gro_types __rte_unused,
> -		struct rte_mbuf **out __rte_unused,
> -		const uint16_t max_nb_out __rte_unused)
> +uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl,
> +		uint64_t desired_gro_types,
> +		struct rte_mbuf **out,
> +		const uint16_t max_nb_out)
>  {
> +	desired_gro_types = desired_gro_types &
> +		gro_tbl->desired_gro_types;
> +	if (desired_gro_types & GRO_TCP_IPV4)
> +		return gro_tcp_tbl_flush(
> +				gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
> +				out,
> +				max_nb_out);
>  	return 0;
>  }
> 
>  uint16_t
> -rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> -		uint64_t desired_gro_types __rte_unused,
> -		struct rte_mbuf **out __rte_unused,
> -		const uint16_t max_nb_out __rte_unused)
> +rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl,
> +		uint64_t desired_gro_types,
> +		struct rte_mbuf **out,
> +		const uint16_t max_nb_out)
>  {
> +	desired_gro_types = desired_gro_types &
> +		gro_tbl->desired_gro_types;
> +	if (desired_gro_types & GRO_TCP_IPV4)
> +		return gro_tcp_tbl_timeout_flush(
> +				gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
> +				gro_tbl->max_timeout_cycles,
> +				out, max_nb_out);
>  	return 0;
>  }
> diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
> index f9d36e8..a30b1c6 100644
> --- a/lib/librte_gro/rte_gro.h
> +++ b/lib/librte_gro/rte_gro.h
> @@ -45,7 +45,11 @@ extern "C" {
> 
>  /* max number of supported GRO types */
>  #define GRO_TYPE_MAX_NUM 64
> -#define GRO_TYPE_SUPPORT_NUM 0	/**< current supported GRO num */
> +#define GRO_TYPE_SUPPORT_NUM 1	/**< current supported GRO num */
> +
> +/* TCP/IPv4 GRO flag */
> +#define GRO_TCP_IPV4_INDEX 0
> +#define GRO_TCP_IPV4 (1ULL << GRO_TCP_IPV4_INDEX)
> 
>  /**
>   * GRO table, which is used to merge packets. It keeps many reassembly
> diff --git a/lib/librte_gro/rte_gro_tcp.c b/lib/librte_gro/rte_gro_tcp.c
> new file mode 100644
> index 0000000..c0eaa45
> --- /dev/null
> +++ b/lib/librte_gro/rte_gro_tcp.c
> @@ -0,0 +1,394 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#include <rte_malloc.h>
> +#include <rte_mbuf.h>
> +#include <rte_cycles.h>
> +
> +#include <rte_ethdev.h>
> +#include <rte_ip.h>
> +#include <rte_tcp.h>
> +
> +#include "rte_gro_tcp.h"
> +
> +void *gro_tcp_tbl_create(uint16_t socket_id,
> +		uint16_t max_flow_num,
> +		uint16_t max_item_per_flow)
> +{
> +	size_t size;
> +	uint32_t entries_num;
> +	struct gro_tcp_tbl *tbl;
> +
> +	entries_num = max_flow_num * max_item_per_flow;
> +	entries_num = entries_num > GRO_TCP_TBL_MAX_ITEM_NUM ?
> +		GRO_TCP_TBL_MAX_ITEM_NUM : entries_num;
> +
> +	if (entries_num == 0)
> +		return NULL;
> +
> +	tbl = (struct gro_tcp_tbl *)rte_zmalloc_socket(
> +			__func__,
> +			sizeof(struct gro_tcp_tbl),
> +			RTE_CACHE_LINE_SIZE,
> +			socket_id);

Here and everywhere - rte_malloc() can fail.
Add proper error handling.
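
E.g. (sketch):

	tbl = rte_zmalloc_socket(__func__, sizeof(*tbl),
			RTE_CACHE_LINE_SIZE, socket_id);
	if (tbl == NULL)
		return NULL;

	tbl->items = rte_zmalloc_socket(__func__,
			sizeof(struct gro_tcp_item) * entries_num,
			RTE_CACHE_LINE_SIZE, socket_id);
	if (tbl->items == NULL) {
		/* free what was already allocated */
		rte_free(tbl);
		return NULL;
	}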

> +
> +	size = sizeof(struct gro_tcp_item) * entries_num;
> +	tbl->items = (struct gro_tcp_item *)rte_zmalloc_socket(
> +			__func__,
> +			size,
> +			RTE_CACHE_LINE_SIZE,
> +			socket_id);
> +	tbl->max_item_num = entries_num;
> +
> +	size = sizeof(struct gro_tcp_key) * entries_num;
> +	tbl->keys = (struct gro_tcp_key *)rte_zmalloc_socket(
> +			__func__,
> +			size, RTE_CACHE_LINE_SIZE,
> +			socket_id);
> +	tbl->max_key_num = entries_num;
> +	return tbl;
> +}
> +
> +void gro_tcp_tbl_destroy(void *tbl)
> +{
> +	struct gro_tcp_tbl *tcp_tbl = (struct gro_tcp_tbl *)tbl;
> +
> +	if (tcp_tbl) {
> +		if (tcp_tbl->items)

No need to - rte_free(NULL) is a valid construction.
Same below.

> +			rte_free(tcp_tbl->items);
> +		if (tcp_tbl->keys)
> +			rte_free(tcp_tbl->keys);
> +		rte_free(tcp_tbl);
> +	}
> +}
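
With rte_free(NULL) being a no-op, the whole function shrinks to something
like (sketch):

	void gro_tcp_tbl_destroy(void *tbl)
	{
		struct gro_tcp_tbl *tcp_tbl = (struct gro_tcp_tbl *)tbl;

		if (tcp_tbl == NULL)
			return;
		rte_free(tcp_tbl->items);
		rte_free(tcp_tbl->keys);
		rte_free(tcp_tbl);
	}
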
> +
> +/**
> + * merge two TCP/IPv4 packets without update checksums.
> + */
> +static int
> +merge_two_tcp4_packets(struct rte_mbuf *pkt_src,
> +		struct rte_mbuf *pkt,
> +		uint32_t max_packet_size)
> +{
> +	struct ipv4_hdr *ipv4_hdr1, *ipv4_hdr2;
> +	struct tcp_hdr *tcp_hdr1;
> +	uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1;
> +	struct rte_mbuf *tail;
> +
> +	/* parse the given packet */
> +	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
> +				struct ether_hdr *) + 1);

You probably shouldn't assume that l2_len is always 14B long.

> +	ipv4_ihl1 = IPv4_HDR_LEN(ipv4_hdr1);
> +	tcp_hdr1 = (struct tcp_hdr *)((char *)ipv4_hdr1 + ipv4_ihl1);
> +	tcp_hl1 = TCP_HDR_LEN(tcp_hdr1);
> +	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) - ipv4_ihl1
> +		- tcp_hl1;
> +
> +	/* parse the original packet */
> +	ipv4_hdr2 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt_src,
> +				struct ether_hdr *) + 1);
> +
> +	if (pkt_src->pkt_len + tcp_dl1 > max_packet_size)
> +		return -1;
> +
> +	/* remove the header of the incoming packet */
> +	rte_pktmbuf_adj(pkt, sizeof(struct ether_hdr) +
> +			ipv4_ihl1 + tcp_hl1);
> +
> +	/* chain the two packet together */
> +	tail = rte_pktmbuf_lastseg(pkt_src);
> +	tail->next = pkt;

What I see as a problem here:
You have to reparse your packet and do lastseg for it for each new segment.
That seems like a big overhead.
It would be good instead to parse the packet once and store that information
inside the mbuf: l2_len/l3_len/l4_len, etc.
You can probably even avoid parsing inside your library - by adding it as a prerequisite
for the caller to fill these fields properly.

Similar thought about lastseg - would be good to store it somewhere inside your table.
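
E.g., with l2_len/l3_len/l4_len filled in by the caller, the headers could be
located without reparsing (sketch; assumes the caller sets these fields):

	struct ipv4_hdr *ipv4_hdr = rte_pktmbuf_mtod_offset(pkt,
			struct ipv4_hdr *, pkt->l2_len);
	struct tcp_hdr *tcp_hdr = rte_pktmbuf_mtod_offset(pkt,
			struct tcp_hdr *, pkt->l2_len + pkt->l3_len);
	uint16_t tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) -
			pkt->l3_len - pkt->l4_len;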

> +
> +	/* update IP header */
> +	ipv4_hdr2->total_length = rte_cpu_to_be_16(
> +			rte_be_to_cpu_16(
> +				ipv4_hdr2->total_length)
> +			+ tcp_dl1);
> +
> +	/* update mbuf metadata for the merged packet */
> +	pkt_src->nb_segs++;

Why do you assume that the incoming packet always contains only one segment?

> +	pkt_src->pkt_len += pkt->pkt_len;
> +	return 1;
> +}
> +
> +static int
> +check_seq_option(struct rte_mbuf *pkt,
> +		struct tcp_hdr *tcp_hdr,
> +		uint16_t tcp_hl)
> +{
> +	struct ipv4_hdr *ipv4_hdr1;
> +	struct tcp_hdr *tcp_hdr1;
> +	uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1;
> +	uint32_t sent_seq1, sent_seq;
> +	int ret = -1;
> +
> +	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
> +				struct ether_hdr *) + 1);
> +	ipv4_ihl1 = IPv4_HDR_LEN(ipv4_hdr1);
> +	tcp_hdr1 = (struct tcp_hdr *)((char *)ipv4_hdr1 + ipv4_ihl1);
> +	tcp_hl1 = TCP_HDR_LEN(tcp_hdr1);
> +	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) - ipv4_ihl1
> +		- tcp_hl1;
> +	sent_seq1 = rte_be_to_cpu_32(tcp_hdr1->sent_seq) + tcp_dl1;
> +	sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq);
> +
> +	/* check if the two packets are neighbor */
> +	if ((sent_seq ^ sent_seq1) == 0) {

Why not just 'if (sent_seq == sent_seq1)'?

> +		/* check if TCP option field equals */
> +		if (tcp_hl1 > sizeof(struct tcp_hdr)) {

And what if tcp_hl1 == sizeof(struct tcp_hdr), but tcp_hl > tcp_hl1?
I think you need to remove that check.
  
> +			if ((tcp_hl1 != tcp_hl) ||
> +					(memcmp(tcp_hdr1 + 1,
> +							tcp_hdr + 1,
> +							tcp_hl - sizeof
> +							(struct tcp_hdr))
> +					 == 0))
> +				ret = 1;
> +		}
> +	}
> +	return ret;
> +}
> +
> +static uint32_t
> +find_an_empty_item(struct gro_tcp_tbl *tbl)
> +{
> +	uint32_t i;
> +
> +	for (i = 0; i < tbl->max_item_num; i++)
> +		if (tbl->items[i].is_valid == 0)
> +			return i;
> +	return INVALID_ARRAY_INDEX;
> +}
> +
> +static uint32_t
> +find_an_empty_key(struct gro_tcp_tbl *tbl)
> +{
> +	uint32_t i;
> +
> +	for (i = 0; i < tbl->max_key_num; i++)
> +		if (tbl->keys[i].is_valid == 0)
> +			return i;
> +	return INVALID_ARRAY_INDEX;
> +}
> +
> +int32_t
> +gro_tcp4_reassemble(struct rte_mbuf *pkt,
> +		struct gro_tcp_tbl *tbl,
> +		uint32_t max_packet_size)
> +{
> +	struct ether_hdr *eth_hdr;
> +	struct ipv4_hdr *ipv4_hdr;
> +	struct tcp_hdr *tcp_hdr;
> +	uint16_t ipv4_ihl, tcp_hl, tcp_dl;
> +
> +	struct tcp_key key;
> +	uint32_t cur_idx, prev_idx, item_idx;
> +	uint32_t i, key_idx;
> +
> +	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
> +	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
> +	ipv4_ihl = IPv4_HDR_LEN(ipv4_hdr);
> +
> +	/* check if the packet should be processed */
> +	if (ipv4_ihl < sizeof(struct ipv4_hdr))
> +		goto fail;
> +	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl);
> +	tcp_hl = TCP_HDR_LEN(tcp_hdr);
> +	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - ipv4_ihl
> +		- tcp_hl;
> +	if (tcp_dl == 0)
> +		goto fail;
> +
> +	/* find a key and traverse all packets in its item group */
> +	key.eth_saddr = eth_hdr->s_addr;
> +	key.eth_daddr = eth_hdr->d_addr;
> +	key.ip_src_addr[0] = rte_be_to_cpu_32(ipv4_hdr->src_addr);
> +	key.ip_dst_addr[0] = rte_be_to_cpu_32(ipv4_hdr->dst_addr);

Your key.ip_src_addr[1-3] still contains some junk.
How is the memcmp below supposed to work properly?
BTW, why do you need 4 elements here? Why not just uint32_t ip_src_addr;?
Same for ip_dst_addr.

> +	key.src_port = rte_be_to_cpu_16(tcp_hdr->src_port);
> +	key.dst_port = rte_be_to_cpu_16(tcp_hdr->dst_port);
> +	key.recv_ack = rte_be_to_cpu_32(tcp_hdr->recv_ack);
> +	key.tcp_flags = tcp_hdr->tcp_flags;
> +
> +	for (i = 0; i < tbl->max_key_num; i++) {
> +		if (tbl->keys[i].is_valid &&
> +				(memcmp(&(tbl->keys[i].key), &key,
> +						sizeof(struct tcp_key))
> +				 == 0)) {
> +			cur_idx = tbl->keys[i].start_index;
> +			prev_idx = cur_idx;
> +			while (cur_idx != INVALID_ARRAY_INDEX) {
> +				if (check_seq_option(tbl->items[cur_idx].pkt,
> +							tcp_hdr,
> +							tcp_hl) > 0) {

As I remember, Linux GRO also checks the IPv4 packet_id - it should be consecutive.

> +					if (merge_two_tcp4_packets(
> +								tbl->items[cur_idx].pkt,
> +								pkt,
> +								max_packet_size) > 0) {
> +						/* successfully merge two packets */
> +						tbl->items[cur_idx].is_groed = 1;
> +						return 1;
> +					}

If you allow more than one packet per flow to be stored in the table, then you
should be prepared for a new segment to fill a gap between 2 packets.
Probably the easiest thing - don't allow more than one 'item' per flow.

> +					/**
> +					 * fail to merge two packets since
> +					 * it's beyond the max packet length.
> +					 * Insert it into the item group.
> +					 */
> +					goto insert_to_item_group;
> +				} else {
> +					prev_idx = cur_idx;
> +					cur_idx = tbl->items[cur_idx].next_pkt_idx;
> +				}
> +			}
> +			/**
> +			 * find a corresponding item group but fails to find
> +			 * one packet to merge. Insert it into this item group.
> +			 */
> +insert_to_item_group:
> +			item_idx = find_an_empty_item(tbl);
> +			/* the item number is greater than the max value */
> +			if (item_idx == INVALID_ARRAY_INDEX)
> +				return -1;
> +			tbl->items[prev_idx].next_pkt_idx = item_idx;
> +			tbl->items[item_idx].pkt = pkt;
> +			tbl->items[item_idx].is_groed = 0;
> +			tbl->items[item_idx].next_pkt_idx = INVALID_ARRAY_INDEX;
> +			tbl->items[item_idx].is_valid = 1;
> +			tbl->items[item_idx].start_time = rte_rdtsc();
> +			tbl->item_num++;
> +			return 0;
> +		}
> +	}
> +
> +	/**
> +	 * merge fail as the given packet has a new key.
> +	 * So insert a new key.
> +	 */
> +	item_idx = find_an_empty_item(tbl);
> +	key_idx = find_an_empty_key(tbl);
> +	/**
> +	 * if current key or item number is greater than the max
> +	 * value, don't insert the packet into the table and return
> +	 * immediately.
> +	 */
> +	if (item_idx == INVALID_ARRAY_INDEX ||
> +			key_idx == INVALID_ARRAY_INDEX)
> +		return -1;
> +	tbl->items[item_idx].pkt = pkt;
> +	tbl->items[item_idx].next_pkt_idx = INVALID_ARRAY_INDEX;
> +	tbl->items[item_idx].is_groed = 0;
> +	tbl->items[item_idx].is_valid = 1;
> +	tbl->items[item_idx].start_time = rte_rdtsc();

You can pass start-time as a parameter instead.
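
I.e. read the TSC once per burst in the caller and pass it down (sketch):

	int32_t gro_tcp4_reassemble(struct rte_mbuf *pkt,
			struct gro_tcp_tbl *tbl,
			uint32_t max_packet_size,
			uint64_t start_time);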

> +	tbl->item_num++;
> +
> +	memcpy(&(tbl->keys[key_idx].key),
> +			&key, sizeof(struct tcp_key));
> +	tbl->keys[key_idx].start_index = item_idx;
> +	tbl->keys[key_idx].is_valid = 1;
> +	tbl->key_num++;
> +
> +	return 0;
> +fail:

Please try to avoid goto whenever possible.
Looks really ugly.

> +	return -1;
> +}
> +
> +uint16_t gro_tcp_tbl_flush(struct gro_tcp_tbl *tbl,
> +		struct rte_mbuf **out,
> +		const uint16_t nb_out)
> +{
> +	uint32_t i, num = 0;
> +
> +	if (nb_out < tbl->item_num)
> +		return 0;

And how would the user know how many items are currently in the table?
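
E.g. via a trivial accessor (hypothetical name, sketch):

	/* let the caller size 'out' before flushing */
	uint32_t gro_tcp_tbl_item_num(const struct gro_tcp_tbl *tbl);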

> +
> +	for (i = 0; i < tbl->max_item_num; i++) {
> +		if (tbl->items[i].is_valid) {
> +			out[num++] = tbl->items[i].pkt;
> +			tbl->items[i].is_valid = 0;
> +			tbl->item_num--;
> +		}
> +	}
> +	memset(tbl->keys, 0, sizeof(struct gro_tcp_key) *
> +			tbl->max_key_num);
> +	tbl->key_num = 0;
> +
> +	return num;
> +}
> +
> +uint16_t
> +gro_tcp_tbl_timeout_flush(struct gro_tcp_tbl *tbl,
> +		uint64_t timeout_cycles,
> +		struct rte_mbuf **out,
> +		const uint16_t nb_out)
> +{
> +	uint16_t k;
> +	uint32_t i, j;
> +	uint64_t current_time;
> +
> +	if (nb_out == 0)
> +		return 0;
> +	k = 0;
> +	current_time = rte_rdtsc();
> +
> +	for (i = 0; i < tbl->max_key_num; i++) {
> +		if (tbl->keys[i].is_valid) {

Seems pretty expensive to traverse the whole table...
Would it be worth having some sort of LRU list?

> +			j = tbl->keys[i].start_index;
> +			while (j != INVALID_ARRAY_INDEX) {
> +				if (current_time - tbl->items[j].start_time >=
> +						timeout_cycles) {
> +					out[k++] = tbl->items[j].pkt;
> +					tbl->items[j].is_valid = 0;
> +					tbl->item_num--;
> +					j = tbl->items[j].next_pkt_idx;
> +
> +					if (k == nb_out &&
> +							j == INVALID_ARRAY_INDEX) {
> +						/* delete the key */
> +						tbl->keys[i].is_valid = 0;
> +						tbl->key_num--;
> +						goto end;

Please rearrange the code to avoid gotos.

> +					} else if (k == nb_out &&
> +							j != INVALID_ARRAY_INDEX) {
> +						/* update the first item index */
> +						tbl->keys[i].start_index = j;
> +						goto end;
> +					}
> +				}
> +			}
> +			/* delete the key, as all of its packets are flushed */
> +			tbl->keys[i].is_valid = 0;
> +			tbl->key_num--;
> +		}
> +		if (tbl->key_num == 0)
> +			goto end;
> +	}
> +end:
> +	return k;
> +}
> diff --git a/lib/librte_gro/rte_gro_tcp.h b/lib/librte_gro/rte_gro_tcp.h
> new file mode 100644
> index 0000000..a9a7aca
> --- /dev/null
> +++ b/lib/librte_gro/rte_gro_tcp.h
> @@ -0,0 +1,191 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef _RTE_GRO_TCP_H_
> +#define _RTE_GRO_TCP_H_
> +
> +#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
> +#define TCP_HDR_LEN(tcph) \
> +	((tcph->data_off >> 4) * 4)
> +#define IPv4_HDR_LEN(iph) \
> +	((iph->version_ihl & 0x0f) * 4)
> +#else
> +#define TCP_DATAOFF_MASK 0x0f
> +#define TCP_HDR_LEN(tcph) \
> +	((tcph->data_off & TCP_DATAOFF_MASK) * 4)
> +#define IPv4_HDR_LEN(iph) \
> +	((iph->version_ihl >> 4) * 4)
> +#endif
> +
> +#define INVALID_ARRAY_INDEX 0xffffffffUL
> +#define GRO_TCP_TBL_MAX_ITEM_NUM (UINT32_MAX - 1)
> +
> +/* criteria for merging packets */
> +struct tcp_key {
> +	struct ether_addr eth_saddr;
> +	struct ether_addr eth_daddr;
> +	uint32_t ip_src_addr[4];	/**< IPv4 uses the first 8B */
> +	uint32_t ip_dst_addr[4];
> +
> +	uint32_t recv_ack;	/**< acknowledgment sequence number. */
> +	uint16_t src_port;
> +	uint16_t dst_port;
> +	uint8_t tcp_flags;	/**< TCP flags. */
> +};
> +
> +struct gro_tcp_key {
> +	struct tcp_key key;
> +	uint32_t start_index;	/**< the first packet index of the flow */
> +	uint8_t is_valid;
> +};
> +
> +struct gro_tcp_item {
> +	struct rte_mbuf *pkt;	/**< packet address. */
> +	/* the time when the packet is added into the table */
> +	uint64_t start_time;
> +	uint32_t next_pkt_idx;	/**< next packet index. */
> +	/* flag to indicate if the packet is GROed */
> +	uint8_t is_groed;
> +	uint8_t is_valid;	/**< flag indicates if the item is valid */

Why do you need these 2 flags at all?
Why not just reset, let's say, pkt to NULL for an invalid item?
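
E.g. (a sketch) find_an_empty_item() could simply search for
tbl->items[i].pkt == NULL, and releasing an item would just set pkt back to NULL.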

> +};
> +
> +/**
> + * TCP reassembly table. Both TCP/IPv4 and TCP/IPv6 use the same table
> + * structure.
> + */
> +struct gro_tcp_tbl {
> +	struct gro_tcp_item *items;	/**< item array */
> +	struct gro_tcp_key *keys;	/**< key array */
> +	uint32_t item_num;	/**< current item number */
> +	uint32_t key_num;	/**< current key num */
> +	uint32_t max_item_num;	/**< item array size */
> +	uint32_t max_key_num;	/**< key array size */
> +};
> +
> +/**
> + * This function creates a TCP reassembly table.
> + *
> + * @param socket_id
> + *  socket index where the Ethernet port connects to.
> + * @param max_flow_num
> + *  the maximum number of flows in the TCP GRO table
> + * @param max_item_per_flow
> + *  the maximum packet number per flow.
> + * @return
> + *  if create successfully, return a pointer which points to the
> + *  created TCP GRO table. Otherwise, return NULL.
> + */
> +void *gro_tcp_tbl_create(uint16_t socket_id,
> +		uint16_t max_flow_num,
> +		uint16_t max_item_per_flow);
> +
> +/**
> + * This function destroys a TCP reassembly table.
> + * @param tbl
> + *  a pointer that points to the TCP reassembly table.
> + */
> +void gro_tcp_tbl_destroy(void *tbl);
> +
> +/**
> + * This function searches for a packet in the TCP reassembly table to
> + * merge with the inputted one. To merge two packets is to chain them
> + * together and update packet headers. If the packet is without data
> + * (e.g. SYN, SYN-ACK packet), this function returns immediately.
> + * Otherwise, the packet is either merged, or inserted into the table.
> + * Besides, if there is no available space to insert the packet, this
> + * function returns immediately too.
> + *
> + * This function assumes the inputted packet is with correct IPv4 and
> + * TCP checksums. And if two packets are merged, it won't re-calculate
> + * IPv4 and TCP checksums. Besides, if the inputted packet is IP
> + * fragmented, it assumes the packet is complete (with TCP header).
> + *
> + * @param pkt
> + *  packet to reassemble.
> + * @param tbl
> + *  a pointer that points to a TCP reassembly table.
> + * @param max_packet_size
> + *  max packet length after merged
> + * @return
> + *  if the packet doesn't have data, or there is no available space
> + *  in the table to insert a new item or a new key, return a negative
> + *  value. If the packet is merged successfully, return a positive
> + *  value. If the packet is inserted into the table, return 0.
> + */
> +int32_t
> +gro_tcp4_reassemble(struct rte_mbuf *pkt,
> +		struct gro_tcp_tbl *tbl,
> +		uint32_t max_packet_size);
> +
> +/**
> + * This function flushes all packets in a TCP reassembly table to
> + * applications, without updating checksums for merged packets.
> + * If the array which is used to keep flushed packets is not large
> + * enough, an error happens and this function returns immediately.
> + *
> + * @param tbl
> + *  a pointer that points to a TCP GRO table.
> + * @param out
> + *  pointer array which is used to keep flushed packets. Applications
> + *  should guarantee it's large enough to hold all packets in the table.
> + * @param nb_out
> + *  the element number of out.
> + * @return
> + *  the number of flushed packets. If out is not large enough to hold
> + *  all packets in the table, return 0.
> + */
> +uint16_t
> +gro_tcp_tbl_flush(struct gro_tcp_tbl *tbl,
> +		struct rte_mbuf **out,
> +		const uint16_t nb_out);
> +
> +/**
> + * This function flushes timeout packets in a TCP reassembly table to
> + * applications, without updating checksums for merged packets.
> + *
> + * @param tbl
> + *  a pointer that points to a TCP GRO table.
> + * @param timeout_cycles
> + *  the maximum time that packets can stay in the table.
> + * @param out
> + *  pointer array which is used to keep flushed packets.
> + * @param nb_out
> + *  the element number of out.
> + * @return
> + *  the number of packets that are returned.
> + */
> +uint16_t
> +gro_tcp_tbl_timeout_flush(struct gro_tcp_tbl *tbl,
> +		uint64_t timeout_cycles,
> +		struct rte_mbuf **out,
> +		const uint16_t nb_out);
> +#endif
> --
> 2.7.4

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v7 1/3] lib: add Generic Receive Offload API framework
  2017-06-28 17:41                   ` Ananyev, Konstantin
@ 2017-06-29  1:19                     ` Jiayu Hu
  0 siblings, 0 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-06-29  1:19 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: dev, Tan, Jianfeng, stephen, yliu, Wu, Jingjing, Yao, Lei A,
	Wiles, Keith, Bie, Tiwei

Hi Konstantin,

On Thu, Jun 29, 2017 at 01:41:40AM +0800, Ananyev, Konstantin wrote:
> 
> Hi Jiayu,
> 
> > 
> > >
> > > > +
> > > > +/**
> > > > + * GRO table, which is used to merge packets. It keeps many reassembly
> > > > + * tables of desired GRO types. Applications need to create GRO tables
> > > > + * before using rte_gro_reassemble to perform GRO.
> > > > + */
> > > > +struct rte_gro_tbl {
> > > > +	uint64_t desired_gro_types;	/**< GRO types to perform */
> > > > +	/* max TTL measured in nanosecond */
> > > > +	uint64_t max_timeout_cycles;
> > > > +	/* max length of merged packet measured in byte */
> > > > +	uint32_t max_packet_size;
> > > > +	/* reassembly tables of desired GRO types */
> > > > +	void *tbls[GRO_TYPE_MAX_NUM];
> > > > +};
> > >
> > > Not sure why do you need to define that structure here.
> > > As I understand it is internal to the library.
> > > Just declaration should be enough.
> > 
> > This structure defines a GRO table, which is used by rte_gro_reassemble
> > to merge packets. Applications need to create this table before calling
> > rte_gro_reassemble. So I define it in rte_gro.h.
> 
> Yes, the application has to call gro_table_create().
> But the application doesn't need to access the contents of struct rte_gro_tbl,
> which means it can (and should) treat it as an opaque pointer.

Thanks, I will modify it.
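
For example (just a sketch), rte_gro.h could keep only a forward declaration:

	struct rte_gro_tbl;	/* contents opaque to applications */

and the full definition would move into an internal header, so applications
only ever hold the pointer returned by the create function.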

> 
> > > > +
> > > > +/**
> > > > + * Reassembly function, which tries to merge the inputted packet with
> > > > + * one packet in a given GRO table. This function assumes the inputted
> > > > + * packet is with correct checksums. And it won't update checksums if
> > > > + * two packets are merged. Besides, if the inputted packet is IP
> > > > + * fragmented, this function assumes it's a complete packet (i.e. with
> > > > + * L4 header).
> > > > + *
> > > > + * If the inputted packet doesn't have data or it's with unsupported GRO
> > > > + * type, the function returns immediately. Otherwise, the inputted packet is
> > > > + * either merged or inserted into the table. If applications want to get
> > > > + * packets in the table, they need to call flush APIs.
> > > > + *
> > > > + * @param pkt
> > > > + *  packet to reassemble.
> > > > + * @param gro_tbl
> > > > + *  a pointer that points to a GRO table.
> > > > + * @return
> > > > + *  if merge the packet successfully, return a positive value. If fail
> > > > + *  to merge, return zero. If the packet doesn't have data, or its GRO
> > > > + *  type is unsupported, return a negative value.
> > > > + */
> > > > +int rte_gro_reassemble(struct rte_mbuf *pkt,
> > > > +		struct rte_gro_tbl *gro_tbl);
> > >
> > >
> > > Ok, and why can't the tbl one do bursts?
> > 
> > In the current design, if applications want to do bursts, they don't need to
> > create a gro_tbl. rte_gro_reassemble_burst will create a temporary table
> > on the stack. So when doing bursts (we call it lightweight mode), the
> > operation for applications is very simple: calling rte_gro_reassemble_burst.
> > And after rte_gro_reassemble_burst returns, applications can get all merged
> > packets. rte_gro_reassemble is the API of the other mode, called heavyweight
> > mode. The gro_tbl is only used in rte_gro_reassemble, and rte_gro_reassemble
> > processes just one packet at a time.
> > 
> > So you mean: we should enable rte_gro_reassemble to merge N inputted
> > packets with the packets in a given gro_tbl?
> 
> Yes, I suppose that will be faster.

Thanks, I will enable it to process N packets at a time.
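
For example, the prototype could become something like (a sketch; the exact
v8 signature may differ):

	uint16_t rte_gro_reassemble(struct rte_mbuf **pkts,
			const uint16_t nb_pkts,
			struct rte_gro_tbl *gro_tbl);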

> 
> > 
> > >
> > >
> > > > +
> > > > +/**
> > > > + * This function flushes packets from reassembly tables of desired GRO
> > > > + * types. It won't re-calculate checksums for merged packets in the
> > > > + * tables. That is, the returned packets may be with wrong checksums.
> > > > + *
> > > > + * @param gro_tbl
> > > > + *  a pointer points to a GRO table object.
> > > > + * @param desired_gro_types
> > > > + *  GRO types whose packets will be flushed.
> > > > + * @param out
> > > > + *  a pointer array that is used to keep flushed packets.
> > > > + * @param nb_out
> > > > + *  the size of out.
> > > > + * @return
> > > > + *  the number of flushed packets. If no packets are flushed, return 0.
> > > > + */
> > > > +uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl,
> > > > +		uint64_t desired_gro_types,
> > > > +		struct rte_mbuf **out,
> > > > +		const uint16_t max_nb_out);
> > > > +
> > > > +/**
> > > > + * This function flushes the timeout packets from reassembly tables of
> > > > + * desired GRO types. It won't re-calculate checksums for merged packets
> > > > + * in the tables. That is, the returned packets may be with wrong
> > > > + * checksums.
> > > > + *
> > > > + * @param gro_tbl
> > > > + *  a pointer points to a GRO table object.
> > > > + * @param desired_gro_types
> > > > + * rte_gro_timeout_flush only processes packets which belong to the
> > > > + * GRO types specified by desired_gro_types.
> > > > + * @param out
> > > > + *  a pointer array that is used to keep flushed packets.
> > > > + * @param nb_out
> > > > + *  the size of out.
> > > > + * @return
> > > > + *  the number of flushed packets. If no packets are flushed, return 0.
> > > > + */
> > > > +uint16_t rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl,
> > > > +		uint64_t desired_gro_types,
> > > > +		struct rte_mbuf **out,
> > > > +		const uint16_t max_nb_out);
> > >
> > > No point to have 2 flush() functions.
> > > I suggest merging them together.
> > 
> > rte_gro_flush flushes all packets from the table, but rte_gro_timeout_flush
> > only flushes timed-out packets. They perform different operations. But if we
> > merge them together, should we flush all packets or only the timed-out ones?
> 
> We can specify that if timeout is zero (or less than the current time) then
> we flush all packets.

Thanks, I will merge them together.
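
For example, the merged API could behave like this (a sketch; the final
prototype may differ):

	/* flush only items that stayed in the table longer than the timeout */
	n = rte_gro_timeout_flush(gro_tbl, timeout_cycles, desired_gro_types,
			out, max_nb_out);
	/* timeout_cycles == 0: every item is already "expired", so flush all */
	n = rte_gro_timeout_flush(gro_tbl, 0, desired_gro_types,
			out, max_nb_out);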

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v7 2/3] lib/gro: add TCP/IPv4 GRO support
  2017-06-28 23:56               ` Ananyev, Konstantin
@ 2017-06-29  2:26                 ` Jiayu Hu
  2017-06-30 12:07                   ` Ananyev, Konstantin
  0 siblings, 1 reply; 141+ messages in thread
From: Jiayu Hu @ 2017-06-29  2:26 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: dev, Tan, Jianfeng, stephen, yliu, Wu, Jingjing, Yao, Lei A,
	Wiles, Keith, Bie, Tiwei

Hi Konstantin,

On Thu, Jun 29, 2017 at 07:56:10AM +0800, Ananyev, Konstantin wrote:
> Hi Jiayu,
> 
> > 
> > In this patch, we introduce five APIs to support TCP/IPv4 GRO.
> > - gro_tcp_tbl_create: create a TCP reassembly table, which is used to
> >     merge packets.
> > - gro_tcp_tbl_destroy: free memory space of a TCP reassembly table.
> > - gro_tcp_tbl_flush: flush all packets from a TCP reassembly table.
> > - gro_tcp_tbl_timeout_flush: flush timeout packets from a TCP
> >     reassembly table.
> > - gro_tcp4_reassemble: reassemble an inputted TCP/IPv4 packet.
> > 
> > TCP/IPv4 GRO API assumes all inputted packets are with correct IPv4
> > and TCP checksums. And TCP/IPv4 GRO API doesn't update IPv4 and TCP
> > checksums for merged packets. If inputted packets are IP fragmented,
> > TCP/IPv4 GRO API assumes they are complete packets (i.e. with L4
> > headers).
> > 
> > In TCP GRO, we use a table structure, called TCP reassembly table, to
> > reassemble packets. Both TCP/IPv4 and TCP/IPv6 GRO use the same table
> > structure. A TCP reassembly table includes a key array and an item array,
> > where the key array keeps the criteria to merge packets and the item
> > array keeps packet information.
> > 
> > One key in the key array points to an item group, which consists of
> > packets which have the same criteria value. If two packets are able to
> > merge, they must be in the same item group. Each key in the key array
> > includes two parts:
> > - criteria: the criteria of merging packets. If two packets can be
> >     merged, they must have the same criteria value.
> > - start_index: the index of the first incoming packet of the item group.
> > 
> > Each element in the item array keeps the information of one packet. It
> > mainly includes two parts:
> > - pkt: packet address
> > - next_pkt_index: the index of the next packet in the same item group.
> >     All packets in the same item group are chained by next_pkt_index.
> >     With next_pkt_index, we can locate all packets in the same item
> >     group one by one.
> > 
> > To process an incoming packet needs three steps:
> > a. check if the packet should be processed. Packets with the following
> >     properties won't be processed:
> > 	- packets without data (e.g. SYN, SYN-ACK)
> > b. traverse the key array to find a key which has the same criteria
> >     value as the incoming packet. If found, go to step c. Otherwise,
> >     insert a new key and insert the packet into the item array.
> > c. locate the first packet in the item group via the start_index in the
> >     key. Then traverse all packets in the item group via next_pkt_index.
> >     If one packet is found that can merge with the incoming one, merge
> >     them together. Otherwise, insert the packet into this item group.
> > 
> > Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
> > ---
> >  doc/guides/rel_notes/release_17_08.rst |   7 +
> >  lib/librte_gro/Makefile                |   1 +
> >  lib/librte_gro/rte_gro.c               | 123 ++++++++--
> >  lib/librte_gro/rte_gro.h               |   6 +-
> >  lib/librte_gro/rte_gro_tcp.c           | 394 +++++++++++++++++++++++++++++++++
> >  lib/librte_gro/rte_gro_tcp.h           | 191 ++++++++++++++++
> >  6 files changed, 706 insertions(+), 16 deletions(-)
> >  create mode 100644 lib/librte_gro/rte_gro_tcp.c
> >  create mode 100644 lib/librte_gro/rte_gro_tcp.h
> > 
> > diff --git a/doc/guides/rel_notes/release_17_08.rst b/doc/guides/rel_notes/release_17_08.rst
> > index 842f46f..f067247 100644
> > --- a/doc/guides/rel_notes/release_17_08.rst
> > +++ b/doc/guides/rel_notes/release_17_08.rst
> > @@ -75,6 +75,13 @@ New Features
> > 
> >    Added support for firmwares with multiple Ethernet ports per physical port.
> > 
> > +* **Add Generic Receive Offload API support.**
> > +
> > +  Generic Receive Offload (GRO) API supports to reassemble TCP/IPv4
> > +  packets. GRO API assumes all inputted packets are with correct
> > +  checksums. GRO API doesn't update checksums for merged packets. If
> > +  inputted packets are IP fragmented, GRO API assumes they are complete
> > +  packets (i.e. with L4 headers).
> > 
> >  Resolved Issues
> >  ---------------
> > diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
> > index 7e0f128..e89344d 100644
> > --- a/lib/librte_gro/Makefile
> > +++ b/lib/librte_gro/Makefile
> > @@ -43,6 +43,7 @@ LIBABIVER := 1
> > 
> >  # source files
> >  SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
> > +SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro_tcp.c
> > 
> >  # install this header file
> >  SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
> > diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
> > index 33275e8..5b89928 100644
> > --- a/lib/librte_gro/rte_gro.c
> > +++ b/lib/librte_gro/rte_gro.c
> > @@ -32,11 +32,15 @@
> > 
> >  #include <rte_malloc.h>
> >  #include <rte_mbuf.h>
> > +#include <rte_ethdev.h>
> > 
> >  #include "rte_gro.h"
> > +#include "rte_gro_tcp.h"
> > 
> > -static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NUM];
> > -static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NUM];
> > +static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NUM] = {
> > +	gro_tcp_tbl_create, NULL};
> > +static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NUM] = {
> > +	gro_tcp_tbl_destroy, NULL};
> > 
> >  struct rte_gro_tbl *rte_gro_tbl_create(uint16_t socket_id,
> >  		uint16_t max_flow_num,
> > @@ -94,32 +98,121 @@ void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl)
> >  }
> > 
> >  uint16_t
> > -rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
> > +rte_gro_reassemble_burst(struct rte_mbuf **pkts,
> >  		const uint16_t nb_pkts,
> > -		const struct rte_gro_param param __rte_unused)
> > +		const struct rte_gro_param param)
> >  {
> > -	return nb_pkts;
> > +	uint16_t i;
> > +	uint16_t nb_after_gro = nb_pkts;
> > +	uint32_t item_num = RTE_MIN(nb_pkts, param.max_flow_num *
> > +			param.max_item_per_flow);
> > +
> > +	/* allocate a reassembly table for TCP/IPv4 GRO */
> > +	uint32_t tcp_item_num = RTE_MIN(item_num, GRO_MAX_BURST_ITEM_NUM);
> > +	struct gro_tcp_tbl tcp_tbl;
> > +	struct gro_tcp_key tcp_keys[tcp_item_num];
> > +	struct gro_tcp_item tcp_items[tcp_item_num];
> > +
> > +	struct rte_mbuf *unprocess_pkts[nb_pkts];
> > +	uint16_t unprocess_num = 0;
> > +	int32_t ret;
> > +
> > +	memset(tcp_keys, 0, sizeof(struct gro_tcp_key) *
> > +			tcp_item_num);
> > +	memset(tcp_items, 0, sizeof(struct gro_tcp_item) *
> > +			tcp_item_num);
> > +	tcp_tbl.keys = tcp_keys;
> > +	tcp_tbl.items = tcp_items;
> > +	tcp_tbl.key_num = 0;
> > +	tcp_tbl.item_num = 0;
> > +	tcp_tbl.max_key_num = tcp_item_num;
> > +	tcp_tbl.max_item_num = tcp_item_num;
> > +
> > +	for (i = 0; i < nb_pkts; i++) {
> > +		if (RTE_ETH_IS_IPV4_HDR(pkts[i]->packet_type)) {
> 
> Why not just use && for these 2 conditions?
> 
> > +			if ((pkts[i]->packet_type & RTE_PTYPE_L4_TCP) &&
> > +				(param.desired_gro_types &
> > +					 GRO_TCP_IPV4)) {
> 
> No need to check param.desired_gro_types inside the loop.
> You can do that before the loop.

Yes, I should do it before the loop. Thanks, I will change it.
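
For example (a sketch), the check can be hoisted out of the loop:

	uint64_t do_tcp4 = param.desired_gro_types & GRO_TCP_IPV4;

	for (i = 0; i < nb_pkts; i++) {
		if (do_tcp4 && RTE_ETH_IS_IPV4_HDR(pkts[i]->packet_type) &&
				(pkts[i]->packet_type & RTE_PTYPE_L4_TCP)) {
			/* try gro_tcp4_reassemble() as before */
		} else
			unprocess_pkts[unprocess_num++] = pkts[i];
	}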

> 
> > +				ret = gro_tcp4_reassemble(pkts[i],
> > +						&tcp_tbl,
> > +						param.max_packet_size);
> > +				/* merge successfully */
> > +				if (ret > 0)
> > +					nb_after_gro--;
> > +				else if (ret < 0)
> > +					unprocess_pkts[unprocess_num++] =
> > +						pkts[i];
> > +			} else
> > +				unprocess_pkts[unprocess_num++] =
> > +					pkts[i];
> > +		} else
> > +			unprocess_pkts[unprocess_num++] =
> > +				pkts[i];
> > +	}
> > +
> > +	/* re-arrange GROed packets */
> > +	if (nb_after_gro < nb_pkts) {
> > +		if (param.desired_gro_types & GRO_TCP_IPV4)
> > +			i = gro_tcp_tbl_flush(&tcp_tbl, pkts, nb_pkts);
> > +		if (unprocess_num > 0) {
> > +			memcpy(&pkts[i], unprocess_pkts,
> > +					sizeof(struct rte_mbuf *) *
> > +					unprocess_num);
> > +			i += unprocess_num;
> > +		}
> > +		if (nb_pkts > i)
> > +			memset(&pkts[i], 0,
> > +					sizeof(struct rte_mbuf *) *
> > +					(nb_pkts - i));
> > +	}
> 
> Why do you need to zero remaining pkts[]?

Thanks, I will remove these reseting operations.

> 
> > +	return nb_after_gro;
> >  }
> > 
> > -int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
> > -		struct rte_gro_tbl *gro_tbl __rte_unused)
> > +int rte_gro_reassemble(struct rte_mbuf *pkt,
> > +		struct rte_gro_tbl *gro_tbl)
> >  {
> > +	if (unlikely(pkt == NULL))
> > +		return -1;
> > +
> > +	if (RTE_ETH_IS_IPV4_HDR(pkt->packet_type)) {
> > +		if ((pkt->packet_type & RTE_PTYPE_L4_TCP) &&
> > +				(gro_tbl->desired_gro_types &
> > +				 GRO_TCP_IPV4))
> > +			return gro_tcp4_reassemble(pkt,
> > +					gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
> > +					gro_tbl->max_packet_size);
> > +	}
> > +
> >  	return -1;
> >  }
> > 
> > -uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> > -		uint64_t desired_gro_types __rte_unused,
> > -		struct rte_mbuf **out __rte_unused,
> > -		const uint16_t max_nb_out __rte_unused)
> > +uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl,
> > +		uint64_t desired_gro_types,
> > +		struct rte_mbuf **out,
> > +		const uint16_t max_nb_out)
> >  {
> > +	desired_gro_types = desired_gro_types &
> > +		gro_tbl->desired_gro_types;
> > +	if (desired_gro_types & GRO_TCP_IPV4)
> > +		return gro_tcp_tbl_flush(
> > +				gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
> > +				out,
> > +				max_nb_out);
> >  	return 0;
> >  }
> > 
> >  uint16_t
> > -rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
> > -		uint64_t desired_gro_types __rte_unused,
> > -		struct rte_mbuf **out __rte_unused,
> > -		const uint16_t max_nb_out __rte_unused)
> > +rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl,
> > +		uint64_t desired_gro_types,
> > +		struct rte_mbuf **out,
> > +		const uint16_t max_nb_out)
> >  {
> > +	desired_gro_types = desired_gro_types &
> > +		gro_tbl->desired_gro_types;
> > +	if (desired_gro_types & GRO_TCP_IPV4)
> > +		return gro_tcp_tbl_timeout_flush(
> > +				gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
> > +				gro_tbl->max_timeout_cycles,
> > +				out, max_nb_out);
> >  	return 0;
> >  }
> > diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
> > index f9d36e8..a30b1c6 100644
> > --- a/lib/librte_gro/rte_gro.h
> > +++ b/lib/librte_gro/rte_gro.h
> > @@ -45,7 +45,11 @@ extern "C" {
> > 
> >  /* max number of supported GRO types */
> >  #define GRO_TYPE_MAX_NUM 64
> > -#define GRO_TYPE_SUPPORT_NUM 0	/**< current supported GRO num */
> > +#define GRO_TYPE_SUPPORT_NUM 1	/**< current supported GRO num */
> > +
> > +/* TCP/IPv4 GRO flag */
> > +#define GRO_TCP_IPV4_INDEX 0
> > +#define GRO_TCP_IPV4 (1ULL << GRO_TCP_IPV4_INDEX)
> > 
> >  /**
> >   * GRO table, which is used to merge packets. It keeps many reassembly
> > diff --git a/lib/librte_gro/rte_gro_tcp.c b/lib/librte_gro/rte_gro_tcp.c
> > new file mode 100644
> > index 0000000..c0eaa45
> > --- /dev/null
> > +++ b/lib/librte_gro/rte_gro_tcp.c
> > @@ -0,0 +1,394 @@
> > +/*-
> > + *   BSD LICENSE
> > + *
> > + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> > + *
> > + *   Redistribution and use in source and binary forms, with or without
> > + *   modification, are permitted provided that the following conditions
> > + *   are met:
> > + *
> > + *     * Redistributions of source code must retain the above copyright
> > + *       notice, this list of conditions and the following disclaimer.
> > + *     * Redistributions in binary form must reproduce the above copyright
> > + *       notice, this list of conditions and the following disclaimer in
> > + *       the documentation and/or other materials provided with the
> > + *       distribution.
> > + *     * Neither the name of Intel Corporation nor the names of its
> > + *       contributors may be used to endorse or promote products derived
> > + *       from this software without specific prior written permission.
> > + *
> > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > + */
> > +
> > +#include <rte_malloc.h>
> > +#include <rte_mbuf.h>
> > +#include <rte_cycles.h>
> > +
> > +#include <rte_ethdev.h>
> > +#include <rte_ip.h>
> > +#include <rte_tcp.h>
> > +
> > +#include "rte_gro_tcp.h"
> > +
> > +void *gro_tcp_tbl_create(uint16_t socket_id,
> > +		uint16_t max_flow_num,
> > +		uint16_t max_item_per_flow)
> > +{
> > +	size_t size;
> > +	uint32_t entries_num;
> > +	struct gro_tcp_tbl *tbl;
> > +
> > +	entries_num = max_flow_num * max_item_per_flow;
> > +	entries_num = entries_num > GRO_TCP_TBL_MAX_ITEM_NUM ?
> > +		GRO_TCP_TBL_MAX_ITEM_NUM : entries_num;
> > +
> > +	if (entries_num == 0)
> > +		return NULL;
> > +
> > +	tbl = (struct gro_tcp_tbl *)rte_zmalloc_socket(
> > +			__func__,
> > +			sizeof(struct gro_tcp_tbl),
> > +			RTE_CACHE_LINE_SIZE,
> > +			socket_id);
> 
> Here and everywhere - rte_malloc() can fail.
> Add proper error handling.

Thanks, I will modify it.

> 
> > +
> > +	size = sizeof(struct gro_tcp_item) * entries_num;
> > +	tbl->items = (struct gro_tcp_item *)rte_zmalloc_socket(
> > +			__func__,
> > +			size,
> > +			RTE_CACHE_LINE_SIZE,
> > +			socket_id);
> > +	tbl->max_item_num = entries_num;
> > +
> > +	size = sizeof(struct gro_tcp_key) * entries_num;
> > +	tbl->keys = (struct gro_tcp_key *)rte_zmalloc_socket(
> > +			__func__,
> > +			size, RTE_CACHE_LINE_SIZE,
> > +			socket_id);
> > +	tbl->max_key_num = entries_num;
> > +	return tbl;
> > +}
> > +
> > +void gro_tcp_tbl_destroy(void *tbl)
> > +{
> > +	struct gro_tcp_tbl *tcp_tbl = (struct gro_tcp_tbl *)tbl;
> > +
> > +	if (tcp_tbl) {
> > +		if (tcp_tbl->items)
> 
> No need to, rte_free(NULL) is a valid construction.
> Same below.

Thanks.

> 
> > +			rte_free(tcp_tbl->items);
> > +		if (tcp_tbl->keys)
> > +			rte_free(tcp_tbl->keys);
> > +		rte_free(tcp_tbl);
> > +	}
> > +}
> > +
> > +/**
> > + * merge two TCP/IPv4 packets without update checksums.
> > + */
> > +static int
> > +merge_two_tcp4_packets(struct rte_mbuf *pkt_src,
> > +		struct rte_mbuf *pkt,
> > +		uint32_t max_packet_size)
> > +{
> > +	struct ipv4_hdr *ipv4_hdr1, *ipv4_hdr2;
> > +	struct tcp_hdr *tcp_hdr1;
> > +	uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1;
> > +	struct rte_mbuf *tail;
> > +
> > +	/* parse the given packet */
> > +	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
> > +				struct ether_hdr *) + 1);
> 
> You probably shouldn't assume that l2_len is always 14B long.
> 
> > +	ipv4_ihl1 = IPv4_HDR_LEN(ipv4_hdr1);
> > +	tcp_hdr1 = (struct tcp_hdr *)((char *)ipv4_hdr1 + ipv4_ihl1);
> > +	tcp_hl1 = TCP_HDR_LEN(tcp_hdr1);
> > +	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) - ipv4_ihl1
> > +		- tcp_hl1;
> > +
> > +	/* parse the original packet */
> > +	ipv4_hdr2 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt_src,
> > +				struct ether_hdr *) + 1);
> > +
> > +	if (pkt_src->pkt_len + tcp_dl1 > max_packet_size)
> > +		return -1;
> > +
> > +	/* remove the header of the incoming packet */
> > +	rte_pktmbuf_adj(pkt, sizeof(struct ether_hdr) +
> > +			ipv4_ihl1 + tcp_hl1);
> > +
> > +	/* chain the two packet together */
> > +	tail = rte_pktmbuf_lastseg(pkt_src);
> > +	tail->next = pkt;
> 
> What I see as a problem here:
> You have to reparse your packet and find its lastseg for each new segment.
> That seems like a big overhead.
> It would be good instead to parse the packet once and then store that
> information inside the mbuf: l2_len/l3_len/l4_len, etc.
> You can probably even avoid parsing inside your library - by adding it as a
> prerequisite for the caller to fill these fields properly.
> 
> Similar thought about lastseg - it would be good to store it somewhere inside
> your table.

Yes, it's faster and simpler to store the lastseg in the table. I will modify it.
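
For example, with the caller filling l2_len/l3_len, header access could become
(a sketch):

	struct ipv4_hdr *iph = rte_pktmbuf_mtod_offset(pkt,
			struct ipv4_hdr *, pkt->l2_len);
	struct tcp_hdr *tcph = rte_pktmbuf_mtod_offset(pkt,
			struct tcp_hdr *, pkt->l2_len + pkt->l3_len);

so the library would not reparse the Ethernet and IP headers at all.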

> 
> > +
> > +	/* update IP header */
> > +	ipv4_hdr2->total_length = rte_cpu_to_be_16(
> > +			rte_be_to_cpu_16(
> > +				ipv4_hdr2->total_length)
> > +			+ tcp_dl1);
> > +
> > +	/* update mbuf metadata for the merged packet */
> > +	pkt_src->nb_segs++;
> 
> Why do you assume that the incoming packet always contains only one segment?

I think it's a bug. I need to handle multi-segment packets. Thanks,
I will modify it.

> 
> > +	pkt_src->pkt_len += pkt->pkt_len;
> > +	return 1;
> > +}
> > +
> > +static int
> > +check_seq_option(struct rte_mbuf *pkt,
> > +		struct tcp_hdr *tcp_hdr,
> > +		uint16_t tcp_hl)
> > +{
> > +	struct ipv4_hdr *ipv4_hdr1;
> > +	struct tcp_hdr *tcp_hdr1;
> > +	uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1;
> > +	uint32_t sent_seq1, sent_seq;
> > +	int ret = -1;
> > +
> > +	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
> > +				struct ether_hdr *) + 1);
> > +	ipv4_ihl1 = IPv4_HDR_LEN(ipv4_hdr1);
> > +	tcp_hdr1 = (struct tcp_hdr *)((char *)ipv4_hdr1 + ipv4_ihl1);
> > +	tcp_hl1 = TCP_HDR_LEN(tcp_hdr1);
> > +	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) - ipv4_ihl1
> > +		- tcp_hl1;
> > +	sent_seq1 = rte_be_to_cpu_32(tcp_hdr1->sent_seq) + tcp_dl1;
> > +	sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq);
> > +
> > +	/* check if the two packets are neighbor */
> > +	if ((sent_seq ^ sent_seq1) == 0) {
> 
> Why not just if (sent_seq == sent_seq1)?

Thanks, I will change it.

> 
> > +		/* check if TCP option field equals */
> > +		if (tcp_hl1 > sizeof(struct tcp_hdr)) {
> 
> And what if tcp_hl1 == sizeof(struct tcp_hdr), but tcp_hl > tcp_hl1?
> I think you need to remove that check.

I will remove it. Thanks.

>   
> > +			if ((tcp_hl1 != tcp_hl) ||
> > +					(memcmp(tcp_hdr1 + 1,
> > +							tcp_hdr + 1,
> > +							tcp_hl - sizeof
> > +							(struct tcp_hdr))
> > +					 == 0))
> > +				ret = 1;
> > +		}
> > +	}
> > +	return ret;
> > +}
> > +
> > +static uint32_t
> > +find_an_empty_item(struct gro_tcp_tbl *tbl)
> > +{
> > +	uint32_t i;
> > +
> > +	for (i = 0; i < tbl->max_item_num; i++)
> > +		if (tbl->items[i].is_valid == 0)
> > +			return i;
> > +	return INVALID_ARRAY_INDEX;
> > +}
> > +
> > +static uint32_t
> > +find_an_empty_key(struct gro_tcp_tbl *tbl)
> > +{
> > +	uint32_t i;
> > +
> > +	for (i = 0; i < tbl->max_key_num; i++)
> > +		if (tbl->keys[i].is_valid == 0)
> > +			return i;
> > +	return INVALID_ARRAY_INDEX;
> > +}
> > +
> > +int32_t
> > +gro_tcp4_reassemble(struct rte_mbuf *pkt,
> > +		struct gro_tcp_tbl *tbl,
> > +		uint32_t max_packet_size)
> > +{
> > +	struct ether_hdr *eth_hdr;
> > +	struct ipv4_hdr *ipv4_hdr;
> > +	struct tcp_hdr *tcp_hdr;
> > +	uint16_t ipv4_ihl, tcp_hl, tcp_dl;
> > +
> > +	struct tcp_key key;
> > +	uint32_t cur_idx, prev_idx, item_idx;
> > +	uint32_t i, key_idx;
> > +
> > +	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
> > +	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
> > +	ipv4_ihl = IPv4_HDR_LEN(ipv4_hdr);
> > +
> > +	/* check if the packet should be processed */
> > +	if (ipv4_ihl < sizeof(struct ipv4_hdr))
> > +		goto fail;
> > +	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl);
> > +	tcp_hl = TCP_HDR_LEN(tcp_hdr);
> > +	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - ipv4_ihl
> > +		- tcp_hl;
> > +	if (tcp_dl == 0)
> > +		goto fail;
> > +
> > +	/* find a key and traverse all packets in its item group */
> > +	key.eth_saddr = eth_hdr->s_addr;
> > +	key.eth_daddr = eth_hdr->d_addr;
> > +	key.ip_src_addr[0] = rte_be_to_cpu_32(ipv4_hdr->src_addr);
> > +	key.ip_dst_addr[0] = rte_be_to_cpu_32(ipv4_hdr->dst_addr);
> 
> Your key.ip_src_addr[1-3] still contains some junk.
> How is memcmp below supposed to work properly?

When we allocate an item, we already guarantee that the content of its
memory space is 0, so the memcmp won't give wrong results.

> BTW why do you need 4 elems here, why not just uint32_t ip_src_addr;?
> Same for ip_dst_addr.

I think tcp6 and tcp4 can share the same table structure, so I use
128-bit IP addresses here. Do you mean we need to use different structures
for tcp4 and tcp6?

> 
> > +	key.src_port = rte_be_to_cpu_16(tcp_hdr->src_port);
> > +	key.dst_port = rte_be_to_cpu_16(tcp_hdr->dst_port);
> > +	key.recv_ack = rte_be_to_cpu_32(tcp_hdr->recv_ack);
> > +	key.tcp_flags = tcp_hdr->tcp_flags;
> > +
> > +	for (i = 0; i < tbl->max_key_num; i++) {
> > +		if (tbl->keys[i].is_valid &&
> > +				(memcmp(&(tbl->keys[i].key), &key,
> > +						sizeof(struct tcp_key))
> > +				 == 0)) {
> > +			cur_idx = tbl->keys[i].start_index;
> > +			prev_idx = cur_idx;
> > +			while (cur_idx != INVALID_ARRAY_INDEX) {
> > +				if (check_seq_option(tbl->items[cur_idx].pkt,
> > +							tcp_hdr,
> > +							tcp_hl) > 0) {
> 
> As I remember, Linux GRO also checks the IPv4 packet_id - it should be consecutive.

IP fragmented packets have the same IP ID, while other packets carry
consecutive ones. Since we suppose GRO can merge IP fragmented packets,
I think we shouldn't check whether the IP ID is consecutive here. What do
you think?
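
For example, all fragments of one large datagram carry the identical IP ID,
whereas a TSO-generated train of segments typically carries incrementing IDs;
requiring consecutive IDs would reject the fragmented case.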

> 
> > +					if (merge_two_tcp4_packets(
> > +								tbl->items[cur_idx].pkt,
> > +								pkt,
> > +								max_packet_size) > 0) {
> > +						/* successfully merge two packets */
> > +						tbl->items[cur_idx].is_groed = 1;
> > +						return 1;
> > +					}
> 
> If you allow more than one packet per flow to be stored in the table, then you
> should be prepared for a new segment to fill a gap between 2 packets.
> Probably the easiest thing - don't allow more than one 'item' per flow.

We allow the table to store packets of the same flow that arrive out of order.
Such packets occupy different 'items' and we won't re-merge them. For example,
suppose there are three TCP packets of the same flow: p1, p2 and p3; p1 arrives
first, then p3, and last p2. TCP GRO will allocate one 'item' for p1 and one
'item' for p3, and when p2 arrives, p2 will be merged with p1. Therefore, in the
table, we will have two 'items': item1 stores the merged p1 and p2, item2 stores
p3 (see the sketch below).
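
In table form (a sketch; indexes are illustrative):

	after p1 arrives:  key.start_index -> item0{p1}
	after p3 arrives:  item0{p1} -> item1{p3}     (p3 is not p1's neighbor)
	after p2 arrives:  item0{p1+p2} -> item1{p3}  (p2 merged into item0;
	                                               item0 and item1 stay separate)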

As you can see, TCP GRO can only merge sequentially arriving packets. If we
wanted to merge all out-of-order arriving packets, we would need to re-process
packets which have already been processed and already have an 'item'. IMO, this
procedure would be very complicated, so we don't do that.

Sorry, I don't understand how to allow only one 'item' per flow, because packets
arrive out of order. If we don't re-process the packets which already have an
'item', how can we guarantee it?
 
> 
> > +					/**
> > +					 * fail to merge two packets since
> > +					 * it's beyond the max packet length.
> > +					 * Insert it into the item group.
> > +					 */
> > +					goto insert_to_item_group;
> > +				} else {
> > +					prev_idx = cur_idx;
> > +					cur_idx = tbl->items[cur_idx].next_pkt_idx;
> > +				}
> > +			}
> > +			/**
> > +			 * find a corresponding item group but fails to find
> > +			 * one packet to merge. Insert it into this item group.
> > +			 */
> > +insert_to_item_group:
> > +			item_idx = find_an_empty_item(tbl);
> > +			/* the item number is greater than the max value */
> > +			if (item_idx == INVALID_ARRAY_INDEX)
> > +				return -1;
> > +			tbl->items[prev_idx].next_pkt_idx = item_idx;
> > +			tbl->items[item_idx].pkt = pkt;
> > +			tbl->items[item_idx].is_groed = 0;
> > +			tbl->items[item_idx].next_pkt_idx = INVALID_ARRAY_INDEX;
> > +			tbl->items[item_idx].is_valid = 1;
> > +			tbl->items[item_idx].start_time = rte_rdtsc();
> > +			tbl->item_num++;
> > +			return 0;
> > +		}
> > +	}
> > +
> > +	/**
> > +	 * merge fail as the given packet has a new key.
> > +	 * So insert a new key.
> > +	 */
> > +	item_idx = find_an_empty_item(tbl);
> > +	key_idx = find_an_empty_key(tbl);
> > +	/**
> > +	 * if current key or item number is greater than the max
> > +	 * value, don't insert the packet into the table and return
> > +	 * immediately.
> > +	 */
> > +	if (item_idx == INVALID_ARRAY_INDEX ||
> > +			key_idx == INVALID_ARRAY_INDEX)
> > +		return -1;
> > +	tbl->items[item_idx].pkt = pkt;
> > +	tbl->items[item_idx].next_pkt_idx = INVALID_ARRAY_INDEX;
> > +	tbl->items[item_idx].is_groed = 0;
> > +	tbl->items[item_idx].is_valid = 1;
> > +	tbl->items[item_idx].start_time = rte_rdtsc();
> 
> You can pass start-time as a parameter instead.

Thanks, I will modify it.

> 
> > +	tbl->item_num++;
> > +
> > +	memcpy(&(tbl->keys[key_idx].key),
> > +			&key, sizeof(struct tcp_key));
> > +	tbl->keys[key_idx].start_index = item_idx;
> > +	tbl->keys[key_idx].is_valid = 1;
> > +	tbl->key_num++;
> > +
> > +	return 0;
> > +fail:
> 
> Please try to avoid goto whenever possible.
> Looks really ugly.

Thanks, I will modify it.

> 
> > +	return -1;
> > +}
> > +
> > +uint16_t gro_tcp_tbl_flush(struct gro_tcp_tbl *tbl,
> > +		struct rte_mbuf **out,
> > +		const uint16_t nb_out)
> > +{
> > +	uint32_t i, num = 0;
> > +
> > +	if (nb_out < tbl->item_num)
> > +		return 0;
> 
> And how would the user know how many items are currently in the table?

I will add an API to tell users the item number. Thanks.
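
For example, something like (a sketch; name and prototype may change):

	uint32_t rte_gro_tbl_item_num(struct rte_gro_tbl *gro_tbl);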

> 
> > +
> > +	for (i = 0; i < tbl->max_item_num; i++) {
> > +		if (tbl->items[i].is_valid) {
> > +			out[num++] = tbl->items[i].pkt;
> > +			tbl->items[i].is_valid = 0;
> > +			tbl->item_num--;
> > +		}
> > +	}
> > +	memset(tbl->keys, 0, sizeof(struct gro_tcp_key) *
> > +			tbl->max_key_num);
> > +	tbl->key_num = 0;
> > +
> > +	return num;
> > +}
> > +
> > +uint16_t
> > +gro_tcp_tbl_timeout_flush(struct gro_tcp_tbl *tbl,
> > +		uint64_t timeout_cycles,
> > +		struct rte_mbuf **out,
> > +		const uint16_t nb_out)
> > +{
> > +	uint16_t k;
> > +	uint32_t i, j;
> > +	uint64_t current_time;
> > +
> > +	if (nb_out == 0)
> > +		return 0;
> > +	k = 0;
> > +	current_time = rte_rdtsc();
> > +
> > +	for (i = 0; i < tbl->max_key_num; i++) {
> > +		if (tbl->keys[i].is_valid) {
> 
> Seems pretty expensive to traverse the whole table...
> Would it be worth having some sort of LRU list?
> 
> > +			j = tbl->keys[i].start_index;
> > +			while (j != INVALID_ARRAY_INDEX) {
> > +				if (current_time - tbl->items[j].start_time >=
> > +						timeout_cycles) {
> > +					out[k++] = tbl->items[j].pkt;
> > +					tbl->items[j].is_valid = 0;
> > +					tbl->item_num--;
> > +					j = tbl->items[j].next_pkt_idx;
> > +
> > +					if (k == nb_out &&
> > +							j == INVALID_ARRAY_INDEX) {
> > +						/* delete the key */
> > +						tbl->keys[i].is_valid = 0;
> > +						tbl->key_num--;
> > +						goto end;
> 
> Please rearrange the code to avoid gotos.
> 
> > +					} else if (k == nb_out &&
> > +							j != INVALID_ARRAY_INDEX) {
> > +						/* update the first item index */
> > +						tbl->keys[i].start_index = j;
> > +						goto end;
> > +					}
> > +				}
> > +			}
> > +			/* delete the key, as all of its packets are flushed */
> > +			tbl->keys[i].is_valid = 0;
> > +			tbl->key_num--;
> > +		}
> > +		if (tbl->key_num == 0)
> > +			goto end;
> > +	}
> > +end:
> > +	return k;
> > +}
> > diff --git a/lib/librte_gro/rte_gro_tcp.h b/lib/librte_gro/rte_gro_tcp.h
> > new file mode 100644
> > index 0000000..a9a7aca
> > --- /dev/null
> > +++ b/lib/librte_gro/rte_gro_tcp.h
> > @@ -0,0 +1,191 @@
> > +/*-
> > + *   BSD LICENSE
> > + *
> > + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> > + *
> > + *   Redistribution and use in source and binary forms, with or without
> > + *   modification, are permitted provided that the following conditions
> > + *   are met:
> > + *
> > + *     * Redistributions of source code must retain the above copyright
> > + *       notice, this list of conditions and the following disclaimer.
> > + *     * Redistributions in binary form must reproduce the above copyright
> > + *       notice, this list of conditions and the following disclaimer in
> > + *       the documentation and/or other materials provided with the
> > + *       distribution.
> > + *     * Neither the name of Intel Corporation nor the names of its
> > + *       contributors may be used to endorse or promote products derived
> > + *       from this software without specific prior written permission.
> > + *
> > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > + */
> > +
> > +#ifndef _RTE_GRO_TCP_H_
> > +#define _RTE_GRO_TCP_H_
> > +
> > +#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
> > +#define TCP_HDR_LEN(tcph) \
> > +	((tcph->data_off >> 4) * 4)
> > +#define IPv4_HDR_LEN(iph) \
> > +	((iph->version_ihl & 0x0f) * 4)
> > +#else
> > +#define TCP_DATAOFF_MASK 0x0f
> > +#define TCP_HDR_LEN(tcph) \
> > +	((tcph->data_off & TCP_DATAOFF_MASK) * 4)
> > +#define IPv4_HDR_LEN(iph) \
> > +	((iph->version_ihl >> 4) * 4)
> > +#endif
> > +
> > +#define INVALID_ARRAY_INDEX 0xffffffffUL
> > +#define GRO_TCP_TBL_MAX_ITEM_NUM (UINT32_MAX - 1)
> > +
> > +/* criteria for merging packets */
> > +struct tcp_key {
> > +	struct ether_addr eth_saddr;
> > +	struct ether_addr eth_daddr;
> > +	uint32_t ip_src_addr[4];	/**< IPv4 uses the first 8B */
> > +	uint32_t ip_dst_addr[4];
> > +
> > +	uint32_t recv_ack;	/**< acknowledgment sequence number. */
> > +	uint16_t src_port;
> > +	uint16_t dst_port;
> > +	uint8_t tcp_flags;	/**< TCP flags. */
> > +};
> > +
> > +struct gro_tcp_key {
> > +	struct tcp_key key;
> > +	uint32_t start_index;	/**< the first packet index of the flow */
> > +	uint8_t is_valid;
> > +};
> > +
> > +struct gro_tcp_item {
> > +	struct rte_mbuf *pkt;	/**< packet address. */
> > +	/* the time when the packet is added into the table */
> > +	uint64_t start_time;
> > +	uint32_t next_pkt_idx;	/**< next packet index. */
> > +	/* flag to indicate if the packet is GROed */
> > +	uint8_t is_groed;
> > +	uint8_t is_valid;	/**< flag indicates if the item is valid */
> 
> Why do you need these 2 flags at all?
> Why not just reset, let's say, pkt to NULL for an invalid item?

Thanks, I will use NULL to replace is_valid.

> 
> > +};
> > +
> > +/**
> > + * TCP reassembly table. Both TCP/IPv4 and TCP/IPv6 use the same table
> > + * structure.
> > + */
> > +struct gro_tcp_tbl {
> > +	struct gro_tcp_item *items;	/**< item array */
> > +	struct gro_tcp_key *keys;	/**< key array */
> > +	uint32_t item_num;	/**< current item number */
> > +	uint32_t key_num;	/**< current key num */
> > +	uint32_t max_item_num;	/**< item array size */
> > +	uint32_t max_key_num;	/**< key array size */
> > +};
> > +
> > +/**
> > + * This function creates a TCP reassembly table.
> > + *
> > + * @param socket_id
> > + *  socket index where the Ethernet port connects to.
> > + * @param max_flow_num
> > + *  the maximum number of flows in the TCP GRO table
> > + * @param max_item_per_flow
> > + *  the maximum packet number per flow.
> > + * @return
> > + *  if create successfully, return a pointer which points to the
> > + *  created TCP GRO table. Otherwise, return NULL.
> > + */
> > +void *gro_tcp_tbl_create(uint16_t socket_id,
> > +		uint16_t max_flow_num,
> > +		uint16_t max_item_per_flow);
> > +
> > +/**
> > + * This function destroys a TCP reassembly table.
> > + * @param tbl
> > + *  a pointer that points to the TCP reassembly table.
> > + */
> > +void gro_tcp_tbl_destroy(void *tbl);
> > +
> > +/**
> > + * This function searches for a packet in the TCP reassembly table to
> > + * merge with the inputted one. To merge two packets is to chain them
> > + * together and update packet headers. If the packet is without data
> > + * (e.g. SYN, SYN-ACK packet), this function returns immediately.
> > + * Otherwise, the packet is either merged, or inserted into the table.
> > + * Besides, if there is no available space to insert the packet, this
> > + * function returns immediately too.
> > + *
> > + * This function assumes the inputted packet is with correct IPv4 and
> > + * TCP checksums. And if two packets are merged, it won't re-calculate
> > + * IPv4 and TCP checksums. Besides, if the inputted packet is IP
> > + * fragmented, it assumes the packet is complete (with TCP header).
> > + *
> > + * @param pkt
> > + *  packet to reassemble.
> > + * @param tbl
> > + *  a pointer that points to a TCP reassembly table.
> > + * @param max_packet_size
> > + *  max packet length after merged
> > + * @return
> > + *  if the packet doesn't have data, or there is no available space
> > + *  in the table to insert a new item or a new key, return a negative
> > + *  value. If the packet is merged successfully, return a positive
> > + *  value. If the packet is inserted into the table, return 0.
> > + */
> > +int32_t
> > +gro_tcp4_reassemble(struct rte_mbuf *pkt,
> > +		struct gro_tcp_tbl *tbl,
> > +		uint32_t max_packet_size);
> > +
> > +/**
> > + * This function flushes all packets in a TCP reassembly table to
> > + * applications, without updating checksums for merged packets.
> > + * If the array which is used to keep flushed packets is not large
> > + * enough, an error happens and this function returns immediately.
> > + *
> > + * @param tbl
> > + *  a pointer that points to a TCP GRO table.
> > + * @param out
> > + *  pointer array which is used to keep flushed packets. Applications
> > + *  should guarantee it's large enough to hold all packets in the table.
> > + * @param nb_out
> > + *  the element number of out.
> > + * @return
> > + *  the number of flushed packets. If out is not large enough to hold
> > + *  all packets in the table, return 0.
> > + */
> > +uint16_t
> > +gro_tcp_tbl_flush(struct gro_tcp_tbl *tbl,
> > +		struct rte_mbuf **out,
> > +		const uint16_t nb_out);
> > +
> > +/**
> > + * This function flushes timeout packets in a TCP reassembly table to
> > + * applications, without updating checksums for merged packets.
> > + *
> > + * @param tbl
> > + *  a pointer that points to a TCP GRO table.
> > + * @param timeout_cycles
> > + *  the maximum time that packets can stay in the table.
> > + * @param out
> > + *  pointer array which is used to keep flushed packets.
> > + * @param nb_out
> > + *  the element number of out.
> > + * @return
> > + *  the number of packets that are returned.
> > + */
> > +uint16_t
> > +gro_tcp_tbl_timeout_flush(struct gro_tcp_tbl *tbl,
> > +		uint64_t timeout_cycles,
> > +		struct rte_mbuf **out,
> > +		const uint16_t nb_out);
> > +#endif
> > --
> > 2.7.4

^ permalink raw reply	[flat|nested] 141+ messages in thread

* [PATCH v8 0/3] Support TCP/IPv4 GRO in DPDK
  2017-06-26  6:43           ` [PATCH v7 " Jiayu Hu
                               ` (2 preceding siblings ...)
  2017-06-26  6:43             ` [PATCH v7 3/3] app/testpmd: enable TCP/IPv4 GRO Jiayu Hu
@ 2017-06-29 10:58             ` Jiayu Hu
  2017-06-29 10:58               ` [PATCH v8 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
                                 ` (3 more replies)
  3 siblings, 4 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-06-29 10:58 UTC (permalink / raw)
  To: dev
  Cc: konstantin.ananyev, jianfeng.tan, yliu, stephen, jingjing.wu,
	tiwei.bie, lei.a.yao, Jiayu Hu

Generic Receive Offload (GRO) is a widely used SW-based offloading
technique to reduce per-packet processing overhead. It gains performance
by reassembling small packets into large ones. Therefore, we propose to
support GRO in DPDK.

To enable more flexibility to applications, DPDK GRO is implemented as
a user library. Applications explicitly use the GRO library to merge
small packets into large ones. DPDK GRO provides two reassembly modes:
lightweight mode and heavyweight mode. If applications want to merge
packets in a simple way, they can select the lightweight mode API. If
applications need more fine-grained controls, they can select the
heavyweight mode API.

This patchset is to support TCP/IPv4 GRO in DPDK. The first patch is to
provide a GRO API framework. The second patch is to support TCP/IPv4 GRO.
The last patch is to enable TCP/IPv4 GRO in testpmd.

We perform many iperf tests to see the performance gains from DPDK GRO.
The test environment is:
a. two 25Gbps physical ports (p0 and p1) are linked together. Assign p0
	to one networking namespace and assign p1 to DPDK;
b. enable TSO for p0. Run iperf client on p0;
c. launch testpmd with p1 and a vhost-user port, and run it in csum
	forwarding mode. Select TCP HW checksum calculation for the
	vhost-user port in csum forwarding engine. And for better
	performance, we select IPv4 and TCP HW checksum calculation for p1
	too;
d. launch a VM with one CPU core and a virtio-net port. The VM OS is
	ubuntu 16.04, whose virtio-net driver supports GRO. Enable RX csum
	offloading and mrg_rxbuf for the VM. The iperf server runs in the VM;
e. to run iperf tests, we need to prevent the csum forwarding engine from
	compulsorily changing packet MAC addresses. So in our tests, we
	comment this code out (line 701 ~ line 704 in csumonly.c).

In each test, we run iperf with the following three configurations:
	- single flow and single TCP client thread 
	- multiple flows and single TCP client thread
	- single flow and parallel TCP client threads

We run the above iperf tests in three scenarios:
	s1: disabling kernel GRO and enabling DPDK GRO
	s2: disabling kernel GRO and disabling DPDK GRO
	s3: enabling kernel GRO and disabling DPDK GRO
Comparing the throughput of s1 with s2, we can see the performance gains
from DPDK GRO. Comparing the throughput of s1 and s3, we can compare DPDK
GRO performance with kernel GRO performance.

Test results:
	- DPDK GRO throughput is almost 2 times the throughput with neither
		DPDK GRO nor kernel GRO;
	- DPDK GRO throughput is almost 1.2 times the throughput of
		kernel GRO.

Change log
==========
v8:
- merge rte_gro_flush and rte_gro_timeout_flush together and optimize
	flushing operation
- enable rte_gro_reassemble to process N inputted packets
- add rte_gro_tbl_item_num to get packet num in the GRO table
- add 'lastseg' to struct gro_tcp_item to get last segment faster
- add operations to handle rte_malloc failure
- use mbuf->l2_len/l3_len/l4_len instead of parsing header
- remove 'is_groed' and 'is_valid' in struct gro_tcp_item
- fix bugs in gro_tcp4_reassemble
- pass start-time as a parameter to avoid frequently calling rte_rdtsc 
- modify rte_gro_tbl_create prototype
- add 'RTE_' to external macros
- remove 'goto'
- remove inappropriate 'const'
- hide internal variables
v7:
- add a macro 'GRO_MAX_BURST_ITEM_NUM' to avoid stack overflow in
	rte_gro_reassemble_burst
- change macro name (_NB to _NUM)
- add '#ifdef __cplusplus ...' in rte_gro.h
v6:
- avoid checksum validation and calculation
- enable processing of IP fragmented packets
- add a command in testpmd
- update documents
- modify rte_gro_timeout_flush and rte_gro_reassemble_burst
- rename variables
v5:
- fix some bugs
- fix coding style issues
v4:
- implement DPDK GRO as an application-used library
- introduce lightweight and heavyweight working modes to enable
	fine-grained controls to applications
- replace cuckoo hash tables with simpler table structure
v3:
- fix compilation issues.
v2:
- provide generic reassembly function;
- implement GRO as a device ability:
add APIs for devices to support GRO;
add APIs for applications to enable/disable GRO;
- update testpmd example. 

Jiayu Hu (3):
  lib: add Generic Receive Offload API framework
  lib/gro: add TCP/IPv4 GRO support
  app/testpmd: enable TCP/IPv4 GRO

 app/test-pmd/cmdline.c                      | 125 +++++++++
 app/test-pmd/config.c                       |  37 +++
 app/test-pmd/csumonly.c                     |   5 +
 app/test-pmd/testpmd.c                      |   3 +
 app/test-pmd/testpmd.h                      |  11 +
 config/common_base                          |   5 +
 doc/guides/rel_notes/release_17_08.rst      |   7 +
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  34 +++
 lib/Makefile                                |   2 +
 lib/librte_gro/Makefile                     |  51 ++++
 lib/librte_gro/rte_gro.c                    | 275 +++++++++++++++++++
 lib/librte_gro/rte_gro.h                    | 180 +++++++++++++
 lib/librte_gro/rte_gro_tcp.c                | 400 ++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro_tcp.h                | 172 ++++++++++++
 lib/librte_gro/rte_gro_version.map          |  12 +
 mk/rte.app.mk                               |   1 +
 16 files changed, 1320 insertions(+)
 create mode 100644 lib/librte_gro/Makefile
 create mode 100644 lib/librte_gro/rte_gro.c
 create mode 100644 lib/librte_gro/rte_gro.h
 create mode 100644 lib/librte_gro/rte_gro_tcp.c
 create mode 100644 lib/librte_gro/rte_gro_tcp.h
 create mode 100644 lib/librte_gro/rte_gro_version.map

-- 
2.7.4

^ permalink raw reply	[flat|nested] 141+ messages in thread

* [PATCH v8 1/3] lib: add Generic Receive Offload API framework
  2017-06-29 10:58             ` [PATCH v8 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
@ 2017-06-29 10:58               ` Jiayu Hu
  2017-06-29 10:58               ` [PATCH v8 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
                                 ` (2 subsequent siblings)
  3 siblings, 0 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-06-29 10:58 UTC (permalink / raw)
  To: dev
  Cc: konstantin.ananyev, jianfeng.tan, yliu, stephen, jingjing.wu,
	tiwei.bie, lei.a.yao, Jiayu Hu

Generic Receive Offload (GRO) is a widely used SW-based offloading
technique to reduce per-packet processing overhead. It gains
performance by reassembling small packets into large ones. This
patchset is to support GRO in DPDK. To support GRO, this patch
implements a GRO API framework.

To give applications more flexibility, DPDK GRO is implemented as
a user library. Applications explicitly use the GRO library to merge
small packets into large ones. DPDK GRO provides two reassembly modes.
One is called lightweight mode, the other is called heavyweight mode.
If applications want to merge packets in a simple way and the number
of packets is relatively small, they can use the lightweight mode.
If applications need more fine-grained controls, they can choose the
heavyweight mode.

rte_gro_reassemble_burst is the main reassembly API which is used in
lightweight mode and processes N packets at a time. For applications,
performing GRO in lightweight mode is simple. They just need to invoke
rte_gro_reassemble_burst. Applications can get GROed packets as soon as
rte_gro_reassemble_burst returns.
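
For instance, a minimal lightweight-mode sketch (the parameter values
below are illustrative, and RTE_GRO_TCP_IPV4 is introduced by the next
patch in this series):

	#include <rte_ethdev.h>
	#include <rte_gro.h>

	static uint16_t
	rx_burst_with_gro(uint8_t port_id, struct rte_mbuf **pkts,
			uint16_t burst_size)
	{
		struct rte_gro_param param = {
			.desired_gro_types = RTE_GRO_TCP_IPV4,
			.max_packet_size = UINT16_MAX,
			.max_flow_num = 4,
			.max_item_per_flow = burst_size,
		};
		uint16_t nb_rx = rte_eth_rx_burst(port_id, 0, pkts,
				burst_size);

		/* on return, pkts[] keeps the GROed packets */
		return rte_gro_reassemble_burst(pkts, nb_rx, &param);
	}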

rte_gro_reassemble is the main reassembly API which is used in
heavyweight mode and tries to merge N input packets with the packets
in a given GRO table. For applications, performing GRO in heavyweight
mode is relatively complicated. Before performing GRO, applications
need to create a GRO table by rte_gro_tbl_create. Then they can use
rte_gro_reassemble to merge packets. The GROed packets are kept in the
GRO table; to get them, applications need to flush them manually via
the flush API.
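
A corresponding heavyweight-mode sketch (illustrative only; 'param',
'pkts' and 'out' are assumed to come from the application):

	#include <rte_mbuf.h>
	#include <rte_gro.h>

	static uint16_t
	heavyweight_gro(const struct rte_gro_param *param,
			struct rte_mbuf **pkts, uint16_t nb_rx,
			struct rte_mbuf **out, uint16_t max_out)
	{
		uint16_t nb_flushed = 0;
		void *tbl = rte_gro_tbl_create(param);

		if (tbl == NULL)
			return 0;
		/* unprocessed packets are kept in pkts[]; merged and
		 * inserted packets stay inside the table */
		rte_gro_reassemble(pkts, nb_rx, tbl);
		/* later: drain packets that have stayed in the table
		 * longer than max_timeout_cycles */
		nb_flushed = rte_gro_timeout_flush(tbl,
				param->desired_gro_types, out, max_out);
		rte_gro_tbl_destroy(tbl);
		return nb_flushed;
	}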

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
---
 config/common_base                 |   5 ++
 lib/Makefile                       |   2 +
 lib/librte_gro/Makefile            |  50 +++++++++++
 lib/librte_gro/rte_gro.c           | 176 +++++++++++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro.h           | 176 +++++++++++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro_version.map |  12 +++
 mk/rte.app.mk                      |   1 +
 7 files changed, 422 insertions(+)
 create mode 100644 lib/librte_gro/Makefile
 create mode 100644 lib/librte_gro/rte_gro.c
 create mode 100644 lib/librte_gro/rte_gro.h
 create mode 100644 lib/librte_gro/rte_gro_version.map

diff --git a/config/common_base b/config/common_base
index f6aafd1..167f5ef 100644
--- a/config/common_base
+++ b/config/common_base
@@ -712,6 +712,11 @@ CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
 CONFIG_RTE_LIBRTE_PMD_VHOST=n
 
 #
+# Compile GRO library
+#
+CONFIG_RTE_LIBRTE_GRO=y
+
+#
 #Compile Xen domain0 support
 #
 CONFIG_RTE_LIBRTE_XEN_DOM0=n
diff --git a/lib/Makefile b/lib/Makefile
index 07e1fd0..ac1c2f6 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -106,6 +106,8 @@ DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
 DEPDIRS-librte_reorder := librte_eal librte_mempool librte_mbuf
 DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
 DEPDIRS-librte_pdump := librte_eal librte_mempool librte_mbuf librte_ether
+DIRS-$(CONFIG_RTE_LIBRTE_GRO) += librte_gro
+DEPDIRS-librte_gro := librte_eal librte_mbuf librte_ether librte_net
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
new file mode 100644
index 0000000..7e0f128
--- /dev/null
+++ b/lib/librte_gro/Makefile
@@ -0,0 +1,50 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2017 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_gro.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+
+EXPORT_MAP := rte_gro_version.map
+
+LIBABIVER := 1
+
+# source files
+SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
+
+# install this header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
new file mode 100644
index 0000000..7efed18
--- /dev/null
+++ b/lib/librte_gro/rte_gro.c
@@ -0,0 +1,176 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+
+#include "rte_gro.h"
+
+typedef void *(*gro_tbl_create_fn)(uint16_t socket_id,
+		uint16_t max_flow_num,
+		uint16_t max_item_per_flow);
+typedef void (*gro_tbl_destroy_fn)(void *tbl);
+typedef uint32_t (*gro_tbl_item_num_fn)(void *tbl);
+
+static gro_tbl_create_fn tbl_create_functions[RTE_GRO_TYPE_MAX_NUM];
+static gro_tbl_destroy_fn tbl_destroy_functions[RTE_GRO_TYPE_MAX_NUM];
+static gro_tbl_item_num_fn tbl_item_num_functions[RTE_GRO_TYPE_MAX_NUM];
+
+/**
+ * GRO table, which is used to merge packets. It keeps many reassembly
+ * tables of desired GRO types. Applications need to create GRO tables
+ * before using rte_gro_reassemble to perform GRO.
+ */
+struct rte_gro_tbl {
+	uint64_t desired_gro_types;	/**< GRO types to perform */
+	/* max TTL for a packet, measured in CPU cycles */
+	uint64_t max_timeout_cycles;
+	/* max length of a merged packet, measured in bytes */
+	uint32_t max_packet_size;
+	/* reassembly tables of desired GRO types */
+	void *tbls[RTE_GRO_TYPE_MAX_NUM];
+};
+
+void *
+rte_gro_tbl_create(const struct rte_gro_param *param)
+{
+	gro_tbl_create_fn create_tbl_fn;
+	gro_tbl_destroy_fn destroy_tbl_fn;
+	struct rte_gro_tbl *gro_tbl;
+	uint64_t gro_type_flag = 0;
+	uint8_t i, j;
+
+	gro_tbl = rte_zmalloc_socket(__func__,
+			sizeof(struct rte_gro_tbl),
+			RTE_CACHE_LINE_SIZE,
+			param->socket_id);
+	if (gro_tbl == NULL)
+		return NULL;
+	gro_tbl->max_packet_size = param->max_packet_size;
+	gro_tbl->max_timeout_cycles = param->max_timeout_cycles;
+	gro_tbl->desired_gro_types = param->desired_gro_types;
+
+	for (i = 0; i < RTE_GRO_TYPE_MAX_NUM; i++) {
+		gro_type_flag = 1ULL << i;
+
+		if ((param->desired_gro_types & gro_type_flag) == 0)
+			continue;
+		create_tbl_fn = tbl_create_functions[i];
+		if (create_tbl_fn == NULL)
+			continue;
+
+		gro_tbl->tbls[i] = create_tbl_fn(
+				param->socket_id,
+				param->max_flow_num,
+				param->max_item_per_flow);
+		if (gro_tbl->tbls[i] == NULL) {
+			/* destroy all allocated tables */
+			for (j = 0; j < i; j++) {
+				gro_type_flag = 1ULL << j;
+				if ((param->desired_gro_types & gro_type_flag) == 0)
+					continue;
+				destroy_tbl_fn = tbl_destroy_functions[j];
+				if (destroy_tbl_fn)
+					destroy_tbl_fn(gro_tbl->tbls[j]);
+			}
+			rte_free(gro_tbl);
+			return NULL;
+		}
+	}
+	return gro_tbl;
+}
+
+void rte_gro_tbl_destroy(void *tbl)
+{
+	gro_tbl_destroy_fn destroy_tbl_fn;
+	struct rte_gro_tbl *gro_tbl = (struct rte_gro_tbl *)tbl;
+	uint64_t gro_type_flag;
+	uint8_t i;
+
+	if (gro_tbl == NULL)
+		return;
+	for (i = 0; i < RTE_GRO_TYPE_MAX_NUM; i++) {
+		gro_type_flag = 1ULL << i;
+		if ((gro_tbl->desired_gro_types & gro_type_flag) == 0)
+			continue;
+		destroy_tbl_fn = tbl_destroy_functions[i];
+		if (destroy_tbl_fn)
+			destroy_tbl_fn(gro_tbl->tbls[i]);
+	}
+	rte_free(gro_tbl);
+}
+
+uint16_t
+rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
+		uint16_t nb_pkts,
+		const struct rte_gro_param *param __rte_unused)
+{
+	return nb_pkts;
+}
+
+uint16_t
+rte_gro_reassemble(struct rte_mbuf **pkts __rte_unused,
+		uint16_t nb_pkts,
+		void *tbl __rte_unused)
+{
+	return nb_pkts;
+}
+
+uint16_t
+rte_gro_timeout_flush(void *tbl __rte_unused,
+		uint64_t desired_gro_types __rte_unused,
+		struct rte_mbuf **out __rte_unused,
+		uint16_t max_nb_out __rte_unused)
+{
+	return 0;
+}
+
+uint64_t rte_gro_tbl_item_num(void *tbl)
+{
+	struct rte_gro_tbl *gro_tbl = (struct rte_gro_tbl *)tbl;
+	gro_tbl_item_num_fn item_num_fn;
+	uint64_t item_num = 0;
+	uint64_t gro_type_flag;
+	uint8_t i;
+
+	for (i = 0; i < RTE_GRO_TYPE_MAX_NUM; i++) {
+		gro_type_flag = 1ULL << i;
+		if ((gro_tbl->desired_gro_types & gro_type_flag) == 0)
+			continue;
+
+		item_num_fn = tbl_item_num_functions[i];
+		if (item_num_fn == NULL)
+			continue;
+		item_num += item_num_fn(gro_tbl->tbls[i]);
+	}
+	return item_num;
+}
diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
new file mode 100644
index 0000000..e44a510
--- /dev/null
+++ b/lib/librte_gro/rte_gro.h
@@ -0,0 +1,176 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_GRO_H_
+#define _RTE_GRO_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * the maximum number of packets that rte_gro_reassemble_burst can
+ * process in one invocation.
+ */
+#define RTE_GRO_MAX_BURST_ITEM_NUM 1024UL
+
+/* max number of supported GRO types */
+#define RTE_GRO_TYPE_MAX_NUM 64
+#define RTE_GRO_TYPE_SUPPORT_NUM 0	/**< current supported GRO num */
+
+
+struct rte_gro_param {
+	uint64_t desired_gro_types;	/**< desired GRO types */
+	uint32_t max_packet_size;	/**< max length of merged packets */
+	uint16_t max_flow_num;	/**< max flow number */
+	uint16_t max_item_per_flow;	/**< max packet number per flow */
+
+	/* socket index where the Ethernet port connects to */
+	uint16_t socket_id;
+	/* max TTL for a packet in the GRO table, measured in CPU cycles */
+	uint64_t max_timeout_cycles;
+};
+
+/**
+ * This function creates a GRO table, which is used to merge packets
+ * in rte_gro_reassemble.
+ *
+ * @param param
+ *  the parameters applications use to create the GRO table.
+ * @return
+ *  on success, return a pointer to the created GRO table. Otherwise,
+ *  return NULL.
+ */
+void *rte_gro_tbl_create(
+		const struct rte_gro_param *param);
+/**
+ * This function destroys a GRO table.
+ */
+void rte_gro_tbl_destroy(void *tbl);
+
+/**
+ * This is one of the main reassembly APIs, which merges a burst of
+ * packets at a time. It assumes that all input packets have correct
+ * checksums. That is, applications should guarantee all input
+ * packets are correct. Besides, it doesn't re-calculate checksums
+ * for merged packets. If input packets are IP fragmented, this
+ * function assumes they are complete (i.e. with L4 header). After
+ * finishing processing, it returns all GROed packets to applications
+ * immediately.
+ *
+ * @param pkts
+ *  a pointer array which points to the packets to reassemble. It
+ *  also keeps the packet addresses of the GROed packets.
+ * @param nb_pkts
+ *  the number of packets to reassemble.
+ * @param param
+ *  applications use it to tell rte_gro_reassemble_burst what rules
+ *  are demanded.
+ * @return
+ *  the number of packets after GRO is performed.
+ */
+uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts,
+		uint16_t nb_pkts,
+		const struct rte_gro_param *param);
+
+/**
+ * Reassembly function, which tries to merge input packets with the
+ * packets in a given GRO table. This function assumes all input
+ * packets have correct checksums, and it won't update checksums if
+ * two packets are merged. Besides, if input packets are IP
+ * fragmented, this function assumes they are complete packets (i.e.
+ * with L4 header).
+ *
+ * If the input packets don't have data or are of unsupported GRO
+ * types, they won't be processed and are returned to applications.
+ * Otherwise, the input packets are either merged or inserted into
+ * the table. If applications want to get packets in the table, they
+ * need to call the flush API.
+ *
+ * @param pkts
+ *  packets to reassemble. After this function finishes, it also
+ *  keeps the unprocessed packets (i.e. without data or of
+ *  unsupported GRO types).
+ * @param nb_pkts
+ *  the number of packets to reassemble.
+ * @param tbl
+ *  a pointer to a GRO table.
+ * @return
+ *  the number of unprocessed packets (i.e. without data or of
+ *  unsupported GRO types). If all packets are processed (merged or
+ *  inserted into the table), return 0.
+ */
+uint16_t rte_gro_reassemble(struct rte_mbuf **pkts,
+		uint16_t nb_pkts,
+		void *tbl);
+
+/**
+ * This function flushes the timeout packets from the reassembly tables
+ * of desired GRO types. At most, it flushes as many timeout packets as
+ * the element number of the array which keeps the flushed packets.
+ *
+ * Besides, this function won't re-calculate checksums for merged
+ * packets in the tables. That is, the returned packets may have
+ * incorrect checksums.
+ *
+ * @param tbl
+ *  a pointer to a GRO table object.
+ * @param desired_gro_types
+ *  rte_gro_timeout_flush only processes packets which belong to the
+ *  GRO types specified by desired_gro_types.
+ * @param out
+ *  a pointer array that is used to keep the flushed timeout packets.
+ * @param max_nb_out
+ *  the element number of out. It's also the max number of timeout
+ *  packets that can be flushed finally.
+ * @return
+ *  the number of flushed packets. If no packets are flushed, return 0.
+ */
+uint16_t rte_gro_timeout_flush(void *tbl,
+		uint64_t desired_gro_types,
+		struct rte_mbuf **out,
+		uint16_t max_nb_out);
+
+/**
+ * This function returns the number of packets in a given GRO table.
+ * @param tbl
+ *  a pointer to a GRO table.
+ * @return
+ *  the number of packets in the table.
+ */
+uint64_t rte_gro_tbl_item_num(void *tbl);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/lib/librte_gro/rte_gro_version.map b/lib/librte_gro/rte_gro_version.map
new file mode 100644
index 0000000..358fb9d
--- /dev/null
+++ b/lib/librte_gro/rte_gro_version.map
@@ -0,0 +1,12 @@
+DPDK_17.08 {
+	global:
+
+	rte_gro_tbl_create;
+	rte_gro_tbl_destroy;
+	rte_gro_reassemble_burst;
+	rte_gro_reassemble;
+	rte_gro_timeout_flush;
+	rte_gro_tbl_item_num;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index bcaf1b3..fc3776d 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -98,6 +98,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_RING)           += -lrte_ring
 _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
 _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER)        += -lrte_reorder
+_LDLIBS-$(CONFIG_RTE_LIBRTE_GRO)        	+= -lrte_gro
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 _LDLIBS-$(CONFIG_RTE_LIBRTE_KNI)            += -lrte_kni
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 141+ messages in thread

* [PATCH v8 2/3] lib/gro: add TCP/IPv4 GRO support
  2017-06-29 10:58             ` [PATCH v8 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
  2017-06-29 10:58               ` [PATCH v8 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
@ 2017-06-29 10:58               ` Jiayu Hu
  2017-06-29 17:51                 ` Stephen Hemminger
  2017-06-29 10:59               ` [PATCH v8 3/3] app/testpmd: enable TCP/IPv4 GRO Jiayu Hu
  2017-06-30  6:53               ` [PATCH v9 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
  3 siblings, 1 reply; 141+ messages in thread
From: Jiayu Hu @ 2017-06-29 10:58 UTC (permalink / raw)
  To: dev
  Cc: konstantin.ananyev, jianfeng.tan, yliu, stephen, jingjing.wu,
	tiwei.bie, lei.a.yao, Jiayu Hu

In this patch, we introduce five APIs to support TCP/IPv4 GRO.
- gro_tcp_tbl_create: create a TCP reassembly table, which is used to
    merge packets.
- gro_tcp_tbl_destroy: free memory space of a TCP reassembly table.
- gro_tcp_tbl_timeout_flush: flush timeout packets from a TCP
    reassembly table.
- gro_tcp_tbl_item_num: return the number of packets in a TCP reassembly
    table.
- gro_tcp4_reassemble: reassemble an input TCP/IPv4 packet.

The TCP/IPv4 GRO API assumes all input packets have correct IPv4 and
TCP checksums, and it doesn't update IPv4 and TCP checksums for merged
packets. If input packets are IP fragmented, the TCP/IPv4 GRO API
assumes they are complete packets (i.e. with L4 headers).

In TCP GRO, we use a table structure, called the TCP reassembly table,
to reassemble packets. Both TCP/IPv4 and TCP/IPv6 GRO use the same
table structure. A TCP reassembly table includes a key array and an
item array, where the key array keeps the criteria to merge packets
and the item array keeps packet information.

One key in the key array points to an item group, which consists of
packets that have the same criteria value. If two packets can be
merged, they must be in the same item group. Each key in the key array
includes two parts:
- criteria: the criteria of merging packets. If two packets can be
    merged, they must have the same criteria value.
- start_index: the index of the first incoming packet of the item group.

Each element in the item array keeps the information of one packet. It
mainly includes two parts:
- pkt: packet address
- next_pkt_index: the index of the next packet in the same item group.
    All packets in the same item group are chained by next_pkt_index.
    With next_pkt_index, we can locate all packets in the same item
    group one by one.

Processing an incoming packet requires three steps (see the sketch
after this list):
a. check if the packet should be processed. Packets with the following
    properties won't be processed:
	- packets without data (e.g. SYN, SYN-ACK)
b. traverse the key array to find a key which has the same criteria
    value as the incoming packet. If found, go to step c. Otherwise,
    insert a new key and insert the packet into the item array.
c. locate the first packet in the item group via the start_index of the
    key. Then traverse all packets in the item group via next_pkt_index.
    If a packet is found that can be merged with the incoming one, merge
    them. Otherwise, insert the packet into this item group.
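
In pseudocode, the flow condenses to the sketch below (helper names
such as extract_key() and is_neighbor() are hypothetical stand-ins for
inline code in gro_tcp4_reassemble):

	/* pseudocode; helpers are hypothetical stand-ins */
	if (tcp_payload_len(pkt) == 0)			/* step a */
		return -1;
	key = extract_key(pkt);	/* MACs, IPs, ports, ack, flags */
	for (each valid entry tbl->keys[i]) {		/* step b */
		if (tbl->keys[i].key != key)
			continue;
		for (each item in the item group of keys[i]) {	/* step c */
			if (is_neighbor(item, pkt) &&
					same_tcp_options(item, pkt))
				return merge_two_tcp4_packets(item, pkt,
						max_packet_size);
		}
		return insert_into_item_group(tbl, i, pkt);
	}
	return insert_new_key_and_item(tbl, &key, pkt);	/* no key matched */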

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
---
 doc/guides/rel_notes/release_17_08.rst |   7 +
 lib/librte_gro/Makefile                |   1 +
 lib/librte_gro/rte_gro.c               | 125 +++++++++--
 lib/librte_gro/rte_gro.h               |   6 +-
 lib/librte_gro/rte_gro_tcp.c           | 400 +++++++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro_tcp.h           | 172 ++++++++++++++
 6 files changed, 697 insertions(+), 14 deletions(-)
 create mode 100644 lib/librte_gro/rte_gro_tcp.c
 create mode 100644 lib/librte_gro/rte_gro_tcp.h

diff --git a/doc/guides/rel_notes/release_17_08.rst b/doc/guides/rel_notes/release_17_08.rst
index 842f46f..f067247 100644
--- a/doc/guides/rel_notes/release_17_08.rst
+++ b/doc/guides/rel_notes/release_17_08.rst
@@ -75,6 +75,13 @@ New Features
 
   Added support for firmwares with multiple Ethernet ports per physical port.
 
+* **Add Generic Receive Offload API support.**
+
+  The Generic Receive Offload (GRO) API supports reassembling TCP/IPv4
+  packets. The GRO API assumes all input packets have correct
+  checksums, and it doesn't update checksums for merged packets. If
+  input packets are IP fragmented, the GRO API assumes they are
+  complete packets (i.e. with L4 headers).
 
 Resolved Issues
 ---------------
diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
index 7e0f128..e89344d 100644
--- a/lib/librte_gro/Makefile
+++ b/lib/librte_gro/Makefile
@@ -43,6 +43,7 @@ LIBABIVER := 1
 
 # source files
 SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
+SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro_tcp.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
index 7efed18..dd175fb 100644
--- a/lib/librte_gro/rte_gro.c
+++ b/lib/librte_gro/rte_gro.c
@@ -32,8 +32,11 @@
 
 #include <rte_malloc.h>
 #include <rte_mbuf.h>
+#include <rte_cycles.h>
+#include <rte_ethdev.h>
 
 #include "rte_gro.h"
+#include "rte_gro_tcp.h"
 
 typedef void *(*gro_tbl_create_fn)(uint16_t socket_id,
 		uint16_t max_flow_num,
@@ -41,9 +44,12 @@ typedef void *(*gro_tbl_create_fn)(uint16_t socket_id,
 typedef void (*gro_tbl_destroy_fn)(void *tbl);
 typedef uint32_t (*gro_tbl_item_num_fn)(void *tbl);
 
-static gro_tbl_create_fn tbl_create_functions[RTE_GRO_TYPE_MAX_NUM];
-static gro_tbl_destroy_fn tbl_destroy_functions[RTE_GRO_TYPE_MAX_NUM];
-static gro_tbl_item_num_fn tbl_item_num_functions[RTE_GRO_TYPE_MAX_NUM];
+static gro_tbl_create_fn tbl_create_functions[RTE_GRO_TYPE_MAX_NUM] = {
+	gro_tcp_tbl_create, NULL};
+static gro_tbl_destroy_fn tbl_destroy_functions[RTE_GRO_TYPE_MAX_NUM] = {
+	gro_tcp_tbl_destroy, NULL};
+static gro_tbl_item_num_fn tbl_item_num_functions[
+	RTE_GRO_TYPE_MAX_NUM] = {gro_tcp_tbl_item_num, NULL};
 
 /**
  * GRO table, which is used to merge packets. It keeps many reassembly
@@ -130,27 +136,120 @@ void rte_gro_tbl_destroy(void *tbl)
 }
 
 uint16_t
-rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
+rte_gro_reassemble_burst(struct rte_mbuf **pkts,
 		uint16_t nb_pkts,
-		const struct rte_gro_param *param __rte_unused)
+		const struct rte_gro_param *param)
 {
-	return nb_pkts;
+	uint16_t i;
+	uint16_t nb_after_gro = nb_pkts;
+	uint32_t item_num = RTE_MIN(nb_pkts, param->max_flow_num *
+			param->max_item_per_flow);
+
+	/* allocate a reassembly table for TCP/IPv4 GRO */
+	uint32_t tcp_item_num = RTE_MIN(item_num,
+			RTE_GRO_MAX_BURST_ITEM_NUM);
+	struct gro_tcp_tbl tcp_tbl;
+	struct gro_tcp_key tcp_keys[tcp_item_num];
+	struct gro_tcp_item tcp_items[tcp_item_num];
+
+	struct rte_mbuf *unprocess_pkts[nb_pkts];
+	uint16_t unprocess_num = 0;
+	int32_t ret;
+	uint64_t current_time;
+
+	if ((param->desired_gro_types & RTE_GRO_TCP_IPV4) == 0)
+		return nb_pkts;
+
+	memset(tcp_keys, 0, sizeof(struct gro_tcp_key) *
+			tcp_item_num);
+	memset(tcp_items, 0, sizeof(struct gro_tcp_item) *
+			tcp_item_num);
+	tcp_tbl.keys = tcp_keys;
+	tcp_tbl.items = tcp_items;
+	tcp_tbl.key_num = 0;
+	tcp_tbl.item_num = 0;
+	tcp_tbl.max_key_num = tcp_item_num;
+	tcp_tbl.max_item_num = tcp_item_num;
+
+	current_time = rte_rdtsc();
+
+	for (i = 0; i < nb_pkts; i++) {
+		if (RTE_ETH_IS_IPV4_HDR(pkts[i]->packet_type) &&
+				(pkts[i]->packet_type & RTE_PTYPE_L4_TCP)) {
+			ret = gro_tcp4_reassemble(pkts[i],
+					&tcp_tbl,
+					param->max_packet_size,
+					current_time);
+			if (ret > 0)
+				/* merged successfully */
+				nb_after_gro--;
+			else if (ret < 0)
+				unprocess_pkts[unprocess_num++] =
+					pkts[i];
+		} else
+			unprocess_pkts[unprocess_num++] =
+				pkts[i];
+	}
+
+	/* re-arrange GROed packets */
+	if (nb_after_gro < nb_pkts) {
+		i = gro_tcp_tbl_timeout_flush(&tcp_tbl, 0,
+				pkts, nb_pkts);
+		if (unprocess_num > 0)
+			memcpy(&pkts[i], unprocess_pkts,
+					sizeof(struct rte_mbuf *) *
+					unprocess_num);
+	}
+	return nb_after_gro;
 }
 
 uint16_t
-rte_gro_reassemble(struct rte_mbuf **pkts __rte_unused,
+rte_gro_reassemble(struct rte_mbuf **pkts,
 		uint16_t nb_pkts,
-		void *tbl __rte_unused)
+		void *tbl)
 {
-	return nb_pkts;
+	uint16_t i, unprocess_num = 0;
+	struct rte_mbuf *unprocess_pkts[nb_pkts];
+	struct rte_gro_tbl *gro_tbl = (struct rte_gro_tbl *)tbl;
+	uint64_t current_time;
+
+	if ((gro_tbl->desired_gro_types & RTE_GRO_TCP_IPV4) == 0)
+		return nb_pkts;
+
+	current_time = rte_rdtsc();
+	for (i = 0; i < nb_pkts; i++) {
+		if (RTE_ETH_IS_IPV4_HDR(pkts[i]->packet_type) &&
+				(pkts[i]->packet_type & RTE_PTYPE_L4_TCP)) {
+			if (gro_tcp4_reassemble(pkts[i],
+						gro_tbl->tbls[RTE_GRO_TCP_IPV4_INDEX],
+						gro_tbl->max_packet_size,
+						current_time) < 0)
+				unprocess_pkts[unprocess_num++] = pkts[i];
+		} else
+			unprocess_pkts[unprocess_num++] = pkts[i];
+	}
+	if (unprocess_num > 0)
+		memcpy(pkts, unprocess_pkts,
+				sizeof(struct rte_mbuf *) * unprocess_num);
+
+	return unprocess_num;
 }
 
 uint16_t
-rte_gro_timeout_flush(void *tbl __rte_unused,
-		uint64_t desired_gro_types __rte_unused,
-		struct rte_mbuf **out __rte_unused,
-		uint16_t max_nb_out __rte_unused)
+rte_gro_timeout_flush(void *tbl,
+		uint64_t desired_gro_types,
+		struct rte_mbuf **out,
+		uint16_t max_nb_out)
 {
+	struct rte_gro_tbl *gro_tbl = (struct rte_gro_tbl *)tbl;
+
+	desired_gro_types = desired_gro_types &
+		gro_tbl->desired_gro_types;
+	if (desired_gro_types & RTE_GRO_TCP_IPV4)
+		return gro_tcp_tbl_timeout_flush(
+				gro_tbl->tbls[RTE_GRO_TCP_IPV4_INDEX],
+				gro_tbl->max_timeout_cycles,
+				out, max_nb_out);
 	return 0;
 }
 
diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
index e44a510..8fa5266 100644
--- a/lib/librte_gro/rte_gro.h
+++ b/lib/librte_gro/rte_gro.h
@@ -45,7 +45,11 @@ extern "C" {
 
 /* max number of supported GRO types */
 #define RTE_GRO_TYPE_MAX_NUM 64
-#define RTE_GRO_TYPE_SUPPORT_NUM 0	/**< current supported GRO num */
+#define RTE_GRO_TYPE_SUPPORT_NUM 1	/**< current supported GRO num */
+
+/* TCP/IPv4 GRO flag */
+#define RTE_GRO_TCP_IPV4_INDEX 0
+#define RTE_GRO_TCP_IPV4 (1ULL << RTE_GRO_TCP_IPV4_INDEX)
 
 
 struct rte_gro_param {
diff --git a/lib/librte_gro/rte_gro_tcp.c b/lib/librte_gro/rte_gro_tcp.c
new file mode 100644
index 0000000..3719ca3
--- /dev/null
+++ b/lib/librte_gro/rte_gro_tcp.c
@@ -0,0 +1,400 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+#include <rte_cycles.h>
+
+#include <rte_ethdev.h>
+#include <rte_ip.h>
+#include <rte_tcp.h>
+
+#include "rte_gro_tcp.h"
+
+void *gro_tcp_tbl_create(uint16_t socket_id,
+		uint16_t max_flow_num,
+		uint16_t max_item_per_flow)
+{
+	size_t size;
+	uint32_t entries_num;
+	struct gro_tcp_tbl *tbl;
+
+	entries_num = max_flow_num * max_item_per_flow;
+	entries_num = entries_num > GRO_TCP_TBL_MAX_ITEM_NUM ?
+		GRO_TCP_TBL_MAX_ITEM_NUM : entries_num;
+
+	if (entries_num == 0)
+		return NULL;
+
+	tbl = (struct gro_tcp_tbl *)rte_zmalloc_socket(
+			__func__,
+			sizeof(struct gro_tcp_tbl),
+			RTE_CACHE_LINE_SIZE,
+			socket_id);
+	if (tbl == NULL)
+		return NULL;
+
+	size = sizeof(struct gro_tcp_item) * entries_num;
+	tbl->items = (struct gro_tcp_item *)rte_zmalloc_socket(
+			__func__,
+			size,
+			RTE_CACHE_LINE_SIZE,
+			socket_id);
+	if (tbl->items == NULL) {
+		rte_free(tbl);
+		return NULL;
+	}
+	tbl->max_item_num = entries_num;
+
+	size = sizeof(struct gro_tcp_key) * entries_num;
+	tbl->keys = (struct gro_tcp_key *)rte_zmalloc_socket(
+			__func__,
+			size, RTE_CACHE_LINE_SIZE,
+			socket_id);
+	if (tbl->keys == NULL) {
+		rte_free(tbl->items);
+		rte_free(tbl);
+		return NULL;
+	}
+	tbl->max_key_num = entries_num;
+	return tbl;
+}
+
+void gro_tcp_tbl_destroy(void *tbl)
+{
+	struct gro_tcp_tbl *tcp_tbl = (struct gro_tcp_tbl *)tbl;
+
+	if (tcp_tbl) {
+		rte_free(tcp_tbl->items);
+		rte_free(tcp_tbl->keys);
+	}
+	rte_free(tcp_tbl);
+}
+
+static struct rte_mbuf *get_mbuf_lastseg(struct rte_mbuf *pkt)
+{
+	struct rte_mbuf *lastseg = pkt;
+
+	while (lastseg->next)
+		lastseg = lastseg->next;
+
+	return lastseg;
+}
+
+/**
+ * merge two TCP/IPv4 packets without updating checksums.
+ */
+static int
+merge_two_tcp4_packets(struct gro_tcp_item *item_src,
+		struct rte_mbuf *pkt,
+		uint32_t max_packet_size)
+{
+	struct ipv4_hdr *ipv4_hdr1, *ipv4_hdr2;
+	uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1;
+	struct rte_mbuf *pkt_src = item_src->pkt;
+
+	/* parse the given packet */
+	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
+				char *) + pkt->l2_len);
+	ipv4_ihl1 = pkt->l3_len;
+	tcp_hl1 = pkt->l4_len;
+	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) -
+		ipv4_ihl1 - tcp_hl1;
+
+	/* parse the original packet */
+	ipv4_hdr2 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt_src,
+				char *) + pkt_src->l2_len);
+
+	if (pkt_src->pkt_len + tcp_dl1 > max_packet_size)
+		return -1;
+
+	/* remove the header of the incoming packet */
+	rte_pktmbuf_adj(pkt, pkt->l2_len + ipv4_ihl1 + tcp_hl1);
+
+	/* chain the two packet together and update lastseg */
+	item_src->lastseg->next = pkt;
+	item_src->lastseg = get_mbuf_lastseg(pkt);
+
+	/* update IP header */
+	ipv4_hdr2->total_length = rte_cpu_to_be_16(
+			rte_be_to_cpu_16(
+				ipv4_hdr2->total_length)
+			+ tcp_dl1);
+
+	/* update mbuf metadata for the merged packet */
+	pkt_src->nb_segs += pkt->nb_segs;
+	pkt_src->pkt_len += pkt->pkt_len;
+	return 1;
+}
+
+static int
+check_seq_option(struct rte_mbuf *pkt,
+		struct tcp_hdr *tcp_hdr,
+		uint16_t tcp_hl)
+{
+	struct ipv4_hdr *ipv4_hdr1;
+	struct tcp_hdr *tcp_hdr1;
+	uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1;
+	uint32_t sent_seq1, sent_seq;
+	uint16_t len;
+	int ret = -1;
+
+	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
+				char *) + pkt->l2_len);
+	ipv4_ihl1 = pkt->l3_len;
+	tcp_hdr1 = (struct tcp_hdr *)((char *)ipv4_hdr1 + ipv4_ihl1);
+	tcp_hl1 = pkt->l4_len;
+	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) - ipv4_ihl1
+		- tcp_hl1;
+	sent_seq1 = rte_be_to_cpu_32(tcp_hdr1->sent_seq) + tcp_dl1;
+	sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq);
+
+	/* check if the two packets are neighbor */
+	if (sent_seq == sent_seq1) {
+		ret = 1;
+		len = RTE_MAX(tcp_hl, tcp_hl1) - sizeof(struct tcp_hdr);
+		/* check if TCP option field equals */
+		if ((tcp_hl1 != tcp_hl) || ((len > 0) &&
+					(memcmp(tcp_hdr1 + 1,
+							tcp_hdr + 1,
+							len) != 0)))
+			ret = -1;
+	}
+	return ret;
+}
+
+static uint32_t
+find_an_empty_item(struct gro_tcp_tbl *tbl)
+{
+	uint32_t i;
+
+	for (i = 0; i < tbl->max_item_num; i++)
+		if (tbl->items[i].pkt == NULL)
+			return i;
+	return INVALID_ARRAY_INDEX;
+}
+
+static uint32_t
+find_an_empty_key(struct gro_tcp_tbl *tbl)
+{
+	uint32_t i;
+
+	for (i = 0; i < tbl->max_key_num; i++)
+		if (tbl->keys[i].is_valid == 0)
+			return i;
+	return INVALID_ARRAY_INDEX;
+}
+
+int32_t
+gro_tcp4_reassemble(struct rte_mbuf *pkt,
+		struct gro_tcp_tbl *tbl,
+		uint32_t max_packet_size,
+		uint64_t start_time)
+{
+	struct ether_hdr *eth_hdr;
+	struct ipv4_hdr *ipv4_hdr;
+	struct tcp_hdr *tcp_hdr;
+	uint16_t ipv4_ihl, tcp_hl, tcp_dl;
+
+	struct tcp_key key;
+	uint32_t cur_idx, prev_idx, item_idx;
+	uint32_t i, key_idx;
+
+	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
+	ipv4_hdr = (struct ipv4_hdr *)((char *)eth_hdr + pkt->l2_len);
+	ipv4_ihl = pkt->l3_len;
+
+	/* check if the packet should be processed */
+	if (ipv4_ihl < sizeof(struct ipv4_hdr))
+		return -1;
+	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl);
+	tcp_hl = pkt->l4_len;
+	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - ipv4_ihl
+		- tcp_hl;
+	if (tcp_dl == 0)
+		return -1;
+
+	/* find a key and traverse all packets in its item group */
+	key.eth_saddr = eth_hdr->s_addr;
+	key.eth_daddr = eth_hdr->d_addr;
+	key.ip_src_addr[0] = rte_be_to_cpu_32(ipv4_hdr->src_addr);
+	key.ip_dst_addr[0] = rte_be_to_cpu_32(ipv4_hdr->dst_addr);
+	key.src_port = rte_be_to_cpu_16(tcp_hdr->src_port);
+	key.dst_port = rte_be_to_cpu_16(tcp_hdr->dst_port);
+	key.recv_ack = rte_be_to_cpu_32(tcp_hdr->recv_ack);
+	key.tcp_flags = tcp_hdr->tcp_flags;
+
+	for (i = 0; i < tbl->max_key_num; i++) {
+		/* search for a key */
+		if ((tbl->keys[i].is_valid == 0) ||
+				(memcmp(&(tbl->keys[i].key), &key,
+						sizeof(struct tcp_key)) != 0))
+			continue;
+
+		cur_idx = tbl->keys[i].start_index;
+		prev_idx = cur_idx;
+		while (cur_idx != INVALID_ARRAY_INDEX) {
+			if (check_seq_option(tbl->items[cur_idx].pkt,
+						tcp_hdr,
+						tcp_hl) > 0) {
+				if (merge_two_tcp4_packets(
+							&(tbl->items[cur_idx]),
+							pkt,
+							max_packet_size) > 0)
+					return 1;
+				/**
+				 * failed to merge the two packets as the
+				 * result exceeds the max packet length.
+				 * Insert it into the item group.
+				 */
+				item_idx = find_an_empty_item(tbl);
+				if (item_idx == INVALID_ARRAY_INDEX)
+					return -1;
+				tbl->items[prev_idx].next_pkt_idx = item_idx;
+				tbl->items[item_idx].pkt = pkt;
+				tbl->items[item_idx].lastseg =
+					get_mbuf_lastseg(pkt);
+				tbl->items[item_idx].next_pkt_idx =
+					INVALID_ARRAY_INDEX;
+				tbl->items[item_idx].start_time = start_time;
+				tbl->item_num++;
+				return 0;
+			}
+			prev_idx = cur_idx;
+			cur_idx = tbl->items[cur_idx].next_pkt_idx;
+		}
+		/**
+		 * found a corresponding item group, but failed to find a
+		 * packet to merge with. Insert it into this item group.
+		 */
+		item_idx = find_an_empty_item(tbl);
+		if (item_idx == INVALID_ARRAY_INDEX)
+			return -1;
+		tbl->items[prev_idx].next_pkt_idx = item_idx;
+		tbl->items[item_idx].pkt = pkt;
+		tbl->items[item_idx].lastseg =
+			get_mbuf_lastseg(pkt);
+		tbl->items[item_idx].next_pkt_idx = INVALID_ARRAY_INDEX;
+		tbl->items[item_idx].start_time = start_time;
+		tbl->item_num++;
+		return 0;
+	}
+
+	/**
+	 * merging failed as the given packet has
+	 * a new key, so insert a new key.
+	 */
+	item_idx = find_an_empty_item(tbl);
+	key_idx = find_an_empty_key(tbl);
+	/**
+	 * if current key or item number is greater than the max
+	 * value, don't insert the packet into the table and return
+	 * immediately.
+	 */
+	if (item_idx == INVALID_ARRAY_INDEX ||
+			key_idx == INVALID_ARRAY_INDEX)
+		return -1;
+	tbl->items[item_idx].pkt = pkt;
+	tbl->items[item_idx].lastseg = get_mbuf_lastseg(pkt);
+	tbl->items[item_idx].next_pkt_idx = INVALID_ARRAY_INDEX;
+	tbl->items[item_idx].start_time = start_time;
+	tbl->item_num++;
+
+	memcpy(&(tbl->keys[key_idx].key),
+			&key, sizeof(struct tcp_key));
+	tbl->keys[key_idx].start_index = item_idx;
+	tbl->keys[key_idx].is_valid = 1;
+	tbl->key_num++;
+
+	return 0;
+}
+
+uint16_t
+gro_tcp_tbl_timeout_flush(struct gro_tcp_tbl *tbl,
+		uint64_t timeout_cycles,
+		struct rte_mbuf **out,
+		uint16_t nb_out)
+{
+	uint16_t k = 0;
+	uint32_t i, j;
+	uint64_t current_time;
+
+	current_time = rte_rdtsc();
+	for (i = 0; i < tbl->max_key_num; i++) {
+		/* all keys have been checked, return immediately */
+		if (tbl->key_num == 0)
+			return k;
+
+		if (tbl->keys[i].is_valid == 0)
+			continue;
+
+		j = tbl->keys[i].start_index;
+		while (j != INVALID_ARRAY_INDEX) {
+			if (current_time - tbl->items[j].start_time >=
+					timeout_cycles) {
+				out[k++] = tbl->items[j].pkt;
+				tbl->items[j].pkt = NULL;
+				tbl->item_num--;
+				j = tbl->items[j].next_pkt_idx;
+
+				/**
+				 * delete the key as all of
+				 * its packets are flushed.
+				 */
+				if (j == INVALID_ARRAY_INDEX) {
+					tbl->keys[i].is_valid = 0;
+					tbl->key_num--;
+				} else
+					/* update start_index of the key */
+					tbl->keys[i].start_index = j;
+
+				if (k == nb_out)
+					return k;
+			} else
+				/**
+				 * the remaining packets of this key won't
+				 * time out, so go on to check other keys.
+				 */
+				break;
+		}
+	}
+	return k;
+}
+
+uint32_t gro_tcp_tbl_item_num(void *tbl)
+{
+	struct gro_tcp_tbl *gro_tbl = (struct gro_tcp_tbl *)tbl;
+
+	if (gro_tbl)
+		return gro_tbl->item_num;
+	return 0;
+}
diff --git a/lib/librte_gro/rte_gro_tcp.h b/lib/librte_gro/rte_gro_tcp.h
new file mode 100644
index 0000000..2000318
--- /dev/null
+++ b/lib/librte_gro/rte_gro_tcp.h
@@ -0,0 +1,172 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_GRO_TCP_H_
+#define _RTE_GRO_TCP_H_
+
+#define INVALID_ARRAY_INDEX 0xffffffffUL
+#define GRO_TCP_TBL_MAX_ITEM_NUM (UINT32_MAX - 1)
+
+/* criteria of merging packets */
+struct tcp_key {
+	struct ether_addr eth_saddr;
+	struct ether_addr eth_daddr;
+	uint32_t ip_src_addr[4];	/**< IPv4 uses the first 4B */
+	uint32_t ip_dst_addr[4];
+
+	uint32_t recv_ack;	/**< acknowledgment sequence number. */
+	uint16_t src_port;
+	uint16_t dst_port;
+	uint8_t tcp_flags;	/**< TCP flags. */
+};
+
+struct gro_tcp_key {
+	struct tcp_key key;
+	uint32_t start_index;	/**< the first packet index of the flow */
+	uint8_t is_valid;
+};
+
+struct gro_tcp_item {
+	struct rte_mbuf *pkt;	/**< packet address. */
+	struct rte_mbuf *lastseg;	/**< last segment of the packet */
+	/* the time when the packet is added into the table */
+	uint64_t start_time;
+	uint32_t next_pkt_idx;	/**< next packet index. */
+};
+
+/**
+ * TCP reassembly table. Both TCP/IPv4 and TCP/IPv6 use the same table
+ * structure.
+ */
+struct gro_tcp_tbl {
+	struct gro_tcp_item *items;	/**< item array */
+	struct gro_tcp_key *keys;	/**< key array */
+	uint32_t item_num;	/**< current item number */
+	uint32_t key_num;	/**< current key num */
+	uint32_t max_item_num;	/**< item array size */
+	uint32_t max_key_num;	/**< key array size */
+};
+
+/**
+ * This function creates a TCP reassembly table.
+ *
+ * @param socket_id
+ *  socket index where the Ethernet port connects to.
+ * @param max_flow_num
+ *  the maximum number of flows in the TCP GRO table
+ * @param max_item_per_flow
+ *  the maximum packet number per flow.
+ * @return
+ *  on success, return a pointer to the created TCP GRO table.
+ *  Otherwise, return NULL.
+ */
+void *gro_tcp_tbl_create(uint16_t socket_id,
+		uint16_t max_flow_num,
+		uint16_t max_item_per_flow);
+
+/**
+ * This function destroys a TCP reassembly table.
+ * @param tbl
+ *  a pointer to the TCP reassembly table.
+ */
+void gro_tcp_tbl_destroy(void *tbl);
+
+/**
+ * This function searches the TCP reassembly table for a packet to
+ * merge with the input one. Merging two packets means chaining them
+ * together and updating the packet headers. If the packet has no data
+ * (e.g. a SYN or SYN-ACK packet), this function returns immediately.
+ * Otherwise, the packet is either merged, or inserted into the table.
+ * Besides, if there is no available space to insert the packet, this
+ * function returns immediately too.
+ *
+ * This function assumes the input packet has correct IPv4 and TCP
+ * checksums, and if two packets are merged, it won't re-calculate
+ * IPv4 and TCP checksums. Besides, if the input packet is IP
+ * fragmented, it assumes the packet is complete (with TCP header).
+ *
+ * @param pkt
+ *  packet to reassemble.
+ * @param tbl
+ *  a pointer to a TCP reassembly table.
+ * @param max_packet_size
+ *  max packet length after merging
+ * @param start_time
+ *  the time when the packet is inserted into the table
+ * @return
+ *  if the packet doesn't have data, or there is no available space
+ *  in the table to insert a new item or a new key, return a negative
+ *  value. If the packet is merged successfully, return a positive
+ *  value. If the packet is inserted into the table, return 0.
+ */
+int32_t
+gro_tcp4_reassemble(struct rte_mbuf *pkt,
+		struct gro_tcp_tbl *tbl,
+		uint32_t max_packet_size,
+		uint64_t start_time);
+
+/**
+ * This function flushes timeout packets in a TCP reassembly table to
+ * applications, without updating checksums for merged packets. The
+ * max number of flushed timeout packets is the element number of the
+ * array which is used to keep the flushed packets.
+ *
+ * @param tbl
+ *  a pointer that points to a TCP GRO table.
+ * @param timeout_cycles
+ *  the maximum time that packets can stay in the table.
+ * @param out
+ *  pointer array which is used to keep flushed packets.
+ * @param nb_out
+ *  the element number of out. It's also the max number of timeout
+ *  packets that can be flushed finally.
+ * @return
+ *  the number of packets that are returned.
+ */
+uint16_t
+gro_tcp_tbl_timeout_flush(struct gro_tcp_tbl *tbl,
+		uint64_t timeout_cycles,
+		struct rte_mbuf **out,
+		uint16_t nb_out);
+
+/**
+ * This function returns the number of packets in the TCP
+ * reassembly table.
+ *
+ * @param tbl
+ *  a pointer to a TCP reassembly table.
+ * @return
+ *  the number of packets in the table
+ */
+uint32_t
+gro_tcp_tbl_item_num(void *tbl);
+#endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 141+ messages in thread

* [PATCH v8 3/3] app/testpmd: enable TCP/IPv4 GRO
  2017-06-29 10:58             ` [PATCH v8 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
  2017-06-29 10:58               ` [PATCH v8 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
  2017-06-29 10:58               ` [PATCH v8 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
@ 2017-06-29 10:59               ` Jiayu Hu
  2017-06-30  2:26                 ` Wu, Jingjing
  2017-06-30  6:53               ` [PATCH v9 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
  3 siblings, 1 reply; 141+ messages in thread
From: Jiayu Hu @ 2017-06-29 10:59 UTC (permalink / raw)
  To: dev
  Cc: konstantin.ananyev, jianfeng.tan, yliu, stephen, jingjing.wu,
	tiwei.bie, lei.a.yao, Jiayu Hu

This patch enables the TCP/IPv4 GRO library in the csum forwarding
engine. By default, GRO is turned off. Users can use the command "gro
(on|off) (port_id)" to enable or disable GRO for a given port. If GRO
is enabled on a port, all TCP/IPv4 packets received from that port
undergo GRO. Besides, users can set the max flow number and the max
packet number per flow with the command "gro set (max_flow_num)
(max_item_num_per_flow) (port_id)".
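
For example, a typical session could look as follows (the port id and
the limit values are illustrative):

	testpmd> gro set 16 64 0
	testpmd> gro on 0
	testpmd> start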

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
---
 app/test-pmd/cmdline.c                      | 125 ++++++++++++++++++++++++++++
 app/test-pmd/config.c                       |  37 ++++++++
 app/test-pmd/csumonly.c                     |   5 ++
 app/test-pmd/testpmd.c                      |   3 +
 app/test-pmd/testpmd.h                      |  11 +++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  34 ++++++++
 6 files changed, 215 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index ff8ffd2..cb359e1 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -76,6 +76,7 @@
 #include <rte_devargs.h>
 #include <rte_eth_ctrl.h>
 #include <rte_flow.h>
+#include <rte_gro.h>
 
 #include <cmdline_rdline.h>
 #include <cmdline_parse.h>
@@ -419,6 +420,14 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"tso show (portid)"
 			"    Display the status of TCP Segmentation Offload.\n\n"
 
+			"gro (on|off) (port_id)"
+			"    Enable or disable Generic Receive Offload in"
+			" csum forwarding engine.\n\n"
+
+			"gro set (max_flow_num) (max_item_num_per_flow) (port_id)\n"
+			"    Set max flow number and max packet number per-flow"
+			" for GRO.\n\n"
+
 			"set fwd (%s)\n"
 			"    Set packet forwarding mode.\n\n"
 
@@ -3827,6 +3836,120 @@ cmdline_parse_inst_t cmd_tunnel_tso_show = {
 	},
 };
 
+/* *** SET GRO FOR A PORT *** */
+struct cmd_gro_result {
+	cmdline_fixed_string_t cmd_keyword;
+	cmdline_fixed_string_t mode;
+	uint8_t port_id;
+};
+
+static void
+cmd_enable_gro_parsed(void *parsed_result,
+		__attribute__((unused)) struct cmdline *cl,
+		__attribute__((unused)) void *data)
+{
+	struct cmd_gro_result *res;
+
+	res = parsed_result;
+	setup_gro(res->mode, res->port_id);
+}
+
+cmdline_parse_token_string_t cmd_gro_keyword =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_result,
+			cmd_keyword, "gro");
+cmdline_parse_token_string_t cmd_gro_mode =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_result,
+			mode, "on#off");
+cmdline_parse_token_num_t cmd_gro_pid =
+	TOKEN_NUM_INITIALIZER(struct cmd_gro_result,
+			port_id, UINT8);
+
+cmdline_parse_inst_t cmd_enable_gro = {
+	.f = cmd_enable_gro_parsed,
+	.data = NULL,
+	.help_str = "gro (on|off) (port_id)",
+	.tokens = {
+		(void *)&cmd_gro_keyword,
+		(void *)&cmd_gro_mode,
+		(void *)&cmd_gro_pid,
+		NULL,
+	},
+};
+
+/* *** SET MAX FLOW NUMBER AND ITEM NUM PER FLOW FOR GRO *** */
+struct cmd_gro_set_result {
+	cmdline_fixed_string_t gro;
+	cmdline_fixed_string_t mode;
+	uint16_t flow_num;
+	uint16_t item_num_per_flow;
+	uint8_t port_id;
+};
+
+static void
+cmd_gro_set_parsed(void *parsed_result,
+		       __attribute__((unused)) struct cmdline *cl,
+		       __attribute__((unused)) void *data)
+{
+	struct cmd_gro_set_result *res = parsed_result;
+
+	if (port_id_is_invalid(res->port_id, ENABLED_WARN))
+		return;
+	if (test_done == 0) {
+		printf("Before setting GRO flow_num and item_num_per_flow,"
+				" please stop forwarding first\n");
+		return;
+	}
+
+	if (!strcmp(res->mode, "set")) {
+		if (res->flow_num == 0)
+			printf("Invalid flow number. Revert to default value:"
+					" %u.\n", GRO_DEFAULT_FLOW_NUM);
+		else
+			gro_ports[res->port_id].param.max_flow_num =
+				res->flow_num;
+
+		if (res->item_num_per_flow == 0)
+			printf("Invalid item number per-flow. Revert"
+					" to default value:%u.\n",
+					GRO_DEFAULT_ITEM_NUM_PER_FLOW);
+		else
+			gro_ports[res->port_id].param.max_item_per_flow =
+				res->item_num_per_flow;
+	}
+}
+
+cmdline_parse_token_string_t cmd_gro_set_gro =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_set_result,
+				gro, "gro");
+cmdline_parse_token_string_t cmd_gro_set_mode =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_set_result,
+				mode, "set");
+cmdline_parse_token_num_t cmd_gro_set_flow_num =
+	TOKEN_NUM_INITIALIZER(struct cmd_gro_set_result,
+				flow_num, UINT16);
+cmdline_parse_token_num_t cmd_gro_set_item_num_per_flow =
+	TOKEN_NUM_INITIALIZER(struct cmd_gro_set_result,
+				item_num_per_flow, UINT16);
+cmdline_parse_token_num_t cmd_gro_set_portid =
+	TOKEN_NUM_INITIALIZER(struct cmd_gro_set_result,
+				port_id, UINT8);
+
+cmdline_parse_inst_t cmd_gro_set = {
+	.f = cmd_gro_set_parsed,
+	.data = NULL,
+	.help_str = "gro set <max_flow_num> <max_item_num_per_flow> "
+		"<port_id>: set max flow number and max packet number per-flow "
+		"for GRO",
+	.tokens = {
+		(void *)&cmd_gro_set_gro,
+		(void *)&cmd_gro_set_mode,
+		(void *)&cmd_gro_set_flow_num,
+		(void *)&cmd_gro_set_item_num_per_flow,
+		(void *)&cmd_gro_set_portid,
+		NULL,
+	},
+};
+
 /* *** ENABLE/DISABLE FLUSH ON RX STREAMS *** */
 struct cmd_set_flush_rx {
 	cmdline_fixed_string_t set;
@@ -13732,6 +13855,8 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_tso_show,
 	(cmdline_parse_inst_t *)&cmd_tunnel_tso_set,
 	(cmdline_parse_inst_t *)&cmd_tunnel_tso_show,
+	(cmdline_parse_inst_t *)&cmd_enable_gro,
+	(cmdline_parse_inst_t *)&cmd_gro_set,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_rx,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_tx,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index b0b340e..a5285b0 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -71,6 +71,7 @@
 #ifdef RTE_LIBRTE_BNXT_PMD
 #include <rte_pmd_bnxt.h>
 #endif
+#include <rte_gro.h>
 
 #include "testpmd.h"
 
@@ -2414,6 +2415,42 @@ set_tx_pkt_segments(unsigned *seg_lengths, unsigned nb_segs)
 	tx_pkt_nb_segs = (uint8_t) nb_segs;
 }
 
+void
+setup_gro(const char *mode, uint8_t port_id)
+{
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		printf("invalid port id %u\n", port_id);
+		return;
+	}
+	if (test_done == 0) {
+		printf("Before enabling/disabling GRO,"
+				" please stop forwarding first\n");
+		return;
+	}
+	if (strcmp(mode, "on") == 0) {
+		if (gro_ports[port_id].enable) {
+			printf("GRO is already enabled on port %u\n", port_id);
+			return;
+		}
+		gro_ports[port_id].enable = 1;
+		gro_ports[port_id].param.desired_gro_types = RTE_GRO_TCP_IPV4;
+		gro_ports[port_id].param.max_packet_size = UINT16_MAX;
+
+		if (gro_ports[port_id].param.max_flow_num == 0)
+			gro_ports[port_id].param.max_flow_num =
+				GRO_DEFAULT_FLOW_NUM;
+		if (gro_ports[port_id].param.max_item_per_flow == 0)
+			gro_ports[port_id].param.max_item_per_flow =
+				GRO_DEFAULT_ITEM_NUM_PER_FLOW;
+	} else {
+		if (gro_ports[port_id].enable == 0) {
+			printf("GRO is already disabled on port %u\n", port_id);
+			return;
+		}
+		gro_ports[port_id].enable = 0;
+	}
+}
+
 char*
 list_pkt_forwarding_modes(void)
 {
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 66fc9a0..178ad75 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -71,6 +71,7 @@
 #include <rte_prefetch.h>
 #include <rte_string_fns.h>
 #include <rte_flow.h>
+#include <rte_gro.h>
 #include "testpmd.h"
 
 #define IP_DEFTTL  64   /* from RFC 1340. */
@@ -658,6 +659,10 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 				 nb_pkt_per_burst);
 	if (unlikely(nb_rx == 0))
 		return;
+	if (unlikely(gro_ports[fs->rx_port].enable))
+		nb_rx = rte_gro_reassemble_burst(pkts_burst,
+				nb_rx,
+				&(gro_ports[fs->rx_port].param));
 
 #ifdef RTE_TEST_PMD_RECORD_BURST_STATS
 	fs->rx_burst_stats.pkt_burst_spread[nb_rx]++;
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index b29328a..ed27c7a 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -90,6 +90,7 @@
 #ifdef RTE_LIBRTE_LATENCY_STATS
 #include <rte_latencystats.h>
 #endif
+#include <rte_gro.h>
 
 #include "testpmd.h"
 
@@ -378,6 +379,8 @@ lcoreid_t bitrate_lcore_id;
 uint8_t bitrate_enabled;
 #endif
 
+struct gro_status gro_ports[RTE_MAX_ETHPORTS];
+
 /* Forward function declarations */
 static void map_port_queue_stats_mapping_registers(uint8_t pi, struct rte_port *port);
 static void check_all_ports_link_status(uint32_t port_mask);
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 364502d..377d933 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -34,6 +34,8 @@
 #ifndef _TESTPMD_H_
 #define _TESTPMD_H_
 
+#include <rte_gro.h>
+
 #define RTE_PORT_ALL            (~(portid_t)0x0)
 
 #define RTE_TEST_RX_DESC_MAX    2048
@@ -428,6 +430,14 @@ extern struct ether_addr peer_eth_addrs[RTE_MAX_ETHPORTS];
 extern uint32_t burst_tx_delay_time; /**< Burst tx delay time(us) for mac-retry. */
 extern uint32_t burst_tx_retry_num;  /**< Burst tx retry number for mac-retry. */
 
+#define GRO_DEFAULT_FLOW_NUM 4
+#define GRO_DEFAULT_ITEM_NUM_PER_FLOW DEF_PKT_BURST
+struct gro_status {
+	struct rte_gro_param param;
+	uint8_t enable;
+};
+extern struct gro_status gro_ports[RTE_MAX_ETHPORTS];
+
 static inline unsigned int
 lcore_num(void)
 {
@@ -626,6 +636,7 @@ void get_2tuple_filter(uint8_t port_id, uint16_t index);
 void get_5tuple_filter(uint8_t port_id, uint16_t index);
 int rx_queue_id_is_invalid(queueid_t rxq_id);
 int tx_queue_id_is_invalid(queueid_t txq_id);
+void setup_gro(const char *mode, uint8_t port_id);
 
 /* Functions to manage the set of filtered Multicast MAC addresses */
 void mcast_addr_add(uint8_t port_id, struct ether_addr *mc_addr);
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 2b9a1ea..528c833 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -884,6 +884,40 @@ Display the status of TCP Segmentation Offload::
 
    testpmd> tso show (port_id)
 
+gro
+~~~~~~~~
+
+Enable or disable GRO in ``csum`` forwarding engine::
+
+   testpmd> gro (on|off) (port_id)
+
+If enabled, the csum forwarding engine will perform GRO on the TCP/IPv4
+packets received from the given port.
+
+If disabled, GRO is not performed on packets received from the given
+port. By default, GRO is disabled for all ports.
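+
+For example, to enable GRO on port 0::
+
+   testpmd> gro on 0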
+
+.. note::
+
+   When GRO is enabled for a port, TCP/IPv4 packets received from that
+   port are reassembled. The merged packets are multi-segment mbufs, and
+   the csum forwarding engine doesn't support calculating TCP checksums
+   for multi-segment packets in SW. So please select TCP HW checksum
+   calculation for the port that GROed packets are transmitted to.
+
+gro set
+~~~~~~~~
+
+Set max flow number and max packet number per-flow for GRO::
+
+   testpmd> gro set (max_flow_num) (max_item_num_per_flow) (port_id)
+
+The product of ``max_flow_num`` and ``max_item_num_per_flow`` is the max
+number of packets a GRO table can store.
+
+Once the number of stored packets reaches this maximum, GRO stops
+processing incoming packets.
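+
+For example, the following command (with illustrative values) allows
+port 0 to buffer at most 16 flows with at most 32 packets per flow::
+
+   testpmd> gro set 16 32 0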
+
 mac_addr add
 ~~~~~~~~~~~~
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 141+ messages in thread

* Re: [PATCH v8 2/3] lib/gro: add TCP/IPv4 GRO support
  2017-06-29 10:58               ` [PATCH v8 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
@ 2017-06-29 17:51                 ` Stephen Hemminger
  2017-06-30  2:07                   ` Jiayu Hu
  0 siblings, 1 reply; 141+ messages in thread
From: Stephen Hemminger @ 2017-06-29 17:51 UTC (permalink / raw)
  To: Jiayu Hu
  Cc: dev, konstantin.ananyev, jianfeng.tan, yliu, jingjing.wu,
	tiwei.bie, lei.a.yao

On Thu, 29 Jun 2017 18:58:59 +0800
Jiayu Hu <jiayu.hu@intel.com> wrote:

> +	/* allocate a reassembly table for TCP/IPv4 GRO */
> +	uint32_t tcp_item_num = RTE_MIN(item_num,
> +			RTE_GRO_MAX_BURST_ITEM_NUM);
> +	struct gro_tcp_tbl tcp_tbl;
> +	struct gro_tcp_key tcp_keys[tcp_item_num];
> +	struct gro_tcp_item tcp_items[tcp_item_num];

Variable size structures on stack were not supported by
CLANG, last time I checked. Why not always max size?

> +
> +	struct rte_mbuf *unprocess_pkts[nb_pkts];
> +	uint16_t unprocess_num = 0;
> +	int32_t ret;
> +	uint64_t current_time;
> +
> +	if ((param->desired_gro_types & RTE_GRO_TCP_IPV4) == 0)
> +		return nb_pkts;
> +
> +	memset(tcp_keys, 0, sizeof(struct gro_tcp_key) *
> +			tcp_item_num);
> +	memset(tcp_items, 0, sizeof(struct gro_tcp_item) *
> +			tcp_item_num);

Variable size memsets are slow. They generate 'rep; stos' last
I checked in GCC and because of rep instruction it kills multi-execution
pipeline.

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v8 2/3] lib/gro: add TCP/IPv4 GRO support
  2017-06-29 17:51                 ` Stephen Hemminger
@ 2017-06-30  2:07                   ` Jiayu Hu
  0 siblings, 0 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-06-30  2:07 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: dev, konstantin.ananyev, jianfeng.tan, yliu, jingjing.wu,
	tiwei.bie, lei.a.yao

Hi Stephen,

On Thu, Jun 29, 2017 at 10:51:50AM -0700, Stephen Hemminger wrote:
> On Thu, 29 Jun 2017 18:58:59 +0800
> Jiayu Hu <jiayu.hu@intel.com> wrote:
> 
> > +	/* allocate a reassembly table for TCP/IPv4 GRO */
> > +	uint32_t tcp_item_num = RTE_MIN(item_num,
> > +			RTE_GRO_MAX_BURST_ITEM_NUM);
> > +	struct gro_tcp_tbl tcp_tbl;
> > +	struct gro_tcp_key tcp_keys[tcp_item_num];
> > +	struct gro_tcp_item tcp_items[tcp_item_num];
> 
> Variable size structures on stack were not supported by
> CLANG, last time I checked. Why not always max size?

The reason why I tried to use the min value is to reduce the
overhead of array initialization. But I overlooked the clang issue.
Thanks for the reminder. I will replace it with the max value.

> 
> > +
> > +	struct rte_mbuf *unprocess_pkts[nb_pkts];
> > +	uint16_t unprocess_num = 0;
> > +	int32_t ret;
> > +	uint64_t current_time;
> > +
> > +	if ((param->desired_gro_types & RTE_GRO_TCP_IPV4) == 0)
> > +		return nb_pkts;
> > +
> > +	memset(tcp_keys, 0, sizeof(struct gro_tcp_key) *
> > +			tcp_item_num);
> > +	memset(tcp_items, 0, sizeof(struct gro_tcp_item) *
> > +			tcp_item_num);
> 
> Variable size memsets are slow. They generate 'rep; stos' last
> I checked in GCC and because of rep instruction it kills multi-execution
> pipeline.

Thanks, I will modify it.
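
The plan is to bound both stack arrays by the compile-time burst limit
and zero them with static initializers, so neither a variable-length
array nor a runtime memset is needed. A minimal sketch:

	struct gro_tcp_key tcp_keys[RTE_GRO_MAX_BURST_ITEM_NUM] = {0};
	struct gro_tcp_item tcp_items[RTE_GRO_MAX_BURST_ITEM_NUM] = {0};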

BRs,
Jiayu

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v8 3/3] app/testpmd: enable TCP/IPv4 GRO
  2017-06-29 10:59               ` [PATCH v8 3/3] app/testpmd: enable TCP/IPv4 GRO Jiayu Hu
@ 2017-06-30  2:26                 ` Wu, Jingjing
  0 siblings, 0 replies; 141+ messages in thread
From: Wu, Jingjing @ 2017-06-30  2:26 UTC (permalink / raw)
  To: Hu, Jiayu, dev
  Cc: Ananyev, Konstantin, Tan, Jianfeng, yliu, stephen, Bie, Tiwei,
	Yao, Lei A



> -----Original Message-----
> From: Hu, Jiayu
> Sent: Thursday, June 29, 2017 6:59 PM
> To: dev@dpdk.org
> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Tan, Jianfeng
> <jianfeng.tan@intel.com>; yliu@fridaylinux.org; stephen@networkplumber.org;
> Wu, Jingjing <jingjing.wu@intel.com>; Bie, Tiwei <tiwei.bie@intel.com>; Yao,
> Lei A <lei.a.yao@intel.com>; Hu, Jiayu <jiayu.hu@intel.com>
> Subject: [PATCH v8 3/3] app/testpmd: enable TCP/IPv4 GRO
> 
> This patch enables TCP/IPv4 GRO library in csum forwarding engine.
> By default, GRO is turned off. Users can use command "gro (on|off) (port_id)"
> to enable or disable GRO for a given port. If GRO is enabled for a port, all
> TCP/IPv4 packets received from that port undergo GRO. Besides, users can set the
> max flow number and per-flow packet number with the command "gro set
> (max_flow_num) (max_item_num_per_flow) (port_id)".
> 
> Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>

The new commands look fine to me.

Reviewed-by: Jingjing Wu <jingjing.wu@intel.com>

Thanks
Jingjing

^ permalink raw reply	[flat|nested] 141+ messages in thread

* [PATCH v9 0/3] Support TCP/IPv4 GRO in DPDK
  2017-06-29 10:58             ` [PATCH v8 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
                                 ` (2 preceding siblings ...)
  2017-06-29 10:59               ` [PATCH v8 3/3] app/testpmd: enable TCP/IPv4 GRO Jiayu Hu
@ 2017-06-30  6:53               ` Jiayu Hu
  2017-06-30  6:53                 ` [PATCH v9 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
                                   ` (3 more replies)
  3 siblings, 4 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-06-30  6:53 UTC (permalink / raw)
  To: dev
  Cc: stephen, konstantin.ananyev, jianfeng.tan, yliu, jingjing.wu,
	keith.wiles, tiwei.bie, lei.a.yao, Jiayu Hu

Generic Receive Offload (GRO) is a widely used SW-based offloading
technique to reduce per-packet processing overhead. It gains performance
by reassembling small packets into large ones. Therefore, we propose to
support GRO in DPDK.

To enable more flexibility to applications, DPDK GRO is implemented as
a user library. Applications explicitly use the GRO library to merge
small packets into large ones. DPDK GRO provides two reassembly modes:
lightweight mode and heavyweight mode. If applications want to merge
packets in a simple way and the number of packets is small, they can
select the lightweight mode API. If applications need more fine-grained
controls, they can select the heavyweight mode API.
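
As a rough illustration (not the patchset's exact code; gro_param, pkts,
out and their sizes are placeholders), the two modes are used along
these lines:

	/* lightweight mode: merge one burst in place */
	nb_rx = rte_gro_reassemble_burst(pkts, nb_rx, &gro_param);

	/* heavyweight mode: accumulate packets in a GRO table and
	 * flush the timeout ones explicitly */
	void *tbl = rte_gro_tbl_create(&gro_param);
	rte_gro_reassemble(pkts, nb_rx, tbl);
	nb_out = rte_gro_timeout_flush(tbl, gro_param.desired_gro_types,
			out, max_nb_out);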

This patchset is to support TCP/IPv4 GRO in DPDK. The first patch is to
provide a GRO API framework. The second patch is to support TCP/IPv4 GRO.
The last patch is to enable TCP/IPv4 GRO in testpmd.

We perform many iperf tests to see the performance gains from DPDK GRO.
The test environment is:
a. two 25Gbps physical ports (p0 and p1) are linked together. Assign p0
	to one networking namespace and assign p1 to DPDK;
b. enable TSO for p0. Run iperf client on p0;
c. launch testpmd with p1 and a vhost-user port, and run it in csum
	forwarding mode. Select TCP HW checksum calculation for the
	vhost-user port in csum forwarding engine. And for better
	performance, we select IPv4 and TCP HW checksum calculation for p1
	too;
d. launch a VM with one CPU core and a virtio-net port. The VM OS is
	ubuntu 16.04, whose virtio-net driver supports GRO. Enable RX csum
	offloading and mrg_rxbuf for the VM. The iperf server runs in the VM;
e. to run iperf tests, we need to prevent the csum forwarding engine
	from forcibly changing packet MAC addresses. So in our tests, we
	comment that code out (line 701 ~ line 704 in csumonly.c).

In each test, we run iperf with the following three configurations:
	- single flow and single TCP client thread 
	- multiple flows and single TCP client thread
	- single flow and parallel TCP client threads

We run above iperf tests on three scenarios:
	s1: disabling kernel GRO and enabling DPDK GRO
	s2: disabling kernel GRO and disabling DPDK GRO
	s3: enabling kernel GRO and disabling DPDK GRO
Comparing the throughput of s1 with s2, we can see the performance gains
from DPDK GRO. Comparing the throughput of s1 and s3, we can compare DPDK
GRO performance with kernel GRO performance.

Test results:
	- DPDK GRO throughput is almost 2 times the throughput with
		neither DPDK GRO nor kernel GRO;
	- DPDK GRO throughput is almost 1.2 times the throughput of
		kernel GRO.

Change log
==========
v9:
- avoid defining variable-size structure arrays and variable-size memsets
	in rte_gro_reassemble_burst
- change internal structure name from 'rte_gro_tbl' to 'gro_tbl'
- delete useless variables in rte_gro_tcp.c
v8:
- merge rte_gro_flush and rte_gro_timeout_flush together and optimize
	flushing operation
- enable rte_gro_reassemble to process N inputted packets
- add rte_gro_tbl_item_num to get packet num in the GRO table
- add 'lastseg' to struct gro_tcp_item to get last segment faster
- add operations to handle rte_malloc failure
- use mbuf->l2_len/l3_len/l4_len instead of parsing header
- remove 'is_groed' and 'is_valid' in struct gro_tcp_item
- fix bugs in gro_tcp4_reassemble
- pass start-time as a parameter to avoid frequently calling rte_rdtsc 
- modify rte_gro_tbl_create prototype
- add 'RTE_' to external macros
- remove 'goto'
- remove inappropriate 'const'
- hide internal variables
v7:
- add a macro 'GRO_MAX_BURST_ITEM_NUM' to avoid stack overflow in
	rte_gro_reassemble_burst
- change macro name (_NB to _NUM)
- add '#ifdef __cplusplus ...' in rte_gro.h
v6:
- avoid checksum validation and calculation
- enable to process IP fragmented packets
- add a command in testpmd
- update documents
- modify rte_gro_timeout_flush and rte_gro_reassemble_burst
- rename variable names
v5:
- fix some bugs
- fix coding style issues
v4:
- implement DPDK GRO as an application-used library
- introduce lightweight and heavyweight working modes to enable
	fine-grained controls to applications
- replace cuckoo hash tables with simpler table structure
v3:
- fix compilation issues.
v2:
- provide generic reassembly function;
- implement GRO as a device ability:
add APIs for devices to support GRO;
add APIs for applications to enable/disable GRO;
- update testpmd example. 

Jiayu Hu (3):
  lib: add Generic Receive Offload API framework
  lib/gro: add TCP/IPv4 GRO support
  app/testpmd: enable TCP/IPv4 GRO

 app/test-pmd/cmdline.c                      | 125 +++++++++
 app/test-pmd/config.c                       |  37 +++
 app/test-pmd/csumonly.c                     |   5 +
 app/test-pmd/testpmd.c                      |   3 +
 app/test-pmd/testpmd.h                      |  11 +
 config/common_base                          |   5 +
 doc/guides/rel_notes/release_17_08.rst      |   7 +
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  34 +++
 lib/Makefile                                |   2 +
 lib/librte_gro/Makefile                     |  51 ++++
 lib/librte_gro/rte_gro.c                    | 273 +++++++++++++++++++
 lib/librte_gro/rte_gro.h                    | 180 +++++++++++++
 lib/librte_gro/rte_gro_tcp.c                | 395 ++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro_tcp.h                | 172 ++++++++++++
 lib/librte_gro/rte_gro_version.map          |  12 +
 mk/rte.app.mk                               |   1 +
 16 files changed, 1313 insertions(+)
 create mode 100644 lib/librte_gro/Makefile
 create mode 100644 lib/librte_gro/rte_gro.c
 create mode 100644 lib/librte_gro/rte_gro.h
 create mode 100644 lib/librte_gro/rte_gro_tcp.c
 create mode 100644 lib/librte_gro/rte_gro_tcp.h
 create mode 100644 lib/librte_gro/rte_gro_version.map

-- 
2.7.4

^ permalink raw reply	[flat|nested] 141+ messages in thread

* [PATCH v9 1/3] lib: add Generic Receive Offload API framework
  2017-06-30  6:53               ` [PATCH v9 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
@ 2017-06-30  6:53                 ` Jiayu Hu
  2017-06-30  6:53                 ` [PATCH v9 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
                                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-06-30  6:53 UTC (permalink / raw)
  To: dev
  Cc: stephen, konstantin.ananyev, jianfeng.tan, yliu, jingjing.wu,
	keith.wiles, tiwei.bie, lei.a.yao, Jiayu Hu

Generic Receive Offload (GRO) is a widely used SW-based offloading
technique to reduce per-packet processing overhead. It gains
performance by reassembling small packets into large ones. This
patchset is to support GRO in DPDK. To support GRO, this patch
implements a GRO API framework.

To enable more flexibility to applications, DPDK GRO is implemented as
a user library. Applications explicitly use the GRO library to merge
small packets into large ones. DPDK GRO provides two reassembly modes.
One is called lightweight mode, the other is called heavyweight mode.
If applications want to merge packets in a simple way and the number
of packets is relatively small, they can use the lightweight mode.
If applications need more fine-grained controls, they can choose the
heavyweight mode.

rte_gro_reassemble_burst is the main reassembly API which is used in
lightweight mode and processes N packets at a time. For applications,
performing GRO in lightweight mode is simple. They just need to invoke
rte_gro_reassemble_burst. Applications can get GROed packets as soon as
rte_gro_reassemble_burst returns.
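
For instance, a minimal lightweight-mode receive path could look like
the sketch below (BURST_SIZE, gro_param and the queue ids are
illustrative placeholders):

	uint16_t nb_rx = rte_eth_rx_burst(port_id, 0, pkts, BURST_SIZE);
	/* merge the burst; the return value is the number of
	 * packets left in pkts[] after GRO */
	nb_rx = rte_gro_reassemble_burst(pkts, nb_rx, &gro_param);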

rte_gro_reassemble is the main reassembly API which is used in
heavyweight mode and tries to merge N inputted packets with the packets
in a given GRO table. For applications, performing GRO in heavyweight
mode is relatively complicated. Before performing GRO, applications need
to create a GRO table with rte_gro_tbl_create. Then they can use
rte_gro_reassemble to merge packets. The GROed packets stay in the GRO
table; if applications want to get them, they need to flush them
manually via the flush API, as illustrated below.
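
An illustrative heavyweight-mode lifecycle (tbl, out and MAX_FLUSH are
placeholder names):

	void *tbl = rte_gro_tbl_create(&gro_param);
	/* unprocessed packets are moved to the front of pkts[] */
	uint16_t n = rte_gro_reassemble(pkts, nb_rx, tbl);
	/* fetch up to MAX_FLUSH timeout packets from the table */
	struct rte_mbuf *out[MAX_FLUSH];
	uint16_t nb_out = rte_gro_timeout_flush(tbl,
			gro_param.desired_gro_types, out, MAX_FLUSH);
	rte_gro_tbl_destroy(tbl);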

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
---
 config/common_base                 |   5 ++
 lib/Makefile                       |   2 +
 lib/librte_gro/Makefile            |  50 +++++++++++
 lib/librte_gro/rte_gro.c           | 176 +++++++++++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro.h           | 176 +++++++++++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro_version.map |  12 +++
 mk/rte.app.mk                      |   1 +
 7 files changed, 422 insertions(+)
 create mode 100644 lib/librte_gro/Makefile
 create mode 100644 lib/librte_gro/rte_gro.c
 create mode 100644 lib/librte_gro/rte_gro.h
 create mode 100644 lib/librte_gro/rte_gro_version.map

diff --git a/config/common_base b/config/common_base
index f6aafd1..167f5ef 100644
--- a/config/common_base
+++ b/config/common_base
@@ -712,6 +712,11 @@ CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
 CONFIG_RTE_LIBRTE_PMD_VHOST=n
 
 #
+# Compile GRO library
+#
+CONFIG_RTE_LIBRTE_GRO=y
+
+#
 #Compile Xen domain0 support
 #
 CONFIG_RTE_LIBRTE_XEN_DOM0=n
diff --git a/lib/Makefile b/lib/Makefile
index 07e1fd0..ac1c2f6 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -106,6 +106,8 @@ DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
 DEPDIRS-librte_reorder := librte_eal librte_mempool librte_mbuf
 DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
 DEPDIRS-librte_pdump := librte_eal librte_mempool librte_mbuf librte_ether
+DIRS-$(CONFIG_RTE_LIBRTE_GRO) += librte_gro
+DEPDIRS-librte_gro := librte_eal librte_mbuf librte_ether librte_net
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
new file mode 100644
index 0000000..7e0f128
--- /dev/null
+++ b/lib/librte_gro/Makefile
@@ -0,0 +1,50 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2017 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_gro.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+
+EXPORT_MAP := rte_gro_version.map
+
+LIBABIVER := 1
+
+# source files
+SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
+
+# install this header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
new file mode 100644
index 0000000..648835b
--- /dev/null
+++ b/lib/librte_gro/rte_gro.c
@@ -0,0 +1,176 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+
+#include "rte_gro.h"
+
+typedef void *(*gro_tbl_create_fn)(uint16_t socket_id,
+		uint16_t max_flow_num,
+		uint16_t max_item_per_flow);
+typedef void (*gro_tbl_destroy_fn)(void *tbl);
+typedef uint32_t (*gro_tbl_item_num_fn)(void *tbl);
+
+static gro_tbl_create_fn tbl_create_functions[RTE_GRO_TYPE_MAX_NUM];
+static gro_tbl_destroy_fn tbl_destroy_functions[RTE_GRO_TYPE_MAX_NUM];
+static gro_tbl_item_num_fn tbl_item_num_functions[RTE_GRO_TYPE_MAX_NUM];
+
+/**
+ * GRO table, which is used to merge packets. It keeps many reassembly
+ * tables of desired GRO types. Applications need to create GRO tables
+ * before using rte_gro_reassemble to perform GRO.
+ */
+struct gro_tbl {
+	uint64_t desired_gro_types;	/**< GRO types to perform */
+	/* max TTL for a packet in the table, measured in CPU cycles */
+	uint64_t max_timeout_cycles;
+	/* max length of merged packet measured in byte */
+	uint32_t max_packet_size;
+	/* reassembly tables of desired GRO types */
+	void *tbls[RTE_GRO_TYPE_MAX_NUM];
+};
+
+void *rte_gro_tbl_create(const struct rte_gro_param *param)
+{
+	gro_tbl_create_fn create_tbl_fn;
+	gro_tbl_destroy_fn destroy_tbl_fn;
+	struct gro_tbl *gro_tbl;
+	uint64_t gro_type_flag = 0;
+	uint8_t i, j;
+
+	gro_tbl = rte_zmalloc_socket(__func__,
+			sizeof(struct gro_tbl),
+			RTE_CACHE_LINE_SIZE,
+			param->socket_id);
+	if (gro_tbl == NULL)
+		return NULL;
+	gro_tbl->max_packet_size = param->max_packet_size;
+	gro_tbl->max_timeout_cycles = param->max_timeout_cycles;
+	gro_tbl->desired_gro_types = param->desired_gro_types;
+
+	for (i = 0; i < RTE_GRO_TYPE_MAX_NUM; i++) {
+		gro_type_flag = 1 << i;
+
+		if ((param->desired_gro_types & gro_type_flag) == 0)
+			continue;
+		create_tbl_fn = tbl_create_functions[i];
+		if (create_tbl_fn == NULL)
+			continue;
+
+		gro_tbl->tbls[i] = create_tbl_fn(
+				param->socket_id,
+				param->max_flow_num,
+				param->max_item_per_flow);
+		if (gro_tbl->tbls[i] == NULL) {
+			/* destroy all allocated tables */
+			for (j = 0; j < i; j++) {
+				gro_type_flag = 1 << j;
+				if ((param->desired_gro_types & gro_type_flag) == 0)
+					continue;
+				destroy_tbl_fn = tbl_destroy_functions[j];
+				if (destroy_tbl_fn)
+					destroy_tbl_fn(gro_tbl->tbls[j]);
+			}
+			rte_free(gro_tbl);
+			return NULL;
+		}
+	}
+	return gro_tbl;
+}
+
+void rte_gro_tbl_destroy(void *tbl)
+{
+	gro_tbl_destroy_fn destroy_tbl_fn;
+	struct gro_tbl *gro_tbl = (struct gro_tbl *)tbl;
+	uint64_t gro_type_flag;
+	uint8_t i;
+
+	if (gro_tbl == NULL)
+		return;
+	for (i = 0; i < RTE_GRO_TYPE_MAX_NUM; i++) {
+		gro_type_flag = 1 << i;
+		if ((gro_tbl->desired_gro_types & gro_type_flag) == 0)
+			continue;
+		destroy_tbl_fn = tbl_destroy_functions[i];
+		if (destroy_tbl_fn)
+			destroy_tbl_fn(gro_tbl->tbls[i]);
+	}
+	rte_free(gro_tbl);
+}
+
+uint16_t
+rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
+		uint16_t nb_pkts,
+		const struct rte_gro_param *param __rte_unused)
+{
+	return nb_pkts;
+}
+
+uint16_t
+rte_gro_reassemble(struct rte_mbuf **pkts __rte_unused,
+		uint16_t nb_pkts,
+		void *tbl __rte_unused)
+{
+	return nb_pkts;
+}
+
+uint16_t
+rte_gro_timeout_flush(void *tbl __rte_unused,
+		uint64_t desired_gro_types __rte_unused,
+		struct rte_mbuf **out __rte_unused,
+		uint16_t max_nb_out __rte_unused)
+{
+	return 0;
+}
+
+uint64_t rte_gro_tbl_item_num(void *tbl)
+{
+	struct gro_tbl *gro_tbl = (struct gro_tbl *)tbl;
+	gro_tbl_item_num_fn item_num_fn;
+	uint64_t item_num = 0;
+	uint64_t gro_type_flag;
+	uint8_t i;
+
+	for (i = 0; i < RTE_GRO_TYPE_MAX_NUM; i++) {
+		gro_type_flag = 1 << i;
+		if ((gro_tbl->desired_gro_types & gro_type_flag) == 0)
+			continue;
+
+		item_num_fn = tbl_item_num_functions[i];
+		if (item_num_fn == NULL)
+			continue;
+		item_num += item_num_fn(gro_tbl->tbls[i]);
+	}
+	return item_num;
+}
diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
new file mode 100644
index 0000000..02c9113
--- /dev/null
+++ b/lib/librte_gro/rte_gro.h
@@ -0,0 +1,176 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_GRO_H_
+#define _RTE_GRO_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * the max number of packets that rte_gro_reassemble_burst can
+ * process in each invocation.
+ */
+#define RTE_GRO_MAX_BURST_ITEM_NUM 128UL
+
+/* max number of supported GRO types */
+#define RTE_GRO_TYPE_MAX_NUM 64
+#define RTE_GRO_TYPE_SUPPORT_NUM 0	/**< current supported GRO num */
+
+
+struct rte_gro_param {
+	uint64_t desired_gro_types;	/**< desired GRO types */
+	uint32_t max_packet_size;	/**< max length of merged packets */
+	uint16_t max_flow_num;	/**< max flow number */
+	uint16_t max_item_per_flow;	/**< max packet number per flow */
+
+	/* socket index where the Ethernet port connects to */
+	uint16_t socket_id;
+	/* max TTL for a packet in the GRO table, measured in CPU cycles */
+	uint64_t max_timeout_cycles;
+};
+
+/**
+ * This function creates a GRO table, which is used to merge packets in
+ * rte_gro_reassemble.
+ *
+ * @param param
+ *  applications use it to pass needed parameters to create a GRO table.
+ * @return
+ *  on success, return a pointer to the created GRO table. Otherwise,
+ *  return NULL.
+ */
+void *rte_gro_tbl_create(
+		const struct rte_gro_param *param);
+/**
+ * This function destroys a GRO table.
+ */
+void rte_gro_tbl_destroy(void *tbl);
+
+/**
+ * This is one of the main reassembly APIs, which merges a burst of
+ * packets at a time. It assumes that all input packets have correct
+ * checksums; applications should guarantee this, since it doesn't
+ * re-calculate checksums for merged packets. If input packets are IP
+ * fragmented, this function assumes they are complete (i.e. with the
+ * L4 header). After processing finishes, all GROed packets are
+ * returned to the application immediately.
+ *
+ * @param pkts
+ *  a pointer array of the packets to reassemble. On return, it holds
+ *  the addresses of the GROed packets.
+ * @param nb_pkts
+ *  the number of packets to reassemble.
+ * @param param
+ *  applications use it to tell rte_gro_reassemble_burst which rules
+ *  to apply.
+ * @return
+ *  the number of packets after GRO.
+ */
+uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts,
+		uint16_t nb_pkts,
+		const struct rte_gro_param *param);
+
+/**
+ * Reassembly function, which tries to merge input packets with
+ * the packets in a given GRO table. This function assumes all input
+ * packets have correct checksums, and it won't update checksums if
+ * two packets are merged. Besides, if input packets are IP
+ * fragmented, this function assumes they are complete packets (i.e.
+ * with the L4 header).
+ *
+ * If the input packets don't have data or have unsupported GRO
+ * types, they won't be processed and are returned to the application.
+ * Otherwise, the input packets are either merged or inserted into
+ * the table. If applications want to get the packets in the table,
+ * they need to call the flush API.
+ *
+ * @param pkts
+ *  packets to reassemble. After this function finishes, it keeps
+ *  the unprocessed packets (i.e. those without data or with
+ *  unsupported GRO types).
+ * @param nb_pkts
+ *  the number of packets to reassemble.
+ * @param tbl
+ *  a pointer to a GRO table.
+ * @return
+ *  return the number of unprocessed packets (i.e. without data or
+ *  unsupported GRO types). If all packets are processed (merged or
+ *  inserted into the table), return 0.
+ */
+uint16_t rte_gro_reassemble(struct rte_mbuf **pkts,
+		uint16_t nb_pkts,
+		void *tbl);
+
+/**
+ * This function flushes the timeout packets from reassembly tables of
+ * desired GRO types. The max number of flushed timeout packets is the
+ * element number of the array which is used to keep the flushed packets.
+ *
+ * Besides, this function won't re-calculate checksums for merged
+ * packets in the tables. That is, the returned packets may have
+ * incorrect checksums.
+ *
+ * @param tbl
+ *  a pointer to a GRO table object.
+ * @param desired_gro_types
+ *  rte_gro_timeout_flush only processes packets which belong to the
+ *  GRO types specified by desired_gro_types.
+ * @param out
+ *  a pointer array that is used to keep flushed timeout packets.
+ * @param max_nb_out
+ *  the number of elements in out. It's also the max number of timeout
+ *  packets that can be flushed.
+ * @return
+ *  the number of flushed packets. If no packets are flushed, return 0.
+ */
+uint16_t rte_gro_timeout_flush(void *tbl,
+		uint64_t desired_gro_types,
+		struct rte_mbuf **out,
+		uint16_t max_nb_out);
+
+/**
+ * This function returns the number of packets in a given GRO table.
+ * @param tbl
+ *  a pointer to a GRO table.
+ * @return
+ *  the number of packets in the table.
+ */
+uint64_t rte_gro_tbl_item_num(void *tbl);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/lib/librte_gro/rte_gro_version.map b/lib/librte_gro/rte_gro_version.map
new file mode 100644
index 0000000..358fb9d
--- /dev/null
+++ b/lib/librte_gro/rte_gro_version.map
@@ -0,0 +1,12 @@
+DPDK_17.08 {
+	global:
+
+	rte_gro_tbl_create;
+	rte_gro_tbl_destroy;
+	rte_gro_reassemble_burst;
+	rte_gro_reassemble;
+	rte_gro_timeout_flush;
+	rte_gro_tbl_item_num;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index bcaf1b3..fc3776d 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -98,6 +98,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_RING)           += -lrte_ring
 _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
 _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER)        += -lrte_reorder
+_LDLIBS-$(CONFIG_RTE_LIBRTE_GRO)            += -lrte_gro
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 _LDLIBS-$(CONFIG_RTE_LIBRTE_KNI)            += -lrte_kni
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 141+ messages in thread

* [PATCH v9 2/3] lib/gro: add TCP/IPv4 GRO support
  2017-06-30  6:53               ` [PATCH v9 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
  2017-06-30  6:53                 ` [PATCH v9 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
@ 2017-06-30  6:53                 ` Jiayu Hu
  2017-06-30  6:53                 ` [PATCH v9 3/3] app/testpmd: enable TCP/IPv4 GRO Jiayu Hu
  2017-07-01 11:08                 ` [PATCH v10 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
  3 siblings, 0 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-06-30  6:53 UTC (permalink / raw)
  To: dev
  Cc: stephen, konstantin.ananyev, jianfeng.tan, yliu, jingjing.wu,
	keith.wiles, tiwei.bie, lei.a.yao, Jiayu Hu

In this patch, we introduce five APIs to support TCP/IPv4 GRO.
- gro_tcp_tbl_create: create a TCP reassembly table, which is used to
    merge packets.
- gro_tcp_tbl_destroy: free memory space of a TCP reassembly table.
- gro_tcp_tbl_timeout_flush: flush timeout packets from a TCP
    reassembly table.
- gro_tcp_tbl_item_num: return the number of packets in a TCP reassembly
    table.
- gro_tcp4_reassemble: reassemble an input TCP/IPv4 packet.

The TCP/IPv4 GRO API assumes all input packets have correct IPv4
and TCP checksums, and it doesn't update IPv4 and TCP checksums for
merged packets. If input packets are IP fragmented, the API assumes
they are complete packets (i.e. with L4 headers).

In TCP GRO, we use a table structure, called TCP reassembly table, to
reassemble packets. Both TCP/IPv4 and TCP/IPv6 GRO use the same table
structure. A TCP reassembly table includes a key array and an item array,
where the key array keeps the criteria to merge packets and the item
array keeps packet information.

One key in the key array points to an item group, which consists of
packets that have the same criteria value. If two packets can be
merged, they must be in the same item group. Each key in the key array
includes two parts:
- criteria: the criteria of merging packets. If two packets can be
    merged, they must have the same criteria value.
- start_index: the index of the first incoming packet of the item group.

Each element in the item array keeps the information of one packet. It
mainly includes two parts:
- pkt: packet address
- next_pkt_index: the index of the next packet in the same item group.
    All packets in the same item group are chained by next_pkt_index.
    With next_pkt_index, we can locate all packets in the same item
    group one by one.

Processing an incoming packet takes three steps (a sketch follows
below):
a. check if the packet should be processed. Packets with the following
    properties won't be processed:
	- packets without data (e.g. SYN, SYN-ACK)
b. traverse the key array to find a key which has the same criteria
    value as the incoming packet. If found, go to step c. Otherwise,
    insert a new key and insert the packet into the item array.
c. locate the first packet in the item group via the start_index in the
    key, then traverse all packets in the item group via next_pkt_index.
    If a packet is found that can merge with the incoming one, merge
    them together. Otherwise, insert the packet into this item group.
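
In rough C terms, steps b and c walk the key array and then the linked
item group; find_key(), can_merge() and merge() are illustrative
helpers, not functions in this patch:

	uint32_t key_idx = find_key(tbl, &key);        /* step b */
	uint32_t cur = tbl->keys[key_idx].start_index; /* step c */
	while (cur != INVALID_ARRAY_INDEX) {
		/* neighbor check: sequence numbers and TCP options */
		if (can_merge(&tbl->items[cur], pkt))
			return merge(&tbl->items[cur], pkt);
		cur = tbl->items[cur].next_pkt_idx;
	}
	/* no mergeable packet found: append pkt to this item group */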

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
---
 doc/guides/rel_notes/release_17_08.rst |   7 +
 lib/librte_gro/Makefile                |   1 +
 lib/librte_gro/rte_gro.c               | 123 ++++++++--
 lib/librte_gro/rte_gro.h               |   6 +-
 lib/librte_gro/rte_gro_tcp.c           | 395 +++++++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro_tcp.h           | 172 ++++++++++++++
 6 files changed, 690 insertions(+), 14 deletions(-)
 create mode 100644 lib/librte_gro/rte_gro_tcp.c
 create mode 100644 lib/librte_gro/rte_gro_tcp.h

diff --git a/doc/guides/rel_notes/release_17_08.rst b/doc/guides/rel_notes/release_17_08.rst
index 842f46f..f067247 100644
--- a/doc/guides/rel_notes/release_17_08.rst
+++ b/doc/guides/rel_notes/release_17_08.rst
@@ -75,6 +75,13 @@ New Features
 
   Added support for firmwares with multiple Ethernet ports per physical port.
 
+* **Add Generic Receive Offload API support.**
+
+  The Generic Receive Offload (GRO) API supports reassembling TCP/IPv4
+  packets. The GRO API assumes all input packets have correct
+  checksums and doesn't update checksums for merged packets. If
+  input packets are IP fragmented, the GRO API assumes they are
+  complete packets (i.e. with L4 headers).
 
 Resolved Issues
 ---------------
diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
index 7e0f128..e89344d 100644
--- a/lib/librte_gro/Makefile
+++ b/lib/librte_gro/Makefile
@@ -43,6 +43,7 @@ LIBABIVER := 1
 
 # source files
 SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
+SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro_tcp.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
index 648835b..993cf29 100644
--- a/lib/librte_gro/rte_gro.c
+++ b/lib/librte_gro/rte_gro.c
@@ -32,8 +32,11 @@
 
 #include <rte_malloc.h>
 #include <rte_mbuf.h>
+#include <rte_cycles.h>
+#include <rte_ethdev.h>
 
 #include "rte_gro.h"
+#include "rte_gro_tcp.h"
 
 typedef void *(*gro_tbl_create_fn)(uint16_t socket_id,
 		uint16_t max_flow_num,
@@ -41,9 +44,12 @@ typedef void *(*gro_tbl_create_fn)(uint16_t socket_id,
 typedef void (*gro_tbl_destroy_fn)(void *tbl);
 typedef uint32_t (*gro_tbl_item_num_fn)(void *tbl);
 
-static gro_tbl_create_fn tbl_create_functions[RTE_GRO_TYPE_MAX_NUM];
-static gro_tbl_destroy_fn tbl_destroy_functions[RTE_GRO_TYPE_MAX_NUM];
-static gro_tbl_item_num_fn tbl_item_num_functions[RTE_GRO_TYPE_MAX_NUM];
+static gro_tbl_create_fn tbl_create_functions[RTE_GRO_TYPE_MAX_NUM] = {
+	gro_tcp_tbl_create, NULL};
+static gro_tbl_destroy_fn tbl_destroy_functions[RTE_GRO_TYPE_MAX_NUM] = {
+	gro_tcp_tbl_destroy, NULL};
+static gro_tbl_item_num_fn tbl_item_num_functions[
+	RTE_GRO_TYPE_MAX_NUM] = {gro_tcp_tbl_item_num, NULL};
 
 /**
  * GRO table, which is used to merge packets. It keeps many reassembly
@@ -130,27 +136,118 @@ void rte_gro_tbl_destroy(void *tbl)
 }
 
 uint16_t
-rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
+rte_gro_reassemble_burst(struct rte_mbuf **pkts,
 		uint16_t nb_pkts,
-		const struct rte_gro_param *param __rte_unused)
+		const struct rte_gro_param *param)
 {
-	return nb_pkts;
+	uint16_t i;
+	uint16_t nb_after_gro = nb_pkts;
+	uint32_t item_num;
+
+	/* allocate a reassembly table for TCP/IPv4 GRO */
+	struct gro_tcp_tbl tcp_tbl;
+	struct gro_tcp_key tcp_keys[RTE_GRO_MAX_BURST_ITEM_NUM] = {0};
+	struct gro_tcp_item tcp_items[RTE_GRO_MAX_BURST_ITEM_NUM] = {0};
+
+	struct rte_mbuf *unprocess_pkts[nb_pkts];
+	uint16_t unprocess_num = 0;
+	int32_t ret;
+	uint64_t current_time;
+
+	if ((param->desired_gro_types & RTE_GRO_TCP_IPV4) == 0)
+		return nb_pkts;
+
+	/* get the actual number of items */
+	item_num = RTE_MIN(nb_pkts, (param->max_flow_num *
+			param->max_item_per_flow));
+	item_num = RTE_MIN(item_num, RTE_GRO_MAX_BURST_ITEM_NUM);
+
+	tcp_tbl.keys = tcp_keys;
+	tcp_tbl.items = tcp_items;
+	tcp_tbl.key_num = 0;
+	tcp_tbl.item_num = 0;
+	tcp_tbl.max_key_num = item_num;
+	tcp_tbl.max_item_num = item_num;
+
+	current_time = rte_rdtsc();
+
+	for (i = 0; i < nb_pkts; i++) {
+		if (RTE_ETH_IS_IPV4_HDR(pkts[i]->packet_type) &&
+				(pkts[i]->packet_type & RTE_PTYPE_L4_TCP)) {
+			ret = gro_tcp4_reassemble(pkts[i],
+					&tcp_tbl,
+					param->max_packet_size,
+					current_time);
+			if (ret > 0)
+				/* merge successfully */
+				nb_after_gro--;
+			else if (ret < 0)
+				unprocess_pkts[unprocess_num++] =
+					pkts[i];
+		} else
+			unprocess_pkts[unprocess_num++] =
+				pkts[i];
+	}
+
+	/* re-arrange GROed packets */
+	if (nb_after_gro < nb_pkts) {
+		i = gro_tcp_tbl_timeout_flush(&tcp_tbl, 0,
+				pkts, nb_pkts);
+		if (unprocess_num > 0)
+			memcpy(&pkts[i], unprocess_pkts,
+					sizeof(struct rte_mbuf *) *
+					unprocess_num);
+	}
+	return nb_after_gro;
 }
 
 uint16_t
-rte_gro_reassemble(struct rte_mbuf **pkts __rte_unused,
+rte_gro_reassemble(struct rte_mbuf **pkts,
 		uint16_t nb_pkts,
-		void *tbl __rte_unused)
+		void *tbl)
 {
-	return nb_pkts;
+	uint16_t i, unprocess_num = 0;
+	struct rte_mbuf *unprocess_pkts[nb_pkts];
+	struct gro_tbl *gro_tbl = (struct gro_tbl *)tbl;
+	uint64_t current_time;
+
+	if ((gro_tbl->desired_gro_types & RTE_GRO_TCP_IPV4) == 0)
+		return nb_pkts;
+
+	current_time = rte_rdtsc();
+	for (i = 0; i < nb_pkts; i++) {
+		if (RTE_ETH_IS_IPV4_HDR(pkts[i]->packet_type) &&
+				(pkts[i]->packet_type & RTE_PTYPE_L4_TCP)) {
+			if (gro_tcp4_reassemble(pkts[i],
+						gro_tbl->tbls[RTE_GRO_TCP_IPV4_INDEX],
+						gro_tbl->max_packet_size,
+						current_time) < 0)
+				unprocess_pkts[unprocess_num++] = pkts[i];
+		} else
+			unprocess_pkts[unprocess_num++] = pkts[i];
+	}
+	if (unprocess_num > 0)
+		memcpy(pkts, unprocess_pkts,
+				sizeof(struct rte_mbuf *) * unprocess_num);
+
+	return unprocess_num;
 }
 
 uint16_t
-rte_gro_timeout_flush(void *tbl __rte_unused,
-		uint64_t desired_gro_types __rte_unused,
-		struct rte_mbuf **out __rte_unused,
-		uint16_t max_nb_out __rte_unused)
+rte_gro_timeout_flush(void *tbl,
+		uint64_t desired_gro_types,
+		struct rte_mbuf **out,
+		uint16_t max_nb_out)
 {
+	struct gro_tbl *gro_tbl = (struct gro_tbl *)tbl;
+
+	desired_gro_types = desired_gro_types &
+		gro_tbl->desired_gro_types;
+	if (desired_gro_types & RTE_GRO_TCP_IPV4)
+		return gro_tcp_tbl_timeout_flush(
+				gro_tbl->tbls[RTE_GRO_TCP_IPV4_INDEX],
+				gro_tbl->max_timeout_cycles,
+				out, max_nb_out);
 	return 0;
 }
 
diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
index 02c9113..2cd06ee 100644
--- a/lib/librte_gro/rte_gro.h
+++ b/lib/librte_gro/rte_gro.h
@@ -45,7 +45,11 @@ extern "C" {
 
 /* max number of supported GRO types */
 #define RTE_GRO_TYPE_MAX_NUM 64
-#define RTE_GRO_TYPE_SUPPORT_NUM 0	/**< current supported GRO num */
+#define RTE_GRO_TYPE_SUPPORT_NUM 1	/**< current supported GRO num */
+
+/* TCP/IPv4 GRO flag */
+#define RTE_GRO_TCP_IPV4_INDEX 0
+#define RTE_GRO_TCP_IPV4 (1ULL << RTE_GRO_TCP_IPV4_INDEX)
 
 
 struct rte_gro_param {
diff --git a/lib/librte_gro/rte_gro_tcp.c b/lib/librte_gro/rte_gro_tcp.c
new file mode 100644
index 0000000..cf5cea2
--- /dev/null
+++ b/lib/librte_gro/rte_gro_tcp.c
@@ -0,0 +1,395 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+#include <rte_cycles.h>
+
+#include <rte_ethdev.h>
+#include <rte_ip.h>
+#include <rte_tcp.h>
+
+#include "rte_gro_tcp.h"
+
+void *gro_tcp_tbl_create(uint16_t socket_id,
+		uint16_t max_flow_num,
+		uint16_t max_item_per_flow)
+{
+	size_t size;
+	uint32_t entries_num;
+	struct gro_tcp_tbl *tbl;
+
+	entries_num = max_flow_num * max_item_per_flow;
+	entries_num = entries_num > GRO_TCP_TBL_MAX_ITEM_NUM ?
+		GRO_TCP_TBL_MAX_ITEM_NUM : entries_num;
+
+	if (entries_num == 0)
+		return NULL;
+
+	tbl = (struct gro_tcp_tbl *)rte_zmalloc_socket(
+			__func__,
+			sizeof(struct gro_tcp_tbl),
+			RTE_CACHE_LINE_SIZE,
+			socket_id);
+	if (tbl == NULL)
+		return NULL;
+
+	size = sizeof(struct gro_tcp_item) * entries_num;
+	tbl->items = (struct gro_tcp_item *)rte_zmalloc_socket(
+			__func__,
+			size,
+			RTE_CACHE_LINE_SIZE,
+			socket_id);
+	if (tbl->items == NULL) {
+		rte_free(tbl);
+		return NULL;
+	}
+	tbl->max_item_num = entries_num;
+
+	size = sizeof(struct gro_tcp_key) * entries_num;
+	tbl->keys = (struct gro_tcp_key *)rte_zmalloc_socket(
+			__func__,
+			size, RTE_CACHE_LINE_SIZE,
+			socket_id);
+	if (tbl->keys == NULL) {
+		rte_free(tbl->items);
+		rte_free(tbl);
+		return NULL;
+	}
+	tbl->max_key_num = entries_num;
+	return tbl;
+}
+
+void gro_tcp_tbl_destroy(void *tbl)
+{
+	struct gro_tcp_tbl *tcp_tbl = (struct gro_tcp_tbl *)tbl;
+
+	if (tcp_tbl) {
+		rte_free(tcp_tbl->items);
+		rte_free(tcp_tbl->keys);
+	}
+	rte_free(tcp_tbl);
+}
+
+static struct rte_mbuf *get_mbuf_lastseg(struct rte_mbuf *pkt)
+{
+	struct rte_mbuf *lastseg = pkt;
+
+	while (lastseg->next)
+		lastseg = lastseg->next;
+
+	return lastseg;
+}
+
+/**
+ * merge two TCP/IPv4 packets without updating checksums.
+ */
+static int
+merge_two_tcp4_packets(struct gro_tcp_item *item_src,
+		struct rte_mbuf *pkt,
+		uint32_t max_packet_size)
+{
+	struct ipv4_hdr *ipv4_hdr1, *ipv4_hdr2;
+	uint16_t tcp_dl1;
+	struct rte_mbuf *pkt_src = item_src->pkt;
+
+	/* parse the given packet */
+	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
+				char *) + pkt->l2_len);
+	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) -
+		pkt->l3_len - pkt->l4_len;
+
+	/* parse the original packet */
+	ipv4_hdr2 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt_src,
+				char *) + pkt_src->l2_len);
+
+	if (pkt_src->pkt_len + tcp_dl1 > max_packet_size)
+		return -1;
+
+	/* remove the header of the incoming packet */
+	rte_pktmbuf_adj(pkt, pkt->l2_len + pkt->l3_len + pkt->l4_len);
+
+	/* chain the two packets together and update lastseg */
+	item_src->lastseg->next = pkt;
+	item_src->lastseg = get_mbuf_lastseg(pkt);
+
+	/* update IP header */
+	ipv4_hdr2->total_length = rte_cpu_to_be_16(
+			rte_be_to_cpu_16(
+				ipv4_hdr2->total_length)
+			+ tcp_dl1);
+
+	/* update mbuf metadata for the merged packet */
+	pkt_src->nb_segs += pkt->nb_segs;
+	pkt_src->pkt_len += pkt->pkt_len;
+	return 1;
+}
+
+static int
+check_seq_option(struct rte_mbuf *pkt,
+		struct tcp_hdr *tcp_hdr,
+		uint16_t tcp_hl)
+{
+	struct ipv4_hdr *ipv4_hdr1;
+	struct tcp_hdr *tcp_hdr1;
+	uint16_t tcp_hl1, tcp_dl1;
+	uint32_t sent_seq1, sent_seq;
+	uint16_t len;
+	int ret = -1;
+
+	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
+				char *) + pkt->l2_len);
+	tcp_hl1 = pkt->l4_len;
+	tcp_hdr1 = (struct tcp_hdr *)((char *)ipv4_hdr1 + pkt->l3_len);
+	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) -
+		pkt->l3_len - tcp_hl1;
+	sent_seq1 = rte_be_to_cpu_32(tcp_hdr1->sent_seq) + tcp_dl1;
+	sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq);
+
+	/* check if the two packets are neighbors */
+	if (sent_seq == sent_seq1) {
+		ret = 1;
+		len = RTE_MAX(tcp_hl, tcp_hl1) - sizeof(struct tcp_hdr);
+		/* check if TCP option field equals */
+		if ((tcp_hl1 != tcp_hl) || ((len > 0) &&
+					(memcmp(tcp_hdr1 + 1,
+							tcp_hdr + 1,
+							len) != 0)))
+			ret = -1;
+	}
+	return ret;
+}
+
+static uint32_t
+find_an_empty_item(struct gro_tcp_tbl *tbl)
+{
+	uint32_t i;
+
+	for (i = 0; i < tbl->max_item_num; i++)
+		if (tbl->items[i].pkt == NULL)
+			return i;
+	return INVALID_ARRAY_INDEX;
+}
+
+static uint32_t
+find_an_empty_key(struct gro_tcp_tbl *tbl)
+{
+	uint32_t i;
+
+	for (i = 0; i < tbl->max_key_num; i++)
+		if (tbl->keys[i].is_valid == 0)
+			return i;
+	return INVALID_ARRAY_INDEX;
+}
+
+int32_t
+gro_tcp4_reassemble(struct rte_mbuf *pkt,
+		struct gro_tcp_tbl *tbl,
+		uint32_t max_packet_size,
+		uint64_t start_time)
+{
+	struct ether_hdr *eth_hdr;
+	struct ipv4_hdr *ipv4_hdr;
+	struct tcp_hdr *tcp_hdr;
+	uint16_t tcp_dl;
+
+	struct tcp_key key;
+	uint32_t cur_idx, prev_idx, item_idx;
+	uint32_t i, key_idx;
+
+	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
+	ipv4_hdr = (struct ipv4_hdr *)((char *)eth_hdr + pkt->l2_len);
+
+	/* check if the packet should be processed */
+	if (pkt->l3_len < sizeof(struct ipv4_hdr))
+		return -1;
+	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + pkt->l3_len);
+	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - pkt->l3_len
+		- pkt->l4_len;
+	if (tcp_dl == 0)
+		return -1;
+
+	/* find a key and traverse all packets in its item group */
+	key.eth_saddr = eth_hdr->s_addr;
+	key.eth_daddr = eth_hdr->d_addr;
+	key.ip_src_addr[0] = rte_be_to_cpu_32(ipv4_hdr->src_addr);
+	key.ip_dst_addr[0] = rte_be_to_cpu_32(ipv4_hdr->dst_addr);
+	key.src_port = rte_be_to_cpu_16(tcp_hdr->src_port);
+	key.dst_port = rte_be_to_cpu_16(tcp_hdr->dst_port);
+	key.recv_ack = rte_be_to_cpu_32(tcp_hdr->recv_ack);
+	key.tcp_flags = tcp_hdr->tcp_flags;
+
+	for (i = 0; i < tbl->max_key_num; i++) {
+		/* search for a key */
+		if ((tbl->keys[i].is_valid == 0) ||
+				(memcmp(&(tbl->keys[i].key), &key,
+						sizeof(struct tcp_key)) != 0))
+			continue;
+
+		cur_idx = tbl->keys[i].start_index;
+		prev_idx = cur_idx;
+		while (cur_idx != INVALID_ARRAY_INDEX) {
+			if (check_seq_option(tbl->items[cur_idx].pkt,
+						tcp_hdr,
+						pkt->l4_len) > 0) {
+				if (merge_two_tcp4_packets(
+							&(tbl->items[cur_idx]),
+							pkt,
+							max_packet_size) > 0)
+					return 1;
+				/**
+				 * failed to merge the two packets, as the
+				 * merged packet would exceed the max packet
+				 * length. Insert the packet into the item group.
+				 */
+				item_idx = find_an_empty_item(tbl);
+				if (item_idx == INVALID_ARRAY_INDEX)
+					return -1;
+				tbl->items[prev_idx].next_pkt_idx = item_idx;
+				tbl->items[item_idx].pkt = pkt;
+				tbl->items[item_idx].lastseg =
+					get_mbuf_lastseg(pkt);
+				tbl->items[item_idx].next_pkt_idx =
+					INVALID_ARRAY_INDEX;
+				tbl->items[item_idx].start_time = start_time;
+				tbl->item_num++;
+				return 0;
+			}
+			prev_idx = cur_idx;
+			cur_idx = tbl->items[cur_idx].next_pkt_idx;
+		}
+		/**
+		 * found a matching item group, but no packet in it
+		 * can merge. Insert the packet into this item group.
+		 */
+		item_idx = find_an_empty_item(tbl);
+		if (item_idx == INVALID_ARRAY_INDEX)
+			return -1;
+		tbl->items[prev_idx].next_pkt_idx = item_idx;
+		tbl->items[item_idx].pkt = pkt;
+		tbl->items[item_idx].lastseg =
+			get_mbuf_lastseg(pkt);
+		tbl->items[item_idx].next_pkt_idx = INVALID_ARRAY_INDEX;
+		tbl->items[item_idx].start_time = start_time;
+		tbl->item_num++;
+		return 0;
+	}
+
+	/**
+	 * no key matches the given packet,
+	 * so insert a new key.
+	 */
+	item_idx = find_an_empty_item(tbl);
+	key_idx = find_an_empty_key(tbl);
+	/**
+	 * if the key or item array is full, don't insert the
+	 * packet into the table and return immediately.
+	 */
+	if (item_idx == INVALID_ARRAY_INDEX ||
+			key_idx == INVALID_ARRAY_INDEX)
+		return -1;
+	tbl->items[item_idx].pkt = pkt;
+	tbl->items[item_idx].lastseg = get_mbuf_lastseg(pkt);
+	tbl->items[item_idx].next_pkt_idx = INVALID_ARRAY_INDEX;
+	tbl->items[item_idx].start_time = start_time;
+	tbl->item_num++;
+
+	memcpy(&(tbl->keys[key_idx].key),
+			&key, sizeof(struct tcp_key));
+	tbl->keys[key_idx].start_index = item_idx;
+	tbl->keys[key_idx].is_valid = 1;
+	tbl->key_num++;
+
+	return 0;
+}
+
+uint16_t
+gro_tcp_tbl_timeout_flush(struct gro_tcp_tbl *tbl,
+		uint64_t timeout_cycles,
+		struct rte_mbuf **out,
+		uint16_t nb_out)
+{
+	uint16_t k = 0;
+	uint32_t i, j;
+	uint64_t current_time;
+
+	current_time = rte_rdtsc();
+	for (i = 0; i < tbl->max_key_num; i++) {
+		/* all keys have been checked, return immediately */
+		if (tbl->key_num == 0)
+			return k;
+
+		if (tbl->keys[i].is_valid == 0)
+			continue;
+
+		j = tbl->keys[i].start_index;
+		while (j != INVALID_ARRAY_INDEX) {
+			if (current_time - tbl->items[j].start_time >=
+					timeout_cycles) {
+				out[k++] = tbl->items[j].pkt;
+				tbl->items[j].pkt = NULL;
+				tbl->item_num--;
+				j = tbl->items[j].next_pkt_idx;
+
+				/**
+				 * delete the key as all of
+				 * its packets are flushed.
+				 */
+				if (j == INVALID_ARRAY_INDEX) {
+					tbl->keys[i].is_valid = 0;
+					tbl->key_num--;
+				} else
+					/* update start_index of the key */
+					tbl->keys[i].start_index = j;
+
+				if (k == nb_out)
+					return k;
+			} else
+				/**
+				 * the remaining packets of this key haven't
+				 * timed out, so go on to check other keys.
+				 */
+				break;
+		}
+	}
+	return k;
+}
+
+uint32_t gro_tcp_tbl_item_num(void *tbl)
+{
+	struct gro_tcp_tbl *gro_tbl = (struct gro_tcp_tbl *)tbl;
+
+	if (gro_tbl)
+		return gro_tbl->item_num;
+	return 0;
+}
diff --git a/lib/librte_gro/rte_gro_tcp.h b/lib/librte_gro/rte_gro_tcp.h
new file mode 100644
index 0000000..2000318
--- /dev/null
+++ b/lib/librte_gro/rte_gro_tcp.h
@@ -0,0 +1,172 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_GRO_TCP_H_
+#define _RTE_GRO_TCP_H_
+
+#define INVALID_ARRAY_INDEX 0xffffffffUL
+#define GRO_TCP_TBL_MAX_ITEM_NUM (UINT32_MAX - 1)
+
+/* criteria for merging packets */
+struct tcp_key {
+	struct ether_addr eth_saddr;
+	struct ether_addr eth_daddr;
+	uint32_t ip_src_addr[4];	/**< IPv4 uses the first 4B */
+	uint32_t ip_dst_addr[4];
+
+	uint32_t recv_ack;	/**< acknowledgment sequence number. */
+	uint16_t src_port;
+	uint16_t dst_port;
+	uint8_t tcp_flags;	/**< TCP flags. */
+};
+
+struct gro_tcp_key {
+	struct tcp_key key;
+	uint32_t start_index;	/**< the first packet index of the flow */
+	uint8_t is_valid;
+};
+
+struct gro_tcp_item {
+	struct rte_mbuf *pkt;	/**< packet address. */
+	struct rte_mbuf *lastseg;	/**< last segment of the packet */
+	/* the time when the packet is added into the table */
+	uint64_t start_time;
+	uint32_t next_pkt_idx;	/**< next packet index. */
+};
+
+/**
+ * TCP reassembly table. Both TCP/IPv4 and TCP/IPv6 use the same table
+ * structure.
+ */
+struct gro_tcp_tbl {
+	struct gro_tcp_item *items;	/**< item array */
+	struct gro_tcp_key *keys;	/**< key array */
+	uint32_t item_num;	/**< current item number */
+	uint32_t key_num;	/**< current key num */
+	uint32_t max_item_num;	/**< item array size */
+	uint32_t max_key_num;	/**< key array size */
+};
+
+/**
+ * This function creates a TCP reassembly table.
+ *
+ * @param socket_id
+ *  the NUMA socket index that the Ethernet port connects to.
+ * @param max_flow_num
+ *  the maximum number of flows in the TCP GRO table
+ * @param max_item_per_flow
+ *  the maximum packet number per flow.
+ * @return
+ *  on success, return a pointer to the created TCP GRO table.
+ *  Otherwise, return NULL.
+ */
+void *gro_tcp_tbl_create(uint16_t socket_id,
+		uint16_t max_flow_num,
+		uint16_t max_item_per_flow);
+
+/**
+ * This function destroys a TCP reassembly table.
+ * @param tbl
+ *  a pointer to the TCP reassembly table.
+ */
+void gro_tcp_tbl_destroy(void *tbl);
+
+/**
+ * This function searches the TCP reassembly table for a packet to
+ * merge with the input one. Merging two packets means chaining them
+ * together and updating the packet headers. If the input packet
+ * carries no data (e.g. a SYN or SYN-ACK packet), this function
+ * returns immediately. Otherwise, the packet is either merged with
+ * an existing one or inserted into the table. If there is no space
+ * left in the table to insert the packet, this function also
+ * returns immediately.
+ *
+ * This function assumes the input packet has correct IPv4 and TCP
+ * checksums, and it doesn't re-calculate them when two packets are
+ * merged. Besides, if the input packet is IP fragmented, this
+ * function assumes it is complete (i.e. with the TCP header).
+ *
+ * @param pkt
+ *  packet to reassemble.
+ * @param tbl
+ *  a pointer to a TCP reassembly table.
+ * @param max_packet_size
+ *  the max packet length after merging
+ * @param start_time
+ *  the time when the packet is inserted into the table
+ * @return
+ *  if the packet doesn't have data, or there is no space left in
+ *  the table to insert a new item or a new key, return a negative
+ *  value. If the packet is merged successfully, return a positive
+ *  value. If the packet is inserted into the table, return 0.
+ */
+int32_t
+gro_tcp4_reassemble(struct rte_mbuf *pkt,
+		struct gro_tcp_tbl *tbl,
+		uint32_t max_packet_size,
+		uint64_t start_time);
+
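+/**
+ * A minimal usage sketch (illustrative only: 'pkt' is assumed to be
+ * a received TCP/IPv4 mbuf, error handling is omitted, and the flow
+ * and item numbers and the 64KB size limit are arbitrary example
+ * values):
+ *
+ *   struct gro_tcp_tbl *tbl;
+ *   int32_t ret;
+ *
+ *   tbl = gro_tcp_tbl_create(rte_socket_id(), 4, 32);
+ *   ret = gro_tcp4_reassemble(pkt, tbl, 65535, rte_rdtsc());
+ *
+ * ret is positive if pkt was merged, 0 if it was inserted into the
+ * table, and negative if it could be neither merged nor inserted.
+ */
+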
+/**
+ * This function flushes timed-out packets from a TCP reassembly
+ * table to applications. It doesn't update checksums for merged
+ * packets. The max number of flushed packets is the element number
+ * of the array that is used to keep the flushed packets.
+ *
+ * @param tbl
+ *  a pointer to a TCP GRO table.
+ * @param timeout_cycles
+ *  the maximum time that packets can stay in the table.
+ * @param out
+ *  the pointer array that is used to keep the flushed packets.
+ * @param nb_out
+ *  the element number of out. It's also the max number of timed-out
+ *  packets that can be flushed.
+ * @return
+ *  the number of flushed packets.
+ */
+uint16_t
+gro_tcp_tbl_timeout_flush(struct gro_tcp_tbl *tbl,
+		uint64_t timeout_cycles,
+		struct rte_mbuf **out,
+		uint16_t nb_out);
+
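+/**
+ * A minimal flush sketch (illustrative only: 'tbl' is assumed to be
+ * a table filled by gro_tcp4_reassemble, and the 1ms timeout is an
+ * arbitrary example value):
+ *
+ *   struct rte_mbuf *out[64];
+ *   uint16_t nb;
+ *
+ *   nb = gro_tcp_tbl_timeout_flush(tbl, rte_get_tsc_hz() / 1000,
+ *           out, 64);
+ *
+ * The first nb elements of out hold the flushed packets.
+ */
+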
+/**
+ * This function returns the number of packets in a TCP reassembly
+ * table.
+ *
+ * @param tbl
+ *  a pointer to a TCP reassembly table.
+ * @return
+ *  the number of packets in the table
+ */
+uint32_t
+gro_tcp_tbl_item_num(void *tbl);
+#endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 141+ messages in thread

* [PATCH v9 3/3] app/testpmd: enable TCP/IPv4 GRO
  2017-06-30  6:53               ` [PATCH v9 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
  2017-06-30  6:53                 ` [PATCH v9 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
  2017-06-30  6:53                 ` [PATCH v9 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
@ 2017-06-30  6:53                 ` Jiayu Hu
  2017-07-01 11:08                 ` [PATCH v10 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
  3 siblings, 0 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-06-30  6:53 UTC (permalink / raw)
  To: dev
  Cc: stephen, konstantin.ananyev, jianfeng.tan, yliu, jingjing.wu,
	keith.wiles, tiwei.bie, lei.a.yao, Jiayu Hu

This patch enables the TCP/IPv4 GRO library in the csum forwarding
engine. By default, GRO is turned off. Users can use the command
"gro (on|off) (port_id)" to enable or disable GRO for a given port. If
GRO is enabled on a port, all TCP/IPv4 packets received from that port
are processed by GRO. Besides, users can set the max flow number and
the max packet number per-flow with the command "gro set
(max_flow_num) (max_item_num_per_flow) (port_id)", as shown in the
example below.
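
An illustrative session (port 0 is just an example port id; the flow
and item numbers are arbitrary):

	testpmd> set fwd csum
	testpmd> gro set 4 32 0
	testpmd> gro on 0
	testpmd> start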

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Jingjing Wu <jingjing.wu@intel.com>
---
 app/test-pmd/cmdline.c                      | 125 ++++++++++++++++++++++++++++
 app/test-pmd/config.c                       |  37 ++++++++
 app/test-pmd/csumonly.c                     |   5 ++
 app/test-pmd/testpmd.c                      |   3 +
 app/test-pmd/testpmd.h                      |  11 +++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  34 ++++++++
 6 files changed, 215 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index ff8ffd2..cb359e1 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -76,6 +76,7 @@
 #include <rte_devargs.h>
 #include <rte_eth_ctrl.h>
 #include <rte_flow.h>
+#include <rte_gro.h>
 
 #include <cmdline_rdline.h>
 #include <cmdline_parse.h>
@@ -419,6 +420,14 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"tso show (portid)"
 			"    Display the status of TCP Segmentation Offload.\n\n"
 
+			"gro (on|off) (port_id)"
+			"    Enable or disable Generic Receive Offload in io"
+			" forward engine.\n\n"
+
+			"gro set (max_flow_num) (max_item_num_per_flow) (port_id)\n"
+			"    Set max flow number and max packet number per-flow"
+			" for GRO.\n\n"
+
 			"set fwd (%s)\n"
 			"    Set packet forwarding mode.\n\n"
 
@@ -3827,6 +3836,120 @@ cmdline_parse_inst_t cmd_tunnel_tso_show = {
 	},
 };
 
+/* *** SET GRO FOR A PORT *** */
+struct cmd_gro_result {
+	cmdline_fixed_string_t cmd_keyword;
+	cmdline_fixed_string_t mode;
+	uint8_t port_id;
+};
+
+static void
+cmd_enable_gro_parsed(void *parsed_result,
+		__attribute__((unused)) struct cmdline *cl,
+		__attribute__((unused)) void *data)
+{
+	struct cmd_gro_result *res;
+
+	res = parsed_result;
+	setup_gro(res->mode, res->port_id);
+}
+
+cmdline_parse_token_string_t cmd_gro_keyword =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_result,
+			cmd_keyword, "gro");
+cmdline_parse_token_string_t cmd_gro_mode =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_result,
+			mode, "on#off");
+cmdline_parse_token_num_t cmd_gro_pid =
+	TOKEN_NUM_INITIALIZER(struct cmd_gro_result,
+			port_id, UINT8);
+
+cmdline_parse_inst_t cmd_enable_gro = {
+	.f = cmd_enable_gro_parsed,
+	.data = NULL,
+	.help_str = "gro (on|off) (port_id)",
+	.tokens = {
+		(void *)&cmd_gro_keyword,
+		(void *)&cmd_gro_mode,
+		(void *)&cmd_gro_pid,
+		NULL,
+	},
+};
+
+/* *** SET MAX FLOW NUMBER AND ITEM NUM PER FLOW FOR GRO *** */
+struct cmd_gro_set_result {
+	cmdline_fixed_string_t gro;
+	cmdline_fixed_string_t mode;
+	uint16_t flow_num;
+	uint16_t item_num_per_flow;
+	uint8_t port_id;
+};
+
+static void
+cmd_gro_set_parsed(void *parsed_result,
+		       __attribute__((unused)) struct cmdline *cl,
+		       __attribute__((unused)) void *data)
+{
+	struct cmd_gro_set_result *res = parsed_result;
+
+	if (port_id_is_invalid(res->port_id, ENABLED_WARN))
+		return;
+	if (test_done == 0) {
+		printf("Before set GRO flow_num and item_num_per_flow,"
+				" please stop forwarding first\n");
+		return;
+	}
+
+	if (!strcmp(res->mode, "set")) {
+		if (res->flow_num == 0)
+			printf("Invalid flow number. Revert to default value:"
+					" %u.\n", GRO_DEFAULT_FLOW_NUM);
+		else
+			gro_ports[res->port_id].param.max_flow_num =
+				res->flow_num;
+
+		if (res->item_num_per_flow == 0)
+			printf("Invalid item number per-flow. Revert"
+					" to default value:%u.\n",
+					GRO_DEFAULT_ITEM_NUM_PER_FLOW);
+		else
+			gro_ports[res->port_id].param.max_item_per_flow =
+				res->item_num_per_flow;
+	}
+}
+
+cmdline_parse_token_string_t cmd_gro_set_gro =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_set_result,
+				gro, "gro");
+cmdline_parse_token_string_t cmd_gro_set_mode =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_set_result,
+				mode, "set");
+cmdline_parse_token_num_t cmd_gro_set_flow_num =
+	TOKEN_NUM_INITIALIZER(struct cmd_gro_set_result,
+				flow_num, UINT16);
+cmdline_parse_token_num_t cmd_gro_set_item_num_per_flow =
+	TOKEN_NUM_INITIALIZER(struct cmd_gro_set_result,
+				item_num_per_flow, UINT16);
+cmdline_parse_token_num_t cmd_gro_set_portid =
+	TOKEN_NUM_INITIALIZER(struct cmd_gro_set_result,
+				port_id, UINT8);
+
+cmdline_parse_inst_t cmd_gro_set = {
+	.f = cmd_gro_set_parsed,
+	.data = NULL,
+	.help_str = "gro set <max_flow_num> <max_item_num_per_flow> "
+		"<port_id>: set max flow number and max packet number per-flow "
+		"for GRO",
+	.tokens = {
+		(void *)&cmd_gro_set_gro,
+		(void *)&cmd_gro_set_mode,
+		(void *)&cmd_gro_set_flow_num,
+		(void *)&cmd_gro_set_item_num_per_flow,
+		(void *)&cmd_gro_set_portid,
+		NULL,
+	},
+};
+
 /* *** ENABLE/DISABLE FLUSH ON RX STREAMS *** */
 struct cmd_set_flush_rx {
 	cmdline_fixed_string_t set;
@@ -13732,6 +13855,8 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_tso_show,
 	(cmdline_parse_inst_t *)&cmd_tunnel_tso_set,
 	(cmdline_parse_inst_t *)&cmd_tunnel_tso_show,
+	(cmdline_parse_inst_t *)&cmd_enable_gro,
+	(cmdline_parse_inst_t *)&cmd_gro_set,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_rx,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_tx,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index b0b340e..a5285b0 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -71,6 +71,7 @@
 #ifdef RTE_LIBRTE_BNXT_PMD
 #include <rte_pmd_bnxt.h>
 #endif
+#include <rte_gro.h>
 
 #include "testpmd.h"
 
@@ -2414,6 +2415,42 @@ set_tx_pkt_segments(unsigned *seg_lengths, unsigned nb_segs)
 	tx_pkt_nb_segs = (uint8_t) nb_segs;
 }
 
+void
+setup_gro(const char *mode, uint8_t port_id)
+{
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		printf("invalid port id %u\n", port_id);
+		return;
+	}
+	if (test_done == 0) {
+		printf("Before enable/disable GRO,"
+				" please stop forwarding first\n");
+		return;
+	}
+	if (strcmp(mode, "on") == 0) {
+		if (gro_ports[port_id].enable) {
+			printf("port %u has enabled GRO\n", port_id);
+			return;
+		}
+		gro_ports[port_id].enable = 1;
+		gro_ports[port_id].param.desired_gro_types = RTE_GRO_TCP_IPV4;
+		gro_ports[port_id].param.max_packet_size = UINT16_MAX;
+
+		if (gro_ports[port_id].param.max_flow_num == 0)
+			gro_ports[port_id].param.max_flow_num =
+				GRO_DEFAULT_FLOW_NUM;
+		if (gro_ports[port_id].param.max_item_per_flow == 0)
+			gro_ports[port_id].param.max_item_per_flow =
+				GRO_DEFAULT_ITEM_NUM_PER_FLOW;
+	} else {
+		if (gro_ports[port_id].enable == 0) {
+			printf("port %u has disabled GRO\n", port_id);
+			return;
+		}
+		gro_ports[port_id].enable = 0;
+	}
+}
+
 char*
 list_pkt_forwarding_modes(void)
 {
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 66fc9a0..178ad75 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -71,6 +71,7 @@
 #include <rte_prefetch.h>
 #include <rte_string_fns.h>
 #include <rte_flow.h>
+#include <rte_gro.h>
 #include "testpmd.h"
 
 #define IP_DEFTTL  64   /* from RFC 1340. */
@@ -658,6 +659,10 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 				 nb_pkt_per_burst);
 	if (unlikely(nb_rx == 0))
 		return;
+	if (unlikely(gro_ports[fs->rx_port].enable))
+		nb_rx = rte_gro_reassemble_burst(pkts_burst,
+				nb_rx,
+				&(gro_ports[fs->rx_port].param));
 
 #ifdef RTE_TEST_PMD_RECORD_BURST_STATS
 	fs->rx_burst_stats.pkt_burst_spread[nb_rx]++;
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index b29328a..ed27c7a 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -90,6 +90,7 @@
 #ifdef RTE_LIBRTE_LATENCY_STATS
 #include <rte_latencystats.h>
 #endif
+#include <rte_gro.h>
 
 #include "testpmd.h"
 
@@ -378,6 +379,8 @@ lcoreid_t bitrate_lcore_id;
 uint8_t bitrate_enabled;
 #endif
 
+struct gro_status gro_ports[RTE_MAX_ETHPORTS];
+
 /* Forward function declarations */
 static void map_port_queue_stats_mapping_registers(uint8_t pi, struct rte_port *port);
 static void check_all_ports_link_status(uint32_t port_mask);
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 364502d..377d933 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -34,6 +34,8 @@
 #ifndef _TESTPMD_H_
 #define _TESTPMD_H_
 
+#include <rte_gro.h>
+
 #define RTE_PORT_ALL            (~(portid_t)0x0)
 
 #define RTE_TEST_RX_DESC_MAX    2048
@@ -428,6 +430,14 @@ extern struct ether_addr peer_eth_addrs[RTE_MAX_ETHPORTS];
 extern uint32_t burst_tx_delay_time; /**< Burst tx delay time(us) for mac-retry. */
 extern uint32_t burst_tx_retry_num;  /**< Burst tx retry number for mac-retry. */
 
+#define GRO_DEFAULT_FLOW_NUM 4
+#define GRO_DEFAULT_ITEM_NUM_PER_FLOW DEF_PKT_BURST
+struct gro_status {
+	struct rte_gro_param param;
+	uint8_t enable;
+};
+extern struct gro_status gro_ports[RTE_MAX_ETHPORTS];
+
 static inline unsigned int
 lcore_num(void)
 {
@@ -626,6 +636,7 @@ void get_2tuple_filter(uint8_t port_id, uint16_t index);
 void get_5tuple_filter(uint8_t port_id, uint16_t index);
 int rx_queue_id_is_invalid(queueid_t rxq_id);
 int tx_queue_id_is_invalid(queueid_t txq_id);
+void setup_gro(const char *mode, uint8_t port_id);
 
 /* Functions to manage the set of filtered Multicast MAC addresses */
 void mcast_addr_add(uint8_t port_id, struct ether_addr *mc_addr);
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 2b9a1ea..528c833 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -884,6 +884,40 @@ Display the status of TCP Segmentation Offload::
 
    testpmd> tso show (port_id)
 
+gro
+~~~~~~~~
+
+Enable or disable GRO in ``csum`` forwarding engine::
+
+   testpmd> gro (on|off) (port_id)
+
+If enabled, the csum forwarding engine will perform GRO on the TCP/IPv4
+packets received from the given port.
+
+If disabled, packets received from the given port are not processed by
+GRO. By default, GRO is disabled for all ports.
+
+.. note::
+
+   When GRO is enabled for a port, the TCP/IPv4 packets received from
+   that port are processed by GRO, and the merged packets are
+   multi-segment ones. However, the csum forwarding engine can't
+   calculate the TCP checksum for multi-segment packets in SW. So
+   please select TCP HW checksum calculation for the port which the
+   GROed packets are transmitted to.
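+
+   For example, if the GROed packets are transmitted from port 1 (an
+   illustrative port id), TCP HW checksum can be selected by::
+
+      testpmd> csum set tcp hw 1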
+
+gro set
+~~~~~~~~
+
+Set max flow number and max packet number per-flow for GRO::
+
+   testpmd> gro set (max_flow_num) (max_item_num_per_flow) (port_id)
+
+The product of ``max_flow_num`` and ``max_item_num_per_flow`` is the max
+number of packets a GRO table can store.
+
+If the number of stored packets reaches this max value, GRO will stop
+processing incoming packets.
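+
+For example, to let port 0 (an illustrative port id) store at most 4
+flows with up to 32 packets each::
+
+   testpmd> gro set 4 32 0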
+
 mac_addr add
 ~~~~~~~~~~~~
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 141+ messages in thread

* Re: [PATCH v7 2/3] lib/gro: add TCP/IPv4 GRO support
  2017-06-29  2:26                 ` Jiayu Hu
@ 2017-06-30 12:07                   ` Ananyev, Konstantin
  2017-06-30 15:40                     ` Hu, Jiayu
  0 siblings, 1 reply; 141+ messages in thread
From: Ananyev, Konstantin @ 2017-06-30 12:07 UTC (permalink / raw)
  To: Hu, Jiayu
  Cc: dev, Tan, Jianfeng, stephen, yliu, Wu, Jingjing, Yao, Lei A,
	Wiles, Keith, Bie, Tiwei

Hi Jiayu,

> > > +
> > > +int32_t
> > > +gro_tcp4_reassemble(struct rte_mbuf *pkt,
> > > +		struct gro_tcp_tbl *tbl,
> > > +		uint32_t max_packet_size)
> > > +{
> > > +	struct ether_hdr *eth_hdr;
> > > +	struct ipv4_hdr *ipv4_hdr;
> > > +	struct tcp_hdr *tcp_hdr;
> > > +	uint16_t ipv4_ihl, tcp_hl, tcp_dl;
> > > +
> > > +	struct tcp_key key;
> > > +	uint32_t cur_idx, prev_idx, item_idx;
> > > +	uint32_t i, key_idx;
> > > +
> > > +	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
> > > +	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
> > > +	ipv4_ihl = IPv4_HDR_LEN(ipv4_hdr);
> > > +
> > > +	/* check if the packet should be processed */
> > > +	if (ipv4_ihl < sizeof(struct ipv4_hdr))
> > > +		goto fail;
> > > +	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl);
> > > +	tcp_hl = TCP_HDR_LEN(tcp_hdr);
> > > +	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - ipv4_ihl
> > > +		- tcp_hl;
> > > +	if (tcp_dl == 0)
> > > +		goto fail;
> > > +
> > > +	/* find a key and traverse all packets in its item group */
> > > +	key.eth_saddr = eth_hdr->s_addr;
> > > +	key.eth_daddr = eth_hdr->d_addr;
> > > +	key.ip_src_addr[0] = rte_be_to_cpu_32(ipv4_hdr->src_addr);
> > > +	key.ip_dst_addr[0] = rte_be_to_cpu_32(ipv4_hdr->dst_addr);
> >
> > Your key.ip_src_addr[1-3] still contains some junk.
> > How is memcmp below supposed to work properly?
> 
> When allocating an item, we already guarantee the content of its
> memory space is 0. So the memcpy won't be wrong.

key is allocated on the stack.
Obviously, fields that are not initialized manually will contain undefined values,
i.e.: ip_src_addr[1-3].
Then below you are doing:
memcmp(&(tbl->keys[i].key), &key, sizeof(struct tcp_key));
...
memcpy(&(tbl->keys[key_idx].key), &key, sizeof(struct tcp_key));

So I still think you are comparing/copying some junk here.

> 
> > > BTW why do you need 4 elems here, why not just uint32_t ip_src_addr;?
> > Same for ip_dst_addr.
> 
> I think tcp6 and tcp4 can share the same table structure. So I use
> 128bit IP address here. You mean we need to use different structures
> for tcp4 and tcp6?

That would be my preference.

> 
> >
> > > +	key.src_port = rte_be_to_cpu_16(tcp_hdr->src_port);
> > > +	key.dst_port = rte_be_to_cpu_16(tcp_hdr->dst_port);
> > > +	key.recv_ack = rte_be_to_cpu_32(tcp_hdr->recv_ack);
> > > +	key.tcp_flags = tcp_hdr->tcp_flags;
> > > +
> > > +	for (i = 0; i < tbl->max_key_num; i++) {
> > > +		if (tbl->keys[i].is_valid &&
> > > +				(memcmp(&(tbl->keys[i].key), &key,
> > > +						sizeof(struct tcp_key))
> > > +				 == 0)) {
> > > +			cur_idx = tbl->keys[i].start_index;
> > > +			prev_idx = cur_idx;
> > > +			while (cur_idx != INVALID_ARRAY_INDEX) {
> > > +				if (check_seq_option(tbl->items[cur_idx].pkt,
> > > +							tcp_hdr,
> > > +							tcp_hl) > 0) {
> >
> > As I remember linux gro also checks ipv4 packet_id - it should be consecutive.
> 
> IP fragmented packets have the same IP ID, but others are consecutive.

Yes, you assume that they are consecutive.
But the only way to know for sure that they are - check it.
Another thing - I think we need to perform GRO only for TCP packets with only ACK bit set
(i.e. - no GRO for FIN/SYN/PUSH/URG/, etc.).

> As we suppose GRO can merge IP fragmented packets, I think we shouldn't check if
> the IP ID is consecutive here. What do you think?
> 
> >
> > > +					if (merge_two_tcp4_packets(
> > > +								tbl->items[cur_idx].pkt,
> > > +								pkt,
> > > +								max_packet_size) > 0) {
> > > +						/* successfully merge two packets */
> > > +						tbl->items[cur_idx].is_groed = 1;
> > > +						return 1;
> > > +					}
> >
> > If you allow more than one packet per flow to be stored in the table, then you should be
> > prepared that a new segment can fill a gap between 2 packets.
> > Probably the easiest thing - don't allow more than one 'item' per flow.
> 
> We allow the table to store same flow but out-of-order arriving packets. For
> these packets, they will occupy different 'item' and we won't re-merge them.
> For example, there are three same flow tcp packets: p1, p2 and p3. And p1 arrives
> first, then p3, and last is p2. So TCP GRO will allocate one 'item' for p1 and one
> 'item' for p3, and when p2 arrives, p2 will be merged with p1. Therefore, in the
> table, we will have two 'item': item1 to store merged p1 and p2, item2 to store p3.
> 
> As you can see, TCP GRO can only merge sequentially arriving packets. If we want to
> merge all out-of-order arriving packets, we need to re-process the packets which
> are already processed and have one 'item'. IMO, this procedure will be very complicated.
> So we don't do that.
> 
> Sorry, I don't understand how to allow only one 'item' per flow, since packets are
> arriving out-of-order. If we don't re-process the packets which already have one
> 'item', how can we guarantee it?

As I understand you'll have an input array:
<seq=2, seq=1> - you wouldn't be able to merge it.
So I think your merge needs to be prepared to merge both smaller and bigger sequences.
About one item, my thought is: instead of allowing N items per key (flow) - for simplicity
just allow one item per flow.
In that case we wouldn't allow holes, but still would be able to merge reordered packets.
Alternatively you can store items ordered by seq, and after a merge, try to merge neighbor
items too.

> 
> >
> > > +					/**
> > > +					 * fail to merge two packets since
> > > +					 * it's beyond the max packet length.
> > > +					 * Insert it into the item group.
> > > +					 */
> > > +					goto insert_to_item_group;
> > > +				} else {
> > > +					prev_idx = cur_idx;
> > > +					cur_idx = tbl->items[cur_idx].next_pkt_idx;
> > > +				}
> > > +			}
> > > +			/**
> > > +			 * find a corresponding item group but fails to find
> > > +			 * one packet to merge. Insert it into this item group.
> > > +			 */
> > > +insert_to_item_group:
> > > +			item_idx = find_an_empty_item(tbl);
> > > +			/* the item number is greater than the max value */
> > > +			if (item_idx == INVALID_ARRAY_INDEX)
> > > +				return -1;
> > > +			tbl->items[prev_idx].next_pkt_idx = item_idx;
> > > +			tbl->items[item_idx].pkt = pkt;
> > > +			tbl->items[item_idx].is_groed = 0;
> > > +			tbl->items[item_idx].next_pkt_idx = INVALID_ARRAY_INDEX;
> > > +			tbl->items[item_idx].is_valid = 1;
> > > +			tbl->items[item_idx].start_time = rte_rdtsc();
> > > +			tbl->item_num++;
> > > +			return 0;
> > > +		}
> > > +	}
> > > +
> > > +	/**
> > > +	 * merge fail as the given packet has a new key.
> > > +	 * So insert a new key.
> > > +	 */
> > > +	item_idx = find_an_empty_item(tbl);
> > > +	key_idx = find_an_empty_key(tbl);
> > > +	/**
> > > +	 * if current key or item number is greater than the max
> > > +	 * value, don't insert the packet into the table and return
> > > +	 * immediately.
> > > +	 */
> > > +	if (item_idx == INVALID_ARRAY_INDEX ||
> > > +			key_idx == INVALID_ARRAY_INDEX)
> > > +		return -1;
> > > +	tbl->items[item_idx].pkt = pkt;
> > > +	tbl->items[item_idx].next_pkt_idx = INVALID_ARRAY_INDEX;
> > > +	tbl->items[item_idx].is_groed = 0;
> > > +	tbl->items[item_idx].is_valid = 1;
> > > +	tbl->items[item_idx].start_time = rte_rdtsc();
> >

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v7 2/3] lib/gro: add TCP/IPv4 GRO support
  2017-06-30 12:07                   ` Ananyev, Konstantin
@ 2017-06-30 15:40                     ` Hu, Jiayu
  0 siblings, 0 replies; 141+ messages in thread
From: Hu, Jiayu @ 2017-06-30 15:40 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: dev, Tan, Jianfeng, stephen, yliu, Wu, Jingjing, Yao, Lei A,
	Wiles, Keith, Bie, Tiwei

Hi Konstantin,

> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Friday, June 30, 2017 8:07 PM
> To: Hu, Jiayu <jiayu.hu@intel.com>
> Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>;
> stephen@networkplumber.org; yliu@fridaylinux.org; Wu, Jingjing
> <jingjing.wu@intel.com>; Yao, Lei A <lei.a.yao@intel.com>; Wiles, Keith
> <keith.wiles@intel.com>; Bie, Tiwei <tiwei.bie@intel.com>
> Subject: RE: [PATCH v7 2/3] lib/gro: add TCP/IPv4 GRO support
> 
> Hi Jiayu,
> 
> > > > +
> > > > +int32_t
> > > > +gro_tcp4_reassemble(struct rte_mbuf *pkt,
> > > > +		struct gro_tcp_tbl *tbl,
> > > > +		uint32_t max_packet_size)
> > > > +{
> > > > +	struct ether_hdr *eth_hdr;
> > > > +	struct ipv4_hdr *ipv4_hdr;
> > > > +	struct tcp_hdr *tcp_hdr;
> > > > +	uint16_t ipv4_ihl, tcp_hl, tcp_dl;
> > > > +
> > > > +	struct tcp_key key;
> > > > +	uint32_t cur_idx, prev_idx, item_idx;
> > > > +	uint32_t i, key_idx;
> > > > +
> > > > +	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
> > > > +	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
> > > > +	ipv4_ihl = IPv4_HDR_LEN(ipv4_hdr);
> > > > +
> > > > +	/* check if the packet should be processed */
> > > > +	if (ipv4_ihl < sizeof(struct ipv4_hdr))
> > > > +		goto fail;
> > > > +	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl);
> > > > +	tcp_hl = TCP_HDR_LEN(tcp_hdr);
> > > > +	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - ipv4_ihl
> > > > +		- tcp_hl;
> > > > +	if (tcp_dl == 0)
> > > > +		goto fail;
> > > > +
> > > > +	/* find a key and traverse all packets in its item group */
> > > > +	key.eth_saddr = eth_hdr->s_addr;
> > > > +	key.eth_daddr = eth_hdr->d_addr;
> > > > +	key.ip_src_addr[0] = rte_be_to_cpu_32(ipv4_hdr->src_addr);
> > > > +	key.ip_dst_addr[0] = rte_be_to_cpu_32(ipv4_hdr->dst_addr);
> > >
> > > Your key.ip_src_addr[1-3] still contains some junk.
> > > How is memcmp below supposed to work properly?
> >
> > When allocating an item, we already guarantee the content of its
> > memory space is 0. So the memcpy won't be wrong.
> 
> key is allocated on the stack.
> Obviously, fields that are not initialized manually will contain undefined values,
> i.e.: ip_src_addr[1-3].
> Then below you are doing:
> memcmp(&(tbl->keys[i].key), &key, sizeof(struct tcp_key));
> ...
> memcpy(&(tbl->keys[key_idx].key), &key, sizeof(struct tcp_key));
> 
> So I still think you are comparing/copying some junk here.

Oh, yes. Key is allocated on the stack. Thanks, I will modify it.

> 
> >
> > > BTW why do you need 4 elems here, why not just uint32_t ip_src_addr;?
> > > Same for ip_dst_addr.
> >
> > I think tcp6 and tcp4 can share the same table structure. So I use
> > 128bit IP address here. You mean we need to use different structures
> > for tcp4 and tcp6?
> 
> That would be my preference.

Got it. I will modify it. Thanks.

> 
> >
> > >
> > > > +	key.src_port = rte_be_to_cpu_16(tcp_hdr->src_port);
> > > > +	key.dst_port = rte_be_to_cpu_16(tcp_hdr->dst_port);
> > > > +	key.recv_ack = rte_be_to_cpu_32(tcp_hdr->recv_ack);
> > > > +	key.tcp_flags = tcp_hdr->tcp_flags;
> > > > +
> > > > +	for (i = 0; i < tbl->max_key_num; i++) {
> > > > +		if (tbl->keys[i].is_valid &&
> > > > +				(memcmp(&(tbl->keys[i].key), &key,
> > > > +						sizeof(struct tcp_key))
> > > > +				 == 0)) {
> > > > +			cur_idx = tbl->keys[i].start_index;
> > > > +			prev_idx = cur_idx;
> > > > +			while (cur_idx != INVALID_ARRAY_INDEX) {
> > > > +				if (check_seq_option(tbl->items[cur_idx].pkt,
> > > > +							tcp_hdr,
> > > > +							tcp_hl) > 0) {
> > >
> > > As I remember linux gro also checks ipv4 packet_id - it should be
> > > consecutive.
> >
> > IP fragmented packets have the same IP ID, but others are consecutive.
> 
> Yes, you assume that they are consecutive.
> But the only way to know for sure that they are - check it.

Yes, I will modify it. Thanks.

> Another thing - I think we need to perform GRO only for TCP packets with
> only ACK bit set
> (i.e. - no GRO for FIN/SYN/PUSH/URG/, etc.).

Currently, we don't process packets whose payload length is 0. So if the packets
are SYN or FIN, we won't process them, since their payload length is 0. For URG,
PSH and RST, TCP/IPv4 GRO may still process these packets.

You are right. We shouldn't process packets whose URG, PSH or RST bit is set.
Thanks, I will modify it. 

> 
> > As we suppose GRO can merge IP fragmented packets, I think we
> > shouldn't check if the IP ID is consecutive here. What do you think?
> >
> > >
> > > > +					if (merge_two_tcp4_packets(
> > > > +								tbl->items[cur_idx].pkt,
> > > > +								pkt,
> > > > +								max_packet_size) > 0) {
> > > > +						/* successfully merge two packets */
> > > > +						tbl->items[cur_idx].is_groed = 1;
> > > > +						return 1;
> > > > +					}
> > >
> > > If you allow more than one packet per flow to be stored in the table,
> > > then you should be prepared that a new segment can fill a gap between
> > > 2 packets.
> > > Probably the easiest thing - don't allow more than one 'item' per flow.
> >
> > We allow the table to store same flow but out-of-order arriving packets.
> > For these packets, they will occupy different 'item' and we won't
> > re-merge them.
> > For example, there are three same flow tcp packets: p1, p2 and p3. And p1
> > arrives first, then p3, and last is p2. So TCP GRO will allocate one
> > 'item' for p1 and one 'item' for p3, and when p2 arrives, p2 will be
> > merged with p1. Therefore, in the table, we will have two 'item': item1
> > to store merged p1 and p2, item2 to store p3.
> >
> > As you can see, TCP GRO can only merge sequentially arriving packets. If
> > we want to merge all out-of-order arriving packets, we need to re-process
> > the packets which are already processed and have one 'item'. IMO, this
> > procedure will be very complicated. So we don't do that.
> >
> > Sorry, I don't understand how to allow only one 'item' per flow, since
> > packets arrive out-of-order. If we don't re-process the packets which
> > already have one 'item', how can we guarantee it?
> 
> As I understand you'll have an input array:
> <seq=2, seq=1> - you wouldn't be able to merge it.
> So I think your merge needs to be prepared to merge both smaller and
> bigger sequences.

Oh yes, it's much better. I will modify it.

> About one item, my thought is: instead of allowing N items per key (flow) -
> for simplicity just allow one item per flow.
> In that case we wouldn't allow holes, but still would be able to merge
> reordered packets.
> Alternatively you can store items ordered by seq, and after a merge, try to
> merge neighbor items too.

Yes, when we insert a new item, we can chain it with the packets of its item
group ordered by seq. After processing all packets, we need to traverse each
item group and try to merge neighbors. But when will 'merge neighbors' happen?
When flushing packets from the table? (e.g. gro_tcp_tbl_timeout_flush)

BRs,
Jiayu
> 
> >
> > >
> > > > +					/**
> > > > +					 * fail to merge two packets since
> > > > +					 * it's beyond the max packet length.
> > > > +					 * Insert it into the item group.
> > > > +					 */
> > > > +					goto insert_to_item_group;
> > > > +				} else {
> > > > +					prev_idx = cur_idx;
> > > > +					cur_idx = tbl->items[cur_idx].next_pkt_idx;
> > > > +				}
> > > > +			}
> > > > +			/**
> > > > +			 * find a corresponding item group but fails to find
> > > > +			 * one packet to merge. Insert it into this item group.
> > > > +			 */
> > > > +insert_to_item_group:
> > > > +			item_idx = find_an_empty_item(tbl);
> > > > +			/* the item number is greater than the max value */
> > > > +			if (item_idx == INVALID_ARRAY_INDEX)
> > > > +				return -1;
> > > > +			tbl->items[prev_idx].next_pkt_idx = item_idx;
> > > > +			tbl->items[item_idx].pkt = pkt;
> > > > +			tbl->items[item_idx].is_groed = 0;
> > > > +			tbl->items[item_idx].next_pkt_idx = INVALID_ARRAY_INDEX;
> > > > +			tbl->items[item_idx].is_valid = 1;
> > > > +			tbl->items[item_idx].start_time = rte_rdtsc();
> > > > +			tbl->item_num++;
> > > > +			return 0;
> > > > +		}
> > > > +	}
> > > > +
> > > > +	/**
> > > > +	 * merge fail as the given packet has a new key.
> > > > +	 * So insert a new key.
> > > > +	 */
> > > > +	item_idx = find_an_empty_item(tbl);
> > > > +	key_idx = find_an_empty_key(tbl);
> > > > +	/**
> > > > +	 * if current key or item number is greater than the max
> > > > +	 * value, don't insert the packet into the table and return
> > > > +	 * immediately.
> > > > +	 */
> > > > +	if (item_idx == INVALID_ARRAY_INDEX ||
> > > > +			key_idx == INVALID_ARRAY_INDEX)
> > > > +		return -1;
> > > > +	tbl->items[item_idx].pkt = pkt;
> > > > +	tbl->items[item_idx].next_pkt_idx = INVALID_ARRAY_INDEX;
> > > > +	tbl->items[item_idx].is_groed = 0;
> > > > +	tbl->items[item_idx].is_valid = 1;
> > > > +	tbl->items[item_idx].start_time = rte_rdtsc();
> > >

^ permalink raw reply	[flat|nested] 141+ messages in thread

* [PATCH v10 0/3] Support TCP/IPv4 GRO in DPDK
  2017-06-30  6:53               ` [PATCH v9 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
                                   ` (2 preceding siblings ...)
  2017-06-30  6:53                 ` [PATCH v9 3/3] app/testpmd: enable TCP/IPv4 GRO Jiayu Hu
@ 2017-07-01 11:08                 ` Jiayu Hu
  2017-07-01 11:08                   ` [PATCH v10 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
                                     ` (3 more replies)
  3 siblings, 4 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-07-01 11:08 UTC (permalink / raw)
  To: dev
  Cc: konstantin.ananyev, stephen, jianfeng.tan, yliu, jingjing.wu,
	lei.a.yao, tiwei.bie, Jiayu Hu

Generic Receive Offload (GRO) is a widely used SW-based offloading
technique to reduce per-packet processing overhead. It gains performance
by reassembling small packets into large ones. Therefore, we propose to
support GRO in DPDK.

To give applications more flexibility, DPDK GRO is implemented as
a user library. Applications explicitly use the GRO library to merge
small packets into large ones. DPDK GRO provides two reassembly modes:
lightweight mode and heavyweight mode. If applications want to merge
packets in a simple way and the number of packets is small, they can
select the lightweight mode API, as the sketch below shows. If
applications need more fine-grained control, they can select the
heavyweight mode API.
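
For reference, a lightweight mode call site looks roughly as below (a
sketch only: 'param' is assumed to be a filled rte_gro_param, 'pkts'
is an mbuf array, and MAX_PKT_BURST is an illustrative constant):

	nb_rx = rte_eth_rx_burst(port, 0, pkts, MAX_PKT_BURST);
	nb_rx = rte_gro_reassemble_burst(pkts, nb_rx, &param);
	rte_eth_tx_burst(port, 0, pkts, nb_rx);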

This patchset is to support TCP/IPv4 GRO in DPDK. The first patch is to
provide a GRO API framework. The second patch is to support TCP/IPv4 GRO.
The last patch is to enable TCP/IPv4 GRO in testpmd.

We perform many iperf tests to see the performance gains from DPDK GRO.
The test environment is:
a. two 25Gbps physical ports (p0 and p1) are linked together. Assign p0
	to one networking namespace and assign p1 to DPDK;
b. enable TSO for p0. Run iperf client on p0;
c. launch testpmd with p1 and a vhost-user port, and run it in csum
	forwarding mode. Select TCP HW checksum calculation for the
	vhost-user port in csum forwarding engine. And for better
	performance, we select IPv4 and TCP HW checksum calculation for p1
	too;
d. launch a VM with one CPU core and a virtio-net port. The VM OS is
	ubuntu 16.04 whose virtio-net driver supports GRO. Enables RX csum
	offloading and mrg_rxbuf for the VM. Iperf server runs in the VM;
e. to run iperf tests, we need to keep the csum forwarding engine from
	compulsorily changing packet mac addresses. So in our tests, we
	comment those codes out (line 701 ~ line 704 in csumonly.c).

In each test, we run iperf with the following three configurations:
	- single flow and single TCP client thread 
	- multiple flows and single TCP client thread
	- single flow and parallel TCP client threads

We run above iperf tests on three scenarios:
	s1: disabling kernel GRO and enabling DPDK GRO
	s2: disabling kernel GRO and disabling DPDK GRO
	s3: enabling kernel GRO and disabling DPDK GRO
Comparing the throughput of s1 with s2, we can see the performance gains
from DPDK GRO. Comparing the throughput of s1 and s3, we can compare DPDK
GRO performance with kernel GRO performance.

Test results:
	- DPDK GRO throughput is almost 2 times the throughput with
		neither DPDK GRO nor kernel GRO;
	- DPDK GRO throughput is almost 1.2 times the throughput of
		kernel GRO.

Change log
==========
v10:
- add support to merge '<seq=2, seq=1>' TCP/IPv4 packets
- check if IP ID is consecutive and update IP ID for merged packets
- check SYN, FIN, PSH, RST, URG flags
- use different reassembly table structures and APIs for TCP/IPv4 GRO
	and TCP/IPv6 GRO
- change file name from 'rte_gro_tcp.x' to 'rte_gro_tcp4.x'
v9:
- avoid defining variable size structure array and memset variable size
	in rte_gro_reassemble_burst
- change internal structure name from 'te_gro_tbl' to 'gro_tbl'
- delete useless variables in rte_gro_tcp.c
v8:
- merge rte_gro_flush and rte_gro_timeout_flush together and optimize
	flushing operation
- enable rte_gro_reassemble to process N inputted packets
- add rte_gro_tbl_item_num to get packet num in the GRO table
- add 'lastseg' to struct gro_tcp_item to get last segment faster
- add operations to handle rte_malloc failure
- use mbuf->l2_len/l3_len/l4_len instead of parsing header
- remove 'is_groed' and 'is_valid' in struct gro_tcp_item
- fix bugs in gro_tcp4_reassemble
- pass start-time as a parameter to avoid frequently calling rte_rdtsc 
- modify rte_gro_tbl_create prototype
- add 'RTE_' to external macros
- remove 'goto'
- remove inappropriate 'const'
- hide internal variables
v7:
- add a macro 'GRO_MAX_BURST_ITEM_NUM' to avoid stack overflow in
	rte_gro_reassemble_burst
- change macro name (_NB to _NUM)
- add '#ifdef __cplusplus ...' in rte_gro.h
v6:
- avoid checksum validation and calculation
- enable to process IP fragmented packets
- add a command in testpmd
- update documents
- modify rte_gro_timeout_flush and rte_gro_reassemble_burst
- rename veriable name
v5:
- fix some bugs
- fix coding style issues
v4:
- implement DPDK GRO as an application-used library
- introduce lightweight and heavyweight working modes to enable
	fine-grained controls to applications
- replace cuckoo hash tables with simpler table structure
v3:
- fix compilation issues.
v2:
- provide generic reassembly function;
- implement GRO as a device ability:
add APIs for devices to support GRO;
add APIs for applications to enable/disable GRO;
- update testpmd example. 

Jiayu Hu (3):
  lib: add Generic Receive Offload API framework
  lib/gro: add TCP/IPv4 GRO support
  app/testpmd: enable TCP/IPv4 GRO

 app/test-pmd/cmdline.c                      | 125 ++++++++
 app/test-pmd/config.c                       |  37 +++
 app/test-pmd/csumonly.c                     |   5 +
 app/test-pmd/testpmd.c                      |   3 +
 app/test-pmd/testpmd.h                      |  11 +
 config/common_base                          |   5 +
 doc/guides/rel_notes/release_17_08.rst      |   7 +
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  34 +++
 lib/Makefile                                |   2 +
 lib/librte_gro/Makefile                     |  51 ++++
 lib/librte_gro/rte_gro.c                    | 273 +++++++++++++++++
 lib/librte_gro/rte_gro.h                    | 180 ++++++++++++
 lib/librte_gro/rte_gro_tcp4.c               | 439 ++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro_tcp4.h               | 172 +++++++++++
 lib/librte_gro/rte_gro_version.map          |  12 +
 mk/rte.app.mk                               |   1 +
 16 files changed, 1357 insertions(+)
 create mode 100644 lib/librte_gro/Makefile
 create mode 100644 lib/librte_gro/rte_gro.c
 create mode 100644 lib/librte_gro/rte_gro.h
 create mode 100644 lib/librte_gro/rte_gro_tcp4.c
 create mode 100644 lib/librte_gro/rte_gro_tcp4.h
 create mode 100644 lib/librte_gro/rte_gro_version.map

-- 
2.7.4

^ permalink raw reply	[flat|nested] 141+ messages in thread

* [PATCH v10 1/3] lib: add Generic Receive Offload API framework
  2017-07-01 11:08                 ` [PATCH v10 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
@ 2017-07-01 11:08                   ` Jiayu Hu
  2017-07-02 10:19                     ` Tan, Jianfeng
  2017-07-04  8:37                     ` Yuanhan Liu
  2017-07-01 11:08                   ` [PATCH v10 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
                                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-07-01 11:08 UTC (permalink / raw)
  To: dev
  Cc: konstantin.ananyev, stephen, jianfeng.tan, yliu, jingjing.wu,
	lei.a.yao, tiwei.bie, Jiayu Hu

Generic Receive Offload (GRO) is a widely used SW-based offloading
technique to reduce per-packet processing overhead. It gains
performance by reassembling small packets into large ones. This
patchset is to support GRO in DPDK. To support GRO, this patch
implements a GRO API framework.

To give applications more flexibility, DPDK GRO is implemented as
a user library. Applications explicitly use the GRO library to merge
small packets into large ones. DPDK GRO provides two reassembly modes.
One is called lightweight mode, the other is called heavyweight mode.
If applications want to merge packets in a simple way and the number
of packets is relatively small, they can use the lightweight mode.
If applications need more fine-grained controls, they can choose the
heavyweight mode.

rte_gro_reassemble_burst is the main reassembly API which is used in
lightweight mode and processes N packets at a time. For applications,
performing GRO in lightweight mode is simple. They just need to invoke
rte_gro_reassemble_burst. Applications can get GROed packets as soon as
rte_gro_reassemble_burst returns.

rte_gro_reassemble is the main reassembly API which is used in
heavyweight mode and tries to merge N input packets with the packets
in a given GRO table. For applications, performing GRO in heavyweight
mode is relatively complicated. Before performing GRO, applications need
to create a GRO table by rte_gro_tbl_create. Then they can use
rte_gro_reassemble to merge packets. The GROed packets are in the GRO
table. If applications want to get them, they need to manually flush
them with the flush API, as sketched below.
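
A heavyweight mode sketch (illustrative only: error handling is
omitted, 'param' is assumed to be a filled rte_gro_param, 'pkts' and
'out' are mbuf arrays, and MAX_PKT_BURST is an illustrative constant):

	void *tbl = rte_gro_tbl_create(&param);

	nb_left = rte_gro_reassemble(pkts, nb_rx, tbl);
	/* later, drain the timed-out packets from the table */
	nb_out = rte_gro_timeout_flush(tbl, param.desired_gro_types,
			out, MAX_PKT_BURST);
	rte_gro_tbl_destroy(tbl);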

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
---
 config/common_base                 |   5 ++
 lib/Makefile                       |   2 +
 lib/librte_gro/Makefile            |  50 +++++++++++
 lib/librte_gro/rte_gro.c           | 176 +++++++++++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro.h           | 176 +++++++++++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro_version.map |  12 +++
 mk/rte.app.mk                      |   1 +
 7 files changed, 422 insertions(+)
 create mode 100644 lib/librte_gro/Makefile
 create mode 100644 lib/librte_gro/rte_gro.c
 create mode 100644 lib/librte_gro/rte_gro.h
 create mode 100644 lib/librte_gro/rte_gro_version.map

diff --git a/config/common_base b/config/common_base
index f6aafd1..167f5ef 100644
--- a/config/common_base
+++ b/config/common_base
@@ -712,6 +712,11 @@ CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
 CONFIG_RTE_LIBRTE_PMD_VHOST=n
 
 #
+# Compile GRO library
+#
+CONFIG_RTE_LIBRTE_GRO=y
+
+#
 #Compile Xen domain0 support
 #
 CONFIG_RTE_LIBRTE_XEN_DOM0=n
diff --git a/lib/Makefile b/lib/Makefile
index 07e1fd0..ac1c2f6 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -106,6 +106,8 @@ DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
 DEPDIRS-librte_reorder := librte_eal librte_mempool librte_mbuf
 DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
 DEPDIRS-librte_pdump := librte_eal librte_mempool librte_mbuf librte_ether
+DIRS-$(CONFIG_RTE_LIBRTE_GRO) += librte_gro
+DEPDIRS-librte_gro := librte_eal librte_mbuf librte_ether librte_net
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
new file mode 100644
index 0000000..7e0f128
--- /dev/null
+++ b/lib/librte_gro/Makefile
@@ -0,0 +1,50 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2017 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_gro.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+
+EXPORT_MAP := rte_gro_version.map
+
+LIBABIVER := 1
+
+# source files
+SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
+
+# install this header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
new file mode 100644
index 0000000..648835b
--- /dev/null
+++ b/lib/librte_gro/rte_gro.c
@@ -0,0 +1,176 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+
+#include "rte_gro.h"
+
+typedef void *(*gro_tbl_create_fn)(uint16_t socket_id,
+		uint16_t max_flow_num,
+		uint16_t max_item_per_flow);
+typedef void (*gro_tbl_destroy_fn)(void *tbl);
+typedef uint32_t (*gro_tbl_item_num_fn)(void *tbl);
+
+static gro_tbl_create_fn tbl_create_functions[RTE_GRO_TYPE_MAX_NUM];
+static gro_tbl_destroy_fn tbl_destroy_functions[RTE_GRO_TYPE_MAX_NUM];
+static gro_tbl_item_num_fn tbl_item_num_functions[RTE_GRO_TYPE_MAX_NUM];
+
+/**
+ * GRO table, which is used to merge packets. It keeps many reassembly
+ * tables of desired GRO types. Applications need to create GRO tables
+ * before using rte_gro_reassemble to perform GRO.
+ */
+struct gro_tbl {
+	uint64_t desired_gro_types;	/**< GRO types to perform */
+	/* max TTL for a packet, measured in CPU cycles */
+	uint64_t max_timeout_cycles;
+	/* max length of merged packet measured in byte */
+	uint32_t max_packet_size;
+	/* reassembly tables of desired GRO types */
+	void *tbls[RTE_GRO_TYPE_MAX_NUM];
+};
+
+void *rte_gro_tbl_create(const struct rte_gro_param *param)
+{
+	gro_tbl_create_fn create_tbl_fn;
+	gro_tbl_destroy_fn destroy_tbl_fn;
+	struct gro_tbl *gro_tbl;
+	uint64_t gro_type_flag = 0;
+	uint8_t i, j;
+
+	gro_tbl = rte_zmalloc_socket(__func__,
+			sizeof(struct gro_tbl),
+			RTE_CACHE_LINE_SIZE,
+			param->socket_id);
+	if (gro_tbl == NULL)
+		return NULL;
+	gro_tbl->max_packet_size = param->max_packet_size;
+	gro_tbl->max_timeout_cycles = param->max_timeout_cycles;
+	gro_tbl->desired_gro_types = param->desired_gro_types;
+
+	for (i = 0; i < RTE_GRO_TYPE_MAX_NUM; i++) {
+		gro_type_flag = 1ULL << i;
+
+		if ((param->desired_gro_types & gro_type_flag) == 0)
+			continue;
+		create_tbl_fn = tbl_create_functions[i];
+		if (create_tbl_fn == NULL)
+			continue;
+
+		gro_tbl->tbls[i] = create_tbl_fn(
+				param->socket_id,
+				param->max_flow_num,
+				param->max_item_per_flow);
+		if (gro_tbl->tbls[i] == NULL) {
+			/* destroy all allocated tables */
+			for (j = 0; j < i; j++) {
+				gro_type_flag = 1ULL << j;
+				if ((param->desired_gro_types & gro_type_flag) == 0)
+					continue;
+				destroy_tbl_fn = tbl_destroy_functions[j];
+				if (destroy_tbl_fn)
+					destroy_tbl_fn(gro_tbl->tbls[j]);
+			}
+			rte_free(gro_tbl);
+			return NULL;
+		}
+	}
+	return gro_tbl;
+}
+
+void rte_gro_tbl_destroy(void *tbl)
+{
+	gro_tbl_destroy_fn destroy_tbl_fn;
+	struct gro_tbl *gro_tbl = (struct gro_tbl *)tbl;
+	uint64_t gro_type_flag;
+	uint8_t i;
+
+	if (gro_tbl == NULL)
+		return;
+	for (i = 0; i < RTE_GRO_TYPE_MAX_NUM; i++) {
+		gro_type_flag = 1ULL << i;
+		if ((gro_tbl->desired_gro_types & gro_type_flag) == 0)
+			continue;
+		destroy_tbl_fn = tbl_destroy_functions[i];
+		if (destroy_tbl_fn)
+			destroy_tbl_fn(gro_tbl->tbls[i]);
+	}
+	rte_free(gro_tbl);
+}
+
+uint16_t
+rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
+		uint16_t nb_pkts,
+		const struct rte_gro_param *param __rte_unused)
+{
+	return nb_pkts;
+}
+
+uint16_t
+rte_gro_reassemble(struct rte_mbuf **pkts __rte_unused,
+		uint16_t nb_pkts,
+		void *tbl __rte_unused)
+{
+	return nb_pkts;
+}
+
+uint16_t
+rte_gro_timeout_flush(void *tbl __rte_unused,
+		uint64_t desired_gro_types __rte_unused,
+		struct rte_mbuf **out __rte_unused,
+		uint16_t max_nb_out __rte_unused)
+{
+	return 0;
+}
+
+uint64_t rte_gro_tbl_item_num(void *tbl)
+{
+	struct gro_tbl *gro_tbl = (struct gro_tbl *)tbl;
+	gro_tbl_item_num_fn item_num_fn;
+	uint64_t item_num = 0;
+	uint64_t gro_type_flag;
+	uint8_t i;
+
+	for (i = 0; i < RTE_GRO_TYPE_MAX_NUM; i++) {
+		gro_type_flag = 1ULL << i;
+		if ((gro_tbl->desired_gro_types & gro_type_flag) == 0)
+			continue;
+
+		item_num_fn = tbl_item_num_functions[i];
+		if (item_num_fn == NULL)
+			continue;
+		item_num += item_num_fn(gro_tbl->tbls[i]);
+	}
+	return item_num;
+}
diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
new file mode 100644
index 0000000..02c9113
--- /dev/null
+++ b/lib/librte_gro/rte_gro.h
@@ -0,0 +1,176 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_GRO_H_
+#define _RTE_GRO_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * the max number of packets that rte_gro_reassemble_burst can
+ * process in each invocation.
+ */
+#define RTE_GRO_MAX_BURST_ITEM_NUM 128UL
+
+/* max number of supported GRO types */
+#define RTE_GRO_TYPE_MAX_NUM 64
+#define RTE_GRO_TYPE_SUPPORT_NUM 0	/**< number of supported GRO types */
+
+
+struct rte_gro_param {
+	uint64_t desired_gro_types;	/**< desired GRO types */
+	uint32_t max_packet_size;	/**< max length of merged packets */
+	uint16_t max_flow_num;	/**< max flow number */
+	uint16_t max_item_per_flow;	/**< max packet number per flow */
+
+	/* socket index where the Ethernet port connects to */
+	uint16_t socket_id;
+	/* max TTL for a packet in the GRO table, measured in CPU cycles */
+	uint64_t max_timeout_cycles;
+};
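+
+/**
+ * Example initialization (illustrative values only; RTE_GRO_TCP_IPV4
+ * is added by a later patch in this series):
+ *
+ *   struct rte_gro_param param = {
+ *       .desired_gro_types = RTE_GRO_TCP_IPV4,
+ *       .max_packet_size = UINT16_MAX,
+ *       .max_flow_num = 4,
+ *       .max_item_per_flow = 32,
+ *       .socket_id = rte_socket_id(),
+ *       .max_timeout_cycles = rte_get_tsc_hz() / 1000,
+ *   };
+ */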
+
+/**
+ * This function creates a GRO table, which is used to merge packets in
+ * rte_gro_reassemble.
+ *
+ * @param param
+ *  applications use it to pass the parameters needed to create a GRO table.
+ * @return
+ *  on success, return a pointer to the GRO table. Otherwise, return
+ *  NULL.
+ */
+void *rte_gro_tbl_create(
+		const struct rte_gro_param *param);
+/**
+ * This function destroys a GRO table.
+ */
+void rte_gro_tbl_destroy(void *tbl);
+
+/**
+ * This is one of the main reassembly APIs, which merges a burst of
+ * packets at a time. It assumes that all input packets have correct
+ * checksums. That is, applications should guarantee all input packets
+ * are correct. Besides, it doesn't re-calculate checksums for merged
+ * packets. If the input packets are IP fragmented, this function
+ * assumes they are complete (i.e. with the L4 header). After finishing
+ * processing, it returns all GROed packets to applications immediately.
+ *
+ * @param pkts
+ *  a pointer array which points to the packets to reassemble. Besides,
+ *  it keeps packet addresses for GROed packets.
+ * @param nb_pkts
+ *  the number of packets to reassemble.
+ * @param param
+ *  applications use it to tell rte_gro_reassemble_burst what rules
+ *  are demanded.
+ * @return
+ *  the number of packets after being GROed.
+ */
+uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts,
+		uint16_t nb_pkts,
+		const struct rte_gro_param *param);
+
+/**
+ * Reassembly function, which tries to merge input packets with the
+ * packets in a given GRO table. This function assumes all input
+ * packets have correct checksums, and it won't update checksums if
+ * two packets are merged. Besides, if the input packets are IP
+ * fragmented, this function assumes they are complete packets (i.e.
+ * with the L4 header).
+ *
+ * If the input packets don't have data or have unsupported GRO
+ * types, they won't be processed and are returned to applications.
+ * Otherwise, the input packets are either merged or inserted into
+ * the table. If applications want to get the packets in the table,
+ * they need to call the flush API.
+ *
+ * @param pkts
+ *  packet to reassemble. Besides, after this function finishes, it
+ *  keeps the unprocessed packets (i.e. without data or unsupported
+ *  GRO types).
+ * @param nb_pkts
+ *  the number of packets to reassemble.
+ * @param tbl
+ *  a pointer points to a GRO table.
+ * @return
+ *  return the number of unprocessed packets (i.e. without data or
+ *  unsupported GRO types). If all packets are processed (merged or
+ *  inserted into the table), return 0.
+ */
+uint16_t rte_gro_reassemble(struct rte_mbuf **pkts,
+		uint16_t nb_pkts,
+		void *tbl);
+
+/**
+ * This function flushes timed-out packets from the reassembly tables
+ * of the desired GRO types. The max number of flushed packets is the
+ * number of elements in the array which keeps the flushed packets.
+ *
+ * Besides, this function won't re-calculate checksums for merged
+ * packets in the tables. That is, the returned packets may have
+ * incorrect checksums.
+ *
+ * @param tbl
+ *  a pointer to a GRO table object.
+ * @param desired_gro_types
+ *  rte_gro_timeout_flush only processes packets which belong to the
+ *  GRO types specified by desired_gro_types.
+ * @param out
+ *  a pointer array that is used to keep the flushed timeout packets.
+ * @param max_nb_out
+ *  the number of elements in out. It's also the max number of timeout
+ *  packets that can be flushed.
+ * @return
+ *  the number of flushed packets. If no packets are flushed, return 0.
+ */
+uint16_t rte_gro_timeout_flush(void *tbl,
+		uint64_t desired_gro_types,
+		struct rte_mbuf **out,
+		uint16_t max_nb_out);
+
+/**
+ * This function returns the number of packets in a given GRO table.
+ * @param tbl
+ *  a pointer to a GRO table.
+ * @return
+ *  the number of packets in the table.
+ */
+uint64_t rte_gro_tbl_item_num(void *tbl);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
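
For illustration, here is a minimal sketch of how an application might
drive the lightweight mode from its receive loop. Identifiers such as
port_id and queue_id are placeholders, error handling is omitted, and
the RTE_GRO_TCP_IPV4 type flag is only added by the TCP/IPv4 patch of
this series:

	struct rte_gro_param param = {
		.desired_gro_types = RTE_GRO_TCP_IPV4,
		.max_packet_size = UINT16_MAX,
		.max_flow_num = 4,
		.max_item_per_flow = 32,
		.socket_id = rte_socket_id(),
	};
	struct rte_mbuf *pkts[RTE_GRO_MAX_BURST_ITEM_NUM];
	uint16_t nb_rx;

	nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts,
			RTE_GRO_MAX_BURST_ITEM_NUM);
	/* merge the burst in place; pkts[] then holds the GROed packets
	 * followed by the unprocessed ones */
	nb_rx = rte_gro_reassemble_burst(pkts, nb_rx, &param);
	rte_eth_tx_burst(port_id, queue_id, pkts, nb_rx);
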
diff --git a/lib/librte_gro/rte_gro_version.map b/lib/librte_gro/rte_gro_version.map
new file mode 100644
index 0000000..358fb9d
--- /dev/null
+++ b/lib/librte_gro/rte_gro_version.map
@@ -0,0 +1,12 @@
+DPDK_17.08 {
+	global:
+
+	rte_gro_tbl_create;
+	rte_gro_tbl_destroy;
+	rte_gro_reassemble_burst;
+	rte_gro_reassemble;
+	rte_gro_timeout_flush;
+	rte_gro_tbl_item_num;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index bcaf1b3..fc3776d 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -98,6 +98,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_RING)           += -lrte_ring
 _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
 _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER)        += -lrte_reorder
+_LDLIBS-$(CONFIG_RTE_LIBRTE_GRO)            += -lrte_gro
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 _LDLIBS-$(CONFIG_RTE_LIBRTE_KNI)            += -lrte_kni
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 141+ messages in thread

* [PATCH v10 2/3] lib/gro: add TCP/IPv4 GRO support
  2017-07-01 11:08                 ` [PATCH v10 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
  2017-07-01 11:08                   ` [PATCH v10 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
@ 2017-07-01 11:08                   ` Jiayu Hu
  2017-07-02 10:19                     ` Tan, Jianfeng
  2017-07-04  9:03                     ` Yuanhan Liu
  2017-07-01 11:08                   ` [PATCH v10 3/3] app/testpmd: enable TCP/IPv4 GRO Jiayu Hu
  2017-07-05  4:08                   ` [PATCH v11 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
  3 siblings, 2 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-07-01 11:08 UTC (permalink / raw)
  To: dev
  Cc: konstantin.ananyev, stephen, jianfeng.tan, yliu, jingjing.wu,
	lei.a.yao, tiwei.bie, Jiayu Hu

In this patch, we introduce five APIs to support TCP/IPv4 GRO.
- gro_tcp4_tbl_create: create a TCP/IPv4 reassembly table, which is used
    to merge packets.
- gro_tcp4_tbl_destroy: free memory space of a TCP/IPv4 reassembly table.
- gro_tcp4_tbl_timeout_flush: flush timeout packets from a TCP/IPv4
    reassembly table.
- gro_tcp4_tbl_item_num: return the number of packets in a TCP/IPv4
    reassembly table.
- gro_tcp4_reassemble: reassemble an input TCP/IPv4 packet.

The TCP/IPv4 GRO API assumes all input packets have correct IPv4 and
TCP checksums, and it doesn't update IPv4 and TCP checksums for merged
packets. If input packets are IP fragmented, the API assumes they are
complete packets (i.e. with L4 headers).

In TCP/IPv4 GRO, we use a table structure, called the TCP/IPv4 reassembly
table, to reassemble packets. A TCP/IPv4 reassembly table includes a key
array and an item array, where the key array keeps the criteria to merge
packets and the item array keeps packet information.

One key in the key array points to an item group, which consists of
the packets that have the same criteria value. If two packets can be
merged, they must be in the same item group. Each key in the key array
includes two parts:
- criteria: the criteria of merging packets. If two packets can be
    merged, they must have the same criteria value.
- start_index: the index of the first incoming packet of the item group.

Each element in the item array keeps the information of one packet. It
mainly includes two parts:
- pkt: packet address
- next_pkt_index: the index of the next packet in the same item group.
    All packets in the same item group are chained by next_pkt_index.
    With next_pkt_index, we can locate all packets in the same item
    group one by one.

Processing an incoming packet takes three steps (a rough sketch follows
the list):
a. check if the packet should be processed. Packets with one of the
    following properties won't be processed:
	- SYN, FIN, RST, PSH or URG bit is set;
	- packet payload length is 0.
b. traverse the key array to find a key which has the same criteria
    value as the incoming packet. If one is found, go to step c.
    Otherwise, insert a new key and insert the packet into the item
    array.
c. locate the first packet in the item group via the start_index in the
    key, then traverse all packets in the item group via next_pkt_index.
    If one of them can merge with the incoming packet, merge them
    together. Otherwise, insert the incoming packet into this item group.
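
The sketch below is pseudocode, not the exact implementation;
payload_len and ip_id denote the TCP payload length and the IPv4
identification field:

	process(pkt):
		if any TCP flag other than ACK is set, or payload_len == 0:
			return unprocessed                      /* step a */
		key = {eth addrs, IP addrs, TCP ports, recv_ack}
		if no valid key matches:                        /* step b */
			store pkt in a free item; add a new key for it
			return inserted
		for each item in the key's item group:          /* step c */
			/* neighbor check: e.g. 100 payload bytes at
			 * seq 1000 are followed by seq 1100 */
			if pkt.seq == item.seq + item.payload_len and
					pkt.ip_id == item.ip_id + 1:
				chain pkt after the item; return merged
			(the head-merge case is symmetric)
		store pkt in a free item of this item group
		return inserted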

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
---
 doc/guides/rel_notes/release_17_08.rst |   7 +
 lib/librte_gro/Makefile                |   1 +
 lib/librte_gro/rte_gro.c               | 123 ++++++++-
 lib/librte_gro/rte_gro.h               |  10 +-
 lib/librte_gro/rte_gro_tcp4.c          | 439 +++++++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro_tcp4.h          | 172 +++++++++++++
 6 files changed, 736 insertions(+), 16 deletions(-)
 create mode 100644 lib/librte_gro/rte_gro_tcp4.c
 create mode 100644 lib/librte_gro/rte_gro_tcp4.h

diff --git a/doc/guides/rel_notes/release_17_08.rst b/doc/guides/rel_notes/release_17_08.rst
index 842f46f..f067247 100644
--- a/doc/guides/rel_notes/release_17_08.rst
+++ b/doc/guides/rel_notes/release_17_08.rst
@@ -75,6 +75,13 @@ New Features
 
   Added support for firmwares with multiple Ethernet ports per physical port.
 
+* **Add Generic Receive Offload API support.**
+
+  The Generic Receive Offload (GRO) API supports reassembling TCP/IPv4
+  packets. The GRO API assumes all input packets have correct
+  checksums, and it doesn't update checksums for merged packets. If
+  input packets are IP fragmented, the GRO API assumes they are
+  complete packets (i.e. with L4 headers).
 
 Resolved Issues
 ---------------
diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
index 7e0f128..43e276e 100644
--- a/lib/librte_gro/Makefile
+++ b/lib/librte_gro/Makefile
@@ -43,6 +43,7 @@ LIBABIVER := 1
 
 # source files
 SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
+SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro_tcp4.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
index 648835b..a4641f9 100644
--- a/lib/librte_gro/rte_gro.c
+++ b/lib/librte_gro/rte_gro.c
@@ -32,8 +32,11 @@
 
 #include <rte_malloc.h>
 #include <rte_mbuf.h>
+#include <rte_cycles.h>
+#include <rte_ethdev.h>
 
 #include "rte_gro.h"
+#include "rte_gro_tcp4.h"
 
 typedef void *(*gro_tbl_create_fn)(uint16_t socket_id,
 		uint16_t max_flow_num,
@@ -41,9 +44,12 @@ typedef void *(*gro_tbl_create_fn)(uint16_t socket_id,
 typedef void (*gro_tbl_destroy_fn)(void *tbl);
 typedef uint32_t (*gro_tbl_item_num_fn)(void *tbl);
 
-static gro_tbl_create_fn tbl_create_functions[RTE_GRO_TYPE_MAX_NUM];
-static gro_tbl_destroy_fn tbl_destroy_functions[RTE_GRO_TYPE_MAX_NUM];
-static gro_tbl_item_num_fn tbl_item_num_functions[RTE_GRO_TYPE_MAX_NUM];
+static gro_tbl_create_fn tbl_create_functions[RTE_GRO_TYPE_MAX_NUM] = {
+	gro_tcp4_tbl_create, NULL};
+static gro_tbl_destroy_fn tbl_destroy_functions[RTE_GRO_TYPE_MAX_NUM] = {
+	gro_tcp4_tbl_destroy, NULL};
+static gro_tbl_item_num_fn tbl_item_num_functions[
+	RTE_GRO_TYPE_MAX_NUM] = {gro_tcp4_tbl_item_num, NULL};
 
 /**
  * GRO table, which is used to merge packets. It keeps many reassembly
@@ -130,27 +136,118 @@ void rte_gro_tbl_destroy(void *tbl)
 }
 
 uint16_t
-rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
+rte_gro_reassemble_burst(struct rte_mbuf **pkts,
 		uint16_t nb_pkts,
-		const struct rte_gro_param *param __rte_unused)
+		const struct rte_gro_param *param)
 {
-	return nb_pkts;
+	uint16_t i;
+	uint16_t nb_after_gro = nb_pkts;
+	uint32_t item_num;
+
+	/* allocate a reassembly table for TCP/IPv4 GRO */
+	struct gro_tcp4_tbl tcp_tbl;
+	struct gro_tcp4_key tcp_keys[RTE_GRO_MAX_BURST_ITEM_NUM] = {0};
+	struct gro_tcp4_item tcp_items[RTE_GRO_MAX_BURST_ITEM_NUM] = {0};
+
+	struct rte_mbuf *unprocess_pkts[nb_pkts];
+	uint16_t unprocess_num = 0;
+	int32_t ret;
+	uint64_t current_time;
+
+	if ((param->desired_gro_types & RTE_GRO_TCP_IPV4) == 0)
+		return nb_pkts;
+
+	/* get the actual number of items */
+	item_num = RTE_MIN(nb_pkts, (param->max_flow_num *
+			param->max_item_per_flow));
+	item_num = RTE_MIN(item_num, RTE_GRO_MAX_BURST_ITEM_NUM);
+
+	tcp_tbl.keys = tcp_keys;
+	tcp_tbl.items = tcp_items;
+	tcp_tbl.key_num = 0;
+	tcp_tbl.item_num = 0;
+	tcp_tbl.max_key_num = item_num;
+	tcp_tbl.max_item_num = item_num;
+
+	current_time = rte_rdtsc();
+
+	for (i = 0; i < nb_pkts; i++) {
+		if (RTE_ETH_IS_IPV4_HDR(pkts[i]->packet_type) &&
+				(pkts[i]->packet_type & RTE_PTYPE_L4_TCP)) {
+			ret = gro_tcp4_reassemble(pkts[i],
+					&tcp_tbl,
+					param->max_packet_size,
+					current_time);
+			if (ret > 0)
+				/* merge successfully */
+				nb_after_gro--;
+			else if (ret < 0)
+				unprocess_pkts[unprocess_num++] =
+					pkts[i];
+		} else
+			unprocess_pkts[unprocess_num++] =
+				pkts[i];
+	}
+
+	/* re-arrange GROed packets */
+	if (nb_after_gro < nb_pkts) {
+		i = gro_tcp4_tbl_timeout_flush(&tcp_tbl, 0,
+				pkts, nb_pkts);
+		if (unprocess_num > 0)
+			memcpy(&pkts[i], unprocess_pkts,
+					sizeof(struct rte_mbuf *) *
+					unprocess_num);
+	}
+	return nb_after_gro;
 }
 
 uint16_t
-rte_gro_reassemble(struct rte_mbuf **pkts __rte_unused,
+rte_gro_reassemble(struct rte_mbuf **pkts,
 		uint16_t nb_pkts,
-		void *tbl __rte_unused)
+		void *tbl)
 {
-	return nb_pkts;
+	uint16_t i, unprocess_num = 0;
+	struct rte_mbuf *unprocess_pkts[nb_pkts];
+	struct gro_tbl *gro_tbl = (struct gro_tbl *)tbl;
+	uint64_t current_time;
+
+	if ((gro_tbl->desired_gro_types & RTE_GRO_TCP_IPV4) == 0)
+		return nb_pkts;
+
+	current_time = rte_rdtsc();
+	for (i = 0; i < nb_pkts; i++) {
+		if (RTE_ETH_IS_IPV4_HDR(pkts[i]->packet_type) &&
+				(pkts[i]->packet_type & RTE_PTYPE_L4_TCP)) {
+			if (gro_tcp4_reassemble(pkts[i],
+						gro_tbl->tbls[RTE_GRO_TCP_IPV4_INDEX],
+						gro_tbl->max_packet_size,
+						current_time) < 0)
+				unprocess_pkts[unprocess_num++] = pkts[i];
+		} else
+			unprocess_pkts[unprocess_num++] = pkts[i];
+	}
+	if (unprocess_num > 0)
+		memcpy(pkts, unprocess_pkts,
+				sizeof(struct rte_mbuf *) * unprocess_num);
+
+	return unprocess_num;
 }
 
 uint16_t
-rte_gro_timeout_flush(void *tbl __rte_unused,
-		uint64_t desired_gro_types __rte_unused,
-		struct rte_mbuf **out __rte_unused,
-		uint16_t max_nb_out __rte_unused)
+rte_gro_timeout_flush(void *tbl,
+		uint64_t desired_gro_types,
+		struct rte_mbuf **out,
+		uint16_t max_nb_out)
 {
+	struct gro_tbl *gro_tbl = (struct gro_tbl *)tbl;
+
+	desired_gro_types = desired_gro_types &
+		gro_tbl->desired_gro_types;
+	if (desired_gro_types & RTE_GRO_TCP_IPV4)
+		return gro_tcp4_tbl_timeout_flush(
+				gro_tbl->tbls[RTE_GRO_TCP_IPV4_INDEX],
+				gro_tbl->max_timeout_cycles,
+				out, max_nb_out);
 	return 0;
 }
 
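
Taken together, a rough sketch of how an application might drive this
table-based mode (param, pkts and nb_rx as in the lightweight example
earlier in this thread; MAX_FLUSH is a placeholder, and error handling
is omitted):

	void *gro_tbl = rte_gro_tbl_create(&param);
	struct rte_mbuf *flushed[MAX_FLUSH];
	uint16_t nb_unproc, nb_flushed;

	/* per RX burst: merged/inserted packets stay in the table and
	 * pkts[] is compacted down to the unprocessed packets */
	nb_unproc = rte_gro_reassemble(pkts, nb_rx, gro_tbl);

	/* periodically: extract packets older than max_timeout_cycles */
	nb_flushed = rte_gro_timeout_flush(gro_tbl, RTE_GRO_TCP_IPV4,
			flushed, MAX_FLUSH);

	rte_gro_tbl_destroy(gro_tbl);
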
diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
index 02c9113..68d2fc6 100644
--- a/lib/librte_gro/rte_gro.h
+++ b/lib/librte_gro/rte_gro.h
@@ -45,7 +45,11 @@ extern "C" {
 
 /* max number of supported GRO types */
 #define RTE_GRO_TYPE_MAX_NUM 64
-#define RTE_GRO_TYPE_SUPPORT_NUM 0	/**< number of currently supported GRO types */
+#define RTE_GRO_TYPE_SUPPORT_NUM 1	/**< number of currently supported GRO types */
+
+/* TCP/IPv4 GRO flag */
+#define RTE_GRO_TCP_IPV4_INDEX 0
+#define RTE_GRO_TCP_IPV4 (1ULL << RTE_GRO_TCP_IPV4_INDEX)
 
 
 struct rte_gro_param {
@@ -118,14 +122,14 @@ uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts,
  *
  * @param pkts
  *  packet to reassemble. Besides, after this function finishes, it
- *  keeps the unprocessed packets (i.e. without data or unsupported
+ *  keeps the unprocessed packets (e.g. without data or unsupported
  *  GRO types).
  * @param nb_pkts
  *  the number of packets to reassemble.
  * @param tbl
  *  a pointer to a GRO table.
  * @return
- *  return the number of unprocessed packets (i.e. without data or
+ *  return the number of unprocessed packets (e.g. without data or
  *  unsupported GRO types). If all packets are processed (merged or
  *  inserted into the table), return 0.
  */
diff --git a/lib/librte_gro/rte_gro_tcp4.c b/lib/librte_gro/rte_gro_tcp4.c
new file mode 100644
index 0000000..8f2aa86
--- /dev/null
+++ b/lib/librte_gro/rte_gro_tcp4.c
@@ -0,0 +1,439 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+#include <rte_cycles.h>
+
+#include <rte_ethdev.h>
+#include <rte_ip.h>
+#include <rte_tcp.h>
+
+#include "rte_gro_tcp4.h"
+
+void *gro_tcp4_tbl_create(uint16_t socket_id,
+		uint16_t max_flow_num,
+		uint16_t max_item_per_flow)
+{
+	size_t size;
+	uint32_t entries_num;
+	struct gro_tcp4_tbl *tbl;
+
+	entries_num = max_flow_num * max_item_per_flow;
+	entries_num = entries_num > GRO_TCP4_TBL_MAX_ITEM_NUM ?
+		GRO_TCP4_TBL_MAX_ITEM_NUM : entries_num;
+
+	if (entries_num == 0)
+		return NULL;
+
+	tbl = (struct gro_tcp4_tbl *)rte_zmalloc_socket(
+			__func__,
+			sizeof(struct gro_tcp4_tbl),
+			RTE_CACHE_LINE_SIZE,
+			socket_id);
+	if (tbl == NULL)
+		return NULL;
+
+	size = sizeof(struct gro_tcp4_item) * entries_num;
+	tbl->items = (struct gro_tcp4_item *)rte_zmalloc_socket(
+			__func__,
+			size,
+			RTE_CACHE_LINE_SIZE,
+			socket_id);
+	if (tbl->items == NULL) {
+		rte_free(tbl);
+		return NULL;
+	}
+	tbl->max_item_num = entries_num;
+
+	size = sizeof(struct gro_tcp4_key) * entries_num;
+	tbl->keys = (struct gro_tcp4_key *)rte_zmalloc_socket(
+			__func__,
+			size, RTE_CACHE_LINE_SIZE,
+			socket_id);
+	if (tbl->keys == NULL) {
+		rte_free(tbl->items);
+		rte_free(tbl);
+		return NULL;
+	}
+	tbl->max_key_num = entries_num;
+	return tbl;
+}
+
+void gro_tcp4_tbl_destroy(void *tbl)
+{
+	struct gro_tcp4_tbl *tcp_tbl = (struct gro_tcp4_tbl *)tbl;
+
+	if (tcp_tbl) {
+		rte_free(tcp_tbl->items);
+		rte_free(tcp_tbl->keys);
+	}
+	rte_free(tcp_tbl);
+}
+
+static struct rte_mbuf *get_mbuf_lastseg(struct rte_mbuf *pkt)
+{
+	struct rte_mbuf *lastseg = pkt;
+
+	while (lastseg->next)
+		lastseg = lastseg->next;
+
+	return lastseg;
+}
+
+/**
+ * merge two TCP/IPv4 packets without updating checksums.
+ */
+static int
+merge_two_tcp4_packets(struct gro_tcp4_item *item_src,
+		struct rte_mbuf *pkt,
+		uint32_t max_packet_size,
+		int cmp)
+{
+	struct rte_mbuf *pkt_head, *pkt_tail, *lastseg;
+	struct ipv4_hdr *ipv4_hdr1, *ipv4_hdr0;
+	uint16_t tcp_dl1;
+
+	if (cmp > 0) {
+		/* append the new packet into tail */
+		pkt_head = item_src->pkt;
+		pkt_tail = pkt;
+	} else {
+		/* append the new packet into head */
+		pkt_head = pkt;
+		pkt_tail = item_src->pkt;
+	}
+
+	/* parse the tail packet */
+	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt_tail,
+				char *) + pkt_tail->l2_len);
+	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) -
+		pkt_tail->l3_len - pkt_tail->l4_len;
+
+	if (pkt_head->pkt_len + tcp_dl1 > max_packet_size)
+		return -1;
+
+	/* remove packet header for the tail packet */
+	rte_pktmbuf_adj(pkt_tail, pkt_tail->l2_len +
+			pkt_tail->l3_len +
+			pkt_tail->l4_len);
+
+	if (cmp > 0) {
+		/* chain the new packet to the tail of the original packet */
+		item_src->lastseg->next = pkt_tail;
+		/* update the lastseg for the item */
+		item_src->lastseg = get_mbuf_lastseg(pkt_tail);
+	} else {
+		/* chain the original packet to the tail of the new packet */
+		lastseg = get_mbuf_lastseg(pkt_head);
+		lastseg->next = pkt_tail;
+		/* update the item */
+		item_src->pkt = pkt_head;
+	}
+
+	/* parse the head packet */
+	ipv4_hdr0 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt_head,
+				char *) + pkt_head->l2_len);
+
+	/* update IP header for the head packet */
+	ipv4_hdr0->total_length = rte_cpu_to_be_16(
+			rte_be_to_cpu_16(
+				ipv4_hdr0->total_length)
+			+ tcp_dl1);
+	ipv4_hdr0->packet_id = ipv4_hdr1->packet_id;
+
+	/* update mbuf metadata for the merged packet */
+	pkt_head->nb_segs += pkt_tail->nb_segs;
+	pkt_head->pkt_len += pkt_tail->pkt_len;
+	return 1;
+}
+
+static int
+check_seq_option(struct rte_mbuf *pkt,
+		struct tcp_hdr *tcp_hdr,
+		uint16_t ip_id,
+		uint16_t tcp_hl,
+		uint16_t tcp_dl)
+{
+	struct ipv4_hdr *ipv4_hdr0;
+	struct tcp_hdr *tcp_hdr0;
+	uint16_t tcp_hl0, tcp_dl0, ip_id0;
+	uint32_t sent_seq0, sent_seq;
+	uint16_t len;
+
+	ipv4_hdr0 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
+				char *) + pkt->l2_len);
+	tcp_hdr0 = (struct tcp_hdr *)((char *)ipv4_hdr0 + pkt->l3_len);
+
+	ip_id0 = rte_be_to_cpu_16(ipv4_hdr0->packet_id);
+	tcp_hl0 = pkt->l4_len;
+	tcp_dl0 = rte_be_to_cpu_16(ipv4_hdr0->total_length) -
+		pkt->l3_len - tcp_hl0;
+
+	/* check if TCP option fields equal. If not, return 0. */
+	len = RTE_MAX(tcp_hl, tcp_hl0) - sizeof(struct tcp_hdr);
+	if ((tcp_hl != tcp_hl0) || ((len > 0) &&
+				(memcmp(tcp_hdr + 1,
+						tcp_hdr0 + 1,
+						len) != 0)))
+		return 0;
+
+	/* check if the two packets are neighbors */
+	sent_seq0 = rte_be_to_cpu_32(tcp_hdr0->sent_seq);
+	sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq);
+
+	if ((sent_seq == (sent_seq0 + tcp_dl0)) &&
+			(ip_id == (ip_id0 + 1)))
+		/* append the new packet into tail */
+		return 1;
+	else if (((sent_seq + tcp_dl) == sent_seq0) &&
+		((ip_id + 1) == ip_id0))
+		/* append the new packet into head */
+		return -1;
+	else
+		return 0;
+}
+
+static uint32_t
+find_an_empty_item(struct gro_tcp4_tbl *tbl)
+{
+	uint32_t i;
+
+	for (i = 0; i < tbl->max_item_num; i++)
+		if (tbl->items[i].pkt == NULL)
+			return i;
+	return INVALID_ARRAY_INDEX;
+}
+
+static uint32_t
+find_an_empty_key(struct gro_tcp4_tbl *tbl)
+{
+	uint32_t i;
+
+	for (i = 0; i < tbl->max_key_num; i++)
+		if (tbl->keys[i].is_valid == 0)
+			return i;
+	return INVALID_ARRAY_INDEX;
+}
+
+int32_t
+gro_tcp4_reassemble(struct rte_mbuf *pkt,
+		struct gro_tcp4_tbl *tbl,
+		uint32_t max_packet_size,
+		uint64_t start_time)
+{
+	struct ether_hdr *eth_hdr;
+	struct ipv4_hdr *ipv4_hdr;
+	struct tcp_hdr *tcp_hdr;
+	uint16_t tcp_dl, ip_id;
+
+	struct tcp4_key key;
+	uint32_t cur_idx, prev_idx, item_idx;
+	uint32_t i, key_idx;
+	int cmp;
+
+	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
+	ipv4_hdr = (struct ipv4_hdr *)((char *)eth_hdr + pkt->l2_len);
+
+	/* check if the packet should be processed */
+	if (pkt->l3_len < sizeof(struct ipv4_hdr))
+		return -1;
+	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + pkt->l3_len);
+	/* return immediately if any flag other than ACK is set, or ACK is unset */
+	if ((tcp_hdr->tcp_flags & (~((uint8_t)TCP_ACK_FLAG))) ||
+			((tcp_hdr->tcp_flags & TCP_ACK_FLAG) == 0))
+		return -1;
+	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - pkt->l3_len
+		- pkt->l4_len;
+	if (tcp_dl == 0)
+		return -1;
+
+	ip_id = rte_be_to_cpu_16(ipv4_hdr->packet_id);
+
+	/* find a key and traverse all packets in its item group */
+	key.eth_saddr = eth_hdr->s_addr;
+	key.eth_daddr = eth_hdr->d_addr;
+	key.ip_src_addr = rte_be_to_cpu_32(ipv4_hdr->src_addr);
+	key.ip_dst_addr = rte_be_to_cpu_32(ipv4_hdr->dst_addr);
+	key.src_port = rte_be_to_cpu_16(tcp_hdr->src_port);
+	key.dst_port = rte_be_to_cpu_16(tcp_hdr->dst_port);
+	key.recv_ack = rte_be_to_cpu_32(tcp_hdr->recv_ack);
+
+	for (i = 0; i < tbl->max_key_num; i++) {
+		/* search for a key */
+		if ((tbl->keys[i].is_valid == 0) ||
+				(memcmp(&(tbl->keys[i].key), &key,
+						sizeof(struct tcp4_key)) != 0))
+			continue;
+
+		cur_idx = tbl->keys[i].start_index;
+		prev_idx = cur_idx;
+		while (cur_idx != INVALID_ARRAY_INDEX) {
+			cmp = check_seq_option(tbl->items[cur_idx].pkt,
+					tcp_hdr,
+					ip_id,
+					pkt->l4_len,
+					tcp_dl);
+			if (cmp != 0) {
+				if (merge_two_tcp4_packets(
+							&(tbl->items[cur_idx]),
+							pkt,
+							max_packet_size,
+							cmp) > 0)
+					return 1;
+				/**
+				 * fail to merge two packets since
+				 * it's beyond the max packet length.
+				 * Insert it into the item group.
+				 */
+				item_idx = find_an_empty_item(tbl);
+				if (item_idx == INVALID_ARRAY_INDEX)
+					return -1;
+				/* link after cur_idx to keep the chain */
+				tbl->items[item_idx].next_pkt_idx =
+					tbl->items[cur_idx].next_pkt_idx;
+				tbl->items[cur_idx].next_pkt_idx = item_idx;
+				tbl->items[item_idx].pkt = pkt;
+				tbl->items[item_idx].lastseg = get_mbuf_lastseg(pkt);
+				tbl->items[item_idx].start_time = start_time;
+				tbl->item_num++;
+				return 0;
+			}
+			prev_idx = cur_idx;
+			cur_idx = tbl->items[cur_idx].next_pkt_idx;
+		}
+		/**
+		 * find a corresponding item group but fails to find
+		 * one packet to merge. Insert it into this item group.
+		 */
+		item_idx = find_an_empty_item(tbl);
+		if (item_idx == INVALID_ARRAY_INDEX)
+			return -1;
+		tbl->items[prev_idx].next_pkt_idx = item_idx;
+		tbl->items[item_idx].pkt = pkt;
+		tbl->items[item_idx].lastseg =
+			get_mbuf_lastseg(pkt);
+		tbl->items[item_idx].next_pkt_idx = INVALID_ARRAY_INDEX;
+		tbl->items[item_idx].start_time = start_time;
+		tbl->item_num++;
+		return 0;
+	}
+
+	/**
+	 * merge fail as the given packet has
+	 * a new key. So insert a new key.
+	 */
+	item_idx = find_an_empty_item(tbl);
+	key_idx = find_an_empty_key(tbl);
+	/**
+	 * if current key or item number is greater than the max
+	 * value, don't insert the packet into the table and return
+	 * immediately.
+	 */
+	if (item_idx == INVALID_ARRAY_INDEX ||
+			key_idx == INVALID_ARRAY_INDEX)
+		return -1;
+	tbl->items[item_idx].pkt = pkt;
+	tbl->items[item_idx].lastseg = get_mbuf_lastseg(pkt);
+	tbl->items[item_idx].next_pkt_idx = INVALID_ARRAY_INDEX;
+	tbl->items[item_idx].start_time = start_time;
+	tbl->item_num++;
+
+	memcpy(&(tbl->keys[key_idx].key),
+			&key, sizeof(struct tcp4_key));
+	tbl->keys[key_idx].start_index = item_idx;
+	tbl->keys[key_idx].is_valid = 1;
+	tbl->key_num++;
+
+	return 0;
+}
+
+uint16_t
+gro_tcp4_tbl_timeout_flush(struct gro_tcp4_tbl *tbl,
+		uint64_t timeout_cycles,
+		struct rte_mbuf **out,
+		uint16_t nb_out)
+{
+	uint16_t k = 0;
+	uint32_t i, j;
+	uint64_t current_time;
+
+	current_time = rte_rdtsc();
+	for (i = 0; i < tbl->max_key_num; i++) {
+		/* all keys have been checked, return immediately */
+		if (tbl->key_num == 0)
+			return k;
+
+		if (tbl->keys[i].is_valid == 0)
+			continue;
+
+		j = tbl->keys[i].start_index;
+		while (j != INVALID_ARRAY_INDEX) {
+			if (current_time - tbl->items[j].start_time >=
+					timeout_cycles) {
+				out[k++] = tbl->items[j].pkt;
+				tbl->items[j].pkt = NULL;
+				tbl->item_num--;
+				j = tbl->items[j].next_pkt_idx;
+
+				/**
+				 * delete the key as all of
+				 * its packets are flushed.
+				 */
+				if (j == INVALID_ARRAY_INDEX) {
+					tbl->keys[i].is_valid = 0;
+					tbl->key_num--;
+				} else
+					/* update start_index of the key */
+					tbl->keys[i].start_index = j;
+
+				if (k == nb_out)
+					return k;
+			} else
+				/**
+				 * left packets of this key won't be
+				 * timeout, so go to check other keys.
+				 */
+				break;
+		}
+	}
+	return k;
+}
+
+uint32_t gro_tcp4_tbl_item_num(void *tbl)
+{
+	struct gro_tcp4_tbl *gro_tbl = (struct gro_tcp4_tbl *)tbl;
+
+	if (gro_tbl)
+		return gro_tbl->item_num;
+	return 0;
+}
diff --git a/lib/librte_gro/rte_gro_tcp4.h b/lib/librte_gro/rte_gro_tcp4.h
new file mode 100644
index 0000000..27856f8
--- /dev/null
+++ b/lib/librte_gro/rte_gro_tcp4.h
@@ -0,0 +1,172 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_GRO_TCP4_H_
+#define _RTE_GRO_TCP4_H_
+
+#define INVALID_ARRAY_INDEX 0xffffffffUL
+#define GRO_TCP4_TBL_MAX_ITEM_NUM (UINT32_MAX - 1)
+
+/* criteria for merging packets */
+struct tcp4_key {
+	struct ether_addr eth_saddr;
+	struct ether_addr eth_daddr;
+	uint32_t ip_src_addr;
+	uint32_t ip_dst_addr;
+
+	uint32_t recv_ack;	/**< acknowledgment sequence number. */
+	uint16_t src_port;
+	uint16_t dst_port;
+};
+
+struct gro_tcp4_key {
+	struct tcp4_key key;
+	uint32_t start_index;	/**< the first packet index of the flow */
+	uint8_t is_valid;
+};
+
+struct gro_tcp4_item {
+	struct rte_mbuf *pkt;	/**< packet address. */
+	struct rte_mbuf *lastseg;	/**< last segment of the packet */
+	/* the time when the packet is added into the table */
+	uint64_t start_time;
+	uint32_t next_pkt_idx;	/**< next packet index. */
+};
+
+/**
+ * TCP/IPv4 reassembly table.
+ */
+struct gro_tcp4_tbl {
+	struct gro_tcp4_item *items;	/**< item array */
+	struct gro_tcp4_key *keys;	/**< key array */
+	uint32_t item_num;	/**< current item number */
+	uint32_t key_num;	/**< current key num */
+	uint32_t max_item_num;	/**< item array size */
+	uint32_t max_key_num;	/**< key array size */
+};
+
+/**
+ * This function creates a TCP/IPv4 reassembly table.
+ *
+ * @param socket_id
+ *  socket index where the Ethernet port connects to.
+ * @param max_flow_num
+ *  the maximum number of flows in the TCP/IPv4 GRO table
+ * @param max_item_per_flow
+ *  the maximum packet number per flow.
+ * @return
+ *  on success, return a pointer to the created TCP/IPv4 GRO table.
+ *  Otherwise, return NULL.
+ */
+void *gro_tcp4_tbl_create(uint16_t socket_id,
+		uint16_t max_flow_num,
+		uint16_t max_item_per_flow);
+
+/**
+ * This function destroys a TCP/IPv4 reassembly table.
+ * @param tbl
+ *  a pointer to the TCP/IPv4 reassembly table.
+ */
+void gro_tcp4_tbl_destroy(void *tbl);
+
+/**
+ * This function searches the TCP/IPv4 reassembly table for a packet
+ * to merge with the input one. Merging two packets means chaining them
+ * together and updating the packet headers. Packets whose SYN, FIN,
+ * RST, PSH or URG bit is set are returned immediately, and so are
+ * packets which only have packet headers (i.e. without data).
+ * Otherwise, the packet is either merged or inserted into the table.
+ * Besides, if there is no available space to insert the packet, this
+ * function also returns immediately.
+ *
+ * This function assumes the input packet has correct IPv4 and TCP
+ * checksums, and it won't re-calculate them when two packets are
+ * merged. Besides, if the input packet is IP fragmented, it assumes
+ * the packet is complete (i.e. with the TCP header).
+ *
+ * @param pkt
+ *  packet to reassemble.
+ * @param tbl
+ *  a pointer to a TCP/IPv4 reassembly table.
+ * @param max_packet_size
+ *  the max packet length after merging.
+ * @param start_time
+ *  the time when the packet is inserted into the table.
+ * @return
+ *  if the packet doesn't have data, or SYN, FIN, RST, PSH or URG bit
+ *  is set, or there is no available space in the table to insert a new
+ *  item or a new key, return a negative value. If the packet is merged
+ *  successfully, return a positive value. If the packet is inserted
+ *  into the table, return 0.
+ */
+int32_t
+gro_tcp4_reassemble(struct rte_mbuf *pkt,
+		struct gro_tcp4_tbl *tbl,
+		uint32_t max_packet_size,
+		uint64_t start_time);
+
+/**
+ * This function flushes timed-out packets from a TCP/IPv4 reassembly
+ * table to the application, without updating checksums for merged
+ * packets. The max number of flushed packets is the number of elements
+ * in the array which is used to keep the flushed packets.
+ *
+ * @param tbl
+ *  a pointer to a TCP/IPv4 reassembly table.
+ * @param timeout_cycles
+ *  the maximum time that packets can stay in the table.
+ * @param out
+ *  pointer array which is used to keep flushed packets.
+ * @param nb_out
+ *  the number of elements in out, which is also the max number of
+ *  timeout packets that can be flushed.
+ * @return
+ *  the number of packets that are returned.
+ */
+uint16_t
+gro_tcp4_tbl_timeout_flush(struct gro_tcp4_tbl *tbl,
+		uint64_t timeout_cycles,
+		struct rte_mbuf **out,
+		uint16_t nb_out);
+
+/**
+ * This function returns the number of packets in a TCP/IPv4
+ * reassembly table.
+ *
+ * @param tbl
+ *  a pointer to a TCP/IPv4 reassembly table.
+ * @return
+ *  the number of packets in the table
+ */
+uint32_t
+gro_tcp4_tbl_item_num(void *tbl);
+#endif
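
To make the key/item layout concrete, a small example with arbitrary
indices: suppose keys[0] = { key = C, start_index = 3, is_valid = 1 },
items[3].next_pkt_idx = 7 and items[7].next_pkt_idx =
INVALID_ARRAY_INDEX. Then the item group of keys[0] holds the packets
kept in items[3] and items[7]; gro_tcp4_reassemble walks the group via
next_pkt_idx, and gro_tcp4_tbl_timeout_flush advances start_index as it
flushes items from the head of the chain.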
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 141+ messages in thread

* [PATCH v10 3/3] app/testpmd: enable TCP/IPv4 GRO
  2017-07-01 11:08                 ` [PATCH v10 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
  2017-07-01 11:08                   ` [PATCH v10 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
  2017-07-01 11:08                   ` [PATCH v10 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
@ 2017-07-01 11:08                   ` Jiayu Hu
  2017-07-05  4:08                   ` [PATCH v11 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
  3 siblings, 0 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-07-01 11:08 UTC (permalink / raw)
  To: dev
  Cc: konstantin.ananyev, stephen, jianfeng.tan, yliu, jingjing.wu,
	lei.a.yao, tiwei.bie, Jiayu Hu

This patch enables the TCP/IPv4 GRO library in the csum forwarding
engine. By default, GRO is turned off. Users can use the command
"gro (on|off) (port_id)" to enable or disable GRO for a given port.
If GRO is enabled on a port, GRO is performed on all TCP/IPv4 packets
received from that port. Besides, users can set the max flow number and
the max packet number per flow with the command "gro set (max_flow_num)
(max_item_num_per_flow) (port_id)".

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Jingjing Wu <jingjing.wu@intel.com>
---
 app/test-pmd/cmdline.c                      | 125 ++++++++++++++++++++++++++++
 app/test-pmd/config.c                       |  37 ++++++++
 app/test-pmd/csumonly.c                     |   5 ++
 app/test-pmd/testpmd.c                      |   3 +
 app/test-pmd/testpmd.h                      |  11 +++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  34 ++++++++
 6 files changed, 215 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index ff8ffd2..cb359e1 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -76,6 +76,7 @@
 #include <rte_devargs.h>
 #include <rte_eth_ctrl.h>
 #include <rte_flow.h>
+#include <rte_gro.h>
 
 #include <cmdline_rdline.h>
 #include <cmdline_parse.h>
@@ -419,6 +420,14 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"tso show (portid)"
 			"    Display the status of TCP Segmentation Offload.\n\n"
 
+			"gro (on|off) (port_id)"
+			"    Enable or disable Generic Receive Offload in io"
+			" forward engine.\n\n"
+
+			"gro set (max_flow_num) (max_item_num_per_flow) (port_id)\n"
+			"    Set max flow number and max packet number per-flow"
+			" for GRO.\n\n"
+
 			"set fwd (%s)\n"
 			"    Set packet forwarding mode.\n\n"
 
@@ -3827,6 +3836,120 @@ cmdline_parse_inst_t cmd_tunnel_tso_show = {
 	},
 };
 
+/* *** SET GRO FOR A PORT *** */
+struct cmd_gro_result {
+	cmdline_fixed_string_t cmd_keyword;
+	cmdline_fixed_string_t mode;
+	uint8_t port_id;
+};
+
+static void
+cmd_enable_gro_parsed(void *parsed_result,
+		__attribute__((unused)) struct cmdline *cl,
+		__attribute__((unused)) void *data)
+{
+	struct cmd_gro_result *res;
+
+	res = parsed_result;
+	setup_gro(res->mode, res->port_id);
+}
+
+cmdline_parse_token_string_t cmd_gro_keyword =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_result,
+			cmd_keyword, "gro");
+cmdline_parse_token_string_t cmd_gro_mode =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_result,
+			mode, "on#off");
+cmdline_parse_token_num_t cmd_gro_pid =
+	TOKEN_NUM_INITIALIZER(struct cmd_gro_result,
+			port_id, UINT8);
+
+cmdline_parse_inst_t cmd_enable_gro = {
+	.f = cmd_enable_gro_parsed,
+	.data = NULL,
+	.help_str = "gro (on|off) (port_id)",
+	.tokens = {
+		(void *)&cmd_gro_keyword,
+		(void *)&cmd_gro_mode,
+		(void *)&cmd_gro_pid,
+		NULL,
+	},
+};
+
+/* *** SET MAX FLOW NUMBER AND ITEM NUM PER FLOW FOR GRO *** */
+struct cmd_gro_set_result {
+	cmdline_fixed_string_t gro;
+	cmdline_fixed_string_t mode;
+	uint16_t flow_num;
+	uint16_t item_num_per_flow;
+	uint8_t port_id;
+};
+
+static void
+cmd_gro_set_parsed(void *parsed_result,
+		       __attribute__((unused)) struct cmdline *cl,
+		       __attribute__((unused)) void *data)
+{
+	struct cmd_gro_set_result *res = parsed_result;
+
+	if (port_id_is_invalid(res->port_id, ENABLED_WARN))
+		return;
+	if (test_done == 0) {
+		printf("Before set GRO flow_num and item_num_per_flow,"
+				" please stop forwarding first\n");
+		return;
+	}
+
+	if (!strcmp(res->mode, "set")) {
+		if (res->flow_num == 0)
+			printf("Invalid flow number. Revert to default value:"
+					" %u.\n", GRO_DEFAULT_FLOW_NUM);
+		else
+			gro_ports[res->port_id].param.max_flow_num =
+				res->flow_num;
+
+		if (res->item_num_per_flow == 0)
+			printf("Invalid item number per-flow. Revert"
+					" to default value:%u.\n",
+					GRO_DEFAULT_ITEM_NUM_PER_FLOW);
+		else
+			gro_ports[res->port_id].param.max_item_per_flow =
+				res->item_num_per_flow;
+	}
+}
+
+cmdline_parse_token_string_t cmd_gro_set_gro =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_set_result,
+				gro, "gro");
+cmdline_parse_token_string_t cmd_gro_set_mode =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_set_result,
+				mode, "set");
+cmdline_parse_token_num_t cmd_gro_set_flow_num =
+	TOKEN_NUM_INITIALIZER(struct cmd_gro_set_result,
+				flow_num, UINT16);
+cmdline_parse_token_num_t cmd_gro_set_item_num_per_flow =
+	TOKEN_NUM_INITIALIZER(struct cmd_gro_set_result,
+				item_num_per_flow, UINT16);
+cmdline_parse_token_num_t cmd_gro_set_portid =
+	TOKEN_NUM_INITIALIZER(struct cmd_gro_set_result,
+				port_id, UINT8);
+
+cmdline_parse_inst_t cmd_gro_set = {
+	.f = cmd_gro_set_parsed,
+	.data = NULL,
+	.help_str = "gro set <max_flow_num> <max_item_num_per_flow> "
+		"<port_id>: set max flow number and max packet number per-flow "
+		"for GRO",
+	.tokens = {
+		(void *)&cmd_gro_set_gro,
+		(void *)&cmd_gro_set_mode,
+		(void *)&cmd_gro_set_flow_num,
+		(void *)&cmd_gro_set_item_num_per_flow,
+		(void *)&cmd_gro_set_portid,
+		NULL,
+	},
+};
+
 /* *** ENABLE/DISABLE FLUSH ON RX STREAMS *** */
 struct cmd_set_flush_rx {
 	cmdline_fixed_string_t set;
@@ -13732,6 +13855,8 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_tso_show,
 	(cmdline_parse_inst_t *)&cmd_tunnel_tso_set,
 	(cmdline_parse_inst_t *)&cmd_tunnel_tso_show,
+	(cmdline_parse_inst_t *)&cmd_enable_gro,
+	(cmdline_parse_inst_t *)&cmd_gro_set,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_rx,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_tx,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index b0b340e..a5285b0 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -71,6 +71,7 @@
 #ifdef RTE_LIBRTE_BNXT_PMD
 #include <rte_pmd_bnxt.h>
 #endif
+#include <rte_gro.h>
 
 #include "testpmd.h"
 
@@ -2414,6 +2415,42 @@ set_tx_pkt_segments(unsigned *seg_lengths, unsigned nb_segs)
 	tx_pkt_nb_segs = (uint8_t) nb_segs;
 }
 
+void
+setup_gro(const char *mode, uint8_t port_id)
+{
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		printf("invalid port id %u\n", port_id);
+		return;
+	}
+	if (test_done == 0) {
+		printf("Before enable/disable GRO,"
+				" please stop forwarding first\n");
+		return;
+	}
+	if (strcmp(mode, "on") == 0) {
+		if (gro_ports[port_id].enable) {
+			printf("port %u has enabled GRO\n", port_id);
+			return;
+		}
+		gro_ports[port_id].enable = 1;
+		gro_ports[port_id].param.desired_gro_types = RTE_GRO_TCP_IPV4;
+		gro_ports[port_id].param.max_packet_size = UINT16_MAX;
+
+		if (gro_ports[port_id].param.max_flow_num == 0)
+			gro_ports[port_id].param.max_flow_num =
+				GRO_DEFAULT_FLOW_NUM;
+		if (gro_ports[port_id].param.max_item_per_flow == 0)
+			gro_ports[port_id].param.max_item_per_flow =
+				GRO_DEFAULT_ITEM_NUM_PER_FLOW;
+	} else {
+		if (gro_ports[port_id].enable == 0) {
+			printf("port %u has disabled GRO\n", port_id);
+			return;
+		}
+		gro_ports[port_id].enable = 0;
+	}
+}
+
 char*
 list_pkt_forwarding_modes(void)
 {
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 66fc9a0..178ad75 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -71,6 +71,7 @@
 #include <rte_prefetch.h>
 #include <rte_string_fns.h>
 #include <rte_flow.h>
+#include <rte_gro.h>
 #include "testpmd.h"
 
 #define IP_DEFTTL  64   /* from RFC 1340. */
@@ -658,6 +659,10 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 				 nb_pkt_per_burst);
 	if (unlikely(nb_rx == 0))
 		return;
+	if (unlikely(gro_ports[fs->rx_port].enable))
+		nb_rx = rte_gro_reassemble_burst(pkts_burst,
+				nb_rx,
+				&(gro_ports[fs->rx_port].param));
 
 #ifdef RTE_TEST_PMD_RECORD_BURST_STATS
 	fs->rx_burst_stats.pkt_burst_spread[nb_rx]++;
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index b29328a..ed27c7a 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -90,6 +90,7 @@
 #ifdef RTE_LIBRTE_LATENCY_STATS
 #include <rte_latencystats.h>
 #endif
+#include <rte_gro.h>
 
 #include "testpmd.h"
 
@@ -378,6 +379,8 @@ lcoreid_t bitrate_lcore_id;
 uint8_t bitrate_enabled;
 #endif
 
+struct gro_status gro_ports[RTE_MAX_ETHPORTS];
+
 /* Forward function declarations */
 static void map_port_queue_stats_mapping_registers(uint8_t pi, struct rte_port *port);
 static void check_all_ports_link_status(uint32_t port_mask);
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 364502d..377d933 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -34,6 +34,8 @@
 #ifndef _TESTPMD_H_
 #define _TESTPMD_H_
 
+#include <rte_gro.h>
+
 #define RTE_PORT_ALL            (~(portid_t)0x0)
 
 #define RTE_TEST_RX_DESC_MAX    2048
@@ -428,6 +430,14 @@ extern struct ether_addr peer_eth_addrs[RTE_MAX_ETHPORTS];
 extern uint32_t burst_tx_delay_time; /**< Burst tx delay time(us) for mac-retry. */
 extern uint32_t burst_tx_retry_num;  /**< Burst tx retry number for mac-retry. */
 
+#define GRO_DEFAULT_FLOW_NUM 4
+#define GRO_DEFAULT_ITEM_NUM_PER_FLOW DEF_PKT_BURST
+struct gro_status {
+	struct rte_gro_param param;
+	uint8_t enable;
+};
+extern struct gro_status gro_ports[RTE_MAX_ETHPORTS];
+
 static inline unsigned int
 lcore_num(void)
 {
@@ -626,6 +636,7 @@ void get_2tuple_filter(uint8_t port_id, uint16_t index);
 void get_5tuple_filter(uint8_t port_id, uint16_t index);
 int rx_queue_id_is_invalid(queueid_t rxq_id);
 int tx_queue_id_is_invalid(queueid_t txq_id);
+void setup_gro(const char *mode, uint8_t port_id);
 
 /* Functions to manage the set of filtered Multicast MAC addresses */
 void mcast_addr_add(uint8_t port_id, struct ether_addr *mc_addr);
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 2b9a1ea..528c833 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -884,6 +884,40 @@ Display the status of TCP Segmentation Offload::
 
    testpmd> tso show (port_id)
 
+gro
+~~~~~~~~
+
+Enable or disable GRO in ``csum`` forwarding engine::
+
+   testpmd> gro (on|off) (port_id)
+
+If enabled, the csum forwarding engine will perform GRO on the TCP/IPv4
+packets received from the given port.
+
+If disabled, GRO won't be performed on packets received from the given
+port. By default, GRO is disabled for all ports.
+
+.. note::
+
+   When GRO is enabled for a port, GRO is performed on the TCP/IPv4
+   packets received from that port. After GRO, the merged packets are
+   multi-segment. But the csum forwarding engine can't calculate the TCP
+   checksum for multi-segment packets in SW, so please select TCP HW
+   checksum calculation for the port to which GROed packets are sent.
+
+gro set
+~~~~~~~~
+
+Set max flow number and max packet number per-flow for GRO::
+
+   testpmd> gro set (max_flow_num) (max_item_num_per_flow) (port_id)
+
+The product of ``max_flow_num`` and ``max_item_num_per_flow`` is the max
+number of packets a GRO table can store.
+
+If the current number of stored packets is greater than or equal to this
+max value, GRO will stop processing incoming packets.
+
 mac_addr add
 ~~~~~~~~~~~~
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 141+ messages in thread

* Re: [PATCH v10 2/3] lib/gro: add TCP/IPv4 GRO support
  2017-07-01 11:08                   ` [PATCH v10 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
@ 2017-07-02 10:19                     ` Tan, Jianfeng
  2017-07-03  5:13                       ` Hu, Jiayu
  2017-07-04  9:03                     ` Yuanhan Liu
  1 sibling, 1 reply; 141+ messages in thread
From: Tan, Jianfeng @ 2017-07-02 10:19 UTC (permalink / raw)
  To: Jiayu Hu, dev
  Cc: konstantin.ananyev, stephen, yliu, jingjing.wu, lei.a.yao, tiwei.bie



On 7/1/2017 7:08 PM, Jiayu Hu wrote:
> [... full patch quoted; snipped ...]
> +	uint16_t tcp_dl1;
> +
> +	if (cmp > 0) {
> +		/* append the new packet into tail */
> +		pkt_head = item_src->pkt;
> +		pkt_tail = pkt;
> +	} else {
> +		/* append the new packet into head */

Typo: append -> prepend

> +		pkt_head = pkt;
> +		pkt_tail = item_src->pkt;
> +	}
> +
> +	/* parse the tail packet */
> +	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt_tail,
> +				char *) + pkt_tail->l2_len);
> +	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) -
> +		pkt_tail->l3_len - pkt_tail->l4_len;
> +
> +	if (pkt_head->pkt_len + tcp_dl1 > max_packet_size)
> +		return -1;
> +
> +	/* remove packet header for the tail packet */
> +	rte_pktmbuf_adj(pkt_tail, pkt_tail->l2_len +
> +			pkt_tail->l3_len +
> +			pkt_tail->l4_len);
> +
> +	if (cmp > 0) {
> +		/* chain the new packet to the tail of the original packet */
> +		item_src->lastseg->next = pkt_tail;
> +		/* update the lastseg for the item */
> +		item_src->lastseg = get_mbuf_lastseg(pkt_tail);
> +	} else {
> +		/* chain the original packet to the tail of the new packet */
> +		lastseg = get_mbuf_lastseg(pkt_head);
> +		lastseg->next = pkt_tail;
> +		/* update the item */
> +		item_src->pkt = pkt_head;
> +	}
> +
> +	/* parse the head packet */
> +	ipv4_hdr0 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt_head,
> +				char *) + pkt_head->l2_len);
> +
> +	/* update IP header for the head packet */
> +	ipv4_hdr0->total_length = rte_cpu_to_be_16(
> +			rte_be_to_cpu_16(
> +				ipv4_hdr0->total_length)
> +			+ tcp_dl1);
> +	ipv4_hdr0->packet_id = ipv4_hdr1->packet_id;

Why do we bother to change the id field in the IP header? I think it's
not necessary.

> +
> +	/* update mbuf metadata for the merged packet */
> +	pkt_head->nb_segs += pkt_tail->nb_segs;
> +	pkt_head->pkt_len += pkt_tail->pkt_len;
> +	return 1;
> +}
> +
> +static int
> +check_seq_option(struct rte_mbuf *pkt,
> +		struct tcp_hdr *tcp_hdr,
> +		uint16_t ip_id,
> +		uint16_t tcp_hl,
> +		uint16_t tcp_dl)
> +{
> +	struct ipv4_hdr *ipv4_hdr0;
> +	struct tcp_hdr *tcp_hdr0;
> +	uint16_t tcp_hl0, tcp_dl0, ip_id0;
> +	uint32_t sent_seq0, sent_seq;
> +	uint16_t len;
> +
> +	ipv4_hdr0 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
> +				char *) + pkt->l2_len);
> +	tcp_hdr0 = (struct tcp_hdr *)((char *)ipv4_hdr0 + pkt->l3_len);
> +
> +	ip_id0 = rte_be_to_cpu_16(ipv4_hdr0->packet_id);
> +	tcp_hl0 = pkt->l4_len;
> +	tcp_dl0 = rte_be_to_cpu_16(ipv4_hdr0->total_length) -
> +		pkt->l3_len - tcp_hl0;
> +
> +	/* check if TCP option fields equal. If not, return 0. */
> +	len = RTE_MAX(tcp_hl, tcp_hl0) - sizeof(struct tcp_hdr);
> +	if ((tcp_hl != tcp_hl0) || ((len > 0) &&
> +				(memcmp(tcp_hdr + 1,
> +						tcp_hdr0 + 1,
> +						len) != 0)))
> +		return 0;
> +
> +	/* check if the two packets are neighbors */
> +	sent_seq0 = rte_be_to_cpu_32(tcp_hdr0->sent_seq);
> +	sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq);
> +
> +	if ((sent_seq == (sent_seq0 + tcp_dl0)) &&
> +			(ip_id == (ip_id0 + 1)))
> +		/* append the new packet into tail */
> +		return 1;
> +	else if (((sent_seq + tcp_dl) == sent_seq0) &&
> +		((ip_id + 1) == ip_id0))
> +		/* append the new packet into head */
> +		return -1;
> +	else
> +		return 0;
> +}
> +
> +static uint32_t
> +find_an_empty_item(struct gro_tcp4_tbl *tbl)
> +{
> +	uint32_t i;
> +
> +	for (i = 0; i < tbl->max_item_num; i++)
> +		if (tbl->items[i].pkt == NULL)
> +			return i;
> +	return INVALID_ARRAY_INDEX;
> +}
> +
> +static uint32_t
> +find_an_empty_key(struct gro_tcp4_tbl *tbl)
> +{
> +	uint32_t i;
> +
> +	for (i = 0; i < tbl->max_key_num; i++)
> +		if (tbl->keys[i].is_valid == 0)
> +			return i;
> +	return INVALID_ARRAY_INDEX;
> +}
> +
> +int32_t
> +gro_tcp4_reassemble(struct rte_mbuf *pkt,
> +		struct gro_tcp4_tbl *tbl,
> +		uint32_t max_packet_size,
> +		uint64_t start_time)
> +{
> +	struct ether_hdr *eth_hdr;
> +	struct ipv4_hdr *ipv4_hdr;
> +	struct tcp_hdr *tcp_hdr;
> +	uint16_t tcp_dl, ip_id;
> +
> +	struct tcp4_key key;
> +	uint32_t cur_idx, prev_idx, item_idx;
> +	uint32_t i, key_idx;
> +	int cmp;
> +
> +	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
> +	ipv4_hdr = (struct ipv4_hdr *)((char *)eth_hdr + pkt->l2_len);
> +
> +	/* check if the packet should be processed */
> +	if (pkt->l3_len < sizeof(struct ipv4_hdr))
> +		return -1;

Unnecessary precheck as you already checked it's a valid IPv4 + TCP 
packet in the upper gro framework.

> +	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + pkt->l3_len);
> +	/* if SYN, FIN, RST, or URG is set, return immediately */
> +	if ((tcp_hdr->tcp_flags & (~((uint8_t)TCP_ACK_FLAG))) ||
> +			((tcp_hdr->tcp_flags & TCP_ACK_FLAG) == 0))

This can be simplified to: if (tcp_hdr->tcp_flags != TCP_ACK_FLAG)

And in the above comment, we should state that CWR and ECE are also out
of scope.
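
With that, the check and its comment could read like this (just a
sketch of the suggested wording):

	/* merge the packet only if ACK alone is set; any other flag,
	 * including CWR and ECE, means return it immediately */
	if (tcp_hdr->tcp_flags != TCP_ACK_FLAG)
		return -1;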

> +		return -1;
> +	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - pkt->l3_len
> +		- pkt->l4_len;
> +	if (tcp_dl == 0)
> +		return -1;
> +
> +	ip_id = rte_be_to_cpu_16(ipv4_hdr->packet_id);
> +
> +	/* find a key and traverse all packets in its item group */
> +	key.eth_saddr = eth_hdr->s_addr;
> +	key.eth_daddr = eth_hdr->d_addr;
> +	key.ip_src_addr = rte_be_to_cpu_32(ipv4_hdr->src_addr);
> +	key.ip_dst_addr = rte_be_to_cpu_32(ipv4_hdr->dst_addr);
> +	key.src_port = rte_be_to_cpu_16(tcp_hdr->src_port);
> +	key.dst_port = rte_be_to_cpu_16(tcp_hdr->dst_port);
> +	key.recv_ack = rte_be_to_cpu_32(tcp_hdr->recv_ack);

Why do we bother with the endianness here? We only use these fields for
equality comparison.
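
For instance, the key could simply keep the raw big-endian fields (a
sketch; equality comparison is unaffected):

	key.ip_src_addr = ipv4_hdr->src_addr;
	key.ip_dst_addr = ipv4_hdr->dst_addr;
	key.src_port = tcp_hdr->src_port;
	key.dst_port = tcp_hdr->dst_port;
	key.recv_ack = tcp_hdr->recv_ack;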

> +
> +	for (i = 0; i < tbl->max_key_num; i++) {
> +		/* search for a key */
> +		if ((tbl->keys[i].is_valid == 0) ||
> +				(memcmp(&(tbl->keys[i].key), &key,
> +						sizeof(struct tcp4_key)) != 0))
> +			continue;

Please move the code below out of the for loop to reduce the indent.
Further, it would be better to keep this flow identification in an
inline function, as I think we will eventually optimize it when there
are lots of flows.

As Stephen suggested for the memset, I think the same applies to the
memcpy and memcmp on the key here and elsewhere.
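
Something along these lines (a sketch; tcp4_key_match is just a
placeholder name):

static inline int
tcp4_key_match(struct gro_tcp4_tbl *tbl,
		struct tcp4_key *key,
		uint32_t *key_idx)
{
	uint32_t i;

	for (i = 0; i < tbl->max_key_num; i++) {
		if (tbl->keys[i].is_valid &&
				(memcmp(&(tbl->keys[i].key), key,
					sizeof(struct tcp4_key)) == 0)) {
			*key_idx = i;
			return 1;
		}
	}
	return 0;
}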

> +
> +		cur_idx = tbl->keys[i].start_index;
> +		prev_idx = cur_idx;
> +		while (cur_idx != INVALID_ARRAY_INDEX) {

A do { ... } while loop here would save one comparison.
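
That is, roughly (sketch; a valid key always has at least one item, so
the first iteration is safe):

	cur_idx = tbl->keys[i].start_index;
	prev_idx = cur_idx;
	do {
		cmp = check_seq_option(tbl->items[cur_idx].pkt, tcp_hdr,
				ip_id, pkt->l4_len, tcp_dl);
		if (cmp != 0) {
			...
		}
		prev_idx = cur_idx;
		cur_idx = tbl->items[cur_idx].next_pkt_idx;
	} while (cur_idx != INVALID_ARRAY_INDEX);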

> +			cmp = check_seq_option(tbl->items[cur_idx].pkt,
> +					tcp_hdr,
> +					ip_id,
> +					pkt->l4_len,
> +					tcp_dl);
> +			if (cmp != 0) {
> +				if (merge_two_tcp4_packets(
> +							&(tbl->items[cur_idx]),
> +							pkt,
> +							max_packet_size,
> +							cmp) > 0)
> +					return 1;
> +				/**
> +				 * failed to merge the two packets, as
> +				 * the merged packet would exceed the
> +				 * max packet length. Insert the new
> +				 * packet into the item group.
> +				 */
> +				item_idx = find_an_empty_item(tbl);
> +				if (item_idx == INVALID_ARRAY_INDEX)
> +					return -1;
> +				tbl->items[prev_idx].next_pkt_idx = item_idx;
> +				tbl->items[item_idx].pkt = pkt;
> +				tbl->items[item_idx].lastseg =
> +					get_mbuf_lastseg(pkt);
> +				tbl->items[item_idx].next_pkt_idx =
> +					INVALID_ARRAY_INDEX;
> +				tbl->items[item_idx].start_time = start_time;
> +				tbl->item_num++;

This code piece is duplicated below. Please abstract it into a
function named, for example, gro_tcp4_add_new_item().
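
For example (a sketch of the suggested helper; gro_tcp4_add_new_item
is only a suggested name):

static uint32_t
gro_tcp4_add_new_item(struct gro_tcp4_tbl *tbl,
		struct rte_mbuf *pkt,
		uint64_t start_time)
{
	uint32_t item_idx = find_an_empty_item(tbl);

	if (item_idx == INVALID_ARRAY_INDEX)
		return INVALID_ARRAY_INDEX;
	tbl->items[item_idx].pkt = pkt;
	tbl->items[item_idx].lastseg = get_mbuf_lastseg(pkt);
	tbl->items[item_idx].next_pkt_idx = INVALID_ARRAY_INDEX;
	tbl->items[item_idx].start_time = start_time;
	tbl->item_num++;
	return item_idx;
}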

> +				return 0;
> +			}
> +			prev_idx = cur_idx;
> +			cur_idx = tbl->items[cur_idx].next_pkt_idx;
> +		}
> +		/**
> +		 * found a corresponding item group, but failed to find
> +		 * one packet to merge with. Insert the new packet into
> +		 * this item group.
> +		 */
> +		item_idx = find_an_empty_item(tbl);
> +		if (item_idx == INVALID_ARRAY_INDEX)
> +			return -1;
> +		tbl->items[prev_idx].next_pkt_idx = item_idx;
> +		tbl->items[item_idx].pkt = pkt;
> +		tbl->items[item_idx].lastseg =
> +			get_mbuf_lastseg(pkt);
> +		tbl->items[item_idx].next_pkt_idx = INVALID_ARRAY_INDEX;
> +		tbl->items[item_idx].start_time = start_time;
> +		tbl->item_num++;
> +		return 0;
> +	}
> +
> +	/**
> +	 * the merge failed, as the given packet has
> +	 * a new key. So insert a new key.
> +	 */
> +	item_idx = find_an_empty_item(tbl);
> +	key_idx = find_an_empty_key(tbl);
> +	/**
> +	 * if current key or item number is greater than the max
> +	 * value, don't insert the packet into the table and return
> +	 * immediately.
> +	 */
> +	if (item_idx == INVALID_ARRAY_INDEX ||
> +			key_idx == INVALID_ARRAY_INDEX)
> +		return -1;
> +	tbl->items[item_idx].pkt = pkt;
> +	tbl->items[item_idx].lastseg = get_mbuf_lastseg(pkt);
> +	tbl->items[item_idx].next_pkt_idx = INVALID_ARRAY_INDEX;
> +	tbl->items[item_idx].start_time = start_time;
> +	tbl->item_num++;
> +
> +	memcpy(&(tbl->keys[key_idx].key),
> +			&key, sizeof(struct tcp4_key));
> +	tbl->keys[key_idx].start_index = item_idx;
> +	tbl->keys[key_idx].is_valid = 1;
> +	tbl->key_num++;
> +
> +	return 0;
> +}
> +
> +uint16_t
> +gro_tcp4_tbl_timeout_flush(struct gro_tcp4_tbl *tbl,
> +		uint64_t timeout_cycles,
> +		struct rte_mbuf **out,
> +		uint16_t nb_out)
> +{
> +	uint16_t k = 0;
> +	uint32_t i, j;
> +	uint64_t current_time;
> +
> +	current_time = rte_rdtsc();
> +	for (i = 0; i < tbl->max_key_num; i++) {
> +		/* all keys have been checked, return immediately */
> +		if (tbl->key_num == 0)
> +			return k;
> +
> +		if (tbl->keys[i].is_valid == 0)
> +			continue;
> +
> +		j = tbl->keys[i].start_index;
> +		while (j != INVALID_ARRAY_INDEX) {
> +			if (current_time - tbl->items[j].start_time >=
> +					timeout_cycles) {
> +				out[k++] = tbl->items[j].pkt;
> +				tbl->items[j].pkt = NULL;
> +				tbl->item_num--;
> +				j = tbl->items[j].next_pkt_idx;
> +
> +				/**
> +				 * delete the key as all of
> +				 * its packets are flushed.
> +				 */
> +				if (j == INVALID_ARRAY_INDEX) {
> +					tbl->keys[i].is_valid = 0;
> +					tbl->key_num--;
> +				} else
> +					/* update start_index of the key */
> +					tbl->keys[i].start_index = j;
> +
> +				if (k == nb_out)
> +					return k;
> +			} else
> +				/**
> +				 * the remaining packets of this key won't
> +				 * time out, so go to check other keys.
> +				 */
> +				break;
> +		}
> +	}
> +	return k;
> +}
> +
> +uint32_t gro_tcp4_tbl_item_num(void *tbl)
> +{
> +	struct gro_tcp4_tbl *gro_tbl = (struct gro_tcp4_tbl *)tbl;
> +
> +	if (gro_tbl)
> +		return gro_tbl->item_num;
> +	return 0;
> +}
> diff --git a/lib/librte_gro/rte_gro_tcp4.h b/lib/librte_gro/rte_gro_tcp4.h
> new file mode 100644
> index 0000000..27856f8
> --- /dev/null
> +++ b/lib/librte_gro/rte_gro_tcp4.h
> @@ -0,0 +1,172 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef _RTE_GRO_TCP4_H_
> +#define _RTE_GRO_TCP4_H_
> +
> +#define INVALID_ARRAY_INDEX 0xffffffffUL
> +#define GRO_TCP4_TBL_MAX_ITEM_NUM (UINT32_MAX - 1)
> +
> +/* criteria of merging packets */
> +struct tcp4_key {
> +	struct ether_addr eth_saddr;
> +	struct ether_addr eth_daddr;

Why do we need to keep the ether addr in the key? Suppose we are
running a bonding port combining phy port 0 and phy port 1. Then no
matter which dst mac addr is used, these packets belong to the same
flow.

> +	uint32_t ip_src_addr;
> +	uint32_t ip_dst_addr;
> +
> +	uint32_t recv_ack;	/**< acknowledgment sequence number. */
> +	uint16_t src_port;
> +	uint16_t dst_port;
> +};
> +
> +struct gro_tcp4_key {
> +	struct tcp4_key key;
> +	uint32_t start_index;	/**< the first packet index of the flow */
> +	uint8_t is_valid;
> +};
> +
> +struct gro_tcp4_item {
> +	struct rte_mbuf *pkt;	/**< packet address. */
> +	struct rte_mbuf *lastseg;	/**< last segment of the packet */
> +	/* the time when the packet is added into the table */
> +	uint64_t start_time;

I think the comment should say this is the time of the eldest packet
in this item. And an explicit unit statement would help understanding.

> +	uint32_t next_pkt_idx;	/**< next packet index. */

The comment is not good enough. We use this field to chain all packets 
belonging to the same flow but not mergeable in sequence.

What's more, if we change this field to a pointer, we can make this
structure universal for all gro engines, for example:

struct gro_item {
     struct gro_item *next;
     struct rte_mbuf *first_seg;
     struct rte_mbuf *last_seg;
     uint64_t start_time;
};

And I think if we store two more fields, seq_begin and seq_end, in the
item, we can avoid touching any mbufs' metadata.
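
With seq_begin/seq_end in the item, the neighbor check wouldn't need
to read the stored mbuf at all, e.g. (sketch; field names are just my
suggestion):

	/* is the new packet the immediate successor of this item? */
	if (sent_seq == item->seq_end)
		return 1;
	/* or its immediate predecessor? */
	if (sent_seq + tcp_dl == item->seq_begin)
		return -1;
	return 0;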

> +};
> +
> +/**
> + * TCP/IPv4 reassembly table.
> + */
> +struct gro_tcp4_tbl {
> +	struct gro_tcp4_item *items;	/**< item array */
> +	struct gro_tcp4_key *keys;	/**< key array */
> +	uint32_t item_num;	/**< current item number */
> +	uint32_t key_num;	/**< current key num */
> +	uint32_t max_item_num;	/**< item array size */
> +	uint32_t max_key_num;	/**< key array size */
> +};

This structure could be universal for all gro engines:

struct gro_tbl {
     void *items; /* uses a different initializer for different gro engines */
     void *keys;  /* uses a different initializer for different gro engines */
     uint32_t item_num;
     ...
}

> +
> +/**
> + * This function creates a TCP/IPv4 reassembly table.
> + *
> + * @param socket_id
> + *  socket index where the Ethernet port connects to.
> + * @param max_flow_num
> + *  the maximum number of flows in the TCP/IPv4 GRO table
> + * @param max_item_per_flow
> + *  the maximum packet number per flow.
> + * @return
> + *  if created successfully, return a pointer which points to the
> + *  created TCP/IPv4 GRO table. Otherwise, return NULL.
> + */
> +void *gro_tcp4_tbl_create(uint16_t socket_id,
> +		uint16_t max_flow_num,
> +		uint16_t max_item_per_flow);

Fix the alignment.

> +
> +/**
> + * This function destroys a TCP/IPv4 reassembly table.
> + * @param tbl
> + *  a pointer points to the TCP/IPv4 reassembly table.
> + */
> +void gro_tcp4_tbl_destroy(void *tbl);
> +
> +/**
> + * This function searches for a packet in the TCP/IPv4 reassembly table
> + * to merge with the inputted one. To merge two packets is to chain them
> + * together and update packet headers. Packets, whose SYN, FIN, RST, PSH
> + * or URG bit is set, are returned immediately. Packets which only have
> + * packet headers (i.e. without data) are also returned immediately.
> + * Otherwise, the packet is either merged, or inserted into the table.
> + * Besides, if there is no available space to insert the packet, this
> + * function returns immediately too.
> + *
> + * This function assumes the inputted packet is with correct IPv4 and
> + * TCP checksums. And if two packets are merged, it won't re-calculate
> + * IPv4 and TCP checksums. Besides, if the inputted packet is IP
> + * fragmented, it assumes the packet is complete (with TCP header).
> + *
> + * @param pkt
> + *  packet to reassemble.
> + * @param tbl
> + *  a pointer that points to a TCP/IPv4 reassembly table.
> + * @param max_packet_size
> + *  max packet length after merged
> + * @param start_time
> + *  the start time that the packet is inserted into the table
> + * @return
> + *  if the packet doesn't have data, or SYN, FIN, RST, PSH or URG bit is
> + *  set, or there is no available space in the table to insert a new
> + *  item or a new key, return a negative value. If the packet is merged
> + *  successfully, return a positive value. If the packet is inserted
> + *  into the table, return 0.
> + */
> +int32_t
> +gro_tcp4_reassemble(struct rte_mbuf *pkt,
> +		struct gro_tcp4_tbl *tbl,
> +		uint32_t max_packet_size,
> +		uint64_t start_time);

Ditto.

> +
> +/**
> + * This function flushes timeout packets in a TCP/IPv4 reassembly table
> + * to applications, without updating checksums for merged packets.
> + * The max number of flushed timeout packets is the element number of
> + * the array which is used to keep flushed packets.
> + *
> + * @param tbl
> + *  a pointer that points to a TCP GRO table.
> + * @param timeout_cycles
> + *  the maximum time that packets can stay in the table.
> + * @param out
> + *  pointer array which is used to keep flushed packets.
> + * @param nb_out
> + *  the element number of out. It's also the max number of timeout
> + *  packets that can be flushed finally.
> + * @return
> + *  the number of packets that are returned.
> + */
> +uint16_t
> +gro_tcp4_tbl_timeout_flush(struct gro_tcp4_tbl *tbl,
> +		uint64_t timeout_cycles,
> +		struct rte_mbuf **out,
> +		uint16_t nb_out);

Ditto.

> +
> +/**
> + * This function returns the number of the packets in a TCP/IPv4
> + * reassembly table.
> + *
> + * @param tbl
> + *  pointer points to a TCP/IPv4 reassembly table.
> + * @return
> + *  the number of packets in the table
> + */
> +uint32_t
> +gro_tcp4_tbl_item_num(void *tbl);
> +#endif

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v10 1/3] lib: add Generic Receive Offload API framework
  2017-07-01 11:08                   ` [PATCH v10 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
@ 2017-07-02 10:19                     ` Tan, Jianfeng
  2017-07-03  5:56                       ` Hu, Jiayu
  2017-07-04  8:37                     ` Yuanhan Liu
  1 sibling, 1 reply; 141+ messages in thread
From: Tan, Jianfeng @ 2017-07-02 10:19 UTC (permalink / raw)
  To: Jiayu Hu, dev
  Cc: konstantin.ananyev, stephen, yliu, jingjing.wu, lei.a.yao, tiwei.bie



On 7/1/2017 7:08 PM, Jiayu Hu wrote:
> Generic Receive Offload (GRO) is a widely used SW-based offloading
> technique to reduce per-packet processing overhead. It gains
> performance by reassembling small packets into large ones. This
> patchset is to support GRO in DPDK. To support GRO, this patch
> implements a GRO API framework.
>
> To enable more flexibility to applications, DPDK GRO is implemented as
> a user library. Applications explicitly use the GRO library to merge
> small packets into large ones. DPDK GRO provides two reassembly modes.
> One is called lightweight mode, the other is called heavyweight mode.
> If applications want to merge packets in a simple way and the number
> of packets is relatively small, they can use the lightweight mode.
> If applications need more fine-grained controls, they can choose the
> heavyweight mode.
>
> rte_gro_reassemble_burst is the main reassembly API which is used in
> lightweight mode and processes N packets at a time. For applications,
> performing GRO in lightweight mode is simple. They just need to invoke
> rte_gro_reassemble_burst. Applications can get GROed packets as soon as
> rte_gro_reassemble_burst returns.
>
> rte_gro_reassemble is the main reassembly API which is used in
> heavyweight mode and tries to merge N inputted packets with the packets
> in a given GRO table. For applications, performing GRO in heavyweight
> mode is relatively complicated. Before performing GRO, applications need
> to create a GRO table by rte_gro_tbl_create. Then they can use
> rte_gro_reassemble to merge packets. The GROed packets are in the GRO
> table. If applications want to get them, they need to manually flush
> them via the flush API.
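
For readers of this thread, a minimal usage sketch of the two modes,
using only the APIs from this patchset (the surrounding variables and
MAX_FLUSH_NUM are placeholders):

	/* lightweight mode: one call, GROed packets come back at once */
	nb_rx = rte_gro_reassemble_burst(pkts, nb_rx, &param);

	/* heavyweight mode: create a table, feed it, flush explicitly */
	void *tbl = rte_gro_tbl_create(&param);

	nb_unprocessed = rte_gro_reassemble(pkts, nb_rx, tbl);
	nb_flushed = rte_gro_timeout_flush(tbl, RTE_GRO_TCP_IPV4,
			flushed_pkts, MAX_FLUSH_NUM);
	rte_gro_tbl_destroy(tbl);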

> Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
> ---
>   config/common_base                 |   5 ++
>   lib/Makefile                       |   2 +
>   lib/librte_gro/Makefile            |  50 +++++++++++
>   lib/librte_gro/rte_gro.c           | 176 +++++++++++++++++++++++++++++++++++++
>   lib/librte_gro/rte_gro.h           | 176 +++++++++++++++++++++++++++++++++++++
>   lib/librte_gro/rte_gro_version.map |  12 +++
>   mk/rte.app.mk                      |   1 +
>   7 files changed, 422 insertions(+)
>   create mode 100644 lib/librte_gro/Makefile
>   create mode 100644 lib/librte_gro/rte_gro.c
>   create mode 100644 lib/librte_gro/rte_gro.h
>   create mode 100644 lib/librte_gro/rte_gro_version.map
>
> diff --git a/config/common_base b/config/common_base
> index f6aafd1..167f5ef 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -712,6 +712,11 @@ CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
>   CONFIG_RTE_LIBRTE_PMD_VHOST=n
>   
>   #
> +# Compile GRO library
> +#
> +CONFIG_RTE_LIBRTE_GRO=y
> +
> +#
>   #Compile Xen domain0 support
>   #
>   CONFIG_RTE_LIBRTE_XEN_DOM0=n
> diff --git a/lib/Makefile b/lib/Makefile
> index 07e1fd0..ac1c2f6 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -106,6 +106,8 @@ DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
>   DEPDIRS-librte_reorder := librte_eal librte_mempool librte_mbuf
>   DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
>   DEPDIRS-librte_pdump := librte_eal librte_mempool librte_mbuf librte_ether
> +DIRS-$(CONFIG_RTE_LIBRTE_GRO) += librte_gro
> +DEPDIRS-librte_gro := librte_eal librte_mbuf librte_ether librte_net
>   
>   ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
>   DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
> diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
> new file mode 100644
> index 0000000..7e0f128
> --- /dev/null
> +++ b/lib/librte_gro/Makefile
> @@ -0,0 +1,50 @@
> +#   BSD LICENSE
> +#
> +#   Copyright(c) 2017 Intel Corporation. All rights reserved.
> +#   All rights reserved.
> +#
> +#   Redistribution and use in source and binary forms, with or without
> +#   modification, are permitted provided that the following conditions
> +#   are met:
> +#
> +#     * Redistributions of source code must retain the above copyright
> +#       notice, this list of conditions and the following disclaimer.
> +#     * Redistributions in binary form must reproduce the above copyright
> +#       notice, this list of conditions and the following disclaimer in
> +#       the documentation and/or other materials provided with the
> +#       distribution.
> +#     * Neither the name of Intel Corporation nor the names of its
> +#       contributors may be used to endorse or promote products derived
> +#       from this software without specific prior written permission.
> +#
> +#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> +#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> +#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> +#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> +#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> +#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> +#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> +#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> +#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> +#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> +#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> +
> +include $(RTE_SDK)/mk/rte.vars.mk
> +
> +# library name
> +LIB = librte_gro.a
> +
> +CFLAGS += -O3
> +CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
> +
> +EXPORT_MAP := rte_gro_version.map
> +
> +LIBABIVER := 1
> +
> +# source files
> +SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
> +
> +# install this header file
> +SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
> +
> +include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
> new file mode 100644
> index 0000000..648835b
> --- /dev/null
> +++ b/lib/librte_gro/rte_gro.c
> @@ -0,0 +1,176 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#include <rte_malloc.h>
> +#include <rte_mbuf.h>
> +
> +#include "rte_gro.h"
> +
> +typedef void *(*gro_tbl_create_fn)(uint16_t socket_id,
> +		uint16_t max_flow_num,
> +		uint16_t max_item_per_flow);
> +typedef void (*gro_tbl_destroy_fn)(void *tbl);
> +typedef uint32_t (*gro_tbl_item_num_fn)(void *tbl);
> +
> +static gro_tbl_create_fn tbl_create_functions[RTE_GRO_TYPE_MAX_NUM];
> +static gro_tbl_destroy_fn tbl_destroy_functions[RTE_GRO_TYPE_MAX_NUM];
> +static gro_tbl_item_num_fn tbl_item_num_functions[RTE_GRO_TYPE_MAX_NUM];
> +
> +/**
> + * GRO table, which is used to merge packets. It keeps many reassembly
> + * tables of desired GRO types. Applications need to create GRO tables
> + * before using rte_gro_reassemble to perform GRO.
> + */
> +struct gro_tbl {
> +	uint64_t desired_gro_types;	/**< GRO types to perform */
> +	/* max TTL measured in nanosecond */
> +	uint64_t max_timeout_cycles;
> +	/* max length of merged packet measured in byte */
> +	uint32_t max_packet_size;
> +	/* reassebly tables of desired GRO types */
> +	void *tbls[RTE_GRO_TYPE_MAX_NUM];
> +};
> +
> +void *rte_gro_tbl_create(const struct rte_gro_param *param)

The name of this API and the definition of struct gro_tbl cause some
confusion: a gro table contains gro tables? I suppose a better name is
needed, for example, struct gro_ctl.

> +{
> +	gro_tbl_create_fn create_tbl_fn;
> +	gro_tbl_destroy_fn destroy_tbl_fn;
> +	struct gro_tbl *gro_tbl;
> +	uint64_t gro_type_flag = 0;
> +	uint8_t i, j;
> +
> +	gro_tbl = rte_zmalloc_socket(__func__,
> +			sizeof(struct gro_tbl),
> +			RTE_CACHE_LINE_SIZE,
> +			param->socket_id);
> +	if (gro_tbl == NULL)
> +		return NULL;
> +	gro_tbl->max_packet_size = param->max_packet_size;
> +	gro_tbl->max_timeout_cycles = param->max_timeout_cycles;
> +	gro_tbl->desired_gro_types = param->desired_gro_types;
> +
> +	for (i = 0; i < RTE_GRO_TYPE_MAX_NUM; i++) {
> +		gro_type_flag = 1 << i;
> +
> +		if ((param->desired_gro_types & gro_type_flag) == 0)
> +			continue;
> +		create_tbl_fn = tbl_create_functions[i];
> +		if (create_tbl_fn == NULL)
> +			continue;
> +
> +		gro_tbl->tbls[i] = create_tbl_fn(
> +				param->socket_id,
> +				param->max_flow_num,
> +				param->max_item_per_flow);

Here and somewhere else: the alignment seems not correct. Keep all
parameters aligned like this:

        gro_tbl->tbls[i] = create_tbl_fn(param->socket_id,
                                         param->max_flow_num,
                                         param->max_item_per_flow);
> +		if (gro_tbl->tbls[i] == NULL) {
> +			/* destroy all allocated tables */
> +			for (j = 0; j < i; j++) {
> +				gro_type_flag = 1 << j;
> +				if ((param->desired_gro_types & gro_type_flag) == 0)
> +					continue;
> +				destroy_tbl_fn = tbl_destroy_functions[j];
> +				if (destroy_tbl_fn)
> +					destroy_tbl_fn(gro_tbl->tbls[j]);
> +			}
> +			rte_free(gro_tbl);
> +			return NULL;
> +		}
> +	}
> +	return gro_tbl;
> +}
> +
> +void rte_gro_tbl_destroy(void *tbl)
> +{
> +	gro_tbl_destroy_fn destroy_tbl_fn;
> +	struct gro_tbl *gro_tbl = (struct gro_tbl *)tbl;
> +	uint64_t gro_type_flag;
> +	uint8_t i;
> +
> +	if (gro_tbl == NULL)
> +		return;
> +	for (i = 0; i < RTE_GRO_TYPE_MAX_NUM; i++) {
> +		gro_type_flag = 1 << i;
> +		if ((gro_tbl->desired_gro_types & gro_type_flag) == 0)
> +			continue;
> +		destroy_tbl_fn = tbl_destroy_functions[i];
> +		if (destroy_tbl_fn)
> +			destroy_tbl_fn(gro_tbl->tbls[i]);
> +	}
> +	rte_free(gro_tbl);
> +}
> +
> +uint16_t
> +rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
> +		uint16_t nb_pkts,
> +		const struct rte_gro_param *param __rte_unused)
> +{
> +	return nb_pkts;
> +}
> +
> +uint16_t
> +rte_gro_reassemble(struct rte_mbuf **pkts __rte_unused,
> +		uint16_t nb_pkts,
> +		void *tbl __rte_unused)
> +{
> +	return nb_pkts;
> +}
> +
> +uint16_t
> +rte_gro_timeout_flush(void *tbl __rte_unused,
> +		uint64_t desired_gro_types __rte_unused,
> +		struct rte_mbuf **out __rte_unused,
> +		uint16_t max_nb_out __rte_unused)
> +{
> +	return 0;
> +}
> +
> +uint64_t rte_gro_tbl_item_num(void *tbl)

Does rte_gro_get_count() sound better?

> +{
> +	struct gro_tbl *gro_tbl = (struct gro_tbl *)tbl;
> +	gro_tbl_item_num_fn item_num_fn;
> +	uint64_t item_num = 0;
> +	uint64_t gro_type_flag;
> +	uint8_t i;
> +
> +	for (i = 0; i < RTE_GRO_TYPE_MAX_NUM; i++) {
> +		gro_type_flag = 1 << i;
> +		if ((gro_tbl->desired_gro_types & gro_type_flag) == 0)
> +			continue;
> +
> +		item_num_fn = tbl_item_num_functions[i];
> +		if (item_num_fn == NULL)
> +			continue;
> +		item_num += item_num_fn(gro_tbl->tbls[i]);
> +	}
> +	return item_num;
> +}
> diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
> new file mode 100644
> index 0000000..02c9113
> --- /dev/null
> +++ b/lib/librte_gro/rte_gro.h
> @@ -0,0 +1,176 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef _RTE_GRO_H_
> +#define _RTE_GRO_H_
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +/**
> + * the max packets number that rte_gro_reassemble_burst can
> + * process in each invocation.
> + */
> +#define RTE_GRO_MAX_BURST_ITEM_NUM 128UL
> +
> +/* max number of supported GRO types */
> +#define RTE_GRO_TYPE_MAX_NUM 64
> +#define RTE_GRO_TYPE_SUPPORT_NUM 0	/**< current supported GRO num */
> +
> +
> +struct rte_gro_param {
> +	uint64_t desired_gro_types;	/**< desired GRO types */

Make it gro_types for simplicity.

> +	uint32_t max_packet_size;	/**< max length of merged packets */

Referring to the tcp4 gro implementation, this is the max size for the
tcp payload. But in principle, the 65535-byte limitation (including the
TCP header) exists because the IP total length field is 2 bytes long.

What does it mean for other GRO engines then? I think these limits
should be decided by each gro engine, and applications shouldn't have
to change them.

> +	uint16_t max_flow_num;	/**< max flow number */
> +	uint16_t max_item_per_flow;	/**< max packet number per flow */
> +
> +	/* socket index where the Ethernet port connects to */

The comment needs to be refined. We have different socket indexes for
the port and for the pmd thread. Just explain how this field will be
used: "socket index for allocating gro related data structures".

> +	uint16_t socket_id;
> +	/* max TTL for a packet in the GRO table, measured in nanosecond */
> +	uint64_t max_timeout_cycles;

We don't need to set it in lightweight mode. Please add this to the
comment.

> +};
> +
> +/**
> + * This function creates a GRO table, which is used to merge packets in
> + * rte_gro_reassemble.
> + *
> + * @param param
> + *  applications use it to pass needed parameters to create a GRO table.
> + * @return
> + *  if created successfully, return a pointer which points to the GRO
> + *  table. Otherwise, return NULL.
> + */
> +void *rte_gro_tbl_create(
> +		const struct rte_gro_param *param);

Merge above two lines into one.

> +/**
> + * This function destroys a GRO table.
> + */
> +void rte_gro_tbl_destroy(void *tbl);
> +
> +/**
> + * This is one of the main reassembly APIs, which merges a number of
> + * packets at a time. It assumes that all inputted packets have
> + * correct checksums. That is, applications should guarantee all
> + * inputted packets are correct. Besides, it doesn't re-calculate
> + * checksums for merged packets. If inputted packets are IP fragmented,
> + * this function assumes they are complete (i.e. with L4 header). After
> + * finishing processing, it returns all GROed packets to applications
> + * immediately.
> + *
> + * @param pkts
> + *  a pointer array which points to the packets to reassemble. Besides,
> + *  it keeps packet addresses for GROed packets.
> + * @param nb_pkts
> + *  the number of packets to reassemble.
> + * @param param
> + *  applications use it to tell rte_gro_reassemble_burst what rules
> + *  are demanded.
> + * @return
> + *  the number of packets after been GROed.
> + */
> +uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts,
> +		uint16_t nb_pkts,
> +		const struct rte_gro_param *param);

Fix the alignment.

> +
> +/**
> + * Reassembly function, which tries to merge inputted packets with
> + * the packets in a given GRO table. This function assumes all inputted
> + * packets have correct checksums. And it won't update checksums if
> + * two packets are merged. Besides, if inputted packets are IP
> + * fragmented, this function assumes they are complete packets (i.e.
> + * with L4 header).
> + *
> + * If the inputted packets don't have data or are with unsupported GRO
> + * types, they won't be processed and are returned to applications.
> + * Otherwise, the inputted packets are either merged or inserted into
> + * the table. If applications want to get packets in the table, they need
> + * to call flush API.
> + *
> + * @param pkts
> + *  packet to reassemble. Besides, after this function finishes, it
> + *  keeps the unprocessed packets (i.e. without data or unsupported
> + *  GRO types).
> + * @param nb_pkts
> + *  the number of packets to reassemble.
> + * @param tbl
> + *  a pointer points to a GRO table.
> + * @return
> + *  return the number of unprocessed packets (i.e. without data or
> + *  unsupported GRO types). If all packets are processed (merged or
> + *  inserted into the table), return 0.
> + */
> +uint16_t rte_gro_reassemble(struct rte_mbuf **pkts,
> +		uint16_t nb_pkts,
> +		void *tbl);
> +
> +/**
> + * This function flushes the timeout packets from reassembly tables of
> + * desired GRO types. The max number of flushed timeout packets is the
> + * element number of the array which is used to keep the flushed packets.
> + *
> + * Besides, this function won't re-calculate checksums for merged
> + * packets in the tables. That is, the returned packets may be with
> + * wrong checksums.
> + *
> + * @param tbl
> + *  a pointer points to a GRO table object.
> + * @param desired_gro_types
> + * rte_gro_timeout_flush only processes packets which belong to the
> + * GRO types specified by desired_gro_types.
> + * @param out
> + *  a pointer array that is used to keep flushed timeout packets.
> + * @param nb_out
> + *  the element number of out. It's also the max number of timeout
> + *  packets that can be flushed finally.
> + * @return
> + *  the number of flushed packets. If no packets are flushed, return 0.
> + */
> +uint16_t rte_gro_timeout_flush(void *tbl,
> +		uint64_t desired_gro_types,
> +		struct rte_mbuf **out,
> +		uint16_t max_nb_out);
> +
> +/**
> + * This function returns the number of packets in a given GRO table.
> + * @param tbl
> + *  pointer points to a GRO table.
> + * @return
> + *  the number of packets in the table.
> + */
> +uint64_t rte_gro_tbl_item_num(void *tbl);
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif
> diff --git a/lib/librte_gro/rte_gro_version.map b/lib/librte_gro/rte_gro_version.map
> new file mode 100644
> index 0000000..358fb9d
> --- /dev/null
> +++ b/lib/librte_gro/rte_gro_version.map
> @@ -0,0 +1,12 @@
> +DPDK_17.08 {
> +	global:
> +
> +	rte_gro_tbl_create;
> +	rte_gro_tbl_destroy;

As stated earlier, here are the API names I suggested: 
rte_gro_ctl_create()/rte_gro_ctl_destroy()/rte_gro_get_count().

> +	rte_gro_reassemble_burst;
> +	rte_gro_reassemble;
> +	rte_gro_timeout_flush;
> +	rte_gro_tbl_item_num;
> +
> +	local: *;
> +};
> diff --git a/mk/rte.app.mk b/mk/rte.app.mk
> index bcaf1b3..fc3776d 100644
> --- a/mk/rte.app.mk
> +++ b/mk/rte.app.mk
> @@ -98,6 +98,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_RING)           += -lrte_ring
>   _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
>   _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
>   _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER)        += -lrte_reorder
> +_LDLIBS-$(CONFIG_RTE_LIBRTE_GRO)        	+= -lrte_gro
>   
>   ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
>   _LDLIBS-$(CONFIG_RTE_LIBRTE_KNI)            += -lrte_kni

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v10 2/3] lib/gro: add TCP/IPv4 GRO support
  2017-07-02 10:19                     ` Tan, Jianfeng
@ 2017-07-03  5:13                       ` Hu, Jiayu
  0 siblings, 0 replies; 141+ messages in thread
From: Hu, Jiayu @ 2017-07-03  5:13 UTC (permalink / raw)
  To: Tan, Jianfeng, dev
  Cc: Ananyev, Konstantin, stephen, yliu, Wu, Jingjing, Yao, Lei A, Bie, Tiwei

Hi Jianfeng,

> -----Original Message-----
> From: Tan, Jianfeng
> Sent: Sunday, July 2, 2017 6:20 PM
> To: Hu, Jiayu <jiayu.hu@intel.com>; dev@dpdk.org
> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> stephen@networkplumber.org; yliu@fridaylinux.org; Wu, Jingjing
> <jingjing.wu@intel.com>; Yao, Lei A <lei.a.yao@intel.com>; Bie, Tiwei
> <tiwei.bie@intel.com>
> Subject: Re: [PATCH v10 2/3] lib/gro: add TCP/IPv4 GRO support
> 
> 
> 
> On 7/1/2017 7:08 PM, Jiayu Hu wrote:
> > In this patch, we introduce five APIs to support TCP/IPv4 GRO.
> > - gro_tcp4_tbl_create: create a TCP/IPv4 reassembly table, which is used
> >      to merge packets.
> > - gro_tcp4_tbl_destroy: free memory space of a TCP/IPv4 reassembly table.
> > - gro_tcp4_tbl_timeout_flush: flush timeout packets from a TCP/IPv4
> >      reassembly table.
> > - gro_tcp4_tbl_item_num: return the number of packets in a TCP/IPv4
> >      reassembly table.
> > - gro_tcp4_reassemble: reassemble an inputted TCP/IPv4 packet.
> >
> > TCP/IPv4 GRO API assumes all inputted packets have correct IPv4
> > and TCP checksums. And TCP/IPv4 GRO API doesn't update IPv4 and TCP
> > checksums for merged packets. If inputted packets are IP fragmented,
> > TCP/IPv4 GRO API assumes they are complete packets (i.e. with L4
> > headers).
> >
> > In TCP/IPv4 GRO, we use a table structure, called TCP/IPv4 reassembly
> > table, to reassemble packets. A TCP/IPv4 reassembly table includes a key
> > array and an item array, where the key array keeps the criteria to merge
> > packets and the item array keeps packet information.
> >
> > One key in the key array points to an item group, which consists of
> > packets which have the same criteria value. If two packets are able to
> > merge, they must be in the same item group. Each key in the key array
> > includes two parts:
> > - criteria: the criteria of merging packets. If two packets can be
> >      merged, they must have the same criteria value.
> > - start_index: the index of the first incoming packet of the item group.
> >
> > Each element in the item array keeps the information of one packet. It
> > mainly includes two parts:
> > - pkt: packet address
> > - next_pkt_index: the index of the next packet in the same item group.
> >      All packets in the same item group are chained by next_pkt_index.
> >      With next_pkt_index, we can locate all packets in the same item
> >      group one by one.
> >
> > Processing an incoming packet takes three steps:
> > a. check if the packet should be processed. Packets with one of the
> >      following properties won't be processed:
> > 	- SYN, FIN, RST or PSH bit is set;
> > 	- packet payload length is 0.
> > b. traverse the key array to find a key which has the same criteria
> >      value as the incoming packet. If found, go to step c. Otherwise,
> >      insert a new key and insert the packet into the item array.
> > c. locate the first packet in the item group via the start_index in the
> >      key. Then traverse all packets in the item group via next_pkt_index.
> >      If one packet can merge with the incoming one, merge them
> >      together. Otherwise, insert the packet into this item group.
> >
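
In code terms, the per-packet flow is roughly (a sketch; the helper
names tcp4_key_lookup, insert_new_key_and_item and
insert_into_item_group are placeholders for the logic inside
gro_tcp4_reassemble):

	/* step a */
	if (tcp_hdr->tcp_flags != TCP_ACK_FLAG || tcp_dl == 0)
		return -1;
	/* step b */
	if (tcp4_key_lookup(tbl, &key, &key_idx) == 0)
		return insert_new_key_and_item(tbl, &key, pkt, start_time);
	/* step c */
	for (cur_idx = tbl->keys[key_idx].start_index;
			cur_idx != INVALID_ARRAY_INDEX;
			cur_idx = tbl->items[cur_idx].next_pkt_idx) {
		cmp = check_seq_option(tbl->items[cur_idx].pkt, tcp_hdr,
				ip_id, pkt->l4_len, tcp_dl);
		if (cmp != 0)
			return merge_two_tcp4_packets(&(tbl->items[cur_idx]),
					pkt, max_packet_size, cmp);
	}
	return insert_into_item_group(tbl, key_idx, pkt, start_time);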
> > Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
> > ---
> >   doc/guides/rel_notes/release_17_08.rst |   7 +
> >   lib/librte_gro/Makefile                |   1 +
> >   lib/librte_gro/rte_gro.c               | 123 ++++++++-
> >   lib/librte_gro/rte_gro.h               |  10 +-
> >   lib/librte_gro/rte_gro_tcp4.c          | 439 +++++++++++++++++++++++++++++++++
> >   lib/librte_gro/rte_gro_tcp4.h          | 172 +++++++++++++
> >   6 files changed, 736 insertions(+), 16 deletions(-)
> >   create mode 100644 lib/librte_gro/rte_gro_tcp4.c
> >   create mode 100644 lib/librte_gro/rte_gro_tcp4.h
> >
> > diff --git a/doc/guides/rel_notes/release_17_08.rst b/doc/guides/rel_notes/release_17_08.rst
> > index 842f46f..f067247 100644
> > --- a/doc/guides/rel_notes/release_17_08.rst
> > +++ b/doc/guides/rel_notes/release_17_08.rst
> > @@ -75,6 +75,13 @@ New Features
> >
> >     Added support for firmwares with multiple Ethernet ports per physical port.
> >
> > +* **Add Generic Receive Offload API support.**
> > +
> > +  Generic Receive Offload (GRO) API supports reassembling TCP/IPv4
> > +  packets. The GRO API assumes all inputted packets have correct
> > +  checksums. The GRO API doesn't update checksums for merged packets. If
> > +  inputted packets are IP fragmented, the GRO API assumes they are complete
> > +  packets (i.e. with L4 headers).
> >
> >   Resolved Issues
> >   ---------------
> > diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
> > index 7e0f128..43e276e 100644
> > --- a/lib/librte_gro/Makefile
> > +++ b/lib/librte_gro/Makefile
> > @@ -43,6 +43,7 @@ LIBABIVER := 1
> >
> >   # source files
> >   SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
> > +SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro_tcp4.c
> >
> >   # install this header file
> >   SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
> > diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
> > index 648835b..a4641f9 100644
> > --- a/lib/librte_gro/rte_gro.c
> > +++ b/lib/librte_gro/rte_gro.c
> > @@ -32,8 +32,11 @@
> >
> >   #include <rte_malloc.h>
> >   #include <rte_mbuf.h>
> > +#include <rte_cycles.h>
> > +#include <rte_ethdev.h>
> >
> >   #include "rte_gro.h"
> > +#include "rte_gro_tcp4.h"
> >
> >   typedef void *(*gro_tbl_create_fn)(uint16_t socket_id,
> >   		uint16_t max_flow_num,
> > @@ -41,9 +44,12 @@ typedef void *(*gro_tbl_create_fn)(uint16_t socket_id,
> >   typedef void (*gro_tbl_destroy_fn)(void *tbl);
> >   typedef uint32_t (*gro_tbl_item_num_fn)(void *tbl);
> >
> > -static gro_tbl_create_fn tbl_create_functions[RTE_GRO_TYPE_MAX_NUM];
> > -static gro_tbl_destroy_fn tbl_destroy_functions[RTE_GRO_TYPE_MAX_NUM];
> > -static gro_tbl_item_num_fn tbl_item_num_functions[RTE_GRO_TYPE_MAX_NUM];
> > +static gro_tbl_create_fn tbl_create_functions[RTE_GRO_TYPE_MAX_NUM] = {
> > +	gro_tcp4_tbl_create, NULL};
> > +static gro_tbl_destroy_fn tbl_destroy_functions[RTE_GRO_TYPE_MAX_NUM] = {
> > +	gro_tcp4_tbl_destroy, NULL};
> > +static gro_tbl_item_num_fn tbl_item_num_functions[
> > +	RTE_GRO_TYPE_MAX_NUM] = {gro_tcp4_tbl_item_num, NULL};
> >
> >   /**
> >    * GRO table, which is used to merge packets. It keeps many reassembly
> > @@ -130,27 +136,118 @@ void rte_gro_tbl_destroy(void *tbl)
> >   }
> >
> >   uint16_t
> > -rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
> > +rte_gro_reassemble_burst(struct rte_mbuf **pkts,
> >   		uint16_t nb_pkts,
> > -		const struct rte_gro_param *param __rte_unused)
> > +		const struct rte_gro_param *param)
> >   {
> > -	return nb_pkts;
> > +	uint16_t i;
> > +	uint16_t nb_after_gro = nb_pkts;
> > +	uint32_t item_num;
> > +
> > +	/* allocate a reassembly table for TCP/IPv4 GRO */
> > +	struct gro_tcp4_tbl tcp_tbl;
> > +	struct gro_tcp4_key tcp_keys[RTE_GRO_MAX_BURST_ITEM_NUM] = {0};
> > +	struct gro_tcp4_item tcp_items[RTE_GRO_MAX_BURST_ITEM_NUM] = {0};
> > +
> > +	struct rte_mbuf *unprocess_pkts[nb_pkts];
> > +	uint16_t unprocess_num = 0;
> > +	int32_t ret;
> > +	uint64_t current_time;
> > +
> > +	if ((param->desired_gro_types & RTE_GRO_TCP_IPV4) == 0)
> > +		return nb_pkts;
> > +
> > +	/* get the actual number of items */
> > +	item_num = RTE_MIN(nb_pkts, (param->max_flow_num *
> > +			param->max_item_per_flow));
> > +	item_num = RTE_MIN(item_num, RTE_GRO_MAX_BURST_ITEM_NUM);
> > +
> > +	tcp_tbl.keys = tcp_keys;
> > +	tcp_tbl.items = tcp_items;
> > +	tcp_tbl.key_num = 0;
> > +	tcp_tbl.item_num = 0;
> > +	tcp_tbl.max_key_num = item_num;
> > +	tcp_tbl.max_item_num = item_num;
> > +
> > +	current_time = rte_rdtsc();
> > +
> > +	for (i = 0; i < nb_pkts; i++) {
> > +		if (RTE_ETH_IS_IPV4_HDR(pkts[i]->packet_type) &&
> > +				(pkts[i]->packet_type & RTE_PTYPE_L4_TCP)) {
> > +			ret = gro_tcp4_reassemble(pkts[i],
> > +					&tcp_tbl,
> > +					param->max_packet_size,
> > +					current_time);
> > +			if (ret > 0)
> > +				/* merge successfully */
> > +				nb_after_gro--;
> > +			else if (ret < 0)
> > +				unprocess_pkts[unprocess_num++] =
> > +					pkts[i];
> > +		} else
> > +			unprocess_pkts[unprocess_num++] =
> > +				pkts[i];
> > +	}
> > +
> > +	/* re-arrange GROed packets */
> > +	if (nb_after_gro < nb_pkts) {
> > +		i = gro_tcp4_tbl_timeout_flush(&tcp_tbl, 0,
> > +				pkts, nb_pkts);
> > +		if (unprocess_num > 0)
> > +			memcpy(&pkts[i], unprocess_pkts,
> > +					sizeof(struct rte_mbuf *) *
> > +					unprocess_num);
> > +	}
> > +	return nb_after_gro;
> >   }
> >
> >   uint16_t
> > -rte_gro_reassemble(struct rte_mbuf **pkts __rte_unused,
> > +rte_gro_reassemble(struct rte_mbuf **pkts,
> >   		uint16_t nb_pkts,
> > -		void *tbl __rte_unused)
> > +		void *tbl)
> >   {
> > -	return nb_pkts;
> > +	uint16_t i, unprocess_num = 0;
> > +	struct rte_mbuf *unprocess_pkts[nb_pkts];
> > +	struct gro_tbl *gro_tbl = (struct gro_tbl *)tbl;
> > +	uint64_t current_time;
> > +
> > +	if ((gro_tbl->desired_gro_types & RTE_GRO_TCP_IPV4) == 0)
> > +		return nb_pkts;
> > +
> > +	current_time = rte_rdtsc();
> > +	for (i = 0; i < nb_pkts; i++) {
> > +		if (RTE_ETH_IS_IPV4_HDR(pkts[i]->packet_type) &&
> > +				(pkts[i]->packet_type & RTE_PTYPE_L4_TCP)) {
> > +			if (gro_tcp4_reassemble(pkts[i],
> > +						gro_tbl->tbls[RTE_GRO_TCP_IPV4_INDEX],
> > +						gro_tbl->max_packet_size,
> > +						current_time) < 0)
> > +				unprocess_pkts[unprocess_num++] = pkts[i];
> > +		} else
> > +			unprocess_pkts[unprocess_num++] = pkts[i];
> > +	}
> > +	if (unprocess_num > 0)
> > +		memcpy(pkts, unprocess_pkts,
> > +				sizeof(struct rte_mbuf *) * unprocess_num);
> > +
> > +	return unprocess_num;
> >   }
> >
> >   uint16_t
> > -rte_gro_timeout_flush(void *tbl __rte_unused,
> > -		uint64_t desired_gro_types __rte_unused,
> > -		struct rte_mbuf **out __rte_unused,
> > -		uint16_t max_nb_out __rte_unused)
> > +rte_gro_timeout_flush(void *tbl,
> > +		uint64_t desired_gro_types,
> > +		struct rte_mbuf **out,
> > +		uint16_t max_nb_out)
> >   {
> > +	struct gro_tbl *gro_tbl = (struct gro_tbl *)tbl;
> > +
> > +	desired_gro_types = desired_gro_types &
> > +		gro_tbl->desired_gro_types;
> > +	if (desired_gro_types & RTE_GRO_TCP_IPV4)
> > +		return gro_tcp4_tbl_timeout_flush(
> > +				gro_tbl->tbls[RTE_GRO_TCP_IPV4_INDEX],
> > +				gro_tbl->max_timeout_cycles,
> > +				out, max_nb_out);
> >   	return 0;
> >   }
> >
> > diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
> > index 02c9113..68d2fc6 100644
> > --- a/lib/librte_gro/rte_gro.h
> > +++ b/lib/librte_gro/rte_gro.h
> > @@ -45,7 +45,11 @@ extern "C" {
> >
> >   /* max number of supported GRO types */
> >   #define RTE_GRO_TYPE_MAX_NUM 64
> > -#define RTE_GRO_TYPE_SUPPORT_NUM 0	/**< current supported GRO num */
> > +#define RTE_GRO_TYPE_SUPPORT_NUM 1	/**< current supported GRO num */
> > +
> > +/* TCP/IPv4 GRO flag */
> > +#define RTE_GRO_TCP_IPV4_INDEX 0
> > +#define RTE_GRO_TCP_IPV4 (1ULL << RTE_GRO_TCP_IPV4_INDEX)
> >
> >
> >   struct rte_gro_param {
> > @@ -118,14 +122,14 @@ uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts,
> >    *
> >    * @param pkts
> >    *  packet to reassemble. Besides, after this function finishes, it
> > - *  keeps the unprocessed packets (i.e. without data or unsupported
> > + *  keeps the unprocessed packets (e.g. without data or unsupported
> >    *  GRO types).
> >    * @param nb_pkts
> >    *  the number of packets to reassemble.
> >    * @param tbl
> >    *  a pointer points to a GRO table.
> >    * @return
> > - *  return the number of unprocessed packets (i.e. without data or
> > + *  return the number of unprocessed packets (e.g. without data or
> >    *  unsupported GRO types). If all packets are processed (merged or
> >    *  inserted into the table), return 0.
> >    */
> > diff --git a/lib/librte_gro/rte_gro_tcp4.c b/lib/librte_gro/rte_gro_tcp4.c
> > new file mode 100644
> > index 0000000..8f2aa86
> > --- /dev/null
> > +++ b/lib/librte_gro/rte_gro_tcp4.c
> > @@ -0,0 +1,439 @@
> > +/*-
> > + *   BSD LICENSE
> > + *
> > + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> > + *
> > + *   Redistribution and use in source and binary forms, with or without
> > + *   modification, are permitted provided that the following conditions
> > + *   are met:
> > + *
> > + *     * Redistributions of source code must retain the above copyright
> > + *       notice, this list of conditions and the following disclaimer.
> > + *     * Redistributions in binary form must reproduce the above copyright
> > + *       notice, this list of conditions and the following disclaimer in
> > + *       the documentation and/or other materials provided with the
> > + *       distribution.
> > + *     * Neither the name of Intel Corporation nor the names of its
> > + *       contributors may be used to endorse or promote products derived
> > + *       from this software without specific prior written permission.
> > + *
> > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > + */
> > +
> > +#include <rte_malloc.h>
> > +#include <rte_mbuf.h>
> > +#include <rte_cycles.h>
> > +
> > +#include <rte_ethdev.h>
> > +#include <rte_ip.h>
> > +#include <rte_tcp.h>
> > +
> > +#include "rte_gro_tcp4.h"
> > +
> > +void *gro_tcp4_tbl_create(uint16_t socket_id,
> > +		uint16_t max_flow_num,
> > +		uint16_t max_item_per_flow)
> > +{
> > +	size_t size;
> > +	uint32_t entries_num;
> > +	struct gro_tcp4_tbl *tbl;
> > +
> > +	entries_num = max_flow_num * max_item_per_flow;
> > +	entries_num = entries_num > GRO_TCP4_TBL_MAX_ITEM_NUM ?
> > +		GRO_TCP4_TBL_MAX_ITEM_NUM : entries_num;
> > +
> > +	if (entries_num == 0)
> > +		return NULL;
> > +
> > +	tbl = (struct gro_tcp4_tbl *)rte_zmalloc_socket(
> > +			__func__,
> > +			sizeof(struct gro_tcp4_tbl),
> > +			RTE_CACHE_LINE_SIZE,
> > +			socket_id);
> > +	if (tbl == NULL)
> > +		return NULL;
> > +
> > +	size = sizeof(struct gro_tcp4_item) * entries_num;
> > +	tbl->items = (struct gro_tcp4_item *)rte_zmalloc_socket(
> > +			__func__,
> > +			size,
> > +			RTE_CACHE_LINE_SIZE,
> > +			socket_id);
> > +	if (tbl->items == NULL) {
> > +		rte_free(tbl);
> > +		return NULL;
> > +	}
> > +	tbl->max_item_num = entries_num;
> > +
> > +	size = sizeof(struct gro_tcp4_key) * entries_num;
> > +	tbl->keys = (struct gro_tcp4_key *)rte_zmalloc_socket(
> > +			__func__,
> > +			size, RTE_CACHE_LINE_SIZE,
> > +			socket_id);
> > +	if (tbl->keys == NULL) {
> > +		rte_free(tbl->items);
> > +		rte_free(tbl);
> > +		return NULL;
> > +	}
> > +	tbl->max_key_num = entries_num;
> > +	return tbl;
> > +}
> > +
> > +void gro_tcp4_tbl_destroy(void *tbl)
> > +{
> > +	struct gro_tcp4_tbl *tcp_tbl = (struct gro_tcp4_tbl *)tbl;
> > +
> > +	if (tcp_tbl) {
> > +		rte_free(tcp_tbl->items);
> > +		rte_free(tcp_tbl->keys);
> > +	}
> > +	rte_free(tcp_tbl);
> > +}
> > +
> > +static struct rte_mbuf *get_mbuf_lastseg(struct rte_mbuf *pkt)
> > +{
> > +	struct rte_mbuf *lastseg = pkt;
> > +
> > +	while (lastseg->next)
> > +		lastseg = lastseg->next;
> > +
> > +	return lastseg;
> > +}
> > +
> > +/**
> > + * merge two TCP/IPv4 packets without updating checksums.
> > + */
> > +static int
> > +merge_two_tcp4_packets(struct gro_tcp4_item *item_src,
> > +		struct rte_mbuf *pkt,
> > +		uint32_t max_packet_size,
> > +		int cmp)
> > +{
> > +	struct rte_mbuf *pkt_head, *pkt_tail, *lastseg;
> > +	struct ipv4_hdr *ipv4_hdr1, *ipv4_hdr0;
> > +	uint16_t tcp_dl1;
> > +
> > +	if (cmp > 0) {
> > +		/* append the new packet into tail */
> > +		pkt_head = item_src->pkt;
> > +		pkt_tail = pkt;
> > +	} else {
> > +		/* append the new packet into head */
> 
> Typo: append -> prepend

Thanks, I will modify it.

> 
> > +		pkt_head = pkt;
> > +		pkt_tail = item_src->pkt;
> > +	}
> > +
> > +	/* parse the tail packet */
> > +	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt_tail,
> > +				char *) + pkt_tail->l2_len);
> > +	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) -
> > +		pkt_tail->l3_len - pkt_tail->l4_len;
> > +
> > +	if (pkt_head->pkt_len + tcp_dl1 > max_packet_size)
> > +		return -1;
> > +
> > +	/* remove packet header for the tail packet */
> > +	rte_pktmbuf_adj(pkt_tail, pkt_tail->l2_len +
> > +			pkt_tail->l3_len +
> > +			pkt_tail->l4_len);
> > +
> > +	if (cmp > 0) {
> > +		/* chain the new packet to the tail of the original packet */
> > +		item_src->lastseg->next = pkt_tail;
> > +		/* update the lastseg for the item */
> > +		item_src->lastseg = get_mbuf_lastseg(pkt_tail);
> > +	} else {
> > +		/* chain the original packet to the tail of the new packet */
> > +		lastseg = get_mbuf_lastseg(pkt_head);
> > +		lastseg->next = pkt_tail;
> > +		/* update the item */
> > +		item_src->pkt = pkt_head;
> > +	}
> > +
> > +	/* parse the head packet */
> > +	ipv4_hdr0 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt_head,
> > +				char *) + pkt_head->l2_len);
> > +
> > +	/* update IP header for the head packet */
> > +	ipv4_hdr0->total_length = rte_cpu_to_be_16(
> > +			rte_be_to_cpu_16(
> > +				ipv4_hdr0->total_length)
> > +			+ tcp_dl1);
> > +	ipv4_hdr0->packet_id = ipv4_hdr1->packet_id;
> 
> Why do we bother to change the ID field in the IP header? I think it's
> not necessary.

Currently, when merging packets, we check whether the IP IDs are
consecutive. If two packets (p1 and p2) are merged and we don't update
the IP ID of the merged packet p1-p2, the IP ID of p1-p2 stays that of
p1. When p3 arrives, we can't merge it with p1-p2, since their IP IDs
are not consecutive. So I update the IP ID when two packets are merged.
If we don't update the IP ID, it means we shouldn't check for
consecutive IP IDs when merging at all. But Konstantin suggested that I
check the IP ID. Maybe we need to confirm with him again.
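
To make it concrete, the check and the update are roughly like below
(just a sketch, reusing the names in this patch):

	/* the new packet follows the stored one only if both the seq
	 * number and the IP ID are consecutive */
	if ((sent_seq == sent_seq0 + tcp_dl0) && (ip_id == ip_id0 + 1)) {
		/* after merging, inherit the tail's IP ID, so that a
		 * later packet with ip_id + 1 can still be matched */
		ipv4_hdr0->packet_id = ipv4_hdr1->packet_id;
	}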

> 
> > +
> > +	/* update mbuf metadata for the merged packet */
> > +	pkt_head->nb_segs += pkt_tail->nb_segs;
> > +	pkt_head->pkt_len += pkt_tail->pkt_len;
> > +	return 1;
> > +}
> > +
> > +static int
> > +check_seq_option(struct rte_mbuf *pkt,
> > +		struct tcp_hdr *tcp_hdr,
> > +		uint16_t ip_id,
> > +		uint16_t tcp_hl,
> > +		uint16_t tcp_dl)
> > +{
> > +	struct ipv4_hdr *ipv4_hdr0;
> > +	struct tcp_hdr *tcp_hdr0;
> > +	uint16_t tcp_hl0, tcp_dl0, ip_id0;
> > +	uint32_t sent_seq0, sent_seq;
> > +	uint16_t len;
> > +
> > +	ipv4_hdr0 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
> > +				char *) + pkt->l2_len);
> > +	tcp_hdr0 = (struct tcp_hdr *)((char *)ipv4_hdr0 + pkt->l3_len);
> > +
> > +	ip_id0 = rte_be_to_cpu_16(ipv4_hdr0->packet_id);
> > +	tcp_hl0 = pkt->l4_len;
> > +	tcp_dl0 = rte_be_to_cpu_16(ipv4_hdr0->total_length) -
> > +		pkt->l3_len - tcp_hl0;
> > +
> > +	/* check if TCP option fields equal. If not, return 0. */
> > +	len = RTE_MAX(tcp_hl, tcp_hl0) - sizeof(struct tcp_hdr);
> > +	if ((tcp_hl != tcp_hl0) || ((len > 0) &&
> > +				(memcmp(tcp_hdr + 1,
> > +						tcp_hdr0 + 1,
> > +						len) != 0)))
> > +		return 0;
> > +
> > +	/* check if the two packets are neighbors */
> > +	sent_seq0 = rte_be_to_cpu_32(tcp_hdr0->sent_seq);
> > +	sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq);
> > +
> > +	if ((sent_seq == (sent_seq0 + tcp_dl0)) &&
> > +			(ip_id == (ip_id0 + 1)))
> > +		/* append the new packet into tail */
> > +		return 1;
> > +	else if (((sent_seq + tcp_dl) == sent_seq0) &&
> > +		((ip_id + 1) == ip_id0))
> > +		/* append the new packet into head */
> > +		return -1;
> > +	else
> > +		return 0;
> > +}
> > +
> > +static uint32_t
> > +find_an_empty_item(struct gro_tcp4_tbl *tbl)
> > +{
> > +	uint32_t i;
> > +
> > +	for (i = 0; i < tbl->max_item_num; i++)
> > +		if (tbl->items[i].pkt == NULL)
> > +			return i;
> > +	return INVALID_ARRAY_INDEX;
> > +}
> > +
> > +static uint32_t
> > +find_an_empty_key(struct gro_tcp4_tbl *tbl)
> > +{
> > +	uint32_t i;
> > +
> > +	for (i = 0; i < tbl->max_key_num; i++)
> > +		if (tbl->keys[i].is_valid == 0)
> > +			return i;
> > +	return INVALID_ARRAY_INDEX;
> > +}
> > +
> > +int32_t
> > +gro_tcp4_reassemble(struct rte_mbuf *pkt,
> > +		struct gro_tcp4_tbl *tbl,
> > +		uint32_t max_packet_size,
> > +		uint64_t start_time)
> > +{
> > +	struct ether_hdr *eth_hdr;
> > +	struct ipv4_hdr *ipv4_hdr;
> > +	struct tcp_hdr *tcp_hdr;
> > +	uint16_t tcp_dl, ip_id;
> > +
> > +	struct tcp4_key key;
> > +	uint32_t cur_idx, prev_idx, item_idx;
> > +	uint32_t i, key_idx;
> > +	int cmp;
> > +
> > +	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
> > +	ipv4_hdr = (struct ipv4_hdr *)((char *)eth_hdr + pkt->l2_len);
> > +
> > +	/* check if the packet should be processed */
> > +	if (pkt->l3_len < sizeof(struct ipv4_hdr))
> > +		return -1;
> 
> Unnecessary precheck as you already checked it's a valid IPv4 + TCP
> packet in the upper gro framework.

Thanks, I will remove it.

> 
> > +	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + pkt->l3_len);
> > +	/* if SYN, FIN, RST, or URG is set, return immediately */
> > +	if ((tcp_hdr->tcp_flags & (~((uint8_t)TCP_ACK_FLAG))) ||
> > +			((tcp_hdr->tcp_flags & TCP_ACK_FLAG) == 0))
> 
> This can be simplified to: if (tcp_hdr->tcp_flags != TCP_ACK_FLAG)
> 
> And the above comment should also state that CWR and ECE are not in
> scope.

Thanks, I will change it.

> 
> > +		return -1;
> > +	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - pkt->l3_len
> > +		- pkt->l4_len;
> > +	if (tcp_dl == 0)
> > +		return -1;
> > +
> > +	ip_id = rte_be_to_cpu_16(ipv4_hdr->packet_id);
> > +
> > +	/* find a key and traverse all packets in its item group */
> > +	key.eth_saddr = eth_hdr->s_addr;
> > +	key.eth_daddr = eth_hdr->d_addr;
> > +	key.ip_src_addr = rte_be_to_cpu_32(ipv4_hdr->src_addr);
> > +	key.ip_dst_addr = rte_be_to_cpu_32(ipv4_hdr->dst_addr);
> > +	key.src_port = rte_be_to_cpu_16(tcp_hdr->src_port);
> > +	key.dst_port = rte_be_to_cpu_16(tcp_hdr->dst_port);
> > +	key.recv_ack = rte_be_to_cpu_32(tcp_hdr->recv_ack);
> 
> Why do we bother about the endianness? We only use these fields for
> equality comparison.

Oh yes, it's unnecessary to convert the byte order here. I will modify it.
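
Right, since the key is only compared with memcmp, the fields can be
stored in network byte order directly, e.g. (just a sketch):

	key.ip_src_addr = ipv4_hdr->src_addr;	/* kept big-endian */
	key.ip_dst_addr = ipv4_hdr->dst_addr;
	key.recv_ack = tcp_hdr->recv_ack;
	key.src_port = tcp_hdr->src_port;
	key.dst_port = tcp_hdr->dst_port;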

> 
> > +
> > +	for (i = 0; i < tbl->max_key_num; i++) {
> > +		/* search for a key */
> > +		if ((tbl->keys[i].is_valid == 0) ||
> > +				(memcmp(&(tbl->keys[i].key), &key,
> > +						sizeof(struct tcp4_key)) != 0))
> > +			continue;
> 
> Please move the code below out of the for loop to reduce indentation.
> Further, it would be better to keep this flow identification in an
> inline function, as I think we will eventually optimize it for the
> case of many flows.
> 
> As Stephen suggested for the memset, I think the same applies to the
> memcpy and memcmp on the key, here and anywhere else.

Thanks, I will use an inline function to compare keys.
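
For example (untested sketch; is_same_ether_addr() comes from
rte_ether.h):

	static inline int
	is_same_tcp4_key(const struct tcp4_key *k1, const struct tcp4_key *k2)
	{
		return (is_same_ether_addr(&k1->eth_saddr, &k2->eth_saddr) &&
				is_same_ether_addr(&k1->eth_daddr, &k2->eth_daddr) &&
				(k1->ip_src_addr == k2->ip_src_addr) &&
				(k1->ip_dst_addr == k2->ip_dst_addr) &&
				(k1->recv_ack == k2->recv_ack) &&
				(k1->src_port == k2->src_port) &&
				(k1->dst_port == k2->dst_port));
	}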

> 
> > +
> > +		cur_idx = tbl->keys[i].start_index;
> > +		prev_idx = cur_idx;
> > +		while (cur_idx != INVALID_ARRAY_INDEX) {
> 
> A do {} while loop here can save one comparison.

Thanks, I will change it.
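
Since start_index of a valid key never equals INVALID_ARRAY_INDEX, the
loop can become something like (sketch):

	cur_idx = tbl->keys[i].start_index;
	prev_idx = cur_idx;
	do {
		cmp = check_seq_option(tbl->items[cur_idx].pkt, tcp_hdr,
				ip_id, pkt->l4_len, tcp_dl);
		if (cmp != 0)
			break;	/* merge or insert, as in the current code */
		prev_idx = cur_idx;
		cur_idx = tbl->items[cur_idx].next_pkt_idx;
	} while (cur_idx != INVALID_ARRAY_INDEX);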

> 
> > +			cmp = check_seq_option(tbl->items[cur_idx].pkt,
> > +					tcp_hdr,
> 
> 
> > +					ip_id,
> > +					pkt->l4_len,
> > +					tcp_dl);
> > +			if (cmp != 0) {
> > +				if (merge_two_tcp4_packets(
> > +							&(tbl->items[cur_idx]),
> > +							pkt,
> > +							max_packet_size,
> > +							cmp) > 0)
> > +					return 1;
> > +				/**
> > +				 * fail to merge two packets since
> > +				 * it's beyond the max packet length.
> > +				 * Insert it into the item group.
> > +				 */
> > +				item_idx = find_an_empty_item(tbl);
> > +				if (item_idx == INVALID_ARRAY_INDEX)
> > +					return -1;
> > +				tbl->items[prev_idx].next_pkt_idx = item_idx;
> > +				tbl->items[item_idx].pkt = pkt;
> > +				tbl->items[item_idx].lastseg =
> > +					get_mbuf_lastseg(pkt);
> > +				tbl->items[item_idx].next_pkt_idx =
> > +					INVALID_ARRAY_INDEX;
> > +				tbl->items[item_idx].start_time = start_time;
> > +				tbl->item_num++;
> 
> This code piece is duplicated below. Please abstract it into a
> function named, for example, gro_tcp4_add_new_item().

Thanks, I will modify it.
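
Something like this, perhaps (untested sketch):

	static inline uint32_t
	gro_tcp4_add_new_item(struct gro_tcp4_tbl *tbl,
			struct rte_mbuf *pkt,
			uint64_t start_time,
			uint32_t prev_idx)
	{
		uint32_t item_idx = find_an_empty_item(tbl);

		if (item_idx == INVALID_ARRAY_INDEX)
			return item_idx;
		tbl->items[item_idx].pkt = pkt;
		tbl->items[item_idx].lastseg = get_mbuf_lastseg(pkt);
		tbl->items[item_idx].next_pkt_idx = INVALID_ARRAY_INDEX;
		tbl->items[item_idx].start_time = start_time;
		tbl->item_num++;
		if (prev_idx != INVALID_ARRAY_INDEX)
			tbl->items[prev_idx].next_pkt_idx = item_idx;
		return item_idx;
	}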

> 
> > +				return 0;
> > +			}
> > +			prev_idx = cur_idx;
> > +			cur_idx = tbl->items[cur_idx].next_pkt_idx;
> > +		}
> > +		/**
> > +		 * find a corresponding item group but fails to find
> > +		 * one packet to merge. Insert it into this item group.
> > +		 */
> > +		item_idx = find_an_empty_item(tbl);
> > +		if (item_idx == INVALID_ARRAY_INDEX)
> > +			return -1;
> > +		tbl->items[prev_idx].next_pkt_idx = item_idx;
> > +		tbl->items[item_idx].pkt = pkt;
> > +		tbl->items[item_idx].lastseg =
> > +			get_mbuf_lastseg(pkt);
> > +		tbl->items[item_idx].next_pkt_idx = INVALID_ARRAY_INDEX;
> > +		tbl->items[item_idx].start_time = start_time;
> > +		tbl->item_num++;
> > +		return 0;
> > +	}
> > +
> > +	/**
> > +	 * merge fail as the given packet has
> > +	 * a new key. So insert a new key.
> > +	 */
> > +	item_idx = find_an_empty_item(tbl);
> > +	key_idx = find_an_empty_key(tbl);
> > +	/**
> > +	 * if current key or item number is greater than the max
> > +	 * value, don't insert the packet into the table and return
> > +	 * immediately.
> > +	 */
> > +	if (item_idx == INVALID_ARRAY_INDEX ||
> > +			key_idx == INVALID_ARRAY_INDEX)
> > +		return -1;
> > +	tbl->items[item_idx].pkt = pkt;
> > +	tbl->items[item_idx].lastseg = get_mbuf_lastseg(pkt);
> > +	tbl->items[item_idx].next_pkt_idx = INVALID_ARRAY_INDEX;
> > +	tbl->items[item_idx].start_time = start_time;
> > +	tbl->item_num++;
> > +
> > +	memcpy(&(tbl->keys[key_idx].key),
> > +			&key, sizeof(struct tcp4_key));
> > +	tbl->keys[key_idx].start_index = item_idx;
> > +	tbl->keys[key_idx].is_valid = 1;
> > +	tbl->key_num++;
> > +
> > +	return 0;
> > +}
> > +
> > +uint16_t
> > +gro_tcp4_tbl_timeout_flush(struct gro_tcp4_tbl *tbl,
> > +		uint64_t timeout_cycles,
> > +		struct rte_mbuf **out,
> > +		uint16_t nb_out)
> > +{
> > +	uint16_t k = 0;
> > +	uint32_t i, j;
> > +	uint64_t current_time;
> > +
> > +	current_time = rte_rdtsc();
> > +	for (i = 0; i < tbl->max_key_num; i++) {
> > +		/* all keys have been checked, return immediately */
> > +		if (tbl->key_num == 0)
> > +			return k;
> > +
> > +		if (tbl->keys[i].is_valid == 0)
> > +			continue;
> > +
> > +		j = tbl->keys[i].start_index;
> > +		while (j != INVALID_ARRAY_INDEX) {
> > +			if (current_time - tbl->items[j].start_time >=
> > +					timeout_cycles) {
> > +				out[k++] = tbl->items[j].pkt;
> > +				tbl->items[j].pkt = NULL;
> > +				tbl->item_num--;
> > +				j = tbl->items[j].next_pkt_idx;
> > +
> > +				/**
> > +				 * delete the key as all of
> > +				 * its packets are flushed.
> > +				 */
> > +				if (j == INVALID_ARRAY_INDEX) {
> > +					tbl->keys[i].is_valid = 0;
> > +					tbl->key_num--;
> > +				} else
> > +					/* update start_index of the key */
> > +					tbl->keys[i].start_index = j;
> > +
> > +				if (k == nb_out)
> > +					return k;
> > +			} else
> > +				/**
> > +				 * left packets of this key won't be
> > +				 * timeout, so go to check other keys.
> > +				 */
> > +				break;
> > +		}
> > +	}
> > +	return k;
> > +}
> > +
> > +uint32_t gro_tcp4_tbl_item_num(void *tbl)
> > +{
> > +	struct gro_tcp4_tbl *gro_tbl = (struct gro_tcp4_tbl *)tbl;
> > +
> > +	if (gro_tbl)
> > +		return gro_tbl->item_num;
> > +	return 0;
> > +}
> > diff --git a/lib/librte_gro/rte_gro_tcp4.h b/lib/librte_gro/rte_gro_tcp4.h
> > new file mode 100644
> > index 0000000..27856f8
> > --- /dev/null
> > +++ b/lib/librte_gro/rte_gro_tcp4.h
> > @@ -0,0 +1,172 @@
> > +/*-
> > + *   BSD LICENSE
> > + *
> > + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> > + *
> > + *   Redistribution and use in source and binary forms, with or without
> > + *   modification, are permitted provided that the following conditions
> > + *   are met:
> > + *
> > + *     * Redistributions of source code must retain the above copyright
> > + *       notice, this list of conditions and the following disclaimer.
> > + *     * Redistributions in binary form must reproduce the above copyright
> > + *       notice, this list of conditions and the following disclaimer in
> > + *       the documentation and/or other materials provided with the
> > + *       distribution.
> > + *     * Neither the name of Intel Corporation nor the names of its
> > + *       contributors may be used to endorse or promote products derived
> > + *       from this software without specific prior written permission.
> > + *
> > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > + */
> > +
> > +#ifndef _RTE_GRO_TCP4_H_
> > +#define _RTE_GRO_TCP4_H_
> > +
> > +#define INVALID_ARRAY_INDEX 0xffffffffUL
> > +#define GRO_TCP4_TBL_MAX_ITEM_NUM (UINT32_MAX - 1)
> > +
> > +/* criteria for merging packets */
> > +struct tcp4_key {
> > +	struct ether_addr eth_saddr;
> > +	struct ether_addr eth_daddr;
> 
> Why do we need to keep the Ethernet addresses in the key? Suppose we are
> running a bonding port combining phy port 0 and phy port 1. Then no matter
> which dst MAC address is used, these packets belong to the same flow.
> 

Currently, we can't handle this case. As far as I know, Linux also checks
Ethernet addresses. Do you mean we need to handle this case?

> > +	uint32_t ip_src_addr;
> > +	uint32_t ip_dst_addr;
> > +
> > +	uint32_t recv_ack;	/**< acknowledgment sequence number. */
> > +	uint16_t src_port;
> > +	uint16_t dst_port;
> > +};
> > +
> > +struct gro_tcp4_key {
> > +	struct tcp4_key key;
> > +	uint32_t start_index;	/**< the first packet index of the flow */
> > +	uint8_t is_valid;
> > +};
> > +
> > +struct gro_tcp4_item {
> > +	struct rte_mbuf *pkt;	/**< packet address. */
> > +	struct rte_mbuf *lastseg;	/**< last segment of the packet */
> > +	/* the time when the packet in added into the table */
> > +	uint64_t start_time;
> 
> I think the comment should say this is the time of the oldest packet in
> this item. And an explicit unit statement would be useful for understanding.

Thanks, I will modify it.

> 
> > +	uint32_t next_pkt_idx;	/**< next packet index. */
> 
> The comment is not clear enough. We use this field to chain, in sequence,
> all packets that belong to the same flow but are not mergeable.

Thanks, I will change the comment.

> 
> What's more, if we change this field to a pointer, we can make this
> structure universal for all gro engines, for example:
> 
> struct gro_item {
>      struct gro_item *next;
>      struct rte_mbuf *first_seg;
>      struct rte_mbuf *last_seg;
>      uint64_t start_time;
> };
> 
> And I think if we store two more fields, seq_begin and seq_end, in the
> item, we can avoid touching any mbuf metadata.

Yes, thanks.
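
With the sequence range stored in the item, it could look like (a sketch
of your suggestion):

	struct gro_item {
		struct gro_item *next;
		struct rte_mbuf *first_seg;
		struct rte_mbuf *last_seg;
		uint64_t start_time;
		uint32_t seq_begin;	/* seq of the first payload byte */
		uint32_t seq_end;	/* seq after the last payload byte */
	};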

> 
> > +};
> > +
> > +/**
> > + * TCP/IPv4 reassembly table.
> > + */
> > +struct gro_tcp4_tbl {
> > +	struct gro_tcp4_item *items;	/**< item array */
> > +	struct gro_tcp4_key *keys;	/**< key array */
> > +	uint32_t item_num;	/**< current item number */
> > +	uint32_t key_num;	/**< current key num */
> > +	uint32_t max_item_num;	/**< item array size */
> > +	uint32_t max_key_num;	/**< key array size */
> > +};
> 
> This structure could be universal for all gro engines;
> 
> struct gro_tbl {
>      void *items; /* will use different initializer for  different gro
> engines */
>      void *keys; /* will use different intializer for different gro
> engines */
>      uint32_t item_num;
>      ...
> }

But I still think it's not a good idea to define the same table structure
for all GRO types, since we only have TCP/IPv4 GRO now and other GRO
types may use different reassembly algorithms.

> 
> > +
> > +/**
> > + * This function creates a TCP/IPv4 reassembly table.
> > + *
> > + * @param socket_id
> > + *  socket index where the Ethernet port connects to.
> > + * @param max_flow_num
> > + *  the maximum number of flows in the TCP/IPv4 GRO table
> > + * @param max_item_per_flow
> > + *  the maximum packet number per flow.
> > + * @return
> > + *  if create successfully, return a pointer which points to the
> > + *  created TCP/IPv4 GRO table. Otherwise, return NULL.
> > + */
> > +void *gro_tcp4_tbl_create(uint16_t socket_id,
> > +		uint16_t max_flow_num,
> > +		uint16_t max_item_per_flow);
> 
> Fix the alignment.

Thanks, I will modify it.

> 
> > +
> > +/**
> > + * This function destroys a TCP/IPv4 reassembly table.
> > + * @param tbl
> > + *  a pointer points to the TCP/IPv4 reassembly table.
> > + */
> > +void gro_tcp4_tbl_destroy(void *tbl);
> > +
> > +/**
> > + * This function searches for a packet in the TCP/IPv4 reassembly table
> > + * to merge with the inputted one. To merge two packets is to chain them
> > + * together and update packet headers. Packets, whose SYN, FIN, RST,
> PSH
> > + * or URG bit is set, are returned immediately. Packets which only have
> > + * packet headers (i.e. without data) are also returned immediately.
> > + * Otherwise, the packet is either merged, or inserted into the table.
> > + * Besides, if there is no available space to insert the packet, this
> > + * function returns immediately too.
> > + *
> > + * This function assumes the inputted packet is with correct IPv4 and
> > + * TCP checksums. And if two packets are merged, it won't re-calculate
> > + * IPv4 and TCP checksums. Besides, if the inputted packet is IP
> > + * fragmented, it assumes the packet is complete (with TCP header).
> > + *
> > + * @param pkt
> > + *  packet to reassemble.
> > + * @param tbl
> > + *  a pointer that points to a TCP/IPv4 reassembly table.
> > + * @param max_packet_size
> > + *  max packet length after merged
> > + * @start_time
> > + *  the start time that the packet is inserted into the table
> > + * @return
> > + *  if the packet doesn't have data, or SYN, FIN, RST, PSH or URG bit is
> > + *  set, or there is no available space in the table to insert a new
> > + *  item or a new key, return a negative value. If the packet is merged
> > + *  successfully, return an positive value. If the packet is inserted
> > + *  into the table, return 0.
> > + */
> > +int32_t
> > +gro_tcp4_reassemble(struct rte_mbuf *pkt,
> > +		struct gro_tcp4_tbl *tbl,
> > +		uint32_t max_packet_size,
> > +		uint64_t start_time);
> 
> Ditto.
> 
> > +
> > +/**
> > + * This function flushes timeout packets in a TCP/IPv4 reassembly table
> > + * to applications, and without updating checksums for merged packets.
> > + * The max number of flushed timeout packets is the element number of
> > + * the array which is used to keep flushed packets.
> > + *
> > + * @param tbl
> > + *  a pointer that points to a TCP GRO table.
> > + * @param timeout_cycles
> > + *  the maximum time that packets can stay in the table.
> > + * @param out
> > + *  pointer array which is used to keep flushed packets.
> > + * @param nb_out
> > + *  the element number of out. It's also the max number of timeout
> > + *  packets that can be flushed finally.
> > + * @return
> > + *  the number of packets that are returned.
> > + */
> > +uint16_t
> > +gro_tcp4_tbl_timeout_flush(struct gro_tcp4_tbl *tbl,
> > +		uint64_t timeout_cycles,
> > +		struct rte_mbuf **out,
> > +		uint16_t nb_out);
> 
> Ditto.
> 
> > +
> > +/**
> > + * This function returns the number of the packets in a TCP/IPv4
> > + * reassembly table.
> > + *
> > + * @param tbl
> > + *  pointer points to a TCP/IPv4 reassembly table.
> > + * @return
> > + *  the number of packets in the table
> > + */
> > +uint32_t
> > +gro_tcp4_tbl_item_num(void *tbl);
> > +#endif

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v10 1/3] lib: add Generic Receive Offload API framework
  2017-07-02 10:19                     ` Tan, Jianfeng
@ 2017-07-03  5:56                       ` Hu, Jiayu
  2017-07-04  8:11                         ` Yuanhan Liu
  0 siblings, 1 reply; 141+ messages in thread
From: Hu, Jiayu @ 2017-07-03  5:56 UTC (permalink / raw)
  To: Tan, Jianfeng, dev
  Cc: Ananyev, Konstantin, stephen, yliu, Wu, Jingjing, Yao, Lei A, Bie, Tiwei

Hi Jianfeng,

> -----Original Message-----
> From: Tan, Jianfeng
> Sent: Sunday, July 2, 2017 6:20 PM
> To: Hu, Jiayu <jiayu.hu@intel.com>; dev@dpdk.org
> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> stephen@networkplumber.org; yliu@fridaylinux.org; Wu, Jingjing
> <jingjing.wu@intel.com>; Yao, Lei A <lei.a.yao@intel.com>; Bie, Tiwei
> <tiwei.bie@intel.com>
> Subject: Re: [PATCH v10 1/3] lib: add Generic Receive Offload API framework
> 
> 
> 
> On 7/1/2017 7:08 PM, Jiayu Hu wrote:
> > Generic Receive Offload (GRO) is a widely used SW-based offloading
> > technique to reduce per-packet processing overhead. It gains
> > performance by reassembling small packets into large ones. This
> > patchset is to support GRO in DPDK. To support GRO, this patch
> > implements a GRO API framework.
> >
> > To enable more flexibility to applications, DPDK GRO is implemented as
> > a user library. Applications explicitly use the GRO library to merge
> > small packets into large ones. DPDK GRO provides two reassembly modes.
> > One is called lightweight mode, the other is called heavyweight mode.
> > If applications want to merge packets in a simple way and the number
> > of packets is relatively small, they can use the lightweight mode.
> > If applications need more fine-grained controls, they can choose the
> > heavyweight mode.
> >
> > rte_gro_reassemble_burst is the main reassembly API which is used in
> > lightweight mode and processes N packets at a time. For applications,
> > performing GRO in lightweight mode is simple. They just need to invoke
> > rte_gro_reassemble_burst. Applications can get GROed packets as soon as
> > rte_gro_reassemble_burst returns.
> >
> > rte_gro_reassemble is the main reassembly API which is used in
> > heavyweight mode and tries to merge N inputted packets with the packets
> > in a given GRO table. For applications, performing GRO in heavyweight
> > mode is relatively complicated. Before performing GRO, applications need
> > to create a GRO table by rte_gro_tbl_create. Then they can use
> > rte_gro_reassemble to merge packets. The GROed packets are in the GRO
> > table. If applications want to get them, applications need to manually
> > flush them by flush API.
> 
> > Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
> > ---
> >   config/common_base                 |   5 ++
> >   lib/Makefile                       |   2 +
> >   lib/librte_gro/Makefile            |  50 +++++++++++
> >   lib/librte_gro/rte_gro.c           | 176 +++++++++++++++++++++++++++++++++++++
> >   lib/librte_gro/rte_gro.h           | 176 +++++++++++++++++++++++++++++++++++++
> >   lib/librte_gro/rte_gro_version.map |  12 +++
> >   mk/rte.app.mk                      |   1 +
> >   7 files changed, 422 insertions(+)
> >   create mode 100644 lib/librte_gro/Makefile
> >   create mode 100644 lib/librte_gro/rte_gro.c
> >   create mode 100644 lib/librte_gro/rte_gro.h
> >   create mode 100644 lib/librte_gro/rte_gro_version.map
> >
> > diff --git a/config/common_base b/config/common_base
> > index f6aafd1..167f5ef 100644
> > --- a/config/common_base
> > +++ b/config/common_base
> > @@ -712,6 +712,11 @@ CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
> >   CONFIG_RTE_LIBRTE_PMD_VHOST=n
> >
> >   #
> > +# Compile GRO library
> > +#
> > +CONFIG_RTE_LIBRTE_GRO=y
> > +
> > +#
> >   #Compile Xen domain0 support
> >   #
> >   CONFIG_RTE_LIBRTE_XEN_DOM0=n
> > diff --git a/lib/Makefile b/lib/Makefile
> > index 07e1fd0..ac1c2f6 100644
> > --- a/lib/Makefile
> > +++ b/lib/Makefile
> > @@ -106,6 +106,8 @@ DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
> >   DEPDIRS-librte_reorder := librte_eal librte_mempool librte_mbuf
> >   DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
> >   DEPDIRS-librte_pdump := librte_eal librte_mempool librte_mbuf librte_ether
> > +DIRS-$(CONFIG_RTE_LIBRTE_GRO) += librte_gro
> > +DEPDIRS-librte_gro := librte_eal librte_mbuf librte_ether librte_net
> >
> >   ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
> >   DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
> > diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
> > new file mode 100644
> > index 0000000..7e0f128
> > --- /dev/null
> > +++ b/lib/librte_gro/Makefile
> > @@ -0,0 +1,50 @@
> > +#   BSD LICENSE
> > +#
> > +#   Copyright(c) 2017 Intel Corporation. All rights reserved.
> > +#   All rights reserved.
> > +#
> > +#   Redistribution and use in source and binary forms, with or without
> > +#   modification, are permitted provided that the following conditions
> > +#   are met:
> > +#
> > +#     * Redistributions of source code must retain the above copyright
> > +#       notice, this list of conditions and the following disclaimer.
> > +#     * Redistributions in binary form must reproduce the above copyright
> > +#       notice, this list of conditions and the following disclaimer in
> > +#       the documentation and/or other materials provided with the
> > +#       distribution.
> > +#     * Neither the name of Intel Corporation nor the names of its
> > +#       contributors may be used to endorse or promote products derived
> > +#       from this software without specific prior written permission.
> > +#
> > +#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > +#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > +#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > +#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > +#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > +#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > +#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > +#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > +#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > +#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > +#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > +
> > +include $(RTE_SDK)/mk/rte.vars.mk
> > +
> > +# library name
> > +LIB = librte_gro.a
> > +
> > +CFLAGS += -O3
> > +CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
> > +
> > +EXPORT_MAP := rte_gro_version.map
> > +
> > +LIBABIVER := 1
> > +
> > +# source files
> > +SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
> > +
> > +# install this header file
> > +SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
> > +
> > +include $(RTE_SDK)/mk/rte.lib.mk
> > diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
> > new file mode 100644
> > index 0000000..648835b
> > --- /dev/null
> > +++ b/lib/librte_gro/rte_gro.c
> > @@ -0,0 +1,176 @@
> > +/*-
> > + *   BSD LICENSE
> > + *
> > + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> > + *
> > + *   Redistribution and use in source and binary forms, with or without
> > + *   modification, are permitted provided that the following conditions
> > + *   are met:
> > + *
> > + *     * Redistributions of source code must retain the above copyright
> > + *       notice, this list of conditions and the following disclaimer.
> > + *     * Redistributions in binary form must reproduce the above copyright
> > + *       notice, this list of conditions and the following disclaimer in
> > + *       the documentation and/or other materials provided with the
> > + *       distribution.
> > + *     * Neither the name of Intel Corporation nor the names of its
> > + *       contributors may be used to endorse or promote products derived
> > + *       from this software without specific prior written permission.
> > + *
> > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > + */
> > +
> > +#include <rte_malloc.h>
> > +#include <rte_mbuf.h>
> > +
> > +#include "rte_gro.h"
> > +
> > +typedef void *(*gro_tbl_create_fn)(uint16_t socket_id,
> > +		uint16_t max_flow_num,
> > +		uint16_t max_item_per_flow);
> > +typedef void (*gro_tbl_destroy_fn)(void *tbl);
> > +typedef uint32_t (*gro_tbl_item_num_fn)(void *tbl);
> > +
> > +static gro_tbl_create_fn tbl_create_functions[RTE_GRO_TYPE_MAX_NUM];
> > +static gro_tbl_destroy_fn tbl_destroy_functions[RTE_GRO_TYPE_MAX_NUM];
> > +static gro_tbl_item_num_fn tbl_item_num_functions[RTE_GRO_TYPE_MAX_NUM];
> > +
> > +/**
> > + * GRO table, which is used to merge packets. It keeps many reassembly
> > + * tables of desired GRO types. Applications need to create GRO tables
> > + * before using rte_gro_reassemble to perform GRO.
> > + */
> > +struct gro_tbl {
> > +	uint64_t desired_gro_types;	/**< GRO types to perform */
> > +	/* max TTL measured in nanosecond */
> > +	uint64_t max_timeout_cycles;
> > +	/* max length of merged packet measured in byte */
> > +	uint32_t max_packet_size;
> > +	/* reassembly tables of desired GRO types */
> > +	void *tbls[RTE_GRO_TYPE_MAX_NUM];
> > +};
> > +
> > +void *rte_gro_tbl_create(const
> > +		const struct rte_gro_param *param)
> 
> The name of this API and the definition of struct gro_tbl involve some
> confusion. A gro table contains gro tables? I suppose a better name is
> needed, for example, struct gro_ctl.

Actually, a GRO table includes N reassembly tables. But gro_tbl is not a good
name. I will change the name. Thanks.
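
For reference, the intended flow in heavyweight mode with the current
names is roughly (just a sketch; RTE_GRO_TCP_IPV4 comes from the tcp4
patch):

	struct rte_mbuf *out[32];
	uint16_t nb_unprocessed, nb_out;
	struct rte_gro_param param = {
		.desired_gro_types = RTE_GRO_TCP_IPV4,
		/* socket_id, max_flow_num, etc. omitted */
	};
	void *tbl = rte_gro_tbl_create(&param);

	/* per RX burst: pkts[] keeps the unprocessed packets */
	nb_unprocessed = rte_gro_reassemble(pkts, nb_pkts, tbl);

	/* periodically: fetch the packets merged in the table */
	nb_out = rte_gro_timeout_flush(tbl, RTE_GRO_TCP_IPV4,
			out, RTE_DIM(out));

	rte_gro_tbl_destroy(tbl);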

> 
> > +{
> > +	gro_tbl_create_fn create_tbl_fn;
> > +	gro_tbl_destroy_fn destroy_tbl_fn;
> > +	struct gro_tbl *gro_tbl;
> > +	uint64_t gro_type_flag = 0;
> > +	uint8_t i, j;
> > +
> > +	gro_tbl = rte_zmalloc_socket(__func__,
> > +			sizeof(struct gro_tbl),
> > +			RTE_CACHE_LINE_SIZE,
> > +			param->socket_id);
> > +	if (gro_tbl == NULL)
> > +		return NULL;
> > +	gro_tbl->max_packet_size = param->max_packet_size;
> > +	gro_tbl->max_timeout_cycles = param->max_timeout_cycles;
> > +	gro_tbl->desired_gro_types = param->desired_gro_types;
> > +
> > +	for (i = 0; i < RTE_GRO_TYPE_MAX_NUM; i++) {
> > +		gro_type_flag = 1 << i;
> > +
> > +		if ((param->desired_gro_types & gro_type_flag) == 0)
> > +			continue;
> > +		create_tbl_fn = tbl_create_functions[i];
> > +		if (create_tbl_fn == NULL)
> > +			continue;
> > +
> > +		gro_tbl->tbls[i] = create_tbl_fn(
> > +				param->socket_id,
> > +				param->max_flow_num,
> > +				param->max_item_per_flow);
> 
> Here and somewhere else: the alignment seems incorrect.
>          gro_tbl->tbls[i] = create_tbl_fn(param->socket_id,
>                                           /* keep all parameters aligned like this */
>                                           param->max_flow_num,
>                                           param->max_item_per_flow);

Thanks, I will modify it.

> > +		if (gro_tbl->tbls[i] == NULL) {
> > +			/* destroy all allocated tables */
> > +			for (j = 0; j < i; j++) {
> > +				gro_type_flag = 1 << j;
> > +				if ((param->desired_gro_types & gro_type_flag) == 0)
> > +					continue;
> > +				destroy_tbl_fn = tbl_destroy_functions[j];
> > +				if (destroy_tbl_fn)
> > +					destroy_tbl_fn(gro_tbl->tbls[j]);
> > +			}
> > +			rte_free(gro_tbl);
> > +			return NULL;
> > +		}
> > +	}
> > +	return gro_tbl;
> > +}
> > +
> > +void rte_gro_tbl_destroy(void *tbl)
> > +{
> > +	gro_tbl_destroy_fn destroy_tbl_fn;
> > +	struct gro_tbl *gro_tbl = (struct gro_tbl *)tbl;
> > +	uint64_t gro_type_flag;
> > +	uint8_t i;
> > +
> > +	if (gro_tbl == NULL)
> > +		return;
> > +	for (i = 0; i < RTE_GRO_TYPE_MAX_NUM; i++) {
> > +		gro_type_flag = 1 << i;
> > +		if ((gro_tbl->desired_gro_types & gro_type_flag) == 0)
> > +			continue;
> > +		destroy_tbl_fn = tbl_destroy_functions[i];
> > +		if (destroy_tbl_fn)
> > +			destroy_tbl_fn(gro_tbl->tbls[i]);
> > +	}
> > +	rte_free(gro_tbl);
> > +}
> > +
> > +uint16_t
> > +rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
> > +		uint16_t nb_pkts,
> > +		const struct rte_gro_param *param __rte_unused)
> > +{
> > +	return nb_pkts;
> > +}
> > +
> > +uint16_t
> > +rte_gro_reassemble(struct rte_mbuf **pkts __rte_unused,
> > +		uint16_t nb_pkts,
> > +		void *tbl __rte_unused)
> > +{
> > +	return nb_pkts;
> > +}
> > +
> > +uint16_t
> > +rte_gro_timeout_flush(void *tbl __rte_unused,
> > +		uint64_t desired_gro_types __rte_unused,
> > +		struct rte_mbuf **out __rte_unused,
> > +		uint16_t max_nb_out __rte_unused)
> > +{
> > +	return 0;
> > +}
> > +
> > +uint64_t rte_gro_tbl_item_num(void *tbl)
> 
> Does rte_gro_get_count() sound better?

OK, I will change the name.

> 
> > +{
> > +	struct gro_tbl *gro_tbl = (struct gro_tbl *)tbl;
> > +	gro_tbl_item_num_fn item_num_fn;
> > +	uint64_t item_num = 0;
> > +	uint64_t gro_type_flag;
> > +	uint8_t i;
> > +
> > +	for (i = 0; i < RTE_GRO_TYPE_MAX_NUM; i++) {
> > +		gro_type_flag = 1 << i;
> > +		if ((gro_tbl->desired_gro_types & gro_type_flag) == 0)
> > +			continue;
> > +
> > +		item_num_fn = tbl_item_num_functions[i];
> > +		if (item_num_fn == NULL)
> > +			continue;
> > +		item_num += item_num_fn(gro_tbl->tbls[i]);
> > +	}
> > +	return item_num;
> > +}
> > diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
> > new file mode 100644
> > index 0000000..02c9113
> > --- /dev/null
> > +++ b/lib/librte_gro/rte_gro.h
> > @@ -0,0 +1,176 @@
> > +/*-
> > + *   BSD LICENSE
> > + *
> > + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> > + *
> > + *   Redistribution and use in source and binary forms, with or without
> > + *   modification, are permitted provided that the following conditions
> > + *   are met:
> > + *
> > + *     * Redistributions of source code must retain the above copyright
> > + *       notice, this list of conditions and the following disclaimer.
> > + *     * Redistributions in binary form must reproduce the above copyright
> > + *       notice, this list of conditions and the following disclaimer in
> > + *       the documentation and/or other materials provided with the
> > + *       distribution.
> > + *     * Neither the name of Intel Corporation nor the names of its
> > + *       contributors may be used to endorse or promote products derived
> > + *       from this software without specific prior written permission.
> > + *
> > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > + */
> > +
> > +#ifndef _RTE_GRO_H_
> > +#define _RTE_GRO_H_
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +/**
> > + * the max packets number that rte_gro_reassemble_burst can
> > + * process in each invocation.
> > + */
> > +#define RTE_GRO_MAX_BURST_ITEM_NUM 128UL
> > +
> > +/* max number of supported GRO types */
> > +#define RTE_GRO_TYPE_MAX_NUM 64
> > +#define RTE_GRO_TYPE_SUPPORT_NUM 0	/**< current supported GRO num */
> > +
> > +
> > +struct rte_gro_param {
> > +	uint64_t desired_gro_types;	/**< desired GRO types */
> 
> Make it gro_types for simplicity.

Thanks, I will change it.

> 
> > +	uint32_t max_packet_size;	/**< max length of merged packets */
> 
> Referring to the tcp4 gro implementation, this is the max size of the
> TCP payload. But in principle, the 65535-byte limitation (including the
> TCP header) exists because the IP total length field is only 2 bytes wide.

Yes, it's a bug. The length of (IP header + TCP header + payload) should
be less than 64KB, but max_packet_size is a uint32_t.

> 
> What does it mean for other GRO engines then? I think this should be
> decided by each gro engine, and applications shouldn't have to set it.

Makes sense. We shouldn't expose it to applications; I will remove this
parameter. As for the above bug, I will fix it by limiting the max length
of a TCP/IPv4 packet to (l2 header length + 64KB).
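
That is, roughly (sketch):

	/* ipv4_hdr->total_length is a 16-bit field, so the merged
	 * (IP header + TCP header + payload) must stay <= UINT16_MAX */
	if (pkt_head->pkt_len + tcp_dl1 > pkt_head->l2_len + UINT16_MAX)
		return -1;	/* don't merge */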

> 
> > +	uint16_t max_flow_num;	/**< max flow number */
> > +	uint16_t max_item_per_flow;	/**< max packet number per flow */
> > +
> > +	/* socket index where the Ethernet port connects to */
> 
> The comment needs to be refined. We have different socket indexes for the
> port and for the pmd thread. We should just explain how this will be used:
> "socket index for allocating gro related data structures".

Thanks, I will change it.

> 
> > +	uint16_t socket_id;
> > +	/* max TTL for a packet in the GRO table, measured in nanosecond */
> > +	uint64_t max_timeout_cycles;
> 
> We don't need to set it in lightweight mode. Please add this into the
> comment.

Thanks, I will add this comment.

> 
> > +};
> > +
> > +/**
> > + * This function create a GRO table, which is used to merge packets in
> > + * rte_gro_reassemble.
> > + *
> > + * @param param
> > + *  applications use it to pass needed parameters to create a GRO table.
> > + * @return
> > + *  if create successfully, return a pointer which points to the GRO
> > + *  table. Otherwise, return NULL.
> > + */
> > +void *rte_gro_tbl_create(
> > +		const struct rte_gro_param *param);
> 
> Merge above two lines into one.

Thanks, I will modify it.

> 
> > +/**
> > + * This function destroys a GRO table.
> > + */
> > +void rte_gro_tbl_destroy(void *tbl);
> > +
> > +/**
> > + * This is one of the main reassembly APIs, which merges numbers of
> > + * packets at a time. It assumes that all inputted packets are with
> > + * correct checksums. That is, applications should guarantee all
> > + * inputted packets are correct. Besides, it doesn't re-calculate
> > + * checksums for merged packets. If inputted packets are IP fragmented,
> > + * this function assumes they are complete (i.e. with L4 header). After
> > + * finishing processing, it returns all GROed packets to applications
> > + * immediately.
> > + *
> > + * @param pkts
> > + *  a pointer array which points to the packets to reassemble. Besides,
> > + *  it keeps packet addresses for GROed packets.
> > + * @param nb_pkts
> > + *  the number of packets to reassemble.
> > + * @param param
> > + *  applications use it to tell rte_gro_reassemble_burst what rules
> > + *  are demanded.
> > + * @return
> > + *  the number of packets after been GROed.
> > + */
> > +uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts,
> > +		uint16_t nb_pkts,
> > +		const struct rte_gro_param *param);
> 
> Fix the alignment.
> 
> > +
> > +/**
> > + * Reassembly function, which tries to merge inputted packets with
> > + * the packets in a given GRO table. This function assumes all inputted
> > + * packets are with correct checksums. And it won't update checksums if
> > + * two packets are merged. Besides, if inputted packets are IP
> > + * fragmented, this function assumes they are complete packets (i.e.
> > + * with L4 header).
> > + *
> > + * If the inputted packets don't have data or are with unsupported GRO
> > + * types, they won't be processed and are returned to applications.
> > + * Otherwise, the inputted packets are either merged or inserted into
> > + * the table. If applications want to get packets in the table, they need
> > + * to call flush API.
> > + *
> > + * @param pkts
> > + *  packet to reassemble. Besides, after this function finishes, it
> > + *  keeps the unprocessed packets (i.e. without data or unsupported
> > + *  GRO types).
> > + * @param nb_pkts
> > + *  the number of packets to reassemble.
> > + * @param tbl
> > + *  a pointer points to a GRO table.
> > + * @return
> > + *  return the number of unprocessed packets (i.e. without data or
> > + *  unsupported GRO types). If all packets are processed (merged or
> > + *  inserted into the table), return 0.
> > + */
> > +uint16_t rte_gro_reassemble(struct rte_mbuf **pkts,
> > +		uint16_t nb_pkts,
> > +		void *tbl);
> > +
> > +/**
> > + * This function flushes the timeout packets from reassembly tables of
> > + * desired GRO types. The max number of flushed timeout packets is the
> > + * element number of the array which is used to keep the flushed packets.
> > + *
> > + * Besides, this function won't re-calculate checksums for merged
> > + * packets in the tables. That is, the returned packets may be with
> > + * wrong checksums.
> > + *
> > + * @param tbl
> > + *  a pointer points to a GRO table object.
> > + * @param desired_gro_types
> > + * rte_gro_timeout_flush only processes packets which belong to the
> > + * GRO types specified by desired_gro_types.
> > + * @param out
> > + *  a pointer array that is used to keep flushed timeout packets.
> > + * @param nb_out
> > + *  the element number of out. It's also the max number of timeout
> > + *  packets that can be flushed finally.
> > + * @return
> > + *  the number of flushed packets. If no packets are flushed, return 0.
> > + */
> > +uint16_t rte_gro_timeout_flush(void *tbl,
> > +		uint64_t desired_gro_types,
> > +		struct rte_mbuf **out,
> > +		uint16_t max_nb_out);
> > +
> > +/**
> > + * This function returns the number of packets in a given GRO table.
> > + * @param tbl
> > + *  pointer points to a GRO table.
> > + * @return
> > + *  the number of packets in the table.
> > + */
> > +uint64_t rte_gro_tbl_item_num(void *tbl);
> > +
> > +#ifdef __cplusplus
> > +}
> > +#endif
> > +
> > +#endif
> > diff --git a/lib/librte_gro/rte_gro_version.map
> b/lib/librte_gro/rte_gro_version.map
> > new file mode 100644
> > index 0000000..358fb9d
> > --- /dev/null
> > +++ b/lib/librte_gro/rte_gro_version.map
> > @@ -0,0 +1,12 @@
> > +DPDK_17.08 {
> > +	global:
> > +
> > +	rte_gro_tbl_create;
> > +	rte_gro_tbl_destroy;
> 
> As stated earlier, here are the API names I suggested:
> rte_gro_ctl_create()/rte_gro_ctl_destroy()/rte_gro_get_count().

Thanks, I will change the names.

> 
> > +	rte_gro_reassemble_burst;
> > +	rte_gro_reassemble;
> > +	rte_gro_timeout_flush;
> > +	rte_gro_tbl_item_num;
> > +
> > +	local: *;
> > +};
> > diff --git a/mk/rte.app.mk b/mk/rte.app.mk
> > index bcaf1b3..fc3776d 100644
> > --- a/mk/rte.app.mk
> > +++ b/mk/rte.app.mk
> > @@ -98,6 +98,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_RING)           += -
> lrte_ring
> >   _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
> >   _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
> >   _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER)        += -lrte_reorder
> > +_LDLIBS-$(CONFIG_RTE_LIBRTE_GRO)        	+= -lrte_gro
> >
> >   ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
> >   _LDLIBS-$(CONFIG_RTE_LIBRTE_KNI)            += -lrte_kni

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v10 1/3] lib: add Generic Receive Offload API framework
  2017-07-03  5:56                       ` Hu, Jiayu
@ 2017-07-04  8:11                         ` Yuanhan Liu
  0 siblings, 0 replies; 141+ messages in thread
From: Yuanhan Liu @ 2017-07-04  8:11 UTC (permalink / raw)
  To: Hu, Jiayu
  Cc: Tan, Jianfeng, dev, Ananyev, Konstantin, stephen, Wu, Jingjing,
	Yao, Lei A, Bie, Tiwei

On Mon, Jul 03, 2017 at 05:56:20AM +0000, Hu, Jiayu wrote:
> > > +/**
> > > + * GRO table, which is used to merge packets. It keeps many reassembly
> > > + * tables of desired GRO types. Applications need to create GRO tables
> > > + * before using rte_gro_reassemble to perform GRO.
> > > + */
> > > +struct gro_tbl {
> > > +	uint64_t desired_gro_types;	/**< GRO types to perform */
> > > +	/* max TTL measured in nanosecond */
> > > +	uint64_t max_timeout_cycles;
> > > +	/* max length of merged packet measured in byte */
> > > +	uint32_t max_packet_size;
> > > +	/* reassembly tables of desired GRO types */
> > > +	void *tbls[RTE_GRO_TYPE_MAX_NUM];
> > > +};
> > > +
> > > +void *rte_gro_tbl_create(const
> > > +		const struct rte_gro_param *param)
> > 
> > The name of this API and the definition of struct gro_tbl involve some
> > confusion. A gro table contains gro tables? I suppose a better name is
> > needed, for example, struct gro_ctl.
> 
> Actually, a GRO table includes N reassembly tables. But gro_tbl is not a good
> name. I will change the name. Thanks.

Haven't looked at the details yet, but, probably, gro_ctx (context) is a better
and more typical name?

	--yliu

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v10 1/3] lib: add Generic Receive Offload API framework
  2017-07-01 11:08                   ` [PATCH v10 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
  2017-07-02 10:19                     ` Tan, Jianfeng
@ 2017-07-04  8:37                     ` Yuanhan Liu
  2017-07-04 16:01                       ` Hu, Jiayu
  1 sibling, 1 reply; 141+ messages in thread
From: Yuanhan Liu @ 2017-07-04  8:37 UTC (permalink / raw)
  To: Jiayu Hu
  Cc: dev, konstantin.ananyev, stephen, jianfeng.tan, jingjing.wu,
	lei.a.yao, tiwei.bie

Haven't looked at the details yet, and below are some quick comments
after a glimpse.

On Sat, Jul 01, 2017 at 07:08:41PM +0800, Jiayu Hu wrote:
...
> +void *rte_gro_tbl_create(const
> +		const struct rte_gro_param *param)

The DPDK style is:

void *
rte_gro_tbl_create(...)

Also you should revisit all other functions, as I have seen quite many
coding style issues like this.

> +{
> +	gro_tbl_create_fn create_tbl_fn;
> +	gro_tbl_destroy_fn destroy_tbl_fn;
> +	struct gro_tbl *gro_tbl;
> +	uint64_t gro_type_flag = 0;
> +	uint8_t i, j;
> +
> +	gro_tbl = rte_zmalloc_socket(__func__,
> +			sizeof(struct gro_tbl),
> +			RTE_CACHE_LINE_SIZE,
> +			param->socket_id);
> +	if (gro_tbl == NULL)
> +		return NULL;
> +	gro_tbl->max_packet_size = param->max_packet_size;
> +	gro_tbl->max_timeout_cycles = param->max_timeout_cycles;
> +	gro_tbl->desired_gro_types = param->desired_gro_types;
> +
> +	for (i = 0; i < RTE_GRO_TYPE_MAX_NUM; i++) {
> +		gro_type_flag = 1 << i;
> +
> +		if ((param->desired_gro_types & gro_type_flag) == 0)
> +			continue;
> +		create_tbl_fn = tbl_create_functions[i];
> +		if (create_tbl_fn == NULL)
> +			continue;
> +
> +		gro_tbl->tbls[i] = create_tbl_fn(
> +				param->socket_id,
> +				param->max_flow_num,
> +				param->max_item_per_flow);
> +		if (gro_tbl->tbls[i] == NULL) {
> +			/* destroy all allocated tables */
> +			for (j = 0; j < i; j++) {
> +				gro_type_flag = 1 << j;
> +				if ((param->desired_gro_types & gro_type_flag) == 0)
> +					continue;
> +				destroy_tbl_fn = tbl_destroy_functions[j];
> +				if (destroy_tbl_fn)
> +					destroy_tbl_fn(gro_tbl->tbls[j]);
> +			}
> +			rte_free(gro_tbl);
> +			return NULL;

The typical way to handle this is to re-use rte_gro_tbl_destroy() as
much as possible. This avoids duplicated code.
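
That is, something like (a sketch; it works because gro_tbl is
zmalloc'ed, so the not-yet-created tables are NULL and the destroy
callbacks already tolerate NULL):

		if (gro_tbl->tbls[i] == NULL) {
			rte_gro_tbl_destroy(gro_tbl);
			return NULL;
		}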

> +		}
> +	}
> +	return gro_tbl;
> +}
> +
> +void rte_gro_tbl_destroy(void *tbl)
> +{
> +	gro_tbl_destroy_fn destroy_tbl_fn;
> +	struct gro_tbl *gro_tbl = (struct gro_tbl *)tbl;

The cast (from void *) is unnecessary and can be dropped.

...
> +/**
> + * the max packets number that rte_gro_reassemble_burst can
> + * process in each invocation.
> + */
> +#define RTE_GRO_MAX_BURST_ITEM_NUM 128UL
> +
> +/* max number of supported GRO types */
> +#define RTE_GRO_TYPE_MAX_NUM 64
> +#define RTE_GRO_TYPE_SUPPORT_NUM 0	/**< current supported GRO num */

The reason we need to use the "/**< ... */" comment style is that this
is what the doc generator (doxygen) recognizes. Without it, your
comment won't be displayed on the generated doc page (for example,
http://dpdk.org/doc/api/rte__ethdev_8h.html#ade7de72f6c0f8102d01a0b3438856900).

The format, as far as I know, could be:

    /**< here is a comment */
    #define A_MACRO		x

Or the one you did for RTE_GRO_TYPE_SUPPORT_NUM: put it at the end
of the line.

That being said, the comments for RTE_GRO_MAX_BURST_ITEM_NUM and
RTE_GRO_TYPE_MAX_NUM should be changed. Again, you should revisit
other places.

> +
> +
> +struct rte_gro_param {
> +	uint64_t desired_gro_types;	/**< desired GRO types */
> +	uint32_t max_packet_size;	/**< max length of merged packets */
> +	uint16_t max_flow_num;	/**< max flow number */
> +	uint16_t max_item_per_flow;	/**< max packet number per flow */
> +
> +	/* socket index where the Ethernet port connects to */

Ditto.

...
> +++ b/lib/librte_gro/rte_gro_version.map
> @@ -0,0 +1,12 @@
> +DPDK_17.08 {
> +	global:
> +
> +	rte_gro_tbl_create;
> +	rte_gro_tbl_destroy;
> +	rte_gro_reassemble_burst;
> +	rte_gro_reassemble;
> +	rte_gro_timeout_flush;
> +	rte_gro_tbl_item_num;

The unwritten convention is to list them in alphabetical order.

	--yliu

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v10 2/3] lib/gro: add TCP/IPv4 GRO support
  2017-07-01 11:08                   ` [PATCH v10 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
  2017-07-02 10:19                     ` Tan, Jianfeng
@ 2017-07-04  9:03                     ` Yuanhan Liu
  2017-07-04 16:03                       ` Hu, Jiayu
  1 sibling, 1 reply; 141+ messages in thread
From: Yuanhan Liu @ 2017-07-04  9:03 UTC (permalink / raw)
  To: Jiayu Hu
  Cc: dev, konstantin.ananyev, stephen, jianfeng.tan, jingjing.wu,
	lei.a.yao, tiwei.bie

Again, just some quick comments after a glimpse.

On Sat, Jul 01, 2017 at 07:08:42PM +0800, Jiayu Hu wrote:
> +	for (i = 0; i < nb_pkts; i++) {
> +		if (RTE_ETH_IS_IPV4_HDR(pkts[i]->packet_type) &&
> +				(pkts[i]->packet_type & RTE_PTYPE_L4_TCP)) {
> +			ret = gro_tcp4_reassemble(pkts[i],
> +					&tcp_tbl,
> +					param->max_packet_size,
> +					current_time);
> +			if (ret > 0)
> +				/* merge successfully */
> +				nb_after_gro--;
> +			else if (ret < 0)
> +				unprocess_pkts[unprocess_num++] =
> +					pkts[i];

Even if it's just one statement, when the statement spans more than one
line, including the comment, the {} should be used.

Section 1.6.2. Control Statements and Loops of:
http://dpdk.org/doc/guides/contributing/coding_style.html

> +		} else
> +			unprocess_pkts[unprocess_num++] =
> +				pkts[i];

Besides, why break it into two lines, when it can fit into one line of
fewer than 80 chars?
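
That is, simply:

		} else
			unprocess_pkts[unprocess_num++] = pkts[i];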

> +	}
> +
> +	/* re-arrange GROed packets */
> +	if (nb_after_gro < nb_pkts) {
> +		i = gro_tcp4_tbl_timeout_flush(&tcp_tbl, 0,
> +				pkts, nb_pkts);
> +		if (unprocess_num > 0)
> +			memcpy(&pkts[i], unprocess_pkts,
> +					sizeof(struct rte_mbuf *) *
> +					unprocess_num);

Ditto.

> +void *gro_tcp4_tbl_create(uint16_t socket_id,
> +		uint16_t max_flow_num,
> +		uint16_t max_item_per_flow)
> +{
> +	size_t size;
> +	uint32_t entries_num;
> +	struct gro_tcp4_tbl *tbl;
> +
> +	entries_num = max_flow_num * max_item_per_flow;
> +	entries_num = entries_num > GRO_TCP4_TBL_MAX_ITEM_NUM ?
> +		GRO_TCP4_TBL_MAX_ITEM_NUM : entries_num;
> +
> +	if (entries_num == 0)
> +		return NULL;
> +
> +	tbl = (struct gro_tcp4_tbl *)rte_zmalloc_socket(
> +			__func__,
> +			sizeof(struct gro_tcp4_tbl),
> +			RTE_CACHE_LINE_SIZE,
> +			socket_id);

Again, the cast (from void *) is unnecessary and should be dropped.

> +	memcpy(&(tbl->keys[key_idx].key),
> +			&key, sizeof(struct tcp4_key));

Again, I believe these two lines can fit into a single line.

	--yliu

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v10 1/3] lib: add Generic Receive Offload API framework
  2017-07-04  8:37                     ` Yuanhan Liu
@ 2017-07-04 16:01                       ` Hu, Jiayu
  0 siblings, 0 replies; 141+ messages in thread
From: Hu, Jiayu @ 2017-07-04 16:01 UTC (permalink / raw)
  To: Yuanhan Liu
  Cc: dev, Ananyev, Konstantin, stephen, Tan, Jianfeng, Wu, Jingjing,
	Yao, Lei A, Bie, Tiwei

Hi Yuanhan,

> -----Original Message-----
> From: Yuanhan Liu [mailto:yliu@fridaylinux.org]
> Sent: Tuesday, July 4, 2017 4:37 PM
> To: Hu, Jiayu <jiayu.hu@intel.com>
> Cc: dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> stephen@networkplumber.org; Tan, Jianfeng <jianfeng.tan@intel.com>; Wu,
> Jingjing <jingjing.wu@intel.com>; Yao, Lei A <lei.a.yao@intel.com>; Bie,
> Tiwei <tiwei.bie@intel.com>
> Subject: Re: [PATCH v10 1/3] lib: add Generic Receive Offload API framework
> 
> Haven't looked at the details yet, and below are some quick comments
> after a glimpse.
> 
> On Sat, Jul 01, 2017 at 07:08:41PM +0800, Jiayu Hu wrote:
> ...
> > +void *rte_gro_tbl_create(const
> > +		const struct rte_gro_param *param)
> 
> The DPDK style is:
> 
> void *
> rte_gro_tbl_destroy(...)
> 
> Also you should revisit all other functions, as I have seen quite many
> coding style issues like this.

Thanks, I will fix the style issues.

> 
> > +{
> > +	gro_tbl_create_fn create_tbl_fn;
> > +	gro_tbl_destroy_fn destroy_tbl_fn;
> > +	struct gro_tbl *gro_tbl;
> > +	uint64_t gro_type_flag = 0;
> > +	uint8_t i, j;
> > +
> > +	gro_tbl = rte_zmalloc_socket(__func__,
> > +			sizeof(struct gro_tbl),
> > +			RTE_CACHE_LINE_SIZE,
> > +			param->socket_id);
> > +	if (gro_tbl == NULL)
> > +		return NULL;
> > +	gro_tbl->max_packet_size = param->max_packet_size;
> > +	gro_tbl->max_timeout_cycles = param->max_timeout_cycles;
> > +	gro_tbl->desired_gro_types = param->desired_gro_types;
> > +
> > +	for (i = 0; i < RTE_GRO_TYPE_MAX_NUM; i++) {
> > +		gro_type_flag = 1 << i;
> > +
> > +		if ((param->desired_gro_types & gro_type_flag) == 0)
> > +			continue;
> > +		create_tbl_fn = tbl_create_functions[i];
> > +		if (create_tbl_fn == NULL)
> > +			continue;
> > +
> > +		gro_tbl->tbls[i] = create_tbl_fn(
> > +				param->socket_id,
> > +				param->max_flow_num,
> > +				param->max_item_per_flow);
> > +		if (gro_tbl->tbls[i] == NULL) {
> > +			/* destroy all allocated tables */
> > +			for (j = 0; j < i; j++) {
> > +				gro_type_flag = 1 << j;
> > +				if ((param->desired_gro_types & gro_type_flag) == 0)
> > +					continue;
> > +				destroy_tbl_fn = tbl_destroy_functions[j];
> > +				if (destroy_tbl_fn)
> > +					destroy_tbl_fn(gro_tbl->tbls[j]);
> > +			}
> > +			rte_free(gro_tbl);
> > +			return NULL;
> 
> The typical way to handle this is to re-use rte_gro_tbl_destroy() as
> much as possible. This saves duplicate code.

Thanks, I will change it.
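
Something like the following minimal sketch, where 'allocated_gro_types'
is a hypothetical local variable tracking the tables created so far:

	if (gro_tbl->tbls[i] == NULL) {
		/* let rte_gro_tbl_destroy() free what was allocated */
		gro_tbl->desired_gro_types = allocated_gro_types;
		rte_gro_tbl_destroy(gro_tbl);
		return NULL;
	}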

> 
> > +		}
> > +	}
> > +	return gro_tbl;
> > +}
> > +
> > +void rte_gro_tbl_destroy(void *tbl)
> > +{
> > +	gro_tbl_destroy_fn destroy_tbl_fn;
> > +	struct gro_tbl *gro_tbl = (struct gro_tbl *)tbl;
> 
> The cast (from void *) is unnecessary and can be dropped.

Thanks, I will remove them.

> 
> ...
> > +/**
> > + * the max packets number that rte_gro_reassemble_burst can
> > + * process in each invocation.
> > + */
> > +#define RTE_GRO_MAX_BURST_ITEM_NUM 128UL
> > +
> > +/* max number of supported GRO types */
> > +#define RTE_GRO_TYPE_MAX_NUM 64
> > +#define RTE_GRO_TYPE_SUPPORT_NUM 0	/**< current supported GRO num */
> 
> The reason we need to use the comment style "/**< ... */" is that this
> is what the doc generator (doxygen) recognizes. Otherwise, your
> comment won't be displayed on the generated doc page (for example,
> http://dpdk.org/doc/api/rte__ethdev_8h.html#ade7de72f6c0f8102d01a0b3438856900).
> 
> The format, as far as I know, could be:
> 
>     /**< here is a comment */
>     #define A_MACRO		x
> 
> Or the one you did for RTE_GRO_TYPE_SUPPORT_NUM: put it at the end
> of the line.
> 
> That being said, the comments for RTE_GRO_MAX_BURST_ITEM_NUM and
> RTE_GRO_TYPE_MAX_NUM should be changed. Again, you should revisit
> other places.

Thanks, I will modify the comments style.

> 
> > +
> > +
> > +struct rte_gro_param {
> > +	uint64_t desired_gro_types;	/**< desired GRO types */
> > +	uint32_t max_packet_size;	/**< max length of merged packets */
> > +	uint16_t max_flow_num;	/**< max flow number */
> > +	uint16_t max_item_per_flow;	/**< max packet number per flow */
> > +
> > +	/* socket index where the Ethernet port connects to */
> 
> Ditto.
> 
> ...
> > +++ b/lib/librte_gro/rte_gro_version.map
> > @@ -0,0 +1,12 @@
> > +DPDK_17.08 {
> > +	global:
> > +
> > +	rte_gro_tbl_create;
> > +	rte_gro_tbl_destroy;
> > +	rte_gro_reassemble_burst;
> > +	rte_gro_reassemble;
> > +	rte_gro_timeout_flush;
> > +	rte_gro_tbl_item_num;
> 
> The undocumented convention is to list them in alphabetical order.

Thanks, I will change the order.

BRs,
Jiayu
> 
> 	--yliu

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v10 2/3] lib/gro: add TCP/IPv4 GRO support
  2017-07-04  9:03                     ` Yuanhan Liu
@ 2017-07-04 16:03                       ` Hu, Jiayu
  0 siblings, 0 replies; 141+ messages in thread
From: Hu, Jiayu @ 2017-07-04 16:03 UTC (permalink / raw)
  To: Yuanhan Liu
  Cc: dev, Ananyev, Konstantin, stephen, Tan, Jianfeng, Wu, Jingjing,
	Yao, Lei A, Bie, Tiwei

Hi Yuanhan,

> -----Original Message-----
> From: Yuanhan Liu [mailto:yliu@fridaylinux.org]
> Sent: Tuesday, July 4, 2017 5:03 PM
> To: Hu, Jiayu <jiayu.hu@intel.com>
> Cc: dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> stephen@networkplumber.org; Tan, Jianfeng <jianfeng.tan@intel.com>; Wu,
> Jingjing <jingjing.wu@intel.com>; Yao, Lei A <lei.a.yao@intel.com>; Bie,
> Tiwei <tiwei.bie@intel.com>
> Subject: Re: [PATCH v10 2/3] lib/gro: add TCP/IPv4 GRO support
> 
> Again, just some quick comments after a glimpse.
> 
> On Sat, Jul 01, 2017 at 07:08:42PM +0800, Jiayu Hu wrote:
> > +	for (i = 0; i < nb_pkts; i++) {
> > +		if (RTE_ETH_IS_IPV4_HDR(pkts[i]->packet_type) &&
> > +				(pkts[i]->packet_type & RTE_PTYPE_L4_TCP)) {
> > +			ret = gro_tcp4_reassemble(pkts[i],
> > +					&tcp_tbl,
> > +					param->max_packet_size,
> > +					current_time);
> > +			if (ret > 0)
> > +				/* merge successfully */
> > +				nb_after_gro--;
> > +			else if (ret < 0)
> > +				unprocess_pkts[unprocess_num++] =
> > +					pkts[i];
> 
> Even if it's just one statement, when the statement spans more than
> one line, including the comment, the {} should be used.
> 
> Section 1.6.2. Control Statements and Loops of:
> http://dpdk.org/doc/guides/contributing/coding_style.html
> 
> > +		} else
> > +			unprocess_pkts[unprocess_num++] =
> > +				pkts[i];
> 
> Besides, why break it into two lines, given that it can fit on one
> line under 80 chars?

Thanks, I will add {}.

> 
> > +	}
> > +
> > +	/* re-arrange GROed packets */
> > +	if (nb_after_gro < nb_pkts) {
> > +		i = gro_tcp4_tbl_timeout_flush(&tcp_tbl, 0,
> > +				pkts, nb_pkts);
> > +		if (unprocess_num > 0)
> > +			memcpy(&pkts[i], unprocess_pkts,
> > +					sizeof(struct rte_mbuf *) *
> > +					unprocess_num);
> 
> Ditto.
> 
> > +void *gro_tcp4_tbl_create(uint16_t socket_id,
> > +		uint16_t max_flow_num,
> > +		uint16_t max_item_per_flow)
> > +{
> > +	size_t size;
> > +	uint32_t entries_num;
> > +	struct gro_tcp4_tbl *tbl;
> > +
> > +	entries_num = max_flow_num * max_item_per_flow;
> > +	entries_num = entries_num > GRO_TCP4_TBL_MAX_ITEM_NUM ?
> > +		GRO_TCP4_TBL_MAX_ITEM_NUM : entries_num;
> > +
> > +	if (entries_num == 0)
> > +		return NULL;
> > +
> > +	tbl = (struct gro_tcp4_tbl *)rte_zmalloc_socket(
> > +			__func__,
> > +			sizeof(struct gro_tcp4_tbl),
> > +			RTE_CACHE_LINE_SIZE,
> > +			socket_id);
> 
> Again, the cast (from void *) is unnecessary and should be dropped.
> 
> > +	memcpy(&(tbl->keys[key_idx].key),
> > +			&key, sizeof(struct tcp4_key));
> 
> Again, I believe these two can fit on a single line.

Thanks, I will fix these issues.

BRs,
Jiayu
> 
> 	--yliu

^ permalink raw reply	[flat|nested] 141+ messages in thread

* [PATCH v11 0/3] Support TCP/IPv4 GRO in DPDK
  2017-07-01 11:08                 ` [PATCH v10 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
                                     ` (2 preceding siblings ...)
  2017-07-01 11:08                   ` [PATCH v10 3/3] app/testpmd: enable TCP/IPv4 GRO Jiayu Hu
@ 2017-07-05  4:08                   ` Jiayu Hu
  2017-07-05  4:08                     ` [PATCH v11 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
                                       ` (3 more replies)
  3 siblings, 4 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-07-05  4:08 UTC (permalink / raw)
  To: dev
  Cc: jianfeng.tan, konstantin.ananyev, yliu, stephen, jingjing.wu,
	lei.a.yao, Jiayu Hu

Generic Receive Offload (GRO) is a widely used SW-based offloading
technique to reduce per-packet processing overhead. It gains performance
by reassembling small packets into large ones. Therefore, we propose to
support GRO in DPDK.

To enable more flexibility to applications, DPDK GRO is implemented as
a user library. Applications explicitly use the GRO library to merge
small packets into large ones. DPDK GRO provides two reassembly modes:
lightweight mode and heavyweight mode. If applications want to merge
packets in a simple way and the number of packets is small, they can
select the lightweight mode API. If applications need more fine-grained
controls, they can select the heavyweight mode API.

This patchset is to support TCP/IPv4 GRO in DPDK. The first patch is to
provide a GRO API framework. The second patch is to support TCP/IPv4 GRO.
The last patch is to enable TCP/IPv4 GRO in testpmd.

We perform many iperf tests to see the performance gains from DPDK GRO.
The test environment is:
a. two 25Gbps physical ports (p0 and p1) are linked together. Assign p0
	to one networking namespace and assign p1 to DPDK;
b. enable TSO for p0. Run iperf client on p0;
c. launch testpmd with p1 and a vhost-user port, and run it in csum
	forwarding mode. Select TCP HW checksum calculation for the
	vhost-user port in the csum forwarding engine. For better
	performance, we select IPv4 and TCP HW checksum calculation for p1
	too;
d. launch a VM with one CPU core and a virtio-net port. The VM OS is
	Ubuntu 16.04, whose virtio-net driver supports GRO. Enable RX csum
	offloading and mrg_rxbuf for the VM. Iperf server runs in the VM;
e. to run iperf tests, we need to prevent the csum forwarding engine
	from forcibly changing packet MAC addresses. So in our tests, we
	comment those lines out (line701 ~ line704 in csumonly.c).

In each test, we run iperf with the following three configurations:
	- single flow and single TCP client thread 
	- multiple flows and single TCP client thread
	- single flow and parallel TCP client threads

We run above iperf tests on three scenarios:
	s1: disabling kernel GRO and enabling DPDK GRO
	s2: disabling kernel GRO and disabling DPDK GRO
	s3: enabling kernel GRO and disabling DPDK GRO
Comparing the throughput of s1 with s2, we can see the performance gains
from DPDK GRO. Comparing the throughput of s1 and s3, we can compare DPDK
GRO performance with kernel GRO performance.

Test results:
	- DPDK GRO throughput is almost 2 times the throughput with
		neither DPDK GRO nor kernel GRO;
	- DPDK GRO throughput is almost 1.2 times the throughput with
		kernel GRO.

Change log
==========
v11:
- avoid converting big-endian to little-endian when compare key
- add sent_seq and ip_id to gro_tcp4_item to accelerate packet
	reassembly
- remove max_packet_size from rte_gro_param
- add inline functions to replace duplicated code
- change external API names and structure name
	(rte_gro_tbl_xxx -> rte_gro_ctx_xxx)
- fix coding style issues and order issue in rte_gro_version.map
- change inaccurate comments
- change internal files name from rte_gro_tcp4.x to gro_tcp4.x
v10:
- add support to merge '<seq=2, seq=1>' TCP/IPv4 packets
- check if IP ID is consecutive and update IP ID for merged packets
- check SYN, FIN, PSH, RST, URG flags
- use different reassembly table structures and APIs for TCP/IPv4 GRO
	and TCP/IPv6 GRO
- change file name from 'rte_gro_tcp.x' to 'rte_gro_tcp4.x'
v9:
- avoid defining variable size structure array and memset variable size
	in rte_gro_reassemble_burst
- change internal structure name from 'te_gro_tbl' to 'gro_tbl'
- delete useless variables in rte_gro_tcp.c
v8:
- merge rte_gro_flush and rte_gro_timeout_flush together and optimize
	flushing operation
- enable rte_gro_reassemble to process N inputted packets
- add rte_gro_tbl_item_num to get packet num in the GRO table
- add 'lastseg' to struct gro_tcp_item to get last segment faster
- add operations to handle rte_malloc failure
- use mbuf->l2_len/l3_len/l4_len instead of parsing header
- remove 'is_groed' and 'is_valid' in struct gro_tcp_item
- fix bugs in gro_tcp4_reassemble
- pass start-time as a parameter to avoid frequently calling rte_rdtsc 
- modify rte_gro_tbl_create prototype
- add 'RTE_' to external macros
- remove 'goto'
- remove inappropriate 'const'
- hide internal variables
v7:
- add a macro 'GRO_MAX_BURST_ITEM_NUM' to avoid stack overflow in
	rte_gro_reassemble_burst
- change macro name (_NB to _NUM)
- add '#ifdef __cplusplus ...' in rte_gro.h
v6:
- avoid checksum validation and calculation
- enable to process IP fragmented packets
- add a command in testpmd
- update documents
- modify rte_gro_timeout_flush and rte_gro_reassemble_burst
- rename variable names
v5:
- fix some bugs
- fix coding style issues
v4:
- implement DPDK GRO as an application-used library
- introduce lightweight and heavyweight working modes to enable
	fine-grained controls to applications
- replace cuckoo hash tables with simpler table structure
v3:
- fix compilation issues.
v2:
- provide generic reassembly function;
- implement GRO as a device ability:
add APIs for devices to support GRO;
add APIs for applications to enable/disable GRO;
- update testpmd example. 

Jiayu Hu (3):
  lib: add Generic Receive Offload API framework
  lib/gro: add TCP/IPv4 GRO support
  app/testpmd: enable TCP/IPv4 GRO

 app/test-pmd/cmdline.c                      | 125 +++++++
 app/test-pmd/config.c                       |  36 ++
 app/test-pmd/csumonly.c                     |   5 +
 app/test-pmd/testpmd.c                      |   3 +
 app/test-pmd/testpmd.h                      |  11 +
 config/common_base                          |   5 +
 doc/guides/rel_notes/release_17_08.rst      |   7 +
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  34 ++
 lib/Makefile                                |   2 +
 lib/librte_gro/Makefile                     |  51 +++
 lib/librte_gro/gro_tcp4.c                   | 493 ++++++++++++++++++++++++++++
 lib/librte_gro/gro_tcp4.h                   | 206 ++++++++++++
 lib/librte_gro/rte_gro.c                    | 266 +++++++++++++++
 lib/librte_gro/rte_gro.h                    | 198 +++++++++++
 lib/librte_gro/rte_gro_version.map          |  12 +
 mk/rte.app.mk                               |   1 +
 16 files changed, 1455 insertions(+)
 create mode 100644 lib/librte_gro/Makefile
 create mode 100644 lib/librte_gro/gro_tcp4.c
 create mode 100644 lib/librte_gro/gro_tcp4.h
 create mode 100644 lib/librte_gro/rte_gro.c
 create mode 100644 lib/librte_gro/rte_gro.h
 create mode 100644 lib/librte_gro/rte_gro_version.map

-- 
2.7.4

^ permalink raw reply	[flat|nested] 141+ messages in thread

* [PATCH v11 1/3] lib: add Generic Receive Offload API framework
  2017-07-05  4:08                   ` [PATCH v11 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
@ 2017-07-05  4:08                     ` Jiayu Hu
  2017-07-07  6:55                       ` Tan, Jianfeng
  2017-07-05  4:08                     ` [PATCH v11 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
                                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 141+ messages in thread
From: Jiayu Hu @ 2017-07-05  4:08 UTC (permalink / raw)
  To: dev
  Cc: jianfeng.tan, konstantin.ananyev, yliu, stephen, jingjing.wu,
	lei.a.yao, Jiayu Hu

Generic Receive Offload (GRO) is a widely used SW-based offloading
technique to reduce per-packet processing overhead. It gains
performance by reassembling small packets into large ones. This
patchset is to support GRO in DPDK. To support GRO, this patch
implements a GRO API framework.

To enable more flexibility to applications, DPDK GRO is implemented as
a user library. Applications explicitly use the GRO library to merge
small packets into large ones. DPDK GRO provides two reassembly modes.
One is called lightweight mode, the other is called heavyweight mode.
If applications want to merge packets in a simple way and the number
of packets is relatively small, they can use the lightweight mode.
If applications need more fine-grained controls, they can choose the
heavyweight mode.

rte_gro_reassemble_burst is the main reassembly API which is used in
lightweight mode and processes N packets at a time. For applications,
performing GRO in lightweight mode is simple. They just need to invoke
rte_gro_reassemble_burst. Applications can get GROed packets as soon as
rte_gro_reassemble_burst returns.

rte_gro_reassemble is the main reassembly API which is used in
heavyweight mode and tries to merge N inputted packets with the packets
in GRO reassembly tables. For applications, performing GRO in heavyweight
mode is relatively complicated. Before performing GRO, applications need
to create a GRO context object, which keeps reassembly tables of
desired GRO types, by rte_gro_ctx_create. Then applications can use
rte_gro_reassemble to merge packets. The GROed packets are in the
reassembly tables of the GRO context object. If applications want to get
them, they need to flush them manually via the flush API.
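
A minimal usage sketch of the heavyweight mode follows. It is
illustrative only: RTE_GRO_TCP_IPV4 is the GRO type flag added by the
next patch, MAX_PKT_NUM, pkts, nb_pkts and out are assumed to exist in
the caller, and the timeout value follows the library's definition of
max_timeout_cycles:

	struct rte_gro_param param = {
		.gro_types = RTE_GRO_TCP_IPV4,
		.max_flow_num = 1024,
		.max_item_per_flow = 32,
		.max_timeout_cycles = rte_get_tsc_hz() / 100, /* ~10ms */
		.socket_id = rte_socket_id(),
	};
	void *ctx = rte_gro_ctx_create(&param);

	/* try to merge a received burst; unprocessed packets are
	 * kept in pkts[]
	 */
	uint16_t nb_unprocess = rte_gro_reassemble(pkts, nb_pkts, ctx);

	/* later, fetch the timeout packets left in the tables */
	uint16_t nb_flushed = rte_gro_timeout_flush(ctx, RTE_GRO_TCP_IPV4,
			out, MAX_PKT_NUM);

	rte_gro_ctx_destroy(ctx);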

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
---
 config/common_base                 |   5 +
 lib/Makefile                       |   2 +
 lib/librte_gro/Makefile            |  50 ++++++++++
 lib/librte_gro/rte_gro.c           | 171 ++++++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro.h           | 195 +++++++++++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro_version.map |  12 +++
 mk/rte.app.mk                      |   1 +
 7 files changed, 436 insertions(+)
 create mode 100644 lib/librte_gro/Makefile
 create mode 100644 lib/librte_gro/rte_gro.c
 create mode 100644 lib/librte_gro/rte_gro.h
 create mode 100644 lib/librte_gro/rte_gro_version.map

diff --git a/config/common_base b/config/common_base
index 660588a..19605a2 100644
--- a/config/common_base
+++ b/config/common_base
@@ -713,6 +713,11 @@ CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
 CONFIG_RTE_LIBRTE_PMD_VHOST=n
 
 #
+# Compile GRO library
+#
+CONFIG_RTE_LIBRTE_GRO=y
+
+#
 #Compile Xen domain0 support
 #
 CONFIG_RTE_LIBRTE_XEN_DOM0=n
diff --git a/lib/Makefile b/lib/Makefile
index 07e1fd0..ac1c2f6 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -106,6 +106,8 @@ DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
 DEPDIRS-librte_reorder := librte_eal librte_mempool librte_mbuf
 DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
 DEPDIRS-librte_pdump := librte_eal librte_mempool librte_mbuf librte_ether
+DIRS-$(CONFIG_RTE_LIBRTE_GRO) += librte_gro
+DEPDIRS-librte_gro := librte_eal librte_mbuf librte_ether librte_net
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
new file mode 100644
index 0000000..7e0f128
--- /dev/null
+++ b/lib/librte_gro/Makefile
@@ -0,0 +1,50 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2017 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_gro.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+
+EXPORT_MAP := rte_gro_version.map
+
+LIBABIVER := 1
+
+# source files
+SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
+
+# install this header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
new file mode 100644
index 0000000..24e5f2b
--- /dev/null
+++ b/lib/librte_gro/rte_gro.c
@@ -0,0 +1,171 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+
+#include "rte_gro.h"
+
+typedef void *(*gro_tbl_create_fn)(uint16_t socket_id,
+		uint16_t max_flow_num,
+		uint16_t max_item_per_flow);
+typedef void (*gro_tbl_destroy_fn)(void *tbl);
+typedef uint32_t (*gro_tbl_get_count_fn)(void *tbl);
+
+static gro_tbl_create_fn tbl_create_fn[RTE_GRO_TYPE_MAX_NUM];
+static gro_tbl_destroy_fn tbl_destroy_fn[RTE_GRO_TYPE_MAX_NUM];
+static gro_tbl_get_count_fn tbl_get_count_fn[RTE_GRO_TYPE_MAX_NUM];
+
+/*
+ * GRO context structure, which is used to merge packets. It keeps
+ * many reassembly tables of desired GRO types. Applications need to
+ * create GRO context objects before using rte_gro_reassemble to
+ * perform GRO.
+ */
+struct gro_ctx {
+	/* GRO types to perform */
+	uint64_t gro_types;
+	/* max TTL in nanosecond */
+	uint64_t max_timeout_cycles;
+	/* reassembly tables */
+	void *tbls[RTE_GRO_TYPE_MAX_NUM];
+};
+
+void *
+rte_gro_ctx_create(const struct rte_gro_param *param)
+{
+	struct gro_ctx *gro_ctx;
+	gro_tbl_create_fn create_tbl_fn;
+	uint64_t gro_type_flag = 0;
+	uint64_t gro_types = 0;
+	uint8_t i;
+
+	gro_ctx = rte_zmalloc_socket(__func__,
+			sizeof(struct gro_ctx),
+			RTE_CACHE_LINE_SIZE,
+			param->socket_id);
+	if (gro_ctx == NULL)
+		return NULL;
+
+	for (i = 0; i < RTE_GRO_TYPE_MAX_NUM; i++) {
+		gro_type_flag = 1 << i;
+		if ((param->gro_types & gro_type_flag) == 0)
+			continue;
+
+		create_tbl_fn = tbl_create_fn[i];
+		if (create_tbl_fn == NULL)
+			continue;
+
+		gro_ctx->tbls[i] = create_tbl_fn(param->socket_id,
+				param->max_flow_num,
+				param->max_item_per_flow);
+		if (gro_ctx->tbls[i] == NULL) {
+			/* destroy all created tables */
+			gro_ctx->gro_types = gro_types;
+			rte_gro_ctx_destroy(gro_ctx);
+			return NULL;
+		}
+		gro_types |= gro_type_flag;
+	}
+	gro_ctx->max_timeout_cycles = param->max_timeout_cycles;
+	gro_ctx->gro_types = param->gro_types;
+
+	return gro_ctx;
+}
+
+void
+rte_gro_ctx_destroy(void *ctx)
+{
+	gro_tbl_destroy_fn destroy_tbl_fn;
+	struct gro_ctx *gro_ctx = ctx;
+	uint64_t gro_type_flag;
+	uint8_t i;
+
+	if (gro_ctx == NULL)
+		return;
+	for (i = 0; i < RTE_GRO_TYPE_MAX_NUM; i++) {
+		gro_type_flag = 1 << i;
+		if ((gro_ctx->gro_types & gro_type_flag) == 0)
+			continue;
+		destroy_tbl_fn = tbl_destroy_fn[i];
+		if (destroy_tbl_fn)
+			destroy_tbl_fn(gro_ctx->tbls[i]);
+	}
+	rte_free(gro_ctx);
+}
+
+uint16_t
+rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
+		uint16_t nb_pkts,
+		const struct rte_gro_param *param __rte_unused)
+{
+	return nb_pkts;
+}
+
+uint16_t
+rte_gro_reassemble(struct rte_mbuf **pkts __rte_unused,
+		uint16_t nb_pkts,
+		void *ctx __rte_unused)
+{
+	return nb_pkts;
+}
+
+uint16_t
+rte_gro_timeout_flush(void *ctx __rte_unused,
+		uint64_t gro_types __rte_unused,
+		struct rte_mbuf **out __rte_unused,
+		uint16_t max_nb_out __rte_unused)
+{
+	return 0;
+}
+
+uint64_t
+rte_gro_get_count(void *ctx)
+{
+	struct gro_ctx *gro_ctx = ctx;
+	gro_tbl_get_count_fn get_count_fn;
+	uint64_t item_num = 0;
+	uint64_t gro_type_flag;
+	uint8_t i;
+
+	for (i = 0; i < RTE_GRO_TYPE_MAX_NUM; i++) {
+		gro_type_flag = 1 << i;
+		if ((gro_ctx->gro_types & gro_type_flag) == 0)
+			continue;
+
+		get_count_fn = tbl_get_count_fn[i];
+		if (get_count_fn == NULL)
+			continue;
+		item_num += get_count_fn(gro_ctx->tbls[i]);
+	}
+	return item_num;
+}
diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
new file mode 100644
index 0000000..54a6e82
--- /dev/null
+++ b/lib/librte_gro/rte_gro.h
@@ -0,0 +1,195 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_GRO_H_
+#define _RTE_GRO_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/** the max number of packets that rte_gro_reassemble_burst()
+ * can process in each invocation.
+ */
+#define RTE_GRO_MAX_BURST_ITEM_NUM 128U
+
+/** max number of supported GRO types */
+#define RTE_GRO_TYPE_MAX_NUM 64
+/** number of currently supported GRO types */
+#define RTE_GRO_TYPE_SUPPORT_NUM 0
+
+
+struct rte_gro_param {
+	/** desired GRO types */
+	uint64_t gro_types;
+	/** max TTL for packets in reassembly tables, measured
+	 * in nanoseconds. When using rte_gro_reassemble_burst(),
+	 * applications don't need to set this value.
+	 */
+	uint64_t max_timeout_cycles;
+	/** max flow number */
+	uint16_t max_flow_num;
+	/** max packet number per flow */
+	uint16_t max_item_per_flow;
+	/** socket index for allocating GRO related data structures,
+	 * like reassembly tables. When using rte_gro_reassemble_burst(),
+	 * applications don't need to set this value.
+	 */
+	uint16_t socket_id;
+};
+
+/**
+ * This function creates a GRO context object, which is used to merge
+ * packets in rte_gro_reassemble().
+ *
+ * @param param
+ *  applications use it to pass needed parameters to create a GRO
+ *  context object.
+ *
+ * @return
+ *  if created successfully, return a pointer to the GRO context
+ *  object. Otherwise, return NULL.
+ */
+void *rte_gro_ctx_create(const struct rte_gro_param *param);
+
+/**
+ * This function destroys a GRO context object.
+ *
+ * @param ctx
+ *  a pointer to a GRO context object.
+ */
+void rte_gro_ctx_destroy(void *ctx);
+
+/**
+ * This is one of the main reassembly APIs, which merges a burst of
+ * packets at a time. It assumes that all inputted packets have
+ * correct checksums. That is, applications should guarantee all
+ * inputted packets are correct. Besides, it doesn't re-calculate
+ * checksums for merged packets. If inputted packets are IP fragmented,
+ * this function assumes them are complete (i.e. with L4 header). After
+ * finishing processing, it returns all GROed packets to applications
+ * immediately.
+ *
+ * @param pkts
+ *  a pointer array which points to the packets to reassemble. Besides,
+ *  it keeps packet addresses for GROed packets.
+ * @param nb_pkts
+ *  the number of packets to reassemble.
+ * @param param
+ *  applications use it to tell rte_gro_reassemble_burst() what rules
+ *  are demanded.
+ *
+ * @return
+ *  the number of packets after being GROed. If no packets are merged,
+ *  the returned value is nb_pkts.
+ */
+uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts,
+		uint16_t nb_pkts,
+		const struct rte_gro_param *param);
+
+/**
+ * Reassembly function, which tries to merge inputted packets with
+ * the packets in the reassembly tables of a given GRO context. This
+ * function assumes all inputted packets have correct checksums.
+ * And it won't update checksums if two packets are merged. Besides,
+ * if inputted packets are IP fragmented, this function assumes they
+ * are complete packets (i.e. with L4 header).
+ *
+ * If the inputted packets don't have data or are with unsupported GRO
+ * types etc., they won't be processed and are returned to applications.
+ * Otherwise, the inputted packets are either merged or inserted into
+ * the table. If applications want to get packets in the table, they
+ * need to call the flush API.
+ *
+ * @param pkts
+ *  packets to reassemble. Besides, after this function finishes, it
+ *  keeps the unprocessed packets (e.g. without data or unsupported
+ *  GRO types).
+ * @param nb_pkts
+ *  the number of packets to reassemble.
+ * @param ctx
+ *  a pointer to a GRO context object.
+ *
+ * @return
+ *  return the number of unprocessed packets (e.g. without data or
+ *  unsupported GRO types). If all packets are processed (merged or
+ *  inserted into the table), return 0.
+ */
+uint16_t rte_gro_reassemble(struct rte_mbuf **pkts,
+		uint16_t nb_pkts,
+		void *ctx);
+
+/**
+ * This function flushes the timeout packets from reassembly tables of
+ * desired GRO types. The max number of flushed timeout packets is the
+ * element number of the array which is used to keep the flushed packets.
+ *
+ * Besides, this function won't re-calculate checksums for merged
+ * packets in the tables. That is, the returned packets may be with
+ * wrong checksums.
+ *
+ * @param ctx
+ *  a pointer to a GRO context object.
+ * @param gro_types
+ *  rte_gro_timeout_flush only flushes packets which belong to the
+ *  GRO types specified by gro_types.
+ * @param out
+ *  a pointer array that is used to keep flushed timeout packets.
+ * @param max_nb_out
+ *  the element number of out. It's also the max number of timeout
+ *  packets that can be flushed finally.
+ *
+ * @return
+ *  the number of flushed packets. If no packets are flushed, return 0.
+ */
+uint16_t rte_gro_timeout_flush(void *ctx,
+		uint64_t gro_types,
+		struct rte_mbuf **out,
+		uint16_t max_nb_out);
+
+/**
+ * This function returns the number of packets in all reassembly tables
+ * of a given GRO context.
+ *
+ * @param ctx
+ *  a pointer to a GRO context object.
+ *
+ * @return
+ *  the number of packets in all reassembly tables.
+ */
+uint64_t rte_gro_get_count(void *ctx);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_GRO_H_ */
diff --git a/lib/librte_gro/rte_gro_version.map b/lib/librte_gro/rte_gro_version.map
new file mode 100644
index 0000000..ca139c5
--- /dev/null
+++ b/lib/librte_gro/rte_gro_version.map
@@ -0,0 +1,12 @@
+DPDK_17.08 {
+	global:
+
+	rte_gro_ctx_create;
+	rte_gro_ctx_destroy;
+	rte_gro_get_count;
+	rte_gro_reassemble;
+	rte_gro_reassemble_burst;
+	rte_gro_timeout_flush;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 7d71a49..d76ed13 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -98,6 +98,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_RING)           += -lrte_ring
 _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
 _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER)        += -lrte_reorder
+_LDLIBS-$(CONFIG_RTE_LIBRTE_GRO)            += -lrte_gro
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 _LDLIBS-$(CONFIG_RTE_LIBRTE_KNI)            += -lrte_kni
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 141+ messages in thread

* [PATCH v11 2/3] lib/gro: add TCP/IPv4 GRO support
  2017-07-05  4:08                   ` [PATCH v11 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
  2017-07-05  4:08                     ` [PATCH v11 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
@ 2017-07-05  4:08                     ` Jiayu Hu
  2017-07-07  6:55                       ` Tan, Jianfeng
  2017-07-05  4:08                     ` [PATCH v11 3/3] app/testpmd: enable TCP/IPv4 GRO Jiayu Hu
  2017-07-07 10:39                     ` [PATCH v12 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
  3 siblings, 1 reply; 141+ messages in thread
From: Jiayu Hu @ 2017-07-05  4:08 UTC (permalink / raw)
  To: dev
  Cc: jianfeng.tan, konstantin.ananyev, yliu, stephen, jingjing.wu,
	lei.a.yao, Jiayu Hu

In this patch, we introduce five APIs to support TCP/IPv4 GRO.
- gro_tcp4_tbl_create: create a TCP/IPv4 reassembly table, which is used
    to merge packets.
- gro_tcp4_tbl_destroy: free memory space of a TCP/IPv4 reassembly table.
- gro_tcp4_tbl_timeout_flush: flush timeout packets from a TCP/IPv4
    reassembly table.
- gro_tcp4_tbl_get_count: return the number of packets in a TCP/IPv4
    reassembly table.
- gro_tcp4_reassemble: reassemble an inputted TCP/IPv4 packet.

TCP/IPv4 GRO API assumes all inputted packets have correct IPv4
and TCP checksums. And TCP/IPv4 GRO API doesn't update IPv4 and TCP
checksums for merged packets. If inputted packets are IP fragmented,
TCP/IPv4 GRO API assumes they are complete packets (i.e. with L4
headers).
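
For reference, a caller is expected to dispatch on the return value of
gro_tcp4_reassemble() roughly as follows (a sketch; tbl, pkts[i] and
the counters are assumed to exist in the caller):

	int32_t ret = gro_tcp4_reassemble(pkts[i], tbl, start_time);
	if (ret > 0)
		nb_after_gro--;	/* merged into an existing packet */
	else if (ret < 0)
		unprocess_pkts[unprocess_num++] = pkts[i]; /* not GRO-able */
	/* ret == 0: inserted into the table, flushed out later */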

In TCP/IPv4 GRO, we use a table structure, called TCP/IPv4 reassembly
table, to reassemble packets. A TCP/IPv4 reassembly table includes a key
array and an item array, where the key array keeps the criteria to merge
packets and the item array keeps packet information.

One key in the key array points to an item group, which consists of
packets which have the same criteria value. If two packets are able to
merge, they must be in the same item group. Each key in the key array
includes two parts:
- criteria: the criteria of merging packets. If two packets can be
    merged, they must have the same criteria value.
- start_index: the index of the first incoming packet of the item group.

Each element in the item array keeps the information of one packet. It
mainly includes three parts:
- firstseg: the address of the first segment of the packet
- lastseg: the address of the last segment of the packet
- next_pkt_index: the index of the next packet in the same item group.
    All packets in the same item group are chained by next_pkt_index.
    With next_pkt_index, we can locate all packets in the same item
    group one by one.

Processing an incoming packet needs three steps (a small worked example
follows this list):
a. check if the packet should be processed. Packets with one of the
    following properties won't be processed:
	- FIN, SYN, RST, URG, PSH, ECE or CWR bit is set;
	- packet payload length is 0.
b. traverse the key array to find a key which has the same criteria
    value as the incoming packet. If found, go to step c. Otherwise,
    insert a new key and insert the packet into the item array.
c. locate the first packet in the item group via the start_index in the
    key. Then traverse all packets in the item group via next_pkt_index.
    If one packet is found which can merge with the incoming one, merge
    them together. Otherwise, insert the packet into this item group.
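
As a small worked example of the neighbor check in step c: suppose an
item holds a packet with sent_seq=1000, ip_id=10, a 500-byte payload
and nb_merged=1. Then:

	/* stored item: sent_seq=1000, ip_id=10, payload=500B, nb_merged=1 */
	/* incoming: sent_seq=1500, ip_id=11        -> append (successor)    */
	/* incoming: sent_seq=500,  ip_id=9,  500B  -> pre-pend (predecessor) */
	/* any other seq/ip_id with the same key    -> new item in the group */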

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
---
 doc/guides/rel_notes/release_17_08.rst |   7 +
 lib/librte_gro/Makefile                |   1 +
 lib/librte_gro/gro_tcp4.c              | 493 +++++++++++++++++++++++++++++++++
 lib/librte_gro/gro_tcp4.h              | 206 ++++++++++++++
 lib/librte_gro/rte_gro.c               | 121 +++++++-
 lib/librte_gro/rte_gro.h               |   5 +-
 6 files changed, 819 insertions(+), 14 deletions(-)
 create mode 100644 lib/librte_gro/gro_tcp4.c
 create mode 100644 lib/librte_gro/gro_tcp4.h

diff --git a/doc/guides/rel_notes/release_17_08.rst b/doc/guides/rel_notes/release_17_08.rst
index 842f46f..f067247 100644
--- a/doc/guides/rel_notes/release_17_08.rst
+++ b/doc/guides/rel_notes/release_17_08.rst
@@ -75,6 +75,13 @@ New Features
 
   Added support for firmwares with multiple Ethernet ports per physical port.
 
+* **Add Generic Receive Offload API support.**
+
+  Generic Receive Offload (GRO) API supports reassembling TCP/IPv4
+  packets. The GRO API assumes all inputted packets have correct
+  checksums and doesn't update checksums for merged packets. If
+  inputted packets are IP fragmented, GRO API assumes they are complete
+  packets (i.e. with L4 headers).
 
 Resolved Issues
 ---------------
diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
index 7e0f128..747eeec 100644
--- a/lib/librte_gro/Makefile
+++ b/lib/librte_gro/Makefile
@@ -43,6 +43,7 @@ LIBABIVER := 1
 
 # source files
 SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
+SRCS-$(CONFIG_RTE_LIBRTE_GRO) += gro_tcp4.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
diff --git a/lib/librte_gro/gro_tcp4.c b/lib/librte_gro/gro_tcp4.c
new file mode 100644
index 0000000..703282d
--- /dev/null
+++ b/lib/librte_gro/gro_tcp4.c
@@ -0,0 +1,493 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+#include <rte_cycles.h>
+#include <rte_ethdev.h>
+#include <rte_ip.h>
+#include <rte_tcp.h>
+
+#include "gro_tcp4.h"
+
+void *
+gro_tcp4_tbl_create(uint16_t socket_id,
+		uint16_t max_flow_num,
+		uint16_t max_item_per_flow)
+{
+	struct gro_tcp4_tbl *tbl;
+	size_t size;
+	uint32_t entries_num;
+
+	entries_num = max_flow_num * max_item_per_flow;
+	entries_num = entries_num > GRO_TCP4_TBL_MAX_ITEM_NUM ?
+		GRO_TCP4_TBL_MAX_ITEM_NUM : entries_num;
+
+	if (entries_num == 0)
+		return NULL;
+
+	tbl = rte_zmalloc_socket(__func__,
+			sizeof(struct gro_tcp4_tbl),
+			RTE_CACHE_LINE_SIZE,
+			socket_id);
+	if (tbl == NULL)
+		return NULL;
+
+	size = sizeof(struct gro_tcp4_item) * entries_num;
+	tbl->items = rte_zmalloc_socket(__func__,
+			size,
+			RTE_CACHE_LINE_SIZE,
+			socket_id);
+	if (tbl->items == NULL) {
+		rte_free(tbl);
+		return NULL;
+	}
+	tbl->max_item_num = entries_num;
+
+	size = sizeof(struct gro_tcp4_key) * entries_num;
+	tbl->keys = rte_zmalloc_socket(__func__,
+			size,
+			RTE_CACHE_LINE_SIZE,
+			socket_id);
+	if (tbl->keys == NULL) {
+		rte_free(tbl->items);
+		rte_free(tbl);
+		return NULL;
+	}
+	tbl->max_key_num = entries_num;
+
+	return tbl;
+}
+
+void
+gro_tcp4_tbl_destroy(void *tbl)
+{
+	struct gro_tcp4_tbl *tcp_tbl = tbl;
+
+	if (tcp_tbl) {
+		rte_free(tcp_tbl->items);
+		rte_free(tcp_tbl->keys);
+	}
+	rte_free(tcp_tbl);
+}
+
+/*
+ * merge two TCP/IPv4 packets without updating checksums.
+ * If cmp is larger than 0, append the new packet to the
+ * original packet. Otherwise, pre-pend the new packet to
+ * the original packet.
+ */
+static inline int
+merge_two_tcp4_packets(struct gro_tcp4_item *item_src,
+		struct rte_mbuf *pkt,
+		uint16_t ip_id,
+		uint32_t sent_seq,
+		int cmp)
+{
+	struct rte_mbuf *pkt_head, *pkt_tail, *lastseg;
+	uint16_t tcp_dl1;
+
+	if (cmp > 0) {
+		pkt_head = item_src->firstseg;
+		pkt_tail = pkt;
+	} else {
+		pkt_head = pkt;
+		pkt_tail = item_src->firstseg;
+	}
+
+	/* check if the packet length will be beyond the max value */
+	tcp_dl1 = pkt_tail->pkt_len - pkt_tail->l2_len -
+		pkt_tail->l3_len - pkt_tail->l4_len;
+	if (pkt_head->pkt_len - pkt_head->l2_len + tcp_dl1 >
+			TCP4_MAX_L3_LENGTH)
+		return -1;
+
+	/* remove packet header for the tail packet */
+	rte_pktmbuf_adj(pkt_tail,
+			pkt_tail->l2_len +
+			pkt_tail->l3_len +
+			pkt_tail->l4_len);
+
+	/* chain two packets together */
+	if (cmp > 0) {
+		item_src->lastseg->next = pkt;
+		item_src->lastseg = rte_pktmbuf_lastseg(pkt);
+		/* update IP ID to the larger value */
+		item_src->ip_id = ip_id;
+	} else {
+		lastseg = rte_pktmbuf_lastseg(pkt);
+		lastseg->next = item_src->firstseg;
+		item_src->firstseg = pkt;
+		/* update sent_seq to the smaller value */
+		item_src->sent_seq = sent_seq;
+	}
+	item_src->nb_merged++;
+
+	/* update mbuf metadata for the merged packet */
+	pkt_head->nb_segs += pkt_tail->nb_segs;
+	pkt_head->pkt_len += pkt_tail->pkt_len;
+
+	return 1;
+}
+
+static inline int
+check_seq_option(struct gro_tcp4_item *item,
+		struct tcp_hdr *tcp_hdr,
+		uint16_t tcp_hl,
+		uint16_t tcp_dl,
+		uint16_t ip_id,
+		uint32_t sent_seq)
+{
+	struct rte_mbuf *pkt0 = item->firstseg;
+	struct ipv4_hdr *ipv4_hdr0;
+	struct tcp_hdr *tcp_hdr0;
+	uint16_t tcp_hl0, tcp_dl0;
+	uint16_t len;
+
+	ipv4_hdr0 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt0, char *) +
+			pkt0->l2_len);
+	tcp_hdr0 = (struct tcp_hdr *)((char *)ipv4_hdr0 + pkt0->l3_len);
+	tcp_hl0 = pkt0->l4_len;
+
+	/* check if TCP option fields equal. If not, return 0. */
+	len = RTE_MAX(tcp_hl, tcp_hl0) - sizeof(struct tcp_hdr);
+	if ((tcp_hl != tcp_hl0) ||
+			((len > 0) && (memcmp(tcp_hdr + 1,
+					tcp_hdr0 + 1,
+					len) != 0)))
+		return 0;
+
+	/* check if the two packets are neighbors */
+	tcp_dl0 = pkt0->pkt_len - pkt0->l2_len - pkt0->l3_len - tcp_hl0;
+	if ((sent_seq == (item->sent_seq + tcp_dl0)) &&
+			(ip_id == (item->ip_id + 1)))
+		/* append the new packet */
+		return 1;
+	else if (((sent_seq + tcp_dl) == item->sent_seq) &&
+			((ip_id + item->nb_merged) == item->ip_id))
+		/* pre-pend the new packet */
+		return -1;
+	else
+		return 0;
+}
+
+static inline uint32_t
+find_an_empty_item(struct gro_tcp4_tbl *tbl)
+{
+	uint32_t i;
+
+	for (i = 0; i < tbl->max_item_num; i++)
+		if (tbl->items[i].firstseg == NULL)
+			return i;
+	return INVALID_ARRAY_INDEX;
+}
+
+static inline uint32_t
+find_an_empty_key(struct gro_tcp4_tbl *tbl)
+{
+	uint32_t i;
+
+	for (i = 0; i < tbl->max_key_num; i++)
+		if (tbl->keys[i].is_valid == 0)
+			return i;
+	return INVALID_ARRAY_INDEX;
+}
+
+static inline uint32_t
+insert_new_item(struct gro_tcp4_tbl *tbl,
+		struct rte_mbuf *pkt,
+		uint16_t ip_id,
+		uint32_t sent_seq,
+		uint32_t prev_idx,
+		uint64_t start_time)
+{
+	uint32_t item_idx;
+
+	item_idx = find_an_empty_item(tbl);
+	if (item_idx == INVALID_ARRAY_INDEX)
+		return INVALID_ARRAY_INDEX;
+
+	tbl->items[item_idx].firstseg = pkt;
+	tbl->items[item_idx].lastseg = rte_pktmbuf_lastseg(pkt);
+	tbl->items[item_idx].start_time = start_time;
+	tbl->items[item_idx].next_pkt_idx = INVALID_ARRAY_INDEX;
+	tbl->items[item_idx].sent_seq = sent_seq;
+	tbl->items[item_idx].ip_id = ip_id;
+	tbl->items[item_idx].nb_merged = 1;
+	tbl->item_num++;
+
+	/* if the previous packet exists, chain the new one with it */
+	if (prev_idx != INVALID_ARRAY_INDEX)
+		tbl->items[prev_idx].next_pkt_idx = item_idx;
+
+	return item_idx;
+}
+
+static inline uint32_t
+delete_item(struct gro_tcp4_tbl *tbl, uint32_t item_idx)
+{
+	uint32_t next_idx = tbl->items[item_idx].next_pkt_idx;
+
+	/* set firstseg to NULL to indicate the item is empty */
+	tbl->items[item_idx].firstseg = NULL;
+	tbl->item_num--;
+
+	return next_idx;
+}
+
+static inline uint32_t
+insert_new_key(struct gro_tcp4_tbl *tbl,
+		struct tcp4_key *key_src,
+		uint32_t item_idx)
+{
+	struct tcp4_key *key_dst;
+	uint32_t key_idx;
+
+	key_idx = find_an_empty_key(tbl);
+	if (key_idx == INVALID_ARRAY_INDEX)
+		return INVALID_ARRAY_INDEX;
+
+	key_dst = &(tbl->keys[key_idx].key);
+
+	ether_addr_copy(&(key_src->eth_saddr), &(key_dst->eth_saddr));
+	ether_addr_copy(&(key_src->eth_daddr), &(key_dst->eth_daddr));
+	key_dst->ip_src_addr = key_src->ip_src_addr;
+	key_dst->ip_dst_addr = key_src->ip_dst_addr;
+	key_dst->recv_ack = key_src->recv_ack;
+	key_dst->src_port = key_src->src_port;
+	key_dst->dst_port = key_src->dst_port;
+
+	tbl->keys[key_idx].start_index = item_idx;
+	tbl->keys[key_idx].is_valid = 1;
+	tbl->key_num++;
+
+	return key_idx;
+}
+
+static inline int
+compare_key(struct tcp4_key k1, struct tcp4_key k2)
+{
+	uint16_t *c1, *c2;
+
+	c1 = (uint16_t *)&(k1.eth_saddr);
+	c2 = (uint16_t *)&(k2.eth_saddr);
+	if ((c1[0] != c2[0]) || (c1[1] != c2[1]) || (c1[2] != c2[2]))
+		return -1;
+	c1 = (uint16_t *)&(k1.eth_daddr);
+	c2 = (uint16_t *)&(k2.eth_daddr);
+	if ((c1[0] != c2[0]) || (c1[1] != c2[1]) || (c1[2] != c2[2]))
+		return -1;
+	if ((k1.ip_src_addr != k2.ip_src_addr) ||
+			(k1.ip_dst_addr != k2.ip_dst_addr) ||
+			(k1.recv_ack != k2.recv_ack) ||
+			(k1.src_port != k2.src_port) ||
+			(k1.dst_port != k2.dst_port))
+		return -1;
+
+	return 0;
+}
+
+/*
+ * update packet length and IP ID for the flushed packet.
+ */
+static inline void
+update_packet_header(struct gro_tcp4_item *item)
+{
+	struct ipv4_hdr *ipv4_hdr;
+	struct rte_mbuf *pkt = item->firstseg;
+
+	ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
+			pkt->l2_len);
+	ipv4_hdr->total_length = rte_cpu_to_be_16(pkt->pkt_len -
+			pkt->l2_len);
+	ipv4_hdr->packet_id = rte_cpu_to_be_16(item->ip_id);
+}
+
+int32_t
+gro_tcp4_reassemble(struct rte_mbuf *pkt,
+		struct gro_tcp4_tbl *tbl,
+		uint64_t start_time)
+{
+	struct ether_hdr *eth_hdr;
+	struct ipv4_hdr *ipv4_hdr;
+	struct tcp_hdr *tcp_hdr;
+	uint32_t sent_seq;
+	uint16_t tcp_dl, ip_id;
+
+	struct tcp4_key key;
+	uint32_t cur_idx, prev_idx, item_idx;
+	uint32_t i;
+	int cmp;
+
+	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
+	ipv4_hdr = (struct ipv4_hdr *)((char *)eth_hdr + pkt->l2_len);
+	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + pkt->l3_len);
+
+	/*
+	 * if FIN, SYN, RST, PSH, URG, ECE or CWR is set, return immediately.
+	 */
+	if (tcp_hdr->tcp_flags != TCP_ACK_FLAG)
+		return -1;
+	/* if payload length is 0, return immediately */
+	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - pkt->l3_len -
+		pkt->l4_len;
+	if (tcp_dl == 0)
+		return -1;
+
+	ip_id = rte_be_to_cpu_16(ipv4_hdr->packet_id);
+	sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq);
+
+	ether_addr_copy(&(eth_hdr->s_addr), &(key.eth_saddr));
+	ether_addr_copy(&(eth_hdr->d_addr), &(key.eth_daddr));
+	key.ip_src_addr = ipv4_hdr->src_addr;
+	key.ip_dst_addr = ipv4_hdr->dst_addr;
+	key.src_port = tcp_hdr->src_port;
+	key.dst_port = tcp_hdr->dst_port;
+	key.recv_ack = tcp_hdr->recv_ack;
+
+	/* search for a key */
+	for (i = 0; i < tbl->max_key_num; i++) {
+		if ((tbl->keys[i].is_valid == 1) &&
+				(compare_key(tbl->keys[i].key, key) == 0))
+			break;
+	}
+
+	/* can't find a key, so insert a new key and a new item. */
+	if (i == tbl->max_key_num) {
+		item_idx = insert_new_item(tbl, pkt, ip_id, sent_seq,
+				INVALID_ARRAY_INDEX, start_time);
+		if (item_idx == INVALID_ARRAY_INDEX)
+			return -1;
+		if (insert_new_key(tbl, &key, item_idx) ==
+				INVALID_ARRAY_INDEX) {
+			/* fail to insert a new key, delete the inserted item */
+			delete_item(tbl, item_idx);
+			return -1;
+		}
+		return 0;
+	}
+
+	/* traverse all packets in the item group to find one to merge */
+	cur_idx = tbl->keys[i].start_index;
+	prev_idx = cur_idx;
+	do {
+		cmp = check_seq_option(&(tbl->items[cur_idx]), tcp_hdr,
+				pkt->l4_len, tcp_dl, ip_id, sent_seq);
+		if (cmp != 0) {
+			if (merge_two_tcp4_packets(&(tbl->items[cur_idx]), pkt,
+						ip_id, sent_seq, cmp) > 0)
+				return 1;
+			/*
+			 * fail to merge two packets since the packet length
+			 * will be greater than the max value. So insert the
+			 * packet into the item group.
+			 */
+			if (insert_new_item(tbl, pkt, ip_id, sent_seq, prev_idx,
+						start_time) == INVALID_ARRAY_INDEX)
+				return -1;
+			return 0;
+		}
+		prev_idx = cur_idx;
+		cur_idx = tbl->items[cur_idx].next_pkt_idx;
+	} while (cur_idx != INVALID_ARRAY_INDEX);
+
+	/*
+	 * can't find a packet in the item group to merge,
+	 * so insert the packet into the item group.
+	 */
+	if (insert_new_item(tbl, pkt, ip_id, sent_seq, prev_idx,
+				start_time) == INVALID_ARRAY_INDEX)
+		return -1;
+
+	return 0;
+}
+
+uint16_t
+gro_tcp4_tbl_timeout_flush(struct gro_tcp4_tbl *tbl,
+		uint64_t timeout_cycles,
+		struct rte_mbuf **out,
+		uint16_t nb_out)
+{
+	uint16_t k = 0;
+	uint32_t i, j;
+	uint64_t current_time;
+
+	current_time = rte_rdtsc();
+
+	for (i = 0; i < tbl->max_key_num; i++) {
+		/* all keys have been checked, return immediately */
+		if (tbl->key_num == 0)
+			return k;
+
+		if (tbl->keys[i].is_valid == 0)
+			continue;
+
+		j = tbl->keys[i].start_index;
+		do {
+			if ((current_time - tbl->items[j].start_time) >=
+					timeout_cycles) {
+				out[k++] = tbl->items[j].firstseg;
+				update_packet_header(&(tbl->items[j]));
+				/* delete the item and get the next packet index */
+				j = delete_item(tbl, j);
+
+				/* delete the key as all packets are flushed */
+				if (j == INVALID_ARRAY_INDEX) {
+					tbl->keys[i].is_valid = 0;
+					tbl->key_num--;
+				} else
+					/* update start_index of the key */
+					tbl->keys[i].start_index = j;
+
+				if (k == nb_out)
+					return k;
+			} else
+				/*
+				 * the remaining packets of this key won't time
+				 * out, so go on to check other keys.
+				 */
+				break;
+		} while (j != INVALID_ARRAY_INDEX);
+	}
+	return k;
+}
+
+uint32_t
+gro_tcp4_tbl_get_count(void *tbl)
+{
+	struct gro_tcp4_tbl *gro_tbl = tbl;
+
+	if (gro_tbl)
+		return gro_tbl->item_num;
+
+	return 0;
+}
diff --git a/lib/librte_gro/gro_tcp4.h b/lib/librte_gro/gro_tcp4.h
new file mode 100644
index 0000000..4a57451
--- /dev/null
+++ b/lib/librte_gro/gro_tcp4.h
@@ -0,0 +1,206 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _GRO_TCP4_H_
+#define _GRO_TCP4_H_
+
+#define INVALID_ARRAY_INDEX 0xffffffffUL
+#define GRO_TCP4_TBL_MAX_ITEM_NUM (UINT32_MAX - 1)
+
+/*
+ * the max L3 length of a TCP/IPv4 packet. The L3 length
+ * is the sum of ipv4 header, tcp header and L4 payload.
+ */
+#define TCP4_MAX_L3_LENGTH UINT16_MAX
+
+/* criteria for merging packets */
+struct tcp4_key {
+	struct ether_addr eth_saddr;
+	struct ether_addr eth_daddr;
+	uint32_t ip_src_addr;
+	uint32_t ip_dst_addr;
+
+	uint32_t recv_ack;
+	uint16_t src_port;
+	uint16_t dst_port;
+};
+
+struct gro_tcp4_key {
+	struct tcp4_key key;
+	/* the index of the first packet in the item group */
+	uint32_t start_index;
+	uint8_t is_valid;
+};
+
+struct gro_tcp4_item {
+	/*
+	 * first segment of the packet. If the value
+	 * is NULL, it means the item is empty.
+	 */
+	struct rte_mbuf *firstseg;
+	/* last segment of the packet */
+	struct rte_mbuf *lastseg;
+	/*
+	 * the time when the first packet is inserted
+	 * into the table. If a packet in the table is
+	 * merged with an incoming packet, this value
+	 * won't be updated. We set this value only
+	 * when the first packet is inserted into the
+	 * table.
+	 */
+	uint64_t start_time;
+	/*
+	 * we use next_pkt_idx to chain the packets that
+	 * have the same key value but can't be merged together.
+	 */
+	uint32_t next_pkt_idx;
+	/* the sequence number of the packet */
+	uint32_t sent_seq;
+	/* the IP ID of the packet */
+	uint16_t ip_id;
+	/* the number of merged packets */
+	uint16_t nb_merged;
+};
+
+/*
+ * TCP/IPv4 reassembly table structure.
+ */
+struct gro_tcp4_tbl {
+	/* item array */
+	struct gro_tcp4_item *items;
+	/* key array */
+	struct gro_tcp4_key *keys;
+	/* current item number */
+	uint32_t item_num;
+	/* current key num */
+	uint32_t key_num;
+	/* item array size */
+	uint32_t max_item_num;
+	/* key array size */
+	uint32_t max_key_num;
+};
+
+/**
+ * This function creates a TCP/IPv4 reassembly table.
+ *
+ * @param socket_id
+ *  socket index for allocating the TCP/IPv4 reassembly table
+ * @param max_flow_num
+ *  the maximum number of flows in the TCP/IPv4 GRO table
+ * @param max_item_per_flow
+ *  the maximum packet number per flow.
+ *
+ * @return
+ *  if the table is created successfully, return a pointer to the
+ *  created TCP/IPv4 GRO table. Otherwise, return NULL.
+ */
+void *gro_tcp4_tbl_create(uint16_t socket_id,
+		uint16_t max_flow_num,
+		uint16_t max_item_per_flow);
+
+/**
+ * This function destroys a TCP/IPv4 reassembly table.
+ *
+ * @param tbl
+ *  a pointer that points to the TCP/IPv4 reassembly table.
+ */
+void gro_tcp4_tbl_destroy(void *tbl);
+
+/**
+ * This function searches for a packet in the TCP/IPv4 reassembly table
+ * to merge with the inputted one. Merging two packets means chaining
+ * them together and updating the packet headers. Packets whose SYN,
+ * FIN, RST, PSH, CWR, ECE or URG bit is set are returned immediately.
+ * Packets which only have packet headers (i.e. without data) are also
+ * returned immediately. Otherwise, the packet is either merged or
+ * inserted into the table. Besides, if there is no available space to
+ * insert the packet, this function also returns immediately.
+ *
+ * This function assumes the inputted packet has correct IPv4 and
+ * TCP checksums, and it won't re-calculate them when two packets are
+ * merged. Besides, if the inputted packet is IP fragmented, it assumes
+ * the packet is complete (i.e. with the TCP header).
+ *
+ * @param pkt
+ *  packet to reassemble.
+ * @param tbl
+ *  a pointer that points to a TCP/IPv4 reassembly table.
+ * @param start_time
+ *  the time when the packet is inserted into the table
+ *
+ * @return
+ *  if the packet doesn't have data, or the SYN, FIN, RST, PSH, CWR,
+ *  ECE or URG bit is set, or there is no available space in the table
+ *  to insert a new item or a new key, return a negative value. If the
+ *  packet is merged successfully, return a positive value. If the
+ *  packet is inserted into the table, return 0.
+ */
+int32_t gro_tcp4_reassemble(struct rte_mbuf *pkt,
+		struct gro_tcp4_tbl *tbl,
+		uint64_t start_time);
+
+/**
+ * This function flushes timeout packets from a TCP/IPv4 reassembly
+ * table to applications, without updating the checksums of merged
+ * packets.
+ * The max number of flushed timeout packets is the element number of
+ * the array which is used to keep flushed packets.
+ *
+ * @param tbl
+ *  a pointer that points to a TCP GRO table.
+ * @param timeout_cycles
+ *  the maximum time that packets can stay in the table.
+ * @param out
+ *  pointer array which is used to keep flushed packets.
+ * @param nb_out
+ *  the element number of out. It's also the max number of timeout
+ *  packets that can be flushed.
+ *
+ * @return
+ *  the number of packets that are returned.
+ */
+uint16_t gro_tcp4_tbl_timeout_flush(struct gro_tcp4_tbl *tbl,
+		uint64_t timeout_cycles,
+		struct rte_mbuf **out,
+		uint16_t nb_out);
+
+/**
+ * This function returns the number of the packets in a TCP/IPv4
+ * reassembly table.
+ *
+ * @param tbl
+ *  a pointer that points to a TCP/IPv4 reassembly table.
+ *
+ * @return
+ *  the number of packets in the table
+ */
+uint32_t gro_tcp4_tbl_get_count(void *tbl);
+#endif
diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
index 24e5f2b..7488845 100644
--- a/lib/librte_gro/rte_gro.c
+++ b/lib/librte_gro/rte_gro.c
@@ -32,8 +32,11 @@
 
 #include <rte_malloc.h>
 #include <rte_mbuf.h>
+#include <rte_cycles.h>
+#include <rte_ethdev.h>
 
 #include "rte_gro.h"
+#include "gro_tcp4.h"
 
 typedef void *(*gro_tbl_create_fn)(uint16_t socket_id,
 		uint16_t max_flow_num,
@@ -41,9 +44,12 @@ typedef void *(*gro_tbl_create_fn)(uint16_t socket_id,
 typedef void (*gro_tbl_destroy_fn)(void *tbl);
 typedef uint32_t (*gro_tbl_get_count_fn)(void *tbl);
 
-static gro_tbl_create_fn tbl_create_fn[RTE_GRO_TYPE_MAX_NUM];
-static gro_tbl_destroy_fn tbl_destroy_fn[RTE_GRO_TYPE_MAX_NUM];
-static gro_tbl_get_count_fn tbl_get_count_fn[RTE_GRO_TYPE_MAX_NUM];
+static gro_tbl_create_fn tbl_create_fn[RTE_GRO_TYPE_MAX_NUM] = {
+		gro_tcp4_tbl_create, NULL};
+static gro_tbl_destroy_fn tbl_destroy_fn[RTE_GRO_TYPE_MAX_NUM] = {
+			gro_tcp4_tbl_destroy, NULL};
+static gro_tbl_get_count_fn tbl_get_count_fn[RTE_GRO_TYPE_MAX_NUM] = {
+			gro_tcp4_tbl_get_count, NULL};
 
 /*
  * GRO context structure, which is used to merge packets. It keeps
@@ -124,27 +130,116 @@ rte_gro_ctx_destroy(void *ctx)
 }
 
 uint16_t
-rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
+rte_gro_reassemble_burst(struct rte_mbuf **pkts,
 		uint16_t nb_pkts,
-		const struct rte_gro_param *param __rte_unused)
+		const struct rte_gro_param *param)
 {
-	return nb_pkts;
+	uint16_t i;
+	uint16_t nb_after_gro = nb_pkts;
+	uint32_t item_num;
+
+	/* allocate a reassembly table for TCP/IPv4 GRO */
+	struct gro_tcp4_tbl tcp_tbl;
+	struct gro_tcp4_key tcp_keys[RTE_GRO_MAX_BURST_ITEM_NUM] = {0};
+	struct gro_tcp4_item tcp_items[RTE_GRO_MAX_BURST_ITEM_NUM] = {0};
+
+	struct rte_mbuf *unprocess_pkts[nb_pkts];
+	uint16_t unprocess_num = 0;
+	int32_t ret;
+	uint64_t current_time;
+
+	if ((param->gro_types & RTE_GRO_TCP_IPV4) == 0)
+		return nb_pkts;
+
+	/* get the actual number of items for the reassembly table */
+	item_num = RTE_MIN(nb_pkts, (param->max_flow_num *
+			param->max_item_per_flow));
+	item_num = RTE_MIN(item_num, RTE_GRO_MAX_BURST_ITEM_NUM);
+
+	tcp_tbl.keys = tcp_keys;
+	tcp_tbl.items = tcp_items;
+	tcp_tbl.key_num = 0;
+	tcp_tbl.item_num = 0;
+	tcp_tbl.max_key_num = item_num;
+	tcp_tbl.max_item_num = item_num;
+
+	current_time = rte_rdtsc();
+
+	for (i = 0; i < nb_pkts; i++) {
+		if (RTE_ETH_IS_IPV4_HDR(pkts[i]->packet_type) &&
+				(pkts[i]->packet_type & RTE_PTYPE_L4_TCP)) {
+			ret = gro_tcp4_reassemble(pkts[i],
+					&tcp_tbl,
+					current_time);
+			if (ret > 0)
+				/* merge successfully */
+				nb_after_gro--;
+			else if (ret < 0)
+				unprocess_pkts[unprocess_num++] = pkts[i];
+		} else
+			unprocess_pkts[unprocess_num++] = pkts[i];
+	}
+
+	/* re-arrange GROed packets */
+	if (nb_after_gro < nb_pkts) {
+		i = gro_tcp4_tbl_timeout_flush(&tcp_tbl, 0, pkts, nb_pkts);
+		if (unprocess_num > 0) {
+			memcpy(&pkts[i], unprocess_pkts,
+					sizeof(struct rte_mbuf *) * unprocess_num);
+		}
+	}
+
+	return nb_after_gro;
 }
 
 uint16_t
-rte_gro_reassemble(struct rte_mbuf **pkts __rte_unused,
+rte_gro_reassemble(struct rte_mbuf **pkts,
 		uint16_t nb_pkts,
-		void *ctx __rte_unused)
+		void *ctx)
 {
-	return nb_pkts;
+	uint16_t i, unprocess_num = 0;
+	struct rte_mbuf *unprocess_pkts[nb_pkts];
+	struct gro_ctx *gro_ctx = ctx;
+	uint64_t current_time;
+
+	if ((gro_ctx->gro_types & RTE_GRO_TCP_IPV4) == 0)
+		return nb_pkts;
+
+	current_time = rte_rdtsc();
+
+	for (i = 0; i < nb_pkts; i++) {
+		if (RTE_ETH_IS_IPV4_HDR(pkts[i]->packet_type) &&
+				(pkts[i]->packet_type & RTE_PTYPE_L4_TCP)) {
+			if (gro_tcp4_reassemble(pkts[i],
+						gro_ctx->tbls[RTE_GRO_TCP_IPV4_INDEX],
+						current_time) < 0)
+				unprocess_pkts[unprocess_num++] = pkts[i];
+		} else
+			unprocess_pkts[unprocess_num++] = pkts[i];
+	}
+	if (unprocess_num > 0) {
+		memcpy(pkts, unprocess_pkts,
+				sizeof(struct rte_mbuf *) * unprocess_num);
+	}
+
+	return unprocess_num;
 }
 
 uint16_t
-rte_gro_timeout_flush(void *ctx __rte_unused,
-		uint64_t gro_types __rte_unused,
-		struct rte_mbuf **out __rte_unused,
-		uint16_t max_nb_out __rte_unused)
+rte_gro_timeout_flush(void *ctx,
+		uint64_t gro_types,
+		struct rte_mbuf **out,
+		uint16_t max_nb_out)
 {
+	struct gro_ctx *gro_ctx = ctx;
+
+	gro_types = gro_types & gro_ctx->gro_types;
+	if (gro_types & RTE_GRO_TCP_IPV4) {
+		return gro_tcp4_tbl_timeout_flush(
+				gro_ctx->tbls[RTE_GRO_TCP_IPV4_INDEX],
+				gro_ctx->max_timeout_cycles,
+				out, max_nb_out);
+	}
 	return 0;
 }
 
diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
index 54a6e82..c2140e6 100644
--- a/lib/librte_gro/rte_gro.h
+++ b/lib/librte_gro/rte_gro.h
@@ -45,8 +45,11 @@ extern "C" {
 /**< max number of supported GRO types */
 #define RTE_GRO_TYPE_MAX_NUM 64
 /**< current supported GRO num */
-#define RTE_GRO_TYPE_SUPPORT_NUM 0
+#define RTE_GRO_TYPE_SUPPORT_NUM 1
 
+/**< TCP/IPv4 GRO flag */
+#define RTE_GRO_TCP_IPV4_INDEX 0
+#define RTE_GRO_TCP_IPV4 (1ULL << RTE_GRO_TCP_IPV4_INDEX)
 
 struct rte_gro_param {
 	/**< desired GRO types */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 141+ messages in thread

* [PATCH v11 3/3] app/testpmd: enable TCP/IPv4 GRO
  2017-07-05  4:08                   ` [PATCH v11 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
  2017-07-05  4:08                     ` [PATCH v11 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
  2017-07-05  4:08                     ` [PATCH v11 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
@ 2017-07-05  4:08                     ` Jiayu Hu
  2017-07-07 10:39                     ` [PATCH v12 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
  3 siblings, 0 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-07-05  4:08 UTC (permalink / raw)
  To: dev
  Cc: jianfeng.tan, konstantin.ananyev, yliu, stephen, jingjing.wu,
	lei.a.yao, Jiayu Hu

This patch enables the TCP/IPv4 GRO library in the csum forwarding
engine. By default, GRO is turned off. Users can use the command "gro
(on|off) (port_id)" to enable or disable GRO for a given port. If GRO
is enabled on a port, GRO is performed on all TCP/IPv4 packets received
from that port. Besides, users can set the max flow number and the max
packet number per flow with the command "gro set (max_flow_num)
(max_item_num_per_flow) (port_id)".
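For example, the following sequence enables GRO on port 0 with room for
at most 2 flows of 16 packets each (the flow and packet numbers are
only illustrative; "set fwd csum" and "start" are existing testpmd
commands):

	testpmd> set fwd csum
	testpmd> gro set 2 16 0
	testpmd> gro on 0
	testpmd> start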

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Jingjing Wu <jingjing.wu@intel.com>
---
 app/test-pmd/cmdline.c                      | 125 ++++++++++++++++++++++++++++
 app/test-pmd/config.c                       |  36 ++++++++
 app/test-pmd/csumonly.c                     |   5 ++
 app/test-pmd/testpmd.c                      |   3 +
 app/test-pmd/testpmd.h                      |  11 +++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  34 ++++++++
 6 files changed, 214 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 6789071..262d099 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -76,6 +76,7 @@
 #include <rte_devargs.h>
 #include <rte_eth_ctrl.h>
 #include <rte_flow.h>
+#include <rte_gro.h>
 
 #include <cmdline_rdline.h>
 #include <cmdline_parse.h>
@@ -419,6 +420,14 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"tso show (portid)"
 			"    Display the status of TCP Segmentation Offload.\n\n"
 
+			"gro (on|off) (port_id)"
+			"    Enable or disable Generic Receive Offload in csum"
+			" forwarding engine.\n\n"
+
+			"gro set (max_flow_num) (max_item_num_per_flow) (port_id)\n"
+			"    Set max flow number and max packet number per-flow"
+			" for GRO.\n\n"
+
 			"set fwd (%s)\n"
 			"    Set packet forwarding mode.\n\n"
 
@@ -3827,6 +3836,120 @@ cmdline_parse_inst_t cmd_tunnel_tso_show = {
 	},
 };
 
+/* *** SET GRO FOR A PORT *** */
+struct cmd_gro_result {
+	cmdline_fixed_string_t cmd_keyword;
+	cmdline_fixed_string_t mode;
+	uint8_t port_id;
+};
+
+static void
+cmd_enable_gro_parsed(void *parsed_result,
+		__attribute__((unused)) struct cmdline *cl,
+		__attribute__((unused)) void *data)
+{
+	struct cmd_gro_result *res;
+
+	res = parsed_result;
+	setup_gro(res->mode, res->port_id);
+}
+
+cmdline_parse_token_string_t cmd_gro_keyword =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_result,
+			cmd_keyword, "gro");
+cmdline_parse_token_string_t cmd_gro_mode =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_result,
+			mode, "on#off");
+cmdline_parse_token_num_t cmd_gro_pid =
+	TOKEN_NUM_INITIALIZER(struct cmd_gro_result,
+			port_id, UINT8);
+
+cmdline_parse_inst_t cmd_enable_gro = {
+	.f = cmd_enable_gro_parsed,
+	.data = NULL,
+	.help_str = "gro (on|off) (port_id)",
+	.tokens = {
+		(void *)&cmd_gro_keyword,
+		(void *)&cmd_gro_mode,
+		(void *)&cmd_gro_pid,
+		NULL,
+	},
+};
+
+/* *** SET MAX FLOW NUMBER AND ITEM NUM PER FLOW FOR GRO *** */
+struct cmd_gro_set_result {
+	cmdline_fixed_string_t gro;
+	cmdline_fixed_string_t mode;
+	uint16_t flow_num;
+	uint16_t item_num_per_flow;
+	uint8_t port_id;
+};
+
+static void
+cmd_gro_set_parsed(void *parsed_result,
+		       __attribute__((unused)) struct cmdline *cl,
+		       __attribute__((unused)) void *data)
+{
+	struct cmd_gro_set_result *res = parsed_result;
+
+	if (port_id_is_invalid(res->port_id, ENABLED_WARN))
+		return;
+	if (test_done == 0) {
+		printf("Before setting GRO flow_num and item_num_per_flow,"
+				" please stop forwarding first\n");
+		return;
+	}
+
+	if (!strcmp(res->mode, "set")) {
+		if (res->flow_num == 0)
+			printf("Invalid flow number. Revert to default value:"
+					" %u.\n", GRO_DEFAULT_FLOW_NUM);
+		else
+			gro_ports[res->port_id].param.max_flow_num =
+				res->flow_num;
+
+		if (res->item_num_per_flow == 0)
+			printf("Invalid item number per-flow. Revert"
+					" to default value: %u.\n",
+					GRO_DEFAULT_ITEM_NUM_PER_FLOW);
+		else
+			gro_ports[res->port_id].param.max_item_per_flow =
+				res->item_num_per_flow;
+	}
+}
+
+cmdline_parse_token_string_t cmd_gro_set_gro =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_set_result,
+				gro, "gro");
+cmdline_parse_token_string_t cmd_gro_set_mode =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_set_result,
+				mode, "set");
+cmdline_parse_token_num_t cmd_gro_set_flow_num =
+	TOKEN_NUM_INITIALIZER(struct cmd_gro_set_result,
+				flow_num, UINT16);
+cmdline_parse_token_num_t cmd_gro_set_item_num_per_flow =
+	TOKEN_NUM_INITIALIZER(struct cmd_gro_set_result,
+				item_num_per_flow, UINT16);
+cmdline_parse_token_num_t cmd_gro_set_portid =
+	TOKEN_NUM_INITIALIZER(struct cmd_gro_set_result,
+				port_id, UINT8);
+
+cmdline_parse_inst_t cmd_gro_set = {
+	.f = cmd_gro_set_parsed,
+	.data = NULL,
+	.help_str = "gro set <max_flow_num> <max_item_num_per_flow> "
+		"<port_id>: set max flow number and max packet number per-flow "
+		"for GRO",
+	.tokens = {
+		(void *)&cmd_gro_set_gro,
+		(void *)&cmd_gro_set_mode,
+		(void *)&cmd_gro_set_flow_num,
+		(void *)&cmd_gro_set_item_num_per_flow,
+		(void *)&cmd_gro_set_portid,
+		NULL,
+	},
+};
+
 /* *** ENABLE/DISABLE FLUSH ON RX STREAMS *** */
 struct cmd_set_flush_rx {
 	cmdline_fixed_string_t set;
@@ -13732,6 +13855,8 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_tso_show,
 	(cmdline_parse_inst_t *)&cmd_tunnel_tso_set,
 	(cmdline_parse_inst_t *)&cmd_tunnel_tso_show,
+	(cmdline_parse_inst_t *)&cmd_enable_gro,
+	(cmdline_parse_inst_t *)&cmd_gro_set,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_rx,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_tx,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index b0b340e..e7d0842 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -71,6 +71,7 @@
 #ifdef RTE_LIBRTE_BNXT_PMD
 #include <rte_pmd_bnxt.h>
 #endif
+#include <rte_gro.h>
 
 #include "testpmd.h"
 
@@ -2414,6 +2415,41 @@ set_tx_pkt_segments(unsigned *seg_lengths, unsigned nb_segs)
 	tx_pkt_nb_segs = (uint8_t) nb_segs;
 }
 
+void
+setup_gro(const char *mode, uint8_t port_id)
+{
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		printf("invalid port id %u\n", port_id);
+		return;
+	}
+	if (test_done == 0) {
+		printf("Before enabling/disabling GRO,"
+				" please stop forwarding first\n");
+		return;
+	}
+	if (strcmp(mode, "on") == 0) {
+		if (gro_ports[port_id].enable) {
+			printf("port %u has already enabled GRO\n", port_id);
+			return;
+		}
+		gro_ports[port_id].enable = 1;
+		gro_ports[port_id].param.gro_types = RTE_GRO_TCP_IPV4;
+
+		if (gro_ports[port_id].param.max_flow_num == 0)
+			gro_ports[port_id].param.max_flow_num =
+				GRO_DEFAULT_FLOW_NUM;
+		if (gro_ports[port_id].param.max_item_per_flow == 0)
+			gro_ports[port_id].param.max_item_per_flow =
+				GRO_DEFAULT_ITEM_NUM_PER_FLOW;
+	} else {
+		if (gro_ports[port_id].enable == 0) {
+			printf("port %u has already disabled GRO\n", port_id);
+			return;
+		}
+		gro_ports[port_id].enable = 0;
+	}
+}
+
 char*
 list_pkt_forwarding_modes(void)
 {
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 66fc9a0..178ad75 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -71,6 +71,7 @@
 #include <rte_prefetch.h>
 #include <rte_string_fns.h>
 #include <rte_flow.h>
+#include <rte_gro.h>
 #include "testpmd.h"
 
 #define IP_DEFTTL  64   /* from RFC 1340. */
@@ -658,6 +659,10 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 				 nb_pkt_per_burst);
 	if (unlikely(nb_rx == 0))
 		return;
+	if (unlikely(gro_ports[fs->rx_port].enable))
+		nb_rx = rte_gro_reassemble_burst(pkts_burst,
+				nb_rx,
+				&(gro_ports[fs->rx_port].param));
 
 #ifdef RTE_TEST_PMD_RECORD_BURST_STATS
 	fs->rx_burst_stats.pkt_burst_spread[nb_rx]++;
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 0a23d82..a8a6ac1 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -90,6 +90,7 @@
 #ifdef RTE_LIBRTE_LATENCY_STATS
 #include <rte_latencystats.h>
 #endif
+#include <rte_gro.h>
 
 #include "testpmd.h"
 
@@ -378,6 +379,8 @@ lcoreid_t bitrate_lcore_id;
 uint8_t bitrate_enabled;
 #endif
 
+struct gro_status gro_ports[RTE_MAX_ETHPORTS];
+
 /* Forward function declarations */
 static void map_port_queue_stats_mapping_registers(uint8_t pi, struct rte_port *port);
 static void check_all_ports_link_status(uint32_t port_mask);
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 364502d..377d933 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -34,6 +34,8 @@
 #ifndef _TESTPMD_H_
 #define _TESTPMD_H_
 
+#include <rte_gro.h>
+
 #define RTE_PORT_ALL            (~(portid_t)0x0)
 
 #define RTE_TEST_RX_DESC_MAX    2048
@@ -428,6 +430,14 @@ extern struct ether_addr peer_eth_addrs[RTE_MAX_ETHPORTS];
 extern uint32_t burst_tx_delay_time; /**< Burst tx delay time(us) for mac-retry. */
 extern uint32_t burst_tx_retry_num;  /**< Burst tx retry number for mac-retry. */
 
+#define GRO_DEFAULT_FLOW_NUM 4
+#define GRO_DEFAULT_ITEM_NUM_PER_FLOW DEF_PKT_BURST
+struct gro_status {
+	struct rte_gro_param param;
+	uint8_t enable;
+};
+extern struct gro_status gro_ports[RTE_MAX_ETHPORTS];
+
 static inline unsigned int
 lcore_num(void)
 {
@@ -626,6 +636,7 @@ void get_2tuple_filter(uint8_t port_id, uint16_t index);
 void get_5tuple_filter(uint8_t port_id, uint16_t index);
 int rx_queue_id_is_invalid(queueid_t rxq_id);
 int tx_queue_id_is_invalid(queueid_t txq_id);
+void setup_gro(const char *mode, uint8_t port_id);
 
 /* Functions to manage the set of filtered Multicast MAC addresses */
 void mcast_addr_add(uint8_t port_id, struct ether_addr *mc_addr);
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 2b9a1ea..528c833 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -884,6 +884,40 @@ Display the status of TCP Segmentation Offload::
 
    testpmd> tso show (port_id)
 
+gro
+~~~~~~~~
+
+Enable or disable GRO in ``csum`` forwarding engine::
+
+   testpmd> gro (on|off) (port_id)
+
+If enabled, the csum forwarding engine will perform GRO on the TCP/IPv4
+packets received from the given port.
+
+If disabled, GRO won't be performed on packets received from the given
+port. By default, GRO is disabled for all ports.
+
+.. note::
+
+   When GRO is enabled on a port, GRO is performed on the TCP/IPv4
+   packets received from that port. After GRO, the merged packets are
+   multi-segment. But the csum forwarding engine doesn't support
+   calculating the TCP checksum for multi-segment packets in SW. So
+   please select TCP HW checksum calculation for the port that GROed
+   packets are transmitted to.
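+
+   For example, ``csum set tcp hw (port_id)`` selects HW TCP checksum
+   calculation for the given TX port.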
+
+gro set
+~~~~~~~~
+
+Set max flow number and max packet number per-flow for GRO::
+
+   testpmd> gro set (max_flow_num) (max_item_num_per_flow) (port_id)
+
+The product of ``max_flow_num`` and ``max_item_num_per_flow`` is the max
+number of packets a GRO table can store.
+
+If the current packet number is greater than or equal to the max value,
+GRO will stop processing incoming packets.
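+
+For example, after ``gro set 2 16 0``, the GRO table of port 0 stores
+at most 2 * 16 = 32 packets.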
+
 mac_addr add
 ~~~~~~~~~~~~
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 141+ messages in thread

* Re: [PATCH v11 2/3] lib/gro: add TCP/IPv4 GRO support
  2017-07-05  4:08                     ` [PATCH v11 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
@ 2017-07-07  6:55                       ` Tan, Jianfeng
  0 siblings, 0 replies; 141+ messages in thread
From: Tan, Jianfeng @ 2017-07-07  6:55 UTC (permalink / raw)
  To: Jiayu Hu, dev; +Cc: konstantin.ananyev, yliu, stephen, jingjing.wu, lei.a.yao



On 7/5/2017 12:08 PM, Jiayu Hu wrote:
> In this patch, we introduce five APIs to support TCP/IPv4 GRO.
> - gro_tcp4_tbl_create: create a TCP/IPv4 reassembly table, which is used
>      to merge packets.
> - gro_tcp4_tbl_destroy: free memory space of a TCP/IPv4 reassembly table.
> - gro_tcp4_tbl_timeout_flush: flush timeout packets from a TCP/IPv4
>      reassembly table.
> - gro_tcp4_tbl_get_count: return the number of packets in a TCP/IPv4
>      reassembly table.
> - gro_tcp4_reassemble: reassemble an inputted TCP/IPv4 packet.
>
> TCP/IPv4 GRO API assumes all inputted packets have correct IPv4
> and TCP checksums. And TCP/IPv4 GRO API doesn't update IPv4 and TCP
> checksums for merged packets. If inputted packets are IP fragmented,
> TCP/IPv4 GRO API assumes they are complete packets (i.e. with L4
> headers).
>
> In TCP/IPv4 GRO, we use a table structure, called TCP/IPv4 reassembly
> table, to reassemble packets. A TCP/IPv4 reassembly table includes a key
> array and an item array, where the key array keeps the criteria to merge
> packets and the item array keeps packet information.
>
> One key in the key array points to an item group, which consists of
> packets which have the same criteria value. If two packets are able to
> merge, they must be in the same item group. Each key in the key array
> includes two parts:
> - criteria: the criteria of merging packets. If two packets can be
>      merged, they must have the same criteria value.
> - start_index: the index of the first incoming packet of the item group.
>
> Each element in the item array keeps the information of one packet. It
> mainly includes three parts:
> - firstseg: the address of the first segment of the packet
> - lastseg: the address of the last segment of the packet
> - next_pkt_index: the index of the next packet in the same item group.
>      All packets in the same item group are chained by next_pkt_index.
>      With next_pkt_index, we can locate all packets in the same item
>      group one by one.
>
> Processing an incoming packet takes three steps:
> a. check if the packet should be processed. Packets with one of the
>      following properties won't be processed:
> 	- FIN, SYN, RST, URG, PSH, ECE or CWR bit is set;
> 	- packet payload length is 0.
> b. traverse the key array to find a key which has the same criteria
>      value as the incoming packet. If found, go to step c. Otherwise,
>      insert a new key and insert the packet into the item array.
> c. locate the first packet in the item group via the start_index in the
>      key. Then traverse all packets in the item group via next_pkt_index.
>      If one packet that can merge with the incoming one is found, merge
>      them together. Otherwise, insert the packet into this item group.
>
> Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
> ---
>   doc/guides/rel_notes/release_17_08.rst |   7 +
>   lib/librte_gro/Makefile                |   1 +
>   lib/librte_gro/gro_tcp4.c              | 493 +++++++++++++++++++++++++++++++++
>   lib/librte_gro/gro_tcp4.h              | 206 ++++++++++++++
>   lib/librte_gro/rte_gro.c               | 121 +++++++-
>   lib/librte_gro/rte_gro.h               |   5 +-
>   6 files changed, 819 insertions(+), 14 deletions(-)
>   create mode 100644 lib/librte_gro/gro_tcp4.c
>   create mode 100644 lib/librte_gro/gro_tcp4.h
>
> diff --git a/doc/guides/rel_notes/release_17_08.rst b/doc/guides/rel_notes/release_17_08.rst
> index 842f46f..f067247 100644
> --- a/doc/guides/rel_notes/release_17_08.rst
> +++ b/doc/guides/rel_notes/release_17_08.rst
> @@ -75,6 +75,13 @@ New Features
>   
>     Added support for firmwares with multiple Ethernet ports per physical port.
>   
> +* **Add Generic Receive Offload API support.**
> +
> +  Generic Receive Offload (GRO) API supports reassembling TCP/IPv4
> +  packets. GRO API assumes all inputted packets have correct
> +  checksums. GRO API doesn't update checksums for merged packets. If
> +  inputted packets are IP fragmented, GRO API assumes they are complete
> +  packets (i.e. with L4 headers).
>   
>   Resolved Issues
>   ---------------
> diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
> index 7e0f128..747eeec 100644
> --- a/lib/librte_gro/Makefile
> +++ b/lib/librte_gro/Makefile
> @@ -43,6 +43,7 @@ LIBABIVER := 1
>   
>   # source files
>   SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
> +SRCS-$(CONFIG_RTE_LIBRTE_GRO) += gro_tcp4.c
>   
>   # install this header file
>   SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
> diff --git a/lib/librte_gro/gro_tcp4.c b/lib/librte_gro/gro_tcp4.c
> new file mode 100644
> index 0000000..703282d
> --- /dev/null
> +++ b/lib/librte_gro/gro_tcp4.c
> @@ -0,0 +1,493 @@
> [... standard BSD license header snipped ...]
> +#include <rte_malloc.h>
> +#include <rte_mbuf.h>
> +#include <rte_cycles.h>
> +#include <rte_ethdev.h>
> +#include <rte_ip.h>
> +#include <rte_tcp.h>
> +
> +#include "gro_tcp4.h"
> +
> +void *
> +gro_tcp4_tbl_create(uint16_t socket_id,
> +		uint16_t max_flow_num,
> +		uint16_t max_item_per_flow)
> +{
> +	struct gro_tcp4_tbl *tbl;
> +	size_t size;
> +	uint32_t entries_num;
> +
> +	entries_num = max_flow_num * max_item_per_flow;
> +	entries_num = entries_num > GRO_TCP4_TBL_MAX_ITEM_NUM ?
> +		GRO_TCP4_TBL_MAX_ITEM_NUM : entries_num;

As I commented before, this check is not good;
entries_num is the product of two uint16_t values, so it can never be
greater than (UINT32_MAX - 1) and this check can never fire.
Plus, we cannot allocate a memory region as big as
sizeof(struct gro_tcp4_item) * UINT32_MAX.
If we really need a check, please make the limit smaller. Considering
each item more or less represents a flow, I think we can limit it to
1M flows for now.

(Sorry, I should have commented at the definition of
GRO_TCP4_TBL_MAX_ITEM_NUM.)
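A minimal sketch of what I mean (the 1M cap and the RTE_MIN() form are
only a suggestion, not a final value):

	/* cap the table at 1M items (2^20), i.e. roughly 1M flows */
	#define GRO_TCP4_TBL_MAX_ITEM_NUM (1U << 20)

	entries_num = max_flow_num * max_item_per_flow;
	entries_num = RTE_MIN(entries_num, GRO_TCP4_TBL_MAX_ITEM_NUM);
	if (entries_num == 0)
		return NULL;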


> +
> +	if (entries_num == 0)
> +		return NULL;
> +
> +	tbl = rte_zmalloc_socket(__func__,
> +			sizeof(struct gro_tcp4_tbl),
> +			RTE_CACHE_LINE_SIZE,
> +			socket_id);
> +	if (tbl == NULL)
> +		return NULL;
> +
> +	size = sizeof(struct gro_tcp4_item) * entries_num;
> +	tbl->items = rte_zmalloc_socket(__func__,
> +			size,
> +			RTE_CACHE_LINE_SIZE,
> +			socket_id);
> +	if (tbl->items == NULL) {
> +		rte_free(tbl);
> +		return NULL;
> +	}
> +	tbl->max_item_num = entries_num;
> +
> +	size = sizeof(struct gro_tcp4_key) * entries_num;
> +	tbl->keys = rte_zmalloc_socket(__func__,
> +			size,
> +			RTE_CACHE_LINE_SIZE,
> +			socket_id);
> +	if (tbl->keys == NULL) {
> +		rte_free(tbl->items);
> +		rte_free(tbl);
> +		return NULL;
> +	}
> +	tbl->max_key_num = entries_num;
> +
> +	return tbl;
> +}
> +
> +void
> +gro_tcp4_tbl_destroy(void *tbl)
> +{
> +	struct gro_tcp4_tbl *tcp_tbl = tbl;
> +
> +	if (tcp_tbl) {
> +		rte_free(tcp_tbl->items);
> +		rte_free(tcp_tbl->keys);
> +	}
> +	rte_free(tcp_tbl);
> +}
> +
> +/*
> + * merge two TCP/IPv4 packets without updating checksums.
> + * If cmp is larger than 0, append the new packet to the
> + * original packet. Otherwise, pre-pend the new packet to
> + * the original packet.
> + */
> +static inline int
> +merge_two_tcp4_packets(struct gro_tcp4_item *item_src,
> +		struct rte_mbuf *pkt,
> +		uint16_t ip_id,
> +		uint32_t sent_seq,
> +		int cmp)
> +{
> +	struct rte_mbuf *pkt_head, *pkt_tail, *lastseg;
> +	uint16_t tcp_dl1;

We don't have a tcp_dl2, and for readability we should not hide what
"dl" stands for; so just rename it to tcp_datalen.

> +
> +	if (cmp > 0) {
> +		pkt_head = item_src->firstseg;
> +		pkt_tail = pkt;
> +	} else {
> +		pkt_head = pkt;
> +		pkt_tail = item_src->firstseg;
> +	}
> +
> +	/* check if the packet length will be beyond the max value */
> +	tcp_dl1 = pkt_tail->pkt_len - pkt_tail->l2_len -
> +		pkt_tail->l3_len - pkt_tail->l4_len;
> +	if (pkt_head->pkt_len - pkt_head->l2_len + tcp_dl1 >
> +			TCP4_MAX_L3_LENGTH)
> +		return -1;
> +
> +	/* remove packet header for the tail packet */
> +	rte_pktmbuf_adj(pkt_tail,
> +			pkt_tail->l2_len +
> +			pkt_tail->l3_len +
> +			pkt_tail->l4_len);
> +
> +	/* chain two packets together */
> +	if (cmp > 0) {
> +		item_src->lastseg->next = pkt;
> +		item_src->lastseg = rte_pktmbuf_lastseg(pkt);
> +		/* update IP ID to the larger value */
> +		item_src->ip_id = ip_id;
> +	} else {
> +		lastseg = rte_pktmbuf_lastseg(pkt);
> +		lastseg->next = item_src->firstseg;
> +		item_src->firstseg = pkt;
> +		/* update sent_seq to the smaller value */
> +		item_src->sent_seq = sent_seq;
> +	}
> +	item_src->nb_merged++;
> +
> +	/* update mbuf metadata for the merged packet */
> +	pkt_head->nb_segs += pkt_tail->nb_segs;
> +	pkt_head->pkt_len += pkt_tail->pkt_len;
> +
> +	return 1;
> +}
> +
> +static inline int
> +check_seq_option(struct gro_tcp4_item *item,
> +		struct tcp_hdr *tcp_hdr,
> +		uint16_t tcp_hl,
> +		uint16_t tcp_dl,
> +		uint16_t ip_id,
> +		uint32_t sent_seq)
> +{
> +	struct rte_mbuf *pkt0 = item->firstseg;
> +	struct ipv4_hdr *ipv4_hdr0;
> +	struct tcp_hdr *tcp_hdr0;
> +	uint16_t tcp_hl0, tcp_dl0;
> +	uint16_t len;
> +
> +	ipv4_hdr0 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt0, char *) +
> +			pkt0->l2_len);
> +	tcp_hdr0 = (struct tcp_hdr *)((char *)ipv4_hdr0 + pkt0->l3_len);
> +	tcp_hl0 = pkt0->l4_len;
> +
> +	/* check if TCP option fields equal. If not, return 0. */
> +	len = RTE_MAX(tcp_hl, tcp_hl0) - sizeof(struct tcp_hdr);
> +	if ((tcp_hl != tcp_hl0) ||
> +			((len > 0) && (memcmp(tcp_hdr + 1,
> +					tcp_hdr0 + 1,
> +					len) != 0)))
> +		return 0;
> +
> +	/* check if the two packets are neighbors */
> +	tcp_dl0 = pkt0->pkt_len - pkt0->l2_len - pkt0->l3_len - tcp_hl0;
> +	if ((sent_seq == (item->sent_seq + tcp_dl0)) &&
> +			(ip_id == (item->ip_id + 1)))
> +		/* append the new packet */
> +		return 1;
> +	else if (((sent_seq + tcp_dl) == item->sent_seq) &&
> +			((ip_id + item->nb_merged) == item->ip_id))
> +		/* pre-pend the new packet */
> +		return -1;
> +	else
> +		return 0;
> +}
> +
> +static inline uint32_t
> +find_an_empty_item(struct gro_tcp4_tbl *tbl)
> +{
> +	uint32_t i;
> +
> +	for (i = 0; i < tbl->max_item_num; i++)
> +		if (tbl->items[i].firstseg == NULL)
> +			return i;
> +	return INVALID_ARRAY_INDEX;
> +}
> +
> +static inline uint32_t
> +find_an_empty_key(struct gro_tcp4_tbl *tbl)
> +{
> +	uint32_t i;
> +
> +	for (i = 0; i < tbl->max_key_num; i++)
> +		if (tbl->keys[i].is_valid == 0)
> +			return i;
> +	return INVALID_ARRAY_INDEX;
> +}
> +
> +static inline uint32_t
> +insert_new_item(struct gro_tcp4_tbl *tbl,
> +		struct rte_mbuf *pkt,
> +		uint16_t ip_id,
> +		uint32_t sent_seq,
> +		uint32_t prev_idx,
> +		uint64_t start_time)
> +{
> +	uint32_t item_idx;
> +
> +	item_idx = find_an_empty_item(tbl);
> +	if (item_idx == INVALID_ARRAY_INDEX)
> +		return INVALID_ARRAY_INDEX;
> +
> +	tbl->items[item_idx].firstseg = pkt;
> +	tbl->items[item_idx].lastseg = rte_pktmbuf_lastseg(pkt);
> +	tbl->items[item_idx].start_time = start_time;
> +	tbl->items[item_idx].next_pkt_idx = INVALID_ARRAY_INDEX;
> +	tbl->items[item_idx].sent_seq = sent_seq;
> +	tbl->items[item_idx].ip_id = ip_id;
> +	tbl->items[item_idx].nb_merged = 1;
> +	tbl->item_num++;
> +
> +	/* if the previous packet exists, chain the new one with it */
> +	if (prev_idx != INVALID_ARRAY_INDEX)
> +		tbl->items[prev_idx].next_pkt_idx = item_idx;
> +
> +	return item_idx;
> +}
> +
> +static inline uint32_t
> +delete_item(struct gro_tcp4_tbl *tbl, uint32_t item_idx)
> +{
> +	uint32_t next_idx = tbl->items[item_idx].next_pkt_idx;
> +
> +	/* set NULL to firstseg to indicate it's an empty item */
> +	tbl->items[item_idx].firstseg = NULL;
> +	tbl->item_num--;
> +
> +	return next_idx;
> +}
> +
> +static inline uint32_t
> +insert_new_key(struct gro_tcp4_tbl *tbl,
> +		struct tcp4_key *key_src,
> +		uint32_t item_idx)
> +{
> +	struct tcp4_key *key_dst;
> +	uint32_t key_idx;
> +
> +	key_idx = find_an_empty_key(tbl);
> +	if (key_idx == INVALID_ARRAY_INDEX)
> +		return INVALID_ARRAY_INDEX;
> +
> +	key_dst = &(tbl->keys[key_idx].key);
> +
> +	ether_addr_copy(&(key_src->eth_saddr), &(key_dst->eth_saddr));
> +	ether_addr_copy(&(key_src->eth_daddr), &(key_dst->eth_daddr));
> +	key_dst->ip_src_addr = key_src->ip_src_addr;
> +	key_dst->ip_dst_addr = key_src->ip_dst_addr;
> +	key_dst->recv_ack = key_src->recv_ack;
> +	key_dst->src_port = key_src->src_port;
> +	key_dst->dst_port = key_src->dst_port;
> +
> +	tbl->keys[key_idx].start_index = item_idx;
> +	tbl->keys[key_idx].is_valid = 1;
> +	tbl->key_num++;
> +
> +	return key_idx;
> +}
> +
> +static inline int
> +compare_key(struct tcp4_key k1, struct tcp4_key k2)
> +{
> +	uint16_t *c1, *c2;
> +
> +	c1 = (uint16_t *)&(k1.eth_saddr);
> +	c2 = (uint16_t *)&(k2.eth_saddr);
> +	if ((c1[0] != c2[0]) || (c1[1] != c2[1]) || (c1[2] != c2[2]))
> +		return -1;
> +	c1 = (uint16_t *)&(k1.eth_daddr);
> +	c2 = (uint16_t *)&(k2.eth_daddr);
> +	if ((c1[0] != c2[0]) || (c1[1] != c2[1]) || (c1[2] != c2[2]))
> +		return -1;
> +	if ((k1.ip_src_addr != k2.ip_src_addr) ||
> +			(k1.ip_dst_addr != k2.ip_dst_addr) ||
> +			(k1.recv_ack != k2.recv_ack) ||
> +			(k1.src_port != k2.src_port) ||
> +			(k1.dst_port != k2.dst_port))
> +		return -1;
> +
> +	return 0;
> +}

The above function can be written in a cleaner way:

static inline int
is_same_key(struct tcp4_key k1, struct tcp4_key k2)
{
	if (is_same_ether_addr(&k1.eth_saddr, &k2.eth_saddr) == 0)
		return 0;

	if (is_same_ether_addr(&k1.eth_daddr, &k2.eth_daddr) == 0)
		return 0;

	return ((k1.ip_src_addr == k2.ip_src_addr) &&
		(k1.ip_dst_addr == k2.ip_dst_addr) &&
		(k1.recv_ack == k2.recv_ack) &&
		(k1.src_port == k2.src_port) &&
		(k1.dst_port == k2.dst_port));
}
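
(is_same_ether_addr() is already provided by rte_ether.h, so this also
drops the open-coded 16-bit compares.)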

> +
> +/*
> + * update packet length and IP ID for the flushed packet.
> + */
> +static inline void
> +update_packet_header(struct gro_tcp4_item *item)
> +{
> +	struct ipv4_hdr *ipv4_hdr;
> +	struct rte_mbuf *pkt = item->firstseg;
> +
> +	ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
> +			pkt->l2_len);
> +	ipv4_hdr->total_length = rte_cpu_to_be_16(pkt->pkt_len -
> +			pkt->l2_len);
> +	ipv4_hdr->packet_id = rte_cpu_to_be_16(item->ip_id);
> +}
> +
> +int32_t
> +gro_tcp4_reassemble(struct rte_mbuf *pkt,
> +		struct gro_tcp4_tbl *tbl,
> +		uint64_t start_time)
> +{
> +	struct ether_hdr *eth_hdr;
> +	struct ipv4_hdr *ipv4_hdr;
> +	struct tcp_hdr *tcp_hdr;
> +	uint32_t sent_seq;
> +	uint16_t tcp_dl, ip_id;
> +
> +	struct tcp4_key key;
> +	uint32_t cur_idx, prev_idx, item_idx;
> +	uint32_t i;
> +	int cmp;
> +
> +	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
> +	ipv4_hdr = (struct ipv4_hdr *)((char *)eth_hdr + pkt->l2_len);
> +	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + pkt->l3_len);
> +
> +	/*
> +	 * if FIN, SYN, RST, PSH, URG, ECE or CWR is set, return immediately.
> +	 */
> +	if (tcp_hdr->tcp_flags != TCP_ACK_FLAG)
> +		return -1;
> +	/* if payload length is 0, return immediately */
> +	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - pkt->l3_len -
> +		pkt->l4_len;
> +	if (tcp_dl == 0)
> +		return -1;
> +
> +	ip_id = rte_be_to_cpu_16(ipv4_hdr->packet_id);
> +	sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq);
> +
> +	ether_addr_copy(&(eth_hdr->s_addr), &(key.eth_saddr));
> +	ether_addr_copy(&(eth_hdr->d_addr), &(key.eth_daddr));
> +	key.ip_src_addr = ipv4_hdr->src_addr;
> +	key.ip_dst_addr = ipv4_hdr->dst_addr;
> +	key.src_port = tcp_hdr->src_port;
> +	key.dst_port = tcp_hdr->dst_port;
> +	key.recv_ack = tcp_hdr->recv_ack;
> +
> +	/* search for a key */
> +	for (i = 0; i < tbl->max_key_num; i++) {
> +		if ((tbl->keys[i].is_valid == 1) &&
> +				(compare_key(tbl->keys[i].key, key) == 0))
> +			break;

Simplified as:
	for (i = 0; i < tbl->max_key_num; i++)
		if (tbl->keys[i].is_valid &&
				is_same_key(tbl->keys[i].key, key))
			break;

> +	}
> +
> +	/* can't find a key, so insert a new key and a new item. */
> +	if (i == tbl->max_key_num) {
> +		item_idx = insert_new_item(tbl, pkt, ip_id, sent_seq,
> +				INVALID_ARRAY_INDEX, start_time);
> +		if (item_idx == INVALID_ARRAY_INDEX)
> +			return -1;
> +		if (insert_new_key(tbl, &key, item_idx) ==
> +				INVALID_ARRAY_INDEX) {
> +			/* fail to insert a new key, delete the inserted item */
> +			delete_item(tbl, item_idx);
> +			return -1;
> +		}
> +		return 0;
> +	}
> +
> +	/* traverse all packets in the item group to find one to merge */
> +	cur_idx = tbl->keys[i].start_index;
> +	prev_idx = cur_idx;
> +	do {
> +		cmp = check_seq_option(&(tbl->items[cur_idx]), tcp_hdr,
> +				pkt->l4_len, tcp_dl, ip_id, sent_seq);
> +		if (cmp != 0) {
> +			if (merge_two_tcp4_packets(&(tbl->items[cur_idx]), pkt,
> +						ip_id, sent_seq, cmp) > 0)
> +				return 1;
> +			/*
> +			 * fail to merge two packets since the packet length
> +			 * will be greater than the max value. So insert the
> +			 * packet into the item group.
> +			 */
> +			if (insert_new_item(tbl, pkt, ip_id, sent_seq, prev_idx,
> +						start_time) == INVALID_ARRAY_INDEX)
> +				return -1;
> +			return 0;
> +		}
> +		prev_idx = cur_idx;
> +		cur_idx = tbl->items[cur_idx].next_pkt_idx;
> +	} while (cur_idx != INVALID_ARRAY_INDEX);
> +
> +	/*
> +	 * can't find a packet in the item group to merge,
> +	 * so insert the packet into the item group.
> +	 */
> +	if (insert_new_item(tbl, pkt, ip_id, sent_seq, prev_idx,
> +				start_time) == INVALID_ARRAY_INDEX)
> +		return -1;
> +
> +	return 0;
> +}
> +
> +uint16_t
> +gro_tcp4_tbl_timeout_flush(struct gro_tcp4_tbl *tbl,
> +		uint64_t timeout_cycles,
> +		struct rte_mbuf **out,
> +		uint16_t nb_out)
> +{
> +	uint16_t k = 0;
> +	uint32_t i, j;
> +	uint64_t current_time;
> +
> +	current_time = rte_rdtsc();
> +
> +	for (i = 0; i < tbl->max_key_num; i++) {
> +		/* all keys have been checked, return immediately */
> +		if (tbl->key_num == 0)
> +			return k;
> +
> +		if (tbl->keys[i].is_valid == 0)
> +			continue;
> +
> +		j = tbl->keys[i].start_index;
> +		do {
> +			if ((current_time - tbl->items[j].start_time) >=
> +					timeout_cycles) {
> +				out[k++] = tbl->items[j].firstseg;
> +				update_packet_header(&(tbl->items[j]));
> +				/* delete the item and get the next packet index */
> +				j = delete_item(tbl, j);
> +
> +				/* delete the key as all of its packets are flushed */
> +				if (j == INVALID_ARRAY_INDEX) {
> +					tbl->keys[i].is_valid = 0;
> +					tbl->key_num--;
> +				} else
> +					/* update start_index of the key */
> +					tbl->keys[i].start_index = j;
> +
> +				if (k == nb_out)
> +					return k;
> +			} else
> +				/*
> +				 * the remaining packets of this key won't time out,
> +				 * so go to check other keys.
> +				 */
> +				break;
> +		} while (j != INVALID_ARRAY_INDEX);
> +	}
> +	return k;
> +}
> +
> +uint32_t
> +gro_tcp4_tbl_get_count(void *tbl)
> +{
> +	struct gro_tcp4_tbl *gro_tbl = tbl;
> +
> +	if (gro_tbl)
> +		return gro_tbl->item_num;
> +
> +	return 0;
> +}
> diff --git a/lib/librte_gro/gro_tcp4.h b/lib/librte_gro/gro_tcp4.h
> new file mode 100644
> index 0000000..4a57451
> --- /dev/null
> +++ b/lib/librte_gro/gro_tcp4.h
> @@ -0,0 +1,206 @@
> [... standard BSD license header snipped ...]
> +#ifndef _GRO_TCP4_H_
> +#define _GRO_TCP4_H_
> +
> +#define INVALID_ARRAY_INDEX 0xffffffffUL
> +#define GRO_TCP4_TBL_MAX_ITEM_NUM (UINT32_MAX - 1)
> +
> [... rest of gro_tcp4.h snipped; identical to the version posted
> earlier in this thread ...]
> diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
> index 24e5f2b..7488845 100644
> --- a/lib/librte_gro/rte_gro.c
> +++ b/lib/librte_gro/rte_gro.c
> [...]
> +	for (i = 0; i < nb_pkts; i++) {
> +		if (RTE_ETH_IS_IPV4_HDR(pkts[i]->packet_type) &&
> +				(pkts[i]->packet_type & RTE_PTYPE_L4_TCP)) {

Keep one style to check the ptypes, either the macro or a direct bit
comparison like:

	(pkt->packet_type & (RTE_PTYPE_L3_IP | RTE_PTYPE_L4_TCP)) ==
			(RTE_PTYPE_L3_IP | RTE_PTYPE_L4_TCP)

> [... rest of patch snipped; no further comments ...]

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v11 1/3] lib: add Generic Receive Offload API framework
  2017-07-05  4:08                     ` [PATCH v11 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
@ 2017-07-07  6:55                       ` Tan, Jianfeng
  2017-07-07  9:19                         ` Tan, Jianfeng
  0 siblings, 1 reply; 141+ messages in thread
From: Tan, Jianfeng @ 2017-07-07  6:55 UTC (permalink / raw)
  To: Jiayu Hu, dev; +Cc: konstantin.ananyev, yliu, stephen, jingjing.wu, lei.a.yao



On 7/5/2017 12:08 PM, Jiayu Hu wrote:
> Generic Receive Offload (GRO) is a widely used SW-based offloading
> technique to reduce per-packet processing overhead. It gains
> performance by reassembling small packets into large ones. This
> patchset is to support GRO in DPDK. To support GRO, this patch
> implements a GRO API framework.
>
> To enable more flexibility to applications, DPDK GRO is implemented as
> a user library. Applications explicitly use the GRO library to merge
> small packets into large ones. DPDK GRO provides two reassembly modes.
> One is called lightweight mode, the other is called heavyweight mode.
> If applications want to merge packets in a simple way and the number
> of packets is relatively small, they can use the lightweight mode.
> If applications need more fine-grained controls, they can choose the
> heavyweight mode.
>
> rte_gro_reassemble_burst is the main reassembly API which is used in
> lightweight mode and processes N packets at a time. For applications,
> performing GRO in lightweight mode is simple. They just need to invoke
> rte_gro_reassemble_burst. Applications can get GROed packets as soon as
> rte_gro_reassemble_burst returns.
>
> rte_gro_reassemble is the main reassembly API which is used in
> heavyweight mode and tries to merge N inputted packets with the packets
> in GRO reassembly tables. For applications, performing GRO in heavyweight
> mode is relatively complicated. Before performing GRO, applications need
> to create a GRO context object, which keeps reassembly tables of
> desired GRO types, by rte_gro_ctx_create. Then applications can use
> rte_gro_reassemble to merge packets. The GROed packets are in the
> reassembly tables of the GRO context object. If applications want to get
> them, they need to manually flush them with the flush API.
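
To make the two modes concrete, here is a minimal usage sketch (the
burst plumbing and all numbers are illustrative, and I'm assuming
rte_gro_ctx_create() accepts the same rte_gro_param as the burst API;
unset param fields keep their zero defaults):

	uint8_t port_id = 0;
	struct rte_mbuf *pkts[32], *out[32];
	uint16_t nb_rx, nb_unproc, nb_out;
	struct rte_gro_param param = {
		.gro_types = RTE_GRO_TCP_IPV4,
		.max_flow_num = 4,
		.max_item_per_flow = 8,
	};

	/* lightweight mode: merge one burst in place */
	nb_rx = rte_eth_rx_burst(port_id, 0, pkts, 32);
	nb_rx = rte_gro_reassemble_burst(pkts, nb_rx, &param);

	/*
	 * heavyweight mode: packets stay in the context until flushed;
	 * rte_gro_reassemble() returns the packets it could not process.
	 */
	void *ctx = rte_gro_ctx_create(&param);
	nb_unproc = rte_gro_reassemble(pkts, nb_rx, ctx);
	nb_out = rte_gro_timeout_flush(ctx, RTE_GRO_TCP_IPV4, out, 32);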
>
> Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
> ---
>   config/common_base                 |   5 +
>   lib/Makefile                       |   2 +
>   lib/librte_gro/Makefile            |  50 ++++++++++
>   lib/librte_gro/rte_gro.c           | 171 ++++++++++++++++++++++++++++++++
>   lib/librte_gro/rte_gro.h           | 195 +++++++++++++++++++++++++++++++++++++
>   lib/librte_gro/rte_gro_version.map |  12 +++
>   mk/rte.app.mk                      |   1 +
>   7 files changed, 436 insertions(+)
>   create mode 100644 lib/librte_gro/Makefile
>   create mode 100644 lib/librte_gro/rte_gro.c
>   create mode 100644 lib/librte_gro/rte_gro.h
>   create mode 100644 lib/librte_gro/rte_gro_version.map
>
> diff --git a/config/common_base b/config/common_base
> index 660588a..19605a2 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -713,6 +713,11 @@ CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
>   CONFIG_RTE_LIBRTE_PMD_VHOST=n
>   
>   #
> +# Compile GRO library
> +#
> +CONFIG_RTE_LIBRTE_GRO=y
> +
> +#
>   #Compile Xen domain0 support
>   #
>   CONFIG_RTE_LIBRTE_XEN_DOM0=n
> diff --git a/lib/Makefile b/lib/Makefile
> index 07e1fd0..ac1c2f6 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -106,6 +106,8 @@ DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
>   DEPDIRS-librte_reorder := librte_eal librte_mempool librte_mbuf
>   DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
>   DEPDIRS-librte_pdump := librte_eal librte_mempool librte_mbuf librte_ether
> +DIRS-$(CONFIG_RTE_LIBRTE_GRO) += librte_gro
> +DEPDIRS-librte_gro := librte_eal librte_mbuf librte_ether librte_net
>   
>   ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
>   DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
> diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
> new file mode 100644
> index 0000000..7e0f128
> --- /dev/null
> +++ b/lib/librte_gro/Makefile
> @@ -0,0 +1,50 @@
> +#   BSD LICENSE
> +#
> +#   Copyright(c) 2017 Intel Corporation. All rights reserved.
> +#   All rights reserved.
> +#
> +#   Redistribution and use in source and binary forms, with or without
> +#   modification, are permitted provided that the following conditions
> +#   are met:
> +#
> +#     * Redistributions of source code must retain the above copyright
> +#       notice, this list of conditions and the following disclaimer.
> +#     * Redistributions in binary form must reproduce the above copyright
> +#       notice, this list of conditions and the following disclaimer in
> +#       the documentation and/or other materials provided with the
> +#       distribution.
> +#     * Neither the name of Intel Corporation nor the names of its
> +#       contributors may be used to endorse or promote products derived
> +#       from this software without specific prior written permission.
> +#
> +#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> +#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> +#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> +#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> +#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> +#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> +#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> +#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> +#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> +#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> +#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> +
> +include $(RTE_SDK)/mk/rte.vars.mk
> +
> +# library name
> +LIB = librte_gro.a
> +
> +CFLAGS += -O3
> +CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
> +
> +EXPORT_MAP := rte_gro_version.map
> +
> +LIBABIVER := 1
> +
> +# source files
> +SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
> +
> +# install this header file
> +SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
> +
> +include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
> new file mode 100644
> index 0000000..24e5f2b
> --- /dev/null
> +++ b/lib/librte_gro/rte_gro.c
> @@ -0,0 +1,171 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#include <rte_malloc.h>
> +#include <rte_mbuf.h>
> +
> +#include "rte_gro.h"
> +
> +typedef void *(*gro_tbl_create_fn)(uint16_t socket_id,
> +		uint16_t max_flow_num,
> +		uint16_t max_item_per_flow);
> +typedef void (*gro_tbl_destroy_fn)(void *tbl);
> +typedef uint32_t (*gro_tbl_get_count_fn)(void *tbl);
> +
> +static gro_tbl_create_fn tbl_create_fn[RTE_GRO_TYPE_MAX_NUM];
> +static gro_tbl_destroy_fn tbl_destroy_fn[RTE_GRO_TYPE_MAX_NUM];
> +static gro_tbl_get_count_fn tbl_get_count_fn[RTE_GRO_TYPE_MAX_NUM];
> +
> +/*
> + * GRO context structure, which is used to merge packets. It keeps
> + * many reassembly tables of desired GRO types. Applications need to
> + * create GRO context objects before using rte_gro_reassemble to
> + * perform GRO.
> + */
> +struct gro_ctx {
> +	/* GRO types to perform */
> +	uint64_t gro_types;
> +	/* max TTL in nanosecond */
> +	uint64_t max_timeout_cycles;

There is no need to keep this field in gro_ctx. It can be specified as a
parameter when calling flush().
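
For example (a sketch only; the exact parameter name is up to you), the
flush prototype could become:

	uint16_t rte_gro_timeout_flush(void *ctx,
			uint64_t timeout_cycles,
			uint64_t gro_types,
			struct rte_mbuf **out,
			uint16_t max_nb_out);

so that each caller chooses its own timeout instead of fixing it at
context-creation time.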

> +	/* reassembly tables */
> +	void *tbls[RTE_GRO_TYPE_MAX_NUM];
> +};
> +
> +void *
> +rte_gro_ctx_create(const struct rte_gro_param *param)
> +{
> +	struct gro_ctx *gro_ctx;
> +	gro_tbl_create_fn create_tbl_fn;
> +	uint64_t gro_type_flag = 0;
> +	uint64_t gro_types = 0;
> +	uint8_t i;
> +
> +	gro_ctx = rte_zmalloc_socket(__func__,
> +			sizeof(struct gro_ctx),
> +			RTE_CACHE_LINE_SIZE,
> +			param->socket_id);
> +	if (gro_ctx == NULL)
> +		return NULL;
> +
> +	for (i = 0; i < RTE_GRO_TYPE_MAX_NUM; i++) {
> +		gro_type_flag = 1 << i;
> +		if ((param->gro_types & gro_type_flag) == 0)
> +			continue;
> +
> +		create_tbl_fn = tbl_create_fn[i];
> +		if (create_tbl_fn == NULL)
> +			continue;
> +
> +		gro_ctx->tbls[i] = create_tbl_fn(param->socket_id,
> +				param->max_flow_num,
> +				param->max_item_per_flow);
> +		if (gro_ctx->tbls[i] == NULL) {
> +			/* destroy all created tables */
> +			gro_ctx->gro_types = gro_types;
> +			rte_gro_ctx_destroy(gro_ctx);
> +			return NULL;
> +		}
> +		gro_types |= gro_type_flag;
> +	}
> +	gro_ctx->max_timeout_cycles = param->max_timeout_cycles;
> +	gro_ctx->gro_types = param->gro_types;
> +
> +	return gro_ctx;
> +}
> +
> +void
> +rte_gro_ctx_destroy(void *ctx)
> +{
> +	gro_tbl_destroy_fn destroy_tbl_fn;
> +	struct gro_ctx *gro_ctx = ctx;
> +	uint64_t gro_type_flag;
> +	uint8_t i;
> +
> +	if (gro_ctx == NULL)
> +		return;
> +	for (i = 0; i < RTE_GRO_TYPE_MAX_NUM; i++) {
> +		gro_type_flag = 1 << i;
> +		if ((gro_ctx->gro_types & gro_type_flag) == 0)
> +			continue;
> +		destroy_tbl_fn = tbl_destroy_fn[i];
> +		if (destroy_tbl_fn)
> +			destroy_tbl_fn(gro_ctx->tbls[i]);
> +	}
> +	rte_free(gro_ctx);
> +}
> +
> +uint16_t
> +rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
> +		uint16_t nb_pkts,
> +		const struct rte_gro_param *param __rte_unused)
> +{
> +	return nb_pkts;
> +}
> +
> +uint16_t
> +rte_gro_reassemble(struct rte_mbuf **pkts __rte_unused,
> +		uint16_t nb_pkts,
> +		void *ctx __rte_unused)
> +{
> +	return nb_pkts;
> +}
> +
> +uint16_t
> +rte_gro_timeout_flush(void *ctx __rte_unused,
> +		uint64_t gro_types __rte_unused,
> +		struct rte_mbuf **out __rte_unused,
> +		uint16_t max_nb_out __rte_unused)
> +{
> +	return 0;
> +}
> +
> +uint64_t
> +rte_gro_get_count(void *ctx)

rte_gro_get_pkts_count() sounds better? I'm a bad namer.

> +{
> +	struct gro_ctx *gro_ctx = ctx;
> +	gro_tbl_get_count_fn get_count_fn;
> +	uint64_t item_num = 0;
> +	uint64_t gro_type_flag;
> +	uint8_t i;
> +
> +	for (i = 0; i < RTE_GRO_TYPE_MAX_NUM; i++) {
> +		gro_type_flag = 1 << i;
> +		if ((gro_ctx->gro_types & gro_type_flag) == 0)
> +			continue;
> +
> +		get_count_fn = tbl_get_count_fn[i];
> +		if (get_count_fn == NULL)
> +			continue;
> +		item_num += get_count_fn(gro_ctx->tbls[i]);
> +	}
> +	return item_num;
> +}
> diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
> new file mode 100644
> index 0000000..54a6e82
> --- /dev/null
> +++ b/lib/librte_gro/rte_gro.h
> @@ -0,0 +1,195 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef _RTE_GRO_H_
> +#define _RTE_GRO_H_
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +/**< the max packets number that rte_gro_reassemble_burst()
> + * can process in each invocation.
> + */
> +#define RTE_GRO_MAX_BURST_ITEM_NUM 128U
> +
> +/**< max number of supported GRO types */
> +#define RTE_GRO_TYPE_MAX_NUM 64
> +/**< current supported GRO num */
> +#define RTE_GRO_TYPE_SUPPORT_NUM 0
> +
> +
> +struct rte_gro_param {
> +	/**< desired GRO types */
> +	uint64_t gro_types;
> +	/**< max TTL for packets in reassembly tables, measured
> +	 * in nanosecond. When use rte_gro_reassemble_burst(),
> +	 * applications don't need to set this value.
> +	 */
> +	uint64_t max_timeout_cycles;
> +	/**< max flow number */
> +	uint16_t max_flow_num;
> +	/**< max packet number per flow */
> +	uint16_t max_item_per_flow;
> +	/**< socket index for allocating GRO related data structures,
> +	 * like reassembly tables. When use rte_gro_reassemble_burst(),
> +	 * applications don't need to set this value.
> +	 */
> +	uint16_t socket_id;
> +};
> +
> +/**
> + * This function creates a GRO context object, which is used to merge
> + * packets in rte_gro_reassemble().
> + *
> + * @param param
> + *  applications use it to pass needed parameters to create a GRO
> + *  context object.
> + *
> + * @return
> + *  if created successfully, return a pointer to the GRO
> + *  context object. Otherwise, return NULL.
> + */
> +void *rte_gro_ctx_create(const struct rte_gro_param *param);
> +
> +/**
> + * This function destroys a GRO context object.
> + *
> + * @param ctx
> + *  pointer points to a GRO context object.
> + */
> +void rte_gro_ctx_destroy(void *ctx);
> +
> +/**
> + * This is one of the main reassembly APIs, which merges numbers of
> + * packets at a time. It assumes that all inputted packets are with
> + * correct checksums. That is, applications should guarantee all
> + * inputted packets are correct. Besides, it doesn't re-calculate
> + * checksums for merged packets. If inputted packets are IP fragmented,
> + * this function assumes they are complete (i.e. with L4 header). After
> + * finishing processing, it returns all GROed packets to applications
> + * immediately.
> + *
> + * @param pkts
> + *  a pointer array which points to the packets to reassemble. Besides,
> + *  it keeps packet addresses for GROed packets.
> + * @param nb_pkts
> + *  the number of packets to reassemble.
> + * @param param
> + *  applications use it to tell rte_gro_reassemble_burst() what rules
> + *  are demanded.
> + *
> + * @return
> + *  the number of packets after being GROed. If no packets are merged,
> + *  the returned value is nb_pkts.
> + */
> +uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts,
> +		uint16_t nb_pkts,
> +		const struct rte_gro_param *param);
> +
> +/**
> + * Reassembly function, which tries to merge inputted packets with
> + * the packets in the reassembly tables of a given GRO context. This
> + * function assumes all inputted packets are with correct checksums.
> + * And it won't update checksums if two packets are merged. Besides,
> + * if inputted packets are IP fragmented, this function assumes they
> + * are complete packets (i.e. with L4 header).
> + *
> + * If the inputted packets don't have data or are with unsupported GRO
> + * types etc., they won't be processed and are returned to applications.
> + * Otherwise, the inputted packets are either merged or inserted into
> + * the table. If applications want to get packets in the table, they need
> + * to call flush API.
> + *
> + * @param pkts
> + *  packet to reassemble. Besides, after this function finishes, it
> + *  keeps the unprocessed packets (e.g. without data or unsupported
> + *  GRO types).
> + * @param nb_pkts
> + *  the number of packets to reassemble.
> + * @param ctx
> + *  a pointer points to a GRO context object.
> + *
> + * @return
> + *  return the number of unprocessed packets (e.g. without data or
> + *  unsupported GRO types). If all packets are processed (merged or
> + *  inserted into the table), return 0.
> + */
> +uint16_t rte_gro_reassemble(struct rte_mbuf **pkts,
> +		uint16_t nb_pkts,
> +		void *ctx);
> +
> +/**
> + * This function flushes the timeout packets from reassembly tables of
> + * desired GRO types. The max number of flushed timeout packets is the
> + * element number of the array which is used to keep the flushed packets.
> + *
> + * Besides, this function won't re-calculate checksums for merged
> + * packets in the tables. That is, the returned packets may be with
> + * wrong checksums.
> + *
> + * @param ctx
> + *  a pointer points to a GRO context object.
> + * @param gro_types
> + * rte_gro_timeout_flush only flushes packets which belong to the
> + * GRO types specified by gro_types.
> + * @param out
> + *  a pointer array that is used to keep flushed timeout packets.
> + * @param nb_out
> + *  the element number of out. It's also the max number of timeout
> + *  packets that can be flushed finally.
> + *
> + * @return
> + *  the number of flushed packets. If no packets are flushed, return 0.
> + */
> +uint16_t rte_gro_timeout_flush(void *ctx,
> +		uint64_t gro_types,
> +		struct rte_mbuf **out,
> +		uint16_t max_nb_out);
> +
> +/**
> + * This function returns the number of packets in all reassembly tables
> + * of a given GRO context.
> + *
> + * @param ctx
> + *  pointer points to a GRO context object.
> + *
> + * @return
> + *  the number of packets in all reassembly tables.
> + */
> +uint64_t rte_gro_get_count(void *ctx);
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_GRO_H_ */
> diff --git a/lib/librte_gro/rte_gro_version.map b/lib/librte_gro/rte_gro_version.map
> new file mode 100644
> index 0000000..ca139c5
> --- /dev/null
> +++ b/lib/librte_gro/rte_gro_version.map
> @@ -0,0 +1,12 @@
> +DPDK_17.08 {
> +	global:
> +
> +	rte_gro_ctrl_create;
> +	rte_gro_ctrl_destroy;
> +	rte_gro_get_count;
> +	rte_gro_reassemble_burst;
> +	rte_gro_reassemble;
> +	rte_gro_timeout_flush;
> +
> +	local: *;
> +};
> diff --git a/mk/rte.app.mk b/mk/rte.app.mk
> index 7d71a49..d76ed13 100644
> --- a/mk/rte.app.mk
> +++ b/mk/rte.app.mk
> @@ -98,6 +98,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_RING)           += -lrte_ring
>   _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
>   _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
>   _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER)        += -lrte_reorder
> +_LDLIBS-$(CONFIG_RTE_LIBRTE_GRO)        	+= -lrte_gro
>   
>   ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
>   _LDLIBS-$(CONFIG_RTE_LIBRTE_KNI)            += -lrte_kni

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v11 1/3] lib: add Generic Receive Offload API framework
  2017-07-07  6:55                       ` Tan, Jianfeng
@ 2017-07-07  9:19                         ` Tan, Jianfeng
  0 siblings, 0 replies; 141+ messages in thread
From: Tan, Jianfeng @ 2017-07-07  9:19 UTC (permalink / raw)
  To: Tan, Jianfeng, Hu, Jiayu, dev
  Cc: Ananyev, Konstantin, yliu, stephen, Wu, Jingjing, Yao, Lei A



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Tan, Jianfeng
> Sent: Friday, July 7, 2017 2:56 PM
> To: Hu, Jiayu; dev@dpdk.org
> Cc: Ananyev, Konstantin; yliu@fridaylinux.org;
> stephen@networkplumber.org; Wu, Jingjing; Yao, Lei A
> Subject: Re: [dpdk-dev] [PATCH v11 1/3] lib: add Generic Receive Offload API
> framework
> 
> 
> 
> On 7/5/2017 12:08 PM, Jiayu Hu wrote:
> > Generic Receive Offload (GRO) is a widely used SW-based offloading
> > technique to reduce per-packet processing overhead. It gains
> > performance by reassembling small packets into large ones. This
> > patchset is to support GRO in DPDK. To support GRO, this patch
> > implements a GRO API framework.
> >
> > To enable more flexibility to applications, DPDK GRO is implemented as
> > a user library. Applications explicitly use the GRO library to merge
> > small packets into large ones. DPDK GRO provides two reassembly modes.
> > One is called lightweight mode, the other is called heavyweight mode.
> > If applications want to merge packets in a simple way and the number
> > of packets is relatively small, they can use the lightweight mode.
> > If applications need more fine-grained controls, they can choose the
> > heavyweight mode.
> >
> > rte_gro_reassemble_burst is the main reassembly API which is used in
> > lightweight mode and processes N packets at a time. For applications,
> > performing GRO in lightweight mode is simple. They just need to invoke
> > rte_gro_reassemble_burst. Applications can get GROed packets as soon as
> > rte_gro_reassemble_burst returns.
> >
> > rte_gro_reassemble is the main reassembly API which is used in
> > heavyweight mode and tries to merge N inputted packets with the packets
> > in GRO reassembly tables. For applications, performing GRO in
> heavyweight
> > mode is relatively complicated. Before performing GRO, applications need
> > to create a GRO context object, which keeps reassembly tables of
> > desired GRO types, by rte_gro_ctrl_create. Then applications can use
> > rte_gro_reassemble to merge packets. The GROed packets are in the
> > reassembly tables of the GRO context object. If applications want to get
> > them, applications need to manually flush them by flush API.
> >
> > Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
> > ---
> >   config/common_base                 |   5 +
> >   lib/Makefile                       |   2 +
> >   lib/librte_gro/Makefile            |  50 ++++++++++
> >   lib/librte_gro/rte_gro.c           | 171
> ++++++++++++++++++++++++++++++++
> >   lib/librte_gro/rte_gro.h           | 195
> +++++++++++++++++++++++++++++++++++++
> >   lib/librte_gro/rte_gro_version.map |  12 +++
> >   mk/rte.app.mk                      |   1 +
> >   7 files changed, 436 insertions(+)
> >   create mode 100644 lib/librte_gro/Makefile
> >   create mode 100644 lib/librte_gro/rte_gro.c
> >   create mode 100644 lib/librte_gro/rte_gro.h
> >   create mode 100644 lib/librte_gro/rte_gro_version.map
> >
> > diff --git a/config/common_base b/config/common_base
> > index 660588a..19605a2 100644
> > --- a/config/common_base
> > +++ b/config/common_base
> > @@ -713,6 +713,11 @@ CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
> >   CONFIG_RTE_LIBRTE_PMD_VHOST=n
> >
> >   #
> > +# Compile GRO library

Maybe we can add an experimental mark for this library, as the heavyweight mode APIs are still not tested. You can do so by adding the line below:
# EXPERIMENTAL: API may change without prior notice

> > +#
> > +CONFIG_RTE_LIBRTE_GRO=y
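
For clarity, the resulting block in config/common_base would then look
like this (the comment wording is only a suggestion):

	#
	# Compile GRO library
	# EXPERIMENTAL: API may change without prior notice
	#
	CONFIG_RTE_LIBRTE_GRO=y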

^ permalink raw reply	[flat|nested] 141+ messages in thread

* [PATCH v12 0/3] Support TCP/IPv4 GRO in DPDK
  2017-07-05  4:08                   ` [PATCH v11 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
                                       ` (2 preceding siblings ...)
  2017-07-05  4:08                     ` [PATCH v11 3/3] app/testpmd: enable TCP/IPv4 GRO Jiayu Hu
@ 2017-07-07 10:39                     ` Jiayu Hu
  2017-07-07 10:39                       ` [PATCH v12 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
                                         ` (3 more replies)
  3 siblings, 4 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-07-07 10:39 UTC (permalink / raw)
  To: dev
  Cc: jianfeng.tan, konstantin.ananyev, yliu, stephen, jingjing.wu,
	lei.a.yao, Jiayu Hu

Generic Receive Offload (GRO) is a widely used SW-based offloading
technique to reduce per-packet processing overhead. It gains performance
by reassembling small packets into large ones. Therefore, we propose to
support GRO in DPDK.

To give applications more flexibility, DPDK GRO is implemented as
a user library. Applications explicitly use the GRO library to merge
small packets into large ones. DPDK GRO provides two reassembly modes:
lightweight mode and heavyweight mode. If applications want to merge
packets in a simple way and the number of packets is small, they can
select the lightweight mode API. If applications need more fine-grained
controls, they can select the heavyweight mode API.

This patchset is to support TCP/IPv4 GRO in DPDK. The first patch is to
provide a GRO API framework. The second patch is to support TCP/IPv4 GRO.
The last patch is to enable TCP/IPv4 GRO in testpmd.

We perform many iperf tests to see the performance gains from DPDK GRO.
The test environment is:
a. two 25Gbps physical ports (p0 and p1) are linked together. Assign p0
	to one networking namespace and assign p1 to DPDK;
b. enable TSO for p0. Run iperf client on p0;
c. launch testpmd with p1 and a vhost-user port, and run it in csum
	forwarding mode. Select TCP HW checksum calculation for the
	vhost-user port in the csum forwarding engine. For better
	performance, we select IPv4 and TCP HW checksum calculation for p1
	too;
d. launch a VM with one CPU core and a virtio-net port. The VM OS is
	Ubuntu 16.04, whose virtio-net driver supports GRO. Enable RX csum
	offloading and mrg_rxbuf for the VM. The iperf server runs in the VM;
e. to run iperf tests, we need to prevent the csum forwarding engine
	from forcibly changing packet MAC addresses. So in our tests, we
	comment that code out (line 701 ~ line 704 in csumonly.c).

In each test, we run iperf with the following three configurations:
	- single flow and single TCP client thread 
	- multiple flows and single TCP client thread
	- single flow and parallel TCP client threads

We run above iperf tests on three scenarios:
	s1: disabling kernel GRO and enabling DPDK GRO
	s2: disabling kernel GRO and disabling DPDK GRO
	s3: enabling kernel GRO and disabling DPDK GRO
Comparing the throughput of s1 with s2, we can see the performance gains
from DPDK GRO. Comparing the throughput of s1 and s3, we can compare DPDK
GRO performance with kernel GRO performance.

Test results:
	- DPDK GRO throughput is almost 2 times the throughput with
		neither DPDK GRO nor kernel GRO;
	- DPDK GRO throughput is almost 1.2 times the throughput with
		kernel GRO.

Change log
==========
v12:
- remove max_timeout_cycles from struct rte_gro_param
- remove is_valid from struct gro_tcp4_key
- optimize key comparison function
- avoid updating IP ID for merged packets
- change GRO_TCP4_TBL_MAX_ITEM_NUM value to 1024*1024
- modify gro_tcp4_tbl_timeout_flush
- change rte_gro_get_count to rte_gro_get_pkt_count
- fix code alignment issue
- add EXPERIMENTAL mark to heavyweight APIs
v11:
- avoid converting big-endian to little-endian when compare key
- add sent_seq and ip_id to gro_tcp4_item to accelerate packet
	reassembly
- remove max_packet_size from rte_gro_param
- add inline functions to replace reduplicate codes
- change external API names and structure name
	(rte_gro_tbl_xxx -> rte_gro_ctx_xxx)
- fix coding style issues and order issue in rte_gro_version.map
- fix inaccurate comments
- change internal files name from rte_gro_tcp4.x to gro_tcp4.x
v10:
- add support to merge '<seq=2, seq=1>' TCP/IPv4 packets
- check if IP ID is consecutive and update IP ID for merged packets
- check SYN, FIN, PSH, RST, URG flags
- use different reassembly table structures and APIs for TCP/IPv4 GRO
	and TCP/IPv6 GRO
- change file name from 'rte_gro_tcp.x' to 'rte_gro_tcp4.x'
v9:
- avoid defining variable size structure array and memset variable size
	in rte_gro_reassemble_burst
- change internal structure name from 'te_gro_tbl' to 'gro_tbl'
- delete useless variables in rte_gro_tcp.c
v8:
- merge rte_gro_flush and rte_gro_timeout_flush together and optimize
	flushing operation
- enable rte_gro_reassemble to process N inputted packets
- add rte_gro_tbl_item_num to get packet num in the GRO table
- add 'lastseg' to struct gro_tcp_item to get last segment faster
- add operations to handle rte_malloc failure
- use mbuf->l2_len/l3_len/l4_len instead of parsing header
- remove 'is_groed' and 'is_valid' in struct gro_tcp_item
- fix bugs in gro_tcp4_reassemble
- pass start-time as a parameter to avoid frequently calling rte_rdtsc 
- modify rte_gro_tbl_create prototype
- add 'RTE_' to external macros
- remove 'goto'
- remove inappropriate 'const'
- hide internal variables
v7:
- add a macro 'GRO_MAX_BURST_ITEM_NUM' to avoid stack overflow in
	rte_gro_reassemble_burst
- change macro name (_NB to _NUM)
- add '#ifdef __cplusplus ...' in rte_gro.h
v6:
- avoid checksum validation and calculation
- enable to process IP fragmented packets
- add a command in testpmd
- update documents
- modify rte_gro_timeout_flush and rte_gro_reassemble_burst
- rename variables
v5:
- fix some bugs
- fix coding style issues
v4:
- implement DPDK GRO as an application-used library
- introduce lightweight and heavyweight working modes to enable
	fine-grained controls to applications
- replace cuckoo hash tables with simpler table structure
v3:
- fix compilation issues.
v2:
- provide generic reassembly function;
- implement GRO as a device ability:
add APIs for devices to support GRO;
add APIs for applications to enable/disable GRO;
- update testpmd example. 

Jiayu Hu (3):
  lib: add Generic Receive Offload API framework
  lib/gro: add TCP/IPv4 GRO support
  app/testpmd: enable TCP/IPv4 GRO

 app/test-pmd/cmdline.c                      | 125 +++++++
 app/test-pmd/config.c                       |  36 ++
 app/test-pmd/csumonly.c                     |   5 +
 app/test-pmd/testpmd.c                      |   3 +
 app/test-pmd/testpmd.h                      |  11 +
 config/common_base                          |   5 +
 doc/guides/rel_notes/release_17_08.rst      |   7 +
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  34 ++
 lib/Makefile                                |   2 +
 lib/librte_gro/Makefile                     |  51 +++
 lib/librte_gro/gro_tcp4.c                   | 501 ++++++++++++++++++++++++++++
 lib/librte_gro/gro_tcp4.h                   | 210 ++++++++++++
 lib/librte_gro/rte_gro.c                    | 278 +++++++++++++++
 lib/librte_gro/rte_gro.h                    | 211 ++++++++++++
 lib/librte_gro/rte_gro_version.map          |  12 +
 mk/rte.app.mk                               |   1 +
 16 files changed, 1492 insertions(+)
 create mode 100644 lib/librte_gro/Makefile
 create mode 100644 lib/librte_gro/gro_tcp4.c
 create mode 100644 lib/librte_gro/gro_tcp4.h
 create mode 100644 lib/librte_gro/rte_gro.c
 create mode 100644 lib/librte_gro/rte_gro.h
 create mode 100644 lib/librte_gro/rte_gro_version.map

-- 
2.7.4

^ permalink raw reply	[flat|nested] 141+ messages in thread

* [PATCH v12 1/3] lib: add Generic Receive Offload API framework
  2017-07-07 10:39                     ` [PATCH v12 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
@ 2017-07-07 10:39                       ` Jiayu Hu
  2017-07-08 16:37                         ` Tan, Jianfeng
  2017-07-07 10:39                       ` [PATCH v12 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
                                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 141+ messages in thread
From: Jiayu Hu @ 2017-07-07 10:39 UTC (permalink / raw)
  To: dev
  Cc: jianfeng.tan, konstantin.ananyev, yliu, stephen, jingjing.wu,
	lei.a.yao, Jiayu Hu

Generic Receive Offload (GRO) is a widely used SW-based offloading
technique to reduce per-packet processing overhead. It gains
performance by reassembling small packets into large ones. This
patchset is to support GRO in DPDK. To support GRO, this patch
implements a GRO API framework.

To give applications more flexibility, DPDK GRO is implemented as
a user library. Applications explicitly use the GRO library to merge
small packets into large ones. DPDK GRO provides two reassembly modes.
One is called lightweight mode, the other is called heavyweight mode.
If applications want to merge packets in a simple way and the number
of packets is relatively small, they can use the lightweight mode.
If applications need more fine-grained controls, they can choose the
heavyweight mode.

rte_gro_reassemble_burst is the main reassembly API which is used in
lightweight mode and processes N packets at a time. For applications,
performing GRO in lightweight mode is simple. They just need to invoke
rte_gro_reassemble_burst. Applications can get GROed packets as soon as
rte_gro_reassemble_burst returns.

rte_gro_reassemble is the main reassembly API which is used in
heavyweight mode and tries to merge N input packets with the packets
in GRO reassembly tables. For applications, performing GRO in heavyweight
mode is relatively complicated. Before performing GRO, applications need
to create a GRO context object, which keeps reassembly tables of
desired GRO types, by rte_gro_ctx_create. Then applications can use
rte_gro_reassemble to merge packets. The GROed packets are in the
reassembly tables of the GRO context object. If applications want to
get them, they need to flush them manually via the flush API.
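
Below is a minimal usage sketch of both modes (illustrative only, not
part of the patch; pkts, out, nb_rx, nb_fl, timeout_cycles and the
sizing numbers are assumptions, and the RTE_GRO_TCP_IPV4 type is only
added by the next patch of this series):

	/* lightweight mode: merge one RX burst in place */
	struct rte_gro_param param = {
		.gro_types = RTE_GRO_TCP_IPV4,
		.max_flow_num = 64,
		.max_item_per_flow = 32,
	};
	nb_rx = rte_gro_reassemble_burst(pkts, nb_rx, &param);

	/* heavyweight mode: packets stay in the context's reassembly
	 * tables until they are explicitly flushed
	 */
	param.socket_id = rte_socket_id();
	void *ctx = rte_gro_ctx_create(&param);
	nb_rx = rte_gro_reassemble(pkts, nb_rx, ctx); /* unprocessed pkts */
	nb_fl = rte_gro_timeout_flush(ctx, timeout_cycles,
			RTE_GRO_TCP_IPV4, out, RTE_DIM(out));
	rte_gro_ctx_destroy(ctx);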

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
---
 config/common_base                 |   5 +
 lib/Makefile                       |   2 +
 lib/librte_gro/Makefile            |  50 +++++++++
 lib/librte_gro/rte_gro.c           | 169 ++++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro.h           | 208 +++++++++++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro_version.map |  12 +++
 mk/rte.app.mk                      |   1 +
 7 files changed, 447 insertions(+)
 create mode 100644 lib/librte_gro/Makefile
 create mode 100644 lib/librte_gro/rte_gro.c
 create mode 100644 lib/librte_gro/rte_gro.h
 create mode 100644 lib/librte_gro/rte_gro_version.map

diff --git a/config/common_base b/config/common_base
index 660588a..19605a2 100644
--- a/config/common_base
+++ b/config/common_base
@@ -713,6 +713,11 @@ CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
 CONFIG_RTE_LIBRTE_PMD_VHOST=n
 
 #
+# Compile GRO library
+#
+CONFIG_RTE_LIBRTE_GRO=y
+
+#
 #Compile Xen domain0 support
 #
 CONFIG_RTE_LIBRTE_XEN_DOM0=n
diff --git a/lib/Makefile b/lib/Makefile
index 07e1fd0..ac1c2f6 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -106,6 +106,8 @@ DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
 DEPDIRS-librte_reorder := librte_eal librte_mempool librte_mbuf
 DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
 DEPDIRS-librte_pdump := librte_eal librte_mempool librte_mbuf librte_ether
+DIRS-$(CONFIG_RTE_LIBRTE_GRO) += librte_gro
+DEPDIRS-librte_gro := librte_eal librte_mbuf librte_ether librte_net
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
new file mode 100644
index 0000000..7e0f128
--- /dev/null
+++ b/lib/librte_gro/Makefile
@@ -0,0 +1,50 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2017 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_gro.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+
+EXPORT_MAP := rte_gro_version.map
+
+LIBABIVER := 1
+
+# source files
+SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
+
+# install this header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
new file mode 100644
index 0000000..fa6d7ce
--- /dev/null
+++ b/lib/librte_gro/rte_gro.c
@@ -0,0 +1,169 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+
+#include "rte_gro.h"
+
+typedef void *(*gro_tbl_create_fn)(uint16_t socket_id,
+		uint16_t max_flow_num,
+		uint16_t max_item_per_flow);
+typedef void (*gro_tbl_destroy_fn)(void *tbl);
+typedef uint32_t (*gro_tbl_pkt_count_fn)(void *tbl);
+
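+/*
+ * Per-GRO-type callback tables, indexed by the bit position of each
+ * GRO type flag. A GRO type plugs its reassembly-table create/destroy/
+ * packet-count callbacks in here; NULL entries are skipped, so no type
+ * is active until a later patch registers one.
+ */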
+static gro_tbl_create_fn tbl_create_fn[RTE_GRO_TYPE_MAX_NUM];
+static gro_tbl_destroy_fn tbl_destroy_fn[RTE_GRO_TYPE_MAX_NUM];
+static gro_tbl_pkt_count_fn tbl_pkt_count_fn[RTE_GRO_TYPE_MAX_NUM];
+
+/*
+ * GRO context structure, which is used to merge packets. It keeps
+ * many reassembly tables of desired GRO types. Applications need to
+ * create GRO context objects before using rte_gro_reassemble to
+ * perform GRO.
+ */
+struct gro_ctx {
+	/* GRO types to perform */
+	uint64_t gro_types;
+	/* reassembly tables */
+	void *tbls[RTE_GRO_TYPE_MAX_NUM];
+};
+
+void *
+rte_gro_ctx_create(const struct rte_gro_param *param)
+{
+	struct gro_ctx *gro_ctx;
+	gro_tbl_create_fn create_tbl_fn;
+	uint64_t gro_type_flag = 0;
+	uint64_t gro_types = 0;
+	uint8_t i;
+
+	gro_ctx = rte_zmalloc_socket(__func__,
+			sizeof(struct gro_ctx),
+			RTE_CACHE_LINE_SIZE,
+			param->socket_id);
+	if (gro_ctx == NULL)
+		return NULL;
+
+	for (i = 0; i < RTE_GRO_TYPE_MAX_NUM; i++) {
+		gro_type_flag = 1 << i;
+		if ((param->gro_types & gro_type_flag) == 0)
+			continue;
+
+		create_tbl_fn = tbl_create_fn[i];
+		if (create_tbl_fn == NULL)
+			continue;
+
+		gro_ctx->tbls[i] = create_tbl_fn(param->socket_id,
+				param->max_flow_num,
+				param->max_item_per_flow);
+		if (gro_ctx->tbls[i] == NULL) {
+			/* destroy all created tables */
+			gro_ctx->gro_types = gro_types;
+			rte_gro_ctx_destroy(gro_ctx);
+			return NULL;
+		}
+		gro_types |= gro_type_flag;
+	}
+	gro_ctx->gro_types = param->gro_types;
+
+	return gro_ctx;
+}
+
+void
+rte_gro_ctx_destroy(void *ctx)
+{
+	gro_tbl_destroy_fn destroy_tbl_fn;
+	struct gro_ctx *gro_ctx = ctx;
+	uint64_t gro_type_flag;
+	uint8_t i;
+
+	if (gro_ctx == NULL)
+		return;
+	for (i = 0; i < RTE_GRO_TYPE_MAX_NUM; i++) {
+		gro_type_flag = 1 << i;
+		if ((gro_ctx->gro_types & gro_type_flag) == 0)
+			continue;
+		destroy_tbl_fn = tbl_destroy_fn[i];
+		if (destroy_tbl_fn)
+			destroy_tbl_fn(gro_ctx->tbls[i]);
+	}
+	rte_free(gro_ctx);
+}
+
+uint16_t
+rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
+		uint16_t nb_pkts,
+		const struct rte_gro_param *param __rte_unused)
+{
+	return nb_pkts;
+}
+
+uint16_t
+rte_gro_reassemble(struct rte_mbuf **pkts __rte_unused,
+		uint16_t nb_pkts,
+		void *ctx __rte_unused)
+{
+	return nb_pkts;
+}
+
+uint16_t
+rte_gro_timeout_flush(void *ctx __rte_unused,
+		uint64_t timeout_cycles __rte_unused,
+		uint64_t gro_types __rte_unused,
+		struct rte_mbuf **out __rte_unused,
+		uint16_t max_nb_out __rte_unused)
+{
+	return 0;
+}
+
+uint64_t
+rte_gro_get_pkt_count(void *ctx)
+{
+	struct gro_ctx *gro_ctx = ctx;
+	gro_tbl_pkt_count_fn pkt_count_fn;
+	uint64_t item_num = 0;
+	uint64_t gro_type_flag;
+	uint8_t i;
+
+	for (i = 0; i < RTE_GRO_TYPE_MAX_NUM; i++) {
+		gro_type_flag = 1 << i;
+		if ((gro_ctx->gro_types & gro_type_flag) == 0)
+			continue;
+
+		pkt_count_fn = tbl_pkt_count_fn[i];
+		if (pkt_count_fn == NULL)
+			continue;
+		item_num += pkt_count_fn(gro_ctx->tbls[i]);
+	}
+	return item_num;
+}
diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
new file mode 100644
index 0000000..53ddd15
--- /dev/null
+++ b/lib/librte_gro/rte_gro.h
@@ -0,0 +1,208 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_GRO_H_
+#define _RTE_GRO_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**< the max number of packets that rte_gro_reassemble_burst()
+ * can process in each invocation.
+ */
+#define RTE_GRO_MAX_BURST_ITEM_NUM 128U
+
+/**< max number of supported GRO types */
+#define RTE_GRO_TYPE_MAX_NUM 64
+/**< number of currently supported GRO types */
+#define RTE_GRO_TYPE_SUPPORT_NUM 0
+
+
+struct rte_gro_param {
+	/**< desired GRO types */
+	uint64_t gro_types;
+	/**< max flow number */
+	uint16_t max_flow_num;
+	/**< max packet number per flow */
+	uint16_t max_item_per_flow;
+	/**< socket index for allocating GRO related data structures,
+ * like reassembly tables. When using rte_gro_reassemble_burst(),
+	 * applications don't need to set this value.
+	 */
+	uint16_t socket_id;
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function creates a GRO context object, which is used to merge
+ * packets in rte_gro_reassemble().
+ *
+ * @param param
+ *  applications use it to pass needed parameters to create a GRO
+ *  context object.
+ *
+ * @return
+ *  if created successfully, return a pointer to the GRO context
+ *  object. Otherwise, return NULL.
+ */
+void *rte_gro_ctx_create(const struct rte_gro_param *param);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function destroys a GRO context object.
+ *
+ * @param ctx
+ *  pointer points to a GRO context object.
+ */
+void rte_gro_ctx_destroy(void *ctx);
+
+/**
+ * This is one of the main reassembly APIs, which merges a burst of
+ * packets at a time. It assumes that all input packets have correct
+ * checksums. That is, applications should guarantee that all input
+ * packets are correct. Besides, it doesn't re-calculate checksums for
+ * merged packets. If input packets are IP fragmented, this function
+ * assumes they are complete (i.e. with L4 header). After
+ * finishing processing, it returns all GROed packets to applications
+ * immediately.
+ *
+ * @param pkts
+ *  a pointer array which points to the packets to reassemble. Besides,
+ *  it keeps packet addresses for GROed packets.
+ * @param nb_pkts
+ *  the number of packets to reassemble.
+ * @param param
+ *  applications use it to tell rte_gro_reassemble_burst() what rules
+ *  are demanded.
+ *
+ * @return
+ *  the number of packets after being GROed. If no packets are merged,
+ *  the returned value is nb_pkts.
+ */
+uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts,
+		uint16_t nb_pkts,
+		const struct rte_gro_param *param);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Reassembly function, which tries to merge input packets with the
+ * packets in the reassembly tables of a given GRO context. This
+ * function assumes all input packets have correct checksums. And it
+ * won't update checksums if two packets are merged. Besides, if input
+ * packets are IP fragmented, this function assumes they are complete
+ * packets (i.e. with L4 header).
+ *
+ * If the input packets don't have data or have unsupported GRO types
+ * etc., they won't be processed and are returned to applications.
+ * Otherwise, the input packets are either merged or inserted into the
+ * table. If applications want to get the packets in the table, they
+ * need to call the flush API.
+ *
+ * @param pkts
+ *  packets to reassemble. Besides, after this function finishes, it
+ *  keeps the unprocessed packets (e.g. those without data or with
+ *  unsupported GRO types).
+ * @param nb_pkts
+ *  the number of packets to reassemble.
+ * @param ctx
+ *  a pointer points to a GRO context object.
+ *
+ * @return
+ *  return the number of unprocessed packets (e.g. without data or
+ *  unsupported GRO types). If all packets are processed (merged or
+ *  inserted into the table), return 0.
+ */
+uint16_t rte_gro_reassemble(struct rte_mbuf **pkts,
+		uint16_t nb_pkts,
+		void *ctx);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function flushes the timeout packets from reassembly tables of
+ * desired GRO types. The max number of flushed timeout packets is the
+ * element number of the array which is used to keep the flushed packets.
+ *
+ * Besides, this function won't re-calculate checksums for merged
+ * packets in the tables. That is, the returned packets may be with
+ * wrong checksums.
+ *
+ * @param ctx
+ *  a pointer points to a GRO context object.
+ * @param timeout_cycles
+ *  max TTL for packets in reassembly tables, measured in CPU cycles.
+ * @param gro_types
+ *  this function only flushes packets which belong to the GRO types
+ *  specified by gro_types.
+ * @param out
+ *  a pointer array that is used to keep flushed timeout packets.
+ * @param max_nb_out
+ *  the number of elements in out. It's also the max number of timeout
+ *  packets that can be flushed finally.
+ *
+ * @return
+ *  the number of flushed packets. If no packets are flushed, return 0.
+ */
+uint16_t rte_gro_timeout_flush(void *ctx,
+		uint64_t timeout_cycles,
+		uint64_t gro_types,
+		struct rte_mbuf **out,
+		uint16_t max_nb_out);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function returns the number of packets in all reassembly tables
+ * of a given GRO context.
+ *
+ * @param ctx
+ *  pointer points to a GRO context object.
+ *
+ * @return
+ *  the number of packets in all reassembly tables.
+ */
+uint64_t rte_gro_get_pkt_count(void *ctx);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_GRO_H_ */
diff --git a/lib/librte_gro/rte_gro_version.map b/lib/librte_gro/rte_gro_version.map
new file mode 100644
index 0000000..a4e58f0
--- /dev/null
+++ b/lib/librte_gro/rte_gro_version.map
@@ -0,0 +1,12 @@
+DPDK_17.08 {
+	global:
+
+	rte_gro_ctx_create;
+	rte_gro_ctx_destroy;
+	rte_gro_get_pkt_count;
+	rte_gro_reassemble;
+	rte_gro_reassemble_burst;
+	rte_gro_timeout_flush;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 7d71a49..d76ed13 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -98,6 +98,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_RING)           += -lrte_ring
 _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
 _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER)        += -lrte_reorder
+_LDLIBS-$(CONFIG_RTE_LIBRTE_GRO)            += -lrte_gro
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 _LDLIBS-$(CONFIG_RTE_LIBRTE_KNI)            += -lrte_kni
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 141+ messages in thread

* [PATCH v12 2/3] lib/gro: add TCP/IPv4 GRO support
  2017-07-07 10:39                     ` [PATCH v12 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
  2017-07-07 10:39                       ` [PATCH v12 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
@ 2017-07-07 10:39                       ` Jiayu Hu
  2017-07-08 16:37                         ` Tan, Jianfeng
  2017-07-07 10:39                       ` [PATCH v12 3/3] app/testpmd: enable TCP/IPv4 GRO Jiayu Hu
  2017-07-09  1:13                       ` [PATCH v13 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
  3 siblings, 1 reply; 141+ messages in thread
From: Jiayu Hu @ 2017-07-07 10:39 UTC (permalink / raw)
  To: dev
  Cc: jianfeng.tan, konstantin.ananyev, yliu, stephen, jingjing.wu,
	lei.a.yao, Jiayu Hu

In this patch, we introduce five APIs to support TCP/IPv4 GRO.
- gro_tcp4_reassemble: reassemble an input TCP/IPv4 packet.
- gro_tcp4_tbl_create: create a TCP/IPv4 reassembly table, which is used
    to merge packets.
- gro_tcp4_tbl_destroy: free memory space of a TCP/IPv4 reassembly table.
- gro_tcp4_tbl_pkt_count: return the number of packets in a TCP/IPv4
    reassembly table.
- gro_tcp4_tbl_timeout_flush: flush timeout packets from a TCP/IPv4
    reassembly table.
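
The table-management functions follow the callback types that the
rte_gro framework dispatches through (a sketch; gro_tcp4.h holds the
authoritative prototypes):

	void *gro_tcp4_tbl_create(uint16_t socket_id,
			uint16_t max_flow_num,
			uint16_t max_item_per_flow);
	void gro_tcp4_tbl_destroy(void *tbl);
	uint32_t gro_tcp4_tbl_pkt_count(void *tbl);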

The TCP/IPv4 GRO API assumes all input packets have correct IPv4
and TCP checksums, and it doesn't update IPv4 and TCP checksums for
merged packets. If input packets are IP fragmented, the TCP/IPv4 GRO
API assumes they are complete packets (i.e. with L4 headers).

In TCP/IPv4 GRO, we use a table structure, called TCP/IPv4 reassembly
table, to reassemble packets. A TCP/IPv4 reassembly table includes a
key array and an item array, where the key array keeps the criteria to
merge packets and the item array keeps packet information (both are
sketched below).

One key in the key array points to an item group, which consists of
packets that have the same criteria value. If two packets can be
merged, they must be in the same item group. Each key in the key array
includes two parts:
- criteria: the criteria for merging packets. If two packets can be
    merged, they must have the same criteria value.
- start_index: the index of the first incoming packet of the item group.

Each element in the item array keeps the information of one packet. It
mainly includes three parts:
- firstseg: the address of the first segment of the packet
- lastseg: the address of the last segment of the packet
- next_pkt_index: the index of the next packet in the same item group.
    All packets in the same item group are chained by next_pkt_index.
    With next_pkt_index, we can locate all packets in the same item
    group one by one.
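
A sketch of the two arrays, using the field names from the description
above (the '_sketch' names and the exact criteria fields are
illustrative only; the authoritative definitions live in gro_tcp4.h):

	struct tcp4_key_sketch {
		uint32_t ip_src_addr;
		uint32_t ip_dst_addr;
		uint16_t src_port;
		uint16_t dst_port;
	};

	/* one entry of the key array */
	struct gro_tcp4_key_sketch {
		struct tcp4_key_sketch criteria; /* merge criteria */
		uint32_t start_index; /* first packet of the item group */
	};

	/* one entry of the item array */
	struct gro_tcp4_item_sketch {
		struct rte_mbuf *firstseg;  /* first segment of the packet */
		struct rte_mbuf *lastseg;   /* last segment of the packet */
		uint32_t next_pkt_index;    /* next packet in the group */
	};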

Processing an incoming packet takes three steps (a condensed sketch in
C follows this list):
a. check if the packet should be processed. Packets with one of the
    following properties won't be processed:
	- FIN, SYN, RST, URG, PSH, ECE or CWR bit is set;
	- packet payload length is 0.
b. traverse the key array to find a key which has the same criteria
    value as the incoming packet. If found, go to step c. Otherwise,
    insert a new key and insert the packet into the item array.
c. locate the first packet in the item group via the start_index in the
    key, then traverse all packets in the item group via next_pkt_index.
    If one of them can be merged with the incoming packet, merge them.
    Otherwise, insert the packet into this item group.
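
A condensed sketch of the flow above (helper names are hypothetical;
the real implementation is gro_tcp4_reassemble() in gro_tcp4.c):

	/* returns 0 on merge/insert, -1 if the packet can't be processed */
	static int
	tcp4_process_sketch(struct gro_tcp4_tbl *tbl, struct rte_mbuf *pkt)
	{
		uint32_t key_idx, item_idx;

		/* step a: bail out for packets we don't handle */
		if (has_ctrl_flags(pkt) || tcp_payload_len(pkt) == 0)
			return -1;

		/* step b: look up the key with the same criteria value */
		key_idx = find_key(tbl, pkt);
		if (key_idx == INVALID_ARRAY_INDEX) {
			insert_new_key_and_item(tbl, pkt);
			return 0;
		}

		/* step c: walk the item group via next_pkt_index */
		item_idx = tbl->keys[key_idx].start_index;
		while (item_idx != INVALID_ARRAY_INDEX) {
			if (try_merge(&tbl->items[item_idx], pkt))
				return 0;
			item_idx = tbl->items[item_idx].next_pkt_index;
		}
		insert_item_into_group(tbl, key_idx, pkt);
		return 0;
	}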

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
---
 doc/guides/rel_notes/release_17_08.rst |   7 +
 lib/librte_gro/Makefile                |   1 +
 lib/librte_gro/gro_tcp4.c              | 501 +++++++++++++++++++++++++++++++++
 lib/librte_gro/gro_tcp4.h              | 210 ++++++++++++++
 lib/librte_gro/rte_gro.c               | 137 ++++++++-
 lib/librte_gro/rte_gro.h               |   5 +-
 6 files changed, 846 insertions(+), 15 deletions(-)
 create mode 100644 lib/librte_gro/gro_tcp4.c
 create mode 100644 lib/librte_gro/gro_tcp4.h

diff --git a/doc/guides/rel_notes/release_17_08.rst b/doc/guides/rel_notes/release_17_08.rst
index 842f46f..f067247 100644
--- a/doc/guides/rel_notes/release_17_08.rst
+++ b/doc/guides/rel_notes/release_17_08.rst
@@ -75,6 +75,13 @@ New Features
 
   Added support for firmwares with multiple Ethernet ports per physical port.
 
+* **Add Generic Receive Offload API support.**
+
+  The Generic Receive Offload (GRO) API supports reassembling TCP/IPv4
+  packets. The GRO API assumes all input packets have correct
+  checksums, and it doesn't update checksums for merged packets. If
+  input packets are IP fragmented, the GRO API assumes they are
+  complete packets (i.e. with L4 headers).
 
 Resolved Issues
 ---------------
diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
index 7e0f128..747eeec 100644
--- a/lib/librte_gro/Makefile
+++ b/lib/librte_gro/Makefile
@@ -43,6 +43,7 @@ LIBABIVER := 1
 
 # source files
 SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
+SRCS-$(CONFIG_RTE_LIBRTE_GRO) += gro_tcp4.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
diff --git a/lib/librte_gro/gro_tcp4.c b/lib/librte_gro/gro_tcp4.c
new file mode 100644
index 0000000..2936fe1
--- /dev/null
+++ b/lib/librte_gro/gro_tcp4.c
@@ -0,0 +1,501 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+#include <rte_cycles.h>
+#include <rte_ethdev.h>
+#include <rte_ip.h>
+#include <rte_tcp.h>
+
+#include "gro_tcp4.h"
+
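+/*
+ * Allocate a TCP/IPv4 reassembly table on the given NUMA socket. Both
+ * the key array and the item array get
+ * min(max_flow_num * max_item_per_flow, GRO_TCP4_TBL_MAX_ITEM_NUM)
+ * entries.
+ */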
+void *
+gro_tcp4_tbl_create(uint16_t socket_id,
+		uint16_t max_flow_num,
+		uint16_t max_item_per_flow)
+{
+	struct gro_tcp4_tbl *tbl;
+	size_t size;
+	uint32_t entries_num, i;
+
+	entries_num = max_flow_num * max_item_per_flow;
+	entries_num = RTE_MIN(entries_num, GRO_TCP4_TBL_MAX_ITEM_NUM);
+
+	if (entries_num == 0)
+		return NULL;
+
+	tbl = rte_zmalloc_socket(__func__,
+			sizeof(struct gro_tcp4_tbl),
+			RTE_CACHE_LINE_SIZE,
+			socket_id);
+	if (tbl == NULL)
+		return NULL;
+
+	size = sizeof(struct gro_tcp4_item) * entries_num;
+	tbl->items = rte_zmalloc_socket(__func__,
+			size,
+			RTE_CACHE_LINE_SIZE,
+			socket_id);
+	if (tbl->items == NULL) {
+		rte_free(tbl);
+		return NULL;
+	}
+	tbl->max_item_num = entries_num;
+
+	size = sizeof(struct gro_tcp4_key) * entries_num;
+	tbl->keys = rte_zmalloc_socket(__func__,
+			size,
+			RTE_CACHE_LINE_SIZE,
+			socket_id);
+	if (tbl->keys == NULL) {
+		rte_free(tbl->items);
+		rte_free(tbl);
+		return NULL;
+	}
+	/* INVALID_ARRAY_INDEX indicates empty key */
+	for (i = 0; i < entries_num; i++)
+		tbl->keys[i].start_index = INVALID_ARRAY_INDEX;
+	tbl->max_key_num = entries_num;
+
+	return tbl;
+}
+
+void
+gro_tcp4_tbl_destroy(void *tbl)
+{
+	struct gro_tcp4_tbl *tcp_tbl = tbl;
+
+	if (tcp_tbl) {
+		rte_free(tcp_tbl->items);
+		rte_free(tcp_tbl->keys);
+	}
+	rte_free(tcp_tbl);
+}
+
+/*
+ * Merge two TCP/IPv4 packets without updating checksums.
+ * If cmp is larger than 0, append the new packet to the
+ * original packet. Otherwise, prepend the new packet to
+ * the original packet.
+ */
+static inline int
+merge_two_tcp4_packets(struct gro_tcp4_item *item_src,
+		struct rte_mbuf *pkt,
+		uint16_t ip_id,
+		uint32_t sent_seq,
+		int cmp)
+{
+	struct rte_mbuf *pkt_head, *pkt_tail, *lastseg;
+	uint16_t tcp_datalen;
+
+	if (cmp > 0) {
+		pkt_head = item_src->firstseg;
+		pkt_tail = pkt;
+	} else {
+		pkt_head = pkt;
+		pkt_tail = item_src->firstseg;
+	}
+
+	/* check if the packet length will be beyond the max value */
+	tcp_datalen = pkt_tail->pkt_len - pkt_tail->l2_len -
+		pkt_tail->l3_len - pkt_tail->l4_len;
+	if (pkt_head->pkt_len - pkt_head->l2_len + tcp_datalen >
+			TCP4_MAX_L3_LENGTH)
+		return 0;
+
+	/* remove packet header for the tail packet */
+	rte_pktmbuf_adj(pkt_tail,
+			pkt_tail->l2_len +
+			pkt_tail->l3_len +
+			pkt_tail->l4_len);
+
+	/* chain two packets together */
+	if (cmp > 0) {
+		item_src->lastseg->next = pkt;
+		item_src->lastseg = rte_pktmbuf_lastseg(pkt);
+		/* update IP ID to the larger value */
+		item_src->ip_id = ip_id;
+	} else {
+		lastseg = rte_pktmbuf_lastseg(pkt);
+		lastseg->next = item_src->firstseg;
+		item_src->firstseg = pkt;
+		/* update sent_seq to the smaller value */
+		item_src->sent_seq = sent_seq;
+	}
+	item_src->nb_merged++;
+
+	/* update mbuf metadata for the merged packet */
+	pkt_head->nb_segs += pkt_tail->nb_segs;
+	pkt_head->pkt_len += pkt_tail->pkt_len;
+
+	return 1;
+}
+
+static inline int
+check_seq_option(struct gro_tcp4_item *item,
+		struct tcp_hdr *tcp_hdr,
+		uint16_t tcp_hl,
+		uint16_t tcp_dl,
+		uint16_t ip_id,
+		uint32_t sent_seq)
+{
+	struct rte_mbuf *pkt0 = item->firstseg;
+	struct ipv4_hdr *ipv4_hdr0;
+	struct tcp_hdr *tcp_hdr0;
+	uint16_t tcp_hl0, tcp_dl0;
+	uint16_t len;
+
+	ipv4_hdr0 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt0, char *) +
+			pkt0->l2_len);
+	tcp_hdr0 = (struct tcp_hdr *)((char *)ipv4_hdr0 + pkt0->l3_len);
+	tcp_hl0 = pkt0->l4_len;
+
+	/* check if the TCP option fields are equal; if not, return 0 */
+	len = RTE_MAX(tcp_hl, tcp_hl0) - sizeof(struct tcp_hdr);
+	if ((tcp_hl != tcp_hl0) ||
+			((len > 0) && (memcmp(tcp_hdr + 1,
+					tcp_hdr0 + 1,
+					len) != 0)))
+		return 0;
+
+	/*
+	 * check if the two packets are neighbors: either the new
+	 * packet directly follows the stored one (append), or the
+	 * stored one directly follows the new packet (prepend); in
+	 * both cases the IP IDs must be consecutive as well
+	 */
+	tcp_dl0 = pkt0->pkt_len - pkt0->l2_len - pkt0->l3_len - tcp_hl0;
+	if ((sent_seq == (item->sent_seq + tcp_dl0)) &&
+			(ip_id == (item->ip_id + 1)))
+		/* append the new packet */
+		return 1;
+	else if (((sent_seq + tcp_dl) == item->sent_seq) &&
+			((ip_id + item->nb_merged) == item->ip_id))
+		/* pre-pend the new packet */
+		return -1;
+	else
+		return 0;
+}
+
+static inline uint32_t
+find_an_empty_item(struct gro_tcp4_tbl *tbl)
+{
+	uint32_t i;
+
+	for (i = 0; i < tbl->max_item_num; i++)
+		if (tbl->items[i].firstseg == NULL)
+			return i;
+	return INVALID_ARRAY_INDEX;
+}
+
+static inline uint32_t
+find_an_empty_key(struct gro_tcp4_tbl *tbl)
+{
+	uint32_t i;
+
+	for (i = 0; i < tbl->max_key_num; i++)
+		if (tbl->keys[i].start_index == INVALID_ARRAY_INDEX)
+			return i;
+	return INVALID_ARRAY_INDEX;
+}
+
+static inline uint32_t
+insert_new_item(struct gro_tcp4_tbl *tbl,
+		struct rte_mbuf *pkt,
+		uint16_t ip_id,
+		uint32_t sent_seq,
+		uint32_t prev_idx,
+		uint64_t start_time)
+{
+	uint32_t item_idx;
+
+	item_idx = find_an_empty_item(tbl);
+	if (item_idx == INVALID_ARRAY_INDEX)
+		return INVALID_ARRAY_INDEX;
+
+	tbl->items[item_idx].firstseg = pkt;
+	tbl->items[item_idx].lastseg = rte_pktmbuf_lastseg(pkt);
+	tbl->items[item_idx].start_time = start_time;
+	tbl->items[item_idx].next_pkt_idx = INVALID_ARRAY_INDEX;
+	tbl->items[item_idx].sent_seq = sent_seq;
+	tbl->items[item_idx].ip_id = ip_id;
+	tbl->items[item_idx].nb_merged = 1;
+	tbl->item_num++;
+
+	/* if the previous packet exists, chain the new one with it */
+	if (prev_idx != INVALID_ARRAY_INDEX) {
+		tbl->items[item_idx].next_pkt_idx =
+			tbl->items[prev_idx].next_pkt_idx;
+		tbl->items[prev_idx].next_pkt_idx = item_idx;
+	}
+
+	return item_idx;
+}
+
+static inline uint32_t
+delete_item(struct gro_tcp4_tbl *tbl, uint32_t item_idx,
+		uint32_t prev_item_idx)
+{
+	uint32_t next_idx = tbl->items[item_idx].next_pkt_idx;
+
+	/* set firstseg to NULL to indicate an empty item */
+	tbl->items[item_idx].firstseg = NULL;
+	tbl->item_num--;
+	if (prev_item_idx != INVALID_ARRAY_INDEX)
+		tbl->items[prev_item_idx].next_pkt_idx = next_idx;
+
+	return next_idx;
+}
+
+static inline uint32_t
+insert_new_key(struct gro_tcp4_tbl *tbl,
+		struct tcp4_key *key_src,
+		uint32_t item_idx)
+{
+	struct tcp4_key *key_dst;
+	uint32_t key_idx;
+
+	key_idx = find_an_empty_key(tbl);
+	if (key_idx == INVALID_ARRAY_INDEX)
+		return INVALID_ARRAY_INDEX;
+
+	key_dst = &(tbl->keys[key_idx].key);
+
+	ether_addr_copy(&(key_src->eth_saddr), &(key_dst->eth_saddr));
+	ether_addr_copy(&(key_src->eth_daddr), &(key_dst->eth_daddr));
+	key_dst->ip_src_addr = key_src->ip_src_addr;
+	key_dst->ip_dst_addr = key_src->ip_dst_addr;
+	key_dst->recv_ack = key_src->recv_ack;
+	key_dst->src_port = key_src->src_port;
+	key_dst->dst_port = key_src->dst_port;
+
+	/* non-INVALID_ARRAY_INDEX value indicates this key is valid */
+	tbl->keys[key_idx].start_index = item_idx;
+	tbl->key_num++;
+
+	return key_idx;
+}
+
+static inline int
+is_same_key(struct tcp4_key k1, struct tcp4_key k2)
+{
+	if (is_same_ether_addr(&k1.eth_saddr, &k2.eth_saddr) == 0)
+		return 0;
+
+	if (is_same_ether_addr(&k1.eth_daddr, &k2.eth_daddr) == 0)
+		return 0;
+
+	return ((k1.ip_src_addr == k2.ip_src_addr) &&
+			(k1.ip_dst_addr == k2.ip_dst_addr) &&
+			(k1.recv_ack == k2.recv_ack) &&
+			(k1.src_port == k2.src_port) &&
+			(k1.dst_port == k2.dst_port));
+}
+
+/*
+ * update the IPv4 total_length field for the flushed packet.
+ */
+static inline void
+update_header(struct gro_tcp4_item *item)
+{
+	struct ipv4_hdr *ipv4_hdr;
+	struct rte_mbuf *pkt = item->firstseg;
+
+	ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
+			pkt->l2_len);
+	ipv4_hdr->total_length = rte_cpu_to_be_16(pkt->pkt_len -
+			pkt->l2_len);
+}
+
+int32_t
+gro_tcp4_reassemble(struct rte_mbuf *pkt,
+		struct gro_tcp4_tbl *tbl,
+		uint64_t start_time)
+{
+	struct ether_hdr *eth_hdr;
+	struct ipv4_hdr *ipv4_hdr;
+	struct tcp_hdr *tcp_hdr;
+	uint32_t sent_seq;
+	uint16_t tcp_dl, ip_id;
+
+	struct tcp4_key key;
+	uint32_t cur_idx, prev_idx, item_idx;
+	uint32_t i;
+	int cmp;
+
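+	/*
+	 * the header offsets below come from the mbuf's l2_len, l3_len
+	 * and l4_len fields, which the caller must have filled in
+	 */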
+	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
+	ipv4_hdr = (struct ipv4_hdr *)((char *)eth_hdr + pkt->l2_len);
+	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + pkt->l3_len);
+
+	/*
+	 * if FIN, SYN, RST, PSH, URG, ECE or
+	 * CWR is set, return immediately.
+	 */
+	if (tcp_hdr->tcp_flags != TCP_ACK_FLAG)
+		return -1;
+	/* if payload length is 0, return immediately */
+	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - pkt->l3_len -
+		pkt->l4_len;
+	if (tcp_dl == 0)
+		return -1;
+
+	ip_id = rte_be_to_cpu_16(ipv4_hdr->packet_id);
+	sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq);
+
+	ether_addr_copy(&(eth_hdr->s_addr), &(key.eth_saddr));
+	ether_addr_copy(&(eth_hdr->d_addr), &(key.eth_daddr));
+	key.ip_src_addr = ipv4_hdr->src_addr;
+	key.ip_dst_addr = ipv4_hdr->dst_addr;
+	key.src_port = tcp_hdr->src_port;
+	key.dst_port = tcp_hdr->dst_port;
+	key.recv_ack = tcp_hdr->recv_ack;
+
+	/* search for a key */
+	for (i = 0; i < tbl->max_key_num; i++) {
+		if ((tbl->keys[i].start_index != INVALID_ARRAY_INDEX) &&
+				is_same_key(tbl->keys[i].key, key))
+			break;
+	}
+
+	/* can't find a key, so insert a new key and a new item. */
+	if (i == tbl->max_key_num) {
+		item_idx = insert_new_item(tbl, pkt, ip_id, sent_seq,
+				INVALID_ARRAY_INDEX, start_time);
+		if (item_idx == INVALID_ARRAY_INDEX)
+			return -1;
+		if (insert_new_key(tbl, &key, item_idx) ==
+				INVALID_ARRAY_INDEX) {
+			/*
+			 * fail to insert a new key, so
+			 * delete the inserted item
+			 */
+			delete_item(tbl, item_idx, INVALID_ARRAY_INDEX);
+			return -1;
+		}
+		return 0;
+	}
+
+	/* traverse all packets in the item group to find one to merge */
+	cur_idx = tbl->keys[i].start_index;
+	prev_idx = cur_idx;
+	do {
+		cmp = check_seq_option(&(tbl->items[cur_idx]), tcp_hdr,
+				pkt->l4_len, tcp_dl, ip_id, sent_seq);
+		if (cmp) {
+			if (merge_two_tcp4_packets(&(tbl->items[cur_idx]),
+						pkt, ip_id,
+						sent_seq, cmp))
+				return 1;
+			/*
+			 * fail to merge two packets since the packet
+			 * length will be greater than the max value.
+			 * So insert the packet into the item group.
+			 */
+			if (insert_new_item(tbl, pkt, ip_id, sent_seq,
+						prev_idx, start_time) ==
+					INVALID_ARRAY_INDEX)
+				return -1;
+			return 0;
+		}
+		prev_idx = cur_idx;
+		cur_idx = tbl->items[cur_idx].next_pkt_idx;
+	} while (cur_idx != INVALID_ARRAY_INDEX);
+
+	/*
+	 * can't find a packet in the item group to merge,
+	 * so insert the packet into the item group.
+	 */
+	if (insert_new_item(tbl, pkt, ip_id, sent_seq, prev_idx,
+				start_time) == INVALID_ARRAY_INDEX)
+		return -1;
+
+	return 0;
+}
+
+uint16_t
+gro_tcp4_tbl_timeout_flush(struct gro_tcp4_tbl *tbl,
+		uint64_t flush_timestamp,
+		struct rte_mbuf **out,
+		uint16_t nb_out)
+{
+	uint16_t k = 0;
+	uint32_t i, j;
+
+	for (i = 0; i < tbl->max_key_num; i++) {
+		/* all keys have been checked, return immediately */
+		if (tbl->key_num == 0)
+			return k;
+
+		j = tbl->keys[i].start_index;
+		while (j != INVALID_ARRAY_INDEX) {
+			if (tbl->items[j].start_time <= flush_timestamp) {
+				out[k++] = tbl->items[j].firstseg;
+				if (tbl->items[j].nb_merged > 1)
+					update_header(&(tbl->items[j]));
+				/*
+				 * delete the item and get
+				 * the next packet index
+				 */
+				j = delete_item(tbl, j,
+						INVALID_ARRAY_INDEX);
+
+				/*
+				 * delete the key as all of its
+				 * packets are flushed
+				 */
+				if (j == INVALID_ARRAY_INDEX) {
+					tbl->keys[i].start_index =
+						INVALID_ARRAY_INDEX;
+					tbl->key_num--;
+				} else
+					/* update start_index of the key */
+					tbl->keys[i].start_index = j;
+
+				if (k == nb_out)
+					return k;
+			} else
+				/*
+				 * the remaining packets of this key
+				 * won't time out, so check other keys.
+				 */
+				break;
+		}
+	}
+	return k;
+}
+
+uint32_t
+gro_tcp4_tbl_pkt_count(void *tbl)
+{
+	struct gro_tcp4_tbl *gro_tbl = tbl;
+
+	if (gro_tbl)
+		return gro_tbl->item_num;
+
+	return 0;
+}
diff --git a/lib/librte_gro/gro_tcp4.h b/lib/librte_gro/gro_tcp4.h
new file mode 100644
index 0000000..f41dcee
--- /dev/null
+++ b/lib/librte_gro/gro_tcp4.h
@@ -0,0 +1,210 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _GRO_TCP4_H_
+#define _GRO_TCP4_H_
+
+#define INVALID_ARRAY_INDEX 0xffffffffUL
+#define GRO_TCP4_TBL_MAX_ITEM_NUM (1024UL * 1024UL)
+
+/*
+ * the max L3 length of a TCP/IPv4 packet: the sum of the IPv4
+ * header, TCP header and TCP payload lengths. It is bounded by
+ * the 16-bit IPv4 total_length field.
+ */
+#define TCP4_MAX_L3_LENGTH UINT16_MAX
+
+/* criteria for merging packets */
+struct tcp4_key {
+	struct ether_addr eth_saddr;
+	struct ether_addr eth_daddr;
+	uint32_t ip_src_addr;
+	uint32_t ip_dst_addr;
+
+	uint32_t recv_ack;
+	uint16_t src_port;
+	uint16_t dst_port;
+};
+
+struct gro_tcp4_key {
+	struct tcp4_key key;
+	/*
+	 * the index of the first packet in the item group.
+	 * If the value is INVALID_ARRAY_INDEX, it means
+	 * the key is empty.
+	 */
+	uint32_t start_index;
+};
+
+struct gro_tcp4_item {
+	/*
+	 * first segment of the packet. If the value
+	 * is NULL, it means the item is empty.
+	 */
+	struct rte_mbuf *firstseg;
+	/* last segment of the packet */
+	struct rte_mbuf *lastseg;
+	/*
+	 * the time when the first packet is inserted
+	 * into the table. If a packet in the table is
+	 * merged with an incoming packet, this value
+	 * won't be updated. We set this value only
+	 * when the first packet is inserted into the
+	 * table.
+	 */
+	uint64_t start_time;
+	/*
+	 * we use next_pkt_idx to chain the packets that
+	 * have same key value but can't be merged together.
+	 */
+	uint32_t next_pkt_idx;
+	/* the sequence number of the packet */
+	uint32_t sent_seq;
+	/* the IP ID of the packet */
+	uint16_t ip_id;
+	/* the number of merged packets */
+	uint16_t nb_merged;
+};
+
+/*
+ * TCP/IPv4 reassembly table structure.
+ */
+struct gro_tcp4_tbl {
+	/* item array */
+	struct gro_tcp4_item *items;
+	/* key array */
+	struct gro_tcp4_key *keys;
+	/* current item number */
+	uint32_t item_num;
+	/* current key num */
+	uint32_t key_num;
+	/* item array size */
+	uint32_t max_item_num;
+	/* key array size */
+	uint32_t max_key_num;
+};
+
+/**
+ * This function creates a TCP/IPv4 reassembly table.
+ *
+ * @param socket_id
+ *  socket index for allocating the TCP/IPv4 reassembly table
+ * @param max_flow_num
+ *  the maximum number of flows in the TCP/IPv4 GRO table
+ * @param max_item_per_flow
+ *  the maximum packet number per flow.
+ *
+ * @return
+ *  on success, return a pointer to the created TCP/IPv4 GRO table.
+ *  Otherwise, return NULL.
+ */
+void *gro_tcp4_tbl_create(uint16_t socket_id,
+		uint16_t max_flow_num,
+		uint16_t max_item_per_flow);
+
+/**
+ * This function destroys a TCP/IPv4 reassembly table.
+ *
+ * @param tbl
+ *  a pointer that points to the TCP/IPv4 reassembly table.
+ */
+void gro_tcp4_tbl_destroy(void *tbl);
+
+/**
+ * This function searches for a packet in the TCP/IPv4 reassembly table
+ * to merge with the inputted one. To merge two packets is to chain them
+ * together and update packet headers. Packets whose SYN, FIN, RST, PSH,
+ * CWR, ECE or URG bit is set are returned immediately. Packets which
+ * only have packet headers (i.e. without data) are also returned
+ * immediately. Otherwise, the packet is either merged or inserted into
+ * the table. Besides, if there is no available space to insert the
+ * packet, this function returns immediately too.
+ *
+ * This function assumes the inputted packet has correct IPv4 and TCP
+ * checksums, and it won't re-calculate them for merged packets.
+ * Besides, if the inputted packet is IP fragmented, it assumes the
+ * packet is complete (i.e. with the TCP header).
+ *
+ * @param pkt
+ *  packet to reassemble.
+ * @param tbl
+ *  a pointer that points to a TCP/IPv4 reassembly table.
+ * @param start_time
+ *  the time when the packet is inserted into the table
+ *
+ * @return
+ *  if the packet doesn't have data, or its SYN, FIN, RST, PSH, CWR,
+ *  ECE or URG bit is set, or there is no available space in the table
+ *  to insert a new item or a new key, return a negative value. If the
+ *  packet is merged successfully, return a positive value. If the
+ *  packet is inserted into the table, return 0.
+ */
+int32_t gro_tcp4_reassemble(struct rte_mbuf *pkt,
+		struct gro_tcp4_tbl *tbl,
+		uint64_t start_time);
+
+/**
+ * This function flushes timeout packets in a TCP/IPv4 reassembly table
+ * to applications, without updating checksums for merged packets.
+ * The max number of flushed timeout packets equals the element number
+ * of the array which is used to keep the flushed packets.
+ *
+ * @param tbl
+ *  a pointer that points to a TCP GRO table.
+ * @param flush_timestamp
+ *  this function flushes packets which are inserted into the table
+ *  before or at the flush_timestamp.
+ * @param out
+ *  pointer array which is used to keep flushed packets.
+ * @param nb_out
+ *  the element number of out. It's also the max number of timeout
+ *  packets that can be flushed.
+ *
+ * @return
+ *  the number of packets that are returned.
+ */
+uint16_t gro_tcp4_tbl_timeout_flush(struct gro_tcp4_tbl *tbl,
+		uint64_t flush_timestamp,
+		struct rte_mbuf **out,
+		uint16_t nb_out);
+
+/**
+ * This function returns the number of the packets in a TCP/IPv4
+ * reassembly table.
+ *
+ * @param tbl
+ *  a pointer that points to a TCP/IPv4 reassembly table.
+ *
+ * @return
+ *  the number of packets in the table
+ */
+uint32_t gro_tcp4_tbl_pkt_count(void *tbl);
+#endif
diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
index fa6d7ce..4624485 100644
--- a/lib/librte_gro/rte_gro.c
+++ b/lib/librte_gro/rte_gro.c
@@ -32,8 +32,11 @@
 
 #include <rte_malloc.h>
 #include <rte_mbuf.h>
+#include <rte_cycles.h>
+#include <rte_ethdev.h>
 
 #include "rte_gro.h"
+#include "gro_tcp4.h"
 
 typedef void *(*gro_tbl_create_fn)(uint16_t socket_id,
 		uint16_t max_flow_num,
@@ -41,9 +44,12 @@ typedef void *(*gro_tbl_create_fn)(uint16_t socket_id,
 typedef void (*gro_tbl_destroy_fn)(void *tbl);
 typedef uint32_t (*gro_tbl_pkt_count_fn)(void *tbl);
 
-static gro_tbl_create_fn tbl_create_fn[RTE_GRO_TYPE_MAX_NUM];
-static gro_tbl_destroy_fn tbl_destroy_fn[RTE_GRO_TYPE_MAX_NUM];
-static gro_tbl_pkt_count_fn tbl_pkt_count_fn[RTE_GRO_TYPE_MAX_NUM];
+static gro_tbl_create_fn tbl_create_fn[RTE_GRO_TYPE_MAX_NUM] = {
+		gro_tcp4_tbl_create, NULL};
+static gro_tbl_destroy_fn tbl_destroy_fn[RTE_GRO_TYPE_MAX_NUM] = {
+			gro_tcp4_tbl_destroy, NULL};
+static gro_tbl_pkt_count_fn tbl_pkt_count_fn[RTE_GRO_TYPE_MAX_NUM] = {
+			gro_tcp4_tbl_pkt_count, NULL};
 
 /*
  * GRO context structure, which is used to merge packets. It keeps
@@ -121,28 +127,131 @@ rte_gro_ctx_destroy(void *ctx)
 }
 
 uint16_t
-rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
+rte_gro_reassemble_burst(struct rte_mbuf **pkts,
 		uint16_t nb_pkts,
-		const struct rte_gro_param *param __rte_unused)
+		const struct rte_gro_param *param)
 {
-	return nb_pkts;
+	uint16_t i;
+	uint16_t nb_after_gro = nb_pkts;
+	uint32_t item_num;
+
+	/* allocate a reassembly table for TCP/IPv4 GRO */
+	struct gro_tcp4_tbl tcp_tbl;
+	struct gro_tcp4_key tcp_keys[RTE_GRO_MAX_BURST_ITEM_NUM];
+	struct gro_tcp4_item tcp_items[RTE_GRO_MAX_BURST_ITEM_NUM] = {0};
+
+	struct rte_mbuf *unprocess_pkts[nb_pkts];
+	uint16_t unprocess_num = 0;
+	int32_t ret;
+	uint64_t current_time;
+
+	if ((param->gro_types & RTE_GRO_TCP_IPV4) == 0)
+		return nb_pkts;
+
+	/*
+	 * get the table size for this burst; it is capped at
+	 * RTE_GRO_MAX_BURST_ITEM_NUM since the key and item
+	 * arrays live on the stack
+	 */
+	item_num = RTE_MIN(nb_pkts, (param->max_flow_num *
+			param->max_item_per_flow));
+	item_num = RTE_MIN(item_num, RTE_GRO_MAX_BURST_ITEM_NUM);
+
+	for (i = 0; i < item_num; i++)
+		tcp_keys[i].start_index = INVALID_ARRAY_INDEX;
+
+	tcp_tbl.keys = tcp_keys;
+	tcp_tbl.items = tcp_items;
+	tcp_tbl.key_num = 0;
+	tcp_tbl.item_num = 0;
+	tcp_tbl.max_key_num = item_num;
+	tcp_tbl.max_item_num = item_num;
+
+	current_time = rte_rdtsc();
+
+	for (i = 0; i < nb_pkts; i++) {
+		if ((pkts[i]->packet_type & (RTE_PTYPE_L3_IPV4 |
+					RTE_PTYPE_L4_TCP)) ==
+				(RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_TCP)) {
+			ret = gro_tcp4_reassemble(pkts[i],
+					&tcp_tbl,
+					current_time);
+			if (ret > 0)
+				/* merge successfully */
+				nb_after_gro--;
+			else if (ret < 0) {
+				unprocess_pkts[unprocess_num++] =
+					pkts[i];
+			}
+		} else
+			unprocess_pkts[unprocess_num++] = pkts[i];
+	}
+
+	/* re-arrange GROed packets */
+	if (nb_after_gro < nb_pkts) {
+		i = gro_tcp4_tbl_timeout_flush(&tcp_tbl, current_time,
+				pkts, nb_pkts);
+		if (unprocess_num > 0) {
+			memcpy(&pkts[i], unprocess_pkts,
+					sizeof(struct rte_mbuf *) *
+					unprocess_num);
+		}
+	}
+
+	return nb_after_gro;
 }
 
 uint16_t
-rte_gro_reassemble(struct rte_mbuf **pkts __rte_unused,
+rte_gro_reassemble(struct rte_mbuf **pkts,
 		uint16_t nb_pkts,
-		void *ctx __rte_unused)
+		void *ctx)
 {
-	return nb_pkts;
+	uint16_t i, unprocess_num = 0;
+	struct rte_mbuf *unprocess_pkts[nb_pkts];
+	struct gro_ctx *gro_ctx = ctx;
+	uint64_t current_time;
+
+	if ((gro_ctx->gro_types & RTE_GRO_TCP_IPV4) == 0)
+		return nb_pkts;
+
+	current_time = rte_rdtsc();
+
+	for (i = 0; i < nb_pkts; i++) {
+		if ((pkts[i]->packet_type & (RTE_PTYPE_L3_IPV4 |
+					RTE_PTYPE_L4_TCP)) ==
+				(RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_TCP)) {
+			if (gro_tcp4_reassemble(pkts[i],
+						gro_ctx->tbls
+						[RTE_GRO_TCP_IPV4_INDEX],
+						current_time) < 0)
+				unprocess_pkts[unprocess_num++] = pkts[i];
+		} else
+			unprocess_pkts[unprocess_num++] = pkts[i];
+	}
+	if (unprocess_num > 0) {
+		memcpy(pkts, unprocess_pkts,
+				sizeof(struct rte_mbuf *) *
+				unprocess_num);
+	}
+
+	return unprocess_num;
 }
 
 uint16_t
-rte_gro_timeout_flush(void *ctx __rte_unused,
-		uint64_t timeout_cycles __rte_unused,
-		uint64_t gro_types __rte_unused,
-		struct rte_mbuf **out __rte_unused,
-		uint16_t max_nb_out __rte_unused)
+rte_gro_timeout_flush(void *ctx,
+		uint64_t timeout_cycles,
+		uint64_t gro_types,
+		struct rte_mbuf **out,
+		uint16_t max_nb_out)
 {
+	struct gro_ctx *gro_ctx = ctx;
+	uint64_t flush_timestamp;
+
+	gro_types = gro_types & gro_ctx->gro_types;
+	flush_timestamp = rte_rdtsc() - timeout_cycles;
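+	/* flush packets inserted at or before (now - timeout_cycles) */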
+
+	if (gro_types & RTE_GRO_TCP_IPV4) {
+		return gro_tcp4_tbl_timeout_flush(
+				gro_ctx->tbls[RTE_GRO_TCP_IPV4_INDEX],
+				flush_timestamp,
+				out, max_nb_out);
+	}
 	return 0;
 }
 
diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
index 53ddd15..3b4fec0 100644
--- a/lib/librte_gro/rte_gro.h
+++ b/lib/librte_gro/rte_gro.h
@@ -45,8 +45,11 @@ extern "C" {
 /**< max number of supported GRO types */
 #define RTE_GRO_TYPE_MAX_NUM 64
 /**< current supported GRO num */
-#define RTE_GRO_TYPE_SUPPORT_NUM 0
+#define RTE_GRO_TYPE_SUPPORT_NUM 1
 
+/**< TCP/IPv4 GRO flag */
+#define RTE_GRO_TCP_IPV4_INDEX 0
+#define RTE_GRO_TCP_IPV4 (1ULL << RTE_GRO_TCP_IPV4_INDEX)
 
 struct rte_gro_param {
 	/**< desired GRO types */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 141+ messages in thread

* [PATCH v12 3/3] app/testpmd: enable TCP/IPv4 GRO
  2017-07-07 10:39                     ` [PATCH v12 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
  2017-07-07 10:39                       ` [PATCH v12 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
  2017-07-07 10:39                       ` [PATCH v12 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
@ 2017-07-07 10:39                       ` Jiayu Hu
  2017-07-09  1:13                       ` [PATCH v13 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
  3 siblings, 0 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-07-07 10:39 UTC (permalink / raw)
  To: dev
  Cc: jianfeng.tan, konstantin.ananyev, yliu, stephen, jingjing.wu,
	lei.a.yao, Jiayu Hu

This patch enables the TCP/IPv4 GRO library in the csum forwarding
engine. By default, GRO is turned off. Users can use the command
"gro (on|off) (port_id)" to enable or disable GRO for a given port.
If GRO is enabled for a port, GRO is performed on all TCP/IPv4 packets
received from that port. Besides, users can set the max flow number
and the max packet number per flow with the command
"gro set (max_flow_num) (max_item_num_per_flow) (port_id)".

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Jingjing Wu <jingjing.wu@intel.com>
---
 app/test-pmd/cmdline.c                      | 125 ++++++++++++++++++++++++++++
 app/test-pmd/config.c                       |  36 ++++++++
 app/test-pmd/csumonly.c                     |   5 ++
 app/test-pmd/testpmd.c                      |   3 +
 app/test-pmd/testpmd.h                      |  11 +++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  34 ++++++++
 6 files changed, 214 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 6789071..262d099 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -76,6 +76,7 @@
 #include <rte_devargs.h>
 #include <rte_eth_ctrl.h>
 #include <rte_flow.h>
+#include <rte_gro.h>
 
 #include <cmdline_rdline.h>
 #include <cmdline_parse.h>
@@ -419,6 +420,14 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"tso show (portid)"
 			"    Display the status of TCP Segmentation Offload.\n\n"
 
+			"gro (on|off) (port_id)"
+			"    Enable or disable Generic Receive Offload in io"
+			" forward engine.\n\n"
+
+			"gro set (max_flow_num) (max_item_num_per_flow) (port_id)\n"
+			"    Set max flow number and max packet number per-flow"
+			" for GRO.\n\n"
+
 			"set fwd (%s)\n"
 			"    Set packet forwarding mode.\n\n"
 
@@ -3827,6 +3836,120 @@ cmdline_parse_inst_t cmd_tunnel_tso_show = {
 	},
 };
 
+/* *** SET GRO FOR A PORT *** */
+struct cmd_gro_result {
+	cmdline_fixed_string_t cmd_keyword;
+	cmdline_fixed_string_t mode;
+	uint8_t port_id;
+};
+
+static void
+cmd_enable_gro_parsed(void *parsed_result,
+		__attribute__((unused)) struct cmdline *cl,
+		__attribute__((unused)) void *data)
+{
+	struct cmd_gro_result *res;
+
+	res = parsed_result;
+	setup_gro(res->mode, res->port_id);
+}
+
+cmdline_parse_token_string_t cmd_gro_keyword =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_result,
+			cmd_keyword, "gro");
+cmdline_parse_token_string_t cmd_gro_mode =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_result,
+			mode, "on#off");
+cmdline_parse_token_num_t cmd_gro_pid =
+	TOKEN_NUM_INITIALIZER(struct cmd_gro_result,
+			port_id, UINT8);
+
+cmdline_parse_inst_t cmd_enable_gro = {
+	.f = cmd_enable_gro_parsed,
+	.data = NULL,
+	.help_str = "gro (on|off) (port_id)",
+	.tokens = {
+		(void *)&cmd_gro_keyword,
+		(void *)&cmd_gro_mode,
+		(void *)&cmd_gro_pid,
+		NULL,
+	},
+};
+
+/* *** SET MAX FLOW NUMBER AND ITEM NUM PER FLOW FOR GRO *** */
+struct cmd_gro_set_result {
+	cmdline_fixed_string_t gro;
+	cmdline_fixed_string_t mode;
+	uint16_t flow_num;
+	uint16_t item_num_per_flow;
+	uint8_t port_id;
+};
+
+static void
+cmd_gro_set_parsed(void *parsed_result,
+		       __attribute__((unused)) struct cmdline *cl,
+		       __attribute__((unused)) void *data)
+{
+	struct cmd_gro_set_result *res = parsed_result;
+
+	if (port_id_is_invalid(res->port_id, ENABLED_WARN))
+		return;
+	if (test_done == 0) {
+		printf("Before set GRO flow_num and item_num_per_flow,"
+				" please stop forwarding first\n");
+		return;
+	}
+
+	if (!strcmp(res->mode, "set")) {
+		if (res->flow_num == 0)
+			printf("Invalid flow number. Revert to default value:"
+					" %u.\n", GRO_DEFAULT_FLOW_NUM);
+		else
+			gro_ports[res->port_id].param.max_flow_num =
+				res->flow_num;
+
+		if (res->item_num_per_flow == 0)
+			printf("Invalid item number per-flow. Revert"
+					" to default value:%u.\n",
+					GRO_DEFAULT_ITEM_NUM_PER_FLOW);
+		else
+			gro_ports[res->port_id].param.max_item_per_flow =
+				res->item_num_per_flow;
+	}
+}
+
+cmdline_parse_token_string_t cmd_gro_set_gro =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_set_result,
+				gro, "gro");
+cmdline_parse_token_string_t cmd_gro_set_mode =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_set_result,
+				mode, "set");
+cmdline_parse_token_num_t cmd_gro_set_flow_num =
+	TOKEN_NUM_INITIALIZER(struct cmd_gro_set_result,
+				flow_num, UINT16);
+cmdline_parse_token_num_t cmd_gro_set_item_num_per_flow =
+	TOKEN_NUM_INITIALIZER(struct cmd_gro_set_result,
+				item_num_per_flow, UINT16);
+cmdline_parse_token_num_t cmd_gro_set_portid =
+	TOKEN_NUM_INITIALIZER(struct cmd_gro_set_result,
+				port_id, UINT8);
+
+cmdline_parse_inst_t cmd_gro_set = {
+	.f = cmd_gro_set_parsed,
+	.data = NULL,
+	.help_str = "gro set <max_flow_num> <max_item_num_per_flow> "
+		"<port_id>: set max flow number and max packet number per-flow "
+		"for GRO",
+	.tokens = {
+		(void *)&cmd_gro_set_gro,
+		(void *)&cmd_gro_set_mode,
+		(void *)&cmd_gro_set_flow_num,
+		(void *)&cmd_gro_set_item_num_per_flow,
+		(void *)&cmd_gro_set_portid,
+		NULL,
+	},
+};
+
 /* *** ENABLE/DISABLE FLUSH ON RX STREAMS *** */
 struct cmd_set_flush_rx {
 	cmdline_fixed_string_t set;
@@ -13732,6 +13855,8 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_tso_show,
 	(cmdline_parse_inst_t *)&cmd_tunnel_tso_set,
 	(cmdline_parse_inst_t *)&cmd_tunnel_tso_show,
+	(cmdline_parse_inst_t *)&cmd_enable_gro,
+	(cmdline_parse_inst_t *)&cmd_gro_set,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_rx,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_tx,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index aa35505..f1e1cf2 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -71,6 +71,7 @@
 #ifdef RTE_LIBRTE_BNXT_PMD
 #include <rte_pmd_bnxt.h>
 #endif
+#include <rte_gro.h>
 
 #include "testpmd.h"
 
@@ -2415,6 +2416,41 @@ set_tx_pkt_segments(unsigned *seg_lengths, unsigned nb_segs)
 	tx_pkt_nb_segs = (uint8_t) nb_segs;
 }
 
+void
+setup_gro(const char *mode, uint8_t port_id)
+{
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		printf("invalid port id %u\n", port_id);
+		return;
+	}
+	if (test_done == 0) {
+		printf("Before enable/disable GRO,"
+				" please stop forwarding first\n");
+		return;
+	}
+	if (strcmp(mode, "on") == 0) {
+		if (gro_ports[port_id].enable) {
+			printf("port %u has enabled GRO\n", port_id);
+			return;
+		}
+		gro_ports[port_id].enable = 1;
+		gro_ports[port_id].param.gro_types = RTE_GRO_TCP_IPV4;
+
+		if (gro_ports[port_id].param.max_flow_num == 0)
+			gro_ports[port_id].param.max_flow_num =
+				GRO_DEFAULT_FLOW_NUM;
+		if (gro_ports[port_id].param.max_item_per_flow == 0)
+			gro_ports[port_id].param.max_item_per_flow =
+				GRO_DEFAULT_ITEM_NUM_PER_FLOW;
+	} else {
+		if (gro_ports[port_id].enable == 0) {
+			printf("port %u has disabled GRO\n", port_id);
+			return;
+		}
+		gro_ports[port_id].enable = 0;
+	}
+}
+
 char*
 list_pkt_forwarding_modes(void)
 {
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 66fc9a0..178ad75 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -71,6 +71,7 @@
 #include <rte_prefetch.h>
 #include <rte_string_fns.h>
 #include <rte_flow.h>
+#include <rte_gro.h>
 #include "testpmd.h"
 
 #define IP_DEFTTL  64   /* from RFC 1340. */
@@ -658,6 +659,10 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 				 nb_pkt_per_burst);
 	if (unlikely(nb_rx == 0))
 		return;
+	if (unlikely(gro_ports[fs->rx_port].enable))
+		nb_rx = rte_gro_reassemble_burst(pkts_burst,
+				nb_rx,
+				&(gro_ports[fs->rx_port].param));
 
 #ifdef RTE_TEST_PMD_RECORD_BURST_STATS
 	fs->rx_burst_stats.pkt_burst_spread[nb_rx]++;
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 132ce81..765867f 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -90,6 +90,7 @@
 #ifdef RTE_LIBRTE_LATENCY_STATS
 #include <rte_latencystats.h>
 #endif
+#include <rte_gro.h>
 
 #include "testpmd.h"
 
@@ -379,6 +380,8 @@ lcoreid_t bitrate_lcore_id;
 uint8_t bitrate_enabled;
 #endif
 
+struct gro_status gro_ports[RTE_MAX_ETHPORTS];
+
 /* Forward function declarations */
 static void map_port_queue_stats_mapping_registers(uint8_t pi, struct rte_port *port);
 static void check_all_ports_link_status(uint32_t port_mask);
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index c73196f..3db667f 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -34,6 +34,8 @@
 #ifndef _TESTPMD_H_
 #define _TESTPMD_H_
 
+#include <rte_gro.h>
+
 #define RTE_PORT_ALL            (~(portid_t)0x0)
 
 #define RTE_TEST_RX_DESC_MAX    2048
@@ -430,6 +432,14 @@ extern struct ether_addr peer_eth_addrs[RTE_MAX_ETHPORTS];
 extern uint32_t burst_tx_delay_time; /**< Burst tx delay time(us) for mac-retry. */
 extern uint32_t burst_tx_retry_num;  /**< Burst tx retry number for mac-retry. */
 
+#define GRO_DEFAULT_FLOW_NUM 4
+#define GRO_DEFAULT_ITEM_NUM_PER_FLOW DEF_PKT_BURST
+struct gro_status {
+	struct rte_gro_param param;
+	uint8_t enable;
+};
+extern struct gro_status gro_ports[RTE_MAX_ETHPORTS];
+
 static inline unsigned int
 lcore_num(void)
 {
@@ -628,6 +638,7 @@ void get_2tuple_filter(uint8_t port_id, uint16_t index);
 void get_5tuple_filter(uint8_t port_id, uint16_t index);
 int rx_queue_id_is_invalid(queueid_t rxq_id);
 int tx_queue_id_is_invalid(queueid_t txq_id);
+void setup_gro(const char *mode, uint8_t port_id);
 
 /* Functions to manage the set of filtered Multicast MAC addresses */
 void mcast_addr_add(uint8_t port_id, struct ether_addr *mc_addr);
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index ca7c16b..44743ff 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -884,6 +884,40 @@ Display the status of TCP Segmentation Offload::
 
    testpmd> tso show (port_id)
 
+gro
+~~~~~~~~
+
+Enable or disable GRO in ``csum`` forwarding engine::
+
+   testpmd> gro (on|off) (port_id)
+
+If enabled, the csum forwarding engine will perform GRO on the TCP/IPv4
+packets received from the given port.
+
+If disabled, GRO is not performed on packets received from the given
+port. By default, GRO is disabled for all ports.
+
+.. note::
+
+   When GRO is enabled for a port, the TCP/IPv4 packets received from
+   that port are merged by GRO. The merged packets are multi-segment
+   mbufs, but the csum forwarding engine can't calculate TCP checksums
+   for multi-segment packets in SW. So please select TCP HW checksum
+   calculation for the port to which GROed packets are transmitted.
+
+gro set
+~~~~~~~~
+
+Set max flow number and max packet number per-flow for GRO::
+
+   testpmd> gro set (max_flow_num) (max_item_num_per_flow) (port_id)
+
+The product of ``max_flow_num`` and ``max_item_num_per_flow`` is the max
+number of packets a GRO table can store.
+
+If the current packet number is greater than or equal to this max
+value, GRO stops processing incoming packets.
+
 mac_addr add
 ~~~~~~~~~~~~
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 141+ messages in thread

* Re: [PATCH v12 1/3] lib: add Generic Receive Offload API framework
  2017-07-07 10:39                       ` [PATCH v12 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
@ 2017-07-08 16:37                         ` Tan, Jianfeng
  0 siblings, 0 replies; 141+ messages in thread
From: Tan, Jianfeng @ 2017-07-08 16:37 UTC (permalink / raw)
  To: Jiayu Hu, dev; +Cc: konstantin.ananyev, yliu, stephen, jingjing.wu, lei.a.yao

Hi Jiayu,


On 7/7/2017 6:39 PM, Jiayu Hu wrote:
> Generic Receive Offload (GRO) is a widely used SW-based offloading
> technique to reduce per-packet processing overhead. It gains
> performance by reassembling small packets into large ones. This
> patchset is to support GRO in DPDK. To support GRO, this patch
> implements a GRO API framework.
>
> To enable more flexibility to applications, DPDK GRO is implemented as
> a user library. Applications explicitly use the GRO library to merge
> small packets into large ones. DPDK GRO provides two reassembly modes.
> One is called lightweight mode, the other is called heavyweight mode.
> If applications want to merge packets in a simple way and the number
> of packets is relatively small, they can use the lightweight mode.
> If applications need more fine-grained controls, they can choose the
> heavyweight mode.
>
> rte_gro_reassemble_burst is the main reassembly API which is used in
> lightweight mode and processes N packets at a time. For applications,
> performing GRO in lightweight mode is simple. They just need to invoke
> rte_gro_reassemble_burst. Applications can get GROed packets as soon as
> rte_gro_reassemble_burst returns.
>
> rte_gro_reassemble is the main reassembly API which is used in
> heavyweight mode and tries to merge N inputted packets with the packets
> in GRO reassembly tables. For applications, performing GRO in heavyweight
> mode is relatively complicated. Before performing GRO, applications need
> to create a GRO context object, which keeps reassembly tables of
> desired GRO types, by rte_gro_ctx_create. Then applications can use
> rte_gro_reassemble to merge packets. The GROed packets are in the
> reassembly tables of the GRO context object. If applications want to get
> them, applications need to manually flush them by flush API.
>
> Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>

I suppose you still need to update the MAINTAINERS file. Besides that,
this series looks good to me.

Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v12 2/3] lib/gro: add TCP/IPv4 GRO support
  2017-07-07 10:39                       ` [PATCH v12 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
@ 2017-07-08 16:37                         ` Tan, Jianfeng
  0 siblings, 0 replies; 141+ messages in thread
From: Tan, Jianfeng @ 2017-07-08 16:37 UTC (permalink / raw)
  To: Jiayu Hu, dev; +Cc: konstantin.ananyev, yliu, stephen, jingjing.wu, lei.a.yao



On 7/7/2017 6:39 PM, Jiayu Hu wrote:
> In this patch, we introduce five APIs to support TCP/IPv4 GRO.
> - gro_tcp4_reassemble: reassemble an inputted TCP/IPv4 packet.
> - gro_tcp4_tbl_create: create a TCP/IPv4 reassembly table, which is used
>      to merge packets.
> - gro_tcp4_tbl_destroy: free memory space of a TCP/IPv4 reassembly table.
> - gro_tcp4_tbl_pkt_count: return the number of packets in a TCP/IPv4
>      reassembly table.
> - gro_tcp4_tbl_timeout_flush: flush timeout packets from a TCP/IPv4
>      reassembly table.
>
> TCP/IPv4 GRO API assumes all inputted packets are with correct IPv4
> and TCP checksums. And TCP/IPv4 GRO API doesn't update IPv4 and TCP
> checksums for merged packets. If inputted packets are IP fragmented,
> TCP/IPv4 GRO API assumes they are complete packets (i.e. with L4
> headers).
>
> In TCP/IPv4 GRO, we use a table structure, called TCP/IPv4 reassembly
> table, to reassemble packets. A TCP/IPv4 reassembly table includes a key
> array and a item array, where the key array keeps the criteria to merge
> packets and the item array keeps packet information.
>
> One key in the key array points to an item group, which consists of
> packets which have the same criteria value. If two packets are able to
> merge, they must be in the same item group. Each key in the key array
> includes two parts:
> - criteria: the criteria of merging packets. If two packets can be
>      merged, they must have the same criteria value.
> - start_index: the index of the first incoming packet of the item group.
>
> Each element in the item array keeps the information of one packet. It
> mainly includes three parts:
> - firstseg: the address of the first segment of the packet
> - lastseg: the address of the last segment of the packet
> - next_pkt_index: the index of the next packet in the same item group.
>      All packets in the same item group are chained by next_pkt_index.
>      With next_pkt_index, we can locate all packets in the same item
>      group one by one.
>
> To process an incoming packet needs three steps:
> a. check if the packet should be processed. Packets with one of the
>      following properties won't be processed:
> 	- FIN, SYN, RST, URG, PSH, ECE or CWR bit is set;
> 	- packet payload length is 0.
> b. traverse the key array to find a key which has the same criteria
>      value with the incoming packet. If find, goto step c. Otherwise,
>      insert a new key and insert the packet into the item array.
> c. locate the first packet in the item group via the start_index in the
>      key. Then traverse all packets in the item group via next_pkt_index.
>      If find one packet which can merge with the incoming one, merge them
>      together. If can't find, insert the packet into this item group.
>
> Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>

Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>

^ permalink raw reply	[flat|nested] 141+ messages in thread

* [PATCH v13 0/3] Support TCP/IPv4 GRO in DPDK
  2017-07-07 10:39                     ` [PATCH v12 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
                                         ` (2 preceding siblings ...)
  2017-07-07 10:39                       ` [PATCH v12 3/3] app/testpmd: enable TCP/IPv4 GRO Jiayu Hu
@ 2017-07-09  1:13                       ` Jiayu Hu
  2017-07-09  1:13                         ` [PATCH v13 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
                                           ` (3 more replies)
  3 siblings, 4 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-07-09  1:13 UTC (permalink / raw)
  To: dev
  Cc: jianfeng.tan, konstantin.ananyev, yliu, stephen, jingjing.wu,
	lei.a.yao, Jiayu Hu

Generic Receive Offload (GRO) is a widely used SW-based offloading
technique to reduce per-packet processing overhead. It gains performance
by reassembling small packets into large ones. Therefore, we propose to
support GRO in DPDK.

To enable more flexibility to applications, DPDK GRO is implemented as
a user library. Applications explicitly use the GRO library to merge
small packets into large ones. DPDK GRO provides two reassembly modes:
lightweight mode and heavyweight mode. If applications want to merge
packets in a simple way and the number of packets is small, they can
select the lightweight mode API. If applications need more fine-grained
controls, they can select the heavyweight mode API.

This patchset is to support TCP/IPv4 GRO in DPDK. The first patch is to
provide a GRO API framework. The second patch is to support TCP/IPv4 GRO.
The last patch is to enable TCP/IPv4 GRO in testpmd.

We perform many iperf tests to see the performance gains from DPDK GRO.
The test environment is:
a. two 25Gbps physical ports (p0 and p1) are linked together. Assign p0
	to one networking namespace and assign p1 to DPDK;
b. enable TSO for p0. Run iperf client on p0;
c. launch testpmd with p1 and a vhost-user port, and run it in csum
	forwarding mode. Select TCP HW checksum calculation for the
	vhost-user port in csum forwarding engine. And for better
	performance, we select IPv4 and TCP HW checksum calculation for p1
	too;
d. launch a VM with one CPU core and a virtio-net port. The VM OS is
	ubuntu 16.04 whose virtio-net driver supports GRO. Enables RX csum
	offloading and mrg_rxbuf for the VM. Iperf server runs in the VM;
e. to run iperf tests, we need to avoid the csum forwarding engine
	compulsorily changing packet mac addresses. So in our tests, we
	comment this code out (line 701 ~ line 704 in csumonly.c).

In each test, we run iperf with the following three configurations:
	- single flow and single TCP client thread 
	- multiple flows and single TCP client thread
	- single flow and parallel TCP client threads

We run above iperf tests on three scenarios:
	s1: disabling kernel GRO and enabling DPDK GRO
	s2: disabling kernel GRO and disabling DPDK GRO
	s3: enabling kernel GRO and disabling DPDK GRO
Comparing the throughput of s1 with s2, we can see the performance gains
from DPDK GRO. Comparing the throughput of s1 and s3, we can compare DPDK
GRO performance with kernel GRO performance.

Test results:
	- DPDK GRO throughput is almost 2 times the throughput with
		neither DPDK GRO nor kernel GRO;
	- DPDK GRO throughput is almost 1.2 times the throughput with
		kernel GRO.

Change log
==========
v13:
- optimize 'for' statement
- update MAINTAINER file
v12:
- remove max_timeout_cycles from struct rte_gro_param
- remove is_valid from struct gro_tcp4_key
- optimize key comparison function
- avoid updating IP ID for merged packets
- change GRO_TCP4_TBL_MAX_ITEM_NUM value to 1024*1024
- modify gro_tcp4_tbl_timeout_flush
- change rte_gro_get_count to rte_gro_get_pkt_count
- fix code aligment issue
- add EXPERIMENTAL mark to heavyweight APIs
v11:
- avoid converting big-endian to little-endian when compare key
- add sent_seq and ip_id to gro_tcp4_item to accelerate packet
	reassembly
- remove max_packet_size from rte_gro_param
- add inline functions to replace duplicated code
- change external API names and structure name
	(rte_gro_tbl_xxx -> rte_gro_ctx_xxx)
- fix coding style issues and order issue in rte_gro_version.map
- change inaccurate comments
- change internal files name from rte_gro_tcp4.x to gro_tcp4.x
v10:
- add support to merge '<seq=2, seq=1>' TCP/IPv4 packets
- check if IP ID is consecutive and update IP ID for merged packets
- check SYN, FIN, PSH, RST, URG flags
- use different reassembly table structures and APIs for TCP/IPv4 GRO
	and TCP/IPv6 GRO
- change file name from 'rte_gro_tcp.x' to 'rte_gro_tcp4.x'
v9:
- avoid defining variable size structure array and memset variable size
	in rte_gro_reassemble_burst
- change internal structure name from 'te_gro_tbl' to 'gro_tbl'
- delete useless variables in rte_gro_tcp.c
v8:
- merge rte_gro_flush and rte_gro_timeout_flush together and optimize
	flushing operation
- enable rte_gro_reassemble to process N inputted packets
- add rte_gro_tbl_item_num to get packet num in the GRO table
- add 'lastseg' to struct gro_tcp_item to get last segment faster
- add operations to handle rte_malloc failure
- use mbuf->l2_len/l3_len/l4_len instead of parsing header
- remove 'is_groed' and 'is_valid' in struct gro_tcp_item
- fix bugs in gro_tcp4_reassemble
- pass start-time as a parameter to avoid frequently calling rte_rdtsc 
- modify rte_gro_tbl_create prototype
- add 'RTE_' to external macros
- remove 'goto'
- remove inappropriate 'const'
- hide internal variables
v7:
- add a macro 'GRO_MAX_BURST_ITEM_NUM' to avoid stack overflow in
	rte_gro_reassemble_burst
- change macro name (_NB to _NUM)
- add '#ifdef __cplusplus ...' in rte_gro.h
v6:
- avoid checksum validation and calculation
- enable to process IP fragmented packets
- add a command in testpmd
- update documents
- modify rte_gro_timeout_flush and rte_gro_reassemble_burst
- rename variables
v5:
- fix some bugs
- fix coding style issues
v4:
- implement DPDK GRO as an application-used library
- introduce lightweight and heavyweight working modes to enable
	fine-grained controls to applications
- replace cuckoo hash tables with simpler table structure
v3:
- fix compilation issues.
v2:
- provide generic reassembly function;
- implement GRO as a device ability:
add APIs for devices to support GRO;
add APIs for applications to enable/disable GRO;
- update testpmd example. 

Jiayu Hu (3):
  lib: add Generic Receive Offload API framework
  lib/gro: add TCP/IPv4 GRO support
  app/testpmd: enable TCP/IPv4 GRO

 MAINTAINERS                                 |   4 +
 app/test-pmd/cmdline.c                      | 125 +++++++
 app/test-pmd/config.c                       |  36 ++
 app/test-pmd/csumonly.c                     |   5 +
 app/test-pmd/testpmd.c                      |   3 +
 app/test-pmd/testpmd.h                      |  10 +
 config/common_base                          |   5 +
 doc/guides/rel_notes/release_17_08.rst      |   7 +
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  34 ++
 lib/Makefile                                |   2 +
 lib/librte_gro/Makefile                     |  51 +++
 lib/librte_gro/gro_tcp4.c                   | 505 ++++++++++++++++++++++++++++
 lib/librte_gro/gro_tcp4.h                   | 210 ++++++++++++
 lib/librte_gro/rte_gro.c                    | 278 +++++++++++++++
 lib/librte_gro/rte_gro.h                    | 211 ++++++++++++
 lib/librte_gro/rte_gro_version.map          |  12 +
 mk/rte.app.mk                               |   1 +
 17 files changed, 1499 insertions(+)
 create mode 100644 lib/librte_gro/Makefile
 create mode 100644 lib/librte_gro/gro_tcp4.c
 create mode 100644 lib/librte_gro/gro_tcp4.h
 create mode 100644 lib/librte_gro/rte_gro.c
 create mode 100644 lib/librte_gro/rte_gro.h
 create mode 100644 lib/librte_gro/rte_gro_version.map

-- 
2.7.4

^ permalink raw reply	[flat|nested] 141+ messages in thread

* [PATCH v13 1/3] lib: add Generic Receive Offload API framework
  2017-07-09  1:13                       ` [PATCH v13 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
@ 2017-07-09  1:13                         ` Jiayu Hu
  2017-07-09  1:13                         ` [PATCH v13 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
                                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-07-09  1:13 UTC (permalink / raw)
  To: dev
  Cc: jianfeng.tan, konstantin.ananyev, yliu, stephen, jingjing.wu,
	lei.a.yao, Jiayu Hu

Generic Receive Offload (GRO) is a widely used SW-based offloading
technique to reduce per-packet processing overhead. It gains
performance by reassembling small packets into large ones. This
patchset is to support GRO in DPDK. To support GRO, this patch
implements a GRO API framework.

To enable more flexibility to applications, DPDK GRO is implemented as
a user library. Applications explicitly use the GRO library to merge
small packets into large ones. DPDK GRO provides two reassembly modes.
One is called lightweight mode, the other is called heavyweight mode.
If applications want to merge packets in a simple way and the number
of packets is relatively small, they can use the lightweight mode.
If applications need more fine-grained controls, they can choose the
heavyweight mode.

rte_gro_reassemble_burst is the main reassembly API which is used in
lightweight mode and processes N packets at a time. For applications,
performing GRO in lightweight mode is simple. They just need to invoke
rte_gro_reassemble_burst. Applications can get GROed packets as soon as
rte_gro_reassemble_burst returns.

rte_gro_reassemble is the main reassembly API which is used in
heavyweight mode and tries to merge N inputted packets with the packets
in GRO reassembly tables. For applications, performing GRO in heavyweight
mode is relatively complicated. Before performing GRO, applications need
to create a GRO context object, which keeps reassembly tables of
desired GRO types, by rte_gro_ctx_create. Then applications can use
rte_gro_reassemble to merge packets. The GROed packets are in the
reassembly tables of the GRO context object. If applications want to get
them, applications need to manually flush them by flush API.
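
A corresponding heavyweight mode sketch, reusing param, pkts and nb_rx
from the sketch above (the one-millisecond timeout and the output array
size are again illustrative assumptions):

	struct rte_mbuf *out[32];
	uint16_t nb_unmerged, nb_out;
	void *ctx;

	param.socket_id = rte_socket_id();
	ctx = rte_gro_ctx_create(&param);
	/* a NULL return means allocation failed and must be handled */
	/* unmerged packets are moved to the front of pkts[] */
	nb_unmerged = rte_gro_reassemble(pkts, nb_rx, ctx);
	/* flush packets that have stayed in the tables for ~1ms */
	nb_out = rte_gro_timeout_flush(ctx, rte_get_tsc_hz() / 1000,
			RTE_GRO_TCP_IPV4, out, 32);
	rte_gro_ctx_destroy(ctx);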

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
 MAINTAINERS                        |   4 +
 config/common_base                 |   5 +
 lib/Makefile                       |   2 +
 lib/librte_gro/Makefile            |  50 +++++++++
 lib/librte_gro/rte_gro.c           | 169 ++++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro.h           | 208 +++++++++++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro_version.map |  12 +++
 mk/rte.app.mk                      |   1 +
 8 files changed, 451 insertions(+)
 create mode 100644 lib/librte_gro/Makefile
 create mode 100644 lib/librte_gro/rte_gro.c
 create mode 100644 lib/librte_gro/rte_gro.h
 create mode 100644 lib/librte_gro/rte_gro_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 8fb2132..d20fca5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -648,6 +648,10 @@ F: doc/guides/prog_guide/pdump_lib.rst
 F: app/pdump/
 F: doc/guides/tools/pdump.rst
 
+Generic receive offload
+M: Jiayu Hu <jiayu.hu@intel.com>
+F: lib/librte_gro/
+
 
 Packet Framework
 ----------------
diff --git a/config/common_base b/config/common_base
index bb1ba8b..0ac51a1 100644
--- a/config/common_base
+++ b/config/common_base
@@ -713,6 +713,11 @@ CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
 CONFIG_RTE_LIBRTE_PMD_VHOST=n
 
 #
+# Compile GRO library
+#
+CONFIG_RTE_LIBRTE_GRO=y
+
+#
 #Compile Xen domain0 support
 #
 CONFIG_RTE_LIBRTE_XEN_DOM0=n
diff --git a/lib/Makefile b/lib/Makefile
index aef584e..120fca6 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -106,6 +106,8 @@ DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
 DEPDIRS-librte_reorder := librte_eal librte_mempool librte_mbuf
 DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
 DEPDIRS-librte_pdump := librte_eal librte_mempool librte_mbuf librte_ether
+DIRS-$(CONFIG_RTE_LIBRTE_GRO) += librte_gro
+DEPDIRS-librte_gro := librte_eal librte_mbuf librte_ether librte_net
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
new file mode 100644
index 0000000..7e0f128
--- /dev/null
+++ b/lib/librte_gro/Makefile
@@ -0,0 +1,50 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2017 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_gro.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+
+EXPORT_MAP := rte_gro_version.map
+
+LIBABIVER := 1
+
+# source files
+SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
+
+# install this header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
new file mode 100644
index 0000000..fa6d7ce
--- /dev/null
+++ b/lib/librte_gro/rte_gro.c
@@ -0,0 +1,169 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+
+#include "rte_gro.h"
+
+typedef void *(*gro_tbl_create_fn)(uint16_t socket_id,
+		uint16_t max_flow_num,
+		uint16_t max_item_per_flow);
+typedef void (*gro_tbl_destroy_fn)(void *tbl);
+typedef uint32_t (*gro_tbl_pkt_count_fn)(void *tbl);
+
+static gro_tbl_create_fn tbl_create_fn[RTE_GRO_TYPE_MAX_NUM];
+static gro_tbl_destroy_fn tbl_destroy_fn[RTE_GRO_TYPE_MAX_NUM];
+static gro_tbl_pkt_count_fn tbl_pkt_count_fn[RTE_GRO_TYPE_MAX_NUM];
+
+/*
+ * GRO context structure, which is used to merge packets. It keeps
+ * one reassembly table per desired GRO type. Applications need to
+ * create GRO context objects before using rte_gro_reassemble to
+ * perform GRO.
+ */
+struct gro_ctx {
+	/* GRO types to perform */
+	uint64_t gro_types;
+	/* reassembly tables */
+	void *tbls[RTE_GRO_TYPE_MAX_NUM];
+};
+
+void *
+rte_gro_ctx_create(const struct rte_gro_param *param)
+{
+	struct gro_ctx *gro_ctx;
+	gro_tbl_create_fn create_tbl_fn;
+	uint64_t gro_type_flag = 0;
+	uint64_t gro_types = 0;
+	uint8_t i;
+
+	gro_ctx = rte_zmalloc_socket(__func__,
+			sizeof(struct gro_ctx),
+			RTE_CACHE_LINE_SIZE,
+			param->socket_id);
+	if (gro_ctx == NULL)
+		return NULL;
+
+	for (i = 0; i < RTE_GRO_TYPE_MAX_NUM; i++) {
+		gro_type_flag = 1ULL << i;
+		if ((param->gro_types & gro_type_flag) == 0)
+			continue;
+
+		create_tbl_fn = tbl_create_fn[i];
+		if (create_tbl_fn == NULL)
+			continue;
+
+		gro_ctx->tbls[i] = create_tbl_fn(param->socket_id,
+				param->max_flow_num,
+				param->max_item_per_flow);
+		if (gro_ctx->tbls[i] == NULL) {
+			/* destroy all created tables */
+			gro_ctx->gro_types = gro_types;
+			rte_gro_ctx_destroy(gro_ctx);
+			return NULL;
+		}
+		gro_types |= gro_type_flag;
+	}
+	gro_ctx->gro_types = param->gro_types;
+
+	return gro_ctx;
+}
+
+void
+rte_gro_ctx_destroy(void *ctx)
+{
+	gro_tbl_destroy_fn destroy_tbl_fn;
+	struct gro_ctx *gro_ctx = ctx;
+	uint64_t gro_type_flag;
+	uint8_t i;
+
+	if (gro_ctx == NULL)
+		return;
+	for (i = 0; i < RTE_GRO_TYPE_MAX_NUM; i++) {
+		gro_type_flag = 1ULL << i;
+		if ((gro_ctx->gro_types & gro_type_flag) == 0)
+			continue;
+		destroy_tbl_fn = tbl_destroy_fn[i];
+		if (destroy_tbl_fn)
+			destroy_tbl_fn(gro_ctx->tbls[i]);
+	}
+	rte_free(gro_ctx);
+}
+
+uint16_t
+rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
+		uint16_t nb_pkts,
+		const struct rte_gro_param *param __rte_unused)
+{
+	return nb_pkts;
+}
+
+uint16_t
+rte_gro_reassemble(struct rte_mbuf **pkts __rte_unused,
+		uint16_t nb_pkts,
+		void *ctx __rte_unused)
+{
+	return nb_pkts;
+}
+
+uint16_t
+rte_gro_timeout_flush(void *ctx __rte_unused,
+		uint64_t timeout_cycles __rte_unused,
+		uint64_t gro_types __rte_unused,
+		struct rte_mbuf **out __rte_unused,
+		uint16_t max_nb_out __rte_unused)
+{
+	return 0;
+}
+
+uint64_t
+rte_gro_get_pkt_count(void *ctx)
+{
+	struct gro_ctx *gro_ctx = ctx;
+	gro_tbl_pkt_count_fn pkt_count_fn;
+	uint64_t item_num = 0;
+	uint64_t gro_type_flag;
+	uint8_t i;
+
+	for (i = 0; i < RTE_GRO_TYPE_MAX_NUM; i++) {
+		gro_type_flag = 1ULL << i;
+		if ((gro_ctx->gro_types & gro_type_flag) == 0)
+			continue;
+
+		pkt_count_fn = tbl_pkt_count_fn[i];
+		if (pkt_count_fn == NULL)
+			continue;
+		item_num += pkt_count_fn(gro_ctx->tbls[i]);
+	}
+	return item_num;
+}
diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
new file mode 100644
index 0000000..53ddd15
--- /dev/null
+++ b/lib/librte_gro/rte_gro.h
@@ -0,0 +1,208 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_GRO_H_
+#define _RTE_GRO_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**< the max number of packets that rte_gro_reassemble_burst()
+ * can process in one invocation.
+ */
+#define RTE_GRO_MAX_BURST_ITEM_NUM 128U
+
+/**< max number of supported GRO types */
+#define RTE_GRO_TYPE_MAX_NUM 64
+/**< number of currently supported GRO types */
+#define RTE_GRO_TYPE_SUPPORT_NUM 0
+
+
+struct rte_gro_param {
+	/**< desired GRO types */
+	uint64_t gro_types;
+	/**< max flow number */
+	uint16_t max_flow_num;
+	/**< max packet number per flow */
+	uint16_t max_item_per_flow;
+	/**< socket index for allocating GRO related data structures,
+	 * like reassembly tables. When using rte_gro_reassemble_burst(),
+	 * applications don't need to set this value.
+	 */
+	uint16_t socket_id;
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function creates a GRO context object, which is used to merge
+ * packets in rte_gro_reassemble().
+ *
+ * @param param
+ *  the parameters needed to create a GRO context object.
+ *
+ * @return
+ *  on success, return a pointer to the created GRO context object.
+ *  Otherwise, return NULL.
+ */
+void *rte_gro_ctx_create(const struct rte_gro_param *param);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function destroys a GRO context object.
+ *
+ * @param ctx
+ *  a pointer to a GRO context object.
+ */
+void rte_gro_ctx_destroy(void *ctx);
+
+/**
+ * This is one of the main reassembly APIs, which merges a burst of
+ * packets at a time. It assumes that all input packets have correct
+ * checksums; applications must guarantee this, since the function
+ * doesn't verify them. Besides, it doesn't re-calculate checksums for
+ * merged packets. If input packets are IP fragmented, this function
+ * assumes they are complete (i.e. with the L4 header). After
+ * processing, it returns all GROed packets to applications immediately.
+ *
+ * @param pkts
+ *  a pointer array of the packets to reassemble. On return, it holds
+ *  the addresses of the GROed packets.
+ * @param nb_pkts
+ *  the number of packets to reassemble.
+ * @param param
+ *  the parameters telling rte_gro_reassemble_burst() which GRO rules
+ *  to apply.
+ *
+ * @return
+ *  the number of packets after GRO. If no packets are merged,
+ *  the returned value is nb_pkts.
+ */
+uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts,
+		uint16_t nb_pkts,
+		const struct rte_gro_param *param);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Reassembly function, which tries to merge input packets with the
+ * packets in the reassembly tables of a given GRO context. This
+ * function assumes all input packets have correct checksums, and it
+ * won't update checksums when two packets are merged. Besides, if
+ * input packets are IP fragmented, this function assumes they are
+ * complete packets (i.e. with the L4 header).
+ *
+ * Input packets that have no data or are of unsupported GRO types
+ * won't be processed and are returned to applications. Otherwise,
+ * the input packets are either merged or inserted into the table. If
+ * applications want to get the packets in the table, they need to
+ * call the flush API.
+ *
+ * @param pkts
+ *  the packets to reassemble. On return, it holds the unprocessed
+ *  packets (e.g. those without data or of unsupported GRO types).
+ * @param nb_pkts
+ *  the number of packets to reassemble.
+ * @param ctx
+ *  a pointer to a GRO context object.
+ *
+ * @return
+ *  the number of unprocessed packets (e.g. those without data or of
+ *  unsupported GRO types). If all packets are processed (merged or
+ *  inserted into the table), return 0.
+ */
+uint16_t rte_gro_reassemble(struct rte_mbuf **pkts,
+		uint16_t nb_pkts,
+		void *ctx);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function flushes timeout packets from the reassembly tables of
+ * the desired GRO types. The max number of flushed packets is the size
+ * of the array used to store the flushed packets.
+ *
+ * Besides, this function won't re-calculate checksums for merged
+ * packets in the tables. That is, the returned packets may have
+ * incorrect checksums.
+ *
+ * @param ctx
+ *  a pointer to a GRO context object.
+ * @param timeout_cycles
+ *  max TTL for packets in reassembly tables, measured in CPU cycles.
+ * @param gro_types
+ *  this function only flushes packets which belong to the GRO types
+ *  specified by gro_types.
+ * @param out
+ *  a pointer array that is used to keep flushed timeout packets.
+ * @param max_nb_out
+ *  the size of the out array. It's also the max number of timeout
+ *  packets that can be flushed.
+ *
+ * @return
+ *  the number of flushed packets. If no packets are flushed, return 0.
+ */
+uint16_t rte_gro_timeout_flush(void *ctx,
+		uint64_t timeout_cycles,
+		uint64_t gro_types,
+		struct rte_mbuf **out,
+		uint16_t max_nb_out);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function returns the number of packets in all reassembly tables
+ * of a given GRO context.
+ *
+ * @param ctx
+ *  a pointer to a GRO context object.
+ *
+ * @return
+ *  the number of packets in all reassembly tables.
+ */
+uint64_t rte_gro_get_pkt_count(void *ctx);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_GRO_H_ */
diff --git a/lib/librte_gro/rte_gro_version.map b/lib/librte_gro/rte_gro_version.map
new file mode 100644
index 0000000..a4e58f0
--- /dev/null
+++ b/lib/librte_gro/rte_gro_version.map
@@ -0,0 +1,12 @@
+DPDK_17.08 {
+	global:
+
+	rte_gro_ctx_create;
+	rte_gro_ctx_destroy;
+	rte_gro_get_pkt_count;
+	rte_gro_reassemble_burst;
+	rte_gro_reassemble;
+	rte_gro_timeout_flush;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index dbd3614..7e231b9 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -98,6 +98,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_RING)           += -lrte_ring
 _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
 _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER)        += -lrte_reorder
+_LDLIBS-$(CONFIG_RTE_LIBRTE_GRO)            += -lrte_gro
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 _LDLIBS-$(CONFIG_RTE_LIBRTE_KNI)            += -lrte_kni
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 141+ messages in thread

* [PATCH v13 2/3] lib/gro: add TCP/IPv4 GRO support
  2017-07-09  1:13                       ` [PATCH v13 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
  2017-07-09  1:13                         ` [PATCH v13 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
@ 2017-07-09  1:13                         ` Jiayu Hu
  2017-07-09  1:13                         ` [PATCH v13 3/3] app/testpmd: enable TCP/IPv4 GRO Jiayu Hu
  2017-07-09  5:46                         ` [PATCH v14 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
  3 siblings, 0 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-07-09  1:13 UTC (permalink / raw)
  To: dev
  Cc: jianfeng.tan, konstantin.ananyev, yliu, stephen, jingjing.wu,
	lei.a.yao, Jiayu Hu

In this patch, we introduce five APIs to support TCP/IPv4 GRO.
- gro_tcp4_reassemble: reassemble an input TCP/IPv4 packet.
- gro_tcp4_tbl_create: create a TCP/IPv4 reassembly table, which is used
    to merge packets.
- gro_tcp4_tbl_destroy: free memory space of a TCP/IPv4 reassembly table.
- gro_tcp4_tbl_pkt_count: return the number of packets in a TCP/IPv4
    reassembly table.
- gro_tcp4_tbl_timeout_flush: flush timeout packets from a TCP/IPv4
    reassembly table.

The TCP/IPv4 GRO API assumes all input packets have correct IPv4 and
TCP checksums, and it doesn't update IPv4 and TCP checksums for merged
packets. If input packets are IP fragmented, the API assumes they are
complete packets (i.e. with L4 headers).
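
For orientation, here is a minimal usage sketch of these five APIs. It
is an illustration only; the socket id, the table sizing, and the 'pkt'
and 'timeout_cycles' variables are assumptions, not part of the patch:

    struct gro_tcp4_tbl *tbl;
    struct rte_mbuf *flushed[32];
    uint16_t n;
    int32_t ret;

    /* table sized for 4 flows and 32 packets per flow, on socket 0 */
    tbl = gro_tcp4_tbl_create(0, 4, 32);

    /* try to merge one mbuf: ret > 0 means merged, ret == 0 means
     * inserted into the table, ret < 0 means not processed */
    ret = gro_tcp4_reassemble(pkt, tbl, rte_rdtsc());

    /* later, flush packets older than 'timeout_cycles' and clean up */
    n = gro_tcp4_tbl_timeout_flush(tbl, rte_rdtsc() - timeout_cycles,
            flushed, 32);
    gro_tcp4_tbl_destroy(tbl);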

In TCP/IPv4 GRO, we use a table structure, called the TCP/IPv4
reassembly table, to reassemble packets. A TCP/IPv4 reassembly table
includes a key array and an item array, where the key array keeps the
criteria to merge packets and the item array keeps packet information.

One key in the key array points to an item group, which consists of
packets that have the same criteria value. If two packets can be
merged, they must be in the same item group. Each key in the key array
includes two parts:
- criteria: the criteria for merging packets. If two packets can be
    merged, they must have the same criteria value.
- start_index: the index of the first incoming packet of the item group.

Each element in the item array keeps the information of one packet. It
mainly includes three parts:
- firstseg: the address of the first segment of the packet
- lastseg: the address of the last segment of the packet
- next_pkt_index: the index of the next packet in the same item group.
    All packets in the same item group are chained by next_pkt_index.
    With next_pkt_index, we can locate all packets in the same item
    group one by one.

Processing an incoming packet takes three steps:
a. check if the packet should be processed. Packets with one of the
    following properties won't be processed:
	- FIN, SYN, RST, URG, PSH, ECE or CWR bit is set;
	- packet payload length is 0.
b. traverse the key array to find a key whose criteria value matches
    the incoming packet. If one is found, go to step c. Otherwise,
    insert a new key and insert the packet into the item array.
c. locate the first packet in the item group via the start_index in the
    key, then traverse all packets in the item group via next_pkt_index.
    If a packet is found that can be merged with the incoming one, merge
    them together. Otherwise, insert the packet into this item group.
    A sketch of this flow is shown below.
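
Below is a simplified sketch of this per-packet flow. It is an
illustration only: lookup_key() and insert_new_key_and_item() are
hypothetical helpers, and the full logic (including table-full handling)
lives in gro_tcp4_reassemble() in gro_tcp4.c:

    /* step a: filter out packets that TCP/IPv4 GRO doesn't handle */
    if (tcp_hdr->tcp_flags != TCP_ACK_FLAG || tcp_dl == 0)
        return -1;

    /* step b: find the key with the same criteria value;
     * lookup_key() is a hypothetical helper */
    key_idx = lookup_key(tbl, &key);
    if (key_idx == INVALID_ARRAY_INDEX)
        return insert_new_key_and_item(tbl, pkt); /* hypothetical */

    /* step c: walk the item group chained by next_pkt_idx */
    for (cur = tbl->keys[key_idx].start_index;
            cur != INVALID_ARRAY_INDEX;
            cur = tbl->items[cur].next_pkt_idx) {
        cmp = check_seq_option(&tbl->items[cur], tcp_hdr,
                pkt->l4_len, tcp_dl, ip_id, sent_seq);
        if (cmp && merge_two_tcp4_packets(&tbl->items[cur], pkt,
                    ip_id, sent_seq, cmp))
            return 1;
        prev_idx = cur;
    }
    /* no mergeable neighbor found: insert into this item group */
    insert_new_item(tbl, pkt, ip_id, sent_seq, prev_idx, start_time);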

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
 doc/guides/rel_notes/release_17_08.rst |   7 +
 lib/librte_gro/Makefile                |   1 +
 lib/librte_gro/gro_tcp4.c              | 505 +++++++++++++++++++++++++++++++++
 lib/librte_gro/gro_tcp4.h              | 210 ++++++++++++++
 lib/librte_gro/rte_gro.c               | 137 ++++++++-
 lib/librte_gro/rte_gro.h               |   5 +-
 6 files changed, 850 insertions(+), 15 deletions(-)
 create mode 100644 lib/librte_gro/gro_tcp4.c
 create mode 100644 lib/librte_gro/gro_tcp4.h

diff --git a/doc/guides/rel_notes/release_17_08.rst b/doc/guides/rel_notes/release_17_08.rst
index a2da17f..1736a3e 100644
--- a/doc/guides/rel_notes/release_17_08.rst
+++ b/doc/guides/rel_notes/release_17_08.rst
@@ -161,6 +161,13 @@ New Features
   to verify functionality and measure the performance parameters of DPDK
   eventdev devices.
 
+* **Add Generic Receive Offload API support.**
+
+  Generic Receive Offload (GRO) API supports to reassemble TCP/IPv4
+  packets. GRO API assumes all inputted packets are with correct
+  checksums. GRO API doesn't update checksums for merged packets. If
+  inputted packets are IP fragmented, GRO API assumes they are complete
+  packets (i.e. with L4 headers).
 
 Resolved Issues
 ---------------
diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
index 7e0f128..747eeec 100644
--- a/lib/librte_gro/Makefile
+++ b/lib/librte_gro/Makefile
@@ -43,6 +43,7 @@ LIBABIVER := 1
 
 # source files
 SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
+SRCS-$(CONFIG_RTE_LIBRTE_GRO) += gro_tcp4.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
diff --git a/lib/librte_gro/gro_tcp4.c b/lib/librte_gro/gro_tcp4.c
new file mode 100644
index 0000000..61a0423
--- /dev/null
+++ b/lib/librte_gro/gro_tcp4.c
@@ -0,0 +1,505 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+#include <rte_cycles.h>
+#include <rte_ethdev.h>
+#include <rte_ip.h>
+#include <rte_tcp.h>
+
+#include "gro_tcp4.h"
+
+void *
+gro_tcp4_tbl_create(uint16_t socket_id,
+		uint16_t max_flow_num,
+		uint16_t max_item_per_flow)
+{
+	struct gro_tcp4_tbl *tbl;
+	size_t size;
+	uint32_t entries_num, i;
+
+	entries_num = max_flow_num * max_item_per_flow;
+	entries_num = RTE_MIN(entries_num, GRO_TCP4_TBL_MAX_ITEM_NUM);
+
+	if (entries_num == 0)
+		return NULL;
+
+	tbl = rte_zmalloc_socket(__func__,
+			sizeof(struct gro_tcp4_tbl),
+			RTE_CACHE_LINE_SIZE,
+			socket_id);
+	if (tbl == NULL)
+		return NULL;
+
+	size = sizeof(struct gro_tcp4_item) * entries_num;
+	tbl->items = rte_zmalloc_socket(__func__,
+			size,
+			RTE_CACHE_LINE_SIZE,
+			socket_id);
+	if (tbl->items == NULL) {
+		rte_free(tbl);
+		return NULL;
+	}
+	tbl->max_item_num = entries_num;
+
+	size = sizeof(struct gro_tcp4_key) * entries_num;
+	tbl->keys = rte_zmalloc_socket(__func__,
+			size,
+			RTE_CACHE_LINE_SIZE,
+			socket_id);
+	if (tbl->keys == NULL) {
+		rte_free(tbl->items);
+		rte_free(tbl);
+		return NULL;
+	}
+	/* INVALID_ARRAY_INDEX indicates empty key */
+	for (i = 0; i < entries_num; i++)
+		tbl->keys[i].start_index = INVALID_ARRAY_INDEX;
+	tbl->max_key_num = entries_num;
+
+	return tbl;
+}
+
+void
+gro_tcp4_tbl_destroy(void *tbl)
+{
+	struct gro_tcp4_tbl *tcp_tbl = tbl;
+
+	if (tcp_tbl) {
+		rte_free(tcp_tbl->items);
+		rte_free(tcp_tbl->keys);
+	}
+	rte_free(tcp_tbl);
+}
+
+/*
+ * merge two TCP/IPv4 packets without updating checksums.
+ * If cmp is larger than 0, append the new packet to the
+ * original packet. Otherwise, prepend the new packet to
+ * the original packet.
+ */
+static inline int
+merge_two_tcp4_packets(struct gro_tcp4_item *item_src,
+		struct rte_mbuf *pkt,
+		uint16_t ip_id,
+		uint32_t sent_seq,
+		int cmp)
+{
+	struct rte_mbuf *pkt_head, *pkt_tail, *lastseg;
+	uint16_t tcp_datalen;
+
+	if (cmp > 0) {
+		pkt_head = item_src->firstseg;
+		pkt_tail = pkt;
+	} else {
+		pkt_head = pkt;
+		pkt_tail = item_src->firstseg;
+	}
+
+	/* check if the packet length will be beyond the max value */
+	tcp_datalen = pkt_tail->pkt_len - pkt_tail->l2_len -
+		pkt_tail->l3_len - pkt_tail->l4_len;
+	if (pkt_head->pkt_len - pkt_head->l2_len + tcp_datalen >
+			TCP4_MAX_L3_LENGTH)
+		return 0;
+
+	/* remove packet header for the tail packet */
+	rte_pktmbuf_adj(pkt_tail,
+			pkt_tail->l2_len +
+			pkt_tail->l3_len +
+			pkt_tail->l4_len);
+
+	/* chain two packets together */
+	if (cmp > 0) {
+		item_src->lastseg->next = pkt;
+		item_src->lastseg = rte_pktmbuf_lastseg(pkt);
+		/* update IP ID to the larger value */
+		item_src->ip_id = ip_id;
+	} else {
+		lastseg = rte_pktmbuf_lastseg(pkt);
+		lastseg->next = item_src->firstseg;
+		item_src->firstseg = pkt;
+		/* update sent_seq to the smaller value */
+		item_src->sent_seq = sent_seq;
+	}
+	item_src->nb_merged++;
+
+	/* update mbuf metadata for the merged packet */
+	pkt_head->nb_segs += pkt_tail->nb_segs;
+	pkt_head->pkt_len += pkt_tail->pkt_len;
+
+	return 1;
+}
+
+static inline int
+check_seq_option(struct gro_tcp4_item *item,
+		struct tcp_hdr *tcp_hdr,
+		uint16_t tcp_hl,
+		uint16_t tcp_dl,
+		uint16_t ip_id,
+		uint32_t sent_seq)
+{
+	struct rte_mbuf *pkt0 = item->firstseg;
+	struct ipv4_hdr *ipv4_hdr0;
+	struct tcp_hdr *tcp_hdr0;
+	uint16_t tcp_hl0, tcp_dl0;
+	uint16_t len;
+
+	ipv4_hdr0 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt0, char *) +
+			pkt0->l2_len);
+	tcp_hdr0 = (struct tcp_hdr *)((char *)ipv4_hdr0 + pkt0->l3_len);
+	tcp_hl0 = pkt0->l4_len;
+
+	/* check if TCP option fields equal. If not, return 0. */
+	len = RTE_MAX(tcp_hl, tcp_hl0) - sizeof(struct tcp_hdr);
+	if ((tcp_hl != tcp_hl0) ||
+			((len > 0) && (memcmp(tcp_hdr + 1,
+					tcp_hdr0 + 1,
+					len) != 0)))
+		return 0;
+
+	/* check if the two packets are neighbors */
+	tcp_dl0 = pkt0->pkt_len - pkt0->l2_len - pkt0->l3_len - tcp_hl0;
+	if ((sent_seq == (item->sent_seq + tcp_dl0)) &&
+			(ip_id == (item->ip_id + 1)))
+		/* append the new packet */
+		return 1;
+	else if (((sent_seq + tcp_dl) == item->sent_seq) &&
+			((ip_id + item->nb_merged) == item->ip_id))
+		/* prepend the new packet */
+		return -1;
+	else
+		return 0;
+}
+
+static inline uint32_t
+find_an_empty_item(struct gro_tcp4_tbl *tbl)
+{
+	uint32_t i;
+	uint32_t max_item_num = tbl->max_item_num;
+
+	for (i = 0; i < max_item_num; i++)
+		if (tbl->items[i].firstseg == NULL)
+			return i;
+	return INVALID_ARRAY_INDEX;
+}
+
+static inline uint32_t
+find_an_empty_key(struct gro_tcp4_tbl *tbl)
+{
+	uint32_t i;
+	uint32_t max_key_num = tbl->max_key_num;
+
+	for (i = 0; i < max_key_num; i++)
+		if (tbl->keys[i].start_index == INVALID_ARRAY_INDEX)
+			return i;
+	return INVALID_ARRAY_INDEX;
+}
+
+static inline uint32_t
+insert_new_item(struct gro_tcp4_tbl *tbl,
+		struct rte_mbuf *pkt,
+		uint16_t ip_id,
+		uint32_t sent_seq,
+		uint32_t prev_idx,
+		uint64_t start_time)
+{
+	uint32_t item_idx;
+
+	item_idx = find_an_empty_item(tbl);
+	if (item_idx == INVALID_ARRAY_INDEX)
+		return INVALID_ARRAY_INDEX;
+
+	tbl->items[item_idx].firstseg = pkt;
+	tbl->items[item_idx].lastseg = rte_pktmbuf_lastseg(pkt);
+	tbl->items[item_idx].start_time = start_time;
+	tbl->items[item_idx].next_pkt_idx = INVALID_ARRAY_INDEX;
+	tbl->items[item_idx].sent_seq = sent_seq;
+	tbl->items[item_idx].ip_id = ip_id;
+	tbl->items[item_idx].nb_merged = 1;
+	tbl->item_num++;
+
+	/* if the previous packet exists, chain the new one with it */
+	if (prev_idx != INVALID_ARRAY_INDEX) {
+		tbl->items[item_idx].next_pkt_idx =
+			tbl->items[prev_idx].next_pkt_idx;
+		tbl->items[prev_idx].next_pkt_idx = item_idx;
+	}
+
+	return item_idx;
+}
+
+static inline uint32_t
+delete_item(struct gro_tcp4_tbl *tbl, uint32_t item_idx,
+		uint32_t prev_item_idx)
+{
+	uint32_t next_idx = tbl->items[item_idx].next_pkt_idx;
+
+	/* set NULL to firstseg to indicate it's an empty item */
+	tbl->items[item_idx].firstseg = NULL;
+	tbl->item_num--;
+	if (prev_item_idx != INVALID_ARRAY_INDEX)
+		tbl->items[prev_item_idx].next_pkt_idx = next_idx;
+
+	return next_idx;
+}
+
+static inline uint32_t
+insert_new_key(struct gro_tcp4_tbl *tbl,
+		struct tcp4_key *key_src,
+		uint32_t item_idx)
+{
+	struct tcp4_key *key_dst;
+	uint32_t key_idx;
+
+	key_idx = find_an_empty_key(tbl);
+	if (key_idx == INVALID_ARRAY_INDEX)
+		return INVALID_ARRAY_INDEX;
+
+	key_dst = &(tbl->keys[key_idx].key);
+
+	ether_addr_copy(&(key_src->eth_saddr), &(key_dst->eth_saddr));
+	ether_addr_copy(&(key_src->eth_daddr), &(key_dst->eth_daddr));
+	key_dst->ip_src_addr = key_src->ip_src_addr;
+	key_dst->ip_dst_addr = key_src->ip_dst_addr;
+	key_dst->recv_ack = key_src->recv_ack;
+	key_dst->src_port = key_src->src_port;
+	key_dst->dst_port = key_src->dst_port;
+
+	/* non-INVALID_ARRAY_INDEX value indicates this key is valid */
+	tbl->keys[key_idx].start_index = item_idx;
+	tbl->key_num++;
+
+	return key_idx;
+}
+
+static inline int
+is_same_key(struct tcp4_key k1, struct tcp4_key k2)
+{
+	if (is_same_ether_addr(&k1.eth_saddr, &k2.eth_saddr) == 0)
+		return 0;
+
+	if (is_same_ether_addr(&k1.eth_daddr, &k2.eth_daddr) == 0)
+		return 0;
+
+	return ((k1.ip_src_addr == k2.ip_src_addr) &&
+			(k1.ip_dst_addr == k2.ip_dst_addr) &&
+			(k1.recv_ack == k2.recv_ack) &&
+			(k1.src_port == k2.src_port) &&
+			(k1.dst_port == k2.dst_port));
+}
+
+/*
+ * update packet length for the flushed packet.
+ */
+static inline void
+update_header(struct gro_tcp4_item *item)
+{
+	struct ipv4_hdr *ipv4_hdr;
+	struct rte_mbuf *pkt = item->firstseg;
+
+	ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
+			pkt->l2_len);
+	ipv4_hdr->total_length = rte_cpu_to_be_16(pkt->pkt_len -
+			pkt->l2_len);
+}
+
+int32_t
+gro_tcp4_reassemble(struct rte_mbuf *pkt,
+		struct gro_tcp4_tbl *tbl,
+		uint64_t start_time)
+{
+	struct ether_hdr *eth_hdr;
+	struct ipv4_hdr *ipv4_hdr;
+	struct tcp_hdr *tcp_hdr;
+	uint32_t sent_seq;
+	uint16_t tcp_dl, ip_id;
+
+	struct tcp4_key key;
+	uint32_t cur_idx, prev_idx, item_idx;
+	uint32_t i, max_key_num;
+	int cmp;
+
+	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
+	ipv4_hdr = (struct ipv4_hdr *)((char *)eth_hdr + pkt->l2_len);
+	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + pkt->l3_len);
+
+	/*
+	 * if FIN, SYN, RST, PSH, URG, ECE or
+	 * CWR is set, return immediately.
+	 */
+	if (tcp_hdr->tcp_flags != TCP_ACK_FLAG)
+		return -1;
+	/* if payload length is 0, return immediately */
+	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - pkt->l3_len -
+		pkt->l4_len;
+	if (tcp_dl == 0)
+		return -1;
+
+	ip_id = rte_be_to_cpu_16(ipv4_hdr->packet_id);
+	sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq);
+
+	ether_addr_copy(&(eth_hdr->s_addr), &(key.eth_saddr));
+	ether_addr_copy(&(eth_hdr->d_addr), &(key.eth_daddr));
+	key.ip_src_addr = ipv4_hdr->src_addr;
+	key.ip_dst_addr = ipv4_hdr->dst_addr;
+	key.src_port = tcp_hdr->src_port;
+	key.dst_port = tcp_hdr->dst_port;
+	key.recv_ack = tcp_hdr->recv_ack;
+
+	/* search for a key */
+	max_key_num = tbl->max_key_num;
+	for (i = 0; i < max_key_num; i++) {
+		if ((tbl->keys[i].start_index != INVALID_ARRAY_INDEX) &&
+				is_same_key(tbl->keys[i].key, key))
+			break;
+	}
+
+	/* can't find a key, so insert a new key and a new item. */
+	if (i == tbl->max_key_num) {
+		item_idx = insert_new_item(tbl, pkt, ip_id, sent_seq,
+				INVALID_ARRAY_INDEX, start_time);
+		if (item_idx == INVALID_ARRAY_INDEX)
+			return -1;
+		if (insert_new_key(tbl, &key, item_idx) ==
+				INVALID_ARRAY_INDEX) {
+			/*
+			 * fail to insert a new key, so
+			 * delete the inserted item
+			 */
+			delete_item(tbl, item_idx, INVALID_ARRAY_INDEX);
+			return -1;
+		}
+		return 0;
+	}
+
+	/* traverse all packets in the item group to find one to merge */
+	cur_idx = tbl->keys[i].start_index;
+	prev_idx = cur_idx;
+	do {
+		cmp = check_seq_option(&(tbl->items[cur_idx]), tcp_hdr,
+				pkt->l4_len, tcp_dl, ip_id, sent_seq);
+		if (cmp) {
+			if (merge_two_tcp4_packets(&(tbl->items[cur_idx]),
+						pkt, ip_id,
+						sent_seq, cmp))
+				return 1;
+			/*
+			 * fail to merge two packets since the packet
+			 * length will be greater than the max value.
+			 * So insert the packet into the item group.
+			 */
+			if (insert_new_item(tbl, pkt, ip_id, sent_seq,
+						prev_idx, start_time) ==
+					INVALID_ARRAY_INDEX)
+				return -1;
+			return 0;
+		}
+		prev_idx = cur_idx;
+		cur_idx = tbl->items[cur_idx].next_pkt_idx;
+	} while (cur_idx != INVALID_ARRAY_INDEX);
+
+	/*
+	 * can't find a packet in the item group to merge,
+	 * so insert the packet into the item group.
+	 */
+	if (insert_new_item(tbl, pkt, ip_id, sent_seq, prev_idx,
+				start_time) == INVALID_ARRAY_INDEX)
+		return -1;
+
+	return 0;
+}
+
+uint16_t
+gro_tcp4_tbl_timeout_flush(struct gro_tcp4_tbl *tbl,
+		uint64_t flush_timestamp,
+		struct rte_mbuf **out,
+		uint16_t nb_out)
+{
+	uint16_t k = 0;
+	uint32_t i, j;
+	uint32_t max_key_num = tbl->max_key_num;
+
+	for (i = 0; i < max_key_num; i++) {
+		/* all keys have been checked, return immediately */
+		if (tbl->key_num == 0)
+			return k;
+
+		j = tbl->keys[i].start_index;
+		while (j != INVALID_ARRAY_INDEX) {
+			if (tbl->items[j].start_time <= flush_timestamp) {
+				out[k++] = tbl->items[j].firstseg;
+				if (tbl->items[j].nb_merged > 1)
+					update_header(&(tbl->items[j]));
+				/*
+				 * delete the item and get
+				 * the next packet index
+				 */
+				j = delete_item(tbl, j,
+						INVALID_ARRAY_INDEX);
+
+				/*
+				 * delete the key as all of
+				 * packets are flushed
+				 */
+				if (j == INVALID_ARRAY_INDEX) {
+					tbl->keys[i].start_index =
+						INVALID_ARRAY_INDEX;
+					tbl->key_num--;
+				} else
+					/* update start_index of the key */
+					tbl->keys[i].start_index = j;
+
+				if (k == nb_out)
+					return k;
+			} else
+				/*
+				 * left packets of this key won't be
+				 * timeout, so go to check other keys.
+				 */
+				break;
+		}
+	}
+	return k;
+}
+
+uint32_t
+gro_tcp4_tbl_pkt_count(void *tbl)
+{
+	struct gro_tcp4_tbl *gro_tbl = tbl;
+
+	if (gro_tbl)
+		return gro_tbl->item_num;
+
+	return 0;
+}
diff --git a/lib/librte_gro/gro_tcp4.h b/lib/librte_gro/gro_tcp4.h
new file mode 100644
index 0000000..f41dcee
--- /dev/null
+++ b/lib/librte_gro/gro_tcp4.h
@@ -0,0 +1,210 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _GRO_TCP4_H_
+#define _GRO_TCP4_H_
+
+#define INVALID_ARRAY_INDEX 0xffffffffUL
+#define GRO_TCP4_TBL_MAX_ITEM_NUM (1024UL * 1024UL)
+
+/*
+ * the max L3 length of a TCP/IPv4 packet. The L3 length
+ * is the sum of ipv4 header, tcp header and L4 payload.
+ */
+#define TCP4_MAX_L3_LENGTH UINT16_MAX
+
+/* criteria for merging packets */
+struct tcp4_key {
+	struct ether_addr eth_saddr;
+	struct ether_addr eth_daddr;
+	uint32_t ip_src_addr;
+	uint32_t ip_dst_addr;
+
+	uint32_t recv_ack;
+	uint16_t src_port;
+	uint16_t dst_port;
+};
+
+struct gro_tcp4_key {
+	struct tcp4_key key;
+	/*
+	 * the index of the first packet in the item group.
+	 * If the value is INVALID_ARRAY_INDEX, it means
+	 * the key is empty.
+	 */
+	uint32_t start_index;
+};
+
+struct gro_tcp4_item {
+	/*
+	 * first segment of the packet. If the value
+	 * is NULL, it means the item is empty.
+	 */
+	struct rte_mbuf *firstseg;
+	/* last segment of the packet */
+	struct rte_mbuf *lastseg;
+	/*
+	 * the time when the first packet is inserted
+	 * into the table. If a packet in the table is
+	 * merged with an incoming packet, this value
+	 * won't be updated. We set this value only
+	 * when the first packet is inserted into the
+	 * table.
+	 */
+	uint64_t start_time;
+	/*
+	 * we use next_pkt_idx to chain the packets that
+	 * have same key value but can't be merged together.
+	 */
+	uint32_t next_pkt_idx;
+	/* the sequence number of the packet */
+	uint32_t sent_seq;
+	/* the IP ID of the packet */
+	uint16_t ip_id;
+	/* the number of merged packets */
+	uint16_t nb_merged;
+};
+
+/*
+ * TCP/IPv4 reassembly table structure.
+ */
+struct gro_tcp4_tbl {
+	/* item array */
+	struct gro_tcp4_item *items;
+	/* key array */
+	struct gro_tcp4_key *keys;
+	/* current item number */
+	uint32_t item_num;
+	/* current key num */
+	uint32_t key_num;
+	/* item array size */
+	uint32_t max_item_num;
+	/* key array size */
+	uint32_t max_key_num;
+};
+
+/**
+ * This function creates a TCP/IPv4 reassembly table.
+ *
+ * @param socket_id
+ *  socket index for allocating the TCP/IPv4 reassembly table
+ * @param max_flow_num
+ *  the maximum number of flows in the TCP/IPv4 GRO table
+ * @param max_item_per_flow
+ *  the maximum packet number per flow.
+ *
+ * @return
+ *  on success, return a pointer to the created TCP/IPv4 reassembly
+ *  table. Otherwise, return NULL.
+ */
+void *gro_tcp4_tbl_create(uint16_t socket_id,
+		uint16_t max_flow_num,
+		uint16_t max_item_per_flow);
+
+/**
+ * This function destroys a TCP/IPv4 reassembly table.
+ *
+ * @param tbl
+ *  a pointer to the TCP/IPv4 reassembly table.
+ */
+void gro_tcp4_tbl_destroy(void *tbl);
+
+/**
+ * This function searches the TCP/IPv4 reassembly table for a packet
+ * to merge with the input one. Merging two packets means chaining
+ * them together and updating the packet headers. Packets whose SYN,
+ * FIN, RST, PSH, CWR, ECE or URG bit is set are returned immediately.
+ * Packets which only have packet headers (i.e. without data) are also
+ * returned immediately. Otherwise, the packet is either merged or
+ * inserted into the table. Besides, if there is no available space to
+ * insert the packet, this function also returns immediately.
+ *
+ * This function assumes the input packet has correct IPv4 and TCP
+ * checksums, and it won't re-calculate them when two packets are
+ * merged. Besides, if the input packet is IP fragmented, it assumes
+ * the packet is complete (i.e. with the TCP header).
+ *
+ * @param pkt
+ *  packet to reassemble.
+ * @param tbl
+ *  a pointer that points to a TCP/IPv4 reassembly table.
+ * @param start_time
+ *  the time when the packet is inserted into the table
+ *
+ * @return
+ *  if the packet doesn't have data, or its SYN, FIN, RST, PSH, CWR,
+ *  ECE or URG bit is set, or there is no available space in the table
+ *  to insert a new item or a new key, return a negative value. If the
+ *  packet is merged successfully, return a positive value. If the
+ *  packet is inserted into the table, return 0.
+ */
+int32_t gro_tcp4_reassemble(struct rte_mbuf *pkt,
+		struct gro_tcp4_tbl *tbl,
+		uint64_t start_time);
+
+/**
+ * This function flushes timeout packets in a TCP/IPv4 reassembly table
+ * to applications, without updating checksums for merged packets.
+ * The max number of flushed packets is the size of the array used to
+ * store the flushed packets.
+ *
+ * @param tbl
+ *  a pointer to a TCP/IPv4 reassembly table.
+ * @param flush_timestamp
+ *  this function flushes packets which are inserted into the table
+ *  before or at the flush_timestamp.
+ * @param out
+ *  pointer array which is used to keep flushed packets.
+ * @param nb_out
+ *  the size of the out array. It's also the max number of timeout
+ *  packets that can be flushed.
+ *
+ * @return
+ *  the number of packets that are returned.
+ */
+uint16_t gro_tcp4_tbl_timeout_flush(struct gro_tcp4_tbl *tbl,
+		uint64_t flush_timestamp,
+		struct rte_mbuf **out,
+		uint16_t nb_out);
+
+/**
+ * This function returns the number of the packets in a TCP/IPv4
+ * reassembly table.
+ *
+ * @param tbl
+ *  a pointer to a TCP/IPv4 reassembly table.
+ *
+ * @return
+ *  the number of packets in the table
+ */
+uint32_t gro_tcp4_tbl_pkt_count(void *tbl);
+#endif
diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
index fa6d7ce..4624485 100644
--- a/lib/librte_gro/rte_gro.c
+++ b/lib/librte_gro/rte_gro.c
@@ -32,8 +32,11 @@
 
 #include <rte_malloc.h>
 #include <rte_mbuf.h>
+#include <rte_cycles.h>
+#include <rte_ethdev.h>
 
 #include "rte_gro.h"
+#include "gro_tcp4.h"
 
 typedef void *(*gro_tbl_create_fn)(uint16_t socket_id,
 		uint16_t max_flow_num,
@@ -41,9 +44,12 @@ typedef void *(*gro_tbl_create_fn)(uint16_t socket_id,
 typedef void (*gro_tbl_destroy_fn)(void *tbl);
 typedef uint32_t (*gro_tbl_pkt_count_fn)(void *tbl);
 
-static gro_tbl_create_fn tbl_create_fn[RTE_GRO_TYPE_MAX_NUM];
-static gro_tbl_destroy_fn tbl_destroy_fn[RTE_GRO_TYPE_MAX_NUM];
-static gro_tbl_pkt_count_fn tbl_pkt_count_fn[RTE_GRO_TYPE_MAX_NUM];
+static gro_tbl_create_fn tbl_create_fn[RTE_GRO_TYPE_MAX_NUM] = {
+		gro_tcp4_tbl_create, NULL};
+static gro_tbl_destroy_fn tbl_destroy_fn[RTE_GRO_TYPE_MAX_NUM] = {
+			gro_tcp4_tbl_destroy, NULL};
+static gro_tbl_pkt_count_fn tbl_pkt_count_fn[RTE_GRO_TYPE_MAX_NUM] = {
+			gro_tcp4_tbl_pkt_count, NULL};
 
 /*
  * GRO context structure, which is used to merge packets. It keeps
@@ -121,28 +127,131 @@ rte_gro_ctx_destroy(void *ctx)
 }
 
 uint16_t
-rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
+rte_gro_reassemble_burst(struct rte_mbuf **pkts,
 		uint16_t nb_pkts,
-		const struct rte_gro_param *param __rte_unused)
+		const struct rte_gro_param *param)
 {
-	return nb_pkts;
+	uint16_t i;
+	uint16_t nb_after_gro = nb_pkts;
+	uint32_t item_num;
+
+	/* allocate a reassembly table for TCP/IPv4 GRO */
+	struct gro_tcp4_tbl tcp_tbl;
+	struct gro_tcp4_key tcp_keys[RTE_GRO_MAX_BURST_ITEM_NUM];
+	struct gro_tcp4_item tcp_items[RTE_GRO_MAX_BURST_ITEM_NUM] = {0};
+
+	struct rte_mbuf *unprocess_pkts[nb_pkts];
+	uint16_t unprocess_num = 0;
+	int32_t ret;
+	uint64_t current_time;
+
+	if ((param->gro_types & RTE_GRO_TCP_IPV4) == 0)
+		return nb_pkts;
+
+	/* get the actual number of packets */
+	item_num = RTE_MIN(nb_pkts, (param->max_flow_num *
+			param->max_item_per_flow));
+	item_num = RTE_MIN(item_num, RTE_GRO_MAX_BURST_ITEM_NUM);
+
+	for (i = 0; i < item_num; i++)
+		tcp_keys[i].start_index = INVALID_ARRAY_INDEX;
+
+	tcp_tbl.keys = tcp_keys;
+	tcp_tbl.items = tcp_items;
+	tcp_tbl.key_num = 0;
+	tcp_tbl.item_num = 0;
+	tcp_tbl.max_key_num = item_num;
+	tcp_tbl.max_item_num = item_num;
+
+	current_time = rte_rdtsc();
+
+	for (i = 0; i < nb_pkts; i++) {
+		if ((pkts[i]->packet_type & (RTE_PTYPE_L3_IPV4 |
+					RTE_PTYPE_L4_TCP)) ==
+				(RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_TCP)) {
+			ret = gro_tcp4_reassemble(pkts[i],
+					&tcp_tbl,
+					current_time);
+			if (ret > 0)
+				/* merge successfully */
+				nb_after_gro--;
+			else if (ret < 0) {
+				unprocess_pkts[unprocess_num++] =
+					pkts[i];
+			}
+		} else
+			unprocess_pkts[unprocess_num++] = pkts[i];
+	}
+
+	/* re-arrange GROed packets */
+	if (nb_after_gro < nb_pkts) {
+		i = gro_tcp4_tbl_timeout_flush(&tcp_tbl, current_time,
+				pkts, nb_pkts);
+		if (unprocess_num > 0) {
+			memcpy(&pkts[i], unprocess_pkts,
+					sizeof(struct rte_mbuf *) *
+					unprocess_num);
+		}
+	}
+
+	return nb_after_gro;
 }
 
 uint16_t
-rte_gro_reassemble(struct rte_mbuf **pkts __rte_unused,
+rte_gro_reassemble(struct rte_mbuf **pkts,
 		uint16_t nb_pkts,
-		void *ctx __rte_unused)
+		void *ctx)
 {
-	return nb_pkts;
+	uint16_t i, unprocess_num = 0;
+	struct rte_mbuf *unprocess_pkts[nb_pkts];
+	struct gro_ctx *gro_ctx = ctx;
+	uint64_t current_time;
+
+	if ((gro_ctx->gro_types & RTE_GRO_TCP_IPV4) == 0)
+		return nb_pkts;
+
+	current_time = rte_rdtsc();
+
+	for (i = 0; i < nb_pkts; i++) {
+		if ((pkts[i]->packet_type & (RTE_PTYPE_L3_IPV4 |
+					RTE_PTYPE_L4_TCP)) ==
+				(RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_TCP)) {
+			if (gro_tcp4_reassemble(pkts[i],
+						gro_ctx->tbls
+						[RTE_GRO_TCP_IPV4_INDEX],
+						current_time) < 0)
+				unprocess_pkts[unprocess_num++] = pkts[i];
+		} else
+			unprocess_pkts[unprocess_num++] = pkts[i];
+	}
+	if (unprocess_num > 0) {
+		memcpy(pkts, unprocess_pkts,
+				sizeof(struct rte_mbuf *) *
+				unprocess_num);
+	}
+
+	return unprocess_num;
 }
 
 uint16_t
-rte_gro_timeout_flush(void *ctx __rte_unused,
-		uint64_t timeout_cycles __rte_unused,
-		uint64_t gro_types __rte_unused,
-		struct rte_mbuf **out __rte_unused,
-		uint16_t max_nb_out __rte_unused)
+rte_gro_timeout_flush(void *ctx,
+		uint64_t timeout_cycles,
+		uint64_t gro_types,
+		struct rte_mbuf **out,
+		uint16_t max_nb_out)
 {
+	struct gro_ctx *gro_ctx = ctx;
+	uint64_t flush_timestamp;
+
+	gro_types = gro_types & gro_ctx->gro_types;
+	flush_timestamp = rte_rdtsc() - timeout_cycles;
+
+	if (gro_types & RTE_GRO_TCP_IPV4) {
+		return gro_tcp4_tbl_timeout_flush(
+				gro_ctx->tbls[RTE_GRO_TCP_IPV4_INDEX],
+				flush_timestamp,
+				out, max_nb_out);
+	}
 	return 0;
 }
 
diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
index 53ddd15..3b4fec0 100644
--- a/lib/librte_gro/rte_gro.h
+++ b/lib/librte_gro/rte_gro.h
@@ -45,8 +45,11 @@ extern "C" {
 /**< max number of supported GRO types */
 #define RTE_GRO_TYPE_MAX_NUM 64
 /**< number of currently supported GRO types */
-#define RTE_GRO_TYPE_SUPPORT_NUM 0
+#define RTE_GRO_TYPE_SUPPORT_NUM 1
 
+/**< TCP/IPv4 GRO flag */
+#define RTE_GRO_TCP_IPV4_INDEX 0
+#define RTE_GRO_TCP_IPV4 (1ULL << RTE_GRO_TCP_IPV4_INDEX)
 
 struct rte_gro_param {
 	/**< desired GRO types */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 141+ messages in thread

* [PATCH v13 3/3] app/testpmd: enable TCP/IPv4 GRO
  2017-07-09  1:13                       ` [PATCH v13 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
  2017-07-09  1:13                         ` [PATCH v13 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
  2017-07-09  1:13                         ` [PATCH v13 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
@ 2017-07-09  1:13                         ` Jiayu Hu
  2017-07-09  5:46                         ` [PATCH v14 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
  3 siblings, 0 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-07-09  1:13 UTC (permalink / raw)
  To: dev
  Cc: jianfeng.tan, konstantin.ananyev, yliu, stephen, jingjing.wu,
	lei.a.yao, Jiayu Hu

This patch enables the TCP/IPv4 GRO library in the csum forwarding
engine. By default, GRO is turned off. Users can use the command
"gro (on|off) (port_id)" to enable or disable GRO for a given port.
When GRO is enabled on a port, all TCP/IPv4 packets received from that
port undergo GRO. Besides, users can set the max flow number and the
max packet number per flow with the command
"gro set (max_flow_num) (max_item_num_per_flow) (port_id)", as shown
in the example below.
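
For example (an illustrative testpmd session; port 0 and the parameter
values are assumptions):

    testpmd> gro set 4 32 0
    testpmd> gro on 0
    testpmd> start
    ...
    testpmd> stop
    testpmd> gro off 0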

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Jingjing Wu <jingjing.wu@intel.com>
---
 app/test-pmd/cmdline.c                      | 125 ++++++++++++++++++++++++++++
 app/test-pmd/config.c                       |  36 ++++++++
 app/test-pmd/csumonly.c                     |   5 ++
 app/test-pmd/testpmd.c                      |   3 +
 app/test-pmd/testpmd.h                      |  10 +++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  34 ++++++++
 6 files changed, 213 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index d66e9c8..d4ff608 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -76,6 +76,7 @@
 #include <rte_devargs.h>
 #include <rte_eth_ctrl.h>
 #include <rte_flow.h>
+#include <rte_gro.h>
 
 #include <cmdline_rdline.h>
 #include <cmdline_parse.h>
@@ -423,6 +424,14 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"tso show (portid)"
 			"    Display the status of TCP Segmentation Offload.\n\n"
 
+			"gro (on|off) (port_id)"
+			"    Enable or disable Generic Receive Offload in io"
+			" forward engine.\n\n"
+
+			"gro set (max_flow_num) (max_item_num_per_flow) (port_id)\n"
+			"    Set max flow number and max packet number per-flow"
+			" for GRO.\n\n"
+
 			"set fwd (%s)\n"
 			"    Set packet forwarding mode.\n\n"
 
@@ -3838,6 +3847,120 @@ cmdline_parse_inst_t cmd_tunnel_tso_show = {
 	},
 };
 
+/* *** SET GRO FOR A PORT *** */
+struct cmd_gro_result {
+	cmdline_fixed_string_t cmd_keyword;
+	cmdline_fixed_string_t mode;
+	uint8_t port_id;
+};
+
+static void
+cmd_enable_gro_parsed(void *parsed_result,
+		__attribute__((unused)) struct cmdline *cl,
+		__attribute__((unused)) void *data)
+{
+	struct cmd_gro_result *res;
+
+	res = parsed_result;
+	setup_gro(res->mode, res->port_id);
+}
+
+cmdline_parse_token_string_t cmd_gro_keyword =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_result,
+			cmd_keyword, "gro");
+cmdline_parse_token_string_t cmd_gro_mode =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_result,
+			mode, "on#off");
+cmdline_parse_token_num_t cmd_gro_pid =
+	TOKEN_NUM_INITIALIZER(struct cmd_gro_result,
+			port_id, UINT8);
+
+cmdline_parse_inst_t cmd_enable_gro = {
+	.f = cmd_enable_gro_parsed,
+	.data = NULL,
+	.help_str = "gro (on|off) (port_id)",
+	.tokens = {
+		(void *)&cmd_gro_keyword,
+		(void *)&cmd_gro_mode,
+		(void *)&cmd_gro_pid,
+		NULL,
+	},
+};
+
+/* *** SET MAX FLOW NUMBER AND ITEM NUM PER FLOW FOR GRO *** */
+struct cmd_gro_set_result {
+	cmdline_fixed_string_t gro;
+	cmdline_fixed_string_t mode;
+	uint16_t flow_num;
+	uint16_t item_num_per_flow;
+	uint8_t port_id;
+};
+
+static void
+cmd_gro_set_parsed(void *parsed_result,
+		       __attribute__((unused)) struct cmdline *cl,
+		       __attribute__((unused)) void *data)
+{
+	struct cmd_gro_set_result *res = parsed_result;
+
+	if (port_id_is_invalid(res->port_id, ENABLED_WARN))
+		return;
+	if (test_done == 0) {
+		printf("Before set GRO flow_num and item_num_per_flow,"
+				" please stop forwarding first\n");
+		return;
+	}
+
+	if (!strcmp(res->mode, "set")) {
+		if (res->flow_num == 0)
+			printf("Invalid flow number. Revert to default value:"
+					" %u.\n", GRO_DEFAULT_FLOW_NUM);
+		else
+			gro_ports[res->port_id].param.max_flow_num =
+				res->flow_num;
+
+		if (res->item_num_per_flow == 0)
+			printf("Invalid item number per-flow. Revert"
+					" to default value:%u.\n",
+					GRO_DEFAULT_ITEM_NUM_PER_FLOW);
+		else
+			gro_ports[res->port_id].param.max_item_per_flow =
+				res->item_num_per_flow;
+	}
+}
+
+cmdline_parse_token_string_t cmd_gro_set_gro =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_set_result,
+				gro, "gro");
+cmdline_parse_token_string_t cmd_gro_set_mode =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_set_result,
+				mode, "set");
+cmdline_parse_token_num_t cmd_gro_set_flow_num =
+	TOKEN_NUM_INITIALIZER(struct cmd_gro_set_result,
+				flow_num, UINT16);
+cmdline_parse_token_num_t cmd_gro_set_item_num_per_flow =
+	TOKEN_NUM_INITIALIZER(struct cmd_gro_set_result,
+				item_num_per_flow, UINT16);
+cmdline_parse_token_num_t cmd_gro_set_portid =
+	TOKEN_NUM_INITIALIZER(struct cmd_gro_set_result,
+				port_id, UINT8);
+
+cmdline_parse_inst_t cmd_gro_set = {
+	.f = cmd_gro_set_parsed,
+	.data = NULL,
+	.help_str = "gro set <max_flow_num> <max_item_num_per_flow> "
+		"<port_id>: set max flow number and max packet number per-flow "
+		"for GRO",
+	.tokens = {
+		(void *)&cmd_gro_set_gro,
+		(void *)&cmd_gro_set_mode,
+		(void *)&cmd_gro_set_flow_num,
+		(void *)&cmd_gro_set_item_num_per_flow,
+		(void *)&cmd_gro_set_portid,
+		NULL,
+	},
+};
+
 /* *** ENABLE/DISABLE FLUSH ON RX STREAMS *** */
 struct cmd_set_flush_rx {
 	cmdline_fixed_string_t set;
@@ -14035,6 +14158,8 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_tso_show,
 	(cmdline_parse_inst_t *)&cmd_tunnel_tso_set,
 	(cmdline_parse_inst_t *)&cmd_tunnel_tso_show,
+	(cmdline_parse_inst_t *)&cmd_enable_gro,
+	(cmdline_parse_inst_t *)&cmd_gro_set,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_rx,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_tx,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 593677e..e0f0825 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -71,6 +71,7 @@
 #ifdef RTE_LIBRTE_BNXT_PMD
 #include <rte_pmd_bnxt.h>
 #endif
+#include <rte_gro.h>
 
 #include "testpmd.h"
 
@@ -2415,6 +2416,41 @@ set_tx_pkt_segments(unsigned *seg_lengths, unsigned nb_segs)
 	tx_pkt_nb_segs = (uint8_t) nb_segs;
 }
 
+void
+setup_gro(const char *mode, uint8_t port_id)
+{
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		printf("invalid port id %u\n", port_id);
+		return;
+	}
+	if (test_done == 0) {
+		printf("Before enable/disable GRO,"
+				" please stop forwarding first\n");
+		return;
+	}
+	if (strcmp(mode, "on") == 0) {
+		if (gro_ports[port_id].enable) {
+			printf("port %u has enabled GRO\n", port_id);
+			return;
+		}
+		gro_ports[port_id].enable = 1;
+		gro_ports[port_id].param.gro_types = RTE_GRO_TCP_IPV4;
+
+		if (gro_ports[port_id].param.max_flow_num == 0)
+			gro_ports[port_id].param.max_flow_num =
+				GRO_DEFAULT_FLOW_NUM;
+		if (gro_ports[port_id].param.max_item_per_flow == 0)
+			gro_ports[port_id].param.max_item_per_flow =
+				GRO_DEFAULT_ITEM_NUM_PER_FLOW;
+	} else {
+		if (gro_ports[port_id].enable == 0) {
+			printf("port %u has disabled GRO\n", port_id);
+			return;
+		}
+		gro_ports[port_id].enable = 0;
+	}
+}
+
 char*
 list_pkt_forwarding_modes(void)
 {
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 66fc9a0..178ad75 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -71,6 +71,7 @@
 #include <rte_prefetch.h>
 #include <rte_string_fns.h>
 #include <rte_flow.h>
+#include <rte_gro.h>
 #include "testpmd.h"
 
 #define IP_DEFTTL  64   /* from RFC 1340. */
@@ -658,6 +659,10 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 				 nb_pkt_per_burst);
 	if (unlikely(nb_rx == 0))
 		return;
+	if (unlikely(gro_ports[fs->rx_port].enable))
+		nb_rx = rte_gro_reassemble_burst(pkts_burst,
+				nb_rx,
+				&(gro_ports[fs->rx_port].param));
 
 #ifdef RTE_TEST_PMD_RECORD_BURST_STATS
 	fs->rx_burst_stats.pkt_burst_spread[nb_rx]++;
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 308c1b7..e09b803 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -90,6 +90,7 @@
 #ifdef RTE_LIBRTE_LATENCY_STATS
 #include <rte_latencystats.h>
 #endif
+#include <rte_gro.h>
 
 #include "testpmd.h"
 
@@ -379,6 +380,8 @@ lcoreid_t bitrate_lcore_id;
 uint8_t bitrate_enabled;
 #endif
 
+struct gro_status gro_ports[RTE_MAX_ETHPORTS];
+
 /* Forward function declarations */
 static void map_port_queue_stats_mapping_registers(uint8_t pi, struct rte_port *port);
 static void check_all_ports_link_status(uint32_t port_mask);
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 9ccfb6d..73985c3 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -35,6 +35,7 @@
 #define _TESTPMD_H_
 
 #include <rte_pci.h>
+#include <rte_gro.h>
 
 #define RTE_PORT_ALL            (~(portid_t)0x0)
 
@@ -432,6 +433,14 @@ extern struct ether_addr peer_eth_addrs[RTE_MAX_ETHPORTS];
 extern uint32_t burst_tx_delay_time; /**< Burst tx delay time(us) for mac-retry. */
 extern uint32_t burst_tx_retry_num;  /**< Burst tx retry number for mac-retry. */
 
+#define GRO_DEFAULT_FLOW_NUM 4
+#define GRO_DEFAULT_ITEM_NUM_PER_FLOW DEF_PKT_BURST
+struct gro_status {
+	struct rte_gro_param param;
+	uint8_t enable;
+};
+extern struct gro_status gro_ports[RTE_MAX_ETHPORTS];
+
 static inline unsigned int
 lcore_num(void)
 {
@@ -630,6 +639,7 @@ void get_2tuple_filter(uint8_t port_id, uint16_t index);
 void get_5tuple_filter(uint8_t port_id, uint16_t index);
 int rx_queue_id_is_invalid(queueid_t rxq_id);
 int tx_queue_id_is_invalid(queueid_t txq_id);
+void setup_gro(const char *mode, uint8_t port_id);
 
 /* Functions to manage the set of filtered Multicast MAC addresses */
 void mcast_addr_add(uint8_t port_id, struct ether_addr *mc_addr);
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 6c0d526..bb6a667 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -898,6 +898,40 @@ Display the status of TCP Segmentation Offload::
 
    testpmd> tso show (port_id)
 
+gro
+~~~~~~~~
+
+Enable or disable GRO in ``csum`` forwarding engine::
+
+   testpmd> gro (on|off) (port_id)
+
+If enabled, the csum forwarding engine will perform GRO on the TCP/IPv4
+packets received from the given port.
+
+If disabled, packets received from the given port won't undergo GRO.
+By default, GRO is disabled for all ports.
+
+.. note::
+
+   When GRO is enabled for a port, TCP/IPv4 packets received from that
+   port undergo GRO, and the merged packets are multi-segment. But the
+   csum forwarding engine doesn't support calculating TCP checksums for
+   multi-segment packets in SW. So please select TCP HW checksum
+   calculation for the port that GROed packets are transmitted to.
+
+gro set
+~~~~~~~~
+
+Set max flow number and max packet number per-flow for GRO::
+
+   testpmd> gro set (max_flow_num) (max_item_num_per_flow) (port_id)
+
+The product of ``max_flow_num`` and ``max_item_num_per_flow`` is the max
+number of packets a GRO table can store.
+
+If the current packet number is greater than or equal to the max value,
+GRO will stop processing incoming packets.
+
 mac_addr add
 ~~~~~~~~~~~~
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 141+ messages in thread

* [PATCH v14 0/3] Support TCP/IPv4 GRO in DPDK
  2017-07-09  1:13                       ` [PATCH v13 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
                                           ` (2 preceding siblings ...)
  2017-07-09  1:13                         ` [PATCH v13 3/3] app/testpmd: enable TCP/IPv4 GRO Jiayu Hu
@ 2017-07-09  5:46                         ` Jiayu Hu
  2017-07-09  5:46                           ` [PATCH v14 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
                                             ` (3 more replies)
  3 siblings, 4 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-07-09  5:46 UTC (permalink / raw)
  To: dev
  Cc: jianfeng.tan, konstantin.ananyev, yliu, stephen, jingjing.wu,
	lei.a.yao, Jiayu Hu

Generic Receive Offload (GRO) is a widely used SW-based offloading
technique to reduce per-packet processing overhead. It gains performance
by reassembling small packets into large ones. Therefore, we propose to
support GRO in DPDK.

To give applications more flexibility, DPDK GRO is implemented as
a user library. Applications explicitly use the GRO library to merge
small packets into large ones. DPDK GRO provides two reassembly modes:
lightweight mode and heavyweight mode. If applications want to merge
packets in a simple way and the number of packets is small, they can
select the lightweight mode API. If applications need more fine-grained
control, they can select the heavyweight mode API.
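
For reference, here is a minimal sketch of the lightweight mode usage
(the port id, queue id and burst size are illustrative, and the
RTE_GRO_TCP_IPV4 flag comes with the TCP/IPv4 GRO patch):

	struct rte_gro_param param = {
		.gro_types = RTE_GRO_TCP_IPV4,
		.max_flow_num = 4,
		.max_item_per_flow = 32,
	};
	struct rte_mbuf *pkts[32];
	uint8_t port_id = 0;   /* illustrative */
	uint16_t queue_id = 0; /* illustrative */
	uint16_t nb_rx, nb_after_gro;

	nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts, 32);
	/* merge the burst in place; pkts[] keeps the GROed packets */
	nb_after_gro = rte_gro_reassemble_burst(pkts, nb_rx, &param);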

This patchset supports TCP/IPv4 GRO in DPDK. The first patch provides
a GRO API framework. The second patch supports TCP/IPv4 GRO. The last
patch enables TCP/IPv4 GRO in testpmd.

We perform many iperf tests to see the performance gains from DPDK GRO.
The test environment is:
a. two 25Gbps physical ports (p0 and p1) are linked together. Assign p0
	to one networking namespace and assign p1 to DPDK;
b. enable TSO for p0. Run iperf client on p0;
c. launch testpmd with p1 and a vhost-user port, and run it in csum
	forwarding mode. Select TCP HW checksum calculation for the
	vhost-user port in the csum forwarding engine. For better
	performance, we also select IPv4 and TCP HW checksum calculation
	for p1;
d. launch a VM with one CPU core and a virtio-net port. The VM OS is
	Ubuntu 16.04, whose virtio-net driver supports GRO. Enable RX csum
	offloading and mrg_rxbuf for the VM. The iperf server runs in the VM;
e. to run iperf tests, we need to prevent the csum forwarding engine
	from unconditionally rewriting packet MAC addresses. So in our
	tests, we comment that code out (lines 701~704 in csumonly.c).
	The testpmd commands we use are sketched after this list.
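
For reference, a sketch of the testpmd commands used to enable GRO
(port numbers are illustrative, with p1 as port 0 and the vhost-user
port as port 1; "gro" and "gro set" are the commands added by the last
patch, the others are existing testpmd commands):

	testpmd> set fwd csum
	testpmd> csum set tcp hw 1
	testpmd> gro on 0
	testpmd> gro set 4 32 0
	testpmd> start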

In each test, we run iperf with the following three configurations:
	- single flow and single TCP client thread 
	- multiple flows and single TCP client thread
	- single flow and parallel TCP client threads

We run above iperf tests on three scenarios:
	s1: disabling kernel GRO and enabling DPDK GRO
	s2: disabling kernel GRO and disabling DPDK GRO
	s3: enabling kernel GRO and disabling DPDK GRO
Comparing the throughput of s1 with s2, we can see the performance gains
from DPDK GRO. Comparing the throughput of s1 and s3, we can compare DPDK
GRO performance with kernel GRO performance.

Test results:
	- DPDK GRO throughput is almost 2 times the throughput with
		neither DPDK GRO nor kernel GRO;
	- DPDK GRO throughput is almost 1.2 times the throughput with
		kernel GRO.

Change log
==========
v14:
- fix compilation issue
v13:
- optimize 'for' statement
- update MAINTAINER file
v12:
- remove max_timeout_cycles from struct rte_gro_param
- remove is_valid from struct gro_tcp4_key
- optimize key comparison function
- avoid updating IP ID for merged packets
- change GRO_TCP4_TBL_MAX_ITEM_NUM value to 1024*1024
- modify gro_tcp4_tbl_timeout_flush
- change rte_gro_get_count to rte_gro_get_pkt_count
- fix code alignment issue
- add EXPERIMENTAL mark to heavyweight APIs
v11:
- avoid converting big-endian to little-endian when compare key
- add sent_seq and ip_id to gro_tcp4_item to accelerate packet
	reassembly
- remove max_packet_size from rte_gro_param
- add inline functions to replace reduplicate codes
- change external API names and structure name
	(rte_gro_tbl_xxx -> rte_gro_ctx_xxx)
- fix coding style issues and order issue in rte_gro_version.map
- change inaccurate comments
- change internal files name from rte_gro_tcp4.x to gro_tcp4.x
v10:
- add support to merge '<seq=2, seq=1>' TCP/IPv4 packets
- check if IP ID is consecutive and update IP ID for merged packets
- check SYN, FIN, PSH, RST, URG flags
- use different reassembly table structures and APIs for TCP/IPv4 GRO
	and TCP/IPv6 GRO
- change file name from 'rte_gro_tcp.x' to 'rte_gro_tcp4.x'
v9:
- avoid defining variable size structure array and memset variable size
	in rte_gro_reassemble_burst
- change internal structure name from 'te_gro_tbl' to 'gro_tbl'
- delete useless variables in rte_gro_tcp.c
v8:
- merge rte_gro_flush and rte_gro_timeout_flush together and optimize
	flushing operation
- enable rte_gro_reassemble to process N inputted packets
- add rte_gro_tbl_item_num to get packet num in the GRO table
- add 'lastseg' to struct gro_tcp_item to get last segment faster
- add operations to handle rte_malloc failure
- use mbuf->l2_len/l3_len/l4_len instead of parsing header
- remove 'is_groed' and 'is_valid' in struct gro_tcp_item
- fix bugs in gro_tcp4_reassemble
- pass start-time as a parameter to avoid frequently calling rte_rdtsc 
- modify rte_gro_tbl_create prototype
- add 'RTE_' to external macros
- remove 'goto'
- remove inappropriate 'const'
- hide internal variables
v7:
- add a macro 'GRO_MAX_BURST_ITEM_NUM' to avoid stack overflow in
	rte_gro_reassemble_burst
- change macro name (_NB to _NUM)
- add '#ifdef __cplusplus ...' in rte_gro.h
v6:
- avoid checksum validation and calculation
- enable to process IP fragmented packets
- add a command in testpmd
- update documents
- modify rte_gro_timeout_flush and rte_gro_reassemble_burst
- rename variables
v5:
- fix some bugs
- fix coding style issues
v4:
- implement DPDK GRO as an application-used library
- introduce lightweight and heavyweight working modes to enable
	fine-grained controls to applications
- replace cuckoo hash tables with simpler table structure
v3:
- fix compilation issues.
v2:
- provide generic reassembly function;
- implement GRO as a device ability:
add APIs for devices to support GRO;
add APIs for applications to enable/disable GRO;
- update testpmd example. 

Jiayu Hu (3):
  lib: add Generic Receive Offload API framework
  lib/gro: add TCP/IPv4 GRO support
  app/testpmd: enable TCP/IPv4 GRO

 MAINTAINERS                                 |   4 +
 app/test-pmd/cmdline.c                      | 125 +++++++
 app/test-pmd/config.c                       |  36 ++
 app/test-pmd/csumonly.c                     |   5 +
 app/test-pmd/testpmd.c                      |   3 +
 app/test-pmd/testpmd.h                      |  10 +
 config/common_base                          |   5 +
 doc/guides/rel_notes/release_17_08.rst      |   7 +
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  34 ++
 lib/Makefile                                |   2 +
 lib/librte_gro/Makefile                     |  51 +++
 lib/librte_gro/gro_tcp4.c                   | 505 ++++++++++++++++++++++++++++
 lib/librte_gro/gro_tcp4.h                   | 210 ++++++++++++
 lib/librte_gro/rte_gro.c                    | 278 +++++++++++++++
 lib/librte_gro/rte_gro.h                    | 211 ++++++++++++
 lib/librte_gro/rte_gro_version.map          |  12 +
 mk/rte.app.mk                               |   1 +
 17 files changed, 1499 insertions(+)
 create mode 100644 lib/librte_gro/Makefile
 create mode 100644 lib/librte_gro/gro_tcp4.c
 create mode 100644 lib/librte_gro/gro_tcp4.h
 create mode 100644 lib/librte_gro/rte_gro.c
 create mode 100644 lib/librte_gro/rte_gro.h
 create mode 100644 lib/librte_gro/rte_gro_version.map

-- 
2.7.4

^ permalink raw reply	[flat|nested] 141+ messages in thread

* [PATCH v14 1/3] lib: add Generic Receive Offload API framework
  2017-07-09  5:46                         ` [PATCH v14 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
@ 2017-07-09  5:46                           ` Jiayu Hu
  2017-07-09  5:46                           ` [PATCH v14 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
                                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-07-09  5:46 UTC (permalink / raw)
  To: dev
  Cc: jianfeng.tan, konstantin.ananyev, yliu, stephen, jingjing.wu,
	lei.a.yao, Jiayu Hu

Generic Receive Offload (GRO) is a widely used SW-based offloading
technique to reduce per-packet processing overhead. It gains
performance by reassembling small packets into large ones. This
patchset is to support GRO in DPDK. To support GRO, this patch
implements a GRO API framework.

To give applications more flexibility, DPDK GRO is implemented as
a user library. Applications explicitly use the GRO library to merge
small packets into large ones. DPDK GRO provides two reassembly modes.
One is called lightweight mode, the other is called heavyweight mode.
If applications want to merge packets in a simple way and the number
of packets is relatively small, they can use the lightweight mode.
If applications need more fine-grained control, they can choose the
heavyweight mode.

rte_gro_reassemble_burst is the main reassembly API which is used in
lightweight mode and processes N packets at a time. For applications,
performing GRO in lightweight mode is simple. They just need to invoke
rte_gro_reassemble_burst. Applications can get GROed packets as soon as
rte_gro_reassemble_burst returns.

rte_gro_reassemble is the main reassembly API which is used in
heavyweight mode and tries to merge N input packets with the packets
in the GRO reassembly tables. For applications, performing GRO in
heavyweight mode is relatively complicated. Before performing GRO,
applications need to create a GRO context object, which keeps the
reassembly tables of desired GRO types, via rte_gro_ctx_create. Then
applications can use rte_gro_reassemble to merge packets. The GROed
packets are in the reassembly tables of the GRO context object. If
applications want to get them, they need to flush them manually via
the flush API, as the sketch below shows.
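
As an illustration, a condensed sketch of the heavyweight mode flow
(burst sizes, port id and the timeout are illustrative, and the
RTE_GRO_TCP_IPV4 flag is added by the next patch):

	struct rte_gro_param param = {
		.gro_types = RTE_GRO_TCP_IPV4,
		.max_flow_num = 1024,
		.max_item_per_flow = 32,
		.socket_id = rte_socket_id(),
	};
	void *gro_ctx = rte_gro_ctx_create(&param);
	struct rte_mbuf *pkts[32], *flushed[64];
	uint64_t timeout_cycles = rte_get_tsc_hz() / 100; /* ~10 ms */
	uint8_t port_id = 0; /* illustrative */
	uint16_t nb_rx, nb_unprocessed, nb_flushed;

	nb_rx = rte_eth_rx_burst(port_id, 0, pkts, 32);
	/* merged packets stay in the context's reassembly tables;
	 * pkts[] keeps the packets that were not processed */
	nb_unprocessed = rte_gro_reassemble(pkts, nb_rx, gro_ctx);
	/* later: flush packets that stayed longer than timeout_cycles */
	nb_flushed = rte_gro_timeout_flush(gro_ctx, timeout_cycles,
			RTE_GRO_TCP_IPV4, flushed, 64);
	rte_gro_ctx_destroy(gro_ctx);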

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
---
 MAINTAINERS                        |   4 +
 config/common_base                 |   5 +
 lib/Makefile                       |   2 +
 lib/librte_gro/Makefile            |  50 +++++++++
 lib/librte_gro/rte_gro.c           | 169 ++++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro.h           | 208 +++++++++++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro_version.map |  12 +++
 mk/rte.app.mk                      |   1 +
 8 files changed, 451 insertions(+)
 create mode 100644 lib/librte_gro/Makefile
 create mode 100644 lib/librte_gro/rte_gro.c
 create mode 100644 lib/librte_gro/rte_gro.h
 create mode 100644 lib/librte_gro/rte_gro_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 8fb2132..d20fca5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -648,6 +648,10 @@ F: doc/guides/prog_guide/pdump_lib.rst
 F: app/pdump/
 F: doc/guides/tools/pdump.rst
 
+Generic receive offload
+M: Jiayu Hu <jiayu.hu@intel.com>
+F: lib/librte_gro/
+
 
 Packet Framework
 ----------------
diff --git a/config/common_base b/config/common_base
index bb1ba8b..0ac51a1 100644
--- a/config/common_base
+++ b/config/common_base
@@ -713,6 +713,11 @@ CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
 CONFIG_RTE_LIBRTE_PMD_VHOST=n
 
 #
+# Compile GRO library
+#
+CONFIG_RTE_LIBRTE_GRO=y
+
+#
 #Compile Xen domain0 support
 #
 CONFIG_RTE_LIBRTE_XEN_DOM0=n
diff --git a/lib/Makefile b/lib/Makefile
index aef584e..120fca6 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -106,6 +106,8 @@ DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
 DEPDIRS-librte_reorder := librte_eal librte_mempool librte_mbuf
 DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
 DEPDIRS-librte_pdump := librte_eal librte_mempool librte_mbuf librte_ether
+DIRS-$(CONFIG_RTE_LIBRTE_GRO) += librte_gro
+DEPDIRS-librte_gro := librte_eal librte_mbuf librte_ether librte_net
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
new file mode 100644
index 0000000..7e0f128
--- /dev/null
+++ b/lib/librte_gro/Makefile
@@ -0,0 +1,50 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2017 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_gro.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+
+EXPORT_MAP := rte_gro_version.map
+
+LIBABIVER := 1
+
+# source files
+SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
+
+# install this header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
new file mode 100644
index 0000000..fa6d7ce
--- /dev/null
+++ b/lib/librte_gro/rte_gro.c
@@ -0,0 +1,169 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+
+#include "rte_gro.h"
+
+typedef void *(*gro_tbl_create_fn)(uint16_t socket_id,
+		uint16_t max_flow_num,
+		uint16_t max_item_per_flow);
+typedef void (*gro_tbl_destroy_fn)(void *tbl);
+typedef uint32_t (*gro_tbl_pkt_count_fn)(void *tbl);
+
+static gro_tbl_create_fn tbl_create_fn[RTE_GRO_TYPE_MAX_NUM];
+static gro_tbl_destroy_fn tbl_destroy_fn[RTE_GRO_TYPE_MAX_NUM];
+static gro_tbl_pkt_count_fn tbl_pkt_count_fn[RTE_GRO_TYPE_MAX_NUM];
+
+/*
+ * GRO context structure, which is used to merge packets. It keeps
+ * many reassembly tables of desired GRO types. Applications need to
+ * create GRO context objects before using rte_gro_reassemble to
+ * perform GRO.
+ */
+struct gro_ctx {
+	/* GRO types to perform */
+	uint64_t gro_types;
+	/* reassembly tables */
+	void *tbls[RTE_GRO_TYPE_MAX_NUM];
+};
+
+void *
+rte_gro_ctx_create(const struct rte_gro_param *param)
+{
+	struct gro_ctx *gro_ctx;
+	gro_tbl_create_fn create_tbl_fn;
+	uint64_t gro_type_flag = 0;
+	uint64_t gro_types = 0;
+	uint8_t i;
+
+	gro_ctx = rte_zmalloc_socket(__func__,
+			sizeof(struct gro_ctx),
+			RTE_CACHE_LINE_SIZE,
+			param->socket_id);
+	if (gro_ctx == NULL)
+		return NULL;
+
+	for (i = 0; i < RTE_GRO_TYPE_MAX_NUM; i++) {
+		gro_type_flag = 1 << i;
+		if ((param->gro_types & gro_type_flag) == 0)
+			continue;
+
+		create_tbl_fn = tbl_create_fn[i];
+		if (create_tbl_fn == NULL)
+			continue;
+
+		gro_ctx->tbls[i] = create_tbl_fn(param->socket_id,
+				param->max_flow_num,
+				param->max_item_per_flow);
+		if (gro_ctx->tbls[i] == NULL) {
+			/* destroy all created tables */
+			gro_ctx->gro_types = gro_types;
+			rte_gro_ctx_destroy(gro_ctx);
+			return NULL;
+		}
+		gro_types |= gro_type_flag;
+	}
+	gro_ctx->gro_types = param->gro_types;
+
+	return gro_ctx;
+}
+
+void
+rte_gro_ctx_destroy(void *ctx)
+{
+	gro_tbl_destroy_fn destroy_tbl_fn;
+	struct gro_ctx *gro_ctx = ctx;
+	uint64_t gro_type_flag;
+	uint8_t i;
+
+	if (gro_ctx == NULL)
+		return;
+	for (i = 0; i < RTE_GRO_TYPE_MAX_NUM; i++) {
+		gro_type_flag = 1 << i;
+		if ((gro_ctx->gro_types & gro_type_flag) == 0)
+			continue;
+		destroy_tbl_fn = tbl_destroy_fn[i];
+		if (destroy_tbl_fn)
+			destroy_tbl_fn(gro_ctx->tbls[i]);
+	}
+	rte_free(gro_ctx);
+}
+
+uint16_t
+rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
+		uint16_t nb_pkts,
+		const struct rte_gro_param *param __rte_unused)
+{
+	return nb_pkts;
+}
+
+uint16_t
+rte_gro_reassemble(struct rte_mbuf **pkts __rte_unused,
+		uint16_t nb_pkts,
+		void *ctx __rte_unused)
+{
+	return nb_pkts;
+}
+
+uint16_t
+rte_gro_timeout_flush(void *ctx __rte_unused,
+		uint64_t timeout_cycles __rte_unused,
+		uint64_t gro_types __rte_unused,
+		struct rte_mbuf **out __rte_unused,
+		uint16_t max_nb_out __rte_unused)
+{
+	return 0;
+}
+
+uint64_t
+rte_gro_get_pkt_count(void *ctx)
+{
+	struct gro_ctx *gro_ctx = ctx;
+	gro_tbl_pkt_count_fn pkt_count_fn;
+	uint64_t item_num = 0;
+	uint64_t gro_type_flag;
+	uint8_t i;
+
+	for (i = 0; i < RTE_GRO_TYPE_MAX_NUM; i++) {
+		gro_type_flag = 1 << i;
+		if ((gro_ctx->gro_types & gro_type_flag) == 0)
+			continue;
+
+		pkt_count_fn = tbl_pkt_count_fn[i];
+		if (pkt_count_fn == NULL)
+			continue;
+		item_num += pkt_count_fn(gro_ctx->tbls[i]);
+	}
+	return item_num;
+}
diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
new file mode 100644
index 0000000..53ddd15
--- /dev/null
+++ b/lib/librte_gro/rte_gro.h
@@ -0,0 +1,208 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_GRO_H_
+#define _RTE_GRO_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**< the max number of packets that rte_gro_reassemble_burst()
+ * can process in each invocation.
+ */
+#define RTE_GRO_MAX_BURST_ITEM_NUM 128U
+
+/**< max number of supported GRO types */
+#define RTE_GRO_TYPE_MAX_NUM 64
+/**< current supported GRO num */
+#define RTE_GRO_TYPE_SUPPORT_NUM 0
+
+
+struct rte_gro_param {
+	/**< desired GRO types */
+	uint64_t gro_types;
+	/**< max flow number */
+	uint16_t max_flow_num;
+	/**< max packet number per flow */
+	uint16_t max_item_per_flow;
+	/**< socket index for allocating GRO related data structures,
+	 * like reassembly tables. When using rte_gro_reassemble_burst(),
+	 * applications don't need to set this value.
+	 */
+	uint16_t socket_id;
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function creates a GRO context object, which is used to merge
+ * packets in rte_gro_reassemble().
+ *
+ * @param param
+ *  applications use it to pass needed parameters to create a GRO
+ *  context object.
+ *
+ * @return
+ *  on success, return a pointer to the GRO context object; otherwise,
+ *  return NULL.
+ */
+void *rte_gro_ctx_create(const struct rte_gro_param *param);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function destroys a GRO context object.
+ *
+ * @param ctx
+ *  a pointer to a GRO context object.
+ */
+void rte_gro_ctx_destroy(void *ctx);
+
+/**
+ * This is one of the main reassembly APIs, which merges a burst of
+ * packets at a time. It assumes that all input packets have correct
+ * checksums; that is, applications should guarantee all input
+ * packets are correct. Besides, it doesn't re-calculate checksums for
+ * merged packets. If input packets are IP fragmented, this function
+ * assumes they are complete (i.e. with L4 headers). After finishing
+ * processing, it returns all GROed packets to applications
+ * immediately.
+ *
+ * @param pkts
+ *  a pointer array which points to the packets to reassemble. Besides,
+ *  it keeps packet addresses for GROed packets.
+ * @param nb_pkts
+ *  the number of packets to reassemble.
+ * @param param
+ *  applications use it to tell rte_gro_reassemble_burst() which rules
+ *  to apply.
+ *
+ * @return
+ *  the number of packets after GRO. If no packets are merged,
+ *  the returned value is nb_pkts.
+ */
+uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts,
+		uint16_t nb_pkts,
+		const struct rte_gro_param *param);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Reassembly function, which tries to merge input packets with
+ * the packets in the reassembly tables of a given GRO context. This
+ * function assumes all input packets have correct checksums, and
+ * it won't update checksums if two packets are merged. Besides,
+ * if input packets are IP fragmented, this function assumes they
+ * are complete packets (i.e. with L4 headers).
+ *
+ * If the input packets don't have data or are of unsupported GRO
+ * types etc., they won't be processed and are returned to applications.
+ * Otherwise, the input packets are either merged or inserted into
+ * the table. If applications want to get packets in the table, they
+ * need to call the flush API.
+ *
+ * @param pkts
+ *  packets to reassemble. Besides, after this function finishes, the
+ *  array keeps the unprocessed packets (e.g. without data or of
+ *  unsupported GRO types).
+ * @param nb_pkts
+ *  the number of packets to reassemble.
+ * @param ctx
+ *  a pointer to a GRO context object.
+ *
+ * @return
+ *  return the number of unprocessed packets (e.g. without data or
+ *  unsupported GRO types). If all packets are processed (merged or
+ *  inserted into the table), return 0.
+ */
+uint16_t rte_gro_reassemble(struct rte_mbuf **pkts,
+		uint16_t nb_pkts,
+		void *ctx);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function flushes the timeout packets from reassembly tables of
+ * desired GRO types. The max number of flushed timeout packets is the
+ * element number of the array which is used to keep the flushed packets.
+ *
+ * Besides, this function won't re-calculate checksums for merged
+ * packets in the tables. That is, the returned packets may have
+ * incorrect checksums.
+ *
+ * @param ctx
+ *  a pointer to a GRO context object.
+ * @param timeout_cycles
+ *  max TTL for packets in reassembly tables, measured in CPU cycles.
+ * @param gro_types
+ *  this function only flushes packets which belong to the GRO types
+ *  specified by gro_types.
+ * @param out
+ *  a pointer array that is used to keep flushed timeout packets.
+ * @param max_nb_out
+ *  the element number of out. It's also the max number of timeout
+ *  packets that can be flushed.
+ *
+ * @return
+ *  the number of flushed packets. If no packets are flushed, return 0.
+ */
+uint16_t rte_gro_timeout_flush(void *ctx,
+		uint64_t timeout_cycles,
+		uint64_t gro_types,
+		struct rte_mbuf **out,
+		uint16_t max_nb_out);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function returns the number of packets in all reassembly tables
+ * of a given GRO context.
+ *
+ * @param ctx
+ *  a pointer to a GRO context object.
+ *
+ * @return
+ *  the number of packets in all reassembly tables.
+ */
+uint64_t rte_gro_get_pkt_count(void *ctx);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_GRO_H_ */
diff --git a/lib/librte_gro/rte_gro_version.map b/lib/librte_gro/rte_gro_version.map
new file mode 100644
index 0000000..a4e58f0
--- /dev/null
+++ b/lib/librte_gro/rte_gro_version.map
@@ -0,0 +1,12 @@
+DPDK_17.08 {
+	global:
+
+	rte_gro_ctx_create;
+	rte_gro_ctx_destroy;
+	rte_gro_get_pkt_count;
+	rte_gro_reassemble_burst;
+	rte_gro_reassemble;
+	rte_gro_timeout_flush;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index dbd3614..7e231b9 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -98,6 +98,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_RING)           += -lrte_ring
 _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
 _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER)        += -lrte_reorder
+_LDLIBS-$(CONFIG_RTE_LIBRTE_GRO)        	+= -lrte_gro
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 _LDLIBS-$(CONFIG_RTE_LIBRTE_KNI)            += -lrte_kni
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 141+ messages in thread

* [PATCH v14 2/3] lib/gro: add TCP/IPv4 GRO support
  2017-07-09  5:46                         ` [PATCH v14 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
  2017-07-09  5:46                           ` [PATCH v14 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
@ 2017-07-09  5:46                           ` Jiayu Hu
  2017-07-09  5:46                           ` [PATCH v14 3/3] app/testpmd: enable TCP/IPv4 GRO Jiayu Hu
  2017-07-09 16:14                           ` [PATCH v14 0/3] Support TCP/IPv4 GRO in DPDK Thomas Monjalon
  3 siblings, 0 replies; 141+ messages in thread
From: Jiayu Hu @ 2017-07-09  5:46 UTC (permalink / raw)
  To: dev
  Cc: jianfeng.tan, konstantin.ananyev, yliu, stephen, jingjing.wu,
	lei.a.yao, Jiayu Hu

In this patch, we introduce five APIs to support TCP/IPv4 GRO.
- gro_tcp4_reassemble: reassemble an input TCP/IPv4 packet.
- gro_tcp4_tbl_create: create a TCP/IPv4 reassembly table, which is used
    to merge packets.
- gro_tcp4_tbl_destroy: free memory space of a TCP/IPv4 reassembly table.
- gro_tcp4_tbl_pkt_count: return the number of packets in a TCP/IPv4
    reassembly table.
- gro_tcp4_tbl_timeout_flush: flush timeout packets from a TCP/IPv4
    reassembly table.

The TCP/IPv4 GRO API assumes all input packets have correct IPv4
and TCP checksums, and it doesn't update IPv4 and TCP checksums for
merged packets. If input packets are IP fragmented, the TCP/IPv4 GRO
API assumes they are complete packets (i.e. with L4 headers).

In TCP/IPv4 GRO, we use a table structure, called the TCP/IPv4
reassembly table, to reassemble packets. A TCP/IPv4 reassembly table
includes a key array and an item array, where the key array keeps the
criteria to merge packets and the item array keeps packet information.

One key in the key array points to an item group, which consists of
packets that have the same criteria value. If two packets can be
merged, they must be in the same item group. Each key in the key array
includes two parts:
- criteria: the criteria of merging packets. If two packets can be
    merged, they must have the same criteria value.
- start_index: the index of the first incoming packet of the item group.

Each element in the item array keeps the information of one packet. It
mainly includes three parts:
- firstseg: the address of the first segment of the packet
- lastseg: the address of the last segment of the packet
- next_pkt_index: the index of the next packet in the same item group.
    All packets in the same item group are chained by next_pkt_index.
    With next_pkt_index, we can locate all packets in the same item
    group one by one.

Processing an incoming packet takes three steps (a minimal usage
sketch of these APIs follows this list):
a. check if the packet should be processed. Packets with one of the
    following properties won't be processed:
	- FIN, SYN, RST, URG, PSH, ECE or CWR bit is set;
	- packet payload length is 0.
b. traverse the key array to find a key which has the same criteria
    value as the incoming packet. If found, go to step c. Otherwise,
    insert a new key and insert the packet into the item array.
c. locate the first packet in the item group via the start_index of
    the key. Then traverse all packets in the item group via
    next_pkt_index. If one is found that can be merged with the
    incoming packet, merge them together. Otherwise, insert the packet
    into this item group.
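
As an illustration, here is a minimal sketch of how the internal APIs
above fit together (pkt is assumed to be a TCP/IPv4 mbuf whose
l2_len/l3_len/l4_len fields are set; the table sizes are illustrative):

	struct gro_tcp4_tbl *tbl;
	struct rte_mbuf *out[32];
	uint64_t now = rte_rdtsc();
	uint16_t nb_out;
	int32_t ret;

	/* up to 4 flows, up to 8 packets per flow */
	tbl = gro_tcp4_tbl_create(rte_socket_id(), 4, 8);
	/* <0: not processed, 0: inserted into the table, >0: merged */
	ret = gro_tcp4_reassemble(pkt, tbl, now);
	/* flush everything inserted at or before 'now' */
	nb_out = gro_tcp4_tbl_timeout_flush(tbl, now, out, 32);
	gro_tcp4_tbl_destroy(tbl);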

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
---
 doc/guides/rel_notes/release_17_08.rst |   7 +
 lib/librte_gro/Makefile                |   1 +
 lib/librte_gro/gro_tcp4.c              | 505 +++++++++++++++++++++++++++++++++
 lib/librte_gro/gro_tcp4.h              | 210 ++++++++++++++
 lib/librte_gro/rte_gro.c               | 137 ++++++++-
 lib/librte_gro/rte_gro.h               |   5 +-
 6 files changed, 850 insertions(+), 15 deletions(-)
 create mode 100644 lib/librte_gro/gro_tcp4.c
 create mode 100644 lib/librte_gro/gro_tcp4.h

diff --git a/doc/guides/rel_notes/release_17_08.rst b/doc/guides/rel_notes/release_17_08.rst
index a2da17f..1736a3e 100644
--- a/doc/guides/rel_notes/release_17_08.rst
+++ b/doc/guides/rel_notes/release_17_08.rst
@@ -161,6 +161,13 @@ New Features
   to verify functionality and measure the performance parameters of DPDK
   eventdev devices.
 
+* **Add Generic Receive Offload API support.**
+
+  The Generic Receive Offload (GRO) API supports reassembling TCP/IPv4
+  packets. The GRO API assumes all input packets have correct
+  checksums and doesn't update checksums for merged packets. If
+  input packets are IP fragmented, the GRO API assumes they are
+  complete packets (i.e. with L4 headers).
 
 Resolved Issues
 ---------------
diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
index 7e0f128..747eeec 100644
--- a/lib/librte_gro/Makefile
+++ b/lib/librte_gro/Makefile
@@ -43,6 +43,7 @@ LIBABIVER := 1
 
 # source files
 SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
+SRCS-$(CONFIG_RTE_LIBRTE_GRO) += gro_tcp4.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
diff --git a/lib/librte_gro/gro_tcp4.c b/lib/librte_gro/gro_tcp4.c
new file mode 100644
index 0000000..61a0423
--- /dev/null
+++ b/lib/librte_gro/gro_tcp4.c
@@ -0,0 +1,505 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+#include <rte_cycles.h>
+#include <rte_ethdev.h>
+#include <rte_ip.h>
+#include <rte_tcp.h>
+
+#include "gro_tcp4.h"
+
+void *
+gro_tcp4_tbl_create(uint16_t socket_id,
+		uint16_t max_flow_num,
+		uint16_t max_item_per_flow)
+{
+	struct gro_tcp4_tbl *tbl;
+	size_t size;
+	uint32_t entries_num, i;
+
+	entries_num = max_flow_num * max_item_per_flow;
+	entries_num = RTE_MIN(entries_num, GRO_TCP4_TBL_MAX_ITEM_NUM);
+
+	if (entries_num == 0)
+		return NULL;
+
+	tbl = rte_zmalloc_socket(__func__,
+			sizeof(struct gro_tcp4_tbl),
+			RTE_CACHE_LINE_SIZE,
+			socket_id);
+	if (tbl == NULL)
+		return NULL;
+
+	size = sizeof(struct gro_tcp4_item) * entries_num;
+	tbl->items = rte_zmalloc_socket(__func__,
+			size,
+			RTE_CACHE_LINE_SIZE,
+			socket_id);
+	if (tbl->items == NULL) {
+		rte_free(tbl);
+		return NULL;
+	}
+	tbl->max_item_num = entries_num;
+
+	size = sizeof(struct gro_tcp4_key) * entries_num;
+	tbl->keys = rte_zmalloc_socket(__func__,
+			size,
+			RTE_CACHE_LINE_SIZE,
+			socket_id);
+	if (tbl->keys == NULL) {
+		rte_free(tbl->items);
+		rte_free(tbl);
+		return NULL;
+	}
+	/* INVALID_ARRAY_INDEX indicates empty key */
+	for (i = 0; i < entries_num; i++)
+		tbl->keys[i].start_index = INVALID_ARRAY_INDEX;
+	tbl->max_key_num = entries_num;
+
+	return tbl;
+}
+
+void
+gro_tcp4_tbl_destroy(void *tbl)
+{
+	struct gro_tcp4_tbl *tcp_tbl = tbl;
+
+	if (tcp_tbl) {
+		rte_free(tcp_tbl->items);
+		rte_free(tcp_tbl->keys);
+	}
+	rte_free(tcp_tbl);
+}
+
+/*
+ * merge two TCP/IPv4 packets without updating checksums.
+ * If cmp is larger than 0, append the new packet to the
+ * original packet. Otherwise, prepend the new packet to
+ * the original packet.
+ */
+static inline int
+merge_two_tcp4_packets(struct gro_tcp4_item *item_src,
+		struct rte_mbuf *pkt,
+		uint16_t ip_id,
+		uint32_t sent_seq,
+		int cmp)
+{
+	struct rte_mbuf *pkt_head, *pkt_tail, *lastseg;
+	uint16_t tcp_datalen;
+
+	if (cmp > 0) {
+		pkt_head = item_src->firstseg;
+		pkt_tail = pkt;
+	} else {
+		pkt_head = pkt;
+		pkt_tail = item_src->firstseg;
+	}
+
+	/* check if the packet length will be beyond the max value */
+	tcp_datalen = pkt_tail->pkt_len - pkt_tail->l2_len -
+		pkt_tail->l3_len - pkt_tail->l4_len;
+	if (pkt_head->pkt_len - pkt_head->l2_len + tcp_datalen >
+			TCP4_MAX_L3_LENGTH)
+		return 0;
+
+	/* remove packet header for the tail packet */
+	rte_pktmbuf_adj(pkt_tail,
+			pkt_tail->l2_len +
+			pkt_tail->l3_len +
+			pkt_tail->l4_len);
+
+	/* chain two packets together */
+	if (cmp > 0) {
+		item_src->lastseg->next = pkt;
+		item_src->lastseg = rte_pktmbuf_lastseg(pkt);
+		/* update IP ID to the larger value */
+		item_src->ip_id = ip_id;
+	} else {
+		lastseg = rte_pktmbuf_lastseg(pkt);
+		lastseg->next = item_src->firstseg;
+		item_src->firstseg = pkt;
+		/* update sent_seq to the smaller value */
+		item_src->sent_seq = sent_seq;
+	}
+	item_src->nb_merged++;
+
+	/* update mbuf metadata for the merged packet */
+	pkt_head->nb_segs += pkt_tail->nb_segs;
+	pkt_head->pkt_len += pkt_tail->pkt_len;
+
+	return 1;
+}
+
+static inline int
+check_seq_option(struct gro_tcp4_item *item,
+		struct tcp_hdr *tcp_hdr,
+		uint16_t tcp_hl,
+		uint16_t tcp_dl,
+		uint16_t ip_id,
+		uint32_t sent_seq)
+{
+	struct rte_mbuf *pkt0 = item->firstseg;
+	struct ipv4_hdr *ipv4_hdr0;
+	struct tcp_hdr *tcp_hdr0;
+	uint16_t tcp_hl0, tcp_dl0;
+	uint16_t len;
+
+	ipv4_hdr0 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt0, char *) +
+			pkt0->l2_len);
+	tcp_hdr0 = (struct tcp_hdr *)((char *)ipv4_hdr0 + pkt0->l3_len);
+	tcp_hl0 = pkt0->l4_len;
+
+	/* check if TCP option fields equal. If not, return 0. */
+	len = RTE_MAX(tcp_hl, tcp_hl0) - sizeof(struct tcp_hdr);
+	if ((tcp_hl != tcp_hl0) ||
+			((len > 0) && (memcmp(tcp_hdr + 1,
+					tcp_hdr0 + 1,
+					len) != 0)))
+		return 0;
+
+	/* check if the two packets are neighbors */
+	tcp_dl0 = pkt0->pkt_len - pkt0->l2_len - pkt0->l3_len - tcp_hl0;
+	if ((sent_seq == (item->sent_seq + tcp_dl0)) &&
+			(ip_id == (item->ip_id + 1)))
+		/* append the new packet */
+		return 1;
+	else if (((sent_seq + tcp_dl) == item->sent_seq) &&
+			((ip_id + item->nb_merged) == item->ip_id))
+		/* prepend the new packet */
+		return -1;
+	else
+		return 0;
+}
+
+static inline uint32_t
+find_an_empty_item(struct gro_tcp4_tbl *tbl)
+{
+	uint32_t i;
+	uint32_t max_item_num = tbl->max_item_num;
+
+	for (i = 0; i < max_item_num; i++)
+		if (tbl->items[i].firstseg == NULL)
+			return i;
+	return INVALID_ARRAY_INDEX;
+}
+
+static inline uint32_t
+find_an_empty_key(struct gro_tcp4_tbl *tbl)
+{
+	uint32_t i;
+	uint32_t max_key_num = tbl->max_key_num;
+
+	for (i = 0; i < max_key_num; i++)
+		if (tbl->keys[i].start_index == INVALID_ARRAY_INDEX)
+			return i;
+	return INVALID_ARRAY_INDEX;
+}
+
+static inline uint32_t
+insert_new_item(struct gro_tcp4_tbl *tbl,
+		struct rte_mbuf *pkt,
+		uint16_t ip_id,
+		uint32_t sent_seq,
+		uint32_t prev_idx,
+		uint64_t start_time)
+{
+	uint32_t item_idx;
+
+	item_idx = find_an_empty_item(tbl);
+	if (item_idx == INVALID_ARRAY_INDEX)
+		return INVALID_ARRAY_INDEX;
+
+	tbl->items[item_idx].firstseg = pkt;
+	tbl->items[item_idx].lastseg = rte_pktmbuf_lastseg(pkt);
+	tbl->items[item_idx].start_time = start_time;
+	tbl->items[item_idx].next_pkt_idx = INVALID_ARRAY_INDEX;
+	tbl->items[item_idx].sent_seq = sent_seq;
+	tbl->items[item_idx].ip_id = ip_id;
+	tbl->items[item_idx].nb_merged = 1;
+	tbl->item_num++;
+
+	/* if the previous packet exists, chain the new one with it */
+	if (prev_idx != INVALID_ARRAY_INDEX) {
+		tbl->items[item_idx].next_pkt_idx =
+			tbl->items[prev_idx].next_pkt_idx;
+		tbl->items[prev_idx].next_pkt_idx = item_idx;
+	}
+
+	return item_idx;
+}
+
+static inline uint32_t
+delete_item(struct gro_tcp4_tbl *tbl, uint32_t item_idx,
+		uint32_t prev_item_idx)
+{
+	uint32_t next_idx = tbl->items[item_idx].next_pkt_idx;
+
+	/* set firstseg to NULL to indicate it's an empty item */
+	tbl->items[item_idx].firstseg = NULL;
+	tbl->item_num--;
+	if (prev_item_idx != INVALID_ARRAY_INDEX)
+		tbl->items[prev_item_idx].next_pkt_idx = next_idx;
+
+	return next_idx;
+}
+
+static inline uint32_t
+insert_new_key(struct gro_tcp4_tbl *tbl,
+		struct tcp4_key *key_src,
+		uint32_t item_idx)
+{
+	struct tcp4_key *key_dst;
+	uint32_t key_idx;
+
+	key_idx = find_an_empty_key(tbl);
+	if (key_idx == INVALID_ARRAY_INDEX)
+		return INVALID_ARRAY_INDEX;
+
+	key_dst = &(tbl->keys[key_idx].key);
+
+	ether_addr_copy(&(key_src->eth_saddr), &(key_dst->eth_saddr));
+	ether_addr_copy(&(key_src->eth_daddr), &(key_dst->eth_daddr));
+	key_dst->ip_src_addr = key_src->ip_src_addr;
+	key_dst->ip_dst_addr = key_src->ip_dst_addr;
+	key_dst->recv_ack = key_src->recv_ack;
+	key_dst->src_port = key_src->src_port;
+	key_dst->dst_port = key_src->dst_port;
+
+	/* non-INVALID_ARRAY_INDEX value indicates this key is valid */
+	tbl->keys[key_idx].start_index = item_idx;
+	tbl->key_num++;
+
+	return key_idx;
+}
+
+static inline int
+is_same_key(struct tcp4_key k1, struct tcp4_key k2)
+{
+	if (is_same_ether_addr(&k1.eth_saddr, &k2.eth_saddr) == 0)
+		return 0;
+
+	if (is_same_ether_addr(&k1.eth_daddr, &k2.eth_daddr) == 0)
+		return 0;
+
+	return ((k1.ip_src_addr == k2.ip_src_addr) &&
+			(k1.ip_dst_addr == k2.ip_dst_addr) &&
+			(k1.recv_ack == k2.recv_ack) &&
+			(k1.src_port == k2.src_port) &&
+			(k1.dst_port == k2.dst_port));
+}
+
+/*
+ * update the IP header total length for the flushed packet.
+ */
+static inline void
+update_header(struct gro_tcp4_item *item)
+{
+	struct ipv4_hdr *ipv4_hdr;
+	struct rte_mbuf *pkt = item->firstseg;
+
+	ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
+			pkt->l2_len);
+	ipv4_hdr->total_length = rte_cpu_to_be_16(pkt->pkt_len -
+			pkt->l2_len);
+}
+
+int32_t
+gro_tcp4_reassemble(struct rte_mbuf *pkt,
+		struct gro_tcp4_tbl *tbl,
+		uint64_t start_time)
+{
+	struct ether_hdr *eth_hdr;
+	struct ipv4_hdr *ipv4_hdr;
+	struct tcp_hdr *tcp_hdr;
+	uint32_t sent_seq;
+	uint16_t tcp_dl, ip_id;
+
+	struct tcp4_key key;
+	uint32_t cur_idx, prev_idx, item_idx;
+	uint32_t i, max_key_num;
+	int cmp;
+
+	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
+	ipv4_hdr = (struct ipv4_hdr *)((char *)eth_hdr + pkt->l2_len);
+	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + pkt->l3_len);
+
+	/*
+	 * if FIN, SYN, RST, PSH, URG, ECE or
+	 * CWR is set, return immediately.
+	 */
+	if (tcp_hdr->tcp_flags != TCP_ACK_FLAG)
+		return -1;
+	/* if payload length is 0, return immediately */
+	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - pkt->l3_len -
+		pkt->l4_len;
+	if (tcp_dl == 0)
+		return -1;
+
+	ip_id = rte_be_to_cpu_16(ipv4_hdr->packet_id);
+	sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq);
+
+	ether_addr_copy(&(eth_hdr->s_addr), &(key.eth_saddr));
+	ether_addr_copy(&(eth_hdr->d_addr), &(key.eth_daddr));
+	key.ip_src_addr = ipv4_hdr->src_addr;
+	key.ip_dst_addr = ipv4_hdr->dst_addr;
+	key.src_port = tcp_hdr->src_port;
+	key.dst_port = tcp_hdr->dst_port;
+	key.recv_ack = tcp_hdr->recv_ack;
+
+	/* search for a key */
+	max_key_num = tbl->max_key_num;
+	for (i = 0; i < max_key_num; i++) {
+		if ((tbl->keys[i].start_index != INVALID_ARRAY_INDEX) &&
+				is_same_key(tbl->keys[i].key, key))
+			break;
+	}
+
+	/* can't find a key, so insert a new key and a new item. */
+	if (i == tbl->max_key_num) {
+		item_idx = insert_new_item(tbl, pkt, ip_id, sent_seq,
+				INVALID_ARRAY_INDEX, start_time);
+		if (item_idx == INVALID_ARRAY_INDEX)
+			return -1;
+		if (insert_new_key(tbl, &key, item_idx) ==
+				INVALID_ARRAY_INDEX) {
+			/*
+			 * fail to insert a new key, so
+			 * delete the inserted item
+			 */
+			delete_item(tbl, item_idx, INVALID_ARRAY_INDEX);
+			return -1;
+		}
+		return 0;
+	}
+
+	/* traverse all packets in the item group to find one to merge */
+	cur_idx = tbl->keys[i].start_index;
+	prev_idx = cur_idx;
+	do {
+		cmp = check_seq_option(&(tbl->items[cur_idx]), tcp_hdr,
+				pkt->l4_len, tcp_dl, ip_id, sent_seq);
+		if (cmp) {
+			if (merge_two_tcp4_packets(&(tbl->items[cur_idx]),
+						pkt, ip_id,
+						sent_seq, cmp))
+				return 1;
+			/*
+			 * fail to merge two packets since the packet
+			 * length will be greater than the max value.
+			 * So insert the packet into the item group.
+			 */
+			if (insert_new_item(tbl, pkt, ip_id, sent_seq,
+						prev_idx, start_time) ==
+					INVALID_ARRAY_INDEX)
+				return -1;
+			return 0;
+		}
+		prev_idx = cur_idx;
+		cur_idx = tbl->items[cur_idx].next_pkt_idx;
+	} while (cur_idx != INVALID_ARRAY_INDEX);
+
+	/*
+	 * can't find a packet in the item group to merge,
+	 * so insert the packet into the item group.
+	 */
+	if (insert_new_item(tbl, pkt, ip_id, sent_seq, prev_idx,
+				start_time) == INVALID_ARRAY_INDEX)
+		return -1;
+
+	return 0;
+}
+
+uint16_t
+gro_tcp4_tbl_timeout_flush(struct gro_tcp4_tbl *tbl,
+		uint64_t flush_timestamp,
+		struct rte_mbuf **out,
+		uint16_t nb_out)
+{
+	uint16_t k = 0;
+	uint32_t i, j;
+	uint32_t max_key_num = tbl->max_key_num;
+
+	for (i = 0; i < max_key_num; i++) {
+		/* all keys have been checked, return immediately */
+		if (tbl->key_num == 0)
+			return k;
+
+		j = tbl->keys[i].start_index;
+		while (j != INVALID_ARRAY_INDEX) {
+			if (tbl->items[j].start_time <= flush_timestamp) {
+				out[k++] = tbl->items[j].firstseg;
+				if (tbl->items[j].nb_merged > 1)
+					update_header(&(tbl->items[j]));
+				/*
+				 * delete the item and get
+				 * the next packet index
+				 */
+				j = delete_item(tbl, j,
+						INVALID_ARRAY_INDEX);
+
+				/*
+				 * delete the key as all of
+				 * packets are flushed
+				 */
+				if (j == INVALID_ARRAY_INDEX) {
+					tbl->keys[i].start_index =
+						INVALID_ARRAY_INDEX;
+					tbl->key_num--;
+				} else
+					/* update start_index of the key */
+					tbl->keys[i].start_index = j;
+
+				if (k == nb_out)
+					return k;
+			} else
+				/*
+				 * the remaining packets of this key
+				 * won't time out, so check other keys.
+				 */
+				break;
+		}
+	}
+	return k;
+}
+
+uint32_t
+gro_tcp4_tbl_pkt_count(void *tbl)
+{
+	struct gro_tcp4_tbl *gro_tbl = tbl;
+
+	if (gro_tbl)
+		return gro_tbl->item_num;
+
+	return 0;
+}
diff --git a/lib/librte_gro/gro_tcp4.h b/lib/librte_gro/gro_tcp4.h
new file mode 100644
index 0000000..f41dcee
--- /dev/null
+++ b/lib/librte_gro/gro_tcp4.h
@@ -0,0 +1,210 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _GRO_TCP4_H_
+#define _GRO_TCP4_H_
+
+#define INVALID_ARRAY_INDEX 0xffffffffUL
+#define GRO_TCP4_TBL_MAX_ITEM_NUM (1024UL * 1024UL)
+
+/*
+ * the max L3 length of a TCP/IPv4 packet. The L3 length
+ * is the sum of ipv4 header, tcp header and L4 payload.
+ */
+#define TCP4_MAX_L3_LENGTH UINT16_MAX
+
+/* criteria for merging packets */
+struct tcp4_key {
+	struct ether_addr eth_saddr;
+	struct ether_addr eth_daddr;
+	uint32_t ip_src_addr;
+	uint32_t ip_dst_addr;
+
+	uint32_t recv_ack;
+	uint16_t src_port;
+	uint16_t dst_port;
+};
+
+struct gro_tcp4_key {
+	struct tcp4_key key;
+	/*
+	 * the index of the first packet in the item group.
+	 * If the value is INVALID_ARRAY_INDEX, it means
+	 * the key is empty.
+	 */
+	uint32_t start_index;
+};
+
+struct gro_tcp4_item {
+	/*
+	 * first segment of the packet. If the value
+	 * is NULL, it means the item is empty.
+	 */
+	struct rte_mbuf *firstseg;
+	/* last segment of the packet */
+	struct rte_mbuf *lastseg;
+	/*
+	 * the time when the first packet is inserted
+	 * into the table. If a packet in the table is
+	 * merged with an incoming packet, this value
+	 * won't be updated. We set this value only
+	 * when the first packet is inserted into the
+	 * table.
+	 */
+	uint64_t start_time;
+	/*
+	 * we use next_pkt_idx to chain the packets that
+	 * have same key value but can't be merged together.
+	 */
+	uint32_t next_pkt_idx;
+	/* the sequence number of the packet */
+	uint32_t sent_seq;
+	/* the IP ID of the packet */
+	uint16_t ip_id;
+	/* the number of merged packets */
+	uint16_t nb_merged;
+};
+
+/*
+ * TCP/IPv4 reassembly table structure.
+ */
+struct gro_tcp4_tbl {
+	/* item array */
+	struct gro_tcp4_item *items;
+	/* key array */
+	struct gro_tcp4_key *keys;
+	/* current item number */
+	uint32_t item_num;
+	/* current key num */
+	uint32_t key_num;
+	/* item array size */
+	uint32_t max_item_num;
+	/* key array size */
+	uint32_t max_key_num;
+};
+
+/**
+ * This function creates a TCP/IPv4 reassembly table.
+ *
+ * @param socket_id
+ *  socket index for allocating the TCP/IPv4 reassembly table
+ * @param max_flow_num
+ *  the maximum number of flows in the TCP/IPv4 GRO table
+ * @param max_item_per_flow
+ *  the maximum packet number per flow.
+ *
+ * @return
+ *  on success, return a pointer to the created TCP/IPv4 GRO table;
+ *  otherwise, return NULL.
+ */
+void *gro_tcp4_tbl_create(uint16_t socket_id,
+		uint16_t max_flow_num,
+		uint16_t max_item_per_flow);
+
+/**
+ * This function destroys a TCP/IPv4 reassembly table.
+ *
+ * @param tbl
+ *  a pointer to the TCP/IPv4 reassembly table.
+ */
+void gro_tcp4_tbl_destroy(void *tbl);
+
+/**
+ * This function searches for a packet in the TCP/IPv4 reassembly table
+ * to merge with the input one. Merging two packets means chaining them
+ * together and updating packet headers. Packets whose SYN, FIN, RST,
+ * PSH, CWR, ECE or URG bit is set are returned immediately. Packets
+ * which only have packet headers (i.e. without data) are also returned
+ * immediately. Otherwise, the packet is either merged or inserted into
+ * the table. Besides, if there is no available space to insert the
+ * packet, this function also returns immediately.
+ *
+ * This function assumes the input packet has correct IPv4 and
+ * TCP checksums, and if two packets are merged, it won't re-calculate
+ * IPv4 and TCP checksums. Besides, if the input packet is IP
+ * fragmented, it assumes the packet is complete (with its TCP header).
+ *
+ * @param pkt
+ *  packet to reassemble.
+ * @param tbl
+ *  a pointer that points to a TCP/IPv4 reassembly table.
+ * @param start_time
+ *  the time when the packet is inserted into the table
+ *
+ * @return
+ *  if the packet doesn't have data, or its SYN, FIN, RST, PSH, CWR,
+ *  ECE or URG bit is set, or there is no available space in the table
+ *  to insert a new item or a new key, return a negative value. If the
+ *  packet is merged successfully, return a positive value. If the
+ *  packet is inserted into the table, return 0.
+ */
+int32_t gro_tcp4_reassemble(struct rte_mbuf *pkt,
+		struct gro_tcp4_tbl *tbl,
+		uint64_t start_time);
+
+/**
+ * This function flushes timeout packets in a TCP/IPv4 reassembly table
+ * to applications without updating checksums for merged packets.
+ * The max number of flushed timeout packets is the element number of
+ * the array which is used to keep flushed packets.
+ *
+ * @param tbl
+ *  a pointer that points to a TCP/IPv4 reassembly table.
+ * @param flush_timestamp
+ *  this function flushes packets which are inserted into the table
+ *  before or at the flush_timestamp.
+ * @param out
+ *  pointer array which is used to keep flushed packets.
+ * @param nb_out
+ *  the element number of out. It's also the max number of timeout
+ *  packets that can be flushed finally.
+ *
+ * @return
+ *  the number of packets that are returned.
+ */
+uint16_t gro_tcp4_tbl_timeout_flush(struct gro_tcp4_tbl *tbl,
+		uint64_t flush_timestamp,
+		struct rte_mbuf **out,
+		uint16_t nb_out);
+
+/**
+ * This function returns the number of the packets in a TCP/IPv4
+ * reassembly table.
+ *
+ * @param tbl
+ *  a pointer to a TCP/IPv4 reassembly table.
+ *
+ * @return
+ *  the number of packets in the table
+ */
+uint32_t gro_tcp4_tbl_pkt_count(void *tbl);
+#endif
diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
index fa6d7ce..4998b90 100644
--- a/lib/librte_gro/rte_gro.c
+++ b/lib/librte_gro/rte_gro.c
@@ -32,8 +32,11 @@
 
 #include <rte_malloc.h>
 #include <rte_mbuf.h>
+#include <rte_cycles.h>
+#include <rte_ethdev.h>
 
 #include "rte_gro.h"
+#include "gro_tcp4.h"
 
 typedef void *(*gro_tbl_create_fn)(uint16_t socket_id,
 		uint16_t max_flow_num,
@@ -41,9 +44,12 @@ typedef void *(*gro_tbl_create_fn)(uint16_t socket_id,
 typedef void (*gro_tbl_destroy_fn)(void *tbl);
 typedef uint32_t (*gro_tbl_pkt_count_fn)(void *tbl);
 
-static gro_tbl_create_fn tbl_create_fn[RTE_GRO_TYPE_MAX_NUM];
-static gro_tbl_destroy_fn tbl_destroy_fn[RTE_GRO_TYPE_MAX_NUM];
-static gro_tbl_pkt_count_fn tbl_pkt_count_fn[RTE_GRO_TYPE_MAX_NUM];
+static gro_tbl_create_fn tbl_create_fn[RTE_GRO_TYPE_MAX_NUM] = {
+		gro_tcp4_tbl_create, NULL};
+static gro_tbl_destroy_fn tbl_destroy_fn[RTE_GRO_TYPE_MAX_NUM] = {
+			gro_tcp4_tbl_destroy, NULL};
+static gro_tbl_pkt_count_fn tbl_pkt_count_fn[RTE_GRO_TYPE_MAX_NUM] = {
+			gro_tcp4_tbl_pkt_count, NULL};
 
 /*
  * GRO context structure, which is used to merge packets. It keeps
@@ -121,28 +127,131 @@ rte_gro_ctx_destroy(void *ctx)
 }
 
 uint16_t
-rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
+rte_gro_reassemble_burst(struct rte_mbuf **pkts,
 		uint16_t nb_pkts,
-		const struct rte_gro_param *param __rte_unused)
+		const struct rte_gro_param *param)
 {
-	return nb_pkts;
+	uint16_t i;
+	uint16_t nb_after_gro = nb_pkts;
+	uint32_t item_num;
+
+	/* allocate a reassembly table for TCP/IPv4 GRO */
+	struct gro_tcp4_tbl tcp_tbl;
+	struct gro_tcp4_key tcp_keys[RTE_GRO_MAX_BURST_ITEM_NUM];
+	struct gro_tcp4_item tcp_items[RTE_GRO_MAX_BURST_ITEM_NUM] = {{0} };
+
+	struct rte_mbuf *unprocess_pkts[nb_pkts];
+	uint16_t unprocess_num = 0;
+	int32_t ret;
+	uint64_t current_time;
+
+	if ((param->gro_types & RTE_GRO_TCP_IPV4) == 0)
+		return nb_pkts;
+
+	/* get the actual number of packets */
+	item_num = RTE_MIN(nb_pkts, (param->max_flow_num *
+			param->max_item_per_flow));
+	item_num = RTE_MIN(item_num, RTE_GRO_MAX_BURST_ITEM_NUM);
+
+	for (i = 0; i < item_num; i++)
+		tcp_keys[i].start_index = INVALID_ARRAY_INDEX;
+
+	tcp_tbl.keys = tcp_keys;
+	tcp_tbl.items = tcp_items;
+	tcp_tbl.key_num = 0;
+	tcp_tbl.item_num = 0;
+	tcp_tbl.max_key_num = item_num;
+	tcp_tbl.max_item_num = item_num;
+
+	current_time = rte_rdtsc();
+
+	for (i = 0; i < nb_pkts; i++) {
+		if ((pkts[i]->packet_type & (RTE_PTYPE_L3_IPV4 |
+					RTE_PTYPE_L4_TCP)) ==
+				(RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_TCP)) {
+			ret = gro_tcp4_reassemble(pkts[i],
+					&tcp_tbl,
+					current_time);
+			if (ret > 0)
+				/* merge successfully */
+				nb_after_gro--;
+			else if (ret < 0) {
+				unprocess_pkts[unprocess_num++] =
+					pkts[i];
+			}
+		} else
+			unprocess_pkts[unprocess_num++] = pkts[i];
+	}
+
+	/* re-arrange GROed packets */
+	if (nb_after_gro < nb_pkts) {
+		i = gro_tcp4_tbl_timeout_flush(&tcp_tbl, current_time,
+				pkts, nb_pkts);
+		if (unprocess_num > 0) {
+			memcpy(&pkts[i], unprocess_pkts,
+					sizeof(struct rte_mbuf *) *
+					unprocess_num);
+		}
+	}
+
+	return nb_after_gro;
 }
 
 uint16_t
-rte_gro_reassemble(struct rte_mbuf **pkts __rte_unused,
+rte_gro_reassemble(struct rte_mbuf **pkts,
 		uint16_t nb_pkts,
-		void *ctx __rte_unused)
+		void *ctx)
 {
-	return nb_pkts;
+	uint16_t i, unprocess_num = 0;
+	struct rte_mbuf *unprocess_pkts[nb_pkts];
+	struct gro_ctx *gro_ctx = ctx;
+	uint64_t current_time;
+
+	if ((gro_ctx->gro_types & RTE_GRO_TCP_IPV4) == 0)
+		return nb_pkts;
+
+	current_time = rte_rdtsc();
+
+	for (i = 0; i < nb_pkts; i++) {
+		if ((pkts[i]->packet_type & (RTE_PTYPE_L3_IPV4 |
+					RTE_PTYPE_L4_TCP)) ==
+				(RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_TCP)) {
+			if (gro_tcp4_reassemble(pkts[i],
+						gro_ctx->tbls
+						[RTE_GRO_TCP_IPV4_INDEX],
+						current_time) < 0)
+				unprocess_pkts[unprocess_num++] = pkts[i];
+		} else
+			unprocess_pkts[unprocess_num++] = pkts[i];
+	}
+	if (unprocess_num > 0) {
+		memcpy(pkts, unprocess_pkts,
+				sizeof(struct rte_mbuf *) *
+				unprocess_num);
+	}
+
+	return unprocess_num;
 }
 
 uint16_t
-rte_gro_timeout_flush(void *ctx __rte_unused,
-		uint64_t timeout_cycles __rte_unused,
-		uint64_t gro_types __rte_unused,
-		struct rte_mbuf **out __rte_unused,
-		uint16_t max_nb_out __rte_unused)
+rte_gro_timeout_flush(void *ctx,
+		uint64_t timeout_cycles,
+		uint64_t gro_types,
+		struct rte_mbuf **out,
+		uint16_t max_nb_out)
 {
+	struct gro_ctx *gro_ctx = ctx;
+	uint64_t flush_timestamp;
+
+	gro_types = gro_types & gro_ctx->gro_types;
+	flush_timestamp = rte_rdtsc() - timeout_cycles;
+
+	if (gro_types & RTE_GRO_TCP_IPV4) {
+		return gro_tcp4_tbl_timeout_flush(
+				gro_ctx->tbls[RTE_GRO_TCP_IPV4_INDEX],
+				flush_timestamp,
+				out, max_nb_out);
+	}
 	return 0;
 }
 
diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
index 53ddd15..3b4fec0 100644
--- a/lib/librte_gro/rte_gro.h
+++ b/lib/librte_gro/rte_gro.h
@@ -45,8 +45,11 @@ extern "C" {
 /**< max number of supported GRO types */
 #define RTE_GRO_TYPE_MAX_NUM 64
 /**< current supported GRO num */
-#define RTE_GRO_TYPE_SUPPORT_NUM 0
+#define RTE_GRO_TYPE_SUPPORT_NUM 1
 
+/**< TCP/IPv4 GRO flag */
+#define RTE_GRO_TCP_IPV4_INDEX 0
+#define RTE_GRO_TCP_IPV4 (1ULL << RTE_GRO_TCP_IPV4_INDEX)
 
 struct rte_gro_param {
 	/**< desired GRO types */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 141+ messages in thread
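
The two entry points above give the library its two modes of operation.
rte_gro_reassemble_burst() is the lightweight mode: the reassembly table
lives on the stack for the duration of the call and only packets within
one burst can be merged, so the caller just fills in a struct
rte_gro_param. A minimal sketch, assuming the API as merged in DPDK
17.08 (the burst size and flow/item limits below are illustrative, not
values taken from the patch):

#include <rte_ethdev.h>
#include <rte_gro.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32	/* illustrative burst size */

static uint16_t
rx_burst_with_gro(uint8_t port_id, uint16_t queue_id,
		struct rte_mbuf **pkts)
{
	struct rte_gro_param param = {
		.gro_types = RTE_GRO_TCP_IPV4,
		.max_flow_num = 4,
		.max_item_per_flow = BURST_SIZE,
	};
	uint16_t nb_rx;

	nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts, BURST_SIZE);
	if (nb_rx == 0)
		return 0;
	/* merge TCP/IPv4 packets in place; returns the new burst size */
	return rte_gro_reassemble_burst(pkts, nb_rx, &param);
}

rte_gro_reassemble() and rte_gro_timeout_flush() form the heavyweight
mode: packets which cannot be merged immediately stay in the context's
tables until a later flush, so merging works across bursts. A sketch
under the same assumptions (the ~1 ms timeout and the flush buffer size
are arbitrary, and the socket_id field follows the 17.08 rte_gro_param
definition):

#include <rte_common.h>
#include <rte_cycles.h>
#include <rte_gro.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>

static void *gro_ctx;	/* in a real application, created per lcore */

static void
gro_ctx_mode(struct rte_mbuf **pkts, uint16_t nb_pkts)
{
	struct rte_mbuf *flushed[32];
	uint16_t nb_kept, nb_flushed;

	if (gro_ctx == NULL) {
		struct rte_gro_param param = {
			.gro_types = RTE_GRO_TCP_IPV4,
			.max_flow_num = 4,
			.max_item_per_flow = 32,
			.socket_id = rte_socket_id(),
		};
		gro_ctx = rte_gro_ctx_create(&param);
		if (gro_ctx == NULL)
			return;	/* table allocation failed */
	}
	/* merge or buffer; unprocessed packets stay at the front of pkts[] */
	nb_kept = rte_gro_reassemble(pkts, nb_pkts, gro_ctx);

	/* evict TCP/IPv4 packets buffered for longer than ~1 ms */
	nb_flushed = rte_gro_timeout_flush(gro_ctx,
			rte_get_tsc_hz() / 1000, RTE_GRO_TCP_IPV4,
			flushed, 32);

	/* a real application would now transmit pkts[0..nb_kept) and
	 * flushed[0..nb_flushed)
	 */
	RTE_SET_USED(nb_kept);
	RTE_SET_USED(nb_flushed);
}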

* [PATCH v14 3/3] app/testpmd: enable TCP/IPv4 GRO
  2017-07-09  5:46                         ` [PATCH v14 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
  2017-07-09  5:46                           ` [PATCH v14 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
  2017-07-09  5:46                           ` [PATCH v14 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
@ 2017-07-09  5:46                           ` Jiayu Hu
  2017-07-09  7:59                             ` Yao, Lei A
  2017-07-09 16:14                           ` [PATCH v14 0/3] Support TCP/IPv4 GRO in DPDK Thomas Monjalon
  3 siblings, 1 reply; 141+ messages in thread
From: Jiayu Hu @ 2017-07-09  5:46 UTC (permalink / raw)
  To: dev
  Cc: jianfeng.tan, konstantin.ananyev, yliu, stephen, jingjing.wu,
	lei.a.yao, Jiayu Hu

This patch enables the TCP/IPv4 GRO library in the csum forwarding
engine. By default, GRO is turned off. Users can use the command
"gro (on|off) (port_id)" to enable or disable GRO for a given port.
If GRO is enabled on a port, GRO is performed on all TCP/IPv4 packets
received from that port. In addition, users can set the max flow number
and the max packet number per flow with the command "gro set
(max_flow_num) (max_item_num_per_flow) (port_id)".

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
---
 app/test-pmd/cmdline.c                      | 125 ++++++++++++++++++++++++++++
 app/test-pmd/config.c                       |  36 ++++++++
 app/test-pmd/csumonly.c                     |   5 ++
 app/test-pmd/testpmd.c                      |   3 +
 app/test-pmd/testpmd.h                      |  10 +++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  34 ++++++++
 6 files changed, 213 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index d66e9c8..d4ff608 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -76,6 +76,7 @@
 #include <rte_devargs.h>
 #include <rte_eth_ctrl.h>
 #include <rte_flow.h>
+#include <rte_gro.h>
 
 #include <cmdline_rdline.h>
 #include <cmdline_parse.h>
@@ -423,6 +424,14 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"tso show (portid)"
 			"    Display the status of TCP Segmentation Offload.\n\n"
 
+			"gro (on|off) (port_id)"
+			"    Enable or disable Generic Receive Offload in"
+			" csum forwarding engine.\n\n"
+
+			"gro set (max_flow_num) (max_item_num_per_flow) (port_id)\n"
+			"    Set max flow number and max packet number per-flow"
+			" for GRO.\n\n"
+
 			"set fwd (%s)\n"
 			"    Set packet forwarding mode.\n\n"
 
@@ -3838,6 +3847,120 @@ cmdline_parse_inst_t cmd_tunnel_tso_show = {
 	},
 };
 
+/* *** SET GRO FOR A PORT *** */
+struct cmd_gro_result {
+	cmdline_fixed_string_t cmd_keyword;
+	cmdline_fixed_string_t mode;
+	uint8_t port_id;
+};
+
+static void
+cmd_enable_gro_parsed(void *parsed_result,
+		__attribute__((unused)) struct cmdline *cl,
+		__attribute__((unused)) void *data)
+{
+	struct cmd_gro_result *res;
+
+	res = parsed_result;
+	setup_gro(res->mode, res->port_id);
+}
+
+cmdline_parse_token_string_t cmd_gro_keyword =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_result,
+			cmd_keyword, "gro");
+cmdline_parse_token_string_t cmd_gro_mode =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_result,
+			mode, "on#off");
+cmdline_parse_token_num_t cmd_gro_pid =
+	TOKEN_NUM_INITIALIZER(struct cmd_gro_result,
+			port_id, UINT8);
+
+cmdline_parse_inst_t cmd_enable_gro = {
+	.f = cmd_enable_gro_parsed,
+	.data = NULL,
+	.help_str = "gro (on|off) (port_id)",
+	.tokens = {
+		(void *)&cmd_gro_keyword,
+		(void *)&cmd_gro_mode,
+		(void *)&cmd_gro_pid,
+		NULL,
+	},
+};
+
+/* *** SET MAX FLOW NUMBER AND ITEM NUM PER FLOW FOR GRO *** */
+struct cmd_gro_set_result {
+	cmdline_fixed_string_t gro;
+	cmdline_fixed_string_t mode;
+	uint16_t flow_num;
+	uint16_t item_num_per_flow;
+	uint8_t port_id;
+};
+
+static void
+cmd_gro_set_parsed(void *parsed_result,
+		       __attribute__((unused)) struct cmdline *cl,
+		       __attribute__((unused)) void *data)
+{
+	struct cmd_gro_set_result *res = parsed_result;
+
+	if (port_id_is_invalid(res->port_id, ENABLED_WARN))
+		return;
+	if (test_done == 0) {
+		printf("Before setting GRO flow_num and item_num_per_flow,"
+				" please stop forwarding first\n");
+		return;
+	}
+
+	if (!strcmp(res->mode, "set")) {
+		if (res->flow_num == 0)
+			printf("Invalid flow number. Revert to default value:"
+					" %u.\n", GRO_DEFAULT_FLOW_NUM);
+		else
+			gro_ports[res->port_id].param.max_flow_num =
+				res->flow_num;
+
+		if (res->item_num_per_flow == 0)
+			printf("Invalid item number per-flow. Revert"
+					" to default value:%u.\n",
+					GRO_DEFAULT_ITEM_NUM_PER_FLOW);
+		else
+			gro_ports[res->port_id].param.max_item_per_flow =
+				res->item_num_per_flow;
+	}
+}
+
+cmdline_parse_token_string_t cmd_gro_set_gro =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_set_result,
+				gro, "gro");
+cmdline_parse_token_string_t cmd_gro_set_mode =
+	TOKEN_STRING_INITIALIZER(struct cmd_gro_set_result,
+				mode, "set");
+cmdline_parse_token_num_t cmd_gro_set_flow_num =
+	TOKEN_NUM_INITIALIZER(struct cmd_gro_set_result,
+				flow_num, UINT16);
+cmdline_parse_token_num_t cmd_gro_set_item_num_per_flow =
+	TOKEN_NUM_INITIALIZER(struct cmd_gro_set_result,
+				item_num_per_flow, UINT16);
+cmdline_parse_token_num_t cmd_gro_set_portid =
+	TOKEN_NUM_INITIALIZER(struct cmd_gro_set_result,
+				port_id, UINT8);
+
+cmdline_parse_inst_t cmd_gro_set = {
+	.f = cmd_gro_set_parsed,
+	.data = NULL,
+	.help_str = "gro set <max_flow_num> <max_item_num_per_flow> "
+		"<port_id>: set max flow number and max packet number per-flow "
+		"for GRO",
+	.tokens = {
+		(void *)&cmd_gro_set_gro,
+		(void *)&cmd_gro_set_mode,
+		(void *)&cmd_gro_set_flow_num,
+		(void *)&cmd_gro_set_item_num_per_flow,
+		(void *)&cmd_gro_set_portid,
+		NULL,
+	},
+};
+
 /* *** ENABLE/DISABLE FLUSH ON RX STREAMS *** */
 struct cmd_set_flush_rx {
 	cmdline_fixed_string_t set;
@@ -14035,6 +14158,8 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_tso_show,
 	(cmdline_parse_inst_t *)&cmd_tunnel_tso_set,
 	(cmdline_parse_inst_t *)&cmd_tunnel_tso_show,
+	(cmdline_parse_inst_t *)&cmd_enable_gro,
+	(cmdline_parse_inst_t *)&cmd_gro_set,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_rx,
 	(cmdline_parse_inst_t *)&cmd_link_flow_control_set_tx,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 593677e..e0f0825 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -71,6 +71,7 @@
 #ifdef RTE_LIBRTE_BNXT_PMD
 #include <rte_pmd_bnxt.h>
 #endif
+#include <rte_gro.h>
 
 #include "testpmd.h"
 
@@ -2415,6 +2416,41 @@ set_tx_pkt_segments(unsigned *seg_lengths, unsigned nb_segs)
 	tx_pkt_nb_segs = (uint8_t) nb_segs;
 }
 
+void
+setup_gro(const char *mode, uint8_t port_id)
+{
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		printf("invalid port id %u\n", port_id);
+		return;
+	}
+	if (test_done == 0) {
+		printf("Before enabling/disabling GRO,"
+				" please stop forwarding first\n");
+		return;
+	}
+	if (strcmp(mode, "on") == 0) {
+		if (gro_ports[port_id].enable) {
+			printf("GRO is already enabled on port %u\n", port_id);
+			return;
+		}
+		gro_ports[port_id].enable = 1;
+		gro_ports[port_id].param.gro_types = RTE_GRO_TCP_IPV4;
+
+		if (gro_ports[port_id].param.max_flow_num == 0)
+			gro_ports[port_id].param.max_flow_num =
+				GRO_DEFAULT_FLOW_NUM;
+		if (gro_ports[port_id].param.max_item_per_flow == 0)
+			gro_ports[port_id].param.max_item_per_flow =
+				GRO_DEFAULT_ITEM_NUM_PER_FLOW;
+	} else {
+		if (gro_ports[port_id].enable == 0) {
+			printf("GRO is already disabled on port %u\n", port_id);
+			return;
+		}
+		gro_ports[port_id].enable = 0;
+	}
+}
+
 char*
 list_pkt_forwarding_modes(void)
 {
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 66fc9a0..178ad75 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -71,6 +71,7 @@
 #include <rte_prefetch.h>
 #include <rte_string_fns.h>
 #include <rte_flow.h>
+#include <rte_gro.h>
 #include "testpmd.h"
 
 #define IP_DEFTTL  64   /* from RFC 1340. */
@@ -658,6 +659,10 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 				 nb_pkt_per_burst);
 	if (unlikely(nb_rx == 0))
 		return;
+	if (unlikely(gro_ports[fs->rx_port].enable))
+		nb_rx = rte_gro_reassemble_burst(pkts_burst,
+				nb_rx,
+				&(gro_ports[fs->rx_port].param));
 
 #ifdef RTE_TEST_PMD_RECORD_BURST_STATS
 	fs->rx_burst_stats.pkt_burst_spread[nb_rx]++;
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 308c1b7..e09b803 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -90,6 +90,7 @@
 #ifdef RTE_LIBRTE_LATENCY_STATS
 #include <rte_latencystats.h>
 #endif
+#include <rte_gro.h>
 
 #include "testpmd.h"
 
@@ -379,6 +380,8 @@ lcoreid_t bitrate_lcore_id;
 uint8_t bitrate_enabled;
 #endif
 
+struct gro_status gro_ports[RTE_MAX_ETHPORTS];
+
 /* Forward function declarations */
 static void map_port_queue_stats_mapping_registers(uint8_t pi, struct rte_port *port);
 static void check_all_ports_link_status(uint32_t port_mask);
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 9ccfb6d..73985c3 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -35,6 +35,7 @@
 #define _TESTPMD_H_
 
 #include <rte_pci.h>
+#include <rte_gro.h>
 
 #define RTE_PORT_ALL            (~(portid_t)0x0)
 
@@ -432,6 +433,14 @@ extern struct ether_addr peer_eth_addrs[RTE_MAX_ETHPORTS];
 extern uint32_t burst_tx_delay_time; /**< Burst tx delay time(us) for mac-retry. */
 extern uint32_t burst_tx_retry_num;  /**< Burst tx retry number for mac-retry. */
 
+#define GRO_DEFAULT_FLOW_NUM 4
+#define GRO_DEFAULT_ITEM_NUM_PER_FLOW DEF_PKT_BURST
+struct gro_status {
+	struct rte_gro_param param;
+	uint8_t enable;
+};
+extern struct gro_status gro_ports[RTE_MAX_ETHPORTS];
+
 static inline unsigned int
 lcore_num(void)
 {
@@ -630,6 +639,7 @@ void get_2tuple_filter(uint8_t port_id, uint16_t index);
 void get_5tuple_filter(uint8_t port_id, uint16_t index);
 int rx_queue_id_is_invalid(queueid_t rxq_id);
 int tx_queue_id_is_invalid(queueid_t txq_id);
+void setup_gro(const char *mode, uint8_t port_id);
 
 /* Functions to manage the set of filtered Multicast MAC addresses */
 void mcast_addr_add(uint8_t port_id, struct ether_addr *mc_addr);
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 6c0d526..bb6a667 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -898,6 +898,40 @@ Display the status of TCP Segmentation Offload::
 
    testpmd> tso show (port_id)
 
+gro
+~~~~~~~~
+
+Enable or disable GRO in ``csum`` forwarding engine::
+
+   testpmd> gro (on|off) (port_id)
+
+If enabled, the csum forwarding engine will perform GRO on the TCP/IPv4
+packets received from the given port.
+
+If disabled, GRO won't be performed on packets received from the given
+port. By default, GRO is disabled for all ports.
+
+.. note::
+
+   When GRO is enabled for a port, GRO is performed on the TCP/IPv4
+   packets received from that port. The merged packets are multi-segment
+   ones, and the csum forwarding engine can't compute the TCP checksum of
+   multi-segment packets in SW. So please enable TCP HW checksum
+   calculation on the port to which GROed packets are transmitted.
+
+gro set
+~~~~~~~~
+
+Set max flow number and max packet number per-flow for GRO::
+
+   testpmd> gro set (max_flow_num) (max_item_num_per_flow) (port_id)
+
+The product of ``max_flow_num`` and ``max_item_num_per_flow`` is the max
+number of packets a GRO table can store.
+
+If the current packet number is greater than or equal to this maximum,
+GRO will stop processing incoming packets.
+
 mac_addr add
 ~~~~~~~~~~~~
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 141+ messages in thread
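
Putting the new commands together: with forwarding stopped, an
illustrative testpmd session with receive port 0 and transmit port 1
(the flow and item numbers are arbitrary; HW TCP checksum is enabled on
the TX port because of the multi-segment limitation described in the
note above):

   testpmd> set fwd csum
   testpmd> csum set tcp hw 1
   testpmd> gro set 32 64 0
   testpmd> gro on 0
   testpmd> start

If "gro set" is skipped, "gro on" falls back to GRO_DEFAULT_FLOW_NUM (4)
flows and GRO_DEFAULT_ITEM_NUM_PER_FLOW (DEF_PKT_BURST) items per flow.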

* Re: [PATCH v14 3/3] app/testpmd: enable TCP/IPv4 GRO
  2017-07-09  5:46                           ` [PATCH v14 3/3] app/testpmd: enable TCP/IPv4 GRO Jiayu Hu
@ 2017-07-09  7:59                             ` Yao, Lei A
  0 siblings, 0 replies; 141+ messages in thread
From: Yao, Lei A @ 2017-07-09  7:59 UTC (permalink / raw)
  To: Hu, Jiayu, dev
  Cc: Tan, Jianfeng, Ananyev, Konstantin, yliu, stephen, Wu, Jingjing



> -----Original Message-----
> From: Hu, Jiayu
> Sent: Sunday, July 9, 2017 1:47 PM
> To: dev@dpdk.org
> Cc: Tan, Jianfeng <jianfeng.tan@intel.com>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>; yliu@fridaylinux.org;
> stephen@networkplumber.org; Wu, Jingjing <jingjing.wu@intel.com>; Yao,
> Lei A <lei.a.yao@intel.com>; Hu, Jiayu <jiayu.hu@intel.com>
> Subject: [PATCH v14 3/3] app/testpmd: enable TCP/IPv4 GRO
> 
> This patch enables TCP/IPv4 GRO library in csum forwarding engine.
> By default, GRO is turned off. Users can use command "gro (on|off)
> (port_id)" to enable or disable GRO for a given port. If a port is
> enabled GRO, all TCP/IPv4 packets received from the port are performed
> GRO. Besides, users can set max flow number and packets number per-flow
> by command "gro set (max_flow_num) (max_item_num_per_flow)
> (port_id)".
> 
> Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>

This patch has been verified on a Haswell bench for basic functions and
performance.

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v14 0/3] Support TCP/IPv4 GRO in DPDK
  2017-07-09  5:46                         ` [PATCH v14 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
                                             ` (2 preceding siblings ...)
  2017-07-09  5:46                           ` [PATCH v14 3/3] app/testpmd: enable TCP/IPv4 GRO Jiayu Hu
@ 2017-07-09 16:14                           ` Thomas Monjalon
  2017-07-10  2:21                             ` Hu, Jiayu
  3 siblings, 1 reply; 141+ messages in thread
From: Thomas Monjalon @ 2017-07-09 16:14 UTC (permalink / raw)
  To: Jiayu Hu
  Cc: dev, jianfeng.tan, konstantin.ananyev, yliu, stephen,
	jingjing.wu, lei.a.yao

09/07/2017 07:46, Jiayu Hu:
> Jiayu Hu (3):
>   lib: add Generic Receive Offload API framework
>   lib/gro: add TCP/IPv4 GRO support
>   app/testpmd: enable TCP/IPv4 GRO
> 
>  MAINTAINERS                                 |   4 +
>  app/test-pmd/cmdline.c                      | 125 +++++++
>  app/test-pmd/config.c                       |  36 ++
>  app/test-pmd/csumonly.c                     |   5 +
>  app/test-pmd/testpmd.c                      |   3 +
>  app/test-pmd/testpmd.h                      |  10 +
>  config/common_base                          |   5 +
>  doc/guides/rel_notes/release_17_08.rst      |   7 +
>  doc/guides/testpmd_app_ug/testpmd_funcs.rst |  34 ++
>  lib/Makefile                                |   2 +
>  lib/librte_gro/Makefile                     |  51 +++
>  lib/librte_gro/gro_tcp4.c                   | 505 ++++++++++++++++++++++++++++
>  lib/librte_gro/gro_tcp4.h                   | 210 ++++++++++++
>  lib/librte_gro/rte_gro.c                    | 278 +++++++++++++++
>  lib/librte_gro/rte_gro.h                    | 211 ++++++++++++
>  lib/librte_gro/rte_gro_version.map          |  12 +
>  mk/rte.app.mk                               |   1 +
>  17 files changed, 1499 insertions(+)

I have added an EXPERIMENTAL note in the MAINTAINERS file.
I have added the library to the doxygen doc and to the release notes
libraries list.

Applied with above changes, thanks

A page in the programmer's guide is missing.
Please fill it, thanks

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v14 0/3] Support TCP/IPv4 GRO in DPDK
  2017-07-09 16:14                           ` [PATCH v14 0/3] Support TCP/IPv4 GRO in DPDK Thomas Monjalon
@ 2017-07-10  2:21                             ` Hu, Jiayu
  2017-07-10  7:03                               ` Thomas Monjalon
  0 siblings, 1 reply; 141+ messages in thread
From: Hu, Jiayu @ 2017-07-10  2:21 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, Tan, Jianfeng, Ananyev, Konstantin, yliu, stephen, Wu,
	Jingjing, Yao, Lei A

Hi Thomas,

> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Monday, July 10, 2017 12:14 AM
> To: Hu, Jiayu <jiayu.hu@intel.com>
> Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>; Ananyev,
> Konstantin <konstantin.ananyev@intel.com>; yliu@fridaylinux.org;
> stephen@networkplumber.org; Wu, Jingjing <jingjing.wu@intel.com>; Yao,
> Lei A <lei.a.yao@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v14 0/3] Support TCP/IPv4 GRO in DPDK
> 
> 09/07/2017 07:46, Jiayu Hu:
> > Jiayu Hu (3):
> >   lib: add Generic Receive Offload API framework
> >   lib/gro: add TCP/IPv4 GRO support
> >   app/testpmd: enable TCP/IPv4 GRO
> >
> >  MAINTAINERS                                 |   4 +
> >  app/test-pmd/cmdline.c                      | 125 +++++++
> >  app/test-pmd/config.c                       |  36 ++
> >  app/test-pmd/csumonly.c                     |   5 +
> >  app/test-pmd/testpmd.c                      |   3 +
> >  app/test-pmd/testpmd.h                      |  10 +
> >  config/common_base                          |   5 +
> >  doc/guides/rel_notes/release_17_08.rst      |   7 +
> >  doc/guides/testpmd_app_ug/testpmd_funcs.rst |  34 ++
> >  lib/Makefile                                |   2 +
> >  lib/librte_gro/Makefile                     |  51 +++
> >  lib/librte_gro/gro_tcp4.c                   | 505
> ++++++++++++++++++++++++++++
> >  lib/librte_gro/gro_tcp4.h                   | 210 ++++++++++++
> >  lib/librte_gro/rte_gro.c                    | 278 +++++++++++++++
> >  lib/librte_gro/rte_gro.h                    | 211 ++++++++++++
> >  lib/librte_gro/rte_gro_version.map          |  12 +
> >  mk/rte.app.mk                               |   1 +
> >  17 files changed, 1499 insertions(+)
> 
> I have added an EXPERIMENTAL note in MAINTAINERS file.
> I have added the library in the doxygen doc and in the release notes
> libraries list.
> 
> Applied with above changes, thanks
> 
> A page in the programmer's guide is missing.
> Please fill it, thanks

Thanks for your reminder. I will update it.

BRs,
Jiayu

^ permalink raw reply	[flat|nested] 141+ messages in thread

* Re: [PATCH v14 0/3] Support TCP/IPv4 GRO in DPDK
  2017-07-10  2:21                             ` Hu, Jiayu
@ 2017-07-10  7:03                               ` Thomas Monjalon
  0 siblings, 0 replies; 141+ messages in thread
From: Thomas Monjalon @ 2017-07-10  7:03 UTC (permalink / raw)
  To: Hu, Jiayu
  Cc: dev, Tan, Jianfeng, Ananyev, Konstantin, yliu, stephen, Wu,
	Jingjing, Yao, Lei A

10/07/2017 04:21, Hu, Jiayu:
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > I have added an EXPERIMENTAL note in MAINTAINERS file.
> > I have added the library in the doxygen doc and in the release notes
> > libraries list.
> > 
> > Applied with above changes, thanks
> > 
> > A page in the programmer's guide is missing.
> > Please fill it, thanks
> 
> Thanks for your reminder. I will update it.

You also need to add some doxygen comments to the API:
	http://dpdk.org/doc/api/rte__gro_8h_source.html

^ permalink raw reply	[flat|nested] 141+ messages in thread
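
For illustration, the kind of doxygen annotation being requested might
look like the following for rte_gro_reassemble_burst() (the wording is
hypothetical, not the text that was eventually merged):

/**
 * Reassemble TCP/IPv4 packets in a burst.
 *
 * @param pkts
 *  pointer array of the packets to reassemble; merged packets are
 *  written back into this array
 * @param nb_pkts
 *  the number of packets in the pkts array
 * @param param
 *  application-provided reassembly parameters
 *
 * @return
 *  the number of packets remaining after reassembly
 */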

end of thread, other threads:[~2017-07-10  7:03 UTC | newest]

Thread overview: 141+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-22  9:32 [PATCH 0/2] lib: add TCP IPv4 GRO support Jiayu Hu
2017-03-22  9:32 ` [PATCH 1/2] lib: add Generic Receive Offload support for TCP IPv4 packets Jiayu Hu
2017-03-22  9:32 ` [PATCH 2/2] app/testpmd: provide TCP IPv4 GRO function in iofwd mode Jiayu Hu
     [not found] ` <1B893F1B-4DA8-4F88-9583-8C0BAA570832@intel.com>
     [not found]   ` <20170323021502.GA114662@localhost.localdomain>
     [not found]     ` <C830A6FC-F440-4E68-AB4E-2FD502722E3F@intel.com>
     [not found]       ` <20170323062433.GA120139@localhost.localdomain>
     [not found]         ` <59AF69C657FD0841A61C55336867B5B066729E3F@IRSMSX103.ger.corp.intel.com>
     [not found]           ` <20170323102135.GA124301@localhost.localdomain>
     [not found]             ` <2601191342CEEE43887BDE71AB9772583FAD410A@IRSMSX109.ger.corp.intel.com>
2017-03-24  2:23               ` [PATCH 0/2] lib: add TCP IPv4 GRO support Jiayu Hu
2017-03-24  6:18                 ` Wiles, Keith
2017-03-24  7:22                   ` Yuanhan Liu
2017-03-24  8:06                     ` Jiayu Hu
2017-03-24 11:43                       ` Ananyev, Konstantin
2017-03-24 14:37                         ` Wiles, Keith
2017-03-24 14:59                           ` Olivier Matz
2017-03-24 15:07                             ` Wiles, Keith
2017-03-28 13:40                               ` Wiles, Keith
2017-03-28 13:57                                 ` Hu, Jiayu
2017-03-28 16:06                                   ` Wiles, Keith
2017-03-29 10:47                         ` Morten Brørup
2017-03-29 12:12                           ` Wiles, Keith
2017-04-04 12:31 ` [PATCH v2 0/3] support GRO in DPDK Jiayu Hu
2017-04-04 12:31   ` [PATCH v2 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
2017-04-04 12:31   ` [PATCH v2 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
2017-04-04 12:31   ` [PATCH v2 3/3] app/testpmd: enable GRO feature Jiayu Hu
2017-04-24  8:09   ` [PATCH v3 0/3] support GRO in DPDK Jiayu Hu
2017-04-24  8:09     ` [PATCH v3 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
2017-05-22  9:19       ` Ananyev, Konstantin
2017-05-23 10:31         ` Jiayu Hu
2017-05-24 12:38           ` Ananyev, Konstantin
2017-05-26  7:26             ` Jiayu Hu
2017-05-26 23:10               ` Ananyev, Konstantin
2017-05-27  3:41                 ` Jiayu Hu
2017-05-27 11:12                   ` Ananyev, Konstantin
2017-05-27 14:09                     ` Jiayu Hu
2017-05-27 16:51                       ` Ananyev, Konstantin
2017-05-29 10:22                         ` Hu, Jiayu
2017-05-29 12:18                           ` Bruce Richardson
2017-05-30 14:10                             ` Hu, Jiayu
2017-05-29 12:51                           ` Ananyev, Konstantin
2017-05-30  5:29                             ` Hu, Jiayu
2017-05-30 11:56                               ` Ananyev, Konstantin
2017-04-24  8:09     ` [PATCH v3 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
2017-04-24  8:09     ` [PATCH v3 3/3] app/testpmd: enable GRO feature Jiayu Hu
2017-06-07  9:24       ` Wu, Jingjing
2017-06-07 11:08     ` [PATCH v4 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
2017-06-07 11:08       ` [PATCH v4 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
2017-06-07 11:08       ` [PATCH v4 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
2017-06-07 11:08       ` [PATCH v4 3/3] app/testpmd: enable TCP/IPv4 GRO Jiayu Hu
2017-06-18  7:21       ` [PATCH v5 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
2017-06-18  7:21         ` [PATCH v5 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
2017-06-19  4:03           ` Tiwei Bie
2017-06-19  5:16             ` Jiayu Hu
2017-06-19 15:43           ` Tan, Jianfeng
2017-06-19 15:55           ` Stephen Hemminger
2017-06-20  1:48             ` Jiayu Hu
2017-06-18  7:21         ` [PATCH v5 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
2017-06-19 15:43           ` Tan, Jianfeng
2017-06-20  3:22             ` Jiayu Hu
2017-06-20 15:15               ` Ananyev, Konstantin
2017-06-20 16:16                 ` Jiayu Hu
2017-06-20 15:21               ` Ananyev, Konstantin
2017-06-20 23:30               ` Tan, Jianfeng
2017-06-20 23:55                 ` Stephen Hemminger
2017-06-22  7:39                 ` Jiayu Hu
2017-06-22  8:18             ` Jiayu Hu
2017-06-22  9:35               ` Tan, Jianfeng
2017-06-22 13:55                 ` Jiayu Hu
2017-06-18  7:21         ` [PATCH v5 3/3] app/testpmd: enable TCP/IPv4 GRO Jiayu Hu
2017-06-19  1:24           ` Yao, Lei A
2017-06-19  2:27           ` Wu, Jingjing
2017-06-19  3:22             ` Jiayu Hu
2017-06-19  1:39         ` [PATCH v5 0/3] Support TCP/IPv4 GRO in DPDK Tan, Jianfeng
2017-06-19  3:07           ` Jiayu Hu
2017-06-19  5:12             ` Jiayu Hu
2017-06-23 14:43         ` [PATCH v6 " Jiayu Hu
2017-06-23 14:43           ` [PATCH v6 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
2017-06-25 16:53             ` Tan, Jianfeng
2017-06-23 14:43           ` [PATCH v6 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
2017-06-25 16:53             ` Tan, Jianfeng
2017-06-26  1:58               ` Jiayu Hu
2017-06-23 14:43           ` [PATCH v6 3/3] app/testpmd: enable TCP/IPv4 GRO Jiayu Hu
2017-06-24  8:01             ` Yao, Lei A
2017-06-25 16:03           ` [PATCH v6 0/3] Support TCP/IPv4 GRO in DPDK Tan, Jianfeng
2017-06-26  1:35             ` Jiayu Hu
2017-06-26  6:43           ` [PATCH v7 " Jiayu Hu
2017-06-26  6:43             ` [PATCH v7 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
2017-06-27 23:42               ` Ananyev, Konstantin
2017-06-28  2:17                 ` Jiayu Hu
2017-06-28 17:41                   ` Ananyev, Konstantin
2017-06-29  1:19                     ` Jiayu Hu
2017-06-26  6:43             ` [PATCH v7 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
2017-06-28 23:56               ` Ananyev, Konstantin
2017-06-29  2:26                 ` Jiayu Hu
2017-06-30 12:07                   ` Ananyev, Konstantin
2017-06-30 15:40                     ` Hu, Jiayu
2017-06-26  6:43             ` [PATCH v7 3/3] app/testpmd: enable TCP/IPv4 GRO Jiayu Hu
2017-06-29 10:58             ` [PATCH v8 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
2017-06-29 10:58               ` [PATCH v8 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
2017-06-29 10:58               ` [PATCH v8 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
2017-06-29 17:51                 ` Stephen Hemminger
2017-06-30  2:07                   ` Jiayu Hu
2017-06-29 10:59               ` [PATCH v8 3/3] app/testpmd: enable TCP/IPv4 GRO Jiayu Hu
2017-06-30  2:26                 ` Wu, Jingjing
2017-06-30  6:53               ` [PATCH v9 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
2017-06-30  6:53                 ` [PATCH v9 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
2017-06-30  6:53                 ` [PATCH v9 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
2017-06-30  6:53                 ` [PATCH v9 3/3] app/testpmd: enable TCP/IPv4 GRO Jiayu Hu
2017-07-01 11:08                 ` [PATCH v10 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
2017-07-01 11:08                   ` [PATCH v10 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
2017-07-02 10:19                     ` Tan, Jianfeng
2017-07-03  5:56                       ` Hu, Jiayu
2017-07-04  8:11                         ` Yuanhan Liu
2017-07-04  8:37                     ` Yuanhan Liu
2017-07-04 16:01                       ` Hu, Jiayu
2017-07-01 11:08                   ` [PATCH v10 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
2017-07-02 10:19                     ` Tan, Jianfeng
2017-07-03  5:13                       ` Hu, Jiayu
2017-07-04  9:03                     ` Yuanhan Liu
2017-07-04 16:03                       ` Hu, Jiayu
2017-07-01 11:08                   ` [PATCH v10 3/3] app/testpmd: enable TCP/IPv4 GRO Jiayu Hu
2017-07-05  4:08                   ` [PATCH v11 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
2017-07-05  4:08                     ` [PATCH v11 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
2017-07-07  6:55                       ` Tan, Jianfeng
2017-07-07  9:19                         ` Tan, Jianfeng
2017-07-05  4:08                     ` [PATCH v11 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
2017-07-07  6:55                       ` Tan, Jianfeng
2017-07-05  4:08                     ` [PATCH v11 3/3] app/testpmd: enable TCP/IPv4 GRO Jiayu Hu
2017-07-07 10:39                     ` [PATCH v12 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
2017-07-07 10:39                       ` [PATCH v12 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
2017-07-08 16:37                         ` Tan, Jianfeng
2017-07-07 10:39                       ` [PATCH v12 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
2017-07-08 16:37                         ` Tan, Jianfeng
2017-07-07 10:39                       ` [PATCH v12 3/3] app/testpmd: enable TCP/IPv4 GRO Jiayu Hu
2017-07-09  1:13                       ` [PATCH v13 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
2017-07-09  1:13                         ` [PATCH v13 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
2017-07-09  1:13                         ` [PATCH v13 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
2017-07-09  1:13                         ` [PATCH v13 3/3] app/testpmd: enable TCP/IPv4 GRO Jiayu Hu
2017-07-09  5:46                         ` [PATCH v14 0/3] Support TCP/IPv4 GRO in DPDK Jiayu Hu
2017-07-09  5:46                           ` [PATCH v14 1/3] lib: add Generic Receive Offload API framework Jiayu Hu
2017-07-09  5:46                           ` [PATCH v14 2/3] lib/gro: add TCP/IPv4 GRO support Jiayu Hu
2017-07-09  5:46                           ` [PATCH v14 3/3] app/testpmd: enable TCP/IPv4 GRO Jiayu Hu
2017-07-09  7:59                             ` Yao, Lei A
2017-07-09 16:14                           ` [PATCH v14 0/3] Support TCP/IPv4 GRO in DPDK Thomas Monjalon
2017-07-10  2:21                             ` Hu, Jiayu
2017-07-10  7:03                               ` Thomas Monjalon
