* [PATCH net-next 0/3] gro: some minor optimization
@ 2022-02-03 15:48 Paolo Abeni
  2022-02-03 15:48 ` [PATCH net-next 1/3] net: gro: avoid re-computing truesize twice on recycle Paolo Abeni
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Paolo Abeni @ 2022-02-03 15:48 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Jakub Kicinski, Alexander Lobakin, Eric Dumazet

This series collects a few small optimizations for the GRO engine.
I measured a 10% performance improvement in micro-benchmarks around
dev_gro_receive(), but the deltas are within the noise range in
throughput tests.

Still, with BIG TCP coming, every cycle saved in the GRO engine will
count - I hope ;)

The only change from the RFC is in patch 2, following Alexander's feedback.

Paolo Abeni (3):
  net: gro: avoid re-computing truesize twice on recycle
  net: gro: minor optimization for dev_gro_receive()
  net: gro: register gso and gro offload on separate lists

 include/linux/netdevice.h |   3 +-
 include/net/gro.h         |  52 +++++++++---------
 net/core/gro.c            | 109 +++++++++++++++++++++-----------------
 3 files changed, 91 insertions(+), 73 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH net-next 1/3] net: gro: avoid re-computing truesize twice on recycle
  2022-02-03 15:48 [PATCH net-next 0/3] gro: some minor optimization Paolo Abeni
@ 2022-02-03 15:48 ` Paolo Abeni
  2022-02-03 16:16   ` Alexander Lobakin
  2022-02-03 15:48 ` [PATCH net-next 2/3] net: gro: minor optimization for dev_gro_receive() Paolo Abeni
  2022-02-03 15:48 ` [PATCH net-next 3/3] net: gro: register gso and gro offload on separate lists Paolo Abeni
  2 siblings, 1 reply; 12+ messages in thread
From: Paolo Abeni @ 2022-02-03 15:48 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Jakub Kicinski, Alexander Lobakin, Eric Dumazet

After commit 5e10da5385d2 ("skbuff: allow 'slow_gro' for skb
carring sock reference") and commit af352460b465 ("net: fix GRO
skb truesize update") the truesize of the freed skb is properly updated
by the GRO engine; we no longer need to reset it at recycle time.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/core/gro.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/core/gro.c b/net/core/gro.c
index a11b286d1495..d43d42215bdb 100644
--- a/net/core/gro.c
+++ b/net/core/gro.c
@@ -634,7 +634,6 @@ static void napi_reuse_skb(struct napi_struct *napi, struct sk_buff *skb)
 
 	skb->encapsulation = 0;
 	skb_shinfo(skb)->gso_type = 0;
-	skb->truesize = SKB_TRUESIZE(skb_end_offset(skb));
 	if (unlikely(skb->slow_gro)) {
 		skb_orphan(skb);
 		skb_ext_reset(skb);
-- 
2.34.1



* [PATCH net-next 2/3] net: gro: minor optimization for dev_gro_receive()
  2022-02-03 15:48 [PATCH net-next 0/3] gro: some minor optimization Paolo Abeni
  2022-02-03 15:48 ` [PATCH net-next 1/3] net: gro: avoid re-computing truesize twice on recycle Paolo Abeni
@ 2022-02-03 15:48 ` Paolo Abeni
  2022-02-03 16:09   ` Alexander Lobakin
  2022-02-03 16:39   ` Alexander H Duyck
  2022-02-03 15:48 ` [PATCH net-next 3/3] net: gro: register gso and gro offload on separate lists Paolo Abeni
  2 siblings, 2 replies; 12+ messages in thread
From: Paolo Abeni @ 2022-02-03 15:48 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Jakub Kicinski, Alexander Lobakin, Eric Dumazet

While inspecting some perf reports, I noticed that the compiler
emits suboptimal code for the napi CB initialization, fetching
and storing the memory for the flags bitfield multiple times.
This is with gcc 10.3.1, but I observed the same with older compiler
versions.

We can help the compiler do a better job by clearing several
fields at once using a u32 alias. The generated code is quite a bit
smaller, with the same number of conditionals.

Before:
objdump -t net/core/gro.o | grep " F .text"
0000000000000bb0 l     F .text	0000000000000357 dev_gro_receive

After:
0000000000000bb0 l     F .text	000000000000033c dev_gro_receive

RFC -> v1:
 - use __struct_group to delimit the zeroed area (Alexander)

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/net/gro.h | 52 +++++++++++++++++++++++++----------------------
 net/core/gro.c    | 18 +++++++---------
 2 files changed, 35 insertions(+), 35 deletions(-)

diff --git a/include/net/gro.h b/include/net/gro.h
index 8f75802d50fd..fa1bb0f0ad28 100644
--- a/include/net/gro.h
+++ b/include/net/gro.h
@@ -29,46 +29,50 @@ struct napi_gro_cb {
 	/* Number of segments aggregated. */
 	u16	count;
 
-	/* Start offset for remote checksum offload */
-	u16	gro_remcsum_start;
+	/* Used in ipv6_gro_receive() and foo-over-udp */
+	u16	proto;
 
 	/* jiffies when first packet was created/queued */
 	unsigned long age;
 
-	/* Used in ipv6_gro_receive() and foo-over-udp */
-	u16	proto;
+	/* portion of the cb set to zero at every gro iteration */
+	__struct_group(/* no tag */, zeroed, /* no attrs */,
+
+		/* Start offset for remote checksum offload */
+		u16	gro_remcsum_start;
 
-	/* This is non-zero if the packet may be of the same flow. */
-	u8	same_flow:1;
+		/* This is non-zero if the packet may be of the same flow. */
+		u8	same_flow:1;
 
-	/* Used in tunnel GRO receive */
-	u8	encap_mark:1;
+		/* Used in tunnel GRO receive */
+		u8	encap_mark:1;
 
-	/* GRO checksum is valid */
-	u8	csum_valid:1;
+		/* GRO checksum is valid */
+		u8	csum_valid:1;
 
-	/* Number of checksums via CHECKSUM_UNNECESSARY */
-	u8	csum_cnt:3;
+		/* Number of checksums via CHECKSUM_UNNECESSARY */
+		u8	csum_cnt:3;
 
-	/* Free the skb? */
-	u8	free:2;
+		/* Free the skb? */
+		u8	free:2;
 #define NAPI_GRO_FREE		  1
 #define NAPI_GRO_FREE_STOLEN_HEAD 2
 
-	/* Used in foo-over-udp, set in udp[46]_gro_receive */
-	u8	is_ipv6:1;
+		/* Used in foo-over-udp, set in udp[46]_gro_receive */
+		u8	is_ipv6:1;
 
-	/* Used in GRE, set in fou/gue_gro_receive */
-	u8	is_fou:1;
+		/* Used in GRE, set in fou/gue_gro_receive */
+		u8	is_fou:1;
 
-	/* Used to determine if flush_id can be ignored */
-	u8	is_atomic:1;
+		/* Used to determine if flush_id can be ignored */
+		u8	is_atomic:1;
 
-	/* Number of gro_receive callbacks this packet already went through */
-	u8 recursion_counter:4;
+		/* Number of gro_receive callbacks this packet already went through */
+		u8 recursion_counter:4;
 
-	/* GRO is done by frag_list pointer chaining. */
-	u8	is_flist:1;
+		/* GRO is done by frag_list pointer chaining. */
+		u8	is_flist:1;
+	);
 
 	/* used to support CHECKSUM_COMPLETE for tunneling protocols */
 	__wsum	csum;
diff --git a/net/core/gro.c b/net/core/gro.c
index d43d42215bdb..fc56be9408c7 100644
--- a/net/core/gro.c
+++ b/net/core/gro.c
@@ -435,6 +435,9 @@ static void gro_flush_oldest(struct napi_struct *napi, struct list_head *head)
 	napi_gro_complete(napi, oldest);
 }
 
+#define zeroed_len	(offsetof(struct napi_gro_cb, zeroed_end) - \
+			 offsetof(struct napi_gro_cb, zeroed_start))
+
 static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff *skb)
 {
 	u32 bucket = skb_get_hash_raw(skb) & (GRO_HASH_BUCKETS - 1);
@@ -459,29 +462,22 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
 
 		skb_set_network_header(skb, skb_gro_offset(skb));
 		skb_reset_mac_len(skb);
-		NAPI_GRO_CB(skb)->same_flow = 0;
+		BUILD_BUG_ON(sizeof_field(struct napi_gro_cb, zeroed) != sizeof(u32));
+		BUILD_BUG_ON(!IS_ALIGNED(offsetof(struct napi_gro_cb, zeroed),
+					 sizeof(u32))); /* Avoid slow unaligned acc */
+		*(u32 *)&NAPI_GRO_CB(skb)->zeroed = 0;
 		NAPI_GRO_CB(skb)->flush = skb_is_gso(skb) || skb_has_frag_list(skb);
-		NAPI_GRO_CB(skb)->free = 0;
-		NAPI_GRO_CB(skb)->encap_mark = 0;
-		NAPI_GRO_CB(skb)->recursion_counter = 0;
-		NAPI_GRO_CB(skb)->is_fou = 0;
 		NAPI_GRO_CB(skb)->is_atomic = 1;
-		NAPI_GRO_CB(skb)->gro_remcsum_start = 0;
 
 		/* Setup for GRO checksum validation */
 		switch (skb->ip_summed) {
 		case CHECKSUM_COMPLETE:
 			NAPI_GRO_CB(skb)->csum = skb->csum;
 			NAPI_GRO_CB(skb)->csum_valid = 1;
-			NAPI_GRO_CB(skb)->csum_cnt = 0;
 			break;
 		case CHECKSUM_UNNECESSARY:
 			NAPI_GRO_CB(skb)->csum_cnt = skb->csum_level + 1;
-			NAPI_GRO_CB(skb)->csum_valid = 0;
 			break;
-		default:
-			NAPI_GRO_CB(skb)->csum_cnt = 0;
-			NAPI_GRO_CB(skb)->csum_valid = 0;
 		}
 
 		pp = INDIRECT_CALL_INET(ptype->callbacks.gro_receive,
-- 
2.34.1



* [PATCH net-next 3/3] net: gro: register gso and gro offload on separate lists
  2022-02-03 15:48 [PATCH net-next 0/3] gro: some minor optimization Paolo Abeni
  2022-02-03 15:48 ` [PATCH net-next 1/3] net: gro: avoid re-computing truesize twice on recycle Paolo Abeni
  2022-02-03 15:48 ` [PATCH net-next 2/3] net: gro: minor optimization for dev_gro_receive() Paolo Abeni
@ 2022-02-03 15:48 ` Paolo Abeni
  2022-02-03 16:11   ` Eric Dumazet
  2 siblings, 1 reply; 12+ messages in thread
From: Paolo Abeni @ 2022-02-03 15:48 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Jakub Kicinski, Alexander Lobakin, Eric Dumazet

So that we know each element in gro_list has valid gro callbacks
(and likewise for gso). This allows dropping a bunch of conditionals
in the fast path.

Before:
objdump -t net/core/gro.o | grep " F .text"
0000000000000bb0 l     F .text  000000000000033c dev_gro_receive

After:
0000000000000bb0 l     F .text  0000000000000325 dev_gro_receive

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/linux/netdevice.h |  3 +-
 net/core/gro.c            | 90 +++++++++++++++++++++++----------------
 2 files changed, 56 insertions(+), 37 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 3213c7227b59..406cb457d788 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2564,7 +2564,8 @@ struct packet_offload {
 	__be16			 type;	/* This is really htons(ether_type). */
 	u16			 priority;
 	struct offload_callbacks callbacks;
-	struct list_head	 list;
+	struct list_head	 gro_list;
+	struct list_head	 gso_list;
 };
 
 /* often modified stats are per-CPU, other are shared (netdev->stats) */
diff --git a/net/core/gro.c b/net/core/gro.c
index fc56be9408c7..bd619d494fdd 100644
--- a/net/core/gro.c
+++ b/net/core/gro.c
@@ -10,10 +10,21 @@
 #define GRO_MAX_HEAD (MAX_HEADER + 128)
 
 static DEFINE_SPINLOCK(offload_lock);
-static struct list_head offload_base __read_mostly = LIST_HEAD_INIT(offload_base);
+static struct list_head gro_offload_base __read_mostly = LIST_HEAD_INIT(gro_offload_base);
+static struct list_head gso_offload_base __read_mostly = LIST_HEAD_INIT(gso_offload_base);
 /* Maximum number of GRO_NORMAL skbs to batch up for list-RX */
 int gro_normal_batch __read_mostly = 8;
 
+#define offload_list_insert(head, poff, list)		\
+({							\
+	struct packet_offload *elem;			\
+	list_for_each_entry(elem, head, list) {		\
+		if ((poff)->priority < elem->priority)	\
+			break;				\
+	}						\
+	list_add_rcu(&(poff)->list, elem->list.prev);	\
+})
+
 /**
  *	dev_add_offload - register offload handlers
  *	@po: protocol offload declaration
@@ -28,18 +39,33 @@ int gro_normal_batch __read_mostly = 8;
  */
 void dev_add_offload(struct packet_offload *po)
 {
-	struct packet_offload *elem;
-
 	spin_lock(&offload_lock);
-	list_for_each_entry(elem, &offload_base, list) {
-		if (po->priority < elem->priority)
-			break;
-	}
-	list_add_rcu(&po->list, elem->list.prev);
+	if (po->callbacks.gro_receive && po->callbacks.gro_complete)
+		offload_list_insert(&gro_offload_base, po, gro_list);
+	else if (po->callbacks.gro_complete)
+		pr_warn("missing gro_receive callback\n");
+	else if (po->callbacks.gro_receive)
+		pr_warn("missing gro_complete callback\n");
+
+	if (po->callbacks.gso_segment)
+		offload_list_insert(&gso_offload_base, po, gso_list);
 	spin_unlock(&offload_lock);
 }
 EXPORT_SYMBOL(dev_add_offload);
 
+#define offload_list_remove(type, head, poff, list)	\
+({							\
+	struct packet_offload *elem;			\
+	list_for_each_entry(elem, head, list) {		\
+		if ((poff) == elem) {			\
+			list_del_rcu(&(poff)->list);	\
+			break;				\
+		}					\
+	}						\
+	if (elem != (poff))				\
+		pr_warn("dev_remove_offload: %p not found in %s list\n", (poff), type); \
+})
+
 /**
  *	__dev_remove_offload	 - remove offload handler
  *	@po: packet offload declaration
@@ -55,20 +81,12 @@ EXPORT_SYMBOL(dev_add_offload);
  */
 static void __dev_remove_offload(struct packet_offload *po)
 {
-	struct list_head *head = &offload_base;
-	struct packet_offload *po1;
-
 	spin_lock(&offload_lock);
+	if (po->callbacks.gro_receive)
+		offload_list_remove("gro", &gro_offload_base, po, gro_list);
 
-	list_for_each_entry(po1, head, list) {
-		if (po == po1) {
-			list_del_rcu(&po->list);
-			goto out;
-		}
-	}
-
-	pr_warn("dev_remove_offload: %p not found\n", po);
-out:
+	if (po->callbacks.gso_segment)
+		offload_list_remove("gso", &gso_offload_base, po, gso_list);
 	spin_unlock(&offload_lock);
 }
 
@@ -111,8 +129,8 @@ struct sk_buff *skb_mac_gso_segment(struct sk_buff *skb,
 	__skb_pull(skb, vlan_depth);
 
 	rcu_read_lock();
-	list_for_each_entry_rcu(ptype, &offload_base, list) {
-		if (ptype->type == type && ptype->callbacks.gso_segment) {
+	list_for_each_entry_rcu(ptype, &gso_offload_base, gso_list) {
+		if (ptype->type == type) {
 			segs = ptype->callbacks.gso_segment(skb, features);
 			break;
 		}
@@ -250,7 +268,7 @@ static void napi_gro_complete(struct napi_struct *napi, struct sk_buff *skb)
 {
 	struct packet_offload *ptype;
 	__be16 type = skb->protocol;
-	struct list_head *head = &offload_base;
+	struct list_head *head = &gro_offload_base;
 	int err = -ENOENT;
 
 	BUILD_BUG_ON(sizeof(struct napi_gro_cb) > sizeof(skb->cb));
@@ -261,8 +279,8 @@ static void napi_gro_complete(struct napi_struct *napi, struct sk_buff *skb)
 	}
 
 	rcu_read_lock();
-	list_for_each_entry_rcu(ptype, head, list) {
-		if (ptype->type != type || !ptype->callbacks.gro_complete)
+	list_for_each_entry_rcu(ptype, head, gro_list) {
+		if (ptype->type != type)
 			continue;
 
 		err = INDIRECT_CALL_INET(ptype->callbacks.gro_complete,
@@ -273,7 +291,7 @@ static void napi_gro_complete(struct napi_struct *napi, struct sk_buff *skb)
 	rcu_read_unlock();
 
 	if (err) {
-		WARN_ON(&ptype->list == head);
+		WARN_ON(&ptype->gro_list == head);
 		kfree_skb(skb);
 		return;
 	}
@@ -442,7 +460,7 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
 {
 	u32 bucket = skb_get_hash_raw(skb) & (GRO_HASH_BUCKETS - 1);
 	struct gro_list *gro_list = &napi->gro_hash[bucket];
-	struct list_head *head = &offload_base;
+	struct list_head *head = &gro_offload_base;
 	struct packet_offload *ptype;
 	__be16 type = skb->protocol;
 	struct sk_buff *pp = NULL;
@@ -456,8 +474,8 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
 	gro_list_prepare(&gro_list->list, skb);
 
 	rcu_read_lock();
-	list_for_each_entry_rcu(ptype, head, list) {
-		if (ptype->type != type || !ptype->callbacks.gro_receive)
+	list_for_each_entry_rcu(ptype, head, gro_list) {
+		if (ptype->type != type)
 			continue;
 
 		skb_set_network_header(skb, skb_gro_offset(skb));
@@ -487,7 +505,7 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
 	}
 	rcu_read_unlock();
 
-	if (&ptype->list == head)
+	if (&ptype->gro_list == head)
 		goto normal;
 
 	if (PTR_ERR(pp) == -EINPROGRESS) {
@@ -543,11 +561,11 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
 
 struct packet_offload *gro_find_receive_by_type(__be16 type)
 {
-	struct list_head *offload_head = &offload_base;
+	struct list_head *offload_head = &gro_offload_base;
 	struct packet_offload *ptype;
 
-	list_for_each_entry_rcu(ptype, offload_head, list) {
-		if (ptype->type != type || !ptype->callbacks.gro_receive)
+	list_for_each_entry_rcu(ptype, offload_head, gro_list) {
+		if (ptype->type != type)
 			continue;
 		return ptype;
 	}
@@ -557,11 +575,11 @@ EXPORT_SYMBOL(gro_find_receive_by_type);
 
 struct packet_offload *gro_find_complete_by_type(__be16 type)
 {
-	struct list_head *offload_head = &offload_base;
+	struct list_head *offload_head = &gro_offload_base;
 	struct packet_offload *ptype;
 
-	list_for_each_entry_rcu(ptype, offload_head, list) {
-		if (ptype->type != type || !ptype->callbacks.gro_complete)
+	list_for_each_entry_rcu(ptype, offload_head, gro_list) {
+		if (ptype->type != type)
 			continue;
 		return ptype;
 	}
-- 
2.34.1



* Re: [PATCH net-next 2/3] net: gro: minor optimization for dev_gro_receive()
  2022-02-03 15:48 ` [PATCH net-next 2/3] net: gro: minor optimization for dev_gro_receive() Paolo Abeni
@ 2022-02-03 16:09   ` Alexander Lobakin
  2022-02-03 16:39   ` Alexander H Duyck
  1 sibling, 0 replies; 12+ messages in thread
From: Alexander Lobakin @ 2022-02-03 16:09 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, David S. Miller, Jakub Kicinski, Alexander Lobakin, Eric Dumazet

From: Paolo Abeni <pabeni@redhat.com>
Date: Thu,  3 Feb 2022 16:48:22 +0100

> While inspecting some perf report, I noticed that the compiler
> emits suboptimal code for the napi CB initialization, fetching
> and storing multiple times the memory for flags bitfield.
> This is with gcc 10.3.1, but I observed the same with older compiler
> versions.
> 
> We can help the compiler to do a nicer work clearing several
> fields at once using an u32 alias. The generated code is quite
> smaller, with the same number of conditional.
> 
> Before:
> objdump -t net/core/gro.o | grep " F .text"
> 0000000000000bb0 l     F .text	0000000000000357 dev_gro_receive
> 
> After:
> 0000000000000bb0 l     F .text	000000000000033c dev_gro_receive
> 
> RFC -> v1:
>  - use __struct_group to delimt the zeroed area (Alexander)

"delimit"?

> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  include/net/gro.h | 52 +++++++++++++++++++++++++----------------------
>  net/core/gro.c    | 18 +++++++---------
>  2 files changed, 35 insertions(+), 35 deletions(-)
> 
> diff --git a/include/net/gro.h b/include/net/gro.h
> index 8f75802d50fd..fa1bb0f0ad28 100644
> --- a/include/net/gro.h
> +++ b/include/net/gro.h
> @@ -29,46 +29,50 @@ struct napi_gro_cb {
>  	/* Number of segments aggregated. */
>  	u16	count;
>  
> -	/* Start offset for remote checksum offload */
> -	u16	gro_remcsum_start;
> +	/* Used in ipv6_gro_receive() and foo-over-udp */
> +	u16	proto;
>  
>  	/* jiffies when first packet was created/queued */
>  	unsigned long age;
>  
> -	/* Used in ipv6_gro_receive() and foo-over-udp */
> -	u16	proto;
> +	/* portion of the cb set to zero at every gro iteration */
> +	__struct_group(/* no tag */, zeroed, /* no attrs */,

Oh, only after sending my reply did I notice that there's
a shorthand for this:

struct_group(name, ...)

means exactly

__struct_group(/* no tag */, name, /* no attrs */, ...)

Sorry about that. I got confused because Kees used __struct_group()
directly in one place even though the shorter struct_group() was
available.
Since there should already be a v2 (see below), it's probably worth
changing.

> +
> +		/* Start offset for remote checksum offload */
> +		u16	gro_remcsum_start;
>  
> -	/* This is non-zero if the packet may be of the same flow. */
> -	u8	same_flow:1;
> +		/* This is non-zero if the packet may be of the same flow. */
> +		u8	same_flow:1;
>  
> -	/* Used in tunnel GRO receive */
> -	u8	encap_mark:1;
> +		/* Used in tunnel GRO receive */
> +		u8	encap_mark:1;
>  
> -	/* GRO checksum is valid */
> -	u8	csum_valid:1;
> +		/* GRO checksum is valid */
> +		u8	csum_valid:1;
>  
> -	/* Number of checksums via CHECKSUM_UNNECESSARY */
> -	u8	csum_cnt:3;
> +		/* Number of checksums via CHECKSUM_UNNECESSARY */
> +		u8	csum_cnt:3;
>  
> -	/* Free the skb? */
> -	u8	free:2;
> +		/* Free the skb? */
> +		u8	free:2;
>  #define NAPI_GRO_FREE		  1
>  #define NAPI_GRO_FREE_STOLEN_HEAD 2
>  
> -	/* Used in foo-over-udp, set in udp[46]_gro_receive */
> -	u8	is_ipv6:1;
> +		/* Used in foo-over-udp, set in udp[46]_gro_receive */
> +		u8	is_ipv6:1;
>  
> -	/* Used in GRE, set in fou/gue_gro_receive */
> -	u8	is_fou:1;
> +		/* Used in GRE, set in fou/gue_gro_receive */
> +		u8	is_fou:1;
>  
> -	/* Used to determine if flush_id can be ignored */
> -	u8	is_atomic:1;
> +		/* Used to determine if flush_id can be ignored */
> +		u8	is_atomic:1;
>  
> -	/* Number of gro_receive callbacks this packet already went through */
> -	u8 recursion_counter:4;
> +		/* Number of gro_receive callbacks this packet already went through */
> +		u8 recursion_counter:4;
>  
> -	/* GRO is done by frag_list pointer chaining. */
> -	u8	is_flist:1;
> +		/* GRO is done by frag_list pointer chaining. */
> +		u8	is_flist:1;
> +	);
>  
>  	/* used to support CHECKSUM_COMPLETE for tunneling protocols */
>  	__wsum	csum;
> diff --git a/net/core/gro.c b/net/core/gro.c
> index d43d42215bdb..fc56be9408c7 100644
> --- a/net/core/gro.c
> +++ b/net/core/gro.c
> @@ -435,6 +435,9 @@ static void gro_flush_oldest(struct napi_struct *napi, struct list_head *head)
>  	napi_gro_complete(napi, oldest);
>  }
>  
> +#define zeroed_len	(offsetof(struct napi_gro_cb, zeroed_end) - \
> +			 offsetof(struct napi_gro_cb, zeroed_start))
> +

A leftover from the RFC, I guess? It's not used anywhere in the code,
so it doesn't break the build.

>  static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff *skb)
>  {
>  	u32 bucket = skb_get_hash_raw(skb) & (GRO_HASH_BUCKETS - 1);
> @@ -459,29 +462,22 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
>  
>  		skb_set_network_header(skb, skb_gro_offset(skb));
>  		skb_reset_mac_len(skb);
> -		NAPI_GRO_CB(skb)->same_flow = 0;
> +		BUILD_BUG_ON(sizeof_field(struct napi_gro_cb, zeroed) != sizeof(u32));
> +		BUILD_BUG_ON(!IS_ALIGNED(offsetof(struct napi_gro_cb, zeroed),
> +					 sizeof(u32))); /* Avoid slow unaligned acc */
> +		*(u32 *)&NAPI_GRO_CB(skb)->zeroed = 0;

Looks relatively elegant and self-explanatory to me, nice.

>  		NAPI_GRO_CB(skb)->flush = skb_is_gso(skb) || skb_has_frag_list(skb);
> -		NAPI_GRO_CB(skb)->free = 0;
> -		NAPI_GRO_CB(skb)->encap_mark = 0;
> -		NAPI_GRO_CB(skb)->recursion_counter = 0;
> -		NAPI_GRO_CB(skb)->is_fou = 0;
>  		NAPI_GRO_CB(skb)->is_atomic = 1;
> -		NAPI_GRO_CB(skb)->gro_remcsum_start = 0;
>  
>  		/* Setup for GRO checksum validation */
>  		switch (skb->ip_summed) {
>  		case CHECKSUM_COMPLETE:
>  			NAPI_GRO_CB(skb)->csum = skb->csum;
>  			NAPI_GRO_CB(skb)->csum_valid = 1;
> -			NAPI_GRO_CB(skb)->csum_cnt = 0;
>  			break;
>  		case CHECKSUM_UNNECESSARY:
>  			NAPI_GRO_CB(skb)->csum_cnt = skb->csum_level + 1;
> -			NAPI_GRO_CB(skb)->csum_valid = 0;
>  			break;
> -		default:
> -			NAPI_GRO_CB(skb)->csum_cnt = 0;
> -			NAPI_GRO_CB(skb)->csum_valid = 0;
>  		}
>  
>  		pp = INDIRECT_CALL_INET(ptype->callbacks.gro_receive,
> -- 
> 2.34.1

Al


* Re: [PATCH net-next 3/3] net: gro: register gso and gro offload on separate lists
  2022-02-03 15:48 ` [PATCH net-next 3/3] net: gro: register gso and gro offload on separate lists Paolo Abeni
@ 2022-02-03 16:11   ` Eric Dumazet
  2022-02-03 16:26     ` Alexander Lobakin
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2022-02-03 16:11 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: netdev, David S. Miller, Jakub Kicinski, Alexander Lobakin

On Thu, Feb 3, 2022 at 7:48 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> So that we know each element in gro_list has valid gro callbacks
> (and the same for gso). This allows dropping a bunch of conditional
> in fastpath.
>
> Before:
> objdump -t net/core/gro.o | grep " F .text"
> 0000000000000bb0 l     F .text  000000000000033c dev_gro_receive
>
> After:
> 0000000000000bb0 l     F .text  0000000000000325 dev_gro_receive
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  include/linux/netdevice.h |  3 +-
>  net/core/gro.c            | 90 +++++++++++++++++++++++----------------
>  2 files changed, 56 insertions(+), 37 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 3213c7227b59..406cb457d788 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -2564,7 +2564,8 @@ struct packet_offload {
>         __be16                   type;  /* This is really htons(ether_type). */
>         u16                      priority;
>         struct offload_callbacks callbacks;
> -       struct list_head         list;
> +       struct list_head         gro_list;
> +       struct list_head         gso_list;
>  };
>

On the other hand, this makes the object bigger, increasing the risk
of it spanning cache lines.

It would be nice to group all struct packet_offload together in the
same section to increase data locality.

I played in the past with a similar idea, but splitting struct
packet_offload in two structures, one for GRO, one for GSO.
(Note that GSO is hardly ever used with modern NICs.)

But the gains were really marginal.


* Re: [PATCH net-next 1/3] net: gro: avoid re-computing truesize twice on recycle
  2022-02-03 15:48 ` [PATCH net-next 1/3] net: gro: avoid re-computing truesize twice on recycle Paolo Abeni
@ 2022-02-03 16:16   ` Alexander Lobakin
  0 siblings, 0 replies; 12+ messages in thread
From: Alexander Lobakin @ 2022-02-03 16:16 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Alexander Lobakin, netdev, David S. Miller, Jakub Kicinski, Eric Dumazet

From: Paolo Abeni <pabeni@redhat.com>
Date: Thu,  3 Feb 2022 16:48:21 +0100

> After commit 5e10da5385d2 ("skbuff: allow 'slow_gro' for skb
> carring sock reference") and commit af352460b465 ("net: fix GRO
> skb truesize update") the truesize of freed skb is properly updated

                                        ^^^^^

One nit here: I'd change this to "truesize of skb with stolen head"
or so. It took me a bit of time to get why we should update the
truesize of an already-freed skb (: Right, napi_reuse_skb() makes use
of stolen-data skbs.

> by the GRO engine, we don't need anymore resetting it at recycle time.
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  net/core/gro.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/net/core/gro.c b/net/core/gro.c
> index a11b286d1495..d43d42215bdb 100644
> --- a/net/core/gro.c
> +++ b/net/core/gro.c
> @@ -634,7 +634,6 @@ static void napi_reuse_skb(struct napi_struct *napi, struct sk_buff *skb)
>  
>  	skb->encapsulation = 0;
>  	skb_shinfo(skb)->gso_type = 0;
> -	skb->truesize = SKB_TRUESIZE(skb_end_offset(skb));
>  	if (unlikely(skb->slow_gro)) {
>  		skb_orphan(skb);
>  		skb_ext_reset(skb);
> -- 
> 2.34.1

Thanks,
Al


* Re: [PATCH net-next 3/3] net: gro: register gso and gro offload on separate lists
  2022-02-03 16:11   ` Eric Dumazet
@ 2022-02-03 16:26     ` Alexander Lobakin
  2022-02-03 16:41       ` Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: Alexander Lobakin @ 2022-02-03 16:26 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alexander Lobakin, Paolo Abeni, netdev, David S. Miller, Jakub Kicinski

From: Eric Dumazet <edumazet@google.com>
Date: Thu, 3 Feb 2022 08:11:43 -0800

> On Thu, Feb 3, 2022 at 7:48 AM Paolo Abeni <pabeni@redhat.com> wrote:
> >
> > So that we know each element in gro_list has valid gro callbacks
> > (and the same for gso). This allows dropping a bunch of conditional
> > in fastpath.
> >
> > Before:
> > objdump -t net/core/gro.o | grep " F .text"
> > 0000000000000bb0 l     F .text  000000000000033c dev_gro_receive
> >
> > After:
> > 0000000000000bb0 l     F .text  0000000000000325 dev_gro_receive
> >
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> >  include/linux/netdevice.h |  3 +-
> >  net/core/gro.c            | 90 +++++++++++++++++++++++----------------
> >  2 files changed, 56 insertions(+), 37 deletions(-)
> >
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index 3213c7227b59..406cb457d788 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -2564,7 +2564,8 @@ struct packet_offload {
> >         __be16                   type;  /* This is really htons(ether_type). */
> >         u16                      priority;
> >         struct offload_callbacks callbacks;
> > -       struct list_head         list;
> > +       struct list_head         gro_list;
> > +       struct list_head         gso_list;
> >  };
> >
> 
> On the other hand, this makes this object bigger, increasing the risk
> of spanning cache lines.

As you said, GSO callbacks are barely used with most modern NICs.
Plus, GRO and GSO callbacks are used in completely different
operations.
`gro_list` occupies the same place where the `list` member previously
was. Does it make much sense to worry about `gso_list` being placed
in a cache line separate from `gro_list`?

> 
> It would be nice to group all struct packet_offload together in the
> same section to increase data locality.
> 
> I played in the past with a similar idea, but splitting struct
> packet_offload in two structures, one for GRO, one for GSO.
> (Note that GSO is hardly ever use with modern NIC)
> 
> But the gains were really marginal.

Thanks,
Al


* Re: [PATCH net-next 2/3] net: gro: minor optimization for dev_gro_receive()
  2022-02-03 15:48 ` [PATCH net-next 2/3] net: gro: minor optimization for dev_gro_receive() Paolo Abeni
  2022-02-03 16:09   ` Alexander Lobakin
@ 2022-02-03 16:39   ` Alexander H Duyck
  2022-02-03 17:44     ` Paolo Abeni
  1 sibling, 1 reply; 12+ messages in thread
From: Alexander H Duyck @ 2022-02-03 16:39 UTC (permalink / raw)
  To: Paolo Abeni, netdev
  Cc: David S. Miller, Jakub Kicinski, Alexander Lobakin, Eric Dumazet

On Thu, 2022-02-03 at 16:48 +0100, Paolo Abeni wrote:
> While inspecting some perf report, I noticed that the compiler
> emits suboptimal code for the napi CB initialization, fetching
> and storing multiple times the memory for flags bitfield.
> This is with gcc 10.3.1, but I observed the same with older compiler
> versions.
> 
> We can help the compiler to do a nicer work clearing several
> fields at once using an u32 alias. The generated code is quite
> smaller, with the same number of conditional.
> 
> Before:
> objdump -t net/core/gro.o | grep " F .text"
> 0000000000000bb0 l     F .text  0000000000000357 dev_gro_receive
> 
> After:
> 0000000000000bb0 l     F .text  000000000000033c dev_gro_receive
> 
> RFC -> v1:
>  - use __struct_group to delimt the zeroed area (Alexander)
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  include/net/gro.h | 52 +++++++++++++++++++++++++--------------------
> --
>  net/core/gro.c    | 18 +++++++---------
>  2 files changed, 35 insertions(+), 35 deletions(-)
> 
> diff --git a/include/net/gro.h b/include/net/gro.h
> index 8f75802d50fd..fa1bb0f0ad28 100644
> --- a/include/net/gro.h
> +++ b/include/net/gro.h
> @@ -29,46 +29,50 @@ struct napi_gro_cb {
>         /* Number of segments aggregated. */
>         u16     count;
>  
> -       /* Start offset for remote checksum offload */
> -       u16     gro_remcsum_start;
> +       /* Used in ipv6_gro_receive() and foo-over-udp */
> +       u16     proto;
>  
>         /* jiffies when first packet was created/queued */
>         unsigned long age;
>  
> -       /* Used in ipv6_gro_receive() and foo-over-udp */
> -       u16     proto;
> +       /* portion of the cb set to zero at every gro iteration */
> +       __struct_group(/* no tag */, zeroed, /* no attrs */,

Any specific reason for using __struct_group() here rather than the
struct_group() macro?




* Re: [PATCH net-next 3/3] net: gro: register gso and gro offload on separate lists
  2022-02-03 16:26     ` Alexander Lobakin
@ 2022-02-03 16:41       ` Eric Dumazet
  2022-02-03 18:07         ` Paolo Abeni
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2022-02-03 16:41 UTC (permalink / raw)
  To: Alexander Lobakin; +Cc: Paolo Abeni, netdev, David S. Miller, Jakub Kicinski

On Thu, Feb 3, 2022 at 8:28 AM Alexander Lobakin
<alexandr.lobakin@intel.com> wrote:
>
> From: Eric Dumazet <edumazet@google.com>
> Date: Thu, 3 Feb 2022 08:11:43 -0800
>
> > On Thu, Feb 3, 2022 at 7:48 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > >
> > > So that we know each element in gro_list has valid gro callbacks
> > > (and the same for gso). This allows dropping a bunch of conditionals
> > > in the fastpath.
> > >
> > > Before:
> > > objdump -t net/core/gro.o | grep " F .text"
> > > 0000000000000bb0 l     F .text  000000000000033c dev_gro_receive
> > >
> > > After:
> > > 0000000000000bb0 l     F .text  0000000000000325 dev_gro_receive
> > >
> > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > ---
> > >  include/linux/netdevice.h |  3 +-
> > >  net/core/gro.c            | 90 +++++++++++++++++++++++----------------
> > >  2 files changed, 56 insertions(+), 37 deletions(-)
> > >
> > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > index 3213c7227b59..406cb457d788 100644
> > > --- a/include/linux/netdevice.h
> > > +++ b/include/linux/netdevice.h
> > > @@ -2564,7 +2564,8 @@ struct packet_offload {
> > >         __be16                   type;  /* This is really htons(ether_type). */
> > >         u16                      priority;
> > >         struct offload_callbacks callbacks;
> > > -       struct list_head         list;
> > > +       struct list_head         gro_list;
> > > +       struct list_head         gso_list;
> > >  };
> > >
> >
> > On the other hand, this makes this object bigger, increasing the risk
> > of spanning cache lines.
>
> As you said, GSO callbacks are barely used with most modern NICs.
> Plus, GRO and GSO callbacks are used in completely different
> operations.
> `gro_list` occupies the same place where the `list` previously was.
> Does it make a lot of sense to care about `gso_list` being placed
> in a cacheline separate from `gro_list`?

Not sure what you are asking in this last sentence.

Putting all the struct packet_offload and struct net_offload
objects together would reduce the number of cache lines,
but making these objects bigger works against that.

Another approach to avoiding the conditional tests would be to
provide non-NULL values for all "struct offload_callbacks" members,
just in case we missed something.

I think the test on ptype->type == type should be enough, right?


* Re: [PATCH net-next 2/3] net: gro: minor optimization for dev_gro_receive()
  2022-02-03 16:39   ` Alexander H Duyck
@ 2022-02-03 17:44     ` Paolo Abeni
  0 siblings, 0 replies; 12+ messages in thread
From: Paolo Abeni @ 2022-02-03 17:44 UTC (permalink / raw)
  To: Alexander H Duyck, netdev
  Cc: David S. Miller, Jakub Kicinski, Alexander Lobakin, Eric Dumazet

On Thu, 2022-02-03 at 08:39 -0800, Alexander H Duyck wrote:
> On Thu, 2022-02-03 at 16:48 +0100, Paolo Abeni wrote:
> > While inspecting some perf reports, I noticed that the compiler
> > emits suboptimal code for the napi CB initialization, fetching
> > and storing the memory for the flags bitfield multiple times.
> > This is with gcc 10.3.1, but I observed the same with older compiler
> > versions.
> > 
> > We can help the compiler do a better job by clearing several
> > fields at once using a u32 alias. The generated code is quite
> > a bit smaller, with the same number of conditionals.
> > 
> > Before:
> > objdump -t net/core/gro.o | grep " F .text"
> > 0000000000000bb0 l     F .text  0000000000000357 dev_gro_receive
> > 
> > After:
> > 0000000000000bb0 l     F .text  000000000000033c dev_gro_receive
> > 
> > RFC -> v1:
> >  - use __struct_group to delimit the zeroed area (Alexander)
> > 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> >  include/net/gro.h | 52 +++++++++++++++++++++++++--------------------
> > --
> >  net/core/gro.c    | 18 +++++++---------
> >  2 files changed, 35 insertions(+), 35 deletions(-)
> > 
> > diff --git a/include/net/gro.h b/include/net/gro.h
> > index 8f75802d50fd..fa1bb0f0ad28 100644
> > --- a/include/net/gro.h
> > +++ b/include/net/gro.h
> > @@ -29,46 +29,50 @@ struct napi_gro_cb {
> >         /* Number of segments aggregated. */
> >         u16     count;
> >  
> > -       /* Start offset for remote checksum offload */
> > -       u16     gro_remcsum_start;
> > +       /* Used in ipv6_gro_receive() and foo-over-udp */
> > +       u16     proto;
> >  
> >         /* jiffies when first packet was created/queued */
> >         unsigned long age;
> >  
> > -       /* Used in ipv6_gro_receive() and foo-over-udp */
> > -       u16     proto;
> > +       /* portion of the cb set to zero at every gro iteration */
> > +       __struct_group(/* no tag */, zeroed, /* no attrs */,
> 
> Any specific reason for using __struct_group here rather than the
> struct_group macro instead?

Just sheer ignorance on my side. I'll fix it in the next iteration.

Thanks!

Paolo



* Re: [PATCH net-next 3/3] net: gro: register gso and gro offload on separate lists
  2022-02-03 16:41       ` Eric Dumazet
@ 2022-02-03 18:07         ` Paolo Abeni
  0 siblings, 0 replies; 12+ messages in thread
From: Paolo Abeni @ 2022-02-03 18:07 UTC (permalink / raw)
  To: Eric Dumazet, Alexander Lobakin; +Cc: netdev, David S. Miller, Jakub Kicinski

On Thu, 2022-02-03 at 08:41 -0800, Eric Dumazet wrote:
> On Thu, Feb 3, 2022 at 8:28 AM Alexander Lobakin
> <alexandr.lobakin@intel.com> wrote:
> > 
> > From: Eric Dumazet <edumazet@google.com>
> > Date: Thu, 3 Feb 2022 08:11:43 -0800
> > 
> > > On Thu, Feb 3, 2022 at 7:48 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > > > 
> > > > So that we know each element in gro_list has valid gro callbacks
> > > > (and the same for gso). This allows dropping a bunch of conditionals
> > > > in the fastpath.
> > > > 
> > > > Before:
> > > > objdump -t net/core/gro.o | grep " F .text"
> > > > 0000000000000bb0 l     F .text  000000000000033c dev_gro_receive
> > > > 
> > > > After:
> > > > 0000000000000bb0 l     F .text  0000000000000325 dev_gro_receive
> > > > 
> > > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > > ---
> > > >  include/linux/netdevice.h |  3 +-
> > > >  net/core/gro.c            | 90 +++++++++++++++++++++++----------------
> > > >  2 files changed, 56 insertions(+), 37 deletions(-)
> > > > 
> > > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > > index 3213c7227b59..406cb457d788 100644
> > > > --- a/include/linux/netdevice.h
> > > > +++ b/include/linux/netdevice.h
> > > > @@ -2564,7 +2564,8 @@ struct packet_offload {
> > > >         __be16                   type;  /* This is really htons(ether_type). */
> > > >         u16                      priority;
> > > >         struct offload_callbacks callbacks;
> > > > -       struct list_head         list;
> > > > +       struct list_head         gro_list;
> > > > +       struct list_head         gso_list;
> > > >  };
> > > > 
> > > 
> > > On the other hand, this makes this object bigger, increasing the risk
> > > of spanning cache lines.
> > 
> > As you said, GSO callbacks are barely used with most modern NICs.
> > Plus, GRO and GSO callbacks are used in completely different
> > operations.
> > `gro_list` occupies the same place where the `list` previously was.
> > Does it make a lot of sense to care about `gso_list` being placed
> > in a cacheline separate from `gro_list`?
> 
> Not sure what you are asking in this last sentence.
> 
> Putting all the struct packet_offload and struct net_offload
> objects together would reduce the number of cache lines,
> but making these objects bigger works against that.
> 
> Another approach to avoiding the conditional tests would be to
> provide non-NULL values for all "struct offload_callbacks" members,
> just in case we missed something.
> 
> I think the test on ptype->type == type should be enough, right?

I'll try this latter approach; it looks simpler.

Also, I'll keep this patch separate, as the others look less
controversial.

Thanks!

Paolo




end of thread, other threads:[~2022-02-03 18:07 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-03 15:48 [PATCH net-next 0/3] gro: some minor optimization Paolo Abeni
2022-02-03 15:48 ` [PATCH net-next 1/3] net: gro: avoid re-computing truesize twice on recycle Paolo Abeni
2022-02-03 16:16   ` Alexander Lobakin
2022-02-03 15:48 ` [PATCH net-next 2/3] net: gro: minor optimization for dev_gro_receive() Paolo Abeni
2022-02-03 16:09   ` Alexander Lobakin
2022-02-03 16:39   ` Alexander H Duyck
2022-02-03 17:44     ` Paolo Abeni
2022-02-03 15:48 ` [PATCH net-next 3/3] net: gro: register gso and gro offload on separate lists Paolo Abeni
2022-02-03 16:11   ` Eric Dumazet
2022-02-03 16:26     ` Alexander Lobakin
2022-02-03 16:41       ` Eric Dumazet
2022-02-03 18:07         ` Paolo Abeni
