* [PATCH] netem: Segment GSO packets on enqueue.
@ 2016-04-26 17:43 Neil Horman
  2016-04-26 18:49 ` Eric Dumazet
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Neil Horman @ 2016-04-26 17:43 UTC (permalink / raw)
  To: netdev; +Cc: Neil Horman, Jamal Hadi Salim, David S. Miller, netem

This was recently reported to me, and reproduced on the latest net kernel, when
attempting to run netperf from a host that had a netem qdisc attached to the
egress interface:

[  788.073771] ------------[ cut here ]------------
[  788.096716] WARNING: at net/core/dev.c:2253 skb_warn_bad_offload+0xcd/0xda()
[  788.129521] bnx2: caps=(0x00000001801949b3, 0x0000000000000000) len=2962
data_len=0 gso_size=1448 gso_type=1 ip_summed=3
[  788.182150] Modules linked in: sch_netem kvm_amd kvm crc32_pclmul ipmi_ssif
ghash_clmulni_intel sp5100_tco amd64_edac_mod aesni_intel lrw gf128mul
glue_helper ablk_helper edac_mce_amd cryptd pcspkr sg edac_core hpilo ipmi_si
i2c_piix4 k10temp fam15h_power hpwdt ipmi_msghandler shpchp acpi_power_meter
pcc_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c
sd_mod crc_t10dif crct10dif_generic mgag200 syscopyarea sysfillrect sysimgblt
i2c_algo_bit drm_kms_helper ahci ata_generic pata_acpi ttm libahci
crct10dif_pclmul pata_atiixp tg3 libata crct10dif_common drm crc32c_intel ptp
serio_raw bnx2 r8169 hpsa pps_core i2c_core mii dm_mirror dm_region_hash dm_log
dm_mod
[  788.465294] CPU: 16 PID: 0 Comm: swapper/16 Tainted: G        W
------------   3.10.0-327.el7.x86_64 #1
[  788.511521] Hardware name: HP ProLiant DL385p Gen8, BIOS A28 12/17/2012
[  788.542260]  ffff880437c036b8 f7afc56532a53db9 ffff880437c03670
ffffffff816351f1
[  788.576332]  ffff880437c036a8 ffffffff8107b200 ffff880633e74200
ffff880231674000
[  788.611943]  0000000000000001 0000000000000003 0000000000000000
ffff880437c03710
[  788.647241] Call Trace:
[  788.658817]  <IRQ>  [<ffffffff816351f1>] dump_stack+0x19/0x1b
[  788.686193]  [<ffffffff8107b200>] warn_slowpath_common+0x70/0xb0
[  788.713803]  [<ffffffff8107b29c>] warn_slowpath_fmt+0x5c/0x80
[  788.741314]  [<ffffffff812f92f3>] ? ___ratelimit+0x93/0x100
[  788.767018]  [<ffffffff81637f49>] skb_warn_bad_offload+0xcd/0xda
[  788.796117]  [<ffffffff8152950c>] skb_checksum_help+0x17c/0x190
[  788.823392]  [<ffffffffa01463a1>] netem_enqueue+0x741/0x7c0 [sch_netem]
[  788.854487]  [<ffffffff8152cb58>] dev_queue_xmit+0x2a8/0x570
[  788.880870]  [<ffffffff8156ae1d>] ip_finish_output+0x53d/0x7d0
...

The problem occurs because netem is not prepared to handle GSO packets (as it
uses skb_checksum_help in its enqueue path, which cannot manipulate these
frames).

The solution I think is to simply segment the skb in a similar fashion to the
way we do in __dev_queue_xmit (via validate_xmit_skb), except here we always
segment, instead of only when the interface needs us to do it.  This allows
netem to properly drop/mangle/pass/etc the correct percentages of frames as per
its qdisc configuration, and avoid failing its checksum operations.
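
For context, the transmit-path logic this mirrors looks roughly like the
simplified sketch below (a paraphrase for illustration, not the exact
net/core/dev.c source): validate_xmit_skb() falls back to software
segmentation only when the egress device cannot handle the GSO packet itself,
whereas the patch below always segments once the skb reaches netem's enqueue
path.

/* Simplified paraphrase of the validate_xmit_skb() idea, for contrast with
 * the always-segment approach taken here.  The real function also handles
 * VLAN insertion, linearization, and more.
 */
static struct sk_buff *validate_xmit_skb_sketch(struct sk_buff *skb,
                                                struct net_device *dev)
{
    netdev_features_t features = netif_skb_features(skb);

    if (netif_needs_gso(skb, features)) {
        /* device can't offload this GSO skb: segment in software */
        struct sk_buff *segs = skb_gso_segment(skb, features);

        if (IS_ERR(segs))
            goto drop;
        if (segs) {
            consume_skb(skb);
            skb = segs;    /* now a list of MTU-sized skbs */
        }
    } else if (skb->ip_summed == CHECKSUM_PARTIAL &&
               skb_checksum_help(skb)) {
        /* non-GSO skb whose checksum the device won't compute */
        goto drop;
    }
    return skb;

drop:
    kfree_skb(skb);
    return NULL;
}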

tested successfully by myself on the latest net kernel, to whcih this applies

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Jamal Hadi Salim <jhs@mojatatu.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: netem@lists.linux-foundation.org
---
 net/sched/sch_netem.c | 34 +++++++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 9640bb3..c9feecb 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -401,7 +401,7 @@ static void tfifo_enqueue(struct sk_buff *nskb, struct Qdisc *sch)
  * 	NET_XMIT_DROP: queue length didn't change.
  *      NET_XMIT_SUCCESS: one skb was queued.
  */
-static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
+static int __netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 {
 	struct netem_sched_data *q = qdisc_priv(sch);
 	/* We don't fill cb now as skb_unshare() may invalidate it */
@@ -519,6 +519,38 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	return NET_XMIT_SUCCESS;
 }
 
+static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
+{
+	struct sk_buff *segs;
+	struct sk_buff *next;
+	int rc = NET_XMIT_SUCCESS;
+
+	if (skb_is_gso(skb)) {
+		segs = skb_gso_segment(skb, 0);
+		kfree_skb(skb);
+		if (IS_ERR(segs)) {
+			qdisc_qstats_drop(sch);
+			return NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
+		}
+		skb = segs;
+	}
+
+	while (skb) {
+		next = skb->next;
+		skb->next = NULL;
+		if (rc == NET_XMIT_SUCCESS)
+			rc = __netem_enqueue(skb, sch);
+		else {
+			qdisc_qstats_drop(sch);
+			kfree_skb(skb);
+		}
+
+		skb = next;
+	}
+
+	return rc;
+}
+
 static unsigned int netem_drop(struct Qdisc *sch)
 {
 	struct netem_sched_data *q = qdisc_priv(sch);
-- 
2.5.5


* Re: [PATCH] netem: Segment GSO packets on enqueue.
  2016-04-26 17:43 [PATCH] netem: Segment GSO packets on enqueue Neil Horman
@ 2016-04-26 18:49 ` Eric Dumazet
  2016-04-26 19:00   ` Neil Horman
  2016-04-28 20:09 ` [PATCHv2] " Neil Horman
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2016-04-26 18:49 UTC (permalink / raw)
  To: Neil Horman; +Cc: netdev, Jamal Hadi Salim, David S. Miller, netem

On Tue, 2016-04-26 at 13:43 -0400, Neil Horman wrote:
> This was recently reported to me, and reproduced on the latest net kernel, when
> attempting to run netperf from a host that had a netem qdisc attached to the
> egress interface:

..

> The problem occurs because netem is not prepared to handle GSO packets (as it
> uses skb_checksum_help in its enqueue path, which cannot manipulate these
> frames).
> 
> The solution I think is to simply segment the skb in a similar fashion to the
> way we do in __dev_queue_xmit (via validate_xmit_skb), except here we always
> segment, instead of only when the interface needs us to do it.  This allows
> netem to properly drop/mangle/pass/etc the correct percentages of frames as per
> its qdisc configuration, and avoid failing its checksum operations.
> 
> tested successfully by myself on the latest net kernel, to whcih this applies
> 
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> CC: Jamal Hadi Salim <jhs@mojatatu.com>
> CC: "David S. Miller" <davem@davemloft.net>
> CC: netem@lists.linux-foundation.org
> ---
>  net/sched/sch_netem.c | 34 +++++++++++++++++++++++++++++++++-
>  1 file changed, 33 insertions(+), 1 deletion(-)


This is not correct.

I want my TSO packets to still be TSO after netem traversal.

Some of us use netem at 40Gbit speed ;)


* Re: [PATCH] netem: Segment GSO packets on enqueue.
  2016-04-26 18:49 ` Eric Dumazet
@ 2016-04-26 19:00   ` Neil Horman
  2016-04-26 20:19     ` Eric Dumazet
  0 siblings, 1 reply; 15+ messages in thread
From: Neil Horman @ 2016-04-26 19:00 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Jamal Hadi Salim, David S. Miller, netem

On Tue, Apr 26, 2016 at 11:49:58AM -0700, Eric Dumazet wrote:
> On Tue, 2016-04-26 at 13:43 -0400, Neil Horman wrote:
> > This was recently reported to me, and reproduced on the latest net kernel, when
> > attempting to run netperf from a host that had a netem qdisc attached to the
> > egress interface:
> 
> ..
> 
> > The problem occurs because netem is not prepared to handle GSO packets (as it
> > uses skb_checksum_help in its enqueue path, which cannot manipulate these
> > frames).
> > 
> > The solution I think is to simply segment the skb in a similar fashion to the
> > way we do in __dev_queue_xmit (via validate_xmit_skb), except here we always
> > segment, instead of only when the interface needs us to do it.  This allows
> > netem to properly drop/mangle/pass/etc the correct percentages of frames as per
> > its qdisc configuration, and avoid failing its checksum operations.
> > 
> > tested successfully by myself on the latest net kernel, to whcih this applies
> > 
> > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > CC: Jamal Hadi Salim <jhs@mojatatu.com>
> > CC: "David S. Miller" <davem@davemloft.net>
> > CC: netem@lists.linux-foundation.org
> > ---
> >  net/sched/sch_netem.c | 34 +++++++++++++++++++++++++++++++++-
> >  1 file changed, 33 insertions(+), 1 deletion(-)
> 
> 
> This is not correct.
> 
> I want my TSO packets to still be TSO after netem traversal.
> 
> Some of us use netem at 40Gbit speed ;)
> 
I can understand that, but that raises two questions in my mind:

1)  Doesn't that make all the statistical manipulation for netem wrong?  That is
to say, if netem drops 5% of packets, and it happens to drop a GSO packet, it's
actually dropping several, instead of the single one required.

2) How are you getting netem to work with GSO at all?  The warning is triggered
for me on every GSO packet, which I think would impact throughput :)

Neil
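
To put illustrative numbers on the accounting concern in point 1 above (figures
chosen for the example, not taken from the report): with loss set to 5% and a
64 KB GSO skb built from 1448-byte segments, netem sees a single queue entry
that the wire will eventually carry as roughly 45 packets, so one drop event
removes the whole train at once:

	64 KB / 1448 B               ~= 45 wire segments per GSO skb
	P(drop per qdisc packet)      = 0.05
	segments lost per drop event ~= 45, not 1

The long-run average segment loss still works out near the configured 5%, but
it arrives in bursts of an entire GSO train rather than as independent
single-packet losses.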


* Re: [PATCH] netem: Segment GSO packets on enqueue.
  2016-04-26 19:00   ` Neil Horman
@ 2016-04-26 20:19     ` Eric Dumazet
  2016-04-27 11:27       ` Neil Horman
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2016-04-26 20:19 UTC (permalink / raw)
  To: Neil Horman; +Cc: netdev, Jamal Hadi Salim, David S. Miller, netem

On Tue, 2016-04-26 at 15:00 -0400, Neil Horman wrote:
> I can understand that, but that raises two questions in my mind:
> 
> 1)  Doesn't that make all the statistical manipulation for netem wrong?  That is
> to say, if netem drops 5% of packets, and it happens to drop a GSO packet, it's
> actually dropping several, instead of the single one required.


Please take a look at tbf_segment(), where you can find a proper way to
handle this.

Note that for the case q->corrupt is 0, we definitely do not want to
segment TSO packets.
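
For readers without the tree handy, the shape of that approach is roughly as
follows (an abridged sketch of the sch_tbf.c-style pattern, not a verbatim copy
of the function): segment on demand, enqueue each resulting segment, account
drops, and then fix up the packet/byte bookkeeping that the parent already did
for the original single skb.

/* Abridged sketch of a tbf_segment()-style helper: split a GSO skb, enqueue
 * the pieces on the child qdisc, and correct the backlog accounting, since
 * the parent charged us for one big packet but we now hold 'nb' smaller ones.
 */
static int segment_and_enqueue(struct sk_buff *skb, struct Qdisc *sch,
                               struct Qdisc *child)
{
    unsigned int len = 0, last_len, prev_len = qdisc_pkt_len(skb);
    netdev_features_t features = netif_skb_features(skb);
    struct sk_buff *segs, *nskb;
    int ret, nb = 0;

    segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
    if (IS_ERR_OR_NULL(segs))
        return qdisc_reshape_fail(skb, sch);

    while (segs) {
        nskb = segs->next;
        segs->next = NULL;
        qdisc_skb_cb(segs)->pkt_len = segs->len;
        last_len = segs->len;
        ret = qdisc_enqueue(segs, child);
        if (ret != NET_XMIT_SUCCESS) {
            if (net_xmit_drop_count(ret))
                qdisc_qstats_drop(sch);
        } else {
            nb++;
            len += last_len;    /* only count what actually got queued */
        }
        segs = nskb;
    }
    sch->q.qlen += nb;
    if (nb > 1)
        qdisc_tree_reduce_backlog(sch, 1 - nb, prev_len - len);
    consume_skb(skb);    /* the original GSO skb is no longer needed */
    return nb > 0 ? NET_XMIT_SUCCESS : NET_XMIT_DROP;
}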

> 2) How are you getting netem to work with GSO at all?  The warning is triggered
> for me on every GSO packet, which I think would impact throughput :)

I mostly use netem to add delays and drops.
I never had this bug, since q->corrupt = 0


* Re: [PATCH] netem: Segment GSO packets on enqueue.
  2016-04-26 20:19     ` Eric Dumazet
@ 2016-04-27 11:27       ` Neil Horman
  0 siblings, 0 replies; 15+ messages in thread
From: Neil Horman @ 2016-04-27 11:27 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Jamal Hadi Salim, David S. Miller, netem

On Tue, Apr 26, 2016 at 01:19:15PM -0700, Eric Dumazet wrote:
> On Tue, 2016-04-26 at 15:00 -0400, Neil Horman wrote:
> > I can understand that, but that raises two questions in my mind:
> > 
> > 1)  Doesn't that make all the statistical manipulation for netem wrong?  That is
> > to say, if netem drops 5% of packets, and it happens to drop a GSO packet, it's
> > actually dropping several, instead of the single one required.
> 
> 
> Please take a look at tbf_segment(), where you can find a proper way to
> handle this.
> 
> Note that for the case q->corrupt is 0, we definitely do not want to
> segment TSO packets.
> 
> > 2) How are you getting netem to work with GSO at all?  The warning is triggered
> > for me on every GSO packet, which I think would impact throughput :)
> 
> I mostly use netem to add delays and drops.
> I never had this bug, since q->corrupt = 0
> 

I see what you're saying now; I should only be segmenting the packet if the
qdisc needs to compute the checksum on each packet.  Other packets that aren't
selected to be mangled can pass through un-segmented (assuming they meet any
other queue constraints).

Ok, thanks.  Self-nak.  I'll respin/test and post a new version.

Best
Neil

> 
> 
> 


* [PATCHv2] netem: Segment GSO packets on enqueue.
  2016-04-26 17:43 [PATCH] netem: Segment GSO packets on enqueue Neil Horman
  2016-04-26 18:49 ` Eric Dumazet
@ 2016-04-28 20:09 ` Neil Horman
  2016-04-28 20:58   ` Eric Dumazet
  2016-04-29 17:35 ` [PATCHv3] " Neil Horman
  2016-05-02 16:20 ` [PATCHv4] " Neil Horman
  3 siblings, 1 reply; 15+ messages in thread
From: Neil Horman @ 2016-04-28 20:09 UTC (permalink / raw)
  To: netdev
  Cc: Neil Horman, Jamal Hadi Salim, David S. Miller, netem, eric.dumazet

This was recently reported to me, and reproduced on the latest net kernel, when
attempting to run netperf from a host that had a netem qdisc attached to the
egress interface:

[  788.073771] ------------[ cut here ]------------
[  788.096716] WARNING: at net/core/dev.c:2253 skb_warn_bad_offload+0xcd/0xda()
[  788.129521] bnx2: caps=(0x00000001801949b3, 0x0000000000000000) len=2962
data_len=0 gso_size=1448 gso_type=1 ip_summed=3
[  788.182150] Modules linked in: sch_netem kvm_amd kvm crc32_pclmul ipmi_ssif
ghash_clmulni_intel sp5100_tco amd64_edac_mod aesni_intel lrw gf128mul
glue_helper ablk_helper edac_mce_amd cryptd pcspkr sg edac_core hpilo ipmi_si
i2c_piix4 k10temp fam15h_power hpwdt ipmi_msghandler shpchp acpi_power_meter
pcc_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c
sd_mod crc_t10dif crct10dif_generic mgag200 syscopyarea sysfillrect sysimgblt
i2c_algo_bit drm_kms_helper ahci ata_generic pata_acpi ttm libahci
crct10dif_pclmul pata_atiixp tg3 libata crct10dif_common drm crc32c_intel ptp
serio_raw bnx2 r8169 hpsa pps_core i2c_core mii dm_mirror dm_region_hash dm_log
dm_mod
[  788.465294] CPU: 16 PID: 0 Comm: swapper/16 Tainted: G        W
------------   3.10.0-327.el7.x86_64 #1
[  788.511521] Hardware name: HP ProLiant DL385p Gen8, BIOS A28 12/17/2012
[  788.542260]  ffff880437c036b8 f7afc56532a53db9 ffff880437c03670
ffffffff816351f1
[  788.576332]  ffff880437c036a8 ffffffff8107b200 ffff880633e74200
ffff880231674000
[  788.611943]  0000000000000001 0000000000000003 0000000000000000
ffff880437c03710
[  788.647241] Call Trace:
[  788.658817]  <IRQ>  [<ffffffff816351f1>] dump_stack+0x19/0x1b
[  788.686193]  [<ffffffff8107b200>] warn_slowpath_common+0x70/0xb0
[  788.713803]  [<ffffffff8107b29c>] warn_slowpath_fmt+0x5c/0x80
[  788.741314]  [<ffffffff812f92f3>] ? ___ratelimit+0x93/0x100
[  788.767018]  [<ffffffff81637f49>] skb_warn_bad_offload+0xcd/0xda
[  788.796117]  [<ffffffff8152950c>] skb_checksum_help+0x17c/0x190
[  788.823392]  [<ffffffffa01463a1>] netem_enqueue+0x741/0x7c0 [sch_netem]
[  788.854487]  [<ffffffff8152cb58>] dev_queue_xmit+0x2a8/0x570
[  788.880870]  [<ffffffff8156ae1d>] ip_finish_output+0x53d/0x7d0
...

The problem occurs because netem is not prepared to handle GSO packets (as it
uses skb_checksum_help in its enqueue path, which cannot manipulate these
frames).

The solution I think is to simply segment the skb in a similar fashion to the
way we do in __dev_queue_xmit (via validate_xmit_skb), except here we always
segment, instead of only when the interface needs us to do it.  This allows
netem to properly drop/mangle/pass/etc the correct percentages of frames as per
its qdisc configuration, and avoid failing its checksum operations.

tested successfully by myself on the latest net kernel, to whcih this applies

---
Change Notes:
V2) As per request from Eric Dumazet, I rewrote this to limit the need to
segment the skb. Instead of doing so unilaterally, we now only do so when the
netem qdisc determines that a packet must be corrupted, thus avoiding
the failure in skb_checksum_help.  This still leaves open concerns with
statistical measurements made on GSO packets being dropped or reordered (i.e.
they are counted as a single packet rather than multiple packets), but I'd
rather fix the immediate problem before we go rewriting everything to fix that
larger issue.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Jamal Hadi Salim <jhs@mojatatu.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: netem@lists.linux-foundation.org
CC: eric.dumazet@gmail.com
---
 net/sched/sch_netem.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 48 insertions(+), 3 deletions(-)

diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 9640bb3..7cde5d3 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -395,6 +395,25 @@ static void tfifo_enqueue(struct sk_buff *nskb, struct Qdisc *sch)
 	sch->q.qlen++;
 }
 
+/* netem can't properly corrupt a megapacket (like we get from GSO), so instead
+ * when we statistically choose to corrupt one, we instead segment it, returning
+ * the first packet to be corrupted, and re-enqueue the remaining frames
+ */
+static struct sk_buff* netem_segment(struct sk_buff *skb, struct Qdisc *sch)
+{
+	struct sk_buff *segs;
+	netdev_features_t features = netif_skb_features(skb);
+
+	segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
+
+	if (IS_ERR_OR_NULL(segs)) {
+		qdisc_reshape_fail(skb, sch);
+		return NULL;
+	}
+	consume_skb(skb);
+	return segs;
+}
+
 /*
  * Insert one skb into qdisc.
  * Note: parent depends on return value to account for queue length.
@@ -407,7 +426,9 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	/* We don't fill cb now as skb_unshare() may invalidate it */
 	struct netem_skb_cb *cb;
 	struct sk_buff *skb2;
+	struct sk_buff *segs = NULL;
 	int count = 1;
+	int rc = NET_XMIT_SUCCESS;
 
 	/* Random duplication */
 	if (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor))
@@ -453,10 +474,22 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	 * do it now in software before we mangle it.
 	 */
 	if (q->corrupt && q->corrupt >= get_crandom(&q->corrupt_cor)) {
+		if (skb_is_gso(skb)) {
+			segs = netem_segment(skb, sch);
+			if (!segs)
+				return NET_XMIT_DROP;
+		} else
+			segs = skb;
+
+		skb = segs;
+		segs = segs->next;
+
 		if (!(skb = skb_unshare(skb, GFP_ATOMIC)) ||
 		    (skb->ip_summed == CHECKSUM_PARTIAL &&
-		     skb_checksum_help(skb)))
-			return qdisc_drop(skb, sch);
+		     skb_checksum_help(skb))) {
+			rc = qdisc_drop(skb, sch);
+			goto finish_segs;
+		}
 
 		skb->data[prandom_u32() % skb_headlen(skb)] ^=
 			1<<(prandom_u32() % 8);
@@ -516,7 +549,19 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		sch->qstats.requeues++;
 	}
 
-	return NET_XMIT_SUCCESS;
+finish_segs:
+	while (segs) {
+		skb2 = segs->next;
+		segs->next = NULL;
+		qdisc_skb_cb(segs)->pkt_len = segs->len;
+		rc = qdisc_enqueue(segs, sch);
+		if (rc != NET_XMIT_SUCCESS) {
+			if (net_xmit_drop_count(rc))
+				qdisc_qstats_drop(sch);
+		}
+		segs = skb2;
+	}
+	return rc;
 }
 
 static unsigned int netem_drop(struct Qdisc *sch)
-- 
2.5.5


* Re: [PATCHv2] netem: Segment GSO packets on enqueue.
  2016-04-28 20:09 ` [PATCHv2] " Neil Horman
@ 2016-04-28 20:58   ` Eric Dumazet
  2016-04-29  1:20     ` Neil Horman
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2016-04-28 20:58 UTC (permalink / raw)
  To: Neil Horman; +Cc: netdev, Jamal Hadi Salim, David S. Miller, netem

On Thu, 2016-04-28 at 16:09 -0400, Neil Horman wrote:
> This was recently reported to me, and reproduced on the latest net kernel, when
> attempting to run netperf from a host that had a netem qdisc attached to the
> egress interface:

>  
> -	return NET_XMIT_SUCCESS;
> +finish_segs:
> +	while (segs) {
> +		skb2 = segs->next;
> +		segs->next = NULL;
> +		qdisc_skb_cb(segs)->pkt_len = segs->len;
> +		rc = qdisc_enqueue(segs, sch);
> +		if (rc != NET_XMIT_SUCCESS) {
> +			if (net_xmit_drop_count(rc))
> +				qdisc_qstats_drop(sch);
> +		}
> +		segs = skb2;
> +	}
> +	return rc;
>  }

It seems you missed the qdisc_tree_reduce_backlog() call ?
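
For context, that call is what keeps the ancestors' packet/byte counters honest
after one accounted skb is split: when the GSO skb was handed to netem, every
qdisc above it (if netem is not the root) already counted one packet of
prev_len bytes, so after segmentation the hierarchy has to be told about the
extra nb - 1 packets and the extra header bytes (variable names as used in the
v3/v4 patches later in the thread):

	/* One accounted GSO skb became nb segments totalling 'len' bytes;
	 * report the delta against the single prev_len-byte packet that the
	 * ancestor qdiscs already counted.
	 */
	if (nb > 1)
		qdisc_tree_reduce_backlog(sch, 1 - nb, prev_len - len);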


* Re: [PATCHv2] netem: Segment GSO packets on enqueue.
  2016-04-28 20:58   ` Eric Dumazet
@ 2016-04-29  1:20     ` Neil Horman
  0 siblings, 0 replies; 15+ messages in thread
From: Neil Horman @ 2016-04-29  1:20 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Jamal Hadi Salim, David S. Miller, netem

On Thu, Apr 28, 2016 at 01:58:53PM -0700, Eric Dumazet wrote:
> On Thu, 2016-04-28 at 16:09 -0400, Neil Horman wrote:
> > This was recently reported to me, and reproduced on the latest net kernel, when
> > attempting to run netperf from a host that had a netem qdisc attached to the
> > egress interface:
> 
> >  
> > -	return NET_XMIT_SUCCESS;
> > +finish_segs:
> > +	while (segs) {
> > +		skb2 = segs->next;
> > +		segs->next = NULL;
> > +		qdisc_skb_cb(segs)->pkt_len = segs->len;
> > +		rc = qdisc_enqueue(segs, sch);
> > +		if (rc != NET_XMIT_SUCCESS) {
> > +			if (net_xmit_drop_count(rc))
> > +				qdisc_qstats_drop(sch);
> > +		}
> > +		segs = skb2;
> > +	}
> > +	return rc;
> >  }
> 
> It seems you missed the qdisc_tree_reduce_backlog() call ?
> 
Crap, yes, sorry.  I did a last-minute modification to move the segment
requeuing to netem_enqueue and inadvertently removed it; I'll repost shortly.

Best
Neil

> 
> 
> 


* [PATCHv3] netem: Segment GSO packets on enqueue
  2016-04-26 17:43 [PATCH] netem: Segment GSO packets on enqueue Neil Horman
  2016-04-26 18:49 ` Eric Dumazet
  2016-04-28 20:09 ` [PATCHv2] " Neil Horman
@ 2016-04-29 17:35 ` Neil Horman
  2016-04-29 18:09   ` Eric Dumazet
  2016-04-29 18:19   ` Stephen Hemminger
  2016-05-02 16:20 ` [PATCHv4] " Neil Horman
  3 siblings, 2 replies; 15+ messages in thread
From: Neil Horman @ 2016-04-29 17:35 UTC (permalink / raw)
  To: netdev
  Cc: Neil Horman, Jamal Hadi Salim, David S. Miller, netem, eric.dumazet

This was recently reported to me, and reproduced on the latest net kernel, when
attempting to run netperf from a host that had a netem qdisc attached to the
egress interface:

[  788.073771] ------------[ cut here ]------------
[  788.096716] WARNING: at net/core/dev.c:2253 skb_warn_bad_offload+0xcd/0xda()
[  788.129521] bnx2: caps=(0x00000001801949b3, 0x0000000000000000) len=2962
data_len=0 gso_size=1448 gso_type=1 ip_summed=3
[  788.182150] Modules linked in: sch_netem kvm_amd kvm crc32_pclmul ipmi_ssif
ghash_clmulni_intel sp5100_tco amd64_edac_mod aesni_intel lrw gf128mul
glue_helper ablk_helper edac_mce_amd cryptd pcspkr sg edac_core hpilo ipmi_si
i2c_piix4 k10temp fam15h_power hpwdt ipmi_msghandler shpchp acpi_power_meter
pcc_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c
sd_mod crc_t10dif crct10dif_generic mgag200 syscopyarea sysfillrect sysimgblt
i2c_algo_bit drm_kms_helper ahci ata_generic pata_acpi ttm libahci
crct10dif_pclmul pata_atiixp tg3 libata crct10dif_common drm crc32c_intel ptp
serio_raw bnx2 r8169 hpsa pps_core i2c_core mii dm_mirror dm_region_hash dm_log
dm_mod
[  788.465294] CPU: 16 PID: 0 Comm: swapper/16 Tainted: G        W
------------   3.10.0-327.el7.x86_64 #1
[  788.511521] Hardware name: HP ProLiant DL385p Gen8, BIOS A28 12/17/2012
[  788.542260]  ffff880437c036b8 f7afc56532a53db9 ffff880437c03670
ffffffff816351f1
[  788.576332]  ffff880437c036a8 ffffffff8107b200 ffff880633e74200
ffff880231674000
[  788.611943]  0000000000000001 0000000000000003 0000000000000000
ffff880437c03710
[  788.647241] Call Trace:
[  788.658817]  <IRQ>  [<ffffffff816351f1>] dump_stack+0x19/0x1b
[  788.686193]  [<ffffffff8107b200>] warn_slowpath_common+0x70/0xb0
[  788.713803]  [<ffffffff8107b29c>] warn_slowpath_fmt+0x5c/0x80
[  788.741314]  [<ffffffff812f92f3>] ? ___ratelimit+0x93/0x100
[  788.767018]  [<ffffffff81637f49>] skb_warn_bad_offload+0xcd/0xda
[  788.796117]  [<ffffffff8152950c>] skb_checksum_help+0x17c/0x190
[  788.823392]  [<ffffffffa01463a1>] netem_enqueue+0x741/0x7c0 [sch_netem]
[  788.854487]  [<ffffffff8152cb58>] dev_queue_xmit+0x2a8/0x570
[  788.880870]  [<ffffffff8156ae1d>] ip_finish_output+0x53d/0x7d0
...

The problem occurs because netem is not prepared to handle GSO packets (as it
uses skb_checksum_help in its enqueue path, which cannot manipulate these
frames).

The solution I think is to simply segment the skb in a similar fashion to the
way we do in __dev_queue_xmit (via validate_xmit_skb), with some minor changes.
When we decide to corrupt an skb, if the frame is GSO, we segment it, corrupt
the first segment, and enqueue the remaining ones.

tested successfully by myself on the latest net kernel, to whcih this applies

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Jamal Hadi Salim <jhs@mojatatu.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: netem@lists.linux-foundation.org
CC: eric.dumazet@gmail.com

----
Change Notes:
V2) As per request from Eric Dumazet, I rewrote this to limit the need to
segment the skb. Instead of doing so unilaterally, we now only do so when
the netem qdisc determines that a packet must be corrupted, thus
avoiding the failure in skb_checksum_help.  This still leaves open concerns
with statistical measurements made on GSO packets being dropped or reordered
(i.e. they are counted as a single packet rather than multiple packets), but
I'd rather fix the immediate problem before we go rewriting everything to fix
that larger issue.

V3) Added back missing call to qdisc_tree_reduce_backlog that I misplaced in
the V2 change.
---
 net/sched/sch_netem.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 57 insertions(+), 3 deletions(-)

diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 9640bb3..c4d8133 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -395,6 +395,25 @@ static void tfifo_enqueue(struct sk_buff *nskb, struct Qdisc *sch)
 	sch->q.qlen++;
 }
 
+/* netem can't properly corrupt a megapacket (like we get from GSO), so instead
+ * when we statistically choose to corrupt one, we instead segment it, returning
+ * the first packet to be corrupted, and re-enqueue the remaining frames
+ */
+static struct sk_buff* netem_segment(struct sk_buff *skb, struct Qdisc *sch)
+{
+	struct sk_buff *segs;
+	netdev_features_t features = netif_skb_features(skb);
+
+	segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
+
+	if (IS_ERR_OR_NULL(segs)) {
+		qdisc_reshape_fail(skb, sch);
+		return NULL;
+	}
+	consume_skb(skb);
+	return segs;
+}
+
 /*
  * Insert one skb into qdisc.
  * Note: parent depends on return value to account for queue length.
@@ -407,7 +426,11 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	/* We don't fill cb now as skb_unshare() may invalidate it */
 	struct netem_skb_cb *cb;
 	struct sk_buff *skb2;
+	struct sk_buff *segs = NULL;
+	unsigned int len = 0, prev_len = qdisc_pkt_len(skb);
+	int nb = 0;
 	int count = 1;
+	int rc = NET_XMIT_SUCCESS;
 
 	/* Random duplication */
 	if (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor))
@@ -453,10 +476,22 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	 * do it now in software before we mangle it.
 	 */
 	if (q->corrupt && q->corrupt >= get_crandom(&q->corrupt_cor)) {
+		if (skb_is_gso(skb)) {
+			segs = netem_segment(skb, sch);
+			if (!segs)
+				return NET_XMIT_DROP;
+		} else
+			segs = skb;
+
+		skb = segs;
+		segs = segs->next;
+
 		if (!(skb = skb_unshare(skb, GFP_ATOMIC)) ||
 		    (skb->ip_summed == CHECKSUM_PARTIAL &&
-		     skb_checksum_help(skb)))
-			return qdisc_drop(skb, sch);
+		     skb_checksum_help(skb))) {
+			rc = qdisc_drop(skb, sch);
+			goto finish_segs;
+		}
 
 		skb->data[prandom_u32() % skb_headlen(skb)] ^=
 			1<<(prandom_u32() % 8);
@@ -516,7 +551,26 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		sch->qstats.requeues++;
 	}
 
-	return NET_XMIT_SUCCESS;
+finish_segs:
+	if (segs) {
+		while (segs) {
+			skb2 = segs->next;
+			segs->next = NULL;
+			qdisc_skb_cb(segs)->pkt_len = segs->len;
+			len += segs->len;
+			rc = qdisc_enqueue(segs, sch);
+			if (rc != NET_XMIT_SUCCESS) {
+				if (net_xmit_drop_count(rc))
+					qdisc_qstats_drop(sch);
+			} else
+				nb++;
+			segs = skb2;
+		}
+		sch->q.qlen += nb;
+		if (nb > 1)
+			qdisc_tree_reduce_backlog(sch, 1 - nb, prev_len - len);
+	}
+	return rc;
 }
 
 static unsigned int netem_drop(struct Qdisc *sch)
-- 
2.5.5


* Re: [PATCHv3] netem: Segment GSO packets on enqueue
  2016-04-29 17:35 ` [PATCHv3] " Neil Horman
@ 2016-04-29 18:09   ` Eric Dumazet
  2016-04-29 18:19   ` Stephen Hemminger
  1 sibling, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2016-04-29 18:09 UTC (permalink / raw)
  To: Neil Horman; +Cc: netdev, Jamal Hadi Salim, David S. Miller, netem

On Fri, 2016-04-29 at 13:35 -0400, Neil Horman wrote:
> This was recently reported to me, and reproduced on the latest net kernel, when
> attempting to run netperf from a host that had a netem qdisc attached to the
> egress interface:
> 


>  /*
>   * Insert one skb into qdisc.
>   * Note: parent depends on return value to account for queue length.
> @@ -407,7 +426,11 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
>  	/* We don't fill cb now as skb_unshare() may invalidate it */
>  	struct netem_skb_cb *cb;
>  	struct sk_buff *skb2;
> +	struct sk_buff *segs = NULL;
> +	unsigned int len = 0, prev_len = qdisc_pkt_len(skb);
> +	int nb = 0;
>  	int count = 1;
> +	int rc = NET_XMIT_SUCCESS;
>  
>  	/* Random duplication */
>  	if (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor))
> @@ -453,10 +476,22 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
>  	 * do it now in software before we mangle it.
>  	 */
>  	if (q->corrupt && q->corrupt >= get_crandom(&q->corrupt_cor)) {
> +		if (skb_is_gso(skb)) {
> +			segs = netem_segment(skb, sch);
> +			if (!segs)
> +				return NET_XMIT_DROP;
> +		} else
> +			segs = skb;
> +
> +		skb = segs;
> +		segs = segs->next;
> +
>  		if (!(skb = skb_unshare(skb, GFP_ATOMIC)) ||
>  		    (skb->ip_summed == CHECKSUM_PARTIAL &&
> -		     skb_checksum_help(skb)))
> -			return qdisc_drop(skb, sch);
> +		     skb_checksum_help(skb))) {
> +			rc = qdisc_drop(skb, sch);
> +			goto finish_segs;
> +		}
>  
>  		skb->data[prandom_u32() % skb_headlen(skb)] ^=
>  			1<<(prandom_u32() % 8);
> @@ -516,7 +551,26 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
>  		sch->qstats.requeues++;
>  	}
>  
> -	return NET_XMIT_SUCCESS;
> +finish_segs:
> +	if (segs) {
> +		while (segs) {
> +			skb2 = segs->next;
> +			segs->next = NULL;
> +			qdisc_skb_cb(segs)->pkt_len = segs->len;
> +			len += segs->len;

Wrong operation if packet is dropped by qdisc_enqueue() ?

I would use
 u32 last_len = segs->len;

> +			rc = qdisc_enqueue(segs, sch);
> +			if (rc != NET_XMIT_SUCCESS) {
> +				if (net_xmit_drop_count(rc))
> +					qdisc_qstats_drop(sch);
> +			} else

   } else {
        nb++;
        len += last_len;
   }

> +				nb++;
> +			segs = skb2;
> +		}
> +		sch->q.qlen += nb;
> +		if (nb > 1)
> +			qdisc_tree_reduce_backlog(sch, 1 - nb, prev_len - len);
> +	}
> +	return rc;

Seems you should return NET_XMIT_SUCCESS instead of status of last
segment ?

>  }

Thanks.


* Re: [PATCHv3] netem: Segment GSO packets on enqueue
  2016-04-29 17:35 ` [PATCHv3] " Neil Horman
  2016-04-29 18:09   ` Eric Dumazet
@ 2016-04-29 18:19   ` Stephen Hemminger
  2016-04-30 13:30     ` Neil Horman
  1 sibling, 1 reply; 15+ messages in thread
From: Stephen Hemminger @ 2016-04-29 18:19 UTC (permalink / raw)
  To: Neil Horman
  Cc: netdev, Jamal Hadi Salim, David S. Miller, netem, eric.dumazet

On Fri, 29 Apr 2016 13:35:48 -0400
Neil Horman <nhorman@tuxdriver.com> wrote:

> This was recently reported to me, and reproduced on the latest net kernel, when
> attempting to run netperf from a host that had a netem qdisc attached to the
> egress interface:
> 
> [  788.073771] ------------[ cut here ]------------
> [  788.096716] WARNING: at net/core/dev.c:2253 skb_warn_bad_offload+0xcd/0xda()
> [  788.129521] bnx2: caps=(0x00000001801949b3, 0x0000000000000000) len=2962
> data_len=0 gso_size=1448 gso_type=1 ip_summed=3
> [  788.182150] Modules linked in: sch_netem kvm_amd kvm crc32_pclmul ipmi_ssif
> ghash_clmulni_intel sp5100_tco amd64_edac_mod aesni_intel lrw gf128mul
> glue_helper ablk_helper edac_mce_amd cryptd pcspkr sg edac_core hpilo ipmi_si
> i2c_piix4 k10temp fam15h_power hpwdt ipmi_msghandler shpchp acpi_power_meter
> pcc_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c
> sd_mod crc_t10dif crct10dif_generic mgag200 syscopyarea sysfillrect sysimgblt
> i2c_algo_bit drm_kms_helper ahci ata_generic pata_acpi ttm libahci
> crct10dif_pclmul pata_atiixp tg3 libata crct10dif_common drm crc32c_intel ptp
> serio_raw bnx2 r8169 hpsa pps_core i2c_core mii dm_mirror dm_region_hash dm_log
> dm_mod
> [  788.465294] CPU: 16 PID: 0 Comm: swapper/16 Tainted: G        W
> ------------   3.10.0-327.el7.x86_64 #1
> [  788.511521] Hardware name: HP ProLiant DL385p Gen8, BIOS A28 12/17/2012
> [  788.542260]  ffff880437c036b8 f7afc56532a53db9 ffff880437c03670
> ffffffff816351f1
> [  788.576332]  ffff880437c036a8 ffffffff8107b200 ffff880633e74200
> ffff880231674000
> [  788.611943]  0000000000000001 0000000000000003 0000000000000000
> ffff880437c03710
> [  788.647241] Call Trace:
> [  788.658817]  <IRQ>  [<ffffffff816351f1>] dump_stack+0x19/0x1b
> [  788.686193]  [<ffffffff8107b200>] warn_slowpath_common+0x70/0xb0
> [  788.713803]  [<ffffffff8107b29c>] warn_slowpath_fmt+0x5c/0x80
> [  788.741314]  [<ffffffff812f92f3>] ? ___ratelimit+0x93/0x100
> [  788.767018]  [<ffffffff81637f49>] skb_warn_bad_offload+0xcd/0xda
> [  788.796117]  [<ffffffff8152950c>] skb_checksum_help+0x17c/0x190
> [  788.823392]  [<ffffffffa01463a1>] netem_enqueue+0x741/0x7c0 [sch_netem]
> [  788.854487]  [<ffffffff8152cb58>] dev_queue_xmit+0x2a8/0x570
> [  788.880870]  [<ffffffff8156ae1d>] ip_finish_output+0x53d/0x7d0
> ...
> 
> The problem occurs because netem is not prepared to handle GSO packets (as it
> uses skb_checksum_help in its enqueue path, which cannot manipulate these
> frames).
> 
> > The solution I think is to simply segment the skb in a similar fashion to the
> way we do in __dev_queue_xmit (via validate_xmit_skb), with some minor changes.
> When we decide to corrupt an skb, if the frame is GSO, we segment it, corrupt
> the first segment, and enqueue the remaining ones.
> 
> tested successfully by myself on the latest net kernel, to whcih this applies
> 
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> CC: Jamal Hadi Salim <jhs@mojatatu.com>
> CC: "David S. Miller" <davem@davemloft.net>
> CC: netem@lists.linux-foundation.org
> CC: eric.dumazet@gmail.com
> 

This looks like a good idea.

Please cleanup the formatting issues:

This was recently reported to me, and reproduced on the latest net kernel, when

WARNING: 'whcih' may be misspelled - perhaps 'which'?
#93: 
tested successfully by myself on the latest net kernel, to whcih this applies

ERROR: "foo* bar" should be "foo *bar"
#130: FILE: net/sched/sch_netem.c:402:
+static struct sk_buff* netem_segment(struct sk_buff *skb, struct Qdisc *sch)


CHECK: braces {} should be used on all arms of this statement
#164: FILE: net/sched/sch_netem.c:479:
+		if (skb_is_gso(skb)) {
[...]
+		} else
[...]

CHECK: braces {} should be used on all arms of this statement
#198: FILE: net/sched/sch_netem.c:562:
+			if (rc != NET_XMIT_SUCCESS) {
[...]
+			} else
[...]


* Re: [PATCHv3] netem: Segment GSO packets on enqueue
  2016-04-29 18:19   ` Stephen Hemminger
@ 2016-04-30 13:30     ` Neil Horman
  0 siblings, 0 replies; 15+ messages in thread
From: Neil Horman @ 2016-04-30 13:30 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: netdev, Jamal Hadi Salim, David S. Miller, netem, eric.dumazet

On Fri, Apr 29, 2016 at 11:19:05AM -0700, Stephen Hemminger wrote:
> On Fri, 29 Apr 2016 13:35:48 -0400
> Neil Horman <nhorman@tuxdriver.com> wrote:
> 
> > This was recently reported to me, and reproduced on the latest net kernel, when
> > attempting to run netperf from a host that had a netem qdisc attached to the
> > egress interface:
> > 
> > [  788.073771] ------------[ cut here ]------------
> > [  788.096716] WARNING: at net/core/dev.c:2253 skb_warn_bad_offload+0xcd/0xda()
> > [  788.129521] bnx2: caps=(0x00000001801949b3, 0x0000000000000000) len=2962
> > data_len=0 gso_size=1448 gso_type=1 ip_summed=3
> > [  788.182150] Modules linked in: sch_netem kvm_amd kvm crc32_pclmul ipmi_ssif
> > ghash_clmulni_intel sp5100_tco amd64_edac_mod aesni_intel lrw gf128mul
> > glue_helper ablk_helper edac_mce_amd cryptd pcspkr sg edac_core hpilo ipmi_si
> > i2c_piix4 k10temp fam15h_power hpwdt ipmi_msghandler shpchp acpi_power_meter
> > pcc_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c
> > sd_mod crc_t10dif crct10dif_generic mgag200 syscopyarea sysfillrect sysimgblt
> > i2c_algo_bit drm_kms_helper ahci ata_generic pata_acpi ttm libahci
> > crct10dif_pclmul pata_atiixp tg3 libata crct10dif_common drm crc32c_intel ptp
> > serio_raw bnx2 r8169 hpsa pps_core i2c_core mii dm_mirror dm_region_hash dm_log
> > dm_mod
> > [  788.465294] CPU: 16 PID: 0 Comm: swapper/16 Tainted: G        W
> > ------------   3.10.0-327.el7.x86_64 #1
> > [  788.511521] Hardware name: HP ProLiant DL385p Gen8, BIOS A28 12/17/2012
> > [  788.542260]  ffff880437c036b8 f7afc56532a53db9 ffff880437c03670
> > ffffffff816351f1
> > [  788.576332]  ffff880437c036a8 ffffffff8107b200 ffff880633e74200
> > ffff880231674000
> > [  788.611943]  0000000000000001 0000000000000003 0000000000000000
> > ffff880437c03710
> > [  788.647241] Call Trace:
> > [  788.658817]  <IRQ>  [<ffffffff816351f1>] dump_stack+0x19/0x1b
> > [  788.686193]  [<ffffffff8107b200>] warn_slowpath_common+0x70/0xb0
> > [  788.713803]  [<ffffffff8107b29c>] warn_slowpath_fmt+0x5c/0x80
> > [  788.741314]  [<ffffffff812f92f3>] ? ___ratelimit+0x93/0x100
> > [  788.767018]  [<ffffffff81637f49>] skb_warn_bad_offload+0xcd/0xda
> > [  788.796117]  [<ffffffff8152950c>] skb_checksum_help+0x17c/0x190
> > [  788.823392]  [<ffffffffa01463a1>] netem_enqueue+0x741/0x7c0 [sch_netem]
> > [  788.854487]  [<ffffffff8152cb58>] dev_queue_xmit+0x2a8/0x570
> > [  788.880870]  [<ffffffff8156ae1d>] ip_finish_output+0x53d/0x7d0
> > ...
> > 
> > The problem occurs because netem is not prepared to handle GSO packets (as it
> > uses skb_checksum_help in its enqueue path, which cannot manipulate these
> > frames).
> > 
> > The solution I think is to simply segment the skb in a similar fashion to the
> > way we do in __dev_queue_xmit (via validate_xmit_skb), with some minor changes.
> > When we decide to corrupt an skb, if the frame is GSO, we segment it, corrupt
> > the first segment, and enqueue the remaining ones.
> > 
> > tested successfully by myself on the latest net kernel, to whcih this applies
> > 
> > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > CC: Jamal Hadi Salim <jhs@mojatatu.com>
> > CC: "David S. Miller" <davem@davemloft.net>
> > CC: netem@lists.linux-foundation.org
> > CC: eric.dumazet@gmail.com
> > 
> 
> This looks like a good idea.
> 
> Please cleanup the formatting issues:
> 
> This was recently reported to me, and reproduced on the latest net kernel, when
> 
> WARNING: 'whcih' may be misspelled - perhaps 'which'?
> #93: 
> tested successfully by myself on the latest net kernel, to whcih this applies
> 
> ERROR: "foo* bar" should be "foo *bar"
> #130: FILE: net/sched/sch_netem.c:402:
> +static struct sk_buff* netem_segment(struct sk_buff *skb, struct Qdisc *sch)
> 
> 
> CHECK: braces {} should be used on all arms of this statement
> #164: FILE: net/sched/sch_netem.c:479:
> +		if (skb_is_gso(skb)) {
> [...]
> +		} else
> [...]
> 
> CHECK: braces {} should be used on all arms of this statement
> #198: FILE: net/sched/sch_netem.c:562:
> +			if (rc != NET_XMIT_SUCCESS) {
> [...]
> +			} else
> [...]
> 
> 
> 
> 
> 
> 
Will do when I get back to the office Monday.
Thanks!
Neil


* [PATCHv4] netem: Segment GSO packets on enqueue
  2016-04-26 17:43 [PATCH] netem: Segment GSO packets on enqueue Neil Horman
                   ` (2 preceding siblings ...)
  2016-04-29 17:35 ` [PATCHv3] " Neil Horman
@ 2016-05-02 16:20 ` Neil Horman
  2016-05-02 16:56   ` Eric Dumazet
  2016-05-03  4:34   ` David Miller
  3 siblings, 2 replies; 15+ messages in thread
From: Neil Horman @ 2016-05-02 16:20 UTC (permalink / raw)
  To: netdev
  Cc: Neil Horman, Jamal Hadi Salim, David S. Miller, netem,
	eric.dumazet, stephen

This was recently reported to me, and reproduced on the latest net kernel,
when attempting to run netperf from a host that had a netem qdisc attached
to the egress interface:

[  788.073771] ---------------------[ cut here ]---------------------------
[  788.096716] WARNING: at net/core/dev.c:2253 skb_warn_bad_offload+0xcd/0xda()
[  788.129521] bnx2: caps=(0x00000001801949b3, 0x0000000000000000) len=2962
data_len=0 gso_size=1448 gso_type=1 ip_summed=3
[  788.182150] Modules linked in: sch_netem kvm_amd kvm crc32_pclmul ipmi_ssif
ghash_clmulni_intel sp5100_tco amd64_edac_mod aesni_intel lrw gf128mul
glue_helper ablk_helper edac_mce_amd cryptd pcspkr sg edac_core hpilo ipmi_si
i2c_piix4 k10temp fam15h_power hpwdt ipmi_msghandler shpchp acpi_power_meter
pcc_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c
sd_mod crc_t10dif crct10dif_generic mgag200 syscopyarea sysfillrect sysimgblt
i2c_algo_bit drm_kms_helper ahci ata_generic pata_acpi ttm libahci
crct10dif_pclmul pata_atiixp tg3 libata crct10dif_common drm crc32c_intel ptp
serio_raw bnx2 r8169 hpsa pps_core i2c_core mii dm_mirror dm_region_hash dm_log
dm_mod
[  788.465294] CPU: 16 PID: 0 Comm: swapper/16 Tainted: G        W
------------   3.10.0-327.el7.x86_64 #1
[  788.511521] Hardware name: HP ProLiant DL385p Gen8, BIOS A28 12/17/2012
[  788.542260]  ffff880437c036b8 f7afc56532a53db9 ffff880437c03670
ffffffff816351f1
[  788.576332]  ffff880437c036a8 ffffffff8107b200 ffff880633e74200
ffff880231674000
[  788.611943]  0000000000000001 0000000000000003 0000000000000000
ffff880437c03710
[  788.647241] Call Trace:
[  788.658817]  <IRQ>  [<ffffffff816351f1>] dump_stack+0x19/0x1b
[  788.686193]  [<ffffffff8107b200>] warn_slowpath_common+0x70/0xb0
[  788.713803]  [<ffffffff8107b29c>] warn_slowpath_fmt+0x5c/0x80
[  788.741314]  [<ffffffff812f92f3>] ? ___ratelimit+0x93/0x100
[  788.767018]  [<ffffffff81637f49>] skb_warn_bad_offload+0xcd/0xda
[  788.796117]  [<ffffffff8152950c>] skb_checksum_help+0x17c/0x190
[  788.823392]  [<ffffffffa01463a1>] netem_enqueue+0x741/0x7c0 [sch_netem]
[  788.854487]  [<ffffffff8152cb58>] dev_queue_xmit+0x2a8/0x570
[  788.880870]  [<ffffffff8156ae1d>] ip_finish_output+0x53d/0x7d0
...

The problem occurs because netem is not prepared to handle GSO packets (as it
uses skb_checksum_help in its enqueue path, which cannot manipulate these
frames).

The solution I think is to simply segment the skb in a similar fashion to the
way we do in __dev_queue_xmit (via validate_xmit_skb), with some minor changes.
When we decide to corrupt an skb, if the frame is GSO, we segment it, corrupt
the first segment, and enqueue the remaining ones.

tested successfully by myself on the latest net kernel, to which this applies

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Jamal Hadi Salim <jhs@mojatatu.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: netem@lists.linux-foundation.org
CC: eric.dumazet@gmail.com
CC: stephen@networkplumber.org

---
Change Notes:
V2) As per request from Eric Dumazet, I rewrote this to limit the need to
segment the skb. Instead of doing so unilaterally, we now only do so when the
netem qdisc determines that a packet must be corrupted, thus avoiding
the failure in skb_checksum_help.  This still leaves open concerns with
statistical measurements made on GSO packets being dropped or reordered (i.e.
they are counted as a single packet rather than multiple packets), but I'd
rather fix the immediate problem before we go rewriting everything to fix that
larger issue.

V3) Added back missing call to qdisc_tree_reduce_backlog that I misplaced in the
V2 change.

V4) Fix up length computation and return code.  Also clean up some patch
formatting.
---
 net/sched/sch_netem.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 59 insertions(+), 2 deletions(-)

diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 9640bb3..4befe97 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -395,6 +395,25 @@ static void tfifo_enqueue(struct sk_buff *nskb, struct Qdisc *sch)
 	sch->q.qlen++;
 }
 
+/* netem can't properly corrupt a megapacket (like we get from GSO), so instead
+ * when we statistically choose to corrupt one, we instead segment it, returning
+ * the first packet to be corrupted, and re-enqueue the remaining frames
+ */
+static struct sk_buff *netem_segment(struct sk_buff *skb, struct Qdisc *sch)
+{
+	struct sk_buff *segs;
+	netdev_features_t features = netif_skb_features(skb);
+
+	segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
+
+	if (IS_ERR_OR_NULL(segs)) {
+		qdisc_reshape_fail(skb, sch);
+		return NULL;
+	}
+	consume_skb(skb);
+	return segs;
+}
+
 /*
  * Insert one skb into qdisc.
  * Note: parent depends on return value to account for queue length.
@@ -407,7 +426,11 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	/* We don't fill cb now as skb_unshare() may invalidate it */
 	struct netem_skb_cb *cb;
 	struct sk_buff *skb2;
+	struct sk_buff *segs = NULL;
+	unsigned int len = 0, last_len, prev_len = qdisc_pkt_len(skb);
+	int nb = 0;
 	int count = 1;
+	int rc = NET_XMIT_SUCCESS;
 
 	/* Random duplication */
 	if (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor))
@@ -453,10 +476,23 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	 * do it now in software before we mangle it.
 	 */
 	if (q->corrupt && q->corrupt >= get_crandom(&q->corrupt_cor)) {
+		if (skb_is_gso(skb)) {
+			segs = netem_segment(skb, sch);
+			if (!segs)
+				return NET_XMIT_DROP;
+		} else {
+			segs = skb;
+		}
+
+		skb = segs;
+		segs = segs->next;
+
 		if (!(skb = skb_unshare(skb, GFP_ATOMIC)) ||
 		    (skb->ip_summed == CHECKSUM_PARTIAL &&
-		     skb_checksum_help(skb)))
-			return qdisc_drop(skb, sch);
+		     skb_checksum_help(skb))) {
+			rc = qdisc_drop(skb, sch);
+			goto finish_segs;
+		}
 
 		skb->data[prandom_u32() % skb_headlen(skb)] ^=
 			1<<(prandom_u32() % 8);
@@ -516,6 +552,27 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		sch->qstats.requeues++;
 	}
 
+finish_segs:
+	if (segs) {
+		while (segs) {
+			skb2 = segs->next;
+			segs->next = NULL;
+			qdisc_skb_cb(segs)->pkt_len = segs->len;
+			last_len = segs->len;
+			rc = qdisc_enqueue(segs, sch);
+			if (rc != NET_XMIT_SUCCESS) {
+				if (net_xmit_drop_count(rc))
+					qdisc_qstats_drop(sch);
+			} else {
+				nb++;
+				len += last_len;
+			}
+			segs = skb2;
+		}
+		sch->q.qlen += nb;
+		if (nb > 1)
+			qdisc_tree_reduce_backlog(sch, 1 - nb, prev_len - len);
+	}
 	return NET_XMIT_SUCCESS;
 }
 
-- 
2.5.5


* Re: [PATCHv4] netem: Segment GSO packets on enqueue
  2016-05-02 16:20 ` [PATCHv4] " Neil Horman
@ 2016-05-02 16:56   ` Eric Dumazet
  2016-05-03  4:34   ` David Miller
  1 sibling, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2016-05-02 16:56 UTC (permalink / raw)
  To: Neil Horman; +Cc: netdev, Jamal Hadi Salim, David S. Miller, netem, stephen

On Mon, 2016-05-02 at 12:20 -0400, Neil Horman wrote:
> This was recently reported to me, and reproduced on the latest net kernel,
> when attempting to run netperf from a host that had a netem qdisc attached
> to the egress interface:

Acked-by: Eric Dumazet <edumazet@google.com>

Thanks Neil.


* Re: [PATCHv4] netem: Segment GSO packets on enqueue
  2016-05-02 16:20 ` [PATCHv4] " Neil Horman
  2016-05-02 16:56   ` Eric Dumazet
@ 2016-05-03  4:34   ` David Miller
  1 sibling, 0 replies; 15+ messages in thread
From: David Miller @ 2016-05-03  4:34 UTC (permalink / raw)
  To: nhorman; +Cc: netdev, jhs, netem, eric.dumazet, stephen

From: Neil Horman <nhorman@tuxdriver.com>
Date: Mon,  2 May 2016 12:20:15 -0400

> This was recently reported to me, and reproduced on the latest net kernel,
> when attempting to run netperf from a host that had a netem qdisc attached
> to the egress interface:
 ...
> The problem occurs because netem is not prepared to handle GSO packets (as it
> uses skb_checksum_help in its enqueue path, which cannot manipulate these
> frames).
> 
> The solution I think is to simply segment the skb in a similar fashion to the
> way we do in __dev_queue_xmit (via validate_xmit_skb), with some minor changes.
> When we decide to corrupt an skb, if the frame is GSO, we segment it, corrupt
> the first segment, and enqueue the remaining ones.
> 
> tested successfully by myself on the latest net kernel, to which this applies
> 
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

Applied and queued up for -stable, thanks Neil.

