All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kamal Mostafa <kamal@canonical.com>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org,
	kernel-team@lists.ubuntu.com
Cc: Neil Horman <nhorman@tuxdriver.com>,
	Jamal Hadi Salim <jhs@mojatatu.com>,
	"David S. Miller" <davem@davemloft.net>,
	netem@lists.linux-foundation.org, eric.dumazet@gmail.com,
	stephen@networkplumber.org, Kamal Mostafa <kamal@canonical.com>
Subject: [PATCH 4.2.y-ckt 45/53] netem: Segment GSO packets on enqueue
Date: Tue, 24 May 2016 10:55:15 -0700	[thread overview]
Message-ID: <1464112523-3701-46-git-send-email-kamal@canonical.com> (raw)
In-Reply-To: <1464112523-3701-1-git-send-email-kamal@canonical.com>

4.2.8-ckt11 -stable review patch.  If anyone has any objections, please let me know.

---8<------------------------------------------------------------

From: Neil Horman <nhorman@tuxdriver.com>

[ Upstream commit 6071bd1aa13ed9e41824bafad845b7b7f4df5cfd ]

This was recently reported to me, and reproduced on the latest net kernel,
when attempting to run netperf from a host that had a netem qdisc attached
to the egress interface:

[  788.073771] ---------------------[ cut here ]---------------------------
[  788.096716] WARNING: at net/core/dev.c:2253 skb_warn_bad_offload+0xcd/0xda()
[  788.129521] bnx2: caps=(0x00000001801949b3, 0x0000000000000000) len=2962
data_len=0 gso_size=1448 gso_type=1 ip_summed=3
[  788.182150] Modules linked in: sch_netem kvm_amd kvm crc32_pclmul ipmi_ssif
ghash_clmulni_intel sp5100_tco amd64_edac_mod aesni_intel lrw gf128mul
glue_helper ablk_helper edac_mce_amd cryptd pcspkr sg edac_core hpilo ipmi_si
i2c_piix4 k10temp fam15h_power hpwdt ipmi_msghandler shpchp acpi_power_meter
pcc_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c
sd_mod crc_t10dif crct10dif_generic mgag200 syscopyarea sysfillrect sysimgblt
i2c_algo_bit drm_kms_helper ahci ata_generic pata_acpi ttm libahci
crct10dif_pclmul pata_atiixp tg3 libata crct10dif_common drm crc32c_intel ptp
serio_raw bnx2 r8169 hpsa pps_core i2c_core mii dm_mirror dm_region_hash dm_log
dm_mod
[  788.465294] CPU: 16 PID: 0 Comm: swapper/16 Tainted: G        W
------------   3.10.0-327.el7.x86_64 #1
[  788.511521] Hardware name: HP ProLiant DL385p Gen8, BIOS A28 12/17/2012
[  788.542260]  ffff880437c036b8 f7afc56532a53db9 ffff880437c03670
ffffffff816351f1
[  788.576332]  ffff880437c036a8 ffffffff8107b200 ffff880633e74200
ffff880231674000
[  788.611943]  0000000000000001 0000000000000003 0000000000000000
ffff880437c03710
[  788.647241] Call Trace:
[  788.658817]  <IRQ>  [<ffffffff816351f1>] dump_stack+0x19/0x1b
[  788.686193]  [<ffffffff8107b200>] warn_slowpath_common+0x70/0xb0
[  788.713803]  [<ffffffff8107b29c>] warn_slowpath_fmt+0x5c/0x80
[  788.741314]  [<ffffffff812f92f3>] ? ___ratelimit+0x93/0x100
[  788.767018]  [<ffffffff81637f49>] skb_warn_bad_offload+0xcd/0xda
[  788.796117]  [<ffffffff8152950c>] skb_checksum_help+0x17c/0x190
[  788.823392]  [<ffffffffa01463a1>] netem_enqueue+0x741/0x7c0 [sch_netem]
[  788.854487]  [<ffffffff8152cb58>] dev_queue_xmit+0x2a8/0x570
[  788.880870]  [<ffffffff8156ae1d>] ip_finish_output+0x53d/0x7d0
...

The problem occurs because netem is not prepared to handle GSO packets (as it
uses skb_checksum_help in its enqueue path, which cannot manipulate these
frames).

The solution I think is to simply segment the skb in a simmilar fashion to the
way we do in __dev_queue_xmit (via validate_xmit_skb), with some minor changes.
When we decide to corrupt an skb, if the frame is GSO, we segment it, corrupt
the first segment, and enqueue the remaining ones.

tested successfully by myself on the latest net kernel, to which this applies

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Jamal Hadi Salim <jhs@mojatatu.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: netem@lists.linux-foundation.org
CC: eric.dumazet@gmail.com
CC: stephen@networkplumber.org
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
---
 net/sched/sch_netem.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 59 insertions(+), 2 deletions(-)

diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 9640bb3..4befe97 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -395,6 +395,25 @@ static void tfifo_enqueue(struct sk_buff *nskb, struct Qdisc *sch)
 	sch->q.qlen++;
 }
 
+/* netem can't properly corrupt a megapacket (like we get from GSO), so instead
+ * when we statistically choose to corrupt one, we instead segment it, returning
+ * the first packet to be corrupted, and re-enqueue the remaining frames
+ */
+static struct sk_buff *netem_segment(struct sk_buff *skb, struct Qdisc *sch)
+{
+	struct sk_buff *segs;
+	netdev_features_t features = netif_skb_features(skb);
+
+	segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
+
+	if (IS_ERR_OR_NULL(segs)) {
+		qdisc_reshape_fail(skb, sch);
+		return NULL;
+	}
+	consume_skb(skb);
+	return segs;
+}
+
 /*
  * Insert one skb into qdisc.
  * Note: parent depends on return value to account for queue length.
@@ -407,7 +426,11 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	/* We don't fill cb now as skb_unshare() may invalidate it */
 	struct netem_skb_cb *cb;
 	struct sk_buff *skb2;
+	struct sk_buff *segs = NULL;
+	unsigned int len = 0, last_len, prev_len = qdisc_pkt_len(skb);
+	int nb = 0;
 	int count = 1;
+	int rc = NET_XMIT_SUCCESS;
 
 	/* Random duplication */
 	if (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor))
@@ -453,10 +476,23 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	 * do it now in software before we mangle it.
 	 */
 	if (q->corrupt && q->corrupt >= get_crandom(&q->corrupt_cor)) {
+		if (skb_is_gso(skb)) {
+			segs = netem_segment(skb, sch);
+			if (!segs)
+				return NET_XMIT_DROP;
+		} else {
+			segs = skb;
+		}
+
+		skb = segs;
+		segs = segs->next;
+
 		if (!(skb = skb_unshare(skb, GFP_ATOMIC)) ||
 		    (skb->ip_summed == CHECKSUM_PARTIAL &&
-		     skb_checksum_help(skb)))
-			return qdisc_drop(skb, sch);
+		     skb_checksum_help(skb))) {
+			rc = qdisc_drop(skb, sch);
+			goto finish_segs;
+		}
 
 		skb->data[prandom_u32() % skb_headlen(skb)] ^=
 			1<<(prandom_u32() % 8);
@@ -516,6 +552,27 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		sch->qstats.requeues++;
 	}
 
+finish_segs:
+	if (segs) {
+		while (segs) {
+			skb2 = segs->next;
+			segs->next = NULL;
+			qdisc_skb_cb(segs)->pkt_len = segs->len;
+			last_len = segs->len;
+			rc = qdisc_enqueue(segs, sch);
+			if (rc != NET_XMIT_SUCCESS) {
+				if (net_xmit_drop_count(rc))
+					qdisc_qstats_drop(sch);
+			} else {
+				nb++;
+				len += last_len;
+			}
+			segs = skb2;
+		}
+		sch->q.qlen += nb;
+		if (nb > 1)
+			qdisc_tree_reduce_backlog(sch, 1 - nb, prev_len - len);
+	}
 	return NET_XMIT_SUCCESS;
 }
 
-- 
2.7.4

  parent reply	other threads:[~2016-05-24 17:56 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-05-24 17:54 [4.2.y-ckt stable] Linux 4.2.8-ckt11 stable review Kamal Mostafa
2016-05-24 17:54 ` [PATCH 4.2.y-ckt 01/53] [4.2-stable only] fix backport "IB/security: restrict use of the write() interface" Kamal Mostafa
2016-05-24 17:54 ` [PATCH 4.2.y-ckt 02/53] Revert "usb: hub: do not clear BOS field during reset device" Kamal Mostafa
2016-05-24 17:54 ` [PATCH 4.2.y-ckt 03/53] regulator: s2mps11: Fix invalid selector mask and voltages for buck9 Kamal Mostafa
2016-05-24 17:54 ` [PATCH 4.2.y-ckt 04/53] regmap: spmi: Fix regmap_spmi_ext_read in multi-byte case Kamal Mostafa
2016-05-24 17:54 ` [PATCH 4.2.y-ckt 05/53] ALSA: usb-audio: Quirk for yet another Phoenix Audio devices (v2) Kamal Mostafa
2016-05-24 17:54 ` [PATCH 4.2.y-ckt 06/53] atomic_open(): fix the handling of create_error Kamal Mostafa
2016-05-24 17:54 ` [PATCH 4.2.y-ckt 07/53] drm/i915/bdw: Add missing delay during L3 SQC credit programming Kamal Mostafa
2016-05-24 17:54 ` [PATCH 4.2.y-ckt 08/53] crypto: hash - Fix page length clamping in hash walk Kamal Mostafa
2016-05-24 17:54 ` [PATCH 4.2.y-ckt 09/53] drm/radeon: fix DP link training issue with second 4K monitor Kamal Mostafa
2016-05-24 17:54 ` [PATCH 4.2.y-ckt 10/53] drm/radeon: fix PLL sharing on DCE6.1 (v2) Kamal Mostafa
2016-05-24 17:54 ` [PATCH 4.2.y-ckt 11/53] get_rock_ridge_filename(): handle malformed NM entries Kamal Mostafa
2016-05-24 17:54 ` [PATCH 4.2.y-ckt 12/53] ALSA: hda - Fix white noise on Asus UX501VW headset Kamal Mostafa
2016-05-24 17:54 ` [PATCH 4.2.y-ckt 13/53] Input: max8997-haptic - fix NULL pointer dereference Kamal Mostafa
2016-05-24 17:54 ` [PATCH 4.2.y-ckt 14/53] drm/i915: Bail out of pipe config compute loop on LPT Kamal Mostafa
2016-05-24 17:54 ` [PATCH 4.2.y-ckt 15/53] ALSA: hda - Fix broken reconfig Kamal Mostafa
2016-05-24 17:54 ` [PATCH 4.2.y-ckt 16/53] ALSA: hda - Fix subwoofer pin on ASUS N751 and N551 Kamal Mostafa
2016-05-24 17:54 ` [PATCH 4.2.y-ckt 17/53] vfs: add vfs_select_inode() helper Kamal Mostafa
2016-05-24 17:54 ` [PATCH 4.2.y-ckt 18/53] vfs: rename: check backing inode being equal Kamal Mostafa
2016-05-24 17:54 ` [PATCH 4.2.y-ckt 19/53] ALSA: usb-audio: Yet another Phoneix Audio device quirk Kamal Mostafa
2016-05-24 17:54 ` [PATCH 4.2.y-ckt 20/53] perf/x86: Fix undefined shift on 32-bit kernels Kamal Mostafa
2016-05-24 17:54 ` [PATCH 4.2.y-ckt 21/53] perf/x86/intel/pt: Generate PMI in the STOP region as well Kamal Mostafa
2016-05-24 17:54 ` [PATCH 4.2.y-ckt 22/53] perf/core: Disable the event on a truncated AUX record Kamal Mostafa
2016-05-24 17:54 ` [PATCH 4.2.y-ckt 23/53] tools lib traceevent: Do not reassign parg after collapse_tree() Kamal Mostafa
2016-05-24 17:54 ` [PATCH 4.2.y-ckt 24/53] workqueue: fix rebind bound workers warning Kamal Mostafa
2016-05-24 17:54 ` [PATCH 4.2.y-ckt 25/53] ocfs2: fix posix_acl_create deadlock Kamal Mostafa
2016-05-24 17:54 ` [PATCH 4.2.y-ckt 26/53] nf_conntrack: avoid kernel pointer value leak in slab name Kamal Mostafa
2016-05-24 17:54 ` [PATCH 4.2.y-ckt 27/53] macvtap: segmented packet is consumed Kamal Mostafa
2016-05-24 17:54 ` [PATCH 4.2.y-ckt 28/53] regulator: axp20x: Fix axp22x ldo_io voltage ranges Kamal Mostafa
2016-05-24 17:54 ` [PATCH 4.2.y-ckt 29/53] arm64: bpf: jit JMP_JSET_{X,K} Kamal Mostafa
2016-05-24 17:55 ` [PATCH 4.2.y-ckt 30/53] bridge: fix igmp / mld query parsing Kamal Mostafa
2016-05-24 17:55 ` [PATCH 4.2.y-ckt 31/53] net/mlx4_en: Fix endianness bug in IPV6 csum calculation Kamal Mostafa
2016-05-24 17:55 ` [PATCH 4.2.y-ckt 32/53] net: fec: only clear a queue's work bit if the queue was emptied Kamal Mostafa
2016-05-24 17:55 ` [PATCH 4.2.y-ckt 33/53] tcp: refresh skb timestamp at retransmit time Kamal Mostafa
2016-05-24 17:55 ` [PATCH 4.2.y-ckt 34/53] net/route: enforce hoplimit max value Kamal Mostafa
2016-05-24 17:55 ` [PATCH 4.2.y-ckt 35/53] decnet: Do not build routes to devices without decnet private data Kamal Mostafa
2016-05-24 17:55 ` [PATCH 4.2.y-ckt 36/53] route: do not cache fib route info on local routes with oif Kamal Mostafa
2016-05-24 17:55 ` [PATCH 4.2.y-ckt 37/53] net: use skb_postpush_rcsum instead of own implementations Kamal Mostafa
2016-05-24 17:55 ` [PATCH 4.2.y-ckt 38/53] vlan: pull on __vlan_insert_tag error path and fix csum correction Kamal Mostafa
2016-05-24 17:55 ` [PATCH 4.2.y-ckt 39/53] ipv4/fib: don't warn when primary address is missing if in_dev is dead Kamal Mostafa
2016-05-24 17:55 ` [PATCH 4.2.y-ckt 40/53] bpf: fix double-fdput in replace_map_fd_with_map_ptr() Kamal Mostafa
2016-05-24 17:55 ` [PATCH 4.2.y-ckt 41/53] net_sched: introduce qdisc_replace() helper Kamal Mostafa
2016-05-24 17:55 ` [PATCH 4.2.y-ckt 42/53] net_sched: update hierarchical backlog too Kamal Mostafa
2016-05-24 17:55 ` [PATCH 4.2.y-ckt 43/53] sch_htb: update backlog as well Kamal Mostafa
2016-05-24 17:55 ` [PATCH 4.2.y-ckt 44/53] sch_dsmark: " Kamal Mostafa
2016-05-24 17:55 ` Kamal Mostafa [this message]
2016-05-24 17:55 ` [PATCH 4.2.y-ckt 46/53] net: fix infoleak in llc Kamal Mostafa
2016-05-24 17:55 ` [PATCH 4.2.y-ckt 47/53] net: fix infoleak in rtnetlink Kamal Mostafa
2016-05-24 17:55 ` [PATCH 4.2.y-ckt 48/53] VSOCK: do not disconnect socket when peer has shutdown SEND only Kamal Mostafa
2016-05-24 17:55 ` [PATCH 4.2.y-ckt 49/53] net: bridge: fix old ioctl unlocked net device walk Kamal Mostafa
2016-05-24 17:55 ` [PATCH 4.2.y-ckt 50/53] net: fix a kernel infoleak in x25 module Kamal Mostafa
2016-05-24 17:55 ` [PATCH 4.2.y-ckt 51/53] cdc_mbim: apply "NDP to end" quirk to all Huawei devices Kamal Mostafa
2016-05-24 17:55 ` [PATCH 4.2.y-ckt 52/53] soreuseport: fix ordering for mixed v4/v6 sockets Kamal Mostafa
2016-05-24 17:55 ` [PATCH 4.2.y-ckt 53/53] uapi glibc compat: fix compile errors when glibc net/if.h included before linux/if.h Kamal Mostafa
2016-05-25  7:22   ` Mikko Rapeli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1464112523-3701-46-git-send-email-kamal@canonical.com \
    --to=kamal@canonical.com \
    --cc=davem@davemloft.net \
    --cc=eric.dumazet@gmail.com \
    --cc=jhs@mojatatu.com \
    --cc=kernel-team@lists.ubuntu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netem@lists.linux-foundation.org \
    --cc=nhorman@tuxdriver.com \
    --cc=stable@vger.kernel.org \
    --cc=stephen@networkplumber.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.