From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Shahar Klein <shahark@mellanox.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Cong Wang <xiyou.wangcong@gmail.com>,
	Eric Dumazet <edumazet@google.com>,
	"David S. Miller" <davem@davemloft.net>
Subject: [PATCH 4.9 06/59] net, sched: fix soft lockup in tc_classify
Date: Fri, 13 Jan 2017 13:01:13 +0100
Message-ID: <20170113113839.604917522@linuxfoundation.org>
In-Reply-To: <20170113113839.364876751@linuxfoundation.org>

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <daniel@iogearbox.net>


[ Upstream commit 628185cfddf1dfb701c4efe2cfd72cf5b09f5702 ]

Shahar reported a soft lockup in tc_classify(), where we run into an
endless loop when walking the classifier chain due to tp->next == tp
which is a state we should never run into. The issue only seems to
trigger under load in the tc control path.

What happens is that in tc_ctl_tfilter(), thread A allocates a new
tp, initializes it, sets tp_created to 1, and calls into tp->ops->change()
with it. In that classifier callback we had to unlock/lock the rtnl
mutex and returned with -EAGAIN. One reason why we need to drop the
mutex there is, for example, that we need to request an action module
to be loaded.

This happens via tcf_exts_validate() -> tcf_action_init/_1(), meaning
that after we have loaded and found the requested action, we need to
redo the whole request so we don't race against others. While rtnl was
unlocked, thread B's request was processed next on that CPU. Thread B
added a new tp instance successfully to the classifier chain. When
thread A returned, it grabbed the rtnl mutex again, propagated -EAGAIN
and destroyed its tp instance, which never got linked; we then goto
replay and redo A's request.

This time, when walking the classifier chain in tc_ctl_tfilter() to
check for existing tp instances, we had a priority match and found
the tp instance that was created and linked by thread B. Calling
tp->ops->change() again with that tp was now successful and returned
without error.

tp_created was never cleared in the second round, thus the kernel thinks
that we need to link it into the classifier chain (once again). tp and
*back point to the same object due to the match we had earlier on. Thus
for thread B's already public tp, we reset tp->next to tp itself and
link it into the chain, which eventually causes the mentioned endless
loop in tc_classify() once a packet hits the data path.
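
To make the self-link concrete, here is a minimal user-space sketch of
the linking step described above. It is not the kernel code: struct
tp_model, find_slot(), link_tp() and has_self_loop() are hypothetical
names invented for illustration, and all RCU handling is left out. It
only assumes that linking means "point tp->next at *back, then store tp
in *back", as the paragraph above describes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct tcf_proto; only what the sketch needs. */
struct tp_model {
	unsigned int prio;
	struct tp_model *next;
};

/* Return the slot holding the first entry with prio >= the requested one,
 * or the terminating NULL slot if there is no such entry. */
static struct tp_model **find_slot(struct tp_model **chain, unsigned int prio)
{
	struct tp_model **back;

	for (back = chain; *back != NULL; back = &(*back)->next)
		if ((*back)->prio >= prio)
			break;
	return back;
}

/* Linking step without RCU: if tp is already the object stored in *back,
 * this leaves tp->next pointing at tp itself. */
static void link_tp(struct tp_model **back, struct tp_model *tp)
{
	assert(back != NULL && tp != NULL);
	tp->next = *back;
	*back = tp;
}

/* Bounded walk so the check terminates even on a corrupted chain.
 * Returns 1 if a node within the first `limit` hops points to itself. */
static int has_self_loop(const struct tp_model *head, int limit)
{
	for (; head != NULL && limit-- > 0; head = head->next)
		if (head->next == head)
			return 1;
	return 0;
}
```

In this model, linking a fresh entry into an empty chain is harmless,
while calling link_tp(back, *back) on the slot returned by find_slot()
for an already linked entry produces next == self, which is the state an
unbounded walk over the chain would never leave.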

The fix is to clear tp_created at the beginning of each request, also
when we replay it. On the paths that can cause -EAGAIN we already destroy
the original tp instance we had, and on replay we really need to start
from scratch. It seems that this issue was first introduced in commit
12186be7d2e1 ("net_cls: fix unconfigured struct tcf_proto keeps chaining
and avoid kernel panic when we use cls_cgroup").
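
The effect of resetting the flag on replay can be sketched with a small
single-threaded user-space model. Everything here is an assumption made
for illustration: ctl_request_model(), chain_len(), struct tp_model and
the clear_on_replay switch are hypothetical, the first attempt is simply
forced to fail with -EAGAIN, and thread B's concurrent insert is
simulated inline at the point where rtnl would have been dropped. The
model also leaks its allocations for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct tcf_proto. */
struct tp_model {
	unsigned int prio;
	struct tp_model *next;
};

/* Count entries, giving up after `limit` hops; a result equal to `limit`
 * on a short chain means the walk is cycling. */
static int chain_len(const struct tp_model *head, int limit)
{
	int n = 0;

	while (head != NULL && n < limit) {
		n++;
		head = head->next;
	}
	return n;
}

/* Model of one control request with a replay loop.
 * clear_on_replay != 0 models the fixed behaviour,
 * clear_on_replay == 0 models the stale flag. */
static int ctl_request_model(struct tp_model **chain, unsigned int prio,
			     int clear_on_replay)
{
	struct tp_model **back, *tp, *b;
	int tp_created = 0;
	int attempt = 0;
	int err;

replay:
	if (clear_on_replay)
		tp_created = 0;	/* start from scratch on every attempt */
	attempt++;

	/* Look for an existing entry with a matching priority. */
	for (back = chain; (tp = *back) != NULL; back = &tp->next)
		if (tp->prio >= prio)
			break;

	if (tp == NULL || tp->prio != prio) {
		tp = calloc(1, sizeof(*tp));
		if (tp == NULL)
			return -ENOMEM;
		tp->prio = prio;
		tp_created = 1;
	}

	/* Stand-in for the classifier callback: fail once with -EAGAIN. */
	err = (attempt == 1) ? -EAGAIN : 0;

	if (err == 0) {
		if (tp_created) {
			tp->next = *back;
			*back = tp;
		}
	} else if (tp_created) {
		free(tp);	/* never linked, so it is safe to drop */
	}

	if (err == -EAGAIN) {
		/* Simulated thread B: links its own entry before the replay. */
		b = calloc(1, sizeof(*b));
		if (b == NULL)
			return -ENOMEM;
		b->prio = prio;
		b->next = *back;
		*back = b;
		goto replay;
	}
	return err;
}
```

Starting from an empty chain, ctl_request_model(&chain, 10, 1) leaves a
single entry whose next pointer is NULL, whereas passing 0 as the last
argument leaves that entry pointing at itself, so chain_len() runs into
its hop limit.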

Fixes: 12186be7d2e1 ("net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup")
Reported-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Tested-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/sched/cls_api.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -148,13 +148,15 @@ static int tc_ctl_tfilter(struct sk_buff
 	unsigned long cl;
 	unsigned long fh;
 	int err;
-	int tp_created = 0;
+	int tp_created;
 
 	if ((n->nlmsg_type != RTM_GETTFILTER) &&
 	    !netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 replay:
+	tp_created = 0;
+
 	err = nlmsg_parse(n, sizeof(*t), tca, TCA_MAX, NULL);
 	if (err < 0)
 		return err;
