linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Shahar Klein <shahark@mellanox.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Cong Wang <xiyou.wangcong@gmail.com>,
	Eric Dumazet <edumazet@google.com>,
	"David S. Miller" <davem@davemloft.net>,
	Amit Pundir <amit.pundir@linaro.org>
Subject: [PATCH 3.18 30/50] net, sched: fix soft lockup in tc_classify
Date: Fri,  4 Aug 2017 16:16:16 -0700	[thread overview]
Message-ID: <20170804231552.816592509@linuxfoundation.org> (raw)
In-Reply-To: <20170804231550.830518786@linuxfoundation.org>

3.18-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <daniel@iogearbox.net>

commit 628185cfddf1dfb701c4efe2cfd72cf5b09f5702 upstream.

Shahar reported a soft lockup in tc_classify(), where we run into an
endless loop when walking the classifier chain due to tp->next == tp
which is a state we should never run into. The issue only seems to
trigger under load in the tc control path.

What happens is that in tc_ctl_tfilter(), thread A allocates a new
tp, initializes it, sets tp_created to 1, and calls into tp->ops->change()
with it. In that classifier callback we had to unlock/lock the rtnl
mutex and returned with -EAGAIN. One reason why we need to drop there
is, for example, that we need to request an action module to be loaded.

This happens via tcf_exts_validate() -> tcf_action_init/_1() meaning
after we loaded and found the requested action, we need to redo the
whole request so we don't race against others. While we had to unlock
rtnl in that time, thread B's request was processed next on that CPU.
Thread B added a new tp instance successfully to the classifier chain.
When thread A returned grabbing the rtnl mutex again, propagating -EAGAIN
and destroying its tp instance which never got linked, we goto replay
and redo A's request.

This time when walking the classifier chain in tc_ctl_tfilter() for
checking for existing tp instances we had a priority match and found
the tp instance that was created and linked by thread B. Now calling
again into tp->ops->change() with that tp was successful and returned
without error.

tp_created was never cleared in the second round, thus kernel thinks
that we need to link it into the classifier chain (once again). tp and
*back point to the same object due to the match we had earlier on. Thus
for thread B's already public tp, we reset tp->next to tp itself and
link it into the chain, which eventually causes the mentioned endless
loop in tc_classify() once a packet hits the data path.

Fix is to clear tp_created at the beginning of each request, also when
we replay it. On the paths that can cause -EAGAIN we already destroy
the original tp instance we had and on replay we really need to start
from scratch. It seems that this issue was first introduced in commit
12186be7d2e1 ("net_cls: fix unconfigured struct tcf_proto keeps chaining
and avoid kernel panic when we use cls_cgroup").

Fixes: 12186be7d2e1 ("net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup")
Reported-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Tested-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 net/sched/cls_api.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -137,13 +137,15 @@ static int tc_ctl_tfilter(struct sk_buff
 	unsigned long cl;
 	unsigned long fh;
 	int err;
-	int tp_created = 0;
+	int tp_created;
 
 	if ((n->nlmsg_type != RTM_GETTFILTER) &&
 	    !netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 replay:
+	tp_created = 0;
+
 	err = nlmsg_parse(n, sizeof(*t), tca, TCA_MAX, NULL);
 	if (err < 0)
 		return err;

  parent reply	other threads:[~2017-08-04 23:35 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-04 23:15 [PATCH 3.18 00/50] 3.18.64-stable review Greg Kroah-Hartman
2017-08-04 23:15 ` [PATCH 3.18 01/50] af_key: Add lock to key dump Greg Kroah-Hartman
2017-08-04 23:15 ` [PATCH 3.18 02/50] pstore: Make spinlock per zone instead of global Greg Kroah-Hartman
2017-08-04 23:15 ` [PATCH 3.18 03/50] net: reduce skb_warn_bad_offload() noise Greg Kroah-Hartman
2017-08-04 23:15 ` [PATCH 3.18 04/50] powerpc/pseries: Fix of_node_put() underflow during reconfig remove Greg Kroah-Hartman
2017-08-04 23:15 ` [PATCH 3.18 05/50] md/raid5: add thread_group worker async_tx_issue_pending_all Greg Kroah-Hartman
2017-08-04 23:15 ` [PATCH 3.18 06/50] drm/vmwgfx: Fix gcc-7.1.1 warning Greg Kroah-Hartman
2017-08-04 23:15 ` [PATCH 3.18 07/50] KVM: PPC: Book3S HV: Restore critical SPRs to host values on guest exit Greg Kroah-Hartman
2017-08-04 23:15 ` [PATCH 3.18 08/50] KVM: PPC: Book3S HV: Reload HTM registers explicitly Greg Kroah-Hartman
2017-08-04 23:15 ` [PATCH 3.18 09/50] KVM: PPC: Book3S HV: Save/restore host values of debug registers Greg Kroah-Hartman
2017-08-04 23:15 ` [PATCH 3.18 10/50] Revert "powerpc/numa: Fix percpu allocations to be NUMA aware" Greg Kroah-Hartman
2017-08-04 23:15 ` [PATCH 3.18 11/50] Staging: comedi: comedi_fops: Avoid orphaned proc entry Greg Kroah-Hartman
2017-08-04 23:15 ` [PATCH 3.18 12/50] Bluetooth: bnep: bnep_add_connection() should verify that its dealing with l2cap socket Greg Kroah-Hartman
2017-08-04 23:15 ` [PATCH 3.18 13/50] Bluetooth: Fix potential NULL dereference Greg Kroah-Hartman
2017-08-04 23:16 ` [PATCH 3.18 14/50] Bluetooth: cmtp: cmtp_add_connection() should verify that its dealing with l2cap socket Greg Kroah-Hartman
2017-08-04 23:16 ` [PATCH 3.18 15/50] net: phy: Do not perform software reset for Generic PHY Greg Kroah-Hartman
2017-08-04 23:16 ` [PATCH 3.18 16/50] isdn: Fix a sleep-in-atomic bug Greg Kroah-Hartman
2017-08-04 23:16 ` [PATCH 3.18 17/50] string: provide strscpy() Greg Kroah-Hartman
2017-08-04 23:16 ` [PATCH 3.18 18/50] strscpy: zero any trailing garbage bytes in the destination Greg Kroah-Hartman
2017-08-04 23:16 ` [PATCH 3.18 19/50] isdn/i4l: fix buffer overflow Greg Kroah-Hartman
2017-08-04 23:16 ` [PATCH 3.18 20/50] wil6210: fix deadlock when using fw_no_recovery option Greg Kroah-Hartman
2017-08-04 23:16 ` [PATCH 3.18 21/50] mailbox: always wait in mbox_send_message for blocking Tx mode Greg Kroah-Hartman
2017-08-04 23:16 ` [PATCH 3.18 22/50] mailbox: skip complete wait event if timer expired Greg Kroah-Hartman
2017-08-04 23:16 ` [PATCH 3.18 23/50] mailbox: handle empty message in tx_tick Greg Kroah-Hartman
2017-08-04 23:16 ` [PATCH 3.18 24/50] mpt3sas: Dont overreach ioc->reply_post[] during initialization Greg Kroah-Hartman
2017-08-04 23:16 ` [PATCH 3.18 25/50] kaweth: fix firmware download Greg Kroah-Hartman
2017-08-04 23:16 ` [PATCH 3.18 26/50] kaweth: fix oops upon failed memory allocation Greg Kroah-Hartman
2017-08-04 23:16 ` [PATCH 3.18 27/50] ipv6: fix possible deadlock in ip6_fl_purge / ip6_fl_gc Greg Kroah-Hartman
2017-08-04 23:16 ` [PATCH 3.18 28/50] net: sctp: fix race for one-to-many sockets in sendmsgs auto associate Greg Kroah-Hartman
2017-08-04 23:16 ` [PATCH 3.18 29/50] sh_eth: Fix ethtool operation crash when net device is down Greg Kroah-Hartman
2017-08-04 23:16 ` Greg Kroah-Hartman [this message]
2017-08-04 23:16 ` [PATCH 3.18 31/50] ipmi/watchdog: fix watchdog timeout set on reboot Greg Kroah-Hartman
2017-08-04 23:16 ` [PATCH 3.18 32/50] dentry name snapshots Greg Kroah-Hartman
2017-08-04 23:16 ` [PATCH 3.18 33/50] [media] v4l: s5c73m3: fix negation operator Greg Kroah-Hartman
2017-08-04 23:16 ` [PATCH 3.18 34/50] pstore: Allow prz to control need for locking Greg Kroah-Hartman
2017-08-04 23:16 ` [PATCH 3.18 35/50] pstore: Correctly initialize spinlock and flags Greg Kroah-Hartman
2017-08-04 23:16 ` [PATCH 3.18 36/50] pstore: Use dynamic spinlock initializer Greg Kroah-Hartman
2017-08-04 23:16 ` [PATCH 3.18 37/50] net: skb_needs_check() accepts CHECKSUM_NONE for tx Greg Kroah-Hartman
2017-08-04 23:16 ` [PATCH 3.18 38/50] tpm: fix a kernel memory leak in tpm-sysfs.c Greg Kroah-Hartman
2017-08-04 23:16 ` [PATCH 3.18 39/50] x86/mce/AMD: Make the init code more robust Greg Kroah-Hartman
2017-08-04 23:16 ` [PATCH 3.18 40/50] r8169: add support for RTL8168 series add-on card Greg Kroah-Hartman
2017-08-04 23:16 ` [PATCH 3.18 42/50] ipv6: Should use consistent conditional judgement for ip6 fragment between __ip6_append_data and ip6_finish_output Greg Kroah-Hartman
2017-08-04 23:16 ` [PATCH 3.18 43/50] net/mlx4: Remove BUG_ON from ICM allocation routine Greg Kroah-Hartman
2017-08-04 23:16 ` [PATCH 3.18 44/50] drm/msm: Ensure that the hardware write pointer is valid Greg Kroah-Hartman
2017-08-04 23:16 ` [PATCH 3.18 45/50] drm/msm: Verify that MSM_SUBMIT_BO_FLAGS are set Greg Kroah-Hartman
2017-08-04 23:16 ` [PATCH 3.18 46/50] vfio-pci: use 32-bit comparisons for register address for gcc-4.5 Greg Kroah-Hartman
2017-08-04 23:16 ` [PATCH 3.18 47/50] ASoC: tlv320aic3x: Mark the RESET register as volatile Greg Kroah-Hartman
2017-08-04 23:16 ` [PATCH 3.18 48/50] spi: dw: Make debugfs name unique between instances Greg Kroah-Hartman
2017-08-04 23:16 ` [PATCH 3.18 49/50] vlan: Propagate MAC address to VLANs Greg Kroah-Hartman
2017-08-04 23:16 ` [PATCH 3.18 50/50] xfrm: Dont use sk_family for socket policy lookups Greg Kroah-Hartman
2017-08-05  1:43 ` [PATCH 3.18 00/50] 3.18.64-stable review Guenter Roeck
2017-08-05  2:46   ` Greg Kroah-Hartman
2017-08-05  2:51     ` Greg Kroah-Hartman
2017-08-05  3:00       ` Greg Kroah-Hartman
2017-08-05  4:01         ` Guenter Roeck
2017-08-05 15:43           ` Greg Kroah-Hartman
2017-08-05  5:55       ` Willy Tarreau
2017-08-05  6:02         ` Willy Tarreau
2017-08-05 15:43           ` Greg Kroah-Hartman
2017-08-05 19:11             ` Guenter Roeck
2017-08-07 19:34               ` Greg Kroah-Hartman
2017-08-08  4:11                 ` Guenter Roeck
2017-08-05  3:57     ` Guenter Roeck
2017-08-05 15:45       ` Greg Kroah-Hartman
2017-08-05  1:52 ` Shuah Khan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170804231552.816592509@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=amit.pundir@linaro.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=shahark@mellanox.com \
    --cc=stable@vger.kernel.org \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).