All of lore.kernel.org
 help / color / mirror / Atom feed
From: Daniel Borkmann <daniel@iogearbox.net>
To: Shahar Klein <shahark@mellanox.com>,
	Cong Wang <xiyou.wangcong@gmail.com>
Cc: Or Gerlitz <gerlitz.or@gmail.com>,
	Linux Netdev List <netdev@vger.kernel.org>,
	Roi Dayan <roid@mellanox.com>, David Miller <davem@davemloft.net>,
	Jiri Pirko <jiri@mellanox.com>,
	John Fastabend <john.fastabend@gmail.com>,
	Hadar Hen Zion <hadarh@mellanox.com>
Subject: Re: Soft lockup in tc_classify
Date: Wed, 21 Dec 2016 14:18:21 +0100	[thread overview]
Message-ID: <585A811D.1080202@iogearbox.net> (raw)
In-Reply-To: <cb55ede9-3e6e-8e68-3005-8bdb2105122e@mellanox.com>

On 12/21/2016 01:58 PM, Shahar Klein wrote:
> On 12/21/2016 12:15 PM, Daniel Borkmann wrote:
>> On 12/21/2016 08:03 AM, Cong Wang wrote:
>>> On Tue, Dec 20, 2016 at 10:44 PM, Shahar Klein <shahark@mellanox.com>
>>> wrote:
>> [...]
>>> Looks like you added a debug printk inside tcf_destroy() too,
>>> which seems racy with filter creation, it should not happen since
>>> in both cases we take RTNL lock.
>>>
>>> Don't know if changing all RCU_INIT_POINTER in that file to
>>> rcu_assign_pointer could help anything or not. Mind to try?
>>
>> I don't think at this point that it's RCU related at all.
>>
>> I have a theory on what is happening. Quoting the piece in question from
>> Shahar's log:
>>
>>  1: thread-2845[cpu-1] setting tp_created to 1 tp=ffff94b5b0280780
>> back=ffff94b9ea932060
>>  2: thread-2856[cpu-1] setting tp_created to 1 tp=ffff94b9ea9322a0
>> back=ffff94b9ea932060
>>  3: thread-2843[cpu-1] setting tp_created to 1 tp=ffff94b5b402c960
>> back=ffff94b9ea932060
>>  4: destroy ffff94b5b669fea0 tcf_destroy:1905
>>  5: thread-2853[cpu-1] setting tp_created to 1 tp=ffff94b5b02805a0
>> back=ffff94b9ea932060
>>  6: thread-2853[cpu-1] add/change filter by: fl_get [cls_flower]
>> tp=ffff94b5b02805a0 tp->next=ffff94b9ea932060
>>  7: destroy ffff94b5b0280780 tcf_destroy:1905
>>  8: thread-2845[cpu-1] add/change filter by: fl_get [cls_flower]
>> tp=ffff94b5b02805a0 tp->next=ffff94b5b02805a0
>>
>> The interesting thing is that all this happens on CPU1, so as you say
>> we're under rtnl.
>> In 1), thread-2845 creates tp=ffff94b5b0280780, which is destroyed in
>> 7), presumably also
>> by thread-2845, and the weird part is why suddenly in 8) thread-2845
>> adds a created filter
>> without actually creating it. Plus, thread-2845 got interrupted, which
>> means it must have
>> dropped rntl in the middle. We drop it in tc_ctl_tfilter() when we do
>> tcf_proto_lookup_ops()
>> and need to pull in a module, but here this doesn't make sense at all
>> since i) at this
>> point we haven't created the tp yet and 2) flower was already there.
>> Thus the only explanation
>> where this must have happened is where we called tp->ops->change(). So
>> here the return
>> code must have been -EAGAIN, which makes sense because in 7) we
>> destroyed that specific
>> tp instance. Which means we goto replay but *do not* clear tp_created. I
>> think that is
>> the bug in question. So, while we dropped rtnl in the meantime, some
>> other tp instance
>> was added (tp=ffff94b5b02805a0) that we had a match on in round 2, but
>> we still think it
>> was newly created which wasn't the actual case. So we'd need to deal
>> with the fact that
>> ->change() callback could return -EAGAIN as well. Now looking at flower,
>> I think the call
>> chain must have been fl_change() -> fl_set_parms() ->
>> tcf_exts_validate() -> tcf_action_init()
>> -> tcf_action_init_1(). And here one possibility I see is that
>> tc_lookup_action_n()
>> failed, therefore we shortly dropped rtnl for the request_module() where
>> the module
>> got loaded successfully and thus error code from there is -EAGAIN that
>> got propagated
>> all the way through ->change() from tc_ctl_tfilter(). So it looks like a
>> generic issue
>> not specifically tied to flower.
>>
>> Shahar, can you test the following? Thanks!
>>
>>  net/sched/cls_api.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
>> index 3fbba79..1ecdf80 100644
>> --- a/net/sched/cls_api.c
>> +++ b/net/sched/cls_api.c
>> @@ -148,13 +148,15 @@ static int tc_ctl_tfilter(struct sk_buff *skb,
>> struct nlmsghdr *n)
>>      unsigned long cl;
>>      unsigned long fh;
>>      int err;
>> -    int tp_created = 0;
>> +    int tp_created;
>>
>>      if ((n->nlmsg_type != RTM_GETTFILTER) &&
>>          !netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
>>          return -EPERM;
>>
>>  replay:
>> +    tp_created = 0;
>> +
>>      err = nlmsg_parse(n, sizeof(*t), tca, TCA_MAX, NULL);
>>      if (err < 0)
>>          return err;
>
> Test run successfully!
> I'll remove all other debug "fixes" and run again later
>
> Thanks Daniel!

Great, thanks for confirming so far Shahar!
I'll cook an official patch in the meantime.

  reply	other threads:[~2016-12-21 13:18 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <c1c394d3-3aea-52a8-89e3-be57d4d46b8e@mellanox.com>
2016-12-12  9:43 ` Soft lockup in tc_classify Shahar Klein
2016-12-12 13:28   ` Daniel Borkmann
2016-12-12 16:04     ` Shahar Klein
2016-12-12 19:07       ` Cong Wang
2016-12-13 11:59         ` Shahar Klein
2016-12-12 21:18     ` Or Gerlitz
2016-12-12 22:51       ` Cong Wang
2016-12-19 16:39         ` Shahar Klein
2016-12-19 17:58           ` Cong Wang
2016-12-20  6:22             ` Shahar Klein
2016-12-20 11:47               ` Daniel Borkmann
2016-12-21  6:44                 ` Shahar Klein
2016-12-21  7:03                   ` Cong Wang
2016-12-21 10:15                     ` Daniel Borkmann
2016-12-21 12:58                       ` Shahar Klein
2016-12-21 13:18                         ` Daniel Borkmann [this message]
2016-12-21 11:25                     ` Shahar Klein

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=585A811D.1080202@iogearbox.net \
    --to=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=gerlitz.or@gmail.com \
    --cc=hadarh@mellanox.com \
    --cc=jiri@mellanox.com \
    --cc=john.fastabend@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=roid@mellanox.com \
    --cc=shahark@mellanox.com \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.