On 12/19/2016 7:58 PM, Cong Wang wrote:
> Hello,
>
> On Mon, Dec 19, 2016 at 8:39 AM, Shahar Klein wrote:
>>
>>
>> On 12/13/2016 12:51 AM, Cong Wang wrote:
>>>
>>> On Mon, Dec 12, 2016 at 1:18 PM, Or Gerlitz wrote:
>>>>
>>>> On Mon, Dec 12, 2016 at 3:28 PM, Daniel Borkmann
>>>> wrote:
>>>>
>>>>> Note that there's still the RCU fix missing for the deletion race that
>>>>> Cong will still send out, but you say that the only thing you do is to
>>>>> add a single rule, but no other operation in involved during that test?
>>>>
>>>>
>>>> What's missing to have the deletion race fixed? making a patch or
>>>> testing to a patch which was sent?
>>>
>>>
>>> If you think it would help for this problem, here is my patch rebased
>>> on the latest net-next.
>>>
>>> Again, I don't see how it could help this case yet, especially I don't
>>> see how we could have a loop in this singly linked list.
>>>
>>
>> I've applied cong's patch and hit a different lockup(full log attached):
>
>
> Are you sure this is really different? For me, it is still inside the loop
> in tc_classify(), with only a slightly different offset.
>
>
>>
>> Daniel suggested I'll add a print:
>>         case RTM_DELTFILTER:
>> -               err = tp->ops->delete(tp, fh);
>> +               printk(KERN_ERR "DEBUGG:SK %s:%d\n", __func__, __LINE__);
>> +               err = tp->ops->delete(tp, fh, &last);
>>                 if (err == 0) {
>>
>> and I couldn't see this print in the output.....
>
> Hmm, that is odd, if this never prints, then my patch should not make any
> difference.
>
> There are still two other cases where we could change tp->next, so do you
> mind to add two more printk's for debugging?
>
> Attached is the delta patch.
>
> Thanks!
>

I've added a slightly different debug print:

@@ -368,11 +375,12 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n)
 		if (tp_created) {
 			RCU_INIT_POINTER(tp->next, rtnl_dereference(*back));
 			rcu_assign_pointer(*back, tp);
+			printk(KERN_ERR "DEBUGG:SK add/change filter by: %pf tp=%p tp->next=%p\n", tp->ops->get, tp, tp->next);
 		}
 		tfilter_notify(net, skb, n, tp, fh, RTM_NEWTFILTER, false);

full output attached:

[ 283.290271] Mirror/redirect action on
[ 283.305031] DEBUGG:SK add/change filter by: fl_get [cls_flower] tp=ffff9432d704df60 tp->next= (null)
[ 283.322563] DEBUGG:SK add/change filter by: fl_get [cls_flower] tp=ffff9436e718d240 tp->next= (null)
[ 283.359997] GACT probability on
[ 283.365923] DEBUGG:SK add/change filter by: fl_get [cls_flower] tp=ffff9436e718d3c0 tp->next=ffff9436e718d240
[ 283.378725] DEBUGG:SK add/change filter by: fl_get [cls_flower] tp=ffff9436e718d3c0 tp->next=ffff9436e718d3c0
[ 283.391310] DEBUGG:SK add/change filter by: fl_get [cls_flower] tp=ffff9436e718d3c0 tp->next=ffff9436e718d3c0
[ 283.403923] DEBUGG:SK add/change filter by: fl_get [cls_flower] tp=ffff9436e718d3c0 tp->next=ffff9436e718d3c0
[ 283.416542] DEBUGG:SK add/change filter by: fl_get [cls_flower] tp=ffff9436e718d3c0 tp->next=ffff9436e718d3c0
[ 308.538571] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:0]

Thanks
Shahar
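
For reference, the tp->next == tp lines in the log are what you would get if the same tp went through the tp_created insertion a second time while *back already pointed at it. I have not confirmed that this is what happens in tc_ctl_tfilter(); the following is only a user-space sketch of that assumption. struct node, insert_at() and walk_is_stuck() are made-up names for illustration, with plain pointers instead of RCU:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct tcf_proto; only ->next matters here. */
struct node {
	struct node *next;
};

/*
 * Same shape as the tp_created branch, without RCU:
 *   n->next = *back; *back = n;
 * If *back already points at n, this makes n->next == n.
 */
static void insert_at(struct node **back, struct node *n)
{
	assert(back != NULL && n != NULL);
	n->next = *back;
	*back = n;
}

/*
 * Walk at most `limit` nodes.  Returns 1 if the walk did not reach
 * NULL (the chain is cyclic, or simply longer than limit), else 0.
 * An unbounded walk over a self-looped chain would never terminate.
 */
static int walk_is_stuck(const struct node *head, int limit)
{
	while (head != NULL && limit-- > 0)
		head = head->next;
	return head != NULL;
}
```

Inserting two distinct nodes gives a normal chain; inserting the second node again at the same position leaves it pointing at itself, so any walker that follows ->next until NULL would spin.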