All of lore.kernel.org
* Corrupted SKB
@ 2017-04-18  0:39 Michael Ma
  2017-04-18 23:12 ` Cong Wang
  0 siblings, 1 reply; 5+ messages in thread
From: Michael Ma @ 2017-04-18  0:39 UTC (permalink / raw)
  To: Linux Kernel Network Developers

Hi -

We've implemented a "glue" qdisc similar to mqprio which can associate
one qdisc with multiple txqs as the root qdisc. Reference counts of the
child qdiscs have been adjusted properly in this case so that they
represent the number of txqs each has been attached to. However, when
sending packets we saw the skb from dequeue_skb() corrupted, with the
following call stack:

    [exception RIP: netif_skb_features+51]
    RIP: ffffffff815292b3  RSP: ffff8817f6987940  RFLAGS: 00010246

 #9 [ffff8817f6987968] validate_xmit_skb at ffffffff815294aa
#10 [ffff8817f69879a0] validate_xmit_skb at ffffffff8152a0d9
#11 [ffff8817f69879b0] __qdisc_run at ffffffff8154a193
#12 [ffff8817f6987a00] dev_queue_xmit at ffffffff81529e03

It looks like the skb has already been released since its dev pointer
field is invalid.

Any clue on how this can be investigated further? My current thought
is to add some instrumentation to the place where the skb is released
and analyze whether there is any race condition happening there.
However, by looking through the existing code I think the case where
one root qdisc is associated with multiple txqs already exists (when
mqprio is not used), so I'm not sure why it doesn't work when we group
txqs and assign each group a root qdisc. Any insight on this issue
would be much appreciated!

Thanks,
Michael


* Re: Corrupted SKB
  2017-04-18  0:39 Corrupted SKB Michael Ma
@ 2017-04-18 23:12 ` Cong Wang
  2017-04-19  4:46   ` Michael Ma
  0 siblings, 1 reply; 5+ messages in thread
From: Cong Wang @ 2017-04-18 23:12 UTC (permalink / raw)
  To: Michael Ma; +Cc: Linux Kernel Network Developers

On Mon, Apr 17, 2017 at 5:39 PM, Michael Ma <make0818@gmail.com> wrote:
> Hi -
>
> We've implemented a "glue" qdisc similar to mqprio which can associate
> one qdisc with multiple txqs as the root qdisc. Reference counts of the
> child qdiscs have been adjusted properly in this case so that they
> represent the number of txqs each has been attached to. However, when
> sending packets we saw the skb from dequeue_skb() corrupted, with the
> following call stack:
>
>     [exception RIP: netif_skb_features+51]
>     RIP: ffffffff815292b3  RSP: ffff8817f6987940  RFLAGS: 00010246
>
>  #9 [ffff8817f6987968] validate_xmit_skb at ffffffff815294aa
> #10 [ffff8817f69879a0] validate_xmit_skb at ffffffff8152a0d9
> #11 [ffff8817f69879b0] __qdisc_run at ffffffff8154a193
> #12 [ffff8817f6987a00] dev_queue_xmit at ffffffff81529e03
>
> It looks like the skb has already been released since its dev pointer
> field is invalid.
>
> Any clue on how this can be investigated further? My current thought
> is to add some instrumentation to the place where skb is released and
> analyze whether there is any race condition happening there. However

Either dropwatch or perf could do the work to instrument kfree_skb().
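For example, a minimal perf sketch (assumes perf is installed and the
kernel exposes the skb:kfree_skb tracepoint; the wrapper function name is
illustrative):

```shell
# Illustrative wrapper: record kfree_skb() events system-wide with call
# graphs for 10 seconds, then print the captured stacks. Run as root.
trace_kfree_skb() {
    perf record -e skb:kfree_skb -a -g -- sleep 10 && perf script
}

# dropwatch alternative (interactive): run `dropwatch -l kas`, then `start`.
echo "trace_kfree_skb defined; run as root to capture freed-skb stacks"
```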

> by looking through the existing code I think the case where one root
> qdisc is associated with multiple txqs already exists (when mqprio is
> not used), so I'm not sure why it doesn't work when we group txqs and
> assign each group a root qdisc. Any insight on this issue would be
> much appreciated!

How do you implement ->attach()? How does it work with netdev_pick_tx()?


* Re: Corrupted SKB
  2017-04-18 23:12 ` Cong Wang
@ 2017-04-19  4:46   ` Michael Ma
  2017-04-26  4:42     ` Michael Ma
  0 siblings, 1 reply; 5+ messages in thread
From: Michael Ma @ 2017-04-19  4:46 UTC (permalink / raw)
  To: Cong Wang; +Cc: Linux Kernel Network Developers, jin.oyj

2017-04-18 16:12 GMT-07:00 Cong Wang <xiyou.wangcong@gmail.com>:
> On Mon, Apr 17, 2017 at 5:39 PM, Michael Ma <make0818@gmail.com> wrote:
>> Hi -
>>
>> We've implemented a "glue" qdisc similar to mqprio which can associate
>> one qdisc with multiple txqs as the root qdisc. Reference counts of the
>> child qdiscs have been adjusted properly in this case so that they
>> represent the number of txqs each has been attached to. However, when
>> sending packets we saw the skb from dequeue_skb() corrupted, with the
>> following call stack:
>>
>>     [exception RIP: netif_skb_features+51]
>>     RIP: ffffffff815292b3  RSP: ffff8817f6987940  RFLAGS: 00010246
>>
>>  #9 [ffff8817f6987968] validate_xmit_skb at ffffffff815294aa
>> #10 [ffff8817f69879a0] validate_xmit_skb at ffffffff8152a0d9
>> #11 [ffff8817f69879b0] __qdisc_run at ffffffff8154a193
>> #12 [ffff8817f6987a00] dev_queue_xmit at ffffffff81529e03
>>
>> It looks like the skb has already been released since its dev pointer
>> field is invalid.
>>
>> Any clue on how this can be investigated further? My current thought
>> is to add some instrumentation to the place where skb is released and
>> analyze whether there is any race condition happening there. However
>
> Either dropwatch or perf could do the work to instrument kfree_skb().

Thanks - will try it out.
>
>> by looking through the existing code I think the case where one root
>> qdisc is associated with multiple txqs already exists (when mqprio is
>> not used), so I'm not sure why it doesn't work when we group txqs and
>> assign each group a root qdisc. Any insight on this issue would be
>> much appreciated!
>
> How do you implement ->attach()? How does it work with netdev_pick_tx()?

attach() essentially grafts the default qdisc (pfifo) to each "txq
group" represented by a TC class. For netdev_pick_tx() we use the
classid of the socket to select a class based on a "class id base" and
the class-to-txq mapping defined together with this glue qdisc - it's
pretty much the same as mqprio, with the difference that one class maps
to multiple txqs and the txq is selected through a hash.


* Re: Corrupted SKB
  2017-04-19  4:46   ` Michael Ma
@ 2017-04-26  4:42     ` Michael Ma
  2017-04-26 17:38       ` Cong Wang
  0 siblings, 1 reply; 5+ messages in thread
From: Michael Ma @ 2017-04-26  4:42 UTC (permalink / raw)
  To: Cong Wang; +Cc: Linux Kernel Network Developers, jin.oyj

2017-04-18 21:46 GMT-07:00 Michael Ma <make0818@gmail.com>:
> 2017-04-18 16:12 GMT-07:00 Cong Wang <xiyou.wangcong@gmail.com>:
>> On Mon, Apr 17, 2017 at 5:39 PM, Michael Ma <make0818@gmail.com> wrote:
>>> Hi -
>>>
>>> We've implemented a "glue" qdisc similar to mqprio which can associate
>>> one qdisc with multiple txqs as the root qdisc. Reference counts of the
>>> child qdiscs have been adjusted properly in this case so that they
>>> represent the number of txqs each has been attached to. However, when
>>> sending packets we saw the skb from dequeue_skb() corrupted, with the
>>> following call stack:
>>>
>>>     [exception RIP: netif_skb_features+51]
>>>     RIP: ffffffff815292b3  RSP: ffff8817f6987940  RFLAGS: 00010246
>>>
>>>  #9 [ffff8817f6987968] validate_xmit_skb at ffffffff815294aa
>>> #10 [ffff8817f69879a0] validate_xmit_skb at ffffffff8152a0d9
>>> #11 [ffff8817f69879b0] __qdisc_run at ffffffff8154a193
>>> #12 [ffff8817f6987a00] dev_queue_xmit at ffffffff81529e03
>>>
>>> It looks like the skb has already been released since its dev pointer
>>> field is invalid.
>>>
>>> Any clue on how this can be investigated further? My current thought
>>> is to add some instrumentation to the place where skb is released and
>>> analyze whether there is any race condition happening there. However
>>
>> Either dropwatch or perf could do the work to instrument kfree_skb().
>
> Thanks - will try it out.

I'm using perf to collect the call stack for kfree_skb and trying to
correlate that with the corrupted SKB address. However, when the
system crashes the perf.data file is also corrupted - how can I view
this file when the system crashes before perf exits?
>>
>>> by looking through the existing code I think the case where one root
>>> qdisc is associated with multiple txqs already exists (when mqprio is
>>> not used), so I'm not sure why it doesn't work when we group txqs and
>>> assign each group a root qdisc. Any insight on this issue would be
>>> much appreciated!
>>
>> How do you implement ->attach()? How does it work with netdev_pick_tx()?
>
> attach() essentially grafts the default qdisc (pfifo) to each "txq
> group" represented by a TC class. For netdev_pick_tx() we use the
> classid of the socket to select a class based on a "class id base" and
> the class-to-txq mapping defined together with this glue qdisc - it's
> pretty much the same as mqprio, with the difference that one class maps
> to multiple txqs and the txq is selected through a hash.


* Re: Corrupted SKB
  2017-04-26  4:42     ` Michael Ma
@ 2017-04-26 17:38       ` Cong Wang
  0 siblings, 0 replies; 5+ messages in thread
From: Cong Wang @ 2017-04-26 17:38 UTC (permalink / raw)
  To: Michael Ma; +Cc: Linux Kernel Network Developers, jin.oyj

On Tue, Apr 25, 2017 at 9:42 PM, Michael Ma <make0818@gmail.com> wrote:
> 2017-04-18 21:46 GMT-07:00 Michael Ma <make0818@gmail.com>:
>> 2017-04-18 16:12 GMT-07:00 Cong Wang <xiyou.wangcong@gmail.com>:
>>> On Mon, Apr 17, 2017 at 5:39 PM, Michael Ma <make0818@gmail.com> wrote:
>>>> Hi -
>>>>
>>>> We've implemented a "glue" qdisc similar to mqprio which can associate
>>>> one qdisc with multiple txqs as the root qdisc. Reference counts of the
>>>> child qdiscs have been adjusted properly in this case so that they
>>>> represent the number of txqs each has been attached to. However, when
>>>> sending packets we saw the skb from dequeue_skb() corrupted, with the
>>>> following call stack:
>>>>
>>>>     [exception RIP: netif_skb_features+51]
>>>>     RIP: ffffffff815292b3  RSP: ffff8817f6987940  RFLAGS: 00010246
>>>>
>>>>  #9 [ffff8817f6987968] validate_xmit_skb at ffffffff815294aa
>>>> #10 [ffff8817f69879a0] validate_xmit_skb at ffffffff8152a0d9
>>>> #11 [ffff8817f69879b0] __qdisc_run at ffffffff8154a193
>>>> #12 [ffff8817f6987a00] dev_queue_xmit at ffffffff81529e03
>>>>
>>>> It looks like the skb has already been released since its dev pointer
>>>> field is invalid.
>>>>
>>>> Any clue on how this can be investigated further? My current thought
>>>> is to add some instrumentation to the place where skb is released and
>>>> analyze whether there is any race condition happening there. However
>>>
>>> Either dropwatch or perf could do the work to instrument kfree_skb().
>>
>> Thanks - will try it out.
>
> I'm using perf to collect the call stack for kfree_skb and trying to
> correlate that with the corrupted SKB address. However, when the
> system crashes the perf.data file is also corrupted - how can I view
> this file when the system crashes before perf exits?

Hmm, KASAN is pretty good at detecting use-after-free; its report
nicely shows where the object was allocated and freed, as well as
where the use-after-free happened.

https://01.org/linuxgraphics/gfx-docs/drm/dev-tools/kasan.html
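KASAN is a compile-time feature, so it means rebuilding the kernel; a
minimal config fragment per the docs above (inline mode needs a newer
GCC, outline mode works with older toolchains):

```
# Rebuild the kernel with KASAN compiled in.
CONFIG_KASAN=y
CONFIG_KASAN_INLINE=y
```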


Thread overview: 5+ messages
2017-04-18  0:39 Corrupted SKB Michael Ma
2017-04-18 23:12 ` Cong Wang
2017-04-19  4:46   ` Michael Ma
2017-04-26  4:42     ` Michael Ma
2017-04-26 17:38       ` Cong Wang
