All of lore.kernel.org
 help / color / mirror / Atom feed
* RFH: problems with adjacency graph
@ 2016-10-11  2:18 David Ahern
  2016-10-11  6:54 ` Jiri Pirko
  0 siblings, 1 reply; 3+ messages in thread
From: David Ahern @ 2016-10-11  2:18 UTC (permalink / raw)
  To: Jiri Pirko, vfalico, Nikolay Aleksandrov, roopa; +Cc: netdev

Jiri / Veaceslav:

As author's of the adjacency tracking code in dev.c I am hoping you can help with suggested patches for a couple of problems. The start point needs to include commit 93409033ae65 which resolved a different problem from what I am seeing now.

At the moment I have 2 cases both for this topology:
        +--------+
        |  myvrf |
        +--------+
          |    |
          |  +---------+
          |  | macvlan |
          |  +---------+
          |    |
      +----------+
      |  bridge  |
      +----------+
          |
      +--------+
      | bond0  |
      +--------+
          |
      +--------+
      |  eth3  |
      +--------+


Base set of commands for both cases:

ip link add bond1 type bond
ip link set bond1 up
ip link set eth3 down
ip link set eth3 master bond1
ip link set eth3 up

ip link add bridge type bridge
ip link set bridge up
ip link add macvlan link bridge type macvlan
ip link set macvlan up

ip link add myvrf type vrf table 1234
ip link set myvrf up

ip link set bridge master myvrf


############################################################
# case 1

ip link set macvlan master myvrf
ip link set bond1 master bridge

ip link delete myvrf

dmesg has a splat triggered in __netdev_adjacent_dev_remove() where you currently see the BUG(). If you convert that to a WARN_ON (which it should be, no need to panic on the remove path) it will show you 4 missing adjacencies: eth3 - myvrf, mvrf - eth3, bond1 - myvrf and myvrf - bond1. All of those are because the dev_link function does not link macvlan lower devices to myvrf when it is enslaved. (Enable the debugging to see that those messages are missing.)



############################################################
# case 2

This case just flips the ordering of the enslavements:

ip link set bond1 master bridge
ip link set macvlan master myvrf

Then run:
ip link delete bond1
ip link delete myvrf

The last command hangs because myvrf has a reference that has not been released. If you do not have commit 93409033ae65 the delete of bond1 hangs for the same reason. For this case, the debug messages show that the macvlan lower devices (eth3 and bond1) are connected to myvrf on the enslavement, but the link delete the path only removes one of them hence the unreleased refcnt on myvrf.


In the end it seems that the code for the dependency graph is not making the complete mesh which causes problems on the tear down. I have attempted a few changes that so far fix 1 problem and uncover a different one. Hence the request for help from the author's of this code.

It seems like the complete mesh is not really needed, but cscope shows spectrum, ixgbe and bonding all using the for_each upper and lower device macros.

Suggestions?

David

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: RFH: problems with adjacency graph
  2016-10-11  2:18 RFH: problems with adjacency graph David Ahern
@ 2016-10-11  6:54 ` Jiri Pirko
  2016-10-11 19:08   ` David Ahern
  0 siblings, 1 reply; 3+ messages in thread
From: Jiri Pirko @ 2016-10-11  6:54 UTC (permalink / raw)
  To: David Ahern; +Cc: vfalico, Nikolay Aleksandrov, roopa, netdev

Tue, Oct 11, 2016 at 04:18:52AM CEST, dsahern@gmail.com wrote:
>Jiri / Veaceslav:
>
>As author's of the adjacency tracking code in dev.c I am hoping you can help with suggested patches for a couple of problems. The start point needs to include commit 93409033ae65 which resolved a different problem from what I am seeing now.
>
>At the moment I have 2 cases both for this topology:
>        +--------+
>        |  myvrf |
>        +--------+
>          |    |
>          |  +---------+
>          |  | macvlan |
>          |  +---------+
>          |    |
>      +----------+
>      |  bridge  |
>      +----------+
>          |
>      +--------+
>      | bond0  |
>      +--------+
>          |
>      +--------+
>      |  eth3  |
>      +--------+
>
>
>Base set of commands for both cases:
>
>ip link add bond1 type bond
>ip link set bond1 up
>ip link set eth3 down
>ip link set eth3 master bond1
>ip link set eth3 up
>
>ip link add bridge type bridge
>ip link set bridge up
>ip link add macvlan link bridge type macvlan
>ip link set macvlan up
>
>ip link add myvrf type vrf table 1234
>ip link set myvrf up
>
>ip link set bridge master myvrf
>
>
>############################################################
># case 1
>
>ip link set macvlan master myvrf
>ip link set bond1 master bridge
>
>ip link delete myvrf
>
>dmesg has a splat triggered in __netdev_adjacent_dev_remove() where you currently see the BUG(). If you convert that to a WARN_ON (which it should be, no need to panic on the remove path) it will show you 4 missing adjacencies: eth3 - myvrf, mvrf - eth3, bond1 - myvrf and myvrf - bond1. All of those are because the dev_link function does not link macvlan lower devices to myvrf when it is enslaved. (Enable the debugging to see that those messages are missing.)
>
>
>
>############################################################
># case 2
>
>This case just flips the ordering of the enslavements:
>
>ip link set bond1 master bridge
>ip link set macvlan master myvrf
>
>Then run:
>ip link delete bond1
>ip link delete myvrf
>
>The last command hangs because myvrf has a reference that has not been released. If you do not have commit 93409033ae65 the delete of bond1 hangs for the same reason. For this case, the debug messages show that the macvlan lower devices (eth3 and bond1) are connected to myvrf on the enslavement, but the link delete the path only removes one of them hence the unreleased refcnt on myvrf.
>
>
>In the end it seems that the code for the dependency graph is not making the complete mesh which causes problems on the tear down. I have attempted a few changes that so far fix 1 problem and uncover a different one. Hence the request for help from the author's of this code.

Agreed. We need to fix the code to work with duplicates so the graph is
complete.


>
>It seems like the complete mesh is not really needed, but cscope shows spectrum, ixgbe and bonding all using the for_each upper and lower device macros.
>
>Suggestions?

Well other possibility is to traverse the tree recursively. But that is
exactly why the colided lists of all uppers/lowers were introduced to
avoid this.


>
>David

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: RFH: problems with adjacency graph
  2016-10-11  6:54 ` Jiri Pirko
@ 2016-10-11 19:08   ` David Ahern
  0 siblings, 0 replies; 3+ messages in thread
From: David Ahern @ 2016-10-11 19:08 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: vfalico, Nikolay Aleksandrov, roopa, netdev

On 10/11/16 12:54 AM, Jiri Pirko wrote:
>>
>> It seems like the complete mesh is not really needed, but cscope shows spectrum, ixgbe and bonding all using the for_each upper and lower device macros.
>>
>> Suggestions?
> 
> Well other possibility is to traverse the tree recursively. But that is
> exactly why the colided lists of all uppers/lowers were introduced to
> avoid this.

The simpler approach is to remove all_adj_list completely and iteratively walk the lower and upper lists in adj_list for the macros that walk all lower and all upper devices. Maintaining a complete linear list per netdev of all lower devices and all upper devices is just not going to work given all combinations of enslave orderings and considering stacks as high as 6 deep. As far as I can tell the netdev_for_each_all_lower_dev, netdev_for_each_all_lower_dev_rcu, and netdev_for_each_all_upper_dev_rcu macros are all used in the slow path so it should be ok. Removing all_adj_list significantly simplifies the dev insert and remove code.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2016-10-11 19:08 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-11  2:18 RFH: problems with adjacency graph David Ahern
2016-10-11  6:54 ` Jiri Pirko
2016-10-11 19:08   ` David Ahern

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.