All of lore.kernel.org
* Re: [evb] RE: [PATCH][RFC] net/bridge: add basic VEPA support
@ 2009-08-07 20:35 ` Yaron Haviv
  0 siblings, 0 replies; 52+ messages in thread
From: Yaron Haviv @ 2009-08-07 20:35 UTC (permalink / raw)
  To: evb, shemminger, anna.fischer
  Cc: arnd, davem, netdev, bridge, adobriyan, virtualization



Paul,

I also think that the bridge may not be the right place for VEPA, but rather a simpler sw/hw mux, although the VEPA support may reside in multiple places (i.e. also in the bridge).

As Arnd pointed out, Or has already added an extension to qemu that allows mapping a guest's virtual NIC directly to an interface device (vs. using tap). This was done specifically to address VEPA, and it results in much faster performance and lower CPU overhead (Or and some others are planning additional meaningful performance optimizations).

The interface multiplexing can be achieved using the macvlan driver or an SR-IOV capable NIC (the preferred option). macvlan may need to be extended to support VEPA multicast handling, but this looks like a rather simple task.
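To make the distinction concrete, here is a minimal C sketch of the VEPA rule being described: the mux sends every frame from a virtual interface straight to the uplink with no local lookup, and VM-to-VM traffic is switched by the adjacent hairpin-capable bridge. Types and names are illustrative, not the actual macvlan code:

struct vepa_port {
        int id;
        unsigned char mac[6];
};

static int vepa_xmit(struct vepa_port *src, const void *frame, int len,
                     int (*uplink_xmit)(const void *, int))
{
        (void)src;      /* the source port is irrelevant to the decision */
        return uplink_xmit(frame, len);  /* no local switching at all */
}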

It may be counter-intuitive for some, but we expect the (completed) qemu VEPA mode + SR-IOV + certain switches with hairpin (VEPA) mode to perform faster than using bridge+tap, even for connecting 2 VMs on the same host.


Yaron 

Sent from BlackBerry

________________________________

From: evb@yahoogroups.com 
To: 'Stephen Hemminger' ; 'Fischer, Anna' 
Cc: bridge@lists.linux-foundation.org ; linux-kernel@vger.kernel.org ; netdev@vger.kernel.org ; virtualization@lists.linux-foundation.org ; evb@yahoogroups.com ; davem@davemloft.net ; kaber@trash.net ; adobriyan@gmail.com ; 'Arnd Bergmann' 
Sent: Fri Aug 07 21:58:00 2009
Subject: [evb] RE: [PATCH][RFC] net/bridge: add basic VEPA support

> 
> After reading more about this, I am not convinced this should be part
> of the bridge code. The bridge code really consists of two parts:
> forwarding table and optional spanning tree. Well, the VEPA code
> short-circuits both of these; I can't imagine it working with STP
> turned on. The only part of the bridge code that really gets used by
> this are the receive packet hooks and the crufty old API.
> 
> So instead of adding more stuff to the existing bridge code, why not
> have a new driver for just VEPA? You could do it with a simple version
> of a macvlan type driver.

Stephen,

Thanks for your comments and questions. We do believe the bridge code is
the right place for this, so I'd like to elaborate on that a bit more to
help persuade you. Sorry for the long-winded response, but here are some
thoughts:

- First and foremost, VEPA is going to be a standard addition to the IEEE
802.1Q specification. The working group agreed at the last meeting to
pursue a project to augment the bridge standard with hairpin mode (aka
reflective relay) and a remote filtering service (VEPA). For details, see:
http://www.ieee802.org/1/files/public/docs2009/new-evb-congdon-evbPar5C-0709-v01.pdf

- The VEPA functionality is really a pretty small, low-risk change to the
code and wouldn't seem to warrant an entire new driver or module.

- There are good use cases where VMs will want some of their interfaces
attached to ordinary bridges and others to bridges operating in VEPA mode.
In other words, we see the bridge code and VEPA operating simultaneously,
so keeping as much of the underlying code common as possible would seem to
be beneficial.

- By augmenting the bridge code with VEPA, a great amount of re-use is
achieved. It works wherever the bridge code works and doesn't need anything
special to support KVM, Xen, all the hooks, etc.

- The hardware vendors building SR-IOV NICs with embedded switches will be
adding VEPA mode, so keeping the bridge module in sync would be consistent
with this trend and direction. It will also be possible to extend the
hardware implementations by cascading a software bridge and/or VEPA, which
again argues for keeping the two architecturally aligned.

- The forwarding table is still needed and used on inbound traffic to
deliver frames to the correct virtual interfaces and to filter any reflected
frames. A new driver would have to basically implement an equivalent
forwarding table anyway. As I understand the current macvlan type driver,
it wouldn't filter multicast frames properly without such a table.
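As a rough illustration of that inbound path, here is a minimal C sketch of the decision the forwarding table supports; the helper names are hypothetical, a sketch of the argument rather than the proposed patch:

struct vport { int id; };
struct frame { unsigned char dmac[6], smac[6]; };

struct vport *fdb_lookup(const unsigned char *mac);    /* assumed helper */
void deliver(struct vport *p, const struct frame *f);  /* assumed helper */
void flood_local(const struct frame *f);               /* assumed helper */

void vepa_rx_from_uplink(const struct frame *f)
{
        struct vport *dst = fdb_lookup(f->dmac);

        if (dst)
                deliver(dst, f);        /* known local destination */
        else if (f->dmac[0] & 1)
                flood_local(f);         /* multicast/broadcast */
        /* Unknown unicast is dropped: it is flood traffic reflected
         * back to us by the hairpin bridge. */
}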

- It seems the hairpin mode would be needed in the bridge module whether
VEPA is added to the bridge module or implemented as a new driver. Having
the associated changes together in the same code could aid understanding
and deployment.
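For reference, the hairpin ("reflective relay") change is essentially a one-condition relaxation of the bridge rule that a frame is never forwarded back out its ingress port. A minimal sketch, with illustrative field names rather than the real net/bridge structures:

struct br_port {
        int ifindex;
        unsigned int hairpin_mode:1;
};

static int should_deliver(const struct br_port *in, const struct br_port *out)
{
        if (in == out && !out->hairpin_mode)
                return 0;       /* classic bridge rule: never reflect */
        return 1;               /* hairpin: reflect back downstream */
}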

As I understand the macvlan code, it currently doesn't allow two VMs on the
same machine to communicate with one another. I could imagine a hairpin
mode on the adjacent bridge making this possible, but the macvlan code would
need to be updated to filter reflected frames so a source did not receive
its own packet. I could imagine this being done as well, but to also
support selective multicast usage, something similar to the bridge
forwarding table would be needed. I think putting VEPA into a new driver
would force you to re-implement many things the bridge code already
supports. Given that we expect the bridge standard to ultimately include
VEPA, and that the new functions are basic forwarding operations, it seems
to make the most sense to keep this consistent with the bridge module.
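A sketch of the two receive-side filters that argument implies, with illustrative names: a port must not receive its own reflected frames, and multicast should reach only ports that joined the group.

#include <string.h>

struct mc_port {
        unsigned char mac[6];
        int (*joined)(const struct mc_port *p, const unsigned char *group);
};

static int should_receive(const struct mc_port *p,
                          const unsigned char *smac,
                          const unsigned char *dmac)
{
        if (memcmp(p->mac, smac, 6) == 0)
                return 0;                 /* our own frame, reflected back */
        if (dmac[0] & 1)                  /* multicast/broadcast bit */
                return p->joined(p, dmac);
        return memcmp(p->mac, dmac, 6) == 0;
}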

Paul



^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: [evb] RE: [PATCH][RFC] net/bridge: add basic VEPA support
  2009-08-07 20:35 ` Yaron Haviv
@ 2009-08-07 21:00   ` Fischer, Anna
  -1 siblings, 0 replies; 52+ messages in thread
From: Fischer, Anna @ 2009-08-07 21:00 UTC (permalink / raw)
  To: Yaron Haviv, evb, shemminger
  Cc: bridge, netdev, virtualization, davem, kaber, adobriyan, arnd,
	Paul Congdon (UC Davis)

Hi Yaron,

Yes, I also believe that VEPA + SR-IOV can potentially, in some deployments, achieve better performance than a bridge/tap configuration, especially when you run multiple VMs and want to enable more sophisticated network processing in the data path.

If you do have an SR-IOV NIC that supports VEPA, then I would think that you no longer have QEMU or macvtap in the setup, simply because in that case the VM can directly access the VF on the physical device. That would be ideal.

I do think that the macvtap driver is a good addition as a simple and fast virtual network I/O interface for cases where you do not need full bridge functionality. It does seem to assume, though, that the virtualization software uses QEMU/tap-style interfaces. How would this work with a Xen para-virtualized network interface? I guess there would need to be yet another driver?
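For context, each macvtap instance exposes a tap-like character device that carries whole Ethernet frames, which is what lets qemu attach directly instead of going through a bridge+tap pair. A minimal user-space sketch; the device node name below is hypothetical (the real minor number comes from sysfs):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        char frame[2048];
        int fd = open("/dev/tap42", O_RDWR);    /* hypothetical node */

        if (fd < 0) {
                perror("open");
                return 1;
        }
        ssize_t n = read(fd, frame, sizeof(frame)); /* one full frame */
        printf("received %zd bytes\n", n);
        close(fd);
        return 0;
}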

Anna


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [evb] RE: [PATCH][RFC] net/bridge: add basic VEPA support
  2009-08-07 20:35 ` Yaron Haviv
@ 2009-08-07 21:06   ` Paul Congdon (UC Davis)
  -1 siblings, 0 replies; 52+ messages in thread
From: Paul Congdon (UC Davis) @ 2009-08-07 21:06 UTC (permalink / raw)
  To: shemminger, anna.fischer
  Cc: arnd, davem, netdev, bridge, adobriyan, virtualization



Yaron,


> The interface multiplexing can be achieved using the macvlan driver or
> an SR-IOV capable NIC (the preferred option). macvlan may need to be
> extended to support VEPA multicast handling, but this looks like a
> rather simple task.

Agreed that the hardware solution is preferred, so the macvlan implementation doesn't really matter. If we are talking SR-IOV, then it is direct mapped, regardless of whether there is a VEB or VEPA in the hardware below, so you are bypassing the bridge software code also.

I disagree that adding the multicast handling is simple. While not conceptually hard, it will basically require you to put an address table into the macvlan implementation, and if you have that, why not just use the one already in the bridge code? If you hook a VEPA up to a non-hairpin mode external bridge, you get the macvlan capability as well.
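A sketch of the kind of address table that implies, i.e. a cut-down duplicate of what net/bridge's forwarding database already does; sizes and names are illustrative:

#include <string.h>

#define FDB_BUCKETS 256

struct fdb_entry {
        unsigned char mac[6];
        int port;
        struct fdb_entry *next;
};

static struct fdb_entry *table[FDB_BUCKETS];

static unsigned int hash_mac(const unsigned char *mac)
{
        return (mac[4] ^ mac[5]) % FDB_BUCKETS; /* toy hash */
}

static int lookup_port(const unsigned char *mac)
{
        struct fdb_entry *e;

        for (e = table[hash_mac(mac)]; e; e = e->next)
                if (memcmp(e->mac, mac, 6) == 0)
                        return e->port;
        return -1;      /* unknown destination */
}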

It also seems to me that the special macvlan interfaces for KVM don't apply to Xen or to a non-virtualized environment, or that more would have to be written to make that work. If it is in the bridge code, you get all of this re-use.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [evb] RE: [PATCH][RFC] net/bridge: add basic VEPA support
  2009-08-07 21:06   ` [Bridge] " Paul Congdon (UC Davis)
@ 2009-08-07 21:36     ` Stephen Hemminger
  -1 siblings, 0 replies; 52+ messages in thread
From: Stephen Hemminger @ 2009-08-07 21:36 UTC (permalink / raw)
  To: Paul Congdon (UC Davis)
  Cc: anna.fischer, bridge, netdev, virtualization, davem, kaber,
	adobriyan, arnd

On Fri, 7 Aug 2009 14:06:58 -0700
"Paul Congdon \(UC Davis\)" <ptcongdon@ucdavis.edu> wrote:

> Yaron,
> 
> > The interface multiplexing can be achieved using the macvlan driver or
> > an SR-IOV capable NIC (the preferred option). macvlan may need to be
> > extended to support VEPA multicast handling, but this looks like a
> > rather simple task.
> 
> Agreed that the hardware solution is preferred, so the macvlan
> implementation doesn't really matter. If we are talking SR-IOV, then it
> is direct mapped, regardless of whether there is a VEB or VEPA in the
> hardware below, so you are bypassing the bridge software code also.
> 
> I disagree that adding the multicast handling is simple. While not
> conceptually hard, it will basically require you to put an address table
> into the macvlan implementation, and if you have that, why not just use
> the one already in the bridge code? If you hook a VEPA up to a
> non-hairpin mode external bridge, you get the macvlan capability as well.

I have a patch that forwards all multicast packets, and another that does
proper forwarding. It should have worked that way in the original macvlan;
the current behavior is really a bug.
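
For reference, the flood-everything variant amounts to something like
this (a simplified sketch from memory, not the actual patch; the
vlans/nvlans array stands in for the real per-port hash table walk):

/* Simplified sketch, not the actual patch: deliver every multicast
 * frame to every macvlan device on the port and let the upper layers
 * filter.  "vlans"/"nvlans" stand in for the real per-port hash walk.
 */
static void macvlan_flood_multicast(struct sk_buff *skb,
                                    struct net_device **vlans, int nvlans)
{
        int i;

        for (i = 0; i < nvlans; i++) {
                struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC);

                if (!nskb)
                        continue;       /* out of memory, drop this copy */
                nskb->dev = vlans[i];
                netif_rx(nskb);         /* receive path filters further up */
        }
}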


-- 


* Re: [evb] RE: [PATCH][RFC] net/bridge: add basic VEPA support
@ 2009-08-08  8:50 ` Benny Amorsen
  0 siblings, 0 replies; 52+ messages in thread
From: Benny Amorsen @ 2009-08-08  8:50 UTC (permalink / raw)
  To: Fischer, Anna
  Cc: arnd, Yaron Haviv, bridge, virtualization, adobriyan,
	Paul Congdon (UC Davis),
	netdev, evb, davem

"Fischer, Anna" <anna.fischer@hp.com> writes:

> If you do have a SRIOV NIC that supports VEPA, then I would think that
> you do not have QEMU or macvtap in the setup any more though. Simply
> because in that case the VM can directly access the VF on the physical
> device. That would be ideal.

I'm just trying to understand how this all works, so I'm probably asking
a stupid question:

Would an SR-IOV NIC with VEPA support show up as multiple devices? I.e.
would I get e.g. eth0-eth7 for a NIC with support for 8 virtual
interfaces? Would they have different MAC addresses?


/Benny

* Re: [evb] RE: [PATCH][RFC] net/bridge: add basic VEPA support
  2009-08-07 21:00   ` [Bridge] " Fischer, Anna
@ 2009-08-08  9:22     ` Arnd Bergmann
  -1 siblings, 0 replies; 52+ messages in thread
From: Arnd Bergmann @ 2009-08-08  9:22 UTC (permalink / raw)
  To: Fischer, Anna
  Cc: Yaron Haviv, evb, shemminger, bridge, netdev, virtualization,
	davem, kaber, adobriyan, Paul Congdon (UC Davis)

On Friday 07 August 2009, Fischer, Anna wrote:
> If you do have a SRIOV NIC that supports VEPA, then I would think
> that you do not have QEMU or macvtap in the setup any more though.
> Simply because in that case the VM can directly access the VF on
> the physical device. That would be ideal.

There may be reasons why, even with an SR-IOV adapter, you would want
to use the macvtap setup, with some extensions. E.g. guest migration
becomes a lot simpler if you don't have to deal with PCI passthrough
devices. If we manage to add both TX and RX zero-copy (into the
guest) to the macvlan driver, we can treat an SR-IOV adapter like
a VMDq adapter and get the best of both.

> I do think that the macvtap driver is a good addition as a simple
> and fast virtual network I/O interface, in case you do not need
> full bridge functionality. It does seem to assume though that the
> virtualization software uses QEMU/tap interfaces. How would this
> work with a Xen para-virtualized network interface? I guess there
> would need to be yet another driver?

I'm not sure how Xen guest networking works, but if neither the
traditional macvlan driver nor the macvtap driver are able to
connect it to the external NIC, then you can probably add a third
macvlan backend to handle that.

	Arnd <><

* Re: [evb] RE: [PATCH][RFC] net/bridge: add basic VEPA support
  2009-08-08  8:50     ` [Bridge] " Benny Amorsen
@ 2009-08-08  9:44       ` Arnd Bergmann
  -1 siblings, 0 replies; 52+ messages in thread
From: Arnd Bergmann @ 2009-08-08  9:44 UTC (permalink / raw)
  To: Benny Amorsen
  Cc: Fischer, Anna, Yaron Haviv, evb, shemminger, davem, netdev,
	bridge, adobriyan, Paul Congdon (UC Davis),
	virtualization

On Saturday 08 August 2009, Benny Amorsen wrote:
> Would a SRIOV NIC with VEPA support show up as multiple devices? I.e.
> would I get e.g. eth0-eth7 for a NIC with support for 8 virtual
> interfaces? Would they have different MAC addresses?

It could, but the idea of SR-IOV is that it shows up as 8 PCI
devices. One of them is owned by the host and is seen as eth0
there. The other seven PCI devices (virtual functions) are meant
to be assigned to guests using PCI passthrough and will show
up as the guest's eth0, each one with its own MAC address.
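
Assuming the interfaces have been brought up as regular netdevs in
the host, a trivial sketch like this shows their names and MAC
addresses:

/* Minimal sketch: print the MAC address of eth0..eth7, e.g. to see
 * what an SR-IOV or VMDq adapter exposed to the host.
 */
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <unistd.h>

int main(void)
{
        int s = socket(AF_INET, SOCK_DGRAM, 0);
        int i;

        for (i = 0; i < 8; i++) {
                struct ifreq ifr;
                unsigned char *m;

                memset(&ifr, 0, sizeof(ifr));
                snprintf(ifr.ifr_name, IFNAMSIZ, "eth%d", i);
                if (ioctl(s, SIOCGIFHWADDR, &ifr) < 0)
                        continue;       /* interface does not exist */
                m = (unsigned char *)ifr.ifr_hwaddr.sa_data;
                printf("%s: %02x:%02x:%02x:%02x:%02x:%02x\n", ifr.ifr_name,
                       m[0], m[1], m[2], m[3], m[4], m[5]);
        }
        close(s);
        return 0;
}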

Another mode of operation is VMDq, where the host owns all
interfaces and you might see eth0-eth7 there. You can then attach
a qemu process with a raw packet socket or a single macvtap port
to each of those interfaces. This is not yet implemented in Linux,
so how it will be done is still open; it might all be integrated
into macvlan, or alternatively into some new subsystem.

AFAIK, every SR-IOV adapter can also be operated as a VMDq adapter,
but there are VMDq adapters that do not support SR-IOV.

	Arnd <><


* Re: [evb] RE: [PATCH][RFC] net/bridge: add basic VEPA support
  2009-08-07 21:36     ` [Bridge] " Stephen Hemminger
@ 2009-08-09 11:19       ` Or Gerlitz
  -1 siblings, 0 replies; 52+ messages in thread
From: Or Gerlitz @ 2009-08-09 11:19 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Paul Congdon (UC Davis),
	arnd, anna.fischer, netdev, bridge, davem, adobriyan,
	virtualization, evb

Stephen Hemminger wrote:
> I have a patch that forwards all multicast packets, and another that does proper forwarding. It should have worked that way in original macvlan, the current behavior is really a bug.
>   
Looking at macvlan_set_multicast_list(), it acts in a similar manner to
macvlan_set_mac_address() in the sense that it calls dev_mc_sync(). I
assume what's left is to add macvlan_hash_xxx-style multicast logic to
map/unmap multicast groups to the macvlan devices that want to receive
them, so that the flooding can be removed, correct?
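
For reference, the TX side is currently roughly the following (quoted
from memory, so take the details with a grain of salt); the missing
piece would be an RX-side lookup keyed on the group address, analogous
to the unicast macvlan_hash:

/* Roughly the current code: addresses joined on the macvlan device
 * are synced down to the lower device, so the NIC's hardware filter
 * is correct, but RX still floods to every macvlan on the port.
 */
static void macvlan_set_multicast_list(struct net_device *dev)
{
        struct macvlan_dev *vlan = netdev_priv(dev);

        dev_mc_sync(vlan->lowerdev, dev);
}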


Or.




* Re: [evb] RE: [PATCH][RFC] net/bridge: add basic VEPA support
  2009-08-09 11:19       ` [Bridge] " Or Gerlitz
@ 2009-08-10 15:20         ` Stephen Hemminger
  -1 siblings, 0 replies; 52+ messages in thread
From: Stephen Hemminger @ 2009-08-10 15:20 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Paul Congdon (UC Davis),
	arnd, anna.fischer, netdev, bridge, davem, adobriyan,
	virtualization, evb

On Sun, 09 Aug 2009 14:19:08 +0300
Or Gerlitz <ogerlitz@voltaire.com> wrote:

> Stephen Hemminger wrote:
> > I have a patch that forwards all multicast packets, and another that does proper forwarding. It should have worked that way in original macvlan, the current behavior is really a bug.
> >   
> Looking in macvlan_set_multicast_list() it acts in a similar manner to
> macvlan_set_mac_address() in the sense that it calls dev_mc_sync(). I
> assume what's left is to add macvlan_hash_xxx multicast logic to
> map/unmap multicast groups to what macvlan devices want to receive them
> and this way the flooding can be removed, correct?

The device can just flood all multicast packets, since the filtering
is done on the receive path anyway.


* Re: [evb] RE: [PATCH][RFC] net/bridge: add basic VEPA support
  2009-08-10 15:20         ` [Bridge] " Stephen Hemminger
@ 2009-08-10 15:28           ` Arnd Bergmann
  -1 siblings, 0 replies; 52+ messages in thread
From: Arnd Bergmann @ 2009-08-10 15:28 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Or Gerlitz, Paul Congdon (UC Davis),
	anna.fischer, netdev, bridge, davem, adobriyan, virtualization,
	evb

On Monday 10 August 2009, Stephen Hemminger wrote:
> On Sun, 09 Aug 2009 14:19:08 +0300, Or Gerlitz <ogerlitz@voltaire.com> wrote:
> > Looking in macvlan_set_multicast_list() it acts in a similar manner to
> > macvlan_set_mac_address() in the sense that it calls dev_mc_sync(). I
> > assume what's left is to add macvlan_hash_xxx multicast logic to
> > map/unmap multicast groups to what macvlan devices want to receive them
> > and this way the flooding can be removed, correct?
> 
> The device can just flood all multicast packets, since the filtering
> is done on the receive path anyway.

But we'd still have to copy the frames to user space (for both
macvtap and raw packet sockets) and exit from the guest to inject
it into its stack, right?

I guess for multicast heavy workloads, we could save a lot of cycles
by throwing the frames away as early as possible. How common are those
setups in virtual servers though?

	Arnd <><


* RE: [evb] RE: [PATCH][RFC] net/bridge: add basic VEPA support
  2009-08-10 15:28           ` [Bridge] " Arnd Bergmann
@ 2009-08-10 16:32             ` Fischer, Anna
  -1 siblings, 0 replies; 52+ messages in thread
From: Fischer, Anna @ 2009-08-10 16:32 UTC (permalink / raw)
  To: Stephen Hemminger, Arnd Bergmann
  Cc: Or Gerlitz, Paul Congdon (UC Davis),
	netdev, bridge, davem, adobriyan, virtualization, evb

> Subject: Re: [evb] RE: [PATCH][RFC] net/bridge: add basic VEPA support
> 
> On Monday 10 August 2009, Stephen Hemminger wrote:
> > On Sun, 09 Aug 2009 14:19:08 +0300, Or Gerlitz <ogerlitz@voltaire.com> wrote:
> > > Looking in macvlan_set_multicast_list() it acts in a similar manner to
> > > macvlan_set_mac_address() in the sense that it calls dev_mc_sync(). I
> > > assume what's left is to add macvlan_hash_xxx multicast logic to
> > > map/unmap multicast groups to what macvlan devices want to receive them
> > > and this way the flooding can be removed, correct?
> >
> > The device can just flood all multicast packets, since the filtering
> > is done on the receive path anyway.

Is this handled by one of the additional patches? In the current kernel tree,
the macvlan code looks as if multicast filtering is only handled by the
physical device driver, not on individual macvlan devices.
 

> But we'd still have to copy the frames to user space (for both
> macvtap and raw packet sockets) and exit from the guest to inject
> it into its stack, right?

I think it would be nice if you could implement what Or describes for 
macvlan and avoid flooding; it doesn't sound too hard to do. 

I guess one advantage of macvlan (over the bridge) is that you can 
program in all the information you have for the ports attached to it, e.g. 
MAC addresses and multicast addresses. So you could take advantage of
that, whereas the bridge always floods multicast frames to all ports.
 
How would this work, though, if the OS inside the guest wants to register
for a particular multicast address? Is this propagated through the backend
drivers to the macvlan/macvtap interface?
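
For illustration, what the guest does is completely ordinary, e.g.
joining an IPv4 group on a socket (minimal sketch below). This makes
the guest kernel update the multicast list of its virtual NIC; the
question is whether that filter change ever reaches the host side:

/* Inside the guest: join an IPv4 multicast group over the virtual
 * NIC.  Error handling omitted for brevity.
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

int join_group(const char *group)
{
        int s = socket(AF_INET, SOCK_DGRAM, 0);
        struct ip_mreq mr;

        memset(&mr, 0, sizeof(mr));
        mr.imr_multiaddr.s_addr = inet_addr(group); /* e.g. "239.1.1.1" */
        mr.imr_interface.s_addr = htonl(INADDR_ANY);
        return setsockopt(s, IPPROTO_IP, IP_ADD_MEMBERSHIP,
                          &mr, sizeof(mr));
}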

Anna



* Re: [evb] RE: [PATCH][RFC] net/bridge: add basic VEPA support
  2009-08-10 16:32             ` [Bridge] " Fischer, Anna
@ 2009-08-10 16:51               ` Stephen Hemminger
  -1 siblings, 0 replies; 52+ messages in thread
From: Stephen Hemminger @ 2009-08-10 16:51 UTC (permalink / raw)
  To: Fischer, Anna
  Cc: Arnd Bergmann, Or Gerlitz, Paul Congdon (UC Davis),
	netdev, bridge, davem, adobriyan, virtualization, evb

On Mon, 10 Aug 2009 16:32:01 +0000
"Fischer, Anna" <anna.fischer@hp.com> wrote:

> > Subject: Re: [evb] RE: [PATCH][RFC] net/bridge: add basic VEPA support
> > 
> > On Monday 10 August 2009, Stephen Hemminger wrote:
> > > On Sun, 09 Aug 2009 14:19:08 +0300, Or Gerlitz <ogerlitz@voltaire.com> wrote:
> > > > Looking in macvlan_set_multicast_list() it acts in a similar manner to
> > > > macvlan_set_mac_address() in the sense that it calls dev_mc_sync(). I
> > > > assume what's left is to add macvlan_hash_xxx multicast logic to
> > > > map/unmap multicast groups to what macvlan devices want to receive them
> > > > and this way the flooding can be removed, correct?
> > >
> > > The device can just flood all multicast packets, since the filtering
> > > is done on the receive path anyway.
> 
> Is this handled by one of the additional patches? In the current kernel tree
> macvlan code it looks as if multicast filtering is only handled by the
> physical device driver, but not on particular macvlan devices.
>  
> 
> > But we'd still have to copy the frames to user space (for both
> > macvtap and raw packet sockets) and exit from the guest to inject
> > it into its stack, right?
> 
> I think it would be nice if you can implement what Or describes for 
> macvlan and avoid flooding, and it doesn't sound too hard to do. 
> 
> I guess one advantage for macvlan (over the bridge) is that you can 
> program in all information you have for the ports attached to it, e.g. 
> MAC addresses and multicast addresses. So you could take advantage of
> that whereas the bridge always floods multicast frames to all ports.
>  
> How would this work though, if the OS inside the guest wants to register
> to a particular multicast address? Is this propagated through the backend
> drivers to the macvlan/macvtap interface?

Sure, filtering is better, but multicast performance with a large number
of guests is really a corner case, not the real performance issue.


* Re: [evb] RE: [PATCH][RFC] net/bridge: add basic VEPA support
  2009-08-10 16:51               ` [Bridge] " Stephen Hemminger
@ 2009-08-10 19:18                 ` Arnd Bergmann
  -1 siblings, 0 replies; 52+ messages in thread
From: Arnd Bergmann @ 2009-08-10 19:18 UTC (permalink / raw)
  To: virtualization
  Cc: Stephen Hemminger, Fischer, Anna, Paul Congdon (UC Davis),
	evb, netdev, bridge, adobriyan, davem

On Monday 10 August 2009, Stephen Hemminger wrote:
> On Mon, 10 Aug 2009 16:32:01, "Fischer, Anna" <anna.fischer@hp.com> wrote:
> > How would this work though, if the OS inside the guest wants to register
> > to a particular multicast address? Is this propagated through the backend
> > drivers to the macvlan/macvtap interface?
> 
> Sure filtering is better, but multicast performance with large number
> of guests is really a corner case, not the real performance issue.

Well, right now, qemu does not care at all about this; it essentially
leaves the tun device in ALLMULTI state. I should check whether macvtap
at this stage can receive multicast frames at all, but if it does,
it will get them all ;-).

If we want to implement this with kvm, we would have to start with
the qemu virtio-net implementation, to move the receive filter into
the tap device. With tun/tap that will mean less copying to user
space; with macvtap (after implementing TUNSETTXFILTER) we already
get pretty far because we no longer need to have the external
interface in ALLMULTI state. Once that is in place, we can start
thinking about filtering per virtual device.
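
On the tun/tap side the hook already exists; a rough sketch of what
qemu would do with it, assuming the tap fd is set up and the MAC
list comes from the emulated NIC's filter state:

/* Sketch: program the guest's MAC filter list into a tap device so
 * the host can drop unwanted frames early.  The 16-address limit is
 * arbitrary here; TUNSETTXFILTER and TUN_FLT_ALLMULTI come from
 * linux/if_tun.h.
 */
#include <linux/if_ether.h>
#include <linux/if_tun.h>
#include <string.h>
#include <sys/ioctl.h>

int tap_set_filter(int tapfd, const unsigned char (*macs)[ETH_ALEN],
                   unsigned int n, int allmulti)
{
        char buf[sizeof(struct tun_filter) + 16 * ETH_ALEN];
        struct tun_filter *f = (struct tun_filter *)buf;

        memset(buf, 0, sizeof(buf));
        if (n > 16)
                n = 16;
        f->flags = allmulti ? TUN_FLT_ALLMULTI : 0;
        f->count = n;
        memcpy(f->addr, macs, n * ETH_ALEN);
        return ioctl(tapfd, TUNSETTXFILTER, f);
}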

	Arnd <><


* Re: [evb] RE: [PATCH][RFC] net/bridge: add basic VEPA support
  2009-08-10 15:20         ` [Bridge] " Stephen Hemminger
@ 2009-08-27 12:35           ` Or Gerlitz
  -1 siblings, 0 replies; 52+ messages in thread
From: Or Gerlitz @ 2009-08-27 12:35 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Paul Congdon (UC Davis),
	arnd, anna.fischer, netdev, bridge, davem, adobriyan,
	virtualization, evb

Stephen Hemminger wrote:
> Or Gerlitz <ogerlitz@voltaire.com> wrote:
>> Looking in macvlan_set_multicast_list() it acts in a similar manner to macvlan_set_mac_address() in the sense that it calls dev_mc_sync(). I assume what's left is to add macvlan_hash_xxx multicast logic to map/unmap multicast groups to what macvlan devices want to receive them and this way the flooding can be removed, correct?
> The device can just flood all multicast packets, since the filtering is done on the receive path anyway.
For each multicast packet, macvlan_broadcast() is invoked and calls 
skb_clone()/netif_rx() for each device. A smarter scheme that takes the 
multicast lists of the different macvlan devices into account (via a hash) 
would save those skb_clone() calls, wouldn't it?
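
Something along these lines, reusing the vlans/nvlans stand-ins from
the flood sketch earlier in the thread; macvlan_mc_wants() is
hypothetical, standing in for the per-device multicast lookup that
would have to be written:

/* Sketch only: skip the clone for devices that never joined the
 * group.
 */
static void macvlan_filtered_multicast(struct sk_buff *skb,
                                       struct net_device **vlans, int nvlans)
{
        int i;

        for (i = 0; i < nvlans; i++) {
                struct sk_buff *nskb;

                /* hypothetical per-device multicast hash lookup */
                if (!macvlan_mc_wants(vlans[i], eth_hdr(skb)->h_dest))
                        continue;       /* no clone, no netif_rx */
                nskb = skb_clone(skb, GFP_ATOMIC);
                if (!nskb)
                        continue;
                nskb->dev = vlans[i];
                netif_rx(nskb);
        }
}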

Or.



* Re: [evb] Re: [PATCH][RFC] net/bridge: add basic VEPA support
  2009-08-10 15:59           ` Fischer, Anna
@ 2009-08-10 16:16             ` Arnd Bergmann
  -1 siblings, 0 replies; 52+ messages in thread
From: Arnd Bergmann @ 2009-08-10 16:16 UTC (permalink / raw)
  To: Fischer, Anna
  Cc: evb, 'Stephen Hemminger',
	bridge, linux-kernel, netdev, virtualization, davem, kaber,
	adobriyan, 'Or Gerlitz', Paul Congdon (UC Davis)

On Monday 10 August 2009, Fischer, Anna wrote:
> On the VEPA filtering service side, the only change we have implemented
> in the bridging code is that in VEPA mode all frames are passed to the
> uplink on TX. However, frames are still passed through the netfilter 
> hooks before they go out on the wire. On the inbound path, there are
> no changes to the way frames are processed (except the filtering for
> the original source port), so netfilter hooks work in the same way
> as for a normal bridge.

Ah, interesting. I did not realize that the hooks were still active,
although that obviously makes sense. So that would be another
important difference between our implementations.

> If a frame is reflected back because of a hairpin turn, then of course
> the incoming port is the VEPA uplink port and not the port that
> originally sent the frame. So if you are trying to enforce some
> packet filtering on that inbound path, then you would have to do that
> based on MAC addresses and not on bridge ports. But I would assume that
> you would enforce the filtering already before you send out the frame
> to the adjacent bridge. Apart from that, if you enable your bridge to
> behave in VEPA mode, then you would typically do packet filtering etc
> on the adjacent bridge and not use the netfilter hook. You can still use
> both though, if you like.

Right, that was my point. The bridge in VEPA mode would likely be
configured to be completely ignorant of the data going through it
and not do any filtering, with all the filtering done on the
adjacent bridge.

I just wasn't sure that this is possible with ebtables if the
adjacent bridge is a Linux system with the bridge in hairpin turn
mode.

	Arnd <><


* RE: [evb] Re: [PATCH][RFC] net/bridge: add basic VEPA support
  2009-08-10 15:23       ` Arnd Bergmann
@ 2009-08-10 15:59           ` Fischer, Anna
  2009-08-10 15:59         ` Fischer, Anna
  1 sibling, 0 replies; 52+ messages in thread
From: Fischer, Anna @ 2009-08-10 15:59 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: evb, 'Stephen Hemminger',
	Fischer, Anna, bridge, linux-kernel, netdev, virtualization,
	davem, kaber, adobriyan, 'Or Gerlitz',
	Paul Congdon (UC Davis)

> Subject: Re: [PATCH][RFC] net/bridge: add basic VEPA support
> 
> On Friday 07 August 2009, Paul Congdon (UC Davis) wrote:
> >
> > I don't think your scheme works too well, because broadcast packets
> > coming from other interfaces on br0 would get replicated and sent
> > across the wire to ethB multiple times.
> 
> Right, that won't work. So the bridge patch for the hairpin turn
> is still the best solution. 

Yes, I think that we should separate the discussion of hairpin
mode on the adjacent bridge from that of the VEPA filtering service
residing within the end-station. The hairpin feature really has to
be implemented in the bridging code.


> Btw, how will that interact with
> the bridge-netfilter (ebtables) setup? Can you apply any filters
> that work on current bridges also between two VEPA ports while
> doing the hairpin turn?

The hairpin mode is implemented on the adjacent bridge. The only
difference between a hairpin mode port and a normal bridge port is
that it can send frames back out on the same port they arrived on.
All the netfilter hooks are still in place.
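
(As a concrete sketch, assuming the port attribute ends up being
exposed through sysfs under the name hairpin_mode:

    # let eth0, a port on the adjacent bridge, reflect frames
    # back out of the port they arrived on
    echo 1 > /sys/class/net/eth0/brport/hairpin_mode

The exact attribute name and path may differ in the final patch.)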

On the VEPA filtering service side, the only change we have implemented
in the bridging code is that in VEPA mode all frames are passed to the
uplink on TX. However, frames are still passed through the netfilter 
hooks before they go out on the wire. On the inbound path, there are
no changes to the way frames are processed (except the filtering for
the original source port), so netfilter hooks work in the same way
as for a normal bridge.

If a frame is reflected back because of a hairpin turn, then of course
the incoming port is the VEPA uplink port and not the port that
originally sent the frame. So if you are trying to enforce some
packet filtering on that inbound path, then you would have to do that
based on MAC addresses and not on bridge ports. But I would assume that
you would enforce the filtering already before you send out the frame
to the adjacent bridge. Apart from that, if you enable your bridge to
behave in VEPA mode, then you would typically do packet filtering etc
on the adjacent bridge and not use the netfilter hook. You can still use
both though, if you like.
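
To illustrate, such an inbound rule on the VEPA host would have to
match on the guests' MAC addresses (invented for this example),
since every reflected frame arrives on the uplink port:

    # frames reflected by the adjacent bridge all come in on the
    # uplink eth0, so distinguish guests by MAC, not by bridge port
    ebtables -A FORWARD -i eth0 \
        -s 52:54:00:aa:bb:01 -d 52:54:00:aa:bb:02 -j DROP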

Anna

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [evb] Re: [PATCH][RFC] net/bridge: add basic VEPA support
  2009-08-07 19:44     ` Paul Congdon (UC Davis)
  2009-08-10 15:23       ` Arnd Bergmann
@ 2009-08-10 15:23       ` Arnd Bergmann
  2009-08-10 15:59           ` Fischer, Anna
  2009-08-10 15:59         ` Fischer, Anna
  1 sibling, 2 replies; 52+ messages in thread
From: Arnd Bergmann @ 2009-08-10 15:23 UTC (permalink / raw)
  To: evb
  Cc: 'Stephen Hemminger', 'Fischer, Anna',
	bridge, linux-kernel, netdev, virtualization, davem, kaber,
	adobriyan, 'Or Gerlitz'

On Friday 07 August 2009, Paul Congdon (UC Davis) wrote:
> 
> I don't think your scheme works too well, because broadcast packets coming
> from other interfaces on br0 would get replicated and sent across the wire
> to ethB multiple times.

Right, that won't work. So the bridge patch for the hairpin turn
is still the best solution. Btw, how will that interact with
the bridge-netfilter (ebtables) setup? Can you apply any filters
that work on current bridges also between two VEPA ports while
doing the hairpin turn?

	Arnd <><

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [evb] Re: [PATCH][RFC] net/bridge: add basic VEPA support
  2009-08-07 11:29   ` Arnd Bergmann
  2009-08-07 19:44     ` [evb] " Paul Congdon (UC Davis)
@ 2009-08-07 19:44     ` Paul Congdon (UC Davis)
  2009-08-10 15:23       ` Arnd Bergmann
  2009-08-10 15:23       ` Arnd Bergmann
  1 sibling, 2 replies; 52+ messages in thread
From: Paul Congdon (UC Davis) @ 2009-08-07 19:44 UTC (permalink / raw)
  To: evb, 'Stephen Hemminger'
  Cc: 'Fischer, Anna',
	netdev, bridge, linux-kernel, virtualization,
	'Or Gerlitz',
	adobriyan, davem


[-- Attachment #1.1: Type: text/plain, Size: 862 bytes --]

Arnd,

I don't think your scheme works too well, because broadcast packets
coming from other interfaces on br0 would get replicated and sent
across the wire to ethB multiple times.

Paul

> That way you should be able to do something like:
>
>        Host A                                  Host B
>
>         /- nalvcam0 -\                   /- macvlan0 - 192.168.1.1
>    br0 -|             |- ethA === ethB -|
>         \- nalvcam1 -/                   \- macvlan1 - 192.168.1.2
>
> Now assuming that macvlan0 and macvlan1 are in different
> network namespaces or belong to different KVM guests, these
> guests would be able to communicate with each other through
> the bridge on host A, which can set the policy (using ebtables)
> for this communication and get interface statistics on its
> nalvcam interfaces. Also, instead of having the br0, Host A could
> assign IP addresses to the two nalvcam interfaces that host
> B has, and use IP forwarding between the guests of host B.
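
For reference, the Host B half of this picture can be approximated
with the existing macvlan driver (device names and addresses are the
ones from the diagram; attaching the devices to guests or namespaces
is left out):

    # Host B: multiplex two macvlan instances onto the ethB uplink
    ip link add link ethB name macvlan0 type macvlan
    ip link add link ethB name macvlan1 type macvlan
    ip addr add 192.168.1.1/24 dev macvlan0
    ip addr add 192.168.1.2/24 dev macvlan1
    ip link set macvlan0 up
    ip link set macvlan1 up

The nalvcam devices on Host A stand for the proposed reverse-macvlan
and do not exist as a driver today.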

^ permalink raw reply	[flat|nested] 52+ messages in thread

end of thread, other threads:[~2009-08-27 12:36 UTC | newest]

Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-08-07 20:35 [evb] RE: [PATCH][RFC] net/bridge: add basic VEPA support Yaron Haviv
2009-08-07 20:35 ` [Bridge] " Yaron Haviv
2009-08-07 20:35 ` Yaron Haviv
2009-08-07 21:00 ` Fischer, Anna
2009-08-07 21:00 ` Fischer, Anna
2009-08-07 21:00   ` [Bridge] " Fischer, Anna
2009-08-08  9:22   ` Arnd Bergmann
2009-08-08  9:22     ` [Bridge] " Arnd Bergmann
2009-08-08  9:22   ` Arnd Bergmann
2009-08-07 21:06 ` Paul Congdon (UC Davis)
2009-08-07 21:06 ` Paul Congdon (UC Davis)
2009-08-07 21:06   ` [Bridge] " Paul Congdon (UC Davis)
2009-08-07 21:36   ` Stephen Hemminger
2009-08-07 21:36     ` [Bridge] " Stephen Hemminger
2009-08-09 11:19     ` Or Gerlitz
2009-08-09 11:19       ` [Bridge] " Or Gerlitz
2009-08-10 15:20       ` Stephen Hemminger
2009-08-10 15:20         ` [Bridge] " Stephen Hemminger
2009-08-10 15:28         ` Arnd Bergmann
2009-08-10 15:28           ` [Bridge] " Arnd Bergmann
2009-08-10 16:32           ` Fischer, Anna
2009-08-10 16:32             ` [Bridge] " Fischer, Anna
2009-08-10 16:51             ` Stephen Hemminger
2009-08-10 16:51             ` Stephen Hemminger
2009-08-10 16:51               ` [Bridge] " Stephen Hemminger
2009-08-10 19:18               ` Arnd Bergmann
2009-08-10 19:18               ` Arnd Bergmann
2009-08-10 19:18                 ` [Bridge] " Arnd Bergmann
2009-08-10 16:32           ` Fischer, Anna
2009-08-10 15:28         ` Arnd Bergmann
2009-08-27 12:35         ` Or Gerlitz
2009-08-27 12:35         ` Or Gerlitz
2009-08-27 12:35           ` [Bridge] " Or Gerlitz
2009-08-10 15:20       ` Stephen Hemminger
2009-08-09 11:19     ` Or Gerlitz
2009-08-07 21:36   ` Stephen Hemminger
     [not found] ` <0199E0D51A61344794750DC57738F58E6D6A6CD803__29862.6656564467$1249679159$gmane$org@GVW1118EXC.americas.hpqcorp.net>
2009-08-08  8:50   ` Benny Amorsen
2009-08-08  8:50     ` [Bridge] " Benny Amorsen
2009-08-08  9:44     ` Arnd Bergmann
2009-08-08  9:44     ` Arnd Bergmann
2009-08-08  9:44       ` [Bridge] " Arnd Bergmann
2009-08-08  8:50   ` Benny Amorsen
  -- strict thread matches above, loose matches on Subject: below --
2009-06-15 17:33 Fischer, Anna
2009-08-07  4:00 ` Stephen Hemminger
2009-08-07 11:29   ` Arnd Bergmann
2009-08-07 19:44     ` [evb] " Paul Congdon (UC Davis)
2009-08-07 19:44     ` Paul Congdon (UC Davis)
2009-08-10 15:23       ` Arnd Bergmann
2009-08-10 15:23       ` Arnd Bergmann
2009-08-10 15:59         ` Fischer, Anna
2009-08-10 15:59           ` Fischer, Anna
2009-08-10 16:16           ` Arnd Bergmann
2009-08-10 16:16           ` Arnd Bergmann
2009-08-10 16:16             ` Arnd Bergmann
2009-08-10 15:59         ` Fischer, Anna
