From: Or Gerlitz <or.gerlitz@gmail.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Ali Ayoub <ali@mellanox.com>, David Miller <davem@davemloft.net>,
	ogerlitz@mellanox.com, roland@kernel.org, netdev@vger.kernel.org,
	sean.hefty@intel.com, erezsh@mellanox.co.il, dledford@redhat.com,
	"Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [PATCH V2 09/12] net/eipoib: Add main driver functionality
Date: Thu, 9 Aug 2012 07:34:23 +0300
Message-ID: <CAJZOPZ+JKZAxF-SaWxCd_8pLqhrLXrPyQHEo0n-gNzuvMOA02w@mail.gmail.com>
In-Reply-To: <87pq714uaa.fsf@xmission.com>

Eric W. Biederman <ebiederm@xmission.com> wrote:
> Or Gerlitz <or.gerlitz@gmail.com> writes:

>> As you wrote here, when performance and ease of use are in the spotlight,
>> VM deployments tend not to use routing.
>> This is because it involves more overhead on packet forwarding and
>> more administration work, for example, setting routing rules that
>> involve the VM IP address, something which AFAIK the hypervisor has
>> no clue about. Also, it's unclear to me if/how live migration can work in such a setting.

> All you need to make proxy-arp essentially pain free is a smart dhcp
> relay, that sets up the routes.

So a DHCP relay might help avoid some of the pain; however,
can you elaborate on if/how live migration is supported under this scheme?

>> For this exact reason, there's a bunch of use cases by tools and
>> cloud stacks (such as OpenStack, oVirt, and more) which do use bridged
>> mode and the rest of the Ethernet envelope, such as virtual L2
>> VLAN domains, ebtables-based rules, etc. These are not
>> applicable to IPoIB, but work fine with eIPoIB.

> Yes I am certain all of their IPv6 traffic works fine.

I'm not sure I follow this comment.

> Regardless, those are open source projects and can be modified to
> cleanly support InfiniBand.

Open source code can be modified indeed, but since exposing the IPoIB link
layer to tools/emulators and VMs doesn't really make sense (see below),
we brought this approach as the way to allow people to use
bridging mode when the fabric is IB.

>> You mentioned that bridging mode doesn't apply to environments such as
>> 802.11, and hence routing mode is used. We are trying to make the point
>> here that bridging mode applies to IPoIB with the approach suggested by eIPoIB.

> You are completely failing.  Every time I look I see something about
> eIPoIB that is even more broken.  Given that eIPoIB is a NAT
> implementation that isn't really a surprise but still.

> eIPoIB imposes enough overhead that I expect that routing is cheaper,
> so your performance advantages go right out the window.
>
> eIPoIB is seriously incompatible with ethernet breaking almost
> everything and barely allowing IPv4 to work.

I don't agree with this incompatibility statement. You had a claim
about DHCP and I addressed it. Beyond that, you don't like the eIPoIB
basic idea/design, but that can't be the basis for an incompatibility argument.


>> Also, if we extend the discussion a bit, there are two more aspects to throw in:
>> The first is the performance thing we have already started to mention
>> -- specifically, the approach for RX zero copy (into the VM buffer)
>> uses designs such as vhost + macvtap NIC in passthrough mode, which is
>> likely to be set over a per-VM hypervisor NIC, such as the ones
>> provided by the VMDQ patches John Fastabend started to post (see
>> http://marc.info/?l=linux-netdev&m=134264998405581&w=2) -- the ib0.N
>> clone children are IPoIB VMDQ NICs if you like, and by setting an eIPoIB NIC
>> on top of each, they can be plugged into that design.

> If you care about performance link-layer NAT is not the way to go.
> Teach the pieces you care about how to talk infiniband.
>
>> The 2nd aspect is non-VM environments where a NIC with Ethernet look
>> and feel is required for IP traffic, but this has to live within an
>> ecosystem that fully uses IPoIB.
>> In other words, a use case where IPoIB has to be under the covers for
>> a set of specific apps or nodes, which still do IP interaction with other
>> apps/nodes and gateways that use IPoIB. The eIPoIB driver provides that
>> functionality.


> ip link add type dummy.

> There now you have an interface with ethernet look and feel, and
> routing can happily avoid it.

Again, I'm not sure I follow. You mean "routing can happily use it", correct? That
is, do routing between the dummy interface and the IPoIB interface?


>> So, to sum up, routing / proxy-arp seems to be off from what we are targeting.

> My condolences.
> The existence of router / proxy-arp means that solutions do exist
> (unlike your previous claim) you just don't like the idea of deploying them.

We don't like them for a set of reasons, which we are covering here. Regarding
manageability, it still needs to be clarified if/how live migration works and what
it means to always mandate a DHCP relay.

> Infiniband is standard enough you could quite easily implement virtual
> infiniband bridging as an alternative to ethernet bridging.

Not really. As Michael indicated in his response in this thread
http://marc.info/?l=linux-netdev&m=134419288218373&w=2
IPoIB link layer addresses use IB HW constructs for which soft
hardware address setting isn't supported, and this interferes
with live migration.

Or.
