From: "Michael S. Tsirkin" <mst@redhat.com>
To: Or Gerlitz <or.gerlitz@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
	Or Gerlitz <ogerlitz@mellanox.com>,
	davem@davemloft.net, roland@kernel.org, netdev@vger.kernel.org,
	ali@mellanox.com, sean.hefty@intel.com,
	Erez Shitrit <erezsh@mellanox.co.il>,
	Doug Ledford <dledford@redhat.com>
Subject: Re: [PATCH V2 09/12] net/eipoib: Add main driver functionality
Date: Sun, 12 Aug 2012 13:22:40 +0300
Message-ID: <20120812102240.GG1421@redhat.com>
In-Reply-To: <CAJZOPZKtdvxTGvrxj+T896mEexb=yN9s1cCuqUmhhzCOvUPEnA@mail.gmail.com>

On Wed, Aug 08, 2012 at 08:23:15AM +0300, Or Gerlitz wrote:
> On Sun, Aug 5, 2012 at 9:50 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> 
> [...]
> > So it seems that a sane solution would involve an extra level of
> > indirection, with guest addresses being translated to host IB addresses.
> > As long as you do this, maybe using an ethernet frame format makes sense.
> [...]
> 
> Yep, that's among the points we're trying to make, the way you've put
> it makes it clearer.
> 
> > So far the things that make sense. Here are some that don't, to me:
> 
> > - Is a pdf presentation all you have in terms of documentation?
> >   We are talking communication protocols here - I would expect a
> >   proper spec, and some effort to standardize, otherwise where's the
> >   guarantee it won't change in an incompatible way?
> 
> To be precise, the solution uses 100% IPoIB wire-protocol, so we don't
> see a need for any spec change / standardization effort.

Yes, I am guessing this is the real reason you pack LID/QPN
in the MAC - to make it all local. But it's a hack really,
and if you start storing it all in the SM you will need
to document the format so others can inter-operate.
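
To illustrate what "document the format" would pin down, here is a
minimal sketch of one possible LID/QPN-in-MAC layout. The layout is an
assumption made for illustration only, not the encoding the eipoib
patches actually use: one prefix octet with the locally-administered
bit set, then the 16-bit LID, then the 24-bit QPN, all big-endian. The
names pack_mac, unpack_mac and format_mac are hypothetical.

```python
import struct
from typing import Tuple

# Hypothetical prefix octet: locally administered, unicast.
LOCAL_ADMIN_PREFIX = 0x02


def pack_mac(lid: int, qpn: int) -> bytes:
    """Pack a 16-bit LID and a 24-bit QPN into 6 bytes (illustrative layout)."""
    if not 0 <= lid <= 0xFFFF:
        raise ValueError("LID must fit in 16 bits")
    if not 0 <= qpn <= 0xFFFFFF:
        raise ValueError("QPN must fit in 24 bits")
    return bytes([LOCAL_ADMIN_PREFIX]) + struct.pack(">H", lid) + qpn.to_bytes(3, "big")


def unpack_mac(mac: bytes) -> Tuple[int, int]:
    """Recover (lid, qpn) from an address built by pack_mac()."""
    if len(mac) != 6 or mac[0] != LOCAL_ADMIN_PREFIX:
        raise ValueError("not an address in this illustrative format")
    (lid,) = struct.unpack(">H", mac[1:3])
    qpn = int.from_bytes(mac[3:6], "big")
    return lid, qpn


def format_mac(mac: bytes) -> str:
    """Render the 6 bytes in the usual colon-separated hex form."""
    return ":".join(f"{b:02x}" for b in mac)
```

With this layout, format_mac(pack_mac(0x12, 0xABCD)) gives
"02:00:12:00:ab:cd". Byte order, prefix value and field positions are
exactly the kind of detail other implementations would need written
down before they could inter-operate.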

> This might go to the 1st point you've brought... improve the
> documentation, will do that. The pdf you looked at was presented in a
> conference.
> 
> >   Other things that I would expect to be addressed in such a spec is
> >   interaction with other IPoIB features, such as connected
> >   mode, checksum offloading etc, and IB features such as multipath etc.
> 
> For the eipoib interface, it doesn't really matter if the underlying
> ipoib clones used by it (we call them VIFs) use connected or datagram
> mode, what does matter is the MTU and offload features supported by
> these VIFs, for which the eipoib interface will have the min among all
> these VIFs. Since for a given eipoib nic, all its VIFs must originate
> from the same IPoIB PIF (e.g. ib0) it's an easy admin job to make sure
> they all have the same mtu / features which are needed for that eipoib
> nic, e.g. by using the same mode (connected/datagram for all of them),
> hope this is clear.
> 

Just pointing out all this needs to be documented.
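
The "min among all these VIFs" rule quoted above can be sketched as
follows. This is a userspace model under stated assumptions, not driver
code: the Vif class, the feature names and effective_settings() are
hypothetical, and "min" for offload features is read here as the set of
features every VIF supports.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Tuple


@dataclass(frozen=True)
class Vif:
    """Hypothetical stand-in for one IPoIB clone enslaved by an eipoib nic."""
    name: str
    mtu: int
    features: FrozenSet[str]


def effective_settings(vifs: Iterable[Vif]) -> Tuple[int, FrozenSet[str]]:
    """Return (mtu, features) that the eipoib interface could advertise."""
    vifs = list(vifs)
    if not vifs:
        raise ValueError("an eipoib interface needs at least one VIF")
    # The smallest MTU wins, and only features common to all VIFs survive.
    mtu = min(v.mtu for v in vifs)
    features = frozenset.intersection(*(v.features for v in vifs))
    return mtu, features
```

For example, mixing a VIF with MTU 2044 and features {"sg", "csum"}
with one with MTU 65520 and features {"sg"} yields MTU 2044 and
features {"sg"}, which is why keeping all VIFs of one eipoib nic in the
same mode avoids silently losing MTU or offloads.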

> > - The way you encode LID/QPN in the MAC seems questionable. IIRC there's
> >   more to IB addressing than just the LID.  Since everyone on the subnet
> >   needs access to this translation, I think it makes sense to store it in
> >   the SM. I think this would also obviate some IPv4 specific hacks in kernel.
> 
> The idea behind the encoding was uniqueness, LID/QPN is unique per IB
> HCA end-node.

But then it breaks with VM migration, IB failover, softmac setting in
guest, probably more?
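
A minimal sketch of the extra level of indirection discussed earlier in
the thread: the guest keeps a stable MAC, and a table maps it to
whatever host IB address currently serves it. Everything here is
hypothetical (TranslationTable, IbAddress, the use of LID/QPN alone as
the host address); where such a table would live, for example in the
SM, is outside the scope of the sketch.

```python
from typing import Dict, NamedTuple


class IbAddress(NamedTuple):
    """Simplified host-side IB address; real IB addressing has more to it."""
    lid: int
    qpn: int


class TranslationTable:
    """Maps a stable guest MAC to the host IB address currently serving it."""

    def __init__(self) -> None:
        self._map: Dict[str, IbAddress] = {}

    def register(self, guest_mac: str, addr: IbAddress) -> None:
        # Called when a guest appears on a host, including after migration.
        self._map[guest_mac.lower()] = addr

    def resolve(self, guest_mac: str) -> IbAddress:
        # Raises KeyError for an unknown guest MAC.
        return self._map[guest_mac.lower()]
```

With this shape, migration is a single register() call with the new
host address; the guest MAC never changes, so a softmac set inside the
guest keeps working as well.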

> I'm not sure I understood the comment re the IPv4 hacks.

This refers to the ARP hack that you use to fix
VM migration.

> > - IGMP/MAC snooping in a driver is just too hairy.
> 
> mmm, any rough idea/direction how to do that otherwise?

Sure, even two ways, ideally you'd do both :)
A. fix macvtap
1. Use netdev_for_each_mc_addr etc to get multicast addresses
2. teach macvtap to fill that in (it currently floods multicasts
   for guest to guest communication so we need to fix it anyway)

B. fix bridge
   teach bridge to work for VMs without using promisc mode
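
To make option A concrete, here is a small userspace model of the
difference between flooding multicasts and delivering by per-port
multicast lists. It is only a model: in the kernel the lists would come
from netdev_for_each_mc_addr, while the port names and the functions
deliver_ports() and flood_ports() below are hypothetical.

```python
from typing import Dict, List, Set


def is_multicast(mac: bytes) -> bool:
    """An Ethernet address is multicast when the low bit of octet 0 is set."""
    return bool(mac[0] & 0x01)


def deliver_ports(dst: bytes, mc_filters: Dict[str, Set[bytes]]) -> List[str]:
    """Ports whose multicast list contains dst (filtered delivery)."""
    return sorted(port for port, addrs in mc_filters.items() if dst in addrs)


def flood_ports(mc_filters: Dict[str, Set[bytes]]) -> List[str]:
    """Every port, regardless of its multicast list (flooding)."""
    return sorted(mc_filters)
```

Once each port's multicast list is known, the lower device can be
programmed with the union of those lists, so neither IGMP snooping in
the driver nor promisc mode is needed to find out which groups matter.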


> >   As you point out, bridge currently needs the uplink in promisc mode.
> >   I don't think a driver should work around that limitation.
> >   For some setups, it might be interesting to remove the
> >   promisc mode requirement, failing that,
> >   I think you could use macvtap passthrough.
> 
> That's in the plans, the current code doesn't assume that the eipoib
> has bridge on top, for VM networking it works with bridge + tap,
> bridge + macvtap, but it would easily work with passthrough when we
> allow to create multiple eipoib interfaces on the same ipoib PIF (e.g
> today for the ib0 PIF we create eipoib eth0, and then two VIFs ib0.1
> and ib0.2 that are enslaved by eth0, but next we will create eth1 and
> eth2 which will use ib0.1 and ib0.2 respectively).

The whole promisc mode emulation is there for the bridge, no?
Since you don't support promisc, ideally we'd check a hardware
capability and fail gracefully, though naturally this is not top
priority.

> > - Currently migration works without host kernel help, would be
> >   preferable to keep it that way.
> 
> OK
