From: Or Gerlitz <ogerlitz@mellanox.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
	<davem@davemloft.net>, <roland@kernel.org>,
	<netdev@vger.kernel.org>, <ali@mellanox.com>,
	<sean.hefty@intel.com>, Erez Shitrit <erezsh@mellanox.co.il>,
	Doug Ledford <dledford@redhat.com>
Subject: Re: [PATCH V2 09/12] net/eipoib: Add main driver functionality
Date: Tue, 14 Aug 2012 11:44:26 +0300
Message-ID: <502A0FEA.5050806@mellanox.com>
In-Reply-To: <20120812205457.GA14081@redhat.com>

On 12/08/2012 23:54, Michael S. Tsirkin wrote:
> On Sun, Aug 12, 2012 at 05:13:43PM +0300, Or Gerlitz wrote:
>> On 12/08/2012 16:55, Michael S. Tsirkin wrote:
>>> I didn't realize you do ARP snooping. Why? I know you mangle
>>> outgoing ARP packets,
>>
>> Maybe I wasn't accurate/clear, we do mangle outgoing/incoming ARP
>> packets, from/to Ethernet ARPs to/from IPoIB ARPs.
>>
>>
>>> this will go away if you maintain a mapping in SM accessible to all guests.
>>
>> guests don't interact with IB, I assume you referred to dom0 code, eIPoIB or
>> another driver in the host. But what mapping exactly?
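For reference, here is a minimal sketch of what mangling an outgoing Ethernet ARP into an IPoIB ARP involves. It makes these assumptions:

- The packets are IPv4 ARP.
- A hypothetical `mac_to_ipoib` table maps each 6-byte MAC to a 20-byte IPoIB hardware address. That address is 4 bytes of flags/QPN plus a 16-byte GID, with ARP hardware type 32, as in RFC 4391.
- The table and function names are illustrative and not taken from the eIPoIB patches.

The incoming direction would mirror this.

```python
import struct

ARPHRD_ETHER = 1        # ARP hardware type: Ethernet
ARPHRD_INFINIBAND = 32  # ARP hardware type: InfiniBand (RFC 4391)
ETH_ALEN = 6            # Ethernet hardware address length
IPOIB_ALEN = 20         # IPoIB hardware address: 4 bytes flags/QPN + 16-byte GID
ETH_P_IP = 0x0800


def eth_arp_to_ipoib(arp, mac_to_ipoib):
    """Rewrite an Ethernet/IPv4 ARP payload into an IPoIB/IPv4 ARP payload.

    `mac_to_ipoib` is a hypothetical lookup table (6-byte MAC -> 20-byte
    IPoIB address), not an eIPoIB API.
    """
    if len(arp) < 28:
        raise ValueError("truncated ARP packet")
    htype, ptype, hlen, plen, oper = struct.unpack("!HHBBH", arp[:8])
    if htype != ARPHRD_ETHER or hlen != ETH_ALEN or ptype != ETH_P_IP or plen != 4:
        raise ValueError("not an Ethernet/IPv4 ARP packet")
    sha, spa, tha, tpa = struct.unpack("!6s4s6s4s", arp[8:28])
    # The sender MAC must be known; raises KeyError otherwise.
    new_sha = mac_to_ipoib[sha]
    # Unknown target MACs (e.g. the all-zero target of an ARP request)
    # map to an all-zero IPoIB address.
    new_tha = mac_to_ipoib.get(tha, bytes(IPOIB_ALEN))
    header = struct.pack("!HHBBH", ARPHRD_INFINIBAND, ptype, IPOIB_ALEN, plen, oper)
    return header + new_sha + spa + new_tha + tpa
```

The protocol addresses pass through unchanged. Only the hardware type, the hardware address length and the two hardware addresses are rewritten.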
>
> Well we are getting into protocol design here.

Wait... reading your responses again, I realized that first and foremost we
have to (try to) agree on the problem statement before jumping to solutions.

AFAIU from your emails, you may think that we require the admin to assign
the VM a specific MAC derived from the LID/QPN of the IPoIB VIF serving it.
That is not the case; we don't. OTOH, it is true that the VM's source MAC
isn't sent on the wire: the Ethernet header is dropped on transmit, and on
the receiving side it is reconstructed from the LID/QPN the IB packet
arrived from (see below).
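The asymmetry described above can be sketched as follows: the Ethernet header is dropped on transmit and rebuilt on receive. The REMAC encoding used here is a hypothetical illustration of deriving a MAC from LID/QPN, not the exact scheme in the eIPoIB submission. It packs a locally administered unicast prefix byte, then the 16-bit LID, then the 24-bit QPN. The ethertype is assumed to travel in the 4-byte IPoIB encapsulation header.

```python
import struct

ETH_HLEN = 14  # dst MAC (6) + src MAC (6) + ethertype (2)


def tx_strip(frame):
    """Drop the Ethernet header; only the ethertype and payload go on the IB wire."""
    _dst, _src, ethertype = struct.unpack("!6s6sH", frame[:ETH_HLEN])
    return ethertype, frame[ETH_HLEN:]


def remac_from_lid_qpn(lid, qpn):
    """Hypothetical REMAC: 0x02 (locally administered, unicast) + LID + QPN."""
    if not (0 <= lid < 1 << 16 and 0 <= qpn < 1 << 24):
        raise ValueError("LID must fit in 16 bits and QPN in 24 bits")
    return bytes([0x02]) + lid.to_bytes(2, "big") + qpn.to_bytes(3, "big")


def rx_rebuild(payload, ethertype, lid, qpn, local_mac):
    """Rebuild an Ethernet frame for the receiving VM.

    The source MAC is derived from the sender's LID/QPN, because the
    sending VM's own MAC never crossed the wire.
    """
    src = remac_from_lid_qpn(lid, qpn)
    return local_mac + src + struct.pack("!H", ethertype) + payload
```

This is exactly why the receiving VM sees a REMAC rather than the sender's real MAC.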

In the current submission, this reconstruction of what we call the REMAC
(remote Ethernet MAC) is based on the LID/QPN. As I said earlier in this
thread, we are revisiting this approach, and your idea below sounds good:
whenever a VM is to be served by this eipoib instance, the eipoib driver can
register with the SA an IB "service record" entry mapping the LID/QPN to the
VM MAC, and remove the entry when the VM should no longer be served. This
would allow the receiving side to preserve, 1:1, the Ethernet MAC header
sent by the VMs.
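The register/remove lifecycle described above can be modelled with an in-memory stand-in for the SA. This is a sketch only. Real service records are registered through SA MADs, which are not modelled here. The class and method names are hypothetical.

```python
class ServiceRecordTable:
    """Hypothetical in-memory stand-in for SA service records: (LID, QPN) -> VM MAC."""

    def __init__(self):
        self._records = {}

    def register(self, lid, qpn, vm_mac):
        # Called when a VM starts being served by this eIPoIB instance.
        key = (lid, qpn)
        if key in self._records and self._records[key] != vm_mac:
            raise ValueError(f"LID/QPN {key} is already mapped to another MAC")
        self._records[key] = vm_mac

    def unregister(self, lid, qpn):
        # Called when the VM should no longer be served.
        self._records.pop((lid, qpn), None)

    def resolve(self, lid, qpn):
        # Receiving side: the sending VM's real MAC, or None if not registered.
        return self._records.get((lid, qpn))
```

With such a lookup, the receive path can use the resolved MAC as the source address instead of a LID/QPN-derived one.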



> So here's a sketch showing how you could build a protocol that does work.  But note it is not *exactly* IPoIB.

HOWEVER, this doesn't touch the IPoIB wire protocol, and hence on the wire
it IS exactly IPoIB. We only make use of your lovely suggestion to add this
SA assistance, so the change involves neither hardware/firmware nor the wire
protocol.

Or.


> It is I think close enough that you can use existing NIC hardware/firmware, which is why it differs slightly from what Eric described, and is more complex.  But it still shares the same property of no hacks, no packet snooping in driver, etc.
>
>
> And if you want to go that route, you really should talk to some IB
> protocol people to figure out what works, write a spec and try to
> standardize. lkml/netdev is not the right place.
>
> But since you asked, if I had to, I would probably try to
> do it like this:
>
> - Each device registers with the SA specifying the
>    mac address (+ vlan?), SA stores the translation from that
>    to IPoIB address.
> - alternatively, SA admin configures the translation statically
> - you get a packet with 6 byte mac address,
>    query the SA for a mapping to IPoIB address, strip
>    ethernet frame and send
> - multicast GID addresses can be similar, filled either when registering
>    for multicast or by SA admin
>
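The registration and transmit steps in the list above can be sketched as follows. The (MAC, VLAN) key follows the tentative "(+ vlan?)" in the list. `MacTranslation`, `parse_eth` and `transmit` are hypothetical names, and `send` stands in for the IPoIB transmit path.

```python
import struct

ETH_P_8021Q = 0x8100  # 802.1Q VLAN tag


class MacTranslation:
    """Hypothetical model of the SA-held table: (MAC, VLAN) -> IPoIB address."""

    def __init__(self):
        self._table = {}

    def register(self, mac, ipoib_addr, vlan=None):
        # Device registration, or a static entry configured by the SA admin.
        self._table[(mac, vlan)] = ipoib_addr

    def query(self, mac, vlan=None):
        return self._table.get((mac, vlan))


def parse_eth(frame):
    """Return (dst_mac, vlan_or_None, ethertype, payload) for plain or 802.1Q frames."""
    dst, _src, etype = struct.unpack("!6s6sH", frame[:14])
    if etype == ETH_P_8021Q:
        tci, etype = struct.unpack("!HH", frame[14:18])
        return dst, tci & 0x0FFF, etype, frame[18:]
    return dst, None, etype, frame[14:]


def transmit(frame, sa, send):
    """Look up the destination, strip the Ethernet framing, hand off to `send`."""
    dst, vlan, etype, payload = parse_eth(frame)
    ipoib_addr = sa.query(dst, vlan)
    if ipoib_addr is None:
        return False  # no translation known: drop the frame
    send(ipoib_addr, etype, payload)
    return True
```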
> I think it's possible that you could also convert a mac address to
> EUI-64 and prepend a prefix to get a legal GID. But maybe I'm missing
> something.  This could be handy for multicast.
>
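The MAC to EUI-64 to GID idea above can be checked quickly in code. This sketch uses the IPv6-style modified EUI-64 (RFC 4291, Appendix A): insert FF:FE in the middle and flip the universal/local bit. Whether that flip is wanted here is a design choice. The default 64-bit prefix is the IB default subnet prefix fe80::/64. A multicast scheme would substitute its own prefix, which is left as an assumption.

```python
import ipaddress


def mac_to_modified_eui64(mac):
    """MAC-48 -> modified EUI-64: insert FF:FE and flip the U/L bit (RFC 4291 App. A)."""
    if len(mac) != 6:
        raise ValueError("expected a 6-byte MAC")
    return bytes([mac[0] ^ 0x02]) + mac[1:3] + b"\xff\xfe" + mac[3:]


def mac_to_gid(mac, prefix=0xFE80000000000000):
    """Prepend a 64-bit prefix to the EUI-64 to form a 128-bit GID."""
    return ipaddress.IPv6Address(prefix.to_bytes(8, "big") + mac_to_modified_eui64(mac))
```

For example, the MAC 00:11:22:33:44:55 yields the GID fe80::211:22ff:fe33:4455.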
>
> In both cases:
> - SA could return GID that you then resolve to
>    LID using another query, or it could return LID so you save a roundtrip
> - results can be cached locally
> - SA can send updates when translation changes to flush this cache
>
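The three points above (GID-or-LID answers, a local cache, SA-driven flushes) can be combined in one small resolver. `sa_query` and `gid_to_lid` are hypothetical callables standing in for the SA round trips. In this sketch `sa_query` returns either `("lid", n)` or `("gid", g)`.

```python
class CachingResolver:
    """Caches MAC -> LID answers and drops entries when the SA reports a change."""

    def __init__(self, sa_query, gid_to_lid):
        self._sa_query = sa_query      # hypothetical: MAC -> ("lid", n) or ("gid", g)
        self._gid_to_lid = gid_to_lid  # hypothetical: second query, GID -> LID
        self._cache = {}
        self.round_trips = 0

    def resolve(self, mac):
        if mac in self._cache:
            return self._cache[mac]  # cache hit: no SA traffic
        kind, value = self._sa_query(mac)
        self.round_trips += 1
        if kind == "gid":
            value = self._gid_to_lid(value)  # extra round trip to obtain the LID
            self.round_trips += 1
        self._cache[mac] = value
        return value

    def on_sa_update(self, mac):
        # The SA notified us that the translation for `mac` changed.
        self._cache.pop(mac, None)
```

Returning the LID directly saves one round trip per miss. That is the trade-off the second point describes.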
>
> Now above means protocols such as ARP and DHCP use 48 bit addresses so
> you can not mix this new protocol with IPoIB.  Maybe IPoIB could simply
> ignore irrelevant packets, but it's best not to try, get a
> different all-broadcast group and CM ID instead to avoid confusion.
>
> One other interesting thing you can do is forward multicast
> registration data from the router, translate to mgid
> by the SA and do appropriate IB mcast registrations.
>
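For comparison with the SA-based translation suggested above, existing IPoIB derives multicast GIDs algorithmically from the IPv4 group. This is specified in RFC 4391 and implemented by `ip_ib_mc_map()` in the Linux kernel. The sketch below shows that standard mapping, not the SA scheme proposed here. It assumes the default P_Key 0xffff and link-local scope 2.

```python
import ipaddress

IPOIB_IPV4_SIGNATURE = 0x401B


def ipv4_mcast_to_mgid(group, pkey=0xFFFF, scope=0x2):
    """Map an IPv4 multicast group to its IPoIB MGID (RFC 4391 layout):
    0xFF | flags=1 | scope | 0x401B | P_Key | 48 zero bits | low 28 bits of group.
    """
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError("not an IPv4 multicast address")
    mgid = bytearray(16)
    mgid[0] = 0xFF
    mgid[1] = 0x10 | (scope & 0x0F)
    mgid[2:4] = IPOIB_IPV4_SIGNATURE.to_bytes(2, "big")
    mgid[4:6] = pkey.to_bytes(2, "big")
    # Bytes 6..11 stay zero.
    mgid[12:16] = (int(addr) & 0x0FFFFFFF).to_bytes(4, "big")
    return ipaddress.IPv6Address(bytes(mgid))
```

For example, 224.0.0.1 maps to ff12:401b:ffff::1.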
>
