From: Saeed Mahameed <saeedm@dev.mellanox.co.il>
To: Alexei Starovoitov <ast@fb.com>
Cc: Tom Herbert <tom@herbertland.com>,
	Or Gerlitz <gerlitz.or@gmail.com>,
	Saeed Mahameed <saeedm@mellanox.com>,
	David Miller <davem@davemloft.net>,
	Linux Netdev List <netdev@vger.kernel.org>,
	Kernel Team <kernel-team@fb.com>
Subject: Re: [PATCH net-next 1/4] mlx5: Make building eswitch configurable
Date: Sun, 29 Jan 2017 11:11:40 +0200
Message-ID: <CALzJLG9OksXW5JAZiPoSfKtTF_z8Te9UDP9bm_XhXAp-FwmwEw@mail.gmail.com>
In-Reply-To: <588CDA55.7030900@fb.com>

On Sat, Jan 28, 2017 at 7:52 PM, Alexei Starovoitov <ast@fb.com> wrote:
> On 1/28/17 3:20 AM, Saeed Mahameed wrote:
>>
>> On Sat, Jan 28, 2017 at 1:23 AM, Alexei Starovoitov <ast@fb.com> wrote:
>>>
>>> On 1/27/17 1:15 PM, Saeed Mahameed wrote:
>>>>
>>>>
>>>> It is only mandatory for configurations that needs eswitch, where the
>>>> driver has no way to know about them, for a good old bare metal box,
>>>> eswitch is not needed.
>>>>
>>>> we can do some work to strip the l2 table logic - needed for PFs to
>>>> work on multi-host - out of eswitch but again that would further
>>>> complicate the driver code since eswitch will still need to update l2
>>>> tables for VFs.
>>>
>>>
>>>
>>> Saeed,
>>> for multi-host setups every host in that multi-host doesn't
>>> actually see the eswitch, no? Otherwise broken driver on one machine
>>> can affect the other hosts in the same bundle? Please double check,
>>
>>
>> each host (PF) has its own eswitch, and each eswitch lives in its own
>> "steering-space"
>>   and it can't affect others.
>>
>>> since this is absolutely critical HW requirement.
>>>
>>
>> The only shared HW resource between hosts (PFs) is the simple l2 table,
>> and the only thing a host can ask from the l2 table (FW) is: "forward
>> UC MAC to me", and it is the responsibility of the driver eswitch
>> to do so.
>>
>> the l2 table is created and managed by FW, SW eswitch can only request
>> from FW, and the FW is trusted.
>
>
> ok. clear. thanks for explaining.
> Could you describe the sequence of function calls within mlx5
> that does the assignment of uc mac for PF ?
> since I'm missing where eswitch is involved.
> I can see:
> mlx5e_nic_enable | mlx5e_set_mac
>   queue_work(priv->wq, &priv->set_rx_mode_work);
>     mlx5e_set_rx_mode_work
>       mlx5e_apply_netdev_addr
>         mlx5e_add_l2_flow_rule
>

It is a little bit more complicated than this :).

ConnectX4/5 (and hopefully later generations) provide three different
isolated steering layers:

3. vport layer: available to any PF/VF vport NIC driver instance
(netdevice). It allows vlan/mac filtering, RSS hashing, n-tuple steering
(for both encapsulated and non-encapsulated traffic) and RFS steering.
The code above only writes flow entries of a PF/VF to its own vport flow
tables; there is another mechanism to propagate l2 steering rules down
to the eswitch from the vport layer.

2. eswitch layer: available for PFs only, with the
HCA_CAP.vport_group_manager capability set.
It allows steering between the PF and the different VFs on the same host
(vlan/mac steering and ACL filters in SRIOV legacy mode, and fancy
n-tuple steering and offloads in switchdev mode, see eswitch_offloads.c).
If this table is not created, the default is to pass all traffic through
to the PF.

1. L2 table: available for PFs only, with the HCA_CAP.vport_group_manager
capability set.
It is needed for MH configurations. Only the PF is allowed to write to
it, and the PF should write "request UC MAC" (set_l2_table_entry) on
behalf of the PF itself and its own VFs.

- On a bare metal machine only layer 3 is required (all traffic is
passed to the PF vport).
- On an MH configuration layers 3 and 1 are required.
- On an SRIOV configuration layers 3 and 2 are required.
- On MH with SRIOV all layers are required.
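
The list above can be sketched as a small lookup. This is only an
illustration of the rule, not driver code: the enum and the function
name required_layers() are hypothetical and do not exist in mlx5.

```c
#include <stdbool.h>

/* Steering layers, numbered as in this mail. Names are illustrative only. */
enum steering_layer {
	LAYER_L2_TABLE = 1 << 0, /* layer 1: L2 table shared between hosts, owned by FW */
	LAYER_ESWITCH  = 1 << 1, /* layer 2: per-PF eswitch, used for SRIOV */
	LAYER_VPORT    = 1 << 2, /* layer 3: per-vport NIC steering */
};

/* Return the bitmask of steering layers a given configuration needs. */
static unsigned int required_layers(bool multi_host, bool sriov)
{
	unsigned int layers = LAYER_VPORT; /* needed by every configuration */

	if (multi_host)
		layers |= LAYER_L2_TABLE;
	if (sriov)
		layers |= LAYER_ESWITCH;
	return layers;
}
```

For example, required_layers(true, false) yields the vport layer plus the
L2 table, which is the plain MH case.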

In the driver, the eswitch and L2 layers are both handled by the PF, in
eswitch.c.

So for your question:

The PF always calls init_eswitch (no eswitch SRIOV tables are created at
that point), and the eswitch starts listening for vport_change_events.

Any PF/VF netdev vport instance should call
mlx5e_vport_context_update [1] on any steering change.

vport_context_update is a FW command that stores the current
UC/MC/VLAN list and promiscuity info of a vport.

The FW then generates an event to the PF driver eswitch manager (vport
manager), handled by mlx5_eswitch_vport_event [2], and the PF eswitch
calls set_l2_table_entry for each UC MAC on each vport change event of
any vport (including its own vport). If SRIOV is enabled it will update
the eswitch tables as well.

To simplify my answer, the function calls are:

Vport VF/PF netdevice:
mlx5e_set_rx_mode_work
    mlx5e_vport_context_update
        mlx5e_vport_context_update_addr_list
            --> a FW event will be generated to the PF eswitch manager

PF eswitch manager (eswitch.c) on a vport change FW event:
mlx5_eswitch_vport_event
    esw_vport_change_handler
        esw_vport_change_handle_locked
            esw_apply_vport_addr_list
                esw_add_uc_addr
                    set_l2_table_entry
                        --> this will update the l2 table in case MH is enabled
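
A toy user-space model of the two halves of this flow may help. It is a
sketch under loud assumptions: every toy_* name is hypothetical, the FW
event is modelled as a direct function call, and only adding UC MACs is
covered (no removal, MC, VLAN or promiscuity handling). It is not the
mlx5 implementation.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_VPORTS 4
#define TOY_MAX_UC     4
#define TOY_L2_SIZE    16

struct toy_vport {
	uint64_t uc[TOY_MAX_UC]; /* UC MACs stored in the vport context */
	int n_uc;
};

struct toy_nic {
	struct toy_vport vports[TOY_MAX_VPORTS]; /* vport 0 stands for the PF */
	bool multi_host;                         /* MH configuration? */
	uint64_t l2_table[TOY_L2_SIZE];          /* stands in for the FW-owned L2 table */
	int n_l2;
};

/* PF side: stands in for set_l2_table_entry, i.e. "forward this UC MAC to me". */
static void toy_set_l2_table_entry(struct toy_nic *nic, uint64_t mac)
{
	if (!nic->multi_host)
		return; /* assumption: the L2 table only matters on MH */
	for (int i = 0; i < nic->n_l2; i++)
		if (nic->l2_table[i] == mac)
			return; /* already requested */
	if (nic->n_l2 < TOY_L2_SIZE)
		nic->l2_table[nic->n_l2++] = mac;
}

/* PF side: stands in for the vport change handler chain in eswitch.c. */
static void toy_esw_vport_change(struct toy_nic *nic, int vport)
{
	const struct toy_vport *vp = &nic->vports[vport];

	for (int i = 0; i < vp->n_uc; i++)
		toy_set_l2_table_entry(nic, vp->uc[i]);
}

/* Any vport (PF or VF): stands in for the vport context update FW command. */
static int toy_vport_context_update(struct toy_nic *nic, int vport,
				    const uint64_t *uc, int n_uc)
{
	if (vport < 0 || vport >= TOY_MAX_VPORTS || n_uc < 0 || n_uc > TOY_MAX_UC)
		return -1;
	if (n_uc > 0)
		memcpy(nic->vports[vport].uc, uc, (size_t)n_uc * sizeof(*uc));
	nic->vports[vport].n_uc = n_uc;
	toy_esw_vport_change(nic, vport); /* models the FW event delivered to the PF */
	return 0;
}
```

The point of the model is that a VF never touches the L2 table itself:
it only updates its own vport context, and the PF eswitch manager reacts
to the resulting event.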

Sorry for the long answer :)
-Saeed

[1] http://lxr.free-electrons.com/source/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c#L440
[2] http://lxr.free-electrons.com/source/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c#L1675
