From: Andrew Lunn
Subject: Re: [PATCH] optoe: driver to read/write SFP/QSFP EEPROMs
Date: Thu, 21 Jun 2018 10:11:27 +0200
Message-ID: <20180621081127.GA31742@lunn.ch>
References: <20180611042515.ml6zbcmz6dlvjmrp@thebollingers.org> <496e06b9-9f02-c4ae-4156-ab6221ba23fd@amd.com> <20180612181109.GD12251@lunn.ch> <20180615022652.t6oqpnwwvdmbooab@thebollingers.org> <20180615075417.GA28730@lunn.ch> <20180618194127.uaeqlo3dy35qs3ip@thebollingers.org> <20180619151510.GB26796@lunn.ch> <20180621052425.pa464laxjrebm5s3@thebollingers.org>
In-Reply-To: <20180621052425.pa464laxjrebm5s3@thebollingers.org>
To: Don Bollinger
Cc: netdev, Florian Fainelli

> I'm trying to figure out how the netdev environment works on large
> switches. I can't imagine that the kernel network stack is involved in
> any part of the data plane.

Hi Don

It is involved in the slow path, i.e. packets from the host out of the
network ports: BPDU, IGMP, ARP, ND, etc. It can also be involved when
the hardware is missing features, and for communication with the host
itself.

What is more important is that it is the control plane. Want to bridge
two ports together? You create a software bridge, add the two ports,
and then offload it to the hardware. The kernel STP code in the
software bridge then does the state tracking: blocked, learning,
forwarding, etc. Need RSTP? Just run the standard RSTP daemon on the
software bridge interface.

Want to add a route between two ports? Just use ip route add; the route
then gets offloaded to the hardware. This means routing daemons like
FreeRangeRouting just work.

Want to combine two ports to make a trunk?
Use the bond/team driver, and offload it to the hardware.

Basically, use the Linux network stack as-is, and offload what you can
to the hardware. That means you keep all your existing user-space
network tools, daemons, SNMP agents, etc. They all just work, because
the kernel APIs remain the same, independent of whether you have a
switch or just a bunch of network cards.

> Can you point me to any conference slides,
> or design docs or other documentation that describes a netdev
> implementation on Trident II switch silicon? Or any other switch that
> supports 128 x 10G (or 32 x 40G) ports?

Look at past netdev conferences, in particular the talks given by
Mellanox, Cumulus, and Netronome. You can also look at their drivers in
drivers/net/ethernet/{mellanox|netronome}. These devices, however, tend
to use firmware to control the PHYs, not the Linux network stack.

drivers/net/dsa covers SOHO switches, so up to 10 ports, mostly 1G,
some 10G ports. There is a lot of industry involved in this segment:
trains, planes, buses, plant automation, etc., and some WiFi and STP.
Switches with DSA drivers make use of Linux to control the PHYs, not
firmware. SFP modules are also slowly starting to enter the consumer
market, with products like the Clearfog and MACCHIATObin, and there are
a few industrial boards with SOHO switches with SFP sockets or SFF
modules. These are what are driving in-kernel SFP support.

> Also, I see the sfp.c code. Is there any QSFP code? I'd like to see
> how that code compares to the sfp.c code.

Currently, none that I know of. SFP+ is the limit so far, mainly
because SoC/SOHO switches currently top out at 10G, so SFP+ is
sufficient.

> optoe can provide access, through the SFP code, to the rest of the
> EEPROM address space. It can also provide access to the QSFP EEPROM.
> I would like to collaborate on the integration, that would fit my
> objective of making more EEPROM space accessible on more platforms
> and distros.
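To make the control-plane flow above concrete, here is a minimal sketch
of the same operations with standard iproute2 commands. The swp0..swp4
port names and the addresses are placeholders, and actually offloading
to hardware requires root and a switchdev-capable driver:

```shell
# Bridge two front-panel ports; the switch driver offloads FDB/STP state.
ip link add name br0 type bridge stp_state 1
ip link set dev swp0 master br0
ip link set dev swp1 master br0
ip link set dev br0 up

# Routing: a plain "ip route add"; the driver offloads it to the
# hardware FIB where supported.
ip route add 192.0.2.0/24 via 198.51.100.1 dev swp2

# Trunking: combine two ports with the bond driver (ports must be down
# before being enslaved).
ip link add name bond0 type bond mode 802.3ad
ip link set dev swp3 down
ip link set dev swp3 master bond0
ip link set dev swp4 down
ip link set dev swp4 master bond0
ip link set dev bond0 up
```

The point of the sketch is that none of these commands are
switch-specific; the same invocations work on plain NICs, just without
the hardware offload.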
> However, you don't want me to make the changes to SFP myself. I don't
> have any hardware or OS environment that currently supports that code.
> The cost and learning curve exceed my resources. I *think* the changes
> to the SFP code would be small, but I would need someone who
> understands the code and can test it to actually make and submit the
> changes.

So I have been thinking about this some more. It is probably the
correct way forwards, but given your lack of resources, I'm guessing it
won't actually work for you.

The basic problem with the systems you are talking about is that they
don't have a network interface per port. So they cannot use ethtool
--module-info, which IS the defined API to get access to SFP data.
Adding another API is probably not going to get accepted.

However, the current ethtool API is ioctl based. The network stack is
moving away from that, replacing ioctls with netlink sockets. All IP
configuration is now via netlink, all bridge configuration is via
netlink, etc. So there is a desire to move ethtool to netlink as well.
This move makes the API more flexible. By default, you would expect the
replacement implementation for --module-info to pass an ifindex, or
maybe an interface name. However, it could be made to optionally take
an i2c bus number instead. That could then go direct to the SFP code,
rather than via the MAC driver. That would give evil user-space,
proprietary, binary-blob drivers access to SFP modules via the standard
supported kernel API, using the standard supported kernel SFP driver.
But that requires you to roll up your sleeves and get stuck into this
conversion work.

You say you work for a fibre module company. Do they produce SFP/SFP+
modules? You could get one of the supported consumer boards with an
SFP/SFP+ socket and test that your modules work properly, and build out
the SFP code. It has been on my TODO list to add HWMON support for the
temperature sensors, etc.

	Andrew
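For reference, the defined API discussed above looks like this from
user space today. eth0 and the i2c bus number are placeholders for
whatever a given board exposes, and the direct i2c reads need root and
i2c-tools installed:

```shell
# The defined API: dump module EEPROM via the netdev.
ethtool -m eth0
# Same, as a raw hex dump of the first 128 bytes (the SFF-8472 A0h page).
ethtool -m eth0 hex on offset 0 length 128

# Without a netdev per port, the EEPROM can instead be read straight
# off the i2c bus: SFP modules answer at i2c address 0x50 (0x51 for the
# diagnostics page). Bus 0 here is only an example; use "i2cdetect -l"
# to list the buses on your board.
i2cdump -y 0 0x50 b
```

The netlink conversion would essentially let the first form take an i2c
bus instead of a netdev, keeping the second, driver-bypassing form
unnecessary.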