From: Andrew Lunn
Subject: Re: [PATCH] optoe: driver to read/write SFP/QSFP EEPROMs
Date: Thu, 21 Jun 2018 10:11:27 +0200
Message-ID: <20180621081127.GA31742@lunn.ch>
References: <20180611042515.ml6zbcmz6dlvjmrp@thebollingers.org> <496e06b9-9f02-c4ae-4156-ab6221ba23fd@amd.com> <20180612181109.GD12251@lunn.ch> <20180615022652.t6oqpnwwvdmbooab@thebollingers.org> <20180615075417.GA28730@lunn.ch> <20180618194127.uaeqlo3dy35qs3ip@thebollingers.org> <20180619151510.GB26796@lunn.ch> <20180621052425.pa464laxjrebm5s3@thebollingers.org>
In-Reply-To: <20180621052425.pa464laxjrebm5s3@thebollingers.org>
To: Don Bollinger
Cc: netdev, Florian Fainelli

> I'm trying to figure out how the netdev environment works on large
> switches. I can't imagine that the kernel network stack is involved in
> any part of the data plane.

Hi Don

It is involved in the slow path, i.e. packets from the host out of the
network ports: BPDU, IGMP, ARP, ND, etc. It can also be involved when
the hardware is missing features, and for communication with the host
itself.

What is more important is that it is the control plane. Want to bridge
two ports together? You create a software bridge, add the two ports,
and then offload it to the hardware. The kernel STP code in the
software bridge then does the state tracking: blocked, learning,
forwarding, etc. Need RSTP? Just run the standard RSTP daemon on the
software bridge interface.

Want to add a route between two ports? Just use ip route add; the route
then gets offloaded to the hardware. This means routing daemons like
FreeRangeRouting just work.

Want to combine two ports to make a trunk?
Use the bond/team driver, and offload it to the hardware.

Basically, use the Linux network stack as-is, and offload what you can
to the hardware. That means you keep all your existing user-space
network tools, daemons, SNMP agents, etc. They all just work, because
the kernel APIs remain the same, independent of whether you have a
switch or just a bunch of network cards.

> Can you point me to any conference slides,
> or design docs or other documentation that describes a netdev
> implementation on Trident II switch silicon? Or any other switch that
> supports 128 x 10G (or 32 x 40G) ports?

Look at past netdev conferences, in particular the talks given by
Mellanox, Cumulus, and Netronome. You can also look at their drivers in
drivers/net/ethernet/{mellanox|netronome}. These devices, however, tend
to use firmware to control the PHYs, not the Linux network stack.

drivers/net/dsa covers SOHO switches, so up to 10 ports, mostly 1G,
some 10G ports. There is a lot of industry involved in this segment:
trains, planes, buses, plant automation, etc., and some WiFi and STP.
Switches with DSA drivers make use of Linux to control the PHYs, not
firmware. SFP modules are also slowly starting to enter the consumer
market, with products like the Clearfog and MACCHIATObin, and there are
a few industrial boards with SOHO switches with SFP sockets or SFF
modules. These are what are driving in-kernel SFP support.

> Also, I see the sfp.c code. Is there any QSFP code? I'd like to see
> how that code compares to the sfp.c code.

Currently, none that I know of. SFP+ is the limit so far, mainly
because SoC/SOHO switches currently top out at 10G, so SFP+ is
sufficient.

> optoe can provide access, through the SFP code, to the rest of the
> EEPROM address space. It can also provide access to the QSFP EEPROM.
> I would like to collaborate on the integration, that would fit my
> objective of making more EEPROM space accessible on more platforms
> and distros.
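To make the control-plane flow above concrete, here is a minimal sketch
of the same operations with standard iproute2 commands. The swp0..swp4
port names and the addresses are placeholders, and actually offloading
to hardware requires root and a switchdev-capable driver:

```shell
# Bridge two front-panel ports; the switch driver offloads FDB/STP state.
ip link add name br0 type bridge stp_state 1
ip link set dev swp0 master br0
ip link set dev swp1 master br0
ip link set dev br0 up

# Routing: a plain "ip route add"; the driver offloads it to the
# hardware FIB where supported.
ip route add 192.0.2.0/24 via 198.51.100.1 dev swp2

# Trunking: combine two ports with the bond driver (ports must be down
# before being enslaved).
ip link add name bond0 type bond mode 802.3ad
ip link set dev swp3 down
ip link set dev swp3 master bond0
ip link set dev swp4 down
ip link set dev swp4 master bond0
ip link set dev bond0 up
```

The point of the sketch is that none of these commands are
switch-specific; the same invocations work on plain NICs, just without
the hardware offload.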
> However, you don't want me to make the changes to SFP myself. I don't
> have any hardware or OS environment that currently supports that code.
> The cost and learning curve exceed my resources. I *think* the changes
> to the SFP code would be small, but I would need someone who
> understands the code and can test it to actually make and submit the
> changes.

So I have been thinking about this some more. It is probably the
correct way forwards, but given your lack of resources, I'm guessing it
won't actually work for you.

The basic problem with the systems you are talking about is that they
don't have a network interface per port. So they cannot use ethtool
--module-info, which IS the defined API to get access to SFP data.
Adding another API is probably not going to get accepted.

However, the current ethtool API is ioctl based. The network stack is
moving away from that, replacing ioctls with netlink sockets. All IP
configuration is now via netlink, all bridge configuration is via
netlink, etc. So there is a desire to move ethtool to netlink as well.
This move makes the API more flexible. By default, you would expect the
replacement implementation for --module-info to pass an ifindex, or
maybe an interface name. However, it could be made to optionally take
an i2c bus number instead. That could then go direct to the SFP code,
rather than via the MAC driver. That would give evil user-space,
proprietary, binary-blob drivers access to SFP modules via the standard
supported kernel API, using the standard supported kernel SFP driver.
But that requires you to roll up your sleeves and get stuck into this
conversion work.

You say you work for a fibre module company. Do they produce SFP/SFP+
modules? You could get one of the supported consumer boards with an
SFP/SFP+ socket and test that your modules work properly, and build out
the SFP code. It has been on my TODO list to add HWMON support for the
temperature sensors, etc.

	Andrew
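For reference, the defined API discussed above looks like this from
user space today. eth0 and the i2c bus number are placeholders for
whatever a given board exposes, and the direct i2c reads need root and
i2c-tools installed:

```shell
# The defined API: dump module EEPROM via the netdev.
ethtool -m eth0
# Same, as a raw hex dump of the first 128 bytes (the SFF-8472 A0h page).
ethtool -m eth0 hex on offset 0 length 128

# Without a netdev per port, the EEPROM can instead be read straight
# off the i2c bus: SFP modules answer at i2c address 0x50 (0x51 for the
# diagnostics page). Bus 0 here is only an example; use "i2cdetect -l"
# to list the buses on your board.
i2cdump -y 0 0x50 b
```

The netlink conversion would essentially let the first form take an i2c
bus instead of a netdev, keeping the second, driver-bypassing form
unnecessary.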