From mboxrd@z Thu Jan 1 00:00:00 1970
From: Don Bollinger
Subject: Re: [PATCH] optoe: driver to read/write SFP/QSFP EEPROMs
Date: Thu, 21 Jun 2018 13:28:09 -0700
Message-ID: <20180621202809.enukvcdzahdon3lx@thebollingers.org>
References: <20180611042515.ml6zbcmz6dlvjmrp@thebollingers.org>
 <496e06b9-9f02-c4ae-4156-ab6221ba23fd@amd.com>
 <20180612181109.GD12251@lunn.ch>
 <20180615022652.t6oqpnwwvdmbooab@thebollingers.org>
 <20180615075417.GA28730@lunn.ch>
 <20180618194127.uaeqlo3dy35qs3ip@thebollingers.org>
 <20180619151510.GB26796@lunn.ch>
 <20180621052425.pa464laxjrebm5s3@thebollingers.org>
 <20180621081127.GA31742@lunn.ch>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: netdev, Florian Fainelli, don@thebollingers.org
To: Andrew Lunn
Return-path:
Received: from resqmta-po-12v.sys.comcast.net ([96.114.154.171]:50648
 "EHLO resqmta-po-12v.sys.comcast.net" rhost-flags-OK-OK-OK-OK)
 by vger.kernel.org with ESMTP id S933119AbeFUUiL (ORCPT );
 Thu, 21 Jun 2018 16:38:11 -0400
Content-Disposition: inline
In-Reply-To: <20180621081127.GA31742@lunn.ch>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Thu, Jun 21, 2018 at 10:11:27AM +0200, Andrew Lunn wrote:
> > I'm trying to figure out how the netdev environment works on large
> > switches. I can't imagine that the kernel network stack is involved in
> > any part of the data plane.
>
> Hi Don
>
> It is involved in the slow path. I.e. packets from the host out
> network ports. BPDU, IGMP, ARP, ND, etc. It can also be involved when
> the hardware is missing features. Also, for communication with the
> host itself.
>
> What is more important is it is the control plane. Want to bridge two
> ports together? You create a software bridge, add the two ports, and
> then offload it to the hardware. The kernel STP code in the software
> bridge then does the state tracking, blocked, learning, forwarding
> etc. Need RSTP? Just run the standard RSTP daemon on the software
> bridge interface.
>
> Basically, use the Linux network stack as is, and offload what you can
> to the hardware. That means you keep all your existing user space
> network tools, daemons, SNMP agents, etc. They all just work, because
> the kernel APIs remain the same, independent of if you have a switch,
> or just a bunch of networks cards.
>
> > Can you point me to any conference slides,
> > or design docs or other documentation that describes a netdev
> > implementation on Trident II switch silicon? Or any other switch that
> > supports 128 x 10G (or 32 x 40G) ports?
>
> Look at past netdev conference. In particular, talks given by
> Mellanox, Cumulus, and Netronome. You can also see there drivers in

Thanks. I found a slide deck from Cumulus at
www.slideshare.net/CumulusNetworks/webinarlinux-networking-is-awesome

I think this connects the dots between our worlds. It turns out that
optoe actually is derived from the Cumulus sff_8436 driver, which they use
to access QSFP devices. It was the best available, but had an experimental
implementation of SFP (didn't work yet). They actually use the at24 driver
for SFP.

Optoe is actually architecturally identical to the Cumulus implementation.
It does not use the SFP framework, but it does interface with their Linux
network stack via ethtool, etc.

In slide 10 of the deck they explicitly call out device drivers, saying
"Innovation and change here is good."

> drivers/ethernet/{mellonex|netronome}. These devices however tend to
> go for firmware to control the PHYs, not the Linux network stack.
> drivers/net/dsa covers SOHO switches, so up to 10 ports, mostly 1G,
> some 10G ports. There is a lot of industry involved thin this segment,
> trains, planes, busses, plant automation, etc, and some WiFi and STP.
> Switches with DSA drivers make use of Linux to control the PHYs, not
> firmware.

Again, optoe does not control the PHYs. It only accesses the EEPROMs (on
the PHYs). It does not touch any of the electrical pins.
It can provide the EEPROM access to any component that wants that access,
including sfp.c.

> SFP are also slowly starting to enter the consumer market, with
> products like the Clearfog, MACCHIATObin, and there are a few
> industrial boards with SOHO switches with SFP sockets or SFF
> modules. These are what are driving in kernel SFP support.

Got it. I'm targeting a different market, with a different architecture.
In this architecture it makes more sense to separate the EEPROM access from
the IO pins control.

> > Also, I see the sfp.c code. Is there any QSFP code? I'd like to see
> > how that code compares to the sfp.c code.
>
> Currently, none that i know of. SFP+ is the limit so far. Mainly
> because SoC/SOHO switches currently top out at 10G, so SFP+ is
> sufficient.

SFP+ is not sufficient for another market, which is using Linux to manage
larger switches. These switches all have some QSFP ports, many of them
have exclusively QSFP ports. I have some useful code for those
environments.

> > optoe can provide access, through the SFP code, to the rest of the EEPROM
> > address space. It can also provide access to the QSFP EEPROM. I would
> > like to collaborate on the integration, that would fit my objective of
> > making more EEPROM space accessible on more platforms and distros.
> >
> > However, you don't want me to make the changes to SFP myself. I don't
> > have any hardware or OS environment that currently supports that code.
> > The cost and learning curve exceed my resources. I *think* the changes
> > to the SFP code would be small, but I would need someone who understands
> > the code and can test it to actually make and submit the changes.
>
> So i have been thinking about this some more. But given your lack of
> resources, i'm guessing this won't actually work for you. But it is
> probably the correct way forwards.
>
> The basic problem the systems you are talking about is that they don't
> have a network interface per port.
> So they cannot use ethtool
> --module-info, which IS the defined API to get access to SFP
> data. Adding another API is probably not going to get accepted.

Got it. I don't think I'm adding another API. Note that Cumulus is using
the same architecture as optoe, and providing all the expected Linux
services, including ethtool --module-info. They are accessing the
module-info data through ioctl, which opens the device file provided by
their driver and reads/writes the appropriate location. Optoe works the
same way.

> However, the current ethtool API is ioctl based. The network stack is
> moving away from that, replacing it with netlink sockets. All IP
> configuration is now via netlink. All bridge configuration is by
> netlink, etc. So there is a desire to move ethtool to netlink.
>
> This move makes the API more flexible. By default, you would expect
> the replacement implementation for --module-info to pass an ifindex,
> or maybe an interface name. However, it could be made to optionally
> take an i2c bus number. That could then go direct to the SFP code,
> rather than go via the MAC driver. That would give evil user space,
> proprietary, binary blob drivers access to SFP via the standard
> supported kernel API, using the standard supported kernel SFP driver.

Here's what I have in mind...

struct sfp in sfp.c has a read and write:
    int (*read)(struct sfp *, bool, u8, void *, size_t);
    int (*write)(struct sfp *, bool, u8, void *, size_t);

These are instantiated with:
    sfp->read = sfp_i2c_read;
    sfp->write = sfp_i2c_write;

So to insert optoe into this stack, we would need to add an i2c_client
to struct sfp:
    struct i2c_client *i2c_client;

We would need to initialize that i2c_client in sfp_i2c_configure:
    board_info = alloc_board_info(sfp);
    sfp->i2c_client = i2c_new_device(i2c, board_info);

We need to write the brief routine that allocates a struct i2c_board_info,
and stuffs the necessary data into it. I'm assuming that data can come
from sfp.
The data required includes an appropriate name for this device,
and whether it is an SFP or QSFP device. (When QSFP is added to sfp.c, we
can add a flag to struct sfp. Your stack will have to know which it is
anyway to know where the necessary data is in the EEPROM.)

Finally, we replace the body of sfp_i2c_{read, write} with a callback
to optoe. All of the necessary data is already in the parameters to
sfp_i2c_{read, write}.

>
> But that requires you roll up your sleeves and get stuck in to this
> conversion work.

I'm offering an improvement to sfp.c. The improvement is access to more
pages of SFP EEPROM, and access to QSFP EEPROMs. It comes in the form of
a specialized EEPROM driver custom built for {Q}SFP devices. I'm also
offering to help integrate that driver into sfp.c. I can modify optoe
to accommodate sfp.c, and I can recommend how to instantiate and call it.

I am not going to be able to spend the time and money required to modify
and test sfp.c. I'm pretty sure you can do it MUCH faster, and MUCH better
than I can.

>
> But you say you work for a fibre module company. Do they produce
> SFP/SFP+ modules? You could get one of the supported consumer boards
> with an SFP/SFP+ socket and test your modules work properly. Build out

Unfortunately, that isn't going to happen on their dime. Their dimes are
running out for this kind of work.

> the SFP code. It has been on my TODO list to add HWMON support for the
> temperature sensors, etc.

Huh. Just read Documentation/hwmon/sysfs-interface. Looks like a good way
to deliver that EEPROM data. Wish I'd found that two years ago when
there were a few more dimes available.

Don
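
A minimal userspace sketch of the wiring described above, under stated
assumptions. It mocks struct sfp, struct i2c_client and the EEPROM in plain
C so the callback path can be exercised outside the kernel. optoe_read(),
optoe_write(), clamp_len() and struct mock_eeprom are hypothetical names
invented for illustration, not the actual optoe interface. The bool argument
is assumed to select the second I2C address (A2), and sfp_i2c_configure()
here takes the client directly instead of building a struct i2c_board_info
and calling i2c_new_device().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint8_t u8;

#define MOCK_BANK_SIZE 256

/* Hypothetical stand-in for the module EEPROM: one 256-byte bank per
 * I2C address (A0 and A2), as on an SFP. */
struct mock_eeprom {
    u8 a0[MOCK_BANK_SIZE];
    u8 a2[MOCK_BANK_SIZE];
};

/* Userspace mock of the kernel's struct i2c_client; only what the
 * sketch needs. */
struct i2c_client {
    struct mock_eeprom *hw;
    const char *name;
};

/* Mock of struct sfp: the two function pointers quoted in the mail plus
 * the proposed i2c_client member. */
struct sfp {
    struct i2c_client *i2c_client;
    int (*read)(struct sfp *, bool, u8, void *, size_t);
    int (*write)(struct sfp *, bool, u8, void *, size_t);
};

/* Clamp a transfer so it never runs past the end of a bank. */
static size_t clamp_len(u8 addr, size_t len)
{
    size_t room = MOCK_BANK_SIZE - (size_t)addr;

    return len < room ? len : room;
}

/* Hypothetical optoe entry point: copy bytes out of the selected bank.
 * Returns the number of bytes transferred. */
int optoe_read(struct i2c_client *client, bool a2, u8 addr,
               void *buf, size_t len)
{
    u8 *bank = a2 ? client->hw->a2 : client->hw->a0;

    len = clamp_len(addr, len);
    memcpy(buf, bank + addr, len);
    return (int)len;
}

/* Hypothetical optoe entry point: copy bytes into the selected bank. */
int optoe_write(struct i2c_client *client, bool a2, u8 addr,
                void *buf, size_t len)
{
    u8 *bank = a2 ? client->hw->a2 : client->hw->a0;

    len = clamp_len(addr, len);
    memcpy(bank + addr, buf, len);
    return (int)len;
}

/* sfp_i2c_{read,write} reduced to a call into optoe, as proposed. */
int sfp_i2c_read(struct sfp *sfp, bool a2, u8 dev_addr,
                 void *buf, size_t len)
{
    assert(sfp && sfp->i2c_client && buf);
    return optoe_read(sfp->i2c_client, a2, dev_addr, buf, len);
}

int sfp_i2c_write(struct sfp *sfp, bool a2, u8 dev_addr,
                  void *buf, size_t len)
{
    assert(sfp && sfp->i2c_client && buf);
    return optoe_write(sfp->i2c_client, a2, dev_addr, buf, len);
}

/* Simplified sfp_i2c_configure: store the client and hook up the
 * accessors. The real routine would create the client itself. */
void sfp_i2c_configure(struct sfp *sfp, struct i2c_client *client)
{
    sfp->i2c_client = client;
    sfp->read = sfp_i2c_read;
    sfp->write = sfp_i2c_write;
}
```

A caller would fill in a struct mock_eeprom, point a struct i2c_client at
it, call sfp_i2c_configure(), and then go through sfp->read and sfp->write
as the rest of sfp.c does today. The point of the sketch is only that the
existing function-pointer signature already carries everything a delegated
EEPROM driver needs.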