netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
       [not found] <20210215193821.3345-1-don@thebollingers.org>
@ 2021-02-26 22:34 ` Andrew Lunn
  2021-02-27  2:46   ` Don Bollinger
  0 siblings, 1 reply; 30+ messages in thread
From: Andrew Lunn @ 2021-02-26 22:34 UTC (permalink / raw)
  To: Don Bollinger
  Cc: arndb, gregkh, linux-kernel, brandon_chuang, wally_wang,
	aken_liu, gulv, jolevequ, xinxliu, netdev

On Mon, Feb 15, 2021 at 11:38:21AM -0800, Don Bollinger wrote:
> optoe is an i2c based driver that supports read/write access to all
> the pages (tables) of MSA standard SFP and similar devices (conforming
> to the SFF-8472 spec), MSA standard QSFP and similar devices (conforming
> to the SFF-8636 spec) and CMIS and similar devices (conforming to the
> Common Management Interface Specfication).

Hi Don

Please make sure you Cc: netdev. This is networking stuff.

And we have seen this code before, and the netdev Maintainers have
argued against it before.

> These devices provide identification, operational status and control
> registers via an EEPROM model.  These devices support one or 3 fixed pages
> (128 bytes) of data, and one page that is selected via a page register on
> the first fixed page.  Thus the driver's main task is to map these pages
> onto a simple linear address space for user space management applications.
> See the driver code for a detailed layout.

I assume you have seen the work NVIDIA submitted last week? This idea
of linear pages is really restrictive and we are moving away from it.

> The EEPROM data is accessible to user space and kernel consumers via the
> nvmem interface.

ethtool -m ?

In the past, this code has been NACKed because it does not integrate
into the networking stack. Is this attempt any different?

Thanks
	Andrew

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
  2021-02-26 22:34 ` [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS Andrew Lunn
@ 2021-02-27  2:46   ` Don Bollinger
  2021-02-27 16:23     ` Andrew Lunn
  2021-03-01 20:00     ` Don Bollinger
  0 siblings, 2 replies; 30+ messages in thread
From: Don Bollinger @ 2021-02-27  2:46 UTC (permalink / raw)
  To: 'Andrew Lunn'
  Cc: arndb, gregkh, linux-kernel, brandon_chuang, wally_wang,
	aken_liu, gulv, jolevequ, xinxliu, 'netdev',
	don

On Fri, Feb 26, 2021 at 2:35 PM -0800, Andrew Lunn wrote:
> On Mon, Feb 15, 2021 at 11:38:21AM -0800, Don Bollinger wrote:
> > optoe is an i2c based driver that supports read/write access to all
> > the pages (tables) of MSA standard SFP and similar devices (conforming
> > to the SFF-8472 spec), MSA standard QSFP and similar devices
> > (conforming to the SFF-8636 spec) and CMIS and similar devices
> > (conforming to the Common Management Interface Specfication).
> 
> Hi Don
> 
> Please make sure you Cc: netdev. This is networking stuff.

Will do.

> And we have seen this code before, and the netdev Maintainers have
> argued against it before.

Yes, though I have tried to make it more acceptable, and also more useful to
you...

> > These devices provide identification, operational status and control
> > registers via an EEPROM model.  These devices support one or 3 fixed
> > pages
> > (128 bytes) of data, and one page that is selected via a page register
> > on the first fixed page.  Thus the driver's main task is to map these
> > pages onto a simple linear address space for user space management
> applications.
> > See the driver code for a detailed layout.
> 
> I assume you have seen the work NVIDIA submitted last week? This idea of
> linear pages is really restrictive and we are moving away from it.

No, I haven't seen it.  I can't seem to locate anything in the past month on
LMKL from NVIDIA.  Please point me to it.

What are you using instead of mapping the pages into a linear address space?
Perhaps I can provide a different interface to call with some other mapping.

> > The EEPROM data is accessible to user space and kernel consumers via
> > the nvmem interface.
> 
> ethtool -m ?

This is actually below ethtool.  Ethtool has to fetch the data from the
eeprom using an appropriate i2c interface (these are defined to be i2c
devices).  And, to fetch the data from any but the first 256 bytes the code
ethtool calls must deal with the page register on the device.  This driver
handles the page register, for both SFP and QSFP/CMIS devices.  This driver
would be a useful path for ethtool to use when someone decides to make that
data available.  Note for example, CMIS devices put the DOM data
(per-channel Tx power, Rx Power, laser bias) on page 0x11.  When you want
that data, you'll have to go past 256 bytes and deal with the page register.
optoe will do that for you.
 
> In the past, this code has been NACKed because it does not integrate into
> the networking stack. Is this attempt any different?

Yes.  I have updated the driver with all the latest changes in at24.c, the
primary eeprom driver.  That update includes use of the nvmem interface,
which means this driver can now be called by kernel code.  I believe it
would be useful and easy to replace the sfp.c code that accesses the eeprom
with nvmem calls to this driver.  By doing so, you will be able to access
the additional pages of data on those devices.  You would also get the
benefit of sharing common code the other eeprom drivers.  As part of that
cleanup, the explicit sysfs creation of an eeprom device has been removed.
Full disclosure, the nvmem interface creates that device now.

> Thanks
> 	Andrew

One more point, I have been requested by the SONiC team to upstream this
driver.  It is on their short list of kernel code that is not upstream, and
they want to fix that.  They are not consumers of the netdev interface, but
they (and other NOS providers) have a need for a driver to access the eeprom
data.  Greg KH was involved in that request, and I related your concerns to
him, as well as my position that this is an eeprom driver with partners that
need it independent of netdev.  His response:

GKH> I can't remember the actual patch anymore, but you are right, it's
"just"
GKH> poking around in the EEPROM for the device, and doing what you want in 
GKH> userspace, which really should be fine.  Submit it and I'll be glad to
review it
GKH> under that type of functionality being added.

He didn't say he would override your position, but he suggested it was
appropriate to submit it.  So, here it is.

Thus:
1.  There are existing NOS platforms that need and use this functionality,
they want it in the upstream kernel.
2.  This driver is better than sfp.c, and could easily be plugged in to
improve netdev and ethtool access to SFP/QSFP/CMIS EEPROM data.

Don


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
  2021-02-27  2:46   ` Don Bollinger
@ 2021-02-27 16:23     ` Andrew Lunn
  2021-03-01 20:00     ` Don Bollinger
  1 sibling, 0 replies; 30+ messages in thread
From: Andrew Lunn @ 2021-02-27 16:23 UTC (permalink / raw)
  To: Don Bollinger
  Cc: arndb, gregkh, linux-kernel, brandon_chuang, wally_wang,
	aken_liu, gulv, jolevequ, xinxliu, 'netdev'

> > I assume you have seen the work NVIDIA submitted last week? This idea of
> > linear pages is really restrictive and we are moving away from it.
> 
> No, I haven't seen it.  I can't seem to locate anything in the past month on
> LMKL from NVIDIA.  Please point me to it.

[RFC PATCH net-next 0/5] ethtool: Extend module EEPROM dump
Message-Id: <1614181274-28482-1-git-send-email-moshe@nvidia.com>

b4 should be able to fetch it for you, using that message id.

Clearly, we don't want two different kernel APIs for doing the same
thing. This new KAPI is still in its early days. You can contribute to
it, and make it work for your use case. If i understand correctly, you
are using Linux as a bootloader, and running the complete switch
driver in userspace, not making use of the Linux network stack. This
is not something the netdev community likes, but if you work within
the networking KAPI, rather than adding parallel KAPI, we can probably
work together. I think the biggest problem you have is identifiers.
Since you don't have the SFP associated to a netdev, the current IOCTL
interface which us the netdev name as an identifier does not work. But
the new code is netlink based. The identifier is just an attribute in
the message. See if you can use an alternative attribute which
directly identifies the SFP, not the netdev. It is O.K. to instantiate
an SFP device and then not make use of it in PHYLINK. So this should
work.

     Andrew

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
  2021-02-27  2:46   ` Don Bollinger
  2021-02-27 16:23     ` Andrew Lunn
@ 2021-03-01 20:00     ` Don Bollinger
  2021-03-01 20:45       ` Andrew Lunn
  1 sibling, 1 reply; 30+ messages in thread
From: Don Bollinger @ 2021-03-01 20:00 UTC (permalink / raw)
  To: 'Andrew Lunn'
  Cc: arndb, gregkh, linux-kernel, brandon_chuang, wally_wang,
	aken_liu, gulv, jolevequ, xinxliu, 'netdev',
	'Moshe Shemesh'

On Fri, Feb 26, 2021 at 6:46 PM -0800, Don Bollinger wrote:
> On Fri, Feb 26, 2021 at 2:35 PM -0800, Andrew Lunn wrote:
> > On Mon, Feb 15, 2021 at 11:38:21AM -0800, Don Bollinger wrote:
> > > optoe is an i2c based driver that supports read/write access to all
> > > the pages (tables) of MSA standard SFP and similar devices
> > > (conforming to the SFF-8472 spec), MSA standard QSFP and similar
> > > devices (conforming to the SFF-8636 spec) and CMIS and similar
> > > devices (conforming to the Common Management Interface
> Specfication).
> >
> > Hi Don
> >
> > And we have seen this code before, and the netdev Maintainers have
> > argued against it before.
> 
> Yes, though I have tried to make it more acceptable, and also more useful
to
> you...
> 
> > > These devices provide identification, operational status and control
> > > registers via an EEPROM model.  These devices support one or 3 fixed
> > > pages
> > > (128 bytes) of data, and one page that is selected via a page
> > > register on the first fixed page.  Thus the driver's main task is to
> > > map these pages onto a simple linear address space for user space
> > > management
> > applications.
> > > See the driver code for a detailed layout.
> >
> > I assume you have seen the work NVIDIA submitted last week? This idea
> > of linear pages is really restrictive and we are moving away from it.

Yes, I see at least most of it in your response in the netdev archive.  The
linear pages are really a very simple mapping, but they also provide a very
simple access model.  I can readily accommodate Moshe's addressing (i2c
addr, page, bank, offset, len).  Details below...

> > > The EEPROM data is accessible to user space and kernel consumers via
> > > the nvmem interface.
> >
> > ethtool -m ?
> 
> This is actually below ethtool.  Ethtool has to fetch the data from the
eeprom
> using an appropriate i2c interface (these are defined to be i2c devices).
And,
> to fetch the data from any but the first 256 bytes the code ethtool calls
must
> deal with the page register on the device.  This driver handles the page
> register, for both SFP and QSFP/CMIS devices.  This driver would be a
useful
> path for ethtool to use when someone decides to make that data available.
> Note for example, CMIS devices put the DOM data (per-channel Tx power,
> Rx Power, laser bias) on page 0x11.  When you want that data, you'll have
to
> go past 256 bytes and deal with the page register.
> optoe will do that for you.

To be more specific, optoe is only replacing the functionality of
drivers/net/phy/sfp.c, the functions of sfp_i2c_read() and sfp_i2c_write().
These are the routines at the very bottom of the ethtool stack that actually
execute the i2c calls to get the data.  The existing routines are very
limited, in that they don't handle pages at all.  Hence they can only reach
256 bytes of QSFP EEPROM data and 512 bytes of SFP EEPROM data.  I can
propose a shorter cleaner replacement for each of those routines which will
provide access to the rest of the data on those devices.

ALL of the ethtool stack remains unchanged, works exactly the same, and will
now provide access to more data on the EEPROMs.  We may have to remove some
range checks to allow requests to pages beyond page 0.

Note that Moshe's RFC does not include the actual access routines, the
equivalent of sfp_i2c_read/write.  That will in fact require managing pages.
Using optoe, those routines are a few lines of code to map his
addr/page/bank/offset and a call to the new sfp_i2c_read/write calls.

And, all of the i2c/EEPROM accesses go through the same code that routinely
handles the rest of the EEPROMs in the system, with all of the architectural
consolidation, i2c anomaly handling, and simplified support inherent in
sharing common code.
 
> > In the past, this code has been NACKed because it does not integrate
> > into the networking stack. Is this attempt any different?
> 
> Yes.  I have updated the driver with all the latest changes in at24.c, the
> primary eeprom driver.  That update includes use of the nvmem interface,
> which means this driver can now be called by kernel code.  I believe it
would
> be useful and easy to replace the sfp.c code that accesses the eeprom with
> nvmem calls to this driver.  By doing so, you will be able to access the
> additional pages of data on those devices.  You would also get the benefit
of
> sharing common code the other eeprom drivers.  As part of that cleanup,
the
> explicit sysfs creation of an eeprom device has been removed.
> Full disclosure, the nvmem interface creates that device now.
> 
> > Thanks
> > 	Andrew
> 
> One more point, I have been requested by the SONiC team to upstream this
> driver.  It is on their short list of kernel code that is not upstream,
and they
> want to fix that.  They are not consumers of the netdev interface, but
they
> (and other NOS providers) have a need for a driver to access the eeprom
> data.  Greg KH was involved in that request, and I related your concerns
to
> him, as well as my position that this is an eeprom driver with partners
that
> need it independent of netdev.  His response:
> 
> GKH> I can't remember the actual patch anymore, but you are right, it's
> "just"
> GKH> poking around in the EEPROM for the device, and doing what you want
> GKH> in userspace, which really should be fine.  Submit it and I'll be
> GKH> glad to
> review it
> GKH> under that type of functionality being added.
> 
> He didn't say he would override your position, but he suggested it was
> appropriate to submit it.  So, here it is.
> 
> Thus:
> 1.  There are existing NOS platforms that need and use this functionality,
> they want it in the upstream kernel.
> 2.  This driver is better than sfp.c, and could easily be plugged in to
improve
> netdev and ethtool access to SFP/QSFP/CMIS EEPROM data.
> 
> Don


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
  2021-03-01 20:00     ` Don Bollinger
@ 2021-03-01 20:45       ` Andrew Lunn
  2021-03-05 19:07         ` Don Bollinger
  0 siblings, 1 reply; 30+ messages in thread
From: Andrew Lunn @ 2021-03-01 20:45 UTC (permalink / raw)
  To: Don Bollinger
  Cc: arndb, gregkh, linux-kernel, brandon_chuang, wally_wang,
	aken_liu, gulv, jolevequ, xinxliu, 'netdev',
	'Moshe Shemesh'

> To be more specific, optoe is only replacing the functionality of
> drivers/net/phy/sfp.c, the functions of sfp_i2c_read() and sfp_i2c_write().
> These are the routines at the very bottom of the ethtool stack that actually
> execute the i2c calls to get the data.  The existing routines are very
> limited, in that they don't handle pages at all.  Hence they can only reach
> 256 bytes of QSFP EEPROM data and 512 bytes of SFP EEPROM data.  I can
> propose a shorter cleaner replacement for each of those routines which will
> provide access to the rest of the data on those devices.

drivers/net/phy/sfp.c is not the only code making use of this KAPI.
Any MAC driver can implement the ethtool op calls for reading SFP
memory. The MAC driver can either directly reply because it has the
SFP hidden behind firmware, or it can call into the sfp.c code,
because Linux is driving the SFP.

Moshe is working on the Mellonox MAC drivers. As you say, the current
sfp.c code is very limited. But once Moshe code is merged, i will do
the work needed to extend sfp.c to fully support the KAPI. It will
then work for many more MAC drivers, those using phylink.

For me, the KAPI is the important thing, and less so how the
implementation underneath works. Ideally, we want one KAPI for
accessing SFP EEPROMs. Up until now, that KAPI is the ethtool IOCTL.
But that KAPI is now showing its age, and it causing us problems. So
we need to replace that KAPI. ethtool has recently moved to using
netlink messages. So any replacement should be based on netlink. The
whole network stack is pretty much controlled via netlink. So you will
find it very difficult to argue for any other form of KAPI within the
netdev community. Since optoe's KAPI is not netlink based, it is very
unlikely to be accepted.

But netlink is much more flexible than the older IOCTL interface.
Please work with us to ensure this new KAPI can work with your use
cases.

     Andrew


^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
  2021-03-01 20:45       ` Andrew Lunn
@ 2021-03-05 19:07         ` Don Bollinger
  2021-03-05 22:55           ` Jakub Kicinski
  0 siblings, 1 reply; 30+ messages in thread
From: Don Bollinger @ 2021-03-05 19:07 UTC (permalink / raw)
  To: 'Andrew Lunn'
  Cc: arndb, gregkh, linux-kernel, brandon_chuang, wally_wang,
	aken_liu, gulv, jolevequ, xinxliu, 'netdev',
	'Moshe Shemesh',
	don

On Mon, Mar 1, 2021 at 12:46 PM-0800, Andrew Lunn wrote:
> > To be more specific, optoe is only replacing the functionality of
> > drivers/net/phy/sfp.c, the functions of sfp_i2c_read() and
sfp_i2c_write().
> > These are the routines at the very bottom of the ethtool stack that
> > actually execute the i2c calls to get the data.  The existing routines
> > are very limited, in that they don't handle pages at all.  Hence they
> > can only reach
> > 256 bytes of QSFP EEPROM data and 512 bytes of SFP EEPROM data.  I can
> > propose a shorter cleaner replacement for each of those routines which
> > will provide access to the rest of the data on those devices.
> 
> drivers/net/phy/sfp.c is not the only code making use of this KAPI.
> Any MAC driver can implement the ethtool op calls for reading SFP memory.
> The MAC driver can either directly reply because it has the SFP hidden
> behind firmware, or it can call into the sfp.c code, because Linux is
driving the
> SFP.

OK, I have checked with my partners, including NOS vendors and switch
platform vendors.  They are not using the netdev framework, they are
basically not using kernel networking for managing the networking through
tens to over a hundred network ports at 10G to 400G speeds.  The kernel is
not the source of truth regarding the state of network devices.  I know
netdev *could* manage these systems, and that you are working toward that
goal, that is not the approach they are taking nor the direction they are
heading.  I am not disparaging netdev, I respect and value the contribution
to linux networking.  It's all good.  It just isn't the direction my
partners are going at this time.

You have described this architecture in the past as a 'bootloader'.  In fact
Linux is the operating system running on those switches.  It is not
temporary (eg loading the real OS and going away).  It is allocating memory,
dispatching processes and threads, handling interrupts, hosting docker
containers, and running the proprietary network APIs that manage the
networks.  In this architecture, the optical modules are managed by the OS,
through drivers.  The network APIs interact through these drivers.  For much
of the configuration data, there are configuration files that match up
device hardware (e.g. Low Power Mode GPIO lines and Tx disable lines) and
i2c buses (through layers of i2c muxes) with switch ports as seen by the
switch silicon.  Network management software (user space apps) is
responsible for enabling, configuring and monitoring optical modules to
match the config files.  Kernel drivers provide the access to the GPIO lines
and the EEPROM control registers.

Notably, in this architecture, there are actually NO kernel consumers of the
module EEPROM data.  The version of optoe that is in production in these NOS
and switch environments does not even have an entry point callable by the
kernel.  All of the consumers are accessing the data via the sysfs file in
/sys/bus/i2c/devices/*.   I have closely modeled the updated version of
optoe on the at24 driver (drivers/misc/eeprom/at24.c).  Thus, the KAPI is
actually the same as used by other eeprom drivers.  It is an eeprom, it is
accessed by the nvmem interfaces, in both kernel and user space.

My primary motivation for creating optoe is to consolidate a bunch of
different implementations by different vendors, to add page support which
most implementations lacked, to extend the reach to all of the architected
pages (the standards describe them as proprietary, not forbidden), to
provide write access, and to enable CMIS devices.  Those goals apply to the
netdev/netlink environment as well.  I added the kernel access via nvmem to
facilitate adoption in your network stack, to achieve the same goals
(standardization and improvement of access to module EEPROMs).

> 
> Moshe is working on the Mellonox MAC drivers. As you say, the current
sfp.c

I love Moshe's KAPI, and am reviewing and commenting on it to ensure its
success.

> code is very limited. But once Moshe code is merged, i will do the work
> needed to extend sfp.c to fully support the KAPI. It will then work for
many
> more MAC drivers, those using phylink.

One piece of that work could be to replace the contents of
drivers/net/phy/sfp.c, functions sfp_i2c_read() and sfp_i2c_write() with
nvmem calls to optoe.  That would be an easy change, and provide all of the
features of optoe (pages, access to all of the EEPROM, write support, CMIS),
without writing and maintaining that i2c access code.  The actual i2c calls
would be handled by the same code that is supporting at24.

It is plausible that platform vendors would choose not to implement their
own version of these functions if the generic sfp_i2c_read/write worked for
them.   Fewer implementations of the same code, with more capability in the
common implementation, is obviously beneficial.

> For me, the KAPI is the important thing, and less so how the
implementation
> underneath works. Ideally, we want one KAPI for accessing SFP EEPROMs.
> Up until now, that KAPI is the ethtool IOCTL.
> But that KAPI is now showing its age, and it causing us problems. So we
need
> to replace that KAPI. ethtool has recently moved to using netlink
messages.
> So any replacement should be based on netlink. The whole network stack is
> pretty much controlled via netlink. So you will find it very difficult to
argue for
> any other form of KAPI within the netdev community. Since optoe's KAPI is
> not netlink based, it is very unlikely to be accepted.
> 
> But netlink is much more flexible than the older IOCTL interface.
> Please work with us to ensure this new KAPI can work with your use cases.

I accept all your points from the netdev/netlink perspective.  To that end,
I offer optoe as an upgrade to the default implementation of
sfp_i2c_read/write.

I also have partners using a different architecture, for whom a
netdev/netlink based solution would not be useful.  These partners have been
using a sysfs based approach to module EEPROMs and have no motivation to
change that.  This version of optoe is using the standard eeprom access
method (nvmem) to provide this access.

Acknowledging your objections, I nonetheless request that optoe be accepted
into upstream as an eeprom driver in drivers/misc/eeprom.  It is a
legitimate driver, with a legitimate user community, which deserves the
benefits of being managed as a legitimate part of the linux kernel.

> 
>      Andrew

Don


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
  2021-03-05 19:07         ` Don Bollinger
@ 2021-03-05 22:55           ` Jakub Kicinski
  2021-03-06  2:30             ` Don Bollinger
  0 siblings, 1 reply; 30+ messages in thread
From: Jakub Kicinski @ 2021-03-05 22:55 UTC (permalink / raw)
  To: Don Bollinger
  Cc: 'Andrew Lunn',
	arndb, gregkh, linux-kernel, brandon_chuang, wally_wang,
	aken_liu, gulv, jolevequ, xinxliu, 'netdev',
	'Moshe Shemesh'

On Fri, 5 Mar 2021 11:07:08 -0800 Don Bollinger wrote:
> Acknowledging your objections, I nonetheless request that optoe be accepted
> into upstream as an eeprom driver in drivers/misc/eeprom.  It is a
> legitimate driver, with a legitimate user community, which deserves the
> benefits of being managed as a legitimate part of the linux kernel.

It's in the best interest of the community to standardize on how 
we expect things to operate. You're free to do whatever you want
in your proprietary systems but please don't expect us to accept
a parallel, in now way superior method of accessing SFPs. 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
  2021-03-05 22:55           ` Jakub Kicinski
@ 2021-03-06  2:30             ` Don Bollinger
  2021-03-06  3:31               ` Andrew Lunn
  0 siblings, 1 reply; 30+ messages in thread
From: Don Bollinger @ 2021-03-06  2:30 UTC (permalink / raw)
  To: 'Jakub Kicinski'
  Cc: 'Andrew Lunn',
	arndb, gregkh, linux-kernel, brandon_chuang, wally_wang,
	aken_liu, gulv, jolevequ, xinxliu, 'netdev',
	'Moshe Shemesh',
	don

On Fri, 5 Mar 2021 2:55 PM -0800 Jakub Kicinski wrote:
> On Fri, 5 Mar 2021 11:07:08 -0800 Don Bollinger wrote:
> > Acknowledging your objections, I nonetheless request that optoe be
> > accepted into upstream as an eeprom driver in drivers/misc/eeprom.  It
> > is a legitimate driver, with a legitimate user community, which
> > deserves the benefits of being managed as a legitimate part of the linux
> kernel.
> 
> It's in the best interest of the community to standardize on how we expect
> things to operate. You're free to do whatever you want in your proprietary
> systems but please don't expect us to accept a parallel, in now way
superior
> method of accessing SFPs.

These are not proprietary systems.  The Network Operating Systems that use
optoe are open source projects, Linux based, available on github, and
contributing to the Linux source (see for example
https://github.com/Azure/SONiC).  The switches that these NOSs run on are
open spec systems
(https://www.opencompute.org/wiki/Networking/SpecsAndDesigns).  The fact
that they use the SDK from the switch silicon vendor should not mean they
are banished from the Linux community.

I am proposing acceptance of a commonly used implementation for accessing
SFPs because the one used by the netdev/netlink community does not fit the
architecture of the white box NOS/switch community.  I am not trying to pick
sides, I am trying to make optoe available to both communities, to improve
EEPROM access for both of them.

Don


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
  2021-03-06  2:30             ` Don Bollinger
@ 2021-03-06  3:31               ` Andrew Lunn
  2021-03-12 19:04                 ` Don Bollinger
  0 siblings, 1 reply; 30+ messages in thread
From: Andrew Lunn @ 2021-03-06  3:31 UTC (permalink / raw)
  To: Don Bollinger
  Cc: 'Jakub Kicinski',
	arndb, gregkh, linux-kernel, brandon_chuang, wally_wang,
	aken_liu, gulv, jolevequ, xinxliu, 'netdev',
	'Moshe Shemesh'

> I am proposing acceptance of a commonly used implementation for accessing
> SFPs because the one used by the netdev/netlink community does not fit the
> architecture of the white box NOS/switch community.

Please could you expand on this. I've given suggests as to how the new
netlink KAPI could be used for this use case, without being attached
to a netdev. And you have said nothing about why it cannot be made to
work. You cannot argue the architecture does not fit without actually
saying why it does not fit.

   Andrew

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
  2021-03-06  3:31               ` Andrew Lunn
@ 2021-03-12 19:04                 ` Don Bollinger
  2021-03-12 19:59                   ` Andrew Lunn
  0 siblings, 1 reply; 30+ messages in thread
From: Don Bollinger @ 2021-03-12 19:04 UTC (permalink / raw)
  To: 'Andrew Lunn'
  Cc: 'Jakub Kicinski',
	arndb, gregkh, linux-kernel, brandon_chuang, wally_wang,
	aken_liu, gulv, jolevequ, xinxliu, 'netdev',
	'Moshe Shemesh'

On Fri, 5 Mar 2021 19:32 -0800 Andrew Lunn wrote:
> > I am proposing acceptance of a commonly used implementation for
> > accessing SFPs because the one used by the netdev/netlink community
> > does not fit the architecture of the white box NOS/switch community.
> 
> Please could you expand on this. I've given suggests as to how the new
> netlink KAPI could be used for this use case, without being attached to a
> netdev. And you have said nothing about why it cannot be made to work.
> You cannot argue the architecture does not fit without actually saying why
it
> does not fit.
> 
>    Andrew

Sorry it took some time to clarify this for myself.  I'm using SONiC (the
NOS
Microsoft uses to run the switches in its Azure cloud) as my example.  They
are users of optoe, and they actually initiated the request to push optoe 
upstream.  Other white box NOS vendors are similar.

SONiC manages all aspects of SFP/QSFP/CMIS interaction through user
space.  They have specified an API that is implemented by switch platform
vendors, that provides things like presence detection, LowPower mode
up/down/status, raw access to EEPROM content, interpretation of EEPROM
content (including TxPower/RxPower/bias, high/low alarm/warning thresholds,
static content like serial number and part number, and dozens of other
items).
This interface is implemented in python scripts provided by the switch
platform
vendor.  Those scripts encode the mapping of CPLD i2c muxes to i2c buses to
port numbers, specific to each switch.

At the bottom of that python stack, all EEPROM access goes through
open/seek/read/close access to the optoe managed file in 
/sys/bus/i2c/devices/{num}-0050/eeprom.

You're not going to like this, but ethtool -e and ethtool -m both return 
' Ethernet0 Cannot get EEPROM data: Operation not supported', for
every port managed by the big switch silicon.

So, my users are using Linux for all the usual Linux things (memory 
management, process management, I/O, IPC, containers), but they don't
use Linux networking to manage the ports that are managed by the big
switch silicon.  (Linux networking is still in use for the management port
that talks to the management processor running Linux.)

optoe provides the device EEPROM foundation for this architecture, but 
requires the sysfs interface (via nvmem) to provide it.  optoe can also
easily
provide the default EEPROM access for the netdev/netlink interface through
the old and new KAPI.

Don


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
  2021-03-12 19:04                 ` Don Bollinger
@ 2021-03-12 19:59                   ` Andrew Lunn
  2021-03-13 21:35                     ` Don Bollinger
  0 siblings, 1 reply; 30+ messages in thread
From: Andrew Lunn @ 2021-03-12 19:59 UTC (permalink / raw)
  To: Don Bollinger
  Cc: 'Jakub Kicinski',
	arndb, gregkh, linux-kernel, brandon_chuang, wally_wang,
	aken_liu, gulv, jolevequ, xinxliu, 'netdev',
	'Moshe Shemesh'

> This interface is implemented in python scripts provided by the switch
> platform
> vendor.  Those scripts encode the mapping of CPLD i2c muxes to i2c buses to
> port numbers, specific to each switch.
> 
> At the bottom of that python stack, all EEPROM access goes through
> open/seek/read/close access to the optoe managed file in 
> /sys/bus/i2c/devices/{num}-0050/eeprom.

And this python stack is all open source? So you should be able to
throw away parts of the bottom end and replace it with a different
KAPI, and nobody will notice? In fact, this is probably how it was
designed. Anybody working with out of tree code knows what gets merged
later is going to be different because of review comments. And KAPI
code is even more likely to be different. So nobody really expected
optoe to get merged as is.

> You're not going to like this, but ethtool -e and ethtool -m both
> return ' Ethernet0 Cannot get EEPROM data: Operation not supported',
> for every port managed by the big switch silicon.

You are still missing what i said. The existing IOCTL interface needs
a network interface name. But there is no reason why you cannot extend
the new netlink KAPI to take an alternative identifier, sfp42. That
maps directly to the SFP device, without using an interface name. Your
pile of python can directly use the netlink API, the ethtool command
does not need to make use of this form of identifier, and you don't
need to "screen scrape" ethtool.

It seems very unlikely optoe is going to get merged. The network
maintainers are against it, due to KAPI issues. I'm trying to point
out a path you can take to get code merged. But it is up to you if you
decided to follow it.

	Andrew

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
  2021-03-12 19:59                   ` Andrew Lunn
@ 2021-03-13 21:35                     ` Don Bollinger
  2021-03-15 17:39                       ` Jakub Kicinski
  0 siblings, 1 reply; 30+ messages in thread
From: Don Bollinger @ 2021-03-13 21:35 UTC (permalink / raw)
  To: 'Andrew Lunn'
  Cc: 'Jakub Kicinski',
	arndb, gregkh, linux-kernel, brandon_chuang, wally_wang,
	aken_liu, gulv, jolevequ, xinxliu, 'netdev',
	'Moshe Shemesh',
	don

> > This interface is implemented in python scripts provided by the switch
> > platform vendor.  Those scripts encode the mapping of CPLD i2c muxes
> > to i2c buses to port numbers, specific to each switch.
> >
> > At the bottom of that python stack, all EEPROM access goes through
> > open/seek/read/close access to the optoe managed file in
> > /sys/bus/i2c/devices/{num}-0050/eeprom.
> 
> And this python stack is all open source? So you should be able to throw

It is open in the sense that it is accessible to the world on github and the
maintainers would presumably entertain contributions.

In practice, these are developed and maintained by the white box switch
vendors.  There is one implementation for each platform on each supported
NOS.  There is lots of commonality among them, but each is nonetheless
unique.  Optoe has been adopted across this ecosystem as a common piece that
has been factored out of many of the platform specific implementations.

> away parts of the bottom end and replace it with a different KAPI, and
> nobody will notice? In fact, this is probably how it was designed. Anybody

Actually everyone who touches this code would notice, each implementation
would have to be modified, with literally no benefit to this community.

> working with out of tree code knows what gets merged later is going to be
> different because of review comments. And KAPI code is even more likely to
> be different. So nobody really expected optoe to get merged as is.

The list of 'nobody' includes myself, my switch platform partners, my NOS
partners and Greg KH.  I did expect to accommodate constructive review of
the code, which I have already done (this is v2).

> 
> > You're not going to like this, but ethtool -e and ethtool -m both
> > return ' Ethernet0 Cannot get EEPROM data: Operation not supported',
> > for every port managed by the big switch silicon.
> 
> You are still missing what i said. The existing IOCTL interface needs a
network
> interface name. But there is no reason why you cannot extend the new
> netlink KAPI to take an alternative identifier, sfp42. That maps directly
to the
> SFP device, without using an interface name. Your pile of python can
directly
> use the netlink API, the ethtool command does not need to make use of this
> form of identifier, and you don't need to "screen scrape" ethtool.

It is just software, your proposal is certainly technically feasible.  It
provides no benefit to the community that is using optoe.

optoe is using a perfectly good KAPI, the nvmem interface that is being
developed and maintained by the folks who manage the EEPROM  drivers in the
kernel.  It has been updated since the prior submittal in 2018 to use the
nvmem interface and the regmap interface, both from the at24.c driver.  This
community isn't using the rest of the netdev/netlink interfaces, and has
adopted (before I wrote optoe) a perfectly reasonable approach of writing a
simple driver to access these simple devices.  

optoe does not undermine the netlink KAPI that Moshe is working on.  If your
community is interested, it could adopt optoe, WITH your KAPI, to
consolidate and improve module EEPROM access for mainstream netdev
consumers.  I am eager to collaborate on the fairly simple integration.

> 
> It seems very unlikely optoe is going to get merged. The network
maintainers
> are against it, due to KAPI issues. I'm trying to point out a path you can
take
> to get code merged. But it is up to you if you decided to follow it.

Thank you, I decline.  I respectfully request that optoe be accepted as a
useful implementation of the EEPROM driver, with the same access methods as
other EEPROM drivers, customized for the unique memory layout of SFP, QSFP
and CMIS devices.  I remain open to improvements in the implementation, but
my community finds no value in an implementation that removes the standard
EEPROM access via sysfs and open/seek/read/close calls.

> 
> 	Andrew

Don


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
  2021-03-13 21:35                     ` Don Bollinger
@ 2021-03-15 17:39                       ` Jakub Kicinski
  2021-03-15 18:09                         ` Don Bollinger
  0 siblings, 1 reply; 30+ messages in thread
From: Jakub Kicinski @ 2021-03-15 17:39 UTC (permalink / raw)
  To: Don Bollinger
  Cc: 'Andrew Lunn',
	arndb, gregkh, linux-kernel, brandon_chuang, wally_wang,
	aken_liu, gulv, jolevequ, xinxliu, 'netdev',
	'Moshe Shemesh'

On Sat, 13 Mar 2021 13:35:56 -0800 Don Bollinger wrote:
> > away parts of the bottom end and replace it with a different KAPI, and
> > nobody will notice? In fact, this is probably how it was designed. Anybody  
> 
> Actually everyone who touches this code would notice, each implementation
> would have to be modified, with literally no benefit to this community.

You keep saying that kernel API is "of no benefit to this community"
yet you don't want to accept the argument that your code is of no
benefit to the upstream community.

> optoe does not undermine the netlink KAPI that Moshe is working on.  

It does, although it may be hard to grasp for a vendor who can just EoL
a product at will once nobody is paying for it. We aim to provide
uniform support for all networking devices and an infinite backward
compatibility guarantee.

People will try to use optoe-based tools on the upstream drivers and
they won't work. Realistically we will need to support both APIs.

> If your community is interested, it could adopt optoe, WITH your
> KAPI, to consolidate and improve module EEPROM access for mainstream
> netdev consumers.  I am eager to collaborate on the fairly simple
> integration.

Nacked-by: Jakub Kicinski <kuba@kernel.org>

Please move on.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
  2021-03-15 17:39                       ` Jakub Kicinski
@ 2021-03-15 18:09                         ` Don Bollinger
  2021-03-17 18:15                           ` Andrew Lunn
  0 siblings, 1 reply; 30+ messages in thread
From: Don Bollinger @ 2021-03-15 18:09 UTC (permalink / raw)
  To: 'Jakub Kicinski'
  Cc: 'Andrew Lunn',
	arndb, gregkh, linux-kernel, brandon_chuang, wally_wang,
	aken_liu, gulv, jolevequ, xinxliu, 'netdev',
	'Moshe Shemesh',
	don

On Mon, 15 Mar 2021 10:40 -0800 Jakub Kicinski wrote:
> On Sat, 13 Mar 2021 13:35:56 -0800 Don Bollinger wrote:
> > > away parts of the bottom end and replace it with a different KAPI,
> > > and nobody will notice? In fact, this is probably how it was
> > > designed. Anybody
> >
> > Actually everyone who touches this code would notice, each
> > implementation would have to be modified, with literally no benefit to
this
> community.
> 
> You keep saying that kernel API is "of no benefit to this community"
> yet you don't want to accept the argument that your code is of no benefit
to
> the upstream community.

I have offered, in every response, to collaborate with the simple
integration to use optoe as the default upstream driver to access the module
EEPROMs.  optoe would be superior to the existing default routines in sfp.c
and would allow multiple existing vendor specific upstream drivers to adopt
the default.  That would reduce the code base they maintain and provide
better access to module eeproms.  I already adopted the existing EEPROM api
to make that integration easy (nvmem).  I'm reluctant to submit the changes
to sfp.c because I have no expertise in that stack and no platform to test
it on.

> 
> > optoe does not undermine the netlink KAPI that Moshe is working on.
> 
> It does, although it may be hard to grasp for a vendor who can just EoL a
> product at will once nobody is paying for it. We aim to provide uniform
> support for all networking devices and an infinite backward compatibility
> guarantee.

I aim to provide uniform support for module EEPROM devices, with no less
reason to expect an infinite backward compatibility guarantee.  (Infinite is
a bit much, that technology will inevitably become uninteresting to my
community as well as yours.)

> 
> People will try to use optoe-based tools on the upstream drivers and they
> won't work. Realistically we will need to support both APIs.

I assume they "won't work" because of new requirements or newly discovered
defects.  At that point optoe would be fixed by people who care, just like
any other upstream code.  If your stack adopts optoe, through Moshe's KAPI,
then both communities benefit from ongoing support to maintain and enhance
EEPROM access.  If your stack does not adopt optoe, then my community will
manage the support, since they need and use the code.

As for "both APIs", the first API is Moshe's, which we both support
(politically and technically).  The second is nvmem, which is supported by
the EEPROM driver folks, led by the support for the at24 driver.  I'm
calling the routines they created for this purpose, I'm not creating a new
API.

Bottom line:  "Realistically we will need to support both APIs" even if
optoe does not get accepted.

> 
> > If your community is interested, it could adopt optoe, WITH your KAPI,
> > to consolidate and improve module EEPROM access for mainstream netdev
> > consumers.  I am eager to collaborate on the fairly simple
> > integration.
> 
> Nacked-by: Jakub Kicinski <kuba@kernel.org>
> 
> Please move on.

My community still has useful code that they would like in the upstream
kernel.

Don


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
  2021-03-15 18:09                         ` Don Bollinger
@ 2021-03-17 18:15                           ` Andrew Lunn
  2021-03-20 16:10                             ` Pali Rohár
  2021-03-23 20:32                             ` Don Bollinger
  0 siblings, 2 replies; 30+ messages in thread
From: Andrew Lunn @ 2021-03-17 18:15 UTC (permalink / raw)
  To: Don Bollinger
  Cc: 'Jakub Kicinski',
	arndb, gregkh, linux-kernel, brandon_chuang, wally_wang,
	aken_liu, gulv, jolevequ, xinxliu, 'netdev',
	'Moshe Shemesh'

> I have offered, in every response, to collaborate with the simple
> integration to use optoe as the default upstream driver to access the module
> EEPROMs.  optoe would be superior to the existing default routines in sfp.c

Actually, i'm not sure they would be. Since the KAPI issues are pretty
much a NACK on their own, i didn't bother raising other issues. Both
Russell King and I has issues with quirks and hotplug.

Our experience is that a number of SFPs are broken, they don't follow
the standard. Some you cannot perform more than 16 bytes reads without
them locking up. Others will perform a 16 byte read, but only give you
one useful byte of data. So you have to read enough of the EEPROM a
byte at a time to get the vendor and product strings in order to
determine what quirks need to be applied. optoe has nothing like
this. Either you don't care and only support well behaved SFPs, or you
have the quirk handling in user space, in the various vendor code
blobs, repeated again and again. To make optoe generically usable, you
are going to have to push the quirk handling into optoe. The
brokenness should be hidden from userspace.

And then you repeat all the quirk handling sfp.c has. That does not
scale, we don't want the same quirks in two different places. However,
because SFPs are hot pluggable, you need to re-evaluate the quirks
whenever there is a hot-plug event. optoe has no idea if there has
been a hotplug event, since it does not have access to the GPIOs. Your
user space vendor code might know, it has access to the GPIOs. So
maybe you could add an IOCTL call or something, to let optoe know the
module has changed and it needs to update its quirks. Or for every
user space read, you actually re-read the vendor IDs and refresh the
quirks before performing the read the user actually wants. That all
seems ugly and is missing from the current patch.

I fully agree with Jakub NACK.

  Andrew



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
  2021-03-17 18:15                           ` Andrew Lunn
@ 2021-03-20 16:10                             ` Pali Rohár
  2021-03-26 18:43                               ` Don Bollinger
  2021-03-23 20:32                             ` Don Bollinger
  1 sibling, 1 reply; 30+ messages in thread
From: Pali Rohár @ 2021-03-20 16:10 UTC (permalink / raw)
  To: Don Bollinger
  Cc: Andrew Lunn, 'Jakub Kicinski',
	arndb, gregkh, linux-kernel, brandon_chuang, wally_wang,
	aken_liu, gulv, jolevequ, xinxliu, 'netdev',
	'Moshe Shemesh'

Hello Don!

I have read whole discussion and your EEPROM patch proposal. But for me
it looks like some kernel glue code for some old legacy / proprietary
access method which does not have any usage outside of that old code.

Your code does not contain any quirks which are needed to read different
EEPROMs in different SFP modules. As Andrew wrote there are lot of
broken SFPs which needs special handling and this logic is already
implemented in sfp.c and sfp-bus.c kernel drivers. These drivers then
export EEPROM content to userspace via ethtool -m API in unified way and
userspace does not implement any quirks (nor does not have to deal with
quirks).

If you try to read EEPROM "incorrectly" then SFP module with its EEPROM
chip (or emulation of chip) locks and is fully unusable after you unplug
it and plug it again. Kernel really should not export API to userspace
which can cause "damage" to SFP modules. And currently it does *not* do
it.

I have contributed code for some GPON SFP modules, so their EEPROM can
be correctly read and exported to userspace via ethtool -m. So I know
that this is very fragile area and needs to be properly handled.

So I do not see any reason why can be a new optoe method in _current_
form useful. It does not implemented required things for handling
different EEPROM modules.

I would rather suggest you to use ethtool -m IOCTL API and in case it is
not suitable for QSFP (e.g. because of paging system) then extend this
API.

There were already proposals for using netlink socket interface which is
today de-facto standard interface for new code. sysfs API for such thing
really looks like some legacy code and nowadays we have better access
methods.

If you want, I can help you with these steps and make patches to be in
acceptable state; not written in "legacy" style. As I'm working with
GPON SFP modules with different EEPROMs in them, I'm really interested
in any improvements in their EEPROM area.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
  2021-03-17 18:15                           ` Andrew Lunn
  2021-03-20 16:10                             ` Pali Rohár
@ 2021-03-23 20:32                             ` Don Bollinger
  2021-03-23 22:29                               ` Andrew Lunn
  1 sibling, 1 reply; 30+ messages in thread
From: Don Bollinger @ 2021-03-23 20:32 UTC (permalink / raw)
  To: 'Andrew Lunn'
  Cc: 'Jakub Kicinski',
	arndb, gregkh, linux-kernel, brandon_chuang, wally_wang,
	aken_liu, gulv, jolevequ, xinxliu, 'netdev',
	'Moshe Shemesh',
	don

> > I have offered, in every response, to collaborate with the simple
> > integration to use optoe as the default upstream driver to access the
> > module EEPROMs.  optoe would be superior to the existing default
> > routines in sfp.c
> 
> Actually, i'm not sure they would be. Since the KAPI issues are pretty
much a
> NACK on their own, i didn't bother raising other issues. Both Russell King
and
> I has issues with quirks and hotplug.
> 
> Our experience is that a number of SFPs are broken, they don't follow the
> standard. Some you cannot perform more than 16 bytes reads without them
> locking up. Others will perform a 16 byte read, but only give you one
useful
> byte of data. So you have to read enough of the EEPROM a byte at a time to
> get the vendor and product strings in order to determine what quirks need
> to be applied. optoe has nothing like this. Either you don't care and only
> support well behaved SFPs, or you have the quirk handling in user space,
in
> the various vendor code blobs, repeated again and again. To make optoe
> generically usable, you are going to have to push the quirk handling into
> optoe. The brokenness should be hidden from userspace.

Interesting.  I would throw away such devices.  That's why switch vendors
publish supported parts lists.

Can you point me to the code that is handling those quirks?  Since I haven't
seen those problems, I don't know what they are and how to address them.

Note there are a VAST number of data items in those EEPROMs, including
proprietary capabilities.  Many of the items are configuration dependent,
and mean different things depending on the value of other data items.  Most
of these items are not of any interest to kernel networking.  I try to
minimize the size of the kernel footprint and move those decoding and
management functions to user space.

> 
> And then you repeat all the quirk handling sfp.c has. That does not scale,
we
> don't want the same quirks in two different places. However, because SFPs
> are hot pluggable, you need to re-evaluate the quirks whenever there is a
> hot-plug event. optoe has no idea if there has been a hotplug event, since
it
> does not have access to the GPIOs. Your user space vendor code might
> know, it has access to the GPIOs. So maybe you could add an IOCTL call or
> something, to let optoe know the module has changed and it needs to
> update its quirks. Or for every user space read, you actually re-read the
> vendor IDs and refresh the quirks before performing the read the user
> actually wants. That all seems ugly and is missing from the current patch.

Actually I do need to know whether the device supports paging, that's the
only device state I need.  Since I don't detect hotplug events, I read the
'paging supported' bit on every read that changes the page register.  

There is a GPIO line to detect 'presence', which presumably could be
accessed via device tree configuration with the GPIO driver.  I haven't
figured out how to connect those pieces so I just read the page register on
every access.  Adding that would be a useful feature.

> 
> I fully agree with Jakub NACK.
> 
>   Andrew

Don


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
  2021-03-23 20:32                             ` Don Bollinger
@ 2021-03-23 22:29                               ` Andrew Lunn
  2021-03-26 18:43                                 ` Don Bollinger
  0 siblings, 1 reply; 30+ messages in thread
From: Andrew Lunn @ 2021-03-23 22:29 UTC (permalink / raw)
  To: Don Bollinger
  Cc: 'Jakub Kicinski',
	arndb, gregkh, linux-kernel, brandon_chuang, wally_wang,
	aken_liu, gulv, jolevequ, xinxliu, 'netdev',
	'Moshe Shemesh'

> > Our experience is that a number of SFPs are broken, they don't follow the
> > standard. Some you cannot perform more than 16 bytes reads without them
> > locking up. Others will perform a 16 byte read, but only give you one
> useful
> > byte of data. So you have to read enough of the EEPROM a byte at a time to
> > get the vendor and product strings in order to determine what quirks need
> > to be applied. optoe has nothing like this. Either you don't care and only
> > support well behaved SFPs, or you have the quirk handling in user space,
> in
> > the various vendor code blobs, repeated again and again. To make optoe
> > generically usable, you are going to have to push the quirk handling into
> > optoe. The brokenness should be hidden from userspace.
> 
> Interesting.  I would throw away such devices.  That's why switch vendors
> publish supported parts lists.
> 
> Can you point me to the code that is handling those quirks?  Since I haven't
> seen those problems, I don't know what they are and how to address them.

Take a look in drivers/net/phy/sfp.c

commit f0b4f847673299577c29b71d3f3acd3c313d81b7
Author: Pali Rohár <pali@kernel.org>
Date:   Mon Jan 25 16:02:28 2021 +0100

    net: sfp: add mode quirk for GPON module Ubiquiti U-Fiber Instant
    
    The Ubiquiti U-Fiber Instant SFP GPON module has nonsensical information
    stored in its EEPROM. It claims to support all transceiver types including
    10G Ethernet. Clear all claimed modes and set only 1000baseX_Full, which is
    the only one supported.
    
    This module has also phys_id set to SFF, and the SFP subsystem currently
    does not allow to use SFP modules detected as SFFs. Add exception for this
    module so it can be detected as supported.
    
    This change finally allows to detect and use SFP GPON module Ubiquiti
    U-Fiber Instant on Linux system.
    
    EEPROM content of this SFP module is (where XX is serial number):
    
    00: 02 04 0b ff ff ff ff ff ff ff ff 03 0c 00 14 c8    ???........??.??
    10: 00 00 00 00 55 42 4e 54 20 20 20 20 20 20 20 20    ....UBNT
    20: 20 20 20 20 00 18 e8 29 55 46 2d 49 4e 53 54 41        .??)UF-INSTA
    30: 4e 54 20 20 20 20 20 20 34 20 20 20 05 1e 00 36    NT      4   ??.6
    40: 00 06 00 00 55 42 4e 54 XX XX XX XX XX XX XX XX    .?..UBNTXXXXXXXX
    50: 20 20 20 20 31 34 30 31 32 33 20 20 60 80 02 41        140123  `??A


commit 426c6cbc409cbda9ab1a9dbf15d3c2ef947eb8c1
Author: Pali Rohár <pali@kernel.org>
Date:   Mon Jan 25 16:02:27 2021 +0100

    net: sfp: add workaround for Realtek RTL8672 and RTL9601C chips
    
    The workaround for VSOL V2801F brand based GPON SFP modules added in commit
    0d035bed2a4a ("net: sfp: VSOL V2801F / CarlitoxxPro CPGOS03-0490 v2.0
    workaround") works only for IDs added explicitly to the list. Since there
    are rebranded modules where OEM vendors put different strings into the
    vendor name field, we cannot base workaround on IDs only.
    
    Moreover the issue which the above mentioned commit tried to work around is
    generic not only to VSOL based modules, but rather to all GPON modules
    based on Realtek RTL8672 and RTL9601C chips.
    
    These include at least the following GPON modules:
    * V-SOL V2801F
    * C-Data FD511GX-RM0
    * OPTON GP801R
    * BAUDCOM BD-1234-SFM
    * CPGOS03-0490 v2.0
    * Ubiquiti U-Fiber Instant
    * EXOT EGS1
    
    These Realtek chips have broken EEPROM emulator which for N-byte read
    operation returns just the first byte of EEPROM data, followed by N-1
    zeros.
    
    Introduce a new function, sfp_id_needs_byte_io(), which detects SFP modules
    with broken EEPROM emulator based on N-1 zeros and switch to 1 byte EEPROM
    reading operation.
    
    Function sfp_i2c_read() now always uses single byte reading when it is
    required and when function sfp_hwmon_probe() detects single byte access,
    it disables registration of hwmon device, because in this case we cannot
    reliably and atomically read 2 bytes as is required by the standard for
    retrieving values from diagnostic area.
    
    (These Realtek chips are broken in a way that violates SFP standards for
    diagnostic interface. Kernel in this case simply cannot do anything less
    of skipping registration of the hwmon interface.)
    
    This patch fixes reading of EEPROM content from SFP modules based on
    Realtek RTL8672 and RTL9601C chips. Diagnostic interface of EEPROM stays
    broken and cannot be fixed.

commit 624407d2cf14ff58e53bf4b2af9595c4f21d606e
Author: Russell King <rmk+kernel@armlinux.org.uk>
Date:   Sun Jan 10 10:58:32 2021 +0000

    net: sfp: cope with SFPs that set both LOS normal and LOS inverted
    
    The SFP MSA defines two option bits in byte 65 to indicate how the
    Rx_LOS signal on SFP pin 8 behaves:
    
    bit 2 - Loss of Signal implemented, signal inverted from standard
            definition in SFP MSA (often called "Signal Detect").
    bit 1 - Loss of Signal implemented, signal as defined in SFP MSA
            (often called "Rx_LOS").
    
    Clearly, setting both bits results in a meaningless situation: it would
    mean that LOS is implemented in both the normal sense (1 = signal loss)
    and inverted sense (0 = signal loss).
    
    Unfortunately, there are modules out there which set both bits, which
    will be initially interpret as "inverted" sense, and then, if the LOS
    signal changes state, we will toggle between LINK_UP and WAIT_LOS
    states.
    
    Change our LOS handling to give well defined behaviour: only interpret
    these bits as meaningful if exactly one is set, otherwise treat it as
    if LOS is not implemented.

ommit 0d035bed2a4a6c4878518749348be61bf082d12a
Author: Russell King <rmk+kernel@armlinux.org.uk>
Date:   Wed Dec 9 11:22:49 2020 +0000

    net: sfp: VSOL V2801F / CarlitoxxPro CPGOS03-0490 v2.0 workaround
    
    Add a workaround for the detection of VSOL V2801F / CarlitoxxPro
    CPGOS03-0490 v2.0 GPON module which CarlitoxxPro states needs single
    byte I2C reads to the EEPROM.
    
    Pali Rohár reports that he also has a CarlitoxxPro-based V2801F module,
    which reports a manufacturer of "OEM". This manufacturer can't be
    matched as it appears in many different modules, so also match the part
    number too.

etc.

Now, it could be you only work with TOR switches and their SFP
modules. And they follow the standard in a better way than others. But
the kernel is used in many more environments that just data centers.
We need to support SFPs in FTTH boxes, in aircraft inflight
entertainment systems, industrial control etc.


	Andrew

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
  2021-03-23 22:29                               ` Andrew Lunn
@ 2021-03-26 18:43                                 ` Don Bollinger
  2021-03-26 19:46                                   ` Andrew Lunn
  0 siblings, 1 reply; 30+ messages in thread
From: Don Bollinger @ 2021-03-26 18:43 UTC (permalink / raw)
  To: 'Andrew Lunn'
  Cc: 'Jakub Kicinski',
	arndb, gregkh, linux-kernel, brandon_chuang, wally_wang,
	aken_liu, gulv, jolevequ, xinxliu, 'netdev',
	'Moshe Shemesh',
	don

> > > Our experience is that a number of SFPs are broken, they don't
> > > follow the standard. Some you cannot perform more than 16 bytes
> > > reads without them locking up. Others will perform a 16 byte read,
> > > but only give you one
> > useful
> > > byte of data. So you have to read enough of the EEPROM a byte at a
> > > time to get the vendor and product strings in order to determine
> > > what quirks need to be applied. optoe has nothing like this. Either
> > > you don't care and only support well behaved SFPs, or you have the
> > > quirk handling in user space,
> > in
> > > the various vendor code blobs, repeated again and again. To make
> > > optoe generically usable, you are going to have to push the quirk
> > > handling into optoe. The brokenness should be hidden from userspace.
> >
> > Interesting.  I would throw away such devices.  That's why switch
> > vendors publish supported parts lists.
> >
> > Can you point me to the code that is handling those quirks?  Since I
> > haven't seen those problems, I don't know what they are and how to
> address them.
> 
> Take a look in drivers/net/phy/sfp.c

Thanks for the pointers, I appreciate the chance to understand exactly what
quirks are under discussion.

It turns out that those quirks are not relevant to my patch.  I don't mean
you are wrong, or that they don't have to be handled for your use cases,
just that my patch does not need to deal with them.

If optoe were adopted into your framework, it would replace the routines
sfp_i2c_read() and sfp_i2c_write().  By the time these two routines are
called, all of the quirk handling has already been applied (there is no
quirk handling in these routines today).  The data requests have been broken
down as necessary into short reads or targeted reads for each device.  Those
calls are then sent to the module EEPROM with no further need for quirk
handling.  So, optoe does not need to handle quirks to fit into your code.
The quirks are handled, without optoe being involved.

In my community, the SFP/QSFP/CMIS devices (32 to 128 of them per switch)
often cost more than the switch itself.  Consumers (both vendors and
customers) extensively test these devices to ensure correct and reliable
operation.  Then they buy them literally by the millions.  Quirks lead to
quick rejection in favor of reliable parts from reliable vendors.  In this
environment, for completely different reasons, optoe does not need to handle
quirks.

> 
> commit f0b4f847673299577c29b71d3f3acd3c313d81b7
> Author: Pali Rohár <pali@kernel.org>
> Date:   Mon Jan 25 16:02:28 2021 +0100
> 
>     net: sfp: add mode quirk for GPON module Ubiquiti U-Fiber Instant
> 
>     The Ubiquiti U-Fiber Instant SFP GPON module has nonsensical
information
>     stored in its EEPROM. It claims to support all transceiver types
including
>     10G Ethernet. Clear all claimed modes and set only 1000baseX_Full,
which
> is
>     the only one supported.
> 
>     This module has also phys_id set to SFF, and the SFP subsystem
currently
>     does not allow to use SFP modules detected as SFFs. Add exception for
this
>     module so it can be detected as supported.

In my community, all interpretation of the SFP/QSFP/CMIS content is done in
user space.  So, these issues don't affect optoe.  I know you find that
distasteful, but it is what my community wants and needs.

> 
>     This change finally allows to detect and use SFP GPON module Ubiquiti
>     U-Fiber Instant on Linux system.
> 
>     EEPROM content of this SFP module is (where XX is serial number):
> 
>     00: 02 04 0b ff ff ff ff ff ff ff ff 03 0c 00 14 c8
???........??.??
>     10: 00 00 00 00 55 42 4e 54 20 20 20 20 20 20 20 20    ....UBNT
>     20: 20 20 20 20 00 18 e8 29 55 46 2d 49 4e 53 54 41
.??)UF-INSTA
>     30: 4e 54 20 20 20 20 20 20 34 20 20 20 05 1e 00 36    NT      4
??.6
>     40: 00 06 00 00 55 42 4e 54 XX XX XX XX XX XX XX XX
.?..UBNTXXXXXXXX
>     50: 20 20 20 20 31 34 30 31 32 33 20 20 60 80 02 41        140123
`??A
> 
> 
> commit 426c6cbc409cbda9ab1a9dbf15d3c2ef947eb8c1
> Author: Pali Rohár <pali@kernel.org>
> Date:   Mon Jan 25 16:02:27 2021 +0100
> 
>     net: sfp: add workaround for Realtek RTL8672 and RTL9601C chips
> 
>     The workaround for VSOL V2801F brand based GPON SFP modules added
> in commit
>     0d035bed2a4a ("net: sfp: VSOL V2801F / CarlitoxxPro CPGOS03-0490 v2.0
>     workaround") works only for IDs added explicitly to the list. Since
there
>     are rebranded modules where OEM vendors put different strings into the
>     vendor name field, we cannot base workaround on IDs only.

You might be fighting a losing battle here.  There are companies that buy
cheap SFP devices and reprogram the EEPROM to identify them as higher
quality parts from reliable vendors.  Checking the Vendor ID and part number
may tell you have a high quality part, but the code inside still has the
quirks you are trying to avoid.

I developed optoe while working for a (high quality) vendor of these
modules.  Their support team runs into these counterfeit parts all the time.
(I don't work for them or anyone else any more.)

> 
>     Moreover the issue which the above mentioned commit tried to work
> around is
>     generic not only to VSOL based modules, but rather to all GPON modules
>     based on Realtek RTL8672 and RTL9601C chips.
> 
>     These include at least the following GPON modules:
>     * V-SOL V2801F
>     * C-Data FD511GX-RM0
>     * OPTON GP801R
>     * BAUDCOM BD-1234-SFM
>     * CPGOS03-0490 v2.0
>     * Ubiquiti U-Fiber Instant
>     * EXOT EGS1
> 
>     These Realtek chips have broken EEPROM emulator which for N-byte read
>     operation returns just the first byte of EEPROM data, followed by N-1
>     zeros.
> 
>     Introduce a new function, sfp_id_needs_byte_io(), which detects SFP
> modules
>     with broken EEPROM emulator based on N-1 zeros and switch to 1 byte
> EEPROM
>     reading operation.
> 
>     Function sfp_i2c_read() now always uses single byte reading when it is
>     required and when function sfp_hwmon_probe() detects single byte
> access,
>     it disables registration of hwmon device, because in this case we
cannot
>     reliably and atomically read 2 bytes as is required by the standard
for
>     retrieving values from diagnostic area.
> 
>     (These Realtek chips are broken in a way that violates SFP standards
for
>     diagnostic interface. Kernel in this case simply cannot do anything
less
>     of skipping registration of the hwmon interface.)
> 
>     This patch fixes reading of EEPROM content from SFP modules based on
>     Realtek RTL8672 and RTL9601C chips. Diagnostic interface of EEPROM
stays
>     broken and cannot be fixed.
> 
> commit 624407d2cf14ff58e53bf4b2af9595c4f21d606e
> Author: Russell King <rmk+kernel@armlinux.org.uk>
> Date:   Sun Jan 10 10:58:32 2021 +0000
> 
>     net: sfp: cope with SFPs that set both LOS normal and LOS inverted
> 
>     The SFP MSA defines two option bits in byte 65 to indicate how the
>     Rx_LOS signal on SFP pin 8 behaves:
> 
>     bit 2 - Loss of Signal implemented, signal inverted from standard
>             definition in SFP MSA (often called "Signal Detect").
>     bit 1 - Loss of Signal implemented, signal as defined in SFP MSA
>             (often called "Rx_LOS").
> 
>     Clearly, setting both bits results in a meaningless situation: it
would
>     mean that LOS is implemented in both the normal sense (1 = signal
loss)
>     and inverted sense (0 = signal loss).
> 
>     Unfortunately, there are modules out there which set both bits, which
>     will be initially interpret as "inverted" sense, and then, if the LOS
>     signal changes state, we will toggle between LINK_UP and WAIT_LOS
>     states.
> 
>     Change our LOS handling to give well defined behaviour: only interpret
>     these bits as meaningful if exactly one is set, otherwise treat it as
>     if LOS is not implemented.
> 
> ommit 0d035bed2a4a6c4878518749348be61bf082d12a
> Author: Russell King <rmk+kernel@armlinux.org.uk>
> Date:   Wed Dec 9 11:22:49 2020 +0000
> 
>     net: sfp: VSOL V2801F / CarlitoxxPro CPGOS03-0490 v2.0 workaround
> 
>     Add a workaround for the detection of VSOL V2801F / CarlitoxxPro
>     CPGOS03-0490 v2.0 GPON module which CarlitoxxPro states needs single
>     byte I2C reads to the EEPROM.
> 
>     Pali Rohár reports that he also has a CarlitoxxPro-based V2801F
module,
>     which reports a manufacturer of "OEM". This manufacturer can't be
>     matched as it appears in many different modules, so also match the
part
>     number too.
> 
> etc.
> 
> Now, it could be you only work with TOR switches and their SFP modules.
> And they follow the standard in a better way than others. But the kernel
is
> used in many more environments that just data centers.
> We need to support SFPs in FTTH boxes, in aircraft inflight entertainment
> systems, industrial control etc.

Totally agree.  As a replacement for sfp_i2c_read() and sfp_i2c_write(),
optoe would achieve everything the existing code does, plus it would support
paged devices.  Works the same, but covers more capabilities.

AND, for TOR switches, and their SFP AND QSFP AND CMIS modules, it works
too, even when they don't adopt linux kernel networking.  Unfortunately you
have decided that their architecture, handling networking outside the
kernel, is not allowed.  My community does not get to have the perfectly
good code they use to access these devices accepted into the mainline
kernel.

> 
> 
> 	Andrew

Don


^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
  2021-03-20 16:10                             ` Pali Rohár
@ 2021-03-26 18:43                               ` Don Bollinger
  2021-03-29 22:13                                 ` Pali Rohár
  0 siblings, 1 reply; 30+ messages in thread
From: Don Bollinger @ 2021-03-26 18:43 UTC (permalink / raw)
  To: 'Pali Rohár'
  Cc: 'Andrew Lunn', 'Jakub Kicinski',
	arndb, gregkh, linux-kernel, brandon_chuang, wally_wang,
	aken_liu, gulv, jolevequ, xinxliu, 'netdev',
	'Moshe Shemesh',
	don

> Hello Don!
> 
> I have read whole discussion and your EEPROM patch proposal. But for me it
> looks like some kernel glue code for some old legacy / proprietary access
> method which does not have any usage outside of that old code.

I don't know if 'kernel glue code' is good or bad.  It is a driver.  It
locks access to a device so it can perform multiple accesses without
interference.  It organizes the data on a weird device into a simple linear
address space that can be accessed with open(), seek(), read() and write()
calls.

As for 'old code', this code and variations of it are under active
development by multiple Network OS vendors and multiple switch vendors, and
in production on hundreds of thousands of switches with millions of
SFP/QSFP/CMIS devices.  This stuff is running the biggest clouds in the
world.

> 
> Your code does not contain any quirks which are needed to read different
> EEPROMs in different SFP modules. As Andrew wrote there are lot of broken
> SFPs which needs special handling and this logic is already implemented in
> sfp.c and sfp-bus.c kernel drivers. These drivers then export EEPROM
> content to userspace via ethtool -m API in unified way and userspace does
> not implement any quirks (nor does not have to deal with quirks).

As a technical matter, you handle those quirks in the code that interprets
EEPROM data.  You have figured out what devices have what quirks, then coded
up solutions to make them work.  Then, after all the quirk handling is done,
you call the actual access routines (sfp_i2c_read() and sfp_i2c_write()) to
access the module EEPROMs.  My code works the same way, except in my
community all the interpretation of EEPROM data is done in user space.  You
may not like that, but it is not clear why you should care how my community
chooses to handle the data.  In this architecture, the only thing needed
from the kernel is the equivalent of sfp_i2c_read() and sfp_i2c_write, which
optoe provides.  The key point here is that my community wants the kernel to
just access the data.  No interpretation, no identification, no special
cases.

> 
> If you try to read EEPROM "incorrectly" then SFP module with its EEPROM
> chip (or emulation of chip) locks and is fully unusable after you unplug
it and
> plug it again. Kernel really should not export API to userspace which can
> cause "damage" to SFP modules. And currently it does *not* do it.

In my community, such devices are not tolerated.  Modules which can be
"damaged" should be thrown away.

Please be clear, I am not disagreeing with your implementation.  For your
GPON devices, you handle this in kernel code.  Cool.  Keep it that way.
Just don't make my community implement that where it is not needed and not
wanted.  Optoe just accesses the device.  Other layers handle interpretation
and quirks.  Your handling is in sfp.c, mine is in user space.  Not right or
wrong, just different.  Both work.

> 
> I have contributed code for some GPON SFP modules, so their EEPROM can
> be correctly read and exported to userspace via ethtool -m. So I know that
> this is very fragile area and needs to be properly handled.

My code is in use in cloud datacenters and campus switches around the world.
I know it needs to be properly handled.

> 
> So I do not see any reason why can be a new optoe method in _current_
> form useful. It does not implemented required things for handling
different
> EEPROM modules.

optoe would be useful in your current environment as a replacement for
sfp_i2c_read() and sfp_i2c_write().  Those routines just access the EEPROM.
They don't identify or interpret or implement quirk handling.  Neither does
optoe.

AND, optoe is useful to my community.  An ethtool -m solution could of
course be implemented, and all of the user space code that currently
accesses module EEPROM could be rewritten, but there would be no value in
that to my community.  What they have works just fine.

> 
> I would rather suggest you to use ethtool -m IOCTL API and in case it is
not
> suitable for QSFP (e.g. because of paging system) then extend this API.

optoe already handles QSFP and CMIS just fine.  The API does not need to be
extended for pages.  Indeed, the ethtool API has already implemented the
same linear address space to flatten out the two i2c addresses plus pages in
the various SFP/QSFP/CMIS specs.  optoe flattens the same way.

> 
> There were already proposals for using netlink socket interface which is
> today de-facto standard interface for new code. sysfs API for such thing
> really looks like some legacy code and nowadays we have better access
> methods.

netlink is the de-facto standard interface for kernel networking.  My
community does not have kernel networking (except for a puny management
port).  All of the switch networking is handled by the switch ASIC, and in
user space network management software.  Which is 'better' is complicated,
depending on the needs and requirements of the application.  A large and
vibrant community is using a different architecture.  All I am asking for is
to submit a simple kernel driver that this community has found useful.

> 
> If you want, I can help you with these steps and make patches to be in
> acceptable state; not written in "legacy" style. As I'm working with GPON
SFP
> modules with different EEPROMs in them, I'm really interested in any
> improvements in their EEPROM area.

You will find this odd, but I don't actually have any way to test anything
using the kernel network stack with these devices.  The only hardware I have
treats SFP/QSFP/CMIS devices as i2c addresses.  There is no concept of these
being network devices in the linux kernels I'm running.  So, I'll turn your
offer around...  I can help you improve your EEPROM access code, unless you
already have all the good stuff.  The things optoe does that
sfp_i2c_read/write() don't do are:

Page support:  Check whether pages are supported, then lock the device,
write the page register as needed, and return it to 0 when finished.
Check legal access:  Based on page support, and which spec (SFP, QSFP,
CMIS), figure out if the offset and length are legal.  This is more
interesting when flattening to a linear address space, less interesting in
the new API with i2c_addr, page, bank, offset, len all specified.
Support all 256 architected pages:  There are interesting capabilities that
vendors provide in spec-approved 'proprietary' pages.  Don't ask, don't
interpret, don't deny access.  Just execute the requested read/write.

Don


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
  2021-03-26 18:43                                 ` Don Bollinger
@ 2021-03-26 19:46                                   ` Andrew Lunn
  2021-03-26 20:16                                     ` Don Bollinger
  0 siblings, 1 reply; 30+ messages in thread
From: Andrew Lunn @ 2021-03-26 19:46 UTC (permalink / raw)
  To: Don Bollinger
  Cc: 'Jakub Kicinski',
	arndb, gregkh, linux-kernel, brandon_chuang, wally_wang,
	aken_liu, gulv, jolevequ, xinxliu, 'netdev',
	'Moshe Shemesh'

> In my community, the SFP/QSFP/CMIS devices (32 to 128 of them per switch)
> often cost more than the switch itself.  Consumers (both vendors and
> customers) extensively test these devices to ensure correct and reliable
> operation.  Then they buy them literally by the millions.  Quirks lead to
> quick rejection in favor of reliable parts from reliable vendors.  In this
> environment, for completely different reasons, optoe does not need to handle
> quirks.

Well, if optoe were to be merged, it would not be just for your
community. It has to work for everybody who wants to use the Linux
kernel with an SFP. You cannot decide to add a KAPI which just
supports a subset of SFPs. It needs to support as many as possible,
warts and all.

So how would you handle these SFPs with the optoe KAPI?

   Andrew

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
  2021-03-26 19:46                                   ` Andrew Lunn
@ 2021-03-26 20:16                                     ` Don Bollinger
  2021-03-26 20:37                                       ` Andrew Lunn
  0 siblings, 1 reply; 30+ messages in thread
From: Don Bollinger @ 2021-03-26 20:16 UTC (permalink / raw)
  To: 'Andrew Lunn'
  Cc: 'Jakub Kicinski',
	arndb, gregkh, linux-kernel, brandon_chuang, wally_wang,
	aken_liu, gulv, jolevequ, xinxliu, 'netdev',
	'Moshe Shemesh',
	don

> > In my community, the SFP/QSFP/CMIS devices (32 to 128 of them per
> > switch) often cost more than the switch itself.  Consumers (both
> > vendors and
> > customers) extensively test these devices to ensure correct and
> > reliable operation.  Then they buy them literally by the millions.
> > Quirks lead to quick rejection in favor of reliable parts from
> > reliable vendors.  In this environment, for completely different
> > reasons, optoe does not need to handle quirks.
> 
> Well, if optoe were to be merged, it would not be just for your community.
It
> has to work for everybody who wants to use the Linux kernel with an SFP.
> You cannot decide to add a KAPI which just supports a subset of SFPs. It
> needs to support as many as possible, warts and all.
> 
> So how would you handle these SFPs with the optoe KAPI?

Just like they are handled now.  Folks who use your stack would filter
through sfp.c, with all the quirk handling, and eventually call optoe for
the actual device access.  Folks who don't use kernel networking would call
optoe directly and limit themselves to well behaved SFPs, or would handle
quirks in user space.  You think that's dumb, but there is clearly a market
for that approach.  It is working, at scale, today.

BTW, why can't we have a driver that only handles devices that work
correctly?  Indeed, why would we tolerate an endless stream of special cases
for each new device that comes along?

Don


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
  2021-03-26 20:16                                     ` Don Bollinger
@ 2021-03-26 20:37                                       ` Andrew Lunn
  2021-03-26 21:09                                         ` Don Bollinger
  0 siblings, 1 reply; 30+ messages in thread
From: Andrew Lunn @ 2021-03-26 20:37 UTC (permalink / raw)
  To: Don Bollinger
  Cc: 'Jakub Kicinski',
	arndb, gregkh, linux-kernel, brandon_chuang, wally_wang,
	aken_liu, gulv, jolevequ, xinxliu, 'netdev',
	'Moshe Shemesh'

On Fri, Mar 26, 2021 at 01:16:14PM -0700, Don Bollinger wrote:
> > > In my community, the SFP/QSFP/CMIS devices (32 to 128 of them per
> > > switch) often cost more than the switch itself.  Consumers (both
> > > vendors and
> > > customers) extensively test these devices to ensure correct and
> > > reliable operation.  Then they buy them literally by the millions.
> > > Quirks lead to quick rejection in favor of reliable parts from
> > > reliable vendors.  In this environment, for completely different
> > > reasons, optoe does not need to handle quirks.
> > 
> > Well, if optoe were to be merged, it would not be just for your community.
> It
> > has to work for everybody who wants to use the Linux kernel with an SFP.
> > You cannot decide to add a KAPI which just supports a subset of SFPs. It
> > needs to support as many as possible, warts and all.
> > 
> > So how would you handle these SFPs with the optoe KAPI?
> 
> Just like they are handled now.  Folks who use your stack would filter
> through sfp.c, with all the quirk handling, and eventually call optoe for
> the actual device access.  Folks who don't use kernel networking would call
> optoe directly and limit themselves to well behaved SFPs, or would handle
> quirks in user space.  You think that's dumb, but there is clearly a market
> for that approach.  It is working, at scale, today.
> 
> BTW, why can't we have a driver

You keep missing the point. I always refer to the KAPI. The driver we
can rework and rework, throw away and reimplement, as much as we want.
The KAPI cannot be changed, it is ABI. It is pretty much frozen the
day the code is first committed.

The optoe KAPI needs to handle these 'interesting' SFP modules. The
KAPI design needs to be flexible enough that the driver underneath it
can be extended to support these SFPs. The code does not need to be
there, but the KAPI design needs to allow it. And i personally cannot
see how the optoe KAPI can efficiently support these SFPs.

    Andrew

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
  2021-03-26 20:37                                       ` Andrew Lunn
@ 2021-03-26 21:09                                         ` Don Bollinger
  2021-03-26 21:54                                           ` Andrew Lunn
  2021-03-27 12:40                                           ` Greg KH
  0 siblings, 2 replies; 30+ messages in thread
From: Don Bollinger @ 2021-03-26 21:09 UTC (permalink / raw)
  To: 'Andrew Lunn'
  Cc: 'Jakub Kicinski',
	arndb, gregkh, linux-kernel, brandon_chuang, wally_wang,
	aken_liu, gulv, jolevequ, xinxliu, 'netdev',
	'Moshe Shemesh',
	don

On Fri, Mar 26, 2021 at 01:37PM -0700, Andrew Lunn wrote:
> On Fri, Mar 26, 2021 at 01:16:14PM -0700, Don Bollinger wrote:
> > > > In my community, the SFP/QSFP/CMIS devices (32 to 128 of them per
> > > > switch) often cost more than the switch itself.  Consumers (both
> > > > vendors and
> > > > customers) extensively test these devices to ensure correct and
> > > > reliable operation.  Then they buy them literally by the millions.
> > > > Quirks lead to quick rejection in favor of reliable parts from
> > > > reliable vendors.  In this environment, for completely different
> > > > reasons, optoe does not need to handle quirks.
> > >
> > > Well, if optoe were to be merged, it would not be just for your
> community.
> > It
> > > has to work for everybody who wants to use the Linux kernel with an
> SFP.
> > > You cannot decide to add a KAPI which just supports a subset of
> > > SFPs. It needs to support as many as possible, warts and all.
> > >
> > > So how would you handle these SFPs with the optoe KAPI?
> >
> > Just like they are handled now.  Folks who use your stack would filter
> > through sfp.c, with all the quirk handling, and eventually call optoe
> > for the actual device access.  Folks who don't use kernel networking
> > would call optoe directly and limit themselves to well behaved SFPs,
> > or would handle quirks in user space.  You think that's dumb, but
> > there is clearly a market for that approach.  It is working, at scale,
today.
> >
> > BTW, why can't we have a driver
> 
> You keep missing the point. I always refer to the KAPI. The driver we can
> rework and rework, throw away and reimplement, as much as we want.
> The KAPI cannot be changed, it is ABI. It is pretty much frozen the day
the
> code is first committed.

Maybe I don't understand what you mean by KAPI.  The KAPI that optoe exposes
is in two parts.

First, it makes the EEPROM accessible via the nvmem() interface, an existing
KAPI that I call from optoe.  at24 implemented it, I made use of it.  This
interface exposes EEPROM data to user space through a defined sysfs() file.
I didn't invent this, nor am I proposing it, it already exists.

Second, it specifies how the data in the EEPROM is accessed.  It says that
low memory is in offset 0-127, and paged memory starts at offset (128 +
(page * 128)).  For SFP devices, memory at i2c address 0x50 is the first 256
bytes, and everything at 0x51 is pushed up 256 bytes.  This is exactly the
behavior of ethtool.  Again, I didn't invent this, I adopted the existing
convention for how to flatten the SFP/QSFP/CMIS address space.

With these two parts, EEPROM data can be accessed by standard open(2),
seek(2), read(2), write(2) calls.  Nothing special there, the actual syntax
is as old school and standard as you can possibly get.

So, what is wrong with that KAPI?

The only thing wrong I can see is that it doesn't use the kernel network
stack.  Here's where "you keep missing the point".  The kernel network stack
is not being used in these systems.  Implementing EEPROM access through
ethtool is of course possible, but it is a lot of work with no benefit on
these systems.  The simple approach works.  Let me use it.

> 
> The optoe KAPI needs to handle these 'interesting' SFP modules. The KAPI
> design needs to be flexible enough that the driver underneath it can be
> extended to support these SFPs. The code does not need to be there, but
> the KAPI design needs to allow it. And i personally cannot see how the
optoe
> KAPI can efficiently support these SFPs.

Help me understand.  Your KAPI specifies ethtool as the KAPI, and the
ethtool/netlink stack as the interface through which the data flows.  As I
see it, your KAPI only specifies i2c address, page, bank, offset and length.
How does your KAPI support interesting SFP modules but mine does not?  optoe
could be reworked ad infinitum to implement support for quirks, just as
yours can.  Neither has any hint of quirks in the KAPI.

Don


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
  2021-03-26 21:09                                         ` Don Bollinger
@ 2021-03-26 21:54                                           ` Andrew Lunn
  2021-03-26 22:30                                             ` Don Bollinger
  2021-03-27 12:40                                           ` Greg KH
  1 sibling, 1 reply; 30+ messages in thread
From: Andrew Lunn @ 2021-03-26 21:54 UTC (permalink / raw)
  To: Don Bollinger
  Cc: 'Jakub Kicinski',
	arndb, gregkh, linux-kernel, brandon_chuang, wally_wang,
	aken_liu, gulv, jolevequ, xinxliu, 'netdev',
	'Moshe Shemesh'

> The only thing wrong I can see is that it doesn't use the kernel network
> stack.  Here's where "you keep missing the point".  The kernel network stack
> is not being used in these systems.  Implementing EEPROM access through
> ethtool is of course possible, but it is a lot of work with no benefit on
> these systems.  The simple approach works.  Let me use it.
> 
> > 
> > The optoe KAPI needs to handle these 'interesting' SFP modules. The KAPI
> > design needs to be flexible enough that the driver underneath it can be
> > extended to support these SFPs. The code does not need to be there, but
> > the KAPI design needs to allow it. And i personally cannot see how the
> optoe
> > KAPI can efficiently support these SFPs.
> 
> Help me understand.  Your KAPI specifies ethtool as the KAPI, and the
> ethtool/netlink stack as the interface through which the data flows.

Nearly. It specifies netlink and the netlink messages definitions.
They happen to be in the ethtool group, but there is no requirement to
use ethtool(1).

But there is a bit more to it.

Either the MAC driver implements the needed code, or it defers to the
generic sfp.c code. In both cases, you have access to the GPIO
lines.

If you are using the generic sfp.c, the Device Tree definitions of the
GPIOs become part of the KAPI. DT is consider ABI.

So the low level code knows when there was been a hotplug. It then can
determine what quirks to apply for all future reads until the module
is unplugged.

Your KAPI is missing how optoe gets access to the GPIOs. Without
knowing if the module has been hotplugged, in a robust manor, you have
problems with quirks. For every userspace read, you need to assume the
module has been changed, read the ID information from the EEPROM a
byte at a time, figure out what quirks to apply, and then do the user
requested read. I doubt that is good for performance. The design which
has been chosen is that userspace is monitoring the GPIO lines. So to
make it efficient, your KAPI need some way to pass down that the
module has/has not been hot-plugged since the last read.

Or do you see some other way to implement these quirks?

       Andrew

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
  2021-03-26 21:54                                           ` Andrew Lunn
@ 2021-03-26 22:30                                             ` Don Bollinger
  2021-03-27 15:25                                               ` Andrew Lunn
  0 siblings, 1 reply; 30+ messages in thread
From: Don Bollinger @ 2021-03-26 22:30 UTC (permalink / raw)
  To: 'Andrew Lunn'
  Cc: 'Jakub Kicinski',
	arndb, gregkh, linux-kernel, brandon_chuang, wally_wang,
	aken_liu, gulv, jolevequ, xinxliu, 'netdev',
	'Moshe Shemesh',
	don



> -----Original Message-----
> From: Andrew Lunn [mailto:andrew@lunn.ch]
> Sent: Friday, March 26, 2021 2:54 PM
> To: Don Bollinger <don@thebollingers.org>
> Cc: 'Jakub Kicinski' <kuba@kernel.org>; arndb@arndb.de;
> gregkh@linuxfoundation.org; linux-kernel@vger.kernel.org;
> brandon_chuang@edge-core.com; wally_wang@accton.com;
> aken_liu@edge-core.com; gulv@microsoft.com; jolevequ@microsoft.com;
> xinxliu@microsoft.com; 'netdev' <netdev@vger.kernel.org>; 'Moshe
> Shemesh' <moshe@nvidia.com>
> Subject: Re: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS
> EEPROMS
> 
> > The only thing wrong I can see is that it doesn't use the kernel
> > network stack.  Here's where "you keep missing the point".  The kernel
> > network stack is not being used in these systems.  Implementing EEPROM
> > access through ethtool is of course possible, but it is a lot of work
> > with no benefit on these systems.  The simple approach works.  Let me
use
> it.
> >
> > >
> > > The optoe KAPI needs to handle these 'interesting' SFP modules. The
> > > KAPI design needs to be flexible enough that the driver underneath
> > > it can be extended to support these SFPs. The code does not need to
> > > be there, but the KAPI design needs to allow it. And i personally
> > > cannot see how the
> > optoe
> > > KAPI can efficiently support these SFPs.
> >
> > Help me understand.  Your KAPI specifies ethtool as the KAPI, and the
> > ethtool/netlink stack as the interface through which the data flows.
> 
> Nearly. It specifies netlink and the netlink messages definitions.
> They happen to be in the ethtool group, but there is no requirement to use
> ethtool(1).
> 
> But there is a bit more to it.
> 
> Either the MAC driver implements the needed code, or it defers to the
> generic sfp.c code. In both cases, you have access to the GPIO lines.
> 
> If you are using the generic sfp.c, the Device Tree definitions of the
GPIOs
> become part of the KAPI. DT is consider ABI.
> 
> So the low level code knows when there was been a hotplug. It then can
> determine what quirks to apply for all future reads until the module is
> unplugged.

Got it.  All good.  I agree, I would like optoe to have access to the GPIO
lines, but it is a "nice to have", not a requirement...

> 
> Your KAPI is missing how optoe gets access to the GPIOs. Without knowing
if

Right.  It doesn't get access to the GPIOs.  So, that is not part of its
KAPI.

> the module has been hotplugged, in a robust manor, you have problems
> with quirks. For every userspace read, you need to assume the module has
> been changed, read the ID information from the EEPROM a byte at a time,
> figure out what quirks to apply, and then do the user requested read. I
doubt

Again, optoe does not deal with quirks.  Your code filters them out before
calling optoe or sfp_i2c_read.  My stack does not deal with them.  In my
community, quirky devices are not tolerated.  In the 4 years this code has
been in use, nobody has ever asked me to accommodate a weird module.

I inherited a limitation of writing one byte at a time from the ancestor of
optoe, which I have kept.  I don't know if it is needed, but someone once
thought it was required.  I apply it universally, all devices of all types.

I do need one item of state from the EEPROM, the register that tells me
whether pages are supported by the device.  Due to the hotplug risk, I
actually do read that register once for each access to non-zero pages.
(Once per read or write call.)  Access to GPIOs would eliminate that.  It
turns out the vast majority of calls are to page 0 or lower memory, and the
performance penalty is at most 25% since there is also a page write, data
access and page write in the same call.  The penalty goes down as the number
of bytes read increases.  Overall, it has never come up as an issue.  People
don't expect these things to be fast.

> that is good for performance. The design which has been chosen is that
> userspace is monitoring the GPIO lines. So to make it efficient, your KAPI
> need some way to pass down that the module has/has not been hot-
> plugged since the last read.

Nope.  optoe does not need to know, it assumes a new device every time it is
accessed.

> 
> Or do you see some other way to implement these quirks?

What I have works.  Your consumers get quirk handling, mine don't need it.
No compromise.

> 
>        Andrew

Don


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
  2021-03-26 21:09                                         ` Don Bollinger
  2021-03-26 21:54                                           ` Andrew Lunn
@ 2021-03-27 12:40                                           ` Greg KH
  1 sibling, 0 replies; 30+ messages in thread
From: Greg KH @ 2021-03-27 12:40 UTC (permalink / raw)
  To: Don Bollinger
  Cc: 'Andrew Lunn', 'Jakub Kicinski',
	arndb, linux-kernel, brandon_chuang, wally_wang, aken_liu, gulv,
	jolevequ, xinxliu, 'netdev', 'Moshe Shemesh'

On Fri, Mar 26, 2021 at 02:09:36PM -0700, Don Bollinger wrote:
> > You keep missing the point. I always refer to the KAPI. The driver we can
> > rework and rework, throw away and reimplement, as much as we want.
> > The KAPI cannot be changed, it is ABI. It is pretty much frozen the day
> the
> > code is first committed.
> 
> Maybe I don't understand what you mean by KAPI.  The KAPI that optoe exposes
> is in two parts.
> 
> First, it makes the EEPROM accessible via the nvmem() interface, an existing
> KAPI that I call from optoe.  at24 implemented it, I made use of it.  This
> interface exposes EEPROM data to user space through a defined sysfs() file.
> I didn't invent this, nor am I proposing it, it already exists.

Again, a "raw" interface to a device that is just memory-mapping all of
the device information directly is no sort of a real KABI at all.

It is no different from trying to use /dev/mem/ to write a networking
driver, just because you can mmap in the device's configuration space to
userspace.

That is not a real api, it is only using the kernel as a "pass-through"
which works fine for one-off devices, and other oddities, but is not a
unified user/kernel api for a class of device types at all.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
  2021-03-26 22:30                                             ` Don Bollinger
@ 2021-03-27 15:25                                               ` Andrew Lunn
  2021-03-27 21:20                                                 ` Don Bollinger
  0 siblings, 1 reply; 30+ messages in thread
From: Andrew Lunn @ 2021-03-27 15:25 UTC (permalink / raw)
  To: Don Bollinger
  Cc: 'Jakub Kicinski',
	arndb, gregkh, linux-kernel, brandon_chuang, wally_wang,
	aken_liu, gulv, jolevequ, xinxliu, 'netdev',
	'Moshe Shemesh'

> What I have works.  Your consumers get quirk handling, mine don't need it.
> No compromise.

Hi Don

All this discussion is now a mute point. GregKH has spoken.

But i'm sure there are some on the side lines, eating popcorn, maybe
learning from the discussion.

Would you think it is O.K. to add a KAPI which works for 3 1/2" SCSI
disks, but not 2", because you only make machines with 3 1/2" bays?

This is an extreme, absurd example, but hopefully you get the
point. We don't design KAPIs with the intention to only work for a
subset of devices. It needs to work with as many devices as possible,
even if the first implementation below the KAPI is limited to just a
subset.

Anyway, i'm gratefull you have looked at the new ethtool netlink
KAPI. It will be better for your contributions. And i hope you can
make use of it in the future. But i think this discussion about optoe
in mainline is over.

     Andrew

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
  2021-03-27 15:25                                               ` Andrew Lunn
@ 2021-03-27 21:20                                                 ` Don Bollinger
  0 siblings, 0 replies; 30+ messages in thread
From: Don Bollinger @ 2021-03-27 21:20 UTC (permalink / raw)
  To: 'Andrew Lunn'
  Cc: 'Jakub Kicinski',
	arndb, gregkh, linux-kernel, brandon_chuang, wally_wang,
	aken_liu, gulv, jolevequ, xinxliu, 'netdev',
	'Moshe Shemesh',
	don

> > What I have works.  Your consumers get quirk handling, mine don't need
it.
> > No compromise.
> 
> Hi Don
> 
> All this discussion is now a mute point. GregKH has spoken.

Ack.  I actually checked in with Greg a couple of days ago and got that
answer.  I just thought the discussion was going in an interesting direction
and didn't want to end it yet.  Response below is in the same vein.

> 
> But i'm sure there are some on the side lines, eating popcorn, maybe
> learning from the discussion.

Honestly not sure what they are learning from the discussion.  I think what
I learned is not what you think you taught me.

> 
> Would you think it is O.K. to add a KAPI which works for 3 1/2" SCSI
disks, but
> not 2", because you only make machines with 3 1/2" bays?

No.  Not sure how the analogy works.  QSFP is a larger form factor than SFP,
and the EEPROM layout changed at the same time, but optoe and my community
had no problem accommodating both.  CMIS changed the EEPROM layout again,
but it was easily accommodated.

> 
> This is an extreme, absurd example, but hopefully you get the point. We
> don't design KAPIs with the intention to only work for a subset of
devices. It
> needs to work with as many devices as possible, even if the first
> implementation below the KAPI is limited to just a subset.

Let me run with your example...  Suppose I used those 3 1/2" SCSI disks to
build storage servers.  Let's call them RAID devices.  Innovation follows, I
figure out how to make these devices hot swappable, hot backup, massively
parallel...  Innovation follows and I evolve this into a distributed
architecture with redundant data, encrypted, compressed, distributed across
servers and data centers.  Massively parallel, synchronized, access to the
data anywhere in the world, at bandwidth limited only by the size of your
network pipe.  Let's call it RIDSS (Redundant, Independent, Distributed
Storage Servers).  And I can use 2" disks.  Or non-spinning storage.

My fancy technology thinks the storage is dumb.  I present a
track/sector/length and I get back the bits.  Just for grins, let's say I
can also query the device for a list of bad blocks (sectors, whatever).

Recently, with the addition of 2" devices, YOU figured out that you can
stuff several disks into a small space, and manage them with software and
call it RAID.  You build a nice abstraction for storage, which contains
individual disks, and handles parallelism, redundancy, hot swap.  Very cool,
works well, a genuine innovation.

I've been using Linux for RIDSS for years, I get the memo that my linux
driver to access these SCSI devices should be in the upstream kernel.
Oddly, there are no drivers in the kernel that I can just present
track/sector/length and get back the bits.  So, I offer mine.  The answer is
no, that is a RAID device, you need to access the device through RAID data
structures and APIs.

Sorry, no, that is not a RAID device.  That is a storage device.  You use it
for RAID storage, I use it for RIDSS storage.  We need a layered
architecture with raw data access at the bottom (we both need the same
track/sector/length access).  Beyond that, your RAID stack, while brilliant,
has virtually nothing to do with my RIDSS stack.  They sound superficially
similar, but the technology and architecture are wildly different.  The RAID
stack is unnecessary for my RIDSS architecture, and requires widespread
changes to my software that yield no benefit.

So, no, I don't get your point.  I think there is value in a simple layered
architecture, where access to the module EEPROM is independent of the
consumer of that access.  You can access it for kernel networking, which is
useful, innovative, valuable.  I can access it for TOR switches which do not
use kernel networking but are nonetheless useful, innovative, valuable.
Your decision to reject optoe means the TOR community has to maintain this
simple bit of kernel code outside the mainline tree.  The judges have ruled,
case closed.

> 
> Anyway, i'm gratefull you have looked at the new ethtool netlink KAPI. It
will
> be better for your contributions. And i hope you can make use of it in the

Thanks for the acknowledgement.  

> future. But i think this discussion about optoe in mainline is over.

This discussion is indeed over if you say it is.  I'll be moving on :-(.

> 
>      Andrew

Don


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS
  2021-03-26 18:43                               ` Don Bollinger
@ 2021-03-29 22:13                                 ` Pali Rohár
  0 siblings, 0 replies; 30+ messages in thread
From: Pali Rohár @ 2021-03-29 22:13 UTC (permalink / raw)
  To: Don Bollinger
  Cc: 'Andrew Lunn', 'Jakub Kicinski',
	arndb, gregkh, linux-kernel, brandon_chuang, wally_wang,
	aken_liu, gulv, jolevequ, xinxliu, 'netdev',
	'Moshe Shemesh'

On Friday 26 March 2021 11:43:10 Don Bollinger wrote:
> > Hello Don!
> > 
> > I have read whole discussion and your EEPROM patch proposal. But for me it
> > looks like some kernel glue code for some old legacy / proprietary access
> > method which does not have any usage outside of that old code.
> 
> I don't know if 'kernel glue code' is good or bad.  It is a driver.  It
> locks access to a device so it can perform multiple accesses without
> interference.  It organizes the data on a weird device into a simple linear
> address space that can be accessed with open(), seek(), read() and write()
> calls.
> 
> As for 'old code', this code and variations of it are under active
> development by multiple Network OS vendors and multiple switch vendors, and
> in production on hundreds of thousands of switches with millions of
> SFP/QSFP/CMIS devices.  This stuff is running the biggest clouds in the
> world.
> 
> > 
> > Your code does not contain any quirks which are needed to read different
> > EEPROMs in different SFP modules. As Andrew wrote there are lot of broken
> > SFPs which needs special handling and this logic is already implemented in
> > sfp.c and sfp-bus.c kernel drivers. These drivers then export EEPROM
> > content to userspace via ethtool -m API in unified way and userspace does
> > not implement any quirks (nor does not have to deal with quirks).
> 
> As a technical matter, you handle those quirks in the code that interprets
> EEPROM data.  You have figured out what devices have what quirks, then coded
> up solutions to make them work.  Then, after all the quirk handling is done,
> you call the actual access routines (sfp_i2c_read() and sfp_i2c_write()) to
> access the module EEPROMs.  My code works the same way, except in my
> community all the interpretation of EEPROM data is done in user space.  You
> may not like that, but it is not clear why you should care how my community
> chooses to handle the data.  In this architecture, the only thing needed
> from the kernel is the equivalent of sfp_i2c_read() and sfp_i2c_write, which
> optoe provides.  The key point here is that my community wants the kernel to
> just access the data.  No interpretation, no identification, no special
> cases.
> 
> > 
> > If you try to read EEPROM "incorrectly" then SFP module with its EEPROM
> > chip (or emulation of chip) locks and is fully unusable after you unplug
> it and
> > plug it again. Kernel really should not export API to userspace which can
> > cause "damage" to SFP modules. And currently it does *not* do it.
> 
> In my community, such devices are not tolerated.  Modules which can be
> "damaged" should be thrown away.

Well, those tested / problematic modules are not damaged by real. They
just stop working until you do power off / on cycle and unplug / plug
them again.

But I agree with you. Russel wrote about those SFP modules that they
should have biohazard label :-)

The best would be if those modules disappear, but that would not happen
in near future.

These modules are already used by lot of users and users wanted support
for them. Hence we wrote what was possible.

The main issue is that we do know know any well-behaved GPON SFP module.

GPON is now very common technology for internet access, so it is not
obvious that more people are asking about support for GPON HW compatible
with Linux kernel.

> Please be clear, I am not disagreeing with your implementation.  For your
> GPON devices, you handle this in kernel code.  Cool.  Keep it that way.
> Just don't make my community implement that where it is not needed and not
> wanted.  Optoe just accesses the device.  Other layers handle interpretation
> and quirks.  Your handling is in sfp.c, mine is in user space.  Not right or
> wrong, just different.  Both work.
> 
> > 
> > I have contributed code for some GPON SFP modules, so their EEPROM can
> > be correctly read and exported to userspace via ethtool -m. So I know that
> > this is very fragile area and needs to be properly handled.
> 
> My code is in use in cloud datacenters and campus switches around the world.
> I know it needs to be properly handled.
> 
> > 
> > So I do not see any reason why can be a new optoe method in _current_
> > form useful. It does not implemented required things for handling
> different
> > EEPROM modules.
> 
> optoe would be useful in your current environment as a replacement for
> sfp_i2c_read() and sfp_i2c_write().  Those routines just access the EEPROM.
> They don't identify or interpret or implement quirk handling.  Neither does
> optoe.

Now that we know that there are SFP modules which needs special handling
without it they lock themselves on i2c bus, we really cannot export to
userspace interface which allows this locking.

I understand that you do not care for these broken GPON SFP modules, but
"generic" API which would be part of mainline kernel really cannot have
known issue that can cause some modules to lock by just simple "read"
operation.

> AND, optoe is useful to my community.  An ethtool -m solution could of
> course be implemented, and all of the user space code that currently
> accesses module EEPROM could be rewritten, but there would be no value in
> that to my community.  What they have works just fine.

I understand that your API is useful for you and your existing projects.

> > 
> > I would rather suggest you to use ethtool -m IOCTL API and in case it is
> not
> > suitable for QSFP (e.g. because of paging system) then extend this API.
> 
> optoe already handles QSFP and CMIS just fine.  The API does not need to be
> extended for pages.  Indeed, the ethtool API has already implemented the
> same linear address space to flatten out the two i2c addresses plus pages in
> the various SFP/QSFP/CMIS specs.  optoe flattens the same way.
> 
> > 
> > There were already proposals for using netlink socket interface which is
> > today de-facto standard interface for new code. sysfs API for such thing
> > really looks like some legacy code and nowadays we have better access
> > methods.
> 
> netlink is the de-facto standard interface for kernel networking.  My
> community does not have kernel networking (except for a puny management
> port).  All of the switch networking is handled by the switch ASIC, and in
> user space network management software.  Which is 'better' is complicated,
> depending on the needs and requirements of the application.  A large and
> vibrant community is using a different architecture.  All I am asking for is
> to submit a simple kernel driver that this community has found useful.

I think that this is the main issue here. You are proposing a new API
for kernel networking code without being user of kernel networking
subsystem. Obviously your code is not directly designed for kernel
networking (as you do not use it and do not need it) and that is why
proposed new API looks like it is -- it perfectly fits your usage.

But for those who are using kernel networking code, proposed new API
does not "fit" into existing networking APIs.

Proposing merging a new API, which is going to replace some existing
API, needs very good arguments. Existing API in this case is ethtool -m.
Replacing it by optoe needs to explain in details why ethtool -m API
cannot be extended for a new functionality which is provided by optoe.

Kernel really do not need two APIs for userspace which provides same set
of functionality.

But I think that main issue is here the fact, that you have a lot of
userspace code already in production which is using this proposed kernel
API and you do not have capacity to change existing code in production
to use different kernel API.

Note that having something in production for years and baked by lot of
vendors, does not mean that would be automatically merged into mainline
kernel.

Anyway, kernel has already API for 'network switching in ASIC' with any
other HW offloading (vlan tagging, ...), so it is possible to use kernel
code and kernel drivers to manage big data center switches. IIRC this
DSA switch architecture was developed initially for Marvell switches.

> > 
> > If you want, I can help you with these steps and make patches to be in
> > acceptable state; not written in "legacy" style. As I'm working with GPON
> SFP
> > modules with different EEPROMs in them, I'm really interested in any
> > improvements in their EEPROM area.
> 
> You will find this odd, but I don't actually have any way to test anything
> using the kernel network stack with these devices.  The only hardware I have
> treats SFP/QSFP/CMIS devices as i2c addresses.  There is no concept of these
> being network devices in the linux kernels I'm running.  So, I'll turn your
> offer around...  I can help you improve your EEPROM access code, unless you
> already have all the good stuff.  The things optoe does that
> sfp_i2c_read/write() don't do are:
> 
> Page support:  Check whether pages are supported, then lock the device,
> write the page register as needed, and return it to 0 when finished.
> Check legal access:  Based on page support, and which spec (SFP, QSFP,
> CMIS), figure out if the offset and length are legal.  This is more
> interesting when flattening to a linear address space, less interesting in
> the new API with i2c_addr, page, bank, offset, len all specified.
> Support all 256 architected pages:  There are interesting capabilities that
> vendors provide in spec-approved 'proprietary' pages.  Don't ask, don't
> interpret, don't deny access.  Just execute the requested read/write.
> 
> Don
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2021-03-29 22:14 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20210215193821.3345-1-don@thebollingers.org>
2021-02-26 22:34 ` [PATCH v2] eeprom/optoe: driver to read/write SFP/QSFP/CMIS EEPROMS Andrew Lunn
2021-02-27  2:46   ` Don Bollinger
2021-02-27 16:23     ` Andrew Lunn
2021-03-01 20:00     ` Don Bollinger
2021-03-01 20:45       ` Andrew Lunn
2021-03-05 19:07         ` Don Bollinger
2021-03-05 22:55           ` Jakub Kicinski
2021-03-06  2:30             ` Don Bollinger
2021-03-06  3:31               ` Andrew Lunn
2021-03-12 19:04                 ` Don Bollinger
2021-03-12 19:59                   ` Andrew Lunn
2021-03-13 21:35                     ` Don Bollinger
2021-03-15 17:39                       ` Jakub Kicinski
2021-03-15 18:09                         ` Don Bollinger
2021-03-17 18:15                           ` Andrew Lunn
2021-03-20 16:10                             ` Pali Rohár
2021-03-26 18:43                               ` Don Bollinger
2021-03-29 22:13                                 ` Pali Rohár
2021-03-23 20:32                             ` Don Bollinger
2021-03-23 22:29                               ` Andrew Lunn
2021-03-26 18:43                                 ` Don Bollinger
2021-03-26 19:46                                   ` Andrew Lunn
2021-03-26 20:16                                     ` Don Bollinger
2021-03-26 20:37                                       ` Andrew Lunn
2021-03-26 21:09                                         ` Don Bollinger
2021-03-26 21:54                                           ` Andrew Lunn
2021-03-26 22:30                                             ` Don Bollinger
2021-03-27 15:25                                               ` Andrew Lunn
2021-03-27 21:20                                                 ` Don Bollinger
2021-03-27 12:40                                           ` Greg KH

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).