netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/2] mlxsw: Add support for QSFP-DD transceiver type
@ 2020-06-26 14:47 Ido Schimmel
  2020-06-26 14:47 ` [PATCH net-next 1/2] mlxsw: core: Add ethtool support for QSFP-DD transceivers Ido Schimmel
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Ido Schimmel @ 2020-06-26 14:47 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, jiri, vadimp, andrew, popadrian1996, mlxsw, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

This patch set from Vadim adds support for Quad Small Form Factor
Pluggable Double Density (QSFP-DD) modules in mlxsw.

Patch #1 enables dumping of QSFP-DD module information through ethtool.

Patch #2 enables reading of temperature thresholds from QSFP-DD modules
for hwmon and thermal zone purposes.

Vadim Pasternak (2):
  mlxsw: core: Add ethtool support for QSFP-DD transceivers
  mlxsw: core: Add support for temperature thresholds reading for
    QSFP-DD transceivers

 .../net/ethernet/mellanox/mlxsw/core_env.c    | 47 +++++++++++++++----
 drivers/net/ethernet/mellanox/mlxsw/reg.h     |  3 ++
 2 files changed, 41 insertions(+), 9 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH net-next 1/2] mlxsw: core: Add ethtool support for QSFP-DD transceivers
  2020-06-26 14:47 [PATCH net-next 0/2] mlxsw: Add support for QSFP-DD transceiver type Ido Schimmel
@ 2020-06-26 14:47 ` Ido Schimmel
  2020-06-26 15:19   ` Andrew Lunn
  2020-06-26 14:47 ` [PATCH net-next 2/2] mlxsw: core: Add support for temperature thresholds reading " Ido Schimmel
  2020-06-26 14:53 ` [PATCH net-next 0/2] mlxsw: Add support for QSFP-DD transceiver type Ido Schimmel
  2 siblings, 1 reply; 18+ messages in thread
From: Ido Schimmel @ 2020-06-26 14:47 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, jiri, vadimp, andrew, popadrian1996, mlxsw, Ido Schimmel

From: Vadim Pasternak <vadimp@mellanox.com>

The Quad Small Form Factor Pluggable Double Density (QSFP-DD) hardware
specification defines a form factor that supports up to 400 Gbps in
aggregate over an 8x50-Gbps electrical interface. The QSFP-DD supports
both optical and copper interfaces.

Implementation is based on Common Management Interface Specification;
Rev 4.0 May 8, 2019. Table 8-2 "Identifier and Status Summary (Lower
Page)" from this spec defines "Id and Status" fields located at offsets
00h - 02h. Bit 2 at offset 02h ("Flat_mem") specifies QSFP EEPROM memory
mode, which could be "upper memory flat" or "paged". Flat memory mode is
coded "1", and indicates that only page 00h is implemented in EEPROM.
Paged memory is coded "0" and indicates that pages 00h, 01h, 02h, 10h
and 11h are implemented. Pages 10h and 11h are currently not supported
by the driver.

"Flat" memory mode is used for the passive copper transceivers. For this
type only page 00h (256 bytes) is available. "Paged" memory is used for
the optical transceivers. For this type pages 00h (256 bytes), 01h (128
bytes) and 02h (128 bytes) are available. Upper page 01h contains static
advertising field, while upper page 02h contains the module-defined
thresholds and lane-specific monitors.

Extend enumerator 'mlxsw_reg_mcia_eeprom_module_info_id' with additional
field 'MLXSW_REG_MCIA_EEPROM_MODULE_INFO_TYPE_ID'. This field is used to
indicate for QSFP-DD transceiver type which memory mode is to be used.

Expose 256 bytes buffer for QSFP-DD passive copper transceiver and
512 bytes buffer for optical.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 .../net/ethernet/mellanox/mlxsw/core_env.c    | 19 ++++++++++++++++++-
 drivers/net/ethernet/mellanox/mlxsw/reg.h     |  2 ++
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_env.c b/drivers/net/ethernet/mellanox/mlxsw/core_env.c
index 08215fed193d..68f198afc9b0 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core_env.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_env.c
@@ -67,7 +67,8 @@ mlxsw_env_query_module_eeprom(struct mlxsw_core *mlxsw_core, int module,
 		offset -= MLXSW_REG_MCIA_EEPROM_UP_PAGE_LENGTH * page;
 		/* When reading upper pages 1, 2 and 3 the offset starts at
 		 * 128. Please refer to "QSFP+ Memory Map" figure in SFF-8436
-		 * specification for graphical depiction.
+		 * specification and to "CMIS Module Memory Map" Figure in
+		 * CMIS specification for graphical depiction.
 		 * MCIA register accepts buffer size <= 48. Page of size 128
 		 * should be read by chunks of size 48, 48, 32. Align the size
 		 * of the last chunk to avoid reading after the end of the
@@ -210,6 +211,22 @@ int mlxsw_env_get_module_info(struct mlxsw_core *mlxsw_core, int module,
 		else
 			modinfo->eeprom_len = ETH_MODULE_SFF_8472_LEN / 2;
 		break;
+	case MLXSW_REG_MCIA_EEPROM_MODULE_INFO_ID_QSFP_DD:
+		/* Use SFF_8636 as base type. ethtool should recognize specific
+		 * type through the identifier value.
+		 */
+		modinfo->type       = ETH_MODULE_SFF_8636;
+		/* Verify if module EEPROM is a flat memory. In case of flat
+		 * memory only page 00h (0-255 bytes) can be read. Otherwise
+		 * upper pages 01h and 02h can also be read. Upper pages 10h
+		 * and 11h are currently not supported by the driver.
+		 */
+		if (module_info[MLXSW_REG_MCIA_EEPROM_MODULE_INFO_TYPE_ID] &
+		    MLXSW_REG_MCIA_EEPROM_CMIS_FLAT_MEMORY)
+			modinfo->eeprom_len = ETH_MODULE_SFF_8636_LEN;
+		else
+			modinfo->eeprom_len = ETH_MODULE_SFF_8472_LEN;
+		break;
 	default:
 		return -EINVAL;
 	}
diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h
index fcb88d4271bf..73cc7fd5020c 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/reg.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h
@@ -8548,6 +8548,7 @@ MLXSW_ITEM32(reg, mcia, size, 0x08, 0, 16);
 #define MLXSW_REG_MCIA_TH_PAGE_NUM		3
 #define MLXSW_REG_MCIA_PAGE0_LO			0
 #define MLXSW_REG_MCIA_TH_PAGE_OFF		0x80
+#define MLXSW_REG_MCIA_EEPROM_CMIS_FLAT_MEMORY	BIT(7)
 
 enum mlxsw_reg_mcia_eeprom_module_info_rev_id {
 	MLXSW_REG_MCIA_EEPROM_MODULE_INFO_REV_ID_UNSPC	= 0x00,
@@ -8566,6 +8567,7 @@ enum mlxsw_reg_mcia_eeprom_module_info_id {
 enum mlxsw_reg_mcia_eeprom_module_info {
 	MLXSW_REG_MCIA_EEPROM_MODULE_INFO_ID,
 	MLXSW_REG_MCIA_EEPROM_MODULE_INFO_REV_ID,
+	MLXSW_REG_MCIA_EEPROM_MODULE_INFO_TYPE_ID,
 	MLXSW_REG_MCIA_EEPROM_MODULE_INFO_SIZE,
 };
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next 2/2] mlxsw: core: Add support for temperature thresholds reading for QSFP-DD transceivers
  2020-06-26 14:47 [PATCH net-next 0/2] mlxsw: Add support for QSFP-DD transceiver type Ido Schimmel
  2020-06-26 14:47 ` [PATCH net-next 1/2] mlxsw: core: Add ethtool support for QSFP-DD transceivers Ido Schimmel
@ 2020-06-26 14:47 ` Ido Schimmel
  2020-06-26 14:53 ` [PATCH net-next 0/2] mlxsw: Add support for QSFP-DD transceiver type Ido Schimmel
  2 siblings, 0 replies; 18+ messages in thread
From: Ido Schimmel @ 2020-06-26 14:47 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, jiri, vadimp, andrew, popadrian1996, mlxsw, Ido Schimmel

From: Vadim Pasternak <vadimp@mellanox.com>

Allow QSFP-DD transceivers temperature thresholds reading for hardware
monitoring and thermal control.

For this type, the thresholds are located in page 02h according to the
"Module and Lane Thresholds" description from Common Management
Interface Specification.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 .../net/ethernet/mellanox/mlxsw/core_env.c    | 28 +++++++++++++------
 drivers/net/ethernet/mellanox/mlxsw/reg.h     |  1 +
 2 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_env.c b/drivers/net/ethernet/mellanox/mlxsw/core_env.c
index 68f198afc9b0..94a208a7367f 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core_env.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_env.c
@@ -11,7 +11,7 @@
 #include "reg.h"
 
 static int mlxsw_env_validate_cable_ident(struct mlxsw_core *core, int id,
-					  bool *qsfp)
+					  bool *qsfp, bool *cmis)
 {
 	char eeprom_tmp[MLXSW_REG_MCIA_EEPROM_SIZE];
 	char mcia_pl[MLXSW_REG_MCIA_LEN];
@@ -25,15 +25,19 @@ static int mlxsw_env_validate_cable_ident(struct mlxsw_core *core, int id,
 		return err;
 	mlxsw_reg_mcia_eeprom_memcpy_from(mcia_pl, eeprom_tmp);
 	ident = eeprom_tmp[0];
+	*cmis = false;
 	switch (ident) {
 	case MLXSW_REG_MCIA_EEPROM_MODULE_INFO_ID_SFP:
 		*qsfp = false;
 		break;
 	case MLXSW_REG_MCIA_EEPROM_MODULE_INFO_ID_QSFP: /* fall-through */
 	case MLXSW_REG_MCIA_EEPROM_MODULE_INFO_ID_QSFP_PLUS: /* fall-through */
-	case MLXSW_REG_MCIA_EEPROM_MODULE_INFO_ID_QSFP28: /* fall-through */
+	case MLXSW_REG_MCIA_EEPROM_MODULE_INFO_ID_QSFP28:
+		*qsfp = true;
+		break;
 	case MLXSW_REG_MCIA_EEPROM_MODULE_INFO_ID_QSFP_DD:
 		*qsfp = true;
+		*cmis = true;
 		break;
 	default:
 		return -EINVAL;
@@ -106,7 +110,8 @@ int mlxsw_env_module_temp_thresholds_get(struct mlxsw_core *core, int module,
 	char mcia_pl[MLXSW_REG_MCIA_LEN] = {0};
 	char mtmp_pl[MLXSW_REG_MTMP_LEN];
 	unsigned int module_temp;
-	bool qsfp;
+	bool qsfp, cmis;
+	int page;
 	int err;
 
 	mlxsw_reg_mtmp_pack(mtmp_pl, MLXSW_REG_MTMP_MODULE_INDEX_MIN + module,
@@ -130,21 +135,28 @@ int mlxsw_env_module_temp_thresholds_get(struct mlxsw_core *core, int module,
 	 */
 
 	/* Validate module identifier value. */
-	err = mlxsw_env_validate_cable_ident(core, module, &qsfp);
+	err = mlxsw_env_validate_cable_ident(core, module, &qsfp, &cmis);
 	if (err)
 		return err;
 
-	if (qsfp)
-		mlxsw_reg_mcia_pack(mcia_pl, module, 0,
-				    MLXSW_REG_MCIA_TH_PAGE_NUM,
+	if (qsfp) {
+		/* For QSFP/CMIS module-defined thresholds are located in page
+		 * 02h, otherwise in page 03h.
+		 */
+		if (cmis)
+			page = MLXSW_REG_MCIA_TH_PAGE_CMIS_NUM;
+		else
+			page = MLXSW_REG_MCIA_TH_PAGE_NUM;
+		mlxsw_reg_mcia_pack(mcia_pl, module, 0, page,
 				    MLXSW_REG_MCIA_TH_PAGE_OFF + off,
 				    MLXSW_REG_MCIA_TH_ITEM_SIZE,
 				    MLXSW_REG_MCIA_I2C_ADDR_LOW);
-	else
+	} else {
 		mlxsw_reg_mcia_pack(mcia_pl, module, 0,
 				    MLXSW_REG_MCIA_PAGE0_LO,
 				    off, MLXSW_REG_MCIA_TH_ITEM_SIZE,
 				    MLXSW_REG_MCIA_I2C_ADDR_HIGH);
+	}
 
 	err = mlxsw_reg_query(core, MLXSW_REG(mcia), mcia_pl);
 	if (err)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h
index 73cc7fd5020c..55da011ad19e 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/reg.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h
@@ -8546,6 +8546,7 @@ MLXSW_ITEM32(reg, mcia, size, 0x08, 0, 16);
 #define MLXSW_REG_MCIA_PAGE0_LO_OFF		0xa0
 #define MLXSW_REG_MCIA_TH_ITEM_SIZE		2
 #define MLXSW_REG_MCIA_TH_PAGE_NUM		3
+#define MLXSW_REG_MCIA_TH_PAGE_CMIS_NUM		2
 #define MLXSW_REG_MCIA_PAGE0_LO			0
 #define MLXSW_REG_MCIA_TH_PAGE_OFF		0x80
 #define MLXSW_REG_MCIA_EEPROM_CMIS_FLAT_MEMORY	BIT(7)
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 0/2] mlxsw: Add support for QSFP-DD transceiver type
  2020-06-26 14:47 [PATCH net-next 0/2] mlxsw: Add support for QSFP-DD transceiver type Ido Schimmel
  2020-06-26 14:47 ` [PATCH net-next 1/2] mlxsw: core: Add ethtool support for QSFP-DD transceivers Ido Schimmel
  2020-06-26 14:47 ` [PATCH net-next 2/2] mlxsw: core: Add support for temperature thresholds reading " Ido Schimmel
@ 2020-06-26 14:53 ` Ido Schimmel
  2020-06-26 15:06   ` Andrew Lunn
  2 siblings, 1 reply; 18+ messages in thread
From: Ido Schimmel @ 2020-06-26 14:53 UTC (permalink / raw)
  To: netdev, popadrian1996
  Cc: davem, kuba, jiri, vadimp, andrew, mlxsw, Ido Schimmel

On Fri, Jun 26, 2020 at 05:47:22PM +0300, Ido Schimmel wrote:
> From: Ido Schimmel <idosch@mellanox.com>
> 
> This patch set from Vadim adds support for Quad Small Form Factor
> Pluggable Double Density (QSFP-DD) modules in mlxsw.

Adrian,

In November you sent a patch that adds QSFP-DD support in ethtool user
space utility:
https://patchwork.ozlabs.org/project/netdev/patch/20191109124205.11273-1-popadrian1996@gmail.com/

Back then Andrew rightfully noted that no driver in the upstream kernel
supports QSFP-DD and the patch was deferred.

If this patch set is accepted, would it be possible for you to re-post
your patch?

Thanks

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 0/2] mlxsw: Add support for QSFP-DD transceiver type
  2020-06-26 14:53 ` [PATCH net-next 0/2] mlxsw: Add support for QSFP-DD transceiver type Ido Schimmel
@ 2020-06-26 15:06   ` Andrew Lunn
  2020-06-26 16:45     ` Adrian Pop
  0 siblings, 1 reply; 18+ messages in thread
From: Andrew Lunn @ 2020-06-26 15:06 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, popadrian1996, davem, kuba, jiri, vadimp, mlxsw, Ido Schimmel

On Fri, Jun 26, 2020 at 05:53:42PM +0300, Ido Schimmel wrote:
> On Fri, Jun 26, 2020 at 05:47:22PM +0300, Ido Schimmel wrote:
> > From: Ido Schimmel <idosch@mellanox.com>
> > 
> > This patch set from Vadim adds support for Quad Small Form Factor
> > Pluggable Double Density (QSFP-DD) modules in mlxsw.
> 
> Adrian,
> 
> In November you sent a patch that adds QSFP-DD support in ethtool user
> space utility:
> https://patchwork.ozlabs.org/project/netdev/patch/20191109124205.11273-1-popadrian1996@gmail.com/
> 
> Back then Andrew rightfully noted that no driver in the upstream kernel
> supports QSFP-DD and the patch was deferred.

Hi Ido

It is a while ago, but i thought there was something odd about the
order of the pages, or the number of the pages? And there was no clear
indication from the kernel about QSPF page format vs QSPF-DD page
format?

So Adrian's patch is probably a good starting point, but i think it
needs further work.

      Andrew

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 1/2] mlxsw: core: Add ethtool support for QSFP-DD transceivers
  2020-06-26 14:47 ` [PATCH net-next 1/2] mlxsw: core: Add ethtool support for QSFP-DD transceivers Ido Schimmel
@ 2020-06-26 15:19   ` Andrew Lunn
  2020-06-26 17:33     ` Adrian Pop
  0 siblings, 1 reply; 18+ messages in thread
From: Andrew Lunn @ 2020-06-26 15:19 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, davem, kuba, jiri, vadimp, popadrian1996, mlxsw, Ido Schimmel

> +	case MLXSW_REG_MCIA_EEPROM_MODULE_INFO_ID_QSFP_DD:
> +		/* Use SFF_8636 as base type. ethtool should recognize specific
> +		 * type through the identifier value.
> +		 */
> +		modinfo->type       = ETH_MODULE_SFF_8636;
> +		/* Verify if module EEPROM is a flat memory. In case of flat
> +		 * memory only page 00h (0-255 bytes) can be read. Otherwise
> +		 * upper pages 01h and 02h can also be read. Upper pages 10h
> +		 * and 11h are currently not supported by the driver.
> +		 */
> +		if (module_info[MLXSW_REG_MCIA_EEPROM_MODULE_INFO_TYPE_ID] &
> +		    MLXSW_REG_MCIA_EEPROM_CMIS_FLAT_MEMORY)
> +			modinfo->eeprom_len = ETH_MODULE_SFF_8636_LEN;
> +		else
> +			modinfo->eeprom_len = ETH_MODULE_SFF_8472_LEN;
> +		break;

Although the upper pages 10h and 11h are not supported now, we
probably think about how they would be supported, to make sure we are
not going into a dead end.

From ethtool qsfp.c

/*
 *      Description:
 *      a) The register/memory layout is up to 5 128 byte pages defined by
 *              a "pages valid" register and switched via a "page select"
 *              register. Memory of 256 bytes can be memory mapped at a time
 *              according to SFF 8636.
 *      b) SFF 8636 based 640 bytes memory layout is presented for parser
 *
 *           SFF 8636 based QSFP Memory Map
 *
 *           2-Wire Serial Address: 1010000x
 *
 *           Lower Page 00h (128 bytes)
 *           ======================
 *           |                     |
 *           |Page Select Byte(127)|
 *           ======================
 *                    |
 *                    V
 *           ----------------------------------------
 *          |             |            |             |
 *          V             V            V             V
 *       ----------   ----------   ---------    ------------
 *      | Upper    | | Upper    | | Upper    | | Upper      |
 *      | Page 00h | | Page 01h | | Page 02h | | Page 03h   |
 *      |          | |(Optional)| |(Optional)| | (Optional) |
 *      |          | |          | |          | |            |
 *      |          | |          | |          | |            |
 *      |    ID    | |   AST    | |  User    | |  For       |
 *      |  Fields  | |  Table   | | EEPROM   | |  Cable     |
 *      |          | |          | | Data     | | Assemblies |
 *      |          | |          | |          | |            |
 *      |          | |          | |          | |            |
 *      -----------  -----------   ----------  --------------
 *
 *
 **/

Is page 03h valid for a QSFP DD? Do we add pages 10h and 11h after
page 03h, or instead of? How do we indicate to user space what pages
of data have been passed to it?

   Andrew

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 0/2] mlxsw: Add support for QSFP-DD transceiver type
  2020-06-26 15:06   ` Andrew Lunn
@ 2020-06-26 16:45     ` Adrian Pop
  0 siblings, 0 replies; 18+ messages in thread
From: Adrian Pop @ 2020-06-26 16:45 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Ido Schimmel, netdev, davem, kuba, jiri, vadimp, mlxsw, Ido Schimmel

Hi Ido, Andrew!

I'm happy to receive these emails and to see that my old patch had
some interest. Yes, at that time there was no official support for
QSFP-DD in the upstream kernel. I was able to test my code using a
custom driver we developed in my company. If everything works out, I'd
be happy to re-submit it.

> It is a while ago, but i thought there was something odd about the
> order of the pages, or the number of the pages? And there was no clear
> indication from the kernel about QSPF page format vs QSPF-DD page
> format?
>
> So Adrian's patch is probably a good starting point, but i think it
> needs further work.

It is true, my patch was basically defining a KAPI. The decision to
have the pages 0x10 and 0x11 right after 0x00, 0x01 and 0x02 was made
based on the need to provide the same stats for QSFP-DD as for QSFP.
At that point, page 0x03 was not needed. Anyway, due to the way I
defined the offset values in qsfp-dd.h, if we add page 0x03 it would
be extremely easy to just change

#define PAG11H_OFFSET (0x04 * 0x80)
to
#define PAG11H_OFFSET (0x05 * 0x80)

and probably everything else would stay the same.
We can continue our discussion about the needed pages in the "[PATCH
net-next 1/2]" e-mails chain.


On Fri, 26 Jun 2020 at 16:06, Andrew Lunn <andrew@lunn.ch> wrote:
>
> On Fri, Jun 26, 2020 at 05:53:42PM +0300, Ido Schimmel wrote:
> > On Fri, Jun 26, 2020 at 05:47:22PM +0300, Ido Schimmel wrote:
> > > From: Ido Schimmel <idosch@mellanox.com>
> > >
> > > This patch set from Vadim adds support for Quad Small Form Factor
> > > Pluggable Double Density (QSFP-DD) modules in mlxsw.
> >
> > Adrian,
> >
> > In November you sent a patch that adds QSFP-DD support in ethtool user
> > space utility:
> > https://patchwork.ozlabs.org/project/netdev/patch/20191109124205.11273-1-popadrian1996@gmail.com/
> >
> > Back then Andrew rightfully noted that no driver in the upstream kernel
> > supports QSFP-DD and the patch was deferred.
>
> Hi Ido
>
> It is a while ago, but i thought there was something odd about the
> order of the pages, or the number of the pages? And there was no clear
> indication from the kernel about QSPF page format vs QSPF-DD page
> format?
>
> So Adrian's patch is probably a good starting point, but i think it
> needs further work.
>
>       Andrew

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 1/2] mlxsw: core: Add ethtool support for QSFP-DD transceivers
  2020-06-26 15:19   ` Andrew Lunn
@ 2020-06-26 17:33     ` Adrian Pop
  2020-06-26 19:07       ` Andrew Lunn
  0 siblings, 1 reply; 18+ messages in thread
From: Adrian Pop @ 2020-06-26 17:33 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Ido Schimmel, netdev, davem, kuba, jiri, vadimp, mlxsw, Ido Schimmel

>
> Is page 03h valid for a QSFP DD? Do we add pages 10h and 11h after
> page 03h, or instead of? How do we indicate to user space what pages
> of data have been passed to it?
>
>    Andrew

From QSFP-DD CMIS Rev 4.0: "In particular, support of the Lower Memory
and of Page 00h is required for all modules, including passive copper
cables. These pages are therefore always implemented. Additional
support for Pages 01h, 02h and bank 0 of Pages 10h and 11h is required
for all paged memory modules."

According to the same document, page 0x03 contains "User EEPROM
(NVRs)". Byte 142, bit 2, page 0x01 indicates if the user page 0x03
was implemented. I did not find anything about page 0x02 (where the
user EEPROM is stored) in the documentation for QSFP. I suppose it is
always implemented? If we really want to have it so it is similar to
QSFP, one could send 896 bytes (instead of 768) and just fill that
portion with 0 in case it's not implemented. Note that this is just an
idea, I'm not aware of best practices in cases like this.

Adrian

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 1/2] mlxsw: core: Add ethtool support for QSFP-DD transceivers
  2020-06-26 17:33     ` Adrian Pop
@ 2020-06-26 19:07       ` Andrew Lunn
  2020-06-26 22:13         ` Adrian Pop
  0 siblings, 1 reply; 18+ messages in thread
From: Andrew Lunn @ 2020-06-26 19:07 UTC (permalink / raw)
  To: Adrian Pop
  Cc: Ido Schimmel, netdev, davem, kuba, jiri, vadimp, mlxsw, Ido Schimmel

On Fri, Jun 26, 2020 at 06:33:55PM +0100, Adrian Pop wrote:
> >
> > Is page 03h valid for a QSFP DD? Do we add pages 10h and 11h after
> > page 03h, or instead of? How do we indicate to user space what pages
> > of data have been passed to it?
> >
> >    Andrew
> 
> >From QSFP-DD CMIS Rev 4.0: "In particular, support of the Lower Memory
> and of Page 00h is required for all modules, including passive copper
> cables. These pages are therefore always implemented. Additional
> support for Pages 01h, 02h and bank 0 of Pages 10h and 11h is required
> for all paged memory modules."
> 
> According to the same document, page 0x03 contains "User EEPROM
> (NVRs)". Byte 142, bit 2, page 0x01 indicates if the user page 0x03
> was implemented. I did not find anything about page 0x02 (where the
> user EEPROM is stored) in the documentation for QSFP. I suppose it is
> always implemented? If we really want to have it so it is similar to
> QSFP, one could send 896 bytes (instead of 768) and just fill that
> portion with 0 in case it's not implemented. Note that this is just an
> idea, I'm not aware of best practices in cases like this.

It does seem common to only return a subset of the pages. This patch
for example. We need some clear rules to known what the kernel has
passed to user space, so we can both correctly interpret when a subset
has been passed, and also ensure all drivers are doing the same thing.

Currently we have:

 *       ----------   ----------   ---------    ------------
 *      | Upper    | | Upper    | | Upper    | | Upper      |
 *      | Page 00h | | Page 01h | | Page 02h | | Page 03h   |
 *      |          | |(Optional)| |(Optional)| | (Optional) |
 *      |          | |          | |          | |            |
 *      |          | |          | |          | |            |
 *      |    ID    | |   AST    | |  User    | |  For       |
 *      |  Fields  | |  Table   | | EEPROM   | |  Cable     |
 *      |          | |          | | Data     | | Assemblies |
 *      |          | |          | |          | |            |
 *      |          | |          | |          | |            |
 *      -----------  -----------   ----------  --------------

You are saying pages 00h, 01h and 02h are mandatory for QSPF-DD.  Page
03h is optional, but when present, it seems to contain what is page
02h above. Since the QSPF KAPI has it, QSPF-DD KAPI should also have
it. So i would suggest that pages 10h and 11h come after that.

If a driver wants to pass a subset, it can, but it must always trim
from the right, it cannot remove pages from the middle.

     Andrew

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 1/2] mlxsw: core: Add ethtool support for QSFP-DD transceivers
  2020-06-26 19:07       ` Andrew Lunn
@ 2020-06-26 22:13         ` Adrian Pop
  2020-06-27 19:16           ` Ido Schimmel
  0 siblings, 1 reply; 18+ messages in thread
From: Adrian Pop @ 2020-06-26 22:13 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Ido Schimmel, netdev, davem, kuba, jiri, vadimp, mlxsw, Ido Schimmel

> You are saying pages 00h, 01h and 02h are mandatory for QSPF-DD.  Page
> 03h is optional, but when present, it seems to contain what is page
> 02h above. Since the QSPF KAPI has it, QSPF-DD KAPI should also have
> it. So i would suggest that pages 10h and 11h come after that.
>
> If a driver wants to pass a subset, it can, but it must always trim
> from the right, it cannot remove pages from the middle.
>
>      Andrew

I agree with this. Basically there are two big cases:
- passive copper transceivers with flat memory => just 00h will be
present (both lower and higher => 256 bytes)
- optical transceivers with paged memory => 00h, 01h, 02h, 10h, 11h => 768 bytes
Having page 03h after 02h (so 896 bytes in total) seems like a good
idea and the updates I'll have to do to my old patch are minor
(updating the offset value of page 10h and 11h). When I tested my
patch, I did it with both passive copper transceivers and optical
transceivers and there weren't any issues.

In this patch, Ido added a comment in the code stating "Upper pages
10h and 11h are currently not supported by the driver.". This won't
actually be a problem! In CMIS Rev. 4, Table 8-12 Byte 85 (55h), we
learn that if the value of that byte is 01h or 02h, we have a SMF or
MMF interface (both optical). In the qsfp_dd_show_sig_optical_pwr
function (in my patch) there is this bit:

+ __u8 module_type = id[QSFP_DD_MODULE_TYPE_OFFSET];
[...]
+ /**
+ * The thresholds and the high/low alarms/warnings are available
+ * only if an optical interface (MMF/SMF) is present (if this is
+ * the case, it means that 5 pages are available).
+ */
+ if (module_type != QSFP_DD_MT_MMF &&
+    module_type != QSFP_DD_MT_SMF &&
+    eeprom_len != QSFP_DD_EEPROM_5PAG)
+ return;

But Ido sets the eeprom_len to be ETH_MODULE_SFF_8472_LEN which is
512, while QSFP_DD_EEPROM_5PAG is defined as 80h * 6 = 768. So there
won't be any issues of accessing non-existent values, since I return
from the function that deals with the pages 10h and 11h early. When
the driver will support them too everything will just work so your
idea of a driver being able to pass only a subset of pages (being
allowed to trim only from the right) holds.

Adrian

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 1/2] mlxsw: core: Add ethtool support for QSFP-DD transceivers
  2020-06-26 22:13         ` Adrian Pop
@ 2020-06-27 19:16           ` Ido Schimmel
  2020-06-27 20:42             ` Adrian Pop
  0 siblings, 1 reply; 18+ messages in thread
From: Ido Schimmel @ 2020-06-27 19:16 UTC (permalink / raw)
  To: Adrian Pop
  Cc: Andrew Lunn, netdev, davem, kuba, jiri, vadimp, mlxsw, Ido Schimmel

On Fri, Jun 26, 2020 at 11:13:42PM +0100, Adrian Pop wrote:
> > You are saying pages 00h, 01h and 02h are mandatory for QSPF-DD.  Page
> > 03h is optional, but when present, it seems to contain what is page
> > 02h above. Since the QSPF KAPI has it, QSPF-DD KAPI should also have
> > it. So i would suggest that pages 10h and 11h come after that.
> >
> > If a driver wants to pass a subset, it can, but it must always trim
> > from the right, it cannot remove pages from the middle.
> >
> >      Andrew
> 
> I agree with this. Basically there are two big cases:
> - passive copper transceivers with flat memory => just 00h will be
> present (both lower and higher => 256 bytes)
> - optical transceivers with paged memory => 00h, 01h, 02h, 10h, 11h => 768 bytes
> Having page 03h after 02h (so 896 bytes in total) seems like a good
> idea and the updates I'll have to do to my old patch are minor
> (updating the offset value of page 10h and 11h). When I tested my
> patch, I did it with both passive copper transceivers and optical
> transceivers and there weren't any issues.

Hi Adrian, Andrew,

Not sure I understand... You want the kernel to always pass page 03h to
user space (potentially zeroed)? Page 03h is not mandatory according to
the standard and page 01h contains information if page 03h is present or
not. So user space has the information it needs to determine if after
page 02h we have page 03h or page 10h. Why always pass page 03h then?

> 
> In this patch, Ido added a comment in the code stating "Upper pages
> 10h and 11h are currently not supported by the driver.". This won't
> actually be a problem! In CMIS Rev. 4, Table 8-12 Byte 85 (55h), we
> learn that if the value of that byte is 01h or 02h, we have a SMF or
> MMF interface (both optical). In the qsfp_dd_show_sig_optical_pwr
> function (in my patch) there is this bit:
> 
> + __u8 module_type = id[QSFP_DD_MODULE_TYPE_OFFSET];
> [...]
> + /**
> + * The thresholds and the high/low alarms/warnings are available
> + * only if an optical interface (MMF/SMF) is present (if this is
> + * the case, it means that 5 pages are available).
> + */
> + if (module_type != QSFP_DD_MT_MMF &&
> +    module_type != QSFP_DD_MT_SMF &&
> +    eeprom_len != QSFP_DD_EEPROM_5PAG)
> + return;
> 
> But Ido sets the eeprom_len to be ETH_MODULE_SFF_8472_LEN which is
> 512, while QSFP_DD_EEPROM_5PAG is defined as 80h * 6 = 768. So there
> won't be any issues of accessing non-existent values, since I return
> from the function that deals with the pages 10h and 11h early. When
> the driver will support them too everything will just work so your
> idea of a driver being able to pass only a subset of pages (being
> allowed to trim only from the right) holds.

BTW, Adrian, this is the output I get with your patch on top of current
ethtool:

$ ethtool -m swp1
        Identifier                                : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628))
        Power class                               : 1
        Max power                                 : 0.25W
        Connector                                 : 0x23 (No separable connector)
        Cable assembly length                     : 0.50km
        Tx CDR bypass control                     : Yes
        Rx CDR bypass control                     : No
        Tx CDR                                    : No
        Rx CDR                                    : No
        Transmitter technology                    : 0x0a (Copper cable, unequalized)
        Attenuation at 5GHz                       : 4db
        Attenuation at 7GHz                       : 5db
        Attenuation at 12.9GHz                    : 7db
        Attenuation at 25.8GHz                    : 21db
        Module temperature                        : 0.00 degrees C / 32.00 degrees F
        Module voltage                            : 0.0000 V
        Length (SMF)                              : 0.00km
        Length (OM5)                              : 0m
        Length (OM4)                              : 0m
        Length (OM3 50/125um)                     : 0m
        Length (OM2 50/125um)                     : 17m
        Vendor name                               : Mellanox
        Vendor OUI                                : 00:02:c9
        Vendor PN                                 : MCP1660-W00AE30
        Vendor rev                                : A2
        Vendor SN                                 : MT2019VS04757
        Date code                                 : 200507  _
        Revision compliance                       : Rev. 3.0

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 1/2] mlxsw: core: Add ethtool support for QSFP-DD transceivers
  2020-06-27 19:16           ` Ido Schimmel
@ 2020-06-27 20:42             ` Adrian Pop
  2020-06-28 11:55               ` Ido Schimmel
  0 siblings, 1 reply; 18+ messages in thread
From: Adrian Pop @ 2020-06-27 20:42 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Andrew Lunn, netdev, davem, kuba, jiri, vadimp, mlxsw, Ido Schimmel

>
> Hi Adrian, Andrew,
>
> Not sure I understand... You want the kernel to always pass page 03h to
> user space (potentially zeroed)? Page 03h is not mandatory according to
> the standard and page 01h contains information if page 03h is present or

Hi Ido!

Andrew was thinking of having 03h after 02h (potentially zeroed) just
for the purpose of having a similar layout for QSFP-DD the same way we
do for QSFP. But as you said, it is not mandatory according to the
standard and I also don't know the entire codebase for ethtool and
where it might be actually needed. I think Andrew can argue for its
presence better than me.

> not. So user space has the information it needs to determine if after
> page 02h we have page 03h or page 10h. Why always pass page 03h then?
>

If we decide to add 03h but only sometimes, I think we will add an
extra layer of complexity. Sometimes after 02h we would have 03h and
sometimes 10h. In qsfp-dd.h (following the convention from qsfp.h) in
my patch there are a lot of different constants defined with respect
to the offset of the parent page in the memory layout and "dynamic
offsets" don't sound very good, at least for me. So even if there's a
way of checking in the user space which page is after 02h, a more
stable memory layout works better on the long run.

Adrian

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 1/2] mlxsw: core: Add ethtool support for QSFP-DD transceivers
  2020-06-27 20:42             ` Adrian Pop
@ 2020-06-28 11:55               ` Ido Schimmel
  2020-06-28 12:27                 ` Adrian Pop
  2020-06-30  0:21                 ` Andrew Lunn
  0 siblings, 2 replies; 18+ messages in thread
From: Ido Schimmel @ 2020-06-28 11:55 UTC (permalink / raw)
  To: Adrian Pop
  Cc: Andrew Lunn, netdev, davem, kuba, jiri, vadimp, mlxsw, Ido Schimmel

On Sat, Jun 27, 2020 at 09:42:10PM +0100, Adrian Pop wrote:
> >
> > Hi Adrian, Andrew,
> >
> > Not sure I understand... You want the kernel to always pass page 03h to
> > user space (potentially zeroed)? Page 03h is not mandatory according to
> > the standard and page 01h contains information if page 03h is present or
> 
> Hi Ido!
> 
> Andrew was thinking of having 03h after 02h (potentially zeroed) just
> for the purpose of having a similar layout for QSFP-DD the same way we
> do for QSFP. But as you said, it is not mandatory according to the
> standard and I also don't know the entire codebase for ethtool and
> where it might be actually needed. I think Andrew can argue for its
> presence better than me.
> 
> > not. So user space has the information it needs to determine if after
> > page 02h we have page 03h or page 10h. Why always pass page 03h then?
> >
> 
> If we decide to add 03h but only sometimes, I think we will add an
> extra layer of complexity. Sometimes after 02h we would have 03h and
> sometimes 10h. In qsfp-dd.h (following the convention from qsfp.h) in
> my patch there are a lot of different constants defined with respect
> to the offset of the parent page in the memory layout and "dynamic
> offsets" don't sound very good, at least for me. So even if there's a
> way of checking in the user space which page is after 02h, a more
> stable memory layout works better on the long run.

Adrian,

Thanks for the detailed response. I don't think the kernel should pass
fake pages only to make it easier for user space to parse the
information. What you are describing is basic dissection and it's done
all the time by wireshark / tcpdump.

Anyway, even we pass a fake page 03h, page 11h can still be at a
variable offset. See table 8-28 [1], bits 1-0 at offset 142 in page 01h
determine the size of pages 10h and 11h:

0 - each page is 128 bytes in size
1 - each page is 256 bytes in size
2 - each page is 512 bytes in size

So a completely stable layout (unless I missed something) will entail
the kernel sending 1664 bytes to user space each time. This looks
unnecessarily rigid to me. The people who wrote the standard obviously
took into account the fact that the page layout needs to be discoverable
from the data and I think we should embrace it and only pass valid
information to user space.

Regardless, can Andrew and you let us know if you have a problems with
current patch set which only exposes pages 00h-02h? I see it's marked as
"Changes Requested", so I will need to re-submit.

Thanks

[1] http://www.qsfp-dd.com/wp-content/uploads/2019/05/QSFP-DD-CMIS-rev4p0.pdf

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 1/2] mlxsw: core: Add ethtool support for QSFP-DD transceivers
  2020-06-28 11:55               ` Ido Schimmel
@ 2020-06-28 12:27                 ` Adrian Pop
  2020-06-30  0:21                 ` Andrew Lunn
  1 sibling, 0 replies; 18+ messages in thread
From: Adrian Pop @ 2020-06-28 12:27 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Andrew Lunn, netdev, davem, kuba, jiri, vadimp, mlxsw, Ido Schimmel

Hi!

Regarding multiple banks, I think that we can have a much more
creative way of dealing with them (in the future). At page 76, we have
"In particular, support of the Lower Memory and of Page 00h is
required for all modules, including passive copper cables. These pages
are therefore always implemented. Additional support for Pages 01h,
02h and bank 0 of Pages 10h and 11h is required for all paged memory
modules."

My old patch clearly supports only the 1st (and mandatory) bank. So
the memory layout is 00h, 01h, 02h, 10h, 11h. If we will extend
ethtool to work for multiple banks, we might have something like 00h,
01h, 02h, (10h, 11h | bank 0), (10h, 11h | bank 1) etc. So we can
still check bits 1-0 of byte 142 from page 01h to determine how many
banks we have and we can still follow the "we can trim, but only to
the right" rule. Of course, this is only an idea, at the moment I
don't think we can even test something like that.

I'm sure that I can work something out to deal with sometimes having
page 03h and sometimes not, but first I think we need to decide if we
always add it or not. As I mentioned in my previous email, I think
Andrew can argue for its presence better than me. Based on that, I can
re-submit my old patch for ethtool.

Adrian

On Sun, 28 Jun 2020 at 12:56, Ido Schimmel <idosch@idosch.org> wrote:
>
> On Sat, Jun 27, 2020 at 09:42:10PM +0100, Adrian Pop wrote:
> > >
> > > Hi Adrian, Andrew,
> > >
> > > Not sure I understand... You want the kernel to always pass page 03h to
> > > user space (potentially zeroed)? Page 03h is not mandatory according to
> > > the standard and page 01h contains information if page 03h is present or
> >
> > Hi Ido!
> >
> > Andrew was thinking of having 03h after 02h (potentially zeroed) just
> > for the purpose of having a similar layout for QSFP-DD the same way we
> > do for QSFP. But as you said, it is not mandatory according to the
> > standard and I also don't know the entire codebase for ethtool and
> > where it might be actually needed. I think Andrew can argue for its
> > presence better than me.
> >
> > > not. So user space has the information it needs to determine if after
> > > page 02h we have page 03h or page 10h. Why always pass page 03h then?
> > >
> >
> > If we decide to add 03h but only sometimes, I think we will add an
> > extra layer of complexity. Sometimes after 02h we would have 03h and
> > sometimes 10h. In qsfp-dd.h (following the convention from qsfp.h) in
> > my patch there are a lot of different constants defined with respect
> > to the offset of the parent page in the memory layout and "dynamic
> > offsets" don't sound very good, at least for me. So even if there's a
> > way of checking in the user space which page is after 02h, a more
> > stable memory layout works better on the long run.
>
> Adrian,
>
> Thanks for the detailed response. I don't think the kernel should pass
> fake pages only to make it easier for user space to parse the
> information. What you are describing is basic dissection and it's done
> all the time by wireshark / tcpdump.
>
> Anyway, even we pass a fake page 03h, page 11h can still be at a
> variable offset. See table 8-28 [1], bits 1-0 at offset 142 in page 01h
> determine the size of pages 10h and 11h:
>
> 0 - each page is 128 bytes in size
> 1 - each page is 256 bytes in size
> 2 - each page is 512 bytes in size
>
> So a completely stable layout (unless I missed something) will entail
> the kernel sending 1664 bytes to user space each time. This looks
> unnecessarily rigid to me. The people who wrote the standard obviously
> took into account the fact that the page layout needs to be discoverable
> from the data and I think we should embrace it and only pass valid
> information to user space.
>
> Regardless, can Andrew and you let us know if you have a problems with
> current patch set which only exposes pages 00h-02h? I see it's marked as
> "Changes Requested", so I will need to re-submit.
>
> Thanks
>
> [1] http://www.qsfp-dd.com/wp-content/uploads/2019/05/QSFP-DD-CMIS-rev4p0.pdf

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 1/2] mlxsw: core: Add ethtool support for QSFP-DD transceivers
  2020-06-28 11:55               ` Ido Schimmel
  2020-06-28 12:27                 ` Adrian Pop
@ 2020-06-30  0:21                 ` Andrew Lunn
  2020-06-30  5:59                   ` Ido Schimmel
  1 sibling, 1 reply; 18+ messages in thread
From: Andrew Lunn @ 2020-06-30  0:21 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Adrian Pop, netdev, davem, kuba, jiri, vadimp, mlxsw, Ido Schimmel

> Adrian,
> 
> Thanks for the detailed response. I don't think the kernel should pass
> fake pages only to make it easier for user space to parse the
> information. What you are describing is basic dissection and it's done
> all the time by wireshark / tcpdump.
> 
> Anyway, even we pass a fake page 03h, page 11h can still be at a
> variable offset. See table 8-28 [1], bits 1-0 at offset 142 in page 01h
> determine the size of pages 10h and 11h:
> 
> 0 - each page is 128 bytes in size
> 1 - each page is 256 bytes in size
> 2 - each page is 512 bytes in size
> 
> So a completely stable layout (unless I missed something) will entail
> the kernel sending 1664 bytes to user space each time. This looks
> unnecessarily rigid to me. The people who wrote the standard obviously
> took into account the fact that the page layout needs to be discoverable
> from the data and I think we should embrace it and only pass valid
> information to user space.

O.K. That makes things more complex. And i would say, Ido is correct,
we need to make use of the discoverable features.

I've no practice experience with modules other than plain old SFPs,
1G. And those have all sorts of errors, even basic things like the CRC
are systematically incorrect because they are not recalculated after
adding the serial number. We have had people trying to submit patches
to ethtool to make it ignore bits so that it dumps more information,
because the manufacturer failed to set the correct bits, etc.

Ido, Adrian, what is your experience with these QSFP-DD devices. Are
they generally of better quality, the EEPROM can be trusted? Is there
any form of compliance test.

If we go down the path of using the discovery information, it means we
have no way for user space to try to correct for when the information
is incorrect. It cannot request specific pages. So maybe we should
consider an alternative?

The netlink ethtool gives us more flexibility. How about we make a new
API where user space can request any pages it want, and specify the
size of the page. ethtool can start out by reading page 0. That should
allow it to identify the basic class of device. It can then request
additional pages as needed.

The nice thing about that is we don't need two parsers of the
discovery information, one in user and second in kernel space. We
don't need to guarantee these two parsers agree with each other, in
order to correctly decode what the kernel sent to user space. And user
space has the flexibility to work around known issues when
manufactures get their EEPROM wrong.

	     Andrew

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 1/2] mlxsw: core: Add ethtool support for QSFP-DD transceivers
  2020-06-30  0:21                 ` Andrew Lunn
@ 2020-06-30  5:59                   ` Ido Schimmel
  2020-06-30  9:37                     ` Vadim Pasternak
  2020-06-30 13:00                     ` Andrew Lunn
  0 siblings, 2 replies; 18+ messages in thread
From: Ido Schimmel @ 2020-06-30  5:59 UTC (permalink / raw)
  To: Andrew Lunn, vadimp
  Cc: Adrian Pop, netdev, davem, kuba, jiri, mlxsw, Ido Schimmel

On Tue, Jun 30, 2020 at 02:21:59AM +0200, Andrew Lunn wrote:
> I've no practice experience with modules other than plain old SFPs,
> 1G. And those have all sorts of errors, even basic things like the CRC
> are systematically incorrect because they are not recalculated after
> adding the serial number. We have had people trying to submit patches
> to ethtool to make it ignore bits so that it dumps more information,
> because the manufacturer failed to set the correct bits, etc.
> 
> Ido, Adrian, what is your experience with these QSFP-DD devices. Are
> they generally of better quality, the EEPROM can be trusted? Is there
> any form of compliance test.

Vadim, I know you tested with at least two different QSFP-DD modules,
can you please share your experience?

> 
> If we go down the path of using the discovery information, it means we
> have no way for user space to try to correct for when the information
> is incorrect. It cannot request specific pages. So maybe we should
> consider an alternative?
> 
> The netlink ethtool gives us more flexibility. How about we make a new
> API where user space can request any pages it want, and specify the
> size of the page. ethtool can start out by reading page 0. That should
> allow it to identify the basic class of device. It can then request
> additional pages as needed.

Just to make sure I understand, this also means adding a new API towards
drivers, right? So that they only read from HW the requested info.

> The nice thing about that is we don't need two parsers of the
> discovery information, one in user and second in kernel space. We
> don't need to guarantee these two parsers agree with each other, in
> order to correctly decode what the kernel sent to user space. And user
> space has the flexibility to work around known issues when
> manufactures get their EEPROM wrong.

Sounds sane to me... I know that in the past Vadim had to deal with
various faulty modules. Vadim, is this something we can support? What
happens if user space requests a page that does not exist? For example,
in the case of QSFP-DD, lets say we do not provide page 03h but user
space still wants it because it believes manufacturer did not set
correct bits.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH net-next 1/2] mlxsw: core: Add ethtool support for QSFP-DD transceivers
  2020-06-30  5:59                   ` Ido Schimmel
@ 2020-06-30  9:37                     ` Vadim Pasternak
  2020-06-30 13:00                     ` Andrew Lunn
  1 sibling, 0 replies; 18+ messages in thread
From: Vadim Pasternak @ 2020-06-30  9:37 UTC (permalink / raw)
  To: Ido Schimmel, Andrew Lunn
  Cc: Adrian Pop, netdev, davem, kuba, Jiri Pirko, mlxsw, Ido Schimmel



> -----Original Message-----
> From: Ido Schimmel <idosch@idosch.org>
> Sent: Tuesday, June 30, 2020 9:00 AM
> To: Andrew Lunn <andrew@lunn.ch>; Vadim Pasternak
> <vadimp@mellanox.com>
> Cc: Adrian Pop <popadrian1996@gmail.com>; netdev@vger.kernel.org;
> davem@davemloft.net; kuba@kernel.org; Jiri Pirko <jiri@mellanox.com>;
> mlxsw <mlxsw@mellanox.com>; Ido Schimmel <idosch@mellanox.com>
> Subject: Re: [PATCH net-next 1/2] mlxsw: core: Add ethtool support for QSFP-
> DD transceivers
> 
> On Tue, Jun 30, 2020 at 02:21:59AM +0200, Andrew Lunn wrote:
> > I've no practice experience with modules other than plain old SFPs,
> > 1G. And those have all sorts of errors, even basic things like the CRC
> > are systematically incorrect because they are not recalculated after
> > adding the serial number. We have had people trying to submit patches
> > to ethtool to make it ignore bits so that it dumps more information,
> > because the manufacturer failed to set the correct bits, etc.
> >
> > Ido, Adrian, what is your experience with these QSFP-DD devices. Are
> > they generally of better quality, the EEPROM can be trusted? Is there
> > any form of compliance test.
> 
> Vadim, I know you tested with at least two different QSFP-DD modules, can
> you please share your experience?
> 

I tested two types of QSFP-DD, cooper and optical from few vendors:
Innolight, SP (Source Photonics) and Mellanox customized transceivers.
We don't have enough statistics. I guess in all our systems in LAB we
validated about 150 - 200 cables. No one of them had wrong EEPROM.

But in all Mellanox systems QSFP reading works through the firmware
and firmware performs QSFP validation for stamping (some cable type
are considered as untrusted and firmware put them to the black list),
page checksum, power consuming criteria.


> >
> > If we go down the path of using the discovery information, it means we
> > have no way for user space to try to correct for when the information
> > is incorrect. It cannot request specific pages. So maybe we should
> > consider an alternative?
> >
> > The netlink ethtool gives us more flexibility. How about we make a new
> > API where user space can request any pages it want, and specify the
> > size of the page. ethtool can start out by reading page 0. That should
> > allow it to identify the basic class of device. It can then request
> > additional pages as needed.
> 
> Just to make sure I understand, this also means adding a new API towards
> drivers, right? So that they only read from HW the requested info.
> 
> > The nice thing about that is we don't need two parsers of the
> > discovery information, one in user and second in kernel space. We
> > don't need to guarantee these two parsers agree with each other, in
> > order to correctly decode what the kernel sent to user space. And user
> > space has the flexibility to work around known issues when
> > manufactures get their EEPROM wrong.
> 
> Sounds sane to me... I know that in the past Vadim had to deal with various
> faulty modules. Vadim, is this something we can support? What happens if
> user space requests a page that does not exist? For example, in the case of
> QSFP-DD, lets say we do not provide page 03h but user space still wants it
> because it believes manufacturer did not set correct bits.

Regarding faulty modules, as I wrote - validation is performed by firmware
and our software trust firmware.

Currently user space just asks for the buffer length.
I suppose in case we'll have additional API:
ethtool -m <if> <page> <offset> <size>
it would be possible to provide buffer only for the defined page and upto
valid size.

Pay attention that CMIS specification covers also others transceivers types
and some of them we are going to support through ethtool, like:
19h (OSFP 8x Pluggable Transceiver)
1Ah (SFP-DD Double Density 2x Pluggable Transceiver)
1Eh (QSFP with QSFP-DD memory map)

If I am not wrong 19h and 1Eh should have same layout as QSFP-DD and
SFP-DD is supposed to be similar, but shorter (page 02h is reserved, page
01h contains info, which for QSFP-DD sits at page 02h).


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 1/2] mlxsw: core: Add ethtool support for QSFP-DD transceivers
  2020-06-30  5:59                   ` Ido Schimmel
  2020-06-30  9:37                     ` Vadim Pasternak
@ 2020-06-30 13:00                     ` Andrew Lunn
  1 sibling, 0 replies; 18+ messages in thread
From: Andrew Lunn @ 2020-06-30 13:00 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: vadimp, Adrian Pop, netdev, davem, kuba, jiri, mlxsw, Ido Schimmel

> Sounds sane to me... I know that in the past Vadim had to deal with
> various faulty modules. Vadim, is this something we can support? What
> happens if user space requests a page that does not exist? For example,
> in the case of QSFP-DD, lets say we do not provide page 03h but user
> space still wants it because it believes manufacturer did not set
> correct bits.

Hi Ido

I can see two scenarios.

This API is retrofitted to a firmware which only supports pre-defined
linear dumps. A page is requested which is not part of the
dump. EOPNOTSUPP seems like a good response.

The second is for a page which does not exist in the module. I would
just let the i2c bus master perform the transfer. Some might return
EIO, ENODEV if the SFP does not respond. Otherwise it might return
0xff from pullups, or random junk. Hopefully each page as a checksum,
and hopefully the vendor actually get the checksum correct? So let
userspace either deal with the error code, or whatever data is
provided.

The current code already to some extent handles missing data, it uses
the length to determine if the page is present or not. So it is not
too big a change to look error codes for individual pages.

    Andrew


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-06-30 13:00 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-26 14:47 [PATCH net-next 0/2] mlxsw: Add support for QSFP-DD transceiver type Ido Schimmel
2020-06-26 14:47 ` [PATCH net-next 1/2] mlxsw: core: Add ethtool support for QSFP-DD transceivers Ido Schimmel
2020-06-26 15:19   ` Andrew Lunn
2020-06-26 17:33     ` Adrian Pop
2020-06-26 19:07       ` Andrew Lunn
2020-06-26 22:13         ` Adrian Pop
2020-06-27 19:16           ` Ido Schimmel
2020-06-27 20:42             ` Adrian Pop
2020-06-28 11:55               ` Ido Schimmel
2020-06-28 12:27                 ` Adrian Pop
2020-06-30  0:21                 ` Andrew Lunn
2020-06-30  5:59                   ` Ido Schimmel
2020-06-30  9:37                     ` Vadim Pasternak
2020-06-30 13:00                     ` Andrew Lunn
2020-06-26 14:47 ` [PATCH net-next 2/2] mlxsw: core: Add support for temperature thresholds reading " Ido Schimmel
2020-06-26 14:53 ` [PATCH net-next 0/2] mlxsw: Add support for QSFP-DD transceiver type Ido Schimmel
2020-06-26 15:06   ` Andrew Lunn
2020-06-26 16:45     ` Adrian Pop

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).