linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] igb: add parameter to ignore nvm checksum validation
@ 2019-05-08 23:14 Nikunj Kela
  2019-05-16 19:35 ` Jeff Kirsher
  0 siblings, 1 reply; 12+ messages in thread
From: Nikunj Kela @ 2019-05-08 23:14 UTC (permalink / raw)
  To: Jeff Kirsher
  Cc: xe-linux-external, David S. Miller, intel-wired-lan, netdev,
	linux-kernel

Some of the broken NICs don't have EEPROM programmed correctly. It results
in probe to fail. This change adds a module parameter that can be used to
ignore nvm checksum validation.

Cc: xe-linux-external@cisco.com
Signed-off-by: Nikunj Kela <nkela@cisco.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 39f33af..0ae1324 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -247,6 +247,11 @@ static int debug = -1;
 module_param(debug, int, 0);
 MODULE_PARM_DESC(debug, "Debug level (0=none,...,16=all)");
 
+static bool ignore_nvm_checksum;
+module_param(ignore_nvm_checksum, bool, 0);
+MODULE_PARM_DESC(ignore_nvm_checksum,
+		"Set to ignore nvm checksum validation (defaults N)");
+
 struct igb_reg_info {
 	u32 ofs;
 	char *name;
@@ -3191,18 +3196,29 @@ static int igb_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	case e1000_i211:
 		if (igb_get_flash_presence_i210(hw)) {
 			if (hw->nvm.ops.validate(hw) < 0) {
-				dev_err(&pdev->dev,
+				if (ignore_nvm_checksum) {
+					dev_warn(&pdev->dev,
 					"The NVM Checksum Is Not Valid\n");
-				err = -EIO;
-				goto err_eeprom;
+				} else {
+					dev_err(&pdev->dev,
+					"The NVM Checksum Is Not Valid\n");
+					err = -EIO;
+					goto err_eeprom;
+				}
 			}
 		}
 		break;
 	default:
 		if (hw->nvm.ops.validate(hw) < 0) {
-			dev_err(&pdev->dev, "The NVM Checksum Is Not Valid\n");
-			err = -EIO;
-			goto err_eeprom;
+			if (ignore_nvm_checksum) {
+				dev_warn(&pdev->dev,
+					"The NVM Checksum Is Not Valid\n");
+			} else {
+				dev_err(&pdev->dev,
+					"The NVM Checksum Is Not Valid\n");
+				err = -EIO;
+				goto err_eeprom;
+			}
 		}
 		break;
 	}
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] igb: add parameter to ignore nvm checksum validation
  2019-05-08 23:14 [PATCH] igb: add parameter to ignore nvm checksum validation Nikunj Kela
@ 2019-05-16 19:35 ` Jeff Kirsher
  2019-05-16 19:55   ` Nikunj Kela (nkela)
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff Kirsher @ 2019-05-16 19:35 UTC (permalink / raw)
  To: Nikunj Kela
  Cc: xe-linux-external, David S. Miller, intel-wired-lan, netdev,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1069 bytes --]

On Wed, 2019-05-08 at 23:14 +0000, Nikunj Kela wrote:
> Some of the broken NICs don't have EEPROM programmed correctly. It
> results
> in probe to fail. This change adds a module parameter that can be
> used to
> ignore nvm checksum validation.
> 
> Cc: xe-linux-external@cisco.com
> Signed-off-by: Nikunj Kela <nkela@cisco.com>
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 28
> ++++++++++++++++++++++------
>  1 file changed, 22 insertions(+), 6 deletions(-)

NAK for two reasons.  First, module parameters are not desirable
because their individual to one driver and a global solution should be
found so that all networking device drivers can use the solution.  This
will keep the interface to change/setup/modify networking drivers
consistent for all drivers.

Second and more importantly, if your NIC is broken, fix it.  Do not try
and create a software workaround so that you can continue to use a
broken NIC.  There are methods/tools available to properly reprogram
the EEPROM on a NIC, which is the right solution for your issue.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] igb: add parameter to ignore nvm checksum validation
  2019-05-16 19:35 ` Jeff Kirsher
@ 2019-05-16 19:55   ` Nikunj Kela (nkela)
  2019-05-16 22:02     ` Florian Fainelli
  0 siblings, 1 reply; 12+ messages in thread
From: Nikunj Kela (nkela) @ 2019-05-16 19:55 UTC (permalink / raw)
  To: jeffrey.t.kirsher
  Cc: xe-linux-external(mailer list),
	David S. Miller, intel-wired-lan, netdev, linux-kernel



On 5/16/19, 12:35 PM, "Jeff Kirsher" <jeffrey.t.kirsher@intel.com> wrote:

    On Wed, 2019-05-08 at 23:14 +0000, Nikunj Kela wrote:
   >> Some of the broken NICs don't have EEPROM programmed correctly. It
   >> results
   >> in probe to fail. This change adds a module parameter that can be
   >> used to
   >> ignore nvm checksum validation.
   >> 
   >> Cc: xe-linux-external@cisco.com
   >> Signed-off-by: Nikunj Kela <nkela@cisco.com>
   >> ---
   >>  drivers/net/ethernet/intel/igb/igb_main.c | 28
   >> ++++++++++++++++++++++------
   >>  1 file changed, 22 insertions(+), 6 deletions(-)
    
    >NAK for two reasons.  First, module parameters are not desirable
    >because their individual to one driver and a global solution should be
    >found so that all networking device drivers can use the solution.  This
    >will keep the interface to change/setup/modify networking drivers
    >consistent for all drivers.

    
    >Second and more importantly, if your NIC is broken, fix it.  Do not try
    >and create a software workaround so that you can continue to use a
    >broken NIC.  There are methods/tools available to properly reprogram
    >the EEPROM on a NIC, which is the right solution for your issue.

I am proposing this as a debug parameter. Obviously, we need to fix EEPROM but this helps us continuing the development while manufacturing fixes NIC.
    


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] igb: add parameter to ignore nvm checksum validation
  2019-05-16 19:55   ` Nikunj Kela (nkela)
@ 2019-05-16 22:02     ` Florian Fainelli
  2019-05-16 22:33       ` Nikunj Kela (nkela)
  2019-05-17  1:03       ` Daniel Walker
  0 siblings, 2 replies; 12+ messages in thread
From: Florian Fainelli @ 2019-05-16 22:02 UTC (permalink / raw)
  To: Nikunj Kela (nkela), jeffrey.t.kirsher
  Cc: xe-linux-external(mailer list),
	David S. Miller, intel-wired-lan, netdev, linux-kernel

On 5/16/19 12:55 PM, Nikunj Kela (nkela) wrote:
> 
> 
> On 5/16/19, 12:35 PM, "Jeff Kirsher" <jeffrey.t.kirsher@intel.com> wrote:
> 
>     On Wed, 2019-05-08 at 23:14 +0000, Nikunj Kela wrote:
>    >> Some of the broken NICs don't have EEPROM programmed correctly. It
>    >> results
>    >> in probe to fail. This change adds a module parameter that can be
>    >> used to
>    >> ignore nvm checksum validation.
>    >> 
>    >> Cc: xe-linux-external@cisco.com
>    >> Signed-off-by: Nikunj Kela <nkela@cisco.com>
>    >> ---
>    >>  drivers/net/ethernet/intel/igb/igb_main.c | 28
>    >> ++++++++++++++++++++++------
>    >>  1 file changed, 22 insertions(+), 6 deletions(-)
>     
>     >NAK for two reasons.  First, module parameters are not desirable
>     >because their individual to one driver and a global solution should be
>     >found so that all networking device drivers can use the solution.  This
>     >will keep the interface to change/setup/modify networking drivers
>     >consistent for all drivers.
> 
>     
>     >Second and more importantly, if your NIC is broken, fix it.  Do not try
>     >and create a software workaround so that you can continue to use a
>     >broken NIC.  There are methods/tools available to properly reprogram
>     >the EEPROM on a NIC, which is the right solution for your issue.
> 
> I am proposing this as a debug parameter. Obviously, we need to fix EEPROM but this helps us continuing the development while manufacturing fixes NIC.

Then why even bother with sending this upstream?
-- 
Florian

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] igb: add parameter to ignore nvm checksum validation
  2019-05-16 22:02     ` Florian Fainelli
@ 2019-05-16 22:33       ` Nikunj Kela (nkela)
  2019-05-17  1:03       ` Daniel Walker
  1 sibling, 0 replies; 12+ messages in thread
From: Nikunj Kela (nkela) @ 2019-05-16 22:33 UTC (permalink / raw)
  To: Florian Fainelli, jeffrey.t.kirsher
  Cc: xe-linux-external(mailer list),
	David S. Miller, intel-wired-lan, netdev, linux-kernel



On 5/16/19, 3:02 PM, "Florian Fainelli" <f.fainelli@gmail.com> wrote:

    On 5/16/19 12:55 PM, Nikunj Kela (nkela) wrote:
    >> 
    >> 
    >> On 5/16/19, 12:35 PM, "Jeff Kirsher" <jeffrey.t.kirsher@intel.com> wrote:
    >> 
    >>     On Wed, 2019-05-08 at 23:14 +0000, Nikunj Kela wrote:
    >>    >> Some of the broken NICs don't have EEPROM programmed correctly. It
    >>    >> results
    >>    >> in probe to fail. This change adds a module parameter that can be
    >>    >> used to
    >>    >> ignore nvm checksum validation.
    >>    >> 
    >>    >> Cc: xe-linux-external@cisco.com
    >>    >> Signed-off-by: Nikunj Kela <nkela@cisco.com>
    >>    >> ---
    >>    >>  drivers/net/ethernet/intel/igb/igb_main.c | 28
    >>    >> ++++++++++++++++++++++------
    >>    >>  1 file changed, 22 insertions(+), 6 deletions(-)
    >>     
    >>     >NAK for two reasons.  First, module parameters are not desirable
    >>     >because their individual to one driver and a global solution should be
    >>     >found so that all networking device drivers can use the solution.  This
    >>     >will keep the interface to change/setup/modify networking drivers
    >>     >consistent for all drivers.
    >> 
    >>     
    >>     >Second and more importantly, if your NIC is broken, fix it.  Do not try
    >>     >and create a software workaround so that you can continue to use a
    >>     >broken NIC.  There are methods/tools available to properly reprogram
    >>     >the EEPROM on a NIC, which is the right solution for your issue.
    >> 
    >> I am proposing this as a debug parameter. Obviously, we need to fix EEPROM but this helps us continuing the development while manufacturing fixes NIC.
    
    >Then why even bother with sending this upstream?
    >-- 
    >Florian

My colleagues wanted me to upstream so if there is anyone else in the same situations, maybe there is a better solution. 
    


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] igb: add parameter to ignore nvm checksum validation
  2019-05-16 22:02     ` Florian Fainelli
  2019-05-16 22:33       ` Nikunj Kela (nkela)
@ 2019-05-17  1:03       ` Daniel Walker
  2019-05-17  1:48         ` Florian Fainelli
  1 sibling, 1 reply; 12+ messages in thread
From: Daniel Walker @ 2019-05-17  1:03 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Nikunj Kela (nkela),
	jeffrey.t.kirsher, xe-linux-external(mailer list),
	David S. Miller, intel-wired-lan, netdev, linux-kernel

On Thu, May 16, 2019 at 03:02:18PM -0700, Florian Fainelli wrote:
> On 5/16/19 12:55 PM, Nikunj Kela (nkela) wrote:
> > 
> > 
> > On 5/16/19, 12:35 PM, "Jeff Kirsher" <jeffrey.t.kirsher@intel.com> wrote:
> > 
> >     On Wed, 2019-05-08 at 23:14 +0000, Nikunj Kela wrote:
> >    >> Some of the broken NICs don't have EEPROM programmed correctly. It
> >    >> results
> >    >> in probe to fail. This change adds a module parameter that can be
> >    >> used to
> >    >> ignore nvm checksum validation.
> >    >> 
> >    >> Cc: xe-linux-external@cisco.com
> >    >> Signed-off-by: Nikunj Kela <nkela@cisco.com>
> >    >> ---
> >    >>  drivers/net/ethernet/intel/igb/igb_main.c | 28
> >    >> ++++++++++++++++++++++------
> >    >>  1 file changed, 22 insertions(+), 6 deletions(-)
> >     
> >     >NAK for two reasons.  First, module parameters are not desirable
> >     >because their individual to one driver and a global solution should be
> >     >found so that all networking device drivers can use the solution.  This
> >     >will keep the interface to change/setup/modify networking drivers
> >     >consistent for all drivers.
> > 
> >     
> >     >Second and more importantly, if your NIC is broken, fix it.  Do not try
> >     >and create a software workaround so that you can continue to use a
> >     >broken NIC.  There are methods/tools available to properly reprogram
> >     >the EEPROM on a NIC, which is the right solution for your issue.
> > 
> > I am proposing this as a debug parameter. Obviously, we need to fix EEPROM but this helps us continuing the development while manufacturing fixes NIC.
> 
> Then why even bother with sending this upstream?

It seems rather drastic to disable the entire driver because the checksum
doesn't match. It really should be a warning, even a big warning, to let people
know something is wrong, but disabling the whole driver doesn't make sense.

Daniel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] igb: add parameter to ignore nvm checksum validation
  2019-05-17  1:03       ` Daniel Walker
@ 2019-05-17  1:48         ` Florian Fainelli
  2019-05-17 15:16           ` [Intel-wired-lan] " Alexander Duyck
  0 siblings, 1 reply; 12+ messages in thread
From: Florian Fainelli @ 2019-05-17  1:48 UTC (permalink / raw)
  To: Daniel Walker
  Cc: Nikunj Kela (nkela),
	jeffrey.t.kirsher, xe-linux-external(mailer list),
	David S. Miller, intel-wired-lan, netdev, linux-kernel



On 5/16/2019 6:03 PM, Daniel Walker wrote:
> On Thu, May 16, 2019 at 03:02:18PM -0700, Florian Fainelli wrote:
>> On 5/16/19 12:55 PM, Nikunj Kela (nkela) wrote:
>>>
>>>
>>> On 5/16/19, 12:35 PM, "Jeff Kirsher" <jeffrey.t.kirsher@intel.com> wrote:
>>>
>>>     On Wed, 2019-05-08 at 23:14 +0000, Nikunj Kela wrote:
>>>    >> Some of the broken NICs don't have EEPROM programmed correctly. It
>>>    >> results
>>>    >> in probe to fail. This change adds a module parameter that can be
>>>    >> used to
>>>    >> ignore nvm checksum validation.
>>>    >> 
>>>    >> Cc: xe-linux-external@cisco.com
>>>    >> Signed-off-by: Nikunj Kela <nkela@cisco.com>
>>>    >> ---
>>>    >>  drivers/net/ethernet/intel/igb/igb_main.c | 28
>>>    >> ++++++++++++++++++++++------
>>>    >>  1 file changed, 22 insertions(+), 6 deletions(-)
>>>     
>>>     >NAK for two reasons.  First, module parameters are not desirable
>>>     >because their individual to one driver and a global solution should be
>>>     >found so that all networking device drivers can use the solution.  This
>>>     >will keep the interface to change/setup/modify networking drivers
>>>     >consistent for all drivers.
>>>
>>>     
>>>     >Second and more importantly, if your NIC is broken, fix it.  Do not try
>>>     >and create a software workaround so that you can continue to use a
>>>     >broken NIC.  There are methods/tools available to properly reprogram
>>>     >the EEPROM on a NIC, which is the right solution for your issue.
>>>
>>> I am proposing this as a debug parameter. Obviously, we need to fix EEPROM but this helps us continuing the development while manufacturing fixes NIC.
>>
>> Then why even bother with sending this upstream?
> 
> It seems rather drastic to disable the entire driver because the checksum
> doesn't match. It really should be a warning, even a big warning, to let people
> know something is wrong, but disabling the whole driver doesn't make sense.

You could generate a random Ethernet MAC address if you don't have a
valid one, a lot of drivers do that, and that's a fairly reasonable
behavior. At some point in your product development someone will
certainly verify that the provisioned MAC address matches the network
interface's MAC address.
-- 
Florian

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Intel-wired-lan] [PATCH] igb: add parameter to ignore nvm checksum validation
  2019-05-17  1:48         ` Florian Fainelli
@ 2019-05-17 15:16           ` Alexander Duyck
  2019-05-17 16:36             ` Daniel Walker
  0 siblings, 1 reply; 12+ messages in thread
From: Alexander Duyck @ 2019-05-17 15:16 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Daniel Walker, Nikunj Kela (nkela),
	netdev, linux-kernel, intel-wired-lan,
	xe-linux-external(mailer list),
	David S. Miller

On Thu, May 16, 2019 at 6:48 PM Florian Fainelli <f.fainelli@gmail.com> wrote:
>
>
>
> On 5/16/2019 6:03 PM, Daniel Walker wrote:
> > On Thu, May 16, 2019 at 03:02:18PM -0700, Florian Fainelli wrote:
> >> On 5/16/19 12:55 PM, Nikunj Kela (nkela) wrote:
> >>>
> >>>
> >>> On 5/16/19, 12:35 PM, "Jeff Kirsher" <jeffrey.t.kirsher@intel.com> wrote:
> >>>
> >>>     On Wed, 2019-05-08 at 23:14 +0000, Nikunj Kela wrote:
> >>>    >> Some of the broken NICs don't have EEPROM programmed correctly. It
> >>>    >> results
> >>>    >> in probe to fail. This change adds a module parameter that can be
> >>>    >> used to
> >>>    >> ignore nvm checksum validation.
> >>>    >>
> >>>    >> Cc: xe-linux-external@cisco.com
> >>>    >> Signed-off-by: Nikunj Kela <nkela@cisco.com>
> >>>    >> ---
> >>>    >>  drivers/net/ethernet/intel/igb/igb_main.c | 28
> >>>    >> ++++++++++++++++++++++------
> >>>    >>  1 file changed, 22 insertions(+), 6 deletions(-)
> >>>
> >>>     >NAK for two reasons.  First, module parameters are not desirable
> >>>     >because their individual to one driver and a global solution should be
> >>>     >found so that all networking device drivers can use the solution.  This
> >>>     >will keep the interface to change/setup/modify networking drivers
> >>>     >consistent for all drivers.
> >>>
> >>>
> >>>     >Second and more importantly, if your NIC is broken, fix it.  Do not try
> >>>     >and create a software workaround so that you can continue to use a
> >>>     >broken NIC.  There are methods/tools available to properly reprogram
> >>>     >the EEPROM on a NIC, which is the right solution for your issue.
> >>>
> >>> I am proposing this as a debug parameter. Obviously, we need to fix EEPROM but this helps us continuing the development while manufacturing fixes NIC.
> >>
> >> Then why even bother with sending this upstream?
> >
> > It seems rather drastic to disable the entire driver because the checksum
> > doesn't match. It really should be a warning, even a big warning, to let people
> > know something is wrong, but disabling the whole driver doesn't make sense.
>
> You could generate a random Ethernet MAC address if you don't have a
> valid one, a lot of drivers do that, and that's a fairly reasonable
> behavior. At some point in your product development someone will
> certainly verify that the provisioned MAC address matches the network
> interface's MAC address.
> --
> Florian

The thing is the EEPROM contains much more than just the MAC address.
There ends up being configuration for some of the PCIe interface in
the hardware as well as PHY configuration. If that is somehow mangled
we shouldn't be bringing up the part because there are one or more
pieces of the device configuration that are likely wrong.

The checksum is being used to make sure the EEPROM is valid, without
that we would need to go through and validate each individual section
of the EEPROM before enabling the the portions of the device related
to it. The concern is that this will become a slippery slope where we
eventually have to code all the configuration of the EEPROM into the
driver itself.

We need to make the checksum a hard stop. If the part is broken then
it needs to be addressed. Workarounds just end up being used and
forgotten, which makes it that much harder to support the product.
Better to mark the part as being broken, and get it fixed now, than to
have parts start shipping that require workarounds in order to
function.

- Alex

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Intel-wired-lan] [PATCH] igb: add parameter to ignore nvm checksum validation
  2019-05-17 15:16           ` [Intel-wired-lan] " Alexander Duyck
@ 2019-05-17 16:36             ` Daniel Walker
  2019-05-17 16:47               ` Florian Fainelli
  2019-05-17 16:58               ` Alexander Duyck
  0 siblings, 2 replies; 12+ messages in thread
From: Daniel Walker @ 2019-05-17 16:36 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Florian Fainelli, Nikunj Kela (nkela),
	netdev, linux-kernel, intel-wired-lan,
	xe-linux-external(mailer list),
	David S. Miller

On Fri, May 17, 2019 at 08:16:34AM -0700, Alexander Duyck wrote:
> On Thu, May 16, 2019 at 6:48 PM Florian Fainelli <f.fainelli@gmail.com> wrote:
> >
> >
> >
> > On 5/16/2019 6:03 PM, Daniel Walker wrote:
> > > On Thu, May 16, 2019 at 03:02:18PM -0700, Florian Fainelli wrote:
> > >> On 5/16/19 12:55 PM, Nikunj Kela (nkela) wrote:
> > >>>
> > >>>
> > >>> On 5/16/19, 12:35 PM, "Jeff Kirsher" <jeffrey.t.kirsher@intel.com> wrote:
> > >>>
> > >>>     On Wed, 2019-05-08 at 23:14 +0000, Nikunj Kela wrote:
> > >>>    >> Some of the broken NICs don't have EEPROM programmed correctly. It
> > >>>    >> results
> > >>>    >> in probe to fail. This change adds a module parameter that can be
> > >>>    >> used to
> > >>>    >> ignore nvm checksum validation.
> > >>>    >>
> > >>>    >> Cc: xe-linux-external@cisco.com
> > >>>    >> Signed-off-by: Nikunj Kela <nkela@cisco.com>
> > >>>    >> ---
> > >>>    >>  drivers/net/ethernet/intel/igb/igb_main.c | 28
> > >>>    >> ++++++++++++++++++++++------
> > >>>    >>  1 file changed, 22 insertions(+), 6 deletions(-)
> > >>>
> > >>>     >NAK for two reasons.  First, module parameters are not desirable
> > >>>     >because their individual to one driver and a global solution should be
> > >>>     >found so that all networking device drivers can use the solution.  This
> > >>>     >will keep the interface to change/setup/modify networking drivers
> > >>>     >consistent for all drivers.
> > >>>
> > >>>
> > >>>     >Second and more importantly, if your NIC is broken, fix it.  Do not try
> > >>>     >and create a software workaround so that you can continue to use a
> > >>>     >broken NIC.  There are methods/tools available to properly reprogram
> > >>>     >the EEPROM on a NIC, which is the right solution for your issue.
> > >>>
> > >>> I am proposing this as a debug parameter. Obviously, we need to fix EEPROM but this helps us continuing the development while manufacturing fixes NIC.
> > >>
> > >> Then why even bother with sending this upstream?
> > >
> > > It seems rather drastic to disable the entire driver because the checksum
> > > doesn't match. It really should be a warning, even a big warning, to let people
> > > know something is wrong, but disabling the whole driver doesn't make sense.
> >
> > You could generate a random Ethernet MAC address if you don't have a
> > valid one, a lot of drivers do that, and that's a fairly reasonable
> > behavior. At some point in your product development someone will
> > certainly verify that the provisioned MAC address matches the network
> > interface's MAC address.
> > --
> > Florian
> 
> The thing is the EEPROM contains much more than just the MAC address.
> There ends up being configuration for some of the PCIe interface in
> the hardware as well as PHY configuration. If that is somehow mangled
> we shouldn't be bringing up the part because there are one or more
> pieces of the device configuration that are likely wrong.
> 
> The checksum is being used to make sure the EEPROM is valid, without
> that we would need to go through and validate each individual section
> of the EEPROM before enabling the the portions of the device related
> to it. The concern is that this will become a slippery slope where we
> eventually have to code all the configuration of the EEPROM into the
> driver itself.
 

I don't think you can say because the checksum is valid that all data contained
inside is also valid. You can have a valid checksum , and someone screwed up the
data prior to the checksum getting computed.


> We need to make the checksum a hard stop. If the part is broken then
> it needs to be addressed. Workarounds just end up being used and
> forgotten, which makes it that much harder to support the product.
> Better to mark the part as being broken, and get it fixed now, than to
> have parts start shipping that require workarounds in order to
> function.o

I don't think it's realistic to define the development process for large
corporations like Cisco, or like what your doing , to define the development
process for all corporations and products which may use intel parts. It's better
to be flexible.

Daniel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Intel-wired-lan] [PATCH] igb: add parameter to ignore nvm checksum validation
  2019-05-17 16:36             ` Daniel Walker
@ 2019-05-17 16:47               ` Florian Fainelli
  2019-05-17 16:58               ` Alexander Duyck
  1 sibling, 0 replies; 12+ messages in thread
From: Florian Fainelli @ 2019-05-17 16:47 UTC (permalink / raw)
  To: Daniel Walker, Alexander Duyck
  Cc: Nikunj Kela (nkela),
	netdev, linux-kernel, intel-wired-lan,
	xe-linux-external(mailer list),
	David S. Miller

On 5/17/19 9:36 AM, Daniel Walker wrote:
> On Fri, May 17, 2019 at 08:16:34AM -0700, Alexander Duyck wrote:
>> On Thu, May 16, 2019 at 6:48 PM Florian Fainelli <f.fainelli@gmail.com> wrote:
>>>
>>>
>>>
>>> On 5/16/2019 6:03 PM, Daniel Walker wrote:
>>>> On Thu, May 16, 2019 at 03:02:18PM -0700, Florian Fainelli wrote:
>>>>> On 5/16/19 12:55 PM, Nikunj Kela (nkela) wrote:
>>>>>>
>>>>>>
>>>>>> On 5/16/19, 12:35 PM, "Jeff Kirsher" <jeffrey.t.kirsher@intel.com> wrote:
>>>>>>
>>>>>>     On Wed, 2019-05-08 at 23:14 +0000, Nikunj Kela wrote:
>>>>>>    >> Some of the broken NICs don't have EEPROM programmed correctly. It
>>>>>>    >> results
>>>>>>    >> in probe to fail. This change adds a module parameter that can be
>>>>>>    >> used to
>>>>>>    >> ignore nvm checksum validation.
>>>>>>    >>
>>>>>>    >> Cc: xe-linux-external@cisco.com
>>>>>>    >> Signed-off-by: Nikunj Kela <nkela@cisco.com>
>>>>>>    >> ---
>>>>>>    >>  drivers/net/ethernet/intel/igb/igb_main.c | 28
>>>>>>    >> ++++++++++++++++++++++------
>>>>>>    >>  1 file changed, 22 insertions(+), 6 deletions(-)
>>>>>>
>>>>>>     >NAK for two reasons.  First, module parameters are not desirable
>>>>>>     >because their individual to one driver and a global solution should be
>>>>>>     >found so that all networking device drivers can use the solution.  This
>>>>>>     >will keep the interface to change/setup/modify networking drivers
>>>>>>     >consistent for all drivers.
>>>>>>
>>>>>>
>>>>>>     >Second and more importantly, if your NIC is broken, fix it.  Do not try
>>>>>>     >and create a software workaround so that you can continue to use a
>>>>>>     >broken NIC.  There are methods/tools available to properly reprogram
>>>>>>     >the EEPROM on a NIC, which is the right solution for your issue.
>>>>>>
>>>>>> I am proposing this as a debug parameter. Obviously, we need to fix EEPROM but this helps us continuing the development while manufacturing fixes NIC.
>>>>>
>>>>> Then why even bother with sending this upstream?
>>>>
>>>> It seems rather drastic to disable the entire driver because the checksum
>>>> doesn't match. It really should be a warning, even a big warning, to let people
>>>> know something is wrong, but disabling the whole driver doesn't make sense.
>>>
>>> You could generate a random Ethernet MAC address if you don't have a
>>> valid one, a lot of drivers do that, and that's a fairly reasonable
>>> behavior. At some point in your product development someone will
>>> certainly verify that the provisioned MAC address matches the network
>>> interface's MAC address.
>>> --
>>> Florian
>>
>> The thing is the EEPROM contains much more than just the MAC address.
>> There ends up being configuration for some of the PCIe interface in
>> the hardware as well as PHY configuration. If that is somehow mangled
>> we shouldn't be bringing up the part because there are one or more
>> pieces of the device configuration that are likely wrong.
>>
>> The checksum is being used to make sure the EEPROM is valid, without
>> that we would need to go through and validate each individual section
>> of the EEPROM before enabling the the portions of the device related
>> to it. The concern is that this will become a slippery slope where we
>> eventually have to code all the configuration of the EEPROM into the
>> driver itself.
>  
> 
> I don't think you can say because the checksum is valid that all data contained
> inside is also valid. You can have a valid checksum , and someone screwed up the
> data prior to the checksum getting computed.
> 
> 
>> We need to make the checksum a hard stop. If the part is broken then
>> it needs to be addressed. Workarounds just end up being used and
>> forgotten, which makes it that much harder to support the product.
>> Better to mark the part as being broken, and get it fixed now, than to
>> have parts start shipping that require workarounds in order to
>> function.o
> 
> I don't think it's realistic to define the development process for large
> corporations like Cisco, or like what your doing , to define the development
> process for all corporations and products which may use intel parts. It's better
> to be flexible.

Nikunj indicated that "manufacturing fixes NIC" so that sounds like a
workaround for an issue that would not affect a final product, in which
case, keeping downstream changes for development boards/intermediate
revisions of a product and focusing on relevant upstreaming changes for
the actual product would make a lot more sense, no?
-- 
Florian

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Intel-wired-lan] [PATCH] igb: add parameter to ignore nvm checksum validation
  2019-05-17 16:36             ` Daniel Walker
  2019-05-17 16:47               ` Florian Fainelli
@ 2019-05-17 16:58               ` Alexander Duyck
  2019-05-17 18:09                 ` Daniel Walker
  1 sibling, 1 reply; 12+ messages in thread
From: Alexander Duyck @ 2019-05-17 16:58 UTC (permalink / raw)
  To: Daniel Walker
  Cc: Florian Fainelli, Nikunj Kela (nkela),
	netdev, linux-kernel, intel-wired-lan,
	xe-linux-external(mailer list),
	David S. Miller

On Fri, May 17, 2019 at 9:36 AM Daniel Walker <danielwa@cisco.com> wrote:
>
> On Fri, May 17, 2019 at 08:16:34AM -0700, Alexander Duyck wrote:
> > On Thu, May 16, 2019 at 6:48 PM Florian Fainelli <f.fainelli@gmail.com> wrote:
> > >
> > >
> > >
> > > On 5/16/2019 6:03 PM, Daniel Walker wrote:
> > > > On Thu, May 16, 2019 at 03:02:18PM -0700, Florian Fainelli wrote:
> > > >> On 5/16/19 12:55 PM, Nikunj Kela (nkela) wrote:
> > > >>>
> > > >>>
> > > >>> On 5/16/19, 12:35 PM, "Jeff Kirsher" <jeffrey.t.kirsher@intel.com> wrote:
> > > >>>
> > > >>>     On Wed, 2019-05-08 at 23:14 +0000, Nikunj Kela wrote:
> > > >>>    >> Some of the broken NICs don't have EEPROM programmed correctly. It
> > > >>>    >> results
> > > >>>    >> in probe to fail. This change adds a module parameter that can be
> > > >>>    >> used to
> > > >>>    >> ignore nvm checksum validation.
> > > >>>    >>
> > > >>>    >> Cc: xe-linux-external@cisco.com
> > > >>>    >> Signed-off-by: Nikunj Kela <nkela@cisco.com>
> > > >>>    >> ---
> > > >>>    >>  drivers/net/ethernet/intel/igb/igb_main.c | 28
> > > >>>    >> ++++++++++++++++++++++------
> > > >>>    >>  1 file changed, 22 insertions(+), 6 deletions(-)
> > > >>>
> > > >>>     >NAK for two reasons.  First, module parameters are not desirable
> > > >>>     >because their individual to one driver and a global solution should be
> > > >>>     >found so that all networking device drivers can use the solution.  This
> > > >>>     >will keep the interface to change/setup/modify networking drivers
> > > >>>     >consistent for all drivers.
> > > >>>
> > > >>>
> > > >>>     >Second and more importantly, if your NIC is broken, fix it.  Do not try
> > > >>>     >and create a software workaround so that you can continue to use a
> > > >>>     >broken NIC.  There are methods/tools available to properly reprogram
> > > >>>     >the EEPROM on a NIC, which is the right solution for your issue.
> > > >>>
> > > >>> I am proposing this as a debug parameter. Obviously, we need to fix EEPROM but this helps us continuing the development while manufacturing fixes NIC.
> > > >>
> > > >> Then why even bother with sending this upstream?
> > > >
> > > > It seems rather drastic to disable the entire driver because the checksum
> > > > doesn't match. It really should be a warning, even a big warning, to let people
> > > > know something is wrong, but disabling the whole driver doesn't make sense.
> > >
> > > You could generate a random Ethernet MAC address if you don't have a
> > > valid one, a lot of drivers do that, and that's a fairly reasonable
> > > behavior. At some point in your product development someone will
> > > certainly verify that the provisioned MAC address matches the network
> > > interface's MAC address.
> > > --
> > > Florian
> >
> > The thing is the EEPROM contains much more than just the MAC address.
> > There ends up being configuration for some of the PCIe interface in
> > the hardware as well as PHY configuration. If that is somehow mangled
> > we shouldn't be bringing up the part because there are one or more
> > pieces of the device configuration that are likely wrong.
> >
> > The checksum is being used to make sure the EEPROM is valid, without
> > that we would need to go through and validate each individual section
> > of the EEPROM before enabling the the portions of the device related
> > to it. The concern is that this will become a slippery slope where we
> > eventually have to code all the configuration of the EEPROM into the
> > driver itself.
>
>
> I don't think you can say because the checksum is valid that all data contained
> inside is also valid. You can have a valid checksum , and someone screwed up the
> data prior to the checksum getting computed.

If someone screwed up the data prior to writing the checksum then that
is on them. In theory we could also have a multi-bit error that could
similarly be missed. However if the checksum is not valid then the
data contained in the NVM does not match what was originally written,
so we know we have bad data. Why should we act on the data if we know
it is bad?

> > We need to make the checksum a hard stop. If the part is broken then
> > it needs to be addressed. Workarounds just end up being used and
> > forgotten, which makes it that much harder to support the product.
> > Better to mark the part as being broken, and get it fixed now, than to
> > have parts start shipping that require workarounds in order to
> > function.o
>
> I don't think it's realistic to define the development process for large
> corporations like Cisco, or like what your doing , to define the development
> process for all corporations and products which may use intel parts. It's better
> to be flexible.
>
> Daniel

This isn't about development. If you are doing development you can do
whatever you want with your own downstream driver. What you are
attempting to do is update the upstream driver which is used in
production environments.

What concerns me is when this module parameter gets used in a
development environment and then slips into being required for a
production environment. At that point it defeats the whole point of
the checksum in the first place.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Intel-wired-lan] [PATCH] igb: add parameter to ignore nvm checksum validation
  2019-05-17 16:58               ` Alexander Duyck
@ 2019-05-17 18:09                 ` Daniel Walker
  0 siblings, 0 replies; 12+ messages in thread
From: Daniel Walker @ 2019-05-17 18:09 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Florian Fainelli, Nikunj Kela (nkela),
	netdev, linux-kernel, intel-wired-lan,
	xe-linux-external(mailer list),
	David S. Miller

On Fri, May 17, 2019 at 09:58:46AM -0700, Alexander Duyck wrote:
> > I don't think you can say because the checksum is valid that all data contained
> > inside is also valid. You can have a valid checksum , and someone screwed up the
> > data prior to the checksum getting computed.
> 
> If someone screwed up the data prior to writing the checksum then that
> is on them. In theory we could also have a multi-bit error that could
> similarly be missed. However if the checksum is not valid then the
> data contained in the NVM does not match what was originally written,
> so we know we have bad data. Why should we act on the data if we know
> it is bad?
 
It's hypothetical , but it's likely someone has screwed up the data prior to the
checksum getting computed.

> > > We need to make the checksum a hard stop. If the part is broken then
> > > it needs to be addressed. Workarounds just end up being used and
> > > forgotten, which makes it that much harder to support the product.
> > > Better to mark the part as being broken, and get it fixed now, than to
> > > have parts start shipping that require workarounds in order to
> > > function.o
> >
> > I don't think it's realistic to define the development process for large
> > corporations like Cisco, or like what your doing , to define the development
> > process for all corporations and products which may use intel parts. It's better
> > to be flexible.
> >
> > Daniel
> 
> This isn't about development. If you are doing development you can do
> whatever you want with your own downstream driver. What you are
> attempting to do is update the upstream driver which is used in
> production environments.
 
Cisco has this issue in development, and in production. So your right, it's not
about development in isolation. People make mistakes..

> What concerns me is when this module parameter gets used in a
> development environment and then slips into being required for a
> production environment. At that point it defeats the whole point of
> the checksum in the first place.

I agree .. Ultimately it's the choice of the OEM, if it gets into production
then it's their product and they support the product. As I was saying in a prior
email it should be a priority of the driver to give flexibility for mistakes
people will inevitably make.

Daniel


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-05-17 18:09 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-08 23:14 [PATCH] igb: add parameter to ignore nvm checksum validation Nikunj Kela
2019-05-16 19:35 ` Jeff Kirsher
2019-05-16 19:55   ` Nikunj Kela (nkela)
2019-05-16 22:02     ` Florian Fainelli
2019-05-16 22:33       ` Nikunj Kela (nkela)
2019-05-17  1:03       ` Daniel Walker
2019-05-17  1:48         ` Florian Fainelli
2019-05-17 15:16           ` [Intel-wired-lan] " Alexander Duyck
2019-05-17 16:36             ` Daniel Walker
2019-05-17 16:47               ` Florian Fainelli
2019-05-17 16:58               ` Alexander Duyck
2019-05-17 18:09                 ` Daniel Walker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).