All of lore.kernel.org
 help / color / mirror / Atom feed
* i40e: pci probe fails when using one bogus sfp
@ 2017-06-08  9:29 Olivier Matz
  2017-06-08 10:01 ` Bruce Richardson
  0 siblings, 1 reply; 14+ messages in thread
From: Olivier Matz @ 2017-06-08  9:29 UTC (permalink / raw)
  To: helin.zhang, jingjing.wu, dev

Hi,

One of our customers encounters an issue with dpdk when there
is a bogus SFP on one of the ports. The following message is
reported:

  PMD: eth_i40e_dev_init(): Failed to sync phy type: -95 

(note: 95 is EOPNOTSUPP/ENOTSUP)

Unfortunately I cannot reproduce the issue to give more details,
but the hypothesis is that it fails in i40e_dev_sync_phy_type().
It could be related to that patch:

  http://dpdk.org/browse/dpdk/commit/?id=edfb226f69bf

To me, the expected behavior should be:
- pci probe is succesful
- the initialization of the port with faulty SFP fails
- the initialization of the other ports is succesful

Do you have any comment or idea to fix this issue?

Thanks,
Olivier

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: i40e: pci probe fails when using one bogus sfp
  2017-06-08  9:29 i40e: pci probe fails when using one bogus sfp Olivier Matz
@ 2017-06-08 10:01 ` Bruce Richardson
  2017-06-08 10:13   ` Olivier Matz
  0 siblings, 1 reply; 14+ messages in thread
From: Bruce Richardson @ 2017-06-08 10:01 UTC (permalink / raw)
  To: Olivier Matz; +Cc: helin.zhang, jingjing.wu, dev

On Thu, Jun 08, 2017 at 11:29:17AM +0200, Olivier Matz wrote:
> Hi,
> 
> One of our customers encounters an issue with dpdk when there
> is a bogus SFP on one of the ports. The following message is
> reported:
> 
>   PMD: eth_i40e_dev_init(): Failed to sync phy type: -95 
> 
> (note: 95 is EOPNOTSUPP/ENOTSUP)
> 
> Unfortunately I cannot reproduce the issue to give more details,
> but the hypothesis is that it fails in i40e_dev_sync_phy_type().
> It could be related to that patch:
> 
>   http://dpdk.org/browse/dpdk/commit/?id=edfb226f69bf
> 
> To me, the expected behavior should be:
> - pci probe is succesful
> - the initialization of the port with faulty SFP fails
> - the initialization of the other ports is succesful
> 
> Do you have any comment or idea to fix this issue?
> 
And what is the current behaviour you are seeing? The whole PCI probe is
terminating after the failure on the error port? Can that one problem
port not just be blacklisted, since it's presumably unusable anyway?

/Bruce

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: i40e: pci probe fails when using one bogus sfp
  2017-06-08 10:01 ` Bruce Richardson
@ 2017-06-08 10:13   ` Olivier Matz
  2017-06-12  8:45     ` Xing, Beilei
  0 siblings, 1 reply; 14+ messages in thread
From: Olivier Matz @ 2017-06-08 10:13 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: helin.zhang, jingjing.wu, dev

On Thu, 8 Jun 2017 11:01:54 +0100, Bruce Richardson <bruce.richardson@intel.com> wrote:
> On Thu, Jun 08, 2017 at 11:29:17AM +0200, Olivier Matz wrote:
> > Hi,
> > 
> > One of our customers encounters an issue with dpdk when there
> > is a bogus SFP on one of the ports. The following message is
> > reported:
> > 
> >   PMD: eth_i40e_dev_init(): Failed to sync phy type: -95 
> > 
> > (note: 95 is EOPNOTSUPP/ENOTSUP)
> > 
> > Unfortunately I cannot reproduce the issue to give more details,
> > but the hypothesis is that it fails in i40e_dev_sync_phy_type().
> > It could be related to that patch:
> > 
> >   http://dpdk.org/browse/dpdk/commit/?id=edfb226f69bf
> > 
> > To me, the expected behavior should be:
> > - pci probe is succesful
> > - the initialization of the port with faulty SFP fails
> > - the initialization of the other ports is succesful
> > 
> > Do you have any comment or idea to fix this issue?
> >   
> And what is the current behaviour you are seeing? The whole PCI probe is
> terminating after the failure on the error port?

Yes, the probe is terminating

> Can that one problem
> port not just be blacklisted, since it's presumably unusable anyway?

This would imply the user (or the manager program) that launches the
application to parse the logs to detect which port fails and update
the configuration accordingly.

I think it would be better to return an error at port initialization,
so that the application can takes its dispositions directly. We could
even imagine that the port could be reenabled later once the SFP is
changed, without restarting the application.


Olivier

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: i40e: pci probe fails when using one bogus sfp
  2017-06-08 10:13   ` Olivier Matz
@ 2017-06-12  8:45     ` Xing, Beilei
  2017-06-12  9:45       ` Olivier Matz
  0 siblings, 1 reply; 14+ messages in thread
From: Xing, Beilei @ 2017-06-12  8:45 UTC (permalink / raw)
  To: Olivier Matz, Richardson, Bruce; +Cc: Zhang, Helin, Wu, Jingjing, dev

Hi Olivier,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> Sent: Thursday, June 8, 2017 6:14 PM
> To: Richardson, Bruce <bruce.richardson@intel.com>
> Cc: Zhang, Helin <helin.zhang@intel.com>; Wu, Jingjing
> <jingjing.wu@intel.com>; dev@dpdk.org
> Subject: Re: [dpdk-dev] i40e: pci probe fails when using one bogus sfp
> 
> On Thu, 8 Jun 2017 11:01:54 +0100, Bruce Richardson
> <bruce.richardson@intel.com> wrote:
> > On Thu, Jun 08, 2017 at 11:29:17AM +0200, Olivier Matz wrote:
> > > Hi,
> > >
> > > One of our customers encounters an issue with dpdk when there is a
> > > bogus SFP on one of the ports. The following message is
> > > reported:
> > >
> > >   PMD: eth_i40e_dev_init(): Failed to sync phy type: -95
> > >
> > > (note: 95 is EOPNOTSUPP/ENOTSUP)
> > >
> > > Unfortunately I cannot reproduce the issue to give more details, but
> > > the hypothesis is that it fails in i40e_dev_sync_phy_type().
> > > It could be related to that patch:
> > >
> > >   http://dpdk.org/browse/dpdk/commit/?id=edfb226f69bf
> > >
> > > To me, the expected behavior should be:
> > > - pci probe is succesful
> > > - the initialization of the port with faulty SFP fails
> > > - the initialization of the other ports is succesful
> > >
> > > Do you have any comment or idea to fix this issue?
> > >
> > And what is the current behaviour you are seeing? The whole PCI probe
> > is terminating after the failure on the error port?
> 
> Yes, the probe is terminating

Sorry I'm not very clear about the termination of PCI probe you mentioned.
I did some test in current code base: there're two ports (87:00.0 and 87:00.2)bound to igb_uio, and force the first port to fail to initialize, I find that the second port still can finish initialization successfully. I thought it has met your request. Please correct me if I'm wrong.

EAL: PCI device 0000:87:00.0 on NUMA socket -1
EAL:   probe driver: 8086:1572 net_i40e
~failed
eth_i40e_dev_init(): Failed to sync phy type: 0
EAL: PCI device 0000:87:00.1 on NUMA socket -1
EAL:   probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:87:00.2 on NUMA socket -1
EAL:   probe driver: 8086:1572 net_i40e
~succeed

Beilei

> 
> > Can that one problem
> > port not just be blacklisted, since it's presumably unusable anyway?
> 
> This would imply the user (or the manager program) that launches the
> application to parse the logs to detect which port fails and update the
> configuration accordingly.
> 
> I think it would be better to return an error at port initialization, so that the
> application can takes its dispositions directly. We could even imagine that the
> port could be reenabled later once the SFP is changed, without restarting the
> application.
> 
> 
> Olivier

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: i40e: pci probe fails when using one bogus sfp
  2017-06-12  8:45     ` Xing, Beilei
@ 2017-06-12  9:45       ` Olivier Matz
  2017-06-12 15:53         ` Wu, Jingjing
  2017-06-12 16:23         ` Wu, Jingjing
  0 siblings, 2 replies; 14+ messages in thread
From: Olivier Matz @ 2017-06-12  9:45 UTC (permalink / raw)
  To: Xing, Beilei; +Cc: Richardson, Bruce, Zhang, Helin, Wu, Jingjing, dev

Hi Beilei,

On Mon, 12 Jun 2017 08:45:43 +0000, "Xing, Beilei" <beilei.xing@intel.com> wrote:
> Hi Olivier,
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> > Sent: Thursday, June 8, 2017 6:14 PM
> > To: Richardson, Bruce <bruce.richardson@intel.com>
> > Cc: Zhang, Helin <helin.zhang@intel.com>; Wu, Jingjing
> > <jingjing.wu@intel.com>; dev@dpdk.org
> > Subject: Re: [dpdk-dev] i40e: pci probe fails when using one bogus sfp
> > 
> > On Thu, 8 Jun 2017 11:01:54 +0100, Bruce Richardson
> > <bruce.richardson@intel.com> wrote:  
> > > On Thu, Jun 08, 2017 at 11:29:17AM +0200, Olivier Matz wrote:  
> > > > Hi,
> > > >
> > > > One of our customers encounters an issue with dpdk when there is a
> > > > bogus SFP on one of the ports. The following message is
> > > > reported:
> > > >
> > > >   PMD: eth_i40e_dev_init(): Failed to sync phy type: -95
> > > >
> > > > (note: 95 is EOPNOTSUPP/ENOTSUP)
> > > >
> > > > Unfortunately I cannot reproduce the issue to give more details, but
> > > > the hypothesis is that it fails in i40e_dev_sync_phy_type().
> > > > It could be related to that patch:
> > > >
> > > >   http://dpdk.org/browse/dpdk/commit/?id=edfb226f69bf
> > > >
> > > > To me, the expected behavior should be:
> > > > - pci probe is succesful
> > > > - the initialization of the port with faulty SFP fails
> > > > - the initialization of the other ports is succesful
> > > >
> > > > Do you have any comment or idea to fix this issue?
> > > >  
> > > And what is the current behaviour you are seeing? The whole PCI probe
> > > is terminating after the failure on the error port?  
> > 
> > Yes, the probe is terminating  
> 
> Sorry I'm not very clear about the termination of PCI probe you mentioned.
> I did some test in current code base: there're two ports (87:00.0 and 87:00.2)bound to igb_uio, and force the first port to fail to initialize, I find that the second port still can finish initialization successfully. I thought it has met your request. Please correct me if I'm wrong.
> 
> EAL: PCI device 0000:87:00.0 on NUMA socket -1
> EAL:   probe driver: 8086:1572 net_i40e
> ~failed
> eth_i40e_dev_init(): Failed to sync phy type: 0
> EAL: PCI device 0000:87:00.1 on NUMA socket -1
> EAL:   probe driver: 8086:1572 net_i40e
> EAL: PCI device 0000:87:00.2 on NUMA socket -1
> EAL:   probe driver: 8086:1572 net_i40e
> ~succeed


Thank you for your quick answer.

Yes, the pci probing continues for the other ports even if one port
failed (since v17.05, commit 10f6c93cea).

But I find a bit strange to have this check about the SFP in the
PCI probing function instead of having it the port initialization
function. My understanding is the SFP check is not related to PCI
probing. Is it consistent with other drivers?

In case of failure, it shifts the port ids of next ports, making it
harder to recognize them in the application.

With current code, after a failure, if the user replaces the faulty SFP
after the application is started, it requires the application to support
hotplug to ask to probe the PCI again to make the port appear again.

If the failure is moved in the port start function, it would just
require the application to start the port again.


Regards
Olivier

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: i40e: pci probe fails when using one bogus sfp
  2017-06-12  9:45       ` Olivier Matz
@ 2017-06-12 15:53         ` Wu, Jingjing
  2017-06-12 16:25           ` Wu, Jingjing
  2017-06-12 16:23         ` Wu, Jingjing
  1 sibling, 1 reply; 14+ messages in thread
From: Wu, Jingjing @ 2017-06-12 15:53 UTC (permalink / raw)
  To: Olivier Matz, Xing, Beilei; +Cc: Richardson, Bruce, Zhang, Helin, dev



> -----Original Message-----
> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> Sent: Monday, June 12, 2017 5:46 PM
> To: Xing, Beilei <beilei.xing@intel.com>
> Cc: Richardson, Bruce <bruce.richardson@intel.com>; Zhang, Helin <helin.zhang@intel.com>;
> Wu, Jingjing <jingjing.wu@intel.com>; dev@dpdk.org
> Subject: Re: [dpdk-dev] i40e: pci probe fails when using one bogus sfp
> 
> Hi Beilei,
> 
> On Mon, 12 Jun 2017 08:45:43 +0000, "Xing, Beilei" <beilei.xing@intel.com> wrote:
> > Hi Olivier,
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> > > Sent: Thursday, June 8, 2017 6:14 PM
> > > To: Richardson, Bruce <bruce.richardson@intel.com>
> > > Cc: Zhang, Helin <helin.zhang@intel.com>; Wu, Jingjing
> > > <jingjing.wu@intel.com>; dev@dpdk.org
> > > Subject: Re: [dpdk-dev] i40e: pci probe fails when using one bogus sfp
> > >
> > > On Thu, 8 Jun 2017 11:01:54 +0100, Bruce Richardson
> > > <bruce.richardson@intel.com> wrote:
> > > > On Thu, Jun 08, 2017 at 11:29:17AM +0200, Olivier Matz wrote:
> > > > > Hi,
> > > > >
> > > > > One of our customers encounters an issue with dpdk when there is a
> > > > > bogus SFP on one of the ports. The following message is
> > > > > reported:
> > > > >
> > > > >   PMD: eth_i40e_dev_init(): Failed to sync phy type: -95
> > > > >
> > > > > (note: 95 is EOPNOTSUPP/ENOTSUP)
> > > > >
> > > > > Unfortunately I cannot reproduce the issue to give more details, but
> > > > > the hypothesis is that it fails in i40e_dev_sync_phy_type().
> > > > > It could be related to that patch:
> > > > >
> > > > >   http://dpdk.org/browse/dpdk/commit/?id=edfb226f69bf
> > > > >
> > > > > To me, the expected behavior should be:
> > > > > - pci probe is succesful
> > > > > - the initialization of the port with faulty SFP fails
> > > > > - the initialization of the other ports is succesful
> > > > >
> > > > > Do you have any comment or idea to fix this issue?
> > > > >
> > > > And what is the current behaviour you are seeing? The whole PCI probe
> > > > is terminating after the failure on the error port?
> > >
> > > Yes, the probe is terminating
> >
> > Sorry I'm not very clear about the termination of PCI probe you mentioned.
> > I did some test in current code base: there're two ports (87:00.0 and 87:00.2)bound to
> igb_uio, and force the first port to fail to initialize, I find that the second port still can finish
> initialization successfully. I thought it has met your request. Please correct me if I'm wrong.
> >
> > EAL: PCI device 0000:87:00.0 on NUMA socket -1
> > EAL:   probe driver: 8086:1572 net_i40e
> > ~failed
> > eth_i40e_dev_init(): Failed to sync phy type: 0
> > EAL: PCI device 0000:87:00.1 on NUMA socket -1
> > EAL:   probe driver: 8086:1572 net_i40e
> > EAL: PCI device 0000:87:00.2 on NUMA socket -1
> > EAL:   probe driver: 8086:1572 net_i40e
> > ~succeed
> 
> 
> Thank you for your quick answer.
> 
> Yes, the pci probing continues for the other ports even if one port
> failed (since v17.05, commit 10f6c93cea).
> 
> But I find a bit strange to have this check about the SFP in the
> PCI probing function instead of having it the port initialization
> function. My understanding is the SFP check is not related to PCI
> probing. Is it consistent with other drivers?
> 
> In case of failure, it shifts the port ids of next ports, making it
> harder to recognize them in the application.
> 
> With current code, after a failure, if the user replaces the faulty SFP
> after the application is started, it requires the application to support
> hotplug to ask to probe the PCI again to make the port appear again.
> 
> If the failure is moved in the port start function, it would just
> require the application to start the port again.
> 
> 
> Regards
> Olivier

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: i40e: pci probe fails when using one bogus sfp
  2017-06-12  9:45       ` Olivier Matz
  2017-06-12 15:53         ` Wu, Jingjing
@ 2017-06-12 16:23         ` Wu, Jingjing
  2017-06-13  8:27           ` Olivier MATZ
  1 sibling, 1 reply; 14+ messages in thread
From: Wu, Jingjing @ 2017-06-12 16:23 UTC (permalink / raw)
  To: Olivier Matz, Xing, Beilei; +Cc: Richardson, Bruce, Zhang, Helin, dev

HI, Olivier

> Thank you for your quick answer.
> 
> Yes, the pci probing continues for the other ports even if one port
> failed (since v17.05, commit 10f6c93cea).
> 
> But I find a bit strange to have this check about the SFP in the
> PCI probing function instead of having it the port initialization
> function. My understanding is the SFP check is not related to PCI
> probing. Is it consistent with other drivers?
>
Could your customer help to check what is the exactly error code is by
Checking the "hw->aq.asq_last_status" when eth_i40e_dev_init() fails.

Yes, it seems better PHY init fails doesn't block PCI probe. Just compared with i40e
Kernel version, PHY init fails doesn't block CPI probe. And there is watchdog task to
Check the PHY status. But DPDK is polling mode, If PCI probe fails, PCI probe continues,
then application need poll PHY status to support SFP change.

And I also checked ixgbe driver, it seems phy init is done at probe time.
In my opinion, dev_start and dev_stop is meaning ready for receiving and transmitting
packets, it may not be suitable to put it in the start/stop phase.


> 
> Regards
> Olivier

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: i40e: pci probe fails when using one bogus sfp
  2017-06-12 15:53         ` Wu, Jingjing
@ 2017-06-12 16:25           ` Wu, Jingjing
  0 siblings, 0 replies; 14+ messages in thread
From: Wu, Jingjing @ 2017-06-12 16:25 UTC (permalink / raw)
  To: Wu, Jingjing, Olivier Matz, Xing, Beilei
  Cc: Richardson, Bruce, Zhang, Helin, dev

Please ignore the empty mail.

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Wu, Jingjing
> Sent: Monday, June 12, 2017 11:54 PM
> To: Olivier Matz <olivier.matz@6wind.com>; Xing, Beilei <beilei.xing@intel.com>
> Cc: Richardson, Bruce <bruce.richardson@intel.com>; Zhang, Helin <helin.zhang@intel.com>;
> dev@dpdk.org
> Subject: Re: [dpdk-dev] i40e: pci probe fails when using one bogus sfp
> 
> 
> 
> > -----Original Message-----
> > From: Olivier Matz [mailto:olivier.matz@6wind.com]
> > Sent: Monday, June 12, 2017 5:46 PM
> > To: Xing, Beilei <beilei.xing@intel.com>
> > Cc: Richardson, Bruce <bruce.richardson@intel.com>; Zhang, Helin
> <helin.zhang@intel.com>;
> > Wu, Jingjing <jingjing.wu@intel.com>; dev@dpdk.org
> > Subject: Re: [dpdk-dev] i40e: pci probe fails when using one bogus sfp
> >
> > Hi Beilei,
> >
> > On Mon, 12 Jun 2017 08:45:43 +0000, "Xing, Beilei" <beilei.xing@intel.com> wrote:
> > > Hi Olivier,
> > >
> > > > -----Original Message-----
> > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> > > > Sent: Thursday, June 8, 2017 6:14 PM
> > > > To: Richardson, Bruce <bruce.richardson@intel.com>
> > > > Cc: Zhang, Helin <helin.zhang@intel.com>; Wu, Jingjing
> > > > <jingjing.wu@intel.com>; dev@dpdk.org
> > > > Subject: Re: [dpdk-dev] i40e: pci probe fails when using one bogus sfp
> > > >
> > > > On Thu, 8 Jun 2017 11:01:54 +0100, Bruce Richardson
> > > > <bruce.richardson@intel.com> wrote:
> > > > > On Thu, Jun 08, 2017 at 11:29:17AM +0200, Olivier Matz wrote:
> > > > > > Hi,
> > > > > >
> > > > > > One of our customers encounters an issue with dpdk when there is a
> > > > > > bogus SFP on one of the ports. The following message is
> > > > > > reported:
> > > > > >
> > > > > >   PMD: eth_i40e_dev_init(): Failed to sync phy type: -95
> > > > > >
> > > > > > (note: 95 is EOPNOTSUPP/ENOTSUP)
> > > > > >
> > > > > > Unfortunately I cannot reproduce the issue to give more details, but
> > > > > > the hypothesis is that it fails in i40e_dev_sync_phy_type().
> > > > > > It could be related to that patch:
> > > > > >
> > > > > >   http://dpdk.org/browse/dpdk/commit/?id=edfb226f69bf
> > > > > >
> > > > > > To me, the expected behavior should be:
> > > > > > - pci probe is succesful
> > > > > > - the initialization of the port with faulty SFP fails
> > > > > > - the initialization of the other ports is succesful
> > > > > >
> > > > > > Do you have any comment or idea to fix this issue?
> > > > > >
> > > > > And what is the current behaviour you are seeing? The whole PCI probe
> > > > > is terminating after the failure on the error port?
> > > >
> > > > Yes, the probe is terminating
> > >
> > > Sorry I'm not very clear about the termination of PCI probe you mentioned.
> > > I did some test in current code base: there're two ports (87:00.0 and 87:00.2)bound to
> > igb_uio, and force the first port to fail to initialize, I find that the second port still can finish
> > initialization successfully. I thought it has met your request. Please correct me if I'm wrong.
> > >
> > > EAL: PCI device 0000:87:00.0 on NUMA socket -1
> > > EAL:   probe driver: 8086:1572 net_i40e
> > > ~failed
> > > eth_i40e_dev_init(): Failed to sync phy type: 0
> > > EAL: PCI device 0000:87:00.1 on NUMA socket -1
> > > EAL:   probe driver: 8086:1572 net_i40e
> > > EAL: PCI device 0000:87:00.2 on NUMA socket -1
> > > EAL:   probe driver: 8086:1572 net_i40e
> > > ~succeed
> >
> >
> > Thank you for your quick answer.
> >
> > Yes, the pci probing continues for the other ports even if one port
> > failed (since v17.05, commit 10f6c93cea).
> >
> > But I find a bit strange to have this check about the SFP in the
> > PCI probing function instead of having it the port initialization
> > function. My understanding is the SFP check is not related to PCI
> > probing. Is it consistent with other drivers?
> >
> > In case of failure, it shifts the port ids of next ports, making it
> > harder to recognize them in the application.
> >
> > With current code, after a failure, if the user replaces the faulty SFP
> > after the application is started, it requires the application to support
> > hotplug to ask to probe the PCI again to make the port appear again.
> >
> > If the failure is moved in the port start function, it would just
> > require the application to start the port again.
> >
> >
> > Regards
> > Olivier

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: i40e: pci probe fails when using one bogus sfp
  2017-06-12 16:23         ` Wu, Jingjing
@ 2017-06-13  8:27           ` Olivier MATZ
  2017-06-13 14:14             ` Wu, Jingjing
  0 siblings, 1 reply; 14+ messages in thread
From: Olivier MATZ @ 2017-06-13  8:27 UTC (permalink / raw)
  To: Wu, Jingjing; +Cc: Xing, Beilei, Richardson, Bruce, Zhang, Helin, dev

Hi Jingjing,

On Mon, 12 Jun 2017 16:23:47 +0000, "Wu, Jingjing" <jingjing.wu@intel.com> wrote:
> HI, Olivier
> 
> > Thank you for your quick answer.
> > 
> > Yes, the pci probing continues for the other ports even if one port
> > failed (since v17.05, commit 10f6c93cea).
> > 
> > But I find a bit strange to have this check about the SFP in the
> > PCI probing function instead of having it the port initialization
> > function. My understanding is the SFP check is not related to PCI
> > probing. Is it consistent with other drivers?
> >  
> Could your customer help to check what is the exactly error code is by
> Checking the "hw->aq.asq_last_status" when eth_i40e_dev_init() fails.

I'm afraid it won't be possible, since it's a random issue that is
not reproducible.

What do you think about adding a log in i40e_dev_sync_phy_type() to
display the status value in case of failure? It would help for next
times. I can submit a patch for that if you want.

> Yes, it seems better PHY init fails doesn't block PCI probe. Just compared with i40e
> Kernel version, PHY init fails doesn't block CPI probe. And there is watchdog task to
> Check the PHY status. But DPDK is polling mode, If PCI probe fails, PCI probe continues,
> then application need poll PHY status to support SFP change.

From what I understand, i40e_dev_sync_phy_type() was added
to know the PHY capabilities, which useful for instance for devinfo().
Indeed, devinfo() can be call before the port is started, so
we need to have get the PHY capabilities before starting the port.

I've done a quick patch that:
- keeps the call to i40e_dev_sync_phy_type() in pci probing but
  continue in case of failure
- add another call to i40e_dev_sync_phy_type() in port start
  function

I think it would solve the issue without impacting the result
of the devinfo() function. What would you think of a patch like
this?

> And I also checked ixgbe driver, it seems phy init is done at probe time.
> In my opinion, dev_start and dev_stop is meaning ready for receiving and transmitting
> packets, it may not be suitable to put it in the start/stop phase.

I'm wondering: what would/should occur if the SFP is unplugged and
replugged while the port is running?

I suppose we don't have any PCI notification when unplugging/plugging
the SFP, so I'm not sure we should have this check at the PCI level,
because the application does not know if the bus has to be probed again.


Thanks
Olivier

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: i40e: pci probe fails when using one bogus sfp
  2017-06-13  8:27           ` Olivier MATZ
@ 2017-06-13 14:14             ` Wu, Jingjing
  2017-06-15  9:03               ` Olivier MATZ
  0 siblings, 1 reply; 14+ messages in thread
From: Wu, Jingjing @ 2017-06-13 14:14 UTC (permalink / raw)
  To: Olivier MATZ; +Cc: Xing, Beilei, Richardson, Bruce, Zhang, Helin, dev



> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> Sent: Tuesday, June 13, 2017 4:28 PM
> To: Wu, Jingjing <jingjing.wu@intel.com>
> Cc: Xing, Beilei <beilei.xing@intel.com>; Richardson, Bruce <bruce.richardson@intel.com>;
> Zhang, Helin <helin.zhang@intel.com>; dev@dpdk.org
> Subject: Re: [dpdk-dev] i40e: pci probe fails when using one bogus sfp
> 
> Hi Jingjing,
> 
> On Mon, 12 Jun 2017 16:23:47 +0000, "Wu, Jingjing" <jingjing.wu@intel.com> wrote:
> > HI, Olivier
> >
> > > Thank you for your quick answer.
> > >
> > > Yes, the pci probing continues for the other ports even if one port
> > > failed (since v17.05, commit 10f6c93cea).
> > >
> > > But I find a bit strange to have this check about the SFP in the
> > > PCI probing function instead of having it the port initialization
> > > function. My understanding is the SFP check is not related to PCI
> > > probing. Is it consistent with other drivers?
> > >
> > Could your customer help to check what is the exactly error code is by
> > Checking the "hw->aq.asq_last_status" when eth_i40e_dev_init() fails.
> 
> I'm afraid it won't be possible, since it's a random issue that is
> not reproducible.
> 
> What do you think about adding a log in i40e_dev_sync_phy_type() to
> display the status value in case of failure? It would help for next
> times. I can submit a patch for that if you want.
>
Yes, It makes sense. Thanks for doing that.

> > Yes, it seems better PHY init fails doesn't block PCI probe. Just compared with i40e
> > Kernel version, PHY init fails doesn't block CPI probe. And there is watchdog task to
> > Check the PHY status. But DPDK is polling mode, If PCI probe fails, PCI probe continues,
> > then application need poll PHY status to support SFP change.
> 
> From what I understand, i40e_dev_sync_phy_type() was added
> to know the PHY capabilities, which useful for instance for devinfo().
> Indeed, devinfo() can be call before the port is started, so
> we need to have get the PHY capabilities before starting the port.
> 
> I've done a quick patch that:
> - keeps the call to i40e_dev_sync_phy_type() in pci probing but
>   continue in case of failure
> - add another call to i40e_dev_sync_phy_type() in port start
>   function
>
I think this workaround would not introduce issue if SFP is ready on NIC before Probe.
But I'm not sure if it can resolve the issue, because during probe, dozens of initialization work
Is ongoing. I think we need to verify it PHY init during start phase works or not. Maybe we
Need to do more than that? Not sure.

> I think it would solve the issue without impacting the result
> of the devinfo() function. What would you think of a patch like
> this?
> 
> > And I also checked ixgbe driver, it seems phy init is done at probe time.
> > In my opinion, dev_start and dev_stop is meaning ready for receiving and transmitting
> > packets, it may not be suitable to put it in the start/stop phase.
> 
> I'm wondering: what would/should occur if the SFP is unplugged and
> replugged while the port is running?
> 
> I suppose we don't have any PCI notification when unplugging/plugging
> the SFP, so I'm not sure we should have this check at the PCI level,
> because the application does not know if the bus has to be probed again.
>
We can consider about unplugging/plugging as link event in DPDK. Now we
have LSC or user can polling link status. Maybe that is a way to go.

Thanks
Jingjing

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: i40e: pci probe fails when using one bogus sfp
  2017-06-13 14:14             ` Wu, Jingjing
@ 2017-06-15  9:03               ` Olivier MATZ
  2017-06-15  9:08                 ` [PATCH] net/i40e: avoid PCI probing failure when using " Olivier Matz
  0 siblings, 1 reply; 14+ messages in thread
From: Olivier MATZ @ 2017-06-15  9:03 UTC (permalink / raw)
  To: Wu, Jingjing; +Cc: Xing, Beilei, Richardson, Bruce, Zhang, Helin, dev

Hi,

On Tue, 13 Jun 2017 14:14:43 +0000, "Wu, Jingjing" <jingjing.wu@intel.com> wrote:
> > -----Original Message-----
> > From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> > Sent: Tuesday, June 13, 2017 4:28 PM
> > To: Wu, Jingjing <jingjing.wu@intel.com>
> > Cc: Xing, Beilei <beilei.xing@intel.com>; Richardson, Bruce <bruce.richardson@intel.com>;
> > Zhang, Helin <helin.zhang@intel.com>; dev@dpdk.org
> > Subject: Re: [dpdk-dev] i40e: pci probe fails when using one bogus sfp
> > 
> > Hi Jingjing,
> > 
> > On Mon, 12 Jun 2017 16:23:47 +0000, "Wu, Jingjing" <jingjing.wu@intel.com> wrote:  
> > > HI, Olivier
> > >  
> > > > Thank you for your quick answer.
> > > >
> > > > Yes, the pci probing continues for the other ports even if one port
> > > > failed (since v17.05, commit 10f6c93cea).
> > > >
> > > > But I find a bit strange to have this check about the SFP in the
> > > > PCI probing function instead of having it the port initialization
> > > > function. My understanding is the SFP check is not related to PCI
> > > > probing. Is it consistent with other drivers?
> > > >  
> > > Could your customer help to check what is the exactly error code is by
> > > Checking the "hw->aq.asq_last_status" when eth_i40e_dev_init() fails.  
> > 
> > I'm afraid it won't be possible, since it's a random issue that is
> > not reproducible.
> > 
> > What do you think about adding a log in i40e_dev_sync_phy_type() to
> > display the status value in case of failure? It would help for next
> > times. I can submit a patch for that if you want.
> >  
> Yes, It makes sense. Thanks for doing that.
> 
> > > Yes, it seems better PHY init fails doesn't block PCI probe. Just compared with i40e
> > > Kernel version, PHY init fails doesn't block CPI probe. And there is watchdog task to
> > > Check the PHY status. But DPDK is polling mode, If PCI probe fails, PCI probe continues,
> > > then application need poll PHY status to support SFP change.  
> > 
> > From what I understand, i40e_dev_sync_phy_type() was added
> > to know the PHY capabilities, which useful for instance for devinfo().
> > Indeed, devinfo() can be call before the port is started, so
> > we need to have get the PHY capabilities before starting the port.
> > 
> > I've done a quick patch that:
> > - keeps the call to i40e_dev_sync_phy_type() in pci probing but
> >   continue in case of failure
> > - add another call to i40e_dev_sync_phy_type() in port start
> >   function
> >  
> I think this workaround would not introduce issue if SFP is ready on NIC before Probe.
> But I'm not sure if it can resolve the issue, because during probe, dozens of initialization work
> Is ongoing. I think we need to verify it PHY init during start phase works or not. Maybe we
> Need to do more than that? Not sure.

After discussion with Helin, I moved the call to i40e_dev_sync_phy_type()
in dev_configure(). I'm sending it as a reply to this mail.

I see if I can get my patch tested by the customer.


> 
> > I think it would solve the issue without impacting the result
> > of the devinfo() function. What would you think of a patch like
> > this?
> >   
> > > And I also checked ixgbe driver, it seems phy init is done at probe time.
> > > In my opinion, dev_start and dev_stop is meaning ready for receiving and transmitting
> > > packets, it may not be suitable to put it in the start/stop phase.  
> > 
> > I'm wondering: what would/should occur if the SFP is unplugged and
> > replugged while the port is running?
> > 
> > I suppose we don't have any PCI notification when unplugging/plugging
> > the SFP, so I'm not sure we should have this check at the PCI level,
> > because the application does not know if the bus has to be probed again.
> >  
> We can consider about unplugging/plugging as link event in DPDK. Now we
> have LSC or user can polling link status. Maybe that is a way to go.

Yes, I agree. Thanks for the confirmation.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] net/i40e: avoid PCI probing failure when using bogus sfp
  2017-06-15  9:03               ` Olivier MATZ
@ 2017-06-15  9:08                 ` Olivier Matz
  2017-06-23  9:25                   ` Wu, Jingjing
  0 siblings, 1 reply; 14+ messages in thread
From: Olivier Matz @ 2017-06-15  9:08 UTC (permalink / raw)
  To: dev, jingjing.wu, helin.zhang; +Cc: beilei.xing, bruce.richardson

When a port is using a bogus SFP, the PCI probing returns an error,
preventing to register a portid.

To give a better chance to the applications to retry after the SFP is
changed, move this check in eth_i40e_dev_configure(), so that only a
port reconfiguration is needed to retry.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 drivers/net/i40e/i40e_ethdev.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index d3428b9c0..1f256d767 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -1142,11 +1142,8 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)
 	config_floating_veb(dev);
 	/* Clear PXE mode */
 	i40e_clear_pxe_mode(hw);
-	ret = i40e_dev_sync_phy_type(hw);
-	if (ret) {
-		PMD_INIT_LOG(ERR, "Failed to sync phy type: %d", ret);
-		goto err_sync_phy_type;
-	}
+	i40e_dev_sync_phy_type(hw);
+
 	/*
 	 * On X710, performance number is far from the expectation on recent
 	 * firmware versions. The fix for this issue may not be integrated in
@@ -1331,7 +1328,6 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)
 err_qp_pool_init:
 err_parameter_init:
 err_get_capabilities:
-err_sync_phy_type:
 	(void)i40e_shutdown_adminq(hw);
 
 	return ret;
@@ -1470,9 +1466,14 @@ i40e_dev_configure(struct rte_eth_dev *dev)
 	struct i40e_adapter *ad =
 		I40E_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
 	struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
+	struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	enum rte_eth_rx_mq_mode mq_mode = dev->data->dev_conf.rxmode.mq_mode;
 	int i, ret;
 
+	ret = i40e_dev_sync_phy_type(hw);
+	if (ret)
+		return ret;
+
 	/* Initialize to TRUE. If any of Rx queues doesn't meet the
 	 * bulk allocation or vector Rx preconditions we will reset it.
 	 */
@@ -9176,8 +9177,11 @@ i40e_dev_sync_phy_type(struct i40e_hw *hw)
 	status = i40e_aq_get_phy_capabilities(hw, false, true, &phy_ab,
 					      NULL);
 
-	if (status)
+	if (status) {
+		PMD_INIT_LOG(WARNING, "Failed to sync phy type: status=%d",
+			status);
 		return ret;
+	}
 
 	return 0;
 }
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] net/i40e: avoid PCI probing failure when using bogus sfp
  2017-06-15  9:08                 ` [PATCH] net/i40e: avoid PCI probing failure when using " Olivier Matz
@ 2017-06-23  9:25                   ` Wu, Jingjing
  2017-06-23 10:11                     ` Ferruh Yigit
  0 siblings, 1 reply; 14+ messages in thread
From: Wu, Jingjing @ 2017-06-23  9:25 UTC (permalink / raw)
  To: Olivier Matz, dev, Zhang, Helin; +Cc: Xing, Beilei, Richardson, Bruce



> -----Original Message-----
> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> Sent: Thursday, June 15, 2017 5:09 PM
> To: dev@dpdk.org; Wu, Jingjing <jingjing.wu@intel.com>; Zhang, Helin
> <helin.zhang@intel.com>
> Cc: Xing, Beilei <beilei.xing@intel.com>; Richardson, Bruce <bruce.richardson@intel.com>
> Subject: [PATCH] net/i40e: avoid PCI probing failure when using bogus sfp
> 
> When a port is using a bogus SFP, the PCI probing returns an error,
> preventing to register a portid.
> 
> To give a better chance to the applications to retry after the SFP is
> changed, move this check in eth_i40e_dev_configure(), so that only a
> port reconfiguration is needed to retry.
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] net/i40e: avoid PCI probing failure when using bogus sfp
  2017-06-23  9:25                   ` Wu, Jingjing
@ 2017-06-23 10:11                     ` Ferruh Yigit
  0 siblings, 0 replies; 14+ messages in thread
From: Ferruh Yigit @ 2017-06-23 10:11 UTC (permalink / raw)
  To: Wu, Jingjing, Olivier Matz, dev, Zhang, Helin
  Cc: Xing, Beilei, Richardson, Bruce

On 6/23/2017 10:25 AM, Wu, Jingjing wrote:
> 
> 
>> -----Original Message-----
>> From: Olivier Matz [mailto:olivier.matz@6wind.com]
>> Sent: Thursday, June 15, 2017 5:09 PM
>> To: dev@dpdk.org; Wu, Jingjing <jingjing.wu@intel.com>; Zhang, Helin
>> <helin.zhang@intel.com>
>> Cc: Xing, Beilei <beilei.xing@intel.com>; Richardson, Bruce <bruce.richardson@intel.com>
>> Subject: [PATCH] net/i40e: avoid PCI probing failure when using bogus sfp
>>
>> When a port is using a bogus SFP, the PCI probing returns an error,
>> preventing to register a portid.
>>
>> To give a better chance to the applications to retry after the SFP is
>> changed, move this check in eth_i40e_dev_configure(), so that only a
>> port reconfiguration is needed to retry.
>>
>> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> Acked-by: Jingjing Wu <jingjing.wu@intel.com>

Applied to dpdk-next-net/master, thanks.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2017-06-23 10:11 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-08  9:29 i40e: pci probe fails when using one bogus sfp Olivier Matz
2017-06-08 10:01 ` Bruce Richardson
2017-06-08 10:13   ` Olivier Matz
2017-06-12  8:45     ` Xing, Beilei
2017-06-12  9:45       ` Olivier Matz
2017-06-12 15:53         ` Wu, Jingjing
2017-06-12 16:25           ` Wu, Jingjing
2017-06-12 16:23         ` Wu, Jingjing
2017-06-13  8:27           ` Olivier MATZ
2017-06-13 14:14             ` Wu, Jingjing
2017-06-15  9:03               ` Olivier MATZ
2017-06-15  9:08                 ` [PATCH] net/i40e: avoid PCI probing failure when using " Olivier Matz
2017-06-23  9:25                   ` Wu, Jingjing
2017-06-23 10:11                     ` Ferruh Yigit

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.