* [PATCH] eal: bus scan and probe never fail
@ 2017-08-12 10:22 Shreyansh Jain
  2017-09-18 11:36 ` Hemant Agrawal
  0 siblings, 1 reply; 19+ messages in thread
From: Shreyansh Jain @ 2017-08-12 10:22 UTC
  To: dev; +Cc: thomas, jblunck, Shreyansh Jain

Bus scan is responsible for finding devices over *all* buses.
Some of these buses might not be able to scan, but that should
not prevent other buses from being scanned.

The same applies to probing. Some of the scanned devices may not
have a matching driver. That should not prevent other buses from
being probed.

Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>

---
Until now, this decision was left to the author of each bus-specific
scan and probe function, but that is incorrect.
---
 lib/librte_eal/common/eal_common_bus.c | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index 08bec2d..58e1084 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -73,11 +73,9 @@ rte_bus_scan(void)
 
 	TAILQ_FOREACH(bus, &rte_bus_list, next) {
 		ret = bus->scan();
-		if (ret) {
+		if (ret)
 			RTE_LOG(ERR, EAL, "Scan for (%s) bus failed.\n",
 				bus->name);
-			return ret;
-		}
 	}
 
 	return 0;
@@ -97,20 +95,16 @@ rte_bus_probe(void)
 		}
 
 		ret = bus->probe();
-		if (ret) {
+		if (ret)
 			RTE_LOG(ERR, EAL, "Bus (%s) probe failed.\n",
 				bus->name);
-			return ret;
-		}
 	}
 
 	if (vbus) {
 		ret = vbus->probe();
-		if (ret) {
+		if (ret)
 			RTE_LOG(ERR, EAL, "Bus (%s) probe failed.\n",
 				vbus->name);
-			return ret;
-		}
 	}
 
 	return 0;
-- 
2.9.3


* Re: [PATCH] eal: bus scan and probe never fail
  2017-08-12 10:22 [PATCH] eal: bus scan and probe never fail Shreyansh Jain
@ 2017-09-18 11:36 ` Hemant Agrawal
  2017-09-19 18:51   ` Jan Blunck
  0 siblings, 1 reply; 19+ messages in thread
From: Hemant Agrawal @ 2017-09-18 11:36 UTC
  To: Shreyansh Jain, dev; +Cc: thomas, jblunck

Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>

On 8/12/2017 3:52 PM, Shreyansh Jain wrote:
> Bus scan is responsible for finding devices over *all* buses.
> Some of these buses might not be able to scan but that should
> not prevent other buses to be scanned.
>
> Same is the case for probing. It is possible that some devices which
> were scanned didn't have a specific driver. That should not prevent
> other buses from being probed.
>
> Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
>
> ---
> Until now, this decision was left onto author of bus specific scan and
> probe function. But, that is incorrect.
> ---
>  lib/librte_eal/common/eal_common_bus.c | 12 +++---------
>  1 file changed, 3 insertions(+), 9 deletions(-)
>
> diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
> index 08bec2d..58e1084 100644
> --- a/lib/librte_eal/common/eal_common_bus.c
> +++ b/lib/librte_eal/common/eal_common_bus.c
> @@ -73,11 +73,9 @@ rte_bus_scan(void)
>
>  	TAILQ_FOREACH(bus, &rte_bus_list, next) {
>  		ret = bus->scan();
> -		if (ret) {
> +		if (ret)
>  			RTE_LOG(ERR, EAL, "Scan for (%s) bus failed.\n",
>  				bus->name);
> -			return ret;
> -		}
>  	}
>
>  	return 0;
> @@ -97,20 +95,16 @@ rte_bus_probe(void)
>  		}
>
>  		ret = bus->probe();
> -		if (ret) {
> +		if (ret)
>  			RTE_LOG(ERR, EAL, "Bus (%s) probe failed.\n",
>  				bus->name);
> -			return ret;
> -		}
>  	}
>
>  	if (vbus) {
>  		ret = vbus->probe();
> -		if (ret) {
> +		if (ret)
>  			RTE_LOG(ERR, EAL, "Bus (%s) probe failed.\n",
>  				vbus->name);
> -			return ret;
> -		}
>  	}
>
>  	return 0;
>


* Re: [PATCH] eal: bus scan and probe never fail
  2017-09-18 11:36 ` Hemant Agrawal
@ 2017-09-19 18:51   ` Jan Blunck
  2017-10-05 23:21     ` Thomas Monjalon
  0 siblings, 1 reply; 19+ messages in thread
From: Jan Blunck @ 2017-09-19 18:51 UTC
  To: Hemant Agrawal; +Cc: Shreyansh Jain, dev, Thomas Monjalon

On Mon, Sep 18, 2017 at 1:36 PM, Hemant Agrawal <hemant.agrawal@nxp.com> wrote:
> Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
>
>
> On 8/12/2017 3:52 PM, Shreyansh Jain wrote:
>>
>> Bus scan is responsible for finding devices over *all* buses.
>> Some of these buses might not be able to scan but that should
>> not prevent other buses to be scanned.
>>

If scanning the bus fails, that signals an error. In that case we
might even want to unregister the bus.

>> Same is the case for probing. It is possible that some devices which
>> were scanned didn't have a specific driver. That should not prevent
>> other buses from being probed.

Absolutely correct.

>>
>> Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
>>
>> ---
>> Until now, this decision was left onto author of bus specific scan and
>> probe function. But, that is incorrect.
>> ---
>>  lib/librte_eal/common/eal_common_bus.c | 12 +++---------
>>  1 file changed, 3 insertions(+), 9 deletions(-)
>>
>> diff --git a/lib/librte_eal/common/eal_common_bus.c
>> b/lib/librte_eal/common/eal_common_bus.c
>> index 08bec2d..58e1084 100644
>> --- a/lib/librte_eal/common/eal_common_bus.c
>> +++ b/lib/librte_eal/common/eal_common_bus.c
>> @@ -73,11 +73,9 @@ rte_bus_scan(void)
>>
>>         TAILQ_FOREACH(bus, &rte_bus_list, next) {
>>                 ret = bus->scan();
>> -               if (ret) {
>> +               if (ret)
>>                         RTE_LOG(ERR, EAL, "Scan for (%s) bus failed.\n",
>>                                 bus->name);
>> -                       return ret;
>> -               }
>>         }
>>
>>         return 0;
>> @@ -97,20 +95,16 @@ rte_bus_probe(void)
>>                 }
>>
>>                 ret = bus->probe();
>> -               if (ret) {
>> +               if (ret)
>>                         RTE_LOG(ERR, EAL, "Bus (%s) probe failed.\n",
>>                                 bus->name);
>> -                       return ret;
>> -               }
>>         }
>>
>>         if (vbus) {
>>                 ret = vbus->probe();
>> -               if (ret) {
>> +               if (ret)
>>                         RTE_LOG(ERR, EAL, "Bus (%s) probe failed.\n",
>>                                 vbus->name);
>> -                       return ret;
>> -               }
>>         }
>>
>>         return 0;
>>
>


* Re: [PATCH] eal: bus scan and probe never fail
  2017-09-19 18:51   ` Jan Blunck
@ 2017-10-05 23:21     ` Thomas Monjalon
  2017-10-06 13:12       ` Shreyansh Jain
  0 siblings, 1 reply; 19+ messages in thread
From: Thomas Monjalon @ 2017-10-05 23:21 UTC
  To: Shreyansh Jain; +Cc: dev, Jan Blunck, Hemant Agrawal

19/09/2017 20:51, Jan Blunck:
> On Mon, Sep 18, 2017 at 1:36 PM, Hemant Agrawal <hemant.agrawal@nxp.com> wrote:
> > Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
> >
> >
> > On 8/12/2017 3:52 PM, Shreyansh Jain wrote:
> >>
> >> Bus scan is responsible for finding devices over *all* buses.
> >> Some of these buses might not be able to scan but that should
> >> not prevent other buses to be scanned.
> >>
> 
> If scanning the bus fails this is signaling an error. In that case we
> might even want to unregister the bus.

A scan error seems important enough to be reported to the caller.
OK to continue scanning other buses, but an error code should be returned.

> >> Same is the case for probing. It is possible that some devices which
> >> were scanned didn't have a specific driver. That should not prevent
> >> other buses from being probed.
> 
> Absolutely correct.

Yes.
Once we have a probe notification, we will be able
to notify the upper layer that probing a device has failed.


* Re: [PATCH] eal: bus scan and probe never fail
  2017-10-05 23:21     ` Thomas Monjalon
@ 2017-10-06 13:12       ` Shreyansh Jain
  2017-10-06 13:37         ` Thomas Monjalon
  0 siblings, 1 reply; 19+ messages in thread
From: Shreyansh Jain @ 2017-10-06 13:12 UTC
  To: Thomas Monjalon; +Cc: dev, Jan Blunck, Hemant Agrawal

On Friday 06 October 2017 04:51 AM, Thomas Monjalon wrote:
> 19/09/2017 20:51, Jan Blunck:
>> On Mon, Sep 18, 2017 at 1:36 PM, Hemant Agrawal <hemant.agrawal@nxp.com> wrote:
>>> Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
>>>
>>>
>>> On 8/12/2017 3:52 PM, Shreyansh Jain wrote:
>>>>
>>>> Bus scan is responsible for finding devices over *all* buses.
>>>> Some of these buses might not be able to scan but that should
>>>> not prevent other buses to be scanned.
>>>>
>>
>> If scanning the bus fails this is signaling an error. In that case we
>> might even want to unregister the bus.
> 
> A scan error seems important enough to be reported to the caller.
> OK to continue scanning other buses, but an error code should be returned.

Isn't it counterintuitive for scanning to continue after an error
while an error is still expected to be returned from it?
What if there is more than one error? Which one is reported?

As for cleanup, bus un-registration is not correct. A failed scan
might mean that some assumption the bus made when scanning for devices
does not hold for the time being or on the present platform. Either
way, I think whatever rollback needs to be done for a scan failure
would be done by the bus->scan() implementation.

Let me know what you think - I will make changes to the patch and push 
again.

> 
>>>> Same is the case for probing. It is possible that some devices which
>>>> were scanned didn't have a specific driver. That should not prevent
>>>> other buses from being probed.
>>
>> Absolutely correct.
> 
> Yes
> When we will have a probe notification, we will be able
> to notify the upper layer that a device probing has failed.


* Re: [PATCH] eal: bus scan and probe never fail
  2017-10-06 13:12       ` Shreyansh Jain
@ 2017-10-06 13:37         ` Thomas Monjalon
  2017-10-06 17:34           ` Jan Blunck
  0 siblings, 1 reply; 19+ messages in thread
From: Thomas Monjalon @ 2017-10-06 13:37 UTC
  To: Shreyansh Jain; +Cc: dev, Jan Blunck, Hemant Agrawal

06/10/2017 15:12, Shreyansh Jain:
> On Friday 06 October 2017 04:51 AM, Thomas Monjalon wrote:
> > 19/09/2017 20:51, Jan Blunck:
> >> On Mon, Sep 18, 2017 at 1:36 PM, Hemant Agrawal <hemant.agrawal@nxp.com> wrote:
> >>> Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
> >>>
> >>>
> >>> On 8/12/2017 3:52 PM, Shreyansh Jain wrote:
> >>>>
> >>>> Bus scan is responsible for finding devices over *all* buses.
> >>>> Some of these buses might not be able to scan but that should
> >>>> not prevent other buses to be scanned.
> >>>>
> >>
> >> If scanning the bus fails this is signaling an error. In that case we
> >> might even want to unregister the bus.
> > 
> > A scan error seems important enough to be reported to the caller.
> > OK to continue scanning other buses, but an error code should be returned.
> 
> Isn't that counter intuitive if the scanning continues after error and 
> an error is expected to be returned from it?
> What if there are more than one error? Which one is reported.

Both are reported with the same code.
Anyway, there is no way to know which bus is failing,
except from the log.

> As for cleanup, bus un-registration is not correct. Scan has failed, 
> which might mean some assumption that bus took for scanning for devices 
> doesn't exist for time being or present platform. Either way, I think 
> whatever rollback needs to be done for scan failure, would be done by 
> the bus->scan() implementation.
> 
> Let me know what you think - I will make changes to the patch and push 
> again.

We may need more opinions here.

Mine is that we should not hide a scan failure.
I would return an error code if any of the scans has failed,
but would still process every scan.
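
To make that policy concrete (process every scan, but still return an
error code if any of them failed), here is a minimal sketch. It is not
the EAL code: struct demo_bus, demo_bus_scan_all() and the demo_scan_*
helpers are hypothetical stand-ins, and returning the first non-zero
code is only one possible choice, since the thread does not settle
which code should be reported.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct rte_bus: a name and a scan hook. */
struct demo_bus {
	const char *name;
	int (*scan)(void);
};

/* Counts how many scan hooks ran, to show that no bus is skipped. */
static int demo_scan_calls;

static int demo_scan_ok(void) { demo_scan_calls++; return 0; }
static int demo_scan_fail(void) { demo_scan_calls++; return -1; }
static int demo_scan_fail_other(void) { demo_scan_calls++; return -2; }

/*
 * Scan every bus, log each failure, and return the first non-zero
 * code seen (0 if every scan succeeded). A failure never stops the loop.
 */
static int
demo_bus_scan_all(const struct demo_bus *buses, size_t n)
{
	int first_err = 0;
	size_t i;

	assert(buses != NULL || n == 0);

	for (i = 0; i < n; i++) {
		int ret = buses[i].scan();

		if (ret) {
			fprintf(stderr, "Scan for (%s) bus failed.\n",
				buses[i].name);
			if (first_err == 0)
				first_err = ret;
		}
	}

	return first_err;
}
```

With three buses where the first and third fail, all three hooks run
and the caller sees the first failure code; which bus failed is still
only visible in the log.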


* Re: [PATCH] eal: bus scan and probe never fail
  2017-10-06 13:37         ` Thomas Monjalon
@ 2017-10-06 17:34           ` Jan Blunck
  2017-10-09 11:10             ` Shreyansh Jain
  0 siblings, 1 reply; 19+ messages in thread
From: Jan Blunck @ 2017-10-06 17:34 UTC
  To: Thomas Monjalon; +Cc: Shreyansh Jain, dev, Hemant Agrawal

On Fri, Oct 6, 2017 at 3:37 PM, Thomas Monjalon <thomas@monjalon.net> wrote:
> 06/10/2017 15:12, Shreyansh Jain:
>> On Friday 06 October 2017 04:51 AM, Thomas Monjalon wrote:
>> > 19/09/2017 20:51, Jan Blunck:
>> >> On Mon, Sep 18, 2017 at 1:36 PM, Hemant Agrawal <hemant.agrawal@nxp.com> wrote:
>> >>> Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
>> >>>
>> >>>
>> >>> On 8/12/2017 3:52 PM, Shreyansh Jain wrote:
>> >>>>
>> >>>> Bus scan is responsible for finding devices over *all* buses.
>> >>>> Some of these buses might not be able to scan but that should
>> >>>> not prevent other buses to be scanned.
>> >>>>
>> >>
>> >> If scanning the bus fails this is signaling an error. In that case we
>> >> might even want to unregister the bus.
>> >
>> > A scan error seems important enough to be reported to the caller.
>> > OK to continue scanning other buses, but an error code should be returned.
>>
>> Isn't that counter intuitive if the scanning continues after error and
>> an error is expected to be returned from it?
>> What if there are more than one error? Which one is reported.
>
> Both are reported with the same code.
> Anyway, there is no way to know which bus is failing,
> except from log.
>

Correct. Also, there is no way to handle that failure except for
reporting it to the log in full detail.


>> As for cleanup, bus un-registration is not correct. Scan has failed,
>> which might mean some assumption that bus took for scanning for devices
>> doesn't exist for time being or present platform. Either way, I think
>> whatever rollback needs to be done for scan failure, would be done by
>> the bus->scan() implementation.
>>
>> Let me know what you think - I will make changes to the patch and push
>> again.
>
> We may need more opinion here.
>
> Mine is that we should not hide a scan failure.

Hide scan failures? Do you mean hiding them from the log? I wouldn't do that.

> I would return an error code if any of the scan has failed,
> but would process every scans.

FWIW I agree.


* Re: [PATCH] eal: bus scan and probe never fail
  2017-10-06 17:34           ` Jan Blunck
@ 2017-10-09 11:10             ` Shreyansh Jain
  2017-10-09 18:21               ` Don Provan
  0 siblings, 1 reply; 19+ messages in thread
From: Shreyansh Jain @ 2017-10-09 11:10 UTC
  To: Jan Blunck, Thomas Monjalon; +Cc: dev, Hemant Agrawal

On Friday 06 October 2017 11:04 PM, Jan Blunck wrote:
> On Fri, Oct 6, 2017 at 3:37 PM, Thomas Monjalon <thomas@monjalon.net> wrote:
>> 06/10/2017 15:12, Shreyansh Jain:
>>> On Friday 06 October 2017 04:51 AM, Thomas Monjalon wrote:
>>>> 19/09/2017 20:51, Jan Blunck:
>>>>> On Mon, Sep 18, 2017 at 1:36 PM, Hemant Agrawal <hemant.agrawal@nxp.com> wrote:
>>>>>> Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
>>>>>>
>>>>>>
>>>>>> On 8/12/2017 3:52 PM, Shreyansh Jain wrote:
>>>>>>>
>>>>>>> Bus scan is responsible for finding devices over *all* buses.
>>>>>>> Some of these buses might not be able to scan but that should
>>>>>>> not prevent other buses to be scanned.
>>>>>>>
>>>>>
>>>>> If scanning the bus fails this is signaling an error. In that case we
>>>>> might even want to unregister the bus.
>>>>
>>>> A scan error seems important enough to be reported to the caller.
>>>> OK to continue scanning other buses, but an error code should be returned.
>>>
>>> Isn't that counter intuitive if the scanning continues after error and
>>> an error is expected to be returned from it?
>>> What if there are more than one error? Which one is reported.
>>
>> Both are reported with the same code.
>> Anyway, there is no way to know which bus is failing,
>> except from log.
>>
> 
> Correct. Also there is no way to handle that failure except for
> reporting it to the log in all detail.

Even now, both scan and probe report an error to EAL if they fail.
This is what you are suggesting, isn't it?

> 
> 
>>> As for cleanup, bus un-registration is not correct. Scan has failed,
>>> which might mean some assumption that bus took for scanning for devices
>>> doesn't exist for time being or present platform. Either way, I think
>>> whatever rollback needs to be done for scan failure, would be done by
>>> the bus->scan() implementation.
>>>
>>> Let me know what you think - I will make changes to the patch and push
>>> again.
>>
>> We may need more opinion here.
>>
>> Mine is that we should not hide a scan failure.
> 
> Hide scan failures? Do you mean hiding it from the log? I wouldn't do that.

I think Thomas's opinion was to *not* hide scan failures.
Reporting through logs works fine here, I guess.

> 
>> I would return an error code if any of the scan has failed,
>> but would process every scans.
> 
> FWIW I agree.
> 

This is where I have a disagreement/doubt.
Reporting an error code from rte_bus_scan would raise two issues:

1. rte_eal_init is not designed to ignore or merely log these errors; it
would quit initialization. (But this can be changed.)
2. What should rte_eal_init do with this error? rte_bus_scan would have
already logged the failing bus->scan().

Also, does it make sense to report an error from rte_bus_scan() to
rte_eal_init() when no buses are identified? Currently that does not
happen.

-
Shreyansh


* Re: [PATCH] eal: bus scan and probe never fail
  2017-10-09 11:10             ` Shreyansh Jain
@ 2017-10-09 18:21               ` Don Provan
  2017-10-09 19:34                 ` Thomas Monjalon
  2017-10-10  5:00                 ` Shreyansh Jain
  0 siblings, 2 replies; 19+ messages in thread
From: Don Provan @ 2017-10-09 18:21 UTC
  To: Shreyansh Jain, Jan Blunck, Thomas Monjalon; +Cc: dev, Hemant Agrawal

> -----Original Message-----
> From: Shreyansh Jain [mailto:shreyansh.jain@nxp.com]
> Sent: Monday, October 09, 2017 4:10 AM
> To: Jan Blunck <jblunck@infradead.org>; Thomas Monjalon
> <thomas@monjalon.net>
> Cc: dev <dev@dpdk.org>; Hemant Agrawal <hemant.agrawal@nxp.com>
> Subject: Re: [dpdk-dev] [PATCH] eal: bus scan and probe never fail
> 
>...
> This is where I have disagreement/doubt.
> Reporting error code from rte_bus_scan would do two things:
> 
> 1. rte_eal_init is not designed to ignore/log-only these errors - it
> would quit initialization. (But, this can be changed)
> 2. What should rte_eal_init do with this error? rte_bus_scan would have
> already printed the problematic bus->scan() failure.

These practical problems confirm to me that the failure of a bus
scan is more of a strategic issue: when asking "which devices can
I use?", "none" is a perfectly valid answer that does not seem
like an error to me even when a failed bus scan is the reason for
that answer.

From the application's point of view, the potential error here
is that the device it wants to use isn't available. I don't see that
either the init function or the probe function will have enough
information to understand that application-level problem, so
they should leave it to the application to detect it.
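
As an illustration of leaving the check to the application, here is a
minimal sketch. Everything in it is an assumption made for the example:
struct app_dev_table and app_has_device() are hypothetical and stand in
for whatever device lookup the application really uses after
initialization; no DPDK API is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of what scan and probe left behind: device names. */
struct app_dev_table {
	const char *const *names;
	size_t count;
};

/*
 * Return 1 if the device the application needs is present, 0 otherwise.
 * An empty table is a valid answer, not an error at this layer.
 */
static int
app_has_device(const struct app_dev_table *t, const char *wanted)
{
	size_t i;

	assert(t != NULL && wanted != NULL);

	for (i = 0; i < t->count; i++) {
		if (strcmp(t->names[i], wanted) == 0)
			return 1;
	}

	return 0;
}
```

Only the application knows which device it needs, so only the
application can turn a 0 from such a lookup into an error.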

-don provan
dprovan@bivio.net



* Re: [PATCH] eal: bus scan and probe never fail
  2017-10-09 18:21               ` Don Provan
@ 2017-10-09 19:34                 ` Thomas Monjalon
  2017-10-10  5:00                 ` Shreyansh Jain
  1 sibling, 0 replies; 19+ messages in thread
From: Thomas Monjalon @ 2017-10-09 19:34 UTC
  To: Don Provan; +Cc: Shreyansh Jain, Jan Blunck, dev, Hemant Agrawal

09/10/2017 20:21, Don Provan:
> From: Shreyansh Jain [mailto:shreyansh.jain@nxp.com]
> >...
> > This is where I have disagreement/doubt.
> > Reporting error code from rte_bus_scan would do two things:
> > 
> > 1. rte_eal_init is not designed to ignore/log-only these errors - it
> > would quit initialization. (But, this can be changed)
> > 2. What should rte_eal_init do with this error? rte_bus_scan would have
> > already printed the problematic bus->scan() failure.
> 
> These practical problems confirm to me that the failure of a bus
> scan is more of a strategic issue: when asking "which devices can
> I use?", "none" is a perfectly valid answer that does not seem
> like an error to me even when a failed bus scan is the reason for
> that answer.
> 
> From the application's point of view, the potential error here
> is that the device it wants to use isn't available. I don't see that
> either the init function or the probe function will have enough
> information to understand that application-level problem, so
> they should leave it to the application to detect it.

Thank you Don. I think you convinced me.


* Re: [PATCH] eal: bus scan and probe never fail
  2017-10-09 18:21               ` Don Provan
  2017-10-09 19:34                 ` Thomas Monjalon
@ 2017-10-10  5:00                 ` Shreyansh Jain
  2017-10-10 16:00                   ` Aaron Conole
  2017-10-11  0:03                   ` Don Provan
  1 sibling, 2 replies; 19+ messages in thread
From: Shreyansh Jain @ 2017-10-10  5:00 UTC
  To: Don Provan, Jan Blunck, Thomas Monjalon; +Cc: dev, Hemant Agrawal

Hello Don,

On Monday 09 October 2017 11:51 PM, Don Provan wrote:
>> -----Original Message-----
>> From: Shreyansh Jain [mailto:shreyansh.jain@nxp.com]
>> Sent: Monday, October 09, 2017 4:10 AM
>> To: Jan Blunck <jblunck@infradead.org>; Thomas Monjalon
>> <thomas@monjalon.net>
>> Cc: dev <dev@dpdk.org>; Hemant Agrawal <hemant.agrawal@nxp.com>
>> Subject: Re: [dpdk-dev] [PATCH] eal: bus scan and probe never fail
>>
>> ...
>> This is where I have disagreement/doubt.
>> Reporting error code from rte_bus_scan would do two things:
>>
>> 1. rte_eal_init is not designed to ignore/log-only these errors - it
>> would quit initialization. (But, this can be changed)
>> 2. What should rte_eal_init do with this error? rte_bus_scan would have
>> already printed the problematic bus->scan() failure.
> 
> These practical problems confirm to me that the failure of a bus
> scan is more of a strategic issue: when asking "which devices can
> I use?", "none" is a perfectly valid answer that does not seem
> like an error to me even when a failed bus scan is the reason for
> that answer.

I agree with this.

> 
>  From the application's point of view, the potential error here
> is that the device it wants to use isn't available. I don't see that
> either the init function or the probe function will have enough
> information to understand that application-level problem, so
> they should leave it to the application to detect it.

I think I understand your comment but just want to cross-check:
scan or probe errors should simply be ignored by the EAL layer, which
lets the application take a stance when it detects that the device it
was looking for is missing. Is my understanding correct?

I am trying to come to a conclusion so that this patch can either be
modified or pushed as it is. If the above understanding is correct, I
don't see any changes required in the patch.

> 
> -don provan
> dprovan@bivio.net
> 


* Re: [PATCH] eal: bus scan and probe never fail
  2017-10-10  5:00                 ` Shreyansh Jain
@ 2017-10-10 16:00                   ` Aaron Conole
  2017-10-11 22:34                     ` Thomas Monjalon
  2017-10-12  5:39                     ` Shreyansh Jain
  2017-10-11  0:03                   ` Don Provan
  1 sibling, 2 replies; 19+ messages in thread
From: Aaron Conole @ 2017-10-10 16:00 UTC
  To: Shreyansh Jain
  Cc: Don Provan, Jan Blunck, Thomas Monjalon, dev, Hemant Agrawal

Shreyansh Jain <shreyansh.jain@nxp.com> writes:

> Hello Don,
>
> On Monday 09 October 2017 11:51 PM, Don Provan wrote:
>>> -----Original Message-----
>>> From: Shreyansh Jain [mailto:shreyansh.jain@nxp.com]
>>> Sent: Monday, October 09, 2017 4:10 AM
>>> To: Jan Blunck <jblunck@infradead.org>; Thomas Monjalon
>>> <thomas@monjalon.net>
>>> Cc: dev <dev@dpdk.org>; Hemant Agrawal <hemant.agrawal@nxp.com>
>>> Subject: Re: [dpdk-dev] [PATCH] eal: bus scan and probe never fail
>>>
>>> ...
>>> This is where I have disagreement/doubt.
>>> Reporting error code from rte_bus_scan would do two things:
>>>
>>> 1. rte_eal_init is not designed to ignore/log-only these errors - it
>>> would quit initialization. (But, this can be changed)
>>> 2. What should rte_eal_init do with this error? rte_bus_scan would have
>>> already printed the problematic bus->scan() failure.
>>
>> These practical problems confirm to me that the failure of a bus
>> scan is more of a strategic issue: when asking "which devices can
>> I use?", "none" is a perfectly valid answer that does not seem
>> like an error to me even when a failed bus scan is the reason for
>> that answer.
>
> I agree with this.
>
>>
>>  From the application's point of view, the potential error here
>> is that the device it wants to use isn't available. I don't see that
>> either the init function or the probe function will have enough
>> information to understand that application-level problem, so
>> they should leave it to the application to detect it.
>
> I think I understand you comment but just want to cross check again:
> Scan or probe error should simply be ignored by EAL layer and let the
> application take stance when it detects that the device it was looking
> for is missing. Is my understanding correct?
>
> I am trying to come a conclusion so that this patch can either be
> modified or pushed as it is. If the above understanding is correct, I
> don't see any changes required in the patch.

Does it make sense to introduce a way to query the results of the
various bus types for their status?  That way we can give the relevant
information to the application if it wants, and make the bus scanning
code *always* succeed?  This version shouldn't be an ABI breakage,
either (confirm?).

half-baked below (not tested or suitable - just an example):

---
diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index a30a898..cd1ef1e 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -38,9 +38,23 @@
 
 #include "eal_private.h"
 
+struct rte_bus_failure {
+	struct rte_bus *bus;
+	int err;
+};
+
 struct rte_bus_list rte_bus_list =
 	TAILQ_HEAD_INITIALIZER(rte_bus_list);
 
+TAILQ_HEAD(rte_bus_scan_failure_list, rte_bus_failure);
+struct rte_bus_scan_failure_list rte_bus_scan_failure_list =
+	TAILQ_HEAD_INITIALIZER(rte_bus_scan_failure_list);
+
+TAILQ_HEAD(rte_bus_probe_failure_list, rte_bus_failure);
+struct rte_bus_probe_failure_list rte_bus_probe_failure_list =
+	TAILQ_HEAD_INITIALIZER(rte_bus_probe_failure_list);
+
+
 void
 rte_bus_register(struct rte_bus *bus)
 {
@@ -64,6 +78,26 @@ rte_bus_unregister(struct rte_bus *bus)
 	RTE_LOG(DEBUG, EAL, "Unregistered [%s] bus.\n", bus->name);
 }
 
+static void
+rte_bus_append_failed_scan(struct rte_bus *bus, int ret)
+{
+	struct rte_bus_failure *f = malloc(sizeof(struct rte_bus_failure));
+	if (!f) abort();
+	f->bus = bus;
+	f->err = ret;
+	TAILQ_INSERT_TAIL(&rte_bus_scan_failure_list, f, next);
+}
+
+static void
+rte_bus_append_failed_probe(struct rte_bus *bus, int ret)
+{
+	struct rte_bus_failure *f = malloc(sizeof(struct rte_bus_failure));
+	if (!f) abort();
+	f->bus = bus;
+	f->err = ret;
+	TAILQ_INSERT_TAIL(&rte_bus_probe_failure_list, f, next);
+}
+
 /* Scan all the buses for registered devices */
 int
 rte_bus_scan(void)
@@ -76,13 +110,33 @@ rte_bus_scan(void)
 		if (ret) {
 			RTE_LOG(ERR, EAL, "Scan for (%s) bus failed.\n",
 				bus->name);
-			return ret;
+			rte_bus_append_failed_scan(bus, ret);
 		}
 	}
 
 	return 0;
 }
 
+/* Seek through scan failures */
+void
+rte_bus_scan_errors(rte_bus_error_callback cb)
+{
+	struct rte_bus_failure *f = NULL;
+	TAILQ_FOREACH(f, &rte_bus_scan_failure_list, next) {
+		cb(f->bus, f->err);
+	}
+}
+
+/* Seek through probe failures */
+void
+rte_bus_probe_errors(rte_bus_error_callback cb)
+{
+	struct rte_bus_failure *f = NULL;
+	TAILQ_FOREACH(f, &rte_bus_probe_failure_list, next) {
+		cb(f->bus, f->err);
+	}
+}
+
 /* Probe all devices of all buses */
 int
 rte_bus_probe(void)
@@ -100,7 +154,7 @@ rte_bus_probe(void)
 		if (ret) {
 			RTE_LOG(ERR, EAL, "Bus (%s) probe failed.\n",
 				bus->name);
-			return ret;
+			rte_bus_append_failed_probe(bus, ret);
 		}
 	}
 
@@ -109,7 +163,7 @@ rte_bus_probe(void)
 		if (ret) {
 			RTE_LOG(ERR, EAL, "Bus (%s) probe failed.\n",
 				vbus->name);
-			return ret;
+			rte_bus_append_failed_probe(vbus, ret);
 		}
 	}
 
diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index 6fb0834..daddb28 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -231,6 +231,20 @@ void rte_bus_register(struct rte_bus *bus);
  */
 void rte_bus_unregister(struct rte_bus *bus);
 
+typedef void (*rte_bus_error_callback)(struct rte_bus *bus, int err);
+
+/**
+ * Search through all buses, invoking cb for each bus which reports scan
+ * error.
+ */
+void rte_bus_scan_errors(rte_bus_error_callback cb);
+
+/**
+ * Search through all buses, invoking cb for each bus which reports probe
+ * error.
+ */
+void rte_bus_probe_errors(rte_bus_error_callback cb);
+
 /**
  * Scan all the buses.
  *
-- 
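
For reference, a self-contained sketch of the same idea (record each
failure on a list and let the caller walk it through a callback) that
builds outside the DPDK tree. All demo_* names are hypothetical and are
not part of any DPDK API. One detail the list macros from <sys/queue.h>
depend on: the node type needs a TAILQ_ENTRY link member, here called
next. The sketch also returns -1 on allocation failure instead of
aborting, which is a design choice for the example rather than anything
the thread decided.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct rte_bus. */
struct demo_bus {
	const char *name;
};

/* One recorded failure; the TAILQ_ENTRY member links it into a list. */
struct demo_bus_failure {
	TAILQ_ENTRY(demo_bus_failure) next;
	const struct demo_bus *bus;
	int err;
};

TAILQ_HEAD(demo_bus_failure_list, demo_bus_failure);

static struct demo_bus_failure_list demo_scan_failures =
	TAILQ_HEAD_INITIALIZER(demo_scan_failures);

typedef void (*demo_bus_error_cb)(const struct demo_bus *bus, int err);

/* Record a failed scan; returns 0, or -1 if allocation failed. */
static int
demo_record_scan_failure(const struct demo_bus *bus, int err)
{
	struct demo_bus_failure *f = malloc(sizeof(*f));

	if (f == NULL)
		return -1;
	f->bus = bus;
	f->err = err;
	/* Entries are never freed in this sketch. */
	TAILQ_INSERT_TAIL(&demo_scan_failures, f, next);
	return 0;
}

/* Invoke cb once per recorded scan failure, in insertion order. */
static void
demo_scan_errors(demo_bus_error_cb cb)
{
	struct demo_bus_failure *f;

	assert(cb != NULL);
	TAILQ_FOREACH(f, &demo_scan_failures, next)
		cb(f->bus, f->err);
}

/* Example callback: count failures and remember the last code seen. */
static int demo_failures_seen;
static int demo_last_err;

static void
demo_count_cb(const struct demo_bus *bus, int err)
{
	(void)bus;
	demo_failures_seen++;
	demo_last_err = err;
}
```

An application that cares about scan results would call
demo_scan_errors() with its own callback after initialization; one that
does not care simply never asks.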


* Re: [PATCH] eal: bus scan and probe never fail
  2017-10-10  5:00                 ` Shreyansh Jain
  2017-10-10 16:00                   ` Aaron Conole
@ 2017-10-11  0:03                   ` Don Provan
  2017-10-11 22:32                     ` Thomas Monjalon
  1 sibling, 1 reply; 19+ messages in thread
From: Don Provan @ 2017-10-11  0:03 UTC
  To: Shreyansh Jain, Jan Blunck, Thomas Monjalon; +Cc: dev, Hemant Agrawal

> -----Original Message-----
> From: Shreyansh Jain [mailto:shreyansh.jain@nxp.com]
> Sent: Monday, October 09, 2017 10:01 PM
> To: Don Provan <dprovan@bivio.net>; Jan Blunck <jblunck@infradead.org>;
> Thomas Monjalon <thomas@monjalon.net>
> Cc: dev <dev@dpdk.org>; Hemant Agrawal <hemant.agrawal@nxp.com>
> Subject: Re: [dpdk-dev] [PATCH] eal: bus scan and probe never fail
> 
> ...
>
> >  From the application's point of view, the potential error here
> > is that the device it wants to use isn't available. I don't see that
> > either the init function or the probe function will have enough
> > information to understand that application-level problem, so
> > they should leave it to the application to detect it.
> 
> I think I understand you comment but just want to cross check again:
> Scan or probe error should simply be ignored by EAL layer and let the
> application take stance when it detects that the device it was looking
> for is missing. Is my understanding correct?
> 
> I am trying to come a conclusion so that this patch can either be
> modified or pushed as it is. If the above understanding is correct, I
> don't see any changes required in the patch.

Yes, I agree my comments support the patch as is.
-don


* Re: [PATCH] eal: bus scan and probe never fail
  2017-10-11  0:03                   ` Don Provan
@ 2017-10-11 22:32                     ` Thomas Monjalon
  0 siblings, 0 replies; 19+ messages in thread
From: Thomas Monjalon @ 2017-10-11 22:32 UTC
  To: Shreyansh Jain; +Cc: dev, Don Provan, Jan Blunck, Hemant Agrawal

11/10/2017 02:03, Don Provan:
> From: Shreyansh Jain [mailto:shreyansh.jain@nxp.com]
> > 
> > ...
> >
> > >  From the application's point of view, the potential error here
> > > is that the device it wants to use isn't available. I don't see that
> > > either the init function or the probe function will have enough
> > > information to understand that application-level problem, so
> > > they should leave it to the application to detect it.
> > 
> > I think I understand you comment but just want to cross check again:
> > Scan or probe error should simply be ignored by EAL layer and let the
> > application take stance when it detects that the device it was looking
> > for is missing. Is my understanding correct?
> > 
> > I am trying to come a conclusion so that this patch can either be
> > modified or pushed as it is. If the above understanding is correct, I
> > don't see any changes required in the patch.
> 
> Yes, I agree my comments support the patch as is.
> -don

Applied, thanks for the discussion

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] eal: bus scan and probe never fail
  2017-10-10 16:00                   ` Aaron Conole
@ 2017-10-11 22:34                     ` Thomas Monjalon
  2017-10-12 13:08                       ` Aaron Conole
  2017-10-12  5:39                     ` Shreyansh Jain
  1 sibling, 1 reply; 19+ messages in thread
From: Thomas Monjalon @ 2017-10-11 22:34 UTC (permalink / raw)
  To: Aaron Conole; +Cc: dev, Shreyansh Jain, Don Provan, Jan Blunck, Hemant Agrawal

10/10/2017 18:00, Aaron Conole:
> Shreyansh Jain <shreyansh.jain@nxp.com> writes:
> 
> > Hello Don,
> >
> > On Monday 09 October 2017 11:51 PM, Don Provan wrote:
> >>> -----Original Message-----
> >>> From: Shreyansh Jain [mailto:shreyansh.jain@nxp.com]
> >>> Sent: Monday, October 09, 2017 4:10 AM
> >>> To: Jan Blunck <jblunck@infradead.org>; Thomas Monjalon
> >>> <thomas@monjalon.net>
> >>> Cc: dev <dev@dpdk.org>; Hemant Agrawal <hemant.agrawal@nxp.com>
> >>> Subject: Re: [dpdk-dev] [PATCH] eal: bus scan and probe never fail
> >>>
> >>> ...
> >>> This is where I have disagreement/doubt.
> >>> Reporting error code from rte_bus_scan would do two things:
> >>>
> >>> 1. rte_eal_init is not designed to ignore/log-only these errors - it
> >>> would quit initialization. (But, this can be changed)
> >>> 2. What should rte_eal_init do with this error? rte_bus_scan would have
> >>> already printed the problematic bus->scan() failure.
> >>
> >> These practical problems confirm to me that the failure of a bus
> >> scan is more of a strategic issue: when asking "which devices can
> >> I use?", "none" is a perfectly valid answer that does not seem
> >> like an error to me even when a failed bus scan is the reason for
> >> that answer.
> >
> > I agree with this.
> >
> >>
> >>  From the application's point of view, the potential error here
> >> is that the device it wants to use isn't available. I don't see that
> >> either the init function or the probe function will have enough
> >> information to understand that application-level problem, so
> >> they should leave it to the application to detect it.
> >
> > I think I understand you comment but just want to cross check again:
> > Scan or probe error should simply be ignored by EAL layer and let the
> > application take stance when it detects that the device it was looking
> > for is missing. Is my understanding correct?
> >
> > I am trying to come a conclusion so that this patch can either be
> > modified or pushed as it is. If the above understanding is correct, I
> > don't see any changes required in the patch.
> 
> Does it make sense to introduce a way to query the results of the
> various bus types for their status?  That way we can give the relevant
> information to the application if it wants, and make the bus scanning
> code *always* succeed?  This version shouldn't be an ABI breakage,
> either (confirm?).
> 
> half-baked below (not tested or suitable - just an example):

We are going to need notification callbacks for scan and probe anyway.
I think errors could also be reported through callbacks?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] eal: bus scan and probe never fail
  2017-10-10 16:00                   ` Aaron Conole
  2017-10-11 22:34                     ` Thomas Monjalon
@ 2017-10-12  5:39                     ` Shreyansh Jain
  2017-10-12 13:20                       ` Aaron Conole
  1 sibling, 1 reply; 19+ messages in thread
From: Shreyansh Jain @ 2017-10-12  5:39 UTC (permalink / raw)
  To: Aaron Conole; +Cc: Don Provan, Jan Blunck, Thomas Monjalon, dev, Hemant Agrawal

Hello Aaron,

On Tuesday 10 October 2017 09:30 PM, Aaron Conole wrote:
> Shreyansh Jain <shreyansh.jain@nxp.com> writes:
> 
>> Hello Don,
>>

[snip]

>>>
>>> These practical problems confirm to me that the failure of a bus
>>> scan is more of a strategic issue: when asking "which devices can
>>> I use?", "none" is a perfectly valid answer that does not seem
>>> like an error to me even when a failed bus scan is the reason for
>>> that answer.
>>
>> I agree with this.
>>
>>>
>>>   From the application's point of view, the potential error here
>>> is that the device it wants to use isn't available. I don't see that
>>> either the init function or the probe function will have enough
>>> information to understand that application-level problem, so
>>> they should leave it to the application to detect it.
>>
>> I think I understand you comment but just want to cross check again:
>> Scan or probe error should simply be ignored by EAL layer and let the
>> application take stance when it detects that the device it was looking
>> for is missing. Is my understanding correct?
>>
>> I am trying to come a conclusion so that this patch can either be
>> modified or pushed as it is. If the above understanding is correct, I
>> don't see any changes required in the patch.
> 
> Does it make sense to introduce a way to query the results of the
> various bus types for their status?  That way we can give the relevant
> information to the application if it wants, and make the bus scanning
> code *always* succeed?  This version shouldn't be an ABI breakage,
> either (confirm?).
> 
> half-baked below (not tested or suitable - just an example):
> 
> ---
> diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
> index a30a898..cd1ef1e 100644
> --- a/lib/librte_eal/common/eal_common_bus.c
> +++ b/lib/librte_eal/common/eal_common_bus.c
> @@ -38,9 +38,23 @@
>   
>   #include "eal_private.h"
>   
> +struct rte_bus_failure {
> +	struct rte_bus *bus;
> +	int err;
> +};
> +
>   struct rte_bus_list rte_bus_list =
>   	TAILQ_HEAD_INITIALIZER(rte_bus_list);
>   
> +TAILQ_HEAD(rte_bus_scan_failure_list, rte_bus_failure);
> +struct rte_bus_scan_failure_list rte_bus_scan_failure_list =
> +	TAILQ_HEAD_INITIALIZER(rte_bus_failure);
> +
> +TAILQ_HEAD(rte_bus_probe_failure_list, rte_bus_failure);
> +struct rte_bus_probe_failure_list rte_bus_probe_failure_list =
> +	TAILQ_HEAD_INITIALIZER(rte_bus_failure);
> +
> +
>   void
>   rte_bus_register(struct rte_bus *bus)
>   {
> @@ -64,6 +78,26 @@ rte_bus_unregister(struct rte_bus *bus)
>   	RTE_LOG(DEBUG, EAL, "Unregistered [%s] bus.\n", bus->name);
>   }
>   
> +static void
> +rte_bus_append_failed_scan(struct rte_bus *bus, int ret)
> +{
> +	struct rte_bus_failure *f = malloc(sizeof(struct rte_bus_failure));
> +	if (!f) abort();
> +	f->bus = bus;
> +	f->ret = ret;
> +	TAILQ_INSERT_TAIL(&rte_bus_scan_failure_list, f, next);
> +}
> +
> +static void
> +rte_bus_append_failed_scan(struct rte_bus *bus, int ret)
> +{
> +	struct rte_bus_failure *f = malloc(sizeof(struct rte_bus_failure));
> +	if (!f) abort();
> +	f->bus = bus;
> +	f->ret = ret;
> +	TAILQ_INSERT_TAIL(&rte_bus_probe_failure_list, f, next);
> +}
> +
>   /* Scan all the buses for registered devices */
>   int
>   rte_bus_scan(void)
> @@ -76,13 +110,33 @@ rte_bus_scan(void)
>   		if (ret) {
>   			RTE_LOG(ERR, EAL, "Scan for (%s) bus failed.\n",
>   				bus->name);
> -			return ret;
> +			rte_bus_append_failed_scan(bus, ret);
>   		}
>   	}
>   
>   	return 0;
>   }
>   
> +/* Seek through scan failures */
> +void
> +rte_bus_scan_errors(rte_bus_error_callback cb)
> +{
> +	struct rte_bus_failure *f = NULL;
> +	TAILQ_FOREACH(f, &rte_bus_scan_failure_list, next) {
> +		cb(f->bus, f->ret);
> +	}
> +}
> +
> +/* Seek through probe failures */
> +void
> +rte_bus_probe_errors(rte_bus_error_callback cb)
> +{
> +	struct rte_bus_failure *f = NULL;
> +	TAILQ_FOREACH(f, &rte_bus_probe_failure_list, next) {
> +		cb(f->bus, f->ret);
> +	}
> +}
> +
>   /* Probe all devices of all buses */
>   int
>   rte_bus_probe(void)
> @@ -100,7 +154,7 @@ rte_bus_probe(void)
>   		if (ret) {
>   			RTE_LOG(ERR, EAL, "Bus (%s) probe failed.\n",
>   				bus->name);
> -			return ret;
> +            rte_bus_append_failed_probe(bus, ret);
>   		}
>   	}
>   
> @@ -109,7 +163,7 @@ rte_bus_probe(void)
>   		if (ret) {
>   			RTE_LOG(ERR, EAL, "Bus (%s) probe failed.\n",
>   				vbus->name);
> -			return ret;
> +            rte_bus_append_failed_probe(bus, ret);
>   		}
>   	}
>   
> diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
> index 6fb0834..daddb28 100644
> --- a/lib/librte_eal/common/include/rte_bus.h
> +++ b/lib/librte_eal/common/include/rte_bus.h
> @@ -231,6 +231,20 @@ void rte_bus_register(struct rte_bus *bus);
>    */
>   void rte_bus_unregister(struct rte_bus *bus);
>   
> +typedef void (*rte_bus_error_callback)(struct rte_bus *bus, int err);
> +
> +/**
> + * Search through all buses, invoking cb for each bus which reports scan
> + * error.
> + */
> +void rte_bus_scan_errors(rte_bus_error_callback cb);
> +
> +/**
> + * Search through all buses, invoking cb for each bus which reports scan
> + * error.
> + */
> +void rte_bus_probe_errors(rte_bus_error_callback cb);
> +
>   /**
>    * Scan all the buses.
>    *
> 

I am assuming that the aim of this is to give the application a way to
query whether its device of interest is there or not. But I think
this (creating a list of scan errors) would be overkill.

Even if we were to create a list of errors from scan/probe, how would 
that help an application? Is there some specific use-case that you are 
hinting at?

An application should worry about devices rather than how they are
detected (scan/probe etc.). It can use an API like
rte_eth_dev_get_port_by_name to query its specific device of interest.
If the scan has failed, that lookup fails too, which is sufficient for
the application to take counter-measures. Isn't that enough from a DPDK
application perspective to move from init to I/O?

I am not discounting that there might be some higher-level use-cases where
this list might come of use - but I can't think of one right now and I
can't comment on this proposal in the absence of that understanding - sorry.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] eal: bus scan and probe never fail
  2017-10-11 22:34                     ` Thomas Monjalon
@ 2017-10-12 13:08                       ` Aaron Conole
  0 siblings, 0 replies; 19+ messages in thread
From: Aaron Conole @ 2017-10-12 13:08 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, Shreyansh Jain, Don Provan, Jan Blunck, Hemant Agrawal

Thomas Monjalon <thomas@monjalon.net> writes:

> 10/10/2017 18:00, Aaron Conole:
>> Shreyansh Jain <shreyansh.jain@nxp.com> writes:
>> 
>> > Hello Don,
>> >
>> > On Monday 09 October 2017 11:51 PM, Don Provan wrote:
>> >>> -----Original Message-----
>> >>> From: Shreyansh Jain [mailto:shreyansh.jain@nxp.com]
>> >>> Sent: Monday, October 09, 2017 4:10 AM
>> >>> To: Jan Blunck <jblunck@infradead.org>; Thomas Monjalon
>> >>> <thomas@monjalon.net>
>> >>> Cc: dev <dev@dpdk.org>; Hemant Agrawal <hemant.agrawal@nxp.com>
>> >>> Subject: Re: [dpdk-dev] [PATCH] eal: bus scan and probe never fail
>> >>>
>> >>> ...
>> >>> This is where I have disagreement/doubt.
>> >>> Reporting error code from rte_bus_scan would do two things:
>> >>>
>> >>> 1. rte_eal_init is not designed to ignore/log-only these errors - it
>> >>> would quit initialization. (But, this can be changed)
>> >>> 2. What should rte_eal_init do with this error? rte_bus_scan would have
>> >>> already printed the problematic bus->scan() failure.
>> >>
>> >> These practical problems confirm to me that the failure of a bus
>> >> scan is more of a strategic issue: when asking "which devices can
>> >> I use?", "none" is a perfectly valid answer that does not seem
>> >> like an error to me even when a failed bus scan is the reason for
>> >> that answer.
>> >
>> > I agree with this.
>> >
>> >>
>> >>  From the application's point of view, the potential error here
>> >> is that the device it wants to use isn't available. I don't see that
>> >> either the init function or the probe function will have enough
>> >> information to understand that application-level problem, so
>> >> they should leave it to the application to detect it.
>> >
>> > I think I understand you comment but just want to cross check again:
>> > Scan or probe error should simply be ignored by EAL layer and let the
>> > application take stance when it detects that the device it was looking
>> > for is missing. Is my understanding correct?
>> >
>> > I am trying to come a conclusion so that this patch can either be
>> > modified or pushed as it is. If the above understanding is correct, I
>> > don't see any changes required in the patch.
>> 
>> Does it make sense to introduce a way to query the results of the
>> various bus types for their status?  That way we can give the relevant
>> information to the application if it wants, and make the bus scanning
>> code *always* succeed?  This version shouldn't be an ABI breakage,
>> either (confirm?).
>> 
>> half-baked below (not tested or suitable - just an example):
>
> We are going to need notification callbacks for scan and probe anyway.
> I think errors could be also notified with callbacks?

Definitely.  That's part of my half-baked patch.  Call the error check
function and get a callback.  There's probably a better way to do it
than my patch.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] eal: bus scan and probe never fail
  2017-10-12  5:39                     ` Shreyansh Jain
@ 2017-10-12 13:20                       ` Aaron Conole
  2017-10-12 14:23                         ` Shreyansh Jain
  0 siblings, 1 reply; 19+ messages in thread
From: Aaron Conole @ 2017-10-12 13:20 UTC (permalink / raw)
  To: Shreyansh Jain
  Cc: Don Provan, Jan Blunck, Thomas Monjalon, dev, Hemant Agrawal

Shreyansh Jain <shreyansh.jain@nxp.com> writes:

> Hello Aaron,
>
> On Tuesday 10 October 2017 09:30 PM, Aaron Conole wrote:
>> Shreyansh Jain <shreyansh.jain@nxp.com> writes:
>>
>>> Hello Don,
>>>
>
> [snip]
>
>>>>
>>>> These practical problems confirm to me that the failure of a bus
>>>> scan is more of a strategic issue: when asking "which devices can
>>>> I use?", "none" is a perfectly valid answer that does not seem
>>>> like an error to me even when a failed bus scan is the reason for
>>>> that answer.
>>>
>>> I agree with this.
>>>
>>>>
>>>>   From the application's point of view, the potential error here
>>>> is that the device it wants to use isn't available. I don't see that
>>>> either the init function or the probe function will have enough
>>>> information to understand that application-level problem, so
>>>> they should leave it to the application to detect it.
>>>
>>> I think I understand you comment but just want to cross check again:
>>> Scan or probe error should simply be ignored by EAL layer and let the
>>> application take stance when it detects that the device it was looking
>>> for is missing. Is my understanding correct?
>>>
>>> I am trying to come a conclusion so that this patch can either be
>>> modified or pushed as it is. If the above understanding is correct, I
>>> don't see any changes required in the patch.
>>
>> Does it make sense to introduce a way to query the results of the
>> various bus types for their status?  That way we can give the relevant
>> information to the application if it wants, and make the bus scanning
>> code *always* succeed?  This version shouldn't be an ABI breakage,
>> either (confirm?).
>>
>> half-baked below (not tested or suitable - just an example):
>>
>> ---
>> diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
>> index a30a898..cd1ef1e 100644
>> --- a/lib/librte_eal/common/eal_common_bus.c
>> +++ b/lib/librte_eal/common/eal_common_bus.c
>> @@ -38,9 +38,23 @@
>>     #include "eal_private.h"
>>   +struct rte_bus_failure {
>> +	struct rte_bus *bus;
>> +	int err;
>> +};
>> +
>>   struct rte_bus_list rte_bus_list =
>>   	TAILQ_HEAD_INITIALIZER(rte_bus_list);
>>   +TAILQ_HEAD(rte_bus_scan_failure_list, rte_bus_failure);
>> +struct rte_bus_scan_failure_list rte_bus_scan_failure_list =
>> +	TAILQ_HEAD_INITIALIZER(rte_bus_failure);
>> +
>> +TAILQ_HEAD(rte_bus_probe_failure_list, rte_bus_failure);
>> +struct rte_bus_probe_failure_list rte_bus_probe_failure_list =
>> +	TAILQ_HEAD_INITIALIZER(rte_bus_failure);
>> +
>> +
>>   void
>>   rte_bus_register(struct rte_bus *bus)
>>   {
>> @@ -64,6 +78,26 @@ rte_bus_unregister(struct rte_bus *bus)
>>   	RTE_LOG(DEBUG, EAL, "Unregistered [%s] bus.\n", bus->name);
>>   }
>>   +static void
>> +rte_bus_append_failed_scan(struct rte_bus *bus, int ret)
>> +{
>> +	struct rte_bus_failure *f = malloc(sizeof(struct rte_bus_failure));
>> +	if (!f) abort();
>> +	f->bus = bus;
>> +	f->ret = ret;
>> +	TAILQ_INSERT_TAIL(&rte_bus_scan_failure_list, f, next);
>> +}
>> +
>> +static void
>> +rte_bus_append_failed_scan(struct rte_bus *bus, int ret)
>> +{
>> +	struct rte_bus_failure *f = malloc(sizeof(struct rte_bus_failure));
>> +	if (!f) abort();
>> +	f->bus = bus;
>> +	f->ret = ret;
>> +	TAILQ_INSERT_TAIL(&rte_bus_probe_failure_list, f, next);
>> +}
>> +
>>   /* Scan all the buses for registered devices */
>>   int
>>   rte_bus_scan(void)
>> @@ -76,13 +110,33 @@ rte_bus_scan(void)
>>   		if (ret) {
>>   			RTE_LOG(ERR, EAL, "Scan for (%s) bus failed.\n",
>>   				bus->name);
>> -			return ret;
>> +			rte_bus_append_failed_scan(bus, ret);
>>   		}
>>   	}
>>     	return 0;
>>   }
>>   +/* Seek through scan failures */
>> +void
>> +rte_bus_scan_errors(rte_bus_error_callback cb)
>> +{
>> +	struct rte_bus_failure *f = NULL;
>> +	TAILQ_FOREACH(f, &rte_bus_scan_failure_list, next) {
>> +		cb(f->bus, f->ret);
>> +	}
>> +}
>> +
>> +/* Seek through probe failures */
>> +void
>> +rte_bus_probe_errors(rte_bus_error_callback cb)
>> +{
>> +	struct rte_bus_failure *f = NULL;
>> +	TAILQ_FOREACH(f, &rte_bus_probe_failure_list, next) {
>> +		cb(f->bus, f->ret);
>> +	}
>> +}
>> +
>>   /* Probe all devices of all buses */
>>   int
>>   rte_bus_probe(void)
>> @@ -100,7 +154,7 @@ rte_bus_probe(void)
>>   		if (ret) {
>>   			RTE_LOG(ERR, EAL, "Bus (%s) probe failed.\n",
>>   				bus->name);
>> -			return ret;
>> +            rte_bus_append_failed_probe(bus, ret);
>>   		}
>>   	}
>>   @@ -109,7 +163,7 @@ rte_bus_probe(void)
>>   		if (ret) {
>>   			RTE_LOG(ERR, EAL, "Bus (%s) probe failed.\n",
>>   				vbus->name);
>> -			return ret;
>> +            rte_bus_append_failed_probe(bus, ret);
>>   		}
>>   	}
>>   diff --git a/lib/librte_eal/common/include/rte_bus.h
>> b/lib/librte_eal/common/include/rte_bus.h
>> index 6fb0834..daddb28 100644
>> --- a/lib/librte_eal/common/include/rte_bus.h
>> +++ b/lib/librte_eal/common/include/rte_bus.h
>> @@ -231,6 +231,20 @@ void rte_bus_register(struct rte_bus *bus);
>>    */
>>   void rte_bus_unregister(struct rte_bus *bus);
>>   +typedef void (*rte_bus_error_callback)(struct rte_bus *bus, int
>> err);
>> +
>> +/**
>> + * Search through all buses, invoking cb for each bus which reports scan
>> + * error.
>> + */
>> +void rte_bus_scan_errors(rte_bus_error_callback cb);
>> +
>> +/**
>> + * Search through all buses, invoking cb for each bus which reports scan
>> + * error.
>> + */
>> +void rte_bus_probe_errors(rte_bus_error_callback cb);
>> +
>>   /**
>>    * Scan all the buses.
>>    *
>>
>
> I am assuming that that aim of this is to have a way so that
> application can query whether its device of interest is there or
> not. But, I think this (creating a list of scan errrors) would be
> overkill.

No.  That can be done through a different query.

> Even if we were to create a list of errors from scan/probe, how would
> that help an application? Is there some specific use-case that you are
> hinting at?

Sure.  Let's assume that due to some permissions problem, /proc/bus/pci
doesn't exist for the application.  The entire PCI bus scan fails.  No
PCI devices are found.

In this case, how can the application even start to understand why the
device is missing?  I don't think parsing logs makes sense.  But if
there's a way to see that the PCI bus scan/probe failed, maybe the
application can start taking corrective action (for instance, check that
/proc is mounted, and retry the bus probe/scan).
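
To make the idea concrete, here is a minimal standalone sketch of recording
scan failures and walking them later. It is not the DPDK API: the names
bus_failure, record_scan_failure and for_each_scan_failure are hypothetical,
and a plain bus name stands in for struct rte_bus. Unlike the half-baked
patch, it includes the TAILQ_ENTRY linkage and reports allocation failure
to the caller instead of calling abort().

```c
#include <stdlib.h>
#include <sys/queue.h>

/* One recorded failure: which bus failed and the error code it returned. */
struct bus_failure {
	TAILQ_ENTRY(bus_failure) next;	/* list linkage used by the TAILQ macros */
	const char *bus_name;		/* hypothetical stand-in for struct rte_bus */
	int err;
};

TAILQ_HEAD(bus_failure_list, bus_failure);

static struct bus_failure_list scan_failures =
	TAILQ_HEAD_INITIALIZER(scan_failures);

/* Hypothetical callback type: invoked once per recorded failure. */
typedef void (*bus_error_cb)(const char *bus_name, int err, void *arg);

/* Record a failed scan. Returns 0 on success, -1 if allocation fails. */
static int
record_scan_failure(const char *bus_name, int err)
{
	struct bus_failure *f = malloc(sizeof(*f));

	if (f == NULL)
		return -1;
	f->bus_name = bus_name;
	f->err = err;
	TAILQ_INSERT_TAIL(&scan_failures, f, next);
	return 0;
}

/*
 * Walk the recorded failures, calling cb (if non-NULL) for each one.
 * Returns the number of recorded failures.
 */
static int
for_each_scan_failure(bus_error_cb cb, void *arg)
{
	struct bus_failure *f;
	int n = 0;

	TAILQ_FOREACH(f, &scan_failures, next) {
		if (cb != NULL)
			cb(f->bus_name, f->err, arg);
		n++;
	}
	return n;
}
```

An application could call for_each_scan_failure() after init and decide,
per bus, whether a retry is worth attempting.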

> Application should worry about devices rather than how they are being
> detected (scan/probe etc). Application can use API like
> rte_eth_dev_get_port_by_name to query its specific device of
> interest. If the scan has failed, this API would be sufficient for the
> application to take counter-measures. Isn't that enough from a DPDK
> application perspective to move from init to I/O?

I'm not sure what you're asking here.  I agree that bus probe/scan
shouldn't ever fail, and that we should pass from init to i/o asap.

> I am not discounting that there might be some higher use-cases where
> this list might come of us - but I can't think of one right now and I
> can't comment on this proposal in absence of that understanding -
> sorry.

Maybe the above helps?  Not sure if I described my thinking.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] eal: bus scan and probe never fail
  2017-10-12 13:20                       ` Aaron Conole
@ 2017-10-12 14:23                         ` Shreyansh Jain
  0 siblings, 0 replies; 19+ messages in thread
From: Shreyansh Jain @ 2017-10-12 14:23 UTC (permalink / raw)
  To: Aaron Conole; +Cc: Don Provan, Jan Blunck, Thomas Monjalon, dev, Hemant Agrawal

Hello Aaron,

On Thursday 12 October 2017 06:50 PM, Aaron Conole wrote:
> Shreyansh Jain <shreyansh.jain@nxp.com> writes:
> 
>> Hello Aaron,
>>
>> On Tuesday 10 October 2017 09:30 PM, Aaron Conole wrote:
>>> Shreyansh Jain <shreyansh.jain@nxp.com> writes:
>>>
>>>> Hello Don,
>>>>
>>
>> [snip]
>>
>>>>>

[snip]

>>
>> I am assuming that that aim of this is to have a way so that
>> application can query whether its device of interest is there or
>> not. But, I think this (creating a list of scan errrors) would be
>> overkill.
> 
> No.  That can be done through a different query.

OK. So, the aim is to know about errors, if any, that occurred during the
DPDK scan (just after rte_eal_init).
(Assuming probe is just based on a successful scan, let's ignore that
for a while.)

> 
>> Even if we were to create a list of errors from scan/probe, how would
>> that help an application? Is there some specific use-case that you are
>> hinting at?
> 
> Sure.  Let's assume that due to some permissions problem, /proc/bus/pci
> doesn't exist for the application.  The entire PCI bus scan fails.  No
> PCI devices are found.

Agree - that is a general scan failure.
The scan will still end up detecting any non-PCI devices which are present.
So, let's say for this available device tree:

PCI
  |- 0000:00:00.0
  |- 0000:00:02.0
DPAA2
  |- dpni.1
  |- dpni.2
<others>

DPDK scan would detect only DPAA2 devices. PCI devices are absent and no 
port id (post probe) would be assigned to any of them.
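
As a rough standalone illustration of that behaviour (not the EAL code
itself), the sketch below scans a list of buses, logs a failure and carries
on with the next bus. struct demo_bus and the demo_* functions are
hypothetical stand-ins; note also that the sketch returns the number of
buses scanned successfully, whereas rte_bus_scan with this patch always
returns 0.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rte_bus: just a name and a scan hook. */
struct demo_bus {
	const char *name;
	int (*scan)(void);
};

/* Simulate a PCI scan that fails (e.g. the sysfs/proc entries are missing). */
static int demo_pci_scan(void) { return -1; }

/* Simulate a DPAA2 scan that succeeds. */
static int demo_dpaa2_scan(void) { return 0; }

/*
 * Scan every bus. A failing bus is logged and skipped; it does not stop
 * the remaining buses from being scanned.
 * Returns the number of buses whose scan succeeded.
 */
static int
demo_scan_all(const struct demo_bus *buses, size_t n)
{
	size_t i;
	int ok = 0;

	for (i = 0; i < n; i++) {
		if (buses[i].scan() != 0) {
			fprintf(stderr, "Scan for (%s) bus failed.\n",
				buses[i].name);
			continue;
		}
		ok++;
	}
	return ok;
}
```

With a PCI entry followed by a DPAA2 entry, the PCI failure is logged and
the DPAA2 bus is still scanned.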

> 
> In this case, how can the application even start to understand why the
> device is missing?  I don't think parsing logs makes sense.  But if
> there's a way to see that the PCI bus scan/probe failed, maybe the
> application can start making corrective action (for instance, check that
> /proc is mounted, and retry the bus probe/scan).

See below.

> 
>> Application should worry about devices rather than how they are being
>> detected (scan/probe etc). Application can use API like
>> rte_eth_dev_get_port_by_name to query its specific device of
>> interest. If the scan has failed, this API would be sufficient for the
>> application to take counter-measures. Isn't that enough from a DPDK
>> application perspective to move from init to I/O?
> 
> I'm not sure what you're asking here.  I agree that bus probe/scan
> shouldn't ever fail, and that we should pass from init to i/o asap.

What I had in mind is that an application is more concerned about the
devices it requires than about the environment issues that made the scan fail.
An application would try and query:

   ret = rte_eth_dev_get_port_by_name("0000:00:00.0", &port_id)

resulting in an error.
Obviously, at this point it is too late to make changes like the ones you
suggested (check "/proc", retry the bus scan) - (hotplugging?).
My assumption was that at this point the application would take the
necessary action (error, quit) when its devices are not available.

The application should not be worried about the 'scan/probe' process - that
is an internal operation, the outcome of which (ports) is what the
application wants.
Again, this is just my opinion.

> 
>> I am not discounting that there might be some higher use-cases where
>> this list might come of us - but I can't think of one right now and I
>> can't comment on this proposal in absence of that understanding -
>> sorry.
> 
> Maybe the above helps?  Not sure if I described my thinking.
> 

I understand your point.
Maybe a wider audience would be a better judge of the usability of this model.
I think you should go ahead and propose this as a proper patch/RFC.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2017-10-12 14:11 UTC | newest]

Thread overview: 19+ messages
2017-08-12 10:22 [PATCH] eal: bus scan and probe never fail Shreyansh Jain
2017-09-18 11:36 ` Hemant Agrawal
2017-09-19 18:51   ` Jan Blunck
2017-10-05 23:21     ` Thomas Monjalon
2017-10-06 13:12       ` Shreyansh Jain
2017-10-06 13:37         ` Thomas Monjalon
2017-10-06 17:34           ` Jan Blunck
2017-10-09 11:10             ` Shreyansh Jain
2017-10-09 18:21               ` Don Provan
2017-10-09 19:34                 ` Thomas Monjalon
2017-10-10  5:00                 ` Shreyansh Jain
2017-10-10 16:00                   ` Aaron Conole
2017-10-11 22:34                     ` Thomas Monjalon
2017-10-12 13:08                       ` Aaron Conole
2017-10-12  5:39                     ` Shreyansh Jain
2017-10-12 13:20                       ` Aaron Conole
2017-10-12 14:23                         ` Shreyansh Jain
2017-10-11  0:03                   ` Don Provan
2017-10-11 22:32                     ` Thomas Monjalon
