From: Aaron Conole <aconole@redhat.com>
To: Shreyansh Jain <shreyansh.jain@nxp.com>
Cc: Don Provan <dprovan@bivio.net>,
	Jan Blunck <jblunck@infradead.org>,
	Thomas Monjalon <thomas@monjalon.net>, dev <dev@dpdk.org>,
	Hemant Agrawal <hemant.agrawal@nxp.com>
Subject: Re: [PATCH] eal: bus scan and probe never fail
Date: Thu, 12 Oct 2017 09:20:05 -0400
Message-ID: <f7tlgkgcxkq.fsf@dhcp-25-97.bos.redhat.com>
In-Reply-To: <f629eaa4-881e-090b-e4ce-d9afd9d502a1@nxp.com> (Shreyansh Jain's message of "Thu, 12 Oct 2017 11:09:20 +0530")

Shreyansh Jain <shreyansh.jain@nxp.com> writes:

> Hello Aaron,
>
> On Tuesday 10 October 2017 09:30 PM, Aaron Conole wrote:
>> Shreyansh Jain <shreyansh.jain@nxp.com> writes:
>>
>>> Hello Don,
>>>
>
> [snip]
>
>>>>
>>>> These practical problems confirm to me that the failure of a bus
>>>> scan is more of a strategic issue: when asking "which devices can
>>>> I use?", "none" is a perfectly valid answer that does not seem
>>>> like an error to me even when a failed bus scan is the reason for
>>>> that answer.
>>>
>>> I agree with this.
>>>
>>>>
>>>>   From the application's point of view, the potential error here
>>>> is that the device it wants to use isn't available. I don't see that
>>>> either the init function or the probe function will have enough
>>>> information to understand that application-level problem, so
>>>> they should leave it to the application to detect it.
>>>
>>> I think I understand your comment but just want to cross-check again:
>>> Scan or probe error should simply be ignored by EAL layer and let the
>>> application take stance when it detects that the device it was looking
>>> for is missing. Is my understanding correct?
>>>
>>> I am trying to come to a conclusion so that this patch can either be
>>> modified or pushed as it is. If the above understanding is correct, I
>>> don't see any changes required in the patch.
>>
>> Does it make sense to introduce a way to query the results of the
>> various bus types for their status?  That way we can give the relevant
>> information to the application if it wants, and make the bus scanning
>> code *always* succeed?  This version shouldn't be an ABI breakage,
>> either (confirm?).
>>
>> half-baked below (not tested or suitable - just an example):
>>
>> ---
>> diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
>> index a30a898..cd1ef1e 100644
>> --- a/lib/librte_eal/common/eal_common_bus.c
>> +++ b/lib/librte_eal/common/eal_common_bus.c
>> @@ -38,9 +38,24 @@
>>     #include "eal_private.h"
>>   +struct rte_bus_failure {
>> +	TAILQ_ENTRY(rte_bus_failure) next;
>> +	struct rte_bus *bus;
>> +	int err;
>> +};
>> +
>>   struct rte_bus_list rte_bus_list =
>>   	TAILQ_HEAD_INITIALIZER(rte_bus_list);
>>   +TAILQ_HEAD(rte_bus_scan_failure_list, rte_bus_failure);
>> +struct rte_bus_scan_failure_list rte_bus_scan_failure_list =
>> +	TAILQ_HEAD_INITIALIZER(rte_bus_scan_failure_list);
>> +
>> +TAILQ_HEAD(rte_bus_probe_failure_list, rte_bus_failure);
>> +struct rte_bus_probe_failure_list rte_bus_probe_failure_list =
>> +	TAILQ_HEAD_INITIALIZER(rte_bus_probe_failure_list);
>> +
>> +
>>   void
>>   rte_bus_register(struct rte_bus *bus)
>>   {
>> @@ -64,6 +78,26 @@ rte_bus_unregister(struct rte_bus *bus)
>>   	RTE_LOG(DEBUG, EAL, "Unregistered [%s] bus.\n", bus->name);
>>   }
>>   +static void
>> +rte_bus_append_failed_scan(struct rte_bus *bus, int ret)
>> +{
>> +	struct rte_bus_failure *f = malloc(sizeof(struct rte_bus_failure));
>> +	if (!f) abort();
>> +	f->bus = bus;
>> +	f->err = ret;
>> +	TAILQ_INSERT_TAIL(&rte_bus_scan_failure_list, f, next);
>> +}
>> +
>> +static void
>> +rte_bus_append_failed_probe(struct rte_bus *bus, int ret)
>> +{
>> +	struct rte_bus_failure *f = malloc(sizeof(struct rte_bus_failure));
>> +	if (!f) abort();
>> +	f->bus = bus;
>> +	f->err = ret;
>> +	TAILQ_INSERT_TAIL(&rte_bus_probe_failure_list, f, next);
>> +}
>> +
>>   /* Scan all the buses for registered devices */
>>   int
>>   rte_bus_scan(void)
>> @@ -76,13 +110,33 @@ rte_bus_scan(void)
>>   		if (ret) {
>>   			RTE_LOG(ERR, EAL, "Scan for (%s) bus failed.\n",
>>   				bus->name);
>> -			return ret;
>> +			rte_bus_append_failed_scan(bus, ret);
>>   		}
>>   	}
>>     	return 0;
>>   }
>>   +/* Seek through scan failures */
>> +void
>> +rte_bus_scan_errors(rte_bus_error_callback cb)
>> +{
>> +	struct rte_bus_failure *f = NULL;
>> +	TAILQ_FOREACH(f, &rte_bus_scan_failure_list, next) {
>> +		cb(f->bus, f->err);
>> +	}
>> +}
>> +
>> +/* Seek through probe failures */
>> +void
>> +rte_bus_probe_errors(rte_bus_error_callback cb)
>> +{
>> +	struct rte_bus_failure *f = NULL;
>> +	TAILQ_FOREACH(f, &rte_bus_probe_failure_list, next) {
>> +		cb(f->bus, f->err);
>> +	}
>> +}
>> +
>>   /* Probe all devices of all buses */
>>   int
>>   rte_bus_probe(void)
>> @@ -100,7 +154,7 @@ rte_bus_probe(void)
>>   		if (ret) {
>>   			RTE_LOG(ERR, EAL, "Bus (%s) probe failed.\n",
>>   				bus->name);
>> -			return ret;
>> +			rte_bus_append_failed_probe(bus, ret);
>>   		}
>>   	}
>>   @@ -109,7 +163,7 @@ rte_bus_probe(void)
>>   		if (ret) {
>>   			RTE_LOG(ERR, EAL, "Bus (%s) probe failed.\n",
>>   				vbus->name);
>> -			return ret;
>> +			rte_bus_append_failed_probe(vbus, ret);
>>   		}
>>   	}
>>   diff --git a/lib/librte_eal/common/include/rte_bus.h
>> b/lib/librte_eal/common/include/rte_bus.h
>> index 6fb0834..daddb28 100644
>> --- a/lib/librte_eal/common/include/rte_bus.h
>> +++ b/lib/librte_eal/common/include/rte_bus.h
>> @@ -231,6 +231,20 @@ void rte_bus_register(struct rte_bus *bus);
>>    */
>>   void rte_bus_unregister(struct rte_bus *bus);
>>   +typedef void (*rte_bus_error_callback)(struct rte_bus *bus, int err);
>> +
>> +/**
>> + * Search through all buses, invoking cb for each bus which reports scan
>> + * error.
>> + */
>> +void rte_bus_scan_errors(rte_bus_error_callback cb);
>> +
>> +/**
>> + * Search through all buses, invoking cb for each bus which reports probe
>> + * error.
>> + */
>> +void rte_bus_probe_errors(rte_bus_error_callback cb);
>> +
>>   /**
>>    * Scan all the buses.
>>    *
>>
>
> I am assuming that the aim of this is to give the application a way
> to query whether its device of interest is there or not. But I think
> this (creating a list of scan errors) would be overkill.

No.  That can be done through a different query.

> Even if we were to create a list of errors from scan/probe, how would
> that help an application? Is there some specific use-case that you are
> hinting at?

Sure.  Let's assume that due to some permissions problem, /proc/bus/pci
doesn't exist for the application.  The entire PCI bus scan fails.  No
PCI devices are found.

In this case, how can the application even start to understand why the
device is missing?  I don't think parsing logs makes sense.  But if
there's a way to see that the PCI bus scan/probe failed, maybe the
application can start taking corrective action (for instance, check that
/proc is mounted, and retry the bus probe/scan).
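
To make that concrete, here is a rough sketch of the application side. It
is self-contained and uses stand-in types rather than the real DPDK
headers; bus_append_failed_scan(), bus_scan_errors() and
app_pci_scan_error() are hypothetical names modelled on the half-baked
patch above, not existing EAL API.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

/* Stand-in for the real struct rte_bus; only the name is needed here. */
struct rte_bus {
	const char *name;
};

/* One recorded failure: which bus failed and with what error code. */
struct rte_bus_failure {
	TAILQ_ENTRY(rte_bus_failure) next;
	struct rte_bus *bus;
	int err;
};

TAILQ_HEAD(rte_bus_failure_list, rte_bus_failure);
static struct rte_bus_failure_list scan_failures =
	TAILQ_HEAD_INITIALIZER(scan_failures);

typedef void (*rte_bus_error_callback)(struct rte_bus *bus, int err);

/* Record a failed scan instead of aborting the whole scan loop. */
static void
bus_append_failed_scan(struct rte_bus *bus, int err)
{
	struct rte_bus_failure *f = malloc(sizeof(*f));

	if (f == NULL)
		abort();
	f->bus = bus;
	f->err = err;
	TAILQ_INSERT_TAIL(&scan_failures, f, next);
}

/* Invoke cb once for every bus whose scan failed. */
static void
bus_scan_errors(rte_bus_error_callback cb)
{
	struct rte_bus_failure *f;

	assert(cb != NULL);
	TAILQ_FOREACH(f, &scan_failures, next)
		cb(f->bus, f->err);
}

/* Application side: remember the error reported for the PCI bus. */
static int pci_scan_err;

static void
note_pci_failure(struct rte_bus *bus, int err)
{
	if (strcmp(bus->name, "pci") == 0)
		pci_scan_err = err;
}

/* Returns 0 if the PCI scan succeeded, else the recorded error code. */
static int
app_pci_scan_error(void)
{
	pci_scan_err = 0;
	bus_scan_errors(note_pci_failure);
	return pci_scan_err;
}
```

An application would call app_pci_scan_error() after init; on a non-zero
result such as -EACCES it could check that /proc is mounted and then ask
for the scan to be retried, without ever looking at the logs.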

> Application should worry about devices rather than how they are being
> detected (scan/probe etc). Application can use API like
> rte_eth_dev_get_port_by_name to query its specific device of
> interest. If the scan has failed, this API would be sufficient for the
> application to take counter-measures. Isn't that enough from a DPDK
> application perspective to move from init to I/O?

I'm not sure what you're asking here.  I agree that bus probe/scan
shouldn't ever fail, and that we should pass from init to i/o asap.

> I am not discounting that there might be some higher-level use cases
> where this list might be of use, but I can't think of one right now,
> and I can't comment on this proposal in the absence of that
> understanding. Sorry.

Maybe the above helps?  Not sure if I described my thinking.
