* [PATCH v7 00/14] eal: Remove most causes of panic on init
@ 2017-03-22 20:19 Aaron Conole
  2017-03-22 20:19 ` [PATCH v7 01/14] eal: do not panic on cpu detection Aaron Conole
                   ` (14 more replies)
  0 siblings, 15 replies; 20+ messages in thread
From: Aaron Conole @ 2017-03-22 20:19 UTC
  To: dev; +Cc: Bruce Richardson, Thomas Monjalon, Stephen Hemminger

In many cases, it's enough to simply let the application know that the
call to initialize DPDK has failed.  The application can then decide
whether to halt completely based on the error returned (and it could
even retry after some corrective action by the user or the
application).
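
From the application's side, the retry idea above might look roughly like
the sketch below.  It is only an illustration under stated assumptions:
sketch_eal_init() and sketch_errno are hypothetical stand-ins for
rte_eal_init() and rte_errno so that the snippet builds without DPDK, and
the "corrective action" is a placeholder.  EINVAL and EACCES are treated as
retryable because patches 03 and 04 clear run_once for those cases.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-ins so the sketch builds without DPDK. */
static int sketch_errno;         /* plays the role of rte_errno */
static int sketch_config_fixed;  /* set by the corrective action */

/* Stand-in for rte_eal_init(): returns -1 and records an error code
 * until the configuration has been "fixed", then returns 0. */
static int sketch_eal_init(void)
{
	if (!sketch_config_fixed) {
		sketch_errno = EINVAL;
		return -1;
	}
	return 0;
}

/* Try to initialize; on a retryable error apply a corrective action
 * and try again, up to max_attempts times.  Returns 0 on success, or
 * -1 when the application should fall back or exit. */
static int app_init_with_retry(int max_attempts)
{
	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		if (sketch_eal_init() == 0)
			return 0;

		fprintf(stderr, "init failed (attempt %d): error %d\n",
			attempt, sketch_errno);

		/* Only some errors are worth retrying. */
		if (sketch_errno != EINVAL && sketch_errno != EACCES)
			break;

		sketch_config_fixed = 1; /* placeholder corrective action */
	}
	return -1;
}
```

A real application would replace the placeholder with something like
prompting for new arguments or fixing hugepage permissions, and would give
up immediately on errors it cannot correct.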

Changes ->v2:
- Audited all "RTE_LOG (" calls that were introduced, and converted
  to "RTE_LOG("
- Added some fprintf(stderr, "") lines to indicate errors before logging
  is initialized
- Removed assignments to errno.
- Changed patch 14/25 to reflect EFAULT, and document in 25/25

Changes ->v3:
- Checkpatch issues in patches 3 (spelling mistake), 9 (issue with leading
  spaces), and 19 (braces around single line statement if-condition)

Changes ->v4:
- Error text cleanup.
- Add a new check around rte_bus_scan(), added during the development of
  this series.

Changes ->v5:
- checkpatch.pl cleanup in patch 02/26
- move rte_errno.h include from patch 15 to patch 02
- remove stdbool.h and use int as return type in patch 06/26

Changes ->v6:
- convert all of the RTE_LOG() calls made during initialization to
  rte_eal_init_alert()
- run through check-git-log and checkpatches
- add Bruce's ack to the series

Changes ->v7:
- Squash a bunch of commits
- Make the corresponding BSD side changes
- refactor the PCI probe failure code to be more explicit in the intent
- Remove most of Bruce's acks (with all the shuffling/changes I think the
  series should be re-evaluated)

Aaron Conole (14):
  eal: do not panic on cpu detection
  eal: do not panic when CPU isn't supported
  eal: do not panic on hugepage info init
  eal: do not panic if parsing args returns error
  eal: do not panic on memzone initialization fails
  eal: set errno when exiting for already called
  eal: do not panic on a number of conditions
  eal: do not panic on timer init failure
  eal: do not panic on interrupt thread init
  eal: do not error if plugins fail to init
  eal: do not panic on PCI failures
  eal: do not panic if vdev init fails
  eal: do not panic when bus probe/scan fails
  rte_eal_init: add info about various error codes

 lib/librte_eal/bsdapp/eal/eal.c                    | 117 +++++++++++++-----
 lib/librte_eal/common/eal_common_cpuflags.c        |  13 +-
 lib/librte_eal/common/eal_common_dev.c             |   5 +-
 lib/librte_eal/common/eal_common_lcore.c           |   7 +-
 lib/librte_eal/common/eal_common_pci.c             |  12 +-
 lib/librte_eal/common/eal_common_tailqs.c          |   3 +-
 .../common/include/generic/rte_cpuflags.h          |   9 ++
 lib/librte_eal/common/include/rte_eal.h            |  27 ++++-
 lib/librte_eal/linuxapp/eal/eal.c                  | 131 +++++++++++++++------
 lib/librte_eal/linuxapp/eal/eal_hugepage_info.c    |   9 +-
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       |   5 +-
 11 files changed, 256 insertions(+), 82 deletions(-)

-- 
2.9.3


* [PATCH v7 01/14] eal: do not panic on cpu detection
  2017-03-22 20:19 [PATCH v7 00/14] eal: Remove most causes of panic on init Aaron Conole
@ 2017-03-22 20:19 ` Aaron Conole
  2017-03-22 20:19 ` [PATCH v7 02/14] eal: do not panic when CPU isn't supported Aaron Conole
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Aaron Conole @ 2017-03-22 20:19 UTC
  To: dev; +Cc: Bruce Richardson, Thomas Monjalon, Stephen Hemminger

There may be no way to gracefully recover, but the application
should be notified that a failure happened, rather than being
aborted outright.  This allows the user to proceed with a "slow-path"
type solution.

After this change, the EAL CPU NUMA node resolution step can no longer
emit an rte_panic.  This aligns with the code in rte_eal_init, which
expects failures to return an error code.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 lib/librte_eal/bsdapp/eal/eal.c          | 14 ++++++++++++--
 lib/librte_eal/common/eal_common_lcore.c |  7 ++++---
 lib/librte_eal/linuxapp/eal/eal.c        | 14 ++++++++++++--
 3 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index ee7c9de..12df127 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -56,6 +56,7 @@
 #include <rte_launch.h>
 #include <rte_eal.h>
 #include <rte_eal_memconfig.h>
+#include <rte_errno.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
 #include <rte_log.h>
@@ -487,6 +488,12 @@ rte_eal_iopl_init(void)
 	return 0;
 }
 
+static void rte_eal_init_alert(const char *msg)
+{
+	fprintf(stderr, "EAL: FATAL: %s\n", msg);
+	RTE_LOG(ERR, EAL, "%s\n", msg);
+}
+
 /* Launch threads, called at application init(). */
 int
 rte_eal_init(int argc, char **argv)
@@ -510,8 +517,11 @@ rte_eal_init(int argc, char **argv)
 	/* set log level as early as possible */
 	rte_set_log_level(internal_config.log_level);
 
-	if (rte_eal_cpu_init() < 0)
-		rte_panic("Cannot detect lcores\n");
+	if (rte_eal_cpu_init() < 0) {
+		rte_eal_init_alert("Cannot detect lcores.");
+		rte_errno = ENOTSUP;
+		return -1;
+	}
 
 	fctret = eal_parse_args(argc, argv);
 	if (fctret < 0)
diff --git a/lib/librte_eal/common/eal_common_lcore.c b/lib/librte_eal/common/eal_common_lcore.c
index 2cd4132..84fa0cb 100644
--- a/lib/librte_eal/common/eal_common_lcore.c
+++ b/lib/librte_eal/common/eal_common_lcore.c
@@ -83,16 +83,17 @@ rte_eal_cpu_init(void)
 		config->lcore_role[lcore_id] = ROLE_RTE;
 		lcore_config[lcore_id].core_id = eal_cpu_core_id(lcore_id);
 		lcore_config[lcore_id].socket_id = eal_cpu_socket_id(lcore_id);
-		if (lcore_config[lcore_id].socket_id >= RTE_MAX_NUMA_NODES)
+		if (lcore_config[lcore_id].socket_id >= RTE_MAX_NUMA_NODES) {
 #ifdef RTE_EAL_ALLOW_INV_SOCKET_ID
 			lcore_config[lcore_id].socket_id = 0;
 #else
-			rte_panic("Socket ID (%u) is greater than "
+			RTE_LOG(ERR, EAL, "Socket ID (%u) is greater than "
 				"RTE_MAX_NUMA_NODES (%d)\n",
 				lcore_config[lcore_id].socket_id,
 				RTE_MAX_NUMA_NODES);
+			return -1;
 #endif
-
+		}
 		RTE_LOG(DEBUG, EAL, "Detected lcore %u as "
 				"core %u on socket %u\n",
 				lcore_id, lcore_config[lcore_id].core_id,
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index bf6b818..81692e7 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -61,6 +61,7 @@
 #include <rte_launch.h>
 #include <rte_eal.h>
 #include <rte_eal_memconfig.h>
+#include <rte_errno.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
 #include <rte_log.h>
@@ -740,6 +741,12 @@ static int rte_eal_vfio_setup(void)
 }
 #endif
 
+static void rte_eal_init_alert(const char *msg)
+{
+	fprintf(stderr, "EAL: FATAL: %s\n", msg);
+	RTE_LOG(ERR, EAL, "%s\n", msg);
+}
+
 /* Launch threads, called at application init(). */
 int
 rte_eal_init(int argc, char **argv)
@@ -767,8 +774,11 @@ rte_eal_init(int argc, char **argv)
 	/* set log level as early as possible */
 	rte_set_log_level(internal_config.log_level);
 
-	if (rte_eal_cpu_init() < 0)
-		rte_panic("Cannot detect lcores\n");
+	if (rte_eal_cpu_init() < 0) {
+		rte_eal_init_alert("Cannot detect lcores.");
+		rte_errno = ENOTSUP;
+		return -1;
+	}
 
 	fctret = eal_parse_args(argc, argv);
 	if (fctret < 0)
-- 
2.9.3


* [PATCH v7 02/14] eal: do not panic when CPU isn't supported
  2017-03-22 20:19 [PATCH v7 00/14] eal: Remove most causes of panic on init Aaron Conole
  2017-03-22 20:19 ` [PATCH v7 01/14] eal: do not panic on cpu detection Aaron Conole
@ 2017-03-22 20:19 ` Aaron Conole
  2017-03-23 13:47   ` Bruce Richardson
  2017-03-22 20:19 ` [PATCH v7 03/14] eal: do not panic on hugepage info init Aaron Conole
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 20+ messages in thread
From: Aaron Conole @ 2017-03-22 20:19 UTC
  To: dev; +Cc: Bruce Richardson, Thomas Monjalon, Stephen Hemminger

This adds a new API, rte_cpu_is_supported(), which reports whether the
running CPU supports the features selected at compile time.

Applications can now exit gracefully, and applications that run
non-DPDK datapaths in concert with DPDK datapaths are no longer forced
to exit on an unsupported CPU.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
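The split this patch makes (a predicate that reports the result, plus the
old entry point kept as a thin exiting wrapper) can be sketched in
isolation as follows.  This is a simplified illustration, not the EAL
code: struct sketch_feature and both function names are hypothetical, and
the real check walks RTE_COMPILE_TIME_CPUFLAGS instead of a table passed
in by the caller.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical feature table: present == 1 means the running CPU has it. */
struct sketch_feature {
	const char *name;
	int present;
};

/* Returns 1 if every required feature is present, 0 otherwise.
 * It reports the first missing feature but never terminates. */
static int sketch_cpu_is_supported(const struct sketch_feature *req, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!req[i].present) {
			fprintf(stderr,
				"ERROR: This system does not support \"%s\".\n",
				req[i].name);
			return 0;
		}
	}
	return 1;
}

/* Legacy-style wrapper: keeps the old exit-on-failure behaviour for
 * callers that still want it. */
static void sketch_cpu_check_supported(const struct sketch_feature *req,
				       size_t n)
{
	if (!sketch_cpu_is_supported(req, n))
		exit(1);
}
```

Keeping the wrapper means existing callers see no behaviour change, while
the init path can call the predicate and return an error instead.
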
 lib/librte_eal/bsdapp/eal/eal.c                      |  6 +++++-
 lib/librte_eal/common/eal_common_cpuflags.c          | 13 +++++++++++--
 lib/librte_eal/common/include/generic/rte_cpuflags.h |  9 +++++++++
 lib/librte_eal/linuxapp/eal/eal.c                    |  6 +++++-
 4 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 12df127..8ad6157 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -505,7 +505,11 @@ rte_eal_init(int argc, char **argv)
 	char thread_name[RTE_MAX_THREAD_NAME_LEN];
 
 	/* checks if the machine is adequate */
-	rte_cpu_check_supported();
+	if (!rte_cpu_is_supported()) {
+		rte_eal_init_alert("unsupported cpu type.");
+		rte_errno = ENOTSUP;
+		return -1;
+	}
 
 	if (!rte_atomic32_test_and_set(&run_once))
 		return -1;
diff --git a/lib/librte_eal/common/eal_common_cpuflags.c b/lib/librte_eal/common/eal_common_cpuflags.c
index b5f76f7..9a2d080 100644
--- a/lib/librte_eal/common/eal_common_cpuflags.c
+++ b/lib/librte_eal/common/eal_common_cpuflags.c
@@ -43,6 +43,13 @@
 void
 rte_cpu_check_supported(void)
 {
+	if (!rte_cpu_is_supported())
+		exit(1);
+}
+
+int
+rte_cpu_is_supported(void)
+{
 	/* This is generated at compile-time by the build system */
 	static const enum rte_cpu_flag_t compile_time_flags[] = {
 			RTE_COMPILE_TIME_CPUFLAGS
@@ -57,14 +64,16 @@ rte_cpu_check_supported(void)
 			fprintf(stderr,
 				"ERROR: CPU feature flag lookup failed with error %d\n",
 				ret);
-			exit(1);
+			return 0;
 		}
 		if (!ret) {
 			fprintf(stderr,
 			        "ERROR: This system does not support \"%s\".\n"
 			        "Please check that RTE_MACHINE is set correctly.\n",
 			        rte_cpu_get_flag_name(compile_time_flags[i]));
-			exit(1);
+			return 0;
 		}
 	}
+
+	return 1;
 }
diff --git a/lib/librte_eal/common/include/generic/rte_cpuflags.h b/lib/librte_eal/common/include/generic/rte_cpuflags.h
index 71321f3..f01624d 100644
--- a/lib/librte_eal/common/include/generic/rte_cpuflags.h
+++ b/lib/librte_eal/common/include/generic/rte_cpuflags.h
@@ -82,4 +82,13 @@ rte_cpu_get_flag_enabled(enum rte_cpu_flag_t feature);
 void
 rte_cpu_check_supported(void);
 
+/**
+ * This function checks that the currently used CPU supports the CPU features
+ * that were specified at compile time. It is called automatically within the
+ * EAL, so does not need to be used by applications.  This version returns a
+ * result so that decisions may be made (for instance, graceful shutdowns).
+ */
+int
+rte_cpu_is_supported(void);
+
 #endif /* _RTE_CPUFLAGS_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 81692e7..67e4c6f 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -759,7 +759,11 @@ rte_eal_init(int argc, char **argv)
 	char thread_name[RTE_MAX_THREAD_NAME_LEN];
 
 	/* checks if the machine is adequate */
-	rte_cpu_check_supported();
+	if (!rte_cpu_is_supported()) {
+		rte_eal_init_alert("unsupported cpu type.");
+		rte_errno = ENOTSUP;
+		return -1;
+	}
 
 	if (!rte_atomic32_test_and_set(&run_once))
 		return -1;
-- 
2.9.3


* [PATCH v7 03/14] eal: do not panic on hugepage info init
  2017-03-22 20:19 [PATCH v7 00/14] eal: Remove most causes of panic on init Aaron Conole
  2017-03-22 20:19 ` [PATCH v7 01/14] eal: do not panic on cpu detection Aaron Conole
  2017-03-22 20:19 ` [PATCH v7 02/14] eal: do not panic when CPU isn't supported Aaron Conole
@ 2017-03-22 20:19 ` Aaron Conole
  2017-03-22 20:19 ` [PATCH v7 04/14] eal: do not panic if parsing args returns error Aaron Conole
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Aaron Conole @ 2017-03-22 20:19 UTC
  To: dev; +Cc: Bruce Richardson, Thomas Monjalon, Stephen Hemminger

When attempting to scan hugepages, signal to the EAL that an error has
occurred, rather than panicking.

If we fail to acquire hugepage information, simply signal an error to
the application.  This clears the run_once counter, allowing the user or
application to take a corrective action and retry.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
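The run_once handling described above (take the guard once, release it
again only on failures that can be retried) can be sketched with C11
atomics.  This is an illustration only: sketch_init(), sketch_run_once and
sketch_errno are hypothetical, and atomic_flag stands in for the
rte_atomic32 guard.  Note that atomic_flag_test_and_set() returns the
previous value, whereas rte_atomic32_test_and_set() returns non-zero on
success, so the test below is inverted relative to the patch.

```c
#include <errno.h>
#include <stdatomic.h>

static atomic_flag sketch_run_once = ATOMIC_FLAG_INIT;
static int sketch_errno; /* hypothetical stand-in for rte_errno */

/* step_result < 0 simulates a failing init step; recoverable says
 * whether a later retry could succeed. */
static int sketch_init(int step_result, int recoverable)
{
	/* A previous value of true means init was already called. */
	if (atomic_flag_test_and_set(&sketch_run_once)) {
		sketch_errno = EALREADY;
		return -1;
	}

	if (step_result < 0) {
		sketch_errno = EACCES;
		/* Release the guard only when a retry makes sense. */
		if (recoverable)
			atomic_flag_clear(&sketch_run_once);
		return -1;
	}

	return 0;
}
```

With this shape, a failed but recoverable attempt can be followed by a
second call, while a non-recoverable failure leaves the guard set and any
further call reports EALREADY.
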
 lib/librte_eal/bsdapp/eal/eal.c                 | 8 ++++++--
 lib/librte_eal/linuxapp/eal/eal.c               | 8 ++++++--
 lib/librte_eal/linuxapp/eal/eal_hugepage_info.c | 9 ++++++---
 3 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 8ad6157..60de9f9 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -533,8 +533,12 @@ rte_eal_init(int argc, char **argv)
 
 	if (internal_config.no_hugetlbfs == 0 &&
 			internal_config.process_type != RTE_PROC_SECONDARY &&
-			eal_hugepage_info_init() < 0)
-		rte_panic("Cannot get hugepage information\n");
+			eal_hugepage_info_init() < 0) {
+		rte_eal_init_alert("Cannot get hugepage information.");
+		rte_errno = EACCES;
+		rte_atomic32_clear(&run_once);
+		return -1;
+	}
 
 	if (internal_config.memory == 0 && internal_config.force_sockets == 0) {
 		if (internal_config.no_hugetlbfs)
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 67e4c6f..161f726 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -791,8 +791,12 @@ rte_eal_init(int argc, char **argv)
 	if (internal_config.no_hugetlbfs == 0 &&
 			internal_config.process_type != RTE_PROC_SECONDARY &&
 			internal_config.xen_dom0_support == 0 &&
-			eal_hugepage_info_init() < 0)
-		rte_panic("Cannot get hugepage information\n");
+			eal_hugepage_info_init() < 0) {
+		rte_eal_init_alert("Cannot get hugepage information.");
+		rte_errno = EACCES;
+		rte_atomic32_clear(&run_once);
+		return -1;
+	}
 
 	if (internal_config.memory == 0 && internal_config.force_sockets == 0) {
 		if (internal_config.no_hugetlbfs)
diff --git a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
index 18858e2..7a21e8f 100644
--- a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
+++ b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
@@ -283,9 +283,12 @@ eal_hugepage_info_init(void)
 	struct dirent *dirent;
 
 	dir = opendir(sys_dir_path);
-	if (dir == NULL)
-		rte_panic("Cannot open directory %s to read system hugepage "
-			  "info\n", sys_dir_path);
+	if (dir == NULL) {
+		RTE_LOG(ERR, EAL,
+			"Cannot open directory %s to read system hugepage info\n",
+			sys_dir_path);
+		return -1;
+	}
 
 	for (dirent = readdir(dir); dirent != NULL; dirent = readdir(dir)) {
 		struct hugepage_info *hpi;
-- 
2.9.3


* [PATCH v7 04/14] eal: do not panic if parsing args returns error
  2017-03-22 20:19 [PATCH v7 00/14] eal: Remove most causes of panic on init Aaron Conole
                   ` (2 preceding siblings ...)
  2017-03-22 20:19 ` [PATCH v7 03/14] eal: do not panic on hugepage info init Aaron Conole
@ 2017-03-22 20:19 ` Aaron Conole
  2017-03-22 20:19 ` [PATCH v7 05/14] eal: do not panic on memzone initialization fails Aaron Conole
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Aaron Conole @ 2017-03-22 20:19 UTC
  To: dev; +Cc: Bruce Richardson, Thomas Monjalon, Stephen Hemminger

It's possible that the application could take a corrective action here,
and either prompt the user for different arguments, or at least log the
failure more usefully.  Exiting this early prevents the application
layer from gathering any useful information.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 lib/librte_eal/bsdapp/eal/eal.c   | 8 ++++++--
 lib/librte_eal/linuxapp/eal/eal.c | 8 ++++++--
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 60de9f9..ab34c0d 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -528,8 +528,12 @@ rte_eal_init(int argc, char **argv)
 	}
 
 	fctret = eal_parse_args(argc, argv);
-	if (fctret < 0)
-		exit(1);
+	if (fctret < 0) {
+		rte_eal_init_alert("Invalid 'command line' arguments.");
+		rte_errno = EINVAL;
+		rte_atomic32_clear(&run_once);
+		return -1;
+	}
 
 	if (internal_config.no_hugetlbfs == 0 &&
 			internal_config.process_type != RTE_PROC_SECONDARY &&
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 161f726..a671ed4 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -785,8 +785,12 @@ rte_eal_init(int argc, char **argv)
 	}
 
 	fctret = eal_parse_args(argc, argv);
-	if (fctret < 0)
-		exit(1);
+	if (fctret < 0) {
+		rte_eal_init_alert("Invalid 'command line' arguments.");
+		rte_errno = EINVAL;
+		rte_atomic32_clear(&run_once);
+		return -1;
+	}
 
 	if (internal_config.no_hugetlbfs == 0 &&
 			internal_config.process_type != RTE_PROC_SECONDARY &&
-- 
2.9.3


* [PATCH v7 05/14] eal: do not panic on memzone initialization fails
  2017-03-22 20:19 [PATCH v7 00/14] eal: Remove most causes of panic on init Aaron Conole
                   ` (3 preceding siblings ...)
  2017-03-22 20:19 ` [PATCH v7 04/14] eal: do not panic if parsing args returns error Aaron Conole
@ 2017-03-22 20:19 ` Aaron Conole
  2017-03-22 20:19 ` [PATCH v7 06/14] eal: set errno when exiting for already called Aaron Conole
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Aaron Conole @ 2017-03-22 20:19 UTC
  To: dev; +Cc: Bruce Richardson, Thomas Monjalon, Stephen Hemminger

When memzone initialization fails, report the error to the calling
application rather than panic().  Without a good way of detaching /
releasing hugepages, at this point the application will have to restart.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 lib/librte_eal/bsdapp/eal/eal.c   | 7 +++++--
 lib/librte_eal/linuxapp/eal/eal.c | 7 +++++--
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index ab34c0d..a71566c 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -569,8 +569,11 @@ rte_eal_init(int argc, char **argv)
 	if (rte_eal_memory_init() < 0)
 		rte_panic("Cannot init memory\n");
 
-	if (rte_eal_memzone_init() < 0)
-		rte_panic("Cannot init memzone\n");
+	if (rte_eal_memzone_init() < 0) {
+		rte_eal_init_alert("Cannot init memzone\n");
+		rte_errno = ENODEV;
+		return -1;
+	}
 
 	if (rte_eal_tailqs_init() < 0)
 		rte_panic("Cannot init tail queues for objects\n");
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index a671ed4..5a92b28 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -839,8 +839,11 @@ rte_eal_init(int argc, char **argv)
 	/* the directories are locked during eal_hugepage_info_init */
 	eal_hugedirs_unlock();
 
-	if (rte_eal_memzone_init() < 0)
-		rte_panic("Cannot init memzone\n");
+	if (rte_eal_memzone_init() < 0) {
+		rte_eal_init_alert("Cannot init memzone\n");
+		rte_errno = ENODEV;
+		return -1;
+	}
 
 	if (rte_eal_tailqs_init() < 0)
 		rte_panic("Cannot init tail queues for objects\n");
-- 
2.9.3


* [PATCH v7 06/14] eal: set errno when exiting for already called
  2017-03-22 20:19 [PATCH v7 00/14] eal: Remove most causes of panic on init Aaron Conole
                   ` (4 preceding siblings ...)
  2017-03-22 20:19 ` [PATCH v7 05/14] eal: do not panic on memzone initialization fails Aaron Conole
@ 2017-03-22 20:19 ` Aaron Conole
  2017-03-22 20:19 ` [PATCH v7 07/14] eal: do not panic on a number of conditions Aaron Conole
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Aaron Conole @ 2017-03-22 20:19 UTC
  To: dev; +Cc: Bruce Richardson, Thomas Monjalon, Stephen Hemminger

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 lib/librte_eal/bsdapp/eal/eal.c   | 7 +++++--
 lib/librte_eal/linuxapp/eal/eal.c | 5 ++++-
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index a71566c..ea76d40 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -511,8 +511,11 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
-	if (!rte_atomic32_test_and_set(&run_once))
-		return -1;
+	if (!rte_atomic32_test_and_set(&run_once)) {
+		rte_eal_init_alert("already called initialization.");
+		rte_errno = EALREADY;
+		return -1;
+	}
 
 	thread_id = pthread_self();
 
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 5a92b28..564cac3 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -765,8 +765,11 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
-	if (!rte_atomic32_test_and_set(&run_once))
+	if (!rte_atomic32_test_and_set(&run_once)) {
+		rte_eal_init_alert("already called initialization.");
+		rte_errno = EALREADY;
 		return -1;
+	}
 
 	logid = strrchr(argv[0], '/');
 	logid = strdup(logid ? logid + 1: argv[0]);
-- 
2.9.3


* [PATCH v7 07/14] eal: do not panic on a number of conditions
  2017-03-22 20:19 [PATCH v7 00/14] eal: Remove most causes of panic on init Aaron Conole
                   ` (5 preceding siblings ...)
  2017-03-22 20:19 ` [PATCH v7 06/14] eal: set errno when exiting for already called Aaron Conole
@ 2017-03-22 20:19 ` Aaron Conole
  2017-03-22 20:19 ` [PATCH v7 08/14] eal: do not panic on timer init failure Aaron Conole
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Aaron Conole @ 2017-03-22 20:19 UTC
  To: dev; +Cc: Bruce Richardson, Thomas Monjalon, Stephen Hemminger

When log initialization fails, it's generally because the
fopencookie() call failed.  While this is rare in practice, it could
happen, and it is likely because of memory pressure.  So, flag the
error, and allow the user to retry.

Memory init can only fail when access to hugepages (either as primary or
secondary process) fails (and that is usually permissions).  Since the
manner of failure is not reversible, we cannot allow retry.

There are some theoretical race conditions in the system that _could_
cause early tailq init to fail;  however, there is no need to panic the
application.  While it can't continue using DPDK, it could give the
user better alerts.

The rte_eal_alarm_init() call uses the Linux timerfd framework to
create a poll()-able timer using standard POSIX file operations.  This
could fail for a few reasons given in the man pages, but many could be
corrected by the user application.  No need to panic.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 lib/librte_eal/bsdapp/eal/eal.c           | 21 +++++++++++++-----
 lib/librte_eal/common/eal_common_tailqs.c |  3 +--
 lib/librte_eal/linuxapp/eal/eal.c         | 37 ++++++++++++++++++++++---------
 3 files changed, 43 insertions(+), 18 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index ea76d40..b893dc1 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -569,8 +569,11 @@ rte_eal_init(int argc, char **argv)
 
 	rte_config_init();
 
-	if (rte_eal_memory_init() < 0)
-		rte_panic("Cannot init memory\n");
+	if (rte_eal_memory_init() < 0) {
+		rte_eal_init_alert("Cannot init memory\n");
+		rte_errno = ENOMEM;
+		return -1;
+	}
 
 	if (rte_eal_memzone_init() < 0) {
 		rte_eal_init_alert("Cannot init memzone\n");
@@ -578,11 +581,17 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
-	if (rte_eal_tailqs_init() < 0)
-		rte_panic("Cannot init tail queues for objects\n");
+	if (rte_eal_tailqs_init() < 0) {
+		rte_eal_init_alert("Cannot init tail queues for objects\n");
+		rte_errno = EFAULT;
+		return -1;
+	}
 
-	if (rte_eal_alarm_init() < 0)
-		rte_panic("Cannot init interrupt-handling thread\n");
+	if (rte_eal_alarm_init() < 0) {
+		rte_eal_init_alert("Cannot init interrupt-handling thread\n");
+		/* rte_eal_alarm_init sets rte_errno on failure. */
+		return -1;
+	}
 
 	if (rte_eal_intr_init() < 0)
 		rte_panic("Cannot init interrupt-handling thread\n");
diff --git a/lib/librte_eal/common/eal_common_tailqs.c b/lib/librte_eal/common/eal_common_tailqs.c
index bb08ec8..4f69828 100644
--- a/lib/librte_eal/common/eal_common_tailqs.c
+++ b/lib/librte_eal/common/eal_common_tailqs.c
@@ -188,8 +188,7 @@ rte_eal_tailqs_init(void)
 		if (t->head == NULL) {
 			RTE_LOG(ERR, EAL,
 				"Cannot initialize tailq: %s\n", t->name);
-			/* no need to TAILQ_REMOVE, we are going to panic in
-			 * rte_eal_init() */
+			/* TAILQ_REMOVE not needed, error is already fatal */
 			goto fail;
 		}
 	}
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 564cac3..bfb8260 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -825,19 +825,30 @@ rte_eal_init(int argc, char **argv)
 
 	rte_config_init();
 
-	if (rte_eal_log_init(logid, internal_config.syslog_facility) < 0)
-		rte_panic("Cannot init logs\n");
+	if (rte_eal_log_init(logid, internal_config.syslog_facility) < 0) {
+		rte_eal_init_alert("Cannot init logging.");
+		rte_errno = ENOMEM;
+		rte_atomic32_clear(&run_once);
+		return -1;
+	}
 
 	if (rte_eal_pci_init() < 0)
 		rte_panic("Cannot init PCI\n");
 
 #ifdef VFIO_PRESENT
-	if (rte_eal_vfio_setup() < 0)
-		rte_panic("Cannot init VFIO\n");
+	if (rte_eal_vfio_setup() < 0) {
+		rte_eal_init_alert("Cannot init VFIO\n");
+		rte_errno = EAGAIN;
+		rte_atomic32_clear(&run_once);
+		return -1;
+	}
 #endif
 
-	if (rte_eal_memory_init() < 0)
-		rte_panic("Cannot init memory\n");
+	if (rte_eal_memory_init() < 0) {
+		rte_eal_init_alert("Cannot init memory\n");
+		rte_errno = ENOMEM;
+		return -1;
+	}
 
 	/* the directories are locked during eal_hugepage_info_init */
 	eal_hugedirs_unlock();
@@ -848,11 +859,17 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
-	if (rte_eal_tailqs_init() < 0)
-		rte_panic("Cannot init tail queues for objects\n");
+	if (rte_eal_tailqs_init() < 0) {
+		rte_eal_init_alert("Cannot init tail queues for objects\n");
+		rte_errno = EFAULT;
+		return -1;
+	}
 
-	if (rte_eal_alarm_init() < 0)
-		rte_panic("Cannot init interrupt-handling thread\n");
+	if (rte_eal_alarm_init() < 0) {
+		rte_eal_init_alert("Cannot init interrupt-handling thread\n");
+		/* rte_eal_alarm_init sets rte_errno on failure. */
+		return -1;
+	}
 
 	if (rte_eal_timer_init() < 0)
 		rte_panic("Cannot init HPET or TSC timers\n");
-- 
2.9.3


* [PATCH v7 08/14] eal: do not panic on timer init failure
  2017-03-22 20:19 [PATCH v7 00/14] eal: Remove most causes of panic on init Aaron Conole
                   ` (6 preceding siblings ...)
  2017-03-22 20:19 ` [PATCH v7 07/14] eal: do not panic on a number of conditions Aaron Conole
@ 2017-03-22 20:19 ` Aaron Conole
  2017-03-22 20:19 ` [PATCH v7 09/14] eal: do not panic on interrupt thread init Aaron Conole
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Aaron Conole @ 2017-03-22 20:19 UTC
  To: dev; +Cc: Bruce Richardson, Thomas Monjalon, Stephen Hemminger

After code inspection, there is no way for rte_eal_timer_init() to
fail.  It simply returns 0 in all cases.  As such, this test could
either go away or stay here as 'future-proofing'.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 lib/librte_eal/bsdapp/eal/eal.c   | 7 +++++--
 lib/librte_eal/linuxapp/eal/eal.c | 7 +++++--
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index b893dc1..00d607b 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -596,8 +596,11 @@ rte_eal_init(int argc, char **argv)
 	if (rte_eal_intr_init() < 0)
 		rte_panic("Cannot init interrupt-handling thread\n");
 
-	if (rte_eal_timer_init() < 0)
-		rte_panic("Cannot init HPET or TSC timers\n");
+	if (rte_eal_timer_init() < 0) {
+		rte_eal_init_alert("Cannot init HPET or TSC timers\n");
+		rte_errno = ENOTSUP;
+		return -1;
+	}
 
 	if (rte_eal_pci_init() < 0)
 		rte_panic("Cannot init PCI\n");
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index bfb8260..9fb5421 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -871,8 +871,11 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
-	if (rte_eal_timer_init() < 0)
-		rte_panic("Cannot init HPET or TSC timers\n");
+	if (rte_eal_timer_init() < 0) {
+		rte_eal_init_alert("Cannot init HPET or TSC timers\n");
+		rte_errno = ENOTSUP;
+		return -1;
+	}
 
 	eal_check_mem_on_local_socket();
 
-- 
2.9.3


* [PATCH v7 09/14] eal: do not panic on interrupt thread init
  2017-03-22 20:19 [PATCH v7 00/14] eal: Remove most causes of panic on init Aaron Conole
                   ` (7 preceding siblings ...)
  2017-03-22 20:19 ` [PATCH v7 08/14] eal: do not panic on timer init failure Aaron Conole
@ 2017-03-22 20:19 ` Aaron Conole
  2017-03-22 20:19 ` [PATCH v7 10/14] eal: do not error if plugins fail to init Aaron Conole
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Aaron Conole @ 2017-03-22 20:19 UTC
  To: dev; +Cc: Bruce Richardson, Thomas Monjalon, Stephen Hemminger

There could be some confusion as to why the call failed - this change
will always reflect the value of the error in rte_errno.

When initializing the interrupt thread, there are a number of possible
reasons for failure - some of which are correctable by the application.
Do not panic() needlessly, and give the application a chance to reflect
this information to the user.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
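The two calls involved report errors differently, which is why the diff
below assigns rte_errno in two different ways: pipe() returns -1 and sets
errno, while pthread_create() returns the error number directly.  A
minimal standalone sketch of that distinction follows.  All sketch_* names
are hypothetical stand-ins, and closing the pipe on thread-creation
failure is an addition for this sketch, not something the patch does.

```c
#include <errno.h>
#include <pthread.h>
#include <unistd.h>

static int sketch_errno;        /* hypothetical stand-in for rte_errno */
static int sketch_pipefd[2];
static pthread_t sketch_thread;

/* Trivial thread body; a real handler would wait on the pipe. */
static void *sketch_thread_main(void *arg)
{
	(void)arg;
	return NULL;
}

static int sketch_intr_init(void)
{
	int ret;

	/* pipe() returns -1 and reports the cause through errno. */
	if (pipe(sketch_pipefd) < 0) {
		sketch_errno = errno;
		return -1;
	}

	/* pthread_create() returns the error number itself. */
	ret = pthread_create(&sketch_thread, NULL, sketch_thread_main, NULL);
	if (ret != 0) {
		sketch_errno = ret;
		close(sketch_pipefd[0]);
		close(sketch_pipefd[1]);
		return -1;
	}

	return 0;
}
```
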
 lib/librte_eal/bsdapp/eal/eal.c              | 6 ++++--
 lib/librte_eal/linuxapp/eal/eal.c            | 6 ++++--
 lib/librte_eal/linuxapp/eal/eal_interrupts.c | 5 ++++-
 3 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 00d607b..e9a5d93 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -593,8 +593,10 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
-	if (rte_eal_intr_init() < 0)
-		rte_panic("Cannot init interrupt-handling thread\n");
+	if (rte_eal_intr_init() < 0) {
+		rte_eal_init_alert("Cannot init interrupt-handling thread\n");
+		return -1;
+	}
 
 	if (rte_eal_timer_init() < 0) {
 		rte_eal_init_alert("Cannot init HPET or TSC timers\n");
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 9fb5421..ecf567a 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -890,8 +890,10 @@ rte_eal_init(int argc, char **argv)
 		rte_config.master_lcore, (int)thread_id, cpuset,
 		ret == 0 ? "" : "...");
 
-	if (rte_eal_intr_init() < 0)
-		rte_panic("Cannot init interrupt-handling thread\n");
+	if (rte_eal_intr_init() < 0) {
+		rte_eal_init_alert("Cannot init interrupt-handling thread\n");
+		return -1;
+	}
 
 	if (rte_bus_scan())
 		rte_panic("Cannot scan the buses for devices\n");
diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index 92a19cb..5bb833e 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -898,13 +898,16 @@ rte_eal_intr_init(void)
 	 * create a pipe which will be waited by epoll and notified to
 	 * rebuild the wait list of epoll.
 	 */
-	if (pipe(intr_pipe.pipefd) < 0)
+	if (pipe(intr_pipe.pipefd) < 0) {
+		rte_errno = errno;
 		return -1;
+	}
 
 	/* create the host thread to wait/handle the interrupt */
 	ret = pthread_create(&intr_thread, NULL,
 			eal_intr_thread_main, NULL);
 	if (ret != 0) {
+		rte_errno = ret;
 		RTE_LOG(ERR, EAL,
 			"Failed to create thread for interrupt handling\n");
 	} else {
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v7 10/14] eal: do not error if plugins fail to init
  2017-03-22 20:19 [PATCH v7 00/14] eal: Remove most causes of panic on init Aaron Conole
                   ` (8 preceding siblings ...)
  2017-03-22 20:19 ` [PATCH v7 09/14] eal: do not panic on interrupt thread init Aaron Conole
@ 2017-03-22 20:19 ` Aaron Conole
  2017-03-22 20:19 ` [PATCH v7 11/14] eal: do not panic on PCI failures Aaron Conole
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Aaron Conole @ 2017-03-22 20:19 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Thomas Monjalon, Stephen Hemminger

Plugins are useful and important.  However, it seems crazy to abort
everything just because they don't initialize properly.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 lib/librte_eal/bsdapp/eal/eal.c   | 2 +-
 lib/librte_eal/linuxapp/eal/eal.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index e9a5d93..7c6dd4e 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -610,7 +610,7 @@ rte_eal_init(int argc, char **argv)
 	eal_check_mem_on_local_socket();
 
 	if (eal_plugins_init() < 0)
-		rte_panic("Cannot init plugins\n");
+		rte_eal_init_alert("Cannot init plugins\n");
 
 	eal_thread_init_master(rte_config.master_lcore);
 
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index ecf567a..b2a9005 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -880,7 +880,7 @@ rte_eal_init(int argc, char **argv)
 	eal_check_mem_on_local_socket();
 
 	if (eal_plugins_init() < 0)
-		rte_panic("Cannot init plugins\n");
+		rte_eal_init_alert("Cannot init plugins\n");
 
 	eal_thread_init_master(rte_config.master_lcore);
 
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v7 11/14] eal: do not panic on PCI failures
  2017-03-22 20:19 [PATCH v7 00/14] eal: Remove most causes of panic on init Aaron Conole
                   ` (9 preceding siblings ...)
  2017-03-22 20:19 ` [PATCH v7 10/14] eal: do not error if plugins fail to init Aaron Conole
@ 2017-03-22 20:19 ` Aaron Conole
  2017-03-22 20:19 ` [PATCH v7 12/14] eal: do not panic if vdev init fails Aaron Conole
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Aaron Conole @ 2017-03-22 20:19 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Thomas Monjalon, Stephen Hemminger

Some devices may be inaccessible for a variety of reasons, or the
PCI-bus may be unavailable causing the whole thing to fail.  Still,
better to continue attempts at probes.

Since PCI isn't necessarily required, it may be possible to simply log
the error and continue, letting the user check the logs and restart
the application when things have failed.

This will usually be an issue because of permissions.  However, it could
also be caused by OOM.  In either case, errno will contain the
underlying cause.

For linux, it is safe to re-init the system here, so allow the
application to take corrective action and reinit.

For BSD, this is not the case, for several reasons, including that hugepage
allocation has already happened and needs to be properly uninitialized.
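
The probe rule in this patch (fail only when at least one device was
probed and every one of them failed) can be sketched in isolation as
below.  The helper sketch_probe_all and its results array are
hypothetical; in the patch the per-device result comes from
pci_probe_all_drivers().

```c
#include <stddef.h>

/*
 * Hypothetical helper: results[i] < 0 means device i could not be used.
 * Returns -1 only if something was probed and everything failed.
 */
static int sketch_probe_all(const int *results, size_t n)
{
	size_t probed = 0, failed = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		probed++;
		if (results[i] < 0)
			failed++;	/* log and keep going in the real code */
	}

	return (probed && probed == failed) ? -1 : 0;
}
```

With this rule an empty device list and a partially failing list both
count as success; only a list where every device fails is reported.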

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 lib/librte_eal/bsdapp/eal/eal.c        | 15 +++++++++++----
 lib/librte_eal/common/eal_common_pci.c | 12 +++++++++---
 lib/librte_eal/linuxapp/eal/eal.c      | 15 +++++++++++----
 3 files changed, 31 insertions(+), 11 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 7c6dd4e..75ddf31 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -604,8 +604,12 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
-	if (rte_eal_pci_init() < 0)
-		rte_panic("Cannot init PCI\n");
+	if (rte_eal_pci_init() < 0) {
+		rte_eal_init_alert("Cannot init PCI\n");
+		rte_errno = EPROTO;
+		rte_atomic32_clear(&run_once);
+		return -1;
+	}
 
 	eal_check_mem_on_local_socket();
 
@@ -660,8 +664,11 @@ rte_eal_init(int argc, char **argv)
 		rte_panic("Cannot probe devices\n");
 
 	/* Probe & Initialize PCI devices */
-	if (rte_eal_pci_probe())
-		rte_panic("Cannot probe PCI\n");
+	if (rte_eal_pci_probe()) {
+		rte_eal_init_alert("Cannot probe PCI\n");
+		rte_errno = ENOTSUP;
+		return -1;
+	}
 
 	if (rte_eal_dev_init() < 0)
 		rte_panic("Cannot init pmd devices\n");
diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index 72547bd..d45b7d3 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -69,6 +69,7 @@
 #include <sys/queue.h>
 #include <sys/mman.h>
 
+#include <rte_errno.h>
 #include <rte_interrupts.h>
 #include <rte_log.h>
 #include <rte_pci.h>
@@ -414,6 +415,7 @@ int
 rte_eal_pci_probe(void)
 {
 	struct rte_pci_device *dev = NULL;
+	size_t probed = 0, failed = 0;
 	struct rte_devargs *devargs;
 	int probe_all = 0;
 	int ret = 0;
@@ -422,6 +424,7 @@ rte_eal_pci_probe(void)
 		probe_all = 1;
 
 	TAILQ_FOREACH(dev, &pci_device_list, next) {
+		probed++;
 
 		/* set devargs in PCI structure */
 		devargs = pci_devargs_lookup(dev);
@@ -434,13 +437,16 @@ rte_eal_pci_probe(void)
 		else if (devargs != NULL &&
 			devargs->type == RTE_DEVTYPE_WHITELISTED_PCI)
 			ret = pci_probe_all_drivers(dev);
-		if (ret < 0)
-			rte_exit(EXIT_FAILURE, "Requested device " PCI_PRI_FMT
+		if (ret < 0) {
+			RTE_LOG(ERR, EAL, "Requested device " PCI_PRI_FMT
 				 " cannot be used\n", dev->addr.domain, dev->addr.bus,
 				 dev->addr.devid, dev->addr.function);
+			rte_errno = errno;
+			failed++;
+		}
 	}
 
-	return 0;
+	return (probed && probed == failed) ? -1 : 0;
 }
 
 /* dump one device */
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index b2a9005..354d0d8 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -832,8 +832,12 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
-	if (rte_eal_pci_init() < 0)
-		rte_panic("Cannot init PCI\n");
+	if (rte_eal_pci_init() < 0) {
+		rte_eal_init_alert("Cannot init PCI\n");
+		rte_errno = EPROTO;
+		rte_atomic32_clear(&run_once);
+		return -1;
+	}
 
 #ifdef VFIO_PRESENT
 	if (rte_eal_vfio_setup() < 0) {
@@ -939,8 +943,11 @@ rte_eal_init(int argc, char **argv)
 		rte_panic("Cannot probe devices\n");
 
 	/* Probe & Initialize PCI devices */
-	if (rte_eal_pci_probe())
-		rte_panic("Cannot probe PCI\n");
+	if (rte_eal_pci_probe()) {
+		rte_eal_init_alert("Cannot probe PCI\n");
+		rte_errno = ENOTSUP;
+		return -1;
+	}
 
 	if (rte_eal_dev_init() < 0)
 		rte_panic("Cannot init pmd devices\n");
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v7 12/14] eal: do not panic if vdev init fails
  2017-03-22 20:19 [PATCH v7 00/14] eal: Remove most causes of panic on init Aaron Conole
                   ` (10 preceding siblings ...)
  2017-03-22 20:19 ` [PATCH v7 11/14] eal: do not panic on PCI failures Aaron Conole
@ 2017-03-22 20:19 ` Aaron Conole
  2017-03-22 20:19 ` [PATCH v7 13/14] eal: do not panic when bus probe/scan fails Aaron Conole
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Aaron Conole @ 2017-03-22 20:19 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Thomas Monjalon, Stephen Hemminger

Even if one vdev should fail, there's no need to prevent further
processing.  Log the error, and reflect it to the higher levels to
decide.

Seems like it's possible to continue.  At least, the error is reflected
properly in the logs.  A user could then go and correct or investigate
the situation.
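
A minimal sketch of the "keep going, but remember the failure" pattern
used here, in contrast to the all-failed rule applied to PCI probing in
patch 11.  The function sketch_init_all, its status array and the
attempted counter are hypothetical; the real loop walks the devargs
list and calls the vdev init routine.

```c
#include <stddef.h>

/*
 * Hypothetical helper: status[i] != 0 means device i failed to init.
 * Every device is attempted; any single failure makes the result -1.
 */
static int sketch_init_all(const int *status, size_t n, size_t *attempted)
{
	int ret = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		(*attempted)++;
		if (status[i] != 0)
			ret = -1;	/* remember the failure, do not return */
	}

	return ret;
}
```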

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 lib/librte_eal/bsdapp/eal/eal.c        | 2 +-
 lib/librte_eal/common/eal_common_dev.c | 5 +++--
 lib/librte_eal/linuxapp/eal/eal.c      | 2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 75ddf31..ce10f81 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -671,7 +671,7 @@ rte_eal_init(int argc, char **argv)
 	}
 
 	if (rte_eal_dev_init() < 0)
-		rte_panic("Cannot init pmd devices\n");
+		rte_eal_init_alert("Cannot init pmd devices\n");
 
 	rte_eal_mcfg_complete();
 
diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index 4f3b493..9889997 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -80,6 +80,7 @@ int
 rte_eal_dev_init(void)
 {
 	struct rte_devargs *devargs;
+	int ret = 0;
 
 	/*
 	 * Note that the dev_driver_list is populated here
@@ -97,11 +98,11 @@ rte_eal_dev_init(void)
 					devargs->args)) {
 			RTE_LOG(ERR, EAL, "failed to initialize %s device\n",
 					devargs->virt.drv_name);
-			return -1;
+			ret = -1;
 		}
 	}
 
-	return 0;
+	return ret;
 }
 
 int rte_eal_dev_attach(const char *name, const char *devargs)
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 354d0d8..8abc1c6 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -950,7 +950,7 @@ rte_eal_init(int argc, char **argv)
 	}
 
 	if (rte_eal_dev_init() < 0)
-		rte_panic("Cannot init pmd devices\n");
+		rte_eal_init_alert("Cannot init pmd devices\n");
 
 	rte_eal_mcfg_complete();
 
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v7 13/14] eal: do not panic when bus probe/scan fails
  2017-03-22 20:19 [PATCH v7 00/14] eal: Remove most causes of panic on init Aaron Conole
                   ` (11 preceding siblings ...)
  2017-03-22 20:19 ` [PATCH v7 12/14] eal: do not panic if vdev init fails Aaron Conole
@ 2017-03-22 20:19 ` Aaron Conole
  2017-03-22 20:19 ` [PATCH v7 14/14] rte_eal_init: add info about various error codes Aaron Conole
  2017-03-23 14:04 ` [PATCH v7 00/14] eal: Remove most causes of panic on init Bruce Richardson
  14 siblings, 0 replies; 20+ messages in thread
From: Aaron Conole @ 2017-03-22 20:19 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Thomas Monjalon, Stephen Hemminger

For now, exit the init.  It's likely that even aborting the initialization
is premature in this case, as it may be possible to proceed even if one
bus or another is not available.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 lib/librte_eal/bsdapp/eal/eal.c   | 14 ++++++++++----
 lib/librte_eal/linuxapp/eal/eal.c | 14 ++++++++++----
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index ce10f81..93e44fa 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -624,8 +624,11 @@ rte_eal_init(int argc, char **argv)
 		rte_config.master_lcore, thread_id, cpuset,
 		ret == 0 ? "" : "...");
 
-	if (rte_bus_scan())
-		rte_panic("Cannot scan the buses for devices\n");
+	if (rte_bus_scan()) {
+		rte_eal_init_alert("Cannot scan the buses for devices\n");
+		rte_errno = ENODEV;
+		return -1;
+	}
 
 	RTE_LCORE_FOREACH_SLAVE(i) {
 
@@ -660,8 +663,11 @@ rte_eal_init(int argc, char **argv)
 	rte_eal_mp_wait_lcore();
 
 	/* Probe all the buses and devices/drivers on them */
-	if (rte_bus_probe())
-		rte_panic("Cannot probe devices\n");
+	if (rte_bus_probe()) {
+		rte_eal_init_alert("Cannot probe devices\n");
+		rte_errno = ENOTSUP;
+		return -1;
+	}
 
 	/* Probe & Initialize PCI devices */
 	if (rte_eal_pci_probe()) {
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 8abc1c6..f0ded18 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -899,8 +899,11 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
-	if (rte_bus_scan())
-		rte_panic("Cannot scan the buses for devices\n");
+	if (rte_bus_scan()) {
+		rte_eal_init_alert("Cannot scan the buses for devices\n");
+		rte_errno = ENODEV;
+		return -1;
+	}
 
 	RTE_LCORE_FOREACH_SLAVE(i) {
 
@@ -939,8 +942,11 @@ rte_eal_init(int argc, char **argv)
 	rte_eal_mp_wait_lcore();
 
 	/* Probe all the buses and devices/drivers on them */
-	if (rte_bus_probe())
-		rte_panic("Cannot probe devices\n");
+	if (rte_bus_probe()) {
+		rte_eal_init_alert("Cannot probe devices\n");
+		rte_errno = ENOTSUP;
+		return -1;
+	}
 
 	/* Probe & Initialize PCI devices */
 	if (rte_eal_pci_probe()) {
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v7 14/14] rte_eal_init: add info about various error codes
  2017-03-22 20:19 [PATCH v7 00/14] eal: Remove most causes of panic on init Aaron Conole
                   ` (12 preceding siblings ...)
  2017-03-22 20:19 ` [PATCH v7 13/14] eal: do not panic when bus probe/scan fails Aaron Conole
@ 2017-03-22 20:19 ` Aaron Conole
  2017-03-23 14:04 ` [PATCH v7 00/14] eal: Remove most causes of panic on init Bruce Richardson
  14 siblings, 0 replies; 20+ messages in thread
From: Aaron Conole @ 2017-03-22 20:19 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Thomas Monjalon, Stephen Hemminger

The rte_eal_init function will now pass failure reason hints to the
application.  To help app developers decipher this, add some brief
information about what the codes are indicating.
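
As one possible way for an application to act on these hints, the sketch
below maps an error value to a coarse action.  The enum, the function
name and the policy itself are assumptions made for illustration; an
application would pass in rte_errno after rte_eal_init() returns -1.

```c
#include <errno.h>

/* Hypothetical actions an application might take after a failed init. */
enum sketch_init_action {
	SKETCH_INIT_RETRY,		/* resource busy, try again later */
	SKETCH_INIT_ALREADY_DONE,	/* init was already called */
	SKETCH_INIT_GIVE_UP		/* report to the user and stop */
};

/* Assumed policy: only EAGAIN is retried, EALREADY is not an error. */
static enum sketch_init_action sketch_classify_init_error(int err)
{
	switch (err) {
	case EAGAIN:
		return SKETCH_INIT_RETRY;
	case EALREADY:
		return SKETCH_INIT_ALREADY_DONE;
	default:
		/* EACCES, EFAULT, EINVAL, ENOMEM, ENODEV, ENOTSUP, EPROTO, ... */
		return SKETCH_INIT_GIVE_UP;
	}
}
```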

Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/librte_eal/common/include/rte_eal.h | 27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
index 03fee50..abf020b 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -159,7 +159,32 @@ int rte_eal_iopl_init(void);
  *     function call and should not be further interpreted by the
  *     application.  The EAL does not take any ownership of the memory used
  *     for either the argv array, or its members.
- *   - On failure, a negative error value.
+ *   - On failure, -1 and rte_errno is set to a value indicating the cause
+ *     for failure.  In some instances, the application will need to be
+ *     restarted as part of clearing the issue.
+ *
+ *   Error codes returned via rte_errno:
+ *     EACCES indicates a permissions issue.
+ *
+ *     EAGAIN indicates either a bus or system resource was not available,
+ *            setup may be attempted again.
+ *
+ *     EALREADY indicates that the rte_eal_init function has already been
+ *              called, and cannot be called again.
+ *
+ *     EFAULT indicates the tailq configuration name was not found in
+ *            memory configuration.
+ *
+ *     EINVAL indicates invalid parameters were passed as argv/argc.
+ *
+ *     ENOMEM indicates failure likely caused by an out-of-memory condition.
+ *
+ *     ENODEV indicates memory setup issues.
+ *
+ *     ENOTSUP indicates that the EAL cannot initialize on this system.
+ *
+ *     EPROTO indicates that the PCI bus is either not present, or is not
+ *            readable by the eal.
  */
 int rte_eal_init(int argc, char **argv);
 
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v7 02/14] eal: do not panic when CPU isn't supported
  2017-03-22 20:19 ` [PATCH v7 02/14] eal: do not panic when CPU isn't supported Aaron Conole
@ 2017-03-23 13:47   ` Bruce Richardson
  2017-03-23 14:27     ` Aaron Conole
  0 siblings, 1 reply; 20+ messages in thread
From: Bruce Richardson @ 2017-03-23 13:47 UTC (permalink / raw)
  To: Aaron Conole; +Cc: dev, Thomas Monjalon, Stephen Hemminger

On Wed, Mar 22, 2017 at 04:19:28PM -0400, Aaron Conole wrote:
> This adds a new API to check for the eal cpu versions.
> 
> It's now possible to gracefully exit the application, or for
> applications which support non-dpdk datapaths working in concert with
> DPDK datapaths, there no longer is the possibility of exiting for
> unsupported CPUs.
> 
> Signed-off-by: Aaron Conole <aconole@redhat.com>

I think we should mark the old function as deprecated or else delete it
entirely. It was technically public, but I suspect it was only ever used
by the EAL init, so I'd look to get rid of it ASAP rather than leave it
hanging around unneeded.

/Bruce

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v7 00/14] eal: Remove most causes of panic on init
  2017-03-22 20:19 [PATCH v7 00/14] eal: Remove most causes of panic on init Aaron Conole
                   ` (13 preceding siblings ...)
  2017-03-22 20:19 ` [PATCH v7 14/14] rte_eal_init: add info about various error codes Aaron Conole
@ 2017-03-23 14:04 ` Bruce Richardson
  2017-03-27 14:06   ` Thomas Monjalon
  14 siblings, 1 reply; 20+ messages in thread
From: Bruce Richardson @ 2017-03-23 14:04 UTC (permalink / raw)
  To: Aaron Conole; +Cc: dev, Thomas Monjalon, Stephen Hemminger

On Wed, Mar 22, 2017 at 04:19:26PM -0400, Aaron Conole wrote:
> In many cases, it's enough to simply let the application know that the
> call to initialize DPDK has failed.  A complete halt can then be
> decided by the application based on error returned (and the app could
> even attempt a possible re-attempt after some corrective action by the
> user or application).
> 
> Changes ->v2:
> - Audited all "RTE_LOG (" calls that were introduced, and converted
>   to "RTE_LOG("
> - Added some fprintf(stderr, "") lines to indicate errors before logging
>   is initialized
> - Removed assignments to errno.
> - Changed patch 14/25 to reflect EFAULT, and document in 25/25
> 
> Changes ->v3:
> - Checkpatch issues in patches 3 (spelling mistake), 9 (issue with leading
>   spaces), and 19 (braces around single line statement if-condition)
> 
> Changes ->v4:
> - Error text cleanup.
> - Add a new check around rte_bus_scan(), added during the development of
>   this series.
> 
> Changes ->v5:
> - checkpatch.pl cleanup in patch 02/26
> - move rte_errno.h include from patch 15 to patch 02
> - remove stdbool.h and use int as return type in patch 06/26
> 
> Changes ->v6:
> - convert all of the initialization calls to RTE_LOG() to rte_eal_init_alert()
> - run through check-git-log and checkpatches
> - add Bruce's ack to the series
> 
> Changes ->v7:
> - Squash a bunch of commits
> - Make the corresponding BSD side changes
> - refactor the PCI probe failure code to be more explicit in the intent
> - Remove most of Bruce's acks (with all the shuffling/changes I think the
>   series should be re-evaluated)
> 
Ran a sanity test compiling with clang on FreeBSD 11 and had a quick
scan of the patches. All looks reasonably ok to me. Did not test on
linux.

Acked-by: Bruce Richardson <bruce.richardson@intel.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v7 02/14] eal: do not panic when CPU isn't supported
  2017-03-23 13:47   ` Bruce Richardson
@ 2017-03-23 14:27     ` Aaron Conole
  0 siblings, 0 replies; 20+ messages in thread
From: Aaron Conole @ 2017-03-23 14:27 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev, Thomas Monjalon, Stephen Hemminger

Bruce Richardson <bruce.richardson@intel.com> writes:

> On Wed, Mar 22, 2017 at 04:19:28PM -0400, Aaron Conole wrote:
>> This adds a new API to check for the eal cpu versions.
>> 
>> It's now possible to gracefully exit the application, or for
>> applications which support non-dpdk datapaths working in concert with
>> DPDK datapaths, there no longer is the possibility of exiting for
>> unsupported CPUs.
>> 
>> Signed-off-by: Aaron Conole <aconole@redhat.com>
>
> I think we should mark the old function as deprecated or else delete it
> entirely. It was technically public, but I suspect it was only ever used
> by the EAL init, so I'd look to get rid of it ASAP rather than leave it
> hanging around unneeded.

Okay, I'll submit a follow on patch after this is accepted to mark it as
deprecated.  I agree, I don't know of any application using it (though I
looked at MoonGen, ovs, and warp17).

> /Bruce

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v7 00/14] eal: Remove most causes of panic on init
  2017-03-23 14:04 ` [PATCH v7 00/14] eal: Remove most causes of panic on init Bruce Richardson
@ 2017-03-27 14:06   ` Thomas Monjalon
  2017-03-31 17:54     ` Aaron Conole
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Monjalon @ 2017-03-27 14:06 UTC (permalink / raw)
  To: Aaron Conole; +Cc: Bruce Richardson, dev, Stephen Hemminger

2017-03-23 14:04, Bruce Richardson:
> On Wed, Mar 22, 2017 at 04:19:26PM -0400, Aaron Conole wrote:
> > In many cases, it's enough to simply let the application know that the
> > call to initialize DPDK has failed.  A complete halt can then be
> > decided by the application based on error returned (and the app could
> > even attempt a possible re-attempt after some corrective action by the
> > user or application).
> > 
> Ran a sanity test compiling with clang on FreeBSD 11 and had a quick
> scan of the patches. All looks reasonably ok to me. Did not test on
> linux.
> 
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>

Applied, thank you so much for this work.

Do you plan to continue hunting rte_panic in DPDK?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v7 00/14] eal: Remove most causes of panic on init
  2017-03-27 14:06   ` Thomas Monjalon
@ 2017-03-31 17:54     ` Aaron Conole
  0 siblings, 0 replies; 20+ messages in thread
From: Aaron Conole @ 2017-03-31 17:54 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Bruce Richardson, dev, Stephen Hemminger

Thomas Monjalon <thomas.monjalon@6wind.com> writes:

> 2017-03-23 14:04, Bruce Richardson:
>> On Wed, Mar 22, 2017 at 04:19:26PM -0400, Aaron Conole wrote:
>> > In many cases, it's enough to simply let the application know that the
>> > call to initialize DPDK has failed.  A complete halt can then be
>> > decided by the application based on error returned (and the app could
>> > even attempt a possible re-attempt after some corrective action by the
>> > user or application).
>> > 
>> Ran a sanity test compiling with clang on FreeBSD 11 and had a quick
>> scan of the patches. All looks reasonably ok to me. Did not test on
>> linux.
>> 
>> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
>
> Applied, thank you so much for this work.
>
> Do you plan to continue hunting rte_panic in DPDK?

I don't have any concrete plans to do so, but as I see them, I'll
generate patches.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2017-03-31 17:54 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-22 20:19 [PATCH v7 00/14] eal: Remove most causes of panic on init Aaron Conole
2017-03-22 20:19 ` [PATCH v7 01/14] eal: do not panic on cpu detection Aaron Conole
2017-03-22 20:19 ` [PATCH v7 02/14] eal: do not panic when CPU isn't supported Aaron Conole
2017-03-23 13:47   ` Bruce Richardson
2017-03-23 14:27     ` Aaron Conole
2017-03-22 20:19 ` [PATCH v7 03/14] eal: do not panic on hugepage info init Aaron Conole
2017-03-22 20:19 ` [PATCH v7 04/14] eal: do not panic if parsing args returns error Aaron Conole
2017-03-22 20:19 ` [PATCH v7 05/14] eal: do not panic on memzone initialization fails Aaron Conole
2017-03-22 20:19 ` [PATCH v7 06/14] eal: set errno when exiting for already called Aaron Conole
2017-03-22 20:19 ` [PATCH v7 07/14] eal: do not panic on a number of conditions Aaron Conole
2017-03-22 20:19 ` [PATCH v7 08/14] eal: do not panic on timer init failure Aaron Conole
2017-03-22 20:19 ` [PATCH v7 09/14] eal: do not panic on interrupt thread init Aaron Conole
2017-03-22 20:19 ` [PATCH v7 10/14] eal: do not error if plugins fail to init Aaron Conole
2017-03-22 20:19 ` [PATCH v7 11/14] eal: do not panic on PCI failures Aaron Conole
2017-03-22 20:19 ` [PATCH v7 12/14] eal: do not panic if vdev init fails Aaron Conole
2017-03-22 20:19 ` [PATCH v7 13/14] eal: do not panic when bus probe/scan fails Aaron Conole
2017-03-22 20:19 ` [PATCH v7 14/14] rte_eal_init: add info about various error codes Aaron Conole
2017-03-23 14:04 ` [PATCH v7 00/14] eal: Remove most causes of panic on init Bruce Richardson
2017-03-27 14:06   ` Thomas Monjalon
2017-03-31 17:54     ` Aaron Conole
