[PATCH 00/24] linux/eal: Remove most causes of panic on init

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 00/24] linux/eal: Remove most causes of panic on init
@ 2017-01-27 14:56 Aaron Conole
  2017-01-27 14:56 ` [PATCH 01/25] eal: CPU init will no longer panic Aaron Conole
                   ` (25 more replies)
  0 siblings, 26 replies; 44+ messages in thread
From: Aaron Conole @ 2017-01-27 14:56 UTC (permalink / raw)
  To: dev

In many cases, it's enough to simply let the application know that the
call to initialize DPDK has failed.  A complete halt can then be
decided by the application based on error returned (and the app could
even attempt a possible re-attempt after some corrective action by the
user or application).

Aaron Conole (24):
  eal: CPU init will no longer panic
  eal: return error instead of panic for cpu init
  eal: No panic on hugepages info init
  eal: do not panic on failed hugepage query
  eal: failure to parse args returns error
  eal-common: introduce a way to query cpu support
  eal: Signal error when CPU isn't supported
  eal: do not panic on memzone initialization fails
  eal: set errno when exiting for already called
  eal: Do not panic on log failures
  eal: Do not panic on pci-probe
  eal: do not panic on vfio failure
  eal: do not panic on memory init
  eal: do not panic on tailq init
  eal: do not panic on alarm init
  eal: convert timer_init not to call panic
  eal: change the private pipe call to reflect errno
  eal: Do not panic on interrupt thread init
  eal: do not error if plugins fail to init
  eal_pci: Continue probing even on failures
  eal: do not panic on failed PCI probe
  eal_common_dev: continue initializing vdevs
  eal: do not panic (or abort) if vdev init fails
  eal: do not panic when bus probe fails

 lib/librte_eal/common/eal_common_cpuflags.c        |  13 ++-
 lib/librte_eal/common/eal_common_dev.c             |   5 +-
 lib/librte_eal/common/eal_common_lcore.c           |   7 +-
 lib/librte_eal/common/eal_common_pci.c             |  15 ++-
 lib/librte_eal/common/eal_common_tailqs.c          |   4 +-
 .../common/include/generic/rte_cpuflags.h          |   9 ++
 lib/librte_eal/linuxapp/eal/eal.c                  | 121 +++++++++++++++------
 lib/librte_eal/linuxapp/eal/eal_hugepage_info.c    |   6 +-
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       |   5 +-
 9 files changed, 135 insertions(+), 50 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 01/25] eal: CPU init will no longer panic
  2017-01-27 14:56 [PATCH 00/24] linux/eal: Remove most causes of panic on init Aaron Conole
@ 2017-01-27 14:56 ` Aaron Conole
  2017-01-27 14:56 ` [PATCH 02/25] eal: return error instead of panic for cpu init Aaron Conole
                   ` (24 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Aaron Conole @ 2017-01-27 14:56 UTC (permalink / raw)
  To: dev

After this change, the EAL CPU NUMA node resolution step can no longer
emit an rte_panic.  This aligns with the code in rte_eal_init, which
expects failures to return an error code.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 lib/librte_eal/common/eal_common_lcore.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_lcore.c b/lib/librte_eal/common/eal_common_lcore.c
index 2cd4132..84fa0cb 100644
--- a/lib/librte_eal/common/eal_common_lcore.c
+++ b/lib/librte_eal/common/eal_common_lcore.c
@@ -83,16 +83,17 @@ rte_eal_cpu_init(void)
 		config->lcore_role[lcore_id] = ROLE_RTE;
 		lcore_config[lcore_id].core_id = eal_cpu_core_id(lcore_id);
 		lcore_config[lcore_id].socket_id = eal_cpu_socket_id(lcore_id);
-		if (lcore_config[lcore_id].socket_id >= RTE_MAX_NUMA_NODES)
+		if (lcore_config[lcore_id].socket_id >= RTE_MAX_NUMA_NODES) {
 #ifdef RTE_EAL_ALLOW_INV_SOCKET_ID
 			lcore_config[lcore_id].socket_id = 0;
 #else
-			rte_panic("Socket ID (%u) is greater than "
+			RTE_LOG(ERR, EAL, "Socket ID (%u) is greater than "
 				"RTE_MAX_NUMA_NODES (%d)\n",
 				lcore_config[lcore_id].socket_id,
 				RTE_MAX_NUMA_NODES);
+			return -1;
 #endif
-
+		}
 		RTE_LOG(DEBUG, EAL, "Detected lcore %u as "
 				"core %u on socket %u\n",
 				lcore_id, lcore_config[lcore_id].core_id,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 02/25] eal: return error instead of panic for cpu init
  2017-01-27 14:56 [PATCH 00/24] linux/eal: Remove most causes of panic on init Aaron Conole
  2017-01-27 14:56 ` [PATCH 01/25] eal: CPU init will no longer panic Aaron Conole
@ 2017-01-27 14:56 ` Aaron Conole
  2017-01-27 14:56 ` [PATCH 03/25] eal: No panic on hugepages info init Aaron Conole
                   ` (23 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Aaron Conole @ 2017-01-27 14:56 UTC (permalink / raw)
  To: dev

There may be no way to gracefully recover, but the application
should be notified that a failure happened, rather than completely
aborting.  This allows the user to proceed with a "slow-path" type
solution.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 lib/librte_eal/linuxapp/eal/eal.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index bf6b818..ea7a4e4 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -767,8 +767,11 @@ rte_eal_init(int argc, char **argv)
 	/* set log level as early as possible */
 	rte_set_log_level(internal_config.log_level);
 
-	if (rte_eal_cpu_init() < 0)
-		rte_panic("Cannot detect lcores\n");
+	if (rte_eal_cpu_init() < 0) {
+		RTE_LOG (ERR, EAL, "Cannot detect lcores\n");
+		rte_errno = ENOTSUP;
+		return -1;
+	}
 
 	fctret = eal_parse_args(argc, argv);
 	if (fctret < 0)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 03/25] eal: No panic on hugepages info init
  2017-01-27 14:56 [PATCH 00/24] linux/eal: Remove most causes of panic on init Aaron Conole
  2017-01-27 14:56 ` [PATCH 01/25] eal: CPU init will no longer panic Aaron Conole
  2017-01-27 14:56 ` [PATCH 02/25] eal: return error instead of panic for cpu init Aaron Conole
@ 2017-01-27 14:56 ` Aaron Conole
  2017-01-27 14:56 ` [PATCH 04/25] eal: do not panic on failed hugepage query Aaron Conole
                   ` (22 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Aaron Conole @ 2017-01-27 14:56 UTC (permalink / raw)
  To: dev

When attempting to scan hugepages, signal to the eal.c that an error has
occured, rather than performing a panic.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 lib/librte_eal/linuxapp/eal/eal_hugepage_info.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
index 18858e2..4d47eaf 100644
--- a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
+++ b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
@@ -283,9 +283,11 @@ eal_hugepage_info_init(void)
 	struct dirent *dirent;
 
 	dir = opendir(sys_dir_path);
-	if (dir == NULL)
-		rte_panic("Cannot open directory %s to read system hugepage "
+	if (dir == NULL) {
+		RTE_LOG(ERR, EAL, "Cannot open directory %s to read system hugepage "
 			  "info\n", sys_dir_path);
+		return -1;
+	}
 
 	for (dirent = readdir(dir); dirent != NULL; dirent = readdir(dir)) {
 		struct hugepage_info *hpi;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 04/25] eal: do not panic on failed hugepage query
  2017-01-27 14:56 [PATCH 00/24] linux/eal: Remove most causes of panic on init Aaron Conole
                   ` (2 preceding siblings ...)
  2017-01-27 14:56 ` [PATCH 03/25] eal: No panic on hugepages info init Aaron Conole
@ 2017-01-27 14:56 ` Aaron Conole
  2017-01-27 14:56 ` [PATCH 05/25] eal: failure to parse args returns error Aaron Conole
                   ` (21 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Aaron Conole @ 2017-01-27 14:56 UTC (permalink / raw)
  To: dev

If we fail to acquire hugepage information, simply signal an error to
the application.  This clears the run_once counter, allowing the user or
application to take a corrective action and retry.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 lib/librte_eal/linuxapp/eal/eal.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index ea7a4e4..f139558 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -780,8 +780,12 @@ rte_eal_init(int argc, char **argv)
 	if (internal_config.no_hugetlbfs == 0 &&
 			internal_config.process_type != RTE_PROC_SECONDARY &&
 			internal_config.xen_dom0_support == 0 &&
-			eal_hugepage_info_init() < 0)
-		rte_panic("Cannot get hugepage information\n");
+			eal_hugepage_info_init() < 0) {
+		RTE_LOG (ERR, EAL, "Cannot get hugepage information\n");
+		rte_errno = EACCES;
+		rte_atomic32_clear(&run_once);
+		return -1;
+	}
 
 	if (internal_config.memory == 0 && internal_config.force_sockets == 0) {
 		if (internal_config.no_hugetlbfs)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 05/25] eal: failure to parse args returns error
  2017-01-27 14:56 [PATCH 00/24] linux/eal: Remove most causes of panic on init Aaron Conole
                   ` (3 preceding siblings ...)
  2017-01-27 14:56 ` [PATCH 04/25] eal: do not panic on failed hugepage query Aaron Conole
@ 2017-01-27 14:56 ` Aaron Conole
  2017-01-27 14:56 ` [PATCH 06/25] eal-common: introduce a way to query cpu support Aaron Conole
                   ` (20 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Aaron Conole @ 2017-01-27 14:56 UTC (permalink / raw)
  To: dev

It's possible that the application could take a corrective action here,
and either prompt the user for different arguments, or at least perform
a better logging.  Exiting this early prevents any useful information
gathering from the application layer.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 lib/librte_eal/linuxapp/eal/eal.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index f139558..413be16 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -774,8 +774,12 @@ rte_eal_init(int argc, char **argv)
 	}
 
 	fctret = eal_parse_args(argc, argv);
-	if (fctret < 0)
-		exit(1);
+	if (fctret < 0) {
+		RTE_LOG (ERR, EAL, "Invalid 'command line' arguments\n");
+		rte_errno = EINVAL;
+		rte_atomic32_clear(&run_once);
+		return -1;
+	}
 
 	if (internal_config.no_hugetlbfs == 0 &&
 			internal_config.process_type != RTE_PROC_SECONDARY &&
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 06/25] eal-common: introduce a way to query cpu support
  2017-01-27 14:56 [PATCH 00/24] linux/eal: Remove most causes of panic on init Aaron Conole
                   ` (4 preceding siblings ...)
  2017-01-27 14:56 ` [PATCH 05/25] eal: failure to parse args returns error Aaron Conole
@ 2017-01-27 14:56 ` Aaron Conole
  2017-01-27 14:56 ` [PATCH 07/25] eal: Signal error when CPU isn't supported Aaron Conole
                   ` (19 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Aaron Conole @ 2017-01-27 14:56 UTC (permalink / raw)
  To: dev

This adds a new API to check for the eal cpu versions.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 lib/librte_eal/common/eal_common_cpuflags.c          | 13 +++++++++++--
 lib/librte_eal/common/include/generic/rte_cpuflags.h |  9 +++++++++
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_cpuflags.c b/lib/librte_eal/common/eal_common_cpuflags.c
index b5f76f7..2c2127b 100644
--- a/lib/librte_eal/common/eal_common_cpuflags.c
+++ b/lib/librte_eal/common/eal_common_cpuflags.c
@@ -43,6 +43,13 @@
 void
 rte_cpu_check_supported(void)
 {
+	if (!rte_cpu_is_supported())
+		exit(1);
+}
+
+bool
+rte_cpu_is_supported(void)
+{
 	/* This is generated at compile-time by the build system */
 	static const enum rte_cpu_flag_t compile_time_flags[] = {
 			RTE_COMPILE_TIME_CPUFLAGS
@@ -57,14 +64,16 @@ rte_cpu_check_supported(void)
 			fprintf(stderr,
 				"ERROR: CPU feature flag lookup failed with error %d\n",
 				ret);
-			exit(1);
+			return false;
 		}
 		if (!ret) {
 			fprintf(stderr,
 			        "ERROR: This system does not support \"%s\".\n"
 			        "Please check that RTE_MACHINE is set correctly.\n",
 			        rte_cpu_get_flag_name(compile_time_flags[i]));
-			exit(1);
+			return false;
 		}
 	}
+
+	return true;
 }
diff --git a/lib/librte_eal/common/include/generic/rte_cpuflags.h b/lib/librte_eal/common/include/generic/rte_cpuflags.h
index 71321f3..e4342ad 100644
--- a/lib/librte_eal/common/include/generic/rte_cpuflags.h
+++ b/lib/librte_eal/common/include/generic/rte_cpuflags.h
@@ -40,6 +40,7 @@
  */
 
 #include <errno.h>
+#include <stdbool.h>
 
 /**
  * Enumeration of all CPU features supported
@@ -82,4 +83,12 @@ rte_cpu_get_flag_enabled(enum rte_cpu_flag_t feature);
 void
 rte_cpu_check_supported(void);
 
+/**
+ * This function checks that the currently used CPU supports the CPU features
+ * that were specified at compile time. It is called automatically within the
+ * EAL, so does not need to be used by applications.  This version returns a
+ * result so that decisions may be made (for instance, graceful shutdowns).
+ */
+bool
+rte_cpu_is_supported(void);
 #endif /* _RTE_CPUFLAGS_H_ */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 07/25] eal: Signal error when CPU isn't supported
  2017-01-27 14:56 [PATCH 00/24] linux/eal: Remove most causes of panic on init Aaron Conole
                   ` (5 preceding siblings ...)
  2017-01-27 14:56 ` [PATCH 06/25] eal-common: introduce a way to query cpu support Aaron Conole
@ 2017-01-27 14:56 ` Aaron Conole
  2017-01-27 16:27   ` Stephen Hemminger
  2017-01-27 14:56 ` [PATCH 08/25] eal: do not panic on memzone initialization fails Aaron Conole
                   ` (18 subsequent siblings)
  25 siblings, 1 reply; 44+ messages in thread
From: Aaron Conole @ 2017-01-27 14:56 UTC (permalink / raw)
  To: dev

It's now possible to gracefully exit the application, or for
applications which support non-dpdk datapaths working in concert with
DPDK datapaths, there no longer is the possibility of exiting for
unsupported CPUs.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 lib/librte_eal/linuxapp/eal/eal.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 413be16..cd976f5 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -752,7 +752,10 @@ rte_eal_init(int argc, char **argv)
 	char thread_name[RTE_MAX_THREAD_NAME_LEN];
 
 	/* checks if the machine is adequate */
-	rte_cpu_check_supported();
+	if (!rte_cpu_is_supported()) {
+		rte_errno = ENOTSUP;
+		return -1;
+	}
 
 	if (!rte_atomic32_test_and_set(&run_once))
 		return -1;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 08/25] eal: do not panic on memzone initialization fails
  2017-01-27 14:56 [PATCH 00/24] linux/eal: Remove most causes of panic on init Aaron Conole
                   ` (6 preceding siblings ...)
  2017-01-27 14:56 ` [PATCH 07/25] eal: Signal error when CPU isn't supported Aaron Conole
@ 2017-01-27 14:56 ` Aaron Conole
  2017-01-27 14:56 ` [PATCH 09/25] eal: set errno when exiting for already called Aaron Conole
                   ` (17 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Aaron Conole @ 2017-01-27 14:56 UTC (permalink / raw)
  To: dev

When memzone initialization fails, report the error to the calling
application rather than panic().  Without a good way of detaching /
releasing hugepages, at this point the application will have to restart.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 lib/librte_eal/linuxapp/eal/eal.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index cd976f5..cc1bcb5 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -831,8 +831,11 @@ rte_eal_init(int argc, char **argv)
 	/* the directories are locked during eal_hugepage_info_init */
 	eal_hugedirs_unlock();
 
-	if (rte_eal_memzone_init() < 0)
-		rte_panic("Cannot init memzone\n");
+	if (rte_eal_memzone_init() < 0) {
+		RTE_LOG (ERR, EAL, "Cannot init memzone\n");
+		rte_errno = ENODEV;
+		return -1;
+	}
 
 	if (rte_eal_tailqs_init() < 0)
 		rte_panic("Cannot init tail queues for objects\n");
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 09/25] eal: set errno when exiting for already called
  2017-01-27 14:56 [PATCH 00/24] linux/eal: Remove most causes of panic on init Aaron Conole
                   ` (7 preceding siblings ...)
  2017-01-27 14:56 ` [PATCH 08/25] eal: do not panic on memzone initialization fails Aaron Conole
@ 2017-01-27 14:56 ` Aaron Conole
  2017-01-27 14:56 ` [PATCH 10/25] eal: Do not panic on log failures Aaron Conole
                   ` (16 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Aaron Conole @ 2017-01-27 14:56 UTC (permalink / raw)
  To: dev

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 lib/librte_eal/linuxapp/eal/eal.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index cc1bcb5..f5f6629 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -757,8 +757,10 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
-	if (!rte_atomic32_test_and_set(&run_once))
+	if (!rte_atomic32_test_and_set(&run_once)) {
+		rte_errno = EALREADY;
 		return -1;
+	}
 
 	logid = strrchr(argv[0], '/');
 	logid = strdup(logid ? logid + 1: argv[0]);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 10/25] eal: Do not panic on log failures
  2017-01-27 14:56 [PATCH 00/24] linux/eal: Remove most causes of panic on init Aaron Conole
                   ` (8 preceding siblings ...)
  2017-01-27 14:56 ` [PATCH 09/25] eal: set errno when exiting for already called Aaron Conole
@ 2017-01-27 14:56 ` Aaron Conole
  2017-01-27 14:56 ` [PATCH 11/25] eal: Do not panic on pci-probe Aaron Conole
                   ` (15 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Aaron Conole @ 2017-01-27 14:56 UTC (permalink / raw)
  To: dev

When log initialization fails, it's generally because the fopencookie
failed.  While this is rare in practice, it could happen, and it is
likely because of memory pressure.  So, flag the error, and allow the
user to retry.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 lib/librte_eal/linuxapp/eal/eal.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index f5f6629..b774e41 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -816,8 +816,12 @@ rte_eal_init(int argc, char **argv)
 
 	rte_config_init();
 
-	if (rte_eal_log_init(logid, internal_config.syslog_facility) < 0)
-		rte_panic("Cannot init logs\n");
+	if (rte_eal_log_init(logid, internal_config.syslog_facility) < 0) {
+		RTE_LOG (ERR, EAL, "Cannot init logging\n");
+		rte_errno = EIO;
+		rte_atomic32_clear(&run_once);
+		return -1;
+	}
 
 	if (rte_eal_pci_init() < 0)
 		rte_panic("Cannot init PCI\n");
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 11/25] eal: Do not panic on pci-probe
  2017-01-27 14:56 [PATCH 00/24] linux/eal: Remove most causes of panic on init Aaron Conole
                   ` (9 preceding siblings ...)
  2017-01-27 14:56 ` [PATCH 10/25] eal: Do not panic on log failures Aaron Conole
@ 2017-01-27 14:56 ` Aaron Conole
  2017-01-27 14:56 ` [PATCH 12/25] eal: do not panic on vfio failure Aaron Conole
                   ` (14 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Aaron Conole @ 2017-01-27 14:56 UTC (permalink / raw)
  To: dev

This will usually be an issue because of permissions.  However, it could
also be caused by OOM.  In either case, errno will contain the
underlying cause.  It is safe to re-init the system here, so allow the
application to take corrective action and reinit.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 lib/librte_eal/linuxapp/eal/eal.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index b774e41..bea9a23 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -823,8 +823,12 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
-	if (rte_eal_pci_init() < 0)
-		rte_panic("Cannot init PCI\n");
+	if (rte_eal_pci_init() < 0) {
+		RTE_LOG (ERR, EAL, "Cannot init PCI\n");
+		rte_errno = EUNATCH;
+		rte_atomic32_clear(&run_once);
+		return -1;
+	}
 
 #ifdef VFIO_PRESENT
 	if (rte_eal_vfio_setup() < 0)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 12/25] eal: do not panic on vfio failure
  2017-01-27 14:56 [PATCH 00/24] linux/eal: Remove most causes of panic on init Aaron Conole
                   ` (10 preceding siblings ...)
  2017-01-27 14:56 ` [PATCH 11/25] eal: Do not panic on pci-probe Aaron Conole
@ 2017-01-27 14:56 ` Aaron Conole
  2017-01-27 14:56 ` [PATCH 13/25] eal: do not panic on memory init Aaron Conole
                   ` (13 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Aaron Conole @ 2017-01-27 14:56 UTC (permalink / raw)
  To: dev

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 lib/librte_eal/linuxapp/eal/eal.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index bea9a23..cfeefad 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -831,8 +831,12 @@ rte_eal_init(int argc, char **argv)
 	}
 
 #ifdef VFIO_PRESENT
-	if (rte_eal_vfio_setup() < 0)
-		rte_panic("Cannot init VFIO\n");
+	if (rte_eal_vfio_setup() < 0) {
+		RTE_LOG (ERR, EAL, "Cannot init VFIO\n");
+		rte_errno = EAGAIN;
+		rte_atomic32_clear(&run_once);
+		return -1;
+	}
 #endif
 
 	if (rte_eal_memory_init() < 0)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 13/25] eal: do not panic on memory init
  2017-01-27 14:56 [PATCH 00/24] linux/eal: Remove most causes of panic on init Aaron Conole
                   ` (11 preceding siblings ...)
  2017-01-27 14:56 ` [PATCH 12/25] eal: do not panic on vfio failure Aaron Conole
@ 2017-01-27 14:56 ` Aaron Conole
  2017-01-27 14:56 ` [PATCH 14/25] eal: do not panic on tailq init Aaron Conole
                   ` (12 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Aaron Conole @ 2017-01-27 14:56 UTC (permalink / raw)
  To: dev

This can only happen when access to hugepages (either as primary or
secondary process) fails (and that is usually permissions).  Since the
manner of failure is not reversible, we cannot allow retry.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 lib/librte_eal/linuxapp/eal/eal.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index cfeefad..d20ac37 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -839,8 +839,11 @@ rte_eal_init(int argc, char **argv)
 	}
 #endif
 
-	if (rte_eal_memory_init() < 0)
-		rte_panic("Cannot init memory\n");
+	if (rte_eal_memory_init() < 0) {
+		RTE_LOG (ERR, EAL, "Cannot init memory\n");
+		rte_errno = EACCES;
+		return -1;
+	}
 
 	/* the directories are locked during eal_hugepage_info_init */
 	eal_hugedirs_unlock();
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 14/25] eal: do not panic on tailq init
  2017-01-27 14:56 [PATCH 00/24] linux/eal: Remove most causes of panic on init Aaron Conole
                   ` (12 preceding siblings ...)
  2017-01-27 14:56 ` [PATCH 13/25] eal: do not panic on memory init Aaron Conole
@ 2017-01-27 14:56 ` Aaron Conole
  2017-01-27 16:30   ` Stephen Hemminger
  2017-01-27 14:56 ` [PATCH 15/25] eal: do not panic on alarm init Aaron Conole
                   ` (11 subsequent siblings)
  25 siblings, 1 reply; 44+ messages in thread
From: Aaron Conole @ 2017-01-27 14:56 UTC (permalink / raw)
  To: dev

There are some theoretical racy conditions in the system that _could_
cause early tailq init to fail;  however, no need to panic the
application.  While it can't continue using DPDK, it could make better
alerts to the user.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 lib/librte_eal/common/eal_common_tailqs.c | 4 ++--
 lib/librte_eal/linuxapp/eal/eal.c         | 7 +++++--
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_tailqs.c b/lib/librte_eal/common/eal_common_tailqs.c
index bb08ec8..b856ec9 100644
--- a/lib/librte_eal/common/eal_common_tailqs.c
+++ b/lib/librte_eal/common/eal_common_tailqs.c
@@ -188,8 +188,8 @@ rte_eal_tailqs_init(void)
 		if (t->head == NULL) {
 			RTE_LOG(ERR, EAL,
 				"Cannot initialize tailq: %s\n", t->name);
-			/* no need to TAILQ_REMOVE, we are going to panic in
-			 * rte_eal_init() */
+			/* no need to TAILQ_REMOVE, we are going to disallow re-attemtps
+			 * for rte_eal_init().  */
 			goto fail;
 		}
 	}
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index d20ac37..b16313c 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -854,8 +854,11 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
-	if (rte_eal_tailqs_init() < 0)
-		rte_panic("Cannot init tail queues for objects\n");
+	if (rte_eal_tailqs_init() < 0) {
+		RTE_LOG (ERR, EAL, "Cannot init tail queues for objects\n");
+		errno = ENOTSUP;
+		return -1;
+	}
 
 	if (rte_eal_alarm_init() < 0)
 		rte_panic("Cannot init interrupt-handling thread\n");
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 15/25] eal: do not panic on alarm init
  2017-01-27 14:56 [PATCH 00/24] linux/eal: Remove most causes of panic on init Aaron Conole
                   ` (13 preceding siblings ...)
  2017-01-27 14:56 ` [PATCH 14/25] eal: do not panic on tailq init Aaron Conole
@ 2017-01-27 14:56 ` Aaron Conole
  2017-01-27 16:31   ` Stephen Hemminger
  2017-01-27 14:56 ` [PATCH 16/25] eal: convert timer_init not to call panic Aaron Conole
                   ` (10 subsequent siblings)
  25 siblings, 1 reply; 44+ messages in thread
From: Aaron Conole @ 2017-01-27 14:56 UTC (permalink / raw)
  To: dev

rte_eal_alarm_init() call uses the linux timerfd framework to create a
poll()-able timer using standard posix file operations.  This could fail
for a few reasons given in the man-pages, but many could be
corrected by the user application.  No need to panic.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 lib/librte_eal/linuxapp/eal/eal.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index b16313c..122d9c2 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -61,6 +61,7 @@
 #include <rte_launch.h>
 #include <rte_eal.h>
 #include <rte_eal_memconfig.h>
+#include <rte_errno.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
 #include <rte_log.h>
@@ -860,8 +861,12 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
-	if (rte_eal_alarm_init() < 0)
-		rte_panic("Cannot init interrupt-handling thread\n");
+	if (rte_eal_alarm_init() < 0) {
+		RTE_LOG (ERR, EAL, "Cannot init interrupt-handling thread\n");
+		/* rte_eal_alarm_init sets rte_errno on failure. */
+		errno = rte_errno;
+		return -1;
+	}
 
 	if (rte_eal_timer_init() < 0)
 		rte_panic("Cannot init HPET or TSC timers\n");
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 16/25] eal: convert timer_init not to call panic
  2017-01-27 14:56 [PATCH 00/24] linux/eal: Remove most causes of panic on init Aaron Conole
                   ` (14 preceding siblings ...)
  2017-01-27 14:56 ` [PATCH 15/25] eal: do not panic on alarm init Aaron Conole
@ 2017-01-27 14:56 ` Aaron Conole
  2017-01-27 14:56 ` [PATCH 17/25] eal: change the private pipe call to reflect errno Aaron Conole
                   ` (9 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Aaron Conole @ 2017-01-27 14:56 UTC (permalink / raw)
  To: dev

After code inspection, there is no way for eal_timer_init() to fail.  It
simply returns 0 in all cases.  As such, this test could either go-away
or stay here as 'future-proofing'.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 lib/librte_eal/linuxapp/eal/eal.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 122d9c2..bd1863d 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -868,8 +868,11 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
-	if (rte_eal_timer_init() < 0)
-		rte_panic("Cannot init HPET or TSC timers\n");
+	if (rte_eal_timer_init() < 0) {
+		RTE_LOG (ERR, EAL, "Cannot init HPET or TSC timers\n");
+		rte_errno = ENOTSUP;
+		return -1;
+	}
 
 	eal_check_mem_on_local_socket();
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 17/25] eal: change the private pipe call to reflect errno
  2017-01-27 14:56 [PATCH 00/24] linux/eal: Remove most causes of panic on init Aaron Conole
                   ` (15 preceding siblings ...)
  2017-01-27 14:56 ` [PATCH 16/25] eal: convert timer_init not to call panic Aaron Conole
@ 2017-01-27 14:56 ` Aaron Conole
  2017-01-27 14:56 ` [PATCH 18/25] eal: Do not panic on interrupt thread init Aaron Conole
                   ` (8 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Aaron Conole @ 2017-01-27 14:56 UTC (permalink / raw)
  To: dev

There could be some confusion as to why the call failed - this change
will always reflect the value of the error in rte_error.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 lib/librte_eal/linuxapp/eal/eal_interrupts.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index b5b3f2b..b1a287c 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -896,13 +896,16 @@ rte_eal_intr_init(void)
 	 * create a pipe which will be waited by epoll and notified to
 	 * rebuild the wait list of epoll.
 	 */
-	if (pipe(intr_pipe.pipefd) < 0)
+	if (pipe(intr_pipe.pipefd) < 0) {
+		rte_errno = errno;
 		return -1;
+	}
 
 	/* create the host thread to wait/handle the interrupt */
 	ret = pthread_create(&intr_thread, NULL,
 			eal_intr_thread_main, NULL);
 	if (ret != 0) {
+		rte_errno = ret;
 		RTE_LOG(ERR, EAL,
 			"Failed to create thread for interrupt handling\n");
 	} else {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 18/25] eal: Do not panic on interrupt thread init
  2017-01-27 14:56 [PATCH 00/24] linux/eal: Remove most causes of panic on init Aaron Conole
                   ` (16 preceding siblings ...)
  2017-01-27 14:56 ` [PATCH 17/25] eal: change the private pipe call to reflect errno Aaron Conole
@ 2017-01-27 14:56 ` Aaron Conole
  2017-01-27 14:56 ` [PATCH 19/25] eal: do not error if plugins fail to init Aaron Conole
                   ` (7 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Aaron Conole @ 2017-01-27 14:56 UTC (permalink / raw)
  To: dev

When initializing the interrupt thread, there are a number of possible
reasons for failure - some of which are correctable by the application.
Do not panic() needlessly, and give the application a change to reflect
this information to the user.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 lib/librte_eal/linuxapp/eal/eal.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index bd1863d..948393e 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -887,8 +887,11 @@ rte_eal_init(int argc, char **argv)
 		rte_config.master_lcore, (int)thread_id, cpuset,
 		ret == 0 ? "" : "...");
 
-	if (rte_eal_intr_init() < 0)
-		rte_panic("Cannot init interrupt-handling thread\n");
+	if (rte_eal_intr_init() < 0) {
+		RTE_LOG (ERR, EAL, "Cannot init interrupt-handling thread\n");
+		errno = rte_errno;
+		return -1;
+	}
 
 	if (rte_bus_scan())
 		rte_panic("Cannot scan the buses for devices\n");
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 19/25] eal: do not error if plugins fail to init
  2017-01-27 14:56 [PATCH 00/24] linux/eal: Remove most causes of panic on init Aaron Conole
                   ` (17 preceding siblings ...)
  2017-01-27 14:56 ` [PATCH 18/25] eal: Do not panic on interrupt thread init Aaron Conole
@ 2017-01-27 14:56 ` Aaron Conole
  2017-01-27 14:56 ` [PATCH 20/25] eal_pci: Continue probing even on failures Aaron Conole
                   ` (6 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Aaron Conole @ 2017-01-27 14:56 UTC (permalink / raw)
  To: dev

Plugins are useful and important.  However, it seems crazy to abort
everything just because they don't initialize properly.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 lib/librte_eal/linuxapp/eal/eal.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 948393e..2fdafaa 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -876,8 +876,9 @@ rte_eal_init(int argc, char **argv)
 
 	eal_check_mem_on_local_socket();
 
-	if (eal_plugins_init() < 0)
-		rte_panic("Cannot init plugins\n");
+	if (eal_plugins_init() < 0) {
+		RTE_LOG (ERR, EAL, "Cannot init plugins\n");
+	}
 
 	eal_thread_init_master(rte_config.master_lcore);
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 20/25] eal_pci: Continue probing even on failures
  2017-01-27 14:56 [PATCH 00/24] linux/eal: Remove most causes of panic on init Aaron Conole
                   ` (18 preceding siblings ...)
  2017-01-27 14:56 ` [PATCH 19/25] eal: do not error if plugins fail to init Aaron Conole
@ 2017-01-27 14:56 ` Aaron Conole
  2017-01-27 14:56 ` [PATCH 21/25] eal: do not panic on failed PCI probe Aaron Conole
                   ` (5 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Aaron Conole @ 2017-01-27 14:56 UTC (permalink / raw)
  To: dev

Some devices may be inaccessible for a variety of reasons, or the
PCI-bus may be unavailable causing the whole thing to fail.  Still,
better to continue attempts at probes.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 lib/librte_eal/common/eal_common_pci.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index 72547bd..752c278 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -69,6 +69,7 @@
 #include <sys/queue.h>
 #include <sys/mman.h>
 
+#include <rte_errno.h>
 #include <rte_interrupts.h>
 #include <rte_log.h>
 #include <rte_pci.h>
@@ -416,6 +417,7 @@ rte_eal_pci_probe(void)
 	struct rte_pci_device *dev = NULL;
 	struct rte_devargs *devargs;
 	int probe_all = 0;
+	int ret_1 = 0;
 	int ret = 0;
 
 	if (rte_eal_devargs_type_count(RTE_DEVTYPE_WHITELISTED_PCI) == 0)
@@ -430,17 +432,20 @@ rte_eal_pci_probe(void)
 
 		/* probe all or only whitelisted devices */
 		if (probe_all)
-			ret = pci_probe_all_drivers(dev);
+			ret_1 = pci_probe_all_drivers(dev);
 		else if (devargs != NULL &&
 			devargs->type == RTE_DEVTYPE_WHITELISTED_PCI)
-			ret = pci_probe_all_drivers(dev);
-		if (ret < 0)
-			rte_exit(EXIT_FAILURE, "Requested device " PCI_PRI_FMT
+			ret_1 = pci_probe_all_drivers(dev);
+		if (ret_1 < 0) {
+			RTE_LOG (ERR, EAL, "Requested device " PCI_PRI_FMT
 				 " cannot be used\n", dev->addr.domain, dev->addr.bus,
 				 dev->addr.devid, dev->addr.function);
+			rte_errno = errno;
+			ret = 1;
+		}
 	}
 
-	return 0;
+	return -ret;
 }
 
 /* dump one device */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 21/25] eal: do not panic on failed PCI probe
  2017-01-27 14:56 [PATCH 00/24] linux/eal: Remove most causes of panic on init Aaron Conole
                   ` (19 preceding siblings ...)
  2017-01-27 14:56 ` [PATCH 20/25] eal_pci: Continue probing even on failures Aaron Conole
@ 2017-01-27 14:56 ` Aaron Conole
  2017-01-27 14:57 ` [PATCH 22/25] eal_common_dev: continue initializing vdevs Aaron Conole
                   ` (4 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Aaron Conole @ 2017-01-27 14:56 UTC (permalink / raw)
  To: dev

It may even be possible to simply log the error and continue on letting
the user check the logs and restart the application when things are failed.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 lib/librte_eal/linuxapp/eal/eal.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 2fdafaa..e77d7f7 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -938,8 +938,11 @@ rte_eal_init(int argc, char **argv)
 		rte_panic("Cannot probe devices\n");
 
 	/* Probe & Initialize PCI devices */
-	if (rte_eal_pci_probe())
-		rte_panic("Cannot probe PCI\n");
+	if (rte_eal_pci_probe()) {
+		RTE_LOG (ERR, EAL, "Cannot probe PCI\n");
+		rte_errno = ENOTSUP;
+		return -1;
+	}
 
 	if (rte_eal_dev_init() < 0)
 		rte_panic("Cannot init pmd devices\n");
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 22/25] eal_common_dev: continue initializing vdevs
  2017-01-27 14:56 [PATCH 00/24] linux/eal: Remove most causes of panic on init Aaron Conole
                   ` (20 preceding siblings ...)
  2017-01-27 14:56 ` [PATCH 21/25] eal: do not panic on failed PCI probe Aaron Conole
@ 2017-01-27 14:57 ` Aaron Conole
  2017-01-27 14:57 ` [PATCH 23/25] eal: do not panic (or abort) if vdev init fails Aaron Conole
                   ` (3 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Aaron Conole @ 2017-01-27 14:57 UTC (permalink / raw)
  To: dev

Even if one vdev should fail, there's no need to prevent further
processing.  Log the error, and reflect it to the higher levels to
decide.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 lib/librte_eal/common/eal_common_dev.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index 4f3b493..9889997 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -80,6 +80,7 @@ int
 rte_eal_dev_init(void)
 {
 	struct rte_devargs *devargs;
+	int ret = 0;
 
 	/*
 	 * Note that the dev_driver_list is populated here
@@ -97,11 +98,11 @@ rte_eal_dev_init(void)
 					devargs->args)) {
 			RTE_LOG(ERR, EAL, "failed to initialize %s device\n",
 					devargs->virt.drv_name);
-			return -1;
+			ret = -1;
 		}
 	}
 
-	return 0;
+	return ret;
 }
 
 int rte_eal_dev_attach(const char *name, const char *devargs)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 23/25] eal: do not panic (or abort) if vdev init fails
  2017-01-27 14:56 [PATCH 00/24] linux/eal: Remove most causes of panic on init Aaron Conole
                   ` (21 preceding siblings ...)
  2017-01-27 14:57 ` [PATCH 22/25] eal_common_dev: continue initializing vdevs Aaron Conole
@ 2017-01-27 14:57 ` Aaron Conole
  2017-01-27 14:57 ` [PATCH 24/25] eal: do not panic when bus probe fails Aaron Conole
                   ` (2 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Aaron Conole @ 2017-01-27 14:57 UTC (permalink / raw)
  To: dev

Seems like it's possible to continue.  At least, the error is reflected
properly in the logs.  A user could then go and correct or investigate
the situation.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 lib/librte_eal/linuxapp/eal/eal.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index e77d7f7..ecb6ac8 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -945,7 +945,7 @@ rte_eal_init(int argc, char **argv)
 	}
 
 	if (rte_eal_dev_init() < 0)
-		rte_panic("Cannot init pmd devices\n");
+		RTE_LOG (ERR, EAL, "Cannot init pmd devices\n");
 
 	rte_eal_mcfg_complete();
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 24/25] eal: do not panic when bus probe fails
  2017-01-27 14:56 [PATCH 00/24] linux/eal: Remove most causes of panic on init Aaron Conole
                   ` (22 preceding siblings ...)
  2017-01-27 14:57 ` [PATCH 23/25] eal: do not panic (or abort) if vdev init fails Aaron Conole
@ 2017-01-27 14:57 ` Aaron Conole
  2017-01-27 14:57 ` [PATCH 25/25] rte_eal_init: add info about rte_errno codes Aaron Conole
  2017-01-27 15:48 ` [PATCH 00/24] linux/eal: Remove most causes of panic on init Aaron Conole
  25 siblings, 0 replies; 44+ messages in thread
From: Aaron Conole @ 2017-01-27 14:57 UTC (permalink / raw)
  To: dev

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 lib/librte_eal/linuxapp/eal/eal.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index ecb6ac8..2783755 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -934,8 +934,11 @@ rte_eal_init(int argc, char **argv)
 	rte_eal_mp_wait_lcore();
 
 	/* Probe all the buses and devices/drivers on them */
-	if (rte_bus_probe())
-		rte_panic("Cannot probe devices\n");
+	if (rte_bus_probe()) {
+		RTE_LOG (ERR, EAL, "Cannot probe devices\n");
+		rte_errno = ENOTSUP;
+		return -1;
+	}
 
 	/* Probe & Initialize PCI devices */
 	if (rte_eal_pci_probe()) {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 25/25] rte_eal_init: add info about rte_errno codes
  2017-01-27 14:56 [PATCH 00/24] linux/eal: Remove most causes of panic on init Aaron Conole
                   ` (23 preceding siblings ...)
  2017-01-27 14:57 ` [PATCH 24/25] eal: do not panic when bus probe fails Aaron Conole
@ 2017-01-27 14:57 ` Aaron Conole
  2017-01-27 16:33   ` Stephen Hemminger
  2017-01-27 15:48 ` [PATCH 00/24] linux/eal: Remove most causes of panic on init Aaron Conole
  25 siblings, 1 reply; 44+ messages in thread
From: Aaron Conole @ 2017-01-27 14:57 UTC (permalink / raw)
  To: dev

The rte_eal_init function will now pass failure reason hints to the
application.  To help app developers deciper this, add some brief
information about what the codes are indicating.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 lib/librte_eal/common/include/rte_eal.h | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
index 03fee50..46e427f 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -159,7 +159,29 @@ int rte_eal_iopl_init(void);
  *     function call and should not be further interpreted by the
  *     application.  The EAL does not take any ownership of the memory used
  *     for either the argv array, or its members.
- *   - On failure, a negative error value.
+ *   - On failure, -1 and rte_errno is set to a value indicating the cause
+ *     for failure.
+ *
+ *   The error codes returned via rte_errno:
+ *     EACCES indicates a permissions issue.
+ *
+ *     EAGAIN indicates either a bus or system resource was not available,
+ *            try again.
+ *
+ *     EALREADY indicates that the rte_eal_init function has already been
+ *              called, and cannot be called again.
+ *
+ *     EINVAL indicates invalid parameters were passed as argv/argc.
+ *
+ *     EIO indicates failure to setup the logging handlers.  This is usually
+ *         caused by an out-of-memory condition.
+ *
+ *     ENODEV indicates memory setup issues.
+ *
+ *     ENOTSUP indicates that the EAL cannot initialize on this system.
+ *
+ *     EUNATCH indicates that the PCI bus is either not present, or is not
+ *             readable by the eal.
  */
 int rte_eal_init(int argc, char **argv);
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 00/24] linux/eal: Remove most causes of panic on init
  2017-01-27 14:56 [PATCH 00/24] linux/eal: Remove most causes of panic on init Aaron Conole
                   ` (24 preceding siblings ...)
  2017-01-27 14:57 ` [PATCH 25/25] rte_eal_init: add info about rte_errno codes Aaron Conole
@ 2017-01-27 15:48 ` Aaron Conole
  25 siblings, 0 replies; 44+ messages in thread
From: Aaron Conole @ 2017-01-27 15:48 UTC (permalink / raw)
  To: dev

Aaron Conole <aconole@redhat.com> writes:

> In many cases, it's enough to simply let the application know that the
> call to initialize DPDK has failed.  A complete halt can then be
> decided by the application based on error returned (and the app could
> even attempt a possible re-attempt after some corrective action by the
> user or application).
>

I got some emails about checkpatch issues for style.  I guess I have
none :(

I'll fix these all up and resubmit a v2 next week.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 07/25] eal: Signal error when CPU isn't supported
  2017-01-27 14:56 ` [PATCH 07/25] eal: Signal error when CPU isn't supported Aaron Conole
@ 2017-01-27 16:27   ` Stephen Hemminger
  2017-01-30 16:50     ` Aaron Conole
  0 siblings, 1 reply; 44+ messages in thread
From: Stephen Hemminger @ 2017-01-27 16:27 UTC (permalink / raw)
  To: Aaron Conole; +Cc: dev

On Fri, 27 Jan 2017 09:56:45 -0500
Aaron Conole <aconole@redhat.com> wrote:

> It's now possible to gracefully exit the application, or for
> applications which support non-dpdk datapaths working in concert with
> DPDK datapaths, there no longer is the possibility of exiting for
> unsupported CPUs.
> 
> Signed-off-by: Aaron Conole <aconole@redhat.com>
> ---
>  lib/librte_eal/linuxapp/eal/eal.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
> index 413be16..cd976f5 100644
> --- a/lib/librte_eal/linuxapp/eal/eal.c
> +++ b/lib/librte_eal/linuxapp/eal/eal.c
> @@ -752,7 +752,10 @@ rte_eal_init(int argc, char **argv)
>  	char thread_name[RTE_MAX_THREAD_NAME_LEN];
>  
>  	/* checks if the machine is adequate */
> -	rte_cpu_check_supported();
> +	if (!rte_cpu_is_supported()) {
> +		rte_errno = ENOTSUP;
> +		return -1;
> +	}
>  

I like not having DPDK applications panic.
My concern is that naive user will not know to check rte_errno.  Why not put
a high severity error out as well. If logging is not up just use stderr.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 14/25] eal: do not panic on tailq init
  2017-01-27 14:56 ` [PATCH 14/25] eal: do not panic on tailq init Aaron Conole
@ 2017-01-27 16:30   ` Stephen Hemminger
  2017-01-30 16:51     ` Aaron Conole
  0 siblings, 1 reply; 44+ messages in thread
From: Stephen Hemminger @ 2017-01-27 16:30 UTC (permalink / raw)
  To: Aaron Conole; +Cc: dev

On Fri, 27 Jan 2017 09:56:52 -0500
Aaron Conole <aconole@redhat.com> wrote:

> +			/* no need to TAILQ_REMOVE, we are going to disallow re-attemtps
> +			 * for rte_eal_init().  */

Please put multi-line comments in form:
             /*
              * this is a long comment
              * because there really is lots to say
              */

In many cases, having shorter comment is the best way to handle these.
Often developer writes long comment for themselves because what ever problem
they saw was urgent. Then comment becomes irrelevant later.

             /* TAILQ_REMOVE not needed, error is already fatal */

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 15/25] eal: do not panic on alarm init
  2017-01-27 14:56 ` [PATCH 15/25] eal: do not panic on alarm init Aaron Conole
@ 2017-01-27 16:31   ` Stephen Hemminger
  2017-01-27 16:42     ` Bruce Richardson
  0 siblings, 1 reply; 44+ messages in thread
From: Stephen Hemminger @ 2017-01-27 16:31 UTC (permalink / raw)
  To: Aaron Conole; +Cc: dev

On Fri, 27 Jan 2017 09:56:53 -0500
Aaron Conole <aconole@redhat.com> wrote:

> +	if (rte_eal_alarm_init() < 0) {
> +		RTE_LOG (ERR, EAL, "Cannot init interrupt-handling thread\n");
> +		/* rte_eal_alarm_init sets rte_errno on failure. */
> +		errno = rte_errno;

Hmm. DPDK in general does not reset errno but instead uses error code
directly on return (best) or in some cases rte_errno

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 25/25] rte_eal_init: add info about rte_errno codes
  2017-01-27 14:57 ` [PATCH 25/25] rte_eal_init: add info about rte_errno codes Aaron Conole
@ 2017-01-27 16:33   ` Stephen Hemminger
  2017-01-27 16:47     ` Bruce Richardson
  0 siblings, 1 reply; 44+ messages in thread
From: Stephen Hemminger @ 2017-01-27 16:33 UTC (permalink / raw)
  To: Aaron Conole; +Cc: dev

On Fri, 27 Jan 2017 09:57:03 -0500
Aaron Conole <aconole@redhat.com> wrote:

> diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
> index 03fee50..46e427f 100644
> --- a/lib/librte_eal/common/include/rte_eal.h
> +++ b/lib/librte_eal/common/include/rte_eal.h
> @@ -159,7 +159,29 @@ int rte_eal_iopl_init(void);
>   *     function call and should not be further interpreted by the
>   *     application.  The EAL does not take any ownership of the memory used
>   *     for either the argv array, or its members.
> - *   - On failure, a negative error value.
> + *   - On failure, -1 and rte_errno is set to a value indicating the cause
> + *     for failure.
> + *
> + *   The error codes returned via rte_errno:
> + *     EACCES indicates a permissions issue.
> + *
> + *     EAGAIN indicates either a bus or system resource was not available,
> + *            try again.
> + *
> + *     EALREADY indicates that the rte_eal_init function has already been
> + *              called, and cannot be called again.
> + *
> + *     EINVAL indicates invalid parameters were passed as argv/argc.
> + *
> + *     EIO indicates failure to setup the logging handlers.  This is usually
> + *         caused by an out-of-memory condition.
> + *
> + *     ENODEV indicates memory setup issues.
> + *
> + *     ENOTSUP indicates that the EAL cannot initialize on this system.
> + *
> + *     EUNATCH indicates that the PCI bus is either not present, or is not
> + *             readable by the eal.
>   */
>  int rte_eal_init(int argc, char **argv);

Why use rte_errno?
Most DPDK calls just return negative value on error which corresponds to error number.
Are you trying to keep ABI compatibility? Doesn't make sense because before all these
errors were panic's no working application is going to care.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 15/25] eal: do not panic on alarm init
  2017-01-27 16:31   ` Stephen Hemminger
@ 2017-01-27 16:42     ` Bruce Richardson
  2017-01-30 16:52       ` Aaron Conole
  0 siblings, 1 reply; 44+ messages in thread
From: Bruce Richardson @ 2017-01-27 16:42 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Aaron Conole, dev

On Fri, Jan 27, 2017 at 08:31:55AM -0800, Stephen Hemminger wrote:
> On Fri, 27 Jan 2017 09:56:53 -0500
> Aaron Conole <aconole@redhat.com> wrote:
> 
> > +	if (rte_eal_alarm_init() < 0) {
> > +		RTE_LOG (ERR, EAL, "Cannot init interrupt-handling thread\n");
> > +		/* rte_eal_alarm_init sets rte_errno on failure. */
> > +		errno = rte_errno;
> 
> Hmm. DPDK in general does not reset errno but instead uses error code
> directly on return (best) or in some cases rte_errno

I think we'll disagree on what way of returning error codes is best :-), but
yes, DPDK does not generally modify errno.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 25/25] rte_eal_init: add info about rte_errno codes
  2017-01-27 16:33   ` Stephen Hemminger
@ 2017-01-27 16:47     ` Bruce Richardson
  2017-01-27 17:37       ` Stephen Hemminger
  0 siblings, 1 reply; 44+ messages in thread
From: Bruce Richardson @ 2017-01-27 16:47 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Aaron Conole, dev

On Fri, Jan 27, 2017 at 08:33:46AM -0800, Stephen Hemminger wrote:
> On Fri, 27 Jan 2017 09:57:03 -0500
> Aaron Conole <aconole@redhat.com> wrote:
> 
> > diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
> > index 03fee50..46e427f 100644
> > --- a/lib/librte_eal/common/include/rte_eal.h
> > +++ b/lib/librte_eal/common/include/rte_eal.h
> > @@ -159,7 +159,29 @@ int rte_eal_iopl_init(void);
> >   *     function call and should not be further interpreted by the
> >   *     application.  The EAL does not take any ownership of the memory used
> >   *     for either the argv array, or its members.
> > - *   - On failure, a negative error value.
> > + *   - On failure, -1 and rte_errno is set to a value indicating the cause
> > + *     for failure.
> > + *
> > + *   The error codes returned via rte_errno:
> > + *     EACCES indicates a permissions issue.
> > + *
> > + *     EAGAIN indicates either a bus or system resource was not available,
> > + *            try again.
> > + *
> > + *     EALREADY indicates that the rte_eal_init function has already been
> > + *              called, and cannot be called again.
> > + *
> > + *     EINVAL indicates invalid parameters were passed as argv/argc.
> > + *
> > + *     EIO indicates failure to setup the logging handlers.  This is usually
> > + *         caused by an out-of-memory condition.
> > + *
> > + *     ENODEV indicates memory setup issues.
> > + *
> > + *     ENOTSUP indicates that the EAL cannot initialize on this system.
> > + *
> > + *     EUNATCH indicates that the PCI bus is either not present, or is not
> > + *             readable by the eal.
> >   */
> >  int rte_eal_init(int argc, char **argv);
> 
> Why use rte_errno?
> Most DPDK calls just return negative value on error which corresponds to error number.
> Are you trying to keep ABI compatibility? Doesn't make sense because before all these
> errors were panic's no working application is going to care.

Either will work, but I actually prefer this way. I view using rte_errno
to be better as it can work in just about all cases, including with
functions which return pointers. This allows you to have a standard
method across all functions for returning error codes, and it only
requires a single sentinal value to indicate error, rather than using a
whole range of values.

/Bruce

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 25/25] rte_eal_init: add info about rte_errno codes
  2017-01-27 16:47     ` Bruce Richardson
@ 2017-01-27 17:37       ` Stephen Hemminger
  2017-01-30 18:38         ` Aaron Conole
  0 siblings, 1 reply; 44+ messages in thread
From: Stephen Hemminger @ 2017-01-27 17:37 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: Aaron Conole, dev

On Fri, 27 Jan 2017 16:47:40 +0000
Bruce Richardson <bruce.richardson@intel.com> wrote:

> On Fri, Jan 27, 2017 at 08:33:46AM -0800, Stephen Hemminger wrote:
> > On Fri, 27 Jan 2017 09:57:03 -0500
> > Aaron Conole <aconole@redhat.com> wrote:
> >   
> > > diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
> > > index 03fee50..46e427f 100644
> > > --- a/lib/librte_eal/common/include/rte_eal.h
> > > +++ b/lib/librte_eal/common/include/rte_eal.h
> > > @@ -159,7 +159,29 @@ int rte_eal_iopl_init(void);
> > >   *     function call and should not be further interpreted by the
> > >   *     application.  The EAL does not take any ownership of the memory used
> > >   *     for either the argv array, or its members.
> > > - *   - On failure, a negative error value.
> > > + *   - On failure, -1 and rte_errno is set to a value indicating the cause
> > > + *     for failure.
> > > + *
> > > + *   The error codes returned via rte_errno:
> > > + *     EACCES indicates a permissions issue.
> > > + *
> > > + *     EAGAIN indicates either a bus or system resource was not available,
> > > + *            try again.
> > > + *
> > > + *     EALREADY indicates that the rte_eal_init function has already been
> > > + *              called, and cannot be called again.
> > > + *
> > > + *     EINVAL indicates invalid parameters were passed as argv/argc.
> > > + *
> > > + *     EIO indicates failure to setup the logging handlers.  This is usually
> > > + *         caused by an out-of-memory condition.
> > > + *
> > > + *     ENODEV indicates memory setup issues.
> > > + *
> > > + *     ENOTSUP indicates that the EAL cannot initialize on this system.
> > > + *
> > > + *     EUNATCH indicates that the PCI bus is either not present, or is not
> > > + *             readable by the eal.
> > >   */
> > >  int rte_eal_init(int argc, char **argv);  
> > 
> > Why use rte_errno?
> > Most DPDK calls just return negative value on error which corresponds to error number.
> > Are you trying to keep ABI compatibility? Doesn't make sense because before all these
> > errors were panic's no working application is going to care.  
> 
> Either will work, but I actually prefer this way. I view using rte_errno
> to be better as it can work in just about all cases, including with
> functions which return pointers. This allows you to have a standard
> method across all functions for returning error codes, and it only
> requires a single sentinal value to indicate error, rather than using a
> whole range of values.

The problem is DPDK is getting more inconsistent on how this is done.
As long as error returns are always same as kernel/glibc errno's it really doesn't
matter much which way the value is returned from a technical point of view
but the inconsistency is sure to be a usability problem and source of errors.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 07/25] eal: Signal error when CPU isn't supported
  2017-01-27 16:27   ` Stephen Hemminger
@ 2017-01-30 16:50     ` Aaron Conole
  0 siblings, 0 replies; 44+ messages in thread
From: Aaron Conole @ 2017-01-30 16:50 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

Stephen Hemminger <stephen@networkplumber.org> writes:

> On Fri, 27 Jan 2017 09:56:45 -0500
> Aaron Conole <aconole@redhat.com> wrote:
>
>> It's now possible to gracefully exit the application, or for
>> applications which support non-dpdk datapaths working in concert with
>> DPDK datapaths, there no longer is the possibility of exiting for
>> unsupported CPUs.
>> 
>> Signed-off-by: Aaron Conole <aconole@redhat.com>
>> ---
>>  lib/librte_eal/linuxapp/eal/eal.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>> 
>> diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
>> index 413be16..cd976f5 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal.c
>> @@ -752,7 +752,10 @@ rte_eal_init(int argc, char **argv)
>>  	char thread_name[RTE_MAX_THREAD_NAME_LEN];
>>  
>>  	/* checks if the machine is adequate */
>> -	rte_cpu_check_supported();
>> +	if (!rte_cpu_is_supported()) {
>> +		rte_errno = ENOTSUP;
>> +		return -1;
>> +	}
>>  
>
> I like not having DPDK applications panic.
> My concern is that naive user will not know to check rte_errno.  Why not put
> a high severity error out as well. If logging is not up just use stderr.

I'll work in a quick blurt using stderr.

Thanks for the review, Stephen!

-Aaron

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 14/25] eal: do not panic on tailq init
  2017-01-27 16:30   ` Stephen Hemminger
@ 2017-01-30 16:51     ` Aaron Conole
  0 siblings, 0 replies; 44+ messages in thread
From: Aaron Conole @ 2017-01-30 16:51 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

Stephen Hemminger <stephen@networkplumber.org> writes:

> On Fri, 27 Jan 2017 09:56:52 -0500
> Aaron Conole <aconole@redhat.com> wrote:
>
>> + /* no need to TAILQ_REMOVE, we are going to disallow re-attemtps
>> +			 * for rte_eal_init().  */
>
> Please put multi-line comments in form:
>              /*
>               * this is a long comment
>               * because there really is lots to say
>               */

Okay, will do.

> In many cases, having shorter comment is the best way to handle these.
> Often developer writes long comment for themselves because what ever problem
> they saw was urgent. Then comment becomes irrelevant later.
>
>              /* TAILQ_REMOVE not needed, error is already fatal */

I'll fold this in.

Thanks for the review!

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 15/25] eal: do not panic on alarm init
  2017-01-27 16:42     ` Bruce Richardson
@ 2017-01-30 16:52       ` Aaron Conole
  0 siblings, 0 replies; 44+ messages in thread
From: Aaron Conole @ 2017-01-30 16:52 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: Stephen Hemminger, dev

Bruce Richardson <bruce.richardson@intel.com> writes:

> On Fri, Jan 27, 2017 at 08:31:55AM -0800, Stephen Hemminger wrote:
>> On Fri, 27 Jan 2017 09:56:53 -0500
>> Aaron Conole <aconole@redhat.com> wrote:
>> 
>> > +	if (rte_eal_alarm_init() < 0) {
>> > +		RTE_LOG (ERR, EAL, "Cannot init interrupt-handling thread\n");
>> > +		/* rte_eal_alarm_init sets rte_errno on failure. */
>> > +		errno = rte_errno;
>> 
>> Hmm. DPDK in general does not reset errno but instead uses error code
>> directly on return (best) or in some cases rte_errno
>
> I think we'll disagree on what way of returning error codes is best :-), but
> yes, DPDK does not generally modify errno.

Okay, I'll drop the errno set.  I think it's a mistake from the first
version of the series (as RFC).

Thanks for the reviews, Bruce and Stephen!

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 25/25] rte_eal_init: add info about rte_errno codes
  2017-01-27 17:37       ` Stephen Hemminger
@ 2017-01-30 18:38         ` Aaron Conole
  2017-01-30 20:19           ` Thomas Monjalon
  2017-01-31  9:33           ` Bruce Richardson
  0 siblings, 2 replies; 44+ messages in thread
From: Aaron Conole @ 2017-01-30 18:38 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Bruce Richardson, dev

Stephen Hemminger <stephen@networkplumber.org> writes:

> On Fri, 27 Jan 2017 16:47:40 +0000
> Bruce Richardson <bruce.richardson@intel.com> wrote:
>
>> On Fri, Jan 27, 2017 at 08:33:46AM -0800, Stephen Hemminger wrote:
>> > On Fri, 27 Jan 2017 09:57:03 -0500
>> > Aaron Conole <aconole@redhat.com> wrote:
>> >   
>> > > diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
>> > > index 03fee50..46e427f 100644
>> > > --- a/lib/librte_eal/common/include/rte_eal.h
>> > > +++ b/lib/librte_eal/common/include/rte_eal.h
>> > > @@ -159,7 +159,29 @@ int rte_eal_iopl_init(void);
>> > >   *     function call and should not be further interpreted by the
>> > >   *     application.  The EAL does not take any ownership of the memory used
>> > >   *     for either the argv array, or its members.
>> > > - *   - On failure, a negative error value.
>> > > + *   - On failure, -1 and rte_errno is set to a value indicating the cause
>> > > + *     for failure.
>> > > + *
>> > > + *   The error codes returned via rte_errno:
>> > > + *     EACCES indicates a permissions issue.
>> > > + *
>> > > + *     EAGAIN indicates either a bus or system resource was not available,
>> > > + *            try again.
>> > > + *
>> > > + *     EALREADY indicates that the rte_eal_init function has already been
>> > > + *              called, and cannot be called again.
>> > > + *
>> > > + *     EINVAL indicates invalid parameters were passed as argv/argc.
>> > > + *
>> > > + *     EIO indicates failure to setup the logging handlers.  This is usually
>> > > + *         caused by an out-of-memory condition.
>> > > + *
>> > > + *     ENODEV indicates memory setup issues.
>> > > + *
>> > > + *     ENOTSUP indicates that the EAL cannot initialize on this system.
>> > > + *
>> > > + *     EUNATCH indicates that the PCI bus is either not present, or is not
>> > > + *             readable by the eal.
>> > >   */
>> > >  int rte_eal_init(int argc, char **argv);  
>> > 
>> > Why use rte_errno?
>> > Most DPDK calls just return negative value on error which
>> > corresponds to error number.
>> > Are you trying to keep ABI compatibility? Doesn't make sense
>> > because before all these
>> > errors were panic's no working application is going to care.  
>> 
>> Either will work, but I actually prefer this way. I view using rte_errno
>> to be better as it can work in just about all cases, including with
>> functions which return pointers. This allows you to have a standard
>> method across all functions for returning error codes, and it only
>> requires a single sentinal value to indicate error, rather than using a
>> whole range of values.
>
> The problem is DPDK is getting more inconsistent on how this is done.
> As long as error returns are always same as kernel/glibc errno's it really doesn't
> matter much which way the value is returned from a technical point of view
> but the inconsistency is sure to be a usability problem and source of errors.

I am using rte_errno here because I assumed it was the preferred
method.  In fact, looking at some recently contributed modules (for
instance pdump), it seems that folks are using it.

I'm not really sure the purpose of having rte_errno if it isn't used, so
it'd be helpful to know if there's some consensus on reflecting errors
via this variable, or on returning error codes.  Whichever is the more
consistent with the way the DPDK project does things, I'm game :).

Thanks for the thoughts, and review.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 25/25] rte_eal_init: add info about rte_errno codes
  2017-01-30 18:38         ` Aaron Conole
@ 2017-01-30 20:19           ` Thomas Monjalon
  2017-02-01 10:54             ` Adrien Mazarguil
  2017-01-31  9:33           ` Bruce Richardson
  1 sibling, 1 reply; 44+ messages in thread
From: Thomas Monjalon @ 2017-01-30 20:19 UTC (permalink / raw)
  To: Aaron Conole, adrien.mazarguil; +Cc: dev, Stephen Hemminger, Bruce Richardson

2017-01-30 13:38, Aaron Conole:
> Stephen Hemminger <stephen@networkplumber.org> writes:
> > Bruce Richardson <bruce.richardson@intel.com> wrote:
> >> On Fri, Jan 27, 2017 at 08:33:46AM -0800, Stephen Hemminger wrote:
> >> > Why use rte_errno?
> >> > Most DPDK calls just return negative value on error which
> >> > corresponds to error number.
> >> > Are you trying to keep ABI compatibility? Doesn't make sense
> >> > because before all these
> >> > errors were panic's no working application is going to care.  
> >> 
> >> Either will work, but I actually prefer this way. I view using rte_errno
> >> to be better as it can work in just about all cases, including with
> >> functions which return pointers. This allows you to have a standard
> >> method across all functions for returning error codes, and it only
> >> requires a single sentinal value to indicate error, rather than using a
> >> whole range of values.
> >
> > The problem is DPDK is getting more inconsistent on how this is done.
> > As long as error returns are always same as kernel/glibc errno's it really doesn't
> > matter much which way the value is returned from a technical point of view
> > but the inconsistency is sure to be a usability problem and source of errors.
> 
> I am using rte_errno here because I assumed it was the preferred
> method.  In fact, looking at some recently contributed modules (for
> instance pdump), it seems that folks are using it.
> 
> I'm not really sure the purpose of having rte_errno if it isn't used, so
> it'd be helpful to know if there's some consensus on reflecting errors
> via this variable, or on returning error codes.  Whichever is the more
> consistent with the way the DPDK project does things, I'm game :).

I think we can use both return value and rte_errno.
We could try to enforce rte_errno as mandatory everywhere.

Adrien did the recent rte_flow API.
Please Adrien, could you give your thought?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 25/25] rte_eal_init: add info about rte_errno codes
  2017-01-30 18:38         ` Aaron Conole
  2017-01-30 20:19           ` Thomas Monjalon
@ 2017-01-31  9:33           ` Bruce Richardson
  2017-01-31 16:56             ` Stephen Hemminger
  1 sibling, 1 reply; 44+ messages in thread
From: Bruce Richardson @ 2017-01-31  9:33 UTC (permalink / raw)
  To: Aaron Conole; +Cc: Stephen Hemminger, dev

On Mon, Jan 30, 2017 at 01:38:00PM -0500, Aaron Conole wrote:
> Stephen Hemminger <stephen@networkplumber.org> writes:
> 
> > On Fri, 27 Jan 2017 16:47:40 +0000
> > Bruce Richardson <bruce.richardson@intel.com> wrote:
> >
> >> On Fri, Jan 27, 2017 at 08:33:46AM -0800, Stephen Hemminger wrote:
> >> > On Fri, 27 Jan 2017 09:57:03 -0500
> >> > Aaron Conole <aconole@redhat.com> wrote:
> >> >   
> >> > > diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
> >> > > index 03fee50..46e427f 100644
> >> > > --- a/lib/librte_eal/common/include/rte_eal.h
> >> > > +++ b/lib/librte_eal/common/include/rte_eal.h
> >> > > @@ -159,7 +159,29 @@ int rte_eal_iopl_init(void);
> >> > >   *     function call and should not be further interpreted by the
> >> > >   *     application.  The EAL does not take any ownership of the memory used
> >> > >   *     for either the argv array, or its members.
> >> > > - *   - On failure, a negative error value.
> >> > > + *   - On failure, -1 and rte_errno is set to a value indicating the cause
> >> > > + *     for failure.
> >> > > + *
> >> > > + *   The error codes returned via rte_errno:
> >> > > + *     EACCES indicates a permissions issue.
> >> > > + *
> >> > > + *     EAGAIN indicates either a bus or system resource was not available,
> >> > > + *            try again.
> >> > > + *
> >> > > + *     EALREADY indicates that the rte_eal_init function has already been
> >> > > + *              called, and cannot be called again.
> >> > > + *
> >> > > + *     EINVAL indicates invalid parameters were passed as argv/argc.
> >> > > + *
> >> > > + *     EIO indicates failure to setup the logging handlers.  This is usually
> >> > > + *         caused by an out-of-memory condition.
> >> > > + *
> >> > > + *     ENODEV indicates memory setup issues.
> >> > > + *
> >> > > + *     ENOTSUP indicates that the EAL cannot initialize on this system.
> >> > > + *
> >> > > + *     EUNATCH indicates that the PCI bus is either not present, or is not
> >> > > + *             readable by the eal.
> >> > >   */
> >> > >  int rte_eal_init(int argc, char **argv);  
> >> > 
> >> > Why use rte_errno?
> >> > Most DPDK calls just return negative value on error which
> >> > corresponds to error number.
> >> > Are you trying to keep ABI compatibility? Doesn't make sense
> >> > because before all these
> >> > errors were panic's no working application is going to care.  
> >> 
> >> Either will work, but I actually prefer this way. I view using rte_errno
> >> to be better as it can work in just about all cases, including with
> >> functions which return pointers. This allows you to have a standard
> >> method across all functions for returning error codes, and it only
> >> requires a single sentinal value to indicate error, rather than using a
> >> whole range of values.
> >
> > The problem is DPDK is getting more inconsistent on how this is done.
> > As long as error returns are always same as kernel/glibc errno's it really doesn't
> > matter much which way the value is returned from a technical point of view
> > but the inconsistency is sure to be a usability problem and source of errors.
> 
> I am using rte_errno here because I assumed it was the preferred
> method.  In fact, looking at some recently contributed modules (for
> instance pdump), it seems that folks are using it.
> 
> I'm not really sure the purpose of having rte_errno if it isn't used, so
> it'd be helpful to know if there's some consensus on reflecting errors
> via this variable, or on returning error codes.  Whichever is the more
> consistent with the way the DPDK project does things, I'm game :).
> 
Unfortunately, this is one area where DPDK is inconsistent, and both
schemes are widely used. I much prefer using the rte_errno method, but
returning error codes directly is also common in DPDK.

/Bruce

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 25/25] rte_eal_init: add info about rte_errno codes
  2017-01-31  9:33           ` Bruce Richardson
@ 2017-01-31 16:56             ` Stephen Hemminger
  0 siblings, 0 replies; 44+ messages in thread
From: Stephen Hemminger @ 2017-01-31 16:56 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: Aaron Conole, dev

On Tue, 31 Jan 2017 09:33:45 +0000
Bruce Richardson <bruce.richardson@intel.com> wrote:

> On Mon, Jan 30, 2017 at 01:38:00PM -0500, Aaron Conole wrote:
> > Stephen Hemminger <stephen@networkplumber.org> writes:
> >   
> > > On Fri, 27 Jan 2017 16:47:40 +0000
> > > Bruce Richardson <bruce.richardson@intel.com> wrote:
> > >  
> > >> On Fri, Jan 27, 2017 at 08:33:46AM -0800, Stephen Hemminger wrote:  
> > >> > On Fri, 27 Jan 2017 09:57:03 -0500
> > >> > Aaron Conole <aconole@redhat.com> wrote:
> > >> >     
> > >> > > diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
> > >> > > index 03fee50..46e427f 100644
> > >> > > --- a/lib/librte_eal/common/include/rte_eal.h
> > >> > > +++ b/lib/librte_eal/common/include/rte_eal.h
> > >> > > @@ -159,7 +159,29 @@ int rte_eal_iopl_init(void);
> > >> > >   *     function call and should not be further interpreted by the
> > >> > >   *     application.  The EAL does not take any ownership of the memory used
> > >> > >   *     for either the argv array, or its members.
> > >> > > - *   - On failure, a negative error value.
> > >> > > + *   - On failure, -1 and rte_errno is set to a value indicating the cause
> > >> > > + *     for failure.
> > >> > > + *
> > >> > > + *   The error codes returned via rte_errno:
> > >> > > + *     EACCES indicates a permissions issue.
> > >> > > + *
> > >> > > + *     EAGAIN indicates either a bus or system resource was not available,
> > >> > > + *            try again.
> > >> > > + *
> > >> > > + *     EALREADY indicates that the rte_eal_init function has already been
> > >> > > + *              called, and cannot be called again.
> > >> > > + *
> > >> > > + *     EINVAL indicates invalid parameters were passed as argv/argc.
> > >> > > + *
> > >> > > + *     EIO indicates failure to setup the logging handlers.  This is usually
> > >> > > + *         caused by an out-of-memory condition.
> > >> > > + *
> > >> > > + *     ENODEV indicates memory setup issues.
> > >> > > + *
> > >> > > + *     ENOTSUP indicates that the EAL cannot initialize on this system.
> > >> > > + *
> > >> > > + *     EUNATCH indicates that the PCI bus is either not present, or is not
> > >> > > + *             readable by the eal.
> > >> > >   */
> > >> > >  int rte_eal_init(int argc, char **argv);    
> > >> > 
> > >> > Why use rte_errno?
> > >> > Most DPDK calls just return negative value on error which
> > >> > corresponds to error number.
> > >> > Are you trying to keep ABI compatibility? Doesn't make sense
> > >> > because before all these
> > >> > errors were panic's no working application is going to care.    
> > >> 
> > >> Either will work, but I actually prefer this way. I view using rte_errno
> > >> to be better as it can work in just about all cases, including with
> > >> functions which return pointers. This allows you to have a standard
> > >> method across all functions for returning error codes, and it only
> > >> requires a single sentinal value to indicate error, rather than using a
> > >> whole range of values.  
> > >
> > > The problem is DPDK is getting more inconsistent on how this is done.
> > > As long as error returns are always same as kernel/glibc errno's it really doesn't
> > > matter much which way the value is returned from a technical point of view
> > > but the inconsistency is sure to be a usability problem and source of errors.  
> > 
> > I am using rte_errno here because I assumed it was the preferred
> > method.  In fact, looking at some recently contributed modules (for
> > instance pdump), it seems that folks are using it.
> > 
> > I'm not really sure the purpose of having rte_errno if it isn't used, so
> > it'd be helpful to know if there's some consensus on reflecting errors
> > via this variable, or on returning error codes.  Whichever is the more
> > consistent with the way the DPDK project does things, I'm game :).
> >   
> Unfortunately, this is one area where DPDK is inconsistent, and both
> schemes are widely used. I much prefer using the rte_errno method, but
> returning error codes directly is also common in DPDK.

One argument in favor of returning error codes directly, is that it makes
it safer in application when one user function is returning an error code 
back through its internal call tree.

Also, the API does not really do a good job of distinguishing between normal
(no data present) and exceptional (NIC has died).  At least it doesn't depend
on something like Structured Exception handling...

Feel free to clean the stables on this one.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 25/25] rte_eal_init: add info about rte_errno codes
  2017-01-30 20:19           ` Thomas Monjalon
@ 2017-02-01 10:54             ` Adrien Mazarguil
  2017-02-01 12:06               ` Jan Blunck
  0 siblings, 1 reply; 44+ messages in thread
From: Adrien Mazarguil @ 2017-02-01 10:54 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Aaron Conole, dev, Stephen Hemminger, Bruce Richardson

On Mon, Jan 30, 2017 at 09:19:29PM +0100, Thomas Monjalon wrote:
> 2017-01-30 13:38, Aaron Conole:
> > Stephen Hemminger <stephen@networkplumber.org> writes:
> > > Bruce Richardson <bruce.richardson@intel.com> wrote:
> > >> On Fri, Jan 27, 2017 at 08:33:46AM -0800, Stephen Hemminger wrote:
> > >> > Why use rte_errno?
> > >> > Most DPDK calls just return negative value on error which
> > >> > corresponds to error number.
> > >> > Are you trying to keep ABI compatibility? Doesn't make sense
> > >> > because before all these
> > >> > errors were panic's no working application is going to care.  
> > >> 
> > >> Either will work, but I actually prefer this way. I view using rte_errno
> > >> to be better as it can work in just about all cases, including with
> > >> functions which return pointers. This allows you to have a standard
> > >> method across all functions for returning error codes, and it only
> > >> requires a single sentinal value to indicate error, rather than using a
> > >> whole range of values.
> > >
> > > The problem is DPDK is getting more inconsistent on how this is done.
> > > As long as error returns are always same as kernel/glibc errno's it really doesn't
> > > matter much which way the value is returned from a technical point of view
> > > but the inconsistency is sure to be a usability problem and source of errors.
> > 
> > I am using rte_errno here because I assumed it was the preferred
> > method.  In fact, looking at some recently contributed modules (for
> > instance pdump), it seems that folks are using it.
> > 
> > I'm not really sure the purpose of having rte_errno if it isn't used, so
> > it'd be helpful to know if there's some consensus on reflecting errors
> > via this variable, or on returning error codes.  Whichever is the more
> > consistent with the way the DPDK project does things, I'm game :).
> 
> I think we can use both return value and rte_errno.
> We could try to enforce rte_errno as mandatory everywhere.
> 
> Adrien did the recent rte_flow API.
> Please Adrien, could you give your thought?

Sure, actually as already pointed out in this thread, both approaches have
pros and cons depending on the use-case.

Through return value:

Pros
----

- Most common approach used in DPPK today.
- Used internally by the Linux kernel (negative errno) and in the pthreads
  library (positive errno).
- Avoids the need to access an external, global variable requiring its own
  thread-local storage.
- Inherently thread-safe and reentrant (i.e. safe with signal handlers).
- Returned value is also the error code, two facts reported at once.

Cons
----

- Difficult to use with functions returning anything other than signed
  integers with negative values having no other meaning.
- The returned value must be assigned to a local variable in order not to
  discard it and process it later most of the time.
- All function calls must be tested for errors.

Through rte_errno:

Pros
----

- errno-like, well known behavior defined by the C standard and used
  everywhere in the C library.
- Testing return values is not mandatory, e.g. rte_errno can be initialized
  to zero before calling a group of functions and checking its value
  afterward (rte_errno is only updated in case of error).
- Assigning a local variable to store its value is not necessary as long as
  another function that may affect rte_errno is not called.

Cons
----

- Not fully reentrant, thread-safety is fine for most purposes but signal
  handlers affecting it still cause undefined behavior (they must at least
  save and restore its value in case they modify it).
- Accessing non-local storage may affect CPU cycle-sensitive functions such
  as TX/RX burst.

My opinion is that rte_errno is best for control path operations while using
the return value makes more sense in the data path. The major issue being
that function returning anything other than int (e.g. TX/RX burst) cannot
describe any kind of error to the application.

I went with both in rte_flow (return + rte_errno) mostly due to the return
type of a few functions (e.g. rte_flow_create()) and wanted to keep the API
consistent while maintaining compatibility with other DPDK APIs. Note there
is little overhead for API functions to set rte_errno _and_ return its
value, it's mostly free.

I think using both is best also because it leaves applications the choice of
error-handling method, however if I had to pick one I'd go with rte_errno
and standardize on -1 as the default error value (as in the C library).

Below are a bunch of use-case examples to illustrate how rte_errno could
be convenient to applications.

Easily creating many flow rules during init in a all-or-nothing fashion:

 rte_errno = 0;
 for (i = 0; i != num; ++i)
     rule[i] = rte_flow_create(port, ...);
 if (unlikely(rte_errno)) {
     rte_flow_flush(port);
     return -1;
 }

Complete TX packet burst failure with explanation (could also detect partial
failures by initializing rte_errno to 0):

 sent = rte_eth_tx_burst(...);
 if (unlikely(!sent)) {
     switch (rte_errno) {
         case E2BIG:
             // too many packets in burst
         ...
         case EMSGSIZE:
             // first packet is too large
         ...
         case ENOBUFS:
             // TX queue is full
         ...
     }
 }
 
TX burst functions in PMDs could be modified as follows with minimal impact
on their performance and no ABI change:

     uint16_t sent = 0;
     int error; // new variable
 
     [process burst]
     if (unlikely([something went wrong])) { // this check already exists
         error = EPROBLEM; // new assignment
         goto error; // instead of "return sent"
     }
     [process burst]
     return sent;
 error:
     rte_errno = error;
     return sent;

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 25/25] rte_eal_init: add info about rte_errno codes
  2017-02-01 10:54             ` Adrien Mazarguil
@ 2017-02-01 12:06               ` Jan Blunck
  2017-02-01 14:18                 ` Bruce Richardson
  0 siblings, 1 reply; 44+ messages in thread
From: Jan Blunck @ 2017-02-01 12:06 UTC (permalink / raw)
  To: Adrien Mazarguil
  Cc: Thomas Monjalon, Aaron Conole, dev, Stephen Hemminger, Bruce Richardson

On Wed, Feb 1, 2017 at 11:54 AM, Adrien Mazarguil
<adrien.mazarguil@6wind.com> wrote:
> On Mon, Jan 30, 2017 at 09:19:29PM +0100, Thomas Monjalon wrote:
>> 2017-01-30 13:38, Aaron Conole:
>> > Stephen Hemminger <stephen@networkplumber.org> writes:
>> > > Bruce Richardson <bruce.richardson@intel.com> wrote:
>> > >> On Fri, Jan 27, 2017 at 08:33:46AM -0800, Stephen Hemminger wrote:
>> > >> > Why use rte_errno?
>> > >> > Most DPDK calls just return negative value on error which
>> > >> > corresponds to error number.
>> > >> > Are you trying to keep ABI compatibility? Doesn't make sense
>> > >> > because before all these
>> > >> > errors were panic's no working application is going to care.
>> > >>
>> > >> Either will work, but I actually prefer this way. I view using rte_errno
>> > >> to be better as it can work in just about all cases, including with
>> > >> functions which return pointers. This allows you to have a standard
>> > >> method across all functions for returning error codes, and it only
>> > >> requires a single sentinal value to indicate error, rather than using a
>> > >> whole range of values.
>> > >
>> > > The problem is DPDK is getting more inconsistent on how this is done.
>> > > As long as error returns are always same as kernel/glibc errno's it really doesn't
>> > > matter much which way the value is returned from a technical point of view
>> > > but the inconsistency is sure to be a usability problem and source of errors.
>> >
>> > I am using rte_errno here because I assumed it was the preferred
>> > method.  In fact, looking at some recently contributed modules (for
>> > instance pdump), it seems that folks are using it.
>> >
>> > I'm not really sure the purpose of having rte_errno if it isn't used, so
>> > it'd be helpful to know if there's some consensus on reflecting errors
>> > via this variable, or on returning error codes.  Whichever is the more
>> > consistent with the way the DPDK project does things, I'm game :).
>>
>> I think we can use both return value and rte_errno.
>> We could try to enforce rte_errno as mandatory everywhere.
>>
>> Adrien did the recent rte_flow API.
>> Please Adrien, could you give your thought?
>
> Sure, actually as already pointed out in this thread, both approaches have
> pros and cons depending on the use-case.
>
> Through return value:
>
> Pros
> ----
>
> - Most common approach used in DPPK today.
> - Used internally by the Linux kernel (negative errno) and in the pthreads
>   library (positive errno).
> - Avoids the need to access an external, global variable requiring its own
>   thread-local storage.
> - Inherently thread-safe and reentrant (i.e. safe with signal handlers).
> - Returned value is also the error code, two facts reported at once.

Caller can decide to ignore return value if no error handling is wanted.

>
> Cons
> ----
>
> - Difficult to use with functions returning anything other than signed
>   integers with negative values having no other meaning.
> - The returned value must be assigned to a local variable in order not to
>   discard it and process it later most of the time.

I believe this is Pro since the rte_errno even needs to assign to a
thread-local variable even.

> - All function calls must be tested for errors.

The rte_errno needs to do this too to decide if it needs to assign a
value to rte_errno.

>
> Through rte_errno:
>
> Pros
> ----
>
> - errno-like, well known behavior defined by the C standard and used
>   everywhere in the C library.
> - Testing return values is not mandatory, e.g. rte_errno can be initialized
>   to zero before calling a group of functions and checking its value
>   afterward (rte_errno is only updated in case of error).
> - Assigning a local variable to store its value is not necessary as long as
>   another function that may affect rte_errno is not called.
>
> Cons
> ----
>
> - Not fully reentrant, thread-safety is fine for most purposes but signal
>   handlers affecting it still cause undefined behavior (they must at least
>   save and restore its value in case they modify it).
> - Accessing non-local storage may affect CPU cycle-sensitive functions such
>   as TX/RX burst.

Actually testing for errors mean you also have to reset the rte_errno
variable before. That also means you have to access thread-local
storage twice.

Besides that the problem of rte_errno is that you do error handling
twice because the implementation still needs to check for the error
condition before assigning a meaningful error value to rte_errno.
After that again the user code needs to check for the return value to
decide if looking at rte_errno makes any sense.


> My opinion is that rte_errno is best for control path operations while using
> the return value makes more sense in the data path. The major issue being
> that function returning anything other than int (e.g. TX/RX burst) cannot
> describe any kind of error to the application.
>
> I went with both in rte_flow (return + rte_errno) mostly due to the return
> type of a few functions (e.g. rte_flow_create()) and wanted to keep the API
> consistent while maintaining compatibility with other DPDK APIs. Note there
> is little overhead for API functions to set rte_errno _and_ return its
> value, it's mostly free.
>
> I think using both is best also because it leaves applications the choice of
> error-handling method, however if I had to pick one I'd go with rte_errno
> and standardize on -1 as the default error value (as in the C library).
>
> Below are a bunch of use-case examples to illustrate how rte_errno could
> be convenient to applications.
>
> Easily creating many flow rules during init in a all-or-nothing fashion:
>
>  rte_errno = 0;
>  for (i = 0; i != num; ++i)
>      rule[i] = rte_flow_create(port, ...);
>  if (unlikely(rte_errno)) {
>      rte_flow_flush(port);
>      return -1;
>  }
>
> Complete TX packet burst failure with explanation (could also detect partial
> failures by initializing rte_errno to 0):
>
>  sent = rte_eth_tx_burst(...);
>  if (unlikely(!sent)) {
>      switch (rte_errno) {
>          case E2BIG:
>              // too many packets in burst
>          ...
>          case EMSGSIZE:
>              // first packet is too large
>          ...
>          case ENOBUFS:
>              // TX queue is full
>          ...
>      }
>  }
>
> TX burst functions in PMDs could be modified as follows with minimal impact
> on their performance and no ABI change:
>
>      uint16_t sent = 0;
>      int error; // new variable
>
>      [process burst]
>      if (unlikely([something went wrong])) { // this check already exists
>          error = EPROBLEM; // new assignment
>          goto error; // instead of "return sent"
>      }
>      [process burst]
>      return sent;
>  error:
>      rte_errno = error;
>      return sent;
>
> --
> Adrien Mazarguil
> 6WIND

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 25/25] rte_eal_init: add info about rte_errno codes
  2017-02-01 12:06               ` Jan Blunck
@ 2017-02-01 14:18                 ` Bruce Richardson
  0 siblings, 0 replies; 44+ messages in thread
From: Bruce Richardson @ 2017-02-01 14:18 UTC (permalink / raw)
  To: Jan Blunck
  Cc: Adrien Mazarguil, Thomas Monjalon, Aaron Conole, dev, Stephen Hemminger

On Wed, Feb 01, 2017 at 01:06:03PM +0100, Jan Blunck wrote:
> On Wed, Feb 1, 2017 at 11:54 AM, Adrien Mazarguil
> <adrien.mazarguil@6wind.com> wrote:
> > On Mon, Jan 30, 2017 at 09:19:29PM +0100, Thomas Monjalon wrote:
> >> 2017-01-30 13:38, Aaron Conole:
> >> > Stephen Hemminger <stephen@networkplumber.org> writes:
> >> > > Bruce Richardson <bruce.richardson@intel.com> wrote:
> >> > >> On Fri, Jan 27, 2017 at 08:33:46AM -0800, Stephen Hemminger wrote:
> >> > >> > Why use rte_errno?
> >> > >> > Most DPDK calls just return negative value on error which
> >> > >> > corresponds to error number.
> >> > >> > Are you trying to keep ABI compatibility? Doesn't make sense
> >> > >> > because before all these
> >> > >> > errors were panic's no working application is going to care.
> >> > >>
> >> > >> Either will work, but I actually prefer this way. I view using rte_errno
> >> > >> to be better as it can work in just about all cases, including with
> >> > >> functions which return pointers. This allows you to have a standard
> >> > >> method across all functions for returning error codes, and it only
> >> > >> requires a single sentinal value to indicate error, rather than using a
> >> > >> whole range of values.
> >> > >
> >> > > The problem is DPDK is getting more inconsistent on how this is done.
> >> > > As long as error returns are always same as kernel/glibc errno's it really doesn't
> >> > > matter much which way the value is returned from a technical point of view
> >> > > but the inconsistency is sure to be a usability problem and source of errors.
> >> >
> >> > I am using rte_errno here because I assumed it was the preferred
> >> > method.  In fact, looking at some recently contributed modules (for
> >> > instance pdump), it seems that folks are using it.
> >> >
> >> > I'm not really sure the purpose of having rte_errno if it isn't used, so
> >> > it'd be helpful to know if there's some consensus on reflecting errors
> >> > via this variable, or on returning error codes.  Whichever is the more
> >> > consistent with the way the DPDK project does things, I'm game :).
> >>
> >> I think we can use both return value and rte_errno.
> >> We could try to enforce rte_errno as mandatory everywhere.
> >>
> >> Adrien did the recent rte_flow API.
> >> Please Adrien, could you give your thought?
> >
> > Sure, actually as already pointed out in this thread, both approaches have
> > pros and cons depending on the use-case.
> >
> > Through return value:
> >
> > Pros
> > ----
> >
> > - Most common approach used in DPPK today.
> > - Used internally by the Linux kernel (negative errno) and in the pthreads
> >   library (positive errno).
> > - Avoids the need to access an external, global variable requiring its own
> >   thread-local storage.
> > - Inherently thread-safe and reentrant (i.e. safe with signal handlers).
> > - Returned value is also the error code, two facts reported at once.
> 
> Caller can decide to ignore return value if no error handling is wanted.
>
Not always the case. In the case of a rx or tx burst call, if there is a
negative error that must be checked for or assigned to zero in some
cases to make other logic in the path work sanely, e.g. updating an
array of stats using the return value.

> >
> > Cons
> > ----
> >
> > - Difficult to use with functions returning anything other than signed
> >   integers with negative values having no other meaning.
> > - The returned value must be assigned to a local variable in order not to
> >   discard it and process it later most of the time.
> 
> I believe this is Pro since the rte_errno even needs to assign to a
> thread-local variable even.

No, it's a con, since for errno the value will be preserved in the
absense of other errors. The application can delay handling the error as
long as it wants, in the absense of causes of subsequent errors.

> 
> > - All function calls must be tested for errors.
> 
> The rte_errno needs to do this too to decide if it needs to assign a
> value to rte_errno.
> 
Thats inside the called function, not the application. See my earlier
comment above about having to check your return value is in the valid
"logical range" expected from the call. Having a negative number of
packets received does not make logical sense, so you have to check the
return value when updating stats etc.


> >
> > Through rte_errno:
> >
> > Pros
> > ----
> >
> > - errno-like, well known behavior defined by the C standard and used
> >   everywhere in the C library.
> > - Testing return values is not mandatory, e.g. rte_errno can be initialized
> >   to zero before calling a group of functions and checking its value
> >   afterward (rte_errno is only updated in case of error).
> > - Assigning a local variable to store its value is not necessary as long as
> >   another function that may affect rte_errno is not called.
> >
> > Cons
> > ----
> >
> > - Not fully reentrant, thread-safety is fine for most purposes but signal
> >   handlers affecting it still cause undefined behavior (they must at least
> >   save and restore its value in case they modify it).
> > - Accessing non-local storage may affect CPU cycle-sensitive functions such
> >   as TX/RX burst.
> 
> Actually testing for errors mean you also have to reset the rte_errno
> variable before. That also means you have to access thread-local
> storage twice.
> 
Not true. Your return value still indicates an error via a single
sentinal value. Only in that case do you (the app) access the global value,
to find out the exact error reason.

> Besides that the problem of rte_errno is that you do error handling
> twice because the implementation still needs to check for the error
> condition before assigning a meaningful error value to rte_errno.
> After that again the user code needs to check for the return value to
> decide if looking at rte_errno makes any sense.
> 
Yes, in the case of an error occuring there will be an extra write to a
global variable, and a subsequent read from that value (which should not
be a problem, as the write will have occurred in the same thread).
However, this is irrelevant to normal path processing. Error should be
the exception not the rule.

> 
> > My opinion is that rte_errno is best for control path operations while using
> > the return value makes more sense in the data path. The major issue being
> > that function returning anything other than int (e.g. TX/RX burst) cannot
> > describe any kind of error to the application.
> >
> > I went with both in rte_flow (return + rte_errno) mostly due to the return
> > type of a few functions (e.g. rte_flow_create()) and wanted to keep the API
> > consistent while maintaining compatibility with other DPDK APIs. Note there
> > is little overhead for API functions to set rte_errno _and_ return its
> > value, it's mostly free
+1, and error cases should be rare, even if there is a small cost.
.
> >
> > I think using both is best also because it leaves applications the choice of
> > error-handling method, however if I had to pick one I'd go with rte_errno
> > and standardize on -1 as the default error value (as in the C library).
> >
+1
though I think the sentinal value will vary depending on each case. I would
look to keep the standard packet rx/tx functions and ones like them
returning a zero on any error, to simplify programming logic, and also
because in many cases the only real causes of error they can produce is
from bad parameters.
Functions returning pointers obviously will use NULL as error value.


> > Below are a bunch of use-case examples to illustrate how rte_errno could
> > be convenient to applications.
> >
> > Easily creating many flow rules during init in a all-or-nothing fashion:
> >
> >  rte_errno = 0;
> >  for (i = 0; i != num; ++i)
> >      rule[i] = rte_flow_create(port, ...);
> >  if (unlikely(rte_errno)) {
> >      rte_flow_flush(port);
> >      return -1;
> >  }
> >
> > Complete TX packet burst failure with explanation (could also detect partial
> > failures by initializing rte_errno to 0):
> >
> >  sent = rte_eth_tx_burst(...);
> >  if (unlikely(!sent)) {
> >      switch (rte_errno) {
> >          case E2BIG:
> >              // too many packets in burst
> >          ...
> >          case EMSGSIZE:
> >              // first packet is too large
> >          ...
> >          case ENOBUFS:
> >              // TX queue is full
> >          ...
> >      }
> >  }
> >
> > TX burst functions in PMDs could be modified as follows with minimal impact
> > on their performance and no ABI change:
> >
> >      uint16_t sent = 0;
> >      int error; // new variable
> >
> >      [process burst]
> >      if (unlikely([something went wrong])) { // this check already exists
> >          error = EPROBLEM; // new assignment
> >          goto error; // instead of "return sent"
> >      }
> >      [process burst]
> >      return sent;
> >  error:
> >      rte_errno = error;
> >      return sent;
> >
> > --
> > Adrien Mazarguil
> > 6WIND

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2017-02-01 14:18 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-27 14:56 [PATCH 00/24] linux/eal: Remove most causes of panic on init Aaron Conole
2017-01-27 14:56 ` [PATCH 01/25] eal: CPU init will no longer panic Aaron Conole
2017-01-27 14:56 ` [PATCH 02/25] eal: return error instead of panic for cpu init Aaron Conole
2017-01-27 14:56 ` [PATCH 03/25] eal: No panic on hugepages info init Aaron Conole
2017-01-27 14:56 ` [PATCH 04/25] eal: do not panic on failed hugepage query Aaron Conole
2017-01-27 14:56 ` [PATCH 05/25] eal: failure to parse args returns error Aaron Conole
2017-01-27 14:56 ` [PATCH 06/25] eal-common: introduce a way to query cpu support Aaron Conole
2017-01-27 14:56 ` [PATCH 07/25] eal: Signal error when CPU isn't supported Aaron Conole
2017-01-27 16:27   ` Stephen Hemminger
2017-01-30 16:50     ` Aaron Conole
2017-01-27 14:56 ` [PATCH 08/25] eal: do not panic on memzone initialization fails Aaron Conole
2017-01-27 14:56 ` [PATCH 09/25] eal: set errno when exiting for already called Aaron Conole
2017-01-27 14:56 ` [PATCH 10/25] eal: Do not panic on log failures Aaron Conole
2017-01-27 14:56 ` [PATCH 11/25] eal: Do not panic on pci-probe Aaron Conole
2017-01-27 14:56 ` [PATCH 12/25] eal: do not panic on vfio failure Aaron Conole
2017-01-27 14:56 ` [PATCH 13/25] eal: do not panic on memory init Aaron Conole
2017-01-27 14:56 ` [PATCH 14/25] eal: do not panic on tailq init Aaron Conole
2017-01-27 16:30   ` Stephen Hemminger
2017-01-30 16:51     ` Aaron Conole
2017-01-27 14:56 ` [PATCH 15/25] eal: do not panic on alarm init Aaron Conole
2017-01-27 16:31   ` Stephen Hemminger
2017-01-27 16:42     ` Bruce Richardson
2017-01-30 16:52       ` Aaron Conole
2017-01-27 14:56 ` [PATCH 16/25] eal: convert timer_init not to call panic Aaron Conole
2017-01-27 14:56 ` [PATCH 17/25] eal: change the private pipe call to reflect errno Aaron Conole
2017-01-27 14:56 ` [PATCH 18/25] eal: Do not panic on interrupt thread init Aaron Conole
2017-01-27 14:56 ` [PATCH 19/25] eal: do not error if plugins fail to init Aaron Conole
2017-01-27 14:56 ` [PATCH 20/25] eal_pci: Continue probing even on failures Aaron Conole
2017-01-27 14:56 ` [PATCH 21/25] eal: do not panic on failed PCI probe Aaron Conole
2017-01-27 14:57 ` [PATCH 22/25] eal_common_dev: continue initializing vdevs Aaron Conole
2017-01-27 14:57 ` [PATCH 23/25] eal: do not panic (or abort) if vdev init fails Aaron Conole
2017-01-27 14:57 ` [PATCH 24/25] eal: do not panic when bus probe fails Aaron Conole
2017-01-27 14:57 ` [PATCH 25/25] rte_eal_init: add info about rte_errno codes Aaron Conole
2017-01-27 16:33   ` Stephen Hemminger
2017-01-27 16:47     ` Bruce Richardson
2017-01-27 17:37       ` Stephen Hemminger
2017-01-30 18:38         ` Aaron Conole
2017-01-30 20:19           ` Thomas Monjalon
2017-02-01 10:54             ` Adrien Mazarguil
2017-02-01 12:06               ` Jan Blunck
2017-02-01 14:18                 ` Bruce Richardson
2017-01-31  9:33           ` Bruce Richardson
2017-01-31 16:56             ` Stephen Hemminger
2017-01-27 15:48 ` [PATCH 00/24] linux/eal: Remove most causes of panic on init Aaron Conole

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.