linux-security-module.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] kmod/umh: a few fixes
@ 2020-06-10 15:49 Luis R. Rodriguez
  2020-06-10 15:49 ` [PATCH 1/5] selftests: kmod: Use variable NAME in kmod_test_0001() Luis R. Rodriguez
                   ` (5 more replies)
  0 siblings, 6 replies; 43+ messages in thread
From: Luis R. Rodriguez @ 2020-06-10 15:49 UTC (permalink / raw)
  To: gregkh, viro, philipp.reisner, lars.ellenberg, axboe, bfields,
	chuck.lever, roopa, nikolay, davem, kuba, dhowells,
	jarkko.sakkinen, jmorris, serge, christian.brauner
  Cc: slyfox, ast, keescook, josh, ravenexp, chainsaw, linux-fsdevel,
	linux-nfs, bridge, keyrings, linux-security-module, linux-kernel,
	Luis Chamberlain

From: Luis Chamberlain <mcgrof@kernel.org>

Tiezhu Yang had sent out a patch set with a slew of kmod selftest
fixes, and one patch which modified kmod to return 254 when a module
was not found. This opened up pandora's box about why that was being
used for and low and behold its because when UMH_WAIT_PROC is used
we call a kernel_wait4() call but have never unwrapped the error code.
The commit log for that fix details the rationale for the approach
taken. I'd appreciate some review on that, in particular nfs folks
as it seems a case was never really hit before.

This goes boot tested, selftested with kmod, and 0-day gives its
build blessings.

Luis Chamberlain (2):
  umh: fix processed error when UMH_WAIT_PROC is used
  selftests: simplify kmod failure value

Tiezhu Yang (3):
  selftests: kmod: Use variable NAME in kmod_test_0001()
  kmod: Remove redundant "be an" in the comment
  test_kmod: Avoid potential double free in trigger_config_run_type()

 drivers/block/drbd/drbd_nl.c         | 20 +++++------
 fs/nfsd/nfs4recover.c                |  2 +-
 include/linux/sched/task.h           | 13 ++++++++
 kernel/kmod.c                        |  5 ++-
 kernel/umh.c                         |  4 +--
 lib/test_kmod.c                      |  2 +-
 net/bridge/br_stp_if.c               | 10 ++----
 security/keys/request_key.c          |  2 +-
 tools/testing/selftests/kmod/kmod.sh | 50 +++++++++++++++++++++++-----
 9 files changed, 71 insertions(+), 37 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 1/5] selftests: kmod: Use variable NAME in kmod_test_0001()
  2020-06-10 15:49 [PATCH 0/5] kmod/umh: a few fixes Luis R. Rodriguez
@ 2020-06-10 15:49 ` Luis R. Rodriguez
  2020-06-10 15:49 ` [PATCH 2/5] kmod: Remove redundant "be an" in the comment Luis R. Rodriguez
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 43+ messages in thread
From: Luis R. Rodriguez @ 2020-06-10 15:49 UTC (permalink / raw)
  To: gregkh, viro, philipp.reisner, lars.ellenberg, axboe, bfields,
	chuck.lever, roopa, nikolay, davem, kuba, dhowells,
	jarkko.sakkinen, jmorris, serge, christian.brauner
  Cc: slyfox, ast, keescook, josh, ravenexp, chainsaw, linux-fsdevel,
	linux-nfs, bridge, keyrings, linux-security-module, linux-kernel,
	Tiezhu Yang, Luis Chamberlain

From: Tiezhu Yang <yangtiezhu@loongson.cn>

Use the variable NAME instead of "\000" directly in kmod_test_0001().

Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 tools/testing/selftests/kmod/kmod.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/kmod/kmod.sh b/tools/testing/selftests/kmod/kmod.sh
index 3702dbcc90a7..da60c3bd4f23 100755
--- a/tools/testing/selftests/kmod/kmod.sh
+++ b/tools/testing/selftests/kmod/kmod.sh
@@ -341,7 +341,7 @@ kmod_test_0001_driver()
 
 	kmod_defaults_driver
 	config_num_threads 1
-	printf '\000' >"$DIR"/config_test_driver
+	printf $NAME >"$DIR"/config_test_driver
 	config_trigger ${FUNCNAME[0]}
 	config_expect_result ${FUNCNAME[0]} MODULE_NOT_FOUND
 }
@@ -352,7 +352,7 @@ kmod_test_0001_fs()
 
 	kmod_defaults_fs
 	config_num_threads 1
-	printf '\000' >"$DIR"/config_test_fs
+	printf $NAME >"$DIR"/config_test_fs
 	config_trigger ${FUNCNAME[0]}
 	config_expect_result ${FUNCNAME[0]} -EINVAL
 }
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 2/5] kmod: Remove redundant "be an" in the comment
  2020-06-10 15:49 [PATCH 0/5] kmod/umh: a few fixes Luis R. Rodriguez
  2020-06-10 15:49 ` [PATCH 1/5] selftests: kmod: Use variable NAME in kmod_test_0001() Luis R. Rodriguez
@ 2020-06-10 15:49 ` Luis R. Rodriguez
  2020-06-10 15:49 ` [PATCH 3/5] test_kmod: Avoid potential double free in trigger_config_run_type() Luis R. Rodriguez
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 43+ messages in thread
From: Luis R. Rodriguez @ 2020-06-10 15:49 UTC (permalink / raw)
  To: gregkh, viro, philipp.reisner, lars.ellenberg, axboe, bfields,
	chuck.lever, roopa, nikolay, davem, kuba, dhowells,
	jarkko.sakkinen, jmorris, serge, christian.brauner
  Cc: slyfox, ast, keescook, josh, ravenexp, chainsaw, linux-fsdevel,
	linux-nfs, bridge, keyrings, linux-security-module, linux-kernel,
	Tiezhu Yang, Luis Chamberlain

From: Tiezhu Yang <yangtiezhu@loongson.cn>

There exists redundant "be an" in the comment, remove it.

Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 kernel/kmod.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/kmod.c b/kernel/kmod.c
index 37c3c4b97b8e..3cd075ce2a1e 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -36,9 +36,8 @@
  *
  * If you need less than 50 threads would mean we're dealing with systems
  * smaller than 3200 pages. This assumes you are capable of having ~13M memory,
- * and this would only be an be an upper limit, after which the OOM killer
- * would take effect. Systems like these are very unlikely if modules are
- * enabled.
+ * and this would only be an upper limit, after which the OOM killer would take
+ * effect. Systems like these are very unlikely if modules are enabled.
  */
 #define MAX_KMOD_CONCURRENT 50
 static atomic_t kmod_concurrent_max = ATOMIC_INIT(MAX_KMOD_CONCURRENT);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 3/5] test_kmod: Avoid potential double free in trigger_config_run_type()
  2020-06-10 15:49 [PATCH 0/5] kmod/umh: a few fixes Luis R. Rodriguez
  2020-06-10 15:49 ` [PATCH 1/5] selftests: kmod: Use variable NAME in kmod_test_0001() Luis R. Rodriguez
  2020-06-10 15:49 ` [PATCH 2/5] kmod: Remove redundant "be an" in the comment Luis R. Rodriguez
@ 2020-06-10 15:49 ` Luis R. Rodriguez
  2020-06-10 15:49 ` [PATCH 4/5] umh: fix processed error when UMH_WAIT_PROC is used Luis R. Rodriguez
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 43+ messages in thread
From: Luis R. Rodriguez @ 2020-06-10 15:49 UTC (permalink / raw)
  To: gregkh, viro, philipp.reisner, lars.ellenberg, axboe, bfields,
	chuck.lever, roopa, nikolay, davem, kuba, dhowells,
	jarkko.sakkinen, jmorris, serge, christian.brauner
  Cc: slyfox, ast, keescook, josh, ravenexp, chainsaw, linux-fsdevel,
	linux-nfs, bridge, keyrings, linux-security-module, linux-kernel,
	Tiezhu Yang, Luis Chamberlain

From: Tiezhu Yang <yangtiezhu@loongson.cn>

Reset the member "test_fs" of the test configuration after a call
of the function "kfree_const" to a null pointer so that a double
memory release will not be performed.

Fixes: d9c6a72d6fa2 ("kmod: add test driver to stress test the module loader")
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 lib/test_kmod.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/test_kmod.c b/lib/test_kmod.c
index e651c37d56db..eab52770070d 100644
--- a/lib/test_kmod.c
+++ b/lib/test_kmod.c
@@ -745,7 +745,7 @@ static int trigger_config_run_type(struct kmod_test_device *test_dev,
 		break;
 	case TEST_KMOD_FS_TYPE:
 		kfree_const(config->test_fs);
-		config->test_driver = NULL;
+		config->test_fs = NULL;
 		copied = config_copy_test_fs(config, test_str,
 					     strlen(test_str));
 		break;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 4/5] umh: fix processed error when UMH_WAIT_PROC is used
  2020-06-10 15:49 [PATCH 0/5] kmod/umh: a few fixes Luis R. Rodriguez
                   ` (2 preceding siblings ...)
  2020-06-10 15:49 ` [PATCH 3/5] test_kmod: Avoid potential double free in trigger_config_run_type() Luis R. Rodriguez
@ 2020-06-10 15:49 ` Luis R. Rodriguez
  2020-06-23 14:11   ` linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected) Christian Borntraeger
  2020-06-10 15:49 ` [PATCH 5/5] selftests: simplify kmod failure value Luis R. Rodriguez
  2020-06-18  0:43 ` [PATCH 0/5] kmod/umh: a few fixes Andrew Morton
  5 siblings, 1 reply; 43+ messages in thread
From: Luis R. Rodriguez @ 2020-06-10 15:49 UTC (permalink / raw)
  To: gregkh, viro, philipp.reisner, lars.ellenberg, axboe, bfields,
	chuck.lever, roopa, nikolay, davem, kuba, dhowells,
	jarkko.sakkinen, jmorris, serge, christian.brauner
  Cc: slyfox, ast, keescook, josh, ravenexp, chainsaw, linux-fsdevel,
	linux-nfs, bridge, keyrings, linux-security-module, linux-kernel,
	Luis Chamberlain, Tiezhu Yang

From: Luis Chamberlain <mcgrof@kernel.org>

When UMH_WAIT_PROC is used we call kernel_wait4(). This is
the *only* place in the kernel where we actually inspect the
error code. Prior to this patch we returned the value from the
wait call, and that technically requires us to use wrappers
such as WEXITSTATUS(). We either fix all callers to start
using WEXITSTATUS() and friends *or* we do address this within
the umh code and let the callers get the actual error code.

The way we use kernel_wait4() on the umh is with the options
set to 0, and when this is done the wait call only waits for
terminated children. Because of this, there is no point to
complicate checks for the umh with W*() calls. That would
make the checks complex, redundant, and simply not needed.

By making the umh do the checks for us we keep users
kernel_wait4() at bay, and promote avoiding introduction of
further W*() macros and the complexities this can bring.

There were only a few callers which properly checked for
the error status using open-coded solutions. We remove
them as they are no longer needed, and also remove open
coded implicit uses of W*() uses which should never
trigger given that the options passed to wait is 0.

The only helpers we really need are for termination, so we
just include those, and we prefix our W*() helpers with K.

Since all this does is *correct* an error code, if one
was found, this change only fixes reporting the *correct*
error, and there are two places where this matters, and
which this patch fixes:

  * request_module() used to fail with an error code of
    256 when a module was not found. Now it properly
    returns 1.

  * fs/nfsd/nfs4recover.c: we never were disabling the
    upcall as the error code of -ENOENT or -EACCES was
    *never* properly checked for.

Reported-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 drivers/block/drbd/drbd_nl.c | 20 ++++++++------------
 fs/nfsd/nfs4recover.c        |  2 +-
 include/linux/sched/task.h   | 13 +++++++++++++
 kernel/umh.c                 |  4 ++--
 net/bridge/br_stp_if.c       | 10 ++--------
 security/keys/request_key.c  |  2 +-
 6 files changed, 27 insertions(+), 24 deletions(-)

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index da4a3ebe04ef..aee272e620b9 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -382,13 +382,11 @@ int drbd_khelper(struct drbd_device *device, char *cmd)
 	notify_helper(NOTIFY_CALL, device, connection, cmd, 0);
 	ret = call_usermodehelper(drbd_usermode_helper, argv, envp, UMH_WAIT_PROC);
 	if (ret)
-		drbd_warn(device, "helper command: %s %s %s exit code %u (0x%x)\n",
-				drbd_usermode_helper, cmd, mb,
-				(ret >> 8) & 0xff, ret);
+		drbd_warn(device, "helper command: %s %s %s failed with exit code %u (0x%x)\n",
+				drbd_usermode_helper, cmd, mb, ret, ret);
 	else
-		drbd_info(device, "helper command: %s %s %s exit code %u (0x%x)\n",
-				drbd_usermode_helper, cmd, mb,
-				(ret >> 8) & 0xff, ret);
+		drbd_info(device, "helper command: %s %s %s completed successfully\n",
+				drbd_usermode_helper, cmd, mb);
 	sib.sib_reason = SIB_HELPER_POST;
 	sib.helper_exit_code = ret;
 	drbd_bcast_event(device, &sib);
@@ -424,13 +422,11 @@ enum drbd_peer_state conn_khelper(struct drbd_connection *connection, char *cmd)
 
 	ret = call_usermodehelper(drbd_usermode_helper, argv, envp, UMH_WAIT_PROC);
 	if (ret)
-		drbd_warn(connection, "helper command: %s %s %s exit code %u (0x%x)\n",
-			  drbd_usermode_helper, cmd, resource_name,
-			  (ret >> 8) & 0xff, ret);
+		drbd_warn(connection, "helper command: %s %s %s failed with exit code %u (0x%x)\n",
+			  drbd_usermode_helper, cmd, resource_name, ret, ret);
 	else
-		drbd_info(connection, "helper command: %s %s %s exit code %u (0x%x)\n",
-			  drbd_usermode_helper, cmd, resource_name,
-			  (ret >> 8) & 0xff, ret);
+		drbd_info(connection, "helper command: %s %s %s completed successfully\n",
+			  drbd_usermode_helper, cmd, resource_name);
 	/* TODO: conn_bcast_event() ?? */
 	notify_helper(NOTIFY_RESPONSE, NULL, connection, cmd, ret);
 
diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
index 9e40dfecf1b1..33e6a7fd7961 100644
--- a/fs/nfsd/nfs4recover.c
+++ b/fs/nfsd/nfs4recover.c
@@ -1820,7 +1820,7 @@ nfsd4_umh_cltrack_upcall(char *cmd, char *arg, char *env0, char *env1)
 
 	ret = call_usermodehelper(argv[0], argv, envp, UMH_WAIT_PROC);
 	/*
-	 * Disable the upcall mechanism if we're getting an ENOENT or EACCES
+	 * Disable the upcall mechanism if we're getting an -ENOENT or -EACCES
 	 * error. The admin can re-enable it on the fly by using sysfs
 	 * once the problem has been fixed.
 	 */
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index 38359071236a..bba06befbff5 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -103,6 +103,19 @@ struct mm_struct *copy_init_mm(void);
 extern pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags);
 extern long kernel_wait4(pid_t, int __user *, int, struct rusage *);
 
+/* Only add helpers for actual use cases in the kernel */
+#define KWEXITSTATUS(status)		(__KWEXITSTATUS(status))
+#define KWIFEXITED(status)		(__KWIFEXITED(status))
+
+/* Nonzero if STATUS indicates normal termination.  */
+#define __KWIFEXITED(status)     (__KWTERMSIG(status) == 0)
+
+/* If KWIFEXITED(STATUS), the low-order 8 bits of the status.  */
+#define __KWEXITSTATUS(status)   (((status) & 0xff00) >> 8)
+
+/* If KWIFSIGNALED(STATUS), the terminating signal.  */
+#define __KWTERMSIG(status)      ((status) & 0x7f)
+
 extern void free_task(struct task_struct *tsk);
 
 /* sched_exec is called by processes performing an exec */
diff --git a/kernel/umh.c b/kernel/umh.c
index 79f139a7ca03..f81e8698e36e 100644
--- a/kernel/umh.c
+++ b/kernel/umh.c
@@ -154,8 +154,8 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info)
 		 * the real error code is already in sub_info->retval or
 		 * sub_info->retval is 0 anyway, so don't mess with it then.
 		 */
-		if (ret)
-			sub_info->retval = ret;
+		if (KWIFEXITED(ret))
+			sub_info->retval = KWEXITSTATUS(ret);
 	}
 
 	/* Restore default kernel sig handler */
diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c
index ba55851fe132..bdd94b45396b 100644
--- a/net/bridge/br_stp_if.c
+++ b/net/bridge/br_stp_if.c
@@ -133,14 +133,8 @@ static int br_stp_call_user(struct net_bridge *br, char *arg)
 
 	/* call userspace STP and report program errors */
 	rc = call_usermodehelper(BR_STP_PROG, argv, envp, UMH_WAIT_PROC);
-	if (rc > 0) {
-		if (rc & 0xff)
-			br_debug(br, BR_STP_PROG " received signal %d\n",
-				 rc & 0x7f);
-		else
-			br_debug(br, BR_STP_PROG " exited with code %d\n",
-				 (rc >> 8) & 0xff);
-	}
+	if (rc != 0)
+		br_debug(br, BR_STP_PROG " failed with exit code %d\n", rc);
 
 	return rc;
 }
diff --git a/security/keys/request_key.c b/security/keys/request_key.c
index e1b9f1a80676..ff462f3d46ca 100644
--- a/security/keys/request_key.c
+++ b/security/keys/request_key.c
@@ -193,7 +193,7 @@ static int call_sbin_request_key(struct key *authkey, void *aux)
 	ret = call_usermodehelper_keys(request_key, argv, envp, keyring,
 				       UMH_WAIT_PROC);
 	kdebug("usermode -> 0x%x", ret);
-	if (ret >= 0) {
+	if (ret != 0) {
 		/* ret is the exit/wait code */
 		if (test_bit(KEY_FLAG_USER_CONSTRUCT, &key->flags) ||
 		    key_validate(key) < 0)
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 5/5] selftests: simplify kmod failure value
  2020-06-10 15:49 [PATCH 0/5] kmod/umh: a few fixes Luis R. Rodriguez
                   ` (3 preceding siblings ...)
  2020-06-10 15:49 ` [PATCH 4/5] umh: fix processed error when UMH_WAIT_PROC is used Luis R. Rodriguez
@ 2020-06-10 15:49 ` Luis R. Rodriguez
  2020-06-18  0:43 ` [PATCH 0/5] kmod/umh: a few fixes Andrew Morton
  5 siblings, 0 replies; 43+ messages in thread
From: Luis R. Rodriguez @ 2020-06-10 15:49 UTC (permalink / raw)
  To: gregkh, viro, philipp.reisner, lars.ellenberg, axboe, bfields,
	chuck.lever, roopa, nikolay, davem, kuba, dhowells,
	jarkko.sakkinen, jmorris, serge, christian.brauner
  Cc: slyfox, ast, keescook, josh, ravenexp, chainsaw, linux-fsdevel,
	linux-nfs, bridge, keyrings, linux-security-module, linux-kernel,
	Luis Chamberlain, Tiezhu Yang

From: Luis Chamberlain <mcgrof@kernel.org>

The "odd" 256 value was just an issue with the umh never
wrapping it around with WEXITSTATUS() for us. Now that it
does that, we can use a sane value / name for the selftest,
and this is no longer a oddity.

We add a way to detect this for older kernels, and support
the old return value for kernel code where it was given.

This never affected userspace.

Reported-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 tools/testing/selftests/kmod/kmod.sh | 46 +++++++++++++++++++++++-----
 1 file changed, 39 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/kmod/kmod.sh b/tools/testing/selftests/kmod/kmod.sh
index da60c3bd4f23..df7b21d8561c 100755
--- a/tools/testing/selftests/kmod/kmod.sh
+++ b/tools/testing/selftests/kmod/kmod.sh
@@ -64,6 +64,8 @@ ALL_TESTS="$ALL_TESTS 0009:150:1"
 ALL_TESTS="$ALL_TESTS 0010:1:1"
 ALL_TESTS="$ALL_TESTS 0011:1:1"
 
+MODULE_NOT_FOUND="FAILURE"
+
 # Kselftest framework requirement - SKIP code is 4.
 ksft_skip=4
 
@@ -155,14 +157,19 @@ test_finish()
 	echo "Test completed"
 }
 
+# OLD_FAILURE is just because the old kernel umh never wrapped
+# the error with WEXITSTATUS(). Now that it does it, we get the
+# appropriate actual value from userspace observed in-kernel.
+
+# We keep the old mapping to ensure this script keeps working
+# with older kernels.
 errno_name_to_val()
 {
 	case "$1" in
-	# kmod calls modprobe and upon of a module not found
-	# modprobe returns just 1... However in the kernel we
-	# *sometimes* see 256...
-	MODULE_NOT_FOUND)
+	OLD_FAILURE)
 		echo 256;;
+	FAILURE)
+		echo 1;;
 	SUCCESS)
 		echo 0;;
 	-EPERM)
@@ -181,7 +188,9 @@ errno_name_to_val()
 errno_val_to_name()
 	case "$1" in
 	256)
-		echo MODULE_NOT_FOUND;;
+		echo OLD_FAILURE;;
+	1)
+		echo FAILURE;;
 	0)
 		echo SUCCESS;;
 	-1)
@@ -335,6 +344,28 @@ kmod_defaults_fs()
 	config_set_test_case_fs
 }
 
+check_umh()
+{
+	NAME=''
+
+	kmod_defaults_driver
+	config_num_threads 1
+	printf '\0' >"$DIR"/config_test_driver
+	config_trigger ${FUNCNAME[0]}
+	RC=$(config_get_test_result)
+	if [[ "$RC" == "256" ]]; then
+		MODULE_NOT_FOUND="OLD_FAILURE"
+		echo "check_umh: you have and old umh which didn't wrap errors"
+		echo "           with WEXITSTATUS(). This is OK!"
+	elif [[ "$RC" != "1" ]]; then
+		echo "check_umh: Unexpected return value with no modprobe argument: $RC"
+		exit
+	else
+		echo "check_umh: You have a new umh which wraps erros with"
+		echo "           WEXITSTATUS(). This is OK!"
+	fi
+}
+
 kmod_test_0001_driver()
 {
 	NAME='\000'
@@ -343,7 +374,7 @@ kmod_test_0001_driver()
 	config_num_threads 1
 	printf $NAME >"$DIR"/config_test_driver
 	config_trigger ${FUNCNAME[0]}
-	config_expect_result ${FUNCNAME[0]} MODULE_NOT_FOUND
+	config_expect_result ${FUNCNAME[0]} $MODULE_NOT_FOUND
 }
 
 kmod_test_0001_fs()
@@ -371,7 +402,7 @@ kmod_test_0002_driver()
 	config_set_driver $NAME
 	config_num_threads 1
 	config_trigger ${FUNCNAME[0]}
-	config_expect_result ${FUNCNAME[0]} MODULE_NOT_FOUND
+	config_expect_result ${FUNCNAME[0]} $MODULE_NOT_FOUND
 }
 
 kmod_test_0002_fs()
@@ -648,6 +679,7 @@ load_req_mod
 MODPROBE=$(</proc/sys/kernel/modprobe)
 trap "test_finish" EXIT
 
+check_umh
 parse_args $@
 
 exit 0
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH 0/5] kmod/umh: a few fixes
  2020-06-10 15:49 [PATCH 0/5] kmod/umh: a few fixes Luis R. Rodriguez
                   ` (4 preceding siblings ...)
  2020-06-10 15:49 ` [PATCH 5/5] selftests: simplify kmod failure value Luis R. Rodriguez
@ 2020-06-18  0:43 ` Andrew Morton
  2020-06-19 20:46   ` Luis Chamberlain
  5 siblings, 1 reply; 43+ messages in thread
From: Andrew Morton @ 2020-06-18  0:43 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: gregkh, viro, philipp.reisner, lars.ellenberg, axboe, bfields,
	chuck.lever, roopa, nikolay, davem, kuba, dhowells,
	jarkko.sakkinen, jmorris, serge, christian.brauner, slyfox, ast,
	keescook, josh, ravenexp, chainsaw, linux-fsdevel, linux-nfs,
	bridge, keyrings, linux-security-module, linux-kernel

On Wed, 10 Jun 2020 15:49:18 +0000 "Luis R. Rodriguez" <mcgrof@kernel.org> wrote:

> Tiezhu Yang had sent out a patch set with a slew of kmod selftest
> fixes, and one patch which modified kmod to return 254 when a module
> was not found. This opened up pandora's box about why that was being
> used for and low and behold its because when UMH_WAIT_PROC is used
> we call a kernel_wait4() call but have never unwrapped the error code.
> The commit log for that fix details the rationale for the approach
> taken. I'd appreciate some review on that, in particular nfs folks
> as it seems a case was never really hit before.
> 
> This goes boot tested, selftested with kmod, and 0-day gives its
> build blessings.

Any thoughts on which kernel version(s) need some/all of these fixes?

>  drivers/block/drbd/drbd_nl.c         | 20 +++++------
>  fs/nfsd/nfs4recover.c                |  2 +-
>  include/linux/sched/task.h           | 13 ++++++++
>  kernel/kmod.c                        |  5 ++-
>  kernel/umh.c                         |  4 +--
>  lib/test_kmod.c                      |  2 +-
>  net/bridge/br_stp_if.c               | 10 ++----
>  security/keys/request_key.c          |  2 +-
>  tools/testing/selftests/kmod/kmod.sh | 50 +++++++++++++++++++++++-----

I'm not really sure who takes kmod changes - I'll grab these unless
someone shouts at me.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 0/5] kmod/umh: a few fixes
  2020-06-18  0:43 ` [PATCH 0/5] kmod/umh: a few fixes Andrew Morton
@ 2020-06-19 20:46   ` Luis Chamberlain
  2020-06-19 21:07     ` Luis Chamberlain
  0 siblings, 1 reply; 43+ messages in thread
From: Luis Chamberlain @ 2020-06-19 20:46 UTC (permalink / raw)
  Cc: gregkh, viro, philipp.reisner, lars.ellenberg, axboe, roopa,
	nikolay, davem, kuba, dhowells, jarkko.sakkinen, jmorris, serge,
	christian.brauner, slyfox, ast, keescook, josh, ravenexp,
	chainsaw, linux-fsdevel, bridge, keyrings, linux-security-module,
	linux-kernel

On Wed, Jun 17, 2020 at 05:43:48PM -0700, Andrew Morton wrote:
> On Wed, 10 Jun 2020 15:49:18 +0000 "Luis R. Rodriguez" <mcgrof@kernel.org> wrote:
> 
> > Tiezhu Yang had sent out a patch set with a slew of kmod selftest
> > fixes, and one patch which modified kmod to return 254 when a module
> > was not found. This opened up pandora's box about why that was being
> > used for and low and behold its because when UMH_WAIT_PROC is used
> > we call a kernel_wait4() call but have never unwrapped the error code.
> > The commit log for that fix details the rationale for the approach
> > taken. I'd appreciate some review on that, in particular nfs folks
> > as it seems a case was never really hit before.
> > 
> > This goes boot tested, selftested with kmod, and 0-day gives its
> > build blessings.
> 
> Any thoughts on which kernel version(s) need some/all of these fixes?

Well, in so far as fixes, this is the real important part:

* request_module() used to fail with an error code of
  256 when a module was not found. Now it properly
  returns 1.

* fs/nfsd/nfs4recover.c: we never were disabling the
  upcall as the error code of -ENOENT or -EACCES was
  *never* properly checked for error code

Since the request_module() fix is only affecting userspace
for the kmod tests, through the kmod test driver, ie, we don't expose
this to userspace in any other place, I don't see that as critical.
Let me be clear, we have a test_kmod driver which exposes knobs
and one of the knobs lets userspace query the return value of a
request_module() call, and we use this test_kmod driver to stress
test kmod loader. Let us also recall that the fix is *iff* an error
*did* occur. I *cannot* think of a reason why this would be critical
to merge to older stable kernels for this reason for request_module()'s
sake.

Bruce, Chuck:

But... for NFS... I'd like the NFS folks to really look at that
and tell us is some folks really should care about that. I also
find it perplexing there was a comment in place there to *ensure*
the error was checked for, and so it seemed someone cared for that
condition.

> >  drivers/block/drbd/drbd_nl.c         | 20 +++++------
> >  fs/nfsd/nfs4recover.c                |  2 +-
> >  include/linux/sched/task.h           | 13 ++++++++
> >  kernel/kmod.c                        |  5 ++-
> >  kernel/umh.c                         |  4 +--
> >  lib/test_kmod.c                      |  2 +-
> >  net/bridge/br_stp_if.c               | 10 ++----
> >  security/keys/request_key.c          |  2 +-
> >  tools/testing/selftests/kmod/kmod.sh | 50 +++++++++++++++++++++++-----
> 
> I'm not really sure who takes kmod changes - I'll grab these unless
> someone shouts at me.

Greg usually takes it, but as usual, thanks for picking up the slack ;)

  Luis

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 0/5] kmod/umh: a few fixes
  2020-06-19 20:46   ` Luis Chamberlain
@ 2020-06-19 21:07     ` Luis Chamberlain
  0 siblings, 0 replies; 43+ messages in thread
From: Luis Chamberlain @ 2020-06-19 21:07 UTC (permalink / raw)
  To: Andrew Morton, J. Bruce Fields, Chuck Lever, linux-nfs,
	Christoph Hellwig, Jessica Yu
  Cc: gregkh, viro, philipp.reisner, lars.ellenberg, axboe, roopa,
	nikolay, davem, kuba, dhowells, jarkko.sakkinen, jmorris, serge,
	christian.brauner, slyfox, ast, keescook, josh, ravenexp,
	chainsaw, linux-fsdevel, bridge, keyrings, linux-security-module,
	linux-kernel

Sorry it seems mutt ate my To:, so adding the folks I intended to
address on the To: field now :)

  Luis

On Fri, Jun 19, 2020 at 08:46:26PM +0000, Luis Chamberlain wrote:
> On Wed, Jun 17, 2020 at 05:43:48PM -0700, Andrew Morton wrote:
> > On Wed, 10 Jun 2020 15:49:18 +0000 "Luis R. Rodriguez" <mcgrof@kernel.org> wrote:
> > 
> > > Tiezhu Yang had sent out a patch set with a slew of kmod selftest
> > > fixes, and one patch which modified kmod to return 254 when a module
> > > was not found. This opened up pandora's box about why that was being
> > > used for and low and behold its because when UMH_WAIT_PROC is used
> > > we call a kernel_wait4() call but have never unwrapped the error code.
> > > The commit log for that fix details the rationale for the approach
> > > taken. I'd appreciate some review on that, in particular nfs folks
> > > as it seems a case was never really hit before.
> > > 
> > > This goes boot tested, selftested with kmod, and 0-day gives its
> > > build blessings.
> > 
> > Any thoughts on which kernel version(s) need some/all of these fixes?
> 
> Well, in so far as fixes, this is the real important part:
> 
> * request_module() used to fail with an error code of
>   256 when a module was not found. Now it properly
>   returns 1.
> 
> * fs/nfsd/nfs4recover.c: we never were disabling the
>   upcall as the error code of -ENOENT or -EACCES was
>   *never* properly checked for error code
> 
> Since the request_module() fix is only affecting userspace
> for the kmod tests, through the kmod test driver, ie, we don't expose
> this to userspace in any other place, I don't see that as critical.
> Let me be clear, we have a test_kmod driver which exposes knobs
> and one of the knobs lets userspace query the return value of a
> request_module() call, and we use this test_kmod driver to stress
> test kmod loader. Let us also recall that the fix is *iff* an error
> *did* occur. I *cannot* think of a reason why this would be critical
> to merge to older stable kernels for this reason for request_module()'s
> sake.
> 
> Bruce, Chuck:
> 
> But... for NFS... I'd like the NFS folks to really look at that
> and tell us is some folks really should care about that. I also
> find it perplexing there was a comment in place there to *ensure*
> the error was checked for, and so it seemed someone cared for that
> condition.
> 
> > >  drivers/block/drbd/drbd_nl.c         | 20 +++++------
> > >  fs/nfsd/nfs4recover.c                |  2 +-
> > >  include/linux/sched/task.h           | 13 ++++++++
> > >  kernel/kmod.c                        |  5 ++-
> > >  kernel/umh.c                         |  4 +--
> > >  lib/test_kmod.c                      |  2 +-
> > >  net/bridge/br_stp_if.c               | 10 ++----
> > >  security/keys/request_key.c          |  2 +-
> > >  tools/testing/selftests/kmod/kmod.sh | 50 +++++++++++++++++++++++-----
> > 
> > I'm not really sure who takes kmod changes - I'll grab these unless
> > someone shouts at me.
> 
> Greg usually takes it, but as usual, thanks for picking up the slack ;)
> 
>   Luis

^ permalink raw reply	[flat|nested] 43+ messages in thread

* linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
  2020-06-10 15:49 ` [PATCH 4/5] umh: fix processed error when UMH_WAIT_PROC is used Luis R. Rodriguez
@ 2020-06-23 14:11   ` Christian Borntraeger
  2020-06-23 14:23     ` Christian Borntraeger
  0 siblings, 1 reply; 43+ messages in thread
From: Christian Borntraeger @ 2020-06-23 14:11 UTC (permalink / raw)
  To: mcgrof
  Cc: ast, axboe, bfields, bridge, chainsaw, christian.brauner,
	chuck.lever, davem, dhowells, gregkh, jarkko.sakkinen, jmorris,
	josh, keescook, keyrings, kuba, lars.ellenberg, linux-fsdevel,
	linux-kernel, linux-nfs, linux-security-module, nikolay,
	philipp.reisner, ravenexp, roopa, serge, slyfox, viro,
	yangtiezhu, netdev, markward, linux-s390

Jens Markwardt reported a regression in the linux-next runs.  with "umh: fix
processed error when UMH_WAIT_PROC is used" (from linux-next) a linux bridge
with an KVM guests no longer activates :

without patch
# ip addr show dev virbr1
6: virbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:1e:3f:c0 brd ff:ff:ff:ff:ff:ff
    inet 192.168.254.254/24 brd 192.168.254.255 scope global virbr1
       valid_lft forever preferred_lft forever

with this patch the bridge stays DOWN with NO-CARRIER

# ip addr show dev virbr1
6: virbr1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 52:54:00:1e:3f:c0 brd ff:ff:ff:ff:ff:ff
    inet 192.168.254.254/24 brd 192.168.254.255 scope global virbr1
       valid_lft forever preferred_lft forever

This was bisected in linux-next. Reverting from linux-next also fixes the issue.

Any idea?

Christian

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
  2020-06-23 14:11   ` linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected) Christian Borntraeger
@ 2020-06-23 14:23     ` Christian Borntraeger
  2020-06-24 11:11       ` Christian Borntraeger
  0 siblings, 1 reply; 43+ messages in thread
From: Christian Borntraeger @ 2020-06-23 14:23 UTC (permalink / raw)
  To: mcgrof
  Cc: ast, axboe, bfields, bridge, chainsaw, christian.brauner,
	chuck.lever, davem, dhowells, gregkh, jarkko.sakkinen, jmorris,
	josh, keescook, keyrings, kuba, lars.ellenberg, linux-fsdevel,
	linux-kernel, linux-nfs, linux-security-module, nikolay,
	philipp.reisner, ravenexp, roopa, serge, slyfox, viro,
	yangtiezhu, netdev, markward, linux-s390



On 23.06.20 16:11, Christian Borntraeger wrote:
> Jens Markwardt reported a regression in the linux-next runs.  with "umh: fix
> processed error when UMH_WAIT_PROC is used" (from linux-next) a linux bridge
> with an KVM guests no longer activates :
> 
> without patch
> # ip addr show dev virbr1
> 6: virbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
>     link/ether 52:54:00:1e:3f:c0 brd ff:ff:ff:ff:ff:ff
>     inet 192.168.254.254/24 brd 192.168.254.255 scope global virbr1
>        valid_lft forever preferred_lft forever
> 
> with this patch the bridge stays DOWN with NO-CARRIER
> 
> # ip addr show dev virbr1
> 6: virbr1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
>     link/ether 52:54:00:1e:3f:c0 brd ff:ff:ff:ff:ff:ff
>     inet 192.168.254.254/24 brd 192.168.254.255 scope global virbr1
>        valid_lft forever preferred_lft forever
> 
> This was bisected in linux-next. Reverting from linux-next also fixes the issue.
> 
> Any idea?

FWIW, s390 is big endian. Maybe some of the shifts inn the __KW* macros are wrong.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
  2020-06-23 14:23     ` Christian Borntraeger
@ 2020-06-24 11:11       ` Christian Borntraeger
  2020-06-24 12:05         ` Luis Chamberlain
  2020-06-24 14:43         ` Christoph Hellwig
  0 siblings, 2 replies; 43+ messages in thread
From: Christian Borntraeger @ 2020-06-24 11:11 UTC (permalink / raw)
  To: mcgrof
  Cc: ast, axboe, bfields, bridge, chainsaw, christian.brauner,
	chuck.lever, davem, dhowells, gregkh, jarkko.sakkinen, jmorris,
	josh, keescook, keyrings, kuba, lars.ellenberg, linux-fsdevel,
	linux-kernel, linux-nfs, linux-security-module, nikolay,
	philipp.reisner, ravenexp, roopa, serge, slyfox, viro,
	yangtiezhu, netdev, markward, linux-s390



On 23.06.20 16:23, Christian Borntraeger wrote:
> 
> 
> On 23.06.20 16:11, Christian Borntraeger wrote:
>> Jens Markwardt reported a regression in the linux-next runs.  with "umh: fix
>> processed error when UMH_WAIT_PROC is used" (from linux-next) a linux bridge
>> with an KVM guests no longer activates :
>>
>> without patch
>> # ip addr show dev virbr1
>> 6: virbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
>>     link/ether 52:54:00:1e:3f:c0 brd ff:ff:ff:ff:ff:ff
>>     inet 192.168.254.254/24 brd 192.168.254.255 scope global virbr1
>>        valid_lft forever preferred_lft forever
>>
>> with this patch the bridge stays DOWN with NO-CARRIER
>>
>> # ip addr show dev virbr1
>> 6: virbr1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
>>     link/ether 52:54:00:1e:3f:c0 brd ff:ff:ff:ff:ff:ff
>>     inet 192.168.254.254/24 brd 192.168.254.255 scope global virbr1
>>        valid_lft forever preferred_lft forever
>>
>> This was bisected in linux-next. Reverting from linux-next also fixes the issue.
>>
>> Any idea?
> 
> FWIW, s390 is big endian. Maybe some of the shifts inn the __KW* macros are wrong.

Does anyone have an idea why "umh: fix processed error when UMH_WAIT_PROC is used" breaks the
linux-bridge on s390?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
  2020-06-24 11:11       ` Christian Borntraeger
@ 2020-06-24 12:05         ` Luis Chamberlain
  2020-06-24 13:17           ` Luis Chamberlain
  2020-06-24 14:43         ` Christoph Hellwig
  1 sibling, 1 reply; 43+ messages in thread
From: Luis Chamberlain @ 2020-06-24 12:05 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: ast, axboe, bfields, bridge, chainsaw, christian.brauner,
	chuck.lever, davem, dhowells, gregkh, jarkko.sakkinen, jmorris,
	josh, keescook, keyrings, kuba, lars.ellenberg, linux-fsdevel,
	linux-kernel, linux-nfs, linux-security-module, nikolay,
	philipp.reisner, ravenexp, roopa, serge, slyfox, viro,
	yangtiezhu, netdev, markward, linux-s390

On Wed, Jun 24, 2020 at 01:11:54PM +0200, Christian Borntraeger wrote:
> 
> 
> On 23.06.20 16:23, Christian Borntraeger wrote:
> > 
> > 
> > On 23.06.20 16:11, Christian Borntraeger wrote:
> >> Jens Markwardt reported a regression in the linux-next runs.  with "umh: fix
> >> processed error when UMH_WAIT_PROC is used" (from linux-next) a linux bridge
> >> with an KVM guests no longer activates :
> >>
> >> without patch
> >> # ip addr show dev virbr1
> >> 6: virbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
> >>     link/ether 52:54:00:1e:3f:c0 brd ff:ff:ff:ff:ff:ff
> >>     inet 192.168.254.254/24 brd 192.168.254.255 scope global virbr1
> >>        valid_lft forever preferred_lft forever
> >>
> >> with this patch the bridge stays DOWN with NO-CARRIER
> >>
> >> # ip addr show dev virbr1
> >> 6: virbr1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
> >>     link/ether 52:54:00:1e:3f:c0 brd ff:ff:ff:ff:ff:ff
> >>     inet 192.168.254.254/24 brd 192.168.254.255 scope global virbr1
> >>        valid_lft forever preferred_lft forever
> >>
> >> This was bisected in linux-next. Reverting from linux-next also fixes the issue.
> >>
> >> Any idea?
> > 
> > FWIW, s390 is big endian. Maybe some of the shifts inn the __KW* macros are wrong.
> 
> Does anyone have an idea why "umh: fix processed error when UMH_WAIT_PROC is used" breaks the
> linux-bridge on s390?

glibc for instance defines __WEXITSTATUS in only one location: bits/waitstatus.h
and it does not special case it per architecture, so at this point I'd
have to say we have to look somewhere else for why this is happening.

The commmit which caused this is issuing a correct error code down the
pipeline, nothing more. I'll make taking a look at this a priority right
now. Let us see what I come up with today.

  Luis

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
  2020-06-24 12:05         ` Luis Chamberlain
@ 2020-06-24 13:17           ` Luis Chamberlain
  2020-06-24 16:13             ` Luis Chamberlain
  0 siblings, 1 reply; 43+ messages in thread
From: Luis Chamberlain @ 2020-06-24 13:17 UTC (permalink / raw)
  To: Christian Borntraeger, Andrew Morton, Martin Doucha
  Cc: ast, axboe, bfields, bridge, chainsaw, christian.brauner,
	chuck.lever, davem, dhowells, gregkh, jarkko.sakkinen, jmorris,
	josh, keescook, keyrings, kuba, lars.ellenberg, linux-fsdevel,
	linux-kernel, linux-nfs, linux-security-module, nikolay,
	philipp.reisner, ravenexp, roopa, serge, slyfox, viro,
	yangtiezhu, netdev, markward, linux-s390

Martin, your eyeballs would be appreciated for a bit on this.

On Wed, Jun 24, 2020 at 12:05:46PM +0000, Luis Chamberlain wrote:
> On Wed, Jun 24, 2020 at 01:11:54PM +0200, Christian Borntraeger wrote:
> > 
> > 
> > On 23.06.20 16:23, Christian Borntraeger wrote:
> > > 
> > > 
> > > On 23.06.20 16:11, Christian Borntraeger wrote:
> > >> Jens Markwardt reported a regression in the linux-next runs.  with "umh: fix
> > >> processed error when UMH_WAIT_PROC is used" (from linux-next) a linux bridge
> > >> with an KVM guests no longer activates :
> > >>
> > >> without patch
> > >> # ip addr show dev virbr1
> > >> 6: virbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
> > >>     link/ether 52:54:00:1e:3f:c0 brd ff:ff:ff:ff:ff:ff
> > >>     inet 192.168.254.254/24 brd 192.168.254.255 scope global virbr1
> > >>        valid_lft forever preferred_lft forever
> > >>
> > >> with this patch the bridge stays DOWN with NO-CARRIER
> > >>
> > >> # ip addr show dev virbr1
> > >> 6: virbr1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
> > >>     link/ether 52:54:00:1e:3f:c0 brd ff:ff:ff:ff:ff:ff
> > >>     inet 192.168.254.254/24 brd 192.168.254.255 scope global virbr1
> > >>        valid_lft forever preferred_lft forever
> > >>
> > >> This was bisected in linux-next. Reverting from linux-next also fixes the issue.
> > >>
> > >> Any idea?
> > > 
> > > FWIW, s390 is big endian. Maybe some of the shifts inn the __KW* macros are wrong.
> > 
> > Does anyone have an idea why "umh: fix processed error when UMH_WAIT_PROC is used" breaks the
> > linux-bridge on s390?
> 
> glibc for instance defines __WEXITSTATUS in only one location: bits/waitstatus.h
> and it does not special case it per architecture, so at this point I'd
> have to say we have to look somewhere else for why this is happening.

I found however an LTP bug indicating the need to test for
s390 wait macros [0] in light of a recent bug in glibc for s390.
I am asking for references to that issue given I cannot find
any mention of this on glibc yet.

I'm in hopes Martin might be aware of that mentioned s390 glic bug.

[0] https://github.com/linux-test-project/ltp/issues/605

  Luis

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
  2020-06-24 11:11       ` Christian Borntraeger
  2020-06-24 12:05         ` Luis Chamberlain
@ 2020-06-24 14:43         ` Christoph Hellwig
  2020-06-24 15:54           ` Christian Borntraeger
  1 sibling, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2020-06-24 14:43 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: mcgrof, ast, axboe, bfields, bridge, chainsaw, christian.brauner,
	chuck.lever, davem, dhowells, gregkh, jarkko.sakkinen, jmorris,
	josh, keescook, keyrings, kuba, lars.ellenberg, linux-fsdevel,
	linux-kernel, linux-nfs, linux-security-module, nikolay,
	philipp.reisner, ravenexp, roopa, serge, slyfox, viro,
	yangtiezhu, netdev, markward, linux-s390

On Wed, Jun 24, 2020 at 01:11:54PM +0200, Christian Borntraeger wrote:
> Does anyone have an idea why "umh: fix processed error when UMH_WAIT_PROC is used" breaks the
> linux-bridge on s390?

Are we even sure this is s390 specific and doesn't happen on other
architectures with the same bridge setup?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
  2020-06-24 14:43         ` Christoph Hellwig
@ 2020-06-24 15:54           ` Christian Borntraeger
  2020-06-24 16:09             ` Luis Chamberlain
  0 siblings, 1 reply; 43+ messages in thread
From: Christian Borntraeger @ 2020-06-24 15:54 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: mcgrof, ast, axboe, bfields, bridge, chainsaw, christian.brauner,
	chuck.lever, davem, dhowells, gregkh, jarkko.sakkinen, jmorris,
	josh, keescook, keyrings, kuba, lars.ellenberg, linux-fsdevel,
	linux-kernel, linux-nfs, linux-security-module, nikolay,
	philipp.reisner, ravenexp, roopa, serge, slyfox, viro,
	yangtiezhu, netdev, markward, linux-s390



On 24.06.20 16:43, Christoph Hellwig wrote:
> On Wed, Jun 24, 2020 at 01:11:54PM +0200, Christian Borntraeger wrote:
>> Does anyone have an idea why "umh: fix processed error when UMH_WAIT_PROC is used" breaks the
>> linux-bridge on s390?
> 
> Are we even sure this is s390 specific and doesn't happen on other
> architectures with the same bridge setup?

Fair point. AFAIK nobody has tested this yet on x86.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
  2020-06-24 15:54           ` Christian Borntraeger
@ 2020-06-24 16:09             ` Luis Chamberlain
  2020-06-24 17:58               ` Christian Borntraeger
  0 siblings, 1 reply; 43+ messages in thread
From: Luis Chamberlain @ 2020-06-24 16:09 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Christoph Hellwig, ast, axboe, bfields, bridge, chainsaw,
	christian.brauner, chuck.lever, davem, dhowells, gregkh,
	jarkko.sakkinen, jmorris, josh, keescook, keyrings, kuba,
	lars.ellenberg, linux-fsdevel, linux-kernel, linux-nfs,
	linux-security-module, nikolay, philipp.reisner, ravenexp, roopa,
	serge, slyfox, viro, yangtiezhu, netdev, markward, linux-s390

On Wed, Jun 24, 2020 at 05:54:46PM +0200, Christian Borntraeger wrote:
> 
> 
> On 24.06.20 16:43, Christoph Hellwig wrote:
> > On Wed, Jun 24, 2020 at 01:11:54PM +0200, Christian Borntraeger wrote:
> >> Does anyone have an idea why "umh: fix processed error when UMH_WAIT_PROC is used" breaks the
> >> linux-bridge on s390?
> > 
> > Are we even sure this is s390 specific and doesn't happen on other
> > architectures with the same bridge setup?
> 
> Fair point. AFAIK nobody has tested this yet on x86.

Regardless, can you enable dynamic debug prints, to see if the kernel
reveals anything on the bridge code which may be relevant:

echo "file net/bridge/* +p" > /sys/kernel/debug/dynamic_debug/control

  Luis

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
  2020-06-24 13:17           ` Luis Chamberlain
@ 2020-06-24 16:13             ` Luis Chamberlain
  0 siblings, 0 replies; 43+ messages in thread
From: Luis Chamberlain @ 2020-06-24 16:13 UTC (permalink / raw)
  To: Christian Borntraeger, Andrew Morton, Martin Doucha, hch
  Cc: ast, axboe, bfields, bridge, chainsaw, christian.brauner,
	chuck.lever, davem, dhowells, gregkh, jarkko.sakkinen, jmorris,
	josh, keescook, keyrings, kuba, lars.ellenberg, linux-fsdevel,
	linux-kernel, linux-nfs, linux-security-module, nikolay,
	philipp.reisner, ravenexp, roopa, serge, slyfox, viro,
	yangtiezhu, netdev, markward, linux-s390

On Wed, Jun 24, 2020 at 01:17:25PM +0000, Luis Chamberlain wrote:
> I found however an LTP bug indicating the need to test for
> s390 wait macros [0] in light of a recent bug in glibc for s390.
> I am asking for references to that issue given I cannot find
> any mention of this on glibc yet.
> 
> [0] https://github.com/linux-test-project/ltp/issues/605

I looked into this and the bug associated was:

https://sourceware.org/bugzilla/show_bug.cgi?id=19613

The commit in question was upstream glibc commit
b49ab5f4503f36dcbf43f821f817da66b2931fe6 ("Remove union wait [BZ
#19613]"), and while I don't see anything s390 mentioned there,
the issue there was due to the caller of the wait using a long
instead of an int for the return value.

In other words, that'd not the droid we are looking for.

So the issue is something else.

  Luis

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
  2020-06-24 16:09             ` Luis Chamberlain
@ 2020-06-24 17:58               ` Christian Borntraeger
  2020-06-24 18:09                 ` Christian Borntraeger
  0 siblings, 1 reply; 43+ messages in thread
From: Christian Borntraeger @ 2020-06-24 17:58 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Christoph Hellwig, ast, axboe, bfields, bridge, chainsaw,
	christian.brauner, chuck.lever, davem, dhowells, gregkh,
	jarkko.sakkinen, jmorris, josh, keescook, keyrings, kuba,
	lars.ellenberg, linux-fsdevel, linux-kernel, linux-nfs,
	linux-security-module, nikolay, philipp.reisner, ravenexp, roopa,
	serge, slyfox, viro, yangtiezhu, netdev, markward, linux-s390



On 24.06.20 18:09, Luis Chamberlain wrote:
> On Wed, Jun 24, 2020 at 05:54:46PM +0200, Christian Borntraeger wrote:
>>
>>
>> On 24.06.20 16:43, Christoph Hellwig wrote:
>>> On Wed, Jun 24, 2020 at 01:11:54PM +0200, Christian Borntraeger wrote:
>>>> Does anyone have an idea why "umh: fix processed error when UMH_WAIT_PROC is used" breaks the
>>>> linux-bridge on s390?
>>>
>>> Are we even sure this is s390 specific and doesn't happen on other
>>> architectures with the same bridge setup?
>>
>> Fair point. AFAIK nobody has tested this yet on x86.
> 
> Regardless, can you enable dynamic debug prints, to see if the kernel
> reveals anything on the bridge code which may be relevant:
> 
> echo "file net/bridge/* +p" > /sys/kernel/debug/dynamic_debug/control
> 
>   Luis

When I start a guest the following happens with the patch:

[   47.420237] virbr0: port 2(vnet0) entered blocking state
[   47.420242] virbr0: port 2(vnet0) entered disabled state
[   47.420315] device vnet0 entered promiscuous mode
[   47.420365] virbr0: port 2(vnet0) event 16
[   47.420366] virbr0: br_fill_info event 16 port vnet0 master virbr0
[   47.420373] virbr0: toggle option: 12 state: 0 -> 0
[   47.420536] virbr0: port 2(vnet0) entered blocking state
[   47.420538] virbr0: port 2(vnet0) event 16
[   47.420539] virbr0: br_fill_info event 16 port vnet0 master virbr0

and the nothing happens.


without the patch
[   33.805410] virbr0: hello timer expired
[   35.805413] virbr0: hello timer expired
[   36.184349] virbr0: port 2(vnet0) entered blocking state
[   36.184353] virbr0: port 2(vnet0) entered disabled state
[   36.184427] device vnet0 entered promiscuous mode
[   36.184479] virbr0: port 2(vnet0) event 16
[   36.184480] virbr0: br_fill_info event 16 port vnet0 master virbr0
[   36.184487] virbr0: toggle option: 12 state: 0 -> 0
[   36.184636] virbr0: port 2(vnet0) entered blocking state
[   36.184638] virbr0: port 2(vnet0) entered listening state
[   36.184639] virbr0: port 2(vnet0) event 16
[   36.184640] virbr0: br_fill_info event 16 port vnet0 master virbr0
[   36.184645] virbr0: port 2(vnet0) event 16
[   36.184646] virbr0: br_fill_info event 16 port vnet0 master virbr0
[   37.805478] virbr0: hello timer expired
[   38.205413] virbr0: port 2(vnet0) forward delay timer
[   38.205414] virbr0: port 2(vnet0) entered learning state
[   38.205427] virbr0: port 2(vnet0) event 16
[   38.205430] virbr0: br_fill_info event 16 port vnet0 master virbr0
[   38.765414] virbr0: port 2(vnet0) hold timer expired
[   39.805415] virbr0: hello timer expired
[   40.285410] virbr0: port 2(vnet0) forward delay timer
[   40.285411] virbr0: port 2(vnet0) entered forwarding state
[   40.285418] virbr0: topology change detected, propagating
[   40.285420] virbr0: decreasing ageing time to 400
[   40.285427] virbr0: port 2(vnet0) event 16
[   40.285432] virbr0: br_fill_info event 16 port vnet0 master virbr0
[   40.765408] virbr0: port 2(vnet0) hold timer expired
[   41.805415] virbr0: hello timer expired
[   42.765426] virbr0: port 2(vnet0) hold timer expired
[   43.805425] virbr0: hello timer expired
[   44.765426] virbr0: port 2(vnet0) hold timer expired
[   45.805418] virbr0: hello timer expired

and continuing....





^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
  2020-06-24 17:58               ` Christian Borntraeger
@ 2020-06-24 18:09                 ` Christian Borntraeger
  2020-06-24 18:32                   ` Christian Borntraeger
  0 siblings, 1 reply; 43+ messages in thread
From: Christian Borntraeger @ 2020-06-24 18:09 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Christoph Hellwig, ast, axboe, bfields, bridge, chainsaw,
	christian.brauner, chuck.lever, davem, dhowells, gregkh,
	jarkko.sakkinen, jmorris, josh, keescook, keyrings, kuba,
	lars.ellenberg, linux-fsdevel, linux-kernel, linux-nfs,
	linux-security-module, nikolay, philipp.reisner, ravenexp, roopa,
	serge, slyfox, viro, yangtiezhu, netdev, markward, linux-s390



On 24.06.20 19:58, Christian Borntraeger wrote:
> 
> 
> On 24.06.20 18:09, Luis Chamberlain wrote:
>> On Wed, Jun 24, 2020 at 05:54:46PM +0200, Christian Borntraeger wrote:
>>>
>>>
>>> On 24.06.20 16:43, Christoph Hellwig wrote:
>>>> On Wed, Jun 24, 2020 at 01:11:54PM +0200, Christian Borntraeger wrote:
>>>>> Does anyone have an idea why "umh: fix processed error when UMH_WAIT_PROC is used" breaks the
>>>>> linux-bridge on s390?
>>>>
>>>> Are we even sure this is s390 specific and doesn't happen on other
>>>> architectures with the same bridge setup?
>>>
>>> Fair point. AFAIK nobody has tested this yet on x86.
>>
>> Regardless, can you enable dynamic debug prints, to see if the kernel
>> reveals anything on the bridge code which may be relevant:
>>
>> echo "file net/bridge/* +p" > /sys/kernel/debug/dynamic_debug/control
>>
>>   Luis
> 
> When I start a guest the following happens with the patch:
> 
> [   47.420237] virbr0: port 2(vnet0) entered blocking state
> [   47.420242] virbr0: port 2(vnet0) entered disabled state
> [   47.420315] device vnet0 entered promiscuous mode
> [   47.420365] virbr0: port 2(vnet0) event 16
> [   47.420366] virbr0: br_fill_info event 16 port vnet0 master virbr0
> [   47.420373] virbr0: toggle option: 12 state: 0 -> 0
> [   47.420536] virbr0: port 2(vnet0) entered blocking state
> [   47.420538] virbr0: port 2(vnet0) event 16
> [   47.420539] virbr0: br_fill_info event 16 port vnet0 master virbr0
> 
> and the nothing happens.
> 
> 
> without the patch
> [   33.805410] virbr0: hello timer expired
> [   35.805413] virbr0: hello timer expired
> [   36.184349] virbr0: port 2(vnet0) entered blocking state
> [   36.184353] virbr0: port 2(vnet0) entered disabled state
> [   36.184427] device vnet0 entered promiscuous mode
> [   36.184479] virbr0: port 2(vnet0) event 16
> [   36.184480] virbr0: br_fill_info event 16 port vnet0 master virbr0
> [   36.184487] virbr0: toggle option: 12 state: 0 -> 0
> [   36.184636] virbr0: port 2(vnet0) entered blocking state
> [   36.184638] virbr0: port 2(vnet0) entered listening state
> [   36.184639] virbr0: port 2(vnet0) event 16
> [   36.184640] virbr0: br_fill_info event 16 port vnet0 master virbr0
> [   36.184645] virbr0: port 2(vnet0) event 16
> [   36.184646] virbr0: br_fill_info event 16 port vnet0 master virbr0
> [   37.805478] virbr0: hello timer expired
> [   38.205413] virbr0: port 2(vnet0) forward delay timer
> [   38.205414] virbr0: port 2(vnet0) entered learning state
> [   38.205427] virbr0: port 2(vnet0) event 16
> [   38.205430] virbr0: br_fill_info event 16 port vnet0 master virbr0
> [   38.765414] virbr0: port 2(vnet0) hold timer expired
> [   39.805415] virbr0: hello timer expired
> [   40.285410] virbr0: port 2(vnet0) forward delay timer
> [   40.285411] virbr0: port 2(vnet0) entered forwarding state
> [   40.285418] virbr0: topology change detected, propagating
> [   40.285420] virbr0: decreasing ageing time to 400
> [   40.285427] virbr0: port 2(vnet0) event 16
> [   40.285432] virbr0: br_fill_info event 16 port vnet0 master virbr0
> [   40.765408] virbr0: port 2(vnet0) hold timer expired
> [   41.805415] virbr0: hello timer expired
> [   42.765426] virbr0: port 2(vnet0) hold timer expired
> [   43.805425] virbr0: hello timer expired
> [   44.765426] virbr0: port 2(vnet0) hold timer expired
> [   45.805418] virbr0: hello timer expired
> 
> and continuing....

Just reverting the umh.c parts like this makes the problem go away.

diff --git a/kernel/umh.c b/kernel/umh.c
index f81e8698e36e..79f139a7ca03 100644
--- a/kernel/umh.c
+++ b/kernel/umh.c
@@ -154,8 +154,8 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info)
                 * the real error code is already in sub_info->retval or
                 * sub_info->retval is 0 anyway, so don't mess with it then.
                 */
-               if (KWIFEXITED(ret))
-                       sub_info->retval = KWEXITSTATUS(ret);
+               if (ret)
+                       sub_info->retval = ret;
        }
 
        /* Restore default kernel sig handler */



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
  2020-06-24 18:09                 ` Christian Borntraeger
@ 2020-06-24 18:32                   ` Christian Borntraeger
  2020-06-24 18:37                     ` Christian Borntraeger
  0 siblings, 1 reply; 43+ messages in thread
From: Christian Borntraeger @ 2020-06-24 18:32 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Christoph Hellwig, ast, axboe, bfields, bridge, chainsaw,
	christian.brauner, chuck.lever, davem, dhowells, gregkh,
	jarkko.sakkinen, jmorris, josh, keescook, keyrings, kuba,
	lars.ellenberg, linux-fsdevel, linux-kernel, linux-nfs,
	linux-security-module, nikolay, philipp.reisner, ravenexp, roopa,
	serge, slyfox, viro, yangtiezhu, netdev, markward, linux-s390



On 24.06.20 20:09, Christian Borntraeger wrote:
> 
> 
> On 24.06.20 19:58, Christian Borntraeger wrote:
>>
>>
>> On 24.06.20 18:09, Luis Chamberlain wrote:
>>> On Wed, Jun 24, 2020 at 05:54:46PM +0200, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 24.06.20 16:43, Christoph Hellwig wrote:
>>>>> On Wed, Jun 24, 2020 at 01:11:54PM +0200, Christian Borntraeger wrote:
>>>>>> Does anyone have an idea why "umh: fix processed error when UMH_WAIT_PROC is used" breaks the
>>>>>> linux-bridge on s390?
>>>>>
>>>>> Are we even sure this is s390 specific and doesn't happen on other
>>>>> architectures with the same bridge setup?
>>>>
>>>> Fair point. AFAIK nobody has tested this yet on x86.
>>>
>>> Regardless, can you enable dynamic debug prints, to see if the kernel
>>> reveals anything on the bridge code which may be relevant:
>>>
>>> echo "file net/bridge/* +p" > /sys/kernel/debug/dynamic_debug/control
>>>
>>>   Luis
>>
>> When I start a guest the following happens with the patch:
>>
>> [   47.420237] virbr0: port 2(vnet0) entered blocking state
>> [   47.420242] virbr0: port 2(vnet0) entered disabled state
>> [   47.420315] device vnet0 entered promiscuous mode
>> [   47.420365] virbr0: port 2(vnet0) event 16
>> [   47.420366] virbr0: br_fill_info event 16 port vnet0 master virbr0
>> [   47.420373] virbr0: toggle option: 12 state: 0 -> 0
>> [   47.420536] virbr0: port 2(vnet0) entered blocking state
>> [   47.420538] virbr0: port 2(vnet0) event 16
>> [   47.420539] virbr0: br_fill_info event 16 port vnet0 master virbr0
>>
>> and the nothing happens.
>>
>>
>> without the patch
>> [   33.805410] virbr0: hello timer expired
>> [   35.805413] virbr0: hello timer expired
>> [   36.184349] virbr0: port 2(vnet0) entered blocking state
>> [   36.184353] virbr0: port 2(vnet0) entered disabled state
>> [   36.184427] device vnet0 entered promiscuous mode
>> [   36.184479] virbr0: port 2(vnet0) event 16
>> [   36.184480] virbr0: br_fill_info event 16 port vnet0 master virbr0
>> [   36.184487] virbr0: toggle option: 12 state: 0 -> 0
>> [   36.184636] virbr0: port 2(vnet0) entered blocking state
>> [   36.184638] virbr0: port 2(vnet0) entered listening state
>> [   36.184639] virbr0: port 2(vnet0) event 16
>> [   36.184640] virbr0: br_fill_info event 16 port vnet0 master virbr0
>> [   36.184645] virbr0: port 2(vnet0) event 16
>> [   36.184646] virbr0: br_fill_info event 16 port vnet0 master virbr0
>> [   37.805478] virbr0: hello timer expired
>> [   38.205413] virbr0: port 2(vnet0) forward delay timer
>> [   38.205414] virbr0: port 2(vnet0) entered learning state
>> [   38.205427] virbr0: port 2(vnet0) event 16
>> [   38.205430] virbr0: br_fill_info event 16 port vnet0 master virbr0
>> [   38.765414] virbr0: port 2(vnet0) hold timer expired
>> [   39.805415] virbr0: hello timer expired
>> [   40.285410] virbr0: port 2(vnet0) forward delay timer
>> [   40.285411] virbr0: port 2(vnet0) entered forwarding state
>> [   40.285418] virbr0: topology change detected, propagating
>> [   40.285420] virbr0: decreasing ageing time to 400
>> [   40.285427] virbr0: port 2(vnet0) event 16
>> [   40.285432] virbr0: br_fill_info event 16 port vnet0 master virbr0
>> [   40.765408] virbr0: port 2(vnet0) hold timer expired
>> [   41.805415] virbr0: hello timer expired
>> [   42.765426] virbr0: port 2(vnet0) hold timer expired
>> [   43.805425] virbr0: hello timer expired
>> [   44.765426] virbr0: port 2(vnet0) hold timer expired
>> [   45.805418] virbr0: hello timer expired
>>
>> and continuing....
> 
> Just reverting the umh.c parts like this makes the problem go away.
> 
> diff --git a/kernel/umh.c b/kernel/umh.c
> index f81e8698e36e..79f139a7ca03 100644
> --- a/kernel/umh.c
> +++ b/kernel/umh.c
> @@ -154,8 +154,8 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info)
>                  * the real error code is already in sub_info->retval or
>                  * sub_info->retval is 0 anyway, so don't mess with it then.
>                  */
> -               if (KWIFEXITED(ret))
> -                       sub_info->retval = KWEXITSTATUS(ret);
> +               if (ret)
> +                       sub_info->retval = ret;
>         }
>  
>         /* Restore default kernel sig handler */

I instrumented this:

[    5.118528] ret=0x100 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x1 
[    9.409235] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   10.114914] ret=0x100 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x1 
[   10.116308] ret=0x100 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x1 
[   10.117690] ret=0x100 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x1 
[   10.118732] ret=0x100 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x1 
[   10.119899] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   10.121365] ret=0x100 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x1 
[   10.128044] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   10.945814] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   10.962799] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   10.983755] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   10.992705] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   11.118877] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   11.124359] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   11.129364] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   11.142139] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   11.148385] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   11.153686] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   11.158861] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   11.164068] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   11.192445] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   11.200605] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   11.208493] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   11.216612] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   11.226467] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   11.525363] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   11.532758] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   11.535279] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   11.697660] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   11.701634] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   11.818652] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   11.836228] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   12.082266] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   40.965898] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 

So the translations look correct. But your change is actually a sematic change
if(ret) will only trigger if there is an error
if (KWIFEXITED(ret)) will always trigger when the process ends. So we will always overwrite -ECHILD
and we did not do it before. 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
  2020-06-24 18:32                   ` Christian Borntraeger
@ 2020-06-24 18:37                     ` Christian Borntraeger
  2020-06-25 13:26                       ` Christian Borntraeger
  2020-06-26  2:54                       ` Luis Chamberlain
  0 siblings, 2 replies; 43+ messages in thread
From: Christian Borntraeger @ 2020-06-24 18:37 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Christoph Hellwig, ast, axboe, bfields, bridge, chainsaw,
	christian.brauner, chuck.lever, davem, dhowells, gregkh,
	jarkko.sakkinen, jmorris, josh, keescook, keyrings, kuba,
	lars.ellenberg, linux-fsdevel, linux-kernel, linux-nfs,
	linux-security-module, nikolay, philipp.reisner, ravenexp, roopa,
	serge, slyfox, viro, yangtiezhu, netdev, markward, linux-s390



On 24.06.20 20:32, Christian Borntraeger wrote:
[...]> 
> So the translations look correct. But your change is actually a sematic change
> if(ret) will only trigger if there is an error
> if (KWIFEXITED(ret)) will always trigger when the process ends. So we will always overwrite -ECHILD
> and we did not do it before. 
> 

So the right fix is

diff --git a/kernel/umh.c b/kernel/umh.c
index f81e8698e36e..a3a3196e84d1 100644
--- a/kernel/umh.c
+++ b/kernel/umh.c
@@ -154,7 +154,7 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info)
                 * the real error code is already in sub_info->retval or
                 * sub_info->retval is 0 anyway, so don't mess with it then.
                 */
-               if (KWIFEXITED(ret))
+               if (KWEXITSTATUS(ret))
                        sub_info->retval = KWEXITSTATUS(ret);
        }
 


I think.

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
  2020-06-24 18:37                     ` Christian Borntraeger
@ 2020-06-25 13:26                       ` Christian Borntraeger
  2020-06-26  2:54                       ` Luis Chamberlain
  1 sibling, 0 replies; 43+ messages in thread
From: Christian Borntraeger @ 2020-06-25 13:26 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Christoph Hellwig, ast, axboe, bfields, bridge, chainsaw,
	christian.brauner, chuck.lever, davem, dhowells, gregkh,
	jarkko.sakkinen, jmorris, josh, keescook, keyrings, kuba,
	lars.ellenberg, linux-fsdevel, linux-kernel, linux-nfs,
	linux-security-module, nikolay, philipp.reisner, ravenexp, roopa,
	serge, slyfox, viro, yangtiezhu, netdev, markward, linux-s390



On 24.06.20 20:37, Christian Borntraeger wrote:
> 
> 
> On 24.06.20 20:32, Christian Borntraeger wrote:
> [...]> 
>> So the translations look correct. But your change is actually a sematic change
>> if(ret) will only trigger if there is an error
>> if (KWIFEXITED(ret)) will always trigger when the process ends. So we will always overwrite -ECHILD
>> and we did not do it before. 
>>
> 
> So the right fix is
> 
> diff --git a/kernel/umh.c b/kernel/umh.c
> index f81e8698e36e..a3a3196e84d1 100644
> --- a/kernel/umh.c
> +++ b/kernel/umh.c
> @@ -154,7 +154,7 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info)
>                  * the real error code is already in sub_info->retval or
>                  * sub_info->retval is 0 anyway, so don't mess with it then.
>                  */
> -               if (KWIFEXITED(ret))
> +               if (KWEXITSTATUS(ret))
>                         sub_info->retval = KWEXITSTATUS(ret);
>         }

Ping. Shall I send this as a proper patch or are we merging this fixup in Andrews patch queue?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
  2020-06-24 18:37                     ` Christian Borntraeger
  2020-06-25 13:26                       ` Christian Borntraeger
@ 2020-06-26  2:54                       ` Luis Chamberlain
  2020-06-26  5:22                         ` Christian Borntraeger
  2020-06-30 17:57                         ` Luis Chamberlain
  1 sibling, 2 replies; 43+ messages in thread
From: Luis Chamberlain @ 2020-06-26  2:54 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Christoph Hellwig, ast, axboe, bfields, bridge, chainsaw,
	christian.brauner, chuck.lever, davem, dhowells, gregkh,
	jarkko.sakkinen, jmorris, josh, keescook, keyrings, kuba,
	lars.ellenberg, linux-fsdevel, linux-kernel, linux-nfs,
	linux-security-module, nikolay, philipp.reisner, ravenexp, roopa,
	serge, slyfox, viro, yangtiezhu, netdev, markward, linux-s390

On Wed, Jun 24, 2020 at 08:37:55PM +0200, Christian Borntraeger wrote:
> 
> 
> On 24.06.20 20:32, Christian Borntraeger wrote:
> [...]> 
> > So the translations look correct. But your change is actually a sematic change
> > if(ret) will only trigger if there is an error
> > if (KWIFEXITED(ret)) will always trigger when the process ends. So we will always overwrite -ECHILD
> > and we did not do it before. 
> > 
> 
> So the right fix is
> 
> diff --git a/kernel/umh.c b/kernel/umh.c
> index f81e8698e36e..a3a3196e84d1 100644
> --- a/kernel/umh.c
> +++ b/kernel/umh.c
> @@ -154,7 +154,7 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info)
>                  * the real error code is already in sub_info->retval or
>                  * sub_info->retval is 0 anyway, so don't mess with it then.
>                  */
> -               if (KWIFEXITED(ret))
> +               if (KWEXITSTATUS(ret))
>                         sub_info->retval = KWEXITSTATUS(ret);
>         }
>  
> I think.

Nope, the right form is to check for WIFEXITED() before using WEXITSTATUS().
I'm not able to reproduce this on x86 with a bridge. What type of bridge
are you using on a guest, or did you mean using KVM so that the *host*
can spawn kvm guests?

It would be good if you can try to add a bridge manually and see where
things fail. Can you do something like this:

brctl addbr br0
brctl addif br0 ens6 
ip link set dev br0 up

Note that most callers are for modprobe. I'd be curious to see which
umh is failing which breaks bridge for you. Can you trut this so we can
see which umh call is failing?

diff --git a/kernel/umh.c b/kernel/umh.c
index f81e8698e36e..5ad74bc301d8 100644
--- a/kernel/umh.c
+++ b/kernel/umh.c
@@ -2,6 +2,9 @@
 /*
  * umh - the kernel usermode helper
  */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
 #include <linux/module.h>
 #include <linux/sched.h>
 #include <linux/sched/task.h>
@@ -154,8 +157,12 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info)
 		 * the real error code is already in sub_info->retval or
 		 * sub_info->retval is 0 anyway, so don't mess with it then.
 		 */
-		if (KWIFEXITED(ret))
+		printk("== ret: %02x\n", ret);
+		printk("== KWIFEXITED(ret): %02x\n", KWIFEXITED(ret));
+		if (KWIFEXITED(ret)) {
+			printk("KWEXITSTATUS(ret): %d\n", KWEXITSTATUS(ret));
 			sub_info->retval = KWEXITSTATUS(ret);
+		}
 	}
 
 	/* Restore default kernel sig handler */
@@ -383,6 +390,7 @@ struct subprocess_info *call_usermodehelper_setup(const char *path, char **argv,
 		void *data)
 {
 	struct subprocess_info *sub_info;
+	unsigned int i = 0;
 	sub_info = kzalloc(sizeof(struct subprocess_info), gfp_mask);
 	if (!sub_info)
 		goto out;
@@ -394,6 +402,11 @@ struct subprocess_info *call_usermodehelper_setup(const char *path, char **argv,
 #else
 	sub_info->path = path;
 #endif
+	pr_info("sub_info->path: %s\n", sub_info->path);
+	while (argv[i])
+		printk(KERN_INFO "%s ", argv[i++]);
+	printk(KERN_INFO  "\n");
+
 	sub_info->argv = argv;
 	sub_info->envp = envp;
 


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
  2020-06-26  2:54                       ` Luis Chamberlain
@ 2020-06-26  5:22                         ` Christian Borntraeger
  2020-06-26  9:00                           ` Christoph Hellwig
  2020-06-30 17:57                         ` Luis Chamberlain
  1 sibling, 1 reply; 43+ messages in thread
From: Christian Borntraeger @ 2020-06-26  5:22 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Christoph Hellwig, ast, axboe, bfields, bridge, chainsaw,
	christian.brauner, chuck.lever, davem, dhowells, gregkh,
	jarkko.sakkinen, jmorris, josh, keescook, keyrings, kuba,
	lars.ellenberg, linux-fsdevel, linux-kernel, linux-nfs,
	linux-security-module, nikolay, philipp.reisner, ravenexp, roopa,
	serge, slyfox, viro, yangtiezhu, netdev, markward, linux-s390



On 26.06.20 04:54, Luis Chamberlain wrote:
> On Wed, Jun 24, 2020 at 08:37:55PM +0200, Christian Borntraeger wrote:
>>
>>
>> On 24.06.20 20:32, Christian Borntraeger wrote:
>> [...]> 
>>> So the translations look correct. But your change is actually a sematic change
>>> if(ret) will only trigger if there is an error
>>> if (KWIFEXITED(ret)) will always trigger when the process ends. So we will always overwrite -ECHILD
>>> and we did not do it before. 
>>>
>>
>> So the right fix is
>>
>> diff --git a/kernel/umh.c b/kernel/umh.c
>> index f81e8698e36e..a3a3196e84d1 100644
>> --- a/kernel/umh.c
>> +++ b/kernel/umh.c
>> @@ -154,7 +154,7 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info)
>>                  * the real error code is already in sub_info->retval or
>>                  * sub_info->retval is 0 anyway, so don't mess with it then.
>>                  */
>> -               if (KWIFEXITED(ret))
>> +               if (KWEXITSTATUS(ret))
>>                         sub_info->retval = KWEXITSTATUS(ret);
>>         }
>>  
>> I think.
> 
> Nope, the right form is to check for WIFEXITED() before using WEXITSTATUS().

But this IS a change over the previous code, no?
I will test next week as I am travelling right now. 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
  2020-06-26  5:22                         ` Christian Borntraeger
@ 2020-06-26  9:00                           ` Christoph Hellwig
  2020-06-26 11:40                             ` Luis Chamberlain
  0 siblings, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2020-06-26  9:00 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Luis Chamberlain, Christoph Hellwig, ast, axboe, bfields, bridge,
	chainsaw, christian.brauner, chuck.lever, davem, dhowells,
	gregkh, jarkko.sakkinen, jmorris, josh, keescook, keyrings, kuba,
	lars.ellenberg, linux-fsdevel, linux-kernel, linux-nfs,
	linux-security-module, nikolay, philipp.reisner, ravenexp, roopa,
	serge, slyfox, viro, yangtiezhu, netdev, markward, linux-s390

On Fri, Jun 26, 2020 at 07:22:34AM +0200, Christian Borntraeger wrote:
> 
> 
> On 26.06.20 04:54, Luis Chamberlain wrote:
> > On Wed, Jun 24, 2020 at 08:37:55PM +0200, Christian Borntraeger wrote:
> >>
> >>
> >> On 24.06.20 20:32, Christian Borntraeger wrote:
> >> [...]> 
> >>> So the translations look correct. But your change is actually a sematic change
> >>> if(ret) will only trigger if there is an error
> >>> if (KWIFEXITED(ret)) will always trigger when the process ends. So we will always overwrite -ECHILD
> >>> and we did not do it before. 
> >>>
> >>
> >> So the right fix is
> >>
> >> diff --git a/kernel/umh.c b/kernel/umh.c
> >> index f81e8698e36e..a3a3196e84d1 100644
> >> --- a/kernel/umh.c
> >> +++ b/kernel/umh.c
> >> @@ -154,7 +154,7 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info)
> >>                  * the real error code is already in sub_info->retval or
> >>                  * sub_info->retval is 0 anyway, so don't mess with it then.
> >>                  */
> >> -               if (KWIFEXITED(ret))
> >> +               if (KWEXITSTATUS(ret))
> >>                         sub_info->retval = KWEXITSTATUS(ret);
> >>         }
> >>  
> >> I think.
> > 
> > Nope, the right form is to check for WIFEXITED() before using WEXITSTATUS().
> 
> But this IS a change over the previous code, no?
> I will test next week as I am travelling right now. 

I'm all for reverting back to the previous behavior.  If someone wants
a behavior change it should be a separate patch.  And out of pure self
interest I'd like to see that change after my addition of the
kernel_wait helper to replace the kernel_wait4 abuse :)

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
  2020-06-26  9:00                           ` Christoph Hellwig
@ 2020-06-26 11:40                             ` Luis Chamberlain
  2020-06-26 11:50                               ` Luis Chamberlain
  0 siblings, 1 reply; 43+ messages in thread
From: Luis Chamberlain @ 2020-06-26 11:40 UTC (permalink / raw)
  To: Christoph Hellwig, Andrew Morton
  Cc: Christian Borntraeger, ast, axboe, bfields, bridge, chainsaw,
	christian.brauner, chuck.lever, davem, dhowells, gregkh,
	jarkko.sakkinen, jmorris, josh, keescook, keyrings, kuba,
	lars.ellenberg, linux-fsdevel, linux-kernel, linux-nfs,
	linux-security-module, nikolay, philipp.reisner, ravenexp, roopa,
	serge, slyfox, viro, yangtiezhu, netdev, markward, linux-s390

On Fri, Jun 26, 2020 at 10:00:01AM +0100, Christoph Hellwig wrote:
> On Fri, Jun 26, 2020 at 07:22:34AM +0200, Christian Borntraeger wrote:
> > 
> > 
> > On 26.06.20 04:54, Luis Chamberlain wrote:
> > > On Wed, Jun 24, 2020 at 08:37:55PM +0200, Christian Borntraeger wrote:
> > >>
> > >>
> > >> On 24.06.20 20:32, Christian Borntraeger wrote:
> > >> [...]> 
> > >>> So the translations look correct. But your change is actually a sematic change
> > >>> if(ret) will only trigger if there is an error
> > >>> if (KWIFEXITED(ret)) will always trigger when the process ends. So we will always overwrite -ECHILD
> > >>> and we did not do it before. 
> > >>>
> > >>
> > >> So the right fix is
> > >>
> > >> diff --git a/kernel/umh.c b/kernel/umh.c
> > >> index f81e8698e36e..a3a3196e84d1 100644
> > >> --- a/kernel/umh.c
> > >> +++ b/kernel/umh.c
> > >> @@ -154,7 +154,7 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info)
> > >>                  * the real error code is already in sub_info->retval or
> > >>                  * sub_info->retval is 0 anyway, so don't mess with it then.
> > >>                  */
> > >> -               if (KWIFEXITED(ret))
> > >> +               if (KWEXITSTATUS(ret))
> > >>                         sub_info->retval = KWEXITSTATUS(ret);
> > >>         }
> > >>  
> > >> I think.
> > > 
> > > Nope, the right form is to check for WIFEXITED() before using WEXITSTATUS().
> > 
> > But this IS a change over the previous code, no?
> > I will test next week as I am travelling right now. 
> 
> I'm all for reverting back to the previous behavior.  If someone wants
> a behavior change it should be a separate patch.  And out of pure self
> interest I'd like to see that change after my addition of the
> kernel_wait helper to replace the kernel_wait4 abuse :)

Yeah sure, this will be *slightly* cleaner after that. Today we
implicitly return -ECHLD as well under the assumption the kernel_wait4()
call failed.

Andrew, can you please revert these two for now:

selftests: simplify kmod failure value
umh: fix processed error when UMH_WAIT_PROC is used

Later, we'll add Christoph's simplier kernel wait, and make the change
directly there to catpure the right error. That still won't fix this reported
issue, but it will be cleaner and will go tested by Christian Borntraeger
before.

  Luis

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
  2020-06-26 11:40                             ` Luis Chamberlain
@ 2020-06-26 11:50                               ` Luis Chamberlain
  0 siblings, 0 replies; 43+ messages in thread
From: Luis Chamberlain @ 2020-06-26 11:50 UTC (permalink / raw)
  To: Christoph Hellwig, Andrew Morton
  Cc: Christian Borntraeger, ast, axboe, bfields, bridge, chainsaw,
	christian.brauner, chuck.lever, davem, dhowells, gregkh,
	jarkko.sakkinen, jmorris, josh, keescook, keyrings, kuba,
	lars.ellenberg, linux-fsdevel, linux-kernel, linux-nfs,
	linux-security-module, nikolay, philipp.reisner, ravenexp, roopa,
	serge, slyfox, viro, yangtiezhu, netdev, markward, linux-s390

On Fri, Jun 26, 2020 at 11:40:08AM +0000, Luis Chamberlain wrote:
> Andrew, can you please revert these two for now:
> 
> selftests: simplify kmod failure value
> umh: fix processed error when UMH_WAIT_PROC is used
> 
> Later, we'll add Christoph's simplier kernel wait, and make the change
> directly there to catpure the right error. That still won't fix this reported
> issue, but it will be cleaner and will go tested by Christian Borntraeger
> before.

However, note that the only consideration to make here against this
approach of the fix later going in with the simpler kernel wait is
if this actually is fixing a real issue, and if we'd want this going to
stable.

We should for sure revert though, so Andrew please do proceed to revert
or drop those two patches alone for now.

It was unclear to me if this should go to stable given I was not sure
how severe that NFS case mentioned on the commit log was, and no one on
the NFS side has replied about that yet, however there may be other
areas where code inspection of callsites was not sufficient to find the
real critical areas.

I'm now very curious what this issue that Christian with bridge on s390
found will be.

  Luis

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
  2020-06-26  2:54                       ` Luis Chamberlain
  2020-06-26  5:22                         ` Christian Borntraeger
@ 2020-06-30 17:57                         ` Luis Chamberlain
  2020-07-01 10:08                           ` Christian Borntraeger
  1 sibling, 1 reply; 43+ messages in thread
From: Luis Chamberlain @ 2020-06-30 17:57 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Christoph Hellwig, ast, axboe, bfields, bridge, chainsaw,
	christian.brauner, chuck.lever, davem, dhowells, gregkh,
	jarkko.sakkinen, jmorris, josh, keescook, keyrings, kuba,
	lars.ellenberg, linux-fsdevel, linux-kernel, linux-nfs,
	linux-security-module, nikolay, philipp.reisner, ravenexp, roopa,
	serge, slyfox, viro, yangtiezhu, netdev, markward, linux-s390

On Fri, Jun 26, 2020 at 02:54:10AM +0000, Luis Chamberlain wrote:
> On Wed, Jun 24, 2020 at 08:37:55PM +0200, Christian Borntraeger wrote:
> > 
> > 
> > On 24.06.20 20:32, Christian Borntraeger wrote:
> > [...]> 
> > > So the translations look correct. But your change is actually a sematic change
> > > if(ret) will only trigger if there is an error
> > > if (KWIFEXITED(ret)) will always trigger when the process ends. So we will always overwrite -ECHILD
> > > and we did not do it before. 
> > > 
> > 
> > So the right fix is
> > 
> > diff --git a/kernel/umh.c b/kernel/umh.c
> > index f81e8698e36e..a3a3196e84d1 100644
> > --- a/kernel/umh.c
> > +++ b/kernel/umh.c
> > @@ -154,7 +154,7 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info)
> >                  * the real error code is already in sub_info->retval or
> >                  * sub_info->retval is 0 anyway, so don't mess with it then.
> >                  */
> > -               if (KWIFEXITED(ret))
> > +               if (KWEXITSTATUS(ret))
> >                         sub_info->retval = KWEXITSTATUS(ret);
> >         }
> >  
> > I think.
> 
> Nope, the right form is to check for WIFEXITED() before using WEXITSTATUS().
> I'm not able to reproduce this on x86 with a bridge. What type of bridge
> are you using on a guest, or did you mean using KVM so that the *host*
> can spawn kvm guests?
> 
> It would be good if you can try to add a bridge manually and see where
> things fail. Can you do something like this:
> 
> brctl addbr br0
> brctl addif br0 ens6 
> ip link set dev br0 up
> 
> Note that most callers are for modprobe. I'd be curious to see which
> umh is failing which breaks bridge for you. Can you trut this so we can
> see which umh call is failing?

Christian, any luck getting to test the code below to see what this
reveals?

  Luis

> 
> diff --git a/kernel/umh.c b/kernel/umh.c
> index f81e8698e36e..5ad74bc301d8 100644
> --- a/kernel/umh.c
> +++ b/kernel/umh.c
> @@ -2,6 +2,9 @@
>  /*
>   * umh - the kernel usermode helper
>   */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
>  #include <linux/module.h>
>  #include <linux/sched.h>
>  #include <linux/sched/task.h>
> @@ -154,8 +157,12 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info)
>  		 * the real error code is already in sub_info->retval or
>  		 * sub_info->retval is 0 anyway, so don't mess with it then.
>  		 */
> -		if (KWIFEXITED(ret))
> +		printk("== ret: %02x\n", ret);
> +		printk("== KWIFEXITED(ret): %02x\n", KWIFEXITED(ret));
> +		if (KWIFEXITED(ret)) {
> +			printk("KWEXITSTATUS(ret): %d\n", KWEXITSTATUS(ret));
>  			sub_info->retval = KWEXITSTATUS(ret);
> +		}
>  	}
>  
>  	/* Restore default kernel sig handler */
> @@ -383,6 +390,7 @@ struct subprocess_info *call_usermodehelper_setup(const char *path, char **argv,
>  		void *data)
>  {
>  	struct subprocess_info *sub_info;
> +	unsigned int i = 0;
>  	sub_info = kzalloc(sizeof(struct subprocess_info), gfp_mask);
>  	if (!sub_info)
>  		goto out;
> @@ -394,6 +402,11 @@ struct subprocess_info *call_usermodehelper_setup(const char *path, char **argv,
>  #else
>  	sub_info->path = path;
>  #endif
> +	pr_info("sub_info->path: %s\n", sub_info->path);
> +	while (argv[i])
> +		printk(KERN_INFO "%s ", argv[i++]);
> +	printk(KERN_INFO  "\n");
> +
>  	sub_info->argv = argv;
>  	sub_info->envp = envp;
>  
> 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
  2020-06-30 17:57                         ` Luis Chamberlain
@ 2020-07-01 10:08                           ` Christian Borntraeger
  2020-07-01 13:24                             ` Tetsuo Handa
  2020-07-01 13:46                             ` Luis Chamberlain
  0 siblings, 2 replies; 43+ messages in thread
From: Christian Borntraeger @ 2020-07-01 10:08 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Christoph Hellwig, ast, axboe, bfields, bridge, chainsaw,
	christian.brauner, chuck.lever, davem, dhowells, gregkh,
	jarkko.sakkinen, jmorris, josh, keescook, keyrings, kuba,
	lars.ellenberg, linux-fsdevel, linux-kernel, linux-nfs,
	linux-security-module, nikolay, philipp.reisner, ravenexp, roopa,
	serge, slyfox, viro, yangtiezhu, netdev, markward, linux-s390

[-- Attachment #1: Type: text/plain, Size: 1925 bytes --]



On 30.06.20 19:57, Luis Chamberlain wrote:
> On Fri, Jun 26, 2020 at 02:54:10AM +0000, Luis Chamberlain wrote:
>> On Wed, Jun 24, 2020 at 08:37:55PM +0200, Christian Borntraeger wrote:
>>>
>>>
>>> On 24.06.20 20:32, Christian Borntraeger wrote:
>>> [...]> 
>>>> So the translations look correct. But your change is actually a sematic change
>>>> if(ret) will only trigger if there is an error
>>>> if (KWIFEXITED(ret)) will always trigger when the process ends. So we will always overwrite -ECHILD
>>>> and we did not do it before. 
>>>>
>>>
>>> So the right fix is
>>>
>>> diff --git a/kernel/umh.c b/kernel/umh.c
>>> index f81e8698e36e..a3a3196e84d1 100644
>>> --- a/kernel/umh.c
>>> +++ b/kernel/umh.c
>>> @@ -154,7 +154,7 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info)
>>>                  * the real error code is already in sub_info->retval or
>>>                  * sub_info->retval is 0 anyway, so don't mess with it then.
>>>                  */
>>> -               if (KWIFEXITED(ret))
>>> +               if (KWEXITSTATUS(ret))
>>>                         sub_info->retval = KWEXITSTATUS(ret);
>>>         }
>>>  
>>> I think.
>>
>> Nope, the right form is to check for WIFEXITED() before using WEXITSTATUS().
>> I'm not able to reproduce this on x86 with a bridge. What type of bridge
>> are you using on a guest, or did you mean using KVM so that the *host*
>> can spawn kvm guests?
>>
>> It would be good if you can try to add a bridge manually and see where
>> things fail. Can you do something like this:
>>
>> brctl addbr br0
>> brctl addif br0 ens6 
>> ip link set dev br0 up
>>
>> Note that most callers are for modprobe. I'd be curious to see which
>> umh is failing which breaks bridge for you. Can you trut this so we can
>> see which umh call is failing?
> 
> Christian, any luck getting to test the code below to see what this
> reveals?
> 
>   Luis

dmesg attached


[-- Attachment #2: dmesg.txt --]
[-- Type: text/plain, Size: 41598 bytes --]

[    0.137762] Linux version 5.8.0-rc2-next-20200623-00002-g5a5b173022ac-dirty (cborntra@m83lp52.lnxne.boe) (gcc (GCC) 9.3.1 20200317 (Red Hat 9.3.1-1), GNU ld version 2.32-31.fc31) #60 SMP Wed Jul 1 09:14:56 CEST 2020
[    0.137764] setup: Linux is running natively in 64-bit mode
[    0.137812] setup: The maximum memory size is 131072MB
[    0.137817] setup: Reserving 1024MB of memory at 130048MB for crashkernel (System RAM: 130048MB)
[    0.137901] cpu: 64 configured CPUs, 0 standby CPUs
[    0.137966] cpu: The CPU configuration topology of the machine is: 0 0 4 2 3 10 / 4
[    0.138533] Write protected kernel read-only data: 13660k
[    0.139239] Zone ranges:
[    0.139240]   DMA      [mem 0x0000000000000000-0x000000007fffffff]
[    0.139241]   Normal   [mem 0x0000000080000000-0x0000001fffffffff]
[    0.139243] Movable zone start for each node
[    0.139243] Early memory node ranges
[    0.139244]   node   0: [mem 0x0000000000000000-0x0000001fffffffff]
[    0.139251] Initmem setup node 0 [mem 0x0000000000000000-0x0000001fffffffff]
[    0.139252] On node 0 totalpages: 33554432
[    0.139253]   DMA zone: 8192 pages used for memmap
[    0.139253]   DMA zone: 0 pages reserved
[    0.139254]   DMA zone: 524288 pages, LIFO batch:63
[    0.152116]   Normal zone: 516096 pages used for memmap
[    0.152117]   Normal zone: 33030144 pages, LIFO batch:63
[    0.168380] percpu: Embedded 33 pages/cpu s97536 r8192 d29440 u135168
[    0.168388] pcpu-alloc: s97536 r8192 d29440 u135168 alloc=33*4096
[    0.168388] pcpu-alloc: [0] 000 [0] 001 [0] 002 [0] 003 
[    0.168390] pcpu-alloc: [0] 004 [0] 005 [0] 006 [0] 007 
[    0.168391] pcpu-alloc: [0] 008 [0] 009 [0] 010 [0] 011 
[    0.168392] pcpu-alloc: [0] 012 [0] 013 [0] 014 [0] 015 
[    0.168393] pcpu-alloc: [0] 016 [0] 017 [0] 018 [0] 019 
[    0.168394] pcpu-alloc: [0] 020 [0] 021 [0] 022 [0] 023 
[    0.168395] pcpu-alloc: [0] 024 [0] 025 [0] 026 [0] 027 
[    0.168396] pcpu-alloc: [0] 028 [0] 029 [0] 030 [0] 031 
[    0.168397] pcpu-alloc: [0] 032 [0] 033 [0] 034 [0] 035 
[    0.168398] pcpu-alloc: [0] 036 [0] 037 [0] 038 [0] 039 
[    0.168399] pcpu-alloc: [0] 040 [0] 041 [0] 042 [0] 043 
[    0.168400] pcpu-alloc: [0] 044 [0] 045 [0] 046 [0] 047 
[    0.168401] pcpu-alloc: [0] 048 [0] 049 [0] 050 [0] 051 
[    0.168403] pcpu-alloc: [0] 052 [0] 053 [0] 054 [0] 055 
[    0.168404] pcpu-alloc: [0] 056 [0] 057 [0] 058 [0] 059 
[    0.168405] pcpu-alloc: [0] 060 [0] 061 [0] 062 [0] 063 
[    0.168406] pcpu-alloc: [0] 064 [0] 065 [0] 066 [0] 067 
[    0.168407] pcpu-alloc: [0] 068 [0] 069 [0] 070 [0] 071 
[    0.168408] pcpu-alloc: [0] 072 [0] 073 [0] 074 [0] 075 
[    0.168409] pcpu-alloc: [0] 076 [0] 077 [0] 078 [0] 079 
[    0.168410] pcpu-alloc: [0] 080 [0] 081 [0] 082 [0] 083 
[    0.168411] pcpu-alloc: [0] 084 [0] 085 [0] 086 [0] 087 
[    0.168413] pcpu-alloc: [0] 088 [0] 089 [0] 090 [0] 091 
[    0.168414] pcpu-alloc: [0] 092 [0] 093 [0] 094 [0] 095 
[    0.168415] pcpu-alloc: [0] 096 [0] 097 [0] 098 [0] 099 
[    0.168416] pcpu-alloc: [0] 100 [0] 101 [0] 102 [0] 103 
[    0.168417] pcpu-alloc: [0] 104 [0] 105 [0] 106 [0] 107 
[    0.168418] pcpu-alloc: [0] 108 [0] 109 [0] 110 [0] 111 
[    0.168419] pcpu-alloc: [0] 112 [0] 113 [0] 114 [0] 115 
[    0.168420] pcpu-alloc: [0] 116 [0] 117 [0] 118 [0] 119 
[    0.168421] pcpu-alloc: [0] 120 [0] 121 [0] 122 [0] 123 
[    0.168422] pcpu-alloc: [0] 124 [0] 125 [0] 126 [0] 127 
[    0.168424] pcpu-alloc: [0] 128 [0] 129 [0] 130 [0] 131 
[    0.168425] pcpu-alloc: [0] 132 [0] 133 [0] 134 [0] 135 
[    0.168426] pcpu-alloc: [0] 136 [0] 137 [0] 138 [0] 139 
[    0.168427] pcpu-alloc: [0] 140 [0] 141 [0] 142 [0] 143 
[    0.168428] pcpu-alloc: [0] 144 [0] 145 [0] 146 [0] 147 
[    0.168429] pcpu-alloc: [0] 148 [0] 149 [0] 150 [0] 151 
[    0.168430] pcpu-alloc: [0] 152 [0] 153 [0] 154 [0] 155 
[    0.168431] pcpu-alloc: [0] 156 [0] 157 [0] 158 [0] 159 
[    0.168432] pcpu-alloc: [0] 160 [0] 161 [0] 162 [0] 163 
[    0.168433] pcpu-alloc: [0] 164 [0] 165 [0] 166 [0] 167 
[    0.168434] pcpu-alloc: [0] 168 [0] 169 [0] 170 [0] 171 
[    0.168435] pcpu-alloc: [0] 172 [0] 173 [0] 174 [0] 175 
[    0.168436] pcpu-alloc: [0] 176 [0] 177 [0] 178 [0] 179 
[    0.168437] pcpu-alloc: [0] 180 [0] 181 [0] 182 [0] 183 
[    0.168438] pcpu-alloc: [0] 184 [0] 185 [0] 186 [0] 187 
[    0.168439] pcpu-alloc: [0] 188 [0] 189 [0] 190 [0] 191 
[    0.168440] pcpu-alloc: [0] 192 [0] 193 [0] 194 [0] 195 
[    0.168442] pcpu-alloc: [0] 196 [0] 197 [0] 198 [0] 199 
[    0.168443] pcpu-alloc: [0] 200 [0] 201 [0] 202 [0] 203 
[    0.168444] pcpu-alloc: [0] 204 [0] 205 [0] 206 [0] 207 
[    0.168445] pcpu-alloc: [0] 208 [0] 209 [0] 210 [0] 211 
[    0.168446] pcpu-alloc: [0] 212 [0] 213 [0] 214 [0] 215 
[    0.168447] pcpu-alloc: [0] 216 [0] 217 [0] 218 [0] 219 
[    0.168448] pcpu-alloc: [0] 220 [0] 221 [0] 222 [0] 223 
[    0.168449] pcpu-alloc: [0] 224 [0] 225 [0] 226 [0] 227 
[    0.168450] pcpu-alloc: [0] 228 [0] 229 [0] 230 [0] 231 
[    0.168451] pcpu-alloc: [0] 232 [0] 233 [0] 234 [0] 235 
[    0.168452] pcpu-alloc: [0] 236 [0] 237 [0] 238 [0] 239 
[    0.168453] pcpu-alloc: [0] 240 [0] 241 [0] 242 [0] 243 
[    0.168454] pcpu-alloc: [0] 244 [0] 245 [0] 246 [0] 247 
[    0.168455] pcpu-alloc: [0] 248 [0] 249 [0] 250 [0] 251 
[    0.168456] pcpu-alloc: [0] 252 [0] 253 [0] 254 [0] 255 
[    0.168457] pcpu-alloc: [0] 256 [0] 257 [0] 258 [0] 259 
[    0.168459] pcpu-alloc: [0] 260 [0] 261 [0] 262 [0] 263 
[    0.168460] pcpu-alloc: [0] 264 [0] 265 [0] 266 [0] 267 
[    0.168461] pcpu-alloc: [0] 268 [0] 269 [0] 270 [0] 271 
[    0.168462] pcpu-alloc: [0] 272 [0] 273 [0] 274 [0] 275 
[    0.168463] pcpu-alloc: [0] 276 [0] 277 [0] 278 [0] 279 
[    0.168464] pcpu-alloc: [0] 280 [0] 281 [0] 282 [0] 283 
[    0.168465] pcpu-alloc: [0] 284 [0] 285 [0] 286 [0] 287 
[    0.168466] pcpu-alloc: [0] 288 [0] 289 [0] 290 [0] 291 
[    0.168467] pcpu-alloc: [0] 292 [0] 293 [0] 294 [0] 295 
[    0.168468] pcpu-alloc: [0] 296 [0] 297 [0] 298 [0] 299 
[    0.168469] pcpu-alloc: [0] 300 [0] 301 [0] 302 [0] 303 
[    0.168470] pcpu-alloc: [0] 304 [0] 305 [0] 306 [0] 307 
[    0.168471] pcpu-alloc: [0] 308 [0] 309 [0] 310 [0] 311 
[    0.168472] pcpu-alloc: [0] 312 [0] 313 [0] 314 [0] 315 
[    0.168473] pcpu-alloc: [0] 316 [0] 317 [0] 318 [0] 319 
[    0.168474] pcpu-alloc: [0] 320 [0] 321 [0] 322 [0] 323 
[    0.168475] pcpu-alloc: [0] 324 [0] 325 [0] 326 [0] 327 
[    0.168476] pcpu-alloc: [0] 328 [0] 329 [0] 330 [0] 331 
[    0.168477] pcpu-alloc: [0] 332 [0] 333 [0] 334 [0] 335 
[    0.168478] pcpu-alloc: [0] 336 [0] 337 [0] 338 [0] 339 
[    0.168501] Built 1 zonelists, mobility grouping on.  Total pages: 33030144
[    0.168502] Policy zone: Normal
[    0.168503] Kernel command line: root=/dev/disk/by-path/ccw-0.0.3318-part1 rd.dasd=0.0.3318 cio_ignore=all,!condev rd.znet=qeth,0.0.bd00,0.0.bd01,0.0.bd02,layer2=1,portno=0,portname=OSAPORT zfcp.allow_lun_scan=0 BOOT_IMAGE=0 crashkernel=1G brd.rd_size=8000000 brd.rd_nr=8 BOOT_IMAGE=0
[    0.169679] printk: log_buf_len individual max cpu contribution: 4096 bytes
[    0.169680] printk: log_buf_len total cpu_extra contributions: 1388544 bytes
[    0.169680] printk: log_buf_len min size: 131072 bytes
[    0.169973] printk: log_buf_len: 2097152 bytes
[    0.169974] printk: early log buf free: 123812(94%)
[    0.179095] Dentry cache hash table entries: 8388608 (order: 14, 67108864 bytes, linear)
[    0.183664] Inode-cache hash table entries: 4194304 (order: 13, 33554432 bytes, linear)
[    0.183681] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.212977] Memory: 2312420K/134217728K available (10512K kernel code, 2084K rwdata, 3148K rodata, 4008K init, 856K bss, 3358660K reserved, 0K cma-reserved)
[    0.213438] SLUB: HWalign=256, Order=0-3, MinObjects=0, CPUs=340, Nodes=1
[    0.213476] ftrace: allocating 31392 entries in 123 pages
[    0.218045] ftrace: allocated 123 pages with 6 groups
[    0.219167] rcu: Hierarchical RCU implementation.
[    0.219167] rcu: 	RCU event tracing is enabled.
[    0.219168] rcu: 	RCU restricting CPUs from NR_CPUS=512 to nr_cpu_ids=340.
[    0.219169] 	Trampoline variant of Tasks RCU enabled.
[    0.219169] 	Rude variant of Tasks RCU enabled.
[    0.219169] 	Tracing variant of Tasks RCU enabled.
[    0.219170] rcu: RCU calculated value of scheduler-enlistment delay is 11 jiffies.
[    0.219171] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=340
[    0.222162] NR_IRQS: 3, nr_irqs: 3, preallocated irqs: 3
[    0.222313] clocksource: tod: mask: 0xffffffffffffffff max_cycles: 0x3b0a9be803b0a9, max_idle_ns: 1805497147909793 ns
[    0.222534] Console: colour dummy device 80x25
[    0.323253] printk: console [ttyS0] enabled
[    0.457563] Calibrating delay loop (skipped)... 21881.00 BogoMIPS preset
[    0.457564] pid_max: default: 348160 minimum: 2720
[    0.457736] LSM: Security Framework initializing
[    0.457767] SELinux:  Initializing.
[    0.458030] Mount-cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
[    0.458174] Mountpoint-cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
[    0.459174] rcu: Hierarchical SRCU implementation.
[    0.462173] smp: Bringing up secondary CPUs ...
[    0.485719] smp: Brought up 1 node, 64 CPUs
[    1.072386] random: fast init done
[    1.413328] node 0 deferred pages initialised in 930ms
[    1.442064] devtmpfs: initialized
[    1.442586] random: get_random_u32 called from bucket_table_alloc.isra.0+0x82/0x120 with crng_init=1
[    1.442863] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[    1.443545] futex hash table entries: 131072 (order: 13, 33554432 bytes, vmalloc)
[    1.447717] xor: automatically using best checksumming function   xc        
[    1.447917] NET: Registered protocol family 16
[    1.447956] audit: initializing netlink subsys (disabled)
[    1.448023] audit: type=2000 audit(1593597944.331:1): state=initialized audit_enabled=0 res=1
[    1.448147] Spectre V2 mitigation: etokens
[    1.456276] HugeTLB registered 1.00 MiB page size, pre-allocated 0 pages
[    1.702379] raid6: vx128x8  gen() 21910 MB/s
[    1.872319] raid6: vx128x8  xor() 13356 MB/s
[    1.872320] raid6: using algorithm vx128x8 gen() 21910 MB/s
[    1.872321] raid6: .... xor() 13356 MB/s, rmw enabled
[    1.872322] raid6: using s390xc recovery algorithm
[    1.872601] iommu: Default domain type: Translated 
[    1.872748] SCSI subsystem initialized
[    1.894528] PCI host bridge to bus 0000:00
[    1.894533] pci_bus 0000:00: root bus resource [bus 00]
[    1.894535] pci_bus 0000:00: root bus resource [mem 0x8000000000000000-0x8000000007ffffff 64bit pref]
[    1.894600] pci 0000:00:00.0: [1014:044b] type 00 class 0x120000
[    1.894650] pci 0000:00:00.0: reg 0x10: [mem 0xffffd80008000000-0xffffd8000fffffff 64bit pref]
[    1.894918] pci 0000:00:00.0: 0.000 Gb/s available PCIe bandwidth, limited by Unknown x0 link at 0000:00:00.0 (capable of 32.000 Gb/s with 5.0 GT/s PCIe x8 link)
[    1.894951] pci 0000:00:00.0: Adding to iommu group 0
[    1.896880] PCI host bridge to bus 0001:00
[    1.896882] pci_bus 0001:00: root bus resource [bus 00]
[    1.896883] pci_bus 0001:00: root bus resource [mem 0x8001000000000000-0x80010000000fffff 64bit pref]
[    1.896971] pci 0001:00:00.0: [15b3:1016] type 00 class 0x020000
[    1.897066] pci 0001:00:00.0: reg 0x10: [mem 0xffffd40002000000-0xffffd400020fffff 64bit pref]
[    1.897216] pci 0001:00:00.0: enabling Extended Tags
[    1.897702] pci 0001:00:00.0: 0.000 Gb/s available PCIe bandwidth, limited by Unknown x0 link at 0001:00:00.0 (capable of 63.008 Gb/s with 8.0 GT/s PCIe x8 link)
[    1.897740] pci 0001:00:00.0: Adding to iommu group 1
[    1.899688] PCI host bridge to bus 0002:00
[    1.899690] pci_bus 0002:00: root bus resource [bus 00]
[    1.899692] pci_bus 0002:00: root bus resource [mem 0x8002000000000000-0x80020000000fffff 64bit pref]
[    1.899779] pci 0002:00:00.0: [15b3:1016] type 00 class 0x020000
[    1.899877] pci 0002:00:00.0: reg 0x10: [mem 0xffffd40008000000-0xffffd400080fffff 64bit pref]
[    1.900031] pci 0002:00:00.0: enabling Extended Tags
[    1.900530] pci 0002:00:00.0: 0.000 Gb/s available PCIe bandwidth, limited by Unknown x0 link at 0002:00:00.0 (capable of 63.008 Gb/s with 8.0 GT/s PCIe x8 link)
[    1.900563] pci 0002:00:00.0: Adding to iommu group 2
[    2.424592] VFS: Disk quotas dquot_6.6.0
[    2.424648] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    2.425726] random: crng init done
[    2.426438] NET: Registered protocol family 2
[    2.427232] tcp_listen_portaddr_hash hash table entries: 65536 (order: 8, 1048576 bytes, linear)
[    2.427784] TCP established hash table entries: 524288 (order: 10, 4194304 bytes, vmalloc)
[    2.429806] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes, linear)
[    2.430265] TCP: Hash tables configured (established 524288 bind 65536)
[    2.430641] UDP hash table entries: 65536 (order: 9, 2097152 bytes, vmalloc)
[    2.431636] UDP-Lite hash table entries: 65536 (order: 9, 2097152 bytes, vmalloc)
[    2.433139] NET: Registered protocol family 1
[    2.433180] Trying to unpack rootfs image as initramfs...
[    3.042521] Freeing initrd memory: 46272K
[    3.042920] alg: No test for crc32be (crc32be-vx)
[    3.045455] Initialise system trusted keyrings
[    3.045514] workingset: timestamp_bits=45 max_order=25 bucket_order=0
[    3.046303] zbud: loaded
[    3.046685] fuse: init (API version 7.31)
[    3.046739] SGI XFS with ACLs, security attributes, realtime, quota, no debug enabled
[    3.049940] Key type asymmetric registered
[    3.049942] Asymmetric key parser 'x509' registered
[    3.049948] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 251)
[    3.050150] io scheduler mq-deadline registered
[    3.050152] io scheduler kyber registered
[    3.050171] io scheduler bfq registered
[    3.051187] atomic64_test: passed
[    3.051236] hvc_iucv: The z/VM IUCV HVC device driver cannot be used without z/VM
[    3.055982] brd: module loaded
[    3.056482] NET: Registered protocol family 10
[    3.057354] Segment Routing with IPv6
[    3.057366] NET: Registered protocol family 17
[    3.057370] Key type dns_resolver registered
[    3.057524] cio: Channel measurement facility initialized using format extended (mode autodetected)
[    3.057802] Discipline DIAG cannot be used without z/VM
[    5.446198] sclp_sd: No data is available for the config data entity
[    5.954014] qeth: loading core functions
[    5.954063] qeth: register layer 2 discipline
[    5.954064] qeth: register layer 3 discipline
[    5.954126] registered taskstats version 1
[    5.954131] Loading compiled-in X.509 certificates
[    5.993277] Loaded X.509 cert 'Build time autogenerated kernel key: c46ba92ee388c82c5891ee836c9c20b752cdfac5'
[    5.995428] zswap: loaded using pool lzo/zbud
[    5.996258] Key type ._fscrypt registered
[    5.996258] Key type .fscrypt registered
[    5.996259] Key type fscrypt-provisioning registered
[    5.996570] Btrfs loaded, crc32c=crc32c-vx
[    5.996583] ima: No TPM chip found, activating TPM-bypass!
[    5.996586] ima: Allocated hash algorithm: sha256
[    5.996596] ima: No architecture policies found
[    5.997561] Freeing unused kernel memory: 4008K
[    6.042376] Write protected read-only-after-init data: 80k
[    6.042381] Run /init as init process
[    6.042383]   with arguments:
[    6.042383]     /init
[    6.042383]   with environment:
[    6.042384]     HOME=/
[    6.042384]     TERM=linux
[    6.042384]     BOOT_IMAGE=0
[    6.042385]     crashkernel=1G
[    6.057157] systemd[1]: Inserted module 'autofs4'
[    6.058276] systemd[1]: systemd v243.8-1.fc31 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified)
[    6.059083] systemd[1]: Detected architecture s390x.
[    6.059085] systemd[1]: Running in initial RAM disk.
[    6.059147] systemd[1]: Set hostname to <m83lp52.lnxne.boe>.
[    6.098479] systemd[1]: Reached target Local File Systems.
[    6.098536] systemd[1]: Reached target Slices.
[    6.098555] systemd[1]: Reached target Swap.
[    6.098570] systemd[1]: Reached target Timers.
[    6.098663] systemd[1]: Listening on Journal Audit Socket.
[    6.098712] systemd[1]: Listening on Journal Socket (/dev/log).
[    6.145682] umh: sub_info->path: /sbin/modprobe
[    6.145683] /sbin/modprobe 
[    6.145684] -q 
[    6.145685] -- 
[    6.145685] sch_fq_codel 

[    6.146408] == ret: 100
[    6.146410] == KWIFEXITED(ret): 01
[    6.146411] KWEXITSTATUS(ret): 1
[    6.365672] audit: type=1130 audit(1593597949.251:2): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-journald comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[    6.372997] audit: type=1130 audit(1593597949.261:3): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-tmpfiles-setup comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[    7.294273] audit: type=1130 audit(1593597950.181:4): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=dracut-cmdline comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[    7.310777] audit: type=1130 audit(1593597950.191:5): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=dracut-pre-udev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[    7.311257] audit: type=1334 audit(1593597950.191:6): prog-id=6 op=LOAD
[    7.311287] audit: type=1334 audit(1593597950.191:7): prog-id=7 op=LOAD
[    7.533584] audit: type=1130 audit(1593597950.421:8): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-udevd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[    7.538462] qeth 0.0.bd00: portname is deprecated and is ignored
[    7.539958] dasd-eckd 0.0.3318: A channel path to the device has become operational
[    7.540242] dasd-eckd 0.0.3319: A channel path to the device has become operational
[    7.540743] dasd-eckd 0.0.331a: A channel path to the device has become operational
[    7.540985] dasd-eckd 0.0.331b: A channel path to the device has become operational
[    7.543658] qdio: 0.0.bd02 OSA on SC 159b using AI:1 QEBSM:0 PRI:1 TDD:1 SIGA: W AP
[    7.550125] dasd-eckd 0.0.3319: New DASD 3390/0E (CU 3990/01) with 262668 cylinders, 15 heads, 224 sectors
[    7.552945] dasd-eckd 0.0.3319: DASD with 4 KB/block, 189120960 KB total size, 48 KB/track, compatible disk layout
[    7.554058]  dasdb:VOL1/  0X3319: dasdb1
[    7.554603] dasd-eckd 0.0.3318: New DASD 3390/0E (CU 3990/01) with 262668 cylinders, 15 heads, 224 sectors
[    7.557378] dasd-eckd 0.0.3318: DASD with 4 KB/block, 189120960 KB total size, 48 KB/track, compatible disk layout
[    7.558349]  dasda:VOL1/  0X3318: dasda1
[    7.558949] dasd-eckd 0.0.331a: New DASD 3390/0C (CU 3990/01) with 30051 cylinders, 15 heads, 224 sectors
[    7.561612] dasd-eckd 0.0.331a: DASD with 4 KB/block, 21636720 KB total size, 48 KB/track, compatible disk layout
[    7.563390] dasd-eckd 0.0.331b: New DASD 3390/0C (CU 3990/01) with 30051 cylinders, 15 heads, 224 sectors
[    7.563427]  dasdc:VOL1/  0X3333: dasdc1 dasdc2 dasdc3
[    7.566603] dasd-eckd 0.0.331b: DASD with 4 KB/block, 21636720 KB total size, 48 KB/track, compatible disk layout
[    7.567763]  dasdd:VOL1/  0X3333: dasdd1 dasdd2 dasdd3
[    7.575390] qeth 0.0.bd00: QDIO data connection isolation is deactivated
[    7.575863] qeth 0.0.bd00: The device represents a Bridge Capable Port
[    7.576279] audit: type=1130 audit(1593597950.461:9): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-udev-trigger comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[    7.579230] qeth 0.0.bd00: MAC address b6:c4:0f:21:ab:4e successfully registered
[    7.579749] qeth 0.0.bd00: Device is a OSD Express card (level: 0202)
               with link type OSD_10GIG.
[    7.590981] audit: type=1130 audit(1593597950.471:10): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=plymouth-start comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[    7.621333] qeth 0.0.bd00: MAC address de:45:d7:61:c4:13 successfully registered
[    7.623582] qeth 0.0.bd00 encbd00: renamed from eth0
[    7.812522] mlx5_core 0001:00:00.0: enabling device (0000 -> 0002)
[    7.812608] mlx5_core 0001:00:00.0: firmware version: 14.23.1020
[    8.035163] dasdconf.sh Warning: 0.0.3319 is already online, not configuring
[    8.037162] dasdconf.sh Warning: 0.0.331b is already online, not configuring
[    8.037885] dasdconf.sh Warning: 0.0.3318 is already online, not configuring
[    8.065688] dasdconf.sh Warning: 0.0.331a is already online, not configuring
[    8.254855] umh: sub_info->path: /sbin/modprobe
[    8.254857] /sbin/modprobe 
[    8.254858] -q 
[    8.254858] -- 
[    8.254859] mlx5_ib 

[    8.255376] mlx5_core 0002:00:00.0: enabling device (0000 -> 0002)
[    8.255460] mlx5_core 0002:00:00.0: firmware version: 14.23.1020
[    8.706511] umh: sub_info->path: /sbin/modprobe
[    8.706542] /sbin/modprobe 
[    8.706546] -q 
[    8.706555] -- 
[    8.706563] mlx5_ib 

[    8.708963] mlx5_core 0001:00:00.0: MLX5E: StrdRq(0) RqSz(1024) StrdSz(256) RxCqeCmprss(0)
[    8.853364] mlx5_core 0002:00:00.0: MLX5E: StrdRq(0) RqSz(1024) StrdSz(256) RxCqeCmprss(0)
[    9.023031] mlx5_core 0002:00:00.0 enP2s564np0: renamed from eth1
[    9.223067] mlx5_core 0001:00:00.0 enP1s519np0: renamed from eth0
[    9.553829] audit: type=1130 audit(1593597952.441:11): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=dracut-initqueue comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[    9.570174] audit: type=1130 audit(1593597952.451:12): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-fsck-root comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[    9.578552] EXT4-fs (dasda1): mounted filesystem with ordered data mode. Opts: (null)
[    9.586599] audit: type=1334 audit(1593597952.471:13): prog-id=5 op=UNLOAD
[    9.937200] systemd-journald[484]: Received SIGTERM from PID 1 (systemd).
[    9.954691] printk: systemd: 19 output lines suppressed due to ratelimiting
[   10.278153] SELinux:  Permission watch in class filesystem not defined in policy.
[   10.278160] SELinux:  Permission watch in class file not defined in policy.
[   10.278161] SELinux:  Permission watch_mount in class file not defined in policy.
[   10.278162] SELinux:  Permission watch_sb in class file not defined in policy.
[   10.278163] SELinux:  Permission watch_with_perm in class file not defined in policy.
[   10.278164] SELinux:  Permission watch_reads in class file not defined in policy.
[   10.278167] SELinux:  Permission watch in class dir not defined in policy.
[   10.278168] SELinux:  Permission watch_mount in class dir not defined in policy.
[   10.278169] SELinux:  Permission watch_sb in class dir not defined in policy.
[   10.278170] SELinux:  Permission watch_with_perm in class dir not defined in policy.
[   10.278171] SELinux:  Permission watch_reads in class dir not defined in policy.
[   10.278174] SELinux:  Permission watch in class lnk_file not defined in policy.
[   10.278175] SELinux:  Permission watch_mount in class lnk_file not defined in policy.
[   10.278176] SELinux:  Permission watch_sb in class lnk_file not defined in policy.
[   10.278177] SELinux:  Permission watch_with_perm in class lnk_file not defined in policy.
[   10.278178] SELinux:  Permission watch_reads in class lnk_file not defined in policy.
[   10.278180] SELinux:  Permission watch in class chr_file not defined in policy.
[   10.278198] SELinux:  Permission watch_mount in class chr_file not defined in policy.
[   10.278199] SELinux:  Permission watch_sb in class chr_file not defined in policy.
[   10.278200] SELinux:  Permission watch_with_perm in class chr_file not defined in policy.
[   10.278201] SELinux:  Permission watch_reads in class chr_file not defined in policy.
[   10.278203] SELinux:  Permission watch in class blk_file not defined in policy.
[   10.278204] SELinux:  Permission watch_mount in class blk_file not defined in policy.
[   10.278205] SELinux:  Permission watch_sb in class blk_file not defined in policy.
[   10.278206] SELinux:  Permission watch_with_perm in class blk_file not defined in policy.
[   10.278207] SELinux:  Permission watch_reads in class blk_file not defined in policy.
[   10.278209] SELinux:  Permission watch in class sock_file not defined in policy.
[   10.278211] SELinux:  Permission watch_mount in class sock_file not defined in policy.
[   10.278212] SELinux:  Permission watch_sb in class sock_file not defined in policy.
[   10.278212] SELinux:  Permission watch_with_perm in class sock_file not defined in policy.
[   10.278213] SELinux:  Permission watch_reads in class sock_file not defined in policy.
[   10.278216] SELinux:  Permission watch in class fifo_file not defined in policy.
[   10.278217] SELinux:  Permission watch_mount in class fifo_file not defined in policy.
[   10.278218] SELinux:  Permission watch_sb in class fifo_file not defined in policy.
[   10.278219] SELinux:  Permission watch_with_perm in class fifo_file not defined in policy.
[   10.278220] SELinux:  Permission watch_reads in class fifo_file not defined in policy.
[   10.278262] SELinux:  Permission perfmon in class capability2 not defined in policy.
[   10.278263] SELinux:  Permission bpf in class capability2 not defined in policy.
[   10.278268] SELinux:  Permission perfmon in class cap2_userns not defined in policy.
[   10.278269] SELinux:  Permission bpf in class cap2_userns not defined in policy.
[   10.278304] SELinux:  Class perf_event not defined in policy.
[   10.278305] SELinux:  Class lockdown not defined in policy.
[   10.278306] SELinux: the above unknown classes and permissions will be allowed
[   10.278311] SELinux:  policy capability network_peer_controls=1
[   10.278311] SELinux:  policy capability open_perms=1
[   10.278312] SELinux:  policy capability extended_socket_class=1
[   10.278313] SELinux:  policy capability always_check_network=0
[   10.278314] SELinux:  policy capability cgroup_seclabel=1
[   10.278314] SELinux:  policy capability nnp_nosuid_transition=1
[   10.278315] SELinux:  policy capability genfs_seclabel_symlinks=0
[   10.389095] systemd[1]: Successfully loaded SELinux policy in 339.175ms.
[   10.534855] systemd[1]: Relabelled /dev, /dev/shm, /run, /sys/fs/cgroup in 13.169ms.
[   10.537104] systemd[1]: systemd v243.8-1.fc31 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified)
[   10.537930] systemd[1]: Detected architecture s390x.
[   10.540018] systemd[1]: Set hostname to <m83lp52.lnxne.boe>.
[   10.716298] systemd[1]: /usr/lib/systemd/system/sssd.service:12: PIDFile= references a path below legacy directory /var/run/, updating /var/run/sssd.pid → /run/sssd.pid; please update the unit file accordingly.
[   10.721240] systemd[1]: /usr/lib/systemd/system/iscsid.service:11: PIDFile= references a path below legacy directory /var/run/, updating /var/run/iscsid.pid → /run/iscsid.pid; please update the unit file accordingly.
[   10.721443] systemd[1]: /usr/lib/systemd/system/iscsiuio.service:13: PIDFile= references a path below legacy directory /var/run/, updating /var/run/iscsiuio.pid → /run/iscsiuio.pid; please update the unit file accordingly.
[   10.748479] systemd[1]: /usr/lib/systemd/system/sssd-kcm.socket:7: ListenStream= references a path below legacy directory /var/run/, updating /var/run/.heim_org.h5l.kcm-socket → /run/.heim_org.h5l.kcm-socket; please update the unit file accordingly.
[   10.781859] systemd[1]: initrd-switch-root.service: Succeeded.
[   10.781985] systemd[1]: Stopped Switch Root.
[   10.782251] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 1.
[   10.810779] EXT4-fs (dasda1): re-mounted. Opts: (null)
[   10.841271] umh: sub_info->path: /sbin/modprobe
[   10.841275] /sbin/modprobe 
[   10.841275] -q 
[   10.841276] -- 
[   10.841277] sch_fq_codel 

[   10.889454] == ret: 00
[   10.889456] == KWIFEXITED(ret): 01
[   10.889457] KWEXITSTATUS(ret): 0
[   11.118962] systemd-journald[990]: Received client request to flush runtime journal.
[   11.392641] VFIO - User Level meta-driver version: 0.3
[   11.490782] genwqe 0000:00:00.0: enabling device (0000 -> 0002)
[   11.506302] dasdconf.sh Warning: 0.0.3319 is already online, not configuring
[   11.507431] dasdconf.sh Warning: 0.0.331a is already online, not configuring
[   11.507992] dasdconf.sh Warning: 0.0.331b is already online, not configuring
[   11.508762] dasdconf.sh Warning: 0.0.3318 is already online, not configuring
[   11.625686] umh: sub_info->path: /sbin/modprobe
[   11.625690] /sbin/modprobe 
[   11.625690] -q 
[   11.625691] -- 
[   11.625692] crypto-ecb(aes) 

[   11.627270] == ret: 100
[   11.627272] == KWIFEXITED(ret): 01
[   11.627274] KWEXITSTATUS(ret): 1
[   11.627678] umh: sub_info->path: /sbin/modprobe
[   11.627681] /sbin/modprobe 
[   11.627682] -q 
[   11.627682] -- 
[   11.627683] crypto-cbc(aes) 

[   11.628695] == ret: 100
[   11.628699] == KWIFEXITED(ret): 01
[   11.628699] KWEXITSTATUS(ret): 1
[   11.629108] umh: sub_info->path: /sbin/modprobe
[   11.629110] /sbin/modprobe 
[   11.629110] -q 
[   11.629111] -- 
[   11.629112] crypto-xts(aes) 

[   11.630032] == ret: 100
[   11.630034] == KWIFEXITED(ret): 01
[   11.630035] KWEXITSTATUS(ret): 1
[   11.630098] umh: sub_info->path: /sbin/modprobe
[   11.630100] /sbin/modprobe 
[   11.630100] -q 
[   11.630101] -- 
[   11.630102] crypto-aes 

[   11.630979] == ret: 100
[   11.630980] == KWIFEXITED(ret): 01
[   11.631000] KWEXITSTATUS(ret): 1
[   11.631010] umh: sub_info->path: /sbin/modprobe
[   11.631012] /sbin/modprobe 
[   11.631012] -q 
[   11.631012] -- 
[   11.631013] cryptomgr 

[   11.631844] == ret: 00
[   11.631845] == KWIFEXITED(ret): 01
[   11.631846] KWEXITSTATUS(ret): 0
[   11.632291] umh: sub_info->path: /sbin/modprobe
[   11.632293] /sbin/modprobe 
[   11.632293] -q 
[   11.632294] -- 
[   11.632295] crypto-ctr(aes) 

[   11.633134] == ret: 100
[   11.633136] == KWIFEXITED(ret): 01
[   11.633136] KWEXITSTATUS(ret): 1
[   11.633211] umh: sub_info->path: /sbin/modprobe
[   11.633213] /sbin/modprobe 
[   11.633213] -q 
[   11.633214] -- 
[   11.633215] crypto-ctr 

[   11.639862] == ret: 00
[   11.639863] == KWIFEXITED(ret): 01
[   11.639864] KWEXITSTATUS(ret): 0
[   12.291736] device-mapper: uevent: version 1.0.3
[   12.291869] device-mapper: ioctl: 4.42.0-ioctl (2020-02-27) initialised: dm-devel@redhat.com
[   12.811519] kauditd_printk_skb: 81 callbacks suppressed
[   12.811520] audit: type=1130 audit(1593597955.691:93): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dmraid-activation comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   12.811526] audit: type=1131 audit(1593597955.691:94): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dmraid-activation comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   12.827013] audit: type=1130 audit(1593597955.711:95): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-fsck@dev-disk-by\x2dpath-ccw\x2d0.0.3319\x2dpart1 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   12.832192] XFS (dasdb1): Mounting V5 Filesystem
[   12.842643] RPC: Registered named UNIX socket transport module.
[   12.842645] RPC: Registered udp transport module.
[   12.842646] RPC: Registered tcp transport module.
[   12.842647] RPC: Registered tcp NFSv4.1 backchannel transport module.
[   12.855633] XFS (dasdb1): Ending clean mount
[   12.857782] xfs filesystem being mounted at /home supports timestamps until 2038 (0x7fffffff)
[   12.876314] audit: type=1130 audit(1593597955.761:96): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dracut-shutdown comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   12.896111] audit: type=1130 audit(1593597955.781:97): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=plymouth-read-write comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   12.896302] audit: type=1130 audit(1593597955.781:98): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=import-state comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   12.899463] RPC: Registered rdma transport module.
[   12.899466] RPC: Registered rdma backchannel transport module.
[   12.900337] audit: type=1130 audit(1593597955.781:99): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=rdma-load-modules@rdma comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   12.916905] audit: type=1130 audit(1593597955.801:100): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=rdma comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   12.952843] audit: type=1130 audit(1593597955.841:101): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-tmpfiles-setup comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   12.962072] audit: type=1305 audit(1593597955.841:102): op=set audit_enabled=1 old=1 auid=4294967295 ses=4294967295 subj=system_u:system_r:auditd_t:s0 res=1
[   13.444427] umh: sub_info->path: /sbin/modprobe
[   13.444430] /sbin/modprobe 
[   13.444430] -q 
[   13.444431] -- 
[   13.444432] iptable_filter 

[   13.485947] == ret: 00
[   13.485949] == KWIFEXITED(ret): 01
[   13.485950] KWEXITSTATUS(ret): 0
[   13.525155] umh: sub_info->path: /sbin/modprobe
[   13.525156] /sbin/modprobe 
[   13.525157] -q 
[   13.525157] -- 
[   13.525158] ip6table_filter 

[   13.575540] == ret: 00
[   13.575542] == KWIFEXITED(ret): 01
[   13.575543] KWEXITSTATUS(ret): 0
[   13.590423] umh: sub_info->path: /sbin/modprobe
[   13.590425] /sbin/modprobe 
[   13.590425] -q 
[   13.590426] -- 
[   13.590427] net-pf-16-proto-12 

[   13.617106] == ret: 00
[   13.617108] == KWIFEXITED(ret): 01
[   13.617109] KWEXITSTATUS(ret): 0
[   13.617147] umh: sub_info->path: /sbin/modprobe
[   13.617148] /sbin/modprobe 
[   13.617149] -q 
[   13.617149] -- 
[   13.617150] nfnetlink-subsys-6 

[   13.638728] == ret: 00
[   13.638729] == KWIFEXITED(ret): 01
[   13.638730] KWEXITSTATUS(ret): 0
[   13.831375] umh: sub_info->path: /sbin/modprobe
[   13.831377] /sbin/modprobe 
[   13.831378] -q 
[   13.831379] -- 
[   13.831379] iptable_security 

[   13.846372] == ret: 00
[   13.846374] == KWIFEXITED(ret): 01
[   13.846375] KWEXITSTATUS(ret): 0
[   13.847932] umh: sub_info->path: /sbin/modprobe
[   13.847934] /sbin/modprobe 
[   13.847934] -q 
[   13.847935] -- 
[   13.847935] iptable_raw 

[   13.854549] == ret: 00
[   13.854550] == KWIFEXITED(ret): 01
[   13.854551] KWEXITSTATUS(ret): 0
[   13.855856] umh: sub_info->path: /sbin/modprobe
[   13.855857] /sbin/modprobe 
[   13.855857] -q 
[   13.855858] -- 
[   13.855858] iptable_mangle 

[   13.860364] == ret: 00
[   13.860365] == KWIFEXITED(ret): 01
[   13.860366] KWEXITSTATUS(ret): 0
[   13.861761] umh: sub_info->path: /sbin/modprobe
[   13.861762] /sbin/modprobe 
[   13.861763] -q 
[   13.861763] -- 
[   13.861764] iptable_nat 

[   13.911825] == ret: 00
[   13.911826] == KWIFEXITED(ret): 01
[   13.911827] KWEXITSTATUS(ret): 0
[   13.914686] umh: sub_info->path: /sbin/modprobe
[   13.914688] /sbin/modprobe 
[   13.914688] -q 
[   13.914689] -- 
[   13.914689] ip6table_security 

[   13.945628] == ret: 00
[   13.945630] == KWIFEXITED(ret): 01
[   13.945630] KWEXITSTATUS(ret): 0
[   13.947019] umh: sub_info->path: /sbin/modprobe
[   13.947020] /sbin/modprobe 
[   13.947020] -q 
[   13.947021] -- 
[   13.947022] ip6table_raw 

[   13.969857] == ret: 00
[   13.969859] == KWIFEXITED(ret): 01
[   13.969860] KWEXITSTATUS(ret): 0
[   13.971549] umh: sub_info->path: /sbin/modprobe
[   13.971551] /sbin/modprobe 
[   13.971552] -q 
[   13.971553] -- 
[   13.971553] ip6table_mangle 

[   13.977724] == ret: 00
[   13.977725] == KWIFEXITED(ret): 01
[   13.977726] KWEXITSTATUS(ret): 0
[   13.979392] umh: sub_info->path: /sbin/modprobe
[   13.979393] /sbin/modprobe 
[   13.979394] -q 
[   13.979394] -- 
[   13.979395] ip6table_nat 

[   13.985103] == ret: 00
[   13.985104] == KWIFEXITED(ret): 01
[   13.985105] KWEXITSTATUS(ret): 0
[   14.014760] umh: sub_info->path: /sbin/modprobe
[   14.014762] /sbin/modprobe 
[   14.014762] -q 
[   14.014762] -- 
[   14.014763] ipt_conntrack 

[   14.039306] mlx5_core 0001:00:00.0 enP1s519np0: Link up
[   14.041835] IPv6: ADDRCONF(NETDEV_CHANGE): enP1s519np0: link becomes ready
[   14.047723] == ret: 00
[   14.047725] == KWIFEXITED(ret): 01
[   14.047726] KWEXITSTATUS(ret): 0
[   14.048012] umh: sub_info->path: /sbin/modprobe
[   14.048014] /sbin/modprobe 
[   14.048014] -q 
[   14.048015] -- 
[   14.048015] ipt_REJECT 

[   14.073846] == ret: 00
[   14.073848] == KWIFEXITED(ret): 01
[   14.073848] KWEXITSTATUS(ret): 0
[   14.077239] umh: sub_info->path: /sbin/modprobe
[   14.077241] /sbin/modprobe 
[   14.077241] -q 
[   14.077242] -- 
[   14.077242] ip6t_rpfilter 

[   14.099370] == ret: 00
[   14.099372] == KWIFEXITED(ret): 01
[   14.099373] KWEXITSTATUS(ret): 0
[   14.100300] umh: sub_info->path: /sbin/modprobe
[   14.100301] /sbin/modprobe 
[   14.100302] -q 
[   14.100302] -- 
[   14.100303] ip6t_REJECT 

[   14.122176] == ret: 00
[   14.122177] == KWIFEXITED(ret): 01
[   14.122178] KWEXITSTATUS(ret): 0
[   14.128047] umh: sub_info->path: /sbin/modprobe
[   14.128048] /sbin/modprobe 
[   14.128048] -q 
[   14.128049] -- 
[   14.128049] ip6t_tcp 

[   14.149302] mlx5_core 0002:00:00.0 enP2s564np0: Link up
[   14.173411] == ret: 00
[   14.173413] == KWIFEXITED(ret): 01
[   14.173414] KWEXITSTATUS(ret): 0
[   14.322406] umh: sub_info->path: /sbin/modprobe
[   14.322409] /sbin/modprobe 
[   14.322410] -q 
[   14.322410] -- 
[   14.322411] rtnl-link-bridge 

[   14.389015] == ret: 00
[   14.389017] == KWIFEXITED(ret): 01
[   14.389018] KWEXITSTATUS(ret): 0
[   14.389335] umh: sub_info->path: /sbin/modprobe
[   14.389336] /sbin/modprobe 
[   14.389336] -q 
[   14.389337] -- 
[   14.389338] char-major-10-200 

[   14.437420] tun: Universal TUN/TAP device driver, 1.6
[   14.437604] == ret: 00
[   14.437606] == KWIFEXITED(ret): 01
[   14.437606] KWEXITSTATUS(ret): 0
[   14.438482] virbr0: port 1(virbr0-nic) entered blocking state
[   14.438485] virbr0: port 1(virbr0-nic) entered disabled state
[   14.438635] device virbr0-nic entered promiscuous mode
[   14.439654] umh: sub_info->path: /sbin/bridge-stp
[   14.439656] /sbin/bridge-stp 
[   14.439656] virbr0 
[   14.439656] start 

[   14.439734] == ret: 00
[   14.439735] == KWIFEXITED(ret): 01
[   14.439736] KWEXITSTATUS(ret): 0
[   14.567072] umh: sub_info->path: /sbin/modprobe
[   14.567074] /sbin/modprobe 
[   14.567075] -q 
[   14.567075] -- 
[   14.567076] ipt_CT 

[   14.576352] == ret: 00
[   14.576353] == KWIFEXITED(ret): 01
[   14.576354] KWEXITSTATUS(ret): 0
[   14.576443] umh: sub_info->path: /sbin/modprobe
[   14.576444] /sbin/modprobe 
[   14.576444] -q 
[   14.576445] -- 
[   14.576445] nfct-helper-tftp 

[   14.605182] == ret: 00
[   14.605183] == KWIFEXITED(ret): 01
[   14.605184] KWEXITSTATUS(ret): 0
[   14.675162] umh: sub_info->path: /sbin/modprobe
[   14.675164] /sbin/modprobe 
[   14.675164] -q 
[   14.675165] -- 
[   14.675165] ipt_MASQUERADE 

[   14.684223] == ret: 00
[   14.684224] == KWIFEXITED(ret): 01
[   14.684225] KWEXITSTATUS(ret): 0
[   14.701460] umh: sub_info->path: /sbin/modprobe
[   14.701462] /sbin/modprobe 
[   14.701462] -q 
[   14.701462] -- 
[   14.701463] ipt_CHECKSUM 

[   14.707019] == ret: 00
[   14.707021] == KWIFEXITED(ret): 01
[   14.707022] KWEXITSTATUS(ret): 0
[   14.707948] virbr0: port 1(virbr0-nic) entered blocking state
[   14.731752] virbr0: port 1(virbr0-nic) entered disabled state
[   14.981871] umh: sub_info->path: /sbin/modprobe
[   14.981873] /sbin/modprobe 
[   14.981873] -q 
[   14.981874] -- 
[   14.981874] char-major-10-232 

[   15.014374] == ret: 00
[   15.014375] == KWIFEXITED(ret): 01
[   15.014376] KWEXITSTATUS(ret): 0
[   15.102392] IPv6: ADDRCONF(NETDEV_CHANGE): enP2s564np0: link becomes ready
[   20.341424] virbr0: port 2(vnet0) entered blocking state
[   20.341429] virbr0: port 2(vnet0) entered disabled state
[   20.341514] device vnet0 entered promiscuous mode
[   20.341714] virbr0: port 2(vnet0) entered blocking state
[   20.341988] umh: sub_info->path: /sbin/modprobe
[   20.341989] /sbin/modprobe 
[   20.341989] -q 
[   20.341990] -- 
[   20.341991] char-major-10-238 

[   20.408823] == ret: 00
[   20.408825] == KWIFEXITED(ret): 01
[   20.408826] KWEXITSTATUS(ret): 0

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
  2020-07-01 10:08                           ` Christian Borntraeger
@ 2020-07-01 13:24                             ` Tetsuo Handa
  2020-07-01 13:53                               ` Luis Chamberlain
  2020-07-01 13:46                             ` Luis Chamberlain
  1 sibling, 1 reply; 43+ messages in thread
From: Tetsuo Handa @ 2020-07-01 13:24 UTC (permalink / raw)
  To: Christian Borntraeger, Luis Chamberlain
  Cc: Christoph Hellwig, ast, axboe, bfields, bridge, chainsaw,
	christian.brauner, chuck.lever, davem, dhowells, gregkh,
	jarkko.sakkinen, jmorris, josh, keescook, keyrings, kuba,
	lars.ellenberg, linux-fsdevel, linux-kernel, linux-nfs,
	linux-security-module, nikolay, philipp.reisner, ravenexp, roopa,
	serge, slyfox, viro, yangtiezhu, netdev, markward, linux-s390

On 2020/07/01 19:08, Christian Borntraeger wrote:
> 
> 
> On 30.06.20 19:57, Luis Chamberlain wrote:
>> On Fri, Jun 26, 2020 at 02:54:10AM +0000, Luis Chamberlain wrote:
>>> On Wed, Jun 24, 2020 at 08:37:55PM +0200, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 24.06.20 20:32, Christian Borntraeger wrote:
>>>> [...]> 
>>>>> So the translations look correct. But your change is actually a sematic change
>>>>> if(ret) will only trigger if there is an error
>>>>> if (KWIFEXITED(ret)) will always trigger when the process ends. So we will always overwrite -ECHILD
>>>>> and we did not do it before. 
>>>>>
>>>>
>>>> So the right fix is
>>>>
>>>> diff --git a/kernel/umh.c b/kernel/umh.c
>>>> index f81e8698e36e..a3a3196e84d1 100644
>>>> --- a/kernel/umh.c
>>>> +++ b/kernel/umh.c
>>>> @@ -154,7 +154,7 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info)
>>>>                  * the real error code is already in sub_info->retval or
>>>>                  * sub_info->retval is 0 anyway, so don't mess with it then.
>>>>                  */
>>>> -               if (KWIFEXITED(ret))
>>>> +               if (KWEXITSTATUS(ret))
>>>>                         sub_info->retval = KWEXITSTATUS(ret);

Well, it is not br_stp_call_user() but br_stp_start() which is expecting
to set sub_info->retval for both KWIFEXITED() case and KWIFSIGNALED() case.
That is, sub_info->retval needs to carry raw value (i.e. without "umh: fix
processed error when UMH_WAIT_PROC is used" will be the correct behavior).


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
  2020-07-01 10:08                           ` Christian Borntraeger
  2020-07-01 13:24                             ` Tetsuo Handa
@ 2020-07-01 13:46                             ` Luis Chamberlain
  1 sibling, 0 replies; 43+ messages in thread
From: Luis Chamberlain @ 2020-07-01 13:46 UTC (permalink / raw)
  To: Christian Borntraeger, Jessica Yu, Tetsuo Handa
  Cc: Christoph Hellwig, Eric W. Biederman, Andrew Morton, ast, axboe,
	bfields, bridge, chainsaw, christian.brauner, chuck.lever, davem,
	dhowells, gregkh, jarkko.sakkinen, jmorris, josh, keescook,
	keyrings, kuba, lars.ellenberg, linux-fsdevel, linux-kernel,
	linux-nfs, linux-security-module, nikolay, philipp.reisner,
	ravenexp, roopa, serge, slyfox, viro, yangtiezhu, netdev,
	markward, linux-s390

On Wed, Jul 01, 2020 at 12:08:11PM +0200, Christian Borntraeger wrote:
> dmesg attached
> [   14.438482] virbr0: port 1(virbr0-nic) entered blocking state
> [   14.438485] virbr0: port 1(virbr0-nic) entered disabled state
> [   14.438635] device virbr0-nic entered promiscuous mode
> [   14.439654] umh: sub_info->path: /sbin/bridge-stp
> [   14.439656] /sbin/bridge-stp 
> [   14.439656] virbr0 
> [   14.439656] start 

OK so what we seem to want to debug is the umh call for:

/sbin/bridge-stp virbr0 start

> [   14.439734] == ret: 00
> [   14.439735] == KWIFEXITED(ret): 01
> [   14.439736] KWEXITSTATUS(ret): 0

Its not clear if this is the respective return value, but now
that we have a clue that this is the the only non-modprobe
call, we should have a clearer certainty that this is the
issue.

Indeed my patch "umh: fix processed error when UMH_WAIT_PROC is used"
did modify bridge-stp in the following way:

diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c
index ba55851fe132..bdd94b45396b 100644
--- a/net/bridge/br_stp_if.c
+++ b/net/bridge/br_stp_if.c
@@ -133,14 +133,8 @@ static int br_stp_call_user(struct net_bridge *br, char *arg)
 
 	/* call userspace STP and report program errors */
 	rc = call_usermodehelper(BR_STP_PROG, argv, envp, UMH_WAIT_PROC);
-	if (rc > 0) {
-		if (rc & 0xff)
-			br_debug(br, BR_STP_PROG " received signal %d\n",
-				 rc & 0x7f);
-		else
-			br_debug(br, BR_STP_PROG " exited with code %d\n",
-				 (rc >> 8) & 0xff);
-	}
+	if (rc != 0)
+		br_debug(br, BR_STP_PROG " failed with exit code %d\n", rc);
 
 	return rc;
 }

If you look at this carefully though you'll notice that the change just
modifies *when* we issue the debug print. The more important relevant
part of the patch however was that we now do return a correct error
value when the call fails.

More importantly, depending on if an error or not we run the kernel STP
or userspace STP later:

static void br_stp_start(struct net_bridge *br)
{
	int err = -ENOENT;

	if (net_eq(dev_net(br->dev), &init_net))
		err = br_stp_call_user(br, "start");

	if (err && err != -ENOENT)
		br_err(br, "failed to start userspace STP (%d)\n", err);

	spin_lock_bh(&br->lock);

	if (br->bridge_forward_delay < BR_MIN_FORWARD_DELAY)
		__br_set_forward_delay(br, BR_MIN_FORWARD_DELAY);
	else if (br->bridge_forward_delay > BR_MAX_FORWARD_DELAY)
		__br_set_forward_delay(br, BR_MAX_FORWARD_DELAY);

--------------------->  can you enable debug print for this to see what
--------------------->  you end up using?
	if (!err) {
		br->stp_enabled = BR_USER_STP;
		br_debug(br, "userspace STP started\n");
	} else {
		br->stp_enabled = BR_KERNEL_STP;
		br_debug(br, "using kernel STP\n");

		/* To start timers on any ports left in blocking */
		if (br->dev->flags & IFF_UP)
			mod_timer(&br->hello_timer, jiffies + br->hello_time);
		br_port_state_selection(br);
	}
----------------->

	spin_unlock_bh(&br->lock);
}


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
  2020-07-01 13:24                             ` Tetsuo Handa
@ 2020-07-01 13:53                               ` Luis Chamberlain
  2020-07-01 14:08                                 ` Tetsuo Handa
  2020-07-01 15:26                                 ` Christian Borntraeger
  0 siblings, 2 replies; 43+ messages in thread
From: Luis Chamberlain @ 2020-07-01 13:53 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Christian Borntraeger, Christoph Hellwig, ast, axboe, bfields,
	bridge, chainsaw, christian.brauner, chuck.lever, davem,
	dhowells, gregkh, jarkko.sakkinen, jmorris, josh, keescook,
	keyrings, kuba, lars.ellenberg, linux-fsdevel, linux-kernel,
	linux-nfs, linux-security-module, nikolay, philipp.reisner,
	ravenexp, roopa, serge, slyfox, viro, yangtiezhu, netdev,
	markward, linux-s390

On Wed, Jul 01, 2020 at 10:24:29PM +0900, Tetsuo Handa wrote:
> On 2020/07/01 19:08, Christian Borntraeger wrote:
> > 
> > 
> > On 30.06.20 19:57, Luis Chamberlain wrote:
> >> On Fri, Jun 26, 2020 at 02:54:10AM +0000, Luis Chamberlain wrote:
> >>> On Wed, Jun 24, 2020 at 08:37:55PM +0200, Christian Borntraeger wrote:
> >>>>
> >>>>
> >>>> On 24.06.20 20:32, Christian Borntraeger wrote:
> >>>> [...]> 
> >>>>> So the translations look correct. But your change is actually a sematic change
> >>>>> if(ret) will only trigger if there is an error
> >>>>> if (KWIFEXITED(ret)) will always trigger when the process ends. So we will always overwrite -ECHILD
> >>>>> and we did not do it before. 
> >>>>>
> >>>>
> >>>> So the right fix is
> >>>>
> >>>> diff --git a/kernel/umh.c b/kernel/umh.c
> >>>> index f81e8698e36e..a3a3196e84d1 100644
> >>>> --- a/kernel/umh.c
> >>>> +++ b/kernel/umh.c
> >>>> @@ -154,7 +154,7 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info)
> >>>>                  * the real error code is already in sub_info->retval or
> >>>>                  * sub_info->retval is 0 anyway, so don't mess with it then.
> >>>>                  */
> >>>> -               if (KWIFEXITED(ret))
> >>>> +               if (KWEXITSTATUS(ret))
> >>>>                         sub_info->retval = KWEXITSTATUS(ret);
> 
> Well, it is not br_stp_call_user() but br_stp_start() which is expecting
> to set sub_info->retval for both KWIFEXITED() case and KWIFSIGNALED() case.
> That is, sub_info->retval needs to carry raw value (i.e. without "umh: fix
> processed error when UMH_WAIT_PROC is used" will be the correct behavior).

br_stp_start() doesn't check for the raw value, it just checks for err
or !err. So the patch, "umh: fix processed error when UMH_WAIT_PROC is
used" propagates the correct error now.

Christian, can you try removing the binary temporarily and seeing if
you get your bridge working?

  Luis

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
  2020-07-01 13:53                               ` Luis Chamberlain
@ 2020-07-01 14:08                                 ` Tetsuo Handa
  2020-07-01 15:38                                   ` Luis Chamberlain
  2020-07-01 15:26                                 ` Christian Borntraeger
  1 sibling, 1 reply; 43+ messages in thread
From: Tetsuo Handa @ 2020-07-01 14:08 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Christian Borntraeger, Christoph Hellwig, ast, axboe, bfields,
	bridge, chainsaw, christian.brauner, chuck.lever, davem,
	dhowells, gregkh, jarkko.sakkinen, jmorris, josh, keescook,
	keyrings, kuba, lars.ellenberg, linux-fsdevel, linux-kernel,
	linux-nfs, linux-security-module, nikolay, philipp.reisner,
	ravenexp, roopa, serge, slyfox, viro, yangtiezhu, netdev,
	markward, linux-s390

On 2020/07/01 22:53, Luis Chamberlain wrote:
>> Well, it is not br_stp_call_user() but br_stp_start() which is expecting
>> to set sub_info->retval for both KWIFEXITED() case and KWIFSIGNALED() case.
>> That is, sub_info->retval needs to carry raw value (i.e. without "umh: fix
>> processed error when UMH_WAIT_PROC is used" will be the correct behavior).
> 
> br_stp_start() doesn't check for the raw value, it just checks for err
> or !err. So the patch, "umh: fix processed error when UMH_WAIT_PROC is
> used" propagates the correct error now.

No. If "/sbin/bridge-stp virbr0 start" terminated due to e.g. SIGSEGV
(for example, by inserting "kill -SEGV $$" into right after "#!/bin/sh" line),
br_stp_start() needs to select BR_KERNEL_STP path. We can't assume that
/sbin/bridge-stp is always terminated by exit() syscall (and hence we can't
ignore KWIFSIGNALED() case in call_usermodehelper_exec_sync()).


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
  2020-07-01 13:53                               ` Luis Chamberlain
  2020-07-01 14:08                                 ` Tetsuo Handa
@ 2020-07-01 15:26                                 ` Christian Borntraeger
  1 sibling, 0 replies; 43+ messages in thread
From: Christian Borntraeger @ 2020-07-01 15:26 UTC (permalink / raw)
  To: Luis Chamberlain, Tetsuo Handa
  Cc: Christoph Hellwig, ast, axboe, bfields, bridge, chainsaw,
	christian.brauner, chuck.lever, davem, dhowells, gregkh,
	jarkko.sakkinen, jmorris, josh, keescook, keyrings, kuba,
	lars.ellenberg, linux-fsdevel, linux-kernel, linux-nfs,
	linux-security-module, nikolay, philipp.reisner, ravenexp, roopa,
	serge, slyfox, viro, yangtiezhu, netdev, markward, linux-s390



On 01.07.20 15:53, Luis Chamberlain wrote:
> On Wed, Jul 01, 2020 at 10:24:29PM +0900, Tetsuo Handa wrote:
>> On 2020/07/01 19:08, Christian Borntraeger wrote:
>>>
>>>
>>> On 30.06.20 19:57, Luis Chamberlain wrote:
>>>> On Fri, Jun 26, 2020 at 02:54:10AM +0000, Luis Chamberlain wrote:
>>>>> On Wed, Jun 24, 2020 at 08:37:55PM +0200, Christian Borntraeger wrote:
>>>>>>
>>>>>>
>>>>>> On 24.06.20 20:32, Christian Borntraeger wrote:
>>>>>> [...]> 
>>>>>>> So the translations look correct. But your change is actually a sematic change
>>>>>>> if(ret) will only trigger if there is an error
>>>>>>> if (KWIFEXITED(ret)) will always trigger when the process ends. So we will always overwrite -ECHILD
>>>>>>> and we did not do it before. 
>>>>>>>
>>>>>>
>>>>>> So the right fix is
>>>>>>
>>>>>> diff --git a/kernel/umh.c b/kernel/umh.c
>>>>>> index f81e8698e36e..a3a3196e84d1 100644
>>>>>> --- a/kernel/umh.c
>>>>>> +++ b/kernel/umh.c
>>>>>> @@ -154,7 +154,7 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info)
>>>>>>                  * the real error code is already in sub_info->retval or
>>>>>>                  * sub_info->retval is 0 anyway, so don't mess with it then.
>>>>>>                  */
>>>>>> -               if (KWIFEXITED(ret))
>>>>>> +               if (KWEXITSTATUS(ret))
>>>>>>                         sub_info->retval = KWEXITSTATUS(ret);
>>
>> Well, it is not br_stp_call_user() but br_stp_start() which is expecting
>> to set sub_info->retval for both KWIFEXITED() case and KWIFSIGNALED() case.
>> That is, sub_info->retval needs to carry raw value (i.e. without "umh: fix
>> processed error when UMH_WAIT_PROC is used" will be the correct behavior).
> 
> br_stp_start() doesn't check for the raw value, it just checks for err
> or !err. So the patch, "umh: fix processed error when UMH_WAIT_PROC is
> used" propagates the correct error now.
> 
> Christian, can you try removing the binary temporarily and seeing if
> you get your bridge working?

As I matter of fact I do NOT have /sbin/bridge-stp installed. 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
  2020-07-01 14:08                                 ` Tetsuo Handa
@ 2020-07-01 15:38                                   ` Luis Chamberlain
  2020-07-01 15:48                                     ` Christian Borntraeger
  2020-07-02  4:26                                     ` Tetsuo Handa
  0 siblings, 2 replies; 43+ messages in thread
From: Luis Chamberlain @ 2020-07-01 15:38 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Christian Borntraeger, Christoph Hellwig, Eric W. Biederman, ast,
	axboe, bfields, bridge, chainsaw, christian.brauner, chuck.lever,
	davem, dhowells, gregkh, jarkko.sakkinen, jmorris, josh,
	keescook, keyrings, kuba, lars.ellenberg, linux-fsdevel,
	linux-kernel, linux-nfs, linux-security-module, nikolay,
	philipp.reisner, ravenexp, roopa, serge, slyfox, viro,
	yangtiezhu, netdev, markward, mcgrof, linux-s390

On Wed, Jul 01, 2020 at 11:08:57PM +0900, Tetsuo Handa wrote:
> On 2020/07/01 22:53, Luis Chamberlain wrote:
> >> Well, it is not br_stp_call_user() but br_stp_start() which is expecting
> >> to set sub_info->retval for both KWIFEXITED() case and KWIFSIGNALED() case.
> >> That is, sub_info->retval needs to carry raw value (i.e. without "umh: fix
> >> processed error when UMH_WAIT_PROC is used" will be the correct behavior).
> > 
> > br_stp_start() doesn't check for the raw value, it just checks for err
> > or !err. So the patch, "umh: fix processed error when UMH_WAIT_PROC is
> > used" propagates the correct error now.
> 
> No. If "/sbin/bridge-stp virbr0 start" terminated due to e.g. SIGSEGV
> (for example, by inserting "kill -SEGV $$" into right after "#!/bin/sh" line),
> br_stp_start() needs to select BR_KERNEL_STP path. We can't assume that
> /sbin/bridge-stp is always terminated by exit() syscall (and hence we can't
> ignore KWIFSIGNALED() case in call_usermodehelper_exec_sync()).

Ah, well that would be a different fix required, becuase again,
br_stp_start() does not untangle the correct error today really.
I also I think it would be odd odd that SIGSEGV or another signal 
is what was terminating Christian's bridge stp call, but let's
find out!

Note we pass 0 to the options to wait so the mistake here could indeed
be that we did not need KWIFSIGNALED(). I was afraid of this prospect...
as it other implications.

It means we either *open code* all callers, or we handle this in a
unified way on the umh. And if we do handle this in a unified way, it
then begs the question as to *what* do we pass for the signals case and
continued case. Below we just pass the signal, and treat continued as
OK, but treating continued as OK would also be a *new* change as well.

For instance (this goes just boot tested, but Christian if you can
try this as well that would be appreciated):

diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index bba06befbff5..d1898f5dd1fc 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -105,10 +105,12 @@ extern long kernel_wait4(pid_t, int __user *, int, struct rusage *);
 
 /* Only add helpers for actual use cases in the kernel */
 #define KWEXITSTATUS(status)		(__KWEXITSTATUS(status))
+#define KWTERMSIG(status)		(__KWTERMSIG(status))
+#define KWSTOPSIG(status)		(__KWSTOPSIG(status))
 #define KWIFEXITED(status)		(__KWIFEXITED(status))
-
-/* Nonzero if STATUS indicates normal termination.  */
-#define __KWIFEXITED(status)     (__KWTERMSIG(status) == 0)
+#define KWIFSIGNALED(status)		(__KWIFSIGNALED(status))
+#define KWIFSTOPPED(status)		(__KWIFSTOPPED(status))
+#define KWIFCONTINUED(status)		(__KWIFCONTINUED(status))
 
 /* If KWIFEXITED(STATUS), the low-order 8 bits of the status.  */
 #define __KWEXITSTATUS(status)   (((status) & 0xff00) >> 8)
@@ -116,6 +118,24 @@ extern long kernel_wait4(pid_t, int __user *, int, struct rusage *);
 /* If KWIFSIGNALED(STATUS), the terminating signal.  */
 #define __KWTERMSIG(status)      ((status) & 0x7f)
 
+/* If KWIFSTOPPED(STATUS), the signal that stopped the child.  */
+#define __KWSTOPSIG(status)      __KWEXITSTATUS(status)
+
+/* Nonzero if STATUS indicates normal termination.  */
+#define __KWIFEXITED(status)     (__KWTERMSIG(status) == 0)
+
+/* Nonzero if STATUS indicates termination by a signal.  */
+#define __KWIFSIGNALED(status) \
+	(((signed char) (((status) & 0x7f) + 1) >> 1) > 0)
+
+/* Nonzero if STATUS indicates the child is stopped.  */
+#define __KWIFSTOPPED(status)    (((status) & 0xff) == 0x7f)
+
+/* Nonzero if STATUS indicates the child continued after a stop. */
+#define __KWIFCONTINUED(status) ((status) == __KW_CONTINUED)
+
+#define __KW_CONTINUED		0xffff
+
 extern void free_task(struct task_struct *tsk);
 
 /* sched_exec is called by processes performing an exec */
diff --git a/kernel/umh.c b/kernel/umh.c
index f81e8698e36e..c98fb1ed90c9 100644
--- a/kernel/umh.c
+++ b/kernel/umh.c
@@ -156,6 +156,18 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info)
 		 */
 		if (KWIFEXITED(ret))
 			sub_info->retval = KWEXITSTATUS(ret);
+		/*
+		 * Do we really want to be passing the signal, or do we pass
+		 * a single error code for all cases?
+		 */
+		else if (KWIFSIGNALED(ret))
+			sub_info->retval = KWTERMSIG(ret);
+		/* Same here */
+		else if (KWIFSTOPPED((ret)))
+			sub_info->retval = KWSTOPSIG(ret);
+		/* And are we really sure we want this? */
+		else if (KWIFCONTINUED((ret)))
+			sub_info->retval = 0;
 	}
 
 	/* Restore default kernel sig handler */

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
  2020-07-01 15:38                                   ` Luis Chamberlain
@ 2020-07-01 15:48                                     ` Christian Borntraeger
  2020-07-01 15:58                                       ` Luis Chamberlain
  2020-07-02  4:26                                     ` Tetsuo Handa
  1 sibling, 1 reply; 43+ messages in thread
From: Christian Borntraeger @ 2020-07-01 15:48 UTC (permalink / raw)
  To: Luis Chamberlain, Tetsuo Handa
  Cc: Christoph Hellwig, Eric W. Biederman, ast, axboe, bfields,
	bridge, chainsaw, christian.brauner, chuck.lever, davem,
	dhowells, gregkh, jarkko.sakkinen, jmorris, josh, keescook,
	keyrings, kuba, lars.ellenberg, linux-fsdevel, linux-kernel,
	linux-nfs, linux-security-module, nikolay, philipp.reisner,
	ravenexp, roopa, serge, slyfox, viro, yangtiezhu, netdev,
	markward, linux-s390



On 01.07.20 17:38, Luis Chamberlain wrote:
> On Wed, Jul 01, 2020 at 11:08:57PM +0900, Tetsuo Handa wrote:
>> On 2020/07/01 22:53, Luis Chamberlain wrote:
>>>> Well, it is not br_stp_call_user() but br_stp_start() which is expecting
>>>> to set sub_info->retval for both KWIFEXITED() case and KWIFSIGNALED() case.
>>>> That is, sub_info->retval needs to carry raw value (i.e. without "umh: fix
>>>> processed error when UMH_WAIT_PROC is used" will be the correct behavior).
>>>
>>> br_stp_start() doesn't check for the raw value, it just checks for err
>>> or !err. So the patch, "umh: fix processed error when UMH_WAIT_PROC is
>>> used" propagates the correct error now.
>>
>> No. If "/sbin/bridge-stp virbr0 start" terminated due to e.g. SIGSEGV
>> (for example, by inserting "kill -SEGV $$" into right after "#!/bin/sh" line),
>> br_stp_start() needs to select BR_KERNEL_STP path. We can't assume that
>> /sbin/bridge-stp is always terminated by exit() syscall (and hence we can't
>> ignore KWIFSIGNALED() case in call_usermodehelper_exec_sync()).
> 
> Ah, well that would be a different fix required, becuase again,
> br_stp_start() does not untangle the correct error today really.
> I also I think it would be odd odd that SIGSEGV or another signal 
> is what was terminating Christian's bridge stp call, but let's
> find out!
> 
> Note we pass 0 to the options to wait so the mistake here could indeed
> be that we did not need KWIFSIGNALED(). I was afraid of this prospect...
> as it other implications.
> 
> It means we either *open code* all callers, or we handle this in a
> unified way on the umh. And if we do handle this in a unified way, it
> then begs the question as to *what* do we pass for the signals case and
> continued case. Below we just pass the signal, and treat continued as
> OK, but treating continued as OK would also be a *new* change as well.
> 
> For instance (this goes just boot tested, but Christian if you can
> try this as well that would be appreciated):


Does not help, the bridge stays in DOWN state. 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
  2020-07-01 15:48                                     ` Christian Borntraeger
@ 2020-07-01 15:58                                       ` Luis Chamberlain
  2020-07-01 16:01                                         ` Christian Borntraeger
  0 siblings, 1 reply; 43+ messages in thread
From: Luis Chamberlain @ 2020-07-01 15:58 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Tetsuo Handa, Christoph Hellwig, Eric W. Biederman, ast, axboe,
	bfields, bridge, chainsaw, christian.brauner, chuck.lever, davem,
	dhowells, gregkh, jarkko.sakkinen, jmorris, josh, keescook,
	keyrings, kuba, lars.ellenberg, linux-fsdevel, linux-kernel,
	linux-nfs, linux-security-module, nikolay, philipp.reisner,
	ravenexp, roopa, serge, slyfox, viro, yangtiezhu, netdev,
	markward, linux-s390

On Wed, Jul 01, 2020 at 05:48:48PM +0200, Christian Borntraeger wrote:
> 
> 
> On 01.07.20 17:38, Luis Chamberlain wrote:
> > On Wed, Jul 01, 2020 at 11:08:57PM +0900, Tetsuo Handa wrote:
> >> On 2020/07/01 22:53, Luis Chamberlain wrote:
> >>>> Well, it is not br_stp_call_user() but br_stp_start() which is expecting
> >>>> to set sub_info->retval for both KWIFEXITED() case and KWIFSIGNALED() case.
> >>>> That is, sub_info->retval needs to carry raw value (i.e. without "umh: fix
> >>>> processed error when UMH_WAIT_PROC is used" will be the correct behavior).
> >>>
> >>> br_stp_start() doesn't check for the raw value, it just checks for err
> >>> or !err. So the patch, "umh: fix processed error when UMH_WAIT_PROC is
> >>> used" propagates the correct error now.
> >>
> >> No. If "/sbin/bridge-stp virbr0 start" terminated due to e.g. SIGSEGV
> >> (for example, by inserting "kill -SEGV $$" into right after "#!/bin/sh" line),
> >> br_stp_start() needs to select BR_KERNEL_STP path. We can't assume that
> >> /sbin/bridge-stp is always terminated by exit() syscall (and hence we can't
> >> ignore KWIFSIGNALED() case in call_usermodehelper_exec_sync()).
> > 
> > Ah, well that would be a different fix required, becuase again,
> > br_stp_start() does not untangle the correct error today really.
> > I also I think it would be odd odd that SIGSEGV or another signal 
> > is what was terminating Christian's bridge stp call, but let's
> > find out!
> > 
> > Note we pass 0 to the options to wait so the mistake here could indeed
> > be that we did not need KWIFSIGNALED(). I was afraid of this prospect...
> > as it other implications.
> > 
> > It means we either *open code* all callers, or we handle this in a
> > unified way on the umh. And if we do handle this in a unified way, it
> > then begs the question as to *what* do we pass for the signals case and
> > continued case. Below we just pass the signal, and treat continued as
> > OK, but treating continued as OK would also be a *new* change as well.
> > 
> > For instance (this goes just boot tested, but Christian if you can
> > try this as well that would be appreciated):
> 
> 
> Does not help, the bridge stays in DOWN state. 

OK thanks for testing, that was fast! Does your code go through the
STP kernel path or userpath? If it is taking the STP kernel path
then this is not the real culprit to your issue then.

  Luis

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
  2020-07-01 15:58                                       ` Luis Chamberlain
@ 2020-07-01 16:01                                         ` Christian Borntraeger
  0 siblings, 0 replies; 43+ messages in thread
From: Christian Borntraeger @ 2020-07-01 16:01 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Tetsuo Handa, Christoph Hellwig, Eric W. Biederman, ast, axboe,
	bfields, bridge, chainsaw, christian.brauner, chuck.lever, davem,
	dhowells, gregkh, jarkko.sakkinen, jmorris, josh, keescook,
	keyrings, kuba, lars.ellenberg, linux-fsdevel, linux-kernel,
	linux-nfs, linux-security-module, nikolay, philipp.reisner,
	ravenexp, roopa, serge, slyfox, viro, yangtiezhu, netdev,
	markward, linux-s390



On 01.07.20 17:58, Luis Chamberlain wrote:
[...]

>>>
>>> Ah, well that would be a different fix required, becuase again,
>>> br_stp_start() does not untangle the correct error today really.
>>> I also I think it would be odd odd that SIGSEGV or another signal 
>>> is what was terminating Christian's bridge stp call, but let's
>>> find out!
>>>
>>> Note we pass 0 to the options to wait so the mistake here could indeed
>>> be that we did not need KWIFSIGNALED(). I was afraid of this prospect...
>>> as it other implications.
>>>
>>> It means we either *open code* all callers, or we handle this in a
>>> unified way on the umh. And if we do handle this in a unified way, it
>>> then begs the question as to *what* do we pass for the signals case and
>>> continued case. Below we just pass the signal, and treat continued as
>>> OK, but treating continued as OK would also be a *new* change as well.
>>>
>>> For instance (this goes just boot tested, but Christian if you can
>>> try this as well that would be appreciated):
>>
>>
>> Does not help, the bridge stays in DOWN state. 
> 
> OK thanks for testing, that was fast! Does your code go through the
> STP kernel path or userpath? If it is taking the STP kernel path
> then this is not the real culprit to your issue then.

I have no idea and I cannot look into this right now. I can test
patches as compile,reboot and test is almost no effort.

FWIW, this is just the network of a KVM guest of libvirts default network
no longer working, maybe you can reproduce this on x86 as well?


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
  2020-07-01 15:38                                   ` Luis Chamberlain
  2020-07-01 15:48                                     ` Christian Borntraeger
@ 2020-07-02  4:26                                     ` Tetsuo Handa
  2020-07-02 19:46                                       ` Luis Chamberlain
  1 sibling, 1 reply; 43+ messages in thread
From: Tetsuo Handa @ 2020-07-02  4:26 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Christian Borntraeger, Christoph Hellwig, Eric W. Biederman, ast,
	axboe, bfields, bridge, chainsaw, christian.brauner, chuck.lever,
	davem, dhowells, gregkh, jarkko.sakkinen, jmorris, josh,
	keescook, keyrings, kuba, lars.ellenberg, linux-fsdevel,
	linux-kernel, linux-nfs, linux-security-module, nikolay,
	philipp.reisner, ravenexp, roopa, serge, slyfox, viro,
	yangtiezhu, netdev, markward, linux-s390

On 2020/07/02 0:38, Luis Chamberlain wrote:
> @@ -156,6 +156,18 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info)
>  		 */
>  		if (KWIFEXITED(ret))
>  			sub_info->retval = KWEXITSTATUS(ret);
> +		/*
> +		 * Do we really want to be passing the signal, or do we pass
> +		 * a single error code for all cases?
> +		 */
> +		else if (KWIFSIGNALED(ret))
> +			sub_info->retval = KWTERMSIG(ret);

No, this is bad. Caller of usermode helper is unable to distinguish exit(9)
and e.g. SIGKILL'ed by the OOM-killer. Please pass raw exit status value.

I feel that caller of usermode helper should not use exit status value.
For example, call_sbin_request_key() is checking

  test_bit(KEY_FLAG_USER_CONSTRUCT, &key->flags) || key_validate(key) < 0

condition (if usermode helper was invoked) in order to "ignore any errors from
userspace if the key was instantiated".

> +		/* Same here */
> +		else if (KWIFSTOPPED((ret)))
> +			sub_info->retval = KWSTOPSIG(ret);
> +		/* And are we really sure we want this? */
> +		else if (KWIFCONTINUED((ret)))
> +			sub_info->retval = 0;
>  	}
>  
>  	/* Restore default kernel sig handler */
> 


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
  2020-07-02  4:26                                     ` Tetsuo Handa
@ 2020-07-02 19:46                                       ` Luis Chamberlain
  2020-07-03  0:52                                         ` Tetsuo Handa
  0 siblings, 1 reply; 43+ messages in thread
From: Luis Chamberlain @ 2020-07-02 19:46 UTC (permalink / raw)
  To: Tetsuo Handa, David Howells
  Cc: Christian Borntraeger, Christoph Hellwig, Eric W. Biederman, ast,
	axboe, bfields, bridge, chainsaw, christian.brauner, chuck.lever,
	davem, gregkh, jarkko.sakkinen, jmorris, josh, keescook,
	keyrings, kuba, lars.ellenberg, linux-fsdevel, linux-kernel,
	linux-nfs, linux-security-module, nikolay, philipp.reisner,
	ravenexp, roopa, serge, slyfox, viro, yangtiezhu, netdev,
	markward, linux-s390

On Thu, Jul 02, 2020 at 01:26:53PM +0900, Tetsuo Handa wrote:
> On 2020/07/02 0:38, Luis Chamberlain wrote:
> > @@ -156,6 +156,18 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info)
> >  		 */
> >  		if (KWIFEXITED(ret))
> >  			sub_info->retval = KWEXITSTATUS(ret);
> > +		/*
> > +		 * Do we really want to be passing the signal, or do we pass
> > +		 * a single error code for all cases?
> > +		 */
> > +		else if (KWIFSIGNALED(ret))
> > +			sub_info->retval = KWTERMSIG(ret);
> 
> No, this is bad. Caller of usermode helper is unable to distinguish exit(9)
> and e.g. SIGKILL'ed by the OOM-killer.

Right, the question is: do we care?

> Please pass raw exit status value.
> 
> I feel that caller of usermode helper should not use exit status value.
> For example, call_sbin_request_key() is checking
> 
>   test_bit(KEY_FLAG_USER_CONSTRUCT, &key->flags) || key_validate(key) < 0
> 
> condition (if usermode helper was invoked) in order to "ignore any errors from
> userspace if the key was instantiated".

For those not familiar with this code path, or if you cannot decipher
the above, the code path in question was:

static int call_sbin_request_key(struct key *authkey, void *aux)                
{
	...
	/* do it */                                                             
	ret = call_usermodehelper_keys(request_key, argv, envp, keyring,        
				       UMH_WAIT_PROC);
	kdebug("usermode -> 0x%x", ret);
	if (ret >= 0) {
		/* ret is the exit/wait code */
		if (test_bit(KEY_FLAG_USER_CONSTRUCT, &key->flags) ||
		    key_validate(key) < 0)
		    ret = -ENOKEY;
		/* ignore any errors from userspace if the key was      
		 * instantiated */  
		 ret = 0;
	}
	...
}

And the umh patch "umh: fix processed error when UMH_WAIT_PROC is used"
changed this to:

-       if (ret >= 0) {
+       if (ret != 0) {

Prior to the patch negative return values from userspace were still
being captured, and likewise signals, but the error value was not
raw, not the actual value. After the patch, since we check for ret != 0
we still upkeep the sanity check for any error, correct the error value,
but as you noted signals were ignored as I made the wrong assumption
we would ignore them. The umh sub_info->retval is set after my original
patch only if KWIFSIGNALED(ret)), and ignored signals, and so that
would be now capitured with the additional KWIFSIGNALED(ret)) check.

The question still stands:

Do we want to open code all these checks or simply wrap them up in
the umh. If we do the later, as you note exit(9) and a SIGKILL will
be the same to the inspector in the kernel. But do we care?

Do we really want umh callers differntiatin between signals and exit values?

The alternative to making a compromise is using generic wrappers for
things which make sense and letting the callers use those.

  Luis

> > +		/* Same here */
> > +		else if (KWIFSTOPPED((ret)))
> > +			sub_info->retval = KWSTOPSIG(ret);
> > +		/* And are we really sure we want this? */
> > +		else if (KWIFCONTINUED((ret)))
> > +			sub_info->retval = 0;
> >  	}
> >  
> >  	/* Restore default kernel sig handler */
> > 
> 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
  2020-07-02 19:46                                       ` Luis Chamberlain
@ 2020-07-03  0:52                                         ` Tetsuo Handa
  2020-07-03 13:28                                           ` Luis Chamberlain
  0 siblings, 1 reply; 43+ messages in thread
From: Tetsuo Handa @ 2020-07-03  0:52 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: David Howells, Christian Borntraeger, Christoph Hellwig,
	Eric W. Biederman, ast, axboe, bfields, bridge, chainsaw,
	christian.brauner, chuck.lever, davem, gregkh, jarkko.sakkinen,
	jmorris, josh, keescook, keyrings, kuba, lars.ellenberg,
	linux-fsdevel, linux-kernel, linux-nfs, linux-security-module,
	nikolay, philipp.reisner, ravenexp, roopa, serge, slyfox, viro,
	yangtiezhu, netdev, markward, linux-s390

On 2020/07/03 4:46, Luis Chamberlain wrote:
> On Thu, Jul 02, 2020 at 01:26:53PM +0900, Tetsuo Handa wrote:
>> On 2020/07/02 0:38, Luis Chamberlain wrote:
>>> @@ -156,6 +156,18 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info)
>>>  		 */
>>>  		if (KWIFEXITED(ret))
>>>  			sub_info->retval = KWEXITSTATUS(ret);
>>> +		/*
>>> +		 * Do we really want to be passing the signal, or do we pass
>>> +		 * a single error code for all cases?
>>> +		 */
>>> +		else if (KWIFSIGNALED(ret))
>>> +			sub_info->retval = KWTERMSIG(ret);
>>
>> No, this is bad. Caller of usermode helper is unable to distinguish exit(9)
>> and e.g. SIGKILL'ed by the OOM-killer.
> 
> Right, the question is: do we care?

Yes, we have to care.

> And the umh patch "umh: fix processed error when UMH_WAIT_PROC is used"
> changed this to:
> 
> -       if (ret >= 0) {
> +       if (ret != 0) {
> 
> Prior to the patch negative return values from userspace were still
> being captured, and likewise signals, but the error value was not
> raw, not the actual value. After the patch, since we check for ret != 0
> we still upkeep the sanity check for any error, correct the error value,
> but as you noted signals were ignored as I made the wrong assumption
> we would ignore them. The umh sub_info->retval is set after my original
> patch only if KWIFSIGNALED(ret)), and ignored signals, and so that
> would be now capitured with the additional KWIFSIGNALED(ret)) check.

"call_usermodehelper_keys() == 0" (i.e. usermode helper was successfully
started and successfully terminated via exit(0)) is different from "there is
nothing to do". call_sbin_request_key() == 0 case still has to check for
possibility of -ENOKEY case.

> 
> The question still stands:
> 
> Do we want to open code all these checks or simply wrap them up in
> the umh. If we do the later, as you note exit(9) and a SIGKILL will
> be the same to the inspector in the kernel. But do we care?

Yes, we do care.

> 
> Do we really want umh callers differntiatin between signals and exit values?

Yes, we do.

> 
> The alternative to making a compromise is using generic wrappers for
> things which make sense and letting the callers use those.

I suggest just introducing KWIFEXITED()/KWEXITSTATUS()/KWIFSIGNALED()/KWTERMSIG()
macros and fixing the callers, for some callers are not aware of possibility of
KWIFSIGNALED() case.

For example, conn_try_outdate_peer() in drivers/block/drbd/drbd_nl.c misbehaves if
drbd_usermode_helper process was terminated by a signal, for the switch() statement
after returning from conn_helper() is assuming that the return value of conn_helper()
is a KWEXITSTATUS() value if drbd_usermode_helper process was successfully started.
If drbd_usermode_helper process was terminated by SIGQUIT (which is 3),
conn_try_outdate_peer() will by error hit "case P_INCONSISTENT:" (which is 3);
conn_try_outdate_peer() should hit "default: /* The script is broken ... */"
unless KWIFEXITED() == true.

Your patch is trying to obnubilate the return code.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
  2020-07-03  0:52                                         ` Tetsuo Handa
@ 2020-07-03 13:28                                           ` Luis Chamberlain
  0 siblings, 0 replies; 43+ messages in thread
From: Luis Chamberlain @ 2020-07-03 13:28 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: David Howells, Christian Borntraeger, Christoph Hellwig,
	Eric W. Biederman, ast, axboe, bfields, bridge, chainsaw,
	christian.brauner, chuck.lever, davem, gregkh, jarkko.sakkinen,
	jmorris, josh, keescook, keyrings, kuba, lars.ellenberg,
	linux-fsdevel, linux-kernel, linux-nfs, linux-security-module,
	nikolay, philipp.reisner, ravenexp, roopa, serge, slyfox, viro,
	yangtiezhu, netdev, markward, linux-s390

On Fri, Jul 03, 2020 at 09:52:01AM +0900, Tetsuo Handa wrote:
> On 2020/07/03 4:46, Luis Chamberlain wrote:
> > The alternative to making a compromise is using generic wrappers for
> > things which make sense and letting the callers use those.
> 
> I suggest just introducing KWIFEXITED()/KWEXITSTATUS()/KWIFSIGNALED()/KWTERMSIG()
> macros and fixing the callers, for some callers are not aware of possibility of
> KWIFSIGNALED() case.

OK so we open code all uses. Do that in a next iteration.

  Luis


^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2020-07-03 13:29 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-10 15:49 [PATCH 0/5] kmod/umh: a few fixes Luis R. Rodriguez
2020-06-10 15:49 ` [PATCH 1/5] selftests: kmod: Use variable NAME in kmod_test_0001() Luis R. Rodriguez
2020-06-10 15:49 ` [PATCH 2/5] kmod: Remove redundant "be an" in the comment Luis R. Rodriguez
2020-06-10 15:49 ` [PATCH 3/5] test_kmod: Avoid potential double free in trigger_config_run_type() Luis R. Rodriguez
2020-06-10 15:49 ` [PATCH 4/5] umh: fix processed error when UMH_WAIT_PROC is used Luis R. Rodriguez
2020-06-23 14:11   ` linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected) Christian Borntraeger
2020-06-23 14:23     ` Christian Borntraeger
2020-06-24 11:11       ` Christian Borntraeger
2020-06-24 12:05         ` Luis Chamberlain
2020-06-24 13:17           ` Luis Chamberlain
2020-06-24 16:13             ` Luis Chamberlain
2020-06-24 14:43         ` Christoph Hellwig
2020-06-24 15:54           ` Christian Borntraeger
2020-06-24 16:09             ` Luis Chamberlain
2020-06-24 17:58               ` Christian Borntraeger
2020-06-24 18:09                 ` Christian Borntraeger
2020-06-24 18:32                   ` Christian Borntraeger
2020-06-24 18:37                     ` Christian Borntraeger
2020-06-25 13:26                       ` Christian Borntraeger
2020-06-26  2:54                       ` Luis Chamberlain
2020-06-26  5:22                         ` Christian Borntraeger
2020-06-26  9:00                           ` Christoph Hellwig
2020-06-26 11:40                             ` Luis Chamberlain
2020-06-26 11:50                               ` Luis Chamberlain
2020-06-30 17:57                         ` Luis Chamberlain
2020-07-01 10:08                           ` Christian Borntraeger
2020-07-01 13:24                             ` Tetsuo Handa
2020-07-01 13:53                               ` Luis Chamberlain
2020-07-01 14:08                                 ` Tetsuo Handa
2020-07-01 15:38                                   ` Luis Chamberlain
2020-07-01 15:48                                     ` Christian Borntraeger
2020-07-01 15:58                                       ` Luis Chamberlain
2020-07-01 16:01                                         ` Christian Borntraeger
2020-07-02  4:26                                     ` Tetsuo Handa
2020-07-02 19:46                                       ` Luis Chamberlain
2020-07-03  0:52                                         ` Tetsuo Handa
2020-07-03 13:28                                           ` Luis Chamberlain
2020-07-01 15:26                                 ` Christian Borntraeger
2020-07-01 13:46                             ` Luis Chamberlain
2020-06-10 15:49 ` [PATCH 5/5] selftests: simplify kmod failure value Luis R. Rodriguez
2020-06-18  0:43 ` [PATCH 0/5] kmod/umh: a few fixes Andrew Morton
2020-06-19 20:46   ` Luis Chamberlain
2020-06-19 21:07     ` Luis Chamberlain

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).