All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH lttng-tools 1/9] Improve handling of test SIGTERM/SIGINT
       [not found] <20190503135547.12968-1-mathieu.desnoyers@efficios.com>
@ 2019-05-03 13:55 ` Mathieu Desnoyers
  2019-05-03 13:55 ` [PATCH lttng-tools 2/9] Fix: tests: error handling in high throughput limits test Mathieu Desnoyers
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: Mathieu Desnoyers @ 2019-05-03 13:55 UTC (permalink / raw)
  To: jgalar, joraj; +Cc: lttng-dev

The current state of signal handling for test scripts is: on
SIGTERM/SIGINT of the tests (e.g. a CTRL-C on the console), session
daemon and relay daemon are killed with SIGKILL, thus leaking all their
resources, and leaving lttng kernel modules loaded.

Revamp the "stop" functions to take a signal number and a timeout
as optional parameters. The default signal number is SIGTERM.

The full_cleanup trap handler now tries to nicely kill relayd and
sessiond (if they are present) with SIGTERM, and wait up to the
user-configurable LTTNG_TEST_TEARDOWN_TIMEOUT environment variable
(which has a default of 60s). Then, if there are still either relayd,
sessiond, or consumerd present, it will SIGKILL them and wait for
them to vanish. If it had to kill sessiond with SIGKILL, it will
also explicitly try to unload the lttng modules with modprobe.

This approach is inspired from sysv init script shutdown behavior.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
 tests/utils/utils.sh | 180 +++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 137 insertions(+), 43 deletions(-)

diff --git a/tests/utils/utils.sh b/tests/utils/utils.sh
index 94b3a3c4..d273b278 100644
--- a/tests/utils/utils.sh
+++ b/tests/utils/utils.sh
@@ -15,14 +15,12 @@
 
 SESSIOND_BIN="lttng-sessiond"
 SESSIOND_MATCH=".*lttng-sess.*"
-SESSIOND_PIDS=""
 RUNAS_BIN="lttng-runas"
 RUNAS_MATCH=".*lttng-runas.*"
 CONSUMERD_BIN="lttng-consumerd"
 CONSUMERD_MATCH=".*lttng-consumerd.*"
 RELAYD_BIN="lttng-relayd"
 RELAYD_MATCH=".*lttng-relayd.*"
-RELAYD_PIDS=""
 LTTNG_BIN="lttng"
 BABELTRACE_BIN="babeltrace"
 OUTPUT_DEST=/dev/null
@@ -48,11 +46,20 @@ export LTTNG_SESSIOND_PATH="/bin/true"
 
 source $TESTDIR/utils/tap/tap.sh
 
+if [ -z $LTTNG_TEST_TEARDOWN_TIMEOUT ]; then
+	LTTNG_TEST_TEARDOWN_TIMEOUT=60
+fi
+
 function full_cleanup ()
 {
-	if [ -n "${SESSIOND_PIDS}" ] || [ -n "${RELAYD_PIDS}" ]; then
-		kill -9 ${SESSIOND_PIDS} ${RELAYD_PIDS} > /dev/null 2>&1
-	fi
+	# Try to kill daemons gracefully
+	stop_lttng_relayd_notap SIGTERM $LTTNG_TEST_TEARDOWN_TIMEOUT
+	stop_lttng_sessiond_notap SIGTERM $LTTNG_TEST_TEARDOWN_TIMEOUT
+
+	# If daemons are still present, forcibly kill them
+	stop_lttng_relayd_notap SIGKILL $LTTNG_TEST_TEARDOWN_TIMEOUT
+	stop_lttng_sessiond_notap SIGKILL $LTTNG_TEST_TEARDOWN_TIMEOUT
+	stop_lttng_consumerd_notap SIGKILL $LTTNG_TEST_TEARDOWN_TIMEOUT
 
 	# Disable trap for SIGTERM since the following kill to the
 	# pidgroup will be SIGTERM. Otherwise it loops.
@@ -397,8 +404,6 @@ function start_lttng_relayd_opt()
 	else
 		pass "Start lttng-relayd (opt: $opt)"
 	fi
-
-	RELAYD_PIDS=$(pgrep $RELAYD_MATCH)
 }
 
 function start_lttng_relayd()
@@ -414,29 +419,58 @@ function start_lttng_relayd_notap()
 function stop_lttng_relayd_opt()
 {
 	local withtap=$1
+	local signal=$2
+	local timeout=$3
+	local dtimeleft=
+	local fail=0
+	local pids=$(pgrep $RELAYD_MATCH)
 
-	if [ $withtap -eq "1" ]; then
-		diag "Killing lttng-relayd (pid: $RELAYD_PIDS)"
+	if [ -n "$timeout" ]; then
+		dtimeleft=$(($timeout * 2))
+	fi
+
+	if [ -z "$signal" ]; then
+		signal="SIGTERM"
 	fi
-	kill $RELAYD_PIDS 1> $OUTPUT_DEST 2> $ERROR_OUTPUT_DEST
+
+	if [ -z "$pids" ]; then
+		if [ $withtap -eq "1" ]; then
+			pass "No relay daemon to kill"
+		fi
+		return 0
+	fi
+
+	diag "Killing (signal $signal) lttng-relayd (pid: $pids)"
+
+	kill -s $signal $pids 1> $OUTPUT_DEST 2> $ERROR_OUTPUT_DEST
 	retval=$?
 
 	if [ $? -eq 1 ]; then
+		fail=1
 		if [ $withtap -eq "1" ]; then
 			fail "Kill relay daemon"
 		fi
-		return 1
 	else
 		out=1
 		while [ -n "$out" ]; do
 			out=$(pgrep $RELAYD_MATCH)
+			if [ -n "$dtimeleft" ]; then
+				if [ $dtimeleft -lt 0 ]; then
+					out=
+					fail=1
+				fi
+				dtimeleft=$(($dtimeleft - 1))
+			fi
 			sleep 0.5
 		done
 		if [ $withtap -eq "1" ]; then
-			pass "Kill relay daemon"
+			if [ $fail -eq "0" ]; then
+				pass "Wait after kill relay daemon"
+			else
+				fail "Wait after kill relay daemon"
+			fi
 		fi
 	fi
-	RELAYD_PIDS=""
 	return $retval
 }
 
@@ -508,7 +542,6 @@ function start_lttng_sessiond_opt()
 			ok $status "Start session daemon"
 		fi
 	fi
-	SESSIOND_PIDS=$(pgrep $SESSIOND_MATCH)
 }
 
 function start_lttng_sessiond()
@@ -525,24 +558,43 @@ function stop_lttng_sessiond_opt()
 {
 	local withtap=$1
 	local signal=$2
-	local kill_opt=""
+	local timeout=$3
+	local dtimeleft=
+	local fail=0
 
-	if [ -n $TEST_NO_SESSIOND ] && [ "$TEST_NO_SESSIOND" == "1" ]; then
+	if [ -n "$timeout" ]; then
+		dtimeleft=$(($timeout * 2))
+	fi
+
+	if [ -n "$TEST_NO_SESSIOND" ] && [ "$TEST_NO_SESSIOND" == "1" ]; then
 		# Env variable requested no session daemon
 		return
 	fi
 
-	local pids="${SESSIOND_PIDS} $(pgrep $RUNAS_MATCH)"
+	local runas_pids=$(pgrep $RUNAS_MATCH)
+	local pids=$(pgrep $SESSIOND_MATCH)
 
-	if [ -n "$2" ]; then
-		kill_opt="$kill_opt -s $signal"
+	if [ -n "$runas_pids" ]; then
+		pids="$pids $runas_pids"
 	fi
-	if [ $withtap -eq "1" ]; then
-		diag "Killing $SESSIOND_BIN and lt-$SESSIOND_BIN pids: $(echo $pids | tr '\n' ' ')"
+
+	if [ -z "$pids" ]; then
+		if [ $withtap -eq "1" ]; then
+			pass "No session daemon to kill"
+		fi
+		return
 	fi
-	kill $kill_opt $pids 1> $OUTPUT_DEST 2> $ERROR_OUTPUT_DEST
+
+	if [ -z "$signal" ]; then
+		signal=SIGTERM
+	fi
+
+	diag "Killing (signal $signal) $SESSIOND_BIN and lt-$SESSIOND_BIN pids: $(echo $pids | tr '\n' ' ')"
+
+	kill -s $signal $pids 1> $OUTPUT_DEST 2> $ERROR_OUTPUT_DEST
 
 	if [ $? -eq 1 ]; then
+		fail=1
 		if [ $withtap -eq "1" ]; then
 			fail "Kill sessions daemon"
 		fi
@@ -550,17 +602,44 @@ function stop_lttng_sessiond_opt()
 		out=1
 		while [ -n "$out" ]; do
 			out=$(pgrep ${SESSIOND_MATCH})
+			if [ -n "$dtimeleft" ]; then
+				if [ $dtimeleft -lt 0 ]; then
+					out=
+					fail=1
+				fi
+				dtimeleft=$(($dtimeleft - 1))
+			fi
 			sleep 0.5
 		done
 		out=1
 		while [ -n "$out" ]; do
 			out=$(pgrep $CONSUMERD_MATCH)
+			if [ -n "$dtimeleft" ]; then
+				if [ $dtimeleft -lt 0 ]; then
+					out=
+					fail=1
+				fi
+				dtimeleft=$(($dtimeleft - 1))
+			fi
 			sleep 0.5
 		done
 
-		SESSIOND_PIDS=""
 		if [ $withtap -eq "1" ]; then
-			pass "Kill session daemon"
+			if [ $fail -eq "0" ]; then
+				pass "Wait after kill session daemon"
+			else
+				fail "Wait after kill session daemon"
+			fi
+		fi
+	fi
+	if [ "$signal" = "SIGKILL" ]; then
+		if [ "$(id -u)" -eq "0" ]; then
+			local modules="$(lsmod | grep ^lttng | awk '{print $1}')"
+
+			if [ -n "$modules" ]; then
+				diag "Unloading all LTTng modules"
+				modprobe -r $modules
+			fi
 		fi
 	fi
 }
@@ -579,21 +658,18 @@ function sigstop_lttng_sessiond_opt()
 {
 	local withtap=$1
 	local signal=SIGSTOP
-	local kill_opt=""
 
-	if [ -n $TEST_NO_SESSIOND ] && [ "$TEST_NO_SESSIOND" == "1" ]; then
+	if [ -n "$TEST_NO_SESSIOND" ] && [ "$TEST_NO_SESSIOND" == "1" ]; then
 		# Env variable requested no session daemon
 		return
 	fi
 
 	PID_SESSIOND="$(pgrep ${SESSIOND_MATCH}) $(pgrep $RUNAS_MATCH)"
 
-	kill_opt="$kill_opt -s $signal"
-
 	if [ $withtap -eq "1" ]; then
 		diag "Sending SIGSTOP to lt-$SESSIOND_BIN and $SESSIOND_BIN pids: $(echo $PID_SESSIOND | tr '\n' ' ')"
 	fi
-	kill $kill_opt $PID_SESSIOND 1> $OUTPUT_DEST 2> $ERROR_OUTPUT_DEST
+	kill -s $signal $PID_SESSIOND 1> $OUTPUT_DEST 2> $ERROR_OUTPUT_DEST
 
 	if [ $? -eq 1 ]; then
 		if [ $withtap -eq "1" ]; then
@@ -635,26 +711,37 @@ function stop_lttng_consumerd_opt()
 {
 	local withtap=$1
 	local signal=$2
-	local kill_opt=""
+	local timeout=$3
+	local dtimeleft=
+	local fail=0
 
 	PID_CONSUMERD=$(pgrep $CONSUMERD_MATCH)
 
-	if [ -n "$2" ]; then
-		kill_opt="$kill_opt -s $signal"
+	if [ -n "$timeout" ]; then
+		dtimeleft=$(($timeout * 2))
 	fi
 
-	if [ $withtap -eq "1" ]; then
-		diag "Killing $CONSUMERD_BIN pids: $(echo $PID_CONSUMERD | tr '\n' ' ')"
+	if [ -z "$signal" ]; then
+		signal=SIGTERM
 	fi
 
-	kill $kill_opt $PID_CONSUMERD 1> $OUTPUT_DEST 2> $ERROR_OUTPUT_DEST
+	if [ -z "$PID_CONSUMERD" ]; then
+		if [ $withtap -eq "1" ]; then
+			pass "No consumer daemon to kill"
+		fi
+		return
+	fi
+
+	diag "Killing (signal $signal) $CONSUMERD_BIN pids: $(echo $PID_CONSUMERD | tr '\n' ' ')"
+
+	kill -s $signal $PID_CONSUMERD 1> $OUTPUT_DEST 2> $ERROR_OUTPUT_DEST
 	retval=$?
 
 	if [ $? -eq 1 ]; then
+		fail=1
 		if [ $withtap -eq "1" ]; then
 			fail "Kill consumer daemon"
 		fi
-		return 1
 	else
 		out=1
 		while [ $out -ne 0 ]; do
@@ -669,10 +756,21 @@ function stop_lttng_consumerd_opt()
 					out=1
 				fi
 			done
+			if [ -n "$dtimeleft" ]; then
+				if [ $dtimeleft -lt 0 ]; then
+					out=0
+					fail=1
+				fi
+				dtimeleft=$(($dtimeleft - 1))
+			fi
 			sleep 0.5
 		done
 		if [ $withtap -eq "1" ]; then
-			pass "Kill consumer daemon"
+			if [ $fail -eq "0" ]; then
+				pass "Wait after kill consumer daemon"
+			else
+				fail "Wait after kill consumer daemon"
+			fi
 		fi
 	fi
 	return $retval
@@ -692,16 +790,12 @@ function sigstop_lttng_consumerd_opt()
 {
 	local withtap=$1
 	local signal=SIGSTOP
-	local kill_opt=""
 
 	PID_CONSUMERD=$(pgrep $CONSUMERD_MATCH)
 
-	kill_opt="$kill_opt -s $signal"
+	diag "Sending SIGSTOP to $CONSUMERD_BIN pids: $(echo $PID_CONSUMERD | tr '\n' ' ')"
 
-	if [ $withtap -eq "1" ]; then
-		diag "Sending SIGSTOP to $CONSUMERD_BIN pids: $(echo $PID_CONSUMERD | tr '\n' ' ')"
-	fi
-	kill $kill_opt $PID_CONSUMERD 1> $OUTPUT_DEST 2> $ERROR_OUTPUT_DEST
+	kill -s $signal $PID_CONSUMERD 1> $OUTPUT_DEST 2> $ERROR_OUTPUT_DEST
 	retval=$?
 
 	if [ $? -eq 1 ]; then
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH lttng-tools 2/9] Fix: tests: error handling in high throughput limits test
       [not found] <20190503135547.12968-1-mathieu.desnoyers@efficios.com>
  2019-05-03 13:55 ` [PATCH lttng-tools 1/9] Improve handling of test SIGTERM/SIGINT Mathieu Desnoyers
@ 2019-05-03 13:55 ` Mathieu Desnoyers
  2019-05-03 13:55 ` [PATCH lttng-tools 3/9] Fix: utils.sh: handle SIGPIPE Mathieu Desnoyers
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: Mathieu Desnoyers @ 2019-05-03 13:55 UTC (permalink / raw)
  To: jgalar, joraj; +Cc: lttng-dev

Each individual call to "tc" should be checked for error, else we
may fail to catch specific tc errors caused, for instance, by a
kernel configuration that only contains some of the required
class modules.

Also, invoke the utils.sh full_cleanup function from the script-specific
interrupt_cleanup trap handler rather than try to perform stopping
of relayd and sessiond within the script.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
 .../tools/streaming/test_high_throughput_limits    | 32 ++++++++++++++++++----
 1 file changed, 26 insertions(+), 6 deletions(-)

diff --git a/tests/regression/tools/streaming/test_high_throughput_limits b/tests/regression/tools/streaming/test_high_throughput_limits
index 32c3f1f2..68973c73 100755
--- a/tests/regression/tools/streaming/test_high_throughput_limits
+++ b/tests/regression/tools/streaming/test_high_throughput_limits
@@ -51,28 +51,47 @@ function set_bw_limit
 	# parent qdisc (1:) will always limit us to the right max value
 	dataportlimit=$((9*${ctrlportlimit}))
 
+	diag "Set bandwidth limits to ${limit}kbits, ${ctrlportlimit} for control and ${dataportlimit} for data"
 
 	tc qdisc add dev $DEFAULT_IF root handle 1: htb default 15 >/dev/null 2>&1
+	if [ $? -ne 0 ]; then
+		return 1
+	fi
 
 	# the total bandwidth is the limit set by the user
 	tc class add dev $DEFAULT_IF parent 1: classid 1:1 htb rate ${limit}kbit ceil ${limit}kbit >/dev/null 2>&1
+	if [ $? -ne 0 ]; then
+		return 1
+	fi
 	# 1/10 of the bandwidth guaranteed and traffic prioritized for the control port
 	tc class add dev $DEFAULT_IF parent 1:1 classid 1:10 htb rate ${ctrlportlimit}kbit ceil ${limit}kbit prio 1 >/dev/null 2>&1
+	if [ $? -ne 0 ]; then
+		return 1
+	fi
 	# 9/10 of the bandwidth guaranteed and can borrow up to the total bandwidth (if unused)
 	tc class add dev $DEFAULT_IF parent 1:1 classid 1:11 htb rate ${dataportlimit}kbit ceil ${limit}kbit prio 2 >/dev/null 2>&1
+	if [ $? -ne 0 ]; then
+		return 1
+	fi
 
 	# filter to assign control traffic to the 1:10 class
 	tc filter add dev $DEFAULT_IF parent 1: protocol ip u32 match ip dport $SESSIOND_CTRL_PORT 0xffff flowid 1:10 >/dev/null 2>&1
+	if [ $? -ne 0 ]; then
+		return 1
+	fi
 	# filter to assign data traffic to the 1:11 class
 	tc filter add dev $DEFAULT_IF parent 1: protocol ip u32 match ip dport $SESSIOND_DATA_PORT 0xffff flowid 1:11 >/dev/null 2>&1
+	if [ $? -ne 0 ]; then
+		return 1
+	fi
 
-	ok $? "Set bandwidth limits to ${limit}kbits, ${ctrlportlimit} for control and ${dataportlimit} for data"
+	return 0
 }
 
 function reset_bw_limit
 {
 	tc qdisc del dev $DEFAULT_IF root >/dev/null 2>&1
-	ok $? "Reset bandwith limits"
+	return $?
 }
 
 function create_lttng_session_with_uri
@@ -148,9 +167,9 @@ function validate_event_count
 function interrupt_cleanup()
 {
 	diag "*** Exiting ***"
-	stop_lttng_relayd
-	stop_lttng_sessiond
 	reset_bw_limit
+	# invoke utils cleanup
+	full_cleanup
 	exit 1
 }
 
@@ -168,8 +187,7 @@ skip $isroot "Root access is needed to set bandwith limits. Skipping all tests."
 {
 
 	# Catch sigint and try to cleanup limits
-	trap interrupt_cleanup SIGTERM
-	trap interrupt_cleanup SIGINT
+	trap interrupt_cleanup SIGTERM SIGINT
 
 	BW_LIMITS=(3200 1600 800 400 200 100 50 25)
 	for BW in ${BW_LIMITS[@]};
@@ -177,6 +195,7 @@ skip $isroot "Root access is needed to set bandwith limits. Skipping all tests."
 		diag "Test high-throughput with bandwidth limit set to ${BW}kbits"
 
 		set_bw_limit $BW
+		ok $? "Setting bandwidth limit"
 
 		start_lttng_sessiond
 		start_lttng_relayd "-o $TRACE_PATH"
@@ -185,5 +204,6 @@ skip $isroot "Root access is needed to set bandwith limits. Skipping all tests."
 		stop_lttng_relayd
 		stop_lttng_sessiond
 		reset_bw_limit
+		ok $? "Reset bandwith limits"
 	done
 }
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH lttng-tools 3/9] Fix: utils.sh: handle SIGPIPE
       [not found] <20190503135547.12968-1-mathieu.desnoyers@efficios.com>
  2019-05-03 13:55 ` [PATCH lttng-tools 1/9] Improve handling of test SIGTERM/SIGINT Mathieu Desnoyers
  2019-05-03 13:55 ` [PATCH lttng-tools 2/9] Fix: tests: error handling in high throughput limits test Mathieu Desnoyers
@ 2019-05-03 13:55 ` Mathieu Desnoyers
  2019-05-03 13:55 ` [PATCH lttng-tools 4/9] Fix: test: utils.sh: exit from process on full_cleanup Mathieu Desnoyers
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: Mathieu Desnoyers @ 2019-05-03 13:55 UTC (permalink / raw)
  To: jgalar, joraj; +Cc: lttng-dev

perl prove closes its child pipes before giving it a chance to execute
the signal trap handler. This means the child will not be able to
complete execution of the trap handler if that handler writes to stdout
or stderr.

Work-around this situation by redirecting stdin, stdout, and stderr
to /dev/null if a SIGPIPE is caught.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
 tests/utils/utils.sh | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/tests/utils/utils.sh b/tests/utils/utils.sh
index d273b278..b8ac88c1 100644
--- a/tests/utils/utils.sh
+++ b/tests/utils/utils.sh
@@ -68,9 +68,21 @@ function full_cleanup ()
 	trap - SIGTERM && kill -- -$$
 }
 
+function null_pipes ()
+{
+	exec 0>/dev/null
+	exec 1>/dev/null
+	exec 2>/dev/null
+}
 
 trap full_cleanup SIGINT SIGTERM
 
+# perl prove closes its child pipes before giving it a chance to run its
+# signal trap handlers. Redirect pipes to /dev/null if SIGPIPE is caught
+# to allow those trap handlers to proceed.
+
+trap null_pipes SIGPIPE
+
 function print_ok ()
 {
 	# Check if we are a terminal
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH lttng-tools 4/9] Fix: test: utils.sh: exit from process on full_cleanup
       [not found] <20190503135547.12968-1-mathieu.desnoyers@efficios.com>
                   ` (2 preceding siblings ...)
  2019-05-03 13:55 ` [PATCH lttng-tools 3/9] Fix: utils.sh: handle SIGPIPE Mathieu Desnoyers
@ 2019-05-03 13:55 ` Mathieu Desnoyers
  2019-05-03 13:55 ` [PATCH lttng-tools 5/9] Cleanup: test: don't stop relayd twice Mathieu Desnoyers
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: Mathieu Desnoyers @ 2019-05-03 13:55 UTC (permalink / raw)
  To: jgalar, joraj; +Cc: lttng-dev

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
 tests/utils/utils.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/utils/utils.sh b/tests/utils/utils.sh
index b8ac88c1..42e78d25 100644
--- a/tests/utils/utils.sh
+++ b/tests/utils/utils.sh
@@ -66,6 +66,7 @@ function full_cleanup ()
 	# The '-' before the pid number ($$) indicates 'kill' to signal the
 	# whole process group.
 	trap - SIGTERM && kill -- -$$
+	exit 1
 }
 
 function null_pipes ()
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH lttng-tools 5/9] Cleanup: test: don't stop relayd twice
       [not found] <20190503135547.12968-1-mathieu.desnoyers@efficios.com>
                   ` (3 preceding siblings ...)
  2019-05-03 13:55 ` [PATCH lttng-tools 4/9] Fix: test: utils.sh: exit from process on full_cleanup Mathieu Desnoyers
@ 2019-05-03 13:55 ` Mathieu Desnoyers
  2019-05-03 13:55 ` [PATCH lttng-tools 6/9] tests: invoke full_cleanup from script trap handlers, use modprobe -r Mathieu Desnoyers
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: Mathieu Desnoyers @ 2019-05-03 13:55 UTC (permalink / raw)
  To: jgalar, joraj; +Cc: lttng-dev

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
 tests/regression/tools/live/test_lttng_ust | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tests/regression/tools/live/test_lttng_ust b/tests/regression/tools/live/test_lttng_ust
index 06017d01..830fc783 100755
--- a/tests/regression/tools/live/test_lttng_ust
+++ b/tests/regression/tools/live/test_lttng_ust
@@ -34,7 +34,7 @@ TRACE_PATH=$(mktemp -d)
 
 DIR=$(readlink -f $TESTDIR)
 
-NUM_TESTS=12
+NUM_TESTS=11
 
 source $TESTDIR/utils/utils.sh
 
@@ -84,5 +84,4 @@ stop_lttng_relayd
 
 test_custom_url
 
-stop_lttng_relayd
 stop_lttng_sessiond
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH lttng-tools 6/9] tests: invoke full_cleanup from script trap handlers, use modprobe -r
       [not found] <20190503135547.12968-1-mathieu.desnoyers@efficios.com>
                   ` (4 preceding siblings ...)
  2019-05-03 13:55 ` [PATCH lttng-tools 5/9] Cleanup: test: don't stop relayd twice Mathieu Desnoyers
@ 2019-05-03 13:55 ` Mathieu Desnoyers
  2019-05-03 13:55 ` [PATCH lttng-tools 7/9] epoll/poll compat: expose interruptible API Mathieu Desnoyers
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: Mathieu Desnoyers @ 2019-05-03 13:55 UTC (permalink / raw)
  To: jgalar, joraj; +Cc: lttng-dev

Scripts implementing their own trap handlers override the generic
one provided by utils.sh (full_cleanup). Invoke it at the end of
the handlers to provide the utils cleanup as well.

Moreover, change use of "rmmod" to "modprobe -r", which is better
in trap handlers because it does not print errors if the module
was not loaded yet when the signal occurs.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
 tests/regression/kernel/test_clock_override                    | 10 +++-------
 tests/regression/kernel/test_rotation_destroy_flush            |  7 +++----
 tests/regression/tools/crash/test_crash                        |  3 +--
 tests/regression/tools/notification/test_notification_kernel   |  2 +-
 .../regression/tools/notification/test_notification_multi_app  |  2 +-
 tests/regression/tools/notification/test_notification_ust      |  2 +-
 tests/regression/tools/streaming/test_high_throughput_limits   |  1 -
 .../ust/rotation-destroy-flush/test_rotation_destroy_flush     |  3 +--
 tests/stress/test_multi_sessions_per_uid_10app                 |  5 ++---
 tests/stress/test_multi_sessions_per_uid_5app_streaming        |  5 ++---
 .../test_multi_sessions_per_uid_5app_streaming_kill_relayd     |  5 ++---
 11 files changed, 17 insertions(+), 28 deletions(-)

diff --git a/tests/regression/kernel/test_clock_override b/tests/regression/kernel/test_clock_override
index e19b77e1..1fbba771 100755
--- a/tests/regression/kernel/test_clock_override
+++ b/tests/regression/kernel/test_clock_override
@@ -49,11 +49,9 @@ source $TESTDIR/utils/utils.sh
 function signal_cleanup()
 {
 	diag "*** Exiting ***"
-	rmmod lttng-test
 	stop_lttng_sessiond
-	rmmod lttng-clock-plugin-test
-	rmmod lttng-clock
-	exit 1
+	modprobe -r lttng-test lttng-clock-plugin-test lttng-clock
+	full_cleanup
 }
 
 function extract_clock_metadata()
@@ -93,10 +91,8 @@ function test_clock_override_metadata()
 	stop_lttng_tracing_ok $SESSION_NAME
 	destroy_lttng_session_ok $SESSION_NAME
 
-	rmmod lttng-test
 	stop_lttng_sessiond
-	rmmod lttng-clock-plugin-test
-	rmmod lttng-clock
+	modprobe -r lttng-test lttng-clock-plugin-test lttng-clock
 
 	local TRACE_METADATA_FILE_PATH="$(find "$TRACE_PATH" -name metadata -type f)"
 	local TRACE_METADATA_DIR="$(dirname "$TRACE_METADATA_FILE_PATH")"
diff --git a/tests/regression/kernel/test_rotation_destroy_flush b/tests/regression/kernel/test_rotation_destroy_flush
index 0b0b0ca7..03933a3a 100755
--- a/tests/regression/kernel/test_rotation_destroy_flush
+++ b/tests/regression/kernel/test_rotation_destroy_flush
@@ -39,9 +39,8 @@ source $TESTDIR/utils/utils.sh
 function signal_cleanup()
 {
 	diag "*** Exiting ***"
-	rmmod lttng-test
-	stop_lttng_sessiond
-	exit 1
+	modprobe -r lttng-test
+	full_cleanup
 }
 
 function enable_kernel_lttng_channel_size_limit ()
@@ -107,7 +106,7 @@ function test_rotation_destroy_flush_single()
 
 	rm -rf $TRACE_PATH
 
-	rmmod lttng-test
+	modprobe -r lttng-test
 	stop_lttng_sessiond
 }
 
diff --git a/tests/regression/tools/crash/test_crash b/tests/regression/tools/crash/test_crash
index 13909c1b..5bad16e5 100755
--- a/tests/regression/tools/crash/test_crash
+++ b/tests/regression/tools/crash/test_crash
@@ -392,8 +392,7 @@ function interrupt_cleanup()
 {
     diag "*** Cleaning-up test ***"
     stop_test_apps
-    stop_lttng_sessiond
-    exit 1
+    full_cleanup
 }
 
 TESTS=(
diff --git a/tests/regression/tools/notification/test_notification_kernel b/tests/regression/tools/notification/test_notification_kernel
index e7368df2..cc6fc581 100755
--- a/tests/regression/tools/notification/test_notification_kernel
+++ b/tests/regression/tools/notification/test_notification_kernel
@@ -56,7 +56,7 @@ function kernel_event_generator
 	state_file=$1
 	kernel_event_generator_suspended=0
 	trap kernel_event_generator_toogle_state SIGUSR1
-	trap "exit" SIGTERM SIGINT EXIT
+
 	while (true); do
 		if [[ $kernel_event_generator_suspended -eq "1" ]]; then
 			touch $state_file
diff --git a/tests/regression/tools/notification/test_notification_multi_app b/tests/regression/tools/notification/test_notification_multi_app
index 7465a83f..51d94e4f 100755
--- a/tests/regression/tools/notification/test_notification_multi_app
+++ b/tests/regression/tools/notification/test_notification_multi_app
@@ -64,7 +64,7 @@ function kernel_event_generator
 	state_file=$1
 	kernel_event_generator_suspended=0
 	trap kernel_event_generator_toogle_state SIGUSR1
-	trap "exit" SIGTERM SIGINT
+
 	while (true); do
 		if [[ $kernel_event_generator_suspended -eq "1" ]]; then
 			touch $state_file
diff --git a/tests/regression/tools/notification/test_notification_ust b/tests/regression/tools/notification/test_notification_ust
index 8941e476..82f79a8e 100755
--- a/tests/regression/tools/notification/test_notification_ust
+++ b/tests/regression/tools/notification/test_notification_ust
@@ -56,7 +56,7 @@ function ust_event_generator
 	state_file=$1
 	ust_event_generator_suspended=0
 	trap ust_event_generator_toogle_state SIGUSR1
-	trap "exit" SIGTERM SIGINT
+
 	while (true); do
 		if [[ $ust_event_generator_suspended -eq "1" ]]; then
 			touch $state_file
diff --git a/tests/regression/tools/streaming/test_high_throughput_limits b/tests/regression/tools/streaming/test_high_throughput_limits
index 68973c73..c49c13c2 100755
--- a/tests/regression/tools/streaming/test_high_throughput_limits
+++ b/tests/regression/tools/streaming/test_high_throughput_limits
@@ -170,7 +170,6 @@ function interrupt_cleanup()
 	reset_bw_limit
 	# invoke utils cleanup
 	full_cleanup
-	exit 1
 }
 
 plan_tests $NUM_TESTS
diff --git a/tests/regression/ust/rotation-destroy-flush/test_rotation_destroy_flush b/tests/regression/ust/rotation-destroy-flush/test_rotation_destroy_flush
index a7a93771..e404564e 100755
--- a/tests/regression/ust/rotation-destroy-flush/test_rotation_destroy_flush
+++ b/tests/regression/ust/rotation-destroy-flush/test_rotation_destroy_flush
@@ -48,8 +48,7 @@ function run_app()
 function signal_cleanup()
 {
 	diag "*** Exiting ***"
-	stop_lttng_sessiond
-	exit 1
+	full_cleanup
 }
 
 function enable_ust_lttng_channel_size_limit ()
diff --git a/tests/stress/test_multi_sessions_per_uid_10app b/tests/stress/test_multi_sessions_per_uid_10app
index 82e8ad50..c9f8403e 100755
--- a/tests/stress/test_multi_sessions_per_uid_10app
+++ b/tests/stress/test_multi_sessions_per_uid_10app
@@ -112,11 +112,10 @@ function sighandler()
 {
 	cleanup
 	rm $LOG_FILE
-	exit 1
+	full_cleanup
 }
 
-trap sighandler SIGINT
-trap sighandler SIGTERM
+trap sighandler SIGINT SIGTERM
 
 # Make sure we collect a coredump if possible.
 ulimit -c unlimited
diff --git a/tests/stress/test_multi_sessions_per_uid_5app_streaming b/tests/stress/test_multi_sessions_per_uid_5app_streaming
index ed989498..4203ac30 100755
--- a/tests/stress/test_multi_sessions_per_uid_5app_streaming
+++ b/tests/stress/test_multi_sessions_per_uid_5app_streaming
@@ -142,11 +142,10 @@ function sighandler()
 {
 	cleanup
 	rm $LOG_FILE_SESSIOND $LOG_FILE_RELAYD
-	exit 1
+	full_cleanup
 }
 
-trap sighandler SIGINT
-trap sighandler SIGTERM
+trap sighandler SIGINT SIGTERM
 
 # Make sure we collect a coredump if possible.
 ulimit -c unlimited
diff --git a/tests/stress/test_multi_sessions_per_uid_5app_streaming_kill_relayd b/tests/stress/test_multi_sessions_per_uid_5app_streaming_kill_relayd
index c699ac22..d0121e32 100755
--- a/tests/stress/test_multi_sessions_per_uid_5app_streaming_kill_relayd
+++ b/tests/stress/test_multi_sessions_per_uid_5app_streaming_kill_relayd
@@ -144,11 +144,10 @@ function sighandler()
 {
 	cleanup
 	#rm $LOG_FILE_SESSIOND $LOG_FILE_RELAYD
-	exit 1
+	full_cleanup
 }
 
-trap sighandler SIGINT
-trap sighandler SIGTERM
+trap sighandler SIGINT SIGTERM
 
 # Make sure we collect a coredump if possible.
 ulimit -c unlimited
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH lttng-tools 7/9] epoll/poll compat: expose interruptible API
       [not found] <20190503135547.12968-1-mathieu.desnoyers@efficios.com>
                   ` (5 preceding siblings ...)
  2019-05-03 13:55 ` [PATCH lttng-tools 6/9] tests: invoke full_cleanup from script trap handlers, use modprobe -r Mathieu Desnoyers
@ 2019-05-03 13:55 ` Mathieu Desnoyers
  2019-05-03 13:55 ` [PATCH lttng-tools 8/9] lttng-ctl: notifications: use epoll()/poll() instead of select() Mathieu Desnoyers
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: Mathieu Desnoyers @ 2019-05-03 13:55 UTC (permalink / raw)
  To: jgalar, joraj; +Cc: lttng-dev

Some use of the epoll/poll wrapper require interruption
by signals to make the poll call return -1, errno EINTR.
Expose a new lttng_poll_wait_interruptible API for this
purpose.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Yannick Lamarre <ylamarre@efficios.com>
---
 src/common/compat/compat-epoll.c |  9 +++++----
 src/common/compat/compat-poll.c  |  9 +++++----
 src/common/compat/poll.h         | 12 ++++++++----
 3 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/src/common/compat/compat-epoll.c b/src/common/compat/compat-epoll.c
index 6a781c7a..7108b717 100644
--- a/src/common/compat/compat-epoll.c
+++ b/src/common/compat/compat-epoll.c
@@ -241,7 +241,7 @@ error:
 /*
  * Wait on epoll set. This is a blocking call of timeout value.
  */
-int compat_epoll_wait(struct lttng_poll_event *events, int timeout)
+int compat_epoll_wait(struct lttng_poll_event *events, int timeout, int interruptible)
 {
 	int ret;
 	uint32_t new_size;
@@ -273,10 +273,11 @@ int compat_epoll_wait(struct lttng_poll_event *events, int timeout)
 
 	do {
 		ret = epoll_wait(events->epfd, events->events, events->nb_fd, timeout);
-	} while (ret == -1 && errno == EINTR);
+	} while (!interruptible && ret == -1 && errno == EINTR);
 	if (ret < 0) {
-		/* At this point, every error is fatal */
-		PERROR("epoll_wait");
+		if (errno != EINTR) {
+			PERROR("epoll_wait");
+		}
 		goto error;
 	}
 
diff --git a/src/common/compat/compat-poll.c b/src/common/compat/compat-poll.c
index b45b39dc..cdb6f8b5 100644
--- a/src/common/compat/compat-poll.c
+++ b/src/common/compat/compat-poll.c
@@ -281,7 +281,7 @@ error:
 /*
  * Wait on poll() with timeout. Blocking call.
  */
-int compat_poll_wait(struct lttng_poll_event *events, int timeout)
+int compat_poll_wait(struct lttng_poll_event *events, int timeout, int interruptible)
 {
 	int ret;
 
@@ -308,10 +308,11 @@ int compat_poll_wait(struct lttng_poll_event *events, int timeout)
 
 	do {
 		ret = poll(events->wait.events, events->wait.nb_fd, timeout);
-	} while (ret == -1 && errno == EINTR);
+	} while (!interruptible && ret == -1 && errno == EINTR);
 	if (ret < 0) {
-		/* At this point, every error is fatal */
-		PERROR("poll wait");
+		if (errno != EINTR) {
+			PERROR("poll wait");
+		}
 		goto error;
 	}
 
diff --git a/src/common/compat/poll.h b/src/common/compat/poll.h
index d4bd87f5..5400e5b1 100644
--- a/src/common/compat/poll.h
+++ b/src/common/compat/poll.h
@@ -152,9 +152,11 @@ static inline int compat_glibc_epoll_create(int size, int flags)
  * Wait on epoll set with the number of fd registered to the lttng_poll_event
  * data structure (events).
  */
-extern int compat_epoll_wait(struct lttng_poll_event *events, int timeout);
+extern int compat_epoll_wait(struct lttng_poll_event *events, int timeout, int interruptible);
 #define lttng_poll_wait(events, timeout) \
-	compat_epoll_wait(events, timeout)
+	compat_epoll_wait(events, timeout, 0)
+#define lttng_poll_wait_interruptible(events, timeout) \
+	compat_epoll_wait(events, timeout, 1)
 
 /*
  * Add a fd to the epoll set and resize the epoll_event structure if needed.
@@ -334,9 +336,11 @@ extern int compat_poll_create(struct lttng_poll_event *events, int size);
  * Wait on poll(2) event with nb_fd registered to the lttng_poll_event data
  * structure.
  */
-extern int compat_poll_wait(struct lttng_poll_event *events, int timeout);
+extern int compat_poll_wait(struct lttng_poll_event *events, int timeout, int interruptible);
 #define lttng_poll_wait(events, timeout) \
-	compat_poll_wait(events, timeout)
+	compat_poll_wait(events, timeout, 0)
+#define lttng_poll_wait_interruptible(events, timeout) \
+	compat_poll_wait(events, timeout, 1)
 
 /*
  * Add the fd to the pollfd structure. Resize if needed.
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH lttng-tools 8/9] lttng-ctl: notifications: use epoll()/poll() instead of select()
       [not found] <20190503135547.12968-1-mathieu.desnoyers@efficios.com>
                   ` (6 preceding siblings ...)
  2019-05-03 13:55 ` [PATCH lttng-tools 7/9] epoll/poll compat: expose interruptible API Mathieu Desnoyers
@ 2019-05-03 13:55 ` Mathieu Desnoyers
  2019-05-03 13:55 ` [PATCH lttng-tools 9/9] sessiond: " Mathieu Desnoyers
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: Mathieu Desnoyers @ 2019-05-03 13:55 UTC (permalink / raw)
  To: jgalar, joraj; +Cc: lttng-dev

The select(2) system call is an ancient ABI limited to processes
containing at most FD_SETSIZE file descriptors overall (typically
1024).

Those notification APIs will fail if the target file descriptor
is above FD_SETSIZE in a process containing many file descriptors.

Never use select, use the lttng epoll/poll wrapper instead.

This patch depends on "Change lttng_poll_wait behaviour of compat-poll
to match compat-epoll" posted by Yannick Lamarre.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Yannick Lamarre <ylamarre@efficios.com>
---
 src/lib/lttng-ctl/channel.c | 76 +++++++++++++++++++++++++++------------------
 1 file changed, 45 insertions(+), 31 deletions(-)

diff --git a/src/lib/lttng-ctl/channel.c b/src/lib/lttng-ctl/channel.c
index 5271aa13..bcecc65f 100644
--- a/src/lib/lttng-ctl/channel.c
+++ b/src/lib/lttng-ctl/channel.c
@@ -26,8 +26,7 @@
 #include <common/defaults.h>
 #include <assert.h>
 #include "lttng-ctl-helper.h"
-#include <sys/select.h>
-#include <sys/time.h>
+#include <common/compat/poll.h>
 
 static
 int handshake(struct lttng_notification_channel *channel);
@@ -211,7 +210,7 @@ lttng_notification_channel_get_next_notification(
 	struct lttng_notification *notification = NULL;
 	enum lttng_notification_channel_status status =
 			LTTNG_NOTIFICATION_CHANNEL_STATUS_OK;
-	fd_set read_fds;
+	struct lttng_poll_event events;
 
 	if (!channel || !_notification) {
 		status = LTTNG_NOTIFICATION_CHANNEL_STATUS_INVALID;
@@ -241,9 +240,9 @@ lttng_notification_channel_get_next_notification(
 	}
 
 	/*
-	 * Block on select() instead of the message reception itself as the
-	 * recvmsg() wrappers always restard on EINTR. We choose to wait
-	 * using select() in order to:
+	 * Block on interruptible epoll/poll() instead of the message reception
+	 * itself as the recvmsg() wrappers always restart on EINTR. We choose
+	 * to wait using interruptible epoll/poll() in order to:
 	 *   1) Return if a signal occurs,
 	 *   2) Not deal with partially received messages.
 	 *
@@ -252,20 +251,28 @@ lttng_notification_channel_get_next_notification(
 	 * announced length, receive_message() will block on recvmsg()
 	 * and never return (even if a signal is received).
 	 */
-	FD_ZERO(&read_fds);
-	FD_SET(channel->socket, &read_fds);
-	ret = select(channel->socket + 1, &read_fds, NULL, NULL, NULL);
-	if (ret == -1) {
-		status = errno == EINTR ?
+	ret = lttng_poll_create(&events, 1, LTTNG_CLOEXEC);
+	if (ret < 0) {
+		status = LTTNG_NOTIFICATION_CHANNEL_STATUS_ERROR;
+		goto end_unlock;
+	}
+	ret = lttng_poll_add(&events, channel->socket, LPOLLIN | LPOLLERR);
+	if (ret < 0) {
+		status = LTTNG_NOTIFICATION_CHANNEL_STATUS_ERROR;
+		goto end_clean_poll;
+	}
+	ret = lttng_poll_wait_interruptible(&events, -1);
+	if (ret <= 0) {
+		status = (ret == -1 && errno == EINTR) ?
 			LTTNG_NOTIFICATION_CHANNEL_STATUS_INTERRUPTED :
 			LTTNG_NOTIFICATION_CHANNEL_STATUS_ERROR;
-		goto end_unlock;
+		goto end_clean_poll;
 	}
 
 	ret = receive_message(channel);
 	if (ret) {
 		status = LTTNG_NOTIFICATION_CHANNEL_STATUS_ERROR;
-		goto end_unlock;
+		goto end_clean_poll;
 	}
 
 	switch (get_current_message_type(channel)) {
@@ -274,7 +281,7 @@ lttng_notification_channel_get_next_notification(
 				channel);
 		if (!notification) {
 			status = LTTNG_NOTIFICATION_CHANNEL_STATUS_ERROR;
-			goto end_unlock;
+			goto end_clean_poll;
 		}
 		break;
 	case LTTNG_NOTIFICATION_CHANNEL_MESSAGE_TYPE_NOTIFICATION_DROPPED:
@@ -284,9 +291,11 @@ lttng_notification_channel_get_next_notification(
 	default:
 		/* Protocol error. */
 		status = LTTNG_NOTIFICATION_CHANNEL_STATUS_ERROR;
-		goto end_unlock;
+		goto end_clean_poll;
 	}
 
+end_clean_poll:
+	lttng_poll_clean(&events);
 end_unlock:
 	pthread_mutex_unlock(&channel->lock);
 	*_notification = notification;
@@ -387,11 +396,7 @@ lttng_notification_channel_has_pending_notification(
 	int ret;
 	enum lttng_notification_channel_status status =
 			LTTNG_NOTIFICATION_CHANNEL_STATUS_OK;
-	fd_set read_fds;
-	struct timeval timeout;
-
-	FD_ZERO(&read_fds);
-	memset(&timeout, 0, sizeof(timeout));
+	struct lttng_poll_event events;
 
 	if (!channel || !_notification_pending) {
 		status = LTTNG_NOTIFICATION_CHANNEL_STATUS_INVALID;
@@ -426,48 +431,57 @@ lttng_notification_channel_has_pending_notification(
 	 * message if we see data available on the socket. If the peer does
 	 * not respect the protocol, this may block indefinitely.
 	 */
-	FD_SET(channel->socket, &read_fds);
-	do {
-		ret = select(channel->socket + 1, &read_fds, NULL, NULL, &timeout);
-	} while (ret < 0 && errno == EINTR);
-
+	ret = lttng_poll_create(&events, 1, LTTNG_CLOEXEC);
+	if (ret < 0) {
+		status = LTTNG_NOTIFICATION_CHANNEL_STATUS_ERROR;
+		goto end_unlock;
+	}
+	ret = lttng_poll_add(&events, channel->socket, LPOLLIN | LPOLLERR);
+	if (ret < 0) {
+		status = LTTNG_NOTIFICATION_CHANNEL_STATUS_ERROR;
+		goto end_clean_poll;
+	}
+	/* timeout = 0: return immediately. */
+	ret = lttng_poll_wait_interruptible(&events, 0);
 	if (ret == 0) {
 		/* No data available. */
 		*_notification_pending = false;
-		goto end_unlock;
+		goto end_clean_poll;
 	} else if (ret < 0) {
 		status = LTTNG_NOTIFICATION_CHANNEL_STATUS_ERROR;
-		goto end_unlock;
+		goto end_clean_poll;
 	}
 
 	/* Data available on socket. */
 	ret = receive_message(channel);
 	if (ret) {
 		status = LTTNG_NOTIFICATION_CHANNEL_STATUS_ERROR;
-		goto end_unlock;
+		goto end_clean_poll;
 	}
 
 	switch (get_current_message_type(channel)) {
 	case LTTNG_NOTIFICATION_CHANNEL_MESSAGE_TYPE_NOTIFICATION:
 		ret = enqueue_notification_from_current_message(channel);
 		if (ret) {
-			goto end_unlock;
+			goto end_clean_poll;
 		}
 		*_notification_pending = true;
 		break;
 	case LTTNG_NOTIFICATION_CHANNEL_MESSAGE_TYPE_NOTIFICATION_DROPPED:
 		ret = enqueue_dropped_notification(channel);
 		if (ret) {
-			goto end_unlock;
+			goto end_clean_poll;
 		}
 		*_notification_pending = true;
 		break;
 	default:
 		/* Protocol error. */
 		status = LTTNG_NOTIFICATION_CHANNEL_STATUS_ERROR;
-		goto end_unlock;
+		goto end_clean_poll;
 	}
 
+end_clean_poll:
+	lttng_poll_clean(&events);
 end_unlock:
 	pthread_mutex_unlock(&channel->lock);
 end:
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH lttng-tools 9/9] sessiond: use epoll()/poll() instead of select()
       [not found] <20190503135547.12968-1-mathieu.desnoyers@efficios.com>
                   ` (7 preceding siblings ...)
  2019-05-03 13:55 ` [PATCH lttng-tools 8/9] lttng-ctl: notifications: use epoll()/poll() instead of select() Mathieu Desnoyers
@ 2019-05-03 13:55 ` Mathieu Desnoyers
       [not found] ` <20190503135547.12968-8-mathieu.desnoyers@efficios.com>
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: Mathieu Desnoyers @ 2019-05-03 13:55 UTC (permalink / raw)
  To: jgalar, joraj; +Cc: lttng-dev

The select(2) system call is an ancient ABI limited to processes
containing at most FD_SETSIZE file descriptors overall (typically
1024).

This select call will fail if the target file descriptor is above
FD_SETSIZE in a session daemon containing many file descriptors.
This is unlikely to happen in normal use given than
sessiond_init_thread_quit_pipe() is called early by main(). Odd
scenarios could trigger this, for instance if the parent process leaves
a large number of file descriptors open, or if a library which
allocates file descriptors is LD_PRELOADed with the sessiond.

Never use select, use the lttng epoll/poll wrapper instead.

This patch depends on "Change lttng_poll_wait behaviour of compat-poll
to match compat-epoll" posted by Yannick Lamarre.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Yannick Lamarre <ylamarre@efficios.com>
---
 src/bin/lttng-sessiond/lttng-sessiond.h |  2 +-
 src/bin/lttng-sessiond/main.c           |  2 +-
 src/bin/lttng-sessiond/thread-utils.c   | 39 ++++++++++++++-------------------
 3 files changed, 19 insertions(+), 24 deletions(-)

diff --git a/src/bin/lttng-sessiond/lttng-sessiond.h b/src/bin/lttng-sessiond/lttng-sessiond.h
index d321fced..6810e896 100644
--- a/src/bin/lttng-sessiond/lttng-sessiond.h
+++ b/src/bin/lttng-sessiond/lttng-sessiond.h
@@ -161,7 +161,7 @@ extern struct consumer_data kconsumer_data;
 
 int sessiond_init_thread_quit_pipe(void);
 int sessiond_check_thread_quit_pipe(int fd, uint32_t events);
-int sessiond_wait_for_quit_pipe(unsigned int timeout_us);
+int sessiond_wait_for_quit_pipe(int timeout_ms);
 int sessiond_notify_quit_pipe(void);
 void sessiond_close_quit_pipe(void);
 
diff --git a/src/bin/lttng-sessiond/main.c b/src/bin/lttng-sessiond/main.c
index 57324820..a6719a4d 100644
--- a/src/bin/lttng-sessiond/main.c
+++ b/src/bin/lttng-sessiond/main.c
@@ -1833,7 +1833,7 @@ int main(int argc, char **argv)
 	 */
 
 	/* Initiate teardown once activity occurs on the quit pipe. */
-	sessiond_wait_for_quit_pipe(-1U);
+	sessiond_wait_for_quit_pipe(-1);
 
 stop_threads:
 	/*
diff --git a/src/bin/lttng-sessiond/thread-utils.c b/src/bin/lttng-sessiond/thread-utils.c
index 16ae9d69..e1c02290 100644
--- a/src/bin/lttng-sessiond/thread-utils.c
+++ b/src/bin/lttng-sessiond/thread-utils.c
@@ -73,41 +73,36 @@ int sessiond_check_thread_quit_pipe(int fd, uint32_t events)
  * Returns 1 if the caller should quit, 0 if the timeout was reached, and
  * -1 if an error was encountered.
  */
-int sessiond_wait_for_quit_pipe(unsigned int timeout_us)
+int sessiond_wait_for_quit_pipe(int timeout_ms)
 {
 	int ret;
-	fd_set read_fds;
-	struct timeval timeout;
-
-	FD_ZERO(&read_fds);
-	FD_SET(thread_quit_pipe[0], &read_fds);
-	memset(&timeout, 0, sizeof(timeout));
-	timeout.tv_sec = timeout_us / USEC_PER_SEC;
-	timeout.tv_usec = timeout_us % USEC_PER_SEC;
-
-	while (true) {
-		ret = select(thread_quit_pipe[0] + 1, &read_fds, NULL, NULL,
-				timeout_us != -1U ? &timeout : NULL);
-		if (ret < 0 && errno == EINTR) {
-			/* Retry on interrupt. */
-			continue;
-		} else {
-			break;
-		}
-	}
+	struct lttng_poll_event events;
 
+	ret = lttng_poll_create(&events, 1, LTTNG_CLOEXEC);
+	if (ret < 0) {
+		PERROR("Failure in lttng_poll_create");
+		return -1;
+	}
+	ret = lttng_poll_add(&events, thread_quit_pipe[0], LPOLLIN | LPOLLERR);
+	if (ret < 0) {
+		PERROR("Failure in lttng_poll_add");
+		ret = -1;
+		goto end;
+	}
+	ret = lttng_poll_wait(&events, timeout_ms);
 	if (ret > 0) {
 		/* Should quit. */
 		ret = 1;
 	} else if (ret < 0 && errno != EINTR) {
 		/* Unknown error. */
-		PERROR("Failed to select() thread quit pipe");
+		PERROR("Failed to epoll()/poll() thread quit pipe");
 		ret = -1;
 	} else {
 		/* Timeout reached. */
 		ret = 0;
 	}
-
+end:
+	lttng_poll_clean(&events);
 	return ret;
 }
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH lttng-tools 7/9] epoll/poll compat: expose interruptible API
       [not found] ` <20190503135547.12968-8-mathieu.desnoyers@efficios.com>
@ 2019-05-15 15:18   ` Jonathan Rajotte-Julien
  0 siblings, 0 replies; 17+ messages in thread
From: Jonathan Rajotte-Julien @ 2019-05-15 15:18 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: lttng-dev, jgalar, joraj

Hi,

This patch does not apply.

Please rebase.

Cheers

On Fri, May 03, 2019 at 09:55:45AM -0400, Mathieu Desnoyers wrote:
> Some use of the epoll/poll wrapper require interruption
> by signals to make the poll call return -1, errno EINTR.
> Expose a new lttng_poll_wait_interruptible API for this
> purpose.
> 
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> CC: Yannick Lamarre <ylamarre@efficios.com>
> ---
>  src/common/compat/compat-epoll.c |  9 +++++----
>  src/common/compat/compat-poll.c  |  9 +++++----
>  src/common/compat/poll.h         | 12 ++++++++----
>  3 files changed, 18 insertions(+), 12 deletions(-)
> 
> diff --git a/src/common/compat/compat-epoll.c b/src/common/compat/compat-epoll.c
> index 6a781c7a..7108b717 100644
> --- a/src/common/compat/compat-epoll.c
> +++ b/src/common/compat/compat-epoll.c
> @@ -241,7 +241,7 @@ error:
>  /*
>   * Wait on epoll set. This is a blocking call of timeout value.
>   */
> -int compat_epoll_wait(struct lttng_poll_event *events, int timeout)
> +int compat_epoll_wait(struct lttng_poll_event *events, int timeout, int interruptible)
>  {
>  	int ret;
>  	uint32_t new_size;
> @@ -273,10 +273,11 @@ int compat_epoll_wait(struct lttng_poll_event *events, int timeout)
>  
>  	do {
>  		ret = epoll_wait(events->epfd, events->events, events->nb_fd, timeout);
> -	} while (ret == -1 && errno == EINTR);
> +	} while (!interruptible && ret == -1 && errno == EINTR);
>  	if (ret < 0) {
> -		/* At this point, every error is fatal */
> -		PERROR("epoll_wait");
> +		if (errno != EINTR) {
> +			PERROR("epoll_wait");
> +		}
>  		goto error;
>  	}
>  
> diff --git a/src/common/compat/compat-poll.c b/src/common/compat/compat-poll.c
> index b45b39dc..cdb6f8b5 100644
> --- a/src/common/compat/compat-poll.c
> +++ b/src/common/compat/compat-poll.c
> @@ -281,7 +281,7 @@ error:
>  /*
>   * Wait on poll() with timeout. Blocking call.
>   */
> -int compat_poll_wait(struct lttng_poll_event *events, int timeout)
> +int compat_poll_wait(struct lttng_poll_event *events, int timeout, int interruptible)
>  {
>  	int ret;
>  
> @@ -308,10 +308,11 @@ int compat_poll_wait(struct lttng_poll_event *events, int timeout)
>  
>  	do {
>  		ret = poll(events->wait.events, events->wait.nb_fd, timeout);
> -	} while (ret == -1 && errno == EINTR);
> +	} while (!interruptible && ret == -1 && errno == EINTR);
>  	if (ret < 0) {
> -		/* At this point, every error is fatal */
> -		PERROR("poll wait");
> +		if (errno != EINTR) {
> +			PERROR("poll wait");
> +		}
>  		goto error;
>  	}
>  
> diff --git a/src/common/compat/poll.h b/src/common/compat/poll.h
> index d4bd87f5..5400e5b1 100644
> --- a/src/common/compat/poll.h
> +++ b/src/common/compat/poll.h
> @@ -152,9 +152,11 @@ static inline int compat_glibc_epoll_create(int size, int flags)
>   * Wait on epoll set with the number of fd registered to the lttng_poll_event
>   * data structure (events).
>   */
> -extern int compat_epoll_wait(struct lttng_poll_event *events, int timeout);
> +extern int compat_epoll_wait(struct lttng_poll_event *events, int timeout, int interruptible);
>  #define lttng_poll_wait(events, timeout) \
> -	compat_epoll_wait(events, timeout)
> +	compat_epoll_wait(events, timeout, 0)
> +#define lttng_poll_wait_interruptible(events, timeout) \
> +	compat_epoll_wait(events, timeout, 1)
>  
>  /*
>   * Add a fd to the epoll set and resize the epoll_event structure if needed.
> @@ -334,9 +336,11 @@ extern int compat_poll_create(struct lttng_poll_event *events, int size);
>   * Wait on poll(2) event with nb_fd registered to the lttng_poll_event data
>   * structure.
>   */
> -extern int compat_poll_wait(struct lttng_poll_event *events, int timeout);
> +extern int compat_poll_wait(struct lttng_poll_event *events, int timeout, int interruptible);
>  #define lttng_poll_wait(events, timeout) \
> -	compat_poll_wait(events, timeout)
> +	compat_poll_wait(events, timeout, 0)
> +#define lttng_poll_wait_interruptible(events, timeout) \
> +	compat_poll_wait(events, timeout, 1)
>  
>  /*
>   * Add the fd to the pollfd structure. Resize if needed.
> -- 
> 2.11.0
> 

-- 
Jonathan Rajotte-Julien
EfficiOS

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH lttng-tools 8/9] lttng-ctl: notifications: use epoll()/poll() instead of select()
       [not found] ` <20190503135547.12968-9-mathieu.desnoyers@efficios.com>
@ 2019-05-15 15:20   ` Jonathan Rajotte-Julien
  0 siblings, 0 replies; 17+ messages in thread
From: Jonathan Rajotte-Julien @ 2019-05-15 15:20 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: lttng-dev, jgalar, joraj

On Fri, May 03, 2019 at 09:55:46AM -0400, Mathieu Desnoyers wrote:
> The select(2) system call is an ancient ABI limited to processes
> containing at most FD_SETSIZE file descriptors overall (typically
> 1024).
> 
> Those notification APIs will fail if the target file descriptor
> is above FD_SETSIZE in a process containing many file descriptors.
> 
> Never use select, use the lttng epoll/poll wrapper instead.
> 
> This patch depends on "Change lttng_poll_wait behaviour of compat-poll
> to match compat-epoll" posted by Yannick Lamarre.

Please split the current patchset in two. It will accelerate the acceptation of
the patches not depending on this work.

This is valid for 7,8,9 if I'm not mistaken.

Cheers

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH lttng-tools 1/9] Improve handling of test SIGTERM/SIGINT
       [not found] ` <20190503135547.12968-2-mathieu.desnoyers@efficios.com>
@ 2019-05-15 16:19   ` Jonathan Rajotte-Julien
       [not found]   ` <20190515161917.GC14986@joraj-alpa>
  1 sibling, 0 replies; 17+ messages in thread
From: Jonathan Rajotte-Julien @ 2019-05-15 16:19 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: lttng-dev, jgalar, joraj

Hi,

Please run shellcheck on this and fix only the issue introduced.

This is valid for all patches of this series.

Most comments for stop_*_opt functions applies to the others since the changes
are similar.

On Fri, May 03, 2019 at 09:55:39AM -0400, Mathieu Desnoyers wrote:
> The current state of signal handling for test scripts is: on
> SIGTERM/SIGINT of the tests (e.g. a CTRL-C on the console), session
> daemon and relay daemon are killed with SIGKILL, thus leaking all their
> resources, and leaving lttng kernel modules loaded.
> 
> Revamp the "stop" functions to take a signal number and a timeout
> as optional parameters. The default signal number is SIGTERM.
> 
> The full_cleanup trap handler now tries to nicely kill relayd and
> sessiond (if they are present) with SIGTERM, and wait up to the
> user-configurable LTTNG_TEST_TEARDOWN_TIMEOUT environment variable
> (which has a default of 60s). Then, if there are still either relayd,
> sessiond, or consumerd present, it will SIGKILL them and wait for
> them to vanish. If it had to kill sessiond with SIGKILL, it will
> also explicitly try to unload the lttng modules with modprobe.
> 
> This approach is inspired from sysv init script shutdown behavior.
> 
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> ---
>  tests/utils/utils.sh | 180 +++++++++++++++++++++++++++++++++++++++------------
>  1 file changed, 137 insertions(+), 43 deletions(-)
> 
> diff --git a/tests/utils/utils.sh b/tests/utils/utils.sh
> index 94b3a3c4..d273b278 100644
> --- a/tests/utils/utils.sh
> +++ b/tests/utils/utils.sh
> @@ -15,14 +15,12 @@
>  
>  SESSIOND_BIN="lttng-sessiond"
>  SESSIOND_MATCH=".*lttng-sess.*"
> -SESSIOND_PIDS=""
>  RUNAS_BIN="lttng-runas"
>  RUNAS_MATCH=".*lttng-runas.*"
>  CONSUMERD_BIN="lttng-consumerd"
>  CONSUMERD_MATCH=".*lttng-consumerd.*"
>  RELAYD_BIN="lttng-relayd"
>  RELAYD_MATCH=".*lttng-relayd.*"
> -RELAYD_PIDS=""
>  LTTNG_BIN="lttng"
>  BABELTRACE_BIN="babeltrace"
>  OUTPUT_DEST=/dev/null
> @@ -48,11 +46,20 @@ export LTTNG_SESSIOND_PATH="/bin/true"
>  
>  source $TESTDIR/utils/tap/tap.sh
>  
> +if [ -z $LTTNG_TEST_TEARDOWN_TIMEOUT ]; then
> +	LTTNG_TEST_TEARDOWN_TIMEOUT=60
> +fi
> +
>  function full_cleanup ()
>  {
> -	if [ -n "${SESSIOND_PIDS}" ] || [ -n "${RELAYD_PIDS}" ]; then
> -		kill -9 ${SESSIOND_PIDS} ${RELAYD_PIDS} > /dev/null 2>&1
> -	fi
> +	# Try to kill daemons gracefully
> +	stop_lttng_relayd_notap SIGTERM $LTTNG_TEST_TEARDOWN_TIMEOUT
> +	stop_lttng_sessiond_notap SIGTERM $LTTNG_TEST_TEARDOWN_TIMEOUT
> +
> +	# If daemons are still present, forcibly kill them
> +	stop_lttng_relayd_notap SIGKILL $LTTNG_TEST_TEARDOWN_TIMEOUT
> +	stop_lttng_sessiond_notap SIGKILL $LTTNG_TEST_TEARDOWN_TIMEOUT
> +	stop_lttng_consumerd_notap SIGKILL $LTTNG_TEST_TEARDOWN_TIMEOUT
>  
>  	# Disable trap for SIGTERM since the following kill to the
>  	# pidgroup will be SIGTERM. Otherwise it loops.
> @@ -397,8 +404,6 @@ function start_lttng_relayd_opt()
>  	else
>  		pass "Start lttng-relayd (opt: $opt)"
>  	fi
> -
> -	RELAYD_PIDS=$(pgrep $RELAYD_MATCH)
>  }
>  
>  function start_lttng_relayd()
> @@ -414,29 +419,58 @@ function start_lttng_relayd_notap()
>  function stop_lttng_relayd_opt()
>  {
>  	local withtap=$1
> +	local signal=$2
> +	local timeout=$3

What is timeout expected unit? seconds?

If so make it explicit.

> +	local dtimeleft=
> +	local fail=0
> +	local pids=$(pgrep $RELAYD_MATCH)
>  
> -	if [ $withtap -eq "1" ]; then
> -		diag "Killing lttng-relayd (pid: $RELAYD_PIDS)"
> +	if [ -n "$timeout" ]; then

Add a comment on why you are doing the multiplication here (easier arithmetic
down the line).

> +		dtimeleft=$(($timeout * 2))
> +	fi
> +
> +	if [ -z "$signal" ]; then
> +		signal="SIGTERM"
>  	fi
> -	kill $RELAYD_PIDS 1> $OUTPUT_DEST 2> $ERROR_OUTPUT_DEST
> +
> +	if [ -z "$pids" ]; then
> +		if [ $withtap -eq "1" ]; then
> +			pass "No relay daemon to kill"
> +		fi
> +		return 0
> +	fi
> +
> +	diag "Killing (signal $signal) lttng-relayd (pid: $pids)"
> +
> +	kill -s $signal $pids 1> $OUTPUT_DEST 2> $ERROR_OUTPUT_DEST
>  	retval=$?
>  
>  	if [ $? -eq 1 ]; then
> +		fail=1

"fail" is set here but never reused, is there a check on "fail" missing down the
road?

>  		if [ $withtap -eq "1" ]; then
>  			fail "Kill relay daemon"
>  		fi
> -		return 1
>  	else
>  		out=1
>  		while [ -n "$out" ]; do
>  			out=$(pgrep $RELAYD_MATCH)
> +			if [ -n "$dtimeleft" ]; then
> +				if [ $dtimeleft -lt 0 ]; then
> +					out=
> +					fail=1
> +				fi
> +				dtimeleft=$(($dtimeleft - 1))
> +			fi
>  			sleep 0.5

We could also consider using 1 here and remove the * 2 found earlier.

We aren't that time sensitive. If we want to go that route, let's use 0.1 here
and * 10 as the base multiplier.

>  		done
>  		if [ $withtap -eq "1" ]; then
> -			pass "Kill relay daemon"
> +			if [ $fail -eq "0" ]; then
> +				pass "Wait after kill relay daemon"
> +			else
> +				fail "Wait after kill relay daemon"
> +			fi
>  		fi
>  	fi
> -	RELAYD_PIDS=""

Should retval be 1 if we failed due to timeout?

The matter of "do we need retval here?" will be for another time.

>  	return $retval
>  }
>  
> @@ -508,7 +542,6 @@ function start_lttng_sessiond_opt()
>  			ok $status "Start session daemon"
>  		fi
>  	fi
> -	SESSIOND_PIDS=$(pgrep $SESSIOND_MATCH)
>  }
>  
>  function start_lttng_sessiond()
> @@ -525,24 +558,43 @@ function stop_lttng_sessiond_opt()
>  {
>  	local withtap=$1
>  	local signal=$2
> -	local kill_opt=""
> +	local timeout=$3
> +	local dtimeleft=
> +	local fail=0
>  
> -	if [ -n $TEST_NO_SESSIOND ] && [ "$TEST_NO_SESSIOND" == "1" ]; then
> +	if [ -n "$timeout" ]; then
> +		dtimeleft=$(($timeout * 2))
> +	fi
> +
> +	if [ -n "$TEST_NO_SESSIOND" ] && [ "$TEST_NO_SESSIOND" == "1" ]; then
>  		# Env variable requested no session daemon
>  		return
>  	fi
>  
> -	local pids="${SESSIOND_PIDS} $(pgrep $RUNAS_MATCH)"
> +	local runas_pids=$(pgrep $RUNAS_MATCH)
> +	local pids=$(pgrep $SESSIOND_MATCH)
>  
> -	if [ -n "$2" ]; then
> -		kill_opt="$kill_opt -s $signal"
> +	if [ -n "$runas_pids" ]; then
> +		pids="$pids $runas_pids"
>  	fi
> -	if [ $withtap -eq "1" ]; then
> -		diag "Killing $SESSIOND_BIN and lt-$SESSIOND_BIN pids: $(echo $pids | tr '\n' ' ')"
> +
> +	if [ -z "$pids" ]; then
> +		if [ $withtap -eq "1" ]; then
> +			pass "No session daemon to kill"
> +		fi
> +		return
>  	fi
> -	kill $kill_opt $pids 1> $OUTPUT_DEST 2> $ERROR_OUTPUT_DEST
> +
> +	if [ -z "$signal" ]; then
> +		signal=SIGTERM
> +	fi

Move this after variable declaration at the top.

> +
> +	diag "Killing (signal $signal) $SESSIOND_BIN and lt-$SESSIOND_BIN pids: $(echo $pids | tr '\n' ' ')"
> +
> +	kill -s $signal $pids 1> $OUTPUT_DEST 2> $ERROR_OUTPUT_DEST
>  
>  	if [ $? -eq 1 ]; then
> +		fail=1

"fail" is set but never used. Is it intentional?

>  		if [ $withtap -eq "1" ]; then
>  			fail "Kill sessions daemon"
>  		fi
> @@ -550,17 +602,44 @@ function stop_lttng_sessiond_opt()
>  		out=1
>  		while [ -n "$out" ]; do
>  			out=$(pgrep ${SESSIOND_MATCH})
> +			if [ -n "$dtimeleft" ]; then
> +				if [ $dtimeleft -lt 0 ]; then
> +					out=
> +					fail=1
> +				fi
> +				dtimeleft=$(($dtimeleft - 1))
> +			fi
>  			sleep 0.5
>  		done
>  		out=1
>  		while [ -n "$out" ]; do
>  			out=$(pgrep $CONSUMERD_MATCH)
> +			if [ -n "$dtimeleft" ]; then
> +				if [ $dtimeleft -lt 0 ]; then
> +					out=
> +					fail=1
> +				fi
> +				dtimeleft=$(($dtimeleft - 1))
> +			fi
>  			sleep 0.5
>  		done
>  
> -		SESSIOND_PIDS=""
>  		if [ $withtap -eq "1" ]; then
> -			pass "Kill session daemon"
> +			if [ $fail -eq "0" ]; then
> +				pass "Wait after kill session daemon"
> +			else
> +				fail "Wait after kill session daemon"
> +			fi
> +		fi
> +	fi
> +	if [ "$signal" = "SIGKILL" ]; then
> +		if [ "$(id -u)" -eq "0" ]; then
> +			local modules="$(lsmod | grep ^lttng | awk '{print $1}')"
> +
> +			if [ -n "$modules" ]; then
> +				diag "Unloading all LTTng modules"
> +				modprobe -r $modules
> +			fi
>  		fi
>  	fi
>  }
> @@ -579,21 +658,18 @@ function sigstop_lttng_sessiond_opt()
>  {
>  	local withtap=$1
>  	local signal=SIGSTOP
> -	local kill_opt=""
>  
> -	if [ -n $TEST_NO_SESSIOND ] && [ "$TEST_NO_SESSIOND" == "1" ]; then
> +	if [ -n "$TEST_NO_SESSIOND" ] && [ "$TEST_NO_SESSIOND" == "1" ]; then
>  		# Env variable requested no session daemon
>  		return
>  	fi
>  
>  	PID_SESSIOND="$(pgrep ${SESSIOND_MATCH}) $(pgrep $RUNAS_MATCH)"
>  
> -	kill_opt="$kill_opt -s $signal"
> -
>  	if [ $withtap -eq "1" ]; then
>  		diag "Sending SIGSTOP to lt-$SESSIOND_BIN and $SESSIOND_BIN pids: $(echo $PID_SESSIOND | tr '\n' ' ')"
>  	fi
> -	kill $kill_opt $PID_SESSIOND 1> $OUTPUT_DEST 2> $ERROR_OUTPUT_DEST
> +	kill -s $signal $PID_SESSIOND 1> $OUTPUT_DEST 2> $ERROR_OUTPUT_DEST
>  
>  	if [ $? -eq 1 ]; then
>  		if [ $withtap -eq "1" ]; then
> @@ -635,26 +711,37 @@ function stop_lttng_consumerd_opt()
>  {
>  	local withtap=$1
>  	local signal=$2
> -	local kill_opt=""
> +	local timeout=$3
> +	local dtimeleft=
> +	local fail=0
>  
>  	PID_CONSUMERD=$(pgrep $CONSUMERD_MATCH)
>  
> -	if [ -n "$2" ]; then
> -		kill_opt="$kill_opt -s $signal"
> +	if [ -n "$timeout" ]; then
> +		dtimeleft=$(($timeout * 2))
>  	fi
>  
> -	if [ $withtap -eq "1" ]; then
> -		diag "Killing $CONSUMERD_BIN pids: $(echo $PID_CONSUMERD | tr '\n' ' ')"
> +	if [ -z "$signal" ]; then
> +		signal=SIGTERM
>  	fi
>  
> -	kill $kill_opt $PID_CONSUMERD 1> $OUTPUT_DEST 2> $ERROR_OUTPUT_DEST
> +	if [ -z "$PID_CONSUMERD" ]; then
> +		if [ $withtap -eq "1" ]; then
> +			pass "No consumer daemon to kill"
> +		fi
> +		return
> +	fi
> +
> +	diag "Killing (signal $signal) $CONSUMERD_BIN pids: $(echo $PID_CONSUMERD | tr '\n' ' ')"
> +
> +	kill -s $signal $PID_CONSUMERD 1> $OUTPUT_DEST 2> $ERROR_OUTPUT_DEST
>  	retval=$?
>  
>  	if [ $? -eq 1 ]; then
> +		fail=1
>  		if [ $withtap -eq "1" ]; then
>  			fail "Kill consumer daemon"
>  		fi
> -		return 1
>  	else
>  		out=1
>  		while [ $out -ne 0 ]; do
> @@ -669,10 +756,21 @@ function stop_lttng_consumerd_opt()
>  					out=1
>  				fi
>  			done
> +			if [ -n "$dtimeleft" ]; then
> +				if [ $dtimeleft -lt 0 ]; then
> +					out=0
> +					fail=1
> +				fi
> +				dtimeleft=$(($dtimeleft - 1))
> +			fi
>  			sleep 0.5
>  		done
>  		if [ $withtap -eq "1" ]; then
> -			pass "Kill consumer daemon"
> +			if [ $fail -eq "0" ]; then
> +				pass "Wait after kill consumer daemon"
> +			else
> +				fail "Wait after kill consumer daemon"
> +			fi
>  		fi
>  	fi
>  	return $retval
> @@ -692,16 +790,12 @@ function sigstop_lttng_consumerd_opt()
>  {
>  	local withtap=$1
>  	local signal=SIGSTOP
> -	local kill_opt=""
>  
>  	PID_CONSUMERD=$(pgrep $CONSUMERD_MATCH)
>  
> -	kill_opt="$kill_opt -s $signal"
> +	diag "Sending SIGSTOP to $CONSUMERD_BIN pids: $(echo $PID_CONSUMERD | tr '\n' ' ')"
>  
> -	if [ $withtap -eq "1" ]; then
> -		diag "Sending SIGSTOP to $CONSUMERD_BIN pids: $(echo $PID_CONSUMERD | tr '\n' ' ')"
> -	fi
> -	kill $kill_opt $PID_CONSUMERD 1> $OUTPUT_DEST 2> $ERROR_OUTPUT_DEST
> +	kill -s $signal $PID_CONSUMERD 1> $OUTPUT_DEST 2> $ERROR_OUTPUT_DEST
>  	retval=$?
>  
>  	if [ $? -eq 1 ]; then
> -- 
> 2.11.0
> 

-- 
Jonathan Rajotte-Julien
EfficiOS

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH lttng-tools 2/9] Fix: tests: error handling in high throughput limits test
       [not found] ` <20190503135547.12968-3-mathieu.desnoyers@efficios.com>
@ 2019-05-15 16:31   ` Jonathan Rajotte-Julien
       [not found]   ` <20190515163127.GD14986@joraj-alpa>
  1 sibling, 0 replies; 17+ messages in thread
From: Jonathan Rajotte-Julien @ 2019-05-15 16:31 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: lttng-dev, jgalar, joraj

> @@ -168,8 +187,7 @@ skip $isroot "Root access is needed to set bandwith limits. Skipping all tests."
>  {
>  
>  	# Catch sigint and try to cleanup limits
> -	trap interrupt_cleanup SIGTERM
> -	trap interrupt_cleanup SIGINT
> +	trap interrupt_cleanup SIGTERM SIGINT
>  
>  	BW_LIMITS=(3200 1600 800 400 200 100 50 25)
>  	for BW in ${BW_LIMITS[@]};
> @@ -177,6 +195,7 @@ skip $isroot "Root access is needed to set bandwith limits. Skipping all tests."
>  		diag "Test high-throughput with bandwidth limit set to ${BW}kbits"
>  
>  		set_bw_limit $BW
> +		ok $? "Setting bandwidth limit"

In case of failure here we should force a reset since it could lead to long
timeout/hang depending on which tc command failed?

This could also be done inside set_bw_limit at each stage.

>  
>  		start_lttng_sessiond
>  		start_lttng_relayd "-o $TRACE_PATH"
> @@ -185,5 +204,6 @@ skip $isroot "Root access is needed to set bandwith limits. Skipping all tests."
>  		stop_lttng_relayd
>  		stop_lttng_sessiond
>  		reset_bw_limit
> +		ok $? "Reset bandwith limits"
>  	done
>  }
> -- 
> 2.11.0
> 

-- 
Jonathan Rajotte-Julien
EfficiOS

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH lttng-tools 6/9] tests: invoke full_cleanup from script trap handlers, use modprobe -r
       [not found] ` <20190503135547.12968-7-mathieu.desnoyers@efficios.com>
@ 2019-05-15 16:43   ` Jonathan Rajotte-Julien
  0 siblings, 0 replies; 17+ messages in thread
From: Jonathan Rajotte-Julien @ 2019-05-15 16:43 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: lttng-dev, jgalar, joraj

LGTM

On Fri, May 03, 2019 at 09:55:44AM -0400, Mathieu Desnoyers wrote:
> Scripts implementing their own trap handlers override the generic
> one provided by utils.sh (full_cleanup). Invoke it at the end of
> the handlers to provide the utils cleanup as well.
> 
> Moreover, change use of "rmmod" to "modprobe -r", which is better
> in trap handlers because it does not print errors if the module
> was not loaded yet when the signal occurs.
> 
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> ---
>  tests/regression/kernel/test_clock_override                    | 10 +++-------
>  tests/regression/kernel/test_rotation_destroy_flush            |  7 +++----
>  tests/regression/tools/crash/test_crash                        |  3 +--
>  tests/regression/tools/notification/test_notification_kernel   |  2 +-
>  .../regression/tools/notification/test_notification_multi_app  |  2 +-
>  tests/regression/tools/notification/test_notification_ust      |  2 +-
>  tests/regression/tools/streaming/test_high_throughput_limits   |  1 -
>  .../ust/rotation-destroy-flush/test_rotation_destroy_flush     |  3 +--
>  tests/stress/test_multi_sessions_per_uid_10app                 |  5 ++---
>  tests/stress/test_multi_sessions_per_uid_5app_streaming        |  5 ++---
>  .../test_multi_sessions_per_uid_5app_streaming_kill_relayd     |  5 ++---
>  11 files changed, 17 insertions(+), 28 deletions(-)
> 
> diff --git a/tests/regression/kernel/test_clock_override b/tests/regression/kernel/test_clock_override
> index e19b77e1..1fbba771 100755
> --- a/tests/regression/kernel/test_clock_override
> +++ b/tests/regression/kernel/test_clock_override
> @@ -49,11 +49,9 @@ source $TESTDIR/utils/utils.sh
>  function signal_cleanup()
>  {
>  	diag "*** Exiting ***"
> -	rmmod lttng-test
>  	stop_lttng_sessiond
> -	rmmod lttng-clock-plugin-test
> -	rmmod lttng-clock
> -	exit 1
> +	modprobe -r lttng-test lttng-clock-plugin-test lttng-clock
> +	full_cleanup
>  }
>  
>  function extract_clock_metadata()
> @@ -93,10 +91,8 @@ function test_clock_override_metadata()
>  	stop_lttng_tracing_ok $SESSION_NAME
>  	destroy_lttng_session_ok $SESSION_NAME
>  
> -	rmmod lttng-test
>  	stop_lttng_sessiond
> -	rmmod lttng-clock-plugin-test
> -	rmmod lttng-clock
> +	modprobe -r lttng-test lttng-clock-plugin-test lttng-clock
>  
>  	local TRACE_METADATA_FILE_PATH="$(find "$TRACE_PATH" -name metadata -type f)"
>  	local TRACE_METADATA_DIR="$(dirname "$TRACE_METADATA_FILE_PATH")"
> diff --git a/tests/regression/kernel/test_rotation_destroy_flush b/tests/regression/kernel/test_rotation_destroy_flush
> index 0b0b0ca7..03933a3a 100755
> --- a/tests/regression/kernel/test_rotation_destroy_flush
> +++ b/tests/regression/kernel/test_rotation_destroy_flush
> @@ -39,9 +39,8 @@ source $TESTDIR/utils/utils.sh
>  function signal_cleanup()
>  {
>  	diag "*** Exiting ***"
> -	rmmod lttng-test
> -	stop_lttng_sessiond
> -	exit 1
> +	modprobe -r lttng-test
> +	full_cleanup
>  }
>  
>  function enable_kernel_lttng_channel_size_limit ()
> @@ -107,7 +106,7 @@ function test_rotation_destroy_flush_single()
>  
>  	rm -rf $TRACE_PATH
>  
> -	rmmod lttng-test
> +	modprobe -r lttng-test
>  	stop_lttng_sessiond
>  }
>  
> diff --git a/tests/regression/tools/crash/test_crash b/tests/regression/tools/crash/test_crash
> index 13909c1b..5bad16e5 100755
> --- a/tests/regression/tools/crash/test_crash
> +++ b/tests/regression/tools/crash/test_crash
> @@ -392,8 +392,7 @@ function interrupt_cleanup()
>  {
>      diag "*** Cleaning-up test ***"
>      stop_test_apps
> -    stop_lttng_sessiond
> -    exit 1
> +    full_cleanup
>  }
>  
>  TESTS=(
> diff --git a/tests/regression/tools/notification/test_notification_kernel b/tests/regression/tools/notification/test_notification_kernel
> index e7368df2..cc6fc581 100755
> --- a/tests/regression/tools/notification/test_notification_kernel
> +++ b/tests/regression/tools/notification/test_notification_kernel
> @@ -56,7 +56,7 @@ function kernel_event_generator
>  	state_file=$1
>  	kernel_event_generator_suspended=0
>  	trap kernel_event_generator_toogle_state SIGUSR1
> -	trap "exit" SIGTERM SIGINT EXIT
> +
>  	while (true); do
>  		if [[ $kernel_event_generator_suspended -eq "1" ]]; then
>  			touch $state_file
> diff --git a/tests/regression/tools/notification/test_notification_multi_app b/tests/regression/tools/notification/test_notification_multi_app
> index 7465a83f..51d94e4f 100755
> --- a/tests/regression/tools/notification/test_notification_multi_app
> +++ b/tests/regression/tools/notification/test_notification_multi_app
> @@ -64,7 +64,7 @@ function kernel_event_generator
>  	state_file=$1
>  	kernel_event_generator_suspended=0
>  	trap kernel_event_generator_toogle_state SIGUSR1
> -	trap "exit" SIGTERM SIGINT
> +
>  	while (true); do
>  		if [[ $kernel_event_generator_suspended -eq "1" ]]; then
>  			touch $state_file
> diff --git a/tests/regression/tools/notification/test_notification_ust b/tests/regression/tools/notification/test_notification_ust
> index 8941e476..82f79a8e 100755
> --- a/tests/regression/tools/notification/test_notification_ust
> +++ b/tests/regression/tools/notification/test_notification_ust
> @@ -56,7 +56,7 @@ function ust_event_generator
>  	state_file=$1
>  	ust_event_generator_suspended=0
>  	trap ust_event_generator_toogle_state SIGUSR1
> -	trap "exit" SIGTERM SIGINT
> +
>  	while (true); do
>  		if [[ $ust_event_generator_suspended -eq "1" ]]; then
>  			touch $state_file
> diff --git a/tests/regression/tools/streaming/test_high_throughput_limits b/tests/regression/tools/streaming/test_high_throughput_limits
> index 68973c73..c49c13c2 100755
> --- a/tests/regression/tools/streaming/test_high_throughput_limits
> +++ b/tests/regression/tools/streaming/test_high_throughput_limits
> @@ -170,7 +170,6 @@ function interrupt_cleanup()
>  	reset_bw_limit
>  	# invoke utils cleanup
>  	full_cleanup
> -	exit 1
>  }
>  
>  plan_tests $NUM_TESTS
> diff --git a/tests/regression/ust/rotation-destroy-flush/test_rotation_destroy_flush b/tests/regression/ust/rotation-destroy-flush/test_rotation_destroy_flush
> index a7a93771..e404564e 100755
> --- a/tests/regression/ust/rotation-destroy-flush/test_rotation_destroy_flush
> +++ b/tests/regression/ust/rotation-destroy-flush/test_rotation_destroy_flush
> @@ -48,8 +48,7 @@ function run_app()
>  function signal_cleanup()
>  {
>  	diag "*** Exiting ***"
> -	stop_lttng_sessiond
> -	exit 1
> +	full_cleanup
>  }
>  
>  function enable_ust_lttng_channel_size_limit ()
> diff --git a/tests/stress/test_multi_sessions_per_uid_10app b/tests/stress/test_multi_sessions_per_uid_10app
> index 82e8ad50..c9f8403e 100755
> --- a/tests/stress/test_multi_sessions_per_uid_10app
> +++ b/tests/stress/test_multi_sessions_per_uid_10app
> @@ -112,11 +112,10 @@ function sighandler()
>  {
>  	cleanup
>  	rm $LOG_FILE
> -	exit 1
> +	full_cleanup
>  }
>  
> -trap sighandler SIGINT
> -trap sighandler SIGTERM
> +trap sighandler SIGINT SIGTERM
>  
>  # Make sure we collect a coredump if possible.
>  ulimit -c unlimited
> diff --git a/tests/stress/test_multi_sessions_per_uid_5app_streaming b/tests/stress/test_multi_sessions_per_uid_5app_streaming
> index ed989498..4203ac30 100755
> --- a/tests/stress/test_multi_sessions_per_uid_5app_streaming
> +++ b/tests/stress/test_multi_sessions_per_uid_5app_streaming
> @@ -142,11 +142,10 @@ function sighandler()
>  {
>  	cleanup
>  	rm $LOG_FILE_SESSIOND $LOG_FILE_RELAYD
> -	exit 1
> +	full_cleanup
>  }
>  
> -trap sighandler SIGINT
> -trap sighandler SIGTERM
> +trap sighandler SIGINT SIGTERM
>  
>  # Make sure we collect a coredump if possible.
>  ulimit -c unlimited
> diff --git a/tests/stress/test_multi_sessions_per_uid_5app_streaming_kill_relayd b/tests/stress/test_multi_sessions_per_uid_5app_streaming_kill_relayd
> index c699ac22..d0121e32 100755
> --- a/tests/stress/test_multi_sessions_per_uid_5app_streaming_kill_relayd
> +++ b/tests/stress/test_multi_sessions_per_uid_5app_streaming_kill_relayd
> @@ -144,11 +144,10 @@ function sighandler()
>  {
>  	cleanup
>  	#rm $LOG_FILE_SESSIOND $LOG_FILE_RELAYD
> -	exit 1
> +	full_cleanup
>  }
>  
> -trap sighandler SIGINT
> -trap sighandler SIGTERM
> +trap sighandler SIGINT SIGTERM
>  
>  # Make sure we collect a coredump if possible.
>  ulimit -c unlimited
> -- 
> 2.11.0
> 

-- 
Jonathan Rajotte-Julien
EfficiOS

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH lttng-tools 5/9] Cleanup: test: don't stop relayd twice
       [not found] ` <20190503135547.12968-6-mathieu.desnoyers@efficios.com>
@ 2019-05-15 16:44   ` Jonathan Rajotte-Julien
  0 siblings, 0 replies; 17+ messages in thread
From: Jonathan Rajotte-Julien @ 2019-05-15 16:44 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: lttng-dev, jgalar, joraj

LGTM

On Fri, May 03, 2019 at 09:55:43AM -0400, Mathieu Desnoyers wrote:
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> ---
>  tests/regression/tools/live/test_lttng_ust | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/tests/regression/tools/live/test_lttng_ust b/tests/regression/tools/live/test_lttng_ust
> index 06017d01..830fc783 100755
> --- a/tests/regression/tools/live/test_lttng_ust
> +++ b/tests/regression/tools/live/test_lttng_ust
> @@ -34,7 +34,7 @@ TRACE_PATH=$(mktemp -d)
>  
>  DIR=$(readlink -f $TESTDIR)
>  
> -NUM_TESTS=12
> +NUM_TESTS=11
>  
>  source $TESTDIR/utils/utils.sh
>  
> @@ -84,5 +84,4 @@ stop_lttng_relayd
>  
>  test_custom_url
>  
> -stop_lttng_relayd
>  stop_lttng_sessiond
> -- 
> 2.11.0
> 

-- 
Jonathan Rajotte-Julien
EfficiOS

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH lttng-tools 2/9] Fix: tests: error handling in high throughput limits test
       [not found]   ` <20190515163127.GD14986@joraj-alpa>
@ 2019-05-15 22:36     ` Mathieu Desnoyers
  0 siblings, 0 replies; 17+ messages in thread
From: Mathieu Desnoyers @ 2019-05-15 22:36 UTC (permalink / raw)
  To: Jonathan Rajotte; +Cc: lttng-dev, Jeremie Galarneau, Jonathan Rajotte-Julien

----- On May 15, 2019, at 12:31 PM, Jonathan Rajotte jonathan.rajotte-julien@efficios.com wrote:

>> @@ -168,8 +187,7 @@ skip $isroot "Root access is needed to set bandwith limits.
>> Skipping all tests."
>>  {
>>  
>>  	# Catch sigint and try to cleanup limits
>> -	trap interrupt_cleanup SIGTERM
>> -	trap interrupt_cleanup SIGINT
>> +	trap interrupt_cleanup SIGTERM SIGINT
>>  
>>  	BW_LIMITS=(3200 1600 800 400 200 100 50 25)
>>  	for BW in ${BW_LIMITS[@]};
>> @@ -177,6 +195,7 @@ skip $isroot "Root access is needed to set bandwith limits.
>> Skipping all tests."
>>  		diag "Test high-throughput with bandwidth limit set to ${BW}kbits"
>>  
>>  		set_bw_limit $BW
>> +		ok $? "Setting bandwidth limit"
> 
> In case of failure here we should force a reset since it could lead to long
> timeout/hang depending on which tc command failed?
> 
> This could also be done inside set_bw_limit at each stage.

I'll do this. Will be in v2 of this patch.

Thanks,

Mathieu

> 
>>  
>>  		start_lttng_sessiond
>>  		start_lttng_relayd "-o $TRACE_PATH"
>> @@ -185,5 +204,6 @@ skip $isroot "Root access is needed to set bandwith limits.
>> Skipping all tests."
>>  		stop_lttng_relayd
>>  		stop_lttng_sessiond
>>  		reset_bw_limit
>> +		ok $? "Reset bandwith limits"
>>  	done
>>  }
>> --
>> 2.11.0
>> 
> 
> --
> Jonathan Rajotte-Julien
> EfficiOS

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH lttng-tools 1/9] Improve handling of test SIGTERM/SIGINT
       [not found]   ` <20190515161917.GC14986@joraj-alpa>
@ 2019-05-16 16:00     ` Mathieu Desnoyers
  0 siblings, 0 replies; 17+ messages in thread
From: Mathieu Desnoyers @ 2019-05-16 16:00 UTC (permalink / raw)
  To: Jonathan Rajotte; +Cc: lttng-dev, Jeremie Galarneau, Jonathan Rajotte-Julien

----- On May 15, 2019, at 12:19 PM, Jonathan Rajotte jonathan.rajotte-julien@efficios.com wrote:

> Hi,
> 
> Please run shellcheck on this and fix only the issue introduced.

OK

> 
> This is valid for all patches of this series.

OK

> 
> Most comments for stop_*_opt functions applies to the others since the changes
> are similar.

OK

> 
> On Fri, May 03, 2019 at 09:55:39AM -0400, Mathieu Desnoyers wrote:
>> The current state of signal handling for test scripts is: on
>> SIGTERM/SIGINT of the tests (e.g. a CTRL-C on the console), session
>> daemon and relay daemon are killed with SIGKILL, thus leaking all their
>> resources, and leaving lttng kernel modules loaded.
>> 
>> Revamp the "stop" functions to take a signal number and a timeout
>> as optional parameters. The default signal number is SIGTERM.
>> 
>> The full_cleanup trap handler now tries to nicely kill relayd and
>> sessiond (if they are present) with SIGTERM, and wait up to the
>> user-configurable LTTNG_TEST_TEARDOWN_TIMEOUT environment variable
>> (which has a default of 60s). Then, if there are still either relayd,
>> sessiond, or consumerd present, it will SIGKILL them and wait for
>> them to vanish. If it had to kill sessiond with SIGKILL, it will
>> also explicitly try to unload the lttng modules with modprobe.
>> 
>> This approach is inspired from sysv init script shutdown behavior.
>> 
>> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>> ---
>>  tests/utils/utils.sh | 180 +++++++++++++++++++++++++++++++++++++++------------
>>  1 file changed, 137 insertions(+), 43 deletions(-)
>> 
>> diff --git a/tests/utils/utils.sh b/tests/utils/utils.sh
>> index 94b3a3c4..d273b278 100644
>> --- a/tests/utils/utils.sh
>> +++ b/tests/utils/utils.sh
>> @@ -15,14 +15,12 @@
>>  
>>  SESSIOND_BIN="lttng-sessiond"
>>  SESSIOND_MATCH=".*lttng-sess.*"
>> -SESSIOND_PIDS=""
>>  RUNAS_BIN="lttng-runas"
>>  RUNAS_MATCH=".*lttng-runas.*"
>>  CONSUMERD_BIN="lttng-consumerd"
>>  CONSUMERD_MATCH=".*lttng-consumerd.*"
>>  RELAYD_BIN="lttng-relayd"
>>  RELAYD_MATCH=".*lttng-relayd.*"
>> -RELAYD_PIDS=""
>>  LTTNG_BIN="lttng"
>>  BABELTRACE_BIN="babeltrace"
>>  OUTPUT_DEST=/dev/null
>> @@ -48,11 +46,20 @@ export LTTNG_SESSIOND_PATH="/bin/true"
>>  
>>  source $TESTDIR/utils/tap/tap.sh
>>  
>> +if [ -z $LTTNG_TEST_TEARDOWN_TIMEOUT ]; then
>> +	LTTNG_TEST_TEARDOWN_TIMEOUT=60
>> +fi
>> +
>>  function full_cleanup ()
>>  {
>> -	if [ -n "${SESSIOND_PIDS}" ] || [ -n "${RELAYD_PIDS}" ]; then
>> -		kill -9 ${SESSIOND_PIDS} ${RELAYD_PIDS} > /dev/null 2>&1
>> -	fi
>> +	# Try to kill daemons gracefully
>> +	stop_lttng_relayd_notap SIGTERM $LTTNG_TEST_TEARDOWN_TIMEOUT
>> +	stop_lttng_sessiond_notap SIGTERM $LTTNG_TEST_TEARDOWN_TIMEOUT
>> +
>> +	# If daemons are still present, forcibly kill them
>> +	stop_lttng_relayd_notap SIGKILL $LTTNG_TEST_TEARDOWN_TIMEOUT
>> +	stop_lttng_sessiond_notap SIGKILL $LTTNG_TEST_TEARDOWN_TIMEOUT
>> +	stop_lttng_consumerd_notap SIGKILL $LTTNG_TEST_TEARDOWN_TIMEOUT
>>  
>>  	# Disable trap for SIGTERM since the following kill to the
>>  	# pidgroup will be SIGTERM. Otherwise it loops.
>> @@ -397,8 +404,6 @@ function start_lttng_relayd_opt()
>>  	else
>>  		pass "Start lttng-relayd (opt: $opt)"
>>  	fi
>> -
>> -	RELAYD_PIDS=$(pgrep $RELAYD_MATCH)
>>  }
>>  
>>  function start_lttng_relayd()
>> @@ -414,29 +419,58 @@ function start_lttng_relayd_notap()
>>  function stop_lttng_relayd_opt()
>>  {
>>  	local withtap=$1
>> +	local signal=$2
>> +	local timeout=$3
> 
> What is timeout expected unit? seconds?

Yes, will rename to "timeout_s".

> 
> If so make it explicit.
> 
>> +	local dtimeleft=
>> +	local fail=0
>> +	local pids=$(pgrep $RELAYD_MATCH)
>>  
>> -	if [ $withtap -eq "1" ]; then
>> -		diag "Killing lttng-relayd (pid: $RELAYD_PIDS)"
>> +	if [ -n "$timeout" ]; then
> 
> Add a comment on why you are doing the multiplication here (easier arithmetic
> down the line).

OK

> 
>> +		dtimeleft=$(($timeout * 2))
>> +	fi
>> +
>> +	if [ -z "$signal" ]; then
>> +		signal="SIGTERM"
>>  	fi
>> -	kill $RELAYD_PIDS 1> $OUTPUT_DEST 2> $ERROR_OUTPUT_DEST
>> +
>> +	if [ -z "$pids" ]; then
>> +		if [ $withtap -eq "1" ]; then
>> +			pass "No relay daemon to kill"
>> +		fi
>> +		return 0
>> +	fi
>> +
>> +	diag "Killing (signal $signal) lttng-relayd (pid: $pids)"
>> +
>> +	kill -s $signal $pids 1> $OUTPUT_DEST 2> $ERROR_OUTPUT_DEST
>>  	retval=$?
>>  
>>  	if [ $? -eq 1 ]; then
>> +		fail=1
> 
> "fail" is set here but never reused, is there a check on "fail" missing down the
> road?

Will change the functions to return "$retval", and rename fail to retval,
and remove the retval=$? which becomes redundant.

> 
>>  		if [ $withtap -eq "1" ]; then
>>  			fail "Kill relay daemon"
>>  		fi
>> -		return 1
>>  	else
>>  		out=1
>>  		while [ -n "$out" ]; do
>>  			out=$(pgrep $RELAYD_MATCH)
>> +			if [ -n "$dtimeleft" ]; then
>> +				if [ $dtimeleft -lt 0 ]; then
>> +					out=
>> +					fail=1
>> +				fi
>> +				dtimeleft=$(($dtimeleft - 1))
>> +			fi
>>  			sleep 0.5
> 
> We could also consider using 1 here and remove the * 2 found earlier.
> 
> We aren't that time sensitive. If we want to go that route, let's use 0.1 here
> and * 10 as the base multiplier.

The intent of this patchset is not to tweak the delay wait increments.
Feel free to propose that change later on. I'm keeping the existing
delays for now.

> 
>>  		done
>>  		if [ $withtap -eq "1" ]; then
>> -			pass "Kill relay daemon"
>> +			if [ $fail -eq "0" ]; then
>> +				pass "Wait after kill relay daemon"
>> +			else
>> +				fail "Wait after kill relay daemon"
>> +			fi
>>  		fi
>>  	fi
>> -	RELAYD_PIDS=""
> 
> Should retval be 1 if we failed due to timeout?

Yes. I'm correcting this as well.

> 
> The matter of "do we need retval here?" will be for another time.

indeed.

> 
>>  	return $retval
>>  }
>>  
>> @@ -508,7 +542,6 @@ function start_lttng_sessiond_opt()
>>  			ok $status "Start session daemon"
>>  		fi
>>  	fi
>> -	SESSIOND_PIDS=$(pgrep $SESSIOND_MATCH)
>>  }
>>  
>>  function start_lttng_sessiond()
>> @@ -525,24 +558,43 @@ function stop_lttng_sessiond_opt()
>>  {
>>  	local withtap=$1
>>  	local signal=$2
>> -	local kill_opt=""
>> +	local timeout=$3
>> +	local dtimeleft=
>> +	local fail=0
>>  
>> -	if [ -n $TEST_NO_SESSIOND ] && [ "$TEST_NO_SESSIOND" == "1" ]; then
>> +	if [ -n "$timeout" ]; then
>> +		dtimeleft=$(($timeout * 2))
>> +	fi
>> +
>> +	if [ -n "$TEST_NO_SESSIOND" ] && [ "$TEST_NO_SESSIOND" == "1" ]; then
>>  		# Env variable requested no session daemon
>>  		return
>>  	fi
>>  
>> -	local pids="${SESSIOND_PIDS} $(pgrep $RUNAS_MATCH)"
>> +	local runas_pids=$(pgrep $RUNAS_MATCH)
>> +	local pids=$(pgrep $SESSIOND_MATCH)
>>  
>> -	if [ -n "$2" ]; then
>> -		kill_opt="$kill_opt -s $signal"
>> +	if [ -n "$runas_pids" ]; then
>> +		pids="$pids $runas_pids"
>>  	fi
>> -	if [ $withtap -eq "1" ]; then
>> -		diag "Killing $SESSIOND_BIN and lt-$SESSIOND_BIN pids: $(echo $pids | tr '\n'
>> ' ')"
>> +
>> +	if [ -z "$pids" ]; then
>> +		if [ $withtap -eq "1" ]; then
>> +			pass "No session daemon to kill"
>> +		fi
>> +		return
>>  	fi
>> -	kill $kill_opt $pids 1> $OUTPUT_DEST 2> $ERROR_OUTPUT_DEST
>> +
>> +	if [ -z "$signal" ]; then
>> +		signal=SIGTERM
>> +	fi
> 
> Move this after variable declaration at the top.

ok

> 
>> +
>> +	diag "Killing (signal $signal) $SESSIOND_BIN and lt-$SESSIOND_BIN pids: $(echo
>> $pids | tr '\n' ' ')"
>> +
>> +	kill -s $signal $pids 1> $OUTPUT_DEST 2> $ERROR_OUTPUT_DEST
>>  
>>  	if [ $? -eq 1 ]; then
>> +		fail=1
> 
> "fail" is set but never used. Is it intentional?

fixing.

Thanks,

Mathieu

> 
>>  		if [ $withtap -eq "1" ]; then
>>  			fail "Kill sessions daemon"
>>  		fi
>> @@ -550,17 +602,44 @@ function stop_lttng_sessiond_opt()
>>  		out=1
>>  		while [ -n "$out" ]; do
>>  			out=$(pgrep ${SESSIOND_MATCH})
>> +			if [ -n "$dtimeleft" ]; then
>> +				if [ $dtimeleft -lt 0 ]; then
>> +					out=
>> +					fail=1
>> +				fi
>> +				dtimeleft=$(($dtimeleft - 1))
>> +			fi
>>  			sleep 0.5
>>  		done
>>  		out=1
>>  		while [ -n "$out" ]; do
>>  			out=$(pgrep $CONSUMERD_MATCH)
>> +			if [ -n "$dtimeleft" ]; then
>> +				if [ $dtimeleft -lt 0 ]; then
>> +					out=
>> +					fail=1
>> +				fi
>> +				dtimeleft=$(($dtimeleft - 1))
>> +			fi
>>  			sleep 0.5
>>  		done
>>  
>> -		SESSIOND_PIDS=""
>>  		if [ $withtap -eq "1" ]; then
>> -			pass "Kill session daemon"
>> +			if [ $fail -eq "0" ]; then
>> +				pass "Wait after kill session daemon"
>> +			else
>> +				fail "Wait after kill session daemon"
>> +			fi
>> +		fi
>> +	fi
>> +	if [ "$signal" = "SIGKILL" ]; then
>> +		if [ "$(id -u)" -eq "0" ]; then
>> +			local modules="$(lsmod | grep ^lttng | awk '{print $1}')"
>> +
>> +			if [ -n "$modules" ]; then
>> +				diag "Unloading all LTTng modules"
>> +				modprobe -r $modules
>> +			fi
>>  		fi
>>  	fi
>>  }
>> @@ -579,21 +658,18 @@ function sigstop_lttng_sessiond_opt()
>>  {
>>  	local withtap=$1
>>  	local signal=SIGSTOP
>> -	local kill_opt=""
>>  
>> -	if [ -n $TEST_NO_SESSIOND ] && [ "$TEST_NO_SESSIOND" == "1" ]; then
>> +	if [ -n "$TEST_NO_SESSIOND" ] && [ "$TEST_NO_SESSIOND" == "1" ]; then
>>  		# Env variable requested no session daemon
>>  		return
>>  	fi
>>  
>>  	PID_SESSIOND="$(pgrep ${SESSIOND_MATCH}) $(pgrep $RUNAS_MATCH)"
>>  
>> -	kill_opt="$kill_opt -s $signal"
>> -
>>  	if [ $withtap -eq "1" ]; then
>>  		diag "Sending SIGSTOP to lt-$SESSIOND_BIN and $SESSIOND_BIN pids: $(echo
>>  		$PID_SESSIOND | tr '\n' ' ')"
>>  	fi
>> -	kill $kill_opt $PID_SESSIOND 1> $OUTPUT_DEST 2> $ERROR_OUTPUT_DEST
>> +	kill -s $signal $PID_SESSIOND 1> $OUTPUT_DEST 2> $ERROR_OUTPUT_DEST
>>  
>>  	if [ $? -eq 1 ]; then
>>  		if [ $withtap -eq "1" ]; then
>> @@ -635,26 +711,37 @@ function stop_lttng_consumerd_opt()
>>  {
>>  	local withtap=$1
>>  	local signal=$2
>> -	local kill_opt=""
>> +	local timeout=$3
>> +	local dtimeleft=
>> +	local fail=0
>>  
>>  	PID_CONSUMERD=$(pgrep $CONSUMERD_MATCH)
>>  
>> -	if [ -n "$2" ]; then
>> -		kill_opt="$kill_opt -s $signal"
>> +	if [ -n "$timeout" ]; then
>> +		dtimeleft=$(($timeout * 2))
>>  	fi
>>  
>> -	if [ $withtap -eq "1" ]; then
>> -		diag "Killing $CONSUMERD_BIN pids: $(echo $PID_CONSUMERD | tr '\n' ' ')"
>> +	if [ -z "$signal" ]; then
>> +		signal=SIGTERM
>>  	fi
>>  
>> -	kill $kill_opt $PID_CONSUMERD 1> $OUTPUT_DEST 2> $ERROR_OUTPUT_DEST
>> +	if [ -z "$PID_CONSUMERD" ]; then
>> +		if [ $withtap -eq "1" ]; then
>> +			pass "No consumer daemon to kill"
>> +		fi
>> +		return
>> +	fi
>> +
>> +	diag "Killing (signal $signal) $CONSUMERD_BIN pids: $(echo $PID_CONSUMERD | tr
>> '\n' ' ')"
>> +
>> +	kill -s $signal $PID_CONSUMERD 1> $OUTPUT_DEST 2> $ERROR_OUTPUT_DEST
>>  	retval=$?
>>  
>>  	if [ $? -eq 1 ]; then
>> +		fail=1
>>  		if [ $withtap -eq "1" ]; then
>>  			fail "Kill consumer daemon"
>>  		fi
>> -		return 1
>>  	else
>>  		out=1
>>  		while [ $out -ne 0 ]; do
>> @@ -669,10 +756,21 @@ function stop_lttng_consumerd_opt()
>>  					out=1
>>  				fi
>>  			done
>> +			if [ -n "$dtimeleft" ]; then
>> +				if [ $dtimeleft -lt 0 ]; then
>> +					out=0
>> +					fail=1
>> +				fi
>> +				dtimeleft=$(($dtimeleft - 1))
>> +			fi
>>  			sleep 0.5
>>  		done
>>  		if [ $withtap -eq "1" ]; then
>> -			pass "Kill consumer daemon"
>> +			if [ $fail -eq "0" ]; then
>> +				pass "Wait after kill consumer daemon"
>> +			else
>> +				fail "Wait after kill consumer daemon"
>> +			fi
>>  		fi
>>  	fi
>>  	return $retval
>> @@ -692,16 +790,12 @@ function sigstop_lttng_consumerd_opt()
>>  {
>>  	local withtap=$1
>>  	local signal=SIGSTOP
>> -	local kill_opt=""
>>  
>>  	PID_CONSUMERD=$(pgrep $CONSUMERD_MATCH)
>>  
>> -	kill_opt="$kill_opt -s $signal"
>> +	diag "Sending SIGSTOP to $CONSUMERD_BIN pids: $(echo $PID_CONSUMERD | tr '\n'
>> ' ')"
>>  
>> -	if [ $withtap -eq "1" ]; then
>> -		diag "Sending SIGSTOP to $CONSUMERD_BIN pids: $(echo $PID_CONSUMERD | tr '\n'
>> ' ')"
>> -	fi
>> -	kill $kill_opt $PID_CONSUMERD 1> $OUTPUT_DEST 2> $ERROR_OUTPUT_DEST
>> +	kill -s $signal $PID_CONSUMERD 1> $OUTPUT_DEST 2> $ERROR_OUTPUT_DEST
>>  	retval=$?
>>  
>>  	if [ $? -eq 1 ]; then
>> --
>> 2.11.0
>> 
> 
> --
> Jonathan Rajotte-Julien
> EfficiOS

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2019-05-16 16:00 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20190503135547.12968-1-mathieu.desnoyers@efficios.com>
2019-05-03 13:55 ` [PATCH lttng-tools 1/9] Improve handling of test SIGTERM/SIGINT Mathieu Desnoyers
2019-05-03 13:55 ` [PATCH lttng-tools 2/9] Fix: tests: error handling in high throughput limits test Mathieu Desnoyers
2019-05-03 13:55 ` [PATCH lttng-tools 3/9] Fix: utils.sh: handle SIGPIPE Mathieu Desnoyers
2019-05-03 13:55 ` [PATCH lttng-tools 4/9] Fix: test: utils.sh: exit from process on full_cleanup Mathieu Desnoyers
2019-05-03 13:55 ` [PATCH lttng-tools 5/9] Cleanup: test: don't stop relayd twice Mathieu Desnoyers
2019-05-03 13:55 ` [PATCH lttng-tools 6/9] tests: invoke full_cleanup from script trap handlers, use modprobe -r Mathieu Desnoyers
2019-05-03 13:55 ` [PATCH lttng-tools 7/9] epoll/poll compat: expose interruptible API Mathieu Desnoyers
2019-05-03 13:55 ` [PATCH lttng-tools 8/9] lttng-ctl: notifications: use epoll()/poll() instead of select() Mathieu Desnoyers
2019-05-03 13:55 ` [PATCH lttng-tools 9/9] sessiond: " Mathieu Desnoyers
     [not found] ` <20190503135547.12968-8-mathieu.desnoyers@efficios.com>
2019-05-15 15:18   ` [PATCH lttng-tools 7/9] epoll/poll compat: expose interruptible API Jonathan Rajotte-Julien
     [not found] ` <20190503135547.12968-9-mathieu.desnoyers@efficios.com>
2019-05-15 15:20   ` [PATCH lttng-tools 8/9] lttng-ctl: notifications: use epoll()/poll() instead of select() Jonathan Rajotte-Julien
     [not found] ` <20190503135547.12968-2-mathieu.desnoyers@efficios.com>
2019-05-15 16:19   ` [PATCH lttng-tools 1/9] Improve handling of test SIGTERM/SIGINT Jonathan Rajotte-Julien
     [not found]   ` <20190515161917.GC14986@joraj-alpa>
2019-05-16 16:00     ` Mathieu Desnoyers
     [not found] ` <20190503135547.12968-3-mathieu.desnoyers@efficios.com>
2019-05-15 16:31   ` [PATCH lttng-tools 2/9] Fix: tests: error handling in high throughput limits test Jonathan Rajotte-Julien
     [not found]   ` <20190515163127.GD14986@joraj-alpa>
2019-05-15 22:36     ` Mathieu Desnoyers
     [not found] ` <20190503135547.12968-7-mathieu.desnoyers@efficios.com>
2019-05-15 16:43   ` [PATCH lttng-tools 6/9] tests: invoke full_cleanup from script trap handlers, use modprobe -r Jonathan Rajotte-Julien
     [not found] ` <20190503135547.12968-6-mathieu.desnoyers@efficios.com>
2019-05-15 16:44   ` [PATCH lttng-tools 5/9] Cleanup: test: don't stop relayd twice Jonathan Rajotte-Julien

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.