From: "Darrick J. Wong" <djwong@kernel.org>
To: djwong@kernel.org, guaneryu@gmail.com
Cc: linux-xfs@vger.kernel.org, fstests@vger.kernel.org, guan@eryu.me
Subject: [PATCH 5/8] check: run _check_filesystems in an OOM-happy subshell
Date: Tue, 06 Jul 2021 17:21:34 -0700
Message-ID: <162561729448.543423.13588309966120368094.stgit@locust>
In-Reply-To: <162561726690.543423.15033740972304281407.stgit@locust>

From: Darrick J. Wong <djwong@kernel.org>

While running fstests one night, I observed that fstests stopped
abruptly because ./check ran _check_filesystems to run xfs_repair.  In
turn, repair (which inherited oom_score_adj=-1000 from ./check) consumed
so much memory that the OOM killer ran around killing other daemons,
rendering the system nonfunctional.

This is silly -- we set an OOM score adjustment of -1000 on the ./check
process so that the test framework itself wouldn't get OOM-killed,
because that aborts the entire run.  Everything else should be fair
game for the OOM killer, including subprocesses started by
_check_filesystems.

Therefore, run _check_filesystems (and its children) in a subshell
with a much higher OOM score adjustment (250).

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
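A minimal stand-alone sketch of the idea described in the commit message,
for readers who want to try it outside fstests.  It is not the fstests code:
_read_oom_score, _fake_check and the "lost" variable are hypothetical names
made up for this illustration, and the sketch assumes bash on a Linux system
with /proc mounted (elsewhere the adjustment is silently skipped).  It shows
that an adjustment made inside ( ... ) applies only to the subshell and its
children, and that variables assigned inside the subshell do not reach the
parent, which is presumably why _check_filesystems now reports failure via
its return status rather than by setting err directly.

```shell
#!/bin/bash
OOM_SCORE_ADJ="/proc/self/oom_score_adj"

# Same shape as the helper added by the patch.
_adjust_oom_score() {
	test -w "${OOM_SCORE_ADJ}" && echo "$1" > "${OOM_SCORE_ADJ}"
}

# Hypothetical helper: cat is forked from the calling shell and inherits
# that shell's adjustment, so it reports the caller's current value.
_read_oom_score() {
	cat "${OOM_SCORE_ADJ}" 2>/dev/null || echo "unknown"
}

# Hypothetical stand-in for _check_filesystems: always reports failure.
_fake_check() {
	echo "checker runs with oom_score_adj=$(_read_oom_score)"
	lost=true	# assigned inside the subshell, invisible to the parent
	return 1
}

before=$(_read_oom_score)
lost=false
err=false

# The subshell raises its own score; its exit status is that of the last
# command, so the failure still propagates to the parent through ||.
(_adjust_oom_score 250; _fake_check) || err=true

after=$(_read_oom_score)
echo "parent oom_score_adj before=${before} after=${after}"
echo "err=${err} lost=${lost}"
```

The parent's score should read the same before and after, err ends up true
because the subshell exited nonzero, and lost stays false.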
 check |   24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)


diff --git a/check b/check
index de8104d0..bb7e030c 100755
--- a/check
+++ b/check
@@ -525,17 +525,20 @@ _summary()
 
 _check_filesystems()
 {
+	local ret=0
+
 	if [ -f ${RESULT_DIR}/require_test ]; then
-		_check_test_fs || err=true
+		_check_test_fs || ret=1
 		rm -f ${RESULT_DIR}/require_test*
 	else
 		_test_unmount 2> /dev/null
 	fi
 	if [ -f ${RESULT_DIR}/require_scratch ]; then
-		_check_scratch_fs || err=true
+		_check_scratch_fs || ret=1
 		rm -f ${RESULT_DIR}/require_scratch*
 	fi
 	_scratch_unmount 2> /dev/null
+	return $ret
 }
 
 _expunge_test()
@@ -558,11 +561,15 @@ test $? -eq 77 && HAVE_SYSTEMD_SCOPES=yes
 
 # Make the check script unattractive to the OOM killer...
 OOM_SCORE_ADJ="/proc/self/oom_score_adj"
-test -w ${OOM_SCORE_ADJ} && echo -1000 > ${OOM_SCORE_ADJ}
+function _adjust_oom_score() {
+	test -w "${OOM_SCORE_ADJ}" && echo "$1" > "${OOM_SCORE_ADJ}"
+}
+_adjust_oom_score -1000
 
 # ...and make the tests themselves somewhat more attractive to it, so that if
 # the system runs out of memory it'll be the test that gets killed and not the
-# test framework.
+# test framework.  The test is run in a separate process without any of our
+# functions, so we open-code adjusting the OOM score.
 #
 # If systemd is available, run the entire test script in a scope so that we can
 # kill all subprocesses of the test if it fails to clean up after itself.  This
@@ -875,9 +882,12 @@ function run_section()
 			rm -f ${RESULT_DIR}/require_scratch*
 			err=true
 		else
-			# the test apparently passed, so check for corruption
-			# and log messages that shouldn't be there.
-			_check_filesystems
+			# The test apparently passed, so check for corruption
+			# and log messages that shouldn't be there.  Run the
+			# checking tools from a subshell with adjusted OOM
+			# score so that the OOM killer will target them instead
+			# of the check script itself.
+			(_adjust_oom_score 250; _check_filesystems) || err=true
 			_check_dmesg || err=true
 		fi
 

