fstests.vger.kernel.org archive mirror
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: guaneryu@gmail.com, linux-xfs@vger.kernel.org, fstests@vger.kernel.org
Subject: Re: [PATCH 8/9] check: run tests in a systemd scope for mandatory test cleanup
Date: Wed, 28 Oct 2020 09:58:25 -0700
Message-ID: <20201028165825.GD1061252@magnolia>
In-Reply-To: <20201028074407.GH2750@infradead.org>

On Wed, Oct 28, 2020 at 07:44:07AM +0000, Christoph Hellwig wrote:
> On Tue, Oct 27, 2020 at 12:02:21PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > If systemd is available, run each test in its own temporary systemd
> > scope.  This enables the test harness to forcibly clean up all of the
> > test's child processes (if it does not do so itself) so that we can move
> > into the post-test unmount and check cleanly.
> 
> Can you explain what this means in more detail?  Most importantly what
> problems it fixes.

I'll answer these in reverse order. :)

I frequently run fstests in "low" memory situations (2GB!) to force the
kernel to do interesting things.  There are a few tests like generic/224
and generic/561 that put processes in the background and occasionally
trigger the OOM killer.  Most of the time the OOM killer correctly
shoots down fsstress or duperemove, but once in a while it's stupid
enough to shoot down the test control process (i.e. tests/generic/224)
instead.  At that point fsstress is still running in the background, and
the one process that knew about it is dead.
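To make that failure mode concrete, here is a minimal sketch that
reproduces the orphaning outside of fstests.  It is only an
illustration: `sleep` stands in for fsstress, SIGKILL stands in for the
OOM killer, and the variable names and temporary directory are made up
for the example.

```shell
#!/bin/sh
# Illustration only: a "control" shell starts a background worker and then
# waits for it, much like a test script that backgrounds fsstress.
workdir=$(mktemp -d)
sh -c 'sleep 30 & echo $! > "$1/worker.pid"; wait' control "$workdir" &
control_pid=$!

# Wait until the control process has recorded the worker's pid.
while [ ! -s "$workdir/worker.pid" ]; do
    sleep 0.1
done
worker_pid=$(cat "$workdir/worker.pid")

# Kill only the control process, as a misdirected OOM kill would.
kill -KILL "$control_pid"
wait "$control_pid" 2>/dev/null || true

# The worker was not signalled, so it keeps running with nobody tracking it.
if kill -0 "$worker_pid" 2>/dev/null; then
    worker_survived=yes
else
    worker_survived=no
fi
echo "worker survived: $worker_survived"

# Clean up by hand, which is exactly the step nobody performs in the real case.
kill "$worker_pid" 2>/dev/null || true
rm -rf "$workdir"
```

In the real case the surviving process is still writing to the
filesystem, which is why the later unmount fails.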

When the control process dies, ./check moves on to the post-test fsck,
which fails because fsstress is still running and we can't unmount.
After fsck fails, ./check moves on to the next test, which fails because
fsstress is /still/ writing to the filesystem and we can't unmount or
format.

The end result is that a single OOM kill causes cascading test failures,
and I have to restart fstests to see if I get a clean(er) run.  This is
frustrating in the -rc1 days, when I more frequently observe problems
with memory reclaim and OOM kills.  (Note: those problems are usually
gone by -rc3.)

So, the solution I present in this patch is to teach ./check to try to
run the test script in a systemd scope.  If that succeeds, ./check will
tell systemd to kill the scope when the test script exits and returns
control to ./check.  Concretely, this means that systemd creates a new
cgroup, stuffs the processes in that cgroup, and when we kill the scope,
systemd kills all the processes in that cgroup and deletes the cgroup.
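As a rough sketch of that flow (not the code from the patch itself),
something like the following would do it.  The function names
`have_scopes` and `run_scoped` and the unit names are hypothetical;
`systemd-run --scope`, `--unit`, `--quiet`, `--no-ask-password` and
`systemctl stop` are the only systemd interfaces it relies on.  It
falls back to running the command directly when a scope cannot be
created.

```shell
#!/bin/sh
# Sketch only: run a command in a transient systemd scope when possible,
# then stop the scope so that anything left in its cgroup is killed.

# Probe whether this machine can create scopes at all.
have_scopes() {
    command -v systemd-run >/dev/null 2>&1 &&
        systemd-run --quiet --no-ask-password --scope true >/dev/null 2>&1
}

# run_scoped UNIT CMD [ARGS...]
# Returns the exit status of CMD.
run_scoped() {
    unit="$1"
    shift
    res=0

    if have_scopes; then
        # systemd creates the scope (and its cgroup) and runs CMD inside it.
        systemd-run --quiet --no-ask-password --unit "$unit" --scope "$@" ||
            res=$?
        # Stopping the scope kills any stragglers and removes the cgroup.
        systemctl stop "$unit" >/dev/null 2>&1 || true
    else
        # No usable systemd: run directly, with no guaranteed cleanup.
        "$@" || res=$?
    fi
    return "$res"
}
```

A caller would use it roughly as `run_scoped "mytest-$$.scope"
./tests/generic/224` and then go on to the unmount and fsck, knowing
that in the systemd case nothing from the test is still running.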

The end result is that fstests now has an easy way to ensure that /all/
child processes of a test are dead before we try to unmount the test and
scratch devices.  I've designed this to be optional, because not
everyone runs, wants, or likes systemd, but it makes QA easier.

Hmm, this might make a better commit log.  I'll excerpt this into the
patch message.

--D

Thread overview: 26+ messages
2020-10-27 19:01 [PATCH 0/9] xfstests: random fixes Darrick J. Wong
2020-10-27 19:01 ` [PATCH 1/9] common: extract rt extent size for _get_file_block_size Darrick J. Wong
2020-10-28  7:41   ` Christoph Hellwig
2020-10-28 22:24     ` Darrick J. Wong
2020-10-27 19:01 ` [PATCH 2/9] xfs/520: disable external devices Darrick J. Wong
2020-10-28  7:41   ` Christoph Hellwig
2020-10-27 19:01 ` [PATCH 3/9] xfs/341: fix test when rextsize > 1 Darrick J. Wong
2020-10-28  7:41   ` Christoph Hellwig
2020-10-27 19:01 ` [PATCH 4/9] various: replace _get_block_size with _get_file_block_size when needed Darrick J. Wong
2020-10-28  7:42   ` Christoph Hellwig
2020-10-27 19:02 ` [PATCH 5/9] xfs/327: fix inode reflink flag checking Darrick J. Wong
2020-10-28  7:42   ` Christoph Hellwig
2020-10-27 19:02 ` [PATCH 6/9] xfs/27[26]: force realtime on or off as needed Darrick J. Wong
2020-10-28  7:43   ` Christoph Hellwig
2020-10-27 19:02 ` [PATCH 7/9] xfs/030: hide the btree levels check errors Darrick J. Wong
2020-10-28  7:43   ` Christoph Hellwig
2020-10-27 19:02 ` [PATCH 8/9] check: run tests in a systemd scope for mandatory test cleanup Darrick J. Wong
2020-10-28  7:44   ` Christoph Hellwig
2020-10-28 16:58     ` Darrick J. Wong [this message]
2020-10-29  1:04     ` Darrick J. Wong
2020-11-02 21:37   ` Darrick J. Wong
2020-10-27 19:02 ` [PATCH 9/9] common/populate: make sure _scratch_xfs_populate puts its files on the data device Darrick J. Wong
2020-10-28  7:44   ` Christoph Hellwig
2020-10-28 16:27     ` Darrick J. Wong
2020-11-08 10:05 ` [PATCH 0/9] xfstests: random fixes Eryu Guan
2020-11-08 17:23   ` Darrick J. Wong
