From: "Darrick J. Wong" <djwong@kernel.org>
To: Brian Foster <bfoster@redhat.com>
Cc: sandeen@sandeen.net, linux-xfs@vger.kernel.org
Subject: Re: [PATCH 2/3] libxfs: simulate system failure after a certain number of writes
Date: Wed, 17 Feb 2021 20:36:20 -0800
Message-ID: <20210218043620.GQ7193@magnolia>
In-Reply-To: <20210216115645.GC534175@bfoster>

On Tue, Feb 16, 2021 at 06:56:45AM -0500, Brian Foster wrote:
> On Fri, Feb 12, 2021 at 09:46:56PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> >
> > Add an error injection knob so that we can simulate system failure after
> > a certain number of disk writes. This knob is being added so that we
> > can check repair's behavior after an arbitrary number of tests.
> >
> > Set LIBXFS_DEBUG_WRITE_CRASH={ddev,logdev,rtdev}=nn in the environment
> > to make libxfs SIGKILL itself after nn writes to the data, log, or rt
> > devices. Note that this only applies to xfs_buf writes and zero_range.
> >
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> > libxfs/init.c | 68 +++++++++++++++++++++++++++++++++++++++++++++++++---
> > libxfs/libxfs_io.h | 19 +++++++++++++++
> > libxfs/rdwr.c | 6 ++++-
> > 3 files changed, 88 insertions(+), 5 deletions(-)
> >
> >
> > diff --git a/libxfs/init.c b/libxfs/init.c
> > index 8a8ce3c4..1ec83791 100644
> > --- a/libxfs/init.c
> > +++ b/libxfs/init.c
> ...
> > @@ -614,6 +634,46 @@ libxfs_buftarg_init(
> >  	dev_t logdev,
> >  	dev_t rtdev)
> >  {
> > +	char *p = getenv("LIBXFS_DEBUG_WRITE_CRASH");
> > +	unsigned long dfail = 0, lfail = 0, rfail = 0;
>
> Was there a reason for using an environment variable now rather than the
> original command line option?

Well, you said you wanted a generic write error injection hook for
libxfs, and this is the simplest way to add that, given that libraries
don't have a direct means to parse argc and argv.

I mean... this /could/ take the form of an exposed library function that
xfs utilities could call from their own getopt loops, but that's even
/more/ infrastructure code that I'd have to write.

OTOH there's already precedent for magic environment variables to enable
libxfs debug hooks.
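
For illustration only, here is a minimal standalone sketch of the kind of
getsubopt() parsing the environment variable needs. It is not the libxfs
code: wf_parse() and struct wf_counts are hypothetical names, and unlike
the patch it returns an error instead of calling exit(1).

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <string.h>

/* Token indices; the names mirror the patch, the rest is illustrative. */
enum { WF_DATA = 0, WF_LOG, WF_RT };

static char *const wf_opts[] = {
	[WF_DATA]	= "ddev",
	[WF_LOG]	= "logdev",
	[WF_RT]		= "rtdev",
	NULL,
};

/* Hypothetical container for the parsed write counts. */
struct wf_counts {
	unsigned long	dfail, lfail, rfail;
};

/*
 * Parse a string such as "ddev=5,logdev=0x10".  Returns 0 on success
 * and -1 for an unknown token or a token without a value.
 */
static int
wf_parse(const char *env, struct wf_counts *wc)
{
	char	*copy, *p, *val;
	int	ret = 0;

	memset(wc, 0, sizeof(*wc));
	if (!env)
		return 0;

	/* getsubopt() modifies its input, so work on a private copy. */
	copy = strdup(env);
	if (!copy)
		return -1;

	p = copy;
	while (*p && ret == 0) {
		int idx = getsubopt(&p, wf_opts, &val);

		if (idx < 0 || !val) {
			ret = -1;
			break;
		}
		switch (idx) {
		case WF_DATA:
			wc->dfail = strtoul(val, NULL, 0);
			break;
		case WF_LOG:
			wc->lfail = strtoul(val, NULL, 0);
			break;
		case WF_RT:
			wc->rfail = strtoul(val, NULL, 0);
			break;
		}
	}
	free(copy);
	return ret;
}
```

A caller would pass it getenv("LIBXFS_DEBUG_WRITE_CRASH") and treat a
nonzero count as "arm the injection for that device".
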
> > +
> > +	/* Simulate utility crash after a certain number of writes. */
> > +	while (p && *p) {
> > +		char *val;
> > +
> > +		switch (getsubopt(&p, wf_opts, &val)) {
> > +		case WF_DATA:
> > +			if (!val) {
> > +				fprintf(stderr,
> > +	_("ddev write fail requires a parameter\n"));
> > +				exit(1);
> > +			}
> > +			dfail = strtoul(val, NULL, 0);
> > +			break;
> > +		case WF_LOG:
> > +			if (!val) {
> > +				fprintf(stderr,
> > +	_("logdev write fail requires a parameter\n"));
> > +				exit(1);
> > +			}
> > +			lfail = strtoul(val, NULL, 0);
> > +			break;
> > +		case WF_RT:
> > +			if (!val) {
> > +				fprintf(stderr,
> > +	_("rtdev write fail requires a parameter\n"));
> > +				exit(1);
> > +			}
> > +			rfail = strtoul(val, NULL, 0);
> > +			break;
> > +		default:
> > +			fprintf(stderr, _("unknown write fail type %s\n"),
> > +					val);
> > +			exit(1);
> > +			break;
> > +		}
> > +	}
> > +
> >  	if (mp->m_ddev_targp) {
> >  		/* should already have all buftargs initialised */
> >  		if (mp->m_ddev_targp->bt_bdev != dev ||
> ...
> > diff --git a/libxfs/libxfs_io.h b/libxfs/libxfs_io.h
> > index c80e2d59..85485257 100644
> > --- a/libxfs/libxfs_io.h
> > +++ b/libxfs/libxfs_io.h
> ...
> > @@ -30,6 +32,23 @@ struct xfs_buftarg {
> > #define XFS_BUFTARG_LOST_WRITE (1 << 0)
> > /* A dirty buffer failed the write verifier. */
> > #define XFS_BUFTARG_CORRUPT_WRITE (1 << 1)
> > +/* Simulate failure after a certain number of writes. */
> > +#define XFS_BUFTARG_INJECT_WRITE_FAIL (1 << 2)
> > +
> > +/* Simulate the system crashing after a write. */
> > +static inline void
> > +xfs_buftarg_trip_write(
> > +	struct xfs_buftarg *btp)
> > +{
> > +	if (!(btp->flags & XFS_BUFTARG_INJECT_WRITE_FAIL))
> > +		return;
> > +
> > +	pthread_mutex_lock(&btp->lock);
> > +	btp->writes_left--;
> > +	if (!btp->writes_left)
> > +		kill(getpid(), SIGKILL);
>
> Can we just exit()?
>
> (Same questions for the next patch..)

The goal of this generic write error injection framework is to simulate
total system crashes immediately after a write.

SIGKILL and exit() are not the same: exit() runs atexit handlers and
flushes stdio buffers on the way out, whereas SIGKILL cannot be caught,
so the process dies without running any of that cleanup.
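
To make the difference concrete, here is a small standalone sketch (not
part of the patch; handler_ran() and leave_marker() are hypothetical
names). A forked child registers an atexit handler that writes one byte
to a pipe, then terminates either through exit() or by sending itself
SIGKILL. The parent reports whether the handler got to run.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int marker_fd = -1;

/* atexit handler: leave a one-byte marker so the parent can see it ran. */
static void
leave_marker(void)
{
	if (write(marker_fd, "x", 1) != 1) {
		/* nothing useful to do on failure in this sketch */
	}
}

/*
 * Returns 1 if the child's atexit handler ran, 0 if it did not, and
 * -1 if the pipe or fork could not be set up.
 */
static int
handler_ran(int use_sigkill)
{
	int	pfd[2];
	char	c;
	pid_t	pid;
	ssize_t	n;

	if (pipe(pfd) < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(pfd[0]);
		close(pfd[1]);
		return -1;
	}
	if (pid == 0) {
		close(pfd[0]);
		marker_fd = pfd[1];
		atexit(leave_marker);
		if (use_sigkill)
			kill(getpid(), SIGKILL);	/* does not return */
		exit(0);
	}

	close(pfd[1]);
	n = read(pfd[0], &c, 1);	/* 0 (EOF) if the child wrote nothing */
	close(pfd[0]);
	waitpid(pid, NULL, 0);
	return n == 1;
}
```

handler_ran(0) takes the exit() path and handler_ran(1) takes the
SIGKILL path; only the former lets the handler leave its marker.
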
--D
>
> Brian
>
> > +	pthread_mutex_unlock(&btp->lock);
> > +}
> >  
> >  extern void	libxfs_buftarg_init(struct xfs_mount *mp, dev_t ddev,
> >  				    dev_t logdev, dev_t rtdev);
> > diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
> > index ca272387..fd456d6b 100644
> > --- a/libxfs/rdwr.c
> > +++ b/libxfs/rdwr.c
> > @@ -74,8 +74,10 @@ libxfs_device_zero(struct xfs_buftarg *btp, xfs_daddr_t start, uint len)
> >  	/* try to use special zeroing methods, fall back to writes if needed */
> >  	len_bytes = LIBXFS_BBTOOFF64(len);
> >  	error = platform_zero_range(fd, start_offset, len_bytes);
> > -	if (!error)
> > +	if (!error) {
> > +		xfs_buftarg_trip_write(btp);
> >  		return 0;
> > +	}
> >  
> >  	zsize = min(BDSTRAT_SIZE, BBTOB(len));
> >  	if ((z = memalign(libxfs_device_alignment(), zsize)) == NULL) {
> > @@ -105,6 +107,7 @@ libxfs_device_zero(struct xfs_buftarg *btp, xfs_daddr_t start, uint len)
> >  				progname, __FUNCTION__);
> >  			exit(1);
> >  		}
> > +		xfs_buftarg_trip_write(btp);
> >  		offset += bytes;
> >  	}
> >  	free(z);
> > @@ -860,6 +863,7 @@ libxfs_bwrite(
> >  	} else {
> >  		bp->b_flags |= LIBXFS_B_UPTODATE;
> >  		bp->b_flags &= ~(LIBXFS_B_DIRTY | LIBXFS_B_UNCHECKED);
> > +		xfs_buftarg_trip_write(bp->b_target);
> >  	}
> >  	return bp->b_error;
> >  }
> >
>
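
As a rough illustration of the countdown behaviour described in the
patch, the sketch below mimics the decrement-then-check logic outside of
libxfs. Everything here is hypothetical (struct fake_target,
fake_trip_write(), writes_until_kill()); a byte written to a pipe stands
in for a device write, and the sketch assumes nn is at least 1.

```c
#define _XOPEN_SOURCE 700
#include <pthread.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the buftarg fields the patch touches. */
struct fake_target {
	pthread_mutex_t	lock;
	unsigned long	writes_left;
	int		inject;
};

/* Same shape as the patch's helper: count down, SIGKILL at zero. */
static void
fake_trip_write(struct fake_target *t)
{
	if (!t->inject)
		return;

	pthread_mutex_lock(&t->lock);
	t->writes_left--;
	if (!t->writes_left)
		kill(getpid(), SIGKILL);
	pthread_mutex_unlock(&t->lock);
}

/*
 * Fork a child that attempts 'attempts' fake writes with the countdown
 * armed at nn.  Returns how many writes completed before the child was
 * killed by SIGKILL, or -1 if setup failed or the child was not killed.
 */
static int
writes_until_kill(unsigned long nn, int attempts)
{
	struct fake_target t = {
		.lock = PTHREAD_MUTEX_INITIALIZER,
		.writes_left = nn,
		.inject = 1,
	};
	int	pfd[2], status, count = 0;
	char	c;
	pid_t	pid;

	if (pipe(pfd) < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(pfd[0]);
		close(pfd[1]);
		return -1;
	}
	if (pid == 0) {
		close(pfd[0]);
		for (int i = 0; i < attempts; i++) {
			if (write(pfd[1], "w", 1) != 1)
				_exit(2);
			fake_trip_write(&t);	/* may not return */
		}
		_exit(0);
	}

	close(pfd[1]);
	while (read(pfd[0], &c, 1) == 1)
		count++;
	close(pfd[0]);

	if (waitpid(pid, &status, 0) < 0)
		return -1;
	if (!WIFSIGNALED(status) || WTERMSIG(status) != SIGKILL)
		return -1;
	return count;
}
```

With nn=3 and ten attempted writes, the child completes exactly three
writes before it is killed; if nn exceeds the number of attempts the
child exits normally and the function reports -1.
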