linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: NeilBrown <neil@brown.name>
To: Jeff Layton <jlayton@redhat.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-btrfs@vger.kernel.org, linux-ext4@vger.kernel.org,
	linux-cifs@vger.kernel.org, linux-nfs@vger.kernel.org,
	linux-mm@kvack.org, jfs-discussion@lists.sourceforge.net,
	linux-xfs@vger.kernel.org, cluster-devel@redhat.com,
	linux-f2fs-devel@lists.sourceforge.net,
	v9fs-developer@lists.sourceforge.net,
	linux-nilfs@vger.kernel.org, linux-block@vger.kernel.org
Cc: dhowells@redhat.com, akpm@linux-foundation.org,
	hch@infradead.org, ross.zwisler@linux.intel.com,
	mawilcox@microsoft.com, jack@suse.com, viro@zeniv.linux.org.uk,
	corbet@lwn.net, neilb@suse.de, clm@fb.com, tytso@mit.edu,
	axboe@kernel.dk, josef@toxicpanda.com, hubcap@omnibond.com,
	rpeterso@redhat.com, bo.li.liu@oracle.com
Subject: Re: [PATCH v4 13/27] lib: add errseq_t type and infrastructure for handling it
Date: Wed, 10 May 2017 08:03:13 +1000	[thread overview]
Message-ID: <87inl9n0wu.fsf@notabene.neil.brown.name> (raw)
In-Reply-To: <20170509154930.29524-14-jlayton@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 10479 bytes --]

On Tue, May 09 2017, Jeff Layton wrote:

> An errseq_t is a way of recording errors in one place, and allowing any
> number of "subscribers" to tell whether an error has been set again
> since a previous time.
>
> It's implemented as an unsigned 32-bit value that is managed with atomic
> operations. The low order bits are designated to hold an error code
> (max size of MAX_ERRNO). The upper bits are used as a counter.
>
> The API works with consumers sampling an errseq_t value at a particular
> point in time. Later, that value can be used to tell whether new errors
> have been set since that time.
>
> Note that there is a 1 in 512k risk of collisions here if new errors
> are being recorded frequently, since we have so few bits to use as a
> counter. To mitigate this, one bit is used as a flag to tell whether the
> value has been sampled since a new value was recorded. That allows
> us to avoid bumping the counter if no one has sampled it since it
> was last bumped.
>
> Later patches will build on this infrastructure to change how writeback
> errors are tracked in the kernel.
>
> Signed-off-by: Jeff Layton <jlayton@redhat.com>

I like that this is a separate lib/*.c - nicely structured too.

Reviewed-by: NeilBrown <neilb@suse.com>

Thanks,
NeilBrown


> ---
>  include/linux/errseq.h |  19 +++++
>  lib/Makefile           |   2 +-
>  lib/errseq.c           | 199 +++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 219 insertions(+), 1 deletion(-)
>  create mode 100644 include/linux/errseq.h
>  create mode 100644 lib/errseq.c
>
> diff --git a/include/linux/errseq.h b/include/linux/errseq.h
> new file mode 100644
> index 000000000000..0d2555f310cd
> --- /dev/null
> +++ b/include/linux/errseq.h
> @@ -0,0 +1,19 @@
> +#ifndef _LINUX_ERRSEQ_H
> +#define _LINUX_ERRSEQ_H
> +
> +/* See lib/errseq.c for more info */
> +
> +typedef u32	errseq_t;
> +
> +void __errseq_set(errseq_t *eseq, int err);
> +static inline void errseq_set(errseq_t *eseq, int err)
> +{
> +	/* Optimize for the common case of no error */
> +	if (unlikely(err))
> +		__errseq_set(eseq, err);
> +}
> +
> +errseq_t errseq_sample(errseq_t *eseq);
> +int errseq_check(errseq_t *eseq, errseq_t since);
> +int errseq_check_and_advance(errseq_t *eseq, errseq_t *since);
> +#endif
> diff --git a/lib/Makefile b/lib/Makefile
> index 320ac46a8725..2423afef40f7 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -41,7 +41,7 @@ obj-y += bcd.o div64.o sort.o parser.o debug_locks.o random32.o \
>  	 gcd.o lcm.o list_sort.o uuid.o flex_array.o iov_iter.o clz_ctz.o \
>  	 bsearch.o find_bit.o llist.o memweight.o kfifo.o \
>  	 percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o \
> -	 once.o refcount.o
> +	 once.o refcount.o errseq.o
>  obj-y += string_helpers.o
>  obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
>  obj-y += hexdump.o
> diff --git a/lib/errseq.c b/lib/errseq.c
> new file mode 100644
> index 000000000000..0f8b4ed460f0
> --- /dev/null
> +++ b/lib/errseq.c
> @@ -0,0 +1,199 @@
> +#include <linux/err.h>
> +#include <linux/bug.h>
> +#include <linux/atomic.h>
> +#include <linux/errseq.h>
> +
> +/*
> + * An errseq_t is a way of recording errors in one place, and allowing any
> + * number of "subscribers" to tell whether it has changed since an arbitrary
> + * time of their choosing.
> + *
> + * It's implemented as an unsigned 32-bit value. The low order bits are
> + * designated to hold an error code (between 0 and -MAX_ERRNO). The upper bits
> + * are used as a counter. This is done with atomics instead of locking so that
> + * these functions can be called from any context.
> + *
> + * The general idea is for consumers to sample an errseq_t value at a
> + * particular point in time. Later, that value can be used to tell whether any
> + * new errors have occurred since that time.
> + *
> + * Note that there is a risk of collisions, if new errors are being recorded
> + * frequently, since we have so few bits to use as a counter.
> + *
> + * To mitigate this, one bit is used as a flag to tell whether the value has
> + * been sampled since a new value was recorded. That allows us to avoid bumping
> + * the counter if no one has sampled it since the last time an error was
> + * recorded.
> + *
> + * A new errseq_t should always be zeroed out.  A errseq_t value of all zeroes
> + * is the special (but common) case where there has never been an error. An all
> + * zero value thus serves as the "epoch" if one wishes to know whether there
> + * has ever been an error set since it was first initialized.
> + */
> +
> +/* The low bits are designated for error code (max of MAX_ERRNO) */
> +#define ERRSEQ_SHIFT		ilog2(MAX_ERRNO + 1)
> +
> +/* This bit is used as a flag to indicate whether the value has been seen */
> +#define ERRSEQ_SEEN		(1 << ERRSEQ_SHIFT)
> +
> +/* The "ones" bit for the counter */
> +#define ERRSEQ_CTR_INC		(1 << (ERRSEQ_SHIFT + 1))
> +
> +/**
> + * __errseq_set - set a errseq_t for later reporting
> + * @eseq: errseq_t field that should be set
> + * @err: error to set
> + *
> + * This function sets the error in *eseq, and increments the sequence counter
> + * if the last sequence was sampled at some point in the past.
> + *
> + * Any error set will always overwrite an existing error.
> + *
> + * Most callers will want to use the errseq_set inline wrapper to efficiently
> + * handle the common case where err is 0.
> + */
> +void __errseq_set(errseq_t *eseq, int err)
> +{
> +	errseq_t old;
> +
> +	/* MAX_ERRNO must be able to serve as a mask */
> +	BUILD_BUG_ON_NOT_POWER_OF_2(MAX_ERRNO + 1);
> +
> +	/*
> +	 * Ensure the error code actually fits where we want it to go. If it
> +	 * doesn't then just throw a warning and don't record anything. We
> +	 * also don't accept zero here as that would effectively clear a
> +	 * previous error.
> +	 */
> +	if (WARN(unlikely(err == 0 || (unsigned int)-err > MAX_ERRNO),
> +				"err = %d\n", err))
> +		return;
> +
> +	old = READ_ONCE(*eseq);
> +	for (;;) {
> +		errseq_t new, cur;
> +
> +		/* Clear out error bits and set new error */
> +		new = (old & ~(MAX_ERRNO|ERRSEQ_SEEN)) | -err;
> +
> +		/* Only increment if someone has looked at it */
> +		if (old & ERRSEQ_SEEN)
> +			new += ERRSEQ_CTR_INC;
> +
> +		/* If there would be no change, then call it done */
> +		if (new == old)
> +			break;
> +
> +		/* Try to swap the new value into place */
> +		cur = cmpxchg(eseq, old, new);
> +
> +		/*
> +		 * Call it success if we did the swap or someone else beat us
> +		 * to it for the same value.
> +		 */
> +		if (likely(cur == old || cur == new))
> +			break;
> +
> +		/* Raced with an update, try again */
> +		old = cur;
> +	}
> +}
> +EXPORT_SYMBOL(__errseq_set);
> +
> +/**
> + * errseq_sample - grab current errseq_t value
> + * @eseq: pointer to errseq_t to be sampled
> + *
> + * This function allows callers to sample an errseq_t value, marking it as
> + * "seen" if required.
> + */
> +errseq_t errseq_sample(errseq_t *eseq)
> +{
> +	errseq_t old = READ_ONCE(*eseq);
> +	errseq_t new = old;
> +
> +	/*
> +	 * For the common case of no errors ever having been set, we can skip
> +	 * marking the SEEN bit. Once an error has been set, the value will
> +	 * never go back to zero.
> +	 */
> +	if (old != 0) {
> +		new |= ERRSEQ_SEEN;
> +		if (old != new)
> +			cmpxchg(eseq, old, new);
> +	}
> +	return new;
> +}
> +EXPORT_SYMBOL(errseq_sample);
> +
> +/**
> + * errseq_check - has an error occurred since a particular point in time?
> + * @eseq: pointer to errseq_t value to be checked
> + * @since: previously-sampled errseq_t from which to check
> + *
> + * Grab the value that eseq points to, and see if it has changed "since"
> + * the given value was sampled. The "since" value is not advanced, so there
> + * is no need to mark the value as seen.
> + *
> + * Returns the latest error set in the errseq_t or 0 if it hasn't changed.
> + */
> +int errseq_check(errseq_t *eseq, errseq_t since)
> +{
> +	errseq_t cur = READ_ONCE(*eseq);
> +
> +	if (likely(cur == since))
> +		return 0;
> +	return -(cur & MAX_ERRNO);
> +}
> +EXPORT_SYMBOL(errseq_check);
> +
> +/**
> + * errseq_check_and_advance - check an errseq_t and advance it to the current value
> + * @eseq: pointer to value being checked reported
> + * @since: pointer to previously-sampled errseq_t to check against and advance
> + *
> + * Grab the eseq value, and see whether it matches the value that "since"
> + * points to. If it does, then just return 0.
> + *
> + * If it doesn't, then the value has changed. Set the "seen" flag, and try to
> + * swap it into place as the new eseq value. Then, set that value as the new
> + * "since" value, and return whatever the error portion is set to.
> + *
> + * Note that no locking is provided here for concurrent updates to the "since"
> + * value. The caller must provide that if necessary. Because of this, callers
> + * may want to do a lockless errseq_check before taking the lock and calling
> + * this.
> + */
> +int errseq_check_and_advance(errseq_t *eseq, errseq_t *since)
> +{
> +	int err = 0;
> +	errseq_t old, new;
> +
> +	/*
> +	 * Most callers will want to use the inline wrapper to check this,
> +	 * so that the common case of no error is handled without needing
> +	 * to lock.
> +	 */
> +	old = READ_ONCE(*eseq);
> +	if (old != *since) {
> +		/*
> +		 * Set the flag and try to swap it into place if it has
> +		 * changed.
> +		 *
> +		 * We don't care about the outcome of the swap here. If the
> +		 * swap doesn't occur, then it has either been updated by a
> +		 * writer who is bumping the seq count anyway, or another
> +		 * reader who is just setting the "seen" flag. Either outcome
> +		 * is OK, and we can advance "since" and return an error based
> +		 * on what we have.
> +		 */
> +		new = old | ERRSEQ_SEEN;
> +		if (new != old)
> +			cmpxchg(eseq, old, new);
> +		*since = new;
> +		err = -(new & MAX_ERRNO);
> +	}
> +	return err;
> +}
> +EXPORT_SYMBOL(errseq_check_and_advance);
> -- 
> 2.9.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

  reply	other threads:[~2017-05-09 22:03 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-09 15:49 [PATCH v4 00/27] fs: introduce new writeback error reporting and convert existing API as a wrapper around it Jeff Layton
2017-05-09 15:49 ` [PATCH v4 01/27] fs: remove unneeded forward definition of mm_struct from fs.h Jeff Layton
2017-05-10 11:04   ` Jan Kara
2017-05-09 15:49 ` [PATCH v4 02/27] mm: drop "wait" parameter from write_one_page Jeff Layton
2017-05-09 15:49 ` [PATCH v4 03/27] mm: fix mapping_set_error call in me_pagecache_dirty Jeff Layton
2017-05-09 15:49 ` [PATCH v4 04/27] buffer: use mapping_set_error instead of setting the flag Jeff Layton
2017-05-09 15:49 ` [PATCH v4 05/27] btrfs: btrfs_wait_tree_block_writeback can be void return Jeff Layton
2017-05-10 11:09   ` Jan Kara
2017-05-19  4:07   ` Liu Bo
2017-05-09 15:49 ` [PATCH v4 06/27] fs: check for writeback errors after syncing out buffers in generic_file_fsync Jeff Layton
2017-05-10 12:48   ` Matthew Wilcox
2017-05-09 15:49 ` [PATCH v4 07/27] orangefs: don't call filemap_write_and_wait from fsync Jeff Layton
2017-05-09 15:49 ` [PATCH v4 08/27] dax: set errors in mapping when writeback fails Jeff Layton
2017-05-09 15:49 ` [PATCH v4 09/27] nilfs2: set the mapping error when calling SetPageError on writeback Jeff Layton
2017-05-09 15:49 ` [PATCH v4 10/27] 9p: set mapping error when writeback fails in launder_page Jeff Layton
2017-05-09 15:49 ` [PATCH v4 11/27] fuse: set mapping error in writepage_locked when it fails Jeff Layton
2017-05-10 11:13   ` Jan Kara
2017-05-09 15:49 ` [PATCH v4 12/27] cifs: set mapping error when page writeback fails in writepage or launder_pages Jeff Layton
2017-05-10 11:14   ` Jan Kara
2017-05-09 15:49 ` [PATCH v4 13/27] lib: add errseq_t type and infrastructure for handling it Jeff Layton
2017-05-09 22:03   ` NeilBrown [this message]
2017-05-10 11:29     ` Jeff Layton
2017-05-10 11:34   ` Jan Kara
2017-05-10 11:58     ` Jeff Layton
2017-05-10 14:18   ` Matthew Wilcox
2017-05-10 14:56     ` Jeff Layton
2017-05-09 15:49 ` [PATCH v4 14/27] fs: new infrastructure for writeback error handling and reporting Jeff Layton
2017-05-10 11:48   ` Jan Kara
2017-05-10 12:19     ` Jeff Layton
2017-05-10 13:46       ` Jan Kara
2017-05-09 15:49 ` [PATCH v4 15/27] fs: retrofit old error reporting API onto new infrastructure Jeff Layton
2017-05-15 10:42   ` Jan Kara
2017-05-15 17:58     ` Jeff Layton
2017-05-19 19:20     ` Jeff Layton
2017-05-22 13:38       ` Jan Kara
2017-05-22 13:53         ` Jeff Layton
2017-05-22 17:53           ` Jan Kara
2017-05-22 19:09             ` Jeff Layton
2017-05-23  9:05               ` Jan Kara
2017-05-09 15:49 ` [PATCH v4 16/27] fs: adapt sync_file_range to new reporting infrastructure Jeff Layton
2017-05-09 15:49 ` [PATCH v4 17/27] mm: remove AS_EIO and AS_ENOSPC flags Jeff Layton
2017-05-09 15:49 ` [PATCH v4 18/27] mm: don't TestClearPageError in __filemap_fdatawait_range Jeff Layton
2017-05-09 15:49 ` [PATCH v4 19/27] buffer: set errors in mapping at the time that the error occurs Jeff Layton
2017-05-15 11:53   ` Jan Kara
2017-05-09 15:49 ` [PATCH v4 20/27] cifs: cleanup writeback handling errors and comments Jeff Layton
2017-05-09 15:49 ` [PATCH v4 21/27] mm: clean up error handling in write_one_page Jeff Layton
2017-05-15 12:01   ` Jan Kara
2017-05-09 15:49 ` [PATCH v4 22/27] jbd2: don't reset error in journal_finish_inode_data_buffers Jeff Layton
2017-05-15 11:58   ` Jan Kara
2017-05-09 15:49 ` [PATCH v4 23/27] gfs2: clean up some filemap_* calls Jeff Layton
2017-05-10 16:18   ` Bob Peterson
2017-05-09 15:49 ` [PATCH v4 24/27][RFC] nfs: convert to new errseq_t based error tracking for writeback errors Jeff Layton
2017-05-09 15:49 ` [PATCH v4 25/27] Documentation: flesh out the section in vfs.txt on storing and reporting " Jeff Layton
2017-05-09 16:24   ` Jeff Layton
2017-05-09 15:49 ` [PATCH v4 26/27] mm: flesh out comments over mapping_set_error Jeff Layton
2017-05-09 15:49 ` [PATCH v4 27/27] mm: clean up comments in me_pagecache_dirty Jeff Layton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87inl9n0wu.fsf@notabene.neil.brown.name \
    --to=neil@brown.name \
    --cc=akpm@linux-foundation.org \
    --cc=axboe@kernel.dk \
    --cc=bo.li.liu@oracle.com \
    --cc=clm@fb.com \
    --cc=cluster-devel@redhat.com \
    --cc=corbet@lwn.net \
    --cc=dhowells@redhat.com \
    --cc=hch@infradead.org \
    --cc=hubcap@omnibond.com \
    --cc=jack@suse.com \
    --cc=jfs-discussion@lists.sourceforge.net \
    --cc=jlayton@redhat.com \
    --cc=josef@toxicpanda.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-cifs@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-f2fs-devel@lists.sourceforge.net \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=linux-nilfs@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=mawilcox@microsoft.com \
    --cc=neilb@suse.de \
    --cc=ross.zwisler@linux.intel.com \
    --cc=rpeterso@redhat.com \
    --cc=tytso@mit.edu \
    --cc=v9fs-developer@lists.sourceforge.net \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).