From: NeilBrown <neil@brown.name>
To: Jeff Layton <jlayton@redhat.com>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-btrfs@vger.kernel.org, linux-ext4@vger.kernel.org,
linux-cifs@vger.kernel.org, linux-nfs@vger.kernel.org,
linux-mm@kvack.org, jfs-discussion@lists.sourceforge.net,
linux-xfs@vger.kernel.org, cluster-devel@redhat.com,
linux-f2fs-devel@lists.sourceforge.net,
v9fs-developer@lists.sourceforge.net,
linux-nilfs@vger.kernel.org, linux-block@vger.kernel.org
Cc: dhowells@redhat.com, akpm@linux-foundation.org,
hch@infradead.org, ross.zwisler@linux.intel.com,
mawilcox@microsoft.com, jack@suse.com, viro@zeniv.linux.org.uk,
corbet@lwn.net, neilb@suse.de, clm@fb.com, tytso@mit.edu,
axboe@kernel.dk, josef@toxicpanda.com, hubcap@omnibond.com,
rpeterso@redhat.com, bo.li.liu@oracle.com
Subject: Re: [PATCH v4 13/27] lib: add errseq_t type and infrastructure for handling it
Date: Wed, 10 May 2017 08:03:13 +1000 [thread overview]
Message-ID: <87inl9n0wu.fsf@notabene.neil.brown.name> (raw)
In-Reply-To: <20170509154930.29524-14-jlayton@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 10479 bytes --]
On Tue, May 09 2017, Jeff Layton wrote:
> An errseq_t is a way of recording errors in one place, and allowing any
> number of "subscribers" to tell whether an error has been set again
> since a previous time.
>
> It's implemented as an unsigned 32-bit value that is managed with atomic
> operations. The low order bits are designated to hold an error code
> (max size of MAX_ERRNO). The upper bits are used as a counter.
>
> The API works with consumers sampling an errseq_t value at a particular
> point in time. Later, that value can be used to tell whether new errors
> have been set since that time.
>
> Note that there is a 1 in 512k risk of collisions here if new errors
> are being recorded frequently, since we have so few bits to use as a
> counter. To mitigate this, one bit is used as a flag to tell whether the
> value has been sampled since a new value was recorded. That allows
> us to avoid bumping the counter if no one has sampled it since it
> was last bumped.
>
> Later patches will build on this infrastructure to change how writeback
> errors are tracked in the kernel.
>
> Signed-off-by: Jeff Layton <jlayton@redhat.com>
I like that this is a separate lib/*.c - nicely structured too.
Reviewed-by: NeilBrown <neilb@suse.com>
Thanks,
NeilBrown
> ---
> include/linux/errseq.h | 19 +++++
> lib/Makefile | 2 +-
> lib/errseq.c | 199 +++++++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 219 insertions(+), 1 deletion(-)
> create mode 100644 include/linux/errseq.h
> create mode 100644 lib/errseq.c
>
> diff --git a/include/linux/errseq.h b/include/linux/errseq.h
> new file mode 100644
> index 000000000000..0d2555f310cd
> --- /dev/null
> +++ b/include/linux/errseq.h
> @@ -0,0 +1,19 @@
> +#ifndef _LINUX_ERRSEQ_H
> +#define _LINUX_ERRSEQ_H
> +
> +/* See lib/errseq.c for more info */
> +
> +typedef u32 errseq_t;
> +
> +void __errseq_set(errseq_t *eseq, int err);
> +static inline void errseq_set(errseq_t *eseq, int err)
> +{
> + /* Optimize for the common case of no error */
> + if (unlikely(err))
> + __errseq_set(eseq, err);
> +}
> +
> +errseq_t errseq_sample(errseq_t *eseq);
> +int errseq_check(errseq_t *eseq, errseq_t since);
> +int errseq_check_and_advance(errseq_t *eseq, errseq_t *since);
> +#endif
> diff --git a/lib/Makefile b/lib/Makefile
> index 320ac46a8725..2423afef40f7 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -41,7 +41,7 @@ obj-y += bcd.o div64.o sort.o parser.o debug_locks.o random32.o \
> gcd.o lcm.o list_sort.o uuid.o flex_array.o iov_iter.o clz_ctz.o \
> bsearch.o find_bit.o llist.o memweight.o kfifo.o \
> percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o \
> - once.o refcount.o
> + once.o refcount.o errseq.o
> obj-y += string_helpers.o
> obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
> obj-y += hexdump.o
> diff --git a/lib/errseq.c b/lib/errseq.c
> new file mode 100644
> index 000000000000..0f8b4ed460f0
> --- /dev/null
> +++ b/lib/errseq.c
> @@ -0,0 +1,199 @@
> +#include <linux/err.h>
> +#include <linux/bug.h>
> +#include <linux/atomic.h>
> +#include <linux/errseq.h>
> +
> +/*
> + * An errseq_t is a way of recording errors in one place, and allowing any
> + * number of "subscribers" to tell whether it has changed since an arbitrary
> + * time of their choosing.
> + *
> + * It's implemented as an unsigned 32-bit value. The low order bits are
> + * designated to hold an error code (between 0 and -MAX_ERRNO). The upper bits
> + * are used as a counter. This is done with atomics instead of locking so that
> + * these functions can be called from any context.
> + *
> + * The general idea is for consumers to sample an errseq_t value at a
> + * particular point in time. Later, that value can be used to tell whether any
> + * new errors have occurred since that time.
> + *
> + * Note that there is a risk of collisions, if new errors are being recorded
> + * frequently, since we have so few bits to use as a counter.
> + *
> + * To mitigate this, one bit is used as a flag to tell whether the value has
> + * been sampled since a new value was recorded. That allows us to avoid bumping
> + * the counter if no one has sampled it since the last time an error was
> + * recorded.
> + *
> + * A new errseq_t should always be zeroed out. A errseq_t value of all zeroes
> + * is the special (but common) case where there has never been an error. An all
> + * zero value thus serves as the "epoch" if one wishes to know whether there
> + * has ever been an error set since it was first initialized.
> + */
> +
> +/* The low bits are designated for error code (max of MAX_ERRNO) */
> +#define ERRSEQ_SHIFT ilog2(MAX_ERRNO + 1)
> +
> +/* This bit is used as a flag to indicate whether the value has been seen */
> +#define ERRSEQ_SEEN (1 << ERRSEQ_SHIFT)
> +
> +/* The "ones" bit for the counter */
> +#define ERRSEQ_CTR_INC (1 << (ERRSEQ_SHIFT + 1))
> +
> +/**
> + * __errseq_set - set a errseq_t for later reporting
> + * @eseq: errseq_t field that should be set
> + * @err: error to set
> + *
> + * This function sets the error in *eseq, and increments the sequence counter
> + * if the last sequence was sampled at some point in the past.
> + *
> + * Any error set will always overwrite an existing error.
> + *
> + * Most callers will want to use the errseq_set inline wrapper to efficiently
> + * handle the common case where err is 0.
> + */
> +void __errseq_set(errseq_t *eseq, int err)
> +{
> + errseq_t old;
> +
> + /* MAX_ERRNO must be able to serve as a mask */
> + BUILD_BUG_ON_NOT_POWER_OF_2(MAX_ERRNO + 1);
> +
> + /*
> + * Ensure the error code actually fits where we want it to go. If it
> + * doesn't then just throw a warning and don't record anything. We
> + * also don't accept zero here as that would effectively clear a
> + * previous error.
> + */
> + if (WARN(unlikely(err == 0 || (unsigned int)-err > MAX_ERRNO),
> + "err = %d\n", err))
> + return;
> +
> + old = READ_ONCE(*eseq);
> + for (;;) {
> + errseq_t new, cur;
> +
> + /* Clear out error bits and set new error */
> + new = (old & ~(MAX_ERRNO|ERRSEQ_SEEN)) | -err;
> +
> + /* Only increment if someone has looked at it */
> + if (old & ERRSEQ_SEEN)
> + new += ERRSEQ_CTR_INC;
> +
> + /* If there would be no change, then call it done */
> + if (new == old)
> + break;
> +
> + /* Try to swap the new value into place */
> + cur = cmpxchg(eseq, old, new);
> +
> + /*
> + * Call it success if we did the swap or someone else beat us
> + * to it for the same value.
> + */
> + if (likely(cur == old || cur == new))
> + break;
> +
> + /* Raced with an update, try again */
> + old = cur;
> + }
> +}
> +EXPORT_SYMBOL(__errseq_set);
> +
> +/**
> + * errseq_sample - grab current errseq_t value
> + * @eseq: pointer to errseq_t to be sampled
> + *
> + * This function allows callers to sample an errseq_t value, marking it as
> + * "seen" if required.
> + */
> +errseq_t errseq_sample(errseq_t *eseq)
> +{
> + errseq_t old = READ_ONCE(*eseq);
> + errseq_t new = old;
> +
> + /*
> + * For the common case of no errors ever having been set, we can skip
> + * marking the SEEN bit. Once an error has been set, the value will
> + * never go back to zero.
> + */
> + if (old != 0) {
> + new |= ERRSEQ_SEEN;
> + if (old != new)
> + cmpxchg(eseq, old, new);
> + }
> + return new;
> +}
> +EXPORT_SYMBOL(errseq_sample);
> +
> +/**
> + * errseq_check - has an error occurred since a particular point in time?
> + * @eseq: pointer to errseq_t value to be checked
> + * @since: previously-sampled errseq_t from which to check
> + *
> + * Grab the value that eseq points to, and see if it has changed "since"
> + * the given value was sampled. The "since" value is not advanced, so there
> + * is no need to mark the value as seen.
> + *
> + * Returns the latest error set in the errseq_t or 0 if it hasn't changed.
> + */
> +int errseq_check(errseq_t *eseq, errseq_t since)
> +{
> + errseq_t cur = READ_ONCE(*eseq);
> +
> + if (likely(cur == since))
> + return 0;
> + return -(cur & MAX_ERRNO);
> +}
> +EXPORT_SYMBOL(errseq_check);
> +
> +/**
> + * errseq_check_and_advance - check an errseq_t and advance it to the current value
> + * @eseq: pointer to value being checked reported
> + * @since: pointer to previously-sampled errseq_t to check against and advance
> + *
> + * Grab the eseq value, and see whether it matches the value that "since"
> + * points to. If it does, then just return 0.
> + *
> + * If it doesn't, then the value has changed. Set the "seen" flag, and try to
> + * swap it into place as the new eseq value. Then, set that value as the new
> + * "since" value, and return whatever the error portion is set to.
> + *
> + * Note that no locking is provided here for concurrent updates to the "since"
> + * value. The caller must provide that if necessary. Because of this, callers
> + * may want to do a lockless errseq_check before taking the lock and calling
> + * this.
> + */
> +int errseq_check_and_advance(errseq_t *eseq, errseq_t *since)
> +{
> + int err = 0;
> + errseq_t old, new;
> +
> + /*
> + * Most callers will want to use the inline wrapper to check this,
> + * so that the common case of no error is handled without needing
> + * to lock.
> + */
> + old = READ_ONCE(*eseq);
> + if (old != *since) {
> + /*
> + * Set the flag and try to swap it into place if it has
> + * changed.
> + *
> + * We don't care about the outcome of the swap here. If the
> + * swap doesn't occur, then it has either been updated by a
> + * writer who is bumping the seq count anyway, or another
> + * reader who is just setting the "seen" flag. Either outcome
> + * is OK, and we can advance "since" and return an error based
> + * on what we have.
> + */
> + new = old | ERRSEQ_SEEN;
> + if (new != old)
> + cmpxchg(eseq, old, new);
> + *since = new;
> + err = -(new & MAX_ERRNO);
> + }
> + return err;
> +}
> +EXPORT_SYMBOL(errseq_check_and_advance);
> --
> 2.9.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]
next prev parent reply other threads:[~2017-05-09 22:03 UTC|newest]
Thread overview: 56+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-05-09 15:49 [PATCH v4 00/27] fs: introduce new writeback error reporting and convert existing API as a wrapper around it Jeff Layton
2017-05-09 15:49 ` [PATCH v4 01/27] fs: remove unneeded forward definition of mm_struct from fs.h Jeff Layton
2017-05-10 11:04 ` Jan Kara
2017-05-09 15:49 ` [PATCH v4 02/27] mm: drop "wait" parameter from write_one_page Jeff Layton
2017-05-09 15:49 ` [PATCH v4 03/27] mm: fix mapping_set_error call in me_pagecache_dirty Jeff Layton
2017-05-09 15:49 ` [PATCH v4 04/27] buffer: use mapping_set_error instead of setting the flag Jeff Layton
2017-05-09 15:49 ` [PATCH v4 05/27] btrfs: btrfs_wait_tree_block_writeback can be void return Jeff Layton
2017-05-10 11:09 ` Jan Kara
2017-05-19 4:07 ` Liu Bo
2017-05-09 15:49 ` [PATCH v4 06/27] fs: check for writeback errors after syncing out buffers in generic_file_fsync Jeff Layton
2017-05-10 12:48 ` Matthew Wilcox
2017-05-09 15:49 ` [PATCH v4 07/27] orangefs: don't call filemap_write_and_wait from fsync Jeff Layton
2017-05-09 15:49 ` [PATCH v4 08/27] dax: set errors in mapping when writeback fails Jeff Layton
2017-05-09 15:49 ` [PATCH v4 09/27] nilfs2: set the mapping error when calling SetPageError on writeback Jeff Layton
2017-05-09 15:49 ` [PATCH v4 10/27] 9p: set mapping error when writeback fails in launder_page Jeff Layton
2017-05-09 15:49 ` [PATCH v4 11/27] fuse: set mapping error in writepage_locked when it fails Jeff Layton
2017-05-10 11:13 ` Jan Kara
2017-05-09 15:49 ` [PATCH v4 12/27] cifs: set mapping error when page writeback fails in writepage or launder_pages Jeff Layton
2017-05-10 11:14 ` Jan Kara
2017-05-09 15:49 ` [PATCH v4 13/27] lib: add errseq_t type and infrastructure for handling it Jeff Layton
2017-05-09 22:03 ` NeilBrown [this message]
2017-05-10 11:29 ` Jeff Layton
2017-05-10 11:34 ` Jan Kara
2017-05-10 11:58 ` Jeff Layton
2017-05-10 14:18 ` Matthew Wilcox
2017-05-10 14:56 ` Jeff Layton
2017-05-09 15:49 ` [PATCH v4 14/27] fs: new infrastructure for writeback error handling and reporting Jeff Layton
2017-05-10 11:48 ` Jan Kara
2017-05-10 12:19 ` Jeff Layton
2017-05-10 13:46 ` Jan Kara
2017-05-09 15:49 ` [PATCH v4 15/27] fs: retrofit old error reporting API onto new infrastructure Jeff Layton
2017-05-15 10:42 ` Jan Kara
2017-05-15 17:58 ` Jeff Layton
2017-05-19 19:20 ` Jeff Layton
2017-05-22 13:38 ` Jan Kara
2017-05-22 13:53 ` Jeff Layton
2017-05-22 17:53 ` Jan Kara
2017-05-22 19:09 ` Jeff Layton
2017-05-23 9:05 ` Jan Kara
2017-05-09 15:49 ` [PATCH v4 16/27] fs: adapt sync_file_range to new reporting infrastructure Jeff Layton
2017-05-09 15:49 ` [PATCH v4 17/27] mm: remove AS_EIO and AS_ENOSPC flags Jeff Layton
2017-05-09 15:49 ` [PATCH v4 18/27] mm: don't TestClearPageError in __filemap_fdatawait_range Jeff Layton
2017-05-09 15:49 ` [PATCH v4 19/27] buffer: set errors in mapping at the time that the error occurs Jeff Layton
2017-05-15 11:53 ` Jan Kara
2017-05-09 15:49 ` [PATCH v4 20/27] cifs: cleanup writeback handling errors and comments Jeff Layton
2017-05-09 15:49 ` [PATCH v4 21/27] mm: clean up error handling in write_one_page Jeff Layton
2017-05-15 12:01 ` Jan Kara
2017-05-09 15:49 ` [PATCH v4 22/27] jbd2: don't reset error in journal_finish_inode_data_buffers Jeff Layton
2017-05-15 11:58 ` Jan Kara
2017-05-09 15:49 ` [PATCH v4 23/27] gfs2: clean up some filemap_* calls Jeff Layton
2017-05-10 16:18 ` Bob Peterson
2017-05-09 15:49 ` [PATCH v4 24/27][RFC] nfs: convert to new errseq_t based error tracking for writeback errors Jeff Layton
2017-05-09 15:49 ` [PATCH v4 25/27] Documentation: flesh out the section in vfs.txt on storing and reporting " Jeff Layton
2017-05-09 16:24 ` Jeff Layton
2017-05-09 15:49 ` [PATCH v4 26/27] mm: flesh out comments over mapping_set_error Jeff Layton
2017-05-09 15:49 ` [PATCH v4 27/27] mm: clean up comments in me_pagecache_dirty Jeff Layton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87inl9n0wu.fsf@notabene.neil.brown.name \
--to=neil@brown.name \
--cc=akpm@linux-foundation.org \
--cc=axboe@kernel.dk \
--cc=bo.li.liu@oracle.com \
--cc=clm@fb.com \
--cc=cluster-devel@redhat.com \
--cc=corbet@lwn.net \
--cc=dhowells@redhat.com \
--cc=hch@infradead.org \
--cc=hubcap@omnibond.com \
--cc=jack@suse.com \
--cc=jfs-discussion@lists.sourceforge.net \
--cc=jlayton@redhat.com \
--cc=josef@toxicpanda.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-cifs@vger.kernel.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-f2fs-devel@lists.sourceforge.net \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-nfs@vger.kernel.org \
--cc=linux-nilfs@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=mawilcox@microsoft.com \
--cc=neilb@suse.de \
--cc=ross.zwisler@linux.intel.com \
--cc=rpeterso@redhat.com \
--cc=tytso@mit.edu \
--cc=v9fs-developer@lists.sourceforge.net \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).