linux-btrfs.vger.kernel.org archive mirror
From: David Sterba <dsterba@suse.com>
To: linux-btrfs@vger.kernel.org
Cc: David Sterba <dsterba@suse.com>
Subject: [PATCH 4/5] btrfs: serialize blocking_writers updates
Date: Thu, 17 Oct 2019 21:39:00 +0200
Message-ID: <2a310858bfa3754f6f7a4d4b7693959b0fdd7005.1571340084.git.dsterba@suse.com>
In-Reply-To: <cover.1571340084.git.dsterba@suse.com>

There's one potential memory ordering problem regarding
eb::blocking_writers, although it hasn't been hit in practice. On TSO
(total store ordering) architectures, the writes will always be seen in
the right order.

In case the stores in btrfs_set_lock_blocking_write are observed in this
(wrong) order on another CPU:

0:  /* blocking_writers == 0 */
1:  write_unlock()
2:  blocking_writers = 1

btrfs_try_tree_read_lock, btrfs_tree_read_lock_atomic or
btrfs_try_tree_write_lock would have to squeeze all of this between 1:
and 2:

- check if blocking_writers is 0 (it is, continue)
- try read lock, read lock or write lock (succeeds after 1:)
- check blocking_writers again (continue)

All of that assumes that the reordered writes can survive for quite some
time (unlikely if they are still sitting in the internal store buffer).

Another point against observing this in practice is that
blocking_writers and the lock are on the same cacheline (64B), so it's
more likely that both values are stored in order, or that some sort of
pending write flush updates blocking_writers and the rwlock before the
try-lock happens. That assumes the CPUs work like that and don't hide
other surprises.
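The scenario above can be condensed into a litmus test in the style of
the kernel's tools/memory-model (a sketch: the read side is modeled here
with an explicit smp_rmb(), whereas in the real code the successful
trylock provides the ordering):

```c
C blocking-writers-unlock

{}

P0(int *blocking_writers, int *lock)
{
	WRITE_ONCE(*blocking_writers, 1);
	smp_wmb();
	WRITE_ONCE(*lock, 0);	/* stands in for write_unlock() */
}

P1(int *blocking_writers, int *lock)
{
	int r0;
	int r1;

	r0 = READ_ONCE(*lock);	/* stands in for the successful trylock */
	smp_rmb();
	r1 = READ_ONCE(*blocking_writers);
}

exists (1:r0=0 /\ 1:r1=0)
```

With the smp_wmb() in P0, the "exists" outcome (the lock observed as
released while blocking_writers is still observed as 0) is forbidden.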

struct extent_buffer {
	u64                        start;                /*     0     8 */
	long unsigned int          len;                  /*     8     8 */
	long unsigned int          bflags;               /*    16     8 */
	struct btrfs_fs_info *     fs_info;              /*    24     8 */
	spinlock_t                 refs_lock;            /*    32     4 */
	atomic_t                   refs;                 /*    36     4 */
	atomic_t                   io_pages;             /*    40     4 */
	int                        read_mirror;          /*    44     4 */
	struct callback_head callback_head __attribute__((__aligned__(8))); /*    48    16 */
	/* --- cacheline 1 boundary (64 bytes) --- */
	pid_t                      lock_owner;           /*    64     4 */
	int                        blocking_writers;     /*    68     4 */
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

	atomic_t                   blocking_readers;     /*    72     4 */
	bool                       lock_nested;          /*    76     1 */

	/* XXX 1 byte hole, try to pack */

	short int                  log_index;            /*    78     2 */
	rwlock_t                   lock;                 /*    80     8 */
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

	wait_queue_head_t          write_lock_wq;        /*    88    24 */
	wait_queue_head_t          read_lock_wq;         /*   112    24 */
	/* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
	struct page *              pages[16];            /*   136   128 */

	/* size: 264, cachelines: 5, members: 18 */
	/* sum members: 263, holes: 1, sum holes: 1 */
	/* forced alignments: 1 */
	/* last cacheline: 8 bytes */
} __attribute__((__aligned__(8)));

Add the barriers for correctness' sake.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/locking.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/locking.c b/fs/btrfs/locking.c
index a12f3edc3505..e0e0430577aa 100644
--- a/fs/btrfs/locking.c
+++ b/fs/btrfs/locking.c
@@ -110,6 +110,18 @@ void btrfs_set_lock_blocking_write(struct extent_buffer *eb)
 		btrfs_assert_spinning_writers_put(eb);
 		btrfs_assert_tree_locked(eb);
 		WRITE_ONCE(eb->blocking_writers, 1);
+		/*
+		 * Writers must be updated before the lock is released so
+		 * that other threads observe the update in order. Otherwise
+		 * this could theoretically lead to blocking_writers still
+		 * being seen as set, which would be unexpected eg. for spinning locks.
+		 *
+		 * There are no pairing read barriers for unlocked access as we
+		 * don't strictly need them for correctness (eg. in
+		 * btrfs_try_tree_read_lock), and the unlock/lock is an implied
+		 * barrier in other cases.
+		 */
+		smp_wmb();
 		write_unlock(&eb->lock);
 	}
 }
@@ -316,7 +328,9 @@ void btrfs_tree_unlock(struct extent_buffer *eb)
 		/*
 		 * We need to order modifying blocking_writers above with
 		 * actually waking up the sleepers to ensure they see the
-		 * updated value of blocking_writers
+		 * updated value of blocking_writers.
+		 *
+		 * cond_wake_up calls smp_mb
 		 */
 		cond_wake_up(&eb->write_lock_wq);
 	} else {
-- 
2.23.0


Thread overview: 20+ messages
2019-10-17 19:38 [PATCH 0/5] Extent buffer locking and documentation David Sterba
2019-10-17 19:38 ` [PATCH 1/5] btrfs: merge blocking_writers branches in btrfs_tree_read_lock David Sterba
2019-10-17 19:38 ` [PATCH 2/5] btrfs: set blocking_writers directly, no increment or decrement David Sterba
2019-10-18 12:08   ` Nikolay Borisov
2019-10-18 17:31     ` David Sterba
2019-10-17 19:38 ` [PATCH 3/5] btrfs: access eb::blocking_writers according to ACCESS_ONCE policies David Sterba
2019-10-23  8:42   ` Nikolay Borisov
2019-10-29 21:33     ` David Sterba
2019-10-17 19:39 ` David Sterba [this message]
2019-10-23  9:57   ` [PATCH 4/5] btrfs: serialize blocking_writers updates Nikolay Borisov
2019-10-29 17:51     ` David Sterba
2019-10-29 18:48       ` Nikolay Borisov
2019-10-29 21:15         ` David Sterba
2019-10-17 19:39 ` [PATCH 5/5] btrfs: document extent buffer locking David Sterba
2019-10-18  0:17   ` Qu Wenruo
2019-10-18 11:56     ` David Sterba
2019-10-22  9:53   ` Nikolay Borisov
2019-10-22 10:41     ` David Sterba
2019-10-23 11:16   ` Nikolay Borisov
2019-10-29 17:33     ` David Sterba
