linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH AUTOSEL 4.20 071/117] btrfs: volumes: Make sure there is no overlap of dev extents at mount time
       [not found] <20190108192628.121270-1-sashal@kernel.org>
@ 2019-01-08 19:25 ` Sasha Levin
  2019-01-08 19:25 ` [PATCH AUTOSEL 4.20 072/117] btrfs: alloc_chunk: fix more DUP stripe size handling Sasha Levin
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Sasha Levin @ 2019-01-08 19:25 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Qu Wenruo, David Sterba, Sasha Levin, linux-btrfs

From: Qu Wenruo <wqu@suse.com>

[ Upstream commit 5eb193812a42dc49331f25137a38dfef9612d3e4 ]

Enhance btrfs_verify_dev_extents() to remember previous checked dev
extents, so it can verify no dev extents can overlap.

Analysis from Hans:

"Imagine allocating a DATA|DUP chunk.

 In the chunk allocator, we first set...
   max_stripe_size = SZ_1G;
   max_chunk_size = BTRFS_MAX_DATA_CHUNK_SIZE
 ... which is 10GiB.

 Then...
   /* we don't want a chunk larger than 10% of writeable space */
   max_chunk_size = min(div_factor(fs_devices->total_rw_bytes, 1),
       		 max_chunk_size);

 Imagine we only have one 7880MiB block device in this filesystem. Now
 max_chunk_size is down to 788MiB.

 The next step in the code is to search for max_stripe_size * dev_stripes
 amount of free space on the device, which is in our example 1GiB * 2 =
 2GiB. Imagine the device has exactly 1578MiB free in one contiguous
 piece. This amount of bytes will be put in devices_info[ndevs - 1].max_avail

 Next we recalculate the stripe_size (which is actually the device extent
 length), based on the actual maximum amount of available raw disk space:
   stripe_size = div_u64(devices_info[ndevs - 1].max_avail, dev_stripes);

 stripe_size is now 789MiB

 Next we do...
   data_stripes = num_stripes / ncopies
 ...where data_stripes ends up as 1, because num_stripes is 2 (the amount
 of device extents we're going to have), and DUP has ncopies 2.

 Next there's a check...
   if (stripe_size * data_stripes > max_chunk_size)
 ...which matches because 789MiB * 1 > 788MiB.

 We go into the if code, and next is...
   stripe_size = div_u64(max_chunk_size, data_stripes);
 ...which resets stripe_size to max_chunk_size: 788MiB

 Next is a fun one...
   /* bump the answer up to a 16MB boundary */
   stripe_size = round_up(stripe_size, SZ_16M);
 ...which changes stripe_size from 788MiB to 800MiB.

 We're not done changing stripe_size yet...
   /* But don't go higher than the limits we found while searching
    * for free extents
    */
   stripe_size = min(devices_info[ndevs - 1].max_avail,
       	      stripe_size);

 This is bad. max_avail is twice the stripe_size (we need to fit 2 device
 extents on the same device for DUP).

 The result here is that 800MiB < 1578MiB, so it's unchanged. However,
 the resulting DUP chunk will need 1600MiB disk space, which isn't there,
 and the second dev_extent might extend into the next thing (next
 dev_extent? end of device?) for 22MiB.

 The last shown line of code relies on a situation where there's twice
 the value of stripe_size present as value for the variable stripe_size
 when it's DUP. This was actually the case before commit 92e222df7b
 "btrfs: alloc_chunk: fix DUP stripe size handling", from which I quote:
   "[...] in the meantime there's a check to see if the stripe_size does
 not exceed max_chunk_size. Since during this check stripe_size is twice
 the amount as intended, the check will reduce the stripe_size to
 max_chunk_size if the actual correct to be used stripe_size is more than
 half the amount of max_chunk_size."

 In the previous version of the code, the 16MiB alignment (why is this
 done, by the way?) would result in a 50% chance that it would actually
 do an 8MiB alignment for the individual dev_extents, since it was
 operating on double the size. Does this matter?

 Does it matter that stripe_size can be set to anything which is not
 16MiB aligned because of the amount of remaining available disk space
 which is just taken?

 What is the main purpose of this round_up?

 The most straightforward thing to do seems something like...
   stripe_size = min(
       div_u64(devices_info[ndevs - 1].max_avail, dev_stripes),
       stripe_size
   )
 ..just putting half of the max_avail into stripe_size."

Link: https://lore.kernel.org/linux-btrfs/b3461a38-e5f8-f41d-c67c-2efac8129054@mendix.com/
Reported-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
[ add analysis from report ]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/btrfs/volumes.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f435d397019e..a567ee0bf060 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -7478,6 +7478,8 @@ int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info)
 	struct btrfs_path *path;
 	struct btrfs_root *root = fs_info->dev_root;
 	struct btrfs_key key;
+	u64 prev_devid = 0;
+	u64 prev_dev_ext_end = 0;
 	int ret = 0;
 
 	key.objectid = 1;
@@ -7522,10 +7524,22 @@ int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info)
 		chunk_offset = btrfs_dev_extent_chunk_offset(leaf, dext);
 		physical_len = btrfs_dev_extent_length(leaf, dext);
 
+		/* Check if this dev extent overlaps with the previous one */
+		if (devid == prev_devid && physical_offset < prev_dev_ext_end) {
+			btrfs_err(fs_info,
+"dev extent devid %llu physical offset %llu overlap with previous dev extent end %llu",
+				  devid, physical_offset, prev_dev_ext_end);
+			ret = -EUCLEAN;
+			goto out;
+		}
+
 		ret = verify_one_dev_extent(fs_info, chunk_offset, devid,
 					    physical_offset, physical_len);
 		if (ret < 0)
 			goto out;
+		prev_devid = devid;
+		prev_dev_ext_end = physical_offset + physical_len;
+
 		ret = btrfs_next_item(root, path);
 		if (ret < 0)
 			goto out;
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH AUTOSEL 4.20 072/117] btrfs: alloc_chunk: fix more DUP stripe size handling
       [not found] <20190108192628.121270-1-sashal@kernel.org>
  2019-01-08 19:25 ` [PATCH AUTOSEL 4.20 071/117] btrfs: volumes: Make sure there is no overlap of dev extents at mount time Sasha Levin
@ 2019-01-08 19:25 ` Sasha Levin
  2019-01-08 23:52   ` Hans van Kranenburg
  2019-01-08 19:25 ` [PATCH AUTOSEL 4.20 073/117] btrfs: fix use-after-free due to race between replace start and cancel Sasha Levin
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Sasha Levin @ 2019-01-08 19:25 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Hans van Kranenburg, David Sterba, Sasha Levin, linux-btrfs

From: Hans van Kranenburg <hans.van.kranenburg@mendix.com>

[ Upstream commit baf92114c7e6dd6124aa3d506e4bc4b694da3bc3 ]

Commit 92e222df7b "btrfs: alloc_chunk: fix DUP stripe size handling"
fixed calculating the stripe_size for a new DUP chunk.

However, the same calculation reappears a bit later, and that one was
not changed yet. The resulting bug that is exposed is that the newly
allocated device extents ('stripes') can have a few MiB overlap with the
next thing stored after them, which is another device extent or the end
of the disk.

The scenario in which this can happen is:
* The block device for the filesystem is less than 10GiB in size.
* The amount of contiguous free unallocated disk space chosen to use for
  chunk allocation is 20% of the total device size, or a few MiB more or
  less.

An example:
- The filesystem device is 7880MiB (max_chunk_size gets set to 788MiB)
- There's 1578MiB unallocated raw disk space left in one contiguous
  piece.

In this case stripe_size is first calculated as 789MiB, (half of
1578MiB).

Since 789MiB (stripe_size * data_stripes) > 788MiB (max_chunk_size), we
enter the if block. Now stripe_size value is immediately overwritten
while calculating an adjusted value based on max_chunk_size, which ends
up as 788MiB.

Next, the value is rounded up to a 16MiB boundary, 800MiB, which is
actually more than the value we had before. However, the last comparison
fails to detect this, because it's comparing the value with the total
amount of free space, which is about twice the size of stripe_size.

In the example above, this means that the resulting raw disk space being
allocated is 1600MiB, while only a gap of 1578MiB has been found. The
second device extent object for this DUP chunk will overlap for 22MiB
with whatever comes next.

The underlying problem here is that the stripe_size is reused all the
time for different things. So, when entering the code in the if block,
stripe_size is immediately overwritten with something else. If later we
decide we want to have the previous value back, then the logic to
compute it was copy pasted in again.

With this change, the value in stripe_size is not unnecessarily
destroyed, so the duplicated calculation is not needed any more.

Signed-off-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/btrfs/volumes.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a567ee0bf060..1797a82eb7df 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4768,19 +4768,17 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	/*
 	 * Use the number of data stripes to figure out how big this chunk
 	 * is really going to be in terms of logical address space,
-	 * and compare that answer with the max chunk size
+	 * and compare that answer with the max chunk size. If it's higher,
+	 * we try to reduce stripe_size.
 	 */
 	if (stripe_size * data_stripes > max_chunk_size) {
-		stripe_size = div_u64(max_chunk_size, data_stripes);
-
-		/* bump the answer up to a 16MB boundary */
-		stripe_size = round_up(stripe_size, SZ_16M);
-
 		/*
-		 * But don't go higher than the limits we found while searching
-		 * for free extents
+		 * Reduce stripe_size, round it up to a 16MB boundary again and
+		 * then use it, unless it ends up being even bigger than the
+		 * previous value we had already.
 		 */
-		stripe_size = min(devices_info[ndevs - 1].max_avail,
+		stripe_size = min(round_up(div_u64(max_chunk_size,
+						   data_stripes), SZ_16M),
 				  stripe_size);
 	}
 
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH AUTOSEL 4.20 073/117] btrfs: fix use-after-free due to race between replace start and cancel
       [not found] <20190108192628.121270-1-sashal@kernel.org>
  2019-01-08 19:25 ` [PATCH AUTOSEL 4.20 071/117] btrfs: volumes: Make sure there is no overlap of dev extents at mount time Sasha Levin
  2019-01-08 19:25 ` [PATCH AUTOSEL 4.20 072/117] btrfs: alloc_chunk: fix more DUP stripe size handling Sasha Levin
@ 2019-01-08 19:25 ` Sasha Levin
  2019-01-08 19:25 ` [PATCH AUTOSEL 4.20 074/117] Btrfs: fix deadlock when enabling quotas due to concurrent snapshot creation Sasha Levin
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Sasha Levin @ 2019-01-08 19:25 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Anand Jain, David Sterba, Sasha Levin, linux-btrfs

From: Anand Jain <anand.jain@oracle.com>

[ Upstream commit d189dd70e2556181732598956d808ea53cc8774e ]

The device replace cancel thread can race with the replace start thread
and if fs_info::scrubs_running is not yet set, btrfs_scrub_cancel() will
fail to stop the scrub thread.

The scrub thread continues with the scrub for replace which then will
try to write to the target device and which is already freed by the
cancel thread.

scrub_setup_ctx() warns as tgtdev is NULL.

  struct scrub_ctx *scrub_setup_ctx(struct btrfs_device *dev, int is_dev_replace)
  {
  ...
	  if (is_dev_replace) {
		  WARN_ON(!fs_info->dev_replace.tgtdev);  <===
		  sctx->pages_per_wr_bio = SCRUB_PAGES_PER_WR_BIO;
		  sctx->wr_tgtdev = fs_info->dev_replace.tgtdev;
		  sctx->flush_all_writes = false;
	  }

  [ 6724.497655] BTRFS info (device sdb): dev_replace from /dev/sdb (devid 1) to /dev/sdc started
  [ 6753.945017] BTRFS info (device sdb): dev_replace from /dev/sdb (devid 1) to /dev/sdc canceled
  [ 6852.426700] WARNING: CPU: 0 PID: 4494 at fs/btrfs/scrub.c:622 scrub_setup_ctx.isra.19+0x220/0x230 [btrfs]
  ...
  [ 6852.428928] RIP: 0010:scrub_setup_ctx.isra.19+0x220/0x230 [btrfs]
  ...
  [ 6852.432970] Call Trace:
  [ 6852.433202]  btrfs_scrub_dev+0x19b/0x5c0 [btrfs]
  [ 6852.433471]  btrfs_dev_replace_start+0x48c/0x6a0 [btrfs]
  [ 6852.433800]  btrfs_dev_replace_by_ioctl+0x3a/0x60 [btrfs]
  [ 6852.434097]  btrfs_ioctl+0x2476/0x2d20 [btrfs]
  [ 6852.434365]  ? do_sigaction+0x7d/0x1e0
  [ 6852.434623]  do_vfs_ioctl+0xa9/0x6c0
  [ 6852.434865]  ? syscall_trace_enter+0x1c8/0x310
  [ 6852.435124]  ? syscall_trace_enter+0x1c8/0x310
  [ 6852.435387]  ksys_ioctl+0x60/0x90
  [ 6852.435663]  __x64_sys_ioctl+0x16/0x20
  [ 6852.435907]  do_syscall_64+0x50/0x180
  [ 6852.436150]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Further, as the replace thread enters scrub_write_page_to_dev_replace()
without the target device it panics:

  static int scrub_add_page_to_wr_bio(struct scrub_ctx *sctx,
				      struct scrub_page *spage)
  {
  ...
	bio_set_dev(bio, sbio->dev->bdev); <======

  [ 6929.715145] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0
  ..
  [ 6929.717106] Workqueue: btrfs-scrub btrfs_scrub_helper [btrfs]
  [ 6929.717420] RIP: 0010:scrub_write_page_to_dev_replace+0xb4/0x260
  [btrfs]
  ..
  [ 6929.721430] Call Trace:
  [ 6929.721663]  scrub_write_block_to_dev_replace+0x3f/0x60 [btrfs]
  [ 6929.721975]  scrub_bio_end_io_worker+0x1af/0x490 [btrfs]
  [ 6929.722277]  normal_work_helper+0xf0/0x4c0 [btrfs]
  [ 6929.722552]  process_one_work+0x1f4/0x520
  [ 6929.722805]  ? process_one_work+0x16e/0x520
  [ 6929.723063]  worker_thread+0x46/0x3d0
  [ 6929.723313]  kthread+0xf8/0x130
  [ 6929.723544]  ? process_one_work+0x520/0x520
  [ 6929.723800]  ? kthread_delayed_work_timer_fn+0x80/0x80
  [ 6929.724081]  ret_from_fork+0x3a/0x50

Fix this by letting the btrfs_dev_replace_finishing() to do the job of
cleaning after the cancel, including freeing of the target device.
btrfs_dev_replace_finishing() is called when btrfs_scub_dev() returns
along with the scrub return status.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/btrfs/dev-replace.c | 63 +++++++++++++++++++++++++++---------------
 1 file changed, 41 insertions(+), 22 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 2aa48aecc52b..f6e9bf9f9a69 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -797,39 +797,58 @@ int btrfs_dev_replace_cancel(struct btrfs_fs_info *fs_info)
 	case BTRFS_IOCTL_DEV_REPLACE_STATE_CANCELED:
 		result = BTRFS_IOCTL_DEV_REPLACE_RESULT_NOT_STARTED;
 		btrfs_dev_replace_write_unlock(dev_replace);
-		goto leave;
+		break;
 	case BTRFS_IOCTL_DEV_REPLACE_STATE_STARTED:
+		result = BTRFS_IOCTL_DEV_REPLACE_RESULT_NO_ERROR;
+		tgt_device = dev_replace->tgtdev;
+		src_device = dev_replace->srcdev;
+		btrfs_dev_replace_write_unlock(dev_replace);
+		btrfs_scrub_cancel(fs_info);
+		/* btrfs_dev_replace_finishing() will handle the cleanup part */
+		btrfs_info_in_rcu(fs_info,
+			"dev_replace from %s (devid %llu) to %s canceled",
+			btrfs_dev_name(src_device), src_device->devid,
+			btrfs_dev_name(tgt_device));
+		break;
 	case BTRFS_IOCTL_DEV_REPLACE_STATE_SUSPENDED:
+		/*
+		 * Scrub doing the replace isn't running so we need to do the
+		 * cleanup step of btrfs_dev_replace_finishing() here
+		 */
 		result = BTRFS_IOCTL_DEV_REPLACE_RESULT_NO_ERROR;
 		tgt_device = dev_replace->tgtdev;
 		src_device = dev_replace->srcdev;
 		dev_replace->tgtdev = NULL;
 		dev_replace->srcdev = NULL;
-		break;
-	}
-	dev_replace->replace_state = BTRFS_IOCTL_DEV_REPLACE_STATE_CANCELED;
-	dev_replace->time_stopped = ktime_get_real_seconds();
-	dev_replace->item_needs_writeback = 1;
-	btrfs_dev_replace_write_unlock(dev_replace);
-	btrfs_scrub_cancel(fs_info);
+		dev_replace->replace_state =
+				BTRFS_IOCTL_DEV_REPLACE_STATE_CANCELED;
+		dev_replace->time_stopped = ktime_get_real_seconds();
+		dev_replace->item_needs_writeback = 1;
 
-	trans = btrfs_start_transaction(root, 0);
-	if (IS_ERR(trans)) {
-		mutex_unlock(&dev_replace->lock_finishing_cancel_unmount);
-		return PTR_ERR(trans);
-	}
-	ret = btrfs_commit_transaction(trans);
-	WARN_ON(ret);
+		btrfs_dev_replace_write_unlock(dev_replace);
 
-	btrfs_info_in_rcu(fs_info,
-		"dev_replace from %s (devid %llu) to %s canceled",
-		btrfs_dev_name(src_device), src_device->devid,
-		btrfs_dev_name(tgt_device));
+		btrfs_scrub_cancel(fs_info);
+
+		trans = btrfs_start_transaction(root, 0);
+		if (IS_ERR(trans)) {
+			mutex_unlock(&dev_replace->lock_finishing_cancel_unmount);
+			return PTR_ERR(trans);
+		}
+		ret = btrfs_commit_transaction(trans);
+		WARN_ON(ret);
 
-	if (tgt_device)
-		btrfs_destroy_dev_replace_tgtdev(tgt_device);
+		btrfs_info_in_rcu(fs_info,
+		"suspended dev_replace from %s (devid %llu) to %s canceled",
+			btrfs_dev_name(src_device), src_device->devid,
+			btrfs_dev_name(tgt_device));
+
+		if (tgt_device)
+			btrfs_destroy_dev_replace_tgtdev(tgt_device);
+		break;
+	default:
+		result = -EINVAL;
+	}
 
-leave:
 	mutex_unlock(&dev_replace->lock_finishing_cancel_unmount);
 	return result;
 }
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH AUTOSEL 4.20 074/117] Btrfs: fix deadlock when enabling quotas due to concurrent snapshot creation
       [not found] <20190108192628.121270-1-sashal@kernel.org>
                   ` (2 preceding siblings ...)
  2019-01-08 19:25 ` [PATCH AUTOSEL 4.20 073/117] btrfs: fix use-after-free due to race between replace start and cancel Sasha Levin
@ 2019-01-08 19:25 ` Sasha Levin
  2019-01-08 19:25 ` [PATCH AUTOSEL 4.20 075/117] Btrfs: fix access to available allocation bits when starting balance Sasha Levin
  2019-01-08 19:25 ` [PATCH AUTOSEL 4.20 076/117] btrfs: improve error handling of btrfs_add_link Sasha Levin
  5 siblings, 0 replies; 13+ messages in thread
From: Sasha Levin @ 2019-01-08 19:25 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Filipe Manana, David Sterba, Sasha Levin, linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

[ Upstream commit 9a6f209e36500efac51528132a3e3083586eda5f ]

If the quota enable and snapshot creation ioctls are called concurrently
we can get into a deadlock where the task enabling quotas will deadlock
on the fs_info->qgroup_ioctl_lock mutex because it attempts to lock it
twice, or the task creating a snapshot tries to commit the transaction
while the task enabling quota waits for the former task to commit the
transaction while holding the mutex. The following time diagrams show how
both cases happen.

First scenario:

           CPU 0                                    CPU 1

 btrfs_ioctl()
  btrfs_ioctl_quota_ctl()
   btrfs_quota_enable()
    mutex_lock(fs_info->qgroup_ioctl_lock)
    btrfs_start_transaction()

                                             btrfs_ioctl()
                                              btrfs_ioctl_snap_create_v2
                                               create_snapshot()
                                                --> adds snapshot to the
                                                    list pending_snapshots
                                                    of the current
                                                    transaction

    btrfs_commit_transaction()
     create_pending_snapshots()
       create_pending_snapshot()
        qgroup_account_snapshot()
         btrfs_qgroup_inherit()
	   mutex_lock(fs_info->qgroup_ioctl_lock)
	    --> deadlock, mutex already locked
	        by this task at
		btrfs_quota_enable()

Second scenario:

           CPU 0                                    CPU 1

 btrfs_ioctl()
  btrfs_ioctl_quota_ctl()
   btrfs_quota_enable()
    mutex_lock(fs_info->qgroup_ioctl_lock)
    btrfs_start_transaction()

                                             btrfs_ioctl()
                                              btrfs_ioctl_snap_create_v2
                                               create_snapshot()
                                                --> adds snapshot to the
                                                    list pending_snapshots
                                                    of the current
                                                    transaction

                                                btrfs_commit_transaction()
                                                 --> waits for task at
                                                     CPU 0 to release
                                                     its transaction
                                                     handle

    btrfs_commit_transaction()
     --> sees another task started
         the transaction commit first
     --> releases its transaction
         handle
     --> waits for the transaction
         commit to be completed by
         the task at CPU 1

                                                 create_pending_snapshot()
                                                  qgroup_account_snapshot()
                                                   btrfs_qgroup_inherit()
                                                    mutex_lock(fs_info->qgroup_ioctl_lock)
                                                     --> deadlock, task at CPU 0
                                                         has the mutex locked but
                                                         it is waiting for us to
                                                         finish the transaction
                                                         commit

So fix this by setting the quota enabled flag in fs_info after committing
the transaction at btrfs_quota_enable(). This ends up serializing quota
enable and snapshot creation as if the snapshot creation happened just
before the quota enable request. The quota rescan task, scheduled after
committing the transaction in btrfs_quote_enable(), will do the accounting.

Fixes: 6426c7ad697d ("btrfs: qgroup: Fix qgroup accounting when creating snapshot")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/btrfs/qgroup.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index f70825af6438..9e419f6878c5 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1013,16 +1013,22 @@ int btrfs_quota_enable(struct btrfs_fs_info *fs_info)
 		btrfs_abort_transaction(trans, ret);
 		goto out_free_path;
 	}
-	spin_lock(&fs_info->qgroup_lock);
-	fs_info->quota_root = quota_root;
-	set_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags);
-	spin_unlock(&fs_info->qgroup_lock);
 
 	ret = btrfs_commit_transaction(trans);
 	trans = NULL;
 	if (ret)
 		goto out_free_path;
 
+	/*
+	 * Set quota enabled flag after committing the transaction, to avoid
+	 * deadlocks on fs_info->qgroup_ioctl_lock with concurrent snapshot
+	 * creation.
+	 */
+	spin_lock(&fs_info->qgroup_lock);
+	fs_info->quota_root = quota_root;
+	set_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags);
+	spin_unlock(&fs_info->qgroup_lock);
+
 	ret = qgroup_rescan_init(fs_info, 0, 1);
 	if (!ret) {
 	        qgroup_rescan_zero_tracking(fs_info);
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH AUTOSEL 4.20 075/117] Btrfs: fix access to available allocation bits when starting balance
       [not found] <20190108192628.121270-1-sashal@kernel.org>
                   ` (3 preceding siblings ...)
  2019-01-08 19:25 ` [PATCH AUTOSEL 4.20 074/117] Btrfs: fix deadlock when enabling quotas due to concurrent snapshot creation Sasha Levin
@ 2019-01-08 19:25 ` Sasha Levin
  2019-01-08 19:25 ` [PATCH AUTOSEL 4.20 076/117] btrfs: improve error handling of btrfs_add_link Sasha Levin
  5 siblings, 0 replies; 13+ messages in thread
From: Sasha Levin @ 2019-01-08 19:25 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Filipe Manana, David Sterba, Sasha Levin, linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

[ Upstream commit 5a8067c0d17feb7579db0476191417b441a8996e ]

The available allocation bits members from struct btrfs_fs_info are
protected by a sequence lock, and when starting balance we access them
incorrectly in two different ways:

1) In the read sequence lock loop at btrfs_balance() we use the values we
   read from fs_info->avail_*_alloc_bits and we can immediately do actions
   that have side effects and can not be undone (printing a message and
   jumping to a label). This is wrong because a retry might be needed, so
   our actions must not have side effects and must be repeatable as long
   as read_seqretry() returns a non-zero value. In other words, we were
   essentially ignoring the sequence lock;

2) Right below the read sequence lock loop, we were reading the values
   from avail_metadata_alloc_bits and avail_data_alloc_bits without any
   protection from concurrent writers, that is, reading them outside of
   the read sequence lock critical section.

So fix this by making sure we only read the available allocation bits
while in a read sequence lock critical section and that what we do in the
critical section is repeatable (has nothing that can not be undone) so
that any eventual retry that is needed is handled properly.

Fixes: de98ced9e743 ("Btrfs: use seqlock to protect fs_info->avail_{data, metadata, system}_alloc_bits")
Fixes: 14506127979a ("btrfs: fix a bogus warning when converting only data or metadata")
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/btrfs/volumes.c | 39 +++++++++++++++++++++++----------------
 1 file changed, 23 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 1797a82eb7df..ea5fa9df9405 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3724,6 +3724,7 @@ int btrfs_balance(struct btrfs_fs_info *fs_info,
 	int ret;
 	u64 num_devices;
 	unsigned seq;
+	bool reducing_integrity;
 
 	if (btrfs_fs_closing(fs_info) ||
 	    atomic_read(&fs_info->balance_pause_req) ||
@@ -3803,24 +3804,30 @@ int btrfs_balance(struct btrfs_fs_info *fs_info,
 		     !(bctl->sys.target & allowed)) ||
 		    ((bctl->meta.flags & BTRFS_BALANCE_ARGS_CONVERT) &&
 		     (fs_info->avail_metadata_alloc_bits & allowed) &&
-		     !(bctl->meta.target & allowed))) {
-			if (bctl->flags & BTRFS_BALANCE_FORCE) {
-				btrfs_info(fs_info,
-				"balance: force reducing metadata integrity");
-			} else {
-				btrfs_err(fs_info,
-	"balance: reduces metadata integrity, use --force if you want this");
-				ret = -EINVAL;
-				goto out;
-			}
-		}
+		     !(bctl->meta.target & allowed)))
+			reducing_integrity = true;
+		else
+			reducing_integrity = false;
+
+		/* if we're not converting, the target field is uninitialized */
+		meta_target = (bctl->meta.flags & BTRFS_BALANCE_ARGS_CONVERT) ?
+			bctl->meta.target : fs_info->avail_metadata_alloc_bits;
+		data_target = (bctl->data.flags & BTRFS_BALANCE_ARGS_CONVERT) ?
+			bctl->data.target : fs_info->avail_data_alloc_bits;
 	} while (read_seqretry(&fs_info->profiles_lock, seq));
 
-	/* if we're not converting, the target field is uninitialized */
-	meta_target = (bctl->meta.flags & BTRFS_BALANCE_ARGS_CONVERT) ?
-		bctl->meta.target : fs_info->avail_metadata_alloc_bits;
-	data_target = (bctl->data.flags & BTRFS_BALANCE_ARGS_CONVERT) ?
-		bctl->data.target : fs_info->avail_data_alloc_bits;
+	if (reducing_integrity) {
+		if (bctl->flags & BTRFS_BALANCE_FORCE) {
+			btrfs_info(fs_info,
+				   "balance: force reducing metadata integrity");
+		} else {
+			btrfs_err(fs_info,
+	  "balance: reduces metadata integrity, use --force if you want this");
+			ret = -EINVAL;
+			goto out;
+		}
+	}
+
 	if (btrfs_get_num_tolerated_disk_barrier_failures(meta_target) <
 		btrfs_get_num_tolerated_disk_barrier_failures(data_target)) {
 		int meta_index = btrfs_bg_flags_to_raid_index(meta_target);
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH AUTOSEL 4.20 076/117] btrfs: improve error handling of btrfs_add_link
       [not found] <20190108192628.121270-1-sashal@kernel.org>
                   ` (4 preceding siblings ...)
  2019-01-08 19:25 ` [PATCH AUTOSEL 4.20 075/117] Btrfs: fix access to available allocation bits when starting balance Sasha Levin
@ 2019-01-08 19:25 ` Sasha Levin
  5 siblings, 0 replies; 13+ messages in thread
From: Sasha Levin @ 2019-01-08 19:25 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Johannes Thumshirn, David Sterba, Sasha Levin, linux-btrfs

From: Johannes Thumshirn <jthumshirn@suse.de>

[ Upstream commit 1690dd41e0cb1dade80850ed8a3eb0121b96d22f ]

In the error handling block, err holds the return value of either
btrfs_del_root_ref() or btrfs_del_inode_ref() but it hasn't been checked
since it's introduction with commit fe66a05a0679 (Btrfs: improve error
handling for btrfs_insert_dir_item callers) in 2012.

If the error handling in the error handling fails, there's not much left
to do and the abort either happened earlier in the callees or is
necessary here.

So if one of btrfs_del_root_ref() or btrfs_del_inode_ref() failed, abort
the transaction, but still return the original code of the failure
stored in 'ret' as this will be reported to the user.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/btrfs/inode.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 9ea4c6f0352f..ff9268a60eaf 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6406,14 +6406,19 @@ int btrfs_add_link(struct btrfs_trans_handle *trans,
 		err = btrfs_del_root_ref(trans, key.objectid,
 					 root->root_key.objectid, parent_ino,
 					 &local_index, name, name_len);
-
+		if (err)
+			btrfs_abort_transaction(trans, err);
 	} else if (add_backref) {
 		u64 local_index;
 		int err;
 
 		err = btrfs_del_inode_ref(trans, root, name, name_len,
 					  ino, parent_ino, &local_index);
+		if (err)
+			btrfs_abort_transaction(trans, err);
 	}
+
+	/* Return the original error code */
 	return ret;
 }
 
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH AUTOSEL 4.20 072/117] btrfs: alloc_chunk: fix more DUP stripe size handling
  2019-01-08 19:25 ` [PATCH AUTOSEL 4.20 072/117] btrfs: alloc_chunk: fix more DUP stripe size handling Sasha Levin
@ 2019-01-08 23:52   ` Hans van Kranenburg
  2019-01-23 14:37     ` Sasha Levin
  0 siblings, 1 reply; 13+ messages in thread
From: Hans van Kranenburg @ 2019-01-08 23:52 UTC (permalink / raw)
  To: Sasha Levin, linux-kernel, stable; +Cc: David Sterba, linux-btrfs

Hi Sasha,

On 1/8/19 8:25 PM, Sasha Levin wrote:
> From: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
> 
> [ Upstream commit baf92114c7e6dd6124aa3d506e4bc4b694da3bc3 ]
> 
> Commit 92e222df7b "btrfs: alloc_chunk: fix DUP stripe size handling"
> fixed calculating the stripe_size for a new DUP chunk.

That one also ended up as:

4.14-stable
0136bd7238b2cb8238426af4183ed0b02165c3f9

4.9-stable
8890bae03f4dba1c2292e5445682b556af4e8f1b

4.4-stable
97c3e46ef53748278286fc09dcc30b138d6677c4

3.16.57-rc1
f68f46284a199f6837c1d5b94a6ae979a2cc463c

While hitting the failure condition without adding "crafting" steps to
make it exactly match the scenario is unlikely, it might be good if we
just go all the way back with this regression fix?

Thanks,
Hans

> However, the same calculation reappears a bit later, and that one was
> not changed yet. The resulting bug that is exposed is that the newly
> allocated device extents ('stripes') can have a few MiB overlap with the
> next thing stored after them, which is another device extent or the end
> of the disk.
> 
> The scenario in which this can happen is:
> * The block device for the filesystem is less than 10GiB in size.
> * The amount of contiguous free unallocated disk space chosen to use for
>   chunk allocation is 20% of the total device size, or a few MiB more or
>   less.
> 
> An example:
> - The filesystem device is 7880MiB (max_chunk_size gets set to 788MiB)
> - There's 1578MiB unallocated raw disk space left in one contiguous
>   piece.
> 
> In this case stripe_size is first calculated as 789MiB, (half of
> 1578MiB).
> 
> Since 789MiB (stripe_size * data_stripes) > 788MiB (max_chunk_size), we
> enter the if block. Now stripe_size value is immediately overwritten
> while calculating an adjusted value based on max_chunk_size, which ends
> up as 788MiB.
> 
> Next, the value is rounded up to a 16MiB boundary, 800MiB, which is
> actually more than the value we had before. However, the last comparison
> fails to detect this, because it's comparing the value with the total
> amount of free space, which is about twice the size of stripe_size.
> 
> In the example above, this means that the resulting raw disk space being
> allocated is 1600MiB, while only a gap of 1578MiB has been found. The
> second device extent object for this DUP chunk will overlap for 22MiB
> with whatever comes next.
> 
> The underlying problem here is that the stripe_size is reused all the
> time for different things. So, when entering the code in the if block,
> stripe_size is immediately overwritten with something else. If later we
> decide we want to have the previous value back, then the logic to
> compute it was copy pasted in again.
> 
> With this change, the value in stripe_size is not unnecessarily
> destroyed, so the duplicated calculation is not needed any more.
> 
> Signed-off-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
> Signed-off-by: David Sterba <dsterba@suse.com>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
>  fs/btrfs/volumes.c | 16 +++++++---------
>  1 file changed, 7 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index a567ee0bf060..1797a82eb7df 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -4768,19 +4768,17 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>  	/*
>  	 * Use the number of data stripes to figure out how big this chunk
>  	 * is really going to be in terms of logical address space,
> -	 * and compare that answer with the max chunk size
> +	 * and compare that answer with the max chunk size. If it's higher,
> +	 * we try to reduce stripe_size.
>  	 */
>  	if (stripe_size * data_stripes > max_chunk_size) {
> -		stripe_size = div_u64(max_chunk_size, data_stripes);
> -
> -		/* bump the answer up to a 16MB boundary */
> -		stripe_size = round_up(stripe_size, SZ_16M);
> -
>  		/*
> -		 * But don't go higher than the limits we found while searching
> -		 * for free extents
> +		 * Reduce stripe_size, round it up to a 16MB boundary again and
> +		 * then use it, unless it ends up being even bigger than the
> +		 * previous value we had already.
>  		 */
> -		stripe_size = min(devices_info[ndevs - 1].max_avail,
> +		stripe_size = min(round_up(div_u64(max_chunk_size,
> +						   data_stripes), SZ_16M),
>  				  stripe_size);
>  	}
>  
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH AUTOSEL 4.20 072/117] btrfs: alloc_chunk: fix more DUP stripe size handling
  2019-01-08 23:52   ` Hans van Kranenburg
@ 2019-01-23 14:37     ` Sasha Levin
  2019-01-23 15:54       ` Hans van Kranenburg
  0 siblings, 1 reply; 13+ messages in thread
From: Sasha Levin @ 2019-01-23 14:37 UTC (permalink / raw)
  To: Hans van Kranenburg; +Cc: linux-kernel, stable, David Sterba, linux-btrfs

On Tue, Jan 08, 2019 at 11:52:02PM +0000, Hans van Kranenburg wrote:
>Hi Sasha,
>
>On 1/8/19 8:25 PM, Sasha Levin wrote:
>> From: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
>>
>> [ Upstream commit baf92114c7e6dd6124aa3d506e4bc4b694da3bc3 ]
>>
>> Commit 92e222df7b "btrfs: alloc_chunk: fix DUP stripe size handling"
>> fixed calculating the stripe_size for a new DUP chunk.
>
>That one also ended up as:
>
>4.14-stable
>0136bd7238b2cb8238426af4183ed0b02165c3f9
>
>4.9-stable
>8890bae03f4dba1c2292e5445682b556af4e8f1b
>
>4.4-stable
>97c3e46ef53748278286fc09dcc30b138d6677c4
>
>3.16.57-rc1
>f68f46284a199f6837c1d5b94a6ae979a2cc463c
>
>While hitting the failure condition without adding "crafting" steps to
>make it exactly match the scenario is unlikely, it might be good if we
>just go all the way back with this regression fix?

What do you mean with "all the way back"?

--
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH AUTOSEL 4.20 072/117] btrfs: alloc_chunk: fix more DUP stripe size handling
  2019-01-23 14:37     ` Sasha Levin
@ 2019-01-23 15:54       ` Hans van Kranenburg
  2019-01-23 17:41         ` David Sterba
  2019-01-23 18:18         ` Sasha Levin
  0 siblings, 2 replies; 13+ messages in thread
From: Hans van Kranenburg @ 2019-01-23 15:54 UTC (permalink / raw)
  To: Sasha Levin; +Cc: linux-kernel, stable, David Sterba, linux-btrfs

On 1/23/19 3:37 PM, Sasha Levin wrote:
> On Tue, Jan 08, 2019 at 11:52:02PM +0000, Hans van Kranenburg wrote:
>> Hi Sasha,
>>
>> On 1/8/19 8:25 PM, Sasha Levin wrote:
>>> From: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
>>>
>>> [ Upstream commit baf92114c7e6dd6124aa3d506e4bc4b694da3bc3 ]
>>>
>>> Commit 92e222df7b "btrfs: alloc_chunk: fix DUP stripe size handling"
>>> fixed calculating the stripe_size for a new DUP chunk.
>>
>> That one also ended up as:
>>
>> 4.14-stable
>> 0136bd7238b2cb8238426af4183ed0b02165c3f9
>>
>> 4.9-stable
>> 8890bae03f4dba1c2292e5445682b556af4e8f1b
>>
>> 4.4-stable
>> 97c3e46ef53748278286fc09dcc30b138d6677c4
>>
>> 3.16.57-rc1
>> f68f46284a199f6837c1d5b94a6ae979a2cc463c
>>
>> While hitting the failure condition without adding "crafting" steps to
>> make it exactly match the scenario is unlikely, it might be good if we
>> just go all the way back with this regression fix?
> 
> What do you mean with "all the way back"?

Oh, apologies for not using unambigious phrasing.

I mean, it seems the autoselection only found 92e222df7b in places where
it's actually called 92e222df7b, and not where it was cherry-picked.

So, for my own understanding: If I have to do something like this ever
again, then should I have added it like this inside baf92114c?

Fixes: 92e222df7b ("btrfs: alloc_chunk: fix DUP stripe size handling")
Fixes: 0136bd7238 ("btrfs: alloc_chunk: fix DUP stripe size handling")
Fixes: 8890bae03f ("btrfs: alloc_chunk: fix DUP stripe size handling")
Fixes: 97c3e46ef5 ("btrfs: alloc_chunk: fix DUP stripe size handling")
Fixes: f68f46284a ("btrfs: alloc_chunk: fix DUP stripe size handling")

Thanks for your patience, :)

-- 
Hans van Kranenburg

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH AUTOSEL 4.20 072/117] btrfs: alloc_chunk: fix more DUP stripe size handling
  2019-01-23 15:54       ` Hans van Kranenburg
@ 2019-01-23 17:41         ` David Sterba
  2019-01-23 18:18         ` Sasha Levin
  1 sibling, 0 replies; 13+ messages in thread
From: David Sterba @ 2019-01-23 17:41 UTC (permalink / raw)
  To: Hans van Kranenburg
  Cc: Sasha Levin, linux-kernel, stable, David Sterba, linux-btrfs

On Wed, Jan 23, 2019 at 03:54:00PM +0000, Hans van Kranenburg wrote:
> On 1/23/19 3:37 PM, Sasha Levin wrote:
> > On Tue, Jan 08, 2019 at 11:52:02PM +0000, Hans van Kranenburg wrote:
> >> Hi Sasha,
> >>
> >> On 1/8/19 8:25 PM, Sasha Levin wrote:
> >>> From: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
> >>>
> >>> [ Upstream commit baf92114c7e6dd6124aa3d506e4bc4b694da3bc3 ]
> >>>
> >>> Commit 92e222df7b "btrfs: alloc_chunk: fix DUP stripe size handling"
> >>> fixed calculating the stripe_size for a new DUP chunk.
> >>
> >> That one also ended up as:
> >>
> >> 4.14-stable
> >> 0136bd7238b2cb8238426af4183ed0b02165c3f9
> >>
> >> 4.9-stable
> >> 8890bae03f4dba1c2292e5445682b556af4e8f1b
> >>
> >> 4.4-stable
> >> 97c3e46ef53748278286fc09dcc30b138d6677c4
> >>
> >> 3.16.57-rc1
> >> f68f46284a199f6837c1d5b94a6ae979a2cc463c
> >>
> >> While hitting the failure condition without adding "crafting" steps to
> >> make it exactly match the scenario is unlikely, it might be good if we
> >> just go all the way back with this regression fix?
> > 
> > What do you mean with "all the way back"?
> 
> Oh, apologies for not using unambigious phrasing.
> 
> I mean, it seems the autoselection only found 92e222df7b in places where
> it's actually called 92e222df7b, and not where it was cherry-picked.
> 
> So, for my own understanding: If I have to do something like this ever
> again, then should I have added it like this inside baf92114c?
> 
> Fixes: 92e222df7b ("btrfs: alloc_chunk: fix DUP stripe size handling")
> Fixes: 0136bd7238 ("btrfs: alloc_chunk: fix DUP stripe size handling")
> Fixes: 8890bae03f ("btrfs: alloc_chunk: fix DUP stripe size handling")
> Fixes: 97c3e46ef5 ("btrfs: alloc_chunk: fix DUP stripe size handling")
> Fixes: f68f46284a ("btrfs: alloc_chunk: fix DUP stripe size handling")

The Fixes: tag should always refer to a commit id in Linus' tree, the
stable tree backports use that for common reference so the individual
commit ids of stable trees are not relevant.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH AUTOSEL 4.20 072/117] btrfs: alloc_chunk: fix more DUP stripe size handling
  2019-01-23 15:54       ` Hans van Kranenburg
  2019-01-23 17:41         ` David Sterba
@ 2019-01-23 18:18         ` Sasha Levin
  2019-01-23 19:32           ` Hans van Kranenburg
  1 sibling, 1 reply; 13+ messages in thread
From: Sasha Levin @ 2019-01-23 18:18 UTC (permalink / raw)
  To: Hans van Kranenburg; +Cc: linux-kernel, stable, David Sterba, linux-btrfs

On Wed, Jan 23, 2019 at 03:54:00PM +0000, Hans van Kranenburg wrote:
>On 1/23/19 3:37 PM, Sasha Levin wrote:
>> On Tue, Jan 08, 2019 at 11:52:02PM +0000, Hans van Kranenburg wrote:
>>> Hi Sasha,
>>>
>>> On 1/8/19 8:25 PM, Sasha Levin wrote:
>>>> From: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
>>>>
>>>> [ Upstream commit baf92114c7e6dd6124aa3d506e4bc4b694da3bc3 ]
>>>>
>>>> Commit 92e222df7b "btrfs: alloc_chunk: fix DUP stripe size handling"
>>>> fixed calculating the stripe_size for a new DUP chunk.
>>>
>>> That one also ended up as:
>>>
>>> 4.14-stable
>>> 0136bd7238b2cb8238426af4183ed0b02165c3f9
>>>
>>> 4.9-stable
>>> 8890bae03f4dba1c2292e5445682b556af4e8f1b
>>>
>>> 4.4-stable
>>> 97c3e46ef53748278286fc09dcc30b138d6677c4
>>>
>>> 3.16.57-rc1
>>> f68f46284a199f6837c1d5b94a6ae979a2cc463c
>>>
>>> While hitting the failure condition without adding "crafting" steps to
>>> make it exactly match the scenario is unlikely, it might be good if we
>>> just go all the way back with this regression fix?
>>
>> What do you mean with "all the way back"?
>
>Oh, apologies for not using unambigious phrasing.
>
>I mean, it seems the autoselection only found 92e222df7b in places where
>it's actually called 92e222df7b, and not where it was cherry-picked.
>
>So, for my own understanding: If I have to do something like this ever
>again, then should I have added it like this inside baf92114c?
>
>Fixes: 92e222df7b ("btrfs: alloc_chunk: fix DUP stripe size handling")
>Fixes: 0136bd7238 ("btrfs: alloc_chunk: fix DUP stripe size handling")
>Fixes: 8890bae03f ("btrfs: alloc_chunk: fix DUP stripe size handling")
>Fixes: 97c3e46ef5 ("btrfs: alloc_chunk: fix DUP stripe size handling")
>Fixes: f68f46284a ("btrfs: alloc_chunk: fix DUP stripe size handling")
>
>Thanks for your patience, :)

Ah, the scripts have enough "brains" to deal with these on their own, so
no need to annotate that much.

This patch wasn't applied to older trees because it didn't cherry-pick
cleanly on top of them. Looking at it now, it seems to depend on
793ff2c88c6 ("btrfs: volumes: Cleanup stripe size calculation") which
can possibly be picked up if it makes sense.

--
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH AUTOSEL 4.20 072/117] btrfs: alloc_chunk: fix more DUP stripe size handling
  2019-01-23 18:18         ` Sasha Levin
@ 2019-01-23 19:32           ` Hans van Kranenburg
  2019-11-19 15:23             ` Ben Hutchings
  0 siblings, 1 reply; 13+ messages in thread
From: Hans van Kranenburg @ 2019-01-23 19:32 UTC (permalink / raw)
  To: Sasha Levin
  Cc: linux-kernel, stable, David Sterba, linux-btrfs, Ben Hutchings

[-- Attachment #1: Type: text/plain, Size: 3110 bytes --]

(Add to Cc: Ben Hutchings)

On 1/23/19 7:18 PM, Sasha Levin wrote:
> On Wed, Jan 23, 2019 at 03:54:00PM +0000, Hans van Kranenburg wrote:
>> On 1/23/19 3:37 PM, Sasha Levin wrote:
>>> On Tue, Jan 08, 2019 at 11:52:02PM +0000, Hans van Kranenburg wrote:
>>>> Hi Sasha,
>>>>
>>>> On 1/8/19 8:25 PM, Sasha Levin wrote:
>>>>> From: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
>>>>>
>>>>> [ Upstream commit baf92114c7e6dd6124aa3d506e4bc4b694da3bc3 ]
>>>>>
>>>>> Commit 92e222df7b "btrfs: alloc_chunk: fix DUP stripe size handling"
>>>>> fixed calculating the stripe_size for a new DUP chunk.
>>>>
>>>> That one also ended up as:
>>>>
>>>> 4.14-stable
>>>> 0136bd7238b2cb8238426af4183ed0b02165c3f9
>>>>
>>>> 4.9-stable
>>>> 8890bae03f4dba1c2292e5445682b556af4e8f1b
>>>>
>>>> 4.4-stable
>>>> 97c3e46ef53748278286fc09dcc30b138d6677c4
>>>>
>>>> 3.16.57-rc1
>>>> f68f46284a199f6837c1d5b94a6ae979a2cc463c
>>>>
>>>> While hitting the failure condition without adding "crafting" steps to
>>>> make it exactly match the scenario is unlikely, it might be good if we
>>>> just go all the way back with this regression fix?
>>>
>>> What do you mean with "all the way back"?
>>
>> Oh, apologies for not using unambigious phrasing.
>>
>> I mean, it seems the autoselection only found 92e222df7b in places where
>> it's actually called 92e222df7b, and not where it was cherry-picked.
>>
>> So, for my own understanding: If I have to do something like this ever
>> again, then should I have added it like this inside baf92114c?
>>
>> Fixes: 92e222df7b ("btrfs: alloc_chunk: fix DUP stripe size handling")
>> Fixes: 0136bd7238 ("btrfs: alloc_chunk: fix DUP stripe size handling")
>> Fixes: 8890bae03f ("btrfs: alloc_chunk: fix DUP stripe size handling")
>> Fixes: 97c3e46ef5 ("btrfs: alloc_chunk: fix DUP stripe size handling")
>> Fixes: f68f46284a ("btrfs: alloc_chunk: fix DUP stripe size handling")
>>
>> Thanks for your patience, :)
> 
> Ah, the scripts have enough "brains" to deal with these on their own, so
> no need to annotate that much.
> 
> This patch wasn't applied to older trees because it didn't cherry-pick
> cleanly on top of them. Looking at it now, it seems to depend on
> 793ff2c88c6 ("btrfs: volumes: Cleanup stripe size calculation") which
> can possibly be picked up if it makes sense.

Ok, I get it.

The changes are really limited to the few lines in that if block. And
793ff2c88c6 also is, and it doesn't change the behavior, so that's good.

For 3.16.y it also first needs a little part of b8b93addde from David
Sterba, which is a collection of coding style changes in quite some
places. After that 793ff2c88c6 and then baf92114c7 apply cleanly, only
touch that single part of the code and end up with the right thing.

I attached 3.16.y-WIP-partially-apply-b8b93addde.patch as quick and
dirty example, please let me know how this should be done properly.

For 4.4, 4.9 and 4.14, first 793ff2c88c6 and then baf92114c7 indeed also
end up with the right thing. I just tried that here.

Thanks,
-- 
Hans van Kranenburg

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 3.16.y-WIP-partially-apply-b8b93addde.patch --]
[-- Type: text/x-patch; name="3.16.y-WIP-partially-apply-b8b93addde.patch", Size: 871 bytes --]

From f56c231f4251fd2cae5e5ab3ccc39e57c57c83ca Mon Sep 17 00:00:00 2001
From: Hans van Kranenburg <hans@knorrie.org>
Date: Wed, 23 Jan 2019 20:03:49 +0100
Subject: [PATCH] WIP partially apply b8b93addde

---
 fs/btrfs/volumes.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 4aa1a20fc5d7..b4b98a75ca8b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4274,8 +4274,8 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	 */
 	if (stripe_size * data_stripes > max_chunk_size) {
 		u64 mask = (1ULL << 24) - 1;
-		stripe_size = max_chunk_size;
-		do_div(stripe_size, data_stripes);
+
+		stripe_size = div_u64(max_chunk_size, data_stripes);
 
 		/* bump the answer up to a 16MB boundary */
 		stripe_size = (stripe_size + mask) & ~mask;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH AUTOSEL 4.20 072/117] btrfs: alloc_chunk: fix more DUP stripe size handling
  2019-01-23 19:32           ` Hans van Kranenburg
@ 2019-11-19 15:23             ` Ben Hutchings
  0 siblings, 0 replies; 13+ messages in thread
From: Ben Hutchings @ 2019-11-19 15:23 UTC (permalink / raw)
  To: Hans van Kranenburg, Sasha Levin
  Cc: linux-kernel, stable, David Sterba, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 815 bytes --]

On Wed, 2019-01-23 at 19:32 +0000, Hans van Kranenburg wrote:
[...]
> For 3.16.y it also first needs a little part of b8b93addde from David
> Sterba, which is a collection of coding style changes in quite some
> places. After that 793ff2c88c6 and then baf92114c7 apply cleanly, only
> touch that single part of the code and end up with the right thing.
> 
> I attached 3.16.y-WIP-partially-apply-b8b93addde.patch as quick and
> dirty example, please let me know how this should be done properly.
> 
> For 4.4, 4.9 and 4.14, first 793ff2c88c6 and then baf92114c7 indeed also
> end up with the right thing. I just tried that here.

I've belatedly queued up the required changes for 3.16, thanks.

Ben.

-- 
Ben Hutchings
Theory and practice are closer in theory than in practice - John Levine



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2019-11-19 15:23 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20190108192628.121270-1-sashal@kernel.org>
2019-01-08 19:25 ` [PATCH AUTOSEL 4.20 071/117] btrfs: volumes: Make sure there is no overlap of dev extents at mount time Sasha Levin
2019-01-08 19:25 ` [PATCH AUTOSEL 4.20 072/117] btrfs: alloc_chunk: fix more DUP stripe size handling Sasha Levin
2019-01-08 23:52   ` Hans van Kranenburg
2019-01-23 14:37     ` Sasha Levin
2019-01-23 15:54       ` Hans van Kranenburg
2019-01-23 17:41         ` David Sterba
2019-01-23 18:18         ` Sasha Levin
2019-01-23 19:32           ` Hans van Kranenburg
2019-11-19 15:23             ` Ben Hutchings
2019-01-08 19:25 ` [PATCH AUTOSEL 4.20 073/117] btrfs: fix use-after-free due to race between replace start and cancel Sasha Levin
2019-01-08 19:25 ` [PATCH AUTOSEL 4.20 074/117] Btrfs: fix deadlock when enabling quotas due to concurrent snapshot creation Sasha Levin
2019-01-08 19:25 ` [PATCH AUTOSEL 4.20 075/117] Btrfs: fix access to available allocation bits when starting balance Sasha Levin
2019-01-08 19:25 ` [PATCH AUTOSEL 4.20 076/117] btrfs: improve error handling of btrfs_add_link Sasha Levin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).