[PATCH RFC 0/2] btrfs: make ticket wait uninterruptible to address unexpected RO during balance

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH RFC 0/2] btrfs: make ticket wait uninterruptible to address unexpected RO during balance
@ 2020-07-06  7:44 Qu Wenruo
  2020-07-06  7:44 ` [PATCH RFC 1/2] btrfs: relocation: Allow signal to cancel balance Qu Wenruo
  2020-07-06  7:44 ` [PATCH RFC 2/2] btrfs: space-info: Don't allow signal to interrupt ticket waiting Qu Wenruo
  0 siblings, 2 replies; 12+ messages in thread
From: Qu Wenruo @ 2020-07-06  7:44 UTC (permalink / raw)
  To: linux-btrfs

There is a report that, unlucky signal timing during balance can cause
btrfs to remounted into RO mode.

This is caused by the fact that, most btrfs_start_transaction() or
delalloc metadata reserve are interruptible.

That would return -EINTR to a lot of critical code section, and under
most case, our way to handle such error is just to abort transaction,
without any consideration for -EINTR.

This is never a good idea to allow random Ctrl-C to make btrfs RO, even
if the window is pretty small for regular operations.

This patchset will address it in a different direction, since most
operations are pretty fast, we don't need that signal check in waiting
ticket.

For those long running operations, signal should be checked in their
call sites.
E.g. __generic_block_fiemap() calls fatal_signal_pending() to check if
it needs to exit, so does btrfs_clone().

We shouldn't check the signal, and just throw a -EINTR for all ticketing
system callers, they don't really want to handle that damn -EINTR.

Only long executing operations really need that signal checking, and let
them to check, not the infrastructure.

Reason for RFC:
I'm not yet completely sure if uninterruptible ticketing system would
cause extra problems.
Any advice on that would be great.

Qu Wenruo (2):
  btrfs: relocation: Allow signal to cancel balance
  btrfs: space-info: Don't allow signal to interrupt ticket waiting

 fs/btrfs/relocation.c | 3 ++-
 fs/btrfs/space-info.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

-- 
2.27.0

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH RFC 1/2] btrfs: relocation: Allow signal to cancel balance
  2020-07-06  7:44 [PATCH RFC 0/2] btrfs: make ticket wait uninterruptible to address unexpected RO during balance Qu Wenruo
@ 2020-07-06  7:44 ` Qu Wenruo
  2020-07-06 13:45   ` Josef Bacik
  2020-07-06 18:19   ` Hans van Kranenburg
  2020-07-06  7:44 ` [PATCH RFC 2/2] btrfs: space-info: Don't allow signal to interrupt ticket waiting Qu Wenruo
  1 sibling, 2 replies; 12+ messages in thread
From: Qu Wenruo @ 2020-07-06  7:44 UTC (permalink / raw)
  To: linux-btrfs

Although btrfs balance can be canceled with "btrfs balance cancel"
command, it's still almost muscle memory to press Ctrl-C to cancel a
long running btrfs balance.

So allow btrfs balance to check signal to determine if it should exit.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/relocation.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 523d2e5fab8f..2b869fb2e62c 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2656,7 +2656,8 @@ int setup_extent_mapping(struct inode *inode, u64 start, u64 end,
  */
 int btrfs_should_cancel_balance(struct btrfs_fs_info *fs_info)
 {
-	return atomic_read(&fs_info->balance_cancel_req);
+	return atomic_read(&fs_info->balance_cancel_req) ||
+		fatal_signal_pending(current);
 }
 ALLOW_ERROR_INJECTION(btrfs_should_cancel_balance, TRUE);
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH RFC 2/2] btrfs: space-info: Don't allow signal to interrupt ticket waiting
  2020-07-06  7:44 [PATCH RFC 0/2] btrfs: make ticket wait uninterruptible to address unexpected RO during balance Qu Wenruo
  2020-07-06  7:44 ` [PATCH RFC 1/2] btrfs: relocation: Allow signal to cancel balance Qu Wenruo
@ 2020-07-06  7:44 ` Qu Wenruo
  2020-07-06 13:45   ` Josef Bacik
  1 sibling, 1 reply; 12+ messages in thread
From: Qu Wenruo @ 2020-07-06  7:44 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Hans van Kranenburg

[BUG]
When balance receive a fatal signal, it can make the fs to read-only
mode if the timing is unlucky enough:

  BTRFS info (device xvdb): balance: start -d -m -s
  BTRFS info (device xvdb): relocating block group 73001861120 flags metadata
  BTRFS info (device xvdb): found 12236 extents, stage: move data extents
  BTRFS info (device xvdb): relocating block group 71928119296 flags data
  BTRFS info (device xvdb): found 3 extents, stage: move data extents
  BTRFS info (device xvdb): found 3 extents, stage: update data pointers
  BTRFS info (device xvdb): relocating block group 60922265600 flags metadata
  BTRFS: error (device xvdb) in btrfs_drop_snapshot:5505: errno=-4 unknown
  BTRFS info (device xvdb): forced readonly
  BTRFS info (device xvdb): balance: ended with status: -4

[CAUSE]
This is caused by the fact that btrfs ticketing space system can be
interrupted, and cause all kind of -EINTR returned to various critical
section, where we never thought of -EINTR at all.

Even for things like btrfs_start_transaction() can be affected by
signal:
 btrfs_start_transaction()
 |- start_transaction(flush = FLUSH_ALL)
    |- btrfs_block_rsv_add()
       |- btrfs_reserve_metadata_bytes()
          |- __reserve_metadata_bytes()
             |- handle_reserve_ticket()
                |- wait_reserve_ticket()
                   |- prepare_to_wait_event(TASK_KILLABLE)
                   |- ticket->error = -EINTR;

And all related callers get -EINTR error.

In fact, there are really very limited call sites can really handle that
-EINTR properly, above btrfs_drop_snapshot() is one case.

[FIX]
Things like metadata allocation is really a critical section for btrfs,
we don't really want it to be that killable by some impatient users.

In fact, for really long duration calls, it should have their own checks
on signal, like balance, reflink, generic fiemap calls.

So this patch will make ticket waiting uninterruptible, relying on each
long duration calls to handle their signals more properly.

Reported-by: Hans van Kranenburg <hans@knorrie.org>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/space-info.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index c7bd3fdd7792..c5cfc759b804 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -1099,7 +1099,8 @@ static void wait_reserve_ticket(struct btrfs_fs_info *fs_info,
 
 	spin_lock(&space_info->lock);
 	while (ticket->bytes > 0 && ticket->error == 0) {
-		ret = prepare_to_wait_event(&ticket->wait, &wait, TASK_KILLABLE);
+		ret = prepare_to_wait_event(&ticket->wait, &wait,
+					    TASK_UNINTERRUPTIBLE);
 		if (ret) {
 			/*
 			 * Delete us from the list. After we unlock the space
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC 2/2] btrfs: space-info: Don't allow signal to interrupt ticket waiting
  2020-07-06  7:44 ` [PATCH RFC 2/2] btrfs: space-info: Don't allow signal to interrupt ticket waiting Qu Wenruo
@ 2020-07-06 13:45   ` Josef Bacik
  2020-07-06 13:50     ` Qu Wenruo
  0 siblings, 1 reply; 12+ messages in thread
From: Josef Bacik @ 2020-07-06 13:45 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: Hans van Kranenburg

On 7/6/20 3:44 AM, Qu Wenruo wrote:
> [BUG]
> When balance receive a fatal signal, it can make the fs to read-only
> mode if the timing is unlucky enough:
> 
>    BTRFS info (device xvdb): balance: start -d -m -s
>    BTRFS info (device xvdb): relocating block group 73001861120 flags metadata
>    BTRFS info (device xvdb): found 12236 extents, stage: move data extents
>    BTRFS info (device xvdb): relocating block group 71928119296 flags data
>    BTRFS info (device xvdb): found 3 extents, stage: move data extents
>    BTRFS info (device xvdb): found 3 extents, stage: update data pointers
>    BTRFS info (device xvdb): relocating block group 60922265600 flags metadata
>    BTRFS: error (device xvdb) in btrfs_drop_snapshot:5505: errno=-4 unknown
>    BTRFS info (device xvdb): forced readonly
>    BTRFS info (device xvdb): balance: ended with status: -4
> 
> [CAUSE]
> This is caused by the fact that btrfs ticketing space system can be
> interrupted, and cause all kind of -EINTR returned to various critical
> section, where we never thought of -EINTR at all.
> 
> Even for things like btrfs_start_transaction() can be affected by
> signal:
>   btrfs_start_transaction()
>   |- start_transaction(flush = FLUSH_ALL)
>      |- btrfs_block_rsv_add()
>         |- btrfs_reserve_metadata_bytes()
>            |- __reserve_metadata_bytes()
>               |- handle_reserve_ticket()
>                  |- wait_reserve_ticket()
>                     |- prepare_to_wait_event(TASK_KILLABLE)
>                     |- ticket->error = -EINTR;
> 
> And all related callers get -EINTR error.
> 
> In fact, there are really very limited call sites can really handle that
> -EINTR properly, above btrfs_drop_snapshot() is one case.
> 
> [FIX]
> Things like metadata allocation is really a critical section for btrfs,
> we don't really want it to be that killable by some impatient users.
> 
> In fact, for really long duration calls, it should have their own checks
> on signal, like balance, reflink, generic fiemap calls.
> 
> So this patch will make ticket waiting uninterruptible, relying on each
> long duration calls to handle their signals more properly.
> 

Nope, everybody that calls start_transaction() should be able to handle -EINTR, 
so we need to find those callsites and fix them, not make it so we hang the box 
because we don't like fixing error paths.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC 1/2] btrfs: relocation: Allow signal to cancel balance
  2020-07-06  7:44 ` [PATCH RFC 1/2] btrfs: relocation: Allow signal to cancel balance Qu Wenruo
@ 2020-07-06 13:45   ` Josef Bacik
  2020-07-06 18:19   ` Hans van Kranenburg
  1 sibling, 0 replies; 12+ messages in thread
From: Josef Bacik @ 2020-07-06 13:45 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 7/6/20 3:44 AM, Qu Wenruo wrote:
> Although btrfs balance can be canceled with "btrfs balance cancel"
> command, it's still almost muscle memory to press Ctrl-C to cancel a
> long running btrfs balance.
> 
> So allow btrfs balance to check signal to determine if it should exit.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC 2/2] btrfs: space-info: Don't allow signal to interrupt ticket waiting
  2020-07-06 13:45   ` Josef Bacik
@ 2020-07-06 13:50     ` Qu Wenruo
  2020-07-06 13:53       ` Josef Bacik
  0 siblings, 1 reply; 12+ messages in thread
From: Qu Wenruo @ 2020-07-06 13:50 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs; +Cc: Hans van Kranenburg



On 2020/7/6 下午9:45, Josef Bacik wrote:
> On 7/6/20 3:44 AM, Qu Wenruo wrote:
>> [BUG]
>> When balance receive a fatal signal, it can make the fs to read-only
>> mode if the timing is unlucky enough:
>>
>>    BTRFS info (device xvdb): balance: start -d -m -s
>>    BTRFS info (device xvdb): relocating block group 73001861120 flags
>> metadata
>>    BTRFS info (device xvdb): found 12236 extents, stage: move data
>> extents
>>    BTRFS info (device xvdb): relocating block group 71928119296 flags
>> data
>>    BTRFS info (device xvdb): found 3 extents, stage: move data extents
>>    BTRFS info (device xvdb): found 3 extents, stage: update data pointers
>>    BTRFS info (device xvdb): relocating block group 60922265600 flags
>> metadata
>>    BTRFS: error (device xvdb) in btrfs_drop_snapshot:5505: errno=-4
>> unknown
>>    BTRFS info (device xvdb): forced readonly
>>    BTRFS info (device xvdb): balance: ended with status: -4
>>
>> [CAUSE]
>> This is caused by the fact that btrfs ticketing space system can be
>> interrupted, and cause all kind of -EINTR returned to various critical
>> section, where we never thought of -EINTR at all.
>>
>> Even for things like btrfs_start_transaction() can be affected by
>> signal:
>>   btrfs_start_transaction()
>>   |- start_transaction(flush = FLUSH_ALL)
>>      |- btrfs_block_rsv_add()
>>         |- btrfs_reserve_metadata_bytes()
>>            |- __reserve_metadata_bytes()
>>               |- handle_reserve_ticket()
>>                  |- wait_reserve_ticket()
>>                     |- prepare_to_wait_event(TASK_KILLABLE)
>>                     |- ticket->error = -EINTR;
>>
>> And all related callers get -EINTR error.
>>
>> In fact, there are really very limited call sites can really handle that
>> -EINTR properly, above btrfs_drop_snapshot() is one case.
>>
>> [FIX]
>> Things like metadata allocation is really a critical section for btrfs,
>> we don't really want it to be that killable by some impatient users.
>>
>> In fact, for really long duration calls, it should have their own checks
>> on signal, like balance, reflink, generic fiemap calls.
>>
>> So this patch will make ticket waiting uninterruptible, relying on each
>> long duration calls to handle their signals more properly.
>>
> 
> Nope, everybody that calls start_transaction() should be able to handle
> -EINTR, so we need to find those callsites and fix them, not make it so
> we hang the box because we don't like fixing error paths.  Thanks,

Then we also need to handle btrfs_delalloc_reserve_metadata(),
btrfs_block_rsv_refill(), btrfs_use_block_rsv(), btrfs_block_rsv_add().

Are you really willing to go to that rabbit hole?

To me, there are only limited call sites would benefit from signal
checking, while we need to handle tons of unnecessary possible -EINTR
errors just for no obvious benefits for other calls sites?

That doesn't sound sane to me.

Thanks,
Qu

> 
> Josef
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC 2/2] btrfs: space-info: Don't allow signal to interrupt ticket waiting
  2020-07-06 13:50     ` Qu Wenruo
@ 2020-07-06 13:53       ` Josef Bacik
  2020-07-06 14:05         ` Qu Wenruo
  0 siblings, 1 reply; 12+ messages in thread
From: Josef Bacik @ 2020-07-06 13:53 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: Hans van Kranenburg

On 7/6/20 9:50 AM, Qu Wenruo wrote:
> 
> 
> On 2020/7/6 下午9:45, Josef Bacik wrote:
>> On 7/6/20 3:44 AM, Qu Wenruo wrote:
>>> [BUG]
>>> When balance receive a fatal signal, it can make the fs to read-only
>>> mode if the timing is unlucky enough:
>>>
>>>     BTRFS info (device xvdb): balance: start -d -m -s
>>>     BTRFS info (device xvdb): relocating block group 73001861120 flags
>>> metadata
>>>     BTRFS info (device xvdb): found 12236 extents, stage: move data
>>> extents
>>>     BTRFS info (device xvdb): relocating block group 71928119296 flags
>>> data
>>>     BTRFS info (device xvdb): found 3 extents, stage: move data extents
>>>     BTRFS info (device xvdb): found 3 extents, stage: update data pointers
>>>     BTRFS info (device xvdb): relocating block group 60922265600 flags
>>> metadata
>>>     BTRFS: error (device xvdb) in btrfs_drop_snapshot:5505: errno=-4
>>> unknown
>>>     BTRFS info (device xvdb): forced readonly
>>>     BTRFS info (device xvdb): balance: ended with status: -4
>>>
>>> [CAUSE]
>>> This is caused by the fact that btrfs ticketing space system can be
>>> interrupted, and cause all kind of -EINTR returned to various critical
>>> section, where we never thought of -EINTR at all.
>>>
>>> Even for things like btrfs_start_transaction() can be affected by
>>> signal:
>>>    btrfs_start_transaction()
>>>    |- start_transaction(flush = FLUSH_ALL)
>>>       |- btrfs_block_rsv_add()
>>>          |- btrfs_reserve_metadata_bytes()
>>>             |- __reserve_metadata_bytes()
>>>                |- handle_reserve_ticket()
>>>                   |- wait_reserve_ticket()
>>>                      |- prepare_to_wait_event(TASK_KILLABLE)
>>>                      |- ticket->error = -EINTR;
>>>
>>> And all related callers get -EINTR error.
>>>
>>> In fact, there are really very limited call sites can really handle that
>>> -EINTR properly, above btrfs_drop_snapshot() is one case.
>>>
>>> [FIX]
>>> Things like metadata allocation is really a critical section for btrfs,
>>> we don't really want it to be that killable by some impatient users.
>>>
>>> In fact, for really long duration calls, it should have their own checks
>>> on signal, like balance, reflink, generic fiemap calls.
>>>
>>> So this patch will make ticket waiting uninterruptible, relying on each
>>> long duration calls to handle their signals more properly.
>>>
>>
>> Nope, everybody that calls start_transaction() should be able to handle
>> -EINTR, so we need to find those callsites and fix them, not make it so
>> we hang the box because we don't like fixing error paths.  Thanks,
> 
> Then we also need to handle btrfs_delalloc_reserve_metadata(),
> btrfs_block_rsv_refill(), btrfs_use_block_rsv(), btrfs_block_rsv_add().
> 

This only needs to be handled for FLUSH_ALL and FLUSH_STEAL.  Anybody doing 
btrfs_start_transaction() should be able to fail with -EINTR, because they 
should be close to the syscall path.  Balance needs to be fixed to deal with it, 
and I assume there might be a few other places, but by-in-large none of these 
places should flip read only.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC 2/2] btrfs: space-info: Don't allow signal to interrupt ticket waiting
  2020-07-06 13:53       ` Josef Bacik
@ 2020-07-06 14:05         ` Qu Wenruo
  2020-07-06 14:33           ` Josef Bacik
  0 siblings, 1 reply; 12+ messages in thread
From: Qu Wenruo @ 2020-07-06 14:05 UTC (permalink / raw)
  To: Josef Bacik, Qu Wenruo, linux-btrfs; +Cc: Hans van Kranenburg


[-- Attachment #1.1: Type: text/plain, Size: 4025 bytes --]



On 2020/7/6 下午9:53, Josef Bacik wrote:
> On 7/6/20 9:50 AM, Qu Wenruo wrote:
>>
>>
>> On 2020/7/6 下午9:45, Josef Bacik wrote:
>>> On 7/6/20 3:44 AM, Qu Wenruo wrote:
>>>> [BUG]
>>>> When balance receive a fatal signal, it can make the fs to read-only
>>>> mode if the timing is unlucky enough:
>>>>
>>>>     BTRFS info (device xvdb): balance: start -d -m -s
>>>>     BTRFS info (device xvdb): relocating block group 73001861120 flags
>>>> metadata
>>>>     BTRFS info (device xvdb): found 12236 extents, stage: move data
>>>> extents
>>>>     BTRFS info (device xvdb): relocating block group 71928119296 flags
>>>> data
>>>>     BTRFS info (device xvdb): found 3 extents, stage: move data extents
>>>>     BTRFS info (device xvdb): found 3 extents, stage: update data
>>>> pointers
>>>>     BTRFS info (device xvdb): relocating block group 60922265600 flags
>>>> metadata
>>>>     BTRFS: error (device xvdb) in btrfs_drop_snapshot:5505: errno=-4
>>>> unknown
>>>>     BTRFS info (device xvdb): forced readonly
>>>>     BTRFS info (device xvdb): balance: ended with status: -4
>>>>
>>>> [CAUSE]
>>>> This is caused by the fact that btrfs ticketing space system can be
>>>> interrupted, and cause all kind of -EINTR returned to various critical
>>>> section, where we never thought of -EINTR at all.
>>>>
>>>> Even for things like btrfs_start_transaction() can be affected by
>>>> signal:
>>>>    btrfs_start_transaction()
>>>>    |- start_transaction(flush = FLUSH_ALL)
>>>>       |- btrfs_block_rsv_add()
>>>>          |- btrfs_reserve_metadata_bytes()
>>>>             |- __reserve_metadata_bytes()
>>>>                |- handle_reserve_ticket()
>>>>                   |- wait_reserve_ticket()
>>>>                      |- prepare_to_wait_event(TASK_KILLABLE)
>>>>                      |- ticket->error = -EINTR;
>>>>
>>>> And all related callers get -EINTR error.
>>>>
>>>> In fact, there are really very limited call sites can really handle
>>>> that
>>>> -EINTR properly, above btrfs_drop_snapshot() is one case.
>>>>
>>>> [FIX]
>>>> Things like metadata allocation is really a critical section for btrfs,
>>>> we don't really want it to be that killable by some impatient users.
>>>>
>>>> In fact, for really long duration calls, it should have their own
>>>> checks
>>>> on signal, like balance, reflink, generic fiemap calls.
>>>>
>>>> So this patch will make ticket waiting uninterruptible, relying on each
>>>> long duration calls to handle their signals more properly.
>>>>
>>>
>>> Nope, everybody that calls start_transaction() should be able to handle
>>> -EINTR, so we need to find those callsites and fix them, not make it so
>>> we hang the box because we don't like fixing error paths.  Thanks,
>>
>> Then we also need to handle btrfs_delalloc_reserve_metadata(),
>> btrfs_block_rsv_refill(), btrfs_use_block_rsv(), btrfs_block_rsv_add().
>>
> 
> This only needs to be handled for FLUSH_ALL and FLUSH_STEAL.  Anybody
> doing btrfs_start_transaction() should be able to fail with -EINTR,
> because they should be close to the syscall path.  Balance needs to be
> fixed to deal with it, and I assume there might be a few other places,
> but by-in-large none of these places should flip read only.  Thanks,

There are already ones existing, for btrfs_drop_snapshot().

This is mostly caused by btrfs_delalloc_reserve_metadata(), which always
use FLUSH_ALL unless there is a running trans or its space cache inode.

But still, for a lot of relocation code, we don't really want to bother
the EINTR and just want uninterruptible at least for now.

Any idea for that?
Or just rework how we handle errors in a lot of places?

It doesn't look correct for such a low level mechanism to return -EINTR
while most of callers doesn't really want to bother.

Thanks,
Qu

> 
> Josef


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC 2/2] btrfs: space-info: Don't allow signal to interrupt ticket waiting
  2020-07-06 14:05         ` Qu Wenruo
@ 2020-07-06 14:33           ` Josef Bacik
  2020-07-07 16:16             ` David Sterba
  0 siblings, 1 reply; 12+ messages in thread
From: Josef Bacik @ 2020-07-06 14:33 UTC (permalink / raw)
  To: Qu Wenruo, Qu Wenruo, linux-btrfs; +Cc: Hans van Kranenburg

On 7/6/20 10:05 AM, Qu Wenruo wrote:
> 
> 
> On 2020/7/6 下午9:53, Josef Bacik wrote:
>> On 7/6/20 9:50 AM, Qu Wenruo wrote:
>>>
>>>
>>> On 2020/7/6 下午9:45, Josef Bacik wrote:
>>>> On 7/6/20 3:44 AM, Qu Wenruo wrote:
>>>>> [BUG]
>>>>> When balance receive a fatal signal, it can make the fs to read-only
>>>>> mode if the timing is unlucky enough:
>>>>>
>>>>>      BTRFS info (device xvdb): balance: start -d -m -s
>>>>>      BTRFS info (device xvdb): relocating block group 73001861120 flags
>>>>> metadata
>>>>>      BTRFS info (device xvdb): found 12236 extents, stage: move data
>>>>> extents
>>>>>      BTRFS info (device xvdb): relocating block group 71928119296 flags
>>>>> data
>>>>>      BTRFS info (device xvdb): found 3 extents, stage: move data extents
>>>>>      BTRFS info (device xvdb): found 3 extents, stage: update data
>>>>> pointers
>>>>>      BTRFS info (device xvdb): relocating block group 60922265600 flags
>>>>> metadata
>>>>>      BTRFS: error (device xvdb) in btrfs_drop_snapshot:5505: errno=-4
>>>>> unknown
>>>>>      BTRFS info (device xvdb): forced readonly
>>>>>      BTRFS info (device xvdb): balance: ended with status: -4
>>>>>
>>>>> [CAUSE]
>>>>> This is caused by the fact that btrfs ticketing space system can be
>>>>> interrupted, and cause all kind of -EINTR returned to various critical
>>>>> section, where we never thought of -EINTR at all.
>>>>>
>>>>> Even for things like btrfs_start_transaction() can be affected by
>>>>> signal:
>>>>>     btrfs_start_transaction()
>>>>>     |- start_transaction(flush = FLUSH_ALL)
>>>>>        |- btrfs_block_rsv_add()
>>>>>           |- btrfs_reserve_metadata_bytes()
>>>>>              |- __reserve_metadata_bytes()
>>>>>                 |- handle_reserve_ticket()
>>>>>                    |- wait_reserve_ticket()
>>>>>                       |- prepare_to_wait_event(TASK_KILLABLE)
>>>>>                       |- ticket->error = -EINTR;
>>>>>
>>>>> And all related callers get -EINTR error.
>>>>>
>>>>> In fact, there are really very limited call sites can really handle
>>>>> that
>>>>> -EINTR properly, above btrfs_drop_snapshot() is one case.
>>>>>
>>>>> [FIX]
>>>>> Things like metadata allocation is really a critical section for btrfs,
>>>>> we don't really want it to be that killable by some impatient users.
>>>>>
>>>>> In fact, for really long duration calls, it should have their own
>>>>> checks
>>>>> on signal, like balance, reflink, generic fiemap calls.
>>>>>
>>>>> So this patch will make ticket waiting uninterruptible, relying on each
>>>>> long duration calls to handle their signals more properly.
>>>>>
>>>>
>>>> Nope, everybody that calls start_transaction() should be able to handle
>>>> -EINTR, so we need to find those callsites and fix them, not make it so
>>>> we hang the box because we don't like fixing error paths.  Thanks,
>>>
>>> Then we also need to handle btrfs_delalloc_reserve_metadata(),
>>> btrfs_block_rsv_refill(), btrfs_use_block_rsv(), btrfs_block_rsv_add().
>>>
>>
>> This only needs to be handled for FLUSH_ALL and FLUSH_STEAL.  Anybody
>> doing btrfs_start_transaction() should be able to fail with -EINTR,
>> because they should be close to the syscall path.  Balance needs to be
>> fixed to deal with it, and I assume there might be a few other places,
>> but by-in-large none of these places should flip read only.  Thanks,
> 
> There are already ones existing, for btrfs_drop_snapshot().
> 
> This is mostly caused by btrfs_delalloc_reserve_metadata(), which always
> use FLUSH_ALL unless there is a running trans or its space cache inode.
> 
> But still, for a lot of relocation code, we don't really want to bother
> the EINTR and just want uninterruptible at least for now.
> 
> Any idea for that?
> Or just rework how we handle errors in a lot of places?
> 
> It doesn't look correct for such a low level mechanism to return -EINTR
> while most of callers doesn't really want to bother.
> 

That's the point, most callers shouldn't have to, because most callers shouldn't 
be far enough into their operations that -EINTR causes problems.

We should probably just change btrfs_drop_snapshot to use join, and then have it 
do any space reservation it needs outside of the trans handle.  The other option 
is a FLUSH_ALL_UNKILLABLE.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC 1/2] btrfs: relocation: Allow signal to cancel balance
  2020-07-06  7:44 ` [PATCH RFC 1/2] btrfs: relocation: Allow signal to cancel balance Qu Wenruo
  2020-07-06 13:45   ` Josef Bacik
@ 2020-07-06 18:19   ` Hans van Kranenburg
  2020-07-06 22:43     ` Qu Wenruo
  1 sibling, 1 reply; 12+ messages in thread
From: Hans van Kranenburg @ 2020-07-06 18:19 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

Hi,

On 7/6/20 9:44 AM, Qu Wenruo wrote:
> Although btrfs balance can be canceled with "btrfs balance cancel"
> command, it's still almost muscle memory to press Ctrl-C to cancel a
> long running btrfs balance.

Thanks for investigating all of this.

Has it actually been unsafe (read: undefined behaviour) forever, or only
since some recent change?

Or did it just by accident not cause real damage earlier on in the past?

[ 1041.810963] BTRFS info (device xvdb): relocating block group
91423244288 flags metadata

<- ^C made it stop here

[ 1076.189766] BTRFS info (device xvdb): relocating block group
91423244288 flags metadata
[ 1081.300131] BTRFS info (device xvdb): found 6689 extents
[ 1081.311138] BTRFS info (device xvdb): relocating block group
90349502464 flags data
[ 1089.776066] BTRFS info (device xvdb): found 215 extents

The above is with 4.19.118. Now I'm trying this again and looking just a
little better at the logging, I see that what I thought (it always
stopped after doing 1 block group) is not true. With ^C I can make it
stop halfway and then next time it again starts at 91423244288.

Related question: should, additionally, the btrfs balance in progs also
be changed to catch the SIGINT while it's doing the balance ioctl, to
prevent the signal from leaking to the kernel space? I mean, kernels
with the bug are already 'out there' now...

Thanks

> So allow btrfs balance to check signal to determine if it should exit.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/relocation.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index 523d2e5fab8f..2b869fb2e62c 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -2656,7 +2656,8 @@ int setup_extent_mapping(struct inode *inode, u64 start, u64 end,
>   */
>  int btrfs_should_cancel_balance(struct btrfs_fs_info *fs_info)
>  {
> -	return atomic_read(&fs_info->balance_cancel_req);
> +	return atomic_read(&fs_info->balance_cancel_req) ||
> +		fatal_signal_pending(current);
>  }
>  ALLOW_ERROR_INJECTION(btrfs_should_cancel_balance, TRUE);
>  
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC 1/2] btrfs: relocation: Allow signal to cancel balance
  2020-07-06 18:19   ` Hans van Kranenburg
@ 2020-07-06 22:43     ` Qu Wenruo
  0 siblings, 0 replies; 12+ messages in thread
From: Qu Wenruo @ 2020-07-06 22:43 UTC (permalink / raw)
  To: Hans van Kranenburg, Qu Wenruo, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 2916 bytes --]



On 2020/7/7 上午2:19, Hans van Kranenburg wrote:
> Hi,
> 
> On 7/6/20 9:44 AM, Qu Wenruo wrote:
>> Although btrfs balance can be canceled with "btrfs balance cancel"
>> command, it's still almost muscle memory to press Ctrl-C to cancel a
>> long running btrfs balance.
> 
> Thanks for investigating all of this.
> 
> Has it actually been unsafe (read: undefined behaviour) forever, or only
> since some recent change?

Forever.

That -EINTR from metadata reservation path is there for a long long time.

> 
> Or did it just by accident not cause real damage earlier on in the past?
> 
> [ 1041.810963] BTRFS info (device xvdb): relocating block group
> 91423244288 flags metadata
> 
> <- ^C made it stop here
> 
> [ 1076.189766] BTRFS info (device xvdb): relocating block group
> 91423244288 flags metadata
> [ 1081.300131] BTRFS info (device xvdb): found 6689 extents
> [ 1081.311138] BTRFS info (device xvdb): relocating block group
> 90349502464 flags data
> [ 1089.776066] BTRFS info (device xvdb): found 215 extents
> 
> The above is with 4.19.118. Now I'm trying this again and looking just a
> little better at the logging, I see that what I thought (it always
> stopped after doing 1 block group) is not true. With ^C I can make it
> stop halfway and then next time it again starts at 91423244288.
> 
> Related question: should, additionally, the btrfs balance in progs also
> be changed to catch the SIGINT while it's doing the balance ioctl, to
> prevent the signal from leaking to the kernel space? I mean, kernels
> with the bug are already 'out there' now...

It has no way to catch signal while trapped into kernel space.

My previous assumption of the whole ioctl thing is still right, when
we're in kernel space, we can't catch any signal.

It's just the metadata reservation code manually checking the pending
signal and return -EINTR.

So even if we make btrfs-progs to catch that signal, it won't work.
And even if it seems to work, it's because in btrfs module we're
checking signal explicitly.

Thanks,
Qu

> 
> Thanks
> 
>> So allow btrfs balance to check signal to determine if it should exit.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>  fs/btrfs/relocation.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
>> index 523d2e5fab8f..2b869fb2e62c 100644
>> --- a/fs/btrfs/relocation.c
>> +++ b/fs/btrfs/relocation.c
>> @@ -2656,7 +2656,8 @@ int setup_extent_mapping(struct inode *inode, u64 start, u64 end,
>>   */
>>  int btrfs_should_cancel_balance(struct btrfs_fs_info *fs_info)
>>  {
>> -	return atomic_read(&fs_info->balance_cancel_req);
>> +	return atomic_read(&fs_info->balance_cancel_req) ||
>> +		fatal_signal_pending(current);
>>  }
>>  ALLOW_ERROR_INJECTION(btrfs_should_cancel_balance, TRUE);
>>  
>>
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC 2/2] btrfs: space-info: Don't allow signal to interrupt ticket waiting
  2020-07-06 14:33           ` Josef Bacik
@ 2020-07-07 16:16             ` David Sterba
  0 siblings, 0 replies; 12+ messages in thread
From: David Sterba @ 2020-07-07 16:16 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Qu Wenruo, Qu Wenruo, linux-btrfs, Hans van Kranenburg

On Mon, Jul 06, 2020 at 10:33:56AM -0400, Josef Bacik wrote:
> On 7/6/20 10:05 AM, Qu Wenruo wrote:
> >> This only needs to be handled for FLUSH_ALL and FLUSH_STEAL.  Anybody
> >> doing btrfs_start_transaction() should be able to fail with -EINTR,
> >> because they should be close to the syscall path.  Balance needs to be
> >> fixed to deal with it, and I assume there might be a few other places,
> >> but by-in-large none of these places should flip read only.  Thanks,
> > 
> > There are already ones existing, for btrfs_drop_snapshot().
> > 
> > This is mostly caused by btrfs_delalloc_reserve_metadata(), which always
> > use FLUSH_ALL unless there is a running trans or its space cache inode.
> > 
> > But still, for a lot of relocation code, we don't really want to bother
> > the EINTR and just want uninterruptible at least for now.
> > 
> > Any idea for that?
> > Or just rework how we handle errors in a lot of places?
> > 
> > It doesn't look correct for such a low level mechanism to return -EINTR
> > while most of callers doesn't really want to bother.
> > 
> 
> That's the point, most callers shouldn't have to, because most callers shouldn't 
> be far enough into their operations that -EINTR causes problems.

Agreed, that's a sane pattern to follow so we should convert the
remaining cases, perhaps also auditing all existing
btrfs_start_transaction calls but after a quick look I haven't spotted
anything else than the reloc and drop snapshot.

> We should probably just change btrfs_drop_snapshot to use join, and then have it 
> do any space reservation it needs outside of the trans handle.  The other option 
> is a FLUSH_ALL_UNKILLABLE.  Thanks,

I'd rather not push the killable semantics to the flushing state machine
and let it up to the caller context to decide.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2020-07-07 16:17 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-06  7:44 [PATCH RFC 0/2] btrfs: make ticket wait uninterruptible to address unexpected RO during balance Qu Wenruo
2020-07-06  7:44 ` [PATCH RFC 1/2] btrfs: relocation: Allow signal to cancel balance Qu Wenruo
2020-07-06 13:45   ` Josef Bacik
2020-07-06 18:19   ` Hans van Kranenburg
2020-07-06 22:43     ` Qu Wenruo
2020-07-06  7:44 ` [PATCH RFC 2/2] btrfs: space-info: Don't allow signal to interrupt ticket waiting Qu Wenruo
2020-07-06 13:45   ` Josef Bacik
2020-07-06 13:50     ` Qu Wenruo
2020-07-06 13:53       ` Josef Bacik
2020-07-06 14:05         ` Qu Wenruo
2020-07-06 14:33           ` Josef Bacik
2020-07-07 16:16             ` David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.