* [dm-devel] Deadlock when swapping a table with a dm-era target
@ 2021-12-01 17:07 Nikos Tsironis
  2021-12-02 15:41 ` Zdenek Kabelac
  0 siblings, 1 reply; 5+ messages in thread
From: Nikos Tsironis @ 2021-12-01 17:07 UTC (permalink / raw)
  To: dm-devel; +Cc: ejt, agk, Mike Snitzer

Hello,

Under certain conditions, swapping a table that includes a dm-era target
with a new table causes a deadlock.

This happens when a status (STATUSTYPE_INFO) or message IOCTL is blocked
in the suspended dm-era target.

dm-era executes all metadata operations in a worker thread, which stops
processing requests when the target is suspended, and resumes again when
the target is resumed.

So, running 'dmsetup status' or 'dmsetup message' for a suspended dm-era
device blocks, until the device is resumed.

This seems to be a problem on its own.
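
For reference, this is roughly the pattern dm-era uses (a simplified sketch
based on my reading of drivers/md/dm-era-target.c, not verbatim kernel code;
types and several helpers are omitted):

    /* The worker is only woken while the target is not suspended. */
    static void wake_worker(struct era *era)
    {
            if (!atomic_read(&era->suspended))
                    queue_work(era->wq, &era->worker);
    }

    /* All metadata operations (status, message, checkpoint) go through here. */
    static int perform_rpc(struct era *era, struct rpc *rpc)
    {
            rpc->result = 0;
            init_completion(&rpc->complete);

            spin_lock(&era->rpc_lock);
            list_add(&rpc->list, &era->rpc_calls);
            spin_unlock(&era->rpc_lock);

            wake_worker(era);                       /* no-op while suspended */
            wait_for_completion(&rpc->complete);    /* era_status()/era_message() block here */

            return rpc->result;
    }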

If we then load a new table to the device, while the aforementioned
dmsetup command is blocked in dm-era, and resume the device, we
deadlock.

The problem is that the 'dmsetup status' and 'dmsetup message' commands
hold a reference to the live table, i.e., they hold an SRCU read lock on
md->io_barrier, while they are blocked.

When the device is resumed, the old table is replaced with the new one
by dm_swap_table(), which ends up calling synchronize_srcu() on
md->io_barrier.

Since the blocked dmsetup command is holding the SRCU read lock, and the
old table is never resumed, 'dmsetup resume' blocks too, and we have a
deadlock.
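
The dm core primitives involved look roughly like this (simplified from
drivers/md/dm.c, not verbatim). The status/message ioctl path takes the read
side via dm_get_live_or_inactive_table() and holds it across the target's
status/message hook, while 'dmsetup resume' ends up in dm_sync_table() via
dm_swap_table() -> __bind():

    struct dm_table *dm_get_live_table(struct mapped_device *md, int *srcu_idx)
    {
            /* Held by the blocked status/message ioctl. */
            *srcu_idx = srcu_read_lock(&md->io_barrier);
            return srcu_dereference(md->map, &md->io_barrier);
    }

    void dm_put_live_table(struct mapped_device *md, int srcu_idx)
    {
            /* Never reached while era_status()/era_message() block in the worker RPC. */
            srcu_read_unlock(&md->io_barrier, srcu_idx);
    }

    void dm_sync_table(struct mapped_device *md)
    {
            synchronize_srcu(&md->io_barrier);      /* 'dmsetup resume' waits here forever */
            synchronize_rcu_expedited();
    }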

Steps to reproduce:

1. Create device with dm-era target

    # dmsetup create eradev --table "0 1048576 era /dev/datavg/erameta /dev/datavg/eradata 8192"

2. Suspend the device

    # dmsetup suspend eradev

3. Load new table to device, e.g., to resize the device

    # dmsetup load eradev --table "0 2097152 era /dev/datavg/erameta /dev/datavg/eradata 8192"

4. Device now has LIVE and INACTIVE tables

    # dmsetup info eradev
    Name:              eradev
    State:             SUSPENDED
    Read Ahead:        16384
    Tables present:    LIVE & INACTIVE
    Open count:        0
    Event number:      0
    Major, minor:      252, 2
    Number of targets: 1

5. Retrieve the status of the device. This blocks because the device is
    suspended. Similarly, any 'dmsetup message' operation would block
    too. This command holds the SRCU read lock.

    # dmsetup status eradev

6. Resume the device. The resume operation tries to swap the old table
    with the new one and deadlocks, because it synchronizes SRCU for the
    old table, while the blocked 'dmsetup status' holds the SRCU read
    lock. And the old table is never resumed again at this point.

    # dmsetup resume eradev

7. The relevant dmesg logs are:


[ 7093.345486] dm-2: detected capacity change from 1048576 to 2097152
[ 7250.875665] INFO: task dmsetup:1986 blocked for more than 120 seconds.
[ 7250.875722]       Not tainted 5.16.0-rc2-release+ #16
[ 7250.875756] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 7250.875803] task:dmsetup         state:D stack:    0 pid: 1986 ppid:  1313 flags:0x00000000
[ 7250.875809] Call Trace:
[ 7250.875812]  <TASK>
[ 7250.875816]  __schedule+0x330/0x8b0
[ 7250.875827]  schedule+0x4e/0xc0
[ 7250.875831]  schedule_timeout+0x20f/0x2e0
[ 7250.875836]  ? do_set_pte+0xb8/0x120
[ 7250.875843]  ? prep_new_page+0x91/0xa0
[ 7250.875847]  wait_for_completion+0x8c/0xf0
[ 7250.875854]  perform_rpc+0x95/0xb0 [dm_era]
[ 7250.875862]  in_worker1.constprop.20+0x48/0x70 [dm_era]
[ 7250.875867]  ? era_iterate_devices+0x30/0x30 [dm_era]
[ 7250.875872]  ? era_status+0x64/0x1e0 [dm_era]
[ 7250.875877]  era_status+0x64/0x1e0 [dm_era]
[ 7250.875882]  ? dm_get_live_or_inactive_table.isra.11+0x20/0x20 [dm_mod]
[ 7250.875900]  ? __mod_node_page_state+0x82/0xc0
[ 7250.875909]  retrieve_status+0xbc/0x1e0 [dm_mod]
[ 7250.875921]  ? dm_get_live_or_inactive_table.isra.11+0x20/0x20 [dm_mod]
[ 7250.875932]  table_status+0x61/0xa0 [dm_mod]
[ 7250.875942]  ctl_ioctl+0x1b5/0x4f0 [dm_mod]
[ 7250.875956]  dm_ctl_ioctl+0xa/0x10 [dm_mod]
[ 7250.875966]  __x64_sys_ioctl+0x8e/0xd0
[ 7250.875970]  do_syscall_64+0x3a/0xd0
[ 7250.875974]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 7250.875980] RIP: 0033:0x7f20b7cd4017
[ 7250.875984] RSP: 002b:00007ffd443874b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 7250.875988] RAX: ffffffffffffffda RBX: 000055d69d6bd0e0 RCX: 00007f20b7cd4017
[ 7250.875991] RDX: 000055d69d6bd0e0 RSI: 00000000c138fd0c RDI: 0000000000000003
[ 7250.875993] RBP: 000000000000001e R08: 00007f20b81df648 R09: 00007ffd44387320
[ 7250.875996] R10: 00007f20b81deb53 R11: 0000000000000246 R12: 000055d69d6bd110
[ 7250.875998] R13: 00007f20b81deb53 R14: 000055d69d6bd000 R15: 0000000000000000
[ 7250.876002]  </TASK>
[ 7250.876004] INFO: task dmsetup:1987 blocked for more than 120 seconds.
[ 7250.876046]       Not tainted 5.16.0-rc2-release+ #16
[ 7250.876083] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 7250.876129] task:dmsetup         state:D stack:    0 pid: 1987 ppid:  1385 flags:0x00000000
[ 7250.876134] Call Trace:
[ 7250.876136]  <TASK>
[ 7250.876138]  __schedule+0x330/0x8b0
[ 7250.876142]  schedule+0x4e/0xc0
[ 7250.876145]  schedule_timeout+0x20f/0x2e0
[ 7250.876149]  ? __queue_work+0x226/0x420
[ 7250.876156]  wait_for_completion+0x8c/0xf0
[ 7250.876160]  __synchronize_srcu.part.19+0x92/0xc0
[ 7250.876167]  ? __bpf_trace_rcu_stall_warning+0x10/0x10
[ 7250.876173]  ? dm_swap_table+0x2f4/0x310 [dm_mod]
[ 7250.876185]  dm_swap_table+0x2f4/0x310 [dm_mod]
[ 7250.876198]  ? table_load+0x360/0x360 [dm_mod]
[ 7250.876207]  dev_suspend+0x95/0x250 [dm_mod]
[ 7250.876217]  ctl_ioctl+0x1b5/0x4f0 [dm_mod]
[ 7250.876231]  dm_ctl_ioctl+0xa/0x10 [dm_mod]
[ 7250.876240]  __x64_sys_ioctl+0x8e/0xd0
[ 7250.876244]  do_syscall_64+0x3a/0xd0
[ 7250.876247]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 7250.876252] RIP: 0033:0x7f15e9254017
[ 7250.876254] RSP: 002b:00007ffffdc59458 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 7250.876257] RAX: ffffffffffffffda RBX: 000055d4d99560e0 RCX: 00007f15e9254017
[ 7250.876260] RDX: 000055d4d99560e0 RSI: 00000000c138fd06 RDI: 0000000000000003
[ 7250.876261] RBP: 000000000000000f R08: 00007f15e975f648 R09: 00007ffffdc592c0
[ 7250.876263] R10: 00007f15e975eb53 R11: 0000000000000246 R12: 000055d4d9956110
[ 7250.876265] R13: 00007f15e975eb53 R14: 000055d4d9956000 R15: 0000000000000001
[ 7250.876269]  </TASK>

I was thinking of how to fix this, and I would like your feedback to
ensure I work in the right direction.

I have thought of the following possible solutions.

1. Have dm-era fail all operations while it's suspended.

    This would work for messages, since they return an error code, but
    the status operation doesn't return errors.

    Moreover, I think it makes sense for the status operation to work
    even if the device is suspended, so failing it doesn't seem the right
    thing to do.

    Maybe it's possible to fix dm-era to bypass the worker thread when
    suspended, and return a valid status? I haven't checked yet if this
    is possible. (A rough sketch of this idea follows this list.)

2. Redesign dm-era to use locks for accessing its metadata, so we don't
    depend on the worker thread to serialize metadata operations.

    This way we can run all required metadata operations directly from
    the user thread that runs the dmsetup command.

3. Could DM core handle this situation somehow?

    As far as I can tell, the rest of the targets don't block in status
    and message operations until the target is resumed. Is this a
    requirement imposed by DM core that dm-era violates? I couldn't find
    any documentation regarding this.
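
To make the first option a bit more concrete, here is a rough, untested
sketch of what bypassing the worker might look like. The suspended check and
all the helpers shown here are made-up names, not existing dm-era code:

    /* Hypothetical sketch of option 1, not existing dm-era code. */
    static void era_status(struct dm_target *ti, status_type_t type,
                           unsigned status_flags, char *result, unsigned maxlen)
    {
            struct era *era = ti->private;
            struct metadata_stats stats;

            if (type == STATUSTYPE_INFO && era_target_suspended(era)) {
                    /*
                     * No I/O and no worker activity while suspended, so
                     * reading the in-core counters directly should be safe.
                     */
                    era_metadata_get_stats_direct(era->md, &stats);
                    era_format_status(result, maxlen, &stats);
                    return;
            }

            /* Otherwise keep the current behaviour and go through the worker. */
            era_status_in_worker(era, type, status_flags, result, maxlen);
    }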

I think the right way to go is the second approach, that is, to redesign
dm-era to use locks instead of depending on the worker thread to serialize
metadata operations (roughly along the lines of the sketch below), but I
would like your input before moving on.
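
As a rough, untested illustration of that direction (the lock and the helper
names are made up; metadata_get_stats()/metadata_commit() stand in for the
existing metadata operations):

    /* Hypothetical sketch of option 2, not existing dm-era code. */
    struct era {
            struct rw_semaphore md_lock;    /* would replace worker-based serialization */
            struct era_metadata *md;
            /* ... */
    };

    static int era_get_stats(struct era *era, struct metadata_stats *stats)
    {
            int r;

            down_read(&era->md_lock);       /* readers (status) may run concurrently */
            r = metadata_get_stats(era->md, stats);
            up_read(&era->md_lock);

            return r;
    }

    static int era_commit_metadata(struct era *era)
    {
            int r;

            down_write(&era->md_lock);      /* writers (messages, checkpoints) serialize */
            r = metadata_commit(era->md);
            up_write(&era->md_lock);

            return r;
    }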

Looking forward to your feedback,
Nikos.


* Re: [dm-devel] Deadlock when swapping a table with a dm-era target
  2021-12-01 17:07 [dm-devel] Deadlock when swapping a table with a dm-era target Nikos Tsironis
@ 2021-12-02 15:41 ` Zdenek Kabelac
  2021-12-03 14:42   ` Nikos Tsironis
  0 siblings, 1 reply; 5+ messages in thread
From: Zdenek Kabelac @ 2021-12-02 15:41 UTC (permalink / raw)
  To: Nikos Tsironis, dm-devel

Dne 01. 12. 21 v 18:07 Nikos Tsironis napsal(a):
> Hello,
>
> Under certain conditions, swapping a table, that includes a dm-era
> target, with a new table, causes a deadlock.
>
> This happens when a status (STATUSTYPE_INFO) or message IOCTL is blocked
> in the suspended dm-era target.
>
> dm-era executes all metadata operations in a worker thread, which stops
> processing requests when the target is suspended, and resumes again when
> the target is resumed.
>
> So, running 'dmsetup status' or 'dmsetup message' for a suspended dm-era
> device blocks, until the device is resumed.
>
> This seems to be a problem on its own.
>
> If we then load a new table to the device, while the aforementioned
> dmsetup command is blocked in dm-era, and resume the device, we
> deadlock.
>
> The problem is that the 'dmsetup status' and 'dmsetup message' commands
> hold a reference to the live table, i.e., they hold an SRCU read lock on
> md->io_barrier, while they are blocked.
>
> When the device is resumed, the old table is replaced with the new one
> by dm_swap_table(), which ends up calling synchronize_srcu() on
> md->io_barrier.
>
> Since the blocked dmsetup command is holding the SRCU read lock, and the
> old table is never resumed, 'dmsetup resume' blocks too, and we have a
> deadlock.
>
> Steps to reproduce:
>
> 1. Create device with dm-era target
>
>    # dmsetup create eradev --table "0 1048576 era /dev/datavg/erameta /dev/datavg/eradata 8192"
>
> 2. Suspend the device
>
>    # dmsetup suspend eradev
>
> 3. Load new table to device, e.g., to resize the device
>
>    # dmsetup load eradev --table "0 2097152 era /dev/datavg/erameta /dev/datavg/eradata 8192"
>

Your sequence is faulty - you must always preload the new table before the
suspend.

The suspend/resume window should be kept absolutely minimal.

Also, nothing should be allocating memory while the device is suspended, which
is why the suspend has to happen after the table line is fully loaded.

Regards


Zdenek


* Re: [dm-devel] Deadlock when swapping a table with a dm-era target
  2021-12-02 15:41 ` Zdenek Kabelac
@ 2021-12-03 14:42   ` Nikos Tsironis
  2021-12-03 16:00     ` Zdenek Kabelac
  0 siblings, 1 reply; 5+ messages in thread
From: Nikos Tsironis @ 2021-12-03 14:42 UTC (permalink / raw)
  To: Zdenek Kabelac, dm-devel

On 12/2/21 5:41 PM, Zdenek Kabelac wrote:
> Dne 01. 12. 21 v 18:07 Nikos Tsironis napsal(a):
>> Hello,
>>
>> Under certain conditions, swapping a table, that includes a dm-era
>> target, with a new table, causes a deadlock.
>>
>> This happens when a status (STATUSTYPE_INFO) or message IOCTL is blocked
>> in the suspended dm-era target.
>>
>> dm-era executes all metadata operations in a worker thread, which stops
>> processing requests when the target is suspended, and resumes again when
>> the target is resumed.
>>
>> So, running 'dmsetup status' or 'dmsetup message' for a suspended dm-era
>> device blocks, until the device is resumed.
>>
>> This seems to be a problem on its own.
>>
>> If we then load a new table to the device, while the aforementioned
>> dmsetup command is blocked in dm-era, and resume the device, we
>> deadlock.
>>
>> The problem is that the 'dmsetup status' and 'dmsetup message' commands
>> hold a reference to the live table, i.e., they hold an SRCU read lock on
>> md->io_barrier, while they are blocked.
>>
>> When the device is resumed, the old table is replaced with the new one
>> by dm_swap_table(), which ends up calling synchronize_srcu() on
>> md->io_barrier.
>>
>> Since the blocked dmsetup command is holding the SRCU read lock, and the
>> old table is never resumed, 'dmsetup resume' blocks too, and we have a
>> deadlock.
>>
>> Steps to reproduce:
>>
>> 1. Create device with dm-era target
>>
>>    # dmsetup create eradev --table "0 1048576 era /dev/datavg/erameta /dev/datavg/eradata 8192"
>>
>> 2. Suspend the device
>>
>>    # dmsetup suspend eradev
>>
>> 3. Load new table to device, e.g., to resize the device
>>
>>    # dmsetup load eradev --table "0 2097152 era /dev/datavg/erameta /dev/datavg/eradata 8192"
>>
> 
> Your sequence is faulty - you must always preload  new table before suspend.
> 
> Suspend&Resume should be absolutely minimal in its timing.
> 
> Also nothing should be allocating memory in suspend so that's why suspend has to be used after table line is fully loaded.
> 

Hi Zdenek,

Thanks for the feedback. There doesn't seem to be any documentation
mentioning that loading the new table should happen before suspend, so
thanks a lot for explaining it.

Unfortunately, this isn't what causes the deadlock. The following
sequence, which loads the table before suspend, also results in a
deadlock:

1. Create device with dm-era target

    # dmsetup create eradev --table "0 1048576 era /dev/datavg/erameta /dev/datavg/eradata 8192"

2. Load new table to device, e.g., to resize the device

    # dmsetup load eradev --table "0 2097152 era /dev/datavg/erameta /dev/datavg/eradata 8192"

3. Suspend the device

    # dmsetup suspend eradev

4. Retrieve the status of the device. This blocks for the reasons I
    explained in my previous email.

    # dmsetup status eradev

5. Resume the device. This deadlocks for the reasons I explained in my
    previous email.

    # dmsetup resume eradev

6. The dmesg logs are the same as the ones I included in my previous
    email.

I have explained the reasons for the deadlock in my previous email, but
I would be more than happy to discuss them more.

I would also like your feedback on the solutions I proposed there, so I
can work on a fix.

Thanks,
Nikos.


* Re: [dm-devel] Deadlock when swapping a table with a dm-era target
  2021-12-03 14:42   ` Nikos Tsironis
@ 2021-12-03 16:00     ` Zdenek Kabelac
  2021-12-08 20:10       ` Nikos Tsironis
  0 siblings, 1 reply; 5+ messages in thread
From: Zdenek Kabelac @ 2021-12-03 16:00 UTC (permalink / raw)
  To: Nikos Tsironis, dm-devel

Dne 03. 12. 21 v 15:42 Nikos Tsironis napsal(a):
> On 12/2/21 5:41 PM, Zdenek Kabelac wrote:
>> Dne 01. 12. 21 v 18:07 Nikos Tsironis napsal(a):
>>> Hello,
>>>
>>> Under certain conditions, swapping a table, that includes a dm-era
>>> target, with a new table, causes a deadlock.
>>>
>>> This happens when a status (STATUSTYPE_INFO) or message IOCTL is blocked
>>> in the suspended dm-era target.
>>>
>>> dm-era executes all metadata operations in a worker thread, which stops
>>> processing requests when the target is suspended, and resumes again when
>>> the target is resumed.
>>>
>>> So, running 'dmsetup status' or 'dmsetup message' for a suspended dm-era
>>> device blocks, until the device is resumed.
>>>
> Hi Zdenek,
>
> Thanks for the feedback. There doesn't seem to be any documentation
> mentioning that loading the new table should happen before suspend, so
> thanks a lot for explaining it.
>
> Unfortunately, this isn't what causes the deadlock. The following
> sequence, which loads the table before suspend, also results in a
> deadlock:
>
> 1. Create device with dm-era target
>
>    # dmsetup create eradev --table "0 1048576 era /dev/datavg/erameta /dev/datavg/eradata 8192"
>
> 2. Load new table to device, e.g., to resize the device
>
>    # dmsetup load eradev --table "0 2097152 era /dev/datavg/erameta /dev/datavg/eradata 8192"
>
> 3. Suspend the device
>
>    # dmsetup suspend eradev
>
> 4. Retrieve the status of the device. This blocks for the reasons I
>    explained in my previous email.
>
>    # dmsetup status eradev


Hi

Querying 'status' while the device is suspended is the next issue you need to
fix in your workflow.

Normally the 'status' operation may need to flush queued IO operations to get
accurate data.

So you should query the status before you start to mess with tables.

If you want to get 'status' without flushing, use 'dmsetup status --noflush'.


> 5. Resume the device. This deadlocks for the reasons I explained in my
>    previous email.
>
>    # dmsetup resume eradev
>
> 6. The dmesg logs are the same as the ones I included in my previous
>    email.
>
> I have explained the reasons for the deadlock in my previous email, but
> I would be more than happy to discuss them more.
>

There is no bug if your only problem is a 'stuck' status while you have
devices in a suspended state.

You should NOT be doing basically anything while the device is suspended!!

I.e., imagine you suspend a 'swap' device, and while it is in the suspended
state the kernel decides to swap out memory pages - you get instantly frozen
right there.

For this reason lvm2 preallocates all memory before doing the suspend/resume
sequence, and performs only a very minimal set of operations between suspend
and resume, also to minimize latencies and so on.

Clearly, if you suspend just some 'supportive' disk of yours, you are likely
not in danger of blocking your swap - but the 'status --noflush' logic still
applies.


Regards

Zdenek



* Re: [dm-devel] Deadlock when swapping a table with a dm-era target
  2021-12-03 16:00     ` Zdenek Kabelac
@ 2021-12-08 20:10       ` Nikos Tsironis
  0 siblings, 0 replies; 5+ messages in thread
From: Nikos Tsironis @ 2021-12-08 20:10 UTC (permalink / raw)
  To: Zdenek Kabelac, dm-devel

On 12/3/21 6:00 PM, Zdenek Kabelac wrote:
> Dne 03. 12. 21 v 15:42 Nikos Tsironis napsal(a):
>> On 12/2/21 5:41 PM, Zdenek Kabelac wrote:
>>> Dne 01. 12. 21 v 18:07 Nikos Tsironis napsal(a):
>>>> Hello,
>>>>
>>>> Under certain conditions, swapping a table, that includes a dm-era
>>>> target, with a new table, causes a deadlock.
>>>>
>>>> This happens when a status (STATUSTYPE_INFO) or message IOCTL is blocked
>>>> in the suspended dm-era target.
>>>>
>>>> dm-era executes all metadata operations in a worker thread, which stops
>>>> processing requests when the target is suspended, and resumes again when
>>>> the target is resumed.
>>>>
>>>> So, running 'dmsetup status' or 'dmsetup message' for a suspended dm-era
>>>> device blocks, until the device is resumed.
>>>>
>> Hi Zdenek,
>>
>> Thanks for the feedback. There doesn't seem to be any documentation
>> mentioning that loading the new table should happen before suspend, so
>> thanks a lot for explaining it.
>>
>> Unfortunately, this isn't what causes the deadlock. The following
>> sequence, which loads the table before suspend, also results in a
>> deadlock:
>>
>> 1. Create device with dm-era target
>>
>>    # dmsetup create eradev --table "0 1048576 era /dev/datavg/erameta /dev/datavg/eradata 8192"
>>
>> 2. Load new table to device, e.g., to resize the device
>>
>>    # dmsetup load eradev --table "0 2097152 era /dev/datavg/erameta /dev/datavg/eradata 8192"
>>
>> 3. Suspend the device
>>
>>    # dmsetup suspend eradev
>>
>> 4. Retrieve the status of the device. This blocks for the reasons I
>>    explained in my previous email.
>>
>>    # dmsetup status eradev
> 
> 
> Hi
> 
> Querying 'status' while the device is suspend is the next issue you need to fix in your workflow.
> 

Hi,

These steps are not my exact workflow. It's just a series of steps to
easily reproduce the bug.

I am not the one retrieving the status of the suspended device. LVM is.
LVM, when running commands like 'lvs' and 'vgs', retrieves the status of
the devices on the system using the DM_TABLE_STATUS ioctl.

LVM indeed uses the DM_NOFLUSH_FLAG, but this doesn't make a difference
for dm-era, since it doesn't check for this flag.

So, for example, a user or a monitoring daemon running an LVM command,
like 'vgs', at the "wrong" time triggers the bug:

1. Create device with dm-era target

    # dmsetup create eradev --table "0 1048576 era /dev/datavg/erameta /dev/datavg/eradata 8192"

2. Load new table to device, e.g., to resize the device

    # dmsetup load eradev --table "0 2097152 era /dev/datavg/erameta /dev/datavg/eradata 8192"

3. Suspend the device

    # dmsetup suspend eradev

4. Someone, e.g., a user or a monitoring daemon, runs an LVM command at
    this point, e.g. 'vgs'.

5. 'vgs' tries to retrieve the status of the dm-era device using the
    DM_TABLE_STATUS ioctl, and blocks.

6. Resume the device: This deadlocks.

    # dmsetup resume eradev

So, there is nothing I can change in my workflow to prevent the bug. It's a
race that happens when someone runs an LVM command at the "wrong" time.

I am aware that using an appropriate LVM 'global_filter' can prevent
this, but:

1. This is just a workaround, not a proper solution.
2. This is not always an option. Imagine someone running an LVM command
    in a container, for example. Or, we may not be allowed to change the
    LVM configuration of the host at all.

> Normally 'status' operation may need to flush queued IO operations to get accurate data.
> 
> So you should query states before you start to mess with tables.
> 
> If you want to get 'status' without flushing - use:   'dmsetup status --noflush'.
> 

I am aware of that, and of the '--noflush' flag.

But note that:

1. As I have already explained in my previous emails, the reason for the
    deadlock is not I/O related.
2. dm-era doesn't check for this flag, so using it doesn't make a
    difference.
3. Other targets, e.g., dm-thin and dm-cache, that check for this flag,
    also check _explicitly_ if the device is suspended, before committing
    their metadata to get accurate statistics. They don't just rely on
    the user to use the '--noflush' flag. (See the sketch below.)
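
For reference, dm-thin does roughly the following in its status hook (a
simplified excerpt of pool_status() in drivers/md/dm-thin.c, not verbatim):

    static void pool_status(struct dm_target *ti, status_type_t type,
                            unsigned status_flags, char *result, unsigned maxlen)
    {
            struct pool_c *pt = ti->private;
            struct pool *pool = pt->pool;

            if (type == STATUSTYPE_INFO) {
                    /*
                     * Only commit (flush) the metadata if the caller didn't
                     * pass --noflush and the device is not suspended.
                     */
                    if (!(status_flags & DM_STATUS_NOFLUSH_FLAG) && !dm_suspended(ti))
                            (void) commit(pool);

                    /* ... read counters and format the status line ... */
            }
            /* ... */
    }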

That said, fixing 'era_status()' to check for the 'noflush' flag and to
check if the device is suspended could be a possible fix, which I have
already proposed in my first email.

Although, as I have already explained, it's not a simple matter of not
committing metadata when the 'noflush' flag is used, or the device is
suspended.

dm-era queues the status operation (as well as all operations that touch
the metadata) for execution by a worker thread, to avoid using locks for
accessing the metadata.

When the target is suspended this thread doesn't execute operations, so
the 'table_status()' call blocks, holding the SRCU read lock of the
device (md->io_barrier), until the target is resumed.

But, 'table_status()' _never_ unblocks if you resume the device with a
new table preloaded. Instead, the resume operation ('dm_swap_table()')
deadlocks waiting for 'table_status()' to drop the SRCU read lock.

This never happens, and the only way to recover is to reboot.

> 
>> 5. Resume the device. This deadlocks for the reasons I explained in my
>>    previous email.
>>
>>    # dmsetup resume eradev
>>
>> 6. The dmesg logs are the same as the ones I included in my previous
>>    email.
>>
>> I have explained the reasons for the deadlock in my previous email, but
>> I would be more than happy to discuss them more.
>>
> 
> There is no bug - if your only problem is 'stuck'  status while you have devices in suspended state.
> 

As I explained previously, my problem is not 'stuck' status while the
device is suspended.

The issue is that if the suspended dm-era device has a new table
preloaded, the 'stuck' status results in 'stuck' resume.

And the only way to recover is by rebooting.

> You should NOT be doing basically anything while being suspend!!
> 

The documentation of the writecache target
(https://www.kernel.org/doc/html/latest/admin-guide/device-mapper/writecache.html)
states that the following is the proper sequence for removing the cache
device:

1. send the "flush_on_suspend" message
2. load an inactive table with a linear target that maps to the
    underlying device
3. suspend the device
4. ask for status and verify that there are no errors
5. resume the device, so that it will use the linear target
6. the cache device is now inactive and it can be deleted

The above sequence, except for step 1, which is not applicable to dm-era,
is exactly the sequence of steps that triggers the bug for dm-era.

These steps, if run for dm-era, cause a deadlock.

So, although I understand your point about not doing anything with a
suspended device, it seems that this sequence of steps is not wrong, and
it is actually recommended by the writecache documentation.

Still, as I mentioned, I am not explicitly running the 'status'
operation on the suspended dm-era device. It's a race with LVM, which
runs it implicitly when running commands such as 'vgs' or 'lvs'.

> i.e. imagine you suspend 'swap' device and while you are in suspened state kernel decides to swap memory pages - so you get instantly frozen here.
> 
> For this reason lvm2 while doing  'suspend/resume' sequance preallocates all memory in front of this operation - does very minimal set of operation between suspend/resume to minimize also latencies and so on.
> 
> Clearly if you suspend just some 'supportive'  disk of yours - you likely are no in danger of blocking your swap - but the 'status --noflush' logic still applies.
> 

I get what you are describing about a 'swap' device, and I agree
completely.

But, this is not what happens in the case of dm-era.

Regards,
Nikos.
