* [dm-devel] fstrim on raid1 LV with writemostly PV leads to system freeze
From: Kirill Kirilenko @ 2023-09-21 20:34 UTC
  To: Alasdair Kergon, Mike Snitzer; +Cc: dm-devel

Hello.

I created two LVM physical volumes: one on an NVMe device and one on a 
SATA SSD. I added them to a volume group and created a RAID1 logical 
volume in it. Then I enabled the 'writemostly' flag on the second 
(slower) PV. After that, my system started to freeze at random times 
with no messages in syslog. I was able to determine that the freezing 
happens during execution of 'fstrim' (run via a systemd timer); I 
confirmed this by running 'fstrim' manually.
If I disable the 'writemostly' flag, I get no freezes. I can reproduce 
this behavior on a vanilla 6.5.0 kernel.

My LV is a 150 GB ext4 volume with lots of files in it, so running 
'fstrim' takes around a minute. This may be important.
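
For reference, the setup can be reproduced roughly like this (a sketch;
device names are examples, not necessarily the ones I used):

  pvcreate /dev/nvme0n1p2 /dev/sda1
  vgcreate vg0 /dev/nvme0n1p2 /dev/sda1
  lvcreate --type raid1 -m 1 -L 150G -n home vg0
  lvchange --writemostly /dev/sda1:y vg0/home
  mkfs.ext4 /dev/vg0/home
  mount /dev/vg0/home /home
  fstrim -v /home            # freezes while writemostly is set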

Additional information:
OS: Linux Mint 21.2
CPU: AMD Ryzen 7 5800X
NVMe: Samsung SSD 980 500GB
SATA SSD: Samsung SSD 850 EVO M.2 250GB

Best regards,
Kirill Kirilenko.


* Re: [dm-devel] fstrim on raid1 LV with writemostly PV leads to system freeze
From: Mike Snitzer @ 2023-09-21 21:45 UTC
  To: Kirill Kirilenko; +Cc: linux-raid, heinzm, dm-devel, Alasdair Kergon

[cc'ing Heinz and the linux-raid mailing list]

On Thu, Sep 21 2023 at  4:34P -0400,
Kirill Kirilenko <kirill@ultracoder.org> wrote:

> Hello.
> 
> I created two LVM physical volumes: one on an NVMe device and one on a
> SATA SSD. I added them to a volume group and created a RAID1 logical
> volume in it. Then I enabled the 'writemostly' flag on the second
> (slower) PV. After that, my system started to freeze at random times
> with no messages in syslog. I was able to determine that the freezing
> happens during execution of 'fstrim' (run via a systemd timer); I
> confirmed this by running 'fstrim' manually. If I disable the
> 'writemostly' flag, I get no freezes. I can reproduce this behavior on
> a vanilla 6.5.0 kernel.
> 
> My LV is a 150 GB ext4 volume with lots of files in it, so running
> 'fstrim' takes around a minute. This may be important.
> 
> Additional information:
> OS: Linux Mint 21.2
> CPU: AMD Ryzen 7 5800X
> NVMe: Samsung SSD 980 500GB
> SATA SSD: Samsung SSD 850 EVO M.2 250GB
> 
> Best regards,
> Kirill Kirilenko.
> 

I just verified that 6.5.0 already contains this DM core fix (needed to
prevent excessive splitting of discard IO, which could make fstrim take
longer on a DM device), so it isn't relevant here:
be04c14a1bd2 dm: use op specific max_sectors when splitting abnormal io
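
One quick way to check whether a given tree has that commit (a sketch,
assuming a local kernel git checkout):

  git merge-base --is-ancestor be04c14a1bd2 v6.5 && echo present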

Given your use of 'writemostly', I'm inferring you're using lvm2's
raid1, which uses the MD raid1 code by way of the dm-raid target.
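
You can confirm that stack from the mapped device itself; a dm-raid
mapping reports the 'raid' target type (a sketch; the VG/LV names are
examples):

  dmsetup table | grep ' raid '
  # e.g.:  vg0-home: 0 314572800 raid raid1 ...  (then the leg devices)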

Discards (the more generic term for what fstrim issues) are considered
writes, so writemostly really shouldn't matter... but I know that there
have been issues with MD's writemostly code (identified by others
relatively recently).

All said: hopefully someone more MD oriented can review your report
and help you further.

Mike


* Re: fstrim on raid1 LV with writemostly PV leads to system freeze
From: Roman Mamedov @ 2023-09-21 22:03 UTC
  To: linux-raid; +Cc: Kirill Kirilenko, Alasdair Kergon, dm-devel, heinzm

On Thu, 21 Sep 2023 17:45:24 -0400
Mike Snitzer <snitzer@kernel.org> wrote:

> I just verified that 6.5.0 already contains this DM core fix (needed to
> prevent excessive splitting of discard IO, which could make fstrim take
> longer on a DM device), so it isn't relevant here:
> be04c14a1bd2 dm: use op specific max_sectors when splitting abnormal io
> 
> Given your use of 'writemostly', I'm inferring you're using lvm2's
> raid1, which uses the MD raid1 code by way of the dm-raid target.
> 
> Discards (the more generic term for what fstrim issues) are considered
> writes, so writemostly really shouldn't matter... but I know that there
> have been issues with MD's writemostly code (identified by others
> relatively recently).
> 
> All said: hopefully someone more MD oriented can review your report
> and help you further.
> 
> Mike

I've reported before that write-mostly TRIM gets split into 1 MB pieces,
which can be an order of magnitude slower on some SSDs:
https://www.spinics.net/lists/raid/msg72471.html

Nobody cared to reply, investigate or fix it.

Maybe your system hasn't actually frozen either; it may just be taking
its time processing all the tiny split requests.
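
If that's the case, the device's advertised discard limits plus a timed
trim should show it (a sketch; device and mount names are examples):

  cat /sys/block/sda/queue/discard_max_bytes
  cat /sys/block/sda/queue/discard_granularity
  time fstrim -v /mnt     # compare with writemostly enabled and disabled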

-- 
With respect,
Roman


* Re: [dm-devel] fstrim on raid1 LV with writemostly PV leads to system freeze
From: Kirill Kirilenko @ 2023-09-22 16:16 UTC
  To: Mike Snitzer, Roman Mamedov; +Cc: linux-raid, heinzm, dm-devel, Alasdair Kergon

On 22.09.2023 00:45 +0300, Mike Snitzer wrote:
> Given your use of 'writemostly', I'm inferring you're using lvm2's
> raid1, which uses the MD raid1 code by way of the dm-raid target.

Yes, exactly.

On 22.09.2023 00:45 +0300, Mike Snitzer wrote:
> All said: hopefully someone more MD oriented can review your report
> and help you further.
Thank you. I don't need to send a separate report to the MD maintainers, do I?

On 22.09.2023 01:03 +0300, Roman Mamedov wrote:
> Maybe your system hasn't actually frozen either; it may just be taking
> its time processing all the tiny split requests.
I don't think so, because the disk activity light is off. To clarify:
if music is playing when the system freezes, the last sound buffer begins 
to play cyclically.


* Re: fstrim on raid1 LV with writemostly PV leads to system freeze
From: Song Liu @ 2023-09-22 23:08 UTC
  To: Kirill Kirilenko
  Cc: Mike Snitzer, Roman Mamedov, Alasdair Kergon, heinzm, dm-devel,
	linux-raid

Hi folks,

Thanks for the report. I will try to reproduce this issue next week.

Song

On Fri, Sep 22, 2023 at 9:25 AM Kirill Kirilenko <kirill@ultracoder.org> wrote:
>
> On 22.09.2023 00:45 +0300, Mike Snitzer wrote:
> > Given your use of 'writemostly', I'm inferring you're using lvm2's
> > raid1, which uses the MD raid1 code by way of the dm-raid target.
>
> Yes, exactly.
>
> On 22.09.2023 00:45 +0300, Mike Snitzer wrote:
> > All said: hopefully someone more MD oriented can review your report
> > and help you further.
> Thank you. I don't need to send a separate report to the MD maintainers, do I?
>
> On 22.09.2023 01:03 +0300, Roman Mamedov wrote:
> > Maybe your system hasn't actually frozen either; it may just be taking
> > its time processing all the tiny split requests.
> I don't think so, because the disk activity light is off. To clarify:
> if music is playing when the system freezes, the last sound buffer begins
> to play cyclically.


* Re: fstrim on raid1 LV with writemostly PV leads to system freeze
From: Yu Kuai @ 2023-09-25  2:58 UTC
  To: Roman Mamedov, linux-raid
  Cc: Kirill Kirilenko, Alasdair Kergon, dm-devel, heinzm, yukuai (C)

Hi,

On 2023/09/22 6:03, Roman Mamedov wrote:
> On Thu, 21 Sep 2023 17:45:24 -0400
> Mike Snitzer <snitzer@kernel.org> wrote:
> 
>> I just verified that 6.5.0 already contains this DM core fix (needed to
>> prevent excessive splitting of discard IO, which could make fstrim take
>> longer on a DM device), so it isn't relevant here:
>> be04c14a1bd2 dm: use op specific max_sectors when splitting abnormal io
>>
>> Given your use of 'writemostly', I'm inferring you're using lvm2's
>> raid1, which uses the MD raid1 code by way of the dm-raid target.
>>
>> Discards (the more generic term for what fstrim issues) are considered
>> writes, so writemostly really shouldn't matter... but I know that there
>> have been issues with MD's writemostly code (identified by others
>> relatively recently).
>>
>> All said: hopefully someone more MD oriented can review your report
>> and help you further.
>>
>> Mike
> 
> I've reported before that write-mostly TRIM gets split into 1 MB pieces,
> which can be an order of magnitude slower on some SSDs:
> https://www.spinics.net/lists/raid/msg72471.html

Looks like I missed that report.

Based on code review, it's clear where the discard bio gets split:

raid1_write_request
  for (i = 0; i < disks; i++)
   if (rdev && test_bit(WriteMostly, &rdev->flags))
    write_behind = true

  if (write_behind && bitmap)
   max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * (PAGE_SIZE >> 9))
   // 256 vecs * (4K >> 9) sectors * 512 bytes/sector = 1 MiB per bio

  if (max_sectors < bio_sectors(bio))
   bio_split
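
For the usual 4K pages, that limit works out to (a quick shell check):

  echo $(( 256 * (4096 >> 9) * 512 ))    # 1048576 bytes = 1 MiB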

Roman and Kirill, can you test the following patch?

Thanks,
Kuai

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 4b30a1742162..4963f864ef99 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1345,6 +1345,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
         int first_clone;
         int max_sectors;
         bool write_behind = false;
+       bool is_discard = (bio_op(bio) == REQ_OP_DISCARD);

         if (mddev_is_clustered(mddev) &&
              md_cluster_ops->area_resyncing(mddev, WRITE,
@@ -1405,7 +1406,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
                  * write-mostly, which means we could allocate write behind
                  * bio later.
                  */
-               if (rdev && test_bit(WriteMostly, &rdev->flags))
+               if (!is_discard && rdev && test_bit(WriteMostly, &rdev->flags))
                         write_behind = true;

                 if (rdev && unlikely(test_bit(Blocked, &rdev->flags))) {


> 
> Nobody cared to reply, investigate or fix it.
> 
> Maybe your system hasn't actually frozen either; it may just be taking
> its time processing all the tiny split requests.
> 



* Re: fstrim on raid1 LV with writemostly PV leads to system freeze
From: Kirill Kirilenko @ 2023-09-25 23:59 UTC
  To: Yu Kuai; +Cc: Roman Mamedov, Song Liu, linux-raid, dm-devel

On 25.09.2023 05:58 +0300, Yu Kuai wrote:
> Roman and Kirill, can you test the following patch?
>
> Thanks,
> Kuai
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 4b30a1742162..4963f864ef99 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1345,6 +1345,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
>          int first_clone;
>          int max_sectors;
>          bool write_behind = false;
> +       bool is_discard = (bio_op(bio) == REQ_OP_DISCARD);
>
>          if (mddev_is_clustered(mddev) &&
>               md_cluster_ops->area_resyncing(mddev, WRITE,
> @@ -1405,7 +1406,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
>                   * write-mostly, which means we could allocate write behind
>                   * bio later.
>                   */
> -               if (rdev && test_bit(WriteMostly, &rdev->flags))
> +               if (!is_discard && rdev && test_bit(WriteMostly, &rdev->flags))
>                          write_behind = true;
>
>                  if (rdev && unlikely(test_bit(Blocked, &rdev->flags))) {

Thank you. I can confirm that your patch eliminates the freezes during
'fstrim' execution. Tested on kernel 6.5.0.
'fstrim' still takes more than 2 minutes, but I believe that's normal for
a file system with 1M+ inodes.

I may be wrong here, but to me this doesn't look like a solution so much
as a masking of the real problem. Even with TRIM operations split into
1 MB pieces, I wouldn't expect the kernel to freeze.


* Re: fstrim on raid1 LV with writemostly PV leads to system freeze
From: Yu Kuai @ 2023-09-26  3:28 UTC
  To: Kirill Kirilenko, Yu Kuai
  Cc: Roman Mamedov, Song Liu, linux-raid, dm-devel, yukuai (C)

Hi,

On 2023/09/26 7:59, Kirill Kirilenko wrote:
> On 25.09.2023 05:58 +0300, Yu Kuai wrote:
>> Roman and Kirill, can you test the following patch?
>>
>> Thanks,
>> Kuai
>>
>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> index 4b30a1742162..4963f864ef99 100644
>> --- a/drivers/md/raid1.c
>> +++ b/drivers/md/raid1.c
>> @@ -1345,6 +1345,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
>>          int first_clone;
>>          int max_sectors;
>>          bool write_behind = false;
>> +       bool is_discard = (bio_op(bio) == REQ_OP_DISCARD);
>>
>>          if (mddev_is_clustered(mddev) &&
>>               md_cluster_ops->area_resyncing(mddev, WRITE,
>> @@ -1405,7 +1406,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
>>                   * write-mostly, which means we could allocate write behind
>>                   * bio later.
>>                   */
>> -               if (rdev && test_bit(WriteMostly, &rdev->flags))
>> +               if (!is_discard && rdev && test_bit(WriteMostly, &rdev->flags))
>>                          write_behind = true;
>>
>>                  if (rdev && unlikely(test_bit(Blocked, &rdev->flags))) {
> 
> Thank you. I can confirm that your patch eliminates the freezes during
> 'fstrim' execution. Tested on kernel 6.5.0.
> 'fstrim' still takes more than 2 minutes, but I believe that's normal for
> a file system with 1M+ inodes.
Thanks for the test.

> I may be wrong here, but to me this doesn't look like a solution so much
> as a masking of the real problem. Even with TRIM operations split into
> 1 MB pieces, I wouldn't expect the kernel to freeze.

I still don't quite understand what you mean by 'kernel freeze'; this
patch does fix a real problem, namely that a discard bio was treated as a
normal write bio and split accordingly.

Can you explain how you determine that the kernel has frozen? And while
it happens, does 'iostat -dmx 1' show that the disk is idle with no
discard IO being handled?
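
For example (a sketch; run in another terminal while fstrim is active,
with device names as examples):

  iostat -dmx nvme0n1 sda dm-4 1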

Thanks,
Kuai


* Re: fstrim on raid1 LV with writemostly PV leads to system freeze
From: Yu Kuai @ 2023-09-26 13:21 UTC
  To: Kirill Kirilenko, Yu Kuai
  Cc: Roman Mamedov, Song Liu, linux-raid, dm-devel, yukuai (C)

Hi,

On 2023/09/26 21:12, Kirill Kirilenko wrote:
> On 26.09.2023 06:28 +0300, Yu Kuai wrote:
>> I still don't quite understand what you mean by 'kernel freeze'; this
>> patch does fix a real problem, namely that a discard bio was treated as a
>> normal write bio and split accordingly.
>>
>> Can you explain how you determine that the kernel has frozen? And while
>> it happens, does 'iostat -dmx 1' show that the disk is idle with no
>> discard IO being handled?
> 
> I mean that the keyboard and mouse stop working, the screen stops
> updating, and the sound card starts playing the last audio buffer
> endlessly. At the same time, the disk activity indicator goes off.

This suggests the CPU is busy with something. In that case you should use
top or perf to figure out what all your CPUs are doing; most likely they
are busy issuing IO and handling IO interrupts.
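
For example (a sketch; system-wide sampling with call graphs for ten
seconds, then a report):

  perf record -a -g -- sleep 10
  perf report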

> 
> I've attached the last output of 'iostat -dmx 1'. My RAID1 LV is 'dm-4',
> the underlying PVs are 'nvme0n1' and 'sda'. But the update interval is
> 1 second, so maybe at the moment of freezing all discards had already
> completed.

iostat shows that the disk is handling about 600 writes, 1200 discards
and 900 flushes per second. I'm not sure what 'the disk activity
indicator goes off' means, but your disk really is busy handling all of
this IO.

Thanks,
Kuai


* Re: fstrim on raid1 LV with writemostly PV leads to system freeze
From: Kirill Kirilenko @ 2023-09-26 20:27 UTC
  To: Yu Kuai; +Cc: Song Liu, yukuai (C), linux-raid, dm-devel

On 26.09.2023 16:21 +0300, Yu Kuai wrote:
> This suggests the CPU is busy with something. In that case you should use
> top or perf to figure out what all your CPUs are doing; most likely they
> are busy issuing IO and handling IO interrupts.

OK, here is the last output of 'perf top -d 1' before the system froze:

4.00%  [kernel]  ledtrig_disk_activity
3.86%  [kernel]  led_trigger_blink_oneshot
1.93%  [kernel]  read_tsc
1.68%  perf      hist_entry__sort
1.62%  [kernel]  menu_select
1.57%  [kernel]  psi_group_change
1.19%  [kernel]  native_sched_clock
0.96%  [kernel]  scsi_complete
0.94%  [kernel]  __raw_spin_lock_irqsave
0.80%  perf      perf_hpp__is_dynamic_entry
...

And here is the last output of 'top':

load average: 1.48, 0.90, 0.38
%Cpu(s): 0.1 us, 0.1 sy, 0.0 ni, 93.5 id, 6.3 wa, 0.0 hi, 0.1 si, 0.0 st
MiB Mem:  32005.3 total, 29138.3 free, 1327.0 used,   1540.0 buff/cache
MiB Swap: 16382.0 total, 16382.0 free,    0.0 used.  30244.5 avail Mem
S %CPU COMMAND
S 0.7  [mdX_raid1]
S 0.7  cinnamon --replace
I 0.3  [kworker/u64:2-events_unbound]
I 0.3  [kworker/u64:6-events_freezable_power_]
S 0.3  [gfx_low]
S 0.3  /usr/bin/containerd
S 0.3  /usr/lib/xorg/Xorg -core :0 ...
S 0.3  /usr/libexec/gnome-terminal-server
R 0.3  top
D 0.3  fstrim /home
...

I think there are only two possibilities: either the CPU is not actually
that busy, or it gets busy so quickly that we cannot see it this way. I
have no experience in kernel debugging. Maybe someone knows how I can get
more accurate data at the moment the system freezes?

On 26.09.2023 16:21 +0300, Yu Kuai wrote:
> I'm not sure what 'the disk activity indicator goes off' means

It means that the LED on my computer case indicating disk activity goes
off. According to the 'perf' output, LED control is the largest
contributor to CPU load. :)

