All of lore.kernel.org
 help / color / mirror / Atom feed
* lvmcache in writeback mode gets stuck flushing dirty blocks
@ 2019-07-30  4:58 Lakshmi Narasimhan Sundararajan
  2019-07-30  8:02 ` Nikhil Kshirsagar
  2019-07-30  8:21 ` lvmcache in writeback mode gets stuck flushing dirty blocks Zdenek Kabelac
  0 siblings, 2 replies; 11+ messages in thread
From: Lakshmi Narasimhan Sundararajan @ 2019-07-30  4:58 UTC (permalink / raw)
  To: lvm-devel

Hi Team,
A very good day to all.

I am using lvmcache in writeback mode. When there are still dirty blocks in the LV and it needs to be destroyed or flushed,
there appear to be some conditions under which the dirty-data flush gets stuck forever.


As an example:
root at pdc4-sm35:~# lvremove -f pwx0/pool
  367 blocks must still be flushed.
  367 blocks must still be flushed.
  367 blocks must still be flushed.
  367 blocks must still be flushed.
  367 blocks must still be flushed.
  367 blocks must still be flushed.
^C
root at pdc4-sm35:~#

I am running these versions:
root at pdc4-sm35:~# lvm version
  LVM version:     2.02.133(2) (2015-10-30)
  Library version: 1.02.110 (2015-10-30)
  Driver version:  4.34.0
root at pdc4-sm35:~#


This issue seems old and has been reported in multiple places. There has been some acknowledgement that it was resolved in 2.02.133, but I still see it. I have also seen posts reporting it in 2.02.170+ (here: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=878441) (Package: lvm2, Version: 2.02.173-1, Severity: normal).

I filed one here myself, https://github.com/lvmteam/lvm2/issues/22, trying to understand from you experts where we are on this.

I would sincerely appreciate your help in understanding the state of this issue in more detail.

Best regards
LN
Sent from Mail for Windows 10

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/lvm-devel/attachments/20190730/f1009a0a/attachment.htm>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* lvmcache in writeback mode gets stuck flushing dirty blocks
  2019-07-30  4:58 lvmcache in writeback mode gets stuck flushing dirty blocks Lakshmi Narasimhan Sundararajan
@ 2019-07-30  8:02 ` Nikhil Kshirsagar
  2019-07-30  8:15   ` Nikhil Kshirsagar
  2019-07-31  9:53   ` lvmcache in writeback mode gets stuck flushing dirtyblocks Lakshmi Narasimhan Sundararajan
  2019-07-30  8:21 ` lvmcache in writeback mode gets stuck flushing dirty blocks Zdenek Kabelac
  1 sibling, 2 replies; 11+ messages in thread
From: Nikhil Kshirsagar @ 2019-07-30  8:02 UTC (permalink / raw)
  To: lvm-devel

This used to happen if the chunksize increased as a result of needing to
use more than a million chunks to store the size of the cached lv. What is
the size of the pool?

Regards,
Nikhil.

On Tue, 30 Jul, 2019, 1:25 PM Lakshmi Narasimhan Sundararajan, <
lns@portworx.com> wrote:

> Hi Team,
>
> A very good day to all.
>
>
> I am using lvmcache in writeback mode. When there are dirty blocks still
> in the lv, and if needs to be destroyed or flushed, then
>
> It seems to me that there are some conditions under which the dirty data
> flush gets stuck forever.
>
>
>
>
>
> As an example:
>
> root at pdc4-sm35:~# lvremove -f pwx0/pool
>
>   367 blocks must still be flushed.
>
>   367 blocks must still be flushed.
>
>   367 blocks must still be flushed.
>
>   367 blocks must still be flushed.
>
>   367 blocks must still be flushed.
>
>   367 blocks must still be flushed.
>
> ^C
>
> root at pdc4-sm35:~#
>
>
>
> I am running these version:
>
> root at pdc4-sm35:~# lvm version
>
>   LVM version:     2.02.133(2) (2015-10-30)
>
>   Library version: 1.02.110 (2015-10-30)
>
>   Driver version:  4.34.0
>
> root at pdc4-sm35:~#
>
>
>
>
>
> This issue seems old and reported multiple places. There have been some
> acknowledgement that this issue is resolved in 2.02.133, but still I see
> it. Also, I have seen some posts report it in 2.02.170+ as well (here:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=878441) (Package: lvm2
> Version: 2.02.173-1 Severity: normal)
>
>
>
> I filed one here myself, https://github.com/lvmteam/lvm2/issues/22,
> trying  to understand from you experts where we are on this?
>
>
>
> I would sincerely appreciate your help in understanding the state of this
> issue in more detail.
>
>
>
> Best regards
> LN
>
> Sent from Mail <https://go.microsoft.com/fwlink/?LinkId=550986> for
> Windows 10
>
>
> --
> lvm-devel mailing list
> lvm-devel at redhat.com
> https://www.redhat.com/mailman/listinfo/lvm-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* lvmcache in writeback mode gets stuck flushing dirty blocks
  2019-07-30  8:02 ` Nikhil Kshirsagar
@ 2019-07-30  8:15   ` Nikhil Kshirsagar
  2019-07-31  9:53   ` lvmcache in writeback mode gets stuck flushing dirtyblocks Lakshmi Narasimhan Sundararajan
  1 sibling, 0 replies; 11+ messages in thread
From: Nikhil Kshirsagar @ 2019-07-30  8:15 UTC (permalink / raw)
  To: lvm-devel

Please see

https://bugzilla.redhat.com/show_bug.cgi?id=1665650
and
https://bugzilla.redhat.com/show_bug.cgi?id=1661987

Fixed by the migration threshold fix in
https://bugzilla.redhat.com/show_bug.cgi?id=1665654 I think.

Regards,
nikhil.

On Tue, Jul 30, 2019 at 1:32 PM Nikhil Kshirsagar <nkshirsa@redhat.com>
wrote:

> This used to happen if the chunksize increased as a result of needing to
> use more than a million chunks to store the size of the cached lv. What is
> the size of the pool?
>
> Regards,
> Nikhil.
>
> On Tue, 30 Jul, 2019, 1:25 PM Lakshmi Narasimhan Sundararajan, <
> lns at portworx.com> wrote:
>
>> Hi Team,
>>
>> A very good day to all.
>>
>>
>> I am using lvmcache in writeback mode. When there are dirty blocks still
>> in the lv, and if needs to be destroyed or flushed, then
>>
>> It seems to me that there are some conditions under which the dirty data
>> flush gets stuck forever.
>>
>>
>>
>>
>>
>> As an example:
>>
>> root at pdc4-sm35:~# lvremove -f pwx0/pool
>>
>>   367 blocks must still be flushed.
>>
>>   367 blocks must still be flushed.
>>
>>   367 blocks must still be flushed.
>>
>>   367 blocks must still be flushed.
>>
>>   367 blocks must still be flushed.
>>
>>   367 blocks must still be flushed.
>>
>> ^C
>>
>> root at pdc4-sm35:~#
>>
>>
>>
>> I am running these version:
>>
>> root at pdc4-sm35:~# lvm version
>>
>>   LVM version:     2.02.133(2) (2015-10-30)
>>
>>   Library version: 1.02.110 (2015-10-30)
>>
>>   Driver version:  4.34.0
>>
>> root at pdc4-sm35:~#
>>
>>
>>
>>
>>
>> This issue seems old and reported multiple places. There have been some
>> acknowledgement that this issue is resolved in 2.02.133, but still I see
>> it. Also, I have seen some posts report it in 2.02.170+ as well (here:
>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=878441) (Package: lvm2
>> Version: 2.02.173-1 Severity: normal)
>>
>>
>>
>> I filed one here myself, https://github.com/lvmteam/lvm2/issues/22,
>> trying  to understand from you experts where we are on this?
>>
>>
>>
>> I would sincerely appreciate your help in understanding the state of this
>> issue in more detail.
>>
>>
>>
>> Best regards
>> LN
>>
>> Sent from Mail <https://go.microsoft.com/fwlink/?LinkId=550986> for
>> Windows 10
>>
>>
>> --
>> lvm-devel mailing list
>> lvm-devel at redhat.com
>> https://www.redhat.com/mailman/listinfo/lvm-devel
>
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* lvmcache in writeback mode gets stuck flushing dirty blocks
  2019-07-30  4:58 lvmcache in writeback mode gets stuck flushing dirty blocks Lakshmi Narasimhan Sundararajan
  2019-07-30  8:02 ` Nikhil Kshirsagar
@ 2019-07-30  8:21 ` Zdenek Kabelac
  2019-07-30  9:23   ` lvmcache in writeback mode gets stuck flushing dirtyblocks Lakshmi Narasimhan Sundararajan
  1 sibling, 1 reply; 11+ messages in thread
From: Zdenek Kabelac @ 2019-07-30  8:21 UTC (permalink / raw)
  To: lvm-devel

On 30. 07. 19 at 6:58, Lakshmi Narasimhan Sundararajan wrote:
> Hi Team,
> 
> A very good day to all.
> 
> 
> I am using lvmcache in writeback mode. When there are dirty blocks still in 
> the lv, and if needs to be destroyed or flushed, then
> 
> It seems to me that there are some conditions under which the dirty data flush 
> gets stuck forever.
> 
> As an example:
> 
> root at pdc4-sm35:~# lvremove -f pwx0/pool
> 
>    367 blocks must still be flushed.
> 
>    367 blocks must still be flushed.
> 
>    367 blocks must still be flushed.

> I am running these version:
> 
> root at pdc4-sm35:~# lvm version
> 
>    LVM version:     2.02.133(2) (2015-10-30)
> 
> I filed one here myself, https://github.com/lvmteam/lvm2/issues/22, trying to 
> understand from you experts where we are on this?
> 
> I would sincerely appreciate your help in understanding the state of this 
> issue in more detail.

Hi

Yep, you are using a very old version of lvm2 - it is already the year 2019 - and 
the initial releases of lvm2 with writeback cache support (which you happen to 
still be using these days) had a problem where uncaching was not switching to 
writethrough mode (and this was not the only one).

Please consider using a much newer lvm2 & kernel.

Regards

Zdenek



^ permalink raw reply	[flat|nested] 11+ messages in thread

* lvmcache in writeback mode gets stuck flushing dirtyblocks
  2019-07-30  8:21 ` lvmcache in writeback mode gets stuck flushing dirty blocks Zdenek Kabelac
@ 2019-07-30  9:23   ` Lakshmi Narasimhan Sundararajan
  2019-07-30 11:32     ` Zdenek Kabelac
  0 siblings, 1 reply; 11+ messages in thread
From: Lakshmi Narasimhan Sundararajan @ 2019-07-30  9:23 UTC (permalink / raw)
  To: lvm-devel

Hi Zdenek,
Thank you for acknowledging the issue.

I may not always be at liberty to choose the environment, as most of the major distributions come bundled with 2.02.133.
I have a few follow-up questions.
1/ Is there a way to tell that a particular version has a very critical bug (like the one I reported)? Nothing short of hitting it seems to be the only way to confirm currently.
2/ Which is the nearest stable release that addresses this particular issue?
3/ Does the latest stable lvm work well on old distributions and Linux kernels too? What's the compatibility matrix here?

Regards
LN

From: Zdenek Kabelac
Sent: Tuesday, July 30, 2019 1:51 PM
To: LVM2 development; Lakshmi Narasimhan Sundararajan
Subject: Re: [lvm-devel] lvmcache in writeback mode gets stuck flushing dirtyblocks

On 30. 07. 19 at 6:58, Lakshmi Narasimhan Sundararajan wrote:
> Hi Team,
> 
> A very good day to all.
> 
> 
> I am using lvmcache in writeback mode. When there are dirty blocks still in 
> the lv, and if needs to be destroyed or flushed, then
> 
> It seems to me that there are some conditions under which the dirty data flush 
> gets stuck forever.
> 
> As an example:
> 
> root at pdc4-sm35:~# lvremove -f pwx0/pool
> 
>    367 blocks must still be flushed.
> 
>    367 blocks must still be flushed.
> 
>    367 blocks must still be flushed.

> I am running these version:
> 
> root at pdc4-sm35:~# lvm version
> 
>    LVM version:     2.02.133(2) (2015-10-30)
> 
> I filed one here myself, https://github.com/lvmteam/lvm2/issues/22, trying to 
> understand from you experts where we are on this?
> 
> I would sincerely appreciate your help in understanding the state of this 
> issue in more detail.

Hi

Yep, you are using a very old version of lvm2 - it is already the year 2019 - and 
the initial releases of lvm2 with writeback cache support (which you happen to 
still be using these days) had a problem where uncaching was not switching to 
writethrough mode (and this was not the only one).

Please consider using a much newer lvm2 & kernel.

Regards

Zdenek


^ permalink raw reply	[flat|nested] 11+ messages in thread

* lvmcache in writeback mode gets stuck flushing dirtyblocks
  2019-07-30  9:23   ` lvmcache in writeback mode gets stuck flushing dirtyblocks Lakshmi Narasimhan Sundararajan
@ 2019-07-30 11:32     ` Zdenek Kabelac
  0 siblings, 0 replies; 11+ messages in thread
From: Zdenek Kabelac @ 2019-07-30 11:32 UTC (permalink / raw)
  To: lvm-devel

On 30. 07. 19 at 11:23, Lakshmi Narasimhan Sundararajan wrote:
> Hi Zdenek,
> 
> Thank you for the acknowledging the issue.
> 
> I may not be at a liberty to choose the environment always, as most of the 
> major distributions come bundled with 2.02.133
> 
> I have two followup questions.
> 
> 1/ Is there a way to tell that a particular version has very critical bug 
> (like the one I reported)? Nothing short of hitting it seem the way to confirm 
> currently.


See  'stable-2.02' branch WHATS_NEW  file content:

https://sourceware.org/git/?p=lvm2.git;a=blob_plain;f=WHATS_NEW;hb=refs/heads/stable-2.02


A huge amount of bugfixes and improvements.


> 2/ which is the nearest stable release that addresses this particular issue?

2.02.185....

> 
> 3/ Does latest lvm stable work well in old distributions, Linux kernels too? 
> Whats the compatibility matrix here?


For cache or thin I'd not use anything below a 4.20 kernel.

Also note - we provide support upstream - not for every individual distro
out there in the Universe.

If you want/need to backport individual fixes into your version - you will 
likely need to ask your distro provider for this service.

lvm2 is maintained to be fully backward compatible (version 2.02),
so it should work with almost any distro - provided we are informed
about the bug/problem :)

HEAD of git master is a bit more experimental (2.03)...

Regards

Zdenek



^ permalink raw reply	[flat|nested] 11+ messages in thread

* lvmcache in writeback mode gets stuck flushing dirtyblocks
  2019-07-30  8:02 ` Nikhil Kshirsagar
  2019-07-30  8:15   ` Nikhil Kshirsagar
@ 2019-07-31  9:53   ` Lakshmi Narasimhan Sundararajan
  2019-08-02 11:44     ` Nikhil Kshirsagar
  1 sibling, 1 reply; 11+ messages in thread
From: Lakshmi Narasimhan Sundararajan @ 2019-07-31  9:53 UTC (permalink / raw)
  To: lvm-devel

Hi Nikhil,
Thank you for your email. Much appreciated.

In my environment, the chunksize is fixed at 1M irrespective of the pool size. This may take the number of entries over 1M and result in a kernel warning. But the class of systems we are using is huge, so memory and CPU bottlenecks do not seem to be a factor in our testing.

I looked at the bugs. On the first one, about chunksize > 1M, we should be safe given our chunksize is fixed at 1MB.
The other one, about the migration threshold, is interesting; I will have to validate this again.

What would be the unit of migration threshold?  Is it the number of 512 byte sectors? And what exactly is its definition?

Also, curiously, this does not seem to be exported through the lvm CLI; does it need to be fetched only through dmsetup?

Thanks
LN
Sent from Mail for Windows 10

From: Nikhil Kshirsagar
Sent: Wednesday, July 31, 2019 3:04 PM
To: LVM2 development
Subject: Re: [lvm-devel] lvmcache in writeback mode gets stuck flushing dirtyblocks

This used to happen if the chunksize increased as a result of needing to use more than a million chunks to store the size of the cached lv. What is the size of the pool?

Regards,
Nikhil.

On Tue, 30 Jul, 2019, 1:25 PM Lakshmi Narasimhan Sundararajan, <lns@portworx.com> wrote:
Hi Team,
A very good day to all.

I am using lvmcache in writeback mode. When there are dirty blocks still in the lv, and if needs to be destroyed or flushed, then
It seems to me that there are some conditions under which the dirty data flush gets stuck forever.


As an example:
root at pdc4-sm35:~# lvremove -f pwx0/pool
  367 blocks must still be flushed.
  367 blocks must still be flushed.
  367 blocks must still be flushed.
  367 blocks must still be flushed.
  367 blocks must still be flushed.
  367 blocks must still be flushed.
^C
root at pdc4-sm35:~#

I am running these version:
root at pdc4-sm35:~# lvm version
  LVM version:     2.02.133(2) (2015-10-30)
  Library version: 1.02.110 (2015-10-30)
  Driver version:  4.34.0
root at pdc4-sm35:~#


This issue seems old and reported multiple places. There have been some acknowledgement that this issue is resolved in 2.02.133, but still I see it. Also, I have seen some posts report it in 2.02.170+ as well (here: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=878441) (Package: lvm2 Version: 2.02.173-1 Severity: normal)

I filed one here myself, https://github.com/lvmteam/lvm2/issues/22, trying to understand from you experts where we are on this?

I would sincerely appreciate your help in understanding the state of this issue in more detail.

Best regards
LN
Sent from Mail for Windows 10

--
lvm-devel mailing list
lvm-devel at redhat.com
https://www.redhat.com/mailman/listinfo/lvm-devel


^ permalink raw reply	[flat|nested] 11+ messages in thread

* lvmcache in writeback mode gets stuck flushing dirtyblocks
  2019-07-31  9:53   ` lvmcache in writeback mode gets stuck flushing dirtyblocks Lakshmi Narasimhan Sundararajan
@ 2019-08-02 11:44     ` Nikhil Kshirsagar
  2019-08-03  7:26       ` Nikhil Kshirsagar
  0 siblings, 1 reply; 11+ messages in thread
From: Nikhil Kshirsagar @ 2019-08-02 11:44 UTC (permalink / raw)
  To: lvm-devel

Hello,

You are welcome.

The migration threshold is in terms of chunks, I think, so it should be at
least one chunk so that the looping forever won't happen. The bug we found was
that if the chunksize grows beyond a certain value - triggered by a cached LV
larger than one TB - the migration threshold ends up hard-coded lower than the
increased chunksize.
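A rough back-of-the-envelope sketch of the arithmetic behind that failure mode (Python; the one-million-chunk cap, the 32 KiB chunk-size rounding, and the example LV size are assumptions for illustration, not values read out of the lvm2 code):

```python
SECTOR = 512
MAX_CHUNKS = 1_000_000               # assumed default cap on cache chunk count
DEFAULT_MIGRATION_THRESHOLD = 2048   # sectors, i.e. 1 MiB

def min_chunk_size_bytes(cached_lv_bytes, max_chunks=MAX_CHUNKS):
    """Smallest chunk size (bytes) that keeps the chunk count under the cap,
    rounded up to a 32 KiB multiple (assumed dm-cache granularity)."""
    step = 32 * 1024
    needed = -(-cached_lv_bytes // max_chunks)   # ceil division
    return -(-needed // step) * step             # round up to 32 KiB

lv_bytes = 3 * 1024**4                           # e.g. a 3 TiB cached LV
chunk = min_chunk_size_bytes(lv_bytes)           # works out to ~3.2 MiB
threshold_bytes = DEFAULT_MIGRATION_THRESHOLD * SECTOR   # 1 MiB

# The threshold is now smaller than a single chunk, so no chunk can ever
# be migrated and the flush loops forever.
print(chunk, threshold_bytes, chunk > threshold_bytes)
```

For LVs under roughly 1 TB the computed chunk size stays at or below 1 MiB, which is why the problem only shows up on large cached LVs.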

Yes, migration threshold right now needs better documentation and
explanation, as well as the ability to see it from lvm commands just like we
can see the chunksize. We are working on it through the BZs mentioned earlier
(see the BZ about migration threshold needing better documentation in the man
pages).

I think right now you can get it only at the device mapper layer, will
check..

Regards,
Nikhil.



On Fri, 2 Aug, 2019, 5:09 PM Lakshmi Narasimhan Sundararajan, <
lns@portworx.com> wrote:

> Hi Nikhil,
> Thank you for your email. Much appreciated.
>
>
>
> In my environment, Chunksize is fixed at 1M irrespective of the pool size.
> This may take the number of entries over 1M and result in kernel warning.
> But the class of systems we are using are huge, and so the memory and cpu
> bottlenecks does not seem to be a factor in our testing.
>
>
>
> I looked up at the bugs. The first one about chunksize > 1M, we should be
> safe on that given our chunksize is fixed at 1MB.
>
> The other one about migration threshold is interesting, I will have to
> validate this again.
>
>
>
> What would be the unit of migration threshold?  Is it the number of 512
> byte sectors? And what exactly is its definition?
>
>
>
> And also curiously this does not seem to be exported through lvm cli, need
> to fetch this only through dmsetup?
>
>
>
> Thanks
>
> LN
>
> Sent from Mail <https://go.microsoft.com/fwlink/?LinkId=550986> for
> Windows 10
>
>
>
> *From: *Nikhil Kshirsagar <nkshirsa@redhat.com>
> *Sent: *Wednesday, July 31, 2019 3:04 PM
> *To: *LVM2 development <lvm-devel@redhat.com>
> *Subject: *Re: [lvm-devel] lvmcache in writeback mode gets stuck flushing
> dirtyblocks
>
>
>
> This used to happen if the chunksize increased as a result of needing to
> use more than a million chunks to store the size of the cached lv. What is
> the size of the pool?
>
>
>
> Regards,
>
> Nikhil.
>
>
>
> On Tue, 30 Jul, 2019, 1:25 PM Lakshmi Narasimhan Sundararajan, <
> lns at portworx.com> wrote:
>
> Hi Team,
>
> A very good day to all.
>
>
> I am using lvmcache in writeback mode. When there are dirty blocks still
> in the lv, and if needs to be destroyed or flushed, then
>
> It seems to me that there are some conditions under which the dirty data
> flush gets stuck forever.
>
>
>
>
>
> As an example:
>
> root at pdc4-sm35:~# lvremove -f pwx0/pool
>
>   367 blocks must still be flushed.
>
>   367 blocks must still be flushed.
>
>   367 blocks must still be flushed.
>
>   367 blocks must still be flushed.
>
>   367 blocks must still be flushed.
>
>   367 blocks must still be flushed.
>
> ^C
>
> root at pdc4-sm35:~#
>
>
>
> I am running these version:
>
> root at pdc4-sm35:~# lvm version
>
>   LVM version:     2.02.133(2) (2015-10-30)
>
>   Library version: 1.02.110 (2015-10-30)
>
>   Driver version:  4.34.0
>
> root at pdc4-sm35:~#
>
>
>
>
>
> This issue seems old and reported multiple places. There have been some
> acknowledgement that this issue is resolved in 2.02.133, but still I see
> it. Also, I have seen some posts report it in 2.02.170+ as well (here:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=878441) (Package: lvm2
> Version: 2.02.173-1 Severity: normal)
>
>
>
> I filed one here myself, https://github.com/lvmteam/lvm2/issues/22,
> trying  to understand from you experts where we are on this?
>
>
>
> I would sincerely appreciate your help in understanding the state of this
> issue in more detail.
>
>
>
> Best regards
> LN
>
> Sent from Mail <https://go.microsoft.com/fwlink/?LinkId=550986> for
> Windows 10
>
>
>
> --
> lvm-devel mailing list
> lvm-devel at redhat.com
> https://www.redhat.com/mailman/listinfo/lvm-devel
>
>
> --
> lvm-devel mailing list
> lvm-devel at redhat.com
> https://www.redhat.com/mailman/listinfo/lvm-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* lvmcache in writeback mode gets stuck flushing dirtyblocks
  2019-08-02 11:44     ` Nikhil Kshirsagar
@ 2019-08-03  7:26       ` Nikhil Kshirsagar
  2019-08-07 12:14         ` lvmcache in writeback mode gets stuck flushingdirtyblocks Lakshmi Narasimhan Sundararajan
  0 siblings, 1 reply; 11+ messages in thread
From: Nikhil Kshirsagar @ 2019-08-03  7:26 UTC (permalink / raw)
  To: lvm-devel

Can you try increasing migration threshold through the device mapper
commands and check if this gets rid of the infinite flushes ?

On Fri, 2 Aug, 2019, 5:14 PM Nikhil Kshirsagar, <nkshirsa@redhat.com> wrote:

> Hello,
>
> You are welcome.
>
> The migration threshold is in terms of chunks, I think.. So it should be
> at least one chunk so the looping forever won't happen. The bug we found
> was if chunksize goes beyond a certain value triggered by larger than one
> tb sized cached lv, it ends up with migration threshold hard coded to lower
> than the increased chunksize.
>
> Yes migration threshold right now needs better documentation and
> explanations. Also the ability to see it from lvm commands just like we can
> see chunksize. We are working on it through the bzs mentioned earlier. (See
> the bz about migration threshold needing better documentation in the man
> pages)
>
> I think right now you can get it only at the device mapper layer, will
> check..
>
> Regards,
> Nikhil.
>
>
>
> On Fri, 2 Aug, 2019, 5:09 PM Lakshmi Narasimhan Sundararajan, <
> lns at portworx.com> wrote:
>
>> Hi Nikhil,
>> Thank you for your email. Much appreciated.
>>
>>
>>
>> In my environment, Chunksize is fixed at 1M irrespective of the pool
>> size. This may take the number of entries over 1M and result in kernel
>> warning. But the class of systems we are using are huge, and so the memory
>> and cpu bottlenecks does not seem to be a factor in our testing.
>>
>>
>>
>> I looked up at the bugs. The first one about chunksize > 1M, we should be
>> safe on that given our chunksize is fixed at 1MB.
>>
>> The other one about migration threshold is interesting, I will have to
>> validate this again.
>>
>>
>>
>> What would be the unit of migration threshold?  Is it the number of 512
>> byte sectors? And what exactly is its definition?
>>
>>
>>
>> And also curiously this does not seem to be exported through lvm cli,
>> need to fetch this only through dmsetup?
>>
>>
>>
>> Thanks
>>
>> LN
>>
>> Sent from Mail <https://go.microsoft.com/fwlink/?LinkId=550986> for
>> Windows 10
>>
>>
>>
>> *From: *Nikhil Kshirsagar <nkshirsa@redhat.com>
>> *Sent: *Wednesday, July 31, 2019 3:04 PM
>> *To: *LVM2 development <lvm-devel@redhat.com>
>> *Subject: *Re: [lvm-devel] lvmcache in writeback mode gets stuck
>> flushing dirtyblocks
>>
>>
>>
>> This used to happen if the chunksize increased as a result of needing to
>> use more than a million chunks to store the size of the cached lv. What is
>> the size of the pool?
>>
>>
>>
>> Regards,
>>
>> Nikhil.
>>
>>
>>
>> On Tue, 30 Jul, 2019, 1:25 PM Lakshmi Narasimhan Sundararajan, <
>> lns at portworx.com> wrote:
>>
>> Hi Team,
>>
>> A very good day to all.
>>
>>
>> I am using lvmcache in writeback mode. When there are dirty blocks still
>> in the lv, and if needs to be destroyed or flushed, then
>>
>> It seems to me that there are some conditions under which the dirty data
>> flush gets stuck forever.
>>
>>
>>
>>
>>
>> As an example:
>>
>> root at pdc4-sm35:~# lvremove -f pwx0/pool
>>
>>   367 blocks must still be flushed.
>>
>>   367 blocks must still be flushed.
>>
>>   367 blocks must still be flushed.
>>
>>   367 blocks must still be flushed.
>>
>>   367 blocks must still be flushed.
>>
>>   367 blocks must still be flushed.
>>
>> ^C
>>
>> root at pdc4-sm35:~#
>>
>>
>>
>> I am running these version:
>>
>> root at pdc4-sm35:~# lvm version
>>
>>   LVM version:     2.02.133(2) (2015-10-30)
>>
>>   Library version: 1.02.110 (2015-10-30)
>>
>>   Driver version:  4.34.0
>>
>> root at pdc4-sm35:~#
>>
>>
>>
>>
>>
>> This issue seems old and reported multiple places. There have been some
>> acknowledgement that this issue is resolved in 2.02.133, but still I see
>> it. Also, I have seen some posts report it in 2.02.170+ as well (here:
>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=878441) (Package: lvm2
>> Version: 2.02.173-1 Severity: normal)
>>
>>
>>
>> I filed one here myself, https://github.com/lvmteam/lvm2/issues/22,
>> trying  to understand from you experts where we are on this?
>>
>>
>>
>> I would sincerely appreciate your help in understanding the state of this
>> issue in more detail.
>>
>>
>>
>> Best regards
>> LN
>>
>> Sent from Mail <https://go.microsoft.com/fwlink/?LinkId=550986> for
>> Windows 10
>>
>>
>>
>> --
>> lvm-devel mailing list
>> lvm-devel at redhat.com
>> https://www.redhat.com/mailman/listinfo/lvm-devel
>>
>>
>> --
>> lvm-devel mailing list
>> lvm-devel at redhat.com
>> https://www.redhat.com/mailman/listinfo/lvm-devel
>
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* lvmcache in writeback mode gets stuck flushingdirtyblocks
  2019-08-03  7:26       ` Nikhil Kshirsagar
@ 2019-08-07 12:14         ` Lakshmi Narasimhan Sundararajan
  2019-08-12  4:41           ` lvmcache in writeback mode gets stuckflushingdirtyblocks Lakshmi Narasimhan Sundararajan
  0 siblings, 1 reply; 11+ messages in thread
From: Lakshmi Narasimhan Sundararajan @ 2019-08-07 12:14 UTC (permalink / raw)
  To: lvm-devel

Hi Nikhil,
So far, with migration_threshold raised to 20480 from the original 2048, I have not seen this problem. I shall keep you posted on further internal testing.

But I would like to understand more about the tunables we have with lvmcache. Can you please help refine the definitions below and my understanding of them?

Defaults:
migration_threshold 2048 
random_threshold 4 
sequential_threshold 512

1) migration_threshold: This tunable controls how many sectors (512 bytes each) of data are pulled into or pushed out of the cache, so all flush/writeback operations from the cache device operate in multiples of this threshold. There is never any migration in writethrough cache. A larger value helps move larger extents into/out of the cache immediately and improves sequential performance, but adversely affects random performance.
2) sequential_threshold: This tunable is a count of IO requests that have to be contiguous (each starting where the last IO ended) for incoming IO to be treated as sequential. Each IO can be of any size; as long as the next IO is contiguous, it is counted. Only IOs arriving after the sequential_threshold is hit bypass the cache. If even one IO breaks the sequential pattern, does the counter get reset to zero? And are all intervening IOs cached?
3) random_threshold: This tunable is a count of IO requests that miss the sequential condition before the stream is considered random IO. In the default configuration, the first 4 IO requests in a stream can never get cached, all IO between the 4th and 512th requests gets cached, and only after 512 requests does the caching module recognize the incoming IO as sequential and stop caching further.
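Taking the migration_threshold definition above at face value (a budget in 512-byte sectors), here is a tiny sketch of why raising it from 2048 to 20480 helps with our 1 MiB chunksize (my own arithmetic, not taken from the lvm2 sources):

```python
SECTOR = 512

def threshold_in_chunks(migration_threshold_sectors, chunk_size_bytes):
    """How many whole cache chunks fit inside the migration threshold."""
    return (migration_threshold_sectors * SECTOR) // chunk_size_bytes

CHUNK = 1024 * 1024   # the fixed 1 MiB chunksize used in this environment

print(threshold_in_chunks(2048, CHUNK))   # 1  -> barely one chunk of headroom
print(threshold_in_chunks(20480, CHUNK))  # 10 -> flushing can make progress
```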

Outside of this I also see 3 other tunables.
    "read_promote_adjustment",
    "write_promote_adjustment",
    "discard_promote_adjustment"

I do not understand how these need to be configured.
Are there any other tunables that I am not aware of?

Can you please help clarify?

Regards
LN
Sent from Mail for Windows 10

From: Nikhil Kshirsagar
Sent: Monday, August 5, 2019 2:42 PM
To: LVM2 development
Subject: Re: [lvm-devel] lvmcache in writeback mode gets stuck flushingdirtyblocks

Can you try increasing migration threshold through the device mapper commands and check if this gets rid of the infinite flushes ?

On Fri, 2 Aug, 2019, 5:14 PM Nikhil Kshirsagar, <nkshirsa@redhat.com> wrote:
Hello,

You are welcome.

The migration threshold is in terms of chunks, I think.. So it should be at least one chunk so the looping forever won't happen. The bug we found was if chunksize goes beyond a certain value triggered by larger than one tb sized cached lv, it ends up with migration threshold hard coded to lower than the increased chunksize.

Yes migration threshold right now needs better documentation and explanations. Also the ability to see it from lvm commands just like we can see chunksize. We are working on it through the bzs mentioned earlier. (See the bz about migration threshold needing better documentation in the man pages)

I think right now you can get it only at the device mapper layer, will check..

Regards,
Nikhil.


On Fri, 2 Aug, 2019, 5:09 PM Lakshmi Narasimhan Sundararajan, <lns@portworx.com> wrote:
Hi Nikhil,
Thank you for your email. Much appreciated.
?
In my environment, Chunksize is fixed at 1M irrespective of the pool size. This may take the number of entries over 1M and result in kernel warning. But the class of systems we are using are huge, and so the memory and cpu bottlenecks does not seem to be a factor in our testing.
?
I looked up at the bugs. The first one about chunksize > 1M, we should be safe on that given our chunksize is fixed at 1MB.
The other one about migration threshold is interesting, I will have to validate this again.
?
What would be the unit of migration threshold?? Is it the number of 512 byte sectors? And what exactly is its definition?
?
And also curiously this does not seem to be exported through lvm cli, need to fetch this only through dmsetup?
?
Thanks
LN
Sent from Mail for Windows 10
?
From: Nikhil Kshirsagar
Sent: Wednesday, July 31, 2019 3:04 PM
To: LVM2 development
Subject: Re: [lvm-devel] lvmcache in writeback mode gets stuck flushing dirtyblocks

This used to happen if the chunk size was increased as a result of needing more than a million chunks to cover the size of the cached LV. What is the size of the pool?

Regards,
Nikhil.

On Tue, 30 Jul, 2019, 1:25 PM Lakshmi Narasimhan Sundararajan, <lns@portworx.com> wrote:
Hi Team,
A very good day to all.

I am using lvmcache in writeback mode. When there are still dirty blocks in the LV and it needs to be destroyed or flushed, there seem to be conditions under which the dirty-data flush gets stuck forever.

As an example:
root@pdc4-sm35:~# lvremove -f pwx0/pool
  367 blocks must still be flushed.
  367 blocks must still be flushed.
  367 blocks must still be flushed.
  367 blocks must still be flushed.
  367 blocks must still be flushed.
  367 blocks must still be flushed.
^C
root@pdc4-sm35:~#

I am running this version:
root@pdc4-sm35:~# lvm version
  LVM version:     2.02.133(2) (2015-10-30)
  Library version: 1.02.110 (2015-10-30)
  Driver version:  4.34.0
root@pdc4-sm35:~#

This issue seems old and has been reported in multiple places. There has been some acknowledgement that it was resolved in 2.02.133, but I still see it. I have also seen posts reporting it on 2.02.170+ (here: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=878441; Package: lvm2, Version: 2.02.173-1, Severity: normal).

I filed one myself, https://github.com/lvmteam/lvm2/issues/22, trying to understand from you experts where we stand on this.

I would sincerely appreciate your help in understanding the state of this issue in more detail.

Best regards
LN
Sent from Mail for Windows 10
--
lvm-devel mailing list
lvm-devel at redhat.com
https://www.redhat.com/mailman/listinfo/lvm-devel



* lvmcache in writeback mode gets stuck flushing dirty blocks
  2019-08-07 12:14         ` lvmcache in writeback mode gets stuck flushing dirty blocks Lakshmi Narasimhan Sundararajan
@ 2019-08-12  4:41           ` Lakshmi Narasimhan Sundararajan
  0 siblings, 0 replies; 11+ messages in thread
From: Lakshmi Narasimhan Sundararajan @ 2019-08-12  4:41 UTC (permalink / raw)
  To: lvm-devel

Gentle reminder. I would sincerely appreciate clarification on the below.

Regards
LN
Sent from Mail for Windows 10

From: Lakshmi Narasimhan Sundararajan
Sent: Wednesday, August 7, 2019 5:44 PM
To: LVM2 development
Subject: RE: [lvm-devel] lvmcache in writeback mode gets stuck flushing dirty blocks

Hi Nikhil,
So far, with migration_threshold set to 20480 from the original 2048, I have not seen this problem. I shall keep you posted on further internal testing.

But I would like to understand more about the tunables we have with lvmcache. Can you please help refine the definitions and my understanding below?

Defaults:
migration_threshold 2048 
random_threshold 4 
sequential_threshold 512

1) Migration_threshold: This tunable controls how many sectors (512 B) of data are pulled into or pushed out of the cache, so all flush/writeback operations on the cache device operate in multiples of this threshold. There is no migration at all in writethrough mode. A larger number of sectors helps move larger extents into/out of the cache at once and improves sequential performance, but adversely affects random performance.
2) Sequential_threshold: This tunable is a count of IO requests that must be contiguous (each starting where the previous IO ended) for the incoming IO to be treated as sequential. Each IO can be of any size; as long as the next IO is contiguous, it is counted. Only after hitting sequential_threshold are IOs bypassed from the cache. If even one IO breaks the sequential pattern, does the count get reset to zero, and are all intervening IOs cached?
3) Random_threshold: This tunable is the count of IO requests that must miss the sequential condition for the IO to be considered random. With the defaults, the first 4 IO requests in a stream can never get cached; IOs between the 4th and 512th requests in the stream get cached; and only after 512 requests does the caching module recognize the incoming IO as sequential and stop caching further.
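To make points 2 and 3 concrete, here is a toy model that encodes my reading of the two thresholds (illustrative only; the function and its gating logic are my assumption, not the kernel's actual mq/smq policy):

```shell
# Toy model: classify each IO as cached (1) or bypassed/uncached (0)
# under the reading above. Input IOs are "start:length" sector pairs.
# This is an illustrative assumption, not the dm-cache algorithm.
cache_decisions() {
  local rand_thr=$1 seq_thr=$2; shift 2
  local run=0 last_end="" out=""
  for io in "$@"; do
    local start=${io%%:*} len=${io##*:}
    if [ -n "$last_end" ] && [ "$start" -eq "$last_end" ]; then
      run=$((run + 1))        # IO continues the contiguous run
    else
      run=0                   # any break in the pattern resets the run
    fi
    last_end=$((start + len))
    # Below random_threshold: treated as random, not cached.
    # At or above sequential_threshold: treated as sequential, bypassed.
    if [ "$run" -ge "$rand_thr" ] && [ "$run" -lt "$seq_thr" ]; then
      out="$out 1"
    else
      out="$out 0"
    fi
  done
  echo $out
}

# 10 contiguous 8-sector IOs with thresholds 4 and 8:
cache_decisions 4 8 0:8 8:8 16:8 24:8 32:8 40:8 48:8 56:8 64:8 72:8
# -> 0 0 0 0 1 1 1 1 0 0
```

Under the defaults this reading implies a purely random stream is never cached, and a long sequential stream stops being cached once 512 contiguous IOs have been seen.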

Outside of this I also see 3 other tunables.
    "read_promote_adjustment",
    "write_promote_adjustment",
    "discard_promote_adjustment"

I do not understand how these need to be configured.
Are there any other tunables that I am not aware of?

Can you please help clarify?

Regards
LN
Sent from Mail for Windows 10

From: Nikhil Kshirsagar
Sent: Monday, August 5, 2019 2:42 PM
To: LVM2 development
Subject: Re: [lvm-devel] lvmcache in writeback mode gets stuck flushing dirty blocks

Can you try increasing the migration threshold through the device-mapper commands and check whether this gets rid of the infinite flushes?

On Fri, 2 Aug, 2019, 5:14 PM Nikhil Kshirsagar, <nkshirsa@redhat.com> wrote:
Hello,

You are welcome.

The migration threshold is in terms of chunks, I think, so it should be at least one chunk for the endless looping not to happen. The bug we found was that if the chunk size grows beyond a certain value (triggered by a cached LV larger than one TB), the migration threshold ends up hard-coded lower than the increased chunk size.

Yes, the migration threshold currently needs better documentation and explanation, and also the ability to view it from the LVM commands, just as we can see the chunk size. We are working on this through the BZs mentioned earlier (see the BZ about the migration threshold needing better documentation in the man pages).

I think right now you can get it only at the device-mapper layer; I will check.

Regards,
Nikhil.


On Fri, 2 Aug, 2019, 5:09 PM Lakshmi Narasimhan Sundararajan, <lns@portworx.com> wrote:
Hi Nikhil,
Thank you for your email. Much appreciated.

In my environment, the chunk size is fixed at 1 MB irrespective of the pool size. This may take the number of entries over one million and trigger a kernel warning, but the class of systems we are using is huge, so memory and CPU bottlenecks do not appear to be a factor in our testing.

I looked at the bugs. On the first one, about chunk sizes larger than 1 MB, we should be safe, given that our chunk size is fixed at 1 MB.
The other one, about the migration threshold, is interesting; I will have to validate it again.

What is the unit of the migration threshold? Is it a number of 512-byte sectors? And what exactly is its definition?

And also, curiously, this does not seem to be exposed through the LVM CLI; does it need to be fetched through dmsetup only?

Thanks
LN
Sent from Mail for Windows 10
?
From: Nikhil Kshirsagar
Sent: Wednesday, July 31, 2019 3:04 PM
To: LVM2 development
Subject: Re: [lvm-devel] lvmcache in writeback mode gets stuck flushing dirty blocks

This used to happen if the chunk size was increased as a result of needing more than a million chunks to cover the size of the cached LV. What is the size of the pool?

Regards,
Nikhil.

On Tue, 30 Jul, 2019, 1:25 PM Lakshmi Narasimhan Sundararajan, <lns@portworx.com> wrote:
Hi Team,
A very good day to all.

I am using lvmcache in writeback mode. When there are still dirty blocks in the LV and it needs to be destroyed or flushed, there seem to be conditions under which the dirty-data flush gets stuck forever.

As an example:
root@pdc4-sm35:~# lvremove -f pwx0/pool
  367 blocks must still be flushed.
  367 blocks must still be flushed.
  367 blocks must still be flushed.
  367 blocks must still be flushed.
  367 blocks must still be flushed.
  367 blocks must still be flushed.
^C
root@pdc4-sm35:~#

I am running this version:
root@pdc4-sm35:~# lvm version
  LVM version:     2.02.133(2) (2015-10-30)
  Library version: 1.02.110 (2015-10-30)
  Driver version:  4.34.0
root@pdc4-sm35:~#

This issue seems old and has been reported in multiple places. There has been some acknowledgement that it was resolved in 2.02.133, but I still see it. I have also seen posts reporting it on 2.02.170+ (here: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=878441; Package: lvm2, Version: 2.02.173-1, Severity: normal).

I filed one myself, https://github.com/lvmteam/lvm2/issues/22, trying to understand from you experts where we stand on this.

I would sincerely appreciate your help in understanding the state of this issue in more detail.

Best regards
LN
Sent from Mail for Windows 10
--
lvm-devel mailing list
lvm-devel at redhat.com
https://www.redhat.com/mailman/listinfo/lvm-devel




end of thread, other threads:[~2019-08-12  4:41 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-30  4:58 lvmcache in writeback mode gets stuck flushing dirty blocks Lakshmi Narasimhan Sundararajan
2019-07-30  8:02 ` Nikhil Kshirsagar
2019-07-30  8:15   ` Nikhil Kshirsagar
  2019-07-31  9:53   ` lvmcache in writeback mode gets stuck flushing dirty blocks Lakshmi Narasimhan Sundararajan
2019-08-02 11:44     ` Nikhil Kshirsagar
2019-08-03  7:26       ` Nikhil Kshirsagar
  2019-08-07 12:14         ` lvmcache in writeback mode gets stuck flushing dirty blocks Lakshmi Narasimhan Sundararajan
  2019-08-12  4:41           ` lvmcache in writeback mode gets stuck flushing dirty blocks Lakshmi Narasimhan Sundararajan
2019-07-30  8:21 ` lvmcache in writeback mode gets stuck flushing dirty blocks Zdenek Kabelac
  2019-07-30  9:23   ` lvmcache in writeback mode gets stuck flushing dirty blocks Lakshmi Narasimhan Sundararajan
2019-07-30 11:32     ` Zdenek Kabelac
