linux-lvm.redhat.com archive mirror
 help / color / mirror / Atom feed
* [linux-lvm] It looks wrong for the timeout when lvm test running
@ 2019-12-09  9:20 Heming Zhao
  2019-12-09  9:40 ` Zdenek Kabelac
  0 siblings, 1 reply; 7+ messages in thread
From: Heming Zhao @ 2019-12-09  9:20 UTC (permalink / raw)
  To: linux-lvm

Hello List,

The lvm test default timeout (--timeout) default value is 180.
But I met below condition when running testcase (in virtual machine):
```
make check_local T=shell/snapshot-merge.sh
    ... ...
###       passed: [ndev-vanilla] shell/snapshot-merge.sh 665
    ... ...
```

Obviously, the timeout doesn't work normally.
If I change below code, the timeout work as expected. (in lib/brick-shelltest.h)
```
         if ( select( nfds, &set, NULL, NULL, &wait ) > 0 ) {
             silent_start = end; /* something happened */
             io.sync( false );
         }
```

And after change the code, some testcase will become timeout, which may pass before.


The whole patch as below:
```
 From bdbd569c2b9ab49d4a4246f203b1cd9f7049db97 Mon Sep 17 00:00:00 2001
From: Zhao Heming <heming.zhao@suse.com>
Date: Mon, 9 Dec 2019 17:09:17 +0800
Subject: [PATCH] test: fix timeout abnormal issue

---
  test/lib/brick-shelltest.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/test/lib/brick-shelltest.h b/test/lib/brick-shelltest.h
index b58f3d6843..12e08249f3 100644
--- a/test/lib/brick-shelltest.h
+++ b/test/lib/brick-shelltest.h
@@ -822,7 +822,7 @@ struct TestCase {
              }
          }
          if ( select( nfds, &set, NULL, NULL, &wait ) > 0 ) {
-            silent_start = end; /* something happened */
+            //silent_start = end; /* something happened */
              io.sync( false );
          }
  
-- 
2.24.0
```

Thanks.

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [linux-lvm] It looks wrong for the timeout when lvm test running
  2019-12-09  9:20 [linux-lvm] It looks wrong for the timeout when lvm test running Heming Zhao
@ 2019-12-09  9:40 ` Zdenek Kabelac
  2019-12-09 10:58   ` Heming Zhao
  0 siblings, 1 reply; 7+ messages in thread
From: Zdenek Kabelac @ 2019-12-09  9:40 UTC (permalink / raw)
  To: LVM general discussion and development, Heming Zhao

Dne 09. 12. 19 v 10:20 Heming Zhao napsal(a):
> Hello List,
> 
> The lvm test default timeout (--timeout) default value is 180.
> But I met below condition when running testcase (in virtual machine):
> ```
> make check_local T=shell/snapshot-merge.sh
>      ... ...
> ###       passed: [ndev-vanilla] shell/snapshot-merge.sh 665
>      ... ...
> ```
> 
> Obviously, the timeout doesn't work normally.
> If I change below code, the timeout work as expected. (in lib/brick-shelltest.h)
> ```
>           if ( select( nfds, &set, NULL, NULL, &wait ) > 0 ) {
>               silent_start = end; /* something happened */
>               io.sync( false );
>           }


Hi


Can you please provide (compressed) output of:

'make check_local T=snapshot-merge.sh VERBOSE=1 LVM_TEST_AUX_TRACE=1'

Thanks

Zdenek

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [linux-lvm] It looks wrong for the timeout when lvm test running
  2019-12-09  9:40 ` Zdenek Kabelac
@ 2019-12-09 10:58   ` Heming Zhao
  2019-12-09 11:36     ` Zdenek Kabelac
  0 siblings, 1 reply; 7+ messages in thread
From: Heming Zhao @ 2019-12-09 10:58 UTC (permalink / raw)
  To: Zdenek Kabelac, LVM general discussion and development

[-- Attachment #1: Type: text/plain, Size: 2698 bytes --]

Hello Zdenek,

Please check the compressed file in attachment.

my env:  suse tumbleweed, running in qemu virtual machine (4 vcpu 5g mem).
uname -a:
Linux tb-base40g 5.3.8-1-default #1 SMP Wed Oct 30 07:12:05 UTC 2019 (ea4c828) x86_64 x86_64 x86_64 GNU/Linu

lvm2 version:
# ./tools/lvm version
   LVM version:     2.03.08(2)-git (2019-11-30)
   Library version: 1.02.169-git (2019-11-30)
   Driver version:  4.40.0
   Configuration:   ./configure --host=x86_64-suse-linux-gnu --build=x86_64-suse-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/lib --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --disable-dependency-tracking --enable-dmeventd --enable-cmdlib --enable-udev_rules --enable-udev_sync --with-udev-prefix=/usr/ --enable-selinux --enable-pkgconfig --with-usrlibdir=/usr/lib64 --with-usrsbindir=/usr/sbin --with-default-dm-run-dir=/run --with-tmpfilesdir=/usr/lib/tmpfiles.d --with-thin=internal --with-device-gid=6 --with-device-mode=0640 --with-device-uid=0 --with-dmeventd-path=/usr/sbin/dmeventd --with-thin-check=/usr/sbin/thin_check --with-thin-dump=/usr/sbin/thin_dump --with-thin-repair=/usr/sbin/thin_repair --enable-blkid_wiping --enable-lvmpolld --enable-realtime --with-cache=internal --with-default-locking-dir=/run/lock/lvm --with-default-pid-dir=/run --with-default-run-dir=/run/lvm --enable-cmirrord --enable-fsadm --disable-silent-rules --enable-write_install --with-vdo=internal --with-vdo-format=/usr/bin/vdoformat --enable-debug --enable-dbus-service --with-writecache=internal


On 12/9/19 5:40 PM, Zdenek Kabelac wrote:
> Dne 09. 12. 19 v 10:20 Heming Zhao napsal(a):
>> Hello List,
>>
>> The lvm test default timeout (--timeout) default value is 180.
>> But I met below condition when running testcase (in virtual machine):
>> ```
>> make check_local T=shell/snapshot-merge.sh
>>      ... ...
>> ###       passed: [ndev-vanilla] shell/snapshot-merge.sh 665
>>      ... ...
>> ```
>>
>> Obviously, the timeout doesn't work normally.
>> If I change below code, the timeout work as expected. (in lib/brick-shelltest.h)
>> ```
>>           if ( select( nfds, &set, NULL, NULL, &wait ) > 0 ) {
>>               silent_start = end; /* something happened */
>>               io.sync( false );
>>           }
> 
> 
> Hi
> 
> 
> Can you please provide (compressed) output of:
> 
> 'make check_local T=snapshot-merge.sh VERBOSE=1 LVM_TEST_AUX_TRACE=1'
> 
> Thanks
> 
> Zdenek
> 
> 
> 
> 

[-- Attachment #2: snapshot-merge.sh.txt.tgz --]
[-- Type: application/x-compressed-tar, Size: 12430 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [linux-lvm] It looks wrong for the timeout when lvm test running
  2019-12-09 10:58   ` Heming Zhao
@ 2019-12-09 11:36     ` Zdenek Kabelac
  2019-12-10  3:23       ` Heming Zhao
  0 siblings, 1 reply; 7+ messages in thread
From: Zdenek Kabelac @ 2019-12-09 11:36 UTC (permalink / raw)
  To: Heming Zhao, LVM general discussion and development

Dne 09. 12. 19 v 11:58 Heming Zhao napsal(a):
> Hello Zdenek,
> 
> Please check the compressed file in attachment.
> 
>> 
> On 12/9/19 5:40 PM, Zdenek Kabelac wrote:
>> Dne 09. 12. 19 v 10:20 Heming Zhao napsal(a):
>>> Hello List,
>>>
>>> The lvm test default timeout (--timeout) default value is 180.
>>> But I met below condition when running testcase (in virtual machine):
>>> ```
>>> make check_local T=shell/snapshot-merge.sh
>>>  ���� ... ...
>>> ###������ passed: [ndev-vanilla] shell/snapshot-merge.sh 665
>>>  ���� ... ...
>>> ```


Hi


So your test seems to be very slow in 'mkfs.ext4':

[ 0:03] Creating journal (4096 blocks): done
[ 1:30] Writing superblocks and filesystem accounting information: done
....
[ 1:34] Creating journal (4096 blocks): done
[ 3:03] Writing superblocks and filesystem accounting information: done
....
[ 3:07] Creating journal (4096 blocks): done
[ 4:37] Writing superblocks and filesystem accounting information: done
....
[ 4:40] Creating journal (4096 blocks): done
[ 6:08] Writing superblocks and filesystem accounting information: done
....


This looks certainly seriously like a very slow (possibly doing TRIM) ??
Any idea why ?

The test seems to be using loop device created in backend file in your /tmp
(i.e.: losetup /dev/loop0 /tmp/LVMTEST11574.cJtNfnlPrj/test.img)

So any idea why your virtual machine is that slow with handling this case ?

One idea - aren't you using a storage with very slow discard processing
(i.e. ATM VDO) ?

You can setup different backend device or different location of test dir
(see 'make help' in test subdir)

Regards

Zdenek

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [linux-lvm] It looks wrong for the timeout when lvm test running
  2019-12-09 11:36     ` Zdenek Kabelac
@ 2019-12-10  3:23       ` Heming Zhao
  2019-12-10  9:03         ` Zdenek Kabelac
  0 siblings, 1 reply; 7+ messages in thread
From: Heming Zhao @ 2019-12-10  3:23 UTC (permalink / raw)
  To: Zdenek Kabelac, LVM general discussion and development

Hello Zdenek,

Thank you for your feedback.
When I switched to ramdisk backend devices, the time consuming is about 9s.

I raised this topic for there may have a bug in timeout related codes,
not for why the snapshot-merge.sh costs too much time.

The code in below overwrite silent_start when select() successfully return.
```
           if ( select( nfds, &set, NULL, NULL, &wait ) > 0 ) {
               silent_start = end; /* something happened */ <====== this line!
               io.sync( false );
           }
```

Thanks.

On 12/9/19 7:36 PM, Zdenek Kabelac wrote:
> Dne 09. 12. 19 v 11:58 Heming Zhao napsal(a):
>> Hello Zdenek,
>>
>> Please check the compressed file in attachment.
>>
>>>
>> On 12/9/19 5:40 PM, Zdenek Kabelac wrote:
>>> Dne 09. 12. 19 v 10:20 Heming Zhao napsal(a):
>>>> Hello List,
>>>>
>>>> The lvm test default timeout (--timeout) default value is 180.
>>>> But I met below condition when running testcase (in virtual machine):
>>>> ```
>>>> make check_local T=shell/snapshot-merge.sh
>>>> ����� ... ...
>>>> ###������ passed: [ndev-vanilla] shell/snapshot-merge.sh 665
>>>> ����� ... ...
>>>> ```
> 
> 
> Hi
> 
> 
> So your test seems to be very slow in 'mkfs.ext4':
> 
> [ 0:03] Creating journal (4096 blocks): done
> [ 1:30] Writing superblocks and filesystem accounting information: done
> ....
> [ 1:34] Creating journal (4096 blocks): done
> [ 3:03] Writing superblocks and filesystem accounting information: done
> ....
> [ 3:07] Creating journal (4096 blocks): done
> [ 4:37] Writing superblocks and filesystem accounting information: done
> ....
> [ 4:40] Creating journal (4096 blocks): done
> [ 6:08] Writing superblocks and filesystem accounting information: done
> ....
> 
> 
> This looks certainly seriously like a very slow (possibly doing TRIM) ??
> Any idea why ?
> 
> The test seems to be using loop device created in backend file in your /tmp
> (i.e.: losetup /dev/loop0 /tmp/LVMTEST11574.cJtNfnlPrj/test.img)
> 
> So any idea why your virtual machine is that slow with handling this case ?
> 
> One idea - aren't you using a storage with very slow discard processing
> (i.e. ATM VDO) ?
> 
> You can setup different backend device or different location of test dir
> (see 'make help' in test subdir)
> 
> Regards
> 
> Zdenek
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [linux-lvm] It looks wrong for the timeout when lvm test running
  2019-12-10  3:23       ` Heming Zhao
@ 2019-12-10  9:03         ` Zdenek Kabelac
  2019-12-10  9:11           ` Heming Zhao
  0 siblings, 1 reply; 7+ messages in thread
From: Zdenek Kabelac @ 2019-12-10  9:03 UTC (permalink / raw)
  To: Heming Zhao, LVM general discussion and development

Dne 10. 12. 19 v 4:23 Heming Zhao napsal(a):
> Hello Zdenek,
> 
> Thank you for your feedback.
> When I switched to ramdisk backend devices, the time consuming is about 9s.
> 
> I raised this topic for there may have a bug in timeout related codes,
> not for why the snapshot-merge.sh costs too much time.
> 
> The code in below overwrite silent_start when select() successfully return.
> ```
>             if ( select( nfds, &set, NULL, NULL, &wait ) > 0 ) {
>                 silent_start = end; /* something happened */ <====== this line!
>                 io.sync( false );
>             }
> ```

Hi

This is intentional - the idea of this 'timeout' is - that during the timeout 
period there is no progress on output (i.e. for 180s there was nothing printed 
- which likely means something got locked - there is also secondary limitation
on number of output lines - so if something is doing endless printing loop,
it's also catched before it fill all disk spaces on testing machine... - 
hopefully this explains  'something happened'  comment.

In general we don't have strict 'limit' for test length in time - although if 
test takes 10 minutes - it's likely not what was intended - but we don't have 
that rule on individual test time as speed often largely depends on machine 
performance.

Zdenek

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [linux-lvm] It looks wrong for the timeout when lvm test running
  2019-12-10  9:03         ` Zdenek Kabelac
@ 2019-12-10  9:11           ` Heming Zhao
  0 siblings, 0 replies; 7+ messages in thread
From: Heming Zhao @ 2019-12-10  9:11 UTC (permalink / raw)
  To: Zdenek Kabelac, LVM general discussion and development

Hello Zdenek,

I got your point, thank you for your explanation.


On 12/10/19 5:03 PM, Zdenek Kabelac wrote:
> Dne 10. 12. 19 v 4:23 Heming Zhao napsal(a):
>> Hello Zdenek,
>>
>> Thank you for your feedback.
>> When I switched to ramdisk backend devices, the time consuming is about 9s.
>>
>> I raised this topic for there may have a bug in timeout related codes,
>> not for why the snapshot-merge.sh costs too much time.
>>
>> The code in below overwrite silent_start when select() successfully return.
>> ```
>> ����������� if ( select( nfds, &set, NULL, NULL, &wait ) > 0 ) {
>> ��������������� silent_start = end; /* something happened */ <====== this line!
>> ��������������� io.sync( false );
>> ����������� }
>> ```
> 
> Hi
> 
> This is intentional - the idea of this 'timeout' is - that during the timeout period there is no progress on output (i.e. for 180s there was nothing printed - which likely means something got locked - there is also secondary limitation
> on number of output lines - so if something is doing endless printing loop,
> it's also catched before it fill all disk spaces on testing machine... - hopefully this explains� 'something happened'� comment.
> 
> In general we don't have strict 'limit' for test length in time - although if test takes 10 minutes - it's likely not what was intended - but we don't have that rule on individual test time as speed often largely depends on machine performance.
> 
> Zdenek
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-12-10  9:11 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-09  9:20 [linux-lvm] It looks wrong for the timeout when lvm test running Heming Zhao
2019-12-09  9:40 ` Zdenek Kabelac
2019-12-09 10:58   ` Heming Zhao
2019-12-09 11:36     ` Zdenek Kabelac
2019-12-10  3:23       ` Heming Zhao
2019-12-10  9:03         ` Zdenek Kabelac
2019-12-10  9:11           ` Heming Zhao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).