* debug parallel root checks
@ 2018-03-09 14:47 Ruediger Meier
  2018-03-09 15:06 ` Ruediger Meier
  0 siblings, 1 reply; 6+ messages in thread
From: Ruediger Meier @ 2018-03-09 14:47 UTC (permalink / raw)
  To: util-linux

Hi,

Our parallel root checks look nice at first glance. On closer
inspection they fail the stress test, at least on my system.

Mostly, unmounting fails with "device in use" and similar errors. It
seems to me that "udevadm settle" sometimes has no effect. Very often
the label, utab, context and recursive mount tests cause problems. I've
made sure that udisks and similar devils are not running.

You can use my script snippet below to run the tests in a loop until
they fail. Removing the "sort" from run.sh produces more failures:

patch tests/run.sh
-----
 printf "%s\n" ${comps[*]} |
-       sort |
        xargs -I '{}' -P $paraller_jobs -n 1 bash -c "'{}' \"$OPTS\" ||
                echo 1 >> $top_builddir/tests/failures"
-----



#### ultest.bash ###
## Attention, using sudo

i=0
while true; do
    echo -e "\n#### run test loop: $((++i)) ####\n"

    ## select any interesting subset of all tests and shuffle
    mytests=$(git grep -l "^ts_skip_nonroot$" -- tests/ts/*/ \
                | sort --random-sort | sed "s@tests/ts/@@")
    test -n "$mytests" || { echo "empty tests!?"; break; }

    time sudo ./tests/run.sh --parallel=30 --exclude="some-bad-tests" $mytests \
        || break
done

echo -e "\n#### bad run: $i ####\n"

###############


cu,
Rudi


* Re: debug parallel root checks
  2018-03-09 14:47 debug parallel root checks Ruediger Meier
@ 2018-03-09 15:06 ` Ruediger Meier
  2018-03-09 21:46   ` Karel Zak
  2018-03-09 21:48   ` Karel Zak
  0 siblings, 2 replies; 6+ messages in thread
From: Ruediger Meier @ 2018-03-09 15:06 UTC (permalink / raw)
  To: util-linux

On Friday 09 March 2018, Ruediger Meier wrote:
> Hi,
>
> Our parallel root checks look already nice on the first view.
> On the second view they fail the stress test, at least on my
> system.

Just an arbitrary example. Sometimes I get this failure:

$ cat  tests/diff/mount/uuid
--- /tmp/ul2/tests/expected/mount/uuid  2018-03-09 11:04:54.992654305 +0100
+++ /tmp/ul2/tests/output/mount/uuid    2018-03-09 15:55:00.168028810 +0100
@@ -1 +1,2 @@
-Success
+mount: /tmp/ul2/tests/output/mount/uuid-mnt: can't find UUID="f102445a-f6f3-4657-bc58-81ff164fc0d9".
+A) Cannot find /dev/loop10 in /proc/mounts


So mount can't find that UUID although it was found before by
ts_uuid_by_devname, i.e. by blkid(1).

What could be the problem:
   - another test thread removed the UUID
   - invalid blkid cache
   - a bug?


cu,
Rudi


* Re: debug parallel root checks
  2018-03-09 15:06 ` Ruediger Meier
@ 2018-03-09 21:46   ` Karel Zak
  2018-03-09 22:16     ` Ruediger Meier
  2018-03-09 21:48   ` Karel Zak
  1 sibling, 1 reply; 6+ messages in thread
From: Karel Zak @ 2018-03-09 21:46 UTC (permalink / raw)
  To: Ruediger Meier; +Cc: util-linux

On Fri, Mar 09, 2018 at 04:06:08PM +0100, Ruediger Meier wrote:
> On Friday 09 March 2018, Ruediger Meier wrote:
> > Hi,
> >
> > Our parallel root checks look already nice on the first view.
> > On the second view they fail the stress test, at least on my
> > system.
> 
> Just an arbitrary example. Sometimes I get this failure
> 
> $ cat  tests/diff/mount/uuid
> --- /tmp/ul2/tests/expected/mount/uuid  2018-03-09 11:04:54.992654305 +0100
> +++ /tmp/ul2/tests/output/mount/uuid    2018-03-09 15:55:00.168028810 +0100
> @@ -1 +1,2 @@
> -Success
> +mount: /tmp/ul2/tests/output/mount/uuid-mnt: can't find UUID="f102445a-f6f3-4657-bc58-81ff164fc0d9".
> +A) Cannot find /dev/loop10 in /proc/mounts
> 
> 
> So mount can't find that UUID although it was found before by
> ts_uuid_by_devname, i.e. by blkid(1).
> 
> What could be the problem:
>    - another test thread removed the UUID
>    - invalid blkid cache
>    - a bug?

We do not use the blkid cache; the lookup should be based on udev
symlinks. The symlinks are probably not ready yet. I guess a sleep() or
similar would fix the problem.
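
Instead of a fixed sleep, a test could poll for the symlink with a
timeout. The following is only a minimal sketch of that idea; the helper
name wait_for_path and its arguments are hypothetical and not part of
the test suite.

```bash
#!/bin/bash
# Hypothetical helper: poll until a path exists or the timeout expires.
# Arguments: path, timeout in seconds (default 5).
wait_for_path() {
    local path=$1 timeout=${2:-5} waited=0
    while [ ! -e "$path" ]; do
        # Each loop iteration sleeps 0.1s, so allow timeout * 10 rounds.
        [ "$waited" -ge "$((timeout * 10))" ] && return 1
        sleep 0.1
        waited=$((waited + 1))
    done
    return 0
}

# Possible use before mounting by UUID ($uuid as reported by blkid):
# wait_for_path "/dev/disk/by-uuid/$uuid" 10 || echo "symlink never appeared"
```

This only narrows the race window; it does not explain why the symlink
is late in the first place.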

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com


* Re: debug parallel root checks
  2018-03-09 15:06 ` Ruediger Meier
  2018-03-09 21:46   ` Karel Zak
@ 2018-03-09 21:48   ` Karel Zak
  2018-03-09 22:28     ` Ruediger Meier
  1 sibling, 1 reply; 6+ messages in thread
From: Karel Zak @ 2018-03-09 21:48 UTC (permalink / raw)
  To: Ruediger Meier; +Cc: util-linux

On Fri, Mar 09, 2018 at 04:06:08PM +0100, Ruediger Meier wrote:
> On Friday 09 March 2018, Ruediger Meier wrote:
> > Hi,
> >
> > Our parallel root checks look already nice on the first view.
> > On the second view they fail the stress test, at least on my
> > system.
> 
> Just an arbitrary example. Sometimes I get this failure
> 
> $ cat  tests/diff/mount/uuid
> --- /tmp/ul2/tests/expected/mount/uuid  2018-03-09 11:04:54.992654305 +0100
> +++ /tmp/ul2/tests/output/mount/uuid    2018-03-09 15:55:00.168028810 +0100
> @@ -1 +1,2 @@
> -Success
> +mount: /tmp/ul2/tests/output/mount/uuid-mnt: can't find UUID="f102445a-f6f3-4657-bc58-81ff164fc0d9".
> +A) Cannot find /dev/loop10 in /proc/mounts
> 
> 
> So mount can't find that UUID although it was found before by
> ts_uuid_by_devname, i.e. by blkid(1).
> 
> What could be the problem:
>    - another test thread removed the UUID
>    - invalid blkid cache
>    - a bug?


Also note that I do not use --parallel for testing before a release. It
still seems too fragile for serious functional testing.

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com


* Re: debug parallel root checks
  2018-03-09 21:46   ` Karel Zak
@ 2018-03-09 22:16     ` Ruediger Meier
  0 siblings, 0 replies; 6+ messages in thread
From: Ruediger Meier @ 2018-03-09 22:16 UTC (permalink / raw)
  To: Karel Zak; +Cc: util-linux

On Friday 09 March 2018, Karel Zak wrote:
> On Fri, Mar 09, 2018 at 04:06:08PM +0100, Ruediger Meier wrote:
> > On Friday 09 March 2018, Ruediger Meier wrote:
> > > Hi,
> > >
> > > Our parallel root checks look already nice on the first view.
> > > On the second view they fail the stress test, at least on my
> > > system.
> >
> > Just an arbitrary example. Sometimes I get this failure
> >
> > $ cat  tests/diff/mount/uuid
> > --- /tmp/ul2/tests/expected/mount/uuid  2018-03-09 11:04:54.992654305 +0100
> > +++ /tmp/ul2/tests/output/mount/uuid    2018-03-09 15:55:00.168028810 +0100
> > @@ -1 +1,2 @@
> > -Success
> > +mount: /tmp/ul2/tests/output/mount/uuid-mnt: can't find UUID="f102445a-f6f3-4657-bc58-81ff164fc0d9".
> > +A) Cannot find /dev/loop10 in /proc/mounts
> >
> >
> > So mount can't find that UUID although it was found before by
> > ts_uuid_by_devname, i.e. by blkid(1).
> >
> > What could be the problem:
> >    - another test thread removed the UUID
> >    - invalid blkid cache
> >    - a bug?
>
> We do not use blkid cache, it should be based on udev symmlinks. The
> symlinks are not ready yet. I guess sleep() or so fixed the problem.

I've now found a really stupid bug; fixing it already resolves a lot.

-----------
 tests: fix grep expressions for devices

ts_is_mounted "/dev/loop1" returned true if /dev/loop17 was
mounted. A very annoying source of sporadic failures since
many years. This issue became more visible since running the
checks in parallel, which increases the probability to get
bigger loop device numbers.

https://github.com/karelzak/util-linux/pull/594/commits/5a33a0cdb069b3e86c044eb4f80d0c2104c558b2
-----------

I can't believe I didn't notice this bug years ago ... Now I can run
that test loop endlessly and collect all tests that are still
unstable.
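
For illustration, the prefix-match problem can be reproduced without any
real mounts. This is only a minimal sketch: the sample table and the
helper names is_mounted_loose and is_mounted_strict are made up here, it
looks at the device field only, and the actual change is the one in the
commit linked above.

```bash
#!/bin/bash
# Made-up stand-in for /proc/mounts; only /dev/loop17 is mounted.
mounts='/dev/loop17 /mnt/a ext4 rw 0 0'

# Hypothetical loose check: a plain substring match on the whole line.
is_mounted_loose() {
    printf '%s\n' "$mounts" | grep -q "$1"
}

# Hypothetical strict check: compare the complete first field.
is_mounted_strict() {
    printf '%s\n' "$mounts" |
        awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }'
}

is_mounted_loose  /dev/loop1  && echo "loose: /dev/loop1 looks mounted (wrong)"
is_mounted_strict /dev/loop1  || echo "strict: /dev/loop1 is not mounted"
is_mounted_strict /dev/loop17 && echo "strict: /dev/loop17 is mounted"
```

The loose variant matches because "/dev/loop1" is a prefix of
"/dev/loop17"; comparing the whole field avoids that.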

cu,
Rudi


* Re: debug parallel root checks
  2018-03-09 21:48   ` Karel Zak
@ 2018-03-09 22:28     ` Ruediger Meier
  0 siblings, 0 replies; 6+ messages in thread
From: Ruediger Meier @ 2018-03-09 22:28 UTC (permalink / raw)
  To: Karel Zak; +Cc: util-linux

On Friday 09 March 2018, Karel Zak wrote:
> On Fri, Mar 09, 2018 at 04:06:08PM +0100, Ruediger Meier wrote:
> > On Friday 09 March 2018, Ruediger Meier wrote:
> > > Hi,
> > >
> > > Our parallel root checks look already nice on the first view.
> > > On the second view they fail the stress test, at least on my
> > > system.
> >
> > Just an arbitrary example. Sometimes I get this failure
> >
> > $ cat  tests/diff/mount/uuid
> > --- /tmp/ul2/tests/expected/mount/uuid  2018-03-09 11:04:54.992654305 +0100
> > +++ /tmp/ul2/tests/output/mount/uuid    2018-03-09 15:55:00.168028810 +0100
> > @@ -1 +1,2 @@
> > -Success
> > +mount: /tmp/ul2/tests/output/mount/uuid-mnt: can't find UUID="f102445a-f6f3-4657-bc58-81ff164fc0d9".
> > +A) Cannot find /dev/loop10 in /proc/mounts
> >
> >
> > So mount can't find that UUID although it was found before by
> > ts_uuid_by_devname, i.e. by blkid(1).
> >
> > What could be the problem:
> >    - another test thread removed the UUID
> >    - invalid blkid cache
> >    - a bug?
>
> And note that I do not use --parallel as test case before a release.
> It seems still too fragile to provide serious functional tests.

Of course, but it has already helped me understand our test suite
better and uncovered some bugs that could also occur
without --parallel.

I'm still sceptical about our strategy of using multiple locks
(deadlocks!?). Also, the speed improvement is not significant compared
to just running two separate "xargs threads", one for root and one for
non-root. Anyway, it's interesting to play around with the locks, and
maybe one day we could even avoid the big scsi_debug lock by re-using
the module instead of running modprobe/rmmod each time.
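
One common way to rule out deadlocks when a test needs several locks is
to always take them in the same order and with a timeout. The following
is only a sketch of that general technique, not what our test suite
does; lock_all, the lock directory and the lock names are hypothetical.
It needs bash >= 4.1 and flock(1).

```bash
#!/bin/bash
# Hypothetical lock directory for this sketch.
lockdir=$(mktemp -d)

# Hypothetical helper: take all named locks in sorted order, so that two
# callers needing the same set never hold them in opposite order.
lock_all() {
    local name fd
    for name in $(printf '%s\n' "$@" | sort -u); do
        # Open the lock file on a new file descriptor stored in $fd.
        exec {fd}>"$lockdir/$name.lock"
        # -w 10: give up after 10 seconds instead of blocking forever.
        flock -w 10 "$fd" || { echo "timeout on lock: $name" >&2; return 1; }
        echo "locked $name"
    done
}

# The locks are released when the calling (sub)shell exits, e.g.:
# ( lock_all scsi_debug loop && run_the_test )
```

With a fixed order, "lock_all scsi_debug loop" and "lock_all loop
scsi_debug" both take "loop" first.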

cu,
Rudi




