All of lore.kernel.org
 help / color / mirror / Atom feed
* [whacky issue] xfs/277 endlessly looping in _require_xfs_io_command
@ 2017-10-11  8:00 Dave Chinner
  2017-10-11  8:13 ` Dave Chinner
  2017-10-11 10:39 ` Brian Foster
  0 siblings, 2 replies; 5+ messages in thread
From: Dave Chinner @ 2017-10-11  8:00 UTC (permalink / raw)
  To: fstests

HI folks,

I was wondering if anyone else had seen this problem, because it's
got me absolutely stumped. One of my test VMs is having a really
weird livelock in xfs/277. It's getting stuck in an endless loop
burning the entire CPU in a the 277 process (i.e. running bash).
What it is stuck on makes no sense to me, nor does the looping
behaviour, and I can only reproduce it on this one machine.

The code in question:

        "fsmap" )
                testio=`$XFS_IO_PROG -f -c "fsmap" $testfile 2>&1`
                echo $testio | grep -q "Inappropriate ioctl" && \
                        _notrun "xfs_io $command support is missing"

Is pretty simple and obvious - not a lot to go wrong. set -x
shows the last command in the output file to be the fsmap command.
$test_io has about 5000 lines of output in it.

I did some testing to isolate the problem. This exits having
executed the fsmap command just fine:


        "fsmap" )
                testio=`$XFS_IO_PROG -f -c "fsmap" $testfile 2>&1`
                exit

But this never exits and it starts burning down teh CPU:

        "fsmap" )
                testio=`$XFS_IO_PROG -f -c "fsmap" $testfile 2>&1`
                echo $testio
                exit

Yeah, echoing the output of the fsmap command seems to cause bash to
enter an endless loop of some kind. Well, it's not endless, because
every 30s or so the process dies and a new child process runs the
same loop again. Attaching strace to one of these processes:

......
open(".", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFDIR|0755, st_size=139264, ...}) = 0
getdents(3, /* 1024 entries */, 32768)  = 32744
getdents(3, /* 1023 entries */, 32768)  = 32752
getdents(3, /* 1023 entries */, 32768)  = 32744
getdents(3, /* 1021 entries */, 32768)  = 32768
getdents(3, /* 1022 entries */, 32768)  = 32744
getdents(3, /* 1022 entries */, 32768)  = 32768
getdents(3, /* 401 entries */, 32768)   = 12840
getdents(3, /* 0 entries */, 32768)     = 0
close(3)                                = 0
open(".", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFDIR|0755, st_size=139264, ...}) = 0
getdents(3, /* 1024 entries */, 32768)  = 32744
getdents(3, /* 1023 entries */, 32768)  = 32752
getdents(3, /* 1023 entries */, 32768)  = 32744
getdents(3, /* 1021 entries */, 32768)  = 32768
getdents(3, /* 1022 entries */, 32768)  = 32744
getdents(3, /* 1022 entries */, 32768)  = 32768
getdents(3, /* 401 entries */, 32768)   = 12840
getdents(3, /* 0 entries */, 32768)     = 0
close(3)                                = 0
open(".", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFDIR|0755, st_size=139264, ...}) = 0
getdents(3, /* 1024 entries */, 32768)  = 32744
getdents(3, /* 1023 entries */, 32768)  = 32752
getdents(3, /* 1023 entries */, 32768)  = 32744
getdents(3, /* 1021 entries */, 32768)  = 32768
getdents(3, /* 1022 entries */, 32768)  = 32744
getdents(3, /* 1022 entries */, 32768)  = 32768
getdents(3, /* 401 entries */, 32768)   = 12840
getdents(3, /* 0 entries */, 32768)     = 0
close(3)                                = 0
.....

bash is running around in a tight loop running readdir() on some
unknown directory over and over again. I can't work it out - this is
the only machine that does it, and it I can't reproduce it outside
of running xfs/277 from xfstests...

I'm outta ideas - I've got no idea what the hell is going wrong
here. Anyone got any ideas?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [whacky issue] xfs/277 endlessly looping in _require_xfs_io_command
  2017-10-11  8:00 [whacky issue] xfs/277 endlessly looping in _require_xfs_io_command Dave Chinner
@ 2017-10-11  8:13 ` Dave Chinner
  2017-10-11 12:20   ` Dave Chinner
  2017-10-11 10:39 ` Brian Foster
  1 sibling, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2017-10-11  8:13 UTC (permalink / raw)
  To: fstests

On Wed, Oct 11, 2017 at 07:00:47PM +1100, Dave Chinner wrote:
> HI folks,
> 
> I was wondering if anyone else had seen this problem, because it's
> got me absolutely stumped. One of my test VMs is having a really
> weird livelock in xfs/277. It's getting stuck in an endless loop
> burning the entire CPU in a the 277 process (i.e. running bash).
> What it is stuck on makes no sense to me, nor does the looping
> behaviour, and I can only reproduce it on this one machine.

FWIW, the trigger for this was enabling rmap+reflink on the test
device on this machine for the first time. Hence this is the first
time the test has been run on that machine...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [whacky issue] xfs/277 endlessly looping in _require_xfs_io_command
  2017-10-11  8:00 [whacky issue] xfs/277 endlessly looping in _require_xfs_io_command Dave Chinner
  2017-10-11  8:13 ` Dave Chinner
@ 2017-10-11 10:39 ` Brian Foster
  2017-10-11 10:48   ` Dave Chinner
  1 sibling, 1 reply; 5+ messages in thread
From: Brian Foster @ 2017-10-11 10:39 UTC (permalink / raw)
  To: Dave Chinner; +Cc: fstests

On Wed, Oct 11, 2017 at 07:00:47PM +1100, Dave Chinner wrote:
> HI folks,
> 
> I was wondering if anyone else had seen this problem, because it's
> got me absolutely stumped. One of my test VMs is having a really
> weird livelock in xfs/277. It's getting stuck in an endless loop
> burning the entire CPU in a the 277 process (i.e. running bash).
> What it is stuck on makes no sense to me, nor does the looping
> behaviour, and I can only reproduce it on this one machine.
> 
> The code in question:
> 
>         "fsmap" )
>                 testio=`$XFS_IO_PROG -f -c "fsmap" $testfile 2>&1`
>                 echo $testio | grep -q "Inappropriate ioctl" && \
>                         _notrun "xfs_io $command support is missing"
> 

Oh, so it looks like you're in _require_xfs_io_command(). I was hunting
around for this code in xfs/277. :P

> Is pretty simple and obvious - not a lot to go wrong. set -x
> shows the last command in the output file to be the fsmap command.
> $test_io has about 5000 lines of output in it.
> 
> I did some testing to isolate the problem. This exits having
> executed the fsmap command just fine:
> 
> 
>         "fsmap" )
>                 testio=`$XFS_IO_PROG -f -c "fsmap" $testfile 2>&1`
>                 exit
> 
> But this never exits and it starts burning down teh CPU:
> 
>         "fsmap" )
>                 testio=`$XFS_IO_PROG -f -c "fsmap" $testfile 2>&1`
>                 echo $testio
>                 exit
> 
> Yeah, echoing the output of the fsmap command seems to cause bash to
> enter an endless loop of some kind. Well, it's not endless, because
> every 30s or so the process dies and a new child process runs the
> same loop again. Attaching strace to one of these processes:
> 
...
> 
> bash is running around in a tight loop running readdir() on some
> unknown directory over and over again. I can't work it out - this is
> the only machine that does it, and it I can't reproduce it outside
> of running xfs/277 from xfstests...
> 
> I'm outta ideas - I've got no idea what the hell is going wrong
> here. Anyone got any ideas?
> 

Heh, that sounds pretty strange. Does your test dev have a pre-existing
$testfile? Have you tried to execute the associated code in an
independent bash script? E.g., I'd probably try to do something like
mount your test dev then run:

	XFS_IO_PROG=...
	testfile=...

	testio=`$XFS_IO_PROG -f -c "fsmap" $testfile 2>&1`
	echo $testio

... from a separate script using hardcoded values from the xfstests
environment, and see what happens..?

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [whacky issue] xfs/277 endlessly looping in _require_xfs_io_command
  2017-10-11 10:39 ` Brian Foster
@ 2017-10-11 10:48   ` Dave Chinner
  0 siblings, 0 replies; 5+ messages in thread
From: Dave Chinner @ 2017-10-11 10:48 UTC (permalink / raw)
  To: Brian Foster; +Cc: fstests

On Wed, Oct 11, 2017 at 06:39:37AM -0400, Brian Foster wrote:
> On Wed, Oct 11, 2017 at 07:00:47PM +1100, Dave Chinner wrote:
> > HI folks,
> > 
> > I was wondering if anyone else had seen this problem, because it's
> > got me absolutely stumped. One of my test VMs is having a really
> > weird livelock in xfs/277. It's getting stuck in an endless loop
> > burning the entire CPU in a the 277 process (i.e. running bash).
> > What it is stuck on makes no sense to me, nor does the looping
> > behaviour, and I can only reproduce it on this one machine.
> > 
> > The code in question:
> > 
> >         "fsmap" )
> >                 testio=`$XFS_IO_PROG -f -c "fsmap" $testfile 2>&1`
> >                 echo $testio | grep -q "Inappropriate ioctl" && \
> >                         _notrun "xfs_io $command support is missing"
> > 
> 
> Oh, so it looks like you're in _require_xfs_io_command(). I was hunting
> around for this code in xfs/277. :P

Oh, sorry, I forgot to paste the function name.

> 
> > Is pretty simple and obvious - not a lot to go wrong. set -x
> > shows the last command in the output file to be the fsmap command.
> > $test_io has about 5000 lines of output in it.
> > 
> > I did some testing to isolate the problem. This exits having
> > executed the fsmap command just fine:
> > 
> > 
> >         "fsmap" )
> >                 testio=`$XFS_IO_PROG -f -c "fsmap" $testfile 2>&1`
> >                 exit
> > 
> > But this never exits and it starts burning down teh CPU:
> > 
> >         "fsmap" )
> >                 testio=`$XFS_IO_PROG -f -c "fsmap" $testfile 2>&1`
> >                 echo $testio
> >                 exit
> > 
> > Yeah, echoing the output of the fsmap command seems to cause bash to
> > enter an endless loop of some kind. Well, it's not endless, because
> > every 30s or so the process dies and a new child process runs the
> > same loop again. Attaching strace to one of these processes:
> > 
> ...
> > 
> > bash is running around in a tight loop running readdir() on some
> > unknown directory over and over again. I can't work it out - this is
> > the only machine that does it, and it I can't reproduce it outside
> > of running xfs/277 from xfstests...
> > 
> > I'm outta ideas - I've got no idea what the hell is going wrong
> > here. Anyone got any ideas?
> > 
> 
> Heh, that sounds pretty strange. Does your test dev have a pre-existing
> $testfile?

Nope, the test file is "$pid.xfs_io". I've got several of them in
$TEST_DIR from killing tests that hung.

> Have you tried to execute the associated code in an
> independent bash script?

Yes. Doesn't fail from the command line, with unique new files or
the files created from _require_xfs_io_command.

> E.g., I'd probably try to do something like
> mount your test dev then run:
> 
> 	XFS_IO_PROG=...
> 	testfile=...
> 
> 	testio=`$XFS_IO_PROG -f -c "fsmap" $testfile 2>&1`
> 	echo $testio
> 
> ... from a separate script using hardcoded values from the xfstests
> environment, and see what happens..?

... and it doesn't fail from a #!/bin/bash script.

I can only reproduce it from running xfs/277, and only on this
machine. That's what's got me stumped.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [whacky issue] xfs/277 endlessly looping in _require_xfs_io_command
  2017-10-11  8:13 ` Dave Chinner
@ 2017-10-11 12:20   ` Dave Chinner
  0 siblings, 0 replies; 5+ messages in thread
From: Dave Chinner @ 2017-10-11 12:20 UTC (permalink / raw)
  To: fstests

On Wed, Oct 11, 2017 at 07:13:30PM +1100, Dave Chinner wrote:
> On Wed, Oct 11, 2017 at 07:00:47PM +1100, Dave Chinner wrote:
> > HI folks,
> > 
> > I was wondering if anyone else had seen this problem, because it's
> > got me absolutely stumped. One of my test VMs is having a really
> > weird livelock in xfs/277. It's getting stuck in an endless loop
> > burning the entire CPU in a the 277 process (i.e. running bash).
> > What it is stuck on makes no sense to me, nor does the looping
> > behaviour, and I can only reproduce it on this one machine.
> 
> FWIW, the trigger for this was enabling rmap+reflink on the test
> device on this machine for the first time. Hence this is the first
> time the test has been run on that machine...

TL;DR: this is a result of bugs in the xfstests code. "echo $var"
results in $var being evaluated for glob/regex matches with
files in the current working directory.


Longer: the fsmap
output contains test like [1234..5678] on every line, and there were
5,000 lines of output being generated with about 100k of characters
inside [] brackets. If you run 'echo [1234]', it
does not output '[1234]' to the console - it treats [1234] as a glob
and goes searching the current working directory for matches. That's
where the readdir output comes from.

So:

$ echo foo [1234..5768] bar
foo 1 bar
$ ls -l |grep " 1$"
-rw-r--r-- 1 dave dave         35 Apr 14  2010 1
$

The "1" is output because it matches a file in the local directory.

and if I strace the echo command. I see:

open(".", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFDIR|0755, st_size=28672, ...}) = 0
getdents(3, /* 1109 entries */, 32768)  = 31656
getdents(3, /* 0 entries */, 32768)     = 0
close(3) = 0
write(1, "foo 1 bar\n", 10)             = 10

So, that's where the readdir comes from. Doesn't explain the
slowness, or why it doesn't reproduce in a interactive shell or on
any other machine.


So, look at what directory it is running in ($here, the root of the
xfstests installation) and now add other xfstests bugs that havei
resulted in ~6500 test files being dumped in $here.

(generic/109 is one of the culprits)

Now we have ~100k characters being matched against ~6500 filenames.
There's the slowness.  I simply wasn't patient enough to wait for
it.

One `git clean -f -d` later, and the test now runs and completes in
4s.

So, what's the bug in xfstests? The usual bug in shell programs:i
missing variable quoting.

$ echo foo [1234..5768] bar 
foo 1 bar
$ echo "foo [1234..5768] bar"
foo [1234..5768] bar

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2017-10-11 12:20 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-11  8:00 [whacky issue] xfs/277 endlessly looping in _require_xfs_io_command Dave Chinner
2017-10-11  8:13 ` Dave Chinner
2017-10-11 12:20   ` Dave Chinner
2017-10-11 10:39 ` Brian Foster
2017-10-11 10:48   ` Dave Chinner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.