All of lore.kernel.org
 help / color / mirror / Atom feed
* Weird (seemingly flakey) p4 breakage in t9833
@ 2019-02-06 10:33 Johannes Schindelin
  2019-02-06 11:11 ` SZEDER Gábor
  0 siblings, 1 reply; 4+ messages in thread
From: Johannes Schindelin @ 2019-02-06 10:33 UTC (permalink / raw)
  To: Luke Diamand; +Cc: git

Hi Luke,

in a private Azure Pipeline (sorry...) I noticed an intermittent problem
in the p4 tests on osx-gcc.

I would point you to a public log, but the Azure Pipelines support *just*
made it to next, so I *just* set up a public one targeting anything else
than my `vsts-ci` branch, at
https://dev.azure.com/gitgitgadget/git/_build/index?definitionId=6. And
those builds do not show that problem, so it must be a flakey test.

But maybe you can spot anything familiar from the log?

-- snip --
[...]
++ P4TICKETS='/Users/vsts/agent/2.146.0/work/1/s/t/trash
directory.t9833-errors/cli/tickets'
++ P4USER=short_expiry_user
++ echo password
++ p4 login
Enter password: 
User short_expiry_user logged in.
Perforce db files in '.' will be created if missing...
++ cd '/Users/vsts/agent/2.146.0/work/1/s/t/trash
directory.t9833-errors/git'
++ git p4 sync
++ true
+++ time_in_seconds
+++ cd /
+++ /usr/bin/python -c 'import time; print(int(time.time()))'
++ test 1549411312 -gt 1549411605
++ sleep 1
Perforce db files in '.' will be created if missing...
failure accessing depot: perforce ticket expires in 1 seconds
Performing incremental import into refs/remotes/p4/master git branch
Depot paths: //depot/
error: last command exited with $?=1
++ true
+++ time_in_seconds
+++ cd /
+++ /usr/bin/python -c 'import time; print(int(time.time()))'
++ test 1549411314 -gt 1549411605
++ sleep 1
not ok 6 - git operation with expired ticket
#	
#		P4TICKETS="$cli/tickets" &&
#		P4USER=short_expiry_user &&
#		echo "password" | p4 login &&
#		(
#			cd "$git" &&
#			git p4 sync &&
#			sleep 5 &&
#			test_must_fail git p4 sync 2>errmsg &&
#			grep "failure accessing depot" errmsg
#		)
#	
-- snap --

BTW I find it very odd to see a `sleep 1` in the trace but not in the
snippet (there is only a `sleep 5` instead, which I fail to see in the
trace). Odd?

Ciao,
Johannes

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Weird (seemingly flakey) p4 breakage in t9833
  2019-02-06 10:33 Weird (seemingly flakey) p4 breakage in t9833 Johannes Schindelin
@ 2019-02-06 11:11 ` SZEDER Gábor
  2019-02-06 11:54   ` Luke Diamand
  2019-02-06 13:55   ` Johannes Schindelin
  0 siblings, 2 replies; 4+ messages in thread
From: SZEDER Gábor @ 2019-02-06 11:11 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Luke Diamand, git

On Wed, Feb 06, 2019 at 11:33:21AM +0100, Johannes Schindelin wrote:
> Hi Luke,
> 
> in a private Azure Pipeline (sorry...) I noticed an intermittent problem
> in the p4 tests on osx-gcc.
> 
> I would point you to a public log, but the Azure Pipelines support *just*
> made it to next, so I *just* set up a public one targeting anything else
> than my `vsts-ci` branch, at
> https://dev.azure.com/gitgitgadget/git/_build/index?definitionId=6. And
> those builds do not show that problem, so it must be a flakey test.
> 
> But maybe you can spot anything familiar from the log?
> 
> -- snip --
> [...]
> ++ P4TICKETS='/Users/vsts/agent/2.146.0/work/1/s/t/trash
> directory.t9833-errors/cli/tickets'
> ++ P4USER=short_expiry_user
> ++ echo password
> ++ p4 login
> Enter password: 
> User short_expiry_user logged in.
> Perforce db files in '.' will be created if missing...
> ++ cd '/Users/vsts/agent/2.146.0/work/1/s/t/trash
> directory.t9833-errors/git'
> ++ git p4 sync
> ++ true
> +++ time_in_seconds
> +++ cd /
> +++ /usr/bin/python -c 'import time; print(int(time.time()))'
> ++ test 1549411312 -gt 1549411605
> ++ sleep 1
> Perforce db files in '.' will be created if missing...
> failure accessing depot: perforce ticket expires in 1 seconds
> Performing incremental import into refs/remotes/p4/master git branch
> Depot paths: //depot/
> error: last command exited with $?=1
> ++ true
> +++ time_in_seconds
> +++ cd /
> +++ /usr/bin/python -c 'import time; print(int(time.time()))'
> ++ test 1549411314 -gt 1549411605
> ++ sleep 1
> not ok 6 - git operation with expired ticket
> #	
> #		P4TICKETS="$cli/tickets" &&
> #		P4USER=short_expiry_user &&
> #		echo "password" | p4 login &&
> #		(
> #			cd "$git" &&
> #			git p4 sync &&
> #			sleep 5 &&
> #			test_must_fail git p4 sync 2>errmsg &&
> #			grep "failure accessing depot" errmsg
> #		)
> #	
> -- snap --

I saw this on Travis CI a couple of times, and looked into it, though
I have no idea how p4 is supposed to work...  anyway, my theory is:

The previous test 'create group with short ticket expiry' creates a
"ticket" with 3 seconds expiration time, to be used in this failing
one.  This timeout might be just a bit too short when running the test
under high load, and it takes long enough to reach the first 'git p4
sync', so long that the timeout is (almost?) up, and then 'git p4'
errors out.

I'm not sure what the proper solution would be (assuming that my
theory is right, that is): increasing the ticket timeout to a "must be
surely long enough" value would require longer 'sleep' in this test,
which is not good.  I wonder why that failing 'git p4 sync' is
necessary in the first place, and whether it's really necessary to
test ticket expiration, but then again: I have no idea how p4 works.

On a related note, I think it would be better if these two tests were
squashed into one, so we would get the whole picture.

> BTW I find it very odd to see a `sleep 1` in the trace but not in the
> snippet (there is only a `sleep 5` instead, which I fail to see in the
> trace). Odd?

Intentional.  p4d seems to be prone to hang, to circumvent this
start_p4d() from 'lib-git-p4.sh' starts a watchdog process in the
background to kill it after a long-enough timeout is up.  These three
lines in the log:

> +++ /usr/bin/python -c 'import time; print(int(time.time()))'
> ++ test 1549411312 -gt 1549411605
> ++ sleep 1

come from that background process.

A couple of cleanup patches on top of your 'test_atexit' will
eventually get rid of it...  once I get around to clean them up :)

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Weird (seemingly flakey) p4 breakage in t9833
  2019-02-06 11:11 ` SZEDER Gábor
@ 2019-02-06 11:54   ` Luke Diamand
  2019-02-06 13:55   ` Johannes Schindelin
  1 sibling, 0 replies; 4+ messages in thread
From: Luke Diamand @ 2019-02-06 11:54 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Johannes Schindelin, Git Users

On Wed, 6 Feb 2019 at 11:11, SZEDER Gábor <szeder.dev@gmail.com> wrote:
>
> On Wed, Feb 06, 2019 at 11:33:21AM +0100, Johannes Schindelin wrote:
> > Hi Luke,
> >
> > in a private Azure Pipeline (sorry...) I noticed an intermittent problem
> > in the p4 tests on osx-gcc.
> >
> > I would point you to a public log, but the Azure Pipelines support *just*
> > made it to next, so I *just* set up a public one targeting anything else
> > than my `vsts-ci` branch, at
> > https://dev.azure.com/gitgitgadget/git/_build/index?definitionId=6. And
> > those builds do not show that problem, so it must be a flakey test.
> >
> > But maybe you can spot anything familiar from the log?

I think you're right - it's 'git operation with expired ticket'.

I wonder if we could make a faked-up Perforce client which returned
the correct error data - then it wouldn't need to wait at all.

Or we could just remove it (or make it optional based on an
environment variable).

Let me know which you think is best and I'll put together a patch.

Luke

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Weird (seemingly flakey) p4 breakage in t9833
  2019-02-06 11:11 ` SZEDER Gábor
  2019-02-06 11:54   ` Luke Diamand
@ 2019-02-06 13:55   ` Johannes Schindelin
  1 sibling, 0 replies; 4+ messages in thread
From: Johannes Schindelin @ 2019-02-06 13:55 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Luke Diamand, git

[-- Attachment #1: Type: text/plain, Size: 4060 bytes --]

Hi Gábor,

On Wed, 6 Feb 2019, SZEDER Gábor wrote:

> On Wed, Feb 06, 2019 at 11:33:21AM +0100, Johannes Schindelin wrote:
> 
> > in a private Azure Pipeline (sorry...) I noticed an intermittent problem
> > in the p4 tests on osx-gcc.
> > 
> > I would point you to a public log, but the Azure Pipelines support *just*
> > made it to next, so I *just* set up a public one targeting anything else
> > than my `vsts-ci` branch, at
> > https://dev.azure.com/gitgitgadget/git/_build/index?definitionId=6. And
> > those builds do not show that problem, so it must be a flakey test.
> > 
> > But maybe you can spot anything familiar from the log?
> > 
> > -- snip --
> > [...]
> > ++ P4TICKETS='/Users/vsts/agent/2.146.0/work/1/s/t/trash
> > directory.t9833-errors/cli/tickets'
> > ++ P4USER=short_expiry_user
> > ++ echo password
> > ++ p4 login
> > Enter password: 
> > User short_expiry_user logged in.
> > Perforce db files in '.' will be created if missing...
> > ++ cd '/Users/vsts/agent/2.146.0/work/1/s/t/trash
> > directory.t9833-errors/git'
> > ++ git p4 sync
> > ++ true
> > +++ time_in_seconds
> > +++ cd /
> > +++ /usr/bin/python -c 'import time; print(int(time.time()))'
> > ++ test 1549411312 -gt 1549411605
> > ++ sleep 1
> > Perforce db files in '.' will be created if missing...
> > failure accessing depot: perforce ticket expires in 1 seconds
> > Performing incremental import into refs/remotes/p4/master git branch
> > Depot paths: //depot/
> > error: last command exited with $?=1
> > ++ true
> > +++ time_in_seconds
> > +++ cd /
> > +++ /usr/bin/python -c 'import time; print(int(time.time()))'
> > ++ test 1549411314 -gt 1549411605
> > ++ sleep 1
> > not ok 6 - git operation with expired ticket
> > #	
> > #		P4TICKETS="$cli/tickets" &&
> > #		P4USER=short_expiry_user &&
> > #		echo "password" | p4 login &&
> > #		(
> > #			cd "$git" &&
> > #			git p4 sync &&
> > #			sleep 5 &&
> > #			test_must_fail git p4 sync 2>errmsg &&
> > #			grep "failure accessing depot" errmsg
> > #		)
> > #	
> > -- snap --
> 
> I saw this on Travis CI a couple of times, and looked into it, though
> I have no idea how p4 is supposed to work...  anyway, my theory is:
> 
> The previous test 'create group with short ticket expiry' creates a
> "ticket" with 3 seconds expiration time, to be used in this failing
> one.  This timeout might be just a bit too short when running the test
> under high load, and it takes long enough to reach the first 'git p4
> sync', so long that the timeout is (almost?) up, and then 'git p4'
> errors out.

That sounds very plausible to me, thank you for the analysis!

> I'm not sure what the proper solution would be (assuming that my
> theory is right, that is): increasing the ticket timeout to a "must be
> surely long enough" value would require longer 'sleep' in this test,
> which is not good.  I wonder why that failing 'git p4 sync' is
> necessary in the first place, and whether it's really necessary to
> test ticket expiration, but then again: I have no idea how p4 works.
> 
> On a related note, I think it would be better if these two tests were
> squashed into one, so we would get the whole picture.

Agreed.

> > BTW I find it very odd to see a `sleep 1` in the trace but not in the
> > snippet (there is only a `sleep 5` instead, which I fail to see in the
> > trace). Odd?
> 
> Intentional.  p4d seems to be prone to hang, to circumvent this
> start_p4d() from 'lib-git-p4.sh' starts a watchdog process in the
> background to kill it after a long-enough timeout is up.

What I found odd was that I did not see that `sleep 1` at all in the
trace, but now it makes sense.

> These three lines in the log:
> 
> > +++ /usr/bin/python -c 'import time; print(int(time.time()))'
> > ++ test 1549411312 -gt 1549411605
> > ++ sleep 1
> 
> come from that background process.
> 
> A couple of cleanup patches on top of your 'test_atexit' will
> eventually get rid of it...  once I get around to clean them up :)

Heh, so there *will* be something useful coming out of that aborted side
track? Great!

Thank you,
Dscho

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2019-02-06 13:56 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-06 10:33 Weird (seemingly flakey) p4 breakage in t9833 Johannes Schindelin
2019-02-06 11:11 ` SZEDER Gábor
2019-02-06 11:54   ` Luke Diamand
2019-02-06 13:55   ` Johannes Schindelin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.