* pseudo interaction issue
From: Paul Eggleton @ 2012-02-17 17:35 UTC
To: yocto

Hi all,

I'm trying to extend buildhistory to write out the metadata revisions just
before it does the commit to the buildhistory repository, and I'm having some
pseudo-related trouble. The structure is a little unusual, in that the
execution flow is an event handler that calls a shell function (via
bb.build.exec_func()) and during parsing this function an ${@...} reference to
a python function is evaluated, which then calls os.popen(), at which point I
get the error "pseudo: You must set the PSEUDO_PREFIX environment variable to
run pseudo."

I don't need pseudo at this stage. I've tried setting PSEUDO_DISABLED=1 and
even PSEUDO_UNLOAD=1 just prior to the os.popen() call (or within it) and
despite evidence that pseudo is taking notice of these being set in other
contexts (when the function is called from elsewhere) even when doing this I
still get the error above. I could rearrange the structure to avoid this
execution flow however that would bar me from reusing existing code that we
have for getting the metadata revision.

Any suggestions?

Cheers,
Paul

-- 
Paul Eggleton
Intel Open Source Technology Centre

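For reference, a minimal Python sketch of the kind of call described in the message above: running a command with PSEUDO_DISABLED (or PSEUDO_UNLOAD) placed in the child's environment. The helper name run_with_env and the use of subprocess rather than os.popen() are assumptions made for illustration, not the buildhistory code. As the message notes, setting these variables did not avoid the error in the failing flow.

```python
import os
import subprocess
import sys


def run_with_env(cmd, overrides):
    """Run cmd with a copy of os.environ updated by overrides; return stdout.

    Hypothetical helper: the parent's os.environ is left untouched.
    """
    env = dict(os.environ)
    env.update(overrides)
    result = subprocess.run(cmd, env=env, stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Ask a child interpreter what it sees for PSEUDO_DISABLED.
    probe = [sys.executable, "-c",
             "import os; print(os.environ.get('PSEUDO_DISABLED'))"]
    print(run_with_env(probe, {"PSEUDO_DISABLED": "1"}).strip())
```

Passing an explicit env mapping keeps the override local to the one child, so nothing has to be restored in the parent afterwards.
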
* Re: pseudo interaction issue
From: Mark Hatle @ 2012-02-17 18:50 UTC
To: yocto; +Cc: Peter Seebach

We're looking into this issue. You should never get the "pseudo: You must set
the PSEUDO_PREFIX environment variable to run pseudo." message. This means
something appears to have avoided the wrappers.

I'll let you know once we figure out something.

--Mark

* Re: pseudo interaction issue
From: Xu, Dongxiao @ 2012-03-14 9:02 UTC
To: Mark Hatle; +Cc: yocto, Peter Seebach

Hi Mark,

When using the new Hob to build targets, I also observed the pseudo
output:

"pseudo: You must set the PSEUDO_PREFIX environment variable to run
pseudo."

Here are the steps to reproduce it:

1) source oe-init-build-env
2) hob
3) select machine and base image. Here I use qemux86 and
   core-image-minimal.
4) click "Just bake". For this first build, pseudo works OK.
5) after the build finishes, return to the image configuration page and
   click the "Just bake" button again. Then after the build starts, pseudo
   will print out the above message.

Thanks,
Dongxiao

* Re: pseudo interaction issue
From: Xu, Dongxiao @ 2012-03-22 1:49 UTC
To: Mark Hatle; +Cc: yocto, Peter Seebach

Hi Mark,

Any update on this one? I think we may need to track it in bugzilla.

Thanks,
Dongxiao

* Re: pseudo interaction issue
From: Peter Seebach @ 2012-03-22 16:18 UTC
To: Xu, Dongxiao; +Cc: yocto

On Thu, 22 Mar 2012 09:49:41 +0800
"Xu, Dongxiao" <dongxiao.xu@intel.com> wrote:

> Hi Mark,
>
> Any update on this one? I think we may need to track it in bugzilla.

I have been looking into this. I've convinced myself that popen() is
broken under pseudo, but that's not enough to explain this:

* I have a fixed pseudo where popen works. It still fails sometimes
  under hob.
* When it fails, the popen() wrapper isn't even getting called.
* Still looking into this.

Interestingly, I can't get this failure to occur at all outside of hob.

-s
-- 
Listen, get this. Nobody with a good compiler needs to be justified.

* Re: pseudo interaction issue
From: Xu, Dongxiao @ 2012-03-23 1:01 UTC
To: Peter Seebach; +Cc: yocto

On Thu, 2012-03-22 at 11:18 -0500, Peter Seebach wrote:
> On Thu, 22 Mar 2012 09:49:41 +0800
> "Xu, Dongxiao" <dongxiao.xu@intel.com> wrote:
>
> > Hi Mark,
> >
> > Any update on this one? I think we may need to track it in bugzilla.
>
> I have been looking into this. I've convinced myself that popen() is
> broken under pseudo, but that's not enough to explain this:
>
> * I have a fixed pseudo where popen works. It still fails sometimes
>   under hob.
> * When it fails, the popen() wrapper isn't even getting called.
> * Still looking into this.
>
> Interestingly, I can't get this failure to occur at all outside of hob.

I think the difference between Hob and other UIs (e.g., knotty) is that,
when building an image is finished in knotty, the UI, bitbake server, and
pseudo all quit. But in Hob, everything is still alive after a build. I
noticed that the pseudo error happens only when Hob is trying to issue a
second build.

Thanks,
Dongxiao

* Re: pseudo interaction issue
From: Peter Seebach @ 2012-03-23 2:29 UTC
To: Xu, Dongxiao; +Cc: yocto

On Fri, 23 Mar 2012 09:01:16 +0800
"Xu, Dongxiao" <dongxiao.xu@intel.com> wrote:

> I think the difference between Hob and other UIs (e.g., knotty) is
> that, when building an image is finished in knotty, the UI, bitbake
> server, and pseudo all quit. But in Hob, everything is still alive
> after a build. I noticed that the pseudo error happens only when Hob
> is trying to issue a second build.

I get it on the first build.

... You see the issue.

Pseudo shouldn't be having a problem, it's designed to restart as
needed.

Right now, what I know is:
1. I didn't catch popen(), and this can actually be an issue with
   stuff like PSEUDO_UNLOAD or PSEUDO_DISABLED in play.
2. If I wrap popen, and have the wrapper unconditionally emit a
   diagnostic, that works for simple os.popen() test cases.
3. But not for the case that's triggering this.

So it looks like, when this runs, we have a Python session which has
had pseudo unloaded, not just disabled, which then sets LD_PRELOAD but
doesn't set PSEUDO_PREFIX. Or something.

I'm still trying to get better data on this, like figure out how the
sub-process is even getting invoked.

-s

* Re: pseudo interaction issue
From: Xu, Dongxiao @ 2012-03-23 3:21 UTC
To: Peter Seebach; +Cc: yocto

On Thu, 2012-03-22 at 21:29 -0500, Peter Seebach wrote:
> On Fri, 23 Mar 2012 09:01:16 +0800
> "Xu, Dongxiao" <dongxiao.xu@intel.com> wrote:
>
> > I think the difference between Hob and other UIs (e.g., knotty) is
> > that, when building an image is finished in knotty, the UI, bitbake
> > server, and pseudo all quit. But in Hob, everything is still alive
> > after a build. I noticed that the pseudo error happens only when Hob
> > is trying to issue a second build.
>
> I get it on the first build.

What do you mean by first build? Did you click the "Just bake" button?

Actually the "Just bake" button is divided into two steps:
1) build_target(packages)
2) build_target(image)

I noticed the pseudo error happens when calling build_target(image).
Therefore I also treated it as the second build.

Thanks,
Dongxiao

* Re: pseudo interaction issue
From: Peter Seebach @ 2012-03-23 7:16 UTC
To: Xu, Dongxiao; +Cc: yocto

On Fri, 23 Mar 2012 11:21:26 +0800
"Xu, Dongxiao" <dongxiao.xu@intel.com> wrote:

> What do you mean by first build? Did you click the "Just bake" button?

Yes. The original reproducer I saw said that it worked the first time,
but failed the second time.

> 1) build_target(packages)
> 2) build_target(image)

> I noticed the pseudo error happens when calling build_target(image).
> Therefore I also treated it as the second build.

Interesting.

I am, currently, sort of stumped on this. The failures are coming from
the git log/git branch commands using os.popen(), and those calls
appear to be happening in a process:
1. Which did not have the pseudo library loaded, at all.
2. Which has LD_PRELOAD set to include pseudo.
3. Which has most of the PSEUDO_BINDIR, etc., values set.
4. Which does not have PSEUDO_PREFIX set.

I can't figure out how to get there. I can't reproduce this outside of
hob, yet. And obviously, if the pseudo library isn't loaded, I can't
add more debugging or logging to pseudo to fix it. (I infer that it's
not loaded because os.popen() isn't hitting diagnostic messages in
pseudo's new popen() wrapper, even if they're completely unconditional.)

I added an abort(), and got a core file.

The core file shows a backtrace in /bin/sh:

#0  0x00007f1232cd8a75 in raise () from /lib/libc.so.6
#1  0x00007f1232cdc5c0 in abort () from /lib/libc.so.6
#2  0x00007f1233497489 in pseudo_get_prefix (pathname=0x7f123349ac9d "<null>") at pseudo_util.c:1060
#3  0x00007f1233498244 in pseudo_setupenv () at pseudo_util.c:717
#4  0x00007f12334955dd in wrap_fork () at ports/common/guts/fork.c:14
#5  0x00007f1233495675 in fork () at ports/common/pseudo_wrappers.c:322
#6  0x000000000044339c in make_child ()
#7  0x0000000000436f04 in ?? ()
#8  0x0000000000433fa4 in execute_command_internal ()
#9  0x000000000043475e in execute_command_internal ()
#10 0x000000000047262a in parse_and_execute ()
#11 0x000000000041fd04 in ?? ()
#12 0x0000000000420e01 in main ()

I haven't found an obvious way to look at the arguments to main, but
I'm pretty sure it's one of the "cd REPO; git log..." commands.

So... Basically, I can't see how we can get here.

Currently leaning towards a theory that the issue may be that we only
set PSEUDO_PREFIX when we think we want pseudo, and it looks like in
the PSEUDO_DISABLED=1 case, it may be that we can end up with a task
being run in an environment where LD_PRELOAD had been set, but we've
cleaned up our environment, and the disabled path is skipping part of
the environment setup.

Still really weird to me that I can't reproduce this outside of hob.
I am pretty sure there exists a series of forks and execs and
environment changes such that this will end up happening.

-s

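The process state listed in the message above (pseudo named in LD_PRELOAD, other PSEUDO_* values present, PSEUDO_PREFIX missing) can be checked from Python with a few lines. This is only a hypothetical diagnostic: the function name describe_pseudo_env and the substring match on "pseudo" in LD_PRELOAD are assumptions, not anything from pseudo or bitbake.

```python
import os


def describe_pseudo_env(env):
    """Summarise pseudo-related settings in an environment mapping.

    Hypothetical diagnostic; matching "pseudo" in LD_PRELOAD by substring is
    an assumption about how the preload library is named.
    """
    preload = env.get("LD_PRELOAD", "")
    # LD_PRELOAD entries may be separated by spaces or colons.
    entries = preload.replace(":", " ").split()
    preloaded = any("pseudo" in entry for entry in entries)
    has_prefix = bool(env.get("PSEUDO_PREFIX"))
    other = sorted(key for key in env
                   if key.startswith("PSEUDO_") and key != "PSEUDO_PREFIX")
    return {
        "preloaded": preloaded,
        "has_prefix": has_prefix,
        "other_pseudo_vars": other,
        # The combination from the list above: preload requested, no prefix.
        "suspect": preloaded and not has_prefix,
    }


if __name__ == "__main__":
    print(describe_pseudo_env(os.environ))
```

Calling it right before the os.popen() in question would show whether the parent environment is already in the suspect combination at that point.
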
* Re: pseudo interaction issue
From: Paul Eggleton @ 2012-03-23 12:20 UTC
To: yocto; +Cc: Peter Seebach

On Friday 23 March 2012 02:16:35 Peter Seebach wrote:
> Still really weird to me that I can't reproduce this outside of hob.
> I am pretty sure there exists a series of forks and execs and
> environment changes such that this will end up happening.

I now have a fairly simple test case outside of hob. Put the attached file in
meta/classes/ and then add the following to your local.conf:

INHERIT += "breakit"

Then, just run something that will actually execute a real task. If bzip2 has
already been built you can trigger it just with this, which doesn't take very
long:

bitbake -c package -f bzip2

This should give you a stream of "pseudo: You must set the PSEUDO_PREFIX
environment variable to run pseudo." after the task summary.

Cheers,
Paul

[-- Attachment #2: breakit.bbclass --]
[-- Type: text/plain, Size: 411 bytes --]

breakit_break() {
    cat > ${TMPDIR}/test123 <<END
${@breakit_output(d)}
END
}

def breakit_output(d):
    return '\n'.join(get_layers_branch_rev(d))

python breakit_eventhandler() {
    import bb.event
    import bb.build

    if isinstance(e, bb.event.BuildCompleted):
        bb.note("possible breakage coming up...")
        bb.build.exec_func("breakit_break", e.data)
}

addhandler breakit_eventhandler

* Re: pseudo interaction issue
From: Peter Seebach @ 2012-03-23 20:06 UTC
To: Paul Eggleton; +Cc: yocto

On Fri, 23 Mar 2012 12:20:08 +0000
Paul Eggleton <paul.eggleton@linux.intel.com> wrote:

> On Friday 23 March 2012 02:16:35 Peter Seebach wrote:
> > Still really weird to me that I can't reproduce this outside of hob.
> > I am pretty sure there exists a series of forks and execs and
> > environment changes such that this will end up happening.
>
> I now have a fairly simple test case outside of hob. Put the attached
> file in meta/classes/ and then add the following to your local.conf:
>
> INHERIT += "breakit"

Confirmed, this produces the problem for me, reducing the reproduction
window from something over an hour to about 7 seconds.

THANK YOU!

-s

* Re: pseudo interaction issue
From: Peter Seebach @ 2012-03-23 22:45 UTC
To: Paul Eggleton; +Cc: yocto

On Fri, 23 Mar 2012 12:20:08 +0000
Paul Eggleton <paul.eggleton@linux.intel.com> wrote:

> On Friday 23 March 2012 02:16:35 Peter Seebach wrote:
> > Still really weird to me that I can't reproduce this outside of hob.
> > I am pretty sure there exists a series of forks and execs and
> > environment changes such that this will end up happening.
>
> I now have a fairly simple test case outside of hob. Put the attached
> file in meta/classes/ and then add the following to your local.conf:
>
> INHERIT += "breakit"

Okay, some notes.

The magic seems to come from the interpolated Python output that itself
calls os.popen from inside the shell script.

A bit of poking about turns up the following:

1. The environment setup and teardown in runqueue.py don't seem to be
   atomic at all, such that if I annotate the stashing in envbackup with a
   bb.note for each variable stashed, I sometimes see a fork() call in
   pseudo BETWEEN two variables. Which is to say, we can be forking WHILE
   changing the environment.
2. The func_exec_shell calls seem to be able to call the git_branch
   stuff (which uses os.popen()) in a way that does not hit the runqueue
   code AT ALL. Meaning it operates with Whatever Environment Seems Handy.
3. I am inclined to suggest that a first pass would be to distinguish
   between "we need to set this, but we never need to unset
   it" (PSEUDO_PREFIX) and "we need to set this and then revert
   it" (PSEUDO_UNLOAD).
4. We should have a handler for popen() anyway, but it will not in and
   of itself fix the problem.

I am still getting the hang of finding my way around bitbake and
figuring out who's calling what. I'd guess that just making sure
PSEUDO_PREFIX never gets unset would effectively mitigate the problem,
but I suspect that we'll still be vulnerable to Weird Race Conditions.

-s

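Point 3 in the message above (some variables are set and never unset, others are set and then reverted) could look roughly like this in Python. It is a sketch of the idea only: restore_env and its keep parameter are hypothetical names, and nothing here is taken from runqueue.py.

```python
def restore_env(environ, backup, keep=("PSEUDO_PREFIX",)):
    """Revert environ to the values recorded in backup.

    backup maps variable name -> previous value, with None meaning "was not
    set". Names listed in keep are left as they currently are (set once,
    never reverted); everything else is put back or removed.
    """
    for key, old in backup.items():
        if key in keep:
            continue
        if old is None:
            environ.pop(key, None)
        else:
            environ[key] = old


if __name__ == "__main__":
    env = {"PSEUDO_PREFIX": "/some/prefix", "PSEUDO_DISABLED": "0",
           "PSEUDO_UNLOAD": "1"}
    before = {"PSEUDO_PREFIX": None, "PSEUDO_DISABLED": "1",
              "PSEUDO_UNLOAD": None}
    restore_env(env, before)
    print(env)
```

The function takes the mapping as a parameter so it can be tried on a plain dict; passing os.environ would apply the same logic to the real process environment.
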
* Re: pseudo interaction issue
From: Richard Purdie @ 2012-03-24 17:15 UTC
To: Peter Seebach; +Cc: Paul Eggleton, yocto

On Fri, 2012-03-23 at 17:45 -0500, Peter Seebach wrote:
> On Fri, 23 Mar 2012 12:20:08 +0000
> Paul Eggleton <paul.eggleton@linux.intel.com> wrote:
>
> > On Friday 23 March 2012 02:16:35 Peter Seebach wrote:
> > > Still really weird to me that I can't reproduce this outside of hob.
> > > I am pretty sure there exists a series of forks and execs and
> > > environment changes such that this will end up happening.
> >
> > I now have a fairly simple test case outside of hob. Put the attached
> > file in meta/classes/ and then add the following to your local.conf:
> >
> > INHERIT += "breakit"
>
> Okay, some notes.
>
> The magic seems to come from the interpolated Python output that itself
> calls os.popen from inside the shell script.
>
> A bit of poking about turns up the following:
>
> 1. The environment setup and teardown in runqueue.py don't seem to be
> atomic at all, such that if I annotate the stashing in envbackup with a
> bb.note for each variable stashed, I sometimes see a fork() call in
> pseudo BETWEEN two variables. Which is to say, we can be forking WHILE
> changing the environment.
> 2. The func_exec_shell calls seem to be able to call the git_branch
> stuff (which uses os.popen()) in a way that does not hit the runqueue
> code AT ALL. Meaning it operates with Whatever Environment Seems Handy.
> 3. I am inclined to suggest that a first pass would be to distinguish
> between "we need to set this, but we never need to unset
> it" (PSEUDO_PREFIX) and "we need to set this and then revert
> it" (PSEUDO_UNLOAD).
> 4. We should have a handler for popen() anyway, but it will not in and
> of itself fix the problem.
>
> I am still getting the hang of finding my way around bitbake and
> figuring out who's calling what. I'd guess that just making sure
> PSEUDO_PREFIX never gets unset would effectively mitigate the problem,
> but I suspect that we'll still be vulnerable to Weird Race Conditions.

Let me share my notes. I looked at Paul's instructions and thought, fair
enough and tried:

bitbake git-native -c install -f

which showed no error. Hmm.

To summarise what I found:

bitbake git-native -c install -f - no pseudo issue
bitbake bzip2 -c compile -f      - no pseudo issue
bitbake bzip2 -c install -f      - pseudo issue
bitbake bzip2 -c package -f      - pseudo issue

So there is a pattern, we have to execute a task where we enable pseudo.
Once we've done that we see a problem. Pseudo is left loaded but
disabled for -native tasks and compile but is active for install/package
of target recipes.

This implies that once we enable pseudo for a child, there is some
change in the parent which persists.

Let me talk a little about what should happen. The code in question is
in runqueue.py, the function fork_off_task().

Our starting position is pseudo is loaded but disabled as per the
variables in scripts/bitbake (effectively PSEUDO_DISABLED=1).

We then look at whether the child we need to create should run pseudo.
If it does, we look at FAKEROOTENV (set from meta/conf/bitbake.conf) to:

"PSEUDO_PREFIX=${STAGING_DIR_NATIVE}${prefix_native} PSEUDO_LOCALSTATEDIR=${PSEUDO_LOCALSTATEDIR} PSEUDO_PASSWD=${PSEUDO_PASSWD} PSEUDO_NOSYMLINKEXP=1 PSEUDO_DISABLED=0"

so we poke those into the environment, then fork(), the child does its
thing under pseudo and the parent restores the environment to its
original values.

If the child does not need to run under pseudo, we process FAKEROOTNOENV
("PSEUDO_UNLOAD=1") and set this, fork the child where pseudo should
have unloaded and then reset the environment back in the parent.

So somehow the pseudo in the parent is changing state after we run any
pseudo task. If we then run a command using popen later in the parent
context, pseudo is complaining.

I'm not sure if this is a bug in the way we're using pseudo, whether the
code implementation has an issue somewhere I'm not seeing, or whether
pseudo shouldn't be changing state like this.

I did note I can make the error "disappear" with:

diff --git a/scripts/bitbake b/scripts/bitbake
index 45c8697..a5b1539 100755
--- a/scripts/bitbake
+++ b/scripts/bitbake
@@ -1,7 +1,7 @@
 #!/bin/sh
 export BBFETCH2=True
-export BB_ENV_EXTRAWHITE="PSEUDO_BUILD PSEUDO_DISABLED $BB_ENV_EXTRAWHITE"
+export BB_ENV_EXTRAWHITE="PSEUDO_BUILD PSEUDO_DISABLED PSEUDO_PREFIX $BB_ENV_EXTRAWHITE"
 NO_BUILD_OPTS="--version -h --help -p --parse-only -s --show-versions -e --environment -g --graphviz"
 PASSTHROUGH_OPTS="-D -DD -DDD -DDDD -v"

since the envbackup is unsetting PSEUDO_PREFIX in the pseudo enabled
task case which seems to be the trigger.

What puzzles me is we get this value from envbackup[key] =
os.environ.get("PSEUDO_PREFIX") so it's already not in the environment.

So basically if we read "PSEUDO_PREFIX" from the environment we get
nothing. If we unset the value back to being "nothing", things break.

This would imply we have some other issue going on here somewhere...

Cheers,

Richard

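The set, fork, restore sequence described in the message above can be sketched in Python as follows. This is an illustration under stated assumptions, not the runqueue.py code: parse_assignments and temporary_env are hypothetical helpers, a subprocess call stands in for the fork(), and the "KEY=value KEY=value" parsing assumes values contain no spaces. On platforms that support it, deleting a key from os.environ also calls unsetenv(), so the change reaches the C-level environment that a preloaded library sees.

```python
import contextlib
import os
import subprocess
import sys


def parse_assignments(text):
    """Split a 'KEY=value KEY2=value2' string into a dict."""
    return dict(item.split("=", 1) for item in text.split())


@contextlib.contextmanager
def temporary_env(assignments):
    """Set variables for the duration of the block, then restore them."""
    # Remember the previous values; None means "was not set".
    backup = {key: os.environ.get(key) for key in assignments}
    os.environ.update(assignments)
    try:
        yield
    finally:
        for key, old in backup.items():
            if old is None:
                # Was not set before, so remove it again.
                os.environ.pop(key, None)
            else:
                os.environ[key] = old


if __name__ == "__main__":
    fakerootnoenv = parse_assignments("PSEUDO_UNLOAD=1")
    with temporary_env(fakerootnoenv):
        # A child started here inherits the temporary values.
        subprocess.call([sys.executable, "-c",
                         "import os; print(os.environ.get('PSEUDO_UNLOAD'))"])
    # Back in the parent, the variable is in its original state again.
    print(os.environ.get("PSEUDO_UNLOAD"))
```

Note that the restore step is exactly where a variable that was absent beforehand gets unset again, which is the step the message identifies as the apparent trigger for PSEUDO_PREFIX.
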
* Re: pseudo interaction issue
From: Richard Purdie @ 2012-03-24 17:41 UTC
To: Peter Seebach; +Cc: Paul Eggleton, yocto

On Sat, 2012-03-24 at 17:15 +0000, Richard Purdie wrote:
> What puzzles me is we get this value from envbackup[key] =
> os.environ.get("PSEUDO_PREFIX") so it's already not in the environment.
>
> So basically if we read "PSEUDO_PREFIX" from the environment we get
> nothing. If we unset the value back to being "nothing", things break.
>
> This would imply we have some other issue going on here somewhere...

I've discovered that if I add:

os.environ["PSEUDO_PREFIX"] = "/media/build1/poky/build/tmp/sysroots/x86_64-linux/usr"
os.environ["PSEUDO_LOCALSTATEDIR"] = "/media/build1/poky/build/tmp/work/i586-poky-linux/bzip2-1.0.6-r5/pseudo/"
del os.environ["PSEUDO_PREFIX"]
del os.environ["PSEUDO_LOCALSTATEDIR"]

to the top of fork_off_task(), then "bitbake bzip2 -c compile -f" will
start to fail with the pseudo error like the install/package case.

One or the other of the above on their own doesn't do this. Funky.

Cheers,

Richard

* Re: pseudo interaction issue
From: Peter Seebach @ 2012-03-26 16:44 UTC
To: Richard Purdie; +Cc: Paul Eggleton, yocto

On Sat, 24 Mar 2012 17:41:43 +0000
Richard Purdie <richard.purdie@linuxfoundation.org> wrote:

> One or the other of the above on their own doesn't do this. Funky.

That's very strange. I wouldn't have expected LOCALSTATEDIR to have
any effect either way; it might change how pseudo runs, but it shouldn't
affect whether it's being enabled.

If we are starting with pseudo loaded, I'm pretty sure it's unsafe to
unset PSEUDO_PREFIX ever. After a fork(), pseudo will still be in
memory, and if PSEUDO_PREFIX is unset, Bad Things Happen.

-s

* Re: pseudo interaction issue
From: Richard Purdie @ 2012-03-26 16:47 UTC
To: Peter Seebach; +Cc: Paul Eggleton, yocto

On Mon, 2012-03-26 at 11:44 -0500, Peter Seebach wrote:
> On Sat, 24 Mar 2012 17:41:43 +0000
> Richard Purdie <richard.purdie@linuxfoundation.org> wrote:
>
> > One or the other of the above on their own doesn't do this. Funky.
>
> That's very strange. I wouldn't have expected LOCALSTATEDIR to have
> any effect either way; it might change how pseudo runs, but it shouldn't
> affect whether it's being enabled.
>
> If we are starting with pseudo loaded, I'm pretty sure it's unsafe to
> unset PSEUDO_PREFIX ever. After a fork(), pseudo will still be in
> memory, and if PSEUDO_PREFIX is unset, Bad Things Happen.

This is pretty much what we do at the moment, it gets unset after we
load. Pseudo is of course disabled at this point.

I guess we just got lucky to this point and avoided "Bad Things"?

Cheers,

Richard

* Re: pseudo interaction issue
From: Peter Seebach @ 2012-03-26 17:18 UTC
To: Richard Purdie; +Cc: Paul Eggleton, yocto

On Mon, 26 Mar 2012 17:47:29 +0100
Richard Purdie <richard.purdie@linuxfoundation.org> wrote:

> This is pretty much what we do at the moment, it gets unset after we
> load. Pseudo is of course disabled at this point.
>
> I guess we just got lucky to this point and avoided "Bad Things"?

I suspect so. What's weird to me is that PSEUDO_PREFIX wasn't in the
environment before, either. So I still don't quite get this. I am
still missing something which will make this all make sense.

... at this point, I am leaning towards viewing this as a bug where it
is not enough to simply correct the behavior, I will not feel confident
in it until I have understood how it could have happened, but worked in
many other cases.

-s

* Re: pseudo interaction issue
From: Richard Purdie @ 2012-03-26 21:45 UTC
To: Peter Seebach; +Cc: Paul Eggleton, yocto

On Mon, 2012-03-26 at 12:18 -0500, Peter Seebach wrote:
> On Mon, 26 Mar 2012 17:47:29 +0100
> Richard Purdie <richard.purdie@linuxfoundation.org> wrote:
>
> > This is pretty much what we do at the moment, it gets unset after we
> > load. Pseudo is of course disabled at this point.
> >
> > I guess we just got lucky to this point and avoided "Bad Things"?
>
> I suspect so. What's weird to me is that PSEUDO_PREFIX wasn't in the
> environment before, either. So I still don't quite get this. I am
> still missing something which will make this all make sense.
>
> ... at this point, I am leaning towards viewing this as a bug where it
> is not enough to simply correct the behavior, I will not feel confident
> in it until I have understood how it could have happened, but worked in
> many other cases.

This is why I'm not saying let's just set PSEUDO_PREFIX. It's bothering
me too.

> > What puzzles me is we get this value from envbackup[key] =
> > os.environ.get("PSEUDO_PREFIX") so it's already not in the
> > environment.
> >
> > So basically if we read "PSEUDO_PREFIX" from the environment we get
> > nothing. If we unset the value back to being "nothing", things
> > break.
>
> Yes. This is, of course, obviously impossible.

Obviously :). Except the code does this and I've watched it happen. I'm
not claiming to understand it...

> Hmm. Well, hmm. When we start up, we should pick up PSEUDO_PREFIX
> from our environment, and during some of the initial client setup, we
> should be stashing that value in our stashed values table. At this
> point, so far as I can tell, nothing should ever unset that stashed
> value.
>
> On fork(), we don't change anything until we're in the client side of
> the fork, but that setup should happen in the same address space, with
> the values still stashed.

When we poke new values into the environment, will it corrupt the
internal stash?

> Oh, nevermind, I just realized: We use antimagic as the implementation
> goo for PSEUDO_DISABLED.
>
> So a call to os.popen() from a program which has PSEUDO_DISABLED set
> is going to think it's in antimagic mode.
>
> And suddenly, the trick is revealed:
>
> os.popen() is bypassing all the runqueue stuff which is trying to
> ensure that the environment is in a valid state. So if bitbake code
> calls os.popen(), it may behave weirdly, for the same reason that any
> other direct invocation of fork() or system() or whatnot would behave
> weirdly -- because bitbake is running with pseudo in a strange state.

I'd be very surprised if we don't make some other system() call
somewhere in bitbake's parent context. If this were a trigger, it could
go a long way to explaining some errors people have reported though.

> So I think the thing is:
>
> Because bitbake is running with PSEUDO_DISABLED, any child process that
> is not explicitly set to either enable or unload pseudo is going to be
> running under pseudo, with PSEUDO_DISABLED. And that means we need to
> ensure that PSEUDO_PREFIX stays set, because we need that even when
> pseudo is disabled. (It's used during some of the initial setup
> sanity checks.)

This is the answer I was expecting we'd come to.

I'm still a little surprised we don't make any system() calls though. I
just tried putting an os.system("true") call into the "breakit" class and
it doesn't trigger the warnings. Could that be down to the lack of a
popen wrapper?

Cheers,

Richard

* Re: pseudo interaction issue
From: Peter Seebach @ 2012-03-27 3:47 UTC
To: Richard Purdie; +Cc: Paul Eggleton, yocto

On Mon, 26 Mar 2012 22:45:16 +0100
Richard Purdie <richard.purdie@linuxfoundation.org> wrote:

> I'm still a little surprised we don't make any system() calls though.
> I just tried putting a os.system("true") call into the "breakit"
> class and it doesn't trigger the warnings. Could that be down to the
> lack of a popen wrapper?

I think that part could be, but this doesn't explain why adding the
popen() wrapper doesn't fix it.

Oh, wait. Yes, it does.

I think I am now happy with this, although I have a loose end or two.

So, here's what I've figured out.

We start bitbake with PSEUDO_PREFIX set. This then gets stored in the
internal stash. Thus, on any call we catch, we should be restoring it.
We then unset it, because it's not part of the whitelisted environment.

Now, what this means is that when we spawn child processes, they should
be getting the environment, but the parent bitbake is running with
PSEUDO_DISABLED. Which, in turn, sets antimagic. So most calls run
through their non-wrapped form.

The problem: I wrote my popen() wrapper wrong. See, I carefully removed
the check for pseudo_disabled from the top of it.

But! That code path is not actually the only way in which
pseudo_disabled affects behavior. That's just an *OPTIMIZATION*. The
pseudo_disabled flag also means that antimagic is set, and I copied
that part of the wrapper in unmodified:

    if (antimagic > 0) {
        /* call the real syscall */
        rc = (*real_popen)(command, mode);
    } else {
        /* exec*() use this to restore the sig mask */
        pseudo_saved_sigmask = saved;
        rc = wrap_popen(command, mode);
    }

And antimagic is 1, so I call real popen. Which forks.

And even though pseudo isn't in the LD_PRELOAD environment variable,
it's still in the process's address space, but PSEUDO_PREFIX isn't set,
and for some reason, the stashed value is missing. Not sure I can
explain that part yet; maybe we do have a path where we wipe the
stashed value. But the underlying problem is that my popen() wrapper
was never actually doing the setupenv/dropenv, or just a setupenv.

And the other underlying problem is that calling os.popen() directly is
probably something we should discourage, because we really do want to
know, for each subprocess we plan to spawn, whether it is running in
the pseudo environment or not. So if you call os.popen(), you get
"whatever state bitbake is in", which may not be at all what you wanted
or intended.

If I fix the popen() patch, it may actually start working again,
although I'm still not totally sure why the prefix is getting wiped
out.

So... I think...

1. We should probably whitelist PSEUDO_PREFIX because life is a heck
   of a lot easier if we aren't trying to set it and unset it all the
   time.
2. I need to fix my popen patch.
3. Once I've fixed that, I can probably do a much better job of
   articulating what's happening to pseudo_prefix that's causing us to
   end up with a child process where both the internal stash and the
   environment variable are unset.

The big thing I was missing was that PSEUDO_DISABLED implies that
everything will always have antimagic >= 1.

-s
--
Listen, get this. Nobody with a good compiler needs to be justified.
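The dispatch described in the message above can be modelled with a small toy in Python. This is only a sketch of the reasoning, not pseudo's code: the class `ToyWrapperState`, the helper `trace_for`, and the trace strings are all hypothetical, and the only facts taken from the thread are that PSEUDO_DISABLED implies antimagic >= 1 and that the wrapper calls the real function whenever antimagic > 0.

```python
class ToyWrapperState:
    """Toy model of the wrapper dispatch; NOT pseudo's real code."""

    def __init__(self, pseudo_disabled):
        # Per the thread: PSEUDO_DISABLED implies antimagic >= 1.
        self.antimagic = 1 if pseudo_disabled else 0
        self.trace = []

    def real_popen(self, command):
        # Stand-in for the unwrapped library call.
        self.trace.append("real_popen")
        return command

    def wrap_popen(self, command):
        # Hypothetical stand-in for the environment fixup ("setupenv")
        # that the wrapped path is expected to perform first.
        self.trace.append("setupenv")
        return self.real_popen(command)

    def popen(self, command):
        # Same shape as the C fragment quoted in the message.
        if self.antimagic > 0:
            return self.real_popen(command)
        return self.wrap_popen(command)


def trace_for(pseudo_disabled):
    """Return the sequence of steps taken for one popen() call."""
    state = ToyWrapperState(pseudo_disabled)
    state.popen("true")
    return state.trace
```

In this model, a disabled parent never reaches the "setupenv" step, which matches the observation that removing only the pseudo_disabled check at the top of the wrapper was not sufficient.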
* Re: pseudo interaction issue
From: Peter Seebach @ 2012-03-27 14:26 UTC
To: Richard Purdie; +Cc: Paul Eggleton, yocto

On Mon, 26 Mar 2012 22:45:16 +0100
Richard Purdie <richard.purdie@linuxfoundation.org> wrote:

> I'm still a little surprised we don't make any system() calls though.
> I just tried putting a os.system("true") call into the "breakit"
> class and it doesn't trigger the warnings. Could that be down to the
> lack of a popen wrapper?

Conclusion: Yes, it could. Once I understood why the popen wrapper
wasn't working, it got better; using the popen wrapper fixes things for
the breakit class, and probably for hob also.

However, the popen() wrapper is not really restoring the environment
after it runs. (In fact, I just looked at system(), and it does a
setupenv() but not a dropenv(), so I think it will be running anything
it runs with pseudo loaded but disabled.)

I still think it might be wiser for us to move PSEUDO_PREFIX into the
list of things that are always set. I'm not sure; if we don't, we have
a good chance of finding weird problems faster, I guess!

-s
--
Listen, get this. Nobody with a good compiler needs to be justified.
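The setupenv()/dropenv() pairing mentioned above is a save-and-restore pattern. A minimal Python sketch of that pattern follows; it is an analogy only, the name `temporary_env` is hypothetical, and it does not reproduce what pseudo's C functions actually change.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(overrides):
    """Apply overrides to os.environ, then restore the previous state.

    A value of None means "unset this variable" for the duration.
    """
    # Remember the old values (None marks "was not set").
    saved = {key: os.environ.get(key) for key in overrides}
    try:
        # The "setup" half: install the temporary values.
        for key, value in overrides.items():
            if value is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = value
        yield
    finally:
        # The "drop" half: put everything back, even on error.
        for key, old in saved.items():
            if old is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = old
```

Doing only the first half, as the message says system() does, leaves the temporary values in place for whatever runs afterwards.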
* Re: pseudo interaction issue
From: Peter Seebach @ 2012-03-26 7:43 UTC
To: Richard Purdie; +Cc: Paul Eggleton, yocto

On Sat, 24 Mar 2012 17:15:15 +0000
Richard Purdie <richard.purdie@linuxfoundation.org> wrote:

> This implies that once we enable pseudo for a child, there is some
> change in the parent which persists.

Hmm.

Is the parent running with pseudo loaded? If it were, then I would
expect this -- pseudo does some environment magic that can affect the
parent, and also cause it to stash values for later restoration.

Otherwise, I'm less sure, because the parent process should be able to
mess with its environment all it wants. Pseudo has some hackery for
restoring missing values, to let things like "env -i ..." work.

-s
--
Listen, get this. Nobody with a good compiler needs to be justified.
* Re: pseudo interaction issue
From: Richard Purdie @ 2012-03-26 9:23 UTC
To: Peter Seebach; +Cc: Paul Eggleton, yocto

On Mon, 2012-03-26 at 02:43 -0500, Peter Seebach wrote:
> On Sat, 24 Mar 2012 17:15:15 +0000
> Richard Purdie <richard.purdie@linuxfoundation.org> wrote:
>
> > This implies that once we enable pseudo for a child, there is some
> > change in the parent which persists.
>
> Hmm.
>
> Is the parent running with pseudo loaded? If it were, then I would
> expect this -- pseudo does some environment magic that can affect the
> parent, and also cause it to stash values for later restoration.

BitBake always runs with pseudo loaded but disabled. When we fork a
child, we decide whether to enable it or not and set up the environment
to either bring pseudo to life, or unload it from memory entirely.

It sounds to me like our enable/disable code confuses the heck out of
pseudo in the parent environment, likely overwriting a stashed value
which it needs to be able to find itself again (prefix/localstate).

Obviously we can change the way we manipulate the environment, although
that isn't entirely straightforward as we need to ensure the pseudo
paths don't leak into sstate package checksums for example.

Cheers,

Richard
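The per-child decision described above (enable pseudo, or unload it) can be sketched as a function that builds a fresh environment for the child instead of editing the parent's. This is an illustration under assumptions, not BitBake's code: the function name `child_environment` and its parameters are hypothetical, and the values given to PSEUDO_DISABLED and PSEUDO_UNLOAD are assumed for the example. Note also that the first message in this thread reports that setting these variables just before os.popen() did not avoid the error, so the sketch shows the decision itself rather than a fix.

```python
def child_environment(parent_env, use_pseudo, pseudo_prefix=None):
    """Return a new environment dict for a child process.

    Illustrative only: the variable values are assumptions, not
    BitBake's actual settings.
    """
    env = dict(parent_env)  # copy, so the parent's mapping is untouched
    if use_pseudo:
        if pseudo_prefix is None:
            raise ValueError("pseudo_prefix is required when enabling pseudo")
        # Bring pseudo "to life" for this child (assumed values).
        env["PSEUDO_PREFIX"] = pseudo_prefix
        env["PSEUDO_DISABLED"] = "0"
        env.pop("PSEUDO_UNLOAD", None)
    else:
        # Ask for pseudo to be unloaded in this child (assumed value).
        env["PSEUDO_UNLOAD"] = "1"
    return env
```

The resulting dict could be passed as the `env` argument of `subprocess.run()` or `subprocess.Popen()`, which keeps the choice explicit for each child rather than inheriting whatever state the parent happens to be in.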
* Re: pseudo interaction issue
From: Peter Seebach @ 2012-03-26 20:36 UTC
To: Richard Purdie; +Cc: Paul Eggleton, yocto

On Sat, 24 Mar 2012 17:15:15 +0000
Richard Purdie <richard.purdie@linuxfoundation.org> wrote:

> What puzzles me is we get this value from envbackup[key] =
> os.environ.get("PSEUDO_PREFIX") so it's already not in the environment.
>
> So basically if we read "PSEUDO_PREFIX" from the environment we get
> nothing. If we unset the value back to being "nothing", things break.

Yes. This is, of course, obviously impossible.

Hmm. Well, hmm. When we start up, we should pick up PSEUDO_PREFIX
from our environment, and during some of the initial client setup, we
should be stashing that value in our stashed values table. At this
point, so far as I can tell, nothing should ever unset that stashed
value.

On fork(), we don't change anything until we're in the client side of
the fork, but that setup should happen in the same address space, with
the values still stashed.

I did find one other thing, though, which worries me. I added a popen()
wrapper, and the thing is: we're calling popen() with the "antimagic"
bit set (the one that suppresses all the wrappers). Which would cause
all sorts of problems, and I can't figure out how it'd be happening.

So my new theory:

* There's something specific causing us to end up invoking popen()
  with the antimagic bit set. This is obviously impossible.
* But that means that, even if we trap other syscalls made by popen(),
  we won't be doing wrappers or fixups.
* And that could expose problems to do with PSEUDO_PREFIX getting
  unset unexpectedly.

So I think adding it to BB_ENV_EXTRAWHITE will hide this, but it won't
explain how we're getting into a popen() call in antimagic mode.
(Antimagic is the internal thing pseudo uses while trying to do
client/server communications. Pretty sure it never calls popen.)

-s
--
Listen, get this. Nobody with a good compiler needs to be justified.
* Re: pseudo interaction issue
From: Peter Seebach @ 2012-03-26 20:41 UTC
To: Richard Purdie; +Cc: Paul Eggleton, yocto

Oh, nevermind, I just realized: We use antimagic as the implementation
goo for PSEUDO_DISABLED.

So a call to os.popen() from a program which has PSEUDO_DISABLED set
is going to think it's in antimagic mode.

And suddenly, the trick is revealed:

os.popen() is bypassing all the runqueue stuff which is trying to
ensure that the environment is in a valid state. So if bitbake code
calls os.popen(), it may behave weirdly, for the same reason that any
other direct invocation of fork() or system() or whatnot would behave
weirdly -- because bitbake is running with pseudo in a strange state.

So I think the thing is:

Because bitbake is running with PSEUDO_DISABLED, any child process that
is not explicitly set to either enable or unload pseudo is going to be
running under pseudo, with PSEUDO_DISABLED. And that means we need to
ensure that PSEUDO_PREFIX stays set, because we need that even when
pseudo is disabled. (It's used during some of the initial setup
sanity checks.)

-s
--
Listen, get this. Nobody with a good compiler needs to be justified.