dash.vger.kernel.org archive mirror
* The behavior of `jobs -p` is definitely, without a doubt, a bug
@ 2016-05-13  2:06 Geoff Nixon
  2016-05-18 21:03 ` Harald van Dijk
  0 siblings, 1 reply; 4+ messages in thread
From: Geoff Nixon @ 2016-05-13  2:06 UTC (permalink / raw)
  To: dash, 482999

Package: dash
Version: build-from-git-HEAD--http://git.kernel.org/cgit/utils/dash/dash
Severity: serious
X-Debbugs-CC: herbert@gondor.apana.org.au
---

To Whom It May Concern:

I wrote most of this before I read through the Debian bug report process.
I'm actually not sure whether this is supposed to go to Debian or to the
kernel.org address; so I apologize if some of it is redundant or sounds a
bit weird; I assumed I'd be "creating a new thread", so I refer to this
thread as "that thread", etc.

If I've made any other errors, again, apologies in advance; it's the first time
I've worked with the Debian process, and I'm using dash on another platform, so
I don't have access to "reportbug", etc.

Also, if I come off a bit pissy, petulant, pedantic, or arrogant, it's only
because I have so many, many times *been* "that guy" -- someone who takes the
time and goes out of his way to write and file a bug report and send it through
the proper channels -- only to have it basically dismissed out of hand. So when
I found myself in the position of not only reporting a bug, but having to
refute the incorrect arguments made that led to the bug report being dismissed,
and thinking about how this could have been fixed 8 years ago, it did irk me.
So yes, I was a bit peeved as I wrote it, but it is absolutely nothing personal
(except in that it's some of my own emotional baggage).

So, once again, apologies, I truly mean no disrespect.

-------------------------------------------------------------------------------

So I thought I was going to be reporting a "new" bug today.

But it turns out, this came up way back in 2008:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=482999

At the time, the curt response and resolution seemed to be all of "no", and
"you should just use $!".

So I guess rather than a "bug report" per se, this is a vociferous argument
questioning that conclusion, and a request that it be reconsidered.

To me, this is definitely, absolutely, positively a bug.

---

Let me start with a couple of corrections to that previous thread.


1. The line: jobs -p > /tmp/pids                   # this works

Does *not*, in fact, work. Meaning there is *no* instance in which it works.

Now (8 years later), to the assertions made in that thread.

2. This response:

> Hi, I don't think that's a bug, the jobs builtin 'shall display the
> status of jobs that were started in the current shell environment;'[0]
> 
> When running jobs in a subshell, you change the shell environment for
> the jobs builtin.
> 
> Regards, Gerrit.
> 
> [0] http://www.opengroup.org/onlinepubs/009695399/utilities/jobs.html

Is just flat incorrect. In fact, it is almost the complete opposite of
what the standard says. Perhaps we should refer back to (I'm intentionally
using Issue 6 here, since that was the standard at the time of the original
discussion, but nothing of relevance differs in Issue 7)
http://pubs.opengroup.org/onlinepubs/009604499/utilities/xcu_chap02.html,
"Shell Command Language", in particular (all emphasis mine):

  2.12. Shell Execution Environment
  ---------------------------------
  ...

  *The environment of the shell process shall not be changed by the utility
  unless explicitly specified by the utility description* (for example, cd and
  umask).

  A subshell environment *shall be created as a duplicate of the shell
  environment*, except that signal traps that are not being ignored shall be
  set to the default action. Changes made to the subshell environment shall not
  affect the shell environment. *Command substitution*, commands that are
  grouped with parentheses, and *asynchronous lists* shall be executed in a
  subshell environment.

and

  2.9.3 Lists
  -----------
  ...

  Asynchronous Lists

  If a command is terminated by the control operator ampersand ( '&' ), the
  shell shall execute the command asynchronously in a subshell.
  ...
  When an element of an asynchronous list (the portion of the list ended by an
  ampersand ...), is started by the shell, the process ID of the last command
  in the asynchronous list element shall become known in the current shell
  execution environment; see Shell Execution Environment.

In review, a subshell inherits a duplicate of the parent environment, which
includes the asynchronous list of background tasks, and the shell environment is
*explicitly* not to be changed, unless otherwise specified. Which it is not.

Basically, `(this)`, `$(this)`, `this &`, and a few other cases are subshells.
They inherit the environment of the parent shell. The other interpretation given
describes something more like what happens with an invocation of `sh -c '...'`,
which is not the case here.

3. But let us return to the specification for "jobs", since subshells are not
really even the issue here. Indeed, the specification does say that it

  'shall display the status of jobs that were started in the current shell
  environment'

as quoted, in the DESCRIPTION. But if you read a bit further, it goes on to
specify *how* it should do so. This includes, specifically,

  STDOUT
  ------
  If the -p option is specified, the output shall consist of one line for each
  process ID:

  "%d\n", <process ID>

That is: `jobs -p` *is supposed to go to standard output*, and something like
the line below (note, here, we have no subshell):

  `jobs -p | while read job; do echo $job; done`

should work, as should all the examples in the original bug report:

  `echo $(jobs -p)`
  `for i in $(jobs -p); do echo $i; done`
  `jobs -p | xargs`

But they do not. The present behavior is to simply dump the process IDs directly
to the TTY, it seems. Which is not, at all, what the specification dictates.

4. Then let's read a bit further, shall we? It is from the "informative" part
of the spec, but there's this:

  APPLICATION USAGE
  -----------------
  *The -p option is the only portable way* to find out the process group of a
  job because different implementations have different strategies for defining
  the process group of the job. *Usage such as $(jobs -p)* provides a way
  of referring to the process group of the job in an implementation-independent
  way.

  The jobs utility does not work as expected when it is operating in its own
  utility execution environment because that environment has no applicable jobs
  to manipulate... For this reason, jobs is generally implemented as a shell
  regular built-in.

So I'm pretty sure the specification intends for `echo $(jobs -p)` to work,
since it *directly references that exact usage*. I'm not sure exactly what's
meant by "generally implemented as a shell regular built-in", but I did just
spend an hour trying to write a shell function that could emulate the same
behavior. I couldn't find a way to do it (and I doubt there is one) without at
least something non-POSIX, like a ksh DEBUG trap.

And finally, do I really need to lay out the reasons why one might need the
ability to list *all* jobs, as opposed to simply the last one used?

To me, "just use $!" sounds a lot more like "just go use another shell".
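A partial workaround in the "just use $!" spirit is to record every $! as each job is started. The sketch below is hypothetical (the `spawn` function and `bg_pids` variable are made-up names) and is not an emulation of `jobs -p`: every background job has to be started through the helper, and finished jobs are never removed from the list.

```shell
#!/bin/sh
# Hypothetical helper: start a command in the background and append its
# PID ($!) to a plain variable. Unlike jobs -p, the list is never pruned
# when a job finishes.
bg_pids=

spawn() {
  "$@" &
  bg_pids="${bg_pids:+$bg_pids }$!"
}

spawn sleep 1
spawn sleep 1

# Ordinary variables are copied into subshells, as POSIX requires, so the
# list is visible inside $(...) and (...) as well.
echo "tracked: $(echo "$bg_pids")"

for pid in $bg_pids; do   # unquoted on purpose: split on spaces
  wait "$pid"
done
```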


I apologize for being brusque.
Thank you for your attention to this matter.


Geoff Nixon
geoff at geoff.codes



* Re: The behavior of `jobs -p` is definitely, without a doubt, a bug
  2016-05-13  2:06 The behavior of `jobs -p` is definitely, without a doubt, a bug Geoff Nixon
@ 2016-05-18 21:03 ` Harald van Dijk
  2016-05-19  0:03   ` Geoff Nixon
  0 siblings, 1 reply; 4+ messages in thread
From: Harald van Dijk @ 2016-05-18 21:03 UTC (permalink / raw)
  To: dash; +Cc: Geoff Nixon

Hi,

On 13/05/16 04:06, Geoff Nixon wrote:
 > Let me start with a couple of corrections to that previous thread.
 >
 >
 > 1. The line: jobs -p > /tmp/pids                   # this works
 >
 > Does *not*, in fact, work. Meaning there is *no* instance in which it
 > works.

This should work just fine. Do you have a short test script where it 
does not work for you?

 > But they do not. The present behavior is to simply dump the process
 > IDs directly to the TTY, it seems. Which is not, at all, what the
 > specification dictates.

That's not what happens. While there are a lot of contexts where you can 
reasonably argue that jobs -p should output PIDs to stdout and where 
dash doesn't, when it doesn't, it doesn't output anything. It doesn't 
somehow start dumping PIDs elsewhere.

 > In review, a subshell inherits a duplicate of the parent environment,
 > which includes the asynchronous list of background tasks, and the shell
 > environment is *explicitly* not to be changed, unless otherwise
 > specified. Which it is not.

This is the main point of your message, and it's reasonable, but it's 
not clearly right or clearly wrong. ksh and mksh agree with your 
interpretation. bash comes across as inconsistent (but there may be a 
logic that's simply not immediately obvious). dash disagrees with your 
interpretation, and zsh appears to disagree as well.

   sleep 1 &
   echo $(jobs -p)
   (jobs -p)
   jobs -p

ksh and mksh print the PID thrice.
bash prints the PID twice, it omits it in (jobs -p).
dash prints the PID once.
zsh is a bit unclear since its -p option has a more limited effect, but 
appears to work like dash, only printing one (non-empty) line of output.
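To repeat this comparison on a given machine, a small harness along these lines could be used. It is a hypothetical sketch, not something posted to the thread: it assumes the shells are reachable by these names on PATH, skips any that are missing, and leaves zsh out since, as noted above, its -p option behaves differently.

```shell
#!/bin/sh
# Hypothetical harness: run the four-line probe above under each shell
# found on PATH and count the non-empty lines it prints. 3 means that
# $(jobs -p), (jobs -p) and plain jobs -p all saw the job.
probe='sleep 1 &
echo $(jobs -p)
(jobs -p)
jobs -p
wait'

checked=0
for shell in sh dash bash ksh mksh yash; do
  command -v "$shell" >/dev/null 2>&1 || continue
  lines=$("$shell" -c "$probe" 2>/dev/null | grep -c .)
  printf '%-5s %s\n' "$shell" "$lines"
  checked=$((checked + 1))
done
```

Going by the results reported in this message for the versions tested in 2016, ksh and mksh should show 3, bash 2, and dash 1; the `sh` line depends on which shell /bin/sh is on that system.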

Your logic relies on the list of jobs being a property of the 
environment. That's a reasonable interpretation, but not the only one. 
Another interpretation is the environment that started a job being a 
property of the job. In that interpretation, that information is not 
part of the environment and hence not copied when duplicating the 
environment. And it wouldn't be the first time that the "Application 
Usage" section of a utility contradicts the normative requirements.

That said, it does seem extremely likely that trivial examples such as
   echo $(jobs -p)
are intended to work the way you expect, regardless of whether the 
standard manages to state the requirements correctly. Other cases are 
not so clear, not when looking at the standard, and also not when 
looking at what other shells do. This makes it difficult to come up with 
a complete fix.

Personally, I feel that even if your interpretation is wrong (and I'm 
not saying it is), it is still less undesirable than dash's current 
behaviour. If there's a reasonable chance a patch for it would be 
accepted, I'd be willing to try to make it so.

Cheers,
Harald van Dijk


* Re: The behavior of `jobs -p` is definitely, without a doubt, a bug
  2016-05-18 21:03 ` Harald van Dijk
@ 2016-05-19  0:03   ` Geoff Nixon
  2016-05-19  1:51     ` Eric Blake
  0 siblings, 1 reply; 4+ messages in thread
From: Geoff Nixon @ 2016-05-19  0:03 UTC (permalink / raw)
  To: Harald van Dijk; +Cc: dash, 482999


---- On Wed, 18 May 2016 14:03:59 -0700 Harald van Dijk <harald@gigawatt.nl> wrote ---- 
> Hi, 
> 
> On 13/05/16 04:06, Geoff Nixon wrote: 
> > Let me start with a couple of corrections to that previous thread. 
> > 
> > 
> > 1. The line: jobs -p > /tmp/pids # this works 
> > 
> > Does *not*, in fact, work. Meaning there is *no* instance in which it
> > works.
> 
> This should work just fine. Do you have a short test script where it 
> does not work for you? 

Sorry, you are absolutely right; this *does* work. I could have sworn it
didn't when I tried before, but perhaps I was conflating it with something like
`jobs -p | tee /tmp/pids`, which doesn't work.
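Since redirecting `jobs -p` to a file does work, a portable way to iterate over the PIDs is to go through a temporary file rather than a pipeline or command substitution. A minimal sketch, assuming `mktemp` is available (it is widespread but not specified by POSIX):

```shell
#!/bin/sh
# Sketch: read background-job PIDs without a pipeline or $(...), so that
# jobs runs in the current shell and only its stdout is redirected.
sleep 2 &
sleep 2 &

pidfile=$(mktemp) || exit 1
jobs -p > "$pidfile"

count=0
while read -r pid; do
  count=$((count + 1))
  echo "background job $count: $pid"
done < "$pidfile"
rm -f "$pidfile"

wait   # reap the two sleeps before exiting
```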

> > But they do not. The present behavior is to simply dump the process
> > IDs directly to the TTY, it seems. Which is not, at all, what the
> > specification dictates.
> 
> That's not what happens. While there are a lot of contexts where you can 
> reasonably argue that jobs -p should output PIDs to stdout and where 
> dash doesn't, when it doesn't, it doesn't output anything. It doesn't 
> somehow start dumping PIDs elsewhere. 

Indeed, you are correct again; my mistake. Because of my incorrect conclusion
(that `jobs -p > /tmp/pids` fails), along with the fact that `jobs` produces
no output in a pipeline, I wrongly assumed that the output of `jobs` was
*never* sent to stdout. Apologies.

> > In review, a subshell inherits a duplicate of the parent environment,
> > which includes the asynchronous list of background tasks, and the shell
> > environment is *explicitly* not to be changed, unless otherwise
> > specified. Which it is not.
> 
> This is the main point of your message...

Yes, that is the core of my argument.

> .... and it's reasonable, but it's 
> not clearly right or clearly wrong. ksh and mksh agree with your 
> interpretation. bash comes across as inconsistent (but there may be a 
> logic that's simply not immediately obvious). dash disagrees with your 
> interpretation, and zsh appears to disagree as well. 
> 
> sleep 1 & 
> echo $(jobs -p) 
> (jobs -p) 
> jobs -p 
> 
> ksh and mksh print the PID thrice. 
> bash prints the PID twice, it omits it in (jobs -p). 
> dash prints the PID once. 
> zsh is a bit unclear since its -p option has a more limited effect, but 
> appears to work like dash, only printing one (non-empty) line of output. 

Oops! I somehow forgot to include the results of my tests with other
shells. You are correct, and in addition to what you mention above: pdksh, oksh,
and yash all also exhibit the "thrice" behavior, while busybox's ash
(at least my old 1.20.0 version) exhibits the same behavior as bash -- that is,
`(jobs -p)` does not work, but the others do.

I don't think zsh should be considered here, since it is explicitly
documented that `jobs -p` does something entirely different (process groups):
http://zsh.sourceforge.net/Doc/Release/Shell-Builtin-Commands.html

> Your logic relies on the list of jobs being a property of the 
> environment. That's a reasonable interpretation, but not the only one. 
> Another interpretation is the environment that started a job being a 
> property of the job. In that interpretation, that information is not 
> part of the environment and hence not copied when duplicating the 
> environment.

While it's not really worth bickering about, I do in fact believe my
"interpretation" is *specified*. It's not on the `jobs` page, but in
"Shell Command Language", where it dictates that the environment includes:
" - Process IDs of the last commands in asynchronous lists known to this
shell environment".
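The distinction can be seen directly: $!, which that list covers, is inherited by subshells, while what `jobs -p` reports inside them is exactly what is in dispute. A small sketch to observe both in one shell:

```shell
#!/bin/sh
# Sketch: $! is part of the shell execution environment, so subshells
# inherit it; compare with what jobs -p reports in the same place.
sleep 1 &
parent_pid=$!

sub_pid=$(echo "$!")              # $! inside a command substitution
( echo "(...) sees \$! = $!" )    # $! inside a parenthesized subshell
echo "parent sees \$! = $parent_pid, \$(...) sees $sub_pid"

# Whether this prints the PID is what the thread is debating.
echo "\$(jobs -p) gives: [$(jobs -p)]"

wait
```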

> And it wouldn't be the first time that the "Application 
> Usage" section of a utility contradicts the normative requirements. 

True ;)

> That said, it does seem extremely likely that trivial examples such as 
> echo $(jobs -p) 
> are intended to work the way you expect, regardless of whether the 
> standard manages to state the requirements correctly. Other cases are 
> not so clear, not when looking at the standard, and also not when 
> looking at what other shells do. This makes it difficult to come up with 
> a complete fix.

Again, speaking for myself, I think the standard is actually pretty clear here.
Referring again to the "Shell Command Language" page:

    The format for grouping commands is as follows:
    (compound-list)
    Execute compound-list in a *subshell environment*...

That is, I believe the behavior of bash and busybox does not fully adhere to
the standard; pdksh, ksh93, mksh, yash, and oksh are compliant.

> Personally, I feel that even if your interpretation is wrong (and I'm 
> not saying it is), it is still less undesirable than dash's current 
> behaviour. If there's a reasonable chance a patch for it would be 
> accepted, I'd be willing to try to make it so. 

Agreed. I'd very much like to see a patch for this; and I certainly hope it
would be accepted!

At the *very, very least*, I think the fact that `jobs -p` can have stdout
redirected to a file, yet cannot be used in a pipeline, is most definitely a
bug. Can you think of any other command where stdout can be redirected to a file
but cannot be piped?

I'd be willing to take a stab at a patch myself as well, but I'd much rather
leave this to more capable hands (someone more familiar with the dash codebase).
So I appreciate the offer!

Best,
Geoff

> 
> Cheers, 
> Harald van Dijk 



* Re: The behavior of `jobs -p` is definitely, without a doubt, a bug
  2016-05-19  0:03   ` Geoff Nixon
@ 2016-05-19  1:51     ` Eric Blake
  0 siblings, 0 replies; 4+ messages in thread
From: Eric Blake @ 2016-05-19  1:51 UTC (permalink / raw)
  To: Geoff Nixon, Harald van Dijk; +Cc: dash, 482999


On 05/18/2016 06:03 PM, Geoff Nixon wrote:
> Again, speaking for myself, I think the standard is actually pretty clear here.
> Referring again to the "Shell Command Language" page:
> 
>     The format for grouping commands is as follows:
>     (compound-list)
>     Execute compound-list in a *subshell environment*...
> 
> That is, I believe the behavior of bash and busybox does not fully adhere to
> the standard; pdksh, ksh93, mksh, yash, and oksh are compliant.
> 
>> Personally, I feel that even if your interpretation is wrong (and I'm 
>> not saying it is), it is still less undesirable than dash's current 
>> behaviour. If there's a reasonable chance a patch for it would be 
>> accepted, I'd be willing to try to make it so. 
> 
> Agreed. I'd very much like to see a patch for this; and I certainly hope it
> would be accepted!
> 
> At the *very, very least*, I think the fact that `jobs -p` can have stdout
> redirected to a file, yet cannot be used in a pipeline, is most definitely a
> bug. Can you think of any other command where stdout can be redirected to a file
> but cannot be piped?

http://austingroupbugs.net/view.php?id=53

The trap command is in the same boat as jobs, where redirecting to a
file is different than execution in a subshell, and where the shell may
special case (but perhaps by lexical analysis only) that if a subshell
is about to run where only the single command is being executed, then it
can behave as if that single command were in the context of the parent
instead of being a true subshell, precisely for the purpose of giving
output that would otherwise be lost.
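The same pattern can be tried with trap. Recent editions of the POSIX trap page give save_traps=$(trap) followed by eval "$save_traps" as a usage, which depends on this single-command special case. The sketch below checks whether the running shell honors it, and otherwise falls back to redirecting to a temporary file, the same workaround that works for jobs (assuming `mktemp` is available):

```shell
#!/bin/sh
# Sketch of the single-command special case, using trap instead of jobs.
trap 'echo cleanup' INT

saved=$(trap)   # a command substitution containing only a trap command
if [ -n "$saved" ]; then
  echo "this shell reports the parent's traps inside \$(trap)"
else
  # Fallback, as with jobs: run trap in the current shell and redirect
  # only its stdout to a file.
  tmp=$(mktemp) || exit 1
  trap > "$tmp"
  saved=$(cat "$tmp")
  rm -f "$tmp"
fi

trap - INT      # reset the handler ...
eval "$saved"   # ... then restore it from the reinput-ready listing
trap            # lists the restored INT handler
```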

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




Thread overview: 4+ messages
2016-05-13  2:06 The behavior of `jobs -p` is definitely, without a doubt, a bug Geoff Nixon
2016-05-18 21:03 ` Harald van Dijk
2016-05-19  0:03   ` Geoff Nixon
2016-05-19  1:51     ` Eric Blake
