* Meta-variable naming convention in documentation
@ 2010-02-26  4:55 Mark Lodato
  2010-02-26  5:42 ` Junio C Hamano
  0 siblings, 1 reply; 4+ messages in thread
From: Mark Lodato @ 2010-02-26  4:55 UTC
  To: git list; +Cc: Junio C Hamano

Currently, git's documentation and usage statements are inconsistent in
the meta-variables (e.g., <path>) used to describe positional arguments
that denote a path.  This post is an attempt to document the existing
issues and to propose a potential solution.

To begin, I ran each of the git sub-commands in git 1.7.0 and documented
its behavior.  Each row contains the program, the meta-variables, and
the type.  If the man page and the '-h' usage statement differ, I list
both.  For example, git-annotate uses "file" in both the man page and
the usage statement, while git-blame uses "<file>" in the man page but
"file" in the usage statement.

The following accept exact filenames.  No directory recursion or
globbing is applied.  The commands in the first group die if the file
does not exist, while those in the second group "filter" (f) -- they
silently ignore non-matching files.

	program			man[/usage]		type
	-------			-----------		----
	git-annotate		file			file
	git-blame		<file>/file		file
	git-checkout-index	<file>			file
	git-hash-object		<file>			file
	git-mailsplit		<mbox>|<Maildir>	file
	git-merge-file		<current-file> ...	file
	git-merge-index		<file>/<filename>	file
	git-merge-one-file	<path>			file
	git-mergetool		<file>/file to merge	file
	git-mv			<args>/<source>		file
	git-pack-redundant	.pack filename/<...>	file
	git-send-email		<file>			file
	git-update-index	<file>			file
	git-am			<dir>			dir
	git-clone		<directory>		dir
	git-cvsserver		<directory>		dir
	git-daemon		<directory>		dir
	git-fetch-pack		<directory>		dir
	git-filter-branch	<directory>		dir
	git-format-patch	<dir>			dir
	git-fsck		<dir>			dir
	git-mailsplit		<directory>		dir
	git-peek-remote		<directory>		dir
	git-quiltimport		<dir>			dir
	git-receive-pack	<directory>		dir
	git-relink		<dir>			dir
	git-send-pack		<directory>		dir
	git-submodule		<path>			dir
	git-upload-archive	<directory>		dir
	git-upload-pack		<directory>		dir
	git-bundle		<file>			outfile
	git-mailinfo		<patch>			outfile
	git-pack-objects	base-name		outfile

	git-diff-tree		<path>			file	(f)
	git-ls-tree		paths/path		file	(f)

The following accept exact filenames or directories.  If a directory,
this matches all files within that directory recursively.

	git-archive		path			fdir

	git-bisect		<paths>/<pathspec>	fdir	(f)
	git-diff		<path>			fdir	(f)
	git-diff-files		<path>			fdir	(f)
	git-diff-index		<path>			fdir	(f)
	git-difftool		<path>/(nothing)	fdir	(f)
	git-log			<path>			fdir	(f)
	git-reset		<paths>			fdir	(f)
	git-rev-list		<paths>/paths		fdir	(f)
	git-show		(undoc)/<path>		fdir	(f)
	git-whatchanged		(undocumented)		fdir	(f)
	gitk			<path>			fdir	(f)

The following accept recursive directory matches or path globs.

	git-add			<filepattern>		glob
	git-checkout		<paths>/<file>		glob
	git-commit		<file>/<filepattern>	glob
	git-rm			<file>			glob

	git-clean		<path>/<paths>		glob	(f)
	git-grep		<path>/path		glob	(f)
	git-ls-files		<file>			glob	(f)
	git-status	<pathspec>/<filepattern>	glob?	(f)

* git-rm has --ignore-unmatch, which causes it to "filter"
* git-status only accepts globs for untracked files (currently a bug?)

Here are some examples showing what I mean by "type".  Each row shows
whether the pattern matches the file path/to/base.ext under that type.
(It may be useful to have something like this in the documentation.)

    pattern            file  fdir  glob
    -------            ----  ----  ----
    path/to/base.ext   yes   yes   yes
    path                -    yes   yes
    *xt                 -     -    yes
    p*t                 -     -    yes
    base.ext            -     -     -
    b*                  -     -     -
    to                  -     -     -
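
The three types in the table above can be sketched as follows.  This is
only an illustration, not git's actual code: the function name `matches`
is hypothetical, and "glob" is assumed to mean an fnmatch-style match
against the entire path in which "*" may cross "/" boundaries.

```python
from fnmatch import fnmatchcase


def matches(kind: str, pattern: str, path: str) -> bool:
    """Return True if `pattern` selects `path` under the given type."""
    exact = pattern == path
    # A pattern naming a leading directory selects everything below it.
    leading_dir = path.startswith(pattern.rstrip("/") + "/")
    if kind == "file":
        return exact
    if kind == "fdir":
        return exact or leading_dir
    if kind == "glob":
        # Assumption: the glob is applied to the whole path, and "*"
        # is allowed to match "/" (fnmatchcase behaves this way).
        return exact or leading_dir or fnmatchcase(path, pattern)
    raise ValueError(f"unknown type: {kind}")


PATH = "path/to/base.ext"
PATTERNS = ["path/to/base.ext", "path", "*xt", "p*t", "base.ext", "b*", "to"]

for pattern in PATTERNS:
    row = ["yes" if matches(kind, pattern, PATH) else "-"
           for kind in ("file", "fdir", "glob")]
    print(f"{pattern:18} {row[0]:5} {row[1]:5} {row[2]}")
```

Under these assumptions the loop should reproduce the yes/- entries of
the table row by row.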

From the above, it appears that there are four major groups:
* 'file' or 'dir'
* 'fdir' (f)
* 'glob'
* 'glob' (f)

(The only exceptions are git-diff-tree, git-ls-tree, and git-archive.)

Therefore, I suggest that we stick to a consistent naming convention for
these four groups, and document them in git(1).  Here is a proposal,
but I'm not tied to these names.
* <file> for 'file'
* <dir> for 'dir'
* <path> for 'fdir' (f)
* <filepattern> for 'glob'
* <pathspec> for 'glob' (f)

We would have to come up with something for the three exceptions above,
and some commands would benefit from more detailed meta-vars, such as
"<source>... <destination>" for git-mv.

Additionally, it would be nice if all non-filtering commands had
--ignore-unmatch, and all filtering commands had an option to die/warn
if any arguments did not match.  But, this is much more work than
a simple documentation change.

Anyway, what do you think of this proposal?  I am not sure that I have
my partitioning correct, but I would like to see some sort of
consistency in the documentation.  I would be happy to implement
whatever is decided.

* Re: Meta-variable naming convention in documentation
  2010-02-26  4:55 Meta-variable naming convention in documentation Mark Lodato
@ 2010-02-26  5:42 ` Junio C Hamano
  2010-02-26 22:51   ` Mark Lodato
  0 siblings, 1 reply; 4+ messages in thread
From: Junio C Hamano @ 2010-02-26  5:42 UTC
  To: Mark Lodato; +Cc: git list

Mark Lodato <lodatom@gmail.com> writes:

> The following accept exact filenames or directories.  If a directory,
> this matches all files within that directory recursively.
>
> 	git-archive		path			fdir
>
> 	git-bisect		<paths>/<pathspec>	fdir	(f)

What does the gap between these two mean?  Do you mean "bisect and later
are not in the 'exact filenames or directories' group"?

In general, unless the command takes only one filesystem entity (e.g. in
"format-patch -o <dir>", <dir> cannot be anything but a single directory;
in "blame <file>", <file> cannot be anything but a single file), you never
give a single "filename" to a git command.  Even when you say "git add
Makefile", you are _not_ giving a filename that is "M" "a" "k" "e" ...;
you are giving a _pattern_ to be matched with files git would find by
traversing the filesystem.  In the case of "Makefile", it may happen to
match only one single file.

This pattern is called "pathspec", and commands that can take one pathspec
can always take more than one.

Unfortunately, for historical reasons, there are two semantics of
pathspec, and at least three implementations of pathspec logic.

 - diff family (diff, log, show, rev-list) does not support glob.  The
   pattern is matched either as a leading directory path, or exact name.

 - ls-files family (I think "clean" also uses the logic internally) does
   support glob.  The pattern is matched either as a leading directory
   path, exact name, or a glob.

 - grep implements the same logic as ls-files but uses a newer
   implementation better suited for optimized tree traversal.

And there are higher level commands that internally use logic from either
diff family or ls-files family.  You can guess which pathspec is used if
you think about how you would implement what they do.  For example:

 - "add <pathspec>" traverses the work tree using ls-files logic and adds
   found files to the index.

 - "add -u <pathspec>" compares the index and the work tree using
   diff-files logic and adds paths with differences to the index.

 - "status" uses "diff-index --cached" logic to come up with the 'Changes
   to be committed' list, "diff-files" logic to come up with the 'Changed
   but not updated' list, and "ls-files" logic to list 'Untracked' files.

Unifying the two different semantics of pathspecs is one of the suggested
topics for GSoC, by the way.

I think it would make sense to document which ones are concrete paths
(e.g. "blame takes a filename" vs "diff takes zero or more pathspecs"),
but it would not make much sense to document the two different pathspecs.
The effort is better spent at fixing the difference --- obviously we would
eventually want to be able to say "git diff 'lib/*.h'".

* Re: Meta-variable naming convention in documentation
  2010-02-26  5:42 ` Junio C Hamano
@ 2010-02-26 22:51   ` Mark Lodato
  2010-02-27  2:41     ` Mark Lodato
  0 siblings, 1 reply; 4+ messages in thread
From: Mark Lodato @ 2010-02-26 22:51 UTC
  To: Junio C Hamano; +Cc: git list

On Fri, Feb 26, 2010 at 12:42 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Mark Lodato <lodatom@gmail.com> writes:
>
>> The following accept exact filenames or directories.  If a directory,
>> this matches all files within that directory recursively.
>>
>>       git-archive             path                    fdir
>>
>>       git-bisect              <paths>/<pathspec>      fdir    (f)
>
> What does the gap between these two mean?  Do you mean "bisect and later
> are not in the 'exact filenames or directories' group"?

Each of the three types is divided in two, separated by a blank line.
The first group, all without (f) at the end, die if one of the paths
given on the command line does not exist.  The second group, all with
(f) at the end, do not complain if any of the paths do not match
anything.  I called the latter group "filtering" for lack of a better
term.

So, git-archive, git-add, git-checkout, git-commit, and git-rm all take
pathspecs (the first using diff logic, the rest using ls-files logic),
but they complain if any of the pathspecs do not match.  It's probably
not worth documenting this, but it may be worth implementing
--ignore-unmatch for all but git-rm (which already has it).

Similarly, it may be nice to have a --warn-unmatch option (or
a configuration variable) to warn (or possibly die) if any of the
pathspecs hit nothing.  Then again, maybe it's not worth the trouble.
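
To make the difference between dying, warning, and filtering concrete,
here is a rough sketch.  Everything in it is hypothetical: the names
`select_paths`, `selects`, and `UnmatchedPathspec`, the `on_unmatched`
parameter, and the message text are invented for illustration, and the
matcher assumes diff-style matching only (exact name or leading
directory).  It is not how git implements any of this.

```python
import warnings


class UnmatchedPathspec(Exception):
    """Raised when a pathspec selects nothing and the policy is "die"."""


def selects(pathspec, path):
    # Assumption: diff-style matching (exact name or leading directory).
    return path == pathspec or path.startswith(pathspec.rstrip("/") + "/")


def select_paths(pathspecs, paths, on_unmatched="die"):
    """Return the paths selected by any pathspec.

    `on_unmatched` decides what happens when a pathspec selects nothing:
    "die" raises, "warn" emits a warning, "ignore" silently filters.
    """
    if on_unmatched not in ("die", "warn", "ignore"):
        raise ValueError(f"unknown policy: {on_unmatched}")
    unmatched = [s for s in pathspecs
                 if not any(selects(s, p) for p in paths)]
    if unmatched:
        message = "pathspec matched nothing: " + ", ".join(unmatched)
        if on_unmatched == "die":
            raise UnmatchedPathspec(message)
        if on_unmatched == "warn":
            warnings.warn(message)
    return [p for p in paths if any(selects(s, p) for s in pathspecs)]
```

With the "ignore" policy this behaves like the (f) commands; with "die"
it behaves like the commands that complain about unmatched arguments.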

It appears that git-diff-tree and git-ls-tree take some sort of
pathspec - they ignore non-matching patterns, but they do not allow
globbing or directory matches.

> In general, unless the command takes only one filesystem entity (e.g. in
> "format-patch -o <dir>", <dir> cannot be anything but a single directory;
> in "blame <file>", <file> cannot be anything but a single file)

In this case, I suggest that we use the same meta-variable always:
<file> or <dir>, or a more specific thing like <source>.

> you never
> give a single "filename" to a git command.  Even when you say "git add
> Makefile", you are _not_ giving a filename that is "M" "a" "k" "e" ...;
> you are giving a _pattern_ to be matched with files git would find by
> traversing the filesystem.  In the case of "Makefile", it may happen to
> match only one single file.
>
> This pattern is called "pathspec", and commands that can take one pathspec
> can always take more than one.

In these cases, I think we should always use the meta-variable
<pathspec> - never <path>, <file>, etc.

> I think it would make sense to document which ones are concrete paths
> (e.g. "blame takes a filename" vs "diff takes zero or more pathspecs"),
> but it would not make much sense to document the two different pathspecs.
> The effort is better spent at fixing the difference --- obviously we would
> eventually want to be able to say "git diff 'lib/*.h'".

You're right, it's not worth having two different meta-variables, but
I do think the difference is worth noting in the documentation.  In
git(1), we could have something like the following:

<pathspec>::
	Indicates a pattern for filtering paths.  Matches are either exact,
	a leading directory, or a glob(7) pattern on the entire path.
	For example, the path "doc/help.txt" is matched by "doc/help.txt",
	"doc", "*.txt", and "d*t", but not by "do", "help.txt", or "help.*".

<dir>::
<file>::
	Indicates a physical file or directory relative to
	the current directory.

(Note: the current document says "almost always relative to the root
of the tree structure `GIT_INDEX_FILE` describes."  Is this true?)

Then, for the diff logic commands, we could document that they do not
accept globs.  For example, in git-diff(1):

<pathspec>...::
	If given, limit the diff to paths matching the given parameters.
	(Does not accept glob(7) patterns.)

Once the logic is unified, and these commands do accept globs, we can
just remove the note.

Mark

* Re: Meta-variable naming convention in documentation
  2010-02-26 22:51   ` Mark Lodato
@ 2010-02-27  2:41     ` Mark Lodato
  0 siblings, 0 replies; 4+ messages in thread
From: Mark Lodato @ 2010-02-27  2:41 UTC
  To: Junio C Hamano; +Cc: git list

On a similar note, what do you think about dropping the term
<tree-ish> and just using <tree> everywhere?  The only command that
only accepts a <tree>, and not a <tree-ish>, is git-commit-tree.  For
example, `git commit-tree master' fails, but `git commit-tree
master^{tree}' works.  This can easily be written in the
documentation, or the program can be fixed so it also accepts a
<tree-ish> like everything else.  (git-grep(1) uses the term <tree>,
but this should be <tree-ish>.)
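
As a rough illustration of the <tree> vs <tree-ish> distinction, here is
a toy model.  The classes `Tree`, `Commit`, and `Tag` and the functions
`peel_to_tree` and `require_tree` are hypothetical stand-ins invented
for this sketch; they are not git's object model or API.  The only
assumption about git itself is that a <tree-ish> is something that can
be dereferenced to a tree, such as a commit or a tag pointing at one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tree:
    name: str


@dataclass(frozen=True)
class Commit:
    tree: Tree


@dataclass(frozen=True)
class Tag:
    target: object  # a Tag, Commit, or Tree in this toy model


def peel_to_tree(obj):
    """Accept a <tree-ish>: follow tags, then take a commit's tree."""
    while isinstance(obj, Tag):
        obj = obj.target
    if isinstance(obj, Commit):
        obj = obj.tree
    if isinstance(obj, Tree):
        return obj
    raise TypeError(f"not a tree-ish: {obj!r}")


def require_tree(obj):
    """Accept only a <tree>, with no dereferencing at all."""
    if not isinstance(obj, Tree):
        raise TypeError(f"not a tree: {obj!r}")
    return obj
```

In this model, a command that used `peel_to_tree` would accept the
equivalent of both `master` and `master^{tree}`, whereas one that used
`require_tree` would accept only the latter.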

I imagine a similar thing can be done with <commit> vs <commit-ish> vs
<committish>, but I haven't verified this.  <commit-ish> is only used
in git(1) and builtin-revert.c, and <committish> is only used in
git-describe(1), git-fast-import(1), git-name-rev(1), git-shortlog(1),
gitcli(1), and builtin-describe.c.  I would guess that all the other
commands that say <commit> really accept a <commit-ish>, but perhaps
this is not true.

I also think <rev> should be replaced with <commit>, unless this means
something different.

If you give the go-ahead, I'll work on a patch to do this.

Cheers,
Mark
