From: Andrew Ardill <andrew.ardill@gmail.com>
To: Kelly Dean <kellydeanch@yahoo.com>
Cc: PJ Weisberg <pj@irregularexpressions.net>, git@vger.kernel.org
Subject: Re: Does content provenance matter?
Date: Tue, 8 May 2012 10:03:28 +1000
Message-ID: <CAH5451nq=W6qNsMExobipFXhb4tV+WmLu08QwfhzWd2+mJnbYA@mail.gmail.com>
In-Reply-To: <1336432417.36394.YahooMailClassic@web121504.mail.ne1.yahoo.com>

On 8 May 2012 09:13, Kelly Dean <kellydeanch@yahoo.com> wrote:
>
> --- On Mon, 5/7/12, PJ Weisberg <pj@irregularexpressions.net> wrote:
> > But there could be any number of unrelated commits newer than "Bar"
> > but older than "Revert Bar" on other branches.  Even if you could
> > trust the timestamps to be accurate (you can't), you still can't
> > determine a commit's parent unambiguously.
> Therefore, provenance does matter, and it must be explicitly recorded
> because it can't necessarily be correctly and fully deduced from content
> alone. And git does record inter-commit provenance.
> However, git doesn't record intra-commit provenance, as I mentioned in my
> original message. My question is: why this discrepancy? Either provenance
> matters, or it doesn't; why record it in one case but not the other?

I don't think it has been firmly decided that provenance is unimportant
in the intra-commit scope; rather, as you stated, such information is
simply not available to us.

My understanding is that git makes a best-effort guess at the flow of
content through the repository. If content is moved, by deleting it in
one place and adding it in another, that is easy to see in git.
However, if content is merely added, and the same content already
occurs in multiple places in the repository, there is no sane way of
knowing where it came from.
Even if the added content occurred in only one other place, you would
need to check every single file for every single hunk added in every
single commit to determine just where it came from. Why stop there,
though? It's possible we are copying the content from some other
branch we don't have checked out at the moment, so every time we
commit, let's search the entire repository's history for an occurrence
of each hunk we are adding. That way lies madness.
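
To make the cost and the ambiguity concrete, here is a rough sketch of
that brute-force search. It is only an illustration, not how git works
internally: the function names, the toy `history` mapping and the file
contents are all made up for the example.

```python
def find_prior_occurrences(hunk, history):
    """Return every (commit, path) whose content contains the hunk text.

    `history` is a hypothetical mapping of commit name to a
    {path: content} snapshot; it stands in for a real repository.
    """
    return [
        (commit, path)
        for commit, files in history.items()
        for path, text in files.items()
        if hunk in text
    ]


def comparisons_needed(hunks_added, history):
    """Count the file scans a brute-force search would perform."""
    return hunks_added * sum(len(files) for files in history.values())


# Toy history: two commits, three file snapshots in total.
history = {
    "c1": {"a.c": "int x;\nreturn 0;\n"},
    "c2": {"a.c": "int x;\nreturn 0;\n", "b.c": "return 0;\n"},
}

matches = find_prior_occurrences("return 0;\n", history)
print(matches)                         # several candidates, none preferred
print(comparisons_needed(5, history))  # hunks added * snapshots in history
```

Even in this tiny history the added hunk has three equally plausible
origins, and the number of scans grows with both the number of hunks
committed and the size of the history searched.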

With regard to file renames, all that has been shown so far is that
provenance matters for commit parents. Nothing about the similarities
between the commit parent and rename situations you mention leads me
to conclude that because provenance is important to one, it is
important to the other.

Indeed, one of the arguments against provenance being important in the
file rename case is that we can generally determine this information
from what is already recorded, which is not true of the general commit
parent case. There are additional arguments, such as that simply
recording file name changes doesn't capture many situations we would
like to know about, for example when a single file is split into two
files. Tracking the content of those files, and hence being able to
deduce where their content came from, solves both this and the general
rename situation. Trying to guess which file was 'renamed' and which
is 'new' when a file is actually split into two new files would
produce misleading and incomplete information in the end.
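
As a minimal sketch of the idea of deducing origin from content, the
following compares deleted and added files by line similarity. This is
an assumption-laden illustration, not git's actual rename detection:
`similarity`, `guess_sources`, the 0.5 threshold and the file names
are all invented for the example, and Python's difflib stands in for
whatever comparison a real tool would use.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Ratio in [0, 1] of matching lines between two file contents."""
    return SequenceMatcher(None, a.splitlines(), b.splitlines()).ratio()


def guess_sources(deleted, added, threshold=0.5):
    """For each added path, list deleted paths with similar content.

    `deleted` and `added` map path to content. The threshold is an
    arbitrary choice for this illustration.
    """
    result = {}
    for new_path, new_text in added.items():
        scored = [
            (old_path, round(similarity(old_text, new_text), 2))
            for old_path, old_text in deleted.items()
        ]
        result[new_path] = [s for s in scored if s[1] >= threshold]
    return result


# Hypothetical commit: util.c is deleted and split into two new files.
deleted = {"util.c": "parse()\nvalidate()\nrender()\nflush()\n"}
added = {
    "input.c": "parse()\nvalidate()\n",
    "output.c": "render()\nflush()\n",
}

sources = guess_sources(deleted, added)
print(sources)  # both new files point back at util.c
```

A record that said only "util.c was renamed to input.c" would lose the
fact that output.c came from the same file, whereas comparing content
reports both.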

So just because provenance matters in some situations doesn't mean it
matters in all (at least in the way we have been applying 'matters').
Furthermore, there are additional reasons why the existing
content-tracking system is beneficial. Extra layers of rename encoding
or the 'heritage of data chunks' would be extra work with little added
benefit (though there are a few corner cases, from memory, where
automatic rename detection fails and so /some/ benefit would be seen).

Regards,

Andrew Ardill