From: Nguyen Thai Ngoc Duy <pclouds@gmail.com>
To: Marat Radchenko <marat@slonopotamus.org>
Cc: git@vger.kernel.org
Subject: Re: What's the difference between `git show branch:file | diff -u - file` vs `git diff branch file`?
Date: Mon, 29 Aug 2011 23:09:05 +0700
Message-ID: <CACsJy8Dar5i3Fn+rhOq78vdsqRL4D+RNUc5G64BM-6DvKC=L5w@mail.gmail.com>
In-Reply-To: <loom.20110829T155805-331@post.gmane.org>

On Mon, Aug 29, 2011 at 9:48 PM, Marat Radchenko <marat@slonopotamus.org> wrote:
> Nguyen Thai Ngoc Duy <pclouds <at> gmail.com> writes:
>>  - is "file" above at top repo, or is it actually very/deep/path/to/a/file?
> 3 levels deep. Most parent dir (one after repo root) contains 20k files.
>
>>  - how many entries in the tree that contain "file"?
> Sorry, didn't understand this.

You have already answered it. I was asking about the size of the parent
dir, but phrased it poorly.

>>  - how is "git ls-files | wc -l"?
> $ time git ls-files | wc -l
> 603137
>
> real    0m0.417s
> user    0m0.440s
> sys     0m0.060s
>
>>  - how about "time git diff branch another-branch -- file >/dev/null"?
>> That'd remove unpack-trees code.
> Pretty fast:
>
> git diff HEAD branch -- file > /dev/null
>
> real    0m0.276s
> user    0m0.240s
> sys     0m0.030s

That may explain it. "git diff <ref>" walks through the index, unpacks
tree objects along the way, matches up entries with the same path from
the branch and the index, then feeds the matching entries to the diff
function. If tree cutting is not done efficiently, it could very well
walk through every entry in the index (~600k entries in your case),
unpacking all tree objects along the way.
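
To make the cost concrete, here is a toy model in Python. It is only a
sketch, not git's code: a tree is a nested dict, the index is a flat
dict of path -> blob id, and the names walk_all() and
diff_one_path_unpruned() are made up for illustration. It unpacks every
subtree and visits every entry even though only one path is wanted.

```python
# Toy model, NOT git's implementation. A tree maps a name to either a
# blob id (str) or a nested dict (a subtree). All names are hypothetical.

def walk_all(tree, prefix=""):
    """Yield (path, blob) for every blob, unpacking every subtree."""
    for name in sorted(tree):
        entry = tree[name]
        path = prefix + name
        if isinstance(entry, dict):
            # No pruning: descend into every subtree we meet.
            yield from walk_all(entry, path + "/")
        else:
            yield path, entry


def diff_one_path_unpruned(tree, index, wanted):
    """Compare `wanted` between tree and index without any path limiting.

    Returns (changed_paths, entries_visited).
    """
    visited = 0
    changed = []
    for path, blob in walk_all(tree):
        visited += 1
        # The pathspec is only applied here, after the entry was visited.
        if path == wanted and index.get(path) != blob:
            changed.append(path)
    return changed, visited
```

The number of entries visited grows with the size of the whole tree,
not with the depth of the one path being compared.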

And it looks to me like diff_cache() in diff-lib.c, which is
responsible for this case, does not do any prefix trimming.
traverse_trees() also does not seem to do the "never_interesting"
optimization that tree_interesting() does, so if the traversed tree is
big (~20k entries, as you told me), it will take some time, even though
you are only interested in a single entry.
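
Roughly, the two optimizations amount to the following. Again this is
a toy Python sketch under my own assumptions, not the code in
tree-walk.c: lookup_pruned() is a made-up name, trees are nested dicts,
and entries are compared in plain string order (real tree objects are
stored already sorted, with a slightly different rule for
directories). It descends only along the wanted path, and stops
scanning a tree as soon as the sorted entries have passed the point
where a match could still appear.

```python
# Toy model, NOT git's implementation. A tree maps a name to either a
# blob id (str) or a nested dict (a subtree). All names are hypothetical.

def lookup_pruned(tree, wanted):
    """Find the blob at `wanted`, descending only along its path.

    Returns (blob_or_None, entries_examined).
    """
    examined = 0
    node = tree
    parts = wanted.split("/")
    for depth, part in enumerate(parts):
        found = None
        for name in sorted(node):
            examined += 1
            if name < part:
                continue  # not there yet, keep scanning
            if name == part:
                found = node[name]
            # name >= part: no later entry can match, so stop here.
            break
        if found is None:
            return None, examined
        if depth == len(parts) - 1:
            # Last component: it only counts if it is a blob.
            if isinstance(found, dict):
                return None, examined
            return found, examined
        if not isinstance(found, dict):
            return None, examined  # a blob where a directory was needed
        node = found  # prefix trimming: descend into this subtree only
    return None, examined
```

With this shape the work depends on the trees along the path, so a
20k-entry sibling directory costs nothing unless the path goes through
it.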

> So the only troubled variant is `git diff branch -- file`.

No, I suspect "git diff --cached" would also be slow. "git merge"
would definitely be slow. But we can hardly improve these cases
because the commands are usually called tree-wide, with no path
limiting.

If you only work on a small subset of files, there's some (unfinished)
code in the narrow clone implementation that cuts down the index size,
which may speed things up in big-index repositories like yours. That
could be a good reason for me (or someone) to extract that part and get
it in before full narrow clone is implemented.
-- 
Duy
