From: Duy Nguyen <pclouds@gmail.com>
To: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: Refreshing index timestamps without reading content
Date: Mon, 9 Jan 2017 19:02:45 +0700
Message-ID: <CACsJy8BRfJG6L49VyC+qsrQ9Arz0gCGpMATpK9uLq61Lx6_Jtg@mail.gmail.com>
In-Reply-To: <20170105112359.GN8116@chrystal.oracle.com>

On Thu, Jan 5, 2017 at 6:23 PM, Quentin Casasnovas
<quentin.casasnovas@oracle.com> wrote:
> Hi guys,
>
> Apologies if this is documented somewhere, I have fairly bad search voodoo
> skills.
>
> I'm looking for a way to cause a full refresh of the index without causing
> any read of the files, basically telling git "trust me, all worktree files
> are matching the index, but their stat information has changed".  I have
> read about the update-index --assume-unchanged and --skip-worktree flags in
> the documentation, but these do not cause any index refresh - rather, they
> fake that the respective worktree files are matching the index until you
> remove those assume-unchanged/skip-worktree bits.
>
> This might sound like a really weird thing to do, but I do have a use case
> for it - we have some build farm setup where the resulting objects of a
> compilation are stored on a shared server.  The source files are not stored
> on the shared server, but locally on each of the build servers (so as to
> decrease network load and make good use of local storage as caches).
>
> We then use an onion filesystem to mount the compiled objects on top of the
> local sources - and change the modification time of the source to be older
> than the object files, so that on subsequent builds, make does not rebuild
> the whole world.
>
> This works fine except for one thing, after changing the mtime of the
> source files, the first subsequent git command needing to compare the tree
> with the index will take a LONG time since it will read all of the object
> content:
>
>   cd linux-2.6
>
>   # Less than a second when the index is up to date
>   time git status > /dev/null
>   git status 0.06s user 0.09s system 172% cpu 0.087 total
>                                               ~~~~~~~~~~~
>
>   # Change the mtime..
>   git ls-tree -r --name-only HEAD | xargs -n 1024 touch
>
>   # Now 30s..
>   time git status > /dev/null
>   git status  2.73s user 1.79s system 13% cpu 32.453 total
>                                               ~~~~~~~~~~~~
>
> The timing information above was captured on my laptop SSD and the penalty
> is obviously much higher on spinning disks - especially when this operation
> is done on *hundreds* of different work trees in parallel, all hosted on the
> same filesystem (it can take tens of minutes!).
>
> Is there any way to tell git, after the git ls-tree command above, to
> refresh its stat cache information and trust us that the file content has
> not changed, so as to avoid any useless file reads (though it obviously
> will have to stat all of them, but that's not something we can really
> avoid)?

I don't think there's any way to do that, unfortunately.
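
For comparison, the closest existing command is "git update-index
--refresh". It does store fresh stat data, but for every entry whose stat
data changed it still reads the file to confirm that the content matches,
which is exactly the cost being asked about. A minimal sketch in a
throwaway repository (the temporary directory and file.txt are made up for
illustration):

```shell
#!/bin/sh
set -e

# Scratch repository; nothing here touches a real work tree.
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo hello > file.txt
git add file.txt

# mtime that git cached in the index for this entry.
before=$(git ls-files --debug file.txt | grep mtime)

# Change only the mtime; the content stays identical.
touch -t 200101010000 file.txt

# Re-stat the entry. Because the stat data differs, git re-reads the file,
# finds the content unchanged, and records the new stat data.
git update-index --refresh
after=$(git ls-files --debug file.txt | grep mtime)

printf 'before:%s\nafter: %s\n' "$before" "$after"
```

Once the refresh has run, later commands only need to stat the files again,
so the read cost is paid once per mtime change rather than avoided.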

> If not, I am willing to add an --assume-content-unchanged option to git
> update-index if you guys don't see something fundamentally wrong with this
> approach.

If you do that, I think you should go with either of the following options:

- Extend git-update-index --index-info to take stat info as well (or
maybe make a new option instead). Then you can feed stat info directly
to git without a use-case-specific "assume-content-unchanged".

- Add "git update-index --touch" that does what "touch" does. In this
case, it blindly updates the stat info to the latest values. But like
touch, we could also specify the mtime on the command line if we need to.
It's a bit less generic than the above option, but easier to use.
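
To make the first option concrete: --index-info today reads lines of mode,
object name, stage and path, the same layout "git ls-files -s" prints, so
there is no field through which stat data could be passed. A small sketch
of that round trip, assuming a scratch repository and a made-up file.txt;
any stat-carrying input format would be new and is not shown here:

```shell
#!/bin/sh
set -e

# Scratch repository; the path and file name are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo hello > file.txt
git add file.txt

# Each line is "<mode> <object> <stage><TAB><path>"; note: no stat fields.
entry=$(git ls-files -s file.txt)
printf '%s\n' "$entry"

# Feed the same line back; this is the input format accepted today.
printf '%s\n' "$entry" | git update-index --index-info

# Inspect whatever stat data the index now holds for the entry.
git ls-files --debug file.txt
```

As far as I can tell, entries written this way carry no cached stat data,
so the next refresh has to examine the file again. That is the gap an
extended input format would need to close.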

Caveat: The options I'm proposing can be rejected. So maybe wait a bit
to see how people feel and perhaps send an RFC patch, again to gauge
the reception.

-- 
Duy

Thread overview: 7+ messages
2017-01-05 11:23 Refreshing index timestamps without reading content Quentin Casasnovas
2017-01-09 12:02 ` Duy Nguyen [this message]
2017-01-09 12:17   ` Quentin Casasnovas
2017-01-09 12:22     ` Quentin Casasnovas
2017-01-09 15:01   ` Junio C Hamano
2017-01-09 15:55     ` Quentin Casasnovas
2017-01-10 14:17       ` Quentin Casasnovas
