All of lore.kernel.org
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Clemens Buchacher <drizzd@aon.at>
Cc: "Thomas Gummerer" <t.gummerer@gmail.com>,
	"Jeff King" <peff@peff.net>,
	"Torsten Bögershausen" <tboegi@web.de>,
	"brian m. carlson" <sandals@crustytoothpaste.net>,
	"Lars Schneider" <larsxschneider@gmail.com>,
	"Mike Hommey" <mh@glandium.org>,
	git@vger.kernel.org
Subject: Re: [PATCH] travis-ci: run previously failed tests first, then slowest to fastest
Date: Wed, 27 Jan 2016 12:49:31 -0800	[thread overview]
Message-ID: <xmqqmvrq7nok.fsf@gitster.mtv.corp.google.com> (raw)
In-Reply-To: <xmqqd1sm9730.fsf@gitster.mtv.corp.google.com> (Junio C. Hamano's message of "Wed, 27 Jan 2016 11:05:07 -0800")

Junio C Hamano <gitster@pobox.com> writes:

> One way to solve (1) I can think of is to change the definition of
> ce_compare_data(), which is called by the code that does not trust
> the cached stat data (including but not limited to the Racy Git
> codepath).  The current semantics of that function asks this
> question:
>
>     We do not know if the working tree file and the indexed data
>     match.  Let's see if "git add" of that path would record the
>     data that is identical to what is in the index.
>
> This definition was cast in stone by 29e4d363 (Racy GIT, 2005-12-20)
> and has been with us since Git v1.0.0.  But that does not have to be
> the only sensible definition of this check.  I wonder what would
> break if we ask this question instead:
>
>     We do not know if the working tree file and the indexed data
>     match.  Let's see if "git checkout" of that path would leave the
>     same data as what currently is in the working tree file.
>
> If we did this, "reset --hard HEAD" followed by "diff HEAD" will by
> definition always report "is clean" as long as nobody changes files
> in the working tree, even with the inconsistent data in the index.
>
> This still requires that convert_to_working_tree(), i.e. your smudge
> filter, is deterministic, though, but I think that is a sensible
> assumption for sane people, even for those with inconsistent data in
> the index.

Just a few additional comments.

The primary reason why I originally chose "does 'git add' of what is
in the working tree give us the same blob in the index?" as opposed
to "does 'git checkout' from the index again will give the same
result in the working tree?" is because it is a lot less resource
intensive and also is simpler.  Back then I do not think we had a
streaming interface to hash huge contents from a file in the working
tree, but it requires us to read the entire file from the filesystem
just once, apply the convert_to_git() processing and then hash the
result, whether we keep the whole thing in core at once or process
the data in streaming fashion.  Doing the other check will have to
inflate the blob data and apply the convert_to_working_tree()
processing, and also read the whole thing from the filesystem and
compare, which is more work at runtime.  And for a sane set-up where
the data in the index does not contradict with the clean/smudge
filter and EOL settings, both would yield the same result.

If we were to switch the semantics of ce_compare_data(), we would
want a new sibling interface next to stream_blob_to_fd() that takes
a file descriptor opened on the file in the working tree for reading
(fd), the object name (sha1), and the output filter, and works very
similarly to stream_blob_to_fd().  The difference would be that we
would be reading from the fd (i.e. the file in the working tree) as
we read from the istream (i.e. the contents of the blob in the
index, after passing the convert_to_working_tree() filter) and
comparing them in the main loop.  The filter parameter to the
function would be obtained by calling get_stream_filter() just like
how write_entry() uses it to prepare the filter parameter to call
streaming_write_entry() with.  That way, we can rely on future
improvement of the streaming interface to make sure we keep the data
we have to keep in core to the minimum.

IOW, I am saying that the "add --fix-index" lunchbreak patch I sent
earlier in the thread that has to hold the data in-core while
processing is not a production quality patch ;-)

  reply	other threads:[~2016-01-27 20:49 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-01-19  9:24 [PATCH] travis-ci: run previously failed tests first, then slowest to fastest larsxschneider
2016-01-19 19:12 ` Jeff King
2016-01-19 23:00   ` Junio C Hamano
2016-01-20  0:26     ` Mike Hommey
2016-01-20  1:46       ` Junio C Hamano
2016-01-20  1:56         ` Jeff King
2016-01-20  9:22         ` Lars Schneider
2016-01-22  2:33           ` brian m. carlson
2016-01-22  5:52             ` Jeff King
2016-01-22  6:07               ` Jeff King
2016-01-24 14:34                 ` Thomas Gummerer
2016-01-24 20:03                   ` Junio C Hamano
2016-01-24 22:05                     ` Junio C Hamano
2016-01-25 14:42                       ` Thomas Gummerer
2016-01-25 17:26                         ` Junio C Hamano
2016-01-25 21:52                           ` Junio C Hamano
2016-01-27 15:16                             ` Clemens Buchacher
2016-01-27 19:05                               ` Junio C Hamano
2016-01-27 20:49                                 ` Junio C Hamano [this message]
2016-01-28  7:10                                   ` Clemens Buchacher
2016-01-28 21:32                                     ` Junio C Hamano
2016-01-30  8:13                                       ` Clemens Buchacher
2016-02-01 18:17                                         ` Junio C Hamano
2016-02-01 19:33                                           ` Clemens Buchacher
2016-02-02 23:14                                             ` Junio C Hamano
2016-02-03  8:31                                               ` Junio C Hamano
2016-02-01 20:26                                           ` Torsten Bögershausen
2016-01-28  6:20                                 ` eol round trip Was: [PATCH] travis-ci: run previously failed Torsten Bögershausen
2016-01-25 22:41                           ` [PATCH] travis-ci: run previously failed tests first, then slowest to fastest Thomas Gummerer
2016-01-20  1:53       ` Jeff King
2016-01-20  9:10       ` Lars Schneider
2016-01-19 20:00 ` Junio C Hamano
2016-01-19 22:53   ` Junio C Hamano
2016-01-19 23:06     ` Jeff King
2016-01-19 23:26       ` Junio C Hamano
2016-01-19 23:29         ` Jeff King
2016-01-19 23:30         ` Junio C Hamano
2016-01-19 23:27       ` Jeff King
2016-01-20  7:55   ` Johannes Schindelin
2016-01-20  9:04   ` Lars Schneider
2016-01-20 20:35     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqmvrq7nok.fsf@gitster.mtv.corp.google.com \
    --to=gitster@pobox.com \
    --cc=drizzd@aon.at \
    --cc=git@vger.kernel.org \
    --cc=larsxschneider@gmail.com \
    --cc=mh@glandium.org \
    --cc=peff@peff.net \
    --cc=sandals@crustytoothpaste.net \
    --cc=t.gummerer@gmail.com \
    --cc=tboegi@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.