Re: [PATCH v2 01/11] t: add tool to translate hash-related values

From: "brian m. carlson" <sandals@crustytoothpaste.net>
To: Eric Sunshine <sunshine@sunshineco.com>
Cc: "Git List" <git@vger.kernel.org>, "Jeff King" <peff@peff.net>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"Torsten Bögershausen" <tboegi@web.de>
Subject: Re: [PATCH v2 01/11] t: add tool to translate hash-related values
Date: Sun, 19 Aug 2018 21:50:22 +0000	[thread overview]
Message-ID: <20180819215022.GH70480@genre.crustytoothpaste.net> (raw)
In-Reply-To: <CAPig+cRZsJ00wNW08-jxmD=aW0V1hJJYEa9EVTMQT4=r0B+jeQ@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 8110 bytes --]

On Sun, Aug 19, 2018 at 03:40:22PM -0400, Eric Sunshine wrote:
> On Sun, Aug 19, 2018 at 1:54 PM brian m. carlson
> <sandals@crustytoothpaste.net> wrote:
> > diff --git a/t/oid-info/oid b/t/oid-info/oid
> > @@ -0,0 +1,29 @@
> > +# All zeros or Fs missing one or two hex segments.
> > +zero_1         sha1:000000000000000000000000000000000000000
> > +zero_2         sha256:000000000000000000000000000000000000000000000000000000000000000
> > +zero_2         sha1:00000000000000000000000000000000000000
> > +zero_2         sha256:00000000000000000000000000000000000000000000000000000000000000
> 
> Too many zero_2's. I guess the first one should be zero_1.

Ah, yes, you're right.

> > diff --git a/t/t0000-basic.sh b/t/t0000-basic.sh
> > @@ -821,6 +821,35 @@ test_expect_success 'tests clean up even on failures' "
> > +test_oid_cache hash-info oid
> > +
> > +test_expect_success 'test_oid_info provides sane info by default' '
> 
> Is "test_oid_info" in the test title outdated? I don't see anything by
> that name. Same question regarding the other new tests.

Probably.

> > +       test_oid zero &&
> > +       test_oid zero >actual &&
> 
> If the lookup "test_oid zero" fails, it's going to fail whether
> redirected or not, so the first invocation can be dropped.

Right.  I think that might have been a debugging statement.  Will fix.

> > +       grep "00*" actual &&
> 
> Do you want to anchor this regex? ^00*$

I could.  That would probably be a bit more robust.

> > +       test "$(test_oid hexsz)" -eq $(wc -c <actual) &&
> > +       test $(( $(test_oid rawsz) * 2)) -eq "$(test_oid hexsz)"
> > +'
> 
> If $(test_oid rawsz) fails to find "rawsz" and returns nothing, this
> expression will be "*2", which will error out in a
> less-than-meaningful way. Perhaps it would make more sense to dump the
> results of the two test_oid() to file and use test_cmp()?
> 
> Similar comment regarding all the other "test ... -eq ..." expressions
> here and below: I wonder if it would be better to dump them to files
> and compare the files.

I think what I'd like to do instead is store in a variable and make
test_oid return unsuccessfully if the value doesn't exist.  I think
that's better for writing tests overall and will accomplish the same
goal.

> > +test_expect_success 'test_oid_info can look up data for SHA-1' '
> > +       test_detect_hash sha1 &&
> > +       test_oid zero >actual &&
> > +       grep "00*" actual &&
> > +       test $(wc -c <actual) -eq 40 &&
> > +       test "$(test_oid rawsz)" -eq 20 &&
> > +       test "$(test_oid hexsz)" -eq 40
> > +'
> > +
> > +test_expect_success 'test_oid_info can look up data for SHA-256' '
> > +       test_when_finished "test_detect_hash" &&
> 
> Should the previous test also do this test_when_finished() to protect
> against someone coming along and re-ordering them some day? Or someone
> inserting a test between the two?

Yeah, that sounds like a more robust solution.

> > +       test_detect_hash sha256 &&
> > +       test_oid zero >actual &&
> > +       grep "00*" actual &&
> > +       test $(wc -c <actual) -eq 64 &&
> > +       test "$(test_oid rawsz)" -eq 32 &&
> > +       test "$(test_oid hexsz)" -eq 64
> > +'
> > diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
> > @@ -1147,3 +1147,39 @@ depacketize () {
> > +test_detect_hash () {
> > +       if test -n "$1"
> > +       then
> > +               test_hash_algo="$1"
> > +       else
> > +               test_hash_algo='sha1'
> > +       fi
> > +}
> 
> Mmph. Outside of t0000, do you expect other callers which will need to
> set the hash algorithm to an explicit value? If not, this sort of
> polymorphic behavior adds extra, perhaps unnecessary, complexity. Even
> if you do expect a few other callers, a dedicated test_set_hash()
> function might be clearer since test_detect_hash() has a very specific
> meaning.

I'm not anticipating one.  I can split it out into a separate function.

> > +test_oid_cache () {
> > +       test -n "$test_hash_algo" || test_detect_hash
> > +       if test -n "$1"
> > +       then
> > +               while test -n "$1"
> > +               do
> > +                       test_oid_cache <"$TEST_DIRECTORY/oid-info/$1"
> 
> What is the benefit of placing test-specific OID info in some
> dedicated directory? I suppose the idea of lumping all these OID files
> together in a single oid-info/ directory is that it makes it easier to
> see at a glance what needs to be changed next time a new hash
> algorithm is selected. However, that potential long-term benefit comes
> at the cost of divorcing test-specific information from the tests
> themselves. I imagine (for myself, at least) that it would be easier
> to grok a test if its OID's were specified and cached directly in the
> test script itself (via test_oid_cache <<here-doc). I could even
> imagine a test script invoking test_oid_cache<<here-doc before each
> test which has unique OID's in order to localize OID information to
> the test which needs it, or perhaps even within a test.

Putting them in a separate directory makes it possible to do something
like this:

   (cd t && ./t0002-* --verbose)

and then use shell editing to change the command line.  If we put them
in the same directory as the tests, we make developers' lives a bit
harder.

You mentioned the desire to experiment with additional hash algorithms
as a hope for this series.  I don't know if that's still something
desirable to have, now that we've decided on SHA-256, but I felt it
would make it easier to find all the places that need updating.

Since you seem to have a strong preference about keeping them in the
test script, I'm happy to make that change unless other people have
strong feelings one way or the other.

> So, I'm having trouble understanding the benefit of the current
> approach over merely loading OID's via here-docs in the test scripts
> themselves. Perhaps the commit message could say something about why
> the current approach was taken.

I can do that.  The idea is that we have lots of common uses of certain
values that will need to be loaded, and it's better to load some of
those values from a file.  I felt it would be ugly to have to write out
the full "$TEST_DIRECTORY..." piece every time.

> > +                       shift
> > +               done
> > +               return $?
> 
> Why, specifically, return $? here, as opposed to simply returning?

A simple return is probably better.  I think I wasn't clear on whether
POSIX required a bare "return" to return $? and may not have been online
at that time to look.  I remember being very clear that I didn't want to
return 0 unconditionally, though.

> Mmph. This polymorphic, recursive behavior in which test_oid_cache()
> can load data from a list of files or from its own standard input adds
> complexity. One alternative would be to have a separate
> test_oid_cache_file() function. However, I'm wondering why such
> functionality needs to be built in, in the first place, as opposed to
> clients merely doing the redirect themselves (test_oid_cache
> <whatever). I see that you want to support specifying multiple files
> for a single invocation, but is that likely to come up (aside from
> t0000)?

I expect that we are going to have a lot of uses of the hash-info and
oid files.  A separate command should be fine.

> > +       while read _tag _rest
> > +       do
> > +               case $_tag in \#*)
> > +                       continue;;
> > +               esac
> 
> This handles "# comment" lines, but what about blank lines?

Good catch.

> > +               _k="${_rest%:*}"
> > +               _v="${_rest#*:}"
> 
> Should this be doing any sort of validation, such as complaining if _k
> or _v is empty? Or if _k contains weird characters?

_v could be validly empty in some cases.  _k is probably worth checking.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 867 bytes --]