All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stefan Beller <sbeller@google.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>,
	Johannes Sixt <j6t@kdbg.org>, Jeff King <peff@peff.net>,
	Jens Lehmann <jens.lehmann@web.de>
Subject: Re: [PATCH] submodule: Port resolve_relative_url from shell to C
Date: Wed, 13 Jan 2016 14:47:20 -0800	[thread overview]
Message-ID: <CAGZ79ka0rxYK7GRSjh13XOsg887EgqYtc5B60z9qU=tAoJGERQ@mail.gmail.com> (raw)
In-Reply-To: <xmqq4mehm92b.fsf@gitster.mtv.corp.google.com>

On Wed, Jan 13, 2016 at 2:03 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> Later on we want to deprecate the `git submodule init` command and make
>> it implicit in other submodule commands.
>
> I doubt there is a concensus for "deprecate" part to warrant the use
> of "we want" here.  I tend to think that the latter half of the
> sentence is uncontroversial, i.e. it is a good idea to make other
> "submodule" subcommands internally call it when it makes sense, and
> also make knobs available to other commands like "clone" and
> possibly "checkout" so that the users do not have to do the
> "submodule init" as a separate step, though.

Maybe I need to rethink my strategy here and deliver a patch series
which includes a complete port of `submodule init`, and maybe even
options in checkout (and clone) to run `submodule init`. That way the
immediate benefit would be clear on why the series is a good idea.

The current wording is mostly arguing to Jens, how to do the submodule
groups thing later on, but skipping the immediate steps.

>
>> As I was porting the functionality I noticed some odds with the inputs.
>
> I can parse but cannot quite grok.  You found some strange things in
> the input?  Whose input, that comes from where given by whom?
>
>> To fully understand the situation I added some logging to the function
>> temporarily to capture all calls to the function throughout the test
>> suite. Duplicates have been removed and all unique testing inputs have
>> been recorded into t0060.
>
> I can also parse this, but it is unclear what you did to the
> temporary debugging help at the end.  If you left it, then that is
> no longer a temporary but is part of the final product.  It is also
> unclear what "Duplicates" you are talking about here.

So in v1 somebody complained it's not clear what kind of input you'd get into
the relative_url(up_path, remoteurl, url) function. I did not know either, as it
was a straight port, passing the test suite. So I wanted to add tests.

To come up with reasonable tests I added a section to the code similar as this:

    {
        FILE *f = fopen("/tmp/testcases", "a");
        fprintf(f, "%s|%s|%s|%s\n", up_path, remoteurl, url, result);
        fclose(f);
    }

Then I run the whole test suite with the relative_url instrumented.
This gave me a file "/tmp/testcases" containing 500 lines with valid
in and output for the `relative_url` function.
However I run these 500 lines through sort|uniq to get about 90 lines
of unduplicated tests.

but in these 90 lines there were still syntactic duplicates where one
line may look like the other line just with
    s/trash directory.tXXXX/trash directory.tYYYY/
so I removed these lines manually, too.

And that's how I came up with the set of tests.
The logging function to "/tmp/testcases" was temporary and is not part
of the final product, but by mentioning that, some issues may be clear
to the reader, such as:
 * why there are tests with /u/trash directory-t7400.../...
 * the tests are as exhaustive as the test suite before.
 * there are no tests to test failure though, only test for good tests

>
> Do you mean that you found some of the existing tests were odd, and
> after examination with help from a temporary hack which does not
> remain in this patch, you determined that some tests were duplicated,
> which you removed, while adding new ones?

Yes, this.

>
>>  builtin/submodule--helper.c | 189 ++++++++++++++++++++++++++++++++++++++++++++
>>  git-submodule.sh            |  81 +------------------
>>  t/t0060-path-utils.sh       |  42 ++++++++++
>>  3 files changed, 235 insertions(+), 77 deletions(-)
>>
>> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
>> index f4c3eff..3e58b5d 100644
>> --- a/builtin/submodule--helper.c
>> +++ b/builtin/submodule--helper.c
>> @@ -9,6 +9,193 @@
>>  #include "submodule-config.h"
>>  #include "string-list.h"
>>  #include "run-command.h"
>> +#include "remote.h"
>> +#include "refs.h"
>> +#include "connect.h"
>> +
>> +static char *get_default_remote(void)
>> +{
>> +     char *dest = NULL, *ret;
>> +     unsigned char sha1[20];
>> +     int flag;
>> +     struct strbuf sb = STRBUF_INIT;
>> +     const char *refname = resolve_ref_unsafe("HEAD", 0, sha1, &flag);
>> +
>> +     if (!refname)
>> +             die("No such ref: HEAD");
>> +
>> +     refname = shorten_unambiguous_ref(refname, 0);
>> +     strbuf_addf(&sb, "branch.%s.remote", refname);
>
> Is it correct to use shorten_unambiguous_ref() here like this?  The
> function is meant to be used when you want heads/frotz because you
> have both refs/heads/frotz and refs/tags/frotz at the same time.  I
> think you want to say branch.frotz.remote even in such a case.  IOW,
> shouldn't it be skip_prefix() with refs/heads/, together with die()
> if the prefix is something else?

Right.

>
>> +     if (git_config_get_string(sb.buf, &dest))
>> +             ret = xstrdup("origin");
>> +     else
>> +             ret = xstrdup(dest);
>> +
>> +     strbuf_release(&sb);
>> +     return ret;
>> +}
>> +
>> +static int starts_with_dot_slash(const char *str)
>> +{
>> +     return str[0] == '.' && is_dir_sep(str[1]);
>> +}
>> +
>> +static int starts_with_dot_dot_slash(const char *str)
>> +{
>> +     return str[0] == '.' && str[1] == '.' && is_dir_sep(str[2]);
>> +}
>> +
>> +static char *last_dir_separator(char *str)
>> +{
>> +     char* p = str + strlen(str);
>
> Asterisk sticks to the variable, not the type.

ok

>
>> +     while (p-- != str)
>
> It is preferable to use '>' not '!=' here, because you know p
> approaches str from the larger side, for readability.

Also known as the limes operator (p--> str, "p goes to str")
just kidding :)

>
>> +             if (is_dir_sep(*p))
>> +                     return p;
>> +     return NULL;
>> +}
>
> (a useless comment) This is one of the rare places where I wish
> there were a version of strcspn() that scans from the right.
>
>> +static char *relative_url(const char *remote_url,
>> +                             const char *url,
>> +                             const char *up_path)
>> +{
>> +     int is_relative = 0;
>> +     int colonsep = 0;
>> +     char *out;
>> +     char *remoteurl = xstrdup(remote_url);
>> +     struct strbuf sb = STRBUF_INIT;
>> +     size_t len;
>> +
>> +     len = strlen(remoteurl);
>
> Nothing wrong here, but it looked somewhat inconsistent to see this
> assignment, while remoteurl is done as an initialization [*1*]

ok, noted.

>
>
> [Footnote]
>
> *1* as a personal preference, I tend to prefer seeing only simple
> operations in initialization and heavyweight ones as a separate
> assignment to an otherwise uninitialized variable, and strlen() is
> lighter-weight than xstrdup() in my dictionary.
>
>
>
>> +     if (is_dir_sep(remoteurl[len]))
>> +             remoteurl[len] = '\0';
>> +
>> +     if (!url_is_local_not_ssh(remoteurl) || is_absolute_path(remoteurl))
>> +             is_relative = 0;
>> +     else {
>> +             is_relative = 1;
>> +
>> +             /* Prepend a './' to ensure all relative remoteurls start
>> +              * with './' or '../'. */
>
> Adjust the style, and perhaps remove the final full-stop to make the
> last string literal easier to see?  I.e.
>
>         /*
>          * Prepend a './' to ensure all relative remoteurls
>          * start with './' or '../'
>          */
>
> would be easier to see what it is said.

ok

>
>> +             if (!starts_with_dot_slash(remoteurl) &&
>> +                 !starts_with_dot_dot_slash(remoteurl)) {
>> +                     strbuf_reset(&sb);
>> +                     strbuf_addf(&sb, "./%s", remoteurl);
>> +                     free(remoteurl);
>> +                     remoteurl = strbuf_detach(&sb, NULL);
>> +             }
>> +     }
>> +     /* When the url starts with '../', remove that and the
>> +      * last directory in remoteurl. */
>
> Style.

ok

>
>> +     while (url) {
>> +             if (starts_with_dot_dot_slash(url)) {
>> +                     char *rfind;
>> +                     url += 3;
>> +
>> +                     rfind = last_dir_separator(remoteurl);
>> +                     if (rfind)
>> +                             *rfind = '\0';
>> +                     else {
>> +                             rfind = strrchr(remoteurl, ':');
>> +                             if (rfind) {
>> +                                     *rfind = '\0';
>> +                                     colonsep = 1;
>> +                             } else {
>> +                                     if (is_relative || !strcmp(".", remoteurl))
>> +                                             die(_("cannot strip one component off url '%s'"), remoteurl);
>> +                                     else
>> +                                             remoteurl = xstrdup(".");
>> +                             }
>> +                     }
>
> It is somewhat hard to see how this avoids stripping one (or both)
> slashes just after "http:" in remoteurl="http://site/path/", leaving
> just "http:/" (or "http:").

it would leave just 'http:/' if url were to be ../../some/where/else,
such that the constructed url below would be http://some/where/else.

>
> This codepath has overly deep nesting levels.  Is this the simplest
> we can do?

it's a direct translation from shell. I could imagine the inside of
    if (starts_with_dot_dot_slash(url)) {
        ...
    }

may go to its own function, such that it becomes:

    while (url) {
        if (starts_with_dot_dot_slash(url)) {
            adjust_remoteurl_and_url(&url, &remoteurl)
        else if (starts_with_dot_slash(url))
            url += 2;
        else
            break;
    }


with a proper name for adjust_remoteurl_and_url of course.

>
> The final else { if .. else } can be made into else if .. else to
> dedent the overlong die() by one level, but I am wondering if the
> deep nesting is just a symptom of logic being unnecessarily complex.

I don't think it's unnecessary complex, but results from a direct
shell->C translation.

>
>> +             } else if (starts_with_dot_slash(url)) {
>> +                     url += 2;
>> +             } else
>> +                     break;
>> +     }
>> +     strbuf_reset(&sb);
>> +     strbuf_addf(&sb, "%s%s%s", remoteurl, colonsep ? ":" : "/", url);
>> +
>> +     if (starts_with_dot_slash(sb.buf))
>> +             out = xstrdup(sb.buf + 2);
>> +     else
>> +             out = xstrdup(sb.buf);
>> +     strbuf_reset(&sb);
>> +
>> +     free(remoteurl);
>> +     if (!up_path || !is_relative)
>> +             return out;
>> +
>> +     strbuf_addf(&sb, "%s%s", up_path, out);
>> +     free(out);
>> +     return strbuf_detach(&sb, NULL);
>> +}
>
> Thanks.

  reply	other threads:[~2016-01-13 22:47 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-01-13 18:15 [PATCH] submodule: Port resolve_relative_url from shell to C Stefan Beller
2016-01-13 22:03 ` Junio C Hamano
2016-01-13 22:47   ` Stefan Beller [this message]
2016-01-14 20:50     ` Jens Lehmann
2016-01-14 23:43       ` Stefan Beller
2016-01-15 17:37     ` Junio C Hamano
2016-01-15 22:58       ` Stefan Beller
2016-01-15 23:03         ` Junio C Hamano
2016-01-14 20:57   ` Johannes Sixt
2016-01-14 22:49     ` Stefan Beller
2016-01-13 22:03 ` Eric Sunshine
2016-01-13 23:37   ` Stefan Beller
  -- strict thread matches above, loose matches on Subject: below --
2015-12-10  1:07 Stefan Beller
2015-12-10  6:48 ` Johannes Sixt
2015-12-16 22:36   ` Stefan Beller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAGZ79ka0rxYK7GRSjh13XOsg887EgqYtc5B60z9qU=tAoJGERQ@mail.gmail.com' \
    --to=sbeller@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=j6t@kdbg.org \
    --cc=jens.lehmann@web.de \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.