Re: [PATCH v9 03/11] ref-filter: implement an `align` atom

From: Eric Sunshine <sunshine@sunshineco.com>
To: Karthik Nayak <karthik.188@gmail.com>
Cc: Git List <git@vger.kernel.org>,
	Christian Couder <christian.couder@gmail.com>,
	Matthieu Moy <Matthieu.Moy@grenoble-inp.fr>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v9 03/11] ref-filter: implement an `align` atom
Date: Sat, 8 Aug 2015 23:42:09 -0400	[thread overview]
Message-ID: <CAPig+cTHKbn0oCV61n=p5o9WihsaJbvWqKt4y9eFwA0noJoPgA@mail.gmail.com> (raw)
In-Reply-To: <CAOLa=ZSkmkPpiEfDrRXNQ6Tz5GT1+7vef3TOrj1t9aZ_3wm2Lw@mail.gmail.com>

On Sat, Aug 8, 2015 at 2:35 AM, Karthik Nayak <karthik.188@gmail.com> wrote:
> On Fri, Aug 7, 2015 at 8:57 AM, Eric Sunshine <sunshine@sunshineco.com> wrote:
>> On Wed, Aug 5, 2015 at 2:54 PM, Karthik Nayak <karthik.188@gmail.com> wrote:
>>> of the padding to be performed. If the atom length is more than the
>>> padding length then no padding is performed. e.g. to pad a succeeding
>>> atom to the middle with a total padding size of 40 we can do a
>>
>> It's odd to have alignment described in terms of "padding" and
>> "padding length", especially in the case of "center" alignment. It
>> might be better to rephrase the discussion in terms of field width or
>> such.
>>
>>> --format="%(align:middle,40).."
>
> Ok this makes sense,
> I'll rephrase as :
>
> `<width>` is the total length of the content with alignment.

This doesn't really make sense. <width> isn't the content length; it's
the size of the area into which the content will be placed.

> If the atom length is more than the width then no alignment is performed.

What "atom"? I think you mean the content between %(align:) and %(end)
rather than "atom". The description might be clearer if you actually
say "content between %(align:) and %(end)" to indicate specifically
what is being aligned.

> e.g. to align a succeeding atom to the middle with a total width of 40
> we can do:
> --format="%(align:middle,40).."
>>> @@ -687,6 +690,29 @@ static void populate_value(struct ref_array_item *ref)
>>>                         else
>>>                                 v->s = " ";
>>>                         continue;
>>> +               } else if (starts_with(name, "align:")) {
>>> +                       const char *valp = NULL;
>>
>> Unnecessary NULL assignment.
>
> Thats required for the second skip_prefix and so on.
> Else we get: "warning: ‘valp’ may be used uninitialized in this
> function [-Wmaybe-uninitialized]"

Okay, so that's because skip_prefix() is inline, and it doesn't touch
its "out" argument unless it actually skips the prefix. Ugly, but
makes sense, although I think this issue would go away if you combined
the starts_with() and skips_prefix() as suggested earlier.

>>> +                       struct align *align = xmalloc(sizeof(struct align));
>>> +
>>> +                       skip_prefix(name, "align:", &valp);
>>
>> You could simplify the code by combining this skip_prefix() with the
>> earlier starts_with() in the conditional:
>>
>>     } else if (skip_prefix(name, "align:", &valp)) {
>>         struct align *align = xmalloc(sizeof(struct align));
>>         ...
>
> That would require valp to be previously defined. Hence the split.

Yes, it would require declaring 'valp' earlier, but that seems a
reasonable tradeoff for cleaner, simpler, less redundant code.

>>>  static void apply_formatting_state(struct ref_formatting_state *state, struct strbuf *final)
>>>  {
>>> -       /* More formatting options to be evetually added */
>>> +       if (state->align && state->end) {
>>> +               struct strbuf *value = state->output;
>>> +               int len = 0, buf_len = value->len;
>>> +               struct align *align = state->align;
>>> +
>>> +               if (!value->buf)
>>> +                       return;
>>> +               if (!is_utf8(value->buf)) {
>>> +                       len = value->len - utf8_strwidth(value->buf);
>>
>> What is this doing, exactly? If the string is *not* utf-8, then you're
>> asking it for its utf-8 width. Am I reading that correctly?
>>
>> Also, what is 'len' supposed to represent? I guess you want it to be
>> the difference between the byte length and the display length, but the
>> name 'len' doesn't convey that at all, nor does it help the reader
>> understand the code below where you do the actual formatting.
>>
>> In fact, if I'm reading this correctly, then 'len' is always zero in
>> your tests (because the tests never trigger this conditional), so this
>> functionality is never being exercised.
>
> There shouldn't be a "!" there, will change.
> I guess 'utf8_compensation' would be a better name.

Definitely better than 'len'.

>>> +               else if (align->align_type == ALIGN_MIDDLE) {
>>> +                       int right = (align->align_value - buf_len)/2;
>>> +                       strbuf_addf(final, "%*s%-*s", align->align_value - right + len,
>>> +                                   value->buf, right, "");
>>
>> An aesthetic aside: When (align_value - buf_len) is an odd number,
>> this implementation favors placing more whitespace to the left of the
>> string, and less to the right. In practice, this often tends to look a
>> bit more awkward than the inverse of placing more whitespace to the
>> right, and less to the left (but that again is subjective).
>
> I know that, maybe we could add an additional padding to even out the value
> given?

I don't understand your question. I was merely suggesting (purely
subjectively), for the "odd length" case, putting the extra space
after the centered text rather than before it. For instance:

    int left = (align->align_value - buf_len) / 2;
    strbuf_addf(final, "%*s%-*s", left, "",
        align->align_value - left + len, value->buf);

or any similar variation which would give the same result.

>> This is a tangent, but I could easily see all of the code from 'if
>> (align->align_value < buf_len)' down to this point being placed in
>> utf8.c as a general alignment utility function. Doing so would make
>> this function shorter, and the patch easier to review overall (which
>> might be an important consideration -- especially given that I've
>> already spent several hours reviewing this one patch).
>
> That's a valid suggestion, will do that, thanks, but that'd mean we need to
> send an align struct to utf8.c which is only defined in ref-filter.h.
> Either this
> is fine or we need to move the definition of struct align to utf8.h.
> I think personally move the align structure and enum over to utf8.h

No, you don't need to move the 'struct align' to utf8.h. That
structure is specific to ref-filter and should stay there. Instead,
you only need to move the enum. For instance, you'd add something like
this to utf8.h:

    enum utf8_alignment {
        ALIGN_LEFT,
        ALIGN_MIDDLE,
        ALIGN_RIGHT
    };

    void strbuf_utf8_align(struct strbuf *buf,
        utf8_alignment where, int width, const char *s);

By the way, I forgot to say earlier that this should be done as a
separate patch (in order to make the current patch smaller).

That raises another question. Why are 'struct ref_formatting_state',
'struct align', 'struct atom_value', etc. defined in ref-filter.h at
all? Aren't those private implementation details of ref-filter.c, or
do you expect other code to be using them?

>>>         for (i = 0; i < final_buf.len; i++)
>>>                 printf("%c", final_buf.buf[i]);
>>>         putchar('\n');
>>> diff --git a/ref-filter.h b/ref-filter.h
>>> index 9e6c2d4..5575fe9 100644
>>> --- a/ref-filter.h
>>> +++ b/ref-filter.h
>>> @@ -16,14 +16,30 @@
>>>  struct ref_formatting_state {
>>> -       int quote_style;
>>>         struct strbuf *output;
>>> +       struct align *align;
>>> +       int quote_style;
>>
>> Perhaps decide where you'd like 'quote_style' to reside from the start
>> so that you don't have to add it at one location in its introductory
>> patch and then move it in a later patch. Also, why move it here?
>
> Cause that'd save memory on a 64 bit processor, where the pointers would
> be 8 bytes long and int would be 4 bytes long, this would bring in padding if
> int is placed before the pointers. Will change before hand.

As I understand it, you're not likely to have many
ref_fomratting_state's around at any given time, so this sounds like
premature memory micro-optimization.