Re: [PATCH 01/10] Add parse_integer() (replacement for simple_strto*())

From: Rasmus Villemoes <linux@rasmusvillemoes.dk>
To: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Linux Kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 01/10] Add parse_integer() (replacement for simple_strto*())
Date: Mon, 04 May 2015 18:44:42 +0200	[thread overview]
Message-ID: <87y4l4gumd.fsf@rasmusvillemoes.dk> (raw)
In-Reply-To: <CACVxJT_=GLvA0D9LGsXnOcj+r8VgFMP4pcL5oM6cQ2CcT2ZNVA@mail.gmail.com> (Alexey Dobriyan's message of "Mon, 4 May 2015 17:32:50 +0300")

[I'm merging the subthreads below]

On Mon, May 04 2015, Alexey Dobriyan <adobriyan@gmail.com> wrote:

> On Mon, May 4, 2015 at 4:24 PM, Rasmus Villemoes
> <linux@rasmusvillemoes.dk> wrote:
>>
>> First: Could you tell me what tree I can commit this on top of, to see
>> what gcc makes of it.
>
> Any recent kernel should be OK, code it quite self contained.
> I've just applied first two patches on top 4.1-rc2.
> BTW the correct order is
>
> 1) [PATCH 01/10] Add parse_integer() (replacement for simple_strto*())
> *** 2) [PATCH CORRECT 03/10] parse_integer: convert sscanf()
> 3) [PATCH 03/10] parse_integer: convert sscanf()
> 4) [PATCH 04/10] sscanf: fix overflow
>  ...
> 10) [PATCH 10/10] ext2, ext3, ext4: convert to parse_integer()/kstrto*()
>
> I've copied patch #2 twice, so it won't apply and resent it
> with subject from patch #3 to confuse everyone even more.

You certainly successfully confused me :-) 

>>> +#define parse_integer(s, base, val)  \
>>> +({                                   \
>>> +     const char *_s = (s);           \
>>> +     unsigned int _base = (base);    \
>>> +     typeof(val) _val = (val);       \
>>> +                                     \
>>> +     __builtin_choose_expr(                                          \
>>> +     __builtin_types_compatible_p(typeof(_val), signed char *),      \
>>> +     _parse_integer_sc(_s, _base, (void *)_val),
>>> \
>>
>> Why the (void*) cast? Isn't _val supposed to have precisely the type
>> expected by _parse_integer_sc at this point?
>>
>>> +     __builtin_choose_expr(                                          \
>>> +     __builtin_types_compatible_p(typeof(_val), long *) && sizeof(long) == 4,\
>>> +     _parse_integer_i(_s, _base, (void *)_val),                      \
>>> +     __builtin_choose_expr(                                          \
>>> +     __builtin_types_compatible_p(typeof(_val), long *) && sizeof(long) == 8,\
>>> +     _parse_integer_ll(_s, _base, (void *)_val),                     \
>>> +     __builtin_choose_expr(                                          \
>>> +     __builtin_types_compatible_p(typeof(_val), unsigned long *) && sizeof(unsigned long) == 4,\
>>> +     _parse_integer_u(_s, _base, (void *)_val),                      \
>>> +     __builtin_choose_expr(                                          \
>>> +     __builtin_types_compatible_p(typeof(_val), unsigned long *) && sizeof(unsigned long) == 8,\
>>> +     _parse_integer_ull(_s, _base, (void *)_val),                    \
>>
>> Ah, I see. In these cases, one probably has to do a cast to pass a
>> (long*) as either (int*) or (long long*) - but why not cast to the type
>> actually expected by _parse_integer_* instead of the launder-anything (void*)?
>
> First macro was written without casts at all naively thinking
> that gcc will only typecheck in chosen __builtin_choose_expr branch.
> But it doesn't do that, remove casts and observe million of warnings.
> So I shut it up with "void *". Branch is chosen base on __b_t_c_p
> expression and I don't think it is possible to sneak in incorrect pointer.

I see. I was worried about something like the cast laundering away a
const qualifier, but (const int*) is not _b_t_c with (int*). So ok for
now - I'll play around with this a little and see if one can get rid of
the casts anyway.
>>
>> Is there any reason to disallow "-0"?
>
> No! -0 is not accepted because code is copied from kstrtoll()
> which doesn't accept "-0". It is even in the testsuite:
>
>   static void __init test_kstrtoll_fail(void)
>   {
>   ...
> /* negative zero isn't an integer in Linux */
> {"-0",  0},
> {"-0",  8},
> {"-0",  10},
> {"-0",  16},
>
> Frankly I don't even remember why it does that, and
> no one noticed until now. libc functions accept "-0".

I think it's odd to accept "+0" but not "-0", but that's probably just
because I'm a mathematician. Am I right that you just added these test
cases because of the existing behaviour of kstrtoll? I suppose that
behaviour is just a historical accident.

If "-0" is not going to be accepted, I think that deserves a comment
(with rationale) in the parsing code and not hidden away in the test
suite.

>>>  unsigned long long memparse(const char *ptr, char **retptr)
>>>  {
>>> -     char *endptr;   /* local pointer to end of parsed string */
>>> +     unsigned long long val;
>>>
>>> -     unsigned long long ret = simple_strtoull(ptr, &endptr, 0);
>>> -
>>> -     switch (*endptr) {
>>> +     ptr += parse_integer(ptr, 0, &val);
>>
>> This seems wrong. simple_strtoull used to "sanitize" the return value
>> from the (old) _parse_integer, so that endptr still points into the
>> given string. Unconditionally adding the result from parse_integer may
>> make ptr point far before the actual string, into who-knows-what.
>
> When converting I tried to preserve the amount of error checking done.
> simple_strtoull() either
> a) return 0 and not advance pointer, or
> b) return something and advance pointer.
>

Are we talking about the same simple_strtoull? I see

	cp = _parse_integer_fixup_radix(cp, &base);
	rv = _parse_integer(cp, base, &result);
	/* FIXME */
	cp += (rv & ~KSTRTOX_OVERFLOW);

so cp is definitely advanced even in case of overflow. And in the case
of "underflow" (no digits found), the old code does initialize *result
to 0, while parse_integer by design doesn't write anything.

> Current code just ignores error case, so do I.

There's a difference between ignoring an error (which the current code
does), and ignoring _the possibility_ of an error (which the new code
does).

There are lots of callers of memparse(), and I don't think any of them
are prepared to handle *endp ending up pointing before the passed-in
string (-EINVAL == -22, -ERANGE == -34). I can easily see how that could
lead to an infinite loop, maybe worse.

[on the use of anonymous variables to get right type]

>> I suggest using the more obvious approach of declaring a union on the
>> stack and pass the address of the appropriate member.
>
> Union will work.
>
> No that it matters, it's a filesystem option parsing code
> which is executed rarely near the top of sys_mount(),
> not exactly perfomance bottleneck.
>
> It's a shame to lose such clever hack :-(

I'm sure you can find somewhere else to apply the hack :-)

> Just grep for simple_strto* and see how much error checking people do.
> The aim of the patchset is to convert code, error checking is separate
> story.

I think that's the wrong approach. One reason for adding this new
interface is precisely to facilitate error checking. So if you do some
example conversions, please do them _properly_ to show how error
checking is done. 

As you say, it's easy to find places where error checking isn't done by
just grepping for simple_strto*. I don't want to have to add
parse_integer to that list. Abusing a new interface from the beginning
isn't really a recipe for happiness.

Converting uses of simple_strto*() to parse_integer() doesn't add value
if semantics of calling code is exactly the same. And it adds negative
value if the conversions change the semantics in a surprising way, as in
the memparse case above :-(

Rasmus