Re: [PATCH 01/10] Add parse_integer() (replacement for simple_strto*())

From: Alexey Dobriyan <adobriyan@gmail.com>
To: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Linux Kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 01/10] Add parse_integer() (replacement for simple_strto*())
Date: Mon, 4 May 2015 22:54:35 +0300
Message-ID: <20150504195435.GA21686@p183.telecom.by>
In-Reply-To: <87y4l4gumd.fsf@rasmusvillemoes.dk>

On Mon, May 04, 2015 at 06:44:42PM +0200, Rasmus Villemoes wrote:
> [I'm merging the subthreads below]
> 
> On Mon, May 04 2015, Alexey Dobriyan <adobriyan@gmail.com> wrote:
> 
> > On Mon, May 4, 2015 at 4:24 PM, Rasmus Villemoes

> >> Is there any reason to disallow "-0"?
> >
> > No! -0 is not accepted because code is copied from kstrtoll()
> > which doesn't accept "-0". It is even in the testsuite:
> >
> >   static void __init test_kstrtoll_fail(void)
> >   {
> >   ...
> >           /* negative zero isn't an integer in Linux */
> >           {"-0",  0},
> >           {"-0",  8},
> >           {"-0",  10},
> >           {"-0",  16},
> >
> > Frankly I don't even remember why it does that, and
> > no one noticed until now. libc functions accept "-0".
> 
> I think it's odd to accept "+0" but not "-0", but that's probably just
> because I'm a mathematician. Am I right that you just added these test
> cases because of the existing behaviour of kstrtoll? I suppose that
> behaviour is just a historical accident.
> 
> If "-0" is not going to be accepted, I think that deserves a comment
> (with rationale) in the parsing code and not hidden away in the test
> suite.

Again, I honestly do not remember why "-0" was banned.
Let's accept both "+0" and "-0" in the signed case, and only "+0" in the
unsigned case.
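
To make the proposed rule concrete, here is a rough userspace sketch. It is
not the parse_integer() from the patch: the sketch_* names are hypothetical,
it handles decimal only, and it returns the number of characters consumed or
a negative errno value, which is only an assumption about the convention.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>

/* Hypothetical helper: parse one or more decimal digits, no sign. */
static int sketch_digits(const char *s, unsigned long long *out)
{
	const char *p = s;
	unsigned long long val = 0;

	while (isdigit((unsigned char)*p)) {
		unsigned int d = *p - '0';

		/* val * 10 + d would exceed ULLONG_MAX */
		if (val > (ULLONG_MAX - d) / 10)
			return -ERANGE;
		val = val * 10 + d;
		p++;
	}
	if (p == s)
		return -EINVAL;	/* no digits at all */
	*out = val;
	return (int)(p - s);
}

/* Unsigned case: optional '+', never '-'. "+0" parses, "-0" does not. */
static int sketch_parse_ull(const char *s, unsigned long long *out)
{
	int sign = (*s == '+');
	int rv = sketch_digits(s + sign, out);

	return rv < 0 ? rv : rv + sign;
}

/* Signed case: optional '+' or '-'. Both "+0" and "-0" parse to 0. */
static int sketch_parse_ll(const char *s, long long *out)
{
	int neg = (*s == '-');
	int sign = neg || (*s == '+');
	unsigned long long mag;
	int rv = sketch_digits(s + sign, &mag);

	if (rv < 0)
		return rv;
	if (neg) {
		if (mag > (unsigned long long)LLONG_MAX + 1)
			return -ERANGE;
		*out = (long long)(0 - mag);
	} else {
		if (mag > LLONG_MAX)
			return -ERANGE;
		*out = (long long)mag;
	}
	return rv + sign;
}
```

With this, "-0" is rejected only where a minus sign makes no sense at all,
and nothing is written to *out on failure.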

> >>>  unsigned long long memparse(const char *ptr, char **retptr)
> >>>  {
> >>> -     char *endptr;   /* local pointer to end of parsed string */
> >>> +     unsigned long long val;
> >>>
> >>> -     unsigned long long ret = simple_strtoull(ptr, &endptr, 0);
> >>> -
> >>> -     switch (*endptr) {
> >>> +     ptr += parse_integer(ptr, 0, &val);
> >>
> >> This seems wrong. simple_strtoull used to "sanitize" the return value
> >> from the (old) _parse_integer, so that endptr still points into the
> >> given string. Unconditionally adding the result from parse_integer may
> >> make ptr point far before the actual string, into who-knows-what.
> >
> > When converting I tried to preserve the amount of error checking done.
> > > simple_strtoull() either
> > > a) returns 0 and does not advance the pointer, or
> > > b) returns something and advances the pointer.
> >
> 
> Are we talking about the same simple_strtoull? I see
> 
> 	cp = _parse_integer_fixup_radix(cp, &base);
> 	rv = _parse_integer(cp, base, &result);
> 	/* FIXME */
> 	cp += (rv & ~KSTRTOX_OVERFLOW);
> 
> so cp is definitely advanced even in case of overflow. And in the case
> of "underflow" (no digits found), the old code does initialize *result
> to 0, while parse_integer by design doesn't write anything.
> 
> > Current code just ignores error case, so do I.
> 
> There's a difference between ignoring an error (which the current code
> does), and ignoring _the possibility_ of an error (which the new code
> does).
> 
> There are lots of callers of memparse(), and I don't think any of them
> are prepared to handle *endp ending up pointing before the passed-in
> string (-EINVAL == -22, -ERANGE == -34). I can easily see how that could
> lead to an infinite loop, maybe worse.

Yeah, a possible bug could become worse. I'll add error checking,
but, seriously, you're defending this :^)

	case Opt_nr_inodes:
===>		/* memparse() will accept a K/M/G without a digit */
===>		if (!isdigit(*args[0].from))
===>			goto bad_val;
		pconfig->nr_inodes = memparse(args[0].from, &rest);
		break;
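
What that comment guards against can be reproduced with a small userspace
model. This is an approximation, not the kernel function: model_memparse()
is a hypothetical name, it uses strtoull() in place of simple_strtoull()
(so it also skips leading whitespace and takes a sign, which the kernel
helper does not), and it handles only the K/M/G suffixes.

```c
#include <stdlib.h>

/* Hypothetical userspace model of memparse(), K/M/G suffixes only. */
static unsigned long long model_memparse(const char *ptr, char **retptr)
{
	char *endptr;
	unsigned long long ret = strtoull(ptr, &endptr, 0);

	switch (*endptr) {
	case 'G':
	case 'g':
		ret <<= 10;
		/* fall through */
	case 'M':
	case 'm':
		ret <<= 10;
		/* fall through */
	case 'K':
	case 'k':
		ret <<= 10;
		endptr++;	/* the suffix is consumed even with no digits */
		break;
	default:
		break;
	}

	if (retptr)
		*retptr = endptr;
	return ret;
}
```

Given "K", the model returns 0 with *retptr at the terminating NUL, which a
caller cannot tell apart from a well-formed "0K". Hence the isdigit() check
in front of the call.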

memparse() is misdesigned in the same sense strtoul() is misdesigned.
Every "memparse(s, NULL)" caller is a bug, for example: it can never
learn where parsing stopped.
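
For the error checking itself, something along these lines is what I have in
mind. It is a userspace sketch under assumptions, not the final patch:
sketch_parse_integer() is a hypothetical stand-in built on strtoull() that
takes no sign and returns the number of characters consumed or a negative
errno value, and only K/M/G suffixes are handled.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for parse_integer(): returns characters consumed
 * or -EINVAL/-ERANGE. *val is written only on success.
 */
static int sketch_parse_integer(const char *s, unsigned int base,
				unsigned long long *val)
{
	char *end;
	unsigned long long v;

	if (!isdigit((unsigned char)*s))
		return -EINVAL;
	errno = 0;
	v = strtoull(s, &end, (int)base);
	if (errno == ERANGE)
		return -ERANGE;
	if (end == s)
		return -EINVAL;
	*val = v;
	return (int)(end - s);
}

static unsigned long long sketch_memparse(const char *ptr, char **retptr)
{
	unsigned long long val = 0;
	int rv = sketch_parse_integer(ptr, 0, &val);

	/*
	 * Advance only on success. "ptr += rv" with rv == -EINVAL would
	 * move ptr 22 bytes before the start of the string.
	 */
	if (rv > 0)
		ptr += rv;

	switch (*ptr) {
	case 'G':
	case 'g':
		val <<= 10;
		/* fall through */
	case 'M':
	case 'm':
		val <<= 10;
		/* fall through */
	case 'K':
	case 'k':
		val <<= 10;
		ptr++;
		break;
	default:
		break;
	}

	if (retptr)
		*retptr = (char *)ptr;
	return val;
}
```

On error the sketch leaves val at 0 and the pointer where it was, so *retptr
never ends up outside the string. On overflow that is not what the old
masking code did, since it still advanced past the digits, and a bare "K" is
still accepted as before.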