From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753178AbbEDNYl (ORCPT ); Mon, 4 May 2015 09:24:41 -0400 Received: from mail-lb0-f176.google.com ([209.85.217.176]:34815 "EHLO mail-lb0-f176.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752883AbbEDNYc (ORCPT ); Mon, 4 May 2015 09:24:32 -0400 From: Rasmus Villemoes To: Alexey Dobriyan Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH 01/10] Add parse_integer() (replacement for simple_strto*()) Organization: D03 References: <20150502004714.GA21655@p183.telecom.by> X-Hashcash: 1:20:150504:linux-kernel@vger.kernel.org::+Yr7gRP5EoozVpmH:0000000000000000000000000000000001XST X-Hashcash: 1:20:150504:adobriyan@gmail.com::b5LXNqc4Xm9ROlsu:0000000000000000000000000000000000000000001gbm X-Hashcash: 1:20:150504:akpm@linux-foundation.org::j6wIXI9cCf7cUBl8:0000000000000000000000000000000000002nqY Date: Mon, 04 May 2015 15:24:28 +0200 In-Reply-To: <20150502004714.GA21655@p183.telecom.by> (Alexey Dobriyan's message of "Sat, 2 May 2015 03:47:14 +0300") Message-ID: <87h9rsiigj.fsf@rasmusvillemoes.dk> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sat, May 02 2015, Alexey Dobriyan wrote: > kstrto*() and kstrto*_from_user() family of functions were added > to help with parsing one integer written as string to proc/sysfs/debugfs > files. But they have a limitation: string passed must end with \0 or \n\0. > There are enough places where kstrto*() functions can't be used because of > this limitation. Trivial example: major:minor "%u:%u". > > Currently the only way to parse everything is simple_strto*() functions. > But they are suboptimal: > * they do not detect overflow (can be fixed, but no one bothered since ~0.99.11), > * there are only 4 of them -- long and "long long" versions, > This leads to silent truncation in the most simple case: > > val = strtoul(s, NULL, 0); > > * half of the people think that "char **endp" argument is necessary and > add unnecessary variable. > > OpenBSD people, fed up with how complex correct integer parsing is, added > strtonum(3) to fixup for deficiencies of libc-style integer parsing: > http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man3/strtonum.3?query=strtonum&arch=i386 > > It'd be OK to copy that but it relies on "errno" and fixed strings as > error reporting channel which I think is not OK for kernel. > strtonum() also doesn't report number of characted consumed. > > What to do? > > Enter parse_integer(). > > int parse_integer(const char *s, unsigned int base, T *val); > I like the general idea. Few nits below (and in reply to other patches). First: Could you tell me what tree I can commit this on top of, to see what gcc makes of it. > Rationale: > > * parse_integer() is exactly 1 (one) interface not 4 or > many,one for each type. > > * parse_integer() reports -E errors reliably in a canonical kernel way: > > rv = parse_integer(str, 10, &val); > if (rv < 0) > return rv; > > * parse_integer() writes result only if there were no errors, at least > one digit has to be consumed, > > * parse_integer doesn't mix error reporting channel and value channel, > it does mix error and number-of-character-consumed channel, though. > > * parse_integer() reports number of characters consumed, makes parsing > multiple values easy: I agree that these are desirable properties, and the way it should be done. > There are several deficiencies in parse_integer() compared to strto*(): > * can't be used in initializers: > > const T x = strtoul(); > > * can't be used with bitfields, > * can't be used in expressions: > > x = strtol() * 1024; > x = y = strtol() * 1024; > x += strtol(); It's nice that you list these, but... > In defense of parse_integer() all I can say, is that using strtol() in > expression or initializer promotes no error checking and thus probably > should not be encouraged in C, ...agreed - I actually think it is a _good_ thing that the parse_integer interface makes it impossible to use in these cases. > --- a/include/linux/kernel.h > +++ b/include/linux/kernel.h > @@ -10,6 +10,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -387,8 +388,10 @@ static inline int __must_check kstrtos32_from_user(const char __user *s, size_t > return kstrtoint_from_user(s, count, base, res); > } > > -/* Obsolete, do not use. Use kstrto instead */ > - > +/* > + * Obsolete, do not use. > + * Use parse_integer(), kstrto*(), kstrto*_from_user(), sscanf(). > + */ > extern unsigned long simple_strtoul(const char *,char **,unsigned int); > extern long simple_strtol(const char *,char **,unsigned int); > extern unsigned long long simple_strtoull(const char *,char **,unsigned int); > --- /dev/null > +++ b/include/linux/parse-integer.h > @@ -0,0 +1,79 @@ > +#ifndef _PARSE_INTEGER_H > +#define _PARSE_INTEGER_H > +#include > +#include > + > +/* > + * int parse_integer(const char *s, unsigned int base, T *val); > + * > + * Convert integer string representation to an integer. > + * Range of accepted values equals to that of type T. > + * > + * Conversion to unsigned integer accepts sign "+". > + * Conversion to signed integer accepts sign "+" and sign "-". > + * > + * Radix 0 means autodetection: leading "0x" implies radix 16, > + * leading "0" implies radix 8, otherwise radix is 10. > + * Autodetection hint works after optional sign, but not before. > + * > + * Return number of characters parsed or -E. > + * > + * "T=char" case is not supported because -f{un,}signed-char can silently > + * change range of accepted values. > + */ > +#define parse_integer(s, base, val) \ > +({ \ > + const char *_s = (s); \ > + unsigned int _base = (base); \ > + typeof(val) _val = (val); \ > + \ > + __builtin_choose_expr( \ > + __builtin_types_compatible_p(typeof(_val), signed char *), \ > + _parse_integer_sc(_s, _base, (void *)_val), > \ Why the (void*) cast? Isn't _val supposed to have precisely the type expected by _parse_integer_sc at this point? > + __builtin_choose_expr( \ > + __builtin_types_compatible_p(typeof(_val), long *) && sizeof(long) == 4,\ > + _parse_integer_i(_s, _base, (void *)_val), \ > + __builtin_choose_expr( \ > + __builtin_types_compatible_p(typeof(_val), long *) && sizeof(long) == 8,\ > + _parse_integer_ll(_s, _base, (void *)_val), \ > + __builtin_choose_expr( \ > + __builtin_types_compatible_p(typeof(_val), unsigned long *) && sizeof(unsigned long) == 4,\ > + _parse_integer_u(_s, _base, (void *)_val), \ > + __builtin_choose_expr( \ > + __builtin_types_compatible_p(typeof(_val), unsigned long *) && sizeof(unsigned long) == 8,\ > + _parse_integer_ull(_s, _base, (void *)_val), \ Ah, I see. In these cases, one probably has to do a cast to pass a (long*) as either (int*) or (long long*) - but why not cast to the type actually expected by _parse_integer_* instead of the launder-anything (void*)? Another thing: It may be slightly confusing that this can't be used with an array passed as val. This won't work: long x[1]; rv = parse_integer(s, 0, x); One could argue that one should pass &x[0] instead, but since this is a macro, gcc doesn't really give a very helpful error (I just get "error: invalid initializer"). I think it can be fixed simply by declaring _val using typeof(&val[0]). > +int _parse_integer_ull(const char *s, unsigned int base, unsigned long long *val) > +{ > + int rv; > + > + if (*s == '-') { > + return -EINVAL; > + } else if (*s == '+') { > + rv = __parse_integer(s + 1, base, val); > + if (rv < 0) > + return rv; > + return rv + 1; > + } else > + return __parse_integer(s, base, val); > +} > +EXPORT_SYMBOL(_parse_integer_ull); > + > +int _parse_integer_ll(const char *s, unsigned int base, long long *val) > +{ > + unsigned long long tmp; > + int rv; > + > + if (*s == '-') { > + rv = __parse_integer(s + 1, base, &tmp); > + if (rv < 0) > + return rv; > + if ((long long)-tmp >= 0) > + return -ERANGE; Is there any reason to disallow "-0"? Rasmus