From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752094AbbEDQoy (ORCPT ); Mon, 4 May 2015 12:44:54 -0400 Received: from mail-la0-f49.google.com ([209.85.215.49]:32951 "EHLO mail-la0-f49.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750779AbbEDQop (ORCPT ); Mon, 4 May 2015 12:44:45 -0400 From: Rasmus Villemoes To: Alexey Dobriyan Cc: Andrew Morton , Linux Kernel Subject: Re: [PATCH 01/10] Add parse_integer() (replacement for simple_strto*()) Organization: D03 References: <20150502004714.GA21655@p183.telecom.by> <87h9rsiigj.fsf@rasmusvillemoes.dk> X-Hashcash: 1:20:150504:akpm@linux-foundation.org::HCAYWJHWL127Urdd:0000000000000000000000000000000000000NyH X-Hashcash: 1:20:150504:adobriyan@gmail.com::fkkyV9nfePbmDibK:0000000000000000000000000000000000000000000yEZ X-Hashcash: 1:20:150504:linux-kernel@vger.kernel.org::YoYJxa+JdBMVAr4C:0000000000000000000000000000000004TMd Date: Mon, 04 May 2015 18:44:42 +0200 In-Reply-To: (Alexey Dobriyan's message of "Mon, 4 May 2015 17:32:50 +0300") Message-ID: <87y4l4gumd.fsf@rasmusvillemoes.dk> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org [I'm merging the subthreads below] On Mon, May 04 2015, Alexey Dobriyan wrote: > On Mon, May 4, 2015 at 4:24 PM, Rasmus Villemoes > wrote: >> >> First: Could you tell me what tree I can commit this on top of, to see >> what gcc makes of it. > > Any recent kernel should be OK, code it quite self contained. > I've just applied first two patches on top 4.1-rc2. > BTW the correct order is > > 1) [PATCH 01/10] Add parse_integer() (replacement for simple_strto*()) > *** 2) [PATCH CORRECT 03/10] parse_integer: convert sscanf() > 3) [PATCH 03/10] parse_integer: convert sscanf() > 4) [PATCH 04/10] sscanf: fix overflow > ... > 10) [PATCH 10/10] ext2, ext3, ext4: convert to parse_integer()/kstrto*() > > I've copied patch #2 twice, so it won't apply and resent it > with subject from patch #3 to confuse everyone even more. You certainly successfully confused me :-) >>> +#define parse_integer(s, base, val) \ >>> +({ \ >>> + const char *_s = (s); \ >>> + unsigned int _base = (base); \ >>> + typeof(val) _val = (val); \ >>> + \ >>> + __builtin_choose_expr( \ >>> + __builtin_types_compatible_p(typeof(_val), signed char *), \ >>> + _parse_integer_sc(_s, _base, (void *)_val), >>> \ >> >> Why the (void*) cast? Isn't _val supposed to have precisely the type >> expected by _parse_integer_sc at this point? >> >>> + __builtin_choose_expr( \ >>> + __builtin_types_compatible_p(typeof(_val), long *) && sizeof(long) == 4,\ >>> + _parse_integer_i(_s, _base, (void *)_val), \ >>> + __builtin_choose_expr( \ >>> + __builtin_types_compatible_p(typeof(_val), long *) && sizeof(long) == 8,\ >>> + _parse_integer_ll(_s, _base, (void *)_val), \ >>> + __builtin_choose_expr( \ >>> + __builtin_types_compatible_p(typeof(_val), unsigned long *) && sizeof(unsigned long) == 4,\ >>> + _parse_integer_u(_s, _base, (void *)_val), \ >>> + __builtin_choose_expr( \ >>> + __builtin_types_compatible_p(typeof(_val), unsigned long *) && sizeof(unsigned long) == 8,\ >>> + _parse_integer_ull(_s, _base, (void *)_val), \ >> >> Ah, I see. In these cases, one probably has to do a cast to pass a >> (long*) as either (int*) or (long long*) - but why not cast to the type >> actually expected by _parse_integer_* instead of the launder-anything (void*)? > > First macro was written without casts at all naively thinking > that gcc will only typecheck in chosen __builtin_choose_expr branch. > But it doesn't do that, remove casts and observe million of warnings. > So I shut it up with "void *". Branch is chosen base on __b_t_c_p > expression and I don't think it is possible to sneak in incorrect pointer. I see. I was worried about something like the cast laundering away a const qualifier, but (const int*) is not _b_t_c with (int*). So ok for now - I'll play around with this a little and see if one can get rid of the casts anyway. >> >> Is there any reason to disallow "-0"? > > No! -0 is not accepted because code is copied from kstrtoll() > which doesn't accept "-0". It is even in the testsuite: > > static void __init test_kstrtoll_fail(void) > { > ... > /* negative zero isn't an integer in Linux */ > {"-0", 0}, > {"-0", 8}, > {"-0", 10}, > {"-0", 16}, > > Frankly I don't even remember why it does that, and > no one noticed until now. libc functions accept "-0". I think it's odd to accept "+0" but not "-0", but that's probably just because I'm a mathematician. Am I right that you just added these test cases because of the existing behaviour of kstrtoll? I suppose that behaviour is just a historical accident. If "-0" is not going to be accepted, I think that deserves a comment (with rationale) in the parsing code and not hidden away in the test suite. >>> unsigned long long memparse(const char *ptr, char **retptr) >>> { >>> - char *endptr; /* local pointer to end of parsed string */ >>> + unsigned long long val; >>> >>> - unsigned long long ret = simple_strtoull(ptr, &endptr, 0); >>> - >>> - switch (*endptr) { >>> + ptr += parse_integer(ptr, 0, &val); >> >> This seems wrong. simple_strtoull used to "sanitize" the return value >> from the (old) _parse_integer, so that endptr still points into the >> given string. Unconditionally adding the result from parse_integer may >> make ptr point far before the actual string, into who-knows-what. > > When converting I tried to preserve the amount of error checking done. > simple_strtoull() either > a) return 0 and not advance pointer, or > b) return something and advance pointer. > Are we talking about the same simple_strtoull? I see cp = _parse_integer_fixup_radix(cp, &base); rv = _parse_integer(cp, base, &result); /* FIXME */ cp += (rv & ~KSTRTOX_OVERFLOW); so cp is definitely advanced even in case of overflow. And in the case of "underflow" (no digits found), the old code does initialize *result to 0, while parse_integer by design doesn't write anything. > Current code just ignores error case, so do I. There's a difference between ignoring an error (which the current code does), and ignoring _the possibility_ of an error (which the new code does). There are lots of callers of memparse(), and I don't think any of them are prepared to handle *endp ending up pointing before the passed-in string (-EINVAL == -22, -ERANGE == -34). I can easily see how that could lead to an infinite loop, maybe worse. [on the use of anonymous variables to get right type] >> I suggest using the more obvious approach of declaring a union on the >> stack and pass the address of the appropriate member. > > Union will work. > > No that it matters, it's a filesystem option parsing code > which is executed rarely near the top of sys_mount(), > not exactly perfomance bottleneck. > > It's a shame to lose such clever hack :-( I'm sure you can find somewhere else to apply the hack :-) > Just grep for simple_strto* and see how much error checking people do. > The aim of the patchset is to convert code, error checking is separate > story. I think that's the wrong approach. One reason for adding this new interface is precisely to facilitate error checking. So if you do some example conversions, please do them _properly_ to show how error checking is done. As you say, it's easy to find places where error checking isn't done by just grepping for simple_strto*. I don't want to have to add parse_integer to that list. Abusing a new interface from the beginning isn't really a recipe for happiness. Converting uses of simple_strto*() to parse_integer() doesn't add value if semantics of calling code is exactly the same. And it adds negative value if the conversions change the semantics in a surprising way, as in the memparse case above :-( Rasmus