From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from list by lists.gnu.org with archive (Exim 4.90_1) id 1j4DHr-0004PC-6n for mharc-grub-devel@gnu.org; Tue, 18 Feb 2020 19:32:23 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:52582) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1j4DHo-0004P4-JK for grub-devel@gnu.org; Tue, 18 Feb 2020 19:32:22 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1j4DHm-0004jS-8C for grub-devel@gnu.org; Tue, 18 Feb 2020 19:32:19 -0500 Received: from us-smtp-1.mimecast.com ([207.211.31.81]:57515 helo=us-smtp-delivery-1.mimecast.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1j4DHl-0004iM-TT for grub-devel@gnu.org; Tue, 18 Feb 2020 19:32:18 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1582072336; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=xC4Z+N1gTeJsmsv7hYcP8/TaQ087iPvF0nxuU/tvSMU=; b=dejtQZ27gw3z56SgFEItvHnzbVaJGY4LlcHRB20ZgkztsrAKx0wo6lDxApVi3FUGex8puY ORg28NeC3nlNW9gEJ6Dc4yGtd4Xivtcu/JvQWHk+6f9CtYXPbN6f5szOvsEZC0eeDTtY8m 7nJQNU344iCMHCHgGHK/OX29QsTQl6w= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-375-P8dwkYtAMNmHVy20-fGLPg-1; Tue, 18 Feb 2020 19:32:14 -0500 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id A8D79107ACCC for ; Wed, 19 Feb 2020 00:32:13 +0000 (UTC) Received: from redhat.com (dhcp-10-20-1-15.bss.redhat.com [10.20.1.15]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 61149100194E for ; Wed, 19 Feb 2020 00:32:13 +0000 (UTC) Date: Tue, 18 Feb 2020 19:32:11 -0500 From: Peter Jones To: The development of GNU GRUB Subject: Re: [PATCH] Make grub_strtoul() "end" pointer have the right const params. Message-ID: <20200219003211.lfwfmcjce2hpndzb@redhat.com> References: <20200204210206.1256169-1-pjones@redhat.com> <2f461a03-8bbd-e3b7-0bdf-390beb74d73b@gmail.com> MIME-Version: 1.0 In-Reply-To: <2f461a03-8bbd-e3b7-0bdf-390beb74d73b@gmail.com> X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-MC-Unique: P8dwkYtAMNmHVy20-fGLPg-1 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Content-Disposition: inline X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 207.211.31.81 X-BeenThere: grub-devel@gnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: The development of GNU GRUB List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Feb 2020 00:32:22 -0000 On Tue, Feb 04, 2020 at 08:04:30PM -0500, Nicholas Vinson wrote: > On 2/4/20 16:02, Peter Jones wrote: > > grub_strtoul() and grub_strtoull() don't make the /pointer/ to "end" be > > const like normal implementations do, and as a result, at many places i= n >=20 > grub_strtoul() and grub_strtoull() appear to be patterned after the C > standard functions strtoul() and strtoull(). >=20 > According to C18, the signatures for those functions are: >=20 > unsigned long int strtoul(const char * restrict nptr, > char ** restrict endptr, int base); >=20 > unsigned long long int strtoull(const char * restrict nptr, > char ** restrict endptr, int base); It's true that the standard does not mandate any const usage on the endptr parameter, and it does have restrict. I'm happy to add restrict, though I'm going to ignore it for the rest of this email, because the standard literally says it changes no semantics. Having "const" on both th= e object pointed to by **endptr and on the outer pointer is still compatible with the standard, though - nothing that's allowed by the standard isn't allowed with those qualifiers. Semantically, the behavior is described in a way that is compatible with making those const as well: the value written to *endptr must point inside the object pointed to by nptr, and neither function is allowed to modify that object. Adding the const requirement does give us benefits though: - It means when someone is modifying grub_strtoul() or grub_strtoull(), as inevitably happens every once in a while, making an error which would violate those requirements will cause an error building that change, rather than making it in to the tree. That is, "const char **" protects us from making an error that writes modifies the object nptr points at, and "char ** const" protects us from accidentally changing the local pointer instead of the caller's pointer. - Likewise, if someone isn't thinking through what they're doing and takes the const out in order to "fix" some perceived problem, that should alert whomever is reviewing the code. In both of these regards, having the const qualifiers helps us trust that our implementation is correct. - It lets us pass in pointers that we *want* to be const from the caller's point of view. It means that if a function calling strtoul() has a "const char *" argument, it can safely pass that as nptr as well as passing its address as endptr; without making the object const (i.e at least "const char ** endptr"), that'll get you a warning about discarding the type qualifier. That applies equally to cases that want to wrap these functions with another function that has similar semantics; having const qualifiers allows that function to declare those things const or not. Not having the qualifiers here means the calling function also can't, which is where we wind up having a lot of the ugly code. - The same applies to cases where the object is (or should be) genuinely immutable - for instance, data that's in .rodata, in data provided by an option ROM, or firmware configuration table. It is better to be able to keep the type qualifier throughout the whole stack. - It allows the optimizer to make assumptions about how we're using it which may allow it to generate more efficient output without having to analyze the caller. > Therefore, I recommend the GRUB maintainers keep the existing signatures = for > grub_strtoul() and grub_strtoull() as they more closely follow the > signatures of the standard functions. Obviously, I don't agree. > I also skimmed through the rest of the changes and it seems to me most of > the misuse of these calls could be solved simply by doing some variation = of: >=20 > const char *ptr =3D ...; > char *end; >=20 > strtoul(ptr, &end, 10); > ptr =3D end; While this is valid in many of the existing cases, it's not as good in several ways. In addition to the last three points I made above, this also creates an unnecessary alias into the character object, which the compiler then has to track. In this case, since it is both a qualified type of the same type of object and a character object, it doesn't violate the aliasing rules ("-fstrict-aliasing -Wstrict-aliasing" isn't allowed to complain), but it can cause the compiler to both take longer and generate more, slower code than if that were: const char *ptr =3D ...; unsigned long val; val =3D strtoul(ptr, &ptr, 10); That's true for both the caller and in the implementation of strtoul() itself - though I don't think with current GCC it *will* cause the strtoul() output to be much different, because it's pretty simple, but for a more complex function like sscanf() calling it, the chances are a bit rougher. Hypothetically, it can also reduce the chances that it winds up getting inlined if we were to use -flto. > which is what I would personally recommend instead. If the compiler > issues an error or warning with the above, then that's a bug the > compiler maintainers should fix. That's true, but it puts limits on the caller's usage and the compiler's output. --=20 Peter