From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 110A2C433F5 for ; Fri, 10 Dec 2021 10:23:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235195AbhLJK1d (ORCPT ); Fri, 10 Dec 2021 05:27:33 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55460 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234963AbhLJK1c (ORCPT ); Fri, 10 Dec 2021 05:27:32 -0500 Received: from mail-ed1-x534.google.com (mail-ed1-x534.google.com [IPv6:2a00:1450:4864:20::534]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9B119C0617A1 for ; Fri, 10 Dec 2021 02:23:57 -0800 (PST) Received: by mail-ed1-x534.google.com with SMTP id x15so28977556edv.1 for ; Fri, 10 Dec 2021 02:23:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:references:user-agent:in-reply-to :message-id:mime-version:content-transfer-encoding; bh=p7+3c7MAncaZnaNLD21trmpGATS0DBjdWzgtLWA9wXY=; b=N8nIAThJHPoiJCuuCc9kxADxv/W3nkogi+5bztwggxFbQydvjhwnMclfRg0jH7eHEP 6DPyRat2+KUszvk5w8EEhTkyb+FpKq48LHEeXAV1yTgjyGNf0RjIv23XzrX8v7LXrr6J nm/QeQnIaUP9KqYODGeyTj9t9fckjN5NiWGfz0dAD5yAzbtdlaJcYsKC8xc5kCwW5ZR4 m+J2030wpUldY4v2F7GcZn698nUl6waPhG585U/zdcR4SOioOLMpHGAXttCL/Dj3zZHM 8pxWMklfMYeFavJOeoTgc6/lgClmLQspEuCxYnVDYEtP6GS5TTNjG8HSm0xmt0Hd+LAQ yLdw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:references:user-agent :in-reply-to:message-id:mime-version:content-transfer-encoding; bh=p7+3c7MAncaZnaNLD21trmpGATS0DBjdWzgtLWA9wXY=; b=kuU7KVBD++kNjo4i/5Hsd00xEWFaJlqkUabAt+q9sOqteT1e3n3VLSsdZMjlJ0b/Sa LmL6P2w8bgBv7FE6johbngFMiU8PE2HAXLmzXpO1uTU7idlxiVdDEU6JA6ucJm0xjq1L ihP/o4OvpT1IDsh4lxFGI2MoKrPJKNuHRS05Px+7xnjGj6if9ijK6v8nzh6gj6znY4E3 QmRwHH6aiXgVG8H62vUv7n0HQREmf+rCxXufhRNtXEOG9eJG+lNmhtgDilToEnjk60K+ NwBybyQ5NRvQbp368TUJu7t2goiD7AtSBfgV8Iv06E4o2ruz5PkY6QfL4xpvu47gTDiI odlA== X-Gm-Message-State: AOAM533PNRm3OkBPIZ0voV396VkODktZi5XV6lKlDYS4rf3hLDSAvTnj tvkyhxSDMHZLwprbJ1Jj5MI= X-Google-Smtp-Source: ABdhPJxAnIdExl2h1TuuQprTgP1RAamkRw60OM83W9DJvmKYWekJWAGtR0yCyZ09AOa0JJmlaLov+w== X-Received: by 2002:a05:6402:3cd:: with SMTP id t13mr37261838edw.97.1639131836012; Fri, 10 Dec 2021 02:23:56 -0800 (PST) Received: from gmgdl (j120189.upc-j.chello.nl. [24.132.120.189]) by smtp.gmail.com with ESMTPSA id hs8sm1254743ejc.53.2021.12.10.02.23.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 10 Dec 2021 02:23:55 -0800 (PST) Received: from avar by gmgdl with local (Exim 4.95) (envelope-from ) id 1mvd4E-000Giv-JY; Fri, 10 Dec 2021 11:23:54 +0100 From: =?utf-8?B?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason To: Jeff King Cc: git@vger.kernel.org, Junio C Hamano , Johannes Schindelin , Erik Faye-Lund , Jonathan Nieder Subject: Re: [RFC PATCH 02/10] range-diff.c: don't use st_mult() for signed "int" Date: Fri, 10 Dec 2021 11:22:59 +0100 References: User-agent: Debian GNU/Linux bookworm/sid; Emacs 27.1; mu4e 1.6.10 In-reply-to: Message-ID: <211210.86lf0sdah1.gmgdl@evledraar.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On Thu, Dec 09 2021, Jeff King wrote: > On Thu, Dec 09, 2021 at 08:19:19PM +0100, =C3=86var Arnfj=C3=B6r=C3=B0 Bj= armason wrote: > >> As documented in 320d0b493a2 (add helpers for detecting size_t >> overflow, 2016-02-19) the arguments to st_mult() and st_add() "must be >> unsigned". > > This isn't correct. The comment that says "must be unsigned" is for > unsigned_mult_overflows(). Which indeed would not work correctly for a > signed type. But st_add() is a separate function (and not a macro) > because that means its arguments will always be promoted or converted > into a size_t. And so no matter what you pass it, it will always itself > pass a size_t into unsigned_mult_overflows(). > >> In subsequent commits further overflows resulting in segfaults will be >> fixed in this code, but let's start by removing this supposed guard >> that does nothing except give us a false sense of >> security. E.g. providing an "n" of INT_MAX here will result in "1" on >> my system, causing us to write into memory. > > This guard doesn't do nothing. Your patch here: > >> @@ -312,7 +312,7 @@ static void get_correspondences(struct string_list *= a, struct string_list *b, >> int *cost, c, *a2b, *b2a; >> int i, j; >>=20=20 >> - ALLOC_ARRAY(cost, st_mult(n, n)); >> + ALLOC_ARRAY(cost, n * n); >> ALLOC_ARRAY(a2b, n); >> ALLOC_ARRAY(b2a, n); > > makes things strictly worse, because that "n * n" may overflow and cause > us to under-allocate the array. You can see it in isolation with some > extra code like this: > > diff --git a/git.c b/git.c > index 5ff21be21f..63349e4b79 100644 > --- a/git.c > +++ b/git.c > @@ -850,11 +850,23 @@ static int run_argv(int *argcp, const char ***argv) > return done_alias; > } >=20=20 > +static void foo(void) > +{ > + int n =3D 2 << 16; > + > + printf("n =3D %d\n", n); > + printf("raw mult =3D %"PRIuMAX"\n", (uintmax_t)(n * n)); > + printf("st_mult =3D %"PRIuMAX"\n", (uintmax_t)st_mult(n, n)); > + exit(0); > +} > + > int cmd_main(int argc, const char **argv) > { > const char *cmd; > int done_help =3D 0; >=20=20 > + foo(); > + > cmd =3D argv[0]; > if (!cmd) > cmd =3D "git-help"; > > With st_mult, we get the correct answer of 16GB. Without it, we end up > with 0! > > Back to the original code, if you generate a test setup like this: > > diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh > index e30bc48a29..f552d3086e 100755 > --- a/t/t3206-range-diff.sh > +++ b/t/t3206-range-diff.sh > @@ -772,4 +772,11 @@ test_expect_success '--left-only/--right-only' ' > test_cmp expect actual > ' >=20=20 > +test_expect_success 'giant case' ' > + test_commit base && > + test_commit_bulk --ref=3Drefs/heads/v1 --message=3D"v1 commit %s" 32768= && > + test_commit_bulk --ref=3Drefs/heads/v2 --message=3D"v2 commit %s" 32768= && > + git range-diff base v1 v2 > +' > + > test_done > > Then we'd allocate a 0-length array for "cost" and segfault as soon as > we try to put anything in it. You can confirm this by applying your > patch, running under gdb, and stopping at the xmalloc() call inside > get_correspondences(). With st_mult(), then we come up with the correct > value of 16GB (or complain about overflow on a 32-bit system). > > So st_add() is working as advertised here; it's goal is just to make > sure we never under-allocate. You are right, though, that the code after > that in get_correspondences() has trouble because of the signedness. If > the code used an "unsigned int", it would still be _wrong_. But when it > overflowed, it would always come up with a value under 4GB. So it might > produce a wrong answer, but it couldn't possibly point outside the > bounds of the array and cause a security problem. > > But because it's a signed int, we overflow to a negative value and try > to look at memory far before the start of the array. So I can see how it > is tempting to argue that this st_mult() isn't really helping since we > segfault anyway. But I'd disagree. By correctly computing the value of > 16GB instead of "0", we know that the ALLOC_ARRAY() line is doing the > right thing. And it may choose not to continue if it can't allocate > 16GB. That may happen because you don't have the RAM, but also because > of GIT_ALLOC_LIMIT. > > So if you set GIT_ALLOC_LIMIT=3D4g, for example, you are immune to the bug > here. We'd refuse the allocation and die there, which protects > downstream code from trying to fill in the array with bogus arithmetic. > > Dropping the st_mult() does nothing to fix the actual problem (which is > that this function should use a more appropriate type), but introduces > new failure modes. Yes you're entirely right. I had some stupid blinders on while writing this. FWIW I think I was experimenting with some local macros and conflated a testing of the overflow of n*n in gdb with the caste'd version, which you rightly point out here won't have the overflow issue at all. Sorry.