From mboxrd@z Thu Jan 1 00:00:00 1970
References: <20200317135035.GA19442@SDF.ORG>
 <202003171435.41F7F0DF9@keescook>
 <20200317230612.GB19442@SDF.ORG>
 <202003171619.23210A7E0@keescook>
 <20200318014410.GA2281@SDF.ORG>
 <20200318203914.GA16083@SDF.ORG>
 <20200319120522.GA1484@SDF.ORG>
In-Reply-To: <20200319120522.GA1484@SDF.ORG>
From: Alexander Duyck
Date: Thu, 19 Mar 2020 10:49:42 -0700
Subject: Re: [PATCH v4] mm/shuffle.c: Fix races in add_to_free_area_random()
To: George Spelvin
Cc: Kees Cook, Dan Williams, linux-mm, Andrew Morton, Randy Dunlap

On Thu, Mar 19, 2020 at 5:05 AM George Spelvin wrote:
>
> The separate "rand" and "rand_count" variables could get out of
> sync with bad results.
>
> In the worst case, two threads would see rand_count=1 and both
> decrement it, resulting in rand_count=255 and rand being filled
> with zeros for the next 255 calls.
>
> Instead, pack them both into a single, atomically updatable,
> variable.  This makes it a lot easier to reason about race
> conditions.  They are still there - the code deliberately eschews
> locking - but basically harmless on the rare occasions that
> they happen.
>
> Second, use READ_ONCE and WRITE_ONCE.  Because the random bit
> buffer is accessed by multiple threads concurrently without
> locking, omitting those puts us deep in the land of nasal demons.
> The compiler would be free to spill to the static variable in
> arbitrarily perverse ways and create hard-to-find bugs.
>
> (I'm torn between this and just declaring the buffer "volatile".
> Linux tends to prefer marking accesses rather than variables, but
> in this case every access to the buffer is volatile.  It makes no
> difference to the generated code.)
>
> Third, use long rather than u64.  This not only keeps the state
> atomically updatable, it also speeds up the fast path on 32-bit
> machines.
> Saving at least three instructions on the fast path (one load,
> one add-with-carry, and one store) is worth a second call to
> get_random_u*() per 64 bits.  The fast path of get_random_u*()
> costs less than the 3*64 = 192 instructions saved, and the slow
> path happens every 64 bytes, so it isn't affected by the change.
>
> Fourth, make the function inline.  It's small, and there's only
> one caller (in mm/page_alloc.c:__free_one_page()), so avoid the
> function call overhead.
>
> Fifth, use the msbits of the buffer first (left shift) rather
> than the lsbits (right shift).  Testing the sign bit produces
> slightly smaller/faster code than testing the lsbit.
>
> I've tried shifting both ways, and copying the desired bit to a
> boolean before shifting rather than keeping separate full-width
> r and rshift variables, but both produce larger code:
>
>			x86-64 text size
>	Msbit		42236
>	Lsbit		42242 (+6)
>	Lsbit+bool	42258 (+22)
>	Msbit+bool	42284 (+52)
>
> (Since this is straight-line code, size is a good proxy for number
> of instructions and execution time.  Using READ/WRITE_ONCE instead
> of volatile makes no difference.)
>
> In a perfect world, on x86-64 the fast path would be:
>
>	shlq	rand(%rip)
>	jz	refill
> refill_complete:
>	jc	add_to_tail
>
> but I don't see how to get gcc to generate that, and this
> function isn't worth an arch-specific implementation.
>
> Signed-off-by: George Spelvin
> Acked-by: Kees Cook
> Acked-by: Dan Williams
> Cc: Alexander Duyck
> Cc: Randy Dunlap
> Cc: Andrew Morton
> Cc: linux-mm@kvack.org

Acked-by: Alexander Duyck

> ---
> v2: Rewrote commit message to explain existing races better.
>     Made local variables unsigned to avoid (technically undefined)
>     signed overflow.
> v3: Typos fixed, Acked-by, expanded commit message.
> v4: Rebase against -next; function has changed from
>     add_to_free_area_random() to shuffle_pick_tail().  Move to
>     inline function in shuffle.h.
>     Not sure if it's okay to keep Acked-by: after such a
>     significant change.
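As a side note, the wraparound failure mode described at the top of the commit message is easy to demonstrate in plain userspace C.  This is a minimal sketch, not kernel code; `racing_decrement` is a made-up helper that just models what each racing thread does to the shared counter:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of the race in the old code: two threads both observe
 * rand_count == 1 and both decrement it.  Because the counter is a
 * u8, the second decrement wraps 0 around to 255, so the next 255
 * calls would consume "random" bits from an already-empty buffer.
 */
static uint8_t racing_decrement(uint8_t rand_count)
{
	return (uint8_t)(rand_count - 1);	/* u8 arithmetic wraps mod 256 */
}
```

Packing the count into the same word as the bits (as the patch does) removes this window, since a single store updates both.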
>
>  mm/shuffle.c | 23 -----------------------
>  mm/shuffle.h | 26 +++++++++++++++++++++++++-
>  2 files changed, 25 insertions(+), 24 deletions(-)
>
> diff --git a/mm/shuffle.c b/mm/shuffle.c
> index 44406d9977c7..ea281d5e1f23 100644
> --- a/mm/shuffle.c
> +++ b/mm/shuffle.c
> @@ -182,26 +182,3 @@ void __meminit __shuffle_free_memory(pg_data_t *pgdat)
>  	for (z = pgdat->node_zones; z < pgdat->node_zones + MAX_NR_ZONES; z++)
>  		shuffle_zone(z);
>  }
> -
> -bool shuffle_pick_tail(void)
> -{
> -	static u64 rand;
> -	static u8 rand_bits;
> -	bool ret;
> -
> -	/*
> -	 * The lack of locking is deliberate. If 2 threads race to
> -	 * update the rand state it just adds to the entropy.
> -	 */
> -	if (rand_bits == 0) {
> -		rand_bits = 64;
> -		rand = get_random_u64();
> -	}
> -
> -	ret = rand & 1;
> -
> -	rand_bits--;
> -	rand >>= 1;
> -
> -	return ret;
> -}
> diff --git a/mm/shuffle.h b/mm/shuffle.h
> index 4d79f03b6658..fb79e05cd86d 100644
> --- a/mm/shuffle.h
> +++ b/mm/shuffle.h
> @@ -22,7 +22,31 @@ enum mm_shuffle_ctl {
>  DECLARE_STATIC_KEY_FALSE(page_alloc_shuffle_key);
>  extern void page_alloc_shuffle(enum mm_shuffle_ctl ctl);
>  extern void __shuffle_free_memory(pg_data_t *pgdat);
> -extern bool shuffle_pick_tail(void);
> +static inline bool shuffle_pick_tail(void)
> +{
> +	static unsigned long rand;	/* buffered random bits */
> +	unsigned long r = READ_ONCE(rand), rshift = r << 1;
> +
> +	/*
> +	 * rand holds 0..BITS_PER_LONG-1 random msbits, followed by a
> +	 * 1 bit, then zero-padding in the lsbits.  This allows us to
> +	 * maintain the pre-generated bits and the count of bits in a
> +	 * single, atomically updatable, variable.
> +	 *
> +	 * The lack of locking is deliberate.  If two threads race to
> +	 * update the rand state it just adds to the entropy.  The
> +	 * worst that can happen is a random bit is used twice, or
> +	 * get_random_long is called redundantly.
> +	 */
> +	if (unlikely(rshift == 0)) {
> +		r = get_random_long();
> +		rshift = r << 1 | 1;
> +	}
> +	WRITE_ONCE(rand, rshift);
> +
> +	return (long)r < 0;
> +}
> +
>  static inline void shuffle_free_memory(pg_data_t *pgdat)
>  {
>  	if (!static_branch_unlikely(&page_alloc_shuffle_key))
>
> base-commit: 47780d7892b77e922bbe19b5dea99cde06b2f0e5
> --
> 2.26.0.rc2
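For anyone who wants to poke at the packed bit-buffer encoding outside the kernel tree, here is a minimal userspace model of the patched shuffle_pick_tail().  It uses fixed-width uint64_t in place of unsigned long, drops READ_ONCE/WRITE_ONCE (single-threaded demo), and `fake_get_random_long` is a made-up deterministic stand-in for get_random_long(), so this is illustrative only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Deterministic stand-in for get_random_long() so the sketch is
 * reproducible; only the msbit and the lsbit of the word are set.
 */
static uint64_t fake_get_random_long(void)
{
	return 0x8000000000000001ULL;
}

/*
 * Unconsumed random bits live in the msbits of rand_buf, a sentinel
 * 1 bit sits just below them, and the lsbits are zero padding.  The
 * buffer is empty exactly when shifting left by one yields zero,
 * i.e. only the sentinel bit remained.
 */
static uint64_t rand_buf;	/* 0 == empty buffer */

static bool pick_tail(void)
{
	uint64_t r = rand_buf, rshift = r << 1;

	if (rshift == 0) {		/* buffer exhausted: refill */
		r = fake_get_random_long();
		rshift = r << 1 | 1;	/* shift out one bit, plant sentinel */
	}
	rand_buf = rshift;

	return (int64_t)r < 0;		/* the msbit just consumed */
}
```

With this seed, the model hands back bit 63, then bits 62..0, then refills: 64 random bits per refill, with the bit count carried implicitly by the sentinel's position.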