Subject: Re: [PATCH v5 1/1] random: getrandom(2): warn on large CRNG waits, introduce new flags
From: Andy Lutomirski
Date: Thu, 26 Sep 2019 14:39:44 -0700
To: "Ahmed S. Darwish", Linus Torvalds, "Theodore Y. Ts'o"
Cc: Florian Weimer, Willy Tarreau, Matthew Garrett, Lennart Poettering,
    "Eric W. Biederman", "Alexander E. Patrakov", Michael Kerrisk, lkml,
    linux-ext4, linux-api, linux-man
Message-ID: <9a9715dc-e30b-24fb-a754-464449cafb2f@kernel.org>
In-Reply-To: <20190926204425.GA2198@pc>

On 9/26/19 1:44 PM, Ahmed S. Darwish wrote:
> Since Linux v3.17, getrandom(2) has been available as a new and more
> secure interface for pseudorandom data requests. It attempted to
> solve three problems, as compared to /dev/urandom:
>
> 1. the need to access filesystem paths, which can fail, e.g. under a
>    chroot
>
> 2. the need to open a file descriptor, which can fail under file
>    descriptor exhaustion attacks
>
> 3. the possibility of getting not-so-random data from /dev/urandom,
>    due to an incompletely initialized kernel entropy pool
>
> To solve the third point, getrandom(2) was made to block until enough
> entropy has been accumulated to initialize the CRNG ChaCha20 cipher.
> This left the system call with no guaranteed upper bound on its
> initial waiting time.
>
> Thus when it was introduced in c6e9d6f38894 ("random: introduce
> getrandom(2) system call"), it came with a clear warning: "Any
> userspace program which uses this new functionality must take care to
> assure that if it is used during the boot process, that it will not
> cause the init scripts or other portions of the system startup to hang
> indefinitely."
>
> Unfortunately, due to multiple factors, including this warning not
> being written in scary-enough language in the manpages, and glibc
> since v2.25 implementing a BSD-like getentropy(3) in terms of
> getrandom(2), modern user space calls getrandom(2) in the boot path
> everywhere (e.g. Qt, GDM, etc.)
>
> Embedded Linux systems were hit by this first, and reports of embedded
> systems "getting stuck at boot" became common. Over time, the issue
> crept even into consumer-level x86 laptops: mainstream distributions,
> like Debian Buster, began to recommend installing haveged as a
> duct-tape workaround... just to let the system boot.
>
> Moreover, filesystem optimizations in EXT4 and XFS, e.g.
> b03755ad6f33 ("ext4: make __ext4_get_inode_loc plug"), which merged
> the directory lookup code's inode table IO, and very fast systemd
> boots, further exacerbated the problem by limiting interrupt-based
> entropy sources. This led to large delays until the kernel's
> cryptographic random number generator (CRNG) got initialized.
>
> On a ThinkPad E480 x86 laptop with an Arch Linux user space, the ext4
> commit mentioned earlier reliably blocked the system at GDM startup.
> Mitigate the problem, as a first step, in two ways:
>
> 1. Issue a big WARN_ON when any process gets stuck on getrandom(2)
>    for more than CONFIG_GETRANDOM_WAIT_THRESHOLD_SEC seconds.
>
> 2. Introduce new getrandom(2) flags, with clear semantics that can
>    hopefully guide user space in doing the right thing.
>
> Set CONFIG_GETRANDOM_WAIT_THRESHOLD_SEC to a heuristic 30-second
> default value. System integrators and distribution builders are
> strongly encouraged not to increase it much: during system boot, you
> either have entropy, or you don't. And if you didn't have entropy, it
> will stay like this forever, because if you had, you wouldn't have
> blocked in the first place. It's an atomic "either/or" situation, with
> no middle ground. Please think twice.

So what do we expect glibc's getentropy() to do? If it just adds the
new flag to shut up the warning, we haven't really accomplished much.
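For reference, a minimal userspace sketch of the two behaviours being contrasted, assuming glibc >= 2.25 and its <sys/random.h>. It assumes that glibc's getentropy(3) amounts to getrandom(2) with flags 0, which blocks until the CRNG is initialized. The existing GRND_NONBLOCK flag instead fails immediately with EAGAIN in that state, so a caller can detect a would-be hang without waiting. The helpers fill_random() and crng_ready() are hypothetical names. The new flags proposed by this patch are deliberately not used, since their names and semantics were still under review.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/random.h> /* getrandom(), GRND_NONBLOCK (glibc >= 2.25) */

/*
 * Hypothetical helper: read exactly len bytes from getrandom(2).
 * With flags == 0 this blocks until the kernel CRNG is initialized,
 * the behaviour getentropy(3) is assumed to build on. Retries on
 * EINTR and on short reads. Returns 0 on success, -1 with errno set.
 */
static int fill_random(void *buf, size_t len, unsigned int flags)
{
    unsigned char *p = buf;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = getrandom(p, len, flags);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal: try again */
            return -1;    /* e.g. EAGAIN under GRND_NONBLOCK, ENOSYS on old kernels */
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical probe: ask whether a blocking getrandom() would wait,
 * without actually waiting. Returns 1 if the CRNG is ready, 0 if it is
 * not yet initialized (EAGAIN), -1 on any other error. Consumes one
 * byte of random output.
 */
static int crng_ready(void)
{
    unsigned char byte;

    if (getrandom(&byte, 1, GRND_NONBLOCK) == 1)
        return 1;
    return errno == EAGAIN ? 0 : -1;
}
```

A boot-time caller could check crng_ready() first and log before falling back to the blocking fill_random(buf, len, 0). Like the proposed WARN_ON, this only reports the wait; it does nothing to make entropy available sooner.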