From: Evan Green
Date: Tue, 27 Jul 2021 09:31:33 -0700
Subject: Re: [PATCH v4] mm: Enable suspend-only swap spaces
To: David Hildenbrand
Cc: Andrew Morton, Michal Hocko, Pavel Machek, linux-api@vger.kernel.org, Alex Shi, Alistair Popple, Johannes Weiner, Joonsoo Kim, "Matthew Wilcox (Oracle)", Miaohe Lin, Minchan Kim, Suren Baghdasaryan, Vlastimil Babka, LKML, linux-mm@kvack.org
References: <20210726171106.v4.1.I09866d90c6de14f21223a03e9e6a31f8a02ecbaf@changeid> <6ff28cfe-1107-347b-0327-ad36e256141b@redhat.com>
In-Reply-To: <6ff28cfe-1107-347b-0327-ad36e256141b@redhat.com>

On Tue, Jul 27, 2021 at 5:21 AM David Hildenbrand wrote:
>
> On 27.07.21 11:48, David Hildenbrand wrote:
> > On 27.07.21 02:12, Evan Green wrote:
> >> Add a new SWAP_FLAG_HIBERNATE_ONLY that adds a swap region but refuses
> >> to allow generic swapping to it.
> >> This region can still be wired up for
> >> use in suspend-to-disk activities, but will never have regular pages
> >> swapped to it. This flag will be passed in by utilities like swapon(8),
> >> usage would probably look something like: swapon -o hibernate /dev/sda2.
> >>
> >> Currently it's not possible to enable hibernation without also enabling
> >> generic swap for a given area. One semi-workaround for this is to delay
> >> the call to swapon() until just before attempting to hibernate, and then
> >> call swapoff() just after hibernate completes. This is somewhat kludgy,
> >> and also doesn't really work to keep swap out of the hibernate region.
> >> When hibernate begins, it starts by allocating a large chunk of memory
> >> for itself. This often ends up forcing a lot of data out into swap. By
> >> this time the hibernate region is eligible for generic swap, so swap
> >> ends up leaking into the hibernate region even with the workaround.
> >>
> >> There are a few reasons why usermode might want to be able to
> >> exclusively steer swap and hibernate. One reason relates to SSD wearing.
> >> Hibernate's endurance and speed requirements are different from swap.
> >> It may for instance be advantageous to keep hibernate in primary
> >> storage, but put swap in an SLC namespace. These namespaces are faster
> >> and have better endurance, but cost 3-4x in terms of capacity.
> >> Exclusively steering hibernate and swap enables system designers to
> >> accurately partition their storage without either wearing out their
> >> primary storage, or overprovisioning their fast swap area.
> >>
> >> Another reason to allow exclusive steering has to do with security.
> >> The requirements for designing systems with resilience against
> >> offline attacks are different between swap and hibernate. Swap
> >> effectively requires a dictionary of hashes, as pages can be added and
> >> removed arbitrarily, whereas hibernate only needs a single hash for the
> >> entire image.
> >> If you've set up block-level integrity for swap and
> >> image-level integrity for hibernate, then allowing swap blocks to
> >> possibly leak out to the hibernate region is problematic, since it
> >> creates swap pages not protected by any integrity.
> >>
> >> Swap regions with SWAP_FLAG_HIBERNATE_ONLY set will not appear in
> >> /proc/meminfo under SwapTotal and SwapFree, since they are not usable as
> >> general swap. These regions do still appear in /proc/swaps.
> >
> > Right, and they also don't account towards the memory overcommit
> > calculations.
> >
> > Thanks for extending the patch description!

No problem, thanks for all the brainwaves directed at this.

> >
> > [...]
> >
> >> +	if (swap_flags & SWAP_FLAG_HIBERNATE_ONLY) {
> >> +		if (IS_ENABLED(CONFIG_HIBERNATION)) {
> >> +			if (swap_flags & ~SWAP_HIBERNATE_ONLY_VALID_FLAGS)
> >> +				return -EINVAL;
> >> +
> >> +		} else {
> >> +			return -EINVAL;
> >> +		}
> >> +	}
> >
> > We could shorten this to
> >
> > if ((swap_flags & SWAP_FLAG_HIBERNATE_ONLY) &&
> >     (!IS_ENABLED(CONFIG_HIBERNATION) ||
> >      (swap_flags & ~SWAP_HIBERNATE_ONLY_VALID_FLAGS)))
> > 	return -EINVAL;
> >
> > or
> >
> > if (swap_flags & SWAP_FLAG_HIBERNATE_ONLY)
> > 	if (!IS_ENABLED(CONFIG_HIBERNATION) ||
> > 	    (swap_flags & ~SWAP_HIBERNATE_ONLY_VALID_FLAGS))
> > 		return -EINVAL;
> >
> >> +
> >>  	if (!capable(CAP_SYS_ADMIN))
> >>  		return -EPERM;
> >>
> >> @@ -3335,16 +3366,20 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
> >>  	if (swap_flags & SWAP_FLAG_PREFER)
> >>  		prio =
> >>  		  (swap_flags & SWAP_FLAG_PRIO_MASK) >> SWAP_FLAG_PRIO_SHIFT;
> >> +
> >> +	if (swap_flags & SWAP_FLAG_HIBERNATE_ONLY)
> >> +		p->flags |= SWP_HIBERNATE_ONLY;
> >>  	enable_swap_info(p, prio, swap_map, cluster_info, frontswap_map);
> >>
> >> -	pr_info("Adding %uk swap on %s. Priority:%d extents:%d across:%lluk %s%s%s%s%s\n",
> >> +	pr_info("Adding %uk swap on %s. Priority:%d extents:%d across:%lluk %s%s%s%s%s%s\n",
> >>  		p->pages<<(PAGE_SHIFT-10), name->name, p->prio,
> >>  		nr_extents, (unsigned long long)span<<(PAGE_SHIFT-10),
> >>  		(p->flags & SWP_SOLIDSTATE) ? "SS" : "",
> >>  		(p->flags & SWP_DISCARDABLE) ? "D" : "",
> >>  		(p->flags & SWP_AREA_DISCARD) ? "s" : "",
> >>  		(p->flags & SWP_PAGE_DISCARD) ? "c" : "",
> >> -		(frontswap_map) ? "FS" : "");
> >> +		(frontswap_map) ? "FS" : "",
> >> +		(p->flags & SWP_HIBERNATE_ONLY) ? "H" : "");
> >>
> >>  	mutex_unlock(&swapon_mutex);
> >>  	atomic_inc(&proc_poll_event);
> >>
> >
> > Looks like the cleanest alternative to me, as long as we don't want to
> > invent new interfaces.
> >
> > Acked-by: David Hildenbrand
> >
>
> Pavel just mentioned uswsusp, and I wonder if it would be a possible
> alternative to this patch.

I think you're right that it would be possible to isolate the hibernate
image with uswsusp if you avoid using the SNAPSHOT_*SWAP* ioctls. But
I'd expect performance to suffer noticeably, since every page would then
make a round trip out to usermode and back. I'd still very much use the
HIBERNATE_ONLY flag if it were accepted; I think there's value to it.

-Evan
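
For anyone wanting to convince themselves that the condensed flag check
suggested above behaves the same as the nested version in the patch, here is
a minimal userspace sketch, not kernel code. It rests on several assumptions:
the value given to SWAP_FLAG_HIBERNATE_ONLY and the contents of
SWAP_HIBERNATE_ONLY_VALID_FLAGS are illustrative guesses (neither definition
appears in this thread), the hibernation_enabled parameter is a hypothetical
stand-in for IS_ENABLED(CONFIG_HIBERNATION), and the helper names
check_nested, check_flat and assert_checks_agree are made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Illustrative values only; the patch's real definitions are not shown here. */
#define SWAP_FLAG_PREFER		0x8000
#define SWAP_FLAG_DISCARD		0x10000
#define SWAP_FLAG_HIBERNATE_ONLY	0x80000	/* assumed bit */
/* Assumed: no other flag may be combined with HIBERNATE_ONLY. */
#define SWAP_HIBERNATE_ONLY_VALID_FLAGS	(SWAP_FLAG_HIBERNATE_ONLY)

/* Nested form, as posted in the patch. */
static int check_nested(int swap_flags, bool hibernation_enabled)
{
	if (swap_flags & SWAP_FLAG_HIBERNATE_ONLY) {
		if (hibernation_enabled) {
			if (swap_flags & ~SWAP_HIBERNATE_ONLY_VALID_FLAGS)
				return -EINVAL;
		} else {
			return -EINVAL;
		}
	}
	return 0;
}

/* Condensed single-condition form from the review. */
static int check_flat(int swap_flags, bool hibernation_enabled)
{
	if ((swap_flags & SWAP_FLAG_HIBERNATE_ONLY) &&
	    (!hibernation_enabled ||
	     (swap_flags & ~SWAP_HIBERNATE_ONLY_VALID_FLAGS)))
		return -EINVAL;
	return 0;
}

/* Compare both forms over every subset of the sample flags. */
static void assert_checks_agree(void)
{
	static const int bits[] = {
		SWAP_FLAG_PREFER, SWAP_FLAG_DISCARD, SWAP_FLAG_HIBERNATE_ONLY,
	};
	const unsigned int nbits = sizeof(bits) / sizeof(bits[0]);

	for (unsigned int mask = 0; mask < (1u << nbits); mask++) {
		int flags = 0;

		for (unsigned int i = 0; i < nbits; i++)
			if (mask & (1u << i))
				flags |= bits[i];

		for (int enabled = 0; enabled <= 1; enabled++)
			assert(check_nested(flags, enabled) ==
			       check_flat(flags, enabled));
	}
}
```

Calling assert_checks_agree() from a small test main() walks all eight
combinations of the three sample flags, with hibernation both enabled and
disabled. Under these assumptions the two forms return the same result in
every case, so the choice between them is purely about readability.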