All of lore.kernel.org
 help / color / mirror / Atom feed
From: Evan Green <evgreen@chromium.org>
To: David Hildenbrand <david@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@suse.com>, Pavel Machek <pavel@ucw.cz>,
	linux-api@vger.kernel.org, Alex Shi <alexs@kernel.org>,
	Alistair Popple <apopple@nvidia.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Miaohe Lin <linmiaohe@huawei.com>,
	Minchan Kim <minchan@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-mm@kvack.org
Subject: Re: [PATCH v4] mm: Enable suspend-only swap spaces
Date: Tue, 27 Jul 2021 09:31:33 -0700	[thread overview]
Message-ID: <CAE=gft7567-2Lq7raJKrOpQ8UAvXTFWwPci=_GCRPET3nS=9SA@mail.gmail.com> (raw)
In-Reply-To: <6ff28cfe-1107-347b-0327-ad36e256141b@redhat.com>

On Tue, Jul 27, 2021 at 5:21 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 27.07.21 11:48, David Hildenbrand wrote:
> > On 27.07.21 02:12, Evan Green wrote:
> >> Add a new SWAP_FLAG_HIBERNATE_ONLY that adds a swap region but refuses
> >> to allow generic swapping to it. This region can still be wired up for
> >> use in suspend-to-disk activities, but will never have regular pages
> >> swapped to it. This flag will be passed in by utilities like swapon(8),
> >> usage would probably look something like: swapon -o hibernate /dev/sda2.
> >>
> >> Currently it's not possible to enable hibernation without also enabling
> >> generic swap for a given area. One semi-workaround for this is to delay
> >> the call to swapon() until just before attempting to hibernate, and then
> >> call swapoff() just after hibernate completes. This is somewhat kludgy,
> >> and also doesn't really work to keep swap out of the hibernate region.
> >> When hibernate begins, it starts by allocating a large chunk of memory
> >> for itself. This often ends up forcing a lot of data out into swap. By
> >> this time the hibernate region is eligible for generic swap, so swap
> >> ends up leaking into the hibernate region even with the workaround.
> >>
> >> There are a few reasons why usermode might want to be able to
> >> exclusively steer swap and hibernate. One reason relates to SSD wearing.
> >> Hibernate's endurance and speed requirements are different from swap.
> >> It may for instance be advantageous to keep hibernate in primary
> >> storage, but put swap in an SLC namespace. These namespaces are faster
> >> and have better endurance, but cost 3-4x in terms of capacity.
> >> Exclusively steering hibernate and swap enables system designers to
> >> accurately partition their storage without either wearing out their
> >> primary storage, or overprovisioning their fast swap area.
> >>
> >> Another reason to allow exclusive steering has to do with security.
> >> The requirements for designing systems with resilience against
> >> offline attacks are different between swap and hibernate. Swap
> >> effectively requires a dictionary of hashes, as pages can be added and
> >> removed arbitrarily, whereas hibernate only needs a single hash for the
> >> entire image. If you've set up block-level integrity for swap and
> >> image-level integrity for hibernate, then allowing swap blocks to
> >> possibly leak out to the hibernate region is problematic, since it
> >> creates swap pages not protected by any integrity.
> >>
> >> Swap regions with SWAP_FLAG_HIBERNATE_ONLY set will not appear in
> >> /proc/meminfo under SwapTotal and SwapFree, since they are not usable as
> >> general swap. These regions do still appear in /proc/swaps.
> >
> > Right, and they also don't account towards the memory overcommit
> > calculations.
> >
> > Thanks for extending the patch description!

No problem, thanks for all the brainwaves directed at this.

> >
> > [...]
> >
> >> +    if (swap_flags & SWAP_FLAG_HIBERNATE_ONLY) {
> >> +            if (IS_ENABLED(CONFIG_HIBERNATION)) {
> >> +                    if (swap_flags & ~SWAP_HIBERNATE_ONLY_VALID_FLAGS)
> >> +                            return -EINVAL;
> >> +
> >> +            } else {
> >> +                    return -EINVAL;
> >> +            }
> >> +    }
> >
> > We could do short
> >
> > if ((swap_flags & SWAP_FLAG_HIBERNATE_ONLY) &&
> >        (!IS_ENABLED(CONFIG_HIBERNATION) ||
> >         (swap_flags & ~SWAP_HIBERNATE_ONLY_VALID_FLAGS)))
> >       return -EINVAL;
> >
> > or
> >
> > if (swap_flags & SWAP_FLAG_HIBERNATE_ONLY))
> >       if (!IS_ENABLED(CONFIG_HIBERNATION) ||
> >               (swap_flags & ~SWAP_HIBERNATE_ONLY_VALID_FLAGS))
> >               return -EINVAL;
> >
> >> +
> >>      if (!capable(CAP_SYS_ADMIN))
> >>              return -EPERM;
> >>
> >> @@ -3335,16 +3366,20 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
> >>      if (swap_flags & SWAP_FLAG_PREFER)
> >>              prio =
> >>                (swap_flags & SWAP_FLAG_PRIO_MASK) >> SWAP_FLAG_PRIO_SHIFT;
> >> +
> >> +    if (swap_flags & SWAP_FLAG_HIBERNATE_ONLY)
> >> +            p->flags |= SWP_HIBERNATE_ONLY;
> >>      enable_swap_info(p, prio, swap_map, cluster_info, frontswap_map);
> >>
> >> -    pr_info("Adding %uk swap on %s.  Priority:%d extents:%d across:%lluk %s%s%s%s%s\n",
> >> +    pr_info("Adding %uk swap on %s.  Priority:%d extents:%d across:%lluk %s%s%s%s%s%s\n",
> >>              p->pages<<(PAGE_SHIFT-10), name->name, p->prio,
> >>              nr_extents, (unsigned long long)span<<(PAGE_SHIFT-10),
> >>              (p->flags & SWP_SOLIDSTATE) ? "SS" : "",
> >>              (p->flags & SWP_DISCARDABLE) ? "D" : "",
> >>              (p->flags & SWP_AREA_DISCARD) ? "s" : "",
> >>              (p->flags & SWP_PAGE_DISCARD) ? "c" : "",
> >> -            (frontswap_map) ? "FS" : "");
> >> +            (frontswap_map) ? "FS" : "",
> >> +            (p->flags & SWP_HIBERNATE_ONLY) ? "H" : "");
> >>
> >>      mutex_unlock(&swapon_mutex);
> >>      atomic_inc(&proc_poll_event);
> >>
> >
> > Looks like the cleanest alternative to me, as long as we don't want to
> > invent new interfaces.
> >
> > Acked-by: David Hildenbrand <david@redhat.com>
> >
>
> Pavel just mentioned uswsusp, and I wonder if it would be a possible
> alternative to this patch.

I think you're right that it would be possible to isolate the
hibernate image with uswsusp if you avoid using the SNAPSHOT_*SWAP*
ioctls. But I'd expect performance to suffer noticeably, since now
every page is making a round trip out to usermode and back. I'd still
very much use the HIBERNATE_ONLY flag if it were accepted, I think
there's value to it.

-Evan

  reply	other threads:[~2021-07-27 16:40 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-27  0:12 [PATCH v4] mm: Enable suspend-only swap spaces Evan Green
2021-07-27  9:48 ` David Hildenbrand
2021-07-27 12:21   ` David Hildenbrand
2021-07-27 16:31     ` Evan Green [this message]
2021-07-27 16:31       ` Evan Green
2021-07-27 21:18       ` Andrew Morton
2021-08-03 18:00         ` Evan Green
2021-08-03 18:00           ` Evan Green
2021-08-17 18:32           ` Pavel Machek
2021-07-30 13:15 ` Karel Zak

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAE=gft7567-2Lq7raJKrOpQ8UAvXTFWwPci=_GCRPET3nS=9SA@mail.gmail.com' \
    --to=evgreen@chromium.org \
    --cc=akpm@linux-foundation.org \
    --cc=alexs@kernel.org \
    --cc=apopple@nvidia.com \
    --cc=david@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=linmiaohe@huawei.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=minchan@kernel.org \
    --cc=pavel@ucw.cz \
    --cc=surenb@google.com \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.