From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-0.8 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2414BC11D00 for ; Thu, 20 Feb 2020 21:48:42 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id E686B208E4 for ; Thu, 20 Feb 2020 21:48:41 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=colorremedies-com.20150623.gappssmtp.com header.i=@colorremedies-com.20150623.gappssmtp.com header.b="MbjRM2+Z" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727801AbgBTVsl (ORCPT ); Thu, 20 Feb 2020 16:48:41 -0500 Received: from mail-wm1-f65.google.com ([209.85.128.65]:36127 "EHLO mail-wm1-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727656AbgBTVsl (ORCPT ); Thu, 20 Feb 2020 16:48:41 -0500 Received: by mail-wm1-f65.google.com with SMTP id p17so162322wma.1 for ; Thu, 20 Feb 2020 13:48:39 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=colorremedies-com.20150623.gappssmtp.com; s=20150623; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=WTVfgaoOc7Aubo1Fbnig9Cv2m0AIMq6Ne61JqyVLG9E=; b=MbjRM2+ZXKEzdMcW4Yqh2rqfq1l84Tbm4IUXg1VlOQ7uZsAlKU8wXaNe39awho+THs m1nZXfUiHkSrquIVX88Dsh8PZbhAQ8k7aleP5PlF6wjVWqnfLZwZj4MUIeDyKXUb48IO 7e83lOh9O4lkJc6RoNbkRc2SI7cBLPj3r+NinMDZK0QAec/xbb49x7ZpQ6uDpOW6dY0j AHo3QG05Sxsjuc1Gujp25h4Vfe8756Ov05OKLAZ58RO6RqfoRGe0o6rgMVUwtzo5YNyv ajBfave1aAxpaGD0fkjWbmObpMIVV2e8qASRvWJBuJzH4NPJijsLm3XIVpIx3boSVMJR ezZw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; 
s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=WTVfgaoOc7Aubo1Fbnig9Cv2m0AIMq6Ne61JqyVLG9E=; b=DbYUBvriNXdXRgrxcV5CBLo4RiGAsZzzVu2E0zyJ1upqCJ+yUtNKU4MPBEbPx/pmfS +shC9I7zPUgtRy3D1Zh5TiT3gngGsGvocCVarXp5MYjQHdecPPzh5ua72aV/e0DGJlJD Qpq/1edrYHXew+EHEmDUeB2bt2nS9o9PFOxUo4Jty1B/m6u5Pgn8JqeqvCOjR/0t4e9j vYQUju1j+YQrLpBn4sc8i1Pac++6u634gBsyEREeZLS+KPHCN9keHE9ICdFUYlLVbheS vZyrI5SAA/C7f/30NTqrxsEhu0IOt8F1Izri21hZwK0BgC4czhuq7YSlT3U2K6H7cQtH ocew== X-Gm-Message-State: APjAAAVmAEMofpY6j9CGBLUzTDSCpRBvkb9Ni7UtHBkJh03ze25+QwwR poAGk3F55P2I2Egylem5uByLmJGJOv7GmFqNKmf/aA== X-Google-Smtp-Source: APXvYqyDhXbsrP0OJoJzPOTLfHKpbwYUPv45kXxa7TYJoFtbmqA6RHvFE+3PyaFELoBzsq+yqi0jjXzJ/PX4t1DCDKM= X-Received: by 2002:a05:600c:294a:: with SMTP id n10mr6639587wmd.11.1582235318995; Thu, 20 Feb 2020 13:48:38 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: Chris Murphy Date: Thu, 20 Feb 2020 14:48:22 -0700 Message-ID: Subject: Re: is hibernation usable? To: Luigi Semenzato Cc: Chris Murphy , Linux Memory Management List , Linux PM Content-Type: text/plain; charset="UTF-8" Sender: linux-pm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org

On Thu, Feb 20, 2020 at 12:45 PM Luigi Semenzato wrote:
>
> On Thu, Feb 20, 2020 at 11:09 AM Chris Murphy wrote:
> >
> > On Thu, Feb 20, 2020 at 10:16 AM Luigi Semenzato wrote:
> > >
> > > I think this is the right group for the memory issues.
> > >
> > > I suspect that the problem with failed allocations (ENOMEM) boils down
> > > to the unreliability of the page allocator. In my experience, under
> > > pressure (i.e. pages must be swapped out to be reclaimed) allocations
> > > can fail even when in theory they should succeed. (I wish I were
> > > wrong and that someone would convincingly correct me.)
> >
> > What is vm.swappiness set to on your system?
> > A fellow Fedora contributor who has consistently reproduced what you
> > describe, has discovered he has vm.swappiness=0, and even if it's set
> > to 1, the problem no longer happens. And this is not a documented
> > consequence of using a value of 0.
>
> I am using the default value of 60.
>
> A zero value should cause all file pages to be discarded before any
> anonymous pages are swapped. I wonder if the fellow Fedora
> contributor's workload has a lot of file pages, so that discarding
> them is enough for the image allocator to succeed. In that case "sync;
> echo 1 > /proc/sys/vm/drop_caches" would be a better way of achieving
> the same result. (By the way, in my experiments I do that just before
> hibernating.)

Unfortunately, I can't reproduce the graceful failure you describe
myself. I either get a successful hibernation/resume or some kind of
non-deterministic, fatal failure to enter hibernation, and any
dmesg/journal output that might contain evidence of the failure is lost.

I've had better success with qemu-kvm testing, but even there,
hibernation entry fails to complete about 1/4 of the time (with a
ridiculously small sample size). I can't tell whether the failure
happens during page out, hibernation image creation, or hibernation
image write out, but the result is a black screen (virt-manager
console), and the VM never shuts down or reboots; it just hangs,
spinning at ~400% CPU (even though it's only assigned 3 CPUs). It's
sufficiently unreliable that I can't really consider it supported or
supportable.

Microsoft and Apple have lately put more emphasis on S0 low power idle,
faster booting, and application state saving. In Windows 10,
hiberfil.sys holds a limited environment, essentially that of the login
window (no user environment state is saved in it), and it is used both
for resuming from S4 and for fast boot.
A separate file, pagefile.sys, is used for paging, so a use case that
depends on significant page out can never prevent hibernation from
succeeding. It's also Secure Boot compatible, whereas hibernation on
Linux x86_64 isn't.

Between kernel, ACPI, and firmware bugs, it's going to take a lot more
effort to make it reliable and trustworthy for the general case. Or it
should just be abandoned; it seems to be mostly that way already.

-- 
Chris Murphy
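Luigi's hypothesis in the thread is that with vm.swappiness=0 the kernel discards file pages before swapping anonymous pages. On a workload with a lot of page cache, that alone could free enough memory for the hibernation image. The Python sketch below gives a rough picture of how a particular system splits between the two before hibernating. It only reads /proc/sys/vm/swappiness and /proc/meminfo. It does not drop caches, change swappiness, or hibernate. The helper names `parse_meminfo` and `summarize` are hypothetical. Treating Cached minus Shmem minus Dirty as "droppable" is an approximation assumed here, not the kernel's own accounting, and it does not model the hibernation image allocator.

```python
"""Sketch: how much memory is page cache vs. anonymous pages before hibernating.

Assumptions (not from the thread): the Linux /proc/meminfo format
("Key:   value kB"), and the rough rule that clean, non-shmem page cache can
be dropped cheaply while anonymous pages can only be reclaimed by swapping.
This does not model the kernel's hibernation image allocator.
"""
from __future__ import annotations

from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}; most values are kB."""
    info: dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        # Keep only "Key: <integer> [kB]" lines; skip anything malformed.
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def summarize(info: dict[str, int]) -> dict[str, int]:
    """Rough split into droppable page cache and anonymous memory (kB)."""
    cached = info.get("Cached", 0)
    shmem = info.get("Shmem", 0)  # tmpfs/shared memory: counted in Cached, not droppable
    dirty = info.get("Dirty", 0)  # must be written back (sync) before it can be dropped
    return {
        # Roughly an upper bound on what "sync; echo 1 > /proc/sys/vm/drop_caches"
        # could free (pages that are currently mapped are not dropped).
        "clean_cache_kb": max(cached - shmem - dirty, 0),
        "dirty_kb": dirty,
        # Anonymous pages can only be reclaimed by writing them to swap.
        "anon_kb": info.get("AnonPages", 0),
        "swap_free_kb": info.get("SwapFree", 0),
    }


def main() -> None:
    try:
        swappiness = Path("/proc/sys/vm/swappiness").read_text().strip()
        meminfo = Path("/proc/meminfo").read_text()
    except OSError as exc:  # not Linux, or /proc is unavailable
        print(f"cannot read /proc: {exc}")
        return
    print(f"vm.swappiness = {swappiness}")
    for name, kb in summarize(parse_meminfo(meminfo)).items():
        print(f"{name:>15}: {kb // 1024} MiB")


if __name__ == "__main__":
    main()
```

The script changes nothing, so it can run as an ordinary user just before hibernating. Suppose clean_cache_kb is large compared with anon_kb. That matches the file-page-heavy workload Luigi suspects. In that case, running his "sync; echo 1 > /proc/sys/vm/drop_caches" as root would be expected to free much of that memory without changing vm.swappiness.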