* is hibernation usable?
From: Chris Murphy
Date: 2020-02-11 19:50 UTC (permalink / raw)
To: linux-mm; Cc: semenzato

Original thread:
https://lore.kernel.org/linux-mm/CAA25o9RSWPX8L3s=r6A+4oSdQyvGfWZ1bhKfGvSo5nN-X58HQA@mail.gmail.com/

This whole thread is a revelation. I have no doubt most users have no
idea that hibernation image creation is expected to fail if more than
50% of RAM is in use. Please bear with me while I ask some possibly
rudimentary questions to make sure I understand this in simple terms.

Example system: 32G RAM, all of it in use, plus 2G of page-outs (into
the swap device). That implies:

+ 2G already paged out to swap
+ 16G that must be paged out to swap, to free up enough memory to
  create the hibernation image
+ 8-16G for the (compressed) hibernation image, written to a
  *contiguous* range within the swap device

This suggests a 26G-34G swap device, correct? (I realize that in
another example the swap device could already contain more than 2G of
page-outs, and that would only increase this requirement.)

Is there now (or is there planned) a kernel facility that will do this
eviction automatically, freeing up enough memory so that the
hibernation image can always be successfully created in memory? If
not, does this suggest some facility needs to be created, maybe in
systemd, coordinating with the desktop environment? I don't need to
understand the details, but I do want to understand whether this
exists, will exist, and where it will exist.

One idea floated on Fedora devel@ a few months ago by a systemd
developer is to activate a swap device at hibernation time. That way
the system is constrained to a smaller swap device during normal use,
e.g. swap on /dev/zram, but can still hibernate by activating a
suitably sized swap device on demand. Do you anticipate any problems
with this idea? Could it be subject to race conditions?

Is there any difference in hibernation reliability between swap
partitions and swapfiles? I note there isn't a standard interface for
all file systems; notably, Btrfs has a unique requirement [1].

Are there any prospects for signed hibernation images, in order to
support hibernation when UEFI Secure Boot is enabled?

What about signing swap itself? If there's a trust concern with the
hibernation image, and I agree that there is in the context of UEFI
SB, then it seems there's likewise a concern about active pages in
swap. Yes? No?

[1] https://lore.kernel.org/linux-btrfs/CAJCQCtSLYY-AY8b1WZ1D4neTrwMsm_A61-G-8e6-H3Dmfue_vQ@mail.gmail.com/

Thanks!

--
Chris Murphy

^ permalink raw reply [flat|nested] 27+ messages in thread
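The sizing arithmetic in the message above can be sketched as a quick shell calculation. The 50% eviction figure and the 2:1 compression bound are assumptions taken from this thread, not kernel guarantees:

```shell
#!/bin/sh
# Rough swap sizing for hibernation, per the arithmetic above.
# Assumptions (from this thread, not guaranteed by the kernel):
#  - everything beyond ~50% of RAM must be paged out before the image is built
#  - the compressed image needs 25%-50% of RAM, contiguous in swap
#    (a half-of-RAM image, between ~2:1 compression and no compression)

ram_g=32        # total RAM in GiB
paged_out_g=2   # already paged out to swap

evict_g=$(( ram_g / 2 ))        # must be pushed to swap first
image_min_g=$(( ram_g / 4 ))    # best case: ~2:1 compression
image_max_g=$(( ram_g / 2 ))    # worst case: incompressible pages

swap_min_g=$(( paged_out_g + evict_g + image_min_g ))
swap_max_g=$(( paged_out_g + evict_g + image_max_g ))

echo "swap needed: ${swap_min_g}G-${swap_max_g}G"
```

With the example numbers this prints the 26G-34G range from the message.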
* Re: is hibernation usable?
From: Luigi Semenzato
Date: 2020-02-11 22:23 UTC (permalink / raw)
To: Chris Murphy; Cc: Linux Memory Management List

On Tue, Feb 11, 2020 at 11:50 AM Chris Murphy <lists@colorremedies.com> wrote:
>
> Original thread:
> https://lore.kernel.org/linux-mm/CAA25o9RSWPX8L3s=r6A+4oSdQyvGfWZ1bhKfGvSo5nN-X58HQA@mail.gmail.com/
>
> This whole thread is a revelation. I have no doubt most users have no
> idea that hibernation image creation is expected to fail if more than
> 50% RAM is used. Please bear with me while I ask some possibly
> rudimentary questions to ensure I understand this in simple terms.

To be clear, I am not completely sure of this. Other developers do not
agree with it (as you can see from the thread). However, I can easily
and consistently reproduce the memory allocation failure when anon is
>50% of total. According to others, the image allocation should
reclaim pages by forcing anon pages to swap. I don't understand if/how
the swap partition accommodates both swapped pages and the hibernation
image, but in any case, in my experiments I allocate a swap disk the
same size as RAM, which should be sufficient (again, according to the
threads).

> Example system: 32G RAM, all of it used, plus 2G of page outs (into
> the swap device).
>
> + 2G already paged out to swap
> + 16GB needs to be paged out to swap, to free up enough memory to
> create the hibernation image
> + 8-16GB for the (compressed) hibernation image to be written to a
> *contiguous* range within swap device
>
> This suggests a 26G-34G swap device, correct? (I realize that this
> swap device could, in another example, contain more than 2G of page
> outs already, and that would only increase this requirement.)
>
> Is there now (or planned) an automatic kernel facility that will do
> the eviction automatically, to free up enough memory, so that the
> hibernation image can always be successfully created in-memory? If
> not, does this suggest some facility needs to be created, maybe in
> systemd, coordinating with the desktop environment? I don't need to
> understand the details but I do want to understand if this exists,
> will exist, and where it will exist.

I have a workaround, but it needs memcgroups. You can

    echo $limit > .../$cgroup/memory.limit_in_bytes

and if your current usage is greater than $limit, and you have swap,
the operation will block until enough pages have been swapped out to
satisfy the limit.

Even this isn't guaranteed to work, even with enough free swap. The
limit adjustment invokes mem_cgroup_resize_limit(), which contains a
loop with multiple retries of a call to do_try_to_free_pages(). The
number of retries looks like a heuristic, and I've seen the resizing
fail.

> One idea floated on Fedora devel@ a few months ago by a systemd
> developer, is to activate a swap device at hibernation time. That way
> the system is constrained to a smaller swap device, e.g. swap on
> /dev/zram during normal use, but can still hibernate by activating a
> suitably sized swap device on-demand. Do you anticipate any problems
> with this idea? Could it be subject to race conditions?
>
> Is there any difference in hibernation reliability between swap
> partitions, versus swapfiles? I note there isn't a standard interface
> for all file systems, notably Btrfs has a unique requirement [1]
>
> Are there any prospects for signed hibernation images, in order to
> support hibernation when UEFI Secure Boot is enabled?
>
> What about the signing of swap? If there's a trust concern with the
> hibernation image, and I agree that there is in the context of UEFI
> SB, then it seems there's likewise a concern about active pages in
> swap. Yes? No?
>
> [1]
> https://lore.kernel.org/linux-btrfs/CAJCQCtSLYY-AY8b1WZ1D4neTrwMsm_A61-G-8e6-H3Dmfue_vQ@mail.gmail.com/
>
> Thanks!
>
> --
> Chris Murphy
* Re: is hibernation usable?
From: Chris Murphy
Date: 2020-02-20 2:54 UTC (permalink / raw)
To: Luigi Semenzato; Cc: Linux Memory Management List

On Tue, Feb 11, 2020 at 3:23 PM Luigi Semenzato <semenzato@google.com> wrote:
>
> On Tue, Feb 11, 2020 at 11:50 AM Chris Murphy <lists@colorremedies.com> wrote:
> >
> > Original thread:
> > https://lore.kernel.org/linux-mm/CAA25o9RSWPX8L3s=r6A+4oSdQyvGfWZ1bhKfGvSo5nN-X58HQA@mail.gmail.com/
> >
> > This whole thread is a revelation. I have no doubt most users have no
> > idea that hibernation image creation is expected to fail if more than
> > 50% RAM is used. Please bear with me while I ask some possibly
> > rudimentary questions to ensure I understand this in simple terms.
>
> To be clear, I am not completely sure of this. Other developers are
> not in agreement with this (as you can see from the thread). However,
> I can easily and consistently reproduce the memory allocation failure
> when anon is >50% of total. According to others, the image allocation
> should reclaim pages by forcing anon pages to swap. I don't
> understand if/how the swap partition accommodates both swapped pages
> and the hibernation image, but in any case, in my experiments, I
> allocate a swap disk the same size as RAM, which should be sufficient
> (again, according to the threads).

I'm testing with this method:

    # echo reboot > /sys/power/disk
    # echo disk > /sys/power/state

About 2/3 of the time on a test system, hibernation entry fails. It's
fatal. The last journal entry is:

    [  349.732372] PM: hibernation: hibernation entry

The screen is blank, the system gets hot, the fans go to high speed,
and it doesn't recover after 15 minutes. After forcing power off and
rebooting, there is no hibernation signature reported in the swap
partition, so I don't think the kernel ever reached reboot.

Shifting over to a qemu-kvm VM with PM support enabled, this is
working. If I fill up pretty much all of RAM and a small amount of
swap is used, the above two commands succeed, the VM reboots, and the
hibernation image is resumed without error. AnonPages is 73% of total.
Upon successful resume, it appears quite a lot of pages were pushed to
swap; it looks like about 1GiB was paged out.

Before hibernation:

$ cat /proc/meminfo
MemTotal:        2985944 kB
MemFree:          148376 kB
MemAvailable:     220428 kB
Buffers:             172 kB
Cached:           366100 kB
SwapCached:         4632 kB
Active:          1962088 kB
Inactive:         592576 kB
Active(anon):    1842560 kB
Inactive(anon):   467904 kB
Active(file):     119528 kB
Inactive(file):   124672 kB
Unevictable:        1628 kB
Mlocked:            1628 kB
SwapTotal:       3117052 kB
SwapFree:        2899952 kB
Dirty:              6248 kB
Writeback:             0 kB
AnonPages:       2187236 kB
Mapped:           245800 kB
Shmem:            120504 kB
KReclaimable:      58016 kB
Slab:             203260 kB
SReclaimable:      58016 kB
SUnreclaim:       145244 kB
KernelStack:       13712 kB
PageTables:        23364 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     4610024 kB
Committed_AS:    6019396 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       27528 kB
VmallocChunk:          0 kB
Percpu:             4016 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      238332 kB
DirectMap2M:     2904064 kB

After resume:

[chris@vm ~]$ cat /proc/meminfo
MemTotal:        2985944 kB
MemFree:         1007132 kB
MemAvailable:    1069576 kB
Buffers:              76 kB
Cached:           400464 kB
SwapCached:       296112 kB
Active:           755856 kB
Inactive:         955624 kB
Active(anon):     731668 kB
Inactive(anon):   683352 kB
Active(file):      24188 kB
Inactive(file):   272272 kB
Unevictable:        1632 kB
Mlocked:            1632 kB
SwapTotal:       3117052 kB
SwapFree:        1874788 kB
Dirty:              2716 kB
Writeback:             0 kB
AnonPages:       1182108 kB
Mapped:           225352 kB
Shmem:            102480 kB
KReclaimable:      48968 kB
Slab:             183104 kB
SReclaimable:      48968 kB
SUnreclaim:       134136 kB
KernelStack:       14000 kB
PageTables:        22924 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     4610024 kB
Committed_AS:    5937732 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       27800 kB
VmallocChunk:          0 kB
Percpu:             4016 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      238332 kB
DirectMap2M:     2904064 kB
$

There must be some other cause for the 50% limitation. Is it possible
it only starts once a certain amount of RAM is present? E.g. maybe it
can only page out 4GiB of anon pages to swap, and after that point, if
at least 50% of RAM isn't available, hibernation image creation fails?

--
Chris Murphy
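The before/after comparison above can be automated. A small awk sketch that diffs two saved /proc/meminfo snapshots for the fields of interest (the sample values are the SwapFree and AnonPages numbers from the dumps above):

```shell
#!/bin/sh
# Diff two /proc/meminfo snapshots to see how much was paged out across
# a hibernate/resume cycle. On a live system you would capture them with
# "cat /proc/meminfo > before.txt" and "... > after.txt"; the sample
# values here are from the dumps above.

cat > before.txt <<'EOF'
SwapFree:        2899952 kB
AnonPages:       2187236 kB
EOF
cat > after.txt <<'EOF'
SwapFree:        1874788 kB
AnonPages:       1182108 kB
EOF

# For each field, print before, after, and delta (after - before).
OUT=$(awk 'NR==FNR { before[$1]=$2; next }
           { printf "%-11s %8d -> %8d  (%+d kB)\n",
                    $1, before[$1], $2, $2 - before[$1] }' \
          before.txt after.txt)
printf '%s\n' "$OUT"
rm -f before.txt after.txt
```

For the sample data this shows SwapFree dropping by about 1025164 kB (~1 GiB), matching the "about 1GiB was paged out" estimate above.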
* Re: is hibernation usable?
From: Chris Murphy
Date: 2020-02-20 2:56 UTC (permalink / raw)
To: Linux Memory Management List; Cc: Luigi Semenzato

Also, is this the correct list for hibernation/swap discussion? Or linux-pm@?

Thanks,

Chris Murphy
* Re: is hibernation usable?
From: Luigi Semenzato
Date: 2020-02-20 17:16 UTC (permalink / raw)
To: Chris Murphy; Cc: Linux Memory Management List, Linux PM

I think this is the right group for the memory issues.

I suspect that the problem with failed allocations (ENOMEM) boils down
to the unreliability of the page allocator. In my experience, under
pressure (i.e. when pages must be swapped out to be reclaimed),
allocations can fail even when in theory they should succeed. (I wish
I were wrong and that someone would convincingly correct me.)

I have a workaround in which I use memcgroups to free pages before
starting hibernation. The cgroup request "echo $limit >
.../memory.limit_in_bytes" blocks until memory usage in the chosen
cgroup is below $limit. However, I have seen this request fail even
when there is extra available swap space.

The callback for the operation is mem_cgroup_resize_limit() (BTW I am
looking at kernel version 4.3.5), and that code has a loop where
try_to_free_pages() is called up to retry_count times, which is at
least 5. Why 5? One suspects that the writer of that code must have
also realized that the page-freeing request is unreliable and that
it's worth trying multiple times.

So you could try something similar. I don't know if there are
interfaces to try_to_free_pages() other than those in cgroups. If not,
and you aren't using cgroups, one way might be to start several
memory-eating processes (such as "dd if=/dev/zero bs=1G count=1 |
sleep infinity") and monitor allocation; then, when they use more than
50% of RAM, kill them and immediately hibernate before the freed pages
are reused. If you can build a custom kernel, maybe it's worth adding
a sysfs entry to invoke try_to_free_pages(). You could also change the
hibernation code to do that, but having the user-level hook may be
more flexible.

On Wed, Feb 19, 2020 at 6:56 PM Chris Murphy <lists@colorremedies.com> wrote:
>
> Also, is this the correct list for hibernation/swap discussion? Or linux-pm@?
>
> Thanks,
>
> Chris Murphy
* Re: is hibernation usable?
From: Luigi Semenzato
Date: 2020-02-20 17:38 UTC (permalink / raw)
To: Chris Murphy; Cc: Linux Memory Management List, Linux PM

I was forgetting: forcing swap by eating up memory is dangerous
because it can lead to unexpected OOM kills, but you can mitigate that
by giving the memory-eaters a higher OOM-kill score. Still, some way
of calling try_to_free_pages() directly from user level would be
preferable. I wonder if such an API has been discussed.

On Thu, Feb 20, 2020 at 9:16 AM Luigi Semenzato <semenzato@google.com> wrote:
>
> I think this is the right group for the memory issues.
>
> I suspect that the problem with failed allocations (ENOMEM) boils down
> to the unreliability of the page allocator. In my experience, under
> pressure (i.e. pages must be swapped out to be reclaimed) allocations
> can fail even when in theory they should succeed. (I wish I were
> wrong and that someone would convincingly correct me.)
>
> I have a workaround in which I use memcgroups to free pages before
> starting hibernation. The cgroup request "echo $limit >
> .../memory.limit_in_bytes" blocks until memory usage in the chosen
> cgroup is below $limit. However, I have seen this request fail even
> when there is extra available swap space.
>
> The callback for the operation is mem_cgroup_resize_limit() (BTW I am
> looking at kernel version 4.3.5) and that code has a loop where
> try_to_free_pages() is called up to retry_count, which is at least 5.
> Why 5? One suspects that the writer of that code must have also
> realized that the page freeing request is unreliable and it's worth
> trying multiple times.
>
> So you could try something similar. I don't know if there are
> interfaces to try_to_free_pages() other than those in cgroups. If
> not, and you aren't using cgroups, one way might be to start several
> memory-eating processes (such as "dd if=/dev/zero bs=1G count=1 |
> sleep infinity") and monitor allocation, then when they use more than
> 50% of RAM kill them and immediately hibernate before the freed pages
> are reused. If you can build your custom kernel, maybe it's worth
> adding a sysfs entry to invoke try_to_free_pages(). You could also
> change the hibernation code to do that, but having the user-level hook
> may be more flexible.
>
> On Wed, Feb 19, 2020 at 6:56 PM Chris Murphy <lists@colorremedies.com> wrote:
> >
> > Also, is this the correct list for hibernation/swap discussion? Or linux-pm@?
> >
> > Thanks,
> >
> > Chris Murphy
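A hedged sketch of the memory-eater approach discussed above, with the OOM-score mitigation applied. Sizes are illustrative. Note that "dd if=/dev/zero bs=1G count=1 | sleep infinity" as quoted does not actually keep the buffer alive (dd frees it when it exits), so this sketch holds the bytes in a shell variable instead:

```shell
#!/bin/sh
# Memory-eater sketch: push anon pages to swap before hibernating, with
# the eaters marked as preferred OOM victims per the mitigation above.

# eat <bytes> [seconds]: hold <bytes> of anonymous memory in a
# background process for up to [seconds] (default: effectively forever).
# The bytes live in a shell variable, so they stay resident until the
# process is killed or the timeout expires.
eat() {
    sh -c 'x=$(head -c "$0" /dev/zero | tr "\0" x); sleep "$1"; :' \
        "$1" "${2:-2147483647}" &
    EATER=$!
    # Make the eater the preferred OOM victim (allowed unprivileged,
    # since the score is only being raised).
    echo 1000 > "/proc/$EATER/oom_score_adj" 2>/dev/null || true
}

# Sketch of the overall procedure:
#   spawn eaters until anon usage exceeds 50% of RAM, then kill them
#   and immediately: echo disk > /sys/power/state
```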
* Re: is hibernation usable?
From: Michal Hocko
Date: 2020-02-21 8:49 UTC (permalink / raw)
To: Luigi Semenzato; Cc: Chris Murphy, Linux Memory Management List, Linux PM

On Thu 20-02-20 09:38:06, Luigi Semenzato wrote:
> I was forgetting: forcing swap by eating up memory is dangerous
> because it can lead to unexpected OOM kills

Could you be more specific about what you have in mind? swapoff
causing the OOM killer?

> , but you can mitigate that
> by giving the memory-eaters a higher OOM kill score. Still, some way
> of calling try_to_free_pages() directly from user-level would be
> preferable. I wonder if such API has been discussed.

No, there is no API to trigger global memory reclaim. You could start
reclaim by increasing min_free_kbytes, but I wouldn't really recommend
that unless you know exactly what you are doing, and I also fail to
see the point: if s2disk fails due to insufficient swap space, then
how can proactive reclaim help in the first place?

--
Michal Hocko
SUSE Labs
* Re: is hibernation usable?
From: Rafael J. Wysocki
Date: 2020-02-21 9:04 UTC (permalink / raw)
To: Michal Hocko; Cc: Luigi Semenzato, Chris Murphy, Linux Memory Management List, Linux PM

On Fri, Feb 21, 2020 at 9:49 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Thu 20-02-20 09:38:06, Luigi Semenzato wrote:
> > I was forgetting: forcing swap by eating up memory is dangerous
> > because it can lead to unexpected OOM kills
>
> Could you be more specific what you have in mind? swapoff causing the
> OOM killer?
>
> > , but you can mitigate that
> > by giving the memory-eaters a higher OOM kill score. Still, some way
> > of calling try_to_free_pages() directly from user-level would be
> > preferable. I wonder if such API has been discussed.
>
> No, there is no API to trigger the global memory reclaim. You could
> start the reclaim by increasing min_free_kbytes but I wouldn't really
> recommend that unless you know exactly what you are doing and also I
> fail to see the point. If s2disk fails due to insufficient swap space
> then how can a pro-active reclaim help in the first place?

My understanding of the problem is that the size of swap is
(theoretically) sufficient, but it is not used as expected during the
preallocation of image memory.

It was stated in one of the previous messages (not in this thread,
cannot find it now) that swap (of the same size as RAM) was activated
(swapon) right before hibernation, so theoretically that should be
sufficient AFAICS.
* Re: is hibernation usable?
From: Michal Hocko
Date: 2020-02-21 9:36 UTC (permalink / raw)
To: Rafael J. Wysocki; Cc: Luigi Semenzato, Chris Murphy, Linux Memory Management List, Linux PM

On Fri 21-02-20 10:04:18, Rafael J. Wysocki wrote:
> On Fri, Feb 21, 2020 at 9:49 AM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > No, there is no API to trigger the global memory reclaim. You could
> > start the reclaim by increasing min_free_kbytes but I wouldn't really
> > recommend that unless you know exactly what you are doing and also I
> > fail to see the point. If s2disk fails due to insufficient swap space
> > then how can a pro-active reclaim help in the first place?
>
> My understanding of the problem is that the size of swap is
> (theoretically) sufficient, but it is not used as expected during the
> preallocation of image memory.
>
> It was stated in one of the previous messages (not in this thread,
> cannot find it now) that swap (of the same size as RAM) was activated
> (swapon) right before hibernation, so theoretically that should be
> sufficient AFAICS.

Hmm, this is interesting. Let me have a closer look...

pm_restrict_gfp_mask, which would completely rule out any IO, happens
after hibernate_preallocate_memory is done, and my limited
understanding tells me that this is where all the reclaim happens (via
shrink_all_memory). It is quite possible that the MM decides not to
swap in that path - depending on the memory usage - and misses its
target. More details would be needed; e.g. vmscan tracepoints could
tell us more.

--
Michal Hocko
SUSE Labs
* Re: is hibernation usable?
From: Luigi Semenzato
Date: 2020-02-21 17:13 UTC (permalink / raw)
To: Michal Hocko; Cc: Rafael J. Wysocki, Chris Murphy, Linux Memory Management List, Linux PM

On Fri, Feb 21, 2020 at 1:36 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Fri 21-02-20 10:04:18, Rafael J. Wysocki wrote:
> > On Fri, Feb 21, 2020 at 9:49 AM Michal Hocko <mhocko@kernel.org> wrote:
> > >
> > > On Thu 20-02-20 09:38:06, Luigi Semenzato wrote:
> > > > I was forgetting: forcing swap by eating up memory is dangerous
> > > > because it can lead to unexpected OOM kills
> > >
> > > Could you be more specific what you have in mind? swapoff causing the
> > > OOM killer?

No, not swapoff, just fast allocation. Also, in some earlier
experiments I tried gradually increasing min_free_kbytes (precisely as
suggested), and this would randomly trigger OOM kills while swap space
was still available.

> > > > , but you can mitigate that
> > > > by giving the memory-eaters a higher OOM kill score. Still, some way
> > > > of calling try_to_free_pages() directly from user-level would be
> > > > preferable. I wonder if such API has been discussed.
> > >
> > > No, there is no API to trigger the global memory reclaim. You could
> > > start the reclaim by increasing min_free_kbytes but I wouldn't really
> > > recommend that unless you know exactly what you are doing and also I
> > > fail to see the point. If s2disk fails due to insufficient swap space
> > > then how can a pro-active reclaim help in the first place?
> >
> > My understanding of the problem is that the size of swap is
> > (theoretically) sufficient, but it is not used as expected during the
> > preallocation of image memory.
> >
> > It was stated in one of the previous messages (not in this thread,
> > cannot find it now) that swap (of the same size as RAM) was activated
> > (swapon) right before hibernation, so theoretically that should be
> > sufficient AFAICS.

Correct, those were my experiments. Search the archives for
"semenzato"; there are a couple of threads on the topic.

But really, why not have a user-level interface for reclaim? I find it
very difficult to understand the behavior of the reclaim code, and any
attempt to reclaim from user level (memory-eating processes, raising
min_free_kbytes) can end in the OOM-kill path. Using cgroups'
memory.limit_in_bytes doesn't have this problem, precisely because it
only calls try_to_free_pages(), which doesn't trigger OOM killing. If
I could make that call from user level (without cgroups), it would
greatly simplify my current workaround, and it would be useful in
other situations as well. Something like:

    echo $page_count > /proc/sys/vm/try_to_free_pages
    cat /proc/sys/vm/pages_freed  # the number of pages freed by the latest request

> Hmm, this is interesting. Let me have a closer look...
>
> pm_restrict_gfp_mask which would completely rule out any IO
> happens after hibernate_preallocate_memory is done and my limited
> understanding tells me that this is where all the reclaim happens
> (via shrink_all_memory). It is quite possible that the MM decides to
> not swap in that path - depending on the memory usage - and miss it's
> target. More details would be needed. E.g. vmscan tracepoints could tell
> us more.
>
> --
> Michal Hocko
> SUSE Labs
* Re: is hibernation usable? 2020-02-21 9:04 ` Rafael J. Wysocki @ 2020-02-21 9:46 ` Chris Murphy -1 siblings, 0 replies; 27+ messages in thread From: Chris Murphy @ 2020-02-21 9:46 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Michal Hocko, Luigi Semenzato, Chris Murphy, Linux Memory Management List, Linux PM On Fri, Feb 21, 2020 at 2:04 AM Rafael J. Wysocki <rafael@kernel.org> wrote: > > My understanding of the problem is that the size of swap is > (theoretically) sufficient, but it is not used as expected during the > preallocation of image memory. Right. I have no idea how locality of pages is determined in the swap device. But if it's sufficiently fragmented such that contiguous free space for a hibernation image is not sufficient, then hibernation could fail. > It was stated in one of the previous messages (not in this thread, > cannot find it now) that swap (of the same size as RAM) was activated > (swapon) right before hibernation, so theoretically that should be > sufficient AFAICS. I mentioned it as an idea floated by systemd developers. I'm not sure if it's mentioned elsewhere. Some folks wonder if such functionality could be prone to racing. https://lore.kernel.org/linux-mm/CAJCQCtSx0FOX7q0p=9XgDLJ6O0+hF_vc-wU4KL=c9xoSGGkstA@mail.gmail.com/T/#m4d47d127da493f998b232d42d81621335358aee1 Another idea that's been suggested for a while is formally separating hibernation and paging into separate files (or partitions). a. Guarantees hibernation image has the necessary contiguous free space. b. Might be easier to create (or even obviate) a sane interface for hibernation images in swapfiles; that is, if it were a dedicated hibernationfile rather than being inserted in a swapfile. Right now that interface doesn't exist, so e.g. on Btrfs while it can support swapfiles and hibernation images, the offset has to be figured out manually so resume can succeed. 
https://github.com/systemd/systemd/issues/11939#issuecomment-471684411 -- Chris Murphy ^ permalink raw reply [flat|nested] 27+ messages in thread
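On the swapfile-offset point: the usual manual procedure on ext4 is to take the first extent's physical offset from `filefrag -v` and write it to /sys/power/resume_offset. A sketch of the parsing step (the swapfile path and the canned output below are illustrative; on Btrfs the extents reported here are logical rather than physical, which is the "unique requirement" referenced in the thread, so Btrfs needs filesystem-specific tooling instead):

```shell
#!/bin/sh
# Sketch: derive the value for /sys/power/resume_offset from
# `filefrag -v` output. Conventional on ext4 only; not valid on Btrfs.
offset_from_filefrag() {
    # Reads `filefrag -v <file>` output on stdin and prints the first
    # extent's starting physical block.
    awk '$1 == "0:" { sub(/\.\.$/, "", $4); print $4; exit }'
}

# Demo with canned output (prints 34816). Real use would be:
#   offset=$(filefrag -v /swapfile | offset_from_filefrag)
#   echo "$offset" > /sys/power/resume_offset
offset_from_filefrag <<'EOF'
Filesystem type is: ef53
File size of /swapfile is 4294967296 (1048576 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  1048575:      34816..   1083391: 1048576:             last,eof
EOF
```

A dedicated hibernationfile, as proposed above, would make this per-filesystem offset archaeology unnecessary.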
* Re: is hibernation usable? 2020-02-20 17:16 ` Luigi Semenzato 2020-02-20 17:38 ` Luigi Semenzato @ 2020-02-20 19:09 ` Chris Murphy 2020-02-20 19:44 ` Luigi Semenzato 1 sibling, 1 reply; 27+ messages in thread From: Chris Murphy @ 2020-02-20 19:09 UTC (permalink / raw) To: Luigi Semenzato; +Cc: Linux Memory Management List, Linux PM On Thu, Feb 20, 2020 at 10:16 AM Luigi Semenzato <semenzato@google.com> wrote: > > I think this is the right group for the memory issues. > > I suspect that the problem with failed allocations (ENOMEM) boils down > to the unreliability of the page allocator. In my experience, under > pressure (i.e. pages must be swapped out to be reclaimed) allocations > can fail even when in theory they should succeed. (I wish I were > wrong and that someone would convincingly correct me.) What is vm.swappiness set to on your system? A fellow Fedora contributor who has consistently reproduced what you describe, has discovered he has vm.swappiness=0, and even if it's set to 1, the problem no longer happens. And this is not a documented consequence of using a value of 0. > I have a workaround in which I use memcgroups to free pages before > starting hibernation. The cgroup request "echo $limit > > .../memory.limit_in_bytes" blocks until memory usage in the chosen > cgroup is below $limit. However, I have seen this request fail even > when there is extra available swap space. > > The callback for the operation is mem_cgroup_resize_limit() (BTW I am > looking at kernel version 4.3.5) and that code has a loop where > try_to_free_pages() is called up to retry_count, which is at least 5. > Why 5? One suspects that the writer of that code must have also > realized that the page freeing request is unreliable and it's worth > trying multiple times. > > So you could try something similar. I don't know if there are > interfaces to try_to_free_pages() other than those in cgroups. 
If > not, and you aren't using cgroups, one way might be to start several > memory-eating processes (such as "dd if=/dev/zero bs=1G count=1 | > sleep infinity") and monitor allocation, then when they use more than > 50% of RAM kill them and immediately hibernate before the freed pages > are reused. If you can build your custom kernel, maybe it's worth > adding a sysfs entry to invoke try_to_free_pages(). You could also > change the hibernation code to do that, but having the user-level hook > may be more flexible. Fedora 31+ now uses cgroupsv2. In any case, my use case is making sure this works correctly, sanely, with mainline kernels because Fedora doesn't do custom things with the kernel. -- Chris Murphy ^ permalink raw reply [flat|nested] 27+ messages in thread
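The memcg squeeze Luigi describes can be sketched roughly as below. This is a hypothetical sketch, not a tested procedure: the cgroup name ("precopy") is made up, the cgroup-v1 path layout is assumed, and on cgroup v2 (as on Fedora 31+) the analogous knob would be memory.high. Only the size calculation runs unprivileged:

```shell
#!/bin/sh
# Sketch of the "squeeze below 50% of RAM before hibernating" workaround.
# The cgroup name ("precopy") and paths are illustrative assumptions.

half_of_ram_bytes() {
    # /proc/meminfo reports MemTotal in kB.
    awk '$1 == "MemTotal:" { print int($2 * 1024 / 2); exit }' "${1:-/proc/meminfo}"
}

if [ -r /proc/meminfo ]; then
    limit=$(half_of_ram_bytes)
    echo "target cgroup limit: $limit bytes"
fi

# The privileged part (cgroup v1; the write blocks until usage is under
# the limit, and can fail even with free swap, as noted in the thread):
#   echo "$limit" > /sys/fs/cgroup/memory/precopy/memory.limit_in_bytes
#   sync; echo 1 > /proc/sys/vm/drop_caches
#   echo disk > /sys/power/state
```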
* Re: is hibernation usable? 2020-02-20 19:09 ` Chris Murphy @ 2020-02-20 19:44 ` Luigi Semenzato 2020-02-20 21:48 ` Chris Murphy 2020-02-27 6:43 ` Chris Murphy 0 siblings, 2 replies; 27+ messages in thread From: Luigi Semenzato @ 2020-02-20 19:44 UTC (permalink / raw) To: Chris Murphy; +Cc: Linux Memory Management List, Linux PM On Thu, Feb 20, 2020 at 11:09 AM Chris Murphy <lists@colorremedies.com> wrote: > > On Thu, Feb 20, 2020 at 10:16 AM Luigi Semenzato <semenzato@google.com> wrote: > > > > I think this is the right group for the memory issues. > > > > I suspect that the problem with failed allocations (ENOMEM) boils down > > to the unreliability of the page allocator. In my experience, under > > pressure (i.e. pages must be swapped out to be reclaimed) allocations > > can fail even when in theory they should succeed. (I wish I were > > wrong and that someone would convincingly correct me.) > > What is vm.swappiness set to on your system? A fellow Fedora > contributor who has consistently reproduced what you describe, has > discovered he has vm.swappiness=0, and even if it's set to 1, the > problem no longer happens. And this is not a documented consequence of > using a value of 0. I am using the default value of 60. A zero value should cause all file pages to be discarded before any anonymous pages are swapped. I wonder if the fellow Fedora contributor's workload has a lot of file pages, so that discarding them is enough for the image allocator to succeed. In that case "sync; echo 1 > /proc/sys/vm/drop_caches" would be a better way of achieving the same result. (By the way, in my experiments I do that just before hibernating.) > > I have a workaround in which I use memcgroups to free pages before > > starting hibernation. The cgroup request "echo $limit > > > .../memory.limit_in_bytes" blocks until memory usage in the chosen > > cgroup is below $limit. However, I have seen this request fail even > > when there is extra available swap space. 
> > > > The callback for the operation is mem_cgroup_resize_limit() (BTW I am > > looking at kernel version 4.3.5) and that code has a loop where > > try_to_free_pages() is called up to retry_count, which is at least 5. > > Why 5? One suspects that the writer of that code must have also > > realized that the page freeing request is unreliable and it's worth > > trying multiple times. > > > > So you could try something similar. I don't know if there are > > interfaces to try_to_free_pages() other than those in cgroups. If > > not, and you aren't using cgroups, one way might be to start several > > memory-eating processes (such as "dd if=/dev/zero bs=1G count=1 | > > sleep infinity") and monitor allocation, then when they use more than > > 50% of RAM kill them and immediately hibernate before the freed pages > > are reused. If you can build your custom kernel, maybe it's worth > > adding a sysfs entry to invoke try_to_free_pages(). You could also > > change the hibernation code to do that, but having the user-level hook > > may be more flexible. > > Fedora 31+ now uses cgroupsv2. In any case, my use case is making sure > this works correctly, sanely, with mainline kernels because Fedora > doesn't do custom things with the kernel. > > > > -- > Chris Murphy ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: is hibernation usable? 2020-02-20 19:44 ` Luigi Semenzato @ 2020-02-20 21:48 ` Chris Murphy 2020-02-27 6:43 ` Chris Murphy 1 sibling, 0 replies; 27+ messages in thread From: Chris Murphy @ 2020-02-20 21:48 UTC (permalink / raw) To: Luigi Semenzato; +Cc: Chris Murphy, Linux Memory Management List, Linux PM On Thu, Feb 20, 2020 at 12:45 PM Luigi Semenzato <semenzato@google.com> wrote: > > On Thu, Feb 20, 2020 at 11:09 AM Chris Murphy <lists@colorremedies.com> wrote: > > > > On Thu, Feb 20, 2020 at 10:16 AM Luigi Semenzato <semenzato@google.com> wrote: > > > > > > I think this is the right group for the memory issues. > > > > > > I suspect that the problem with failed allocations (ENOMEM) boils down > > > to the unreliability of the page allocator. In my experience, under > > > pressure (i.e. pages must be swapped out to be reclaimed) allocations > > > can fail even when in theory they should succeed. (I wish I were > > > wrong and that someone would convincingly correct me.) > > > > What is vm.swappiness set to on your system? A fellow Fedora > > contributor who has consistently reproduced what you describe, has > > discovered he has vm.swappiness=0, and even if it's set to 1, the > > problem no longer happens. And this is not a documented consequence of > > using a value of 0. > > I am using the default value of 60. > > A zero value should cause all file pages to be discarded before any > anonymous pages are swapped. I wonder if the fellow Fedora > contributor's workload has a lot of file pages, so that discarding > them is enough for the image allocator to succeed. In that case "sync; > echo 1 > /proc/sys/vm/drop_caches" would be a better way of achieving > the same result. (By the way, in my experiments I do that just before > hibernating.) Unfortunately I can't reproduce graceful failure you describe, myself. 
I either get successful hibernation/resume or some kind of non-deterministic and fatal failure to enter hibernation - and any dmesg/journal that might contain evidence of the failure is lost. I've had better success with qemu-kvm testing, but even in that case I see failure to complete hibernation entry about 1/4 of the time (with a ridiculously small sample size). I can't tell if the failure happens during page out, hibernation image creation, or hibernation image write out - but the result is a black screen (virt-manager console) and the VM never shuts down or reboots; it just hangs and spins ~400% CPU (even though it's only assigned 3 CPUs). It's sufficiently unreliable that I can't really consider it supported or supportable. Microsoft and Apple have put more emphasis lately on S0 low power idle, faster booting, and application state saving. What Windows 10 saves in hiberfil.sys is a limited environment, essentially that of the login window (no user environment state is saved in it), and it is used both for resuming from S4 and for fast boot. A separate file, pagefile.sys, is used for paging, so there's never a conflict where a use case that depends on significant page out can prevent hibernation from succeeding. It's also Secure Boot compatible, whereas on x86_64 Linux it isn't. Between kernel and ACPI and firmware bugs, it's going to take a lot more effort to make it reliable and trustworthy for the general case. Or it should just be abandoned; it seems to be mostly that way already. -- Chris Murphy ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: is hibernation usable? 2020-02-20 19:44 ` Luigi Semenzato @ 2020-02-27 6:43 ` Chris Murphy 2020-02-27 6:43 ` Chris Murphy 1 sibling, 0 replies; 27+ messages in thread From: Chris Murphy @ 2020-02-27 6:43 UTC (permalink / raw) To: Luigi Semenzato; +Cc: Chris Murphy, Linux Memory Management List, Linux PM On Thu, Feb 20, 2020 at 12:45 PM Luigi Semenzato <semenzato@google.com> wrote: > > On Thu, Feb 20, 2020 at 11:09 AM Chris Murphy <lists@colorremedies.com> wrote: > > > > On Thu, Feb 20, 2020 at 10:16 AM Luigi Semenzato <semenzato@google.com> wrote: > > > > > > I think this is the right group for the memory issues. > > > > > > I suspect that the problem with failed allocations (ENOMEM) boils down > > > to the unreliability of the page allocator. In my experience, under > > > pressure (i.e. pages must be swapped out to be reclaimed) allocations > > > can fail even when in theory they should succeed. (I wish I were > > > wrong and that someone would convincingly correct me.) > > > > What is vm.swappiness set to on your system? A fellow Fedora > > contributor who has consistently reproduced what you describe, has > > discovered he has vm.swappiness=0, and even if it's set to 1, the > > problem no longer happens. And this is not a documented consequence of > > using a value of 0. > > I am using the default value of 60. > > A zero value should cause all file pages to be discarded before any > anonymous pages are swapped. I wonder if the fellow Fedora > contributor's workload has a lot of file pages, so that discarding > them is enough for the image allocator to succeed. In that case "sync; > echo 1 > /proc/sys/vm/drop_caches" would be a better way of achieving > the same result. (By the way, in my experiments I do that just before > hibernating.) He reports hibernation failure even if he drops caches beforehand. 
https://lists.fedoraproject.org/archives/list/desktop@lists.fedoraproject.org/message/XYWYF33RFVISVZTPYSJRRXP7TFXPV4GD/ -- Chris Murphy ^ permalink raw reply [flat|nested] 27+ messages in thread
* is hibernation usable? @ 2019-10-22 20:09 Luigi Semenzato 2019-10-22 20:57 ` Rafael J. Wysocki 0 siblings, 1 reply; 27+ messages in thread From: Luigi Semenzato @ 2019-10-22 20:09 UTC (permalink / raw) To: linux-kernel Cc: Linux PM, Andrew Morton, Geoff Pike, Bas Nowaira, Rafael J. Wysocki, Sonny Rao, Brian Geffon Following a thread in linux-pm (https://marc.info/?l=linux-mm&m=157012300901871) I have some issues that may be of general interest. 1. To the best of my knowledge, Linux hibernation is guaranteed to fail if more than 1/2 of total RAM is in use (for instance, by anonymous pages). My knowledge is based on evidence, experiments, code inspection, the thread above, and a comment in Documentation/swsusp.txt, copied here: "Instead, we load the image into unused memory and then atomically copy it back to it original location. This implies, of course, a maximum image size of half the amount of memory." 2. There's no simple/general workaround. Rafael suggested on the thread "Whatever doesn't fit into 50% of RAM needs to be swapped out before hibernation". This is a good suggestion: I am actually close to achieving this using memcgroups, but it's a fair amount of work, and a fairly special case. Not everybody uses memcgroups, and I don't know of other reliable ways of forcing swap from user level. 3. A feature that works only when 1/2 of total RAM can be allocated is, in my opinion, not usable, except possibly under special circumstances, such as mine. Most of the available articles and documentation do not mention this important fact (but for the excerpt I mentioned, which is not in a prominent position). Two questions then: A. Should the documentation be changed to reflect this fact more clearly? I feel that the current situation is a disservice to the user community. B. Would it be worthwhile to improve the hibernation code to remove this limitation? Is this of interest to anybody (other than me)? Thank you in advance! 
* Re: is hibernation usable? 2019-10-22 20:09 Luigi Semenzato @ 2019-10-22 20:57 ` Rafael J. Wysocki 2019-10-22 21:26 ` Luigi Semenzato 0 siblings, 1 reply; 27+ messages in thread From: Rafael J. Wysocki @ 2019-10-22 20:57 UTC (permalink / raw) To: Luigi Semenzato Cc: linux-kernel, Linux PM, Andrew Morton, Geoff Pike, Bas Nowaira, Rafael J. Wysocki, Sonny Rao, Brian Geffon On Tue, Oct 22, 2019 at 10:09 PM Luigi Semenzato <semenzato@google.com> wrote: > > Following a thread in linux-pm > (https://marc.info/?l=linux-mm&m=157012300901871) I have some issues > that may be of general interest. > > 1. To the best of my knowledge, Linux hibernation is guaranteed to > fail if more than 1/2 of total RAM is in use (for instance, by > anonymous pages). My knowledge is based on evidence, experiments, > code inspection, the thread above, and a comment in > Documentation/swsusp.txt, copied here: So I use it on a regular basis (i.e. every day) on a system that often has over 50% of RAM in use and it all works. I also know about other people using it on a regular basis. For all of these users, it is usable. > "Instead, we load the image into unused memory and then atomically > copy it back to it original location. This implies, of course, a > maximum image size of half the amount of memory." That isn't right any more. An image that is loaded during resume can, in fact, be larger than 50% of RAM. An image that is created during hibernation, however, cannot. > 2. There's no simple/general workaround. Rafael suggested on the > thread "Whatever doesn't fit into 50% of RAM needs to be swapped out > before hibernation". This is a good suggestion: I am actually close > to achieving this using memcgroups, but it's a fair amount of work, > and a fairly special case. Not everybody uses memcgroups, and I don't > know of other reliable ways of forcing swap from user level. I don't need to do anything like that. 
hibernate_preallocate_memory() manages to free a sufficient amount of memory on my system every time. > 3. A feature that works only when 1/2 of total RAM can be allocated > is, in my opinion, not usable, except possibly under special > circumstances, such as mine. Most of the available articles and > documentation do not mention this important fact (but for the excerpt > I mentioned, which is not in a prominent position). It can be used with over 1/2 of RAM allocated and that is quite easy to demonstrate. Honestly, I'm not sure what your problem is really. > Two questions then: > > A. Should the documentation be changed to reflect this fact more > clearly? I feel that the current situation is a disservice to the > user community. Propose changes. > B. Would it be worthwhile to improve the hibernation code to remove > this limitation? Is this of interest to anybody (other than me)? Again, propose specific changes. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: is hibernation usable? 2019-10-22 20:57 ` Rafael J. Wysocki @ 2019-10-22 21:26 ` Luigi Semenzato 2019-10-22 22:13 ` Rafael J. Wysocki 0 siblings, 1 reply; 27+ messages in thread From: Luigi Semenzato @ 2019-10-22 21:26 UTC (permalink / raw) To: Rafael J. Wysocki Cc: linux-kernel, Linux PM, Andrew Morton, Geoff Pike, Bas Nowaira, Sonny Rao, Brian Geffon Thank you for the quick reply! On Tue, Oct 22, 2019 at 1:57 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > On Tue, Oct 22, 2019 at 10:09 PM Luigi Semenzato <semenzato@google.com> wrote: > > > > Following a thread in linux-pm > > (https://marc.info/?l=linux-mm&m=157012300901871) I have some issues > > that may be of general interest. > > > > 1. To the best of my knowledge, Linux hibernation is guaranteed to > > fail if more than 1/2 of total RAM is in use (for instance, by > > anonymous pages). My knowledge is based on evidence, experiments, > > code inspection, the thread above, and a comment in > > Documentation/swsusp.txt, copied here: > > So I use it on a regular basis (i.e. every day) on a system that often > has over 50% or RAM in use and it all works. > > I also know about other people using it on a regular basis. > > For all of these users, it is usable. > > > "Instead, we load the image into unused memory and then atomically > > copy it back to it original location. This implies, of course, a > > maximum image size of half the amount of memory." > > That isn't right any more. An image that is loaded during resume can, > in fact, be larger than 50% of RAM. An image that is created during > hibernation, however, cannot. Sorry, I don't understand this. Are you saying that, for instance, you can resume a 30 GB image on a 32 GB device, but that image could only have been created on a 64 GB device? > > 2. There's no simple/general workaround. Rafael suggested on the > > thread "Whatever doesn't fit into 50% of RAM needs to be swapped out > > before hibernation". 
This is a good suggestion: I am actually close > > to achieving this using memcgroups, but it's a fair amount of work, > > and a fairly special case. Not everybody uses memcgroups, and I don't > > know of other reliable ways of forcing swap from user level. > > I don't need to do anything like that. Again, I don't understand. Why did you make that suggestion then? > hibernate_preallocate_memory() manages to free a sufficient amount of > memory on my system every time. Unfortunately this doesn't work for me. I may have described a simple experiment: on a 4GB device, create two large processes like this: dd if=/dev/zero bs=1100M count=1 | sleep infinity & dd if=/dev/zero bs=1100M count=1 | sleep infinity & so that more than 50% of TotalMem is used for anonymous pages. Then echo disk > /sys/power/state fails with ENOMEM. Is this supposed to work? Maybe I am doing something wrong? Hibernation works before I create the dd processes. After I force some of those pages to a separate swap device, hibernation works too, so those pages aren't mlocked or anything. > > 3. A feature that works only when 1/2 of total RAM can be allocated > > is, in my opinion, not usable, except possibly under special > > circumstances, such as mine. Most of the available articles and > > documentation do not mention this important fact (but for the excerpt > > I mentioned, which is not in a prominent position). > > It can be used with over 1/2 of RAM allocated and that is quite easy > to demonstrate. > > Honestly, I'm not sure what your problem is really. I apologize if I am doing something stupid and I should know better before I waste other people's time. I have been trying to explain these issues as best as I can. I have a reproducible failure. I'll be happy to provide any additional detail. > > > Two questions then: > > > > A. Should the documentation be changed to reflect this fact more > > clearly? I feel that the current situation is a disservice to the > > user community. 
> > Propose changes. Sure, after we resolve the above questions. > > B. Would it be worthwhile to improve the hibernation code to remove > > this limitation? Is this of interest to anybody (other than me)? > > Again, propose specific changes. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: is hibernation usable? 2019-10-22 21:26 ` Luigi Semenzato @ 2019-10-22 22:13 ` Rafael J. Wysocki 2019-10-22 22:53 ` Luigi Semenzato 0 siblings, 1 reply; 27+ messages in thread From: Rafael J. Wysocki @ 2019-10-22 22:13 UTC (permalink / raw) To: Luigi Semenzato Cc: Rafael J. Wysocki, linux-kernel, Linux PM, Andrew Morton, Geoff Pike, Bas Nowaira, Sonny Rao, Brian Geffon On Tue, Oct 22, 2019 at 11:26 PM Luigi Semenzato <semenzato@google.com> wrote: > > Thank you for the quick reply! > > On Tue, Oct 22, 2019 at 1:57 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > On Tue, Oct 22, 2019 at 10:09 PM Luigi Semenzato <semenzato@google.com> wrote: > > > > > > Following a thread in linux-pm > > > (https://marc.info/?l=linux-mm&m=157012300901871) I have some issues > > > that may be of general interest. > > > > > > 1. To the best of my knowledge, Linux hibernation is guaranteed to > > > fail if more than 1/2 of total RAM is in use (for instance, by > > > anonymous pages). My knowledge is based on evidence, experiments, > > > code inspection, the thread above, and a comment in > > > Documentation/swsusp.txt, copied here: > > > > So I use it on a regular basis (i.e. every day) on a system that often > > has over 50% or RAM in use and it all works. > > > > I also know about other people using it on a regular basis. > > > > For all of these users, it is usable. > > > > > "Instead, we load the image into unused memory and then atomically > > > copy it back to it original location. This implies, of course, a > > > maximum image size of half the amount of memory." > > > > That isn't right any more. An image that is loaded during resume can, > > in fact, be larger than 50% of RAM. An image that is created during > > hibernation, however, cannot. > > Sorry, I don't understand this. Are you saying that, for instance, > you can resume a 30 GB image on a 32 GB device, but that image could > only have been created on a 64 GB device? 
Had it been possible to create images larger than 50% of memory during hibernation, it would have been possible to load them during resume as well. The resume code doesn't have a 50% of RAM limitation, the image creation code does. > > > 2. There's no simple/general workaround. Rafael suggested on the > > > thread "Whatever doesn't fit into 50% of RAM needs to be swapped out > > > before hibernation". This is a good suggestion: I am actually close > > > to achieving this using memcgroups, but it's a fair amount of work, > > > and a fairly special case. Not everybody uses memcgroups, and I don't > > > know of other reliable ways of forcing swap from user level. > > > > I don't need to do anything like that. > > Again, I don't understand. Why did you make that suggestion then? > > > hibernate_preallocate_memory() manages to free a sufficient amount of > > memory on my system every time. > > Unfortunately this doesn't work for me. I may have described a simple > experiment: on a 4GB device, create two large processes like this: > > dd if=/dev/zero bs=1100M count=1 | sleep infinity & > dd if=/dev/zero bs=1100M count=1 | sleep infinity & > > so that more than 50% of TotalMem is used for anonymous pages. Then > echo disk > /sys/power/state fails with ENOMEM. I guess hibernate_preallocate_memory() is not able to free enough memory for itself in that case. > Is this supposed to work? Yes, it is, in general. > Maybe I am doing something wrong? > Hibernation works before I create the dd processes. After I force > some of those pages to a separate swap device, hibernation works too, > so those pages aren't mlocked or anything. It looks like you are doing something that is not covered by hibernate_preallocate_memory(). > > > 3. A feature that works only when 1/2 of total RAM can be allocated > > > is, in my opinion, not usable, except possibly under special > > > circumstances, such as mine. 
Most of the available articles and > > > documentation do not mention this important fact (but for the excerpt > > > I mentioned, which is not in a prominent position). > > > > It can be used with over 1/2 of RAM allocated and that is quite easy > > to demonstrate. > > > > Honestly, I'm not sure what your problem is really. > > I apologize if I am doing something stupid and I should know better > before I waste other people's time. I have been trying to explain > these issues as best as I can. I have a reproducible failure. I'll > be happy to provide any additional detail. Simply put, hibernation, as implemented today, needs to allocate over 50% of RAM (or at least enough to copy all of the non-free pages) for image creation. If it cannot do that, it will fail, and you know how to prevent it from allocating enough memory in a reproducible way. AFAICS that's a situation in which every attempt to allocate 50% of memory for any other purpose will fail as well. Frankly, you are the first to report this problem, so it arguably is not common. It looks like hibernate_preallocate_memory() may be improved to cover that case, but then the question is how much more complicated it will have to become for this purpose and whether or not that's worth pursuing. > > > > > Two questions then: > > > > > > A. Should the documentation be changed to reflect this fact more > > > clearly? I feel that the current situation is a disservice to the > > > user community. > > > > Propose changes. > > Sure, after we resolve the above questions. > > > > B. Would it be worthwhile to improve the hibernation code to remove > > > this limitation? Is this of interest to anybody (other than me)? > > > > Again, propose specific changes. ^ permalink raw reply [flat|nested] 27+ messages in thread
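[Editor's note] Rafael's constraint reduces to simple arithmetic: the atomic-copy scheme needs one spare page frame for every saveable page, so image creation fits only while saveable pages stay at or below half of RAM. The sketch below is a deliberate simplification of what hibernate_preallocate_memory() actually computes; the function names and the 600 MB baseline figure are illustrative assumptions, not kernel API or measurements from the thread.

```python
def image_creation_fits(total_pages: int, saveable_pages: int) -> bool:
    """The atomic copy needs one free frame per saveable page, so the
    image can only be created while saveable <= total / 2."""
    return saveable_pages <= total_pages - saveable_pages

def pages_to_swap_out(total_pages: int, saveable_pages: int) -> int:
    """How much must be pushed to swap beforehand, per Rafael's
    'whatever doesn't fit into 50% of RAM needs to be swapped out'."""
    return max(0, 2 * saveable_pages - total_pages)

# Luigi's 4 GB experiment: two ~1100 MB dd buffers plus an assumed
# ~600 MB baseline push saveable pages well past 50% of RAM.
total = 4 * 1024 * 1024 // 4             # 4 GB in 4 KiB pages
saveable = (2 * 1100 + 600) * 1024 // 4  # anon pages, in 4 KiB pages
print(image_creation_fits(total, saveable))  # False: ENOMEM expected
print(pages_to_swap_out(total, saveable))    # ~1.5 GB worth of pages
```

Under this model the dd experiment fails exactly as reported, and freeing it takes swapping out roughly 2*saveable - total pages, which matches Luigi's observation that moving some of those pages to a separate swap device makes hibernation work again.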
* Re: is hibernation usable? 2019-10-22 22:13 ` Rafael J. Wysocki @ 2019-10-22 22:53 ` Luigi Semenzato 2019-10-22 23:16 ` Rafael J. Wysocki 0 siblings, 1 reply; 27+ messages in thread From: Luigi Semenzato @ 2019-10-22 22:53 UTC (permalink / raw) To: Rafael J. Wysocki Cc: linux-kernel, Linux PM, Andrew Morton, Geoff Pike, Bas Nowaira, Sonny Rao, Brian Geffon On Tue, Oct 22, 2019 at 3:14 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > On Tue, Oct 22, 2019 at 11:26 PM Luigi Semenzato <semenzato@google.com> wrote: > > > > Thank you for the quick reply! > > > > On Tue, Oct 22, 2019 at 1:57 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > > > On Tue, Oct 22, 2019 at 10:09 PM Luigi Semenzato <semenzato@google.com> wrote: > > > > > > > > Following a thread in linux-pm > > > > (https://marc.info/?l=linux-mm&m=157012300901871) I have some issues > > > > that may be of general interest. > > > > > > > > 1. To the best of my knowledge, Linux hibernation is guaranteed to > > > > fail if more than 1/2 of total RAM is in use (for instance, by > > > > anonymous pages). My knowledge is based on evidence, experiments, > > > > code inspection, the thread above, and a comment in > > > > Documentation/swsusp.txt, copied here: > > > > > > So I use it on a regular basis (i.e. every day) on a system that often > > > has over 50% or RAM in use and it all works. > > > > > > I also know about other people using it on a regular basis. > > > > > > For all of these users, it is usable. > > > > > > > "Instead, we load the image into unused memory and then atomically > > > > copy it back to it original location. This implies, of course, a > > > > maximum image size of half the amount of memory." > > > > > > That isn't right any more. An image that is loaded during resume can, > > > in fact, be larger than 50% of RAM. An image that is created during > > > hibernation, however, cannot. > > > > Sorry, I don't understand this. 
Are you saying that, for instance, > > you can resume a 30 GB image on a 32 GB device, but that image could > > only have been created on a 64 GB device? > > Had it been possible to create images larger than 50% of memory during > hibernation, it would have been possible to load them during resume as > well. > > The resume code doesn't have a 50% of RAM limitation, the image > creation code does. Thanks a lot for the clarifications. It is possible that you and I have different definitions of "working in general". My main issue is that I would like image creation (i.e. entering hibernation) to work with >50% of RAM in use, and I am extrapolating that other people would like that too. I can see that there are many uses where this is not needed though, especially if you mostly care about resume. > > > > > 2. There's no simple/general workaround. Rafael suggested on the > > > > thread "Whatever doesn't fit into 50% of RAM needs to be swapped out > > > > before hibernation". This is a good suggestion: I am actually close > > > > to achieving this using memcgroups, but it's a fair amount of work, > > > > and a fairly special case. Not everybody uses memcgroups, and I don't > > > > know of other reliable ways of forcing swap from user level. > > > > > > I don't need to do anything like that. > > > > Again, I don't understand. Why did you make that suggestion then? > > > > > hibernate_preallocate_memory() manages to free a sufficient amount of > > > memory on my system every time. > > > > Unfortunately this doesn't work for me.
> > > Is this supposed to work? > > Yes, it is, in general. > > > Maybe I am doing something wrong? > > Hibernation works before I create the dd processes. After I force > > some of those pages to a separate swap device, hibernation works too, > > so those pages aren't mlocked or anything. > > It looks like you are doing something that is not covered by > hibernate_preallocate_memory(). > > > > > 3. A feature that works only when 1/2 of total RAM can be allocated > > > > is, in my opinion, not usable, except possibly under special > > > > circumstances, such as mine. Most of the available articles and > > > > documentation do not mention this important fact (but for the excerpt > > > > I mentioned, which is not in a prominent position). > > > > > > It can be used with over 1/2 of RAM allocated and that is quite easy > > > to demonstrate. > > > > > > Honestly, I'm not sure what your problem is really. > > > > I apologize if I am doing something stupid and I should know better > > before I waste other people's time. I have been trying to explain > > these issues as best as I can. I have a reproducible failure. I'll > > be happy to provide any additional detail. > > Simply put, hibernation, as implemented today, needs to allocate over > 50% of RAM (or at least as much as to be able to copy all of the > non-free pages) for image creation. If it cannot do that, it will > fail and you know how to prevent it from allocating enough memory in a > reproducible way. AFAICS that's a situation in which every attempt to > allocate 50% of memory for any other purpose will fail as well. > > Frankly, you are first to report this problem, so it arguably is not > common. It looks like hibernate_preallocate_memory() may be improved > to cover that case, but then the question is how much more complicated > it will have to become for this purpose and whether or not that's > worth pursuing. Right. I was hoping to discuss that. 
Is it easier to do in the kernel what I am trying to do at user level, i.e. force swap of excess pages (possibly to a separate device or partition) so that enough pages are freed up to make hibernate_preallocate_memory() always succeed? I started reading the swap code, but it is entangled with page reclaim and I haven't seen a simple solution, nor do I know if there is one and how long it would take to find it, or code around it. (However I haven't looked yet at how it works when memcgroup limits are lowered---that may give me good ideas). Thanks! > > > > > > > > Two questions then: > > > > > > > > A. Should the documentation be changed to reflect this fact more > > > > clearly? I feel that the current situation is a disservice to the > > > > user community. > > > > > > Propose changes. > > > > Sure, after we resolve the above questions. > > > > > > B. Would it be worthwhile to improve the hibernation code to remove > > > > this limitation? Is this of interest to anybody (other than me)? > > > > > > Again, propose specific changes. ^ permalink raw reply [flat|nested] 27+ messages in thread
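[Editor's note] The user-level approach Luigi mentions, forcing swap-out by lowering a memcg limit, can be sketched as a pre-hibernation hook. The cgroup path, the use of the v1 memory.limit_in_bytes interface, and the 45% target are all assumptions for illustration (and this needs root); the thread does not describe a specific implementation.

```shell
#!/bin/sh
# Sketch: shrink a memory cgroup's limit so its excess anonymous pages
# are reclaimed to swap before image creation is attempted.
CG=/sys/fs/cgroup/memory/workload          # hypothetical pre-created cgroup
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
target_kb=$((total_kb * 45 / 100))         # leave headroom under the 50% line
echo "shrinking limit to ${target_kb} kB"
if [ -w "$CG/memory.limit_in_bytes" ]; then
    # Lowering the limit forces reclaim; with swap available, anon pages
    # in the group are swapped out until usage fits under the new limit.
    echo $((target_kb * 1024)) > "$CG/memory.limit_in_bytes"
    # echo disk > /sys/power/state         # then image creation should fit
fi
```

This mirrors what lowering memcgroup limits does in general: the memcg reclaim path will push anonymous pages to swap to satisfy the new limit, which is exactly the "force swap of excess pages" behavior being asked about.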
* Re: is hibernation usable? 2019-10-22 22:53 ` Luigi Semenzato @ 2019-10-22 23:16 ` Rafael J. Wysocki 2019-10-22 23:25 ` Luigi Semenzato 0 siblings, 1 reply; 27+ messages in thread From: Rafael J. Wysocki @ 2019-10-22 23:16 UTC (permalink / raw) To: Luigi Semenzato Cc: Rafael J. Wysocki, linux-kernel, Linux PM, Andrew Morton, Geoff Pike, Bas Nowaira, Sonny Rao, Brian Geffon On Wed, Oct 23, 2019 at 12:53 AM Luigi Semenzato <semenzato@google.com> wrote: > > On Tue, Oct 22, 2019 at 3:14 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > On Tue, Oct 22, 2019 at 11:26 PM Luigi Semenzato <semenzato@google.com> wrote: > > > > > > Thank you for the quick reply! > > > > > > On Tue, Oct 22, 2019 at 1:57 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > > > > > On Tue, Oct 22, 2019 at 10:09 PM Luigi Semenzato <semenzato@google.com> wrote: > > > > > > > > > > Following a thread in linux-pm > > > > > (https://marc.info/?l=linux-mm&m=157012300901871) I have some issues > > > > > that may be of general interest. > > > > > > > > > > 1. To the best of my knowledge, Linux hibernation is guaranteed to > > > > > fail if more than 1/2 of total RAM is in use (for instance, by > > > > > anonymous pages). My knowledge is based on evidence, experiments, > > > > > code inspection, the thread above, and a comment in > > > > > Documentation/swsusp.txt, copied here: > > > > > > > > So I use it on a regular basis (i.e. every day) on a system that often > > > > has over 50% or RAM in use and it all works. > > > > > > > > I also know about other people using it on a regular basis. > > > > > > > > For all of these users, it is usable. > > > > > > > > > "Instead, we load the image into unused memory and then atomically > > > > > copy it back to it original location. This implies, of course, a > > > > > maximum image size of half the amount of memory." > > > > > > > > That isn't right any more. An image that is loaded during resume can, > > > > in fact, be larger than 50% of RAM. 
An image that is created during > > > > hibernation, however, cannot. > > > > > > Sorry, I don't understand this. Are you saying that, for instance, > > > you can resume a 30 GB image on a 32 GB device, but that image could > > > only have been created on a 64 GB device? > > > > Had it been possible to create images larger than 50% of memory during > > hibernation, it would have been possible to load them during resume as > > well. > > > > The resume code doesn't have a 50% of RAM limitation, the image > > creation code does. > > Thanks a lot for the clarifications. > > It is possible that you and I have different definitions of "working > in general". My main issue ia that I would like image creation (i.e. > entering hibernation) to work with >50% of RAM in use, and I am > extrapolating that other people would like that too. I can see that > there are many uses where this is not needed though, especially if you > mostly care about resume. Also note that you need to be precise about what ">50% of RAM in use" means. For example, AFAICS hibernation works just fine for many cases in which MemFree is way below 50% of MemTotal. > > > > > > > 2. There's no simple/general workaround. Rafael suggested on the > > > > > thread "Whatever doesn't fit into 50% of RAM needs to be swapped out > > > > > before hibernation". This is a good suggestion: I am actually close > > > > > to achieving this using memcgroups, but it's a fair amount of work, > > > > > and a fairly special case. Not everybody uses memcgroups, and I don't > > > > > know of other reliable ways of forcing swap from user level. > > > > > > > > I don't need to do anything like that. > > > > > > Again, I don't understand. Why did you make that suggestion then? > > > > > > > hibernate_preallocate_memory() manages to free a sufficient amount of > > > > memory on my system every time. > > > > > > Unfortunately this doesn't work for me. 
I may have described a simple > > > experiment: on a 4GB device, create two large processes like this: > > > > > > dd if=/dev/zero bs=1100M count=1 | sleep infinity & > > > dd if=/dev/zero bs=1100M count=1 | sleep infinity & > > > > > > so that more than 50% of TotalMem is used for anonymous pages. Then > > > echo disk > /sys/power/state fails with ENOMEM. > > > > I guess hibernate_preallocate_memory() is not able to free enough > > memory for itself in that case. > > > > > Is this supposed to work? > > > > Yes, it is, in general. > > > > > Maybe I am doing something wrong? > > > Hibernation works before I create the dd processes. After I force > > > some of those pages to a separate swap device, hibernation works too, > > > so those pages aren't mlocked or anything. > > > > It looks like you are doing something that is not covered by > > hibernate_preallocate_memory(). > > > > > > > 3. A feature that works only when 1/2 of total RAM can be allocated > > > > > is, in my opinion, not usable, except possibly under special > > > > > circumstances, such as mine. Most of the available articles and > > > > > documentation do not mention this important fact (but for the excerpt > > > > > I mentioned, which is not in a prominent position). > > > > > > > > It can be used with over 1/2 of RAM allocated and that is quite easy > > > > to demonstrate. > > > > > > > > Honestly, I'm not sure what your problem is really. > > > > > > I apologize if I am doing something stupid and I should know better > > > before I waste other people's time. I have been trying to explain > > > these issues as best as I can. I have a reproducible failure. I'll > > > be happy to provide any additional detail. > > > > Simply put, hibernation, as implemented today, needs to allocate over > > 50% of RAM (or at least as much as to be able to copy all of the > > non-free pages) for image creation. 
If it cannot do that, it will > > fail and you know how to prevent it from allocating enough memory in a > > reproducible way. AFAICS that's a situation in which every attempt to > > allocate 50% of memory for any other purpose will fail as well. > > > > Frankly, you are first to report this problem, so it arguably is not > > common. It looks like hibernate_preallocate_memory() may be improved > > to cover that case, but then the question is how much more complicated > > it will have to become for this purpose and whether or not that's > > worth pursuing. > > Right. I was hoping to discuss that. Is it easier to do in the > kernel what I am trying to do at user level, i.e. force swap of excess > pages (possibly to a separate device or partition) so that enough > pages are freed up to make hibernate_preallocate_memory always > succeed? It should at least be possible to do that, but it's been a while since I last looked at hibernate_preallocate_memory() etc. > I started reading the swap code, but it is entangled with > page reclaim and I haven't seen a simple solution, neither do I know > if there is one and how long it would take to find it, or code around > it. (However I haven't looked yet at how it works when memcgroup > limits are lowered---that may give me good ideas). ^ permalink raw reply [flat|nested] 27+ messages in thread
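[Editor's note] Rafael's point that ">50% of RAM in use" needs a precise definition (MemFree alone is misleading, since page cache is droppable) can be checked roughly from userspace before attempting hibernation. Using AnonPages plus unreclaimable slab as a stand-in for saveable pages is a crude assumption; the authoritative accounting is in the kernel's snapshot code.

```shell
#!/bin/sh
# Rough pre-hibernation check: what matters is roughly the share of
# memory the kernel cannot simply drop, not MemFree by itself.
eval "$(awk '/^MemTotal:|^AnonPages:|^SUnreclaim:/ {
    gsub(":", "", $1); print $1 "=" $2
}' /proc/meminfo)"
hard_kb=$((AnonPages + SUnreclaim))
half_kb=$((MemTotal / 2))
echo "hard-to-free: ${hard_kb} kB, half of RAM: ${half_kb} kB"
if [ "$hard_kb" -gt "$half_kb" ]; then
    echo "image creation likely to fail; swap out ~$((hard_kb - half_kb)) kB first"
else
    echo "image creation likely to fit"
fi
```

On a system like Rafael's, this heuristic would report "fit" even with MemFree far below 50% of MemTotal, while Luigi's dd experiment would trip the warning, which is consistent with both observed behaviors.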
* Re: is hibernation usable? 2019-10-22 23:16 ` Rafael J. Wysocki @ 2019-10-22 23:25 ` Luigi Semenzato 0 siblings, 0 replies; 27+ messages in thread From: Luigi Semenzato @ 2019-10-22 23:25 UTC (permalink / raw) To: Rafael J. Wysocki Cc: linux-kernel, Linux PM, Andrew Morton, Geoff Pike, Bas Nowaira, Sonny Rao, Brian Geffon On Tue, Oct 22, 2019 at 4:16 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > On Wed, Oct 23, 2019 at 12:53 AM Luigi Semenzato <semenzato@google.com> wrote: > > > > On Tue, Oct 22, 2019 at 3:14 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > > > On Tue, Oct 22, 2019 at 11:26 PM Luigi Semenzato <semenzato@google.com> wrote: > > > > > > > > Thank you for the quick reply! > > > > > > > > On Tue, Oct 22, 2019 at 1:57 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > > > > > > > On Tue, Oct 22, 2019 at 10:09 PM Luigi Semenzato <semenzato@google.com> wrote: > > > > > > > > > > > > Following a thread in linux-pm > > > > > > (https://marc.info/?l=linux-mm&m=157012300901871) I have some issues > > > > > > that may be of general interest. > > > > > > > > > > > > 1. To the best of my knowledge, Linux hibernation is guaranteed to > > > > > > fail if more than 1/2 of total RAM is in use (for instance, by > > > > > > anonymous pages). My knowledge is based on evidence, experiments, > > > > > > code inspection, the thread above, and a comment in > > > > > > Documentation/swsusp.txt, copied here: > > > > > > > > > > So I use it on a regular basis (i.e. every day) on a system that often > > > > > has over 50% or RAM in use and it all works. > > > > > > > > > > I also know about other people using it on a regular basis. > > > > > > > > > > For all of these users, it is usable. > > > > > > > > > > > "Instead, we load the image into unused memory and then atomically > > > > > > copy it back to it original location. This implies, of course, a > > > > > > maximum image size of half the amount of memory." > > > > > > > > > > That isn't right any more. 
An image that is loaded during resume can, > > > > > in fact, be larger than 50% of RAM. An image that is created during > > > > > hibernation, however, cannot. > > > > > > > > Sorry, I don't understand this. Are you saying that, for instance, > > > > you can resume a 30 GB image on a 32 GB device, but that image could > > > > only have been created on a 64 GB device? > > > > > > Had it been possible to create images larger than 50% of memory during > > > hibernation, it would have been possible to load them during resume as > > > well. > > > > > > The resume code doesn't have a 50% of RAM limitation, the image > > > creation code does. > > > > Thanks a lot for the clarifications. > > > > It is possible that you and I have different definitions of "working > > in general". My main issue ia that I would like image creation (i.e. > > entering hibernation) to work with >50% of RAM in use, and I am > > extrapolating that other people would like that too. I can see that > > there are many uses where this is not needed though, especially if you > > mostly care about resume. > > Also note that you need to be precise about what ">50% of RAM in use" > means. For example, AFAICS hibernation works just fine for many cases > in which MemFree is way below 50% of MemTotal. Yes, I agree, that's tricky to explain. Of course here I mean the number of "saveable" pages, as defined in hibernate.c, and clearly anon pages are always saveable. > > > > > > > > > 2. There's no simple/general workaround. Rafael suggested on the > > > > > > thread "Whatever doesn't fit into 50% of RAM needs to be swapped out > > > > > > before hibernation". This is a good suggestion: I am actually close > > > > > > to achieving this using memcgroups, but it's a fair amount of work, > > > > > > and a fairly special case. Not everybody uses memcgroups, and I don't > > > > > > know of other reliable ways of forcing swap from user level. > > > > > > > > > > I don't need to do anything like that. 
> > > > > > > > Again, I don't understand. Why did you make that suggestion then? > > > > > > > > > hibernate_preallocate_memory() manages to free a sufficient amount of > > > > > memory on my system every time. > > > > > > > > Unfortunately this doesn't work for me. I may have described a simple > > > > experiment: on a 4GB device, create two large processes like this: > > > > > > > > dd if=/dev/zero bs=1100M count=1 | sleep infinity & > > > > dd if=/dev/zero bs=1100M count=1 | sleep infinity & > > > > > > > > so that more than 50% of TotalMem is used for anonymous pages. Then > > > > echo disk > /sys/power/state fails with ENOMEM. > > > > > > I guess hibernate_preallocate_memory() is not able to free enough > > > memory for itself in that case. > > > > > > > Is this supposed to work? > > > > > > Yes, it is, in general. > > > > > > > Maybe I am doing something wrong? > > > > Hibernation works before I create the dd processes. After I force > > > > some of those pages to a separate swap device, hibernation works too, > > > > so those pages aren't mlocked or anything. > > > > > > It looks like you are doing something that is not covered by > > > hibernate_preallocate_memory(). > > > > > > > > > 3. A feature that works only when 1/2 of total RAM can be allocated > > > > > > is, in my opinion, not usable, except possibly under special > > > > > > circumstances, such as mine. Most of the available articles and > > > > > > documentation do not mention this important fact (but for the excerpt > > > > > > I mentioned, which is not in a prominent position). > > > > > > > > > > It can be used with over 1/2 of RAM allocated and that is quite easy > > > > > to demonstrate. > > > > > > > > > > Honestly, I'm not sure what your problem is really. > > > > > > > > I apologize if I am doing something stupid and I should know better > > > > before I waste other people's time. I have been trying to explain > > > > these issues as best as I can. I have a reproducible failure. 
I'll > > > > be happy to provide any additional detail. > > > > > > Simply put, hibernation, as implemented today, needs to allocate over > > > 50% of RAM (or at least as much as to be able to copy all of the > > > non-free pages) for image creation. If it cannot do that, it will > > > fail and you know how to prevent it from allocating enough memory in a > > > reproducible way. AFAICS that's a situation in which every attempt to > > > allocate 50% of memory for any other purpose will fail as well. > > > > > > Frankly, you are first to report this problem, so it arguably is not > > > common. It looks like hibernate_preallocate_memory() may be improved > > > to cover that case, but then the question is how much more complicated > > > it will have to become for this purpose and whether or not that's > > > worth pursuing. > > > > Right. I was hoping to discuss that. Is it easier to do in the > > kernel what I am trying to do at user level, i.e. force swap of excess > > pages (possibly to a separate device or partition) so that enough > > pages are freed up to make hibernate_preallocate_memory always > > succeed? > > It should at least be possible to do that, but it's been a while since > I last looked at hibernate_preallocate_memory() etc. > > > I started reading the swap code, but it is entangled with > > page reclaim and I haven't seen a simple solution, neither do I know > > if there is one and how long it would take to find it, or code around > > it. (However I haven't looked yet at how it works when memcgroup > > limits are lowered---that may give me good ideas). ^ permalink raw reply [flat|nested] 27+ messages in thread
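[Editor's note] The sizing consequence of the 50% creation limit, which is what the swap-sizing question at the top of the thread works through, can be made explicit: swap must hold both the pages evicted to get under the limit and the image written alongside them. The 2:1 compression ratio below is an assumption, not a guarantee.

```python
def swap_needed_gb(ram_gb: float, in_use_gb: float,
                   already_swapped_gb: float = 0.0,
                   compression: float = 0.5) -> float:
    """Worst-case swap footprint for one hibernation cycle:
    pages evicted to get saveable usage under ram/2, plus the
    (compressed) image written alongside them."""
    evict = max(0.0, in_use_gb - ram_gb / 2)   # swapped out pre-image
    image = min(in_use_gb - evict, ram_gb / 2) * compression
    return already_swapped_gb + evict + image

# 32 GB box, all of it in use, 2 GB already paged out, ~2:1 compression:
print(swap_needed_gb(32, 32, 2))  # 26.0 -> at least a 26 GB swap device
```

With no compression (compression=1.0) the same case needs 34 GB, bracketing the 26G-34G range quoted earlier in the thread.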
end of thread, other threads:[~2020-02-27 6:44 UTC | newest] Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-02-11 19:50 is hibernation usable? Chris Murphy 2020-02-11 22:23 ` Luigi Semenzato 2020-02-20 2:54 ` Chris Murphy 2020-02-20 2:56 ` Chris Murphy 2020-02-20 17:16 ` Luigi Semenzato 2020-02-20 17:38 ` Luigi Semenzato 2020-02-21 8:49 ` Michal Hocko 2020-02-21 9:04 ` Rafael J. Wysocki 2020-02-21 9:04 ` Rafael J. Wysocki 2020-02-21 9:36 ` Michal Hocko 2020-02-21 17:13 ` Luigi Semenzato 2020-02-21 17:13 ` Luigi Semenzato 2020-02-21 9:46 ` Chris Murphy 2020-02-21 9:46 ` Chris Murphy 2020-02-20 19:09 ` Chris Murphy 2020-02-20 19:44 ` Luigi Semenzato 2020-02-20 21:48 ` Chris Murphy 2020-02-20 21:48 ` Chris Murphy 2020-02-27 6:43 ` Chris Murphy 2020-02-27 6:43 ` Chris Murphy -- strict thread matches above, loose matches on Subject: below -- 2019-10-22 20:09 Luigi Semenzato 2019-10-22 20:57 ` Rafael J. Wysocki 2019-10-22 21:26 ` Luigi Semenzato 2019-10-22 22:13 ` Rafael J. Wysocki 2019-10-22 22:53 ` Luigi Semenzato 2019-10-22 23:16 ` Rafael J. Wysocki 2019-10-22 23:25 ` Luigi Semenzato