* Re: is hibernation usable?
@ 2020-02-11 19:50 Chris Murphy
  2020-02-11 22:23 ` Luigi Semenzato
  0 siblings, 1 reply; 27+ messages in thread
From: Chris Murphy @ 2020-02-11 19:50 UTC (permalink / raw)
  To: linux-mm; +Cc: semenzato

Original thread:
https://lore.kernel.org/linux-mm/CAA25o9RSWPX8L3s=r6A+4oSdQyvGfWZ1bhKfGvSo5nN-X58HQA@mail.gmail.com/

This whole thread is a revelation. I have no doubt most users have no
idea that hibernation image creation is expected to fail if more than
50% of RAM is in use. Please bear with me while I ask some possibly
rudimentary questions to ensure I understand this in simple terms.

Example system: 32 GiB RAM, all of it in use, plus 2 GiB of page outs
(into the swap device).

+ 2 GiB already paged out to swap
+ 16 GiB that needs to be paged out to swap, to free up enough memory
to create the hibernation image
+ 8-16 GiB for the (compressed) hibernation image, written to a
*contiguous* range within the swap device

This suggests a 26-34 GiB swap device, correct? (I realize that this
swap device could, in another example, contain more than 2 GiB of
page outs already, and that would only increase the requirement.)
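
(In other words, the general sizing rule seems to be: swap >= pages
already swapped out + (RAM in use - RAM/2) + compressed image size,
where the image is roughly RAM/4 to RAM/2.)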

Is there now (or planned) a kernel facility that performs this
eviction automatically, freeing enough memory so that the hibernation
image can always be successfully created in memory? If not, does this
suggest some facility needs to be created, maybe in systemd,
coordinating with the desktop environment? I don't need to understand
the details, but I do want to understand whether this exists, will
exist, and where it will exist.

One idea floated on Fedora devel@ a few months ago by a systemd
developer is to activate a swap device at hibernation time. That way
the system is constrained to a smaller swap device during normal use,
e.g. swap on /dev/zram, but can still hibernate by activating a
suitably sized swap device on demand. Do you anticipate any problems
with this idea? Could it be subject to race conditions?

Is there any difference in hibernation reliability between swap
partitions and swapfiles? I note there isn't a standard interface for
all file systems; notably, Btrfs has a unique requirement [1].

Are there any prospects for signed hibernation images, in order to
support hibernation when UEFI Secure Boot is enabled?

What about the signing of swap? If there's a trust concern with the
hibernation image, and I agree that there is in the context of UEFI
SB, then it seems there's likewise a concern about active pages in
swap. Yes? No?


[1]
https://lore.kernel.org/linux-btrfs/CAJCQCtSLYY-AY8b1WZ1D4neTrwMsm_A61-G-8e6-H3Dmfue_vQ@mail.gmail.com/

Thanks!

--
Chris Murphy



* Re: is hibernation usable?
  2020-02-11 19:50 is hibernation usable? Chris Murphy
@ 2020-02-11 22:23 ` Luigi Semenzato
  2020-02-20  2:54   ` Chris Murphy
  0 siblings, 1 reply; 27+ messages in thread
From: Luigi Semenzato @ 2020-02-11 22:23 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Linux Memory Management List

On Tue, Feb 11, 2020 at 11:50 AM Chris Murphy <lists@colorremedies.com> wrote:
>
> Original thread:
> https://lore.kernel.org/linux-mm/CAA25o9RSWPX8L3s=r6A+4oSdQyvGfWZ1bhKfGvSo5nN-X58HQA@mail.gmail.com/
>
> This whole thread is a revelation. I have no doubt most users have no
> idea that hibernation image creation is expected to fail if more than
> 50% RAM is used. Please bear with me while I ask some possibly
> rudimentary questions to ensure I understand this in simple terms.

To be clear, I am not completely sure of this.  Other developers do
not agree (as you can see from the thread).  However, I can easily
and consistently reproduce the memory allocation failure when anon is
>50% of total.  According to others, the image allocation should
reclaim pages by forcing anon pages out to swap.  I don't understand
if/how the swap partition accommodates both swapped pages and the
hibernation image, but in any case, in my experiments I allocate a
swap disk the same size as RAM, which should be sufficient (again,
according to the threads).

> Example system: 32G RAM, all of it used, plus 2G of page outs (into
> the swap device).
>
> + 2G already paged out to swap
> + 16GB needs to be paged out to swap, to free up enough memory to
> create the hibernation image
> + 8-16GB for the (compressed) hibernation image to be written to a
> *contiguous* range within swap device
>
> This suggests a 26G-34G swap device, correct? (I realize that this
> swap device could, in another example, contain more than 2G of page
> outs already, and that would only increase this requirement.)
>
> Is there now (or planned) an automatic kernel facility that will do
> the eviction automatically, to free up enough memory, so that the
> hibernation image can always be successfully created in-memory? If
> not, does this suggest some facility needs to be created, maybe in
> systemd, coordinating with the desktop environment? I don't need to
> understand the details but I do want to understand if this exists,
> will exist, and where it will exist.

I have a workaround, but it needs memcgroups.  You can

echo $limit > .../$cgroup/memory.limit_in_bytes

and if your current usage is greater than $limit, and you have swap,
the operation will block until enough pages have been swapped out to
satisfy the limit.

Even with enough free swap, this isn't guaranteed to work.  The limit
adjustment invokes mem_cgroup_resize_limit(), which contains a loop
with multiple retries of a call to do_try_to_free_pages().  The
number of retries looks like a heuristic, and I've seen the resizing
fail.
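
Roughly, the pre-hibernation squeeze looks like this (a sketch for
cgroup v1; the cgroup name and the 40%-of-RAM target are
illustrative):

cg=/sys/fs/cgroup/memory/prehibernate  # assumed to hold the big users
# Lowering the limit forces anon pages in the group out to swap; the
# write blocks while reclaim runs, and fails (EBUSY) if the kernel's
# internal retries are exhausted.
echo $((40 * $(getconf _PHYS_PAGES) * $(getconf PAGE_SIZE) / 100)) \
    > "$cg/memory.limit_in_bytes"
echo disk > /sys/power/state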




> One idea floated on Fedora devel@ a few months ago by a systemd
> developer, is to activate a swap device at hibernation time. That way
> the system is constrained to a smaller swap device, e.g. swap on
> /dev/zram during normal use, but can still hibernate by activating a
> suitably sized swap device on-demand. Do you anticipate any problems
> with this idea? Could it be subject to race conditions?
>
> Is there any difference in hibernation reliability between swap
> partitions, versus swapfiles? I note there isn't a standard interface
> for all file systems, notably Btrfs has a unique requirement [1]
>
> Are there any prospects for signed hibernation images, in order to
> support hibernation when UEFI Secure Boot is enabled?
>
> What about the signing of swap? If there's a trust concern with the
> hibernation image, and I agree that there is in the context of UEFI
> SB, then it seems there's likewise a concern about active pages in
> swap. Yes? No?
>
>
> [1]
> https://lore.kernel.org/linux-btrfs/CAJCQCtSLYY-AY8b1WZ1D4neTrwMsm_A61-G-8e6-H3Dmfue_vQ@mail.gmail.com/
>
> Thanks!
>
> --
> Chris Murphy



* Re: is hibernation usable?
  2020-02-11 22:23 ` Luigi Semenzato
@ 2020-02-20  2:54   ` Chris Murphy
  2020-02-20  2:56     ` Chris Murphy
  0 siblings, 1 reply; 27+ messages in thread
From: Chris Murphy @ 2020-02-20  2:54 UTC (permalink / raw)
  To: Luigi Semenzato; +Cc: Linux Memory Management List

On Tue, Feb 11, 2020 at 3:23 PM Luigi Semenzato <semenzato@google.com> wrote:
>
> On Tue, Feb 11, 2020 at 11:50 AM Chris Murphy <lists@colorremedies.com> wrote:
> >
> > Original thread:
> > https://lore.kernel.org/linux-mm/CAA25o9RSWPX8L3s=r6A+4oSdQyvGfWZ1bhKfGvSo5nN-X58HQA@mail.gmail.com/
> >
> > This whole thread is a revelation. I have no doubt most users have no
> > idea that hibernation image creation is expected to fail if more than
> > 50% RAM is used. Please bear with me while I ask some possibly
> > rudimentary questions to ensure I understand this in simple terms.
>
> To be clear, I am not completely sure of this.  Other developers are
> not in agreement with this (as you can see from the thread).  However,
> I can easily and consistently reproduce the memory allocation failure
> when anon is >50% of total.  According to others, the image allocation
> should reclaim pages by forcing anon pages to swap.  I don't
> understand if/how the swap partition accommodates both swapped pages
> and the hibernation image, but in any case, in my experiments, I
> allocate a swap disk the same size of RAM, which should be sufficient
> (again, according to the threads).

I'm testing with this method:

# echo reboot > /sys/power/disk
# echo disk > /sys/power/state

About two-thirds of the time on a test system, hibernation entry
fails, and the failure is fatal. The last journal entry is:
[  349.732372] PM: hibernation: hibernation entry

The screen is blank, the system gets hot, the fans go to high, and it
doesn't recover after 15 minutes. After forcing power off and
rebooting, there is no hibernation signature reported in the swap
partition, so I don't think the kernel ever reached reboot.
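
(One way to check: blkid reports TYPE="swsuspend" rather than
TYPE="swap" on a swap device that contains a hibernation image.)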

Shifting over to qemu-kvm with PM support enabled, this works. If I
fill up pretty much all of RAM, with a small amount of swap in use,
the above two commands succeed, the VM reboots, and the hibernation
image is resumed without error. AnonPages is 73% of total. Upon
successful resume, it appears quite a lot of pages were pushed to
swap; it looks like about 1 GiB was paged out.

Before hibernation:
$ cat /proc/meminfo
MemTotal:        2985944 kB
MemFree:          148376 kB
MemAvailable:     220428 kB
Buffers:             172 kB
Cached:           366100 kB
SwapCached:         4632 kB
Active:          1962088 kB
Inactive:         592576 kB
Active(anon):    1842560 kB
Inactive(anon):   467904 kB
Active(file):     119528 kB
Inactive(file):   124672 kB
Unevictable:        1628 kB
Mlocked:            1628 kB
SwapTotal:       3117052 kB
SwapFree:        2899952 kB
Dirty:              6248 kB
Writeback:             0 kB
AnonPages:       2187236 kB
Mapped:           245800 kB
Shmem:            120504 kB
KReclaimable:      58016 kB
Slab:             203260 kB
SReclaimable:      58016 kB
SUnreclaim:       145244 kB
KernelStack:       13712 kB
PageTables:        23364 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     4610024 kB
Committed_AS:    6019396 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       27528 kB
VmallocChunk:          0 kB
Percpu:             4016 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      238332 kB
DirectMap2M:     2904064 kB


After resume:
[chris@vm ~]$ cat /proc/meminfo
MemTotal:        2985944 kB
MemFree:         1007132 kB
MemAvailable:    1069576 kB
Buffers:              76 kB
Cached:           400464 kB
SwapCached:       296112 kB
Active:           755856 kB
Inactive:         955624 kB
Active(anon):     731668 kB
Inactive(anon):   683352 kB
Active(file):      24188 kB
Inactive(file):   272272 kB
Unevictable:        1632 kB
Mlocked:            1632 kB
SwapTotal:       3117052 kB
SwapFree:        1874788 kB
Dirty:              2716 kB
Writeback:             0 kB
AnonPages:       1182108 kB
Mapped:           225352 kB
Shmem:            102480 kB
KReclaimable:      48968 kB
Slab:             183104 kB
SReclaimable:      48968 kB
SUnreclaim:       134136 kB
KernelStack:       14000 kB
PageTables:        22924 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     4610024 kB
Committed_AS:    5937732 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       27800 kB
VmallocChunk:          0 kB
Percpu:             4016 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      238332 kB
DirectMap2M:     2904064 kB
$
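
(Checking the numbers: SwapFree dropped from 2899952 kB to 1874788 kB,
i.e. by 1025164 kB, and AnonPages dropped from 2187236 kB to
1182108 kB, i.e. by 1005128 kB - both just under 1 GiB, consistent
with the estimate above.)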

There must be some other cause for the 50% limitation. Is it possible
it only kicks in once a certain amount of RAM is present? E.g. maybe
the kernel can only page out 4 GiB of anon pages to swap, and beyond
that point, if at least 50% of RAM isn't available, hibernation image
creation fails?


-- 
Chris Murphy



* Re: is hibernation usable?
  2020-02-20  2:54   ` Chris Murphy
@ 2020-02-20  2:56     ` Chris Murphy
  2020-02-20 17:16       ` Luigi Semenzato
  0 siblings, 1 reply; 27+ messages in thread
From: Chris Murphy @ 2020-02-20  2:56 UTC (permalink / raw)
  To: Linux Memory Management List; +Cc: Luigi Semenzato

Also, is this the correct list for hibernation/swap discussion? Or linux-pm@?

Thanks,

Chris Murphy



* Re: is hibernation usable?
  2020-02-20  2:56     ` Chris Murphy
@ 2020-02-20 17:16       ` Luigi Semenzato
  2020-02-20 17:38         ` Luigi Semenzato
  2020-02-20 19:09         ` Chris Murphy
  0 siblings, 2 replies; 27+ messages in thread
From: Luigi Semenzato @ 2020-02-20 17:16 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Linux Memory Management List, Linux PM

I think this is the right group for the memory issues.

I suspect that the problem with failed allocations (ENOMEM) boils down
to the unreliability of the page allocator.  In my experience, under
pressure (i.e. when pages must be swapped out to be reclaimed),
allocations can fail even when in theory they should succeed.  (I wish
I were wrong and that someone would convincingly correct me.)

I have a workaround in which I use memcgroups to free pages before
starting hibernation.  The cgroup request "echo $limit >
.../memory.limit_in_bytes"  blocks until memory usage in the chosen
cgroup is below $limit.  However, I have seen this request fail even
when there is extra available swap space.

The callback for the operation is mem_cgroup_resize_limit() (BTW I am
looking at kernel version 4.3.5), and that code has a loop where
try_to_free_pages() is called up to retry_count times, which is at
least 5.  Why 5?  One suspects that the writer of that code also
realized that the page-freeing request is unreliable and that it's
worth trying multiple times.

So you could try something similar.  I don't know if there are
interfaces to try_to_free_pages() other than those in cgroups.  If
not, and you aren't using cgroups, one way might be to start several
memory-eating processes (such as "dd if=/dev/zero bs=1G count=1 |
sleep infinity") and monitor allocation; then, when they use more
than 50% of RAM, kill them and immediately hibernate before the freed
pages are reused.  If you can build a custom kernel, maybe it's worth
adding a sysfs entry to invoke try_to_free_pages().  You could also
change the hibernation code to do that, but having the user-level
hook may be more flexible.


On Wed, Feb 19, 2020 at 6:56 PM Chris Murphy <lists@colorremedies.com> wrote:
>
> Also, is this the correct list for hibernation/swap discussion? Or linux-pm@?
>
> Thanks,
>
> Chris Murphy


* Re: is hibernation usable?
  2020-02-20 17:16       ` Luigi Semenzato
@ 2020-02-20 17:38         ` Luigi Semenzato
  2020-02-21  8:49           ` Michal Hocko
  2020-02-20 19:09         ` Chris Murphy
  1 sibling, 1 reply; 27+ messages in thread
From: Luigi Semenzato @ 2020-02-20 17:38 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Linux Memory Management List, Linux PM

I was forgetting: forcing swap by eating up memory is dangerous
because it can lead to unexpected OOM kills, but you can mitigate
that by giving the memory-eaters a higher OOM-kill score.  Still,
some way of calling try_to_free_pages() directly from user level
would be preferable.  I wonder if such an API has been discussed.
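
Putting the two together, something like this (illustrative only: the
chunk size, the count, the sleep, and the crude system-wide pkill are
all placeholders):

for i in 1 2 3 4; do
    # Each eater pins ~1 GiB of anon memory: dd fills its 1G buffer,
    # then blocks writing into a pipe that sleep never reads.  The
    # oom_score_adj=1000 is inherited by dd and sleep, marking the
    # eaters as preferred OOM victims.
    sh -c 'echo 1000 > /proc/self/oom_score_adj
           dd if=/dev/zero bs=1G count=1 | sleep infinity' &
done
sleep 30   # crude: better to poll SwapFree in /proc/meminfo
# Release the memory and hibernate immediately, before the freed
# pages are reused:
pkill -x dd; pkill -x sleep
echo disk > /sys/power/state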


On Thu, Feb 20, 2020 at 9:16 AM Luigi Semenzato <semenzato@google.com> wrote:
>
> I think this is the right group for the memory issues.
>
> I suspect that the problem with failed allocations (ENOMEM) boils down
> to the unreliability of the page allocator.  In my experience, under
> pressure (i.e. pages must be swapped out to be reclaimed) allocations
> can fail even when in theory they should succeed.  (I wish I were
> wrong and that someone would convincingly correct me.)
>
> I have a workaround in which I use memcgroups to free pages before
> starting hibernation.  The cgroup request "echo $limit >
> .../memory.limit_in_bytes"  blocks until memory usage in the chosen
> cgroup is below $limit.  However, I have seen this request fail even
> when there is extra available swap space.
>
> The callback for the operation is mem_cgroup_resize_limit() (BTW I am
> looking at kernel version 4.3.5) and that code has a loop where
> try_to_free_pages() is called up to retry_count, which is at least 5.
> Why 5?  One suspects that the writer of that code must have also
> realized that the page freeing request is unreliable and it's worth
> trying multiple times.
>
> So you could try something similar.  I don't know if there are
> interfaces to try_to_free_pages() other than those in cgroups.  If
> not, and you aren't using cgroups, one way might be to start several
> memory-eating processes (such as "dd if=/dev/zero bs=1G count=1 |
> sleep infinity") and monitor allocation, then when they use more than
> 50% of RAM kill them and immediately hibernate before the freed pages
> are reused.  If you can build your custom kernel, maybe it's worth
> adding a sysfs entry to invoke try_to_free_pages().  You could also
> change the hibernation code to do that, but having the user-level hook
> may be more flexible.
>
>
> On Wed, Feb 19, 2020 at 6:56 PM Chris Murphy <lists@colorremedies.com> wrote:
> >
> > Also, is this the correct list for hibernation/swap discussion? Or linux-pm@?
> >
> > Thanks,
> >
> > Chris Murphy


* Re: is hibernation usable?
  2020-02-20 17:16       ` Luigi Semenzato
  2020-02-20 17:38         ` Luigi Semenzato
@ 2020-02-20 19:09         ` Chris Murphy
  2020-02-20 19:44           ` Luigi Semenzato
  1 sibling, 1 reply; 27+ messages in thread
From: Chris Murphy @ 2020-02-20 19:09 UTC (permalink / raw)
  To: Luigi Semenzato; +Cc: Linux Memory Management List, Linux PM

On Thu, Feb 20, 2020 at 10:16 AM Luigi Semenzato <semenzato@google.com> wrote:
>
> I think this is the right group for the memory issues.
>
> I suspect that the problem with failed allocations (ENOMEM) boils down
> to the unreliability of the page allocator.  In my experience, under
> pressure (i.e. pages must be swapped out to be reclaimed) allocations
> can fail even when in theory they should succeed.  (I wish I were
> wrong and that someone would convincingly correct me.)

What is vm.swappiness set to on your system? A fellow Fedora
contributor who consistently reproduces what you describe has
discovered that he has vm.swappiness=0, and that with it set to even
1 the problem no longer happens. And this is not a documented
consequence of using a value of 0.


> I have a workaround in which I use memcgroups to free pages before
> starting hibernation.  The cgroup request "echo $limit >
> .../memory.limit_in_bytes"  blocks until memory usage in the chosen
> cgroup is below $limit.  However, I have seen this request fail even
> when there is extra available swap space.
>
> The callback for the operation is mem_cgroup_resize_limit() (BTW I am
> looking at kernel version 4.3.5) and that code has a loop where
> try_to_free_pages() is called up to retry_count, which is at least 5.
> Why 5?  One suspects that the writer of that code must have also
> realized that the page freeing request is unreliable and it's worth
> trying multiple times.
>
> So you could try something similar.  I don't know if there are
> interfaces to try_to_free_pages() other than those in cgroups.  If
> not, and you aren't using cgroups, one way might be to start several
> memory-eating processes (such as "dd if=/dev/zero bs=1G count=1 |
> sleep infinity") and monitor allocation, then when they use more than
> 50% of RAM kill them and immediately hibernate before the freed pages
> are reused.  If you can build your custom kernel, maybe it's worth
> adding a sysfs entry to invoke try_to_free_pages().  You could also
> change the hibernation code to do that, but having the user-level hook
> may be more flexible.

Fedora 31+ now uses cgroups v2. In any case, my use case is making
sure this works correctly and sanely with mainline kernels, because
Fedora doesn't do custom things with the kernel.



-- 
Chris Murphy


* Re: is hibernation usable?
  2020-02-20 19:09         ` Chris Murphy
@ 2020-02-20 19:44           ` Luigi Semenzato
  2020-02-20 21:48               ` Chris Murphy
  2020-02-27  6:43               ` Chris Murphy
  0 siblings, 2 replies; 27+ messages in thread
From: Luigi Semenzato @ 2020-02-20 19:44 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Linux Memory Management List, Linux PM

On Thu, Feb 20, 2020 at 11:09 AM Chris Murphy <lists@colorremedies.com> wrote:
>
> On Thu, Feb 20, 2020 at 10:16 AM Luigi Semenzato <semenzato@google.com> wrote:
> >
> > I think this is the right group for the memory issues.
> >
> > I suspect that the problem with failed allocations (ENOMEM) boils down
> > to the unreliability of the page allocator.  In my experience, under
> > pressure (i.e. pages must be swapped out to be reclaimed) allocations
> > can fail even when in theory they should succeed.  (I wish I were
> > wrong and that someone would convincingly correct me.)
>
> What is vm.swappiness set to on your system? A fellow Fedora
> contributor who has consistently reproduced what you describe, has
> discovered he has vm.swappiness=0, and even if it's set to 1, the
> problem no longer happens. And this is not a documented consequence of
> using a value of 0.

I am using the default value of 60.

A zero value should cause all file pages to be discarded before any
anonymous pages are swapped.  I wonder if the fellow Fedora
contributor's workload has a lot of file pages, so that discarding
them is enough for the image allocator to succeed. In that case "sync;
echo 1 > /proc/sys/vm/drop_caches" would be a better way of achieving
the same result.  (By the way, in my experiments I do that just before
hibernating.)
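
(An easy way to test that theory would be to compare file-page counts
across the drop, e.g.:

grep -E '^(Active|Inactive)\(file\)' /proc/meminfo
sync; echo 1 > /proc/sys/vm/drop_caches
grep -E '^(Active|Inactive)\(file\)' /proc/meminfo

If the counts are large before the drop and hibernation then
succeeds, that would point at the page cache being the deciding
factor.)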

> > I have a workaround in which I use memcgroups to free pages before
> > starting hibernation.  The cgroup request "echo $limit >
> > .../memory.limit_in_bytes"  blocks until memory usage in the chosen
> > cgroup is below $limit.  However, I have seen this request fail even
> > when there is extra available swap space.
> >
> > The callback for the operation is mem_cgroup_resize_limit() (BTW I am
> > looking at kernel version 4.3.5) and that code has a loop where
> > try_to_free_pages() is called up to retry_count, which is at least 5.
> > Why 5?  One suspects that the writer of that code must have also
> > realized that the page freeing request is unreliable and it's worth
> > trying multiple times.
> >
> > So you could try something similar.  I don't know if there are
> > interfaces to try_to_free_pages() other than those in cgroups.  If
> > not, and you aren't using cgroups, one way might be to start several
> > memory-eating processes (such as "dd if=/dev/zero bs=1G count=1 |
> > sleep infinity") and monitor allocation, then when they use more than
> > 50% of RAM kill them and immediately hibernate before the freed pages
> > are reused.  If you can build your custom kernel, maybe it's worth
> > adding a sysfs entry to invoke try_to_free_pages().  You could also
> > change the hibernation code to do that, but having the user-level hook
> > may be more flexible.
>
> Fedora 31+ now uses cgroupsv2. In any case, my use case is making sure
> this works correctly, sanely, with mainline kernels because Fedora
> doesn't do custom things with the kernel.
>
>
>
> --
> Chris Murphy


* Re: is hibernation usable?
  2020-02-20 19:44           ` Luigi Semenzato
@ 2020-02-20 21:48               ` Chris Murphy
  2020-02-27  6:43               ` Chris Murphy
  1 sibling, 0 replies; 27+ messages in thread
From: Chris Murphy @ 2020-02-20 21:48 UTC (permalink / raw)
  To: Luigi Semenzato; +Cc: Chris Murphy, Linux Memory Management List, Linux PM

On Thu, Feb 20, 2020 at 12:45 PM Luigi Semenzato <semenzato@google.com> wrote:
>
> On Thu, Feb 20, 2020 at 11:09 AM Chris Murphy <lists@colorremedies.com> wrote:
> >
> > On Thu, Feb 20, 2020 at 10:16 AM Luigi Semenzato <semenzato@google.com> wrote:
> > >
> > > I think this is the right group for the memory issues.
> > >
> > > I suspect that the problem with failed allocations (ENOMEM) boils down
> > > to the unreliability of the page allocator.  In my experience, under
> > > pressure (i.e. pages must be swapped out to be reclaimed) allocations
> > > can fail even when in theory they should succeed.  (I wish I were
> > > wrong and that someone would convincingly correct me.)
> >
> > What is vm.swappiness set to on your system? A fellow Fedora
> > contributor who has consistently reproduced what you describe, has
> > discovered he has vm.swappiness=0, and even if it's set to 1, the
> > problem no longer happens. And this is not a documented consequence of
> > using a value of 0.
>
> I am using the default value of 60.
>
> A zero value should cause all file pages to be discarded before any
> anonymous pages are swapped.  I wonder if the fellow Fedora
> contributor's workload has a lot of file pages, so that discarding
> them is enough for the image allocator to succeed. In that case "sync;
> echo 1 > /proc/sys/vm/drop_caches" would be a better way of achieving
> the same result.  (By the way, in my experiments I do that just before
> hibernating.)

Unfortunately I can't reproduce the graceful failure you describe.
I either get a successful hibernation/resume or some kind of
non-deterministic, fatal failure to enter hibernation - and any
dmesg/journal that might contain evidence of the failure is lost.
I've had better success with qemu-kvm testing, but even there I see a
failure to complete hibernation entry about 1/4 of the time (with a
ridiculously small sample size). I can't tell whether the failure
happens during page-out, hibernation image creation, or hibernation
image write-out - but the result is a black screen (virt-manager
console) and the VM never shuts down or reboots; it just hangs,
spinning at ~400% CPU (even though it's only assigned 3 CPUs).

It's sufficiently unreliable that I can't really consider it
supported or supportable.

Microsoft and Apple have lately put more emphasis on S0 low-power
idle, faster booting, and application state saving. In Windows 10,
hiberfil.sys holds a limited environment, essentially that of the
login window (no user environment state is saved in it), and is used
both for resuming from S4 and for fast boot. A separate file,
pagefile.sys, is used for paging, so a use case that depends on
significant page-out can never prevent hibernation from succeeding.
It's also Secure Boot compatible, whereas on x86_64 Linux it isn't.

Between kernel, ACPI, and firmware bugs, it's going to take a lot
more effort to make hibernation reliable and trustworthy for the
general case. Or it should just be abandoned; it seems to be mostly
that way already.

-- 
Chris Murphy


* Re: is hibernation usable?
  2020-02-20 17:38         ` Luigi Semenzato
@ 2020-02-21  8:49           ` Michal Hocko
  2020-02-21  9:04               ` Rafael J. Wysocki
  0 siblings, 1 reply; 27+ messages in thread
From: Michal Hocko @ 2020-02-21  8:49 UTC (permalink / raw)
  To: Luigi Semenzato; +Cc: Chris Murphy, Linux Memory Management List, Linux PM

On Thu 20-02-20 09:38:06, Luigi Semenzato wrote:
> I was forgetting: forcing swap by eating up memory is dangerous
> because it can lead to unexpected OOM kills

Could you be more specific about what you have in mind? swapoff
causing the OOM killer?

> , but you can mitigate that
> by giving the memory-eaters a higher OOM kill score.  Still, some way
> of calling try_to_free_pages() directly from user-level would be
> preferable.  I wonder if such API has been discussed.

No, there is no API to trigger global memory reclaim. You could start
reclaim by increasing min_free_kbytes, but I wouldn't really
recommend that unless you know exactly what you are doing, and I also
fail to see the point. If s2disk fails due to insufficient swap
space, then how can proactive reclaim help in the first place?
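
Mechanically that would be something like (again, not recommended;
the 4 GiB watermark is only illustrative):

old=$(cat /proc/sys/vm/min_free_kbytes)
echo $((4 * 1024 * 1024)) > /proc/sys/vm/min_free_kbytes  # force reclaim
echo "$old" > /proc/sys/vm/min_free_kbytes  # restore afterwards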
-- 
Michal Hocko
SUSE Labs


* Re: is hibernation usable?
  2020-02-21  8:49           ` Michal Hocko
@ 2020-02-21  9:04               ` Rafael J. Wysocki
  0 siblings, 0 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2020-02-21  9:04 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Luigi Semenzato, Chris Murphy, Linux Memory Management List, Linux PM

On Fri, Feb 21, 2020 at 9:49 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Thu 20-02-20 09:38:06, Luigi Semenzato wrote:
> > I was forgetting: forcing swap by eating up memory is dangerous
> > because it can lead to unexpected OOM kills
>
> Could you be more specific what you have in mind? swapoff causing the
> OOM killer?
>
> > , but you can mitigate that
> > by giving the memory-eaters a higher OOM kill score.  Still, some way
> > of calling try_to_free_pages() directly from user-level would be
> > preferable.  I wonder if such API has been discussed.
>
> No, there is no API to trigger the global memory reclaim. You could
> start the reclaim by increasing min_free_kbytes but I wouldn't really
> recommend that unless you know exactly what you are doing and also I
> fail to see the point. If s2disk fails due to insufficient swap space
> then how can a pro-active reclaim help in the first place?

My understanding of the problem is that the size of swap is
(theoretically) sufficient, but it is not used as expected during the
preallocation of image memory.

It was stated in one of the previous messages (not in this thread,
cannot find it now) that swap (of the same size as RAM) was activated
(swapon) right before hibernation, so theoretically that should be
sufficient AFAICS.


* Re: is hibernation usable?
  2020-02-21  9:04               ` Rafael J. Wysocki
  (?)
@ 2020-02-21  9:36               ` Michal Hocko
  2020-02-21 17:13                   ` Luigi Semenzato
  -1 siblings, 1 reply; 27+ messages in thread
From: Michal Hocko @ 2020-02-21  9:36 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Luigi Semenzato, Chris Murphy, Linux Memory Management List, Linux PM

On Fri 21-02-20 10:04:18, Rafael J. Wysocki wrote:
> On Fri, Feb 21, 2020 at 9:49 AM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Thu 20-02-20 09:38:06, Luigi Semenzato wrote:
> > > I was forgetting: forcing swap by eating up memory is dangerous
> > > because it can lead to unexpected OOM kills
> >
> > Could you be more specific what you have in mind? swapoff causing the
> > OOM killer?
> >
> > > , but you can mitigate that
> > > by giving the memory-eaters a higher OOM kill score.  Still, some way
> > > of calling try_to_free_pages() directly from user-level would be
> > > preferable.  I wonder if such API has been discussed.
> >
> > No, there is no API to trigger the global memory reclaim. You could
> > start the reclaim by increasing min_free_kbytes but I wouldn't really
> > recommend that unless you know exactly what you are doing and also I
> > fail to see the point. If s2disk fails due to insufficient swap space
> > then how can a pro-active reclaim help in the first place?
> 
> My understanding of the problem is that the size of swap is
> (theoretically) sufficient, but it is not used as expected during the
> preallocation of image memory.
> 
> It was stated in one of the previous messages (not in this thread,
> cannot find it now) that swap (of the same size as RAM) was activated
> (swapon) right before hibernation, so theoretically that should be
> sufficient AFAICS.

Hmm, this is interesting. Let me have a closer look...

pm_restrict_gfp_mask, which would completely rule out any IO, happens
after hibernate_preallocate_memory is done, and my limited
understanding tells me that this is where all the reclaim happens
(via shrink_all_memory). It is quite possible that the MM decides not
to swap in that path - depending on the memory usage - and misses its
target. More details would be needed. E.g. vmscan tracepoints could
tell us more.
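
Something along these lines (a sketch; the tracefs mount point may be
/sys/kernel/debug/tracing or /sys/kernel/tracing, and whether the
buffer survives depends on where the attempt fails):

echo 1 > /sys/kernel/debug/tracing/events/vmscan/enable
echo 1 > /sys/kernel/debug/tracing/tracing_on
echo disk > /sys/power/state   # the failing hibernation attempt
cat /sys/kernel/debug/tracing/trace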

-- 
Michal Hocko
SUSE Labs


* Re: is hibernation usable?
  2020-02-21  9:04               ` Rafael J. Wysocki
@ 2020-02-21  9:46                 ` Chris Murphy
  -1 siblings, 0 replies; 27+ messages in thread
From: Chris Murphy @ 2020-02-21  9:46 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Michal Hocko, Luigi Semenzato, Chris Murphy,
	Linux Memory Management List, Linux PM

On Fri, Feb 21, 2020 at 2:04 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> My understanding of the problem is that the size of swap is
> (theoretically) sufficient, but it is not used as expected during the
> preallocation of image memory.

Right. I have no idea how locality of pages is determined in the swap
device. But if it's fragmented enough that there is no contiguous
free range large enough for the hibernation image, then hibernation
could fail.

> It was stated in one of the previous messages (not in this thread,
> cannot find it now) that swap (of the same size as RAM) was activated
> (swapon) right before hibernation, so theoretically that should be
> sufficient AFAICS.

I mentioned it as an idea floated by systemd developers. I'm not sure
if it's mentioned elsewhere. Some folks wonder if such functionality
could be prone to racing.
https://lore.kernel.org/linux-mm/CAJCQCtSx0FOX7q0p=9XgDLJ6O0+hF_vc-wU4KL=c9xoSGGkstA@mail.gmail.com/T/#m4d47d127da493f998b232d42d81621335358aee1

Another idea that's been suggested for a while is formally separating
hibernation and paging into separate files (or partitions).
a. It guarantees the hibernation image has the necessary contiguous
free space.
b. It might make it easier to create (or even obviate) a sane
interface for hibernation images in swapfiles; that is, if it were a
dedicated hibernation file rather than being inserted into a
swapfile. Right now that interface doesn't exist, so e.g. on Btrfs,
which can support both swapfiles and hibernation images, the offset
has to be figured out manually for resume to succeed (see the sketch
below).
https://github.com/systemd/systemd/issues/11939#issuecomment-471684411
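
For reference, the generic swapfile procedure is roughly this (a
sketch; on Btrfs filefrag does not report usable physical offsets,
which is exactly the missing-interface problem above):

# offset, in filesystem blocks, of the swapfile's first extent:
filefrag -v /swapfile | awk '$1=="0:" {print substr($4, 1, length($4)-2)}'
# then boot with kernel parameters along the lines of:
#   resume=<device containing the swapfile> resume_offset=<that number>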





--
Chris Murphy


* Re: is hibernation usable?
  2020-02-21  9:36               ` Michal Hocko
@ 2020-02-21 17:13                   ` Luigi Semenzato
  0 siblings, 0 replies; 27+ messages in thread
From: Luigi Semenzato @ 2020-02-21 17:13 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Rafael J. Wysocki, Chris Murphy, Linux Memory Management List, Linux PM

On Fri, Feb 21, 2020 at 1:36 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Fri 21-02-20 10:04:18, Rafael J. Wysocki wrote:
> > On Fri, Feb 21, 2020 at 9:49 AM Michal Hocko <mhocko@kernel.org> wrote:
> > >
> > > On Thu 20-02-20 09:38:06, Luigi Semenzato wrote:
> > > > I was forgetting: forcing swap by eating up memory is dangerous
> > > > because it can lead to unexpected OOM kills
> > >
> > > Could you be more specific what you have in mind? swapoff causing the
> > > OOM killer?

No, not swapoff, just fast allocation.

Also, in some earlier experiments I tried gradually increasing
min_free_kbytes (precisely as suggested), and this would randomly
trigger OOM kills while swap space was still available.

> > > > , but you can mitigate that
> > > > by giving the memory-eaters a higher OOM kill score.  Still, some way
> > > > of calling try_to_free_pages() directly from user-level would be
> > > > preferable.  I wonder if such API has been discussed.
> > >
> > > No, there is no API to trigger the global memory reclaim. You could
> > > start the reclaim by increasing min_free_kbytes but I wouldn't really
> > > recommend that unless you know exactly what you are doing and also I
> > > fail to see the point. If s2disk fails due to insufficient swap space
> > > then how can a pro-active reclaim help in the first place?
> >
> > My understanding of the problem is that the size of swap is
> > (theoretically) sufficient, but it is not used as expected during the
> > preallocation of image memory.
> >
> > It was stated in one of the previous messages (not in this thread,
> > cannot find it now) that swap (of the same size as RAM) was activated
> > (swapon) right before hibernation, so theoretically that should be
> > sufficient AFAICS.

Correct, those were my experiments.  Search the archives for
"semenzato"; there are a couple of threads on the topic.

But really, why not have a user-level interface for reclaim?  I find
it very difficult to understand the behavior of the reclaim code, and
any attempt to reclaim from user level (memory-eating processes,
raising min_free_kbytes) can end in the OOM-kill path.  Using
cgroups' memory.limit_in_bytes doesn't have this problem, precisely
because it only calls try_to_free_pages(), which doesn't trigger OOM
killing.  If I could make that call from user level (without
cgroups), it would greatly simplify my current workaround, and it
would be useful in other situations as well.

Something like

  echo $page_count > /proc/sys/vm/try_to_free_pages
  cat /proc/sys/vm/pages_freed  # pages freed by the latest request

> Hmm, this is interesting. Let me have a closer look...
>
> pm_restrict_gfp_mask which would completely rule out any IO
> happens after hibernate_preallocate_memory is done and my limited
> understanding tells me that this is where all the reclaim happens
> (via shrink_all_memory). It is quite possible that the MM decides to
> not swap in that path - depending on the memory usage - and miss it's
> target. More details would be needed. E.g. vmscan tracepoints could tell
> us more.
>
> --
> Michal Hocko
> SUSE Labs


* Re: is hibernation usable?
  2020-02-20 19:44           ` Luigi Semenzato
@ 2020-02-27  6:43               ` Chris Murphy
  2020-02-27  6:43               ` Chris Murphy
  1 sibling, 0 replies; 27+ messages in thread
From: Chris Murphy @ 2020-02-27  6:43 UTC (permalink / raw)
  To: Luigi Semenzato; +Cc: Chris Murphy, Linux Memory Management List, Linux PM

On Thu, Feb 20, 2020 at 12:45 PM Luigi Semenzato <semenzato@google.com> wrote:
>
> On Thu, Feb 20, 2020 at 11:09 AM Chris Murphy <lists@colorremedies.com> wrote:
> >
> > On Thu, Feb 20, 2020 at 10:16 AM Luigi Semenzato <semenzato@google.com> wrote:
> > >
> > > I think this is the right group for the memory issues.
> > >
> > > I suspect that the problem with failed allocations (ENOMEM) boils down
> > > to the unreliability of the page allocator.  In my experience, under
> > > pressure (i.e. pages must be swapped out to be reclaimed) allocations
> > > can fail even when in theory they should succeed.  (I wish I were
> > > wrong and that someone would convincingly correct me.)
> >
> > What is vm.swappiness set to on your system? A fellow Fedora
> > contributor who has consistently reproduced what you describe, has
> > discovered he has vm.swappiness=0, and even if it's set to 1, the
> > problem no longer happens. And this is not a documented consequence of
> > using a value of 0.
>
> I am using the default value of 60.
>
> A zero value should cause all file pages to be discarded before any
> anonymous pages are swapped.  I wonder if the fellow Fedora
> contributor's workload has a lot of file pages, so that discarding
> them is enough for the image allocator to succeed. In that case "sync;
> echo 1 > /proc/sys/vm/drop_caches" would be a better way of achieving
> the same result.  (By the way, in my experiments I do that just before
> hibernating.)

He reports hibernation failure even if he drops caches beforehand.

https://lists.fedoraproject.org/archives/list/desktop@lists.fedoraproject.org/message/XYWYF33RFVISVZTPYSJRRXP7TFXPV4GD/

-- 
Chris Murphy


* Re: is hibernation usable?
  2019-10-22 23:16         ` Rafael J. Wysocki
@ 2019-10-22 23:25           ` Luigi Semenzato
  0 siblings, 0 replies; 27+ messages in thread
From: Luigi Semenzato @ 2019-10-22 23:25 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-kernel, Linux PM, Andrew Morton, Geoff Pike, Bas Nowaira,
	Sonny Rao, Brian Geffon

On Tue, Oct 22, 2019 at 4:16 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Wed, Oct 23, 2019 at 12:53 AM Luigi Semenzato <semenzato@google.com> wrote:
> >
> > On Tue, Oct 22, 2019 at 3:14 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > >
> > > On Tue, Oct 22, 2019 at 11:26 PM Luigi Semenzato <semenzato@google.com> wrote:
> > > >
> > > > Thank you for the quick reply!
> > > >
> > > > On Tue, Oct 22, 2019 at 1:57 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > >
> > > > > On Tue, Oct 22, 2019 at 10:09 PM Luigi Semenzato <semenzato@google.com> wrote:
> > > > > >
> > > > > > Following a thread in linux-pm
> > > > > > (https://marc.info/?l=linux-mm&m=157012300901871) I have some issues
> > > > > > that may be of general interest.
> > > > > >
> > > > > > 1. To the best of my knowledge, Linux hibernation is guaranteed to
> > > > > > fail if more than 1/2 of total RAM is in use (for instance, by
> > > > > > anonymous pages).  My knowledge is based on evidence, experiments,
> > > > > > code inspection, the thread above, and a comment in
> > > > > > Documentation/swsusp.txt, copied here:
> > > > >
> > > > > So I use it on a regular basis (i.e. every day) on a system that often
> > > > > has over 50% of RAM in use and it all works.
> > > > >
> > > > > I also know about other people using it on a regular basis.
> > > > >
> > > > > For all of these users, it is usable.
> > > > >
> > > > > >  "Instead, we load the image into unused memory and then atomically
> > > > > > copy it back to it original location. This implies, of course, a
> > > > > > maximum image size of half the amount of memory."
> > > > >
> > > > > That isn't right any more.  An image that is loaded during resume can,
> > > > > in fact, be larger than 50% of RAM.  An image that is created during
> > > > > hibernation, however, cannot.
> > > >
> > > > Sorry, I don't understand this.  Are you saying that, for instance,
> > > > you can resume a 30 GB image on a 32 GB device, but that image could
> > > > only have been created on a 64 GB device?
> > >
> > > Had it been possible to create images larger than 50% of memory during
> > > hibernation, it would have been possible to load them during resume as
> > > well.
> > >
> > > The resume code doesn't have a 50% of RAM limitation, the image
> > > creation code does.
> >
> > Thanks a lot for the clarifications.
> >
> > It is possible that you and I have different definitions of "working
> > in general".  My main issue ia that I would like image creation (i.e.
> > entering hibernation) to work with >50% of RAM in use, and I am
> > extrapolating that other people would like that too.  I can see that
> > there are many uses where this is not needed though, especially if you
> > mostly care about resume.
>
> Also note that you need to be precise about what ">50% of RAM in use"
> means.  For example, AFAICS hibernation works just fine for many cases
> in which MemFree is way below 50% of MemTotal.

Yes, I agree, that's tricky to explain.  Of course here I mean the
number of "saveable" pages, as defined in hibernate.c, and clearly
anon pages are always saveable.

> > >
> > > > > > 2. There's no simple/general workaround.  Rafael suggested on the
> > > > > > thread "Whatever doesn't fit into 50% of RAM needs to be swapped out
> > > > > > before hibernation".  This is a good suggestion: I am actually close
> > > > > > to achieving this using memcgroups, but it's a fair amount of work,
> > > > > > and a fairly special case.  Not everybody uses memcgroups, and I don't
> > > > > > know of other reliable ways of forcing swap from user level.
> > > > >
> > > > > I don't need to do anything like that.
> > > >
> > > > Again, I don't understand.  Why did you make that suggestion then?
> > > >
> > > > > hibernate_preallocate_memory() manages to free a sufficient amount of
> > > > > memory on my system every time.
> > > >
> > > > Unfortunately this doesn't work for me.  I may have already described
> > > > this simple experiment: on a 4GB device, create two large processes like this:
> > > >
> > > > dd if=/dev/zero bs=1100M count=1 | sleep infinity &
> > > > dd if=/dev/zero bs=1100M count=1 | sleep infinity &
> > > >
> > > > so that more than 50% of MemTotal is used for anonymous pages.  Then
> > > > echo disk > /sys/power/state fails with ENOMEM.
> > >
> > > I guess hibernate_preallocate_memory() is not able to free enough
> > > memory for itself in that case.
> > >
> > > > Is this supposed to work?
> > >
> > > Yes, it is, in general.
> > >
> > > > Maybe I am doing something wrong?
> > > > Hibernation works before I create the dd processes.  After I force
> > > > some of those pages to a separate swap device, hibernation works too,
> > > > so those pages aren't mlocked or anything.
> > >
> > > It looks like you are doing something that is not covered by
> > > hibernate_preallocate_memory().
> > >
> > > > > > 3. A feature that works only when 1/2 of total RAM can be allocated
> > > > > > is, in my opinion, not usable, except possibly under special
> > > > > > circumstances, such as mine. Most of the available articles and
> > > > > > documentation do not mention this important fact (but for the excerpt
> > > > > > I mentioned, which is not in a prominent position).
> > > > >
> > > > > It can be used with over 1/2 of RAM allocated and that is quite easy
> > > > > to demonstrate.
> > > > >
> > > > > Honestly, I'm not sure what your problem is really.
> > > >
> > > > I apologize if I am doing something stupid and I should know better
> > > > before I waste other people's time.  I have been trying to explain
> > > > these issues as best as I can.  I have a reproducible failure.  I'll
> > > > be happy to provide any additional detail.
> > >
> > > Simply put, hibernation, as implemented today, needs to allocate over
> > > 50% of RAM (or at least enough to be able to copy all of the
> > > non-free pages) for image creation.  If it cannot do that, it will
> > > fail and you know how to prevent it from allocating enough memory in a
> > > reproducible way.  AFAICS that's a situation in which every attempt to
> > > allocate 50% of memory for any other purpose will fail as well.
> > >
> > > Frankly, you are the first to report this problem, so it arguably is not
> > > common.  It looks like hibernate_preallocate_memory() may be improved
> > > to cover that case, but then the question is how much more complicated
> > > it will have to become for this purpose and whether or not that's
> > > worth pursuing.
> >
> > Right.  I was hoping to discuss that.  Is it easier to do in the
> > kernel what I am trying to do at user level, i.e. force swap of excess
> > pages (possibly to a separate device or partition) so that enough
> > pages are freed up to make hibernate_preallocate_memory always
> > succeed?
>
> It should at least be possible to do that, but it's been a while since
> I last looked at hibernate_preallocate_memory() etc.
>
> > I started reading the swap code, but it is entangled with
> > page reclaim and I haven't seen a simple solution, nor do I know
> > if there is one and how long it would take to find it, or code around
> > it.  (However I haven't looked yet at how it works when memcgroup
> > limits are lowered---that may give me good ideas).

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: is hibernation usable?
  2019-10-22 22:53       ` Luigi Semenzato
@ 2019-10-22 23:16         ` Rafael J. Wysocki
  2019-10-22 23:25           ` Luigi Semenzato
  0 siblings, 1 reply; 27+ messages in thread
From: Rafael J. Wysocki @ 2019-10-22 23:16 UTC (permalink / raw)
  To: Luigi Semenzato
  Cc: Rafael J. Wysocki, linux-kernel, Linux PM, Andrew Morton,
	Geoff Pike, Bas Nowaira, Sonny Rao, Brian Geffon

On Wed, Oct 23, 2019 at 12:53 AM Luigi Semenzato <semenzato@google.com> wrote:
>
> On Tue, Oct 22, 2019 at 3:14 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Tue, Oct 22, 2019 at 11:26 PM Luigi Semenzato <semenzato@google.com> wrote:
> > >
> > > Thank you for the quick reply!
> > >
> > > On Tue, Oct 22, 2019 at 1:57 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > >
> > > > On Tue, Oct 22, 2019 at 10:09 PM Luigi Semenzato <semenzato@google.com> wrote:
> > > > >
> > > > > Following a thread in linux-pm
> > > > > (https://marc.info/?l=linux-mm&m=157012300901871) I have some issues
> > > > > that may be of general interest.
> > > > >
> > > > > 1. To the best of my knowledge, Linux hibernation is guaranteed to
> > > > > fail if more than 1/2 of total RAM is in use (for instance, by
> > > > > anonymous pages).  My knowledge is based on evidence, experiments,
> > > > > code inspection, the thread above, and a comment in
> > > > > Documentation/swsusp.txt, copied here:
> > > >
> > > > So I use it on a regular basis (i.e. every day) on a system that often
> > > > has over 50% of RAM in use and it all works.
> > > >
> > > > I also know about other people using it on a regular basis.
> > > >
> > > > For all of these users, it is usable.
> > > >
> > > > >  "Instead, we load the image into unused memory and then atomically
> > > > > copy it back to it original location. This implies, of course, a
> > > > > maximum image size of half the amount of memory."
> > > >
> > > > That isn't right any more.  An image that is loaded during resume can,
> > > > in fact, be larger than 50% of RAM.  An image that is created during
> > > > hibernation, however, cannot.
> > >
> > > Sorry, I don't understand this.  Are you saying that, for instance,
> > > you can resume a 30 GB image on a 32 GB device, but that image could
> > > only have been created on a 64 GB device?
> >
> > Had it been possible to create images larger than 50% of memory during
> > hibernation, it would have been possible to load them during resume as
> > well.
> >
> > The resume code doesn't have a 50% of RAM limitation, the image
> > creation code does.
>
> Thanks a lot for the clarifications.
>
> It is possible that you and I have different definitions of "working
> in general".  My main issue ia that I would like image creation (i.e.
> entering hibernation) to work with >50% of RAM in use, and I am
> extrapolating that other people would like that too.  I can see that
> there are many uses where this is not needed though, especially if you
> mostly care about resume.

Also note that you need to be precise about what ">50% of RAM in use"
means.  For example, AFAICS hibernation works just fine for many cases
in which MemFree is way below 50% of MemTotal.
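
The distinction matters because much of the memory "in use" is page
cache, which can be dropped instead of copied into the image; a large
amount of anonymous memory is the hard case.  As a rough user-level
check (only a sketch; what finally counts is the kernel's own notion
of "saveable" pages):

grep -E 'MemTotal|MemFree|MemAvailable|AnonPages' /proc/meminfo
# Low MemFree with high MemAvailable is the easy case for image
# creation; AnonPages above roughly half of MemTotal is the case
# that fails.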

> >
> > > > > 2. There's no simple/general workaround.  Rafael suggested on the
> > > > > thread "Whatever doesn't fit into 50% of RAM needs to be swapped out
> > > > > before hibernation".  This is a good suggestion: I am actually close
> > > > > to achieving this using memcgroups, but it's a fair amount of work,
> > > > > and a fairly special case.  Not everybody uses memcgroups, and I don't
> > > > > know of other reliable ways of forcing swap from user level.
> > > >
> > > > I don't need to do anything like that.
> > >
> > > Again, I don't understand.  Why did you make that suggestion then?
> > >
> > > > hibernate_preallocate_memory() manages to free a sufficient amount of
> > > > memory on my system every time.
> > >
> > > Unfortunately this doesn't work for me.  I may have already described
> > > this simple experiment: on a 4GB device, create two large processes like this:
> > >
> > > dd if=/dev/zero bs=1100M count=1 | sleep infinity &
> > > dd if=/dev/zero bs=1100M count=1 | sleep infinity &
> > >
> > > so that more than 50% of MemTotal is used for anonymous pages.  Then
> > > echo disk > /sys/power/state fails with ENOMEM.
> >
> > I guess hibernate_preallocate_memory() is not able to free enough
> > memory for itself in that case.
> >
> > > Is this supposed to work?
> >
> > Yes, it is, in general.
> >
> > > Maybe I am doing something wrong?
> > > Hibernation works before I create the dd processes.  After I force
> > > some of those pages to a separate swap device, hibernation works too,
> > > so those pages aren't mlocked or anything.
> >
> > It looks like you are doing something that is not covered by
> > hibernate_preallocate_memory().
> >
> > > > > 3. A feature that works only when 1/2 of total RAM can be allocated
> > > > > is, in my opinion, not usable, except possibly under special
> > > > > circumstances, such as mine. Most of the available articles and
> > > > > documentation do not mention this important fact (but for the excerpt
> > > > > I mentioned, which is not in a prominent position).
> > > >
> > > > It can be used with over 1/2 of RAM allocated and that is quite easy
> > > > to demonstrate.
> > > >
> > > > Honestly, I'm not sure what your problem is really.
> > >
> > > I apologize if I am doing something stupid and I should know better
> > > before I waste other people's time.  I have been trying to explain
> > > these issues as best as I can.  I have a reproducible failure.  I'll
> > > be happy to provide any additional detail.
> >
> > Simply put, hibernation, as implemented today, needs to allocate over
> > 50% of RAM (or at least enough to be able to copy all of the
> > non-free pages) for image creation.  If it cannot do that, it will
> > fail and you know how to prevent it from allocating enough memory in a
> > reproducible way.  AFAICS that's a situation in which every attempt to
> > allocate 50% of memory for any other purpose will fail as well.
> >
> > Frankly, you are the first to report this problem, so it arguably is not
> > common.  It looks like hibernate_preallocate_memory() may be improved
> > to cover that case, but then the question is how much more complicated
> > it will have to become for this purpose and whether or not that's
> > worth pursuing.
>
> Right.  I was hoping to discuss that.  Is it easier to do in the
> kernel what I am trying to do at user level, i.e. force swap of excess
> pages (possibly to a separate device or partition) so that enough
> pages are freed up to make hibernate_preallocate_memory always
> succeed?

It should at least be possible to do that, but it's been a while since
I last looked at hibernate_preallocate_memory() etc.

> I started reading the swap code, but it is entangled with
> page reclaim and I haven't seen a simple solution, nor do I know
> if there is one and how long it would take to find it, or code around
> it.  (However I haven't looked yet at how it works when memcgroup
> limits are lowered---that may give me good ideas).

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: is hibernation usable?
  2019-10-22 22:13     ` Rafael J. Wysocki
@ 2019-10-22 22:53       ` Luigi Semenzato
  2019-10-22 23:16         ` Rafael J. Wysocki
  0 siblings, 1 reply; 27+ messages in thread
From: Luigi Semenzato @ 2019-10-22 22:53 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-kernel, Linux PM, Andrew Morton, Geoff Pike, Bas Nowaira,
	Sonny Rao, Brian Geffon

On Tue, Oct 22, 2019 at 3:14 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Tue, Oct 22, 2019 at 11:26 PM Luigi Semenzato <semenzato@google.com> wrote:
> >
> > Thank you for the quick reply!
> >
> > On Tue, Oct 22, 2019 at 1:57 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > >
> > > On Tue, Oct 22, 2019 at 10:09 PM Luigi Semenzato <semenzato@google.com> wrote:
> > > >
> > > > Following a thread in linux-pm
> > > > (https://marc.info/?l=linux-mm&m=157012300901871) I have some issues
> > > > that may be of general interest.
> > > >
> > > > 1. To the best of my knowledge, Linux hibernation is guaranteed to
> > > > fail if more than 1/2 of total RAM is in use (for instance, by
> > > > anonymous pages).  My knowledge is based on evidence, experiments,
> > > > code inspection, the thread above, and a comment in
> > > > Documentation/swsusp.txt, copied here:
> > >
> > > So I use it on a regular basis (i.e. every day) on a system that often
> > > has over 50% of RAM in use and it all works.
> > >
> > > I also know about other people using it on a regular basis.
> > >
> > > For all of these users, it is usable.
> > >
> > > >  "Instead, we load the image into unused memory and then atomically
> > > > copy it back to it original location. This implies, of course, a
> > > > maximum image size of half the amount of memory."
> > >
> > > That isn't right any more.  An image that is loaded during resume can,
> > > in fact, be larger than 50% of RAM.  An image that is created during
> > > hibernation, however, cannot.
> >
> > Sorry, I don't understand this.  Are you saying that, for instance,
> > you can resume a 30 GB image on a 32 GB device, but that image could
> > only have been created on a 64 GB device?
>
> Had it been possible to create images larger than 50% of memory during
> hibernation, it would have been possible to load them during resume as
> well.
>
> The resume code doesn't have a 50% of RAM limitation, the image
> creation code does.

Thanks a lot for the clarifications.

It is possible that you and I have different definitions of "working
in general".  My main issue ia that I would like image creation (i.e.
entering hibernation) to work with >50% of RAM in use, and I am
extrapolating that other people would like that too.  I can see that
there are many uses where this is not needed though, especially if you
mostly care about resume.

>
> > > > 2. There's no simple/general workaround.  Rafael suggested on the
> > > > thread "Whatever doesn't fit into 50% of RAM needs to be swapped out
> > > > before hibernation".  This is a good suggestion: I am actually close
> > > > to achieving this using memcgroups, but it's a fair amount of work,
> > > > and a fairly special case.  Not everybody uses memcgroups, and I don't
> > > > know of other reliable ways of forcing swap from user level.
> > >
> > > I don't need to do anything like that.
> >
> > Again, I don't understand.  Why did you make that suggestion then?
> >
> > > hibernate_preallocate_memory() manages to free a sufficient amount of
> > > memory on my system every time.
> >
> > Unfortunately this doesn't work for me.  I may have already described
> > this simple experiment: on a 4GB device, create two large processes like this:
> >
> > dd if=/dev/zero bs=1100M count=1 | sleep infinity &
> > dd if=/dev/zero bs=1100M count=1 | sleep infinity &
> >
> > so that more than 50% of MemTotal is used for anonymous pages.  Then
> > echo disk > /sys/power/state fails with ENOMEM.
>
> I guess hibernate_preallocate_memory() is not able to free enough
> memory for itself in that case.
>
> > Is this supposed to work?
>
> Yes, it is, in general.
>
> > Maybe I am doing something wrong?
> > Hibernation works before I create the dd processes.  After I force
> > some of those pages to a separate swap device, hibernation works too,
> > so those pages aren't mlocked or anything.
>
> It looks like you are doing something that is not covered by
> hibernate_preallocate_memory().
>
> > > > 3. A feature that works only when 1/2 of total RAM can be allocated
> > > > is, in my opinion, not usable, except possibly under special
> > > > circumstances, such as mine. Most of the available articles and
> > > > documentation do not mention this important fact (but for the excerpt
> > > > I mentioned, which is not in a prominent position).
> > >
> > > It can be used with over 1/2 of RAM allocated and that is quite easy
> > > to demonstrate.
> > >
> > > Honestly, I'm not sure what your problem is really.
> >
> > I apologize if I am doing something stupid and I should know better
> > before I waste other people's time.  I have been trying to explain
> > these issues as best as I can.  I have a reproducible failure.  I'll
> > be happy to provide any additional detail.
>
> Simply put, hibernation, as implemented today, needs to allocate over
> 50% of RAM (or at least enough to be able to copy all of the
> non-free pages) for image creation.  If it cannot do that, it will
> fail and you know how to prevent it from allocating enough memory in a
> reproducible way.  AFAICS that's a situation in which every attempt to
> allocate 50% of memory for any other purpose will fail as well.
>
> Frankly, you are the first to report this problem, so it arguably is not
> common.  It looks like hibernate_preallocate_memory() may be improved
> to cover that case, but then the question is how much more complicated
> it will have to become for this purpose and whether or not that's
> worth pursuing.

Right.  I was hoping to discuss that.  Is it easier to do in the
kernel what I am trying to do at user level, i.e. force swap of excess
pages (possibly to a separate device or partition) so that enough
pages are freed up to make hibernate_preallocate_memory always
succeed?  I started reading the swap code, but it is entangled with
page reclaim and I haven't seen a simple solution, nor do I know
if there is one and how long it would take to find it, or code around
it.  (However I haven't looked yet at how it works when memcgroup
limits are lowered---that may give me good ideas).
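
One way to stage the "separate device or partition" part from user
level, as a sketch (the device name is hypothetical, and something
must still apply memory pressure, e.g. a shrinking memcgroup limit,
to actually push pages out):

mkswap /dev/hibernate_swap
swapon -p 32767 /dev/hibernate_swap   # highest priority: new swap-outs land here first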

Thanks!


>
> > >
> > > > Two questions then:
> > > >
> > > > A. Should the documentation be changed to reflect this fact more
> > > > clearly?  I feel that the current situation is a disservice to the
> > > > user community.
> > >
> > > Propose changes.
> >
> > Sure, after we resolve the above questions.
> >
> > > > B. Would it be worthwhile to improve the hibernation code to remove
> > > > this limitation?  Is this of interest to anybody (other than me)?
> > >
> > > Again, propose specific changes.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: is hibernation usable?
  2019-10-22 21:26   ` Luigi Semenzato
@ 2019-10-22 22:13     ` Rafael J. Wysocki
  2019-10-22 22:53       ` Luigi Semenzato
  0 siblings, 1 reply; 27+ messages in thread
From: Rafael J. Wysocki @ 2019-10-22 22:13 UTC (permalink / raw)
  To: Luigi Semenzato
  Cc: Rafael J. Wysocki, linux-kernel, Linux PM, Andrew Morton,
	Geoff Pike, Bas Nowaira, Sonny Rao, Brian Geffon

On Tue, Oct 22, 2019 at 11:26 PM Luigi Semenzato <semenzato@google.com> wrote:
>
> Thank you for the quick reply!
>
> On Tue, Oct 22, 2019 at 1:57 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Tue, Oct 22, 2019 at 10:09 PM Luigi Semenzato <semenzato@google.com> wrote:
> > >
> > > Following a thread in linux-pm
> > > (https://marc.info/?l=linux-mm&m=157012300901871) I have some issues
> > > that may be of general interest.
> > >
> > > 1. To the best of my knowledge, Linux hibernation is guaranteed to
> > > fail if more than 1/2 of total RAM is in use (for instance, by
> > > anonymous pages).  My knowledge is based on evidence, experiments,
> > > code inspection, the thread above, and a comment in
> > > Documentation/swsusp.txt, copied here:
> >
> > So I use it on a regular basis (i.e. every day) on a system that often
> > has over 50% of RAM in use and it all works.
> >
> > I also know about other people using it on a regular basis.
> >
> > For all of these users, it is usable.
> >
> > >  "Instead, we load the image into unused memory and then atomically
> > > copy it back to it original location. This implies, of course, a
> > > maximum image size of half the amount of memory."
> >
> > That isn't right any more.  An image that is loaded during resume can,
> > in fact, be larger than 50% of RAM.  An image that is created during
> > hibernation, however, cannot.
>
> Sorry, I don't understand this.  Are you saying that, for instance,
> you can resume a 30 GB image on a 32 GB device, but that image could
> only have been created on a 64 GB device?

Had it been possible to create images larger than 50% of memory during
hibernation, it would have been possible to load them during resume as
well.

The resume code doesn't have a 50% of RAM limitation, the image
creation code does.

> > > 2. There's no simple/general workaround.  Rafael suggested on the
> > > thread "Whatever doesn't fit into 50% of RAM needs to be swapped out
> > > before hibernation".  This is a good suggestion: I am actually close
> > > to achieving this using memcgroups, but it's a fair amount of work,
> > > and a fairly special case.  Not everybody uses memcgroups, and I don't
> > > know of other reliable ways of forcing swap from user level.
> >
> > I don't need to do anything like that.
>
> Again, I don't understand.  Why did you make that suggestion then?
>
> > hibernate_preallocate_memory() manages to free a sufficient amount of
> > memory on my system every time.
>
> Unfortunately this doesn't work for me.  I may have already described
> this simple experiment: on a 4GB device, create two large processes like this:
>
> dd if=/dev/zero bs=1100M count=1 | sleep infinity &
> dd if=/dev/zero bs=1100M count=1 | sleep infinity &
>
> so that more than 50% of MemTotal is used for anonymous pages.  Then
> echo disk > /sys/power/state fails with ENOMEM.

I guess hibernate_preallocate_memory() is not able to free enough
memory for itself in that case.

> Is this supposed to work?

Yes, it is, in general.

> Maybe I am doing something wrong?
> Hibernation works before I create the dd processes.  After I force
> some of those pages to a separate swap device, hibernation works too,
> so those pages aren't mlocked or anything.

It looks like you are doing something that is not covered by
hibernate_preallocate_memory().

> > > 3. A feature that works only when 1/2 of total RAM can be allocated
> > > is, in my opinion, not usable, except possibly under special
> > > circumstances, such as mine. Most of the available articles and
> > > documentation do not mention this important fact (but for the excerpt
> > > I mentioned, which is not in a prominent position).
> >
> > It can be used with over 1/2 of RAM allocated and that is quite easy
> > to demonstrate.
> >
> > Honestly, I'm not sure what your problem is really.
>
> I apologize if I am doing something stupid and I should know better
> before I waste other people's time.  I have been trying to explain
> these issues as best as I can.  I have a reproducible failure.  I'll
> be happy to provide any additional detail.

Simply put, hibernation, as implemented today, needs to allocate over
50% of RAM (or at least enough to be able to copy all of the
non-free pages) for image creation.  If it cannot do that, it will
fail and you know how to prevent it from allocating enough memory in a
reproducible way.  AFAICS that's a situation in which every attempt to
allocate 50% of memory for any other purpose will fail as well.

Frankly, you are the first to report this problem, so it arguably is not
common.  It looks like hibernate_preallocate_memory() may be improved
to cover that case, but then the question is how much more complicated
it will have to become for this purpose and whether or not that's
worth pursuing.
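
One knob that is relevant here: /sys/power/image_size is a best-effort
upper bound on the image size, and writing 0 to it asks the kernel to
make the image as small as possible, i.e. to free as much memory as it
can before creating the snapshot.  It does not remove the need to fit
the copy in RAM, but it may help in borderline cases.  A sketch:

echo 0 > /sys/power/image_size   # best effort, not a hard guarantee
echo disk > /sys/power/state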

> >
> > > Two questions then:
> > >
> > > A. Should the documentation be changed to reflect this fact more
> > > clearly?  I feel that the current situation is a disservice to the
> > > user community.
> >
> > Propose changes.
>
> Sure, after we resolve the above questions.
>
> > > B. Would it be worthwhile to improve the hibernation code to remove
> > > this limitation?  Is this of interest to anybody (other than me)?
> >
> > Again, propose specific changes.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: is hibernation usable?
  2019-10-22 20:57 ` Rafael J. Wysocki
@ 2019-10-22 21:26   ` Luigi Semenzato
  2019-10-22 22:13     ` Rafael J. Wysocki
  0 siblings, 1 reply; 27+ messages in thread
From: Luigi Semenzato @ 2019-10-22 21:26 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-kernel, Linux PM, Andrew Morton, Geoff Pike, Bas Nowaira,
	Sonny Rao, Brian Geffon

Thank you for the quick reply!

On Tue, Oct 22, 2019 at 1:57 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Tue, Oct 22, 2019 at 10:09 PM Luigi Semenzato <semenzato@google.com> wrote:
> >
> > Following a thread in linux-pm
> > (https://marc.info/?l=linux-mm&m=157012300901871) I have some issues
> > that may be of general interest.
> >
> > 1. To the best of my knowledge, Linux hibernation is guaranteed to
> > fail if more than 1/2 of total RAM is in use (for instance, by
> > anonymous pages).  My knowledge is based on evidence, experiments,
> > code inspection, the thread above, and a comment in
> > Documentation/swsusp.txt, copied here:
>
> So I use it on a regular basis (i.e. every day) on a system that often
> has over 50% of RAM in use and it all works.
>
> I also know about other people using it on a regular basis.
>
> For all of these users, it is usable.
>
> >  "Instead, we load the image into unused memory and then atomically
> > copy it back to it original location. This implies, of course, a
> > maximum image size of half the amount of memory."
>
> That isn't right any more.  An image that is loaded during resume can,
> in fact, be larger than 50% of RAM.  An image that is created during
> hibernation, however, cannot.

Sorry, I don't understand this.  Are you saying that, for instance,
you can resume a 30 GB image on a 32 GB device, but that image could
only have been created on a 64 GB device?

> > 2. There's no simple/general workaround.  Rafael suggested on the
> > thread "Whatever doesn't fit into 50% of RAM needs to be swapped out
> > before hibernation".  This is a good suggestion: I am actually close
> > to achieving this using memcgroups, but it's a fair amount of work,
> > and a fairly special case.  Not everybody uses memcgroups, and I don't
> > know of other reliable ways of forcing swap from user level.
>
> I don't need to do anything like that.

Again, I don't understand.  Why did you make that suggestion then?

> hibernate_preallocate_memory() manages to free a sufficient amount of
> memory on my system every time.

Unfortunately this doesn't work for me.  I may have already described
this simple experiment: on a 4GB device, create two large processes like this:

dd if=/dev/zero bs=1100M count=1 | sleep infinity &
dd if=/dev/zero bs=1100M count=1 | sleep infinity &

so that more than 50% of MemTotal is used for anonymous pages.  Then
echo disk > /sys/power/state fails with ENOMEM.

Is this supposed to work?  Maybe I am doing something wrong?
Hibernation works before I create the dd processes.  After I force
some of those pages to a separate swap device, hibernation works too,
so those pages aren't mlocked or anything.
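
For anyone trying to reproduce this: the trick works because each dd
reads 1100M from /dev/zero into a single anonymous buffer and then
blocks writing to a pipe that sleep never drains, so the buffer stays
resident for as long as the pipeline lives.  With about 2.2GB of
anonymous pages on a 4GB machine, the snapshot itself would need more
than 2.2GB of free page frames, which by construction do not exist.
A quick sanity check (a sketch):

grep -E 'MemTotal|AnonPages' /proc/meminfo
# AnonPages should now be above half of MemTotal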

> > 3. A feature that works only when 1/2 of total RAM can be allocated
> > is, in my opinion, not usable, except possibly under special
> > circumstances, such as mine. Most of the available articles and
> > documentation do not mention this important fact (but for the excerpt
> > I mentioned, which is not in a prominent position).
>
> It can be used with over 1/2 of RAM allocated and that is quite easy
> to demonstrate.
>
> Honestly, I'm not sure what your problem is really.

I apologize if I am doing something stupid and I should know better
before I waste other people's time.  I have been trying to explain
these issues as best as I can.  I have a reproducible failure.  I'll
be happy to provide any additional detail.

>
> > Two questions then:
> >
> > A. Should the documentation be changed to reflect this fact more
> > clearly?  I feel that the current situation is a disservice to the
> > user community.
>
> Propose changes.

Sure, after we resolve the above questions.

> > B. Would it be worthwhile to improve the hibernation code to remove
> > this limitation?  Is this of interest to anybody (other than me)?
>
> Again, propose specific changes.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: is hibernation usable?
  2019-10-22 20:09 Luigi Semenzato
@ 2019-10-22 20:57 ` Rafael J. Wysocki
  2019-10-22 21:26   ` Luigi Semenzato
  0 siblings, 1 reply; 27+ messages in thread
From: Rafael J. Wysocki @ 2019-10-22 20:57 UTC (permalink / raw)
  To: Luigi Semenzato
  Cc: linux-kernel, Linux PM, Andrew Morton, Geoff Pike, Bas Nowaira,
	Rafael J. Wysocki, Sonny Rao, Brian Geffon

On Tue, Oct 22, 2019 at 10:09 PM Luigi Semenzato <semenzato@google.com> wrote:
>
> Following a thread in linux-pm
> (https://marc.info/?l=linux-mm&m=157012300901871) I have some issues
> that may be of general interest.
>
> 1. To the best of my knowledge, Linux hibernation is guaranteed to
> fail if more than 1/2 of total RAM is in use (for instance, by
> anonymous pages).  My knowledge is based on evidence, experiments,
> code inspection, the thread above, and a comment in
> Documentation/swsusp.txt, copied here:

So I use it on a regular basis (i.e. every day) on a system that often
has over 50% of RAM in use and it all works.

I also know about other people using it on a regular basis.

For all of these users, it is usable.

>  "Instead, we load the image into unused memory and then atomically
> copy it back to it original location. This implies, of course, a
> maximum image size of half the amount of memory."

That isn't right any more.  An image that is loaded during resume can,
in fact, be larger than 50% of RAM.  An image that is created during
hibernation, however, cannot.

> 2. There's no simple/general workaround.  Rafael suggested on the
> thread "Whatever doesn't fit into 50% of RAM needs to be swapped out
> before hibernation".  This is a good suggestion: I am actually close
> to achieving this using memcgroups, but it's a fair amount of work,
> and a fairly special case.  Not everybody uses memcgroups, and I don't
> know of other reliable ways of forcing swap from user level.

I don't need to do anything like that.

hibernate_preallocate_memory() manages to free a sufficient amount of
memory on my system every time.

> 3. A feature that works only when 1/2 of total RAM can be allocated
> is, in my opinion, not usable, except possibly under special
> circumstances, such as mine. Most of the available articles and
> documentation do not mention this important fact (but for the excerpt
> I mentioned, which is not in a prominent position).

It can be used with over 1/2 of RAM allocated and that is quite easy
to demonstrate.

Honestly, I'm not sure what your problem is really.

> Two questions then:
>
> A. Should the documentation be changed to reflect this fact more
> clearly?  I feel that the current situation is a disservice to the
> user community.

Propose changes.

> B. Would it be worthwhile to improve the hibernation code to remove
> this limitation?  Is this of interest to anybody (other than me)?

Again, propose specific changes.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* is hibernation usable?
@ 2019-10-22 20:09 Luigi Semenzato
  2019-10-22 20:57 ` Rafael J. Wysocki
  0 siblings, 1 reply; 27+ messages in thread
From: Luigi Semenzato @ 2019-10-22 20:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linux PM, Andrew Morton, Geoff Pike, Bas Nowaira,
	Rafael J. Wysocki, Sonny Rao, Brian Geffon

Following a thread in linux-pm
(https://marc.info/?l=linux-mm&m=157012300901871) I have some issues
that may be of general interest.

1. To the best of my knowledge, Linux hibernation is guaranteed to
fail if more than 1/2 of total RAM is in use (for instance, by
anonymous pages).  My knowledge is based on evidence, experiments,
code inspection, the thread above, and a comment in
Documentation/swsusp.txt, copied here:

 "Instead, we load the image into unused memory and then atomically
copy it back to it original location. This implies, of course, a
maximum image size of half the amount of memory."

2. There's no simple/general workaround.  Rafael suggested on the
thread "Whatever doesn't fit into 50% of RAM needs to be swapped out
before hibernation".  This is a good suggestion: I am actually close
to achieving this using memcgroups, but it's a fair amount of work,
and a fairly special case.  Not everybody uses memcgroups, and I don't
know of other reliable ways of forcing swap from user level.  (A
sketch of the memcgroup approach follows after point 3 below.)

3. A feature that works only when 1/2 of total RAM can be allocated
is, in my opinion, not usable, except possibly under special
circumstances, such as mine. Most of the available articles and
documentation do not mention this important fact (but for the excerpt
I mentioned, which is not in a prominent position).
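
To make point 2 concrete, here is a minimal sketch of the memcgroup
approach (cgroup v1 paths assumed; the group name "prehibernate" is
made up for illustration):

mkdir /sys/fs/cgroup/memory/prehibernate
echo $$ > /sys/fs/cgroup/memory/prehibernate/cgroup.procs
# Lowering the limit below current usage forces reclaim, which swaps
# out anonymous pages until usage fits under the limit (or fails with
# EBUSY if it cannot).
echo 512M > /sys/fs/cgroup/memory/prehibernate/memory.limit_in_bytes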

Two questions then:

A. Should the documentation be changed to reflect this fact more
clearly?  I feel that the current situation is a disservice to the
user community.

B. Would it be worthwhile to improve the hibernation code to remove
this limitation?  Is this of interest to anybody (other than me)?

Thank you in advance!

^ permalink raw reply	[flat|nested] 27+ messages in thread
