From: David Hildenbrand <david@redhat.com>
To: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: kvm@vger.kernel.org, cohuck@redhat.com, borntraeger@de.ibm.com,
	frankja@linux.ibm.com, thuth@redhat.com, pasic@linux.ibm.com,
	linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org,
	Ulrich.Weigand@de.ibm.com,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Michal Hocko <mhocko@kernel.org>
Subject: Re: [PATCH v3 00/14] KVM: s390: pv: implement lazy destroy
Date: Fri, 6 Aug 2021 13:30:21 +0200	[thread overview]
Message-ID: <ada27c6d-4dc9-04c3-d5b9-566e65359701@redhat.com> (raw)
In-Reply-To: <20210806113005.0259d53c@p-imbrenda>

>>> This means that the same address space can have memory belonging to
>>> more than one protected guest, although only one will be running,
>>> the others will in fact not even have any CPUs.
>>
>> ... this ...
> 
> this ^ is exactly the reboot case.

Ah, right, we can have more than one protected guest per process, so 
it's all handled within the same process.

> 
>>> When a guest is destroyed, its memory still counts towards its
>>> memory control group until it's actually freed (I tested this
>>> experimentally)
>>>
>>> When the system runs out of memory, if a guest has terminated and
>>> its memory is being cleaned asynchronously, the OOM killer will
>>> wait a little and then see if memory has been freed. This has the
>>> practical effect of slowing down memory allocations when the system
>>> is out of memory to give the cleanup thread time to cleanup and
>>> free memory, and avoid an actual OOM situation.
>>
>> ... and this sounds like the kind of arch MM hacks that will bite us
>> in the long run. Of course, I might be wrong, but already doing
>> excessive GFP_ATOMIC allocations or messing with the OOM killer that
> 
> they are GFP_ATOMIC, but they should not put too much pressure on
> memory and can also fail without consequences. I used:
> 
> GFP_ATOMIC | __GFP_NOMEMALLOC | __GFP_NOWARN
> 
> also note that after every page allocation a page gets freed, so this
> is only temporary.
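
So the allocation pattern is essentially something like this (a minimal
sketch with made-up names, not your actual patch code)?

#include <linux/gfp.h>
#include <linux/list.h>
#include <linux/mm.h>
#include <linux/slab.h>

/* hypothetical tracking record for a page whose destruction is deferred */
struct deferred_destroy_node {
	struct list_head list;
	struct page *page;
};

static struct deferred_destroy_node *track_page_for_destroy(struct page *page)
{
	struct deferred_destroy_node *node;

	/*
	 * Atomic context, but stay away from the emergency reserves and
	 * don't warn: failing here is fine, the caller simply falls back
	 * to the synchronous (slow) destroy path.  The corresponding
	 * guest page gets freed shortly afterwards, so the overhead is
	 * only temporary.
	 */
	node = kzalloc(sizeof(*node),
		       GFP_ATOMIC | __GFP_NOMEMALLOC | __GFP_NOWARN);
	if (!node)
		return NULL;

	node->page = page;
	INIT_LIST_HEAD(&node->list);
	return node;
}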

Correct me if I'm wrong: you're allocating unmovable pages for tracking 
(e.g., ZONE_DMA, ZONE_NORMAL) from atomic reserves and will free a 
movable process page, correct? Or which page will you be freeing?

> 
> I would not call it "messing with the OOM killer"; I'm using the same
> interface used by virtio-balloon

Right, and for virtio-balloon it's actually a workaround to restore the 
original behavior of a rarely used feature: deflate-on-oom. Commit 
da10329cb057 ("virtio-balloon: switch back to OOM handler for 
VIRTIO_BALLOON_F_DEFLATE_ON_OOM") tried to document why we switched back 
from a shrinker to an OOM notifier for VIRTIO_BALLOON_F_DEFLATE_ON_OOM:

"The name "deflate on OOM" makes it pretty clear when deflation should
  happen - after other approaches to reclaim memory failed, not while
  reclaiming. This allows to minimize the footprint of a guest - memory
  will only be taken out of the balloon when really needed."
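
For context, the OOM notifier interface in question boils down to
roughly this (sketch only; the callback name and what it frees are made
up, virtio-balloon's variant lives in drivers/virtio/virtio_balloon.c):

#include <linux/notifier.h>
#include <linux/oom.h>

/* invoked only after ordinary reclaim failed, right before the OOM killer */
static int lazy_destroy_oom_notify(struct notifier_block *nb,
				   unsigned long dummy, void *parm)
{
	unsigned long *freed = parm;

	/* free/export some deferred pages here and report how many */
	*freed += 0;	/* placeholder */
	return NOTIFY_OK;
}

static struct notifier_block lazy_destroy_oom_nb = {
	.notifier_call = lazy_destroy_oom_notify,
};

static int lazy_destroy_oom_setup(void)
{
	return register_oom_notifier(&lazy_destroy_oom_nb);
}

static void lazy_destroy_oom_teardown(void)
{
	unregister_oom_notifier(&lazy_destroy_oom_nb);
}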

Note some subtle differences:

a) IIRC, before running into the OOM killer, the kernel will first try
    reclaiming anything else. This is what we want for deflate-on-oom,
    but it might not be what you want for your feature (e.g., flushing
    other processes/VMs to disk/swap instead of waiting for a single
    process to stop).

b) Migration of movable balloon-inflated pages continues working because
    we are dealing with non-lru page migration.

Will page reclaim, page migration, compaction, ... of these movable LRU 
pages still continue working while they are sitting around waiting to be 
cleaned up? I can see that we're grabbing an extra reference when we put 
them onto the list; that might be a problem: for example, we most 
certainly cannot swap out these pages or write them back to disk under 
memory pressure.
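
To illustrate the concern (hypothetical helpers; reusing page->lru like
this would of course also require isolating the page from the LRU
first), the pattern I'm worried about is roughly:

#include <linux/list.h>
#include <linux/mm.h>

/* queue a page for deferred destruction; the extra reference pins it */
static void defer_page_destroy(struct page *page, struct list_head *destroy_list)
{
	get_page(page);		/* reclaim/writeback now see an elevated refcount */
	list_add_tail(&page->lru, destroy_list);
}

/* the cleanup worker eventually drops the reference again */
static void destroy_worker(struct list_head *destroy_list)
{
	struct page *page, *next;

	list_for_each_entry_safe(page, next, destroy_list, lru) {
		list_del(&page->lru);
		/* ... expensive export/destroy of the secure page ... */
		put_page(page);	/* only now can the page be freed/reclaimed */
	}
}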

> 
>> way for a pure (shutdown) optimization is an alarm signal. Of course,
>> I might be wrong.
>>
>> You should at least CC linux-mm. I'll do that right now and also CC
>> Michal. He might have time to have a quick glimpse at patch #11 and
>> #13.
>>
>> https://lkml.kernel.org/r/20210804154046.88552-12-imbrenda@linux.ibm.com
>> https://lkml.kernel.org/r/20210804154046.88552-14-imbrenda@linux.ibm.com
>>
>> IMHO, we should proceed with patches 1-10, as they solve a really
>> important problem ("slow reboots") in a nice way, whereas patch 11
>> handles a case that can be worked around comparatively easily by
>> management tools -- my 2 cents.
> 
> how would management tools work around the issue that a shutdown can
> take very long?

The traditional approach is to wait until memory has been freed up 
before starting a new VM, or to start it on another hypervisor instead. 
That raises the question of the target use case.

What I don't get is why we have to pay the price for freeing up that 
memory at all. Why isn't it sufficient to keep the process running and 
let ordinary MM do its thing?

Maybe you should clearly spell out what the target use case for the fast 
shutdown (fast quitting of the process?) is. I assume it is starting a 
new VM / process / whatsoever on the same host immediately, which then means

a) eventually slowing down other processes due to heavy reclaim, or
b) slowing down the new process because you have to pay the price of 
   cleaning up memory.

I think I am missing why we need lazy destroy at all when killing a 
process. Couldn't you instead teach the OOM killer "hey, we're currently 
quitting a heavy process that is just *very* slow to free up memory, 
please wait for that before you start shooting around"?

-- 
Thanks,

David / dhildenb


Thread overview: 28+ messages
2021-08-04 15:40 [PATCH v3 00/14] KVM: s390: pv: implement lazy destroy Claudio Imbrenda
2021-08-04 15:40 ` [PATCH v3 01/14] KVM: s390: pv: add macros for UVC CC values Claudio Imbrenda
2021-08-06  7:26   ` David Hildenbrand
2021-08-06  9:34     ` Claudio Imbrenda
2021-08-06 15:15     ` Janosch Frank
2021-08-04 15:40 ` [PATCH v3 02/14] KVM: s390: pv: avoid stall notifications for some UVCs Claudio Imbrenda
2021-08-06  7:30   ` David Hildenbrand
2021-08-06  9:33     ` Claudio Imbrenda
2021-08-04 15:40 ` [PATCH v3 03/14] KVM: s390: pv: leak the ASCE page when destroy fails Claudio Imbrenda
2021-08-06  7:31   ` David Hildenbrand
2021-08-06  9:32     ` Claudio Imbrenda
2021-08-06 11:39       ` David Hildenbrand
2021-08-04 15:40 ` [PATCH v3 04/14] KVM: s390: pv: properly handle page flags for protected guests Claudio Imbrenda
2021-08-04 15:40 ` [PATCH v3 05/14] KVM: s390: pv: handle secure storage violations " Claudio Imbrenda
2021-08-04 15:40 ` [PATCH v3 06/14] KVM: s390: pv: handle secure storage exceptions for normal guests Claudio Imbrenda
2021-08-04 15:40 ` [PATCH v3 07/14] KVM: s390: pv: refactor s390_reset_acc Claudio Imbrenda
2021-08-04 15:40 ` [PATCH v3 08/14] KVM: s390: pv: usage counter instead of flag Claudio Imbrenda
2021-08-04 15:40 ` [PATCH v3 09/14] KVM: s390: pv: add export before import Claudio Imbrenda
2021-08-04 15:40 ` [PATCH v3 10/14] KVM: s390: pv: lazy destroy for reboot Claudio Imbrenda
2021-08-04 15:40 ` [PATCH v3 11/14] KVM: s390: pv: extend lazy destroy to handle shutdown Claudio Imbrenda
2021-08-04 15:40 ` [PATCH v3 12/14] KVM: s390: pv: module parameter to fence lazy destroy Claudio Imbrenda
2021-08-04 15:40 ` [PATCH v3 13/14] KVM: s390: pv: add OOM notifier for " Claudio Imbrenda
2021-08-04 15:40 ` [PATCH v3 14/14] KVM: s390: pv: avoid export before import if possible Claudio Imbrenda
2021-08-06  7:10 ` [PATCH v3 00/14] KVM: s390: pv: implement lazy destroy David Hildenbrand
2021-08-06  9:30   ` Claudio Imbrenda
2021-08-06 11:30     ` David Hildenbrand [this message]
2021-08-06 13:44       ` Claudio Imbrenda
2021-08-09  8:50         ` David Hildenbrand
