All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] kvm, mm: account shadow page tables to kmemcg
@ 2018-06-29 14:02 ` Shakeel Butt
  0 siblings, 0 replies; 8+ messages in thread
From: Shakeel Butt @ 2018-06-29 14:02 UTC (permalink / raw)
  To: Paolo Bonzini, Andrew Morton
  Cc: linux-kernel, kvm, linux-mm, Shakeel Butt, Michal Hocko,
	Johannes Weiner, Vladimir Davydov, Greg Thelen,
	Radim Krčmář,
	Peter Feiner, stable

The size of kvm's shadow page tables corresponds to the size of the
guest virtual machines on the system. Large VMs can spend a significant
amount of memory as shadow page tables which can not be left as system
memory overhead. So, account shadow page tables to the kmemcg.

Signed-off-by: Shakeel Butt <shakeelb@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
---
Changelog since v1:
- replaced (GFP_KERNEL|__GFP_ACCOUNT) with GFP_KERNEL_ACCOUNT

 arch/x86/kvm/mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index d594690d8b95..6b8f11521c41 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -890,7 +890,7 @@ static int mmu_topup_memory_cache_page(struct kvm_mmu_memory_cache *cache,
 	if (cache->nobjs >= min)
 		return 0;
 	while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
-		page = (void *)__get_free_page(GFP_KERNEL);
+		page = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
 		if (!page)
 			return -ENOMEM;
 		cache->objects[cache->nobjs++] = page;
-- 
2.18.0.rc2.346.g013aa6912e-goog


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2] kvm, mm: account shadow page tables to kmemcg
@ 2018-06-29 14:02 ` Shakeel Butt
  0 siblings, 0 replies; 8+ messages in thread
From: Shakeel Butt @ 2018-06-29 14:02 UTC (permalink / raw)
  To: Paolo Bonzini, Andrew Morton
  Cc: linux-kernel, kvm, linux-mm, Shakeel Butt, Michal Hocko,
	Johannes Weiner, Vladimir Davydov, Greg Thelen,
	Radim Krčmář,
	Peter Feiner, stable

The size of kvm's shadow page tables corresponds to the size of the
guest virtual machines on the system. Large VMs can spend a significant
amount of memory as shadow page tables which can not be left as system
memory overhead. So, account shadow page tables to the kmemcg.

Signed-off-by: Shakeel Butt <shakeelb@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Radim KrA?mA!A? <rkrcmar@redhat.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
---
Changelog since v1:
- replaced (GFP_KERNEL|__GFP_ACCOUNT) with GFP_KERNEL_ACCOUNT

 arch/x86/kvm/mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index d594690d8b95..6b8f11521c41 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -890,7 +890,7 @@ static int mmu_topup_memory_cache_page(struct kvm_mmu_memory_cache *cache,
 	if (cache->nobjs >= min)
 		return 0;
 	while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
-		page = (void *)__get_free_page(GFP_KERNEL);
+		page = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
 		if (!page)
 			return -ENOMEM;
 		cache->objects[cache->nobjs++] = page;
-- 
2.18.0.rc2.346.g013aa6912e-goog

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] kvm, mm: account shadow page tables to kmemcg
  2018-06-29 14:02 ` Shakeel Butt
@ 2018-06-29 14:30   ` Michal Hocko
  -1 siblings, 0 replies; 8+ messages in thread
From: Michal Hocko @ 2018-06-29 14:30 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Paolo Bonzini, Andrew Morton, linux-kernel, kvm, linux-mm,
	Johannes Weiner, Vladimir Davydov, Greg Thelen,
	Radim Krčmář,
	Peter Feiner, stable

On Fri 29-06-18 07:02:24, Shakeel Butt wrote:
> The size of kvm's shadow page tables corresponds to the size of the
> guest virtual machines on the system. Large VMs can spend a significant
> amount of memory as shadow page tables which can not be left as system
> memory overhead. So, account shadow page tables to the kmemcg.
> 
> Signed-off-by: Shakeel Butt <shakeelb@google.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Greg Thelen <gthelen@google.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Peter Feiner <pfeiner@google.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: stable@vger.kernel.org

I am not familiar wtih kvm to judge but if we are going to account this
memory we will probably want to let oom_badness know how much memory
to account to a specific process. Is this something that we can do?
We will probably need a new MM_KERNEL rss_stat stat for that purpose.

Just to make it clear. I am not opposing to this patch but considering
that shadow page tables might consume a lot of memory it would be good
to know who is responsible for it from the OOM perspective. Something to
solve on top of this.

I would also love to see a note how this memory is bound to the owner
life time in the changelog. That would make the review much more easier.

> ---
> Changelog since v1:
> - replaced (GFP_KERNEL|__GFP_ACCOUNT) with GFP_KERNEL_ACCOUNT
> 
>  arch/x86/kvm/mmu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index d594690d8b95..6b8f11521c41 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -890,7 +890,7 @@ static int mmu_topup_memory_cache_page(struct kvm_mmu_memory_cache *cache,
>  	if (cache->nobjs >= min)
>  		return 0;
>  	while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
> -		page = (void *)__get_free_page(GFP_KERNEL);
> +		page = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
>  		if (!page)
>  			return -ENOMEM;
>  		cache->objects[cache->nobjs++] = page;
> -- 
> 2.18.0.rc2.346.g013aa6912e-goog

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] kvm, mm: account shadow page tables to kmemcg
@ 2018-06-29 14:30   ` Michal Hocko
  0 siblings, 0 replies; 8+ messages in thread
From: Michal Hocko @ 2018-06-29 14:30 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Paolo Bonzini, Andrew Morton, linux-kernel, kvm, linux-mm,
	Johannes Weiner, Vladimir Davydov, Greg Thelen,
	Radim Krčmář,
	Peter Feiner, stable

On Fri 29-06-18 07:02:24, Shakeel Butt wrote:
> The size of kvm's shadow page tables corresponds to the size of the
> guest virtual machines on the system. Large VMs can spend a significant
> amount of memory as shadow page tables which can not be left as system
> memory overhead. So, account shadow page tables to the kmemcg.
> 
> Signed-off-by: Shakeel Butt <shakeelb@google.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Greg Thelen <gthelen@google.com>
> Cc: Radim KrA?mA!A? <rkrcmar@redhat.com>
> Cc: Peter Feiner <pfeiner@google.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: stable@vger.kernel.org

I am not familiar wtih kvm to judge but if we are going to account this
memory we will probably want to let oom_badness know how much memory
to account to a specific process. Is this something that we can do?
We will probably need a new MM_KERNEL rss_stat stat for that purpose.

Just to make it clear. I am not opposing to this patch but considering
that shadow page tables might consume a lot of memory it would be good
to know who is responsible for it from the OOM perspective. Something to
solve on top of this.

I would also love to see a note how this memory is bound to the owner
life time in the changelog. That would make the review much more easier.

> ---
> Changelog since v1:
> - replaced (GFP_KERNEL|__GFP_ACCOUNT) with GFP_KERNEL_ACCOUNT
> 
>  arch/x86/kvm/mmu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index d594690d8b95..6b8f11521c41 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -890,7 +890,7 @@ static int mmu_topup_memory_cache_page(struct kvm_mmu_memory_cache *cache,
>  	if (cache->nobjs >= min)
>  		return 0;
>  	while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
> -		page = (void *)__get_free_page(GFP_KERNEL);
> +		page = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
>  		if (!page)
>  			return -ENOMEM;
>  		cache->objects[cache->nobjs++] = page;
> -- 
> 2.18.0.rc2.346.g013aa6912e-goog

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] kvm, mm: account shadow page tables to kmemcg
  2018-06-29 14:30   ` Michal Hocko
  (?)
@ 2018-06-29 14:40   ` Paolo Bonzini
  2018-06-29 14:55     ` Michal Hocko
  -1 siblings, 1 reply; 8+ messages in thread
From: Paolo Bonzini @ 2018-06-29 14:40 UTC (permalink / raw)
  To: Michal Hocko, Shakeel Butt
  Cc: Andrew Morton, linux-kernel, kvm, linux-mm, Johannes Weiner,
	Vladimir Davydov, Greg Thelen, Radim Krčmář,
	Peter Feiner, stable

On 29/06/2018 16:30, Michal Hocko wrote:
> I am not familiar wtih kvm to judge but if we are going to account this
> memory we will probably want to let oom_badness know how much memory
> to account to a specific process. Is this something that we can do?
> We will probably need a new MM_KERNEL rss_stat stat for that purpose.
> 
> Just to make it clear. I am not opposing to this patch but considering
> that shadow page tables might consume a lot of memory it would be good
> to know who is responsible for it from the OOM perspective. Something to
> solve on top of this.

The amount of memory is generally proportional to the size of the
virtual machine memory, which is reflected directly into RSS.  Because
KVM processes are usually huge, and will probably dwarf everything else
in the system (except firefox and chromium of course :)), the general
order of magnitude of the oom_badness should be okay.

> I would also love to see a note how this memory is bound to the owner
> life time in the changelog. That would make the review much more easier.

--verbose for people that aren't well versed in linux mm, please...

Paolo

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] kvm, mm: account shadow page tables to kmemcg
  2018-06-29 14:40   ` Paolo Bonzini
@ 2018-06-29 14:55     ` Michal Hocko
  2018-06-29 15:43       ` Paolo Bonzini
  2018-06-29 16:47       ` Shakeel Butt
  0 siblings, 2 replies; 8+ messages in thread
From: Michal Hocko @ 2018-06-29 14:55 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Shakeel Butt, Andrew Morton, linux-kernel, kvm, linux-mm,
	Johannes Weiner, Vladimir Davydov, Greg Thelen,
	Radim Krčmář,
	Peter Feiner, stable

On Fri 29-06-18 16:40:23, Paolo Bonzini wrote:
> On 29/06/2018 16:30, Michal Hocko wrote:
> > I am not familiar wtih kvm to judge but if we are going to account this
> > memory we will probably want to let oom_badness know how much memory
> > to account to a specific process. Is this something that we can do?
> > We will probably need a new MM_KERNEL rss_stat stat for that purpose.
> > 
> > Just to make it clear. I am not opposing to this patch but considering
> > that shadow page tables might consume a lot of memory it would be good
> > to know who is responsible for it from the OOM perspective. Something to
> > solve on top of this.
> 
> The amount of memory is generally proportional to the size of the
> virtual machine memory, which is reflected directly into RSS.  Because
> KVM processes are usually huge, and will probably dwarf everything else
> in the system (except firefox and chromium of course :)), the general
> order of magnitude of the oom_badness should be okay.

I think we will need MM_KERNEL longterm anyway. As I've said this is not
a must for this patch to go. But it is better to have a fair comparision
and kill larger processes if at all possible. It seems this should be
the case here.
 
> > I would also love to see a note how this memory is bound to the owner
> > life time in the changelog. That would make the review much more easier.
> 
> --verbose for people that aren't well versed in linux mm, please...

Well, if the memory accounted to the memcg hits the hard limit and there
is no way to reclaim anything to reduce the charged memory then we have
to kill something. Hopefully the memory hog. If that one dies it would
be great it releases its charges along the way. My remark was just to
explain how that would happen for this specific type of memory. Bound to
a file, has its own tear down etc. Basically make life of reviewers
easier to understand the lifetime of charged objects without digging
deep into the specific subsystem.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] kvm, mm: account shadow page tables to kmemcg
  2018-06-29 14:55     ` Michal Hocko
@ 2018-06-29 15:43       ` Paolo Bonzini
  2018-06-29 16:47       ` Shakeel Butt
  1 sibling, 0 replies; 8+ messages in thread
From: Paolo Bonzini @ 2018-06-29 15:43 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Shakeel Butt, Andrew Morton, linux-kernel, kvm, linux-mm,
	Johannes Weiner, Vladimir Davydov, Greg Thelen,
	Radim Krčmář,
	Peter Feiner, stable

On 29/06/2018 16:55, Michal Hocko wrote:
>>> I would also love to see a note how this memory is bound to the owner
>>> life time in the changelog. That would make the review much more easier.
>> --verbose for people that aren't well versed in linux mm, please...
> Well, if the memory accounted to the memcg hits the hard limit and there
> is no way to reclaim anything to reduce the charged memory then we have
> to kill something. Hopefully the memory hog. If that one dies it would
> be great it releases its charges along the way. My remark was just to
> explain how that would happen for this specific type of memory. Bound to
> a file, has its own tear down etc. Basically make life of reviewers
> easier to understand the lifetime of charged objects without digging
> deep into the specific subsystem.

Oh I see.  Yes, it's all freed when the VM file descriptor (which you
get with a ioctl on /dev/kvm) is closed.

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] kvm, mm: account shadow page tables to kmemcg
  2018-06-29 14:55     ` Michal Hocko
  2018-06-29 15:43       ` Paolo Bonzini
@ 2018-06-29 16:47       ` Shakeel Butt
  1 sibling, 0 replies; 8+ messages in thread
From: Shakeel Butt @ 2018-06-29 16:47 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Paolo Bonzini, Andrew Morton, LKML, kvm, Linux MM,
	Johannes Weiner, Vladimir Davydov, Greg Thelen,
	Radim Krčmář,
	Peter Feiner, stable

On Fri, Jun 29, 2018 at 7:55 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Fri 29-06-18 16:40:23, Paolo Bonzini wrote:
> > On 29/06/2018 16:30, Michal Hocko wrote:
> > > I am not familiar wtih kvm to judge but if we are going to account this
> > > memory we will probably want to let oom_badness know how much memory
> > > to account to a specific process. Is this something that we can do?
> > > We will probably need a new MM_KERNEL rss_stat stat for that purpose.
> > >
> > > Just to make it clear. I am not opposing to this patch but considering
> > > that shadow page tables might consume a lot of memory it would be good
> > > to know who is responsible for it from the OOM perspective. Something to
> > > solve on top of this.
> >
> > The amount of memory is generally proportional to the size of the
> > virtual machine memory, which is reflected directly into RSS.  Because
> > KVM processes are usually huge, and will probably dwarf everything else
> > in the system (except firefox and chromium of course :)), the general
> > order of magnitude of the oom_badness should be okay.
>
> I think we will need MM_KERNEL longterm anyway. As I've said this is not
> a must for this patch to go. But it is better to have a fair comparision
> and kill larger processes if at all possible. It seems this should be
> the case here.
>

I will look more into MM_KERNEL counter. I still have couple more kmem
allocations in kvm (like dirty bitmap) which I want to be accounted. I
will bundle them together.

thanks,
Shakeel

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-06-29 16:47 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-29 14:02 [PATCH v2] kvm, mm: account shadow page tables to kmemcg Shakeel Butt
2018-06-29 14:02 ` Shakeel Butt
2018-06-29 14:30 ` Michal Hocko
2018-06-29 14:30   ` Michal Hocko
2018-06-29 14:40   ` Paolo Bonzini
2018-06-29 14:55     ` Michal Hocko
2018-06-29 15:43       ` Paolo Bonzini
2018-06-29 16:47       ` Shakeel Butt

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.