From: Andrew Morton <akpm@linux-foundation.org>
To: akpm@linux-foundation.org, almasrymina@google.com,
	aneesh.kumar@linux.vnet.ibm.com, david@redhat.com,
	linux-mm@kvack.org, mhocko@kernel.org, mike.kravetz@oracle.com,
	mm-commits@vger.kernel.org, mprivozn@redhat.com, mst@redhat.com,
	songmuchun@bytedance.com, stable@vger.kernel.org, tj@kernel.org,
	torvalds@linux-foundation.org
Subject: [patch 02/15] hugetlb_cgroup: fix reservation accounting
Date: Sun, 01 Nov 2020 17:07:27 -0800
Message-ID: <20201102010727.6e4pqN4-U%akpm@linux-foundation.org>
In-Reply-To: <20201101170656.48abbd5e88375219f868af5e@linux-foundation.org>

From: Mike Kravetz <mike.kravetz@oracle.com>
Subject: hugetlb_cgroup: fix reservation accounting

Michal Privoznik was using "free page reporting" in QEMU/virtio-balloon
with hugetlbfs and hit the warning below.  QEMU with free page hinting
uses fallocate(FALLOC_FL_PUNCH_HOLE) to discard pages that a VM reports
as free.  Pages are reported at pageblock granularity, so when the guest
reports 2M chunks, QEMU calls fallocate(FALLOC_FL_PUNCH_HOLE) on one
huge page at a time.
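
For illustration only, the discard step described above could look
roughly like the sketch below from user space.  The helper name
discard_range() is hypothetical and is not taken from QEMU.  The one
detail added here is that fallocate(2) requires FALLOC_FL_PUNCH_HOLE to
be combined with FALLOC_FL_KEEP_SIZE.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <sys/types.h>

/*
 * Hypothetical helper: punch a hole covering [offset, offset + len) in
 * the file behind fd.  For a hugetlbfs file, offset and len would be
 * multiples of the huge page size (2M in the report above).
 * Returns 0 on success or a negative errno value on failure.
 */
static int discard_range(int fd, off_t offset, off_t len)
{
	/* PUNCH_HOLE is only valid together with KEEP_SIZE. */
	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		      offset, len) < 0)
		return -errno;
	return 0;
}
```

On a hugetlbfs file, a call such as discard_range(fd, 0, 2 * 1024 * 1024)
ends up in hugetlbfs_fallocate(), the entry point visible in the call
trace below.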

[  315.251417] ------------[ cut here ]------------
[  315.251424] WARNING: CPU: 7 PID: 6636 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x50
[  315.251425] Modules linked in: ...
[  315.251466] CPU: 7 PID: 6636 Comm: qemu-system-x86 Not tainted 5.9.0 #137
[  315.251467] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS PRO/X570 AORUS PRO, BIOS F21 07/31/2020
[  315.251469] RIP: 0010:page_counter_uncharge+0x4b/0x50
...
[  315.251479] Call Trace:
[  315.251485]  hugetlb_cgroup_uncharge_file_region+0x4b/0x80
[  315.251487]  region_del+0x1d3/0x300
[  315.251489]  hugetlb_unreserve_pages+0x39/0xb0
[  315.251492]  remove_inode_hugepages+0x1a8/0x3d0
[  315.251495]  ? tlb_finish_mmu+0x7a/0x1d0
[  315.251497]  hugetlbfs_fallocate+0x3c4/0x5c0
[  315.251519]  ? kvm_arch_vcpu_ioctl_run+0x614/0x1700 [kvm]
[  315.251522]  ? file_has_perm+0xa2/0xb0
[  315.251524]  ? inode_security+0xc/0x60
[  315.251525]  ? selinux_file_permission+0x4e/0x120
[  315.251527]  vfs_fallocate+0x146/0x290
[  315.251529]  __x64_sys_fallocate+0x3e/0x70
[  315.251531]  do_syscall_64+0x33/0x40
[  315.251533]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
...
[  315.251542] ---[ end trace 4c88c62ccb1349c9 ]---

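Investigation of the issue uncovered bugs in hugetlb cgroup reservation
accounting.  region_del() computed the number of pages to uncharge after
it had already modified the region, so the wrong amount was uncharged.
alloc_huge_page() adjusted the subpool and global reservation counts
without uncharging the cgroup reservation when deferred_reserve was set.
This patch fixes both.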
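
The ordering problem in the "trim beginning of region" case can be
modelled in user space.  The sketch below is a simplified illustration,
not kernel code: struct region, the charged counter and uncharge() are
hypothetical stand-ins for struct file_region and the cgroup page
counter.

```c
/* Simplified stand-in for a reserved range [from, to), in huge pages. */
struct region {
	long from;
	long to;
};

/* Hypothetical counter standing in for the cgroup reservation charge. */
static long charged;

static void uncharge(long nr_pages)
{
	charged -= nr_pages;
}

/*
 * Ordering before the patch: the region is trimmed first, so
 * t - rg->from is evaluated against the new start and is always 0.
 */
static long trim_beginning_buggy(struct region *rg, long t)
{
	long del = t - rg->from;

	rg->from = t;
	uncharge(t - rg->from);		/* uncharges 0 pages */
	return del;
}

/*
 * Ordering after the patch: uncharge while rg->from still holds the
 * old start, then trim the region.
 */
static long trim_beginning_fixed(struct region *rg, long t)
{
	long del;

	uncharge(t - rg->from);
	del = t - rg->from;
	rg->from = t;
	return del;
}
```

Starting from charged = 10 and a region [0, 10), trimming to t = 4
leaves charged at 10 with the old ordering and at 6 with the new one.
Both variants report 4 deleted pages.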

Link: https://lkml.kernel.org/r/20201021204426.36069-1-mike.kravetz@oracle.com
Fixes: 075a61d07a8e ("hugetlb_cgroup: add accounting for shared mappings")
Cc: <stable@vger.kernel.org>
Reported-by: Michal Privoznik <mprivozn@redhat.com>
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Tested-by: Michal Privoznik <mprivozn@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/hugetlb.c |   20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

--- a/mm/hugetlb.c~hugetlb_cgroup-fix-reservation-accounting
+++ a/mm/hugetlb.c
@@ -648,6 +648,8 @@ retry:
 			}
 
 			del += t - f;
+			hugetlb_cgroup_uncharge_file_region(
+				resv, rg, t - f);
 
 			/* New entry for end of split region */
 			nrg->from = t;
@@ -660,9 +662,6 @@ retry:
 			/* Original entry is trimmed */
 			rg->to = f;
 
-			hugetlb_cgroup_uncharge_file_region(
-				resv, rg, nrg->to - nrg->from);
-
 			list_add(&nrg->link, &rg->link);
 			nrg = NULL;
 			break;
@@ -678,17 +677,17 @@ retry:
 		}
 
 		if (f <= rg->from) {	/* Trim beginning of region */
-			del += t - rg->from;
-			rg->from = t;
-
 			hugetlb_cgroup_uncharge_file_region(resv, rg,
 							    t - rg->from);
-		} else {		/* Trim end of region */
-			del += rg->to - f;
-			rg->to = f;
 
+			del += t - rg->from;
+			rg->from = t;
+		} else {		/* Trim end of region */
 			hugetlb_cgroup_uncharge_file_region(resv, rg,
 							    rg->to - f);
+
+			del += rg->to - f;
+			rg->to = f;
 		}
 	}
 
@@ -2443,6 +2442,9 @@ struct page *alloc_huge_page(struct vm_a
 
 		rsv_adjust = hugepage_subpool_put_pages(spool, 1);
 		hugetlb_acct_memory(h, -rsv_adjust);
+		if (deferred_reserve)
+			hugetlb_cgroup_uncharge_page_rsvd(hstate_index(h),
+					pages_per_huge_page(h), page);
 	}
 	return page;
 
_
