[patch 01/16] Revert "mm: madvise: skip unmapped vma holes passed to process

stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [patch 01/16] Revert "mm: madvise: skip unmapped vma holes passed to process_madvise"
       [not found] <20220401112740.351496714b370467a92207a6@linux-foundation.org>
@ 2022-04-01 18:28 ` Andrew Morton
  2022-04-01 18:28 ` [patch 02/16] ocfs2: fix crash when mount with quota enabled Andrew Morton
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Andrew Morton @ 2022-04-01 18:28 UTC (permalink / raw)
  To: vbabka, surenb, stable, rientjes, nadav.amit, mhocko,
	quic_charante, akpm, patches, linux-mm, mm-commits, torvalds,
	akpm

From: Charan Teja Kalla <quic_charante@quicinc.com>
Subject: Revert "mm: madvise: skip unmapped vma holes passed to process_madvise"

This reverts commit 08095d6310a7 ("mm: madvise: skip unmapped vma holes
passed to process_madvise") as process_madvise() fails to return the exact
processed bytes in other cases too.  As an example: if process_madvise()
hits mlocked pages after processing some initial bytes passed in [start,
end), it just returns EINVAL although some bytes are processed.  Thus
making an exception only for ENOMEM is partially fixing the problem of
returning the proper advised bytes.

Thus revert this patch and return proper bytes advised.

Link: https://lkml.kernel.org/r/e73da1304a88b6a8a11907045117cccf4c2b8374.1648046642.git.quic_charante@quicinc.com
Fixes: 08095d6310a7ce ("mm: madvise: skip unmapped vma holes passed to process_madvise")
Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/madvise.c |    9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

--- a/mm/madvise.c~revert-mm-madvise-skip-unmapped-vma-holes-passed-to-process_madvise
+++ a/mm/madvise.c
@@ -1464,16 +1464,9 @@ SYSCALL_DEFINE5(process_madvise, int, pi
 
 	while (iov_iter_count(&iter)) {
 		iovec = iov_iter_iovec(&iter);
-		/*
-		 * do_madvise returns ENOMEM if unmapped holes are present
-		 * in the passed VMA. process_madvise() is expected to skip
-		 * unmapped holes passed to it in the 'struct iovec' list
-		 * and not fail because of them. Thus treat -ENOMEM return
-		 * from do_madvise as valid and continue processing.
-		 */
 		ret = do_madvise(mm, (unsigned long)iovec.iov_base,
 					iovec.iov_len, behavior);
-		if (ret < 0 && ret != -ENOMEM)
+		if (ret < 0)
 			break;
 		iov_iter_advance(&iter, iovec.iov_len);
 	}
_

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [patch 02/16] ocfs2: fix crash when mount with quota enabled
       [not found] <20220401112740.351496714b370467a92207a6@linux-foundation.org>
  2022-04-01 18:28 ` [patch 01/16] Revert "mm: madvise: skip unmapped vma holes passed to process_madvise" Andrew Morton
@ 2022-04-01 18:28 ` Andrew Morton
  2022-04-01 18:28 ` [patch 11/16] mm,hwpoison: unmap poisoned page before invalidation Andrew Morton
  2022-04-01 18:28 ` [patch 15/16] mm/kmemleak: reset tag when compare object pointer Andrew Morton
  3 siblings, 0 replies; 5+ messages in thread
From: Andrew Morton @ 2022-04-01 18:28 UTC (permalink / raw)
  To: vvidic, stable, sathlerds, joseph.qi, akpm, patches, linux-mm,
	mm-commits, torvalds, akpm

From: Joseph Qi <joseph.qi@linux.alibaba.com>
Subject: ocfs2: fix crash when mount with quota enabled

There is a reported crash when mounting ocfs2 with quota enabled.

RIP: 0010:ocfs2_qinfo_lock_res_init+0x44/0x50 [ocfs2]
Call Trace:
<TASK>
ocfs2_local_read_info+0xb9/0x6f0 [ocfs2]
? ocfs2_local_check_quota_file+0x197/0x390 [ocfs2]
dquot_load_quota_sb+0x216/0x470
? preempt_count_add+0x68/0xa0
dquot_load_quota_inode+0x85/0x100
ocfs2_enable_quotas+0xa0/0x1c0 [ocfs2]
ocfs2_fill_super.cold+0xc8/0x1bf [ocfs2]
mount_bdev+0x185/0x1b0
? ocfs2_initialize_super.isra.0+0xf40/0xf40 [ocfs2]
legacy_get_tree+0x27/0x40
vfs_get_tree+0x25/0xb0
path_mount+0x465/0xac0
__x64_sys_mount+0x103/0x140
do_syscall_64+0x3b/0xc0
entry_SYSCALL_64_after_hwframe+0x44/0xae
</TASK>

It is caused by when initializing dqi_gqlock, the corresponding dqi_type
and dqi_sb are not properly initialized.  This issue is introduced by
commit 6c85c2c72819, which wants to avoid accessing uninitialized
variables in error cases.  So make global quota info properly initialized.

Link: https://lkml.kernel.org/r/20220323023644.40084-1-joseph.qi@linux.alibaba.com
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1007141
Fixes: 6c85c2c72819 ("ocfs2: quota_local: fix possible uninitialized-variable access in ocfs2_local_read_info()")
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reported-by: Dayvison <sathlerds@gmail.com>
Tested-by: Valentin Vidic <vvidic@valentin-vidic.from.hr>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/ocfs2/quota_global.c |   23 ++++++++++++-----------
 fs/ocfs2/quota_local.c  |    2 --
 2 files changed, 12 insertions(+), 13 deletions(-)

--- a/fs/ocfs2/quota_global.c~ocfs2-fix-crash-when-mount-with-quota-enabled
+++ a/fs/ocfs2/quota_global.c
@@ -337,7 +337,6 @@ void ocfs2_unlock_global_qf(struct ocfs2
 /* Read information header from global quota file */
 int ocfs2_global_read_info(struct super_block *sb, int type)
 {
-	struct inode *gqinode = NULL;
 	unsigned int ino[OCFS2_MAXQUOTAS] = { USER_QUOTA_SYSTEM_INODE,
 					      GROUP_QUOTA_SYSTEM_INODE };
 	struct ocfs2_global_disk_dqinfo dinfo;
@@ -346,29 +345,31 @@ int ocfs2_global_read_info(struct super_
 	u64 pcount;
 	int status;
 
+	oinfo->dqi_gi.dqi_sb = sb;
+	oinfo->dqi_gi.dqi_type = type;
+	ocfs2_qinfo_lock_res_init(&oinfo->dqi_gqlock, oinfo);
+	oinfo->dqi_gi.dqi_entry_size = sizeof(struct ocfs2_global_disk_dqblk);
+	oinfo->dqi_gi.dqi_ops = &ocfs2_global_ops;
+	oinfo->dqi_gqi_bh = NULL;
+	oinfo->dqi_gqi_count = 0;
+
 	/* Read global header */
-	gqinode = ocfs2_get_system_file_inode(OCFS2_SB(sb), ino[type],
+	oinfo->dqi_gqinode = ocfs2_get_system_file_inode(OCFS2_SB(sb), ino[type],
 			OCFS2_INVALID_SLOT);
-	if (!gqinode) {
+	if (!oinfo->dqi_gqinode) {
 		mlog(ML_ERROR, "failed to get global quota inode (type=%d)\n",
 			type);
 		status = -EINVAL;
 		goto out_err;
 	}
-	oinfo->dqi_gi.dqi_sb = sb;
-	oinfo->dqi_gi.dqi_type = type;
-	oinfo->dqi_gi.dqi_entry_size = sizeof(struct ocfs2_global_disk_dqblk);
-	oinfo->dqi_gi.dqi_ops = &ocfs2_global_ops;
-	oinfo->dqi_gqi_bh = NULL;
-	oinfo->dqi_gqi_count = 0;
-	oinfo->dqi_gqinode = gqinode;
+
 	status = ocfs2_lock_global_qf(oinfo, 0);
 	if (status < 0) {
 		mlog_errno(status);
 		goto out_err;
 	}
 
-	status = ocfs2_extent_map_get_blocks(gqinode, 0, &oinfo->dqi_giblk,
+	status = ocfs2_extent_map_get_blocks(oinfo->dqi_gqinode, 0, &oinfo->dqi_giblk,
 					     &pcount, NULL);
 	if (status < 0)
 		goto out_unlock;
--- a/fs/ocfs2/quota_local.c~ocfs2-fix-crash-when-mount-with-quota-enabled
+++ a/fs/ocfs2/quota_local.c
@@ -702,8 +702,6 @@ static int ocfs2_local_read_info(struct
 	info->dqi_priv = oinfo;
 	oinfo->dqi_type = type;
 	INIT_LIST_HEAD(&oinfo->dqi_chunk);
-	oinfo->dqi_gqinode = NULL;
-	ocfs2_qinfo_lock_res_init(&oinfo->dqi_gqlock, oinfo);
 	oinfo->dqi_rec = NULL;
 	oinfo->dqi_lqi_bh = NULL;
 	oinfo->dqi_libh = NULL;
_

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [patch 11/16] mm,hwpoison: unmap poisoned page before invalidation
       [not found] <20220401112740.351496714b370467a92207a6@linux-foundation.org>
  2022-04-01 18:28 ` [patch 01/16] Revert "mm: madvise: skip unmapped vma holes passed to process_madvise" Andrew Morton
  2022-04-01 18:28 ` [patch 02/16] ocfs2: fix crash when mount with quota enabled Andrew Morton
@ 2022-04-01 18:28 ` Andrew Morton
  2022-04-01 18:28 ` [patch 15/16] mm/kmemleak: reset tag when compare object pointer Andrew Morton
  3 siblings, 0 replies; 5+ messages in thread
From: Andrew Morton @ 2022-04-01 18:28 UTC (permalink / raw)
  To: stable, osalvador, naoya.horiguchi, mgorman, linmiaohe, hannes,
	riel, akpm, patches, linux-mm, mm-commits, torvalds, akpm

From: Rik van Riel <riel@surriel.com>
Subject: mm,hwpoison: unmap poisoned page before invalidation

In some cases it appears the invalidation of a hwpoisoned page fails
because the page is still mapped in another process.  This can cause a
program to be continuously restarted and die when it page faults on the
page that was not invalidated.  Avoid that problem by unmapping the
hwpoisoned page when we find it.

Another issue is that sometimes we end up oopsing in finish_fault, if the
code tries to do something with the now-NULL vmf->page.  I did not hit
this error when submitting the previous patch because there are several
opportunities for alloc_set_pte to bail out before accessing vmf->page,
and that apparently happened on those systems, and most of the time on
other systems, too.

However, across several million systems that error does occur a handful of
times a day.  It can be avoided by returning VM_FAULT_NOPAGE which will
cause do_read_fault to return before calling finish_fault.

Link: https://lkml.kernel.org/r/20220325161428.5068d97e@imladris.surriel.com
Fixes: e53ac7374e64 ("mm: invalidate hwpoison page cache page in fault path")
Signed-off-by: Rik van Riel <riel@surriel.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Tested-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memory.c |   12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

--- a/mm/memory.c~mmhwpoison-unmap-poisoned-page-before-invalidation
+++ a/mm/memory.c
@@ -3918,14 +3918,18 @@ static vm_fault_t __do_fault(struct vm_f
 		return ret;
 
 	if (unlikely(PageHWPoison(vmf->page))) {
+		struct page *page = vmf->page;
 		vm_fault_t poisonret = VM_FAULT_HWPOISON;
 		if (ret & VM_FAULT_LOCKED) {
+			if (page_mapped(page))
+				unmap_mapping_pages(page_mapping(page),
+						    page->index, 1, false);
 			/* Retry if a clean page was removed from the cache. */
-			if (invalidate_inode_page(vmf->page))
-				poisonret = 0;
-			unlock_page(vmf->page);
+			if (invalidate_inode_page(page))
+				poisonret = VM_FAULT_NOPAGE;
+			unlock_page(page);
 		}
-		put_page(vmf->page);
+		put_page(page);
 		vmf->page = NULL;
 		return poisonret;
 	}
_

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [patch 15/16] mm/kmemleak: reset tag when compare object pointer
       [not found] <20220401112740.351496714b370467a92207a6@linux-foundation.org>
                   ` (2 preceding siblings ...)
  2022-04-01 18:28 ` [patch 11/16] mm,hwpoison: unmap poisoned page before invalidation Andrew Morton
@ 2022-04-01 18:28 ` Andrew Morton
  3 siblings, 0 replies; 5+ messages in thread
From: Andrew Morton @ 2022-04-01 18:28 UTC (permalink / raw)
  To: yee.lee, stable, nicholas.tang, matthias.bgg, chinwen.chang,
	catalin.marinas, Kuan-Ying.Lee, akpm, patches, linux-mm,
	mm-commits, torvalds, akpm

From: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
Subject: mm/kmemleak: reset tag when compare object pointer

When we use HW-tag based kasan and enable vmalloc support, we hit
the following bug. It is due to comparison between tagged object
and non-tagged pointer.

We need to reset the kasan tag when we need to compare tagged object
and non-tagged pointer.

[    7.690429][T400001] init: kmemleak: [name:kmemleak&]Scan area larger than object 0xffffffe77076f440
[    7.691762][T400001] init: CPU: 4 PID: 1 Comm: init Tainted: G S      W         5.15.25-android13-0-g5cacf919c2bc #1
[    7.693218][T400001] init: Hardware name: MT6983(ENG) (DT)
[    7.693983][T400001] init: Call trace:
[    7.694508][T400001] init:  dump_backtrace.cfi_jt+0x0/0x8
[    7.695272][T400001] init:  dump_stack_lvl+0xac/0x120
[    7.695985][T400001] init:  add_scan_area+0xc4/0x244
[    7.696685][T400001] init:  kmemleak_scan_area+0x40/0x9c
[    7.697428][T400001] init:  layout_and_allocate+0x1e8/0x288
[    7.698211][T400001] init:  load_module+0x2c8/0xf00
[    7.698895][T400001] init:  __se_sys_finit_module+0x190/0x1d0
[    7.699701][T400001] init:  __arm64_sys_finit_module+0x20/0x30
[    7.700517][T400001] init:  invoke_syscall+0x60/0x170
[    7.701225][T400001] init:  el0_svc_common+0xc8/0x114
[    7.701933][T400001] init:  do_el0_svc+0x28/0xa0
[    7.702580][T400001] init:  el0_svc+0x60/0xf8
[    7.703196][T400001] init:  el0t_64_sync_handler+0x88/0xec
[    7.703964][T400001] init:  el0t_64_sync+0x1b4/0x1b8
[    7.704658][T400001] init: kmemleak: [name:kmemleak&]Object 0xf5ffffe77076b000 (size 32768):
[    7.705824][T400001] init: kmemleak: [name:kmemleak&]  comm "init", pid 1, jiffies 4294894197
[    7.707002][T400001] init: kmemleak: [name:kmemleak&]  min_count = 0
[    7.707886][T400001] init: kmemleak: [name:kmemleak&]  count = 0
[    7.708718][T400001] init: kmemleak: [name:kmemleak&]  flags = 0x1
[    7.709574][T400001] init: kmemleak: [name:kmemleak&]  checksum = 0
[    7.710440][T400001] init: kmemleak: [name:kmemleak&]  backtrace:
[    7.711284][T400001] init:      module_alloc+0x9c/0x120
[    7.712015][T400001] init:      move_module+0x34/0x19c
[    7.712735][T400001] init:      layout_and_allocate+0x1c4/0x288
[    7.713561][T400001] init:      load_module+0x2c8/0xf00
[    7.714291][T400001] init:      __se_sys_finit_module+0x190/0x1d0
[    7.715142][T400001] init:      __arm64_sys_finit_module+0x20/0x30
[    7.716004][T400001] init:      invoke_syscall+0x60/0x170
[    7.716758][T400001] init:      el0_svc_common+0xc8/0x114
[    7.717512][T400001] init:      do_el0_svc+0x28/0xa0
[    7.718207][T400001] init:      el0_svc+0x60/0xf8
[    7.718869][T400001] init:      el0t_64_sync_handler+0x88/0xec
[    7.719683][T400001] init:      el0t_64_sync+0x1b4/0x1b8

Link: https://lkml.kernel.org/r/20220318034051.30687-1-Kuan-Ying.Lee@mediatek.com
Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: Nicholas Tang <nicholas.tang@mediatek.com>
Cc: Yee Lee <yee.lee@mediatek.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/kmemleak.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

--- a/mm/kmemleak.c~mm-kmemleak-reset-tag-when-compare-object-pointer
+++ a/mm/kmemleak.c
@@ -796,6 +796,8 @@ static void add_scan_area(unsigned long
 	unsigned long flags;
 	struct kmemleak_object *object;
 	struct kmemleak_scan_area *area = NULL;
+	unsigned long untagged_ptr;
+	unsigned long untagged_objp;
 
 	object = find_and_get_object(ptr, 1);
 	if (!object) {
@@ -804,6 +806,9 @@ static void add_scan_area(unsigned long
 		return;
 	}
 
+	untagged_ptr = (unsigned long)kasan_reset_tag((void *)ptr);
+	untagged_objp = (unsigned long)kasan_reset_tag((void *)object->pointer);
+
 	if (scan_area_cache)
 		area = kmem_cache_alloc(scan_area_cache, gfp_kmemleak_mask(gfp));
 
@@ -815,8 +820,8 @@ static void add_scan_area(unsigned long
 		goto out_unlock;
 	}
 	if (size == SIZE_MAX) {
-		size = object->pointer + object->size - ptr;
-	} else if (ptr + size > object->pointer + object->size) {
+		size = untagged_objp + object->size - untagged_ptr;
+	} else if (untagged_ptr + size > untagged_objp + object->size) {
 		kmemleak_warn("Scan area larger than object 0x%08lx\n", ptr);
 		dump_object_info(object);
 		kmem_cache_free(scan_area_cache, area);
_

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [patch 11/16] mm,hwpoison: unmap poisoned page before invalidation
       [not found] <l>
@ 2022-04-01 18:21 ` Andrew Morton
  0 siblings, 0 replies; 5+ messages in thread
From: Andrew Morton @ 2022-04-01 18:21 UTC (permalink / raw)
  To: stable, osalvador, naoya.horiguchi, mgorman, linmiaohe, hannes,
	riel, akpm, patches, linux-mm, mm-commits, torvalds, akpm

From: Rik van Riel <riel@surriel.com>
Subject: mm,hwpoison: unmap poisoned page before invalidation

In some cases it appears the invalidation of a hwpoisoned page fails
because the page is still mapped in another process.  This can cause a
program to be continuously restarted and die when it page faults on the
page that was not invalidated.  Avoid that problem by unmapping the
hwpoisoned page when we find it.

Another issue is that sometimes we end up oopsing in finish_fault, if the
code tries to do something with the now-NULL vmf->page.  I did not hit
this error when submitting the previous patch because there are several
opportunities for alloc_set_pte to bail out before accessing vmf->page,
and that apparently happened on those systems, and most of the time on
other systems, too.

However, across several million systems that error does occur a handful of
times a day.  It can be avoided by returning VM_FAULT_NOPAGE which will
cause do_read_fault to return before calling finish_fault.

Link: https://lkml.kernel.org/r/20220325161428.5068d97e@imladris.surriel.com
Fixes: e53ac7374e64 ("mm: invalidate hwpoison page cache page in fault path")
Signed-off-by: Rik van Riel <riel@surriel.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Tested-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memory.c |   12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

--- a/mm/memory.c~mmhwpoison-unmap-poisoned-page-before-invalidation
+++ a/mm/memory.c
@@ -3918,14 +3918,18 @@ static vm_fault_t __do_fault(struct vm_f
 		return ret;

 	if (unlikely(PageHWPoison(vmf->page))) {
+		struct page *page = vmf->page;
 		vm_fault_t poisonret = VM_FAULT_HWPOISON;
 		if (ret & VM_FAULT_LOCKED) {
+			if (page_mapped(page))
+				unmap_mapping_pages(page_mapping(page),
+						    page->index, 1, false);
 			/* Retry if a clean page was removed from the cache. */
-			if (invalidate_inode_page(vmf->page))
-				poisonret = 0;
-			unlock_page(vmf->page);
+			if (invalidate_inode_page(page))
+				poisonret = VM_FAULT_NOPAGE;
+			unlock_page(page);
 		}
-		put_page(vmf->page);
+		put_page(page);
 		vmf->page = NULL;
 		return poisonret;
 	}
_

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-04-01 18:29 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20220401112740.351496714b370467a92207a6@linux-foundation.org>
2022-04-01 18:28 ` [patch 01/16] Revert "mm: madvise: skip unmapped vma holes passed to process_madvise" Andrew Morton
2022-04-01 18:28 ` [patch 02/16] ocfs2: fix crash when mount with quota enabled Andrew Morton
2022-04-01 18:28 ` [patch 11/16] mm,hwpoison: unmap poisoned page before invalidation Andrew Morton
2022-04-01 18:28 ` [patch 15/16] mm/kmemleak: reset tag when compare object pointer Andrew Morton
     [not found] <l>
2022-04-01 18:21 ` [patch 11/16] mm,hwpoison: unmap poisoned page before invalidation Andrew Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).