All of lore.kernel.org
 help / color / mirror / Atom feed
* [patch added to 3.12-stable] dm: flush queued bios when process blocks to avoid deadlock
@ 2017-03-22  9:09 Jiri Slaby
  2017-03-22  9:09 ` [patch added to 3.12-stable] xfs: pass total block res. as total xfs_bmapi_write() parameter Jiri Slaby
  2017-03-22  9:09 ` [patch added to 3.12-stable] mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp Jiri Slaby
  0 siblings, 2 replies; 12+ messages in thread
From: Jiri Slaby @ 2017-03-22  9:09 UTC (permalink / raw)
  To: stable; +Cc: Mikulas Patocka, Mike Snitzer, Jiri Slaby

From: Mikulas Patocka <mpatocka@redhat.com>

This patch has been added to the 3.12 stable tree. If you have any
objections, please let us know.

===============

commit d67a5f4b5947aba4bfe9a80a2b86079c215ca755 upstream.

Commit df2cb6daa4 ("block: Avoid deadlocks with bio allocation by
stacking drivers") created a workqueue for every bio set and code
in bio_alloc_bioset() that tries to resolve some low-memory deadlocks
by redirecting bios queued on current->bio_list to the workqueue if the
system is low on memory.  However other deadlocks (see below **) may
happen, without any low memory condition, because generic_make_request
is queuing bios to current->bio_list (rather than submitting them).

** the related dm-snapshot deadlock is detailed here:
https://www.redhat.com/archives/dm-devel/2016-July/msg00065.html

Fix this deadlock by redirecting any bios on current->bio_list to the
bio_set's rescue workqueue on every schedule() call.  Consequently,
when the process blocks on a mutex, the bios queued on
current->bio_list are dispatched to independent workqueus and they can
complete without waiting for the mutex to be available.

The structure blk_plug contains an entry cb_list and this list can contain
arbitrary callback functions that are called when the process blocks.
To implement this fix DM (ab)uses the onstack plug's cb_list interface
to get its flush_current_bio_list() called at schedule() time.

This fixes the snapshot deadlock - if the map method blocks,
flush_current_bio_list() will be called and it redirects bios waiting
on current->bio_list to appropriate workqueues.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1267650
Depends-on: df2cb6daa4 ("block: Avoid deadlocks with bio allocation by stacking drivers")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
 drivers/md/dm.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 8c82835a4749..fafb82f383df 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1075,11 +1075,62 @@ int dm_set_target_max_io_len(struct dm_target *ti, sector_t len)
 }
 EXPORT_SYMBOL_GPL(dm_set_target_max_io_len);
 
+/*
+ * Flush current->bio_list when the target map method blocks.
+ * This fixes deadlocks in snapshot and possibly in other targets.
+ */
+struct dm_offload {
+	struct blk_plug plug;
+	struct blk_plug_cb cb;
+};
+
+static void flush_current_bio_list(struct blk_plug_cb *cb, bool from_schedule)
+{
+	struct dm_offload *o = container_of(cb, struct dm_offload, cb);
+	struct bio_list list;
+	struct bio *bio;
+
+	INIT_LIST_HEAD(&o->cb.list);
+
+	if (unlikely(!current->bio_list))
+		return;
+
+	list = *current->bio_list;
+	bio_list_init(current->bio_list);
+
+	while ((bio = bio_list_pop(&list))) {
+		struct bio_set *bs = bio->bi_pool;
+		if (unlikely(!bs) || bs == fs_bio_set) {
+			bio_list_add(current->bio_list, bio);
+			continue;
+		}
+
+		spin_lock(&bs->rescue_lock);
+		bio_list_add(&bs->rescue_list, bio);
+		queue_work(bs->rescue_workqueue, &bs->rescue_work);
+		spin_unlock(&bs->rescue_lock);
+	}
+}
+
+static void dm_offload_start(struct dm_offload *o)
+{
+	blk_start_plug(&o->plug);
+	o->cb.callback = flush_current_bio_list;
+	list_add(&o->cb.list, &current->plug->cb_list);
+}
+
+static void dm_offload_end(struct dm_offload *o)
+{
+	list_del(&o->cb.list);
+	blk_finish_plug(&o->plug);
+}
+
 static void __map_bio(struct dm_target_io *tio)
 {
 	int r;
 	sector_t sector;
 	struct mapped_device *md;
+	struct dm_offload o;
 	struct bio *clone = &tio->clone;
 	struct dm_target *ti = tio->ti;
 
@@ -1093,7 +1144,11 @@ static void __map_bio(struct dm_target_io *tio)
 	 */
 	atomic_inc(&tio->io->io_count);
 	sector = clone->bi_sector;
+
+	dm_offload_start(&o);
 	r = ti->type->map(ti, clone);
+	dm_offload_end(&o);
+
 	if (r == DM_MAPIO_REMAPPED) {
 		/* the bio has been remapped so dispatch it */
 
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [patch added to 3.12-stable] xfs: pass total block res. as total xfs_bmapi_write() parameter
  2017-03-22  9:09 [patch added to 3.12-stable] dm: flush queued bios when process blocks to avoid deadlock Jiri Slaby
@ 2017-03-22  9:09 ` Jiri Slaby
  2017-03-22  9:09 ` [patch added to 3.12-stable] mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp Jiri Slaby
  1 sibling, 0 replies; 12+ messages in thread
From: Jiri Slaby @ 2017-03-22  9:09 UTC (permalink / raw)
  To: stable; +Cc: Brian Foster, Dave Chinner, Nikolay Borisov, Jiri Slaby

From: Brian Foster <bfoster@redhat.com>

This patch has been added to the 3.12 stable tree. If you have any
objections, please let us know.

===============

commit dbd5c8c9a28899c6ca719eb21afc0afba9dd5574 upstream.

The total field from struct xfs_alloc_arg is a bit of an unknown
commodity. It is documented as the total block requirement for the
transaction and is used in this manner from most call sites by virtue of
passing the total block reservation of the transaction associated with
an allocation. Several xfs_bmapi_write() callers pass hardcoded values
of 0 or 1 for the total block requirement, which is a historical oddity
without any clear reasoning.

The xfs_iomap_write_direct() caller, for example, passes 0 for the total
block requirement. This has been determined to cause problems in the
form of ABBA deadlocks of AGF buffers due to incorrect AG selection in
the block allocator. Specifically, the xfs_alloc_space_available()
function incorrectly selects an AG that doesn't actually have sufficient
space for the allocation. This occurs because the args.total field is 0
and thus the remaining free space check on the AG doesn't actually
consider the size of the allocation request. This locks the AGF buffer,
the allocation attempt proceeds and ultimately fails (in
xfs_alloc_fix_minleft()), and xfs_alloc_vexent() moves on to the next
AG. In turn, this can lead to incorrect AG locking order (if the
allocator wraps around, attempting to lock AG 0 after acquiring AG N)
and thus deadlock if racing with another operation. This problem has
been reproduced via generic/299 on smallish (1GB) ramdisk test devices.

To avoid this problem, replace the undocumented hardcoded total
parameters from the iomap and utility callers to pass the block
reservation used for the associated transaction. This is consistent with
other xfs_bmapi_write() callers throughout XFS. The assumption is that
the total field allows the selection of an AG that can handle the entire
operation rather than simply the allocation/range being requested (e.g.,
resulting btree splits, etc.). This addresses the aforementioned
generic/299 hang by ensuring AG selection only occurs when the
allocation can be satisfied by the AG.

[nb] backport to 3.12

Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Acked-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
 fs/xfs/xfs_bmap_util.c | 2 +-
 fs/xfs/xfs_iomap.c     | 8 ++++----
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 97f952caea74..42cb2f3ea51f 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1100,7 +1100,7 @@ xfs_alloc_file_space(
 		xfs_bmap_init(&free_list, &firstfsb);
 		error = xfs_bmapi_write(tp, ip, startoffset_fsb,
 					allocatesize_fsb, alloc_type, &firstfsb,
-					0, imapp, &nimaps, &free_list);
+					resblks, imapp, &nimaps, &free_list);
 		if (error) {
 			goto error0;
 		}
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 8d4d49b6fbf3..1d48f7a9b63e 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -217,7 +217,7 @@ xfs_iomap_write_direct(
 	xfs_bmap_init(&free_list, &firstfsb);
 	nimaps = 1;
 	error = xfs_bmapi_write(tp, ip, offset_fsb, count_fsb, bmapi_flag,
-				&firstfsb, 0, imap, &nimaps, &free_list);
+				&firstfsb, resblks, imap, &nimaps, &free_list);
 	if (error)
 		goto out_bmap_cancel;
 
@@ -762,7 +762,7 @@ xfs_iomap_write_allocate(
 			error = xfs_bmapi_write(tp, ip, map_start_fsb,
 						count_fsb,
 						XFS_BMAPI_STACK_SWITCH,
-						&first_block, 1,
+						&first_block, nres,
 						imap, &nimaps, &free_list);
 			if (error)
 				goto trans_cancel;
@@ -877,8 +877,8 @@ xfs_iomap_write_unwritten(
 		xfs_bmap_init(&free_list, &firstfsb);
 		nimaps = 1;
 		error = xfs_bmapi_write(tp, ip, offset_fsb, count_fsb,
-				  XFS_BMAPI_CONVERT, &firstfsb,
-				  1, &imap, &nimaps, &free_list);
+					XFS_BMAPI_CONVERT, &firstfsb, resblks,
+					&imap, &nimaps, &free_list);
 		if (error)
 			goto error_on_bmapi_transaction;
 
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [patch added to 3.12-stable] mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp
  2017-03-22  9:09 [patch added to 3.12-stable] dm: flush queued bios when process blocks to avoid deadlock Jiri Slaby
  2017-03-22  9:09 ` [patch added to 3.12-stable] xfs: pass total block res. as total xfs_bmapi_write() parameter Jiri Slaby
@ 2017-03-22  9:09 ` Jiri Slaby
  2017-03-28 13:11   ` Michal Hocko
  1 sibling, 1 reply; 12+ messages in thread
From: Jiri Slaby @ 2017-03-22  9:09 UTC (permalink / raw)
  To: stable
  Cc: Keno Fischer, Greg Thelen, Nicholas Piggin, Willy Tarreau,
	Oleg Nesterov, Kees Cook, Andy Lutomirski, Michal Hocko,
	Hugh Dickins, Andrew Morton, Linus Torvalds, Ben Hutchings,
	Jiri Slaby

From: Keno Fischer <keno@juliacomputing.com>

This patch has been added to the 3.12 stable tree. If you have any
objections, please let us know.

===============

commit 8310d48b125d19fcd9521d83b8293e63eb1646aa upstream.

In commit 19be0eaffa3a ("mm: remove gup_flags FOLL_WRITE games from
__get_user_pages()"), the mm code was changed from unsetting FOLL_WRITE
after a COW was resolved to setting the (newly introduced) FOLL_COW
instead.  Simultaneously, the check in gup.c was updated to still allow
writes with FOLL_FORCE set if FOLL_COW had also been set.

However, a similar check in huge_memory.c was forgotten.  As a result,
remote memory writes to ro regions of memory backed by transparent huge
pages cause an infinite loop in the kernel (handle_mm_fault sets
FOLL_COW and returns 0 causing a retry, but follow_trans_huge_pmd bails
out immidiately because `(flags & FOLL_WRITE) && !pmd_write(*pmd)` is
true.

While in this state the process is stil SIGKILLable, but little else
works (e.g.  no ptrace attach, no other signals).  This is easily
reproduced with the following code (assuming thp are set to always):

    #include <assert.h>
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define TEST_SIZE 5 * 1024 * 1024

    int main(void) {
      int status;
      pid_t child;
      int fd = open("/proc/self/mem", O_RDWR);
      void *addr = mmap(NULL, TEST_SIZE, PROT_READ,
                        MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
      assert(addr != MAP_FAILED);
      pid_t parent_pid = getpid();
      if ((child = fork()) == 0) {
        void *addr2 = mmap(NULL, TEST_SIZE, PROT_READ | PROT_WRITE,
                           MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
        assert(addr2 != MAP_FAILED);
        memset(addr2, 'a', TEST_SIZE);
        pwrite(fd, addr2, TEST_SIZE, (uintptr_t)addr);
        return 0;
      }
      assert(child == waitpid(child, &status, 0));
      assert(WIFEXITED(status) && WEXITSTATUS(status) == 0);
      return 0;
    }

Fix this by updating follow_trans_huge_pmd in huge_memory.c analogously
to the update in gup.c in the original commit.  The same pattern exists
in follow_devmap_pmd.  However, we should not be able to reach that
check with FOLL_COW set, so add WARN_ONCE to make sure we notice if we
ever do.

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20170106015025.GA38411@juliacomputing.com
Signed-off-by: Keno Fischer <keno@juliacomputing.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2:
 - Drop change to follow_devmap_pmd()
 - pmd_dirty() is not available; check the page flags as in
   can_follow_write_pte()
 - Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[mhocko:
  This has been forward ported from the 3.2 stable tree.]
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
 mm/huge_memory.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 04535b64119c..5bf6b1576b49 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1222,6 +1222,18 @@ out_unlock:
 	return ret;
 }
 
+/*
+ * foll_force can write to even unwritable pmd's, but only
+ * after we've gone through a cow cycle and they are dirty.
+ */
+static inline bool can_follow_write_pmd(pmd_t pmd, struct page *page,
+					unsigned int flags)
+{
+	return pmd_write(pmd) ||
+		((flags & FOLL_FORCE) && (flags & FOLL_COW) &&
+		 page && PageAnon(page));
+}
+
 struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
 				   unsigned long addr,
 				   pmd_t *pmd,
@@ -1232,9 +1244,6 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
 
 	assert_spin_locked(&mm->page_table_lock);
 
-	if (flags & FOLL_WRITE && !pmd_write(*pmd))
-		goto out;
-
 	/* Avoid dumping huge zero page */
 	if ((flags & FOLL_DUMP) && is_huge_zero_pmd(*pmd))
 		return ERR_PTR(-EFAULT);
@@ -1245,6 +1254,10 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
 
 	page = pmd_page(*pmd);
 	VM_BUG_ON(!PageHead(page));
+
+	if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, page, flags))
+		goto out;
+
 	if (flags & FOLL_TOUCH) {
 		pmd_t _pmd;
 		/*
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [patch added to 3.12-stable] mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp
  2017-03-22  9:09 ` [patch added to 3.12-stable] mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp Jiri Slaby
@ 2017-03-28 13:11   ` Michal Hocko
  2017-03-28 13:23     ` Michal Hocko
  2017-03-31 19:58     ` Ben Hutchings
  0 siblings, 2 replies; 12+ messages in thread
From: Michal Hocko @ 2017-03-28 13:11 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: stable, Keno Fischer, Greg Thelen, Nicholas Piggin,
	Willy Tarreau, Oleg Nesterov, Kees Cook, Andy Lutomirski,
	Hugh Dickins, Andrew Morton, Linus Torvalds, Ben Hutchings

On Wed 22-03-17 10:09:43, Jiri Slaby wrote:
[...]
> @@ -1245,6 +1254,10 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
>  
>  	page = pmd_page(*pmd);
>  	VM_BUG_ON(!PageHead(page));
> +
> +	if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, page, flags))
> +		goto out;
> +
>  	if (flags & FOLL_TOUCH) {
>  		pmd_t _pmd;
>  		/*

I have just noticed that this patch is not correct fo 3.12 because we
should return NULL rather than the page in this case. 3.2 is wrong as
well AFAICS.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [patch added to 3.12-stable] mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp
  2017-03-28 13:11   ` Michal Hocko
@ 2017-03-28 13:23     ` Michal Hocko
  2017-03-28 13:55       ` Jiri Slaby
                         ` (2 more replies)
  2017-03-31 19:58     ` Ben Hutchings
  1 sibling, 3 replies; 12+ messages in thread
From: Michal Hocko @ 2017-03-28 13:23 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: stable, Keno Fischer, Greg Thelen, Nicholas Piggin,
	Willy Tarreau, Oleg Nesterov, Kees Cook, Andy Lutomirski,
	Hugh Dickins, Andrew Morton, Linus Torvalds, Ben Hutchings,
	Miroslav Benes

On Tue 28-03-17 15:11:54, Michal Hocko wrote:
> On Wed 22-03-17 10:09:43, Jiri Slaby wrote:
> [...]
> > @@ -1245,6 +1254,10 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
> >  
> >  	page = pmd_page(*pmd);
> >  	VM_BUG_ON(!PageHead(page));
> > +
> > +	if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, page, flags))
> > +		goto out;
> > +
> >  	if (flags & FOLL_TOUCH) {
> >  		pmd_t _pmd;
> >  		/*
> 
> I have just noticed that this patch is not correct fo 3.12 because we
> should return NULL rather than the page in this case. 3.2 is wrong as
> well AFAICS.

The following should be applied on both 3.2 and 3.12 kernels.
---
>From a245c2791db389d98e1f3c77b6734b1870b7a15c Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Tue, 28 Mar 2017 15:17:26 +0200
Subject: [PATCH] mm/huge_memory.c: fix up "mm/huge_memory.c: respect 
 FOLL_FORCE/FOLL_COW for thp" backport
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This is a stable follow up fix for an incorrect backport. The issue is
not present in the upstream kernel.

Miroslav has noticed the following splat when testing my 3.2 forward
port of 8310d48b125d ("mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for
thp") to 3.12:

BUG: Bad page state in process a.out  pfn:26400
page:ffffea000085e000 count:0 mapcount:1 mapping:          (null) index:0x7f049d600
page flags: 0x1fffff80108018(uptodate|dirty|head|swapbacked)
page dumped because: nonzero mapcount
[iii]
CPU: 2 PID: 5926 Comm: a.out Tainted: G            E    3.12.61-0-default #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
 0000000000000000 ffffffff81515830 ffffea000085e000 ffffffff81800ad7
 ffffffff815118a5 ffffea000085e000 0000000000000000 000fffff80000000
 ffffffff81140f18 fff000007c000000 ffffea000085e000 0000000000000009
Call Trace:
 [<ffffffff8100475d>] dump_trace+0x7d/0x2d0
 [<ffffffff81004a44>] show_stack_log_lvl+0x94/0x170
 [<ffffffff81005ce1>] show_stack+0x21/0x50
 [<ffffffff81515830>] dump_stack+0x5d/0x78
 [<ffffffff815118a5>] bad_page.part.67+0xe8/0x102
 [<ffffffff81140f18>] free_pages_prepare+0x198/0x1b0
 [<ffffffff81141275>] __free_pages_ok+0x15/0xd0
 [<ffffffff8116444c>] __access_remote_vm+0x7c/0x1e0
 [<ffffffff81205afb>] mem_rw.isra.13+0x14b/0x1a0
 [<ffffffff811a3b18>] vfs_write+0xb8/0x1e0
 [<ffffffff811a469b>] SyS_pwrite64+0x6b/0xa0
 [<ffffffff81523b49>] system_call_fastpath+0x16/0x1b
 [<00007f049da18573>] 0x7f049da18572

The problem is that the original 3.2 backport didn't return NULL page on
the FOLL_COW page and so the page got reused.

Reported-and-tested-by: Miroslav Beneš <mbenes@suse.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/huge_memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 998efcee7201..d6e6cafdb2c9 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -989,7 +989,7 @@ struct page *follow_trans_huge_pmd(struct mm_struct *mm,
 	VM_BUG_ON(!PageHead(page));
 
 	if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, page, flags))
-		goto out;
+		return NULL;
 
 	if (flags & FOLL_TOUCH) {
 		pmd_t _pmd;
-- 
2.11.0

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [patch added to 3.12-stable] mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp
  2017-03-28 13:23     ` Michal Hocko
@ 2017-03-28 13:55       ` Jiri Slaby
  2017-03-30 15:24       ` Michal Hocko
  2017-03-31 20:07       ` Ben Hutchings
  2 siblings, 0 replies; 12+ messages in thread
From: Jiri Slaby @ 2017-03-28 13:55 UTC (permalink / raw)
  To: Michal Hocko
  Cc: stable, Keno Fischer, Greg Thelen, Nicholas Piggin,
	Willy Tarreau, Oleg Nesterov, Kees Cook, Andy Lutomirski,
	Hugh Dickins, Andrew Morton, Linus Torvalds, Ben Hutchings,
	Miroslav Benes

On 03/28/2017, 03:23 PM, Michal Hocko wrote:
> On Tue 28-03-17 15:11:54, Michal Hocko wrote:
>> On Wed 22-03-17 10:09:43, Jiri Slaby wrote:
>> [...]
>>> @@ -1245,6 +1254,10 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
>>>  
>>>  	page = pmd_page(*pmd);
>>>  	VM_BUG_ON(!PageHead(page));
>>> +
>>> +	if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, page, flags))
>>> +		goto out;
>>> +
>>>  	if (flags & FOLL_TOUCH) {
>>>  		pmd_t _pmd;
>>>  		/*
>>
>> I have just noticed that this patch is not correct fo 3.12 because we
>> should return NULL rather than the page in this case. 3.2 is wrong as
>> well AFAICS.
> 
> The following should be applied on both 3.2 and 3.12 kernels.
> ---
> From a245c2791db389d98e1f3c77b6734b1870b7a15c Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Tue, 28 Mar 2017 15:17:26 +0200
> Subject: [PATCH] mm/huge_memory.c: fix up "mm/huge_memory.c: respect 
>  FOLL_FORCE/FOLL_COW for thp" backport
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> This is a stable follow up fix for an incorrect backport. The issue is
> not present in the upstream kernel.
> 
> Miroslav has noticed the following splat when testing my 3.2 forward
> port of 8310d48b125d ("mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for
> thp") to 3.12:
> 
> BUG: Bad page state in process a.out  pfn:26400
> page:ffffea000085e000 count:0 mapcount:1 mapping:          (null) index:0x7f049d600
> page flags: 0x1fffff80108018(uptodate|dirty|head|swapbacked)
> page dumped because: nonzero mapcount
> [iii]
> CPU: 2 PID: 5926 Comm: a.out Tainted: G            E    3.12.61-0-default #1
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
>  0000000000000000 ffffffff81515830 ffffea000085e000 ffffffff81800ad7
>  ffffffff815118a5 ffffea000085e000 0000000000000000 000fffff80000000
>  ffffffff81140f18 fff000007c000000 ffffea000085e000 0000000000000009
> Call Trace:
>  [<ffffffff8100475d>] dump_trace+0x7d/0x2d0
>  [<ffffffff81004a44>] show_stack_log_lvl+0x94/0x170
>  [<ffffffff81005ce1>] show_stack+0x21/0x50
>  [<ffffffff81515830>] dump_stack+0x5d/0x78
>  [<ffffffff815118a5>] bad_page.part.67+0xe8/0x102
>  [<ffffffff81140f18>] free_pages_prepare+0x198/0x1b0
>  [<ffffffff81141275>] __free_pages_ok+0x15/0xd0
>  [<ffffffff8116444c>] __access_remote_vm+0x7c/0x1e0
>  [<ffffffff81205afb>] mem_rw.isra.13+0x14b/0x1a0
>  [<ffffffff811a3b18>] vfs_write+0xb8/0x1e0
>  [<ffffffff811a469b>] SyS_pwrite64+0x6b/0xa0
>  [<ffffffff81523b49>] system_call_fastpath+0x16/0x1b
>  [<00007f049da18573>] 0x7f049da18572
> 
> The problem is that the original 3.2 backport didn't return NULL page on
> the FOLL_COW page and so the page got reused.
> 
> Reported-and-tested-by: Miroslav Beneš <mbenes@suse.com>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/huge_memory.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 998efcee7201..d6e6cafdb2c9 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -989,7 +989,7 @@ struct page *follow_trans_huge_pmd(struct mm_struct *mm,
>  	VM_BUG_ON(!PageHead(page));
>  
>  	if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, page, flags))
> -		goto out;
> +		return NULL;

Thanks, squashed into the original commit given the kernel with the
bogus patch was not released yet.

-- 
js
suse labs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [patch added to 3.12-stable] mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp
  2017-03-28 13:23     ` Michal Hocko
  2017-03-28 13:55       ` Jiri Slaby
@ 2017-03-30 15:24       ` Michal Hocko
  2017-03-31 20:07       ` Ben Hutchings
  2 siblings, 0 replies; 12+ messages in thread
From: Michal Hocko @ 2017-03-30 15:24 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: stable, Keno Fischer, Greg Thelen, Nicholas Piggin,
	Willy Tarreau, Oleg Nesterov, Kees Cook, Andy Lutomirski,
	Hugh Dickins, Andrew Morton, Linus Torvalds, Jiri Slaby,
	Miroslav Benes

ping Ben

On Tue 28-03-17 15:23:26, Michal Hocko wrote:
[...]
> From a245c2791db389d98e1f3c77b6734b1870b7a15c Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Tue, 28 Mar 2017 15:17:26 +0200
> Subject: [PATCH] mm/huge_memory.c: fix up "mm/huge_memory.c: respect 
>  FOLL_FORCE/FOLL_COW for thp" backport
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> This is a stable follow up fix for an incorrect backport. The issue is
> not present in the upstream kernel.
> 
> Miroslav has noticed the following splat when testing my 3.2 forward
> port of 8310d48b125d ("mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for
> thp") to 3.12:
> 
> BUG: Bad page state in process a.out  pfn:26400
> page:ffffea000085e000 count:0 mapcount:1 mapping:          (null) index:0x7f049d600
> page flags: 0x1fffff80108018(uptodate|dirty|head|swapbacked)
> page dumped because: nonzero mapcount
> [iii]
> CPU: 2 PID: 5926 Comm: a.out Tainted: G            E    3.12.61-0-default #1
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
>  0000000000000000 ffffffff81515830 ffffea000085e000 ffffffff81800ad7
>  ffffffff815118a5 ffffea000085e000 0000000000000000 000fffff80000000
>  ffffffff81140f18 fff000007c000000 ffffea000085e000 0000000000000009
> Call Trace:
>  [<ffffffff8100475d>] dump_trace+0x7d/0x2d0
>  [<ffffffff81004a44>] show_stack_log_lvl+0x94/0x170
>  [<ffffffff81005ce1>] show_stack+0x21/0x50
>  [<ffffffff81515830>] dump_stack+0x5d/0x78
>  [<ffffffff815118a5>] bad_page.part.67+0xe8/0x102
>  [<ffffffff81140f18>] free_pages_prepare+0x198/0x1b0
>  [<ffffffff81141275>] __free_pages_ok+0x15/0xd0
>  [<ffffffff8116444c>] __access_remote_vm+0x7c/0x1e0
>  [<ffffffff81205afb>] mem_rw.isra.13+0x14b/0x1a0
>  [<ffffffff811a3b18>] vfs_write+0xb8/0x1e0
>  [<ffffffff811a469b>] SyS_pwrite64+0x6b/0xa0
>  [<ffffffff81523b49>] system_call_fastpath+0x16/0x1b
>  [<00007f049da18573>] 0x7f049da18572
> 
> The problem is that the original 3.2 backport didn't return NULL page on
> the FOLL_COW page and so the page got reused.
> 
> Reported-and-tested-by: Miroslav Beneš <mbenes@suse.com>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/huge_memory.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 998efcee7201..d6e6cafdb2c9 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -989,7 +989,7 @@ struct page *follow_trans_huge_pmd(struct mm_struct *mm,
>  	VM_BUG_ON(!PageHead(page));
>  
>  	if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, page, flags))
> -		goto out;
> +		return NULL;
>  
>  	if (flags & FOLL_TOUCH) {
>  		pmd_t _pmd;
> -- 
> 2.11.0
> 
> -- 
> Michal Hocko
> SUSE Labs

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [patch added to 3.12-stable] mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp
  2017-03-28 13:11   ` Michal Hocko
  2017-03-28 13:23     ` Michal Hocko
@ 2017-03-31 19:58     ` Ben Hutchings
  2017-04-03  7:28       ` Michal Hocko
  1 sibling, 1 reply; 12+ messages in thread
From: Ben Hutchings @ 2017-03-31 19:58 UTC (permalink / raw)
  To: Michal Hocko, Jiri Slaby
  Cc: stable, Keno Fischer, Greg Thelen, Nicholas Piggin,
	Willy Tarreau, Oleg Nesterov, Kees Cook, Andy Lutomirski,
	Hugh Dickins, Andrew Morton, Linus Torvalds

[-- Attachment #1: Type: text/plain, Size: 910 bytes --]

On Tue, 2017-03-28 at 15:11 +0200, Michal Hocko wrote:
> On Wed 22-03-17 10:09:43, Jiri Slaby wrote:
> [...]
> > @@ -1245,6 +1254,10 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
> >  
> >  	page = pmd_page(*pmd);
> >  	VM_BUG_ON(!PageHead(page));
> > +
> > +	if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, page, flags))
> > +		goto out;
> > +
> >  	if (flags & FOLL_TOUCH) {
> > 		pmd_t _pmd;
> >  		/*
> 
> I have just noticed that this patch is not correct fo 3.12 because we
> should return NULL rather than the page in this case. 3.2 is wrong as
> well AFAICS.

Thanks for reporting this.  This is the same mistake I made initially
with follow_page() in 3.2.  But I had a test case which caught that
before release, and I don't have a test case for this.

Ben.

-- 
Ben Hutchings
To err is human; to really foul things up requires a computer.


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [patch added to 3.12-stable] mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp
  2017-03-28 13:23     ` Michal Hocko
  2017-03-28 13:55       ` Jiri Slaby
  2017-03-30 15:24       ` Michal Hocko
@ 2017-03-31 20:07       ` Ben Hutchings
  2 siblings, 0 replies; 12+ messages in thread
From: Ben Hutchings @ 2017-03-31 20:07 UTC (permalink / raw)
  To: Michal Hocko, Jiri Slaby
  Cc: stable, Keno Fischer, Greg Thelen, Nicholas Piggin,
	Willy Tarreau, Oleg Nesterov, Kees Cook, Andy Lutomirski,
	Hugh Dickins, Andrew Morton, Linus Torvalds, Miroslav Benes

[-- Attachment #1: Type: text/plain, Size: 917 bytes --]

On Tue, 2017-03-28 at 15:23 +0200, Michal Hocko wrote:
> On Tue 28-03-17 15:11:54, Michal Hocko wrote:
> > On Wed 22-03-17 10:09:43, Jiri Slaby wrote:
> > [...]
> > > @@ -1245,6 +1254,10 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
> > >  
> > >  	page = pmd_page(*pmd);
> > >  	VM_BUG_ON(!PageHead(page));
> > > +
> > > +	if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, page, flags))
> > > +		goto out;
> > > +
> > >  	if (flags & FOLL_TOUCH) {
> > > 		pmd_t _pmd;
> > >  		/*
> > 
> > I have just noticed that this patch is not correct fo 3.12 because we
> > should return NULL rather than the page in this case. 3.2 is wrong as
> > well AFAICS.
> 
> The following should be applied on both 3.2 and 3.12 kernels.
[...]

Thanks again; I've queued this up for 3.2.

Ben.

-- 
Ben Hutchings
To err is human; to really foul things up requires a computer.


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [patch added to 3.12-stable] mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp
  2017-03-31 19:58     ` Ben Hutchings
@ 2017-04-03  7:28       ` Michal Hocko
  2017-04-03  9:08         ` Miroslav Benes
  2017-04-10 18:14         ` Ben Hutchings
  0 siblings, 2 replies; 12+ messages in thread
From: Michal Hocko @ 2017-04-03  7:28 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Jiri Slaby, stable, Keno Fischer, Greg Thelen, Nicholas Piggin,
	Willy Tarreau, Oleg Nesterov, Kees Cook, Andy Lutomirski,
	Hugh Dickins, Andrew Morton, Linus Torvalds, Miroslav Benes

On Fri 31-03-17 20:58:16, Ben Hutchings wrote:
> On Tue, 2017-03-28 at 15:11 +0200, Michal Hocko wrote:
> > On Wed 22-03-17 10:09:43, Jiri Slaby wrote:
> > [...]
> > > @@ -1245,6 +1254,10 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
> > > �
> > > �	page = pmd_page(*pmd);
> > > �	VM_BUG_ON(!PageHead(page));
> > > +
> > > +	if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, page, flags))
> > > +		goto out;
> > > +
> > > �	if (flags & FOLL_TOUCH) {
> > > 		pmd_t _pmd;
> > > �		/*
> > 
> > I have just noticed that this patch is not correct fo 3.12 because we
> > should return NULL rather than the page in this case. 3.2 is wrong as
> > well AFAICS.
> 
> Thanks for reporting this.  This is the same mistake I made initially
> with follow_page() in 3.2.  But I had a test case which caught that
> before release, and I don't have a test case for this.

I believe Miroslav has used the test case embeded in the patch
description to catch the bug.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [patch added to 3.12-stable] mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp
  2017-04-03  7:28       ` Michal Hocko
@ 2017-04-03  9:08         ` Miroslav Benes
  2017-04-10 18:14         ` Ben Hutchings
  1 sibling, 0 replies; 12+ messages in thread
From: Miroslav Benes @ 2017-04-03  9:08 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Ben Hutchings, Jiri Slaby, stable, Keno Fischer, Greg Thelen,
	Nicholas Piggin, Willy Tarreau, Oleg Nesterov, Kees Cook,
	Andy Lutomirski, Hugh Dickins, Andrew Morton, Linus Torvalds

[-- Attachment #1: Type: text/plain, Size: 1102 bytes --]

On Mon, 3 Apr 2017, Michal Hocko wrote:

> On Fri 31-03-17 20:58:16, Ben Hutchings wrote:
> > On Tue, 2017-03-28 at 15:11 +0200, Michal Hocko wrote:
> > > On Wed 22-03-17 10:09:43, Jiri Slaby wrote:
> > > [...]
> > > > @@ -1245,6 +1254,10 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
> > > > �
> > > > �	page = pmd_page(*pmd);
> > > > �	VM_BUG_ON(!PageHead(page));
> > > > +
> > > > +	if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, page, flags))
> > > > +		goto out;
> > > > +
> > > > �	if (flags & FOLL_TOUCH) {
> > > > 		pmd_t _pmd;
> > > > �		/*
> > > 
> > > I have just noticed that this patch is not correct fo 3.12 because we
> > > should return NULL rather than the page in this case. 3.2 is wrong as
> > > well AFAICS.
> > 
> > Thanks for reporting this.  This is the same mistake I made initially
> > with follow_page() in 3.2.  But I had a test case which caught that
> > before release, and I don't have a test case for this.
> 
> I believe Miroslav has used the test case embeded in the patch
> description to catch the bug.

That is correct.

Miroslav

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [patch added to 3.12-stable] mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp
  2017-04-03  7:28       ` Michal Hocko
  2017-04-03  9:08         ` Miroslav Benes
@ 2017-04-10 18:14         ` Ben Hutchings
  1 sibling, 0 replies; 12+ messages in thread
From: Ben Hutchings @ 2017-04-10 18:14 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Jiri Slaby, stable, Keno Fischer, Greg Thelen, Nicholas Piggin,
	Willy Tarreau, Oleg Nesterov, Kees Cook, Andy Lutomirski,
	Hugh Dickins, Andrew Morton, Linus Torvalds, Miroslav Benes

[-- Attachment #1: Type: text/plain, Size: 1242 bytes --]

On Mon, 2017-04-03 at 09:28 +0200, Michal Hocko wrote:
> On Fri 31-03-17 20:58:16, Ben Hutchings wrote:
> > On Tue, 2017-03-28 at 15:11 +0200, Michal Hocko wrote:
> > > On Wed 22-03-17 10:09:43, Jiri Slaby wrote:
> > > [...]
> > > > @@ -1245,6 +1254,10 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
> > > >  
> > > >  	page = pmd_page(*pmd);
> > > >  	VM_BUG_ON(!PageHead(page));
> > > > +
> > > > +	if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, page, flags))
> > > > +		goto out;
> > > > +
> > > >  	if (flags & FOLL_TOUCH) {
> > > > 		pmd_t _pmd;
> > > >  		/*
> > > 
> > > I have just noticed that this patch is not correct fo 3.12 because we
> > > should return NULL rather than the page in this case. 3.2 is wrong as
> > > well AFAICS.
> > 
> > Thanks for reporting this.  This is the same mistake I made initially
> > with follow_page() in 3.2.  But I had a test case which caught that
> > before release, and I don't have a test case for this.
> 
> I believe Miroslav has used the test case embeded in the patch
> description to catch the bug.

Well then I have no excuse for screwing this up. :-/

Ben.

-- 
Ben Hutchings
73.46% of all statistics are made up.


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2017-04-10 18:14 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-22  9:09 [patch added to 3.12-stable] dm: flush queued bios when process blocks to avoid deadlock Jiri Slaby
2017-03-22  9:09 ` [patch added to 3.12-stable] xfs: pass total block res. as total xfs_bmapi_write() parameter Jiri Slaby
2017-03-22  9:09 ` [patch added to 3.12-stable] mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp Jiri Slaby
2017-03-28 13:11   ` Michal Hocko
2017-03-28 13:23     ` Michal Hocko
2017-03-28 13:55       ` Jiri Slaby
2017-03-30 15:24       ` Michal Hocko
2017-03-31 20:07       ` Ben Hutchings
2017-03-31 19:58     ` Ben Hutchings
2017-04-03  7:28       ` Michal Hocko
2017-04-03  9:08         ` Miroslav Benes
2017-04-10 18:14         ` Ben Hutchings

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.