* [patch added to 3.12-stable] dm: flush queued bios when process blocks to avoid deadlock @ 2017-03-22 9:09 Jiri Slaby 2017-03-22 9:09 ` [patch added to 3.12-stable] xfs: pass total block res. as total xfs_bmapi_write() parameter Jiri Slaby 2017-03-22 9:09 ` [patch added to 3.12-stable] mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp Jiri Slaby 0 siblings, 2 replies; 12+ messages in thread From: Jiri Slaby @ 2017-03-22 9:09 UTC (permalink / raw) To: stable; +Cc: Mikulas Patocka, Mike Snitzer, Jiri Slaby From: Mikulas Patocka <mpatocka@redhat.com> This patch has been added to the 3.12 stable tree. If you have any objections, please let us know. =============== commit d67a5f4b5947aba4bfe9a80a2b86079c215ca755 upstream. Commit df2cb6daa4 ("block: Avoid deadlocks with bio allocation by stacking drivers") created a workqueue for every bio set and code in bio_alloc_bioset() that tries to resolve some low-memory deadlocks by redirecting bios queued on current->bio_list to the workqueue if the system is low on memory. However other deadlocks (see below **) may happen, without any low memory condition, because generic_make_request is queuing bios to current->bio_list (rather than submitting them). ** the related dm-snapshot deadlock is detailed here: https://www.redhat.com/archives/dm-devel/2016-July/msg00065.html Fix this deadlock by redirecting any bios on current->bio_list to the bio_set's rescue workqueue on every schedule() call. Consequently, when the process blocks on a mutex, the bios queued on current->bio_list are dispatched to independent workqueus and they can complete without waiting for the mutex to be available. The structure blk_plug contains an entry cb_list and this list can contain arbitrary callback functions that are called when the process blocks. To implement this fix DM (ab)uses the onstack plug's cb_list interface to get its flush_current_bio_list() called at schedule() time. This fixes the snapshot deadlock - if the map method blocks, flush_current_bio_list() will be called and it redirects bios waiting on current->bio_list to appropriate workqueues. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1267650 Depends-on: df2cb6daa4 ("block: Avoid deadlocks with bio allocation by stacking drivers") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz> --- drivers/md/dm.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 55 insertions(+) diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 8c82835a4749..fafb82f383df 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -1075,11 +1075,62 @@ int dm_set_target_max_io_len(struct dm_target *ti, sector_t len) } EXPORT_SYMBOL_GPL(dm_set_target_max_io_len); +/* + * Flush current->bio_list when the target map method blocks. + * This fixes deadlocks in snapshot and possibly in other targets. + */ +struct dm_offload { + struct blk_plug plug; + struct blk_plug_cb cb; +}; + +static void flush_current_bio_list(struct blk_plug_cb *cb, bool from_schedule) +{ + struct dm_offload *o = container_of(cb, struct dm_offload, cb); + struct bio_list list; + struct bio *bio; + + INIT_LIST_HEAD(&o->cb.list); + + if (unlikely(!current->bio_list)) + return; + + list = *current->bio_list; + bio_list_init(current->bio_list); + + while ((bio = bio_list_pop(&list))) { + struct bio_set *bs = bio->bi_pool; + if (unlikely(!bs) || bs == fs_bio_set) { + bio_list_add(current->bio_list, bio); + continue; + } + + spin_lock(&bs->rescue_lock); + bio_list_add(&bs->rescue_list, bio); + queue_work(bs->rescue_workqueue, &bs->rescue_work); + spin_unlock(&bs->rescue_lock); + } +} + +static void dm_offload_start(struct dm_offload *o) +{ + blk_start_plug(&o->plug); + o->cb.callback = flush_current_bio_list; + list_add(&o->cb.list, ¤t->plug->cb_list); +} + +static void dm_offload_end(struct dm_offload *o) +{ + list_del(&o->cb.list); + blk_finish_plug(&o->plug); +} + static void __map_bio(struct dm_target_io *tio) { int r; sector_t sector; struct mapped_device *md; + struct dm_offload o; struct bio *clone = &tio->clone; struct dm_target *ti = tio->ti; @@ -1093,7 +1144,11 @@ static void __map_bio(struct dm_target_io *tio) */ atomic_inc(&tio->io->io_count); sector = clone->bi_sector; + + dm_offload_start(&o); r = ti->type->map(ti, clone); + dm_offload_end(&o); + if (r == DM_MAPIO_REMAPPED) { /* the bio has been remapped so dispatch it */ -- 2.12.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [patch added to 3.12-stable] xfs: pass total block res. as total xfs_bmapi_write() parameter 2017-03-22 9:09 [patch added to 3.12-stable] dm: flush queued bios when process blocks to avoid deadlock Jiri Slaby @ 2017-03-22 9:09 ` Jiri Slaby 2017-03-22 9:09 ` [patch added to 3.12-stable] mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp Jiri Slaby 1 sibling, 0 replies; 12+ messages in thread From: Jiri Slaby @ 2017-03-22 9:09 UTC (permalink / raw) To: stable; +Cc: Brian Foster, Dave Chinner, Nikolay Borisov, Jiri Slaby From: Brian Foster <bfoster@redhat.com> This patch has been added to the 3.12 stable tree. If you have any objections, please let us know. =============== commit dbd5c8c9a28899c6ca719eb21afc0afba9dd5574 upstream. The total field from struct xfs_alloc_arg is a bit of an unknown commodity. It is documented as the total block requirement for the transaction and is used in this manner from most call sites by virtue of passing the total block reservation of the transaction associated with an allocation. Several xfs_bmapi_write() callers pass hardcoded values of 0 or 1 for the total block requirement, which is a historical oddity without any clear reasoning. The xfs_iomap_write_direct() caller, for example, passes 0 for the total block requirement. This has been determined to cause problems in the form of ABBA deadlocks of AGF buffers due to incorrect AG selection in the block allocator. Specifically, the xfs_alloc_space_available() function incorrectly selects an AG that doesn't actually have sufficient space for the allocation. This occurs because the args.total field is 0 and thus the remaining free space check on the AG doesn't actually consider the size of the allocation request. This locks the AGF buffer, the allocation attempt proceeds and ultimately fails (in xfs_alloc_fix_minleft()), and xfs_alloc_vexent() moves on to the next AG. In turn, this can lead to incorrect AG locking order (if the allocator wraps around, attempting to lock AG 0 after acquiring AG N) and thus deadlock if racing with another operation. This problem has been reproduced via generic/299 on smallish (1GB) ramdisk test devices. To avoid this problem, replace the undocumented hardcoded total parameters from the iomap and utility callers to pass the block reservation used for the associated transaction. This is consistent with other xfs_bmapi_write() callers throughout XFS. The assumption is that the total field allows the selection of an AG that can handle the entire operation rather than simply the allocation/range being requested (e.g., resulting btree splits, etc.). This addresses the aforementioned generic/299 hang by ensuring AG selection only occurs when the allocation can be satisfied by the AG. [nb] backport to 3.12 Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Acked-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz> --- fs/xfs/xfs_bmap_util.c | 2 +- fs/xfs/xfs_iomap.c | 8 ++++---- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index 97f952caea74..42cb2f3ea51f 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -1100,7 +1100,7 @@ xfs_alloc_file_space( xfs_bmap_init(&free_list, &firstfsb); error = xfs_bmapi_write(tp, ip, startoffset_fsb, allocatesize_fsb, alloc_type, &firstfsb, - 0, imapp, &nimaps, &free_list); + resblks, imapp, &nimaps, &free_list); if (error) { goto error0; } diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index 8d4d49b6fbf3..1d48f7a9b63e 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -217,7 +217,7 @@ xfs_iomap_write_direct( xfs_bmap_init(&free_list, &firstfsb); nimaps = 1; error = xfs_bmapi_write(tp, ip, offset_fsb, count_fsb, bmapi_flag, - &firstfsb, 0, imap, &nimaps, &free_list); + &firstfsb, resblks, imap, &nimaps, &free_list); if (error) goto out_bmap_cancel; @@ -762,7 +762,7 @@ xfs_iomap_write_allocate( error = xfs_bmapi_write(tp, ip, map_start_fsb, count_fsb, XFS_BMAPI_STACK_SWITCH, - &first_block, 1, + &first_block, nres, imap, &nimaps, &free_list); if (error) goto trans_cancel; @@ -877,8 +877,8 @@ xfs_iomap_write_unwritten( xfs_bmap_init(&free_list, &firstfsb); nimaps = 1; error = xfs_bmapi_write(tp, ip, offset_fsb, count_fsb, - XFS_BMAPI_CONVERT, &firstfsb, - 1, &imap, &nimaps, &free_list); + XFS_BMAPI_CONVERT, &firstfsb, resblks, + &imap, &nimaps, &free_list); if (error) goto error_on_bmapi_transaction; -- 2.12.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [patch added to 3.12-stable] mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp 2017-03-22 9:09 [patch added to 3.12-stable] dm: flush queued bios when process blocks to avoid deadlock Jiri Slaby 2017-03-22 9:09 ` [patch added to 3.12-stable] xfs: pass total block res. as total xfs_bmapi_write() parameter Jiri Slaby @ 2017-03-22 9:09 ` Jiri Slaby 2017-03-28 13:11 ` Michal Hocko 1 sibling, 1 reply; 12+ messages in thread From: Jiri Slaby @ 2017-03-22 9:09 UTC (permalink / raw) To: stable Cc: Keno Fischer, Greg Thelen, Nicholas Piggin, Willy Tarreau, Oleg Nesterov, Kees Cook, Andy Lutomirski, Michal Hocko, Hugh Dickins, Andrew Morton, Linus Torvalds, Ben Hutchings, Jiri Slaby From: Keno Fischer <keno@juliacomputing.com> This patch has been added to the 3.12 stable tree. If you have any objections, please let us know. =============== commit 8310d48b125d19fcd9521d83b8293e63eb1646aa upstream. In commit 19be0eaffa3a ("mm: remove gup_flags FOLL_WRITE games from __get_user_pages()"), the mm code was changed from unsetting FOLL_WRITE after a COW was resolved to setting the (newly introduced) FOLL_COW instead. Simultaneously, the check in gup.c was updated to still allow writes with FOLL_FORCE set if FOLL_COW had also been set. However, a similar check in huge_memory.c was forgotten. As a result, remote memory writes to ro regions of memory backed by transparent huge pages cause an infinite loop in the kernel (handle_mm_fault sets FOLL_COW and returns 0 causing a retry, but follow_trans_huge_pmd bails out immidiately because `(flags & FOLL_WRITE) && !pmd_write(*pmd)` is true. While in this state the process is stil SIGKILLable, but little else works (e.g. no ptrace attach, no other signals). This is easily reproduced with the following code (assuming thp are set to always): #include <assert.h> #include <fcntl.h> #include <stdint.h> #include <stdio.h> #include <string.h> #include <sys/mman.h> #include <sys/stat.h> #include <sys/types.h> #include <sys/wait.h> #include <unistd.h> #define TEST_SIZE 5 * 1024 * 1024 int main(void) { int status; pid_t child; int fd = open("/proc/self/mem", O_RDWR); void *addr = mmap(NULL, TEST_SIZE, PROT_READ, MAP_ANONYMOUS | MAP_PRIVATE, 0, 0); assert(addr != MAP_FAILED); pid_t parent_pid = getpid(); if ((child = fork()) == 0) { void *addr2 = mmap(NULL, TEST_SIZE, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, 0, 0); assert(addr2 != MAP_FAILED); memset(addr2, 'a', TEST_SIZE); pwrite(fd, addr2, TEST_SIZE, (uintptr_t)addr); return 0; } assert(child == waitpid(child, &status, 0)); assert(WIFEXITED(status) && WEXITSTATUS(status) == 0); return 0; } Fix this by updating follow_trans_huge_pmd in huge_memory.c analogously to the update in gup.c in the original commit. The same pattern exists in follow_devmap_pmd. However, we should not be able to reach that check with FOLL_COW set, so add WARN_ONCE to make sure we notice if we ever do. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/20170106015025.GA38411@juliacomputing.com Signed-off-by: Keno Fischer <keno@juliacomputing.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Greg Thelen <gthelen@google.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 3.2: - Drop change to follow_devmap_pmd() - pmd_dirty() is not available; check the page flags as in can_follow_write_pte() - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> [mhocko: This has been forward ported from the 3.2 stable tree.] Reviewed-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Jiri Slaby <jslaby@suse.cz> --- mm/huge_memory.c | 19 ++++++++++++++++--- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 04535b64119c..5bf6b1576b49 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1222,6 +1222,18 @@ out_unlock: return ret; } +/* + * foll_force can write to even unwritable pmd's, but only + * after we've gone through a cow cycle and they are dirty. + */ +static inline bool can_follow_write_pmd(pmd_t pmd, struct page *page, + unsigned int flags) +{ + return pmd_write(pmd) || + ((flags & FOLL_FORCE) && (flags & FOLL_COW) && + page && PageAnon(page)); +} + struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, unsigned long addr, pmd_t *pmd, @@ -1232,9 +1244,6 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, assert_spin_locked(&mm->page_table_lock); - if (flags & FOLL_WRITE && !pmd_write(*pmd)) - goto out; - /* Avoid dumping huge zero page */ if ((flags & FOLL_DUMP) && is_huge_zero_pmd(*pmd)) return ERR_PTR(-EFAULT); @@ -1245,6 +1254,10 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, page = pmd_page(*pmd); VM_BUG_ON(!PageHead(page)); + + if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, page, flags)) + goto out; + if (flags & FOLL_TOUCH) { pmd_t _pmd; /* -- 2.12.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [patch added to 3.12-stable] mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp 2017-03-22 9:09 ` [patch added to 3.12-stable] mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp Jiri Slaby @ 2017-03-28 13:11 ` Michal Hocko 2017-03-28 13:23 ` Michal Hocko 2017-03-31 19:58 ` Ben Hutchings 0 siblings, 2 replies; 12+ messages in thread From: Michal Hocko @ 2017-03-28 13:11 UTC (permalink / raw) To: Jiri Slaby Cc: stable, Keno Fischer, Greg Thelen, Nicholas Piggin, Willy Tarreau, Oleg Nesterov, Kees Cook, Andy Lutomirski, Hugh Dickins, Andrew Morton, Linus Torvalds, Ben Hutchings On Wed 22-03-17 10:09:43, Jiri Slaby wrote: [...] > @@ -1245,6 +1254,10 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, > > page = pmd_page(*pmd); > VM_BUG_ON(!PageHead(page)); > + > + if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, page, flags)) > + goto out; > + > if (flags & FOLL_TOUCH) { > pmd_t _pmd; > /* I have just noticed that this patch is not correct fo 3.12 because we should return NULL rather than the page in this case. 3.2 is wrong as well AFAICS. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [patch added to 3.12-stable] mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp 2017-03-28 13:11 ` Michal Hocko @ 2017-03-28 13:23 ` Michal Hocko 2017-03-28 13:55 ` Jiri Slaby ` (2 more replies) 2017-03-31 19:58 ` Ben Hutchings 1 sibling, 3 replies; 12+ messages in thread From: Michal Hocko @ 2017-03-28 13:23 UTC (permalink / raw) To: Jiri Slaby Cc: stable, Keno Fischer, Greg Thelen, Nicholas Piggin, Willy Tarreau, Oleg Nesterov, Kees Cook, Andy Lutomirski, Hugh Dickins, Andrew Morton, Linus Torvalds, Ben Hutchings, Miroslav Benes On Tue 28-03-17 15:11:54, Michal Hocko wrote: > On Wed 22-03-17 10:09:43, Jiri Slaby wrote: > [...] > > @@ -1245,6 +1254,10 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, > > > > page = pmd_page(*pmd); > > VM_BUG_ON(!PageHead(page)); > > + > > + if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, page, flags)) > > + goto out; > > + > > if (flags & FOLL_TOUCH) { > > pmd_t _pmd; > > /* > > I have just noticed that this patch is not correct fo 3.12 because we > should return NULL rather than the page in this case. 3.2 is wrong as > well AFAICS. The following should be applied on both 3.2 and 3.12 kernels. --- >From a245c2791db389d98e1f3c77b6734b1870b7a15c Mon Sep 17 00:00:00 2001 From: Michal Hocko <mhocko@suse.com> Date: Tue, 28 Mar 2017 15:17:26 +0200 Subject: [PATCH] mm/huge_memory.c: fix up "mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp" backport MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This is a stable follow up fix for an incorrect backport. The issue is not present in the upstream kernel. Miroslav has noticed the following splat when testing my 3.2 forward port of 8310d48b125d ("mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp") to 3.12: BUG: Bad page state in process a.out pfn:26400 page:ffffea000085e000 count:0 mapcount:1 mapping: (null) index:0x7f049d600 page flags: 0x1fffff80108018(uptodate|dirty|head|swapbacked) page dumped because: nonzero mapcount [iii] CPU: 2 PID: 5926 Comm: a.out Tainted: G E 3.12.61-0-default #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 0000000000000000 ffffffff81515830 ffffea000085e000 ffffffff81800ad7 ffffffff815118a5 ffffea000085e000 0000000000000000 000fffff80000000 ffffffff81140f18 fff000007c000000 ffffea000085e000 0000000000000009 Call Trace: [<ffffffff8100475d>] dump_trace+0x7d/0x2d0 [<ffffffff81004a44>] show_stack_log_lvl+0x94/0x170 [<ffffffff81005ce1>] show_stack+0x21/0x50 [<ffffffff81515830>] dump_stack+0x5d/0x78 [<ffffffff815118a5>] bad_page.part.67+0xe8/0x102 [<ffffffff81140f18>] free_pages_prepare+0x198/0x1b0 [<ffffffff81141275>] __free_pages_ok+0x15/0xd0 [<ffffffff8116444c>] __access_remote_vm+0x7c/0x1e0 [<ffffffff81205afb>] mem_rw.isra.13+0x14b/0x1a0 [<ffffffff811a3b18>] vfs_write+0xb8/0x1e0 [<ffffffff811a469b>] SyS_pwrite64+0x6b/0xa0 [<ffffffff81523b49>] system_call_fastpath+0x16/0x1b [<00007f049da18573>] 0x7f049da18572 The problem is that the original 3.2 backport didn't return NULL page on the FOLL_COW page and so the page got reused. Reported-and-tested-by: Miroslav Beneš <mbenes@suse.com> Signed-off-by: Michal Hocko <mhocko@suse.com> --- mm/huge_memory.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 998efcee7201..d6e6cafdb2c9 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -989,7 +989,7 @@ struct page *follow_trans_huge_pmd(struct mm_struct *mm, VM_BUG_ON(!PageHead(page)); if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, page, flags)) - goto out; + return NULL; if (flags & FOLL_TOUCH) { pmd_t _pmd; -- 2.11.0 -- Michal Hocko SUSE Labs ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [patch added to 3.12-stable] mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp 2017-03-28 13:23 ` Michal Hocko @ 2017-03-28 13:55 ` Jiri Slaby 2017-03-30 15:24 ` Michal Hocko 2017-03-31 20:07 ` Ben Hutchings 2 siblings, 0 replies; 12+ messages in thread From: Jiri Slaby @ 2017-03-28 13:55 UTC (permalink / raw) To: Michal Hocko Cc: stable, Keno Fischer, Greg Thelen, Nicholas Piggin, Willy Tarreau, Oleg Nesterov, Kees Cook, Andy Lutomirski, Hugh Dickins, Andrew Morton, Linus Torvalds, Ben Hutchings, Miroslav Benes On 03/28/2017, 03:23 PM, Michal Hocko wrote: > On Tue 28-03-17 15:11:54, Michal Hocko wrote: >> On Wed 22-03-17 10:09:43, Jiri Slaby wrote: >> [...] >>> @@ -1245,6 +1254,10 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, >>> >>> page = pmd_page(*pmd); >>> VM_BUG_ON(!PageHead(page)); >>> + >>> + if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, page, flags)) >>> + goto out; >>> + >>> if (flags & FOLL_TOUCH) { >>> pmd_t _pmd; >>> /* >> >> I have just noticed that this patch is not correct fo 3.12 because we >> should return NULL rather than the page in this case. 3.2 is wrong as >> well AFAICS. > > The following should be applied on both 3.2 and 3.12 kernels. > --- > From a245c2791db389d98e1f3c77b6734b1870b7a15c Mon Sep 17 00:00:00 2001 > From: Michal Hocko <mhocko@suse.com> > Date: Tue, 28 Mar 2017 15:17:26 +0200 > Subject: [PATCH] mm/huge_memory.c: fix up "mm/huge_memory.c: respect > FOLL_FORCE/FOLL_COW for thp" backport > MIME-Version: 1.0 > Content-Type: text/plain; charset=UTF-8 > Content-Transfer-Encoding: 8bit > > This is a stable follow up fix for an incorrect backport. The issue is > not present in the upstream kernel. > > Miroslav has noticed the following splat when testing my 3.2 forward > port of 8310d48b125d ("mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for > thp") to 3.12: > > BUG: Bad page state in process a.out pfn:26400 > page:ffffea000085e000 count:0 mapcount:1 mapping: (null) index:0x7f049d600 > page flags: 0x1fffff80108018(uptodate|dirty|head|swapbacked) > page dumped because: nonzero mapcount > [iii] > CPU: 2 PID: 5926 Comm: a.out Tainted: G E 3.12.61-0-default #1 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 > 0000000000000000 ffffffff81515830 ffffea000085e000 ffffffff81800ad7 > ffffffff815118a5 ffffea000085e000 0000000000000000 000fffff80000000 > ffffffff81140f18 fff000007c000000 ffffea000085e000 0000000000000009 > Call Trace: > [<ffffffff8100475d>] dump_trace+0x7d/0x2d0 > [<ffffffff81004a44>] show_stack_log_lvl+0x94/0x170 > [<ffffffff81005ce1>] show_stack+0x21/0x50 > [<ffffffff81515830>] dump_stack+0x5d/0x78 > [<ffffffff815118a5>] bad_page.part.67+0xe8/0x102 > [<ffffffff81140f18>] free_pages_prepare+0x198/0x1b0 > [<ffffffff81141275>] __free_pages_ok+0x15/0xd0 > [<ffffffff8116444c>] __access_remote_vm+0x7c/0x1e0 > [<ffffffff81205afb>] mem_rw.isra.13+0x14b/0x1a0 > [<ffffffff811a3b18>] vfs_write+0xb8/0x1e0 > [<ffffffff811a469b>] SyS_pwrite64+0x6b/0xa0 > [<ffffffff81523b49>] system_call_fastpath+0x16/0x1b > [<00007f049da18573>] 0x7f049da18572 > > The problem is that the original 3.2 backport didn't return NULL page on > the FOLL_COW page and so the page got reused. > > Reported-and-tested-by: Miroslav Beneš <mbenes@suse.com> > Signed-off-by: Michal Hocko <mhocko@suse.com> > --- > mm/huge_memory.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > index 998efcee7201..d6e6cafdb2c9 100644 > --- a/mm/huge_memory.c > +++ b/mm/huge_memory.c > @@ -989,7 +989,7 @@ struct page *follow_trans_huge_pmd(struct mm_struct *mm, > VM_BUG_ON(!PageHead(page)); > > if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, page, flags)) > - goto out; > + return NULL; Thanks, squashed into the original commit given the kernel with the bogus patch was not released yet. -- js suse labs ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [patch added to 3.12-stable] mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp 2017-03-28 13:23 ` Michal Hocko 2017-03-28 13:55 ` Jiri Slaby @ 2017-03-30 15:24 ` Michal Hocko 2017-03-31 20:07 ` Ben Hutchings 2 siblings, 0 replies; 12+ messages in thread From: Michal Hocko @ 2017-03-30 15:24 UTC (permalink / raw) To: Ben Hutchings Cc: stable, Keno Fischer, Greg Thelen, Nicholas Piggin, Willy Tarreau, Oleg Nesterov, Kees Cook, Andy Lutomirski, Hugh Dickins, Andrew Morton, Linus Torvalds, Jiri Slaby, Miroslav Benes ping Ben On Tue 28-03-17 15:23:26, Michal Hocko wrote: [...] > From a245c2791db389d98e1f3c77b6734b1870b7a15c Mon Sep 17 00:00:00 2001 > From: Michal Hocko <mhocko@suse.com> > Date: Tue, 28 Mar 2017 15:17:26 +0200 > Subject: [PATCH] mm/huge_memory.c: fix up "mm/huge_memory.c: respect > FOLL_FORCE/FOLL_COW for thp" backport > MIME-Version: 1.0 > Content-Type: text/plain; charset=UTF-8 > Content-Transfer-Encoding: 8bit > > This is a stable follow up fix for an incorrect backport. The issue is > not present in the upstream kernel. > > Miroslav has noticed the following splat when testing my 3.2 forward > port of 8310d48b125d ("mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for > thp") to 3.12: > > BUG: Bad page state in process a.out pfn:26400 > page:ffffea000085e000 count:0 mapcount:1 mapping: (null) index:0x7f049d600 > page flags: 0x1fffff80108018(uptodate|dirty|head|swapbacked) > page dumped because: nonzero mapcount > [iii] > CPU: 2 PID: 5926 Comm: a.out Tainted: G E 3.12.61-0-default #1 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 > 0000000000000000 ffffffff81515830 ffffea000085e000 ffffffff81800ad7 > ffffffff815118a5 ffffea000085e000 0000000000000000 000fffff80000000 > ffffffff81140f18 fff000007c000000 ffffea000085e000 0000000000000009 > Call Trace: > [<ffffffff8100475d>] dump_trace+0x7d/0x2d0 > [<ffffffff81004a44>] show_stack_log_lvl+0x94/0x170 > [<ffffffff81005ce1>] show_stack+0x21/0x50 > [<ffffffff81515830>] dump_stack+0x5d/0x78 > [<ffffffff815118a5>] bad_page.part.67+0xe8/0x102 > [<ffffffff81140f18>] free_pages_prepare+0x198/0x1b0 > [<ffffffff81141275>] __free_pages_ok+0x15/0xd0 > [<ffffffff8116444c>] __access_remote_vm+0x7c/0x1e0 > [<ffffffff81205afb>] mem_rw.isra.13+0x14b/0x1a0 > [<ffffffff811a3b18>] vfs_write+0xb8/0x1e0 > [<ffffffff811a469b>] SyS_pwrite64+0x6b/0xa0 > [<ffffffff81523b49>] system_call_fastpath+0x16/0x1b > [<00007f049da18573>] 0x7f049da18572 > > The problem is that the original 3.2 backport didn't return NULL page on > the FOLL_COW page and so the page got reused. > > Reported-and-tested-by: Miroslav Beneš <mbenes@suse.com> > Signed-off-by: Michal Hocko <mhocko@suse.com> > --- > mm/huge_memory.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > index 998efcee7201..d6e6cafdb2c9 100644 > --- a/mm/huge_memory.c > +++ b/mm/huge_memory.c > @@ -989,7 +989,7 @@ struct page *follow_trans_huge_pmd(struct mm_struct *mm, > VM_BUG_ON(!PageHead(page)); > > if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, page, flags)) > - goto out; > + return NULL; > > if (flags & FOLL_TOUCH) { > pmd_t _pmd; > -- > 2.11.0 > > -- > Michal Hocko > SUSE Labs -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [patch added to 3.12-stable] mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp 2017-03-28 13:23 ` Michal Hocko 2017-03-28 13:55 ` Jiri Slaby 2017-03-30 15:24 ` Michal Hocko @ 2017-03-31 20:07 ` Ben Hutchings 2 siblings, 0 replies; 12+ messages in thread From: Ben Hutchings @ 2017-03-31 20:07 UTC (permalink / raw) To: Michal Hocko, Jiri Slaby Cc: stable, Keno Fischer, Greg Thelen, Nicholas Piggin, Willy Tarreau, Oleg Nesterov, Kees Cook, Andy Lutomirski, Hugh Dickins, Andrew Morton, Linus Torvalds, Miroslav Benes [-- Attachment #1: Type: text/plain, Size: 917 bytes --] On Tue, 2017-03-28 at 15:23 +0200, Michal Hocko wrote: > On Tue 28-03-17 15:11:54, Michal Hocko wrote: > > On Wed 22-03-17 10:09:43, Jiri Slaby wrote: > > [...] > > > @@ -1245,6 +1254,10 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, > > > > > > page = pmd_page(*pmd); > > > VM_BUG_ON(!PageHead(page)); > > > + > > > + if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, page, flags)) > > > + goto out; > > > + > > > if (flags & FOLL_TOUCH) { > > > pmd_t _pmd; > > > /* > > > > I have just noticed that this patch is not correct fo 3.12 because we > > should return NULL rather than the page in this case. 3.2 is wrong as > > well AFAICS. > > The following should be applied on both 3.2 and 3.12 kernels. [...] Thanks again; I've queued this up for 3.2. Ben. -- Ben Hutchings To err is human; to really foul things up requires a computer. [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [patch added to 3.12-stable] mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp 2017-03-28 13:11 ` Michal Hocko 2017-03-28 13:23 ` Michal Hocko @ 2017-03-31 19:58 ` Ben Hutchings 2017-04-03 7:28 ` Michal Hocko 1 sibling, 1 reply; 12+ messages in thread From: Ben Hutchings @ 2017-03-31 19:58 UTC (permalink / raw) To: Michal Hocko, Jiri Slaby Cc: stable, Keno Fischer, Greg Thelen, Nicholas Piggin, Willy Tarreau, Oleg Nesterov, Kees Cook, Andy Lutomirski, Hugh Dickins, Andrew Morton, Linus Torvalds [-- Attachment #1: Type: text/plain, Size: 910 bytes --] On Tue, 2017-03-28 at 15:11 +0200, Michal Hocko wrote: > On Wed 22-03-17 10:09:43, Jiri Slaby wrote: > [...] > > @@ -1245,6 +1254,10 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, > > > > page = pmd_page(*pmd); > > VM_BUG_ON(!PageHead(page)); > > + > > + if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, page, flags)) > > + goto out; > > + > > if (flags & FOLL_TOUCH) { > > pmd_t _pmd; > > /* > > I have just noticed that this patch is not correct fo 3.12 because we > should return NULL rather than the page in this case. 3.2 is wrong as > well AFAICS. Thanks for reporting this. This is the same mistake I made initially with follow_page() in 3.2. But I had a test case which caught that before release, and I don't have a test case for this. Ben. -- Ben Hutchings To err is human; to really foul things up requires a computer. [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [patch added to 3.12-stable] mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp 2017-03-31 19:58 ` Ben Hutchings @ 2017-04-03 7:28 ` Michal Hocko 2017-04-03 9:08 ` Miroslav Benes 2017-04-10 18:14 ` Ben Hutchings 0 siblings, 2 replies; 12+ messages in thread From: Michal Hocko @ 2017-04-03 7:28 UTC (permalink / raw) To: Ben Hutchings Cc: Jiri Slaby, stable, Keno Fischer, Greg Thelen, Nicholas Piggin, Willy Tarreau, Oleg Nesterov, Kees Cook, Andy Lutomirski, Hugh Dickins, Andrew Morton, Linus Torvalds, Miroslav Benes On Fri 31-03-17 20:58:16, Ben Hutchings wrote: > On Tue, 2017-03-28 at 15:11 +0200, Michal Hocko wrote: > > On Wed 22-03-17 10:09:43, Jiri Slaby wrote: > > [...] > > > @@ -1245,6 +1254,10 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, > > > � > > > � page = pmd_page(*pmd); > > > � VM_BUG_ON(!PageHead(page)); > > > + > > > + if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, page, flags)) > > > + goto out; > > > + > > > � if (flags & FOLL_TOUCH) { > > > pmd_t _pmd; > > > � /* > > > > I have just noticed that this patch is not correct fo 3.12 because we > > should return NULL rather than the page in this case. 3.2 is wrong as > > well AFAICS. > > Thanks for reporting this. This is the same mistake I made initially > with follow_page() in 3.2. But I had a test case which caught that > before release, and I don't have a test case for this. I believe Miroslav has used the test case embeded in the patch description to catch the bug. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [patch added to 3.12-stable] mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp 2017-04-03 7:28 ` Michal Hocko @ 2017-04-03 9:08 ` Miroslav Benes 2017-04-10 18:14 ` Ben Hutchings 1 sibling, 0 replies; 12+ messages in thread From: Miroslav Benes @ 2017-04-03 9:08 UTC (permalink / raw) To: Michal Hocko Cc: Ben Hutchings, Jiri Slaby, stable, Keno Fischer, Greg Thelen, Nicholas Piggin, Willy Tarreau, Oleg Nesterov, Kees Cook, Andy Lutomirski, Hugh Dickins, Andrew Morton, Linus Torvalds [-- Attachment #1: Type: text/plain, Size: 1102 bytes --] On Mon, 3 Apr 2017, Michal Hocko wrote: > On Fri 31-03-17 20:58:16, Ben Hutchings wrote: > > On Tue, 2017-03-28 at 15:11 +0200, Michal Hocko wrote: > > > On Wed 22-03-17 10:09:43, Jiri Slaby wrote: > > > [...] > > > > @@ -1245,6 +1254,10 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, > > > > � > > > > � page = pmd_page(*pmd); > > > > � VM_BUG_ON(!PageHead(page)); > > > > + > > > > + if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, page, flags)) > > > > + goto out; > > > > + > > > > � if (flags & FOLL_TOUCH) { > > > > pmd_t _pmd; > > > > � /* > > > > > > I have just noticed that this patch is not correct fo 3.12 because we > > > should return NULL rather than the page in this case. 3.2 is wrong as > > > well AFAICS. > > > > Thanks for reporting this. This is the same mistake I made initially > > with follow_page() in 3.2. But I had a test case which caught that > > before release, and I don't have a test case for this. > > I believe Miroslav has used the test case embeded in the patch > description to catch the bug. That is correct. Miroslav ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [patch added to 3.12-stable] mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp 2017-04-03 7:28 ` Michal Hocko 2017-04-03 9:08 ` Miroslav Benes @ 2017-04-10 18:14 ` Ben Hutchings 1 sibling, 0 replies; 12+ messages in thread From: Ben Hutchings @ 2017-04-10 18:14 UTC (permalink / raw) To: Michal Hocko Cc: Jiri Slaby, stable, Keno Fischer, Greg Thelen, Nicholas Piggin, Willy Tarreau, Oleg Nesterov, Kees Cook, Andy Lutomirski, Hugh Dickins, Andrew Morton, Linus Torvalds, Miroslav Benes [-- Attachment #1: Type: text/plain, Size: 1242 bytes --] On Mon, 2017-04-03 at 09:28 +0200, Michal Hocko wrote: > On Fri 31-03-17 20:58:16, Ben Hutchings wrote: > > On Tue, 2017-03-28 at 15:11 +0200, Michal Hocko wrote: > > > On Wed 22-03-17 10:09:43, Jiri Slaby wrote: > > > [...] > > > > @@ -1245,6 +1254,10 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, > > > > > > > > page = pmd_page(*pmd); > > > > VM_BUG_ON(!PageHead(page)); > > > > + > > > > + if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, page, flags)) > > > > + goto out; > > > > + > > > > if (flags & FOLL_TOUCH) { > > > > pmd_t _pmd; > > > > /* > > > > > > I have just noticed that this patch is not correct fo 3.12 because we > > > should return NULL rather than the page in this case. 3.2 is wrong as > > > well AFAICS. > > > > Thanks for reporting this. This is the same mistake I made initially > > with follow_page() in 3.2. But I had a test case which caught that > > before release, and I don't have a test case for this. > > I believe Miroslav has used the test case embeded in the patch > description to catch the bug. Well then I have no excuse for screwing this up. :-/ Ben. -- Ben Hutchings 73.46% of all statistics are made up. [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2017-04-10 18:14 UTC | newest] Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-03-22 9:09 [patch added to 3.12-stable] dm: flush queued bios when process blocks to avoid deadlock Jiri Slaby 2017-03-22 9:09 ` [patch added to 3.12-stable] xfs: pass total block res. as total xfs_bmapi_write() parameter Jiri Slaby 2017-03-22 9:09 ` [patch added to 3.12-stable] mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp Jiri Slaby 2017-03-28 13:11 ` Michal Hocko 2017-03-28 13:23 ` Michal Hocko 2017-03-28 13:55 ` Jiri Slaby 2017-03-30 15:24 ` Michal Hocko 2017-03-31 20:07 ` Ben Hutchings 2017-03-31 19:58 ` Ben Hutchings 2017-04-03 7:28 ` Michal Hocko 2017-04-03 9:08 ` Miroslav Benes 2017-04-10 18:14 ` Ben Hutchings
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.