From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andrew Morton Subject: [patch 004/131] mm/gup: refactor and de-duplicate gup_fast() code Date: Wed, 03 Jun 2020 15:56:30 -0700 Message-ID: <20200603225630.dODblpnlR%akpm@linux-foundation.org> References: <20200603155549.e041363450869eaae4c7f05b@linux-foundation.org> Reply-To: linux-kernel@vger.kernel.org Return-path: Received: from mail.kernel.org ([198.145.29.99]:37738 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725863AbgFCW4d (ORCPT ); Wed, 3 Jun 2020 18:56:33 -0400 In-Reply-To: <20200603155549.e041363450869eaae4c7f05b@linux-foundation.org> Sender: mm-commits-owner@vger.kernel.org List-Id: mm-commits@vger.kernel.org To: airlied@linux.ie, akpm@linux-foundation.org, chris@chris-wilson.co.uk, daniel@ffwll.ch, jani.nikula@linux.intel.com, jhubbard@nvidia.com, joonas.lahtinen@linux.intel.com, jrdr.linux@gmail.com, linux-mm@kvack.org, matthew.auld@intel.com, mm-commits@vger.kernel.org, rodrigo.vivi@intel.com, torvalds@linux-foundation.org, tvrtko.ursulin@intel.com, willy@infradead.org From: John Hubbard Subject: mm/gup: refactor and de-duplicate gup_fast() code There were two nearly identical sets of code for gup_fast() style of walking the page tables with interrupts disabled. This has lead to the usual maintenance problems that arise from having duplicated code. There is already a core internal routine in gup.c for gup_fast(), so just enhance it very slightly: allow skipping the fall-back to "slow" (regular) get_user_pages(), via the new FOLL_FAST_ONLY flag. Then, just call internal_get_user_pages_fast() from __get_user_pages_fast(), and adjust the API to match pre-existing API behavior. There is a change in behavior from this refactoring: the nested form of interrupt disabling is used in all gup_fast() variants now. That's because there is only one place that interrupt disabling for page walking is done, and so the safer form is required. This should, if anything, eliminate possible (rare) bugs, because the non-nested form of enabling interrupts was fragile at best. [jhubbard@nvidia.com: fixup] Link: http://lkml.kernel.org/r/20200521233841.1279742-1-jhubbard@nvidia.com Link: http://lkml.kernel.org/r/20200519002124.2025955-3-jhubbard@nvidia.com Signed-off-by: John Hubbard Reviewed-by: Chris Wilson Cc: Daniel Vetter Cc: David Airlie Cc: Jani Nikula Cc: "Joonas Lahtinen" Cc: Matthew Auld Cc: Matthew Wilcox Cc: Rodrigo Vivi Cc: Souptick Joarder Cc: Tvrtko Ursulin Signed-off-by: Andrew Morton --- include/linux/mm.h | 1 mm/gup.c | 61 ++++++++++++++++++++----------------------- 2 files changed, 30 insertions(+), 32 deletions(-) --- a/include/linux/mm.h~mm-gup-refactor-and-de-duplicate-gup_fast-code +++ a/include/linux/mm.h @@ -2816,6 +2816,7 @@ struct page *follow_page(struct vm_area_ #define FOLL_LONGTERM 0x10000 /* mapping lifetime is indefinite: see below */ #define FOLL_SPLIT_PMD 0x20000 /* split huge pmd before returning */ #define FOLL_PIN 0x40000 /* pages must be released via unpin_user_page */ +#define FOLL_FAST_ONLY 0x80000 /* gup_fast: prevent fall-back to slow gup */ /* * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each --- a/mm/gup.c~mm-gup-refactor-and-de-duplicate-gup_fast-code +++ a/mm/gup.c @@ -2731,10 +2731,12 @@ static int internal_get_user_pages_fast( struct page **pages) { unsigned long addr, len, end; + unsigned long flags; int nr_pinned = 0, ret = 0; if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM | - FOLL_FORCE | FOLL_PIN | FOLL_GET))) + FOLL_FORCE | FOLL_PIN | FOLL_GET | + FOLL_FAST_ONLY))) return -EINVAL; start = untagged_addr(start) & PAGE_MASK; @@ -2753,16 +2755,26 @@ static int internal_get_user_pages_fast( * order to avoid confusing the normal COW routines. So only * targets that are already writable are safe to do by just * looking at the page tables. + * + * Disable interrupts. The nested form is used, in order to allow full, + * general purpose use of this routine. + * + * With interrupts disabled, we block page table pages from being + * freed from under us. See struct mmu_table_batch comments in + * include/asm-generic/tlb.h for more details. + * + * We do not adopt an rcu_read_lock(.) here as we also want to + * block IPIs that come from THPs splitting. */ if (IS_ENABLED(CONFIG_HAVE_FAST_GUP) && gup_fast_permitted(start, end)) { - local_irq_disable(); + local_irq_save(flags); gup_pgd_range(addr, end, gup_flags | FOLL_WRITE, pages, &nr_pinned); - local_irq_enable(); + local_irq_restore(flags); ret = nr_pinned; } - if (nr_pinned < nr_pages) { + if (nr_pinned < nr_pages && !(gup_flags & FOLL_FAST_ONLY)) { /* Try to get the remaining pages with get_user_pages */ start += nr_pinned << PAGE_SHIFT; pages += nr_pinned; @@ -2798,37 +2810,27 @@ static int internal_get_user_pages_fast( int __get_user_pages_fast(unsigned long start, int nr_pages, int write, struct page **pages) { - unsigned long len, end; - unsigned long flags; - int nr_pinned = 0; + int nr_pinned; /* * Internally (within mm/gup.c), gup fast variants must set FOLL_GET, * because gup fast is always a "pin with a +1 page refcount" request. + * + * FOLL_FAST_ONLY is required in order to match the API description of + * this routine: no fall back to regular ("slow") GUP. */ - unsigned int gup_flags = FOLL_GET; + unsigned int gup_flags = FOLL_GET | FOLL_FAST_ONLY; if (write) gup_flags |= FOLL_WRITE; - start = untagged_addr(start) & PAGE_MASK; - len = (unsigned long) nr_pages << PAGE_SHIFT; - end = start + len; - - if (end <= start) - return 0; - if (unlikely(!access_ok((void __user *)start, len))) - return 0; + nr_pinned = internal_get_user_pages_fast(start, nr_pages, gup_flags, + pages); /* - * Disable interrupts. We use the nested form as we can already have - * interrupts disabled by get_futex_key. - * - * With interrupts disabled, we block page table pages from being - * freed from under us. See struct mmu_table_batch comments in - * include/asm-generic/tlb.h for more details. - * - * We do not adopt an rcu_read_lock(.) here as we also want to - * block IPIs that come from THPs splitting. + * As specified in the API description above, this routine is not + * allowed to return negative values. However, the common core + * routine internal_get_user_pages_fast() *can* return -errno. + * Therefore, correct for that here: * * NOTE! We allow read-only gup_fast() here, but you'd better be * careful about possible COW pages. You'll get _a_ COW page, but @@ -2836,13 +2838,8 @@ int __get_user_pages_fast(unsigned long * COW event happens after this. COW may break the page copy in a * random direction. */ From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.8 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 317E4C433E0 for ; Wed, 3 Jun 2020 22:56:34 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id DE6EE20734 for ; Wed, 3 Jun 2020 22:56:33 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=kernel.org header.i=@kernel.org header.b="aJnboygc" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org DE6EE20734 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 88BB6280005; Wed, 3 Jun 2020 18:56:33 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 83D10280003; Wed, 3 Jun 2020 18:56:33 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 706BE280005; Wed, 3 Jun 2020 18:56:33 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0006.hostedemail.com [216.40.44.6]) by kanga.kvack.org (Postfix) with ESMTP id 525B8280003 for ; Wed, 3 Jun 2020 18:56:33 -0400 (EDT) Received: from smtpin02.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 0B94A21E4 for ; Wed, 3 Jun 2020 22:56:33 +0000 (UTC) X-FDA: 76889411466.02.soup37_9070fa803aa06 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin02.hostedemail.com (Postfix) with ESMTP id E2B3D7FD8E for ; Wed, 3 Jun 2020 22:56:32 +0000 (UTC) X-HE-Tag: soup37_9070fa803aa06 X-Filterd-Recvd-Size: 8086 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf04.hostedemail.com (Postfix) with ESMTP for ; Wed, 3 Jun 2020 22:56:32 +0000 (UTC) Received: from localhost.localdomain (c-73-231-172-41.hsd1.ca.comcast.net [73.231.172.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 33890207DA; Wed, 3 Jun 2020 22:56:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1591224991; bh=06wkJiu454/OpJq7lRYI9HSvq5G6Gidcgav5JXi4IKU=; h=Date:From:To:Subject:In-Reply-To:From; b=aJnboygcvyJy9SeA4nETEyDZZi1eJ93nbVZtikcuSMyB4fUFZjKhDSJf4WADtJfwQ c+4Trj6h9TZhI602DbiCUGWpxlsNaBfEOf6afJeEICHizGtJvRzOZgMWdvf3MtUf7v 1xUgu4lUo6rAa+2fC1srnh7t1rBhiomTjIyYefbQ= Date: Wed, 03 Jun 2020 15:56:30 -0700 From: Andrew Morton To: airlied@linux.ie, akpm@linux-foundation.org, chris@chris-wilson.co.uk, daniel@ffwll.ch, jani.nikula@linux.intel.com, jhubbard@nvidia.com, joonas.lahtinen@linux.intel.com, jrdr.linux@gmail.com, linux-mm@kvack.org, matthew.auld@intel.com, mm-commits@vger.kernel.org, rodrigo.vivi@intel.com, torvalds@linux-foundation.org, tvrtko.ursulin@intel.com, willy@infradead.org Subject: [patch 004/131] mm/gup: refactor and de-duplicate gup_fast() code Message-ID: <20200603225630.dODblpnlR%akpm@linux-foundation.org> In-Reply-To: <20200603155549.e041363450869eaae4c7f05b@linux-foundation.org> User-Agent: s-nail v14.8.16 X-Rspamd-Queue-Id: E2B3D7FD8E X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam05 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: John Hubbard Subject: mm/gup: refactor and de-duplicate gup_fast() code There were two nearly identical sets of code for gup_fast() style of walking the page tables with interrupts disabled. This has lead to the usual maintenance problems that arise from having duplicated code. There is already a core internal routine in gup.c for gup_fast(), so just enhance it very slightly: allow skipping the fall-back to "slow" (regular) get_user_pages(), via the new FOLL_FAST_ONLY flag. Then, just call internal_get_user_pages_fast() from __get_user_pages_fast(), and adjust the API to match pre-existing API behavior. There is a change in behavior from this refactoring: the nested form of interrupt disabling is used in all gup_fast() variants now. That's because there is only one place that interrupt disabling for page walking is done, and so the safer form is required. This should, if anything, eliminate possible (rare) bugs, because the non-nested form of enabling interrupts was fragile at best. [jhubbard@nvidia.com: fixup] Link: http://lkml.kernel.org/r/20200521233841.1279742-1-jhubbard@nvidia.com Link: http://lkml.kernel.org/r/20200519002124.2025955-3-jhubbard@nvidia.com Signed-off-by: John Hubbard Reviewed-by: Chris Wilson Cc: Daniel Vetter Cc: David Airlie Cc: Jani Nikula Cc: "Joonas Lahtinen" Cc: Matthew Auld Cc: Matthew Wilcox Cc: Rodrigo Vivi Cc: Souptick Joarder Cc: Tvrtko Ursulin Signed-off-by: Andrew Morton --- include/linux/mm.h | 1 mm/gup.c | 61 ++++++++++++++++++++----------------------- 2 files changed, 30 insertions(+), 32 deletions(-) --- a/include/linux/mm.h~mm-gup-refactor-and-de-duplicate-gup_fast-code +++ a/include/linux/mm.h @@ -2816,6 +2816,7 @@ struct page *follow_page(struct vm_area_ #define FOLL_LONGTERM 0x10000 /* mapping lifetime is indefinite: see below */ #define FOLL_SPLIT_PMD 0x20000 /* split huge pmd before returning */ #define FOLL_PIN 0x40000 /* pages must be released via unpin_user_page */ +#define FOLL_FAST_ONLY 0x80000 /* gup_fast: prevent fall-back to slow gup */ /* * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each --- a/mm/gup.c~mm-gup-refactor-and-de-duplicate-gup_fast-code +++ a/mm/gup.c @@ -2731,10 +2731,12 @@ static int internal_get_user_pages_fast( struct page **pages) { unsigned long addr, len, end; + unsigned long flags; int nr_pinned = 0, ret = 0; if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM | - FOLL_FORCE | FOLL_PIN | FOLL_GET))) + FOLL_FORCE | FOLL_PIN | FOLL_GET | + FOLL_FAST_ONLY))) return -EINVAL; start = untagged_addr(start) & PAGE_MASK; @@ -2753,16 +2755,26 @@ static int internal_get_user_pages_fast( * order to avoid confusing the normal COW routines. So only * targets that are already writable are safe to do by just * looking at the page tables. + * + * Disable interrupts. The nested form is used, in order to allow full, + * general purpose use of this routine. + * + * With interrupts disabled, we block page table pages from being + * freed from under us. See struct mmu_table_batch comments in + * include/asm-generic/tlb.h for more details. + * + * We do not adopt an rcu_read_lock(.) here as we also want to + * block IPIs that come from THPs splitting. */ if (IS_ENABLED(CONFIG_HAVE_FAST_GUP) && gup_fast_permitted(start, end)) { - local_irq_disable(); + local_irq_save(flags); gup_pgd_range(addr, end, gup_flags | FOLL_WRITE, pages, &nr_pinned); - local_irq_enable(); + local_irq_restore(flags); ret = nr_pinned; } - if (nr_pinned < nr_pages) { + if (nr_pinned < nr_pages && !(gup_flags & FOLL_FAST_ONLY)) { /* Try to get the remaining pages with get_user_pages */ start += nr_pinned << PAGE_SHIFT; pages += nr_pinned; @@ -2798,37 +2810,27 @@ static int internal_get_user_pages_fast( int __get_user_pages_fast(unsigned long start, int nr_pages, int write, struct page **pages) { - unsigned long len, end; - unsigned long flags; - int nr_pinned = 0; + int nr_pinned; /* * Internally (within mm/gup.c), gup fast variants must set FOLL_GET, * because gup fast is always a "pin with a +1 page refcount" request. + * + * FOLL_FAST_ONLY is required in order to match the API description of + * this routine: no fall back to regular ("slow") GUP. */ - unsigned int gup_flags = FOLL_GET; + unsigned int gup_flags = FOLL_GET | FOLL_FAST_ONLY; if (write) gup_flags |= FOLL_WRITE; - start = untagged_addr(start) & PAGE_MASK; - len = (unsigned long) nr_pages << PAGE_SHIFT; - end = start + len; - - if (end <= start) - return 0; - if (unlikely(!access_ok((void __user *)start, len))) - return 0; + nr_pinned = internal_get_user_pages_fast(start, nr_pages, gup_flags, + pages); /* - * Disable interrupts. We use the nested form as we can already have - * interrupts disabled by get_futex_key. - * - * With interrupts disabled, we block page table pages from being - * freed from under us. See struct mmu_table_batch comments in - * include/asm-generic/tlb.h for more details. - * - * We do not adopt an rcu_read_lock(.) here as we also want to - * block IPIs that come from THPs splitting. + * As specified in the API description above, this routine is not + * allowed to return negative values. However, the common core + * routine internal_get_user_pages_fast() *can* return -errno. + * Therefore, correct for that here: * * NOTE! We allow read-only gup_fast() here, but you'd better be * careful about possible COW pages. You'll get _a_ COW page, but @@ -2836,13 +2838,8 @@ int __get_user_pages_fast(unsigned long * COW event happens after this. COW may break the page copy in a * random direction. */ - - if (IS_ENABLED(CONFIG_HAVE_FAST_GUP) && - gup_fast_permitted(start, end)) { - local_irq_save(flags); - gup_pgd_range(start, end, gup_flags, pages, &nr_pinned); - local_irq_restore(flags); - } + if (nr_pinned < 0) + nr_pinned = 0; return nr_pinned; } _