From: Linus Torvalds <torvalds@linux-foundation.org> To: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Will Deacon <will@kernel.org>, Linux Kernel Mailing List <linux-kernel@vger.kernel.org>, Linux-MM <linux-mm@kvack.org>, Linux ARM <linux-arm-kernel@lists.infradead.org>, Catalin Marinas <catalin.marinas@arm.com>, Jan Kara <jack@suse.cz>, Minchan Kim <minchan@kernel.org>, Andrew Morton <akpm@linux-foundation.org>, Vinayak Menon <vinmenon@codeaurora.org>, Android Kernel Team <kernel-team@android.com> Subject: Re: [PATCH 1/2] mm: Allow architectures to request 'old' entries when prefaulting Date: Thu, 10 Dec 2020 09:23:53 -0800 [thread overview] Message-ID: <CAHk-=wiU8ktvak2hCj2TWJ6wMSwVsUSvi5Bjf4i1JGvpGmyUZw@mail.gmail.com> (raw) In-Reply-To: <20201210150828.4b7pg5lx666r7l2u@black.fi.intel.com> On Thu, Dec 10, 2020 at 7:08 AM Kirill A. Shutemov <kirill.shutemov@linux.intel.com> wrote: > > See lightly tested patch below. Is it something you had in mind? This is closer, in that at least it removes the ostensibly blocking allocation (that can't happen) from the prefault path. But the main issue remains: > > At that point, I think the current very special and odd > > do_fault_around() pre-allocation could be made into just a _regular_ > > "allocate the pmd if it doesn't exist". And then the pte locking could > > be moved into filemap_map_pages(), and suddenly the semantics and > > rules around all that would be a whole lot more obvious. > > No. It would stop faultaround code from mapping huge pages. We had to > defer pte page table mapping until we know we don't have huge pages in > page cache. Can we please move that part to the callers too - possibly with a separate helper function? Because the real issue remains: as long the map_set_pte() function takes the pte lock, the caller cannot rely on it. And the filemap_map_pages() code really would like to rely on it. Because if the lock is taken there *above* the loop - or even in the loop iteration at the top, the code can now do things that rely on "I know I hold the page table lock". In particular, we can get rid of that very very expensive page locking. Which is the reason I know about the horrid current issue with "pre-allocate in one place, lock in another, and know we are atomic in a third place" issue. Because I had to walk down these paths and realize that "this loop is run under the page table lock, EXCEPT for the first iteration, where it's taken by the first time we do that non-allocating alloc_set_pte()". See? Linus
WARNING: multiple messages have this Message-ID (diff)
From: Linus Torvalds <torvalds@linux-foundation.org> To: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Android Kernel Team <kernel-team@android.com>, Jan Kara <jack@suse.cz>, Minchan Kim <minchan@kernel.org>, Catalin Marinas <catalin.marinas@arm.com>, Linux Kernel Mailing List <linux-kernel@vger.kernel.org>, Linux-MM <linux-mm@kvack.org>, Vinayak Menon <vinmenon@codeaurora.org>, Andrew Morton <akpm@linux-foundation.org>, Will Deacon <will@kernel.org>, Linux ARM <linux-arm-kernel@lists.infradead.org> Subject: Re: [PATCH 1/2] mm: Allow architectures to request 'old' entries when prefaulting Date: Thu, 10 Dec 2020 09:23:53 -0800 [thread overview] Message-ID: <CAHk-=wiU8ktvak2hCj2TWJ6wMSwVsUSvi5Bjf4i1JGvpGmyUZw@mail.gmail.com> (raw) In-Reply-To: <20201210150828.4b7pg5lx666r7l2u@black.fi.intel.com> On Thu, Dec 10, 2020 at 7:08 AM Kirill A. Shutemov <kirill.shutemov@linux.intel.com> wrote: > > See lightly tested patch below. Is it something you had in mind? This is closer, in that at least it removes the ostensibly blocking allocation (that can't happen) from the prefault path. But the main issue remains: > > At that point, I think the current very special and odd > > do_fault_around() pre-allocation could be made into just a _regular_ > > "allocate the pmd if it doesn't exist". And then the pte locking could > > be moved into filemap_map_pages(), and suddenly the semantics and > > rules around all that would be a whole lot more obvious. > > No. It would stop faultaround code from mapping huge pages. We had to > defer pte page table mapping until we know we don't have huge pages in > page cache. Can we please move that part to the callers too - possibly with a separate helper function? Because the real issue remains: as long the map_set_pte() function takes the pte lock, the caller cannot rely on it. And the filemap_map_pages() code really would like to rely on it. Because if the lock is taken there *above* the loop - or even in the loop iteration at the top, the code can now do things that rely on "I know I hold the page table lock". In particular, we can get rid of that very very expensive page locking. Which is the reason I know about the horrid current issue with "pre-allocate in one place, lock in another, and know we are atomic in a third place" issue. Because I had to walk down these paths and realize that "this loop is run under the page table lock, EXCEPT for the first iteration, where it's taken by the first time we do that non-allocating alloc_set_pte()". See? Linus _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2020-12-10 17:25 UTC|newest] Thread overview: 138+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-12-09 16:39 [PATCH 0/2] Create 'old' ptes for faultaround mappings on arm64 with hardware access flag Will Deacon 2020-12-09 16:39 ` Will Deacon 2020-12-09 16:39 ` [PATCH 1/2] mm: Allow architectures to request 'old' entries when prefaulting Will Deacon 2020-12-09 16:39 ` Will Deacon 2020-12-09 17:58 ` Linus Torvalds 2020-12-09 17:58 ` Linus Torvalds 2020-12-09 17:58 ` Linus Torvalds 2020-12-09 18:40 ` Will Deacon 2020-12-09 18:40 ` Will Deacon 2020-12-09 19:04 ` Linus Torvalds 2020-12-09 19:04 ` Linus Torvalds 2020-12-09 19:04 ` Linus Torvalds 2020-12-09 20:32 ` Matthew Wilcox 2020-12-09 20:32 ` Matthew Wilcox 2020-12-09 21:04 ` Linus Torvalds 2020-12-09 21:04 ` Linus Torvalds 2020-12-09 21:04 ` Linus Torvalds 2020-12-10 15:08 ` Kirill A. Shutemov 2020-12-10 15:08 ` Kirill A. Shutemov 2020-12-10 17:23 ` Linus Torvalds [this message] 2020-12-10 17:23 ` Linus Torvalds 2020-12-10 17:23 ` Linus Torvalds 2020-12-14 16:07 ` Kirill A. Shutemov 2020-12-14 16:07 ` Kirill A. Shutemov 2020-12-14 17:54 ` Linus Torvalds 2020-12-14 17:54 ` Linus Torvalds 2020-12-14 17:54 ` Linus Torvalds 2020-12-14 18:56 ` Matthew Wilcox 2020-12-14 18:56 ` Matthew Wilcox 2020-12-16 17:07 ` Kirill A. Shutemov 2020-12-16 17:07 ` Kirill A. Shutemov 2020-12-16 18:41 ` Linus Torvalds 2020-12-16 18:41 ` Linus Torvalds 2020-12-16 18:41 ` Linus Torvalds 2020-12-17 10:54 ` Kirill A. Shutemov 2020-12-17 10:54 ` Kirill A. Shutemov 2020-12-17 18:22 ` Linus Torvalds 2020-12-17 18:22 ` Linus Torvalds 2020-12-17 18:22 ` Linus Torvalds 2020-12-18 11:04 ` Kirill A. Shutemov 2020-12-18 11:04 ` Kirill A. Shutemov 2020-12-18 18:56 ` Linus Torvalds 2020-12-18 18:56 ` Linus Torvalds 2020-12-18 18:56 ` Linus Torvalds 2020-12-19 12:41 ` Kirill A. Shutemov 2020-12-19 12:41 ` Kirill A. Shutemov 2020-12-19 20:08 ` Linus Torvalds 2020-12-19 20:08 ` Linus Torvalds 2020-12-19 20:08 ` Linus Torvalds 2020-12-19 20:34 ` Linus Torvalds 2020-12-19 20:34 ` Linus Torvalds 2020-12-19 20:34 ` Linus Torvalds 2020-12-22 10:00 ` Kirill A. Shutemov 2020-12-22 10:00 ` Kirill A. Shutemov 2020-12-24 4:04 ` Hugh Dickins 2020-12-24 4:04 ` Hugh Dickins 2020-12-24 4:04 ` Hugh Dickins 2020-12-25 11:31 ` Kirill A. Shutemov 2020-12-25 11:31 ` Kirill A. Shutemov 2020-12-26 17:57 ` Linus Torvalds 2020-12-26 17:57 ` Linus Torvalds 2020-12-26 17:57 ` Linus Torvalds 2020-12-26 20:43 ` Kirill A. Shutemov 2020-12-26 20:43 ` Kirill A. Shutemov 2020-12-26 21:03 ` Hugh Dickins 2020-12-26 21:03 ` Hugh Dickins 2020-12-26 21:03 ` Hugh Dickins 2020-12-26 21:16 ` Linus Torvalds 2020-12-26 21:16 ` Linus Torvalds 2020-12-26 21:16 ` Linus Torvalds 2020-12-26 22:40 ` Kirill A. Shutemov 2020-12-26 22:40 ` Kirill A. Shutemov 2020-12-27 0:45 ` Hugh Dickins 2020-12-27 0:45 ` Hugh Dickins 2020-12-27 0:45 ` Hugh Dickins 2020-12-27 2:38 ` Hugh Dickins 2020-12-27 2:38 ` Hugh Dickins 2020-12-27 2:38 ` Hugh Dickins 2020-12-27 19:38 ` Linus Torvalds 2020-12-27 19:38 ` Linus Torvalds 2020-12-27 19:38 ` Linus Torvalds 2020-12-27 20:32 ` Damian Tometzki 2020-12-27 20:32 ` Damian Tometzki 2020-12-27 22:35 ` Hugh Dickins 2020-12-27 22:35 ` Hugh Dickins 2020-12-27 22:35 ` Hugh Dickins 2020-12-27 23:12 ` Linus Torvalds 2020-12-27 23:12 ` Linus Torvalds 2020-12-27 23:12 ` Linus Torvalds 2020-12-27 23:40 ` Linus Torvalds 2020-12-27 23:40 ` Linus Torvalds 2020-12-27 23:40 ` Linus Torvalds 2020-12-27 23:55 ` Kirill A. Shutemov 2020-12-27 23:55 ` Kirill A. Shutemov 2020-12-27 23:48 ` Kirill A. Shutemov 2020-12-27 23:48 ` Kirill A. Shutemov 2020-12-28 1:54 ` Linus Torvalds 2020-12-28 1:54 ` Linus Torvalds 2020-12-28 1:54 ` Linus Torvalds 2020-12-28 6:43 ` Hugh Dickins 2020-12-28 6:43 ` Hugh Dickins 2020-12-28 6:43 ` Hugh Dickins 2020-12-28 12:53 ` Kirill A. Shutemov 2020-12-28 12:53 ` Kirill A. Shutemov 2020-12-28 18:47 ` Linus Torvalds 2020-12-28 18:47 ` Linus Torvalds 2020-12-28 18:47 ` Linus Torvalds 2020-12-28 21:58 ` Linus Torvalds 2020-12-28 21:58 ` Linus Torvalds 2020-12-28 21:58 ` Linus Torvalds 2020-12-29 13:28 ` Kirill A. Shutemov 2020-12-29 13:28 ` Kirill A. Shutemov 2020-12-29 15:19 ` Matthew Wilcox 2020-12-29 15:19 ` Matthew Wilcox 2020-12-29 20:52 ` Linus Torvalds 2020-12-29 20:52 ` Linus Torvalds 2020-12-29 20:52 ` Linus Torvalds 2020-12-28 22:05 ` Kirill A. Shutemov 2020-12-28 22:05 ` Kirill A. Shutemov 2020-12-28 22:12 ` Kirill A. Shutemov 2020-12-28 22:12 ` Kirill A. Shutemov 2020-12-29 4:35 ` Hugh Dickins 2020-12-29 4:35 ` Hugh Dickins 2020-12-29 4:35 ` Hugh Dickins 2020-12-28 23:28 ` Linus Torvalds 2020-12-28 23:28 ` Linus Torvalds 2020-12-28 23:28 ` Linus Torvalds 2020-12-26 21:07 ` Linus Torvalds 2020-12-26 21:07 ` Linus Torvalds 2020-12-26 21:07 ` Linus Torvalds 2020-12-26 21:41 ` Matthew Wilcox 2020-12-26 21:41 ` Matthew Wilcox 2020-12-09 16:39 ` [PATCH 2/2] arm64: mm: Implement arch_wants_old_faultaround_pte() Will Deacon 2020-12-09 16:39 ` Will Deacon 2020-12-09 18:35 ` Catalin Marinas 2020-12-09 18:35 ` Catalin Marinas 2020-12-09 18:46 ` Will Deacon 2020-12-09 18:46 ` Will Deacon
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to='CAHk-=wiU8ktvak2hCj2TWJ6wMSwVsUSvi5Bjf4i1JGvpGmyUZw@mail.gmail.com' \ --to=torvalds@linux-foundation.org \ --cc=akpm@linux-foundation.org \ --cc=catalin.marinas@arm.com \ --cc=jack@suse.cz \ --cc=kernel-team@android.com \ --cc=kirill.shutemov@linux.intel.com \ --cc=linux-arm-kernel@lists.infradead.org \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-mm@kvack.org \ --cc=minchan@kernel.org \ --cc=vinmenon@codeaurora.org \ --cc=will@kernel.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.