From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Rik van Riel <riel@redhat.com>, Ingo Molnar <mingo@elte.hu>,
	akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
	linux-arch@vger.kernel.org, linux-mm@kvack.org,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	David Miller <davem@davemloft.net>,
	Hugh Dickins <hugh.dickins@tiscali.co.uk>,
	Mel Gorman <mel@csn.ul.ie>, Nick Piggin <npiggin@kernel.dk>,
	Russell King <rmk@arm.linux.org.uk>,
	Chris Metcalf <cmetcalf@tilera.com>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>
Subject: Re: [RFC][PATCH 2/6] mm: Change flush_tlb_range() to take an mm_struct
Date: Wed, 02 Mar 2011 22:40:27 +0100
Message-ID: <1299102027.1310.39.camel@laptop>
In-Reply-To: <AANLkTimhWKhHojZ-9XZGSh3OzfPhvo__Dib9VfeMWoBQ@mail.gmail.com>

On Wed, 2011-03-02 at 11:19 -0800, Linus Torvalds wrote:
> On Wed, Mar 2, 2011 at 9:59 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > In order to be able to properly support architecture that want/need to
> > support TLB range invalidation, we need to change the
> > flush_tlb_range() argument from a vm_area_struct to an mm_struct
> > because the range might very well extend past one VMA, or not have a
> > VMA at all.
>
> I really don't think this is right. The whole "drop the icache
> information" thing is a total anti-optimization, since for some
> architectures, the icache flush is the _big_ deal.

Right, so Tile has the I-cache flush from flush_tlb_range(), I'm not
sure if that's the right thing to do, Documentation/cachetlb.txt seems
to suggest doing it from update_mmu_cache() like things.

However, I really don't know, and would happily be explained how these
things are supposed to work. Also:

> Possibly much
> bigger than the TLB flush itself. Doing an icache flush was much more
> expensive than the TLB flush on alpha, for example (the tlb had ASI's
> etc, the icache did not).

Right, but the problem remains that we do page-table teardown without
having a vma.

Now we can re-introduce I/D variants again by assuming D-only and using
tlb_start_vma() to set a I-too bit on VM_EXEC. (this assumes the vm_args
range is non-executable -- which it had better be).

How about I do something like:

enum {
	TLB_FLUSH_I = 1,
	TLB_FLUSH_D = 2,
	TLB_FLUSH_PAGE = 4,
	TLB_FLUSH_HPAGE = 8,
};

void flush_tlb_range(struct mm_struct *mm, unsigned long start,
		     unsigned long end, unsigned int flags);

And we then do:

tlb_gather_mmu(struct mmu_gather *tlb, ...)
{
	...
	tlb->flush_type = TLB_FLUSH_D | TLB_FLUSH_PAGE;
}

tlb_start_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
{
	if (!tlb->fullmm)
		flush_cache_range(vma, vma->vm_start, vma->vm_end);

	if (vma->vm_flags & VM_EXEC)
		tlb->flush_type |= TLB_FLUSH_I;

	if (vma->vm_flags & VM_HUGEPAGE)
		tlb->flush_type |= TLB_FLUSH_HPAGE;
}

tlb_flush_mmu(struct mmu_gather *tlb)
{
	if (!tlb->fullmm && tlb->need_flush) {
		flush_tlb_range(tlb->mm, tlb->start, tlb->end,
				tlb->flush_type);
		tlb->start = TASK_SIZE;
		tlb->end = 0;
	}
	...
}

> > There are various reasons that we need to flush TLBs _after_ freeing
> > the page-tables themselves. For some architectures (x86 among others)
> > this serializes against (both hardware and software) page table
> > walkers like gup_fast().
>
> This part of the changelog also makes no sense what-so-ever. It's
> actively wrong.
>
> On x86, we absolutely *must* do the TLB flush _before_ we release the
> page tables. So your commentary is actively wrong and misleading.
>
> The order has to be:
>  - clear the page table entry, queue the page to be free'd
>  - flush the TLB
>  - free the page (and page tables)
>
> and nothing else is correct, afaik. So the changelog is pure and utter
> garbage. I didn't look at what the patch actually changed.

OK, so I use the wrong terms, I meant page-table tear-down, where we
remove the pte page pointer from the pmd, remove the pmd page from the
pud etc. We then flush the TLBs and only then actually free the pages.

I think the confusion stems from the fact that we call tear-down
free_pgtables()

The point was that we need to TLB flush _after_ tear-down (before actual
free), not before tear-down.

The problem is that currently we either end up doing too many TLB
flushes or one too few.
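
As a rough illustration of the flag scheme proposed above, here is a minimal user-space C sketch of how flush_type could accumulate across VMAs. It only models the bookkeeping. The TLB_FLUSH_* values are the ones from the message; everything prefixed model_ or MODEL_ (the structs, the helper functions, and the VM flag bit values) is a hypothetical stand-in and not the kernel's actual definition.

```c
#include <assert.h>
#include <stddef.h>

/* Flag values exactly as proposed in the message. */
enum {
	TLB_FLUSH_I     = 1,
	TLB_FLUSH_D     = 2,
	TLB_FLUSH_PAGE  = 4,
	TLB_FLUSH_HPAGE = 8,
};

/* Hypothetical placeholders for VM_EXEC / VM_HUGEPAGE; not the kernel's values. */
#define MODEL_VM_EXEC     0x1u
#define MODEL_VM_HUGEPAGE 0x2u

/* Hypothetical, cut-down stand-in for struct vm_area_struct. */
struct model_vma {
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned int vm_flags;
};

/* Hypothetical, cut-down stand-in for struct mmu_gather. */
struct model_gather {
	unsigned int flush_type;
};

/* Mirrors the tlb_gather_mmu() step: start from D-only, normal pages. */
void model_gather_init(struct model_gather *tlb)
{
	assert(tlb != NULL);
	tlb->flush_type = TLB_FLUSH_D | TLB_FLUSH_PAGE;
}

/* Mirrors the tlb_start_vma() step: widen the flush type per VMA. */
void model_start_vma(struct model_gather *tlb, const struct model_vma *vma)
{
	assert(tlb != NULL && vma != NULL);

	if (vma->vm_flags & MODEL_VM_EXEC)
		tlb->flush_type |= TLB_FLUSH_I;

	if (vma->vm_flags & MODEL_VM_HUGEPAGE)
		tlb->flush_type |= TLB_FLUSH_HPAGE;
}
```

In this model, a gather that only ever sees non-executable VMAs keeps flush_type at TLB_FLUSH_D | TLB_FLUSH_PAGE, so an architecture's flush_tlb_range() could skip the I-side work. A page-table teardown range with no VMA at all would fall back to the same D-only default, which is the assumption the message calls out for the vm_args range.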