linux-mm.kvack.org archive mirror
From: John Hubbard <jhubbard@nvidia.com>
To: Leonardo Bras <leonardo@linux.ibm.com>,
	<linuxppc-dev@lists.ozlabs.org>, <linux-kernel@vger.kernel.org>,
	<kvm-ppc@vger.kernel.org>, <linux-arch@vger.kernel.org>,
	<linux-mm@kvack.org>
Cc: Arnd Bergmann <arnd@arndb.de>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	YueHaibing <yuehaibing@huawei.com>,
	"Nicholas Piggin" <npiggin@gmail.com>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Keith Busch <keith.busch@intel.com>,
	Jason Gunthorpe <jgg@ziepe.ca>, Paul Mackerras <paulus@samba.org>,
	Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>,
	"Allison Randal" <allison@lohutok.net>,
	Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>,
	Ganesh Goudar <ganeshgr@linux.ibm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ira Weiny <ira.weiny@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dan Williams <dan.j.williams@intel.com>
Subject: Re: [PATCH v4 01/11] powerpc/mm: Adds counting method to monitor lockless pgtable walks
Date: Mon, 30 Sep 2019 14:47:54 -0700
Message-ID: <8534727b-72ed-b974-219e-02155bcd17a8@nvidia.com>
In-Reply-To: <673bcb94b7752e086cc4133fb6cceb24394c02c0.camel@linux.ibm.com>

On 9/30/19 11:42 AM, Leonardo Bras wrote:
> On Mon, 2019-09-30 at 10:57 -0700, John Hubbard wrote:
>>> As I said before, there are cases where this function is called from
>>> 'real mode' in powerpc, which doesn't disable irqs and may misbehave
>>> if we do. So, encapsulating the irq disable in this function could be
>>> a bad choice.
>>
>> You still haven't explained how this works in that case. So far, the
>> synchronization we've discussed has depended upon interrupt disabling
>> as part of the solution, in order to hold off page splitting and page
>> table freeing.
> 
> The irqs are already disabled by another mechanism (hw): MSR_EE=0.
> So, serialize will work as expected.

I get that they're disabled. But will this interlock with the code that
issues IPIs? Because it's not just disabling interrupts that matters, but
rather, synchronizing with the code (TLB flushing) that *happens* to
require issuing IPIs, which in turn interact with disabling interrupts.

So I'm still not seeing how that could work here, unless there is something
special about smp_call_function_many() on ppc when running with MSR_EE=0...?
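
To make the question concrete, here is a minimal userspace model, assuming the serializing side behaves like a synchronous IPI broadcast (smp_call_function_many() with wait set): it cannot complete while any target CPU is inside an interrupts-masked walk. NR_CPUS_MODEL, the irqs_masked[] array and the helper names are hypothetical. The open question is whether MSR_EE=0 in real mode really sets the equivalent of this flag as far as the IPI path is concerned.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical CPU count for the model. */
#define NR_CPUS_MODEL 4

/* true while a CPU has interrupts masked (local_irq_save() or MSR_EE=0). */
static atomic_bool irqs_masked[NR_CPUS_MODEL];

/* Walker side: interrupts stay masked for the whole lockless walk. */
static void walk_begin(int cpu)
{
	atomic_store(&irqs_masked[cpu], true);
}

static void walk_end(int cpu)
{
	atomic_store(&irqs_masked[cpu], false);
}

/*
 * Serializing side: a synchronous IPI broadcast only returns once every
 * target CPU has taken the interrupt, which a CPU with interrupts masked
 * cannot do. Real code would spin; the model only reports whether the
 * broadcast could complete right now.
 */
static bool sync_ipi_could_complete(void)
{
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		if (atomic_load(&irqs_masked[cpu]))
			return false;
	return true;
}
```

In other words, the page table can only be freed safely once this returns true, so the whole scheme rests on the walker's interrupt masking actually holding off the IPI.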

> 
>> Simply skipping that means that an additional mechanism is required...which
>> btw might involve a new, ppc-specific routine, so maybe this is going to end
>> up pretty close to what I pasted in after all...
>>> Of course, if we really need that, we can add a bool parameter to the
>>> function to choose whether to disable/enable irqs.
>>>> * This is really a core mm function, so don't hide it away in arch layers.
>>>>     (If you're changing mm/ files, that's a big hint.)
>>>
>>> My idea here is to let the arch decide how this 'register' is going
>>> to work, as archs may have different needs (in powerpc, for example, we
>>> can't always disable irqs, since we may be in real mode).

Yes, the tension there is that a) some things are per-arch, and b) it's easy 
to get it wrong. The commit below (d9101bfa6adc) is IMHO a perfect example of
that.

So, I would like core mm/ functions that guide the way, but the interrupt
behavior complicates it. I think your original approach of passing just the
mm_struct is probably the right balance, assuming that I'm wrong about interrupts.
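
A minimal sketch of that split, assuming the helpers take only the mm and never touch interrupt state, so each caller decides about interrupts itself. An ordinary caller wraps the walk in its own save/restore, as the __get_user_pages_fast() hunk in this thread does. A real-mode caller, already running with MSR_EE=0, skips that step. struct mm_model, the irq stand-ins and both caller functions are hypothetical userspace stand-ins, not kernel APIs, and atomic_thread_fence(memory_order_seq_cst) stands in for smp_mb().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for struct mm_struct: only the walker counter matters here. */
struct mm_model {
	atomic_int lockless_pgtbl_nr_walkers;
};

/* Stand-in for the local CPU's interrupt-enable state. */
static bool irqs_enabled = true;

static bool irq_save_model(void)
{
	bool was = irqs_enabled;
	irqs_enabled = false;
	return was;
}

static void irq_restore_model(bool was)
{
	irqs_enabled = was;
}

/* Helpers take only the mm; interrupt handling is left to the caller. */
static void register_walker(struct mm_model *mm)
{
	atomic_fetch_add(&mm->lockless_pgtbl_nr_walkers, 1);
	atomic_thread_fence(memory_order_seq_cst); /* smp_mb() stand-in */
}

static void deregister_walker(struct mm_model *mm)
{
	atomic_thread_fence(memory_order_seq_cst); /* order walk reads first */
	atomic_fetch_sub(&mm->lockless_pgtbl_nr_walkers, 1);
}

/* Ordinary caller: masks interrupts itself around the walk. */
static int normal_walk(struct mm_model *mm)
{
	register_walker(mm);
	bool was = irq_save_model();
	int seen = atomic_load(&mm->lockless_pgtbl_nr_walkers); /* "walk" */
	irq_restore_model(was);
	deregister_walker(mm);
	return seen;
}

/* Real-mode caller: entered with MSR_EE=0, so no save/restore here. */
static int realmode_walk(struct mm_model *mm)
{
	register_walker(mm);
	int seen = atomic_load(&mm->lockless_pgtbl_nr_walkers); /* "walk" */
	deregister_walker(mm);
	return seen;
}
```

Keeping the helpers free of interrupt handling is what lets the same two functions serve both callers; the cost is that each caller must get its own interrupt handling right.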


>>>
>>> Maybe we can create a generic function instead of a dummy, and let it
>>> be replaced in case the arch needs to do so.
>>
>> Yes, that might be what we need, if it turns out that ppc can't use this
>> approach (although let's see about that).
>>
> 
> I initially used the dummy approach because I did not see anything like
> serialize in other archs. 
> 
> I mean, even if I put some generic function here, if there is no
> function that uses the 'lockless_pgtbl_walk_count', it becomes just
> overhead.
> 

Not really: the memory barrier is required in all cases, and I think this
code would be good:

+void register_lockless_pgtable_walker(struct mm_struct *mm)
+{
+#ifdef LOCKLESS_PAGE_TABLE_WALK_TRACKING
+       atomic_inc(&mm->lockless_pgtbl_nr_walkers);
+#endif
+       /*
+        * This memory barrier pairs with any code that is either trying to
+        * delete page tables, or split huge pages.
+        */
+       smp_mb();
+}
+EXPORT_SYMBOL_GPL(register_lockless_pgtable_walker);
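
On the earlier idea of a generic function that an arch can replace: one common kernel idiom is for the arch header to define its own version plus a same-named macro, and for the generic header to supply a default only under #ifndef. The sketch below shows that idiom in plain userspace C, assuming a hypothetical ARCH_HAS_OWN_WALKER switch in place of a real arch header; struct mm_model is a stand-in, both bodies are illustrative, and the seq_cst fence stands in for smp_mb().

```c
#include <assert.h>
#include <stdatomic.h>

/* Stand-in for struct mm_struct. */
struct mm_model {
	atomic_int lockless_pgtbl_nr_walkers;
};

/* Models an arch header; compile with -DARCH_HAS_OWN_WALKER to override. */
#ifdef ARCH_HAS_OWN_WALKER
static void register_lockless_pgtable_walker(struct mm_model *mm)
{
	/* Arch-specific tracking would go here. */
	atomic_fetch_add(&mm->lockless_pgtbl_nr_walkers, 1);
	atomic_thread_fence(memory_order_seq_cst);
}
#define register_lockless_pgtable_walker register_lockless_pgtable_walker
#endif

/* Models the generic header: used only if the arch did not provide one. */
#ifndef register_lockless_pgtable_walker
static void register_lockless_pgtable_walker(struct mm_model *mm)
{
	atomic_fetch_add(&mm->lockless_pgtbl_nr_walkers, 1);
	/* Needed on every arch: pairs with page-table free / THP split. */
	atomic_thread_fence(memory_order_seq_cst);
}
#endif
```

The self-referential #define lets the generic header test for an override with the preprocessor alone, so arches that are happy with the default need no code at all.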

And this is the same as your original patch, with just a minor name change:

@@ -2341,9 +2395,11 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
 
        if (IS_ENABLED(CONFIG_HAVE_FAST_GUP) &&
            gup_fast_permitted(start, end)) {
+               register_lockless_pgtable_walker(current->mm);
                local_irq_save(flags);
                gup_pgd_range(start, end, write ? FOLL_WRITE : 0, pages, &nr);
                local_irq_restore(flags);
+               deregister_lockless_pgtable_walker(current->mm);
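
The hunk above covers the walker side. The other half of the counting method, which patch 11/11 of this series uses to skip serializing, is the side that changes the page table. As a minimal userspace sketch under stated assumptions: after publishing the update, it issues a full barrier that pairs with the walker's barrier, reads the counter, and only pays for the IPI broadcast when a walker may still be running. serialize_model(), the ipis_sent tally and struct mm_model are hypothetical, and the fences stand in for smp_mb().

```c
#include <assert.h>
#include <stdatomic.h>

/* Stand-in for struct mm_struct. */
struct mm_model {
	atomic_int lockless_pgtbl_nr_walkers;
};

/* Counts how often the model would have broadcast an IPI. */
static int ipis_sent;

static void register_walker(struct mm_model *mm)
{
	atomic_fetch_add(&mm->lockless_pgtbl_nr_walkers, 1);
	atomic_thread_fence(memory_order_seq_cst); /* smp_mb() stand-in */
}

static void deregister_walker(struct mm_model *mm)
{
	atomic_thread_fence(memory_order_seq_cst); /* finish walk reads first */
	atomic_fetch_sub(&mm->lockless_pgtbl_nr_walkers, 1);
}

/*
 * Called after the page-table update is published. The fence pairs with
 * the walker's fence: either the walker sees the updated entry, or this
 * side sees the walker's increment and falls back to the IPI.
 */
static void serialize_model(struct mm_model *mm)
{
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load(&mm->lockless_pgtbl_nr_walkers) == 0)
		return; /* no lockless walker: skip the broadcast */
	ipis_sent++; /* stands in for smp_call_function_many(..., wait=1) */
}
```

This is also why the barrier in register_lockless_pgtable_walker() matters even when nothing reads the counter: without it, the increment and the walker's page-table reads could be reordered, breaking the pairing.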


Btw, hopefully a minor note: it also looks like there are a number of conflicting
changes in the same area, for example:

    commit d9101bfa6adc ("powerpc/mm/mce: Keep irqs disabled during lockless 
         page table walk") <Aneesh Kumar K.V> (Thu, 19 Sep 2019)

...so it would be good to rebase this onto 5.4-rc1, now that that's here.


thanks,
-- 
John Hubbard
NVIDIA


Thread overview: 26+ messages
2019-09-27 23:39 [PATCH v4 00/11] Introduces new count-based method for monitoring lockless pagetable walks Leonardo Bras
2019-09-27 23:39 ` [PATCH v4 01/11] powerpc/mm: Adds counting method to monitor lockless pgtable walks Leonardo Bras
2019-09-29 22:40   ` John Hubbard
2019-09-29 23:17     ` John Hubbard
2019-09-30 15:14     ` Leonardo Bras
2019-09-30 17:57       ` John Hubbard
2019-09-30 18:42         ` Leonardo Bras
2019-09-30 21:47           ` John Hubbard [this message]
2019-10-01 18:39             ` Leonardo Bras
2019-10-01 18:52               ` John Hubbard
2019-09-27 23:39 ` [PATCH v4 02/11] asm-generic/pgtable: Adds dummy functions " Leonardo Bras
2019-09-27 23:40 ` [PATCH v4 03/11] mm/gup: Applies counting method to monitor gup_pgd_range Leonardo Bras
2019-09-30 11:09   ` Kirill A. Shutemov
2019-09-30 14:27     ` Leonardo Bras
2019-09-30 21:51   ` John Hubbard
2019-10-01 17:56     ` Leonardo Bras
2019-10-01 19:04       ` John Hubbard
2019-10-01 19:40         ` Leonardo Bras
2019-09-27 23:40 ` [PATCH v4 04/11] powerpc/mce_power: Applies counting method to monitor lockless pgtbl walks Leonardo Bras
2019-09-27 23:40 ` [PATCH v4 05/11] powerpc/perf: " Leonardo Bras
2019-09-27 23:40 ` [PATCH v4 06/11] powerpc/mm/book3s64/hash: " Leonardo Bras
2019-09-27 23:40 ` [PATCH v4 07/11] powerpc/kvm/e500: " Leonardo Bras
2019-09-27 23:40 ` [PATCH v4 08/11] powerpc/kvm/book3s_hv: " Leonardo Bras
2019-09-27 23:40 ` [PATCH v4 09/11] powerpc/kvm/book3s_64: " Leonardo Bras
2019-09-27 23:40 ` [PATCH v4 10/11] powerpc/book3s_64: Enables counting method to monitor lockless pgtbl walk Leonardo Bras
2019-09-27 23:40 ` [PATCH v4 11/11] powerpc/mm/book3s64/pgtable: Uses counting method to skip serializing Leonardo Bras
