From: Andres Lagar-Cavilla
Subject: Re: [PATCH] kvm: Faults which trigger IO release the mmap_sem
Date: Tue, 16 Sep 2014 09:52:39 -0700
Message-ID:
References: <1410811885-17267-1-git-send-email-andreslc@google.com>
 <54184078.4070505@redhat.com>
In-Reply-To: <54184078.4070505@redhat.com>
To: Paolo Bonzini
Cc: Gleb Natapov, Rik van Riel, Peter Zijlstra, Mel Gorman, Andy Lutomirski,
 Andrew Morton, Andrea Arcangeli, Sasha Levin, Jianyu Zhan, Paul Cassella,
 Hugh Dickins, Peter Feiner, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org
List-Id: kvm.vger.kernel.org

On Tue, Sep 16, 2014 at 6:51 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 15/09/2014 22:11, Andres Lagar-Cavilla wrote:
> > +	if (!locked) {
> > +		BUG_ON(npages != -EBUSY);
>
> VM_BUG_ON perhaps?
>

Sure.


> > @@ -1177,9 +1210,15 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
> >  		npages = get_user_page_nowait(current, current->mm,
> >  					      addr, write_fault, page);
> >  		up_read(&current->mm->mmap_sem);
> > -	} else
> > -		npages = get_user_pages_fast(addr, 1, write_fault,
> > -					     page);
> > +	} else {
> > +		/*
> > +		 * By now we have tried gup_fast, and possible async_pf, and we
> > +		 * are certainly not atomic. Time to retry the gup, allowing
> > +		 * mmap semaphore to be relinquished in the case of IO.
> > +		 */
> > +		npages = kvm_get_user_page_retry(current, current->mm, addr,
> > +						 write_fault, page);
>
> This is a separate logical change.  Was this:
>
>         down_read(&mm->mmap_sem);
>         npages = get_user_pages(NULL, mm, addr, 1, 1, 0, NULL, NULL);
>         up_read(&mm->mmap_sem);
>
> the intention rather than get_user_pages_fast?
>
Nope. The intention was to pass FAULT_FLAG_ALLOW_RETRY to the vma fault
handler (without _NOWAIT). And once you do that, if you come back without
holding the mmap sem, you need to call yet again.

By that point in the call chain I felt comfortable dropping the _fast. All
paths that get there have already tried _fast (and some have tried _NOWAIT).
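
Concretely, the helper ends up with roughly the following shape (a sketch of
the flow, not the literal patch; I am assuming the __get_user_pages() variant
that takes a "nonblocking"/locked out-parameter):

static int kvm_get_user_page_retry(struct task_struct *tsk,
				   struct mm_struct *mm, unsigned long addr,
				   bool write_fault, struct page **pagep)
{
	int npages;
	int locked = 1;
	int flags = FOLL_TOUCH | (pagep ? FOLL_GET : 0) |
		    (write_fault ? FOLL_WRITE : 0);

	down_read(&mm->mmap_sem);
	/*
	 * A non-NULL "nonblocking" pointer makes gup set
	 * FAULT_FLAG_ALLOW_RETRY (but not _NOWAIT), so the fault handler may
	 * drop mmap_sem and wait for the IO; "locked" is cleared if it does.
	 */
	npages = __get_user_pages(tsk, mm, addr, 1, flags, pagep, NULL,
				  &locked);
	if (!locked) {
		/*
		 * The IO wait already happened and mmap_sem was released
		 * (this is where the BUG_ON/VM_BUG_ON on -EBUSY above sits).
		 * Take the semaphore again and repeat the gup, telling the
		 * fault handler via FOLL_TRIED that it can now complete
		 * synchronously.
		 */
		down_read(&mm->mmap_sem);
		npages = __get_user_pages(tsk, mm, addr, 1,
					  flags | FOLL_TRIED, pagep, NULL,
					  NULL);
	}
	up_read(&mm->mmap_sem);
	return npages;
}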


> I think a first patch should introduce kvm_get_user_page_retry ("Retry a
> fault after a gup with FOLL_NOWAIT.") and the second would add
> FOLL_TRIED ("This properly relinquishes mmap semaphore if the
> filemap/swap has to wait on page lock (and retries the gup to completion
> after that").
>
That's not what FOLL_TRIED does. The relinquishing of mmap semaphore is done
by this patch minus the FOLL_TRIED bits. FOLL_TRIED will let the fault
handler (e.g. filemap) know that we've been there and waited on the IO
already, so in the common case we won't need to redo the IO.
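
The relinquishing itself happens in the page-lock wait path; roughly, from
memory (abbreviated from mm/filemap.c, so details may be off):

int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
			 unsigned int flags)
{
	if (flags & FAULT_FLAG_ALLOW_RETRY) {
		/* With _NOWAIT the caller keeps mmap_sem and sees
		 * VM_FAULT_RETRY immediately. */
		if (flags & FAULT_FLAG_RETRY_NOWAIT)
			return 0;

		/* Otherwise drop mmap_sem, wait for the IO to complete,
		 * and report VM_FAULT_RETRY to the caller. */
		up_read(&mm->mmap_sem);
		wait_on_page_locked(page);
		return 0;
	}

	/* No retry allowed: take the page lock and proceed (abbreviated). */
	__lock_page(page);
	return 1;
}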

Have a look at how FAULT_FLAG_TRIED is used in e.g. arch/x86/mm/fault.c.
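
The pattern there is roughly the following (abbreviated):

	fault = handle_mm_fault(mm, vma, address, flags);

	if (unlikely(fault & VM_FAULT_RETRY)) {
		if (flags & FAULT_FLAG_ALLOW_RETRY) {
			/*
			 * Retry at most once: clear ALLOW_RETRY and set
			 * TRIED, so the second pass has to complete and the
			 * fault handlers know the IO wait already happened.
			 */
			flags &= ~FAULT_FLAG_ALLOW_RETRY;
			flags |= FAULT_FLAG_TRIED;
			goto retry;	/* re-takes mmap_sem and repeats */
		}
	}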

> Apart from this, the patch looks good.  The mm/ parts are minimal, so I
> think it's best to merge it through the KVM tree with someone's Acked-by.
>

Thanks!
Andres

> Paolo



--
Andres Lagar-Cavilla | Google Cloud Platform | andreslc@google.com | 647-778-4380