From: Matthew Wilcox <willy@infradead.org>
To: Dominique Martinet <asmadeus@codewreck.org>
Cc: linux-mm@kvack.org
Subject: Re: How to use huge pages in drivers?
Date: Wed, 4 Sep 2019 10:50:32 -0700
Message-ID: <20190904175032.GL29434@bombadil.infradead.org>
In-Reply-To: <20190904170056.GA9825@nautica>

On Wed, Sep 04, 2019 at 07:00:56PM +0200, Dominique Martinet wrote:
> Dominique Martinet wrote on Tue, Sep 03, 2019:
> > Matthew Wilcox wrote on Tue, Sep 03, 2019:
> > > > What I'd like to know is:
> > > >  - we know (assuming the other side isn't too bugged, but if it is we're
> > > > fucked up anyway) exactly what huge-page-sized physical memory range has
> > > > been mapped on the other side, is there a way to manually gather the
> > > > pages corresponding and merge them into a huge page?
> > > 
> > > You're using the word 'page' here, but I suspect what you really mean is
> > > "pfn" or "pte".  As you've described it, it doesn't matter what data structure
> > > Linux is using for the memory, since Linux doesn't know about the memory.
> > 
> > Correct, we're already using vmf_insert_pfn
> 
> Actually let me take that back, vmf_insert_pfn is only used if
> pfn_valid() is false, probably as a safeguard of sorts(?).
> The normal case goes through pfn_to_page(pfn) + vm_insert_page(), as
> things stand.
> I do have a few more questions if you could humor me a bit more...
> 
>  - the vma was created with a vm_flags including VM_MIXEDMAP for some
> reason, I don't know why.
> If I change it to VM_PFNMAP (which sounds better here from the little I
> understand of this as we do not need cow and looks a bit simpler?), I
> can remove the vm_insert_page() path and use the vmf_insert_pfn one
> instead, which appears to work fine for simple programs... But the
> kernel thread for my network adapter (bxi... which is not upstream
> either I guess.. sigh..) no longer tries to fault via my custom .fault
> vm operation... Which means I probably did need MIXEDMAP ?

Strange ... PFNMAP absolutely should try to fault via the ->fault
vm operation (although see below)

>  - ignoring that for now (it's not like I need to switch to PFNMAP);
> adding vmf_insert_pfn_pmd() for when the remote side uses large pages,
> it complains that the vmf->pmd is not a pmd_none nor huge nor a devmap
> (this check appears specific to rhel7 kernel, I could temporarily test
> with an upstream kernel but the network adapter won't work there so I'll
> need this to work on this ultimately)
> 
> It looks like handle_mm_fault() will always try to allocate a pmd so it
> should never be empty in my fault handler, and I don't see anything else
> than vmf_insert_pfn_pmd() setting the mkdevmap flag, and it's not huge
> either...
> (on a dump, the pmd content is 175cb18067, so these flags according
> to crash for x86_64 are (PRESENT|RW|USER|ACCESSED|DIRTY))
> 
> I tried adding a huge_fault vm op thinking it might be called with a
> more appropriate pmd but it doesn't seem to be called at all in my
> case..? I would have assumed from the code that it would try every page

You shouldn't be calling vmf_insert_pfn_pmd() from a regular ->fault
handler, as by then the fault handler has already inserted a PMD.
The ->huge_fault handler is the place to call it from.
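
A minimal userspace sketch of why that check complains, assuming the
standard x86_64 flag bit positions. It decodes the pmd value quoted
above (175cb18067). The helper names are made up for illustration, and
the none/huge tests are simplified stand-ins for what the kernel's own
checks look at, not the kernel's code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* x86_64 page-table entry flag bits (low bits of a PMD entry). */
#define X86_PAGE_PRESENT  (1ULL << 0)
#define X86_PAGE_RW       (1ULL << 1)
#define X86_PAGE_USER     (1ULL << 2)
#define X86_PAGE_PWT      (1ULL << 3)
#define X86_PAGE_PCD      (1ULL << 4)
#define X86_PAGE_ACCESSED (1ULL << 5)
#define X86_PAGE_DIRTY    (1ULL << 6)
#define X86_PAGE_PSE      (1ULL << 7)  /* set when the PMD maps a huge page */

/* Simplified: an empty entry has no bits set at all. */
static int pmd_entry_is_none(uint64_t pmd)
{
    return pmd == 0;
}

/* Simplified: a huge PMD has the PSE bit set. */
static int pmd_entry_is_huge(uint64_t pmd)
{
    return (pmd & X86_PAGE_PSE) != 0;
}

/* Write the names of the set flag bits into buf, separated by '|'. */
static void pmd_describe(uint64_t pmd, char *buf, size_t len)
{
    static const struct { uint64_t bit; const char *name; } flags[] = {
        { X86_PAGE_PRESENT,  "PRESENT"  },
        { X86_PAGE_RW,       "RW"       },
        { X86_PAGE_USER,     "USER"     },
        { X86_PAGE_PWT,      "PWT"      },
        { X86_PAGE_PCD,      "PCD"      },
        { X86_PAGE_ACCESSED, "ACCESSED" },
        { X86_PAGE_DIRTY,    "DIRTY"    },
        { X86_PAGE_PSE,      "PSE"      },
    };
    size_t used = 0;

    if (len == 0)
        return;
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(flags) / sizeof(flags[0]); i++) {
        if (!(pmd & flags[i].bit))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         used ? "|" : "", flags[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break;  /* output truncated, stop here */
        used += (size_t)n;
    }
}
```

For 0x175cb18067 this gives PRESENT|RW|USER|ACCESSED|DIRTY with PSE
clear: the entry already points at a page table, so it is neither
none nor huge.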

You may need to force PMD-alignment for your call to mmap().
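
One way to do that from userspace, as a rough sketch: reserve a
slightly larger PROT_NONE region, pick the PMD-aligned address inside
it, and map the real thing there with MAP_FIXED. This assumes a 2 MiB
PMD size (x86_64 with 4 KiB base pages); mmap_pmd_aligned() and
PMD_SIZE_BYTES are hypothetical names, not an existing API.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define PMD_SIZE_BYTES (2UL << 20)  /* assumption: 2 MiB PMD mappings */

/*
 * Map len bytes of fd at offset off so that the returned address is
 * PMD-aligned. Returns MAP_FAILED on error, like mmap().
 */
static void *mmap_pmd_aligned(size_t len, int prot, int flags,
                              int fd, off_t off)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t maplen = (len + page - 1) & ~(page - 1);
    size_t total = maplen + PMD_SIZE_BYTES;

    /* Reserve address space only; nothing is accessible yet. */
    char *base = mmap(NULL, total, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return MAP_FAILED;

    /* Round the start up to the next PMD boundary. */
    uintptr_t start = ((uintptr_t)base + PMD_SIZE_BYTES - 1) &
                      ~(uintptr_t)(PMD_SIZE_BYTES - 1);
    char *aligned = (char *)start;

    /* Replace part of our own reservation with the real mapping. */
    void *p = mmap(aligned, maplen, prot, flags | MAP_FIXED, fd, off);
    if (p == MAP_FAILED) {
        munmap(base, total);
        return MAP_FAILED;
    }

    /* Give back the unused head and tail of the reservation. */
    size_t head = (size_t)(aligned - base);
    size_t tail = total - head - maplen;
    if (head)
        munmap(base, head);
    if (tail)
        munmap(aligned + maplen, tail);
    return p;
}
```

The file offset passed in should be PMD-aligned as well, otherwise the
virtual and physical sides still won't line up. A driver can also
handle this in the kernel by providing its own get_unmapped_area file
operation instead of relying on userspace.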

> Long story short, I think I have some deeper understanding problem about
> the whole thing. Do I also need to use some specific flags when that
> special file is mmap'd to allow huge_fault to be called ?
> I think transparent_hugepage_enabled(vma) is fine, but the vmf.pmd found
> in __handle_mm_fault is probably already not none at this point...?
> 
> Thanks again, feel free to ignore me for a bit longer I'll keep digging
> my own grave, writing to a rubber duck that might have an idea of how
> far the wrong way I've gone already helps... :D

Hope these pointers are slightly more useful than a rubber duck ;-)

