From: Hongyan Xia <hx242@xen.org>
To: Jan Beulich <jbeulich@suse.com>
Cc: jgrall@amazon.com, "Andrew Cooper" <andrew.cooper3@citrix.com>,
"Roger Pau Monné" <roger.pau@citrix.com>, "Wei Liu" <wl@xen.org>,
xen-devel@lists.xenproject.org
Subject: Re: [PATCH] x86/vmap: handle superpages in vmap_to_mfn()
Date: Wed, 02 Dec 2020 12:17:30 +0000 [thread overview]
Message-ID: <6409ea31aa21fa54ef2697b0ce959bbf7b5e2a23.camel@xen.org> (raw)
In-Reply-To: <ef127c6f-2d8a-1ddf-f8e7-7e747518c5a8@suse.com>
On Wed, 2020-12-02 at 11:04 +0100, Jan Beulich wrote:
> On 30.11.2020 17:50, Hongyan Xia wrote:
> > From: Hongyan Xia <hongyxia@amazon.com>
> >
> > There is simply no guarantee that vmap won't return superpages to
> > the
> > caller. It can happen if the list of MFNs are contiguous, or we
> > simply
> > have a large granularity. Although rare, if such things do happen,
> > we
> > will simply hit BUG_ON() and crash. Properly handle such cases in a
> > new
> > implementation.
> >
> > Note that vmap is now too large to be a macro, so implement it as a
> > normal function and move the declaration to mm.h (page.h cannot
> > handle
> > mfn_t).
>
> I'm not happy about this movement, and it's also not really clear to
> me what problem page.h would have in principle. Yes, it can't
> include xen/mm.h, but I think it is long overdue that we disentangle
> this at least some. Let me see what I can do as a prereq for this
> change, but see also below.
>
> > --- a/xen/arch/x86/mm.c
> > +++ b/xen/arch/x86/mm.c
> > @@ -5194,6 +5194,49 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long
> > v)
> > } \
> > } while ( false )
> >
> > +mfn_t vmap_to_mfn(const void *v)
> > +{
> > + bool locking = system_state > SYS_STATE_boot;
> > + unsigned int l2_offset = l2_table_offset((unsigned long)v);
> > + unsigned int l1_offset = l1_table_offset((unsigned long)v);
> > + l3_pgentry_t *pl3e = virt_to_xen_l3e((unsigned long)v);
> > + l2_pgentry_t *pl2e;
> > + l1_pgentry_t *pl1e;
>
> Can't all three of them be pointer-to-const?
They can (and yes they should).
> > + struct page_info *l3page;
> > + mfn_t ret;
> > +
> > + ASSERT(pl3e);
>
> if ( !pl3e )
> {
> ASSERT_UNREACHABLE();
> return INVALID_MFN;
> }
>
> as per the bottom of ./CODING_STYLE? (Similarly further down
> then.)
Will revise.
> > + l3page = virt_to_page(pl3e);
> > + L3T_LOCK(l3page);
> > +
> > + ASSERT(l3e_get_flags(*pl3e) & _PAGE_PRESENT);
> > + if ( l3e_get_flags(*pl3e) & _PAGE_PSE )
> > + {
> > + ret = mfn_add(l3e_get_mfn(*pl3e),
> > + (l2_offset << PAGETABLE_ORDER) + l1_offset);
> > + L3T_UNLOCK(l3page);
> > + return ret;
>
> To keep the locked region as narrow as possible
>
> mfn = l3e_get_mfn(*pl3e);
> L3T_UNLOCK(l3page);
> return mfn_add(mfn, (l2_offset << PAGETABLE_ORDER) +
> l1_offset);
>
> ? However, in particular because of the recurring unlocks on
> the exit paths I wonder whether it wouldn't be better to
> restructure the whole function such that there'll be one unlock
> and one return. Otoh I'm afraid what I'm asking for here is
> going to yield a measurable set of goto-s ...
I can do that.
But what about the lock narrowing? Will be slightly more tricky when
there is goto. Naturally:
ret = full return mfn;
goto out;
out:
UNLOCK();
return ret;
but with narrowing, my first reaction is:
ret = high bits of mfn;
l2_offset = 0;
l1_offset = 0;
goto out;
out:
UNLOCK();
return mfn + l2_offset << TABLE_ORDER + l1_offset;
To be honest, I doubt it is really worth it and I prefer the first one.
>
> > + }
> > +
> > + pl2e = map_l2t_from_l3e(*pl3e) + l2_offset;
> > + ASSERT(l2e_get_flags(*pl2e) & _PAGE_PRESENT);
> > + if ( l2e_get_flags(*pl2e) & _PAGE_PSE )
> > + {
> > + ret = mfn_add(l2e_get_mfn(*pl2e), l1_offset);
> > + L3T_UNLOCK(l3page);
> > + return ret;
> > + }
> > +
> > + pl1e = map_l1t_from_l2e(*pl2e) + l1_offset;
> > + UNMAP_DOMAIN_PAGE(pl2e);
> > + ASSERT(l1e_get_flags(*pl1e) & _PAGE_PRESENT);
> > + ret = l1e_get_mfn(*pl1e);
> > + L3T_UNLOCK(l3page);
> > + UNMAP_DOMAIN_PAGE(pl1e);
> > +
> > + return ret;
> > +}
>
> Now for the name of the function: The only aspect tying it
> somewhat to vmap() is that it assumes (asserts) it'll find a
> valid mapping. I think it wants renaming, and vmap_to_mfn()
> then would become a #define of it (perhaps even retaining
> its property of getting unsigned long passed in), at which
> point it also doesn't need moving out of page.h. As to the
> actual name, xen_map_to_mfn() to somewhat match up with
> map_pages_to_xen()?
I actually really like this idea. I will come up with something in the
next rev. But if we want to make it generic, shouldn't we not ASSERT on
pl*e and the PRESENT flag and just return INVALID_MFN? Then this
function works on both mapped address (assumption of vmap_to_mfn()) and
other use cases.
Hongyan
next prev parent reply other threads:[~2020-12-02 12:18 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-11-30 16:50 [PATCH] x86/vmap: handle superpages in vmap_to_mfn() Hongyan Xia
2020-12-02 10:04 ` Jan Beulich
2020-12-02 12:17 ` Hongyan Xia [this message]
2020-12-02 13:05 ` Jan Beulich
2020-12-03 11:21 Hongyan Xia
2020-12-03 11:27 ` Hongyan Xia
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6409ea31aa21fa54ef2697b0ce959bbf7b5e2a23.camel@xen.org \
--to=hx242@xen.org \
--cc=andrew.cooper3@citrix.com \
--cc=jbeulich@suse.com \
--cc=jgrall@amazon.com \
--cc=roger.pau@citrix.com \
--cc=wl@xen.org \
--cc=xen-devel@lists.xenproject.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).