Re: [PATCH] x86/vmap: handle superpages in vmap_to_mfn()

From: Jan Beulich <jbeulich@suse.com>
To: Hongyan Xia <hx242@xen.org>
Cc: jgrall@amazon.com, "Andrew Cooper" <andrew.cooper3@citrix.com>,
	"Roger Pau Monné" <roger.pau@citrix.com>, "Wei Liu" <wl@xen.org>,
	xen-devel@lists.xenproject.org
Subject: Re: [PATCH] x86/vmap: handle superpages in vmap_to_mfn()
Date: Wed, 2 Dec 2020 11:04:45 +0100	[thread overview]
Message-ID: <ef127c6f-2d8a-1ddf-f8e7-7e747518c5a8@suse.com> (raw)
In-Reply-To: <34de4c4326673c60d3e2cbd3bbcbcca481906524.1606755042.git.hongyxia@amazon.com>

On 30.11.2020 17:50, Hongyan Xia wrote:
> From: Hongyan Xia <hongyxia@amazon.com>
> 
> There is simply no guarantee that vmap won't return superpages to the
> caller. It can happen if the list of MFNs are contiguous, or we simply
> have a large granularity. Although rare, if such things do happen, we
> will simply hit BUG_ON() and crash. Properly handle such cases in a new
> implementation.
> 
> Note that vmap is now too large to be a macro, so implement it as a
> normal function and move the declaration to mm.h (page.h cannot handle
> mfn_t).

I'm not happy about this movement, and it's also not really clear to
me what problem page.h would have in principle. Yes, it can't
include xen/mm.h, but I think it is long overdue that we disentangle
this at least some. Let me see what I can do as a prereq for this
change, but see also below.

> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -5194,6 +5194,49 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
>          }                                          \
>      } while ( false )
>  
> +mfn_t vmap_to_mfn(const void *v)
> +{
> +    bool locking = system_state > SYS_STATE_boot;
> +    unsigned int l2_offset = l2_table_offset((unsigned long)v);
> +    unsigned int l1_offset = l1_table_offset((unsigned long)v);
> +    l3_pgentry_t *pl3e = virt_to_xen_l3e((unsigned long)v);
> +    l2_pgentry_t *pl2e;
> +    l1_pgentry_t *pl1e;

Can't all three of them be pointer-to-const?

> +    struct page_info *l3page;
> +    mfn_t ret;
> +
> +    ASSERT(pl3e);

    if ( !pl3e )
    {
        ASSERT_UNREACHABLE();
        return INVALID_MFN;
    }

as per the bottom of ./CODING_STYLE? (Similarly further down
then.)

> +    l3page = virt_to_page(pl3e);
> +    L3T_LOCK(l3page);
> +
> +    ASSERT(l3e_get_flags(*pl3e) & _PAGE_PRESENT);
> +    if ( l3e_get_flags(*pl3e) & _PAGE_PSE )
> +    {
> +        ret = mfn_add(l3e_get_mfn(*pl3e),
> +                      (l2_offset << PAGETABLE_ORDER) + l1_offset);
> +        L3T_UNLOCK(l3page);
> +        return ret;

To keep the locked region as narrow as possible

        mfn = l3e_get_mfn(*pl3e);
        L3T_UNLOCK(l3page);
        return mfn_add(mfn, (l2_offset << PAGETABLE_ORDER) + l1_offset);

? However, in particular because of the recurring unlocks on
the exit paths I wonder whether it wouldn't be better to
restructure the whole function such that there'll be one unlock
and one return. Otoh I'm afraid what I'm asking for here is
going to yield a measurable set of goto-s ...

> +    }
> +
> +    pl2e = map_l2t_from_l3e(*pl3e) + l2_offset;
> +    ASSERT(l2e_get_flags(*pl2e) & _PAGE_PRESENT);
> +    if ( l2e_get_flags(*pl2e) & _PAGE_PSE )
> +    {
> +        ret = mfn_add(l2e_get_mfn(*pl2e), l1_offset);
> +        L3T_UNLOCK(l3page);
> +        return ret;
> +    }
> +
> +    pl1e = map_l1t_from_l2e(*pl2e) + l1_offset;
> +    UNMAP_DOMAIN_PAGE(pl2e);
> +    ASSERT(l1e_get_flags(*pl1e) & _PAGE_PRESENT);
> +    ret = l1e_get_mfn(*pl1e);
> +    L3T_UNLOCK(l3page);
> +    UNMAP_DOMAIN_PAGE(pl1e);
> +
> +    return ret;
> +}

Now for the name of the function: The only aspect tying it
somewhat to vmap() is that it assumes (asserts) it'll find a
valid mapping. I think it wants renaming, and vmap_to_mfn()
then would become a #define of it (perhaps even retaining
its property of getting unsigned long passed in), at which
point it also doesn't need moving out of page.h. As to the
actual name, xen_map_to_mfn() to somewhat match up with
map_pages_to_xen()?

Jan