linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alex Williamson <alex.williamson@redhat.com>
To: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Cornelia Huck <cohuck@redhat.com>,
	Jason Gunthorpe <jgg@nvidia.com>,
	Matthew Wilcox <willy@infradead.org>,
	Pavel Tatashin <pasha.tatashin@soleen.com>,
	Steven Sistare <steven.sistare@oracle.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 3/3] vfio/type1: Batch page pinning
Date: Tue, 23 Mar 2021 13:32:54 -0600	[thread overview]
Message-ID: <20210323133254.33ed9161@omen.home.shazbot.org> (raw)
In-Reply-To: <20210219161305.36522-4-daniel.m.jordan@oracle.com>

Hi Daniel,

I've found a bug in this patch that we need to fix.  The diff is a
little difficult to follow, so I'll discuss it in the resulting
function below...

(1) Imagine the user has passed a vaddr range that alternates pfnmaps
and pinned memory per page.


static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
                                  long npage, unsigned long *pfn_base,
                                  unsigned long limit, struct vfio_batch *batch)
{
        unsigned long pfn;
        struct mm_struct *mm = current->mm;
        long ret, pinned = 0, lock_acct = 0;
        bool rsvd;
        dma_addr_t iova = vaddr - dma->vaddr + dma->iova;

        /* This code path is only user initiated */
        if (!mm)
                return -ENODEV;

        if (batch->size) {
                /* Leftover pages in batch from an earlier call. */
                *pfn_base = page_to_pfn(batch->pages[batch->offset]);
                pfn = *pfn_base;
                rsvd = is_invalid_reserved_pfn(*pfn_base);

(4) We're called again and fill our local variables from the batch.  The
    batch only has one page, so we'll complete the inner loop below and refill.

(6) We're called again, batch->size is 1, but it was for a pfnmap, the pages
    array still contains the last pinned page, so we end up incorrectly using
    this pfn again for the next entry.

        } else {
                *pfn_base = 0;
        }

        while (npage) {
                if (!batch->size) {
                        /* Empty batch, so refill it. */
                        long req_pages = min_t(long, npage, batch->capacity);

                        ret = vaddr_get_pfns(mm, vaddr, req_pages, dma->prot,
                                             &pfn, batch->pages);
                        if (ret < 0)
                                goto unpin_out;

(2) Assume the 1st page is pfnmap, the 2nd is pinned memory

                        batch->size = ret;
                        batch->offset = 0;

                        if (!*pfn_base) {
                                *pfn_base = pfn;
                                rsvd = is_invalid_reserved_pfn(*pfn_base);
                        }
                }

                /*
                 * pfn is preset for the first iteration of this inner loop and
                 * updated at the end to handle a VM_PFNMAP pfn.  In that case,
                 * batch->pages isn't valid (there's no struct page), so allow
                 * batch->pages to be touched only when there's more than one
                 * pfn to check, which guarantees the pfns are from a
                 * !VM_PFNMAP vma.
                 */
                while (true) {
                        if (pfn != *pfn_base + pinned ||
                            rsvd != is_invalid_reserved_pfn(pfn))
                                goto out;

(3) On the 2nd page, both tests are probably true here, so we take this goto,
    leaving the batch with the next page.

(5) Now we've refilled batch, but the next page is pfnmap, so likely both of the
    above tests are true... but this is a pfnmap'ing!

(7) Do we add something like if (batch->size == 1 && !batch->offset) {
    put_pfn(pfn, dma->prot); batch->size = 0; }?

                        /*
                         * Reserved pages aren't counted against the user,
                         * externally pinned pages are already counted against
                         * the user.
                         */
                        if (!rsvd && !vfio_find_vpfn(dma, iova)) {
                                if (!dma->lock_cap &&
                                    mm->locked_vm + lock_acct + 1 > limit) {
                                        pr_warn("%s: RLIMIT_MEMLOCK (%ld) exceeded\n",
                                                __func__, limit << PAGE_SHIFT);
                                        ret = -ENOMEM;
                                        goto unpin_out;
                                }
                                lock_acct++;
                        }

                        pinned++;
                        npage--;
                        vaddr += PAGE_SIZE;
                        iova += PAGE_SIZE;
                        batch->offset++;
                        batch->size--;

                        if (!batch->size)
                                break;

                        pfn = page_to_pfn(batch->pages[batch->offset]);
                }

                if (unlikely(disable_hugepages))
                        break;
        }

out:
        ret = vfio_lock_acct(dma, lock_acct, false);

unpin_out:
        if (ret < 0) {
                if (pinned && !rsvd) {
                        for (pfn = *pfn_base ; pinned ; pfn++, pinned--)
                                put_pfn(pfn, dma->prot);
                }
                vfio_batch_unpin(batch, dma);

(8) These calls to batch_unpin are rather precarious as well, any time batch->size is
    non-zero, we risk using the pages array for a pfnmap.  We might actually want the
    above change in (7) moved into this exit path so that we can never return a potential
    pfnmap batch.

                return ret;
        }

        return pinned;
}

This is a regression that not only causes incorrect mapping for the
user, but also allows the user to trigger bad page counts, so we need
a fix for v5.12.  Thanks,

Alex


  reply	other threads:[~2021-03-23 19:33 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-19 16:13 [PATCH v2 0/3] vfio/type1: Batch page pinning Daniel Jordan
2021-02-19 16:13 ` [PATCH v2 1/3] vfio/type1: Change success value of vaddr_get_pfn() Daniel Jordan
2021-02-19 16:13 ` [PATCH v2 2/3] vfio/type1: Prepare for batched pinning with struct vfio_batch Daniel Jordan
2021-02-19 16:13 ` [PATCH v2 3/3] vfio/type1: Batch page pinning Daniel Jordan
2021-03-23 19:32   ` Alex Williamson [this message]
2021-03-23 22:25     ` Daniel Jordan
2021-03-23 23:09       ` Alex Williamson
2021-02-23 18:24 ` [PATCH v2 0/3] " Alex Williamson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210323133254.33ed9161@omen.home.shazbot.org \
    --to=alex.williamson@redhat.com \
    --cc=cohuck@redhat.com \
    --cc=daniel.m.jordan@oracle.com \
    --cc=jgg@nvidia.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pasha.tatashin@soleen.com \
    --cc=steven.sistare@oracle.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).