linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Alex Williamson <alex.williamson@redhat.com>
To: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>,
	akpm@linux-foundation.org, dave@stgolabs.net, jack@suse.cz,
	cl@linux.com, linux-mm@kvack.org, kvm@vger.kernel.org,
	kvm-ppc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	linux-fpga@vger.kernel.org, linux-kernel@vger.kernel.org,
	paulus@ozlabs.org, benh@kernel.crashing.org, mpe@ellerman.id.au,
	hao.wu@intel.com, atull@kernel.org, mdf@kernel.org,
	aik@ozlabs.ru, peterz@infradead.org
Subject: Re: [PATCH 1/5] vfio/type1: use pinned_vm instead of locked_vm to account pinned pages
Date: Wed, 13 Feb 2019 13:03:30 -0700	[thread overview]
Message-ID: <20190213130330.76ef1987@w520.home> (raw)
In-Reply-To: <20190213002650.kav7xc4r2xs5f3ef@ca-dmjordan1.us.oracle.com>

On Tue, 12 Feb 2019 19:26:50 -0500
Daniel Jordan <daniel.m.jordan@oracle.com> wrote:

> On Tue, Feb 12, 2019 at 11:41:10AM -0700, Alex Williamson wrote:
> > Daniel Jordan <daniel.m.jordan@oracle.com> wrote:  
> > > On Mon, Feb 11, 2019 at 03:56:20PM -0700, Jason Gunthorpe wrote:  
> > > > I haven't looked at this super closely, but how does this stuff work?
> > > > 
> > > > do_mlock doesn't touch pinned_vm, and this doesn't touch locked_vm...
> > > > 
> > > > Shouldn't all this be 'if (locked_vm + pinned_vm < RLIMIT_MEMLOCK)' ?
> > > >
> > > > Otherwise MEMLOCK is really doubled..    
> > > 
> > > So this has been a problem for some time, but it's not as easy as adding them
> > > together, see [1][2] for a start.
> > > 
> > > The locked_vm/pinned_vm issue definitely needs fixing, but all this series is
> > > trying to do is account to the right counter.  
> 
> Thanks for taking a look, Alex.
> 
> > This still makes me nervous because we have userspace dependencies on
> > setting process locked memory.  
> 
> Could you please expand on this?  Trying to get more context.

VFIO is a userspace driver interface and the pinned/locked page
accounting we're doing here is trying to prevent a user from exceeding
their locked memory limits.  Thus a VM management tool or unprivileged
userspace driver needs to have appropriate locked memory limits
configured for their use case.  Currently we do not have a unified
accounting scheme, so if a page is mlock'd by the user and also mapped
through VFIO for DMA, it's accounted twice, these both increment
locked_vm and userspace needs to manage that.  If pinned memory
and locked memory are now two separate buckets and we're only comparing
one of them against the locked memory limit, then it seems we have
effectively doubled the user's locked memory for this use case, as
Jason questioned.  The user could mlock one page and DMA map another,
they're both "locked", but now they only take one slot in each bucket.

If we continue forward with using a separate bucket here, userspace
could infer that accounting is unified and lower the user's locked
memory limit, or exploit the gap that their effective limit might
actually exceed system memory.  In the former case, if we do eventually
correct to compare the total of the combined buckets against the user's
locked memory limits, we'll break users that have adapted their locked
memory limits to meet the apparent needs.  In the latter case, the
inconsistent accounting is potentially an attack vector.

> > There's a user visible difference if we
> > account for them in the same bucket vs separate.  Perhaps we're
> > counting in the wrong bucket now, but if we "fix" that and userspace
> > adapts, how do we ever go back to accounting both mlocked and pinned
> > memory combined against rlimit?  Thanks,  
> 
> PeterZ posted an RFC that addresses this point[1].  It kept pinned_vm and
> locked_vm accounting separate, but allowed the two to be added safely to be
> compared against RLIMIT_MEMLOCK.

Unless I'm incorrect in the concerns above, I don't see how we can
convert vfio before this occurs.
 
> Anyway, until some solution is agreed on, are there objections to converting
> locked_vm to an atomic, to avoid user-visible changes, instead of switching
> locked_vm users to pinned_vm?

Seems that as long as we have separate buckets that are compared
individually to rlimit that we've got problems, it's just a matter of
where they're exposed based on which bucket is used for which
interface.  Thanks,

Alex


  reply	other threads:[~2019-02-13 20:03 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-11 22:44 [PATCH 0/5] " Daniel Jordan
2019-02-11 22:44 ` [PATCH 1/5] vfio/type1: " Daniel Jordan
2019-02-11 22:56   ` Jason Gunthorpe
2019-02-11 23:11     ` Daniel Jordan
2019-02-12 18:41       ` Alex Williamson
2019-02-13  0:26         ` Daniel Jordan
2019-02-13 20:03           ` Alex Williamson [this message]
2019-02-13 23:07             ` Jason Gunthorpe
2019-02-14  1:46             ` Daniel Jordan
2019-02-11 22:44 ` [PATCH 2/5] vfio/spapr_tce: " Daniel Jordan
2019-02-12  6:56   ` Alexey Kardashevskiy
2019-02-12 16:50     ` Christopher Lameter
2019-02-12 17:18       ` Daniel Jordan
2019-02-13  0:37         ` Alexey Kardashevskiy
2019-02-12 18:56     ` Alex Williamson
2019-02-13  0:34       ` Alexey Kardashevskiy
2019-02-11 22:44 ` [PATCH 3/5] fpga/dlf/afu: " Daniel Jordan
2019-02-11 22:44 ` [PATCH 4/5] powerpc/mmu: " Daniel Jordan
2019-02-13  1:14   ` kbuild test robot
2019-02-11 22:44 ` [PATCH 5/5] kvm/book3s: " Daniel Jordan
2019-02-13  1:43   ` kbuild test robot
2019-02-11 22:54 ` [PATCH 0/5] " Jason Gunthorpe
2019-02-11 23:15   ` Daniel Jordan
2019-02-14  1:53   ` Ira Weiny
2019-02-14  6:00     ` Jason Gunthorpe
2019-02-14 19:33       ` Ira Weiny
2019-02-14 20:12         ` Jason Gunthorpe
2019-02-14 21:46           ` Ira Weiny
2019-02-14 22:16             ` Jason Gunthorpe
2019-02-15 15:26               ` Christopher Lameter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190213130330.76ef1987@w520.home \
    --to=alex.williamson@redhat.com \
    --cc=aik@ozlabs.ru \
    --cc=akpm@linux-foundation.org \
    --cc=atull@kernel.org \
    --cc=benh@kernel.crashing.org \
    --cc=cl@linux.com \
    --cc=daniel.m.jordan@oracle.com \
    --cc=dave@stgolabs.net \
    --cc=hao.wu@intel.com \
    --cc=jack@suse.cz \
    --cc=jgg@ziepe.ca \
    --cc=kvm-ppc@vger.kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-fpga@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mdf@kernel.org \
    --cc=mpe@ellerman.id.au \
    --cc=paulus@ozlabs.org \
    --cc=peterz@infradead.org \
    --subject='Re: [PATCH 1/5] vfio/type1: use pinned_vm instead of locked_vm to account pinned pages' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).