From: David Gibson <david@gibson.dropbear.id.au>
To: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: linuxppc-dev@lists.ozlabs.org, Paul Mackerras <paulus@samba.org>,
	Alexander Graf <agraf@suse.com>,
	kvm-ppc@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH kernel 5/9] KVM: PPC: Account TCE-containing pages in locked_vm
Date: Tue, 8 Dec 2015 16:18:38 +1100
Message-ID: <20151208051838.GQ20139@voom.fritz.box>
In-Reply-To: <1442314179-9706-6-git-send-email-aik@ozlabs.ru>

On Tue, Sep 15, 2015 at 08:49:35PM +1000, Alexey Kardashevskiy wrote:
> At the moment pages used for TCE tables (in addition to pages addressed
> by TCEs) are not counted in locked_vm counter so a malicious userspace
> tool can call ioctl(KVM_CREATE_SPAPR_TCE) as many times as RLIMIT_NOFILE and
> lock a lot of memory.
> 
> This adds counting for pages used for TCE tables.
> 
> This counts the number of pages required for a table plus pages for
> the kvmppc_spapr_tce_table struct (TCE table descriptor) itself.

Hmm.  Does it make sense to account for the descriptor struct itself?
I mean there are lots of little structures the kernel will allocate on
a process's behalf, and I don't think most of them get accounted
against locked_vm.

> This does not change the amount of (de)allocated memory.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  arch/powerpc/kvm/book3s_64_vio.c | 51 +++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 50 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
> index 9526c34..b70787d 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -45,13 +45,56 @@ static long kvmppc_stt_npages(unsigned long window_size)
>  		     * sizeof(u64), PAGE_SIZE) / PAGE_SIZE;
>  }
>  
> +static long kvmppc_account_memlimit(long npages, bool inc)
> +{
> +	long ret = 0;
> +	const long bytes = sizeof(struct kvmppc_spapr_tce_table) +
> +			(abs(npages) * sizeof(struct page *));
> +	const long stt_pages = ALIGN(bytes, PAGE_SIZE) / PAGE_SIZE;

Overflow checks might be useful here, I'm not sure.
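
To make the concern concrete, here is a minimal userspace sketch (not
kernel code) of a checked version of the size computation.  The helper
name stt_pages_checked, the SKETCH_PAGE_SIZE constant and passing the
descriptor size in as a parameter are all assumptions for illustration;
the patch itself does this arithmetic in long with ALIGN() and PAGE_SIZE.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed page size, for illustration only. */
#define SKETCH_PAGE_SIZE 65536UL

/*
 * Hypothetical helper: number of pages needed for a descriptor of
 * desc_size bytes followed by an array of npages pointers.
 * Returns false instead of wrapping if the byte count, or the
 * round-up to a page boundary, would not fit in an unsigned long.
 */
static bool stt_pages_checked(unsigned long npages, unsigned long desc_size,
                              unsigned long *out_pages)
{
    const unsigned long slack = SKETCH_PAGE_SIZE - 1;
    unsigned long bytes;

    assert(out_pages != NULL);

    /* The descriptor plus rounding slack must fit on its own. */
    if (desc_size > ULONG_MAX - slack)
        return false;

    /* The pointer array must fit in whatever room is left. */
    if (npages > (ULONG_MAX - slack - desc_size) / sizeof(void *))
        return false;

    bytes = desc_size + npages * sizeof(void *);
    *out_pages = (bytes + slack) / SKETCH_PAGE_SIZE;
    return true;
}
```

Whether a window size that large can actually reach this function
depends on validation done by the caller, which is not shown here.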

> +
> +	if (!current || !current->mm)
> +		return ret; /* process exited */
> +
> +	npages += stt_pages;
> +
> +	down_write(&current->mm->mmap_sem);
> +
> +	if (inc) {
> +		long locked, lock_limit;
> +
> +		locked = current->mm->locked_vm + npages;
> +		lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> +		if (locked > lock_limit && !capable(CAP_IPC_LOCK))
> +			ret = -ENOMEM;
> +		else
> +			current->mm->locked_vm += npages;
> +	} else {
> +		if (npages > current->mm->locked_vm)

Should this be a WARN_ON?  It means something has gone wrong
previously in the accounting, doesn't it?
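
Roughly what I have in mind, as a userspace sketch rather than a kernel
patch: sketch_warn_on() is a hypothetical stand-in for the kernel's
WARN_ON(), which likewise evaluates to its condition, and
unaccount_pages() is a made-up name for the decrement branch above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for WARN_ON(): report the problem, return cond. */
static bool sketch_warn_on(bool cond, const char *what)
{
    if (cond)
        fprintf(stderr, "WARNING: %s\n", what);
    return cond;
}

/*
 * Hypothetical decrement path: still clamp so the counter cannot wrap,
 * but complain loudly, since releasing more pages than were accounted
 * means an earlier accounting step went wrong.
 */
static unsigned long unaccount_pages(unsigned long locked_vm,
                                     unsigned long npages)
{
    if (sketch_warn_on(npages > locked_vm, "locked_vm underflow"))
        npages = locked_vm;

    /* After clamping, the subtraction below cannot wrap. */
    assert(npages <= locked_vm);
    return locked_vm - npages;
}
```

The clamp is kept so the counter stays sane, but the mismatch is no
longer silent.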

> +			npages = current->mm->locked_vm;
> +
> +		current->mm->locked_vm -= npages;
> +	}
> +
> +	pr_debug("[%d] RLIMIT_MEMLOCK KVM %c%ld %ld/%ld%s\n", current->pid,
> +			inc ? '+' : '-',
> +			npages << PAGE_SHIFT,
> +			current->mm->locked_vm << PAGE_SHIFT,
> +			rlimit(RLIMIT_MEMLOCK),
> +			ret ? " - exceeded" : "");
> +
> +	up_write(&current->mm->mmap_sem);
> +
> +	return ret;
> +}
> +
>  static void release_spapr_tce_table(struct rcu_head *head)
>  {
>  	struct kvmppc_spapr_tce_table *stt = container_of(head,
>  			struct kvmppc_spapr_tce_table, rcu);
>  	int i;
> +	long npages = kvmppc_stt_npages(stt->window_size);
>  
> -	for (i = 0; i < kvmppc_stt_npages(stt->window_size); i++)
> +	for (i = 0; i < npages; i++)
>  		__free_page(stt->pages[i]);
>  
>  	kfree(stt);
> @@ -89,6 +132,7 @@ static int kvm_spapr_tce_release(struct inode *inode, struct file *filp)
>  
>  	kvm_put_kvm(stt->kvm);
>  
> +	kvmppc_account_memlimit(kvmppc_stt_npages(stt->window_size), false);
>  	call_rcu(&stt->rcu, release_spapr_tce_table);
>  
>  	return 0;
> @@ -114,6 +158,11 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>  	}
>  
>  	npages = kvmppc_stt_npages(args->window_size);
> +	ret = kvmppc_account_memlimit(npages, true);
> +	if (ret) {
> +		stt = NULL;
> +		goto fail;
> +	}
>  
>  	stt = kzalloc(sizeof(*stt) + npages * sizeof(struct page *),
>  		      GFP_KERNEL);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
