[2/6] macvtap: zerocopy: fix truesize underestimation

Message ID 20120416060758.14140.40252.stgit@intel-e5620-16-2.englab.nay.redhat.com
State New, archived
Series
  • [1/6] macvtap: zerocopy: fix offset calculation when building skb

Commit Message

Jason Wang April 16, 2012, 6:07 a.m. UTC
As the skb fragments are pinned/built from user pages, we should
account for the pages instead of the length in truesize.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/macvtap.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Comments

Michael S. Tsirkin April 16, 2012, 7:14 a.m. UTC | #1
On Mon, Apr 16, 2012 at 02:07:59PM +0800, Jason Wang wrote:
> As the skb fragments are pinned/built from user pages, we should
> account for the pages instead of the length in truesize.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>

I'm not sure this is right: the skb does *not* consume the
whole page, userspace uses the rest of the page
for other skbs. So we'll end up accounting for the
same page twice.
Eric, what's the right thing to do here in your opinion?
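
To put numbers on the double-accounting concern, here is a minimal user-space sketch. It is not kernel code: the 4 KiB page size and both helper names are assumptions made for illustration. It compares the total charged for several small buffers that all live in one user page under the two accounting schemes.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096u   /* assumption: 4 KiB pages */

/* Total charged when each of n_bufs buffers sharing a single page is
 * accounted as one whole page (the scheme the patch proposes). */
static size_t charged_per_page(size_t n_bufs)
{
	return n_bufs * PAGE_SIZE;
}

/* Total charged when each buffer is accounted by its length
 * (the scheme the patch replaces). */
static size_t charged_per_length(size_t n_bufs, size_t buf_len)
{
	assert(n_bufs * buf_len <= PAGE_SIZE);  /* all buffers fit in one page */
	return n_bufs * buf_len;
}
```

With four 1024-byte buffers in one page, charged_per_page(4) is 16384 bytes while charged_per_length(4, 1024) is 4096 bytes, even though only a single 4096-byte page is pinned in both cases.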

> ---
>  drivers/net/macvtap.c |    6 ++++--
>  1 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index bd4a70d..7cb2684 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -519,6 +519,7 @@ static int zerocopy_sg_from_iovec(struct sk_buff *skb, const struct iovec *from,
>  		struct page *page[MAX_SKB_FRAGS];
>  		int num_pages;
>  		unsigned long base;
> +		unsigned long truesize;
>  
>  		len = from->iov_len - offset;
>  		if (!len) {
> @@ -533,10 +534,11 @@ static int zerocopy_sg_from_iovec(struct sk_buff *skb, const struct iovec *from,
>  		    (num_pages > MAX_SKB_FRAGS - skb_shinfo(skb)->nr_frags))
>  			/* put_page is in skb free */
>  			return -EFAULT;
> +		truesize = size * PAGE_SIZE;
>  		skb->data_len += len;
>  		skb->len += len;
> -		skb->truesize += len;
> -		atomic_add(len, &skb->sk->sk_wmem_alloc);
> +		skb->truesize += truesize;
> +		atomic_add(truesize, &skb->sk->sk_wmem_alloc);
>  		while (len) {
>  			int off = base & ~PAGE_MASK;
>  			int size = min_t(int, len, PAGE_SIZE - off);
Jason Wang April 16, 2012, 8:23 a.m. UTC | #2
On 04/16/2012 03:14 PM, Michael S. Tsirkin wrote:
> On Mon, Apr 16, 2012 at 02:07:59PM +0800, Jason Wang wrote:
>> As the skb fragments are pinned/built from user pages, we should
>> account for the pages instead of the length in truesize.
>>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> I'm not sure this is right: the skb does *not* consume the
> whole page, userspace uses the rest of the page
> for other skbs. So we'll end up accounting for the
> same page twice.
> Eric, what's the right thing to do here in your opinion?

Or at the very least, we need to do this in skb_copy_ubufs(), as it
allocates whole new pages.
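
A rough user-space sketch of that point, assuming 4 KiB pages and assuming the copy places each fragment in its own freshly allocated page (the thread only says that whole new pages are allocated; the one-page-per-fragment layout and the helper names are illustrative assumptions):

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096u   /* assumption: 4 KiB pages */

/* Sum of fragment lengths: what a length-based truesize records. */
static size_t sum_lengths(const size_t *frag_len, size_t nr_frags)
{
	size_t total = 0;

	for (size_t i = 0; i < nr_frags; i++) {
		assert(frag_len[i] <= PAGE_SIZE);  /* a fragment fits in one page */
		total += frag_len[i];
	}
	return total;
}

/* Kernel memory owned after the copy, assuming one new page per fragment. */
static size_t bytes_after_copy(size_t nr_frags)
{
	return nr_frags * PAGE_SIZE;
}
```

For three fragments of 100, 200 and 300 bytes, the length-based figure is 600 bytes, while the copied skb would own 12288 bytes of kernel pages under these assumptions.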
>> ---
>>   drivers/net/macvtap.c |    6 ++++--
>>   1 files changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
>> index bd4a70d..7cb2684 100644
>> --- a/drivers/net/macvtap.c
>> +++ b/drivers/net/macvtap.c
>> @@ -519,6 +519,7 @@ static int zerocopy_sg_from_iovec(struct sk_buff *skb, const struct iovec *from,
>>   		struct page *page[MAX_SKB_FRAGS];
>>   		int num_pages;
>>   		unsigned long base;
>> +		unsigned long truesize;
>>
>>   		len = from->iov_len - offset;
>>   		if (!len) {
>> @@ -533,10 +534,11 @@ static int zerocopy_sg_from_iovec(struct sk_buff *skb, const struct iovec *from,
>>   		    (num_pages > MAX_SKB_FRAGS - skb_shinfo(skb)->nr_frags))
>>   			/* put_page is in skb free */
>>   			return -EFAULT;
>> +		truesize = size * PAGE_SIZE;
>>   		skb->data_len += len;
>>   		skb->len += len;
>> -		skb->truesize += len;
>> -		atomic_add(len, &skb->sk->sk_wmem_alloc);
>> +		skb->truesize += truesize;
>> +		atomic_add(truesize, &skb->sk->sk_wmem_alloc);
>>   		while (len) {
>>   			int off = base & ~PAGE_MASK;
>>   			int size = min_t(int, len, PAGE_SIZE - off);

Eric Dumazet April 16, 2012, 8:49 a.m. UTC | #3
On Mon, 2012-04-16 at 10:14 +0300, Michael S. Tsirkin wrote:
> On Mon, Apr 16, 2012 at 02:07:59PM +0800, Jason Wang wrote:
> > As the skb fragments are pinned/built from user pages, we should
> > account for the pages instead of the length in truesize.
> > 
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> 
> I'm not sure this is right: the skb does *not* consume the
> whole page, userspace uses the rest of the page
> for other skbs. So we'll end up accounting for the
> same page twice.
> Eric, what's the right thing to do here in your opinion?

The problem is that we don't know for sure that userspace won't free the
pages right after this syscall, so an evil application could consume
more kernel memory than the socket limit allows.
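
As a back-of-the-envelope illustration of that risk, here is a user-space sketch. It assumes 4 KiB pages, a hypothetical 64 KiB socket limit, one pinned page per fragment, and it ignores the skb's own overhead; the helper names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096u   /* assumption: 4 KiB pages */

/* How many fragments fit under `limit` if each one is charged `charge` bytes. */
static size_t frags_allowed(size_t limit, size_t charge)
{
	assert(charge > 0);
	return limit / charge;
}

/* Memory actually held by the kernel: each fragment pins one whole page. */
static size_t bytes_pinned(size_t nfrags)
{
	return nfrags * PAGE_SIZE;
}
```

With a 65536-byte limit and 1-byte fragments charged by length, 65536 fragments are admitted and they pin 268435456 bytes (256 MiB). Charged per page, only 16 fragments are admitted and the pinned memory equals the limit.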

It's the same problem with vmsplice(mem -> pipe) + splice(pipe -> socket).

When we clone an skb with frags, the resulting skb has the same truesize,
even if the pages are shared ...



Michael S. Tsirkin April 16, 2012, 1:25 p.m. UTC | #4
On Mon, Apr 16, 2012 at 10:49:53AM +0200, Eric Dumazet wrote:
> On Mon, 2012-04-16 at 10:14 +0300, Michael S. Tsirkin wrote:
> > On Mon, Apr 16, 2012 at 02:07:59PM +0800, Jason Wang wrote:
> > > As the skb fragments are pinned/built from user pages, we should
> > > account for the pages instead of the length in truesize.
> > > 
> > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > 
> > I'm not sure this is right: the skb does *not* consume the
> > whole page, userspace uses the rest of the page
> > for other skbs. So we'll end up accounting for the
> > same page twice.
> > Eric, what's the right thing to do here in your opinion?
> 
> The problem is that we don't know for sure that userspace won't free the
> pages right after this syscall, so an evil application could consume
> more kernel memory than the socket limit allows.
> 
> It's the same problem with vmsplice(mem -> pipe) + splice(pipe -> socket).
> 
> When we clone an skb with frags, the resulting skb has the same truesize,
> even if the pages are shared ...
> 

I see, thanks for the clarification.

Patch

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index bd4a70d..7cb2684 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -519,6 +519,7 @@ static int zerocopy_sg_from_iovec(struct sk_buff *skb, const struct iovec *from,
 		struct page *page[MAX_SKB_FRAGS];
 		int num_pages;
 		unsigned long base;
+		unsigned long truesize;
 
 		len = from->iov_len - offset;
 		if (!len) {
@@ -533,10 +534,11 @@ static int zerocopy_sg_from_iovec(struct sk_buff *skb, const struct iovec *from,
 		    (num_pages > MAX_SKB_FRAGS - skb_shinfo(skb)->nr_frags))
 			/* put_page is in skb free */
 			return -EFAULT;
+		truesize = size * PAGE_SIZE;
 		skb->data_len += len;
 		skb->len += len;
-		skb->truesize += len;
-		atomic_add(len, &skb->sk->sk_wmem_alloc);
+		skb->truesize += truesize;
+		atomic_add(truesize, &skb->sk->sk_wmem_alloc);
 		while (len) {
 			int off = base & ~PAGE_MASK;
 			int size = min_t(int, len, PAGE_SIZE - off);
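
The accounting change in the second hunk can be sketched in user space as follows. This assumes that `size` at the point of the new assignment holds the number of pages the user buffer spans (its computation lies outside the quoted context, so the formula below is an assumption), and it assumes 4 KiB pages. The helper names are illustrative, not kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12                            /* assumption: 4 KiB pages */
#define PAGE_SIZE  ((uintptr_t)1 << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/* Number of pages touched by the byte range [base, base + len). */
static size_t pages_spanned(uintptr_t base, size_t len)
{
	assert(len > 0);
	return ((base & ~PAGE_MASK) + len + ~PAGE_MASK) >> PAGE_SHIFT;
}

/* Charge before the patch: only the bytes the skb references. */
static size_t charge_by_length(size_t len)
{
	assert(len > 0);
	return len;
}

/* Charge after the patch: every pinned page counts in full. */
static size_t charge_by_pages(uintptr_t base, size_t len)
{
	return pages_spanned(base, len) * PAGE_SIZE;
}
```

A 32-byte buffer starting 16 bytes before a page boundary (for example at address 0x1ff0) spans two pages, so it is charged 32 bytes by length but 8192 bytes by pages.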