[V2,2/9] macvtap: zerocopy: fix truesize underestimation
diff mbox series

Message ID 20120502034144.11782.88947.stgit@amd-6168-8-1.englab.nay.redhat.com
State New, archived
Headers show
Series
  • vhost/macvtap zeropcopy fixes
Related show

Commit Message

Jason Wang May 2, 2012, 3:41 a.m. UTC
As the skb fragment were pinned/built from user pages, we should
account the page instead of length for truesize.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/macvtap.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Comments

Shirley Ma May 15, 2012, 5:26 p.m. UTC | #1
On Wed, 2012-05-02 at 11:41 +0800, Jason Wang wrote:
> As the skb fragment were pinned/built from user pages, we should
> account the page instead of length for truesize.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/net/macvtap.c |    6 ++++--
>  1 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index bd4a70d..7cb2684 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -519,6 +519,7 @@ static int zerocopy_sg_from_iovec(struct sk_buff
> *skb, const struct iovec *from,
>                 struct page *page[MAX_SKB_FRAGS];
>                 int num_pages;
>                 unsigned long base;
> +               unsigned long truesize;
> 
>                 len = from->iov_len - offset;
>                 if (!len) {
> @@ -533,10 +534,11 @@ static int zerocopy_sg_from_iovec(struct sk_buff
> *skb, const struct iovec *from,
>                     (num_pages > MAX_SKB_FRAGS -
> skb_shinfo(skb)->nr_frags))
>                         /* put_page is in skb free */
>                         return -EFAULT;
> +               truesize = size * PAGE_SIZE;

Here should be truesize = size * PAGE_SIZE - offset, right?

>                 skb->data_len += len;
>                 skb->len += len;
> -               skb->truesize += len;
> -               atomic_add(len, &skb->sk->sk_wmem_alloc);
> +               skb->truesize += truesize;
> +               atomic_add(truesize, &skb->sk->sk_wmem_alloc);
>                 while (len) {
>                         int off = base & ~PAGE_MASK;
>                         int size = min_t(int, len, PAGE_SIZE - off);
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Jason Wang May 16, 2012, 3:04 a.m. UTC | #2
On 05/16/2012 01:26 AM, Shirley Ma wrote:
> On Wed, 2012-05-02 at 11:41 +0800, Jason Wang wrote:
>> As the skb fragment were pinned/built from user pages, we should
>> account the page instead of length for truesize.
>>
>> Signed-off-by: Jason Wang<jasowang@redhat.com>
>> ---
>>   drivers/net/macvtap.c |    6 ++++--
>>   1 files changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
>> index bd4a70d..7cb2684 100644
>> --- a/drivers/net/macvtap.c
>> +++ b/drivers/net/macvtap.c
>> @@ -519,6 +519,7 @@ static int zerocopy_sg_from_iovec(struct sk_buff
>> *skb, const struct iovec *from,
>>                  struct page *page[MAX_SKB_FRAGS];
>>                  int num_pages;
>>                  unsigned long base;
>> +               unsigned long truesize;
>>
>>                  len = from->iov_len - offset;
>>                  if (!len) {
>> @@ -533,10 +534,11 @@ static int zerocopy_sg_from_iovec(struct sk_buff
>> *skb, const struct iovec *from,
>>                      (num_pages>  MAX_SKB_FRAGS -
>> skb_shinfo(skb)->nr_frags))
>>                          /* put_page is in skb free */
>>                          return -EFAULT;
>> +               truesize = size * PAGE_SIZE;
> Here should be truesize = size * PAGE_SIZE - offset, right?
>

We get the whole user page, so need to account them all. Also this is 
aligned with skb_copy_ubufs().
>>                  skb->data_len += len;
>>                  skb->len += len;
>> -               skb->truesize += len;
>> -               atomic_add(len,&skb->sk->sk_wmem_alloc);
>> +               skb->truesize += truesize;
>> +               atomic_add(truesize,&skb->sk->sk_wmem_alloc);
>>                  while (len) {
>>                          int off = base&  ~PAGE_MASK;
>>                          int size = min_t(int, len, PAGE_SIZE - off);
>>
>>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Shirley Ma May 16, 2012, 3:03 p.m. UTC | #3
On Wed, 2012-05-16 at 11:04 +0800, Jason Wang wrote:
> >> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> >> index bd4a70d..7cb2684 100644
> >> --- a/drivers/net/macvtap.c
> >> +++ b/drivers/net/macvtap.c
> >> @@ -519,6 +519,7 @@ static int zerocopy_sg_from_iovec(struct
> sk_buff
> >> *skb, const struct iovec *from,
> >>                  struct page *page[MAX_SKB_FRAGS];
> >>                  int num_pages;
> >>                  unsigned long base;
> >> +               unsigned long truesize;
> >>
> >>                  len = from->iov_len - offset;
> >>                  if (!len) {
> >> @@ -533,10 +534,11 @@ static int zerocopy_sg_from_iovec(struct
> sk_buff
> >> *skb, const struct iovec *from,
> >>                      (num_pages>  MAX_SKB_FRAGS -
> >> skb_shinfo(skb)->nr_frags))
> >>                          /* put_page is in skb free */
> >>                          return -EFAULT;
> >> +               truesize = size * PAGE_SIZE;
> > Here should be truesize = size * PAGE_SIZE - offset, right?
> >
> 
> We get the whole user page, so need to account them all. Also this is 
> aligned with skb_copy_ubufs(). 

Then this would double count the size of "first" offset left from
previous copy, both skb->len and truesize.

Thanks
Shirley

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Jason Wang May 17, 2012, 2:59 a.m. UTC | #4
On 05/16/2012 11:03 PM, Shirley Ma wrote:
> On Wed, 2012-05-16 at 11:04 +0800, Jason Wang wrote:
>>>> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
>>>> index bd4a70d..7cb2684 100644
>>>> --- a/drivers/net/macvtap.c
>>>> +++ b/drivers/net/macvtap.c
>>>> @@ -519,6 +519,7 @@ static int zerocopy_sg_from_iovec(struct
>> sk_buff
>>>> *skb, const struct iovec *from,
>>>>                   struct page *page[MAX_SKB_FRAGS];
>>>>                   int num_pages;
>>>>                   unsigned long base;
>>>> +               unsigned long truesize;
>>>>
>>>>                   len = from->iov_len - offset;
>>>>                   if (!len) {
>>>> @@ -533,10 +534,11 @@ static int zerocopy_sg_from_iovec(struct
>> sk_buff
>>>> *skb, const struct iovec *from,
>>>>                       (num_pages>   MAX_SKB_FRAGS -
>>>> skb_shinfo(skb)->nr_frags))
>>>>                           /* put_page is in skb free */
>>>>                           return -EFAULT;
>>>> +               truesize = size * PAGE_SIZE;
>>> Here should be truesize = size * PAGE_SIZE - offset, right?
>>>
>> We get the whole user page, so need to account them all. Also this is
>> aligned with skb_copy_ubufs().
> Then this would double count the size of "first" offset left from
> previous copy, both skb->len and truesize.
>
> Thanks
> Shirley
>

Didn't see how this affact skb->len. And for truesize, I think they are 
different, when the offset were not zero, the data in this vector were 
divided into two parts. First part is copied into skb directly, and the 
second were pinned from a whole userspace page by get_user_pages_fast(), 
so we need count the whole page to the socket limit to prevent evil 
application.

Thanks
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Shirley Ma May 17, 2012, 3:28 p.m. UTC | #5
On Thu, 2012-05-17 at 10:59 +0800, Jason Wang wrote:
> Didn't see how this affact skb->len. And for truesize, I think they
> are 
> different, when the offset were not zero, the data in this vector
> were 
> divided into two parts. First part is copied into skb directly, and
> the 
> second were pinned from a whole userspace page by
> get_user_pages_fast(), 
> so we need count the whole page to the socket limit to prevent evil 
> application. 

What I meant that the code for skb->truesize has double added the first
offset if any left from that vector (partically copied into skb
directly, and then count pagesize which includes the offset (truesize +=
PAGE_SIZE)).

Thanks
Shirley

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Jason Wang May 18, 2012, 10:10 a.m. UTC | #6
On 05/17/2012 11:28 PM, Shirley Ma wrote:
> On Thu, 2012-05-17 at 10:59 +0800, Jason Wang wrote:
>> Didn't see how this affact skb->len. And for truesize, I think they
>> are
>> different, when the offset were not zero, the data in this vector
>> were
>> divided into two parts. First part is copied into skb directly, and
>> the
>> second were pinned from a whole userspace page by
>> get_user_pages_fast(),
>> so we need count the whole page to the socket limit to prevent evil
>> application.
> What I meant that the code for skb->truesize has double added the first
> offset if any left from that vector (partically copied into skb
> directly, and then count pagesize which includes the offset (truesize +=
> PAGE_SIZE)).

Yes, I get you mean. There's no difference between first frag and 
others: it's also possible for other frags that didn't occupy the whole 
page. Since we pin the whole user page, better to count the whole page 
size to prevent evil application.
> Thanks
> Shirley
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Shirley Ma May 18, 2012, 3:22 p.m. UTC | #7
On Fri, 2012-05-18 at 18:10 +0800, Jason Wang wrote:
> > On Thu, 2012-05-17 at 10:59 +0800, Jason Wang wrote:
> >> Didn't see how this affact skb->len. And for truesize, I think they
> >> are
> >> different, when the offset were not zero, the data in this vector
> >> were
> >> divided into two parts. First part is copied into skb directly, and
> >> the
> >> second were pinned from a whole userspace page by
> >> get_user_pages_fast(),
> >> so we need count the whole page to the socket limit to prevent evil
> >> application.
> > What I meant that the code for skb->truesize has double added the
> first
> > offset if any left from that vector (partically copied into skb
> > directly, and then count pagesize which includes the offset
> (truesize +=
> > PAGE_SIZE)).
> 
> Yes, I get you mean. There's no difference between first frag and 
> others: it's also possible for other frags that didn't occupy the
> whole 
> page. Since we pin the whole user page, better to count the whole
> page 
> size to prevent evil application. 

The difference between first frags and others is other frags might not
occupy the whole page, but the first frags extra offset was doubled
added in skb truesize.

So it's ok for skb->truesize to be bigger than all the skb pinned pages
here?

Thanks
Shirley

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Jason Wang May 21, 2012, 6:15 a.m. UTC | #8
On 05/18/2012 11:22 PM, Shirley Ma wrote:
> On Fri, 2012-05-18 at 18:10 +0800, Jason Wang wrote:
>>> On Thu, 2012-05-17 at 10:59 +0800, Jason Wang wrote:
>>>> Didn't see how this affact skb->len. And for truesize, I think they
>>>> are
>>>> different, when the offset were not zero, the data in this vector
>>>> were
>>>> divided into two parts. First part is copied into skb directly, and
>>>> the
>>>> second were pinned from a whole userspace page by
>>>> get_user_pages_fast(),
>>>> so we need count the whole page to the socket limit to prevent evil
>>>> application.
>>> What I meant that the code for skb->truesize has double added the
>> first
>>> offset if any left from that vector (partically copied into skb
>>> directly, and then count pagesize which includes the offset
>> (truesize +=
>>> PAGE_SIZE)).
>> Yes, I get you mean. There's no difference between first frag and
>> others: it's also possible for other frags that didn't occupy the
>> whole
>> page. Since we pin the whole user page, better to count the whole
>> page
>> size to prevent evil application.
> The difference between first frags and others is other frags might not
> occupy the whole page, but the first frags extra offset was doubled
> added in skb truesize.
>
> So it's ok for skb->truesize to be bigger than all the skb pinned pages
> here?

I think it's ok here and we could find other example such as virtio_net 
driver.
>
> Thanks
> Shirley
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Patch
diff mbox series

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index bd4a70d..7cb2684 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -519,6 +519,7 @@  static int zerocopy_sg_from_iovec(struct sk_buff *skb, const struct iovec *from,
 		struct page *page[MAX_SKB_FRAGS];
 		int num_pages;
 		unsigned long base;
+		unsigned long truesize;
 
 		len = from->iov_len - offset;
 		if (!len) {
@@ -533,10 +534,11 @@  static int zerocopy_sg_from_iovec(struct sk_buff *skb, const struct iovec *from,
 		    (num_pages > MAX_SKB_FRAGS - skb_shinfo(skb)->nr_frags))
 			/* put_page is in skb free */
 			return -EFAULT;
+		truesize = size * PAGE_SIZE;
 		skb->data_len += len;
 		skb->len += len;
-		skb->truesize += len;
-		atomic_add(len, &skb->sk->sk_wmem_alloc);
+		skb->truesize += truesize;
+		atomic_add(truesize, &skb->sk->sk_wmem_alloc);
 		while (len) {
 			int off = base & ~PAGE_MASK;
 			int size = min_t(int, len, PAGE_SIZE - off);