All of lore.kernel.org
 help / color / mirror / Atom feed
* ICMP packets - ll_temac with Microblaze
@ 2011-12-21 10:11 Michal Simek
  2011-12-21 10:28 ` Eric Dumazet
  2011-12-22 10:32 ` Michael Wang
  0 siblings, 2 replies; 36+ messages in thread
From: Michal Simek @ 2011-12-21 10:11 UTC (permalink / raw)
  To: Eric Dumazet, David Miller, John Williams; +Cc: netdev

Hi Eric and David,

I have found one problem with ll_temac driver and
this commit: net: more accurate skb truesize
sha1: 87fb4b7b533073eeeaed0b6bf7c2328995f6c075

The problem is only with icmp packets from the target. It is sent and driver receive it
but it is not proceed to the application.

The problem I see is that kmalloc_node_track_caller allocate
specific size and then this size is changed by SKB_WITH_OVERHEAD(ksize(data)).
The problem is with netdev_alloc_skb_ip_align which calls __alloc_skb function.

Currently driver uses setting for jumbo frames (9k). When I change it to use mtu (1,5k) then
everything is ok.

Can you give me some hints what can be wrong?

Thanks,
Michal


-- 
Michal Simek, Ing. (M.Eng)
w: www.monstr.eu p: +42-0-721842854
Maintainer of Linux kernel 2.6 Microblaze Linux - http://www.monstr.eu/fdt/
Microblaze U-BOOT custodian

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: ICMP packets - ll_temac with Microblaze
  2011-12-21 10:11 ICMP packets - ll_temac with Microblaze Michal Simek
@ 2011-12-21 10:28 ` Eric Dumazet
  2011-12-21 10:30   ` Michal Simek
  2011-12-21 10:30   ` Eric Dumazet
  2011-12-22 10:32 ` Michael Wang
  1 sibling, 2 replies; 36+ messages in thread
From: Eric Dumazet @ 2011-12-21 10:28 UTC (permalink / raw)
  To: monstr; +Cc: David Miller, John Williams, netdev

Le mercredi 21 décembre 2011 à 11:11 +0100, Michal Simek a écrit :
> Hi Eric and David,
> 
> I have found one problem with ll_temac driver and
> this commit: net: more accurate skb truesize
> sha1: 87fb4b7b533073eeeaed0b6bf7c2328995f6c075
> 
> The problem is only with icmp packets from the target. It is sent and driver receive it
> but it is not proceed to the application.
> 
> The problem I see is that kmalloc_node_track_caller allocate
> specific size and then this size is changed by SKB_WITH_OVERHEAD(ksize(data)).
> The problem is with netdev_alloc_skb_ip_align which calls __alloc_skb function.
> 
> Currently driver uses setting for jumbo frames (9k). When I change it to use mtu (1,5k) then
> everything is ok.
> 
> Can you give me some hints what can be wrong?
> 

Is it with SLUB, SLAB or SLOB allocator ?

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: ICMP packets - ll_temac with Microblaze
  2011-12-21 10:28 ` Eric Dumazet
@ 2011-12-21 10:30   ` Michal Simek
  2011-12-21 10:30   ` Eric Dumazet
  1 sibling, 0 replies; 36+ messages in thread
From: Michal Simek @ 2011-12-21 10:30 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, John Williams, netdev

Eric Dumazet wrote:
> Le mercredi 21 décembre 2011 à 11:11 +0100, Michal Simek a écrit :
>> Hi Eric and David,
>>
>> I have found one problem with ll_temac driver and
>> this commit: net: more accurate skb truesize
>> sha1: 87fb4b7b533073eeeaed0b6bf7c2328995f6c075
>>
>> The problem is only with icmp packets from the target. It is sent and driver receive it
>> but it is not proceed to the application.
>>
>> The problem I see is that kmalloc_node_track_caller allocate
>> specific size and then this size is changed by SKB_WITH_OVERHEAD(ksize(data)).
>> The problem is with netdev_alloc_skb_ip_align which calls __alloc_skb function.
>>
>> Currently driver uses setting for jumbo frames (9k). When I change it to use mtu (1,5k) then
>> everything is ok.
>>
>> Can you give me some hints what can be wrong?
>>
> 
> Is it with SLUB, SLAB or SLOB allocator ?

SLAB.

CONFIG_SLAB=y
# CONFIG_SLUB is not set
# CONFIG_SLOB is not set

Michal


-- 
Michal Simek, Ing. (M.Eng)
w: www.monstr.eu p: +42-0-721842854
Maintainer of Linux kernel 2.6 Microblaze Linux - http://www.monstr.eu/fdt/
Microblaze U-BOOT custodian

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: ICMP packets - ll_temac with Microblaze
  2011-12-21 10:28 ` Eric Dumazet
  2011-12-21 10:30   ` Michal Simek
@ 2011-12-21 10:30   ` Eric Dumazet
  2011-12-21 10:32     ` Michal Simek
  1 sibling, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2011-12-21 10:30 UTC (permalink / raw)
  To: monstr; +Cc: David Miller, John Williams, netdev

Le mercredi 21 décembre 2011 à 11:28 +0100, Eric Dumazet a écrit :
> Le mercredi 21 décembre 2011 à 11:11 +0100, Michal Simek a écrit :
> > Hi Eric and David,
> > 
> > I have found one problem with ll_temac driver and
> > this commit: net: more accurate skb truesize
> > sha1: 87fb4b7b533073eeeaed0b6bf7c2328995f6c075
> > 
> > The problem is only with icmp packets from the target. It is sent and driver receive it
> > but it is not proceed to the application.
> > 
> > The problem I see is that kmalloc_node_track_caller allocate
> > specific size and then this size is changed by SKB_WITH_OVERHEAD(ksize(data)).
> > The problem is with netdev_alloc_skb_ip_align which calls __alloc_skb function.
> > 
> > Currently driver uses setting for jumbo frames (9k). When I change it to use mtu (1,5k) then
> > everything is ok.
> > 
> > Can you give me some hints what can be wrong?
> > 
> 
> Is it with SLUB, SLAB or SLOB allocator ?
> 
> 

(I am referring to commit bc417e30f8d
(net: Add back alignment for size for __alloc_skb)

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: ICMP packets - ll_temac with Microblaze
  2011-12-21 10:30   ` Eric Dumazet
@ 2011-12-21 10:32     ` Michal Simek
  2011-12-21 10:37       ` Eric Dumazet
  0 siblings, 1 reply; 36+ messages in thread
From: Michal Simek @ 2011-12-21 10:32 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, John Williams, netdev

Eric Dumazet wrote:
> Le mercredi 21 décembre 2011 à 11:28 +0100, Eric Dumazet a écrit :
>> Le mercredi 21 décembre 2011 à 11:11 +0100, Michal Simek a écrit :
>>> Hi Eric and David,
>>>
>>> I have found one problem with ll_temac driver and
>>> this commit: net: more accurate skb truesize
>>> sha1: 87fb4b7b533073eeeaed0b6bf7c2328995f6c075
>>>
>>> The problem is only with icmp packets from the target. It is sent and driver receive it
>>> but it is not proceed to the application.
>>>
>>> The problem I see is that kmalloc_node_track_caller allocate
>>> specific size and then this size is changed by SKB_WITH_OVERHEAD(ksize(data)).
>>> The problem is with netdev_alloc_skb_ip_align which calls __alloc_skb function.
>>>
>>> Currently driver uses setting for jumbo frames (9k). When I change it to use mtu (1,5k) then
>>> everything is ok.
>>>
>>> Can you give me some hints what can be wrong?
>>>
>> Is it with SLUB, SLAB or SLOB allocator ?
>>
>>
> 
> (I am referring to commit bc417e30f8d
> (net: Add back alignment for size for __alloc_skb)

I have seen that commit. Using the latest&greatest version with this patch.

Michal


-- 
Michal Simek, Ing. (M.Eng)
w: www.monstr.eu p: +42-0-721842854
Maintainer of Linux kernel 2.6 Microblaze Linux - http://www.monstr.eu/fdt/
Microblaze U-BOOT custodian

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: ICMP packets - ll_temac with Microblaze
  2011-12-21 10:32     ` Michal Simek
@ 2011-12-21 10:37       ` Eric Dumazet
  2011-12-21 10:41         ` Michal Simek
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2011-12-21 10:37 UTC (permalink / raw)
  To: monstr; +Cc: David Miller, John Williams, netdev

Le mercredi 21 décembre 2011 à 11:32 +0100, Michal Simek a écrit :
> Eric Dumazet wrote:
> > Le mercredi 21 décembre 2011 à 11:28 +0100, Eric Dumazet a écrit :
> >> Le mercredi 21 décembre 2011 à 11:11 +0100, Michal Simek a écrit :
> >>> Hi Eric and David,
> >>>
> >>> I have found one problem with ll_temac driver and
> >>> this commit: net: more accurate skb truesize
> >>> sha1: 87fb4b7b533073eeeaed0b6bf7c2328995f6c075
> >>>
> >>> The problem is only with icmp packets from the target. It is sent and driver receive it
> >>> but it is not proceed to the application.
> >>>
> >>> The problem I see is that kmalloc_node_track_caller allocate
> >>> specific size and then this size is changed by SKB_WITH_OVERHEAD(ksize(data)).
> >>> The problem is with netdev_alloc_skb_ip_align which calls __alloc_skb function.
> >>>
> >>> Currently driver uses setting for jumbo frames (9k). When I change it to use mtu (1,5k) then
> >>> everything is ok.
> >>>
> >>> Can you give me some hints what can be wrong?
> >>>
> >> Is it with SLUB, SLAB or SLOB allocator ?
> >>
> >>
> > 
> > (I am referring to commit bc417e30f8d
> > (net: Add back alignment for size for __alloc_skb)
> 
> I have seen that commit. Using the latest&greatest version with this patch.
> 

So after these changes, struct skb_shared_info is located to the very
end of allocated memory.

Maybe there is a problem with MTU=9000, since assuming PAGE_SIZE=4096 on
your machine (is it ?), we have a 16384 bytes block of memory, and
~16000 available in skb head, instead of ~9000

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: ICMP packets - ll_temac with Microblaze
  2011-12-21 10:37       ` Eric Dumazet
@ 2011-12-21 10:41         ` Michal Simek
  2011-12-21 10:45           ` Eric Dumazet
  0 siblings, 1 reply; 36+ messages in thread
From: Michal Simek @ 2011-12-21 10:41 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, John Williams, netdev

Eric Dumazet wrote:
> Le mercredi 21 décembre 2011 à 11:32 +0100, Michal Simek a écrit :
>> Eric Dumazet wrote:
>>> Le mercredi 21 décembre 2011 à 11:28 +0100, Eric Dumazet a écrit :
>>>> Le mercredi 21 décembre 2011 à 11:11 +0100, Michal Simek a écrit :
>>>>> Hi Eric and David,
>>>>>
>>>>> I have found one problem with ll_temac driver and
>>>>> this commit: net: more accurate skb truesize
>>>>> sha1: 87fb4b7b533073eeeaed0b6bf7c2328995f6c075
>>>>>
>>>>> The problem is only with icmp packets from the target. It is sent and driver receive it
>>>>> but it is not proceed to the application.
>>>>>
>>>>> The problem I see is that kmalloc_node_track_caller allocate
>>>>> specific size and then this size is changed by SKB_WITH_OVERHEAD(ksize(data)).
>>>>> The problem is with netdev_alloc_skb_ip_align which calls __alloc_skb function.
>>>>>
>>>>> Currently driver uses setting for jumbo frames (9k). When I change it to use mtu (1,5k) then
>>>>> everything is ok.
>>>>>
>>>>> Can you give me some hints what can be wrong?
>>>>>
>>>> Is it with SLUB, SLAB or SLOB allocator ?
>>>>
>>>>
>>> (I am referring to commit bc417e30f8d
>>> (net: Add back alignment for size for __alloc_skb)
>> I have seen that commit. Using the latest&greatest version with this patch.
>>
> 
> So after these changes, struct skb_shared_info is located to the very
> end of allocated memory.
> 
> Maybe there is a problem with MTU=9000, since assuming PAGE_SIZE=4096 on
> your machine (is it ?), we have a 16384 bytes block of memory, and
> ~16000 available in skb head, instead of ~9000

yes, page size is 4k.

Here is the log for allocation - origin size for MTU 9000 is 0x2420(9248) and size after
overhead is 0x3f40(16192)

kmalloc_node_track_caller size 00002420, SKB_WITH_OVERHEAD(ksize(data)) 00003f40
skb->head     c7870000
skb->data     c7870000
skb->tail     c7870000
skb->truesize 000040c0
skb->end      c7873f40


Michal


-- 
Michal Simek, Ing. (M.Eng)
w: www.monstr.eu p: +42-0-721842854
Maintainer of Linux kernel 2.6 Microblaze Linux - http://www.monstr.eu/fdt/
Microblaze U-BOOT custodian

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: ICMP packets - ll_temac with Microblaze
  2011-12-21 10:41         ` Michal Simek
@ 2011-12-21 10:45           ` Eric Dumazet
  2011-12-21 10:54             ` Michal Simek
  2011-12-21 11:03             ` Eric Dumazet
  0 siblings, 2 replies; 36+ messages in thread
From: Eric Dumazet @ 2011-12-21 10:45 UTC (permalink / raw)
  To: monstr; +Cc: David Miller, John Williams, netdev

Le mercredi 21 décembre 2011 à 11:41 +0100, Michal Simek a écrit :

> yes, page size is 4k.
> 
> Here is the log for allocation - origin size for MTU 9000 is 0x2420(9248) and size after
> overhead is 0x3f40(16192)
> 
> kmalloc_node_track_caller size 00002420, SKB_WITH_OVERHEAD(ksize(data)) 00003f40
> skb->head     c7870000
> skb->data     c7870000
> skb->tail     c7870000
> skb->truesize 000040c0
> skb->end      c7873f40
> 

So basically truesize is bigger than previous kernel, and you hit a
socket limit.  (RCVBUF ?)

truesize is more correct, so we uncover prior bugs.

We should allow at least one packet, regardless of its size :(

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: ICMP packets - ll_temac with Microblaze
  2011-12-21 10:45           ` Eric Dumazet
@ 2011-12-21 10:54             ` Michal Simek
  2011-12-21 11:05               ` Eric Dumazet
  2011-12-21 11:03             ` Eric Dumazet
  1 sibling, 1 reply; 36+ messages in thread
From: Michal Simek @ 2011-12-21 10:54 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, John Williams, netdev

Eric Dumazet wrote:
> Le mercredi 21 décembre 2011 à 11:41 +0100, Michal Simek a écrit :
> 
>> yes, page size is 4k.
>>
>> Here is the log for allocation - origin size for MTU 9000 is 0x2420(9248) and size after
>> overhead is 0x3f40(16192)
>>
>> kmalloc_node_track_caller size 00002420, SKB_WITH_OVERHEAD(ksize(data)) 00003f40
>> skb->head     c7870000
>> skb->data     c7870000
>> skb->tail     c7870000
>> skb->truesize 000040c0
>> skb->end      c7873f40
>>
> 
> So basically truesize is bigger than previous kernel, and you hit a
> socket limit.  (RCVBUF ?)

Where is that limit setup?

Michal


-- 
Michal Simek, Ing. (M.Eng)
w: www.monstr.eu p: +42-0-721842854
Maintainer of Linux kernel 2.6 Microblaze Linux - http://www.monstr.eu/fdt/
Microblaze U-BOOT custodian

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: ICMP packets - ll_temac with Microblaze
  2011-12-21 10:45           ` Eric Dumazet
  2011-12-21 10:54             ` Michal Simek
@ 2011-12-21 11:03             ` Eric Dumazet
  2011-12-21 11:10               ` Michal Simek
  1 sibling, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2011-12-21 11:03 UTC (permalink / raw)
  To: monstr; +Cc: David Miller, John Williams, netdev

Le mercredi 21 décembre 2011 à 11:45 +0100, Eric Dumazet a écrit :
> Le mercredi 21 décembre 2011 à 11:41 +0100, Michal Simek a écrit :
> 
> > yes, page size is 4k.
> > 
> > Here is the log for allocation - origin size for MTU 9000 is 0x2420(9248) and size after
> > overhead is 0x3f40(16192)
> > 
> > kmalloc_node_track_caller size 00002420, SKB_WITH_OVERHEAD(ksize(data)) 00003f40
> > skb->head     c7870000
> > skb->data     c7870000
> > skb->tail     c7870000
> > skb->truesize 000040c0
> > skb->end      c7873f40
> > 
> 
> So basically truesize is bigger than previous kernel, and you hit a
> socket limit.  (RCVBUF ?)
> 
> truesize is more correct, so we uncover prior bugs.
> 
> We should allow at least one packet, regardless of its size :(
> 
> 

I would try the following patch :

diff --git a/include/net/sock.h b/include/net/sock.h
index bf6b9fd..21bb3b5 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -662,12 +662,14 @@ static inline void __sk_add_backlog(struct sock *sk, struct sk_buff *skb)
 
 /*
  * Take into account size of receive queue and backlog queue
+ * Do not take into account this skb truesize,
+ * to allow even a single big packet to come.
  */
 static inline bool sk_rcvqueues_full(const struct sock *sk, const struct sk_buff *skb)
 {
 	unsigned int qsize = sk->sk_backlog.len + atomic_read(&sk->sk_rmem_alloc);
 
-	return qsize + skb->truesize > sk->sk_rcvbuf;
+	return qsize > sk->sk_rcvbuf;
 }
 
 /* The per-socket spinlock must be held here. */

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: ICMP packets - ll_temac with Microblaze
  2011-12-21 10:54             ` Michal Simek
@ 2011-12-21 11:05               ` Eric Dumazet
  0 siblings, 0 replies; 36+ messages in thread
From: Eric Dumazet @ 2011-12-21 11:05 UTC (permalink / raw)
  To: monstr; +Cc: David Miller, John Williams, netdev

Le mercredi 21 décembre 2011 à 11:54 +0100, Michal Simek a écrit :
> Eric Dumazet wrote:
> > Le mercredi 21 décembre 2011 à 11:41 +0100, Michal Simek a écrit :
> > 
> >> yes, page size is 4k.
> >>
> >> Here is the log for allocation - origin size for MTU 9000 is 0x2420(9248) and size after
> >> overhead is 0x3f40(16192)
> >>
> >> kmalloc_node_track_caller size 00002420, SKB_WITH_OVERHEAD(ksize(data)) 00003f40
> >> skb->head     c7870000
> >> skb->data     c7870000
> >> skb->tail     c7870000
> >> skb->truesize 000040c0
> >> skb->end      c7873f40
> >>
> > 
> > So basically truesize is bigger than previous kernel, and you hit a
> > socket limit.  (RCVBUF ?)
> 
> Where is that limit setup?
> 

Might be in the application itself.

setsockopt(socket, SOL_SOCKET, SO_RCVBUF, &limit, sizeof(limit));

kernel currently apply a factor of two to take into account skb
overhead. This might be not enough with jumbo frames.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: ICMP packets - ll_temac with Microblaze
  2011-12-21 11:03             ` Eric Dumazet
@ 2011-12-21 11:10               ` Michal Simek
  2011-12-21 11:13                 ` Eric Dumazet
  0 siblings, 1 reply; 36+ messages in thread
From: Michal Simek @ 2011-12-21 11:10 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, John Williams, netdev

Eric Dumazet wrote:
> Le mercredi 21 décembre 2011 à 11:45 +0100, Eric Dumazet a écrit :
>> Le mercredi 21 décembre 2011 à 11:41 +0100, Michal Simek a écrit :
>>
>>> yes, page size is 4k.
>>>
>>> Here is the log for allocation - origin size for MTU 9000 is 0x2420(9248) and size after
>>> overhead is 0x3f40(16192)
>>>
>>> kmalloc_node_track_caller size 00002420, SKB_WITH_OVERHEAD(ksize(data)) 00003f40
>>> skb->head     c7870000
>>> skb->data     c7870000
>>> skb->tail     c7870000
>>> skb->truesize 000040c0
>>> skb->end      c7873f40
>>>
>> So basically truesize is bigger than previous kernel, and you hit a
>> socket limit.  (RCVBUF ?)
>>
>> truesize is more correct, so we uncover prior bugs.
>>
>> We should allow at least one packet, regardless of its size :(
>>
>>
> 
> I would try the following patch :
> 
> diff --git a/include/net/sock.h b/include/net/sock.h
> index bf6b9fd..21bb3b5 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -662,12 +662,14 @@ static inline void __sk_add_backlog(struct sock *sk, struct sk_buff *skb)
>  
>  /*
>   * Take into account size of receive queue and backlog queue
> + * Do not take into account this skb truesize,
> + * to allow even a single big packet to come.
>   */
>  static inline bool sk_rcvqueues_full(const struct sock *sk, const struct sk_buff *skb)
>  {
>  	unsigned int qsize = sk->sk_backlog.len + atomic_read(&sk->sk_rmem_alloc);
>  
> -	return qsize + skb->truesize > sk->sk_rcvbuf;
> +	return qsize > sk->sk_rcvbuf;
>  }
>  
>  /* The per-socket spinlock must be held here. */
> 

It doesn't fix the problem. Do you want to add some printk to code?

Michal



-- 
Michal Simek, Ing. (M.Eng)
w: www.monstr.eu p: +42-0-721842854
Maintainer of Linux kernel 2.6 Microblaze Linux - http://www.monstr.eu/fdt/
Microblaze U-BOOT custodian

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: ICMP packets - ll_temac with Microblaze
  2011-12-21 11:10               ` Michal Simek
@ 2011-12-21 11:13                 ` Eric Dumazet
  2011-12-21 11:50                   ` Michal Simek
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2011-12-21 11:13 UTC (permalink / raw)
  To: monstr; +Cc: David Miller, John Williams, netdev

Le mercredi 21 décembre 2011 à 12:10 +0100, Michal Simek a écrit :
> Eric Dumazet wrote:
> > Le mercredi 21 décembre 2011 à 11:45 +0100, Eric Dumazet a écrit :
> >> Le mercredi 21 décembre 2011 à 11:41 +0100, Michal Simek a écrit :
> >>
> >>> yes, page size is 4k.
> >>>
> >>> Here is the log for allocation - origin size for MTU 9000 is 0x2420(9248) and size after
> >>> overhead is 0x3f40(16192)
> >>>
> >>> kmalloc_node_track_caller size 00002420, SKB_WITH_OVERHEAD(ksize(data)) 00003f40
> >>> skb->head     c7870000
> >>> skb->data     c7870000
> >>> skb->tail     c7870000
> >>> skb->truesize 000040c0
> >>> skb->end      c7873f40
> >>>
> >> So basically truesize is bigger than previous kernel, and you hit a
> >> socket limit.  (RCVBUF ?)
> >>
> >> truesize is more correct, so we uncover prior bugs.
> >>
> >> We should allow at least one packet, regardless of its size :(
> >>
> >>
> > 
> > I would try the following patch :
> > 
> > diff --git a/include/net/sock.h b/include/net/sock.h
> > index bf6b9fd..21bb3b5 100644
> > --- a/include/net/sock.h
> > +++ b/include/net/sock.h
> > @@ -662,12 +662,14 @@ static inline void __sk_add_backlog(struct sock *sk, struct sk_buff *skb)
> >  
> >  /*
> >   * Take into account size of receive queue and backlog queue
> > + * Do not take into account this skb truesize,
> > + * to allow even a single big packet to come.
> >   */
> >  static inline bool sk_rcvqueues_full(const struct sock *sk, const struct sk_buff *skb)
> >  {
> >  	unsigned int qsize = sk->sk_backlog.len + atomic_read(&sk->sk_rmem_alloc);
> >  
> > -	return qsize + skb->truesize > sk->sk_rcvbuf;
> > +	return qsize > sk->sk_rcvbuf;
> >  }
> >  
> >  /* The per-socket spinlock must be held here. */
> > 
> 
> It doesn't fix the problem. Do you want to add some printk to code?
> 
> Michal
> 
> 
> 

try to change SOCK_MIN_RCVBUF

include/net/sock.h

#define SOCK_MIN_RCVBUF (2048 + sizeof(struct sk_buff))

->

#define SOCK_MIN_RCVBUF (16384 + sizeof(struct sk_buff))

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: ICMP packets - ll_temac with Microblaze
  2011-12-21 11:13                 ` Eric Dumazet
@ 2011-12-21 11:50                   ` Michal Simek
  2011-12-21 12:39                     ` Eric Dumazet
  0 siblings, 1 reply; 36+ messages in thread
From: Michal Simek @ 2011-12-21 11:50 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, John Williams, netdev

Eric Dumazet wrote:
> Le mercredi 21 décembre 2011 à 12:10 +0100, Michal Simek a écrit :
>> Eric Dumazet wrote:
>>> Le mercredi 21 décembre 2011 à 11:45 +0100, Eric Dumazet a écrit :
>>>> Le mercredi 21 décembre 2011 à 11:41 +0100, Michal Simek a écrit :
>>>>
>>>>> yes, page size is 4k.
>>>>>
>>>>> Here is the log for allocation - origin size for MTU 9000 is 0x2420(9248) and size after
>>>>> overhead is 0x3f40(16192)
>>>>>
>>>>> kmalloc_node_track_caller size 00002420, SKB_WITH_OVERHEAD(ksize(data)) 00003f40
>>>>> skb->head     c7870000
>>>>> skb->data     c7870000
>>>>> skb->tail     c7870000
>>>>> skb->truesize 000040c0
>>>>> skb->end      c7873f40
>>>>>
>>>> So basically truesize is bigger than previous kernel, and you hit a
>>>> socket limit.  (RCVBUF ?)
>>>>
>>>> truesize is more correct, so we uncover prior bugs.
>>>>
>>>> We should allow at least one packet, regardless of its size :(
>>>>
>>>>
>>> I would try the following patch :
>>>
>>> diff --git a/include/net/sock.h b/include/net/sock.h
>>> index bf6b9fd..21bb3b5 100644
>>> --- a/include/net/sock.h
>>> +++ b/include/net/sock.h
>>> @@ -662,12 +662,14 @@ static inline void __sk_add_backlog(struct sock *sk, struct sk_buff *skb)
>>>  
>>>  /*
>>>   * Take into account size of receive queue and backlog queue
>>> + * Do not take into account this skb truesize,
>>> + * to allow even a single big packet to come.
>>>   */
>>>  static inline bool sk_rcvqueues_full(const struct sock *sk, const struct sk_buff *skb)
>>>  {
>>>  	unsigned int qsize = sk->sk_backlog.len + atomic_read(&sk->sk_rmem_alloc);
>>>  
>>> -	return qsize + skb->truesize > sk->sk_rcvbuf;
>>> +	return qsize > sk->sk_rcvbuf;
>>>  }
>>>  
>>>  /* The per-socket spinlock must be held here. */
>>>
>> It doesn't fix the problem. Do you want to add some printk to code?
>>
>> Michal
>>
>>
>>
> 
> try to change SOCK_MIN_RCVBUF
> 
> include/net/sock.h
> 
> #define SOCK_MIN_RCVBUF (2048 + sizeof(struct sk_buff))
> 
> ->
> 
> #define SOCK_MIN_RCVBUF (16384 + sizeof(struct sk_buff))
> 

Doesn't work. :-(

Michal



-- 
Michal Simek, Ing. (M.Eng)
w: www.monstr.eu p: +42-0-721842854
Maintainer of Linux kernel 2.6 Microblaze Linux - http://www.monstr.eu/fdt/
Microblaze U-BOOT custodian

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: ICMP packets - ll_temac with Microblaze
  2011-12-21 11:50                   ` Michal Simek
@ 2011-12-21 12:39                     ` Eric Dumazet
  2011-12-21 13:28                       ` Michal Simek
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2011-12-21 12:39 UTC (permalink / raw)
  To: monstr; +Cc: David Miller, John Williams, netdev

Le mercredi 21 décembre 2011 à 12:50 +0100, Michal Simek a écrit :

> > #define SOCK_MIN_RCVBUF (16384 + sizeof(struct sk_buff))
> > 
> 
> Doesn't work. :-(
> 

Sorry, I have no idea where packet is dropped, I dont know your
application.

You might send "netstat -s" output if you cant find it ...

You spoke of ICMP message, I cant find a limit in this area ?

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: ICMP packets - ll_temac with Microblaze
  2011-12-21 12:39                     ` Eric Dumazet
@ 2011-12-21 13:28                       ` Michal Simek
  2011-12-21 13:40                         ` Eric Dumazet
  0 siblings, 1 reply; 36+ messages in thread
From: Michal Simek @ 2011-12-21 13:28 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, John Williams, netdev

Eric Dumazet wrote:
> Le mercredi 21 décembre 2011 à 12:50 +0100, Michal Simek a écrit :
> 
>>> #define SOCK_MIN_RCVBUF (16384 + sizeof(struct sk_buff))
>>>
>> Doesn't work. :-(
>>
> 
> Sorry, I have no idea where packet is dropped, I dont know your
> application.

Application is simple ping from Busybox - nothing special.


> You might send "netstat -s" output if you cant find it ...

~ # ./netstat -s
Ip:
     36 total packets received
     2 with invalid addresses
     0 forwarded
     0 incoming packets discarded
     34 incoming packets delivered
     34 requests sent out
Icmp:
     34 ICMP messages received
     0 input ICMP message failed.
     ICMP input histogram:
         echo replies: 34
     34 ICMP messages sent
     0 ICMP messages failed
     ICMP output histogram:
         echo request: 34
IcmpMsg:
         InType0: 34
         OutType8: 34
Tcp:
     0 active connections openings
     0 passive connection openings
     0 failed connection attempts
     0 connection resets received
     0 connections established
     0 segments received
     0 segments send out
     0 segments retransmited
     0 bad segments received.
     0 resets sent
Udp:
     0 packets received
     0 packets to unknown port received.
     0 packet receive errors
     0 packets sent
     RcvbufErrors: 0
     SndbufErrors: 0
UdpLite:
     InDatagrams: 0
     NoPorts: 0
     InErrors: 0
     OutDatagrams: 0
     RcvbufErrors: 0
     SndbufErrors: 0
TcpExt:
     ArpFilter: 0
     0 packets header predicted
     TCPPureAcks: 0
     TCPHPAcks: 0
     TCPRenoRecovery: 0
     TCPSackRecovery: 0
     TCPSACKReneging: 0
     TCPFACKReorder: 0
     TCPSACKReorder: 0
     TCPRenoReorder: 0
     TCPTSReorder: 0
     TCPFullUndo: 0
     TCPPartialUndo: 0
     TCPDSACKUndo: 0
     TCPLossUndo: 0
     TCPLoss: 0
     TCPLostRetransmit: 0
     TCPRenoFailures: 0
     TCPSackFailures: 0
     TCPLossFailures: 0
     TCPFastRetrans: 0
     TCPForwardRetrans: 0
     TCPSlowStartRetrans: 0
     TCPTimeouts: 0
     TCPRenoRecoveryFail: 0
     TCPSackRecoveryFail: 0
     TCPSchedulerFailed: 0
     TCPRcvCollapsed: 0
     TCPDSACKOldSent: 0
     TCPDSACKOfoSent: 0
     TCPDSACKRecv: 0
     TCPDSACKOfoRecv: 0
     TCPAbortOnSyn: 0
     TCPAbortOnData: 0
     TCPAbortOnClose: 0
     TCPAbortOnMemory: 0
     TCPAbortOnTimeout: 0
     TCPAbortOnLinger: 0
     TCPAbortFailed: 0
     TCPMemoryPressures: 0
     TCPSACKDiscard: 0
     TCPDSACKIgnoredOld: 0
     TCPDSACKIgnoredNoUndo: 0
     TCPSpuriousRTOs: 0
     TCPMD5NotFound: 0
     TCPMD5Unexpected: 0
     TCPSackShifted: 0
     TCPSackMerged: 0
     TCPSackShiftFallback: 0
     TCPBacklogDrop: 0
     TCPMinTTLDrop: 0
     TCPDeferAcceptDrop: 0
     IPReversePathFilter: 0
     TCPTimeWaitOverflow: 0
     TCPReqQFullDoCookies: 0
     TCPReqQFullDrop: 0
IpExt:
     InNoRoutes: 0
     InTruncatedPkts: 0
     InMcastPkts: 0
     OutMcastPkts: 0
     InBcastPkts: 0
     OutBcastPkts: 0
     InOctets: 4008
     OutOctets: 2856
     InMcastOctets: 0
     OutMcastOctets: 0
     InBcastOctets: 0
     OutBcastOctets: 0



> You spoke of ICMP message, I cant find a limit in this area ?

ok. Can you provide me any background why size should be setup by
size = SKB_WITH_OVERHEAD(ksize(data));
and not to use size which is passed to kmalloc in __alloc_skb.

In past I have seen some issues because of caches but this is not that
case because on QEMU (where cache is not modeled) everything works the same as on HW.

Below is the log what I see.

Thanks,
Michal


~ # ping -c 5 192.168.0.101
PING 192.168.0.101 (192.168.0.101): 56 data bytes

--- 192.168.0.101 ping statistics ---
5 packets transmitted, 0 packets received, 100% packet loss
~ # mount -t nfs -o nolock -o tcp 192.168.0.101:/tftpboot/nfs /mnt
~ # ls /mnt/ | head
0001-sd.patch
0001-syscalls-getdtablesize01-fix-for-missing-etc-hosts.patch
1
1-xils
2
arp
arp-mb
busybox
core
exec
~ # umount /mnt/
~ # ./netstat -s
Ip:
     36 total packets received
     2 with invalid addresses
     0 forwarded
     0 incoming packets discarded
     34 incoming packets delivered
     43 requests sent out
Icmp:
     5 ICMP messages received
     0 input ICMP message failed.
     ICMP input histogram:
         echo replies: 5
     5 ICMP messages sent
     0 ICMP messages failed
     ICMP output histogram:
         echo request: 5
IcmpMsg:
         InType0: 5
         OutType8: 5
Tcp:
     4 active connections openings
     0 passive connection openings
     0 failed connection attempts
     0 connection resets received
     0 connections established
     29 segments received
     38 segments send out
     0 segments retransmited
     0 bad segments received.
     0 resets sent
Udp:
     0 packets received
     0 packets to unknown port received.
     0 packet receive errors
     0 packets sent
     RcvbufErrors: 0
     SndbufErrors: 0
UdpLite:
     InDatagrams: 0
     NoPorts: 0
     InErrors: 0
     OutDatagrams: 0
     RcvbufErrors: 0
     SndbufErrors: 0
TcpExt:
     ArpFilter: 0
     1 delayed acks sent
     10 packets header predicted
     TCPPureAcks: 4
     TCPHPAcks: 6
     TCPRenoRecovery: 0
     TCPSackRecovery: 0
     TCPSACKReneging: 0
     TCPFACKReorder: 0
     TCPSACKReorder: 0
     TCPRenoReorder: 0
     TCPTSReorder: 0
     TCPFullUndo: 0
     TCPPartialUndo: 0
     TCPDSACKUndo: 0
     TCPLossUndo: 0
     TCPLoss: 0
     TCPLostRetransmit: 0
     TCPRenoFailures: 0
     TCPSackFailures: 0
     TCPLossFailures: 0
     TCPFastRetrans: 0
     TCPForwardRetrans: 0
     TCPSlowStartRetrans: 0
     TCPTimeouts: 0
     TCPRenoRecoveryFail: 0
     TCPSackRecoveryFail: 0
     TCPSchedulerFailed: 0
     TCPRcvCollapsed: 0
     TCPDSACKOldSent: 0
     TCPDSACKOfoSent: 0
     TCPDSACKRecv: 0
     TCPDSACKOfoRecv: 0
     TCPAbortOnSyn: 0
     TCPAbortOnData: 0
     TCPAbortOnClose: 0
     TCPAbortOnMemory: 0
     TCPAbortOnTimeout: 0
     TCPAbortOnLinger: 0
     TCPAbortFailed: 0
     TCPMemoryPressures: 0
     TCPSACKDiscard: 0
     TCPDSACKIgnoredOld: 0
     TCPDSACKIgnoredNoUndo: 0
     TCPSpuriousRTOs: 0
     TCPMD5NotFound: 0
     TCPMD5Unexpected: 0
     TCPSackShifted: 0
     TCPSackMerged: 0
     TCPSackShiftFallback: 0
     TCPBacklogDrop: 0
     TCPMinTTLDrop: 0
     TCPDeferAcceptDrop: 0
     IPReversePathFilter: 0
     TCPTimeWaitOverflow: 0
     TCPReqQFullDoCookies: 0
     TCPReqQFullDrop: 0
IpExt:
     InNoRoutes: 0
     InTruncatedPkts: 0
     InMcastPkts: 0
     OutMcastPkts: 0
     InBcastPkts: 0
     OutBcastPkts: 0
     InOctets: 11348
     OutOctets: 3752
     InMcastOctets: 0
     OutMcastOctets: 0
     InBcastOctets: 0
     OutBcastOctets: 0
~ #
~ #
~ # tcpdump
device eth0 entered promiscuous mode
tcpdump: listening on eth0
00:01:31.421191 192.168.0.101 > 192.168.0.100: icmp: echo request (DF)
00:01:31.479686 192.168.0.100 > 192.168.0.101: icmp: echo reply
00:01:31.442325 arp who-has 192.168.0.1 tell 192.168.0.100
00:01:31.446052 arp reply 192.168.0.1 is-at f4:ec:38:b2:6a:9e
00:01:31.446256 192.168.0.100.47044 > 192.168.0.1.53: 45191+ (44) (DF)
00:01:31.461859 192.168.0.1.53 > 192.168.0.100.47044: 45191 NXDomain 0/1/0 (121)
00:01:31.464125 192.168.0.100.41833 > 192.168.0.1.53: 37485+ (44) (DF)
00:01:31.478026 192.168.0.1.53 > 192.168.0.100.41833: 37485 NXDomain 0/1/0 (121)
00:01:31.481794 192.168.0.100.35138 > 192.168.0.1.53: 56594+ (42) (DF)
00:01:31.497202 192.168.0.1.53 > 192.168.0.100.35138: 56594 NXDomain 0/1/0 (119)
00:01:32.419130 192.168.0.101 > 192.168.0.100: icmp: echo request (DF)
00:01:32.419482 192.168.0.100 > 192.168.0.101: icmp: echo reply
00:01:33.418342 192.168.0.101 > 192.168.0.100: icmp: echo request (DF)
00:01:33.418693 192.168.0.100 > 192.168.0.101: icmp: echo reply
00:01:34.419327 192.168.0.101 > 192.168.0.100: icmp: echo request (DF)
00:01:34.419676 192.168.0.100 > 192.168.0.101: icmp: echo reply
00:01:35.418042 192.168.0.101 > 192.168.0.100: icmp: echo request (DF)
00:01:35.418406 192.168.0.100 > 192.168.0.101: icmp: echo reply
00:01:36.418923 192.168.0.101 > 192.168.0.100: icmp: echo request (DF)
00:01:36.419275 192.168.0.100 > 192.168.0.101: icmp: echo reply
00:01:36.459936 arp who-has 192.168.0.100 tell 192.168.0.1
00:01:36.460216 arp reply 192.168.0.100 is-at 0:a:35:0:19:15
00:01:37.418800 192.168.0.101 > 192.168.0.100: icmp: echo request (DF)
00:01:37.419149 192.168.0.100 > 192.168.0.101: icmp: echo reply
00:01:38.419247 192.168.0.101 > 192.168.0.100: icmp: echo request (DF)
00:01:38.419617 192.168.0.100 > 192.168.0.101: icmp: echo reply
00:01:39.418820 192.168.0.101 > 192.168.0.100: icmp: echo request (DF)
00:01:39.419169 192.168.0.100 > 192.168.0.101: icmp: echo reply
\x03
56 packets received by filter
0 packets droppeddevice eth0 left promiscuous mode
  by kernel
~ #
~ # ping 192.168.0.101
PING 192.168.0.101 (192.168.0.101): 56 data bytes
\x03
--- 192.168.0.101 ping statistics ---
2 packets transmitted, 0 packets received, 100% packet loss
~ # ping 192.168.0.1
PING 192.168.0.1 (192.168.0.1): 56 data bytes
\x03
--- 192.168.0.1 ping statistics ---
1 packets transmitted, 0 packets received, 100% packet loss
~ # ping 192.168.0.101 -c 10 &
~ # PING 192.168.0.101 (192.168.0.101): 56 data bytes
~ # tcpdump
device eth0 entered promiscuous mode
tcpdump: listening on eth0
00:01:59.615918 192.168.0.100 > 192.168.0.101: icmp: echo request (DF)
00:01:59.619166 192.168.0.101 > 192.168.0.100: icmp: echo reply
00:01:59.636473 192.168.0.100.57212 > 192.168.0.1.53: 43026+ (44) (DF)
00:01:59.650838 192.168.0.1.53 > 192.168.0.100.57212: 43026 NXDomain 0/1/0 (121)
00:01:59.652915 192.168.0.100.41475 > 192.168.0.1.53: 28371+ (44) (DF)
00:01:59.666907 192.168.0.1.53 > 192.168.0.100.41475: 28371 NXDomain 0/1/0 (121)
00:01:59.670832 192.168.0.100.47770 > 192.168.0.1.53: 61908+ (42) (DF)
00:01:59.684921 192.168.0.1.53 > 192.168.0.100.47770: 61908 NXDomain 0/1/0 (119)
00:02:00.616493 192.168.0.100 > 192.168.0.101: icmp: echo request (DF)
00:02:00.620237 192.168.0.101 > 192.168.0.100: icmp: echo reply
00:02:00.624275 arp who-has 192.168.0.101 tell 192.168.0.100
00:02:00.627936 arp reply 192.168.0.101 is-at 0:22:15:fb:34:a1
00:02:01.617232 192.168.0.100 > 192.168.0.101: icmp: echo request (DF)
00:02:01.620980 192.168.0.101 > 192.168.0.100: icmp: echo reply
00:02:02.617962 192.168.0.100 > 192.168.0.101: icmp: echo request (DF)
00:02:02.621687 192.168.0.101 > 192.168.0.100: icmp: echo reply
00:02:03.618687 192.168.0.100 > 192.168.0.101: icmp: echo request (DF)
00:02:03.622389 192.168.0.101 > 192.168.0.100: icmp: echo reply
00:02:04.619414 192.168.0.100 > 192.168.0.101: icmp: echo request (DF)
00:02:04.623129 192.168.0.101 > 192.168.0.100: icmp: echo reply


42 packets received by filter
0 packets droppeddevice eth0 left promiscuous mode
  by kernel
~ #
--- 192.168.0.101 ping statistics ---
10 packets transmitted, 0 packets received, 100% packet loss

[1] + Done(1)                    ping 192.168.0.101 -c 10
~ #


-- 
Michal Simek, Ing. (M.Eng)
w: www.monstr.eu p: +42-0-721842854
Maintainer of Linux kernel 2.6 Microblaze Linux - http://www.monstr.eu/fdt/
Microblaze U-BOOT custodian

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: ICMP packets - ll_temac with Microblaze
  2011-12-21 13:28                       ` Michal Simek
@ 2011-12-21 13:40                         ` Eric Dumazet
  2011-12-21 14:24                           ` Michal Simek
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2011-12-21 13:40 UTC (permalink / raw)
  To: monstr; +Cc: David Miller, John Williams, netdev

Le mercredi 21 décembre 2011 à 14:28 +0100, Michal Simek a écrit :

> ok. Can you provide me any background why size should be setup by
> size = SKB_WITH_OVERHEAD(ksize(data));
> and not to use size which is passed to kmalloc in __alloc_skb.

Its all about memory accounting (based on skb->truesize)

Prior to the patch, we could fool memory accounting because skbs claimed
to use less memory than what they really used.

And crash machines eventually.

Now memory accouting is fixed, we probably need to change some points in
the kernel, where we previously accepted a small skb, but not a very
large one.

Since "ping" probably uses SOCK_RAW sockets, I'll try this one :

(We dont care of _this_ skb truesize, only on the count of previously
queued packets)


diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 0da505c..a809a48 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1631,8 +1631,7 @@ static int packet_rcv(struct sk_buff *skb, struct net_device *dev,
 	if (snaplen > res)
 		snaplen = res;
 
-	if (atomic_read(&sk->sk_rmem_alloc) + skb->truesize >=
-	    (unsigned)sk->sk_rcvbuf)
+	if (atomic_read(&sk->sk_rmem_alloc) >= (unsigned)sk->sk_rcvbuf)
 		goto drop_n_acct;
 
 	if (skb_shared(skb)) {
@@ -1763,7 +1762,7 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
 	if (po->tp_version <= TPACKET_V2) {
 		if (macoff + snaplen > po->rx_ring.frame_size) {
 			if (po->copy_thresh &&
-				atomic_read(&sk->sk_rmem_alloc) + skb->truesize
+				atomic_read(&sk->sk_rmem_alloc)
 				< (unsigned)sk->sk_rcvbuf) {
 				if (skb_shared(skb)) {
 					copy_skb = skb_clone(skb, GFP_ATOMIC);

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: ICMP packets - ll_temac with Microblaze
  2011-12-21 13:40                         ` Eric Dumazet
@ 2011-12-21 14:24                           ` Michal Simek
  2011-12-21 15:30                             ` Eric Dumazet
  2011-12-21 15:59                             ` Eric Dumazet
  0 siblings, 2 replies; 36+ messages in thread
From: Michal Simek @ 2011-12-21 14:24 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, John Williams, netdev

Eric Dumazet wrote:
> Le mercredi 21 décembre 2011 à 14:28 +0100, Michal Simek a écrit :
> 
>> ok. Can you provide me any background why size should be setup by
>> size = SKB_WITH_OVERHEAD(ksize(data));
>> and not to use size which is passed to kmalloc in __alloc_skb.
> 
> Its all about memory accounting (based on skb->truesize)
> 
> Prior to the patch, we could fool memory accounting because skbs claimed
> to use less memory than what they really used.
> 
> And crash machines eventually.
> 
> Now memory accouting is fixed, we probably need to change some points in
> the kernel, where we previously accepted a small skb, but not a very
> large one.
> 
> Since "ping" probably uses SOCK_RAW sockets, I'll try this one :
> 
> (We dont care of _this_ skb truesize, only on the count of previously
> queued packets)
> 
> 
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index 0da505c..a809a48 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -1631,8 +1631,7 @@ static int packet_rcv(struct sk_buff *skb, struct net_device *dev,
>  	if (snaplen > res)
>  		snaplen = res;
>  
> -	if (atomic_read(&sk->sk_rmem_alloc) + skb->truesize >=
> -	    (unsigned)sk->sk_rcvbuf)
> +	if (atomic_read(&sk->sk_rmem_alloc) >= (unsigned)sk->sk_rcvbuf)
>  		goto drop_n_acct;
>  
>  	if (skb_shared(skb)) {
> @@ -1763,7 +1762,7 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
>  	if (po->tp_version <= TPACKET_V2) {
>  		if (macoff + snaplen > po->rx_ring.frame_size) {
>  			if (po->copy_thresh &&
> -				atomic_read(&sk->sk_rmem_alloc) + skb->truesize
> +				atomic_read(&sk->sk_rmem_alloc)
>  				< (unsigned)sk->sk_rcvbuf) {
>  				if (skb_shared(skb)) {
>  					copy_skb = skb_clone(skb, GFP_ATOMIC);
> 
> 
> 
> 

It doesn't work too.
It is possible to see this behavior on qemu. What about if I prepare you package with cross toolchain, rootfs
and you can add debug message where you want?

I have also tried ll_temac driver with ppc440 and behavior is the same.

Max FRAME_SIZE pass to netdev_alloc_skb_ip_align is 7966. For this value ping works.
(For ll_temac driver it is #define XTE_JUMBO_MTU 7948 from ll_temac.h)

Thanks,
Michal

-- 
Michal Simek, Ing. (M.Eng)
w: www.monstr.eu p: +42-0-721842854
Maintainer of Linux kernel 2.6 Microblaze Linux - http://www.monstr.eu/fdt/
Microblaze U-BOOT custodian

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: ICMP packets - ll_temac with Microblaze
  2011-12-21 14:24                           ` Michal Simek
@ 2011-12-21 15:30                             ` Eric Dumazet
  2011-12-21 15:44                               ` Eric Dumazet
  2011-12-21 15:59                             ` Eric Dumazet
  1 sibling, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2011-12-21 15:30 UTC (permalink / raw)
  To: monstr; +Cc: David Miller, John Williams, netdev

Le mercredi 21 décembre 2011 à 15:24 +0100, Michal Simek a écrit :
> Eric Dumazet wrote:
> > Le mercredi 21 décembre 2011 à 14:28 +0100, Michal Simek a écrit :
> > 
> >> ok. Can you provide me any background why size should be setup by
> >> size = SKB_WITH_OVERHEAD(ksize(data));
> >> and not to use size which is passed to kmalloc in __alloc_skb.
> > 
> > Its all about memory accounting (based on skb->truesize)
> > 
> > Prior to the patch, we could fool memory accounting because skbs claimed
> > to use less memory than what they really used.
> > 
> > And crash machines eventually.
> > 
> > Now memory accouting is fixed, we probably need to change some points in
> > the kernel, where we previously accepted a small skb, but not a very
> > large one.
> > 
> > Since "ping" probably uses SOCK_RAW sockets, I'll try this one :
> > 
> > (We dont care of _this_ skb truesize, only on the count of previously
> > queued packets)
> > 
> > 
> > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> > index 0da505c..a809a48 100644
> > --- a/net/packet/af_packet.c
> > +++ b/net/packet/af_packet.c
> > @@ -1631,8 +1631,7 @@ static int packet_rcv(struct sk_buff *skb, struct net_device *dev,
> >  	if (snaplen > res)
> >  		snaplen = res;
> >  
> > -	if (atomic_read(&sk->sk_rmem_alloc) + skb->truesize >=
> > -	    (unsigned)sk->sk_rcvbuf)
> > +	if (atomic_read(&sk->sk_rmem_alloc) >= (unsigned)sk->sk_rcvbuf)
> >  		goto drop_n_acct;
> >  
> >  	if (skb_shared(skb)) {
> > @@ -1763,7 +1762,7 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
> >  	if (po->tp_version <= TPACKET_V2) {
> >  		if (macoff + snaplen > po->rx_ring.frame_size) {
> >  			if (po->copy_thresh &&
> > -				atomic_read(&sk->sk_rmem_alloc) + skb->truesize
> > +				atomic_read(&sk->sk_rmem_alloc)
> >  				< (unsigned)sk->sk_rcvbuf) {
> >  				if (skb_shared(skb)) {
> >  					copy_skb = skb_clone(skb, GFP_ATOMIC);
> > 
> > 
> > 
> > 
> 
> It doesn't work too.
> It is possible to see this behavior on qemu. What about if I prepare you package with cross toolchain, rootfs
> and you can add debug message where you want?
> 
> I have also tried ll_temac driver with ppc440 and behavior is the same.
> 
> Max FRAME_SIZE pass to netdev_alloc_skb_ip_align is 7966. For this value ping works.
> (For ll_temac driver it is #define XTE_JUMBO_MTU 7948 from ll_temac.h)

I did several tests with MTU 9000 on my machines and it works well.

It seems my pings (iputils-sss20071127 or iputils-sss20101006) uses a
big enough RCVBUF

setsockopt(3, SOL_SOCKET, SO_RCVBUF, [65536], 4) = 0
getsockopt(3, SOL_SOCKET, SO_RCVBUF, [131072], [4]) = 0

Could you check with "strace ping..." what is doing busybox ?

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: ICMP packets - ll_temac with Microblaze
  2011-12-21 15:30                             ` Eric Dumazet
@ 2011-12-21 15:44                               ` Eric Dumazet
  2011-12-21 16:01                                 ` Jun Zhao
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2011-12-21 15:44 UTC (permalink / raw)
  To: monstr; +Cc: David Miller, John Williams, netdev

Le mercredi 21 décembre 2011 à 16:30 +0100, Eric Dumazet a écrit :
> Le mercredi 21 décembre 2011 à 15:24 +0100, Michal Simek a écrit :
> > Eric Dumazet wrote:
> > > Le mercredi 21 décembre 2011 à 14:28 +0100, Michal Simek a écrit :
> > > 
> > >> ok. Can you provide me any background why size should be setup by
> > >> size = SKB_WITH_OVERHEAD(ksize(data));
> > >> and not to use size which is passed to kmalloc in __alloc_skb.
> > > 
> > > Its all about memory accounting (based on skb->truesize)
> > > 
> > > Prior to the patch, we could fool memory accounting because skbs claimed
> > > to use less memory than what they really used.
> > > 
> > > And crash machines eventually.
> > > 
> > > Now memory accouting is fixed, we probably need to change some points in
> > > the kernel, where we previously accepted a small skb, but not a very
> > > large one.
> > > 
> > > Since "ping" probably uses SOCK_RAW sockets, I'll try this one :
> > > 
> > > (We dont care of _this_ skb truesize, only on the count of previously
> > > queued packets)
> > > 
> > > 
> > > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> > > index 0da505c..a809a48 100644
> > > --- a/net/packet/af_packet.c
> > > +++ b/net/packet/af_packet.c
> > > @@ -1631,8 +1631,7 @@ static int packet_rcv(struct sk_buff *skb, struct net_device *dev,
> > >  	if (snaplen > res)
> > >  		snaplen = res;
> > >  
> > > -	if (atomic_read(&sk->sk_rmem_alloc) + skb->truesize >=
> > > -	    (unsigned)sk->sk_rcvbuf)
> > > +	if (atomic_read(&sk->sk_rmem_alloc) >= (unsigned)sk->sk_rcvbuf)
> > >  		goto drop_n_acct;
> > >  
> > >  	if (skb_shared(skb)) {
> > > @@ -1763,7 +1762,7 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
> > >  	if (po->tp_version <= TPACKET_V2) {
> > >  		if (macoff + snaplen > po->rx_ring.frame_size) {
> > >  			if (po->copy_thresh &&
> > > -				atomic_read(&sk->sk_rmem_alloc) + skb->truesize
> > > +				atomic_read(&sk->sk_rmem_alloc)
> > >  				< (unsigned)sk->sk_rcvbuf) {
> > >  				if (skb_shared(skb)) {
> > >  					copy_skb = skb_clone(skb, GFP_ATOMIC);
> > > 
> > > 
> > > 
> > > 
> > 
> > It doesn't work too.
> > It is possible to see this behavior on qemu. What about if I prepare you package with cross toolchain, rootfs
> > and you can add debug message where you want?
> > 
> > I have also tried ll_temac driver with ppc440 and behavior is the same.
> > 
> > Max FRAME_SIZE pass to netdev_alloc_skb_ip_align is 7966. For this value ping works.
> > (For ll_temac driver it is #define XTE_JUMBO_MTU 7948 from ll_temac.h)
> 
> I did several tests with MTU 9000 on my machines and it works well.
> 
> It seems my pings (iputils-sss20071127 or iputils-sss20101006) uses a
> big enough RCVBUF
> 
> setsockopt(3, SOL_SOCKET, SO_RCVBUF, [65536], 4) = 0
> getsockopt(3, SOL_SOCKET, SO_RCVBUF, [131072], [4]) = 0
> 
> Could you check with "strace ping..." what is doing busybox ?

I found it : Its too small for jumbo frames.

setsockopt(3, SOL_SOCKET, SO_RCVBUF, [7280], 4) = 0

networking/ping.c


        /* set recv buf (needed if we can get lots of responses: flood ping,
         * broadcast ping etc) */
        sockopt = (datalen * 2) + 7 * 1024; /* giving it a bit of extra room */
        setsockopt(pingsock, SOL_SOCKET, SO_RCVBUF, &sockopt, sizeof(sockopt));

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: ICMP packets - ll_temac with Microblaze
  2011-12-21 14:24                           ` Michal Simek
  2011-12-21 15:30                             ` Eric Dumazet
@ 2011-12-21 15:59                             ` Eric Dumazet
  2011-12-21 17:11                               ` Eric Dumazet
  1 sibling, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2011-12-21 15:59 UTC (permalink / raw)
  To: monstr; +Cc: David Miller, John Williams, netdev

Le mercredi 21 décembre 2011 à 15:24 +0100, Michal Simek a écrit :
> Eric Dumazet wrote:
> > Le mercredi 21 décembre 2011 à 14:28 +0100, Michal Simek a écrit :
> > 
> >> ok. Can you provide me any background why size should be setup by
> >> size = SKB_WITH_OVERHEAD(ksize(data));
> >> and not to use size which is passed to kmalloc in __alloc_skb.
> > 
> > Its all about memory accounting (based on skb->truesize)
> > 
> > Prior to the patch, we could fool memory accounting because skbs claimed
> > to use less memory than what they really used.
> > 
> > And crash machines eventually.
> > 
> > Now memory accouting is fixed, we probably need to change some points in
> > the kernel, where we previously accepted a small skb, but not a very
> > large one.
> > 
> > Since "ping" probably uses SOCK_RAW sockets, I'll try this one :
> > 
> > (We dont care of _this_ skb truesize, only on the count of previously
> > queued packets)
> > 
> > 
> > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> > index 0da505c..a809a48 100644
> > --- a/net/packet/af_packet.c
> > +++ b/net/packet/af_packet.c
> > @@ -1631,8 +1631,7 @@ static int packet_rcv(struct sk_buff *skb, struct net_device *dev,
> >  	if (snaplen > res)
> >  		snaplen = res;
> >  
> > -	if (atomic_read(&sk->sk_rmem_alloc) + skb->truesize >=
> > -	    (unsigned)sk->sk_rcvbuf)
> > +	if (atomic_read(&sk->sk_rmem_alloc) >= (unsigned)sk->sk_rcvbuf)
> >  		goto drop_n_acct;
> >  
> >  	if (skb_shared(skb)) {
> > @@ -1763,7 +1762,7 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
> >  	if (po->tp_version <= TPACKET_V2) {
> >  		if (macoff + snaplen > po->rx_ring.frame_size) {
> >  			if (po->copy_thresh &&
> > -				atomic_read(&sk->sk_rmem_alloc) + skb->truesize
> > +				atomic_read(&sk->sk_rmem_alloc)
> >  				< (unsigned)sk->sk_rcvbuf) {
> >  				if (skb_shared(skb)) {
> >  					copy_skb = skb_clone(skb, GFP_ATOMIC);
> > 
> > 
> > 
> > 
> 
> It doesn't work too.
> It is possible to see this behavior on qemu. What about if I prepare you package with cross toolchain, rootfs
> and you can add debug message where you want?
> 
> I have also tried ll_temac driver with ppc440 and behavior is the same.
> 
> Max FRAME_SIZE pass to netdev_alloc_skb_ip_align is 7966. For this value ping works.
> (For ll_temac driver it is #define XTE_JUMBO_MTU 7948 from ll_temac.h)
> 

I wonder if you applied/tested my patch correctly, since it really
should had help in your case  (allowing first received packet to be
queued, even if very big)

Given the way NIC drivers work (pre-allocating big buffers before
hardware fill frames in them), SO_RCVBUF limits are difficult to
respect.

We should allow at least one frame, even with a very small SO_RCVBUF
setting, or silently cap a minimum sane value.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: ICMP packets - ll_temac with Microblaze
  2011-12-21 15:44                               ` Eric Dumazet
@ 2011-12-21 16:01                                 ` Jun Zhao
  2011-12-21 16:05                                   ` Eric Dumazet
  0 siblings, 1 reply; 36+ messages in thread
From: Jun Zhao @ 2011-12-21 16:01 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: monstr, David Miller, John Williams, netdev

On Wed, 2011-12-21 at 16:44 +0100, Eric Dumazet wrote:
> Le mercredi 21 décembre 2011 à 16:30 +0100, Eric Dumazet a écrit :
> > Le mercredi 21 décembre 2011 à 15:24 +0100, Michal Simek a écrit :
> > > Eric Dumazet wrote:
> > > > Le mercredi 21 décembre 2011 à 14:28 +0100, Michal Simek a écrit :
> > > > 
> > > >> ok. Can you provide me any background why size should be setup by
> > > >> size = SKB_WITH_OVERHEAD(ksize(data));
> > > >> and not to use size which is passed to kmalloc in __alloc_skb.
> > > > 
> > > > Its all about memory accounting (based on skb->truesize)
> > > > 
> > > > Prior to the patch, we could fool memory accounting because skbs claimed
> > > > to use less memory than what they really used.
> > > > 
> > > > And crash machines eventually.
> > > > 
> > > > Now memory accouting is fixed, we probably need to change some points in
> > > > the kernel, where we previously accepted a small skb, but not a very
> > > > large one.
> > > > 
> > > > Since "ping" probably uses SOCK_RAW sockets, I'll try this one :
> > > > 
> > > > (We dont care of _this_ skb truesize, only on the count of previously
> > > > queued packets)
> > > > 
> > > > 
> > > > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> > > > index 0da505c..a809a48 100644
> > > > --- a/net/packet/af_packet.c
> > > > +++ b/net/packet/af_packet.c
> > > > @@ -1631,8 +1631,7 @@ static int packet_rcv(struct sk_buff *skb, struct net_device *dev,
> > > >  	if (snaplen > res)
> > > >  		snaplen = res;
> > > >  
> > > > -	if (atomic_read(&sk->sk_rmem_alloc) + skb->truesize >=
> > > > -	    (unsigned)sk->sk_rcvbuf)
> > > > +	if (atomic_read(&sk->sk_rmem_alloc) >= (unsigned)sk->sk_rcvbuf)
> > > >  		goto drop_n_acct;
> > > >  
> > > >  	if (skb_shared(skb)) {
> > > > @@ -1763,7 +1762,7 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
> > > >  	if (po->tp_version <= TPACKET_V2) {
> > > >  		if (macoff + snaplen > po->rx_ring.frame_size) {
> > > >  			if (po->copy_thresh &&
> > > > -				atomic_read(&sk->sk_rmem_alloc) + skb->truesize
> > > > +				atomic_read(&sk->sk_rmem_alloc)
> > > >  				< (unsigned)sk->sk_rcvbuf) {
> > > >  				if (skb_shared(skb)) {
> > > >  					copy_skb = skb_clone(skb, GFP_ATOMIC);
> > > > 
> > > > 
> > > > 
> > > > 
> > > 
> > > It doesn't work too.
> > > It is possible to see this behavior on qemu. What about if I prepare you package with cross toolchain, rootfs
> > > and you can add debug message where you want?
> > > 
> > > I have also tried ll_temac driver with ppc440 and behavior is the same.
> > > 
> > > Max FRAME_SIZE pass to netdev_alloc_skb_ip_align is 7966. For this value ping works.
> > > (For ll_temac driver it is #define XTE_JUMBO_MTU 7948 from ll_temac.h)
> > 
> > I did several tests with MTU 9000 on my machines and it works well.
> > 
> > It seems my pings (iputils-sss20071127 or iputils-sss20101006) uses a
> > big enough RCVBUF
> > 
> > setsockopt(3, SOL_SOCKET, SO_RCVBUF, [65536], 4) = 0
> > getsockopt(3, SOL_SOCKET, SO_RCVBUF, [131072], [4]) = 0
> > 
> > Could you check with "strace ping..." what is doing busybox ?
> 
> I found it : Its too small for jumbo frames.
> 
> setsockopt(3, SOL_SOCKET, SO_RCVBUF, [7280], 4) = 0
> 
> networking/ping.c
> 
> 
>         /* set recv buf (needed if we can get lots of responses: flood ping,
>          * broadcast ping etc) */
>         sockopt = (datalen * 2) + 7 * 1024; /* giving it a bit of extra room */
>         setsockopt(pingsock, SOL_SOCKET, SO_RCVBUF, &sockopt, sizeof(sockopt));
> 
> 
> 
> --

Why receive buffer size MUST bigger than the jumbo frames size even if
the packet size is small?

> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: ICMP packets - ll_temac with Microblaze
  2011-12-21 16:01                                 ` Jun Zhao
@ 2011-12-21 16:05                                   ` Eric Dumazet
  2011-12-21 16:11                                     ` Jun Zhao
  2011-12-21 16:39                                     ` David Laight
  0 siblings, 2 replies; 36+ messages in thread
From: Eric Dumazet @ 2011-12-21 16:05 UTC (permalink / raw)
  To: Jun Zhao; +Cc: monstr, David Miller, John Williams, netdev

Le jeudi 22 décembre 2011 à 00:01 +0800, Jun Zhao a écrit :

> Why receive buffer size MUST bigger than the jumbo frames size even if
> the packet size is small?

Thats the way it is in linux.

We count the memory size used by the packet.

If not, a malicious attacker could send 1-byte frames and exhaust your
kernel memory.

If you want to reduce it, you might use copybreak : Some drivers copy
the data into a small skb instead of providing a jumbo frame to upper
stack.

tg3 for example.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: ICMP packets - ll_temac with Microblaze
  2011-12-21 16:05                                   ` Eric Dumazet
@ 2011-12-21 16:11                                     ` Jun Zhao
  2011-12-21 16:39                                     ` David Laight
  1 sibling, 0 replies; 36+ messages in thread
From: Jun Zhao @ 2011-12-21 16:11 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: monstr, David Miller, John Williams, netdev

On Wed, 2011-12-21 at 17:05 +0100, Eric Dumazet wrote:
> Le jeudi 22 décembre 2011 à 00:01 +0800, Jun Zhao a écrit :
> 
> > Why receive buffer size MUST bigger than the jumbo frames size even if
> > the packet size is small?
> 
> Thats the way it is in linux.
> 
> We count the memory size used by the packet.
> 
> If not, a malicious attacker could send 1-byte frames and exhaust your
> kernel memory.
> 
> If you want to reduce it, you might use copybreak : Some drivers copy
> the data into a small skb instead of providing a jumbo frame to upper
> stack.
> 
> tg3 for example.
> 
> 
> 

Eric Dumazet:

I got it, thanks for you explanation. 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: ICMP packets - ll_temac with Microblaze
  2011-12-21 16:05                                   ` Eric Dumazet
  2011-12-21 16:11                                     ` Jun Zhao
@ 2011-12-21 16:39                                     ` David Laight
  2011-12-21 16:50                                       ` Eric Dumazet
  1 sibling, 1 reply; 36+ messages in thread
From: David Laight @ 2011-12-21 16:39 UTC (permalink / raw)
  To: Eric Dumazet, Jun Zhao; +Cc: monstr, David Miller, John Williams, netdev

 
> If not, a malicious attacker could send 1-byte frames and exhaust your
> kernel memory.

That used to happen in SVR4 - it was possible for single byte TCP (etc)
data to be queued at a STREAM head (which counted actual data bytes)
in a 2k message block.
Actually we also managed to use all kernel memory queueing zero sized
STREAMS messages.

> If you want to reduce it, you might use copybreak : Some drivers copy
> the data into a small skb instead of providing a jumbo frame to upper
> stack.

Many, many moons ago I wrote an ethernet driver that received into
an array of 128 (mostly) 512byte buffers. Full sized frames would have
multiple rx ring entries, but could almost always be copied into a
correctly
sized buffer with a single aligned copy (cache-line aligned if useful).
This was significantly faster than other schemes - especially
on systems where the iommu needed setting to allow the ethernet
hardware to access memory.

I don't know if the linux skb buffers would allow the ethernet
driver to use (say) 1600 byte rx buffers, and link them together
when a long frame arrives. Most ethernet HW I've seen will
fragment long receives.

The painful hardware is that which enforces a 4n byte alignment
on the rx buffer! - on systens that can't do misaligned accesses.
2 bytes of junk would be fine!

	David

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: ICMP packets - ll_temac with Microblaze
  2011-12-21 16:39                                     ` David Laight
@ 2011-12-21 16:50                                       ` Eric Dumazet
  0 siblings, 0 replies; 36+ messages in thread
From: Eric Dumazet @ 2011-12-21 16:50 UTC (permalink / raw)
  To: David Laight; +Cc: Jun Zhao, monstr, David Miller, John Williams, netdev

Le mercredi 21 décembre 2011 à 16:39 +0000, David Laight a écrit :

> Many, many moons ago I wrote an ethernet driver that received into
> an array of 128 (mostly) 512byte buffers. Full sized frames would have
> multiple rx ring entries, but could almost always be copied into a
> correctly
> sized buffer with a single aligned copy (cache-line aligned if useful).
> This was significantly faster than other schemes - especially
> on systems where the iommu needed setting to allow the ethernet
> hardware to access memory.
> 
> I don't know if the linux skb buffers would allow the ethernet
> driver to use (say) 1600 byte rx buffers, and link them together
> when a long frame arrives. Most ethernet HW I've seen will
> fragment long receives.
> 
> The painful hardware is that which enforces a 4n byte alignment
> on the rx buffer! - on systens that can't do misaligned accesses.
> 2 bytes of junk would be fine!

It all depends on hardware capabilities.

Old hardware required a single area per frame. So driver had to talk to
the hardware the same way.

We now have hardware able to split data into several frags to reduce the
overhead.

(Even using different pools to get a low number of frags, and good 
filling ratio)

Take a look at NIU for example.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: ICMP packets - ll_temac with Microblaze
  2011-12-21 15:59                             ` Eric Dumazet
@ 2011-12-21 17:11                               ` Eric Dumazet
  2011-12-21 20:55                                 ` David Miller
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2011-12-21 17:11 UTC (permalink / raw)
  To: monstr; +Cc: David Miller, John Williams, netdev

Le mercredi 21 décembre 2011 à 16:59 +0100, Eric Dumazet a écrit :

> I wonder if you applied/tested my patch correctly, since it really
> should had help in your case  (allowing first received packet to be
> queued, even if very big)

Hmm, I missed another spot in sock_queue_rcv_skb().
(RAW sockets are not PACKET :) )

Here is the patch again, I tested it with 'busybox ping' and MTU=9000,
it solved the problem for me.

Thanks

[PATCH net-next] net: relax rcvbuf limits

skb->truesize might be big even for a small packet.

Its even bigger after commit 87fb4b7b533 (net: more accurate skb
truesize) and big MTU.

We should allow queueing at least one packet per receiver, even with a
low RCVBUF setting.

Reported-by: Michal Simek <monstr@monstr.eu>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/sock.h     |    4 +++-
 net/core/sock.c        |    6 +-----
 net/packet/af_packet.c |    6 ++----
 3 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index bf6b9fd..21bb3b5 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -662,12 +662,14 @@ static inline void __sk_add_backlog(struct sock *sk, struct sk_buff *skb)
 
 /*
  * Take into account size of receive queue and backlog queue
+ * Do not take into account this skb truesize,
+ * to allow even a single big packet to come.
  */
 static inline bool sk_rcvqueues_full(const struct sock *sk, const struct sk_buff *skb)
 {
 	unsigned int qsize = sk->sk_backlog.len + atomic_read(&sk->sk_rmem_alloc);
 
-	return qsize + skb->truesize > sk->sk_rcvbuf;
+	return qsize > sk->sk_rcvbuf;
 }
 
 /* The per-socket spinlock must be held here. */
diff --git a/net/core/sock.c b/net/core/sock.c
index a343286..347b6d9 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -339,11 +339,7 @@ int sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 	unsigned long flags;
 	struct sk_buff_head *list = &sk->sk_receive_queue;
 
-	/* Cast sk->rcvbuf to unsigned... It's pointless, but reduces
-	   number of warnings when compiling with -W --ANK
-	 */
-	if (atomic_read(&sk->sk_rmem_alloc) + skb->truesize >=
-	    (unsigned)sk->sk_rcvbuf) {
+	if (atomic_read(&sk->sk_rmem_alloc) >= sk->sk_rcvbuf) {
 		atomic_inc(&sk->sk_drops);
 		trace_sock_rcvqueue_full(sk, skb);
 		return -ENOMEM;
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 0da505c..e56ca75 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1631,8 +1631,7 @@ static int packet_rcv(struct sk_buff *skb, struct net_device *dev,
 	if (snaplen > res)
 		snaplen = res;
 
-	if (atomic_read(&sk->sk_rmem_alloc) + skb->truesize >=
-	    (unsigned)sk->sk_rcvbuf)
+	if (atomic_read(&sk->sk_rmem_alloc) >= sk->sk_rcvbuf)
 		goto drop_n_acct;
 
 	if (skb_shared(skb)) {
@@ -1763,8 +1762,7 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
 	if (po->tp_version <= TPACKET_V2) {
 		if (macoff + snaplen > po->rx_ring.frame_size) {
 			if (po->copy_thresh &&
-				atomic_read(&sk->sk_rmem_alloc) + skb->truesize
-				< (unsigned)sk->sk_rcvbuf) {
+			    atomic_read(&sk->sk_rmem_alloc) < sk->sk_rcvbuf) {
 				if (skb_shared(skb)) {
 					copy_skb = skb_clone(skb, GFP_ATOMIC);
 				} else {

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: ICMP packets - ll_temac with Microblaze
  2011-12-21 17:11                               ` Eric Dumazet
@ 2011-12-21 20:55                                 ` David Miller
  2011-12-22  7:49                                   ` Michal Simek
  0 siblings, 1 reply; 36+ messages in thread
From: David Miller @ 2011-12-21 20:55 UTC (permalink / raw)
  To: eric.dumazet; +Cc: monstr, john.williams, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 21 Dec 2011 18:11:44 +0100

> [PATCH net-next] net: relax rcvbuf limits
> 
> skb->truesize might be big even for a small packet.
> 
> Its even bigger after commit 87fb4b7b533 (net: more accurate skb
> truesize) and big MTU.
> 
> We should allow queueing at least one packet per receiver, even with a
> low RCVBUF setting.
> 
> Reported-by: Michal Simek <monstr@monstr.eu>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied to net-next, although I was tempted to put it into net.

We may end up backporting this into -stable at some point, we'll
see.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: ICMP packets - ll_temac with Microblaze
  2011-12-21 20:55                                 ` David Miller
@ 2011-12-22  7:49                                   ` Michal Simek
  2011-12-22  7:57                                     ` Eric Dumazet
  0 siblings, 1 reply; 36+ messages in thread
From: Michal Simek @ 2011-12-22  7:49 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, john.williams, netdev

David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 21 Dec 2011 18:11:44 +0100
> 
>> [PATCH net-next] net: relax rcvbuf limits
>>
>> skb->truesize might be big even for a small packet.
>>
>> Its even bigger after commit 87fb4b7b533 (net: more accurate skb
>> truesize) and big MTU.
>>
>> We should allow queueing at least one packet per receiver, even with a
>> low RCVBUF setting.
>>
>> Reported-by: Michal Simek <monstr@monstr.eu>
>> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> Applied to net-next, although I was tempted to put it into net.
> 
> We may end up backporting this into -stable at some point, we'll
> see.

Yes, it works. Thanks Eric.

I hope that this patch will be in v3.2.

Thanks,
Michal

-- 
Michal Simek, Ing. (M.Eng)
w: www.monstr.eu p: +42-0-721842854
Maintainer of Linux kernel 2.6 Microblaze Linux - http://www.monstr.eu/fdt/
Microblaze U-BOOT custodian

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: ICMP packets - ll_temac with Microblaze
  2011-12-22  7:49                                   ` Michal Simek
@ 2011-12-22  7:57                                     ` Eric Dumazet
  2011-12-22  8:05                                       ` Michal Simek
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2011-12-22  7:57 UTC (permalink / raw)
  To: monstr; +Cc: David Miller, john.williams, netdev

Le jeudi 22 décembre 2011 à 08:49 +0100, Michal Simek a écrit :
> David Miller wrote:
> > From: Eric Dumazet <eric.dumazet@gmail.com>
> > Date: Wed, 21 Dec 2011 18:11:44 +0100
> > 
> >> [PATCH net-next] net: relax rcvbuf limits
> >>
> >> skb->truesize might be big even for a small packet.
> >>
> >> Its even bigger after commit 87fb4b7b533 (net: more accurate skb
> >> truesize) and big MTU.
> >>
> >> We should allow queueing at least one packet per receiver, even with a
> >> low RCVBUF setting.
> >>
> >> Reported-by: Michal Simek <monstr@monstr.eu>
> >> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> > 
> > Applied to net-next, although I was tempted to put it into net.
> > 
> > We may end up backporting this into -stable at some point, we'll
> > see.
> 
> Yes, it works. Thanks Eric.
> 
> I hope that this patch will be in v3.2.
> 

Thanks for testing !

I overlooked fact that commit 87fb4b7b533 was already in 3.2, so yes, we
probably can push this to 3.2

(By the way, busybox ping probably doesnt work on 3.1 kernel with a
MTU=9000 non copybreak driver, so its not a clear 3.2 regression)

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: ICMP packets - ll_temac with Microblaze
  2011-12-22  7:57                                     ` Eric Dumazet
@ 2011-12-22  8:05                                       ` Michal Simek
  0 siblings, 0 replies; 36+ messages in thread
From: Michal Simek @ 2011-12-22  8:05 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, john.williams, netdev

Eric Dumazet wrote:
> Le jeudi 22 décembre 2011 à 08:49 +0100, Michal Simek a écrit :
>> David Miller wrote:
>>> From: Eric Dumazet <eric.dumazet@gmail.com>
>>> Date: Wed, 21 Dec 2011 18:11:44 +0100
>>>
>>>> [PATCH net-next] net: relax rcvbuf limits
>>>>
>>>> skb->truesize might be big even for a small packet.
>>>>
>>>> Its even bigger after commit 87fb4b7b533 (net: more accurate skb
>>>> truesize) and big MTU.
>>>>
>>>> We should allow queueing at least one packet per receiver, even with a
>>>> low RCVBUF setting.
>>>>
>>>> Reported-by: Michal Simek <monstr@monstr.eu>
>>>> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
>>> Applied to net-next, although I was tempted to put it into net.
>>>
>>> We may end up backporting this into -stable at some point, we'll
>>> see.
>> Yes, it works. Thanks Eric.
>>
>> I hope that this patch will be in v3.2.
>>
> 
> Thanks for testing !
> 
> I overlooked fact that commit 87fb4b7b533 was already in 3.2, so yes, we
> probably can push this to 3.2

great.

> (By the way, busybox ping probably doesnt work on 3.1 kernel with a
> MTU=9000 non copybreak driver, so its not a clear 3.2 regression)

good to know.

Thanks,
Michal

-- 
Michal Simek, Ing. (M.Eng)
w: www.monstr.eu p: +42-0-721842854
Maintainer of Linux kernel 2.6 Microblaze Linux - http://www.monstr.eu/fdt/
Microblaze U-BOOT custodian

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: ICMP packets - ll_temac with Microblaze
  2011-12-21 10:11 ICMP packets - ll_temac with Microblaze Michal Simek
  2011-12-21 10:28 ` Eric Dumazet
@ 2011-12-22 10:32 ` Michael Wang
  2011-12-22 10:46   ` Eric Dumazet
  1 sibling, 1 reply; 36+ messages in thread
From: Michael Wang @ 2011-12-22 10:32 UTC (permalink / raw)
  To: monstr; +Cc: Eric Dumazet, David Miller, John Williams, netdev

On 12/21/2011 06:11 PM, Michal Simek wrote:

> Hi Eric and David,
> 
> I have found one problem with ll_temac driver and
> this commit: net: more accurate skb truesize
> sha1: 87fb4b7b533073eeeaed0b6bf7c2328995f6c075
> 
> The problem is only with icmp packets from the target. It is sent and
> driver receive it
> but it is not proceed to the application.
> 

Hi, Michal

What's the type of icmp you are using? such as "EchoReps", we can find
the actually handler routine by this type.

And you said the packet already received by driver, can you tell me the
way you used to confirm this?

And have you checked when was the icmp packet being dropped, is it in
icmp_rcv or before or later?

Thanks,
Michael Wang

> The problem I see is that kmalloc_node_track_caller allocate
> specific size and then this size is changed by
> SKB_WITH_OVERHEAD(ksize(data)).
> The problem is with netdev_alloc_skb_ip_align which calls __alloc_skb
> function.
> 
> Currently driver uses setting for jumbo frames (9k). When I change it to
> use mtu (1,5k) then
> everything is ok.
> 
> Can you give me some hints what can be wrong?
> 
> Thanks,
> Michal
> 
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: ICMP packets - ll_temac with Microblaze
  2011-12-22 10:32 ` Michael Wang
@ 2011-12-22 10:46   ` Eric Dumazet
  2011-12-22 13:01     ` Michael Wang
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2011-12-22 10:46 UTC (permalink / raw)
  To: Michael Wang; +Cc: monstr, David Miller, John Williams, netdev

Le jeudi 22 décembre 2011 à 18:32 +0800, Michael Wang a écrit :
> On 12/21/2011 06:11 PM, Michal Simek wrote:
> 
> > Hi Eric and David,
> > 
> > I have found one problem with ll_temac driver and
> > this commit: net: more accurate skb truesize
> > sha1: 87fb4b7b533073eeeaed0b6bf7c2328995f6c075
> > 
> > The problem is only with icmp packets from the target. It is sent and
> > driver receive it
> > but it is not proceed to the application.
> > 
> 
> Hi, Michal
> 
> What's the type of icmp you are using? such as "EchoReps", we can find
> the actually handler routine by this type.
> 
> And you said the packet already received by driver, can you tell me the
> way you used to confirm this?
> 
> And have you checked when was the icmp packet being dropped, is it in
> icmp_rcv or before or later?

Packet was dropped right before being queued in RAW socket
receive_queue, because of low sk_rcvbuf setting (against skb->truesize)

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: ICMP packets - ll_temac with Microblaze
  2011-12-22 10:46   ` Eric Dumazet
@ 2011-12-22 13:01     ` Michael Wang
  2011-12-22 13:21       ` Eric Dumazet
  0 siblings, 1 reply; 36+ messages in thread
From: Michael Wang @ 2011-12-22 13:01 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: monstr, David Miller, John Williams, netdev

On 12/22/2011 06:46 PM, Eric Dumazet wrote:

> Le jeudi 22 décembre 2011 à 18:32 +0800, Michael Wang a écrit :
>> On 12/21/2011 06:11 PM, Michal Simek wrote:
>>
>>> Hi Eric and David,
>>>
>>> I have found one problem with ll_temac driver and
>>> this commit: net: more accurate skb truesize
>>> sha1: 87fb4b7b533073eeeaed0b6bf7c2328995f6c075
>>>
>>> The problem is only with icmp packets from the target. It is sent and
>>> driver receive it
>>> but it is not proceed to the application.
>>>
>>
>> Hi, Michal
>>
>> What's the type of icmp you are using? such as "EchoReps", we can find
>> the actually handler routine by this type.
>>
>> And you said the packet already received by driver, can you tell me the
>> way you used to confirm this?
>>
>> And have you checked when was the icmp packet being dropped, is it in
>> icmp_rcv or before or later?
> 
> Packet was dropped right before being queued in RAW socket
> receive_queue, because of low sk_rcvbuf setting (against skb->truesize)
> 

Hi, Eric

So do you mean the way is:

ip_local_deliver_finish --> raw_local_deliver --> raw_v4_input -->
raw_rcv --> raw_rcv_skb --> sock_queue_rcv_skb

And in sock_queue_rcv_skb:

        if (atomic_read(&sk->sk_rmem_alloc) + skb->truesize >=
            (unsigned)sk->sk_rcvbuf) {
                atomic_inc(&sk->sk_drops);
                trace_sock_rcvqueue_full(sk, skb);
                return -ENOMEM;				//drop?
        }

Is this the way the packet being dropped? Is this confirmed by some printk?

Sorry if the question is naive, I just want to make sure I am checking
at the right place.

Regards,
Michael Wang

> 
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: ICMP packets - ll_temac with Microblaze
  2011-12-22 13:01     ` Michael Wang
@ 2011-12-22 13:21       ` Eric Dumazet
  2011-12-22 13:42         ` Michael Wang
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2011-12-22 13:21 UTC (permalink / raw)
  To: Michael Wang; +Cc: monstr, David Miller, John Williams, netdev

Le jeudi 22 décembre 2011 à 21:01 +0800, Michael Wang a écrit :

> Hi, Eric
> 
> So do you mean the way is:
> 
> ip_local_deliver_finish --> raw_local_deliver --> raw_v4_input -->
> raw_rcv --> raw_rcv_skb --> sock_queue_rcv_skb
> 
> And in sock_queue_rcv_skb:
> 
>         if (atomic_read(&sk->sk_rmem_alloc) + skb->truesize >=
>             (unsigned)sk->sk_rcvbuf) {
>                 atomic_inc(&sk->sk_drops);
>                 trace_sock_rcvqueue_full(sk, skb);
>                 return -ENOMEM;				//drop?
>         }
> 
> Is this the way the packet being dropped? Is this confirmed by some printk?
> 
> Sorry if the question is naive, I just want to make sure I am checking
> at the right place.
> 

Yep

Are you aware I already posted a patch to solve the problem ?

http://patchwork.ozlabs.org/patch/132698/

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: ICMP packets - ll_temac with Microblaze
  2011-12-22 13:21       ` Eric Dumazet
@ 2011-12-22 13:42         ` Michael Wang
  0 siblings, 0 replies; 36+ messages in thread
From: Michael Wang @ 2011-12-22 13:42 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: monstr, David Miller, John Williams, netdev

On 12/22/2011 09:21 PM, Eric Dumazet wrote:

> Le jeudi 22 décembre 2011 à 21:01 +0800, Michael Wang a écrit :
> 
>> Hi, Eric
>>
>> So do you mean the way is:
>>
>> ip_local_deliver_finish --> raw_local_deliver --> raw_v4_input -->
>> raw_rcv --> raw_rcv_skb --> sock_queue_rcv_skb
>>
>> And in sock_queue_rcv_skb:
>>
>>         if (atomic_read(&sk->sk_rmem_alloc) + skb->truesize >=
>>             (unsigned)sk->sk_rcvbuf) {
>>                 atomic_inc(&sk->sk_drops);
>>                 trace_sock_rcvqueue_full(sk, skb);
>>                 return -ENOMEM;				//drop?
>>         }
>>
>> Is this the way the packet being dropped? Is this confirmed by some printk?
>>
>> Sorry if the question is naive, I just want to make sure I am checking
>> at the right place.
>>
> 
> Yep
> 
> Are you aware I already posted a patch to solve the problem ?
> 
> http://patchwork.ozlabs.org/patch/132698/
> 

Oh, sorry, I don't know about that, congratulation :)

Regards,
Michael Wang

> 
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2011-12-22 13:42 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-12-21 10:11 ICMP packets - ll_temac with Microblaze Michal Simek
2011-12-21 10:28 ` Eric Dumazet
2011-12-21 10:30   ` Michal Simek
2011-12-21 10:30   ` Eric Dumazet
2011-12-21 10:32     ` Michal Simek
2011-12-21 10:37       ` Eric Dumazet
2011-12-21 10:41         ` Michal Simek
2011-12-21 10:45           ` Eric Dumazet
2011-12-21 10:54             ` Michal Simek
2011-12-21 11:05               ` Eric Dumazet
2011-12-21 11:03             ` Eric Dumazet
2011-12-21 11:10               ` Michal Simek
2011-12-21 11:13                 ` Eric Dumazet
2011-12-21 11:50                   ` Michal Simek
2011-12-21 12:39                     ` Eric Dumazet
2011-12-21 13:28                       ` Michal Simek
2011-12-21 13:40                         ` Eric Dumazet
2011-12-21 14:24                           ` Michal Simek
2011-12-21 15:30                             ` Eric Dumazet
2011-12-21 15:44                               ` Eric Dumazet
2011-12-21 16:01                                 ` Jun Zhao
2011-12-21 16:05                                   ` Eric Dumazet
2011-12-21 16:11                                     ` Jun Zhao
2011-12-21 16:39                                     ` David Laight
2011-12-21 16:50                                       ` Eric Dumazet
2011-12-21 15:59                             ` Eric Dumazet
2011-12-21 17:11                               ` Eric Dumazet
2011-12-21 20:55                                 ` David Miller
2011-12-22  7:49                                   ` Michal Simek
2011-12-22  7:57                                     ` Eric Dumazet
2011-12-22  8:05                                       ` Michal Simek
2011-12-22 10:32 ` Michael Wang
2011-12-22 10:46   ` Eric Dumazet
2011-12-22 13:01     ` Michael Wang
2011-12-22 13:21       ` Eric Dumazet
2011-12-22 13:42         ` Michael Wang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.