netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [patch] tcp: attach SYNACK messages to request sockets instead of listener
@ 2015-10-29 21:49 Haiyang Zhang
  2015-10-29 22:58 ` Eric Dumazet
  0 siblings, 1 reply; 28+ messages in thread
From: Haiyang Zhang @ 2015-10-29 21:49 UTC (permalink / raw)
  To: edumazet; +Cc: David Miller, netdev, KY Srinivasan

Hi Eric,

I saw a panic in __dev_kfree_skb_any() when I ssh into some 
Ubuntu VM with latest Linux-next tree on Hyper-V host.
With git bisecting, I found the patch below is the first commit
with this issue. I also included the stack trace here.
Do you have any idea about what the problem might be?

http://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=ca6fb06518836ef9b65dc0aac02ff97704d52a05
author  Eric Dumazet <edumazet@google.com> 2015-10-02 18:43:35 (GMT) 
commit ca6fb06518836ef9b65dc0aac02ff97704d52a05 (patch) 
tcp: attach SYNACK messages to request sockets instead of listener

Stack trace:
[   96.235084] general protection fault: 0000 [#1] SMP
[   96.235084] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtabl
e_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip
6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_
nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables hyperv_keyboard pcspkr
hv_utils serio_raw i2c_piix4 hyperv_fb i2c_core acpi_cpufreq uinput xfs libcrc32c sd_mod sr_mod cdrom ata_generic pata_
acpi hid_hyperv hv_netvsc hv_storvsc ata_piix libata hv_vmbus floppy dm_mirror dm_region_hash dm_log dm_mod
[   96.235084] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.3.0-rc6-next-20151021+ #1
[   96.235084] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006  05/23/2012
[   96.235084] task: ffff880101bf0000 ti: ffff880101bf8000 task.ti: ffff880101bf8000
[   96.235084] RIP: 0010:[<ffffffff8158b17c>]  [<ffffffff8158b17c>] sock_wfree+0x4c/0x60
[   96.235084] RSP: 0018:ffff880102643da8  EFLAGS: 00010292
[   96.235084] RAX: 00000000000004ff RBX: ffff8800f2d50000 RCX: 0000000000000000
[   96.235084] RDX: ffff8800f1af0000 RSI: 0000000000000001 RDI: ffff8800f2d50000
[   96.235084] RBP: ffff880102643db8 R08: ffff8800f2086000 R09: 000000000007efc8
[   96.235084] R10: ffff880036800000 R11: 0000000000000000 R12: ffff8800f2d50124
[   96.235084] R13: ffff880036800000 R14: ffff880035d80000 R15: ffff8800f39b7c00
[   96.770086] FS:  0000000000000000(0000) GS:ffff880102640000(0000) knlGS:0000000000000000
[   96.770086] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   96.770086] CR2: 00007efefe680514 CR3: 0000000036bee000 CR4: 00000000000006e0
[   96.770086] Stack:
[   96.770086]  ffff8800f2e93800 ffff8800f2e93800 ffff880102643dd0 ffffffff8158c42f
[   96.770086]  ffff8800f2e93800 ffff880102643de8 ffffffff8158dac2 ffff8800f2087000
[   96.770086]  ffff880102643e08 ffffffff8158e06c ffff8800f2087000 0000000000001000
[   96.770086] Call Trace:
[   96.770086]  <IRQ>
[   96.770086]  [<ffffffff8158c42f>] skb_release_head_state+0x4f/0xb0
[   96.770086]  [<ffffffff8158dac2>] skb_release_all+0x12/0x30
[   96.770086]  [<ffffffff8158e06c>] consume_skb+0x2c/0x70
[   96.770086]  [<ffffffff8159f885>] __dev_kfree_skb_any+0x35/0x40
[   96.770086]  [<ffffffffa00ef0fc>] netvsc_xmit_completion+0x1c/0x20 [hv_netvsc]
[   96.770086]  [<ffffffffa00f12c7>] netvsc_channel_cb+0x217/0x3f0 [hv_netvsc]
[   96.770086]  [<ffffffffa0059584>] vmbus_on_event+0x154/0x190 [hv_vmbus]
[   96.770086]  [<ffffffff81083495>] tasklet_action+0xe5/0xf0
[   96.770086]  [<ffffffff810836f7>] __do_softirq+0xd7/0x2a0
[   96.770086]  [<ffffffff81083b65>] irq_exit+0xf5/0x100
[   96.770086]  [<ffffffff8104da4e>] hyperv_vector_handler+0x3e/0x50
[   96.770086]  [<ffffffff816ae717>] hyperv_callback_vector+0x87/0x90
[   96.770086]  <EOI>
[   96.770086]  [<ffffffff810635a6>] ? native_safe_halt+0x6/0x10
[   96.770086]  [<ffffffff81021aee>] default_idle+0x1e/0xa0
[   96.770086]  [<ffffffff8102227f>] arch_cpu_idle+0xf/0x20
[   96.770086]  [<ffffffff810c1492>] default_idle_call+0x32/0x40
[   96.770086]  [<ffffffff810c17be>] cpu_startup_entry+0x2be/0x330
[   96.770086]  [<ffffffff810503a0>] start_secondary+0x190/0x1d0
[   96.770086] Code: 80 e6 02 74 19 f0 41 29 04 24 74 05 5b 41 5c 5d c3 48 89 df e8 b6 f8 ff ff 5b 41 5c 5d c3 83 e8 01
f0 29 83 24 01 00 00 48 89 df <ff> 93 a0 02 00 00 b8 01 00 00 00 eb cd 0f 1f 80 00 00 00 00 66
[   96.770086] RIP  [<ffffffff8158b17c>] sock_wfree+0x4c/0x60
[   96.770086]  RSP <ffff880102643da8>
[   97.572206] ---[ end trace 0d1199c7e6a1aaa4 ]---
[   97.573146] Kernel panic - not syncing: Fatal exception in interrupt
[   97.573146] Kernel Offset: disabled
[   97.573146] ---[ end Kernel panic - not syncing: Fatal exception in interrupt

Thanks,
- Haiyang

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [patch] tcp: attach SYNACK messages to request sockets instead of listener
  2015-10-29 21:49 [patch] tcp: attach SYNACK messages to request sockets instead of listener Haiyang Zhang
@ 2015-10-29 22:58 ` Eric Dumazet
  2015-10-30 19:38   ` Haiyang Zhang
  0 siblings, 1 reply; 28+ messages in thread
From: Eric Dumazet @ 2015-10-29 22:58 UTC (permalink / raw)
  To: Haiyang Zhang; +Cc: edumazet, David Miller, netdev, KY Srinivasan

On Thu, 2015-10-29 at 21:49 +0000, Haiyang Zhang wrote:
> Hi Eric,
> 
> I saw a panic in __dev_kfree_skb_any() when I ssh into some 
> Ubuntu VM with latest Linux-next tree on Hyper-V host.
> With git bisecting, I found the patch below is the first commit
> with this issue. I also included the stack trace here.
> Do you have any idea about what the problem might be?
> 
> http://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=ca6fb06518836ef9b65dc0aac02ff97704d52a05
> author  Eric Dumazet <edumazet@google.com> 2015-10-02 18:43:35 (GMT) 
> commit ca6fb06518836ef9b65dc0aac02ff97704d52a05 (patch) 
> tcp: attach SYNACK messages to request sockets instead of listener
> 
> Stack trace:
> [   96.235084] general protection fault: 0000 [#1] SMP
> [   96.235084] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtabl
> e_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip
> 6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_
> nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables hyperv_keyboard pcspkr
> hv_utils serio_raw i2c_piix4 hyperv_fb i2c_core acpi_cpufreq uinput xfs libcrc32c sd_mod sr_mod cdrom ata_generic pata_
> acpi hid_hyperv hv_netvsc hv_storvsc ata_piix libata hv_vmbus floppy dm_mirror dm_region_hash dm_log dm_mod
> [   96.235084] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.3.0-rc6-next-20151021+ #1
> [   96.235084] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006  05/23/2012
> [   96.235084] task: ffff880101bf0000 ti: ffff880101bf8000 task.ti: ffff880101bf8000
> [   96.235084] RIP: 0010:[<ffffffff8158b17c>]  [<ffffffff8158b17c>] sock_wfree+0x4c/0x60
> [   96.235084] RSP: 0018:ffff880102643da8  EFLAGS: 00010292
> [   96.235084] RAX: 00000000000004ff RBX: ffff8800f2d50000 RCX: 0000000000000000
> [   96.235084] RDX: ffff8800f1af0000 RSI: 0000000000000001 RDI: ffff8800f2d50000
> [   96.235084] RBP: ffff880102643db8 R08: ffff8800f2086000 R09: 000000000007efc8
> [   96.235084] R10: ffff880036800000 R11: 0000000000000000 R12: ffff8800f2d50124
> [   96.235084] R13: ffff880036800000 R14: ffff880035d80000 R15: ffff8800f39b7c00
> [   96.770086] FS:  0000000000000000(0000) GS:ffff880102640000(0000) knlGS:0000000000000000
> [   96.770086] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [   96.770086] CR2: 00007efefe680514 CR3: 0000000036bee000 CR4: 00000000000006e0
> [   96.770086] Stack:
> [   96.770086]  ffff8800f2e93800 ffff8800f2e93800 ffff880102643dd0 ffffffff8158c42f
> [   96.770086]  ffff8800f2e93800 ffff880102643de8 ffffffff8158dac2 ffff8800f2087000
> [   96.770086]  ffff880102643e08 ffffffff8158e06c ffff8800f2087000 0000000000001000
> [   96.770086] Call Trace:
> [   96.770086]  <IRQ>
> [   96.770086]  [<ffffffff8158c42f>] skb_release_head_state+0x4f/0xb0
> [   96.770086]  [<ffffffff8158dac2>] skb_release_all+0x12/0x30
> [   96.770086]  [<ffffffff8158e06c>] consume_skb+0x2c/0x70
> [   96.770086]  [<ffffffff8159f885>] __dev_kfree_skb_any+0x35/0x40
> [   96.770086]  [<ffffffffa00ef0fc>] netvsc_xmit_completion+0x1c/0x20 [hv_netvsc]
> [   96.770086]  [<ffffffffa00f12c7>] netvsc_channel_cb+0x217/0x3f0 [hv_netvsc]
> [   96.770086]  [<ffffffffa0059584>] vmbus_on_event+0x154/0x190 [hv_vmbus]
> [   96.770086]  [<ffffffff81083495>] tasklet_action+0xe5/0xf0
> [   96.770086]  [<ffffffff810836f7>] __do_softirq+0xd7/0x2a0
> [   96.770086]  [<ffffffff81083b65>] irq_exit+0xf5/0x100
> [   96.770086]  [<ffffffff8104da4e>] hyperv_vector_handler+0x3e/0x50
> [   96.770086]  [<ffffffff816ae717>] hyperv_callback_vector+0x87/0x90
> [   96.770086]  <EOI>
> [   96.770086]  [<ffffffff810635a6>] ? native_safe_halt+0x6/0x10
> [   96.770086]  [<ffffffff81021aee>] default_idle+0x1e/0xa0
> [   96.770086]  [<ffffffff8102227f>] arch_cpu_idle+0xf/0x20
> [   96.770086]  [<ffffffff810c1492>] default_idle_call+0x32/0x40
> [   96.770086]  [<ffffffff810c17be>] cpu_startup_entry+0x2be/0x330
> [   96.770086]  [<ffffffff810503a0>] start_secondary+0x190/0x1d0
> [   96.770086] Code: 80 e6 02 74 19 f0 41 29 04 24 74 05 5b 41 5c 5d c3 48 89 df e8 b6 f8 ff ff 5b 41 5c 5d c3 83 e8 01
> f0 29 83 24 01 00 00 48 89 df <ff> 93 a0 02 00 00 b8 01 00 00 00 eb cd 0f 1f 80 00 00 00 00 66
> [   96.770086] RIP  [<ffffffff8158b17c>] sock_wfree+0x4c/0x60
> [   96.770086]  RSP <ffff880102643da8>
> [   97.572206] ---[ end trace 0d1199c7e6a1aaa4 ]---
> [   97.573146] Kernel panic - not syncing: Fatal exception in interrupt
> [   97.573146] Kernel Offset: disabled
> [   97.573146] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
> 
> Thanks,
> - Haiyang
> 

Thanks for this report.

Somehow I knew such bugs would surface ;)

Please try following debugging patch ?

We need to identify which part of the kernel is messed up.

diff --git a/include/net/sock.h b/include/net/sock.h
index aeed5c95f3ca..a643499d37e2 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1951,6 +1951,14 @@ static inline void skb_set_hash_from_sk(struct sk_buff *skb, struct sock *sk)
 	}
 }
 
+/* This helper checks if a socket is a full socket,
+ * ie _not_ a timewait or request socket.
+ */
+static inline bool sk_fullsock(const struct sock *sk)
+{
+	return (1 << sk->sk_state) & ~(TCPF_TIME_WAIT | TCPF_NEW_SYN_RECV);
+}
+
 /*
  *	Queue a received datagram if it will fit. Stream and sequenced
  *	protocols can't normally use this as they need to fit buffers in
@@ -1962,6 +1970,10 @@ static inline void skb_set_hash_from_sk(struct sk_buff *skb, struct sock *sk)
 
 static inline void skb_set_owner_w(struct sk_buff *skb, struct sock *sk)
 {
+	if (!sk_fullsock(sk)) {
+		WARN_ON_ONCE(1);
+		return;
+	}
 	skb_orphan(skb);
 	skb->sk = sk;
 	skb->destructor = sock_wfree;
@@ -2223,14 +2235,6 @@ static inline struct sock *skb_steal_sock(struct sk_buff *skb)
 	return NULL;
 }
 
-/* This helper checks if a socket is a full socket,
- * ie _not_ a timewait or request socket.
- */
-static inline bool sk_fullsock(const struct sock *sk)
-{
-	return (1 << sk->sk_state) & ~(TCPF_TIME_WAIT | TCPF_NEW_SYN_RECV);
-}
-
 /* This helper checks if a socket is a LISTEN or NEW_SYN_RECV
  * SYNACK messages can be attached to either ones (depending on SYNCOOKIE)
  */

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* RE: [patch] tcp: attach SYNACK messages to request sockets instead of listener
  2015-10-29 22:58 ` Eric Dumazet
@ 2015-10-30 19:38   ` Haiyang Zhang
  2015-10-30 20:02     ` Eric Dumazet
  0 siblings, 1 reply; 28+ messages in thread
From: Haiyang Zhang @ 2015-10-30 19:38 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: edumazet, David Miller, netdev, KY Srinivasan



> -----Original Message-----
> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> Sent: Thursday, October 29, 2015 6:59 PM
> To: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: edumazet@google.com; David Miller <davem@davemloft.net>;
> netdev@vger.kernel.org; KY Srinivasan <kys@microsoft.com>
> Subject: Re: [patch] tcp: attach SYNACK messages to request sockets
> instead of listener
> 
> 
> Thanks for this report.
> 
> Somehow I knew such bugs would surface ;)
> 
> Please try following debugging patch ?
> 
> We need to identify which part of the kernel is messed up.
> 
> diff --git a/include/net/sock.h b/include/net/sock.h
> index aeed5c95f3ca..a643499d37e2 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1951,6 +1951,14 @@ static inline void skb_set_hash_from_sk(struct
> sk_buff *skb, struct sock *sk)
>  	}
>  }
> 
> +/* This helper checks if a socket is a full socket,
> + * ie _not_ a timewait or request socket.
> + */
> +static inline bool sk_fullsock(const struct sock *sk)
> +{
> +	return (1 << sk->sk_state) & ~(TCPF_TIME_WAIT | TCPF_NEW_SYN_RECV);
> +}
> +
>  /*
>   *	Queue a received datagram if it will fit. Stream and sequenced
>   *	protocols can't normally use this as they need to fit buffers in
> @@ -1962,6 +1970,10 @@ static inline void skb_set_hash_from_sk(struct
> sk_buff *skb, struct sock *sk)
> 
>  static inline void skb_set_owner_w(struct sk_buff *skb, struct sock *sk)
>  {
> +	if (!sk_fullsock(sk)) {
> +		WARN_ON_ONCE(1);
> +		return;
> +	}
>  	skb_orphan(skb);
>  	skb->sk = sk;
>  	skb->destructor = sock_wfree;
> @@ -2223,14 +2235,6 @@ static inline struct sock *skb_steal_sock(struct
> sk_buff *skb)
>  	return NULL;
>  }
> 
> -/* This helper checks if a socket is a full socket,
> - * ie _not_ a timewait or request socket.
> - */
> -static inline bool sk_fullsock(const struct sock *sk)
> -{
> -	return (1 << sk->sk_state) & ~(TCPF_TIME_WAIT | TCPF_NEW_SYN_RECV);
> -}
> -
>  /* This helper checks if a socket is a LISTEN or NEW_SYN_RECV
>   * SYNACK messages can be attached to either ones (depending on
> SYNCOOKIE)
>   */
> 

Hi Eric,

Thanks for the debug patch. The panic does not happen anymore with
the patch. I see a warning call trace:

[  222.307948] ------------[ cut here ]------------
[  222.308009] WARNING: CPU: 6 PID: 0 at include/net/sock.h:1974 ip_finish_output2+0x34f/0x360()
[  222.308027] Modules linked in: cfg80211 joydev crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 glue_helper hid_generic lrw gf128mul ablk_helper i2c_piix4 hid_hyperv hyperv_fb hid cryptd hyperv_keyboard 8250_fintek mac_hid serio_raw parport_pc ppdev lp parport autofs4 hv_utils hv_netvsc hv_storvsc psmouse hv_vmbus floppy pata_acpi
[  222.308088] CPU: 6 PID: 0 Comm: swapper/6 Not tainted 4.3.0-rc6-next-20151022+ #2
[  222.308104] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006  05/23/2012
[  222.308120]  ffffffff81b2ae66 ffff88007c783878 ffffffff813a4cf4 0000000000000000
[  222.308137]  ffff88007c7838b0 ffffffff81078cc6 ffff88005bf2dc00 ffff880079f58000
[  222.308153]  ffff88005bf2c000 ffff88005bf2c800 ffff880050b28000 ffff88007c7838c0
[  222.308171] Call Trace:
[  222.308185]  <IRQ>  [<ffffffff813a4cf4>] dump_stack+0x44/0x60
[  222.308212]  [<ffffffff81078cc6>] warn_slowpath_common+0x86/0xc0
[  222.308228]  [<ffffffff81078dba>] warn_slowpath_null+0x1a/0x20
[  222.308245]  [<ffffffff816d2e6f>] ip_finish_output2+0x34f/0x360
[  222.308262]  [<ffffffff816d4589>] ip_finish_output+0x149/0x1e0
[  222.308280]  [<ffffffff816d4f2c>] ip_output+0x5c/0xc0
[  222.308300]  [<ffffffff8101e899>] ? sched_clock+0x9/0x10
[  222.308319]  [<ffffffff810a63a7>] ? sched_clock_local+0x17/0x80
[  222.308335]  [<ffffffff816d4725>] ip_local_out+0x35/0x40
[  222.308351]  [<ffffffff816d487d>] ip_build_and_send_pkt+0x14d/0x1c0
[  222.308369]  [<ffffffff816f39fb>] tcp_v4_send_synack+0x5b/0xb0
[  222.308386]  [<ffffffff816d8fb9>] ? inet_ehash_insert+0x59/0x130
[  222.308404]  [<ffffffff816da266>] ? inet_csk_reqsk_queue_hash_add+0x76/0xa0
[  222.308425]  [<ffffffff816e3223>] tcp_conn_request+0x9b3/0x9f0
[  222.308444]  [<ffffffff816f20bc>] tcp_v4_conn_request+0x4c/0x50
[  222.308458]  [<ffffffff816e940c>] tcp_rcv_state_process+0x19c/0xcb0
[  222.308473]  [<ffffffff817746ec>] ? tcp_v4_inbound_md5_hash+0x6d/0x177
[  222.308485]  [<ffffffff816f2f53>] tcp_v4_do_rcv+0x73/0x210
[  222.308496]  [<ffffffff816f43b1>] tcp_v4_rcv+0x811/0x840
[  222.308511]  [<ffffffff816cda9a>] ? ip_route_input_noref+0xb3a/0xd90
[  222.308524]  [<ffffffff816cf1a3>] ip_local_deliver_finish+0x53/0xe0
[  222.308536]  [<ffffffff816cf690>] ip_local_deliver+0x60/0xd0
[  222.308549]  [<ffffffff816cf2b7>] ip_rcv_finish+0x87/0x2b0
[  222.308561]  [<ffffffff816cf949>] ip_rcv+0x249/0x350
[  222.308574]  [<ffffffff8176796c>] ? packet_rcv+0x4c/0x3e0
[  222.308589]  [<ffffffff81696857>] __netif_receive_skb_core+0x2d7/0x980
[  222.308602]  [<ffffffff81696f18>] __netif_receive_skb+0x18/0x60
[  222.308614]  [<ffffffff81697b38>] process_backlog+0xa8/0x150
[  222.308627]  [<ffffffff816973c3>] net_rx_action+0x1b3/0x2c0
[  222.308641]  [<ffffffff8107d32c>] __do_softirq+0xfc/0x250
[  222.308653]  [<ffffffff8107d5de>] irq_exit+0x8e/0x90
[  222.308667]  [<ffffffff8104a1ce>] hyperv_vector_handler+0x3e/0x50
[  222.308680]  [<ffffffff817835c2>] hyperv_callback_vector+0x82/0x90
[  222.308690]  <EOI>  [<ffffffff8105e256>] ? native_safe_halt+0x6/0x10
[  222.308707]  [<ffffffff8101f7ae>] default_idle+0x1e/0xa0
[  222.308718]  [<ffffffff8101feff>] arch_cpu_idle+0xf/0x20
[  222.308731]  [<ffffffff810b86f2>] default_idle_call+0x32/0x40
[  222.308743]  [<ffffffff810b8a18>] cpu_startup_entry+0x2b8/0x310
[  222.308756]  [<ffffffff8104c238>] start_secondary+0x178/0x1a0
[  222.308769] ---[ end trace 0c71438d4d1b6dca ]---

Thanks,
- Haiyang

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [patch] tcp: attach SYNACK messages to request sockets instead of listener
  2015-10-30 19:38   ` Haiyang Zhang
@ 2015-10-30 20:02     ` Eric Dumazet
  2015-10-30 20:18       ` Eric Dumazet
  2015-10-30 20:28       ` [patch] tcp: attach SYNACK messages to request sockets instead of listener KY Srinivasan
  0 siblings, 2 replies; 28+ messages in thread
From: Eric Dumazet @ 2015-10-30 20:02 UTC (permalink / raw)
  To: Haiyang Zhang; +Cc: edumazet, David Miller, netdev, KY Srinivasan

On Fri, 2015-10-30 at 19:38 +0000, Haiyang Zhang wrote:
> 
> > -----Original Message-----
> > From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> > Sent: Thursday, October 29, 2015 6:59 PM
> > To: Haiyang Zhang <haiyangz@microsoft.com>
> > Cc: edumazet@google.com; David Miller <davem@davemloft.net>;
> > netdev@vger.kernel.org; KY Srinivasan <kys@microsoft.com>
> > Subject: Re: [patch] tcp: attach SYNACK messages to request sockets
> > instead of listener
> > 
> > 
> > Thanks for this report.
> > 
> > Somehow I knew such bugs would surface ;)
> > 
> > Please try following debugging patch ?
> > 
> > We need to identify which part of the kernel is messed up.
> > 
> > diff --git a/include/net/sock.h b/include/net/sock.h
> > index aeed5c95f3ca..a643499d37e2 100644
> > --- a/include/net/sock.h
> > +++ b/include/net/sock.h
> > @@ -1951,6 +1951,14 @@ static inline void skb_set_hash_from_sk(struct
> > sk_buff *skb, struct sock *sk)
> >  	}
> >  }
> > 
> > +/* This helper checks if a socket is a full socket,
> > + * ie _not_ a timewait or request socket.
> > + */
> > +static inline bool sk_fullsock(const struct sock *sk)
> > +{
> > +	return (1 << sk->sk_state) & ~(TCPF_TIME_WAIT | TCPF_NEW_SYN_RECV);
> > +}
> > +
> >  /*
> >   *	Queue a received datagram if it will fit. Stream and sequenced
> >   *	protocols can't normally use this as they need to fit buffers in
> > @@ -1962,6 +1970,10 @@ static inline void skb_set_hash_from_sk(struct
> > sk_buff *skb, struct sock *sk)
> > 
> >  static inline void skb_set_owner_w(struct sk_buff *skb, struct sock *sk)
> >  {
> > +	if (!sk_fullsock(sk)) {
> > +		WARN_ON_ONCE(1);
> > +		return;
> > +	}
> >  	skb_orphan(skb);
> >  	skb->sk = sk;
> >  	skb->destructor = sock_wfree;
> > @@ -2223,14 +2235,6 @@ static inline struct sock *skb_steal_sock(struct
> > sk_buff *skb)
> >  	return NULL;
> >  }
> > 
> > -/* This helper checks if a socket is a full socket,
> > - * ie _not_ a timewait or request socket.
> > - */
> > -static inline bool sk_fullsock(const struct sock *sk)
> > -{
> > -	return (1 << sk->sk_state) & ~(TCPF_TIME_WAIT | TCPF_NEW_SYN_RECV);
> > -}
> > -
> >  /* This helper checks if a socket is a LISTEN or NEW_SYN_RECV
> >   * SYNACK messages can be attached to either ones (depending on
> > SYNCOOKIE)
> >   */
> > 
> 
> Hi Eric,
> 
> Thanks for the debug patch. The panic does not happen anymore with
> the patch. I see a warning call trace:
> 
> [  222.307948] ------------[ cut here ]------------
> [  222.308009] WARNING: CPU: 6 PID: 0 at include/net/sock.h:1974 ip_finish_output2+0x34f/0x360()
> [  222.308027] Modules linked in: cfg80211 joydev crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 glue_helper hid_generic lrw gf128mul ablk_helper i2c_piix4 hid_hyperv hyperv_fb hid cryptd hyperv_keyboard 8250_fintek mac_hid serio_raw parport_pc ppdev lp parport autofs4 hv_utils hv_netvsc hv_storvsc psmouse hv_vmbus floppy pata_acpi
> [  222.308088] CPU: 6 PID: 0 Comm: swapper/6 Not tainted 4.3.0-rc6-next-20151022+ #2
> [  222.308104] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006  05/23/2012
> [  222.308120]  ffffffff81b2ae66 ffff88007c783878 ffffffff813a4cf4 0000000000000000
> [  222.308137]  ffff88007c7838b0 ffffffff81078cc6 ffff88005bf2dc00 ffff880079f58000
> [  222.308153]  ffff88005bf2c000 ffff88005bf2c800 ffff880050b28000 ffff88007c7838c0
> [  222.308171] Call Trace:
> [  222.308185]  <IRQ>  [<ffffffff813a4cf4>] dump_stack+0x44/0x60
> [  222.308212]  [<ffffffff81078cc6>] warn_slowpath_common+0x86/0xc0
> [  222.308228]  [<ffffffff81078dba>] warn_slowpath_null+0x1a/0x20
> [  222.308245]  [<ffffffff816d2e6f>] ip_finish_output2+0x34f/0x360
> [  222.308262]  [<ffffffff816d4589>] ip_finish_output+0x149/0x1e0
> [  222.308280]  [<ffffffff816d4f2c>] ip_output+0x5c/0xc0
> [  222.308300]  [<ffffffff8101e899>] ? sched_clock+0x9/0x10
> [  222.308319]  [<ffffffff810a63a7>] ? sched_clock_local+0x17/0x80
> [  222.308335]  [<ffffffff816d4725>] ip_local_out+0x35/0x40
> [  222.308351]  [<ffffffff816d487d>] ip_build_and_send_pkt+0x14d/0x1c0
> [  222.308369]  [<ffffffff816f39fb>] tcp_v4_send_synack+0x5b/0xb0
> [  222.308386]  [<ffffffff816d8fb9>] ? inet_ehash_insert+0x59/0x130
> [  222.308404]  [<ffffffff816da266>] ? inet_csk_reqsk_queue_hash_add+0x76/0xa0
> [  222.308425]  [<ffffffff816e3223>] tcp_conn_request+0x9b3/0x9f0
> [  222.308444]  [<ffffffff816f20bc>] tcp_v4_conn_request+0x4c/0x50
> [  222.308458]  [<ffffffff816e940c>] tcp_rcv_state_process+0x19c/0xcb0
> [  222.308473]  [<ffffffff817746ec>] ? tcp_v4_inbound_md5_hash+0x6d/0x177
> [  222.308485]  [<ffffffff816f2f53>] tcp_v4_do_rcv+0x73/0x210
> [  222.308496]  [<ffffffff816f43b1>] tcp_v4_rcv+0x811/0x840
> [  222.308511]  [<ffffffff816cda9a>] ? ip_route_input_noref+0xb3a/0xd90
> [  222.308524]  [<ffffffff816cf1a3>] ip_local_deliver_finish+0x53/0xe0
> [ 222.308536]  [<ffffffff816cf690>] ip_local_deliver+0x60/0xd0
> [  222.308549]  [<ffffffff816cf2b7>] ip_rcv_finish+0x87/0x2b0
> [  222.308561]  [<ffffffff816cf949>] ip_rcv+0x249/0x350
> [  222.308574]  [<ffffffff8176796c>] ? packet_rcv+0x4c/0x3e0
> [  222.308589]  [<ffffffff81696857>] __netif_receive_skb_core+0x2d7/0x980
> [  222.308602]  [<ffffffff81696f18>] __netif_receive_skb+0x18/0x60
> [  222.308614]  [<ffffffff81697b38>] process_backlog+0xa8/0x150
> [  222.308627]  [<ffffffff816973c3>] net_rx_action+0x1b3/0x2c0
> [  222.308641]  [<ffffffff8107d32c>] __do_softirq+0xfc/0x250
> [  222.308653]  [<ffffffff8107d5de>] irq_exit+0x8e/0x90
> [  222.308667]  [<ffffffff8104a1ce>] hyperv_vector_handler+0x3e/0x50
> [  222.308680]  [<ffffffff817835c2>] hyperv_callback_vector+0x82/0x90
> [  222.308690]  <EOI>  [<ffffffff8105e256>] ? native_safe_halt+0x6/0x10
> [  222.308707]  [<ffffffff8101f7ae>] default_idle+0x1e/0xa0
> [  222.308718]  [<ffffffff8101feff>] arch_cpu_idle+0xf/0x20
> [  222.308731]  [<ffffffff810b86f2>] default_idle_call+0x32/0x40
> [  222.308743]  [<ffffffff810b8a18>] cpu_startup_entry+0x2b8/0x310
> [  222.308756]  [<ffffffff8104c238>] start_secondary+0x178/0x1a0
> [  222.308769] ---[ end trace 0c71438d4d1b6dca ]---

So it looks like you have a device with a very big hh_len

MAX_TCP_HEADER is not enough space to hold all headers, and this is the
bug that needs to be fixed. This is scary to realloc all tcp packets !

Could you add :

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 50e29737b584..164dbbbfe6b1 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -190,6 +190,8 @@ static int ip_finish_output2(struct net *net, struct sock *sk, struct sk_buff *s
 	if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) {
 		struct sk_buff *skb2;
 
+		pr_err_once("Wow ! headroom=%u while hh_len(%s)=%u\n",
+			    skb_headroom(skb), dev->name, hh_len);
 		skb2 = skb_realloc_headroom(skb, LL_RESERVED_SPACE(dev));
 		if (!skb2) {
 			kfree_skb(skb);

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [patch] tcp: attach SYNACK messages to request sockets instead of listener
  2015-10-30 20:02     ` Eric Dumazet
@ 2015-10-30 20:18       ` Eric Dumazet
  2015-10-30 21:42         ` Haiyang Zhang
  2015-10-30 20:28       ` [patch] tcp: attach SYNACK messages to request sockets instead of listener KY Srinivasan
  1 sibling, 1 reply; 28+ messages in thread
From: Eric Dumazet @ 2015-10-30 20:18 UTC (permalink / raw)
  To: Haiyang Zhang; +Cc: edumazet, David Miller, netdev, KY Srinivasan

On Fri, 2015-10-30 at 13:02 -0700, Eric Dumazet wrote:

> So it looks like you have a device with a very big hh_len
> 
> MAX_TCP_HEADER is not enough space to hold all headers, and this is the
> bug that needs to be fixed. This is scary to realloc all tcp packets !
> 

drivers/net/hyperv/netvsc_drv.c sets a needed_headroom of 220 bytes :(

Could you try :

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 4ac653b7b8ac..7dbdd29076be 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -132,8 +132,10 @@ static inline bool dev_xmit_complete(int rc)
  *	used.
  */
 
-#if defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25)
-# if defined(CONFIG_MAC80211_MESH)
+#if defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25) || IS_ENABLED(CONFIG_HYPERV_NET)
+# if IS_ENABLED(CONFIG_HYPERV_NET)
+# define LL_MAX_HEADER 384
+# elif defined(CONFIG_MAC80211_MESH)
 #  define LL_MAX_HEADER 128
 # else
 #  define LL_MAX_HEADER 96

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* RE: [patch] tcp: attach SYNACK messages to request sockets instead of listener
  2015-10-30 20:02     ` Eric Dumazet
  2015-10-30 20:18       ` Eric Dumazet
@ 2015-10-30 20:28       ` KY Srinivasan
  1 sibling, 0 replies; 28+ messages in thread
From: KY Srinivasan @ 2015-10-30 20:28 UTC (permalink / raw)
  To: Eric Dumazet, Haiyang Zhang; +Cc: edumazet, David Miller, netdev



> -----Original Message-----
> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> Sent: Friday, October 30, 2015 1:03 PM
> To: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: edumazet@google.com; David Miller <davem@davemloft.net>;
> netdev@vger.kernel.org; KY Srinivasan <kys@microsoft.com>
> Subject: Re: [patch] tcp: attach SYNACK messages to request sockets instead
> of listener
> 
> On Fri, 2015-10-30 at 19:38 +0000, Haiyang Zhang wrote:
> >
> > > -----Original Message-----
> > > From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> > > Sent: Thursday, October 29, 2015 6:59 PM
> > > To: Haiyang Zhang <haiyangz@microsoft.com>
> > > Cc: edumazet@google.com; David Miller <davem@davemloft.net>;
> > > netdev@vger.kernel.org; KY Srinivasan <kys@microsoft.com>
> > > Subject: Re: [patch] tcp: attach SYNACK messages to request sockets
> > > instead of listener
> > >
> > >
> > > Thanks for this report.
> > >
> > > Somehow I knew such bugs would surface ;)
> > >
> > > Please try following debugging patch ?
> > >
> > > We need to identify which part of the kernel is messed up.
> > >
> > > diff --git a/include/net/sock.h b/include/net/sock.h
> > > index aeed5c95f3ca..a643499d37e2 100644
> > > --- a/include/net/sock.h
> > > +++ b/include/net/sock.h
> > > @@ -1951,6 +1951,14 @@ static inline void skb_set_hash_from_sk(struct
> > > sk_buff *skb, struct sock *sk)
> > >  	}
> > >  }
> > >
> > > +/* This helper checks if a socket is a full socket,
> > > + * ie _not_ a timewait or request socket.
> > > + */
> > > +static inline bool sk_fullsock(const struct sock *sk)
> > > +{
> > > +	return (1 << sk->sk_state) & ~(TCPF_TIME_WAIT |
> TCPF_NEW_SYN_RECV);
> > > +}
> > > +
> > >  /*
> > >   *	Queue a received datagram if it will fit. Stream and sequenced
> > >   *	protocols can't normally use this as they need to fit buffers in
> > > @@ -1962,6 +1970,10 @@ static inline void skb_set_hash_from_sk(struct
> > > sk_buff *skb, struct sock *sk)
> > >
> > >  static inline void skb_set_owner_w(struct sk_buff *skb, struct sock *sk)
> > >  {
> > > +	if (!sk_fullsock(sk)) {
> > > +		WARN_ON_ONCE(1);
> > > +		return;
> > > +	}
> > >  	skb_orphan(skb);
> > >  	skb->sk = sk;
> > >  	skb->destructor = sock_wfree;
> > > @@ -2223,14 +2235,6 @@ static inline struct sock *skb_steal_sock(struct
> > > sk_buff *skb)
> > >  	return NULL;
> > >  }
> > >
> > > -/* This helper checks if a socket is a full socket,
> > > - * ie _not_ a timewait or request socket.
> > > - */
> > > -static inline bool sk_fullsock(const struct sock *sk)
> > > -{
> > > -	return (1 << sk->sk_state) & ~(TCPF_TIME_WAIT |
> TCPF_NEW_SYN_RECV);
> > > -}
> > > -
> > >  /* This helper checks if a socket is a LISTEN or NEW_SYN_RECV
> > >   * SYNACK messages can be attached to either ones (depending on
> > > SYNCOOKIE)
> > >   */
> > >
> >
> > Hi Eric,
> >
> > Thanks for the debug patch. The panic does not happen anymore with
> > the patch. I see a warning call trace:
> >
> > [  222.307948] ------------[ cut here ]------------
> > [  222.308009] WARNING: CPU: 6 PID: 0 at include/net/sock.h:1974
> ip_finish_output2+0x34f/0x360()
> > [  222.308027] Modules linked in: cfg80211 joydev crct10dif_pclmul
> crc32_pclmul aesni_intel aes_x86_64 glue_helper hid_generic lrw gf128mul
> ablk_helper i2c_piix4 hid_hyperv hyperv_fb hid cryptd hyperv_keyboard
> 8250_fintek mac_hid serio_raw parport_pc ppdev lp parport autofs4 hv_utils
> hv_netvsc hv_storvsc psmouse hv_vmbus floppy pata_acpi
> > [  222.308088] CPU: 6 PID: 0 Comm: swapper/6 Not tainted 4.3.0-rc6-next-
> 20151022+ #2
> > [  222.308104] Hardware name: Microsoft Corporation Virtual
> Machine/Virtual Machine, BIOS 090006  05/23/2012
> > [  222.308120]  ffffffff81b2ae66 ffff88007c783878 ffffffff813a4cf4
> 0000000000000000
> > [  222.308137]  ffff88007c7838b0 ffffffff81078cc6 ffff88005bf2dc00
> ffff880079f58000
> > [  222.308153]  ffff88005bf2c000 ffff88005bf2c800 ffff880050b28000
> ffff88007c7838c0
> > [  222.308171] Call Trace:
> > [  222.308185]  <IRQ>  [<ffffffff813a4cf4>] dump_stack+0x44/0x60
> > [  222.308212]  [<ffffffff81078cc6>] warn_slowpath_common+0x86/0xc0
> > [  222.308228]  [<ffffffff81078dba>] warn_slowpath_null+0x1a/0x20
> > [  222.308245]  [<ffffffff816d2e6f>] ip_finish_output2+0x34f/0x360
> > [  222.308262]  [<ffffffff816d4589>] ip_finish_output+0x149/0x1e0
> > [  222.308280]  [<ffffffff816d4f2c>] ip_output+0x5c/0xc0
> > [  222.308300]  [<ffffffff8101e899>] ? sched_clock+0x9/0x10
> > [  222.308319]  [<ffffffff810a63a7>] ? sched_clock_local+0x17/0x80
> > [  222.308335]  [<ffffffff816d4725>] ip_local_out+0x35/0x40
> > [  222.308351]  [<ffffffff816d487d>] ip_build_and_send_pkt+0x14d/0x1c0
> > [  222.308369]  [<ffffffff816f39fb>] tcp_v4_send_synack+0x5b/0xb0
> > [  222.308386]  [<ffffffff816d8fb9>] ? inet_ehash_insert+0x59/0x130
> > [  222.308404]  [<ffffffff816da266>] ?
> inet_csk_reqsk_queue_hash_add+0x76/0xa0
> > [  222.308425]  [<ffffffff816e3223>] tcp_conn_request+0x9b3/0x9f0
> > [  222.308444]  [<ffffffff816f20bc>] tcp_v4_conn_request+0x4c/0x50
> > [  222.308458]  [<ffffffff816e940c>] tcp_rcv_state_process+0x19c/0xcb0
> > [  222.308473]  [<ffffffff817746ec>] ?
> tcp_v4_inbound_md5_hash+0x6d/0x177
> > [  222.308485]  [<ffffffff816f2f53>] tcp_v4_do_rcv+0x73/0x210
> > [  222.308496]  [<ffffffff816f43b1>] tcp_v4_rcv+0x811/0x840
> > [  222.308511]  [<ffffffff816cda9a>] ? ip_route_input_noref+0xb3a/0xd90
> > [  222.308524]  [<ffffffff816cf1a3>] ip_local_deliver_finish+0x53/0xe0
> > [ 222.308536]  [<ffffffff816cf690>] ip_local_deliver+0x60/0xd0
> > [  222.308549]  [<ffffffff816cf2b7>] ip_rcv_finish+0x87/0x2b0
> > [  222.308561]  [<ffffffff816cf949>] ip_rcv+0x249/0x350
> > [  222.308574]  [<ffffffff8176796c>] ? packet_rcv+0x4c/0x3e0
> > [  222.308589]  [<ffffffff81696857>]
> __netif_receive_skb_core+0x2d7/0x980
> > [  222.308602]  [<ffffffff81696f18>] __netif_receive_skb+0x18/0x60
> > [  222.308614]  [<ffffffff81697b38>] process_backlog+0xa8/0x150
> > [  222.308627]  [<ffffffff816973c3>] net_rx_action+0x1b3/0x2c0
> > [  222.308641]  [<ffffffff8107d32c>] __do_softirq+0xfc/0x250
> > [  222.308653]  [<ffffffff8107d5de>] irq_exit+0x8e/0x90
> > [  222.308667]  [<ffffffff8104a1ce>] hyperv_vector_handler+0x3e/0x50
> > [  222.308680]  [<ffffffff817835c2>] hyperv_callback_vector+0x82/0x90
> > [  222.308690]  <EOI>  [<ffffffff8105e256>] ? native_safe_halt+0x6/0x10
> > [  222.308707]  [<ffffffff8101f7ae>] default_idle+0x1e/0xa0
> > [  222.308718]  [<ffffffff8101feff>] arch_cpu_idle+0xf/0x20
> > [  222.308731]  [<ffffffff810b86f2>] default_idle_call+0x32/0x40
> > [  222.308743]  [<ffffffff810b8a18>] cpu_startup_entry+0x2b8/0x310
> > [  222.308756]  [<ffffffff8104c238>] start_secondary+0x178/0x1a0
> > [  222.308769] ---[ end trace 0c71438d4d1b6dca ]---
> 
> So it looks like you have a device with a very big hh_len

Yes; our driver does ask for additional head room. We can avoid this realloc if we
set  LL_MAX_HEADER  to 224.

K. Y
> 
> MAX_TCP_HEADER is not enough space to hold all headers, and this is the
> bug that needs to be fixed. This is scary to realloc all tcp packets !
> 
> Could you add :
> 
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index 50e29737b584..164dbbbfe6b1 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -190,6 +190,8 @@ static int ip_finish_output2(struct net *net, struct
> sock *sk, struct sk_buff *s
>  	if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) {
>  		struct sk_buff *skb2;
> 
> +		pr_err_once("Wow ! headroom=%u while
> hh_len(%s)=%u\n",
> +			    skb_headroom(skb), dev->name, hh_len);
>  		skb2 = skb_realloc_headroom(skb,
> LL_RESERVED_SPACE(dev));
>  		if (!skb2) {
>  			kfree_skb(skb);
> 
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [patch] tcp: attach SYNACK messages to request sockets instead of listener
  2015-10-30 20:18       ` Eric Dumazet
@ 2015-10-30 21:42         ` Haiyang Zhang
  2015-10-30 23:52           ` Eric Dumazet
  0 siblings, 1 reply; 28+ messages in thread
From: Haiyang Zhang @ 2015-10-30 21:42 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: edumazet, David Miller, netdev, KY Srinivasan



> -----Original Message-----
> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> Sent: Friday, October 30, 2015 4:19 PM
> To: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: edumazet@google.com; David Miller <davem@davemloft.net>;
> netdev@vger.kernel.org; KY Srinivasan <kys@microsoft.com>
> Subject: Re: [patch] tcp: attach SYNACK messages to request sockets
> instead of listener
> 
> On Fri, 2015-10-30 at 13:02 -0700, Eric Dumazet wrote:
> 
> > So it looks like you have a device with a very big hh_len
> >
> > MAX_TCP_HEADER is not enough space to hold all headers, and this is
> the
> > bug that needs to be fixed. This is scary to realloc all tcp packets !
> >
> 
> drivers/net/hyperv/netvsc_drv.c sets a needed_headroom of 220 bytes :(
> 
> Could you try :
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 4ac653b7b8ac..7dbdd29076be 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -132,8 +132,10 @@ static inline bool dev_xmit_complete(int rc)
>   *	used.
>   */
> 
> -#if defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25)
> -# if defined(CONFIG_MAC80211_MESH)
> +#if defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25) ||
> IS_ENABLED(CONFIG_HYPERV_NET)
> +# if IS_ENABLED(CONFIG_HYPERV_NET)
> +# define LL_MAX_HEADER 384
> +# elif defined(CONFIG_MAC80211_MESH)
>  #  define LL_MAX_HEADER 128
>  # else
>  #  define LL_MAX_HEADER 96
> 

With your 2nd patch, I saw:
[   19.242104] Wow ! headroom=164 while hh_len(eth0)=240

After adding the 3rd patch with increased LL_MAX_HEADER,
the warning and stack trace no longer show up.

Also, could you fix the net core code so that it doesn't 
panic even with reallocated header?

Thanks,
- Haiyang


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [patch] tcp: attach SYNACK messages to request sockets instead of listener
  2015-10-30 21:42         ` Haiyang Zhang
@ 2015-10-30 23:52           ` Eric Dumazet
  2015-11-01 17:20             ` [PATCH net-next] net: increase LL_MAX_HEADER if HYPERV_NET is enabled Eric Dumazet
  0 siblings, 1 reply; 28+ messages in thread
From: Eric Dumazet @ 2015-10-30 23:52 UTC (permalink / raw)
  To: Haiyang Zhang; +Cc: edumazet, David Miller, netdev, KY Srinivasan

On Fri, 2015-10-30 at 21:42 +0000, Haiyang Zhang wrote:
> 

> 
> With your 2nd patch, I saw:
> [   19.242104] Wow ! headroom=164 while hh_len(eth0)=240
> 
> After adding the 3rd patch with increased LL_MAX_HEADER,
> the warning and stack trace no longer show up.
> 
> Also, could you fix the net core code so that it doesn't 
> panic even with reallocated header?

Well, the reason we have a net-next tree, is that we can let
new features settling for a while, and eventually detect and fix
old bugs.

In your case, having the proper LL_MAX_HEADER will likely improve
performance of TCP_RR like workloads by a nice number :

Re-allocating/copying all TCP packets without notice was _not_ a nice
feature.

Panicing while in net-next mode certainly is something a developer will
notice and report.

BTW, it is possible we have to revert the SYNACK attachment to request
sockets, if it turns out we have too many bugs to fix.

Thanks.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH net-next] net: increase LL_MAX_HEADER if HYPERV_NET is enabled
  2015-10-30 23:52           ` Eric Dumazet
@ 2015-11-01 17:20             ` Eric Dumazet
  2015-11-01 20:58               ` David Miller
  0 siblings, 1 reply; 28+ messages in thread
From: Eric Dumazet @ 2015-11-01 17:20 UTC (permalink / raw)
  To: Haiyang Zhang; +Cc: edumazet, David Miller, netdev, KY Srinivasan

From: Eric Dumazet <edumazet@google.com>

My recent commit, attaching SYNACK messages to request sockets
exposed a too small LL_MAX_HEADER when netvsc_drv.c is in use,
because this driver sets a needed_headroom of 220 bytes.

Increase LL_MAX_HEADER in this case, to avoid a realloc of all
TCP frames.

In another patch, I'll make skb_set_owner_w() more robust.

Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Bisected-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 include/linux/netdevice.h |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 4ac653b7b8ac..04e3864e660e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -132,8 +132,10 @@ static inline bool dev_xmit_complete(int rc)
  *	used.
  */
 
-#if defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25)
-# if defined(CONFIG_MAC80211_MESH)
+#if defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25) || IS_ENABLED(CONFIG_HYPERV_NET)
+# if IS_ENABLED(CONFIG_HYPERV_NET)
+# define LL_MAX_HEADER 256
+# elif defined(CONFIG_MAC80211_MESH)
 #  define LL_MAX_HEADER 128
 # else
 #  define LL_MAX_HEADER 96

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH net-next] net: increase LL_MAX_HEADER if HYPERV_NET is enabled
  2015-11-01 17:20             ` [PATCH net-next] net: increase LL_MAX_HEADER if HYPERV_NET is enabled Eric Dumazet
@ 2015-11-01 20:58               ` David Miller
  2015-11-01 22:36                 ` Eric Dumazet
  2015-11-03  7:59                 ` [PATCH net-next] net: increase LL_MAX_HEADER if HYPERV_NET is enabled KY Srinivasan
  0 siblings, 2 replies; 28+ messages in thread
From: David Miller @ 2015-11-01 20:58 UTC (permalink / raw)
  To: eric.dumazet; +Cc: haiyangz, edumazet, netdev, kys

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 01 Nov 2015 09:20:59 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> My recent commit, attaching SYNACK messages to request sockets
> exposed a too small LL_MAX_HEADER when netvsc_drv.c is in use,
> because this driver sets a needed_headroom of 220 bytes.
> 
> Increase LL_MAX_HEADER in this case, to avoid a realloc of all
> TCP frames.
> 
> In another patch, I'll make skb_set_owner_w() more robust.
> 
> Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Bisected-by: Haiyang Zhang <haiyangz@microsoft.com>

Using a value of 256 just because HYPER-V is crazy imposes a huge
unnecessary burdon upon the rest of the stack.

I rejected a previous attempt to use such a huge value for
LL_MAX_HEADER, and I will do so again here.  We need a different fix
for this issue, one that doesn't hurt everyone.

Every distribution is going to turn all the options on, so you might
as well consider the largest LL_MAX_HEADER value the one %99.999
users end up paying the price for.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH net-next] net: increase LL_MAX_HEADER if HYPERV_NET is enabled
  2015-11-01 20:58               ` David Miller
@ 2015-11-01 22:36                 ` Eric Dumazet
  2015-11-01 22:58                   ` [PATCH net-next] net: make skb_set_owner_w() more robust Eric Dumazet
  2015-11-03  7:59                 ` [PATCH net-next] net: increase LL_MAX_HEADER if HYPERV_NET is enabled KY Srinivasan
  1 sibling, 1 reply; 28+ messages in thread
From: Eric Dumazet @ 2015-11-01 22:36 UTC (permalink / raw)
  To: David Miller; +Cc: haiyangz, edumazet, netdev, kys

On Sun, 2015-11-01 at 15:58 -0500, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Sun, 01 Nov 2015 09:20:59 -0800
> 
> > From: Eric Dumazet <edumazet@google.com>
> > 
> > My recent commit, attaching SYNACK messages to request sockets
> > exposed a too small LL_MAX_HEADER when netvsc_drv.c is in use,
> > because this driver sets a needed_headroom of 220 bytes.
> > 
> > Increase LL_MAX_HEADER in this case, to avoid a realloc of all
> > TCP frames.
> > 
> > In another patch, I'll make skb_set_owner_w() more robust.
> > 
> > Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener")
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Bisected-by: Haiyang Zhang <haiyangz@microsoft.com>
> 
> Using a value of 256 just because HYPER-V is crazy imposes a huge
> unnecessary burdon upon the rest of the stack.
> 
> I rejected a previous attempt to use such a huge value for
> LL_MAX_HEADER, and I will do so again here.  We need a different fix
> for this issue, one that doesn't hurt everyone.

Sure, I was planning to send the skb_set_owner_w() fix as well.

I was not aware you already objected to LL_MAX_HEADER increase.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH net-next] net: make skb_set_owner_w() more robust
  2015-11-01 22:36                 ` Eric Dumazet
@ 2015-11-01 22:58                   ` Eric Dumazet
  2015-11-01 23:18                     ` kbuild test robot
  2015-11-01 23:36                     ` [PATCH v2 " Eric Dumazet
  0 siblings, 2 replies; 28+ messages in thread
From: Eric Dumazet @ 2015-11-01 22:58 UTC (permalink / raw)
  To: David Miller; +Cc: haiyangz, edumazet, netdev, kys

From: Eric Dumazet <edumazet@google.com>

skb_set_owner_w() is called from various places that assume
skb->sk always point to a full blown socket (as it changes
sk->sk_wmem_alloc)

We'd like to attach skb to request sockets, and in the future
to timewait sockets as well. For these kind of pseudo sockets,
we need to take a traditional refcount and use sock_edemux()
as the destructor.

It is now time to un-inline skb_set_owner_w(), being too big.

Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Bisected-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 include/net/sock.h    |   17 ++---------------
 net/core/sock.c       |   20 ++++++++++++++++++++
 net/ipv4/tcp_output.c |    4 +---
 3 files changed, 23 insertions(+), 18 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index aeed5c95f3ca..f570e75e3da9 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1951,6 +1951,8 @@ static inline void skb_set_hash_from_sk(struct sk_buff *skb, struct sock *sk)
 	}
 }
 
+void skb_set_owner_w(struct sk_buff *skb, struct sock *sk);
+
 /*
  *	Queue a received datagram if it will fit. Stream and sequenced
  *	protocols can't normally use this as they need to fit buffers in
@@ -1959,21 +1961,6 @@ static inline void skb_set_hash_from_sk(struct sk_buff *skb, struct sock *sk)
  *	Inlined as it's very short and called for pretty much every
  *	packet ever received.
  */
-
-static inline void skb_set_owner_w(struct sk_buff *skb, struct sock *sk)
-{
-	skb_orphan(skb);
-	skb->sk = sk;
-	skb->destructor = sock_wfree;
-	skb_set_hash_from_sk(skb, sk);
-	/*
-	 * We used to take a refcount on sk, but following operation
-	 * is enough to guarantee sk_free() wont free this sock until
-	 * all in-flight packets are completed
-	 */
-	atomic_add(skb->truesize, &sk->sk_wmem_alloc);
-}
-
 static inline void skb_set_owner_r(struct sk_buff *skb, struct sock *sk)
 {
 	skb_orphan(skb);
diff --git a/net/core/sock.c b/net/core/sock.c
index 0ef30aa90132..8d8abcb4d8cd 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1656,6 +1656,26 @@ void sock_wfree(struct sk_buff *skb)
 }
 EXPORT_SYMBOL(sock_wfree);
 
+void skb_set_owner_w(struct sk_buff *skb, struct sock *sk)
+{
+	skb_orphan(skb);
+	skb->sk = sk;
+	if (unlikely(!sk_fullsock(sk))) {
+		skb->destructor = sock_edemux;
+		sock_hold(sk);
+		return;
+	}
+	skb->destructor = sock_wfree;
+	skb_set_hash_from_sk(skb, sk);
+	/*
+	 * We used to take a refcount on sk, but following operation
+	 * is enough to guarantee sk_free() wont free this sock until
+	 * all in-flight packets are completed
+	 */
+	atomic_add(skb->truesize, &sk->sk_wmem_alloc);
+}
+EXPORT_SYMBOL(skb_set_owner_w);
+
 void skb_orphan_partial(struct sk_buff *skb)
 {
 	/* TCP stack sets skb->ooo_okay based on sk_wmem_alloc,
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index f4f9793eb025..cb7ca569052c 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2963,9 +2963,7 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
 	skb_reserve(skb, MAX_TCP_HEADER);
 
 	if (attach_req) {
-		skb->destructor = sock_edemux;
-		sock_hold(req_to_sk(req));
-		skb->sk = req_to_sk(req);
+		skb_set_owner_w(skb, req_to_sk(req));
 	} else {
 		/* sk is a const pointer, because we want to express multiple
 		 * cpu might call us concurrently.

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH net-next] net: make skb_set_owner_w() more robust
  2015-11-01 22:58                   ` [PATCH net-next] net: make skb_set_owner_w() more robust Eric Dumazet
@ 2015-11-01 23:18                     ` kbuild test robot
  2015-11-01 23:27                       ` Eric Dumazet
  2015-11-01 23:36                     ` [PATCH v2 " Eric Dumazet
  1 sibling, 1 reply; 28+ messages in thread
From: kbuild test robot @ 2015-11-01 23:18 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: kbuild-all, David Miller, haiyangz, edumazet, netdev, kys

[-- Attachment #1: Type: text/plain, Size: 1235 bytes --]

Hi Eric,

[auto build test ERROR on net-next/master -- if it's inappropriate base, please suggest rules for selecting the more suitable base]

url:    https://github.com/0day-ci/linux/commits/Eric-Dumazet/net-make-skb_set_owner_w-more-robust/20151102-070107
config: x86_64-randconfig-x019-201544 (attached as .config)
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

   net/core/sock.c: In function 'skb_set_owner_w':
>> net/core/sock.c:1664:21: error: 'sock_edemux' undeclared (first use in this function)
      skb->destructor = sock_edemux;
                        ^
   net/core/sock.c:1664:21: note: each undeclared identifier is reported only once for each function it appears in

vim +/sock_edemux +1664 net/core/sock.c

  1658	
  1659	void skb_set_owner_w(struct sk_buff *skb, struct sock *sk)
  1660	{
  1661		skb_orphan(skb);
  1662		skb->sk = sk;
  1663		if (unlikely(!sk_fullsock(sk))) {
> 1664			skb->destructor = sock_edemux;
  1665			sock_hold(sk);
  1666			return;
  1667		}

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 25673 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH net-next] net: make skb_set_owner_w() more robust
  2015-11-01 23:18                     ` kbuild test robot
@ 2015-11-01 23:27                       ` Eric Dumazet
  0 siblings, 0 replies; 28+ messages in thread
From: Eric Dumazet @ 2015-11-01 23:27 UTC (permalink / raw)
  To: kbuild test robot
  Cc: kbuild-all, David Miller, haiyangz, edumazet, netdev, kys

On Mon, 2015-11-02 at 07:18 +0800, kbuild test robot wrote:
> Hi Eric,
> 
> [auto build test ERROR on net-next/master -- if it's inappropriate base, please suggest rules for selecting the more suitable base]
> 
> url:    https://github.com/0day-ci/linux/commits/Eric-Dumazet/net-make-skb_set_owner_w-more-robust/20151102-070107
> config: x86_64-randconfig-x019-201544 (attached as .config)
> reproduce:
>         # save the attached .config to linux build tree
>         make ARCH=x86_64 
> 
> All errors (new ones prefixed by >>):
> 
>    net/core/sock.c: In function 'skb_set_owner_w':
> >> net/core/sock.c:1664:21: error: 'sock_edemux' undeclared (first use in this function)
>       skb->destructor = sock_edemux;
>                         ^
>    net/core/sock.c:1664:21: note: each undeclared identifier is reported only once for each function it appears in

Yes, I forgot sock_edemux() was guarded by CONFIG_INET.

Kind of silly options really...

I will send a V2.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2 net-next] net: make skb_set_owner_w() more robust
  2015-11-01 22:58                   ` [PATCH net-next] net: make skb_set_owner_w() more robust Eric Dumazet
  2015-11-01 23:18                     ` kbuild test robot
@ 2015-11-01 23:36                     ` Eric Dumazet
  2015-11-02 20:05                       ` Haiyang Zhang
  2015-11-02 21:29                       ` David Miller
  1 sibling, 2 replies; 28+ messages in thread
From: Eric Dumazet @ 2015-11-01 23:36 UTC (permalink / raw)
  To: David Miller; +Cc: haiyangz, edumazet, netdev, kys

From: Eric Dumazet <edumazet@google.com>

skb_set_owner_w() is called from various places that assume
skb->sk always point to a full blown socket (as it changes
sk->sk_wmem_alloc)

We'd like to attach skb to request sockets, and in the future
to timewait sockets as well. For these kind of pseudo sockets,
we need to take a traditional refcount and use sock_edemux()
as the destructor.

It is now time to un-inline skb_set_owner_w(), being too big.

Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Bisected-by: Haiyang Zhang <haiyangz@microsoft.com>
---
v2: sock_edemux() must be guarded by CONFIG_INET

 include/net/sock.h    |   17 ++---------------
 net/core/sock.c       |   22 ++++++++++++++++++++++
 net/ipv4/tcp_output.c |    4 +---
 3 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index aeed5c95f3ca..f570e75e3da9 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1951,6 +1951,8 @@ static inline void skb_set_hash_from_sk(struct sk_buff *skb, struct sock *sk)
 	}
 }
 
+void skb_set_owner_w(struct sk_buff *skb, struct sock *sk);
+
 /*
  *	Queue a received datagram if it will fit. Stream and sequenced
  *	protocols can't normally use this as they need to fit buffers in
@@ -1959,21 +1961,6 @@ static inline void skb_set_hash_from_sk(struct sk_buff *skb, struct sock *sk)
  *	Inlined as it's very short and called for pretty much every
  *	packet ever received.
  */
-
-static inline void skb_set_owner_w(struct sk_buff *skb, struct sock *sk)
-{
-	skb_orphan(skb);
-	skb->sk = sk;
-	skb->destructor = sock_wfree;
-	skb_set_hash_from_sk(skb, sk);
-	/*
-	 * We used to take a refcount on sk, but following operation
-	 * is enough to guarantee sk_free() wont free this sock until
-	 * all in-flight packets are completed
-	 */
-	atomic_add(skb->truesize, &sk->sk_wmem_alloc);
-}
-
 static inline void skb_set_owner_r(struct sk_buff *skb, struct sock *sk)
 {
 	skb_orphan(skb);
diff --git a/net/core/sock.c b/net/core/sock.c
index 0ef30aa90132..7529eb9463be 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1656,6 +1656,28 @@ void sock_wfree(struct sk_buff *skb)
 }
 EXPORT_SYMBOL(sock_wfree);
 
+void skb_set_owner_w(struct sk_buff *skb, struct sock *sk)
+{
+	skb_orphan(skb);
+	skb->sk = sk;
+#ifdef CONFIG_INET
+	if (unlikely(!sk_fullsock(sk))) {
+		skb->destructor = sock_edemux;
+		sock_hold(sk);
+		return;
+	}
+#endif
+	skb->destructor = sock_wfree;
+	skb_set_hash_from_sk(skb, sk);
+	/*
+	 * We used to take a refcount on sk, but following operation
+	 * is enough to guarantee sk_free() wont free this sock until
+	 * all in-flight packets are completed
+	 */
+	atomic_add(skb->truesize, &sk->sk_wmem_alloc);
+}
+EXPORT_SYMBOL(skb_set_owner_w);
+
 void skb_orphan_partial(struct sk_buff *skb)
 {
 	/* TCP stack sets skb->ooo_okay based on sk_wmem_alloc,
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index f4f9793eb025..cb7ca569052c 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2963,9 +2963,7 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
 	skb_reserve(skb, MAX_TCP_HEADER);
 
 	if (attach_req) {
-		skb->destructor = sock_edemux;
-		sock_hold(req_to_sk(req));
-		skb->sk = req_to_sk(req);
+		skb_set_owner_w(skb, req_to_sk(req));
 	} else {
 		/* sk is a const pointer, because we want to express multiple
 		 * cpu might call us concurrently.

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* RE: [PATCH v2 net-next] net: make skb_set_owner_w() more robust
  2015-11-01 23:36                     ` [PATCH v2 " Eric Dumazet
@ 2015-11-02 20:05                       ` Haiyang Zhang
  2015-11-02 20:09                         ` Eric Dumazet
  2015-11-02 21:29                       ` David Miller
  1 sibling, 1 reply; 28+ messages in thread
From: Haiyang Zhang @ 2015-11-02 20:05 UTC (permalink / raw)
  To: Eric Dumazet, David Miller; +Cc: edumazet, netdev, KY Srinivasan



> -----Original Message-----
> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> Sent: Sunday, November 1, 2015 6:37 PM
> To: David Miller <davem@davemloft.net>
> Cc: Haiyang Zhang <haiyangz@microsoft.com>; edumazet@google.com;
> netdev@vger.kernel.org; KY Srinivasan <kys@microsoft.com>
> Subject: [PATCH v2 net-next] net: make skb_set_owner_w() more robust
> 
> From: Eric Dumazet <edumazet@google.com>
> 
> skb_set_owner_w() is called from various places that assume
> skb->sk always point to a full blown socket (as it changes
> sk->sk_wmem_alloc)
> 
> We'd like to attach skb to request sockets, and in the future
> to timewait sockets as well. For these kind of pseudo sockets,
> we need to take a traditional refcount and use sock_edemux()
> as the destructor.
> 
> It is now time to un-inline skb_set_owner_w(), being too big.
> 
> Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets
> instead of listener")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Bisected-by: Haiyang Zhang <haiyangz@microsoft.com>
> ---
> v2: sock_edemux() must be guarded by CONFIG_INET
> 
>  include/net/sock.h    |   17 ++---------------
>  net/core/sock.c       |   22 ++++++++++++++++++++++
>  net/ipv4/tcp_output.c |    4 +---
>  3 files changed, 25 insertions(+), 18 deletions(-)
> 
> diff --git a/include/net/sock.h b/include/net/sock.h
> index aeed5c95f3ca..f570e75e3da9 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1951,6 +1951,8 @@ static inline void skb_set_hash_from_sk(struct
> sk_buff *skb, struct sock *sk)
>  	}
>  }
> 
> +void skb_set_owner_w(struct sk_buff *skb, struct sock *sk);
> +
>  /*
>   *	Queue a received datagram if it will fit. Stream and sequenced
>   *	protocols can't normally use this as they need to fit buffers in
> @@ -1959,21 +1961,6 @@ static inline void skb_set_hash_from_sk(struct
> sk_buff *skb, struct sock *sk)
>   *	Inlined as it's very short and called for pretty much every
>   *	packet ever received.
>   */
> -
> -static inline void skb_set_owner_w(struct sk_buff *skb, struct sock *sk)
> -{
> -	skb_orphan(skb);
> -	skb->sk = sk;
> -	skb->destructor = sock_wfree;
> -	skb_set_hash_from_sk(skb, sk);
> -	/*
> -	 * We used to take a refcount on sk, but following operation
> -	 * is enough to guarantee sk_free() wont free this sock until
> -	 * all in-flight packets are completed
> -	 */
> -	atomic_add(skb->truesize, &sk->sk_wmem_alloc);
> -}
> -
>  static inline void skb_set_owner_r(struct sk_buff *skb, struct sock *sk)
>  {
>  	skb_orphan(skb);
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 0ef30aa90132..7529eb9463be 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -1656,6 +1656,28 @@ void sock_wfree(struct sk_buff *skb)
>  }
>  EXPORT_SYMBOL(sock_wfree);
> 
> +void skb_set_owner_w(struct sk_buff *skb, struct sock *sk)
> +{
> +	skb_orphan(skb);
> +	skb->sk = sk;
> +#ifdef CONFIG_INET
> +	if (unlikely(!sk_fullsock(sk))) {

Thanks for the fix!
For some driver, like ours, this condition may not be "unlikely".
So could you remove the "unlikely"?

Thanks,
- Haiyang



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 net-next] net: make skb_set_owner_w() more robust
  2015-11-02 20:05                       ` Haiyang Zhang
@ 2015-11-02 20:09                         ` Eric Dumazet
  2015-11-02 20:26                           ` David Miller
  0 siblings, 1 reply; 28+ messages in thread
From: Eric Dumazet @ 2015-11-02 20:09 UTC (permalink / raw)
  To: Haiyang Zhang; +Cc: David Miller, edumazet, netdev, KY Srinivasan

On Mon, 2015-11-02 at 20:05 +0000, Haiyang Zhang wrote:

> Thanks for the fix!
> For some driver, like ours, this condition may not be "unlikely".
> So could you remove the "unlikely"?

No, I wont remove the unlikely.

Look, your main issue is about reallocating skbs, because of excessive
dev->needed_headroom.

An unlikely() mismatch is 1000 times less expensive, why would you
care ?

If you really care, fix your driver to not abuse skb->head to store 220
bytes of private data.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 net-next] net: make skb_set_owner_w() more robust
  2015-11-02 20:09                         ` Eric Dumazet
@ 2015-11-02 20:26                           ` David Miller
  0 siblings, 0 replies; 28+ messages in thread
From: David Miller @ 2015-11-02 20:26 UTC (permalink / raw)
  To: eric.dumazet; +Cc: haiyangz, edumazet, netdev, kys

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 02 Nov 2015 12:09:25 -0800

> On Mon, 2015-11-02 at 20:05 +0000, Haiyang Zhang wrote:
> 
>> Thanks for the fix!
>> For some driver, like ours, this condition may not be "unlikely".
>> So could you remove the "unlikely"?
> 
> No, I wont remove the unlikely.
> 
> Look, your main issue is about reallocating skbs, because of excessive
> dev->needed_headroom.
> 
> An unlikely() mismatch is 1000 times less expensive, why would you
> care ?
> 
> If you really care, fix your driver to not abuse skb->head to store 220
> bytes of private data.

+1

And I've been saying this from the beginning.  This driver must place
it's private per-packet data in another location if it wants optimal
behavior inside of the Linux networking stack.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 net-next] net: make skb_set_owner_w() more robust
  2015-11-01 23:36                     ` [PATCH v2 " Eric Dumazet
  2015-11-02 20:05                       ` Haiyang Zhang
@ 2015-11-02 21:29                       ` David Miller
  1 sibling, 0 replies; 28+ messages in thread
From: David Miller @ 2015-11-02 21:29 UTC (permalink / raw)
  To: eric.dumazet; +Cc: haiyangz, edumazet, netdev, kys

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 01 Nov 2015 15:36:55 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> skb_set_owner_w() is called from various places that assume
> skb->sk always point to a full blown socket (as it changes
> sk->sk_wmem_alloc)
> 
> We'd like to attach skb to request sockets, and in the future
> to timewait sockets as well. For these kind of pseudo sockets,
> we need to take a traditional refcount and use sock_edemux()
> as the destructor.
> 
> It is now time to un-inline skb_set_owner_w(), being too big.
> 
> Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Bisected-by: Haiyang Zhang <haiyangz@microsoft.com>

Applied, thanks.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH net-next] net: increase LL_MAX_HEADER if HYPERV_NET is enabled
  2015-11-01 20:58               ` David Miller
  2015-11-01 22:36                 ` Eric Dumazet
@ 2015-11-03  7:59                 ` KY Srinivasan
  2015-11-03 15:33                   ` David Miller
  1 sibling, 1 reply; 28+ messages in thread
From: KY Srinivasan @ 2015-11-03  7:59 UTC (permalink / raw)
  To: David Miller, eric.dumazet; +Cc: Haiyang Zhang, edumazet, netdev



> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Sunday, November 1, 2015 12:59 PM
> To: eric.dumazet@gmail.com
> Cc: Haiyang Zhang <haiyangz@microsoft.com>; edumazet@google.com;
> netdev@vger.kernel.org; KY Srinivasan <kys@microsoft.com>
> Subject: Re: [PATCH net-next] net: increase LL_MAX_HEADER if HYPERV_NET
> is enabled
> 
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Sun, 01 Nov 2015 09:20:59 -0800
> 
> > From: Eric Dumazet <edumazet@google.com>
> >
> > My recent commit, attaching SYNACK messages to request sockets
> > exposed a too small LL_MAX_HEADER when netvsc_drv.c is in use,
> > because this driver sets a needed_headroom of 220 bytes.
> >
> > Increase LL_MAX_HEADER in this case, to avoid a realloc of all
> > TCP frames.
> >
> > In another patch, I'll make skb_set_owner_w() more robust.
> >
> > Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets
> instead of listener")
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Bisected-by: Haiyang Zhang <haiyangz@microsoft.com>
> 
> Using a value of 256 just because HYPER-V is crazy imposes a huge
> unnecessary burdon upon the rest of the stack.
> 
> I rejected a previous attempt to use such a huge value for
> LL_MAX_HEADER, and I will do so again here.  We need a different fix
> for this issue, one that doesn't hurt everyone.
> 
> Every distribution is going to turn all the options on, so you might
> as well consider the largest LL_MAX_HEADER value the one %99.999
> users end up paying the price for.

David,

I have implemented the scheme we had discussed a few weeks ago. In this new implementation
our driver is NOT requesting addition headroom - rndis header and the per packet state is being
maintained outside of the skb. What I am seeing is that when I have LL_MAX_HEADER set to 220 bytes,
even though our driver is not using the additional head room, I see about a 10% boost in the peak performance
(about 34 Gbps on a 40Gbps interface). However, when I set the LL_MAX_HEADER value to the current default,
the peak performance drops back to what we currently have (around 31 Gbps). In both these cases,
there is no reallocation of skb since no additional headroom is being requested and yet there is a significant
difference in performance.  I trying to figure out why this is the case, your insights will be greatly
appreciated.

Regards,

K. Y

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH net-next] net: increase LL_MAX_HEADER if HYPERV_NET is enabled
  2015-11-03  7:59                 ` [PATCH net-next] net: increase LL_MAX_HEADER if HYPERV_NET is enabled KY Srinivasan
@ 2015-11-03 15:33                   ` David Miller
  2015-11-03 16:37                     ` Eric Dumazet
  2015-11-03 18:09                     ` KY Srinivasan
  0 siblings, 2 replies; 28+ messages in thread
From: David Miller @ 2015-11-03 15:33 UTC (permalink / raw)
  To: kys; +Cc: eric.dumazet, haiyangz, edumazet, netdev

From: KY Srinivasan <kys@microsoft.com>
Date: Tue, 3 Nov 2015 07:59:36 +0000

> I have implemented the scheme we had discussed a few weeks ago. In
> this new implementation our driver is NOT requesting addition
> headroom - rndis header and the per packet state is being maintained
> outside of the skb. What I am seeing is that when I have
> LL_MAX_HEADER set to 220 bytes, even though our driver is not using
> the additional head room, I see about a 10% boost in the peak
> performance (about 34 Gbps on a 40Gbps interface). However, when I
> set the LL_MAX_HEADER value to the current default, the peak
> performance drops back to what we currently have (around 31
> Gbps). In both these cases, there is no reallocation of skb since no
> additional headroom is being requested and yet there is a
> significant difference in performance.  I trying to figure out why
> this is the case, your insights will be greatly appreciated.

It probably has something to do with cache line or data alignment.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH net-next] net: increase LL_MAX_HEADER if HYPERV_NET is enabled
  2015-11-03 15:33                   ` David Miller
@ 2015-11-03 16:37                     ` Eric Dumazet
  2015-11-03 17:34                       ` Haiyang Zhang
  2015-11-03 18:09                     ` KY Srinivasan
  1 sibling, 1 reply; 28+ messages in thread
From: Eric Dumazet @ 2015-11-03 16:37 UTC (permalink / raw)
  To: David Miller; +Cc: kys, haiyangz, edumazet, netdev

On Tue, 2015-11-03 at 10:33 -0500, David Miller wrote:
> From: KY Srinivasan <kys@microsoft.com>
> Date: Tue, 3 Nov 2015 07:59:36 +0000
> 
> > I have implemented the scheme we had discussed a few weeks ago. In
> > this new implementation our driver is NOT requesting addition
> > headroom - rndis header and the per packet state is being maintained
> > outside of the skb. What I am seeing is that when I have
> > LL_MAX_HEADER set to 220 bytes, even though our driver is not using
> > the additional head room, I see about a 10% boost in the peak
> > performance (about 34 Gbps on a 40Gbps interface). However, when I
> > set the LL_MAX_HEADER value to the current default, the peak
> > performance drops back to what we currently have (around 31
> > Gbps). In both these cases, there is no reallocation of skb since no
> > additional headroom is being requested and yet there is a
> > significant difference in performance.  I trying to figure out why
> > this is the case, your insights will be greatly appreciated.
> 
> It probably has something to do with cache line or data alignment.

This also might be because of a slight change in skb->truesize, and/or
a change of amount of payload in skb->head

(Increasing LL_MAX_HEADER is reducing amount of payload in skb->head)

Can't you run perf tool to get some precise profiling ?


Another red flag in you driver xmit is :

return (ret == -EAGAIN) ? NETDEV_TX_BUSY : NETDEV_TX_OK;


extract from include/linux/netdevice.h
 * netdev_tx_t (*ndo_start_xmit)(struct sk_buff *skb,
 *                               struct net_device *dev);
 *      Called when a packet needs to be transmitted.
 *      Returns NETDEV_TX_OK.  Can return NETDEV_TX_BUSY, but you should stop
 *      the queue before that can happen; it's for obsolete devices and weird
 *      corner cases, but the stack really does a non-trivial amount
 *      of useless work if you return NETDEV_TX_BUSY.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH net-next] net: increase LL_MAX_HEADER if HYPERV_NET is enabled
  2015-11-03 16:37                     ` Eric Dumazet
@ 2015-11-03 17:34                       ` Haiyang Zhang
  2015-11-03 18:20                         ` David Miller
  0 siblings, 1 reply; 28+ messages in thread
From: Haiyang Zhang @ 2015-11-03 17:34 UTC (permalink / raw)
  To: Eric Dumazet, David Miller; +Cc: KY Srinivasan, edumazet, netdev



> -----Original Message-----
> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> Sent: Tuesday, November 3, 2015 11:37 AM
> To: David Miller <davem@davemloft.net>
> Cc: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> <haiyangz@microsoft.com>; edumazet@google.com;
> netdev@vger.kernel.org
> Subject: Re: [PATCH net-next] net: increase LL_MAX_HEADER if HYPERV_NET
> is enabled
> 
> On Tue, 2015-11-03 at 10:33 -0500, David Miller wrote:
> > From: KY Srinivasan <kys@microsoft.com>
> > Date: Tue, 3 Nov 2015 07:59:36 +0000
> >
> > > I have implemented the scheme we had discussed a few weeks ago. In
> > > this new implementation our driver is NOT requesting addition
> > > headroom - rndis header and the per packet state is being maintained
> > > outside of the skb. What I am seeing is that when I have
> > > LL_MAX_HEADER set to 220 bytes, even though our driver is not using
> > > the additional head room, I see about a 10% boost in the peak
> > > performance (about 34 Gbps on a 40Gbps interface). However, when I
> > > set the LL_MAX_HEADER value to the current default, the peak
> > > performance drops back to what we currently have (around 31 Gbps).
> > > In both these cases, there is no reallocation of skb since no
> > > additional headroom is being requested and yet there is a
> > > significant difference in performance.  I trying to figure out why
> > > this is the case, your insights will be greatly appreciated.
> >
> > It probably has something to do with cache line or data alignment.
> 
> This also might be because of a slight change in skb->truesize, and/or a
> change of amount of payload in skb->head
> 
> (Increasing LL_MAX_HEADER is reducing amount of payload in skb->head)
> 
> Can't you run perf tool to get some precise profiling ?
> 
> 
> Another red flag in you driver xmit is :
> 
> return (ret == -EAGAIN) ? NETDEV_TX_BUSY : NETDEV_TX_OK;
> 
> 
> extract from include/linux/netdevice.h
>  * netdev_tx_t (*ndo_start_xmit)(struct sk_buff *skb,
>  *                               struct net_device *dev);
>  *      Called when a packet needs to be transmitted.
>  *      Returns NETDEV_TX_OK.  Can return NETDEV_TX_BUSY, but you should
> stop
>  *      the queue before that can happen; it's for obsolete devices and weird
>  *      corner cases, but the stack really does a non-trivial amount
>  *      of useless work if you return NETDEV_TX_BUSY.

We stop xmit when ring buffer is <10% available (netvsc_send_pkt()), so we 
almost never hit empty ring buffer and return NETDEV_TX_BUSY. 
But we still keep this busy return in our code, just for "weird corner cases".

Thanks,
- Haiyang


^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH net-next] net: increase LL_MAX_HEADER if HYPERV_NET is enabled
  2015-11-03 15:33                   ` David Miller
  2015-11-03 16:37                     ` Eric Dumazet
@ 2015-11-03 18:09                     ` KY Srinivasan
  1 sibling, 0 replies; 28+ messages in thread
From: KY Srinivasan @ 2015-11-03 18:09 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, Haiyang Zhang, edumazet, netdev



> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Tuesday, November 3, 2015 7:33 AM
> To: KY Srinivasan <kys@microsoft.com>
> Cc: eric.dumazet@gmail.com; Haiyang Zhang <haiyangz@microsoft.com>;
> edumazet@google.com; netdev@vger.kernel.org
> Subject: Re: [PATCH net-next] net: increase LL_MAX_HEADER if HYPERV_NET
> is enabled
> 
> From: KY Srinivasan <kys@microsoft.com>
> Date: Tue, 3 Nov 2015 07:59:36 +0000
> 
> > I have implemented the scheme we had discussed a few weeks ago. In
> > this new implementation our driver is NOT requesting addition
> > headroom - rndis header and the per packet state is being maintained
> > outside of the skb. What I am seeing is that when I have
> > LL_MAX_HEADER set to 220 bytes, even though our driver is not using
> > the additional head room, I see about a 10% boost in the peak
> > performance (about 34 Gbps on a 40Gbps interface). However, when I
> > set the LL_MAX_HEADER value to the current default, the peak
> > performance drops back to what we currently have (around 31
> > Gbps). In both these cases, there is no reallocation of skb since no
> > additional headroom is being requested and yet there is a
> > significant difference in performance.  I trying to figure out why
> > this is the case, your insights will be greatly appreciated.
> 
> It probably has something to do with cache line or data alignment.

Thanks David. I too am leaning towards the same conclusion. Since I am
not using any part of skb, it looks like the alignment issue is solved by
having a larger LL_MAX_HEADER (220 bytes to be specific). I will do some
bare metal testing on the setup we have and report back.

Regards,

K. Y 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH net-next] net: increase LL_MAX_HEADER if HYPERV_NET is enabled
  2015-11-03 17:34                       ` Haiyang Zhang
@ 2015-11-03 18:20                         ` David Miller
  2015-11-03 18:49                           ` Haiyang Zhang
  0 siblings, 1 reply; 28+ messages in thread
From: David Miller @ 2015-11-03 18:20 UTC (permalink / raw)
  To: haiyangz; +Cc: eric.dumazet, kys, edumazet, netdev

From: Haiyang Zhang <haiyangz@microsoft.com>
Date: Tue, 3 Nov 2015 17:34:47 +0000

> But we still keep this busy return in our code, just for "weird corner cases".

The queue_stopped condition must be precise.

If you cannot enqueue a maximally segmented SKB, you must stop the queue.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH net-next] net: increase LL_MAX_HEADER if HYPERV_NET is enabled
  2015-11-03 18:20                         ` David Miller
@ 2015-11-03 18:49                           ` Haiyang Zhang
  2015-11-03 19:50                             ` David Miller
  0 siblings, 1 reply; 28+ messages in thread
From: Haiyang Zhang @ 2015-11-03 18:49 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, KY Srinivasan, edumazet, netdev



> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Tuesday, November 3, 2015 1:20 PM
> To: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: eric.dumazet@gmail.com; KY Srinivasan <kys@microsoft.com>;
> edumazet@google.com; netdev@vger.kernel.org
> Subject: Re: [PATCH net-next] net: increase LL_MAX_HEADER if HYPERV_NET
> is enabled
> 
> From: Haiyang Zhang <haiyangz@microsoft.com>
> Date: Tue, 3 Nov 2015 17:34:47 +0000
> 
> > But we still keep this busy return in our code, just for "weird corner cases".
> 
> The queue_stopped condition must be precise.

The only case we return NETDEV_TX_BUSY is when the outgoing ring buffer is full, 
which almost never happens because we stop the xmit queue if ring is <10% available.

Thanks,
- Haiyang

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH net-next] net: increase LL_MAX_HEADER if HYPERV_NET is enabled
  2015-11-03 18:49                           ` Haiyang Zhang
@ 2015-11-03 19:50                             ` David Miller
  2015-11-03 21:00                               ` Haiyang Zhang
  0 siblings, 1 reply; 28+ messages in thread
From: David Miller @ 2015-11-03 19:50 UTC (permalink / raw)
  To: haiyangz; +Cc: eric.dumazet, kys, edumazet, netdev

From: Haiyang Zhang <haiyangz@microsoft.com>
Date: Tue, 3 Nov 2015 18:49:05 +0000

>> -----Original Message-----
>> From: David Miller [mailto:davem@davemloft.net]
>> Sent: Tuesday, November 3, 2015 1:20 PM
>> To: Haiyang Zhang <haiyangz@microsoft.com>
>> Cc: eric.dumazet@gmail.com; KY Srinivasan <kys@microsoft.com>;
>> edumazet@google.com; netdev@vger.kernel.org
>> Subject: Re: [PATCH net-next] net: increase LL_MAX_HEADER if HYPERV_NET
>> is enabled
>> 
>> From: Haiyang Zhang <haiyangz@microsoft.com>
>> Date: Tue, 3 Nov 2015 17:34:47 +0000
>> 
>> > But we still keep this busy return in our code, just for "weird corner cases".
>> 
>> The queue_stopped condition must be precise.
> 
> The only case we return NETDEV_TX_BUSY is when the outgoing ring buffer is full, 
> which almost never happens because we stop the xmit queue if ring is <10% available.

I don't think you understand.

You must perform the queue stop operation such that it is impossible for your
->ndo_start_xmit() method to be invoked in a way such that you cannot transmit
the SKB given to you immediately.

It's quite tiring to keep trying to explain this over and over repeatedly.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH net-next] net: increase LL_MAX_HEADER if HYPERV_NET is enabled
  2015-11-03 19:50                             ` David Miller
@ 2015-11-03 21:00                               ` Haiyang Zhang
  0 siblings, 0 replies; 28+ messages in thread
From: Haiyang Zhang @ 2015-11-03 21:00 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, KY Srinivasan, edumazet, netdev



> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Tuesday, November 3, 2015 2:50 PM
> To: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: eric.dumazet@gmail.com; KY Srinivasan <kys@microsoft.com>;
> edumazet@google.com; netdev@vger.kernel.org
> Subject: Re: [PATCH net-next] net: increase LL_MAX_HEADER if HYPERV_NET
> is enabled
> 
> From: Haiyang Zhang <haiyangz@microsoft.com>
> Date: Tue, 3 Nov 2015 18:49:05 +0000
> 
> >> -----Original Message-----
> >> From: David Miller [mailto:davem@davemloft.net]
> >> Sent: Tuesday, November 3, 2015 1:20 PM
> >> To: Haiyang Zhang <haiyangz@microsoft.com>
> >> Cc: eric.dumazet@gmail.com; KY Srinivasan <kys@microsoft.com>;
> >> edumazet@google.com; netdev@vger.kernel.org
> >> Subject: Re: [PATCH net-next] net: increase LL_MAX_HEADER if
> >> HYPERV_NET is enabled
> >>
> >> From: Haiyang Zhang <haiyangz@microsoft.com>
> >> Date: Tue, 3 Nov 2015 17:34:47 +0000
> >>
> >> > But we still keep this busy return in our code, just for "weird corner
> cases".
> >>
> >> The queue_stopped condition must be precise.
> >
> > The only case we return NETDEV_TX_BUSY is when the outgoing ring
> > buffer is full, which almost never happens because we stop the xmit queue
> if ring is <10% available.
> 
> I don't think you understand.
> 
> You must perform the queue stop operation such that it is impossible for
> your
> ->ndo_start_xmit() method to be invoked in a way such that you cannot
> ->transmit
> the SKB given to you immediately.

We already did the queue stop operation in the netvsc_send_pkt() in file netvsc.c:
		if (ring_avail < RING_AVAIL_PERCENT_LOWATER) {
			netif_tx_stop_queue(netdev_get_tx_queue(ndev, q_idx));
This flow control mechanism stops the tx queue when the outgoing buffer on vmbus 
is <10% available. So we can always transmit immediately when start_xmit is called.

The case of returning NETDEV_TX_BUSY is not expected to happen normally.

Thanks,
- Haiyang

> 
> It's quite tiring to keep trying to explain this over and over repeatedly.

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2015-11-03 21:15 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-29 21:49 [patch] tcp: attach SYNACK messages to request sockets instead of listener Haiyang Zhang
2015-10-29 22:58 ` Eric Dumazet
2015-10-30 19:38   ` Haiyang Zhang
2015-10-30 20:02     ` Eric Dumazet
2015-10-30 20:18       ` Eric Dumazet
2015-10-30 21:42         ` Haiyang Zhang
2015-10-30 23:52           ` Eric Dumazet
2015-11-01 17:20             ` [PATCH net-next] net: increase LL_MAX_HEADER if HYPERV_NET is enabled Eric Dumazet
2015-11-01 20:58               ` David Miller
2015-11-01 22:36                 ` Eric Dumazet
2015-11-01 22:58                   ` [PATCH net-next] net: make skb_set_owner_w() more robust Eric Dumazet
2015-11-01 23:18                     ` kbuild test robot
2015-11-01 23:27                       ` Eric Dumazet
2015-11-01 23:36                     ` [PATCH v2 " Eric Dumazet
2015-11-02 20:05                       ` Haiyang Zhang
2015-11-02 20:09                         ` Eric Dumazet
2015-11-02 20:26                           ` David Miller
2015-11-02 21:29                       ` David Miller
2015-11-03  7:59                 ` [PATCH net-next] net: increase LL_MAX_HEADER if HYPERV_NET is enabled KY Srinivasan
2015-11-03 15:33                   ` David Miller
2015-11-03 16:37                     ` Eric Dumazet
2015-11-03 17:34                       ` Haiyang Zhang
2015-11-03 18:20                         ` David Miller
2015-11-03 18:49                           ` Haiyang Zhang
2015-11-03 19:50                             ` David Miller
2015-11-03 21:00                               ` Haiyang Zhang
2015-11-03 18:09                     ` KY Srinivasan
2015-10-30 20:28       ` [patch] tcp: attach SYNACK messages to request sockets instead of listener KY Srinivasan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).