* [PATCH bpf-next 0/4] bpf, sockmap: Fix memleaks and issues of mem charge/uncharge
@ 2022-02-25 1:49 Wang Yufen
2022-02-25 1:49 ` [PATCH bpf-next 1/4] bpf, sockmap: Fix memleak in sk_psock_queue_msg Wang Yufen
` (3 more replies)
0 siblings, 4 replies; 14+ messages in thread
From: Wang Yufen @ 2022-02-25 1:49 UTC (permalink / raw)
To: john.fastabend, daniel, jakub, lmb, davem, bpf
Cc: edumazet, yoshfuji, dsahern, kuba, ast, andrii, kafai,
songliubraving, yhs, kpsingh, netdev, Wang Yufen
This patchset fixes memleaks and incorrect memory charge/uncharge. These
issues cause the following warnings:
WARNING: CPU: 0 PID: 9202 at net/core/stream.c:205 sk_stream_kill_queues+0xc8/0xe0
Call Trace:
<IRQ>
inet_csk_destroy_sock+0x55/0x110
tcp_rcv_state_process+0xe5f/0xe90
? sk_filter_trim_cap+0x10d/0x230
? tcp_v4_do_rcv+0x161/0x250
tcp_v4_do_rcv+0x161/0x250
tcp_v4_rcv+0xc3a/0xce0
ip_protocol_deliver_rcu+0x3d/0x230
ip_local_deliver_finish+0x54/0x60
ip_local_deliver+0xfd/0x110
? ip_protocol_deliver_rcu+0x230/0x230
ip_rcv+0xd6/0x100
? ip_local_deliver+0x110/0x110
__netif_receive_skb_one_core+0x85/0xa0
process_backlog+0xa4/0x160
__napi_poll+0x29/0x1b0
net_rx_action+0x287/0x300
__do_softirq+0xff/0x2fc
do_softirq+0x79/0x90
</IRQ>
WARNING: CPU: 0 PID: 531 at net/ipv4/af_inet.c:154 inet_sock_destruct+0x175/0x1b0
Call Trace:
<TASK>
__sk_destruct+0x24/0x1f0
sk_psock_destroy+0x19b/0x1c0
process_one_work+0x1b3/0x3c0
? process_one_work+0x3c0/0x3c0
worker_thread+0x30/0x350
? process_one_work+0x3c0/0x3c0
kthread+0xe6/0x110
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x22/0x30
</TASK>
Wang Yufen (4):
bpf, sockmap: Fix memleak in sk_psock_queue_msg
bpf, sockmap: Fix memleak in tcp_bpf_sendmsg while sk msg is full
bpf, sockmap: Fix more uncharged while msg has more_data
bpf, sockmap: Fix double uncharge the mem of sk_msg
include/linux/skmsg.h | 13 ++++---------
net/ipv4/tcp_bpf.c | 13 +++++++++----
2 files changed, 13 insertions(+), 13 deletions(-)
--
2.25.1
* [PATCH bpf-next 1/4] bpf, sockmap: Fix memleak in sk_psock_queue_msg
2022-02-25 1:49 [PATCH bpf-next 0/4] bpf, sockmap: Fix memleaks and issues of mem charge/uncharge Wang Yufen
@ 2022-02-25 1:49 ` Wang Yufen
2022-02-27 19:21 ` Cong Wang
2022-02-25 1:49 ` [PATCH bpf-next 2/4] bpf, sockmap: Fix memleak in tcp_bpf_sendmsg while sk msg is full Wang Yufen
` (2 subsequent siblings)
3 siblings, 1 reply; 14+ messages in thread
From: Wang Yufen @ 2022-02-25 1:49 UTC (permalink / raw)
To: john.fastabend, daniel, jakub, lmb, davem, bpf
Cc: edumazet, yoshfuji, dsahern, kuba, ast, andrii, kafai,
songliubraving, yhs, kpsingh, netdev, Wang Yufen
If tcp_bpf_sendmsg() is running during a tear-down operation, we may enqueue
data on the ingress msg queue while the tear-down is trying to free it.
sk1 (redirect sk2) sk2
------------------- ---------------
tcp_bpf_sendmsg()
tcp_bpf_send_verdict()
tcp_bpf_sendmsg_redir()
bpf_tcp_ingress()
sock_map_close()
lock_sock()
lock_sock() ... blocking
sk_psock_stop
sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED);
release_sock(sk);
lock_sock()
sk_mem_charge()
get_page()
sk_psock_queue_msg()
sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED);
drop_sk_msg()
release_sock()
In drop_sk_msg(), the msg has charged memory from the sk via sk_mem_charge()
and holds sg pages that need to be put. To fix this, we use sk_msg_free() and
then kfree() the msg.
This issue can cause the following warnings:
WARNING: CPU: 0 PID: 9202 at net/core/stream.c:205 sk_stream_kill_queues+0xc8/0xe0
Call Trace:
<IRQ>
inet_csk_destroy_sock+0x55/0x110
tcp_rcv_state_process+0xe5f/0xe90
? sk_filter_trim_cap+0x10d/0x230
? tcp_v4_do_rcv+0x161/0x250
tcp_v4_do_rcv+0x161/0x250
tcp_v4_rcv+0xc3a/0xce0
ip_protocol_deliver_rcu+0x3d/0x230
ip_local_deliver_finish+0x54/0x60
ip_local_deliver+0xfd/0x110
? ip_protocol_deliver_rcu+0x230/0x230
ip_rcv+0xd6/0x100
? ip_local_deliver+0x110/0x110
__netif_receive_skb_one_core+0x85/0xa0
process_backlog+0xa4/0x160
__napi_poll+0x29/0x1b0
net_rx_action+0x287/0x300
__do_softirq+0xff/0x2fc
do_softirq+0x79/0x90
</IRQ>
WARNING: CPU: 0 PID: 531 at net/ipv4/af_inet.c:154 inet_sock_destruct+0x175/0x1b0
Call Trace:
<TASK>
__sk_destruct+0x24/0x1f0
sk_psock_destroy+0x19b/0x1c0
process_one_work+0x1b3/0x3c0
? process_one_work+0x3c0/0x3c0
worker_thread+0x30/0x350
? process_one_work+0x3c0/0x3c0
kthread+0xe6/0x110
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x22/0x30
</TASK>
Fixes: 9635720b7c88 ("bpf, sockmap: Fix memleak on ingress msg enqueue")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
---
include/linux/skmsg.h | 13 ++++---------
1 file changed, 4 insertions(+), 9 deletions(-)
diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index fdb5375f0562..c5a2d6f50f25 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -304,21 +304,16 @@ static inline void sock_drop(struct sock *sk, struct sk_buff *skb)
kfree_skb(skb);
}
-static inline void drop_sk_msg(struct sk_psock *psock, struct sk_msg *msg)
-{
- if (msg->skb)
- sock_drop(psock->sk, msg->skb);
- kfree(msg);
-}
-
static inline void sk_psock_queue_msg(struct sk_psock *psock,
struct sk_msg *msg)
{
spin_lock_bh(&psock->ingress_lock);
if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED))
list_add_tail(&msg->list, &psock->ingress_msg);
- else
- drop_sk_msg(psock, msg);
+ else {
+ sk_msg_free(psock->sk, msg);
+ kfree(msg);
+ }
spin_unlock_bh(&psock->ingress_lock);
}
--
2.25.1
* [PATCH bpf-next 2/4] bpf, sockmap: Fix memleak in tcp_bpf_sendmsg while sk msg is full
2022-02-25 1:49 [PATCH bpf-next 0/4] bpf, sockmap: Fix memleaks and issues of mem charge/uncharge Wang Yufen
2022-02-25 1:49 ` [PATCH bpf-next 1/4] bpf, sockmap: Fix memleak in sk_psock_queue_msg Wang Yufen
@ 2022-02-25 1:49 ` Wang Yufen
2022-03-01 4:02 ` John Fastabend
2022-02-25 1:49 ` [PATCH bpf-next 3/4] bpf, sockmap: Fix more uncharged while msg has more_data Wang Yufen
2022-02-25 1:49 ` [PATCH bpf-next 4/4] bpf, sockmap: Fix double uncharge the mem of sk_msg Wang Yufen
3 siblings, 1 reply; 14+ messages in thread
From: Wang Yufen @ 2022-02-25 1:49 UTC (permalink / raw)
To: john.fastabend, daniel, jakub, lmb, davem, bpf
Cc: edumazet, yoshfuji, dsahern, kuba, ast, andrii, kafai,
songliubraving, yhs, kpsingh, netdev, Wang Yufen
If tcp_bpf_sendmsg() is running while the sk msg is full, sk_msg_alloc()
returns -ENOSPC and tcp_bpf_sendmsg() jumps to wait_for_memory. If partial
memory has been allocated by sk_msg_alloc(), that is, msg_tx->sg.size is
greater than osize after sk_msg_alloc(), a memleak occurs. To fix this, we use
sk_msg_trim() to release the allocated memory, then goto wait_for_memory.
This issue can cause the following warnings:
WARNING: CPU: 3 PID: 7950 at net/core/stream.c:208 sk_stream_kill_queues+0xd4/0x1a0
Call Trace:
<TASK>
inet_csk_destroy_sock+0x55/0x110
__tcp_close+0x279/0x470
tcp_close+0x1f/0x60
inet_release+0x3f/0x80
__sock_release+0x3d/0xb0
sock_close+0x11/0x20
__fput+0x92/0x250
task_work_run+0x6a/0xa0
do_exit+0x33b/0xb60
do_group_exit+0x2f/0xa0
get_signal+0xb6/0x950
arch_do_signal_or_restart+0xac/0x2a0
exit_to_user_mode_prepare+0xa9/0x200
syscall_exit_to_user_mode+0x12/0x30
do_syscall_64+0x46/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
</TASK>
WARNING: CPU: 3 PID: 2094 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x13c/0x260
Call Trace:
<TASK>
__sk_destruct+0x24/0x1f0
sk_psock_destroy+0x19b/0x1c0
process_one_work+0x1b3/0x3c0
kthread+0xe6/0x110
ret_from_fork+0x22/0x30
</TASK>
Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
---
net/ipv4/tcp_bpf.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index 9b9b02052fd3..ac9f491cc139 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -421,8 +421,10 @@ static int tcp_bpf_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
osize = msg_tx->sg.size;
err = sk_msg_alloc(sk, msg_tx, msg_tx->sg.size + copy, msg_tx->sg.end - 1);
if (err) {
- if (err != -ENOSPC)
+ if (err != -ENOSPC) {
+ sk_msg_trim(sk, msg_tx, osize);
goto wait_for_memory;
+ }
enospc = true;
copy = msg_tx->sg.size - osize;
}
--
2.25.1
* [PATCH bpf-next 3/4] bpf, sockmap: Fix more uncharged while msg has more_data
2022-02-25 1:49 [PATCH bpf-next 0/4] bpf, sockmap: Fix memleaks and issues of mem charge/uncharge Wang Yufen
2022-02-25 1:49 ` [PATCH bpf-next 1/4] bpf, sockmap: Fix memleak in sk_psock_queue_msg Wang Yufen
2022-02-25 1:49 ` [PATCH bpf-next 2/4] bpf, sockmap: Fix memleak in tcp_bpf_sendmsg while sk msg is full Wang Yufen
@ 2022-02-25 1:49 ` Wang Yufen
2022-03-01 4:20 ` John Fastabend
2022-02-25 1:49 ` [PATCH bpf-next 4/4] bpf, sockmap: Fix double uncharge the mem of sk_msg Wang Yufen
3 siblings, 1 reply; 14+ messages in thread
From: Wang Yufen @ 2022-02-25 1:49 UTC (permalink / raw)
To: john.fastabend, daniel, jakub, lmb, davem, bpf
Cc: edumazet, yoshfuji, dsahern, kuba, ast, andrii, kafai,
songliubraving, yhs, kpsingh, netdev, Wang Yufen
In tcp_bpf_send_verdict(), if msg has more data after
tcp_bpf_sendmsg_redir():
tcp_bpf_send_verdict()
tosend = msg->sg.size //msg->sg.size = 22220
case __SK_REDIRECT:
sk_msg_return() //uncharged msg->sg.size(22220) to sk->sk_forward_alloc
tcp_bpf_sendmsg_redir() //after tcp_bpf_sendmsg_redir, msg->sg.size=11000
goto more_data;
tosend = msg->sg.size //msg->sg.size = 11000
case __SK_REDIRECT:
sk_msg_return() //uncharged msg->sg.size(11000) to sk->sk_forward_alloc
The msg->sg.size (11000) has been uncharged twice. To fix this, we charge back
the remaining msg->sg.size before goto more_data.
This issue can cause the following warnings:
WARNING: CPU: 0 PID: 9860 at net/core/stream.c:208 sk_stream_kill_queues+0xd4/0x1a0
Call Trace:
<TASK>
inet_csk_destroy_sock+0x55/0x110
__tcp_close+0x279/0x470
tcp_close+0x1f/0x60
inet_release+0x3f/0x80
__sock_release+0x3d/0xb0
sock_close+0x11/0x20
__fput+0x92/0x250
task_work_run+0x6a/0xa0
do_exit+0x33b/0xb60
do_group_exit+0x2f/0xa0
get_signal+0xb6/0x950
arch_do_signal_or_restart+0xac/0x2a0
? vfs_write+0x237/0x290
exit_to_user_mode_prepare+0xa9/0x200
syscall_exit_to_user_mode+0x12/0x30
do_syscall_64+0x46/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
</TASK>
WARNING: CPU: 0 PID: 2136 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x13c/0x260
Call Trace:
<TASK>
__sk_destruct+0x24/0x1f0
sk_psock_destroy+0x19b/0x1c0
process_one_work+0x1b3/0x3c0
worker_thread+0x30/0x350
? process_one_work+0x3c0/0x3c0
kthread+0xe6/0x110
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x22/0x30
</TASK>
Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
---
net/ipv4/tcp_bpf.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index ac9f491cc139..1f0364e06619 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -335,7 +335,7 @@ static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
cork = true;
psock->cork = NULL;
}
- sk_msg_return(sk, msg, tosend);
+ sk_msg_return(sk, msg, msg->sg.size);
release_sock(sk);
ret = tcp_bpf_sendmsg_redir(sk_redir, msg, tosend, flags);
@@ -375,8 +375,11 @@ static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
}
if (msg &&
msg->sg.data[msg->sg.start].page_link &&
- msg->sg.data[msg->sg.start].length)
+ msg->sg.data[msg->sg.start].length) {
+ if (eval == __SK_REDIRECT)
+ sk_mem_charge(sk, msg->sg.size);
goto more_data;
+ }
}
return ret;
}
--
2.25.1
* [PATCH bpf-next 4/4] bpf, sockmap: Fix double uncharge the mem of sk_msg
2022-02-25 1:49 [PATCH bpf-next 0/4] bpf, sockmap: Fix memleaks and issues of mem charge/uncharge Wang Yufen
` (2 preceding siblings ...)
2022-02-25 1:49 ` [PATCH bpf-next 3/4] bpf, sockmap: Fix more uncharged while msg has more_data Wang Yufen
@ 2022-02-25 1:49 ` Wang Yufen
2022-03-01 4:11 ` John Fastabend
3 siblings, 1 reply; 14+ messages in thread
From: Wang Yufen @ 2022-02-25 1:49 UTC (permalink / raw)
To: john.fastabend, daniel, jakub, lmb, davem, bpf
Cc: edumazet, yoshfuji, dsahern, kuba, ast, andrii, kafai,
songliubraving, yhs, kpsingh, netdev, Wang Yufen
If tcp_bpf_sendmsg() is running during a tear-down operation, the psock may be
freed.
tcp_bpf_sendmsg()
tcp_bpf_send_verdict()
sk_msg_return()
tcp_bpf_sendmsg_redir()
unlikely(!psock))
sk_msg_free()
The mem of the msg has already been uncharged in tcp_bpf_send_verdict() by
sk_msg_return(), so we need to use sk_msg_free_nocharge() when psock is
NULL.
This issue can cause the following warning:
WARNING: CPU: 0 PID: 2136 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x13c/0x260
Call Trace:
<TASK>
__sk_destruct+0x24/0x1f0
sk_psock_destroy+0x19b/0x1c0
process_one_work+0x1b3/0x3c0
worker_thread+0x30/0x350
? process_one_work+0x3c0/0x3c0
kthread+0xe6/0x110
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x22/0x30
</TASK>
Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
---
net/ipv4/tcp_bpf.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index 1f0364e06619..03c037d2a055 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -139,7 +139,7 @@ int tcp_bpf_sendmsg_redir(struct sock *sk, struct sk_msg *msg,
int ret;
if (unlikely(!psock)) {
- sk_msg_free(sk, msg);
+ sk_msg_free_nocharge(sk, msg);
return 0;
}
ret = ingress ? bpf_tcp_ingress(sk, psock, msg, bytes, flags) :
--
2.25.1
* Re: [PATCH bpf-next 1/4] bpf, sockmap: Fix memleak in sk_psock_queue_msg
2022-02-25 1:49 ` [PATCH bpf-next 1/4] bpf, sockmap: Fix memleak in sk_psock_queue_msg Wang Yufen
@ 2022-02-27 19:21 ` Cong Wang
2022-03-01 1:49 ` wangyufen
0 siblings, 1 reply; 14+ messages in thread
From: Cong Wang @ 2022-02-27 19:21 UTC (permalink / raw)
To: Wang Yufen
Cc: john.fastabend, daniel, jakub, lmb, davem, bpf, edumazet,
yoshfuji, dsahern, kuba, ast, andrii, kafai, songliubraving, yhs,
kpsingh, netdev
On Fri, Feb 25, 2022 at 09:49:26AM +0800, Wang Yufen wrote:
> If tcp_bpf_sendmsg is running during a tear down operation we may enqueue
> data on the ingress msg queue while tear down is trying to free it.
>
> sk1 (redirect sk2) sk2
> ------------------- ---------------
> tcp_bpf_sendmsg()
> tcp_bpf_send_verdict()
> tcp_bpf_sendmsg_redir()
> bpf_tcp_ingress()
> sock_map_close()
> lock_sock()
> lock_sock() ... blocking
> sk_psock_stop
> sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED);
> release_sock(sk);
> lock_sock()
> sk_mem_charge()
> get_page()
> sk_psock_queue_msg()
> sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED);
> drop_sk_msg()
> release_sock()
>
> In drop_sk_msg(), the msg has charged memory from the sk via sk_mem_charge()
> and holds sg pages that need to be put. To fix this, we use sk_msg_free() and
> then kfree() the msg.
>
What about the other code path? That is, sk_psock_skb_ingress_enqueue().
I don't see the skmsg being charged there.
Thanks.
* Re: [PATCH bpf-next 1/4] bpf, sockmap: Fix memleak in sk_psock_queue_msg
2022-02-27 19:21 ` Cong Wang
@ 2022-03-01 1:49 ` wangyufen
2022-03-01 3:44 ` John Fastabend
2022-03-03 0:31 ` Cong Wang
0 siblings, 2 replies; 14+ messages in thread
From: wangyufen @ 2022-03-01 1:49 UTC (permalink / raw)
To: Cong Wang
Cc: john.fastabend, daniel, jakub, lmb, davem, bpf, edumazet,
yoshfuji, dsahern, kuba, ast, andrii, kafai, songliubraving, yhs,
kpsingh, netdev
在 2022/2/28 3:21, Cong Wang 写道:
> On Fri, Feb 25, 2022 at 09:49:26AM +0800, Wang Yufen wrote:
>> If tcp_bpf_sendmsg is running during a tear down operation we may enqueue
>> data on the ingress msg queue while tear down is trying to free it.
>>
>> sk1 (redirect sk2) sk2
>> ------------------- ---------------
>> tcp_bpf_sendmsg()
>> tcp_bpf_send_verdict()
>> tcp_bpf_sendmsg_redir()
>> bpf_tcp_ingress()
>> sock_map_close()
>> lock_sock()
>> lock_sock() ... blocking
>> sk_psock_stop
>> sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED);
>> release_sock(sk);
>> lock_sock()
>> sk_mem_charge()
>> get_page()
>> sk_psock_queue_msg()
>> sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED);
>> drop_sk_msg()
>> release_sock()
>>
>> In drop_sk_msg(), the msg has charged memory from the sk via sk_mem_charge()
>> and holds sg pages that need to be put. To fix this, we use sk_msg_free() and
>> then kfree() the msg.
>>
> What about the other code path? That is, sk_psock_skb_ingress_enqueue().
> I don't see skmsg is charged there.
sk_psock_skb_ingress_self() | sk_psock_skb_ingress()
skb_set_owner_r()
sk_mem_charge()
sk_psock_skb_ingress_enqueue()
In the other code path, the skmsg is charged via skb_set_owner_r()->sk_mem_charge().
>
> Thanks.
> .
* Re: [PATCH bpf-next 1/4] bpf, sockmap: Fix memleak in sk_psock_queue_msg
2022-03-01 1:49 ` wangyufen
@ 2022-03-01 3:44 ` John Fastabend
2022-03-03 0:31 ` Cong Wang
1 sibling, 0 replies; 14+ messages in thread
From: John Fastabend @ 2022-03-01 3:44 UTC (permalink / raw)
To: wangyufen, Cong Wang
Cc: john.fastabend, daniel, jakub, lmb, davem, bpf, edumazet,
yoshfuji, dsahern, kuba, ast, andrii, kafai, songliubraving, yhs,
kpsingh, netdev
wangyufen wrote:
>
> 在 2022/2/28 3:21, Cong Wang 写道:
> > On Fri, Feb 25, 2022 at 09:49:26AM +0800, Wang Yufen wrote:
> >> If tcp_bpf_sendmsg is running during a tear down operation we may enqueue
> >> data on the ingress msg queue while tear down is trying to free it.
> >>
> >> sk1 (redirect sk2) sk2
> >> ------------------- ---------------
> >> tcp_bpf_sendmsg()
> >> tcp_bpf_send_verdict()
> >> tcp_bpf_sendmsg_redir()
> >> bpf_tcp_ingress()
> >> sock_map_close()
> >> lock_sock()
> >> lock_sock() ... blocking
> >> sk_psock_stop
> >> sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED);
> >> release_sock(sk);
> >> lock_sock()
> >> sk_mem_charge()
> >> get_page()
> >> sk_psock_queue_msg()
> >> sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED);
> >> drop_sk_msg()
> >> release_sock()
> >>
> >> In drop_sk_msg(), the msg has charged memory from the sk via sk_mem_charge()
> >> and holds sg pages that need to be put. To fix this, we use sk_msg_free() and
> >> then kfree() the msg.
> >>
> > What about the other code path? That is, sk_psock_skb_ingress_enqueue().
> > I don't see skmsg is charged there.
>
> sk_psock_skb_ingress_self() | sk_psock_skb_ingress()
> skb_set_owner_r()
> sk_mem_charge()
> sk_psock_skb_ingress_enqueue()
>
> In the other code path, the skmsg is charged via skb_set_owner_r()->sk_mem_charge().
>
> >
> > Thanks.
> > .
I walked that code and fix LGTM as well.
Acked-by: John Fastabend <john.fastabend@gmail.com>
* RE: [PATCH bpf-next 2/4] bpf, sockmap: Fix memleak in tcp_bpf_sendmsg while sk msg is full
2022-02-25 1:49 ` [PATCH bpf-next 2/4] bpf, sockmap: Fix memleak in tcp_bpf_sendmsg while sk msg is full Wang Yufen
@ 2022-03-01 4:02 ` John Fastabend
2022-03-01 7:05 ` wangyufen
0 siblings, 1 reply; 14+ messages in thread
From: John Fastabend @ 2022-03-01 4:02 UTC (permalink / raw)
To: Wang Yufen, john.fastabend, daniel, jakub, lmb, davem, bpf
Cc: edumazet, yoshfuji, dsahern, kuba, ast, andrii, kafai,
songliubraving, yhs, kpsingh, netdev, Wang Yufen
Wang Yufen wrote:
> If tcp_bpf_sendmsg() is running while the sk msg is full, sk_msg_alloc()
> returns -ENOSPC and tcp_bpf_sendmsg() jumps to wait_for_memory. If partial
> memory has been allocated by sk_msg_alloc(), that is, msg_tx->sg.size is
> greater than osize after sk_msg_alloc(), a memleak occurs. To fix this, we use
> sk_msg_trim() to release the allocated memory, then goto wait_for_memory.
Small nit: "sk_msg_alloc() returns -ENOSPC" should be something like "when
sk_msg_alloc() returns an -ENOMEM error, ...". That error path is taken on
ENOMEM, not ENOSPC.
But nice find thanks! I think we might have seen this in a couple cases on
our side as well.
>
> This issue can cause the following info:
> WARNING: CPU: 3 PID: 7950 at net/core/stream.c:208 sk_stream_kill_queues+0xd4/0x1a0
> Call Trace:
> <TASK>
> inet_csk_destroy_sock+0x55/0x110
> __tcp_close+0x279/0x470
> tcp_close+0x1f/0x60
> inet_release+0x3f/0x80
> __sock_release+0x3d/0xb0
> sock_close+0x11/0x20
> __fput+0x92/0x250
> task_work_run+0x6a/0xa0
> do_exit+0x33b/0xb60
> do_group_exit+0x2f/0xa0
> get_signal+0xb6/0x950
> arch_do_signal_or_restart+0xac/0x2a0
> exit_to_user_mode_prepare+0xa9/0x200
> syscall_exit_to_user_mode+0x12/0x30
> do_syscall_64+0x46/0x80
> entry_SYSCALL_64_after_hwframe+0x44/0xae
> </TASK>
>
> WARNING: CPU: 3 PID: 2094 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x13c/0x260
> Call Trace:
> <TASK>
> __sk_destruct+0x24/0x1f0
> sk_psock_destroy+0x19b/0x1c0
> process_one_work+0x1b3/0x3c0
> kthread+0xe6/0x110
> ret_from_fork+0x22/0x30
> </TASK>
>
> Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
> Signed-off-by: Wang Yufen <wangyufen@huawei.com>
> ---
> net/ipv4/tcp_bpf.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
> index 9b9b02052fd3..ac9f491cc139 100644
> --- a/net/ipv4/tcp_bpf.c
> +++ b/net/ipv4/tcp_bpf.c
> @@ -421,8 +421,10 @@ static int tcp_bpf_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
> osize = msg_tx->sg.size;
> err = sk_msg_alloc(sk, msg_tx, msg_tx->sg.size + copy, msg_tx->sg.end - 1);
> if (err) {
> - if (err != -ENOSPC)
> + if (err != -ENOSPC) {
> + sk_msg_trim(sk, msg_tx, osize);
> goto wait_for_memory;
> + }
> enospc = true;
> copy = msg_tx->sg.size - osize;
> }
> --
> 2.25.1
>
Acked-by: John Fastabend <john.fastabend@gmail.com>
* RE: [PATCH bpf-next 4/4] bpf, sockmap: Fix double uncharge the mem of sk_msg
2022-02-25 1:49 ` [PATCH bpf-next 4/4] bpf, sockmap: Fix double uncharge the mem of sk_msg Wang Yufen
@ 2022-03-01 4:11 ` John Fastabend
2022-03-01 7:24 ` wangyufen
0 siblings, 1 reply; 14+ messages in thread
From: John Fastabend @ 2022-03-01 4:11 UTC (permalink / raw)
To: Wang Yufen, john.fastabend, daniel, jakub, lmb, davem, bpf
Cc: edumazet, yoshfuji, dsahern, kuba, ast, andrii, kafai,
songliubraving, yhs, kpsingh, netdev, Wang Yufen
Wang Yufen wrote:
> If tcp_bpf_sendmsg is running during a tear down operation, psock may be
> freed.
>
> tcp_bpf_sendmsg()
> tcp_bpf_send_verdict()
> sk_msg_return()
> tcp_bpf_sendmsg_redir()
> unlikely(!psock))
> sk_msg_free()
>
> The mem of the msg has already been uncharged in tcp_bpf_send_verdict() by
> sk_msg_return(), so we need to use sk_msg_free_nocharge() when psock is
> NULL.
>
> This issue can cause the following info:
> WARNING: CPU: 0 PID: 2136 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x13c/0x260
> Call Trace:
> <TASK>
> __sk_destruct+0x24/0x1f0
> sk_psock_destroy+0x19b/0x1c0
> process_one_work+0x1b3/0x3c0
> worker_thread+0x30/0x350
> ? process_one_work+0x3c0/0x3c0
> kthread+0xe6/0x110
> ? kthread_complete_and_exit+0x20/0x20
> ret_from_fork+0x22/0x30
> </TASK>
>
> Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
> Signed-off-by: Wang Yufen <wangyufen@huawei.com>
> ---
> net/ipv4/tcp_bpf.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
> index 1f0364e06619..03c037d2a055 100644
> --- a/net/ipv4/tcp_bpf.c
> +++ b/net/ipv4/tcp_bpf.c
> @@ -139,7 +139,7 @@ int tcp_bpf_sendmsg_redir(struct sock *sk, struct sk_msg *msg,
> int ret;
>
> if (unlikely(!psock)) {
> - sk_msg_free(sk, msg);
> + sk_msg_free_nocharge(sk, msg);
> return 0;
> }
> ret = ingress ? bpf_tcp_ingress(sk, psock, msg, bytes, flags) :
Did you consider simply returning an error code here? This would then
trigger the sk_msg_free_nocharge in the error path of __SK_REDIRECT
and would have the side effect of throwing an error up to user space.
This would be a slight change in behavior from user side but would
look the same as an error if the redirect on the socket threw an
error so I think it would be OK.
Thanks,
John
* RE: [PATCH bpf-next 3/4] bpf, sockmap: Fix more uncharged while msg has more_data
2022-02-25 1:49 ` [PATCH bpf-next 3/4] bpf, sockmap: Fix more uncharged while msg has more_data Wang Yufen
@ 2022-03-01 4:20 ` John Fastabend
0 siblings, 0 replies; 14+ messages in thread
From: John Fastabend @ 2022-03-01 4:20 UTC (permalink / raw)
To: Wang Yufen, john.fastabend, daniel, jakub, lmb, davem, bpf
Cc: edumazet, yoshfuji, dsahern, kuba, ast, andrii, kafai,
songliubraving, yhs, kpsingh, netdev, Wang Yufen
Wang Yufen wrote:
> In tcp_bpf_send_verdict(), if msg has more data after
> tcp_bpf_sendmsg_redir():
>
> tcp_bpf_send_verdict()
> tosend = msg->sg.size //msg->sg.size = 22220
> case __SK_REDIRECT:
> sk_msg_return() //uncharged msg->sg.size(22220) to sk->sk_forward_alloc
> tcp_bpf_sendmsg_redir() //after tcp_bpf_sendmsg_redir, msg->sg.size=11000
> goto more_data;
> tosend = msg->sg.size //msg->sg.size = 11000
> case __SK_REDIRECT:
> sk_msg_return() //uncharged msg->sg.size(11000) to sk->sk_forward_alloc
>
> The msg->sg.size (11000) has been uncharged twice. To fix this, we charge back
> the remaining msg->sg.size before goto more_data.
>
> This issue can cause the following info:
> WARNING: CPU: 0 PID: 9860 at net/core/stream.c:208 sk_stream_kill_queues+0xd4/0x1a0
> Call Trace:
> <TASK>
> inet_csk_destroy_sock+0x55/0x110
> __tcp_close+0x279/0x470
> tcp_close+0x1f/0x60
> inet_release+0x3f/0x80
> __sock_release+0x3d/0xb0
> sock_close+0x11/0x20
> __fput+0x92/0x250
> task_work_run+0x6a/0xa0
> do_exit+0x33b/0xb60
> do_group_exit+0x2f/0xa0
> get_signal+0xb6/0x950
> arch_do_signal_or_restart+0xac/0x2a0
> ? vfs_write+0x237/0x290
> exit_to_user_mode_prepare+0xa9/0x200
> syscall_exit_to_user_mode+0x12/0x30
> do_syscall_64+0x46/0x80
> entry_SYSCALL_64_after_hwframe+0x44/0xae
> </TASK>
>
> WARNING: CPU: 0 PID: 2136 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x13c/0x260
> Call Trace:
> <TASK>
> __sk_destruct+0x24/0x1f0
> sk_psock_destroy+0x19b/0x1c0
> process_one_work+0x1b3/0x3c0
> worker_thread+0x30/0x350
> ? process_one_work+0x3c0/0x3c0
> kthread+0xe6/0x110
> ? kthread_complete_and_exit+0x20/0x20
> ret_from_fork+0x22/0x30
> </TASK>
>
> Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
> Signed-off-by: Wang Yufen <wangyufen@huawei.com>
> ---
LGTM; this also fixes another charge error when going through the error path
with apply set, where it looks like we would have left some bytes charged to
the socket.
Acked-by: John Fastabend <john.fastabend@gmail.com>
* Re: [PATCH bpf-next 2/4] bpf, sockmap: Fix memleak in tcp_bpf_sendmsg while sk msg is full
2022-03-01 4:02 ` John Fastabend
@ 2022-03-01 7:05 ` wangyufen
0 siblings, 0 replies; 14+ messages in thread
From: wangyufen @ 2022-03-01 7:05 UTC (permalink / raw)
To: John Fastabend, daniel, jakub, lmb, davem, bpf
Cc: edumazet, yoshfuji, dsahern, kuba, ast, andrii, kafai,
songliubraving, yhs, kpsingh, netdev
在 2022/3/1 12:02, John Fastabend 写道:
> Wang Yufen wrote:
>> If tcp_bpf_sendmsg() is running while the sk msg is full, sk_msg_alloc()
>> returns -ENOSPC and tcp_bpf_sendmsg() jumps to wait_for_memory. If partial
>> memory has been allocated by sk_msg_alloc(), that is, msg_tx->sg.size is
>> greater than osize after sk_msg_alloc(), a memleak occurs. To fix this, we use
>> sk_msg_trim() to release the allocated memory, then goto wait_for_memory.
> Small nit, "sk_msg_alloc() returns -ENOSPC" should be something like, "when
> sk_msg_alloc() returns -ENOMEM error,..." That error path is from ENOMEM not
> the ENOSPC.
Thanks, I will fix in v2.
>
> But nice find thanks! I think we might have seen this in a couple cases on
> our side as well.
>
>> This issue can cause the following info:
>> WARNING: CPU: 3 PID: 7950 at net/core/stream.c:208 sk_stream_kill_queues+0xd4/0x1a0
>> Call Trace:
>> <TASK>
>> inet_csk_destroy_sock+0x55/0x110
>> __tcp_close+0x279/0x470
>> tcp_close+0x1f/0x60
>> inet_release+0x3f/0x80
>> __sock_release+0x3d/0xb0
>> sock_close+0x11/0x20
>> __fput+0x92/0x250
>> task_work_run+0x6a/0xa0
>> do_exit+0x33b/0xb60
>> do_group_exit+0x2f/0xa0
>> get_signal+0xb6/0x950
>> arch_do_signal_or_restart+0xac/0x2a0
>> exit_to_user_mode_prepare+0xa9/0x200
>> syscall_exit_to_user_mode+0x12/0x30
>> do_syscall_64+0x46/0x80
>> entry_SYSCALL_64_after_hwframe+0x44/0xae
>> </TASK>
>>
>> WARNING: CPU: 3 PID: 2094 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x13c/0x260
>> Call Trace:
>> <TASK>
>> __sk_destruct+0x24/0x1f0
>> sk_psock_destroy+0x19b/0x1c0
>> process_one_work+0x1b3/0x3c0
>> kthread+0xe6/0x110
>> ret_from_fork+0x22/0x30
>> </TASK>
>>
>> Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
>> Signed-off-by: Wang Yufen <wangyufen@huawei.com>
>> ---
>> net/ipv4/tcp_bpf.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
>> index 9b9b02052fd3..ac9f491cc139 100644
>> --- a/net/ipv4/tcp_bpf.c
>> +++ b/net/ipv4/tcp_bpf.c
>> @@ -421,8 +421,10 @@ static int tcp_bpf_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
>> osize = msg_tx->sg.size;
>> err = sk_msg_alloc(sk, msg_tx, msg_tx->sg.size + copy, msg_tx->sg.end - 1);
>> if (err) {
>> - if (err != -ENOSPC)
>> + if (err != -ENOSPC) {
>> + sk_msg_trim(sk, msg_tx, osize);
>> goto wait_for_memory;
>> + }
>> enospc = true;
>> copy = msg_tx->sg.size - osize;
>> }
>> --
>> 2.25.1
>>
> Acked-by: John Fastabend <john.fastabend@gmail.com>
> .
* Re: [PATCH bpf-next 4/4] bpf, sockmap: Fix double uncharge the mem of sk_msg
2022-03-01 4:11 ` John Fastabend
@ 2022-03-01 7:24 ` wangyufen
0 siblings, 0 replies; 14+ messages in thread
From: wangyufen @ 2022-03-01 7:24 UTC (permalink / raw)
To: John Fastabend, daniel, jakub, lmb, davem, bpf
Cc: edumazet, yoshfuji, dsahern, kuba, ast, andrii, kafai,
songliubraving, yhs, kpsingh, netdev
在 2022/3/1 12:11, John Fastabend 写道:
> Wang Yufen wrote:
>> If tcp_bpf_sendmsg is running during a tear down operation, psock may be
>> freed.
>>
>> tcp_bpf_sendmsg()
>> tcp_bpf_send_verdict()
>> sk_msg_return()
>> tcp_bpf_sendmsg_redir()
>> unlikely(!psock))
>> sk_msg_free()
>>
>> The mem of the msg has already been uncharged in tcp_bpf_send_verdict() by
>> sk_msg_return(), so we need to use sk_msg_free_nocharge() when psock is
>> NULL.
>>
>> This issue can cause the following info:
>> WARNING: CPU: 0 PID: 2136 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x13c/0x260
>> Call Trace:
>> <TASK>
>> __sk_destruct+0x24/0x1f0
>> sk_psock_destroy+0x19b/0x1c0
>> process_one_work+0x1b3/0x3c0
>> worker_thread+0x30/0x350
>> ? process_one_work+0x3c0/0x3c0
>> kthread+0xe6/0x110
>> ? kthread_complete_and_exit+0x20/0x20
>> ret_from_fork+0x22/0x30
>> </TASK>
>>
>> Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
>> Signed-off-by: Wang Yufen <wangyufen@huawei.com>
>> ---
>> net/ipv4/tcp_bpf.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
>> index 1f0364e06619..03c037d2a055 100644
>> --- a/net/ipv4/tcp_bpf.c
>> +++ b/net/ipv4/tcp_bpf.c
>> @@ -139,7 +139,7 @@ int tcp_bpf_sendmsg_redir(struct sock *sk, struct sk_msg *msg,
>> int ret;
>>
>> if (unlikely(!psock)) {
>> - sk_msg_free(sk, msg);
>> + sk_msg_free_nocharge(sk, msg);
>> return 0;
>> }
>> ret = ingress ? bpf_tcp_ingress(sk, psock, msg, bytes, flags) :
> Did you consider simply returning an error code here? This would then
> trigger the sk_msg_free_nocharge in the error path of __SK_REDIRECT
> and would have the side effect of throwing an error up to user space.
> This would be a slight change in behavior from user side but would
> look the same as an error if the redirect on the socket threw an
> error so I think it would be OK.
Yes, I think it would be better to return -EPIPE, will do in v2.
Thanks.
>
> Thanks,
> John
> .
* Re: [PATCH bpf-next 1/4] bpf, sockmap: Fix memleak in sk_psock_queue_msg
2022-03-01 1:49 ` wangyufen
2022-03-01 3:44 ` John Fastabend
@ 2022-03-03 0:31 ` Cong Wang
1 sibling, 0 replies; 14+ messages in thread
From: Cong Wang @ 2022-03-03 0:31 UTC (permalink / raw)
To: wangyufen
Cc: john.fastabend, daniel, jakub, lmb, davem, bpf, edumazet,
yoshfuji, dsahern, kuba, ast, andrii, kafai, songliubraving, yhs,
kpsingh, netdev
On Tue, Mar 01, 2022 at 09:49:12AM +0800, wangyufen wrote:
>
> 在 2022/2/28 3:21, Cong Wang 写道:
> > On Fri, Feb 25, 2022 at 09:49:26AM +0800, Wang Yufen wrote:
> > > If tcp_bpf_sendmsg is running during a tear down operation we may enqueue
> > > data on the ingress msg queue while tear down is trying to free it.
> > >
> > > sk1 (redirect sk2) sk2
> > > ------------------- ---------------
> > > tcp_bpf_sendmsg()
> > > tcp_bpf_send_verdict()
> > > tcp_bpf_sendmsg_redir()
> > > bpf_tcp_ingress()
> > > sock_map_close()
> > > lock_sock()
> > > lock_sock() ... blocking
> > > sk_psock_stop
> > > sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED);
> > > release_sock(sk);
> > > lock_sock()
> > > sk_mem_charge()
> > > get_page()
> > > sk_psock_queue_msg()
> > > sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED);
> > > drop_sk_msg()
> > > release_sock()
> > >
> > > In drop_sk_msg(), the msg has charged memory from the sk via sk_mem_charge()
> > > and holds sg pages that need to be put. To fix this, we use sk_msg_free() and
> > > then kfree() the msg.
> > >
> > What about the other code path? That is, sk_psock_skb_ingress_enqueue().
> > I don't see skmsg is charged there.
>
> sk_psock_skb_ingress_self() | sk_psock_skb_ingress()
> skb_set_owner_r()
> sk_mem_charge()
> sk_psock_skb_ingress_enqueue()
>
> In the other code path, the skmsg is charged via skb_set_owner_r()->sk_mem_charge().
>
skb_set_owner_r() charges the skb; I was asking about the skmsg. ;) In
sk_psock_skb_ingress_enqueue(), the skmsg was initialized but not
actually charged, hence I was asking... From a second look, it seems
sk_mem_uncharge() is not called for sk_psock_skb_ingress_enqueue() where
msg->skb is clearly not NULL.
Also, you introduce an unnecessary sk_msg_init() from __sk_msg_free(),
because you call kfree(msg) after it.
Thanks.