* [PATCH bpf-next] bpf, sockmap: Manual deletion of sockmap elements in user mode is not allowed
@ 2022-03-14 12:44 Wang Yufen
  2022-03-14 15:30 ` Jakub Sitnicki
  0 siblings, 1 reply; 9+ messages in thread
From: Wang Yufen @ 2022-03-14 12:44 UTC (permalink / raw)
  To: ast, john.fastabend, daniel, jakub, lmb, davem, kafai, dsahern, kuba, songliubraving, yhs, kpsingh
  Cc: netdev, bpf, Wang Yufen

A tcp socket in a sockmap. If user invokes bpf_map_delete_elem to delete
the sockmap element, the tcp socket will switch to use the TCP protocol
stack to send and receive packets. The switching process may cause some
issues, such as if some msgs exist in the ingress queue and are cleared
by sk_psock_drop(), the packets are lost, and the tcp data is abnormal.

Signed-off-by: Wang Yufen <wangyufen@huawei.com>
---
 include/uapi/linux/bpf.h | 3 +++
 kernel/bpf/syscall.c     | 2 ++
 net/core/sock_map.c      | 3 +++
 3 files changed, 8 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 4eebea830613..1dab090f271c 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1218,6 +1218,9 @@ enum {
 
 /* Create a map that is suitable to be an inner map with dynamic max entries */
 	BPF_F_INNER_MAP		= (1U << 12),
+
+/* This should only be used for bpf_map_delete_elem called by user. */
+	BPF_F_TCP_SOCKMAP	= (1U << 13),
 };
 
 /* Flags for BPF_PROG_QUERY. */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index db402ebc5570..57aa98087322 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1232,7 +1232,9 @@ static int map_delete_elem(union bpf_attr *attr)
 
 	bpf_disable_instrumentation();
 	rcu_read_lock();
+	map->map_flags |= BPF_F_TCP_SOCKMAP;
 	err = map->ops->map_delete_elem(map, key);
+	map->map_flags &= ~BPF_F_TCP_SOCKMAP;
 	rcu_read_unlock();
 	bpf_enable_instrumentation();
 	maybe_wait_bpf_programs(map);
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 2d213c4011db..5b90a35d1d23 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -914,6 +914,9 @@ static int sock_hash_delete_elem(struct bpf_map *map, void *key)
 	struct bpf_shtab_elem *elem;
 	int ret = -ENOENT;
 
+	if (map->map_flags & BPF_F_TCP_SOCKMAP)
+		return -EOPNOTSUPP;
+
 	hash = sock_hash_bucket_hash(key, key_size);
 	bucket = sock_hash_select_bucket(htab, hash);
 
-- 
2.25.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread
* Re: [PATCH bpf-next] bpf, sockmap: Manual deletion of sockmap elements in user mode is not allowed
  2022-03-14 12:44 [PATCH bpf-next] bpf, sockmap: Manual deletion of sockmap elements in user mode is not allowed Wang Yufen
@ 2022-03-14 15:30 ` Jakub Sitnicki
  2022-03-15  7:24   ` wangyufen
  0 siblings, 1 reply; 9+ messages in thread
From: Jakub Sitnicki @ 2022-03-14 15:30 UTC (permalink / raw)
  To: Wang Yufen
  Cc: ast, john.fastabend, daniel, lmb, davem, kafai, dsahern, kuba, songliubraving, yhs, kpsingh, netdev, bpf

On Mon, Mar 14, 2022 at 08:44 PM +08, Wang Yufen wrote:
> A tcp socket in a sockmap. If user invokes bpf_map_delete_elem to delete
> the sockmap element, the tcp socket will switch to use the TCP protocol
> stack to send and receive packets. The switching process may cause some
> issues, such as if some msgs exist in the ingress queue and are cleared
> by sk_psock_drop(), the packets are lost, and the tcp data is abnormal.
>
> Signed-off-by: Wang Yufen <wangyufen@huawei.com>
> ---

Can you please tell us a bit more about the life-cycle of the socket in
your workload? Questions that come to mind:

1) What triggers the removal of the socket from sockmap in your case?
2) Would it still be a problem if removal from sockmap did not cause any
   packets to get dropped?

[...]

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [PATCH bpf-next] bpf, sockmap: Manual deletion of sockmap elements in user mode is not allowed
  2022-03-14 15:30 ` Jakub Sitnicki
@ 2022-03-15  7:24   ` wangyufen
  2022-03-15 12:12     ` Jakub Sitnicki
  0 siblings, 1 reply; 9+ messages in thread
From: wangyufen @ 2022-03-15 7:24 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: ast, john.fastabend, daniel, lmb, davem, kafai, dsahern, kuba, songliubraving, yhs, kpsingh, netdev, bpf

On 2022/3/14 23:30, Jakub Sitnicki wrote:
> On Mon, Mar 14, 2022 at 08:44 PM +08, Wang Yufen wrote:
>> A tcp socket in a sockmap. If user invokes bpf_map_delete_elem to delete
>> the sockmap element, the tcp socket will switch to use the TCP protocol
>> stack to send and receive packets. The switching process may cause some
>> issues, such as if some msgs exist in the ingress queue and are cleared
>> by sk_psock_drop(), the packets are lost, and the tcp data is abnormal.
>>
>> Signed-off-by: Wang Yufen <wangyufen@huawei.com>
>> ---
> Can you please tell us a bit more about the life-cycle of the socket in
> your workload? Questions that come to mind:
>
> 1) What triggers the removal of the socket from sockmap in your case?

We use sk_msg to redirect with sock hash, like this:

skA redirect skB
Tx <-----------> skB,Rx

And construct a scenario where the packet sending speed is high, the
packet receiving speed is slow, so the packets are stacked in the ingress
queue on the receiving side. In this case, if run bpf_map_delete_elem() to
delete the sockmap entry, will trigger the following procedure:

sock_hash_delete_elem()
sock_map_unref()
sk_psock_put()
sk_psock_drop()
sk_psock_stop()
__sk_psock_zap_ingress()
__sk_psock_purge_ingress_msg()

> 2) Would it still be a problem if removal from sockmap did not cause any
> packets to get dropped?

Yes, it still be a problem. If removal from sockmap did not cause any
packets to get dropped, packet receiving process switches to use TCP
protocol stack. The packets in the psock ingress queue cannot be received
by the user.

Thanks.

>
> [...]
> .

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [PATCH bpf-next] bpf, sockmap: Manual deletion of sockmap elements in user mode is not allowed
  2022-03-15  7:24   ` wangyufen
@ 2022-03-15 12:12     ` Jakub Sitnicki
  2022-03-15 16:25       ` Daniel Borkmann
                         ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Jakub Sitnicki @ 2022-03-15 12:12 UTC (permalink / raw)
  To: wangyufen
  Cc: ast, john.fastabend, daniel, lmb, davem, kafai, dsahern, kuba, songliubraving, yhs, kpsingh, netdev, bpf

On Tue, Mar 15, 2022 at 03:24 PM +08, wangyufen wrote:
> On 2022/3/14 23:30, Jakub Sitnicki wrote:
>> On Mon, Mar 14, 2022 at 08:44 PM +08, Wang Yufen wrote:
>>> A tcp socket in a sockmap. If user invokes bpf_map_delete_elem to delete
>>> the sockmap element, the tcp socket will switch to use the TCP protocol
>>> stack to send and receive packets. The switching process may cause some
>>> issues, such as if some msgs exist in the ingress queue and are cleared
>>> by sk_psock_drop(), the packets are lost, and the tcp data is abnormal.
>>>
>>> Signed-off-by: Wang Yufen <wangyufen@huawei.com>
>>> ---
>> Can you please tell us a bit more about the life-cycle of the socket in
>> your workload? Questions that come to mind:
>>
>> 1) What triggers the removal of the socket from sockmap in your case?
> We use sk_msg to redirect with sock hash, like this:
>
> skA redirect skB
> Tx <-----------> skB,Rx
>
> And construct a scenario where the packet sending speed is high, the
> packet receiving speed is slow, so the packets are stacked in the ingress
> queue on the receiving side. In this case, if run bpf_map_delete_elem() to
> delete the sockmap entry, will trigger the following procedure:
>
> sock_hash_delete_elem()
> sock_map_unref()
> sk_psock_put()
> sk_psock_drop()
> sk_psock_stop()
> __sk_psock_zap_ingress()
> __sk_psock_purge_ingress_msg()
>
>> 2) Would it still be a problem if removal from sockmap did not cause any
>> packets to get dropped?
> Yes, it still be a problem. If removal from sockmap did not cause any
> packets to get dropped, packet receiving process switches to use TCP
> protocol stack. The packets in the psock ingress queue cannot be received
> by the user.

Thanks for the context. So, if I understand correctly, you want to avoid
breaking the network pipe by updating the sockmap from user-space.

This sounds awfully similar to BPF_MAP_FREEZE. Have you considered that?

^ permalink raw reply	[flat|nested] 9+ messages in thread
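[Editor's note] The BPF_MAP_FREEZE suggestion above can be sketched from user space roughly as follows. This is a minimal sketch, not code from the thread: it calls the bpf(2) syscall directly and assumes Linux UAPI headers recent enough to define BPF_MAP_FREEZE. The helper names sys_bpf(), freeze_map() and delete_elem() are hypothetical, and map_fd is assumed to refer to an already created sockmap or sockhash.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/* Hypothetical helper: thin wrapper around the bpf(2) syscall. */
static long sys_bpf(int cmd, union bpf_attr *attr)
{
	return syscall(__NR_bpf, cmd, attr, sizeof(*attr));
}

/*
 * Hypothetical helper: freeze a map so that later syscall-side writes
 * (update/delete) are rejected. Returns 0 on success, -errno on failure.
 */
static int freeze_map(int map_fd)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.map_fd = map_fd;

	return sys_bpf(BPF_MAP_FREEZE, &attr) ? -errno : 0;
}

/*
 * Hypothetical helper: delete one element through the syscall interface,
 * the same path that the patch in this thread tries to block.
 * Returns 0 on success, -errno on failure.
 */
static int delete_elem(int map_fd, const void *key)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.map_fd = map_fd;
	attr.key = (uint64_t)(uintptr_t)key;

	return sys_bpf(BPF_MAP_DELETE_ELEM, &attr) ? -errno : 0;
}
```

Under these assumptions, once freeze_map(fd) has succeeded, a later delete_elem(fd, &key) from user space is expected to fail with -EPERM, while BPF programs can still modify the map. There is no command to unfreeze a map, so the map would have to be fully populated from user space before it is frozen.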
* Re: [PATCH bpf-next] bpf, sockmap: Manual deletion of sockmap elements in user mode is not allowed
  2022-03-15 12:12     ` Jakub Sitnicki
@ 2022-03-15 16:25       ` Daniel Borkmann
       [not found]         ` <f5a45e95-bac2-e1be-2d7b-5e6d55f9b408@huawei.com>
  2022-03-16  0:36       ` Cong Wang
  2022-03-16  3:25       ` wangyufen
  2 siblings, 1 reply; 9+ messages in thread
From: Daniel Borkmann @ 2022-03-15 16:25 UTC (permalink / raw)
  To: Jakub Sitnicki, wangyufen
  Cc: ast, john.fastabend, lmb, davem, kafai, dsahern, kuba, songliubraving, yhs, kpsingh, netdev, bpf

On 3/15/22 1:12 PM, Jakub Sitnicki wrote:
> On Tue, Mar 15, 2022 at 03:24 PM +08, wangyufen wrote:
>> On 2022/3/14 23:30, Jakub Sitnicki wrote:
>>> On Mon, Mar 14, 2022 at 08:44 PM +08, Wang Yufen wrote:
>>>> A tcp socket in a sockmap. If user invokes bpf_map_delete_elem to delete
>>>> the sockmap element, the tcp socket will switch to use the TCP protocol
>>>> stack to send and receive packets. The switching process may cause some
>>>> issues, such as if some msgs exist in the ingress queue and are cleared
>>>> by sk_psock_drop(), the packets are lost, and the tcp data is abnormal.
>>>>
>>>> Signed-off-by: Wang Yufen <wangyufen@huawei.com>
>>>> ---
>>> Can you please tell us a bit more about the life-cycle of the socket in
>>> your workload? Questions that come to mind:
>>>
>>> 1) What triggers the removal of the socket from sockmap in your case?
>> We use sk_msg to redirect with sock hash, like this:
>>
>> skA redirect skB
>> Tx <-----------> skB,Rx
>>
>> And construct a scenario where the packet sending speed is high, the
>> packet receiving speed is slow, so the packets are stacked in the ingress
>> queue on the receiving side. In this case, if run bpf_map_delete_elem() to
>> delete the sockmap entry, will trigger the following procedure:
>>
>> sock_hash_delete_elem()
>> sock_map_unref()
>> sk_psock_put()
>> sk_psock_drop()
>> sk_psock_stop()
>> __sk_psock_zap_ingress()
>> __sk_psock_purge_ingress_msg()
>>
>>> 2) Would it still be a problem if removal from sockmap did not cause any
>>> packets to get dropped?
>> Yes, it still be a problem. If removal from sockmap did not cause any
>> packets to get dropped, packet receiving process switches to use TCP
>> protocol stack. The packets in the psock ingress queue cannot be received
>> by the user.
>
> Thanks for the context. So, if I understand correctly, you want to avoid
> breaking the network pipe by updating the sockmap from user-space.
>
> This sounds awfully similar to BPF_MAP_FREEZE. Have you considered that?

+1

Aside from that, the patch as-is also fails BPF CI in a lot of places, please
make sure to check selftests:

https://github.com/kernel-patches/bpf/runs/5537367301?check_suite_focus=true

   [...]
   #145/73 sockmap_listen/sockmap IPv6 test_udp_redir:OK
   #145/74 sockmap_listen/sockmap IPv6 test_udp_unix_redir:OK
   #145/75 sockmap_listen/sockmap Unix test_unix_redir:OK
   #145/76 sockmap_listen/sockmap Unix test_unix_redir:OK
   ./test_progs:test_ops_cleanup:1424: map_delete: expected EINVAL/ENOENT: Operation not supported
   test_ops_cleanup:FAIL:1424
   ./test_progs:test_ops_cleanup:1424: map_delete: expected EINVAL/ENOENT: Operation not supported
   test_ops_cleanup:FAIL:1424
   #145/77 sockmap_listen/sockhash IPv4 TCP test_insert_invalid:FAIL
   ./test_progs:test_ops_cleanup:1424: map_delete: expected EINVAL/ENOENT: Operation not supported
   test_ops_cleanup:FAIL:1424
   ./test_progs:test_ops_cleanup:1424: map_delete: expected EINVAL/ENOENT: Operation not supported
   test_ops_cleanup:FAIL:1424
   #145/78 sockmap_listen/sockhash IPv4 TCP test_insert_opened:FAIL
   ./test_progs:test_ops_cleanup:1424: map_delete: expected EINVAL/ENOENT: Operation not supported
   test_ops_cleanup:FAIL:1424
   ./test_progs:test_ops_cleanup:1424: map_delete: expected EINVAL/ENOENT: Operation not supported
   test_ops_cleanup:FAIL:1424
   #145/79 sockmap_listen/sockhash IPv4 TCP test_insert_bound:FAIL
   ./test_progs:test_ops_cleanup:1424: map_delete: expected EINVAL/ENOENT: Operation not supported
   test_ops_cleanup:FAIL:1424
   ./test_progs:test_ops_cleanup:1424: map_delete: expected EINVAL/ENOENT: Operation not supported
   test_ops_cleanup:FAIL:1424
   [...]

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 9+ messages in thread
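[Editor's note] The CI failures above follow from the new error code. Judging only from the log text "map_delete: expected EINVAL/ENOENT", the cleanup step appears to tolerate exactly those two errno values when a delete fails, so the -EOPNOTSUPP returned by the patched sock_hash_delete_elem() trips it. The sketch below is a hypothetical stand-in for that check, not the selftest source; the function name cleanup_delete_ok() is invented for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the cleanup check, inferred only from the
 * log text "map_delete: expected EINVAL/ENOENT". `ret` is the return
 * value of the delete call and `err` is the errno it left behind.
 */
static bool cleanup_delete_ok(int ret, int err)
{
	if (ret == 0)
		return true;	/* element existed and was deleted */

	/* An empty slot or an invalid key is tolerated during cleanup. */
	return err == EINVAL || err == ENOENT;
}
```

With this model, cleanup_delete_ok(-1, ENOENT) passes, while cleanup_delete_ok(-1, EOPNOTSUPP) does not, which matches the "Operation not supported" text in the failing lines.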
[parent not found: <f5a45e95-bac2-e1be-2d7b-5e6d55f9b408@huawei.com>]
* Re: [PATCH bpf-next] bpf, sockmap: Manual deletion of sockmap elements in user mode is not allowed
       [not found]         ` <f5a45e95-bac2-e1be-2d7b-5e6d55f9b408@huawei.com>
@ 2022-03-16  5:23           ` John Fastabend
  2022-03-16 14:57           ` Jakub Sitnicki
  1 sibling, 0 replies; 9+ messages in thread
From: John Fastabend @ 2022-03-16 5:23 UTC (permalink / raw)
  To: wangyufen, Daniel Borkmann, Jakub Sitnicki
  Cc: ast, john.fastabend, lmb, davem, kafai, dsahern, kuba, songliubraving, yhs, kpsingh, netdev, bpf

wangyufen wrote:
>
> On 2022/3/16 0:25, Daniel Borkmann wrote:
> > On 3/15/22 1:12 PM, Jakub Sitnicki wrote:
> >> On Tue, Mar 15, 2022 at 03:24 PM +08, wangyufen wrote:
> >>> On 2022/3/14 23:30, Jakub Sitnicki wrote:
> >>>> On Mon, Mar 14, 2022 at 08:44 PM +08, Wang Yufen wrote:
> >>>>> A tcp socket in a sockmap. If user invokes bpf_map_delete_elem to delete
> >>>>> the sockmap element, the tcp socket will switch to use the TCP protocol
> >>>>> stack to send and receive packets. The switching process may cause some
> >>>>> issues, such as if some msgs exist in the ingress queue and are cleared
> >>>>> by sk_psock_drop(), the packets are lost, and the tcp data is abnormal.
> >>>>>
> >>>>> Signed-off-by: Wang Yufen <wangyufen@huawei.com>
> >>>>> ---
> >>>> Can you please tell us a bit more about the life-cycle of the socket in
> >>>> your workload? Questions that come to mind:
> >>>>
> >>>> 1) What triggers the removal of the socket from sockmap in your case?
> >>> We use sk_msg to redirect with sock hash, like this:
> >>>
> >>> skA redirect skB
> >>> Tx <-----------> skB,Rx
> >>>
> >>> And construct a scenario where the packet sending speed is high, the
> >>> packet receiving speed is slow, so the packets are stacked in the ingress
> >>> queue on the receiving side. In this case, if run bpf_map_delete_elem() to
> >>> delete the sockmap entry, will trigger the following procedure:
> >>>
> >>> sock_hash_delete_elem()
> >>> sock_map_unref()
> >>> sk_psock_put()
> >>> sk_psock_drop()
> >>> sk_psock_stop()
> >>> __sk_psock_zap_ingress()
> >>> __sk_psock_purge_ingress_msg()
> >>>
> >>>> 2) Would it still be a problem if removal from sockmap did not cause any
> >>>> packets to get dropped?
> >>> Yes, it still be a problem. If removal from sockmap did not cause any
> >>> packets to get dropped, packet receiving process switches to use TCP
> >>> protocol stack. The packets in the psock ingress queue cannot be received
> >>> by the user.
> >>
> >> Thanks for the context. So, if I understand correctly, you want to avoid
> >> breaking the network pipe by updating the sockmap from user-space.
> >>
> >> This sounds awfully similar to BPF_MAP_FREEZE. Have you considered that?
> >
> > +1
> >
> > Aside from that, the patch as-is also fails BPF CI in a lot of places, please
> > make sure to check selftests:
> >
> > https://github.com/kernel-patches/bpf/runs/5537367301?check_suite_focus=true
> >
> >   [...]
> >   #145/73 sockmap_listen/sockmap IPv6 test_udp_redir:OK
> >   #145/74 sockmap_listen/sockmap IPv6 test_udp_unix_redir:OK
> >   #145/75 sockmap_listen/sockmap Unix test_unix_redir:OK
> >   #145/76 sockmap_listen/sockmap Unix test_unix_redir:OK
> >   ./test_progs:test_ops_cleanup:1424: map_delete: expected EINVAL/ENOENT: Operation not supported
> >   test_ops_cleanup:FAIL:1424
> >   ./test_progs:test_ops_cleanup:1424: map_delete: expected EINVAL/ENOENT: Operation not supported
> >   test_ops_cleanup:FAIL:1424
> >   #145/77 sockmap_listen/sockhash IPv4 TCP test_insert_invalid:FAIL
> >   ./test_progs:test_ops_cleanup:1424: map_delete: expected EINVAL/ENOENT: Operation not supported
> >   test_ops_cleanup:FAIL:1424
> >   ./test_progs:test_ops_cleanup:1424: map_delete: expected EINVAL/ENOENT: Operation not supported
> >   test_ops_cleanup:FAIL:1424
> >   #145/78 sockmap_listen/sockhash IPv4 TCP test_insert_opened:FAIL
> >   ./test_progs:test_ops_cleanup:1424: map_delete: expected EINVAL/ENOENT: Operation not supported
> >   test_ops_cleanup:FAIL:1424
> >   ./test_progs:test_ops_cleanup:1424: map_delete: expected EINVAL/ENOENT: Operation not supported
> >   test_ops_cleanup:FAIL:1424
> >   #145/79 sockmap_listen/sockhash IPv4 TCP test_insert_bound:FAIL
> >   ./test_progs:test_ops_cleanup:1424: map_delete: expected EINVAL/ENOENT: Operation not supported
> >   test_ops_cleanup:FAIL:1424
> >   ./test_progs:test_ops_cleanup:1424: map_delete: expected EINVAL/ENOENT: Operation not supported
> >   test_ops_cleanup:FAIL:1424
> >   [...]
> >
> > Thanks,
> > Daniel
> > .
>
> I'm not sure about this patch. The main purpose is to point out the possible
> problems when the socket is deleted from the map. I'm sorry for the trouble.
>
> Thanks.

If you want to delete a socket you should flush it first. To do this
stop redirecting traffic to it and then read all the data out. At the
moment it's a bit tricky to know when the receiving socket is empty
though. Adding a flag on delete to only delete when the ingress
qlen == 0 might be a possibility if you need delete to work and
are trying to work out how to safely delete sockets.

^ permalink raw reply	[flat|nested] 9+ messages in thread
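[Editor's note] The "flush first, then delete" idea above can be illustrated with a toy model. This is a sketch in plain user-space C under stated assumptions, not kernel code and not a proposed API: struct toy_psock, toy_read() and toy_delete_if_drained() are hypothetical names, and returning -EBUSY for a non-empty queue is an assumption made for the example.

```c
#include <errno.h>
#include <stddef.h>

/* Toy model of a socket's psock ingress queue; not a kernel structure. */
struct toy_psock {
	size_t ingress_qlen;	/* messages queued but not yet read */
	int in_map;		/* 1 while the socket is still in the map */
};

/* Reader side: consume up to `n` queued messages, return how many. */
static size_t toy_read(struct toy_psock *p, size_t n)
{
	size_t done = n < p->ingress_qlen ? n : p->ingress_qlen;

	p->ingress_qlen -= done;
	return done;
}

/*
 * Hypothetical "delete only when drained" policy: refuse the delete
 * while messages are pending, so nothing is purged. The -EBUSY code is
 * an assumption made for this sketch.
 */
static int toy_delete_if_drained(struct toy_psock *p)
{
	if (!p->in_map)
		return -ENOENT;
	if (p->ingress_qlen != 0)
		return -EBUSY;

	p->in_map = 0;
	return 0;
}
```

In this model the caller first stops redirecting traffic to the socket, then alternates toy_read() and toy_delete_if_drained() until the delete succeeds, so the removal never discards queued data.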
* Re: [PATCH bpf-next] bpf, sockmap: Manual deletion of sockmap elements in user mode is not allowed
       [not found]         ` <f5a45e95-bac2-e1be-2d7b-5e6d55f9b408@huawei.com>
  2022-03-16  5:23           ` John Fastabend
@ 2022-03-16 14:57           ` Jakub Sitnicki
  1 sibling, 0 replies; 9+ messages in thread
From: Jakub Sitnicki @ 2022-03-16 14:57 UTC (permalink / raw)
  To: wangyufen
  Cc: Daniel Borkmann, ast, john.fastabend, lmb, davem, kafai, dsahern, kuba, songliubraving, yhs, kpsingh, netdev, bpf

On Wed, Mar 16, 2022 at 11:42 AM +08, wangyufen wrote:

[...]

> I'm not sure about this patch. The main purpose is to point out the possible
> problems when the socket is deleted from the map. I'm sorry for the trouble.

No problem at all. Happy to see sockmap gaining wider adoption.

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [PATCH bpf-next] bpf, sockmap: Manual deletion of sockmap elements in user mode is not allowed
  2022-03-15 12:12     ` Jakub Sitnicki
  2022-03-15 16:25       ` Daniel Borkmann
@ 2022-03-16  0:36       ` Cong Wang
  2022-03-16  3:25       ` wangyufen
  2 siblings, 0 replies; 9+ messages in thread
From: Cong Wang @ 2022-03-16 0:36 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: wangyufen, ast, john.fastabend, daniel, lmb, davem, kafai, dsahern, kuba, songliubraving, yhs, kpsingh, netdev, bpf

On Tue, Mar 15, 2022 at 01:12:08PM +0100, Jakub Sitnicki wrote:
> On Tue, Mar 15, 2022 at 03:24 PM +08, wangyufen wrote:
> > On 2022/3/14 23:30, Jakub Sitnicki wrote:
> >> On Mon, Mar 14, 2022 at 08:44 PM +08, Wang Yufen wrote:
> >>> A tcp socket in a sockmap. If user invokes bpf_map_delete_elem to delete
> >>> the sockmap element, the tcp socket will switch to use the TCP protocol
> >>> stack to send and receive packets. The switching process may cause some
> >>> issues, such as if some msgs exist in the ingress queue and are cleared
> >>> by sk_psock_drop(), the packets are lost, and the tcp data is abnormal.
> >>>
> >>> Signed-off-by: Wang Yufen <wangyufen@huawei.com>
> >>> ---
> >> Can you please tell us a bit more about the life-cycle of the socket in
> >> your workload? Questions that come to mind:
> >>
> >> 1) What triggers the removal of the socket from sockmap in your case?
> > We use sk_msg to redirect with sock hash, like this:
> >
> > skA redirect skB
> > Tx <-----------> skB,Rx
> >
> > And construct a scenario where the packet sending speed is high, the
> > packet receiving speed is slow, so the packets are stacked in the ingress
> > queue on the receiving side. In this case, if run bpf_map_delete_elem() to
> > delete the sockmap entry, will trigger the following procedure:
> >
> > sock_hash_delete_elem()
> > sock_map_unref()
> > sk_psock_put()
> > sk_psock_drop()
> > sk_psock_stop()
> > __sk_psock_zap_ingress()
> > __sk_psock_purge_ingress_msg()
> >
> >> 2) Would it still be a problem if removal from sockmap did not cause any
> >> packets to get dropped?
> > Yes, it still be a problem. If removal from sockmap did not cause any
> > packets to get dropped, packet receiving process switches to use TCP
> > protocol stack. The packets in the psock ingress queue cannot be received
> > by the user.
>
> Thanks for the context. So, if I understand correctly, you want to avoid
> breaking the network pipe by updating the sockmap from user-space.
>
> This sounds awfully similar to BPF_MAP_FREEZE. Have you considered that?

Doesn't BPF_MAP_FREEZE only freeze write operations from syscalls?
For sockmap, receiving packets is not a part of map write operation.

The problem here is that skmsg can only be consumed when the socket is
still in the map, as it uses a separate queue and a separate type of
message (skmsg vs. skb). So, essentially this behavior is by design.

Thanks.

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [PATCH bpf-next] bpf, sockmap: Manual deletion of sockmap elements in user mode is not allowed
  2022-03-15 12:12     ` Jakub Sitnicki
  2022-03-15 16:25       ` Daniel Borkmann
  2022-03-16  0:36       ` Cong Wang
@ 2022-03-16  3:25       ` wangyufen
  2 siblings, 0 replies; 9+ messages in thread
From: wangyufen @ 2022-03-16 3:25 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: ast, john.fastabend, daniel, lmb, davem, kafai, dsahern, kuba, songliubraving, yhs, kpsingh, netdev, bpf

On 2022/3/15 20:12, Jakub Sitnicki wrote:
> On Tue, Mar 15, 2022 at 03:24 PM +08, wangyufen wrote:
>> On 2022/3/14 23:30, Jakub Sitnicki wrote:
>>> On Mon, Mar 14, 2022 at 08:44 PM +08, Wang Yufen wrote:
>>>> A tcp socket in a sockmap. If user invokes bpf_map_delete_elem to delete
>>>> the sockmap element, the tcp socket will switch to use the TCP protocol
>>>> stack to send and receive packets. The switching process may cause some
>>>> issues, such as if some msgs exist in the ingress queue and are cleared
>>>> by sk_psock_drop(), the packets are lost, and the tcp data is abnormal.
>>>>
>>>> Signed-off-by: Wang Yufen <wangyufen@huawei.com>
>>>> ---
>>> Can you please tell us a bit more about the life-cycle of the socket in
>>> your workload? Questions that come to mind:
>>>
>>> 1) What triggers the removal of the socket from sockmap in your case?
>> We use sk_msg to redirect with sock hash, like this:
>>
>> skA redirect skB
>> Tx <-----------> skB,Rx
>>
>> And construct a scenario where the packet sending speed is high, the
>> packet receiving speed is slow, so the packets are stacked in the ingress
>> queue on the receiving side. In this case, if run bpf_map_delete_elem() to
>> delete the sockmap entry, will trigger the following procedure:
>>
>> sock_hash_delete_elem()
>> sock_map_unref()
>> sk_psock_put()
>> sk_psock_drop()
>> sk_psock_stop()
>> __sk_psock_zap_ingress()
>> __sk_psock_purge_ingress_msg()
>>
>>> 2) Would it still be a problem if removal from sockmap did not cause any
>>> packets to get dropped?
>> Yes, it still be a problem. If removal from sockmap did not cause any
>> packets to get dropped, packet receiving process switches to use TCP
>> protocol stack. The packets in the psock ingress queue cannot be received
>> by the user.
> Thanks for the context. So, if I understand correctly, you want to avoid
> breaking the network pipe by updating the sockmap from user-space.
>
> This sounds awfully similar to BPF_MAP_FREEZE. Have you considered that?
> .

Sorry, I didn't notice this. I verified with BPF_MAP_FREEZE and it solves
my problem, thanks.

^ permalink raw reply	[flat|nested] 9+ messages in thread