* [RFC PATCH net v2 0/2] net/smc: Fix for race in smc link group termination @ 2021-12-28 15:13 Wen Gu 2021-12-28 15:13 ` [RFC PATCH net v2 1/2] net/smc: Resolve the race between link group access and termination Wen Gu 2021-12-28 15:13 ` [RFC PATCH net v2 2/2] net/smc: Resolve the race between SMC-R link access and clear Wen Gu 0 siblings, 2 replies; 15+ messages in thread From: Wen Gu @ 2021-12-28 15:13 UTC (permalink / raw) To: kgraul, davem, kuba; +Cc: linux-s390, netdev, linux-kernel, dust.li, tonylu We encountered some crashes recently and they are caused by the race between the access and free of link/link group in smc link group termination. The crashes can be reproduced in frequent abnormal link group termination, like set RNICs up/down. This set of patches tries to fix this by extending the life cycle of link/link group to ensure that they won't be referred to after cleared or freed. v1->v2 - Declare __smcr_link_clear() as 'static' Best wishes, Wen Gu Wen Gu (2): net/smc: Resolve the race between link group access and termination net/smc: Resolve the race between SMC-R link access and clear net/smc/smc.h | 1 + net/smc/smc_core.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++------ net/smc/smc_core.h | 7 +++++ 3 files changed, 79 insertions(+), 9 deletions(-) -- 1.8.3.1 ^ permalink raw reply [flat|nested] 15+ messages in thread
* [RFC PATCH net v2 1/2] net/smc: Resolve the race between link group access and termination 2021-12-28 15:13 [RFC PATCH net v2 0/2] net/smc: Fix for race in smc link group termination Wen Gu @ 2021-12-28 15:13 ` Wen Gu 2021-12-29 12:56 ` Karsten Graul 2021-12-28 15:13 ` [RFC PATCH net v2 2/2] net/smc: Resolve the race between SMC-R link access and clear Wen Gu 1 sibling, 1 reply; 15+ messages in thread From: Wen Gu @ 2021-12-28 15:13 UTC (permalink / raw) To: kgraul, davem, kuba; +Cc: linux-s390, netdev, linux-kernel, dust.li, tonylu We encountered some crashes caused by the race between the access and the termination of link groups. Here are some of panic stacks we met: 1) Race between smc_clc_wait_msg() and __smc_lgr_terminate() BUG: kernel NULL pointer dereference, address: 00000000000002f0 Workqueue: smc_hs_wq smc_listen_work [smc] RIP: 0010:smc_clc_wait_msg+0x3eb/0x5c0 [smc] Call Trace: <TASK> ? smc_clc_send_accept+0x45/0xa0 [smc] ? smc_clc_send_accept+0x45/0xa0 [smc] smc_listen_work+0x783/0x1220 [smc] ? finish_task_switch+0xc4/0x2e0 ? process_one_work+0x1ad/0x3c0 process_one_work+0x1ad/0x3c0 worker_thread+0x4c/0x390 ? rescuer_thread+0x320/0x320 kthread+0x149/0x190 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x1f/0x30 </TASK> smc_listen_work() abnormal case like port error --------------------------------------------------------------- | __smc_lgr_terminate() | |- smc_conn_kill() | |- smc_lgr_unregister_conn() | |- set conn->lgr = NULL smc_clc_wait_msg() | |- access conn->lgr (panic) | 2) Race between smc_setsockopt() and __smc_lgr_terminate() BUG: kernel NULL pointer dereference, address: 00000000000002e8 RIP: 0010:smc_setsockopt+0x17a/0x280 [smc] Call Trace: <TASK> __sys_setsockopt+0xfc/0x190 __x64_sys_setsockopt+0x20/0x30 do_syscall_64+0x34/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae </TASK> smc_setsockopt() abnormal case like port error -------------------------------------------------------------- | __smc_lgr_terminate() | |- smc_conn_kill() | |- smc_lgr_unregister_conn() | |- set conn->lgr = NULL mod_delayed_work() | |- access conn->lgr (panic) | There are some other panic points and they are caused by the simmilar reason as described above, which is accessing link group after termination, thus getting a NULL pointer or invalid resource. Currently, there seems to be no synchronization between the link group access and a sudden termination of it. This patch tries to fix this by introducing reference count of link group and not freeing link group until reference count is zero. Link group might be referred to by link or smc connection. So the operation to the link group reference count can be concluded as follows: object [hold or initialized as 1] [put] -------------------------------------------------------------------- link group smc_lgr_create() smc_lgr_free() connections smc_lgr_register_conn() smc_conn_free() links smcr_link_init() smcr_link_clear() Througth this way, we extend the life cycle of link group and ensure it is longer than the life cycle of connections and links above it, so that avoid invalid access to link group after its termination. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> --- net/smc/smc.h | 1 + net/smc/smc_core.c | 45 ++++++++++++++++++++++++++++++++++++++++----- net/smc/smc_core.h | 3 +++ 3 files changed, 44 insertions(+), 5 deletions(-) diff --git a/net/smc/smc.h b/net/smc/smc.h index 1a4fc1c..3d0b8e3 100644 --- a/net/smc/smc.h +++ b/net/smc/smc.h @@ -221,6 +221,7 @@ struct smc_connection { */ u64 peer_token; /* SMC-D token of peer */ u8 killed : 1; /* abnormal termination */ + u8 freed : 1; /* normal termiation */ u8 out_of_sync : 1; /* out of sync with peer */ }; diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c index 1f40b8e..d72eb13 100644 --- a/net/smc/smc_core.c +++ b/net/smc/smc_core.c @@ -184,6 +184,7 @@ static int smc_lgr_register_conn(struct smc_connection *conn, bool first) conn->alert_token_local = 0; } smc_lgr_add_alert_token(conn); + smc_lgr_hold(conn->lgr); /* lgr_put in smc_conn_free() */ conn->lgr->conns_num++; return 0; } @@ -216,7 +217,6 @@ static void smc_lgr_unregister_conn(struct smc_connection *conn) __smc_lgr_unregister_conn(conn); } write_unlock_bh(&lgr->conns_lock); - conn->lgr = NULL; } int smc_nl_get_sys_info(struct sk_buff *skb, struct netlink_callback *cb) @@ -749,6 +749,7 @@ int smcr_link_init(struct smc_link_group *lgr, struct smc_link *lnk, lnk->path_mtu = lnk->smcibdev->pattr[lnk->ibport - 1].active_mtu; lnk->link_id = smcr_next_link_id(lgr); lnk->lgr = lgr; + smc_lgr_hold(lgr); /* lgr_put in smcr_link_clear() */ lnk->link_idx = link_idx; smc_ibdev_cnt_inc(lnk); smcr_copy_dev_info_to_link(lnk); @@ -841,6 +842,7 @@ static int smc_lgr_create(struct smc_sock *smc, struct smc_init_info *ini) lgr->terminating = 0; lgr->freeing = 0; lgr->vlan_id = ini->vlan_id; + refcount_set(&lgr->refcnt, 1); /* set lgr refcnt to 1 */ mutex_init(&lgr->sndbufs_lock); mutex_init(&lgr->rmbs_lock); rwlock_init(&lgr->conns_lock); @@ -1120,8 +1122,20 @@ void smc_conn_free(struct smc_connection *conn) { struct smc_link_group *lgr = conn->lgr; - if (!lgr) + if (!lgr || conn->freed) + /* smc connection wasn't registered to a link group + * or has already been freed before. + * + * Judge these to ensure that lgr refcnt will be put + * only once if connection has been registered to a + * link group successfully. + */ return; + + conn->freed = 1; + if (conn->killed) + goto lgr_put; + if (lgr->is_smcd) { if (!list_empty(&lgr->list)) smc_ism_unset_conn(conn); @@ -1138,6 +1152,8 @@ void smc_conn_free(struct smc_connection *conn) if (!lgr->conns_num) smc_lgr_schedule_free_work(lgr); +lgr_put: + smc_lgr_put(lgr); /* lgr_hold in smc_lgr_register_conn() */ } /* unregister a link from a buf_desc */ @@ -1209,6 +1225,7 @@ void smcr_link_clear(struct smc_link *lnk, bool log) smc_ib_destroy_queue_pair(lnk); smc_ib_dealloc_protection_domain(lnk); smc_wr_free_link_mem(lnk); + smc_lgr_put(lnk->lgr); /* lgr_hold in smcr_link_init() */ smc_ibdev_cnt_dec(lnk); put_device(&lnk->smcibdev->ibdev->dev); smcibdev = lnk->smcibdev; @@ -1283,6 +1300,15 @@ static void smc_lgr_free_bufs(struct smc_link_group *lgr) __smc_lgr_free_bufs(lgr, true); } +/* won't be freed until no one accesses to lgr anymore */ +static void __smc_lgr_free(struct smc_link_group *lgr) +{ + smc_lgr_free_bufs(lgr); + if (!lgr->is_smcd) + smc_wr_free_lgr_mem(lgr); + kfree(lgr); +} + /* remove a link group */ static void smc_lgr_free(struct smc_link_group *lgr) { @@ -1298,7 +1324,6 @@ static void smc_lgr_free(struct smc_link_group *lgr) smc_llc_lgr_clear(lgr); } - smc_lgr_free_bufs(lgr); destroy_workqueue(lgr->tx_wq); if (lgr->is_smcd) { smc_ism_put_vlan(lgr->smcd, lgr->vlan_id); @@ -1306,11 +1331,21 @@ static void smc_lgr_free(struct smc_link_group *lgr) if (!atomic_dec_return(&lgr->smcd->lgr_cnt)) wake_up(&lgr->smcd->lgrs_deleted); } else { - smc_wr_free_lgr_mem(lgr); if (!atomic_dec_return(&lgr_cnt)) wake_up(&lgrs_deleted); } - kfree(lgr); + smc_lgr_put(lgr); /* theoretically last lgr_put */ +} + +void smc_lgr_hold(struct smc_link_group *lgr) +{ + refcount_inc(&lgr->refcnt); +} + +void smc_lgr_put(struct smc_link_group *lgr) +{ + if (refcount_dec_and_test(&lgr->refcnt)) + __smc_lgr_free(lgr); } static void smc_sk_wake_ups(struct smc_sock *smc) diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h index d63b082..51203b1 100644 --- a/net/smc/smc_core.h +++ b/net/smc/smc_core.h @@ -249,6 +249,7 @@ struct smc_link_group { u8 terminating : 1;/* lgr is terminating */ u8 freeing : 1; /* lgr is being freed */ + refcount_t refcnt; /* lgr reference count */ bool is_smcd; /* SMC-R or SMC-D */ u8 smc_version; u8 negotiated_eid[SMC_MAX_EID_LEN]; @@ -470,6 +471,8 @@ static inline void smc_set_pci_values(struct pci_dev *pci_dev, void smc_lgr_cleanup_early(struct smc_connection *conn); void smc_lgr_terminate_sched(struct smc_link_group *lgr); +void smc_lgr_hold(struct smc_link_group *lgr); +void smc_lgr_put(struct smc_link_group *lgr); void smcr_port_add(struct smc_ib_device *smcibdev, u8 ibport); void smcr_port_err(struct smc_ib_device *smcibdev, u8 ibport); void smc_smcd_terminate(struct smcd_dev *dev, u64 peer_gid, -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [RFC PATCH net v2 1/2] net/smc: Resolve the race between link group access and termination 2021-12-28 15:13 ` [RFC PATCH net v2 1/2] net/smc: Resolve the race between link group access and termination Wen Gu @ 2021-12-29 12:56 ` Karsten Graul 2021-12-31 9:44 ` Wen Gu 0 siblings, 1 reply; 15+ messages in thread From: Karsten Graul @ 2021-12-29 12:56 UTC (permalink / raw) To: Wen Gu, davem, kuba; +Cc: linux-s390, netdev, linux-kernel, dust.li, tonylu On 28/12/2021 16:13, Wen Gu wrote: > We encountered some crashes caused by the race between the access > and the termination of link groups. While I agree with the problems you found I am not sure if the solution is the right one. At the moment conn->lgr is checked all over the code as indication if a connection still has a valid link group. When you change this semantic by leaving conn->lgr set after the connection was unregistered from its link group then I expect various new problems to happen. For me the right solution would be to use correct locking before conn->lgr is checked and used. In smc_lgr_unregister_conn() the lgr->conns_lock is used when conn->lgr is unset (note that it is better to have that "conn->lgr = NULL;" line INSIDE the lock in this function). And on any places in the code where conn->lgr is used you get the read_lock while lgr is accessed. This could solve the problem, using existing mechanisms, right? Opinions? ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC PATCH net v2 1/2] net/smc: Resolve the race between link group access and termination 2021-12-29 12:56 ` Karsten Graul @ 2021-12-31 9:44 ` Wen Gu 2022-01-03 10:36 ` Karsten Graul 0 siblings, 1 reply; 15+ messages in thread From: Wen Gu @ 2021-12-31 9:44 UTC (permalink / raw) To: Karsten Graul, davem, kuba Cc: linux-s390, netdev, linux-kernel, dust.li, tonylu Hi Karsten, Thanks for your suggestions. Wish you and your family a happy New Year! On 2021/12/29 8:56 pm, Karsten Graul wrote: > On 28/12/2021 16:13, Wen Gu wrote: >> We encountered some crashes caused by the race between the access >> and the termination of link groups. > > While I agree with the problems you found I am not sure if the solution is the right one. > At the moment conn->lgr is checked all over the code as indication if a connection > still has a valid link group. When you change this semantic by leaving conn->lgr set > after the connection was unregistered from its link group then I expect various new problems > to happen. Actually we also thought about this semantic mismatch problem. But we haven't encountered any problems caused by leaving conn->lgr set in our tests. After careful consideration, we chose to use this patch as a trade off against the more serious problems -- the crashes in mutliple places caused by abnormal termination. If any specific problems caused by leaving conn->lgr set can be expected, please inform us. Thanks. > > For me the right solution would be to use correct locking before conn->lgr is checked and used. > In my humble opinion, the key point is not avoiding access to a NULL pointer (conn->lgr) by checking it before, but avoiding access link group after it is freed, which becomes a piece of dirty memory. This patch focuses on how to ensure the safe access to link group, which is from conn->lgr or link->lgr. > In smc_lgr_unregister_conn() the lgr->conns_lock is used when conn->lgr is unset (note that > it is better to have that "conn->lgr = NULL;" line INSIDE the lock in this function). > I think lgr->conns_lock is used to make the read and modify to lgr->conns_all mutually exclusive. As mentioned above, we are aimed to avoid access link group after it is freed. It might be inappropriate to avoid access to a freed lgr by lgr->conns_lock. > And on any places in the code where conn->lgr is used you get the read_lock while lgr is accessed. > This could solve the problem, using existing mechanisms, right? Opinions? We also considered to protect the access to link group by locking at the beginning as you suggested, like RCU. But we found some imperfections of this way. 1) It is hard to cover all the race. link group is referred to all over the code and link group termination may be triggered at any time. So it is hard to find all the exact potential race code and protected each one respectively by locking. The discover of race code will rely heavily on testing and we may have to continuously add patches to fix each new race we find. 2) It is hard to hold lock during all the link group access. Only checking conn->lgr before accessing to link group is not safe even with correct locking. Even though conn->lgr is checked, link group may have been freed during the following access. access termination ----------------------------------------------------------- if (conn->lgr) | | kfree(lgr) access to lgr (undesired) | To ensure link group access safe, we need to hold the lock before every link group access and not put until link group access finishes, it will cover too much codes and we need to pay attention to the behavior in the lock-holding section. So we chose to use reference count and consider this issue from a life cycle perspective. Introducing reference count can overcome the imperfections mentioned above by: 1) Prolonging the life cycle of link group. Instead of finding all the race, the main idea of the patch is to prolong the life cycle of link group, making it longer than the access cycle of connections and links over the link group. So even link group is being terminated, the free of link group is later than all the access to it. We think the access cycle to link group of connections is from smc_lgr_register_conn() to smc_conn_free() and the access cycle to link group of links is from smcr_link_init() to smcr_link_clear(). 2) Introducing reference count. Instead of using lock, we use reference count to ensure link group is freed only when no one refers to it. No need to find every place which needs holding lock and pay attention to the behavior in lock-holding sections. What do you think about it? Thanks, Wen Gu ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC PATCH net v2 1/2] net/smc: Resolve the race between link group access and termination 2021-12-31 9:44 ` Wen Gu @ 2022-01-03 10:36 ` Karsten Graul 2022-01-05 8:27 ` Wen Gu 0 siblings, 1 reply; 15+ messages in thread From: Karsten Graul @ 2022-01-03 10:36 UTC (permalink / raw) To: Wen Gu, davem, kuba; +Cc: linux-s390, netdev, linux-kernel, dust.li, tonylu On 31/12/2021 10:44, Wen Gu wrote: > On 2021/12/29 8:56 pm, Karsten Graul wrote: >> On 28/12/2021 16:13, Wen Gu wrote: >>> We encountered some crashes caused by the race between the access >>> and the termination of link groups. > What do you think about it? > Hi Wen, thank you, and I also wish you and your family a happy New Year! Thanks for your detailed explanation, you convinced me of your idea to use a reference counting! I think its a good solution for the various problems you describe. I am still thinking that even if you saw no problems when conn->lgr is not NULL when the lgr is already terminated there should be more attention on the places where conn->lgr is checked. For example, in smc_cdc_get_slot_and_msg_send() there is a check for !conn->lgr with the intention to avoid working with a terminated link group. Should all checks for !conn->lgr be now replaced by the check for conn->freed ?? Does this make sense? ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC PATCH net v2 1/2] net/smc: Resolve the race between link group access and termination 2022-01-03 10:36 ` Karsten Graul @ 2022-01-05 8:27 ` Wen Gu 2022-01-05 12:03 ` Karsten Graul 0 siblings, 1 reply; 15+ messages in thread From: Wen Gu @ 2022-01-05 8:27 UTC (permalink / raw) To: Karsten Graul, davem, kuba Cc: linux-s390, netdev, linux-kernel, dust.li, tonylu Thanks for your reply. On 2022/1/3 6:36 pm, Karsten Graul wrote: > On 31/12/2021 10:44, Wen Gu wrote: >> On 2021/12/29 8:56 pm, Karsten Graul wrote: >>> On 28/12/2021 16:13, Wen Gu wrote: >>>> We encountered some crashes caused by the race between the access >>>> and the termination of link groups. >> What do you think about it? >> > > Hi Wen, > > thank you, and I also wish you and your family a happy New Year! > > Thanks for your detailed explanation, you convinced me of your idea to use > a reference counting! I think its a good solution for the various problems you describe. > > I am still thinking that even if you saw no problems when conn->lgr is not NULL when the lgr > is already terminated there should be more attention on the places where conn->lgr is checked. Thank you for reminding. I agree with the concern. It should be improved to avoid the potential issue we haven't found. > For example, in smc_cdc_get_slot_and_msg_send() there is a check for !conn->lgr with the intention > to avoid working with a terminated link group. > Should all checks for !conn->lgr be now replaced by the check for conn->freed ?? Does this make sense? In my humble opinion, we can replace !conn->lgr with !conn->alert_token_local. If a smc connection is registered to a link group successfully by smc_lgr_register_conn(), conn->alert_token_local is set to non-zero. At this moment, the conn->lgr is ready to be used. And if the link group is terminated, conn->alert_token_local is reset to zero in smc_lgr_unregister_conn(), meaning that the link group registered to connection shouldn't be used anymore. So I think checking conn->alert_token_local has the same effect with checking conn->lgr to identify whether the link group pointed by conn->lgr is still healthy and able to be used. What do you think about it? :) Thanks, Wen Gu ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC PATCH net v2 1/2] net/smc: Resolve the race between link group access and termination 2022-01-05 8:27 ` Wen Gu @ 2022-01-05 12:03 ` Karsten Graul 2022-01-06 13:02 ` Wen Gu 0 siblings, 1 reply; 15+ messages in thread From: Karsten Graul @ 2022-01-05 12:03 UTC (permalink / raw) To: Wen Gu, davem, kuba; +Cc: linux-s390, netdev, linux-kernel, dust.li, tonylu On 05/01/2022 09:27, Wen Gu wrote: > On 2022/1/3 6:36 pm, Karsten Graul wrote: >> On 31/12/2021 10:44, Wen Gu wrote: >>> On 2021/12/29 8:56 pm, Karsten Graul wrote: >>>> On 28/12/2021 16:13, Wen Gu wrote: >>>>> We encountered some crashes caused by the race between the access >>>>> and the termination of link groups. > So I think checking conn->alert_token_local has the same effect with checking conn->lgr to > identify whether the link group pointed by conn->lgr is still healthy and able to be used. Yeah that sounds like a good solution for that! So is it now guaranteed that conn->lgr is always set and this check can really be removed completely, or should there be a new helper that checks conn->lgr and the alert_token, like smc_lgr_valid() ? ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC PATCH net v2 1/2] net/smc: Resolve the race between link group access and termination 2022-01-05 12:03 ` Karsten Graul @ 2022-01-06 13:02 ` Wen Gu 2022-01-07 9:54 ` Karsten Graul 0 siblings, 1 reply; 15+ messages in thread From: Wen Gu @ 2022-01-06 13:02 UTC (permalink / raw) To: Karsten Graul, davem, kuba Cc: linux-s390, netdev, linux-kernel, dust.li, tonylu Thanks for your reply. On 2022/1/5 8:03 pm, Karsten Graul wrote: > On 05/01/2022 09:27, Wen Gu wrote: >> On 2022/1/3 6:36 pm, Karsten Graul wrote: >>> On 31/12/2021 10:44, Wen Gu wrote: >>>> On 2021/12/29 8:56 pm, Karsten Graul wrote: >>>>> On 28/12/2021 16:13, Wen Gu wrote: >>>>>> We encountered some crashes caused by the race between the access >>>>>> and the termination of link groups. >> So I think checking conn->alert_token_local has the same effect with checking conn->lgr to >> identify whether the link group pointed by conn->lgr is still healthy and able to be used. > > Yeah that sounds like a good solution for that! So is it now guaranteed that conn->lgr is always > set and this check can really be removed completely, or should there be a new helper that checks > conn->lgr and the alert_token, like smc_lgr_valid() ? In my humble opinion, the link group pointed by conn->lgr might have the following three stages if we remove 'conn->lgr = NULL' from smc_lgr_unregister_conn(). 1. conn->lgr = NULL and conn->alert_token_local is zero This means that the connection has never been registered in a link group. conn->lgr is clearly unable to use. 2. conn->lgr != NULL and conn->alert_token_local is non-zero This means that the connection has been registered in a link group, and conn->lgr is valid to access. 3. conn->lgr != NULL but conn->alert_token_local is zero This means that the connection was registered in a link group before, but is unregistered from it now. conn->lgr shouldn't be used anymore. So I am trying this way: 1) Introduce a new helper smc_conn_lgr_state() to check the three stages mentioned above. enum smc_conn_lgr_state { SMC_CONN_LGR_ORPHAN, /* conn was never registered in a link group */ SMC_CONN_LGR_VALID, /* conn is registered in a link group now */ SMC_CONN_LGR_INVALID, /* conn was registered in a link group, but now is unregistered from it and conn->lgr should not be used any more */ }; 2) replace the current conn->lgr check with the new helper. These new changes are under testing now. What do you think about it? :) Thanks, Wen Gu ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC PATCH net v2 1/2] net/smc: Resolve the race between link group access and termination 2022-01-06 13:02 ` Wen Gu @ 2022-01-07 9:54 ` Karsten Graul 2022-01-07 12:04 ` Wen Gu 0 siblings, 1 reply; 15+ messages in thread From: Karsten Graul @ 2022-01-07 9:54 UTC (permalink / raw) To: Wen Gu, davem, kuba; +Cc: linux-s390, netdev, linux-kernel, dust.li, tonylu On 06/01/2022 14:02, Wen Gu wrote: > Thanks for your reply. > > On 2022/1/5 8:03 pm, Karsten Graul wrote: >> On 05/01/2022 09:27, Wen Gu wrote: >>> On 2022/1/3 6:36 pm, Karsten Graul wrote: >>>> On 31/12/2021 10:44, Wen Gu wrote: >>>>> On 2021/12/29 8:56 pm, Karsten Graul wrote: >>>>>> On 28/12/2021 16:13, Wen Gu wrote: >>>>>>> We encountered some crashes caused by the race between the access >>>>>>> and the termination of link groups. > So I am trying this way: > > 1) Introduce a new helper smc_conn_lgr_state() to check the three stages mentioned above. > > enum smc_conn_lgr_state { > SMC_CONN_LGR_ORPHAN, /* conn was never registered in a link group */ > SMC_CONN_LGR_VALID, /* conn is registered in a link group now */ > SMC_CONN_LGR_INVALID, /* conn was registered in a link group, but now > is unregistered from it and conn->lgr should > not be used any more */ > }; > > 2) replace the current conn->lgr check with the new helper. > > These new changes are under testing now. > > What do you think about it? :) Sounds good, but is it really needed to separate 3 cases, i.e. who is really using them 3? Doesn't it come down to a more simple smc_conn_lgr_valid() which is easier to implement in the various places in the code? (i.e.: if (smc_conn_lgr_valid()) ....) Valid would mean conn->lgr != NULL and conn->alert_token_local != 0. The more special cases would check what they want by there own. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC PATCH net v2 1/2] net/smc: Resolve the race between link group access and termination 2022-01-07 9:54 ` Karsten Graul @ 2022-01-07 12:04 ` Wen Gu 0 siblings, 0 replies; 15+ messages in thread From: Wen Gu @ 2022-01-07 12:04 UTC (permalink / raw) To: Karsten Graul, davem, kuba Cc: linux-s390, netdev, linux-kernel, dust.li, tonylu Thanks for your reply. On 2022/1/7 5:54 pm, Karsten Graul wrote: > On 06/01/2022 14:02, Wen Gu wrote: >> Thanks for your reply. >> >> On 2022/1/5 8:03 pm, Karsten Graul wrote: >>> On 05/01/2022 09:27, Wen Gu wrote: >>>> On 2022/1/3 6:36 pm, Karsten Graul wrote: >>>>> On 31/12/2021 10:44, Wen Gu wrote: >>>>>> On 2021/12/29 8:56 pm, Karsten Graul wrote: >>>>>>> On 28/12/2021 16:13, Wen Gu wrote: >>>>>>>> We encountered some crashes caused by the race between the access >>>>>>>> and the termination of link groups. >> So I am trying this way: >> >> 1) Introduce a new helper smc_conn_lgr_state() to check the three stages mentioned above. >> >> enum smc_conn_lgr_state { >> SMC_CONN_LGR_ORPHAN, /* conn was never registered in a link group */ >> SMC_CONN_LGR_VALID, /* conn is registered in a link group now */ >> SMC_CONN_LGR_INVALID, /* conn was registered in a link group, but now >> is unregistered from it and conn->lgr should >> not be used any more */ >> }; > > Sounds good, but is it really needed to separate 3 cases, i.e. who is really using them 3? > Doesn't it come down to a more simple smc_conn_lgr_valid() which is easier to implement in > the various places in the code? (i.e.: if (smc_conn_lgr_valid()) ....) > Valid would mean conn->lgr != NULL and conn->alert_token_local != 0. The more special cases > would check what they want by there own. Yes, Most of the time we only need to check whether conn->lgr is in SMC_CONN_LGR_VALID. Only in smc_conn_free() we need to identify whether conn->lgr is in SMC_CONN_LGR_ORPHAN (need a directly return) or SMC_CONN_LGR_INVALID (put link group refcnt and then return). So I agree with only checking whether conn->lgr is valid with a more simple smc_conn_lgr_valid(). And distinguish SMC_CONN_LGR_ORPHAN and SMC_CONN_LGR_INVALID cases by additional check for conn->lgr. Thanks, Wen Gu ^ permalink raw reply [flat|nested] 15+ messages in thread
* [RFC PATCH net v2 2/2] net/smc: Resolve the race between SMC-R link access and clear 2021-12-28 15:13 [RFC PATCH net v2 0/2] net/smc: Fix for race in smc link group termination Wen Gu 2021-12-28 15:13 ` [RFC PATCH net v2 1/2] net/smc: Resolve the race between link group access and termination Wen Gu @ 2021-12-28 15:13 ` Wen Gu 2021-12-29 12:51 ` Karsten Graul 1 sibling, 1 reply; 15+ messages in thread From: Wen Gu @ 2021-12-28 15:13 UTC (permalink / raw) To: kgraul, davem, kuba; +Cc: linux-s390, netdev, linux-kernel, dust.li, tonylu We encountered some crashes caused by the race between SMC-R link access and link clear triggered by link group termination in abnormal case, like port error. Here are some of panic stacks we met: 1) Race between smc_llc_flow_initiate() and smcr_link_clear() BUG: kernel NULL pointer dereference, address: 0000000000000000 Workqueue: smc_hs_wq smc_listen_work [smc] RIP: 0010:smc_llc_flow_initiate+0x44/0x190 [smc] Call Trace: <TASK> ? __smc_buf_create+0x75a/0x950 [smc] smcr_lgr_reg_rmbs+0x2a/0xbf [smc] smc_listen_work+0xf72/0x1230 [smc] ? process_one_work+0x25c/0x600 process_one_work+0x25c/0x600 worker_thread+0x4f/0x3a0 ? process_one_work+0x600/0x600 kthread+0x15d/0x1a0 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x1f/0x30 </TASK> smc_listen_work() __smc_lgr_terminate() --------------------------------------------------------------- | smc_lgr_free() | |- smcr_link_clear() | |- memset(lnk, 0) smc_listen_rdma_reg() | |- smcr_lgr_reg_rmbs() | |- smc_llc_flow_initiate() | |- access lnk->lgr (panic) | 2) Race between smc_wr_tx_dismiss_slots() and smcr_link_clear() BUG: kernel NULL pointer dereference, address: 0000000000000000 RIP: 0010:_find_first_bit+0x8/0x50 Call Trace: <TASK> smc_wr_tx_dismiss_slots+0x34/0xc0 [smc] ? smc_cdc_tx_filter+0x10/0x10 [smc] smc_conn_free+0xd8/0x100 [smc] __smc_release+0xf1/0x140 [smc] smc_release+0x89/0x1b0 [smc] __sock_release+0x37/0xb0 sock_close+0x14/0x20 __fput+0xa9/0x260 task_work_run+0x6b/0xb0 do_exit+0x3ef/0xd40 do_group_exit+0x47/0xb0 __x64_sys_exit_group+0x14/0x20 do_syscall_64+0x34/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae </TASK> smc_conn_free() __smc_lgr_terminate() ---------------------------------------------------------------- | smc_lgr_free() | |- smcr_link_clear() | |- smc_wr_free_link_mem() | |- lnk->wr_tx_mask = NULL; smc_wr_tx_dismiss_slots() | |- for_each_set_bit(link->wr_tx_mask) | |- (panic) | These crashes are caused by clearing SMC-R link resources when someone is still accessing to them. So this patch tries to fix it by introducing reference count of SMC-R links and ensuring that the sensitive resources of links are not cleared until reference count is zero. The operation to the SMC-R link reference count can be concluded as follows: object [hold or initialized as 1] [put] -------------------------------------------------------------------- links smcr_link_init() smcr_link_clear() connections smcr_lgr_conn_assign_link() smc_conn_free() Through this way, the clear of SMC-R links is later than the free of all the smc connections above it, thus avoiding the unsafe reference to SMC-R links. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> --- net/smc/smc_core.c | 43 +++++++++++++++++++++++++++++++++++-------- net/smc/smc_core.h | 4 ++++ 2 files changed, 39 insertions(+), 8 deletions(-) diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c index d72eb13..a64d394 100644 --- a/net/smc/smc_core.c +++ b/net/smc/smc_core.c @@ -155,6 +155,7 @@ static int smcr_lgr_conn_assign_link(struct smc_connection *conn, bool first) if (!conn->lnk) return SMC_CLC_DECL_NOACTLINK; atomic_inc(&conn->lnk->conn_cnt); + smcr_link_hold(conn->lnk); /* link_put in smc_conn_free() */ return 0; } @@ -746,6 +747,8 @@ int smcr_link_init(struct smc_link_group *lgr, struct smc_link *lnk, } get_device(&lnk->smcibdev->ibdev->dev); atomic_inc(&lnk->smcibdev->lnk_cnt); + refcount_set(&lnk->refcnt, 1); /* link refcnt is set to 1 */ + lnk->clearing = 0; lnk->path_mtu = lnk->smcibdev->pattr[lnk->ibport - 1].active_mtu; lnk->link_id = smcr_next_link_id(lgr); lnk->lgr = lgr; @@ -994,8 +997,12 @@ void smc_switch_link_and_count(struct smc_connection *conn, struct smc_link *to_lnk) { atomic_dec(&conn->lnk->conn_cnt); + /* put old link, hold in smcr_lgr_conn_assign_link() */ + smcr_link_put(conn->lnk); conn->lnk = to_lnk; atomic_inc(&conn->lnk->conn_cnt); + /* hold new link, put in smc_conn_free() */ + smcr_link_hold(conn->lnk); } struct smc_link *smc_switch_conns(struct smc_link_group *lgr, @@ -1126,9 +1133,9 @@ void smc_conn_free(struct smc_connection *conn) /* smc connection wasn't registered to a link group * or has already been freed before. * - * Judge these to ensure that lgr refcnt will be put - * only once if connection has been registered to a - * link group successfully. + * Judge these to ensure that lgr/link refcnt will be + * put only once if connection has been registered to + * a link group successfully. */ return; @@ -1153,6 +1160,8 @@ void smc_conn_free(struct smc_connection *conn) if (!lgr->conns_num) smc_lgr_schedule_free_work(lgr); lgr_put: + if (!lgr->is_smcd) + smcr_link_put(conn->lnk); /* link_hold in smcr_lgr_conn_assign_link() */ smc_lgr_put(lgr); /* lgr_hold in smc_lgr_register_conn() */ } @@ -1209,13 +1218,23 @@ static void smcr_rtoken_clear_link(struct smc_link *lnk) } } +static void __smcr_link_clear(struct smc_link *lnk) +{ + smc_wr_free_link_mem(lnk); + smc_lgr_put(lnk->lgr); /* lgr_hold in smcr_link_init() */ + memset(lnk, 0, sizeof(struct smc_link)); + lnk->state = SMC_LNK_UNUSED; +} + /* must be called under lgr->llc_conf_mutex lock */ void smcr_link_clear(struct smc_link *lnk, bool log) { struct smc_ib_device *smcibdev; - if (!lnk->lgr || lnk->state == SMC_LNK_UNUSED) + if (lnk->clearing || !lnk->lgr || + lnk->state == SMC_LNK_UNUSED) return; + lnk->clearing = 1; lnk->peer_qpn = 0; smc_llc_link_clear(lnk, log); smcr_buf_unmap_lgr(lnk); @@ -1224,15 +1243,23 @@ void smcr_link_clear(struct smc_link *lnk, bool log) smc_wr_free_link(lnk); smc_ib_destroy_queue_pair(lnk); smc_ib_dealloc_protection_domain(lnk); - smc_wr_free_link_mem(lnk); - smc_lgr_put(lnk->lgr); /* lgr_hold in smcr_link_init() */ smc_ibdev_cnt_dec(lnk); put_device(&lnk->smcibdev->ibdev->dev); smcibdev = lnk->smcibdev; - memset(lnk, 0, sizeof(struct smc_link)); - lnk->state = SMC_LNK_UNUSED; if (!atomic_dec_return(&smcibdev->lnk_cnt)) wake_up(&smcibdev->lnks_deleted); + smcr_link_put(lnk); /* theoretically last link_put */ +} + +void smcr_link_hold(struct smc_link *lnk) +{ + refcount_inc(&lnk->refcnt); +} + +void smcr_link_put(struct smc_link *lnk) +{ + if (refcount_dec_and_test(&lnk->refcnt)) + __smcr_link_clear(lnk); } static void smcr_buf_free(struct smc_link_group *lgr, bool is_rmb, diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h index 51203b1..e73217f 100644 --- a/net/smc/smc_core.h +++ b/net/smc/smc_core.h @@ -137,6 +137,8 @@ struct smc_link { u8 peer_link_uid[SMC_LGR_ID_SIZE]; /* peer uid */ u8 link_idx; /* index in lgr link array */ u8 link_is_asym; /* is link asymmetric? */ + u8 clearing : 1; /* link is being cleared */ + refcount_t refcnt; /* link reference count */ struct smc_link_group *lgr; /* parent link group */ struct work_struct link_down_wrk; /* wrk to bring link down */ char ibname[IB_DEVICE_NAME_MAX]; /* ib device name */ @@ -504,6 +506,8 @@ void smc_rtoken_set2(struct smc_link_group *lgr, int rtok_idx, int link_id, int smcr_link_init(struct smc_link_group *lgr, struct smc_link *lnk, u8 link_idx, struct smc_init_info *ini); void smcr_link_clear(struct smc_link *lnk, bool log); +void smcr_link_hold(struct smc_link *lnk); +void smcr_link_put(struct smc_link *lnk); void smc_switch_link_and_count(struct smc_connection *conn, struct smc_link *to_lnk); int smcr_buf_map_lgr(struct smc_link *lnk); -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [RFC PATCH net v2 2/2] net/smc: Resolve the race between SMC-R link access and clear 2021-12-28 15:13 ` [RFC PATCH net v2 2/2] net/smc: Resolve the race between SMC-R link access and clear Wen Gu @ 2021-12-29 12:51 ` Karsten Graul 2021-12-30 4:00 ` dust.li 2021-12-31 9:45 ` Wen Gu 0 siblings, 2 replies; 15+ messages in thread From: Karsten Graul @ 2021-12-29 12:51 UTC (permalink / raw) To: Wen Gu, davem, kuba; +Cc: linux-s390, netdev, linux-kernel, dust.li, tonylu On 28/12/2021 16:13, Wen Gu wrote: > We encountered some crashes caused by the race between SMC-R > link access and link clear triggered by link group termination > in abnormal case, like port error. Without to dig deeper into this, there is already a refcount for links, see smc_wr_tx_link_hold(). In smc_wr_free_link() there are waits for the refcounts to become zero. Why do you need to introduce another refcounting instead of using the existing? And if you have a good reason, do we still need the existing refcounting with your new implementation? Maybe its enough to use the existing refcounting in the other functions like smc_llc_flow_initiate()? Btw: it is interesting what kind of crashes you see, we never met them in our setup. Its great to see you evaluating SMC in a cloud environment! ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC PATCH net v2 2/2] net/smc: Resolve the race between SMC-R link access and clear 2021-12-29 12:51 ` Karsten Graul @ 2021-12-30 4:00 ` dust.li 2021-12-31 9:45 ` Wen Gu 1 sibling, 0 replies; 15+ messages in thread From: dust.li @ 2021-12-30 4:00 UTC (permalink / raw) To: Karsten Graul, Wen Gu, davem, kuba Cc: linux-s390, netdev, linux-kernel, tonylu On Wed, Dec 29, 2021 at 01:51:27PM +0100, Karsten Graul wrote: >On 28/12/2021 16:13, Wen Gu wrote: >> We encountered some crashes caused by the race between SMC-R >> link access and link clear triggered by link group termination >> in abnormal case, like port error. > >Without to dig deeper into this, there is already a refcount for links, see smc_wr_tx_link_hold(). >In smc_wr_free_link() there are waits for the refcounts to become zero. > >Why do you need to introduce another refcounting instead of using the existing? >And if you have a good reason, do we still need the existing refcounting with your new >implementation? > >Maybe its enough to use the existing refcounting in the other functions like smc_llc_flow_initiate()? > >Btw: it is interesting what kind of crashes you see, we never met them in our setup. We are trying to using SMC + RDMA to boost application performance, we now have a product in the cloud called ERDMA which can be used in the virtual machine. We are testing SMC with link down/up with short flow cases since in the cloud environment the RDMA device may be plugged in/out frequently, and there are many different applications, some of them may have pretty much short flows. >Its great to see you evaluating SMC in a cloud environment! Thanks! We are trying to use SMC to boost performance for cloud applications, and we hope SMC can be more generic and widely used. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC PATCH net v2 2/2] net/smc: Resolve the race between SMC-R link access and clear 2021-12-29 12:51 ` Karsten Graul 2021-12-30 4:00 ` dust.li @ 2021-12-31 9:45 ` Wen Gu 2022-01-03 10:39 ` Karsten Graul 1 sibling, 1 reply; 15+ messages in thread From: Wen Gu @ 2021-12-31 9:45 UTC (permalink / raw) To: Karsten Graul, davem, kuba Cc: linux-s390, netdev, linux-kernel, dust.li, tonylu Thanks for your reply. On 2021/12/29 8:51 pm, Karsten Graul wrote: > On 28/12/2021 16:13, Wen Gu wrote: >> We encountered some crashes caused by the race between SMC-R >> link access and link clear triggered by link group termination >> in abnormal case, like port error. > > Without to dig deeper into this, there is already a refcount for links, see smc_wr_tx_link_hold(). > In smc_wr_free_link() there are waits for the refcounts to become zero. > Thanks for reminding. we also noticed link->wr_tx_refcnt when trying to fix this issue. In my humble opinions, link->wr_tx_refcnt is used for fixing the race between the waiters for a tx work request buffer (mainly LLC/CDC msgs) and the link down processing that finally clears the link. But the issue in this patch is about the race between the access to link by the connections above it (like in listen or connect processing) and the link clear processing that memset the link as zero and release the resource. So it seems that the two should not share the same reference count? > Why do you need to introduce another refcounting instead of using the existing? > And if you have a good reason, do we still need the existing refcounting with your new > implementation? > Yes, we still need it. In my humble opinion, link->wr_tx_refcnt can ensure that the CDC/LLC message sends won't wait for an already cleared link. And LLC messages may be triggered by underlying events like net device ports add/error. But this patch's implementation only ensures that the access to link by connections is safe and smc connections won't get something that already freed during its life cycle, like in listen/connect processing. It can't cover the link access by LLC messages, which may be triggered by underlying events. > Maybe its enough to use the existing refcounting in the other functions like smc_llc_flow_initiate()? > > Btw: it is interesting what kind of crashes you see, we never met them in our setup. This kind of crashes and the link group free crashes mentioned in the [1/2] patch can be reproduced by up/down net device frequently during the testing. > Its great to see you evaluating SMC in a cloud environment! Thanks! Hope that SMC will be widely used. It is an amazing protocal! Cheers, Wen Gu ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC PATCH net v2 2/2] net/smc: Resolve the race between SMC-R link access and clear 2021-12-31 9:45 ` Wen Gu @ 2022-01-03 10:39 ` Karsten Graul 0 siblings, 0 replies; 15+ messages in thread From: Karsten Graul @ 2022-01-03 10:39 UTC (permalink / raw) To: Wen Gu, davem, kuba; +Cc: linux-s390, netdev, linux-kernel, dust.li, tonylu On 31/12/2021 10:45, Wen Gu wrote: > Thanks for your reply. Thanks for the explanation and discussion. Please post this patch to the net tree. I sent some comments for patch 1 of this series already. ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2022-01-07 12:04 UTC | newest] Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-12-28 15:13 [RFC PATCH net v2 0/2] net/smc: Fix for race in smc link group termination Wen Gu 2021-12-28 15:13 ` [RFC PATCH net v2 1/2] net/smc: Resolve the race between link group access and termination Wen Gu 2021-12-29 12:56 ` Karsten Graul 2021-12-31 9:44 ` Wen Gu 2022-01-03 10:36 ` Karsten Graul 2022-01-05 8:27 ` Wen Gu 2022-01-05 12:03 ` Karsten Graul 2022-01-06 13:02 ` Wen Gu 2022-01-07 9:54 ` Karsten Graul 2022-01-07 12:04 ` Wen Gu 2021-12-28 15:13 ` [RFC PATCH net v2 2/2] net/smc: Resolve the race between SMC-R link access and clear Wen Gu 2021-12-29 12:51 ` Karsten Graul 2021-12-30 4:00 ` dust.li 2021-12-31 9:45 ` Wen Gu 2022-01-03 10:39 ` Karsten Graul
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).