* Re: a bug(BUG: kernel NULL pointer dereference) of ib or mlx happened in 5.4.21 but not in 5.4.20
       [not found] <20200220112231.34FB.409509F4@e16-tech.com>
@ 2020-02-20 13:57 ` Chuck Lever
  2020-02-20 14:05   ` Leon Romanovsky
  0 siblings, 1 reply; 5+ messages in thread
From: Chuck Lever @ 2020-02-20 13:57 UTC
  To: Wang Yugui; +Cc: linux-rdma

Hello!

Thanks for your bug report.

> On Feb 19, 2020, at 10:22 PM, Wang Yugui <wangyugui@e16-tech.com> wrote:
>
> Hi, chuck.lever
>
> a bug(BUG: kernel NULL pointer dereference) of ib or mlx happened in 5.4.21 but not in 5.4.20.
>
> maybe some relationship to xprtrdma-fix-dma-scatter-gather-list-mapping-imbalance.patch

I don't see an obvious connection to fix-dma-scatter-gather-list-mapping-imbalance.
The backtrace below is through IPoIB code paths. Those have nothing to do with
NFS/RDMA, which is the only ULP code that is changed by my commit.

> maybe the info is useful.

I'm copying linux-rdma for a bigger set of eyeballs.

My knee-jerk recommendation is that if you have a reliable reproducer, try "git bisect"
between .20 and .21 to nail down a specific commit where the BUG starts to occur.

> Feb 20 10:05:58 T630 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000010
> ...
> Feb 20 10:05:58 T630 kernel: port_pkey_list_insert+0x30/0x1a0 [ib_core]
> Feb 20 10:05:58 T630 kernel: ? kmem_cache_alloc_trace+0x219/0x230
> Feb 20 10:05:58 T630 kernel: ib_security_modify_qp+0x244/0x3b0 [ib_core]
> Feb 20 10:05:58 T630 kernel: _ib_modify_qp+0x1c0/0x3c0 [ib_core]
> Feb 20 10:05:58 T630 kernel: ? dma_pool_free+0x24/0xc0
> Feb 20 10:05:58 T630 kernel: ipoib_init_qp+0x77/0x190 [ib_ipoib]
> Feb 20 10:05:58 T630 kernel: ? __mlx4_ib_query_pkey+0xe7/0x110 [mlx4_ib]
> Feb 20 10:05:58 T630 kernel: ? ib_find_pkey+0x98/0xe0 [ib_core]
> Feb 20 10:05:58 T630 kernel: ipoib_ib_dev_open_default+0x1a/0x180 [ib_ipoib]
> Feb 20 10:05:58 T630 kernel: ipoib_ib_dev_open+0x66/0xa0 [ib_ipoib]
> Feb 20 10:05:58 T630 kernel: ipoib_open+0x44/0x110 [ib_ipoib]
> Feb 20 10:05:58 T630 kernel: __dev_open+0xcd/0x160
>
>
> # ibstat
> CA 'mlx4_0'
>         CA type: MT4099
>         Number of ports: 2
>         Firmware version: 2.42.5000
>         Hardware version: 1
>         Node GUID: 0xe41d2d03007b4080
>         System image GUID: 0xe41d2d03007b4083
>         Port 1:
>                 State: Down
>                 Physical state: Polling
>                 Rate: 10
>                 Base lid: 0
>                 LMC: 0
>                 SM lid: 0
>                 Capability mask: 0x02594868
>                 Port GUID: 0xe41d2d03007b4081
>                 Link layer: InfiniBand
>         Port 2:
>                 State: Down
>                 Physical state: Disabled
>                 Rate: 40
>                 Base lid: 0
>                 LMC: 0
>                 SM lid: 0
>                 Capability mask: 0x00010000
>                 Port GUID: 0xe61d2dfffe7b4082
>                 Link layer: Ethernet
>
> Best Regards
> 王玉贵
> 2020/02/20
>
> --------------------------------------
> 北京京垓科技有限公司
> 王玉贵 wangyugui@e16-tech.com
> Tel: +86-136-71123776
>
> <bug-of-ib-in-5.4.21.message>

--
Chuck Lever

* Re: a bug(BUG: kernel NULL pointer dereference) of ib or mlx happened in 5.4.21 but not in 5.4.20
  2020-02-20 13:57 ` a bug(BUG: kernel NULL pointer dereference) of ib or mlx happened in 5.4.21 but not in 5.4.20 Chuck Lever
@ 2020-02-20 14:05   ` Leon Romanovsky
  2020-02-20 16:26     ` Wang Yugui
  0 siblings, 1 reply; 5+ messages in thread
From: Leon Romanovsky @ 2020-02-20 14:05 UTC
  To: Chuck Lever; +Cc: Wang Yugui, linux-rdma

On Thu, Feb 20, 2020 at 08:57:29AM -0500, Chuck Lever wrote:
> Hello!
>
> Thanks for your bug report.
>
>
> > On Feb 19, 2020, at 10:22 PM, Wang Yugui <wangyugui@e16-tech.com> wrote:
> >
> > Hi, chuck.lever
> >
> > a bug(BUG: kernel NULL pointer dereference) of ib or mlx happened in 5.4.21 but not in 5.4.20.
> >
> > maybe some relationship to xprtrdma-fix-dma-scatter-gather-list-mapping-imbalance.patch
>
> I don't see an obvious connection to fix-dma-scatter-gather-list-mapping-imbalance.
> The backtrace below is through IPoIB code paths. Those have nothing to do with
> NFS/RDMA, which is the only ULP code that is changed by my commit.
>
>
> > maybe the info is useful.
>
> I'm copying linux-rdma for a bigger set of eyeballs.
>
> My knee-jerk recommendation is that if you have a reliable reproducer, try "git bisect"
> between .20 and .21 to nail down a specific commit where the BUG starts to occur.

No need to bisect, it is me who broke it.
The fix is already accepted, but not yet merged.
https://patchwork.kernel.org/patch/11387567/

Thanks

* Re: a bug(BUG: kernel NULL pointer dereference) of ib or mlx happened in 5.4.21 but not in 5.4.20
  2020-02-20 14:05   ` Leon Romanovsky
@ 2020-02-20 16:26     ` Wang Yugui
  2020-02-25 13:05       ` Maor Gottlieb
  0 siblings, 1 reply; 5+ messages in thread
From: Wang Yugui @ 2020-02-20 16:26 UTC
  To: Leon Romanovsky; +Cc: Chuck Lever, linux-rdma

Hi, Leon, Chuck

It is still broken even with the hotfix (https://patchwork.kernel.org/patch/11387567/) for 5.4.21.

The call stack is almost the same.

Feb 20 23:49:53 T630 kernel: Call Trace:
Feb 20 23:49:53 T630 kernel: port_pkey_list_insert+0x30/0x1a0 [ib_core]
Feb 20 23:49:53 T630 kernel: ? kmem_cache_alloc_trace+0x219/0x230
Feb 20 23:49:53 T630 kernel: ib_security_modify_qp+0x244/0x3b0 [ib_core]
Feb 20 23:49:53 T630 kernel: _ib_modify_qp+0x1c0/0x3c0 [ib_core]
Feb 20 23:49:53 T630 kernel: ? dma_pool_free+0x24/0xc0
Feb 20 23:49:53 T630 kernel: ipoib_init_qp+0x77/0x190 [ib_ipoib]
Feb 20 23:49:53 T630 kernel: ? __mlx4_ib_query_pkey+0xe7/0x110 [mlx4_ib]
Feb 20 23:49:53 T630 kernel: ? ib_find_pkey+0x98/0xe0 [ib_core]
Feb 20 23:49:53 T630 kernel: ipoib_ib_dev_open_default+0x1a/0x180 [ib_ipoib]
Feb 20 23:49:53 T630 kernel: ipoib_ib_dev_open+0x66/0xa0 [ib_ipoib]
Feb 20 23:49:53 T630 kernel: ipoib_open+0x44/0x110 [ib_ipoib]
Feb 20 23:49:53 T630 kernel: __dev_open+0xcd/0x160
Feb 20 23:49:53 T630 kernel: __dev_change_flags+0x1ad/0x220
Feb 20 23:49:53 T630 kernel: ? __dev_notify_flags+0x95/0xf0
Feb 20 23:49:53 T630 kernel: dev_change_flags+0x21/0x60
Feb 20 23:49:53 T630 kernel: do_setlink+0x320/0xf00
Feb 20 23:49:53 T630 kernel: ? __nla_validate_parse+0x51/0x840
Feb 20 23:49:53 T630 kernel: ? xas_load+0x8/0x80
Feb 20 23:49:53 T630 kernel: ? __update_load_avg_cfs_rq+0x1d5/0x2c0
Feb 20 23:49:53 T630 kernel: ? cpumask_next+0x17/0x20
Feb 20 23:49:53 T630 kernel: ? __snmp6_fill_stats64.isra.56+0x6b/0x110
Feb 20 23:49:53 T630 kernel: ? __nla_validate_parse+0x51/0x840
Feb 20 23:49:53 T630 kernel: __rtnl_newlink+0x53d/0x890
Feb 20 23:49:53 T630 kernel: ? __nla_reserve+0x38/0x50
Feb 20 23:49:53 T630 kernel: ? __nla_put+0xc/0x20
Feb 20 23:49:53 T630 kernel: ? __nla_reserve+0x38/0x50
Feb 20 23:49:53 T630 kernel: ? __nla_put+0xc/0x20
Feb 20 23:49:53 T630 kernel: ? nla_put+0x2f/0x40
Feb 20 23:49:53 T630 kernel: ? __nla_reserve+0x38/0x50
Feb 20 23:49:53 T630 kernel: ? __nla_put+0xc/0x20
Feb 20 23:49:53 T630 kernel: ? nla_put+0x2f/0x40
Feb 20 23:49:53 T630 kernel: ? rt6_fill_node+0x2d4/0x850
Feb 20 23:49:53 T630 kernel: ? _cond_resched+0x15/0x30
Feb 20 23:49:53 T630 kernel: ? kmem_cache_alloc_trace+0x1c9/0x230
Feb 20 23:49:53 T630 kernel: rtnl_newlink+0x43/0x60
Feb 20 23:49:53 T630 kernel: rtnetlink_rcv_msg+0x2b1/0x360
Feb 20 23:49:53 T630 kernel: ? __kmalloc_node_track_caller+0x241/0x300
Feb 20 23:49:53 T630 kernel: ? _cond_resched+0x15/0x30
Feb 20 23:49:53 T630 kernel: ? rtnl_calcit.isra.32+0x110/0x110
Feb 20 23:49:53 T630 kernel: netlink_rcv_skb+0x49/0x110
Feb 20 23:49:53 T630 kernel: netlink_unicast+0x191/0x220
Feb 20 23:49:53 T630 kernel: netlink_sendmsg+0x21d/0x3f0
Feb 20 23:49:53 T630 kernel: sock_sendmsg+0x5b/0x60
Feb 20 23:49:53 T630 kernel: ____sys_sendmsg+0x1eb/0x260
Feb 20 23:49:53 T630 kernel: ? copy_msghdr_from_user+0xdb/0x160
Feb 20 23:49:53 T630 kernel: ___sys_sendmsg+0x7c/0xc0
Feb 20 23:49:53 T630 kernel: ? do_filp_open+0xa7/0x100
Feb 20 23:49:53 T630 kernel: ? netdev_run_todo+0x5e/0x290
Feb 20 23:49:53 T630 kernel: ? list_lru_add+0xb7/0x1d0
Feb 20 23:49:53 T630 kernel: __sys_sendmsg+0x57/0xa0
Feb 20 23:49:53 T630 kernel: do_syscall_64+0x5b/0x180
Feb 20 23:49:53 T630 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9

This card has 2 ports; port 1 is set as InfiniBand and port 2 is set as Ethernet.

# ibstat
CA 'mlx4_0'
        CA type: MT4099
        Number of ports: 2
        Firmware version: 2.42.5000
        Hardware version: 1
        Node GUID: 0xe41d2d03007b4080
        System image GUID: 0xe41d2d03007b4083
        Port 1:
                State: Down
                Physical state: Polling
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02594868
                Port GUID: 0xe41d2d03007b4081
                Link layer: InfiniBand
        Port 2:
                State: Down
                Physical state: Disabled
                Rate: 40
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x00010000
                Port GUID: 0xe61d2dfffe7b4082
                Link layer: Ethernet

Best Regards
王玉贵
2020/02/21

> On Thu, Feb 20, 2020 at 08:57:29AM -0500, Chuck Lever wrote:
> > Hello!
> >
> > Thanks for your bug report.
> >
> >
> > > On Feb 19, 2020, at 10:22 PM, Wang Yugui <wangyugui@e16-tech.com> wrote:
> > >
> > > Hi, chuck.lever
> > >
> > > a bug(BUG: kernel NULL pointer dereference) of ib or mlx happened in 5.4.21 but not in 5.4.20.
> > >
> > > maybe some relationship to xprtrdma-fix-dma-scatter-gather-list-mapping-imbalance.patch
> >
> > I don't see an obvious connection to fix-dma-scatter-gather-list-mapping-imbalance.
> > The backtrace below is through IPoIB code paths. Those have nothing to do with
> > NFS/RDMA, which is the only ULP code that is changed by my commit.
> >
> >
> > > maybe the info is useful.
> >
> > I'm copying linux-rdma for a bigger set of eyeballs.
> >
> > My knee-jerk recommendation is that if you have a reliable reproducer, try "git bisect"
> > between .20 and .21 to nail down a specific commit where the BUG starts to occur.
>
> No need to bisect, it is me who broke it.
> The fix is already accepted, but not yet merged.
> https://patchwork.kernel.org/patch/11387567/
>
> Thanks

--------------------------------------
北京京垓科技有限公司
王玉贵 wangyugui@e16-tech.com
Tel: +86-136-71123776

* Re: a bug(BUG: kernel NULL pointer dereference) of ib or mlx happened in 5.4.21 but not in 5.4.20
  2020-02-20 16:26     ` Wang Yugui
@ 2020-02-25 13:05       ` Maor Gottlieb
  2020-02-26  0:44         ` Wang Yugui
  0 siblings, 1 reply; 5+ messages in thread
From: Maor Gottlieb @ 2020-02-25 13:05 UTC
  To: Wang Yugui, Leon Romanovsky; +Cc: Chuck Lever, linux-rdma

On 2/20/2020 6:26 PM, Wang Yugui wrote:
> Hi, Leon, Chuck
>
> It is still broken even with the hotfix (https://patchwork.kernel.org/patch/11387567/) for 5.4.21.

Hi Wang,

How can I reproduce it?

Can you please try with the below diff?

diff --git a/drivers/infiniband/core/security.c b/drivers/infiniband/core/security.c
index b9a36ea244d4..2d5608315dc8 100644
--- a/drivers/infiniband/core/security.c
+++ b/drivers/infiniband/core/security.c
@@ -340,11 +340,15 @@ static struct ib_ports_pkeys *get_new_pps(const struct ib_qp *qp,
 		return NULL;
 
 	if (qp_attr_mask & IB_QP_PORT)
-		new_pps->main.port_num =
-			(qp_pps) ? qp_pps->main.port_num : qp_attr->port_num;
+		new_pps->main.port_num = qp_attr->port_num;
+	else if (qp_pps)
+		new_pps->main.port_num = qp_pps->main.port_num;
+
 	if (qp_attr_mask & IB_QP_PKEY_INDEX)
-		new_pps->main.pkey_index = (qp_pps) ? qp_pps->main.pkey_index :
-						      qp_attr->pkey_index;
+		new_pps->main.pkey_index = qp_attr->pkey_index;
+	else if (qp_pps)
+		new_pps->main.pkey_index = qp_pps->main.pkey_index;
+
 	if ((qp_attr_mask & IB_QP_PKEY_INDEX) && (qp_attr_mask & IB_QP_PORT))
 		new_pps->main.state = IB_PORT_PKEY_VALID;
 

>
> The call stack is almost the same.
>
> Feb 20 23:49:53 T630 kernel: Call Trace:
> Feb 20 23:49:53 T630 kernel: port_pkey_list_insert+0x30/0x1a0 [ib_core]
> Feb 20 23:49:53 T630 kernel: ? kmem_cache_alloc_trace+0x219/0x230
> Feb 20 23:49:53 T630 kernel: ib_security_modify_qp+0x244/0x3b0 [ib_core]
> Feb 20 23:49:53 T630 kernel: _ib_modify_qp+0x1c0/0x3c0 [ib_core]
> Feb 20 23:49:53 T630 kernel: ? dma_pool_free+0x24/0xc0
> Feb 20 23:49:53 T630 kernel: ipoib_init_qp+0x77/0x190 [ib_ipoib]
> Feb 20 23:49:53 T630 kernel: ? __mlx4_ib_query_pkey+0xe7/0x110 [mlx4_ib]
> Feb 20 23:49:53 T630 kernel: ? ib_find_pkey+0x98/0xe0 [ib_core]
> Feb 20 23:49:53 T630 kernel: ipoib_ib_dev_open_default+0x1a/0x180 [ib_ipoib]
> Feb 20 23:49:53 T630 kernel: ipoib_ib_dev_open+0x66/0xa0 [ib_ipoib]
> Feb 20 23:49:53 T630 kernel: ipoib_open+0x44/0x110 [ib_ipoib]
> Feb 20 23:49:53 T630 kernel: __dev_open+0xcd/0x160
> Feb 20 23:49:53 T630 kernel: __dev_change_flags+0x1ad/0x220
> Feb 20 23:49:53 T630 kernel: ? __dev_notify_flags+0x95/0xf0
> Feb 20 23:49:53 T630 kernel: dev_change_flags+0x21/0x60
> Feb 20 23:49:53 T630 kernel: do_setlink+0x320/0xf00
> Feb 20 23:49:53 T630 kernel: ? __nla_validate_parse+0x51/0x840
> Feb 20 23:49:53 T630 kernel: ? xas_load+0x8/0x80
> Feb 20 23:49:53 T630 kernel: ? __update_load_avg_cfs_rq+0x1d5/0x2c0
> Feb 20 23:49:53 T630 kernel: ? cpumask_next+0x17/0x20
> Feb 20 23:49:53 T630 kernel: ? __snmp6_fill_stats64.isra.56+0x6b/0x110
> Feb 20 23:49:53 T630 kernel: ? __nla_validate_parse+0x51/0x840
> Feb 20 23:49:53 T630 kernel: __rtnl_newlink+0x53d/0x890
> Feb 20 23:49:53 T630 kernel: ? __nla_reserve+0x38/0x50
> Feb 20 23:49:53 T630 kernel: ? __nla_put+0xc/0x20
> Feb 20 23:49:53 T630 kernel: ? __nla_reserve+0x38/0x50
> Feb 20 23:49:53 T630 kernel: ? __nla_put+0xc/0x20
> Feb 20 23:49:53 T630 kernel: ? nla_put+0x2f/0x40
> Feb 20 23:49:53 T630 kernel: ? __nla_reserve+0x38/0x50
> Feb 20 23:49:53 T630 kernel: ? __nla_put+0xc/0x20
> Feb 20 23:49:53 T630 kernel: ? nla_put+0x2f/0x40
> Feb 20 23:49:53 T630 kernel: ? rt6_fill_node+0x2d4/0x850
> Feb 20 23:49:53 T630 kernel: ? _cond_resched+0x15/0x30
> Feb 20 23:49:53 T630 kernel: ? kmem_cache_alloc_trace+0x1c9/0x230
> Feb 20 23:49:53 T630 kernel: rtnl_newlink+0x43/0x60
> Feb 20 23:49:53 T630 kernel: rtnetlink_rcv_msg+0x2b1/0x360
> Feb 20 23:49:53 T630 kernel: ? __kmalloc_node_track_caller+0x241/0x300
> Feb 20 23:49:53 T630 kernel: ? _cond_resched+0x15/0x30
> Feb 20 23:49:53 T630 kernel: ? rtnl_calcit.isra.32+0x110/0x110
> Feb 20 23:49:53 T630 kernel: netlink_rcv_skb+0x49/0x110
> Feb 20 23:49:53 T630 kernel: netlink_unicast+0x191/0x220
> Feb 20 23:49:53 T630 kernel: netlink_sendmsg+0x21d/0x3f0
> Feb 20 23:49:53 T630 kernel: sock_sendmsg+0x5b/0x60
> Feb 20 23:49:53 T630 kernel: ____sys_sendmsg+0x1eb/0x260
> Feb 20 23:49:53 T630 kernel: ? copy_msghdr_from_user+0xdb/0x160
> Feb 20 23:49:53 T630 kernel: ___sys_sendmsg+0x7c/0xc0
> Feb 20 23:49:53 T630 kernel: ? do_filp_open+0xa7/0x100
> Feb 20 23:49:53 T630 kernel: ? netdev_run_todo+0x5e/0x290
> Feb 20 23:49:53 T630 kernel: ? list_lru_add+0xb7/0x1d0
> Feb 20 23:49:53 T630 kernel: __sys_sendmsg+0x57/0xa0
> Feb 20 23:49:53 T630 kernel: do_syscall_64+0x5b/0x180
> Feb 20 23:49:53 T630 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
>
> This card has 2 ports; port 1 is set as InfiniBand and port 2
> is set as Ethernet.
>
> # ibstat
> CA 'mlx4_0'
>         CA type: MT4099
>         Number of ports: 2
>         Firmware version: 2.42.5000
>         Hardware version: 1
>         Node GUID: 0xe41d2d03007b4080
>         System image GUID: 0xe41d2d03007b4083
>         Port 1:
>                 State: Down
>                 Physical state: Polling
>                 Rate: 10
>                 Base lid: 0
>                 LMC: 0
>                 SM lid: 0
>                 Capability mask: 0x02594868
>                 Port GUID: 0xe41d2d03007b4081
>                 Link layer: InfiniBand
>         Port 2:
>                 State: Down
>                 Physical state: Disabled
>                 Rate: 40
>                 Base lid: 0
>                 LMC: 0
>                 SM lid: 0
>                 Capability mask: 0x00010000
>                 Port GUID: 0xe61d2dfffe7b4082
>                 Link layer: Ethernet
>
>
> Best Regards
> 王玉贵
> 2020/02/21
>
>> On Thu, Feb 20, 2020 at 08:57:29AM -0500, Chuck Lever wrote:
>>> Hello!
>>>
>>> Thanks for your bug report.
>>>
>>>
>>>> On Feb 19, 2020, at 10:22 PM, Wang Yugui <wangyugui@e16-tech.com> wrote:
>>>>
>>>> Hi, chuck.lever
>>>>
>>>> a bug(BUG: kernel NULL pointer dereference) of ib or mlx happened in 5.4.21 but not in 5.4.20.
>>>>
>>>> maybe some relationship to xprtrdma-fix-dma-scatter-gather-list-mapping-imbalance.patch
>>> I don't see an obvious connection to fix-dma-scatter-gather-list-mapping-imbalance.
>>> The backtrace below is through IPoIB code paths. Those have nothing to do with
>>> NFS/RDMA, which is the only ULP code that is changed by my commit.
>>>
>>>
>>>> maybe the info is useful.
>>> I'm copying linux-rdma for a bigger set of eyeballs.
>>>
>>> My knee-jerk recommendation is that if you have a reliable reproducer, try "git bisect"
>>> between .20 and .21 to nail down a specific commit where the BUG starts to occur.
>> No need to bisect, it is me who broke it.
>> The fix is already accepted, but not yet merged.
>> https://patchwork.kernel.org/patch/11387567/
>>
>> Thanks
> --------------------------------------
> 北京京垓科技有限公司
> 王玉贵 wangyugui@e16-tech.com
> Tel: +86-136-71123776
>

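For readers following the diff in the message above, here is a minimal, self-contained C sketch of the port selection before and after the change. The struct, the SKETCH_* constants and all function names are hypothetical stand-ins rather than the kernel's definitions, and the sketch assumes the new structure starts out zeroed. It models only the two port_num assignments touched by the diff; the later "neither bit set" block of the real function is not modelled. The point it tries to illustrate is that the old ternary keeps the QP's previous port whenever qp_pps exists, even if the caller passes a different port in qp_attr.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; not the kernel's definitions. */
enum { SKETCH_QP_PKEY_INDEX = 1 << 4, SKETCH_QP_PORT = 1 << 5 };

struct sketch_pps {
	int port_num;
	int pkey_index;
};

/*
 * Models the port_num assignment removed by the diff: when the caller sets
 * the PORT bit, an existing qp_pps still overrides the requested port.
 * The result starts at 0 (assumption: the new structure is zero-initialised).
 */
static int port_before(int mask, const struct sketch_pps *qp_pps, int attr_port)
{
	int port = 0;

	if (mask & SKETCH_QP_PORT)
		port = qp_pps ? qp_pps->port_num : attr_port;
	return port;
}

/*
 * Models the replacement: the requested port wins when the PORT bit is set,
 * otherwise the value is inherited from qp_pps if one exists.
 */
static int port_after(int mask, const struct sketch_pps *qp_pps, int attr_port)
{
	int port = 0;

	if (mask & SKETCH_QP_PORT)
		port = attr_port;
	else if (qp_pps)
		port = qp_pps->port_num;
	return port;
}

/* Walks through the cases described in the paragraph above. */
static void sketch_pps_check(void)
{
	struct sketch_pps old = { .port_num = 2, .pkey_index = 0 };

	assert(port_before(SKETCH_QP_PORT, &old, 1) == 2); /* request ignored */
	assert(port_after(SKETCH_QP_PORT, &old, 1) == 1);  /* request honoured */
	assert(port_after(0, &old, 1) == 2);               /* inherited */
	assert(port_after(SKETCH_QP_PORT, NULL, 1) == 1);  /* no previous pps */
}
```

The pkey_index assignment in the diff follows the same pattern, so the sketch leaves it out. sketch_pps_check() can be called from any small test harness.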
* Re: a bug(BUG: kernel NULL pointer dereference) of ib or mlx happened in 5.4.21 but not in 5.4.20
  2020-02-25 13:05       ` Maor Gottlieb
@ 2020-02-26  0:44         ` Wang Yugui
  0 siblings, 0 replies; 5+ messages in thread
From: Wang Yugui @ 2020-02-26 0:44 UTC
  To: Maor Gottlieb; +Cc: Leon Romanovsky, Chuck Lever, linux-rdma

[-- Attachment #1: Type: text/plain, Size: 9562 bytes --]

Hi, Maor, Leon

Kernel 5.4.21 plus the two patches now boots successfully without the NULL pointer problem,
and nfs4/rdma mounts successfully too.

#RDMA-core-Fix-use-of-logical-OR-in-get_new_pps.patch
#RDMA-core-fix-null.patch
        (the patch from Maor saved as git-am format)

My MCX354A has 2 ports; port 1 is set as InfiniBand and port 2 is set as Ethernet.

# ibstat
CA 'mlx4_0'
        CA type: MT4099
        Number of ports: 2
        Firmware version: 2.42.5000
        Hardware version: 1
        Node GUID: 0xe41d2d03007b4080
        System image GUID: 0xe41d2d03007b4083
        Port 1:
                State: Down
                Physical state: Polling
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02594868
                Port GUID: 0xe41d2d03007b4081
                Link layer: InfiniBand
        Port 2:
                State: Down
                Physical state: Disabled
                Rate: 40
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x00010000
                Port GUID: 0xe61d2dfffe7b4082
                Link layer: Ethernet

# mlxup
Querying Mellanox devices firmware ...

Device #1:
----------

  Device Type:      ConnectX3
  Part Number:      01T7NW
  Description:      ConnectX-3 VPI adapter; dual-port QSFP; FDR IB (56Gb/s) and 40GbE;PCIe3.0 x8 8GT/s; Dell PowerEdge
  PSID:             DEL1090001019
  PCI Device Name:  0000:84:00.0
  Port1 GUID:       e41d2d03007b4081
  Port2 MAC:        e41d2d7b4082
  Versions:         Current        Available
     FW             2.42.5000      N/A
     PXE            3.4.0752       N/A

  Status:           No matching image found

My server is a Dell PowerEdge T630 with some other NIC cards.

# rxe_cfg
  Name        Link  Driver   Speed   NMTU  IPv4_addr      RDEV  RMTU
  em1         yes   igb              1500  192.168.2.63
  em2         no    igb              1500
  p1p1        no    bnx2x    10GigE  9000  10.0.0.63
  p1p2        no    bnx2x    10GigE  9000  10.0.1.63
  p6p2        no    mlx4_en          9000  10.40.1.63
  virbr0      no    bridge           1500  192.168.122.1
  virbr0-nic  no    tun              1500

Best Regards
王玉贵
2020/02/26

> On 2/20/2020 6:26 PM, Wang Yugui wrote:
> > Hi, Leon, Chuck
> >
> > It is still broken even with the hotfix (https://patchwork.kernel.org/patch/11387567/) for 5.4.21.
>
> Hi Wang,
>
> How can I reproduce it?
>
> Can you please try with the below diff?
>
> diff --git a/drivers/infiniband/core/security.c b/drivers/infiniband/core/security.c
> index b9a36ea244d4..2d5608315dc8 100644
> --- a/drivers/infiniband/core/security.c
> +++ b/drivers/infiniband/core/security.c
> @@ -340,11 +340,15 @@ static struct ib_ports_pkeys *get_new_pps(const struct ib_qp *qp,
>  		return NULL;
>
>  	if (qp_attr_mask & IB_QP_PORT)
> -		new_pps->main.port_num =
> -			(qp_pps) ? qp_pps->main.port_num : qp_attr->port_num;
> +		new_pps->main.port_num = qp_attr->port_num;
> +	else if (qp_pps)
> +		new_pps->main.port_num = qp_pps->main.port_num;
> +
>  	if (qp_attr_mask & IB_QP_PKEY_INDEX)
> -		new_pps->main.pkey_index = (qp_pps) ? qp_pps->main.pkey_index :
> -						      qp_attr->pkey_index;
> +		new_pps->main.pkey_index = qp_attr->pkey_index;
> +	else if (qp_pps)
> +		new_pps->main.pkey_index = qp_pps->main.pkey_index;
> +
>  	if ((qp_attr_mask & IB_QP_PKEY_INDEX) && (qp_attr_mask & IB_QP_PORT))
>  		new_pps->main.state = IB_PORT_PKEY_VALID;
>
> >
> > The call stack is almost the same.
> >
> > Feb 20 23:49:53 T630 kernel: Call Trace:
> > Feb 20 23:49:53 T630 kernel: port_pkey_list_insert+0x30/0x1a0 [ib_core]
> > Feb 20 23:49:53 T630 kernel: ? kmem_cache_alloc_trace+0x219/0x230
> > Feb 20 23:49:53 T630 kernel: ib_security_modify_qp+0x244/0x3b0 [ib_core]
> > Feb 20 23:49:53 T630 kernel: _ib_modify_qp+0x1c0/0x3c0 [ib_core]
> > Feb 20 23:49:53 T630 kernel: ? dma_pool_free+0x24/0xc0
> > Feb 20 23:49:53 T630 kernel: ipoib_init_qp+0x77/0x190 [ib_ipoib]
> > Feb 20 23:49:53 T630 kernel: ? __mlx4_ib_query_pkey+0xe7/0x110 [mlx4_ib]
> > Feb 20 23:49:53 T630 kernel: ? ib_find_pkey+0x98/0xe0 [ib_core]
> > Feb 20 23:49:53 T630 kernel: ipoib_ib_dev_open_default+0x1a/0x180 [ib_ipoib]
> > Feb 20 23:49:53 T630 kernel: ipoib_ib_dev_open+0x66/0xa0 [ib_ipoib]
> > Feb 20 23:49:53 T630 kernel: ipoib_open+0x44/0x110 [ib_ipoib]
> > Feb 20 23:49:53 T630 kernel: __dev_open+0xcd/0x160
> > Feb 20 23:49:53 T630 kernel: __dev_change_flags+0x1ad/0x220
> > Feb 20 23:49:53 T630 kernel: ? __dev_notify_flags+0x95/0xf0
> > Feb 20 23:49:53 T630 kernel: dev_change_flags+0x21/0x60
> > Feb 20 23:49:53 T630 kernel: do_setlink+0x320/0xf00
> > Feb 20 23:49:53 T630 kernel: ? __nla_validate_parse+0x51/0x840
> > Feb 20 23:49:53 T630 kernel: ? xas_load+0x8/0x80
> > Feb 20 23:49:53 T630 kernel: ? __update_load_avg_cfs_rq+0x1d5/0x2c0
> > Feb 20 23:49:53 T630 kernel: ? cpumask_next+0x17/0x20
> > Feb 20 23:49:53 T630 kernel: ? __snmp6_fill_stats64.isra.56+0x6b/0x110
> > Feb 20 23:49:53 T630 kernel: ? __nla_validate_parse+0x51/0x840
> > Feb 20 23:49:53 T630 kernel: __rtnl_newlink+0x53d/0x890
> > Feb 20 23:49:53 T630 kernel: ? __nla_reserve+0x38/0x50
> > Feb 20 23:49:53 T630 kernel: ? __nla_put+0xc/0x20
> > Feb 20 23:49:53 T630 kernel: ? __nla_reserve+0x38/0x50
> > Feb 20 23:49:53 T630 kernel: ? __nla_put+0xc/0x20
> > Feb 20 23:49:53 T630 kernel: ? nla_put+0x2f/0x40
> > Feb 20 23:49:53 T630 kernel: ? __nla_reserve+0x38/0x50
> > Feb 20 23:49:53 T630 kernel: ? __nla_put+0xc/0x20
> > Feb 20 23:49:53 T630 kernel: ? nla_put+0x2f/0x40
> > Feb 20 23:49:53 T630 kernel: ? rt6_fill_node+0x2d4/0x850
> > Feb 20 23:49:53 T630 kernel: ? _cond_resched+0x15/0x30
> > Feb 20 23:49:53 T630 kernel: ? kmem_cache_alloc_trace+0x1c9/0x230
> > Feb 20 23:49:53 T630 kernel: rtnl_newlink+0x43/0x60
> > Feb 20 23:49:53 T630 kernel: rtnetlink_rcv_msg+0x2b1/0x360
> > Feb 20 23:49:53 T630 kernel: ? __kmalloc_node_track_caller+0x241/0x300
> > Feb 20 23:49:53 T630 kernel: ? _cond_resched+0x15/0x30
> > Feb 20 23:49:53 T630 kernel: ? rtnl_calcit.isra.32+0x110/0x110
> > Feb 20 23:49:53 T630 kernel: netlink_rcv_skb+0x49/0x110
> > Feb 20 23:49:53 T630 kernel: netlink_unicast+0x191/0x220
> > Feb 20 23:49:53 T630 kernel: netlink_sendmsg+0x21d/0x3f0
> > Feb 20 23:49:53 T630 kernel: sock_sendmsg+0x5b/0x60
> > Feb 20 23:49:53 T630 kernel: ____sys_sendmsg+0x1eb/0x260
> > Feb 20 23:49:53 T630 kernel: ? copy_msghdr_from_user+0xdb/0x160
> > Feb 20 23:49:53 T630 kernel: ___sys_sendmsg+0x7c/0xc0
> > Feb 20 23:49:53 T630 kernel: ? do_filp_open+0xa7/0x100
> > Feb 20 23:49:53 T630 kernel: ? netdev_run_todo+0x5e/0x290
> > Feb 20 23:49:53 T630 kernel: ? list_lru_add+0xb7/0x1d0
> > Feb 20 23:49:53 T630 kernel: __sys_sendmsg+0x57/0xa0
> > Feb 20 23:49:53 T630 kernel: do_syscall_64+0x5b/0x180
> > Feb 20 23:49:53 T630 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >
> >
> > This card has 2 ports; port 1 is set as InfiniBand and port 2
> > is set as Ethernet.
> >
> > # ibstat
> > CA 'mlx4_0'
> >         CA type: MT4099
> >         Number of ports: 2
> >         Firmware version: 2.42.5000
> >         Hardware version: 1
> >         Node GUID: 0xe41d2d03007b4080
> >         System image GUID: 0xe41d2d03007b4083
> >         Port 1:
> >                 State: Down
> >                 Physical state: Polling
> >                 Rate: 10
> >                 Base lid: 0
> >                 LMC: 0
> >                 SM lid: 0
> >                 Capability mask: 0x02594868
> >                 Port GUID: 0xe41d2d03007b4081
> >                 Link layer: InfiniBand
> >         Port 2:
> >                 State: Down
> >                 Physical state: Disabled
> >                 Rate: 40
> >                 Base lid: 0
> >                 LMC: 0
> >                 SM lid: 0
> >                 Capability mask: 0x00010000
> >                 Port GUID: 0xe61d2dfffe7b4082
> >                 Link layer: Ethernet
> >
> >
> > Best Regards
> > 王玉贵
> > 2020/02/21
> >
> >> On Thu, Feb 20, 2020 at 08:57:29AM -0500, Chuck Lever wrote:
> >>> Hello!
> >>>
> >>> Thanks for your bug report.
> >>>
> >>>
> >>>> On Feb 19, 2020, at 10:22 PM, Wang Yugui <wangyugui@e16-tech.com> wrote:
> >>>>
> >>>> Hi, chuck.lever
> >>>>
> >>>> a bug(BUG: kernel NULL pointer dereference) of ib or mlx happened in 5.4.21 but not in 5.4.20.
> >>>>
> >>>> maybe some relationship to xprtrdma-fix-dma-scatter-gather-list-mapping-imbalance.patch
> >>> I don't see an obvious connection to fix-dma-scatter-gather-list-mapping-imbalance.
> >>> The backtrace below is through IPoIB code paths. Those have nothing to do with
> >>> NFS/RDMA, which is the only ULP code that is changed by my commit.
> >>>
> >>>
> >>>> maybe the info is useful.
> >>> I'm copying linux-rdma for a bigger set of eyeballs.
> >>>
> >>> My knee-jerk recommendation is that if you have a reliable reproducer, try "git bisect"
> >>> between .20 and .21 to nail down a specific commit where the BUG starts to occur.
> >> No need to bisect, it is me who broke it.
> >> The fix is already accepted, but not yet merged.
> >> https://patchwork.kernel.org/patch/11387567/
> >>
> >> Thanks
> > --------------------------------------
> > 北京京垓科技有限公司
> > 王玉贵 wangyugui@e16-tech.com
> > Tel: +86-136-71123776
> >

--------------------------------------
北京京垓科技有限公司
王玉贵 wangyugui@e16-tech.com
Tel: +86-136-71123776

[-- Attachment #2: RDMA-core-fix-null.patch --]
[-- Type: application/octet-stream, Size: 1218 bytes --]

From d4078b7c5e9782b2ca3d6c6035f4abb995c4dab7 Mon Sep 17 00:00:00 2001
From: maorg@mellanox.com
Date: Wed, 26 Feb 2020 07:58:29 +0800
Subject: [PATCH] RDMA-core-fix-NULL

---
 drivers/infiniband/core/security.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/security.c b/drivers/infiniband/core/security.c
index 2b4d803..9e27ca1 100644
--- a/drivers/infiniband/core/security.c
+++ b/drivers/infiniband/core/security.c
@@ -340,11 +340,15 @@ static struct ib_ports_pkeys *get_new_pps(const struct ib_qp *qp,
 		return NULL;
 
 	if (qp_attr_mask & IB_QP_PORT)
-		new_pps->main.port_num =
-			(qp_pps) ? qp_pps->main.port_num : qp_attr->port_num;
+		new_pps->main.port_num = qp_attr->port_num;
+	else if (qp_pps)
+		new_pps->main.port_num = qp_pps->main.port_num;
+
 	if (qp_attr_mask & IB_QP_PKEY_INDEX)
-		new_pps->main.pkey_index = (qp_pps) ? qp_pps->main.pkey_index :
-						      qp_attr->pkey_index;
+		new_pps->main.pkey_index = qp_attr->pkey_index;
+	else if (qp_pps)
+		new_pps->main.pkey_index = qp_pps->main.pkey_index;
+
 	if ((qp_attr_mask & IB_QP_PKEY_INDEX) && (qp_attr_mask & IB_QP_PORT))
 		new_pps->main.state = IB_PORT_PKEY_VALID;
 
-- 
2.24.1

[-- Attachment #3: RDMA-core-Fix-use-of-logical-OR-in-get_new_pps.patch --]
[-- Type: application/octet-stream, Size: 5511 bytes --]

From patchwork Mon Feb 17 20:43:18 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nathan Chancellor <natechancellor@gmail.com>
X-Patchwork-Id: 11387567
X-Patchwork-Delegate: jgg@ziepe.ca
From: Nathan Chancellor <natechancellor@gmail.com>
To: Doug Ledford <dledford@redhat.com>, Jason Gunthorpe <jgg@ziepe.ca>
Cc: Leon Romanovsky <leon@kernel.org>, linux-rdma@vger.kernel.org,
 linux-kernel@vger.kernel.org, clang-built-linux@googlegroups.com,
 Nathan Chancellor <natechancellor@gmail.com>
Subject: [PATCH] RDMA/core: Fix use of logical OR in get_new_pps
Date: Mon, 17 Feb 2020 13:43:18 -0700
Message-Id: <20200217204318.13609-1-natechancellor@gmail.com>
X-Mailer: git-send-email 2.25.1
MIME-Version: 1.0
X-Patchwork-Bot: notify
Sender: linux-rdma-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-rdma.vger.kernel.org>
X-Mailing-List: linux-rdma@vger.kernel.org

Clang warns:

../drivers/infiniband/core/security.c:351:41: warning: converting the
enum constant to a boolean [-Wint-in-bool-context]
        if (!(qp_attr_mask & (IB_QP_PKEY_INDEX || IB_QP_PORT)) && qp_pps) {
                                               ^
1 warning generated.

A bitwise OR should have been used instead.

Fixes: 1dd017882e01 ("RDMA/core: Fix protection fault in get_pkey_idx_qp_list")
Link: https://github.com/ClangBuiltLinux/linux/issues/889
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/security.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/security.c b/drivers/infiniband/core/security.c
index 2b4d80393bd0..b9a36ea244d4 100644
--- a/drivers/infiniband/core/security.c
+++ b/drivers/infiniband/core/security.c
@@ -348,7 +348,7 @@ static struct ib_ports_pkeys *get_new_pps(const struct ib_qp *qp,
 	if ((qp_attr_mask & IB_QP_PKEY_INDEX) && (qp_attr_mask & IB_QP_PORT))
 		new_pps->main.state = IB_PORT_PKEY_VALID;
 
-	if (!(qp_attr_mask & (IB_QP_PKEY_INDEX || IB_QP_PORT)) && qp_pps) {
+	if (!(qp_attr_mask & (IB_QP_PKEY_INDEX | IB_QP_PORT)) && qp_pps) {
 		new_pps->main.port_num = qp_pps->main.port_num;
 		new_pps->main.pkey_index = qp_pps->main.pkey_index;
 		if (qp_pps->main.state != IB_PORT_PKEY_NOT_VALID)

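The one-character change in the patchwork patch above can be easier to see with concrete bit values. The following is a small C sketch, not kernel code: the SKETCH_* constants and all function names are hypothetical, and the flags are passed through function parameters only so that the logical OR compiles without the very warning the patch is about. It shows that `||` on two non-zero flags yields 1, so the mask test ends up looking at bit 0 instead of the two intended bits.

```c
#include <assert.h>

/* Hypothetical stand-ins for illustration; bit 0 is deliberately an unrelated flag. */
enum {
	SKETCH_QP_STATE      = 1 << 0,
	SKETCH_QP_PKEY_INDEX = 1 << 4,
	SKETCH_QP_PORT       = 1 << 5
};

/* Logical OR of two non-zero flags collapses to 1, i.e. bit 0 only. */
static int combine_logical(int a, int b)
{
	return a || b;
}

/* Bitwise OR keeps both flag bits. */
static int combine_bitwise(int a, int b)
{
	return a | b;
}

/* Non-zero when none of the bits in flags are set in mask. */
static int none_requested(int mask, int flags)
{
	return !(mask & flags);
}

/* Compares the two combinations on a few masks. */
static void sketch_or_check(void)
{
	int buggy = combine_logical(SKETCH_QP_PKEY_INDEX, SKETCH_QP_PORT);
	int fixed = combine_bitwise(SKETCH_QP_PKEY_INDEX, SKETCH_QP_PORT);
	int both = SKETCH_QP_PKEY_INDEX | SKETCH_QP_PORT;

	assert(buggy == 1);
	assert(fixed == 0x30);

	/* Caller sets both bits: the buggy test still reports "none requested". */
	assert(none_requested(both, buggy) == 1);
	assert(none_requested(both, fixed) == 0);

	/* Caller sets only the unrelated bit 0: the buggy test reports a request. */
	assert(none_requested(SKETCH_QP_STATE, buggy) == 0);
	assert(none_requested(SKETCH_QP_STATE, fixed) == 1);
}
```

sketch_or_check() can be called from any small test harness; with the bitwise form, the "inherit from qp_pps" branch is taken exactly when neither bit is set in the mask.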