* Re: a bug(BUG: kernel NULL pointer dereference) of ib or mlx happened in 5.4.21 but not in 5.4.20
[not found] <20200220112231.34FB.409509F4@e16-tech.com>
@ 2020-02-20 13:57 ` Chuck Lever
2020-02-20 14:05 ` Leon Romanovsky
0 siblings, 1 reply; 5+ messages in thread
From: Chuck Lever @ 2020-02-20 13:57 UTC
To: Wang Yugui; +Cc: linux-rdma
Hello!
Thanks for your bug report.
> On Feb 19, 2020, at 10:22 PM, Wang Yugui <wangyugui@e16-tech.com> wrote:
>
> Hi, chuck.lever
>
> a bug(BUG: kernel NULL pointer dereference) of ib or mlx happened in 5.4.21 but not in 5.4.20.
>
> maybe some relationship to xprtrdma-fix-dma-scatter-gather-list-mapping-imbalance.patch
I don't see an obvious connection to fix-dma-scatter-gather-list-mapping-imbalance.
The backtrace below is through IPoIB code paths. Those have nothing to do with
NFS/RDMA, which is the only ULP code that is changed by my commit.
> maybe the info is useful.
I'm copying linux-rdma for a bigger set of eyeballs.
My knee-jerk recommendation is that if you have a reliable reproducer, try "git bisect"
between .20 and .21 to nail down a specific commit where the BUG starts to occur.
> Feb 20 10:05:58 T630 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000010
> ...
> Feb 20 10:05:58 T630 kernel: port_pkey_list_insert+0x30/0x1a0 [ib_core]
> Feb 20 10:05:58 T630 kernel: ? kmem_cache_alloc_trace+0x219/0x230
> Feb 20 10:05:58 T630 kernel: ib_security_modify_qp+0x244/0x3b0 [ib_core]
> Feb 20 10:05:58 T630 kernel: _ib_modify_qp+0x1c0/0x3c0 [ib_core]
> Feb 20 10:05:58 T630 kernel: ? dma_pool_free+0x24/0xc0
> Feb 20 10:05:58 T630 kernel: ipoib_init_qp+0x77/0x190 [ib_ipoib]
> Feb 20 10:05:58 T630 kernel: ? __mlx4_ib_query_pkey+0xe7/0x110 [mlx4_ib]
> Feb 20 10:05:58 T630 kernel: ? ib_find_pkey+0x98/0xe0 [ib_core]
> Feb 20 10:05:58 T630 kernel: ipoib_ib_dev_open_default+0x1a/0x180 [ib_ipoib]
> Feb 20 10:05:58 T630 kernel: ipoib_ib_dev_open+0x66/0xa0 [ib_ipoib]
> Feb 20 10:05:58 T630 kernel: ipoib_open+0x44/0x110 [ib_ipoib]
> Feb 20 10:05:58 T630 kernel: __dev_open+0xcd/0x160
>
>
> # ibstat
> CA 'mlx4_0'
> 	CA type: MT4099
> 	Number of ports: 2
> 	Firmware version: 2.42.5000
> 	Hardware version: 1
> 	Node GUID: 0xe41d2d03007b4080
> 	System image GUID: 0xe41d2d03007b4083
> 	Port 1:
> 		State: Down
> 		Physical state: Polling
> 		Rate: 10
> 		Base lid: 0
> 		LMC: 0
> 		SM lid: 0
> 		Capability mask: 0x02594868
> 		Port GUID: 0xe41d2d03007b4081
> 		Link layer: InfiniBand
> 	Port 2:
> 		State: Down
> 		Physical state: Disabled
> 		Rate: 40
> 		Base lid: 0
> 		LMC: 0
> 		SM lid: 0
> 		Capability mask: 0x00010000
> 		Port GUID: 0xe61d2dfffe7b4082
> 		Link layer: Ethernet
>
> Best Regards
> 王玉贵
> 2020/02/20
>
> --------------------------------------
> 北京京垓科技有限公司
> 王玉贵 wangyugui@e16-tech.com
> Tel: +86-136-71123776
> <bug-of-ib-in-5.4.21.message>
--
Chuck Lever
* Re: a bug(BUG: kernel NULL pointer dereference) of ib or mlx happened in 5.4.21 but not in 5.4.20
2020-02-20 13:57 ` a bug(BUG: kernel NULL pointer dereference) of ib or mlx happened in 5.4.21 but not in 5.4.20 Chuck Lever
@ 2020-02-20 14:05 ` Leon Romanovsky
2020-02-20 16:26 ` Wang Yugui
0 siblings, 1 reply; 5+ messages in thread
From: Leon Romanovsky @ 2020-02-20 14:05 UTC
To: Chuck Lever; +Cc: Wang Yugui, linux-rdma
On Thu, Feb 20, 2020 at 08:57:29AM -0500, Chuck Lever wrote:
> Hello!
>
> Thanks for your bug report.
>
>
> > On Feb 19, 2020, at 10:22 PM, Wang Yugui <wangyugui@e16-tech.com> wrote:
> >
> > Hi, chuck.lever
> >
> > a bug(BUG: kernel NULL pointer dereference) of ib or mlx happened in 5.4.21 but not in 5.4.20.
> >
> > maybe some relationship to xprtrdma-fix-dma-scatter-gather-list-mapping-imbalance.patch
>
> I don't see an obvious connection to fix-dma-scatter-gather-list-mapping-imbalance.
> The backtrace below is through IPoIB code paths. Those have nothing to do with
> NFS/RDMA, which is the only ULP code that is changed by my commit.
>
>
> > maybe the info is useful.
>
> I'm copying linux-rdma for a bigger set of eyeballs.
>
> My knee-jerk recommendation is that if you have a reliable reproducer, try "git bisect"
> between .20 and .21 to nail down a specific commit where the BUG starts to occur.
No need to bisect, it was me who broke it.
The fix is already accepted, but not yet merged.
https://patchwork.kernel.org/patch/11387567/
Thanks
* Re: a bug(BUG: kernel NULL pointer dereference) of ib or mlx happened in 5.4.21 but not in 5.4.20
2020-02-20 14:05 ` Leon Romanovsky
@ 2020-02-20 16:26 ` Wang Yugui
2020-02-25 13:05 ` Maor Gottlieb
0 siblings, 1 reply; 5+ messages in thread
From: Wang Yugui @ 2020-02-20 16:26 UTC
To: Leon Romanovsky; +Cc: Chuck Lever, linux-rdma
Hi, Leon, Chuck
It is still broken even with the hotfix (https://patchwork.kernel.org/patch/11387567/) applied to 5.4.21.
The call stack is almost the same.
Feb 20 23:49:53 T630 kernel: Call Trace:
Feb 20 23:49:53 T630 kernel: port_pkey_list_insert+0x30/0x1a0 [ib_core]
Feb 20 23:49:53 T630 kernel: ? kmem_cache_alloc_trace+0x219/0x230
Feb 20 23:49:53 T630 kernel: ib_security_modify_qp+0x244/0x3b0 [ib_core]
Feb 20 23:49:53 T630 kernel: _ib_modify_qp+0x1c0/0x3c0 [ib_core]
Feb 20 23:49:53 T630 kernel: ? dma_pool_free+0x24/0xc0
Feb 20 23:49:53 T630 kernel: ipoib_init_qp+0x77/0x190 [ib_ipoib]
Feb 20 23:49:53 T630 kernel: ? __mlx4_ib_query_pkey+0xe7/0x110 [mlx4_ib]
Feb 20 23:49:53 T630 kernel: ? ib_find_pkey+0x98/0xe0 [ib_core]
Feb 20 23:49:53 T630 kernel: ipoib_ib_dev_open_default+0x1a/0x180 [ib_ipoib]
Feb 20 23:49:53 T630 kernel: ipoib_ib_dev_open+0x66/0xa0 [ib_ipoib]
Feb 20 23:49:53 T630 kernel: ipoib_open+0x44/0x110 [ib_ipoib]
Feb 20 23:49:53 T630 kernel: __dev_open+0xcd/0x160
Feb 20 23:49:53 T630 kernel: __dev_change_flags+0x1ad/0x220
Feb 20 23:49:53 T630 kernel: ? __dev_notify_flags+0x95/0xf0
Feb 20 23:49:53 T630 kernel: dev_change_flags+0x21/0x60
Feb 20 23:49:53 T630 kernel: do_setlink+0x320/0xf00
Feb 20 23:49:53 T630 kernel: ? __nla_validate_parse+0x51/0x840
Feb 20 23:49:53 T630 kernel: ? xas_load+0x8/0x80
Feb 20 23:49:53 T630 kernel: ? __update_load_avg_cfs_rq+0x1d5/0x2c0
Feb 20 23:49:53 T630 kernel: ? cpumask_next+0x17/0x20
Feb 20 23:49:53 T630 kernel: ? __snmp6_fill_stats64.isra.56+0x6b/0x110
Feb 20 23:49:53 T630 kernel: ? __nla_validate_parse+0x51/0x840
Feb 20 23:49:53 T630 kernel: __rtnl_newlink+0x53d/0x890
Feb 20 23:49:53 T630 kernel: ? __nla_reserve+0x38/0x50
Feb 20 23:49:53 T630 kernel: ? __nla_put+0xc/0x20
Feb 20 23:49:53 T630 kernel: ? __nla_reserve+0x38/0x50
Feb 20 23:49:53 T630 kernel: ? __nla_put+0xc/0x20
Feb 20 23:49:53 T630 kernel: ? nla_put+0x2f/0x40
Feb 20 23:49:53 T630 kernel: ? __nla_reserve+0x38/0x50
Feb 20 23:49:53 T630 kernel: ? __nla_put+0xc/0x20
Feb 20 23:49:53 T630 kernel: ? nla_put+0x2f/0x40
Feb 20 23:49:53 T630 kernel: ? rt6_fill_node+0x2d4/0x850
Feb 20 23:49:53 T630 kernel: ? _cond_resched+0x15/0x30
Feb 20 23:49:53 T630 kernel: ? kmem_cache_alloc_trace+0x1c9/0x230
Feb 20 23:49:53 T630 kernel: rtnl_newlink+0x43/0x60
Feb 20 23:49:53 T630 kernel: rtnetlink_rcv_msg+0x2b1/0x360
Feb 20 23:49:53 T630 kernel: ? __kmalloc_node_track_caller+0x241/0x300
Feb 20 23:49:53 T630 kernel: ? _cond_resched+0x15/0x30
Feb 20 23:49:53 T630 kernel: ? rtnl_calcit.isra.32+0x110/0x110
Feb 20 23:49:53 T630 kernel: netlink_rcv_skb+0x49/0x110
Feb 20 23:49:53 T630 kernel: netlink_unicast+0x191/0x220
Feb 20 23:49:53 T630 kernel: netlink_sendmsg+0x21d/0x3f0
Feb 20 23:49:53 T630 kernel: sock_sendmsg+0x5b/0x60
Feb 20 23:49:53 T630 kernel: ____sys_sendmsg+0x1eb/0x260
Feb 20 23:49:53 T630 kernel: ? copy_msghdr_from_user+0xdb/0x160
Feb 20 23:49:53 T630 kernel: ___sys_sendmsg+0x7c/0xc0
Feb 20 23:49:53 T630 kernel: ? do_filp_open+0xa7/0x100
Feb 20 23:49:53 T630 kernel: ? netdev_run_todo+0x5e/0x290
Feb 20 23:49:53 T630 kernel: ? list_lru_add+0xb7/0x1d0
Feb 20 23:49:53 T630 kernel: __sys_sendmsg+0x57/0xa0
Feb 20 23:49:53 T630 kernel: do_syscall_64+0x5b/0x180
Feb 20 23:49:53 T630 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
This card has 2 ports: port 1 is set to InfiniBand and port 2
is set to Ethernet.
# ibstat
CA 'mlx4_0'
	CA type: MT4099
	Number of ports: 2
	Firmware version: 2.42.5000
	Hardware version: 1
	Node GUID: 0xe41d2d03007b4080
	System image GUID: 0xe41d2d03007b4083
	Port 1:
		State: Down
		Physical state: Polling
		Rate: 10
		Base lid: 0
		LMC: 0
		SM lid: 0
		Capability mask: 0x02594868
		Port GUID: 0xe41d2d03007b4081
		Link layer: InfiniBand
	Port 2:
		State: Down
		Physical state: Disabled
		Rate: 40
		Base lid: 0
		LMC: 0
		SM lid: 0
		Capability mask: 0x00010000
		Port GUID: 0xe61d2dfffe7b4082
		Link layer: Ethernet
Best Regards
王玉贵
2020/02/21
--------------------------------------
北京京垓科技有限公司
王玉贵 wangyugui@e16-tech.com
Tel: +86-136-71123776
* Re: a bug(BUG: kernel NULL pointer dereference) of ib or mlx happened in 5.4.21 but not in 5.4.20
2020-02-20 16:26 ` Wang Yugui
@ 2020-02-25 13:05 ` Maor Gottlieb
2020-02-26 0:44 ` Wang Yugui
0 siblings, 1 reply; 5+ messages in thread
From: Maor Gottlieb @ 2020-02-25 13:05 UTC
To: Wang Yugui, Leon Romanovsky; +Cc: Chuck Lever, linux-rdma
On 2/20/2020 6:26 PM, Wang Yugui wrote:
> Hi, Leon, Chuck
>
> It is still broken even with the hotfix(https://patchwork.kernel.org/patch/11387567/) for 5.4.21.
Hi Wang,
How can I reproduce it?
Can you please try with the below diff?
diff --git a/drivers/infiniband/core/security.c b/drivers/infiniband/core/security.c
index b9a36ea244d4..2d5608315dc8 100644
--- a/drivers/infiniband/core/security.c
+++ b/drivers/infiniband/core/security.c
@@ -340,11 +340,15 @@ static struct ib_ports_pkeys *get_new_pps(const struct ib_qp *qp,
 		return NULL;
 	if (qp_attr_mask & IB_QP_PORT)
-		new_pps->main.port_num =
-			(qp_pps) ? qp_pps->main.port_num : qp_attr->port_num;
+		new_pps->main.port_num = qp_attr->port_num;
+	else if (qp_pps)
+		new_pps->main.port_num = qp_pps->main.port_num;
+
 	if (qp_attr_mask & IB_QP_PKEY_INDEX)
-		new_pps->main.pkey_index = (qp_pps) ? qp_pps->main.pkey_index :
-						      qp_attr->pkey_index;
+		new_pps->main.pkey_index = qp_attr->pkey_index;
+	else if (qp_pps)
+		new_pps->main.pkey_index = qp_pps->main.pkey_index;
+
 	if ((qp_attr_mask & IB_QP_PKEY_INDEX) && (qp_attr_mask & IB_QP_PORT))
 		new_pps->main.state = IB_PORT_PKEY_VALID;
* Re: a bug(BUG: kernel NULL pointer dereference) of ib or mlx happened in 5.4.21 but not in 5.4.20
2020-02-25 13:05 ` Maor Gottlieb
@ 2020-02-26 0:44 ` Wang Yugui
0 siblings, 0 replies; 5+ messages in thread
From: Wang Yugui @ 2020-02-26 0:44 UTC
To: Maor Gottlieb; +Cc: Leon Romanovsky, Chuck Lever, linux-rdma
[-- Attachment #1: Type: text/plain, Size: 9562 bytes --]
Hi, Maor, Leon
Kernel 5.4.21 plus the two patches now boots successfully
without the NULL pointer problem, and the nfs4/rdma mount succeeds too.
#RDMA-core-Fix-use-of-logical-OR-in-get_new_pps.patch
#RDMA-core-fix-null.patch (the patch from Maor saved as git-am format)
My MCX354A has 2 ports: port 1 is set to InfiniBand and port 2 is set
to Ethernet.
# ibstat
CA 'mlx4_0'
	CA type: MT4099
	Number of ports: 2
	Firmware version: 2.42.5000
	Hardware version: 1
	Node GUID: 0xe41d2d03007b4080
	System image GUID: 0xe41d2d03007b4083
	Port 1:
		State: Down
		Physical state: Polling
		Rate: 10
		Base lid: 0
		LMC: 0
		SM lid: 0
		Capability mask: 0x02594868
		Port GUID: 0xe41d2d03007b4081
		Link layer: InfiniBand
	Port 2:
		State: Down
		Physical state: Disabled
		Rate: 40
		Base lid: 0
		LMC: 0
		SM lid: 0
		Capability mask: 0x00010000
		Port GUID: 0xe61d2dfffe7b4082
		Link layer: Ethernet
# mlxup
Querying Mellanox devices firmware ...
Device #1:
----------
  Device Type:      ConnectX3
  Part Number:      01T7NW
  Description:      ConnectX-3 VPI adapter; dual-port QSFP; FDR IB (56Gb/s) and 40GbE;PCIe3.0 x8 8GT/s; Dell PowerEdge
  PSID:             DEL1090001019
  PCI Device Name:  0000:84:00.0
  Port1 GUID:       e41d2d03007b4081
  Port2 MAC:        e41d2d7b4082
  Versions:         Current        Available
     FW             2.42.5000      N/A
     PXE            3.4.0752       N/A
  Status:           No matching image found
My server is a Dell PowerEdge T630 with some other NICs.
# rxe_cfg
  Name        Link  Driver   Speed   NMTU  IPv4_addr      RDEV  RMTU
  em1         yes   igb              1500  192.168.2.63
  em2         no    igb              1500
  p1p1        no    bnx2x    10GigE  9000  10.0.0.63
  p1p2        no    bnx2x    10GigE  9000  10.0.1.63
  p6p2        no    mlx4_en          9000  10.40.1.63
  virbr0      no    bridge           1500  192.168.122.1
  virbr0-nic  no    tun              1500
Best Regards
王玉贵
2020/02/26
--------------------------------------
北京京垓科技有限公司
王玉贵 wangyugui@e16-tech.com
Tel: +86-136-71123776
[-- Attachment #2: RDMA-core-fix-null.patch --]
[-- Type: application/octet-stream, Size: 1218 bytes --]
From d4078b7c5e9782b2ca3d6c6035f4abb995c4dab7 Mon Sep 17 00:00:00 2001
From: maorg@mellanox.com
Date: Wed, 26 Feb 2020 07:58:29 +0800
Subject: [PATCH] RDMA-core-fix-NULL
---
drivers/infiniband/core/security.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/drivers/infiniband/core/security.c b/drivers/infiniband/core/security.c
index 2b4d803..9e27ca1 100644
--- a/drivers/infiniband/core/security.c
+++ b/drivers/infiniband/core/security.c
@@ -340,11 +340,15 @@ static struct ib_ports_pkeys *get_new_pps(const struct ib_qp *qp,
 		return NULL;
 	if (qp_attr_mask & IB_QP_PORT)
-		new_pps->main.port_num =
-			(qp_pps) ? qp_pps->main.port_num : qp_attr->port_num;
+		new_pps->main.port_num = qp_attr->port_num;
+	else if (qp_pps)
+		new_pps->main.port_num = qp_pps->main.port_num;
+
 	if (qp_attr_mask & IB_QP_PKEY_INDEX)
-		new_pps->main.pkey_index = (qp_pps) ? qp_pps->main.pkey_index :
-						      qp_attr->pkey_index;
+		new_pps->main.pkey_index = qp_attr->pkey_index;
+	else if (qp_pps)
+		new_pps->main.pkey_index = qp_pps->main.pkey_index;
+
 	if ((qp_attr_mask & IB_QP_PKEY_INDEX) && (qp_attr_mask & IB_QP_PORT))
 		new_pps->main.state = IB_PORT_PKEY_VALID;
--
2.24.1
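For readers following the thread, the behavioural change in the RDMA-core-fix-null patch can be sketched outside the kernel. This is a minimal standalone illustration under stated assumptions, not kernel code: struct demo_pps, DEMO_QP_PORT, port_before(), port_after() and the "initial" parameter are hypothetical names, and only the port_num lines touched by the diff are modelled (pkey_index follows the same pattern). It makes no claim about which input actually triggered the reported BUG.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_QP_PORT (1 << 5)	/* illustrative flag value (assumption) */

/* Hypothetical stand-in for the QP's previously stored settings. */
struct demo_pps {
	int port_num;
};

/* Selection as written on the '-' lines of the diff: when the port bit
 * is set, the old value wins whenever old settings exist. */
int port_before(int mask, const struct demo_pps *old, int requested,
		int initial)
{
	int port = initial;

	if (mask & DEMO_QP_PORT)
		port = old ? old->port_num : requested;
	return port;
}

/* Selection as written on the '+' lines: the requested value wins when
 * the port bit is set, otherwise the old value is carried over. */
int port_after(int mask, const struct demo_pps *old, int requested,
	       int initial)
{
	int port = initial;

	if (mask & DEMO_QP_PORT)
		port = requested;
	else if (old)
		port = old->port_num;
	return port;
}

/* Small self-check that can be called from a main() function. */
void demo_port_self_check(void)
{
	struct demo_pps old = { 0 };

	/* Caller asks for port 1 while old settings hold port 0. */
	assert(port_before(DEMO_QP_PORT, &old, 1, 0) == 0);
	assert(port_after(DEMO_QP_PORT, &old, 1, 0) == 1);
}
```

In this sketch the two versions agree only when there are no old settings; once old settings exist, the pre-patch selection ignores the port the caller asked for.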
[-- Attachment #3: RDMA-core-Fix-use-of-logical-OR-in-get_new_pps.patch --]
[-- Type: application/octet-stream, Size: 5511 bytes --]
From patchwork Mon Feb 17 20:43:18 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nathan Chancellor <natechancellor@gmail.com>
X-Patchwork-Id: 11387567
X-Patchwork-Delegate: jgg@ziepe.ca
From: Nathan Chancellor <natechancellor@gmail.com>
To: Doug Ledford <dledford@redhat.com>, Jason Gunthorpe <jgg@ziepe.ca>
Cc: Leon Romanovsky <leon@kernel.org>, linux-rdma@vger.kernel.org,
linux-kernel@vger.kernel.org, clang-built-linux@googlegroups.com,
Nathan Chancellor <natechancellor@gmail.com>
Subject: [PATCH] RDMA/core: Fix use of logical OR in get_new_pps
Date: Mon, 17 Feb 2020 13:43:18 -0700
Message-Id: <20200217204318.13609-1-natechancellor@gmail.com>
Clang warns:
../drivers/infiniband/core/security.c:351:41: warning: converting the
enum constant to a boolean [-Wint-in-bool-context]
        if (!(qp_attr_mask & (IB_QP_PKEY_INDEX || IB_QP_PORT)) && qp_pps) {
                                               ^
1 warning generated.
A bitwise OR should have been used instead.
Fixes: 1dd017882e01 ("RDMA/core: Fix protection fault in get_pkey_idx_qp_list")
Link: https://github.com/ClangBuiltLinux/linux/issues/889
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
---
drivers/infiniband/core/security.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/infiniband/core/security.c b/drivers/infiniband/core/security.c
index 2b4d80393bd0..b9a36ea244d4 100644
--- a/drivers/infiniband/core/security.c
+++ b/drivers/infiniband/core/security.c
@@ -348,7 +348,7 @@ static struct ib_ports_pkeys *get_new_pps(const struct ib_qp *qp,
 	if ((qp_attr_mask & IB_QP_PKEY_INDEX) && (qp_attr_mask & IB_QP_PORT))
 		new_pps->main.state = IB_PORT_PKEY_VALID;
-	if (!(qp_attr_mask & (IB_QP_PKEY_INDEX || IB_QP_PORT)) && qp_pps) {
+	if (!(qp_attr_mask & (IB_QP_PKEY_INDEX | IB_QP_PORT)) && qp_pps) {
 		new_pps->main.port_num = qp_pps->main.port_num;
 		new_pps->main.pkey_index = qp_pps->main.pkey_index;
 		if (qp_pps->main.state != IB_PORT_PKEY_NOT_VALID)
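Why the one-character change in the patch above matters can be shown with a small standalone sketch. The flag names and bit positions here are assumptions chosen for illustration (the kernel's real flags are in enum ib_qp_attr_mask), and the helper functions are hypothetical. The flags are passed as function arguments so that the logical OR is evaluated on plain integers.

```c
#include <assert.h>

/* Illustrative bit flags; names and values are assumptions for this
 * sketch, not copied from the kernel headers. */
enum demo_attr_mask {
	DEMO_STATE      = 1 << 0,
	DEMO_PKEY_INDEX = 1 << 4,
	DEMO_PORT       = 1 << 5,
};

/* What the buggy expression computes: (a || b) is a boolean, so any two
 * non-zero flags collapse to the value 1. */
int combine_logical(int a, int b)
{
	return a || b;
}

/* What was intended: (a | b) keeps both flag bits. */
int combine_bitwise(int a, int b)
{
	return a | b;
}

/* Returns 1 when none of the bits in `wanted` are set in `mask`. */
int none_set(int mask, int wanted)
{
	return !(mask & wanted);
}

/* Small self-check that can be called from a main() function. */
void demo_or_self_check(void)
{
	int buggy = combine_logical(DEMO_PKEY_INDEX, DEMO_PORT);
	int fixed = combine_bitwise(DEMO_PKEY_INDEX, DEMO_PORT);

	assert(buggy == 1);
	assert(fixed == (DEMO_PKEY_INDEX | DEMO_PORT));

	/* Caller sets only the port bit: the buggy test misses it. */
	assert(none_set(DEMO_PORT, buggy) == 1);
	assert(none_set(DEMO_PORT, fixed) == 0);

	/* Caller sets only bit 0: the buggy test reacts to it instead. */
	assert(none_set(DEMO_STATE, buggy) == 0);
	assert(none_set(DEMO_STATE, fixed) == 1);
}
```

With the logical OR, the condition effectively tests bit 0 of the mask rather than the two intended flags, which is what the Clang warning quoted in the commit message points at.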