* [PATCH for-rc] RDMA/rxe: Fix panic when calling kmem_cache_create()
@ 2020-08-12 11:14 Kamal Heib
2020-08-15 6:58 ` Zhu Yanjun
0 siblings, 1 reply; 11+ messages in thread
From: Kamal Heib @ 2020-08-12 11:14 UTC (permalink / raw)
To: linux-rdma; +Cc: Doug Ledford, Jason Gunthorpe, Zhu Yanjun, Kamal Heib
To avoid the following kernel panic when calling kmem_cache_create()
with a NULL pointer from pool_cache(), move the rxe_cache_init() to the
context of device creation.
BUG: unable to handle kernel NULL pointer dereference at 000000000000000b
PGD 0 P4D 0
Oops: 0000 [#1] SMP NOPTI
CPU: 4 PID: 8512 Comm: modprobe Kdump: loaded Not tainted 4.18.0-231.el8.x86_64 #1
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 10/02/2018
RIP: 0010:kmem_cache_alloc+0xd1/0x1b0
Code: 8b 57 18 45 8b 77 1c 48 8b 5c 24 30 0f 1f 44 00 00 5b 48 89 e8 5d 41 5c 41 5d 41 5e 41 5f c3 81 e3 00 00 10 00 75 0e 4d 89 fe <41> f6 47 0b 04 0f 84 6c ff ff ff 4c 89 ff e8 cc da 01 00 49 89 c6
RSP: 0018:ffffa2b8c773f9d0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000005
RDX: 0000000000000004 RSI: 00000000006080c0 RDI: 0000000000000000
RBP: ffff8ea0a8634fd0 R08: ffffa2b8c773f988 R09: 00000000006000c0
R10: 0000000000000000 R11: 0000000000000230 R12: 00000000006080c0
R13: ffffffffc0a97fc8 R14: 0000000000000000 R15: 0000000000000000
FS: 00007f9138ed9740(0000) GS:ffff8ea4ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000000000b CR3: 000000046d59a000 CR4: 00000000003406e0
Call Trace:
rxe_alloc+0xc8/0x160 [rdma_rxe]
rxe_get_dma_mr+0x25/0xb0 [rdma_rxe]
__ib_alloc_pd+0xcb/0x160 [ib_core]
ib_mad_init_device+0x296/0x8b0 [ib_core]
add_client_context+0x11a/0x160 [ib_core]
enable_device_and_get+0xdc/0x1d0 [ib_core]
ib_register_device+0x572/0x6b0 [ib_core]
? crypto_create_tfm+0x32/0xe0
? crypto_create_tfm+0x7a/0xe0
? crypto_alloc_tfm+0x58/0xf0
rxe_register_device+0x19d/0x1c0 [rdma_rxe]
rxe_net_add+0x3d/0x70 [rdma_rxe]
? dev_get_by_name_rcu+0x73/0x90
rxe_param_set_add+0xaf/0xc0 [rdma_rxe]
parse_args+0x179/0x370
? ref_module+0x1b0/0x1b0
load_module+0x135e/0x17e0
? ref_module+0x1b0/0x1b0
? __do_sys_init_module+0x13b/0x180
__do_sys_init_module+0x13b/0x180
do_syscall_64+0x5b/0x1a0
entry_SYSCALL_64_after_hwframe+0x65/0xca
RIP: 0033:0x7f9137ed296e
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
---
drivers/infiniband/sw/rxe/rxe.c | 14 +++++++-------
drivers/infiniband/sw/rxe/rxe_pool.c | 3 +++
drivers/infiniband/sw/rxe/rxe_sysfs.c | 7 +++++++
3 files changed, 17 insertions(+), 7 deletions(-)
diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index 5642eefb4ba1..60d5086dd34d 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -318,6 +318,13 @@ static int rxe_newlink(const char *ibdev_name, struct net_device *ndev)
goto err;
}
+ /* initialize slab caches for managed objects */
+ err = rxe_cache_init();
+ if (err) {
+ pr_err("unable to init object pools\n");
+ goto err;
+ }
+
err = rxe_net_add(ibdev_name, ndev);
if (err) {
pr_err("failed to add %s\n", ndev->name);
@@ -336,13 +343,6 @@ static int __init rxe_module_init(void)
{
int err;
- /* initialize slab caches for managed objects */
- err = rxe_cache_init();
- if (err) {
- pr_err("unable to init object pools\n");
- return err;
- }
-
err = rxe_net_init();
if (err)
return err;
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index fbcbac52290b..06c6d1f835b7 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -139,6 +139,9 @@ int rxe_cache_init(void)
for (i = 0; i < RXE_NUM_TYPES; i++) {
type = &rxe_type_info[i];
size = ALIGN(type->size, RXE_POOL_ALIGN);
+ if (type->cache)
+ continue;
+
if (!(type->flags & RXE_POOL_NO_ALLOC)) {
type->cache =
kmem_cache_create(type->name, size,
diff --git a/drivers/infiniband/sw/rxe/rxe_sysfs.c b/drivers/infiniband/sw/rxe/rxe_sysfs.c
index ccda5f5a3bc0..d0af48ba0110 100644
--- a/drivers/infiniband/sw/rxe/rxe_sysfs.c
+++ b/drivers/infiniband/sw/rxe/rxe_sysfs.c
@@ -81,6 +81,13 @@ static int rxe_param_set_add(const char *val, const struct kernel_param *kp)
goto err;
}
+ /* initialize slab caches for managed objects */
+ err = rxe_cache_init();
+ if (err) {
+ pr_err("unable to init object pools\n");
+ goto err;
+ }
+
err = rxe_net_add("rxe%d", ndev);
if (err) {
pr_err("failed to add %s\n", intf);
--
2.25.4
^ permalink raw reply related [flat|nested] 11+ messages in thread
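Editor's note, for readers following the thread: per the RIP line, the oops is actually in kmem_cache_alloc(). It is reached from rxe_alloc() with the NULL cache pointer that pool_cache() returns when rxe_cache_init() has not run yet. The patch makes rxe_cache_init() safe to call more than once (the new "if (type->cache) continue;" check) and calls it from both device-creation paths, rxe_newlink() and rxe_param_set_add(). Below is a minimal userspace sketch of that idea, not the driver code. malloc()/calloc() stand in for the slab allocator, and fake_cache, obj_type, type_info, cache_init(), pool_alloc() and add_device() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct kmem_cache: only records the object size. */
struct fake_cache {
	size_t obj_size;
};

/* Hypothetical stand-in for an rxe_type_info[] entry. */
struct obj_type {
	const char *name;
	size_t size;
	struct fake_cache *cache; /* NULL until cache_init() creates it */
};

static struct obj_type type_info[] = {
	{ "pd", 32, NULL },
	{ "mr", 128, NULL },
};

#define NUM_TYPES (sizeof(type_info) / sizeof(type_info[0]))

/* Safe to call repeatedly: types that already have a cache are skipped,
 * like the "if (type->cache) continue;" hunk added to rxe_cache_init(). */
static int cache_init(void)
{
	for (size_t i = 0; i < NUM_TYPES; i++) {
		struct obj_type *type = &type_info[i];

		if (type->cache)
			continue;

		type->cache = malloc(sizeof(*type->cache));
		if (!type->cache)
			return -1;
		type->cache->obj_size = type->size;
	}
	return 0;
}

/* Allocation path. The kernel passes the cache pointer straight to the slab
 * allocator; this model returns NULL instead so the failure is observable. */
static void *pool_alloc(size_t idx)
{
	struct fake_cache *cache;

	assert(idx < NUM_TYPES);
	cache = type_info[idx].cache;
	if (!cache)
		return NULL; /* kmem_cache_alloc() with a NULL cache oopses here */
	return calloc(1, cache->obj_size);
}

/* Shared by both device-creation paths (netlink and the "add" parameter). */
static int add_device(void)
{
	void *mr;

	if (cache_init())
		return -1;
	mr = pool_alloc(1);
	if (!mr)
		return -1;
	free(mr);
	return 0;
}
```

In this model, calling pool_alloc() before any add_device() returns NULL (the point where the kernel oopses). After the first add_device(), a second call leaves the existing caches untouched.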
* Re: [PATCH for-rc] RDMA/rxe: Fix panic when calling kmem_cache_create()
2020-08-12 11:14 [PATCH for-rc] RDMA/rxe: Fix panic when calling kmem_cache_create() Kamal Heib
@ 2020-08-15 6:58 ` Zhu Yanjun
2020-08-16 22:12 ` Kamal Heib
0 siblings, 1 reply; 11+ messages in thread
From: Zhu Yanjun @ 2020-08-15 6:58 UTC (permalink / raw)
To: Kamal Heib, linux-rdma; +Cc: Doug Ledford, Jason Gunthorpe
On 8/12/2020 7:14 PM, Kamal Heib wrote:
> To avoid the following kernel panic when calling kmem_cache_create()
> with a NULL pointer from pool_cache(),
What is the root cause of this kernel panic?
Zhu Yanjun
> move the rxe_cache_init() to the
> context of device creation.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH for-rc] RDMA/rxe: Fix panic when calling kmem_cache_create()
2020-08-15 6:58 ` Zhu Yanjun
@ 2020-08-16 22:12 ` Kamal Heib
2020-08-18 1:48 ` Zhu Yanjun
0 siblings, 1 reply; 11+ messages in thread
From: Kamal Heib @ 2020-08-16 22:12 UTC (permalink / raw)
To: Zhu Yanjun; +Cc: linux-rdma, Doug Ledford, Jason Gunthorpe
On Sat, Aug 15, 2020 at 02:58:45PM +0800, Zhu Yanjun wrote:
> On 8/12/2020 7:14 PM, Kamal Heib wrote:
> > To avoid the following kernel panic when calling kmem_cache_create()
> > with a NULL pointer from pool_cache(),
>
> What is the root cause of this kernel panic?
>
The kernel panic is triggered by the following command, and it happens
because the caches have not been initialized.
modprobe rdma_rxe add=eno1
Thanks,
Kamal
> Zhu Yanjun
>
> > move the rxe_cache_init() to the
> > context of device creation.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH for-rc] RDMA/rxe: Fix panic when calling kmem_cache_create()
2020-08-16 22:12 ` Kamal Heib
@ 2020-08-18 1:48 ` Zhu Yanjun
2020-08-18 5:50 ` Kamal Heib
0 siblings, 1 reply; 11+ messages in thread
From: Zhu Yanjun @ 2020-08-18 1:48 UTC (permalink / raw)
To: Kamal Heib; +Cc: linux-rdma, Doug Ledford, Jason Gunthorpe
On 8/17/2020 6:12 AM, Kamal Heib wrote:
> On Sat, Aug 15, 2020 at 02:58:45PM +0800, Zhu Yanjun wrote:
>> On 8/12/2020 7:14 PM, Kamal Heib wrote:
>>> To avoid the following kernel panic when calling kmem_cache_create()
>>> with a NULL pointer from pool_cache(),
>> What is the root cause of this kernel panic?
>>
> The kernel panic is triggered by the following command, and it happens
> because the caches have not been initialized.
>
> modprobe rdma_rxe add=eno1
>
> Thanks,
> Kamal
>
>> Zhu Yanjun
>>
>>> move the rxe_cache_init() to the
>>> context of device creation.
>>> @@ -336,13 +343,6 @@ static int __init rxe_module_init(void)
>>> {
>>> int err;
>>> - /* initialize slab caches for managed objects */
>>> - err = rxe_cache_init();
When running "modprobe rdma_rxe", rxe_module_init() should be called,
and it should in turn call rxe_cache_init().
Why does the above call trace occur?
Zhu Yanjun
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH for-rc] RDMA/rxe: Fix panic when calling kmem_cache_create()
2020-08-18 1:48 ` Zhu Yanjun
@ 2020-08-18 5:50 ` Kamal Heib
2020-08-18 7:49 ` Leon Romanovsky
0 siblings, 1 reply; 11+ messages in thread
From: Kamal Heib @ 2020-08-18 5:50 UTC (permalink / raw)
To: Zhu Yanjun; +Cc: linux-rdma, Doug Ledford, Jason Gunthorpe
On Tue, Aug 18, 2020 at 09:48:43AM +0800, Zhu Yanjun wrote:
> On 8/17/2020 6:12 AM, Kamal Heib wrote:
> > On Sat, Aug 15, 2020 at 02:58:45PM +0800, Zhu Yanjun wrote:
> > > On 8/12/2020 7:14 PM, Kamal Heib wrote:
> > > > To avoid the following kernel panic when calling kmem_cache_create()
> > > > with a NULL pointer from pool_cache(),
> > > What is the root cause of this kernel panic?
> > >
> > The kernel panic is triggered by the following command, and it happens
> > because the caches have not been initialized.
> >
> > modprobe rdma_rxe add=eno1
> >
> > Thanks,
> > Kamal
> >
> > > Zhu Yanjun
> > >
> > > > move the rxe_cache_init() to the
> > > > context of device creation.
> > > > @@ -336,13 +343,6 @@ static int __init rxe_module_init(void)
> > > > {
> > > > int err;
> > > > - /* initialize slab caches for managed objects */
> > > > - err = rxe_cache_init();
>
> When running "modprobe rdma_rxe", rxe_module_init() should be called,
> and it should in turn call rxe_cache_init().
>
> Why does the above call trace occur?
>
> Zhu Yanjun
>
As you can see in the call trace attached to the commit message, when
running the "modprobe rdma_rxe add=eno1" command, rxe_param_set_add()
is called before rxe_module_init(), i.e. before the caches are
initialized. The oops then happens when rxe_param_set_add() tries to
register the newly allocated rxe device while the caches do not exist.
Thanks,
Kamal
^ permalink raw reply [flat|nested] 11+ messages in thread
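Kamal's explanation depends on the order in which load_module() works: the "add=" parameter callback runs from parse_args() (both are visible in the call trace) before the module's init function. The following is a simplified userspace model of that ordering, an illustration under that assumption and not kernel code. caches_ready, devices_added, module_init_fn(), param_set_add() and load_module_model() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model state: a flag stands in for "slab caches created". */
static bool caches_ready;
static int devices_added;

/* Stand-in for rxe_module_init(): before the patch, the caches were only
 * created here. */
static int module_init_fn(void)
{
	caches_ready = true;
	return 0;
}

/* Stand-in for rxe_param_set_add(): adding a device needs the caches. */
static int param_set_add(const char *ifname)
{
	assert(ifname != NULL);
	if (!caches_ready)
		return -1; /* the kernel instead oopses in kmem_cache_alloc() */
	devices_added++;
	return 0;
}

/* Simplified model of load_module(): module parameters such as "add=eno1"
 * are parsed, and their set callbacks invoked, before the init function. */
static int load_module_model(const char *args)
{
	static const char prefix[] = "add=";
	int err;

	if (args && strncmp(args, prefix, sizeof(prefix) - 1) == 0) {
		err = param_set_add(args + sizeof(prefix) - 1);
		if (err)
			return err; /* in this model a failing callback aborts the load */
	}
	return module_init_fn();
}
```

Loading with "add=eno1" fails in this model because the callback sees no caches. Loading without parameters runs the init function, after which adding a device succeeds.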
* Re: [PATCH for-rc] RDMA/rxe: Fix panic when calling kmem_cache_create()
2020-08-18 5:50 ` Kamal Heib
@ 2020-08-18 7:49 ` Leon Romanovsky
2020-08-18 14:18 ` Kamal Heib
2020-08-19 3:07 ` Zhu Yanjun
0 siblings, 2 replies; 11+ messages in thread
From: Leon Romanovsky @ 2020-08-18 7:49 UTC (permalink / raw)
To: Kamal Heib; +Cc: Zhu Yanjun, linux-rdma, Doug Ledford, Jason Gunthorpe
On Tue, Aug 18, 2020 at 08:50:57AM +0300, Kamal Heib wrote:
> On Tue, Aug 18, 2020 at 09:48:43AM +0800, Zhu Yanjun wrote:
> > On 8/17/2020 6:12 AM, Kamal Heib wrote:
> > > On Sat, Aug 15, 2020 at 02:58:45PM +0800, Zhu Yanjun wrote:
> > > > On 8/12/2020 7:14 PM, Kamal Heib wrote:
> > > > > To avoid the following kernel panic when calling kmem_cache_create()
> > > > > with a NULL pointer from pool_cache(),
> > > > What is the root cause of this kernel panic?
> > > >
> > > The kernel panic is triggered by the following command, and it happens
> > > because the caches have not been initialized.
> > >
> > > modprobe rdma_rxe add=eno1
> > >
> > > Thanks,
> > > Kamal
> > >
> > > > Zhu Yanjun
> > > >
> > > > > move the rxe_cache_init() to the
> > > > > context of device creation.
> > > > > @@ -336,13 +343,6 @@ static int __init rxe_module_init(void)
> > > > > {
> > > > > int err;
> > > > > - /* initialize slab caches for managed objects */
> > > > > - err = rxe_cache_init();
> >
> > When running "modprobe rdma_rxe", rxe_module_init() should be called,
> > and it should in turn call rxe_cache_init().
> >
> > Why does the above call trace occur?
> >
> > Zhu Yanjun
> >
>
> As you can see in the call trace attached to the commit message, when
> running the "modprobe rdma_rxe add=eno1" command, rxe_param_set_add()
> is called before rxe_module_init(), i.e. before the caches are
> initialized. The oops then happens when rxe_param_set_add() tries to
> register the newly allocated rxe device while the caches do not exist.
I would expect the fix to be in rxe_init() instead of adding calls to
rxe_cache_init() in all places.
Thanks
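A minimal userspace model of the ordering Kamal describes may make the failure easier to see. It assumes, as the call trace shows, that load_module() runs the parameter callbacks from parse_args() before the module's init function. All model_* names are hypothetical stand-ins for rxe_cache_init(), rxe_alloc() and rxe_param_set_add(), not the real rxe code. Where the kernel dereferences the NULL cache, the model returns an error instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a slab cache; NULL until init runs. */
struct model_cache {
	size_t obj_size;
};

static struct model_cache *model_mr_cache;

/* Models rxe_cache_init(): create the cache. */
static int model_cache_init(void)
{
	model_mr_cache = malloc(sizeof(*model_mr_cache));
	if (!model_mr_cache)
		return -1;
	model_mr_cache->obj_size = 64;
	return 0;
}

/* Models rxe_alloc(): the real code hands the cache pointer to the
 * slab allocator and oopses when it is NULL; the model reports it. */
static void *model_alloc(void)
{
	if (!model_mr_cache)
		return NULL;
	return calloc(1, model_mr_cache->obj_size);
}

/* Models rxe_param_set_add(): adding a device allocates objects
 * right away (the DMA MR in the trace). */
static int model_param_set_add(void)
{
	void *mr = model_alloc();

	if (!mr)
		return -1;
	free(mr);
	return 0;
}

/* Models load_module(): parameter callbacks (parse_args()) run first.
 * In this model a failing callback aborts the load before init runs. */
static int model_load_module(int add_given)
{
	if (add_given && model_param_set_add())
		return -1;
	return model_cache_init();
}
```

With the add parameter given, the callback runs against a cache that does not exist yet; loading without it and adding the device afterwards succeeds.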
* Re: [PATCH for-rc] RDMA/rxe: Fix panic when calling kmem_cache_create()
2020-08-18 7:49 ` Leon Romanovsky
@ 2020-08-18 14:18 ` Kamal Heib
2020-08-19 3:07 ` Zhu Yanjun
1 sibling, 0 replies; 11+ messages in thread
From: Kamal Heib @ 2020-08-18 14:18 UTC
To: Leon Romanovsky; +Cc: Zhu Yanjun, linux-rdma, Doug Ledford, Jason Gunthorpe
On Tue, Aug 18, 2020 at 10:49:56AM +0300, Leon Romanovsky wrote:
> On Tue, Aug 18, 2020 at 08:50:57AM +0300, Kamal Heib wrote:
> > On Tue, Aug 18, 2020 at 09:48:43AM +0800, Zhu Yanjun wrote:
> > > On 8/17/2020 6:12 AM, Kamal Heib wrote:
> > > > On Sat, Aug 15, 2020 at 02:58:45PM +0800, Zhu Yanjun wrote:
> > > > > On 8/12/2020 7:14 PM, Kamal Heib wrote:
> > > > > > To avoid the following kernel panic when calling kmem_cache_create()
> > > > > > with a NULL pointer from pool_cache(),
> > > > > What is the root cause of this kernel panic?
> > > > >
> > > > The kernel panic is triggered by the following command, and it happens
> > > > because the caches are not initialized.
> > > >
> > > > modprobe rdma_rxe add=eno1
> > > >
> > > > Thanks,
> > > > Kamal
> > > >
> > > > > Zhu Yanjun
> > > > >
> > > > > > move the rxe_cache_init() to the
> > > > > > context of device creation.
> > > > > >
> > > > > > BUG: unable to handle kernel NULL pointer dereference at 000000000000000b
> > > > > > PGD 0 P4D 0
> > > > > > Oops: 0000 [#1] SMP NOPTI
> > > > > > CPU: 4 PID: 8512 Comm: modprobe Kdump: loaded Not tainted 4.18.0-231.el8.x86_64 #1
> > > > > > Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 10/02/2018
> > > > > > RIP: 0010:kmem_cache_alloc+0xd1/0x1b0
> > > > > > Code: 8b 57 18 45 8b 77 1c 48 8b 5c 24 30 0f 1f 44 00 00 5b 48 89 e8 5d 41 5c 41 5d 41 5e 41 5f c3 81 e3 00 00 10 00 75 0e 4d 89 fe <41> f6 47 0b 04 0f 84 6c ff ff ff 4c 89 ff e8 cc da 01 00 49 89 c6
> > > > > > RSP: 0018:ffffa2b8c773f9d0 EFLAGS: 00010246
> > > > > > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000005
> > > > > > RDX: 0000000000000004 RSI: 00000000006080c0 RDI: 0000000000000000
> > > > > > RBP: ffff8ea0a8634fd0 R08: ffffa2b8c773f988 R09: 00000000006000c0
> > > > > > R10: 0000000000000000 R11: 0000000000000230 R12: 00000000006080c0
> > > > > > R13: ffffffffc0a97fc8 R14: 0000000000000000 R15: 0000000000000000
> > > > > > FS: 00007f9138ed9740(0000) GS:ffff8ea4ae800000(0000) knlGS:0000000000000000
> > > > > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > CR2: 000000000000000b CR3: 000000046d59a000 CR4: 00000000003406e0
> > > > > > Call Trace:
> > > > > > rxe_alloc+0xc8/0x160 [rdma_rxe]
> > > > > > rxe_get_dma_mr+0x25/0xb0 [rdma_rxe]
> > > > > > __ib_alloc_pd+0xcb/0x160 [ib_core]
> > > > > > ib_mad_init_device+0x296/0x8b0 [ib_core]
> > > > > > add_client_context+0x11a/0x160 [ib_core]
> > > > > > enable_device_and_get+0xdc/0x1d0 [ib_core]
> > > > > > ib_register_device+0x572/0x6b0 [ib_core]
> > > > > > ? crypto_create_tfm+0x32/0xe0
> > > > > > ? crypto_create_tfm+0x7a/0xe0
> > > > > > ? crypto_alloc_tfm+0x58/0xf0
> > > > > > rxe_register_device+0x19d/0x1c0 [rdma_rxe]
> > > > > > rxe_net_add+0x3d/0x70 [rdma_rxe]
> > > > > > ? dev_get_by_name_rcu+0x73/0x90
> > > > > > rxe_param_set_add+0xaf/0xc0 [rdma_rxe]
> > > > > > parse_args+0x179/0x370
> > > > > > ? ref_module+0x1b0/0x1b0
> > > > > > load_module+0x135e/0x17e0
> > > > > > ? ref_module+0x1b0/0x1b0
> > > > > > ? __do_sys_init_module+0x13b/0x180
> > > > > > __do_sys_init_module+0x13b/0x180
> > > > > > do_syscall_64+0x5b/0x1a0
> > > > > > entry_SYSCALL_64_after_hwframe+0x65/0xca
> > > > > > RIP: 0033:0x7f9137ed296e
> > > > > >
> > > > > > Fixes: 8700e3e7c485 ("Soft RoCE driver")
> > > > > > Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
> > > > > > ---
> > > > > > drivers/infiniband/sw/rxe/rxe.c | 14 +++++++-------
> > > > > > drivers/infiniband/sw/rxe/rxe_pool.c | 3 +++
> > > > > > drivers/infiniband/sw/rxe/rxe_sysfs.c | 7 +++++++
> > > > > > 3 files changed, 17 insertions(+), 7 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
> > > > > > index 5642eefb4ba1..60d5086dd34d 100644
> > > > > > --- a/drivers/infiniband/sw/rxe/rxe.c
> > > > > > +++ b/drivers/infiniband/sw/rxe/rxe.c
> > > > > > @@ -318,6 +318,13 @@ static int rxe_newlink(const char *ibdev_name, struct net_device *ndev)
> > > > > > goto err;
> > > > > > }
> > > > > > + /* initialize slab caches for managed objects */
> > > > > > + err = rxe_cache_init();
> > > > > > + if (err) {
> > > > > > + pr_err("unable to init object pools\n");
> > > > > > + goto err;
> > > > > > + }
> > > > > > +
> > > > > > err = rxe_net_add(ibdev_name, ndev);
> > > > > > if (err) {
> > > > > > pr_err("failed to add %s\n", ndev->name);
> > > > > > @@ -336,13 +343,6 @@ static int __init rxe_module_init(void)
> > > > > > {
> > > > > > int err;
> > > > > > - /* initialize slab caches for managed objects */
> > > > > > - err = rxe_cache_init();
> > >
> > > When running modprobe rdma_rxe, rxe_module_init() should be called, and
> > > rxe_cache_init() should be called from it as well.
> > >
> > > Why does the above call trace occur?
> > >
> > > Zhu Yanjun
> > >
> >
> > As you can see in the call trace attached to the commit message, when
> > running "modprobe rdma_rxe add=eno1", rxe_param_set_add() is called
> > before rxe_module_init() (so the caches are not yet initialized), and
> > the crash occurs when the allocated rxe device is registered from the
> > context of rxe_param_set_add() with uninitialized caches.
>
> I would expect the fix to be in rxe_init() instead of adding calls to
> rxe_cache_init() in all places.
>
> Thanks
OK, I agree.
I'll post v2.
Thanks,
Kamal
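Leon's suggestion (one place that guarantees the caches exist, instead of rxe_cache_init() calls on every add path) could look roughly like the init-once guard below. This is a single-threaded userspace sketch with hypothetical demo_* names. It assumes every add path, whether from the module parameter or from newlink, funnels through a per-device init such as rxe_init(); real kernel code would also need a mutex around the check-and-set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the rxe caches; not the real symbols. */
static void *demo_cache;
static bool demo_caches_ready;
static int demo_cache_init_calls;

/* Models rxe_cache_init(): create the caches. */
static int demo_cache_init(void)
{
	demo_cache_init_calls++;
	demo_cache = malloc(64);
	return demo_cache ? 0 : -1;
}

/* Init-once guard: safe to call from every device-creation path.
 * Single-threaded model only; kernel code would need a mutex (or an
 * equivalent) around this check-and-set. */
static int demo_ensure_caches(void)
{
	int err;

	if (demo_caches_ready)
		return 0;
	err = demo_cache_init();
	if (err)
		return err;
	demo_caches_ready = true;
	return 0;
}

/* Models a per-device init such as rxe_init(), assumed to be the one
 * place both the module-parameter path and the newlink path reach. */
static int demo_dev_init(void)
{
	return demo_ensure_caches();
}
```

Initializing two devices creates the caches only once. In this design the matching teardown can stay in the module exit path.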
* Re: [PATCH for-rc] RDMA/rxe: Fix panic when calling kmem_cache_create()
2020-08-18 7:49 ` Leon Romanovsky
2020-08-18 14:18 ` Kamal Heib
@ 2020-08-19 3:07 ` Zhu Yanjun
2020-08-19 4:58 ` Leon Romanovsky
1 sibling, 1 reply; 11+ messages in thread
From: Zhu Yanjun @ 2020-08-19 3:07 UTC
To: Leon Romanovsky, Kamal Heib; +Cc: linux-rdma, Doug Ledford, Jason Gunthorpe
On 8/18/2020 3:49 PM, Leon Romanovsky wrote:
> On Tue, Aug 18, 2020 at 08:50:57AM +0300, Kamal Heib wrote:
>> On Tue, Aug 18, 2020 at 09:48:43AM +0800, Zhu Yanjun wrote:
>>> On 8/17/2020 6:12 AM, Kamal Heib wrote:
>>>> On Sat, Aug 15, 2020 at 02:58:45PM +0800, Zhu Yanjun wrote:
>>>>> On 8/12/2020 7:14 PM, Kamal Heib wrote:
>>>>>> To avoid the following kernel panic when calling kmem_cache_create()
>>>>>> with a NULL pointer from pool_cache(),
>>>>> What is the root cause of this kernel panic?
>>>>>
>>>> The kernel panic is triggered by the following command, and it happens
>>>> because the caches are not initialized.
>>>>
>>>> modprobe rdma_rxe add=eno1
>>>>
>>>> Thanks,
>>>> Kamal
>>>>
>>>>> Zhu Yanjun
>>>>>
>>>>>> move the rxe_cache_init() to the
>>>>>> context of device creation.
>>>>>> [...]
>>> When running modprobe rdma_rxe, rxe_module_init() should be called, and
>>> rxe_cache_init() should be called from it as well.
>>>
>>> Why does the above call trace occur?
>>>
>>> Zhu Yanjun
>>>
>> As you can see in the call trace attached to the commit message, when
>> running "modprobe rdma_rxe add=eno1", rxe_param_set_add() is called
>> before rxe_module_init() (so the caches are not yet initialized), and
>> the crash occurs when the allocated rxe device is registered from the
>> context of rxe_param_set_add() with uninitialized caches.
> I would expect the fix to be in rxe_init() instead of adding calls to
> rxe_cache_init() in all places.
I agree with you.
Is it possible to have rxe_module_init() called before rxe_param_set_add()?
Thanks
>
> Thanks
* Re: [PATCH for-rc] RDMA/rxe: Fix panic when calling kmem_cache_create()
2020-08-19 3:07 ` Zhu Yanjun
@ 2020-08-19 4:58 ` Leon Romanovsky
2020-08-19 6:19 ` Zhu Yanjun
0 siblings, 1 reply; 11+ messages in thread
From: Leon Romanovsky @ 2020-08-19 4:58 UTC
To: Zhu Yanjun; +Cc: Kamal Heib, linux-rdma, Doug Ledford, Jason Gunthorpe
On Wed, Aug 19, 2020 at 11:07:56AM +0800, Zhu Yanjun wrote:
> On 8/18/2020 3:49 PM, Leon Romanovsky wrote:
> > On Tue, Aug 18, 2020 at 08:50:57AM +0300, Kamal Heib wrote:
> > > On Tue, Aug 18, 2020 at 09:48:43AM +0800, Zhu Yanjun wrote:
> > > > On 8/17/2020 6:12 AM, Kamal Heib wrote:
> > > > > On Sat, Aug 15, 2020 at 02:58:45PM +0800, Zhu Yanjun wrote:
> > > > > > On 8/12/2020 7:14 PM, Kamal Heib wrote:
> > > > > > > To avoid the following kernel panic when calling kmem_cache_create()
> > > > > > > with a NULL pointer from pool_cache(),
> > > > > > What is the root cause of this kernel panic?
> > > > > >
> > > > > The kernel panic is triggered by the following command, and it happens
> > > > > because the caches are not initialized.
> > > > >
> > > > > modprobe rdma_rxe add=eno1
> > > > >
> > > > > Thanks,
> > > > > Kamal
> > > > >
> > > > > > Zhu Yanjun
> > > > > >
> > > > > > > move the rxe_cache_init() to the
> > > > > > > context of device creation.
> > > > > > > [...]
> > > > When running modprobe rdma_rxe, rxe_module_init() should be called, and
> > > > rxe_cache_init() should be called from it as well.
> > > >
> > > > Why does the above call trace occur?
> > > >
> > > > Zhu Yanjun
> > > >
> > > As you can see in the call trace attached to the commit message, when
> > > running "modprobe rdma_rxe add=eno1", rxe_param_set_add() is called
> > > before rxe_module_init() (so the caches are not yet initialized), and
> > > the crash occurs when the allocated rxe device is registered from the
> > > context of rxe_param_set_add() with uninitialized caches.
> > I would expect the fix to be in rxe_init() instead of adding calls to
> > rxe_cache_init() in all places.
>
> I agree with you.
>
> Is it possible to have rxe_module_init() called before rxe_param_set_add()?
The best solution would be to remove the module parameters from RXE.
Thanks
>
> Thanks
>
> >
> > Thanks
>
>
* Re: [PATCH for-rc] RDMA/rxe: Fix panic when calling kmem_cache_create()
2020-08-19 4:58 ` Leon Romanovsky
@ 2020-08-19 6:19 ` Zhu Yanjun
2020-08-19 7:20 ` Leon Romanovsky
0 siblings, 1 reply; 11+ messages in thread
From: Zhu Yanjun @ 2020-08-19 6:19 UTC
To: Leon Romanovsky; +Cc: Kamal Heib, linux-rdma, Doug Ledford, Jason Gunthorpe
On Wed, Aug 19, 2020 at 12:58 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Wed, Aug 19, 2020 at 11:07:56AM +0800, Zhu Yanjun wrote:
> > On 8/18/2020 3:49 PM, Leon Romanovsky wrote:
> > > On Tue, Aug 18, 2020 at 08:50:57AM +0300, Kamal Heib wrote:
> > > > On Tue, Aug 18, 2020 at 09:48:43AM +0800, Zhu Yanjun wrote:
> > > > > On 8/17/2020 6:12 AM, Kamal Heib wrote:
> > > > > > On Sat, Aug 15, 2020 at 02:58:45PM +0800, Zhu Yanjun wrote:
> > > > > > > On 8/12/2020 7:14 PM, Kamal Heib wrote:
> > > > > > > > To avoid the following kernel panic when calling kmem_cache_create()
> > > > > > > > with a NULL pointer from pool_cache(),
> > > > > > > What is the root cause of this kernel panic?
> > > > > > >
> > > > > > The kernel panic is triggered by the following command, and it happens
> > > > > > because the caches are not initialized.
> > > > > >
> > > > > > modprobe rdma_rxe add=eno1
> > > > > >
> > > > > > Thanks,
> > > > > > Kamal
> > > > > >
> > > > > > > Zhu Yanjun
> > > > > > >
> > > > > > > > move the rxe_cache_init() to the
> > > > > > > > context of device creation.
> > > > > > > > [...]
> > > > > When running modprobe rdma_rxe, rxe_module_init() should be called, and
> > > > > rxe_cache_init() should be called from it as well.
> > > > >
> > > > > Why does the above call trace occur?
> > > > >
> > > > > Zhu Yanjun
> > > > >
> > > > As you can see in the call trace attached to the commit message, when
> > > > running "modprobe rdma_rxe add=eno1", rxe_param_set_add() is called
> > > > before rxe_module_init() (so the caches are not yet initialized), and
> > > > the crash occurs when the allocated rxe device is registered from the
> > > > context of rxe_param_set_add() with uninitialized caches.
> > > I would expect the fix to be in rxe_init() instead of adding calls to
> > > rxe_cache_init() in all places.
> >
> > I agree with you.
> >
> > Is it possible to have rxe_module_init() called before rxe_param_set_add()?
>
> The best solution would be to remove the module parameters from RXE.
Sure. I am curious why the parameters are set before rxe_module_init() runs.
Is this a bug?
Zhu Yanjun
>
> Thanks
>
> >
> > Thanks
> >
> > >
> > > Thanks
> >
> >
* Re: [PATCH for-rc] RDMA/rxe: Fix panic when calling kmem_cache_create()
2020-08-19 6:19 ` Zhu Yanjun
@ 2020-08-19 7:20 ` Leon Romanovsky
0 siblings, 0 replies; 11+ messages in thread
From: Leon Romanovsky @ 2020-08-19 7:20 UTC
To: Zhu Yanjun; +Cc: Kamal Heib, linux-rdma, Doug Ledford, Jason Gunthorpe
On Wed, Aug 19, 2020 at 02:19:20PM +0800, Zhu Yanjun wrote:
> On Wed, Aug 19, 2020 at 12:58 PM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Wed, Aug 19, 2020 at 11:07:56AM +0800, Zhu Yanjun wrote:
> > > On 8/18/2020 3:49 PM, Leon Romanovsky wrote:
> > > > On Tue, Aug 18, 2020 at 08:50:57AM +0300, Kamal Heib wrote:
> > > > > On Tue, Aug 18, 2020 at 09:48:43AM +0800, Zhu Yanjun wrote:
> > > > > > On 8/17/2020 6:12 AM, Kamal Heib wrote:
> > > > > > > On Sat, Aug 15, 2020 at 02:58:45PM +0800, Zhu Yanjun wrote:
> > > > > > > > On 8/12/2020 7:14 PM, Kamal Heib wrote:
> > > > > > > > > To avoid the following kernel panic when calling kmem_cache_create()
> > > > > > > > > with a NULL pointer from pool_cache(),
> > > > > > > > What is the root cause of this kernel panic?
> > > > > > > >
> > > > > > > The kernel panic is triggered by the following command, and it happens
> > > > > > > because the caches are not initialized.
> > > > > > >
> > > > > > > modprobe rdma_rxe add=eno1
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Kamal
> > > > > > >
> > > > > > > > Zhu Yanjun
> > > > > > > >
> > > > > > > > > move the rxe_cache_init() to the
> > > > > > > > > context of device creation.
> > > > > > > > > [...]
> > > > > > When running modprobe rdma_rxe, rxe_module_init() should be called, and
> > > > > > rxe_cache_init() should be called from it as well.
> > > > > >
> > > > > > Why does the above call trace occur?
> > > > > >
> > > > > > Zhu Yanjun
> > > > > >
> > > > > As you can see in the call trace attached to the commit message, when
> > > > > running "modprobe rdma_rxe add=eno1", rxe_param_set_add() is called
> > > > > before rxe_module_init() (so the caches are not yet initialized), and
> > > > > the crash occurs when the allocated rxe device is registered from the
> > > > > context of rxe_param_set_add() with uninitialized caches.
> > > > I would expect the fix to be in rxe_init() instead of adding calls to
> > > > rxe_cache_init() in all places.
> > >
> > > I agree with you.
> > >
> > > Is it possible to have rxe_module_init() called before rxe_param_set_add()?
> >
> > The best solution would be to remove the module parameters from RXE.
>
> Sure. I am curious why the parameters are set before rxe_module_init() runs.
> Is this a bug?
Yes and no.
Receiving the user input before rxe_module_init() is correct: it lets
RXE initialize properly based on that input.
The later call to rxe_net_add() inside rxe_param_set_add() is the wrong
part; it should happen only after rxe_module_init() finishes.
Thanks
>
> Zhu Yanjun
> >
> > Thanks
> >
> > >
> > > Thanks
> > >
> > > >
> > > > Thanks
> > >
> > >
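Leon's last point (the parameter callback should only record the input, and rxe_net_add() should run after rxe_module_init() finishes) can be sketched as a deferred-add queue. The userspace model below is a sketch under assumptions: caches_ready, pending, dev_add(), param_set_add() and module_init_model() are hypothetical, and the "add now if init already ran" branch assumes the parameter can still be written after load, for example through sysfs.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_PENDING 8
#define IFNAME_LEN 16

/* Hypothetical model state; not the real rxe symbols. */
static bool caches_ready;
static char pending[MAX_PENDING][IFNAME_LEN];
static int n_pending;
static int n_added;

/* Stands in for rxe_net_add(): only legal once the caches exist. */
static int dev_add(const char *ifname)
{
	if (!caches_ready || ifname[0] == '\0')
		return -1;
	n_added++;
	return 0;
}

/* Parameter callback: record the input; add right away only if
 * module init has already run (parameter written after load). */
static int param_set_add(const char *ifname)
{
	if (caches_ready)
		return dev_add(ifname);
	if (n_pending == MAX_PENDING || strlen(ifname) >= IFNAME_LEN)
		return -1;
	strcpy(pending[n_pending++], ifname);
	return 0;
}

/* Module init: create the caches first, then replay deferred adds. */
static int module_init_model(void)
{
	int i, err;

	caches_ready = true; /* stands in for rxe_cache_init() */
	for (i = 0; i < n_pending; i++) {
		err = dev_add(pending[i]);
		if (err)
			return err;
	}
	n_pending = 0;
	return 0;
}
```

With "add=eno1" on the modprobe line the callback only queues the name; the device is created when init replays the queue, after the caches exist.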
Thread overview: 11+ messages
2020-08-12 11:14 [PATCH for-rc] RDMA/rxe: Fix panic when calling kmem_cache_create() Kamal Heib
2020-08-15 6:58 ` Zhu Yanjun
2020-08-16 22:12 ` Kamal Heib
2020-08-18 1:48 ` Zhu Yanjun
2020-08-18 5:50 ` Kamal Heib
2020-08-18 7:49 ` Leon Romanovsky
2020-08-18 14:18 ` Kamal Heib
2020-08-19 3:07 ` Zhu Yanjun
2020-08-19 4:58 ` Leon Romanovsky
2020-08-19 6:19 ` Zhu Yanjun
2020-08-19 7:20 ` Leon Romanovsky