From: Zhu Yanjun <zyjzyj2000@gmail.com>
To: Leon Romanovsky <leon@kernel.org>, Kamal Heib <kamalheib1@gmail.com>
Cc: linux-rdma@vger.kernel.org, Doug Ledford <dledford@redhat.com>,
	Jason Gunthorpe <jgg@ziepe.ca>
Subject: Re: [PATCH for-rc] RDMA/rxe: Fix panic when calling kmem_cache_create()
Date: Wed, 19 Aug 2020 11:07:56 +0800
Message-ID: <31678b13-4db1-d454-a85c-1ba5c8029c41@gmail.com>
In-Reply-To: <20200818074956.GM7555@unreal>

On 8/18/2020 3:49 PM, Leon Romanovsky wrote:
> On Tue, Aug 18, 2020 at 08:50:57AM +0300, Kamal Heib wrote:
>> On Tue, Aug 18, 2020 at 09:48:43AM +0800, Zhu Yanjun wrote:
>>> On 8/17/2020 6:12 AM, Kamal Heib wrote:
>>>> On Sat, Aug 15, 2020 at 02:58:45PM +0800, Zhu Yanjun wrote:
>>>>> On 8/12/2020 7:14 PM, Kamal Heib wrote:
>>>>>> To avoid the following kernel panic when calling kmem_cache_create()
>>>>>> with a NULL pointer from pool_cache(),
>>>>> What is the root cause of this kernel panic?
>>>>>
>>>> The kernel panic is triggered by the following command, and it happens
>>>> because the cache is not initialized.
>>>>
>>>> modprobe rdma_rxe add=eno1
>>>>
>>>> Thanks,
>>>> Kamal
>>>>
>>>>> Zhu Yanjun
>>>>>
>>>>>>     move the rxe_cache_init() to the
>>>>>> context of device creation.
>>>>>>
>>>>>>     BUG: unable to handle kernel NULL pointer dereference at 000000000000000b
>>>>>>     PGD 0 P4D 0
>>>>>>     Oops: 0000 [#1] SMP NOPTI
>>>>>>     CPU: 4 PID: 8512 Comm: modprobe Kdump: loaded Not tainted 4.18.0-231.el8.x86_64 #1
>>>>>>     Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 10/02/2018
>>>>>>     RIP: 0010:kmem_cache_alloc+0xd1/0x1b0
>>>>>>     Code: 8b 57 18 45 8b 77 1c 48 8b 5c 24 30 0f 1f 44 00 00 5b 48 89 e8 5d 41 5c 41 5d 41 5e 41 5f c3 81 e3 00 00 10 00 75 0e 4d 89 fe <41> f6 47 0b 04 0f 84 6c ff ff ff 4c 89 ff e8 cc da 01 00 49 89 c6
>>>>>>     RSP: 0018:ffffa2b8c773f9d0 EFLAGS: 00010246
>>>>>>     RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000005
>>>>>>     RDX: 0000000000000004 RSI: 00000000006080c0 RDI: 0000000000000000
>>>>>>     RBP: ffff8ea0a8634fd0 R08: ffffa2b8c773f988 R09: 00000000006000c0
>>>>>>     R10: 0000000000000000 R11: 0000000000000230 R12: 00000000006080c0
>>>>>>     R13: ffffffffc0a97fc8 R14: 0000000000000000 R15: 0000000000000000
>>>>>>     FS:  00007f9138ed9740(0000) GS:ffff8ea4ae800000(0000) knlGS:0000000000000000
>>>>>>     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>     CR2: 000000000000000b CR3: 000000046d59a000 CR4: 00000000003406e0
>>>>>>     Call Trace:
>>>>>>      rxe_alloc+0xc8/0x160 [rdma_rxe]
>>>>>>      rxe_get_dma_mr+0x25/0xb0 [rdma_rxe]
>>>>>>      __ib_alloc_pd+0xcb/0x160 [ib_core]
>>>>>>      ib_mad_init_device+0x296/0x8b0 [ib_core]
>>>>>>      add_client_context+0x11a/0x160 [ib_core]
>>>>>>      enable_device_and_get+0xdc/0x1d0 [ib_core]
>>>>>>      ib_register_device+0x572/0x6b0 [ib_core]
>>>>>>      ? crypto_create_tfm+0x32/0xe0
>>>>>>      ? crypto_create_tfm+0x7a/0xe0
>>>>>>      ? crypto_alloc_tfm+0x58/0xf0
>>>>>>      rxe_register_device+0x19d/0x1c0 [rdma_rxe]
>>>>>>      rxe_net_add+0x3d/0x70 [rdma_rxe]
>>>>>>      ? dev_get_by_name_rcu+0x73/0x90
>>>>>>      rxe_param_set_add+0xaf/0xc0 [rdma_rxe]
>>>>>>      parse_args+0x179/0x370
>>>>>>      ? ref_module+0x1b0/0x1b0
>>>>>>      load_module+0x135e/0x17e0
>>>>>>      ? ref_module+0x1b0/0x1b0
>>>>>>      ? __do_sys_init_module+0x13b/0x180
>>>>>>      __do_sys_init_module+0x13b/0x180
>>>>>>      do_syscall_64+0x5b/0x1a0
>>>>>>      entry_SYSCALL_64_after_hwframe+0x65/0xca
>>>>>>     RIP: 0033:0x7f9137ed296e
>>>>>>
>>>>>> Fixes: 8700e3e7c485 ("Soft RoCE driver")
>>>>>> Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
>>>>>> ---
>>>>>>     drivers/infiniband/sw/rxe/rxe.c       | 14 +++++++-------
>>>>>>     drivers/infiniband/sw/rxe/rxe_pool.c  |  3 +++
>>>>>>     drivers/infiniband/sw/rxe/rxe_sysfs.c |  7 +++++++
>>>>>>     3 files changed, 17 insertions(+), 7 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
>>>>>> index 5642eefb4ba1..60d5086dd34d 100644
>>>>>> --- a/drivers/infiniband/sw/rxe/rxe.c
>>>>>> +++ b/drivers/infiniband/sw/rxe/rxe.c
>>>>>> @@ -318,6 +318,13 @@ static int rxe_newlink(const char *ibdev_name, struct net_device *ndev)
>>>>>>     		goto err;
>>>>>>     	}
>>>>>> +	/* initialize slab caches for managed objects */
>>>>>> +	err = rxe_cache_init();
>>>>>> +	if (err) {
>>>>>> +		pr_err("unable to init object pools\n");
>>>>>> +		goto err;
>>>>>> +	}
>>>>>> +
>>>>>>     	err = rxe_net_add(ibdev_name, ndev);
>>>>>>     	if (err) {
>>>>>>     		pr_err("failed to add %s\n", ndev->name);
>>>>>> @@ -336,13 +343,6 @@ static int __init rxe_module_init(void)
>>>>>>     {
>>>>>>     	int err;
>>>>>> -	/* initialize slab caches for managed objects */
>>>>>> -	err = rxe_cache_init();
>>> When rdma_rxe is loaded with modprobe, rxe_module_init should be called,
>>> and rxe_cache_init should then be called as well.
>>>
>>> Why does the above call trace occur?
>>>
>>> Zhu Yanjun
>>>
>> As you can see in the call trace attached to the commit message, when
>> running the "modprobe rdma_rxe add=eno1" command, rxe_param_set_add()
>> is called before rxe_module_init(), so the caches are not yet
>> initialized. The call trace occurs when rxe_param_set_add() tries to
>> register the allocated rxe device with the caches still uninitialized.
> I would expect the fix to be in rxe_init() instead of adding calls to
> rxe_cache_init() in all places.

I agree with you.

Is it possible to make rxe_module_init be called before rxe_param_set_add?
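
To make the ordering concrete, here is a minimal userspace sketch (plain C,
not kernel code) of the sequence in the call trace: the parameter handler
runs first, the module init runs second. Every name in it is a hypothetical
stand-in, not the driver's API. It shows one possible guard, a single
idempotent init that either path may call; it is not necessarily the fix
this driver should take.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state that rxe_cache_init() sets up. */
static bool caches_ready;
static int cache_init_calls;

/* Idempotent init: safe to call from any path, does the work only once. */
static int cache_init_once(void)
{
	if (caches_ready)
		return 0;
	cache_init_calls++;
	caches_ready = true;
	return 0;
}

/*
 * Stand-in for an allocation from the cache. It returns an error when the
 * cache is missing instead of dereferencing a NULL pointer.
 */
static int object_alloc(void)
{
	return caches_ready ? 0 : -1;
}

/* Stand-in for a parameter handler that runs before the module init. */
static int param_set_add(const char *ifname)
{
	int err;

	(void)ifname;	/* unused in this model */
	err = cache_init_once();
	if (err)
		return err;
	return object_alloc();
}

/* Stand-in for the module init function. */
static int module_init_fn(void)
{
	return cache_init_once();
}

/* Replays the order from the call trace: parameter handler, then init. */
static void replay_modprobe_order(void)
{
	assert(param_set_add("eno1") == 0);
	assert(module_init_fn() == 0);
	assert(cache_init_calls == 1);	/* initialized exactly once */
}
```

Calling replay_modprobe_order() from a main() exercises the sequence. In
this model the allocation fails cleanly if nothing has initialized the
cache, and succeeds regardless of which entry point runs first.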

Thanks

>
> Thanks



Thread overview: 11+ messages
2020-08-12 11:14 [PATCH for-rc] RDMA/rxe: Fix panic when calling kmem_cache_create() Kamal Heib
2020-08-15  6:58 ` Zhu Yanjun
2020-08-16 22:12   ` Kamal Heib
2020-08-18  1:48     ` Zhu Yanjun
2020-08-18  5:50       ` Kamal Heib
2020-08-18  7:49         ` Leon Romanovsky
2020-08-18 14:18           ` Kamal Heib
2020-08-19  3:07           ` Zhu Yanjun [this message]
2020-08-19  4:58             ` Leon Romanovsky
2020-08-19  6:19               ` Zhu Yanjun
2020-08-19  7:20                 ` Leon Romanovsky
