linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [syzbot] BUG: RESTRACK detected leak of resources
@ 2021-10-04 10:45 syzbot
  2021-10-04 12:42 ` Dmitry Vyukov
  0 siblings, 1 reply; 8+ messages in thread
From: syzbot @ 2021-10-04 10:45 UTC (permalink / raw)
  To: linux-kernel, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    c7b4d0e56a1d Add linux-next specific files for 20210930
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=104be6cb300000
kernel config:  https://syzkaller.appspot.com/x/.config?x=c9a1f6685aeb48bd
dashboard link: https://syzkaller.appspot.com/bug?extid=3a992c9e4fd9f0e6fd0e
compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+3a992c9e4fd9f0e6fd0e@syzkaller.appspotmail.com

rdma_rxe: rxe-pd pool destroyed with unfree'd elem
rdma_rxe: rxe-mr pool destroyed with unfree'd elem
restrack: ------------[ cut here ]------------
infiniband syz0: BUG: RESTRACK detected leak of resources
restrack: Kernel PD object allocated by rds_rdma is not freed
restrack: ------------[ cut here ]------------


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [syzbot] BUG: RESTRACK detected leak of resources
  2021-10-04 10:45 [syzbot] BUG: RESTRACK detected leak of resources syzbot
@ 2021-10-04 12:42 ` Dmitry Vyukov
  2021-10-04 13:15   ` Jason Gunthorpe
  0 siblings, 1 reply; 8+ messages in thread
From: Dmitry Vyukov @ 2021-10-04 12:42 UTC (permalink / raw)
  To: Doug Ledford, syzbot, Jason Gunthorpe, Leon Romanovsky,
	RDMA mailing list
  Cc: linux-kernel, syzkaller-bugs

On Mon, 4 Oct 2021 at 12:45, syzbot
<syzbot+3a992c9e4fd9f0e6fd0e@syzkaller.appspotmail.com> wrote:
>
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit:    c7b4d0e56a1d Add linux-next specific files for 20210930
> git tree:       linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=104be6cb300000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=c9a1f6685aeb48bd
> dashboard link: https://syzkaller.appspot.com/bug?extid=3a992c9e4fd9f0e6fd0e
> compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+3a992c9e4fd9f0e6fd0e@syzkaller.appspotmail.com

+RESTRACK maintainers

(it would also be good if RESTRACK would print a more standard oops
with stack/filenames, so that testing systems can attribute issues to
files/maintainers).

> rdma_rxe: rxe-pd pool destroyed with unfree'd elem
> rdma_rxe: rxe-mr pool destroyed with unfree'd elem
> restrack: ------------[ cut here ]------------
> infiniband syz0: BUG: RESTRACK detected leak of resources
> restrack: Kernel PD object allocated by rds_rdma is not freed
> restrack: ------------[ cut here ]------------
>
>
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
>
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [syzbot] BUG: RESTRACK detected leak of resources
  2021-10-04 12:42 ` Dmitry Vyukov
@ 2021-10-04 13:15   ` Jason Gunthorpe
  2021-10-04 13:22     ` Dmitry Vyukov
  2021-10-06 10:10     ` Leon Romanovsky
  0 siblings, 2 replies; 8+ messages in thread
From: Jason Gunthorpe @ 2021-10-04 13:15 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Doug Ledford, syzbot, Leon Romanovsky, RDMA mailing list,
	linux-kernel, syzkaller-bugs

On Mon, Oct 04, 2021 at 02:42:11PM +0200, Dmitry Vyukov wrote:
> On Mon, 4 Oct 2021 at 12:45, syzbot
> <syzbot+3a992c9e4fd9f0e6fd0e@syzkaller.appspotmail.com> wrote:
> >
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit:    c7b4d0e56a1d Add linux-next specific files for 20210930
> > git tree:       linux-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=104be6cb300000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=c9a1f6685aeb48bd
> > dashboard link: https://syzkaller.appspot.com/bug?extid=3a992c9e4fd9f0e6fd0e
> > compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> >
> > Unfortunately, I don't have any reproducer for this issue yet.
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+3a992c9e4fd9f0e6fd0e@syzkaller.appspotmail.com
> 
> +RESTRACK maintainers
> 
> (it would also be good if RESTRACK would print a more standard oops
> with stack/filenames, so that testing systems can attribute issues to
> files/maintainers).

restrack certainly should trigger a WARN_ON to stop the kernel.. But I
don't know what stack track would be useful here. The culprit is
always the underlying driver, not the core code..

Anyhow, this report is either rxe or rds by the look of it.

Jason

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [syzbot] BUG: RESTRACK detected leak of resources
  2021-10-04 13:15   ` Jason Gunthorpe
@ 2021-10-04 13:22     ` Dmitry Vyukov
  2021-10-04 16:02       ` Haakon Bugge
  2021-10-06 10:10     ` Leon Romanovsky
  1 sibling, 1 reply; 8+ messages in thread
From: Dmitry Vyukov @ 2021-10-04 13:22 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, syzbot, Leon Romanovsky, RDMA mailing list,
	linux-kernel, syzkaller-bugs

On Mon, 4 Oct 2021 at 15:15, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Mon, Oct 04, 2021 at 02:42:11PM +0200, Dmitry Vyukov wrote:
> > On Mon, 4 Oct 2021 at 12:45, syzbot
> > <syzbot+3a992c9e4fd9f0e6fd0e@syzkaller.appspotmail.com> wrote:
> > >
> > > Hello,
> > >
> > > syzbot found the following issue on:
> > >
> > > HEAD commit:    c7b4d0e56a1d Add linux-next specific files for 20210930
> > > git tree:       linux-next
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=104be6cb300000
> > > kernel config:  https://syzkaller.appspot.com/x/.config?x=c9a1f6685aeb48bd
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=3a992c9e4fd9f0e6fd0e
> > > compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> > >
> > > Unfortunately, I don't have any reproducer for this issue yet.
> > >
> > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > Reported-by: syzbot+3a992c9e4fd9f0e6fd0e@syzkaller.appspotmail.com
> >
> > +RESTRACK maintainers
> >
> > (it would also be good if RESTRACK would print a more standard oops
> > with stack/filenames, so that testing systems can attribute issues to
> > files/maintainers).
>
> restrack certainly should trigger a WARN_ON to stop the kernel.. But I
> don't know what stack track would be useful here. The culprit is
> always the underlying driver, not the core code..

There seems to be a significant overlap between
drivers/infiniband/core/restrack.c and drivers/infiniband/sw/rxe/rxe.c
maintainers, so perhaps restrack.c is good enough approximation to
extract relevant people (definitely better then no CC at all :))


> Anyhow, this report is either rxe or rds by the look of it.
>
> Jason

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [syzbot] BUG: RESTRACK detected leak of resources
  2021-10-04 13:22     ` Dmitry Vyukov
@ 2021-10-04 16:02       ` Haakon Bugge
  2021-10-05 13:11         ` Zhu Yanjun
  0 siblings, 1 reply; 8+ messages in thread
From: Haakon Bugge @ 2021-10-04 16:02 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Jason Gunthorpe, Doug Ledford, syzbot, Leon Romanovsky,
	OFED mailing list, LKML, syzkaller-bugs



> On 4 Oct 2021, at 15:22, Dmitry Vyukov <dvyukov@google.com> wrote:
> 
> On Mon, 4 Oct 2021 at 15:15, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>> 
>> On Mon, Oct 04, 2021 at 02:42:11PM +0200, Dmitry Vyukov wrote:
>>> On Mon, 4 Oct 2021 at 12:45, syzbot
>>> <syzbot+3a992c9e4fd9f0e6fd0e@syzkaller.appspotmail.com> wrote:
>>>> 
>>>> Hello,
>>>> 
>>>> syzbot found the following issue on:
>>>> 
>>>> HEAD commit:    c7b4d0e56a1d Add linux-next specific files for 20210930
>>>> git tree:       linux-next
>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=104be6cb300000
>>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=c9a1f6685aeb48bd
>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=3a992c9e4fd9f0e6fd0e
>>>> compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
>>>> 
>>>> Unfortunately, I don't have any reproducer for this issue yet.
>>>> 
>>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>>>> Reported-by: syzbot+3a992c9e4fd9f0e6fd0e@syzkaller.appspotmail.com
>>> 
>>> +RESTRACK maintainers
>>> 
>>> (it would also be good if RESTRACK would print a more standard oops
>>> with stack/filenames, so that testing systems can attribute issues to
>>> files/maintainers).
>> 
>> restrack certainly should trigger a WARN_ON to stop the kernel.. But I
>> don't know what stack track would be useful here. The culprit is
>> always the underlying driver, not the core code..
> 
> There seems to be a significant overlap between
> drivers/infiniband/core/restrack.c and drivers/infiniband/sw/rxe/rxe.c
> maintainers, so perhaps restrack.c is good enough approximation to
> extract relevant people (definitely better then no CC at all :))

Looks to me as this is rxe:

[ 1892.778632][ T8958] BUG: KASAN: use-after-free in __rxe_drop_index_locked+0xb5/0x100
[snip]
[ 1892.822375][ T8958] Call Trace:
[ 1892.825655][ T8958]  <TASK>
[ 1892.828594][ T8958]  dump_stack_lvl+0xcd/0x134
[ 1892.833273][ T8958]  print_address_description.constprop.0.cold+0x6c/0x30c
[ 1892.840316][ T8958]  ? __rxe_drop_index_locked+0xb5/0x100
[ 1892.845864][ T8958]  ? __rxe_drop_index_locked+0xb5/0x100
[ 1892.851424][ T8958]  kasan_report.cold+0x83/0xdf
[ 1892.856200][ T8958]  ? __rxe_drop_index_locked+0xb5/0x100
[ 1892.861761][ T8958]  kasan_check_range+0x13d/0x180
[ 1892.866780][ T8958]  __rxe_drop_index_locked+0xb5/0x100
[ 1892.872164][ T8958]  __rxe_drop_index+0x3f/0x60
[ 1892.876850][ T8958]  rxe_dereg_mr+0x14b/0x240
[ 1892.881381][ T8958]  ib_dealloc_pd_user+0x96/0x230
[ 1892.886566][ T8958]  rds_ib_dev_free+0xd4/0x3a0

So, RDS de-allocs its PD, ib core must first de-register the PD's local MR, calls rxe_dereg_mr(), ...


Thxs, Håkon


> 
>> Anyhow, this report is either rxe or rds by the look of it.
>> 
>> Jason


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [syzbot] BUG: RESTRACK detected leak of resources
  2021-10-04 16:02       ` Haakon Bugge
@ 2021-10-05 13:11         ` Zhu Yanjun
  2021-10-05 23:01           ` Bob Pearson
  0 siblings, 1 reply; 8+ messages in thread
From: Zhu Yanjun @ 2021-10-05 13:11 UTC (permalink / raw)
  To: Haakon Bugge
  Cc: Dmitry Vyukov, Jason Gunthorpe, Doug Ledford, syzbot,
	Leon Romanovsky, OFED mailing list, LKML, syzkaller-bugs

On Tue, Oct 5, 2021 at 1:56 AM Haakon Bugge <haakon.bugge@oracle.com> wrote:
>
>
>
> > On 4 Oct 2021, at 15:22, Dmitry Vyukov <dvyukov@google.com> wrote:
> >
> > On Mon, 4 Oct 2021 at 15:15, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >>
> >> On Mon, Oct 04, 2021 at 02:42:11PM +0200, Dmitry Vyukov wrote:
> >>> On Mon, 4 Oct 2021 at 12:45, syzbot
> >>> <syzbot+3a992c9e4fd9f0e6fd0e@syzkaller.appspotmail.com> wrote:
> >>>>
> >>>> Hello,
> >>>>
> >>>> syzbot found the following issue on:
> >>>>
> >>>> HEAD commit:    c7b4d0e56a1d Add linux-next specific files for 20210930
> >>>> git tree:       linux-next
> >>>> console output: https://syzkaller.appspot.com/x/log.txt?x=104be6cb300000
> >>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=c9a1f6685aeb48bd
> >>>> dashboard link: https://syzkaller.appspot.com/bug?extid=3a992c9e4fd9f0e6fd0e
> >>>> compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> >>>>
> >>>> Unfortunately, I don't have any reproducer for this issue yet.
> >>>>
> >>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> >>>> Reported-by: syzbot+3a992c9e4fd9f0e6fd0e@syzkaller.appspotmail.com
> >>>
> >>> +RESTRACK maintainers
> >>>
> >>> (it would also be good if RESTRACK would print a more standard oops
> >>> with stack/filenames, so that testing systems can attribute issues to
> >>> files/maintainers).
> >>
> >> restrack certainly should trigger a WARN_ON to stop the kernel.. But I
> >> don't know what stack track would be useful here. The culprit is
> >> always the underlying driver, not the core code..
> >
> > There seems to be a significant overlap between
> > drivers/infiniband/core/restrack.c and drivers/infiniband/sw/rxe/rxe.c
> > maintainers, so perhaps restrack.c is good enough approximation to
> > extract relevant people (definitely better then no CC at all :))
>
> Looks to me as this is rxe:
>
> [ 1892.778632][ T8958] BUG: KASAN: use-after-free in __rxe_drop_index_locked+0xb5/0x100
> [snip]
> [ 1892.822375][ T8958] Call Trace:
> [ 1892.825655][ T8958]  <TASK>
> [ 1892.828594][ T8958]  dump_stack_lvl+0xcd/0x134
> [ 1892.833273][ T8958]  print_address_description.constprop.0.cold+0x6c/0x30c
> [ 1892.840316][ T8958]  ? __rxe_drop_index_locked+0xb5/0x100
> [ 1892.845864][ T8958]  ? __rxe_drop_index_locked+0xb5/0x100
> [ 1892.851424][ T8958]  kasan_report.cold+0x83/0xdf
> [ 1892.856200][ T8958]  ? __rxe_drop_index_locked+0xb5/0x100
> [ 1892.861761][ T8958]  kasan_check_range+0x13d/0x180
> [ 1892.866780][ T8958]  __rxe_drop_index_locked+0xb5/0x100
> [ 1892.872164][ T8958]  __rxe_drop_index+0x3f/0x60
> [ 1892.876850][ T8958]  rxe_dereg_mr+0x14b/0x240
> [ 1892.881381][ T8958]  ib_dealloc_pd_user+0x96/0x230
> [ 1892.886566][ T8958]  rds_ib_dev_free+0xd4/0x3a0
>
> So, RDS de-allocs its PD, ib core must first de-register the PD's local MR, calls rxe_dereg_mr(), ...

int rxe_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
{
        struct rxe_mr *mr = to_rmr(ibmr);

        if (atomic_read(&mr->num_mw) > 0) {
                pr_warn("%s: Attempt to deregister an MR while bound to MWs\n",
                        __func__);
                return -EINVAL;
        }

        mr->state = RXE_MR_STATE_ZOMBIE;
        rxe_drop_ref(mr_pd(mr));
        rxe_drop_index(mr);            <-------This is call trace beginning.
        rxe_drop_ref(mr);

        return 0;
}

struct rxe_mr {
        struct rxe_pool_entry   pelem; <-----A ref_cnt in this struct.
        struct ib_mr            ibmr;

        struct ib_umem          *umem;

struct rxe_pool_entry {
        struct rxe_pool         *pool;
        struct kref             ref_cnt;        <-------This ref_cnt may help.
        struct list_head        list;

Zhu Yanjun

>
>
> Thxs, Håkon
>
>
> >
> >> Anyhow, this report is either rxe or rds by the look of it.
> >>
> >> Jason
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [syzbot] BUG: RESTRACK detected leak of resources
  2021-10-05 13:11         ` Zhu Yanjun
@ 2021-10-05 23:01           ` Bob Pearson
  0 siblings, 0 replies; 8+ messages in thread
From: Bob Pearson @ 2021-10-05 23:01 UTC (permalink / raw)
  To: Zhu Yanjun, Haakon Bugge
  Cc: Dmitry Vyukov, Jason Gunthorpe, Doug Ledford, syzbot,
	Leon Romanovsky, OFED mailing list, LKML, syzkaller-bugs

On 10/5/21 8:11 AM, Zhu Yanjun wrote:
> On Tue, Oct 5, 2021 at 1:56 AM Haakon Bugge <haakon.bugge@oracle.com> wrote:
>>
>>
>>
>>> On 4 Oct 2021, at 15:22, Dmitry Vyukov <dvyukov@google.com> wrote:
>>>
>>> On Mon, 4 Oct 2021 at 15:15, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>>>>
>>>> On Mon, Oct 04, 2021 at 02:42:11PM +0200, Dmitry Vyukov wrote:
>>>>> On Mon, 4 Oct 2021 at 12:45, syzbot
>>>>> <syzbot+3a992c9e4fd9f0e6fd0e@syzkaller.appspotmail.com> wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> syzbot found the following issue on:
>>>>>>
>>>>>> HEAD commit:    c7b4d0e56a1d Add linux-next specific files for 20210930
>>>>>> git tree:       linux-next
>>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=104be6cb300000
>>>>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=c9a1f6685aeb48bd
>>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=3a992c9e4fd9f0e6fd0e
>>>>>> compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
>>>>>>
>>>>>> Unfortunately, I don't have any reproducer for this issue yet.
>>>>>>
>>>>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>>>>>> Reported-by: syzbot+3a992c9e4fd9f0e6fd0e@syzkaller.appspotmail.com
>>>>>
>>>>> +RESTRACK maintainers
>>>>>
>>>>> (it would also be good if RESTRACK would print a more standard oops
>>>>> with stack/filenames, so that testing systems can attribute issues to
>>>>> files/maintainers).
>>>>
>>>> restrack certainly should trigger a WARN_ON to stop the kernel.. But I
>>>> don't know what stack track would be useful here. The culprit is
>>>> always the underlying driver, not the core code..
>>>
>>> There seems to be a significant overlap between
>>> drivers/infiniband/core/restrack.c and drivers/infiniband/sw/rxe/rxe.c
>>> maintainers, so perhaps restrack.c is good enough approximation to
>>> extract relevant people (definitely better then no CC at all :))
>>
>> Looks to me as this is rxe:
>>
>> [ 1892.778632][ T8958] BUG: KASAN: use-after-free in __rxe_drop_index_locked+0xb5/0x100
>> [snip]
>> [ 1892.822375][ T8958] Call Trace:
>> [ 1892.825655][ T8958]  <TASK>
>> [ 1892.828594][ T8958]  dump_stack_lvl+0xcd/0x134
>> [ 1892.833273][ T8958]  print_address_description.constprop.0.cold+0x6c/0x30c
>> [ 1892.840316][ T8958]  ? __rxe_drop_index_locked+0xb5/0x100
>> [ 1892.845864][ T8958]  ? __rxe_drop_index_locked+0xb5/0x100
>> [ 1892.851424][ T8958]  kasan_report.cold+0x83/0xdf
>> [ 1892.856200][ T8958]  ? __rxe_drop_index_locked+0xb5/0x100
>> [ 1892.861761][ T8958]  kasan_check_range+0x13d/0x180
>> [ 1892.866780][ T8958]  __rxe_drop_index_locked+0xb5/0x100
>> [ 1892.872164][ T8958]  __rxe_drop_index+0x3f/0x60
>> [ 1892.876850][ T8958]  rxe_dereg_mr+0x14b/0x240
>> [ 1892.881381][ T8958]  ib_dealloc_pd_user+0x96/0x230
>> [ 1892.886566][ T8958]  rds_ib_dev_free+0xd4/0x3a0
>>
>> So, RDS de-allocs its PD, ib core must first de-register the PD's local MR, calls rxe_dereg_mr(), ...
> 
> int rxe_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
> {
>         struct rxe_mr *mr = to_rmr(ibmr);
> 
>         if (atomic_read(&mr->num_mw) > 0) {
>                 pr_warn("%s: Attempt to deregister an MR while bound to MWs\n",
>                         __func__);
>                 return -EINVAL;
>         }
> 
>         mr->state = RXE_MR_STATE_ZOMBIE;
>         rxe_drop_ref(mr_pd(mr));
>         rxe_drop_index(mr);            <-------This is call trace beginning.
>         rxe_drop_ref(mr);
> 
>         return 0;
> }
> 
> struct rxe_mr {
>         struct rxe_pool_entry   pelem; <-----A ref_cnt in this struct.
>         struct ib_mr            ibmr;
> 
>         struct ib_umem          *umem;
> 
> struct rxe_pool_entry {
>         struct rxe_pool         *pool;
>         struct kref             ref_cnt;        <-------This ref_cnt may help.
>         struct list_head        list;
> 
> Zhu Yanjun
> 
>>
>>
>> Thxs, Håkon
>>
>>
>>>
>>>> Anyhow, this report is either rxe or rds by the look of it.
>>>>
>>>> Jason
>>

It looks like not all the objects are getting freed before the CA is deallocated.
If this happens the pool cleanup code issues the warning that the pool is being
cleaned up with some objects still in the pool. It then goes ahead and frees the
index table for indexed objects (like MRs). If the MR is later freed it tries to
remove its index in the index bit map which has already been freed causing the oops.
The MR not getting freed in time is the cause of the late deallocation of the PD
which was noticed by restrak. I am not sure if these late deletions are just the
normal flow of the program or caused by rdma core trying to clean up afterwards.

Normally MRs get a reference taken when they are created and dropped when they are
freed. Additionally references are taken when an lkey or rkey is looked up an turned
into a pointer to the MR object. These are normally dropped when the key goes out
of scope. If an error occurs it may be possible that the key goes out of scope without
dropping the reference to the MR which would cause what we are seeing. In this case
in order for the late deletion of the MR someone would have to call dereg mr a second
time to remove the lost ref count.

It would be helpful to have a test case to trigger this oops.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [syzbot] BUG: RESTRACK detected leak of resources
  2021-10-04 13:15   ` Jason Gunthorpe
  2021-10-04 13:22     ` Dmitry Vyukov
@ 2021-10-06 10:10     ` Leon Romanovsky
  1 sibling, 0 replies; 8+ messages in thread
From: Leon Romanovsky @ 2021-10-06 10:10 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Dmitry Vyukov, Doug Ledford, syzbot, RDMA mailing list,
	linux-kernel, syzkaller-bugs

On Mon, Oct 04, 2021 at 10:15:16AM -0300, Jason Gunthorpe wrote:
> On Mon, Oct 04, 2021 at 02:42:11PM +0200, Dmitry Vyukov wrote:
> > On Mon, 4 Oct 2021 at 12:45, syzbot
> > <syzbot+3a992c9e4fd9f0e6fd0e@syzkaller.appspotmail.com> wrote:
> > >
> > > Hello,
> > >
> > > syzbot found the following issue on:
> > >
> > > HEAD commit:    c7b4d0e56a1d Add linux-next specific files for 20210930
> > > git tree:       linux-next
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=104be6cb300000
> > > kernel config:  https://syzkaller.appspot.com/x/.config?x=c9a1f6685aeb48bd
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=3a992c9e4fd9f0e6fd0e
> > > compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> > >
> > > Unfortunately, I don't have any reproducer for this issue yet.
> > >
> > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > Reported-by: syzbot+3a992c9e4fd9f0e6fd0e@syzkaller.appspotmail.com
> > 
> > +RESTRACK maintainers
> > 
> > (it would also be good if RESTRACK would print a more standard oops
> > with stack/filenames, so that testing systems can attribute issues to
> > files/maintainers).
> 
> restrack certainly should trigger a WARN_ON to stop the kernel.. But I
> don't know what stack track would be useful here. The culprit is
> always the underlying driver, not the core code..

We had WARN_ON() in early versions, but it didn't give us anything
except spammed dmesg, so I changed it to more lighter version.

> 
> Anyhow, this report is either rxe or rds by the look of it.
> 
> Jason

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-10-06 10:10 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-04 10:45 [syzbot] BUG: RESTRACK detected leak of resources syzbot
2021-10-04 12:42 ` Dmitry Vyukov
2021-10-04 13:15   ` Jason Gunthorpe
2021-10-04 13:22     ` Dmitry Vyukov
2021-10-04 16:02       ` Haakon Bugge
2021-10-05 13:11         ` Zhu Yanjun
2021-10-05 23:01           ` Bob Pearson
2021-10-06 10:10     ` Leon Romanovsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).