All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrzej Hajda <andrzej.hajda@intel.com>
To: "Dixit, Ashutosh" <ashutosh.dixit@intel.com>,
	Andi Shyti <andi.shyti@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH 7/9] drm/i915/gt: Fix memory leaks in per-gt sysfs
Date: Thu, 28 Apr 2022 16:36:14 +0200	[thread overview]
Message-ID: <ce0f7b16-b06b-4eb2-9c47-e7d4092a6c4f@intel.com> (raw)
In-Reply-To: <87ee1i5k58.wl-ashutosh.dixit@intel.com>



On 27.04.2022 22:46, Dixit, Ashutosh wrote:
> On Sun, 24 Apr 2022 15:36:23 -0700, Andi Shyti wrote:
>> Hi Andrzej and Ashutosh,
>>
>>>>>> b/drivers/gpu/drm/i915/gt/intel_gt_types.h
>>>>>> index 937b2e1a305e..4c72b4f983a6 100644
>>>>>> --- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
>>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
>>>>>> @@ -222,6 +222,9 @@ struct intel_gt {
>>>>>> 	} mocs;
>>>>>> 		struct intel_pxp pxp;
>>>>>> +
>>>>>> +	/* gt/gtN sysfs */
>>>>>> +	struct kobject sysfs_gtn;
>>>>> If you put kobject as a part of intel_gt what assures you that lifetime of
>>>>> kobject is shorter than intel_gt? Ie its refcounter is 0 on removal of
>>>>> intel_gt?
>>>> Because we are explicitly doing a kobject_put() in
>>>> intel_gt_sysfs_unregister(). Which is exactly what we are *not* doing in
>>>> the previous code.
>>>>
>>>> Let me explain a bit about the previous code (but feel free to skip since
>>>> the patch should speak for itself):
>>>> * Previously we kzalloc a 'struct kobj_gt'
>>>> * But we don't save a pointer to the 'struct kobj_gt' so we don't have the
>>>>     pointer to the kobject to be able to do a kobject_put() on it later
>>>> * Therefore we need to store the pointer in 'struct intel_gt'
>>>> * But if we have to put the pointer in 'struct intel_gt' we might as well
>>>>     put the kobject as part of 'struct intel_gt' and that also removes the
>>>>     need to have a 'struct kobj_gt' (kobj_to_gt() can just use container_of()
>>>>     to get gt from kobj).
>>>> * So I think this patch simpler/cleaner than the original code if you take
>>>>     the requirement for kobject_put() into account.
>> This is my oversight. This was something I completely forgot to
>> fix but it was my intention to do and actually I had some fixes
>> ongoing. But because this patch took too long to get in I
>> completely forgot about it (Sujaritha was actually the first who
>> pointed this out).
>>
>> Thanks, Ashutosh for taking this.
>>
>>> I fully agree that previous code is incorrect but I am not convinced current
>>> code is correct.
>>> If some objects are kref-counted it means usually they can have multiple
>>> concurrent users and kobject_put does not work as traditional
>>> destructor/cleanup/unregister.
>>> So in this particular case after calling kobject_init_and_add sysfs core can
>>> get multiple references on the object. Later, during driver unregistration
>>> kobject_put is called, but if the object is still in use by sysfs core, the
>>> object will not be destroyed/released. If the driver unregistration
>>> continues memory will be freed, leaving sysfs-core (or other users) with
>>> dangling pointers. Unless there is some additional synchronization mechanism
>>> I am not aware of.
>> Thanks Andrzej for summarizing this and what you said is actually
>> what happens. I had a similar solution developed and I had wrong
>> pointer reference happening.
> Hi Andrzej/Andi,
>
> I did do some research into kobject's and such before writing this patch
> and based on that I believe the patch is correct. Presenting some evidence
> below.
>
> The patch is verified by:
>
> a. Putting a printk in the release() method when it exists (it does for
>     sysfs_gtn kobject)
> b. Enabling dynamic prints for lib/kobject.c
>
> For example, with the following:
>
> # echo 'file kobject.c +p' > /sys/kernel/debug/dynamic_debug/control
> # echo -n "0000:03:00.0" > /sys/bus/pci/drivers/i915/unbind
>
> We see this in dmesg (see kobject_cleanup() called from kobject_put()):
>
> [ 1034.930007] kobject: '.defaults' (ffff88817130a640): kobject_cleanup, parent ffff8882262b5778
> [ 1034.930020] kobject: '.defaults' (ffff88817130a640): auto cleanup kobject_del
> [ 1034.930336] kobject: '.defaults' (ffff88817130a640): calling ktype release
> [ 1034.930340] kobject: (ffff88817130a640): dynamic_kobj_release
> [ 1034.930354] kobject: '.defaults': free name
> [ 1034.930366] kobject: 'gt0' (ffff8882262b5778): kobject_cleanup, parent ffff88817130a240
> [ 1034.930371] kobject: 'gt0' (ffff8882262b5778): auto cleanup kobject_del
> [ 1034.931930] kobject: 'gt0' (ffff8882262b5778): calling ktype release
> [ 1034.931936] kobject: 'gt0': free name
> [ 1034.958004] kobject: 'i915_0000_03_00.0' (ffff88810e1f8800): fill_kobj_path: path = '/devices/i915_0000_03_00.0'
> [ 1034.958155] kobject: 'i915_0000_03_00.0' (ffff88810e1f8800): kobject_cleanup, parent 0000000000000000
> [ 1034.958162] kobject: 'i915_0000_03_00.0' (ffff88810e1f8800): calling ktype release
> [ 1034.958188] kobject: 'i915_0000_03_00.0': free name
> [ 1034.958729] kobject: 'gt' (ffff88817130a240): kobject_cleanup, parent ffff8881160c5000
> [ 1034.958736] kobject: 'gt' (ffff88817130a240): auto cleanup kobject_del
> [ 1034.958762] kobject: 'gt' (ffff88817130a240): calling ktype release
> [ 1034.958767] kobject: (ffff88817130a240): dynamic_kobj_release
> [ 1034.958778] kobject: 'gt': free name
>
> We have the following directory structure (one of the patches is creating
> /sys/class/drm/card0/gt/gt0/.defaults):
>
>        /sys/class/drm/card0/gt
>                             |-gt0
>                                |-.defaults
>
> And we see from dmesg .defaults, gt0 and gt kobjects being cleaned up in
> that order.
>
> Looking at lib/kobject.c there are several interesting things:
>
> * Three subsystems are involved: kobject, sysfs and kernfs.
>
> * A child kobject takes a reference on the parent, so we must do a
>    kobject_put() on the child before doing kobject_put() on the parent
>    (creating a child kobject creates a corresponding sub-directory in sysfs).
>
> * Adding files to a sysfs directory does not take a reference on the
>    kobject, only on the parent kernfs_node.
>
> * Since we do call sysfs_create_group() (for RC6) ordinarily we will need
>    to call sysfs_remove_group() but this does not seem to be needed because
>    we are not creating a directory for the group (by providing a name for
>    the group). So sysfs_create_group() is equivalent to sysfs_create_files().
>    So it seems we don't need sysfs_remove_group().
>
> * Similarly it appears files created by sysfs_create_files() do not need to
>    be removed by sysfs_remove_files() because __kobject_del() and
>    sysfs_remove_dir() called from kobject_cleanup() do that for us (the
>    comment in kobject_cleanup() says "remove from sysfs if the caller did
>    not do it").
>
> Based on the above it is clear that no one except a child kobject takes a
> reference on the parent kobject and as long as we kobject_put() them in the
> correct order (as we seem to be doing based on dmesg trace above) we should
> be ok.
>
> Also what is followed in this patch is a fairly standard coding
> pattern. Further, in case of any errors we generally see failure to unload
> the module etc. and none of these things are being observed, module reload
> works fine.
>
> I hope these points are helpful in completing review of the patch.

See [1], it is quite old, so maybe it is not valid anymore, but I see no 
code proving sth has changed.
Also current doc says also [2] similar things, especially:
"Once you registered your kobject via kobject_add(), you must never use 
kfree() to free it directly"

[1]: https://lwn.net/Articles/36850/
[2]: 
https://elixir.bootlin.com/linux/v5.18-rc4/source/Documentation/core-api/kobject.rst#L246

Regards
Andrzej

>
> Thanks.
> --
> Ashutosh


  reply	other threads:[~2022-04-28 14:36 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-13 18:11 [Intel-gfx] [PATCH 0/8] drm/i915: Media freq factor and per-gt enhancements/fixes Ashutosh Dixit
2022-04-13 18:11 ` [Intel-gfx] [PATCH 1/8] drm/i915: Introduce has_media_ratio_mode Ashutosh Dixit
2022-04-15 10:26   ` Rodrigo Vivi
2022-04-13 18:11 ` [Intel-gfx] [PATCH 2/8] drm/i915/gt: Add media freq factor to per-gt sysfs Ashutosh Dixit
2022-04-13 18:11 ` [Intel-gfx] [PATCH 3/8] drm/i915/pcode: Extend pcode functions for multiple gt's Ashutosh Dixit
2022-04-14 13:28   ` Jani Nikula
2022-04-14 22:31     ` Dixit, Ashutosh
2022-04-15 10:21       ` Rodrigo Vivi
2022-04-20  5:54         ` Dixit, Ashutosh
2022-04-20 16:32           ` Vivi, Rodrigo
2022-04-26  7:42             ` Jani Nikula
2022-04-13 18:11 ` [Intel-gfx] [PATCH 4/8] drm/i915/pcode: Add a couple of pcode helpers Ashutosh Dixit
2022-04-15 10:31   ` Rodrigo Vivi
2022-04-19  1:23     ` Dixit, Ashutosh
2022-04-13 18:11 ` [Intel-gfx] [PATCH 5/8] drm/i915/gt: Add media RP0/RPn to per-gt sysfs Ashutosh Dixit
2022-04-25  9:39   ` Kamil Konieczny
2022-04-26  0:43     ` Dixit, Ashutosh
2022-04-13 18:11 ` [Intel-gfx] [PATCH 6/8] drm/i915/gt: Fix memory leaks in " Ashutosh Dixit
2022-04-13 19:14   ` Dixit, Ashutosh
2022-04-13 18:11 ` [Intel-gfx] [PATCH 7/8] drm/i915/gt: Expose per-gt RPS defaults in sysfs Ashutosh Dixit
2022-04-13 18:11 ` [Intel-gfx] [PATCH 8/8] drm/i915/gt: Expose default value for media_freq_factor in per-gt sysfs Ashutosh Dixit
2022-04-14  0:38 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915: Media freq factor and per-gt enhancements/fixes Patchwork
2022-04-14  0:38 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2022-04-14  1:00 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
2022-04-14  5:57   ` Dixit, Ashutosh
2022-04-14  7:11     ` Vudum, Lakshminarayana
2022-04-14  6:43 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2022-04-14  9:27 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
2022-04-20  5:21 ` [Intel-gfx] [PATCH v2 0/9] " Ashutosh Dixit
2022-04-20  5:21   ` [Intel-gfx] [PATCH 1/9] drm/i915: Introduce has_media_ratio_mode Ashutosh Dixit
2022-04-20  5:21   ` [Intel-gfx] [PATCH 2/9] drm/i915/gt: Add media freq factor to per-gt sysfs Ashutosh Dixit
2022-04-21 20:57     ` Rodrigo Vivi
2022-04-26  0:29       ` Dixit, Ashutosh
2022-04-20  5:21   ` [Intel-gfx] [PATCH 3/9] drm/i915/pcode: Extend pcode functions for multiple gt's Ashutosh Dixit
2022-04-20  5:21   ` [Intel-gfx] [PATCH 4/9] drm/i915/gt: Convert callers to user per-gt pcode functions Ashutosh Dixit
2022-04-20  5:21   ` [Intel-gfx] [PATCH 5/9] drm/i915/pcode: Add a couple of pcode helpers Ashutosh Dixit
2022-04-20  5:21   ` [Intel-gfx] [PATCH 6/9] drm/i915/gt: Add media RP0/RPn to per-gt sysfs Ashutosh Dixit
2022-04-20  5:21   ` [Intel-gfx] [PATCH 7/9] drm/i915/gt: Fix memory leaks in " Ashutosh Dixit
2022-04-20 12:17     ` Andrzej Hajda
2022-04-20 16:12       ` Dixit, Ashutosh
2022-04-20 19:51         ` Andrzej Hajda
2022-04-24 22:36           ` Andi Shyti
2022-04-27 20:46             ` Dixit, Ashutosh
2022-04-28 14:36               ` Andrzej Hajda [this message]
2022-04-29  4:25                 ` Dixit, Ashutosh
2022-05-02  6:22                   ` Andrzej Hajda
2022-05-03  4:29                     ` Dixit, Ashutosh
2022-04-20  5:21   ` [Intel-gfx] [PATCH 8/9] drm/i915/gt: Expose per-gt RPS defaults in sysfs Ashutosh Dixit
2022-04-20  5:21   ` [Intel-gfx] [PATCH 9/9] drm/i915/gt: Expose default value for media_freq_factor in per-gt sysfs Ashutosh Dixit
2022-04-20  6:39 ` [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915: Media freq factor and per-gt enhancements/fixes Patchwork
2022-04-20  6:25 [Intel-gfx] [PATCH v2 0/9] " Ashutosh Dixit
2022-04-20  6:25 ` [Intel-gfx] [PATCH 7/9] drm/i915/gt: Fix memory leaks in per-gt sysfs Ashutosh Dixit
2022-04-20 16:23   ` Dixit, Ashutosh
2022-04-24 22:30   ` Andi Shyti
2022-04-26 20:21     ` Dixit, Ashutosh
2022-04-27 11:45       ` Andi Shyti
2022-04-27 20:50         ` Dixit, Ashutosh
2022-04-29  0:39 [Intel-gfx] [PATCH v3 0/9] drm/i915: Media freq factor and per-gt enhancements/fixes Ashutosh Dixit
2022-04-29  0:39 ` [Intel-gfx] [PATCH 7/9] drm/i915/gt: Fix memory leaks in per-gt sysfs Ashutosh Dixit

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ce0f7b16-b06b-4eb2-9c47-e7d4092a6c4f@intel.com \
    --to=andrzej.hajda@intel.com \
    --cc=andi.shyti@linux.intel.com \
    --cc=ashutosh.dixit@intel.com \
    --cc=intel-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.