From: John Garry <john.garry@huawei.com>
To: Robert Richter <rrichter@marvell.com>,
Borislav Petkov <bp@alien8.de>,
Mauro Carvalho Chehab <mchehab@kernel.org>,
Tony Luck <tony.luck@intel.com>
Cc: Aristeu Rozanski <aris@redhat.com>,
James Morse <james.morse@arm.com>,
"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2] EDAC/mc: Fix use-after-free and memleaks during device removal
Date: Thu, 6 Feb 2020 13:35:15 +0000
Message-ID: <b5c40201-4521-b9c8-3adb-ee227bf2ffb4@huawei.com>
In-Reply-To: <20200205212444.10382-1-rrichter@marvell.com>
On 05/02/2020 21:24, Robert Richter wrote:
> A test kernel with the options set below revealed several issues when
> removing a mci device:
>
> DEBUG_TEST_DRIVER_REMOVE
> KASAN
> DEBUG_KMEMLEAK
>
> Issues seen:
>
> 1) Use-after-free:
>
> On 27.11.19 17:07:33, John Garry wrote:
>> [ 22.104498] BUG: KASAN: use-after-free in
>> edac_remove_sysfs_mci_device+0x148/0x180
>
> The use-after-free is caused by the mci_for_each_dimm() iterator that
> is called in edac_remove_sysfs_mci_device(). The iterator was
> introduced with commit c498afaf7df8 ("EDAC: Introduce an
> mci_for_each_dimm() iterator"). The iterator loop calls function
> device_unregister(&dimm->dev), which removes the sysfs entry of the
> device, but also frees the dimm struct in dimm_attr_release(). When
> incrementing the loop in mci_for_each_dimm(), the dimm struct is
> accessed again, but it is already freed.
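A hypothetical user-space sketch (not code from the patch) of the failure mode described above: an iterator whose step expression dereferences the current element breaks as soon as the loop body frees that element. `struct dimm`, `dimm_release()`, `build_list()`, `remove_all()` and `for_each_dimm_safe()` are invented for illustration. The sketch shows the general "cache the successor first" idea behind the kernel's list_for_each_entry_safe(). The patch takes a different route and keeps the dimm objects alive until the mci itself is freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a child device; not the real struct dimm_info. */
struct dimm {
	int idx;
	struct dimm *next;
};

/* Hypothetical release callback: frees the object, as a device release() would. */
static void dimm_release(struct dimm *d)
{
	assert(d != NULL);
	free(d);
}

/*
 * Unsafe (do not use): the step expression reads d->next after the loop
 * body has already freed d, i.e. a use-after-free:
 *
 *     for (d = head; d; d = d->next)
 *             dimm_release(d);
 *
 * Safe variant: cache the successor in tmp before the body runs.
 */
#define for_each_dimm_safe(d, tmp, head)                      \
	for ((d) = (head), (tmp) = (d) ? (d)->next : NULL;    \
	     (d);                                             \
	     (d) = (tmp), (tmp) = (d) ? (d)->next : NULL)

/* Build a list of n dimms (idx 0..n-1); returns NULL on allocation failure. */
struct dimm *build_list(int n)
{
	struct dimm *head = NULL;

	for (int i = n - 1; i >= 0; i--) {
		struct dimm *d = malloc(sizeof(*d));

		if (!d) {
			/* Undo the partial build, again with safe iteration. */
			struct dimm *p, *tmp;

			for_each_dimm_safe(p, tmp, head)
				dimm_release(p);
			return NULL;
		}
		d->idx = i;
		d->next = head;
		head = d;
	}
	return head;
}

/* Release every dimm in the list; returns how many were freed. */
int remove_all(struct dimm *head)
{
	struct dimm *d, *tmp;
	int released = 0;

	for_each_dimm_safe(d, tmp, head) {
		dimm_release(d);
		released++;
	}
	return released;
}
```

With the safe macro, remove_all(build_list(4)) frees all four nodes without reading freed memory.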
>
> The fix is to free all the mci device's subsequent dimm and csrow
> objects at a later point when the mci device is freed. This keeps the
> data structures intact and the mci device can be fully used until its
> removal.
>
> 2) Memory leaks:
>
> The following memory leaks have been detected:
>
> # grep edac /sys/kernel/debug/kmemleak | sort | uniq -c
> 1 [<000000003c0f58f9>] edac_mc_alloc+0x3bc/0x9d0 # mci->csrows
> 16 [<00000000bb932dc0>] edac_mc_alloc+0x49c/0x9d0 # csr->channels
> 16 [<00000000e2734dba>] edac_mc_alloc+0x518/0x9d0 # csr->channels[chn]
> 1 [<00000000eb040168>] edac_mc_alloc+0x5c8/0x9d0 # mci->dimms
> 34 [<00000000ef737c29>] ghes_edac_register+0x1c8/0x3f8 # see edac_mc_alloc()
>
> All leaks are from memory created by edac_mc_alloc().
>
> Note: The test above shows that edac_mc_alloc() was called here from
> ghes_edac_register(), thus both functions show up in the stack dump,
> but the driver causing the leaks is edac_mc. The comments with the
> data structures involved were made manually by analyzing the objdump.
>
> The data structures listed above and created by edac_mc_alloc() are
> not properly removed during device removal, which is done in
> edac_mc_free(). There are two paths implemented to remove the device,
> depending on device registration: _edac_mc_free() is called if the
> device is not registered, and edac_unregister_sysfs() otherwise. The
> implementations differ. For the sysfs case, the mci device removal lacks
> the removal of subsequent data structures (csrows, channels, dimms).
> This causes the memory leaks (see mci_attr_release()).
>
> Fixing this as follows:
>
> Unify code and implement a mci_release() function which is used to
> remove a struct mci regardless of the device registration status. Use
> put_device() to release the struct. Free all subsequent data structs
> of the mci's children in that release function. An effect of this is
> that no data is freed in edac_mc_sysfs.c (except the "mc" sysfs root
> node). All sysfs entries have the mci device as a parent, so its
> refcount keeps the mci parent alive as long as sysfs entries exist. This
> prevents struct mci from being freed until all sysfs entries have been
> removed, which is done in edac_remove_sysfs_mci_device(). With these
> changes, the mci_for_each_dimm() loop is now safe to release dimm
> devices from sysfs.
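A hypothetical user-space model of the scheme described above: one refcounted parent, children that pin it while registered, and a single release function that frees the parent together with all of its children once the last reference is dropped. None of this is the kernel's struct device, get_device()/put_device() or the patch's code; `struct mci`, `mci_alloc()`, `mci_get()`, `mci_put()`, `mci_release()`, `mci_register()`, `mci_unregister()` and `demo()` are invented names that only loosely mirror that pattern.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct mci;

/* Hypothetical child object; pins its parent while registered. */
struct dimm {
	struct mci *parent;
	bool registered;
};

/* Hypothetical parent object with a plain (non-atomic) refcount. */
struct mci {
	int refcount;
	int tot_dimms;
	struct dimm **dimms;
};

static int mci_releases;	/* counts mci_release() calls for the demo */

/* The single place where the parent and all of its children are freed. */
static void mci_release(struct mci *mci)
{
	for (int i = 0; i < mci->tot_dimms; i++)
		free(mci->dimms[i]);	/* free(NULL) is fine for unfilled slots */
	free(mci->dimms);
	free(mci);
	mci_releases++;
}

static void mci_get(struct mci *mci) { mci->refcount++; }

static void mci_put(struct mci *mci)
{
	assert(mci->refcount > 0);
	if (--mci->refcount == 0)
		mci_release(mci);
}

/* Allocate a parent with n children; the caller owns one reference. */
struct mci *mci_alloc(int n)
{
	struct mci *mci = calloc(1, sizeof(*mci));

	if (!mci)
		return NULL;
	mci->refcount = 1;
	mci->dimms = calloc(n > 0 ? n : 1, sizeof(*mci->dimms));
	if (!mci->dimms) {
		free(mci);
		return NULL;
	}
	mci->tot_dimms = n;
	for (int i = 0; i < n; i++) {
		mci->dimms[i] = calloc(1, sizeof(*mci->dimms[i]));
		if (!mci->dimms[i]) {
			mci_put(mci);	/* same release path frees partial state */
			return NULL;
		}
		mci->dimms[i]->parent = mci;
	}
	return mci;
}

/* Each registered child takes a reference on its parent. */
void mci_register(struct mci *mci)
{
	for (int i = 0; i < mci->tot_dimms; i++) {
		mci_get(mci);
		mci->dimms[i]->registered = true;
	}
}

/*
 * Unregister children: only parent references are dropped, no memory is
 * freed here, so iterating over mci->dimms stays valid for the whole loop.
 */
void mci_unregister(struct mci *mci)
{
	for (int i = 0; i < mci->tot_dimms; i++) {
		struct dimm *d = mci->dimms[i];

		d->registered = false;
		mci_put(d->parent);
	}
}

/* Returns how many times mci_release() ran during one lifecycle. */
int demo(int n)
{
	int before = mci_releases;
	struct mci *mci = mci_alloc(n);

	if (!mci)
		return -1;
	mci_register(mci);
	assert(mci->refcount == n + 1);
	mci_unregister(mci);	/* children gone, memory still intact */
	assert(mci->refcount == 1);
	mci_put(mci);		/* last reference: everything freed once */
	return mci_releases - before;
}
```

The design point is that registered and unregistered objects share one free path (mci_release()), so the leak from two diverging removal paths cannot recur.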
>
> The patch has been tested with the above kernel options, no issues
> seen any longer.
>
> Reported-by: John Garry <john.garry@huawei.com>
> Fixes: c498afaf7df8 ("EDAC: Introduce an mci_for_each_dimm() iterator")
> Fixes: faa2ad09c01c ("edac_mc: edac_mc_free() cannot assume mem_ctl_info is registered in sysfs.")
> Fixes: 7a623c039075 ("edac: rewrite the sysfs code to use struct device")
> Signed-off-by: Robert Richter <rrichter@marvell.com>
> Acked-by: Aristeu Rozanski <aris@redhat.com>
> ---
> V2:
Kasan warnings and leak reports are gone:
Tested-by: John Garry <john.garry@huawei.com>
Cheers