From: Greg KH <gregkh@linuxfoundation.org>
To: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org,
	tony.luck@intel.com, x86@kernel.org,
	Smita.KoralahalliChannabasappa@amd.com, mpatocka@redhat.com
Subject: Re: [PATCH] x86/MCE/AMD: Decrement threshold_bank refcount when removing threshold blocks
Date: Wed, 15 Jun 2022 08:33:50 +0200
Message-ID: <Yql9TqFtebd2h9Z9@kroah.com>
In-Reply-To: <20220614174346.3648305-1-yazen.ghannam@amd.com>

On Tue, Jun 14, 2022 at 05:43:46PM +0000, Yazen Ghannam wrote:
> AMD systems from Family 10h to 16h share MCA bank 4 across multiple CPUs.
> Therefore, the threshold_bank structure for bank 4, and its threshold_block
> structures, will be initialized once at boot time. And the kobject for the
> shared bank will be added to each of the CPUs that share it. Furthermore,
> the threshold_blocks for the shared bank will be added again to the bank's
> kobject. These additions will increase the refcount for the bank's kobject.
> 
> For example, a shared bank with two blocks and shared across two CPUs will
> be set up like this:
> 
> CPU0 init
>   bank create and add; bank refcount = 1; threshold_create_bank()
>     block 0 init and add; bank refcount = 2; allocate_threshold_blocks()
>     block 1 init and add; bank refcount = 3; allocate_threshold_blocks()
> CPU1 init
>   bank add; bank refcount = 3; threshold_create_bank()
>     block 0 add; bank refcount = 4; __threshold_add_blocks()
>     block 1 add; bank refcount = 5; __threshold_add_blocks()
> 
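The init sequence quoted above can be replayed with a toy user-space model.
This is only a sketch of the counting, not kernel code: Kobj, add_child()
and init_trace() are hypothetical names, and the rule that re-adding the
shared bank on a later CPU leaves its count unchanged is taken directly
from the quoted trace.

```python
class Kobj:
    """Toy stand-in for a kobject: a name, a refcount, an optional parent."""

    def __init__(self, name):
        self.name = name
        self.refcount = 1  # creation holds the initial reference
        self.parent = None


def add_child(child, parent):
    """Model adding a child under a parent: the parent gains one reference."""
    child.parent = parent
    parent.refcount += 1


def init_trace(num_cpus=2, num_blocks=2):
    """Return the bank, its blocks, and (step, bank refcount) pairs."""
    bank = Kobj("bank4")
    blocks = [Kobj(f"block{i}") for i in range(num_blocks)]
    trace = []
    for cpu in range(num_cpus):
        # Assumption from the quoted trace: the bank's own count does not
        # change when a later CPU adds the already-created shared bank.
        step = "bank create and add" if cpu == 0 else "bank add"
        trace.append((f"CPU{cpu} {step}", bank.refcount))
        for i, blk in enumerate(blocks):
            add_child(blk, bank)  # each block add pins the bank once more
            trace.append((f"CPU{cpu} block {i} add", bank.refcount))
    return bank, blocks, trace
```

Calling init_trace() with the defaults walks the two-CPU, two-block case
and ends with the bank holding five references, as in the trace.
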
> Currently in threshold_remove_bank(), if the bank is shared then
> __threshold_remove_blocks() is called. Here the shared bank's kobject and
> the bank's blocks' kobjects are deleted. This is done on the first call
> even while the structures are still shared. Subsequent calls from other
> CPUs that share the structures will attempt to delete the kobjects.
> 
> During kobject_del(), kobject->sd is removed. If the kobject is not part of
> a kset with default_groups, then subsequent kobject_del() calls seem safe
> even with kobject->sd == NULL.
> 
> Originally, the AMD MCA thresholding structures did not use default_groups.
> And so the above behavior was not apparent.
> 
> However, a recent change implemented default_groups for the thresholding
> structures. Therefore, kobject_del() will go down the sysfs_remove_groups()
> code path. In this case, the first kobject_del() may succeed and remove
> kobject->sd. But subsequent kobject_del() calls will give a WARNing in
> kernfs_remove_by_name_ns() since kobject->sd == NULL.
> 
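The difference between the two kobject_del() cases described above can be
sketched with a small model. It is an illustration of the description
only, under stated assumptions: Kobj and kobject_del_model() are
hypothetical names, `sd` stands in for kobject->sd, and a Python warning
stands in for the kernel WARN.

```python
import warnings


class Kobj:
    """Toy kobject: a sysfs entry handle plus optional default groups."""

    def __init__(self, name, default_groups=()):
        self.name = name
        self.default_groups = tuple(default_groups)
        self.sd = object()  # stand-in for a live sysfs directory entry


def kobject_del_model(kobj):
    """Model a delete: remove groups first, then drop the sysfs entry."""
    for group in kobj.default_groups:
        if kobj.sd is None:
            # Group removal needs a live entry; a repeated delete has none.
            warnings.warn(f"{kobj.name}: removing {group!r} with sd == None")
    kobj.sd = None
```

Deleting an object without default groups twice stays silent in this
model, while the second delete of an object with default groups warns.
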
> Use kobject_put() on the shared bank's kobject when "removing" blocks. This
> decrements the bank's refcount while keeping kobjects enabled until the
> bank is no longer shared. At that point, kobject_put() will be called on
> the blocks, which drives their refcount to 0, deletes them, and also
> decrements the bank's refcount. Finally, kobject_put() will be called
> on the bank, driving its refcount to 0 and deleting it.
> 
> With this patch and the example above:
> 
> CPU1 shutdown
>   bank is shared; bank refcount = 5; threshold_remove_bank()
>     block 0 put parent bank; bank refcount = 4; __threshold_remove_blocks()
>     block 1 put parent bank; bank refcount = 3; __threshold_remove_blocks()
> CPU0 shutdown
>   bank is no longer shared; bank refcount = 3; threshold_remove_bank()
>     block 0 put block; bank refcount = 2; deallocate_threshold_blocks()
>     block 1 put block; bank refcount = 1; deallocate_threshold_blocks()
>   put bank; bank refcount = 0; threshold_remove_bank()
> 
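The shutdown sequence with the patch applied can be checked the same way
with a toy user-space model. Again this is a sketch, not kernel code:
Kobj, put() and shutdown_trace() are hypothetical names, and the starting
count of five assumes the two-CPU, two-block setup from the example.

```python
class Kobj:
    """Toy kobject: refcount, parent link, and a released flag."""

    def __init__(self, name, parent=None):
        self.name = name
        self.refcount = 1
        self.parent = parent
        self.released = False
        if parent is not None:
            parent.refcount += 1  # a child pins its parent


def put(kobj):
    """Drop one reference; the last drop releases and unpins the parent."""
    assert kobj.refcount > 0, f"put on released {kobj.name}"
    kobj.refcount -= 1
    if kobj.refcount == 0:
        kobj.released = True
        if kobj.parent is not None:
            put(kobj.parent)


def shutdown_trace(num_cpus=2, num_blocks=2):
    """Return the bank, its blocks, and the bank refcount after each step."""
    bank = Kobj("bank4")
    blocks = [Kobj(f"block{i}", parent=bank) for i in range(num_blocks)]
    # Each further CPU re-added every block under the bank at init time.
    bank.refcount += (num_cpus - 1) * num_blocks
    counts = []
    for sharers in range(num_cpus, 0, -1):
        counts.append(bank.refcount)
        if sharers > 1:
            # Still shared: only drop the references taken by the re-adds.
            for _ in blocks:
                put(bank)
                counts.append(bank.refcount)
        else:
            # Last user: put the blocks (which unpin the bank), then the bank.
            for blk in blocks:
                put(blk)
                counts.append(bank.refcount)
            put(bank)
            counts.append(bank.refcount)
    return bank, blocks, counts
```

In this model nothing is released until the last CPU goes away, and no
object is ever put after its count has reached zero.
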
> Fixes: 7f99cb5e6039 ("x86/CPU/AMD: Use default_groups in kobj_type")

The bug predates that commit; the commit only exposed the root problem,
so odds are this should be backported further, right?

thanks,

greg k-h
