Linux-EDAC Archive on lore.kernel.org
From: Borislav Petkov <bp@alien8.de>
To: Robert Richter <rrichter@marvell.com>
Cc: James Morse <james.morse@arm.com>,
	Mauro Carvalho Chehab <mchehab@kernel.org>,
	Tony Luck <tony.luck@intel.com>, Borislav Petkov <bp@suse.de>,
	"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v3] EDAC, ghes: Fix locking and memory barrier issues
Date: Fri, 8 Nov 2019 16:32:03 +0100
Message-ID: <20191108153203.GE4503@zn.tnic>
In-Reply-To: <20191105201115.v2pe6k6g2brx5itv@rric.localdomain>

On Tue, Nov 05, 2019 at 08:11:22PM +0000, Robert Richter wrote:
> On 05.11.19 20:07:51, Robert Richter wrote:
> > The ghes registration and refcount are broken in several ways:
> > 
> >  * ghes_edac_register() returns with success for a 2nd instance even
> >    if a first instance is still running. This is not correct as the
> >    first instance may fail later. A subsequent registration may not
> >    finish before the first. Parallel registrations must be avoided.
> > 
> >  * The refcount was increased even if a registration failed. This
> >    leads to stale counters preventing the device from being released.
> > 
> >  * The ghes refcount may not be decremented properly on
> >    unregistration. Always decrement the refcount once
> >    ghes_edac_unregister() is called to keep the refcount sane.
> > 
> >  * The ghes_pvt pointer is handed to the irq handler before
> >    registration has finished.
> > 
> >  * The mci structure could be freed while the irq handler is running.
> > 
> > Fix this by adding a mutex to ghes_edac_register(). This mutex
> > serializes instances to register and unregister. The refcount is only
> > increased if the registration succeeded. This makes sure the refcount
> > is in a consistent state after registering or unregistering a device.
> > Note: A spinlock cannot be used here as the code section may sleep.
> > 
> > The ghes_pvt pointer is now protected by ghes_lock. This ensures the
> > pointer is not updated before registration has finished or while the
> > irq handler is running. It is unset before the device is
> > unregistered, with the necessary (implicit) memory barriers making
> > the change visible to other CPUs. Thus, the device cannot be used by
> > an interrupt anymore.
> > 
> > Also, rename ghes_init to ghes_refcount for better readability and
> > switch to the refcount API.
> > 
> > A refcount is needed. There can be multiple GHES structures being
> > defined (see ACPI 6.3 specification, 18.3.2.7 Generic Hardware Error
> > Source, "Some platforms may describe multiple Generic Hardware Error
> > Source structures with different notification types, ...").
> > 
> > Another approach, using the mci's device refcount (get_device())
> > with a release function, does not work here. A release function is
> > called only from device_release(), on the last put_device() call,
> > but the device must be deleted *before* that with device_del(). This
> > is only possible by maintaining a separate refcount.
> > 
> > Fixes: 0fe5f281f749 ("EDAC, ghes: Model a single, logical memory controller")
> > Fixes: 1e72e673b9d1 ("EDAC/ghes: Fix Use after free in ghes_edac remove path")
> > Co-developed-by: James Morse <james.morse@arm.com>
> > Signed-off-by: James Morse <james.morse@arm.com>
> > Co-developed-by: Borislav Petkov <bp@suse.de>
> > Signed-off-by: Borislav Petkov <bp@suse.de>
> > Signed-off-by: Robert Richter <rrichter@marvell.com>
> 
> I hope this SOB chain is correct now.

Yeah.

Applied, thanks.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
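
The locking scheme described in the quoted commit message can be
illustrated with a small userspace analogue. This is only a sketch under
stated assumptions, not the applied patch: pthread mutexes stand in for
the kernel mutex and for ghes_lock, a plain counter stands in for the
refcount API, and every name in it (sketch_register(),
sketch_unregister(), sketch_report_event(), struct pvt, reg_mutex,
pvt_lock) is hypothetical. The fail_setup parameter exists only to
simulate a failed registration.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's private data. */
struct pvt {
	int events;
};

/* Serializes register/unregister; a sleeping lock, as setup may sleep. */
static pthread_mutex_t reg_mutex = PTHREAD_MUTEX_INITIALIZER;
/* Stand-in for ghes_lock: guards the pointer the handler dereferences. */
static pthread_mutex_t pvt_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned int refcount;	/* only touched under reg_mutex */
static struct pvt *pvt;		/* only touched under pvt_lock */

/* Returns 0 on success, -1 on failure. */
int sketch_register(bool fail_setup)
{
	struct pvt *p;
	int rc = 0;

	pthread_mutex_lock(&reg_mutex);

	/* A fully registered instance exists: just take a reference. */
	if (refcount > 0) {
		refcount++;
		goto unlock;
	}

	p = calloc(1, sizeof(*p));
	if (!p || fail_setup) {
		/* Failed registration: the refcount stays untouched. */
		free(p);
		rc = -1;
		goto unlock;
	}

	/* Publish the pointer only after setup has completed. */
	pthread_mutex_lock(&pvt_lock);
	pvt = p;
	pthread_mutex_unlock(&pvt_lock);

	refcount = 1;
unlock:
	pthread_mutex_unlock(&reg_mutex);
	return rc;
}

void sketch_unregister(void)
{
	struct pvt *old;

	pthread_mutex_lock(&reg_mutex);

	/* The zero check is an extra guard added for this sketch. */
	if (refcount == 0 || --refcount > 0)
		goto unlock;

	/* Last user: hide the pointer from the handler, then free it. */
	pthread_mutex_lock(&pvt_lock);
	old = pvt;
	pvt = NULL;
	pthread_mutex_unlock(&pvt_lock);
	free(old);
unlock:
	pthread_mutex_unlock(&reg_mutex);
}

/* Plays the role of the irq handler; true if the event was recorded. */
bool sketch_report_event(void)
{
	bool handled = false;

	pthread_mutex_lock(&pvt_lock);
	if (pvt) {
		pvt->events++;
		handled = true;
	}
	pthread_mutex_unlock(&pvt_lock);
	return handled;
}
```

Because the handler takes pvt_lock before it looks at the pointer, it
sees either a fully set up instance or NULL, and the instance cannot be
freed while the handler is still using it. The lock and unlock calls
also supply the memory ordering that the commit message calls implicit
barriers. A kernel irq handler cannot sleep, so the mutex used for
pvt_lock here is purely a userspace simplification.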
