* ECC memory of BMC
@ 2019-02-21  6:25 Will Liang (梁永鉉)
  2019-02-21  7:10 ` Andrew Jeffery
  0 siblings, 1 reply; 11+ messages in thread
From: Will Liang (梁永鉉) @ 2019-02-21  6:25 UTC (permalink / raw)
  To: openbmc

Hi,


We are trying to enable ECC on the BMC's memory.

Is there an existing solution for the BMC?

We can also share our proposal for review if anyone is interested.


BRs,

Will


* Re: ECC memory of BMC
  2019-02-21  6:25 ECC memory of BMC Will Liang (梁永鉉)
@ 2019-02-21  7:10 ` Andrew Jeffery
  2019-02-21  7:27   ` Will Liang (梁永鉉)
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Jeffery @ 2019-02-21  7:10 UTC (permalink / raw)
  To: openbmc, Will Liang (梁永鉉); +Cc: Stefan M Schaeckeler

On Thu, 21 Feb 2019, at 17:16, Will Liang (梁永鉉) wrote:
>  
> Hi,
>  
> we are trying to enable ECC on BMC memory.
>  
> is there any exist solution on BMC?

Well, for what it's worth a kernel driver has been submitted upstream and is
queued for the 5.1 merge window:

https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/commit/?h=for-next&id=9b7e6242ee4efcd7f9ef699bf1965e3a5343f216

The patch does say it needs bootloader support though, and I'm not exactly
sure what that involves. I've added Stefan to Cc, maybe he can provide some
insight.

Andrew


* RE: ECC memory of BMC
  2019-02-21  7:10 ` Andrew Jeffery
@ 2019-02-21  7:27   ` Will Liang (梁永鉉)
  2019-02-21 16:56     ` Stefan Schaeckeler (sschaeck)
  2019-02-22  0:28     ` Andrew Jeffery
  0 siblings, 2 replies; 11+ messages in thread
From: Will Liang (梁永鉉) @ 2019-02-21  7:27 UTC (permalink / raw)
  To: Andrew Jeffery, openbmc; +Cc: Stefan M Schaeckeler

Hi Andrew,


Thanks for your response.


We have also found the EDAC driver.

What we want to do is record the ECC events in the SEL.


We are considering creating a new D-Bus object and a service.


Do you have any suggestions?


BRs,

Will

________________________________
From: Andrew Jeffery <andrew@aj.id.au>
Sent: February 21, 2019, 3:10 PM
To: openbmc@lists.ozlabs.org; Will Liang (梁永鉉)
Cc: Stefan M Schaeckeler
Subject: Re: ECC memory of BMC

On Thu, 21 Feb 2019, at 17:16, Will Liang (梁永鉉) wrote:
>
> Hi,
>
> we are trying to enable ECC on BMC memory.
>
> is there any exist solution on BMC?

Well, for what it's worth a kernel driver has been submitted upstream and is
queued for the 5.1 merge window:

https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/commit/?h=for-next&id=9b7e6242ee4efcd7f9ef699bf1965e3a5343f216

The patch does say it needs bootloader support though, and I'm not exactly
sure what that involves. I've added Stefan to Cc, maybe he can provide some
insight.

Andrew


* Re: ECC memory of BMC
  2019-02-21  7:27   ` Will Liang (梁永鉉)
@ 2019-02-21 16:56     ` Stefan Schaeckeler (sschaeck)
  2019-02-22  5:52       ` Stefan Schaeckeler (sschaeck)
  2019-02-22  0:28     ` Andrew Jeffery
  1 sibling, 1 reply; 11+ messages in thread
From: Stefan Schaeckeler (sschaeck) @ 2019-02-21 16:56 UTC (permalink / raw)
  To: Will Liang (梁永鉉), Andrew Jeffery, openbmc

Hi all,

ECC needs to be enabled in U-Boot. The reason is explained in my first upload:
https://patchwork.kernel.org/patch/10732769/

Our U-Boot engineer did it for us. I see the U-Boot code is full of #ifdef CONFIG_DRAM_ECC blocks. In theory, enabling this option should be enough.


- A good indication that ECC has been enabled successfully is Linux actually booting.
- Another indication is available memory being 8/9 of what it used to be (check /proc/meminfo).
- Finally, my driver should not log any errors in its probe function (check dmesg).
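
The 8/9 check in the list above can be scripted. The following is only a minimal sketch, not something taken from the driver or from U-Boot: the helper names mem_total_kb and looks_like_ecc and the 2% tolerance are assumptions made for illustration. MemTotal excludes memory the kernel reserves for itself, so the ratio will only be approximately 8/9.

```shell
#!/bin/sh
# Print MemTotal in kB from a meminfo-style file (default: /proc/meminfo).
mem_total_kb() {
    awk '/^MemTotal:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Succeed if $2 (kB with ECC) is within 2% of 8/9 of $1 (kB without ECC).
# The 2% tolerance is an arbitrary assumption, not a value from the driver.
looks_like_ecc() {
    expected=$(( $1 * 8 / 9 ))
    diff=$(( $2 - expected ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    [ $(( diff * 50 )) -le "$expected" ]
}
```

One way to use it would be to note the output of mem_total_kb on a build without CONFIG_DRAM_ECC, boot the ECC-enabled build, and then run looks_like_ecc with the old value and the new one.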

 Stefan

On 2/20/19, 11:27 PM, "Will Liang (梁永鉉)" <Will.Liang@quantatw.com> wrote:

    Hi Andrew,
    
    
    Thanks for your response.
    
    
    We have also found EDAC driver.
    What we want to do is to record the ECC events to SEL.
    
    
    we are considering to create new dbus and a service.
    
    
    Do you have any suggestions?
    
    
    BRs,
    Will
    
    ________________________________________
    From: Andrew Jeffery <andrew@aj.id.au>
    Sent: February 21, 2019, 3:10 PM
    To: openbmc@lists.ozlabs.org; Will Liang (梁永鉉)
    Cc: Stefan M Schaeckeler
    Subject: Re: ECC memory of BMC
    
    
    On Thu, 21 Feb 2019, at 17:16, Will Liang (梁永鉉) wrote:
    >  
    > Hi,
    >  
    > we are trying to enable ECC on BMC memory.
    >  
    > is there any exist solution on BMC?
    
    Well, for what it's worth a kernel driver has been submitted upstream and is
    queued for the 5.1 merge window:
    
    https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/commit/?h=for-next&id=9b7e6242ee4efcd7f9ef699bf1965e3a5343f216
    
    The patch does say it needs bootloader support though, and I'm not exactly
    sure what that involves. I've added Stefan to Cc, maybe he can provide some
    insight.
    
    Andrew
    
    
    
    



* Re: ECC memory of BMC
  2019-02-21  7:27   ` Will Liang (梁永鉉)
  2019-02-21 16:56     ` Stefan Schaeckeler (sschaeck)
@ 2019-02-22  0:28     ` Andrew Jeffery
  2019-02-22  2:00       ` Will Liang (梁永鉉)
  1 sibling, 1 reply; 11+ messages in thread
From: Andrew Jeffery @ 2019-02-22  0:28 UTC (permalink / raw)
  To: openbmc, Will Liang (梁永鉉)
  Cc: Stefan M Schaeckeler, dkodihal

Hi Will,

As a note, please send text-only email to the list in the future,
and try to avoid top-posting (try to reply "inline" underneath
the paragraph of interest, this way the context is always
adjacent to your response)

I understand text-only email and inline replies can be hard to
do with some mail clients, but that's the list preference.

On Thu, 21 Feb 2019, at 17:57, Will Liang (梁永鉉) wrote:
>  
> Hi Andrew,
>  
> Thanks for your response.
>  
> We have also found EDAC driver.
>
> What we want to do is to record the ECC events to SEL.
>  
> we are considering to create new dbus and a service.

Right; I think you need to create a new service that polls
the sysfs interface for the EDAC device, and then use
phosphor-logging to create error logs.  I'm not much of a
userspace guy, so I've Cc'ed Deepak who might be able
to help on that front.
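
A polling step along those lines might look like the sketch below. It is an illustration under stated assumptions, not an existing OpenBMC service: the function names, the state file, and the 10 second interval are hypothetical, and the hand-off to phosphor-logging is left out because its API is not shown in this thread. The ce_count and ue_count files are the standard EDAC per-controller counters in sysfs.

```shell
#!/bin/sh
# Read one counter file; print 0 if it is missing or unreadable.
read_count() {
    if [ -r "$1" ]; then cat "$1"; else echo 0; fi
}

# Compare the counters in $1 (an EDAC mc directory) with the values saved
# in $2 (a state file), print one line per counter that increased, then
# save the new values. With no state file, the previous values are 0.
poll_once() {
    mc_dir=$1
    state=$2
    ce=$(read_count "$mc_dir/ce_count")
    ue=$(read_count "$mc_dir/ue_count")
    last_ce=0
    last_ue=0
    if [ -r "$state" ]; then
        read -r last_ce last_ue < "$state"
    fi
    if [ "$ce" -gt "$last_ce" ]; then
        echo "CE +$(( ce - last_ce )) (total $ce)"
    fi
    if [ "$ue" -gt "$last_ue" ]; then
        echo "UE +$(( ue - last_ue )) (total $ue)"
    fi
    echo "$ce $ue" > "$state"
}

# Hypothetical main loop; the path of the state file and the interval
# are arbitrary choices:
# while :; do
#     poll_once /sys/devices/system/edac/mc/mc0 /run/ecc-poll.state
#     sleep 10
# done
```

Keeping the last seen values in a state file means a restart of the service does not report old errors again, and a counter that drops back to 0 after a reboot is simply recorded without raising an event.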

Andrew


* RE: ECC memory of BMC
  2019-02-22  0:28     ` Andrew Jeffery
@ 2019-02-22  2:00       ` Will Liang (梁永鉉)
  2019-02-22  5:52         ` Stefan Schaeckeler (sschaeck)
  0 siblings, 1 reply; 11+ messages in thread
From: Will Liang (梁永鉉) @ 2019-02-22  2:00 UTC (permalink / raw)
  To: Andrew Jeffery, openbmc; +Cc: Stefan M Schaeckeler, dkodihal

> 
> On Thu, 21 Feb 2019, at 17:57, Will Liang (梁永鉉) wrote:
> >
> > Hi Andrew,
> >
> > Thanks for your response.
> >
> > We have also found EDAC driver.
> >
> > What we want to do is to record the ECC events to SEL.
> >
> > we are considering to create new dbus and a service.
> 
> Right; I think you need to create a new service that polls the sysfs interface for
> the EDAC device, and then use phosphor-logging to create error logs. 


We are considering creating the following D-Bus objects:
- bus name: xyz.openbmc_project.ECC
- object path: /xyz/openbmc_project/ECC/status
- interface: xyz.openbmc_project.Memory.MemoryECC

and the error types xyz::openbmc_project::Memory::Ecc::Error::ceCount, "ueCount",
and "isLoggingLimitReached" for the phosphor-logging error messages.
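
How a polling step could map counter changes onto those three error names is sketched below. The thread does not define when "isLoggingLimitReached" is raised, so the reading used here (a cap on the number of events already logged) is an assumption, as are the function name classify_ecc and its argument order.

```shell
#!/bin/sh
# Decide which of the proposed error names to raise for one polling step.
# $1/$2: previous and current CE count
# $3/$4: previous and current UE count
# $5:    number of events logged so far
# $6:    assumed cap on logged events (not defined in the proposal)
classify_ecc() {
    if [ "$5" -ge "$6" ]; then
        echo isLoggingLimitReached
        return 0
    fi
    if [ "$4" -gt "$3" ]; then
        echo ueCount
    fi
    if [ "$2" -gt "$1" ]; then
        echo ceCount
    fi
}
```

Unrecoverable errors are reported before correctable ones so that the more severe event is logged first when both counters move in the same interval.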

> I'm not much of a userspace guy, so I've Cc'ed Deepak who might be able to help on
> that front.
>
> Andrew


* Re: ECC memory of BMC
  2019-02-21 16:56     ` Stefan Schaeckeler (sschaeck)
@ 2019-02-22  5:52       ` Stefan Schaeckeler (sschaeck)
  2019-02-22  7:48         ` Will Liang (梁永鉉)
  0 siblings, 1 reply; 11+ messages in thread
From: Stefan Schaeckeler (sschaeck) @ 2019-02-22  5:52 UTC (permalink / raw)
  To: Will Liang (梁永鉉), Andrew Jeffery, openbmc

Hello Will,


One more thing about enabling ECC:

> ECC needs to be enabled in u-boot. The reason is explained in my first upload
> https://patchwork.kernel.org/patch/10732769/
> 
> Our u-boot engineer did it for us. I see u-boot code is full of #ifdef CONFIG_DRAM_ECC blocks.
> In theory, enabling this option should be enough.
> 
> 
> - A good indication of having ECC successfully enabled is linux actually booting.
> - Another indication is available memory being 8/9th of what it used to be (check /proc/meminfo)
> - Then, my driver should not log any errors in its probe function (check dmesg).


Until you have set up ECC mode in U-Boot, you might be able to use the
EDAC driver by removing the following code in aspeed_probe():

	/* bail out if ECC mode is not configured */
	regmap_read(aspeed_regmap, ASPEED_MCR_CONF, &reg04);
	if (!(reg04 & ASPEED_MCR_CONF_ECC)) {
		dev_err(&pdev->dev, "ECC mode is not configured in u-boot\n");
		return -EPERM;
	}

This should let the probe function run to the end and expose the sysfs nodes.
I have not tested it. You may need to strip the driver down further.


You can then inject fake ECC errors via the kernel debug framework:

root@aspeed-arm:/# echo 42 > /sys/kernel/debug/edac/mc0/fake_inject  

root@aspeed-arm:/# dmesg | tail -1
[  293.020000] EDAC MC0: 1 CE FAKE ERROR on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x0 offset:0x0 grain:0 syndrome:0x0 - for EDAC testing only)

root@aspeed-arm:/# cat /sys/devices/system/edac/mc/mc0/ce*
1
0
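
The two numbers printed by the cat above are unlabeled. The glob expands alphabetically, so they should correspond to ce_count and ce_noinfo_count, the standard EDAC counter files. A small sketch that prints every counter with its name follows; the function name edac_counters is an assumption made for illustration.

```shell
#!/bin/sh
# Print each EDAC counter of one memory controller as name=value.
# $1 is the controller directory (default: mc0, as in the session above).
# Counter files that do not exist are skipped.
edac_counters() {
    mc_dir=${1:-/sys/devices/system/edac/mc/mc0}
    for name in ce_count ce_noinfo_count ue_count ue_noinfo_count; do
        if [ -r "$mc_dir/$name" ]; then
            printf '%s=%s\n' "$name" "$(cat "$mc_dir/$name")"
        fi
    done
}
```

Running edac_counters before and after writing to fake_inject shows which counter the injected error incremented.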


BTW, here are my notes on EDAC and ECC on Aspeed which you might find
interesting: http://students.engr.scu.edu/~sschaeck/misc/aspeed-edac.html



Hope that helps, Stefan



* Re: ECC memory of BMC
  2019-02-22  2:00       ` Will Liang (梁永鉉)
@ 2019-02-22  5:52         ` Stefan Schaeckeler (sschaeck)
  2019-02-22  5:55           ` Andrew Jeffery
  0 siblings, 1 reply; 11+ messages in thread
From: Stefan Schaeckeler (sschaeck) @ 2019-02-22  5:52 UTC (permalink / raw)
  To: Will Liang (梁永鉉), Andrew Jeffery, openbmc; +Cc: dkodihal

Hi Will,

On 2/21/19, 6:00 PM, "Will Liang (梁永鉉)" <Will.Liang@quantatw.com> wrote:

> > > What we want to do is to record the ECC events to SEL.
> > >
> > > we are considering to create new dbus and a service.
> > 
> > Right; I think you need to create a new service that polls the sysfs interface for
> > the EDAC device, and then use phosphor-logging to create error logs. 
>
> We consider creating the following objects for D-Bus:
> -bus name : /xyz/openbmc_project/ECC
> -object path : /xyz/openbmc_project/ECC/status
> -interface : xyz.openbmc_project.Memory.MemoryECC
>
> and error types for xyz::openbmc_project::Memory::Ecc::Error::ceCount and "ueCount"
> and "isLoggingLimitReached" for phosphor-logging error message.


Note, the driver also logs the addresses of the recoverable and un-recoverable
errors. Perhaps you want to expose them, too?

The edac framework is unfortunately not exposing them through sysfs. They get
printed through "edac_mc_handle_error()" as printk(KERN_WARNING, ...) and look
like

root@aspeed-arm:# dmesg | grep EDAC
[ 1718.900000] EDAC MC0: 1 CE address(es) not available on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x0 offset:0x0 grain:0 syndrome:0x0)
[ 1718.900000] EDAC MC0: 1 CE on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x80000 offset:0x0 grain:0 syndrome:0x0)


I'm not sure if there is an elegant way for userspace to retrieve messages from
the kernel ring buffer.

 Stefan





* Re: ECC memory of BMC
  2019-02-22  5:52         ` Stefan Schaeckeler (sschaeck)
@ 2019-02-22  5:55           ` Andrew Jeffery
  2019-02-22  6:37             ` Will Liang (梁永鉉)
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Jeffery @ 2019-02-22  5:55 UTC (permalink / raw)
  To: Will Liang (梁永鉉), openbmc, Stefan M Schaeckeler
  Cc: dkodihal

On Fri, 22 Feb 2019, at 16:22, Stefan Schaeckeler (sschaeck) wrote:
> Hi Will,
> 
> On 2/21/19, 6:00 PM, "Will Liang (梁永鉉)" <Will.Liang@quantatw.com> wrote:
> 
> > > > What we want to do is to record the ECC events to SEL.
> > > >
> > > > we are considering to create new dbus and a service.
> > > 
> > > Right; I think you need to create a new service that polls the sysfs interface for
> > > the EDAC device, and then use phosphor-logging to create error logs. 
> >
> > We consider creating the following objects for D-Bus:
> > -bus name : /xyz/openbmc_project/ECC
> > -object path : /xyz/openbmc_project/ECC/status
> > -interface : xyz.openbmc_project.Memory.MemoryECC
> >
> > and error types for xyz::openbmc_project::Memory::Ecc::Error::ceCount and "ueCount"
> > and "isLoggingLimitReached" for phosphor-logging error message.
> 
> 
> Note, the driver also logs the addresses of the recoverable and un-recoverable
> errors. Perhaps you want to expose them, too?
> 
> The edac framework is unfortunately not exposing them through sysfs. They get
> printed through "edac_mc_handle_error()" as printk(KERN_WARNING, ...) and look
> like
> 
> root@aspeed-arm:# dmesg | grep EDAC
> [ 1718.900000] EDAC MC0: 1 CE address(es) not available on 
> mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x0 offset:0x0 grain:0 
> syndrome:0x0)
> [ 1718.900000] EDAC MC0: 1 CE on mc#0csrow#0channel#0 (csrow:0 
> channel:0 page:0x80000 offset:0x0 grain:0 syndrome:0x0)
> 
> 
> I'm not sure if there is an elegant way for userspace to retrieve messages from
> the kernel ring buffer.
> 

Let's not start scraping dmesg. It's not considered part of the kernel ABI.


* RE: ECC memory of BMC
  2019-02-22  5:55           ` Andrew Jeffery
@ 2019-02-22  6:37             ` Will Liang (梁永鉉)
  0 siblings, 0 replies; 11+ messages in thread
From: Will Liang (梁永鉉) @ 2019-02-22  6:37 UTC (permalink / raw)
  To: Andrew Jeffery, openbmc, Stefan M Schaeckeler; +Cc: dkodihal

Hi,

> -----Original Message-----
> From: Andrew Jeffery [mailto:andrew@aj.id.au]
> Sent: Friday, February 22, 2019 1:55 PM
> To: Will Liang (梁永鉉) <Will.Liang@quantatw.com>;
> openbmc@lists.ozlabs.org; Stefan M Schaeckeler <sschaeck@cisco.com>
> Cc: dkodihal@in.ibm.com
> Subject: Re: ECC memory of BMC
> 
> On Fri, 22 Feb 2019, at 16:22, Stefan Schaeckeler (sschaeck) wrote:
> > Hi Will,
> >
> > On 2/21/19, 6:00 PM, "Will Liang (梁永鉉)" <Will.Liang@quantatw.com>
> wrote:
> >
> > > > > What we want to do is to record the ECC events to SEL.
> > > > >
> > > > > we are considering to create new dbus and a service.
> > > >
> > > > Right; I think you need to create a new service that polls the
> > > > sysfs interface for the EDAC device, and then use phosphor-logging to
> create error logs.
> > >
> > > We consider creating the following objects for D-Bus:
> > > -bus name : /xyz/openbmc_project/ECC -object path :
> > > /xyz/openbmc_project/ECC/status -interface :
> > > xyz.openbmc_project.Memory.MemoryECC
> > >
> > > and error types for xyz::openbmc_project::Memory::Ecc::Error::ceCount
> and "ueCount"
> > > and "isLoggingLimitReached" for phosphor-logging error message.
> >
> >
> > Note, the driver also logs the addresses of the recoverable and
> > un-recoverable errors. Perhaps you want to expose them, too?
> >
> > The edac framework is unfortunately not exposing them through sysfs.
> > They get printed through "edac_mc_handle_error()" as
> > printk(KERN_WARNING, ...) and look like
> >
> > root@aspeed-arm:# dmesg | grep EDAC
> > [ 1718.900000] EDAC MC0: 1 CE address(es) not available on
> > mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x0 offset:0x0 grain:0
> > syndrome:0x0)
> > [ 1718.900000] EDAC MC0: 1 CE on mc#0csrow#0channel#0 (csrow:0
> > channel:0 page:0x80000 offset:0x0 grain:0 syndrome:0x0)
> >
> >
> > I'm not sure if there is an elegant way for userspace to retrieve messages
> from
> > the kernel ring buffer.
> >
> 
> Lets not start scraping dmesg. It's not considered part of the kernel ABI.

We do not plan to expose the error messages from the EDAC driver.

We only want to fetch the recoverable and unrecoverable error counts and record them in the ECC log.
Therefore, we need a service that polls the EDAC driver.

Will


* RE: ECC memory of BMC
  2019-02-22  5:52       ` Stefan Schaeckeler (sschaeck)
@ 2019-02-22  7:48         ` Will Liang (梁永鉉)
  0 siblings, 0 replies; 11+ messages in thread
From: Will Liang (梁永鉉) @ 2019-02-22  7:48 UTC (permalink / raw)
  To: Stefan Schaeckeler (sschaeck), Andrew Jeffery, openbmc


> 
> Hello Will,
> 
> 
> One more thing about enabling ECC:
> 
> > ECC needs to be enabled in u-boot. The reason is explained in my first
> > upload https://patchwork.kernel.org/patch/10732769/
> >
> > Our u-boot engineer did it for us. I see u-boot code is full of #ifdef
> CONFIG_DRAM_ECC blocks.
> > In theory, enabling this option should be enough.
> >
> >
> > - A good indication of having ECC successfully enabled is linux actually
> booting.
> > - Another indication is available memory being 8/9th of what it used
> > to be (check /proc/meminfo)
> > - Then, my driver should not log any errors in its probe function (check
> dmesg).
> 
> 
> While you have not set up ECC mode in u-boot yet, you might be able to use
> the edac driver by removing the following code in aspeed_probe()
> 
> 	/* bail out if ECC mode is not configured */
> 	regmap_read(aspeed_regmap, ASPEED_MCR_CONF, &reg04);
> 	if (!(reg04 & ASPEED_MCR_CONF_ECC)) {
> 		dev_err(&pdev->dev, "ECC mode is not configured in u-boot\n");
> 		return -EPERM;
> 	}
> 
> This will run the probe function hopefully to the end and expose the sysfs
> nodes.
> I have not tested it. Maybe you need to further butcher down the driver?
> 
> 
> You can then inject fake ecc errors via the kernel debug framework:
> 
> root@aspeed-arm:/# echo 42 > /sys/kernel/debug/edac/mc0/fake_inject
> 
> root@aspeed-arm:/# dmesg | tail -1
> [  293.020000] EDAC MC0: 1 CE FAKE ERROR on mc#0csrow#0channel#0
> (csrow:0 channel:0 page:0x0 offset:0x0 grain:0 syndrome:0x0 - for EDAC
> testing only)
> 
> root@aspeed-arm:/# cat /sys/devices/system/edac/mc/mc0/ce*
> 1
> 0
> 
> 
> BTW, here are my notes on EDAC and ECC on Aspeed which you might find
> interesting: http://students.engr.scu.edu/~sschaeck/misc/aspeed-edac.html
> 
> 
> 
> Hope that helps, Stefan

Hi Stefan,

Thanks for the updated information.
Instead of extending the kernel space, we are developing services in user space.

Will
