* [RFC patch 0/4]ndctl: nvdimmd: notify/monitor the feathers of over threshold event
@ 2017-09-01  1:42 Qi, Fuli
  2017-09-01 21:12 ` Dan Williams
  0 siblings, 1 reply; 8+ messages in thread
From: Qi, Fuli @ 2017-09-01  1:42 UTC (permalink / raw)
  To: linux-nvdimm

Hello,

This is my first time participating in the OSS world. I hope I can contribute
to NVDIMM development.

This is a patch set for nvdimmd, a tiny daemon that monitors over-threshold
events. It finds and monitors all of the DIMMs that support SMART thresholds.
Although nvdimmd currently only writes NVDIMM status to syslog, I would like
to get comments/opinions.

The output includes the DIMM's name, health state, spares percentage, etc.
Here is a sample of the output:

            nvdimm warning: dimm over threshold notify [nmem2]
            health_state: non-critical
            spares_percentage: 75
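A minimal C sketch of how one notification like the sample above could be
formatted and sent to syslog. The helper names and parameters are hypothetical
(not taken from the nvdimmd patches); only snprintf() and syslog(3) are real
APIs. The fields are kept on one line here, as an assumption, so that each
event stays a single syslog record.

```c
#include <assert.h>
#include <stdio.h>
#include <syslog.h>

/*
 * Hypothetical helper: render one over-threshold notification on a single
 * line. Returns the snprintf() result, i.e. the length the full message
 * needs (excluding the terminating NUL).
 */
int format_threshold_notice(char *buf, size_t len, const char *dimm,
			    const char *health, int spares)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len,
			"nvdimm warning: dimm over threshold notify [%s] "
			"health_state: %s spares_percentage: %d",
			dimm, health, spares);
}

/* Hypothetical helper: send the notice to syslog at LOG_WARNING priority. */
void log_threshold_notice(const char *dimm, const char *health, int spares)
{
	char msg[256];

	format_threshold_notice(msg, sizeof(msg), dimm, health, spares);
	/* Pass the message as an argument, never as the format string. */
	syslog(LOG_WARNING, "%s", msg);
}
```

For example, log_threshold_notice("nmem2", "non-critical", 75) would log the
sample event at LOG_WARNING priority.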

About building nvdimmd: I am not sure whether ndctl, which uses automake, can
be merged into the kernel tree [1], so a simple Makefile is included. If it
needs to be converted to automake, please kindly let me know.

Here is the TODO list:

 - Currently, if multiple events are notified at the same time, nvdimmd may
   lose some of them. I suspect this is related to how select() behaves; I
   tried poll() instead of select(), but it did not work well, so I need to
   research select() and poll() more.
 - Change the Makefile to automake if necessary.
 - Add more DIMM information to the notification.
 - Add a config file to configure the parameters and initial settings for nvdimmd.
 - Implement a feature (framework) so that nvdimmd can call external applications.
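Regarding the lost-event TODO item: a common cause of missed notifications
with either select() or poll() is handling only the first ready descriptor per
wakeup, or not re-arming a sysfs attribute after it fires. The C sketch below
assumes (this is not confirmed by the patches) that nvdimmd waits on sysfs
attributes that the kernel signals with sysfs_notify(); for those, the mask
would be POLLPRI | POLLERR, and each attribute must be read again from offset
0 before the next poll(). All helper names are hypothetical.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

typedef void (*event_cb)(int fd, void *ctx);

/*
 * Hypothetical helper: wait once, then service EVERY descriptor whose
 * revents match `mask`, not just the first one. Returns the number of
 * descriptors serviced, 0 on timeout, or -1 on error (errno set by poll()).
 */
int service_ready_fds(struct pollfd *fds, nfds_t nfds, short mask,
		      int timeout_ms, event_cb cb, void *ctx)
{
	int ready, serviced = 0;

	assert(fds != NULL || nfds == 0);
	for (nfds_t i = 0; i < nfds; i++) {
		fds[i].events = mask;
		fds[i].revents = 0;
	}

	ready = poll(fds, nfds, timeout_ms);
	if (ready <= 0)
		return ready;

	for (nfds_t i = 0; i < nfds; i++) {
		if (fds[i].revents & mask) {
			cb(fds[i].fd, ctx);
			serviced++;
		}
	}
	return serviced;
}

/*
 * Re-arm a sysfs attribute after a notification by reading it again from
 * offset 0; otherwise the next poll() keeps reporting the old event.
 * Returns the number of bytes read (NUL-terminated in buf) or -1.
 */
ssize_t rearm_sysfs_attr(int fd, char *buf, size_t len)
{
	ssize_t n;

	assert(buf != NULL && len > 0);
	n = pread(fd, buf, len - 1, 0);
	if (n >= 0)
		buf[n] = '\0';
	return n;
}

/* Example callback that just counts events (ctx points to an int). */
void count_event(int fd, void *ctx)
{
	(void)fd;
	++*(int *)ctx;
}
```

A daemon loop would call service_ready_fds() repeatedly, with a callback that
calls rearm_sysfs_attr() on the signaled descriptor and then reports that
DIMM's health.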

[1] https://lkml.org/lkml/2017/7/26/245

--
QI Fuli <qi.fuli@jp.fujitsu.com>

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

* Re: [RFC patch 0/4]ndctl: nvdimmd: notify/monitor the feathers of over threshold event
  2017-09-01  1:42 [RFC patch 0/4]ndctl: nvdimmd: notify/monitor the feathers of over threshold event Qi, Fuli
@ 2017-09-01 21:12 ` Dan Williams
  2017-09-05 20:57   ` Dan Williams
  0 siblings, 1 reply; 8+ messages in thread
From: Dan Williams @ 2017-09-01 21:12 UTC (permalink / raw)
  To: Qi, Fuli; +Cc: linux-nvdimm

On Thu, Aug 31, 2017 at 6:42 PM, Qi, Fuli <qi.fuli@jp.fujitsu.com> wrote:
> Hello,
>
> This is my first time participating in the OSS world. I hope I can contribute
> to NVDIMM development.

Welcome!

> This is a patch set for nvdimmd, a tiny daemon that monitors over-threshold
> events. It finds and monitors all of the DIMMs that support SMART thresholds.
> Although nvdimmd currently only writes NVDIMM status to syslog, I would like
> to get comments/opinions.

Sure, I'll take a deeper look next week.

> The output includes the DIMM's name, health state, spares percentage, etc.
> Here is a sample of the output:
>
>             nvdimm warning: dimm over threshold notify [nmem2]
>             health_state: non-critical
>             spares_percentage: 75
>
> About building nvdimmd: I am not sure whether ndctl, which uses automake, can
> be merged into the kernel tree [1], so a simple Makefile is included. If it
> needs to be converted to automake, please kindly let me know.

Yes, please use automake for now to integrate with the current build system.

> Here is the TODO list:
>
>  - Currently, if multiple events are notified at the same time, nvdimmd may
>    lose some of them. I suspect this is related to how select() behaves; I
>    tried poll() instead of select(), but it did not work well, so I need to
>    research select() and poll() more.
>  - Change the Makefile to automake if necessary.
>  - Add more DIMM information to the notification.


I need to do some more research myself on which logging implementation is
easiest for other applications to consume, or whether this should just output
raw JSON and let another application worry about marshaling that to a logging
service.
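If the daemon emitted raw JSON, each event could be one self-contained object
per line, leaving it to another application to forward the records to a
logging service. The C sketch below is dependency-free for illustration;
ndctl's existing JSON output (for example ndctl list) is built on json-c,
which a real implementation would more likely reuse. The field names follow
the sample output; the timestamp field is a hypothetical addition.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append one byte, counting the would-be length like snprintf() does. */
static void put_ch(char *buf, size_t len, size_t *pos, char c)
{
	if (*pos + 1 < len)
		buf[*pos] = c;
	(*pos)++;
}

static void put_raw(char *buf, size_t len, size_t *pos, const char *s)
{
	size_t n = strlen(s);

	for (size_t i = 0; i < n; i++)
		put_ch(buf, len, pos, s[i]);
}

/* Append s as a quoted JSON string, escaping quotes, backslashes, control bytes. */
static void put_json_str(char *buf, size_t len, size_t *pos, const char *s)
{
	char esc[8];

	put_ch(buf, len, pos, '"');
	for (; *s; s++) {
		unsigned char c = (unsigned char)*s;

		if (c == '"' || c == '\\') {
			put_ch(buf, len, pos, '\\');
			put_ch(buf, len, pos, (char)c);
		} else if (c < 0x20) {
			snprintf(esc, sizeof(esc), "\\u%04x", (unsigned)c);
			put_raw(buf, len, pos, esc);
		} else {
			put_ch(buf, len, pos, (char)c);
		}
	}
	put_ch(buf, len, pos, '"');
}

/*
 * Hypothetical event serializer: one JSON object per event. Returns the
 * full length needed (excluding NUL); if len is too small the output is
 * truncated but always NUL-terminated.
 */
size_t format_event_json(char *buf, size_t len, const char *dimm,
			 const char *health, int spares, long long timestamp)
{
	char num[64];
	size_t pos = 0;

	assert(buf != NULL && len > 0);
	put_raw(buf, len, &pos, "{\"dimm\":");
	put_json_str(buf, len, &pos, dimm);
	put_raw(buf, len, &pos, ",\"health_state\":");
	put_json_str(buf, len, &pos, health);
	snprintf(num, sizeof(num), ",\"spares_percentage\":%d", spares);
	put_raw(buf, len, &pos, num);
	snprintf(num, sizeof(num), ",\"timestamp\":%lld}", timestamp);
	put_raw(buf, len, &pos, num);
	buf[pos < len ? pos : len - 1] = '\0';
	return pos;
}
```

Writing each record followed by a newline gives a stream that log shippers
and ad-hoc scripts can consume line by line.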

>  - Add a config file to configure the parameters and initial settings for nvdimmd.
>  - Implement a feature (framework) so that nvdimmd can call external applications.

Sounds good.

* Re: [RFC patch 0/4]ndctl: nvdimmd: notify/monitor the feathers of over threshold event
  2017-09-01 21:12 ` Dan Williams
@ 2017-09-05 20:57   ` Dan Williams
  2017-09-05 23:10     ` Song Liu
  0 siblings, 1 reply; 8+ messages in thread
From: Dan Williams @ 2017-09-05 20:57 UTC (permalink / raw)
  To: Qi, Fuli; +Cc: Song Liu, linux-nvdimm

[ adding Song ]

Song, I'm adding you to this thread for an opinion on what makes hardware
monitoring tooling easier to consume in a production environment. QI is
writing a daemon to catch alarms and notifications from the kernel's nvdimm
subsystem.

QI, Song presented on extending SCSI uevents for storage device event
logging at the last Linux Storage Summit.

* Re: [RFC patch 0/4]ndctl: nvdimmd: notify/monitor the feathers of over threshold event
  2017-09-05 20:57   ` Dan Williams
@ 2017-09-05 23:10     ` Song Liu
  2017-09-05 23:42       ` Dan Williams
  0 siblings, 1 reply; 8+ messages in thread
From: Song Liu @ 2017-09-05 23:10 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-nvdimm

Hi Dan,

Thanks for adding me to the thread.

Hi QI,

I am not familiar with the NVDIMM code, so my experience may not apply well
to your case.

We have done multiple designs to capture SCSI sense codes in a log. The first
thing we learned is: do NOT use kernel log messages (dmesg). dmesg can easily
get too noisy (for SCSI), which sometimes slows down the data path. Our first
version was based on dmesg, and it was not successful.

We have tried a few other options: ftrace, uevents, and a BPF-based solution.
The benefit of these solutions (over dmesg) is that they send structured data
instead of plain text. All of these solutions worked for me as prototypes. I
am currently pushing a BPF-based solution to production.

Another important lesson is to enable filtering and rate limiting, and to keep
both flexible.

I hope this is helpful for your use cases. Please feel free to email me with
any questions.

Thanks,
Song

* Re: [RFC patch 0/4]ndctl: nvdimmd: notify/monitor the feathers of over threshold event
  2017-09-05 23:10     ` Song Liu
@ 2017-09-05 23:42       ` Dan Williams
  2017-09-06  0:02         ` Song Liu
  0 siblings, 1 reply; 8+ messages in thread
From: Dan Williams @ 2017-09-05 23:42 UTC (permalink / raw)
  To: Song Liu; +Cc: linux-nvdimm

On Tue, Sep 5, 2017 at 4:10 PM, Song Liu <songliubraving@fb.com> wrote:

> Hi Dan,
>
> Thanks for adding me to the thread.
>
> Hi QI,
>
> I am not familiar with the NVDIMM code, so my experience may not apply well
> to your case.
>
> We have done multiple designs to capture SCSI sense codes in a log. The first
> thing we learned is: do NOT use kernel log messages (dmesg). dmesg can easily
> get too noisy (for SCSI), which sometimes slows down the data path. Our first
> version was based on dmesg, and it was not successful.
>
> We have tried a few other options: ftrace, uevents, and a BPF-based solution.
> The benefit of these solutions (over dmesg) is that they send structured data
> instead of plain text. All of these solutions worked for me as prototypes. I
> am currently pushing a BPF-based solution to production.
>
> Another important lesson is to enable filtering and rate limiting, and to keep
> both flexible.
>
> I hope this is helpful for your use cases. Please feel free to email me with
> any questions.

So, do you think it would be enough if this daemon just captured all the raw
nvdimm events from the kernel and turned them into JSON records? I'm wondering
what to do with the event data to make it easier for upper-layer software to
consume. Turn it into a Thrift endpoint?

In other words, I want this to be a general building block that can be
integrated into production environments with various monitoring
infrastructures, rather than hard-coding or prescribing a specific interface.

In any event, thanks for sharing your experience.

* Re: [RFC patch 0/4]ndctl: nvdimmd: notify/monitor the feathers of over threshold event
  2017-09-05 23:42       ` Dan Williams
@ 2017-09-06  0:02         ` Song Liu
  2017-09-06  0:07           ` Dan Williams
  0 siblings, 1 reply; 8+ messages in thread
From: Song Liu @ 2017-09-06  0:02 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-nvdimm

For that, I would say generating JSON is a solid building block. In my setup,
I use Scribe <https://en.wikipedia.org/wiki/Scribe_(log_server)>, and further
processing of the data happens after that. But I don't think Scribe is
available everywhere.

Besides SCSI events, JSON is the format preferred by many users of my tools.

Some system admins prefer to pull data on their own schedule (instead of
having a daemon push data to them), so being able to filter events by time
could be a useful feature.
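For admins who pull data on their own schedule, the daemon could keep recent
events in memory and answer "everything since time T" queries. A C sketch,
assuming a fixed-capacity ring buffer that overwrites the oldest entries; the
capacity, struct layout, and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define EVLOG_CAP 64	/* hypothetical capacity; oldest entries are overwritten */

struct event {
	long long ts;		/* event time, seconds since the epoch */
	char dimm[16];		/* e.g. "nmem2" */
	int spares;		/* spares percentage at event time */
};

struct evlog {
	struct event ev[EVLOG_CAP];
	size_t head;		/* index of the next slot to write */
	size_t count;		/* number of valid entries, <= EVLOG_CAP */
};

void evlog_push(struct evlog *log, long long ts, const char *dimm, int spares)
{
	struct event *e = &log->ev[log->head];

	assert(dimm != NULL);
	e->ts = ts;
	strncpy(e->dimm, dimm, sizeof(e->dimm) - 1);
	e->dimm[sizeof(e->dimm) - 1] = '\0';
	e->spares = spares;
	log->head = (log->head + 1) % EVLOG_CAP;
	if (log->count < EVLOG_CAP)
		log->count++;
}

/*
 * Copy events with ts >= since into out (oldest first), up to max entries.
 * Returns the number of events copied.
 */
size_t evlog_since(const struct evlog *log, long long since,
		   struct event *out, size_t max)
{
	size_t start = (log->head + EVLOG_CAP - log->count) % EVLOG_CAP;
	size_t n = 0;

	for (size_t i = 0; i < log->count && n < max; i++) {
		const struct event *e = &log->ev[(start + i) % EVLOG_CAP];

		if (e->ts >= since)
			out[n++] = *e;
	}
	return n;
}
```

A pull interface (a socket or a CLI query, both still undecided) could then
return the evlog_since() result serialized as JSON.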

I hope these are helpful. 

Song

* Re: [RFC patch 0/4]ndctl: nvdimmd: notify/monitor the feathers of over threshold event
  2017-09-06  0:02         ` Song Liu
@ 2017-09-06  0:07           ` Dan Williams
  2017-09-07  2:48             ` Qi, Fuli
  0 siblings, 1 reply; 8+ messages in thread
From: Dan Williams @ 2017-09-06  0:07 UTC (permalink / raw)
  To: Song Liu; +Cc: linux-nvdimm

On Tue, Sep 5, 2017 at 5:02 PM, Song Liu <songliubraving@fb.com> wrote:

> For that, I would say generating JSON is a solid building block. In my setup,
> I use Scribe <https://en.wikipedia.org/wiki/Scribe_(log_server)>, and further
> processing of the data happens after that. But I don't think Scribe is
> available everywhere.
>
> Besides SCSI events, JSON is the format preferred by many users of my tools.
>
> Some system admins prefer to pull data on their own schedule (instead of
> having a daemon push data to them), so being able to filter events by time
> could be a useful feature.
>
> I hope these are helpful.

Yes, definitely. Thanks Song!

* RE: [RFC patch 0/4]ndctl: nvdimmd: notify/monitor the feathers of over threshold event
  2017-09-06  0:07           ` Dan Williams
@ 2017-09-07  2:48             ` Qi, Fuli
  0 siblings, 0 replies; 8+ messages in thread
From: Qi, Fuli @ 2017-09-07  2:48 UTC (permalink / raw)
  To: 'Dan Williams', Song Liu; +Cc: linux-nvdimm

Hi,

Thank you very much for your comments.

I agree that dmesg easily gets too noisy, but I think some users of log
monitoring/analysis tools (e.g. Logstash, Fluentd) would like to get
warning/error information from syslog. To make the notifications easier for
upper-layer software to consume, I am thinking about adding a config file
(/etc/nvdimmd/nvdimmd.conf) that lets users choose the output that best suits
their needs: write the notifications to syslog, write structured JSON data to
a dedicated file (the path is configurable, e.g. /var/log/nvdimmd.log), or
both, by setting parameters in the config file.
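The config file could start as simple "key = value" lines. A minimal C sketch
of a line parser, assuming hypothetical keys "output" (syslog, file, or both)
and "log_path"; the real parameter names and file format are still to be
decided in this thread.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

enum { OUT_SYSLOG = 1 << 0, OUT_FILE = 1 << 1 };

/* Hypothetical in-memory form of /etc/nvdimmd/nvdimmd.conf. */
struct nvdimmd_conf {
	unsigned int output;	/* OUT_SYSLOG, OUT_FILE, or both */
	char log_path[256];	/* JSON output file, used with OUT_FILE */
};

/* Trim leading/trailing whitespace in place; returns the trimmed start. */
static char *trim(char *s)
{
	char *end;

	while (isspace((unsigned char)*s))
		s++;
	end = s + strlen(s);
	while (end > s && isspace((unsigned char)end[-1]))
		end--;
	*end = '\0';
	return s;
}

/*
 * Parse one "key = value" line (modified in place). Blank lines and '#'
 * comments are ignored. Returns 0 on success, -1 for a malformed line or
 * an unknown key/value.
 */
int conf_parse_line(struct nvdimmd_conf *conf, char *line)
{
	char *hash, *eq, *key, *val;

	assert(conf != NULL && line != NULL);
	hash = strchr(line, '#');
	if (hash)
		*hash = '\0';
	line = trim(line);
	if (*line == '\0')
		return 0;

	eq = strchr(line, '=');
	if (!eq)
		return -1;
	*eq = '\0';
	key = trim(line);
	val = trim(eq + 1);

	if (strcmp(key, "output") == 0) {
		if (strcmp(val, "syslog") == 0)
			conf->output = OUT_SYSLOG;
		else if (strcmp(val, "file") == 0)
			conf->output = OUT_FILE;
		else if (strcmp(val, "both") == 0)
			conf->output = OUT_SYSLOG | OUT_FILE;
		else
			return -1;
		return 0;
	}
	if (strcmp(key, "log_path") == 0) {
		/* Require an absolute path that fits the buffer. */
		if (val[0] != '/' || strlen(val) >= sizeof(conf->log_path))
			return -1;
		snprintf(conf->log_path, sizeof(conf->log_path), "%s", val);
		return 0;
	}
	return -1;
}
```

Rejecting unknown keys, rather than silently ignoring them, makes typos in the
config file visible at startup.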

Users can also set up filters through parameters in the config file, for
example the names of the monitored DIMMs, notification time, frequency, etc.
If you think this is the right approach, I will list the parameters that I can
think of for now, and then we can discuss the details.
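The DIMM-name filter could be as simple as a space- or comma-separated list in
the config file. A C sketch, assuming a hypothetical list format in which an
empty list means that every DIMM is monitored:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if `name` appears as a whole token in `list`, where tokens
 * are separated by spaces, tabs, or commas. An empty list means "monitor
 * every DIMM". Both the list format and this helper are hypothetical.
 */
bool dimm_is_monitored(const char *list, const char *name)
{
	const char *seps = " ,\t";
	size_t n, tok;
	bool any = false;

	assert(list != NULL && name != NULL);
	n = strlen(name);
	while (*list) {
		list += strspn(list, seps);	/* skip separators */
		tok = strcspn(list, seps);	/* length of the next token */
		if (tok == 0)
			break;
		any = true;
		if (tok == n && strncmp(list, name, n) == 0)
			return true;
		list += tok;
	}
	return !any;
}
```

Matching whole tokens avoids treating "nmem1" as a match for "nmem10".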

Your comments/opinions would also be highly appreciated.

P.S. Unfortunately, I will be out of the office next week and back on September 19th.

QI

Thread overview: 8+ messages
2017-09-01  1:42 [RFC patch 0/4]ndctl: nvdimmd: notify/monitor the feathers of over threshold event Qi, Fuli
2017-09-01 21:12 ` Dan Williams
2017-09-05 20:57   ` Dan Williams
2017-09-05 23:10     ` Song Liu
2017-09-05 23:42       ` Dan Williams
2017-09-06  0:02         ` Song Liu
2017-09-06  0:07           ` Dan Williams
2017-09-07  2:48             ` Qi, Fuli