* [PATCH 0/3] rasdaemon: ras-mc-ctl: Add exception handling and support memory_failure_event
@ 2020-11-03 14:22 Shiju Jose
  2020-11-03 14:22 ` [PATCH 1/3] rasdaemon: ras-mc-ctl: Modify ARM processor error summary log Shiju Jose
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Shiju Jose @ 2020-11-03 14:22 UTC (permalink / raw)
  To: linux-edac, mchehab+huawei; +Cc: linuxarm, tanxiaofei, shiju.jose

Add exception handling to ras-mc-ctl and support for memory_failure_event.

Shiju Jose (3):
  rasdaemon: ras-mc-ctl: Modify ARM processor error summary log
  rasdaemon: ras-mc-ctl: Add memory failure events
  rasdaemon: ras-mc-ctl: Add exception handling

 util/ras-mc-ctl.in | 612 ++++++++++++++++++++++++++-------------------
 1 file changed, 359 insertions(+), 253 deletions(-)

-- 
2.17.1



* [PATCH 1/3] rasdaemon: ras-mc-ctl: Modify ARM processor error summary log
  2020-11-03 14:22 [PATCH 0/3] rasdaemon: ras-mc-ctl: Add exception handling and support memory_failure_event Shiju Jose
@ 2020-11-03 14:22 ` Shiju Jose
  2020-11-03 14:22 ` [PATCH 2/3] rasdaemon: ras-mc-ctl: Add memory failure events Shiju Jose
  2020-11-03 14:22 ` [PATCH 3/3] rasdaemon: ras-mc-ctl: Add exception handling Shiju Jose
  2 siblings, 0 replies; 7+ messages in thread
From: Shiju Jose @ 2020-11-03 14:22 UTC (permalink / raw)
  To: linux-edac, mchehab+huawei; +Cc: linuxarm, tanxiaofei, shiju.jose

Add the CPU's mpidr to the ARM processor error summary log and
group the error counts by mpidr only.

Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
 util/ras-mc-ctl.in | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/util/ras-mc-ctl.in b/util/ras-mc-ctl.in
index dd7d56f..d8abdbd 100755
--- a/util/ras-mc-ctl.in
+++ b/util/ras-mc-ctl.in
@@ -1123,7 +1123,7 @@ sub summary
     my ($err_type, $label, $mc, $top, $mid, $low, $count, $msg);
     my ($etype, $severity, $etype_string, $severity_string);
     my ($dev_name, $dev);
-    my ($affinity, $mpidr);
+    my ($mpidr);
 
     my $dbh = DBI->connect("dbi:SQLite:dbname=$dbname", "", "", {});
 
@@ -1160,13 +1160,13 @@ sub summary
     $query_handle->finish;
 
     # ARM processor arm_event errors
-    $query = "select affinity, mpidr, count(*) from arm_event group by affinity, mpidr";
+    $query = "select mpidr, count(*) from arm_event group by mpidr";
     $query_handle = $dbh->prepare($query);
     $query_handle->execute();
-    $query_handle->bind_columns(\($affinity, $mpidr, $count));
+    $query_handle->bind_columns(\($mpidr, $count));
     $out = "";
     while($query_handle->fetch()) {
-        $out .= "\t$count errors\n";
+        $out .= sprintf "\tCPU(mpidr=0x%x) has %d errors\n", $mpidr, $count;
     }
     if ($out ne "") {
         print "ARM processor events summary:\n$out\n";
-- 
2.17.1



* [PATCH 2/3] rasdaemon: ras-mc-ctl: Add memory failure events
  2020-11-03 14:22 [PATCH 0/3] rasdaemon: ras-mc-ctl: Add exception handling and support memory_failure_event Shiju Jose
  2020-11-03 14:22 ` [PATCH 1/3] rasdaemon: ras-mc-ctl: Modify ARM processor error summary log Shiju Jose
@ 2020-11-03 14:22 ` Shiju Jose
  2020-11-03 14:22 ` [PATCH 3/3] rasdaemon: ras-mc-ctl: Add exception handling Shiju Jose
  2 siblings, 0 replies; 7+ messages in thread
From: Shiju Jose @ 2020-11-03 14:22 UTC (permalink / raw)
  To: linux-edac, mchehab+huawei; +Cc: linuxarm, tanxiaofei, shiju.jose

Add support for memory failure errors (memory_failure_event)
to the ras-mc-ctl tool.

Sample log:
ras-mc-ctl --summary
...
Memory failure events summary:
        Delayed errors: 4
        Failed errors: 1
...

ras-mc-ctl --errors
...
Memory failure events:
1 2020-10-28 23:20:41 -0800 error: pfn=0x204000000, page_type=free buddy page, action_result=Delayed
2 2020-10-28 23:31:38 -0800 error: pfn=0x204000000, page_type=free buddy page, action_result=Delayed
3 2020-10-28 23:54:54 -0800 error: pfn=0x205000000, page_type=free buddy page, action_result=Delayed
4 2020-10-29 00:12:25 -0800 error: pfn=0x204000000, page_type=free buddy page, action_result=Delayed
5 2020-10-29 00:26:36 -0800 error: pfn=0x204000000, page_type=free buddy page, action_result=Failed
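
The summary counts above follow from grouping the five sample records by
action_result. A minimal sketch of that grouping, assuming DBI and
DBD::SQLite are installed and using an in-memory database; the column names
come from the queries in this patch, while the column types and the %counts
hash are assumptions made for illustration:

```perl
use strict;
use warnings;
use DBI;

# In-memory stand-in for the rasdaemon database (assumption for illustration).
my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "", { RaiseError => 1 });

# Column names are taken from the patch; the column types are a guess.
$dbh->do("create table memory_failure_event (id integer primary key, "
       . "timestamp text, pfn text, page_type text, action_result text)");

# The five records from the sample log.
my $insert = $dbh->prepare("insert into memory_failure_event "
       . "(timestamp, pfn, page_type, action_result) values (?, ?, ?, ?)");
for my $row (
    ["2020-10-28 23:20:41 -0800", "0x204000000", "free buddy page", "Delayed"],
    ["2020-10-28 23:31:38 -0800", "0x204000000", "free buddy page", "Delayed"],
    ["2020-10-28 23:54:54 -0800", "0x205000000", "free buddy page", "Delayed"],
    ["2020-10-29 00:12:25 -0800", "0x204000000", "free buddy page", "Delayed"],
    ["2020-10-29 00:26:36 -0800", "0x204000000", "free buddy page", "Failed"],
) {
    $insert->execute(@$row);
}

# Same grouping query as the summary code in the patch.
my %counts;
my $sth = $dbh->prepare("select action_result, count(*) "
       . "from memory_failure_event group by action_result");
$sth->execute();
while (my ($result, $count) = $sth->fetchrow_array()) {
    $counts{$result} = $count;
}
$sth->finish;

print "Memory failure events summary:\n";
print "\t$_ errors: $counts{$_}\n" for sort keys %counts;
```

The keys are sorted before printing because the query has no "order by"
clause, so the row order returned by SQLite should not be relied on.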

Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
 util/ras-mc-ctl.in | 36 +++++++++++++++++++++++++++++++++++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/util/ras-mc-ctl.in b/util/ras-mc-ctl.in
index d8abdbd..eebcc4e 100755
--- a/util/ras-mc-ctl.in
+++ b/util/ras-mc-ctl.in
@@ -1120,7 +1120,7 @@ sub summary
 {
     require DBI;
     my ($query, $query_handle, $out);
-    my ($err_type, $label, $mc, $top, $mid, $low, $count, $msg);
+    my ($err_type, $label, $mc, $top, $mid, $low, $count, $msg, $action_result);
     my ($etype, $severity, $etype_string, $severity_string);
     my ($dev_name, $dev);
     my ($mpidr);
@@ -1225,6 +1225,22 @@ sub summary
     }
     $query_handle->finish;
 
+    # Memory failure errors
+    $query = "select action_result, count(*) from memory_failure_event group by action_result";
+    $query_handle = $dbh->prepare($query);
+    $query_handle->execute();
+    $query_handle->bind_columns(\($action_result, $count));
+    $out = "";
+    while($query_handle->fetch()) {
+        $out .= "\t$action_result errors: $count\n";
+    }
+    if ($out ne "") {
+        print "Memory failure events summary:\n$out\n";
+    } else {
+        print "No Memory failure errors.\n\n";
+    }
+    $query_handle->finish;
+
     # MCE mce_record errors
     $query = "select error_msg, count(*) from mce_record group by error_msg";
     $query_handle = $dbh->prepare($query);
@@ -1253,6 +1269,7 @@ sub errors
     my ($bus_name, $dev_name, $driver_name, $reporter_name);
     my ($dev, $sector, $nr_sector, $error, $rwbs, $cmd);
     my ($error_count, $affinity, $mpidr, $r_state, $psci_state);
+    my ($pfn, $page_type, $action_result);
 
     my $dbh = DBI->connect("dbi:SQLite:dbname=$dbname", "", "", {});
 
@@ -1384,6 +1401,23 @@ sub errors
     }
     $query_handle->finish;
 
+    # Memory failure errors
+    $query = "select id, timestamp, pfn, page_type, action_result from memory_failure_event order by id";
+    $query_handle = $dbh->prepare($query);
+    $query_handle->execute();
+    $query_handle->bind_columns(\($id, $timestamp, $pfn, $page_type, $action_result));
+    $out = "";
+    while($query_handle->fetch()) {
+        $out .= "$id $timestamp error: ";
+        $out .= "pfn=$pfn, page_type=$page_type, action_result=$action_result\n";
+    }
+    if ($out ne "") {
+        print "Memory failure events:\n$out\n";
+    } else {
+        print "No Memory failure errors.\n\n";
+    }
+    $query_handle->finish;
+
     # MCE mce_record errors
     $query = "select id, timestamp, mcgcap, mcgstatus, status, addr, misc, ip, tsc, walltime, cpu, cpuid, apicid, socketid, cs, bank, cpuvendor, bank_name, error_msg, mcgstatus_msg, mcistatus_msg, user_action, mc_location from mce_record order by id";
     $query_handle = $dbh->prepare($query);
-- 
2.17.1



* [PATCH 3/3] rasdaemon: ras-mc-ctl: Add exception handling
  2020-11-03 14:22 [PATCH 0/3] rasdaemon: ras-mc-ctl: Add exception handling and support memory_failure_event Shiju Jose
  2020-11-03 14:22 ` [PATCH 1/3] rasdaemon: ras-mc-ctl: Modify ARM processor error summary log Shiju Jose
  2020-11-03 14:22 ` [PATCH 2/3] rasdaemon: ras-mc-ctl: Add memory failure events Shiju Jose
@ 2020-11-03 14:22 ` Shiju Jose
  2020-12-23 10:03   ` Mauro Carvalho Chehab
  2 siblings, 1 reply; 7+ messages in thread
From: Shiju Jose @ 2020-11-03 14:22 UTC (permalink / raw)
  To: linux-edac, mchehab+huawei; +Cc: linuxarm, tanxiaofei, shiju.jose

Add exception handling to ras-mc-ctl.

For example, when an event's table is not present in the SQLite DB,
DBI reports an error and ras-mc-ctl exits without reading and
logging the remaining events' information. This happens when an
event is not enabled in rasdaemon. The following errors are logged
when the devlink_event table is not present in the DB:
"DBD::SQLite::db prepare failed: no such table: devlink_event at ./ras-mc-ctl line 1198.
Can't call method "execute" on an undefined value at ./ras-mc-ctl line 1199"

Also disable DBI's automatic error logging by setting
$dbh->{PrintError} = 0 to avoid duplicate exception logs.
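
The failure mode and the fix described above can be sketched in isolation.
This is a minimal illustration, not the code from the patch: it assumes DBI,
DBD::SQLite and Try::Tiny are installed, uses an in-memory database with no
event tables, and the summarize() helper is hypothetical.

```perl
use strict;
use warnings;
use DBI;
use Try::Tiny;

# Empty in-memory database, standing in for a rasdaemon DB in which an
# event type was never enabled (assumption for illustration).
my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "", {});
# Suppress DBI's own warning; the catch block reports the error instead.
$dbh->{PrintError} = 0;

# Hypothetical helper: run one summary query and report a failure without
# stopping the caller. Returns 1 on success and 0 on failure.
sub summarize {
    my ($table, $column) = @_;
    my $ok = 0;
    try {
        my $sth = $dbh->prepare(
            "select $column, count(*) from $table group by $column");
        # With RaiseError off, prepare() returns undef when the table is
        # missing, so this method call dies and control moves to catch.
        $sth->execute();
        while (my ($key, $count) = $sth->fetchrow_array()) {
            print "\t$key has $count errors\n";
        }
        $sth->finish;
        $ok = 1;
    } catch {
        print "Exception: $DBI::errstr\n";
    };
    return $ok;
}

# Missing table: the error is reported and execution continues.
summarize("devlink_event", "dev_name");

# A later query still runs after the earlier failure.
$dbh->do("create table disk_errors (dev text)");
$dbh->do("insert into disk_errors values ('sda')");
summarize("disk_errors", "dev");
```

Without the try/catch wrapper, the call to execute() on the undefined
statement handle would end the script before the second query ran.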

Reported-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
 util/ras-mc-ctl.in | 632 +++++++++++++++++++++++++--------------------
 1 file changed, 352 insertions(+), 280 deletions(-)

diff --git a/util/ras-mc-ctl.in b/util/ras-mc-ctl.in
index eebcc4e..07b52e9 100755
--- a/util/ras-mc-ctl.in
+++ b/util/ras-mc-ctl.in
@@ -34,6 +34,7 @@ use File::Basename;
 use File::Find;
 use Getopt::Long;
 use POSIX;
+use Try::Tiny;
 
 my $dbname      = "@RASSTATEDIR@/@RAS_DB_FNAME@";
 my $prefix      = "@prefix@";
@@ -1126,136 +1127,171 @@ sub summary
     my ($mpidr);
 
     my $dbh = DBI->connect("dbi:SQLite:dbname=$dbname", "", "", {});
+    # Disable the DBI automatic exception log
+    $dbh->{PrintError} = 0;
 
     # Memory controller mc_event errors
-    $query = "select err_type, label, mc, top_layer,middle_layer,lower_layer, count(*) from mc_event group by err_type, label, mc, top_layer, middle_layer, lower_layer";
-    $query_handle = $dbh->prepare($query);
-    $query_handle->execute();
-    $query_handle->bind_columns(\($err_type, $label, $mc, $top, $mid, $low, $count));
-    $out = "";
-    while($query_handle->fetch()) {
-        $out .= "\t$err_type on DIMM Label(s): '$label' location: $mc:$top:$mid:$low errors: $count\n";
-    }
-    if ($out ne "") {
-        print "Memory controller events summary:\n$out\n";
-    } else {
-        print "No Memory errors.\n\n";
-    }
-    $query_handle->finish;
+    try {
+        $query = "select err_type, label, mc, top_layer,middle_layer,lower_layer, count(*) from mc_event group by err_type, label, mc, top_layer, middle_layer, lower_layer";
+        $query_handle = $dbh->prepare($query);
+        $query_handle->execute();
+        $query_handle->bind_columns(\($err_type, $label, $mc, $top, $mid, $low, $count));
+        $out = "";
+        while($query_handle->fetch()) {
+            $out .= "\t$err_type on DIMM Label(s): '$label' location: $mc:$top:$mid:$low errors: $count\n";
+        }
+        if ($out ne "") {
+            print "Memory controller events summary:\n$out\n";
+        } else {
+            print "No Memory errors.\n\n";
+        }
+        $query_handle->finish;
+    } catch {
+        print "Exception: $DBI::errstr\n";
+        log_error ("mc_event table missing from $dbname. Run 'rasdaemon --record'.\n\n");
+    };
 
     # PCIe AER aer_event errors
-    $query = "select err_type, err_msg, count(*) from aer_event group by err_type, err_msg";
-    $query_handle = $dbh->prepare($query);
-    $query_handle->execute();
-    $query_handle->bind_columns(\($err_type, $msg, $count));
-    $out = "";
-    while($query_handle->fetch()) {
-        $out .= "\t$count $err_type errors: $msg\n";
-    }
-    if ($out ne "") {
-        print "PCIe AER events summary:\n$out\n";
-    } else {
-        print "No PCIe AER errors.\n\n";
-    }
-    $query_handle->finish;
+    try {
+        $query = "select err_type, err_msg, count(*) from aer_event group by err_type, err_msg";
+        $query_handle = $dbh->prepare($query);
+        $query_handle->execute();
+        $query_handle->bind_columns(\($err_type, $msg, $count));
+        $out = "";
+        while($query_handle->fetch()) {
+            $out .= "\t$count $err_type errors: $msg\n";
+        }
+        if ($out ne "") {
+            print "PCIe AER events summary:\n$out\n";
+        } else {
+            print "No PCIe AER errors.\n\n";
+        }
+        $query_handle->finish;
+    } catch {
+        print "Exception: $DBI::errstr\n\n";
+    };
 
     # ARM processor arm_event errors
-    $query = "select mpidr, count(*) from arm_event group by mpidr";
-    $query_handle = $dbh->prepare($query);
-    $query_handle->execute();
-    $query_handle->bind_columns(\($mpidr, $count));
-    $out = "";
-    while($query_handle->fetch()) {
-        $out .= sprintf "\tCPU(mpidr=0x%x) has %d errors\n", $mpidr, $count;
-    }
-    if ($out ne "") {
-        print "ARM processor events summary:\n$out\n";
-    } else {
-        print "No ARM processor errors.\n\n";
-    }
-    $query_handle->finish;
+    try {
+        $query = "select mpidr, count(*) from arm_event group by mpidr";
+        $query_handle = $dbh->prepare($query);
+        $query_handle->execute();
+        $query_handle->bind_columns(\($mpidr, $count));
+        $out = "";
+        while($query_handle->fetch()) {
+            $out .= sprintf "\tCPU(mpidr=0x%x) has %d errors\n", $mpidr, $count;
+        }
+        if ($out ne "") {
+            print "ARM processor events summary:\n$out\n";
+        } else {
+            print "No ARM processor errors.\n\n";
+        }
+        $query_handle->finish;
+    } catch {
+        print "Exception: $DBI::errstr\n\n";
+    };
 
     # extlog errors
-    $query = "select etype, severity, count(*) from extlog_event group by etype, severity";
-    $query_handle = $dbh->prepare($query);
-    $query_handle->execute();
-    $query_handle->bind_columns(\($etype, $severity, $count));
-    $out = "";
-    while($query_handle->fetch()) {
-        $etype_string = get_extlog_type($etype);
-        $severity_string = get_extlog_severity($severity);
-        $out .= "\t$count $etype_string $severity_string errors\n";
-    }
-    if ($out ne "") {
-        print "Extlog records summary:\n$out";
-    } else {
-        print "No Extlog errors.\n\n";
-    }
-    $query_handle->finish;
+    try {
+        $query = "select etype, severity, count(*) from extlog_event group by etype, severity";
+        $query_handle = $dbh->prepare($query);
+        $query_handle->execute();
+        $query_handle->bind_columns(\($etype, $severity, $count));
+        $out = "";
+        while($query_handle->fetch()) {
+            $etype_string = get_extlog_type($etype);
+            $severity_string = get_extlog_severity($severity);
+            $out .= "\t$count $etype_string $severity_string errors\n";
+        }
+        if ($out ne "") {
+            print "Extlog records summary:\n$out";
+        } else {
+            print "No Extlog errors.\n\n";
+        }
+        $query_handle->finish;
+    } catch {
+        print "Exception: $DBI::errstr\n\n";
+    };
 
     # devlink errors
-    $query = "select dev_name, count(*) from devlink_event group by dev_name";
-    $query_handle = $dbh->prepare($query);
-    $query_handle->execute();
-    $query_handle->bind_columns(\($dev_name, $count));
-    $out = "";
-    while($query_handle->fetch()) {
-        $out .= "\t$dev_name has $count errors\n";
-    }
-    if ($out ne "") {
-        print "Devlink records summary:\n$out";
-    } else {
-        print "No devlink errors.\n";
-    }
-    $query_handle->finish;
+    try {
+        $query = "select dev_name, count(*) from devlink_event group by dev_name";
+        $query_handle = $dbh->prepare($query);
+        $query_handle->execute();
+        $query_handle->bind_columns(\($dev_name, $count));
+        $out = "";
+        while($query_handle->fetch()) {
+            $out .= "\t$dev_name has $count errors\n";
+        }
+        if ($out ne "") {
+            print "Devlink records summary:\n$out";
+        } else {
+            print "No devlink errors.\n";
+        }
+        $query_handle->finish;
+    } catch {
+        print "Exception: $DBI::errstr\n\n";
+    };
 
     # Disk errors
-    $query = "select dev, count(*) from disk_errors group by dev";
-    $query_handle = $dbh->prepare($query);
-    $query_handle->execute();
-    $query_handle->bind_columns(\($dev, $count));
-    $out = "";
-    while($query_handle->fetch()) {
-        $out .= "\t$dev has $count errors\n";
-    }
-    if ($out ne "") {
-        print "Disk errors summary:\n$out";
-    } else {
-        print "No disk errors.\n";
-    }
-    $query_handle->finish;
+    try {
+        $query = "select dev, count(*) from disk_errors group by dev";
+        $query_handle = $dbh->prepare($query);
+        $query_handle->execute();
+        $query_handle->bind_columns(\($dev, $count));
+        $out = "";
+        while($query_handle->fetch()) {
+            $out .= "\t$dev has $count errors\n";
+        }
+        if ($out ne "") {
+            print "Disk errors summary:\n$out";
+        } else {
+            print "No disk errors.\n";
+        }
+        $query_handle->finish;
+    } catch {
+        print "Exception: $DBI::errstr\n\n";
+    };
 
     # Memory failure errors
-    $query = "select action_result, count(*) from memory_failure_event group by action_result";
-    $query_handle = $dbh->prepare($query);
-    $query_handle->execute();
-    $query_handle->bind_columns(\($action_result, $count));
-    $out = "";
-    while($query_handle->fetch()) {
-        $out .= "\t$action_result errors: $count\n";
-    }
-    if ($out ne "") {
-        print "Memory failure events summary:\n$out\n";
-    } else {
-        print "No Memory failure errors.\n\n";
-    }
-    $query_handle->finish;
+    try {
+        $query = "select action_result, count(*) from memory_failure_event group by action_result";
+        $query_handle = $dbh->prepare($query);
+        $query_handle->execute();
+        $query_handle->bind_columns(\($action_result, $count));
+        $out = "";
+        while($query_handle->fetch()) {
+            $out .= "\t$action_result errors: $count\n";
+        }
+        if ($out ne "") {
+            print "Memory failure events summary:\n$out\n";
+        } else {
+            print "No Memory failure errors.\n\n";
+        }
+        $query_handle->finish;
+    } catch {
+        print "Exception: $DBI::errstr\n\n";
+    };
 
     # MCE mce_record errors
-    $query = "select error_msg, count(*) from mce_record group by error_msg";
-    $query_handle = $dbh->prepare($query);
-    $query_handle->execute();
-    $query_handle->bind_columns(\($msg, $count));
-    $out = "";
-    while($query_handle->fetch()) {
-        $out .= "\t$count $msg errors\n";
-    }
-    if ($out ne "") {
-        print "MCE records summary:\n$out";
-    } else {
-        print "No MCE errors.\n";
-    }
-    $query_handle->finish;
+    try {
+        $query = "select error_msg, count(*) from mce_record group by error_msg";
+        $query_handle = $dbh->prepare($query);
+        $query_handle->execute();
+        $query_handle->bind_columns(\($msg, $count));
+        $out = "";
+        while($query_handle->fetch()) {
+            $out .= "\t$count $msg errors\n";
+        }
+        if ($out ne "") {
+            print "MCE records summary:\n$out";
+        } else {
+            print "No MCE errors.\n";
+        }
+        $query_handle->finish;
+    } catch {
+        print "Exception: $DBI::errstr\n\n";
+    };
 
     undef($dbh);
 }
@@ -1272,189 +1308,225 @@ sub errors
     my ($pfn, $page_type, $action_result);
 
     my $dbh = DBI->connect("dbi:SQLite:dbname=$dbname", "", "", {});
+    # Disable the DBI automatic exception log
+    $dbh->{PrintError} = 0;
 
     # Memory controller mc_event errors
-    $query = "select id, timestamp, err_count, err_type, err_msg, label, mc, top_layer,middle_layer,lower_layer, address, grain, syndrome, driver_detail from mc_event order by id";
-    $query_handle = $dbh->prepare($query);
-    if (!$query_handle) {
+    try {
+        $query = "select id, timestamp, err_count, err_type, err_msg, label, mc, top_layer,middle_layer,lower_layer, address, grain, syndrome, driver_detail from mc_event order by id";
+        $query_handle = $dbh->prepare($query);
+        if (!$query_handle) {
+            log_error ("mc_event table missing from $dbname. Run 'rasdaemon --record'.\n");
+            exit -1
+        }
+        $query_handle->execute();
+        $query_handle->bind_columns(\($id, $time, $count, $type, $msg, $label, $mc, $top, $mid, $low, $addr, $grain, $syndrome, $detail));
+        $out = "";
+        while($query_handle->fetch()) {
+            $out .= "$id $time $count $type error(s): $msg at $label location: $mc:$top:$mid:$low, addr $addr, grain $grain, syndrome $syndrome $detail\n";
+        }
+        if ($out ne "") {
+            print "Memory controller events:\n$out\n";
+        } else {
+            print "No Memory errors.\n\n";
+        }
+        $query_handle->finish;
+    } catch {
+        print "Exception: $DBI::errstr\n\n";
         log_error ("mc_event table missing from $dbname. Run 'rasdaemon --record'.\n");
-        exit -1
-    }
-    $query_handle->execute();
-    $query_handle->bind_columns(\($id, $time, $count, $type, $msg, $label, $mc, $top, $mid, $low, $addr, $grain, $syndrome, $detail));
-    $out = "";
-    while($query_handle->fetch()) {
-        $out .= "$id $time $count $type error(s): $msg at $label location: $mc:$top:$mid:$low, addr $addr, grain $grain, syndrome $syndrome $detail\n";
-    }
-    if ($out ne "") {
-        print "Memory controller events:\n$out\n";
-    } else {
-        print "No Memory errors.\n\n";
-    }
-    $query_handle->finish;
+	exit -1
+    };
 
     # PCIe AER aer_event errors
-    $query = "select id, timestamp, dev_name, err_type, err_msg from aer_event order by id";
-    $query_handle = $dbh->prepare($query);
-    $query_handle->execute();
-    $query_handle->bind_columns(\($id, $time, $devname, $type, $msg));
-    $out = "";
-    while($query_handle->fetch()) {
-        $out .= "$id $time $devname $type error: $msg\n";
-    }
-    if ($out ne "") {
-        print "PCIe AER events:\n$out\n";
-    } else {
-        print "No PCIe AER errors.\n\n";
-    }
-    $query_handle->finish;
+    try {
+        $query = "select id, timestamp, dev_name, err_type, err_msg from aer_event order by id";
+        $query_handle = $dbh->prepare($query);
+        $query_handle->execute();
+        $query_handle->bind_columns(\($id, $time, $devname, $type, $msg));
+        $out = "";
+        while($query_handle->fetch()) {
+            $out .= "$id $time $devname $type error: $msg\n";
+        }
+        if ($out ne "") {
+            print "PCIe AER events:\n$out\n";
+        } else {
+           print "No PCIe AER errors.\n\n";
+        }
+        $query_handle->finish;
+    } catch {
+        print "Exception: $DBI::errstr\n\n";
+    };
 
     # ARM processor arm_event errors
-    $query = "select id, timestamp, error_count, affinity, mpidr, running_state, psci_state from arm_event order by id";
-    $query_handle = $dbh->prepare($query);
-    $query_handle->execute();
-    $query_handle->bind_columns(\($id, $timestamp, $error_count, $affinity, $mpidr, $r_state, $psci_state));
-    $out = "";
-    while($query_handle->fetch()) {
-        $out .= "$id $timestamp error: ";
-        $out .= "error_count=$error_count, " if ($error_count);
-        $out .= "affinity_level=$affinity, ";
-        $out .= sprintf "mpidr=0x%x, ", $mpidr;
-        $out .= sprintf "running_state=0x%x, ", $r_state;
-        $out .= sprintf "psci_state=0x%x", $psci_state;
-        $out .= "\n";
-    }
-    if ($out ne "") {
-        print "ARM processor events:\n$out\n";
-    } else {
-        print "No ARM processor errors.\n\n";
-    }
-    $query_handle->finish;
+    try {
+        $query = "select id, timestamp, error_count, affinity, mpidr, running_state, psci_state from arm_event order by id";
+        $query_handle = $dbh->prepare($query);
+        $query_handle->execute();
+        $query_handle->bind_columns(\($id, $timestamp, $error_count, $affinity, $mpidr, $r_state, $psci_state));
+        $out = "";
+        while($query_handle->fetch()) {
+            $out .= "$id $timestamp error: ";
+            $out .= "error_count=$error_count, " if ($error_count);
+            $out .= "affinity_level=$affinity, ";
+            $out .= sprintf "mpidr=0x%x, ", $mpidr;
+            $out .= sprintf "running_state=0x%x, ", $r_state;
+            $out .= sprintf "psci_state=0x%x", $psci_state;
+            $out .= "\n";
+        }
+        if ($out ne "") {
+            print "ARM processor events:\n$out\n";
+        } else {
+            print "No ARM processor errors.\n\n";
+        }
+        $query_handle->finish;
+    } catch {
+        print "Exception: $DBI::errstr\n\n";
+    };
 
     # Extlog errors
-    $query = "select id, timestamp, etype, severity, address, fru_id, fru_text, cper_data from extlog_event order by id";
-    $query_handle = $dbh->prepare($query);
-    $query_handle->execute();
-    $query_handle->bind_columns(\($id, $timestamp, $etype, $severity, $addr, $fru_id, $fru_text, $cper_data));
-    $out = "";
-    while($query_handle->fetch()) {
-        $etype_string = get_extlog_type($etype);
-        $severity_string = get_extlog_severity($severity);
-        $out .= "$id $timestamp error: ";
-        $out .= "type=$etype_string, ";
-        $out .= "severity=$severity_string, ";
-        $out .= sprintf "address=0x%08x, ", $addr;
-        $out .= sprintf "fru_id=%s, ", get_uuid_le($fru_id);
-        $out .= "fru_text='$fru_text', ";
-        $out .= get_cper_data_text($cper_data) if ($cper_data);
-        $out .= "\n";
-    }
-    if ($out ne "") {
-        print "Extlog events:\n$out\n";
-    } else {
-        print "No Extlog errors.\n\n";
-    }
-    $query_handle->finish;
+    try {
+        $query = "select id, timestamp, etype, severity, address, fru_id, fru_text, cper_data from extlog_event order by id";
+        $query_handle = $dbh->prepare($query);
+        $query_handle->execute();
+        $query_handle->bind_columns(\($id, $timestamp, $etype, $severity, $addr, $fru_id, $fru_text, $cper_data));
+        $out = "";
+        while($query_handle->fetch()) {
+            $etype_string = get_extlog_type($etype);
+            $severity_string = get_extlog_severity($severity);
+            $out .= "$id $timestamp error: ";
+            $out .= "type=$etype_string, ";
+            $out .= "severity=$severity_string, ";
+            $out .= sprintf "address=0x%08x, ", $addr;
+            $out .= sprintf "fru_id=%s, ", get_uuid_le($fru_id);
+            $out .= "fru_text='$fru_text', ";
+            $out .= get_cper_data_text($cper_data) if ($cper_data);
+            $out .= "\n";
+        }
+        if ($out ne "") {
+            print "Extlog events:\n$out\n";
+        } else {
+            print "No Extlog errors.\n\n";
+        }
+        $query_handle->finish;
+    } catch {
+        print "Exception: $DBI::errstr\n\n";
+    };
 
     # devlink errors
-    $query = "select id, timestamp, bus_name, dev_name, driver_name, reporter_name, msg from devlink_event order by id";
-    $query_handle = $dbh->prepare($query);
-    $query_handle->execute();
-    $query_handle->bind_columns(\($id, $timestamp, $bus_name, $dev_name, $driver_name, $reporter_name, $msg));
-    $out = "";
-    while($query_handle->fetch()) {
-        $out .= "$id $timestamp error: ";
-        $out .= "bus_name=$bus_name, ";
-        $out .= "dev_name=$dev_name, ";
-        $out .= "driver_name=$driver_name, ";
-        $out .= "reporter_name=$reporter_name, ";
-        $out .= "message='$msg', ";
-        $out .= "\n";
-    }
-    if ($out ne "") {
-        print "Devlink events:\n$out\n";
-    } else {
-        print "No devlink errors.\n\n";
-    }
-    $query_handle->finish;
+    try {
+        $query = "select id, timestamp, bus_name, dev_name, driver_name, reporter_name, msg from devlink_event order by id";
+        $query_handle = $dbh->prepare($query);
+        $query_handle->execute();
+        $query_handle->bind_columns(\($id, $timestamp, $bus_name, $dev_name, $driver_name, $reporter_name, $msg));
+        $out = "";
+        while($query_handle->fetch()) {
+            $out .= "$id $timestamp error: ";
+            $out .= "bus_name=$bus_name, ";
+            $out .= "dev_name=$dev_name, ";
+            $out .= "driver_name=$driver_name, ";
+            $out .= "reporter_name=$reporter_name, ";
+            $out .= "message='$msg', ";
+            $out .= "\n";
+        }
+        if ($out ne "") {
+            print "Devlink events:\n$out\n";
+        } else {
+            print "No devlink errors.\n\n";
+        }
+        $query_handle->finish;
+    } catch {
+        print "Exception: $DBI::errstr\n\n";
+    };
 
     # Disk errors
-    $query = "select id, timestamp, dev, sector, nr_sector, error, rwbs, cmd from disk_errors order by id";
-    $query_handle = $dbh->prepare($query);
-    $query_handle->execute();
-    $query_handle->bind_columns(\($id, $timestamp, $dev, $sector, $nr_sector, $error, $rwbs, $cmd));
-    $out = "";
-    while($query_handle->fetch()) {
-        $out .= "$id $timestamp error: ";
-        $out .= "dev=$dev, ";
-        $out .= "sector=$sector, ";
-        $out .= "nr_sector=$nr_sector, ";
-        $out .= "error='$error', ";
-        $out .= "rwbs='$rwbs', ";
-        $out .= "cmd='$cmd', ";
-        $out .= "\n";
-    }
-    if ($out ne "") {
-        print "Disk errors\n$out\n";
-    } else {
-        print "No disk errors.\n\n";
-    }
-    $query_handle->finish;
+    try {
+        $query = "select id, timestamp, dev, sector, nr_sector, error, rwbs, cmd from disk_errors order by id";
+        $query_handle = $dbh->prepare($query);
+        $query_handle->execute();
+        $query_handle->bind_columns(\($id, $timestamp, $dev, $sector, $nr_sector, $error, $rwbs, $cmd));
+        $out = "";
+        while($query_handle->fetch()) {
+            $out .= "$id $timestamp error: ";
+            $out .= "dev=$dev, ";
+            $out .= "sector=$sector, ";
+            $out .= "nr_sector=$nr_sector, ";
+            $out .= "error='$error', ";
+            $out .= "rwbs='$rwbs', ";
+            $out .= "cmd='$cmd', ";
+           $out .= "\n";
+        }
+        if ($out ne "") {
+            print "Disk errors\n$out\n";
+        } else {
+            print "No disk errors.\n\n";
+        }
+        $query_handle->finish;
+    } catch {
+        print "Exception: $DBI::errstr\n\n";
+    };
 
     # Memory failure errors
-    $query = "select id, timestamp, pfn, page_type, action_result from memory_failure_event order by id";
-    $query_handle = $dbh->prepare($query);
-    $query_handle->execute();
-    $query_handle->bind_columns(\($id, $timestamp, $pfn, $page_type, $action_result));
-    $out = "";
-    while($query_handle->fetch()) {
-        $out .= "$id $timestamp error: ";
-        $out .= "pfn=$pfn, page_type=$page_type, action_result=$action_result\n";
-    }
-    if ($out ne "") {
-        print "Memory failure events:\n$out\n";
-    } else {
-        print "No Memory failure errors.\n\n";
-    }
-    $query_handle->finish;
+    try {
+        $query = "select id, timestamp, pfn, page_type, action_result from memory_failure_event order by id";
+        $query_handle = $dbh->prepare($query);
+        $query_handle->execute();
+        $query_handle->bind_columns(\($id, $timestamp, $pfn, $page_type, $action_result));
+        $out = "";
+        while($query_handle->fetch()) {
+            $out .= "$id $timestamp error: ";
+            $out .= "pfn=$pfn, page_type=$page_type, action_result=$action_result\n";
+        }
+        if ($out ne "") {
+            print "Memory failure events:\n$out\n";
+        } else {
+            print "No Memory failure errors.\n\n";
+        }
+        $query_handle->finish;
+    } catch {
+        print "Exception: $DBI::errstr\n\n";
+    };
 
     # MCE mce_record errors
-    $query = "select id, timestamp, mcgcap, mcgstatus, status, addr, misc, ip, tsc, walltime, cpu, cpuid, apicid, socketid, cs, bank, cpuvendor, bank_name, error_msg, mcgstatus_msg, mcistatus_msg, user_action, mc_location from mce_record order by id";
-    $query_handle = $dbh->prepare($query);
-    $query_handle->execute();
-    $query_handle->bind_columns(\($id, $time, $mcgcap,$mcgstatus, $status, $addr, $misc, $ip, $tsc, $walltime, $cpu, $cpuid, $apicid, $socketid, $cs, $bank, $cpuvendor, $bank_name, $msg, $mcgstatus_msg, $mcistatus_msg, $user_action, $mc_location));
-    $out = "";
-    while($query_handle->fetch()) {
-        $out .= "$id $time error: $msg";
-	$out .= ", CPU $cpuvendor" if ($cpuvendor);
-	$out .= ", bank $bank_name" if ($bank_name);
-	$out .= ", mcg $mcgstatus_msg" if ($mcgstatus_msg);
-	$out .= ", mci $mcistatus_msg" if ($mcistatus_msg);
-	$out .= ", $mc_location" if ($mc_location);
-	$out .= ", $user_action" if ($user_action);
-	$out .= sprintf ", mcgcap=0x%08x", $mcgcap if ($mcgcap);
-	$out .= sprintf ", mcgstatus=0x%08x", $mcgstatus if ($mcgstatus);
-	$out .= sprintf ", status=0x%08x", $status if ($status);
-	$out .= sprintf ", addr=0x%08x", $addr if ($addr);
-	$out .= sprintf ", misc=0x%08x", $misc if ($misc);
-	$out .= sprintf ", ip=0x%08x", $ip if ($ip);
-	$out .= sprintf ", tsc=0x%08x", $tsc if ($tsc);
-	$out .= sprintf ", walltime=0x%08x", $walltime if ($walltime);
-	$out .= sprintf ", cpu=0x%08x", $cpu if ($cpu);
-	$out .= sprintf ", cpuid=0x%08x", $cpuid if ($cpuid);
-	$out .= sprintf ", apicid=0x%08x", $apicid if ($apicid);
-	$out .= sprintf ", socketid=0x%08x", $socketid if ($socketid);
-	$out .= sprintf ", cs=0x%08x", $cs if ($cs);
-	$out .= sprintf ", bank=0x%08x", $bank if ($bank);
-
-	$out .= "\n";
-    }
-    if ($out ne "") {
-        print "MCE events:\n$out\n";
-    } else {
-        print "No MCE errors.\n\n";
-    }
-    $query_handle->finish;
+    try {
+        $query = "select id, timestamp, mcgcap, mcgstatus, status, addr, misc, ip, tsc, walltime, cpu, cpuid, apicid, socketid, cs, bank, cpuvendor, bank_name, error_msg, mcgstatus_msg, mcistatus_msg, user_action, mc_location from mce_record order by id";
+        $query_handle = $dbh->prepare($query);
+        $query_handle->execute();
+        $query_handle->bind_columns(\($id, $time, $mcgcap,$mcgstatus, $status, $addr, $misc, $ip, $tsc, $walltime, $cpu, $cpuid, $apicid, $socketid, $cs, $bank, $cpuvendor, $bank_name, $msg, $mcgstatus_msg, $mcistatus_msg, $user_action, $mc_location));
+        $out = "";
+        while($query_handle->fetch()) {
+            $out .= "$id $time error: $msg";
+	    $out .= ", CPU $cpuvendor" if ($cpuvendor);
+	    $out .= ", bank $bank_name" if ($bank_name);
+	    $out .= ", mcg $mcgstatus_msg" if ($mcgstatus_msg);
+	    $out .= ", mci $mcistatus_msg" if ($mcistatus_msg);
+	    $out .= ", $mc_location" if ($mc_location);
+	    $out .= ", $user_action" if ($user_action);
+	    $out .= sprintf ", mcgcap=0x%08x", $mcgcap if ($mcgcap);
+	    $out .= sprintf ", mcgstatus=0x%08x", $mcgstatus if ($mcgstatus);
+	    $out .= sprintf ", status=0x%08x", $status if ($status);
+	    $out .= sprintf ", addr=0x%08x", $addr if ($addr);
+	    $out .= sprintf ", misc=0x%08x", $misc if ($misc);
+	    $out .= sprintf ", ip=0x%08x", $ip if ($ip);
+	    $out .= sprintf ", tsc=0x%08x", $tsc if ($tsc);
+	    $out .= sprintf ", walltime=0x%08x", $walltime if ($walltime);
+	    $out .= sprintf ", cpu=0x%08x", $cpu if ($cpu);
+	    $out .= sprintf ", cpuid=0x%08x", $cpuid if ($cpuid);
+	    $out .= sprintf ", apicid=0x%08x", $apicid if ($apicid);
+	    $out .= sprintf ", socketid=0x%08x", $socketid if ($socketid);
+	    $out .= sprintf ", cs=0x%08x", $cs if ($cs);
+	    $out .= sprintf ", bank=0x%08x", $bank if ($bank);
+
+	    $out .= "\n";
+        }
+        if ($out ne "") {
+            print "MCE events:\n$out\n";
+        } else {
+            print "No MCE errors.\n\n";
+        }
+        $query_handle->finish;
+    } catch {
+        print "Exception: $DBI::errstr\n\n";
+    };
 
     undef($dbh);
 }
-- 
2.17.1


* Re: [PATCH 3/3] rasdaemon: ras-mc-ctl: Add exception handling
  2020-11-03 14:22 ` [PATCH 3/3] rasdaemon: ras-mc-ctl: Add exception handling Shiju Jose
@ 2020-12-23 10:03   ` Mauro Carvalho Chehab
  2021-01-04  9:35     ` Shiju Jose
  0 siblings, 1 reply; 7+ messages in thread
From: Mauro Carvalho Chehab @ 2020-12-23 10:03 UTC (permalink / raw)
  To: Shiju Jose; +Cc: linux-edac, linuxarm, tanxiaofei

Em Tue, 3 Nov 2020 14:22:58 +0000
Shiju Jose <shiju.jose@huawei.com> escreveu:

> Add exception handling in the ras-mc-ctl.
> 
> For example, when an event's table is not present in the SQLite DB,
> then the DBI would detect exception and ras-mc-ctl exit without
> read and log remaining event's information. This would happen
> when an event is not enabled in the rasdaemon. Following is the
> error log when the devlink_event table is not present in the DB,
> "DBD::SQLite::db prepare failed: no such table: devlink_event at ./ras-mc-ctl line 1198.
> Can't call method "execute" on an undefined value at ./ras-mc-ctl line 1199"
> 
> Also disabled the DBI's automatic error logging by setting
> the $dbh->{PrintError} = 0 to avoid duplicate exception logs.

Hmm...


	$ ./util/ras-mc-ctl --summary
	No Memory errors.
	
	No PCIe AER errors.

	No ARM processor errors.
	
	No Extlog errors.
	
	No devlink errors.
	No disk errors.
	Exception: no such table: memory_failure_event

	No MCE errors.

While it sounds a good idea to catch such events, printing it as an
exception doesn't seem the right thing to me, at least for things 
like "no such table".

IMO, it should print something more intuitive, like:

	"Warning: Memory failure detection not enabled"

-

Yet, on a separate note, there's no memory_failure_event upstream.

Maybe I missed some prior patch to be applied before this one?


Thanks,
Mauro

* RE: [PATCH 3/3] rasdaemon: ras-mc-ctl: Add exception handling
  2020-12-23 10:03   ` Mauro Carvalho Chehab
@ 2021-01-04  9:35     ` Shiju Jose
  2021-01-12 18:06       ` Shiju Jose
  0 siblings, 1 reply; 7+ messages in thread
From: Shiju Jose @ 2021-01-04  9:35 UTC (permalink / raw)
  To: Mauro Carvalho Chehab; +Cc: linux-edac, Linuxarm, tanxiaofei

Hi Mauro,

>-----Original Message-----
>From: Mauro Carvalho Chehab [mailto:mchehab+huawei@kernel.org]
>Sent: 23 December 2020 10:04
>To: Shiju Jose <shiju.jose@huawei.com>
>Cc: linux-edac@vger.kernel.org; Linuxarm <linuxarm@huawei.com>;
>tanxiaofei <tanxiaofei@huawei.com>
>Subject: Re: [PATCH 3/3] rasdaemon: ras-mc-ctl: Add exception handling
>
>Em Tue, 3 Nov 2020 14:22:58 +0000
>Shiju Jose <shiju.jose@huawei.com> escreveu:
>
>> Add exception handling in the ras-mc-ctl.
>>
>> For example, when an event's table is not present in the SQLite DB,
>> then the DBI would detect exception and ras-mc-ctl exit without read
>> and log remaining event's information. This would happen when an event
>> is not enabled in the rasdaemon. Following is the error log when the
>> devlink_event table is not present in the DB,
>> "DBD::SQLite::db prepare failed: no such table: devlink_event at ./ras-mc-ctl line 1198.
>> Can't call method "execute" on an undefined value at ./ras-mc-ctl line 1199"
>>
>> Also disabled the DBI's automatic error logging by setting the
>> $dbh->{PrintError} = 0 to avoid duplicate exception logs.
>
>Hmm...
>
>
>	$ ./util/ras-mc-ctl --summary
>	No Memory errors.
>
>	No PCIe AER errors.
>
>	No ARM processor errors.
>
>	No Extlog errors.
>
>	No devlink errors.
>	No disk errors.
>	Exception: no such table: memory_failure_event
>
>	No MCE errors.
>
>While it sounds a good idea to catch such events, printing it as an exception
>doesn't seem the right thing to me, at least for things like "no such table".
>
>IMO, it should print something more intuitive, like:
>
>	"Warning: Memory failure detection not enabled"

Sure. I will change.
>
>-
>
>Yet, on a separate note, there's no memory_failure_event upstream.
>
>Maybe I missed some prior patch to be applied before this one?

This patch was posted previously.
https://patchwork.kernel.org/project/linux-edac/patch/20201002180144.1365-1-shiju.jose@huawei.com/

>
>
>Thanks,
>Mauro

Thanks,
Shiju

* RE: [PATCH 3/3] rasdaemon: ras-mc-ctl: Add exception handling
  2021-01-04  9:35     ` Shiju Jose
@ 2021-01-12 18:06       ` Shiju Jose
  0 siblings, 0 replies; 7+ messages in thread
From: Shiju Jose @ 2021-01-12 18:06 UTC (permalink / raw)
  To: Shiju Jose, Mauro Carvalho Chehab; +Cc: linux-edac, Linuxarm, tanxiaofei

Hi Mauro,

>-----Original Message-----
>From: Shiju Jose [mailto:shiju.jose@huawei.com]
>Sent: 04 January 2021 09:36
>To: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
>Cc: linux-edac@vger.kernel.org; Linuxarm <linuxarm@huawei.com>;
>tanxiaofei <tanxiaofei@huawei.com>
>Subject: RE: [PATCH 3/3] rasdaemon: ras-mc-ctl: Add exception handling
>
>Hi Mauro,
>
>>-----Original Message-----
>>From: Mauro Carvalho Chehab [mailto:mchehab+huawei@kernel.org]
>>Sent: 23 December 2020 10:04
>>To: Shiju Jose <shiju.jose@huawei.com>
>>Cc: linux-edac@vger.kernel.org; Linuxarm <linuxarm@huawei.com>;
>>tanxiaofei <tanxiaofei@huawei.com>
>>Subject: Re: [PATCH 3/3] rasdaemon: ras-mc-ctl: Add exception handling
>>
>>Em Tue, 3 Nov 2020 14:22:58 +0000
>>Shiju Jose <shiju.jose@huawei.com> escreveu:
>>
>>> Add exception handling in the ras-mc-ctl.
>>>
>>> For example, when an event's table is not present in the SQLite DB,
>>> then the DBI would detect exception and ras-mc-ctl exit without read
>>> and log remaining event's information. This would happen when an
>>> event is not enabled in the rasdaemon. Following is the error log
>>> when the devlink_event table is not present in the DB,
>>> "DBD::SQLite::db prepare failed: no such table: devlink_event at ./ras-mc-ctl line 1198.
>>> Can't call method "execute" on an undefined value at ./ras-mc-ctl line 1199"
>>>
>>> Also disabled the DBI's automatic error logging by setting the
>>> $dbh->{PrintError} = 0 to avoid duplicate exception logs.
>>
>>Hmm...
>>
>>
>>	$ ./util/ras-mc-ctl --summary
>>	No Memory errors.
>>
>>	No PCIe AER errors.
>>
>>	No ARM processor errors.
>>
>>	No Extlog errors.
>>
>>	No devlink errors.
>>	No disk errors.
>>	Exception: no such table: memory_failure_event
>>
>>	No MCE errors.
>>
>>While it sounds a good idea to catch such events, printing it as an
>>exception doesn't seem the right thing to me, at least for things like
>>"no such table".
>>
>>IMO, it should print something more intuitive, like:
>>
>>	"Warning: Memory failure detection not enabled"
>
>Sure. I will change.

The cause of the exception varies from one error to another, so I think
we cannot add a specific error message here.
"no such table: memory_failure_event" is the value of $DBI::errstr
when the table is not found in the SQLite database. There could be other
error cases as well.
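
As a rough sketch of how both cases could be covered (not part of the
posted patch): the catch block could check whether $DBI::errstr reports a
missing table and print a warning in that case, while still printing the
raw error for anything else. The helper name describe_db_error and its
$feature argument are hypothetical, and the match on "no such table"
assumes the SQLite wording shown in the error log quoted in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of ras-mc-ctl): turn a DBI error string
# into the line that the summary would print for one event type.
sub describe_db_error {
    my ($errstr, $feature) = @_;

    # errstr can be undefined if the failure did not come from DBI.
    $errstr = "unknown error" unless defined $errstr;

    # Assumption: a missing table means the event was never enabled
    # in rasdaemon, so report it as a warning instead of an exception.
    if ($errstr =~ /no such table: \w+/) {
        return "Warning: $feature detection not enabled";
    }

    # Any other error is reported unchanged so no detail is lost.
    return "Exception: $errstr";
}

print describe_db_error("no such table: memory_failure_event", "Memory failure"), "\n";
print describe_db_error("database is locked", "Memory failure"), "\n";
```

In the catch blocks of the patch, a call such as
print describe_db_error($DBI::errstr, "Memory failure"), "\n\n";
would take the place of the current print of "Exception: $DBI::errstr".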
>>
>>-
>>
>>Yet, on a separate note, there's no memory_failure_event upstream.
>>
>>Maybe I missed some prior patch to be applied before this one?
>
>This patch was posted previously.
>https://patchwork.kernel.org/project/linux-edac/patch/20201002180144.1365-1-shiju.jose@huawei.com/
>
>>
>>
>>Thanks,
>>Mauro
>
Thanks,
Shiju

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-03 14:22 [PATCH 0/3] rasdaemon: ras-mc-ctl: Add exception handling and support memory_failure_event Shiju Jose
2020-11-03 14:22 ` [PATCH 1/3] rasdaemon: ras-mc-ctl: Modify ARM processor error summary log Shiju Jose
2020-11-03 14:22 ` [PATCH 2/3] rasdaemon: ras-mc-ctl: Add memory failure events Shiju Jose
2020-11-03 14:22 ` [PATCH 3/3] rasdaemon: ras-mc-ctl: Add exception handling Shiju Jose
2020-12-23 10:03   ` Mauro Carvalho Chehab
2021-01-04  9:35     ` Shiju Jose
2021-01-12 18:06       ` Shiju Jose