* [PATCH] btrfs: device stat, log when zeroed assist audit
@ 2020-01-10  4:26 Anand Jain
  2020-01-10 15:07 ` Josef Bacik
  2020-01-10 19:47 ` Nikolay Borisov
  0 siblings, 2 replies; 5+ messages in thread
From: Anand Jain @ 2020-01-10  4:26 UTC
  To: linux-btrfs

We had a report indicating that some read errors aren't reported by
the device stats in userland. It is important that such errors are
reported in the device stats, as userland scripts may depend on them to
take corrective action. But to debug this issue we need to be sure that
the request to reset the device stats did not come from userland
itself. So log an info message whenever the device stats are reset.

For example:
 BTRFS info (device sdc): device stats zeroed by btrfs (9223)

Reported-by: philip@philip-seeger.de
Link: https://www.spinics.net/lists/linux-btrfs/msg96528.html
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 BTRFS info (device sdc): device stats zeroed by btrfs (9223)
The last words are the name and PID of the process; unfortunately it
comes out as 'by btrfs'. If at some point a Python binding or library is
used to reset the stats this would change; otherwise it will always be
'by btrfs'. I am OK with that; if not, please suggest an alternative.
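For context on the 'by btrfs' output: userspace reaches this reset path through the BTRFS_IOC_GET_DEV_STATS ioctl with BTRFS_DEV_STATS_RESET set in flags, which is what 'btrfs device stats -z' issues, so current->comm is simply the btrfs-progs binary name. A minimal sketch of such a request, assuming the uapi header <linux/btrfs.h> is installed; prepare_reset_request() and reset_dev_stats() are hypothetical helpers, not btrfs-progs functions:

```c
/*
 * Sketch: ask btrfs to return and then zero one device's error counters.
 * Assumes the uapi header <linux/btrfs.h>; helper names are hypothetical.
 */
#include <linux/btrfs.h>
#include <string.h>
#include <sys/ioctl.h>

/* Fill in a "read, then reset" request for the device with this devid. */
void prepare_reset_request(struct btrfs_ioctl_get_dev_stats *args, __u64 devid)
{
	memset(args, 0, sizeof(*args));
	args->devid = devid;
	/* Request every counter this header knows about. */
	args->nr_items = BTRFS_DEV_STAT_VALUES_MAX;
	/* Return the current values and zero them in the kernel. */
	args->flags = BTRFS_DEV_STATS_RESET;
}

/*
 * fd: an open descriptor on the mounted filesystem (e.g. its mount point).
 * The reset needs CAP_SYS_ADMIN. Returns 0 on success, -1 with errno set;
 * on success args->values[] holds the counts from before the reset.
 */
int reset_dev_stats(int fd, __u64 devid, struct btrfs_ioctl_get_dev_stats *args)
{
	prepare_reset_request(args, devid);
	return ioctl(fd, BTRFS_IOC_GET_DEV_STATS, args);
}
```

With this patch applied, a script doing the same call would be expected to appear in the log under its own process name (for a Python script, typically the interpreter's name) instead of 'btrfs'.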

 fs/btrfs/volumes.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index eb55df0d4038..6fd90270e2c7 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -7324,6 +7324,8 @@ int btrfs_get_dev_stats(struct btrfs_fs_info *fs_info,
 			else
 				btrfs_dev_stat_set(dev, i, 0);
 		}
+		btrfs_info(fs_info, "device stats zeroed by %s (%d)",
+			   current->comm, task_pid_nr(current));
 	} else {
 		for (i = 0; i < BTRFS_DEV_STAT_VALUES_MAX; i++)
 			if (stats->nr_items > i)
-- 
2.23.0
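The hunk above shows only the tail of the reset branch. Assuming the branch works as btrfs_get_dev_stats() did in kernels of this period (counters below nr_items are returned via btrfs_dev_stat_read_and_reset(), the rest are zeroed with btrfs_dev_stat_set() as in the visible else arm), the new btrfs_info() fires once per reset request, after all counters are cleared. A userspace model of that behaviour; NR_STATS, struct model_device and model_read_and_reset() are hypothetical, plain loads and stores replace the kernel's atomic counter helpers, and getpid() plus a caller-supplied name stand in for task_pid_nr(current) and current->comm:

```c
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-in for BTRFS_DEV_STAT_VALUES_MAX (five counters). */
#define NR_STATS 5

/* Hypothetical stand-in for the error counters of one btrfs device. */
struct model_device {
	uint64_t counters[NR_STATS];
};

/*
 * Model of the reset branch: counters the caller asked for (i < nr_items)
 * are copied to values[] before being zeroed; the rest are zeroed without
 * being reported. One log line per reset names the caller and its PID.
 */
void model_read_and_reset(struct model_device *dev, uint64_t nr_items,
			  uint64_t *values, const char *comm)
{
	for (int i = 0; i < NR_STATS; i++) {
		if (nr_items > (uint64_t)i)
			values[i] = dev->counters[i];
		dev->counters[i] = 0;
	}
	fprintf(stderr, "device stats zeroed by %s (%d)\n", comm, (int)getpid());
}
```

Note that a caller passing a small nr_items still clears every counter, so the log line is the only trace of the values it did not read back.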



* Re: [PATCH] btrfs: device stat, log when zeroed assist audit
  2020-01-10  4:26 [PATCH] btrfs: device stat, log when zeroed assist audit Anand Jain
@ 2020-01-10 15:07 ` Josef Bacik
  2020-01-11  8:50   ` Anand Jain
  2020-01-10 19:47 ` Nikolay Borisov
  1 sibling, 1 reply; 5+ messages in thread
From: Josef Bacik @ 2020-01-10 15:07 UTC
  To: Anand Jain, linux-btrfs

On 1/9/20 11:26 PM, Anand Jain wrote:
>   BTRFS info (device sdc): device stats zeroed by btrfs (9223)
> The last words are the name and PID of the process; unfortunately it
> comes out as 'by btrfs'. If at some point a Python binding or library is
> used to reset the stats this would change; otherwise it will always be
> 'by btrfs'. I am OK with that; if not, please suggest an alternative.

I think name (pid) makes sense, similar to what drop_caches does:

pr_info("%s (%d): drop_caches: %d\n",
	current->comm, task_pid_nr(current),
	sysctl_drop_caches);

Thanks,

Josef


* Re: [PATCH] btrfs: device stat, log when zeroed assist audit
  2020-01-10  4:26 [PATCH] btrfs: device stat, log when zeroed assist audit Anand Jain
  2020-01-10 15:07 ` Josef Bacik
@ 2020-01-10 19:47 ` Nikolay Borisov
  1 sibling, 0 replies; 5+ messages in thread
From: Nikolay Borisov @ 2020-01-10 19:47 UTC
  To: Anand Jain, linux-btrfs



On 10.01.20 г. 6:26 ч., Anand Jain wrote:
> We had a report indicating that some read errors aren't reported by
> the device stats in userland. It is important that such errors are
> reported in the device stats, as userland scripts may depend on them to
> take corrective action. But to debug this issue we need to be sure that
> the request to reset the device stats did not come from userland
> itself. So log an info message whenever the device stats are reset.
> 
> For example:
>  BTRFS info (device sdc): device stats zeroed by btrfs (9223)
> 
> Reported-by: philip@philip-seeger.de
> Link: https://www.spinics.net/lists/linux-btrfs/msg96528.html
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
>  BTRFS info (device sdc): device stats zeroed by btrfs (9223)
> The last words are the name and PID of the process; unfortunately it
> comes out as 'by btrfs'. If at some point a Python binding or library is
> used to reset the stats this would change; otherwise it will always be
> 'by btrfs'. I am OK with that; if not, please suggest an alternative.

This patch itself is OK, but it is not related to what Philip reported.
The issue there is that we only record errors for two specific return
values from the block layer.
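To make the distinction concrete: assuming the bio completion handler in volumes.c (btrfs_end_bio() in kernels of this period) increments the write, read and flush counters only when bi_status is BLK_STS_IOERR or BLK_STS_TARGET (seen by callers as -EIO and -EREMOTEIO), a read failing with any other status, for example BLK_STS_MEDIUM, never shows up in the device stats, with or without a reset. A userspace model; the enum values and model_counts_as_dev_error() are hypothetical stand-ins for the kernel's blk_status_t codes and check:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for some blk_status_t codes (errno in comments). */
enum model_blk_status {
	MODEL_STS_OK,      /* BLK_STS_OK */
	MODEL_STS_IOERR,   /* BLK_STS_IOERR, -EIO */
	MODEL_STS_TARGET,  /* BLK_STS_TARGET, -EREMOTEIO */
	MODEL_STS_MEDIUM,  /* BLK_STS_MEDIUM, -ENODATA */
	MODEL_STS_TIMEOUT, /* BLK_STS_TIMEOUT, -ETIMEDOUT */
};

/* True if a failed bio with this status would bump the device stats. */
bool model_counts_as_dev_error(enum model_blk_status st)
{
	return st == MODEL_STS_IOERR || st == MODEL_STS_TARGET;
}
```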



* Re: [PATCH] btrfs: device stat, log when zeroed assist audit
  2020-01-10 15:07 ` Josef Bacik
@ 2020-01-11  8:50   ` Anand Jain
  2020-01-13 16:59     ` David Sterba
  0 siblings, 1 reply; 5+ messages in thread
From: Anand Jain @ 2020-01-11  8:50 UTC
  To: Josef Bacik, linux-btrfs

On 1/10/20 11:07 PM, Josef Bacik wrote:
> I think name (pid) makes sense, similar to what drop_caches does:
> 
> pr_info("%s (%d): drop_caches: %d\n",
>      current->comm, task_pid_nr(current),
>      sysctl_drop_caches);

There is a small deviation from what we already have in
device_list_add(), where the name (pid) is at the end of the log message:

------
		pr_info(
	"BTRFS: device label %s devid %llu transid %llu %s scanned by %s (%d)\n",
			disk_super->label, devid, found_transid, path,
			current->comm, task_pid_nr(current));
--------
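The two placements under discussion, Josef's drop_caches-style prefix and the device_list_add()-style suffix the patch already uses, can be compared with a small userspace sketch; the function names are hypothetical and snprintf() stands in for btrfs_info()/pr_info():

```c
#include <stdio.h>

/* drop_caches style: "name (pid): message". */
int format_prefix_style(char *buf, size_t len, const char *comm, int pid)
{
	return snprintf(buf, len, "%s (%d): device stats zeroed", comm, pid);
}

/* device_list_add() style, as in the patch: "message by name (pid)". */
int format_suffix_style(char *buf, size_t len, const char *comm, int pid)
{
	return snprintf(buf, len, "device stats zeroed by %s (%d)", comm, pid);
}
```

With comm "btrfs" and PID 9223 the suffix form reproduces the message body from the example in the commit message, while the prefix form gives "btrfs (9223): device stats zeroed". Keeping the suffix matches the existing "scanned by" messages in the btrfs log.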

I am not sure. Can David tweak it during merge?

Thanks, Anand

> Thanks,
> 
> Josef



* Re: [PATCH] btrfs: device stat, log when zeroed assist audit
  2020-01-11  8:50   ` Anand Jain
@ 2020-01-13 16:59     ` David Sterba
  0 siblings, 0 replies; 5+ messages in thread
From: David Sterba @ 2020-01-13 16:59 UTC
  To: Anand Jain; +Cc: Josef Bacik, linux-btrfs

On Sat, Jan 11, 2020 at 04:50:18PM +0800, Anand Jain wrote:
> There is a small deviation from what we already have in
> device_list_add(), where the name (pid) is at the end of the log message:
> 
> ------
> 		pr_info(
> 	"BTRFS: device label %s devid %llu transid %llu %s scanned by %s (%d)\n",
> 			disk_super->label, devid, found_transid, path,
> 			current->comm, task_pid_nr(current));
> --------
> 
> I am not sure. Can David tweak it during merge?

Yes, no problem.

