* [v4 PATCH] block: introduce block_rq_error tracepoint
From: Yang Shi @ 2022-01-25 20:35 UTC
  To: axboe, rostedt, xiyou.wangcong; +Cc: shy828301, linux-block, linux-kernel

Currently, rasdaemon uses the existing tracepoint block_rq_complete
and filters out non-error cases in order to capture block disk errors.

But there are a few problems with this approach:

1. Even though the kernel trace filter can do the filtering work, there
   is still some overhead once this tracepoint is enabled.

2. The filter is based only on errno, which does not match the kernel's
   own check for reporting errors via blk_print_req_error().

3. block_rq_complete only provides the dev major and minor numbers to
   identify the block device, which is inconvenient to use in user-space.

So introduce a new tracepoint, block_rq_error, just for the error case
and provide the device name for convenience too. With this patch,
rasdaemon could switch to block_rq_error.

Cc: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Yang Shi <shy828301@gmail.com>
---
The v3 patch was submitted in Feb 2020 and Steven reviewed it, but it
was not merged upstream. See
https://lore.kernel.org/lkml/20200203053650.8923-1-xiyou.wangcong@gmail.com/.

The problems fixed by that patch still exist, and we do need it to make
disk error handling in rasdaemon easier. So this resurrects it and
continues the version numbering.

v3 --> v4:
 * Rebased to v5.17-rc1.
 * Collected reviewed-by tag from Steven.
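
To illustrate problem 2 from the changelog, here is a minimal user-space
sketch. It is not kernel code and not part of the patch: struct rq_model
and both helper names are hypothetical stand-ins. It contrasts a filter
that looks only at the error value with the condition that guards
blk_print_req_error() in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of a completed request; not a kernel type. */
struct rq_model {
	int  error;        /* non-zero when the request failed */
	bool passthrough;  /* stands in for blk_rq_is_passthrough(req) */
	bool quiet;        /* stands in for req->rq_flags & RQF_QUIET */
};

/* What an error-only trace filter on block_rq_complete would select. */
static bool errno_only_match(const struct rq_model *rq)
{
	assert(rq != NULL);
	return rq->error != 0;
}

/* Mirrors the condition that guards the new tracepoint in the diff. */
static bool rq_error_match(const struct rq_model *rq)
{
	assert(rq != NULL);
	return rq->error != 0 && !rq->passthrough && !rq->quiet;
}
```

Under these assumptions, a failed request marked quiet is selected by
errno_only_match() but not by rq_error_match(), which is the mismatch
the new tracepoint is meant to avoid.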

 block/blk-mq.c               |  4 +++-
 include/trace/events/block.h | 41 ++++++++++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index f3bf3358a3bb..bb0593f93675 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -789,8 +789,10 @@ bool blk_update_request(struct request *req, blk_status_t error,
 #endif
 
 	if (unlikely(error && !blk_rq_is_passthrough(req) &&
-		     !(req->rq_flags & RQF_QUIET)))
+		     !(req->rq_flags & RQF_QUIET))) {
+		trace_block_rq_error(req, blk_status_to_errno(error), nr_bytes);
 		blk_print_req_error(req, error);
+	}
 
 	blk_account_io_completion(req, nr_bytes);
 
diff --git a/include/trace/events/block.h b/include/trace/events/block.h
index 27170e40e8c9..3ab6cfe5795a 100644
--- a/include/trace/events/block.h
+++ b/include/trace/events/block.h
@@ -144,6 +144,47 @@ TRACE_EVENT(block_rq_complete,
 		  __entry->nr_sector, __entry->error)
 );
 
+/**
+ * block_rq_error - block IO operation error reported by device driver
+ * @rq: block operations request
+ * @error: status code
+ * @nr_bytes: number of completed bytes
+ *
+ * The block_rq_error tracepoint event indicates that some portion
+ * of the operation request has failed, as reported by the device driver.
+ */
+TRACE_EVENT(block_rq_error,
+
+	TP_PROTO(struct request *rq, int error, unsigned int nr_bytes),
+
+	TP_ARGS(rq, error, nr_bytes),
+
+	TP_STRUCT__entry(
+		__field(  dev_t,	dev			)
+		__string( name,		rq->q->disk ? rq->q->disk->disk_name : "?")
+		__field(  sector_t,	sector			)
+		__field(  unsigned int,	nr_sector		)
+		__field(  int,		error			)
+		__array(  char,		rwbs,	RWBS_LEN	)
+	),
+
+	TP_fast_assign(
+		__entry->dev	   = rq->q->disk ? disk_devt(rq->q->disk) : 0;
+		__assign_str(name,   rq->q->disk ? rq->q->disk->disk_name : "?");
+		__entry->sector    = blk_rq_pos(rq);
+		__entry->nr_sector = nr_bytes >> 9;
+		__entry->error     = error;
+
+		blk_fill_rwbs(__entry->rwbs, rq->cmd_flags);
+	),
+
+	TP_printk("%d,%d %s %s %llu + %u [%d]",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __get_str(name), __entry->rwbs,
+		  (unsigned long long)__entry->sector,
+		  __entry->nr_sector, __entry->error)
+);
+
 DECLARE_EVENT_CLASS(block_rq,
 
 	TP_PROTO(struct request *rq),
-- 
2.26.3
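
For user-space consumers, the TP_printk() format above
("%d,%d %s %s %llu + %u [%d]") could be parsed roughly as follows. This
is only a sketch, not rasdaemon's actual code: struct rq_error_event and
parse_rq_error() are hypothetical names, and the input is assumed to be
just the event payload, without the task/CPU/timestamp prefix that the
trace buffer normally prints in front of it.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for one decoded block_rq_error payload. */
struct rq_error_event {
	int major;
	int minor;
	char name[32];              /* disk name; DISK_NAME_LEN is 32 */
	char rwbs[8];               /* operation flags; RWBS_LEN is 8 */
	unsigned long long sector;  /* start sector of the request */
	unsigned int nr_sector;     /* nr_bytes >> 9, i.e. 512-byte units */
	int error;
};

/* Returns 0 when all seven fields were parsed, -1 otherwise. */
static int parse_rq_error(const char *payload, struct rq_error_event *ev)
{
	int n;

	assert(payload != NULL && ev != NULL);

	/* Field widths keep sscanf() within the name and rwbs buffers. */
	n = sscanf(payload, "%d,%d %31s %7s %llu + %u [%d]",
		   &ev->major, &ev->minor, ev->name, ev->rwbs,
		   &ev->sector, &ev->nr_sector, &ev->error);

	return n == 7 ? 0 : -1;
}
```

With a made-up payload such as "8,0 sda R 2048 + 8 [-5]", the consumer
gets the disk name directly instead of having to resolve major and minor
numbers itself.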



* Re: [v4 PATCH] block: introduce block_rq_error tracepoint
From: Christoph Hellwig @ 2022-01-26  8:21 UTC
  To: Yang Shi; +Cc: axboe, rostedt, xiyou.wangcong, linux-block, linux-kernel

On Tue, Jan 25, 2022 at 12:35:48PM -0800, Yang Shi wrote:
> Currently, rasdaemon uses the existing tracepoint block_rq_complete
> and filters out non-error cases in order to capture block disk errors.
> 
> But there are a few problems with this approach:
> 
> 1. Even though the kernel trace filter can do the filtering work, there
>    is still some overhead once this tracepoint is enabled.
> 
> 2. The filter is based only on errno, which does not match the kernel's
>    own check for reporting errors via blk_print_req_error().
> 
> 3. block_rq_complete only provides the dev major and minor numbers to
>    identify the block device, which is inconvenient to use in user-space.
> 
> So introduce a new tracepoint, block_rq_error, just for the error case
> and provide the device name for convenience too. With this patch,
> rasdaemon could switch to block_rq_error.
> 
> Cc: Jens Axboe <axboe@kernel.dk>
> Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> Signed-off-by: Yang Shi <shy828301@gmail.com>
> ---
> The v3 patch was submitted in Feb 2020 and Steven reviewed it, but it
> was not merged upstream. See
> https://lore.kernel.org/lkml/20200203053650.8923-1-xiyou.wangcong@gmail.com/.
> 
> The problems fixed by that patch still exist, and we do need it to make
> disk error handling in rasdaemon easier. So this resurrects it and
> continues the version numbering.
> 
> v3 --> v4:
>  * Rebased to v5.17-rc1.
>  * Collected reviewed-by tag from Steven.
> 
>  block/blk-mq.c               |  4 +++-
>  include/trace/events/block.h | 41 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 44 insertions(+), 1 deletion(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index f3bf3358a3bb..bb0593f93675 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -789,8 +789,10 @@ bool blk_update_request(struct request *req, blk_status_t error,
>  #endif
>  
>  	if (unlikely(error && !blk_rq_is_passthrough(req) &&
> -		     !(req->rq_flags & RQF_QUIET)))
> +		     !(req->rq_flags & RQF_QUIET))) {
> +		trace_block_rq_error(req, blk_status_to_errno(error), nr_bytes);

Please report the actual block layer status code instead of the errno
mapping here.


* Re: [v4 PATCH] block: introduce block_rq_error tracepoint
From: Yang Shi @ 2022-01-26 18:35 UTC
  To: Christoph Hellwig
  Cc: Jens Axboe, Steven Rostedt, Cong Wang, linux-block,
	Linux Kernel Mailing List

On Wed, Jan 26, 2022 at 12:21 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Tue, Jan 25, 2022 at 12:35:48PM -0800, Yang Shi wrote:
> > Currently, rasdaemon uses the existing tracepoint block_rq_complete
> > and filters out non-error cases in order to capture block disk errors.
> >
> > But there are a few problems with this approach:
> >
> > 1. Even though the kernel trace filter can do the filtering work, there
> >    is still some overhead once this tracepoint is enabled.
> >
> > 2. The filter is based only on errno, which does not match the kernel's
> >    own check for reporting errors via blk_print_req_error().
> >
> > 3. block_rq_complete only provides the dev major and minor numbers to
> >    identify the block device, which is inconvenient to use in user-space.
> >
> > So introduce a new tracepoint, block_rq_error, just for the error case
> > and provide the device name for convenience too. With this patch,
> > rasdaemon could switch to block_rq_error.
> >
> > Cc: Jens Axboe <axboe@kernel.dk>
> > Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
> > Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> > Signed-off-by: Yang Shi <shy828301@gmail.com>
> > ---
> > The v3 patch was submitted in Feb 2020 and Steven reviewed it, but it
> > was not merged upstream. See
> > https://lore.kernel.org/lkml/20200203053650.8923-1-xiyou.wangcong@gmail.com/.
> >
> > The problems fixed by that patch still exist, and we do need it to make
> > disk error handling in rasdaemon easier. So this resurrects it and
> > continues the version numbering.
> >
> > v3 --> v4:
> >  * Rebased to v5.17-rc1.
> >  * Collected reviewed-by tag from Steven.
> >
> >  block/blk-mq.c               |  4 +++-
> >  include/trace/events/block.h | 41 ++++++++++++++++++++++++++++++++++++
> >  2 files changed, 44 insertions(+), 1 deletion(-)
> >
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index f3bf3358a3bb..bb0593f93675 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -789,8 +789,10 @@ bool blk_update_request(struct request *req, blk_status_t error,
> >  #endif
> >
> >       if (unlikely(error && !blk_rq_is_passthrough(req) &&
> > -                  !(req->rq_flags & RQF_QUIET)))
> > +                  !(req->rq_flags & RQF_QUIET))) {
> > +             trace_block_rq_error(req, blk_status_to_errno(error), nr_bytes);
>
> Please report the actual block layer status code instead of the errno
> mapping here.

Sure, thanks.
