* [PATCH 0/2] Trace completion of all bios @ 2017-04-07 1:10 NeilBrown 2017-04-07 1:10 ` [PATCH 1/2] block: simple improvements for bio->flags NeilBrown ` (2 more replies) 0 siblings, 3 replies; 6+ messages in thread From: NeilBrown @ 2017-04-07 1:10 UTC (permalink / raw) To: Jens Axboe Cc: linux-raid, Mike Snitzer, Christoph Hellwig, Ming Lei, linux-kernel, linux-block, dm-devel, Shaohua Li, Alasdair Kergon Hi Jens, I think all objections to this patch have been answered so I'm resending. I've added a small cleanup patch first which makes some small enhancements to the documentation and #defines for the bio->flags field. Thanks, NeilBrown --- NeilBrown (2): block: simple improvements for bio->flags block: trace completion of all bios. block/bio.c | 13 +++++++++++++ block/blk-core.c | 10 +++++++++- drivers/md/dm.c | 1 - drivers/md/raid5.c | 2 -- include/linux/blk_types.h | 24 +++++++++++++++--------- 5 files changed, 37 insertions(+), 13 deletions(-) -- Signature ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 1/2] block: simple improvements for bio->flags 2017-04-07 1:10 [PATCH 0/2] Trace completion of all bios NeilBrown @ 2017-04-07 1:10 ` NeilBrown 2017-04-07 1:10 ` NeilBrown 2017-04-07 15:42 ` Jens Axboe 2 siblings, 0 replies; 6+ messages in thread From: NeilBrown @ 2017-04-07 1:10 UTC (permalink / raw) To: Jens Axboe Cc: linux-raid, Mike Snitzer, Christoph Hellwig, Ming Lei, linux-kernel, linux-block, dm-devel, Shaohua Li, Alasdair Kergon The comment for the 'flags' field of 'bio' mentions "command" which is no longer stored there, and doesn't mention the bvec pool number, which is. BIO_RESET_BITS is set in such a way that it would need to be updated if new bits were added, which is easy to miss. BVEC_POOL_BITS is larger than needed. The BVEC_POOL_IDX() ranges from 0 to 6, so 3 bits are sufficient. This patch make improvements in each of these areas. Signed-off-by: NeilBrown <neilb@suse.com> --- include/linux/blk_types.h | 22 +++++++++++++--------- 1 file changed, 13 insertions(+), 9 deletions(-) diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index d703acb55d0f..ff07d891f38f 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -29,7 +29,7 @@ struct bio { * top bits REQ_OP. Use * accessors. */ - unsigned short bi_flags; /* status, command, etc */ + unsigned short bi_flags; /* status, etc and bvec pool number */ unsigned short bi_ioprio; struct bvec_iter bi_iter; @@ -102,12 +102,7 @@ struct bio { #define BIO_REFFED 8 /* bio has elevated ->bi_cnt */ #define BIO_THROTTLED 9 /* This bio has already been subjected to * throttling rules. Don't do it again. */ - -/* - * Flags starting here get preserved by bio_reset() - this includes - * BVEC_POOL_IDX() - */ -#define BIO_RESET_BITS 10 +/* See BVEC_POOL_OFFSET below before adding new flags */ /* * We support 6 different bvec pools, the last one is magic in that it @@ -117,13 +112,22 @@ struct bio { #define BVEC_POOL_MAX (BVEC_POOL_NR - 1) /* - * Top 4 bits of bio flags indicate the pool the bvecs came from. We add + * Top 3 bits of bio flags indicate the pool the bvecs came from. We add * 1 to the actual index so that 0 indicates that there are no bvecs to be * freed. */ -#define BVEC_POOL_BITS (4) +#define BVEC_POOL_BITS (3) #define BVEC_POOL_OFFSET (16 - BVEC_POOL_BITS) #define BVEC_POOL_IDX(bio) ((bio)->bi_flags >> BVEC_POOL_OFFSET) +#if (1<< BVEC_POOL_BITS) < (BVEC_POOL_NR+1) +# error "BVEC_POOL_BITS is too small" +#endif + +/* + * Flags starting here get preserved by bio_reset() - this includes + * only BVEC_POOL_IDX() + */ +#define BIO_RESET_BITS BVEC_POOL_OFFSET /* * Operations and flags common to the bio and request structures. ^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 2/2] block: trace completion of all bios. 2017-04-07 1:10 [PATCH 0/2] Trace completion of all bios NeilBrown @ 2017-04-07 1:10 ` NeilBrown 2017-04-07 1:10 ` NeilBrown 2017-04-07 15:42 ` Jens Axboe 2 siblings, 0 replies; 6+ messages in thread From: NeilBrown @ 2017-04-07 1:10 UTC (permalink / raw) To: Jens Axboe Cc: Christoph Hellwig, Mike Snitzer, Ming Lei, linux-kernel, linux-raid, dm-devel, linux-block, Shaohua Li, Alasdair Kergon Currently only dm and md/raid5 bios trigger trace_block_bio_complete(). Now that we have bio_chain() and bio_inc_remaining(), it is not possible, in general, for a driver to know when the bio is really complete. Only bio_endio() knows that. So move the trace_block_bio_complete() call to bio_endio(). Now trace_block_bio_complete() pairs with trace_block_bio_queue(). Any bio for which a 'queue' event is traced, will subsequently generate a 'complete' event. There are a few cases where completion tracing is not wanted. 1/ If blk_update_request() has already generated a completion trace event at the 'request' level, there is no point generating one at the bio level too. In this case the bi_sector and bi_size will have changed, so the bio level event would be wrong 2/ If the bio hasn't actually been queued yet, but is being aborted early, then a trace event could be confusing. Some filesystems call bio_endio() but do not want tracing. 3/ The bio_integrity code interposes itself by replacing bi_end_io, then restoring it and calling bio_endio() again. This would produce two identical trace events if left like that. To handle these, we introduce a flag BIO_TRACE_COMPLETION and only produce the trace event when this is set. We address point 1 above by clearing the flag in blk_update_request(). We address point 2 above by only setting the flag when generic_make_request() is called. We address point 3 above by clearing the flag after generating a completion event. When bio_split() is used on a bio, particularly in blk_queue_split(), there is an extra complication. A new bio is split off the front, and may be handle directly without going through generic_make_request(). The old bio, which has been advanced, is passed to generic_make_request(), so it will trigger a trace event a second time. Probably the best result when a split happens is to see a single 'queue' event for the whole bio, then multiple 'complete' events - one for each component. To achieve this was can: - copy the BIO_TRACE_COMPLETION flag to the new bio in bio_split() - avoid generating a 'queue' event if BIO_TRACE_COMPLETION is already set. This way, the split-off bio won't create a queue event, the original won't either even if it re-submitted to generic_make_request(), but both will produce completion events, each for their own range. So if generic_make_request() is called (which generates a QUEUED event), then bi_endio() will create a single COMPLETE event for each range that the bio is split into, unless the driver has explicitly requested it not to. Signed-off-by: NeilBrown <neilb@suse.com> --- block/bio.c | 13 +++++++++++++ block/blk-core.c | 10 +++++++++- drivers/md/dm.c | 1 - drivers/md/raid5.c | 2 -- include/linux/blk_types.h | 2 ++ 5 files changed, 24 insertions(+), 4 deletions(-) diff --git a/block/bio.c b/block/bio.c index 12c2837c4277..3db98fc3ee87 100644 --- a/block/bio.c +++ b/block/bio.c @@ -1789,6 +1789,11 @@ static inline bool bio_remaining_done(struct bio *bio) * bio_endio() will end I/O on the whole bio. bio_endio() is the preferred * way to end I/O on a bio. No one should call bi_end_io() directly on a * bio unless they own it and thus know that it has an end_io function. + * + * bio_endio() can be called several times on a bio that has been chained + * using bio_chain(). The ->bi_end_io() function will only be called the + * last time. At this point the BLK_TA_COMPLETE tracing event will be + * generated if BIO_TRACE_COMPLETION is set. **/ void bio_endio(struct bio *bio) { @@ -1809,6 +1814,11 @@ void bio_endio(struct bio *bio) goto again; } + if (bio->bi_bdev && bio_flagged(bio, BIO_TRACE_COMPLETION)) { + trace_block_bio_complete(bdev_get_queue(bio->bi_bdev), + bio, bio->bi_error); + bio_clear_flag(bio, BIO_TRACE_COMPLETION); + } if (bio->bi_end_io) bio->bi_end_io(bio); } @@ -1847,6 +1857,9 @@ struct bio *bio_split(struct bio *bio, int sectors, bio_advance(bio, split->bi_iter.bi_size); + if (bio_flagged(bio, BIO_TRACE_COMPLETION)) + bio_set_flag(bio, BIO_TRACE_COMPLETION); + return split; } EXPORT_SYMBOL(bio_split); diff --git a/block/blk-core.c b/block/blk-core.c index d772c221cc17..2a7993063a7e 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1936,7 +1936,13 @@ generic_make_request_checks(struct bio *bio) if (!blkcg_bio_issue_check(q, bio)) return false; - trace_block_bio_queue(q, bio); + if (!bio_flagged(bio, BIO_TRACE_COMPLETION)) { + trace_block_bio_queue(q, bio); + /* Now that enqueuing has been traced, we need to trace + * completion as well. + */ + bio_set_flag(bio, BIO_TRACE_COMPLETION); + } return true; not_supported: @@ -2601,6 +2607,8 @@ bool blk_update_request(struct request *req, int error, unsigned int nr_bytes) if (bio_bytes == bio->bi_iter.bi_size) req->bio = bio->bi_next; + /* Completion has already been traced */ + bio_clear_flag(bio, BIO_TRACE_COMPLETION); req_bio_endio(req, bio, bio_bytes, error); total_bytes += bio_bytes; diff --git a/drivers/md/dm.c b/drivers/md/dm.c index dfb75979e455..cd93a3b9ceca 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -810,7 +810,6 @@ static void dec_pending(struct dm_io *io, int error) queue_io(md, bio); } else { /* done with normal IO or empty flush */ - trace_block_bio_complete(md->queue, bio, io_error); bio->bi_error = io_error; bio_endio(bio); } diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index d8bd25f235dc..14557259dd69 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -5138,8 +5138,6 @@ static void raid5_align_endio(struct bio *bi) rdev_dec_pending(rdev, conf->mddev); if (!error) { - trace_block_bio_complete(bdev_get_queue(raid_bi->bi_bdev), - raid_bi, 0); bio_endio(raid_bi); if (atomic_dec_and_test(&conf->active_aligned_reads)) wake_up(&conf->wait_for_quiescent); diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index ff07d891f38f..56c2a8180ce1 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -102,6 +102,8 @@ struct bio { #define BIO_REFFED 8 /* bio has elevated ->bi_cnt */ #define BIO_THROTTLED 9 /* This bio has already been subjected to * throttling rules. Don't do it again. */ +#define BIO_TRACE_COMPLETION 10 /* bio_endio() should trace the final completion + * of this bio. */ /* See BVEC_POOL_OFFSET below before adding new flags */ /* ^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 2/2] block: trace completion of all bios. @ 2017-04-07 1:10 ` NeilBrown 0 siblings, 0 replies; 6+ messages in thread From: NeilBrown @ 2017-04-07 1:10 UTC (permalink / raw) To: Jens Axboe Cc: linux-raid, Mike Snitzer, Christoph Hellwig, Ming Lei, linux-kernel, linux-block, dm-devel, Shaohua Li, Alasdair Kergon Currently only dm and md/raid5 bios trigger trace_block_bio_complete(). Now that we have bio_chain() and bio_inc_remaining(), it is not possible, in general, for a driver to know when the bio is really complete. Only bio_endio() knows that. So move the trace_block_bio_complete() call to bio_endio(). Now trace_block_bio_complete() pairs with trace_block_bio_queue(). Any bio for which a 'queue' event is traced, will subsequently generate a 'complete' event. There are a few cases where completion tracing is not wanted. 1/ If blk_update_request() has already generated a completion trace event at the 'request' level, there is no point generating one at the bio level too. In this case the bi_sector and bi_size will have changed, so the bio level event would be wrong 2/ If the bio hasn't actually been queued yet, but is being aborted early, then a trace event could be confusing. Some filesystems call bio_endio() but do not want tracing. 3/ The bio_integrity code interposes itself by replacing bi_end_io, then restoring it and calling bio_endio() again. This would produce two identical trace events if left like that. To handle these, we introduce a flag BIO_TRACE_COMPLETION and only produce the trace event when this is set. We address point 1 above by clearing the flag in blk_update_request(). We address point 2 above by only setting the flag when generic_make_request() is called. We address point 3 above by clearing the flag after generating a completion event. When bio_split() is used on a bio, particularly in blk_queue_split(), there is an extra complication. A new bio is split off the front, and may be handle directly without going through generic_make_request(). The old bio, which has been advanced, is passed to generic_make_request(), so it will trigger a trace event a second time. Probably the best result when a split happens is to see a single 'queue' event for the whole bio, then multiple 'complete' events - one for each component. To achieve this was can: - copy the BIO_TRACE_COMPLETION flag to the new bio in bio_split() - avoid generating a 'queue' event if BIO_TRACE_COMPLETION is already set. This way, the split-off bio won't create a queue event, the original won't either even if it re-submitted to generic_make_request(), but both will produce completion events, each for their own range. So if generic_make_request() is called (which generates a QUEUED event), then bi_endio() will create a single COMPLETE event for each range that the bio is split into, unless the driver has explicitly requested it not to. Signed-off-by: NeilBrown <neilb@suse.com> --- block/bio.c | 13 +++++++++++++ block/blk-core.c | 10 +++++++++- drivers/md/dm.c | 1 - drivers/md/raid5.c | 2 -- include/linux/blk_types.h | 2 ++ 5 files changed, 24 insertions(+), 4 deletions(-) diff --git a/block/bio.c b/block/bio.c index 12c2837c4277..3db98fc3ee87 100644 --- a/block/bio.c +++ b/block/bio.c @@ -1789,6 +1789,11 @@ static inline bool bio_remaining_done(struct bio *bio) * bio_endio() will end I/O on the whole bio. bio_endio() is the preferred * way to end I/O on a bio. No one should call bi_end_io() directly on a * bio unless they own it and thus know that it has an end_io function. + * + * bio_endio() can be called several times on a bio that has been chained + * using bio_chain(). The ->bi_end_io() function will only be called the + * last time. At this point the BLK_TA_COMPLETE tracing event will be + * generated if BIO_TRACE_COMPLETION is set. **/ void bio_endio(struct bio *bio) { @@ -1809,6 +1814,11 @@ void bio_endio(struct bio *bio) goto again; } + if (bio->bi_bdev && bio_flagged(bio, BIO_TRACE_COMPLETION)) { + trace_block_bio_complete(bdev_get_queue(bio->bi_bdev), + bio, bio->bi_error); + bio_clear_flag(bio, BIO_TRACE_COMPLETION); + } if (bio->bi_end_io) bio->bi_end_io(bio); } @@ -1847,6 +1857,9 @@ struct bio *bio_split(struct bio *bio, int sectors, bio_advance(bio, split->bi_iter.bi_size); + if (bio_flagged(bio, BIO_TRACE_COMPLETION)) + bio_set_flag(bio, BIO_TRACE_COMPLETION); + return split; } EXPORT_SYMBOL(bio_split); diff --git a/block/blk-core.c b/block/blk-core.c index d772c221cc17..2a7993063a7e 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1936,7 +1936,13 @@ generic_make_request_checks(struct bio *bio) if (!blkcg_bio_issue_check(q, bio)) return false; - trace_block_bio_queue(q, bio); + if (!bio_flagged(bio, BIO_TRACE_COMPLETION)) { + trace_block_bio_queue(q, bio); + /* Now that enqueuing has been traced, we need to trace + * completion as well. + */ + bio_set_flag(bio, BIO_TRACE_COMPLETION); + } return true; not_supported: @@ -2601,6 +2607,8 @@ bool blk_update_request(struct request *req, int error, unsigned int nr_bytes) if (bio_bytes == bio->bi_iter.bi_size) req->bio = bio->bi_next; + /* Completion has already been traced */ + bio_clear_flag(bio, BIO_TRACE_COMPLETION); req_bio_endio(req, bio, bio_bytes, error); total_bytes += bio_bytes; diff --git a/drivers/md/dm.c b/drivers/md/dm.c index dfb75979e455..cd93a3b9ceca 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -810,7 +810,6 @@ static void dec_pending(struct dm_io *io, int error) queue_io(md, bio); } else { /* done with normal IO or empty flush */ - trace_block_bio_complete(md->queue, bio, io_error); bio->bi_error = io_error; bio_endio(bio); } diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index d8bd25f235dc..14557259dd69 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -5138,8 +5138,6 @@ static void raid5_align_endio(struct bio *bi) rdev_dec_pending(rdev, conf->mddev); if (!error) { - trace_block_bio_complete(bdev_get_queue(raid_bi->bi_bdev), - raid_bi, 0); bio_endio(raid_bi); if (atomic_dec_and_test(&conf->active_aligned_reads)) wake_up(&conf->wait_for_quiescent); diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index ff07d891f38f..56c2a8180ce1 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -102,6 +102,8 @@ struct bio { #define BIO_REFFED 8 /* bio has elevated ->bi_cnt */ #define BIO_THROTTLED 9 /* This bio has already been subjected to * throttling rules. Don't do it again. */ +#define BIO_TRACE_COMPLETION 10 /* bio_endio() should trace the final completion + * of this bio. */ /* See BVEC_POOL_OFFSET below before adding new flags */ /* ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 0/2] Trace completion of all bios 2017-04-07 1:10 [PATCH 0/2] Trace completion of all bios NeilBrown @ 2017-04-07 15:42 ` Jens Axboe 2017-04-07 1:10 ` NeilBrown 2017-04-07 15:42 ` Jens Axboe 2 siblings, 0 replies; 6+ messages in thread From: Jens Axboe @ 2017-04-07 15:42 UTC (permalink / raw) To: NeilBrown Cc: linux-raid, Mike Snitzer, Christoph Hellwig, Ming Lei, linux-kernel, linux-block, dm-devel, Shaohua Li, Alasdair Kergon On 04/06/2017 07:10 PM, NeilBrown wrote: > Hi Jens, > I think all objections to this patch have been answered so I'm > resending. > I've added a small cleanup patch first which makes some small > enhancements to the documentation and #defines for the bio->flags > field. Added for 4.12. I hand applied 2/2, since it did not apply directly to the 4.12 branch. -- Jens Axboe ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 0/2] Trace completion of all bios @ 2017-04-07 15:42 ` Jens Axboe 0 siblings, 0 replies; 6+ messages in thread From: Jens Axboe @ 2017-04-07 15:42 UTC (permalink / raw) To: NeilBrown Cc: linux-raid, Mike Snitzer, Christoph Hellwig, Ming Lei, linux-kernel, linux-block, dm-devel, Shaohua Li, Alasdair Kergon On 04/06/2017 07:10 PM, NeilBrown wrote: > Hi Jens, > I think all objections to this patch have been answered so I'm > resending. > I've added a small cleanup patch first which makes some small > enhancements to the documentation and #defines for the bio->flags > field. Added for 4.12. I hand applied 2/2, since it did not apply directly to the 4.12 branch. -- Jens Axboe ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2017-04-07 15:43 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-04-07 1:10 [PATCH 0/2] Trace completion of all bios NeilBrown 2017-04-07 1:10 ` [PATCH 1/2] block: simple improvements for bio->flags NeilBrown 2017-04-07 1:10 ` [PATCH 2/2] block: trace completion of all bios NeilBrown 2017-04-07 1:10 ` NeilBrown 2017-04-07 15:42 ` [PATCH 0/2] Trace " Jens Axboe 2017-04-07 15:42 ` Jens Axboe
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.