linux-nvme.lists.infradead.org archive mirror
* [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: add blktrace extension support
@ 2019-12-11  6:16 Chaitanya Kulkarni
  2019-12-12 22:19 ` Keith Busch
  2019-12-19  5:50 ` Chaitanya Kulkarni
  0 siblings, 2 replies; 5+ messages in thread
From: Chaitanya Kulkarni @ 2019-12-11  6:16 UTC (permalink / raw)
  To: lsf-pc
  Cc: axboe, linux-btrace, Theodore Y. Ts'o, Bart Van Assche,
	Martin K. Petersen, linux-scsi, linux-nvme, Christoph Hellwig,
	linux-block, linux-ide, Hannes Reinecke, Johannes Thumshirn,
	Keith Busch, linux-fsdevel, Ming Lei, Omar Sandoval,
	Matias Bjorling

Hi,

* Background:-
-----------------------------------------------------------------------

The Linux kernel block layer now supports new Zone Management operations
(REQ_OP_ZONE_[OPEN/CLOSE/FINISH] [1]).

These operations were added mainly to support NVMe Zoned Namespaces
(ZNS) [2]. We are adding ZNS support to the Linux kernel block layer,
user-space tools (sys-utils/nvme-cli), the NVMe driver, file systems,
and device-mapper in order to support these devices in the field.

Over the years, the Linux kernel block layer tracing infrastructure
has proven to be not only extremely useful but essential for:-

1. Debugging problems during the development of kernel block drivers.
2. Solving issues at customer sites.
3. Speeding up development for file system developers.
4. Finding device-related issues on the fly without modifying
    the kernel.
5. Building white-box test cases around the complex areas of the
    linux-block layer.

* Problem with block layer tracing infrastructure:-
-----------------------------------------------------------------------

If blktrace is such a great tool, why do we need this session?

The existing blktrace infrastructure does not have enough free bits to
track new trace categories. With the addition of the new
REQ_OP_ZONE_XXX operations, we need to extend blktrace with more bits
so that we can track more request types.
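
A minimal sketch of the bit shortage, assuming the category layout of
enum blktrace_cat in include/uapi/linux/blktrace_api.h (values reproduced
from memory; please verify against your tree). The categories are shifted
into the upper 16 bits of the 32-bit action field, so a 17th category
would not fit. free_category_bits() is a hypothetical helper added only
for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Trace categories as recalled from enum blktrace_cat in
 * include/uapi/linux/blktrace_api.h. Assumed layout, not authoritative.
 */
enum blktrace_cat {
	BLK_TC_READ     = 1 << 0,   /* reads */
	BLK_TC_WRITE    = 1 << 1,   /* writes */
	BLK_TC_FLUSH    = 1 << 2,   /* flush */
	BLK_TC_SYNC     = 1 << 3,   /* sync I/O */
	BLK_TC_QUEUE    = 1 << 4,   /* queueing/merging */
	BLK_TC_REQUEUE  = 1 << 5,   /* requeueing */
	BLK_TC_ISSUE    = 1 << 6,   /* issue */
	BLK_TC_COMPLETE = 1 << 7,   /* completions */
	BLK_TC_FS       = 1 << 8,   /* fs requests */
	BLK_TC_PC       = 1 << 9,   /* passthrough requests */
	BLK_TC_NOTIFY   = 1 << 10,  /* special messages */
	BLK_TC_AHEAD    = 1 << 11,  /* readahead */
	BLK_TC_META     = 1 << 12,  /* metadata */
	BLK_TC_DISCARD  = 1 << 13,  /* discard requests */
	BLK_TC_DRV_DATA = 1 << 14,  /* binary per-driver data */
	BLK_TC_FUA      = 1 << 15,  /* FUA requests */
	BLK_TC_END      = 1 << 15,  /* highest category bit */
};

/* Categories occupy the upper half of the 32-bit action word. */
#define BLK_TC_SHIFT    16
#define BLK_TC_ACT(act) ((uint32_t)(act) << BLK_TC_SHIFT)

static_assert(BLK_TC_END == 1 << (BLK_TC_SHIFT - 1),
	      "the last category already sits in the top category bit");

/* Hypothetical helper: count category bits that are still unassigned. */
static int free_category_bits(void)
{
	uint32_t used = BLK_TC_READ | BLK_TC_WRITE | BLK_TC_FLUSH |
			BLK_TC_SYNC | BLK_TC_QUEUE | BLK_TC_REQUEUE |
			BLK_TC_ISSUE | BLK_TC_COMPLETE | BLK_TC_FS |
			BLK_TC_PC | BLK_TC_NOTIFY | BLK_TC_AHEAD |
			BLK_TC_META | BLK_TC_DISCARD | BLK_TC_DRV_DATA |
			BLK_TC_FUA;
	int free_bits = 0;

	for (int bit = 0; bit < BLK_TC_SHIFT; bit++)
		if (!(used & (1u << bit)))
			free_bits++;
	return free_bits;
}
```

Under these assumptions free_category_bits() returns 0, which is why a
new category for zone management operations needs a wider field rather
than just another enum entry.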

* Current state of the work:-
-----------------------------------------------------------------------

An RFC implementation [3], which adds new IOCTLs, has been posted. It
is far from production-ready, but it provides a basis to get the
discussion started.

This RFC implementation provides:-
1. Extended bits to track new trace categories.
2. Support for tracing per trace priorities.
3. Support for priority mask.
4. New IOCTLs so that user-space tools can set up the extensions.
5. Ability to track the integrity fields.
6. blktrace and blkparse implementations which support the
    above-mentioned features.

Bart and Martin have suggested changes, which I've incorporated in the
RFC revisions.

* What will we discuss in the proposed session?
-----------------------------------------------------------------------

I'd like to propose a session for the Storage track to go over the
following discussion points:-

1. What is the right approach to move this work forward?
2. What other information bits do we need to add that will help the
    kernel community speed up development and improve tracing?
3. What other tracepoints do we need to add in the block layer
    to improve tracing?
4. What tracing of device driver callbacks can we add in the block
    layer?
5. Since polling is becoming popular, what new tracepoints do we
    need to improve debugging?
 

* Required Participants:-
-----------------------------------------------------------------------

I'd like to invite block layer, device driver and file system
developers to:-

1. Share their opinion on the topic.
2. Share their experience and any other issues with blktrace
    infrastructure.
3. Uncover additional details that are missing from this proposal.

Regards,
Chaitanya

References :-

[1] https://www.spinics.net/lists/linux-block/msg46043.html
[2] https://nvmexpress.org/new-nvmetm-specification-defines-zoned-namespaces-zns-as-go-to-industry-technology/
[3] https://www.spinics.net/lists/linux-btrace/msg01106.html
     https://www.spinics.net/lists/linux-btrace/msg01002.html
     https://www.spinics.net/lists/linux-btrace/msg01042.html
     https://www.spinics.net/lists/linux-btrace/msg00880.html

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: add blktrace extension support
  2019-12-11  6:16 [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: add blktrace extension support Chaitanya Kulkarni
@ 2019-12-12 22:19 ` Keith Busch
  2019-12-19  5:50 ` Chaitanya Kulkarni
  1 sibling, 0 replies; 5+ messages in thread
From: Keith Busch @ 2019-12-12 22:19 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: axboe, Ming Lei, linux-btrace, Theodore Y. Ts'o,
	Bart Van Assche, Martin K. Petersen, linux-scsi, linux-nvme,
	Christoph Hellwig, linux-block, linux-ide, Hannes Reinecke,
	Johannes Thumshirn, linux-fsdevel, lsf-pc, Omar Sandoval,
	Matias Bjorling

On Wed, Dec 11, 2019 at 06:16:29AM +0000, Chaitanya Kulkarni wrote:
> * Current state of the work:-
> -----------------------------------------------------------------------
> 
> An RFC implementation [3], which adds new IOCTLs, has been posted. It
> is far from production-ready, but it provides a basis to get the
> discussion started.
> 
> This RFC implementation provides:-
> 1. Extended bits to track new trace categories.
> 2. Support for tracing per trace priorities.
> 3. Support for priority mask.
> 4. New IOCTLs so that user-space tools can setup the extensions.
> 5. Ability to track the integrity fields.
> 6. blktrace and blkparse implementation which supports the above
>     mentioned features.
> 
> Bart and Martin have suggested changes, which I've incorporated in the
> RFC revisions.
> 
> * What will we discuss in the proposed session?
> -----------------------------------------------------------------------
> 
> I'd like to propose a session for the Storage track to go over the
> following discussion points:-
> 
> 1. What is the right approach to move this work forward?
> 2. What other information bits do we need to add that will help the
>     kernel community speed up development and improve tracing?
> 3. What other tracepoints do we need to add in the block layer
>     to improve tracing?
> 4. What tracing of device driver callbacks can we add in the block
>     layer?

I would like to see driver- and protocol-specific tracepoint decoding
made available to users under a single, common blkparse utility. For
nvme, it'd be great if we could set a fixed ABI, as people keep changing
it by burdening the kernel with making events more human readable. I'd
prefer to simplify the driver's tracepoints and do the decoding from
userspace so that it's forward compatible.
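
A minimal sketch of what userspace decoding could look like, assuming
the kernel emits only raw fields. The struct raw_nvme_event record and
decode_event() are hypothetical and do not correspond to an existing
tracepoint format or blkparse interface; the opcode values are the NVM
command set I/O opcodes from the NVMe specification. Unknown opcodes are
printed as raw hex, so an older decoder still produces usable output for
commands added later.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical raw record as a fixed-ABI tracepoint might export it. */
struct raw_nvme_event {
	uint16_t qid;     /* submission queue id */
	uint16_t cid;     /* command id */
	uint8_t  opcode;  /* raw NVMe I/O opcode, not decoded by the kernel */
};

/* Map NVM command set I/O opcodes to names; NULL if unknown. */
static const char *nvme_io_opcode_name(uint8_t opcode)
{
	switch (opcode) {
	case 0x00: return "flush";
	case 0x01: return "write";
	case 0x02: return "read";
	case 0x08: return "write_zeroes";
	case 0x09: return "dsm";
	default:   return NULL;
	}
}

/* Format one event; returns the snprintf() result. */
static int decode_event(const struct raw_nvme_event *ev, char *buf, size_t len)
{
	const char *name;

	assert(ev && buf && len > 0);
	name = nvme_io_opcode_name(ev->opcode);
	if (name)
		return snprintf(buf, len, "qid=%u cid=%u cmd=%s",
				(unsigned)ev->qid, (unsigned)ev->cid, name);
	/* Unknown opcode: keep the raw value instead of failing. */
	return snprintf(buf, len, "qid=%u cid=%u cmd=0x%02x",
			(unsigned)ev->qid, (unsigned)ev->cid,
			(unsigned)ev->opcode);
}
```

With this split, adding a new command only requires updating the
userspace table; the kernel-side record layout stays unchanged.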

> 5. Since polling is becoming popular, what new tracepoints do we
>     need to improve debugging?

Regarding polling, though not tracepoint related: it'd be nice if we
had a new cpu state for this. Right now, all CPU utilization reported
by systat just shows up as 'system', which isn't really helpful for
analyzing how the hardware is doing.


* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: add blktrace extension support
  2019-12-11  6:16 [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: add blktrace extension support Chaitanya Kulkarni
  2019-12-12 22:19 ` Keith Busch
@ 2019-12-19  5:50 ` Chaitanya Kulkarni
  2020-01-09 10:19   ` Hans Holmberg
  1 sibling, 1 reply; 5+ messages in thread
From: Chaitanya Kulkarni @ 2019-12-19  5:50 UTC (permalink / raw)
  To: lsf-pc
  Cc: axboe, Damien Le Moal, linux-btrace, Theodore Y. Ts'o,
	Bart Van Assche, Martin K. Petersen, linux-scsi, linux-nvme,
	Christoph Hellwig, linux-block, linux-ide, Hannes Reinecke,
	Johannes Thumshirn, Keith Busch, linux-fsdevel, Ming Lei,
	Omar Sandoval, Matias Bjorling

Adding Damien to this thread.



* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: add blktrace extension support
  2019-12-19  5:50 ` Chaitanya Kulkarni
@ 2020-01-09 10:19   ` Hans Holmberg
  2020-01-09 12:59     ` Damien Le Moal
  0 siblings, 1 reply; 5+ messages in thread
From: Hans Holmberg @ 2020-01-09 10:19 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: axboe, Ming Lei, linux-btrace, Theodore Y. Ts'o,
	Bart Van Assche, Martin K. Petersen, Damien Le Moal, linux-scsi,
	linux-nvme, Christoph Hellwig, linux-block, linux-ide,
	Hannes Reinecke, Johannes Thumshirn, Keith Busch, linux-fsdevel,
	lsf-pc, Omar Sandoval, Matias Bjorling

On Thu, Dec 19, 2019 at 6:50 AM Chaitanya Kulkarni
<Chaitanya.Kulkarni@wdc.com> wrote:
>
> Adding Damien to this thread.
> On 12/10/2019 10:17 PM, Chaitanya Kulkarni wrote:
> > Hi,
> >
> > * Background:-
> > -----------------------------------------------------------------------
> >
> > The Linux kernel block layer now supports new Zone Management operations
> > (REQ_OP_ZONE_[OPEN/CLOSE/FINISH] [1]).
> >
> > These operations were added mainly to support NVMe Zoned Namespaces
> > (ZNS) [2]. We are adding ZNS support to the Linux kernel block layer,
> > user-space tools (sys-utils/nvme-cli), the NVMe driver, file systems,
> > and device-mapper in order to support these devices in the field.
> >
> > Over the years Linux kernel block layer tracing infrastructure
> > has proven to be not only extremely useful but essential for:-
> >
> > 1. Debugging the problems in the development of kernel block drivers.
> > 2. Solving the issues at the customer sites.
> > 3. Speeding up the development for the file system developers.
> > 4. Finding the device-related issues on the fly without modifying
> >      the kernel.
> > 5. Building white box test-cases around the complex areas in the
> >      linux-block layer.
> >
> > * Problem with block layer tracing infrastructure:-
> > -----------------------------------------------------------------------
> >
> > If blktrace is such a great tool, why do we need this session?
> >
> > The existing blktrace infrastructure does not have enough free bits to
> > track new trace categories. With the addition of the new
> > REQ_OP_ZONE_XXX operations, we need to extend blktrace with more bits
> > so that we can track more request types.

In addition to tracing the zone operations, it would be greatly
beneficial to add tracing (and blktrace support) for the reported zone
states.
I did something similar [5] for pblk and open-channel chunk states, and
that proved invaluable when figuring out whether the disk or pblk was
broken.

In pblk, the reported chunk state transitions are traced along with the
expected zone transitions (based on the I/O and management commands
submitted).
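
A minimal sketch of the "expected versus reported" comparison described
above, applied to zone conditions. The condition values are assumed to
mirror enum blk_zone_cond in include/uapi/linux/blkzoned.h; enum zone_op,
expected_cond() and zone_transition_ok() are hypothetical names, and the
state machine is deliberately simplified (error cases, read-only and
offline zones, and the close-with-nothing-written case are not modelled).
This is not how pblk implements its tracing.

```c
#include <assert.h>
#include <stdbool.h>

/* Zone conditions, assumed to mirror enum blk_zone_cond. */
enum zone_cond {
	ZONE_COND_EMPTY    = 0x1,
	ZONE_COND_IMP_OPEN = 0x2,
	ZONE_COND_EXP_OPEN = 0x3,
	ZONE_COND_CLOSED   = 0x4,
	ZONE_COND_FULL     = 0xE,
};

/* Hypothetical management operations submitted by the host. */
enum zone_op { ZONE_OP_OPEN, ZONE_OP_CLOSE, ZONE_OP_FINISH, ZONE_OP_RESET };

/* Condition the host expects after a successful management operation. */
static enum zone_cond expected_cond(enum zone_cond cur, enum zone_op op)
{
	assert(op >= ZONE_OP_OPEN && op <= ZONE_OP_RESET);

	switch (op) {
	case ZONE_OP_OPEN:
		/* Opening a full zone is assumed to fail and change nothing. */
		return cur == ZONE_COND_FULL ? cur : ZONE_COND_EXP_OPEN;
	case ZONE_OP_CLOSE:
		/* Only open zones are affected by a close. */
		if (cur == ZONE_COND_IMP_OPEN || cur == ZONE_COND_EXP_OPEN)
			return ZONE_COND_CLOSED;
		return cur;
	case ZONE_OP_FINISH:
		return ZONE_COND_FULL;
	case ZONE_OP_RESET:
		return ZONE_COND_EMPTY;
	}
	return cur;
}

/* True if the condition reported by the device matches the expectation. */
static bool zone_transition_ok(enum zone_cond cur, enum zone_op op,
			       enum zone_cond reported)
{
	return expected_cond(cur, op) == reported;
}
```

A mismatch flagged by zone_transition_ok() points at either the device
or the host-side bookkeeping, which is the distinction the trace is
meant to help with.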

[5] https://www.lkml.org/lkml/2018/8/29/457

Thanks!
Hans



* Re: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: add blktrace extension support
  2020-01-09 10:19   ` Hans Holmberg
@ 2020-01-09 12:59     ` Damien Le Moal
  0 siblings, 0 replies; 5+ messages in thread
From: Damien Le Moal @ 2020-01-09 12:59 UTC (permalink / raw)
  To: Hans Holmberg, Chaitanya Kulkarni
  Cc: axboe, Ming Lei, linux-btrace, Theodore Y. Ts'o,
	Bart Van Assche, Martin K. Petersen, linux-scsi, linux-nvme,
	Christoph Hellwig, linux-block, linux-ide, Hannes Reinecke,
	Johannes Thumshirn, Keith Busch, linux-fsdevel, lsf-pc,
	Omar Sandoval, Matias Bjorling

On 2020/01/09 19:19, Hans Holmberg wrote:
> On Thu, Dec 19, 2019 at 6:50 AM Chaitanya Kulkarni
> <Chaitanya.Kulkarni@wdc.com> wrote:
>>
>> Adding Damien to this thread.
>> On 12/10/2019 10:17 PM, Chaitanya Kulkarni wrote:
>>> Hi,
>>>
>>> * Background:-
>>> -----------------------------------------------------------------------
>>>
>>> The Linux kernel block layer now supports new Zone Management operations
>>> (REQ_OP_ZONE_[OPEN/CLOSE/FINISH] [1]).
>>>
>>> These operations were added mainly to support NVMe Zoned Namespaces
>>> (ZNS) [2]. We are adding ZNS support to the Linux kernel block layer,
>>> user-space tools (sys-utils/nvme-cli), the NVMe driver, file systems,
>>> and device-mapper in order to support these devices in the field.
>>>
>>> Over the years Linux kernel block layer tracing infrastructure
>>> has proven to be not only extremely useful but essential for:-
>>>
>>> 1. Debugging the problems in the development of kernel block drivers.
>>> 2. Solving the issues at the customer sites.
>>> 3. Speeding up the development for the file system developers.
>>> 4. Finding the device-related issues on the fly without modifying
>>>      the kernel.
>>> 5. Building white box test-cases around the complex areas in the
>>>      linux-block layer.
>>>
>>> * Problem with block layer tracing infrastructure:-
>>> -----------------------------------------------------------------------
>>>
>>> If blktrace is such a great tool, why do we need this session?
>>>
>>> The existing blktrace infrastructure does not have enough free bits to
>>> track new trace categories. With the addition of the new
>>> REQ_OP_ZONE_XXX operations, we need to extend blktrace with more bits
>>> so that we can track more request types.
> 
> In addition to tracing the zone operations, it would be greatly
> beneficial to add tracing (and blktrace support) for the reported zone
> states.

That would require a *lot* of data (e.g. for super large capacity SMR
drives) and a lot of additions to the hot path to track write commands
and all zone commands. It would also need massive modifications of the
error path for that tracking to be correct, and the error path would
itself need to issue report zones. I am really not in favor of this.
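
For a rough sense of scale on the data volume only, here is a
back-of-the-envelope sketch. The 64-byte descriptor size is assumed to
match sizeof(struct blk_zone) in include/uapi/linux/blkzoned.h; the drive
capacity and zone size in the usage note are illustrative assumptions,
not figures from this thread, and zone_count() and full_report_bytes()
are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed size of one zone descriptor (sizeof(struct blk_zone)). */
#define ZONE_DESC_BYTES 64u

/* Number of zones on a drive, rounding up for a smaller last zone. */
static uint64_t zone_count(uint64_t capacity_bytes, uint64_t zone_bytes)
{
	assert(zone_bytes > 0);
	return (capacity_bytes + zone_bytes - 1) / zone_bytes;
}

/* Bytes of descriptors needed to describe every zone once. */
static uint64_t full_report_bytes(uint64_t nr_zones)
{
	return nr_zones * ZONE_DESC_BYTES;
}
```

For an assumed 20 TB drive with 256 MiB zones, zone_count() gives about
74,500 zones and full_report_bytes() roughly 4.5 MiB of descriptors for
a single full snapshot. This covers only the data size; it says nothing
about the hot-path and error-path costs raised above.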

> I did something similar [5] for pblk and open-channel chunk states, and
> that proved invaluable when figuring out whether the disk or pblk was
> broken.
> 
> In pblk, the reported chunk state transitions are traced along with the
> expected zone transitions (based on the I/O and management commands
> submitted).

Since pblk is a logically defined device, it likely has some form of
zone state tracking, similar to what dm-zoned does. So it may be
easier in that case. But for physical drives, the amount of code changes
and the runtime overhead of this tracking would not be acceptable, in my
opinion.

I have debugged enough buggy SMR drives to know that blktrace is a great
help as is. Drive-level debug features (fw logs etc.) combined with
blktrace as-is can easily do the same.

> 
> [5] https://www.lkml.org/lkml/2018/8/29/457
> 
> Thanks!
> Hans
> 


-- 
Damien Le Moal
Western Digital Research
