All of lore.kernel.org
 help / color / mirror / Atom feed
* [Lustre-devel] (no subject)
       [not found] <000c01cae6ee$1d4693d0$57d3bb70$@barton@oracle.com>
@ 2010-04-29  1:25 ` di.wang
  2010-04-29  1:49   ` Andreas Dilger
  2010-04-29  4:48   ` [Lustre-devel] Lustre RPC visualization Michael Kluge
  0 siblings, 2 replies; 38+ messages in thread
From: di.wang @ 2010-04-29  1:25 UTC (permalink / raw)
  To: lustre-devel

Hello, Michael

There is a logfile parser script in the attachment, which was written by 
Eric.

This script is very simple, but it should help you understand how we 
retrieve time step information from lustre debug log. On the server 
side, if you enable rpc_trace log, whenever the request arrive/start 
being processed/end of processing, there will be corresponding records 
being written into the debug log. Basically, you can get all the time 
step information from these records (actually only two of these records 
would be enough).


a. 
00000100:00100000:0.0:1272313858.472660:0:31581:0:(service.c:1625:ptlrpc_server_handle_request()) 
Handling RPC pname:cluuid+ref:pid:xid:nid:opc 
ll_mgs_00:7d4fb15c-1b1c-295f-e466-ea7d77089b52+10:4055:x1334115493386242:12345-0 at lo:400

This record means the req will being handled, so you can get the start 
time stamp(1272313858.472660) operation type (opc: 400, ping), xid 
(1334115493386242), client nid(12345-0 at lo) and so on.

b. 
00000100:00100000:0.0:1272313858.472687:0:31581:0:(service.c:1672:ptlrpc_server_handle_request()) 
Handled RPC pname:cluuid+ref:pid:xid:nid:opc 
ll_mgs_00:7d4fb15c-1b1c-295f-e466-ea7d77089b52+9:4055:x1334115493386242:12345-0 at lo:400 
Request procesed in 45us (77us total) trans 0 rc 0/0

This record means the req is already being handled, so you can get the 
end time stamp(1272313858.472687), operation type (opc: 400, ping), xid 
(1334115493386242), client nid(12345-0 at lo) and so no.

Note: (77us total) means how long it takes from the request arriving to 
the end of processing. so you can also get the request arriving time 
stamp here by (1272313858 - 77 = 1272312781).


So with these information you can draw the graph Eric mentioned in his 
email. If you have any questions, please let me know.

Thanks
WangDi


Eric Barton wrote:
>
> ------------------------------------------------------------------------
>
> Subject:
> RE: Visualising Lustre RPCs
> To:
> "Michael Kluge" <Michael.Kluge@tu-dresden.de>, "wangdi" 
> <Tom.Wang@Sun.COM>
>
> To:
> "Michael Kluge" <Michael.Kluge@tu-dresden.de>, "wangdi" 
> <Tom.Wang@Sun.COM>
> CC:
> <lustre-devel@lists.lustre.org>, "Galen M. Shipman" <gshipman@ornl.gov>
>
>
> Michael,
>
> The previous Lustre RPC visualisation effort I mentioned at the LUG 
> used the
>
> Lustre debug log entries of type D_RPCTRACE. We disabled all but these
>
> log messages and then used the Lustre debug daemon to collect them while
>
> we ran I/O tests. We then ran a simple logfile parser which used just 
> the log entries
>
> for request arrival?
>
> , start of processing and end of processing to graph request
>
> queue depth (arrival->end) and the number of requests being serviced 
> by type
>
> over time ? e.g?
>
> read3d
>
> ?which shows request queue depth (vertical) over time (axis labelled 
> 20-25) by
>
> server (axis labelled 0-80).
>
> *From:* Michael Kluge [mailto:Michael.Kluge at tu-dresden.de]
> *Sent:* 17 April 2010 6:26 AM
> *To:* Galen M. Shipman; Eric Barton
> *Subject:* Visualising Lustre RPCs
>
> Hi Galen, Eric,
>
> in order to get this little project started, I think what I need at 
> first to
>
> write a prototype for a converter are the following things:
>
> A set of test traces collected on maybe a handful of clients and some 
> servers
>
> is probably a good point to start with. It would be even better if we know
>
> what is in this traces so that we have an expectation what kind of things
>
> we want to see on the Vampir displays. This little program that Eric 
> mentioned
>
> that can read the trace file would be very helpful as well. And as the 
> last
>
> idea I have right now, a technical contact. I might come up with a couple
>
> of question after I have taken the first look onto the original trace 
> data
>
> and before I start writing code.
>
> Regards, Michael
>
>
> -- 
>
> Michael Kluge, M.Sc.
>
> Technische Universit?t Dresden
> Center for Information Services and
> High Performance Computing (ZIH)
> D-01062 Dresden
> Germany
>
> Contact:
> Willersbau, Room WIL A 208
> Phone: (+49) 351 463-34217
> Fax: (+49) 351 463-37773
> e-mail: michael.kluge at tu-dresden.de <mailto:michael.kluge@tu-dresden.de>
> WWW: http://www.tu-dresden.de/zih
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: rpc_parse.tgz
Type: application/x-compressed-tar
Size: 30287 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20100428/e7d90469/attachment.bin>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Lustre-devel] (no subject)
  2010-04-29  1:25 ` [Lustre-devel] (no subject) di.wang
@ 2010-04-29  1:49   ` Andreas Dilger
  2010-04-29  2:04     ` di.wang
  2010-04-29  4:48   ` [Lustre-devel] Lustre RPC visualization Michael Kluge
  1 sibling, 1 reply; 38+ messages in thread
From: Andreas Dilger @ 2010-04-29  1:49 UTC (permalink / raw)
  To: lustre-devel

AFAIK the Handling time is when request processing starts, not when it  
first arrived at the server. We would need to check I'd there is an  
RPCTRACE message from ptlrpc_handle_req_in() that we could use to  
determine the initial request arrival time.

Cheers, Andreas

On 2010-04-28, at 19:25, "di.wang" <di.wang@oracle.com> wrote:

> Hello, Michael
>
> There is a logfile parser script in the attachment, which was  
> written by Eric.
>
> This script is very simple, but it should help you understand how we  
> retrieve time step information from lustre debug log. On the server  
> side, if you enable rpc_trace log, whenever the request arrive/start  
> being processed/end of processing, there will be corresponding  
> records being written into the debug log. Basically, you can get all  
> the time step information from these records (actually only two of  
> these records would be enough).
>
>
> a. 00000100:00100000:0.0:1272313858.472660:0:31581:0:(service.c: 
> 1625:ptlrpc_server_handle_request()) Handling RPC pname:cluuid 
> +ref:pid:xid:nid:opc ll_mgs_00:7d4fb15c-1b1c-295f-e466- 
> ea7d77089b52+10:4055:x1334115493386242:12345-0 at lo:400
>
> This record means the req will being handled, so you can get the  
> start time stamp(1272313858.472660) operation type (opc: 400, ping),  
> xid (1334115493386242), client nid(12345-0 at lo) and so on.
>
> b. 00000100:00100000:0.0:1272313858.472687:0:31581:0:(service.c: 
> 1672:ptlrpc_server_handle_request()) Handled RPC pname:cluuid 
> +ref:pid:xid:nid:opc ll_mgs_00:7d4fb15c-1b1c-295f-e466- 
> ea7d77089b52+9:4055:x1334115493386242:12345-0 at lo:400 Request  
> procesed in 45us (77us total) trans 0 rc 0/0
>
> This record means the req is already being handled, so you can get  
> the end time stamp(1272313858.472687), operation type (opc: 400,  
> ping), xid (1334115493386242), client nid(12345-0 at lo) and so no.
>
> Note: (77us total) means how long it takes from the request arriving  
> to the end of processing. so you can also get the request arriving  
> time stamp here by (1272313858 - 77 = 1272312781).
>
>
> So with these information you can draw the graph Eric mentioned in  
> his email. If you have any questions, please let me know.
>
> Thanks
> WangDi
>
>
> Eric Barton wrote:
>>
>> --- 
>> ---------------------------------------------------------------------
>>
>> Subject:
>> RE: Visualising Lustre RPCs
>> To:
>> "Michael Kluge" <Michael.Kluge@tu-dresden.de>, "wangdi" <Tom.Wang@Sun.COM 
>> >
>>
>> To:
>> "Michael Kluge" <Michael.Kluge@tu-dresden.de>, "wangdi" <Tom.Wang@Sun.COM 
>> >
>> CC:
>> <lustre-devel@lists.lustre.org>, "Galen M. Shipman" <gshipman@ornl.gov 
>> >
>>
>>
>> Michael,
>>
>> The previous Lustre RPC visualisation effort I mentioned at the LUG  
>> used the
>>
>> Lustre debug log entries of type D_RPCTRACE. We disabled all but  
>> these
>>
>> log messages and then used the Lustre debug daemon to collect them  
>> while
>>
>> we ran I/O tests. We then ran a simple logfile parser which used  
>> just the log entries
>>
>> for request arrival?
>>
>> , start of processing and end of processing to graph request
>>
>> queue depth (arrival->end) and the number of requests being  
>> serviced by type
>>
>> over time ? e.g?
>>
>> read3d
>>
>> ?which shows request queue depth (vertical) over time (axis labell 
>> ed 20-25) by
>>
>> server (axis labelled 0-80).
>>
>> *From:* Michael Kluge [mailto:Michael.Kluge at tu-dresden.de]
>> *Sent:* 17 April 2010 6:26 AM
>> *To:* Galen M. Shipman; Eric Barton
>> *Subject:* Visualising Lustre RPCs
>>
>> Hi Galen, Eric,
>>
>> in order to get this little project started, I think what I need at  
>> first to
>>
>> write a prototype for a converter are the following things:
>>
>> A set of test traces collected on maybe a handful of clients and  
>> some servers
>>
>> is probably a good point to start with. It would be even better if  
>> we know
>>
>> what is in this traces so that we have an expectation what kind of  
>> things
>>
>> we want to see on the Vampir displays. This little program that  
>> Eric mentioned
>>
>> that can read the trace file would be very helpful as well. And as  
>> the last
>>
>> idea I have right now, a technical contact. I might come up with a  
>> couple
>>
>> of question after I have taken the first look onto the original  
>> trace data
>>
>> and before I start writing code.
>>
>> Regards, Michael
>>
>>
>> -- 
>>
>> Michael Kluge, M.Sc.
>>
>> Technische Universit?t Dresden
>> Center for Information Services and
>> High Performance Computing (ZIH)
>> D-01062 Dresden
>> Germany
>>
>> Contact:
>> Willersbau, Room WIL A 208
>> Phone: (+49) 351 463-34217
>> Fax: (+49) 351 463-37773
>> e-mail: michael.kluge at tu-dresden.de <mailto:michael.kluge@tu-dresden.de 
>> >
>> WWW: http://www.tu-dresden.de/zih
>>
>
> <rpc_parse.tgz>
> _______________________________________________
> Lustre-devel mailing list
> Lustre-devel at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-devel

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Lustre-devel] (no subject)
  2010-04-29  1:49   ` Andreas Dilger
@ 2010-04-29  2:04     ` di.wang
  0 siblings, 0 replies; 38+ messages in thread
From: di.wang @ 2010-04-29  2:04 UTC (permalink / raw)
  To: lustre-devel

Andreas Dilger wrote:
> AFAIK the Handling time is when request processing starts, not when it  
> first arrived at the server. We would need to check I'd there is an  
> RPCTRACE message from ptlrpc_handle_req_in() that we could use to  
> determine the initial request arrival time.
>   
Oh, you can also get the arriving time in "Handled RPC" msg
by Handled RPC_time - total time.

Thanks
WangDi
> Cheers, Andreas
>
> On 2010-04-28, at 19:25, "di.wang" <di.wang@oracle.com> wrote:
>
>   
>> Hello, Michael
>>
>> There is a logfile parser script in the attachment, which was  
>> written by Eric.
>>
>> This script is very simple, but it should help you understand how we  
>> retrieve time step information from lustre debug log. On the server  
>> side, if you enable rpc_trace log, whenever the request arrive/start  
>> being processed/end of processing, there will be corresponding  
>> records being written into the debug log. Basically, you can get all  
>> the time step information from these records (actually only two of  
>> these records would be enough).
>>
>>
>> a. 00000100:00100000:0.0:1272313858.472660:0:31581:0:(service.c: 
>> 1625:ptlrpc_server_handle_request()) Handling RPC pname:cluuid 
>> +ref:pid:xid:nid:opc ll_mgs_00:7d4fb15c-1b1c-295f-e466- 
>> ea7d77089b52+10:4055:x1334115493386242:12345-0 at lo:400
>>
>> This record means the req will being handled, so you can get the  
>> start time stamp(1272313858.472660) operation type (opc: 400, ping),  
>> xid (1334115493386242), client nid(12345-0 at lo) and so on.
>>
>> b. 00000100:00100000:0.0:1272313858.472687:0:31581:0:(service.c: 
>> 1672:ptlrpc_server_handle_request()) Handled RPC pname:cluuid 
>> +ref:pid:xid:nid:opc ll_mgs_00:7d4fb15c-1b1c-295f-e466- 
>> ea7d77089b52+9:4055:x1334115493386242:12345-0 at lo:400 Request  
>> procesed in 45us (77us total) trans 0 rc 0/0
>>
>> This record means the req is already being handled, so you can get  
>> the end time stamp(1272313858.472687), operation type (opc: 400,  
>> ping), xid (1334115493386242), client nid(12345-0 at lo) and so no.
>>
>> Note: (77us total) means how long it takes from the request arriving  
>> to the end of processing. so you can also get the request arriving  
>> time stamp here by (1272313858 - 77 = 1272312781).
>>
>>
>> So with these information you can draw the graph Eric mentioned in  
>> his email. If you have any questions, please let me know.
>>
>> Thanks
>> WangDi
>>
>>
>> Eric Barton wrote:
>>     
>>> --- 
>>> ---------------------------------------------------------------------
>>>
>>> Subject:
>>> RE: Visualising Lustre RPCs
>>> To:
>>> "Michael Kluge" <Michael.Kluge@tu-dresden.de>, "wangdi" <Tom.Wang@Sun.COM 
>>>       
>>> To:
>>> "Michael Kluge" <Michael.Kluge@tu-dresden.de>, "wangdi" <Tom.Wang@Sun.COM 
>>>       
>>> CC:
>>> <lustre-devel@lists.lustre.org>, "Galen M. Shipman" <gshipman@ornl.gov 
>>>       
>>> Michael,
>>>
>>> The previous Lustre RPC visualisation effort I mentioned at the LUG  
>>> used the
>>>
>>> Lustre debug log entries of type D_RPCTRACE. We disabled all but  
>>> these
>>>
>>> log messages and then used the Lustre debug daemon to collect them  
>>> while
>>>
>>> we ran I/O tests. We then ran a simple logfile parser which used  
>>> just the log entries
>>>
>>> for request arrival?
>>>
>>> , start of processing and end of processing to graph request
>>>
>>> queue depth (arrival->end) and the number of requests being  
>>> serviced by type
>>>
>>> over time ? e.g?
>>>
>>> read3d
>>>
>>> ?which shows request queue depth (vertical) over time (axis labell 
>>> ed 20-25) by
>>>
>>> server (axis labelled 0-80).
>>>
>>> *From:* Michael Kluge [mailto:Michael.Kluge at tu-dresden.de]
>>> *Sent:* 17 April 2010 6:26 AM
>>> *To:* Galen M. Shipman; Eric Barton
>>> *Subject:* Visualising Lustre RPCs
>>>
>>> Hi Galen, Eric,
>>>
>>> in order to get this little project started, I think what I need at  
>>> first to
>>>
>>> write a prototype for a converter are the following things:
>>>
>>> A set of test traces collected on maybe a handful of clients and  
>>> some servers
>>>
>>> is probably a good point to start with. It would be even better if  
>>> we know
>>>
>>> what is in this traces so that we have an expectation what kind of  
>>> things
>>>
>>> we want to see on the Vampir displays. This little program that  
>>> Eric mentioned
>>>
>>> that can read the trace file would be very helpful as well. And as  
>>> the last
>>>
>>> idea I have right now, a technical contact. I might come up with a  
>>> couple
>>>
>>> of question after I have taken the first look onto the original  
>>> trace data
>>>
>>> and before I start writing code.
>>>
>>> Regards, Michael
>>>
>>>
>>> -- 
>>>
>>> Michael Kluge, M.Sc.
>>>
>>> Technische Universit?t Dresden
>>> Center for Information Services and
>>> High Performance Computing (ZIH)
>>> D-01062 Dresden
>>> Germany
>>>
>>> Contact:
>>> Willersbau, Room WIL A 208
>>> Phone: (+49) 351 463-34217
>>> Fax: (+49) 351 463-37773
>>> e-mail: michael.kluge at tu-dresden.de <mailto:michael.kluge@tu-dresden.de 
>>>       
>>> WWW: http://www.tu-dresden.de/zih
>>>
>>>       
>> <rpc_parse.tgz>
>> _______________________________________________
>> Lustre-devel mailing list
>> Lustre-devel at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>>     
> _______________________________________________
> Lustre-devel mailing list
> Lustre-devel at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-devel
>   

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Lustre-devel] Lustre RPC visualization
  2010-04-29  1:25 ` [Lustre-devel] (no subject) di.wang
  2010-04-29  1:49   ` Andreas Dilger
@ 2010-04-29  4:48   ` Michael Kluge
       [not found]     ` <4BD9CF75.8030204@oracle.com>
  1 sibling, 1 reply; 38+ messages in thread
From: Michael Kluge @ 2010-04-29  4:48 UTC (permalink / raw)
  To: lustre-devel

Hi WangDi,

OK, thanks for input. I'll go ahead and try to write a converter. Could 
you please collect a set test traces that belong together from a couple 
servers and clients and put them somewhere so that I can download them?


Thanks, Michael

Am 29.04.2010 03:25, schrieb di.wang:
> Hello, Michael
>
> There is a logfile parser script in the attachment, which was written by
> Eric.
>
> This script is very simple, but it should help you understand how we
> retrieve time step information from lustre debug log. On the server
> side, if you enable rpc_trace log, whenever the request arrive/start
> being processed/end of processing, there will be corresponding records
> being written into the debug log. Basically, you can get all the time
> step information from these records (actually only two of these records
> would be enough).
>
>
> a.
> 00000100:00100000:0.0:1272313858.472660:0:31581:0:(service.c:1625:ptlrpc_server_handle_request())
> Handling RPC pname:cluuid+ref:pid:xid:nid:opc
> ll_mgs_00:7d4fb15c-1b1c-295f-e466-ea7d77089b52+10:4055:x1334115493386242:12345-0 at lo:400
>
>
> This record means the req will being handled, so you can get the start
> time stamp(1272313858.472660) operation type (opc: 400, ping), xid
> (1334115493386242), client nid(12345-0 at lo) and so on.
>
> b.
> 00000100:00100000:0.0:1272313858.472687:0:31581:0:(service.c:1672:ptlrpc_server_handle_request())
> Handled RPC pname:cluuid+ref:pid:xid:nid:opc
> ll_mgs_00:7d4fb15c-1b1c-295f-e466-ea7d77089b52+9:4055:x1334115493386242:12345-0 at lo:400
> Request procesed in 45us (77us total) trans 0 rc 0/0
>
> This record means the req is already being handled, so you can get the
> end time stamp(1272313858.472687), operation type (opc: 400, ping), xid
> (1334115493386242), client nid(12345-0 at lo) and so no.
>
> Note: (77us total) means how long it takes from the request arriving to
> the end of processing. so you can also get the request arriving time
> stamp here by (1272313858 - 77 = 1272312781).
>
>
> So with these information you can draw the graph Eric mentioned in his
> email. If you have any questions, please let me know.
>
> Thanks
> WangDi
>
>
> Eric Barton wrote:
>>
>> ------------------------------------------------------------------------
>>
>> Subject:
>> RE: Visualising Lustre RPCs
>> To:
>> "Michael Kluge" <Michael.Kluge@tu-dresden.de>, "wangdi"
>> <Tom.Wang@Sun.COM>
>>
>> To:
>> "Michael Kluge" <Michael.Kluge@tu-dresden.de>, "wangdi"
>> <Tom.Wang@Sun.COM>
>> CC:
>> <lustre-devel@lists.lustre.org>, "Galen M. Shipman" <gshipman@ornl.gov>
>>
>>
>> Michael,
>>
>> The previous Lustre RPC visualisation effort I mentioned at the LUG
>> used the
>>
>> Lustre debug log entries of type D_RPCTRACE. We disabled all but these
>>
>> log messages and then used the Lustre debug daemon to collect them while
>>
>> we ran I/O tests. We then ran a simple logfile parser which used just
>> the log entries
>>
>> for request arrival?
>>
>> , start of processing and end of processing to graph request
>>
>> queue depth (arrival->end) and the number of requests being serviced
>> by type
>>
>> over time ? e.g?
>>
>> read3d
>>
>> ?which shows request queue depth (vertical) over time (axis labelled
>> 20-25) by
>>
>> server (axis labelled 0-80).
>>
>> *From:* Michael Kluge [mailto:Michael.Kluge at tu-dresden.de]
>> *Sent:* 17 April 2010 6:26 AM
>> *To:* Galen M. Shipman; Eric Barton
>> *Subject:* Visualising Lustre RPCs
>>
>> Hi Galen, Eric,
>>
>> in order to get this little project started, I think what I need at
>> first to
>>
>> write a prototype for a converter are the following things:
>>
>> A set of test traces collected on maybe a handful of clients and some
>> servers
>>
>> is probably a good point to start with. It would be even better if we
>> know
>>
>> what is in this traces so that we have an expectation what kind of things
>>
>> we want to see on the Vampir displays. This little program that Eric
>> mentioned
>>
>> that can read the trace file would be very helpful as well. And as the
>> last
>>
>> idea I have right now, a technical contact. I might come up with a couple
>>
>> of question after I have taken the first look onto the original trace
>> data
>>
>> and before I start writing code.
>>
>> Regards, Michael
>>
>>
>> --
>>
>> Michael Kluge, M.Sc.
>>
>> Technische Universit?t Dresden
>> Center for Information Services and
>> High Performance Computing (ZIH)
>> D-01062 Dresden
>> Germany
>>
>> Contact:
>> Willersbau, Room WIL A 208
>> Phone: (+49) 351 463-34217
>> Fax: (+49) 351 463-37773
>> e-mail: michael.kluge at tu-dresden.de <mailto:michael.kluge@tu-dresden.de>
>> WWW: http://www.tu-dresden.de/zih

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Lustre-devel] Lustre RPC visualization
       [not found]     ` <4BD9CF75.8030204@oracle.com>
@ 2010-05-03  8:41       ` Michael Kluge
  2010-05-03 13:20         ` Andreas Dilger
  0 siblings, 1 reply; 38+ messages in thread
From: Michael Kluge @ 2010-05-03  8:41 UTC (permalink / raw)
  To: lustre-devel

Hi WangDi,

I have a small problem understanding the logical order of the events. 
Here is one example from the logs where the client reports an RPC as 
completed before the server says that it has finished the RPC handling. 
The difference is about 1.5 ms.

Is that due to the fact that the server waits until the client has 
ack'ed the result of the RPC? Or are the clocks between the servers not 
well synchronized? Or are the timestamps in the logfile sometimes not 
correct (log message could not be flushed or whatever)?

In the first case I don't have a chance to figure out when the server 
has finished working on the RPC and is going to send out the result, 
correct?

../debug_log/rpc_client.debug:00000100:00100000:1.0:1272565278.263430:0:3479:0:(client.c:1532:ptlrpc_check_set()) 
Completed RPC pname:cluuid:pid:xid:nid:opc 
ptlrpcd-brw:3f7723dc-06a3-0b5e-9e96-f5b86e8d1b1c:3479:1334380768266050:10.8.0.126 at tcp:4

../debug_log/rpc_oss.debug:00000100:00100000:1.0:1272565278.265039:0:7385:0:(service.c:1672:ptlrpc_server_handle_request()) 
Handled RPC pname:cluuid+ref:pid:xid:nid:opc 
ll_ost_io_24:3f7723dc-06a3-0b5e-9e96-f5b86e8d1b1c+27:3479:x1334380768266050:12345-10.8.0.104 at tcp:4 
Request procesed in 238391us (239698us total) trans 266 rc 0/0


For this RPC I see in the oss log:

1272565278.025337: incoming x1334380768266050
1272565278.026649: Handling RPC x1334380768266050
1272565278.265039: Handled RPC x1334380768266050:Request procesed in 
238391us (239698us total)

So, by doing the math I get: 265039-026649=238390 (just 1 off target, 
might be wathever) and 265039-025337=239702 (4 off target). So I guess 
the times reported by the last message are calculated by using a 
different data source than the log time stamps?


Regards, Michael


Am 29.04.2010 20:27, schrieb di.wang:
> Hello, Michael
>
> Here is a small debug log example you can use for your development.
>
> Thanks
> WangDi
> Michael Kluge wrote:
>> Hi WangDi,
>>
>> OK, thanks for input. I'll go ahead and try to write a converter.
>> Could you please collect a set test traces that belong together from a
>> couple servers and clients and put them somewhere so that I can
>> download them?
>>
>>
>> Thanks, Michael
>>
>> Am 29.04.2010 03:25, schrieb di.wang:
>>> Hello, Michael
>>>
>>> There is a logfile parser script in the attachment, which was written by
>>> Eric.
>>>
>>> This script is very simple, but it should help you understand how we
>>> retrieve time step information from lustre debug log. On the server
>>> side, if you enable rpc_trace log, whenever the request arrive/start
>>> being processed/end of processing, there will be corresponding records
>>> being written into the debug log. Basically, you can get all the time
>>> step information from these records (actually only two of these records
>>> would be enough).
>>>
>>>
>>> a.
>>> 00000100:00100000:0.0:1272313858.472660:0:31581:0:(service.c:1625:ptlrpc_server_handle_request())
>>>
>>> Handling RPC pname:cluuid+ref:pid:xid:nid:opc
>>> ll_mgs_00:7d4fb15c-1b1c-295f-e466-ea7d77089b52+10:4055:x1334115493386242:12345-0 at lo:400
>>>
>>>
>>>
>>> This record means the req will being handled, so you can get the start
>>> time stamp(1272313858.472660) operation type (opc: 400, ping), xid
>>> (1334115493386242), client nid(12345-0 at lo) and so on.
>>>
>>> b.
>>> 00000100:00100000:0.0:1272313858.472687:0:31581:0:(service.c:1672:ptlrpc_server_handle_request())
>>>
>>> Handled RPC pname:cluuid+ref:pid:xid:nid:opc
>>> ll_mgs_00:7d4fb15c-1b1c-295f-e466-ea7d77089b52+9:4055:x1334115493386242:12345-0 at lo:400
>>>
>>> Request procesed in 45us (77us total) trans 0 rc 0/0
>>>
>>> This record means the req is already being handled, so you can get the
>>> end time stamp(1272313858.472687), operation type (opc: 400, ping), xid
>>> (1334115493386242), client nid(12345-0 at lo) and so no.
>>>
>>> Note: (77us total) means how long it takes from the request arriving to
>>> the end of processing. so you can also get the request arriving time
>>> stamp here by (1272313858 - 77 = 1272312781).
>>>
>>>
>>> So with these information you can draw the graph Eric mentioned in his
>>> email. If you have any questions, please let me know.
>>>
>>> Thanks
>>> WangDi
>>>
>>>
>>> Eric Barton wrote:
>>>>
>>>> ------------------------------------------------------------------------
>>>>
>>>>
>>>> Subject:
>>>> RE: Visualising Lustre RPCs
>>>> To:
>>>> "Michael Kluge" <Michael.Kluge@tu-dresden.de>, "wangdi"
>>>> <Tom.Wang@Sun.COM>
>>>>
>>>> To:
>>>> "Michael Kluge" <Michael.Kluge@tu-dresden.de>, "wangdi"
>>>> <Tom.Wang@Sun.COM>
>>>> CC:
>>>> <lustre-devel@lists.lustre.org>, "Galen M. Shipman" <gshipman@ornl.gov>
>>>>
>>>>
>>>> Michael,
>>>>
>>>> The previous Lustre RPC visualisation effort I mentioned at the LUG
>>>> used the
>>>>
>>>> Lustre debug log entries of type D_RPCTRACE. We disabled all but these
>>>>
>>>> log messages and then used the Lustre debug daemon to collect them
>>>> while
>>>>
>>>> we ran I/O tests. We then ran a simple logfile parser which used just
>>>> the log entries
>>>>
>>>> for request arrival?
>>>>
>>>> , start of processing and end of processing to graph request
>>>>
>>>> queue depth (arrival->end) and the number of requests being serviced
>>>> by type
>>>>
>>>> over time ? e.g?
>>>>
>>>> read3d
>>>>
>>>> ?which shows request queue depth (vertical) over time (axis labelled
>>>> 20-25) by
>>>>
>>>> server (axis labelled 0-80).
>>>>
>>>> *From:* Michael Kluge [mailto:Michael.Kluge at tu-dresden.de]
>>>> *Sent:* 17 April 2010 6:26 AM
>>>> *To:* Galen M. Shipman; Eric Barton
>>>> *Subject:* Visualising Lustre RPCs
>>>>
>>>> Hi Galen, Eric,
>>>>
>>>> in order to get this little project started, I think what I need at
>>>> first to
>>>>
>>>> write a prototype for a converter are the following things:
>>>>
>>>> A set of test traces collected on maybe a handful of clients and some
>>>> servers
>>>>
>>>> is probably a good point to start with. It would be even better if we
>>>> know
>>>>
>>>> what is in this traces so that we have an expectation what kind of
>>>> things
>>>>
>>>> we want to see on the Vampir displays. This little program that Eric
>>>> mentioned
>>>>
>>>> that can read the trace file would be very helpful as well. And as the
>>>> last
>>>>
>>>> idea I have right now, a technical contact. I might come up with a
>>>> couple
>>>>
>>>> of question after I have taken the first look onto the original trace
>>>> data
>>>>
>>>> and before I start writing code.
>>>>
>>>> Regards, Michael
>>>>
>>>>
>>>> --
>>>>
>>>> Michael Kluge, M.Sc.
>>>>
>>>> Technische Universit?t Dresden
>>>> Center for Information Services and
>>>> High Performance Computing (ZIH)
>>>> D-01062 Dresden
>>>> Germany
>>>>
>>>> Contact:
>>>> Willersbau, Room WIL A 208
>>>> Phone: (+49) 351 463-34217
>>>> Fax: (+49) 351 463-37773
>>>> e-mail: michael.kluge at tu-dresden.de
>>>> <mailto:michael.kluge@tu-dresden.de>
>>>> WWW: http://www.tu-dresden.de/zih
>>
>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Lustre-devel] Lustre RPC visualization
  2010-05-03  8:41       ` Michael Kluge
@ 2010-05-03 13:20         ` Andreas Dilger
  2010-05-03 18:10           ` Michael Kluge
  0 siblings, 1 reply; 38+ messages in thread
From: Andreas Dilger @ 2010-05-03 13:20 UTC (permalink / raw)
  To: lustre-devel

On 2010-05-03, at 04:41, Michael Kluge wrote:
> I have a small problem understanding the logical order of the events. 
> Here is one example from the logs where the client reports an RPC as 
> completed before the server says that it has finished the RPC handling. 
> The difference is about 1.5 ms.
> 
> Is that due to the fact that the server waits until the client has 
> ack'ed the result of the RPC? Or are the clocks between the servers not 
> well synchronized? Or are the timestamps in the logfile sometimes not 
> correct (log message could not be flushed or whatever)?

The timestamps on the log messages are only as accurate as the clocks on the nodes.  Lustre can run without synchronized clocks, using the client clock as the basis for the timestamps, but Lustre makes no effort to synchronize the client clocks.  For that you need to use NTP (easiest) or adjust the timestamps in the logs based on messages like this.

> In the first case I don't have a chance to figure out when the server 
> has finished working on the RPC and is going to send out the result, 
> correct?
> 
> ../debug_log/rpc_client.debug:00000100:00100000:1.0:1272565278.263430:0:3479:0:(client.c:1532:ptlrpc_check_set()) 
> Completed RPC pname:cluuid:pid:xid:nid:opc 
> ptlrpcd-brw:3f7723dc-06a3-0b5e-9e96-f5b86e8d1b1c:3479:1334380768266050:10.8.0.126 at tcp:4
> 
> ../debug_log/rpc_oss.debug:00000100:00100000:1.0:1272565278.265039:0:7385:0:(service.c:1672:ptlrpc_server_handle_request()) 
> Handled RPC pname:cluuid+ref:pid:xid:nid:opc 
> ll_ost_io_24:3f7723dc-06a3-0b5e-9e96-f5b86e8d1b1c+27:3479:x1334380768266050:12345-10.8.0.104 at tcp:4 
> Request procesed in 238391us (239698us total) trans 266 rc 0/0
> 
> 
> For this RPC I see in the oss log:
> 
> 1272565278.025337: incoming x1334380768266050
> 1272565278.026649: Handling RPC x1334380768266050
> 1272565278.265039: Handled RPC x1334380768266050:Request procesed in 
> 238391us (239698us total)
> 
> So, by doing the math I get: 265039-026649=238390 (just 1 off target, 
> might be wathever) and 265039-025337=239702 (4 off target). So I guess 
> the times reported by the last message are calculated by using a 
> different data source than the log time stamps?

The log timestamps are generated at the time the message is processed by the log system.  The processing times are themselves generated by accessing the clock and computing the processing before the message is printed.

        do_gettimeofday(&work_start);
        :
        {handle RPC}
        :
        do_gettimeofday(&work_end);
        timediff = cfs_timeval_sub(&work_end, &work_start, NULL);
        CDEBUG(D_RPCTRACE, "request x"LPU64" opc %u from %s processed in "
               "%ldus (%ldus total) trans "LPU64" rc %d/%d\n",



> Am 29.04.2010 20:27, schrieb di.wang:
>> Hello, Michael
>> 
>> Here is a small debug log example you can use for your development.
>> 
>> Thanks
>> WangDi
>> Michael Kluge wrote:
>>> Hi WangDi,
>>> 
>>> OK, thanks for input. I'll go ahead and try to write a converter.
>>> Could you please collect a set test traces that belong together from a
>>> couple servers and clients and put them somewhere so that I can
>>> download them?
>>> 
>>> 
>>> Thanks, Michael
>>> 
>>> Am 29.04.2010 03:25, schrieb di.wang:
>>>> Hello, Michael
>>>> 
>>>> There is a logfile parser script in the attachment, which was written by
>>>> Eric.
>>>> 
>>>> This script is very simple, but it should help you understand how we
>>>> retrieve time step information from lustre debug log. On the server
>>>> side, if you enable rpc_trace log, whenever the request arrive/start
>>>> being processed/end of processing, there will be corresponding records
>>>> being written into the debug log. Basically, you can get all the time
>>>> step information from these records (actually only two of these records
>>>> would be enough).
>>>> 
>>>> 
>>>> a.
>>>> 00000100:00100000:0.0:1272313858.472660:0:31581:0:(service.c:1625:ptlrpc_server_handle_request())
>>>> 
>>>> Handling RPC pname:cluuid+ref:pid:xid:nid:opc
>>>> ll_mgs_00:7d4fb15c-1b1c-295f-e466-ea7d77089b52+10:4055:x1334115493386242:12345-0 at lo:400
>>>> 
>>>> 
>>>> 
>>>> This record means the req will being handled, so you can get the start
>>>> time stamp(1272313858.472660) operation type (opc: 400, ping), xid
>>>> (1334115493386242), client nid(12345-0 at lo) and so on.
>>>> 
>>>> b.
>>>> 00000100:00100000:0.0:1272313858.472687:0:31581:0:(service.c:1672:ptlrpc_server_handle_request())
>>>> 
>>>> Handled RPC pname:cluuid+ref:pid:xid:nid:opc
>>>> ll_mgs_00:7d4fb15c-1b1c-295f-e466-ea7d77089b52+9:4055:x1334115493386242:12345-0 at lo:400
>>>> 
>>>> Request procesed in 45us (77us total) trans 0 rc 0/0
>>>> 
>>>> This record means the req is already being handled, so you can get the
>>>> end time stamp(1272313858.472687), operation type (opc: 400, ping), xid
>>>> (1334115493386242), client nid(12345-0 at lo) and so no.
>>>> 
>>>> Note: (77us total) means how long it takes from the request arriving to
>>>> the end of processing. so you can also get the request arriving time
>>>> stamp here by (1272313858 - 77 = 1272312781).
>>>> 
>>>> 
>>>> So with these information you can draw the graph Eric mentioned in his
>>>> email. If you have any questions, please let me know.
>>>> 
>>>> Thanks
>>>> WangDi
>>>> 
>>>> 
>>>> Eric Barton wrote:
>>>>> 
>>>>> ------------------------------------------------------------------------
>>>>> 
>>>>> 
>>>>> Subject:
>>>>> RE: Visualising Lustre RPCs
>>>>> To:
>>>>> "Michael Kluge" <Michael.Kluge@tu-dresden.de>, "wangdi"
>>>>> <Tom.Wang@Sun.COM>
>>>>> 
>>>>> To:
>>>>> "Michael Kluge" <Michael.Kluge@tu-dresden.de>, "wangdi"
>>>>> <Tom.Wang@Sun.COM>
>>>>> CC:
>>>>> <lustre-devel@lists.lustre.org>, "Galen M. Shipman" <gshipman@ornl.gov>
>>>>> 
>>>>> 
>>>>> Michael,
>>>>> 
>>>>> The previous Lustre RPC visualisation effort I mentioned at the LUG
>>>>> used the
>>>>> 
>>>>> Lustre debug log entries of type D_RPCTRACE. We disabled all but these
>>>>> 
>>>>> log messages and then used the Lustre debug daemon to collect them
>>>>> while
>>>>> 
>>>>> we ran I/O tests. We then ran a simple logfile parser which used just
>>>>> the log entries
>>>>> 
>>>>> for request arrival?
>>>>> 
>>>>> , start of processing and end of processing to graph request
>>>>> 
>>>>> queue depth (arrival->end) and the number of requests being serviced
>>>>> by type
>>>>> 
>>>>> over time ? e.g?
>>>>> 
>>>>> read3d
>>>>> 
>>>>> ?which shows request queue depth (vertical) over time (axis labelled
>>>>> 20-25) by
>>>>> 
>>>>> server (axis labelled 0-80).
>>>>> 
>>>>> *From:* Michael Kluge [mailto:Michael.Kluge at tu-dresden.de]
>>>>> *Sent:* 17 April 2010 6:26 AM
>>>>> *To:* Galen M. Shipman; Eric Barton
>>>>> *Subject:* Visualising Lustre RPCs
>>>>> 
>>>>> Hi Galen, Eric,
>>>>> 
>>>>> in order to get this little project started, I think what I need at
>>>>> first to
>>>>> 
>>>>> write a prototype for a converter are the following things:
>>>>> 
>>>>> A set of test traces collected on maybe a handful of clients and some
>>>>> servers
>>>>> 
>>>>> is probably a good point to start with. It would be even better if we
>>>>> know
>>>>> 
>>>>> what is in this traces so that we have an expectation what kind of
>>>>> things
>>>>> 
>>>>> we want to see on the Vampir displays. This little program that Eric
>>>>> mentioned
>>>>> 
>>>>> that can read the trace file would be very helpful as well. And as the
>>>>> last
>>>>> 
>>>>> idea I have right now, a technical contact. I might come up with a
>>>>> couple
>>>>> 
>>>>> of question after I have taken the first look onto the original trace
>>>>> data
>>>>> 
>>>>> and before I start writing code.
>>>>> 
>>>>> Regards, Michael
>>>>> 
>>>>> 
>>>>> --
>>>>> 
>>>>> Michael Kluge, M.Sc.
>>>>> 
>>>>> Technische Universit?t Dresden
>>>>> Center for Information Services and
>>>>> High Performance Computing (ZIH)
>>>>> D-01062 Dresden
>>>>> Germany
>>>>> 
>>>>> Contact:
>>>>> Willersbau, Room WIL A 208
>>>>> Phone: (+49) 351 463-34217
>>>>> Fax: (+49) 351 463-37773
>>>>> e-mail: michael.kluge at tu-dresden.de
>>>>> <mailto:michael.kluge@tu-dresden.de>
>>>>> WWW: http://www.tu-dresden.de/zih
>>> 
>> 
> 
> _______________________________________________
> Lustre-devel mailing list
> Lustre-devel at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-devel


Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Lustre-devel] Lustre RPC visualization
  2010-05-03 13:20         ` Andreas Dilger
@ 2010-05-03 18:10           ` Michael Kluge
  2010-05-03 18:57             ` Robert Read
  2010-05-03 18:58             ` di.wang
  0 siblings, 2 replies; 38+ messages in thread
From: Michael Kluge @ 2010-05-03 18:10 UTC (permalink / raw)
  To: lustre-devel

Hi Andreas,

Am 03.05.2010 15:20, schrieb Andreas Dilger:
> On 2010-05-03, at 04:41, Michael Kluge wrote:
>> I have a small problem understanding the logical order of the events.
>> Here is one example from the logs where the client reports an RPC as
>> completed before the server says that it has finished the RPC handling.
>> The difference is about 1.5 ms.
>>
>> Is that due to the fact that the server waits until the client has
>> ack'ed the result of the RPC? Or are the clocks between the servers not
>> well synchronized? Or are the timestamps in the logfile sometimes not
>> correct (log message could not be flushed or whatever)?
>
> The timestamps on the log messages are only as accurate as the clocks on
 > the nodes.  Lustre can run without synchronized clocks, using the client
 > clock as the basis for the timestamps, but Lustre makes no effort to
 > synchronize the client clocks.  For that you need to use NTP (easiest)
>  or adjust the timestamps in the logs based on messages like this.

OK, so logically the "Completed RPC" on the client side is supposed to 
show up after the server has written his "Handled RPC" to its log (if 
clocks are snyc'ed perfectly).

One more question: RPC 1334380768266400 (in the log WangDi sent me) has 
on the client side only a "Sending RPC" message, thus missing the 
"Completed RPC". The server has all three (received,start work, done 
work). Has this RPC vanished on the way back to the client? There is no 
further indication what happend. The last timestamp in the client log is:
1272565368.228628
and the server says it finished the processing of the request at:
1272565281.379471
So the client log has been recorded long enough to contain the 
"Completed RPC" message for this RPC if it arrived ever ...


Regards, Michael


>> In the first case I don't have a chance to figure out when the server
>> has finished working on the RPC and is going to send out the result,
>> correct?
>>
>> ../debug_log/rpc_client.debug:00000100:00100000:1.0:1272565278.263430:0:3479:0:(client.c:1532:ptlrpc_check_set())
>> Completed RPC pname:cluuid:pid:xid:nid:opc
>> ptlrpcd-brw:3f7723dc-06a3-0b5e-9e96-f5b86e8d1b1c:3479:1334380768266050:10.8.0.126 at tcp:4
>>
>> ../debug_log/rpc_oss.debug:00000100:00100000:1.0:1272565278.265039:0:7385:0:(service.c:1672:ptlrpc_server_handle_request())
>> Handled RPC pname:cluuid+ref:pid:xid:nid:opc
>> ll_ost_io_24:3f7723dc-06a3-0b5e-9e96-f5b86e8d1b1c+27:3479:x1334380768266050:12345-10.8.0.104 at tcp:4
>> Request procesed in 238391us (239698us total) trans 266 rc 0/0
>>
>>
>> For this RPC I see in the oss log:
>>
>> 1272565278.025337: incoming x1334380768266050
>> 1272565278.026649: Handling RPC x1334380768266050
>> 1272565278.265039: Handled RPC x1334380768266050:Request procesed in
>> 238391us (239698us total)
>>
>> So, by doing the math I get: 265039-026649=238390 (just 1 off target,
>> might be wathever) and 265039-025337=239702 (4 off target). So I guess
>> the times reported by the last message are calculated by using a
>> different data source than the log time stamps?
>
> The log timestamps are generated at the time the message is processed by the log system.  The processing times are themselves generated by accessing the clock and computing the processing before the message is printed.
>
>          do_gettimeofday(&work_start);
>          :
>          {handle RPC}
>          :
>          do_gettimeofday(&work_end);
>          timediff = cfs_timeval_sub(&work_end,&work_start, NULL);
>          CDEBUG(D_RPCTRACE, "request x"LPU64" opc %u from %s processed in "
>                 "%ldus (%ldus total) trans "LPU64" rc %d/%d\n",
>
>
>
>> Am 29.04.2010 20:27, schrieb di.wang:
>>> Hello, Michael
>>>
>>> Here is a small debug log example you can use for your development.
>>>
>>> Thanks
>>> WangDi
>>> Michael Kluge wrote:
>>>> Hi WangDi,
>>>>
>>>> OK, thanks for input. I'll go ahead and try to write a converter.
>>>> Could you please collect a set test traces that belong together from a
>>>> couple servers and clients and put them somewhere so that I can
>>>> download them?
>>>>
>>>>
>>>> Thanks, Michael
>>>>
>>>> Am 29.04.2010 03:25, schrieb di.wang:
>>>>> Hello, Michael
>>>>>
>>>>> There is a logfile parser script in the attachment, which was written by
>>>>> Eric.
>>>>>
>>>>> This script is very simple, but it should help you understand how we
>>>>> retrieve time step information from lustre debug log. On the server
>>>>> side, if you enable rpc_trace log, whenever the request arrive/start
>>>>> being processed/end of processing, there will be corresponding records
>>>>> being written into the debug log. Basically, you can get all the time
>>>>> step information from these records (actually only two of these records
>>>>> would be enough).
>>>>>
>>>>>
>>>>> a.
>>>>> 00000100:00100000:0.0:1272313858.472660:0:31581:0:(service.c:1625:ptlrpc_server_handle_request())
>>>>>
>>>>> Handling RPC pname:cluuid+ref:pid:xid:nid:opc
>>>>> ll_mgs_00:7d4fb15c-1b1c-295f-e466-ea7d77089b52+10:4055:x1334115493386242:12345-0 at lo:400
>>>>>
>>>>>
>>>>>
>>>>> This record means the req will being handled, so you can get the start
>>>>> time stamp(1272313858.472660) operation type (opc: 400, ping), xid
>>>>> (1334115493386242), client nid(12345-0 at lo) and so on.
>>>>>
>>>>> b.
>>>>> 00000100:00100000:0.0:1272313858.472687:0:31581:0:(service.c:1672:ptlrpc_server_handle_request())
>>>>>
>>>>> Handled RPC pname:cluuid+ref:pid:xid:nid:opc
>>>>> ll_mgs_00:7d4fb15c-1b1c-295f-e466-ea7d77089b52+9:4055:x1334115493386242:12345-0 at lo:400
>>>>>
>>>>> Request procesed in 45us (77us total) trans 0 rc 0/0
>>>>>
>>>>> This record means the req is already being handled, so you can get the
>>>>> end time stamp(1272313858.472687), operation type (opc: 400, ping), xid
>>>>> (1334115493386242), client nid(12345-0 at lo) and so no.
>>>>>
>>>>> Note: (77us total) means how long it takes from the request arriving to
>>>>> the end of processing. so you can also get the request arriving time
>>>>> stamp here by (1272313858 - 77 = 1272312781).
>>>>>
>>>>>
>>>>> So with these information you can draw the graph Eric mentioned in his
>>>>> email. If you have any questions, please let me know.
>>>>>
>>>>> Thanks
>>>>> WangDi
>>>>>
>>>>>
>>>>> Eric Barton wrote:
>>>>>>
>>>>>> ------------------------------------------------------------------------
>>>>>>
>>>>>>
>>>>>> Subject:
>>>>>> RE: Visualising Lustre RPCs
>>>>>> To:
>>>>>> "Michael Kluge"<Michael.Kluge@tu-dresden.de>, "wangdi"
>>>>>> <Tom.Wang@Sun.COM>
>>>>>>
>>>>>> To:
>>>>>> "Michael Kluge"<Michael.Kluge@tu-dresden.de>, "wangdi"
>>>>>> <Tom.Wang@Sun.COM>
>>>>>> CC:
>>>>>> <lustre-devel@lists.lustre.org>, "Galen M. Shipman"<gshipman@ornl.gov>
>>>>>>
>>>>>>
>>>>>> Michael,
>>>>>>
>>>>>> The previous Lustre RPC visualisation effort I mentioned at the LUG
>>>>>> used the
>>>>>>
>>>>>> Lustre debug log entries of type D_RPCTRACE. We disabled all but these
>>>>>>
>>>>>> log messages and then used the Lustre debug daemon to collect them
>>>>>> while
>>>>>>
>>>>>> we ran I/O tests. We then ran a simple logfile parser which used just
>>>>>> the log entries
>>>>>>
>>>>>> for request arrival?
>>>>>>
>>>>>> , start of processing and end of processing to graph request
>>>>>>
>>>>>> queue depth (arrival->end) and the number of requests being serviced
>>>>>> by type
>>>>>>
>>>>>> over time ? e.g?
>>>>>>
>>>>>> read3d
>>>>>>
>>>>>> ?which shows request queue depth (vertical) over time (axis labelled
>>>>>> 20-25) by
>>>>>>
>>>>>> server (axis labelled 0-80).
>>>>>>
>>>>>> *From:* Michael Kluge [mailto:Michael.Kluge at tu-dresden.de]
>>>>>> *Sent:* 17 April 2010 6:26 AM
>>>>>> *To:* Galen M. Shipman; Eric Barton
>>>>>> *Subject:* Visualising Lustre RPCs
>>>>>>
>>>>>> Hi Galen, Eric,
>>>>>>
>>>>>> in order to get this little project started, I think what I need at
>>>>>> first to
>>>>>>
>>>>>> write a prototype for a converter are the following things:
>>>>>>
>>>>>> A set of test traces collected on maybe a handful of clients and some
>>>>>> servers
>>>>>>
>>>>>> is probably a good point to start with. It would be even better if we
>>>>>> know
>>>>>>
>>>>>> what is in this traces so that we have an expectation what kind of
>>>>>> things
>>>>>>
>>>>>> we want to see on the Vampir displays. This little program that Eric
>>>>>> mentioned
>>>>>>
>>>>>> that can read the trace file would be very helpful as well. And as the
>>>>>> last
>>>>>>
>>>>>> idea I have right now, a technical contact. I might come up with a
>>>>>> couple
>>>>>>
>>>>>> of question after I have taken the first look onto the original trace
>>>>>> data
>>>>>>
>>>>>> and before I start writing code.
>>>>>>
>>>>>> Regards, Michael
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Michael Kluge, M.Sc.
>>>>>>
>>>>>> Technische Universit?t Dresden
>>>>>> Center for Information Services and
>>>>>> High Performance Computing (ZIH)
>>>>>> D-01062 Dresden
>>>>>> Germany
>>>>>>
>>>>>> Contact:
>>>>>> Willersbau, Room WIL A 208
>>>>>> Phone: (+49) 351 463-34217
>>>>>> Fax: (+49) 351 463-37773
>>>>>> e-mail: michael.kluge at tu-dresden.de
>>>>>> <mailto:michael.kluge@tu-dresden.de>
>>>>>> WWW: http://www.tu-dresden.de/zih

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Lustre-devel] Lustre RPC visualization
  2010-05-03 18:10           ` Michael Kluge
@ 2010-05-03 18:57             ` Robert Read
  2010-05-03 18:58             ` di.wang
  1 sibling, 0 replies; 38+ messages in thread
From: Robert Read @ 2010-05-03 18:57 UTC (permalink / raw)
  To: lustre-devel

Hi Michael, 

On May 3, 2010, at 11:10 , Michael Kluge wrote:

> Hi Andreas,
> 
> Am 03.05.2010 15:20, schrieb Andreas Dilger:
>> On 2010-05-03, at 04:41, Michael Kluge wrote:
>>> I have a small problem understanding the logical order of the events.
>>> Here is one example from the logs where the client reports an RPC as
>>> completed before the server says that it has finished the RPC handling.
>>> The difference is about 1.5 ms.
>>> 
>>> Is that due to the fact that the server waits until the client has
>>> ack'ed the result of the RPC? Or are the clocks between the servers not
>>> well synchronized? Or are the timestamps in the logfile sometimes not
>>> correct (log message could not be flushed or whatever)?
>> 
>> The timestamps on the log messages are only as accurate as the clocks on
>> the nodes.  Lustre can run without synchronized clocks, using the client
>> clock as the basis for the timestamps, but Lustre makes no effort to
>> synchronize the client clocks.  For that you need to use NTP (easiest)
>> or adjust the timestamps in the logs based on messages like this.
> 
> OK, so logically the "Completed RPC" on the client side is supposed to 
> show up after the server has written his "Handled RPC" to its log (if 
> clocks are snyc'ed perfectly).
> 
> One more question: RPC 1334380768266400 (in the log WangDi sent me) has 
> on the client side only a "Sending RPC" message, thus missing the 
> "Completed RPC". The server has all three (received,start work, done 
> work). Has this RPC vanished on the way back to the client? There is no 
> further indication what happend. The last timestamp in the client log is:
> 1272565368.228628
> and the server says it finished the processing of the request at:
> 1272565281.379471
> So the client log has been recorded long enough to contain the 
> "Completed RPC" message for this RPC if it arrived ever ...

If the reply was never received then you should expect to see a timeout for that rpc in the log.

It's also possible, though I'm not sure how likely in this case, that the trace buffer on the cpu that processed the completion has overflowed.  You can check if this is possible by finding the earliest message for each cpu and see if any of them of them is later than the expected time for this message.   

cheers,
robert

> 
> 
> Regards, Michael
> 
> 
>>> In the first case I don't have a chance to figure out when the server
>>> has finished working on the RPC and is going to send out the result,
>>> correct?
>>> 
>>> ../debug_log/rpc_client.debug:00000100:00100000:1.0:1272565278.263430:0:3479:0:(client.c:1532:ptlrpc_check_set())
>>> Completed RPC pname:cluuid:pid:xid:nid:opc
>>> ptlrpcd-brw:3f7723dc-06a3-0b5e-9e96-f5b86e8d1b1c:3479:1334380768266050:10.8.0.126 at tcp:4
>>> 
>>> ../debug_log/rpc_oss.debug:00000100:00100000:1.0:1272565278.265039:0:7385:0:(service.c:1672:ptlrpc_server_handle_request())
>>> Handled RPC pname:cluuid+ref:pid:xid:nid:opc
>>> ll_ost_io_24:3f7723dc-06a3-0b5e-9e96-f5b86e8d1b1c+27:3479:x1334380768266050:12345-10.8.0.104 at tcp:4
>>> Request procesed in 238391us (239698us total) trans 266 rc 0/0
>>> 
>>> 
>>> For this RPC I see in the oss log:
>>> 
>>> 1272565278.025337: incoming x1334380768266050
>>> 1272565278.026649: Handling RPC x1334380768266050
>>> 1272565278.265039: Handled RPC x1334380768266050:Request procesed in
>>> 238391us (239698us total)
>>> 
>>> So, by doing the math I get: 265039-026649=238390 (just 1 off target,
>>> might be wathever) and 265039-025337=239702 (4 off target). So I guess
>>> the times reported by the last message are calculated by using a
>>> different data source than the log time stamps?
>> 
>> The log timestamps are generated at the time the message is processed by the log system.  The processing times are themselves generated by accessing the clock and computing the processing before the message is printed.
>> 
>>         do_gettimeofday(&work_start);
>>         :
>>         {handle RPC}
>>         :
>>         do_gettimeofday(&work_end);
>>         timediff = cfs_timeval_sub(&work_end,&work_start, NULL);
>>         CDEBUG(D_RPCTRACE, "request x"LPU64" opc %u from %s processed in "
>>                "%ldus (%ldus total) trans "LPU64" rc %d/%d\n",
>> 
>> 
>> 
>>> Am 29.04.2010 20:27, schrieb di.wang:
>>>> Hello, Michael
>>>> 
>>>> Here is a small debug log example you can use for your development.
>>>> 
>>>> Thanks
>>>> WangDi
>>>> Michael Kluge wrote:
>>>>> Hi WangDi,
>>>>> 
>>>>> OK, thanks for input. I'll go ahead and try to write a converter.
>>>>> Could you please collect a set test traces that belong together from a
>>>>> couple servers and clients and put them somewhere so that I can
>>>>> download them?
>>>>> 
>>>>> 
>>>>> Thanks, Michael
>>>>> 
>>>>> Am 29.04.2010 03:25, schrieb di.wang:
>>>>>> Hello, Michael
>>>>>> 
>>>>>> There is a logfile parser script in the attachment, which was written by
>>>>>> Eric.
>>>>>> 
>>>>>> This script is very simple, but it should help you understand how we
>>>>>> retrieve time step information from lustre debug log. On the server
>>>>>> side, if you enable rpc_trace log, whenever the request arrive/start
>>>>>> being processed/end of processing, there will be corresponding records
>>>>>> being written into the debug log. Basically, you can get all the time
>>>>>> step information from these records (actually only two of these records
>>>>>> would be enough).
>>>>>> 
>>>>>> 
>>>>>> a.
>>>>>> 00000100:00100000:0.0:1272313858.472660:0:31581:0:(service.c:1625:ptlrpc_server_handle_request())
>>>>>> 
>>>>>> Handling RPC pname:cluuid+ref:pid:xid:nid:opc
>>>>>> ll_mgs_00:7d4fb15c-1b1c-295f-e466-ea7d77089b52+10:4055:x1334115493386242:12345-0 at lo:400
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> This record means the req will being handled, so you can get the start
>>>>>> time stamp(1272313858.472660) operation type (opc: 400, ping), xid
>>>>>> (1334115493386242), client nid(12345-0 at lo) and so on.
>>>>>> 
>>>>>> b.
>>>>>> 00000100:00100000:0.0:1272313858.472687:0:31581:0:(service.c:1672:ptlrpc_server_handle_request())
>>>>>> 
>>>>>> Handled RPC pname:cluuid+ref:pid:xid:nid:opc
>>>>>> ll_mgs_00:7d4fb15c-1b1c-295f-e466-ea7d77089b52+9:4055:x1334115493386242:12345-0 at lo:400
>>>>>> 
>>>>>> Request procesed in 45us (77us total) trans 0 rc 0/0
>>>>>> 
>>>>>> This record means the req is already being handled, so you can get the
>>>>>> end time stamp(1272313858.472687), operation type (opc: 400, ping), xid
>>>>>> (1334115493386242), client nid(12345-0 at lo) and so no.
>>>>>> 
>>>>>> Note: (77us total) means how long it takes from the request arriving to
>>>>>> the end of processing. so you can also get the request arriving time
>>>>>> stamp here by (1272313858 - 77 = 1272312781).
>>>>>> 
>>>>>> 
>>>>>> So with these information you can draw the graph Eric mentioned in his
>>>>>> email. If you have any questions, please let me know.
>>>>>> 
>>>>>> Thanks
>>>>>> WangDi
>>>>>> 
>>>>>> 
>>>>>> Eric Barton wrote:
>>>>>>> 
>>>>>>> ------------------------------------------------------------------------
>>>>>>> 
>>>>>>> 
>>>>>>> Subject:
>>>>>>> RE: Visualising Lustre RPCs
>>>>>>> To:
>>>>>>> "Michael Kluge"<Michael.Kluge@tu-dresden.de>, "wangdi"
>>>>>>> <Tom.Wang@Sun.COM>
>>>>>>> 
>>>>>>> To:
>>>>>>> "Michael Kluge"<Michael.Kluge@tu-dresden.de>, "wangdi"
>>>>>>> <Tom.Wang@Sun.COM>
>>>>>>> CC:
>>>>>>> <lustre-devel@lists.lustre.org>, "Galen M. Shipman"<gshipman@ornl.gov>
>>>>>>> 
>>>>>>> 
>>>>>>> Michael,
>>>>>>> 
>>>>>>> The previous Lustre RPC visualisation effort I mentioned at the LUG
>>>>>>> used the
>>>>>>> 
>>>>>>> Lustre debug log entries of type D_RPCTRACE. We disabled all but these
>>>>>>> 
>>>>>>> log messages and then used the Lustre debug daemon to collect them
>>>>>>> while
>>>>>>> 
>>>>>>> we ran I/O tests. We then ran a simple logfile parser which used just
>>>>>>> the log entries
>>>>>>> 
>>>>>>> for request arrival?
>>>>>>> 
>>>>>>> , start of processing and end of processing to graph request
>>>>>>> 
>>>>>>> queue depth (arrival->end) and the number of requests being serviced
>>>>>>> by type
>>>>>>> 
>>>>>>> over time ? e.g?
>>>>>>> 
>>>>>>> read3d
>>>>>>> 
>>>>>>> ?which shows request queue depth (vertical) over time (axis labelled
>>>>>>> 20-25) by
>>>>>>> 
>>>>>>> server (axis labelled 0-80).
>>>>>>> 
>>>>>>> *From:* Michael Kluge [mailto:Michael.Kluge at tu-dresden.de]
>>>>>>> *Sent:* 17 April 2010 6:26 AM
>>>>>>> *To:* Galen M. Shipman; Eric Barton
>>>>>>> *Subject:* Visualising Lustre RPCs
>>>>>>> 
>>>>>>> Hi Galen, Eric,
>>>>>>> 
>>>>>>> in order to get this little project started, I think what I need at
>>>>>>> first to
>>>>>>> 
>>>>>>> write a prototype for a converter are the following things:
>>>>>>> 
>>>>>>> A set of test traces collected on maybe a handful of clients and some
>>>>>>> servers
>>>>>>> 
>>>>>>> is probably a good point to start with. It would be even better if we
>>>>>>> know
>>>>>>> 
>>>>>>> what is in this traces so that we have an expectation what kind of
>>>>>>> things
>>>>>>> 
>>>>>>> we want to see on the Vampir displays. This little program that Eric
>>>>>>> mentioned
>>>>>>> 
>>>>>>> that can read the trace file would be very helpful as well. And as the
>>>>>>> last
>>>>>>> 
>>>>>>> idea I have right now, a technical contact. I might come up with a
>>>>>>> couple
>>>>>>> 
>>>>>>> of question after I have taken the first look onto the original trace
>>>>>>> data
>>>>>>> 
>>>>>>> and before I start writing code.
>>>>>>> 
>>>>>>> Regards, Michael
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> 
>>>>>>> Michael Kluge, M.Sc.
>>>>>>> 
>>>>>>> Technische Universit?t Dresden
>>>>>>> Center for Information Services and
>>>>>>> High Performance Computing (ZIH)
>>>>>>> D-01062 Dresden
>>>>>>> Germany
>>>>>>> 
>>>>>>> Contact:
>>>>>>> Willersbau, Room WIL A 208
>>>>>>> Phone: (+49) 351 463-34217
>>>>>>> Fax: (+49) 351 463-37773
>>>>>>> e-mail: michael.kluge at tu-dresden.de
>>>>>>> <mailto:michael.kluge@tu-dresden.de>
>>>>>>> WWW: http://www.tu-dresden.de/zih
> _______________________________________________
> Lustre-devel mailing list
> Lustre-devel at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-devel

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Lustre-devel] Lustre RPC visualization
  2010-05-03 18:10           ` Michael Kluge
  2010-05-03 18:57             ` Robert Read
@ 2010-05-03 18:58             ` di.wang
  2010-05-03 19:32               ` Michael Kluge
  1 sibling, 1 reply; 38+ messages in thread
From: di.wang @ 2010-05-03 18:58 UTC (permalink / raw)
  To: lustre-devel

Michael Kluge wrote:
> Hi Andreas,
>
> Am 03.05.2010 15:20, schrieb Andreas Dilger:
>> On 2010-05-03, at 04:41, Michael Kluge wrote:
>>> I have a small problem understanding the logical order of the events.
>>> Here is one example from the logs where the client reports an RPC as
>>> completed before the server says that it has finished the RPC handling.
>>> The difference is about 1.5 ms.
>>>
>>> Is that due to the fact that the server waits until the client has
>>> ack'ed the result of the RPC? Or are the clocks between the servers not
>>> well synchronized? Or are the timestamps in the logfile sometimes not
>>> correct (log message could not be flushed or whatever)?
>>
>> The timestamps on the log messages are only as accurate as the clocks on
> > the nodes.  Lustre can run without synchronized clocks, using the 
> client
> > clock as the basis for the timestamps, but Lustre makes no effort to
> > synchronize the client clocks.  For that you need to use NTP (easiest)
>>  or adjust the timestamps in the logs based on messages like this.
>
> OK, so logically the "Completed RPC" on the client side is supposed to 
> show up after the server has written his "Handled RPC" to its log (if 
> clocks are snyc'ed perfectly).
Yes.
>
> One more question: RPC 1334380768266400 (in the log WangDi sent me) 
> has on the client side only a "Sending RPC" message, thus missing the 
> "Completed RPC". The server has all three (received,start work, done 
> work). Has this RPC vanished on the way back to the client? There is 
> no further indication what happend. The last timestamp in the client 
> log is:
> 1272565368.228628
> and the server says it finished the processing of the request at:
> 1272565281.379471
> So the client log has been recorded long enough to contain the 
> "Completed RPC" message for this RPC if it arrived ever ...
Logically, yes. But in some cases, some debug logs might be abandoned 
for some reasons(actually, it happens not rarely), and probably you need 
maintain an average time from server "Handled RPC" to client "Completed 
RPC", then you just guess the client "Completed RPC" time in this case.

Thanks
WangDi

>
>
> Regards, Michael
>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Lustre-devel] Lustre RPC visualization
  2010-05-03 18:58             ` di.wang
@ 2010-05-03 19:32               ` Michael Kluge
  2010-05-03 19:52                 ` di.wang
  0 siblings, 1 reply; 38+ messages in thread
From: Michael Kluge @ 2010-05-03 19:32 UTC (permalink / raw)
  To: lustre-devel

Hi WangDi,

>> OK, so logically the "Completed RPC" on the client side is supposed to
>> show up after the server has written his "Handled RPC" to its log (if
>> clocks are snyc'ed perfectly).
> Yes.

Thanks.

>> One more question: RPC 1334380768266400 (in the log WangDi sent me)
>> has on the client side only a "Sending RPC" message, thus missing the
>> "Completed RPC". The server has all three (received,start work, done
>> work). Has this RPC vanished on the way back to the client? There is
>> no further indication what happend. The last timestamp in the client
>> log is:
>> 1272565368.228628
>> and the server says it finished the processing of the request at:
>> 1272565281.379471
>> So the client log has been recorded long enough to contain the
>> "Completed RPC" message for this RPC if it arrived ever ...
> Logically, yes. But in some cases, some debug logs might be abandoned
> for some reasons(actually, it happens not rarely), and probably you need
> maintain an average time from server "Handled RPC" to client "Completed
> RPC", then you just guess the client "Completed RPC" time in this case.

Oh my gosh ;) I don't want to start speculations about the helpfulness 
of incomplete debug logs. Anyway, what can get lost? Any kind of message 
on the servers and clients?  I think I'd like to know what cases have to 
be handled while I try to track individual RPC's on their way.


Regards, Michael

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Lustre-devel] Lustre RPC visualization
  2010-05-03 19:32               ` Michael Kluge
@ 2010-05-03 19:52                 ` di.wang
  2010-05-03 20:04                   ` Michael Kluge
  2010-05-16  9:29                   ` Michael Kluge
  0 siblings, 2 replies; 38+ messages in thread
From: di.wang @ 2010-05-03 19:52 UTC (permalink / raw)
  To: lustre-devel

Michael Kluge wrote:
>>> One more question: RPC 1334380768266400 (in the log WangDi sent me)
>>> has on the client side only a "Sending RPC" message, thus missing the
>>> "Completed RPC". The server has all three (received,start work, done
>>> work). Has this RPC vanished on the way back to the client? There is
>>> no further indication what happend. The last timestamp in the client
>>> log is:
>>> 1272565368.228628
>>> and the server says it finished the processing of the request at:
>>> 1272565281.379471
>>> So the client log has been recorded long enough to contain the
>>> "Completed RPC" message for this RPC if it arrived ever ...
>>>       
>> Logically, yes. But in some cases, some debug logs might be abandoned
>> for some reasons(actually, it happens not rarely), and probably you need
>> maintain an average time from server "Handled RPC" to client "Completed
>> RPC", then you just guess the client "Completed RPC" time in this case.
>>     
>
> Oh my gosh ;) I don't want to start speculations about the helpfulness 
> of incomplete debug logs. Anyway, what can get lost? Any kind of message 
> on the servers and clients?  I think I'd like to know what cases have to 
> be handled while I try to track individual RPC's on their way.
>   
Any records can get lost here. Unfortunately, there are not any messages 
indicate the missing happened. :(
(Usually, I would check the time stamp in the log, i.e. no records for a 
"long" time, for example several seconds, but this is not the accurate way).

I guess you can just ignore these uncompleted records in your first 
step?  Let's see how these incomplete log will
impact the profiling result, then we will decide how to deal with this?

Thanks
Wangdi
>
> Regards, Michael
> _______________________________________________
> Lustre-devel mailing list
> Lustre-devel at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-devel
>   

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Lustre-devel] Lustre RPC visualization
  2010-05-03 19:52                 ` di.wang
@ 2010-05-03 20:04                   ` Michael Kluge
  2010-05-16  9:29                   ` Michael Kluge
  1 sibling, 0 replies; 38+ messages in thread
From: Michael Kluge @ 2010-05-03 20:04 UTC (permalink / raw)
  To: lustre-devel

OK, sounds reasonable. I can put the number of ignored messages in an 
additional counter. That should make the generated trace file more 
descriptive.

Regards, Michael

Am 03.05.2010 21:52, schrieb di.wang:
>> Anyway, what can get lost? Any kind of
>> message on the servers and clients? I think I'd like to know what
>> cases have to be handled while I try to track individual RPC's on
>> their way.
> Any records can get lost here. Unfortunately, there are not any messages
> indicate the missing happened. :(
> (Usually, I would check the time stamp in the log, i.e. no records for a
> "long" time, for example several seconds, but this is not the accurate
> way).
>
> I guess you can just ignore these uncompleted records in your first
> step? Let's see how these incomplete log will
> impact the profiling result, then we will decide how to deal with this?
>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Lustre-devel] Lustre RPC visualization
  2010-05-03 19:52                 ` di.wang
  2010-05-03 20:04                   ` Michael Kluge
@ 2010-05-16  9:29                   ` Michael Kluge
  2010-05-16 13:12                     ` Eric Barton
                                       ` (3 more replies)
  1 sibling, 4 replies; 38+ messages in thread
From: Michael Kluge @ 2010-05-16  9:29 UTC (permalink / raw)
  To: lustre-devel

Hi WangDi,

the first version works. Screenshot is attached. I have a couple of 
counter realized: RPC's in flight and RPC's completed in total on the 
client, RPC's enqueued, RPC's in processing and RPC'c completed in total 
on the server. All these counter can be broken down by the type of RPC 
(op code). The picture has not yet the lines that show each single RPC, 
I still have to do counter like "avg. time to complete an RPC over the 
last second" and there are some more TODO's. Like the timer 
synchronization. (In the screenshot the first and the last counter show 
total values while the one in the middle shows a rate.)

What I like to have is a complete set of traces from a small cluster 
(<100 nodes) including the servers. Would that be possible?

Is one of you in Hamburg May, 31-June, 3 for ISC'2010? I'll be there and 
like to talk about what would be useful for the next steps.


Regards, Michael

Am 03.05.2010 21:52, schrieb di.wang:
> Michael Kluge wrote:
>>>> One more question: RPC 1334380768266400 (in the log WangDi sent me)
>>>> has on the client side only a "Sending RPC" message, thus missing the
>>>> "Completed RPC". The server has all three (received,start work, done
>>>> work). Has this RPC vanished on the way back to the client? There is
>>>> no further indication what happend. The last timestamp in the client
>>>> log is:
>>>> 1272565368.228628
>>>> and the server says it finished the processing of the request at:
>>>> 1272565281.379471
>>>> So the client log has been recorded long enough to contain the
>>>> "Completed RPC" message for this RPC if it arrived ever ...
>>> Logically, yes. But in some cases, some debug logs might be abandoned
>>> for some reasons(actually, it happens not rarely), and probably you need
>>> maintain an average time from server "Handled RPC" to client "Completed
>>> RPC", then you just guess the client "Completed RPC" time in this case.
>>
>> Oh my gosh ;) I don't want to start speculations about the helpfulness
>> of incomplete debug logs. Anyway, what can get lost? Any kind of
>> message on the servers and clients? I think I'd like to know what
>> cases have to be handled while I try to track individual RPC's on
>> their way.
> Any records can get lost here. Unfortunately, there are not any messages
> indicate the missing happened. :(
> (Usually, I would check the time stamp in the log, i.e. no records for a
> "long" time, for example several seconds, but this is not the accurate
> way).
>
> I guess you can just ignore these uncompleted records in your first
> step? Let's see how these incomplete log will
> impact the profiling result, then we will decide how to deal with this?
>
> Thanks
> Wangdi
>>
>> Regards, Michael
>> _______________________________________________
>> Lustre-devel mailing list
>> Lustre-devel at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>
>


-- 
Michael Kluge, M.Sc.

Technische Universit?t Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room WIL A 208
Phone:  (+49) 351 463-34217
Fax:    (+49) 351 463-37773
e-mail: michael.kluge at tu-dresden.de
WWW:    http://www.tu-dresden.de/zih
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Vampir - [Trace View - lustre_rpcs.otf].png
Type: image/png
Size: 97181 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20100516/f4cd3a83/attachment.png>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Lustre-devel] Lustre RPC visualization
  2010-05-16  9:29                   ` Michael Kluge
@ 2010-05-16 13:12                     ` Eric Barton
  2010-05-17  4:52                       ` Michael Kluge
  2010-05-17  3:24                     ` Andrew Uselton
                                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 38+ messages in thread
From: Eric Barton @ 2010-05-16 13:12 UTC (permalink / raw)
  To: lustre-devel

Excellent :)

How do you think measurements taken from 1000 servers with 100,000
clients can be visualised?  We've used heat maps to visualise
10s-100s of concurrent measurements (y) over time (x) but I wonder
if that will scale.  Does vampire support heat maps?

    Cheers,
              Eric

> -----Original Message-----
> From: Michael Kluge [mailto:Michael.Kluge at tu-dresden.de]
> Sent: 16 May 2010 10:30 AM
> To: di.wang
> Cc: Eric Barton; Andreas Dilger; Robert Read; Galen M. Shipman; lustre-devel
> Subject: Re: [Lustre-devel] Lustre RPC visualization
> 
> Hi WangDi,
> 
> the first version works. Screenshot is attached. I have a couple of
> counter realized: RPC's in flight and RPC's completed in total on the
> client, RPC's enqueued, RPC's in processing and RPC'c completed in total
> on the server. All these counter can be broken down by the type of RPC
> (op code). The picture has not yet the lines that show each single RPC,
> I still have to do counter like "avg. time to complete an RPC over the
> last second" and there are some more TODO's. Like the timer
> synchronization. (In the screenshot the first and the last counter show
> total values while the one in the middle shows a rate.)
> 
> What I like to have is a complete set of traces from a small cluster
> (<100 nodes) including the servers. Would that be possible?
> 
> Is one of you in Hamburg May, 31-June, 3 for ISC'2010? I'll be there and
> like to talk about what would be useful for the next steps.
> 
> 
> Regards, Michael
> 
> Am 03.05.2010 21:52, schrieb di.wang:
> > Michael Kluge wrote:
> >>>> One more question: RPC 1334380768266400 (in the log WangDi sent me)
> >>>> has on the client side only a "Sending RPC" message, thus missing the
> >>>> "Completed RPC". The server has all three (received,start work, done
> >>>> work). Has this RPC vanished on the way back to the client? There is
> >>>> no further indication what happend. The last timestamp in the client
> >>>> log is:
> >>>> 1272565368.228628
> >>>> and the server says it finished the processing of the request at:
> >>>> 1272565281.379471
> >>>> So the client log has been recorded long enough to contain the
> >>>> "Completed RPC" message for this RPC if it arrived ever ...
> >>> Logically, yes. But in some cases, some debug logs might be abandoned
> >>> for some reasons(actually, it happens not rarely), and probably you need
> >>> maintain an average time from server "Handled RPC" to client "Completed
> >>> RPC", then you just guess the client "Completed RPC" time in this case.
> >>
> >> Oh my gosh ;) I don't want to start speculations about the helpfulness
> >> of incomplete debug logs. Anyway, what can get lost? Any kind of
> >> message on the servers and clients? I think I'd like to know what
> >> cases have to be handled while I try to track individual RPC's on
> >> their way.
> > Any records can get lost here. Unfortunately, there are not any messages
> > indicate the missing happened. :(
> > (Usually, I would check the time stamp in the log, i.e. no records for a
> > "long" time, for example several seconds, but this is not the accurate
> > way).
> >
> > I guess you can just ignore these uncompleted records in your first
> > step? Let's see how these incomplete log will
> > impact the profiling result, then we will decide how to deal with this?
> >
> > Thanks
> > Wangdi
> >>
> >> Regards, Michael
> >> _______________________________________________
> >> Lustre-devel mailing list
> >> Lustre-devel at lists.lustre.org
> >> http://lists.lustre.org/mailman/listinfo/lustre-devel
> >
> >
> 
> 
> --
> Michael Kluge, M.Sc.
> 
> Technische Universit?t Dresden
> Center for Information Services and
> High Performance Computing (ZIH)
> D-01062 Dresden
> Germany
> 
> Contact:
> Willersbau, Room WIL A 208
> Phone:  (+49) 351 463-34217
> Fax:    (+49) 351 463-37773
> e-mail: michael.kluge at tu-dresden.de
> WWW:    http://www.tu-dresden.de/zih

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Lustre-devel] Lustre RPC visualization
  2010-05-16  9:29                   ` Michael Kluge
  2010-05-16 13:12                     ` Eric Barton
@ 2010-05-17  3:24                     ` Andrew Uselton
  2010-05-17  5:53                       ` Michael Kluge
       [not found]                     ` <009101caf4f9$67e1dd50$37a597f0$%barton@oracle.com>
  2010-05-25 12:03                     ` Michael Kluge
  3 siblings, 1 reply; 38+ messages in thread
From: Andrew Uselton @ 2010-05-17  3:24 UTC (permalink / raw)
  To: lustre-devel

I think this work is very interesting.  Will anyone be at CUG 2010 next week
to discuss?
Cheers,
Andrew


2010/5/16 Michael Kluge <Michael.Kluge@tu-dresden.de>

> Hi WangDi,
>
> the first version works. Screenshot is attached. I have a couple of counter
> realized: RPC's in flight and RPC's completed in total on the client, RPC's
> enqueued, RPC's in processing and RPC'c completed in total on the server.
> All these counter can be broken down by the type of RPC (op code). The
> picture has not yet the lines that show each single RPC, I still have to do
> counter like "avg. time to complete an RPC over the last second" and there
> are some more TODO's. Like the timer synchronization. (In the screenshot the
> first and the last counter show total values while the one in the middle
> shows a rate.)
>
> What I like to have is a complete set of traces from a small cluster (<100
> nodes) including the servers. Would that be possible?
>
> Is one of you in Hamburg May, 31-June, 3 for ISC'2010? I'll be there and
> like to talk about what would be useful for the next steps.
>
>
>
> Regards, Michael
>
> Am 03.05.2010 21:52, schrieb di.wang:
>
>> Michael Kluge wrote:
>>
>>  One more question: RPC 1334380768266400 (in the log WangDi sent me)
>>>>> has on the client side only a "Sending RPC" message, thus missing the
>>>>> "Completed RPC". The server has all three (received,start work, done
>>>>> work). Has this RPC vanished on the way back to the client? There is
>>>>> no further indication what happend. The last timestamp in the client
>>>>> log is:
>>>>> 1272565368.228628
>>>>> and the server says it finished the processing of the request at:
>>>>> 1272565281.379471
>>>>> So the client log has been recorded long enough to contain the
>>>>> "Completed RPC" message for this RPC if it arrived ever ...
>>>>>
>>>> Logically, yes. But in some cases, some debug logs might be abandoned
>>>> for some reasons(actually, it happens not rarely), and probably you need
>>>> maintain an average time from server "Handled RPC" to client "Completed
>>>> RPC", then you just guess the client "Completed RPC" time in this case.
>>>>
>>>
>>> Oh my gosh ;) I don't want to start speculations about the helpfulness
>>> of incomplete debug logs. Anyway, what can get lost? Any kind of
>>> message on the servers and clients? I think I'd like to know what
>>> cases have to be handled while I try to track individual RPC's on
>>> their way.
>>>
>> Any records can get lost here. Unfortunately, there are not any messages
>> indicate the missing happened. :(
>> (Usually, I would check the time stamp in the log, i.e. no records for a
>> "long" time, for example several seconds, but this is not the accurate
>> way).
>>
>> I guess you can just ignore these uncompleted records in your first
>> step? Let's see how these incomplete log will
>> impact the profiling result, then we will decide how to deal with this?
>>
>> Thanks
>> Wangdi
>>
>>>
>>> Regards, Michael
>>> _______________________________________________
>>> Lustre-devel mailing list
>>> Lustre-devel at lists.lustre.org
>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>>>
>>
>>
>>
>
> --
> Michael Kluge, M.Sc.
>
> Technische Universit?t Dresden
> Center for Information Services and
> High Performance Computing (ZIH)
> D-01062 Dresden
> Germany
>
> Contact:
> Willersbau, Room WIL A 208
> Phone:  (+49) 351 463-34217
> Fax:    (+49) 351 463-37773
> e-mail: michael.kluge at tu-dresden.de
> WWW:    http://www.tu-dresden.de/zih
>
> _______________________________________________
> Lustre-devel mailing list
> Lustre-devel at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-devel
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20100516/b1d7074f/attachment.htm>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Lustre-devel] Lustre RPC visualization
       [not found]                     ` <009101caf4f9$67e1dd50$37a597f0$%barton@oracle.com>
@ 2010-05-17  3:39                       ` Shipman, Galen M.
  2010-05-17  5:59                         ` Michael Kluge
  0 siblings, 1 reply; 38+ messages in thread
From: Shipman, Galen M. @ 2010-05-17  3:39 UTC (permalink / raw)
  To: lustre-devel

I would be very interested in this, particularly at scale. We should look into collecting some large scale traces on Jaguar for experimentation of various visualization techniques.

Thanks,

Galen


On May 16, 2010, at 9:12 AM, Eric Barton wrote:

Excellent :)

How do you think measurements taken from 1000 servers with 100,000
clients can be visualised?  We've used heat maps to visualise
10s-100s of concurrent measurements (y) over time (x) but I wonder
if that will scale.  Does vampire support heat maps?

   Cheers,
             Eric

-----Original Message-----
From: Michael Kluge [mailto:Michael.Kluge at tu-dresden.de]
Sent: 16 May 2010 10:30 AM
To: di.wang
Cc: Eric Barton; Andreas Dilger; Robert Read; Galen M. Shipman; lustre-devel
Subject: Re: [Lustre-devel] Lustre RPC visualization

Hi WangDi,

the first version works. Screenshot is attached. I have a couple of
counter realized: RPC's in flight and RPC's completed in total on the
client, RPC's enqueued, RPC's in processing and RPC'c completed in total
on the server. All these counter can be broken down by the type of RPC
(op code). The picture has not yet the lines that show each single RPC,
I still have to do counter like "avg. time to complete an RPC over the
last second" and there are some more TODO's. Like the timer
synchronization. (In the screenshot the first and the last counter show
total values while the one in the middle shows a rate.)

What I like to have is a complete set of traces from a small cluster
(<100 nodes) including the servers. Would that be possible?

Is one of you in Hamburg May, 31-June, 3 for ISC'2010? I'll be there and
like to talk about what would be useful for the next steps.


Regards, Michael

Am 03.05.2010 21:52, schrieb di.wang:
Michael Kluge wrote:
One more question: RPC 1334380768266400 (in the log WangDi sent me)
has on the client side only a "Sending RPC" message, thus missing the
"Completed RPC". The server has all three (received,start work, done
work). Has this RPC vanished on the way back to the client? There is
no further indication what happend. The last timestamp in the client
log is:
1272565368.228628
and the server says it finished the processing of the request at:
1272565281.379471
So the client log has been recorded long enough to contain the
"Completed RPC" message for this RPC if it arrived ever ...
Logically, yes. But in some cases, some debug logs might be abandoned
for some reasons(actually, it happens not rarely), and probably you need
maintain an average time from server "Handled RPC" to client "Completed
RPC", then you just guess the client "Completed RPC" time in this case.

Oh my gosh ;) I don't want to start speculations about the helpfulness
of incomplete debug logs. Anyway, what can get lost? Any kind of
message on the servers and clients? I think I'd like to know what
cases have to be handled while I try to track individual RPC's on
their way.
Any records can get lost here. Unfortunately, there are not any messages
indicate the missing happened. :(
(Usually, I would check the time stamp in the log, i.e. no records for a
"long" time, for example several seconds, but this is not the accurate
way).

I guess you can just ignore these uncompleted records in your first
step? Let's see how these incomplete log will
impact the profiling result, then we will decide how to deal with this?

Thanks
Wangdi

Regards, Michael
_______________________________________________
Lustre-devel mailing list
Lustre-devel at lists.lustre.org<mailto:Lustre-devel@lists.lustre.org>
http://lists.lustre.org/mailman/listinfo/lustre-devel




--
Michael Kluge, M.Sc.

Technische Universit?t Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room WIL A 208
Phone:  (+49) 351 463-34217
Fax:    (+49) 351 463-37773
e-mail: michael.kluge at tu-dresden.de<mailto:michael.kluge@tu-dresden.de>
WWW:    http://www.tu-dresden.de/zih


Galen M. Shipman
Group Leader - Technology Integration
National Center for Computational Sciences
Oak Ridge National Laboratory
Office: 865.576.2672
Cell:   865.307.1209
Email:  gshipman at ornl.gov<mailto:gshipman@ornl.gov>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Lustre-devel] Lustre RPC visualization
  2010-05-16 13:12                     ` Eric Barton
@ 2010-05-17  4:52                       ` Michael Kluge
  0 siblings, 0 replies; 38+ messages in thread
From: Michael Kluge @ 2010-05-17  4:52 UTC (permalink / raw)
  To: lustre-devel

Eric,

Vampir does scale up to this point. Almost, if I remember well, but 
thing might have changed recently). If it does not, well' make it ;) We 
have visualized real trace files with up to 20.000 processes and 
artificial trace files with many more. Heat maps can be emulated. Once 
we have had something like a color-coded counter timeline that procduces 
a heat map. Right now it can be done by on the timeline by placing 
artificial enter/leave events for function called "5%", "10%", and so 
on. I have done this already in the past, so it should not be too difficult.


Michael

Am 16.05.2010 15:12, schrieb Eric Barton:
> Excellent :)
>
> How do you think measurements taken from 1000 servers with 100,000
> clients can be visualised?  We've used heat maps to visualise
> 10s-100s of concurrent measurements (y) over time (x) but I wonder
> if that will scale.  Does vampire support heat maps?
>
>      Cheers,
>                Eric
>
>> -----Original Message-----
>> From: Michael Kluge [mailto:Michael.Kluge at tu-dresden.de]
>> Sent: 16 May 2010 10:30 AM
>> To: di.wang
>> Cc: Eric Barton; Andreas Dilger; Robert Read; Galen M. Shipman; lustre-devel
>> Subject: Re: [Lustre-devel] Lustre RPC visualization
>>
>> Hi WangDi,
>>
>> the first version works. Screenshot is attached. I have a couple of
>> counter realized: RPC's in flight and RPC's completed in total on the
>> client, RPC's enqueued, RPC's in processing and RPC'c completed in total
>> on the server. All these counter can be broken down by the type of RPC
>> (op code). The picture has not yet the lines that show each single RPC,
>> I still have to do counter like "avg. time to complete an RPC over the
>> last second" and there are some more TODO's. Like the timer
>> synchronization. (In the screenshot the first and the last counter show
>> total values while the one in the middle shows a rate.)
>>
>> What I like to have is a complete set of traces from a small cluster
>> (<100 nodes) including the servers. Would that be possible?
>>
>> Is one of you in Hamburg May, 31-June, 3 for ISC'2010? I'll be there and
>> like to talk about what would be useful for the next steps.
>>
>>
>> Regards, Michael
>>
>> Am 03.05.2010 21:52, schrieb di.wang:
>>> Michael Kluge wrote:
>>>>>> One more question: RPC 1334380768266400 (in the log WangDi sent me)
>>>>>> has on the client side only a "Sending RPC" message, thus missing the
>>>>>> "Completed RPC". The server has all three (received,start work, done
>>>>>> work). Has this RPC vanished on the way back to the client? There is
>>>>>> no further indication what happend. The last timestamp in the client
>>>>>> log is:
>>>>>> 1272565368.228628
>>>>>> and the server says it finished the processing of the request at:
>>>>>> 1272565281.379471
>>>>>> So the client log has been recorded long enough to contain the
>>>>>> "Completed RPC" message for this RPC if it arrived ever ...
>>>>> Logically, yes. But in some cases, some debug logs might be abandoned
>>>>> for some reasons(actually, it happens not rarely), and probably you need
>>>>> maintain an average time from server "Handled RPC" to client "Completed
>>>>> RPC", then you just guess the client "Completed RPC" time in this case.
>>>>
>>>> Oh my gosh ;) I don't want to start speculations about the helpfulness
>>>> of incomplete debug logs. Anyway, what can get lost? Any kind of
>>>> message on the servers and clients? I think I'd like to know what
>>>> cases have to be handled while I try to track individual RPC's on
>>>> their way.
>>> Any records can get lost here. Unfortunately, there are not any messages
>>> indicate the missing happened. :(
>>> (Usually, I would check the time stamp in the log, i.e. no records for a
>>> "long" time, for example several seconds, but this is not the accurate
>>> way).
>>>
>>> I guess you can just ignore these uncompleted records in your first
>>> step? Let's see how these incomplete log will
>>> impact the profiling result, then we will decide how to deal with this?
>>>
>>> Thanks
>>> Wangdi
>>>>
>>>> Regards, Michael
>>>> _______________________________________________
>>>> Lustre-devel mailing list
>>>> Lustre-devel at lists.lustre.org
>>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>>>
>>>
>>
>>
>> --
>> Michael Kluge, M.Sc.
>>
>> Technische Universit?t Dresden
>> Center for Information Services and
>> High Performance Computing (ZIH)
>> D-01062 Dresden
>> Germany
>>
>> Contact:
>> Willersbau, Room WIL A 208
>> Phone:  (+49) 351 463-34217
>> Fax:    (+49) 351 463-37773
>> e-mail: michael.kluge at tu-dresden.de
>> WWW:    http://www.tu-dresden.de/zih
>
>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Lustre-devel] Lustre RPC visualization
  2010-05-17  3:24                     ` Andrew Uselton
@ 2010-05-17  5:53                       ` Michael Kluge
  0 siblings, 0 replies; 38+ messages in thread
From: Michael Kluge @ 2010-05-17  5:53 UTC (permalink / raw)
  To: lustre-devel

Hi Andrew,

unfortunately no. We don't own a Cray :( 


Regards, Michael


Am Sonntag, den 16.05.2010, 20:24 -0700 schrieb Andrew Uselton:
> I think this work is very interesting.  Will anyone be at CUG 2010
> next week to discuss? 
> Cheers,
> Andrew
> 
> 
> 2010/5/16 Michael Kluge <Michael.Kluge@tu-dresden.de>
>         Hi WangDi,
>         
>         the first version works. Screenshot is attached. I have a
>         couple of counter realized: RPC's in flight and RPC's
>         completed in total on the client, RPC's enqueued, RPC's in
>         processing and RPC'c completed in total on the server. All
>         these counter can be broken down by the type of RPC (op code).
>         The picture has not yet the lines that show each single RPC, I
>         still have to do counter like "avg. time to complete an RPC
>         over the last second" and there are some more TODO's. Like the
>         timer synchronization. (In the screenshot the first and the
>         last counter show total values while the one in the middle
>         shows a rate.)
>         
>         What I like to have is a complete set of traces from a small
>         cluster (<100 nodes) including the servers. Would that be
>         possible?
>         
>         Is one of you in Hamburg May, 31-June, 3 for ISC'2010? I'll be
>         there and like to talk about what would be useful for the next
>         steps. 
>         
>         
>         
>         Regards, Michael
>         
>         Am 03.05.2010 21:52, schrieb di.wang:
>         
>                 Michael Kluge wrote: 
>                 
>                 
>                                         One more question: RPC
>                                         1334380768266400 (in the log
>                                         WangDi sent me)
>                                         has on the client side only a
>                                         "Sending RPC" message, thus
>                                         missing the
>                                         "Completed RPC". The server
>                                         has all three (received,start
>                                         work, done
>                                         work). Has this RPC vanished
>                                         on the way back to the client?
>                                         There is
>                                         no further indication what
>                                         happend. The last timestamp in
>                                         the client
>                                         log is:
>                                         1272565368.228628
>                                         and the server says it
>                                         finished the processing of the
>                                         request at:
>                                         1272565281.379471
>                                         So the client log has been
>                                         recorded long enough to
>                                         contain the
>                                         "Completed RPC" message for
>                                         this RPC if it arrived
>                                         ever ...
>                                 Logically, yes. But in some cases,
>                                 some debug logs might be abandoned
>                                 for some reasons(actually, it happens
>                                 not rarely), and probably you need
>                                 maintain an average time from server
>                                 "Handled RPC" to client "Completed
>                                 RPC", then you just guess the client
>                                 "Completed RPC" time in this case.
>                         
>                         Oh my gosh ;) I don't want to start
>                         speculations about the helpfulness
>                         of incomplete debug logs. Anyway, what can get
>                         lost? Any kind of
>                         message on the servers and clients? I think
>                         I'd like to know what
>                         cases have to be handled while I try to track
>                         individual RPC's on
>                         their way.
>                 Any records can get lost here. Unfortunately, there
>                 are not any messages
>                 indicate the missing happened. :(
>                 (Usually, I would check the time stamp in the log,
>                 i.e. no records for a
>                 "long" time, for example several seconds, but this is
>                 not the accurate
>                 way).
>                 
>                 I guess you can just ignore these uncompleted records
>                 in your first
>                 step? Let's see how these incomplete log will
>                 impact the profiling result, then we will decide how
>                 to deal with this?
>                 
>                 Thanks
>                 Wangdi
>                         
>                         Regards, Michael
>                         _______________________________________________
>                         Lustre-devel mailing list
>                         Lustre-devel at lists.lustre.org
>                         http://lists.lustre.org/mailman/listinfo/lustre-devel
>                 
>                 
>                 
>         
>         
>         -- 
>         Michael Kluge, M.Sc.
>         
>         Technische Universit?t Dresden
>         Center for Information Services and
>         High Performance Computing (ZIH)
>         D-01062 Dresden
>         Germany
>         
>         Contact:
>         Willersbau, Room WIL A 208
>         Phone:  (+49) 351 463-34217
>         Fax:    (+49) 351 463-37773
>         e-mail: michael.kluge at tu-dresden.de
>         
>         
>         WWW:    http://www.tu-dresden.de/zih
>         
>         
>         _______________________________________________
>         Lustre-devel mailing list
>         Lustre-devel at lists.lustre.org
>         http://lists.lustre.org/mailman/listinfo/lustre-devel
>         
> 
> 
> _______________________________________________
> Lustre-devel mailing list
> Lustre-devel at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-devel

-- 

Michael Kluge, M.Sc.

Technische Universit?t Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room A 208
Phone:  (+49) 351 463-34217
Fax:    (+49) 351 463-37773
e-mail: michael.kluge at tu-dresden.de
WWW:    http://www.tu-dresden.de/zih
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/x-pkcs7-signature
Size: 5997 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20100517/92de1f31/attachment.bin>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Lustre-devel] Lustre RPC visualization
  2010-05-17  3:39                       ` Shipman, Galen M.
@ 2010-05-17  5:59                         ` Michael Kluge
  0 siblings, 0 replies; 38+ messages in thread
From: Michael Kluge @ 2010-05-17  5:59 UTC (permalink / raw)
  To: lustre-devel

Hi Galen,

that's why we started this :) I guess I need some more input data to be
able to play with the visualization. Vampir also supports markers that
can point to specific (process,timestamp) combinations. If we know that
some kind of condition happened while we collected the trace, it is
probably possible to detect this in the trace converter and to put a
mark into the trace file. That would help finding "interesting points"
faster.


Regards, Michael

Am Sonntag, den 16.05.2010, 23:39 -0400 schrieb Shipman, Galen M.: 
> I would be very interested in this, particularly at scale. We should look into collecting some large scale traces on Jaguar for experimentation of various visualization techniques.
> 
> Thanks,
> 
> Galen
> 
> 
> On May 16, 2010, at 9:12 AM, Eric Barton wrote:
> 
> Excellent :)
> 
> How do you think measurements taken from 1000 servers with 100,000
> clients can be visualised?  We've used heat maps to visualise
> 10s-100s of concurrent measurements (y) over time (x) but I wonder
> if that will scale.  Does vampire support heat maps?
> 
>    Cheers,
>              Eric
> 
> -----Original Message-----
> From: Michael Kluge [mailto:Michael.Kluge at tu-dresden.de]
> Sent: 16 May 2010 10:30 AM
> To: di.wang
> Cc: Eric Barton; Andreas Dilger; Robert Read; Galen M. Shipman; lustre-devel
> Subject: Re: [Lustre-devel] Lustre RPC visualization
> 
> Hi WangDi,
> 
> the first version works. Screenshot is attached. I have a couple of
> counter realized: RPC's in flight and RPC's completed in total on the
> client, RPC's enqueued, RPC's in processing and RPC'c completed in total
> on the server. All these counter can be broken down by the type of RPC
> (op code). The picture has not yet the lines that show each single RPC,
> I still have to do counter like "avg. time to complete an RPC over the
> last second" and there are some more TODO's. Like the timer
> synchronization. (In the screenshot the first and the last counter show
> total values while the one in the middle shows a rate.)
> 
> What I like to have is a complete set of traces from a small cluster
> (<100 nodes) including the servers. Would that be possible?
> 
> Is one of you in Hamburg May, 31-June, 3 for ISC'2010? I'll be there and
> like to talk about what would be useful for the next steps.
> 
> 
> Regards, Michael
> 
> Am 03.05.2010 21:52, schrieb di.wang:
> Michael Kluge wrote:
> One more question: RPC 1334380768266400 (in the log WangDi sent me)
> has on the client side only a "Sending RPC" message, thus missing the
> "Completed RPC". The server has all three (received,start work, done
> work). Has this RPC vanished on the way back to the client? There is
> no further indication what happend. The last timestamp in the client
> log is:
> 1272565368.228628
> and the server says it finished the processing of the request at:
> 1272565281.379471
> So the client log has been recorded long enough to contain the
> "Completed RPC" message for this RPC if it arrived ever ...
> Logically, yes. But in some cases, some debug logs might be abandoned
> for some reasons(actually, it happens not rarely), and probably you need
> maintain an average time from server "Handled RPC" to client "Completed
> RPC", then you just guess the client "Completed RPC" time in this case.
> 
> Oh my gosh ;) I don't want to start speculations about the helpfulness
> of incomplete debug logs. Anyway, what can get lost? Any kind of
> message on the servers and clients? I think I'd like to know what
> cases have to be handled while I try to track individual RPC's on
> their way.
> Any records can get lost here. Unfortunately, there are not any messages
> indicate the missing happened. :(
> (Usually, I would check the time stamp in the log, i.e. no records for a
> "long" time, for example several seconds, but this is not the accurate
> way).
> 
> I guess you can just ignore these uncompleted records in your first
> step? Let's see how these incomplete log will
> impact the profiling result, then we will decide how to deal with this?
> 
> Thanks
> Wangdi
> 
> Regards, Michael
> _______________________________________________
> Lustre-devel mailing list
> Lustre-devel at lists.lustre.org<mailto:Lustre-devel@lists.lustre.org>
> http://lists.lustre.org/mailman/listinfo/lustre-devel
> 
> 
> 
> 
> --
> Michael Kluge, M.Sc.
> 
> Technische Universit?t Dresden
> Center for Information Services and
> High Performance Computing (ZIH)
> D-01062 Dresden
> Germany
> 
> Contact:
> Willersbau, Room WIL A 208
> Phone:  (+49) 351 463-34217
> Fax:    (+49) 351 463-37773
> e-mail: michael.kluge at tu-dresden.de<mailto:michael.kluge@tu-dresden.de>
> WWW:    http://www.tu-dresden.de/zih
> 
> 
> Galen M. Shipman
> Group Leader - Technology Integration
> National Center for Computational Sciences
> Oak Ridge National Laboratory
> Office: 865.576.2672
> Cell:   865.307.1209
> Email:  gshipman at ornl.gov<mailto:gshipman@ornl.gov>
> 
> 
> 
> 

-- 

Michael Kluge, M.Sc.

Technische Universit?t Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room A 208
Phone:  (+49) 351 463-34217
Fax:    (+49) 351 463-37773
e-mail: michael.kluge at tu-dresden.de
WWW:    http://www.tu-dresden.de/zih
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/x-pkcs7-signature
Size: 5997 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20100517/c15ea9a9/attachment.bin>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Lustre-devel] Lustre RPC visualization
  2010-05-16  9:29                   ` Michael Kluge
                                       ` (2 preceding siblings ...)
       [not found]                     ` <009101caf4f9$67e1dd50$37a597f0$%barton@oracle.com>
@ 2010-05-25 12:03                     ` Michael Kluge
       [not found]                       ` <4BFC7177.9000808@oracle.com>
  3 siblings, 1 reply; 38+ messages in thread
From: Michael Kluge @ 2010-05-25 12:03 UTC (permalink / raw)
  To: lustre-devel

Hi WangDi,

so, for the moment I am done with what I promised. The work to be done
is mainly debugging with more input data sets. Screenshot of Vampir
showing the derived counter values for the RPC processing/queue times on
the server and the client is attached. Units for the values are either
microseconds or just a number.


Regards, Michael

Am Sonntag, den 16.05.2010, 11:29 +0200 schrieb Michael Kluge: 
> Hi WangDi,
> 
> the first version works. Screenshot is attached. I have a couple of 
> counter realized: RPC's in flight and RPC's completed in total on the 
> client, RPC's enqueued, RPC's in processing and RPC'c completed in total 
> on the server. All these counter can be broken down by the type of RPC 
> (op code). The picture has not yet the lines that show each single RPC, 
> I still have to do counter like "avg. time to complete an RPC over the 
> last second" and there are some more TODO's. Like the timer 
> synchronization. (In the screenshot the first and the last counter show 
> total values while the one in the middle shows a rate.)
> 
> What I like to have is a complete set of traces from a small cluster 
> (<100 nodes) including the servers. Would that be possible?
> 
> Is one of you in Hamburg May, 31-June, 3 for ISC'2010? I'll be there and 
> like to talk about what would be useful for the next steps.
> 
> 
> Regards, Michael
> 
> Am 03.05.2010 21:52, schrieb di.wang:
> > Michael Kluge wrote:
> >>>> One more question: RPC 1334380768266400 (in the log WangDi sent me)
> >>>> has on the client side only a "Sending RPC" message, thus missing the
> >>>> "Completed RPC". The server has all three (received,start work, done
> >>>> work). Has this RPC vanished on the way back to the client? There is
> >>>> no further indication what happend. The last timestamp in the client
> >>>> log is:
> >>>> 1272565368.228628
> >>>> and the server says it finished the processing of the request at:
> >>>> 1272565281.379471
> >>>> So the client log has been recorded long enough to contain the
> >>>> "Completed RPC" message for this RPC if it arrived ever ...
> >>> Logically, yes. But in some cases, some debug logs might be abandoned
> >>> for some reasons(actually, it happens not rarely), and probably you need
> >>> maintain an average time from server "Handled RPC" to client "Completed
> >>> RPC", then you just guess the client "Completed RPC" time in this case.
> >>
> >> Oh my gosh ;) I don't want to start speculations about the helpfulness
> >> of incomplete debug logs. Anyway, what can get lost? Any kind of
> >> message on the servers and clients? I think I'd like to know what
> >> cases have to be handled while I try to track individual RPC's on
> >> their way.
> > Any records can get lost here. Unfortunately, there are not any messages
> > indicate the missing happened. :(
> > (Usually, I would check the time stamp in the log, i.e. no records for a
> > "long" time, for example several seconds, but this is not the accurate
> > way).
> >
> > I guess you can just ignore these uncompleted records in your first
> > step? Let's see how these incomplete log will
> > impact the profiling result, then we will decide how to deal with this?
> >
> > Thanks
> > Wangdi
> >>
> >> Regards, Michael
> >> _______________________________________________
> >> Lustre-devel mailing list
> >> Lustre-devel at lists.lustre.org
> >> http://lists.lustre.org/mailman/listinfo/lustre-devel
> >
> >
> 
> 
> _______________________________________________
> Lustre-devel mailing list
> Lustre-devel at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-devel

-- 

Michael Kluge, M.Sc.

Technische Universit?t Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room A 208
Phone:  (+49) 351 463-34217
Fax:    (+49) 351 463-37773
e-mail: michael.kluge at tu-dresden.de
WWW:    http://www.tu-dresden.de/zih
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lustre_rpc_trace.png
Type: image/png
Size: 146793 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20100525/66fe7ab6/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/x-pkcs7-signature
Size: 5997 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20100525/66fe7ab6/attachment.bin>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Lustre-devel] Lustre RPC visualization
       [not found]                       ` <4BFC7177.9000808@oracle.com>
@ 2010-05-28 14:54                         ` Michael Kluge
       [not found]                           ` <4BFFA456.7030502@oracle.com>
  0 siblings, 1 reply; 38+ messages in thread
From: Michael Kluge @ 2010-05-28 14:54 UTC (permalink / raw)
  To: lustre-devel

Hi WangDi,

> Looks great! Just query, as you said, "All these counter can be broken 
> down by the type of RPC (op code)" , you actually implemented that, but 
> not shown in the attached picture? 

Yes.

> And could you please also add "Server queued RPCs" over time ?

Already done.

One good news: The Feature that Vampir can show something like a heat
map (Eric asked about this) comes back with the release at ISC. It is
now called "performance radar". It can produce a heat map for a counter
and does some other things as well. I could send a picture around, but
need at first an bigger trace (more hosts generating traces in
parallel).


Regards, Michael

> Thanks
> WangDi
> 
> Michael Kluge wrote:
> > Hi WangDi,
> >
> > so, for the moment I am done with what I promised. The work to be done
> > is mainly debugging with more input data sets. Screenshot of Vampir
> > showing the derived counter values for the RPC processing/queue times on
> > the server and the client is attached. Units for the values are either
> > microseconds or just a number.
> >
> >
> > Regards, Michael
> >
> > Am Sonntag, den 16.05.2010, 11:29 +0200 schrieb Michael Kluge: 
> >   
> >> Hi WangDi,
> >>
> >> the first version works. Screenshot is attached. I have a couple of 
> >> counter realized: RPC's in flight and RPC's completed in total on the 
> >> client, RPC's enqueued, RPC's in processing and RPC'c completed in total 
> >> on the server. All these counter can be broken down by the type of RPC 
> >> (op code). The picture has not yet the lines that show each single RPC, 
> >> I still have to do counter like "avg. time to complete an RPC over the 
> >> last second" and there are some more TODO's. Like the timer 
> >> synchronization. (In the screenshot the first and the last counter show 
> >> total values while the one in the middle shows a rate.)
> >>
> >> What I like to have is a complete set of traces from a small cluster 
> >> (<100 nodes) including the servers. Would that be possible?
> >>
> >> Is one of you in Hamburg May, 31-June, 3 for ISC'2010? I'll be there and 
> >> like to talk about what would be useful for the next steps.
> >>
> >>
> >> Regards, Michael
> >>
> >> Am 03.05.2010 21:52, schrieb di.wang:
> >>     
> >>> Michael Kluge wrote:
> >>>       
> >>>>>> One more question: RPC 1334380768266400 (in the log WangDi sent me)
> >>>>>> has on the client side only a "Sending RPC" message, thus missing the
> >>>>>> "Completed RPC". The server has all three (received,start work, done
> >>>>>> work). Has this RPC vanished on the way back to the client? There is
> >>>>>> no further indication what happend. The last timestamp in the client
> >>>>>> log is:
> >>>>>> 1272565368.228628
> >>>>>> and the server says it finished the processing of the request at:
> >>>>>> 1272565281.379471
> >>>>>> So the client log has been recorded long enough to contain the
> >>>>>> "Completed RPC" message for this RPC if it arrived ever ...
> >>>>>>             
> >>>>> Logically, yes. But in some cases, some debug logs might be abandoned
> >>>>> for some reasons(actually, it happens not rarely), and probably you need
> >>>>> maintain an average time from server "Handled RPC" to client "Completed
> >>>>> RPC", then you just guess the client "Completed RPC" time in this case.
> >>>>>           
> >>>> Oh my gosh ;) I don't want to start speculations about the helpfulness
> >>>> of incomplete debug logs. Anyway, what can get lost? Any kind of
> >>>> message on the servers and clients? I think I'd like to know what
> >>>> cases have to be handled while I try to track individual RPC's on
> >>>> their way.
> >>>>         
> >>> Any records can get lost here. Unfortunately, there are not any messages
> >>> indicate the missing happened. :(
> >>> (Usually, I would check the time stamp in the log, i.e. no records for a
> >>> "long" time, for example several seconds, but this is not the accurate
> >>> way).
> >>>
> >>> I guess you can just ignore these uncompleted records in your first
> >>> step? Let's see how these incomplete log will
> >>> impact the profiling result, then we will decide how to deal with this?
> >>>
> >>> Thanks
> >>> Wangdi
> >>>       
> >>>> Regards, Michael
> >>>> _______________________________________________
> >>>> Lustre-devel mailing list
> >>>> Lustre-devel at lists.lustre.org
> >>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
> >>>>         
> >>>       
> >> _______________________________________________
> >> Lustre-devel mailing list
> >> Lustre-devel at lists.lustre.org
> >> http://lists.lustre.org/mailman/listinfo/lustre-devel
> >>     
> >
> >   
> >
> > ------------------------------------------------------------------------
> >
> 
> 

-- 

Michael Kluge, M.Sc.

Technische Universit?t Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room A 208
Phone:  (+49) 351 463-34217
Fax:    (+49) 351 463-37773
e-mail: michael.kluge at tu-dresden.de
WWW:    http://www.tu-dresden.de/zih
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/x-pkcs7-signature
Size: 5997 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20100528/3897d501/attachment.bin>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Lustre-devel] Lustre RPC visualization
       [not found]                                     ` <001601cb01a3$546c93d0$fd45bb70$%barton@oracle.com>
@ 2010-06-01 12:12                                       ` di.wang
  2010-06-01 17:03                                         ` Andreas Dilger
  0 siblings, 1 reply; 38+ messages in thread
From: di.wang @ 2010-06-01 12:12 UTC (permalink / raw)
  To: lustre-devel

Eric Barton wrote:
> I'd really like to see how vampire handles _all_ the trace data
> we can throw at it.  If 600 clients is a pain, how bad will it
> be at 60,000?   
>
> What in particular makes collecting traces from all the clients
> + servers hard?  How can/should we automate it or otherwise make
> it easier?
>   
Aha, I do not mean it is difficult to get all trace logs.

I had thought the current target is to make sure vampire can handle
the trace from "big" enough clusters.

Actually, I am not sure whether vampire(with this trace analyse) should 
work in
this way as we do now?  i.e. running job exclusively on the cluster, get 
rpctrace log, then get
graph by vampire?

Or I miss sth here? Michale, could you please explain a bit? What is 
your idea how vampire could help the end users? or the target here is 
just helping the developers and sysadmin to understand the system?

Thanks
WangDi
>     Cheers,
>               Eric
>
>   
>> -----Original Message-----
>> From: di.wang [mailto:di.wang at oracle.com]
>> Sent: 01 June 2010 12:50 PM
>> To: Robert Read
>> Cc: Michael Kluge; Eric Barton; Galen M. Shipman
>> Subject: Re: [Lustre-devel] Lustre RPC visualization
>>
>> Hello,
>>
>> IMHO, just run IOR with whatever parameters, and get rpctrace
>> log(probably only enable rpctrace) from 1 OST and some of clients
>> (probably 2 is enough).
>> Note: please make sure these 2 clients did communicate the OST during
>> the IOR.
>>
>> Michael, I do not think you need all the trace logs from the clients. right?
>>
>> Robert
>>
>> If there are available time slots for this test on Hyperion, who can
>> help to get these logs?
>>
>> Thanks
>> Wangdi
>>
>> Robert Read wrote:
>>     
>>> What should I run then? Do have scripts to capture this?
>>>
>>> robert
>>>
>>> On May 31, 2010, at 2:39 , Michael Kluge wrote:
>>>
>>>       
>>>> Hi Robert,
>>>>
>>>> 600 is a nice number. Plus the traces from the server an I am happy.
>>>>
>>>>
>>>> Michael
>>>>
>>>> Am 28.05.2010 um 17:53 schrieb Robert Read:
>>>>
>>>>         
>>>>> On May 28, 2010, at 4:09 , di.wang wrote:
>>>>>
>>>>>           
>>>>>> Hello, Michael
>>>>>>
>>>>>>             
>>>>>>> One good news: The Feature that Vampir can show something like a heat
>>>>>>> map (Eric asked about this) comes back with the release at ISC. It is
>>>>>>> now called "performance radar". It can produce a heat map for a
>>>>>>> counter
>>>>>>> and does some other things as well. I could send a picture around, but
>>>>>>> need at first an bigger trace (more hosts generating traces in
>>>>>>> parallel).
>>>>>>>
>>>>>>>               
>>>>>> Right now I do not have big clusters available to generate the trace.
>>>>>> I will see what I can do here.
>>>>>>             
>>>>> If ~600 clients is big enough we could generate that on Hyperion.
>>>>>
>>>>> robert
>>>>>
>>>>>           
>>>>>> Thanks
>>>>>> WangDi
>>>>>>             
>>>>>>>
>>>>>>>               
>>>>>>>>>                   
>>>>>>> ------------------------------------------------------------------------
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Lustre-devel mailing list
>>>>>>> Lustre-devel at lists.lustre.org <mailto:Lustre-devel@lists.lustre.org>
>>>>>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>>>>>>>
>>>>>>>               
>>>>>           
>>>> --
>>>>
>>>> Michael Kluge, M.Sc.
>>>>
>>>> Technische Universit?t Dresden
>>>> Center for Information Services and
>>>> High Performance Computing (ZIH)
>>>> D-01062 Dresden
>>>> Germany
>>>>
>>>> Contact:
>>>> Willersbau, Room WIL A 208
>>>> Phone:  (+49) 351 463-34217
>>>> Fax:    (+49) 351 463-37773
>>>> e-mail: michael.kluge at tu-dresden.de <mailto:michael.kluge@tu-dresden.de>
>>>> WWW:    http://www.tu-dresden.de/zih
>>>>
>>>>         
>
>
> _______________________________________________
> Lustre-devel mailing list
> Lustre-devel at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-devel
>   

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Lustre-devel] Lustre RPC visualization
       [not found]                                   ` <4C04F3F0.9040708@oracle.com>
       [not found]                                     ` <001601cb01a3$546c93d0$fd45bb70$%barton@oracle.com>
@ 2010-06-01 15:58                                     ` Eric Barton
  1 sibling, 0 replies; 38+ messages in thread
From: Eric Barton @ 2010-06-01 15:58 UTC (permalink / raw)
  To: lustre-devel

I'd really like to see how vampire handles _all_ the trace data
we can throw at it.  If 600 clients is a pain, how bad will it
be at 60,000?   

What in particular makes collecting traces from all the clients
+ servers hard?  How can/should we automate it or otherwise make
it easier?

    Cheers,
              Eric

> -----Original Message-----
> From: di.wang [mailto:di.wang at oracle.com]
> Sent: 01 June 2010 12:50 PM
> To: Robert Read
> Cc: Michael Kluge; Eric Barton; Galen M. Shipman
> Subject: Re: [Lustre-devel] Lustre RPC visualization
> 
> Hello,
> 
> IMHO, just run IOR with whatever parameters, and get rpctrace
> log(probably only enable rpctrace) from 1 OST and some of clients
> (probably 2 is enough).
> Note: please make sure these 2 clients did communicate the OST during
> the IOR.
> 
> Michael, I do not think you need all the trace logs from the clients. right?
> 
> Robert
> 
> If there are available time slots for this test on Hyperion, who can
> help to get these logs?
> 
> Thanks
> Wangdi
> 
> Robert Read wrote:
> > What should I run then? Do have scripts to capture this?
> >
> > robert
> >
> > On May 31, 2010, at 2:39 , Michael Kluge wrote:
> >
> >> Hi Robert,
> >>
> >> 600 is a nice number. Plus the traces from the server an I am happy.
> >>
> >>
> >> Michael
> >>
> >> Am 28.05.2010 um 17:53 schrieb Robert Read:
> >>
> >>>
> >>> On May 28, 2010, at 4:09 , di.wang wrote:
> >>>
> >>>> Hello, Michael
> >>>>
> >>>>> One good news: The Feature that Vampir can show something like a heat
> >>>>> map (Eric asked about this) comes back with the release at ISC. It is
> >>>>> now called "performance radar". It can produce a heat map for a
> >>>>> counter
> >>>>> and does some other things as well. I could send a picture around, but
> >>>>> need at first an bigger trace (more hosts generating traces in
> >>>>> parallel).
> >>>>>
> >>>> Right now I do not have big clusters available to generate the trace.
> >>>> I will see what I can do here.
> >>>
> >>> If ~600 clients is big enough we could generate that on Hyperion.
> >>>
> >>> robert
> >>>
> >>>>
> >>>> Thanks
> >>>> WangDi
> >>>>>
> >>>>>
> >>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>> ------------------------------------------------------------------------
> >>>>>
> >>>>> _______________________________________________
> >>>>> Lustre-devel mailing list
> >>>>> Lustre-devel at lists.lustre.org <mailto:Lustre-devel@lists.lustre.org>
> >>>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
> >>>>>
> >>>>
> >>>
> >>>
> >>
> >>
> >> --
> >>
> >> Michael Kluge, M.Sc.
> >>
> >> Technische Universit?t Dresden
> >> Center for Information Services and
> >> High Performance Computing (ZIH)
> >> D-01062 Dresden
> >> Germany
> >>
> >> Contact:
> >> Willersbau, Room WIL A 208
> >> Phone:  (+49) 351 463-34217
> >> Fax:    (+49) 351 463-37773
> >> e-mail: michael.kluge at tu-dresden.de <mailto:michael.kluge@tu-dresden.de>
> >> WWW:    http://www.tu-dresden.de/zih
> >>
> >

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Lustre-devel] Lustre RPC visualization
  2010-06-01 12:12                                       ` di.wang
@ 2010-06-01 17:03                                         ` Andreas Dilger
  2010-06-01 19:39                                           ` Michael Kluge
  0 siblings, 1 reply; 38+ messages in thread
From: Andreas Dilger @ 2010-06-01 17:03 UTC (permalink / raw)
  To: lustre-devel

On 2010-06-01, at 06:12, di.wang wrote:
> Michael, I do not think you need all the trace logs from the clients. right?

Actually, I think he does want trace logs from all of the clients.

> I had thought the current target is to make sure vampire can handle
> the trace from "big" enough clusters.
> 
> Actually, I am not sure whether vampire(with this trace analyse) should 
> work in this way as we do now?  i.e. running job exclusively on the cluster, get rpctrace log, then get graph by vampire?
> 
> Or I miss sth here? Michale, could you please explain a bit? What is 
> your idea how vampire could help the end users? or the target here is 
> just helping the developers and sysadmin to understand the system?

My thoughts on this in the past were that it should be possible to use ONLY the client logs to plot syscall latencies (using client VFSTRACE lines) and the RPC latencies (using client RPCTRACE lines) and locking (using DLMTRACE lines).  Some code rework would be needed to allow regular users to run "lctl dk", and it would filter out the lines specific to their own processes.  This would allow regular users to collect and analyze their application without assistance or coordination from the sysadmin.

In order to allow regular users to trace such information, the VFSTRACE calls should report the UID of the process doing the system call, which would immediately map to a PID.  The PID can be used to track the majority of the debug entries, but not always those done in another thread (e.g. ptlrpcd).  The RPCTRACE messages also contain the PID, so that would be helpful, but it would mean that there has to be a parser/filter in the kernel to ensure users cannot access trace information that is not their own.  That would be a significant undertaking, I think.

In the meantime, for testing purposes and initial usage (with sysadmin assistance) the full debug log can be extracted from the kernel and filtered in userspace as needed.

>>> -----Original Message-----
>>> From: di.wang [mailto:di.wang at oracle.com]
>>> Sent: 01 June 2010 12:50 PM
>>> To: Robert Read
>>> Cc: Michael Kluge; Eric Barton; Galen M. Shipman
>>> Subject: Re: [Lustre-devel] Lustre RPC visualization
>>> 
>>> Hello,
>>> 
>>> IMHO, just run IOR with whatever parameters, and get rpctrace
>>> log(probably only enable rpctrace) from 1 OST and some of clients
>>> (probably 2 is enough).
>>> Note: please make sure these 2 clients did communicate the OST during
>>> the IOR.
>>> 
>>> Michael, I do not think you need all the trace logs from the clients. right?
>>> 
>>> Robert
>>> 
>>> If there are available time slots for this test on Hyperion, who can
>>> help to get these logs?
>>> 
>>> Thanks
>>> Wangdi
>>> 
>>> Robert Read wrote:
>>> 
>>>> What should I run then? Do have scripts to capture this?
>>>> 
>>>> robert
>>>> 
>>>> On May 31, 2010, at 2:39 , Michael Kluge wrote:
>>>> 
>>>> 
>>>>> Hi Robert,
>>>>> 
>>>>> 600 is a nice number. Plus the traces from the server an I am happy.
>>>>> 
>>>>> 
>>>>> Michael
>>>>> 
>>>>> Am 28.05.2010 um 17:53 schrieb Robert Read:
>>>>> 
>>>>> 
>>>>>> On May 28, 2010, at 4:09 , di.wang wrote:
>>>>>> 
>>>>>> 
>>>>>>> Hello, Michael
>>>>>>> 
>>>>>>> 
>>>>>>>> One good news: The Feature that Vampir can show something like a heat
>>>>>>>> map (Eric asked about this) comes back with the release at ISC. It is
>>>>>>>> now called "performance radar". It can produce a heat map for a
>>>>>>>> counter
>>>>>>>> and does some other things as well. I could send a picture around, but
>>>>>>>> need at first an bigger trace (more hosts generating traces in
>>>>>>>> parallel).
>>>>>>>> 
>>>>>>>> 
>>>>>>> Right now I do not have big clusters available to generate the trace.
>>>>>>> I will see what I can do here.
>>>>>>> 
>>>>>> If ~600 clients is big enough we could generate that on Hyperion.
>>>>>> 
>>>>>> robert
>>>>>> 
>>>>>> 
>>>>>>> Thanks
>>>>>>> WangDi
>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> ------------------------------------------------------------------------
>>>>>>>> 
>>>>>>>> _______________________________________________
>>>>>>>> Lustre-devel mailing list
>>>>>>>> Lustre-devel at lists.lustre.org <mailto:Lustre-devel@lists.lustre.org>
>>>>>>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>> --
>>>>> 
>>>>> Michael Kluge, M.Sc.
>>>>> 
>>>>> Technische Universit?t Dresden
>>>>> Center for Information Services and
>>>>> High Performance Computing (ZIH)
>>>>> D-01062 Dresden
>>>>> Germany
>>>>> 
>>>>> Contact:
>>>>> Willersbau, Room WIL A 208
>>>>> Phone:  (+49) 351 463-34217
>>>>> Fax:    (+49) 351 463-37773
>>>>> e-mail: michael.kluge at tu-dresden.de <mailto:michael.kluge@tu-dresden.de>
>>>>> WWW:    http://www.tu-dresden.de/zih
>>>>> 
>>>>> 
>> 
>> 
>> _______________________________________________
>> Lustre-devel mailing list
>> Lustre-devel at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>> 
> 
> _______________________________________________
> Lustre-devel mailing list
> Lustre-devel at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-devel


Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Lustre-devel] Lustre RPC visualization
  2010-06-01 17:03                                         ` Andreas Dilger
@ 2010-06-01 19:39                                           ` Michael Kluge
  2010-06-16  8:46                                             ` Michael Kluge
  0 siblings, 1 reply; 38+ messages in thread
From: Michael Kluge @ 2010-06-01 19:39 UTC (permalink / raw)
  To: lustre-devel

> On 2010-06-01, at 06:12, di.wang wrote:
>> Michael, I do not think you need all the trace logs from the clients. right?
> 
> Actually, I think he does want trace logs from all of the clients.

Yes, 600 is a good number. Vampir can easily handle this. If possible, I'd like to have all server traces as well to include this information. Right now I am only putting these RPC's in the trace where all 5 events (client send,server recv/start/done,client done) are present. I see the "client send/done" events in the OSS log as well, probably it is talking to the MDS or MGS. Events that are incomplete are being ignored but counted in "dumped events" counter. But that could be changed.


Michael

>> I had thought the current target is to make sure vampire can handle
>> the trace from "big" enough clusters.
>> 
>> Actually, I am not sure whether vampire(with this trace analyse) should 
>> work in this way as we do now?  i.e. running job exclusively on the cluster, get rpctrace log, then get graph by vampire?
>> 
>> Or I miss sth here? Michale, could you please explain a bit? What is 
>> your idea how vampire could help the end users? or the target here is 
>> just helping the developers and sysadmin to understand the system?
> 
> My thoughts on this in the past were that it should be possible to use ONLY the client logs to plot syscall latencies (using client VFSTRACE lines) and the RPC latencies (using client RPCTRACE lines) and locking (using DLMTRACE lines).  Some code rework would be needed to allow regular users to run "lctl dk", and it would filter out the lines specific to their own processes.  This would allow regular users to collect and analyze their application without assistance or coordination from the sysadmin.
> 
> In order to allow regular users to trace such information, the VFSTRACE calls should report the UID of the process doing the system call, which would immediately map to a PID.  The PID can be used to track the majority of the debug entries, but not always those done in another thread (e.g. ptlrpcd).  The RPCTRACE messages also contain the PID, so that would be helpful, but it would mean that there has to be a parser/filter in the kernel to ensure users cannot access trace information that is not their own.  That would be a significant undertaking, I think.
> 
> In the meantime, for testing purposes and initial usage (with sysadmin assistance) the full debug log can be extracted from the kernel and filtered in userspace as needed.
> 
>>>> -----Original Message-----
>>>> From: di.wang [mailto:di.wang at oracle.com]
>>>> Sent: 01 June 2010 12:50 PM
>>>> To: Robert Read
>>>> Cc: Michael Kluge; Eric Barton; Galen M. Shipman
>>>> Subject: Re: [Lustre-devel] Lustre RPC visualization
>>>> 
>>>> Hello,
>>>> 
>>>> IMHO, just run IOR with whatever parameters, and get rpctrace
>>>> log(probably only enable rpctrace) from 1 OST and some of clients
>>>> (probably 2 is enough).
>>>> Note: please make sure these 2 clients did communicate the OST during
>>>> the IOR.
>>>> 
>>>> Michael, I do not think you need all the trace logs from the clients. right?
>>>> 
>>>> Robert
>>>> 
>>>> If there are available time slots for this test on Hyperion, who can
>>>> help to get these logs?
>>>> 
>>>> Thanks
>>>> Wangdi
>>>> 
>>>> Robert Read wrote:
>>>> 
>>>>> What should I run then? Do have scripts to capture this?
>>>>> 
>>>>> robert
>>>>> 
>>>>> On May 31, 2010, at 2:39 , Michael Kluge wrote:
>>>>> 
>>>>> 
>>>>>> Hi Robert,
>>>>>> 
>>>>>> 600 is a nice number. Plus the traces from the server an I am happy.
>>>>>> 
>>>>>> 
>>>>>> Michael
>>>>>> 
>>>>>> Am 28.05.2010 um 17:53 schrieb Robert Read:
>>>>>> 
>>>>>> 
>>>>>>> On May 28, 2010, at 4:09 , di.wang wrote:
>>>>>>> 
>>>>>>> 
>>>>>>>> Hello, Michael
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> One good news: The Feature that Vampir can show something like a heat
>>>>>>>>> map (Eric asked about this) comes back with the release at ISC. It is
>>>>>>>>> now called "performance radar". It can produce a heat map for a
>>>>>>>>> counter
>>>>>>>>> and does some other things as well. I could send a picture around, but
>>>>>>>>> need at first an bigger trace (more hosts generating traces in
>>>>>>>>> parallel).
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> Right now I do not have big clusters available to generate the trace.
>>>>>>>> I will see what I can do here.
>>>>>>>> 
>>>>>>> If ~600 clients is big enough we could generate that on Hyperion.
>>>>>>> 
>>>>>>> robert
>>>>>>> 
>>>>>>> 
>>>>>>>> Thanks
>>>>>>>> WangDi
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>> 
>>>>>>>>> _______________________________________________
>>>>>>>>> Lustre-devel mailing list
>>>>>>>>> Lustre-devel at lists.lustre.org <mailto:Lustre-devel@lists.lustre.org>
>>>>>>>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>> --
>>>>>> 
>>>>>> Michael Kluge, M.Sc.
>>>>>> 
>>>>>> Technische Universit?t Dresden
>>>>>> Center for Information Services and
>>>>>> High Performance Computing (ZIH)
>>>>>> D-01062 Dresden
>>>>>> Germany
>>>>>> 
>>>>>> Contact:
>>>>>> Willersbau, Room WIL A 208
>>>>>> Phone:  (+49) 351 463-34217
>>>>>> Fax:    (+49) 351 463-37773
>>>>>> e-mail: michael.kluge at tu-dresden.de <mailto:michael.kluge@tu-dresden.de>
>>>>>> WWW:    http://www.tu-dresden.de/zih
>>>>>> 
>>>>>> 
>>> 
>>> 
>>> _______________________________________________
>>> Lustre-devel mailing list
>>> Lustre-devel at lists.lustre.org
>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>>> 
>> 
>> _______________________________________________
>> Lustre-devel mailing list
>> Lustre-devel at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-devel
> 
> 
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Technical Lead
> Oracle Corporation Canada Inc.
> 
> _______________________________________________
> Lustre-devel mailing list
> Lustre-devel at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-devel
> 


-- 

Michael Kluge, M.Sc.

Technische Universit?t Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room WIL A 208
Phone:  (+49) 351 463-34217
Fax:    (+49) 351 463-37773
e-mail: michael.kluge at tu-dresden.de
WWW:    http://www.tu-dresden.de/zih

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20100601/557b4f54/attachment.htm>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Lustre-devel] Lustre RPC visualization
  2010-06-01 19:39                                           ` Michael Kluge
@ 2010-06-16  8:46                                             ` Michael Kluge
  2010-06-16 14:50                                               ` Andreas Dilger
  0 siblings, 1 reply; 38+ messages in thread
From: Michael Kluge @ 2010-06-16  8:46 UTC (permalink / raw)
  To: lustre-devel

Hi Robert,

thanks for the trace files. Every single file contains log entries like:

line: 00000100:00100000:2.0F:1275433209.964416:0:7924:0:(events.c:285:request_in_callback())               incoming req at ffff8101b32f5400 x1337373369608379 msgsize 296

Are in the tarball 684 server logs? Or am I wrong with the assumption that the lines with an "incoming req@" can only show up on the servers?


Michael

Am 01.06.2010 um 21:39 schrieb Michael Kluge:

>> On 2010-06-01, at 06:12, di.wang wrote:
>>> Michael, I do not think you need all the trace logs from the clients. right?
>> 
>> Actually, I think he does want trace logs from all of the clients.
> 
> Yes, 600 is a good number. Vampir can easily handle this. If possible, I'd like to have all server traces as well to include this information. Right now I am only putting these RPC's in the trace where all 5 events (client send,server recv/start/done,client done) are present. I see the "client send/done" events in the OSS log as well, probably it is talking to the MDS or MGS. Events that are incomplete are being ignored but counted in "dumped events" counter. But that could be changed.
> 
> 
> Michael
> 
>>> I had thought the current target is to make sure vampire can handle
>>> the trace from "big" enough clusters.
>>> 
>>> Actually, I am not sure whether vampire(with this trace analyse) should 
>>> work in this way as we do now?  i.e. running job exclusively on the cluster, get rpctrace log, then get graph by vampire?
>>> 
>>> Or I miss sth here? Michale, could you please explain a bit? What is 
>>> your idea how vampire could help the end users? or the target here is 
>>> just helping the developers and sysadmin to understand the system?
>> 
>> My thoughts on this in the past were that it should be possible to use ONLY the client logs to plot syscall latencies (using client VFSTRACE lines) and the RPC latencies (using client RPCTRACE lines) and locking (using DLMTRACE lines).  Some code rework would be needed to allow regular users to run "lctl dk", and it would filter out the lines specific to their own processes.  This would allow regular users to collect and analyze their application without assistance or coordination from the sysadmin.
>> 
>> In order to allow regular users to trace such information, the VFSTRACE calls should report the UID of the process doing the system call, which would immediately map to a PID.  The PID can be used to track the majority of the debug entries, but not always those done in another thread (e.g. ptlrpcd).  The RPCTRACE messages also contain the PID, so that would be helpful, but it would mean that there has to be a parser/filter in the kernel to ensure users cannot access trace information that is not their own.  That would be a significant undertaking, I think.
>> 
>> In the meantime, for testing purposes and initial usage (with sysadmin assistance) the full debug log can be extracted from the kernel and filtered in userspace as needed.
>> 
>>>>> -----Original Message-----
>>>>> From: di.wang [mailto:di.wang at oracle.com]
>>>>> Sent: 01 June 2010 12:50 PM
>>>>> To: Robert Read
>>>>> Cc: Michael Kluge; Eric Barton; Galen M. Shipman
>>>>> Subject: Re: [Lustre-devel] Lustre RPC visualization
>>>>> 
>>>>> Hello,
>>>>> 
>>>>> IMHO, just run IOR with whatever parameters, and get rpctrace
>>>>> log(probably only enable rpctrace) from 1 OST and some of clients
>>>>> (probably 2 is enough).
>>>>> Note: please make sure these 2 clients did communicate the OST during
>>>>> the IOR.
>>>>> 
>>>>> Michael, I do not think you need all the trace logs from the clients. right?
>>>>> 
>>>>> Robert
>>>>> 
>>>>> If there are available time slots for this test on Hyperion, who can
>>>>> help to get these logs?
>>>>> 
>>>>> Thanks
>>>>> Wangdi
>>>>> 
>>>>> Robert Read wrote:
>>>>> 
>>>>>> What should I run then? Do have scripts to capture this?
>>>>>> 
>>>>>> robert
>>>>>> 
>>>>>> On May 31, 2010, at 2:39 , Michael Kluge wrote:
>>>>>> 
>>>>>> 
>>>>>>> Hi Robert,
>>>>>>> 
>>>>>>> 600 is a nice number. Plus the traces from the server an I am happy.
>>>>>>> 
>>>>>>> 
>>>>>>> Michael
>>>>>>> 
>>>>>>> Am 28.05.2010 um 17:53 schrieb Robert Read:
>>>>>>> 
>>>>>>> 
>>>>>>>> On May 28, 2010, at 4:09 , di.wang wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> Hello, Michael
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> One good news: The Feature that Vampir can show something like a heat
>>>>>>>>>> map (Eric asked about this) comes back with the release at ISC. It is
>>>>>>>>>> now called "performance radar". It can produce a heat map for a
>>>>>>>>>> counter
>>>>>>>>>> and does some other things as well. I could send a picture around, but
>>>>>>>>>> need at first an bigger trace (more hosts generating traces in
>>>>>>>>>> parallel).
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> Right now I do not have big clusters available to generate the trace.
>>>>>>>>> I will see what I can do here.
>>>>>>>>> 
>>>>>>>> If ~600 clients is big enough we could generate that on Hyperion.
>>>>>>>> 
>>>>>>>> robert
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> Thanks
>>>>>>>>> WangDi
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>>> 
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Lustre-devel mailing list
>>>>>>>>>> Lustre-devel at lists.lustre.org <mailto:Lustre-devel@lists.lustre.org>
>>>>>>>>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>>> --
>>>>>>> 
>>>>>>> Michael Kluge, M.Sc.
>>>>>>> 
>>>>>>> Technische Universit?t Dresden
>>>>>>> Center for Information Services and
>>>>>>> High Performance Computing (ZIH)
>>>>>>> D-01062 Dresden
>>>>>>> Germany
>>>>>>> 
>>>>>>> Contact:
>>>>>>> Willersbau, Room WIL A 208
>>>>>>> Phone:  (+49) 351 463-34217
>>>>>>> Fax:    (+49) 351 463-37773
>>>>>>> e-mail: michael.kluge at tu-dresden.de <mailto:michael.kluge@tu-dresden.de>
>>>>>>> WWW:    http://www.tu-dresden.de/zih
>>>>>>> 
>>>>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> Lustre-devel mailing list
>>>> Lustre-devel at lists.lustre.org
>>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>>>> 
>>> 
>>> _______________________________________________
>>> Lustre-devel mailing list
>>> Lustre-devel at lists.lustre.org
>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>> 
>> 
>> Cheers, Andreas
>> --
>> Andreas Dilger
>> Lustre Technical Lead
>> Oracle Corporation Canada Inc.
>> 
>> _______________________________________________
>> Lustre-devel mailing list
>> Lustre-devel at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>> 
> 
> 
> -- 
> 
> Michael Kluge, M.Sc.
> 
> Technische Universit?t Dresden
> Center for Information Services and
> High Performance Computing (ZIH)
> D-01062 Dresden
> Germany
> 
> Contact:
> Willersbau, Room WIL A 208
> Phone:  (+49) 351 463-34217
> Fax:    (+49) 351 463-37773
> e-mail: michael.kluge at tu-dresden.de
> WWW:    http://www.tu-dresden.de/zih
> 
> _______________________________________________
> Lustre-devel mailing list
> Lustre-devel at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-devel


-- 

Michael Kluge, M.Sc.

Technische Universit?t Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room WIL A 208
Phone:  (+49) 351 463-34217
Fax:    (+49) 351 463-37773
e-mail: michael.kluge at tu-dresden.de
WWW:    http://www.tu-dresden.de/zih

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20100616/4755505c/attachment.htm>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Lustre-devel] Lustre RPC visualization
  2010-06-16  8:46                                             ` Michael Kluge
@ 2010-06-16 14:50                                               ` Andreas Dilger
  2010-06-17 14:02                                                 ` Michael Kluge
  0 siblings, 1 reply; 38+ messages in thread
From: Andreas Dilger @ 2010-06-16 14:50 UTC (permalink / raw)
  To: lustre-devel

On 2010-06-16, at 02:46, Michael Kluge wrote:
> thanks for the trace files. Every single file contains log entries like:
> 
> line: 00000100:00100000:2.0F:1275433209.964416:0:7924:0:(events.c:285:request_in_callback())               incoming req at ffff8101b32f5400 x1337373369608379 msgsize 296
> 
> Are in the tarball 684 server logs? Or am I wrong with the assumption that the lines with an "incoming req@" can only show up on the servers?

Clients can receive requests from the server for things like lock cancellations, so this line will appear in both client and server logs.  We don't have a test system with 684 servers...

> Am 01.06.2010 um 21:39 schrieb Michael Kluge:
> 
>>> On 2010-06-01, at 06:12, di.wang wrote:
>>>> Michael, I do not think you need all the trace logs from the clients. right?
>>> 
>>> Actually, I think he does want trace logs from all of the clients.
>> 
>> Yes, 600 is a good number. Vampir can easily handle this. If possible, I'd like to have all server traces as well to include this information. Right now I am only putting these RPC's in the trace where all 5 events (client send,server recv/start/done,client done) are present. I see the "client send/done" events in the OSS log as well, probably it is talking to the MDS or MGS. Events that are incomplete are being ignored but counted in "dumped events" counter. But that could be changed.
>> 
>> 
>> Michael
>> 
>>>> I had thought the current target is to make sure vampire can handle
>>>> the trace from "big" enough clusters.
>>>> 
>>>> Actually, I am not sure whether vampire(with this trace analyse) should 
>>>> work in this way as we do now?  i.e. running job exclusively on the cluster, get rpctrace log, then get graph by vampire?
>>>> 
>>>> Or I miss sth here? Michale, could you please explain a bit? What is 
>>>> your idea how vampire could help the end users? or the target here is 
>>>> just helping the developers and sysadmin to understand the system?
>>> 
>>> My thoughts on this in the past were that it should be possible to use ONLY the client logs to plot syscall latencies (using client VFSTRACE lines) and the RPC latencies (using client RPCTRACE lines) and locking (using DLMTRACE lines).  Some code rework would be needed to allow regular users to run "lctl dk", and it would filter out the lines specific to their own processes.  This would allow regular users to collect and analyze their application without assistance or coordination from the sysadmin.
>>> 
>>> In order to allow regular users to trace such information, the VFSTRACE calls should report the UID of the process doing the system call, which would immediately map to a PID.  The PID can be used to track the majority of the debug entries, but not always those done in another thread (e.g. ptlrpcd).  The RPCTRACE messages also contain the PID, so that would be helpful, but it would mean that there has to be a parser/filter in the kernel to ensure users cannot access trace information that is not their own.  That would be a significant undertaking, I think.
>>> 
>>> In the meantime, for testing purposes and initial usage (with sysadmin assistance) the full debug log can be extracted from the kernel and filtered in userspace as needed.
>>> 
>>>>>> -----Original Message-----
>>>>>> From: di.wang [mailto:di.wang at oracle.com]
>>>>>> Sent: 01 June 2010 12:50 PM
>>>>>> To: Robert Read
>>>>>> Cc: Michael Kluge; Eric Barton; Galen M. Shipman
>>>>>> Subject: Re: [Lustre-devel] Lustre RPC visualization
>>>>>> 
>>>>>> Hello,
>>>>>> 
>>>>>> IMHO, just run IOR with whatever parameters, and get rpctrace
>>>>>> log(probably only enable rpctrace) from 1 OST and some of clients
>>>>>> (probably 2 is enough).
>>>>>> Note: please make sure these 2 clients did communicate the OST during
>>>>>> the IOR.
>>>>>> 
>>>>>> Michael, I do not think you need all the trace logs from the clients. right?
>>>>>> 
>>>>>> Robert
>>>>>> 
>>>>>> If there are available time slots for this test on Hyperion, who can
>>>>>> help to get these logs?
>>>>>> 
>>>>>> Thanks
>>>>>> Wangdi
>>>>>> 
>>>>>> Robert Read wrote:
>>>>>> 
>>>>>>> What should I run then? Do have scripts to capture this?
>>>>>>> 
>>>>>>> robert
>>>>>>> 
>>>>>>> On May 31, 2010, at 2:39 , Michael Kluge wrote:
>>>>>>> 
>>>>>>> 
>>>>>>>> Hi Robert,
>>>>>>>> 
>>>>>>>> 600 is a nice number. Plus the traces from the server an I am happy.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Michael
>>>>>>>> 
>>>>>>>> Am 28.05.2010 um 17:53 schrieb Robert Read:
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On May 28, 2010, at 4:09 , di.wang wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> Hello, Michael
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> One good news: The Feature that Vampir can show something like a heat
>>>>>>>>>>> map (Eric asked about this) comes back with the release at ISC. It is
>>>>>>>>>>> now called "performance radar". It can produce a heat map for a
>>>>>>>>>>> counter
>>>>>>>>>>> and does some other things as well. I could send a picture around, but
>>>>>>>>>>> need at first an bigger trace (more hosts generating traces in
>>>>>>>>>>> parallel).
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> Right now I do not have big clusters available to generate the trace.
>>>>>>>>>> I will see what I can do here.
>>>>>>>>>> 
>>>>>>>>> If ~600 clients is big enough we could generate that on Hyperion.
>>>>>>>>> 
>>>>>>>>> robert
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> Thanks
>>>>>>>>>> WangDi
>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>>>> 
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Lustre-devel mailing list
>>>>>>>>>>> Lustre-devel at lists.lustre.org <mailto:Lustre-devel@lists.lustre.org>
>>>>>>>>>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> --
>>>>>>>> 
>>>>>>>> Michael Kluge, M.Sc.
>>>>>>>> 
>>>>>>>> Technische Universit?t Dresden
>>>>>>>> Center for Information Services and
>>>>>>>> High Performance Computing (ZIH)
>>>>>>>> D-01062 Dresden
>>>>>>>> Germany
>>>>>>>> 
>>>>>>>> Contact:
>>>>>>>> Willersbau, Room WIL A 208
>>>>>>>> Phone:  (+49) 351 463-34217
>>>>>>>> Fax:    (+49) 351 463-37773
>>>>>>>> e-mail: michael.kluge at tu-dresden.de <mailto:michael.kluge@tu-dresden.de>
>>>>>>>> WWW:    http://www.tu-dresden.de/zih
>>>>>>>> 
>>>>>>>> 
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> Lustre-devel mailing list
>>>>> Lustre-devel at lists.lustre.org
>>>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>>>>> 
>>>> 
>>>> _______________________________________________
>>>> Lustre-devel mailing list
>>>> Lustre-devel at lists.lustre.org
>>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>>> 
>>> 
>>> Cheers, Andreas
>>> --
>>> Andreas Dilger
>>> Lustre Technical Lead
>>> Oracle Corporation Canada Inc.
>>> 
>>> _______________________________________________
>>> Lustre-devel mailing list
>>> Lustre-devel at lists.lustre.org
>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>>> 
>> 
>> 
>> -- 
>> 
>> Michael Kluge, M.Sc.
>> 
>> Technische Universit?t Dresden
>> Center for Information Services and
>> High Performance Computing (ZIH)
>> D-01062 Dresden
>> Germany
>> 
>> Contact:
>> Willersbau, Room WIL A 208
>> Phone:  (+49) 351 463-34217
>> Fax:    (+49) 351 463-37773
>> e-mail: michael.kluge at tu-dresden.de
>> WWW:    http://www.tu-dresden.de/zih
>> 
>> _______________________________________________
>> Lustre-devel mailing list
>> Lustre-devel at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-devel
> 
> 
> -- 
> 
> Michael Kluge, M.Sc.
> 
> Technische Universit?t Dresden
> Center for Information Services and
> High Performance Computing (ZIH)
> D-01062 Dresden
> Germany
> 
> Contact:
> Willersbau, Room WIL A 208
> Phone:  (+49) 351 463-34217
> Fax:    (+49) 351 463-37773
> e-mail: michael.kluge at tu-dresden.de
> WWW:    http://www.tu-dresden.de/zih
> 


Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Lustre-devel] Lustre RPC visualization
  2010-06-16 14:50                                               ` Andreas Dilger
@ 2010-06-17 14:02                                                 ` Michael Kluge
       [not found]                                                   ` <4169315E-9A94-4430-8970-92068222EF15@oracle.com>
  0 siblings, 1 reply; 38+ messages in thread
From: Michael Kluge @ 2010-06-17 14:02 UTC (permalink / raw)
  To: lustre-devel

Hmm. Ok. Is there a way to figure out what type of file I have if I have
nothing like the (most probably incomplete) file itself? Whether it is a
client or a server log and which NID the node has had? Right now the
code is executing a not-so-easy mapping algorithm over all files to
figure this out. The way to distinguish between client and server was to
look at those lines which I expected to see only on the server (the
"incoming req at .....").

Michael

Am Mittwoch, den 16.06.2010, 08:50 -0600 schrieb Andreas Dilger: 
> On 2010-06-16, at 02:46, Michael Kluge wrote:
> > thanks for the trace files. Every single file contains log entries like:
> > 
> > line: 00000100:00100000:2.0F:1275433209.964416:0:7924:0:(events.c:285:request_in_callback())               incoming req at ffff8101b32f5400 x1337373369608379 msgsize 296
> > 
> > Are in the tarball 684 server logs? Or am I wrong with the assumption that the lines with an "incoming req@" can only show up on the servers?
> 
> Clients can receive requests from the server for things like lock cancellations, so this line will appear in both client and server logs.  We don't have a test system with 684 servers...
> 
> > Am 01.06.2010 um 21:39 schrieb Michael Kluge:
> > 
> >>> On 2010-06-01, at 06:12, di.wang wrote:
> >>>> Michael, I do not think you need all the trace logs from the clients. right?
> >>> 
> >>> Actually, I think he does want trace logs from all of the clients.
> >> 
> >> Yes, 600 is a good number. Vampir can easily handle this. If possible, I'd like to have all server traces as well to include this information. Right now I am only putting these RPC's in the trace where all 5 events (client send,server recv/start/done,client done) are present. I see the "client send/done" events in the OSS log as well, probably it is talking to the MDS or MGS. Events that are incomplete are being ignored but counted in "dumped events" counter. But that could be changed.
> >> 
> >> 
> >> Michael
> >> 
> >>>> I had thought the current target is to make sure vampire can handle
> >>>> the trace from "big" enough clusters.
> >>>> 
> >>>> Actually, I am not sure whether vampire(with this trace analyse) should 
> >>>> work in this way as we do now?  i.e. running job exclusively on the cluster, get rpctrace log, then get graph by vampire?
> >>>> 
> >>>> Or I miss sth here? Michale, could you please explain a bit? What is 
> >>>> your idea how vampire could help the end users? or the target here is 
> >>>> just helping the developers and sysadmin to understand the system?
> >>> 
> >>> My thoughts on this in the past were that it should be possible to use ONLY the client logs to plot syscall latencies (using client VFSTRACE lines) and the RPC latencies (using client RPCTRACE lines) and locking (using DLMTRACE lines).  Some code rework would be needed to allow regular users to run "lctl dk", and it would filter out the lines specific to their own processes.  This would allow regular users to collect and analyze their application without assistance or coordination from the sysadmin.
> >>> 
> >>> In order to allow regular users to trace such information, the VFSTRACE calls should report the UID of the process doing the system call, which would immediately map to a PID.  The PID can be used to track the majority of the debug entries, but not always those done in another thread (e.g. ptlrpcd).  The RPCTRACE messages also contain the PID, so that would be helpful, but it would mean that there has to be a parser/filter in the kernel to ensure users cannot access trace information that is not their own.  That would be a significant undertaking, I think.
> >>> 
> >>> In the meantime, for testing purposes and initial usage (with sysadmin assistance) the full debug log can be extracted from the kernel and filtered in userspace as needed.
> >>> 
> >>>>>> -----Original Message-----
> >>>>>> From: di.wang [mailto:di.wang at oracle.com]
> >>>>>> Sent: 01 June 2010 12:50 PM
> >>>>>> To: Robert Read
> >>>>>> Cc: Michael Kluge; Eric Barton; Galen M. Shipman
> >>>>>> Subject: Re: [Lustre-devel] Lustre RPC visualization
> >>>>>> 
> >>>>>> Hello,
> >>>>>> 
> >>>>>> IMHO, just run IOR with whatever parameters, and get rpctrace
> >>>>>> log(probably only enable rpctrace) from 1 OST and some of clients
> >>>>>> (probably 2 is enough).
> >>>>>> Note: please make sure these 2 clients did communicate the OST during
> >>>>>> the IOR.
> >>>>>> 
> >>>>>> Michael, I do not think you need all the trace logs from the clients. right?
> >>>>>> 
> >>>>>> Robert
> >>>>>> 
> >>>>>> If there are available time slots for this test on Hyperion, who can
> >>>>>> help to get these logs?
> >>>>>> 
> >>>>>> Thanks
> >>>>>> Wangdi
> >>>>>> 
> >>>>>> Robert Read wrote:
> >>>>>> 
> >>>>>>> What should I run then? Do have scripts to capture this?
> >>>>>>> 
> >>>>>>> robert
> >>>>>>> 
> >>>>>>> On May 31, 2010, at 2:39 , Michael Kluge wrote:
> >>>>>>> 
> >>>>>>> 
> >>>>>>>> Hi Robert,
> >>>>>>>> 
> >>>>>>>> 600 is a nice number. Plus the traces from the server an I am happy.
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> Michael
> >>>>>>>> 
> >>>>>>>> Am 28.05.2010 um 17:53 schrieb Robert Read:
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>>> On May 28, 2010, at 4:09 , di.wang wrote:
> >>>>>>>>> 
> >>>>>>>>> 
> >>>>>>>>>> Hello, Michael
> >>>>>>>>>> 
> >>>>>>>>>> 
> >>>>>>>>>>> One good news: The Feature that Vampir can show something like a heat
> >>>>>>>>>>> map (Eric asked about this) comes back with the release at ISC. It is
> >>>>>>>>>>> now called "performance radar". It can produce a heat map for a
> >>>>>>>>>>> counter
> >>>>>>>>>>> and does some other things as well. I could send a picture around, but
> >>>>>>>>>>> need at first an bigger trace (more hosts generating traces in
> >>>>>>>>>>> parallel).
> >>>>>>>>>>> 
> >>>>>>>>>>> 
> >>>>>>>>>> Right now I do not have big clusters available to generate the trace.
> >>>>>>>>>> I will see what I can do here.
> >>>>>>>>>> 
> >>>>>>>>> If ~600 clients is big enough we could generate that on Hyperion.
> >>>>>>>>> 
> >>>>>>>>> robert
> >>>>>>>>> 
> >>>>>>>>> 
> >>>>>>>>>> Thanks
> >>>>>>>>>> WangDi
> >>>>>>>>>> 
> >>>>>>>>>>> 
> >>>>>>>>>>> 
> >>>>>>>>>>>>> 
> >>>>>>>>>>> ------------------------------------------------------------------------
> >>>>>>>>>>> 
> >>>>>>>>>>> _______________________________________________
> >>>>>>>>>>> Lustre-devel mailing list
> >>>>>>>>>>> Lustre-devel at lists.lustre.org <mailto:Lustre-devel@lists.lustre.org>
> >>>>>>>>>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
> >>>>>>>>>>> 
> >>>>>>>>>>> 
> >>>>>>>>> 
> >>>>>>>> --
> >>>>>>>> 
> >>>>>>>> Michael Kluge, M.Sc.
> >>>>>>>> 
> >>>>>>>> Technische Universit?t Dresden
> >>>>>>>> Center for Information Services and
> >>>>>>>> High Performance Computing (ZIH)
> >>>>>>>> D-01062 Dresden
> >>>>>>>> Germany
> >>>>>>>> 
> >>>>>>>> Contact:
> >>>>>>>> Willersbau, Room WIL A 208
> >>>>>>>> Phone:  (+49) 351 463-34217
> >>>>>>>> Fax:    (+49) 351 463-37773
> >>>>>>>> e-mail: michael.kluge at tu-dresden.de <mailto:michael.kluge@tu-dresden.de>
> >>>>>>>> WWW:    http://www.tu-dresden.de/zih
> >>>>>>>> 
> >>>>>>>> 
> >>>>> 
> >>>>> 
> >>>>> _______________________________________________
> >>>>> Lustre-devel mailing list
> >>>>> Lustre-devel at lists.lustre.org
> >>>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
> >>>>> 
> >>>> 
> >>>> _______________________________________________
> >>>> Lustre-devel mailing list
> >>>> Lustre-devel at lists.lustre.org
> >>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
> >>> 
> >>> 
> >>> Cheers, Andreas
> >>> --
> >>> Andreas Dilger
> >>> Lustre Technical Lead
> >>> Oracle Corporation Canada Inc.
> >>> 
> >>> _______________________________________________
> >>> Lustre-devel mailing list
> >>> Lustre-devel at lists.lustre.org
> >>> http://lists.lustre.org/mailman/listinfo/lustre-devel
> >>> 
> >> 
> >> 
> >> -- 
> >> 
> >> Michael Kluge, M.Sc.
> >> 
> >> Technische Universit?t Dresden
> >> Center for Information Services and
> >> High Performance Computing (ZIH)
> >> D-01062 Dresden
> >> Germany
> >> 
> >> Contact:
> >> Willersbau, Room WIL A 208
> >> Phone:  (+49) 351 463-34217
> >> Fax:    (+49) 351 463-37773
> >> e-mail: michael.kluge at tu-dresden.de
> >> WWW:    http://www.tu-dresden.de/zih
> >> 
> >> _______________________________________________
> >> Lustre-devel mailing list
> >> Lustre-devel at lists.lustre.org
> >> http://lists.lustre.org/mailman/listinfo/lustre-devel
> > 
> > 
> > -- 
> > 
> > Michael Kluge, M.Sc.
> > 
> > Technische Universit?t Dresden
> > Center for Information Services and
> > High Performance Computing (ZIH)
> > D-01062 Dresden
> > Germany
> > 
> > Contact:
> > Willersbau, Room WIL A 208
> > Phone:  (+49) 351 463-34217
> > Fax:    (+49) 351 463-37773
> > e-mail: michael.kluge at tu-dresden.de
> > WWW:    http://www.tu-dresden.de/zih
> > 
> 
> 
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Technical Lead
> Oracle Corporation Canada Inc.
> 
> 

-- 

Michael Kluge, M.Sc.

Technische Universit?t Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room A 208
Phone:  (+49) 351 463-34217
Fax:    (+49) 351 463-37773
e-mail: michael.kluge at tu-dresden.de
WWW:    http://www.tu-dresden.de/zih
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/x-pkcs7-signature
Size: 5997 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20100617/5402077b/attachment.bin>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Lustre-devel] Lustre RPC visualization
       [not found]                                                   ` <4169315E-9A94-4430-8970-92068222EF15@oracle.com>
@ 2010-06-20 20:44                                                     ` Michael Kluge
  2010-06-22 15:12                                                       ` Michael Kluge
  0 siblings, 1 reply; 38+ messages in thread
From: Michael Kluge @ 2010-06-20 20:44 UTC (permalink / raw)
  To: lustre-devel

OK, I found time do do programming again. There are still some issues, 
but I got a first screenshot with the data from Roberts log.

Michael

Am 17.06.2010 16:35, schrieb Robert Read:
> I don't have the nids handy (I'll try to get that later today), but these are the server logs:
>
> ior-634/hyperion-sun-ost6.ior.log
> ior-634/hyperion-sun-ost7.ior.log
> ior-634/hyperion-sun-ost5.ior.log
> ior-634/hyperion-mds7.ior.log
>
> robert
>
> On Jun 17, 2010, at 7:02 , Michael Kluge wrote:
>
>> Hmm. Ok. Is there a way to figure out what type of file I have if I have
>> nothing like the (most probably incomplete) file itself? Whether it is a
>> client or a server log and which NID the node has had? Right now the
>> code is executing a not-so-easy mapping algorithm over all files to
>> figure this out. The way to distinguish between client and server was to
>> look at those lines which I expected to see only on the server (the
>> "incoming req at .....").
>>
>> Michael
>>
>> Am Mittwoch, den 16.06.2010, 08:50 -0600 schrieb Andreas Dilger:
>>> On 2010-06-16, at 02:46, Michael Kluge wrote:
>>>> thanks for the trace files. Every single file contains log entries like:
>>>>
>>>> line: 00000100:00100000:2.0F:1275433209.964416:0:7924:0:(events.c:285:request_in_callback())               incoming req at ffff8101b32f5400 x1337373369608379 msgsize 296
>>>>
>>>> Are in the tarball 684 server logs? Or am I wrong with the assumption that the lines with an "incoming req@" can only show up on the servers?
>>>
>>> Clients can receive requests from the server for things like lock cancellations, so this line will appear in both client and server logs.  We don't have a test system with 684 servers...
>>>
>>>> Am 01.06.2010 um 21:39 schrieb Michael Kluge:
>>>>
>>>>>> On 2010-06-01, at 06:12, di.wang wrote:
>>>>>>> Michael, I do not think you need all the trace logs from the clients. right?
>>>>>>
>>>>>> Actually, I think he does want trace logs from all of the clients.
>>>>>
>>>>> Yes, 600 is a good number. Vampir can easily handle this. If possible, I'd like to have all server traces as well to include this information. Right now I am only putting these RPC's in the trace where all 5 events (client send,server recv/start/done,client done) are present. I see the "client send/done" events in the OSS log as well, probably it is talking to the MDS or MGS. Events that are incomplete are being ignored but counted in "dumped events" counter. But that could be changed.
>>>>>
>>>>>
>>>>> Michael
>>>>>
>>>>>>> I had thought the current target is to make sure vampire can handle
>>>>>>> the trace from "big" enough clusters.
>>>>>>>
>>>>>>> Actually, I am not sure whether vampire(with this trace analyse) should
>>>>>>> work in this way as we do now?  i.e. running job exclusively on the cluster, get rpctrace log, then get graph by vampire?
>>>>>>>
>>>>>>> Or I miss sth here? Michale, could you please explain a bit? What is
>>>>>>> your idea how vampire could help the end users? or the target here is
>>>>>>> just helping the developers and sysadmin to understand the system?
>>>>>>
>>>>>> My thoughts on this in the past were that it should be possible to use ONLY the client logs to plot syscall latencies (using client VFSTRACE lines) and the RPC latencies (using client RPCTRACE lines) and locking (using DLMTRACE lines).  Some code rework would be needed to allow regular users to run "lctl dk", and it would filter out the lines specific to their own processes.  This would allow regular users to collect and analyze their application without assistance or coordination from the sysadmin.
>>>>>>
>>>>>> In order to allow regular users to trace such information, the VFSTRACE calls should report the UID of the process doing the system call, which would immediately map to a PID.  The PID can be used to track the majority of the debug entries, but not always those done in another thread (e.g. ptlrpcd).  The RPCTRACE messages also contain the PID, so that would be helpful, but it would mean that there has to be a parser/filter in the kernel to ensure users cannot access trace information that is not their own.  That would be a significant undertaking, I think.
>>>>>>
>>>>>> In the meantime, for testing purposes and initial usage (with sysadmin assistance) the full debug log can be extracted from the kernel and filtered in userspace as needed.
>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: di.wang [mailto:di.wang at oracle.com]
>>>>>>>>> Sent: 01 June 2010 12:50 PM
>>>>>>>>> To: Robert Read
>>>>>>>>> Cc: Michael Kluge; Eric Barton; Galen M. Shipman
>>>>>>>>> Subject: Re: [Lustre-devel] Lustre RPC visualization
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> IMHO, just run IOR with whatever parameters, and get rpctrace
>>>>>>>>> log(probably only enable rpctrace) from 1 OST and some of clients
>>>>>>>>> (probably 2 is enough).
>>>>>>>>> Note: please make sure these 2 clients did communicate the OST during
>>>>>>>>> the IOR.
>>>>>>>>>
>>>>>>>>> Michael, I do not think you need all the trace logs from the clients. right?
>>>>>>>>>
>>>>>>>>> Robert
>>>>>>>>>
>>>>>>>>> If there are available time slots for this test on Hyperion, who can
>>>>>>>>> help to get these logs?
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Wangdi
>>>>>>>>>
>>>>>>>>> Robert Read wrote:
>>>>>>>>>
>>>>>>>>>> What should I run then? Do have scripts to capture this?
>>>>>>>>>>
>>>>>>>>>> robert
>>>>>>>>>>
>>>>>>>>>> On May 31, 2010, at 2:39 , Michael Kluge wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Hi Robert,
>>>>>>>>>>>
>>>>>>>>>>> 600 is a nice number. Plus the traces from the server an I am happy.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Michael
>>>>>>>>>>>
>>>>>>>>>>> Am 28.05.2010 um 17:53 schrieb Robert Read:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> On May 28, 2010, at 4:09 , di.wang wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Hello, Michael
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> One good news: The Feature that Vampir can show something like a heat
>>>>>>>>>>>>>> map (Eric asked about this) comes back with the release at ISC. It is
>>>>>>>>>>>>>> now called "performance radar". It can produce a heat map for a
>>>>>>>>>>>>>> counter
>>>>>>>>>>>>>> and does some other things as well. I could send a picture around, but
>>>>>>>>>>>>>> need at first an bigger trace (more hosts generating traces in
>>>>>>>>>>>>>> parallel).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>> Right now I do not have big clusters available to generate the trace.
>>>>>>>>>>>>> I will see what I can do here.
>>>>>>>>>>>>>
>>>>>>>>>>>> If ~600 clients is big enough we could generate that on Hyperion.
>>>>>>>>>>>>
>>>>>>>>>>>> robert
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>> WangDi
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> Lustre-devel mailing list
>>>>>>>>>>>>>> Lustre-devel at lists.lustre.org<mailto:Lustre-devel@lists.lustre.org>
>>>>>>>>>>>>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>>
>>>>>>>>>>> Michael Kluge, M.Sc.
>>>>>>>>>>>
>>>>>>>>>>> Technische Universit?t Dresden
>>>>>>>>>>> Center for Information Services and
>>>>>>>>>>> High Performance Computing (ZIH)
>>>>>>>>>>> D-01062 Dresden
>>>>>>>>>>> Germany
>>>>>>>>>>>
>>>>>>>>>>> Contact:
>>>>>>>>>>> Willersbau, Room WIL A 208
>>>>>>>>>>> Phone:  (+49) 351 463-34217
>>>>>>>>>>> Fax:    (+49) 351 463-37773
>>>>>>>>>>> e-mail: michael.kluge at tu-dresden.de<mailto:michael.kluge@tu-dresden.de>
>>>>>>>>>>> WWW:    http://www.tu-dresden.de/zih
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Lustre-devel mailing list
>>>>>>>> Lustre-devel at lists.lustre.org
>>>>>>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Lustre-devel mailing list
>>>>>>> Lustre-devel at lists.lustre.org
>>>>>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>>>>>>
>>>>>>
>>>>>> Cheers, Andreas
>>>>>> --
>>>>>> Andreas Dilger
>>>>>> Lustre Technical Lead
>>>>>> Oracle Corporation Canada Inc.
>>>>>>
>>>>>> _______________________________________________
>>>>>> Lustre-devel mailing list
>>>>>> Lustre-devel at lists.lustre.org
>>>>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Michael Kluge, M.Sc.
>>>>>
>>>>> Technische Universit?t Dresden
>>>>> Center for Information Services and
>>>>> High Performance Computing (ZIH)
>>>>> D-01062 Dresden
>>>>> Germany
>>>>>
>>>>> Contact:
>>>>> Willersbau, Room WIL A 208
>>>>> Phone:  (+49) 351 463-34217
>>>>> Fax:    (+49) 351 463-37773
>>>>> e-mail: michael.kluge at tu-dresden.de
>>>>> WWW:    http://www.tu-dresden.de/zih
>>>>>
>>>>> _______________________________________________
>>>>> Lustre-devel mailing list
>>>>> Lustre-devel at lists.lustre.org
>>>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>>>>
>>>>
>>>> --
>>>>
>>>> Michael Kluge, M.Sc.
>>>>
>>>> Technische Universit?t Dresden
>>>> Center for Information Services and
>>>> High Performance Computing (ZIH)
>>>> D-01062 Dresden
>>>> Germany
>>>>
>>>> Contact:
>>>> Willersbau, Room WIL A 208
>>>> Phone:  (+49) 351 463-34217
>>>> Fax:    (+49) 351 463-37773
>>>> e-mail: michael.kluge at tu-dresden.de
>>>> WWW:    http://www.tu-dresden.de/zih
>>>>
>>>
>>>
>>> Cheers, Andreas
>>> --
>>> Andreas Dilger
>>> Lustre Technical Lead
>>> Oracle Corporation Canada Inc.
>>>
>>>
>>
>> --
>>
>> Michael Kluge, M.Sc.
>>
>> Technische Universit?t Dresden
>> Center for Information Services and
>> High Performance Computing (ZIH)
>> D-01062 Dresden
>> Germany
>>
>> Contact:
>> Willersbau, Room A 208
>> Phone:  (+49) 351 463-34217
>> Fax:    (+49) 351 463-37773
>> e-mail: michael.kluge at tu-dresden.de
>> WWW:    http://www.tu-dresden.de/zih
>> _______________________________________________
>> Lustre-devel mailing list
>> Lustre-devel at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>
>


-- 
Michael Kluge, M.Sc.

Technische Universit?t Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room WIL A 208
Phone:  (+49) 351 463-34217
Fax:    (+49) 351 463-37773
e-mail: michael.kluge at tu-dresden.de
WWW:    http://www.tu-dresden.de/zih
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lustre2.png
Type: image/png
Size: 28695 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20100620/81293fa9/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lustre1.png
Type: image/png
Size: 63875 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20100620/81293fa9/attachment-0001.png>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Lustre-devel] Lustre RPC visualization
  2010-06-20 20:44                                                     ` Michael Kluge
@ 2010-06-22 15:12                                                       ` Michael Kluge
  2010-06-23 10:29                                                         ` Alexey Lyashkov
  0 siblings, 1 reply; 38+ messages in thread
From: Michael Kluge @ 2010-06-22 15:12 UTC (permalink / raw)
  To: lustre-devel

The remaining problems of the counter calculations have been fixed. There is a screenshot attached showing some values. The code is in a gforge server that we operate here in Dresden (gforge.zih.tu-dresden.de). The converter runs on Linux and on MAC OS X and you need Vampir to take a look at the OTF trace file.

Eric, Galen, for the moment I think I am done with the stuff I promised you at LUG this year? Are there any more ideas?


Michael








Am 20.06.2010 um 22:44 schrieb Michael Kluge:

> OK, I found time do do programming again. There are still some issues, but I got a first screenshot with the data from Roberts log.
> 
> Michael
> 
> Am 17.06.2010 16:35, schrieb Robert Read:
>> I don't have the nids handy (I'll try to get that later today), but these are the server logs:
>> 
>> ior-634/hyperion-sun-ost6.ior.log
>> ior-634/hyperion-sun-ost7.ior.log
>> ior-634/hyperion-sun-ost5.ior.log
>> ior-634/hyperion-mds7.ior.log
>> 
>> robert
>> 
>> On Jun 17, 2010, at 7:02 , Michael Kluge wrote:
>> 
>>> Hmm. Ok. Is there a way to figure out what type of file I have if I have
>>> nothing like the (most probably incomplete) file itself? Whether it is a
>>> client or a server log and which NID the node has had? Right now the
>>> code is executing a not-so-easy mapping algorithm over all files to
>>> figure this out. The way to distinguish between client and server was to
>>> look at those lines which I expected to see only on the server (the
>>> "incoming req at .....").
>>> 
>>> Michael
>>> 
>>> Am Mittwoch, den 16.06.2010, 08:50 -0600 schrieb Andreas Dilger:
>>>> On 2010-06-16, at 02:46, Michael Kluge wrote:
>>>>> thanks for the trace files. Every single file contains log entries like:
>>>>> 
>>>>> line: 00000100:00100000:2.0F:1275433209.964416:0:7924:0:(events.c:285:request_in_callback())               incoming req at ffff8101b32f5400 x1337373369608379 msgsize 296
>>>>> 
>>>>> Are in the tarball 684 server logs? Or am I wrong with the assumption that the lines with an "incoming req@" can only show up on the servers?
>>>> 
>>>> Clients can receive requests from the server for things like lock cancellations, so this line will appear in both client and server logs.  We don't have a test system with 684 servers...
>>>> 
>>>>> Am 01.06.2010 um 21:39 schrieb Michael Kluge:
>>>>> 
>>>>>>> On 2010-06-01, at 06:12, di.wang wrote:
>>>>>>>> Michael, I do not think you need all the trace logs from the clients. right?
>>>>>>> 
>>>>>>> Actually, I think he does want trace logs from all of the clients.
>>>>>> 
>>>>>> Yes, 600 is a good number. Vampir can easily handle this. If possible, I'd like to have all server traces as well to include this information. Right now I am only putting these RPC's in the trace where all 5 events (client send,server recv/start/done,client done) are present. I see the "client send/done" events in the OSS log as well, probably it is talking to the MDS or MGS. Events that are incomplete are being ignored but counted in "dumped events" counter. But that could be changed.
>>>>>> 
>>>>>> 
>>>>>> Michael
>>>>>> 
>>>>>>>> I had thought the current target is to make sure vampire can handle
>>>>>>>> the trace from "big" enough clusters.
>>>>>>>> 
>>>>>>>> Actually, I am not sure whether vampire(with this trace analyse) should
>>>>>>>> work in this way as we do now?  i.e. running job exclusively on the cluster, get rpctrace log, then get graph by vampire?
>>>>>>>> 
>>>>>>>> Or I miss sth here? Michale, could you please explain a bit? What is
>>>>>>>> your idea how vampire could help the end users? or the target here is
>>>>>>>> just helping the developers and sysadmin to understand the system?
>>>>>>> 
>>>>>>> My thoughts on this in the past were that it should be possible to use ONLY the client logs to plot syscall latencies (using client VFSTRACE lines) and the RPC latencies (using client RPCTRACE lines) and locking (using DLMTRACE lines).  Some code rework would be needed to allow regular users to run "lctl dk", and it would filter out the lines specific to their own processes.  This would allow regular users to collect and analyze their application without assistance or coordination from the sysadmin.
>>>>>>> 
>>>>>>> In order to allow regular users to trace such information, the VFSTRACE calls should report the UID of the process doing the system call, which would immediately map to a PID.  The PID can be used to track the majority of the debug entries, but not always those done in another thread (e.g. ptlrpcd).  The RPCTRACE messages also contain the PID, so that would be helpful, but it would mean that there has to be a parser/filter in the kernel to ensure users cannot access trace information that is not their own.  That would be a significant undertaking, I think.
>>>>>>> 
>>>>>>> In the meantime, for testing purposes and initial usage (with sysadmin assistance) the full debug log can be extracted from the kernel and filtered in userspace as needed.
>>>>>>> 
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: di.wang [mailto:di.wang at oracle.com]
>>>>>>>>>> Sent: 01 June 2010 12:50 PM
>>>>>>>>>> To: Robert Read
>>>>>>>>>> Cc: Michael Kluge; Eric Barton; Galen M. Shipman
>>>>>>>>>> Subject: Re: [Lustre-devel] Lustre RPC visualization
>>>>>>>>>> 
>>>>>>>>>> Hello,
>>>>>>>>>> 
>>>>>>>>>> IMHO, just run IOR with whatever parameters, and get rpctrace
>>>>>>>>>> log(probably only enable rpctrace) from 1 OST and some of clients
>>>>>>>>>> (probably 2 is enough).
>>>>>>>>>> Note: please make sure these 2 clients did communicate the OST during
>>>>>>>>>> the IOR.
>>>>>>>>>> 
>>>>>>>>>> Michael, I do not think you need all the trace logs from the clients. right?
>>>>>>>>>> 
>>>>>>>>>> Robert
>>>>>>>>>> 
>>>>>>>>>> If there are available time slots for this test on Hyperion, who can
>>>>>>>>>> help to get these logs?
>>>>>>>>>> 
>>>>>>>>>> Thanks
>>>>>>>>>> Wangdi
>>>>>>>>>> 
>>>>>>>>>> Robert Read wrote:
>>>>>>>>>> 
>>>>>>>>>>> What should I run then? Do have scripts to capture this?
>>>>>>>>>>> 
>>>>>>>>>>> robert
>>>>>>>>>>> 
>>>>>>>>>>> On May 31, 2010, at 2:39 , Michael Kluge wrote:
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> Hi Robert,
>>>>>>>>>>>> 
>>>>>>>>>>>> 600 is a nice number. Plus the traces from the server an I am happy.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Michael
>>>>>>>>>>>> 
>>>>>>>>>>>> Am 28.05.2010 um 17:53 schrieb Robert Read:
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> On May 28, 2010, at 4:09 , di.wang wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hello, Michael
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> One good news: The Feature that Vampir can show something like a heat
>>>>>>>>>>>>>>> map (Eric asked about this) comes back with the release at ISC. It is
>>>>>>>>>>>>>>> now called "performance radar". It can produce a heat map for a
>>>>>>>>>>>>>>> counter
>>>>>>>>>>>>>>> and does some other things as well. I could send a picture around, but
>>>>>>>>>>>>>>> need at first an bigger trace (more hosts generating traces in
>>>>>>>>>>>>>>> parallel).
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Right now I do not have big clusters available to generate the trace.
>>>>>>>>>>>>>> I will see what I can do here.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> If ~600 clients is big enough we could generate that on Hyperion.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> robert
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>> WangDi
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>> Lustre-devel mailing list
>>>>>>>>>>>>>>> Lustre-devel at lists.lustre.org<mailto:Lustre-devel@lists.lustre.org>
>>>>>>>>>>>>>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> --
>>>>>>>>>>>> 
>>>>>>>>>>>> Michael Kluge, M.Sc.
>>>>>>>>>>>> 
>>>>>>>>>>>> Technische Universit?t Dresden
>>>>>>>>>>>> Center for Information Services and
>>>>>>>>>>>> High Performance Computing (ZIH)
>>>>>>>>>>>> D-01062 Dresden
>>>>>>>>>>>> Germany
>>>>>>>>>>>> 
>>>>>>>>>>>> Contact:
>>>>>>>>>>>> Willersbau, Room WIL A 208
>>>>>>>>>>>> Phone:  (+49) 351 463-34217
>>>>>>>>>>>> Fax:    (+49) 351 463-37773
>>>>>>>>>>>> e-mail: michael.kluge at tu-dresden.de<mailto:michael.kluge@tu-dresden.de>
>>>>>>>>>>>> WWW:    http://www.tu-dresden.de/zih
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> _______________________________________________
>>>>>>>>> Lustre-devel mailing list
>>>>>>>>> Lustre-devel at lists.lustre.org
>>>>>>>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> _______________________________________________
>>>>>>>> Lustre-devel mailing list
>>>>>>>> Lustre-devel at lists.lustre.org
>>>>>>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>>>>>>> 
>>>>>>> 
>>>>>>> Cheers, Andreas
>>>>>>> --
>>>>>>> Andreas Dilger
>>>>>>> Lustre Technical Lead
>>>>>>> Oracle Corporation Canada Inc.
>>>>>>> 
>>>>>>> _______________________________________________
>>>>>>> Lustre-devel mailing list
>>>>>>> Lustre-devel at lists.lustre.org
>>>>>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> 
>>>>>> Michael Kluge, M.Sc.
>>>>>> 
>>>>>> Technische Universit?t Dresden
>>>>>> Center for Information Services and
>>>>>> High Performance Computing (ZIH)
>>>>>> D-01062 Dresden
>>>>>> Germany
>>>>>> 
>>>>>> Contact:
>>>>>> Willersbau, Room WIL A 208
>>>>>> Phone:  (+49) 351 463-34217
>>>>>> Fax:    (+49) 351 463-37773
>>>>>> e-mail: michael.kluge at tu-dresden.de
>>>>>> WWW:    http://www.tu-dresden.de/zih
>>>>>> 
>>>>>> _______________________________________________
>>>>>> Lustre-devel mailing list
>>>>>> Lustre-devel at lists.lustre.org
>>>>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>>>>> 
>>>>> 
>>>>> --
>>>>> 
>>>>> Michael Kluge, M.Sc.
>>>>> 
>>>>> Technische Universit?t Dresden
>>>>> Center for Information Services and
>>>>> High Performance Computing (ZIH)
>>>>> D-01062 Dresden
>>>>> Germany
>>>>> 
>>>>> Contact:
>>>>> Willersbau, Room WIL A 208
>>>>> Phone:  (+49) 351 463-34217
>>>>> Fax:    (+49) 351 463-37773
>>>>> e-mail: michael.kluge at tu-dresden.de
>>>>> WWW:    http://www.tu-dresden.de/zih
>>>>> 
>>>> 
>>>> 
>>>> Cheers, Andreas
>>>> --
>>>> Andreas Dilger
>>>> Lustre Technical Lead
>>>> Oracle Corporation Canada Inc.
>>>> 
>>>> 
>>> 
>>> --
>>> 
>>> Michael Kluge, M.Sc.
>>> 
>>> Technische Universit?t Dresden
>>> Center for Information Services and
>>> High Performance Computing (ZIH)
>>> D-01062 Dresden
>>> Germany
>>> 
>>> Contact:
>>> Willersbau, Room A 208
>>> Phone:  (+49) 351 463-34217
>>> Fax:    (+49) 351 463-37773
>>> e-mail: michael.kluge at tu-dresden.de
>>> WWW:    http://www.tu-dresden.de/zih
>>> _______________________________________________
>>> Lustre-devel mailing list
>>> Lustre-devel at lists.lustre.org
>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>> 
>> 
> 
> 
> -- 
> Michael Kluge, M.Sc.
> 
> Technische Universit?t Dresden
> Center for Information Services and
> High Performance Computing (ZIH)
> D-01062 Dresden
> Germany
> 
> Contact:
> Willersbau, Room WIL A 208
> Phone:  (+49) 351 463-34217
> Fax:    (+49) 351 463-37773
> e-mail: michael.kluge at tu-dresden.de
> WWW:    http://www.tu-dresden.de/zih
> <lustre2.png><lustre1.png>_______________________________________________
> Lustre-devel mailing list
> Lustre-devel at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-devel


-- 

Michael Kluge, M.Sc.

Technische Universit?t Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room WIL A 208
Phone:  (+49) 351 463-34217
Fax:    (+49) 351 463-37773
e-mail: michael.kluge at tu-dresden.de
WWW:    http://www.tu-dresden.de/zih

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20100622/644c5c4f/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lustre3.png
Type: image/png
Size: 153070 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20100622/644c5c4f/attachment.png>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Lustre-devel] Lustre RPC visualization
  2010-06-22 15:12                                                       ` Michael Kluge
@ 2010-06-23 10:29                                                         ` Alexey Lyashkov
  2010-06-23 11:50                                                           ` Michael Kluge
  0 siblings, 1 reply; 38+ messages in thread
From: Alexey Lyashkov @ 2010-06-23 10:29 UTC (permalink / raw)
  To: lustre-devel

I think better to use client UUID instead of NID as client identification. Because in your's case - you can't separate info from two clients which run on same node.


On Jun 22, 2010, at 19:12, Michael Kluge wrote:

> The remaining problems of the counter calculations have been fixed. There is a screenshot attached showing some values. The code is in a gforge server that we operate here in Dresden (gforge.zih.tu-dresden.de). The converter runs on Linux and on MAC OS X and you need Vampir to take a look at the OTF trace file.
> 
> Eric, Galen, for the moment I think I am done with the stuff I promised you at LUG this year? Are there any more ideas?
> 
> 
> Michael
> 
> 
> 
> <lustre3.png>
> 
> 
> 
> 
> http://lists.lustre.org/mailman/listinfo/lustre-devel



--------------------------------------
Alexey Lyashkov
alexey.lyashkov at clusterstor.com




-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20100623/f04b9bb3/attachment.htm>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Lustre-devel] Lustre RPC visualization
  2010-06-23 10:29                                                         ` Alexey Lyashkov
@ 2010-06-23 11:50                                                           ` Michael Kluge
  2010-06-23 12:09                                                             ` Alexey Lyashkov
  2010-06-23 15:55                                                             ` Andreas Dilger
  0 siblings, 2 replies; 38+ messages in thread
From: Michael Kluge @ 2010-06-23 11:50 UTC (permalink / raw)
  To: lustre-devel

Hi Alexey,

thanks  for the note. How do I do this if I only have a debug log? I see a field "cluuid+ref" in those lines of the log file that we are interested in. cluuid is probably "client uuid". This UUID is definitely associated to the node, that also owns the NID that appears in the same log line? What does "+ref" stands for?


Michael

Am 23.06.2010 um 12:29 schrieb Alexey Lyashkov:

> I think better to use client UUID instead of NID as client identification. Because in your's case - you can't separate info from two clients which run on same node.
> 
> 
> On Jun 22, 2010, at 19:12, Michael Kluge wrote:
> 
>> The remaining problems of the counter calculations have been fixed. There is a screenshot attached showing some values. The code is in a gforge server that we operate here in Dresden (gforge.zih.tu-dresden.de). The converter runs on Linux and on MAC OS X and you need Vampir to take a look at the OTF trace file.
>> 
>> Eric, Galen, for the moment I think I am done with the stuff I promised you at LUG this year? Are there any more ideas?
>> 
>> 
>> Michael
>> 
>> 
>> 
>> <lustre3.png>
>> 
>> 
>> 
>> 
>> http://lists.lustre.org/mailman/listinfo/lustre-devel
> 
> 
> 
> --------------------------------------
> Alexey Lyashkov
> alexey.lyashkov at clusterstor.com
> 
> 
> 
> 


-- 

Michael Kluge, M.Sc.

Technische Universit?t Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room WIL A 208
Phone:  (+49) 351 463-34217
Fax:    (+49) 351 463-37773
e-mail: michael.kluge at tu-dresden.de
WWW:    http://www.tu-dresden.de/zih

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20100623/c50279fa/attachment.htm>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Lustre-devel] Lustre RPC visualization
  2010-06-23 11:50                                                           ` Michael Kluge
@ 2010-06-23 12:09                                                             ` Alexey Lyashkov
  2010-06-23 12:38                                                               ` Michael Kluge
  2010-06-23 15:55                                                             ` Andreas Dilger
  1 sibling, 1 reply; 38+ messages in thread
From: Alexey Lyashkov @ 2010-06-23 12:09 UTC (permalink / raw)
  To: lustre-devel

Hi Mchael,

that easy to get from logs.
>>
00000100:00100000:0:1197446488.301126:0:28197:0:(service.c:721:ptlrpc_server_handle_request()) Handled RPC pname:cluuid+ref:pi
d:xid:nid:opc ll_ost_26:lustre-MDT0000-mdtlov_UUID+5:27309:266:12345-0 at lo:400
>>
so..
ll_ost_26 - thread name, which handle request,
lustre-MDT0000-mdtlov_UUID - client which send request.
+5 - number references to that request (unneed for you).
27309 - pid which handle request (or which send on client)
266 - xid - internal request id.
12345-0 - sender NID
@lo - LND used to deliver request = @lo -> loopback, tcp -> tcp, @o2ib,  - IB hardware, @ptl - cray portals.
400 - command, you can look to enums "*_cmd_t " to decode that command identifier (in that example - 400 OBD_PING)

as other suggestions fix scaling, at you example only few clients has 100 requests in flight, but legend on bottom show up to 510.


On Jun 23, 2010, at 15:50, Michael Kluge wrote:

> Hi Alexey,
> 
> thanks  for the note. How do I do this if I only have a debug log? I see a field "cluuid+ref" in those lines of the log file that we are interested in. cluuid is probably "client uuid". This UUID is definitely associated to the node, that also owns the NID that appears in the same log line? What does "+ref" stands for?
> 
> 
> Michael
> 
> Am 23.06.2010 um 12:29 schrieb Alexey Lyashkov:
> 
>> I think better to use client UUID instead of NID as client identification. Because in your's case - you can't separate info from two clients which run on same node.
>> 
>> 
>> On Jun 22, 2010, at 19:12, Michael Kluge wrote:
>> 
>>> The remaining problems of the counter calculations have been fixed. There is a screenshot attached showing some values. The code is in a gforge server that we operate here in Dresden (gforge.zih.tu-dresden.de). The converter runs on Linux and on MAC OS X and you need Vampir to take a look at the OTF trace file.
>>> 
>>> Eric, Galen, for the moment I think I am done with the stuff I promised you at LUG this year? Are there any more ideas?
>>> 
>>> 
>>> Michael
>>> 
>>> 
>>> 
>>> <lustre3.png>
>>> 
>>> 
>>> 
>>> 
>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>> 
>> 
>> 
>> --------------------------------------
>> Alexey Lyashkov
>> alexey.lyashkov at clusterstor.com
>> 
>> 
>> 
>> 
> 
> 
> -- 
> 
> Michael Kluge, M.Sc.
> 
> Technische Universit?t Dresden
> Center for Information Services and
> High Performance Computing (ZIH)
> D-01062 Dresden
> Germany
> 
> Contact:
> Willersbau, Room WIL A 208
> Phone:  (+49) 351 463-34217
> Fax:    (+49) 351 463-37773
> e-mail: michael.kluge at tu-dresden.de
> WWW:    http://www.tu-dresden.de/zih
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20100623/b973f159/attachment.htm>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Lustre-devel] Lustre RPC visualization
  2010-06-23 12:09                                                             ` Alexey Lyashkov
@ 2010-06-23 12:38                                                               ` Michael Kluge
  0 siblings, 0 replies; 38+ messages in thread
From: Michael Kluge @ 2010-06-23 12:38 UTC (permalink / raw)
  To: lustre-devel

> 00000100:00100000:0:1197446488.301126:0:28197:0:(service.c:721:ptlrpc_server_handle_request()) Handled RPC pname:cluuid+ref:pi
> d:xid:nid:opc ll_ost_26:lustre-MDT0000-mdtlov_UUID+5:27309:266:12345-0 at lo:400
> >>
> so..
> ll_ost_26 - thread name, which handle request,
> lustre-MDT0000-mdtlov_UUID - client which send request.
> +5 - number references to that request (unneed for you).
> 27309 - pid which handle request (or which send on client)
> 266 - xid - internal request id.
> 12345-0 - sender NID
> @lo - LND used to deliver request = @lo -> loopback, tcp -> tcp, @o2ib,  - IB hardware, @ptl - cray portals.
> 400 - command, you can look to enums "*_cmd_t " to decode that command identifier (in that example - 400 OBD_PING)

OK, thanks. I'll need to think about this as it might make sense to have one line per thread and not per UUID? Or is there already a 1:1 mapping?

> as other suggestions fix scaling, at you example only few clients has 100 requests in flight, but legend on bottom show up to 510.

Yes, did that while you were writing the mail ;) Definitely looks much better.



Michael

> 
> 
> On Jun 23, 2010, at 15:50, Michael Kluge wrote:
> 
>> Hi Alexey,
>> 
>> thanks  for the note. How do I do this if I only have a debug log? I see a field "cluuid+ref" in those lines of the log file that we are interested in. cluuid is probably "client uuid". This UUID is definitely associated to the node, that also owns the NID that appears in the same log line? What does "+ref" stands for?
>> 
>> 
>> Michael
>> 
>> Am 23.06.2010 um 12:29 schrieb Alexey Lyashkov:
>> 
>>> I think better to use client UUID instead of NID as client identification. Because in your's case - you can't separate info from two clients which run on same node.
>>> 
>>> 
>>> On Jun 22, 2010, at 19:12, Michael Kluge wrote:
>>> 
>>>> The remaining problems of the counter calculations have been fixed. There is a screenshot attached showing some values. The code is in a gforge server that we operate here in Dresden (gforge.zih.tu-dresden.de). The converter runs on Linux and on MAC OS X and you need Vampir to take a look at the OTF trace file.
>>>> 
>>>> Eric, Galen, for the moment I think I am done with the stuff I promised you at LUG this year? Are there any more ideas?
>>>> 
>>>> 
>>>> Michael
>>>> 
>>>> 
>>>> 
>>>> <lustre3.png>
>>>> 
>>>> 
>>>> 
>>>> 
>>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>>> 
>>> 
>>> 
>>> --------------------------------------
>>> Alexey Lyashkov
>>> alexey.lyashkov at clusterstor.com
>>> 
>>> 
>>> 
>>> 
>> 
>> 
>> -- 
>> 
>> Michael Kluge, M.Sc.
>> 
>> Technische Universit?t Dresden
>> Center for Information Services and
>> High Performance Computing (ZIH)
>> D-01062 Dresden
>> Germany
>> 
>> Contact:
>> Willersbau, Room WIL A 208
>> Phone:  (+49) 351 463-34217
>> Fax:    (+49) 351 463-37773
>> e-mail: michael.kluge at tu-dresden.de
>> WWW:    http://www.tu-dresden.de/zih
>> 
> 


-- 

Michael Kluge, M.Sc.

Technische Universit?t Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room WIL A 208
Phone:  (+49) 351 463-34217
Fax:    (+49) 351 463-37773
e-mail: michael.kluge at tu-dresden.de
WWW:    http://www.tu-dresden.de/zih

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20100623/c0c14636/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: client_rpcs_in_flight.jpg
Type: image/jpg
Size: 326747 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20100623/c0c14636/attachment.jpg>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Lustre-devel] Lustre RPC visualization
  2010-06-23 11:50                                                           ` Michael Kluge
  2010-06-23 12:09                                                             ` Alexey Lyashkov
@ 2010-06-23 15:55                                                             ` Andreas Dilger
  2010-06-24  8:01                                                               ` Michael Kluge
  1 sibling, 1 reply; 38+ messages in thread
From: Andreas Dilger @ 2010-06-23 15:55 UTC (permalink / raw)
  To: lustre-devel

On 2010-06-23, at 05:50, Michael Kluge wrote:
> thanks  for the note. How do I do this if I only have a debug log? I see a field "cluuid+ref" in those lines of the log file that we are interested in. cluuid is probably "client uuid". This UUID is definitely associated to the node, that also owns the NID that appears in the same log line? What does "+ref" stands for?

While Alexey's comment is correct, in that there can be multiple Lustre client mounts on a single node, this is generally only used for testing.  For client connections the UUID is just a random value like "f4726b1e-f7eb-479f-b163-784ea45adf32", so using the NID is much more useful to the viewer in the vast majority of cases.

If you wanted to distinguish the multiple mounts from a single client it would be best to just do this internally by tracking both the NID and the UUID, but only printing the NID on the output.  For the rare case where requests have the same NID but a different UUID you could show this as "NID:2", "NID:3" or similar.  That preserves the distinction between client connections, while not making the output completely useless.

Even better than plain numeric NIDs would be to do IP->hostname conversion for the case of TCP and IB LNDs, if this works.

> Am 23.06.2010 um 12:29 schrieb Alexey Lyashkov:
> 
>> I think better to use client UUID instead of NID as client identification. Because in your's case - you can't separate info from two clients which run on same node.
>> 
>> 
>> On Jun 22, 2010, at 19:12, Michael Kluge wrote:
>> 
>>> The remaining problems of the counter calculations have been fixed. There is a screenshot attached showing some values. The code is in a gforge server that we operate here in Dresden (gforge.zih.tu-dresden.de). The converter runs on Linux and on MAC OS X and you need Vampir to take a look at the OTF trace file.
>>> 
>>> Eric, Galen, for the moment I think I am done with the stuff I promised you at LUG this year? Are there any more ideas?
>>> 
>>> 
>>> Michael
>>> 
>>> 
>>> 
>>> <lustre3.png>
>>> 
>>> 
>>> 
>>> 
>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>> 
>> 
>> 
>> --------------------------------------
>> Alexey Lyashkov
>> alexey.lyashkov at clusterstor.com
>> 
>> 
>> 
>> 
> 
> 
> -- 
> 
> Michael Kluge, M.Sc.
> 
> Technische Universit?t Dresden
> Center for Information Services and
> High Performance Computing (ZIH)
> D-01062 Dresden
> Germany
> 
> Contact:
> Willersbau, Room WIL A 208
> Phone:  (+49) 351 463-34217
> Fax:    (+49) 351 463-37773
> e-mail: michael.kluge at tu-dresden.de
> WWW:    http://www.tu-dresden.de/zih
> 
> _______________________________________________
> Lustre-devel mailing list
> Lustre-devel at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-devel


Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Lustre-devel] Lustre RPC visualization
  2010-06-23 15:55                                                             ` Andreas Dilger
@ 2010-06-24  8:01                                                               ` Michael Kluge
  0 siblings, 0 replies; 38+ messages in thread
From: Michael Kluge @ 2010-06-24  8:01 UTC (permalink / raw)
  To: lustre-devel

> While Alexey's comment is correct, in that there can be multiple Lustre client mounts on a single node, this is generally only used for testing.  For client connections the UUID is just a random value like "f4726b1e-f7eb-479f-b163-784ea45adf32", so using the NID is much more useful to the viewer in the vast majority of cases.

Well, what I can do with OTF and Vampir is to use a process hierarchy. So for each NID I can have a couple of UUID as 'child processes'. Like the way one looks at hybrid MPI & OpenMP programs. Where we have MPI processes and treat OpenMP threads as child processes.

> If you wanted to distinguish the multiple mounts from a single client it would be best to just do this internally by tracking both the NID and the UUID, but only printing the NID on the output.  For the rare case where requests have the same NID but a different UUID you could show this as "NID:2", "NID:3" or similar.  That preserves the distinction between client connections, while not making the output completely useless.

Yes, that fits into the statement above.

> Even better than plain numeric NIDs would be to do IP->hostname conversion for the case of TCP and IB LNDs, if this works.

Well, typically I only have the debug log. Which might be incomplete. I don't think I have something that can do this conversion reliably?


Michael


>> Am 23.06.2010 um 12:29 schrieb Alexey Lyashkov:
>> 
>>> I think better to use client UUID instead of NID as client identification. Because in your's case - you can't separate info from two clients which run on same node.
>>> 
>>> 
>>> On Jun 22, 2010, at 19:12, Michael Kluge wrote:
>>> 
>>>> The remaining problems of the counter calculations have been fixed. There is a screenshot attached showing some values. The code is in a gforge server that we operate here in Dresden (gforge.zih.tu-dresden.de). The converter runs on Linux and on MAC OS X and you need Vampir to take a look at the OTF trace file.
>>>> 
>>>> Eric, Galen, for the moment I think I am done with the stuff I promised you at LUG this year? Are there any more ideas?
>>>> 
>>>> 
>>>> Michael
>>>> 
>>>> 
>>>> 
>>>> <lustre3.png>
>>>> 
>>>> 
>>>> 
>>>> 
>>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>>> 
>>> 
>>> 
>>> --------------------------------------
>>> Alexey Lyashkov
>>> alexey.lyashkov at clusterstor.com
>>> 
>>> 
>>> 
>>> 
>> 
>> 
>> -- 
>> 
>> Michael Kluge, M.Sc.
>> 
>> Technische Universit?t Dresden
>> Center for Information Services and
>> High Performance Computing (ZIH)
>> D-01062 Dresden
>> Germany
>> 
>> Contact:
>> Willersbau, Room WIL A 208
>> Phone:  (+49) 351 463-34217
>> Fax:    (+49) 351 463-37773
>> e-mail: michael.kluge at tu-dresden.de
>> WWW:    http://www.tu-dresden.de/zih
>> 
>> _______________________________________________
>> Lustre-devel mailing list
>> Lustre-devel at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-devel
> 
> 
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Technical Lead
> Oracle Corporation Canada Inc.
> 
> 


-- 

Michael Kluge, M.Sc.

Technische Universit?t Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room WIL A 208
Phone:  (+49) 351 463-34217
Fax:    (+49) 351 463-37773
e-mail: michael.kluge at tu-dresden.de
WWW:    http://www.tu-dresden.de/zih

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20100624/77e1f513/attachment.htm>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Lustre-devel] Lustre RPC visualization
       [not found]                             ` <C671351E-110C-4D2C-B216-4E8BE23A943A@oracle.com>
       [not found]                               ` <1FF3D25F-3369-462E-9651-62D56319612A@tu-dresden.de>
@ 2010-09-22 13:46                               ` Michael Kluge
  2010-09-22 18:28                                 ` Andreas Dilger
  1 sibling, 1 reply; 38+ messages in thread
From: Michael Kluge @ 2010-09-22 13:46 UTC (permalink / raw)
  To: lustre-devel

Hi Robert,

could please sent me the command how to generate such a log? 

lctl debug_daemon start /tmp/$HOSTNAME
sysctl -w lnet.debug=+rpctrace

Still produces some kind of binary format (at least with my 1.8.3) which
looks different from what you sent me.


Regards, Michael

> > Right now I do not have big clusters available to generate the trace.
> > I will see what I can do here.
> 
> If ~600 clients is big enough we could generate that on Hyperion.
> 
> robert
> 
> > 
> > Thanks
> > WangDi
> >> 
> >> 
> >>  
> >>>> 
> >>>>      
> >>>    
> >> 
> >>  ------------------------------------------------------------------------
> >> 
> >> _______________________________________________
> >> Lustre-devel mailing list
> >> Lustre-devel at lists.lustre.org
> >> http://lists.lustre.org/mailman/listinfo/lustre-devel
> >>  
> > 
> 
> 

-- 

Michael Kluge, M.Sc.

Technische Universit?t Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room A 208
Phone:  (+49) 351 463-34217
Fax:    (+49) 351 463-37773
e-mail: michael.kluge at tu-dresden.de
WWW:    http://www.tu-dresden.de/zih
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/x-pkcs7-signature
Size: 5997 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20100922/c7d27756/attachment.bin>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Lustre-devel] Lustre RPC visualization
  2010-09-22 13:46                               ` Michael Kluge
@ 2010-09-22 18:28                                 ` Andreas Dilger
  0 siblings, 0 replies; 38+ messages in thread
From: Andreas Dilger @ 2010-09-22 18:28 UTC (permalink / raw)
  To: lustre-devel

On 2010-09-22, at 06:46, Michael Kluge wrote:
> could please sent me the command how to generate such a log? 
> 
> lctl debug_daemon start /tmp/$HOSTNAME
> sysctl -w lnet.debug=+rpctrace
> 
> Still produces some kind of binary format (at least with my 1.8.3) which
> looks different from what you sent me.

There are a few different ways to generate a Lustre debug log:
- "lctl dk [filename]" (short for "lctl debug_kernel [filename]") will
  dump the current debug buffer to [filename], or stdout if unspecified.
  It will decode the binary debug log into ASCII at this time
- "lctl debug_daemon {filename}" starts the binary kernel debug log to be
  written to {filename} as the debug pages are filled
- in case of LBUG/LASSERT it also dumps the binary debug log to the file
  /tmp/lustre.log.{timestamp} (unless specified otherwise)

In the last two cases, it is necessary to run "lctl debug_file {infile} [outfile]" decodes binary the input file to the ASCII output file.  This should preferably be run on the system where the binary file was generated, though generally the binary format is only changing between systems of different endianness.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2010-09-22 18:28 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <000c01cae6ee$1d4693d0$57d3bb70$@barton@oracle.com>
2010-04-29  1:25 ` [Lustre-devel] (no subject) di.wang
2010-04-29  1:49   ` Andreas Dilger
2010-04-29  2:04     ` di.wang
2010-04-29  4:48   ` [Lustre-devel] Lustre RPC visualization Michael Kluge
     [not found]     ` <4BD9CF75.8030204@oracle.com>
2010-05-03  8:41       ` Michael Kluge
2010-05-03 13:20         ` Andreas Dilger
2010-05-03 18:10           ` Michael Kluge
2010-05-03 18:57             ` Robert Read
2010-05-03 18:58             ` di.wang
2010-05-03 19:32               ` Michael Kluge
2010-05-03 19:52                 ` di.wang
2010-05-03 20:04                   ` Michael Kluge
2010-05-16  9:29                   ` Michael Kluge
2010-05-16 13:12                     ` Eric Barton
2010-05-17  4:52                       ` Michael Kluge
2010-05-17  3:24                     ` Andrew Uselton
2010-05-17  5:53                       ` Michael Kluge
     [not found]                     ` <009101caf4f9$67e1dd50$37a597f0$%barton@oracle.com>
2010-05-17  3:39                       ` Shipman, Galen M.
2010-05-17  5:59                         ` Michael Kluge
2010-05-25 12:03                     ` Michael Kluge
     [not found]                       ` <4BFC7177.9000808@oracle.com>
2010-05-28 14:54                         ` Michael Kluge
     [not found]                           ` <4BFFA456.7030502@oracle.com>
     [not found]                             ` <C671351E-110C-4D2C-B216-4E8BE23A943A@oracle.com>
     [not found]                               ` <1FF3D25F-3369-462E-9651-62D56319612A@tu-dresden.de>
     [not found]                                 ` <D29ED098-3DEB-4AF4-AA68-B52B4E2BF5EA@oracle.com>
     [not found]                                   ` <4C04F3F0.9040708@oracle.com>
     [not found]                                     ` <001601cb01a3$546c93d0$fd45bb70$%barton@oracle.com>
2010-06-01 12:12                                       ` di.wang
2010-06-01 17:03                                         ` Andreas Dilger
2010-06-01 19:39                                           ` Michael Kluge
2010-06-16  8:46                                             ` Michael Kluge
2010-06-16 14:50                                               ` Andreas Dilger
2010-06-17 14:02                                                 ` Michael Kluge
     [not found]                                                   ` <4169315E-9A94-4430-8970-92068222EF15@oracle.com>
2010-06-20 20:44                                                     ` Michael Kluge
2010-06-22 15:12                                                       ` Michael Kluge
2010-06-23 10:29                                                         ` Alexey Lyashkov
2010-06-23 11:50                                                           ` Michael Kluge
2010-06-23 12:09                                                             ` Alexey Lyashkov
2010-06-23 12:38                                                               ` Michael Kluge
2010-06-23 15:55                                                             ` Andreas Dilger
2010-06-24  8:01                                                               ` Michael Kluge
2010-06-01 15:58                                     ` Eric Barton
2010-09-22 13:46                               ` Michael Kluge
2010-09-22 18:28                                 ` Andreas Dilger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.