linux-trace-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC 0/7] NumPy Interface for KernelShark
@ 2019-03-27 16:03 Yordan Karadzhov
  2019-03-27 16:03 ` [RFC 1/7] kernel-shark: kshark_string_config_alloc() must take no arguments Yordan Karadzhov
                   ` (6 more replies)
  0 siblings, 7 replies; 14+ messages in thread
From: Yordan Karadzhov @ 2019-03-27 16:03 UTC (permalink / raw)
  To: rostedt; +Cc: linux-trace-devel, y.karadz

NumPy is an efficient multi-dimensional container of generic data.
It uses strong typing in order to provide fast data processing in
Python. The NumPy interface will allow sophisticated analysis of
tracing data via scripts, but it also opens the door for
exposing the kernel tracing data to the instruments provided by
the scientific toolkit of Python (matplotlib, scikit-learn) or
maybe even PyTorch and TensorFlow in the future.

Disclaimer: I am not very good in Python. Please check as carefully
as possible :-)

Yordan Karadzhov (7):
  kernel-shark: kshark_string_config_alloc() must take no arguments
  kernel-shark: Add new dataloading method to be used by the NumPu
    interface
  kernel-shark: Prepare for building the NumPy interface
  kernel-shark: Add the core components of the NumPy API
  kernel-shark: Add Numpy Interface for processing of tracing data
  kernel-shark: Add automatic building of the NumPy interface
  kernel-shark: Add basic example demonstrating the NumPy interface

 kernel-shark/CMakeLists.txt                 |   3 +
 kernel-shark/README                         |  12 +-
 kernel-shark/bin/sched_wakeup.py            |  96 +++++++
 kernel-shark/build/py/libkshark_wrapper.pyx | 264 ++++++++++++++++++++
 kernel-shark/build/py/np_setup.py           |  87 +++++++
 kernel-shark/build/py/pybuild.sh            |  26 ++
 kernel-shark/src/CMakeLists.txt             |  39 +++
 kernel-shark/src/libkshark-configio.c       |   2 +-
 kernel-shark/src/libkshark-py.c             | 176 +++++++++++++
 kernel-shark/src/libkshark.c                | 128 ++++++++++
 kernel-shark/src/libkshark.h                |   9 +-
 11 files changed, 838 insertions(+), 4 deletions(-)
 create mode 100755 kernel-shark/bin/sched_wakeup.py
 create mode 100644 kernel-shark/build/py/libkshark_wrapper.pyx
 create mode 100644 kernel-shark/build/py/np_setup.py
 create mode 100755 kernel-shark/build/py/pybuild.sh
 create mode 100644 kernel-shark/src/libkshark-py.c

-- 
2.19.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC 1/7] kernel-shark: kshark_string_config_alloc() must take no arguments
  2019-03-27 16:03 [RFC 0/7] NumPy Interface for KernelShark Yordan Karadzhov
@ 2019-03-27 16:03 ` Yordan Karadzhov
  2019-03-27 16:19   ` Steven Rostedt
  2019-03-27 16:03 ` [RFC 2/7] kernel-shark: Add new dataloading method to be used by the NumPu interface Yordan Karadzhov
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Yordan Karadzhov @ 2019-03-27 16:03 UTC (permalink / raw)
  To: rostedt; +Cc: linux-trace-devel, y.karadz

The function is not meant to have arguments, so it must explicitly
state this. A functions taking unspecified list of parameters will
be a problem for the Pythion interface.

Signed-off-by: Yordan Karadzhov <ykaradzhov@vmware.com>
---
 kernel-shark/src/libkshark-configio.c | 2 +-
 kernel-shark/src/libkshark.h          | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel-shark/src/libkshark-configio.c b/kernel-shark/src/libkshark-configio.c
index 2bd5600..9106522 100644
--- a/kernel-shark/src/libkshark-configio.c
+++ b/kernel-shark/src/libkshark-configio.c
@@ -369,7 +369,7 @@ kshark_filter_config_new(enum kshark_config_formats format)
  * @returns kshark_config_doc instance on success, otherwise NULL. Use free()
  *	    to free the object.
  */
-struct kshark_config_doc *kshark_string_config_alloc()
+struct kshark_config_doc *kshark_string_config_alloc(void)
 {
 	return kshark_config_alloc(KS_CONFIG_STRING);
 }
diff --git a/kernel-shark/src/libkshark.h b/kernel-shark/src/libkshark.h
index a1b1f91..c218b61 100644
--- a/kernel-shark/src/libkshark.h
+++ b/kernel-shark/src/libkshark.h
@@ -541,7 +541,7 @@ kshark_record_config_new(enum kshark_config_formats);
 struct kshark_config_doc *
 kshark_filter_config_new(enum kshark_config_formats);
 
-struct kshark_config_doc *kshark_string_config_alloc();
+struct kshark_config_doc *kshark_string_config_alloc(void);
 
 bool kshark_type_check(struct kshark_config_doc *conf, const char *type);
 
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC 2/7] kernel-shark: Add new dataloading method to be used by the NumPu interface
  2019-03-27 16:03 [RFC 0/7] NumPy Interface for KernelShark Yordan Karadzhov
  2019-03-27 16:03 ` [RFC 1/7] kernel-shark: kshark_string_config_alloc() must take no arguments Yordan Karadzhov
@ 2019-03-27 16:03 ` Yordan Karadzhov
  2019-03-27 23:41   ` Slavomir Kaslev
  2019-03-27 16:03 ` [RFC 3/7] kernel-shark: Prepare for building the NumPy interface Yordan Karadzhov
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Yordan Karadzhov @ 2019-03-27 16:03 UTC (permalink / raw)
  To: rostedt; +Cc: linux-trace-devel, y.karadz

The new function loads the content of the trace data file into a
table / matrix, made of columns / arrays of data having various integer
types. Later those arrays will be wrapped as NumPy arrays.

Signed-off-by: Yordan Karadzhov <ykaradzhov@vmware.com>
---
 kernel-shark/src/libkshark.c | 128 +++++++++++++++++++++++++++++++++++
 kernel-shark/src/libkshark.h |   7 ++
 2 files changed, 135 insertions(+)

diff --git a/kernel-shark/src/libkshark.c b/kernel-shark/src/libkshark.c
index a886f80..20b1594 100644
--- a/kernel-shark/src/libkshark.c
+++ b/kernel-shark/src/libkshark.c
@@ -959,6 +959,134 @@ ssize_t kshark_load_data_records(struct kshark_context *kshark_ctx,
 	return -ENOMEM;
 }
 
+static bool data_matrix_alloc(size_t n_rows, uint64_t **offset_array,
+					     uint8_t **cpu_array,
+					     uint64_t **ts_array,
+					     uint16_t **pid_array,
+					     int **event_array)
+{
+	if (offset_array) {
+		*offset_array = calloc(n_rows, sizeof(**offset_array));
+		if (!offset_array)
+			goto free_all;
+	}
+
+	if (cpu_array) {
+		*cpu_array = calloc(n_rows, sizeof(**cpu_array));
+		if (!cpu_array)
+			goto free_all;
+	}
+
+	if (ts_array) {
+		*ts_array = calloc(n_rows, sizeof(**ts_array));
+		if (!ts_array)
+			goto free_all;
+	}
+
+	if (pid_array) {
+		*pid_array = calloc(n_rows, sizeof(**pid_array));
+		if (!pid_array)
+			goto free_all;
+	}
+
+	if (event_array) {
+		*event_array = calloc(n_rows, sizeof(**event_array));
+		if (!event_array)
+			goto free_all;
+	}
+
+	return true;
+
+ free_all:
+	fprintf(stderr, "Failed to allocate memory during data loading.\n");
+
+	if (offset_array)
+		free(offset_array);
+
+	if (cpu_array)
+		free(cpu_array);
+
+	if (ts_array)
+		free(ts_array);
+
+	if (pid_array)
+		free(pid_array);
+
+	if (event_array)
+		free(event_array);
+
+	return false;
+}
+
+/**
+ * @brief Load the content of the trace data file into a table / matrix made
+ *	  of columns / arrays of data. The user is responsible for freeing the
+ *	  elements of the outputted array
+ *
+ * @param kshark_ctx: Input location for the session context pointer.
+ * @param offset_array: Output location for the array of record offsets.
+ * @param cpu_array: Output location for the array of CPU Ids.
+ * @param ts_array: Output location for the array of timestamps.
+ * @param pid_array: Output location for the array of Process Ids.
+ * @param event_array: Output location for the array of Event Ids.
+ *
+ * @returns The size of the outputted arrays in the case of success, or a
+ *	    negative error code on failure.
+ */
+size_t kshark_load_data_matrix(struct kshark_context *kshark_ctx,
+			       uint64_t **offset_array,
+			       uint8_t **cpu_array,
+			       uint64_t **ts_array,
+			       uint16_t **pid_array,
+			       int **event_array)
+{
+	enum rec_type type = REC_RECORD;
+	struct rec_list **rec_list;
+	struct tep_record *rec;
+	size_t count, total = 0;
+	bool status;
+	int n_cpus;
+
+	total = get_records(kshark_ctx, &rec_list, REC_RECORD);
+	if (total < 0)
+		goto fail;
+
+	status = data_matrix_alloc(total, offset_array,
+					  cpu_array,
+					  ts_array,
+					  pid_array,
+					  event_array);
+	if (!status)
+		goto fail;
+
+	n_cpus = tracecmd_cpus(kshark_ctx->handle);
+
+	for (count = 0; count < total; count++) {
+		int next_cpu;
+
+		next_cpu = pick_next_cpu(rec_list, n_cpus, type);
+		if (next_cpu >= 0) {
+			rec = rec_list[next_cpu]->rec;
+
+			(*offset_array)[count] = rec->offset;
+			(*cpu_array)[count] = rec->cpu;
+			(*ts_array)[count] = rec->ts;
+			(*pid_array)[count] = tep_data_pid(kshark_ctx->pevent, rec);
+			(*event_array)[count] = tep_data_type(kshark_ctx->pevent, rec);
+
+			rec_list[next_cpu] = rec_list[next_cpu]->next;
+		}
+	}
+
+	free_rec_list(rec_list, n_cpus, type);
+
+	return total;
+
+ fail:
+	fprintf(stderr, "Failed to allocate memory during data loading.\n");
+	return -ENOMEM;
+}
+
 static const char *kshark_get_latency(struct tep_handle *pe,
 				      struct tep_record *record)
 {
diff --git a/kernel-shark/src/libkshark.h b/kernel-shark/src/libkshark.h
index c218b61..92ade41 100644
--- a/kernel-shark/src/libkshark.h
+++ b/kernel-shark/src/libkshark.h
@@ -149,6 +149,13 @@ ssize_t kshark_load_data_entries(struct kshark_context *kshark_ctx,
 ssize_t kshark_load_data_records(struct kshark_context *kshark_ctx,
 				 struct tep_record ***data_rows);
 
+size_t kshark_load_data_matrix(struct kshark_context *kshark_ctx,
+			       uint64_t **offset_array,
+			       uint8_t **cpu_array,
+			       uint64_t **ts_array,
+			       uint16_t **pid_array,
+			       int **event_array);
+
 ssize_t kshark_get_task_pids(struct kshark_context *kshark_ctx, int **pids);
 
 void kshark_close(struct kshark_context *kshark_ctx);
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC 3/7] kernel-shark: Prepare for building the NumPy interface
  2019-03-27 16:03 [RFC 0/7] NumPy Interface for KernelShark Yordan Karadzhov
  2019-03-27 16:03 ` [RFC 1/7] kernel-shark: kshark_string_config_alloc() must take no arguments Yordan Karadzhov
  2019-03-27 16:03 ` [RFC 2/7] kernel-shark: Add new dataloading method to be used by the NumPu interface Yordan Karadzhov
@ 2019-03-27 16:03 ` Yordan Karadzhov
  2019-03-27 16:03 ` [RFC 4/7] kernel-shark: Add the core components of the NumPy API Yordan Karadzhov
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Yordan Karadzhov @ 2019-03-27 16:03 UTC (permalink / raw)
  To: rostedt; +Cc: linux-trace-devel, y.karadz

This patch prepares the Cmake build infrastructure for the
introduction of a the NumPy interface.

We add building of a static version of the C API library to be used by
the interface. The NumPy interface itself will be added in the following
patches.

Signed-off-by: Yordan Karadzhov <ykaradzhov@vmware.com>
---
 kernel-shark/CMakeLists.txt     |  3 +++
 kernel-shark/README             | 12 ++++++++++--
 kernel-shark/src/CMakeLists.txt | 19 +++++++++++++++++++
 3 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/kernel-shark/CMakeLists.txt b/kernel-shark/CMakeLists.txt
index 20ced14..0c5d7a7 100644
--- a/kernel-shark/CMakeLists.txt
+++ b/kernel-shark/CMakeLists.txt
@@ -34,6 +34,9 @@ if (Qt5Widgets_FOUND)
 
 endif (Qt5Widgets_FOUND)
 
+find_package(PythonLibs)
+include(${KS_DIR}/build/FindNumPy.cmake)
+
 set(LIBRARY_OUTPUT_PATH    "${KS_DIR}/lib")
 set(EXECUTABLE_OUTPUT_PATH "${KS_DIR}/bin")
 
diff --git a/kernel-shark/README b/kernel-shark/README
index efc6748..5a723a4 100644
--- a/kernel-shark/README
+++ b/kernel-shark/README
@@ -12,7 +12,11 @@ KernelShark has the following external dependencies:
     sudo apt-get install freeglut3-dev libxmu-dev libxi-dev -y
     sudo apt-get install qtbase5-dev -y
 
-1.1 I you want to be able to generate Doxygen documentation:
+1.1 I you want to be able to use the NumPu Interface of KernelShark:
+    sudo apt-get install libpython-dev cython -y
+    sudo apt-get install python-numpy python-matplotlib -y
+
+1.2 I you want to be able to generate Doxygen documentation:
     sudo apt-get install graphviz doxygen-gui -y
 
 
@@ -21,7 +25,11 @@ KernelShark has the following external dependencies:
     dnf install freeglut-devel redhat-rpm-config -y
     dnf install qt5-qtbase-devel -y
 
-2.1 I you want to be able to generate Doxygen documentation:
+2.1 I you want to be able to use the NumPu Interface of KernelShark:
+    dnf install python2-devel python-Cython -y
+    dnf install python-numpy python3-matplotlib -y
+
+2.2 I you want to be able to generate Doxygen documentation:
     dnf install graphviz doxygen -y
 
 
diff --git a/kernel-shark/src/CMakeLists.txt b/kernel-shark/src/CMakeLists.txt
index 1e0a794..a3c6743 100644
--- a/kernel-shark/src/CMakeLists.txt
+++ b/kernel-shark/src/CMakeLists.txt
@@ -28,6 +28,25 @@ if (OPENGL_FOUND AND GLUT_FOUND)
 
 endif (OPENGL_FOUND AND GLUT_FOUND)
 
+if (PYTHONLIBS_FOUND AND CYTHON_FOUND AND NUMPY_FOUND)
+
+    message(STATUS "kshark_wrapper")
+
+    add_library(pykshark STATIC libkshark.c
+                                libkshark-model.c
+                                libkshark-plugin.c
+                                libkshark-configio.c
+                                libkshark-collection.c)
+
+    target_compile_options(pykshark PUBLIC "-fPIC")
+
+    target_link_libraries(pykshark ${TRACEEVENT_LIBRARY}
+                                   ${TRACECMD_LIBRARY}
+                                   ${CMAKE_DL_LIBS}
+                                   ${JSONC_LIBRARIES})
+
+endif (PYTHONLIBS_FOUND AND CYTHON_FOUND AND NUMPY_FOUND)
+
 if (Qt5Widgets_FOUND AND Qt5Network_FOUND)
 
     message(STATUS "libkshark-gui")
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC 4/7] kernel-shark: Add the core components of the NumPy API
  2019-03-27 16:03 [RFC 0/7] NumPy Interface for KernelShark Yordan Karadzhov
                   ` (2 preceding siblings ...)
  2019-03-27 16:03 ` [RFC 3/7] kernel-shark: Prepare for building the NumPy interface Yordan Karadzhov
@ 2019-03-27 16:03 ` Yordan Karadzhov
  2019-03-27 22:53   ` Slavomir Kaslev
  2019-03-27 16:03 ` [RFC 5/7] kernel-shark: Add Numpy Interface for processing of tracing data Yordan Karadzhov
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Yordan Karadzhov @ 2019-03-27 16:03 UTC (permalink / raw)
  To: rostedt; +Cc: linux-trace-devel, y.karadz

The NumPy API is meant to operate on top of the C API-s of trace-cmd and
KernelShark and to provide only the minimum of basic functionalities needed
in order to processor the tracing data. The NumPy API itself is made of two
layers. A bottom-one written in C and a top-one which implements the
interface, written in Cython (C-Python). This patch introduces the C layer.

Signed-off-by: Yordan Karadzhov <ykaradzhov@vmware.com>
---
 kernel-shark/src/libkshark-py.c | 176 ++++++++++++++++++++++++++++++++
 1 file changed, 176 insertions(+)
 create mode 100644 kernel-shark/src/libkshark-py.c

diff --git a/kernel-shark/src/libkshark-py.c b/kernel-shark/src/libkshark-py.c
new file mode 100644
index 0000000..4c589f3
--- /dev/null
+++ b/kernel-shark/src/libkshark-py.c
@@ -0,0 +1,176 @@
+// SPDX-License-Identifier: LGPL-2.1
+
+/*
+ * Copyright (C) 2019 VMware Inc, Yordan Karadzhov <y.karadz@gmail.com>
+ */
+
+ /**
+ *  @file    libpykshark-py.c
+ *  @brief   Python API for processing of FTRACE (trace-cmd) data.
+ */
+
+// KernelShark
+#include "libkshark.h"
+#include "libkshark-model.h"
+
+bool kspy_open(const char *fname)
+{
+	struct kshark_context *kshark_ctx = NULL;
+
+	if (!kshark_instance(&kshark_ctx))
+		return false;
+
+	return kshark_open(kshark_ctx, fname);
+}
+
+void kspy_close(void)
+{
+	struct kshark_context *kshark_ctx = NULL;
+
+	if (!kshark_instance(&kshark_ctx))
+		return;
+
+	kshark_close(kshark_ctx);
+	kshark_free(kshark_ctx);
+}
+
+static int compare(const void * a, const void * b)
+{
+  return ( *(int*)a - *(int*)b );
+}
+
+size_t kspy_get_tasks(int **pids, char ***names)
+{
+	struct kshark_context *kshark_ctx = NULL;
+	const char *comm;
+	ssize_t i, n;
+	int ret;
+
+	if (!kshark_instance(&kshark_ctx))
+		return 0;
+
+	n = kshark_get_task_pids(kshark_ctx, pids);
+	if (n == 0)
+		return 0;
+
+	qsort(*pids, n, sizeof(**pids), compare);
+
+	*names = calloc(n, sizeof(char*));
+	if (!(*names))
+		goto fail;
+
+	for (i = 0; i < n; ++i) {
+		comm = tep_data_comm_from_pid(kshark_ctx->pevent, (*pids)[i]);
+		ret = asprintf(&(*names)[i], "%s", comm);
+		if (ret < 1)
+			goto fail;
+	}
+
+	return n;
+
+  fail:
+	free(*pids);
+	free(*names);
+	return 0;
+}
+
+size_t kspy_trace2matrix(uint64_t **offset_array,
+			 uint8_t **cpu_array,
+			 uint64_t **ts_array,
+			 uint16_t **pid_array,
+			 int **event_array)
+{
+	struct kshark_context *kshark_ctx = NULL;
+	size_t total = 0;
+
+	if (!kshark_instance(&kshark_ctx))
+		return false;
+
+	total = kshark_load_data_matrix(kshark_ctx, offset_array,
+					cpu_array,
+					ts_array,
+					pid_array,
+					event_array);
+
+	return total;
+}
+
+int kspy_get_event_id(const char *sys, const char *evt)
+{
+	struct kshark_context *kshark_ctx = NULL;
+	struct tep_event *event;
+
+	if (!kshark_instance(&kshark_ctx))
+		return -1;
+
+	event = tep_find_event_by_name(kshark_ctx->pevent, sys, evt);
+
+	return event->id;
+}
+
+uint64_t kspy_read_event_field(uint64_t offset, const char *sys,
+						const char *evt,
+						const char *field)
+{
+	struct kshark_context *kshark_ctx = NULL;
+	struct tep_format_field *evt_field;
+	struct tep_record *record;
+	struct tep_event *event;
+	unsigned long long val;
+	int ret;
+
+	if (!kshark_instance(&kshark_ctx))
+		return 0;
+
+	event = tep_find_event_by_name(kshark_ctx->pevent, sys, evt);
+	if (!event)
+		return 0;
+
+	evt_field = tep_find_any_field(event, field);
+	if (!evt_field)
+		return 0;
+
+	record = tracecmd_read_at(kshark_ctx->handle, offset, NULL);
+	if (!record)
+		return 0;
+
+	ret = tep_read_number_field(evt_field, record->data, &val);
+	free_record(record);
+
+	if (ret != 0)
+		return 0;
+
+	return val;
+}
+
+void kspy_new_session_file(const char *data_file, const char *session_file)
+{
+	struct kshark_context *kshark_ctx = NULL;
+	struct kshark_trace_histo histo;
+	struct kshark_config_doc *session;
+	struct kshark_config_doc *filters;
+	struct kshark_config_doc *markers;
+	struct kshark_config_doc *model;
+	struct kshark_config_doc *file;
+
+	if (!kshark_instance(&kshark_ctx))
+		return;
+
+	session = kshark_config_new("kshark.config.session",
+				    KS_CONFIG_JSON);
+
+	file = kshark_export_trace_file(data_file, KS_CONFIG_JSON);
+	kshark_config_doc_add(session, "Data", file);
+
+	filters = kshark_export_all_filters(kshark_ctx, KS_CONFIG_JSON);
+	kshark_config_doc_add(session, "Filters", filters);
+
+	ksmodel_init(&histo);
+	model = kshark_export_model(&histo, KS_CONFIG_JSON);
+	kshark_config_doc_add(session, "Model", model);
+
+	markers = kshark_config_new("kshark.config.markers", KS_CONFIG_JSON);
+	kshark_config_doc_add(session, "Markers", markers);
+
+	kshark_save_config_file(session_file, session);
+}
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC 5/7] kernel-shark: Add Numpy Interface for processing of tracing data
  2019-03-27 16:03 [RFC 0/7] NumPy Interface for KernelShark Yordan Karadzhov
                   ` (3 preceding siblings ...)
  2019-03-27 16:03 ` [RFC 4/7] kernel-shark: Add the core components of the NumPy API Yordan Karadzhov
@ 2019-03-27 16:03 ` Yordan Karadzhov
  2019-03-27 16:03 ` [RFC 6/7] kernel-shark: Add automatic building of the NumPy interface Yordan Karadzhov
  2019-03-27 16:03 ` [RFC 7/7] kernel-shark: Add basic example demonstrating " Yordan Karadzhov
  6 siblings, 0 replies; 14+ messages in thread
From: Yordan Karadzhov @ 2019-03-27 16:03 UTC (permalink / raw)
  To: rostedt; +Cc: linux-trace-devel, y.karadz

This patch contains the Cython implementation of the Interface
together with a Python script used to build the corresponding
library (libkshark_wrapper.so)

Signed-off-by: Yordan Karadzhov <ykaradzhov@vmware.com>
---
 kernel-shark/build/py/libkshark_wrapper.pyx | 264 ++++++++++++++++++++
 kernel-shark/build/py/np_setup.py           |  87 +++++++
 2 files changed, 351 insertions(+)
 create mode 100644 kernel-shark/build/py/libkshark_wrapper.pyx
 create mode 100644 kernel-shark/build/py/np_setup.py

diff --git a/kernel-shark/build/py/libkshark_wrapper.pyx b/kernel-shark/build/py/libkshark_wrapper.pyx
new file mode 100644
index 0000000..825850f
--- /dev/null
+++ b/kernel-shark/build/py/libkshark_wrapper.pyx
@@ -0,0 +1,264 @@
+"""
+SPDX-License-Identifier: LGPL-2.1
+
+Copyright (C) 2017 VMware Inc, Yordan Karadzhov <ykaradzhov@vmware.com>
+"""
+
+# Import the Python-level symbols of numpy
+import numpy as np
+
+# Import the C-level symbols of numpy
+cimport numpy as np
+
+import ctypes
+import json
+
+cdef extern from "stdint.h":
+  ctypedef unsigned short uint8_t
+  ctypedef unsigned short uint16_t
+  ctypedef unsigned long long uint64_t
+
+from libcpp cimport bool
+
+cdef extern from "numpy/ndarraytypes.h":
+  int NPY_ARRAY_CARRAY
+
+# Declare the prototype of the C function we are interested in calling
+cdef extern from "../../src/libkshark-py.c":
+  bool kspy_open(const char *fname);
+
+cdef extern from "../../src/libkshark-py.c":
+  bool kspy_close()
+
+cdef extern from "../../src/libkshark-py.c":
+  size_t kspy_trace2matrix(uint64_t **offset_array,
+                           uint8_t **cpu_array,
+                           uint64_t **ts_array,
+                           uint16_t **pid_array,
+                           int **event_array);
+
+cdef extern from "../../src/libkshark-py.c":
+  int kspy_get_event_id(const char *sys, const char *evt)
+
+cdef extern from "../../src/libkshark-py.c":
+  uint64_t kspy_read_event_field(uint64_t offset, const char *sys,
+                                                    const char *evt,
+                                                    const char *field)
+
+cdef extern from "../../src/libkshark-py.c":
+  ssize_t kspy_get_tasks(int **pids, char ***names)
+
+cdef extern from "../../src/libkshark.h":
+  int KS_EVENT_OVERFLOW
+
+cdef extern from "../../src/libkshark-py.c":
+  void kspy_new_session_file(const char *data_file, const char *session_file)
+
+EVENT_OVERFLOW = KS_EVENT_OVERFLOW
+
+from libc.stdlib cimport free
+
+#from libcpp.string cimport string
+
+from cpython cimport PyObject, Py_INCREF
+
+# Numpy must be initialized!!!
+np.import_array()
+
+cdef class KsDataWrapper:
+  cdef int item_size
+  cdef int data_size
+  cdef int data_type
+  cdef void* data_ptr
+
+  cdef init(self, int data_type,
+                  int data_size,
+                  int item_size,
+                  void* data_ptr):
+    """ This initialization cannot be done in the constructor because we use
+        C-level arguments.
+    """
+    self.item_size = item_size
+    self.data_size = data_size
+    self.data_type = data_type
+    self.data_ptr = data_ptr
+
+  def __array__(self):
+    """ Here we use the __array__ method, that is called when numpy
+        tries to get an array from the object.
+    """
+    cdef np.npy_intp shape[1]
+    shape[0] = <np.npy_intp> self.data_size
+
+    ndarray = np.PyArray_New(np.ndarray,
+                             1, shape,
+                             self.data_type,
+                             NULL,
+                             self.data_ptr,
+                             self.item_size,
+                             NPY_ARRAY_CARRAY,
+                             <object> NULL)
+
+    return ndarray
+
+  def __dealloc__(self):
+    """ Frees the data. This is called by Python when all the
+        references to the object are gone.
+    """
+    free(<void*>self.data_ptr)
+
+def open_file(fname):
+  """ Open a tracing data file.
+  """
+  return kspy_open(fname)
+
+def close():
+  """ Open the session file.
+  """
+  kspy_close()
+
+def read_event_field(offset, sys, event, field):
+  """ Read the value of a specific field of the trace event.
+  """
+  cdef uint64_t v
+
+  v = kspy_read_event_field(offset, sys, event, field)
+  return v
+
+def event_id(system, event):
+  """ Get the unique Id of the event
+  """
+  return kspy_get_event_id(system, event)
+
+def c_str_cast(char *c_str):
+  """ String convertion C -> Python
+  """
+  return ctypes.c_char_p(c_str).value
+
+def get_tasks():
+  """ get a dictionary of all task's PIDs
+  """
+  cdef int *pids
+  cdef char **names
+  cdef int size = kspy_get_tasks(&pids, &names)
+
+  task_dict = {}
+
+  for i in range(0, size):
+    task_dict.update({c_str_cast(names[i]) : pids[i]})
+
+  return task_dict
+
+def load_data():
+  """ Python binding of the 'kshark_load_data_matrix' function that does not
+      copy the data.
+  """
+  cdef uint64_t *ofst
+  cdef uint8_t *cpu
+  cdef uint64_t *ts
+  cdef uint16_t *pid
+  cdef int *evt
+
+  cdef np.ndarray ndarray_ofst
+  cdef np.ndarray ndarray_cpu
+  cdef np.ndarray ndarray_ts
+  cdef np.ndarray ndarray_pid
+  cdef np.ndarray ndarray_evt
+
+  # Call the C function
+  cdef int size = kspy_trace2matrix(&ofst, &cpu, &ts, &pid, &evt)
+
+  array_wrapper_ofst = KsDataWrapper()
+  array_wrapper_cpu = KsDataWrapper()
+  array_wrapper_ts = KsDataWrapper()
+  array_wrapper_pid = KsDataWrapper()
+  array_wrapper_evt = KsDataWrapper()
+
+  array_wrapper_ofst.init(data_type=np.NPY_UINT64,
+                          item_size=0,
+                          data_size=size,
+                          data_ptr=<void*> ofst)
+
+  array_wrapper_cpu.init(data_type=np.NPY_UINT8,
+                         data_size=size,
+                         item_size=0,
+                         data_ptr=<void*> cpu)
+
+  array_wrapper_ts.init(data_type=np.NPY_UINT64,
+                        data_size=size,
+                        item_size=0,
+                        data_ptr=<void*> ts)
+
+  array_wrapper_pid.init(data_type=np.NPY_UINT16,
+                         data_size=size,
+                         item_size=0,
+                         data_ptr=<void*> pid)
+
+  array_wrapper_evt.init(data_type=np.NPY_INT,
+                         data_size=size,
+                         item_size=0,
+                         data_ptr=<void*> evt)
+
+  ndarray_ofst = np.array(array_wrapper_ofst, copy=False)
+  ndarray_cpu = np.array(array_wrapper_cpu, copy=False)
+  ndarray_ts = np.array(array_wrapper_ts, copy=False)
+  ndarray_pid = np.array(array_wrapper_pid, copy=False)
+  ndarray_evt = np.array(array_wrapper_evt, copy=False)
+
+  # Assign our object to the 'base' of the ndarray object
+  ndarray_ofst.base = <PyObject*> array_wrapper_ofst
+  ndarray_cpu.base = <PyObject*> array_wrapper_cpu
+  ndarray_ts.base = <PyObject*> array_wrapper_ts
+  ndarray_pid.base = <PyObject*> array_wrapper_pid
+  ndarray_evt.base = <PyObject*> array_wrapper_evt
+
+  # Increment the reference count, as the above assignement was done in
+  # C, and Python does not know that there is this additional reference
+  Py_INCREF(array_wrapper_ofst)
+  Py_INCREF(array_wrapper_cpu)
+  Py_INCREF(array_wrapper_ts)
+  Py_INCREF(array_wrapper_pid)
+  Py_INCREF(array_wrapper_evt)
+
+  return ndarray_ofst, ndarray_cpu, ndarray_evt, ndarray_pid, ndarray_ts
+
+def save_session(session, s):
+  """ Save a KernelShark session description of a JSON file.
+  """
+  s.seek(0)
+  json.dump(session, s, indent=4)
+  s.truncate()
+
+def new_session(fname, sname):
+  """ Generate and save a default KernelShark session description file (JSON).
+  """
+  kspy_new_session_file(fname, sname)
+
+  with open(sname, "r+") as s:
+    session = json.load(s)
+
+    session['Filters']['filter mask'] = 7
+
+    session['CPUPlots'] = []
+
+    session['TaskPlots'] = []
+
+    session['Splitter'] = [1, 1]
+
+    session['MainWindow'] = [1200, 800]
+
+    session['ViewTop'] = 0
+
+    session['ColorScheme'] = 0.75
+
+    session['Model']['bins'] = 1000
+
+    session['Markers']['markA'] = {}
+    session['Markers']['markA']['isSet'] = False
+
+    session['Markers']['markB'] = {}
+    session['Markers']['markB']['isSet'] = False
+
+    session['Markers']['Active'] = 'A'
+
+    save_session(session, s)
diff --git a/kernel-shark/build/py/np_setup.py b/kernel-shark/build/py/np_setup.py
new file mode 100644
index 0000000..602a99c
--- /dev/null
+++ b/kernel-shark/build/py/np_setup.py
@@ -0,0 +1,87 @@
+#!/usr/bin/python
+
+import sys, getopt
+import numpy
+from Cython.Distutils import build_ext
+
+def libs(argv):
+   pykslibdir = ''
+   evlibdir = ''
+   evincdir = ''
+   trlibdir = ''
+   trincdir = ''
+   jsnincdir = ''
+
+   try:
+      opts, args = getopt.getopt(argv,"l:t:i:e:n:j:", \
+				      ["pykslibdir=", \
+                                       "trlibdir=", \
+                                       "trincdir=", \
+                                       "evlibdir=", \
+                                       "evincdir=", \
+                                       "jsnincdir="])
+   except getopt.GetoptError:
+      sys.exit(2)
+   for opt, arg in opts:
+      if opt in ("-l", "--pykslibdir"):
+         pykslibdir = arg
+      elif opt in ("-t", "--trlibdir"):
+         trlibdir = arg
+      elif opt in ("-i", "--trincdir"):
+         trincdir = arg
+      elif opt in ("-e", "--evlibdir"):
+         evlibdir = arg
+      elif opt in ("-n", "--evincdir"):
+         evincdir = arg
+      elif opt in ("-j", "--jsnincdir"):
+         jsnincdir = arg
+
+   cmd1 = 1
+   for i in xrange(len(sys.argv)):
+      if sys.argv[i] == "build_ext":
+         cmd1 = i
+
+   sys.argv = sys.argv[:1] + sys.argv[cmd1:]
+   print "jsnincdir:", jsnincdir
+
+   return pykslibdir, evlibdir, evincdir, trlibdir, trincdir, jsnincdir
+
+def configuration(parent_package='', top_path=None, \
+                  libs=["pykshark", "tracecmd", "traceevent", "json-c"], \
+                  libdirs=['.'], \
+                  incdirs=['.']):
+   """ Function used to build our configuration.
+   """
+   from numpy.distutils.misc_util import Configuration
+
+   # The configuration object that hold information on all the files
+   # to be built.
+   config = Configuration('', parent_package, top_path)
+   config.add_extension("libkshark_wrapper",
+                        sources=["libkshark_wrapper.pyx"],
+                        libraries=libs,
+                        library_dirs=libdirs,
+                        depends=["../../src/libkshark.c"],
+                        include_dirs=incdirs)
+
+   return config
+
+def main(argv):
+   # Retrieve third-party libraries
+   pykslibdir, evlibdir, evincdir, trlibdir, trincdir, jsnincdir = libs(sys.argv[1:])
+
+   # Retrieve the parameters of our local configuration
+   params = configuration(top_path='', \
+                          libdirs=[pykslibdir, trlibdir, evlibdir], \
+                          incdirs=[trincdir, evincdir, jsnincdir]).todict()
+
+   #Override the C-extension building so that it knows about '.pyx'
+   #Cython files
+   params['cmdclass'] = dict(build_ext=build_ext)
+
+   # Call the actual building/packaging function (see distutils docs)
+   from numpy.distutils.core import setup
+   setup(**params)
+
+if __name__ == "__main__":
+   main(sys.argv[1:])
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC 6/7] kernel-shark: Add automatic building of the NumPy interface
  2019-03-27 16:03 [RFC 0/7] NumPy Interface for KernelShark Yordan Karadzhov
                   ` (4 preceding siblings ...)
  2019-03-27 16:03 ` [RFC 5/7] kernel-shark: Add Numpy Interface for processing of tracing data Yordan Karadzhov
@ 2019-03-27 16:03 ` Yordan Karadzhov
  2019-03-27 16:03 ` [RFC 7/7] kernel-shark: Add basic example demonstrating " Yordan Karadzhov
  6 siblings, 0 replies; 14+ messages in thread
From: Yordan Karadzhov @ 2019-03-27 16:03 UTC (permalink / raw)
  To: rostedt; +Cc: linux-trace-devel, y.karadz

CMAKE will call the python script that build the NumPy interface.
It will also takes care about cleaning.

Signed-off-by: Yordan Karadzhov <ykaradzhov@vmware.com>
---
 kernel-shark/build/py/pybuild.sh | 26 ++++++++++++++++++++++++++
 kernel-shark/src/CMakeLists.txt  | 20 ++++++++++++++++++++
 2 files changed, 46 insertions(+)
 create mode 100755 kernel-shark/build/py/pybuild.sh

diff --git a/kernel-shark/build/py/pybuild.sh b/kernel-shark/build/py/pybuild.sh
new file mode 100755
index 0000000..978085c
--- /dev/null
+++ b/kernel-shark/build/py/pybuild.sh
@@ -0,0 +1,26 @@
+#!/bin/bash
+
+python np_setup.py --pykslibdir=$1 \
+                   --trlibdir=$2 \
+                   --trincdir=$3 \
+                   --evlibdir=$4 \
+                   --evincdir=$5 \
+                   --jsnincdir=$6 \
+                   build_ext -i &> pybuild.log
+
+if grep -q 'error:' pybuild.log; then
+   cat pybuild.log
+fi
+
+if grep -q 'Error:' pybuild.log; then
+   cat pybuild.log
+fi
+
+WC=$(grep 'warning' pybuild.log | wc -l)
+if ((WC > 2)); then
+   cat pybuild.log
+fi
+
+if grep -q 'usage:' pybuild.log; then
+   cat pybuild.log
+fi
diff --git a/kernel-shark/src/CMakeLists.txt b/kernel-shark/src/CMakeLists.txt
index a3c6743..886ddad 100644
--- a/kernel-shark/src/CMakeLists.txt
+++ b/kernel-shark/src/CMakeLists.txt
@@ -45,6 +45,26 @@ if (PYTHONLIBS_FOUND AND CYTHON_FOUND AND NUMPY_FOUND)
                                    ${CMAKE_DL_LIBS}
                                    ${JSONC_LIBRARIES})
 
+
+    add_custom_target(kshark_wrapper ALL DEPENDS pykshark libkshark-py.c
+                                     COMMENT "Generating libkshark_wrapper.c")
+
+    add_custom_command(TARGET kshark_wrapper
+                       PRE_BUILD
+                       COMMAND ./pybuild.sh ${KS_DIR}/lib/
+                                            ${TRACECMD_LIBRARY_DIR}   ${TRACECMD_INCLUDE_DIR}
+                                            ${TRACEEVENT_LIBRARY_DIR} ${TRACEEVENT_INCLUDE_DIR}
+                                            ${JSONC_INCLUDE_DIR}
+                       COMMAND cp libkshark_wrapper.so  ${KS_DIR}/bin
+                       WORKING_DIRECTORY ${CMAKE_BINARY_DIR}/py
+                       DEPENDS ${KS_DIR}/build/py/np_setup.py ${KS_DIR}/src/libkshark-py.c
+                       )
+
+    set_property(DIRECTORY APPEND PROPERTY ADDITIONAL_MAKE_CLEAN_FILES
+                                           "${KS_DIR}/bin/libkshark_wrapper.so"
+                                           "${KS_DIR}/build/py/libkshark_wrapper.so"
+                                           "${KS_DIR}/build/py/libkshark_wrapper.c")
+
 endif (PYTHONLIBS_FOUND AND CYTHON_FOUND AND NUMPY_FOUND)
 
 if (Qt5Widgets_FOUND AND Qt5Network_FOUND)
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC 7/7] kernel-shark: Add basic example demonstrating the NumPy interface
  2019-03-27 16:03 [RFC 0/7] NumPy Interface for KernelShark Yordan Karadzhov
                   ` (5 preceding siblings ...)
  2019-03-27 16:03 ` [RFC 6/7] kernel-shark: Add automatic building of the NumPy interface Yordan Karadzhov
@ 2019-03-27 16:03 ` Yordan Karadzhov
  2019-03-27 23:25   ` Slavomir Kaslev
  2019-03-28 12:47   ` Slavomir Kaslev
  6 siblings, 2 replies; 14+ messages in thread
From: Yordan Karadzhov @ 2019-03-27 16:03 UTC (permalink / raw)
  To: rostedt; +Cc: linux-trace-devel, y.karadz

The example script plots the distribution (histogram) of the pulseaudio
wake-up times and finds the biggest latency. The script also generates
a KernelShark session descriptor file (JSON). The session descriptor file
can be used by the KernelSherk GUI to open a session which will directly
visualize the largest wake-up latency.

Signed-off-by: Yordan Karadzhov <ykaradzhov@vmware.com>
---
 kernel-shark/bin/sched_wakeup.py | 96 ++++++++++++++++++++++++++++++++
 1 file changed, 96 insertions(+)
 create mode 100755 kernel-shark/bin/sched_wakeup.py

diff --git a/kernel-shark/bin/sched_wakeup.py b/kernel-shark/bin/sched_wakeup.py
new file mode 100755
index 0000000..79dbedd
--- /dev/null
+++ b/kernel-shark/bin/sched_wakeup.py
@@ -0,0 +1,96 @@
+#!/usr/bin/env python2
+
+import matplotlib.pyplot as plt
+import libkshark_wrapper as ks
+import scipy.stats as st
+import numpy as np
+import array
+import json
+import sys
+
+fname = str(sys.argv[1])
+
+ks.open_file(fname)
+ofst, cpu, evt, pid, ts = ks.load_data()
+tasks = ks.get_tasks()
+task_pid = tasks["pulseaudio"]
+
+ss_eid = ks.event_id("sched", "sched_switch");
+w_eid = ks.event_id("sched", "sched_waking");
+
+i = evt.size - 1
+dt = []
+delta_max = i_ss_max = i_sw_max = 0
+
+while (i >= 0):
+  if (evt[i] == ss_eid):
+    next_pid = ks.read_event_field(offset=ofst[i],
+                                   sys="sched",
+                                   event="sched_switch",
+                                   field="next_pid")
+
+    if (next_pid == task_pid):
+      time_ss = ts[i]
+      index_ss = i
+
+      while(i >= 0):
+        i = i - 1
+        if (evt[i] == w_eid):
+          waking_pid = ks.read_event_field(offset=ofst[i],
+                                           sys="sched",
+                                           event="sched_waking",
+                                           field="pid")
+
+          if (waking_pid == task_pid):
+            delta = (time_ss - ts[i]) / 1000.
+            #print delta
+            dt.append(delta);
+            if (delta > delta_max):
+              print "lat. max: ", delta
+              i_ss_max = index_ss
+              i_sw_max = i
+              delta_max = delta
+
+            break
+
+  i = i - 1
+
+desc = st.describe(np.array(dt))
+print desc
+
+fig, ax = plt.subplots(nrows=1, ncols=1)
+fig.set_figheight(6)
+fig.set_figwidth(7)
+
+rect = fig.patch
+rect.set_facecolor('white')
+
+ax.set_xlabel('latency [$\mu$s]')
+plt.yscale('log')
+ax.hist(dt, bins=(200), histtype=u'step');
+plt.show()
+
+sname = 'sched.json'
+ks.new_session(fname, sname)
+
+with open(sname, "r+") as s:
+  session = json.load(s)
+  session['TaskPlots'] = [task_pid]
+  session['CPUPlots'] = [int(cpu[i_sw_max]), int(cpu[i_ss_max])]
+
+  delta = ts[i_ss_max] - ts[i_sw_max]
+  tmin = int(ts[i_sw_max] - delta)
+  tmax = int(ts[i_ss_max] + delta)
+  session['Model']['range'] = [tmin, tmax]
+
+  session['Markers']['markA']['isSet'] = True
+  session['Markers']['markA']['row'] = int(i_sw_max)
+
+  session['Markers']['markB']['isSet'] = True
+  session['Markers']['markB']['row'] = int(i_ss_max)
+
+  session['ViewTop'] = int(i_sw_max) - 5
+
+  ks.save_session(session, s)
+
+ks.close()
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC 1/7] kernel-shark: kshark_string_config_alloc() must take no arguments
  2019-03-27 16:03 ` [RFC 1/7] kernel-shark: kshark_string_config_alloc() must take no arguments Yordan Karadzhov
@ 2019-03-27 16:19   ` Steven Rostedt
  0 siblings, 0 replies; 14+ messages in thread
From: Steven Rostedt @ 2019-03-27 16:19 UTC (permalink / raw)
  To: Yordan Karadzhov; +Cc: linux-trace-devel, y.karadz

On Wed, 27 Mar 2019 18:03:17 +0200
Yordan Karadzhov <ykaradzhov@vmware.com> wrote:

> The function is not meant to have arguments, so it must explicitly
> state this. A functions taking unspecified list of parameters will
> be a problem for the Pythion interface.

I applied this now, because it is a real fix.

-- Steve


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC 4/7] kernel-shark: Add the core components of the NumPy API
  2019-03-27 16:03 ` [RFC 4/7] kernel-shark: Add the core components of the NumPy API Yordan Karadzhov
@ 2019-03-27 22:53   ` Slavomir Kaslev
  0 siblings, 0 replies; 14+ messages in thread
From: Slavomir Kaslev @ 2019-03-27 22:53 UTC (permalink / raw)
  To: Yordan Karadzhov
  Cc: Steven Rostedt, linux-trace-devel, Yordan Karadzhov (VMware)

On Wed, Mar 27, 2019 at 6:04 PM Yordan Karadzhov <ykaradzhov@vmware.com> wrote:
>
> The NumPy API is meant to operate on top of the C API-s of trace-cmd and
> KernelShark and to provide only the minimum of basic functionalities needed
> in order to processor the tracing data. The NumPy API itself is made of two
> layers. A bottom-one written in C and a top-one which implements the
> interface, written in Cython (C-Python). This patch introduces the C layer.

Why is there a need for two layers? Since the upper layer is Cython,
it can access the libkernelshark C API just fine. I don't understand
the need for this middle layer.

Also there's already a Python binding for traceevent, maybe we can
utilize it for this instead of providing an ad hoc binding for trace
events here too?

On an unrelated note, Python 2 will be hitting its end of life date in
a couple of months[1]. Does it make sense to start with Python 2 in
2019 when we'll need to port it to Python 3 in the very near future? I
think going for Python 3 only might be a better choice.

[1] https://pythonclock.org/

Several non high-level comments follow inline.

>
> Signed-off-by: Yordan Karadzhov <ykaradzhov@vmware.com>
> ---
>  kernel-shark/src/libkshark-py.c | 176 ++++++++++++++++++++++++++++++++
>  1 file changed, 176 insertions(+)
>  create mode 100644 kernel-shark/src/libkshark-py.c
>
> diff --git a/kernel-shark/src/libkshark-py.c b/kernel-shark/src/libkshark-py.c
> new file mode 100644
> index 0000000..4c589f3
> --- /dev/null
> +++ b/kernel-shark/src/libkshark-py.c
> @@ -0,0 +1,176 @@
> +// SPDX-License-Identifier: LGPL-2.1
> +
> +/*
> + * Copyright (C) 2019 VMware Inc, Yordan Karadzhov <y.karadz@gmail.com>
> + */
> +
> + /**
> + *  @file    libpykshark-py.c

                      ^ libkshark-py.c?

> + *  @brief   Python API for processing of FTRACE (trace-cmd) data.
> + */
> +
> +// KernelShark
> +#include "libkshark.h"
> +#include "libkshark-model.h"
> +
> +bool kspy_open(const char *fname)
> +{
> +       struct kshark_context *kshark_ctx = NULL;
> +
> +       if (!kshark_instance(&kshark_ctx))
> +               return false;
> +
> +       return kshark_open(kshark_ctx, fname);
> +}
> +
> +void kspy_close(void)
> +{
> +       struct kshark_context *kshark_ctx = NULL;
> +
> +       if (!kshark_instance(&kshark_ctx))
> +               return;
> +
> +       kshark_close(kshark_ctx);
> +       kshark_free(kshark_ctx);
> +}
> +
> +static int compare(const void * a, const void * b)
> +{
> +  return ( *(int*)a - *(int*)b );
> +}
> +
> +size_t kspy_get_tasks(int **pids, char ***names)
> +{
> +       struct kshark_context *kshark_ctx = NULL;
> +       const char *comm;
> +       ssize_t i, n;
> +       int ret;
> +
> +       if (!kshark_instance(&kshark_ctx))
> +               return 0;
> +
> +       n = kshark_get_task_pids(kshark_ctx, pids);
> +       if (n == 0)
> +               return 0;
> +
> +       qsort(*pids, n, sizeof(**pids), compare);
> +
> +       *names = calloc(n, sizeof(char*));
> +       if (!(*names))
> +               goto fail;
> +
> +       for (i = 0; i < n; ++i) {
> +               comm = tep_data_comm_from_pid(kshark_ctx->pevent, (*pids)[i]);
> +               ret = asprintf(&(*names)[i], "%s", comm);
> +               if (ret < 1)
> +                       goto fail;
> +       }
> +
> +       return n;
> +
> +  fail:
> +       free(*pids);
> +       free(*names);
> +       return 0;
> +}
> +
> +size_t kspy_trace2matrix(uint64_t **offset_array,
> +                        uint8_t **cpu_array,
> +                        uint64_t **ts_array,
> +                        uint16_t **pid_array,
> +                        int **event_array)
> +{
> +       struct kshark_context *kshark_ctx = NULL;
> +       size_t total = 0;
> +
> +       if (!kshark_instance(&kshark_ctx))
> +               return false;
> +
> +       total = kshark_load_data_matrix(kshark_ctx, offset_array,
> +                                       cpu_array,
> +                                       ts_array,
> +                                       pid_array,
> +                                       event_array);
> +
> +       return total;
> +}
> +
> +int kspy_get_event_id(const char *sys, const char *evt)
> +{
> +       struct kshark_context *kshark_ctx = NULL;
> +       struct tep_event *event;
> +
> +       if (!kshark_instance(&kshark_ctx))
> +               return -1;
> +
> +       event = tep_find_event_by_name(kshark_ctx->pevent, sys, evt);
> +
> +       return event->id;
> +}
> +
> +uint64_t kspy_read_event_field(uint64_t offset, const char *sys,
> +                                               const char *evt,
> +                                               const char *field)
> +{
> +       struct kshark_context *kshark_ctx = NULL;
> +       struct tep_format_field *evt_field;
> +       struct tep_record *record;
> +       struct tep_event *event;
> +       unsigned long long val;
> +       int ret;
> +
> +       if (!kshark_instance(&kshark_ctx))
> +               return 0;
> +
> +       event = tep_find_event_by_name(kshark_ctx->pevent, sys, evt);

tep_find_event_by_name() is linear in the number of events and it does
a couple of thousand string comparisons. So calling
kspy_read_event_field() in a loop can get *really* slow. I haven't
measured, but I would guess that a good chunk of the running time of
the script in patch 7 is spent in this function.

We might want to provide an efficient API using event ids instead and
use tep_find_event() (which does binary search; that is 10s of
numerical comparisons) here to prevent Python users shooting
themselves in the foot performance-wise.

> +       if (!event)
> +               return 0;
> +
> +       evt_field = tep_find_any_field(event, field);
> +       if (!evt_field)
> +               return 0;
> +
> +       record = tracecmd_read_at(kshark_ctx->handle, offset, NULL);
> +       if (!record)
> +               return 0;
> +
> +       ret = tep_read_number_field(evt_field, record->data, &val);
> +       free_record(record);
> +
> +       if (ret != 0)
> +               return 0;
> +
> +       return val;
> +}
> +
> +void kspy_new_session_file(const char *data_file, const char *session_file)
> +{
> +       struct kshark_context *kshark_ctx = NULL;
> +       struct kshark_trace_histo histo;
> +       struct kshark_config_doc *session;
> +       struct kshark_config_doc *filters;
> +       struct kshark_config_doc *markers;
> +       struct kshark_config_doc *model;
> +       struct kshark_config_doc *file;
> +
> +       if (!kshark_instance(&kshark_ctx))
> +               return;
> +
> +       session = kshark_config_new("kshark.config.session",
> +                                   KS_CONFIG_JSON);
> +
> +       file = kshark_export_trace_file(data_file, KS_CONFIG_JSON);
> +       kshark_config_doc_add(session, "Data", file);
> +
> +       filters = kshark_export_all_filters(kshark_ctx, KS_CONFIG_JSON);
> +       kshark_config_doc_add(session, "Filters", filters);
> +
> +       ksmodel_init(&histo);
> +       model = kshark_export_model(&histo, KS_CONFIG_JSON);
> +       kshark_config_doc_add(session, "Model", model);
> +
> +       markers = kshark_config_new("kshark.config.markers", KS_CONFIG_JSON);
> +       kshark_config_doc_add(session, "Markers", markers);
> +
> +       kshark_save_config_file(session_file, session);
> +}

Aren't the struct kshark_config_doc variables leaking from this function?

Also, this function doesn't seem related to the Python bindings at
all. If it's useful on it's own, you might want to have it in
libkernelshark.
The Python binding layer should not contain any additional logic
besides defining the binding itself imo.

-- Slavi

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC 7/7] kernel-shark: Add basic example demonstrating the NumPy interface
  2019-03-27 16:03 ` [RFC 7/7] kernel-shark: Add basic example demonstrating " Yordan Karadzhov
@ 2019-03-27 23:25   ` Slavomir Kaslev
  2019-03-28 12:47   ` Slavomir Kaslev
  1 sibling, 0 replies; 14+ messages in thread
From: Slavomir Kaslev @ 2019-03-27 23:25 UTC (permalink / raw)
  To: Yordan Karadzhov
  Cc: Steven Rostedt, linux-trace-devel, Yordan Karadzhov (VMware)

On Wed, Mar 27, 2019 at 6:04 PM Yordan Karadzhov <ykaradzhov@vmware.com> wrote:
>
> The example script plots the distribution (histogram) of the pulseaudio
> wake-up times and finds the biggest latency. The script also generates
> a KernelShark session descriptor file (JSON). The session descriptor file
> can be used by the KernelSherk GUI to open a session which will directly
> visualize the largest wake-up latency.
>
> Signed-off-by: Yordan Karadzhov <ykaradzhov@vmware.com>
> ---
>  kernel-shark/bin/sched_wakeup.py | 96 ++++++++++++++++++++++++++++++++
>  1 file changed, 96 insertions(+)
>  create mode 100755 kernel-shark/bin/sched_wakeup.py
>
> diff --git a/kernel-shark/bin/sched_wakeup.py b/kernel-shark/bin/sched_wakeup.py

<snip>

> +while (i >= 0):
> +  if (evt[i] == ss_eid):

Those while (...)s and if (...)s are very C style Python code :)
Usually parentheses are needed only when the expression inside is too
long and one wants to break it on several lines.

Since we don't have a blessed Python style guide, I would suggest
sticking to pep8[1]. There's also a pep8 checker tool[2] which comes
handy and can point out unidiomatic code.

Cheers,

-- Slavi

[1] https://www.python.org/dev/peps/pep-0008/
[2] `apt install pep8` on Debian systems

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC 2/7] kernel-shark: Add new dataloading method to be used by the NumPu interface
  2019-03-27 16:03 ` [RFC 2/7] kernel-shark: Add new dataloading method to be used by the NumPu interface Yordan Karadzhov
@ 2019-03-27 23:41   ` Slavomir Kaslev
  0 siblings, 0 replies; 14+ messages in thread
From: Slavomir Kaslev @ 2019-03-27 23:41 UTC (permalink / raw)
  To: Yordan Karadzhov
  Cc: Steven Rostedt, linux-trace-devel, Yordan Karadzhov (VMware)

On Wed, Mar 27, 2019 at 6:04 PM Yordan Karadzhov <ykaradzhov@vmware.com> wrote:
<snip>
> +       if (offset_array)
> +               free(offset_array);

This should be
    free(*offset_array);
and ditto for the rest.

Also in the very unlikely event of malloc() failing, if the caller
didn't initialize the arguments before calling, this will be calling
free() with garbage. If we're paranoid for this case, we can either
1) have NULL initialized local variables where we malloc() and on
failure free(), or on success assign to the output arguments
2) have a separate label for each malloc() failure case and jump to
the correct label when malloc() fails

-- Slavi

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC 7/7] kernel-shark: Add basic example demonstrating the NumPy interface
  2019-03-27 16:03 ` [RFC 7/7] kernel-shark: Add basic example demonstrating " Yordan Karadzhov
  2019-03-27 23:25   ` Slavomir Kaslev
@ 2019-03-28 12:47   ` Slavomir Kaslev
  2019-04-05 10:10     ` Yordan Karadzhov (VMware)
  1 sibling, 1 reply; 14+ messages in thread
From: Slavomir Kaslev @ 2019-03-28 12:47 UTC (permalink / raw)
  To: Yordan Karadzhov
  Cc: Steven Rostedt, linux-trace-devel, Yordan Karadzhov (VMware)

On Wed, Mar 27, 2019 at 6:04 PM Yordan Karadzhov <ykaradzhov@vmware.com> wrote:
[...]
> diff --git a/kernel-shark/bin/sched_wakeup.py b/kernel-shark/bin/sched_wakeup.py

[...]

> +ks.open_file(fname)
> +ofst, cpu, evt, pid, ts = ks.load_data()

Returning a tuple of 5 numpy arrays as API will be impossible to
extend in the future without breaking existing code. It's also easy
for users to get the order of the return values wrong. Have you
thought of returning a container object that's holding the numpy
arrays? Pandas DataFrame[1] comes to mind although I don't know if
it's worth to take Pandas as dependency for this.

[1] https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.DataFrame.html

-- Slavi

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC 7/7] kernel-shark: Add basic example demonstrating the NumPy interface
  2019-03-28 12:47   ` Slavomir Kaslev
@ 2019-04-05 10:10     ` Yordan Karadzhov (VMware)
  0 siblings, 0 replies; 14+ messages in thread
From: Yordan Karadzhov (VMware) @ 2019-04-05 10:10 UTC (permalink / raw)
  To: Slavomir Kaslev, Yordan Karadzhov; +Cc: Steven Rostedt, linux-trace-devel

Hi Slavi,

Thanks a lot for the detailed review of this patch-set!
I am about to send version 2 of NumPy interface patches which tries to 
address all the problems that you found.

Please review the new version carefully and be as critical as possible.

Thanks!
Yordan

On 28.03.19 г. 14:47 ч., Slavomir Kaslev wrote:
> On Wed, Mar 27, 2019 at 6:04 PM Yordan Karadzhov <ykaradzhov@vmware.com> wrote:
> [...]
>> diff --git a/kernel-shark/bin/sched_wakeup.py b/kernel-shark/bin/sched_wakeup.py
> 
> [...]
> 
>> +ks.open_file(fname)
>> +ofst, cpu, evt, pid, ts = ks.load_data()
> 
> Returning a tuple of 5 numpy arrays as API will be impossible to
> extend in the future without breaking existing code. It's also easy
> for users to get the order of the return values wrong. Have you
> thought of returning a container object that's holding the numpy
> arrays? Pandas DataFrame[1] comes to mind although I don't know if
> it's worth to take Pandas as dependency for this.
> 
> [1] https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.DataFrame.html
> 
> -- Slavi
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2019-04-05 10:11 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-27 16:03 [RFC 0/7] NumPy Interface for KernelShark Yordan Karadzhov
2019-03-27 16:03 ` [RFC 1/7] kernel-shark: kshark_string_config_alloc() must take no arguments Yordan Karadzhov
2019-03-27 16:19   ` Steven Rostedt
2019-03-27 16:03 ` [RFC 2/7] kernel-shark: Add new dataloading method to be used by the NumPu interface Yordan Karadzhov
2019-03-27 23:41   ` Slavomir Kaslev
2019-03-27 16:03 ` [RFC 3/7] kernel-shark: Prepare for building the NumPy interface Yordan Karadzhov
2019-03-27 16:03 ` [RFC 4/7] kernel-shark: Add the core components of the NumPy API Yordan Karadzhov
2019-03-27 22:53   ` Slavomir Kaslev
2019-03-27 16:03 ` [RFC 5/7] kernel-shark: Add Numpy Interface for processing of tracing data Yordan Karadzhov
2019-03-27 16:03 ` [RFC 6/7] kernel-shark: Add automatic building of the NumPy interface Yordan Karadzhov
2019-03-27 16:03 ` [RFC 7/7] kernel-shark: Add basic example demonstrating " Yordan Karadzhov
2019-03-27 23:25   ` Slavomir Kaslev
2019-03-28 12:47   ` Slavomir Kaslev
2019-04-05 10:10     ` Yordan Karadzhov (VMware)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).