[PATCH] perf script: turn AUTOCOMMIT off for bulk SQL inserts in event_analyzing_sample.py
From: Emmet Caulfield @ 2013-06-07 18:58 UTC
  To: linux-kernel; +Cc: Feng Tang

The example script tools/perf/scripts/python/event_analyzing_sample.py
contains a minor error. This script takes a perf.data file and
populates a SQLite database with it.

There's a long comment on lines 29-34 explaining that populating the
database takes a long time if the .db file is on disk, so the database
is placed on a RAM-backed filesystem instead (/dev/shm/perf.db). The
real problem, however, is line 36:

    con.isolation_level=None

This line turns on AUTOCOMMIT, making every INSERT statement into its
own transaction, and greatly slowing down a bulk insert (25 minutes
vs. a few seconds to insert 15,000 records). This is best solved by
merely omitting this line or changing it to:

    con.isolation_level='DEFERRED'

After making this change, if the database is in memory, it takes
roughly 0.5 seconds to insert 15,000 records and 0.8 seconds if the
database file is on disk, effectively solving the problem.
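
For anyone who wants to see the effect in isolation, here is a minimal
standalone sketch (not part of the patch; the table name, column names,
and row count are made up) that times the same bulk insert against an
on-disk database in both modes:

    import os
    import sqlite3
    import tempfile
    import time

    def timed_bulk_insert(isolation_level, nrows=2000):
        # A throwaway on-disk database: the per-transaction sync to disk
        # is exactly what makes autocommit so slow on a real filesystem.
        path = os.path.join(tempfile.mkdtemp(), "demo.db")
        con = sqlite3.connect(path, isolation_level=isolation_level)
        con.execute("CREATE TABLE samples (id INTEGER, value INTEGER)")
        start = time.time()
        for i in range(nrows):
            con.execute("INSERT INTO samples VALUES (?, ?)", (i, i))
        con.commit()
        con.close()
        return time.time() - start

    # None => autocommit: every INSERT is its own transaction.
    print("autocommit: %.2f s" % timed_bulk_insert(None))
    # 'DEFERRED' (or the module default "") => one transaction overall.
    print("deferred:   %.2f s" % timed_bulk_insert("DEFERRED"))

With isolation_level=None every INSERT is synced to disk as its own
transaction; with 'DEFERRED' (or simply the module default) the whole
loop runs inside a single transaction that is committed once at the end.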

The whole purpose of having AUTOCOMMIT turned on is to ensure that each
individual insert/update/delete is committed to persistent storage, so
moving the .db file to a ramdisk already defeats the purpose of turning
the option on in the first place; turning it *off* with the file on
disk is therefore no worse. Deferring transactions and index updates
for bulk inserts like this is standard practice anyway.
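
In Python's sqlite3 module, that standard practice looks roughly like
the following (again only an illustrative sketch with made-up table,
index, and file names, not code from the perf script):

    import sqlite3

    rows = [(i, i * i) for i in range(15000)]   # stand-in for perf samples

    con = sqlite3.connect("perf-demo.db")       # default isolation level
    con.execute("CREATE TABLE samples (id INTEGER, value INTEGER)")
    # All 15,000 INSERTs run inside one implicit transaction...
    con.executemany("INSERT INTO samples VALUES (?, ?)", rows)
    # ...and the index is built once, after the bulk load, not per row.
    con.execute("CREATE INDEX samples_id_idx ON samples (id)")
    con.commit()
    con.close()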

The following patch deletes the offending line and updates the
associated comment.

Emmet.


--- tools/perf/scripts/python/event_analyzing_sample.py~	2013-06-03 15:38:41.762331865 -0700
+++ tools/perf/scripts/python/event_analyzing_sample.py	2013-06-03 15:43:48.978344602 -0700
@@ -26,14 +26,9 @@
 from perf_trace_context import *
 from EventClass import *

-#
-# If the perf.data has a big number of samples, then the insert operation
-# will be very time consuming (about 10+ minutes for 10000 samples) if the
-# .db database is on disk. Move the .db file to RAM based FS to speedup
-# the handling, which will cut the time down to several seconds.
-#
+# Create/connect to a SQLite3 database:
 con = sqlite3.connect("/dev/shm/perf.db")
-con.isolation_level = None
+

 def trace_begin():
        print "In trace_begin:\n"


Re: [PATCH] perf script: turn AUTOCOMMIT off for bulk SQL inserts in event_analyzing_sample.py
From: Feng Tang @ 2013-06-10 14:30 UTC
  To: Emmet Caulfield; +Cc: linux-kernel

On Fri, Jun 07, 2013 at 11:58:53AM -0700, Emmet Caulfield wrote:
> The example script tools/perf/scripts/python/event_analyzing_sample.py
> contains a minor error. This script takes a perf.data file and
> populates a SQLite database with it.
> 
> [...]
> 
> The following patch deletes the offending line and updates the
> associated comment.

Thanks for root-causing the slowness of the SQLite3 operations.

Acked-by: Feng Tang <feng.tang@intel.com>

