xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: konrad@kernel.org, xen-devel@lists.xenproject.org,
	sasha.levin@oracle.com, andrew.cooper3@citrix.com,
	ross.lagerwall@citrix.com, mpohlack@amazon.de
Cc: Keir Fraser <keir@xen.org>, Jan Beulich <jbeulich@suse.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Subject: [PATCH v9 15/27] xsplice, symbols: Implement fast symbol names -> virtual addresses lookup
Date: Mon, 25 Apr 2016 11:35:02 -0400	[thread overview]
Message-ID: <1461598514-5440-16-git-send-email-konrad.wilk@oracle.com> (raw)
In-Reply-To: <1461598514-5440-1-git-send-email-konrad.wilk@oracle.com>

The current mechanism is geared towards fast virtual address ->
symbol names lookup. This is fine for the normal use cases
(BUG_ON, WARN_ON, etc), but for xSplice - where we need to find
hypervisor symbols - it is slow.

To understand this patch, a description of the existing
method is explained first. For folks familar go to 'NEW CODE:'.

HOW IT WORKS:

The symbol table lookup mechanism uses a simple encoding mechanism
where it extracts the common ascii characters that the symbol's use.

This saves us space. The lookup mechanism is geared towards looking
up symbols based on address. We have one 0..N (where N is
the number of symbols, so 6849 for example) table:

symbols_addresses[0..N]

And an 1-1 (in a loose fashion) of the symbols (encoded) in a
symbols_names stream of size N.

The N is variable (later on that below)

The symbols_names are sorted based on symbols_addresses, which
means that the decoded entries inside symbols_names are not in
ascending or descending order.

There is also the encoding mechanism - the table of 255 entries
called symbols_token_index[]. And the symbols_token_table which
is an stream of ASCIIZ characters, such as (it really
is not a table as the values are variable):

@0   .asciz  "credit"
@6   .asciz  "mask"
..
@300 .asciz  "S"

And the symbols_token_index:
@0        .short  0
@1        .short  7
@2        .short  12
@4        .short  16
...
@84         .short  300

The relationship between them is that the symbols_token_index
gives us the offset to symbols_token_table.

The symbol_names[] array is a stream of encoded values. Each value
follows the same pattern - <len> followed by <encoding values>.
And the another <len> followed by <encoding values>.

Hence to find the right one you need to read <len>, add <len>
(to skip over), read <len>, add <len>, and so on until one
finds the right tuple offset.

The <encoding values> are the indicies into the symbols_token_index.

Meaning if you have:
  0x04, 0x54, 0xda, 0xe2, 0x74
  [4, 84, 218, 226, 116 in human numbering]

The 0x04 tells us that the symbol is four bytes past this one (so next
symbol offset starts at 5). If we lookup symbols_token_index[84] we get 300.
symbols_token[300] gets us the "S". And so on, the string eventually
end up being decode to be 'S_stext'. The first character is the type,
then optionally follwed by the filename (and # right after filename)
and then lastly the symbol, such as:

tvpmu_intel.c#core2_vpmu_do_interrupt

Keep in mind that there are two fixed sized tables:
symbols_addresses[0..symbols_num_syms], and
symbols_markers[0..symbols_num_syms/255].

The symbols_markers is used to speed searching for the right address.
It gives us the offsets within symbol_names that start at the <len><encoded value>.

The way to find a symbol based on the address is:
1) Figure out the 'tuple offset' from symbols_address[0..symbols_num_syms].
   This table is sorted by virtual addresses so finding the value is simple.
2) Get starting offset of symbol_names by retrieving value of
   symbol_markers['tuple offset' / 255].
3). Iterate up to 'tuple_offset & 255' in symbols_markers stream starting
   at 'offset'.
4). Decode the <len><encoded value>

This however does not work very well if we want to search the other
way - we have the symbol name and want to find the address.

NEW CODE:

To make that work we add one fixed size table called symbols_sorted_offsets which
has two elements: offset in symbol stream, offset in the symbol-address.

This whole array is sorted on the original symbol name during build-time
(in case of collision we also take into account the type).

The values are for example:

symbols_sorted_offsets:
    .long 83363, 6302 # [.bss, len=5]
    .long 80459, 6084 # [.data, len=5]
..
[The # added for clarity]

Which makes it incredibly easy to get in the symbols_names and also
symbols_addresses (or symbols_offsets)

Searching for symbols is simplified as we can do a binary search
on symbols_sorted_offsets. Since the symbols are sorted it takes on
average 13 calls to symbols_expand_symbol.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>

v8: New
 - Remove the debug code
 - Return the 'mid' index in symbol_addresses, not the 'low'.
v9:
 - Make it return void*
 - Ditch the old implementation. Use a single fixed-size array
   with two uint32_t values - offset in stream and offset in address.
 - Change printf in symbols.c to %u. Change parameter to --sort-by-name.
 - Squash the two seperate implementation of symbols_lookup_by_name
   in one function using #ifdefs.
 - Fix comment and simplify compare_name_orig code.
---
 xen/arch/x86/Makefile      |  3 +++
 xen/common/Kconfig         | 12 +++++++++++
 xen/common/symbols-dummy.c |  5 +++++
 xen/common/symbols.c       | 28 ++++++++++++++++++++++++++
 xen/include/xen/symbols.h  |  8 ++++++++
 xen/tools/symbols.c        | 50 ++++++++++++++++++++++++++++++++++++++++++++--
 6 files changed, 104 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index fdf4202..900fa59 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -74,6 +74,9 @@ efi-y := $(shell if [ ! -r $(BASEDIR)/include/xen/compile.h -o \
 
 ifdef CONFIG_XSPLICE
 all_symbols = --all-symbols
+ifdef CONFIG_FAST_SYMBOL_LOOKUP
+all_symbols = --all-symbols --sort-by-name
+endif
 else
 all_symbols =
 endif
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index 692ef51..e4f86c2 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -200,4 +200,16 @@ config XSPLICE
 
 	  If unsure, say Y.
 
+config FAST_SYMBOL_LOOKUP
+	bool "Fast symbol lookup (bigger binary)"
+	default y
+	depends on XSPLICE
+	---help---
+	  When searching for symbol addresses we can use the built-in system
+	  that is optimized for searching symbols using addresses as the key.
+	  However using it for the inverse (find address using the symbol name)
+	  it is slow. This extra data and code (~55kB) speeds up the search.
+	  The only user of this is xSplice.
+
+	  If unsure, say Y.
 endmenu
diff --git a/xen/common/symbols-dummy.c b/xen/common/symbols-dummy.c
index 5090c3b..044dfd3 100644
--- a/xen/common/symbols-dummy.c
+++ b/xen/common/symbols-dummy.c
@@ -5,6 +5,7 @@
 
 #include <xen/config.h>
 #include <xen/types.h>
+#include <xen/symbols.h>
 
 #ifdef SYMBOLS_ORIGIN
 const unsigned int symbols_offsets[1];
@@ -14,6 +15,10 @@ const unsigned long symbols_addresses[1];
 const unsigned int symbols_num_syms;
 const u8 symbols_names[1];
 
+#ifdef CONFIG_FAST_SYMBOL_LOOKUP
+const struct symbol_offset symbols_sorted_offsets[1];
+#endif
+
 const u8 symbols_token_table[1];
 const u16 symbols_token_index[1];
 
diff --git a/xen/common/symbols.c b/xen/common/symbols.c
index 8b4d0fd..c6931e9 100644
--- a/xen/common/symbols.c
+++ b/xen/common/symbols.c
@@ -31,6 +31,8 @@ extern const unsigned long symbols_addresses[];
 extern const unsigned int symbols_num_syms;
 extern const u8 symbols_names[];
 
+extern const struct symbol_offset symbols_sorted_offsets[];
+
 extern const u8 symbols_token_table[];
 extern const u16 symbols_token_index[];
 
@@ -211,14 +213,39 @@ int xensyms_read(uint32_t *symnum, char *type,
 void *symbols_lookup_by_name(const char *symname)
 {
     char name[KSYM_NAME_LEN + 1];
+#ifdef CONFIG_FAST_SYMBOL_LOOKUP
+    unsigned long low, high;
+#else
     uint32_t symnum = 0;
     char type;
     unsigned long addr;
     int rc;
+#endif
 
     if ( *symname == '\0' )
         return NULL;
 
+#ifdef CONFIG_FAST_SYMBOL_LOOKUP
+    low = 0;
+    high = symbols_num_syms;
+    while ( low < high )
+    {
+        unsigned long mid = low + ((high - low) / 2);
+        const struct symbol_offset *s;
+        int rc;
+
+        s = &symbols_sorted_offsets[mid];
+        (void)symbols_expand_symbol(s->stream, name);
+        /* Format is: [filename]#<symbol>. symbols_expand_symbol eats type.*/
+        rc = strcmp(symname, name);
+        if ( rc < 0 )
+            high = mid;
+        else if ( rc > 0 )
+            low = mid + 1;
+        else
+            return (void *)symbols_address(s->addr);
+    }
+#else
     do {
         rc = xensyms_read(&symnum, &type, &addr, name);
         if ( rc )
@@ -229,6 +256,7 @@ void *symbols_lookup_by_name(const char *symname)
 
     } while ( name[0] != '\0' );
 
+#endif
     return NULL;
 }
 
diff --git a/xen/include/xen/symbols.h b/xen/include/xen/symbols.h
index 2122a5d..0f5876a 100644
--- a/xen/include/xen/symbols.h
+++ b/xen/include/xen/symbols.h
@@ -25,4 +25,12 @@ int xensyms_read(uint32_t *symnum, char *type,
 
 void *symbols_lookup_by_name(const char *symname);
 
+/*
+ * A sorted (by symbols) lookup table table to symbols_names (stream)
+ * and symbols_address (or offset).
+ */
+struct symbol_offset {
+    uint32_t stream; /* .. in the compressed stream.*/
+    uint32_t addr;   /* .. and in the fixed size address array. */
+};
 #endif /*_XEN_SYMBOLS_H*/
diff --git a/xen/tools/symbols.c b/xen/tools/symbols.c
index 196db74..941fbe7 100644
--- a/xen/tools/symbols.c
+++ b/xen/tools/symbols.c
@@ -40,6 +40,10 @@ struct sym_entry {
 	unsigned long long addr;
 	unsigned int len;
 	unsigned char *sym;
+	char *orig_symbol;
+	unsigned int addr_idx;
+	unsigned int stream_offset;
+	unsigned char type;
 };
 #define SYMBOL_NAME(s) ((char *)(s)->sym + 1)
 
@@ -47,8 +51,10 @@ static struct sym_entry *table;
 static unsigned int table_size, table_cnt;
 static unsigned long long _stext, _etext, _sinittext, _einittext, _sextratext, _eextratext;
 static int all_symbols = 0;
+static int sort_by_name = 0;
 static char symbol_prefix_char = '\0';
 static enum { fmt_bsd, fmt_sysv } input_format;
+static int compare_name(const void *p1, const void *p2);
 
 int token_profit[0x10000];
 
@@ -175,8 +181,11 @@ static int read_symbol(FILE *in, struct sym_entry *s)
 		*sym++ = '#';
 	}
 	strcpy(sym, str);
+	if (sort_by_name) {
+		s->orig_symbol = strdup(SYMBOL_NAME(s));
+		s->type = stype; /* As s->sym[0] ends mangled. */
+	}
 	s->sym[0] = stype;
-
 	rc = 0;
 
  skip_tail:
@@ -276,6 +285,21 @@ static int expand_symbol(unsigned char *data, int len, char *result)
 	return total;
 }
 
+/* Sort by original (non mangled) symbol name, then type. */
+static int compare_name_orig(const void *p1, const void *p2)
+{
+	const struct sym_entry *sym1 = p1;
+	const struct sym_entry *sym2 = p2;
+	int rc;
+
+	rc = strcmp(sym1->orig_symbol, sym2->orig_symbol);
+
+	if (!rc)
+		rc = sym1->type - sym2->type;
+
+	return rc;
+}
+
 static void write_src(void)
 {
 	unsigned int i, k, off;
@@ -325,6 +349,7 @@ static void write_src(void)
 			printf(", 0x%02x", table[i].sym[k]);
 		printf("\n");
 
+		table[i].stream_offset = off;
 		off += table[i].len + 1;
 	}
 	printf("\n");
@@ -334,7 +359,6 @@ static void write_src(void)
 		printf("\t.long\t%d\n", markers[i]);
 	printf("\n");
 
-	free(markers);
 
 	output_label("symbols_token_table");
 	off = 0;
@@ -350,6 +374,25 @@ static void write_src(void)
 	for (i = 0; i < 256; i++)
 		printf("\t.short\t%d\n", best_idx[i]);
 	printf("\n");
+
+	if (!sort_by_name) {
+		free(markers);
+		return;
+	}
+
+	/* Sorted by original symbol names and type. */
+	qsort(table, table_cnt, sizeof(*table), compare_name_orig);
+
+	output_label("symbols_sorted_offsets");
+	/* A fixed sized array with two entries: offset in the
+	 * compressed stream (for symbol name), and offset in
+	 * symbols_addresses (or symbols_offset). */
+	for (i = 0; i < table_cnt; i++) {
+		printf("\t.long %u, %u\n", table[i].stream_offset, table[i].addr_idx);
+	}
+	printf("\n");
+
+	free(markers);
 }
 
 
@@ -410,6 +453,7 @@ static void compress_symbols(unsigned char *str, int idx)
 		len = table[i].len;
 		p1 = table[i].sym;
 
+		table[i].addr_idx = i;
 		/* find the token on the symbol */
 		p2 = memmem_pvt(p1, len, str, 2);
 		if (!p2) continue;
@@ -561,6 +605,8 @@ int main(int argc, char **argv)
 				input_format = fmt_sysv;
 			else if (strcmp(argv[i], "--sort") == 0)
 				unsorted = true;
+			else if (strcmp(argv[i], "--sort-by-name") == 0)
+				sort_by_name = 1;
 			else if (strcmp(argv[i], "--warn-dup") == 0)
 				warn_dup = true;
 			else
-- 
2.5.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

  parent reply	other threads:[~2016-04-25 15:36 UTC|newest]

Thread overview: 90+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-25 15:34 [PATCH 9] xSplice v1 design and implementation Konrad Rzeszutek Wilk
2016-04-25 15:34 ` [PATCH v9 01/27] Revert "libxc/libxl/python/xenstat/ocaml: Use new XEN_VERSION hypercall" Konrad Rzeszutek Wilk
2016-04-25 15:48   ` Jan Beulich
2016-04-25 15:53     ` Wei Liu
2016-04-25 15:34 ` [PATCH v9 02/27] Revert "HYPERCALL_version_op. New hypercall mirroring XENVER_ but sane." Konrad Rzeszutek Wilk
2016-04-25 15:34 ` [PATCH v9 03/27] xsplice: Design document Konrad Rzeszutek Wilk
2016-04-25 15:34 ` [PATCH v9 04/27] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op Konrad Rzeszutek Wilk
2016-04-26  7:48   ` Ross Lagerwall
2016-04-26  7:52   ` Ross Lagerwall
2016-04-26 10:21   ` Jan Beulich
2016-04-26 17:50     ` Konrad Rzeszutek Wilk
2016-04-27  6:51       ` Jan Beulich
2016-04-27 13:47         ` Konrad Rzeszutek Wilk
2016-04-27 14:11           ` Jan Beulich
2016-04-25 15:34 ` [PATCH v9 05/27] libxc: Implementation of XEN_XSPLICE_op in libxc Konrad Rzeszutek Wilk
2016-04-26  7:51   ` Ross Lagerwall
2016-04-25 15:34 ` [PATCH v9 06/27] xen-xsplice: Tool to manipulate xsplice payloads Konrad Rzeszutek Wilk
2016-04-26  7:49   ` Ross Lagerwall
2016-04-25 15:34 ` [PATCH v9 07/27] arm/x86: Use struct virtual_region to do bug, symbol, and (x86) exception tables lookup Konrad Rzeszutek Wilk
2016-04-26 10:31   ` Jan Beulich
2016-04-25 15:34 ` [PATCH v9 08/27] arm/x86/vmap: Add v[z|m]alloc_xen and vm_init_type Konrad Rzeszutek Wilk
2016-04-26 10:47   ` Jan Beulich
2016-04-27  2:38     ` Konrad Rzeszutek Wilk
2016-04-27  7:12       ` Jan Beulich
2016-04-27 13:46         ` Konrad Rzeszutek Wilk
2016-04-27 14:15           ` Jan Beulich
2016-04-25 15:34 ` [PATCH v9 09/27] x86/mm: Introduce modify_xen_mappings() Konrad Rzeszutek Wilk
2016-04-25 15:34 ` [PATCH v9 10/27] xsplice: Add helper elf routines Konrad Rzeszutek Wilk
2016-04-26 10:05   ` Ross Lagerwall
2016-04-26 11:52     ` Jan Beulich
2016-04-26 12:37   ` Jan Beulich
2016-04-27  1:59     ` Konrad Rzeszutek Wilk
2016-04-27  7:27       ` Jan Beulich
2016-04-27 14:00         ` Konrad Rzeszutek Wilk
2016-04-27  4:06     ` Konrad Rzeszutek Wilk
2016-04-27  7:52       ` Jan Beulich
2016-04-27 18:45         ` Konrad Rzeszutek Wilk
2016-04-25 15:34 ` [PATCH v9 11/27] xsplice: Implement payload loading Konrad Rzeszutek Wilk
2016-04-26 10:48   ` Ross Lagerwall
2016-04-26 13:39   ` Jan Beulich
2016-04-27  1:47     ` Konrad Rzeszutek Wilk
2016-04-27  7:57       ` Jan Beulich
2016-04-27  3:28     ` Konrad Rzeszutek Wilk
2016-04-27  8:28       ` Jan Beulich
2016-04-27 15:48         ` Konrad Rzeszutek Wilk
2016-04-27 16:06           ` Jan Beulich
2016-04-27 16:14           ` Jan Beulich
2016-04-27 18:40             ` Konrad Rzeszutek Wilk
2016-04-25 15:34 ` [PATCH v9 12/27] xsplice: Implement support for applying/reverting/replacing patches Konrad Rzeszutek Wilk
2016-04-26 15:21   ` Jan Beulich
2016-04-27  3:39     ` Konrad Rzeszutek Wilk
2016-04-27  8:36       ` Jan Beulich
2016-05-11  9:51       ` Martin Pohlack
2016-05-11 13:56         ` Konrad Rzeszutek Wilk
2016-04-25 15:35 ` [PATCH v9 13/27] x86/xen_hello_world.xsplice: Test payload for patching 'xen_extra_version' Konrad Rzeszutek Wilk
2016-04-26 15:31   ` Jan Beulich
2016-04-25 15:35 ` [PATCH v9 14/27] xsplice, symbols: Implement symbol name resolution on address Konrad Rzeszutek Wilk
2016-04-26 15:48   ` Jan Beulich
2016-04-25 15:35 ` Konrad Rzeszutek Wilk [this message]
2016-04-26 15:53   ` [PATCH v9 15/27] xsplice, symbols: Implement fast symbol names -> virtual addresses lookup Jan Beulich
2016-04-25 15:35 ` [PATCH v9 16/27] x86, xsplice: Print payload's symbol name and payload name in backtraces Konrad Rzeszutek Wilk
2016-04-26 11:06   ` Ross Lagerwall
2016-04-26 12:41     ` Jan Beulich
2016-04-26 12:48       ` Ross Lagerwall
2016-04-26 13:41         ` Jan Beulich
2016-04-27  3:31           ` Konrad Rzeszutek Wilk
2016-04-27  8:37             ` Jan Beulich
2016-04-25 15:35 ` [PATCH v9 17/27] xsplice: Add support for bug frames Konrad Rzeszutek Wilk
2016-04-26 11:05   ` Ross Lagerwall
2016-04-26 13:08     ` Ross Lagerwall
2016-04-26 15:58   ` Jan Beulich
2016-04-25 15:35 ` [PATCH v9 18/27] xsplice: Add support for exception tables Konrad Rzeszutek Wilk
2016-04-26 16:01   ` Jan Beulich
2016-04-25 15:35 ` [PATCH v9 19/27] xsplice: Add support for alternatives Konrad Rzeszutek Wilk
2016-04-27  8:58   ` Jan Beulich
2016-04-25 15:35 ` [PATCH v9 20/27] build_id: Provide ld-embedded build-ids Konrad Rzeszutek Wilk
2016-04-25 15:35 ` [PATCH v9 21/27] xsplice: Print build_id in keyhandler and on bootup Konrad Rzeszutek Wilk
2016-04-25 15:35 ` [PATCH v9 22/27] XENVER_build_id/libxc: Provide ld-embedded build-id Konrad Rzeszutek Wilk
2016-04-25 15:35 ` [PATCH v9 23/27] libxl: info: Display build_id of the hypervisor Konrad Rzeszutek Wilk
2016-04-25 15:35 ` [PATCH v9 24/27] xsplice: Stacking build-id dependency checking Konrad Rzeszutek Wilk
2016-04-27  9:27   ` Jan Beulich
2016-04-27 16:36     ` Konrad Rzeszutek Wilk
2016-04-28  9:47       ` Jan Beulich
2016-04-25 15:35 ` [PATCH v9 25/27] xsplice/xen_replace_world: Test-case for XSPLICE_ACTION_REPLACE Konrad Rzeszutek Wilk
2016-04-25 15:35 ` [PATCH v9 26/27] xsplice: Prevent duplicate payloads from being loaded Konrad Rzeszutek Wilk
2016-04-27  9:31   ` Jan Beulich
2016-04-25 15:35 ` [PATCH v9 27/27] MAINTAINERS/xsplice: Add myself and Ross as the maintainers Konrad Rzeszutek Wilk
2016-04-25 15:41 ` [PATCH 9] xSplice v1 design and implementation Jan Beulich
2016-04-25 15:47   ` Konrad Rzeszutek Wilk
2016-04-25 15:54     ` Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1461598514-5440-16-git-send-email-konrad.wilk@oracle.com \
    --to=konrad.wilk@oracle.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=jbeulich@suse.com \
    --cc=keir@xen.org \
    --cc=konrad@kernel.org \
    --cc=mpohlack@amazon.de \
    --cc=ross.lagerwall@citrix.com \
    --cc=sasha.levin@oracle.com \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).