* [Qemu-devel] [PATCH v2 0/3] memory: an optimization
@ 2016-02-22  8:34 Gonglei
  2016-02-22  8:34 ` [Qemu-devel] [PATCH v2 1/3] exec: store RAMBlock pointer into memory region Gonglei
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Gonglei @ 2016-02-22  8:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, Gonglei, peter.huangpeng

perf top shows that qemu_get_ram_ptr consumes some CPU cycles.

Before this optimization:
  1.26%  qemu-kvm                  [.] qemu_get_ram_ptr
  0.89%  qemu-kvm                  [.] qemu_get_ram_block

After applying the patch set:
  0.87%  qemu-kvm                  [.] qemu_get_ram_ptr

And Paolo suggested that we can get rid of qemu_get_ram_ptr
by storing the RAMBlock pointer in the memory region,
instead of the ram_addr_t value. After applying this change,
I indeed got much better performance.

BTW, patch 3 is an incidental find.

v2:
 - use 'struct RAMBlock *' instead of 'void *' in patch 1 [Fam]
 - drop superfluous comments in patch 1 [Fam]

Gonglei (3):
  exec: store RAMBlock pointer into memory region
  memory: optimize qemu_get_ram_ptr and qemu_ram_ptr_length
  memory: Remove the superfluous code

 exec.c                | 48 ++++++++++++++++++++++++++++++------------------
 include/exec/memory.h |  8 ++++----
 memory.c              |  3 ++-
 3 files changed, 36 insertions(+), 23 deletions(-)

-- 
1.8.5.2


* [Qemu-devel] [PATCH v2 1/3] exec: store RAMBlock pointer into memory region
  2016-02-22  8:34 [Qemu-devel] [PATCH v2 0/3] memory: an optimization Gonglei
@ 2016-02-22  8:34 ` Gonglei
  2016-02-22  8:34 ` [Qemu-devel] [PATCH v2 2/3] memory: optimize qemu_get_ram_ptr and qemu_ram_ptr_length Gonglei
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Gonglei @ 2016-02-22  8:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, Gonglei, peter.huangpeng

Each RAM memory region has a unique corresponding RAMBlock.
In the current implementation, the memory region stores only
ram_addr, the offset into the RAM address space. To get the
RAMBlock, we have to query the global ram_list by ram_addr,
which is very expensive.

Now we store the RAMBlock pointer in the memory region
structure, so if we know the mr, we can easily get the
RAMBlock.
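
The difference between the two lookups can be sketched roughly as
follows. This is a minimal illustration, not QEMU's code: Block,
Region, find_block, host_ptr_slow and host_ptr_fast are hypothetical
simplified stand-ins for RAMBlock, MemoryRegion and the functions
touched by this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not QEMU's real RAMBlock/MemoryRegion. */
typedef struct Block {
    uint64_t offset;     /* start of the block in the RAM address space */
    uint64_t length;     /* size of the block in bytes */
    uint8_t *host;       /* host memory backing the block */
    struct Block *next;  /* next block in the global list */
} Block;

typedef struct Region {
    uint64_t ram_addr;   /* offset of the region in the RAM address space */
    Block *ram_block;    /* cached pointer, set once at allocation time */
} Region;

/* Slow path: walk the list until a block contains addr. */
static Block *find_block(Block *head, uint64_t addr)
{
    for (Block *b = head; b != NULL; b = b->next) {
        /* Unsigned subtraction wraps when addr < offset, so one compare suffices. */
        if (addr - b->offset < b->length) {
            return b;
        }
    }
    return NULL;
}

/* Translate an address with a list walk on every call. */
static uint8_t *host_ptr_slow(Block *head, uint64_t addr)
{
    Block *b = find_block(head, addr);
    return b ? b->host + (addr - b->offset) : NULL;
}

/* Translate an address through the pointer cached in the region: no list walk. */
static uint8_t *host_ptr_fast(const Region *r, uint64_t addr)
{
    return r->ram_block->host + (addr - r->ram_block->offset);
}
```

In this sketch the region pays for find_block once, when ram_block is
filled in; every later translation is a pointer dereference and a
subtraction.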

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 exec.c                | 2 ++
 include/exec/memory.h | 2 ++
 memory.c              | 1 +
 3 files changed, 5 insertions(+)

diff --git a/exec.c b/exec.c
index 1f24500..4c0114a 100644
--- a/exec.c
+++ b/exec.c
@@ -1717,6 +1717,8 @@ ram_addr_t qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
         error_propagate(errp, local_err);
         return -1;
     }
+
+    mr->ram_block = new_block;
     return addr;
 }
 
diff --git a/include/exec/memory.h b/include/exec/memory.h
index c92734a..4025729 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -156,6 +156,7 @@ struct MemoryRegionIOMMUOps {
 
 typedef struct CoalescedMemoryRange CoalescedMemoryRange;
 typedef struct MemoryRegionIoeventfd MemoryRegionIoeventfd;
+typedef struct RAMBlock RAMBlock;
 
 struct MemoryRegion {
     Object parent_obj;
@@ -172,6 +173,7 @@ struct MemoryRegion {
     bool global_locking;
     uint8_t dirty_log_mask;
     ram_addr_t ram_addr;
+    RAMBlock *ram_block;
     Object *owner;
     const MemoryRegionIOMMUOps *iommu_ops;
 
diff --git a/memory.c b/memory.c
index 09041ed..b4451dd 100644
--- a/memory.c
+++ b/memory.c
@@ -912,6 +912,7 @@ void memory_region_init(MemoryRegion *mr,
     }
     mr->name = g_strdup(name);
     mr->owner = owner;
+    mr->ram_block = NULL;
 
     if (name) {
         char *escaped_name = memory_region_escape_name(name);
-- 
1.8.5.2


* [Qemu-devel] [PATCH v2 2/3] memory: optimize qemu_get_ram_ptr and qemu_ram_ptr_length
  2016-02-22  8:34 [Qemu-devel] [PATCH v2 0/3] memory: an optimization Gonglei
  2016-02-22  8:34 ` [Qemu-devel] [PATCH v2 1/3] exec: store RAMBlock pointer into memory region Gonglei
@ 2016-02-22  8:34 ` Gonglei
  2016-02-22  8:34 ` [Qemu-devel] [PATCH v2 3/3] memory: Remove the superfluous code Gonglei
  2016-02-22 10:22 ` [Qemu-devel] [PATCH v2 0/3] memory: an optimization Paolo Bonzini
  3 siblings, 0 replies; 6+ messages in thread
From: Gonglei @ 2016-02-22  8:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, Gonglei, peter.huangpeng

These two functions spend too many CPU cycles finding the
RAMBlock by RAM address.

After this patch, we can pass the RAMBlock pointer to them
so that, most of the time, they no longer need to look up
the RAMBlock. This gives better performance in address
translation.
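
The calling convention can be sketched as below: the caller passes a
block pointer as a hint, and the slow lookup runs only when the hint
is NULL. This is an illustrative sketch under assumptions, not the
patched QEMU code: Block, block_list, slow_lookups, lookup_block,
get_ptr and get_ptr_length are hypothetical names, and the sketch
returns NULL on a failed lookup, which is a choice made here for
simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified block; not QEMU's real RAMBlock. */
typedef struct Block {
    uint64_t offset;      /* start of the block in the RAM address space */
    uint64_t max_length;  /* maximum size of the block in bytes */
    uint8_t *host;        /* host memory backing the block */
    struct Block *next;
} Block;

static Block *block_list;      /* stand-in for the global block list */
static unsigned slow_lookups;  /* counts how often the list is walked */

/* Slow path: find the block containing addr by walking the list. */
static Block *lookup_block(uint64_t addr)
{
    slow_lookups++;
    for (Block *b = block_list; b != NULL; b = b->next) {
        if (addr - b->offset < b->max_length) {
            return b;
        }
    }
    return NULL;
}

/* Use the hint when the caller has one; fall back to the lookup otherwise. */
static void *get_ptr(Block *hint, uint64_t addr)
{
    Block *block = hint;

    if (block == NULL) {
        block = lookup_block(addr);
    }
    if (block == NULL) {
        return NULL;  /* assumption of this sketch: report failure as NULL */
    }
    return block->host + (addr - block->offset);
}

/* Same idea, but also clamp *size to what remains in the block. */
static void *get_ptr_length(Block *hint, uint64_t addr, uint64_t *size)
{
    Block *block = hint;
    uint64_t offset_inside_block;

    if (*size == 0) {
        return NULL;
    }
    if (block == NULL) {
        block = lookup_block(addr);
    }
    if (block == NULL) {
        return NULL;
    }
    offset_inside_block = addr - block->offset;
    if (*size > block->max_length - offset_inside_block) {
        *size = block->max_length - offset_inside_block;
    }
    return block->host + offset_inside_block;
}
```

Callers that hold a region can pass its cached block; callers that
only have an address pass NULL and keep the old behaviour.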

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 exec.c                | 46 ++++++++++++++++++++++++++++------------------
 include/exec/memory.h |  4 ++--
 memory.c              |  2 +-
 3 files changed, 31 insertions(+), 21 deletions(-)

diff --git a/exec.c b/exec.c
index 4c0114a..c62c439 100644
--- a/exec.c
+++ b/exec.c
@@ -1868,9 +1868,13 @@ void *qemu_get_ram_block_host_ptr(ram_addr_t addr)
  *
  * Called within RCU critical section.
  */
-void *qemu_get_ram_ptr(ram_addr_t addr)
+void *qemu_get_ram_ptr(RAMBlock *ram_block, ram_addr_t addr)
 {
-    RAMBlock *block = qemu_get_ram_block(addr);
+    RAMBlock *block = ram_block;
+
+    if (block == NULL) {
+        block = qemu_get_ram_block(addr);
+    }
 
     if (xen_enabled() && block->host == NULL) {
         /* We need to check if the requested address is in the RAM
@@ -1891,15 +1895,18 @@ void *qemu_get_ram_ptr(ram_addr_t addr)
  *
  * Called within RCU critical section.
  */
-static void *qemu_ram_ptr_length(ram_addr_t addr, hwaddr *size)
+static void *qemu_ram_ptr_length(RAMBlock *ram_block, ram_addr_t addr,
+                                 hwaddr *size)
 {
-    RAMBlock *block;
+    RAMBlock *block = ram_block;
     ram_addr_t offset_inside_block;
     if (*size == 0) {
         return NULL;
     }
 
-    block = qemu_get_ram_block(addr);
+    if (block == NULL) {
+        block = qemu_get_ram_block(addr);
+    }
     offset_inside_block = addr - block->offset;
     *size = MIN(*size, block->max_length - offset_inside_block);
 
@@ -2027,13 +2034,13 @@ static void notdirty_mem_write(void *opaque, hwaddr ram_addr,
     }
     switch (size) {
     case 1:
-        stb_p(qemu_get_ram_ptr(ram_addr), val);
+        stb_p(qemu_get_ram_ptr(NULL, ram_addr), val);
         break;
     case 2:
-        stw_p(qemu_get_ram_ptr(ram_addr), val);
+        stw_p(qemu_get_ram_ptr(NULL, ram_addr), val);
         break;
     case 4:
-        stl_p(qemu_get_ram_ptr(ram_addr), val);
+        stl_p(qemu_get_ram_ptr(NULL, ram_addr), val);
         break;
     default:
         abort();
@@ -2609,7 +2616,7 @@ static MemTxResult address_space_write_continue(AddressSpace *as, hwaddr addr,
         } else {
             addr1 += memory_region_get_ram_addr(mr);
             /* RAM case */
-            ptr = qemu_get_ram_ptr(addr1);
+            ptr = qemu_get_ram_ptr(mr->ram_block, addr1);
             memcpy(ptr, buf, l);
             invalidate_and_set_dirty(mr, addr1, l);
         }
@@ -2700,7 +2707,7 @@ MemTxResult address_space_read_continue(AddressSpace *as, hwaddr addr,
             }
         } else {
             /* RAM case */
-            ptr = qemu_get_ram_ptr(mr->ram_addr + addr1);
+            ptr = qemu_get_ram_ptr(mr->ram_block, mr->ram_addr + addr1);
             memcpy(buf, ptr, l);
         }
 
@@ -2785,7 +2792,7 @@ static inline void cpu_physical_memory_write_rom_internal(AddressSpace *as,
         } else {
             addr1 += memory_region_get_ram_addr(mr);
             /* ROM/RAM case */
-            ptr = qemu_get_ram_ptr(addr1);
+            ptr = qemu_get_ram_ptr(mr->ram_block, addr1);
             switch (type) {
             case WRITE_DATA:
                 memcpy(ptr, buf, l);
@@ -2997,7 +3004,7 @@ void *address_space_map(AddressSpace *as,
 
     memory_region_ref(mr);
     *plen = done;
-    ptr = qemu_ram_ptr_length(raddr + base, plen);
+    ptr = qemu_ram_ptr_length(mr->ram_block, raddr + base, plen);
     rcu_read_unlock();
 
     return ptr;
@@ -3081,7 +3088,8 @@ static inline uint32_t address_space_ldl_internal(AddressSpace *as, hwaddr addr,
 #endif
     } else {
         /* RAM case */
-        ptr = qemu_get_ram_ptr((memory_region_get_ram_addr(mr)
+        ptr = qemu_get_ram_ptr(mr->ram_block,
+                               (memory_region_get_ram_addr(mr)
                                 & TARGET_PAGE_MASK)
                                + addr1);
         switch (endian) {
@@ -3176,7 +3184,8 @@ static inline uint64_t address_space_ldq_internal(AddressSpace *as, hwaddr addr,
 #endif
     } else {
         /* RAM case */
-        ptr = qemu_get_ram_ptr((memory_region_get_ram_addr(mr)
+        ptr = qemu_get_ram_ptr(mr->ram_block,
+                               (memory_region_get_ram_addr(mr)
                                 & TARGET_PAGE_MASK)
                                + addr1);
         switch (endian) {
@@ -3291,7 +3300,8 @@ static inline uint32_t address_space_lduw_internal(AddressSpace *as,
 #endif
     } else {
         /* RAM case */
-        ptr = qemu_get_ram_ptr((memory_region_get_ram_addr(mr)
+        ptr = qemu_get_ram_ptr(mr->ram_block,
+                               (memory_region_get_ram_addr(mr)
                                 & TARGET_PAGE_MASK)
                                + addr1);
         switch (endian) {
@@ -3376,7 +3386,7 @@ void address_space_stl_notdirty(AddressSpace *as, hwaddr addr, uint32_t val,
         r = memory_region_dispatch_write(mr, addr1, val, 4, attrs);
     } else {
         addr1 += memory_region_get_ram_addr(mr) & TARGET_PAGE_MASK;
-        ptr = qemu_get_ram_ptr(addr1);
+        ptr = qemu_get_ram_ptr(mr->ram_block, addr1);
         stl_p(ptr, val);
 
         dirty_log_mask = memory_region_get_dirty_log_mask(mr);
@@ -3431,7 +3441,7 @@ static inline void address_space_stl_internal(AddressSpace *as,
     } else {
         /* RAM case */
         addr1 += memory_region_get_ram_addr(mr) & TARGET_PAGE_MASK;
-        ptr = qemu_get_ram_ptr(addr1);
+        ptr = qemu_get_ram_ptr(mr->ram_block, addr1);
         switch (endian) {
         case DEVICE_LITTLE_ENDIAN:
             stl_le_p(ptr, val);
@@ -3541,7 +3551,7 @@ static inline void address_space_stw_internal(AddressSpace *as,
     } else {
         /* RAM case */
         addr1 += memory_region_get_ram_addr(mr) & TARGET_PAGE_MASK;
-        ptr = qemu_get_ram_ptr(addr1);
+        ptr = qemu_get_ram_ptr(mr->ram_block, addr1);
         switch (endian) {
         case DEVICE_LITTLE_ENDIAN:
             stw_le_p(ptr, val);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 4025729..1cf2e51 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1391,7 +1391,7 @@ MemTxResult address_space_read_continue(AddressSpace *as, hwaddr addr,
 					MemoryRegion *mr);
 MemTxResult address_space_read_full(AddressSpace *as, hwaddr addr,
                                     MemTxAttrs attrs, uint8_t *buf, int len);
-void *qemu_get_ram_ptr(ram_addr_t addr);
+void *qemu_get_ram_ptr(RAMBlock *ram_block, ram_addr_t addr);
 
 static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
 {
@@ -1432,7 +1432,7 @@ MemTxResult address_space_read(AddressSpace *as, hwaddr addr, MemTxAttrs attrs,
             mr = address_space_translate(as, addr, &addr1, &l, false);
             if (len == l && memory_access_is_direct(mr, false)) {
                 addr1 += memory_region_get_ram_addr(mr);
-                ptr = qemu_get_ram_ptr(addr1);
+                ptr = qemu_get_ram_ptr(mr->ram_block, addr1);
                 memcpy(buf, ptr, len);
             } else {
                 result = address_space_read_continue(as, addr, attrs, buf, len,
diff --git a/memory.c b/memory.c
index b4451dd..0dd9695 100644
--- a/memory.c
+++ b/memory.c
@@ -1570,7 +1570,7 @@ void *memory_region_get_ram_ptr(MemoryRegion *mr)
         mr = mr->alias;
     }
     assert(mr->ram_addr != RAM_ADDR_INVALID);
-    ptr = qemu_get_ram_ptr(mr->ram_addr & TARGET_PAGE_MASK);
+    ptr = qemu_get_ram_ptr(mr->ram_block, mr->ram_addr & TARGET_PAGE_MASK);
     rcu_read_unlock();
 
     return ptr + offset;
-- 
1.8.5.2


* [Qemu-devel] [PATCH v2 3/3] memory: Remove the superfluous code
  2016-02-22  8:34 [Qemu-devel] [PATCH v2 0/3] memory: an optimization Gonglei
  2016-02-22  8:34 ` [Qemu-devel] [PATCH v2 1/3] exec: store RAMBlock pointer into memory region Gonglei
  2016-02-22  8:34 ` [Qemu-devel] [PATCH v2 2/3] memory: optimize qemu_get_ram_ptr and qemu_ram_ptr_length Gonglei
@ 2016-02-22  8:34 ` Gonglei
  2016-02-22 10:22 ` [Qemu-devel] [PATCH v2 0/3] memory: an optimization Paolo Bonzini
  3 siblings, 0 replies; 6+ messages in thread
From: Gonglei @ 2016-02-22  8:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, Gonglei, peter.huangpeng

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 include/exec/memory.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 1cf2e51..4e5a145 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1400,8 +1400,6 @@ static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
     } else {
         return memory_region_is_ram(mr) || memory_region_is_romd(mr);
     }
-
-    return false;
 }
 
 /**
-- 
1.8.5.2


* Re: [Qemu-devel] [PATCH v2 0/3] memory: an optimization
  2016-02-22  8:34 [Qemu-devel] [PATCH v2 0/3] memory: an optimization Gonglei
                   ` (2 preceding siblings ...)
  2016-02-22  8:34 ` [Qemu-devel] [PATCH v2 3/3] memory: Remove the superfluous code Gonglei
@ 2016-02-22 10:22 ` Paolo Bonzini
  2016-02-23  3:49   ` Fam Zheng
  3 siblings, 1 reply; 6+ messages in thread
From: Paolo Bonzini @ 2016-02-22 10:22 UTC (permalink / raw)
  To: Gonglei, qemu-devel; +Cc: peter.huangpeng



On 22/02/2016 09:34, Gonglei wrote:
> perf top shows that qemu_get_ram_ptr consumes some CPU cycles.
> 
> Before this optimization:
>   1.26%  qemu-kvm                  [.] qemu_get_ram_ptr
>   0.89%  qemu-kvm                  [.] qemu_get_ram_block
> 
> After applying the patch set:
>   0.87%  qemu-kvm                  [.] qemu_get_ram_ptr
> 
> And Paolo suggested that we can get rid of qemu_get_ram_ptr
> by storing the RAMBlock pointer in the memory region,
> instead of the ram_addr_t value. After applying this change,
> I indeed got much better performance.
> 
> BTW, patch 3 is an incidental find.
> 
> v2:
>  - using 'struct RAMBlock *' instead of 'void *' in patch 1 [Fam]
>  - drop superfluous comments in patch 1 [Fam]
> 
> Gonglei (3):
>   exec: store RAMBlock pointer into memory region
>   memory: optimize qemu_get_ram_ptr and qemu_ram_ptr_length
>   memory: Remove the superfluous code
> 
>  exec.c                | 48 ++++++++++++++++++++++++++++++------------------
>  include/exec/memory.h |  8 ++++----
>  memory.c              |  3 ++-
>  3 files changed, 36 insertions(+), 23 deletions(-)
> 

Thanks Lei and Fam, patches queued.

Paolo


* Re: [Qemu-devel] [PATCH v2 0/3] memory: an optimization
  2016-02-22 10:22 ` [Qemu-devel] [PATCH v2 0/3] memory: an optimization Paolo Bonzini
@ 2016-02-23  3:49   ` Fam Zheng
  0 siblings, 0 replies; 6+ messages in thread
From: Fam Zheng @ 2016-02-23  3:49 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Gonglei, qemu-devel, peter.huangpeng

On Mon, 02/22 11:22, Paolo Bonzini wrote:
> 
> 
> On 22/02/2016 09:34, Gonglei wrote:
> > perf top shows that qemu_get_ram_ptr consumes some CPU cycles.
> > 
> > Before this optimization:
> >   1.26%  qemu-kvm                  [.] qemu_get_ram_ptr
> >   0.89%  qemu-kvm                  [.] qemu_get_ram_block
> > 
> > After applying the patch set:
> >   0.87%  qemu-kvm                  [.] qemu_get_ram_ptr
> > 
> > And Paolo suggested that we can get rid of qemu_get_ram_ptr
> > by storing the RAMBlock pointer in the memory region,
> > instead of the ram_addr_t value. After applying this change,
> > I indeed got much better performance.
> > 
> > BTW, patch 3 is an incidental find.
> > 
> > v2:
> >  - using 'struct RAMBlock *' instead of 'void *' in patch 1 [Fam]
> >  - drop superfluous comments in patch 1 [Fam]
> > 
> > Gonglei (3):
> >   exec: store RAMBlock pointer into memory region
> >   memory: optimize qemu_get_ram_ptr and qemu_ram_ptr_length
> >   memory: Remove the superfluous code
> > 
> >  exec.c                | 48 ++++++++++++++++++++++++++++++------------------
> >  include/exec/memory.h |  8 ++++----
> >  memory.c              |  3 ++-
> >  3 files changed, 36 insertions(+), 23 deletions(-)
> > 
> 
> Thanks Lei and Fam, patches queued.

Thanks!

Actually I'd like to clean this up a bit more: move the assignment to
mr->ram_block from exec.c to memory.c, and drop mr->ram_addr.

I already did this on top of master last Friday, before v1 of this
series was posted (oops! :), but I can rebase on top of these patches.

On top of that, I think we can replicate the ram_list.mru_block trick as
AddressSpaceDispatch.mru_section, to further reduce the calls to
qemu_get_ram_ptr.
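
The general shape of a most-recently-used cache of that kind might look
like the sketch below. It only illustrates the idea of a one-entry
cache in front of a scan: Section, Dispatch and dispatch_lookup are
hypothetical names, the flat array stands in for whatever structure the
real dispatch uses, and the sketch ignores concurrent readers, which
real code would have to handle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified section: a range [start, start + size). */
typedef struct Section {
    uint64_t start;
    uint64_t size;
} Section;

/* Hypothetical dispatch table with a one-entry most-recently-used cache. */
typedef struct Dispatch {
    Section *sections;  /* all known sections */
    size_t nsections;
    Section *mru;       /* last section that satisfied a lookup, or NULL */
    unsigned scans;     /* how many times the full scan ran */
} Dispatch;

static Section *dispatch_lookup(Dispatch *d, uint64_t addr)
{
    Section *s = d->mru;

    /* Fast path: consecutive accesses tend to hit the same section. */
    if (s != NULL && addr - s->start < s->size) {
        return s;
    }

    /* Slow path: scan everything and remember the hit for next time. */
    d->scans++;
    for (size_t i = 0; i < d->nsections; i++) {
        s = &d->sections[i];
        if (addr - s->start < s->size) {
            d->mru = s;
            return s;
        }
    }
    return NULL;
}
```

Whether this pays off depends on access locality: a workload that
alternates between sections on every access would fall through to the
scan each time.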

Paolo, is there a git branch I can base off?

Fam


