All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] e1000 memory corruption in guest OS
@ 2014-02-16  4:29 Hoyer, David
  2014-02-24  4:20 ` Alexey Kardashevskiy
  2014-02-24 14:02 ` Paolo Bonzini
  0 siblings, 2 replies; 12+ messages in thread
From: Hoyer, David @ 2014-02-16  4:29 UTC (permalink / raw)
  To: Qemu-devel; +Cc: Moyer, Keith, Best, Tish


[-- Attachment #1.1: Type: text/plain, Size: 3776 bytes --]

We are using Qemu-1.7.0 with Xen-4.3.0 and Debian jessie.   We are noticing that when we transfer large files from our network to the guestOS via the e1000 virtual network device that we experience memory corruption on the guestOS.   We have debugged this problem and have determined where it appears that the corruption is happening and have created a patch file with a fix (at least the corruption is no longer happening on our guestOS anymore).     Note that our test file is a large file consisting of the value 0x61 repeated over the entire file.

To troubleshoot this issue, we enabled tracing in qemu and used the xen_map_cache and xen_map_cache_return trace events.  We also added some of our own debug statements in e1000.c before and after the function call to DMA the network packet to the guestOS descriptor address.  Below is a commented summary of the trace output:

/*** Check if guestOS address 0xe00000 (which maps to 0x7f15c313f000) is corrupted
xen_map_cache want 0xe00000
xen_map_cache_return 0x7f15c313f000
/*** It wasn't corrupted before the dma write
/*** DMA a packet of length 0x5aa containing '0x61616161...' to guestOS at address 0x12ffac2 (which maps to 0x7f15c313eac2)
dma write to 12ffac2 len 5aa
xen_map_cache want 0x12ffac2
xen_map_cache_return 0x7f15c313eac2
/*** Check if guestOS address 0xe00000 (which maps to 0x7f15c313f000) is corrupted
xen_map_cache want 0xe00000
xen_map_cache_return 0x7f15c313f000
/*** It is corrupted now.
e1000: Corrupted 7: test_buf:5aa 5aa

The DMA address 0x12ffac2 mapped to 0x7f15c313eac2.  When you add the packet length, 0x5aa, the result is 0x7f15c313f06c.  This result is 0x6c bytes into the mapping of guestOS address 0xe00000, which mapped to 0x7f15c313f000.  If you dump 0xe00000 in the guestOS, 0x6c bytes are corrupted.

We believe that the correct fix is to use qemu_ram_ptr_length instead of qemu_get_ram_ptr in the function address_space_rw to ensure (from what we can tell) that the mapped address is valid for the entire length specified.  It looked like this might also be an issue in cpu_physical_memory_write_rom so we made the change there as well.

We are fairly new to the qemu source base so we are looking to the community to see if this problem has previously been identified and to see if this is the correct fix.

Following is the patch

--- orig/exec.c 2013-11-27 16:52:55.000000000 -0600
+++ new/exec.c  2014-02-15 21:58:34.311518000 -0600
@@ -1911,7 +1911,7 @@
             } else {
                 addr1 += memory_region_get_ram_addr(mr);
                 /* RAM case */
-                ptr = qemu_get_ram_ptr(addr1);
+                ptr = qemu_ram_ptr_length(addr1, &l);
                 memcpy(ptr, buf, l);
                 invalidate_and_set_dirty(addr1, l);
             }
@@ -1945,7 +1945,7 @@
                 }
             } else {
                 /* RAM case */
-                ptr = qemu_get_ram_ptr(mr->ram_addr + addr1);
+                ptr = qemu_ram_ptr_length(mr->ram_addr + addr1, &l);
                 memcpy(buf, ptr, l);
             }
         }
@@ -1995,7 +1995,7 @@
         } else {
             addr1 += memory_region_get_ram_addr(mr);
             /* ROM/RAM case */
-            ptr = qemu_get_ram_ptr(addr1);
+            ptr = qemu_ram_ptr_length(addr1, &l);
             memcpy(ptr, buf, l);
             invalidate_and_set_dirty(addr1, l);
         }


David Hoyer
Controller Firmware Development
Array Products Group

NetApp
3718 N. Rock Road
Wichita, KS 67226
316-636-8047 phone
316-617-3677 mobile
David.Hoyer@netapp.com<mailto:David.Hoyer@lsi.com>
netapp.com<http://www.netapp.com/?ref_source=eSig>

 [Description: http://media.netapp.com/images/netapp-logo-sig-5.gif]


[-- Attachment #1.2: Type: text/html, Size: 15227 bytes --]

[-- Attachment #2: image001.gif --]
[-- Type: image/gif, Size: 2568 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] e1000 memory corruption in guest OS
  2014-02-16  4:29 [Qemu-devel] e1000 memory corruption in guest OS Hoyer, David
@ 2014-02-24  4:20 ` Alexey Kardashevskiy
  2014-03-01 12:30   ` Alexey Kardashevskiy
  2014-02-24 14:02 ` Paolo Bonzini
  1 sibling, 1 reply; 12+ messages in thread
From: Alexey Kardashevskiy @ 2014-02-24  4:20 UTC (permalink / raw)
  To: Hoyer, David, Qemu-devel; +Cc: Moyer, Keith, Best, Tish

On 02/16/2014 03:29 PM, Hoyer, David wrote:

> We are using Qemu-1.7.0 with Xen-4.3.0 and Debian jessie.   We are
> noticing that when we transfer large files from our network to the
> guestOS via the e1000 virtual network device that we experience memory
> corruption on the guestOS.   We have debugged this problem and have
> determined where it appears that the corruption is happening and have
> created a patch file with a fix (at least the corruption is no longer
> happening on our guestOS anymore).     Note that our test file is a
> large file consisting of the value 0x61 repeated over the entire file.
> 

> To troubleshoot this issue, we enabled tracing in qemu and used the
> xen_map_cache and xen_map_cache_return trace events.  We also added some
> of our own debug statements in e1000.c before and after the function
> call to DMA the network packet to the guestOS descriptor address.  Below
> is a commented summary of the trace output:
> 

> /*** Check if guestOS address 0xe00000 (which maps to 0x7f15c313f000) is corrupted
> xen_map_cache want 0xe00000
> xen_map_cache_return 0x7f15c313f000
> /*** It wasn't corrupted before the dma write
> /*** DMA a packet of length 0x5aa containing '0x61616161...' to guestOS at address 0x12ffac2 (which maps to 0x7f15c313eac2)
> dma write to 12ffac2 len 5aa
> xen_map_cache want 0x12ffac2
> xen_map_cache_return 0x7f15c313eac2
> /*** Check if guestOS address 0xe00000 (which maps to 0x7f15c313f000) is corrupted
> xen_map_cache want 0xe00000
> xen_map_cache_return 0x7f15c313f000
> /*** It is corrupted now.
> e1000: Corrupted 7: test_buf:5aa 5aa
> 

> The DMA address 0x12ffac2 mapped to 0x7f15c313eac2.  When you add the
> packet length, 0x5aa, the result is 0x7f15c313f06c.  This result is 0x6c
> bytes into the mapping of guestOS address 0xe00000, which mapped to
> 0x7f15c313f000.  If you dump 0xe00000 in the guestOS, 0x6c bytes are
> corrupted.

> We believe that the correct fix is to use qemu_ram_ptr_length instead of
> qemu_get_ram_ptr in the function address_space_rw to ensure (from what
> we can tell) that the mapped address is valid for the entire length
> specified.  It looked like this might also be an issue in
> cpu_physical_memory_write_rom so we made the change there as well.
> 


Corrupted DMA buffer is 0x e00000 -- 0x7f15c313f000.
The e1000 packet is at  0x12ffac2 -- 0x7f15c313eac2.

(0x7f15c313f000 - 0x7f15c313eac2) = 0x53e which is less than 0x5aa and
(0x5aa - 0x53e) = 0x6c bytes get corrupted.

I see here buffer overrun from e1000 and I suspect that your patch just
hides this problem. What did I miss?

Does e1000 still work with the patch applied? Are all 100% packets
delivered fine?




> We are fairly new to the qemu source base so we are looking to the
> community to see if this problem has previously been identified and to
> see if this is the correct fix.
> 


> Following is the patch
> 
> --- orig/exec.c 2013-11-27 16:52:55.000000000 -0600
> +++ new/exec.c  2014-02-15 21:58:34.311518000 -0600
> @@ -1911,7 +1911,7 @@
>              } else {
>                  addr1 += memory_region_get_ram_addr(mr);
>                  /* RAM case */
> -                ptr = qemu_get_ram_ptr(addr1);
> +                ptr = qemu_ram_ptr_length(addr1, &l);
>                  memcpy(ptr, buf, l);
>                  invalidate_and_set_dirty(addr1, l);
>              }
> @@ -1945,7 +1945,7 @@
>                  }
>              } else {
>                  /* RAM case */
> -                ptr = qemu_get_ram_ptr(mr->ram_addr + addr1);
> +                ptr = qemu_ram_ptr_length(mr->ram_addr + addr1, &l);
>                  memcpy(buf, ptr, l);
>              }
>          }
> @@ -1995,7 +1995,7 @@
>          } else {
>              addr1 += memory_region_get_ram_addr(mr);
>              /* ROM/RAM case */
> -            ptr = qemu_get_ram_ptr(addr1);
> +            ptr = qemu_ram_ptr_length(addr1, &l);
>              memcpy(ptr, buf, l);
>              invalidate_and_set_dirty(addr1, l);
>          }
> 
> 
> David Hoyer
> Controller Firmware Development
> Array Products Group
> 
> NetApp
> 3718 N. Rock Road
> Wichita, KS 67226
> 316-636-8047 phone
> 316-617-3677 mobile
> David.Hoyer@netapp.com<mailto:David.Hoyer@lsi.com>
> netapp.com<http://www.netapp.com/?ref_source=eSig>
> 
>  [Description: http://media.netapp.com/images/netapp-logo-sig-5.gif]
> 
> 

Alexey

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] e1000 memory corruption in guest OS
  2014-02-16  4:29 [Qemu-devel] e1000 memory corruption in guest OS Hoyer, David
  2014-02-24  4:20 ` Alexey Kardashevskiy
@ 2014-02-24 14:02 ` Paolo Bonzini
  2014-02-24 14:15   ` Hoyer, David
  1 sibling, 1 reply; 12+ messages in thread
From: Paolo Bonzini @ 2014-02-24 14:02 UTC (permalink / raw)
  To: Hoyer, David, Qemu-devel; +Cc: Moyer, Keith, Best, Tish

Il 16/02/2014 05:29, Hoyer, David ha scritto:
> We are using Qemu-1.7.0 with Xen-4.3.0 and Debian jessie.   We are
> noticing that when we transfer large files from our network to the
> guestOS via the e1000 virtual network device that we experience memory
> corruption on the guestOS.   We have debugged this problem and have
> determined where it appears that the corruption is happening and have
> created a patch file with a fix (at least the corruption is no longer
> happening on our guestOS anymore).     Note that our test file is a
> large file consisting of the value 0x61 repeated over the entire file.

I think you are missing this patch:

commit 360e607b88a23d378f6efaa769c76d26f538234d
Author: Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>
Date:   Thu Jan 30 12:46:05 2014 +0000

    address_space_translate: do not cross page boundaries
    
    The following commit:
    
    commit 149f54b53b7666a3facd45e86eece60ce7d3b114
    Author: Paolo Bonzini <pbonzini@redhat.com>
    Date:   Fri May 24 12:59:37 2013 +0200
    
        memory: add address_space_translate
    
    breaks Xen support in QEMU, in particular the Xen mapcache. The effect
    is that one Windows XP installation out of ten would end up with BSOD.
    
    The reason is that after this commit l in address_space_rw can span a
    page boundary, however qemu_get_ram_ptr still calls xen_map_cache asking
    to map a single page (if block->offset == 0).
    
    Fix the issue by reverting to the previous behaviour: do not return a
    length from address_space_translate_internal that can span a page
    boundary.
    
    Also in address_space_translate do not ignore the length returned by
    address_space_translate_internal.
    
    This patch should be backported to QEMU 1.6.x.
    
    Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
    Signed-off-by: Anthony Perard <anthony.perard@citrix.com>
    Tested-by: Paolo Bonzini <pbonzini@redhat.com>
    Acked-by: Paolo Bonzini <pbonzini@redhat.com>
    Cc: qemu-stable@nongnu.org

and the subsequent fix:

commit a87f39543a9259f671c5413723311180ee2ad2a8
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Fri Feb 7 15:47:46 2014 +0100

    memory: fix limiting of translation at a page boundary
    
    Commit 360e607 (address_space_translate: do not cross page boundaries,
    2014-01-30) broke MMIO accesses in cases where the section is shorter
    than the full register width.  This can happen for example with the
    Bochs DISPI registers, which are 16 bits wide but have only a 1-byte
    long MemoryRegion (if you write to the "second byte" of the register
    your access is discarded; it doesn't write only to half of the register).
    
    Restrict the action of commit 360e607 to direct RAM accesses.  This
    is enough for Xen, since MMIO will not go through the mapcache.
    
    Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
    Cc: qemu-stable@nongnu.org
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
    Signed-off-by: Peter Maydell <peter.maydell@linaro.org>


Both of them will be included next week in 1.7.1.

Paolo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] e1000 memory corruption in guest OS
  2014-02-24 14:02 ` Paolo Bonzini
@ 2014-02-24 14:15   ` Hoyer, David
  0 siblings, 0 replies; 12+ messages in thread
From: Hoyer, David @ 2014-02-24 14:15 UTC (permalink / raw)
  To: Paolo Bonzini, Qemu-devel; +Cc: Moyer, Keith, Best, Tish

Yes - we found this late last week and it did fix our issue.   Thanks!

-----Original Message-----
From: Paolo Bonzini [mailto:paolo.bonzini@gmail.com] On Behalf Of Paolo Bonzini
Sent: Monday, February 24, 2014 8:02 AM
To: Hoyer, David; Qemu-devel@nongnu.org
Cc: Moyer, Keith; Best, Tish
Subject: Re: e1000 memory corruption in guest OS

Il 16/02/2014 05:29, Hoyer, David ha scritto:
> We are using Qemu-1.7.0 with Xen-4.3.0 and Debian jessie.   We are
> noticing that when we transfer large files from our network to the 
> guestOS via the e1000 virtual network device that we experience memory
> corruption on the guestOS.   We have debugged this problem and have
> determined where it appears that the corruption is happening and have 
> created a patch file with a fix (at least the corruption is no longer
> happening on our guestOS anymore).     Note that our test file is a
> large file consisting of the value 0x61 repeated over the entire file.

I think you are missing this patch:

commit 360e607b88a23d378f6efaa769c76d26f538234d
Author: Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>
Date:   Thu Jan 30 12:46:05 2014 +0000

    address_space_translate: do not cross page boundaries
    
    The following commit:
    
    commit 149f54b53b7666a3facd45e86eece60ce7d3b114
    Author: Paolo Bonzini <pbonzini@redhat.com>
    Date:   Fri May 24 12:59:37 2013 +0200
    
        memory: add address_space_translate
    
    breaks Xen support in QEMU, in particular the Xen mapcache. The effect
    is that one Windows XP installation out of ten would end up with BSOD.
    
    The reason is that after this commit l in address_space_rw can span a
    page boundary, however qemu_get_ram_ptr still calls xen_map_cache asking
    to map a single page (if block->offset == 0).
    
    Fix the issue by reverting to the previous behaviour: do not return a
    length from address_space_translate_internal that can span a page
    boundary.
    
    Also in address_space_translate do not ignore the length returned by
    address_space_translate_internal.
    
    This patch should be backported to QEMU 1.6.x.
    
    Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
    Signed-off-by: Anthony Perard <anthony.perard@citrix.com>
    Tested-by: Paolo Bonzini <pbonzini@redhat.com>
    Acked-by: Paolo Bonzini <pbonzini@redhat.com>
    Cc: qemu-stable@nongnu.org

and the subsequent fix:

commit a87f39543a9259f671c5413723311180ee2ad2a8
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Fri Feb 7 15:47:46 2014 +0100

    memory: fix limiting of translation at a page boundary
    
    Commit 360e607 (address_space_translate: do not cross page boundaries,
    2014-01-30) broke MMIO accesses in cases where the section is shorter
    than the full register width.  This can happen for example with the
    Bochs DISPI registers, which are 16 bits wide but have only a 1-byte
    long MemoryRegion (if you write to the "second byte" of the register
    your access is discarded; it doesn't write only to half of the register).
    
    Restrict the action of commit 360e607 to direct RAM accesses.  This
    is enough for Xen, since MMIO will not go through the mapcache.
    
    Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
    Cc: qemu-stable@nongnu.org
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
    Signed-off-by: Peter Maydell <peter.maydell@linaro.org>


Both of them will be included next week in 1.7.1.

Paolo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] e1000 memory corruption in guest OS
  2014-02-24  4:20 ` Alexey Kardashevskiy
@ 2014-03-01 12:30   ` Alexey Kardashevskiy
  2014-03-01 21:31     ` Paolo Bonzini
  0 siblings, 1 reply; 12+ messages in thread
From: Alexey Kardashevskiy @ 2014-03-01 12:30 UTC (permalink / raw)
  To: Hoyer, David, Qemu-devel; +Cc: Moyer, Keith, Best, Tish

On 02/24/2014 03:20 PM, Alexey Kardashevskiy wrote:
> On 02/16/2014 03:29 PM, Hoyer, David wrote:
> 
>> We are using Qemu-1.7.0 with Xen-4.3.0 and Debian jessie.   We are
>> noticing that when we transfer large files from our network to the
>> guestOS via the e1000 virtual network device that we experience memory
>> corruption on the guestOS.   We have debugged this problem and have
>> determined where it appears that the corruption is happening and have
>> created a patch file with a fix (at least the corruption is no longer
>> happening on our guestOS anymore).     Note that our test file is a
>> large file consisting of the value 0x61 repeated over the entire file.
>>
> 
>> To troubleshoot this issue, we enabled tracing in qemu and used the
>> xen_map_cache and xen_map_cache_return trace events.  We also added some
>> of our own debug statements in e1000.c before and after the function
>> call to DMA the network packet to the guestOS descriptor address.  Below
>> is a commented summary of the trace output:
>>
> 
>> /*** Check if guestOS address 0xe00000 (which maps to 0x7f15c313f000) is corrupted
>> xen_map_cache want 0xe00000
>> xen_map_cache_return 0x7f15c313f000
>> /*** It wasn't corrupted before the dma write
>> /*** DMA a packet of length 0x5aa containing '0x61616161...' to guestOS at address 0x12ffac2 (which maps to 0x7f15c313eac2)
>> dma write to 12ffac2 len 5aa
>> xen_map_cache want 0x12ffac2
>> xen_map_cache_return 0x7f15c313eac2
>> /*** Check if guestOS address 0xe00000 (which maps to 0x7f15c313f000) is corrupted
>> xen_map_cache want 0xe00000
>> xen_map_cache_return 0x7f15c313f000
>> /*** It is corrupted now.
>> e1000: Corrupted 7: test_buf:5aa 5aa
>>
> 
>> The DMA address 0x12ffac2 mapped to 0x7f15c313eac2.  When you add the
>> packet length, 0x5aa, the result is 0x7f15c313f06c.  This result is 0x6c
>> bytes into the mapping of guestOS address 0xe00000, which mapped to
>> 0x7f15c313f000.  If you dump 0xe00000 in the guestOS, 0x6c bytes are
>> corrupted.
> 
>> We believe that the correct fix is to use qemu_ram_ptr_length instead of
>> qemu_get_ram_ptr in the function address_space_rw to ensure (from what
>> we can tell) that the mapped address is valid for the entire length
>> specified.  It looked like this might also be an issue in
>> cpu_physical_memory_write_rom so we made the change there as well.
>>
> 
> 
> Corrupted DMA buffer is 0x e00000 -- 0x7f15c313f000.
> The e1000 packet is at  0x12ffac2 -- 0x7f15c313eac2.
> 
> (0x7f15c313f000 - 0x7f15c313eac2) = 0x53e which is less than 0x5aa and
> (0x5aa - 0x53e) = 0x6c bytes get corrupted.
> 
> I see here buffer overrun from e1000 and I suspect that your patch just
> hides this problem. What did I miss?


Ping, anyone?



> Does e1000 still work with the patch applied? Are all 100% packets
> delivered fine?
> 
> 
> 
> 
>> We are fairly new to the qemu source base so we are looking to the
>> community to see if this problem has previously been identified and to
>> see if this is the correct fix.
>>
> 
> 
>> Following is the patch
>>
>> --- orig/exec.c 2013-11-27 16:52:55.000000000 -0600
>> +++ new/exec.c  2014-02-15 21:58:34.311518000 -0600
>> @@ -1911,7 +1911,7 @@
>>              } else {
>>                  addr1 += memory_region_get_ram_addr(mr);
>>                  /* RAM case */
>> -                ptr = qemu_get_ram_ptr(addr1);
>> +                ptr = qemu_ram_ptr_length(addr1, &l);
>>                  memcpy(ptr, buf, l);
>>                  invalidate_and_set_dirty(addr1, l);
>>              }
>> @@ -1945,7 +1945,7 @@
>>                  }
>>              } else {
>>                  /* RAM case */
>> -                ptr = qemu_get_ram_ptr(mr->ram_addr + addr1);
>> +                ptr = qemu_ram_ptr_length(mr->ram_addr + addr1, &l);
>>                  memcpy(buf, ptr, l);
>>              }
>>          }
>> @@ -1995,7 +1995,7 @@
>>          } else {
>>              addr1 += memory_region_get_ram_addr(mr);
>>              /* ROM/RAM case */
>> -            ptr = qemu_get_ram_ptr(addr1);
>> +            ptr = qemu_ram_ptr_length(addr1, &l);
>>              memcpy(ptr, buf, l);
>>              invalidate_and_set_dirty(addr1, l);
>>          }
>>
>>
>> David Hoyer
>> Controller Firmware Development
>> Array Products Group
>>
>> NetApp
>> 3718 N. Rock Road
>> Wichita, KS 67226
>> 316-636-8047 phone
>> 316-617-3677 mobile
>> David.Hoyer@netapp.com<mailto:David.Hoyer@lsi.com>
>> netapp.com<http://www.netapp.com/?ref_source=eSig>


-- 
Alexey

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] e1000 memory corruption in guest OS
  2014-03-01 12:30   ` Alexey Kardashevskiy
@ 2014-03-01 21:31     ` Paolo Bonzini
  2014-03-03  1:58       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 12+ messages in thread
From: Paolo Bonzini @ 2014-03-01 21:31 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Hoyer, David, Qemu-devel; +Cc: Moyer, Keith, Best, Tish

Il 01/03/2014 13:30, Alexey Kardashevskiy ha scritto:
>> >
>> > Corrupted DMA buffer is 0x e00000 -- 0x7f15c313f000.
>> > The e1000 packet is at  0x12ffac2 -- 0x7f15c313eac2.
>> >
>> > (0x7f15c313f000 - 0x7f15c313eac2) = 0x53e which is less than 0x5aa and
>> > (0x5aa - 0x53e) = 0x6c bytes get corrupted.
>> >
>> > I see here buffer overrun from e1000 and I suspect that your patch just
>> > hides this problem. What did I miss?
>
> Ping, anyone?

You missed that this is a Xen-specific problem.  Xen maps things a page 
at a time, so address_space_map/unmap/rw can operate only on a small 
part of the requested [address, address+length) range.

So there is no overrun in e1000.  The patch is incomplete, because it 
fixes only address_space_rw, but the problem is indeed in exec.c.

Paolo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] e1000 memory corruption in guest OS
  2014-03-01 21:31     ` Paolo Bonzini
@ 2014-03-03  1:58       ` Alexey Kardashevskiy
  2014-03-03  8:33         ` Paolo Bonzini
  0 siblings, 1 reply; 12+ messages in thread
From: Alexey Kardashevskiy @ 2014-03-03  1:58 UTC (permalink / raw)
  To: Paolo Bonzini, Hoyer, David, Qemu-devel; +Cc: Moyer, Keith, Best, Tish

On 03/02/2014 08:31 AM, Paolo Bonzini wrote:
> Il 01/03/2014 13:30, Alexey Kardashevskiy ha scritto:
>>> >
>>> > Corrupted DMA buffer is 0x e00000 -- 0x7f15c313f000.
>>> > The e1000 packet is at  0x12ffac2 -- 0x7f15c313eac2.
>>> >
>>> > (0x7f15c313f000 - 0x7f15c313eac2) = 0x53e which is less than 0x5aa and
>>> > (0x5aa - 0x53e) = 0x6c bytes get corrupted.
>>> >
>>> > I see here buffer overrun from e1000 and I suspect that your patch just
>>> > hides this problem. What did I miss?
>>
>> Ping, anyone?
> 
> You missed that this is a Xen-specific problem.  Xen maps things a page at
> a time, so address_space_map/unmap/rw can operate only on a small part of
> the requested [address, address+length) range.

Sorry, I am not following you here. Does KVM map things not page-aligned?


> So there is no overrun in e1000.  The patch is incomplete, because it fixes
> only address_space_rw, but the problem is indeed in exec.c.

So you know what the problem is? We have a bunch of bugreports against
e1000 breaking things...


-- 
Alexey

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] e1000 memory corruption in guest OS
  2014-03-03  1:58       ` Alexey Kardashevskiy
@ 2014-03-03  8:33         ` Paolo Bonzini
  2014-03-03 10:47           ` Alexey Kardashevskiy
  0 siblings, 1 reply; 12+ messages in thread
From: Paolo Bonzini @ 2014-03-03  8:33 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Hoyer, David, Qemu-devel; +Cc: Moyer, Keith, Best, Tish

Il 03/03/2014 02:58, Alexey Kardashevskiy ha scritto:
> On 03/02/2014 08:31 AM, Paolo Bonzini wrote:
>> Il 01/03/2014 13:30, Alexey Kardashevskiy ha scritto:
>>>>>
>>>>> Corrupted DMA buffer is 0x e00000 -- 0x7f15c313f000.
>>>>> The e1000 packet is at  0x12ffac2 -- 0x7f15c313eac2.
>>>>>
>>>>> (0x7f15c313f000 - 0x7f15c313eac2) = 0x53e which is less than 0x5aa and
>>>>> (0x5aa - 0x53e) = 0x6c bytes get corrupted.
>>>>>
>>>>> I see here buffer overrun from e1000 and I suspect that your patch just
>>>>> hides this problem. What did I miss?
>>>
>>> Ping, anyone?
>>
>> You missed that this is a Xen-specific problem.  Xen maps things a page at
>> a time, so address_space_map/unmap/rw can operate only on a small part of
>> the requested [address, address+length) range.
>
> Sorry, I am not following you here. Does KVM map things not page-aligned?

Look in exec.c for xen_enabled().  Xen's implementation of 
address_space_map/unmap is completely different.

>> So there is no overrun in e1000.  The patch is incomplete, because it fixes
>> only address_space_rw, but the problem is indeed in exec.c.
>
> So you know what the problem is? We have a bunch of bugreports against
> e1000 breaking things...

This one had been reported and fixed already.

Paolo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] e1000 memory corruption in guest OS
  2014-03-03  8:33         ` Paolo Bonzini
@ 2014-03-03 10:47           ` Alexey Kardashevskiy
  2014-03-03 11:21             ` Paolo Bonzini
  2014-03-03 21:33             ` Don Slutz
  0 siblings, 2 replies; 12+ messages in thread
From: Alexey Kardashevskiy @ 2014-03-03 10:47 UTC (permalink / raw)
  To: Paolo Bonzini, Hoyer, David, Qemu-devel; +Cc: Moyer, Keith, Best, Tish

On 03/03/2014 07:33 PM, Paolo Bonzini wrote:
> Il 03/03/2014 02:58, Alexey Kardashevskiy ha scritto:
>> On 03/02/2014 08:31 AM, Paolo Bonzini wrote:
>>> Il 01/03/2014 13:30, Alexey Kardashevskiy ha scritto:
>>>>>>
>>>>>> Corrupted DMA buffer is 0x e00000 -- 0x7f15c313f000.
>>>>>> The e1000 packet is at  0x12ffac2 -- 0x7f15c313eac2.
>>>>>>
>>>>>> (0x7f15c313f000 - 0x7f15c313eac2) = 0x53e which is less than 0x5aa and
>>>>>> (0x5aa - 0x53e) = 0x6c bytes get corrupted.
>>>>>>
>>>>>> I see here buffer overrun from e1000 and I suspect that your patch just
>>>>>> hides this problem. What did I miss?
>>>>
>>>> Ping, anyone?
>>>
>>> You missed that this is a Xen-specific problem.  Xen maps things a page at
>>> a time, so address_space_map/unmap/rw can operate only on a small part of
>>> the requested [address, address+length) range.
>>
>> Sorry, I am not following you here. Does KVM map things not page-aligned?
> 
> Look in exec.c for xen_enabled().  Xen's implementation of
> address_space_map/unmap is completely different.

Honestly cannot see much difference in the current QEMU...

> 
>>> So there is no overrun in e1000.  The patch is incomplete, because it fixes
>>> only address_space_rw, but the problem is indeed in exec.c.
>>
>> So you know what the problem is? We have a bunch of bugreports against
>> e1000 breaking things...
> 
> This one had been reported and fixed already.

Any hint? :) Could not spot any related to overrun except:
a0ae17a 8 months ago Andrew Jones e1000: cleanup process_tx_desc
but commit message says "complaints are false positives"...


-- 
Alexey

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] e1000 memory corruption in guest OS
  2014-03-03 10:47           ` Alexey Kardashevskiy
@ 2014-03-03 11:21             ` Paolo Bonzini
  2014-03-03 21:33             ` Don Slutz
  1 sibling, 0 replies; 12+ messages in thread
From: Paolo Bonzini @ 2014-03-03 11:21 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Hoyer, David, Qemu-devel; +Cc: Moyer, Keith, Best, Tish

Il 03/03/2014 11:47, Alexey Kardashevskiy ha scritto:
> > > Sorry, I am not following you here. Does KVM map things not page-aligned?
> >
> > Look in exec.c for xen_enabled().  Xen's implementation of
> > address_space_map/unmap is completely different.
>
> Honestly cannot see much difference in the current QEMU...

void *qemu_get_ram_ptr(ram_addr_t addr)
{
     RAMBlock *block = qemu_get_ram_block(addr);

     if (xen_enabled()) {
         /* We need to check if the requested address is in the RAM
          * because we don't want to map the entire memory in QEMU.
          * In that case just map until the end of the page.
          */
         if (block->offset == 0) {
             return xen_map_cache(addr, 0, 0);
         } else if (block->host == NULL) {
             block->host =
                 xen_map_cache(block->offset, block->length, 1);
         }
     }
     return block->host + (addr - block->offset);
}

Paolo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] e1000 memory corruption in guest OS
  2014-03-03 10:47           ` Alexey Kardashevskiy
  2014-03-03 11:21             ` Paolo Bonzini
@ 2014-03-03 21:33             ` Don Slutz
  2014-03-03 22:53               ` Alexey Kardashevskiy
  1 sibling, 1 reply; 12+ messages in thread
From: Don Slutz @ 2014-03-03 21:33 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Paolo Bonzini, Hoyer, David, Qemu-devel
  Cc: Moyer, Keith, Best, Tish

On 03/03/14 05:47, Alexey Kardashevskiy wrote:
> On 03/03/2014 07:33 PM, Paolo Bonzini wrote:
>> Il 03/03/2014 02:58, Alexey Kardashevskiy ha scritto:
>>> On 03/02/2014 08:31 AM, Paolo Bonzini wrote:
>>>> Il 01/03/2014 13:30, Alexey Kardashevskiy ha scritto:
>>>>>>> Corrupted DMA buffer is 0x e00000 -- 0x7f15c313f000.
>>>>>>> The e1000 packet is at  0x12ffac2 -- 0x7f15c313eac2.
>>>>>>>
>>>>>>> (0x7f15c313f000 - 0x7f15c313eac2) = 0x53e which is less than 0x5aa and
>>>>>>> (0x5aa - 0x53e) = 0x6c bytes get corrupted.
>>>>>>>
>>>>>>> I see here buffer overrun from e1000 and I suspect that your patch just
>>>>>>> hides this problem. What did I miss?
>>>>> Ping, anyone?
>>>> You missed that this is a Xen-specific problem.  Xen maps things a page at
>>>> a time, so address_space_map/unmap/rw can operate only on a small part of
>>>> the requested [address, address+length) range.
>>> Sorry, I am not following you here. Does KVM map things not page-aligned?
>> Look in exec.c for xen_enabled().  Xen's implementation of
>> address_space_map/unmap is completely different.
> Honestly cannot see much difference in the current QEMU...
>
>>>> So there is no overrun in e1000.  The patch is incomplete, because it fixes
>>>> only address_space_rw, but the problem is indeed in exec.c.
>>> So you know what the problem is? We have a bunch of bugreports against
>>> e1000 breaking things...
>> This one had been reported and fixed already.
> Any hint? :) Could not spot any related to overrun except:
> a0ae17a 8 months ago Andrew Jones e1000: cleanup process_tx_desc
> but commit message says "complaints are false positives"...
>
>

Looks like you have also ignored:

http://marc.info/?l=qemu-devel&m=139325054822857&w=2
http://marc.info/?l=qemu-devel&m=139325135123159&w=2


     -Don Slutz

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] e1000 memory corruption in guest OS
  2014-03-03 21:33             ` Don Slutz
@ 2014-03-03 22:53               ` Alexey Kardashevskiy
  0 siblings, 0 replies; 12+ messages in thread
From: Alexey Kardashevskiy @ 2014-03-03 22:53 UTC (permalink / raw)
  To: Don Slutz, Paolo Bonzini, Hoyer, David, Qemu-devel
  Cc: Moyer, Keith, Best, Tish

On 03/04/2014 08:33 AM, Don Slutz wrote:
> On 03/03/14 05:47, Alexey Kardashevskiy wrote:
>> On 03/03/2014 07:33 PM, Paolo Bonzini wrote:
>>> Il 03/03/2014 02:58, Alexey Kardashevskiy ha scritto:
>>>> On 03/02/2014 08:31 AM, Paolo Bonzini wrote:
>>>>> Il 01/03/2014 13:30, Alexey Kardashevskiy ha scritto:
>>>>>>>> Corrupted DMA buffer is 0x e00000 -- 0x7f15c313f000.
>>>>>>>> The e1000 packet is at  0x12ffac2 -- 0x7f15c313eac2.
>>>>>>>>
>>>>>>>> (0x7f15c313f000 - 0x7f15c313eac2) = 0x53e which is less than 0x5aa and
>>>>>>>> (0x5aa - 0x53e) = 0x6c bytes get corrupted.
>>>>>>>>
>>>>>>>> I see here buffer overrun from e1000 and I suspect that your patch
>>>>>>>> just
>>>>>>>> hides this problem. What did I miss?
>>>>>> Ping, anyone?
>>>>> You missed that this is a Xen-specific problem.  Xen maps things a
>>>>> page at
>>>>> a time, so address_space_map/unmap/rw can operate only on a small part of
>>>>> the requested [address, address+length) range.
>>>> Sorry, I am not following you here. Does KVM map things not page-aligned?
>>> Look in exec.c for xen_enabled().  Xen's implementation of
>>> address_space_map/unmap is completely different.
>> Honestly cannot see much difference in the current QEMU...
>>
>>>>> So there is no overrun in e1000.  The patch is incomplete, because it
>>>>> fixes
>>>>> only address_space_rw, but the problem is indeed in exec.c.
>>>> So you know what the problem is? We have a bunch of bugreports against
>>>> e1000 breaking things...
>>> This one had been reported and fixed already.
>> Any hint? :) Could not spot any related to overrun except:
>> a0ae17a 8 months ago Andrew Jones e1000: cleanup process_tx_desc
>> but commit message says "complaints are false positives"...
>>
>>
> 
> Looks like you have also ignored:
> 
> http://marc.info/?l=qemu-devel&m=139325054822857&w=2
> http://marc.info/?l=qemu-devel&m=139325135123159&w=2

Oh, thanks!



-- 
Alexey

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2014-03-03 22:53 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-02-16  4:29 [Qemu-devel] e1000 memory corruption in guest OS Hoyer, David
2014-02-24  4:20 ` Alexey Kardashevskiy
2014-03-01 12:30   ` Alexey Kardashevskiy
2014-03-01 21:31     ` Paolo Bonzini
2014-03-03  1:58       ` Alexey Kardashevskiy
2014-03-03  8:33         ` Paolo Bonzini
2014-03-03 10:47           ` Alexey Kardashevskiy
2014-03-03 11:21             ` Paolo Bonzini
2014-03-03 21:33             ` Don Slutz
2014-03-03 22:53               ` Alexey Kardashevskiy
2014-02-24 14:02 ` Paolo Bonzini
2014-02-24 14:15   ` Hoyer, David

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.