All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jason Wang <jasowang@redhat.com>
To: qemu-devel@nongnu.org, peter.maydell@linaro.org
Cc: Ding Hui <dinghui@sangfor.com.cn>,
	Leonid Myravjev <asm@asm.pp.ru>,
	Stefan Hajnoczi <stefanha@gmail.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	"Michael S . Tsirkin" <mst@redhat.com>,
	qemu-stable@nongnu.org, Jing Zhang <zhangjing@sangfor.com.cn>,
	Frank Lee <lifan38153@sangfor.com.cn>,
	Jason Wang <jasowang@redhat.com>
Subject: [PULL 1/2] e1000: set RX descriptor status in a separate operation
Date: Wed,  6 Jul 2022 11:47:05 +0800	[thread overview]
Message-ID: <20220706034706.36620-2-jasowang@redhat.com> (raw)
In-Reply-To: <20220706034706.36620-1-jasowang@redhat.com>

From: Ding Hui <dinghui@sangfor.com.cn>

The code of setting RX descriptor status field maybe work fine in
previously, however with the update of glibc version, it shows two
issues when guest using dpdk receive packets:

  1. The dpdk has a certain probability getting wrong buffer_addr

     this impact may be not obvious, such as lost a packet once in
     a while

  2. The dpdk may consume a packet twice when scan the RX desc queue
     over again

     this impact will lead a infinite wait in Qemu, since the RDT
     (tail pointer) be inscreased to equal to RDH by unexpected,
     which regard as the RX desc queue is full

Write a whole of RX desc with DD flag on is not quite correct, because
when the underlying implementation of memcpy using XMM registers to
copy e1000_rx_desc (when AVX or something else CPU feature is usable),
the bytes order of desc writing to memory is indeterminacy

We can use full-scale test case to reproduce the issue-2 by
https://github.com/BASM/qemu_dpdk_e1000_test (thanks to Leonid Myravjev)

I also write a POC test case at https://github.com/cdkey/e1000_poc
which can reproduce both of them, and easy to verify the patch effect.

The hw watchpoint also shows that, when Qemu using XMM related instructions
writing 16 bytes e1000_rx_desc, concurrent with DPDK using movb
writing 1 byte status, the final result of writing to memory will be one
of them, if it made by Qemu which DD flag is on, DPDK will consume it
again.

Setting DD status in a separate operation, can prevent the impact of
disorder memory writing by memcpy, also avoid unexpected data when
concurrent writing status by qemu and guest dpdk.

Links: https://lore.kernel.org/qemu-devel/20200102110504.GG121208@stefanha-x1.localdomain/T/

Reported-by: Leonid Myravjev <asm@asm.pp.ru>
Cc: Stefan Hajnoczi <stefanha@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: qemu-stable@nongnu.org
Tested-by: Jing Zhang <zhangjing@sangfor.com.cn>
Reviewed-by: Frank Lee <lifan38153@sangfor.com.cn>
Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/net/e1000.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index f5bc812..e26e0a6 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -979,7 +979,7 @@ e1000_receive_iov(NetClientState *nc, const struct iovec *iov, int iovcnt)
         base = rx_desc_base(s) + sizeof(desc) * s->mac_reg[RDH];
         pci_dma_read(d, base, &desc, sizeof(desc));
         desc.special = vlan_special;
-        desc.status |= (vlan_status | E1000_RXD_STAT_DD);
+        desc.status &= ~E1000_RXD_STAT_DD;
         if (desc.buffer_addr) {
             if (desc_offset < size) {
                 size_t iov_copy;
@@ -1013,6 +1013,9 @@ e1000_receive_iov(NetClientState *nc, const struct iovec *iov, int iovcnt)
             DBGOUT(RX, "Null RX descriptor!!\n");
         }
         pci_dma_write(d, base, &desc, sizeof(desc));
+        desc.status |= (vlan_status | E1000_RXD_STAT_DD);
+        pci_dma_write(d, base + offsetof(struct e1000_rx_desc, status),
+                      &desc.status, sizeof(desc.status));
 
         if (++s->mac_reg[RDH] * sizeof(desc) >= s->mac_reg[RDLEN])
             s->mac_reg[RDH] = 0;
-- 
2.7.4



  reply	other threads:[~2022-07-06  3:49 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-06  3:47 [PULL 0/2] Net patches Jason Wang
2022-07-06  3:47 ` Jason Wang [this message]
2022-07-06  3:47 ` [PULL 2/2] ebpf: replace deprecated bpf_program__set_socket_filter Jason Wang
2022-07-06  6:50 ` [PULL 0/2] Net patches Richard Henderson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220706034706.36620-2-jasowang@redhat.com \
    --to=jasowang@redhat.com \
    --cc=asm@asm.pp.ru \
    --cc=dinghui@sangfor.com.cn \
    --cc=lifan38153@sangfor.com.cn \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-stable@nongnu.org \
    --cc=stefanha@gmail.com \
    --cc=zhangjing@sangfor.com.cn \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.