From: Stefan Hajnoczi <stefanha@gmail.com>
To: ASM <asm@asm.pp.ru>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
dmitry.fleytman@gmail.com, qemu-devel@nongnu.org,
"Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: PCI memory sync question (kvm,dpdk,e1000,packet stalled)
Date: Thu, 19 Dec 2019 14:58:17 +0000 [thread overview]
Message-ID: <20191219145817.GG1624084@stefanha-x1.localdomain> (raw)
In-Reply-To: <CAMmAVbW5a+v_dJ6NM3erwouOqpXyTzL36_W566SL1KuPFPFSEw@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 6555 bytes --]
On Wed, Nov 27, 2019 at 03:39:03PM +0300, ASM wrote:
> When the packet is received, e1000 writes it to memory directrly
> without any RCU.
> The address of memory for writing is set by the driver from dpdk driver.
> Driver writes to RDBA (RDBAH,RDBAL) base address of ring.
>
> It turns out that MMIO RCU (mentioned from e1000_mmio_setup) does not
> protect, and can't protect the ring descriptors.
> The area for protection may be any area of operational memory. And it
> becomes famous when writing to registers RDBA by driver.
> (see datasheet 82574 GbE Controller "7.1.8 Receive Descriptor Queue Structure")
>
> How can this memory be protected? As I understand it, the e1000 should
> track the record in RDBA and enable memory protection in this region.
> But how to do it right?
I misunderstood the issue and you can probably ignore my comments about
coalesced MMIO. You quoted descriptor DMA code below so coalesced MMIO
shouldn't be relevant since desc->status isn't an MMIO register.
>
> Source code qemu:
> hw/net/e1000.c:954 (version master)
>
> 954 base = rx_desc_base(s) + sizeof(desc) * s->mac_reg[RDH];
> where rx_desc_base -- address RDBAH regs. It address no have RCU protect.
> ...
> 955 pci_dma_read(d, base, &desc, sizeof(desc));
> ...
> 957 desc.status |= (vlan_status | E1000_RXD_STAT_DD);
> ...
> 990 pci_dma_write(d, base, &desc, sizeof(desc));
> ->
> exec.c:
> 3111 static MemTxResult flatview_write_continue(FlatView *fv, hwaddr addr,
> 3112 MemTxAttrs attrs,
> 3113 const uint8_t *buf,
> 3114 hwaddr len, hwaddr addr1,
> 3115 hwaddr l, MemoryRegion *mr)
> 3116 {
> ...
> 3123 if (!memory_access_is_direct(mr, true)) {
> (false)
> 3131 } else {
> 3132 /* RAM case */
> 3133 ptr = qemu_ram_ptr_length(mr->ram_block, addr1, &l, false);
> 3134 memcpy(ptr, buf, l);
>
> where I be seeing weird behavior with KVM due to MMIO write coalescing
>
> 3135 invalidate_and_set_dirty(mr, addr1, l);
> 3136 }
> 3137
>
> Source code dpdk(e1000): (version dpdk-stable-17.11.9)
> drivers/net/e1000/em_rxtx.c:
>
> 699 uint16_t
> 700 eth_em_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
> 701 uint16_t nb_pkts)
> ...
> 718 rxq = rx_queue
> ...
> 722 rx_id = rxq->rx_tail;
> 723 rx_ring = rxq->rx_ring
> ...
> 734 rxdp = &rx_ring[rx_id];
> 735 status = rxdp->status;
> 736 if (! (status & E1000_RXD_STAT_DD))
> 737 break;
> ...
> 807 rxdp->buffer_addr = dma_addr;
> 808 rxdp->status = 0;
> where I be seeing weird behavior with KVM due to MMIO write
> coalescing
It could be a bug in QEMU's e1000 emulation - maybe it's not doing
things in the correct order and causes a race condition with the DPDK
polling driver - or it could be a bug in the DPDK e1000 driver regarding
the order in which the descriptor ring and RX Head/Tail MMIO registers
are updated.
Did you find the root cause?
> P.S.
> > Also, is DPDK accessing the e1000 device from more than 1 vCPU?
> All tests on single virtual CPU.
>
> I created github project for quick reproduction of this error:
> https://github.com/BASM/qemu_dpdk_e1000_test
>
> ---
> Best regards,
> Leonid Myravjev
>
> On Thu, 21 Nov 2019 at 17:05, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> >
> > On Wed, Nov 20, 2019 at 08:36:32PM +0300, ASM wrote:
> > > I trying solve the problem, with packets stopping (e1000,tap,kvm).
> > > My studies led to the following:
> > > 1. From flatview_write_continue() I see, what e1000 writes the number
> > > "7" to the STAT register.
> > > 2. The driver from target OS reads STAT register with number "7" and
> > > writes to the register the number "0".
> > > 3. From flatview_write_continue() (I make edits):
> > > memcpy(ptr, buf, l);
> > > new1=ptr[0xc];
> > > usleep(100);
> > > new2=ptr[0xc];
> > > invalidate_and_set_dirty(mr, addr1, l);
> > > new3=ptr[0xc];
> > > printf("Old: %i, new1, %i, new2: %i, new3: %i\n", old,new1,new2,new3);
> > >
> > > I see what memory in first printf is "7", but after usleep() is "0".
> > > Do I understand correctly that this should not be? Or RCU lock
> > > suggests the ability to the multiple writers?
> > >
> > > The problem is that qemu(e1000) writes the number 7, after which
> > > target(dpdk driver) reads 7, on the basis of this it writes the number
> > > 0, but as a result (extremely rarely), the value STATUS still remains
> > > 7. Therefore, packet processing is interrupted. This behavior is
> > > observed only on kvm (it is not observed on tcg).
> > >
> > > Please help with advice or ideas.
> >
> > Hi Leonid,
> > Could you be seeing weird behavior with KVM due to MMIO write
> > coalescing?
> >
> > static void e1000_mmio_setup(E1000State *d)
> > {
> > int i;
> > const uint32_t excluded_regs[] = {
> > E1000_MDIC, E1000_ICR, E1000_ICS, E1000_IMS,
> > E1000_IMC, E1000_TCTL, E1000_TDT, PNPMMIO_SIZE
> > };
> >
> > memory_region_init_io(&d->mmio, OBJECT(d), &e1000_mmio_ops, d,
> > "e1000-mmio", PNPMMIO_SIZE);
> > memory_region_add_coalescing(&d->mmio, 0, excluded_regs[0]);
> > for (i = 0; excluded_regs[i] != PNPMMIO_SIZE; i++)
> > memory_region_add_coalescing(&d->mmio, excluded_regs[i] + 4,
> > excluded_regs[i+1] - excluded_regs[i] - 4);
> > memory_region_init_io(&d->io, OBJECT(d), &e1000_io_ops, d, "e1000-io", IOPORT_SIZE);
> > }
> >
> > MMIO write coalescing means that QEMU doesn't see the register writes
> > immediately. Instead kvm.ko records them into a ring buffer and QEMU
> > processes the ring when the next ioctl(KVM_RUN) exit occurs.
> >
> > See Linux Documentation/virt/kvm/api.txt "4.116
> > KVM_(UN)REGISTER_COALESCED_MMIO" for more details.
> >
> > I don't really understand your printf debugging explanation. It would
> > help to see the DPDK code and the exact printf() output.
> >
> > Also, is DPDK accessing the e1000 device from more than 1 vCPU?
> >
> > Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
next prev parent reply other threads:[~2019-12-19 14:59 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <CAMmAVbWzrYWZBXwKxSd-f5SXmq6qP1ok8abvyKJhp3=REEaMPA@mail.gmail.com>
2019-11-20 17:36 ` PCI memory sync question (kvm,dpdk,e1000,packet stalled) ASM
2019-11-21 14:05 ` Stefan Hajnoczi
2019-11-27 12:39 ` ASM
2019-12-19 14:58 ` Stefan Hajnoczi [this message]
2019-12-30 10:10 ` ASM
2020-01-02 11:05 ` Stefan Hajnoczi
2022-05-23 12:47 ` Ding Hui
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20191219145817.GG1624084@stefanha-x1.localdomain \
--to=stefanha@gmail.com \
--cc=asm@asm.pp.ru \
--cc=dmitry.fleytman@gmail.com \
--cc=mst@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).