qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Stefan Hajnoczi <stefanha@gmail.com>
To: ASM <asm@asm.pp.ru>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	dmitry.fleytman@gmail.com, qemu-devel@nongnu.org,
	"Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: PCI memory sync question (kvm,dpdk,e1000,packet stalled)
Date: Thu, 19 Dec 2019 14:58:17 +0000	[thread overview]
Message-ID: <20191219145817.GG1624084@stefanha-x1.localdomain> (raw)
In-Reply-To: <CAMmAVbW5a+v_dJ6NM3erwouOqpXyTzL36_W566SL1KuPFPFSEw@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 6555 bytes --]

On Wed, Nov 27, 2019 at 03:39:03PM +0300, ASM wrote:
> When the packet is received, e1000 writes it to memory directrly
> without any RCU.
> The address of memory for writing is set by the driver from dpdk driver.
> Driver writes to RDBA (RDBAH,RDBAL) base address of ring.
> 
> It turns out that MMIO RCU (mentioned from e1000_mmio_setup) does not
> protect, and can't protect the ring descriptors.
> The area for protection may be any area of operational memory. And it
> becomes famous when writing to registers RDBA by driver.
> (see datasheet 82574 GbE Controller "7.1.8 Receive Descriptor Queue Structure")
> 
> How can this memory be protected? As I understand it, the e1000 should
> track the record in RDBA and enable memory protection in this region.
> But how to do it right?

I misunderstood the issue and you can probably ignore my comments about
coalesced MMIO.  You quoted descriptor DMA code below so coalesced MMIO
shouldn't be relevant since desc->status isn't an MMIO register.

> 
> Source code qemu:
> hw/net/e1000.c:954 (version master)
> 
>  954         base = rx_desc_base(s) + sizeof(desc) * s->mac_reg[RDH];
> where rx_desc_base -- address RDBAH regs. It address no have RCU protect.
> ...
> 955         pci_dma_read(d, base, &desc, sizeof(desc));
> ...
> 957         desc.status |= (vlan_status | E1000_RXD_STAT_DD);
> ...
> 990         pci_dma_write(d, base, &desc, sizeof(desc));
> ->
> exec.c:
> 3111 static MemTxResult flatview_write_continue(FlatView *fv, hwaddr addr,
> 3112                                            MemTxAttrs attrs,
> 3113                                            const uint8_t *buf,
> 3114                                            hwaddr len, hwaddr addr1,
> 3115                                            hwaddr l, MemoryRegion *mr)
> 3116 {
> ...
> 3123         if (!memory_access_is_direct(mr, true)) {
> (false)
> 3131         } else {
> 3132             /* RAM case */
> 3133             ptr = qemu_ram_ptr_length(mr->ram_block, addr1, &l, false);
> 3134             memcpy(ptr, buf, l);
> 
> where I be seeing weird behavior with KVM due to MMIO write coalescing
> 
> 3135             invalidate_and_set_dirty(mr, addr1, l);
> 3136         }
> 3137
> 
> Source code dpdk(e1000): (version dpdk-stable-17.11.9)
> drivers/net/e1000/em_rxtx.c:
> 
> 699 uint16_t
> 700 eth_em_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
> 701                 uint16_t nb_pkts)
> ...
> 718         rxq = rx_queue
> ...
> 722         rx_id = rxq->rx_tail;
> 723         rx_ring = rxq->rx_ring
> ...
> 734                 rxdp = &rx_ring[rx_id];
> 735                 status = rxdp->status;
> 736                 if (! (status & E1000_RXD_STAT_DD))
> 737                         break;
> ...
> 807                 rxdp->buffer_addr = dma_addr;
> 808                 rxdp->status = 0;
> where I be seeing weird behavior with KVM due to MMIO write
> coalescing

It could be a bug in QEMU's e1000 emulation - maybe it's not doing
things in the correct order and causes a race condition with the DPDK
polling driver - or it could be a bug in the DPDK e1000 driver regarding
the order in which the descriptor ring and RX Head/Tail MMIO registers
are updated.

Did you find the root cause?

> P.S.
> > Also, is DPDK accessing the e1000 device from more than 1 vCPU?
>  All tests on single virtual CPU.
> 
> I created github project for quick reproduction of this error:
> https://github.com/BASM/qemu_dpdk_e1000_test
> 
> ---
> Best regards,
> Leonid Myravjev
> 
> On Thu, 21 Nov 2019 at 17:05, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> >
> > On Wed, Nov 20, 2019 at 08:36:32PM +0300, ASM wrote:
> > > I trying solve the problem, with packets stopping (e1000,tap,kvm).
> > > My studies led to the following:
> > > 1. From flatview_write_continue() I see, what e1000 writes the number
> > > "7" to the STAT register.
> > > 2. The driver from target OS reads STAT register with number "7" and
> > > writes to the register the number "0".
> > > 3. From flatview_write_continue() (I make edits):
> > >             memcpy(ptr, buf, l);
> > >             new1=ptr[0xc];
> > >             usleep(100);
> > >             new2=ptr[0xc];
> > >             invalidate_and_set_dirty(mr, addr1, l);
> > >             new3=ptr[0xc];
> > > printf("Old: %i, new1, %i, new2: %i, new3: %i\n", old,new1,new2,new3);
> > >
> > > I see what memory in first printf is "7", but after usleep() is "0".
> > > Do I understand correctly that this should not be? Or RCU lock
> > > suggests the ability to the multiple writers?
> > >
> > > The problem is that qemu(e1000) writes the number 7, after which
> > > target(dpdk driver) reads 7, on the basis of this it writes the number
> > > 0, but as a result (extremely rarely), the value STATUS still remains
> > > 7. Therefore, packet processing is interrupted. This behavior is
> > > observed only on kvm (it is not observed on tcg).
> > >
> > > Please help with advice or ideas.
> >
> > Hi Leonid,
> > Could you be seeing weird behavior with KVM due to MMIO write
> > coalescing?
> >
> >   static void e1000_mmio_setup(E1000State *d)
> >   {
> >       int i;
> >       const uint32_t excluded_regs[] = {
> >           E1000_MDIC, E1000_ICR, E1000_ICS, E1000_IMS,
> >           E1000_IMC, E1000_TCTL, E1000_TDT, PNPMMIO_SIZE
> >       };
> >
> >       memory_region_init_io(&d->mmio, OBJECT(d), &e1000_mmio_ops, d,
> >                             "e1000-mmio", PNPMMIO_SIZE);
> >       memory_region_add_coalescing(&d->mmio, 0, excluded_regs[0]);
> >       for (i = 0; excluded_regs[i] != PNPMMIO_SIZE; i++)
> >           memory_region_add_coalescing(&d->mmio, excluded_regs[i] + 4,
> >                                        excluded_regs[i+1] - excluded_regs[i] - 4);
> >       memory_region_init_io(&d->io, OBJECT(d), &e1000_io_ops, d, "e1000-io", IOPORT_SIZE);
> >   }
> >
> > MMIO write coalescing means that QEMU doesn't see the register writes
> > immediately.  Instead kvm.ko records them into a ring buffer and QEMU
> > processes the ring when the next ioctl(KVM_RUN) exit occurs.
> >
> > See Linux Documentation/virt/kvm/api.txt "4.116
> > KVM_(UN)REGISTER_COALESCED_MMIO" for more details.
> >
> > I don't really understand your printf debugging explanation.  It would
> > help to see the DPDK code and the exact printf() output.
> >
> > Also, is DPDK accessing the e1000 device from more than 1 vCPU?
> >
> > Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  reply	other threads:[~2019-12-19 14:59 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CAMmAVbWzrYWZBXwKxSd-f5SXmq6qP1ok8abvyKJhp3=REEaMPA@mail.gmail.com>
2019-11-20 17:36 ` PCI memory sync question (kvm,dpdk,e1000,packet stalled) ASM
2019-11-21 14:05   ` Stefan Hajnoczi
2019-11-27 12:39     ` ASM
2019-12-19 14:58       ` Stefan Hajnoczi [this message]
2019-12-30 10:10         ` ASM
2020-01-02 11:05           ` Stefan Hajnoczi
2022-05-23 12:47             ` Ding Hui

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191219145817.GG1624084@stefanha-x1.localdomain \
    --to=stefanha@gmail.com \
    --cc=asm@asm.pp.ru \
    --cc=dmitry.fleytman@gmail.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).