All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stefan Hajnoczi <stefanha@gmail.com>
To: ASM <asm@asm.pp.ru>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	dmitry.fleytman@gmail.com, qemu-devel@nongnu.org,
	"Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: PCI memory sync question (kvm,dpdk,e1000,packet stalled)
Date: Thu, 19 Dec 2019 14:58:17 +0000	[thread overview]
Message-ID: <20191219145817.GG1624084@stefanha-x1.localdomain> (raw)
In-Reply-To: <CAMmAVbW5a+v_dJ6NM3erwouOqpXyTzL36_W566SL1KuPFPFSEw@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 6555 bytes --]

On Wed, Nov 27, 2019 at 03:39:03PM +0300, ASM wrote:
> When the packet is received, e1000 writes it to memory directrly
> without any RCU.
> The address of memory for writing is set by the driver from dpdk driver.
> Driver writes to RDBA (RDBAH,RDBAL) base address of ring.
> 
> It turns out that MMIO RCU (mentioned from e1000_mmio_setup) does not
> protect, and can't protect the ring descriptors.
> The area for protection may be any area of operational memory. And it
> becomes famous when writing to registers RDBA by driver.
> (see datasheet 82574 GbE Controller "7.1.8 Receive Descriptor Queue Structure")
> 
> How can this memory be protected? As I understand it, the e1000 should
> track the record in RDBA and enable memory protection in this region.
> But how to do it right?

I misunderstood the issue and you can probably ignore my comments about
coalesced MMIO.  You quoted descriptor DMA code below so coalesced MMIO
shouldn't be relevant since desc->status isn't an MMIO register.

> 
> Source code qemu:
> hw/net/e1000.c:954 (version master)
> 
>  954         base = rx_desc_base(s) + sizeof(desc) * s->mac_reg[RDH];
> where rx_desc_base -- address RDBAH regs. It address no have RCU protect.
> ...
> 955         pci_dma_read(d, base, &desc, sizeof(desc));
> ...
> 957         desc.status |= (vlan_status | E1000_RXD_STAT_DD);
> ...
> 990         pci_dma_write(d, base, &desc, sizeof(desc));
> ->
> exec.c:
> 3111 static MemTxResult flatview_write_continue(FlatView *fv, hwaddr addr,
> 3112                                            MemTxAttrs attrs,
> 3113                                            const uint8_t *buf,
> 3114                                            hwaddr len, hwaddr addr1,
> 3115                                            hwaddr l, MemoryRegion *mr)
> 3116 {
> ...
> 3123         if (!memory_access_is_direct(mr, true)) {
> (false)
> 3131         } else {
> 3132             /* RAM case */
> 3133             ptr = qemu_ram_ptr_length(mr->ram_block, addr1, &l, false);
> 3134             memcpy(ptr, buf, l);
> 
> where I be seeing weird behavior with KVM due to MMIO write coalescing
> 
> 3135             invalidate_and_set_dirty(mr, addr1, l);
> 3136         }
> 3137
> 
> Source code dpdk(e1000): (version dpdk-stable-17.11.9)
> drivers/net/e1000/em_rxtx.c:
> 
> 699 uint16_t
> 700 eth_em_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
> 701                 uint16_t nb_pkts)
> ...
> 718         rxq = rx_queue
> ...
> 722         rx_id = rxq->rx_tail;
> 723         rx_ring = rxq->rx_ring
> ...
> 734                 rxdp = &rx_ring[rx_id];
> 735                 status = rxdp->status;
> 736                 if (! (status & E1000_RXD_STAT_DD))
> 737                         break;
> ...
> 807                 rxdp->buffer_addr = dma_addr;
> 808                 rxdp->status = 0;
> where I be seeing weird behavior with KVM due to MMIO write
> coalescing

It could be a bug in QEMU's e1000 emulation - maybe it's not doing
things in the correct order and causes a race condition with the DPDK
polling driver - or it could be a bug in the DPDK e1000 driver regarding
the order in which the descriptor ring and RX Head/Tail MMIO registers
are updated.

Did you find the root cause?

> P.S.
> > Also, is DPDK accessing the e1000 device from more than 1 vCPU?
>  All tests on single virtual CPU.
> 
> I created github project for quick reproduction of this error:
> https://github.com/BASM/qemu_dpdk_e1000_test
> 
> ---
> Best regards,
> Leonid Myravjev
> 
> On Thu, 21 Nov 2019 at 17:05, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> >
> > On Wed, Nov 20, 2019 at 08:36:32PM +0300, ASM wrote:
> > > I trying solve the problem, with packets stopping (e1000,tap,kvm).
> > > My studies led to the following:
> > > 1. From flatview_write_continue() I see, what e1000 writes the number
> > > "7" to the STAT register.
> > > 2. The driver from target OS reads STAT register with number "7" and
> > > writes to the register the number "0".
> > > 3. From flatview_write_continue() (I make edits):
> > >             memcpy(ptr, buf, l);
> > >             new1=ptr[0xc];
> > >             usleep(100);
> > >             new2=ptr[0xc];
> > >             invalidate_and_set_dirty(mr, addr1, l);
> > >             new3=ptr[0xc];
> > > printf("Old: %i, new1, %i, new2: %i, new3: %i\n", old,new1,new2,new3);
> > >
> > > I see what memory in first printf is "7", but after usleep() is "0".
> > > Do I understand correctly that this should not be? Or RCU lock
> > > suggests the ability to the multiple writers?
> > >
> > > The problem is that qemu(e1000) writes the number 7, after which
> > > target(dpdk driver) reads 7, on the basis of this it writes the number
> > > 0, but as a result (extremely rarely), the value STATUS still remains
> > > 7. Therefore, packet processing is interrupted. This behavior is
> > > observed only on kvm (it is not observed on tcg).
> > >
> > > Please help with advice or ideas.
> >
> > Hi Leonid,
> > Could you be seeing weird behavior with KVM due to MMIO write
> > coalescing?
> >
> >   static void e1000_mmio_setup(E1000State *d)
> >   {
> >       int i;
> >       const uint32_t excluded_regs[] = {
> >           E1000_MDIC, E1000_ICR, E1000_ICS, E1000_IMS,
> >           E1000_IMC, E1000_TCTL, E1000_TDT, PNPMMIO_SIZE
> >       };
> >
> >       memory_region_init_io(&d->mmio, OBJECT(d), &e1000_mmio_ops, d,
> >                             "e1000-mmio", PNPMMIO_SIZE);
> >       memory_region_add_coalescing(&d->mmio, 0, excluded_regs[0]);
> >       for (i = 0; excluded_regs[i] != PNPMMIO_SIZE; i++)
> >           memory_region_add_coalescing(&d->mmio, excluded_regs[i] + 4,
> >                                        excluded_regs[i+1] - excluded_regs[i] - 4);
> >       memory_region_init_io(&d->io, OBJECT(d), &e1000_io_ops, d, "e1000-io", IOPORT_SIZE);
> >   }
> >
> > MMIO write coalescing means that QEMU doesn't see the register writes
> > immediately.  Instead kvm.ko records them into a ring buffer and QEMU
> > processes the ring when the next ioctl(KVM_RUN) exit occurs.
> >
> > See Linux Documentation/virt/kvm/api.txt "4.116
> > KVM_(UN)REGISTER_COALESCED_MMIO" for more details.
> >
> > I don't really understand your printf debugging explanation.  It would
> > help to see the DPDK code and the exact printf() output.
> >
> > Also, is DPDK accessing the e1000 device from more than 1 vCPU?
> >
> > Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  reply	other threads:[~2019-12-19 14:59 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-20 14:40 [dpdk-dev] PCI memory sync (kvm,dpdk,e1000,packet stalled) ASM
2019-11-20 17:36 ` PCI memory sync question " ASM
2019-11-21 14:05   ` Stefan Hajnoczi
2019-11-27 12:39     ` ASM
2019-12-19 14:58       ` Stefan Hajnoczi [this message]
2019-12-30 10:10         ` ASM
2020-01-02 11:05           ` Stefan Hajnoczi
2022-05-23 12:47             ` Ding Hui

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191219145817.GG1624084@stefanha-x1.localdomain \
    --to=stefanha@gmail.com \
    --cc=asm@asm.pp.ru \
    --cc=dmitry.fleytman@gmail.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.