From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755260AbdDNWuy (ORCPT );
	Fri, 14 Apr 2017 18:50:54 -0400
Received: from mx1.redhat.com ([209.132.183.28]:46280 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751462AbdDNWuw (ORCPT );
	Fri, 14 Apr 2017 18:50:52 -0400
DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 46A7585542
Authentication-Results: ext-mx04.extmail.prod.ext.phx2.redhat.com;
	dmarc=none (p=none dis=none) header.from=redhat.com
Authentication-Results: ext-mx04.extmail.prod.ext.phx2.redhat.com;
	spf=pass smtp.mailfrom=mst@redhat.com
DKIM-Filter: OpenDKIM Filter v2.11.0 mx1.redhat.com 46A7585542
Date: Sat, 15 Apr 2017 01:50:48 +0300
From: "Michael S. Tsirkin" 
To: Jason Wang 
Cc: linux-kernel@vger.kernel.org, brouer@redhat.com
Subject: Re: [PATCH 1/3] ptr_ring: batch ring zeroing
Message-ID: <20170415014859-mutt-send-email-mst@kernel.org>
References: <1491544049-19108-1-git-send-email-mst@redhat.com>
 <19f56d99-279a-5a8b-39a7-1017a3cb4bdd@redhat.com>
 <97915bd0-ef22-4e67-1b18-3522e2369fda@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <97915bd0-ef22-4e67-1b18-3522e2369fda@redhat.com>
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
	(mx1.redhat.com [10.5.110.28]);
	Fri, 14 Apr 2017 22:50:51 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Apr 14, 2017 at 03:52:23PM +0800, Jason Wang wrote:
> 
> 
> On 2017-04-12 16:03, Jason Wang wrote:
> > 
> > 
> > On 2017-04-07 13:49, Michael S. Tsirkin wrote:
> > > A known weakness in the ptr_ring design is that it does not handle
> > > the almost-full case well: as entries are consumed they are
> > > immediately reused by the producer, so consumer and producer end up
> > > writing to a shared cache line.
> > >
> > > To fix this, add batching to consume calls: as entries are
> > > consumed, do not write NULL into the ring until we get a multiple
> > > (in the current implementation, 2x) of cache lines away from the
> > > producer. At that point, write them all out.
> > >
> > > We do the write-out in reverse order to keep the producer from
> > > sharing a cache line with the consumer for as long as possible.
> > >
> > > Write-out also triggers when the ring wraps around - there's no
> > > special reason to do this, but it helps keep the code a bit
> > > simpler.
> > >
> > > What should we do if getting 2 cache lines away from the producer
> > > would mean keeping the ring more than half empty? Maybe we should
> > > reduce the batching in this case; the current patch simply reduces
> > > the batching.
> > >
> > > Notes:
> > > - it is no longer true that a call to consume guarantees that the
> > >   following call to produce will succeed. No users seem to assume
> > >   that.
> > > - batching can also, in theory, reduce the signalling rate: users
> > >   that would previously send interrupts to the producer to wake it
> > >   up after consuming each entry would now only need to do this once
> > >   per batch. Doing this would be easy by returning a flag to the
> > >   caller. No users seem to do signalling on consume yet, so this
> > >   was not implemented.
> > >
> > > Signed-off-by: Michael S. Tsirkin 
> > > ---
> > >
> > > Jason, I am curious whether the following gives you some of the
> > > performance boost that you see with the vhost batching patches. Is
> > > vhost batching on top still helpful?
> >
> > The patch looks good to me; I will run a test with the vhost
> > batching patches.
> >
> > Thanks
> 
> Still helpful:
> 
> before this patch: 1.84 Mpps
> with this patch: 2.00 Mpps
> with batch dequeuing: 2.30 Mpps

Just a thought: could you test dropping the consumer spinlock
completely? Just around the peek?

As I said previously, the perf c2c tool should be helpful for locating
cache-related sources of latency.
> Acked-by: Jason Wang 
> 
> Thanks