From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755260AbdDNWuy (ORCPT );
	Fri, 14 Apr 2017 18:50:54 -0400
Received: from mx1.redhat.com ([209.132.183.28]:46280 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751462AbdDNWuw (ORCPT );
	Fri, 14 Apr 2017 18:50:52 -0400
DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 46A7585542
Authentication-Results: ext-mx04.extmail.prod.ext.phx2.redhat.com;
	dmarc=none (p=none dis=none) header.from=redhat.com
Authentication-Results: ext-mx04.extmail.prod.ext.phx2.redhat.com;
	spf=pass smtp.mailfrom=mst@redhat.com
DKIM-Filter: OpenDKIM Filter v2.11.0 mx1.redhat.com 46A7585542
Date: Sat, 15 Apr 2017 01:50:48 +0300
From: "Michael S. Tsirkin" 
To: Jason Wang 
Cc: linux-kernel@vger.kernel.org, brouer@redhat.com
Subject: Re: [PATCH 1/3] ptr_ring: batch ring zeroing
Message-ID: <20170415014859-mutt-send-email-mst@kernel.org>
References: <1491544049-19108-1-git-send-email-mst@redhat.com>
 <19f56d99-279a-5a8b-39a7-1017a3cb4bdd@redhat.com>
 <97915bd0-ef22-4e67-1b18-3522e2369fda@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <97915bd0-ef22-4e67-1b18-3522e2369fda@redhat.com>
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
	(mx1.redhat.com [10.5.110.28]);
	Fri, 14 Apr 2017 22:50:51 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Apr 14, 2017 at 03:52:23PM +0800, Jason Wang wrote:
> 
> 
> On 2017-04-12 16:03, Jason Wang wrote:
> > 
> > 
> > On 2017-04-07 13:49, Michael S. Tsirkin wrote:
> > > A known weakness in the ptr_ring design is that it does not handle
> > > the almost-full case well: as entries are consumed they are
> > > immediately reused by the producer, so consumer and producer end up
> > > writing to a shared cache line.
> > >
> > > To fix this, add batching to consume calls: as entries are
> > > consumed, do not write NULL into the ring until we get a multiple
> > > (in the current implementation, 2x) of cache lines away from the
> > > producer. At that point, write them all out.
> > >
> > > We do the write-out in reverse order to keep the producer from
> > > sharing a cache line with the consumer for as long as possible.
> > >
> > > Write-out also triggers when the ring wraps around - there's no
> > > special reason to do this, but it helps keep the code a bit
> > > simpler.
> > >
> > > What should we do if getting 2 cache lines away from the producer
> > > would mean keeping the ring more than half empty? Maybe we should
> > > reduce the batching in this case; the current patch simply reduces
> > > the batching.
> > >
> > > Notes:
> > > - it is no longer true that a call to consume guarantees that the
> > >   following call to produce will succeed. No users seem to assume
> > >   that.
> > > - batching can also, in theory, reduce the signalling rate: users
> > >   that would previously send interrupts to the producer to wake it
> > >   up after consuming each entry would now only need to do this once
> > >   per batch. Doing this would be easy by returning a flag to the
> > >   caller. No users seem to do signalling on consume yet, so this
> > >   was not implemented.
> > >
> > > Signed-off-by: Michael S. Tsirkin 
> > > ---
> > >
> > > Jason, I am curious whether the following gives you some of the
> > > performance boost that you see with the vhost batching patches. Is
> > > vhost batching on top still helpful?
> >
> > The patch looks good to me; I will run a test with the vhost
> > batching patches.
> >
> > Thanks
> 
> Still helpful:
> 
> before this patch: 1.84 Mpps
> with this patch: 2.00 Mpps
> with batch dequeuing: 2.30 Mpps

Just a thought: could you test dropping the consumer spinlock
completely? Just around the peek?

As I said previously, the perf c2c tool should be helpful for locating
cache-related sources of latency.
> Acked-by: Jason Wang 
> 
> Thanks