netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Zhi Yong Wu <zwu.kernel@gmail.com>
To: Ben Hutchings <bhutchings@solarflare.com>,
	Stefan Hajnoczi <stefanha@redhat.com>
Cc: Linux Netdev List <netdev@vger.kernel.org>,
	Tom Herbert <therbert@google.com>,
	Eric Dumazet <edumazet@google.com>,
	"David S. Miller" <davem@davemloft.net>,
	Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Subject: Re: [RFC PATCH net-next 3/3] virtio-net: Add accelerated RFS support
Date: Sat, 18 Jan 2014 00:54:20 +0800	[thread overview]
Message-ID: <CAEH94LgKJ0qThGHS-hqXsYKqOe+ns2pRi-wfZy=af1RMejOZ5A@mail.gmail.com> (raw)
In-Reply-To: <1389914188.11912.146.camel@bwh-desktop.uk.level5networks.com>

On Fri, Jan 17, 2014 at 7:16 AM, Ben Hutchings
<bhutchings@solarflare.com> wrote:
> On Fri, 2014-01-17 at 06:00 +0800, Zhi Yong Wu wrote:
>> On Fri, Jan 17, 2014 at 5:31 AM, Ben Hutchings
>> <bhutchings@solarflare.com> wrote:
>> > On Wed, 2014-01-15 at 22:20 +0800, Zhi Yong Wu wrote:
>> > [...]
>> >> +static int virtnet_init_rx_cpu_rmap(struct virtnet_info *vi)
>> >> +{
>> >> +     int rc = 0;
>> >> +
>> >> +#ifdef CONFIG_RFS_ACCEL
>> >> +     struct virtio_device *vdev = vi->vdev;
>> >> +     unsigned int irq;
>> >> +     int i;
>> >> +
>> >> +     if (!vi->affinity_hint_set)
>> >> +             goto out;
>> >> +
>> >> +     vi->dev->rx_cpu_rmap = alloc_irq_cpu_rmap(vi->max_queue_pairs);
>> >> +     if (!vi->dev->rx_cpu_rmap) {
>> >> +             rc = -ENOMEM;
>> >> +             goto out;
>> >> +     }
>> >> +
>> >> +     for (i = 0; i < vi->max_queue_pairs; i++) {
>> >> +             irq = virtqueue_get_vq_irq(vdev, vi->rq[i].vq);
>> >> +             if (irq == -1)
>> >> +                     goto failed;
>> >
>> > Jumping into an if-statement is confusing.  Also do you really want to
>> > return 0 in this case?
>> No, If it fail to get irq, i want it to exit as soon as possible,
>> otherwise it will cause irq_cpu_rmap_add() to be invoked with one
>> incorrect argument irq.
>
> Well currently this goto does result in returning 0, as rc has not been
> changed after its initialisation to 0.
Yes, this goto will result in returning 0. I am thinking if it it
appropriate, but definitely  the current code has one bug here. I
thought that when this NIC doesn't enable MSI-X support, and if
CONFIG_RFS_ACCEL is defined, its aRFS cpu_rmap table will not be
allocated here and aRFS will be ignored automatically, but this will
trigger set_rps_cpu() to hit memory issue.

>
>> By the way, do you have thought about if it makes sense to add aRFS
>> support to virtio_net? For [patch 2/3], what do you think of those
>> missing stuff listed by me?
>> For how indirect table is implemented in sfc NIC, do you have any doc
>> to share with  me? thanks.
>
> Going through that list:
>
>> 1.)  guest virtio_net driver should have one filter table and its
>> entries can be expired periodically;
>
> In sfc, we keep a count how many entries have been inserted in each NAPI
> context.  Whenever the NAPI poll function is about to call
> napi_complete() and the count for that context has reached a trigger
> level, it will scan some quota of filter entries for expiry.
thanks for your sharing.
>
>> 2.)  guest virtio_net driver should pass rx queue index and filter
>> info down to the emulated virtio_net NIC in QEMU.
>> 3.) the emulated virtio_net NIC should have its indirect table to
>> store the flow to rx queue mapping.
>> 4.) the emulated virtio_net NIC should classify the rx packet to
>> selected queue by applying the filter.
>
> I think the most efficient way to do this would be to put a hash table
> in some shared memory that both guest and host can read and write.  The
> virtio control path would only be used to set up and tear down the
> table.  I don't know whether virtio allows for that.
Yes, i agree with you, and don't know what stefan think if it. CC
stefanha. He is virtio expert. Stefan, Can you give some advice about
this?

>
> However, to take advantage of ARFS on a physical net driver, it would be
> necessary to send a control request for part 2.
aRFS on a physical net driver? What is this physical net driver? I
thought that in order to enable aRFS, guest virtio_net driver should
send a control request to its emulated virtio_net NIC.
>
>> 5.) update virtio spec.
>> Do i miss anything? If yes, please correct me.
>> For 3.) and 4.), do you have any doc about how they are implemented in
>> physical NICs? e.g. mlx4_en or sfc, etc.
>
> The Programmer's Reference Manuals for Solarflare controllers are only
> available under NDA.  I can describe the hardware filtering briefly, but
> actually I don't think it's very relevant to virtio_net.
>
> There is a typical RSS hash indirection table (128 entries), but for
> ARFS we use a different RX filter table which has 8K entries
> (RX_FILTER_TBL0 on SFC4000/SFC9000 family).
>
> Solarflare controllers support user-level networking, which requires
> perfect filtering to deliver each application's flows into that
> application's dedicated RX queue(s).  Lookups in this larger filter
> table are still hash-based, but each entry specifies a TCP/IP or UDP/IP
> 4-tuple or local 2-tuple to match.  ARFS uses the 4-tuple type only.
>
> To allow for hash collisions, a secondary hash function generates an
> increment to be added to the initial table index repeatedly for hash
> chaining.  There is a control register which tells the controller the
> maximum hash chain length to search for each IP filter type; after this
> it will fall back to checking MAC filters and then default filters.
>
> On the SFC9100 family, filter updates and lookups are implemented by
> firmware and the driver doesn't manage the filter table itself, but I
> know it is still a hash table of perfect filters.
>
> For ARFS, perfect filtering is not needed.  I think it would be
> preferable to use a fairly big hash table and make insertion fail in
> case of a collision.  Since the backend for virtio_net will do RX queue
> selection in software, the entire table of queue indices should fit into
> its L1 cache.
It is exactly in detail, thanks a lot for your nice sharing. It is
very helpful to me. I will come back to carefully read this after we
reach the agreement about if aRFS should be integrated into virtio_net
and what its big picture is.

>
> Ben.
>
> --
> Ben Hutchings, Staff Engineer, Solarflare
> Not speaking for my employer; that's the marketing department's job.
> They asked us to note that Solarflare product names are trademarked.
>



-- 
Regards,

Zhi Yong Wu

  reply	other threads:[~2014-01-17 16:54 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-01-15 14:20 [RFC PATCH net-next 0/3] virtio_net: add aRFS support Zhi Yong Wu
2014-01-15 14:20 ` [RFC PATCH net-next 1/3] virtio_pci: Introduce one new config api vp_get_vq_irq() Zhi Yong Wu
2014-01-15 14:20 ` [RFC PATCH net-next 2/3] virtio_net: Introduce one dummy function virtnet_filter_rfs() Zhi Yong Wu
2014-01-15 17:54   ` Tom Herbert
2014-01-16  2:45     ` Zhi Yong Wu
2014-01-15 14:20 ` [RFC PATCH net-next 3/3] virtio-net: Add accelerated RFS support Zhi Yong Wu
2014-01-16 21:31   ` Ben Hutchings
2014-01-16 22:00     ` Zhi Yong Wu
2014-01-16 23:16       ` Ben Hutchings
2014-01-17 16:54         ` Zhi Yong Wu [this message]
2014-01-17 17:20           ` Ben Hutchings
2014-01-18  4:59             ` Tom Herbert
2014-01-18 14:19               ` Ben Hutchings
2014-01-16  4:23 ` [RFC PATCH net-next 0/3] virtio_net: add aRFS support Jason Wang
2014-01-16  8:34   ` Fwd: " Zhi Yong Wu
2014-01-16  8:52     ` Stefan Hajnoczi
2014-01-16 17:12       ` Tom Herbert
2014-01-17  3:26         ` Jason Wang
2014-01-17  5:08           ` Tom Herbert
2014-01-17  6:36             ` Jason Wang
2014-01-17 16:03               ` Tom Herbert
2014-01-17  5:22         ` Stefan Hajnoczi
2014-01-17  6:45           ` Jason Wang
2014-01-20 14:36           ` Ben Hutchings
2014-01-22 13:27         ` Zhi Yong Wu
2014-01-22 18:00           ` Tom Herbert
2014-01-23  0:40             ` Zhi Yong Wu
2014-01-23 14:23             ` Michael S. Tsirkin
2014-01-17  3:04       ` Jason Wang
2014-01-20 16:49         ` Stefan Hajnoczi
2014-01-16  8:48   ` Zhi Yong Wu
2014-01-23 14:26     ` Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAEH94LgKJ0qThGHS-hqXsYKqOe+ns2pRi-wfZy=af1RMejOZ5A@mail.gmail.com' \
    --to=zwu.kernel@gmail.com \
    --cc=bhutchings@solarflare.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=stefanha@redhat.com \
    --cc=therbert@google.com \
    --cc=wuzhy@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).