From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1753438AbdKHFnz (ORCPT <rfc822;w@1wt.eu>);
        Wed, 8 Nov 2017 00:43:55 -0500
Received: from mx1.redhat.com ([209.132.183.28]:39738 "EHLO mx1.redhat.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1751727AbdKHFny (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
        Wed, 8 Nov 2017 00:43:54 -0500
DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com DE2EB5F7AC
Authentication-Results: ext-mx10.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com
Authentication-Results: ext-mx10.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=mst@redhat.com
Date: Wed, 8 Nov 2017 07:43:52 +0200
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Jason Wang <jasowang@redhat.com>
Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
        Network Development <netdev@vger.kernel.org>,
        LKML <linux-kernel@vger.kernel.org>, Tom Herbert <tom@herbertland.com>,
        Aaron Conole <aconole@redhat.com>
Subject: Re: [PATCH net-next V2 3/3] tun: add eBPF based queue selection
 method
Message-ID: <20171108073717-mutt-send-email-mst@kernel.org>
References: <1509445938-4345-1-git-send-email-jasowang@redhat.com>
 <1509445938-4345-4-git-send-email-jasowang@redhat.com>
 <CAF=yD-L7v-KQQ5SZ9yVUiqPcaNCn5XbwcPrPHXnGO_tDv6_UgQ@mail.gmail.com>
 <CAF=yD-Jp6+0gsbRv9ivyuuAbKzPJ0ooA1Zx28uZe+a6zZpqNaQ@mail.gmail.com>
 <1e5256e3-72cf-fa6b-b00e-2661e29291b1@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <1e5256e3-72cf-fa6b-b00e-2661e29291b1@redhat.com>
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.39]); Wed, 08 Nov 2017 05:43:54 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Nov 08, 2017 at 02:28:53PM +0900, Jason Wang wrote:
> 
> 
> On 2017-11-04 08:56, Willem de Bruijn wrote:
> > On Fri, Nov 3, 2017 at 5:56 PM, Willem de Bruijn
> > <willemdebruijn.kernel@gmail.com> wrote:
> > > On Tue, Oct 31, 2017 at 7:32 PM, Jason Wang <jasowang@redhat.com> wrote:
> > > > This patch introduces an eBPF based queue selection method based on
> > > > the flow steering policy ops. Userspace could load an eBPF program
> > > > through TUNSETSTEERINGEBPF. This gives much more flexibility compared
> > > > to the simple but hard-coded policy in the kernel.
> > > > 
> > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > ---
> > > > +static int tun_set_steering_ebpf(struct tun_struct *tun, void __user *data)
> > > > +{
> > > > +       struct bpf_prog *prog;
> > > > +       u32 fd;
> > > > +
> > > > +       if (copy_from_user(&fd, data, sizeof(fd)))
> > > > +               return -EFAULT;
> > > > +
> > > > +       prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_SOCKET_FILTER);
> > > If the idea is to allow guests to pass BPF programs down to the host,
> > > you may want to define a new program type that is more restrictive than
> > > socket filter.
> > > 
> > > The external functions allowed for socket filters (sk_filter_func_proto)
> > > are relatively few (compared to, say, clsact), but may still leak host
> > > information to a guest. More importantly, guest security considerations
> > > limit how we can extend socket filters later.
> > Unless the idea is for the hypervisor to prepare the BPF based on a
> > limited set of well defined modes that the guest can configure. Then
> > socket filters are fine, as the BPF is prepared by a regular host process.
> 
> Yes, I think the idea is to let qemu build a BPF program for now.
> 
> Passing an eBPF program from guest to host is interesting, but an obvious
> issue is how to deal with map access.
> 
> Thanks

Fundamentally, I suspect the way to solve it is to allow
the program to specify "should be offloaded to host".

And then it would access the host map rather than the guest map.

Then add some control path API for the guest to poke at the host map.

It's not that there's anything special about the host map:
it's just separate from the guest map. So if we wanted
something that also works on bare metal, we could do
something like a namespace and put all host maps there.
But I'm not sure it's worth the complexity.
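
To make the host map idea a bit more concrete, here is a rough userspace
model in C. It is only a sketch, not kernel code and not part of the
patch: every name in it (host_map, ctrl_cmd, host_map_init,
host_ctrl_update, steer_queue, HOST_MAP_SIZE) is made up for
illustration. The assumption is that the host owns the map, that the
offloaded program only reads it, and that the guest can change it only
through a control command which the host validates.

```c
/*
 * Illustrative userspace model only. All names here are hypothetical
 * and do not correspond to any existing kernel or qemu interface.
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HOST_MAP_SIZE 16	/* number of flow buckets, chosen arbitrarily */

/* Map owned by the host; the offloaded steering program reads it. */
struct host_map {
	uint32_t queue[HOST_MAP_SIZE];
	uint32_t numqueues;
};

/* Control-path request a guest would send to change one map entry. */
struct ctrl_cmd {
	uint32_t bucket;
	uint32_t queue;
};

/* Start with every bucket steering to queue 0. */
static void host_map_init(struct host_map *m, uint32_t numqueues)
{
	assert(numqueues > 0);
	m->numqueues = numqueues;
	for (size_t i = 0; i < HOST_MAP_SIZE; i++)
		m->queue[i] = 0;
}

/*
 * Host side of the control path: validate the guest request before
 * touching the map. Returns 0 on success, -1 if the request is
 * out of range.
 */
static int host_ctrl_update(struct host_map *m, const struct ctrl_cmd *cmd)
{
	if (cmd->bucket >= HOST_MAP_SIZE || cmd->queue >= m->numqueues)
		return -1;
	m->queue[cmd->bucket] = cmd->queue;
	return 0;
}

/* Stand-in for the offloaded program: flow hash -> bucket -> queue. */
static uint32_t steer_queue(const struct host_map *m, uint32_t flow_hash)
{
	return m->queue[flow_hash % HOST_MAP_SIZE];
}
```

The point of the model is only that the guest never gets a pointer into
the host map: it sends a ctrl_cmd and the host decides whether to apply
it, so the lookup path in steer_queue() never sees an unchecked value.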

Cc Aaron who wanted to look at this.

-- 
MST