From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=jYv9=4P=dpdk.org=dev-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-2.2 required=3.0 tests=DATE_IN_FUTURE_06_12,
	HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,
	USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 677E4C4BA37
	for <dpdk-dev@archiver.kernel.org>; Thu, 27 Feb 2020 02:09:59 +0000 (UTC)
Received: from dpdk.org (dpdk.org [92.243.14.124])
	by mail.kernel.org (Postfix) with ESMTP id D3F2724650
	for <dpdk-dev@archiver.kernel.org>; Thu, 27 Feb 2020 02:09:58 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D3F2724650
Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=dev-bounces@dpdk.org
Received: from [92.243.14.124] (localhost [127.0.0.1])
	by dpdk.org (Postfix) with ESMTP id 1B0941BFB1;
	Thu, 27 Feb 2020 03:09:58 +0100 (CET)
Received: from mga17.intel.com (mga17.intel.com [192.55.52.151])
 by dpdk.org (Postfix) with ESMTP id D99CB1F1C
 for <dev@dpdk.org>; Thu, 27 Feb 2020 03:09:55 +0100 (CET)
X-Amp-Result: UNKNOWN
X-Amp-Original-Verdict: FILE UNKNOWN
X-Amp-File-Uploaded: False
Received: from orsmga007.jf.intel.com ([10.7.209.58])
 by fmsmga107.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 26 Feb 2020 18:09:54 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.70,490,1574150400"; d="scan'208";a="226945495"
Received: from hs1.sh.intel.com (HELO hs1) ([10.67.119.44])
 by orsmga007.jf.intel.com with ESMTP; 26 Feb 2020 18:09:40 -0800
Date: Thu, 27 Feb 2020 04:25:39 -0500
From: Wang Xiang <xiang.w.wang@intel.com>
To: Ori Kam <orika@mellanox.com>
Cc: Jerin Jacob <jerinjacobk@gmail.com>, Jerin Jacob <jerinj@marvell.com>,
 dpdk-dev <dev@dpdk.org>, Pavan Nikhilesh <pbhagavatula@marvell.com>,
 Shahaf Shuler <shahafs@mellanox.com>,
 Hemant Agrawal <hemant.agrawal@nxp.com>, Opher Reviv <opher@mellanox.com>,
 Alex Rosenbaum <alexr@mellanox.com>,
 "dovrat@marvell.com" <dovrat@marvell.com>,
 Prasun Kapoor <pkapoor@marvell.com>, Nipun Gupta <nipun.gupta@nxp.com>,
 "Richardson, Bruce" <bruce.richardson@intel.com>,
 "yang.a.hong@intel.com" <yang.a.hong@intel.com>,
 "harry.chang@intel.com" <harry.chang@intel.com>,
 "gu.jian1@zte.com.cn" <gu.jian1@zte.com.cn>,
 "shanjiangh@chinatelecom.cn" <shanjiangh@chinatelecom.cn>,
 "zhangy.yun@chinatelecom.cn" <zhangy.yun@chinatelecom.cn>,
 "lixingfu@huachentel.com" <lixingfu@huachentel.com>,
 "wushuai@inspur.com" <wushuai@inspur.com>,
 "yuyingxia@yxlink.com" <yuyingxia@yxlink.com>,
 "fanchenggang@sunyainfo.com" <fanchenggang@sunyainfo.com>,
 "davidfgao@tencent.com" <davidfgao@tencent.com>,
 "liuzhong1@chinaunicom.cn" <liuzhong1@chinaunicom.cn>,
 "zhaoyong11@huawei.com" <zhaoyong11@huawei.com>,
 "oc@yunify.com" <oc@yunify.com>, "jim@netgate.com" <jim@netgate.com>,
 "hongjun.ni@intel.com" <hongjun.ni@intel.com>,
 "j.bromhead@titan-ic.com" <j.bromhead@titan-ic.com>,
 "deri@ntop.org" <deri@ntop.org>, "fc@napatech.com" <fc@napatech.com>,
 "arthur.su@lionic.com" <arthur.su@lionic.com>,
 Thomas Monjalon <thomas@monjalon.net>
Message-ID: <20200227092539.GA2188@hs1>
References: <20190627155036.56940-1-jerinj@marvell.com>
 <1580202029-37096-1-git-send-email-orika@mellanox.com>
 <CALBAE1O823JWYtwNPUkLZb=QOxxWTP3ipsUdikaMQ4udgR9mrA@mail.gmail.com>
 <AM6PR05MB5176AC40CD7F5A269F8A4F99DBEF0@AM6PR05MB5176.eurprd05.prod.outlook.com>
 <CALBAE1M_XKjs7W93+VLjiXJm_v5zamoWfnn9aSjG=ArU9LVOPA@mail.gmail.com>
 <AM0PR05MB51728A2770B60D32AFC8AE3DDBEF0@AM0PR05MB5172.eurprd05.prod.outlook.com>
 <CALBAE1NzYnWUXXBmnjFd1=rcV+Rsb4q_koirgEWmC9zA4B=x8A@mail.gmail.com>
 <AM6PR05MB5176621E6EEFB45C88CB4AD3DBED0@AM6PR05MB5176.eurprd05.prod.outlook.com>
 <20200226090318.GA85327@hs1>
 <AM6PR05MB5176C25A83BAB3C90900A6A2DBEA0@AM6PR05MB5176.eurprd05.prod.outlook.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <AM6PR05MB5176C25A83BAB3C90900A6A2DBEA0@AM6PR05MB5176.eurprd05.prod.outlook.com>
User-Agent: Mutt/1.9.4 (2018-02-28)
Subject: Re: [dpdk-dev] [PATCH v3] regexdev: introduce regexdev subsystem
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>

Hi Ori,

Thanks for the comments.

Hyperscan supports both start_offset and end_offset modes with most
users choosing end_offset for rule coverage and performance reasons.
I'm OK to have the default behavior with start_offset and len.
It'll be good to change RTE_REGEX_DEV_CFG_MATCH_AS_START to 
RTE_REGEX_DEV_CFG_MATCH_AS_END. For users who need only end_offset,
they have to set RTE_REGEX_DEV_CFG_MATCH_AS_END bit. We may also remove 
RTE_REGEX_DEV_SUPP_MATCH_AS_START if you like.

One question is related to the consistency of start_offset definition 
among different solutions, does all solutions return the leftmost
start_offset, i.e. for rule: foo.*bar and input: foofoobar, the returned
start_offset will be 0 not 3?

Thanks,
Xiang

On Wed, Feb 26, 2020 at 08:36:51AM +0000, Ori Kam wrote:
> Hi Xiang,
> 
> 
> > -----Original Message-----
> > From: dev <dev-bounces@dpdk.org> On Behalf Of Wang Xiang
> > Sent: Wednesday, February 26, 2020 11:03 AM
> > To: Ori Kam <orika@mellanox.com>
> > Cc: Jerin Jacob <jerinjacobk@gmail.com>; Jerin Jacob <jerinj@marvell.com>;
> > dpdk-dev <dev@dpdk.org>; Pavan Nikhilesh <pbhagavatula@marvell.com>;
> > Shahaf Shuler <shahafs@mellanox.com>; Hemant Agrawal
> > <hemant.agrawal@nxp.com>; Opher Reviv <opher@mellanox.com>; Alex
> > Rosenbaum <alexr@mellanox.com>; dovrat@marvell.com; Prasun Kapoor
> > <pkapoor@marvell.com>; Nipun Gupta <nipun.gupta@nxp.com>; Richardson,
> > Bruce <bruce.richardson@intel.com>; yang.a.hong@intel.com;
> > harry.chang@intel.com; gu.jian1@zte.com.cn; shanjiangh@chinatelecom.cn;
> > zhangy.yun@chinatelecom.cn; lixingfu@huachentel.com; wushuai@inspur.com;
> > yuyingxia@yxlink.com; fanchenggang@sunyainfo.com;
> > davidfgao@tencent.com; liuzhong1@chinaunicom.cn;
> > zhaoyong11@huawei.com; oc@yunify.com; jim@netgate.com;
> > hongjun.ni@intel.com; j.bromhead@titan-ic.com; deri@ntop.org;
> > fc@napatech.com; arthur.su@lionic.com; Thomas Monjalon
> > <thomas@monjalon.net>
> > Subject: Re: [dpdk-dev] [PATCH v3] regexdev: introduce regexdev subsystem
> > 
> > Hi Ori and Jerin,
> > 
> > One comment regarding my concern with len and end_offset problem.
> > From open source SW regex library(libpcre, re2 and Hyperscan) and
> > Intel's perspective, the matching results returned are always start
> > offset and end offset. More importantly, Hyperscan only reports end offset
> > most of the time.
> > 
> > It'll be good to keep this union as an abstraction and enforce the default
> > behavior for each solution, i.e. HW solutions doesn't support MATCH_AS_START
> > flag at rule compile time. Applications will know the meaning of variable at
> > rule compile time with the flag so they don't have to do extra check at fast path
> > run-time matching.
> > Welcome for better abstraction ideas.
> > 
> 
> I don't mind to keep the union as it was in V3, but I would like to remove the
> configuration bit (RTE_REGEX_DEV_CFG_MATCH_AS_START). 
> Meaning that if the device reports RTE_REGEX_DEV_SUPP_MATCH_AS_START
> the result will always be with start_offset and len.
> 
> Best,
> Ori
> 
> > Thanks,
> > Xiang
> > 
> > > > > > +                       /**< Starting Byte Position for matched rule. */
> > > > > > +                       RTE_STD_C11
> > > > > > +                       union {
> > > > > > +                               uint16_t len;
> > > > > > +                               /**< Length of match in bytes */
> > > > > > +                               uint16_t end_offset;
> > > > > > +                               /**< The end offset of the match. In case
> > > > > > +                                * MATCH_AS_START configuration is disabled.
> > > > > > +                                * @see RTE_REGEX_DEV_CFG_MATCH_AS_START
> > > > > > +                                */
> > > > >
> > > > > We have not concluded on this scheme. Have one field which has
> > > > > different meaning will be difficult
> > > > > for application. i.e fast path we need to have a check for this.
> > > > >
> > > >
> > > > This is the time to conclude . at least for the first version.
> > > > Why do we have one field with different meaning?
> > > > The result can be ether len or end_offset.
> > > >
> > > > > I think, Based on the majority of HW/SW implementation, we need to
> > > > > either go with len or
> > > > > end_offset. What Mellanox HW returns? len or end_offset?
> > > > >
> > > >
> > > > From Mellanox perspective we prefer the len approach. We also think
> > > > it is much more user oriented.
> > > >
> > > > > or We can keep it as len or end_offset based on which drivers upstream
> > > first,
> > > > > other drivers when it comes, we can see how to abstract it?
> > > > >
> > > >
> > > > I can except that assuming we choose the start and len approach
> > >
> > > I think, we can have first version with "start and len" by removing
> > > RTE_REGEX_DEV_CFG_MATCH_AS_START.
> > > When can think, how to abstract new drivers when it upstream based on
> > > the overhead.
> > >
> > 
> > 
> > On Tue, Feb 25, 2020 at 07:48:54AM +0000, Ori Kam wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > Sent: Tuesday, February 25, 2020 7:57 AM
> > > > To: Ori Kam <orika@mellanox.com>
> > > > Cc: Jerin Jacob <jerinj@marvell.com>; xiang.w.wang@intel.com; dpdk-dev
> > > > <dev@dpdk.org>; Pavan Nikhilesh <pbhagavatula@marvell.com>; Shahaf
> > > > Shuler <shahafs@mellanox.com>; Hemant Agrawal
> > > > <hemant.agrawal@nxp.com>; Opher Reviv <opher@mellanox.com>; Alex
> > > > Rosenbaum <alexr@mellanox.com>; dovrat@marvell.com; Prasun Kapoor
> > > > <pkapoor@marvell.com>; Nipun Gupta <nipun.gupta@nxp.com>;
> > Richardson,
> > > > Bruce <bruce.richardson@intel.com>; yang.a.hong@intel.com;
> > > > harry.chang@intel.com; gu.jian1@zte.com.cn;
> > shanjiangh@chinatelecom.cn;
> > > > zhangy.yun@chinatelecom.cn; lixingfu@huachentel.com;
> > wushuai@inspur.com;
> > > > yuyingxia@yxlink.com; fanchenggang@sunyainfo.com;
> > > > davidfgao@tencent.com; liuzhong1@chinaunicom.cn;
> > > > zhaoyong11@huawei.com; oc@yunify.com; jim@netgate.com;
> > > > hongjun.ni@intel.com; j.bromhead@titan-ic.com; deri@ntop.org;
> > > > fc@napatech.com; arthur.su@lionic.com; Thomas Monjalon
> > > > <thomas@monjalon.net>
> > > > Subject: Re: [dpdk-dev] [PATCH v3] regexdev: introduce regexdev subsystem
> > > >
> > > > > > 4) app/test/test_regexdev.c like app/test/test_eventdev.c
> > > > >
> > > > > We started to create a super basic app, after the API will be finalized and
> > we
> > > > will have HW
> > > > > we can push it. (if you need it faster than feel free)
> > > >
> > > > A simple Unit test case needs to be present for the APIs. On the
> > > > course of developing common code,
> > > > it can be developed to test the common code with dummy/skeleton driver.
> > > >
> > >
> > > Agree this is what we are currently have.
> > >
> > > > >
> > > > > > 5) Need a maintainer for maintaining the regex subsystem
> > > > > >
> > > > > We wish to maintain it if you agree.
> > > >
> > > > Yes. Please.
> > > >
> > >
> > > Great.
> > >
> > > > > > >
> > > > > > > One more thing, regarding the ops structure, I think it is better to split
> > it
> > > > to 2
> > > > > > different
> > > > > > > structures one enque and one for dequeue, since there are no real
> > shared
> > > > > > data and we will
> > > > > > > be able to save memory, what do you think?
> > > > > >
> > > > > > Ops are allocated from mempool so it will be overhead to manage both.
> > > > > > moreover, some
> > > > > > of the fields added in req can be used for resp as info. cryptodev
> > > > > > follows the similar concept,
> > > > > > I think, we can have symmetry with cryptodev wherever is possible to
> > avoid
> > > > > > end-user to learn new API models.
> > > > >
> > > > > True that there will be overhead with 2 mempools (small one)
> > > > > but lets assume 255 results. This means that the buffer should be 255 *
> > > > sizeof(rte_regex_match) = 2K
> > > > > also this will enable us to replace groupX with group[] which will allow
> > even
> > > > more groups.
> > > > > In addition don't think that crypto is a good example.
> > > > > The main difference is that in RegEx the output is different format then
> > the
> > > > input.
> > > >
> > > > # IMO, Some of the fields may be useful for a response as well. I
> > > > think application may be interested in following
> > > > req filed in the response.
> > > > a) buf_addr
> > >
> > > I don't see how this can be used in the response. since if working in out of
> > order result.
> > > you don’t know which result will be returned.
> > > I also think it is error prone to use the same op for the enqueue and dequeue.
> > >
> > > > b) scan_size
> > >
> > > Please see above.
> > >
> > > > c) user_id (This would be main one)
> > >
> > > Agree
> > >
> > > >
> > > > # Having two mempools adds overhead per lcore L1 cache usage and extra
> > > > complexity to the application.
> > > >
> > > > # IMO, From a performance perspective, one mempool is good due to less
> > > > stress on the cache and it is costly to
> > > > add new mempool for HW mempool implementations.
> > > >
> > > > # I think, group[] use case we can add it when it required by
> > > > introducing "matches_start_offset" field, which will
> > > > tell the req, where is the end of group[] and where "matches" start
> > > > with single mempool scheme also.
> > > >
> > > > # I think, one of the other use case for "matches_start_offset" that,
> > > > It may possible to put vendor-specific
> > > > opaque data. It will be filled by driver on response. The application
> > > > can reference the matches as
> > > >
> > > > struct rte_regex_match *matches = RTE_PTR_ADD(ops, ops-
> > > >
> > > >matches_start_offset);
> > > >
> > >
> > > O.K for now we will keep  it as is, and we will see what will be in the future.
> > >
> > > > >
> > > > > > I assume you will send the v4 with these comments. I think, with v4 we
> > > > > > can start implementing common library code.
> > > > >
> > > > > Just need to agree on the split (one more iteration )
> > > > > and I will start working on the common code.
> > > >
> > > > Ack.
> > >
> > > Great,
> > > I'm starting to work on V4 with all comments so the RFC will be acked and
> > then will start
> > > coding the rest of the common code.
> > >