From: Arjun Roy
Date: Fri, 10 Apr 2020 12:15:29 -0700
Subject: Re: [PATCH resend mm,net-next 3/3] net-zerocopy: Use vm_insert_pages() for tcp rcv zerocopy.
To: Andrew Morton
Cc: Arjun Roy, David Miller, netdev, linux-mm@kvack.org, Eric Dumazet,
 Soheil Hassas Yeganeh
In-Reply-To: <20200410120443.ad7856db13e158fbd441f3ae@linux-foundation.org>
References: <20200128025958.43490-1-arjunroy.kdev@gmail.com>
 <20200128025958.43490-3-arjunroy.kdev@gmail.com>
 <20200212185605.d89c820903b7aa9fbbc060b2@linux-foundation.org>
 <20200410120443.ad7856db13e158fbd441f3ae@linux-foundation.org>

On Fri, Apr 10, 2020 at 12:04 PM Andrew Morton wrote:
>
> On Fri, 21 Feb 2020 13:21:41 -0800 Arjun Roy wrote:
>
> > I remain a bit concerned about the merge process for this specific
> > patch (0003, the net/ipv4/tcp.c change), since I have other in-flight
> > changes for TCP receive zerocopy that I'd like to upstream for
> > net-next - and would like to avoid weird merge issues.
> >
> > So perhaps the following could work:
> >
> > 1. Andrew, perhaps we could remove this particular patch (0003, the
> >    net/ipv4/tcp.c change) from mm-next; that way we merge
> >    vm_insert_pages() but not the call-site within TCP, for now.
> > 2. net-next will eventually pick vm_insert_pages() up.
> > 3. I can modify the zerocopy code to use it at that point?
> >
> > Otherwise I'm concerned that a complicated merge situation may result.
>
> The merge situation is quite clean.
>
> I guess I'll hold off on
> net-zerocopy-use-vm_insert_pages-for-tcp-rcv-zerocopy.patch (below) and
> shall send it to davem after Linus has merged the prerequisites.
>

Acknowledged, thank you!

(Resending because gmail conveniently forgot my plain-text mode
preference...)

-Arjun

> From: Arjun Roy
> Subject: net-zerocopy: use vm_insert_pages() for tcp rcv zerocopy
>
> Use vm_insert_pages() for tcp receive zerocopy. Spin lock cycles (as
> reported by perf) drop from a couple of percentage points to a fraction
> of a percent. This results in a roughly 6% increase in efficiency,
> measured roughly as zerocopy receive count divided by CPU utilization.
>
> The intention of this patchset is to reduce atomic ops for tcp zerocopy
> receives, which normally hit the same spinlock multiple times
> consecutively.
>
> [akpm@linux-foundation.org: suppress gcc-7.2.0 warning]
> Link: http://lkml.kernel.org/r/20200128025958.43490-3-arjunroy.kdev@gmail.com
> Signed-off-by: Arjun Roy
> Signed-off-by: Eric Dumazet
> Signed-off-by: Soheil Hassas Yeganeh
> Cc: David Miller
> Cc: Matthew Wilcox
> Cc: Jason Gunthorpe
> Cc: Stephen Rothwell
> Signed-off-by: Andrew Morton
> ---
>
>  net/ipv4/tcp.c |   70 ++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 63 insertions(+), 7 deletions(-)
>
> --- a/net/ipv4/tcp.c~net-zerocopy-use-vm_insert_pages-for-tcp-rcv-zerocopy
> +++ a/net/ipv4/tcp.c
> @@ -1734,14 +1734,48 @@ int tcp_mmap(struct file *file, struct s
>  }
>  EXPORT_SYMBOL(tcp_mmap);
>
> +static int tcp_zerocopy_vm_insert_batch(struct vm_area_struct *vma,
> +					struct page **pages,
> +					unsigned long pages_to_map,
> +					unsigned long *insert_addr,
> +					u32 *length_with_pending,
> +					u32 *seq,
> +					struct tcp_zerocopy_receive *zc)
> +{
> +	unsigned long pages_remaining = pages_to_map;
> +	int bytes_mapped;
> +	int ret;
> +
> +	ret = vm_insert_pages(vma, *insert_addr, pages, &pages_remaining);
> +	bytes_mapped = PAGE_SIZE * (pages_to_map - pages_remaining);
> +	/* Even if vm_insert_pages fails, it may have partially succeeded in
> +	 * mapping (some but not all of the pages).
> +	 */
> +	*seq += bytes_mapped;
> +	*insert_addr += bytes_mapped;
> +	if (ret) {
> +		/* But if vm_insert_pages did fail, we have to unroll some state
> +		 * we speculatively touched before.
> +		 */
> +		const int bytes_not_mapped = PAGE_SIZE * pages_remaining;
> +		*length_with_pending -= bytes_not_mapped;
> +		zc->recv_skip_hint += bytes_not_mapped;
> +	}
> +	return ret;
> +}
> +
>  static int tcp_zerocopy_receive(struct sock *sk,
>  				struct tcp_zerocopy_receive *zc)
>  {
>  	unsigned long address = (unsigned long)zc->address;
>  	u32 length = 0, seq, offset, zap_len;
> +	#define PAGE_BATCH_SIZE 8
> +	struct page *pages[PAGE_BATCH_SIZE];
>  	const skb_frag_t *frags = NULL;
>  	struct vm_area_struct *vma;
>  	struct sk_buff *skb = NULL;
> +	unsigned long pg_idx = 0;
> +	unsigned long curr_addr;
>  	struct tcp_sock *tp;
>  	int inq;
>  	int ret;
> @@ -1754,6 +1788,8 @@ static int tcp_zerocopy_receive(struct s
>
>  	sock_rps_record_flow(sk);
>
> +	tp = tcp_sk(sk);
> +
>  	down_read(&current->mm->mmap_sem);
>
>  	ret = -EINVAL;
> @@ -1762,7 +1798,6 @@ static int tcp_zerocopy_receive(struct s
>  		goto out;
>  	zc->length = min_t(unsigned long, zc->length, vma->vm_end - address);
>
> -	tp = tcp_sk(sk);
>  	seq = tp->copied_seq;
>  	inq = tcp_inq(sk);
>  	zc->length = min_t(u32, zc->length, inq);
> @@ -1774,8 +1809,20 @@ static int tcp_zerocopy_receive(struct s
>  		zc->recv_skip_hint = zc->length;
>  	}
>  	ret = 0;
> +	curr_addr = address;
>  	while (length + PAGE_SIZE <= zc->length) {
>  		if (zc->recv_skip_hint < PAGE_SIZE) {
> +			/* If we're here, finish the current batch. */
> +			if (pg_idx) {
> +				ret = tcp_zerocopy_vm_insert_batch(vma, pages,
> +								   pg_idx,
> +								   &curr_addr,
> +								   &length,
> +								   &seq, zc);
> +				if (ret)
> +					goto out;
> +				pg_idx = 0;
> +			}
>  			if (skb) {
>  				if (zc->recv_skip_hint > 0)
>  					break;
> @@ -1784,7 +1831,6 @@ static int tcp_zerocopy_receive(struct s
>  			} else {
>  				skb = tcp_recv_skb(sk, seq, &offset);
>  			}
> -
>  			zc->recv_skip_hint = skb->len - offset;
>  			offset -= skb_headlen(skb);
>  			if ((int)offset < 0 || skb_has_frag_list(skb))
> @@ -1808,14 +1854,24 @@ static int tcp_zerocopy_receive(struct s
>  				zc->recv_skip_hint -= remaining;
>  				break;
>  			}
> -			ret = vm_insert_page(vma, address + length,
> -					     skb_frag_page(frags));
> -			if (ret)
> -				break;
> +			pages[pg_idx] = skb_frag_page(frags);
> +			pg_idx++;
>  			length += PAGE_SIZE;
> -			seq += PAGE_SIZE;
>  			zc->recv_skip_hint -= PAGE_SIZE;
>  			frags++;
> +			if (pg_idx == PAGE_BATCH_SIZE) {
> +				ret = tcp_zerocopy_vm_insert_batch(vma, pages, pg_idx,
> +								   &curr_addr, &length,
> +								   &seq, zc);
> +				if (ret)
> +					goto out;
> +				pg_idx = 0;
> +			}
> +		}
> +	}
> +	if (pg_idx) {
> +		ret = tcp_zerocopy_vm_insert_batch(vma, pages, pg_idx,
> +						   &curr_addr, &length, &seq,
> +						   zc);
>  	}
>  out:
>  	up_read(&current->mm->mmap_sem);
> _
>
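
A note for readers skimming the archive: the entire win in the patch is
the batch-then-flush shape -- pages are accumulated in a small on-stack
array and mapped with one vm_insert_pages() call per batch, roughly one
page-table lock acquisition per batch of eight rather than one per page.
The sketch below is illustrative only and not part of the patch; it
assumes just the vm_insert_pages() signature used above, and the helper
name example_flush_page_batch() is hypothetical.

	#include <linux/mm.h>

	#define EXAMPLE_PAGE_BATCH 8

	/* Sketch only: flush a filled batch of pages into a user mapping
	 * with a single vm_insert_pages() call. vm_insert_pages() maps
	 * pages from the front of the array and, on return, has reduced
	 * 'remaining' to the number of pages it could NOT insert, so even
	 * on partial failure *addr advances past the mapped prefix --
	 * the same accounting tcp_zerocopy_vm_insert_batch() performs.
	 */
	static int example_flush_page_batch(struct vm_area_struct *vma,
					    unsigned long *addr,
					    struct page **pages,
					    unsigned long batched)
	{
		unsigned long remaining = batched;
		int ret;

		ret = vm_insert_pages(vma, *addr, pages, &remaining);
		*addr += PAGE_SIZE * (batched - remaining);
		return ret;
	}

A caller fills pages[] up to EXAMPLE_PAGE_BATCH and invokes the helper
once per batch, which is exactly the shape of the main loop in
tcp_zerocopy_receive() above.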