All of lore.kernel.org
 help / color / mirror / Atom feed
From: Chuck Lever <chuck.lever@oracle.com>
To: He Zhe <zhe.he@windriver.com>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
	"open list:NETWORKING DRIVERS" <netdev@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Bruce Fields <bfields@fieldses.org>,
	Trond Myklebust <trond.myklebust@hammerspace.com>,
	Anna Schumaker <anna.schumaker@netapp.com>,
	"David S . Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>
Subject: Re: [PATCH] SUNRPC: Fix svc_flush_dcache()
Date: Tue, 22 Sep 2020 10:14:03 -0400	[thread overview]
Message-ID: <EFDE7D43-5E0D-4484-9B8C-CBED30EA4E04@oracle.com> (raw)
In-Reply-To: <07c27f89-187b-69a3-fd40-f9beef29da40@windriver.com>



> On Sep 22, 2020, at 3:13 AM, He Zhe <zhe.he@windriver.com> wrote:
> 
> 
> 
> On 9/21/20 3:51 AM, Chuck Lever wrote:
>> On platforms that implement flush_dcache_page(), a large NFS WRITE
>> triggers the WARN_ONCE in bvec_iter_advance():
>> 
>> Sep 20 14:01:05 klimt.1015granger.net kernel: Attempted to advance past end of bvec iter
>> Sep 20 14:01:05 klimt.1015granger.net kernel: WARNING: CPU: 0 PID: 1032 at include/linux/bvec.h:101 bvec_iter_advance.isra.0+0xa7/0x158 [sunrpc]
>> 
>> Sep 20 14:01:05 klimt.1015granger.net kernel: Call Trace:
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  svc_tcp_recvfrom+0x60c/0x12c7 [sunrpc]
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? bvec_iter_advance.isra.0+0x158/0x158 [sunrpc]
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? del_timer_sync+0x4b/0x55
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? test_bit+0x1d/0x27 [sunrpc]
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  svc_recv+0x1193/0x15e4 [sunrpc]
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? try_to_freeze.isra.0+0x6f/0x6f [sunrpc]
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? refcount_sub_and_test.constprop.0+0x13/0x40 [sunrpc]
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? svc_xprt_put+0x1e/0x29f [sunrpc]
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? svc_send+0x39f/0x3c1 [sunrpc]
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  nfsd+0x282/0x345 [nfsd]
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? __kthread_parkme+0x74/0xba
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  kthread+0x2ad/0x2bc
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? nfsd_destroy+0x124/0x124 [nfsd]
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? test_bit+0x1d/0x27
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? kthread_mod_delayed_work+0x115/0x115
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ret_from_fork+0x22/0x30
>> 
>> Reported-by: He Zhe <zhe.he@windriver.com>
>> Fixes: ca07eda33e01 ("SUNRPC: Refactor svc_recvfrom()")
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>> net/sunrpc/svcsock.c |    2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> Hi Zhe-
>> 
>> If you confirm this fixes your issue and there are no other
>> objections or regressions, I can submit this for v5.9-rc.
> 
> I don't quite get why we add "seek" to "size". It seems this action does not
> reflect the actual scenario and forcedly neutralizes the WARN_ONCE check in
> bvec_iter_advance, so that it may "advance past end of bvec iter" and thus
> introduces overflow.

> Why don't we avoid this problem at the very begginning like my v1? That is, call
> svc_flush_bvec only when we have received more than we want to seek.
> 
>         len = sock_recvmsg(svsk->sk_sock, &msg, MSG_DONTWAIT);
> -       if (len > 0)
> +       if (len > 0 && (size_t)len > (seek & PAGE_MASK))
>                 svc_flush_bvec(bvec, len, seek);

Because this doesn't fix the underlying bug that triggered the
WARN_ONCE.

svc_tcp_recvfrom() attempts to assemble a possibly large RPC Call
from a sequence of sock_recvmsg's.

@seek is the running number of bytes that has been received so
far for the RPC Call we are assembling. @size is the number of
bytes that was just received in the most recent sock_recvmsg.

We want svc_flush_bvec to flush just the area of @bvec that
hasn't been flushed yet.

Thus: the current size of the partial Call message in @bvec is
@seek + @size. The starting location of the flush is
@seek & PAGE_MASK. This aligns the flush so it starts on a page
boundary.

This:

 230         struct bvec_iter bi = {
 231                 .bi_size        = size + seek,
 232         };

 235         bvec_iter_advance(bvec, &bi, seek & PAGE_MASK);

advances the bvec_iter to the part of @bvec that hasn't been
flushed yet.

This loop:

 236         for_each_bvec(bv, bvec, bi, bi)
 237                 flush_dcache_page(bv.bv_page);

flushes each page starting at that point to the end of the bytes
that have been received so far

In other words, ca07eda33e01 was wrong because it always flushed
the first section of @bvec, never the later parts of it.


> Regards,
> Zhe
> 
>> 
>> 
>> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
>> index d5805fa1d066..c2752e2b9ce3 100644
>> --- a/net/sunrpc/svcsock.c
>> +++ b/net/sunrpc/svcsock.c
>> @@ -228,7 +228,7 @@ static int svc_one_sock_name(struct svc_sock *svsk, char *buf, int remaining)
>> static void svc_flush_bvec(const struct bio_vec *bvec, size_t size, size_t seek)
>> {
>> 	struct bvec_iter bi = {
>> -		.bi_size	= size,
>> +		.bi_size	= size + seek,
>> 	};
>> 	struct bio_vec bv;

--
Chuck Lever




  reply	other threads:[~2020-09-22 14:16 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-20 19:51 [PATCH] SUNRPC: Fix svc_flush_dcache() Chuck Lever
2020-09-22  7:13 ` He Zhe
2020-09-22 14:14   ` Chuck Lever [this message]
2020-09-23  3:30     ` He Zhe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=EFDE7D43-5E0D-4484-9B8C-CBED30EA4E04@oracle.com \
    --to=chuck.lever@oracle.com \
    --cc=anna.schumaker@netapp.com \
    --cc=bfields@fieldses.org \
    --cc=davem@davemloft.net \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=trond.myklebust@hammerspace.com \
    --cc=zhe.he@windriver.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.