All of lore.kernel.org
 help / color / mirror / Atom feed
From: Leon Romanovsky <leon@leon.nu>
To: santosh shilimkar <santosh.shilimkar@oracle.com>
Cc: linux-rdma@vger.kernel.org,
	Wengang Wang <wen.gang.wang@oracle.com>,
	netdev@vger.kernel.org
Subject: Re: [PATCH] RDS: sync congestion map updating
Date: Sat, 2 Apr 2016 04:14:59 +0300	[thread overview]
Message-ID: <20160402011459.GC8565@leon.nu> (raw)
In-Reply-To: <56FED04C.2060806@oracle.com>

On Fri, Apr 01, 2016 at 12:47:24PM -0700, santosh shilimkar wrote:
> (cc-ing netdev)
> On 3/30/2016 7:59 PM, Wengang Wang wrote:
> >
> >
> >在 2016年03月31日 09:51, Wengang Wang 写道:
> >>
> >>
> >>在 2016年03月31日 01:16, santosh shilimkar 写道:
> >>>Hi Wengang,
> >>>
> >>>On 3/30/2016 9:19 AM, Leon Romanovsky wrote:
> >>>>On Wed, Mar 30, 2016 at 05:08:22PM +0800, Wengang Wang wrote:
> >>>>>Problem is found that some among a lot of parallel RDS
> >>>>>communications hang.
> >>>>>In my test ten or so among 33 communications hang. The send
> >>>>>requests got
> >>>>>-ENOBUF error meaning the peer socket (port) is congested. But
> >>>>>meanwhile,
> >>>>>peer socket (port) is not congested.
> >>>>>
> >>>>>The congestion map updating can happen in two paths: one is in
> >>>>>rds_recvmsg path
> >>>>>and the other is when it receives packets from the hardware. There
> >>>>>is no
> >>>>>synchronization when updating the congestion map. So a bit
> >>>>>operation (clearing)
> >>>>>in the rds_recvmsg path can be skipped by another bit operation
> >>>>>(setting) in
> >>>>>hardware packet receving path.
> >>>>>
> >
> >To be more detailed.  Here, the two paths (user calls recvmsg and
> >hardware receives data) are for different rds socks. thus the
> >rds_sock->rs_recv_lock is not helpful to sync the updating on congestion
> >map.
> >
> For archive purpose, let me try to conclude the thread. I synced
> with Wengang offlist and came up with below fix. I was under
> impression that __set_bit_le() was atmoic version. After fixing
> it like patch(end of the email), the bug gets addressed.
> 
> I will probably send this as fix for stable as well.
> 
> 
> From 5614b61f6fdcd6ae0c04e50b97efd13201762294 Mon Sep 17 00:00:00 2001
> From: Santosh Shilimkar <santosh.shilimkar@oracle.com>
> Date: Wed, 30 Mar 2016 23:26:47 -0700
> Subject: [PATCH] RDS: Fix the atomicity for congestion map update
> 
> Two different threads with different rds sockets may be in
> rds_recv_rcvbuf_delta() via receive path. If their ports
> both map to the same word in the congestion map, then
> using non-atomic ops to update it could cause the map to
> be incorrect. Lets use atomics to avoid such an issue.
> 
> Full credit to Wengang <wen.gang.wang@oracle.com> for
> finding the issue, analysing it and also pointing out
> to offending code with spin lock based fix.

I'm glad that you solved the issue without spinlocks.
Out of curiosity, I see that this patch is needed to be sent
to Dave and applied by him. Is it right?

➜  linus-tree git:(master) ./scripts/get_maintainer.pl -f net/rds/cong.c
Santosh Shilimkar <santosh.shilimkar@oracle.com> (supporter:RDS -
RELIABLE DATAGRAM SOCKETS)
"David S. Miller" <davem@davemloft.net> (maintainer:NETWORKING
[GENERAL])
netdev@vger.kernel.org (open list:RDS - RELIABLE DATAGRAM SOCKETS)
linux-rdma@vger.kernel.org (open list:RDS - RELIABLE DATAGRAM SOCKETS)
rds-devel@oss.oracle.com (moderated list:RDS - RELIABLE DATAGRAM
SOCKETS)
linux-kernel@vger.kernel.org (open list)

> 
> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

Reviewed-by: Leon Romanovsky <leon@leon.nu>

  reply	other threads:[~2016-04-02  1:14 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-30  9:08 [PATCH] RDS: sync congestion map updating Wengang Wang
     [not found] ` <1459328902-31968-1-git-send-email-wen.gang.wang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2016-03-30 16:19   ` Leon Romanovsky
     [not found]     ` <20160330161952.GA2670-2ukJVAZIZ/Y@public.gmane.org>
2016-03-30 17:16       ` santosh shilimkar
     [not found]         ` <56FC09D6.7090602-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2016-03-31  1:51           ` Wengang Wang
     [not found]             ` <56FC82B7.3070504-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2016-03-31  2:59               ` Wengang Wang
     [not found]                 ` <56FC927E.9090404-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2016-04-01 19:47                   ` santosh shilimkar
2016-04-02  1:14                     ` Leon Romanovsky [this message]
     [not found]                       ` <20160402011459.GC8565-2ukJVAZIZ/Y@public.gmane.org>
2016-04-02  4:30                         ` santosh.shilimkar-QHcLZuEGTsvQT0dZR+AlfA
2016-03-31  1:24       ` Wengang Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160402011459.GC8565@leon.nu \
    --to=leon@leon.nu \
    --cc=linux-rdma@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=santosh.shilimkar@oracle.com \
    --cc=wen.gang.wang@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.