From mboxrd@z Thu Jan 1 00:00:00 1970 From: Matan Barak Subject: Re: [PATCH v3 for-next 01/33] IB/core: Add RoCE GID cache Date: Tue, 14 Apr 2015 16:23:18 +0300 Message-ID: <552D14C6.50000@mellanox.com> References: <1427318422-12004-1-git-send-email-somnath.kotur@emulex.com> <44ab0dce-c7c9-400b-af24-10b8981358a7@CMEXHTCAS2.ad.emulex.com> <551347E9.5090503@sandisk.com> Mime-Version: 1.0 Content-Type: text/plain; charset="windows-1252"; format=flowed Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <551347E9.5090503-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org> Sender: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org To: Bart Van Assche , Somnath Kotur , roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org, Moni Shoua , Or Gerlitz Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-Id: linux-rdma@vger.kernel.org On 3/26/2015 1:42 AM, Bart Van Assche wrote: > On 03/25/2015 02:19 PM, Somnath Kotur wrote: >> + if (cache->data_vec[ix].attr.ndev && >> + cache->data_vec[ix].attr.ndev != old_net_dev) > > A few lines earlier the memory old_net_dev points at was freed. If two > instances of this function run concurrently, what prevents that the > old_net_dev memory has been reallocated and hence that attr.ndev == > old_net_dev although both pointers refer(red) to different network > devices ? > write_gid is *almost* always called in a mutex. The only case it's not protected is in free_roce_gid_cache. free_roce_gid_cache is called only in the error flow of roce_gid_cache_setup_one, when no concurrent write_gid could be made (as the cache isn't setup yet) and in roce_gid_cache_cleanup_one. roce_gid_cache_client_setup_one is called in the error flow of roce_gid_cache_client_setup_one (where no other write_gid are expected for the same above reason) and in roce_gid_cache_client_cleanup_work_handler, where it's called after flush_workqueue(roce_gid_mgmt_wq). Since all write_gids are called through roce_gid_mgmt_wq and we set the cache to inactive mode before flushing the wq and freeing the cache, I think we can conclude no concurrent write_gid on the same cache are possible. >> + ACCESS_ONCE(cache->data_vec[ix].seq) = orig_seq; > > Invoking write_gid() is only safe if the caller serializes write_gid() > calls. Apparently the cache->lock mutex is used for that purpose. So why > is it necessary to use ACCESS_ONCE() here ? Why is it needed to prevent > that the compiler coalesces this write with another write into the same > structure ? > The mutex only serializes cache writes. Cache reads could be done in concurrent with writes and are protected by the ACCESS_ONCE. >> + /* Make sure the sequence number we remeber was read > > This looks like a typo - shouldn't the above read "remember" ? > Will be fixed in V4, thanks. > BTW, the style of that comment is recommended only for networking code > and not for IB code. Have you verified this patch with checkpatch ? > Of course and I've just re-run checkpatch on this patch. It doesn't catch this. >> + mutex_lock(&cache->lock); >> + >> + for (ix = 0; ix < cache->sz; ix++) >> + if (cache->data_vec[ix].attr.ndev == ndev) >> + write_gid(ib_dev, port, cache, ix, &zgid, &zattr); >> + >> + mutex_unlock(&cache->lock); >> + return 0; > > The traditional Linux kernel coding style is one blank line before > mutex_lock() and after mutex_unlock() but not after mutex_lock() nor > before mutex_unlock(). > I didn't find this in the CodingStyle doc. Could you please quote or post a link? >> + orig_seq = ACCESS_ONCE(cache->data_vec[index].seq); >> + /* Make sure we read the sequence number before copying the >> + * gid to local storage. */ >> + smp_rmb(); > > Please use READ_ONCE() instead of ACCESS_ONCE() as recommended in > . > Ok, I'll change that in V4. I see that READ_ONCE and WRITE_ONCE is different from ACCESS_ONCE only for aggregated data types (which isn't our case), but it won't hurt to change that. >> +static void free_roce_gid_cache(struct ib_device *ib_dev, u8 port) >> +{ >> + int i; >> + struct ib_roce_gid_cache *cache = >> + ib_dev->cache.roce_gid_cache[port - 1]; >> + >> + if (!cache) >> + return; >> + >> + for (i = 0; i < cache->sz; ++i) { >> + if (memcmp(&cache->data_vec[i].gid, &zgid, >> + sizeof(cache->data_vec[i].gid))) >> + write_gid(ib_dev, port, cache, i, &zgid, &zattr); >> + } > > + kfree(cache->data_vec); > > + kfree(cache); > > +} > > Overwriting data just before it is freed is not useful. Please use > CONFIG_SLUB_DEBUG=y to debug use-after-free issues instead of such code. > It's mandatory as write_gid with zgid might cause the vendor driver to free memory it allocated for this GID entry (like in the mlx4 case). > Bart. Thanks for the review :) Matan -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html