linux-rdma.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Rimmer, Todd" <todd.rimmer@intel.com>
To: Christoph Hellwig <hch@infradead.org>, Jason Gunthorpe <jgg@nvidia.com>
Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>,
	"Wan, Kaike" <kaike.wan@intel.com>,
	"dledford@redhat.com" <dledford@redhat.com>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
	"Rimmer, Todd" <todd.rimmer@intel.com>
Subject: RE: [PATCH RFC 0/9] A rendezvous module
Date: Tue, 23 Mar 2021 17:25:43 +0000	[thread overview]
Message-ID: <BL0PR11MB3299B764F783B77F018CC759F6649@BL0PR11MB3299.namprd11.prod.outlook.com> (raw)
In-Reply-To: <20210323160709.GA2444111@infradead.org>

> I can only recommende everone to buy from a less f***** up GPU or
> accelerator vendor.  
I would certainly love that.  This is not just a recent problem, it's been going on for at least 3-5 years with no end in sight.  And the nvidia driver itself is closed-source in the kernel :-(  Making tuning and debug even harder and continuing to add costs to NIC vendors other than nVidia themselves to support this.

Back to the topic at hand, yes, there are a few misalignments in the ABI.  Most of the structures are carefully aligned.  Below I summarize the major structures and their alignment characteristics. In a few places we chose readability for the application programmer by ordering fields in a logical order, such as for statistics.   

In one place a superfluous resv field was used (rv_query_params_out)  and when I alluded that might be able to be taken advantage in the future to enable a common ABI for GPUs, we went down this deep rat hole.

In studying all the related fields, in most cases if we shuffled everything for maximum packing, the structures would still end up being about the same size and this would all be for non-performance path ABIs.

Here is a summary:
13 structures all perfectly aligned with no gaps, plus 7 structures below.

rv_query_params_out - has one 4 byte reserved field to guarantee alignment for a u64 which follows.

rv_attach_params_in - organized logically as early fields influence how later fields are interpreted.  Fair number of fields, two 1 byte gaps and one 7 byte gap.  Shuffling this might save about 4-8 bytes tops

rv_cache_stats_params_out - ordered logically by statistics meanings.  Two 4 byte gaps could be solved by having a less logical order.  Of course, applications reporting these statistics will tend to do output in the original order, so packing this results in a harder to use ABI and more difficult code review for application writers wanting to make sure they report all stats but do so in a logical order.

rv_conn_get_stats_params_out - one 2 byte gap (so the 1 bytes field mimicking the input request can be 1st), three 4 byte gaps.  Same explanation as rv_cache_stats_params_out

rv_conn_create_params_in - one 4 byte gap, easy enough to swap

rv_post_write_params_out - one 3 byte gap.  Presented in logical order, shuffling would still yield the same size as compiler will round up size.

rv_event - carefully packed and aligned.  Had to make this work on a wide range of compilers with a 1 byte common field defining which part of union was relevant.  Could put the same field in all unions to get rid of packed attribute if that is preferred.  We found other similar examples like this in an older 4.18 kernel, cited one below.

It should be noted, there are existing examples with small gaps or reserved fields in the existing kernel and RDMA stack.  A few examples in ib_user_verbs.h include:

ib_uverbs_send_wr - 4 byte gap after ex for the rdma field in union.

ib_uverbs_flow_attr - 2 byte reserved field declared

ib_flow_spec - 2 byte gap after size field

rdma_netdev - 7 byte gap after port_num

./hw/mthca/mthca_eq.c - very similar use of packed for mthca_eqe to the one complained about in rv_event

Todd

  reply	other threads:[~2021-03-23 17:26 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-19 12:56 [PATCH RFC 0/9] A rendezvous module kaike.wan
2021-03-19 12:56 ` [PATCH RFC 1/9] RDMA/rv: Public interferce for the RDMA Rendezvous module kaike.wan
2021-03-19 16:00   ` Jason Gunthorpe
2021-03-19 12:56 ` [PATCH RFC 2/9] RDMA/rv: Add the internal header files kaike.wan
2021-03-19 16:02   ` Jason Gunthorpe
2021-03-19 12:56 ` [PATCH RFC 3/9] RDMA/rv: Add the rv module kaike.wan
2021-03-19 12:56 ` [PATCH RFC 4/9] RDMA/rv: Add functions for memory region cache kaike.wan
2021-03-19 12:56 ` [PATCH RFC 5/9] RDMA/rv: Add function to register/deregister memory region kaike.wan
2021-03-19 12:56 ` [PATCH RFC 6/9] RDMA/rv: Add connection management functions kaike.wan
2021-03-19 12:56 ` [PATCH RFC 7/9] RDMA/rv: Add functions for RDMA transactions kaike.wan
2021-03-19 12:56 ` [PATCH RFC 8/9] RDMA/rv: Add functions for file operations kaike.wan
2021-03-19 12:56 ` [PATCH RFC 9/9] RDMA/rv: Integrate the file operations into the rv module kaike.wan
2021-03-19 13:53 ` [PATCH RFC 0/9] A rendezvous module Jason Gunthorpe
2021-03-19 14:49   ` Wan, Kaike
2021-03-19 15:48     ` Jason Gunthorpe
2021-03-19 19:22       ` Dennis Dalessandro
2021-03-19 19:44         ` Jason Gunthorpe
2021-03-19 20:12           ` Rimmer, Todd
2021-03-19 20:26             ` Jason Gunthorpe
2021-03-19 20:46               ` Rimmer, Todd
2021-03-19 20:54                 ` Jason Gunthorpe
2021-03-19 20:59                   ` Wan, Kaike
2021-03-19 21:28                     ` Dennis Dalessandro
2021-03-19 21:58                       ` Wan, Kaike
2021-03-19 22:35                         ` Jason Gunthorpe
2021-03-19 22:57                       ` Rimmer, Todd
2021-03-19 23:06                         ` Jason Gunthorpe
2021-03-20 16:39                         ` Dennis Dalessandro
2021-03-21  8:56                           ` Leon Romanovsky
2021-03-21 16:24                             ` Dennis Dalessandro
2021-03-21 16:45                               ` Jason Gunthorpe
2021-03-21 17:21                                 ` Dennis Dalessandro
2021-03-21 18:08                                   ` Jason Gunthorpe
2021-03-22 15:17                                     ` Rimmer, Todd
2021-03-22 16:47                                       ` Jason Gunthorpe
2021-03-22 17:31                                     ` Hefty, Sean
2021-03-23 22:56                                       ` Jason Gunthorpe
2021-03-23 23:29                                         ` Rimmer, Todd
2021-03-21 19:19                                   ` Wan, Kaike
2021-03-23 15:36                                   ` Christoph Hellwig
2021-03-23 15:35                                 ` Christoph Hellwig
2021-03-23 15:33                               ` Christoph Hellwig
2021-03-23 15:30                         ` Christoph Hellwig
2021-03-23 15:46                           ` Jason Gunthorpe
2021-03-23 16:07                             ` Christoph Hellwig
2021-03-23 17:25                               ` Rimmer, Todd [this message]
2021-03-23 17:44                                 ` Jason Gunthorpe
2021-03-19 20:18           ` Dennis Dalessandro
2021-03-19 20:30             ` Jason Gunthorpe
2021-03-19 20:34       ` Hefty, Sean
2021-03-21 12:08         ` Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=BL0PR11MB3299B764F783B77F018CC759F6649@BL0PR11MB3299.namprd11.prod.outlook.com \
    --to=todd.rimmer@intel.com \
    --cc=dennis.dalessandro@cornelisnetworks.com \
    --cc=dledford@redhat.com \
    --cc=hch@infradead.org \
    --cc=jgg@nvidia.com \
    --cc=kaike.wan@intel.com \
    --cc=linux-rdma@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).