linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] optimize some data structure in nvme
@ 2023-05-01 12:40 Christophe JAILLET
  2023-05-01 12:40 ` [PATCH 1/5] nvmet: Reorder fields in 'struct nvmet_sq' Christophe JAILLET
                   ` (8 more replies)
  0 siblings, 9 replies; 12+ messages in thread
From: Christophe JAILLET @ 2023-05-01 12:40 UTC (permalink / raw)
  To: hch, sagi, kch
  Cc: linux-nvme, linux-kernel, kernel-janitors, Christophe JAILLET

This series is a proposal to slightly optimize the memory needed for some
structures used in nvme.

This follows the discussion in [1].

Honestly, I'm not convinced that this series really brings something.
Because of the way memory allocation works, and its over-allocation to try
to avoid memory fragmentation, such limited gains are most of the time useless.

It could still help:
   - reducing holes in a structure can, at some point, bring its size below
     an allocation threshold (this is especially true if such structures are
     allocated with kcalloc() or kmalloc_array())
   - it can save some space in other structures that embed it
   - it can save a few cycles if the structure is memcpy()'ed or zeroed, for
     example
   - it can reduce cache usage

With that in mind, patch 3 is a win, patch 4 is likely a win, and the other
ones are much more theoretical.

The changes are really limited, so even if the gain is marginal, maybe it still
makes sense to merge them.

Each patch gives the layout generated by pahole before and after the patch.

[1]: https://lore.kernel.org/all/67a9e53e-4ac9-7ba8-9713-96c1dfe1e341@nvidia.com/

Christophe JAILLET (5):
  nvmet: Reorder fields in 'struct nvmet_sq'
  nvmet: Reorder fields in 'struct nvme_ctrl'
  nvmet: Reorder fields in 'struct nvmf_ctrl_options'
  nvmet: Reorder fields in 'struct nvme_dhchap_queue_context'
  nvmet: Reorder fields in 'struct nvmefc_fcp_req'

 drivers/nvme/host/auth.c       |  6 +++---
 drivers/nvme/host/fabrics.h    |  8 ++++----
 drivers/nvme/host/nvme.h       |  6 +++---
 drivers/nvme/target/nvmet.h    |  4 ++--
 include/linux/nvme-fc-driver.h | 10 +++++-----
 5 files changed, 17 insertions(+), 17 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/5] nvmet: Reorder fields in 'struct nvmet_sq'
  2023-05-01 12:40 [PATCH 0/5] optimize some data structure in nvme Christophe JAILLET
@ 2023-05-01 12:40 ` Christophe JAILLET
  2023-05-01 12:40 ` [PATCH 2/5] nvmet: Reorder fields in 'struct nvme_ctrl' Christophe JAILLET
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Christophe JAILLET @ 2023-05-01 12:40 UTC (permalink / raw)
  To: hch, sagi, kch
  Cc: linux-nvme, linux-kernel, kernel-janitors, Christophe JAILLET

Group some variables based on their sizes to reduce holes.
On x86_64, this shrinks the size of 'struct nvmet_sq' from 472 to 464
bytes when CONFIG_NVME_TARGET_AUTH is defined.

This structure is embedded into some other structures, so it helps reduce
their sizes as well.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
---
Using pahole

Before:
======
struct nvmet_sq {
	struct nvmet_ctrl *        ctrl;                 /*     0     8 */
	struct percpu_ref          ref;                  /*     8    16 */
	u16                        qid;                  /*    24     2 */
	u16                        size;                 /*    26     2 */
	u32                        sqhd;                 /*    28     4 */
	bool                       sqhd_disabled;        /*    32     1 */

	/* XXX 7 bytes hole, try to pack */

	struct delayed_work        auth_expired_work;    /*    40   184 */

	/* XXX last struct has 4 bytes of padding */

	/* --- cacheline 3 boundary (192 bytes) was 32 bytes ago --- */
	bool                       authenticated;        /*   224     1 */

	/* XXX 1 byte hole, try to pack */

	u16                        dhchap_tid;           /*   226     2 */
	u16                        dhchap_status;        /*   228     2 */

	/* XXX 2 bytes hole, try to pack */

	int                        dhchap_step;          /*   232     4 */

	/* XXX 4 bytes hole, try to pack */

	u8 *                       dhchap_c1;            /*   240     8 */
	u8 *                       dhchap_c2;            /*   248     8 */
	/* --- cacheline 4 boundary (256 bytes) --- */
	u32                        dhchap_s1;            /*   256     4 */
	u32                        dhchap_s2;            /*   260     4 */
	u8 *                       dhchap_skey;          /*   264     8 */
	int                        dhchap_skey_len;      /*   272     4 */

	/* XXX 4 bytes hole, try to pack */

	struct completion          free_done;            /*   280    96 */
	/* --- cacheline 5 boundary (320 bytes) was 56 bytes ago --- */
	struct completion          confirm_done;         /*   376    96 */

	/* size: 472, cachelines: 8, members: 19 */
	/* sum members: 454, holes: 5, sum holes: 18 */
	/* paddings: 1, sum paddings: 4 */
	/* last cacheline: 24 bytes */
};

After:
=====
struct nvmet_sq {
	struct nvmet_ctrl *        ctrl;                 /*     0     8 */
	struct percpu_ref          ref;                  /*     8    16 */
	u16                        qid;                  /*    24     2 */
	u16                        size;                 /*    26     2 */
	u32                        sqhd;                 /*    28     4 */
	bool                       sqhd_disabled;        /*    32     1 */
	bool                       authenticated;        /*    33     1 */

	/* XXX 6 bytes hole, try to pack */

	struct delayed_work        auth_expired_work;    /*    40   184 */

	/* XXX last struct has 4 bytes of padding */

	/* --- cacheline 3 boundary (192 bytes) was 32 bytes ago --- */
	u16                        dhchap_tid;           /*   224     2 */
	u16                        dhchap_status;        /*   226     2 */
	int                        dhchap_step;          /*   228     4 */
	u8 *                       dhchap_c1;            /*   232     8 */
	u8 *                       dhchap_c2;            /*   240     8 */
	u32                        dhchap_s1;            /*   248     4 */
	u32                        dhchap_s2;            /*   252     4 */
	/* --- cacheline 4 boundary (256 bytes) --- */
	u8 *                       dhchap_skey;          /*   256     8 */
	int                        dhchap_skey_len;      /*   264     4 */

	/* XXX 4 bytes hole, try to pack */

	struct completion          free_done;            /*   272    96 */
	/* --- cacheline 5 boundary (320 bytes) was 48 bytes ago --- */
	struct completion          confirm_done;         /*   368    96 */

	/* size: 464, cachelines: 8, members: 19 */
	/* sum members: 454, holes: 2, sum holes: 10 */
	/* paddings: 1, sum paddings: 4 */
	/* last cacheline: 16 bytes */
};
---
 drivers/nvme/target/nvmet.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index c50146085fb5..8cfd60f3b564 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -109,8 +109,8 @@ struct nvmet_sq {
 	u32			sqhd;
 	bool			sqhd_disabled;
 #ifdef CONFIG_NVME_TARGET_AUTH
-	struct delayed_work	auth_expired_work;
 	bool			authenticated;
+	struct delayed_work	auth_expired_work;
 	u16			dhchap_tid;
 	u16			dhchap_status;
 	int			dhchap_step;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/5] nvmet: Reorder fields in 'struct nvme_ctrl'
  2023-05-01 12:40 [PATCH 0/5] optimize some data structure in nvme Christophe JAILLET
  2023-05-01 12:40 ` [PATCH 1/5] nvmet: Reorder fields in 'struct nvmet_sq' Christophe JAILLET
@ 2023-05-01 12:40 ` Christophe JAILLET
  2023-05-15 14:21   ` Keith Busch
  2023-05-01 12:40 ` [PATCH 3/5] nvmet: Reorder fields in 'struct nvmf_ctrl_options' Christophe JAILLET
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 12+ messages in thread
From: Christophe JAILLET @ 2023-05-01 12:40 UTC (permalink / raw)
  To: hch, sagi, kch
  Cc: linux-nvme, linux-kernel, kernel-janitors, Christophe JAILLET

Group some variables based on their sizes to reduce holes.
On x86_64, this shrinks the size of 'struct nvme_ctrl' from 5368 to 5344
bytes when all CONFIG_* are defined.

This structure is embedded into some other structures, so it helps reduce
their sizes as well.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
---
Using pahole

Before:
======
struct nvme_ctrl {
	bool                       comp_seen;            /*     0     1 */

	/* XXX 3 bytes hole, try to pack */

	enum nvme_ctrl_state       state;                /*     4     4 */
	bool                       identified;           /*     8     1 */

	/* XXX 7 bytes hole, try to pack */

	spinlock_t                 lock;                 /*    16    72 */
	/* --- cacheline 1 boundary (64 bytes) was 24 bytes ago --- */
	struct mutex               scan_lock;            /*    88   160 */
	/* --- cacheline 3 boundary (192 bytes) was 56 bytes ago --- */
	const struct nvme_ctrl_ops  * ops;               /*   248     8 */
	/* --- cacheline 4 boundary (256 bytes) --- */
	struct request_queue *     admin_q;              /*   256     8 */
	struct request_queue *     connect_q;            /*   264     8 */
	struct request_queue *     fabrics_q;            /*   272     8 */
	struct device *            dev;                  /*   280     8 */
	int                        instance;             /*   288     4 */
	int                        numa_node;            /*   292     4 */
	struct blk_mq_tag_set *    tagset;               /*   296     8 */
	struct blk_mq_tag_set *    admin_tagset;         /*   304     8 */
	struct list_head           namespaces;           /*   312    16 */
	/* --- cacheline 5 boundary (320 bytes) was 8 bytes ago --- */
	struct rw_semaphore        namespaces_rwsem;     /*   328   168 */
	/* --- cacheline 7 boundary (448 bytes) was 48 bytes ago --- */
	struct device              ctrl_device __attribute__((__aligned__(8))); /*   496  1400 */

	/* XXX last struct has 3 bytes of padding */

	/* --- cacheline 29 boundary (1856 bytes) was 40 bytes ago --- */
	struct device *            device;               /*  1896     8 */
	struct device *            hwmon_device;         /*  1904     8 */
	struct cdev                cdev;                 /*  1912   296 */
	/* --- cacheline 34 boundary (2176 bytes) was 32 bytes ago --- */
	struct work_struct         reset_work;           /*  2208    80 */
	/* --- cacheline 35 boundary (2240 bytes) was 48 bytes ago --- */
	struct work_struct         delete_work;          /*  2288    80 */
	/* --- cacheline 37 boundary (2368 bytes) --- */
	wait_queue_head_t          state_wq;             /*  2368    88 */
	/* --- cacheline 38 boundary (2432 bytes) was 24 bytes ago --- */
	struct nvme_subsystem *    subsys;               /*  2456     8 */
	struct list_head           subsys_entry;         /*  2464    16 */
	struct opal_dev *          opal_dev;             /*  2480     8 */
	char                       name[12];             /*  2488    12 */
	/* --- cacheline 39 boundary (2496 bytes) was 4 bytes ago --- */
	u16                        cntlid;               /*  2500     2 */

	/* XXX 2 bytes hole, try to pack */

	u32                        ctrl_config;          /*  2504     4 */
	u16                        mtfa;                 /*  2508     2 */

	/* XXX 2 bytes hole, try to pack */

	u32                        queue_count;          /*  2512     4 */

	/* XXX 4 bytes hole, try to pack */

	u64                        cap;                  /*  2520     8 */
	u32                        max_hw_sectors;       /*  2528     4 */
	u32                        max_segments;         /*  2532     4 */
	u32                        max_integrity_segments; /*  2536     4 */
	u32                        max_discard_sectors;  /*  2540     4 */
	u32                        max_discard_segments; /*  2544     4 */
	u32                        max_zeroes_sectors;   /*  2548     4 */
	u32                        max_zone_append;      /*  2552     4 */
	u16                        crdt[3];              /*  2556     6 */
	/* --- cacheline 40 boundary (2560 bytes) was 2 bytes ago --- */
	u16                        oncs;                 /*  2562     2 */
	u32                        dmrsl;                /*  2564     4 */
	u16                        oacs;                 /*  2568     2 */
	u16                        sqsize;               /*  2570     2 */
	u32                        max_namespaces;       /*  2572     4 */
	atomic_t                   abort_limit;          /*  2576     4 */
	u8                         vwc;                  /*  2580     1 */

	/* XXX 3 bytes hole, try to pack */

	u32                        vs;                   /*  2584     4 */
	u32                        sgls;                 /*  2588     4 */
	u16                        kas;                  /*  2592     2 */
	u8                         npss;                 /*  2594     1 */
	u8                         apsta;                /*  2595     1 */
	u16                        wctemp;               /*  2596     2 */
	u16                        cctemp;               /*  2598     2 */
	u32                        oaes;                 /*  2600     4 */
	u32                        aen_result;           /*  2604     4 */
	u32                        ctratt;               /*  2608     4 */
	unsigned int               shutdown_timeout;     /*  2612     4 */
	unsigned int               kato;                 /*  2616     4 */
	bool                       subsystem;            /*  2620     1 */

	/* XXX 3 bytes hole, try to pack */

	/* --- cacheline 41 boundary (2624 bytes) --- */
	long unsigned int          quirks;               /*  2624     8 */
	struct nvme_id_power_state psd[32];              /*  2632  1024 */
	/* --- cacheline 57 boundary (3648 bytes) was 8 bytes ago --- */
	struct nvme_effects_log *  effects;              /*  3656     8 */
	struct xarray              cels;                 /*  3664    88 */
	/* --- cacheline 58 boundary (3712 bytes) was 40 bytes ago --- */
	struct work_struct         scan_work;            /*  3752    80 */
	/* --- cacheline 59 boundary (3776 bytes) was 56 bytes ago --- */
	struct work_struct         async_event_work;     /*  3832    80 */
	/* --- cacheline 61 boundary (3904 bytes) was 8 bytes ago --- */
	struct delayed_work        ka_work;              /*  3912   184 */

	/* XXX last struct has 4 bytes of padding */

	/* --- cacheline 64 boundary (4096 bytes) --- */
	struct delayed_work        failfast_work;        /*  4096   184 */

	/* XXX last struct has 4 bytes of padding */

	/* --- cacheline 66 boundary (4224 bytes) was 56 bytes ago --- */
	struct nvme_command        ka_cmd;               /*  4280    64 */
	/* --- cacheline 67 boundary (4288 bytes) was 56 bytes ago --- */
	struct work_struct         fw_act_work;          /*  4344    80 */
	/* --- cacheline 69 boundary (4416 bytes) was 8 bytes ago --- */
	long unsigned int          events;               /*  4424     8 */
	u8                         anacap;               /*  4432     1 */
	u8                         anatt;                /*  4433     1 */

	/* XXX 2 bytes hole, try to pack */

	u32                        anagrpmax;            /*  4436     4 */
	u32                        nanagrpid;            /*  4440     4 */

	/* XXX 4 bytes hole, try to pack */

	struct mutex               ana_lock;             /*  4448   160 */
	/* --- cacheline 72 boundary (4608 bytes) --- */
	struct nvme_ana_rsp_hdr *  ana_log_buf;          /*  4608     8 */
	size_t                     ana_log_size;         /*  4616     8 */
	struct timer_list          anatt_timer;          /*  4624    88 */
	/* --- cacheline 73 boundary (4672 bytes) was 40 bytes ago --- */
	struct work_struct         ana_work;             /*  4712    80 */
	/* --- cacheline 74 boundary (4736 bytes) was 56 bytes ago --- */
	struct work_struct         dhchap_auth_work;     /*  4792    80 */
	/* --- cacheline 76 boundary (4864 bytes) was 8 bytes ago --- */
	struct mutex               dhchap_auth_mutex;    /*  4872   160 */
	/* --- cacheline 78 boundary (4992 bytes) was 40 bytes ago --- */
	struct nvme_dhchap_queue_context * dhchap_ctxs;  /*  5032     8 */
	struct nvme_dhchap_key *   host_key;             /*  5040     8 */
	struct nvme_dhchap_key *   ctrl_key;             /*  5048     8 */
	/* --- cacheline 79 boundary (5056 bytes) --- */
	u16                        transaction;          /*  5056     2 */

	/* XXX 6 bytes hole, try to pack */

	u64                        ps_max_latency_us;    /*  5064     8 */
	bool                       apst_enabled;         /*  5072     1 */

	/* XXX 3 bytes hole, try to pack */

	u32                        hmpre;                /*  5076     4 */
	u32                        hmmin;                /*  5080     4 */
	u32                        hmminds;              /*  5084     4 */
	u16                        hmmaxd;               /*  5088     2 */

	/* XXX 2 bytes hole, try to pack */

	u32                        ioccsz;               /*  5092     4 */
	u32                        iorcsz;               /*  5096     4 */
	u16                        icdoff;               /*  5100     2 */
	u16                        maxcmd;               /*  5102     2 */
	int                        nr_reconnects;        /*  5104     4 */

	/* XXX 4 bytes hole, try to pack */

	long unsigned int          flags;                /*  5112     8 */
	/* --- cacheline 80 boundary (5120 bytes) --- */
	struct nvmf_ctrl_options * opts;                 /*  5120     8 */
	struct page *              discard_page;         /*  5128     8 */
	long unsigned int          discard_page_busy;    /*  5136     8 */
	struct nvme_fault_inject   fault_inject;         /*  5144   216 */

	/* XXX last struct has 4 bytes of padding */

	/* --- cacheline 83 boundary (5312 bytes) was 48 bytes ago --- */
	enum nvme_ctrl_type        cntrltype;            /*  5360     4 */
	enum nvme_dctype           dctype;               /*  5364     4 */

	/* size: 5368, cachelines: 84, members: 104 */
	/* sum members: 5323, holes: 13, sum holes: 45 */
	/* paddings: 4, sum paddings: 15 */
	/* forced alignments: 1 */
	/* last cacheline: 56 bytes */
} __attribute__((__aligned__(8)));


After:
=====
struct nvme_ctrl {
	bool                       comp_seen;            /*     0     1 */
	bool                       identified;           /*     1     1 */

	/* XXX 2 bytes hole, try to pack */

	enum nvme_ctrl_state       state;                /*     4     4 */
	spinlock_t                 lock;                 /*     8    72 */
	/* --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- */
	struct mutex               scan_lock;            /*    80   160 */
	/* --- cacheline 3 boundary (192 bytes) was 48 bytes ago --- */
	const struct nvme_ctrl_ops  * ops;               /*   240     8 */
	struct request_queue *     admin_q;              /*   248     8 */
	/* --- cacheline 4 boundary (256 bytes) --- */
	struct request_queue *     connect_q;            /*   256     8 */
	struct request_queue *     fabrics_q;            /*   264     8 */
	struct device *            dev;                  /*   272     8 */
	int                        instance;             /*   280     4 */
	int                        numa_node;            /*   284     4 */
	struct blk_mq_tag_set *    tagset;               /*   288     8 */
	struct blk_mq_tag_set *    admin_tagset;         /*   296     8 */
	struct list_head           namespaces;           /*   304    16 */
	/* --- cacheline 5 boundary (320 bytes) --- */
	struct rw_semaphore        namespaces_rwsem;     /*   320   168 */
	/* --- cacheline 7 boundary (448 bytes) was 40 bytes ago --- */
	struct device              ctrl_device __attribute__((__aligned__(8))); /*   488  1400 */

	/* XXX last struct has 3 bytes of padding */

	/* --- cacheline 29 boundary (1856 bytes) was 32 bytes ago --- */
	struct device *            device;               /*  1888     8 */
	struct device *            hwmon_device;         /*  1896     8 */
	struct cdev                cdev;                 /*  1904   296 */
	/* --- cacheline 34 boundary (2176 bytes) was 24 bytes ago --- */
	struct work_struct         reset_work;           /*  2200    80 */
	/* --- cacheline 35 boundary (2240 bytes) was 40 bytes ago --- */
	struct work_struct         delete_work;          /*  2280    80 */
	/* --- cacheline 36 boundary (2304 bytes) was 56 bytes ago --- */
	wait_queue_head_t          state_wq;             /*  2360    88 */
	/* --- cacheline 38 boundary (2432 bytes) was 16 bytes ago --- */
	struct nvme_subsystem *    subsys;               /*  2448     8 */
	struct list_head           subsys_entry;         /*  2456    16 */
	struct opal_dev *          opal_dev;             /*  2472     8 */
	char                       name[12];             /*  2480    12 */
	u16                        cntlid;               /*  2492     2 */
	u16                        mtfa;                 /*  2494     2 */
	/* --- cacheline 39 boundary (2496 bytes) --- */
	u32                        ctrl_config;          /*  2496     4 */
	u32                        queue_count;          /*  2500     4 */
	u64                        cap;                  /*  2504     8 */
	u32                        max_hw_sectors;       /*  2512     4 */
	u32                        max_segments;         /*  2516     4 */
	u32                        max_integrity_segments; /*  2520     4 */
	u32                        max_discard_sectors;  /*  2524     4 */
	u32                        max_discard_segments; /*  2528     4 */
	u32                        max_zeroes_sectors;   /*  2532     4 */
	u32                        max_zone_append;      /*  2536     4 */
	u16                        crdt[3];              /*  2540     6 */
	u16                        oncs;                 /*  2546     2 */
	u32                        dmrsl;                /*  2548     4 */
	u16                        oacs;                 /*  2552     2 */
	u16                        sqsize;               /*  2554     2 */
	u32                        max_namespaces;       /*  2556     4 */
	/* --- cacheline 40 boundary (2560 bytes) --- */
	atomic_t                   abort_limit;          /*  2560     4 */
	u8                         vwc;                  /*  2564     1 */

	/* XXX 3 bytes hole, try to pack */

	u32                        vs;                   /*  2568     4 */
	u32                        sgls;                 /*  2572     4 */
	u16                        kas;                  /*  2576     2 */
	u8                         npss;                 /*  2578     1 */
	u8                         apsta;                /*  2579     1 */
	u16                        wctemp;               /*  2580     2 */
	u16                        cctemp;               /*  2582     2 */
	u32                        oaes;                 /*  2584     4 */
	u32                        aen_result;           /*  2588     4 */
	u32                        ctratt;               /*  2592     4 */
	unsigned int               shutdown_timeout;     /*  2596     4 */
	unsigned int               kato;                 /*  2600     4 */
	bool                       subsystem;            /*  2604     1 */

	/* XXX 3 bytes hole, try to pack */

	long unsigned int          quirks;               /*  2608     8 */
	struct nvme_id_power_state psd[32];              /*  2616  1024 */
	/* --- cacheline 56 boundary (3584 bytes) was 56 bytes ago --- */
	struct nvme_effects_log *  effects;              /*  3640     8 */
	/* --- cacheline 57 boundary (3648 bytes) --- */
	struct xarray              cels;                 /*  3648    88 */
	/* --- cacheline 58 boundary (3712 bytes) was 24 bytes ago --- */
	struct work_struct         scan_work;            /*  3736    80 */
	/* --- cacheline 59 boundary (3776 bytes) was 40 bytes ago --- */
	struct work_struct         async_event_work;     /*  3816    80 */
	/* --- cacheline 60 boundary (3840 bytes) was 56 bytes ago --- */
	struct delayed_work        ka_work;              /*  3896   184 */

	/* XXX last struct has 4 bytes of padding */

	/* --- cacheline 63 boundary (4032 bytes) was 48 bytes ago --- */
	struct delayed_work        failfast_work;        /*  4080   184 */

	/* XXX last struct has 4 bytes of padding */

	/* --- cacheline 66 boundary (4224 bytes) was 40 bytes ago --- */
	struct nvme_command        ka_cmd;               /*  4264    64 */
	/* --- cacheline 67 boundary (4288 bytes) was 40 bytes ago --- */
	struct work_struct         fw_act_work;          /*  4328    80 */
	/* --- cacheline 68 boundary (4352 bytes) was 56 bytes ago --- */
	long unsigned int          events;               /*  4408     8 */
	/* --- cacheline 69 boundary (4416 bytes) --- */
	u8                         anacap;               /*  4416     1 */
	u8                         anatt;                /*  4417     1 */

	/* XXX 2 bytes hole, try to pack */

	u32                        anagrpmax;            /*  4420     4 */
	u32                        nanagrpid;            /*  4424     4 */

	/* XXX 4 bytes hole, try to pack */

	struct mutex               ana_lock;             /*  4432   160 */
	/* --- cacheline 71 boundary (4544 bytes) was 48 bytes ago --- */
	struct nvme_ana_rsp_hdr *  ana_log_buf;          /*  4592     8 */
	size_t                     ana_log_size;         /*  4600     8 */
	/* --- cacheline 72 boundary (4608 bytes) --- */
	struct timer_list          anatt_timer;          /*  4608    88 */
	/* --- cacheline 73 boundary (4672 bytes) was 24 bytes ago --- */
	struct work_struct         ana_work;             /*  4696    80 */
	/* --- cacheline 74 boundary (4736 bytes) was 40 bytes ago --- */
	struct work_struct         dhchap_auth_work;     /*  4776    80 */
	/* --- cacheline 75 boundary (4800 bytes) was 56 bytes ago --- */
	struct mutex               dhchap_auth_mutex;    /*  4856   160 */
	/* --- cacheline 78 boundary (4992 bytes) was 24 bytes ago --- */
	struct nvme_dhchap_queue_context * dhchap_ctxs;  /*  5016     8 */
	struct nvme_dhchap_key *   host_key;             /*  5024     8 */
	struct nvme_dhchap_key *   ctrl_key;             /*  5032     8 */
	u16                        transaction;          /*  5040     2 */

	/* XXX 6 bytes hole, try to pack */

	u64                        ps_max_latency_us;    /*  5048     8 */
	/* --- cacheline 79 boundary (5056 bytes) --- */
	bool                       apst_enabled;         /*  5056     1 */

	/* XXX 1 byte hole, try to pack */

	u16                        hmmaxd;               /*  5058     2 */
	u32                        hmpre;                /*  5060     4 */
	u32                        hmmin;                /*  5064     4 */
	u32                        hmminds;              /*  5068     4 */
	u32                        ioccsz;               /*  5072     4 */
	u32                        iorcsz;               /*  5076     4 */
	u16                        icdoff;               /*  5080     2 */
	u16                        maxcmd;               /*  5082     2 */
	int                        nr_reconnects;        /*  5084     4 */
	long unsigned int          flags;                /*  5088     8 */
	struct nvmf_ctrl_options * opts;                 /*  5096     8 */
	struct page *              discard_page;         /*  5104     8 */
	long unsigned int          discard_page_busy;    /*  5112     8 */
	/* --- cacheline 80 boundary (5120 bytes) --- */
	struct nvme_fault_inject   fault_inject;         /*  5120   216 */

	/* XXX last struct has 4 bytes of padding */

	/* --- cacheline 83 boundary (5312 bytes) was 24 bytes ago --- */
	enum nvme_ctrl_type        cntrltype;            /*  5336     4 */
	enum nvme_dctype           dctype;               /*  5340     4 */

	/* size: 5344, cachelines: 84, members: 104 */
	/* sum members: 5323, holes: 7, sum holes: 21 */
	/* paddings: 4, sum paddings: 15 */
	/* forced alignments: 1 */
	/* last cacheline: 32 bytes */
} __attribute__((__aligned__(8)));
---
 drivers/nvme/host/nvme.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index bf46f122e9e1..53417b6439a7 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -246,8 +246,8 @@ enum nvme_ctrl_flags {
 
 struct nvme_ctrl {
 	bool comp_seen;
-	enum nvme_ctrl_state state;
 	bool identified;
+	enum nvme_ctrl_state state;
 	spinlock_t lock;
 	struct mutex scan_lock;
 	const struct nvme_ctrl_ops *ops;
@@ -279,8 +279,8 @@ struct nvme_ctrl {
 	char name[12];
 	u16 cntlid;
 
-	u32 ctrl_config;
 	u16 mtfa;
+	u32 ctrl_config;
 	u32 queue_count;
 
 	u64 cap;
@@ -353,10 +353,10 @@ struct nvme_ctrl {
 	bool apst_enabled;
 
 	/* PCIe only: */
+	u16 hmmaxd;
 	u32 hmpre;
 	u32 hmmin;
 	u32 hmminds;
-	u16 hmmaxd;
 
 	/* Fabrics only */
 	u32 ioccsz;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/5] nvmet: Reorder fields in 'struct nvmf_ctrl_options'
  2023-05-01 12:40 [PATCH 0/5] optimize some data structure in nvme Christophe JAILLET
  2023-05-01 12:40 ` [PATCH 1/5] nvmet: Reorder fields in 'struct nvmet_sq' Christophe JAILLET
  2023-05-01 12:40 ` [PATCH 2/5] nvmet: Reorder fields in 'struct nvme_ctrl' Christophe JAILLET
@ 2023-05-01 12:40 ` Christophe JAILLET
  2023-05-01 12:40 ` [PATCH 4/5] nvmet: Reorder fields in 'struct nvme_dhchap_queue_context' Christophe JAILLET
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Christophe JAILLET @ 2023-05-01 12:40 UTC (permalink / raw)
  To: hch, sagi, kch
  Cc: linux-nvme, linux-kernel, kernel-janitors, Christophe JAILLET

Group some variables based on their sizes to reduce holes.
On x86_64, this shrinks the size of 'struct nvmf_ctrl_options' from 136 to
128 bytes.

When such a structure is allocated in nvmf_create_ctrl(), because of the
way memory allocation works, 192 bytes were allocated when 136 bytes were
requested.

So this saves 64 bytes per allocation, one cache line needed to hold the
whole structure, and a few cycles when zeroing the memory in
nvmf_create_ctrl().

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
---
Using pahole

Before:
======
struct nvmf_ctrl_options {
	unsigned int               mask;                 /*     0     4 */

	/* XXX 4 bytes hole, try to pack */

	char *                     transport;            /*     8     8 */
	char *                     subsysnqn;            /*    16     8 */
	char *                     traddr;               /*    24     8 */
	char *                     trsvcid;              /*    32     8 */
	char *                     host_traddr;          /*    40     8 */
	char *                     host_iface;           /*    48     8 */
	size_t                     queue_size;           /*    56     8 */
	/* --- cacheline 1 boundary (64 bytes) --- */
	unsigned int               nr_io_queues;         /*    64     4 */
	unsigned int               reconnect_delay;      /*    68     4 */
	bool                       discovery_nqn;        /*    72     1 */
	bool                       duplicate_connect;    /*    73     1 */

	/* XXX 2 bytes hole, try to pack */

	unsigned int               kato;                 /*    76     4 */
	struct nvmf_host *         host;                 /*    80     8 */
	int                        max_reconnects;       /*    88     4 */

	/* XXX 4 bytes hole, try to pack */

	char *                     dhchap_secret;        /*    96     8 */
	char *                     dhchap_ctrl_secret;   /*   104     8 */
	bool                       disable_sqflow;       /*   112     1 */
	bool                       hdr_digest;           /*   113     1 */
	bool                       data_digest;          /*   114     1 */

	/* XXX 1 byte hole, try to pack */

	unsigned int               nr_write_queues;      /*   116     4 */
	unsigned int               nr_poll_queues;       /*   120     4 */
	int                        tos;                  /*   124     4 */
	/* --- cacheline 2 boundary (128 bytes) --- */
	int                        fast_io_fail_tmo;     /*   128     4 */

	/* size: 136, cachelines: 3, members: 24 */
	/* sum members: 121, holes: 4, sum holes: 11 */
	/* padding: 4 */
	/* last cacheline: 8 bytes */
};


After:
=====
struct nvmf_ctrl_options {
	unsigned int               mask;                 /*     0     4 */
	int                        max_reconnects;       /*     4     4 */
	char *                     transport;            /*     8     8 */
	char *                     subsysnqn;            /*    16     8 */
	char *                     traddr;               /*    24     8 */
	char *                     trsvcid;              /*    32     8 */
	char *                     host_traddr;          /*    40     8 */
	char *                     host_iface;           /*    48     8 */
	size_t                     queue_size;           /*    56     8 */
	/* --- cacheline 1 boundary (64 bytes) --- */
	unsigned int               nr_io_queues;         /*    64     4 */
	unsigned int               reconnect_delay;      /*    68     4 */
	bool                       discovery_nqn;        /*    72     1 */
	bool                       duplicate_connect;    /*    73     1 */

	/* XXX 2 bytes hole, try to pack */

	unsigned int               kato;                 /*    76     4 */
	struct nvmf_host *         host;                 /*    80     8 */
	char *                     dhchap_secret;        /*    88     8 */
	char *                     dhchap_ctrl_secret;   /*    96     8 */
	bool                       disable_sqflow;       /*   104     1 */
	bool                       hdr_digest;           /*   105     1 */
	bool                       data_digest;          /*   106     1 */

	/* XXX 1 byte hole, try to pack */

	unsigned int               nr_write_queues;      /*   108     4 */
	unsigned int               nr_poll_queues;       /*   112     4 */
	int                        tos;                  /*   116     4 */
	int                        fast_io_fail_tmo;     /*   120     4 */

	/* size: 128, cachelines: 2, members: 24 */
	/* sum members: 121, holes: 2, sum holes: 3 */
	/* padding: 4 */
};
---
 drivers/nvme/host/fabrics.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
index dcac3df8a5f7..1bc57fb75825 100644
--- a/drivers/nvme/host/fabrics.h
+++ b/drivers/nvme/host/fabrics.h
@@ -77,6 +77,9 @@ enum {
  *			      with the parsing opts enum.
  * @mask:	Used by the fabrics library to parse through sysfs options
  *		on adding a NVMe controller.
+ * @max_reconnects: maximum number of allowed reconnect attempts before removing
+ *		the controller, (-1) means reconnect forever, zero means remove
+ *		immediately;
  * @transport:	Holds the fabric transport "technology name" (for a lack of
  *		better description) that will be used by an NVMe controller
  *		being added.
@@ -96,9 +99,6 @@ enum {
  * @discovery_nqn: indicates if the subsysnqn is the well-known discovery NQN.
  * @kato:	Keep-alive timeout.
  * @host:	Virtual NVMe host, contains the NQN and Host ID.
- * @max_reconnects: maximum number of allowed reconnect attempts before removing
- *              the controller, (-1) means reconnect forever, zero means remove
- *              immediately;
  * @dhchap_secret: DH-HMAC-CHAP secret
  * @dhchap_ctrl_secret: DH-HMAC-CHAP controller secret for bi-directional
  *              authentication
@@ -112,6 +112,7 @@ enum {
  */
 struct nvmf_ctrl_options {
 	unsigned		mask;
+	int			max_reconnects;
 	char			*transport;
 	char			*subsysnqn;
 	char			*traddr;
@@ -125,7 +126,6 @@ struct nvmf_ctrl_options {
 	bool			duplicate_connect;
 	unsigned int		kato;
 	struct nvmf_host	*host;
-	int			max_reconnects;
 	char			*dhchap_secret;
 	char			*dhchap_ctrl_secret;
 	bool			disable_sqflow;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 4/5] nvmet: Reorder fields in 'struct nvme_dhchap_queue_context'
  2023-05-01 12:40 [PATCH 0/5] optimize some data structure in nvme Christophe JAILLET
                   ` (2 preceding siblings ...)
  2023-05-01 12:40 ` [PATCH 3/5] nvmet: Reorder fields in 'struct nvmf_ctrl_options' Christophe JAILLET
@ 2023-05-01 12:40 ` Christophe JAILLET
  2023-05-01 12:40 ` [PATCH 5/5] nvmet: Reorder fields in 'struct nvmefc_fcp_req' Christophe JAILLET
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Christophe JAILLET @ 2023-05-01 12:40 UTC (permalink / raw)
  To: hch, sagi, kch
  Cc: linux-nvme, linux-kernel, kernel-janitors, Christophe JAILLET

Group some variables based on their sizes to reduce holes.
On x86_64, this shrinks the size of 'struct nvme_dhchap_queue_context' from
416 to 400 bytes.

This structure is kvcalloc()'ed in nvme_auth_init_ctrl(), so the allocation
can be relatively big. Saving 16 bytes per structure may make a slight
difference.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
---
Using pahole

Before:
======
struct nvme_dhchap_queue_context {
	struct list_head           entry;                /*     0    16 */
	struct work_struct         auth_work;            /*    16    80 */
	/* --- cacheline 1 boundary (64 bytes) was 32 bytes ago --- */
	struct nvme_ctrl *         ctrl;                 /*    96     8 */
	struct crypto_shash *      shash_tfm;            /*   104     8 */
	struct crypto_kpp *        dh_tfm;               /*   112     8 */
	void *                     buf;                  /*   120     8 */
	/* --- cacheline 2 boundary (128 bytes) --- */
	int                        qid;                  /*   128     4 */
	int                        error;                /*   132     4 */
	u32                        s1;                   /*   136     4 */
	u32                        s2;                   /*   140     4 */
	u16                        transaction;          /*   144     2 */
	u8                         status;               /*   146     1 */
	u8                         hash_id;              /*   147     1 */

	/* XXX 4 bytes hole, try to pack */

	size_t                     hash_len;             /*   152     8 */
	u8                         dhgroup_id;           /*   160     1 */
	u8                         c1[64];               /*   161    64 */
	/* --- cacheline 3 boundary (192 bytes) was 33 bytes ago --- */
	u8                         c2[64];               /*   225    64 */
	/* --- cacheline 4 boundary (256 bytes) was 33 bytes ago --- */
	u8                         response[64];         /*   289    64 */

	/* XXX 7 bytes hole, try to pack */

	/* --- cacheline 5 boundary (320 bytes) was 40 bytes ago --- */
	u8 *                       host_response;        /*   360     8 */
	u8 *                       ctrl_key;             /*   368     8 */
	int                        ctrl_key_len;         /*   376     4 */

	/* XXX 4 bytes hole, try to pack */

	/* --- cacheline 6 boundary (384 bytes) --- */
	u8 *                       host_key;             /*   384     8 */
	int                        host_key_len;         /*   392     4 */

	/* XXX 4 bytes hole, try to pack */

	u8 *                       sess_key;             /*   400     8 */
	int                        sess_key_len;         /*   408     4 */

	/* size: 416, cachelines: 7, members: 25 */
	/* sum members: 393, holes: 4, sum holes: 19 */
	/* padding: 4 */
	/* last cacheline: 32 bytes */
};


After:
=====
struct nvme_dhchap_queue_context {
	struct list_head           entry;                /*     0    16 */
	struct work_struct         auth_work;            /*    16    80 */
	/* --- cacheline 1 boundary (64 bytes) was 32 bytes ago --- */
	struct nvme_ctrl *         ctrl;                 /*    96     8 */
	struct crypto_shash *      shash_tfm;            /*   104     8 */
	struct crypto_kpp *        dh_tfm;               /*   112     8 */
	void *                     buf;                  /*   120     8 */
	/* --- cacheline 2 boundary (128 bytes) --- */
	int                        qid;                  /*   128     4 */
	int                        error;                /*   132     4 */
	u32                        s1;                   /*   136     4 */
	u32                        s2;                   /*   140     4 */
	u16                        transaction;          /*   144     2 */
	u8                         status;               /*   146     1 */
	u8                         dhgroup_id;           /*   147     1 */
	u8                         hash_id;              /*   148     1 */

	/* XXX 3 bytes hole, try to pack */

	size_t                     hash_len;             /*   152     8 */
	u8                         c1[64];               /*   160    64 */
	/* --- cacheline 3 boundary (192 bytes) was 32 bytes ago --- */
	u8                         c2[64];               /*   224    64 */
	/* --- cacheline 4 boundary (256 bytes) was 32 bytes ago --- */
	u8                         response[64];         /*   288    64 */
	/* --- cacheline 5 boundary (320 bytes) was 32 bytes ago --- */
	u8 *                       host_response;        /*   352     8 */
	u8 *                       ctrl_key;             /*   360     8 */
	u8 *                       host_key;             /*   368     8 */
	u8 *                       sess_key;             /*   376     8 */
	/* --- cacheline 6 boundary (384 bytes) --- */
	int                        ctrl_key_len;         /*   384     4 */
	int                        host_key_len;         /*   388     4 */
	int                        sess_key_len;         /*   392     4 */

	/* size: 400, cachelines: 7, members: 25 */
	/* sum members: 393, holes: 1, sum holes: 3 */
	/* padding: 4 */
	/* last cacheline: 16 bytes */
};
---
 drivers/nvme/host/auth.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/auth.c b/drivers/nvme/host/auth.c
index ea16a0aba679..daf5d144a8ea 100644
--- a/drivers/nvme/host/auth.c
+++ b/drivers/nvme/host/auth.c
@@ -30,18 +30,18 @@ struct nvme_dhchap_queue_context {
 	u32 s2;
 	u16 transaction;
 	u8 status;
+	u8 dhgroup_id;
 	u8 hash_id;
 	size_t hash_len;
-	u8 dhgroup_id;
 	u8 c1[64];
 	u8 c2[64];
 	u8 response[64];
 	u8 *host_response;
 	u8 *ctrl_key;
-	int ctrl_key_len;
 	u8 *host_key;
-	int host_key_len;
 	u8 *sess_key;
+	int ctrl_key_len;
+	int host_key_len;
 	int sess_key_len;
 };
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 5/5] nvmet: Reorder fields in 'struct nvmefc_fcp_req'
  2023-05-01 12:40 [PATCH 0/5] optimize some data structure in nvme Christophe JAILLET
                   ` (3 preceding siblings ...)
  2023-05-01 12:40 ` [PATCH 4/5] nvmet: Reorder fields in 'struct nvme_dhchap_queue_context' Christophe JAILLET
@ 2023-05-01 12:40 ` Christophe JAILLET
  2023-05-01 13:57 ` [PATCH 0/5] optimize some data structure in nvme Sagi Grimberg
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Christophe JAILLET @ 2023-05-01 12:40 UTC (permalink / raw)
  To: hch, sagi, kch
  Cc: linux-nvme, linux-kernel, kernel-janitors, Christophe JAILLET

Group some variables based on their sizes to reduce holes.
On x86_64, this shrinks the size of 'struct nvmefc_fcp_req' from
112 to 104 bytes.

This structure is embedded in some other structures (nvme_fc_fcp_op, which
itself is embedded in nvme_fcp_op_w_sgl), so it helps reduce the size of
those structures too.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
---
Using pahole

Before:
======
struct nvmefc_fcp_req {
	void *                     cmdaddr;              /*     0     8 */
	void *                     rspaddr;              /*     8     8 */
	dma_addr_t                 cmddma;               /*    16     8 */
	dma_addr_t                 rspdma;               /*    24     8 */
	u16                        cmdlen;               /*    32     2 */
	u16                        rsplen;               /*    34     2 */
	u32                        payload_length;       /*    36     4 */
	struct sg_table            sg_table;             /*    40    16 */
	struct scatterlist *       first_sgl;            /*    56     8 */
	/* --- cacheline 1 boundary (64 bytes) --- */
	int                        sg_cnt;               /*    64     4 */
	enum nvmefc_fcp_datadir    io_dir;               /*    68     4 */
	__le16                     sqid;                 /*    72     2 */

	/* XXX 6 bytes hole, try to pack */

	void                       (*done)(struct nvmefc_fcp_req *); /*    80     8 */
	void *                     private;              /*    88     8 */
	u32                        transferred_length;   /*    96     4 */
	u16                        rcv_rsplen;           /*   100     2 */

	/* XXX 2 bytes hole, try to pack */

	u32                        status;               /*   104     4 */

	/* size: 112, cachelines: 2, members: 17 */
	/* sum members: 100, holes: 2, sum holes: 8 */
	/* padding: 4 */
	/* last cacheline: 48 bytes */
} __attribute__((__aligned__(8)));


After:
=====
struct nvmefc_fcp_req {
	void *                     cmdaddr;              /*     0     8 */
	void *                     rspaddr;              /*     8     8 */
	dma_addr_t                 cmddma;               /*    16     8 */
	dma_addr_t                 rspdma;               /*    24     8 */
	u16                        cmdlen;               /*    32     2 */
	u16                        rsplen;               /*    34     2 */
	u32                        payload_length;       /*    36     4 */
	struct sg_table            sg_table;             /*    40    16 */
	struct scatterlist *       first_sgl;            /*    56     8 */
	/* --- cacheline 1 boundary (64 bytes) --- */
	int                        sg_cnt;               /*    64     4 */
	enum nvmefc_fcp_datadir    io_dir;               /*    68     4 */
	void                       (*done)(struct nvmefc_fcp_req *); /*    72     8 */
	void *                     private;              /*    80     8 */
	__le16                     sqid;                 /*    88     2 */
	u16                        rcv_rsplen;           /*    90     2 */
	u32                        transferred_length;   /*    92     4 */
	u32                        status;               /*    96     4 */

	/* size: 104, cachelines: 2, members: 17 */
	/* padding: 4 */
	/* last cacheline: 40 bytes */
} __attribute__((__aligned__(8)));
---
 include/linux/nvme-fc-driver.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/nvme-fc-driver.h b/include/linux/nvme-fc-driver.h
index fa092b9be2fd..4109f1bd6128 100644
--- a/include/linux/nvme-fc-driver.h
+++ b/include/linux/nvme-fc-driver.h
@@ -185,7 +185,6 @@ enum nvmefc_fcp_datadir {
  * @first_sgl: memory for 1st scatter/gather list segment for payload data
  * @sg_cnt:    number of elements in the scatter/gather list
  * @io_dir:    direction of the FCP request (see NVMEFC_FCP_xxx)
- * @sqid:      The nvme SQID the command is being issued on
  * @done:      The callback routine the LLDD is to invoke upon completion of
  *             the FCP operation. req argument is the pointer to the original
  *             FCP IO operation.
@@ -194,12 +193,13 @@ enum nvmefc_fcp_datadir {
  *             while processing the operation. The length of the buffer
  *             corresponds to the fcprqst_priv_sz value specified in the
  *             nvme_fc_port_template supplied by the LLDD.
+ * @sqid:      The nvme SQID the command is being issued on
  *
  * Values set by the LLDD indicating completion status of the FCP operation.
  * Must be set prior to calling the done() callback.
+ * @rcv_rsplen: length, in bytes, of the FCP RSP IU received.
  * @transferred_length: amount of payload data, in bytes, that were
  *             transferred. Should equal payload_length on success.
- * @rcv_rsplen: length, in bytes, of the FCP RSP IU received.
  * @status:    Completion status of the FCP operation. must be 0 upon success,
  *             negative errno value upon failure (ex: -EIO). Note: this is
  *             NOT a reflection of the NVME CQE completion status. Only the
@@ -219,14 +219,14 @@ struct nvmefc_fcp_req {
 	int			sg_cnt;
 	enum nvmefc_fcp_datadir	io_dir;
 
-	__le16			sqid;
-
 	void (*done)(struct nvmefc_fcp_req *req);
 
 	void			*private;
 
-	u32			transferred_length;
+	__le16			sqid;
+
 	u16			rcv_rsplen;
+	u32			transferred_length;
 	u32			status;
 } __aligned(sizeof(u64));	/* alignment for other things alloc'd with */
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/5] optimize some data structure in nvme
  2023-05-01 12:40 [PATCH 0/5] optimize some data structure in nvme Christophe JAILLET
                   ` (4 preceding siblings ...)
  2023-05-01 12:40 ` [PATCH 5/5] nvmet: Reorder fields in 'struct nvmefc_fcp_req' Christophe JAILLET
@ 2023-05-01 13:57 ` Sagi Grimberg
  2023-05-12 20:26 ` Christoph Hellwig
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Sagi Grimberg @ 2023-05-01 13:57 UTC (permalink / raw)
  To: Christophe JAILLET, hch, kch; +Cc: linux-nvme, linux-kernel, kernel-janitors


> This series is a proposal to slightly optimize the memory needed for some
> structures used in nvme.
> 
> This follows the discussion in [1].
> 
> Honestly, I'm not convinced that this series really brings something.
> Because of the way memory allocation works, and its over-allocation to try
> to avoid memory fragmentation, some limited gains are most of the time
> useless.
> 
> It could still help:
>     - many holes in a structure can, at some point, make its size reach a
>       threshold (this is especially true if such structures are allocated
>       with kcalloc() or kmalloc_array())
>     - it can save some space in other structures if embedded in them
>     - it can save a few cycles if the structure is memcpy()'ed or zeroed,
>       for example
>     - it can reduce cache usage
> 
> With that in mind, patch 3 is a win, patch 4 is likely a win, and the
> other ones are much more theoretical.
> 
> The changes are really limited, so even if the gain is marginal, maybe it
> still makes sense to merge them.

Don't see why not, given they do make the structures smaller.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/5] optimize some data structure in nvme
  2023-05-01 12:40 [PATCH 0/5] optimize some data structure in nvme Christophe JAILLET
                   ` (5 preceding siblings ...)
  2023-05-01 13:57 ` [PATCH 0/5] optimize some data structure in nvme Sagi Grimberg
@ 2023-05-12 20:26 ` Christoph Hellwig
  2023-05-13  1:01 ` Chaitanya Kulkarni
  2023-05-15 17:13 ` Keith Busch
  8 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2023-05-12 20:26 UTC (permalink / raw)
  To: Christophe JAILLET
  Cc: hch, sagi, kch, linux-nvme, linux-kernel, kernel-janitors

Thanks,

the whole series looks good to me:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/5] optimize some data structure in nvme
  2023-05-01 12:40 [PATCH 0/5] optimize some data structure in nvme Christophe JAILLET
                   ` (6 preceding siblings ...)
  2023-05-12 20:26 ` Christoph Hellwig
@ 2023-05-13  1:01 ` Chaitanya Kulkarni
  2023-05-15 17:13 ` Keith Busch
  8 siblings, 0 replies; 12+ messages in thread
From: Chaitanya Kulkarni @ 2023-05-13  1:01 UTC (permalink / raw)
  To: Christophe JAILLET
  Cc: linux-nvme, sagi, Chaitanya Kulkarni, linux-kernel, hch, kernel-janitors

On 5/1/23 05:40, Christophe JAILLET wrote:
> This series is a proposal to slightly optimize the memory needed for some
> structures used in nvme.
>
> This follows the discussion in [1].
>
> Honestly, I'm not convinced that this series really brings something.
> Because of the way memory allocation works, and its over-allocation to try
> to avoid memory fragmentation, some limited gains are most of the time
> useless.
>
> It could still help:
>     - many holes in a structure can, at some point, make its size reach a
>       threshold (this is especially true if such structures are allocated
>       with kcalloc() or kmalloc_array())
>     - it can save some space in other structures if embedded in them
>     - it can save a few cycles if the structure is memcpy()'ed or zeroed,
>       for example
>     - it can reduce cache usage
>
> With that in mind, patch 3 is a win, patch 4 is likely a win, and the
> other ones are much more theoretical.
>
> The changes are really limited, so even if the gain is marginal, maybe it
> still makes sense to merge them.
>
> Each patch gives the layout generated by pahole before and after the patch.
>
> [1]: https://lore.kernel.org/all/67a9e53e-4ac9-7ba8-9713-96c1dfe1e341@nvidia.com/
>
> Christophe JAILLET (5):
>    nvmet: Reorder fields in 'struct nvmet_sq'
>    nvmet: Reorder fields in 'struct nvme_ctrl'
>    nvmet: Reorder fields in 'struct nvmf_ctrl_options'
>    nvmet: Reorder fields in 'struct nvme_dhchap_queue_context'
>    nvmet: Reorder fields in 'struct nvmefc_fcp_req'
>
>   drivers/nvme/host/auth.c       |  6 +++---
>   drivers/nvme/host/fabrics.h    |  8 ++++----
>   drivers/nvme/host/nvme.h       |  6 +++---
>   drivers/nvme/target/nvmet.h    |  4 ++--
>   include/linux/nvme-fc-driver.h | 10 +++++-----
>   5 files changed, 17 insertions(+), 17 deletions(-)
>

Thanks a lot for doing this and for following up on the comment on the
other patch.

Looks good.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/5] nvmet: Reorder fields in 'struct nvme_ctrl'
  2023-05-01 12:40 ` [PATCH 2/5] nvmet: Reorder fields in 'struct nvme_ctrl' Christophe JAILLET
@ 2023-05-15 14:21   ` Keith Busch
  2023-05-15 17:22     ` Christophe JAILLET
  0 siblings, 1 reply; 12+ messages in thread
From: Keith Busch @ 2023-05-15 14:21 UTC (permalink / raw)
  To: Christophe JAILLET
  Cc: hch, sagi, kch, linux-nvme, linux-kernel, kernel-janitors

On Mon, May 01, 2023 at 02:40:26PM +0200, Christophe JAILLET wrote:
> Group some variables based on their sizes to reduce holes.
> On x86_64, this shrinks the size of 'struct nvmet_sq' from 5368 to 5344
> bytes when all CONFIG_* are defined.

This patch is changing struct nvme_ctrl but the commit log only mentions
struct nvmet_sq, which was handled in patch 1/5. I'll just fix that up
when applying.
 
> This structure is embedded into some other structures, so it helps reducing
> their size as well.
> 
> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
> ---
> Using pahole
> 
> Before:
> ======
> struct nvme_ctrl {
> 	bool                       comp_seen;            /*     0     1 */

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/5] optimize some data structure in nvme
  2023-05-01 12:40 [PATCH 0/5] optimize some data structure in nvme Christophe JAILLET
                   ` (7 preceding siblings ...)
  2023-05-13  1:01 ` Chaitanya Kulkarni
@ 2023-05-15 17:13 ` Keith Busch
  8 siblings, 0 replies; 12+ messages in thread
From: Keith Busch @ 2023-05-15 17:13 UTC (permalink / raw)
  To: Christophe JAILLET
  Cc: hch, sagi, kch, linux-nvme, linux-kernel, kernel-janitors

Thanks, applied to nvme-6.5.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/5] nvmet: Reorder fields in 'struct nvme_ctrl'
  2023-05-15 14:21   ` Keith Busch
@ 2023-05-15 17:22     ` Christophe JAILLET
  0 siblings, 0 replies; 12+ messages in thread
From: Christophe JAILLET @ 2023-05-15 17:22 UTC (permalink / raw)
  To: Keith Busch; +Cc: hch, sagi, kch, linux-nvme, linux-kernel, kernel-janitors

Le 15/05/2023 à 16:21, Keith Busch a écrit :
> On Mon, May 01, 2023 at 02:40:26PM +0200, Christophe JAILLET wrote:
>> Group some variables based on their sizes to reduce holes.
>> On x86_64, this shrinks the size of 'struct nvmet_sq' from 5368 to 5344
>> bytes when all CONFIG_* are defined.
> 
> This patch is changing struct nvme_ctrl but the commit log only mentions
> struct nvmet_sq, which was handled in patch 1/5. I'll just fix that up
> when applying.
>   

Thanks for catching and fixing it :).

CJ


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-05-15 17:24 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-01 12:40 [PATCH 0/5] optimize some data structure in nvme Christophe JAILLET
2023-05-01 12:40 ` [PATCH 1/5] nvmet: Reorder fields in 'struct nvmet_sq' Christophe JAILLET
2023-05-01 12:40 ` [PATCH 2/5] nvmet: Reorder fields in 'struct nvme_ctrl' Christophe JAILLET
2023-05-15 14:21   ` Keith Busch
2023-05-15 17:22     ` Christophe JAILLET
2023-05-01 12:40 ` [PATCH 3/5] nvmet: Reorder fields in 'struct nvmf_ctrl_options' Christophe JAILLET
2023-05-01 12:40 ` [PATCH 4/5] nvmet: Reorder fields in 'struct nvme_dhchap_queue_context' Christophe JAILLET
2023-05-01 12:40 ` [PATCH 5/5] nvmet: Reorder fields in 'struct nvmefc_fcp_req' Christophe JAILLET
2023-05-01 13:57 ` [PATCH 0/5] optimize some data structure in nvme Sagi Grimberg
2023-05-12 20:26 ` Christoph Hellwig
2023-05-13  1:01 ` Chaitanya Kulkarni
2023-05-15 17:13 ` Keith Busch
