* [PATCH net-next] net/smc: Use kvzalloc for allocating smc_link_group
@ 2022-01-20 14:09 Tony Lu
  2022-01-20 14:50 ` Karsten Graul
  0 siblings, 1 reply; 6+ messages in thread
From: Tony Lu @ 2022-01-20 14:09 UTC (permalink / raw)
  To: kgraul; +Cc: kuba, davem, netdev, linux-s390

When analyzing the memory usage of SMC, we found that the size of struct
smc_link_group is 16048 bytes, which is too big for a busy machine to
allocate as contiguous memory. Use kvzalloc() instead, which falls back
to vmalloc() when not enough contiguous memory is available.

Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
---
 net/smc/smc_core.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index 8935ef4811b0..a5024b098540 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -828,7 +828,7 @@ static int smc_lgr_create(struct smc_sock *smc, struct smc_init_info *ini)
 		}
 	}
 
-	lgr = kzalloc(sizeof(*lgr), GFP_KERNEL);
+	lgr = kvzalloc(sizeof(*lgr), GFP_KERNEL);
 	if (!lgr) {
 		rc = SMC_CLC_DECL_MEM;
 		goto ism_put_vlan;
@@ -914,7 +914,7 @@ static int smc_lgr_create(struct smc_sock *smc, struct smc_init_info *ini)
 free_wq:
 	destroy_workqueue(lgr->tx_wq);
 free_lgr:
-	kfree(lgr);
+	kvfree(lgr);
 ism_put_vlan:
 	if (ini->is_smcd && ini->vlan_id)
 		smc_ism_put_vlan(ini->ism_dev[ini->ism_selected], ini->vlan_id);
@@ -1317,7 +1317,7 @@ static void smc_lgr_free(struct smc_link_group *lgr)
 		if (!atomic_dec_return(&lgr_cnt))
 			wake_up(&lgrs_deleted);
 	}
-	kfree(lgr);
+	kvfree(lgr);
 }
 
 static void smc_sk_wake_ups(struct smc_sock *smc)
-- 
2.32.0.3.g01195cf9f



* Re: [PATCH net-next] net/smc: Use kvzalloc for allocating smc_link_group
  2022-01-20 14:09 [PATCH net-next] net/smc: Use kvzalloc for allocating smc_link_group Tony Lu
@ 2022-01-20 14:50 ` Karsten Graul
  2022-01-21  3:24   ` Tony Lu
  0 siblings, 1 reply; 6+ messages in thread
From: Karsten Graul @ 2022-01-20 14:50 UTC (permalink / raw)
  To: Tony Lu; +Cc: kuba, davem, netdev, linux-s390

On 20/01/2022 15:09, Tony Lu wrote:
> When analyzing the memory usage of SMC, we found that the size of struct
> smc_link_group is 16048 bytes, which is too big for a busy machine to
> allocate as contiguous memory. Use kvzalloc() instead, which falls back
> to vmalloc() when not enough contiguous memory is available.

I am wondering where the contiguous memory for the required RMB buffers is supposed to come from
when you cannot even get enough memory for the initial link group?

The idea is that when the system is this low on contiguous memory, link group creation should fail
early, because most of the later buffer allocations would then fail as well.


* Re: [PATCH net-next] net/smc: Use kvzalloc for allocating smc_link_group
  2022-01-20 14:50 ` Karsten Graul
@ 2022-01-21  3:24   ` Tony Lu
  2022-01-21 11:06     ` Karsten Graul
  0 siblings, 1 reply; 6+ messages in thread
From: Tony Lu @ 2022-01-21  3:24 UTC (permalink / raw)
  To: Karsten Graul; +Cc: kuba, davem, netdev, linux-s390

On Thu, Jan 20, 2022 at 03:50:26PM +0100, Karsten Graul wrote:
> On 20/01/2022 15:09, Tony Lu wrote:
> > When analyzing the memory usage of SMC, we found that the size of struct
> > smc_link_group is 16048 bytes, which is too big for a busy machine to
> > allocate as contiguous memory. Use kvzalloc() instead, which falls back
> > to vmalloc() when not enough contiguous memory is available.
> 
> I am wondering where the contiguous memory for the required RMB buffers is supposed to come from
> when you cannot even get enough memory for the initial link group?

Yes, this is what I want to talk about. The RMB buffer size is inherited
from TCP, so we cannot assume that an RMB is always larger than 16KB:
tcp_mem can be changed on the fly, and it can be tuned very small to save
memory. Also, if we have just freed an existing link group (or other
memory), we may be able to allocate enough contiguous memory for the new
link group.

> The idea is that when the system is this low on contiguous memory, link group creation should fail
> early, because most of the later buffer allocations would then fail as well.

IMHO, this is not a "pre-check" for the buffer allocations; it is a
reminder for us to save contiguous memory, which is a precious resource,
and one possible way to do so. This patch is not the best approach to
this problem, but it is the simplest one. Another possible approach is
to allocate the link array outside the link group and reference it with
a pointer. Glad to hear your advice.

Thanks,
Tony Lu


* Re: [PATCH net-next] net/smc: Use kvzalloc for allocating smc_link_group
  2022-01-21  3:24   ` Tony Lu
@ 2022-01-21 11:06     ` Karsten Graul
  2022-01-24  9:46       ` Tony Lu
  0 siblings, 1 reply; 6+ messages in thread
From: Karsten Graul @ 2022-01-21 11:06 UTC (permalink / raw)
  To: Tony Lu; +Cc: kuba, davem, netdev, linux-s390

On 21/01/2022 04:24, Tony Lu wrote:
> On Thu, Jan 20, 2022 at 03:50:26PM +0100, Karsten Graul wrote:
>> On 20/01/2022 15:09, Tony Lu wrote:
>>> When analyzing the memory usage of SMC, we found that the size of struct
>>> smc_link_group is 16048 bytes, which is too big for a busy machine to
>>> allocate as contiguous memory. Use kvzalloc() instead, which falls back
>>> to vmalloc() when not enough contiguous memory is available.
>>
>> I am wondering where the contiguous memory for the required RMB buffers is supposed to come from
>> when you cannot even get enough memory for the initial link group?
> 
> Yes, this is what I want to talk about. The RMB buffer size is inherited
> from TCP, so we cannot assume that an RMB is always larger than 16KB:
> tcp_mem can be changed on the fly, and it can be tuned very small to save
> memory. Also, if we have just freed an existing link group (or other
> memory), we may be able to allocate enough contiguous memory for the new
> link group.

The lowest size for an RMB is 16KB; smaller inherited TCP sizes do not apply here.
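For reference, a minimal sketch of that lower bound (SMC_BUF_MIN_SIZE is the
16384-byte minimum defined in net/smc/smc_core.h; deriving the request from
sk_rcvbuf / 2 is my assumption about smc_buf_create(), not verified here):

/* Minimal sketch of the RMB size floor, not the real allocation
 * path. SMC_BUF_MIN_SIZE (16384) comes from net/smc/smc_core.h;
 * halving sk_rcvbuf is an assumption about smc_buf_create().
 */
#include "smc_core.h"	/* SMC_BUF_MIN_SIZE */

static int smc_rmb_size_floor(int sk_rcvbuf)
{
	int size = sk_rcvbuf / 2;	/* assumed split, see above */

	/* however small the inherited TCP buffer size is tuned,
	 * an RMB is never smaller than 16KB
	 */
	if (size < SMC_BUF_MIN_SIZE)
		size = SMC_BUF_MIN_SIZE;
	return size;
}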
> 
>> The idea is that when the system is this low on contiguous memory, link group creation should fail
>> early, because most of the later buffer allocations would then fail as well.
> 
> IMHO, this is not a "pre-check" for the buffer allocations; it is a
> reminder for us to save contiguous memory, which is a precious resource,
> and one possible way to do so. This patch is not the best approach to
> this problem, but it is the simplest one. Another possible approach is
> to allocate the link array outside the link group and reference it with
> a pointer. Glad to hear your advice.

I am still not fully convinced of this change. It does no harm, and the overhead of
a vmalloc() is acceptable because a link group is not created that often. But since
kvzalloc() will first try a normal kmalloc() and only switch to the (more expensive)
vmalloc() if that fails, this will not _save_ any contiguous memory.
And for the subsequent required allocations of at least one RMB we need another 16KB.
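
For illustration, a simplified sketch of that fallback order (the real
implementation is kvmalloc_node() in mm/util.c and handles more flag and
size cases than shown here):

/* Simplified sketch of what kvzalloc(size, GFP_KERNEL) does;
 * the real kvmalloc_node() in mm/util.c is more careful about
 * flag combinations and size limits.
 */
#include <linux/slab.h>
#include <linux/vmalloc.h>

static void *kvzalloc_sketch(size_t size)
{
	/* try the physically contiguous path first, without
	 * retrying hard and without warning on failure
	 */
	void *p = kzalloc(size, GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);

	if (p)
		return p;
	/* fall back to virtually contiguous, zeroed memory */
	return vzalloc(size);
}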

Did this change have any measurable advantages in your tests?


* Re: [PATCH net-next] net/smc: Use kvzalloc for allocating smc_link_group
  2022-01-21 11:06     ` Karsten Graul
@ 2022-01-24  9:46       ` Tony Lu
  2022-01-27 15:28         ` Karsten Graul
  0 siblings, 1 reply; 6+ messages in thread
From: Tony Lu @ 2022-01-24  9:46 UTC (permalink / raw)
  To: Karsten Graul; +Cc: kuba, davem, netdev, linux-s390

On Fri, Jan 21, 2022 at 12:06:56PM +0100, Karsten Graul wrote:
> On 21/01/2022 04:24, Tony Lu wrote:
> > On Thu, Jan 20, 2022 at 03:50:26PM +0100, Karsten Graul wrote:
> >> On 20/01/2022 15:09, Tony Lu wrote:
> >>> When analyzing the memory usage of SMC, we found that the size of struct
> >>> smc_link_group is 16048 bytes, which is too big for a busy machine to
> >>> allocate as contiguous memory. Use kvzalloc() instead, which falls back
> >>> to vmalloc() when not enough contiguous memory is available.
> >>
> >> I am wondering where the contiguous memory for the required RMB buffers is supposed to come from
> >> when you cannot even get enough memory for the initial link group?
> > 
> > Yes, this is what I want to talk about. The RMB buffer size is inherited
> > from TCP, so we cannot assume that an RMB is always larger than 16KB:
> > tcp_mem can be changed on the fly, and it can be tuned very small to save
> > memory. Also, if we have just freed an existing link group (or other
> > memory), we may be able to allocate enough contiguous memory for the new
> > link group.
> 
> The lowest size for an RMB is 16KB; smaller inherited TCP sizes do not apply here.

Yes, sorry for my unclear description. This is a corner case: an RMB is
not always strictly larger than 16KiB; it can also be exactly that size.

> > 
> >> The idea is that when the system is this low on contiguous memory, link group creation should fail
> >> early, because most of the later buffer allocations would then fail as well.
> > 
> > IMHO, this is not a "pre-check" for the buffer allocations; it is a
> > reminder for us to save contiguous memory, which is a precious resource,
> > and one possible way to do so. This patch is not the best approach to
> > this problem, but it is the simplest one. Another possible approach is
> > to allocate the link array outside the link group and reference it with
> > a pointer. Glad to hear your advice.
> 
> I am still not fully convinced of this change. It does no harm, and the overhead of
> a vmalloc() is acceptable because a link group is not created that often. But since
> kvzalloc() will first try a normal kmalloc() and only switch to the (more expensive)
> vmalloc() if that fails, this will not _save_ any contiguous memory.
> And for the subsequent required allocations of at least one RMB we need another 16KB.

I agree with you. kvzalloc() doesn't save contiguous memory most of the
time; it only helps when high-order contiguous memory is exhausted, or
when a link group is allocated right after another link group has freed
its buffers. That window is too small to hit in the real world.

I am preparing a complete solution for this. After analyzing the memory
footprint of the structures in SMC, struct smc_link_group turns out to be
the largest one. Here are its fields in detail:

struct smc_link_group {
        struct list_head           list;                 /*     0    16 */
        struct rb_root             conns_all;            /*    16     8 */
        rwlock_t                   conns_lock;           /*    24     8 */
        unsigned int               conns_num;            /*    32     4 */
        short unsigned int         vlan_id;              /*    36     2 */

        /* XXX 2 bytes hole, try to pack */

        struct list_head           sndbufs[16];          /*    40   256 */
        /* --- cacheline 4 boundary (256 bytes) was 40 bytes ago --- */
        struct mutex               sndbufs_lock;         /*   296    32 */
        /* --- cacheline 5 boundary (320 bytes) was 8 bytes ago --- */
        struct list_head           rmbs[16];             /*   328   256 */
        /* --- cacheline 9 boundary (576 bytes) was 8 bytes ago --- */
        struct mutex               rmbs_lock;            /*   584    32 */
        u8                         id[4];                /*   616     4 */

        /* XXX 4 bytes hole, try to pack */

        struct delayed_work        free_work;            /*   624    88 */

        /* XXX last struct has 4 bytes of padding */

        /* --- cacheline 11 boundary (704 bytes) was 8 bytes ago --- */
        struct work_struct         terminate_work;       /*   712    32 */
        struct workqueue_struct *  tx_wq;                /*   744     8 */
        u8                         sync_err:1;           /*   752: 0  1 */
        u8                         terminating:1;        /*   752: 1  1 */
        u8                         freeing:1;            /*   752: 2  1 */

        /* XXX 5 bits hole, try to pack */

        bool                       is_smcd;              /*   753     1 */
        u8                         smc_version;          /*   754     1 */
        u8                         negotiated_eid[32];   /*   755    32 */
        /* --- cacheline 12 boundary (768 bytes) was 19 bytes ago --- */
        u8                         peer_os;              /*   787     1 */
        u8                         peer_smc_release;     /*   788     1 */
        u8                         peer_hostname[32];    /*   789    32 */

        /* XXX 3 bytes hole, try to pack */

        union {
                struct {
                        enum smc_lgr_role role;          /*   824     4 */

                        /* XXX 4 bytes hole, try to pack */

                        /* --- cacheline 13 boundary (832 bytes) --- */
                        struct smc_link lnk[3];          /*   832  2616 */
                        /* --- cacheline 53 boundary (3392 bytes) was 56 bytes ago --- */
                        struct smc_wr_v2_buf * wr_rx_buf_v2; /*  3448     8 */
                        /* --- cacheline 54 boundary (3456 bytes) --- */
                        struct smc_wr_v2_buf * wr_tx_buf_v2; /*  3456     8 */
                        char       peer_systemid[8];     /*  3464     8 */
                        struct smc_rtoken rtokens[255][3]; /*  3472 12240 */
                        /* --- cacheline 245 boundary (15680 bytes) was 32 bytes ago --- */
                        long unsigned int rtokens_used_mask[4]; /* 15712    32 */
                        /* --- cacheline 246 boundary (15744 bytes) --- */
                        u8         next_link_id;         /* 15744     1 */

                        /* XXX 3 bytes hole, try to pack */

                        enum smc_lgr_type type;          /* 15748     4 */
                        u8         pnet_id[17];          /* 15752    17 */

                        /* XXX 7 bytes hole, try to pack */

                        struct list_head llc_event_q;    /* 15776    16 */
                        spinlock_t llc_event_q_lock;     /* 15792     4 */

                        /* XXX 4 bytes hole, try to pack */

                        struct mutex llc_conf_mutex;     /* 15800    32 */
                        /* --- cacheline 247 boundary (15808 bytes) was 24 bytes ago --- */
                        struct work_struct llc_add_link_work; /* 15832    32 */
                        struct work_struct llc_del_link_work; /* 15864    32 */
                        /* --- cacheline 248 boundary (15872 bytes) was 24 bytes ago --- */
                        struct work_struct llc_event_work; /* 15896    32 */
                        wait_queue_head_t llc_flow_waiter; /* 15928    24 */
                        /* --- cacheline 249 boundary (15936 bytes) was 16 bytes ago --- */
                        wait_queue_head_t llc_msg_waiter; /* 15952    24 */
                        struct smc_llc_flow llc_flow_lcl; /* 15976    16 */
                        struct smc_llc_flow llc_flow_rmt; /* 15992    16 */
                        /* --- cacheline 250 boundary (16000 bytes) was 8 bytes ago --- */
                        struct smc_llc_qentry * delayed_event; /* 16008     8 */
                        spinlock_t llc_flow_lock;        /* 16016     4 */
                        int        llc_testlink_time;    /* 16020     4 */
                        u32        llc_termination_rsn;  /* 16024     4 */
                        u8         nexthop_mac[6];       /* 16028     6 */
                        u8         uses_gateway;         /* 16034     1 */

                        /* XXX 1 byte hole, try to pack */

                        __be32     saddr;                /* 16036     4 */
                        struct net * net;                /* 16040     8 */
                };                                       /*   824 15224 */
                struct {
                        u64        peer_gid;             /*   824     8 */
                        /* --- cacheline 13 boundary (832 bytes) --- */
                        struct smcd_dev * smcd;          /*   832     8 */
                        u8         peer_shutdown:1;      /*   840: 0  1 */
                };                                       /*   824    24 */
        };                                               /*   824 15224 */

        /* size: 16048, cachelines: 251, members: 23 */
        /* sum members: 16038, holes: 3, sum holes: 9 */
        /* sum bitfield members: 3 bits, bit holes: 1, sum bit holes: 5 bits */
        /* paddings: 1, sum paddings: 4 */
        /* last cacheline: 48 bytes */
};

These fields account for most of the memory in struct smc_link_group:

struct smc_link lnk[3];          /*   832  2616 */
struct smc_rtoken rtokens[255][3]; /*  3472 12240 */

It is possible to spread this large allocation across multiple smaller
contiguous allocations, trying to keep each allocation at about one page.
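
A rough sketch of that direction follows (hypothetical names, untested;
the struct definitions and the SMC_*_MAX constants are those from
net/smc/smc_core.h):

/* Hypothetical sketch, not a tested patch: allocate the two large
 * members separately so that no single chunk is much above a page.
 * Per the pahole output above:
 *   lnk[3]:                    ~2616 bytes (< 1 page)
 *   rtokens, one row per link: 255 * 16 = ~4080 bytes (~1 page each)
 * Note the transposed indexing, rtokens[link][rmb] instead of
 * rtokens[rmb][link], so each per-link row fits in one page.
 */
#include <linux/errno.h>
#include <linux/slab.h>
#include "smc_core.h"	/* struct smc_link, struct smc_rtoken, SMC_* */

struct smc_lgr_parts {					/* hypothetical */
	struct smc_link *lnk;				/* ~2616 bytes */
	struct smc_rtoken *rtokens[SMC_LINKS_PER_LGR_MAX]; /* ~4080 each */
};

static int smc_lgr_alloc_parts(struct smc_lgr_parts *p)
{
	int i;

	p->lnk = kcalloc(SMC_LINKS_PER_LGR_MAX, sizeof(*p->lnk),
			 GFP_KERNEL);
	if (!p->lnk)
		return -ENOMEM;
	for (i = 0; i < SMC_LINKS_PER_LGR_MAX; i++) {
		p->rtokens[i] = kcalloc(SMC_RMBS_PER_LGR_MAX,
					sizeof(struct smc_rtoken),
					GFP_KERNEL);
		if (!p->rtokens[i])
			goto out_free;
	}
	return 0;

out_free:
	while (i--)
		kfree(p->rtokens[i]);
	kfree(p->lnk);
	return -ENOMEM;
}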
 
# Appendix

Here is some background information about this and the following patches.
We are working on optimizing memory usage in order to expand the usage
scenarios, such as container environments. These scenarios have a common
trait: resources are limited, especially memory (contiguous memory most of all).

So we are working on several methods to reduce memory usage, such as:
- reduce memory usage in corner cases, as this patch does;
- flexibly release reusable link group buffers, either manually or based
  on memcg pressure;
- use non-contiguous DMA memory for the NIC, to reduce contiguous memory
  usage;
- elastic memory: allocate more memory when needed and release it when
  free;
- make the snd/rcv buffers tunable and unbind them from TCP.

These methods trade off performance against flexibility. The detailed
designs are still in progress; we will send RFCs out when they are clear,
and we would be glad to receive your advice. This patch is trivial enough
that I sent it out now.

Thanks,
Tony Lu


* Re: [PATCH net-next] net/smc: Use kvzalloc for allocating smc_link_group
  2022-01-24  9:46       ` Tony Lu
@ 2022-01-27 15:28         ` Karsten Graul
  0 siblings, 0 replies; 6+ messages in thread
From: Karsten Graul @ 2022-01-27 15:28 UTC (permalink / raw)
  To: Tony Lu; +Cc: kuba, davem, netdev, linux-s390

On 24/01/2022 10:46, Tony Lu wrote:
> On Fri, Jan 21, 2022 at 12:06:56PM +0100, Karsten Graul wrote:
>> On 21/01/2022 04:24, Tony Lu wrote:
>>> On Thu, Jan 20, 2022 at 03:50:26PM +0100, Karsten Graul wrote:
>>>> On 20/01/2022 15:09, Tony Lu wrote:
>> I am still not fully convinced of this change. It does no harm, and the overhead of
>> a vmalloc() is acceptable because a link group is not created that often. But since
>> kvzalloc() will first try a normal kmalloc() and only switch to the (more expensive)
>> vmalloc() if that fails, this will not _save_ any contiguous memory.
>> And for the subsequent required allocations of at least one RMB we need another 16KB.
> I agree with you. kvzalloc() doesn't save contiguous memory most of the
> time; it only helps when high-order contiguous memory is exhausted, or
> when a link group is allocated right after another link group has freed
> its buffers. That window is too small to hit in the real world.

Okay, I see we are in sync on that, and we should drop your kvzalloc() patch.
It generates noise and doesn't solve a real problem.

I appreciate your work on this topic, but when I look at the numbers, the whole lgr
WITH all links inside it would occupy less than one 4K page of memory (~3808 bytes,
i.e. the 16048 bytes total minus the 12240-byte rtokens array).
The vast majority of memory in this struct is needed by the

struct smc_rtoken rtokens[255][3]; /*  3472 12240 */

array. This is where contiguous space could be saved, but that needs some effort
to provide an equivalently fast way to store and look up the RMBs.
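
For example (a hypothetical sketch only, not a design proposal): with
per-link, page-sized rtoken rows as sketched earlier in the thread,
indexed access keeps O(1) cost with one extra dereference, and slot
allocation can keep the existing rtokens_used_mask bitmap approach:

/* Hypothetical sketch: rtoken access with per-link rows keeps O(1)
 * indexed lookup; slot allocation reuses the rtokens_used_mask
 * bitmap idea already present in net/smc/smc_core.c.
 */
#include <linux/bitops.h>
#include <linux/errno.h>
#include "smc_core.h"	/* struct smc_rtoken, SMC_RMBS_PER_LGR_MAX */

static inline struct smc_rtoken *
smc_rtoken_at(struct smc_rtoken **rtokens, int link_idx, int rtok_idx)
{
	return &rtokens[link_idx][rtok_idx];	/* one extra dereference */
}

static int smc_rtoken_grab_slot(unsigned long *used_mask)
{
	/* find and claim a free rtoken slot, as the bitmap does today */
	int i = find_first_zero_bit(used_mask, SMC_RMBS_PER_LGR_MAX);

	if (i == SMC_RMBS_PER_LGR_MAX)
		return -ENOSPC;
	__set_bit(i, used_mask);
	return i;
}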

Moving the links out of the lgr will not help here.

A link group holds up to 255 connections, so even with your 10000-connection test
we need no more than 40 lgr instances... I am not sure it is worth the time you
would need to spend on this particular change (the lgr). The other topics you
listed also sound interesting!

