From: Steve French <smfrench@gmail.com>
To: Enzo Matsumiya <ematsumiya@suse.de>
Cc: "Aurélien Aptel" <aurelien.aptel@gmail.com>,
linux-cifs@vger.kernel.org, pc@cjr.nz, ronniesahlberg@gmail.com,
nspmangalore@gmail.com
Subject: Re: [PATCH] cifs: perf improvement - use faster macros ALIGN() and round_up()
Date: Tue, 13 Sep 2022 21:50:51 -0500 [thread overview]
Message-ID: <CAH2r5msyrbfM71WDpUggD7V_YTZhE50w7y4Umt=QjAG=Yfhz7g@mail.gmail.com> (raw)
In-Reply-To: <20220907204140.77ssfeib3zmwvqy2@cyberdelia>
Most of the changes look OK to me, although there's not much point in changing
lines like:
x = 2 /* sizeof(__u16) */ ....
to
x = sizeof(__u16) ...
I'm not sure those particular changes improve readability enough (and they make
backports harder), and they have zero impact on performance. So even if that
kind of change helps readability a little, it could be split out from the
changes that can actually help performance (and/or clearly improve
readability).
The one part that looked confusing to me is below. Can you explain the reason
for this part of the change?
@@ -2687,20 +2683,17 @@ int smb311_posix_mkdir(const unsigned int xid, struct inode *inode,
 		uni_path_len = (2 * UniStrnlen((wchar_t *)utf16_path, PATH_MAX)) + 2;
 		/* MUST set path len (NameLength) to 0 opening root of share */
 		req->NameLength = cpu_to_le16(uni_path_len - 2);
-		if (uni_path_len % 8 != 0) {
-			copy_size = roundup(uni_path_len, 8);
-			copy_path = kzalloc(copy_size, GFP_KERNEL);
-			if (!copy_path) {
-				rc = -ENOMEM;
-				goto err_free_req;
-			}
-			memcpy((char *)copy_path, (const char *)utf16_path,
-			       uni_path_len);
-			uni_path_len = copy_size;
-			/* free before overwriting resource */
-			kfree(utf16_path);
-			utf16_path = copy_path;
+		copy_size = round_up(uni_path_len, 8);
+		copy_path = kzalloc(copy_size, GFP_KERNEL);
+		if (!copy_path) {
+			rc = -ENOMEM;
+			goto err_free_req;
 		}
+		memcpy((char *)copy_path, (const char *)utf16_path,
+		       uni_path_len);
+		uni_path_len = copy_size;
+		/* free before overwriting resource */
+		kfree(utf16_path);
+		utf16_path = copy_path;
 	}
On Wed, Sep 7, 2022 at 3:41 PM Enzo Matsumiya <ematsumiya@suse.de> wrote:
>
> On 09/07, Aurélien Aptel wrote:
> >Hi,
> >
> >Changes are good from a readability stand point but like the others
> >I'm very skeptical about the perf improvement claims.
>
> Just to be clear: re-reading the commit message, I realize I might have
> sounded a tad sensationalist. The "~50% improvement" came from measuring the
> average latency of a single copy operation. As I replied to Tom Talpey in
> https://lore.kernel.org/linux-cifs/20220906160508.x4ahjzo3tr34w6k5@cyberdelia/
> the actual perceptible gain was only 0.02 to 0.05 seconds in my tests.
>
> Pardon me for any confusion.
>
>
> Enzo
>
> >On Tue, Sep 6, 2022 at 3:34 AM Enzo Matsumiya <ematsumiya@suse.de> wrote:
> >> But also replace (DIV_ROUND_UP(len, 8) * 8) with ALIGN(len, 8), which,
> >> if not optimized by the compiler, has the overhead of a multiplication
> >> and a division. Do the same for roundup(), replacing it with round_up()
> >> (the division-less version, which requires the multiple to be a power of
> >> 2; that is always the case for us).
> >
> >Many of these compile to the same division-less instructions, especially
> >if any of the values are known at compile time.
> >
> >> @@ -2305,7 +2305,7 @@ int CIFSSMBRenameOpenFile(const unsigned int xid, struct cifs_tcon *pTcon,
> >
> >smb1 and computed at compile time
> >
> >> @@ -3350,8 +3350,7 @@ validate_ntransact(char *buf, char **ppparm, char **ppdata,
> >
> >smb1
> >
> >> @@ -2833,9 +2833,12 @@ ip_rfc1001_connect(struct TCP_Server_Info *server)
> >> @@ -2871,8 +2874,12 @@ ip_rfc1001_connect(struct TCP_Server_Info *server)
> >
> >connect time
> >
> >> @@ -1318,7 +1313,7 @@ sess_auth_ntlmv2(struct sess_data *sess_data)
> >> @@ -1442,8 +1437,7 @@ sess_auth_kerberos(struct sess_data *sess_data)
> >> @@ -1494,7 +1488,7 @@ sess_auth_kerberos(struct sess_data *sess_data)
> >> @@ -1546,7 +1540,7 @@ _sess_auth_rawntlmssp_assemble_req(struct sess_data *sess_data)
> >> @@ -1747,7 +1741,7 @@ sess_auth_rawntlmssp_authenticate(struct sess_data *sess_data)
> >
> >connect time
> >
> >> @@ -207,7 +207,7 @@ smb2_compound_op(const unsigned int xid, struct cifs_tcon *tcon,
> >> - size[0] = 1; /* sizeof __u8 See MS-FSCC section 2.4.11 */
> >> + size[0] = sizeof(u8); /* See MS-FSCC section 2.4.11 */
> >> - size[0] = 8; /* sizeof __le64 */
> >> + size[0] = sizeof(__le64);
> >
> >Hot path but no functional change
> >
> >> @@ -248,7 +248,7 @@ smb2_check_message(char *buf, unsigned int len, struct TCP_Server_Info *server)
> >> * Some windows servers (win2016) will pad also the final
> >> * PDU in a compound to 8 bytes.
> >> */
> >> - if (((calc_len + 7) & ~7) == len)
> >> + if (ALIGN(calc_len, 8) == len)
> >
> >Hot path but should compile to the same thing
> >
> >> @@ -466,15 +466,14 @@ build_signing_ctxt(struct smb2_signing_capabilities *pneg_ctxt)
> >> @@ -511,8 +510,7 @@ build_netname_ctxt(struct smb2_netname_neg_context *pneg_ctxt, char *hostname)
> >> @@ -557,18 +555,18 @@ assemble_neg_contexts(struct smb2_negotiate_req *req,
> >> @@ -595,9 +593,7 @@ assemble_neg_contexts(struct smb2_negotiate_req *req,
> >> @@ -780,7 +776,7 @@ static int smb311_decode_neg_context(struct smb2_negotiate_rsp *rsp,
> >
> >connect time
> >
> >> @@ -2413,7 +2409,7 @@ create_sd_buf(umode_t mode, bool set_owner, unsigned int *len)
> >> @@ -2487,7 +2483,7 @@ create_sd_buf(umode_t mode, bool set_owner, unsigned int *len)
> >
> >only relevant if mounted with some acl flags
> >
> >> @@ -2581,7 +2577,7 @@ alloc_path_with_tree_prefix(__le16 **out_path, int *out_size, int *out_len,
> >> - *out_size = roundup(*out_len * sizeof(__le16), 8);
> >> + *out_size = round_up(*out_len * sizeof(__le16), 8);
> >
> >Hot path but from my experiments, round_up() compiles to an *extra*
> >instruction. See below.
> >
> >> @@ -2687,20 +2683,17 @@ int smb311_posix_mkdir(const unsigned int xid, struct inode *inode,
> >
> >Only relevant on posix mounts.
> >
> >> @@ -2826,9 +2819,7 @@ SMB2_open_init(struct cifs_tcon *tcon, struct TCP_Server_Info *server,
> >> - copy_size = uni_path_len;
> >> - if (copy_size % 8 != 0)
> >> - copy_size = roundup(copy_size, 8);
> >> + copy_size = round_up(uni_path_len, 8);
> >> @@ -4090,7 +4081,7 @@ smb2_new_read_req(void **buf, unsigned int *total_len,
> >> - *total_len = DIV_ROUND_UP(*total_len, 8) * 8;
> >> + *total_len = ALIGN(*total_len, 8);
> >
> >These 2 are also hot paths, but skeptical about the optimizations.
> >
> >I've looked at those macros in Compiler Explorer and sure enough they
> >compile to the same thing on x86_64.
> >Even worse, for some of them the "optimized" versions compile to extra
> >instructions:
> >
> >https://godbolt.org/z/z1xhhW9sj
> >
> >size_t mod2(size_t num) {
> > return num%2;
> >}
> >
> >size_t is_aligned2(size_t num) {
> > return IS_ALIGNED(num, 2);
> >}
> >
> >mod2:
> > mov rax, rdi
> > and eax, 1
> > ret
> >
> >is_aligned2:
> > mov rax, rdi
> > not rax <=== extra
> > and eax, 1
> > ret
> >--------------------------
> >
> >size_t align8(size_t num) {
> > return ALIGN(num, 8);
> >}
> >
> >size_t align_andshift8(size_t num) {
> > return ((num + 7) & ~7);
> >}
> >
> >
> >align8:
> > lea rax, [rdi+7]
> > and rax, -8
> > ret
> >align_andshift8:
> > lea rax, [rdi+7]
> > and rax, -8
> > ret
> >
> >same code
> >--------------------------
> >
> >size_t dru8(size_t num) {
> > return DIV_ROUND_UP(num, 8)*8;
> >}
> >
> >size_t rnup8_a(size_t num) {
> > return round_up(num, 8);
> >}
> >
> >size_t rnup8_b(size_t num) {
> > return roundup(num, 8);
> >}
> >
> >dru8:
> > lea rax, [rdi+7]
> > and rax, -8
> > ret
> >rnup8_a:
> > lea rax, [rdi-1]
> > or rax, 7 <==== extra
> > add rax, 1
> > ret
> >rnup8_b:
> > lea rax, [rdi+7]
> > and rax, -8
> > ret
> >
> >round_up has an extra instruction
> >--------------------------
> >
> >I suspect the improvements are more likely to be related to caches,
> >system load, server load, ...
> >You can try to use perf to make a flamegraph and compare.
> >
> >Cheers,
--
Thanks,
Steve
Thread overview: 10+ messages
2022-09-06 1:30 [PATCH] cifs: perf improvement - use faster macros ALIGN() and round_up() Enzo Matsumiya
2022-09-06 2:36 ` ronnie sahlberg
2022-09-06 14:41 ` Enzo Matsumiya
2022-09-06 15:47 ` Tom Talpey
2022-09-06 16:05 ` Enzo Matsumiya
2022-09-07 8:37 ` Aurélien Aptel
2022-09-07 18:15 ` Enzo Matsumiya
2022-09-07 20:41 ` Enzo Matsumiya
2022-09-14 2:50 ` Steve French [this message]
2022-09-14 14:38 ` Enzo Matsumiya