From: Gao Xiang <hsiangkao@redhat.com>
To: Daeho Jeong <daeho43@gmail.com>
Cc: Chao Yu <yuchao0@huawei.com>, Daeho Jeong <daehojeong@google.com>,
    kernel-team@android.com, linux-kernel@vger.kernel.org,
    linux-f2fs-devel@lists.sourceforge.net
Subject: Re: [f2fs-dev] [PATCH] f2fs: change virtual mapping way for compression pages
Date: Tue, 11 Aug 2020 19:29:12 +0800
Message-ID: <20200811112912.GB7870@xiangao.remote.csb>
In-Reply-To: <CACOAw_zRPeGzHyc_siLqBRjURWTE61G5rGCwk7bnbcOnADGRpg@mail.gmail.com>

On Tue, Aug 11, 2020 at 08:21:23PM +0900, Daeho Jeong wrote:
> Sure, I'll update the test condition as you said in the commit message.
> FYI, the test is done with 16kb chunk and Pixel 3 (arm64) device.

Yeah, anyway, it'd be better to lock the freq and offline the little
cores in your test as well (it'd make more sense). E.g. if a 16k
cluster is applied, even if all data is zeroed, the count of
vmap/vm_map_ram calls isn't huge (and as you said, "sometimes, it has
a very long delay", which looks much like another scheduling concern
as well).

Anyway, I'm not against your commit, but the commit message is a bit
unclear. At least, if you think that is really the case, I'm ok with
that.

Thanks,
Gao Xiang

>
> Thanks,
>
> On Tue, Aug 11, 2020 at 7:18 PM, Gao Xiang <hsiangkao@redhat.com> wrote:
> >
> > On Tue, Aug 11, 2020 at 06:33:26PM +0900, Daeho Jeong wrote:
> > > Plus, when we use vmap(), vmap() normally executes in a short time
> > > like vm_map_ram().
> > > But, sometimes, it has a very long delay.
> > >
> > > On Tue, Aug 11, 2020 at 6:28 PM, Daeho Jeong <daeho43@gmail.com> wrote:
> > > >
> > > > Actually, as you can see, I use the whole zero data blocks in the test file.
> > > > It can maximize the effect of changing virtual mapping.
> > > > When I use normal files which can be compressed about 70% from the
> > > > original file,
> > > > The vm_map_ram() version is about 2x faster than vmap() version.
> >
> > What f2fs does is much similar to btrfs compression. Even if these
> > blocks are all zeroed. In principle, the maximum compression ratio
> > is determined (cluster sized blocks into one compressed block, e.g
> > 16k cluster into one compressed block).
> >
> > So it'd be better to describe your configured cluster size (16k or
> > 128k) and your hardware information in the commit message as well.
> >
> > Actually, I also tried with this patch as well on my x86 laptop just
> > now with FIO (I didn't use zeroed block though), and I didn't notice
> > much difference with turbo boost off and maxfreq.
> >
> > I'm not arguing this commit, just a note about this commit message.
> > > > > >> 1048576000 bytes (0.9 G) copied, 9.146217 s, 109 M/s
> > > > > >> 1048576000 bytes (0.9 G) copied, 9.997542 s, 100 M/s
> > > > > >> 1048576000 bytes (0.9 G) copied, 10.109727 s, 99 M/s
> >
> > IMHO, the above number is much like decompressing in the arm64 little cores.
> >
> > Thanks,
> > Gao Xiang
> >
> > >
> > >
> > > > On Tue, Aug 11, 2020 at 4:55 PM, Chao Yu <yuchao0@huawei.com> wrote:
> > > >
> > > > > On 2020/8/11 15:15, Gao Xiang wrote:
> > > > > > On Tue, Aug 11, 2020 at 12:37:53PM +0900, Daeho Jeong wrote:
> > > > > >> From: Daeho Jeong <daehojeong@google.com>
> > > > > >>
> > > > > >> By profiling f2fs compression works, I've found vmap() callings are
> > > > > >> bottlenecks of f2fs decompression path. Changing these with
> > > > > >> vm_map_ram(), we can enhance f2fs decompression speed pretty much.
> > > > > >>
> > > > > >> [Verification]
> > > > > >> dd if=/dev/zero of=dummy bs=1m count=1000
> > > > > >> echo 3 > /proc/sys/vm/drop_caches
> > > > > >> dd if=dummy of=/dev/zero bs=512k
> > > > > >>
> > > > > >> - w/o compression -
> > > > > >> 1048576000 bytes (0.9 G) copied, 1.999384 s, 500 M/s
> > > > > >> 1048576000 bytes (0.9 G) copied, 2.035988 s, 491 M/s
> > > > > >> 1048576000 bytes (0.9 G) copied, 2.039457 s, 490 M/s
> > > > > >>
> > > > > >> - before patch -
> > > > > >> 1048576000 bytes (0.9 G) copied, 9.146217 s, 109 M/s
> > > > > >> 1048576000 bytes (0.9 G) copied, 9.997542 s, 100 M/s
> > > > > >> 1048576000 bytes (0.9 G) copied, 10.109727 s, 99 M/s
> > > > > >>
> > > > > >> - after patch -
> > > > > >> 1048576000 bytes (0.9 G) copied, 2.253441 s, 444 M/s
> > > > > >> 1048576000 bytes (0.9 G) copied, 2.739764 s, 365 M/s
> > > > > >> 1048576000 bytes (0.9 G) copied, 2.185649 s, 458 M/s
> > > > > >
> > > > > > Indeed, vmap() approach has some impact on the whole
> > > > > > workflow. But I don't think the gap is such significant,
> > > > > > maybe it relates to unlocked cpufreq (and big little
> > > > > > core difference if it's on some arm64 board).
> > > > >
> > > > > Agreed,
> > > > >
> > > > > I guess there should be other reason causing the large performance
> > > > > gap, scheduling, frequency, or something else.
> > > > >
> > > > > >
> > > > > >
> > > > > > _______________________________________________
> > > > > > Linux-f2fs-devel mailing list
> > > > > > Linux-f2fs-devel@lists.sourceforge.net
> > > > > > https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
> > > > > > .
> > > > > >
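The numbers quoted in the thread can be put side by side with a little arithmetic. The sketch below averages the three dd runs of each configuration and estimates how much extra time the pre-patch kernel spent per cluster. It assumes the 16k cluster size mentioned for the Pixel 3 test and treats the whole before/after difference as mapping overhead, which is exactly the assumption being questioned in the thread; the variable names are illustrative only.

```python
# Elapsed times copied from the dd output quoted in the thread.
TOTAL_BYTES = 1_048_576_000      # bytes copied per dd run
CLUSTER_BYTES = 16 * 1024        # assumption: 16k cluster, as stated for the test

elapsed_s = {
    "w/o compression": [1.999384, 2.035988, 2.039457],
    "before patch": [9.146217, 9.997542, 10.109727],
    "after patch": [2.253441, 2.739764, 2.185649],
}


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Mean elapsed time and throughput (MiB/s) per configuration.
mean_s = {name: mean(runs) for name, runs in elapsed_s.items()}
mib_per_s = {name: TOTAL_BYTES / 2**20 / t for name, t in mean_s.items()}

# How much faster the patched kernel read the file, on average.
speedup = mean_s["before patch"] / mean_s["after patch"]

# Number of clusters in the file, and the extra time per cluster before
# the patch if the entire difference is attributed to the mapping calls.
clusters = TOTAL_BYTES // CLUSTER_BYTES
extra_us_per_cluster = (
    (mean_s["before patch"] - mean_s["after patch"]) / clusters * 1e6
)

for name in elapsed_s:
    print(f"{name:16s} {mean_s[name]:7.3f} s  {mib_per_s[name]:6.1f} MiB/s")
print(f"speedup after patch: {speedup:.2f}x")
print(f"clusters: {clusters}, extra time per cluster: {extra_us_per_cluster:.0f} us")
```

With a 16k cluster the file holds 64000 clusters, so the gap works out to roughly a hundred microseconds per cluster. That figure is only as good as the assumption that nothing else (CPU frequency, core placement, scheduling) changed between runs.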