From: "Victor Kamensky" <kamensky@cisco.com>
To: Khem Raj <raj.khem@gmail.com>,
	Alexander Kanavin <alex.kanavin@gmail.com>
Cc: OE-core <openembedded-core@lists.openembedded.org>,
	Ross Burton <ross@burtonini.com>
Subject: Re: [OE-core] [PATCH 1/2] qemu: add 34Kf-64tlb fictitious cpu type
Date: Thu, 8 Oct 2020 16:39:08 +0000
Message-ID: <BYAPR11MB30476A66D1C6A0E1B1C3D6D6CD0B0@BYAPR11MB3047.namprd11.prod.outlook.com>
In-Reply-To: <CAMKF1srHSjgB2MGri=HqHfuzTUkzwPRqiLtoq6=-phDmRBH-UA@mail.gmail.com>

Hi Khem, Alexander,

Please see my responses inline; look for 'kamensky>'.

________________________________________
From: openembedded-core@lists.openembedded.org <openembedded-core@lists.openembedded.org> on behalf of Khem Raj <raj.khem@gmail.com>
Sent: Thursday, October 8, 2020 9:05 AM
To: Alexander Kanavin
Cc: Victor Kamensky (kamensky); OE-core; Ross Burton
Subject: Re: [OE-core] [PATCH 1/2] qemu: add 34Kf-64tlb fictitious cpu type

On Thu, Oct 8, 2020 at 4:53 AM Alexander Kanavin <alex.kanavin@gmail.com> wrote:
>
> Thanks - I note that Upstream-Status is missing, are you planning to approach qemu upstream with this?
>

Thinking about upstreaming, I think it might be worth proposing it upstream.

kamensky> Yes, the same was briefly discussed during today's bug triage meeting.
kamensky> I will try to submit it to qemu upstream and argue our case. It won't
kamensky> hurt to try anyway.

kamensky> As far as Upstream-Status is concerned: yes, it looks like I messed up.
kamensky> I added 'Upstream Status' to the OE patch. I did not realize that it
kamensky> should be added to the added qemu patch itself.

Thanks,
Victor
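
For reference, a sketch of where the tag conventionally goes in OE-core: the Upstream-Status line (spelled with a hyphen) sits in the header of the patch file added under meta/recipes-devtools/qemu/qemu/, typically just above the Signed-off-by line. The status value shown is only an assumption for illustration; the thread does not say which value was finally used.

```
Subject: [PATCH] mips: add 34Kf-64tlb fictitious cpu type like 34Kf but with
 64 TLBs

<commit message body>

Upstream-Status: Pending
Signed-off-by: Victor Kamensky <kamensky@cisco.com>
---
```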

> Alex
>
> On Thu, 8 Oct 2020 at 09:30, Ross Burton <ross@burtonini.com> wrote:
>>
>> Excellent work to identify a relatively simple way to dramatically
>> improve performance. Nice one!
>>
>> Ross
>>
>> On Wed, 7 Oct 2020 at 21:39, Victor Kamensky via
>> lists.openembedded.org <kamensky=cisco.com@lists.openembedded.org>
>> wrote:
>> >
>> > In Yocto Project PR 13992 it was reported that qemumips
>> > in autobuilder runs almost twice as slow as qemumips64 and
>> > sometimes hits the timeout.
>> >
>> > Investigating qemu-system with perf, gdb, and SystemTap, and
>> > comparing the behavior of the qemumips and qemumips64 machines,
>> > showed that the qemu soft mmu code behaves quite differently:
>> > in the qemumips case the tlbwr instruction is called
>> > 16 times more often. It turns out that in the qemumips64
>> > case qemu runs with a cpu type that contains 64 TLBs, but in the
>> > qemumips case qemu runs with a cpu type that contains only
>> > 16 TLBs.
>> >
>> > The idea of the proposed qemu patch is to introduce a fictitious
>> > 34Kf-64tlb cpu type that is defined exactly as 34Kf but has
>> > 64 TLBs instead of the original 16 TLBs.
>> >
>> > Testing of core-image-full-cmdline:do_testimage with
>> > 34Kf-64tlb shows a 40% or so improvement in test execution
>> > real time.
>> >
>> > Note for future porters of the patch: the easiest way to update
>> > the patch and stay in sync with the 34Kf definition is to copy
>> > the 34Kf machine definition and apply the following changes to
>> > it (just change the CP0C1_MMU bits value from 15 to 63):
>> >
>> > [kamensky@coreos-lnx2 qemu]$ diff ~/34Kf.c ~/34Kf-64tlb.c
>> > 2c2
>> > <         .name = "34Kf",
>> > >         .name = "34Kf-64tlb",
>> > 6c6
>> > <         .CP0_Config1 = MIPS_CONFIG1 | (1 << CP0C1_FP) | (15 << CP0C1_MMU) |
>> > >         .CP0_Config1 = MIPS_CONFIG1 | (1 << CP0C1_FP) | (63 << CP0C1_MMU) |
>> >
>> > Fixes https://bugzilla.yoctoproject.org/show_bug.cgi?id=13992
>> >
>> > Upstream Status: Inappropriate
>> >
>> > Signed-off-by: Victor Kamensky <kamensky@cisco.com>
>> > ---
>> >  meta/recipes-devtools/qemu/qemu.inc                |   1 +
>> >  ...Kf-64tlb-fictitious-cpu-type-like-34Kf-bu.patch | 118 +++++++++++++++++++++
>> >  2 files changed, 119 insertions(+)
>> >  create mode 100644 meta/recipes-devtools/qemu/qemu/0001-mips-add-34Kf-64tlb-fictitious-cpu-type-like-34Kf-bu.patch
>> >
>> > diff --git a/meta/recipes-devtools/qemu/qemu.inc b/meta/recipes-devtools/qemu/qemu.inc
>> > index bbb9038961..6c0edcb706 100644
>> > --- a/meta/recipes-devtools/qemu/qemu.inc
>> > +++ b/meta/recipes-devtools/qemu/qemu.inc
>> > @@ -31,6 +31,7 @@ SRC_URI = "https://download.qemu.org/${BPN}-${PV}.tar.xz \
>> >             file://0001-qemu-Do-not-include-file-if-not-exists.patch \
>> >             file://find_datadir.patch \
>> >             file://usb-fix-setup_len-init.patch \
>> > +           file://0001-mips-add-34Kf-64tlb-fictitious-cpu-type-like-34Kf-bu.patch \
>> >             "
>> >  UPSTREAM_CHECK_REGEX = "qemu-(?P<pver>\d+(\.\d+)+)\.tar"
>> >
>> > diff --git a/meta/recipes-devtools/qemu/qemu/0001-mips-add-34Kf-64tlb-fictitious-cpu-type-like-34Kf-bu.patch b/meta/recipes-devtools/qemu/qemu/0001-mips-add-34Kf-64tlb-fictitious-cpu-type-like-34Kf-bu.patch
>> > new file mode 100644
>> > index 0000000000..b6312e1543
>> > --- /dev/null
>> > +++ b/meta/recipes-devtools/qemu/qemu/0001-mips-add-34Kf-64tlb-fictitious-cpu-type-like-34Kf-bu.patch
>> > @@ -0,0 +1,118 @@
>> > +From b3fcc7d96523ad8e3ea28c09d495ef08529d01ce Mon Sep 17 00:00:00 2001
>> > +From: Victor Kamensky <kamensky@cisco.com>
>> > +Date: Wed, 7 Oct 2020 10:19:42 -0700
>> > +Subject: [PATCH] mips: add 34Kf-64tlb fictitious cpu type like 34Kf but with
>> > + 64 TLBs
>> > +
>> > +In Yocto Project CI runs it was observed that a test run
>> > +of the 32 bit mips image takes almost twice as long as the 64 bit
>> > +mips image with the same logical load, and CI execution
>> > +hits the timeout.
>> > +
>> > +See https://bugzilla.yoctoproject.org/show_bug.cgi?id=13992
>> > +
>> > +Yocto project uses 34Kf cpu type to run 32 bit mips image,
>> > +and MIPS64R2-generic cpu type to run 64 bit mips64 image.
>> > +
>> > +Investigating the differences in qemu behavior between mips
>> > +and mips64 produced two prominent observations: under a
>> > +logically similar load (same definition and configuration
>> > +of the user-land image), in the mips case the get_physical_address
>> > +function is called almost twice as often, meaning about
>> > +twice as many memory accesses are involved. Also, the
>> > +number of tlbwr instructions executed (r4k_helper_tlbwr
>> > +qemu function) is almost 16 times bigger in the mips case than in
>> > +mips64.
>> > +
>> > +It turns out that the 34Kf cpu has 16 TLBs, but
>> > +MIPS64R2-generic has 64 TLBs. That explains why
>> > +so many more tlbwr instructions had to be executed by the kernel
>> > +TLB refill handler in the 32 bit mips case.
>> > +
>> > +The idea of the fix is to come up with a new 34Kf-64tlb fictitious
>> > +cpu type that behaves exactly as 34Kf but
>> > +contains 64 TLBs to reduce TLB thrashing. After all, adding
>> > +more TLBs to soft mmu is easy.
>> > +
>> > +An experiment with a significant non-trivial load in the Yocto
>> > +environment, running the do_testimage load, shows that the 34Kf-64tlb
>> > +cpu performs 40% or so better than the original 34Kf cpu wrt test
>> > +execution real time.
>> > +
>> > +It is not ideal to have a cpu type that does not exist in the
>> > +wild, but given the performance gains it seems to be justified.
>> > +
>> > +Signed-off-by: Victor Kamensky <kamensky@cisco.com>
>> > +---
>> > + target/mips/translate_init.inc.c | 55 ++++++++++++++++++++++++++++++++++++++++
>> > + 1 file changed, 55 insertions(+)
>> > +
>> > +diff --git a/target/mips/translate_init.inc.c b/target/mips/translate_init.inc.c
>> > +index 637caccd89..b73ab48231 100644
>> > +--- a/target/mips/translate_init.inc.c
>> > ++++ b/target/mips/translate_init.inc.c
>> > +@@ -297,6 +297,61 @@ const mips_def_t mips_defs[] =
>> > +         .insn_flags = CPU_MIPS32R2 | ASE_MIPS16 | ASE_DSP | ASE_MT,
>> > +         .mmu_type = MMU_TYPE_R4000,
>> > +     },
>> > ++    /*
>> > ++     * Verbatim copy of "34Kf" cpu, only bumped up number of TLB entries
>> > ++     * from 16 to 64 (see CP0_Config1 value at CP0C1_MMU bits) to improve
>> > ++     * performance by reducing number of TLB refill exceptions and
>> > ++     * eliminating need to run all corresponding TLB refill handling
>> > ++     * instructions.
>> > ++     */
>> > ++    {
>> > ++        .name = "34Kf-64tlb",
>> > ++        .CP0_PRid = 0x00019500,
>> > ++        .CP0_Config0 = MIPS_CONFIG0 | (0x1 << CP0C0_AR) |
>> > ++                       (MMU_TYPE_R4000 << CP0C0_MT),
>> > ++        .CP0_Config1 = MIPS_CONFIG1 | (1 << CP0C1_FP) | (63 << CP0C1_MMU) |
>> > ++                       (0 << CP0C1_IS) | (3 << CP0C1_IL) | (1 << CP0C1_IA) |
>> > ++                       (0 << CP0C1_DS) | (3 << CP0C1_DL) | (1 << CP0C1_DA) |
>> > ++                       (1 << CP0C1_CA),
>> > ++        .CP0_Config2 = MIPS_CONFIG2,
>> > ++        .CP0_Config3 = MIPS_CONFIG3 | (1 << CP0C3_VInt) | (1 << CP0C3_MT) |
>> > ++                       (1 << CP0C3_DSPP),
>> > ++        .CP0_LLAddr_rw_bitmask = 0,
>> > ++        .CP0_LLAddr_shift = 0,
>> > ++        .SYNCI_Step = 32,
>> > ++        .CCRes = 2,
>> > ++        .CP0_Status_rw_bitmask = 0x3778FF1F,
>> > ++        .CP0_TCStatus_rw_bitmask = (0 << CP0TCSt_TCU3) | (0 << CP0TCSt_TCU2) |
>> > ++                    (1 << CP0TCSt_TCU1) | (1 << CP0TCSt_TCU0) |
>> > ++                    (0 << CP0TCSt_TMX) | (1 << CP0TCSt_DT) |
>> > ++                    (1 << CP0TCSt_DA) | (1 << CP0TCSt_A) |
>> > ++                    (0x3 << CP0TCSt_TKSU) | (1 << CP0TCSt_IXMT) |
>> > ++                    (0xff << CP0TCSt_TASID),
>> > ++        .CP1_fcr0 = (1 << FCR0_F64) | (1 << FCR0_L) | (1 << FCR0_W) |
>> > ++                    (1 << FCR0_D) | (1 << FCR0_S) | (0x95 << FCR0_PRID),
>> > ++        .CP1_fcr31 = 0,
>> > ++        .CP1_fcr31_rw_bitmask = 0xFF83FFFF,
>> > ++        .CP0_SRSCtl = (0xf << CP0SRSCtl_HSS),
>> > ++        .CP0_SRSConf0_rw_bitmask = 0x3fffffff,
>> > ++        .CP0_SRSConf0 = (1U << CP0SRSC0_M) | (0x3fe << CP0SRSC0_SRS3) |
>> > ++                    (0x3fe << CP0SRSC0_SRS2) | (0x3fe << CP0SRSC0_SRS1),
>> > ++        .CP0_SRSConf1_rw_bitmask = 0x3fffffff,
>> > ++        .CP0_SRSConf1 = (1U << CP0SRSC1_M) | (0x3fe << CP0SRSC1_SRS6) |
>> > ++                    (0x3fe << CP0SRSC1_SRS5) | (0x3fe << CP0SRSC1_SRS4),
>> > ++        .CP0_SRSConf2_rw_bitmask = 0x3fffffff,
>> > ++        .CP0_SRSConf2 = (1U << CP0SRSC2_M) | (0x3fe << CP0SRSC2_SRS9) |
>> > ++                    (0x3fe << CP0SRSC2_SRS8) | (0x3fe << CP0SRSC2_SRS7),
>> > ++        .CP0_SRSConf3_rw_bitmask = 0x3fffffff,
>> > ++        .CP0_SRSConf3 = (1U << CP0SRSC3_M) | (0x3fe << CP0SRSC3_SRS12) |
>> > ++                    (0x3fe << CP0SRSC3_SRS11) | (0x3fe << CP0SRSC3_SRS10),
>> > ++        .CP0_SRSConf4_rw_bitmask = 0x3fffffff,
>> > ++        .CP0_SRSConf4 = (0x3fe << CP0SRSC4_SRS15) |
>> > ++                    (0x3fe << CP0SRSC4_SRS14) | (0x3fe << CP0SRSC4_SRS13),
>> > ++        .SEGBITS = 32,
>> > ++        .PABITS = 32,
>> > ++        .insn_flags = CPU_MIPS32R2 | ASE_MIPS16 | ASE_DSP | ASE_MT,
>> > ++        .mmu_type = MMU_TYPE_R4000,
>> > ++    },
>> > +     {
>> > +         .name = "74Kf",
>> > +         .CP0_PRid = 0x00019700,
>> > +--
>> > +2.14.5
>> > +
>> > --
>> > 2.14.5
>> >
>> >
>> >
>> >
>>
>>
>>
>
>
>
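
The change from 15 to 63 in the quoted patch can be checked with a small sketch. It assumes the MIPS32 Config1 layout, where the MMU size field occupies bits 30:25 and stores the number of TLB entries minus one. The helper name tlb_entries and the mask macro are hypothetical and do not come from the patch or from qemu.

```c
#include <stdint.h>

/* Assumption: MMU size field at bits 30:25 of Config1, i.e. the same
 * shift the patch applies via CP0C1_MMU. */
#define CP0C1_MMU      25
#define CP0C1_MMU_MASK 0x3fu   /* hypothetical name for the 6-bit field mask */

/* Hypothetical helper: the field holds "TLB entries - 1", so add one back. */
static unsigned tlb_entries(uint32_t config1)
{
    return ((config1 >> CP0C1_MMU) & CP0C1_MMU_MASK) + 1;
}
```

With this decoding, tlb_entries(15u << CP0C1_MMU) gives 16 (the original 34Kf) and tlb_entries(63u << CP0C1_MMU) gives 64 (34Kf-64tlb). Since the field is 6 bits wide, 63 is also the largest value it can hold.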

Thread overview: 16+ messages
2020-10-07 20:38 [PATCH 0/2] qemumips: speeding up Victor Kamensky
2020-10-07 20:38 ` [PATCH 1/2] qemu: add 34Kf-64tlb fictitious cpu type Victor Kamensky
2020-10-07 20:46   ` [OE-core] " Paul Barker
2020-10-07 21:52     ` Victor Kamensky
2020-10-07 22:11       ` Khem Raj
2020-10-07 22:04     ` Richard Purdie
2020-10-07 22:15     ` Khem Raj
2020-10-07 22:24       ` Paul Barker
2020-10-07 22:05   ` Khem Raj
2020-10-08  5:05     ` Victor Kamensky
2020-10-08  5:55       ` Khem Raj
2020-10-08  7:29   ` [OE-core] " Ross Burton
2020-10-08 11:53     ` Alexander Kanavin
2020-10-08 16:05       ` Khem Raj
2020-10-08 16:39         ` Victor Kamensky [this message]
2020-10-07 20:38 ` [PATCH 2/2] qemumips: use 34Kf-64tlb CPU emulation Victor Kamensky
