From: Julien Desfossez
To: Subhra Mazumdar
Cc: Julien Desfossez, Peter Zijlstra, mingo@kernel.org, tglx@linutronix.de,
    pjt@google.com, tim.c.chen@linux.intel.com, torvalds@linux-foundation.org,
    linux-kernel@vger.kernel.org, fweisbec@gmail.com, keescook@chromium.org,
    kerrnel@google.com, Vineeth Pillai, Nishanth Aravamudan
Subject: Re: [RFC][PATCH 03/16] sched: Wrap rq::lock access
Date: Fri, 29 Mar 2019 09:35:27 -0400
Message-Id: <1553866527-18879-1-git-send-email-jdesfossez@digitalocean.com>

On Fri, Mar 22, 2019 at 8:09 PM Subhra Mazumdar wrote:
> Is the core wide lock primarily responsible for the regression? I ran
> up to patch 12 which also has the core wide lock for tagged cgroups and
> also calls newidle_balance() from pick_next_task(). I don't see any
> regression. Of course the core sched version of pick_next_task() may be
> doing more but comparing with the __pick_next_task() it doesn't look
> too horrible.

After further testing and investigation, we also agree that spinlock
contention is not the main cause of the regression, but we still believe
it is one of the significant contributing factors to this performance
loss.

To narrow the scope of the investigation, we designed a couple of smaller
test cases (compared to big VMs running complex benchmarks). It turns out
the most impacted test case is a simple disk write-intensive workload (up
to a 99% performance drop), while CPU-intensive and scheduler-intensive
tests (perf bench sched) behave well.

On the same server we used before (2x18 cores, 72 hardware threads), with
all non-essential services disabled, we set up a cpuset of 4 cores (8
hardware threads) and ran sysbench fileio on a dedicated drive (no RAID).
(The cgroup setup we use is appended at the end of this email for
reference.)

With sysbench running with 8 threads in this cpuset without core
scheduling, we get about 155.23 MiB/s in sequential write. If we enable
the tag, we drop to 0.25 MiB/s. Interestingly, even with 4 threads, we see
the same kind of performance drop.

Commands used:

  sysbench --test=fileio prepare
  cgexec -g cpu,cpuset:test sysbench --threads=4 --test=fileio \
      --file-test-mode=seqwr run

If we run this with the data in a ramdisk instead of a real drive, we
don't notice any drop. The size of the drop varies a bit from machine to
machine, but it is always significant.

We spent a lot of time looking at the traces and noticed that, a couple of
times during every run, the sysbench worker threads wait for I/O for up to
4 seconds. All the threads wait for the same duration, and during that
time we don't see any block-related softirq coming in. As soon as the
interrupt is finally processed, sysbench gets woken up immediately.

This long wait never happens without core scheduling, so we are trying to
find a place where interrupts are disabled for an extended period of time.
The irqsoff tracer doesn't seem to pick anything up (the commands we use
for it are also appended below).

Any thoughts about that?
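
For reference, the cpuset and tag setup looks roughly like this. This is a
minimal sketch assuming cgroup v1 mounted under /sys/fs/cgroup and the
cpu.tag interface from this series; the CPU list is only an example and
has to match the 4 chosen cores and their hardware thread siblings on the
test machine:

  # cpuset restricted to 4 cores / 8 hardware threads (example CPU list)
  mkdir /sys/fs/cgroup/cpuset/test
  echo 0-3,36-39 > /sys/fs/cgroup/cpuset/test/cpuset.cpus
  echo 0 > /sys/fs/cgroup/cpuset/test/cpuset.mems

  # matching cpu cgroup, tagged so its tasks are core-scheduled together
  mkdir /sys/fs/cgroup/cpu/test
  echo 1 > /sys/fs/cgroup/cpu/test/cpu.tag

cgexec -g cpu,cpuset:test then runs sysbench inside both of these groups.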
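
And in case it helps, this is roughly how we run the irqsoff tracer
(standard ftrace interface, nothing specific to this series):

  cd /sys/kernel/debug/tracing
  echo 0 > tracing_max_latency
  echo irqsoff > current_tracer
  echo 1 > tracing_on
  # run the sysbench test, then:
  echo 0 > tracing_on
  cat tracing_max_latency
  cat trace

It never reports a latency anywhere close to the multi-second waits
described above.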