linux-kernel.vger.kernel.org archive mirror
* Re: [PATCH v4] sched: automated per session task groups
@ 2010-11-21 13:37 Ingo Molnar
From: Ingo Molnar @ 2010-11-21 13:37 UTC
  To: Mike Galbraith; +Cc: Oleg Nesterov, Peter Zijlstra, Linus Torvalds, LKML

[-- Attachment #1: Type: text/plain, Size: 4191 bytes --]


Hello Mike,

* Mike Galbraith <efault@gmx.de> wrote:

> On Tue, 2010-11-16 at 18:28 +0100, Ingo Molnar wrote:
> 
> > Mike,
> > 
> > Mind sending a new patch with a separate v2 announcement in a new thread, once you 
> > have something i could apply to the scheduler tree (for a v2.6.38 merge)?
> 
> Changes since last:
> - switch to per session vs tty
> - make autogroups visible in /proc/sched_debug
> - make autogroups visible in /proc/<pid>/autogroup
> - add nice level bandwidth tweakability to /proc/<pid>/autogroup

I tested it a bit, and autosched-v4 crashes on bootup with the attached config.

Note: the box has serial logging enabled and there's UART code in the stacktrace - 
maybe it's related. Let me know if you need the full bootup log.

Thanks,

	Ingo

[FAILED]
Enabling local filesystem quotas:  [  OK  ]
PPS event at 4294886381
Enabling /etc/fstab swaps:  swapon: /dev/hda2: Function not implemented
[FAILED]
INIT: Entering runleveBUG: unable to handle kernel paging request at f548604c
IP:l: 3 [<c10307f0>] update_cfs_shares+0x60/0x160
*pdpt = 0000000002017001 *pde = 00000000029d4067 *pte = 8000000035486160 
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/block/sr0/dev

Pid: 1, comm: init Not tainted 2.6.37-rc2-tip+ #64308 A8N-E/System Product Name
EIP: 0060:[<c10307f0>] EFLAGS: 00010086 CPU: 1
EIP is at update_cfs_shares+0x60/0x160
EAX: fffffffe EBX: f547603b ECX: 00000400 EDX: 00000002
ESI: f5486000 EDI: 0000013b EBP: f6459d48 ESP: f6459d3c
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process init (pid: 1, ti=f6458000 task=f6450000 task.ti=f6458000)
Stack:
 f5475a80 f6f066c0 00000004 f6459d84 c103256f 00000002 00000001 00000000
 c10324d0 c200e6c0 00000001 f6f06b34 00000046 f5475a80 f5475ac8 f6f066c0
 00000001 ffffffff f6459dfc c1b32820 f64a0010 f6459dc4 00000046 00000000
Call Trace:
 [<c103256f>] update_shares+0x9f/0x170
 [<c10324d0>] ? update_shares+0x0/0x170
 [<c1b32820>] schedule+0x580/0x9d0
 [<c1039335>] ? sub_preempt_count+0xa5/0xe0
 [<c1b330e5>] schedule_timeout+0x125/0x2a0
 [<c104fe60>] ? process_timeout+0x0/0x10
 [<c15aef4f>] uart_close+0x17f/0x350
 [<c105fea0>] ? autoremove_wake_function+0x0/0x50
 [<c1471f72>] tty_release+0x102/0x500
 [<c1125fdf>] ? locks_remove_posix+0xf/0xa0
 [<c1119a43>] ? fsnotify+0x1e3/0x2f0
 [<c11198d3>] ? fsnotify+0x73/0x2f0
 [<c10ea1e1>] fput+0xb1/0x230
 [<c10e7e7e>] filp_close+0x4e/0x70
 [<c10e7f14>] sys_close+0x74/0xc0
 [<c1002b90>] sysenter_do_call+0x12/0x31
Code: 00 00 00 8b 18 8b 79 1c 8b 49 18 2b b8 84 00 00 00 01 d3 89 d8 0f af c1 01 fb 74 07 89 c2 c1 fa 1f f7 fb 83 f8 02 ba 02 00 00 00 <8b> 5e 4c 0f 4d d0 39 d1 0f 42 d1 8b 4e 1c 85 c9 0f 84 6a 00 00 
EIP: [<c10307f0>] update_cfs_shares+0x60/0x160 SS:ESP 0068:f6459d3c
CR2: 00000000f548604c
---[ end trace f0ad48f53e29a8fe ]---
Kernel panic - not syncing: Fatal exception
Pid: 1, comm: init Tainted: G      D     2.6.37-rc2-tip+ #64308
Call Trace:
 [<c1b31ef1>] ? panic+0x66/0x15c
 [<c10065c3>] ? oops_end+0x83/0x90
 [<c10220fc>] ? no_context+0xbc/0x190
 [<c102225d>] ? __bad_area_nosemaphore+0x8d/0x130
 [<c10219a4>] ? vmalloc_fault+0x14/0x1c0
 [<c1021b64>] ? spurious_fault+0x14/0x110
 [<c1022317>] ? bad_area_nosemaphore+0x17/0x20
 [<c1022741>] ? do_page_fault+0x281/0x4c0
 [<c1008756>] ? native_sched_clock+0x26/0x90
 [<c1066033>] ? sched_clock_local+0xd3/0x1c0
 [<c10224c0>] ? do_page_fault+0x0/0x4c0
 [<c1b361e2>] ? error_code+0x5a/0x60
 [<c10224c0>] ? do_page_fault+0x0/0x4c0
 [<c10307f0>] ? update_cfs_shares+0x60/0x160
 [<c103256f>] ? update_shares+0x9f/0x170
 [<c10324d0>] ? update_shares+0x0/0x170
 [<c1b32820>] ? schedule+0x580/0x9d0
 [<c1039335>] ? sub_preempt_count+0xa5/0xe0
 [<c1b330e5>] ? schedule_timeout+0x125/0x2a0
 [<c104fe60>] ? process_timeout+0x0/0x10
 [<c15aef4f>] ? uart_close+0x17f/0x350
 [<c105fea0>] ? autoremove_wake_function+0x0/0x50
 [<c1471f72>] ? tty_release+0x102/0x500
 [<c1125fdf>] ? locks_remove_posix+0xf/0xa0
 [<c1119a43>] ? fsnotify+0x1e3/0x2f0
 [<c11198d3>] ? fsnotify+0x73/0x2f0
 [<c10ea1e1>] ? fput+0xb1/0x230
 [<c10e7e7e>] ? filp_close+0x4e/0x70
 [<c10e7f14>] ? sys_close+0x74/0xc0
 [<c1002b90>] ? sysenter_do_call+0x12/0x31
Rebooting in 1 seconds..Press any key to enter the menu
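
For anyone reading the oops above, the faulting access can be cross-checked by hand. The sketch below is illustrative and not part of the original report: it decodes the three bytes at the <8b> marker in the "Code:" line as a mov with an 8-bit displacement, adds that displacement to ESI, and compares the result with CR2. The helper name decode_mov_disp8 is made up for this example, and it handles only this one instruction form.

```python
# Values copied verbatim from the oops text.
ESI = 0xF5486000
CR2 = 0xF548604C
PTE = 0x8000000035486160
ERROR_CODE = 0x0000
FAULTING_BYTES = bytes([0x8B, 0x5E, 0x4C])  # starts at the <8b> marker

REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_disp8(code):
    """Decode 'mov disp8(base), dest' (opcode 8b, ModRM mod=01, no SIB byte)."""
    opcode, modrm, disp = code[0], code[1], code[2]
    if opcode != 0x8B:
        raise ValueError("only opcode 8b (mov r32, r/m32) is handled")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("only the disp8 form without a SIB byte is handled")
    if disp >= 0x80:  # the displacement is a signed 8-bit value
        disp -= 0x100
    return REG32[reg], REG32[rm], disp


dest, base, disp = decode_mov_disp8(FAULTING_BYTES)
fault_addr = (ESI + disp) & 0xFFFFFFFF  # the decoded base register is esi

page_present = bool(PTE & 1)      # bit 0 of the PTE: present
was_write = bool(ERROR_CODE & 2)  # bit 1 of the error code: write access
from_user = bool(ERROR_CODE & 4)  # bit 2 of the error code: user mode

print(f"mov {disp:#x}(%{base}),%{dest}")
print(f"computed fault address: {fault_addr:#010x}, CR2: {CR2:#010x}")
print(f"present={page_present} write={was_write} user={from_user}")
```

With the values from this oops, the computed address equals CR2, the present bit of the PTE is clear, and error code 0000 denotes a kernel-mode read of a not-present page. With DEBUG_PAGEALLOC enabled, that pattern would be consistent with reading a page that had already been freed, though the oops alone does not prove it.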


[-- Attachment #2: config --]
[-- Type: text/plain, Size: 68197 bytes --]

#
# Automatically generated make config: don't edit
# Linux/i386 2.6.37-rc2 Kernel Configuration
# Sun Nov 21 15:57:07 2010
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
# CONFIG_X86_64 is not set
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf32-i386"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig"
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
# CONFIG_RWSEM_GENERIC_SPINLOCK is not set
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_ARCH_HAS_CPU_IDLE_WAIT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
# CONFIG_GENERIC_TIME_VSYSCALL is not set
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_DEFAULT_IDLE=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
# CONFIG_HAVE_CPUMASK_OF_CPU_MAP is not set
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
# CONFIG_ZONE_DMA32 is not set
CONFIG_ARCH_POPULATES_NODE_MAP=y
# CONFIG_AUDIT_ARCH is not set
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_HAVE_INTEL_TXT=y
CONFIG_USE_GENERIC_SMP_HELPERS=y
CONFIG_X86_32_SMP=y
CONFIG_X86_HT=y
CONFIG_X86_TRAMPOLINE=y
CONFIG_X86_32_LAZY_GS=y
CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-ecx -fcall-saved-edx"
CONFIG_KTIME_SCALAR=y
CONFIG_ARCH_CPU_PROBE_RELEASE=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_CONSTRUCTORS=y
CONFIG_HAVE_IRQ_WORK=y
CONFIG_IRQ_WORK=y

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_SWAP is not set
# CONFIG_SYSVIPC is not set
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_AUDIT is not set
CONFIG_HAVE_GENERIC_HARDIRQS=y

#
# IRQ subsystem
#
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_HARDIRQS_NO__DO_IRQ=y
# CONFIG_GENERIC_HARDIRQS_NO_DEPRECATED is not set
CONFIG_HAVE_SPARSE_IRQ=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_PENDING_IRQ=y
# CONFIG_AUTO_IRQ_AFFINITY is not set
# CONFIG_IRQ_PER_CPU is not set
# CONFIG_HARDIRQS_SW_RESEND is not set
CONFIG_SPARSE_IRQ=y

#
# RCU Subsystem
#
CONFIG_TREE_PREEMPT_RCU=y
CONFIG_PREEMPT_RCU=y
# CONFIG_RCU_TRACE is not set
CONFIG_RCU_FANOUT=32
CONFIG_RCU_FANOUT_EXACT=y
# CONFIG_TREE_RCU_TRACE is not set
CONFIG_IKCONFIG=y
# CONFIG_IKCONFIG_PROC is not set
CONFIG_LOG_BUF_SHIFT=20
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_CGROUPS=y
CONFIG_CGROUP_DEBUG=y
# CONFIG_CGROUP_NS is not set
# CONFIG_CGROUP_FREEZER is not set
# CONFIG_CGROUP_DEVICE is not set
# CONFIG_CPUSETS is not set
# CONFIG_CGROUP_CPUACCT is not set
CONFIG_RESOURCE_COUNTERS=y
CONFIG_CGROUP_MEM_RES_CTLR=y
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
# CONFIG_RT_GROUP_SCHED is not set
# CONFIG_BLK_CGROUP is not set
CONFIG_NAMESPACES=y
# CONFIG_UTS_NS is not set
# CONFIG_IPC_NS is not set
CONFIG_USER_NS=y
# CONFIG_PID_NS is not set
# CONFIG_NET_NS is not set
CONFIG_SCHED_AUTOGROUP=y
CONFIG_MM_OWNER=y
CONFIG_SYSFS_DEPRECATED=y
CONFIG_SYSFS_DEPRECATED_V2=y
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
CONFIG_RD_BZIP2=y
CONFIG_RD_LZMA=y
CONFIG_RD_LZO=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
CONFIG_ANON_INODES=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_KALLSYMS_EXTRA_PASS=y
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_PCSPKR_PLATFORM=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_HAVE_PERF_EVENTS=y

#
# Kernel Performance Events And Counters
#
CONFIG_PERF_EVENTS=y
# CONFIG_PERF_COUNTERS is not set
# CONFIG_DEBUG_PERF_USE_VMALLOC is not set
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_PCI_QUIRKS=y
CONFIG_SLUB_DEBUG=y
# CONFIG_COMPAT_BRK is not set
# CONFIG_SLAB is not set
CONFIG_SLUB=y
CONFIG_PROFILING=y
CONFIG_TRACEPOINTS=y
CONFIG_OPROFILE=y
CONFIG_OPROFILE_EVENT_MULTIPLEX=y
CONFIG_HAVE_OPROFILE=y
# CONFIG_JUMP_LABEL is not set
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_OPTPROBES=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_HAVE_DMA_ATTRS=y
CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y
CONFIG_HAVE_DMA_API_DEBUG=y
CONFIG_HAVE_HW_BREAKPOINT=y
CONFIG_HAVE_MIXED_BREAKPOINTS_REGS=y
CONFIG_HAVE_USER_RETURN_NOTIFIER=y
CONFIG_HAVE_PERF_EVENTS_NMI=y
CONFIG_HAVE_ARCH_JUMP_LABEL=y

#
# GCOV-based kernel profiling
#
CONFIG_HAVE_GENERIC_DMA_COHERENT=y
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
# CONFIG_MODULES is not set
CONFIG_STOP_MACHINE=y
CONFIG_BLOCK=y
# CONFIG_LBDAF is not set
CONFIG_BLK_DEV_BSG=y
CONFIG_BLK_DEV_INTEGRITY=y

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
# CONFIG_IOSCHED_DEADLINE is not set
CONFIG_IOSCHED_CFQ=y
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"
# CONFIG_INLINE_SPIN_TRYLOCK is not set
# CONFIG_INLINE_SPIN_TRYLOCK_BH is not set
# CONFIG_INLINE_SPIN_LOCK is not set
# CONFIG_INLINE_SPIN_LOCK_BH is not set
# CONFIG_INLINE_SPIN_LOCK_IRQ is not set
# CONFIG_INLINE_SPIN_LOCK_IRQSAVE is not set
# CONFIG_INLINE_SPIN_UNLOCK is not set
# CONFIG_INLINE_SPIN_UNLOCK_BH is not set
# CONFIG_INLINE_SPIN_UNLOCK_IRQ is not set
# CONFIG_INLINE_SPIN_UNLOCK_IRQRESTORE is not set
# CONFIG_INLINE_READ_TRYLOCK is not set
# CONFIG_INLINE_READ_LOCK is not set
# CONFIG_INLINE_READ_LOCK_BH is not set
# CONFIG_INLINE_READ_LOCK_IRQ is not set
# CONFIG_INLINE_READ_LOCK_IRQSAVE is not set
# CONFIG_INLINE_READ_UNLOCK is not set
# CONFIG_INLINE_READ_UNLOCK_BH is not set
# CONFIG_INLINE_READ_UNLOCK_IRQ is not set
# CONFIG_INLINE_READ_UNLOCK_IRQRESTORE is not set
# CONFIG_INLINE_WRITE_TRYLOCK is not set
# CONFIG_INLINE_WRITE_LOCK is not set
# CONFIG_INLINE_WRITE_LOCK_BH is not set
# CONFIG_INLINE_WRITE_LOCK_IRQ is not set
# CONFIG_INLINE_WRITE_LOCK_IRQSAVE is not set
# CONFIG_INLINE_WRITE_UNLOCK is not set
# CONFIG_INLINE_WRITE_UNLOCK_BH is not set
# CONFIG_INLINE_WRITE_UNLOCK_IRQ is not set
# CONFIG_INLINE_WRITE_UNLOCK_IRQRESTORE is not set
# CONFIG_MUTEX_SPIN_ON_OWNER is not set
# CONFIG_FREEZER is not set

#
# Processor type and features
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ=y
# CONFIG_HIGH_RES_TIMERS is not set
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_SMP=y
CONFIG_X86_MPPARSE=y
# CONFIG_X86_BIGSMP is not set
CONFIG_X86_EXTENDED_PLATFORM=y
# CONFIG_X86_ELAN is not set
CONFIG_X86_MRST=y
CONFIG_X86_RDC321X=y
CONFIG_X86_32_NON_STANDARD=y
CONFIG_X86_NUMAQ=y
CONFIG_X86_SUMMIT=y
# CONFIG_SCHED_OMIT_FRAME_POINTER is not set
# CONFIG_PARAVIRT_GUEST is not set
CONFIG_NO_BOOTMEM=y
# CONFIG_MEMTEST is not set
CONFIG_X86_SUMMIT_NUMA=y
CONFIG_X86_CYCLONE_TIMER=y
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
CONFIG_M686=y
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_MVIAC7 is not set
# CONFIG_MCORE2 is not set
# CONFIG_MATOM is not set
# CONFIG_X86_GENERIC is not set
CONFIG_X86_CPU=y
CONFIG_X86_INTERNODE_CACHE_SHIFT=7
CONFIG_X86_CMPXCHG=y
CONFIG_X86_L1_CACHE_SHIFT=5
CONFIG_X86_XADD=y
# CONFIG_X86_PPRO_FENCE is not set
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_CMOV=y
CONFIG_X86_MINIMUM_CPU_FAMILY=5
CONFIG_X86_DEBUGCTLMSR=y
CONFIG_CPU_SUP_INTEL=y
CONFIG_CPU_SUP_CYRIX_32=y
CONFIG_CPU_SUP_AMD=y
CONFIG_CPU_SUP_CENTAUR=y
CONFIG_CPU_SUP_TRANSMETA_32=y
CONFIG_CPU_SUP_UMC_32=y
# CONFIG_HPET_TIMER is not set
CONFIG_APB_TIMER=y
CONFIG_DMI=y
# CONFIG_IOMMU_HELPER is not set
CONFIG_IOMMU_API=y
CONFIG_NR_CPUS=8
# CONFIG_SCHED_SMT is not set
CONFIG_SCHED_MC=y
# CONFIG_IRQ_TIME_ACCOUNTING is not set
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS=y
CONFIG_X86_MCE=y
# CONFIG_X86_MCE_INTEL is not set
# CONFIG_X86_MCE_AMD is not set
CONFIG_X86_ANCIENT_MCE=y
CONFIG_X86_MCE_INJECT=y
CONFIG_VM86=y
# CONFIG_TOSHIBA is not set
# CONFIG_I8K is not set
CONFIG_X86_REBOOTFIXUPS=y
# CONFIG_MICROCODE is not set
# CONFIG_X86_MSR is not set
CONFIG_X86_CPUID=y
CONFIG_HIGHMEM64G=y
CONFIG_PAGE_OFFSET=0xC0000000
CONFIG_HIGHMEM=y
CONFIG_X86_PAE=y
CONFIG_ARCH_PHYS_ADDR_T_64BIT=y
CONFIG_ARCH_DMA_ADDR_T_64BIT=y
CONFIG_NUMA=y
CONFIG_NODES_SHIFT=4
CONFIG_HAVE_ARCH_BOOTMEM=y
CONFIG_ARCH_HAVE_MEMORY_PRESENT=y
CONFIG_NEED_NODE_MEMMAP_SIZE=y
CONFIG_HAVE_ARCH_ALLOC_REMAP=y
CONFIG_ARCH_DISCONTIGMEM_ENABLE=y
CONFIG_ARCH_DISCONTIGMEM_DEFAULT=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SELECT_MEMORY_MODEL=y
CONFIG_ILLEGAL_POINTER_VALUE=0
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_DISCONTIGMEM_MANUAL=y
# CONFIG_SPARSEMEM_MANUAL is not set
CONFIG_DISCONTIGMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
CONFIG_NEED_MULTIPLE_NODES=y
CONFIG_HAVE_MEMORY_PRESENT=y
CONFIG_SPARSEMEM_STATIC=y
CONFIG_HAVE_MEMBLOCK=y
CONFIG_PAGEFLAGS_EXTENDED=y
CONFIG_SPLIT_PTLOCK_CPUS=999999
# CONFIG_MIGRATION is not set
CONFIG_PHYS_ADDR_T_64BIT=y
CONFIG_ZONE_DMA_FLAG=1
CONFIG_BOUNCE=y
CONFIG_VIRT_TO_BUS=y
# CONFIG_KSM is not set
CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
# CONFIG_HIGHPTE is not set
# CONFIG_X86_CHECK_BIOS_CORRUPTION is not set
CONFIG_X86_RESERVE_LOW=64
CONFIG_MATH_EMULATION=y
CONFIG_MTRR=y
# CONFIG_MTRR_SANITIZER is not set
CONFIG_X86_PAT=y
CONFIG_ARCH_USES_PG_UNCACHED=y
CONFIG_EFI=y
# CONFIG_SECCOMP is not set
# CONFIG_CC_STACKPROTECTOR is not set
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
CONFIG_HZ_300=y
# CONFIG_HZ_1000 is not set
CONFIG_HZ=300
# CONFIG_SCHED_HRTICK is not set
# CONFIG_KEXEC is not set
# CONFIG_CRASH_DUMP is not set
CONFIG_PHYSICAL_START=0x1000000
# CONFIG_RELOCATABLE is not set
CONFIG_PHYSICAL_ALIGN=0x1000000
CONFIG_HOTPLUG_CPU=y
CONFIG_COMPAT_VDSO=y
# CONFIG_CMDLINE_BOOL is not set
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
# CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID is not set
# CONFIG_USE_PERCPU_NUMA_NODE_ID is not set

#
# Power management and ACPI options
#
CONFIG_PM=y
# CONFIG_PM_DEBUG is not set
# CONFIG_SUSPEND is not set
CONFIG_PM_RUNTIME=y
CONFIG_PM_OPS=y
CONFIG_ACPI=y
# CONFIG_ACPI_PROCFS is not set
CONFIG_ACPI_PROCFS_POWER=y
CONFIG_ACPI_POWER_METER=y
# CONFIG_ACPI_EC_DEBUGFS is not set
CONFIG_ACPI_PROC_EVENT=y
CONFIG_ACPI_AC=y
# CONFIG_ACPI_BATTERY is not set
CONFIG_ACPI_BUTTON=y
# CONFIG_ACPI_VIDEO is not set
# CONFIG_ACPI_FAN is not set
CONFIG_ACPI_DOCK=y
CONFIG_ACPI_PROCESSOR=y
CONFIG_ACPI_HOTPLUG_CPU=y
CONFIG_ACPI_PROCESSOR_AGGREGATOR=y
CONFIG_ACPI_THERMAL=y
CONFIG_ACPI_NUMA=y
# CONFIG_ACPI_CUSTOM_DSDT is not set
CONFIG_ACPI_BLACKLIST_YEAR=0
CONFIG_ACPI_DEBUG=y
CONFIG_ACPI_DEBUG_FUNC_TRACE=y
CONFIG_ACPI_PCI_SLOT=y
CONFIG_X86_PM_TIMER=y
CONFIG_ACPI_CONTAINER=y
# CONFIG_ACPI_SBS is not set
# CONFIG_ACPI_HED is not set
# CONFIG_ACPI_APEI is not set
CONFIG_SFI=y

#
# CPU Frequency scaling
#
# CONFIG_CPU_FREQ is not set
CONFIG_CPU_IDLE=y
CONFIG_CPU_IDLE_GOV_LADDER=y
CONFIG_CPU_IDLE_GOV_MENU=y
# CONFIG_INTEL_IDLE is not set

#
# Bus options (PCI etc.)
#
CONFIG_PCI=y
# CONFIG_PCI_GOBIOS is not set
# CONFIG_PCI_GOMMCONFIG is not set
# CONFIG_PCI_GODIRECT is not set
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
CONFIG_PCI_DOMAINS=y
CONFIG_PCI_CNB20LE_QUIRK=y
CONFIG_DMAR=y
# CONFIG_DMAR_DEFAULT_ON is not set
CONFIG_DMAR_FLOPPY_WA=y
CONFIG_PCIEPORTBUS=y
# CONFIG_HOTPLUG_PCI_PCIE is not set
CONFIG_PCIEAER=y
CONFIG_PCIE_ECRC=y
# CONFIG_PCIEAER_INJECT is not set
CONFIG_PCIEASPM=y
CONFIG_PCIEASPM_DEBUG=y
CONFIG_PCIE_PME=y
CONFIG_ARCH_SUPPORTS_MSI=y
CONFIG_PCI_MSI=y
# CONFIG_PCI_DEBUG is not set
# CONFIG_PCI_STUB is not set
CONFIG_HT_IRQ=y
CONFIG_PCI_IOV=y
CONFIG_PCI_IOAPIC=y
CONFIG_ISA_DMA_API=y
CONFIG_ISA=y
# CONFIG_EISA is not set
CONFIG_MCA=y
# CONFIG_MCA_LEGACY is not set
# CONFIG_SCx200 is not set
# CONFIG_OLPC is not set
CONFIG_AMD_NB=y
# CONFIG_PCCARD is not set
CONFIG_HOTPLUG_PCI=y
# CONFIG_HOTPLUG_PCI_FAKE is not set
CONFIG_HOTPLUG_PCI_COMPAQ=y
# CONFIG_HOTPLUG_PCI_COMPAQ_NVRAM is not set
CONFIG_HOTPLUG_PCI_IBM=y
CONFIG_HOTPLUG_PCI_ACPI=y
CONFIG_HOTPLUG_PCI_ACPI_IBM=y
# CONFIG_HOTPLUG_PCI_CPCI is not set
CONFIG_HOTPLUG_PCI_SHPC=y

#
# Executable file formats / Emulations
#
CONFIG_BINFMT_ELF=y
CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y
CONFIG_HAVE_AOUT=y
# CONFIG_BINFMT_AOUT is not set
CONFIG_BINFMT_MISC=y
CONFIG_HAVE_ATOMIC_IOMAP=y
CONFIG_HAVE_TEXT_POKE_SMP=y
CONFIG_NET=y

#
# Networking options
#
CONFIG_PACKET=y
CONFIG_UNIX=y
CONFIG_XFRM=y
# CONFIG_XFRM_USER is not set
CONFIG_XFRM_SUB_POLICY=y
# CONFIG_XFRM_MIGRATE is not set
CONFIG_XFRM_STATISTICS=y
CONFIG_NET_KEY=y
# CONFIG_NET_KEY_MIGRATE is not set
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
# CONFIG_IP_ADVANCED_ROUTER is not set
CONFIG_IP_FIB_HASH=y
# CONFIG_IP_PNP is not set
CONFIG_NET_IPIP=y
CONFIG_NET_IPGRE_DEMUX=y
# CONFIG_NET_IPGRE is not set
CONFIG_IP_MROUTE=y
# CONFIG_IP_PIMSM_V1 is not set
# CONFIG_IP_PIMSM_V2 is not set
CONFIG_ARPD=y
# CONFIG_SYN_COOKIES is not set
# CONFIG_INET_AH is not set
CONFIG_INET_ESP=y
# CONFIG_INET_IPCOMP is not set
# CONFIG_INET_XFRM_TUNNEL is not set
CONFIG_INET_TUNNEL=y
CONFIG_INET_XFRM_MODE_TRANSPORT=y
# CONFIG_INET_XFRM_MODE_TUNNEL is not set
# CONFIG_INET_XFRM_MODE_BEET is not set
CONFIG_INET_LRO=y
# CONFIG_INET_DIAG is not set
CONFIG_TCP_CONG_ADVANCED=y
CONFIG_TCP_CONG_BIC=y
# CONFIG_TCP_CONG_CUBIC is not set
CONFIG_TCP_CONG_WESTWOOD=y
CONFIG_TCP_CONG_HTCP=y
CONFIG_TCP_CONG_HSTCP=y
# CONFIG_TCP_CONG_HYBLA is not set
CONFIG_TCP_CONG_VEGAS=y
CONFIG_TCP_CONG_SCALABLE=y
CONFIG_TCP_CONG_LP=y
CONFIG_TCP_CONG_VENO=y
CONFIG_TCP_CONG_YEAH=y
# CONFIG_TCP_CONG_ILLINOIS is not set
# CONFIG_DEFAULT_BIC is not set
# CONFIG_DEFAULT_HTCP is not set
# CONFIG_DEFAULT_VEGAS is not set
CONFIG_DEFAULT_VENO=y
# CONFIG_DEFAULT_WESTWOOD is not set
# CONFIG_DEFAULT_RENO is not set
CONFIG_DEFAULT_TCP_CONG="veno"
# CONFIG_TCP_MD5SIG is not set
# CONFIG_IPV6 is not set
# CONFIG_NETLABEL is not set
# CONFIG_NETWORK_SECMARK is not set
# CONFIG_NETWORK_PHY_TIMESTAMPING is not set
CONFIG_NETFILTER=y
# CONFIG_NETFILTER_DEBUG is not set
CONFIG_NETFILTER_ADVANCED=y
CONFIG_BRIDGE_NETFILTER=y

#
# Core Netfilter Configuration
#
CONFIG_NETFILTER_NETLINK=y
# CONFIG_NETFILTER_NETLINK_QUEUE is not set
CONFIG_NETFILTER_NETLINK_LOG=y
CONFIG_NF_CONNTRACK=y
CONFIG_NF_CONNTRACK_MARK=y
CONFIG_NF_CONNTRACK_EVENTS=y
CONFIG_NF_CT_PROTO_DCCP=y
CONFIG_NF_CT_PROTO_GRE=y
# CONFIG_NF_CT_PROTO_SCTP is not set
CONFIG_NF_CT_PROTO_UDPLITE=y
CONFIG_NF_CONNTRACK_AMANDA=y
CONFIG_NF_CONNTRACK_FTP=y
CONFIG_NF_CONNTRACK_H323=y
# CONFIG_NF_CONNTRACK_IRC is not set
CONFIG_NF_CONNTRACK_NETBIOS_NS=y
CONFIG_NF_CONNTRACK_PPTP=y
# CONFIG_NF_CONNTRACK_SANE is not set
# CONFIG_NF_CONNTRACK_SIP is not set
CONFIG_NF_CONNTRACK_TFTP=y
# CONFIG_NF_CT_NETLINK is not set
CONFIG_NETFILTER_TPROXY=y
CONFIG_NETFILTER_XTABLES=y

#
# Xtables combined modules
#
CONFIG_NETFILTER_XT_MARK=y
CONFIG_NETFILTER_XT_CONNMARK=y

#
# Xtables targets
#
CONFIG_NETFILTER_XT_TARGET_CHECKSUM=y
CONFIG_NETFILTER_XT_TARGET_CLASSIFY=y
# CONFIG_NETFILTER_XT_TARGET_CONNMARK is not set
# CONFIG_NETFILTER_XT_TARGET_CT is not set
CONFIG_NETFILTER_XT_TARGET_DSCP=y
CONFIG_NETFILTER_XT_TARGET_HL=y
CONFIG_NETFILTER_XT_TARGET_IDLETIMER=y
# CONFIG_NETFILTER_XT_TARGET_LED is not set
# CONFIG_NETFILTER_XT_TARGET_MARK is not set
CONFIG_NETFILTER_XT_TARGET_NFLOG=y
CONFIG_NETFILTER_XT_TARGET_NFQUEUE=y
CONFIG_NETFILTER_XT_TARGET_NOTRACK=y
CONFIG_NETFILTER_XT_TARGET_RATEEST=y
CONFIG_NETFILTER_XT_TARGET_TEE=y
# CONFIG_NETFILTER_XT_TARGET_TPROXY is not set
# CONFIG_NETFILTER_XT_TARGET_TRACE is not set
CONFIG_NETFILTER_XT_TARGET_TCPMSS=y
# CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP is not set

#
# Xtables matches
#
# CONFIG_NETFILTER_XT_MATCH_CLUSTER is not set
CONFIG_NETFILTER_XT_MATCH_COMMENT=y
CONFIG_NETFILTER_XT_MATCH_CONNBYTES=y
# CONFIG_NETFILTER_XT_MATCH_CONNLIMIT is not set
CONFIG_NETFILTER_XT_MATCH_CONNMARK=y
CONFIG_NETFILTER_XT_MATCH_CONNTRACK=y
CONFIG_NETFILTER_XT_MATCH_CPU=y
# CONFIG_NETFILTER_XT_MATCH_DCCP is not set
# CONFIG_NETFILTER_XT_MATCH_DSCP is not set
# CONFIG_NETFILTER_XT_MATCH_ESP is not set
# CONFIG_NETFILTER_XT_MATCH_HASHLIMIT is not set
CONFIG_NETFILTER_XT_MATCH_HELPER=y
CONFIG_NETFILTER_XT_MATCH_HL=y
CONFIG_NETFILTER_XT_MATCH_IPRANGE=y
# CONFIG_NETFILTER_XT_MATCH_IPVS is not set
# CONFIG_NETFILTER_XT_MATCH_LENGTH is not set
# CONFIG_NETFILTER_XT_MATCH_LIMIT is not set
CONFIG_NETFILTER_XT_MATCH_MAC=y
# CONFIG_NETFILTER_XT_MATCH_MARK is not set
CONFIG_NETFILTER_XT_MATCH_MULTIPORT=y
CONFIG_NETFILTER_XT_MATCH_OSF=y
# CONFIG_NETFILTER_XT_MATCH_OWNER is not set
# CONFIG_NETFILTER_XT_MATCH_POLICY is not set
CONFIG_NETFILTER_XT_MATCH_PHYSDEV=y
# CONFIG_NETFILTER_XT_MATCH_PKTTYPE is not set
CONFIG_NETFILTER_XT_MATCH_QUOTA=y
CONFIG_NETFILTER_XT_MATCH_RATEEST=y
# CONFIG_NETFILTER_XT_MATCH_REALM is not set
# CONFIG_NETFILTER_XT_MATCH_RECENT is not set
CONFIG_NETFILTER_XT_MATCH_SCTP=y
CONFIG_NETFILTER_XT_MATCH_SOCKET=y
# CONFIG_NETFILTER_XT_MATCH_STATE is not set
CONFIG_NETFILTER_XT_MATCH_STATISTIC=y
CONFIG_NETFILTER_XT_MATCH_STRING=y
CONFIG_NETFILTER_XT_MATCH_TCPMSS=y
# CONFIG_NETFILTER_XT_MATCH_TIME is not set
# CONFIG_NETFILTER_XT_MATCH_U32 is not set
CONFIG_IP_VS=y
# CONFIG_IP_VS_DEBUG is not set
CONFIG_IP_VS_TAB_BITS=12

#
# IPVS transport protocol load balancing support
#
CONFIG_IP_VS_PROTO_TCP=y
# CONFIG_IP_VS_PROTO_UDP is not set
CONFIG_IP_VS_PROTO_AH_ESP=y
# CONFIG_IP_VS_PROTO_ESP is not set
CONFIG_IP_VS_PROTO_AH=y
# CONFIG_IP_VS_PROTO_SCTP is not set

#
# IPVS scheduler
#
CONFIG_IP_VS_RR=y
CONFIG_IP_VS_WRR=y
CONFIG_IP_VS_LC=y
CONFIG_IP_VS_WLC=y
CONFIG_IP_VS_LBLC=y
CONFIG_IP_VS_LBLCR=y
CONFIG_IP_VS_DH=y
# CONFIG_IP_VS_SH is not set
# CONFIG_IP_VS_SED is not set
# CONFIG_IP_VS_NQ is not set

#
# IPVS application helper
#
# CONFIG_IP_VS_NFCT is not set

#
# IP: Netfilter Configuration
#
CONFIG_NF_DEFRAG_IPV4=y
CONFIG_NF_CONNTRACK_IPV4=y
# CONFIG_NF_CONNTRACK_PROC_COMPAT is not set
CONFIG_IP_NF_QUEUE=y
CONFIG_IP_NF_IPTABLES=y
# CONFIG_IP_NF_MATCH_ADDRTYPE is not set
# CONFIG_IP_NF_MATCH_AH is not set
# CONFIG_IP_NF_MATCH_ECN is not set
# CONFIG_IP_NF_MATCH_TTL is not set
# CONFIG_IP_NF_FILTER is not set
CONFIG_IP_NF_TARGET_LOG=y
# CONFIG_IP_NF_TARGET_ULOG is not set
# CONFIG_NF_NAT is not set
CONFIG_IP_NF_MANGLE=y
CONFIG_IP_NF_TARGET_CLUSTERIP=y
CONFIG_IP_NF_TARGET_ECN=y
# CONFIG_IP_NF_TARGET_TTL is not set
CONFIG_IP_NF_RAW=y
CONFIG_IP_NF_SECURITY=y
# CONFIG_IP_NF_ARPTABLES is not set
# CONFIG_BRIDGE_NF_EBTABLES is not set
CONFIG_IP_DCCP=y

#
# DCCP CCIDs Configuration (EXPERIMENTAL)
#
CONFIG_IP_DCCP_CCID2_DEBUG=y
CONFIG_IP_DCCP_CCID3=y
# CONFIG_IP_DCCP_CCID3_DEBUG is not set
CONFIG_IP_DCCP_TFRC_LIB=y

#
# DCCP Kernel Hacking
#
# CONFIG_IP_DCCP_DEBUG is not set
CONFIG_IP_SCTP=y
CONFIG_SCTP_DBG_MSG=y
CONFIG_SCTP_DBG_OBJCNT=y
# CONFIG_SCTP_HMAC_NONE is not set
CONFIG_SCTP_HMAC_SHA1=y
# CONFIG_SCTP_HMAC_MD5 is not set
# CONFIG_RDS is not set
# CONFIG_TIPC is not set
# CONFIG_ATM is not set
# CONFIG_L2TP is not set
CONFIG_STP=y
CONFIG_BRIDGE=y
CONFIG_BRIDGE_IGMP_SNOOPING=y
# CONFIG_NET_DSA is not set
# CONFIG_VLAN_8021Q is not set
# CONFIG_DECNET is not set
CONFIG_LLC=y
CONFIG_LLC2=y
CONFIG_IPX=y
# CONFIG_IPX_INTERN is not set
CONFIG_ATALK=y
CONFIG_DEV_APPLETALK=y
# CONFIG_LTPC is not set
# CONFIG_COPS is not set
CONFIG_IPDDP=y
CONFIG_IPDDP_ENCAP=y
CONFIG_IPDDP_DECAP=y
CONFIG_X25=y
CONFIG_LAPB=y
CONFIG_ECONET=y
# CONFIG_ECONET_AUNUDP is not set
CONFIG_ECONET_NATIVE=y
CONFIG_WAN_ROUTER=y
CONFIG_PHONET=y
CONFIG_PHONET_PIPECTRLR=y
# CONFIG_IEEE802154 is not set
CONFIG_NET_SCHED=y

#
# Queueing/Scheduling
#
# CONFIG_NET_SCH_CBQ is not set
# CONFIG_NET_SCH_HTB is not set
# CONFIG_NET_SCH_HFSC is not set
CONFIG_NET_SCH_PRIO=y
# CONFIG_NET_SCH_MULTIQ is not set
CONFIG_NET_SCH_RED=y
# CONFIG_NET_SCH_SFQ is not set
CONFIG_NET_SCH_TEQL=y
CONFIG_NET_SCH_TBF=y
CONFIG_NET_SCH_GRED=y
# CONFIG_NET_SCH_DSMARK is not set
# CONFIG_NET_SCH_NETEM is not set
# CONFIG_NET_SCH_DRR is not set

#
# Classification
#
CONFIG_NET_CLS=y
# CONFIG_NET_CLS_BASIC is not set
# CONFIG_NET_CLS_TCINDEX is not set
CONFIG_NET_CLS_ROUTE4=y
CONFIG_NET_CLS_ROUTE=y
# CONFIG_NET_CLS_FW is not set
# CONFIG_NET_CLS_U32 is not set
CONFIG_NET_CLS_RSVP=y
CONFIG_NET_CLS_RSVP6=y
# CONFIG_NET_CLS_FLOW is not set
CONFIG_NET_CLS_CGROUP=y
CONFIG_NET_EMATCH=y
CONFIG_NET_EMATCH_STACK=32
# CONFIG_NET_EMATCH_CMP is not set
# CONFIG_NET_EMATCH_NBYTE is not set
# CONFIG_NET_EMATCH_U32 is not set
# CONFIG_NET_EMATCH_META is not set
CONFIG_NET_EMATCH_TEXT=y
# CONFIG_NET_CLS_ACT is not set
CONFIG_NET_SCH_FIFO=y
# CONFIG_DCB is not set
CONFIG_DNS_RESOLVER=y
CONFIG_RPS=y

#
# Network testing
#
CONFIG_NET_PKTGEN=y
# CONFIG_NET_DROP_MONITOR is not set
# CONFIG_HAMRADIO is not set
# CONFIG_CAN is not set
CONFIG_IRDA=y

#
# IrDA protocols
#
# CONFIG_IRLAN is not set
CONFIG_IRCOMM=y
# CONFIG_IRDA_ULTRA is not set

#
# IrDA options
#
CONFIG_IRDA_CACHE_LAST_LSAP=y
# CONFIG_IRDA_FAST_RR is not set
# CONFIG_IRDA_DEBUG is not set

#
# Infrared-port device drivers
#

#
# SIR device drivers
#
# CONFIG_IRTTY_SIR is not set

#
# Dongle support
#
CONFIG_KINGSUN_DONGLE=y
CONFIG_KSDAZZLE_DONGLE=y
CONFIG_KS959_DONGLE=y

#
# FIR device drivers
#
CONFIG_USB_IRDA=y
CONFIG_SIGMATEL_FIR=y
# CONFIG_NSC_FIR is not set
# CONFIG_WINBOND_FIR is not set
# CONFIG_TOSHIBA_FIR is not set
CONFIG_SMC_IRCC_FIR=y
# CONFIG_ALI_FIR is not set
CONFIG_VLSI_FIR=y
CONFIG_VIA_FIR=y
# CONFIG_MCS_FIR is not set
# CONFIG_BT is not set
CONFIG_AF_RXRPC=y
CONFIG_AF_RXRPC_DEBUG=y
CONFIG_RXKAD=y
CONFIG_WIRELESS=y
CONFIG_WIRELESS_EXT=y
CONFIG_WEXT_CORE=y
CONFIG_WEXT_PROC=y
CONFIG_WEXT_SPY=y
CONFIG_WEXT_PRIV=y
# CONFIG_CFG80211 is not set
CONFIG_WIRELESS_EXT_SYSFS=y
CONFIG_LIB80211=y
# CONFIG_LIB80211_DEBUG is not set

#
# CFG80211 needs to be enabled for MAC80211
#

#
# Some wireless drivers require a rate control algorithm
#
CONFIG_WIMAX=y
CONFIG_WIMAX_DEBUG_LEVEL=8
# CONFIG_RFKILL is not set
CONFIG_CAIF=y
# CONFIG_CAIF_DEBUG is not set
# CONFIG_CAIF_NETDEV is not set
CONFIG_CEPH_LIB=y
CONFIG_CEPH_LIB_PRETTYDEBUG=y

#
# Device Drivers
#

#
# Generic Driver Options
#
CONFIG_UEVENT_HELPER_PATH=""
CONFIG_DEVTMPFS=y
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_FW_LOADER=y
# CONFIG_FIRMWARE_IN_KERNEL is not set
CONFIG_EXTRA_FIRMWARE=""
CONFIG_DEBUG_DRIVER=y
# CONFIG_DEBUG_DEVRES is not set
# CONFIG_SYS_HYPERVISOR is not set
# CONFIG_CONNECTOR is not set
# CONFIG_MTD is not set
# CONFIG_PARPORT is not set
CONFIG_PNP=y
CONFIG_PNP_DEBUG_MESSAGES=y

#
# Protocols
#
CONFIG_ISAPNP=y
# CONFIG_PNPBIOS is not set
CONFIG_PNPACPI=y
CONFIG_BLK_DEV=y
# CONFIG_BLK_DEV_FD is not set
CONFIG_BLK_DEV_XD=y
CONFIG_BLK_CPQ_DA=y
CONFIG_BLK_CPQ_CISS_DA=y
# CONFIG_CISS_SCSI_TAPE is not set
CONFIG_BLK_DEV_DAC960=y
CONFIG_BLK_DEV_UMEM=y
# CONFIG_BLK_DEV_COW_COMMON is not set
# CONFIG_BLK_DEV_LOOP is not set

#
# DRBD disabled because PROC_FS, INET or CONNECTOR not selected
#
# CONFIG_BLK_DEV_NBD is not set
CONFIG_BLK_DEV_SX8=y
# CONFIG_BLK_DEV_UB is not set
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=4096
CONFIG_BLK_DEV_XIP=y
CONFIG_CDROM_PKTCDVD=y
CONFIG_CDROM_PKTCDVD_BUFFERS=8
CONFIG_CDROM_PKTCDVD_WCACHE=y
# CONFIG_ATA_OVER_ETH is not set
# CONFIG_BLK_DEV_HD is not set
# CONFIG_BLK_DEV_RBD is not set
CONFIG_MISC_DEVICES=y
# CONFIG_AD525X_DPOT is not set
CONFIG_IBM_ASM=y
# CONFIG_PHANTOM is not set
# CONFIG_SGI_IOC4 is not set
CONFIG_TIFM_CORE=y
CONFIG_TIFM_7XX1=y
# CONFIG_ICS932S401 is not set
# CONFIG_ENCLOSURE_SERVICES is not set
# CONFIG_CS5535_MFGPT is not set
# CONFIG_HP_ILO is not set
# CONFIG_APDS9802ALS is not set
# CONFIG_ISL29003 is not set
# CONFIG_ISL29020 is not set
# CONFIG_SENSORS_TSL2550 is not set
# CONFIG_SENSORS_BH1780 is not set
# CONFIG_SENSORS_BH1770 is not set
CONFIG_SENSORS_APDS990X=y
CONFIG_HMC6352=y
CONFIG_DS1682=y
CONFIG_TI_DAC7512=y
CONFIG_VMWARE_BALLOON=y
# CONFIG_BMP085 is not set
CONFIG_PCH_PHUB=y
# CONFIG_C2PORT is not set

#
# EEPROM support
#
# CONFIG_EEPROM_AT24 is not set
# CONFIG_EEPROM_AT25 is not set
CONFIG_EEPROM_LEGACY=y
# CONFIG_EEPROM_MAX6875 is not set
CONFIG_EEPROM_93CX6=y
CONFIG_CB710_CORE=y
CONFIG_CB710_DEBUG=y
CONFIG_CB710_DEBUG_ASSUMPTIONS=y

#
# Texas Instruments shared transport line discipline
#
CONFIG_HAVE_IDE=y
# CONFIG_IDE is not set

#
# SCSI device support
#
CONFIG_SCSI_MOD=y
# CONFIG_RAID_ATTRS is not set
CONFIG_SCSI=y
CONFIG_SCSI_DMA=y
# CONFIG_SCSI_TGT is not set
CONFIG_SCSI_NETLINK=y
# CONFIG_SCSI_PROC_FS is not set

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=y
# CONFIG_CHR_DEV_ST is not set
# CONFIG_CHR_DEV_OSST is not set
CONFIG_BLK_DEV_SR=y
CONFIG_BLK_DEV_SR_VENDOR=y
# CONFIG_CHR_DEV_SG is not set
CONFIG_CHR_DEV_SCH=y
CONFIG_SCSI_MULTI_LUN=y
CONFIG_SCSI_CONSTANTS=y
CONFIG_SCSI_LOGGING=y
CONFIG_SCSI_SCAN_ASYNC=y

#
# SCSI Transports
#
CONFIG_SCSI_SPI_ATTRS=y
CONFIG_SCSI_FC_ATTRS=y
# CONFIG_SCSI_ISCSI_ATTRS is not set
CONFIG_SCSI_SAS_ATTRS=y
CONFIG_SCSI_SAS_LIBSAS=y
CONFIG_SCSI_SAS_ATA=y
CONFIG_SCSI_SAS_HOST_SMP=y
# CONFIG_SCSI_SAS_LIBSAS_DEBUG is not set
CONFIG_SCSI_SRP_ATTRS=y
# CONFIG_SCSI_LOWLEVEL is not set
# CONFIG_SCSI_DH is not set
CONFIG_SCSI_OSD_INITIATOR=y
# CONFIG_SCSI_OSD_ULD is not set
CONFIG_SCSI_OSD_DPRINT_SENSE=1
CONFIG_SCSI_OSD_DEBUG=y
CONFIG_ATA=y
# CONFIG_ATA_NONSTANDARD is not set
CONFIG_ATA_VERBOSE_ERROR=y
# CONFIG_ATA_ACPI is not set
CONFIG_SATA_PMP=y

#
# Controllers with non-SFF native interface
#
CONFIG_SATA_AHCI=y
CONFIG_SATA_AHCI_PLATFORM=y
CONFIG_SATA_INIC162X=y
# CONFIG_SATA_SIL24 is not set
CONFIG_ATA_SFF=y

#
# SFF controllers with custom DMA interface
#
CONFIG_PDC_ADMA=y
# CONFIG_SATA_QSTOR is not set
# CONFIG_SATA_SX4 is not set
CONFIG_ATA_BMDMA=y

#
# SATA SFF controllers with BMDMA
#
CONFIG_ATA_PIIX=y
CONFIG_SATA_MV=y
CONFIG_SATA_NV=y
CONFIG_SATA_PROMISE=y
# CONFIG_SATA_SIL is not set
# CONFIG_SATA_SIS is not set
CONFIG_SATA_SVW=y
CONFIG_SATA_ULI=y
# CONFIG_SATA_VIA is not set
# CONFIG_SATA_VITESSE is not set

#
# PATA SFF controllers with BMDMA
#
# CONFIG_PATA_ALI is not set
CONFIG_PATA_AMD=y
CONFIG_PATA_ARTOP=y
# CONFIG_PATA_ATIIXP is not set
CONFIG_PATA_ATP867X=y
CONFIG_PATA_CMD64X=y
CONFIG_PATA_CS5520=y
# CONFIG_PATA_CS5530 is not set
CONFIG_PATA_CS5535=y
# CONFIG_PATA_CS5536 is not set
# CONFIG_PATA_CYPRESS is not set
# CONFIG_PATA_EFAR is not set
# CONFIG_PATA_HPT366 is not set
CONFIG_PATA_HPT37X=y
# CONFIG_PATA_HPT3X2N is not set
CONFIG_PATA_HPT3X3=y
CONFIG_PATA_HPT3X3_DMA=y
# CONFIG_PATA_IT8213 is not set
CONFIG_PATA_IT821X=y
CONFIG_PATA_JMICRON=y
# CONFIG_PATA_MARVELL is not set
CONFIG_PATA_NETCELL=y
# CONFIG_PATA_NINJA32 is not set
# CONFIG_PATA_NS87415 is not set
CONFIG_PATA_OLDPIIX=y
# CONFIG_PATA_OPTIDMA is not set
# CONFIG_PATA_PDC2027X is not set
# CONFIG_PATA_PDC_OLD is not set
CONFIG_PATA_RADISYS=y
CONFIG_PATA_RDC=y
CONFIG_PATA_SC1200=y
# CONFIG_PATA_SCH is not set
# CONFIG_PATA_SERVERWORKS is not set
CONFIG_PATA_SIL680=y
CONFIG_PATA_SIS=y
# CONFIG_PATA_TOSHIBA is not set
CONFIG_PATA_TRIFLEX=y
CONFIG_PATA_VIA=y
# CONFIG_PATA_WINBOND is not set

#
# PIO-only SFF controllers
#
# CONFIG_PATA_CMD640_PCI is not set
# CONFIG_PATA_ISAPNP is not set
CONFIG_PATA_MPIIX=y
CONFIG_PATA_NS87410=y
# CONFIG_PATA_OPTI is not set
CONFIG_PATA_QDI=y
CONFIG_PATA_RZ1000=y
# CONFIG_PATA_WINBOND_VLB is not set

#
# Generic fallback / legacy drivers
#
# CONFIG_ATA_GENERIC is not set
# CONFIG_PATA_LEGACY is not set
# CONFIG_MD is not set
# CONFIG_FUSION is not set

#
# IEEE 1394 (FireWire) support
#
CONFIG_FIREWIRE=y
# CONFIG_FIREWIRE_OHCI is not set
# CONFIG_FIREWIRE_SBP2 is not set
CONFIG_FIREWIRE_NET=y
CONFIG_FIREWIRE_NOSY=y
# CONFIG_I2O is not set
# CONFIG_MACINTOSH_DRIVERS is not set
CONFIG_NETDEVICES=y
# CONFIG_DUMMY is not set
# CONFIG_BONDING is not set
# CONFIG_MACVLAN is not set
CONFIG_EQUALIZER=y
# CONFIG_TUN is not set
# CONFIG_VETH is not set
# CONFIG_NET_SB1000 is not set
# CONFIG_ARCNET is not set
CONFIG_MII=y
CONFIG_PHYLIB=y

#
# MII PHY device drivers
#
# CONFIG_MARVELL_PHY is not set
CONFIG_DAVICOM_PHY=y
# CONFIG_QSEMI_PHY is not set
CONFIG_LXT_PHY=y
# CONFIG_CICADA_PHY is not set
# CONFIG_VITESSE_PHY is not set
CONFIG_SMSC_PHY=y
CONFIG_BROADCOM_PHY=y
CONFIG_BCM63XX_PHY=y
# CONFIG_ICPLUS_PHY is not set
CONFIG_REALTEK_PHY=y
# CONFIG_NATIONAL_PHY is not set
# CONFIG_STE10XP is not set
# CONFIG_LSI_ET1011C_PHY is not set
# CONFIG_MICREL_PHY is not set
CONFIG_FIXED_PHY=y
# CONFIG_MDIO_BITBANG is not set
CONFIG_NET_ETHERNET=y
# CONFIG_HAPPYMEAL is not set
CONFIG_SUNGEM=y
CONFIG_CASSINI=y
# CONFIG_NET_VENDOR_3COM is not set
CONFIG_LANCE=y
CONFIG_NET_VENDOR_SMC=y
CONFIG_ULTRAMCA=y
CONFIG_ULTRA=y
# CONFIG_SMC9194 is not set
# CONFIG_ENC28J60 is not set
CONFIG_ETHOC=y
CONFIG_NET_VENDOR_RACAL=y
CONFIG_NI52=y
CONFIG_NI65=y
# CONFIG_DNET is not set
CONFIG_NET_TULIP=y
# CONFIG_DE2104X is not set
CONFIG_TULIP=y
# CONFIG_TULIP_MWI is not set
# CONFIG_TULIP_MMIO is not set
# CONFIG_TULIP_NAPI is not set
# CONFIG_DE4X5 is not set
# CONFIG_WINBOND_840 is not set
CONFIG_DM9102=y
# CONFIG_ULI526X is not set
# CONFIG_AT1700 is not set
CONFIG_DEPCA=y
CONFIG_HP100=y
# CONFIG_NET_ISA is not set
# CONFIG_IBMLANA is not set
# CONFIG_IBM_NEW_EMAC_ZMII is not set
# CONFIG_IBM_NEW_EMAC_RGMII is not set
# CONFIG_IBM_NEW_EMAC_TAH is not set
# CONFIG_IBM_NEW_EMAC_EMAC4 is not set
# CONFIG_IBM_NEW_EMAC_NO_FLOW_CTRL is not set
# CONFIG_IBM_NEW_EMAC_MAL_CLR_ICINTSTAT is not set
# CONFIG_IBM_NEW_EMAC_MAL_COMMON_ERR is not set
CONFIG_NET_PCI=y
# CONFIG_PCNET32 is not set
# CONFIG_AMD8111_ETH is not set
# CONFIG_ADAPTEC_STARFIRE is not set
# CONFIG_AC3200 is not set
CONFIG_KSZ884X_PCI=y
# CONFIG_APRICOT is not set
# CONFIG_B44 is not set
CONFIG_FORCEDETH=y
CONFIG_CS89x0=y
CONFIG_E100=y
CONFIG_FEALNX=y
# CONFIG_NATSEMI is not set
# CONFIG_NE2K_PCI is not set
# CONFIG_8139CP is not set
CONFIG_8139TOO=y
CONFIG_8139TOO_PIO=y
CONFIG_8139TOO_TUNE_TWISTER=y
CONFIG_8139TOO_8129=y
# CONFIG_8139_OLD_RX_RESET is not set
# CONFIG_R6040 is not set
CONFIG_SIS900=y
CONFIG_EPIC100=y
CONFIG_SMSC9420=y
# CONFIG_SUNDANCE is not set
CONFIG_TLAN=y
# CONFIG_KS8842 is not set
CONFIG_KS8851=y
CONFIG_KS8851_MLL=y
# CONFIG_VIA_RHINE is not set
# CONFIG_SC92031 is not set
# CONFIG_ATL2 is not set
CONFIG_NETDEV_1000=y
# CONFIG_ACENIC is not set
CONFIG_DL2K=y
# CONFIG_E1000 is not set
CONFIG_E1000E=y
CONFIG_IP1000=y
CONFIG_IGB=y
# CONFIG_IGBVF is not set
CONFIG_NS83820=y
CONFIG_HAMACHI=y
# CONFIG_YELLOWFIN is not set
CONFIG_R8169=y
CONFIG_SIS190=y
CONFIG_SKGE=y
# CONFIG_SKGE_DEBUG is not set
CONFIG_SKY2=y
# CONFIG_SKY2_DEBUG is not set
# CONFIG_VIA_VELOCITY is not set
CONFIG_TIGON3=y
CONFIG_BNX2=y
CONFIG_CNIC=y
# CONFIG_QLA3XXX is not set
# CONFIG_ATL1 is not set
# CONFIG_ATL1E is not set
# CONFIG_ATL1C is not set
CONFIG_JME=y
# CONFIG_STMMAC_ETH is not set
# CONFIG_PCH_GBE is not set
CONFIG_NETDEV_10000=y
CONFIG_MDIO=y
CONFIG_CHELSIO_T1=y
CONFIG_CHELSIO_T1_1G=y
CONFIG_CHELSIO_T3_DEPENDS=y
# CONFIG_CHELSIO_T3 is not set
CONFIG_CHELSIO_T4_DEPENDS=y
CONFIG_CHELSIO_T4=y
CONFIG_CHELSIO_T4VF_DEPENDS=y
CONFIG_CHELSIO_T4VF=y
CONFIG_ENIC=y
# CONFIG_IXGBE is not set
CONFIG_IXGBEVF=y
# CONFIG_IXGB is not set
CONFIG_S2IO=y
CONFIG_MYRI10GE=y
# CONFIG_NIU is not set
# CONFIG_MLX4_EN is not set
CONFIG_MLX4_CORE=y
CONFIG_MLX4_DEBUG=y
CONFIG_TEHUTI=y
# CONFIG_BNX2X is not set
# CONFIG_QLCNIC is not set
# CONFIG_QLGE is not set
# CONFIG_BNA is not set
CONFIG_SFC=y
CONFIG_BE2NET=y
# CONFIG_TR is not set
CONFIG_WLAN=y
# CONFIG_AIRO is not set
# CONFIG_ATMEL is not set
CONFIG_PRISM54=y
CONFIG_USB_ZD1201=y
# CONFIG_HOSTAP is not set

#
# WiMAX Wireless Broadband devices
#

#
# Enable MMC support to see WiMAX SDIO drivers
#

#
# USB Network Adapters
#
CONFIG_USB_CATC=y
CONFIG_USB_KAWETH=y
CONFIG_USB_PEGASUS=y
# CONFIG_USB_RTL8150 is not set
# CONFIG_USB_USBNET is not set
CONFIG_USB_CDC_PHONET=y
# CONFIG_USB_IPHETH is not set
CONFIG_WAN=y
CONFIG_LANMEDIA=y
CONFIG_HDLC=y
CONFIG_HDLC_RAW=y
# CONFIG_HDLC_RAW_ETH is not set
CONFIG_HDLC_CISCO=y
CONFIG_HDLC_FR=y
# CONFIG_HDLC_PPP is not set
CONFIG_HDLC_X25=y
# CONFIG_PCI200SYN is not set
# CONFIG_WANXL is not set
CONFIG_PC300TOO=y
CONFIG_N2=y
# CONFIG_C101 is not set
CONFIG_FARSYNC=y
# CONFIG_DLCI is not set
# CONFIG_WAN_ROUTER_DRIVERS is not set
# CONFIG_LAPBETHER is not set
# CONFIG_X25_ASY is not set
CONFIG_SBNI=y
# CONFIG_SBNI_MULTILINE is not set

#
# CAIF transport drivers
#
# CONFIG_CAIF_TTY is not set
CONFIG_CAIF_SPI_SLAVE=y
CONFIG_CAIF_SPI_SYNC=y
CONFIG_FDDI=y
CONFIG_DEFXX=y
CONFIG_DEFXX_MMIO=y
CONFIG_SKFP=y
CONFIG_HIPPI=y
# CONFIG_ROADRUNNER is not set
# CONFIG_PPP is not set
CONFIG_SLIP=y
# CONFIG_SLIP_COMPRESSED is not set
CONFIG_SLHC=y
# CONFIG_SLIP_SMART is not set
CONFIG_SLIP_MODE_SLIP6=y
# CONFIG_NET_FC is not set
CONFIG_NETCONSOLE=y
CONFIG_NETCONSOLE_DYNAMIC=y
CONFIG_NETPOLL=y
# CONFIG_NETPOLL_TRAP is not set
CONFIG_NET_POLL_CONTROLLER=y
# CONFIG_VMXNET3 is not set
CONFIG_ISDN=y
CONFIG_ISDN_I4L=y
CONFIG_ISDN_PPP=y
# CONFIG_ISDN_PPP_VJ is not set
# CONFIG_ISDN_MPP is not set
CONFIG_IPPP_FILTER=y
CONFIG_ISDN_PPP_BSDCOMP=y
# CONFIG_ISDN_AUDIO is not set
CONFIG_ISDN_X25=y

#
# ISDN feature submodules
#
# CONFIG_ISDN_DIVERSION is not set

#
# ISDN4Linux hardware drivers
#

#
# Passive cards
#
CONFIG_ISDN_DRV_HISAX=y

#
# D-channel protocol features
#
# CONFIG_HISAX_EURO is not set
CONFIG_HISAX_1TR6=y
CONFIG_HISAX_NI1=y
CONFIG_HISAX_MAX_CARDS=8

#
# HiSax supported cards
#
# CONFIG_HISAX_16_0 is not set
CONFIG_HISAX_16_3=y
# CONFIG_HISAX_TELESPCI is not set
# CONFIG_HISAX_S0BOX is not set
# CONFIG_HISAX_AVM_A1 is not set
CONFIG_HISAX_FRITZPCI=y
CONFIG_HISAX_AVM_A1_PCMCIA=y
# CONFIG_HISAX_ELSA is not set
CONFIG_HISAX_IX1MICROR2=y
CONFIG_HISAX_DIEHLDIVA=y
# CONFIG_HISAX_ASUSCOM is not set
CONFIG_HISAX_TELEINT=y
CONFIG_HISAX_HFCS=y
# CONFIG_HISAX_SEDLBAUER is not set
# CONFIG_HISAX_SPORTSTER is not set
CONFIG_HISAX_MIC=y
CONFIG_HISAX_NETJET=y
# CONFIG_HISAX_NETJET_U is not set
CONFIG_HISAX_NICCY=y
# CONFIG_HISAX_ISURF is not set
# CONFIG_HISAX_HSTSAPHIR is not set
# CONFIG_HISAX_BKM_A4T is not set
# CONFIG_HISAX_SCT_QUADRO is not set
CONFIG_HISAX_GAZEL=y
# CONFIG_HISAX_HFC_PCI is not set
# CONFIG_HISAX_W6692 is not set
# CONFIG_HISAX_HFC_SX is not set
# CONFIG_HISAX_ENTERNOW_PCI is not set
CONFIG_HISAX_DEBUG=y

#
# HiSax PCMCIA card service modules
#

#
# HiSax sub driver modules
#
# CONFIG_HISAX_ST5481 is not set
# CONFIG_HISAX_HFCUSB is not set
# CONFIG_HISAX_HFC4S8S is not set
CONFIG_HISAX_FRITZ_PCIPNP=y

#
# Active cards
#
# CONFIG_ISDN_DRV_ICN is not set
# CONFIG_ISDN_DRV_PCBIT is not set
# CONFIG_ISDN_DRV_SC is not set
# CONFIG_ISDN_DRV_ACT2000 is not set
# CONFIG_ISDN_CAPI is not set
CONFIG_ISDN_DRV_GIGASET=y
CONFIG_GIGASET_I4L=y
# CONFIG_GIGASET_DUMMYLL is not set
CONFIG_GIGASET_BASE=y
# CONFIG_GIGASET_M105 is not set
# CONFIG_GIGASET_M101 is not set
# CONFIG_GIGASET_DEBUG is not set
# CONFIG_PHONE is not set

#
# Input device support
#
CONFIG_INPUT=y
CONFIG_INPUT_FF_MEMLESS=y
CONFIG_INPUT_POLLDEV=y
# CONFIG_INPUT_SPARSEKMAP is not set

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
CONFIG_INPUT_MOUSEDEV_PSAUX=y
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
# CONFIG_INPUT_JOYDEV is not set
CONFIG_INPUT_EVDEV=y
CONFIG_INPUT_EVBUG=y

#
# Input Device Drivers
#
CONFIG_INPUT_KEYBOARD=y
CONFIG_KEYBOARD_ADP5520=y
CONFIG_KEYBOARD_ADP5588=y
CONFIG_KEYBOARD_ATKBD=y
CONFIG_KEYBOARD_QT2160=y
# CONFIG_KEYBOARD_LKKBD is not set
# CONFIG_KEYBOARD_TCA6416 is not set
CONFIG_KEYBOARD_LM8323=y
CONFIG_KEYBOARD_MAX7359=y
# CONFIG_KEYBOARD_MCS is not set
# CONFIG_KEYBOARD_NEWTON is not set
CONFIG_KEYBOARD_OPENCORES=y
# CONFIG_KEYBOARD_STOWAWAY is not set
CONFIG_KEYBOARD_SUNKBD=y
CONFIG_KEYBOARD_XTKBD=y
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=y
CONFIG_MOUSE_PS2_ALPS=y
CONFIG_MOUSE_PS2_LOGIPS2PP=y
CONFIG_MOUSE_PS2_SYNAPTICS=y
CONFIG_MOUSE_PS2_LIFEBOOK=y
CONFIG_MOUSE_PS2_TRACKPOINT=y
# CONFIG_MOUSE_PS2_ELANTECH is not set
# CONFIG_MOUSE_PS2_SENTELIC is not set
# CONFIG_MOUSE_PS2_TOUCHKIT is not set
CONFIG_MOUSE_SERIAL=y
# CONFIG_MOUSE_APPLETOUCH is not set
CONFIG_MOUSE_BCM5974=y
CONFIG_MOUSE_INPORT=y
CONFIG_MOUSE_ATIXL=y
# CONFIG_MOUSE_LOGIBM is not set
# CONFIG_MOUSE_PC110PAD is not set
# CONFIG_MOUSE_VSXXXAA is not set
# CONFIG_MOUSE_SYNAPTICS_I2C is not set
# CONFIG_INPUT_JOYSTICK is not set
CONFIG_INPUT_TABLET=y
# CONFIG_TABLET_USB_ACECAD is not set
# CONFIG_TABLET_USB_AIPTEK is not set
# CONFIG_TABLET_USB_GTCO is not set
# CONFIG_TABLET_USB_HANWANG is not set
# CONFIG_TABLET_USB_KBTAB is not set
# CONFIG_TABLET_USB_WACOM is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
# CONFIG_INPUT_MISC is not set

#
# Hardware I/O ports
#
CONFIG_SERIO=y
CONFIG_SERIO_I8042=y
CONFIG_SERIO_SERPORT=y
# CONFIG_SERIO_CT82C710 is not set
# CONFIG_SERIO_PCIPS2 is not set
CONFIG_SERIO_LIBPS2=y
CONFIG_SERIO_RAW=y
CONFIG_SERIO_ALTERA_PS2=y
CONFIG_SERIO_PS2MULT=y
CONFIG_GAMEPORT=y
CONFIG_GAMEPORT_NS558=y
CONFIG_GAMEPORT_L4=y
CONFIG_GAMEPORT_EMU10K1=y
# CONFIG_GAMEPORT_FM801 is not set

#
# Character devices
#
CONFIG_VT=y
CONFIG_CONSOLE_TRANSLATIONS=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
CONFIG_VT_HW_CONSOLE_BINDING=y
CONFIG_DEVKMEM=y
CONFIG_SERIAL_NONSTANDARD=y
CONFIG_COMPUTONE=y
CONFIG_ROCKETPORT=y
# CONFIG_CYCLADES is not set
# CONFIG_DIGIEPCA is not set
CONFIG_MOXA_INTELLIO=y
# CONFIG_MOXA_SMARTIO is not set
CONFIG_ISI=y
CONFIG_SYNCLINK=y
CONFIG_SYNCLINKMP=y
# CONFIG_SYNCLINK_GT is not set
# CONFIG_N_HDLC is not set
CONFIG_N_GSM=y
CONFIG_RISCOM8=y
CONFIG_SPECIALIX=y
# CONFIG_STALDRV is not set
# CONFIG_NOZOMI is not set

#
# Serial drivers
#
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_SERIAL_8250_PCI=y
CONFIG_SERIAL_8250_PNP=y
CONFIG_SERIAL_8250_NR_UARTS=4
CONFIG_SERIAL_8250_RUNTIME_UARTS=4
# CONFIG_SERIAL_8250_EXTENDED is not set
# CONFIG_SERIAL_8250_MCA is not set

#
# Non-8250 serial port support
#
CONFIG_SERIAL_MAX3100=y
CONFIG_SERIAL_MAX3107=y
# CONFIG_SERIAL_MFD_HSU is not set
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
CONFIG_SERIAL_JSM=y
# CONFIG_SERIAL_TIMBERDALE is not set
CONFIG_SERIAL_ALTERA_JTAGUART=y
CONFIG_SERIAL_ALTERA_JTAGUART_CONSOLE=y
# CONFIG_SERIAL_ALTERA_JTAGUART_CONSOLE_BYPASS is not set
# CONFIG_SERIAL_ALTERA_UART is not set
CONFIG_UNIX98_PTYS=y
# CONFIG_DEVPTS_MULTIPLE_INSTANCES is not set
CONFIG_LEGACY_PTYS=y
CONFIG_LEGACY_PTY_COUNT=256
CONFIG_IPMI_HANDLER=y
CONFIG_IPMI_PANIC_EVENT=y
# CONFIG_IPMI_PANIC_STRING is not set
# CONFIG_IPMI_DEVICE_INTERFACE is not set
CONFIG_IPMI_SI=y
# CONFIG_IPMI_WATCHDOG is not set
# CONFIG_IPMI_POWEROFF is not set
# CONFIG_HW_RANDOM is not set
CONFIG_NVRAM=y
CONFIG_DTLK=y
CONFIG_R3964=y
CONFIG_APPLICOM=y
# CONFIG_SONYPI is not set
CONFIG_MWAVE=y
CONFIG_PC8736x_GPIO=y
CONFIG_NSC_GPIO=y
# CONFIG_CS5535_GPIO is not set
# CONFIG_RAW_DRIVER is not set
CONFIG_HPET=y
CONFIG_HPET_MMAP=y
# CONFIG_HANGCHECK_TIMER is not set
CONFIG_TCG_TPM=y
CONFIG_TCG_TIS=y
# CONFIG_TCG_NSC is not set
CONFIG_TCG_ATMEL=y
CONFIG_TCG_INFINEON=y
CONFIG_TELCLOCK=y
CONFIG_DEVPORT=y
# CONFIG_RAMOOPS is not set
CONFIG_I2C=y
CONFIG_I2C_BOARDINFO=y
CONFIG_I2C_COMPAT=y
CONFIG_I2C_CHARDEV=y
# CONFIG_I2C_MUX is not set
CONFIG_I2C_HELPER_AUTO=y
CONFIG_I2C_SMBUS=y
CONFIG_I2C_ALGOBIT=y

#
# I2C Hardware Bus support
#

#
# PC SMBus host controller drivers
#
# CONFIG_I2C_ALI1535 is not set
CONFIG_I2C_ALI1563=y
# CONFIG_I2C_ALI15X3 is not set
CONFIG_I2C_AMD756=y
# CONFIG_I2C_AMD8111 is not set
CONFIG_I2C_I801=y
# CONFIG_I2C_ISCH is not set
# CONFIG_I2C_PIIX4 is not set
# CONFIG_I2C_NFORCE2 is not set
# CONFIG_I2C_SIS5595 is not set
# CONFIG_I2C_SIS630 is not set
# CONFIG_I2C_SIS96X is not set
# CONFIG_I2C_VIA is not set
# CONFIG_I2C_VIAPRO is not set

#
# ACPI drivers
#
CONFIG_I2C_SCMI=y

#
# I2C system bus drivers (mostly embedded / system-on-chip)
#
# CONFIG_I2C_INTEL_MID is not set
# CONFIG_I2C_OCORES is not set
# CONFIG_I2C_PCA_PLATFORM is not set
# CONFIG_I2C_SIMTEC is not set
CONFIG_I2C_XILINX=y

#
# External I2C/SMBus adapter drivers
#
CONFIG_I2C_PARPORT_LIGHT=y
CONFIG_I2C_TAOS_EVM=y
# CONFIG_I2C_TINY_USB is not set

#
# Other I2C/SMBus bus drivers
#
# CONFIG_I2C_PCA_ISA is not set
# CONFIG_SCx200_ACB is not set
# CONFIG_I2C_DEBUG_CORE is not set
CONFIG_I2C_DEBUG_ALGO=y
# CONFIG_I2C_DEBUG_BUS is not set
CONFIG_SPI=y
CONFIG_SPI_DEBUG=y
CONFIG_SPI_MASTER=y

#
# SPI Master Controller Drivers
#
# CONFIG_SPI_BITBANG is not set
# CONFIG_SPI_TOPCLIFF_PCH is not set
# CONFIG_SPI_XILINX is not set
CONFIG_SPI_DESIGNWARE=y
# CONFIG_SPI_DW_PCI is not set

#
# SPI Protocol Masters
#
CONFIG_SPI_SPIDEV=y
CONFIG_SPI_TLE62X0=y

#
# PPS support
#
CONFIG_PPS=y
# CONFIG_PPS_DEBUG is not set

#
# PPS clients support
#
CONFIG_PPS_CLIENT_KTIMER=y
CONFIG_PPS_CLIENT_LDISC=y
CONFIG_ARCH_WANT_OPTIONAL_GPIOLIB=y
# CONFIG_GPIOLIB is not set
CONFIG_W1=y

#
# 1-wire Bus Masters
#
# CONFIG_W1_MASTER_MATROX is not set
CONFIG_W1_MASTER_DS2490=y
CONFIG_W1_MASTER_DS2482=y

#
# 1-wire Slaves
#
# CONFIG_W1_SLAVE_THERM is not set
# CONFIG_W1_SLAVE_SMEM is not set
# CONFIG_W1_SLAVE_DS2431 is not set
CONFIG_W1_SLAVE_DS2433=y
# CONFIG_W1_SLAVE_DS2433_CRC is not set
CONFIG_W1_SLAVE_DS2760=y
CONFIG_W1_SLAVE_BQ27000=y
CONFIG_POWER_SUPPLY=y
CONFIG_POWER_SUPPLY_DEBUG=y
# CONFIG_PDA_POWER is not set
CONFIG_TEST_POWER=y
CONFIG_BATTERY_DS2760=y
# CONFIG_BATTERY_DS2782 is not set
# CONFIG_BATTERY_BQ20Z75 is not set
# CONFIG_BATTERY_BQ27x00 is not set
# CONFIG_BATTERY_MAX17040 is not set
CONFIG_HWMON=y
CONFIG_HWMON_VID=y
CONFIG_HWMON_DEBUG_CHIP=y

#
# Native drivers
#
# CONFIG_SENSORS_ABITUGURU is not set
CONFIG_SENSORS_ABITUGURU3=y
CONFIG_SENSORS_AD7414=y
# CONFIG_SENSORS_AD7418 is not set
CONFIG_SENSORS_ADCXX=y
# CONFIG_SENSORS_ADM1021 is not set
# CONFIG_SENSORS_ADM1025 is not set
# CONFIG_SENSORS_ADM1026 is not set
CONFIG_SENSORS_ADM1029=y
# CONFIG_SENSORS_ADM1031 is not set
CONFIG_SENSORS_ADM9240=y
# CONFIG_SENSORS_ADT7411 is not set
CONFIG_SENSORS_ADT7462=y
CONFIG_SENSORS_ADT7470=y
# CONFIG_SENSORS_ADT7475 is not set
CONFIG_SENSORS_ASC7621=y
CONFIG_SENSORS_K8TEMP=y
# CONFIG_SENSORS_K10TEMP is not set
CONFIG_SENSORS_ASB100=y
# CONFIG_SENSORS_ATXP1 is not set
CONFIG_SENSORS_DS1621=y
CONFIG_SENSORS_I5K_AMB=y
CONFIG_SENSORS_F71805F=y
CONFIG_SENSORS_F71882FG=y
# CONFIG_SENSORS_F75375S is not set
CONFIG_SENSORS_FSCHMD=y
# CONFIG_SENSORS_G760A is not set
CONFIG_SENSORS_GL518SM=y
# CONFIG_SENSORS_GL520SM is not set
CONFIG_SENSORS_CORETEMP=y
CONFIG_SENSORS_PKGTEMP=y
CONFIG_SENSORS_IBMAEM=y
# CONFIG_SENSORS_IBMPEX is not set
CONFIG_SENSORS_IT87=y
CONFIG_SENSORS_JC42=y
# CONFIG_SENSORS_LM63 is not set
# CONFIG_SENSORS_LM70 is not set
CONFIG_SENSORS_LM73=y
CONFIG_SENSORS_LM75=y
CONFIG_SENSORS_LM77=y
# CONFIG_SENSORS_LM78 is not set
# CONFIG_SENSORS_LM80 is not set
CONFIG_SENSORS_LM83=y
# CONFIG_SENSORS_LM85 is not set
CONFIG_SENSORS_LM87=y
# CONFIG_SENSORS_LM90 is not set
# CONFIG_SENSORS_LM92 is not set
# CONFIG_SENSORS_LM93 is not set
# CONFIG_SENSORS_LTC4215 is not set
# CONFIG_SENSORS_LTC4245 is not set
# CONFIG_SENSORS_LTC4261 is not set
# CONFIG_SENSORS_LM95241 is not set
CONFIG_SENSORS_MAX1111=y
# CONFIG_SENSORS_MAX1619 is not set
# CONFIG_SENSORS_MAX6650 is not set
# CONFIG_SENSORS_PC87360 is not set
CONFIG_SENSORS_PC87427=y
CONFIG_SENSORS_PCF8591=y
# CONFIG_SENSORS_SIS5595 is not set
# CONFIG_SENSORS_SMM665 is not set
CONFIG_SENSORS_DME1737=y
# CONFIG_SENSORS_EMC1403 is not set
CONFIG_SENSORS_EMC2103=y
CONFIG_SENSORS_SMSC47M1=y
# CONFIG_SENSORS_SMSC47M192 is not set
CONFIG_SENSORS_SMSC47B397=y
CONFIG_SENSORS_ADS7828=y
# CONFIG_SENSORS_ADS7871 is not set
# CONFIG_SENSORS_AMC6821 is not set
CONFIG_SENSORS_THMC50=y
# CONFIG_SENSORS_TMP102 is not set
# CONFIG_SENSORS_TMP401 is not set
# CONFIG_SENSORS_TMP421 is not set
# CONFIG_SENSORS_VIA_CPUTEMP is not set
CONFIG_SENSORS_VIA686A=y
# CONFIG_SENSORS_VT1211 is not set
CONFIG_SENSORS_VT8231=y
CONFIG_SENSORS_W83781D=y
CONFIG_SENSORS_W83791D=y
CONFIG_SENSORS_W83792D=y
# CONFIG_SENSORS_W83793 is not set
# CONFIG_SENSORS_W83795 is not set
CONFIG_SENSORS_W83L785TS=y
# CONFIG_SENSORS_W83L786NG is not set
# CONFIG_SENSORS_W83627HF is not set
# CONFIG_SENSORS_W83627EHF is not set
# CONFIG_SENSORS_LIS3_I2C is not set
CONFIG_SENSORS_APPLESMC=y

#
# ACPI drivers
#
CONFIG_SENSORS_ATK0110=y
# CONFIG_SENSORS_LIS3LV02D is not set
CONFIG_THERMAL=y
CONFIG_THERMAL_HWMON=y
# CONFIG_WATCHDOG is not set
CONFIG_SSB_POSSIBLE=y

#
# Sonics Silicon Backplane
#
CONFIG_SSB=y
CONFIG_SSB_SPROM=y
CONFIG_SSB_PCIHOST_POSSIBLE=y
CONFIG_SSB_PCIHOST=y
# CONFIG_SSB_B43_PCI_BRIDGE is not set
# CONFIG_SSB_DEBUG is not set
CONFIG_SSB_DRIVER_PCICORE_POSSIBLE=y
# CONFIG_SSB_DRIVER_PCICORE is not set
CONFIG_MFD_SUPPORT=y
CONFIG_MFD_CORE=y
CONFIG_MFD_88PM860X=y
# CONFIG_MFD_SM501 is not set
# CONFIG_HTC_PASIC3 is not set
# CONFIG_TPS6507X is not set
# CONFIG_TWL4030_CORE is not set
# CONFIG_MFD_STMPE is not set
CONFIG_MFD_TC35892=y
# CONFIG_MFD_TMIO is not set
# CONFIG_PMIC_DA903X is not set
CONFIG_PMIC_ADP5520=y
# CONFIG_MFD_MAX8925 is not set
# CONFIG_MFD_MAX8998 is not set
CONFIG_MFD_WM8400=y
# CONFIG_MFD_WM831X_I2C is not set
# CONFIG_MFD_WM831X_SPI is not set
# CONFIG_MFD_WM8350_I2C is not set
# CONFIG_MFD_WM8994 is not set
# CONFIG_MFD_PCF50633 is not set
# CONFIG_MFD_MC13XXX is not set
CONFIG_ABX500_CORE=y
# CONFIG_AB3100_CORE is not set
# CONFIG_EZX_PCAP is not set
CONFIG_AB3550_CORE=y
CONFIG_LPC_SCH=y
CONFIG_MFD_RDC321X=y
CONFIG_MFD_JANZ_CMODIO=y
# CONFIG_MFD_VX855 is not set
# CONFIG_REGULATOR is not set
# CONFIG_MEDIA_SUPPORT is not set

#
# Graphics support
#
CONFIG_AGP=y
CONFIG_AGP_ALI=y
# CONFIG_AGP_ATI is not set
# CONFIG_AGP_AMD is not set
# CONFIG_AGP_AMD64 is not set
# CONFIG_AGP_INTEL is not set
CONFIG_AGP_NVIDIA=y
# CONFIG_AGP_SIS is not set
# CONFIG_AGP_SWORKS is not set
# CONFIG_AGP_VIA is not set
# CONFIG_AGP_EFFICEON is not set
CONFIG_VGA_ARB=y
CONFIG_VGA_ARB_MAX_GPUS=16
CONFIG_VGA_SWITCHEROO=y
CONFIG_DRM=y
CONFIG_DRM_KMS_HELPER=y
CONFIG_DRM_TTM=y
# CONFIG_DRM_TDFX is not set
CONFIG_DRM_R128=y
CONFIG_DRM_RADEON=y
CONFIG_DRM_MGA=y
CONFIG_DRM_SIS=y
CONFIG_DRM_VIA=y
# CONFIG_DRM_SAVAGE is not set
# CONFIG_STUB_POULSBO is not set
CONFIG_VGASTATE=y
CONFIG_VIDEO_OUTPUT_CONTROL=y
CONFIG_FB=y
# CONFIG_FIRMWARE_EDID is not set
CONFIG_FB_DDC=y
# CONFIG_FB_BOOT_VESA_SUPPORT is not set
CONFIG_FB_CFB_FILLRECT=y
CONFIG_FB_CFB_COPYAREA=y
CONFIG_FB_CFB_IMAGEBLIT=y
# CONFIG_FB_CFB_REV_PIXELS_IN_BYTE is not set
CONFIG_FB_SYS_FILLRECT=y
CONFIG_FB_SYS_COPYAREA=y
CONFIG_FB_SYS_IMAGEBLIT=y
# CONFIG_FB_FOREIGN_ENDIAN is not set
CONFIG_FB_SYS_FOPS=y
CONFIG_FB_DEFERRED_IO=y
CONFIG_FB_HECUBA=y
# CONFIG_FB_SVGALIB is not set
# CONFIG_FB_MACMODES is not set
CONFIG_FB_BACKLIGHT=y
CONFIG_FB_MODE_HELPERS=y
CONFIG_FB_TILEBLITTING=y

#
# Frame buffer hardware drivers
#
# CONFIG_FB_CIRRUS is not set
CONFIG_FB_PM2=y
CONFIG_FB_PM2_FIFO_DISCONNECT=y
# CONFIG_FB_CYBER2000 is not set
CONFIG_FB_ARC=y
# CONFIG_FB_ASILIANT is not set
CONFIG_FB_IMSTT=y
# CONFIG_FB_VGA16 is not set
# CONFIG_FB_VESA is not set
CONFIG_FB_EFI=y
CONFIG_FB_N411=y
# CONFIG_FB_HGA is not set
CONFIG_FB_S1D13XXX=y
# CONFIG_FB_NVIDIA is not set
CONFIG_FB_RIVA=y
CONFIG_FB_RIVA_I2C=y
CONFIG_FB_RIVA_DEBUG=y
CONFIG_FB_RIVA_BACKLIGHT=y
# CONFIG_FB_LE80578 is not set
CONFIG_FB_MATROX=y
CONFIG_FB_MATROX_MILLENIUM=y
# CONFIG_FB_MATROX_MYSTIQUE is not set
CONFIG_FB_MATROX_G=y
CONFIG_FB_MATROX_I2C=y
# CONFIG_FB_MATROX_MAVEN is not set
# CONFIG_FB_RADEON is not set
CONFIG_FB_ATY128=y
# CONFIG_FB_ATY128_BACKLIGHT is not set
CONFIG_FB_ATY=y
# CONFIG_FB_ATY_CT is not set
CONFIG_FB_ATY_GX=y
# CONFIG_FB_ATY_BACKLIGHT is not set
# CONFIG_FB_S3 is not set
# CONFIG_FB_SAVAGE is not set
# CONFIG_FB_SIS is not set
# CONFIG_FB_VIA is not set
CONFIG_FB_NEOMAGIC=y
# CONFIG_FB_KYRO is not set
CONFIG_FB_3DFX=y
CONFIG_FB_3DFX_ACCEL=y
# CONFIG_FB_3DFX_I2C is not set
CONFIG_FB_VOODOO1=y
# CONFIG_FB_VT8623 is not set
# CONFIG_FB_TRIDENT is not set
# CONFIG_FB_ARK is not set
CONFIG_FB_PM3=y
# CONFIG_FB_CARMINE is not set
# CONFIG_FB_GEODE is not set
CONFIG_FB_TMIO=y
CONFIG_FB_TMIO_ACCELL=y
# CONFIG_FB_VIRTUAL is not set
CONFIG_FB_METRONOME=y
CONFIG_FB_MB862XX=y
CONFIG_FB_MB862XX_PCI_GDC=y
# CONFIG_FB_BROADSHEET is not set
CONFIG_BACKLIGHT_LCD_SUPPORT=y
# CONFIG_LCD_CLASS_DEVICE is not set
CONFIG_BACKLIGHT_CLASS_DEVICE=y
# CONFIG_BACKLIGHT_GENERIC is not set
CONFIG_BACKLIGHT_PROGEAR=y
# CONFIG_BACKLIGHT_MBP_NVIDIA is not set
CONFIG_BACKLIGHT_SAHARA=y
CONFIG_BACKLIGHT_ADP5520=y
# CONFIG_BACKLIGHT_ADP8860 is not set
CONFIG_BACKLIGHT_88PM860X=y

#
# Display device support
#
CONFIG_DISPLAY_SUPPORT=y

#
# Display hardware drivers
#

#
# Console display driver support
#
CONFIG_VGA_CONSOLE=y
CONFIG_VGACON_SOFT_SCROLLBACK=y
CONFIG_VGACON_SOFT_SCROLLBACK_SIZE=64
# CONFIG_MDA_CONSOLE is not set
CONFIG_DUMMY_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY=y
# CONFIG_FRAMEBUFFER_CONSOLE_ROTATION is not set
CONFIG_FONTS=y
CONFIG_FONT_8x8=y
# CONFIG_FONT_8x16 is not set
CONFIG_FONT_6x11=y
# CONFIG_FONT_7x14 is not set
CONFIG_FONT_PEARL_8x8=y
# CONFIG_FONT_ACORN_8x8 is not set
CONFIG_FONT_MINI_4x6=y
CONFIG_FONT_SUN8x16=y
# CONFIG_FONT_SUN12x22 is not set
CONFIG_FONT_10x18=y
# CONFIG_LOGO is not set
# CONFIG_SOUND is not set
CONFIG_HID_SUPPORT=y
CONFIG_HID=y
CONFIG_HIDRAW=y

#
# USB Input Devices
#
# CONFIG_USB_HID is not set
# CONFIG_HID_PID is not set

#
# Special HID drivers
#
CONFIG_USB_SUPPORT=y
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB_ARCH_HAS_OHCI=y
CONFIG_USB_ARCH_HAS_EHCI=y
CONFIG_USB=y
CONFIG_USB_DEBUG=y
CONFIG_USB_ANNOUNCE_NEW_DEVICES=y

#
# Miscellaneous USB options
#
# CONFIG_USB_DEVICEFS is not set
# CONFIG_USB_DEVICE_CLASS is not set
# CONFIG_USB_DYNAMIC_MINORS is not set
CONFIG_USB_SUSPEND=y
# CONFIG_USB_OTG is not set
# CONFIG_USB_MON is not set
# CONFIG_USB_WUSB is not set
# CONFIG_USB_WUSB_CBAF is not set

#
# USB Host Controller Drivers
#
# CONFIG_USB_C67X00_HCD is not set
# CONFIG_USB_XHCI_HCD is not set
CONFIG_USB_EHCI_HCD=y
# CONFIG_USB_EHCI_ROOT_HUB_TT is not set
# CONFIG_USB_EHCI_TT_NEWSCHED is not set
# CONFIG_USB_OXU210HP_HCD is not set
# CONFIG_USB_ISP116X_HCD is not set
# CONFIG_USB_ISP1760_HCD is not set
# CONFIG_USB_ISP1362_HCD is not set
CONFIG_USB_OHCI_HCD=y
# CONFIG_USB_OHCI_HCD_SSB is not set
# CONFIG_USB_OHCI_BIG_ENDIAN_DESC is not set
# CONFIG_USB_OHCI_BIG_ENDIAN_MMIO is not set
CONFIG_USB_OHCI_LITTLE_ENDIAN=y
CONFIG_USB_UHCI_HCD=y
# CONFIG_USB_U132_HCD is not set
# CONFIG_USB_SL811_HCD is not set
CONFIG_USB_R8A66597_HCD=y
# CONFIG_USB_HWA_HCD is not set

#
# USB Device Class drivers
#
# CONFIG_USB_ACM is not set
# CONFIG_USB_PRINTER is not set
CONFIG_USB_WDM=y
# CONFIG_USB_TMC is not set

#
# NOTE: USB_STORAGE depends on SCSI but BLK_DEV_SD may
#

#
# also be needed; see USB_STORAGE Help for more info
#
# CONFIG_USB_STORAGE is not set
# CONFIG_USB_UAS is not set
CONFIG_USB_LIBUSUAL=y

#
# USB Imaging devices
#
CONFIG_USB_MDC800=y
# CONFIG_USB_MICROTEK is not set

#
# USB port drivers
#
# CONFIG_USB_SERIAL is not set

#
# USB Miscellaneous drivers
#
# CONFIG_USB_EMI62 is not set
CONFIG_USB_EMI26=y
CONFIG_USB_ADUTUX=y
# CONFIG_USB_SEVSEG is not set
CONFIG_USB_RIO500=y
CONFIG_USB_LEGOTOWER=y
CONFIG_USB_LCD=y
CONFIG_USB_LED=y
# CONFIG_USB_CYPRESS_CY7C63 is not set
CONFIG_USB_CYTHERM=y
CONFIG_USB_IDMOUSE=y
CONFIG_USB_FTDI_ELAN=y
CONFIG_USB_APPLEDISPLAY=y
# CONFIG_USB_SISUSBVGA is not set
CONFIG_USB_LD=y
CONFIG_USB_TRANCEVIBRATOR=y
# CONFIG_USB_IOWARRIOR is not set
CONFIG_USB_TEST=y
CONFIG_USB_ISIGHTFW=y
CONFIG_USB_YUREX=y
# CONFIG_USB_GADGET is not set

#
# OTG and related infrastructure
#
# CONFIG_NOP_USB_XCEIV is not set
# CONFIG_UWB is not set
# CONFIG_MMC is not set
# CONFIG_MEMSTICK is not set
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y

#
# LED drivers
#
CONFIG_LEDS_88PM860X=y
# CONFIG_LEDS_ALIX2 is not set
# CONFIG_LEDS_PCA9532 is not set
CONFIG_LEDS_LP3944=y
CONFIG_LEDS_LP5521=y
CONFIG_LEDS_LP5523=y
CONFIG_LEDS_CLEVO_MAIL=y
# CONFIG_LEDS_PCA955X is not set
# CONFIG_LEDS_DAC124S085 is not set
# CONFIG_LEDS_BD2802 is not set
CONFIG_LEDS_ADP5520=y
CONFIG_LEDS_TRIGGERS=y

#
# LED Triggers
#
CONFIG_LEDS_TRIGGER_TIMER=y
CONFIG_LEDS_TRIGGER_HEARTBEAT=y
CONFIG_LEDS_TRIGGER_BACKLIGHT=y
CONFIG_LEDS_TRIGGER_DEFAULT_ON=y

#
# iptables trigger is under Netfilter config (LED target)
#
CONFIG_ACCESSIBILITY=y
# CONFIG_A11Y_BRAILLE_CONSOLE is not set
CONFIG_INFINIBAND=y
CONFIG_INFINIBAND_USER_MAD=y
# CONFIG_INFINIBAND_USER_ACCESS is not set
CONFIG_INFINIBAND_ADDR_TRANS=y
CONFIG_INFINIBAND_MTHCA=y
CONFIG_INFINIBAND_MTHCA_DEBUG=y
CONFIG_INFINIBAND_AMSO1100=y
CONFIG_INFINIBAND_AMSO1100_DEBUG=y
# CONFIG_INFINIBAND_CXGB4 is not set
CONFIG_MLX4_INFINIBAND=y
# CONFIG_INFINIBAND_NES is not set
# CONFIG_INFINIBAND_IPOIB is not set
CONFIG_INFINIBAND_SRP=y
# CONFIG_INFINIBAND_ISER is not set
CONFIG_EDAC=y

#
# Reporting subsystems
#
# CONFIG_EDAC_DEBUG is not set
CONFIG_EDAC_DECODE_MCE=y
CONFIG_EDAC_MCE_INJ=y
# CONFIG_EDAC_MM_EDAC is not set
CONFIG_RTC_LIB=y
CONFIG_RTC_CLASS=y
# CONFIG_RTC_HCTOSYS is not set
# CONFIG_RTC_DEBUG is not set

#
# RTC interfaces
#
CONFIG_RTC_INTF_SYSFS=y
CONFIG_RTC_INTF_PROC=y
CONFIG_RTC_INTF_DEV=y
# CONFIG_RTC_INTF_DEV_UIE_EMUL is not set
CONFIG_RTC_DRV_TEST=y

#
# I2C RTC drivers
#
# CONFIG_RTC_DRV_DS1307 is not set
# CONFIG_RTC_DRV_DS1374 is not set
CONFIG_RTC_DRV_DS1672=y
CONFIG_RTC_DRV_DS3232=y
CONFIG_RTC_DRV_MAX6900=y
# CONFIG_RTC_DRV_RS5C372 is not set
CONFIG_RTC_DRV_ISL1208=y
# CONFIG_RTC_DRV_ISL12022 is not set
CONFIG_RTC_DRV_X1205=y
# CONFIG_RTC_DRV_PCF8563 is not set
# CONFIG_RTC_DRV_PCF8583 is not set
CONFIG_RTC_DRV_M41T80=y
CONFIG_RTC_DRV_M41T80_WDT=y
# CONFIG_RTC_DRV_BQ32K is not set
CONFIG_RTC_DRV_S35390A=y
# CONFIG_RTC_DRV_FM3130 is not set
CONFIG_RTC_DRV_RX8581=y
# CONFIG_RTC_DRV_RX8025 is not set

#
# SPI RTC drivers
#
CONFIG_RTC_DRV_M41T94=y
# CONFIG_RTC_DRV_DS1305 is not set
CONFIG_RTC_DRV_DS1390=y
# CONFIG_RTC_DRV_MAX6902 is not set
# CONFIG_RTC_DRV_R9701 is not set
CONFIG_RTC_DRV_RS5C348=y
# CONFIG_RTC_DRV_DS3234 is not set
# CONFIG_RTC_DRV_PCF2123 is not set

#
# Platform RTC drivers
#
CONFIG_RTC_DRV_CMOS=y
CONFIG_RTC_DRV_DS1286=y
# CONFIG_RTC_DRV_DS1511 is not set
# CONFIG_RTC_DRV_DS1553 is not set
CONFIG_RTC_DRV_DS1742=y
# CONFIG_RTC_DRV_STK17TA8 is not set
# CONFIG_RTC_DRV_M48T86 is not set
# CONFIG_RTC_DRV_M48T35 is not set
# CONFIG_RTC_DRV_M48T59 is not set
CONFIG_RTC_DRV_MSM6242=y
CONFIG_RTC_DRV_BQ4802=y
# CONFIG_RTC_DRV_RP5C01 is not set
# CONFIG_RTC_DRV_V3020 is not set

#
# on-CPU RTC drivers
#
CONFIG_DMADEVICES=y
# CONFIG_DMADEVICES_DEBUG is not set

#
# DMA Devices
#
CONFIG_INTEL_MID_DMAC=y
# CONFIG_INTEL_IOATDMA is not set
# CONFIG_TIMB_DMA is not set
# CONFIG_PCH_DMA is not set
CONFIG_DMA_ENGINE=y

#
# DMA Clients
#
CONFIG_NET_DMA=y
CONFIG_ASYNC_TX_DMA=y
CONFIG_DMATEST=y
# CONFIG_AUXDISPLAY is not set
CONFIG_UIO=y
CONFIG_UIO_CIF=y
CONFIG_UIO_PDRV=y
# CONFIG_UIO_PDRV_GENIRQ is not set
# CONFIG_UIO_AEC is not set
# CONFIG_UIO_SERCOS3 is not set
CONFIG_UIO_PCI_GENERIC=y
CONFIG_UIO_NETX=y
# CONFIG_STAGING is not set
# CONFIG_X86_PLATFORM_DEVICES is not set

#
# Firmware Drivers
#
CONFIG_EDD=y
# CONFIG_EDD_OFF is not set
CONFIG_FIRMWARE_MEMMAP=y
CONFIG_EFI_VARS=y
# CONFIG_DELL_RBU is not set
# CONFIG_DCDBAS is not set
# CONFIG_DMIID is not set
CONFIG_ISCSI_IBFT_FIND=y

#
# File systems
#
CONFIG_EXT2_FS=y
# CONFIG_EXT2_FS_XATTR is not set
CONFIG_EXT2_FS_XIP=y
CONFIG_EXT3_FS=y
# CONFIG_EXT3_DEFAULTS_TO_ORDERED is not set
CONFIG_EXT3_FS_XATTR=y
CONFIG_EXT3_FS_POSIX_ACL=y
CONFIG_EXT3_FS_SECURITY=y
CONFIG_EXT4_FS=y
# CONFIG_EXT4_FS_XATTR is not set
# CONFIG_EXT4_DEBUG is not set
CONFIG_FS_XIP=y
CONFIG_JBD=y
# CONFIG_JBD_DEBUG is not set
CONFIG_JBD2=y
# CONFIG_JBD2_DEBUG is not set
CONFIG_FS_MBCACHE=y
# CONFIG_REISERFS_FS is not set
CONFIG_JFS_FS=y
# CONFIG_JFS_POSIX_ACL is not set
CONFIG_JFS_SECURITY=y
CONFIG_JFS_DEBUG=y
# CONFIG_JFS_STATISTICS is not set
CONFIG_FS_POSIX_ACL=y
CONFIG_XFS_FS=y
# CONFIG_XFS_QUOTA is not set
# CONFIG_XFS_POSIX_ACL is not set
# CONFIG_XFS_RT is not set
# CONFIG_XFS_DEBUG is not set
# CONFIG_OCFS2_FS is not set
CONFIG_BTRFS_FS=y
# CONFIG_BTRFS_FS_POSIX_ACL is not set
CONFIG_NILFS2_FS=y
CONFIG_EXPORTFS=y
CONFIG_FILE_LOCKING=y
CONFIG_FSNOTIFY=y
CONFIG_DNOTIFY=y
CONFIG_INOTIFY_USER=y
CONFIG_FANOTIFY=y
CONFIG_FANOTIFY_ACCESS_PERMISSIONS=y
# CONFIG_QUOTA is not set
# CONFIG_QUOTACTL is not set
CONFIG_AUTOFS4_FS=y
# CONFIG_FUSE_FS is not set

#
# Caches
#
# CONFIG_FSCACHE is not set

#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=y
# CONFIG_JOLIET is not set
CONFIG_ZISOFS=y
CONFIG_UDF_FS=y
CONFIG_UDF_NLS=y

#
# DOS/FAT/NT Filesystems
#
CONFIG_FAT_FS=y
CONFIG_MSDOS_FS=y
CONFIG_VFAT_FS=y
CONFIG_FAT_DEFAULT_CODEPAGE=437
CONFIG_FAT_DEFAULT_IOCHARSET="iso8859-1"
# CONFIG_NTFS_FS is not set

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
# CONFIG_PROC_KCORE is not set
CONFIG_PROC_SYSCTL=y
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_SYSFS=y
# CONFIG_TMPFS is not set
# CONFIG_HUGETLBFS is not set
# CONFIG_HUGETLB_PAGE is not set
CONFIG_CONFIGFS_FS=y
CONFIG_MISC_FILESYSTEMS=y
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
# CONFIG_ECRYPT_FS is not set
CONFIG_HFS_FS=y
# CONFIG_HFSPLUS_FS is not set
CONFIG_BEFS_FS=y
CONFIG_BEFS_DEBUG=y
CONFIG_BFS_FS=y
CONFIG_EFS_FS=y
CONFIG_LOGFS=y
# CONFIG_CRAMFS is not set
CONFIG_SQUASHFS=y
CONFIG_SQUASHFS_XATTR=y
# CONFIG_SQUASHFS_LZO is not set
CONFIG_SQUASHFS_EMBEDDED=y
CONFIG_SQUASHFS_FRAGMENT_CACHE_SIZE=3
CONFIG_VXFS_FS=y
CONFIG_MINIX_FS=y
# CONFIG_OMFS_FS is not set
CONFIG_HPFS_FS=y
CONFIG_QNX4FS_FS=y
# CONFIG_ROMFS_FS is not set
CONFIG_SYSV_FS=y
# CONFIG_UFS_FS is not set
CONFIG_NETWORK_FILESYSTEMS=y
# CONFIG_NFS_FS is not set
# CONFIG_NFSD is not set
# CONFIG_CEPH_FS is not set
# CONFIG_CIFS is not set
# CONFIG_NCP_FS is not set
# CONFIG_CODA_FS is not set
CONFIG_AFS_FS=y
# CONFIG_AFS_DEBUG is not set

#
# Partition Types
#
CONFIG_PARTITION_ADVANCED=y
# CONFIG_ACORN_PARTITION is not set
# CONFIG_OSF_PARTITION is not set
# CONFIG_AMIGA_PARTITION is not set
# CONFIG_ATARI_PARTITION is not set
CONFIG_MAC_PARTITION=y
CONFIG_MSDOS_PARTITION=y
CONFIG_BSD_DISKLABEL=y
# CONFIG_MINIX_SUBPARTITION is not set
CONFIG_SOLARIS_X86_PARTITION=y
CONFIG_UNIXWARE_DISKLABEL=y
# CONFIG_LDM_PARTITION is not set
# CONFIG_SGI_PARTITION is not set
# CONFIG_ULTRIX_PARTITION is not set
# CONFIG_SUN_PARTITION is not set
CONFIG_KARMA_PARTITION=y
# CONFIG_EFI_PARTITION is not set
# CONFIG_SYSV68_PARTITION is not set
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="iso8859-1"
CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_CODEPAGE_737=y
CONFIG_NLS_CODEPAGE_775=y
CONFIG_NLS_CODEPAGE_850=y
CONFIG_NLS_CODEPAGE_852=y
CONFIG_NLS_CODEPAGE_855=y
CONFIG_NLS_CODEPAGE_857=y
# CONFIG_NLS_CODEPAGE_860 is not set
# CONFIG_NLS_CODEPAGE_861 is not set
CONFIG_NLS_CODEPAGE_862=y
CONFIG_NLS_CODEPAGE_863=y
CONFIG_NLS_CODEPAGE_864=y
# CONFIG_NLS_CODEPAGE_865 is not set
# CONFIG_NLS_CODEPAGE_866 is not set
CONFIG_NLS_CODEPAGE_869=y
# CONFIG_NLS_CODEPAGE_936 is not set
CONFIG_NLS_CODEPAGE_950=y
CONFIG_NLS_CODEPAGE_932=y
# CONFIG_NLS_CODEPAGE_949 is not set
CONFIG_NLS_CODEPAGE_874=y
CONFIG_NLS_ISO8859_8=y
# CONFIG_NLS_CODEPAGE_1250 is not set
# CONFIG_NLS_CODEPAGE_1251 is not set
CONFIG_NLS_ASCII=y
# CONFIG_NLS_ISO8859_1 is not set
# CONFIG_NLS_ISO8859_2 is not set
CONFIG_NLS_ISO8859_3=y
# CONFIG_NLS_ISO8859_4 is not set
CONFIG_NLS_ISO8859_5=y
# CONFIG_NLS_ISO8859_6 is not set
# CONFIG_NLS_ISO8859_7 is not set
# CONFIG_NLS_ISO8859_9 is not set
# CONFIG_NLS_ISO8859_13 is not set
CONFIG_NLS_ISO8859_14=y
# CONFIG_NLS_ISO8859_15 is not set
CONFIG_NLS_KOI8_R=y
# CONFIG_NLS_KOI8_U is not set
# CONFIG_NLS_UTF8 is not set
CONFIG_DLM=y
CONFIG_DLM_DEBUG=y

#
# Kernel hacking
#
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
# CONFIG_PRINTK_TIME is not set
# CONFIG_ENABLE_WARN_DEPRECATED is not set
CONFIG_ENABLE_MUST_CHECK=y
CONFIG_FRAME_WARN=1024
CONFIG_MAGIC_SYSRQ=y
CONFIG_STRIP_ASM_SYMS=y
CONFIG_UNUSED_SYMBOLS=y
CONFIG_DEBUG_FS=y
# CONFIG_HEADERS_CHECK is not set
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_SHIRQ=y
CONFIG_LOCKUP_DETECTOR=y
CONFIG_HARDLOCKUP_DETECTOR=y
# CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC is not set
CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=0
CONFIG_DETECT_HUNG_TASK=y
CONFIG_BOOTPARAM_HUNG_TASK_PANIC=y
CONFIG_BOOTPARAM_HUNG_TASK_PANIC_VALUE=1
CONFIG_SCHED_DEBUG=y
# CONFIG_SCHEDSTATS is not set
CONFIG_TIMER_STATS=y
CONFIG_DEBUG_OBJECTS=y
CONFIG_DEBUG_OBJECTS_SELFTEST=y
# CONFIG_DEBUG_OBJECTS_FREE is not set
# CONFIG_DEBUG_OBJECTS_TIMERS is not set
# CONFIG_DEBUG_OBJECTS_WORK is not set
CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
# CONFIG_DEBUG_OBJECTS_PERCPU_COUNTER is not set
CONFIG_DEBUG_OBJECTS_ENABLE_DEFAULT=1
# CONFIG_SLUB_DEBUG_ON is not set
CONFIG_SLUB_STATS=y
CONFIG_DEBUG_PREEMPT=y
# CONFIG_DEBUG_RT_MUTEXES is not set
# CONFIG_RT_MUTEX_TESTER is not set
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_BKL=y
CONFIG_DEBUG_LOCK_ALLOC=y
# CONFIG_PROVE_LOCKING is not set
# CONFIG_SPARSE_RCU_POINTER is not set
CONFIG_LOCKDEP=y
# CONFIG_LOCK_STAT is not set
CONFIG_DEBUG_LOCKDEP=y
# CONFIG_DEBUG_SPINLOCK_SLEEP is not set
CONFIG_DEBUG_LOCKING_API_SELFTESTS=y
CONFIG_STACKTRACE=y
# CONFIG_DEBUG_KOBJECT is not set
# CONFIG_DEBUG_HIGHMEM is not set
CONFIG_DEBUG_BUGVERBOSE=y
# CONFIG_DEBUG_INFO is not set
CONFIG_DEBUG_VM=y
# CONFIG_DEBUG_VIRTUAL is not set
CONFIG_DEBUG_WRITECOUNT=y
CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_DEBUG_LIST=y
CONFIG_TEST_LIST_SORT=y
CONFIG_DEBUG_SG=y
# CONFIG_DEBUG_NOTIFIERS is not set
CONFIG_DEBUG_CREDENTIALS=y
CONFIG_ARCH_WANT_FRAME_POINTERS=y
CONFIG_FRAME_POINTER=y
CONFIG_BOOT_PRINTK_DELAY=y
# CONFIG_RCU_TORTURE_TEST is not set
# CONFIG_RCU_CPU_STALL_DETECTOR is not set
CONFIG_BACKTRACE_SELF_TEST=y
# CONFIG_DEBUG_BLOCK_EXT_DEVT is not set
# CONFIG_DEBUG_FORCE_WEAK_PER_CPU is not set
# CONFIG_LKDTM is not set
# CONFIG_CPU_NOTIFIER_ERROR_INJECT is not set
CONFIG_FAULT_INJECTION=y
# CONFIG_FAILSLAB is not set
CONFIG_FAIL_PAGE_ALLOC=y
# CONFIG_FAIL_MAKE_REQUEST is not set
# CONFIG_FAIL_IO_TIMEOUT is not set
# CONFIG_FAULT_INJECTION_DEBUG_FS is not set
# CONFIG_LATENCYTOP is not set
CONFIG_SYSCTL_SYSCALL_CHECK=y
CONFIG_DEBUG_PAGEALLOC=y
CONFIG_USER_STACKTRACE_SUPPORT=y
CONFIG_NOP_TRACER=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST=y
CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
CONFIG_HAVE_C_RECORDMCOUNT=y
CONFIG_TRACER_MAX_TRACE=y
CONFIG_RING_BUFFER=y
CONFIG_EVENT_TRACING=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_RING_BUFFER_ALLOW_SWAP=y
CONFIG_TRACING=y
CONFIG_GENERIC_TRACER=y
CONFIG_TRACING_SUPPORT=y
CONFIG_FTRACE=y
CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y
# CONFIG_IRQSOFF_TRACER is not set
CONFIG_PREEMPT_TRACER=y
# CONFIG_SCHED_TRACER is not set
CONFIG_FTRACE_SYSCALLS=y
CONFIG_BRANCH_PROFILE_NONE=y
# CONFIG_PROFILE_ANNOTATED_BRANCHES is not set
# CONFIG_PROFILE_ALL_BRANCHES is not set
CONFIG_STACK_TRACER=y
CONFIG_BLK_DEV_IO_TRACE=y
# CONFIG_DYNAMIC_FTRACE is not set
CONFIG_FUNCTION_PROFILER=y
# CONFIG_FTRACE_STARTUP_TEST is not set
# CONFIG_MMIOTRACE is not set
# CONFIG_RING_BUFFER_BENCHMARK is not set
CONFIG_PROVIDE_OHCI1394_DMA_INIT=y
CONFIG_DYNAMIC_DEBUG=y
# CONFIG_DMA_API_DEBUG is not set
CONFIG_ATOMIC64_SELFTEST=y
CONFIG_SAMPLES=y
CONFIG_HAVE_ARCH_KGDB=y
# CONFIG_KGDB is not set
CONFIG_HAVE_ARCH_KMEMCHECK=y
# CONFIG_STRICT_DEVMEM is not set
CONFIG_X86_VERBOSE_BOOTUP=y
CONFIG_EARLY_PRINTK=y
# CONFIG_EARLY_PRINTK_MRST is not set
CONFIG_EARLY_PRINTK_DBGP=y
# CONFIG_DEBUG_STACKOVERFLOW is not set
# CONFIG_DEBUG_STACK_USAGE is not set
# CONFIG_DEBUG_PER_CPU_MAPS is not set
CONFIG_X86_PTDUMP=y
# CONFIG_DEBUG_RODATA is not set
CONFIG_DOUBLEFAULT=y
CONFIG_IOMMU_STRESS=y
CONFIG_HAVE_MMIOTRACE_SUPPORT=y
CONFIG_IO_DELAY_TYPE_0X80=0
CONFIG_IO_DELAY_TYPE_0XED=1
CONFIG_IO_DELAY_TYPE_UDELAY=2
CONFIG_IO_DELAY_TYPE_NONE=3
CONFIG_IO_DELAY_0X80=y
# CONFIG_IO_DELAY_0XED is not set
# CONFIG_IO_DELAY_UDELAY is not set
# CONFIG_IO_DELAY_NONE is not set
CONFIG_DEFAULT_IO_DELAY_TYPE=0
# CONFIG_DEBUG_BOOT_PARAMS is not set
CONFIG_CPA_DEBUG=y
# CONFIG_OPTIMIZE_INLINING is not set

#
# Security options
#
CONFIG_KEYS=y
CONFIG_KEYS_DEBUG_PROC_KEYS=y
# CONFIG_SECURITY_DMESG_RESTRICT is not set
CONFIG_SECURITY=y
CONFIG_SECURITYFS=y
# CONFIG_SECURITY_NETWORK is not set
CONFIG_SECURITY_PATH=y
# CONFIG_INTEL_TXT is not set
CONFIG_SECURITY_TOMOYO=y
# CONFIG_SECURITY_APPARMOR is not set
CONFIG_IMA=y
CONFIG_IMA_MEASURE_PCR_IDX=10
CONFIG_IMA_AUDIT=y
CONFIG_DEFAULT_SECURITY_TOMOYO=y
# CONFIG_DEFAULT_SECURITY_DAC is not set
CONFIG_DEFAULT_SECURITY="tomoyo"
CONFIG_CRYPTO=y

#
# Crypto core or helper
#
# CONFIG_CRYPTO_FIPS is not set
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_ALGAPI2=y
CONFIG_CRYPTO_AEAD=y
CONFIG_CRYPTO_AEAD2=y
CONFIG_CRYPTO_BLKCIPHER=y
CONFIG_CRYPTO_BLKCIPHER2=y
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_HASH2=y
CONFIG_CRYPTO_RNG=y
CONFIG_CRYPTO_RNG2=y
CONFIG_CRYPTO_PCOMP=y
CONFIG_CRYPTO_PCOMP2=y
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_MANAGER2=y
# CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set
CONFIG_CRYPTO_GF128MUL=y
CONFIG_CRYPTO_NULL=y
# CONFIG_CRYPTO_PCRYPT is not set
CONFIG_CRYPTO_WORKQUEUE=y
CONFIG_CRYPTO_CRYPTD=y
CONFIG_CRYPTO_AUTHENC=y

#
# Authenticated Encryption with Associated Data
#
CONFIG_CRYPTO_CCM=y
CONFIG_CRYPTO_GCM=y
CONFIG_CRYPTO_SEQIV=y

#
# Block modes
#
CONFIG_CRYPTO_CBC=y
CONFIG_CRYPTO_CTR=y
# CONFIG_CRYPTO_CTS is not set
# CONFIG_CRYPTO_ECB is not set
# CONFIG_CRYPTO_LRW is not set
CONFIG_CRYPTO_PCBC=y
CONFIG_CRYPTO_XTS=y

#
# Hash modes
#
CONFIG_CRYPTO_HMAC=y
# CONFIG_CRYPTO_XCBC is not set
# CONFIG_CRYPTO_VMAC is not set

#
# Digest
#
CONFIG_CRYPTO_CRC32C=y
CONFIG_CRYPTO_CRC32C_INTEL=y
CONFIG_CRYPTO_GHASH=y
CONFIG_CRYPTO_MD4=y
CONFIG_CRYPTO_MD5=y
CONFIG_CRYPTO_MICHAEL_MIC=y
# CONFIG_CRYPTO_RMD128 is not set
# CONFIG_CRYPTO_RMD160 is not set
# CONFIG_CRYPTO_RMD256 is not set
CONFIG_CRYPTO_RMD320=y
CONFIG_CRYPTO_SHA1=y
CONFIG_CRYPTO_SHA256=y
CONFIG_CRYPTO_SHA512=y
CONFIG_CRYPTO_TGR192=y
# CONFIG_CRYPTO_WP512 is not set

#
# Ciphers
#
CONFIG_CRYPTO_AES=y
CONFIG_CRYPTO_AES_586=y
# CONFIG_CRYPTO_ANUBIS is not set
# CONFIG_CRYPTO_ARC4 is not set
# CONFIG_CRYPTO_BLOWFISH is not set
CONFIG_CRYPTO_CAMELLIA=y
CONFIG_CRYPTO_CAST5=y
CONFIG_CRYPTO_CAST6=y
CONFIG_CRYPTO_DES=y
CONFIG_CRYPTO_FCRYPT=y
# CONFIG_CRYPTO_KHAZAD is not set
# CONFIG_CRYPTO_SALSA20 is not set
# CONFIG_CRYPTO_SALSA20_586 is not set
# CONFIG_CRYPTO_SEED is not set
# CONFIG_CRYPTO_SERPENT is not set
CONFIG_CRYPTO_TEA=y
CONFIG_CRYPTO_TWOFISH=y
CONFIG_CRYPTO_TWOFISH_COMMON=y
# CONFIG_CRYPTO_TWOFISH_586 is not set

#
# Compression
#
CONFIG_CRYPTO_DEFLATE=y
CONFIG_CRYPTO_ZLIB=y
# CONFIG_CRYPTO_LZO is not set

#
# Random Number Generation
#
CONFIG_CRYPTO_ANSI_CPRNG=y
# CONFIG_CRYPTO_HW is not set
CONFIG_HAVE_KVM=y
# CONFIG_VIRTUALIZATION is not set
CONFIG_BINARY_PRINTF=y

#
# Library routines
#
CONFIG_BITREVERSE=y
CONFIG_GENERIC_FIND_FIRST_BIT=y
CONFIG_GENERIC_FIND_NEXT_BIT=y
CONFIG_GENERIC_FIND_LAST_BIT=y
CONFIG_CRC_CCITT=y
CONFIG_CRC16=y
CONFIG_CRC_T10DIF=y
CONFIG_CRC_ITU_T=y
CONFIG_CRC32=y
CONFIG_CRC7=y
CONFIG_LIBCRC32C=y
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=y
CONFIG_LZO_DECOMPRESS=y
CONFIG_DECOMPRESS_GZIP=y
CONFIG_DECOMPRESS_BZIP2=y
CONFIG_DECOMPRESS_LZMA=y
CONFIG_DECOMPRESS_LZO=y
CONFIG_TEXTSEARCH=y
CONFIG_TEXTSEARCH_KMP=y
CONFIG_TEXTSEARCH_BM=y
CONFIG_TEXTSEARCH_FSM=y
CONFIG_BTREE=y
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT=y
CONFIG_HAS_DMA=y
CONFIG_CHECK_SIGNATURE=y
CONFIG_NLATTR=y

* Re: [PATCH v4] sched: automated per session task groups
  2010-11-21 13:37 [PATCH v4] sched: automated per session task groups Ingo Molnar
@ 2010-11-21 13:39 ` Ingo Molnar
  2010-11-21 15:44   ` Oleg Nesterov
  2010-11-21 16:15 ` Mike Galbraith
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 79+ messages in thread
From: Ingo Molnar @ 2010-11-21 13:39 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Oleg Nesterov, Peter Zijlstra, Linus Torvalds, LKML


Btw., there's a small cleanup in the patch that i picked up (see below), and i also 
edited the commit log a bit - so you might want to pick up the version below.

	Ingo

--------------->
>From 741430a6af2bdefe3d017226bbcfe96f9ed46b58 Mon Sep 17 00:00:00 2001
From: Mike Galbraith <efault@gmx.de>
Date: Sat, 20 Nov 2010 12:35:00 -0700
Subject: [PATCH] sched: Improve desktop interactivity: Implement automated per session task groups

A recurring complaint from CFS users is that parallel kbuild has
a negative impact on desktop interactivity.  This patch
implements an idea from Linus, to automatically create task
groups.  This patch only implements per session autogroups,
but leaves the way open for enhancement.

Implementation: each task's signal struct contains an inherited
pointer to a refcounted autogroup struct containing a task group
pointer, the default for all tasks pointing to the
init_task_group.  When a task calls setsid(), the process wide
reference to the default group is dropped, a new task group is
created, and the process is moved into the new task group.
Children thereafter inherit this task group, and increase its
refcount.  On exit, a reference to the current task group is
dropped when the last reference to each signal struct is
dropped.  The task group is destroyed when the last signal
struct referencing it is freed.

At runqueue selection time, IFF a task has no cgroup assignment, its
current autogroup is used.

Autogroup bandwidth is controllable via setting its nice level
through the proc filesystem.  cat /proc/<pid>/autogroup displays
the task's group and the group's nice level.

 echo <nice level> > /proc/<pid>/autogroup

sets the task group's shares to the weight of a nice <level> task.
Setting nice level is rate limited for !admin users due to the abuse
risk of task group locking.

The feature is enabled from boot by default if
CONFIG_SCHED_AUTOGROUP=y is selected, but can be disabled via the
boot option noautogroup, and can also be turned on/off on the
fly via

 echo [01] > /proc/sys/kernel/sched_autogroup_enabled

which will automatically move tasks to/from the root task
group.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
LKML-Reference: <1290281700.28711.9.camel@maggy.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 Documentation/kernel-parameters.txt |    2 +
 fs/proc/base.c                      |   79 ++++++++++++
 include/linux/sched.h               |   23 ++++
 init/Kconfig                        |   12 ++
 kernel/fork.c                       |    5 +-
 kernel/sched.c                      |   13 ++-
 kernel/sched_autogroup.c            |  240 +++++++++++++++++++++++++++++++++++
 kernel/sched_autogroup.h            |   23 ++++
 kernel/sched_debug.c                |   29 ++--
 kernel/sys.c                        |    4 +-
 kernel/sysctl.c                     |   11 ++
 11 files changed, 423 insertions(+), 18 deletions(-)
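
Not part of the patch: a small userspace sketch of the /proc/<pid>/autogroup
interface described in the commit log, for anyone scripting against it. It
parses the line format printed by proc_sched_autogroup_show_task() and maps a
nice level to the shares value the write path would pick. The helper names
autogroup_parse_line() and autogroup_nice_to_shares() are made up for this
illustration, and the table is a hand copy of the scheduler's
prio_to_weight[] array, so check it against your own tree before relying on it.

```c
#include <stdio.h>

/*
 * Hand copy of prio_to_weight[] from kernel/sched.c (an assumption of this
 * sketch; verify against the tree you build).  Index 0 is nice -20.
 */
static const int model_prio_to_weight[40] = {
	/* -20 */ 88761, 71755, 56483, 46273, 36291,
	/* -15 */ 29154, 23254, 18705, 14949, 11916,
	/* -10 */  9548,  7620,  6100,  4904,  3906,
	/*  -5 */  3121,  2501,  1991,  1586,  1277,
	/*   0 */  1024,   820,   655,   526,   423,
	/*   5 */   335,   272,   215,   172,   137,
	/*  10 */   110,    87,    70,    56,    45,
	/*  15 */    36,    29,    23,    18,    15,
};

/*
 * Hypothetical helper: shares for a nice level, or -1 when the level is
 * outside -20..19 (the patch rejects that range with -EINVAL).
 */
int autogroup_nice_to_shares(int nice)
{
	if (nice < -20 || nice > 19)
		return -1;
	return model_prio_to_weight[nice + 20];
}

/*
 * Hypothetical helper: parse "/autogroup-<id> nice <level>" as emitted by
 * the seq_printf() in the patch.  Returns 0 on success, -1 on a malformed
 * line.
 */
int autogroup_parse_line(const char *line, long *id, int *nice)
{
	if (sscanf(line, "/autogroup-%ld nice %d", id, nice) != 2)
		return -1;
	return 0;
}
```

A caller would read one line from /proc/self/autogroup with fgets() and hand
it to autogroup_parse_line(); on a kernel without CONFIG_SCHED_AUTOGROUP the
file does not exist and fopen() fails.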

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 92e83e5..86820a7 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1622,6 +1622,8 @@ and is between 256 and 4096 characters. It is defined in the file
 	noapic		[SMP,APIC] Tells the kernel to not make use of any
 			IOAPICs that may be present in the system.
 
+	noautogroup	Disable scheduler automatic task group creation.
+
 	nobats		[PPC] Do not use BATs for mapping kernel lowmem
 			on "Classic" PPC cores.
 
diff --git a/fs/proc/base.c b/fs/proc/base.c
index f3d02ca..2fa0ce2 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1407,6 +1407,82 @@ static const struct file_operations proc_pid_sched_operations = {
 
 #endif
 
+#ifdef CONFIG_SCHED_AUTOGROUP
+/*
+ * Print out autogroup related information:
+ */
+static int sched_autogroup_show(struct seq_file *m, void *v)
+{
+	struct inode *inode = m->private;
+	struct task_struct *p;
+
+	p = get_proc_task(inode);
+	if (!p)
+		return -ESRCH;
+	proc_sched_autogroup_show_task(p, m);
+
+	put_task_struct(p);
+
+	return 0;
+}
+
+static ssize_t
+sched_autogroup_write(struct file *file, const char __user *buf,
+	    size_t count, loff_t *offset)
+{
+	struct inode *inode = file->f_path.dentry->d_inode;
+	struct task_struct *p;
+	char buffer[PROC_NUMBUF];
+	long nice;
+	int err;
+
+	memset(buffer, 0, sizeof(buffer));
+	if (count > sizeof(buffer) - 1)
+		count = sizeof(buffer) - 1;
+	if (copy_from_user(buffer, buf, count))
+		return -EFAULT;
+
+	err = strict_strtol(strstrip(buffer), 0, &nice);
+	if (err)
+		return -EINVAL;
+
+	p = get_proc_task(inode);
+	if (!p)
+		return -ESRCH;
+
+	err = nice;
+	err = proc_sched_autogroup_set_nice(p, &err);
+	if (err)
+		count = err;
+
+	put_task_struct(p);
+
+	return count;
+}
+
+static int sched_autogroup_open(struct inode *inode, struct file *filp)
+{
+	int ret;
+
+	ret = single_open(filp, sched_autogroup_show, NULL);
+	if (!ret) {
+		struct seq_file *m = filp->private_data;
+
+		m->private = inode;
+	}
+	return ret;
+}
+
+static const struct file_operations proc_pid_sched_autogroup_operations = {
+	.open		= sched_autogroup_open,
+	.read		= seq_read,
+	.write		= sched_autogroup_write,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
+#endif /* CONFIG_SCHED_AUTOGROUP */
+
 static ssize_t comm_write(struct file *file, const char __user *buf,
 				size_t count, loff_t *offset)
 {
@@ -2733,6 +2809,9 @@ static const struct pid_entry tgid_base_stuff[] = {
 #ifdef CONFIG_SCHED_DEBUG
 	REG("sched",      S_IRUGO|S_IWUSR, proc_pid_sched_operations),
 #endif
+#ifdef CONFIG_SCHED_AUTOGROUP
+	REG("autogroup",  S_IRUGO|S_IWUSR, proc_pid_sched_autogroup_operations),
+#endif
 	REG("comm",      S_IRUGO|S_IWUSR, proc_pid_set_comm_operations),
 #ifdef CONFIG_HAVE_ARCH_TRACEHOOK
 	INF("syscall",    S_IRUSR, proc_pid_syscall),
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 840f127..bc6dca5 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -509,6 +509,8 @@ struct thread_group_cputimer {
 	spinlock_t lock;
 };
 
+struct autogroup;
+
 /*
  * NOTE! "signal_struct" does not have it's own
  * locking, because a shared signal_struct always
@@ -576,6 +578,9 @@ struct signal_struct {
 
 	struct tty_struct *tty; /* NULL if no tty */
 
+#ifdef CONFIG_SCHED_AUTOGROUP
+	struct autogroup *autogroup;
+#endif
 	/*
 	 * Cumulative resource counters for dead threads in the group,
 	 * and for reaped dead child processes forked by this group.
@@ -1926,6 +1931,24 @@ int sched_rt_handler(struct ctl_table *table, int write,
 
 extern unsigned int sysctl_sched_compat_yield;
 
+#ifdef CONFIG_SCHED_AUTOGROUP
+extern unsigned int sysctl_sched_autogroup_enabled;
+
+extern void sched_autogroup_create_attach(struct task_struct *p);
+extern void sched_autogroup_detach(struct task_struct *p);
+extern void sched_autogroup_fork(struct signal_struct *sig);
+extern void sched_autogroup_exit(struct signal_struct *sig);
+#ifdef CONFIG_PROC_FS
+extern void proc_sched_autogroup_show_task(struct task_struct *p, struct seq_file *m);
+extern int proc_sched_autogroup_set_nice(struct task_struct *p, int *nice);
+#endif
+#else
+static inline void sched_autogroup_create_attach(struct task_struct *p) { }
+static inline void sched_autogroup_detach(struct task_struct *p) { }
+static inline void sched_autogroup_fork(struct signal_struct *sig) { }
+static inline void sched_autogroup_exit(struct signal_struct *sig) { }
+#endif
+
 #ifdef CONFIG_RT_MUTEXES
 extern int rt_mutex_getprio(struct task_struct *p);
 extern void rt_mutex_setprio(struct task_struct *p, int prio);
diff --git a/init/Kconfig b/init/Kconfig
index 88c1046..f6f44d0 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -728,6 +728,18 @@ config NET_NS
 
 endif # NAMESPACES
 
+config SCHED_AUTOGROUP
+	bool "Automatic process group scheduling"
+	select CGROUPS
+	select CGROUP_SCHED
+	select FAIR_GROUP_SCHED
+	help
+	  This option optimizes the scheduler for common desktop workloads by
+	  automatically creating and populating task groups.  This separation
+	  of workloads isolates aggressive CPU burners (like build jobs) from
+	  desktop applications.  Task group autogeneration is currently based
+	  upon task session.
+
 config MM_OWNER
 	bool
 
diff --git a/kernel/fork.c b/kernel/fork.c
index 3b159c5..b6f2475 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -174,8 +174,10 @@ static inline void free_signal_struct(struct signal_struct *sig)
 
 static inline void put_signal_struct(struct signal_struct *sig)
 {
-	if (atomic_dec_and_test(&sig->sigcnt))
+	if (atomic_dec_and_test(&sig->sigcnt)) {
+		sched_autogroup_exit(sig);
 		free_signal_struct(sig);
+	}
 }
 
 void __put_task_struct(struct task_struct *tsk)
@@ -904,6 +906,7 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
 	posix_cpu_timers_init_group(sig);
 
 	tty_audit_fork(sig);
+	sched_autogroup_fork(sig);
 
 	sig->oom_adj = current->signal->oom_adj;
 	sig->oom_score_adj = current->signal->oom_score_adj;
diff --git a/kernel/sched.c b/kernel/sched.c
index 550cf3a..2bc19cb 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -78,6 +78,7 @@
 
 #include "sched_cpupri.h"
 #include "workqueue_sched.h"
+#include "sched_autogroup.h"
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/sched.h>
@@ -270,6 +271,10 @@ struct task_group {
 	struct task_group *parent;
 	struct list_head siblings;
 	struct list_head children;
+
+#ifdef CONFIG_SCHED_AUTOGROUP
+	struct autogroup *autogroup;
+#endif
 };
 
 #define root_task_group init_task_group
@@ -612,11 +617,14 @@ static inline int cpu_of(struct rq *rq)
  */
 static inline struct task_group *task_group(struct task_struct *p)
 {
+	struct task_group *tg;
 	struct cgroup_subsys_state *css;
 
 	css = task_subsys_state_check(p, cpu_cgroup_subsys_id,
 			lockdep_is_held(&task_rq(p)->lock));
-	return container_of(css, struct task_group, css);
+	tg = container_of(css, struct task_group, css);
+
+	return autogroup_task_group(p, tg);
 }
 
 /* Change a task's cfs_rq and parent entity if it moves across CPUs/groups */
@@ -1878,6 +1886,7 @@ static void sched_irq_time_avg_update(struct rq *rq, u64 curr_irq_time) { }
 #include "sched_idletask.c"
 #include "sched_fair.c"
 #include "sched_rt.c"
+#include "sched_autogroup.c"
 #include "sched_stoptask.c"
 #ifdef CONFIG_SCHED_DEBUG
 # include "sched_debug.c"
@@ -7734,7 +7743,7 @@ void __init sched_init(void)
 #ifdef CONFIG_CGROUP_SCHED
 	list_add(&init_task_group.list, &task_groups);
 	INIT_LIST_HEAD(&init_task_group.children);
-
+	autogroup_init(&init_task);
 #endif /* CONFIG_CGROUP_SCHED */
 
 	for_each_possible_cpu(i) {
diff --git a/kernel/sched_autogroup.c b/kernel/sched_autogroup.c
new file mode 100644
index 0000000..2bd4020
--- /dev/null
+++ b/kernel/sched_autogroup.c
@@ -0,0 +1,240 @@
+#ifdef CONFIG_SCHED_AUTOGROUP
+
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include <linux/kallsyms.h>
+#include <linux/utsname.h>
+
+unsigned int __read_mostly sysctl_sched_autogroup_enabled = 1;
+
+struct autogroup {
+	struct task_group	*tg;
+	struct kref		kref;
+	struct rw_semaphore	lock;
+	unsigned long		id;
+	int			nice;
+};
+
+static struct autogroup autogroup_default;
+static atomic_t autogroup_seq_nr;
+
+static void autogroup_init(struct task_struct *init_task)
+{
+	autogroup_default.tg = &init_task_group;
+	init_task_group.autogroup = &autogroup_default;
+	kref_init(&autogroup_default.kref);
+	init_rwsem(&autogroup_default.lock);
+	init_task->signal->autogroup = &autogroup_default;
+}
+
+static inline void autogroup_destroy(struct kref *kref)
+{
+	struct autogroup *ag = container_of(kref, struct autogroup, kref);
+	struct task_group *tg = ag->tg;
+
+	kfree(ag);
+	sched_destroy_group(tg);
+}
+
+static inline void autogroup_kref_put(struct autogroup *ag)
+{
+	kref_put(&ag->kref, autogroup_destroy);
+}
+
+static inline struct autogroup *autogroup_kref_get(struct autogroup *ag)
+{
+	kref_get(&ag->kref);
+	return ag;
+}
+
+static inline struct autogroup *autogroup_create(void)
+{
+	struct autogroup *ag = kzalloc(sizeof(*ag), GFP_KERNEL);
+
+	if (!ag)
+		goto out_fail;
+
+	ag->tg = sched_create_group(&init_task_group);
+
+	if (IS_ERR(ag->tg))
+		goto out_fail;
+
+	ag->tg->autogroup = ag;
+	kref_init(&ag->kref);
+	init_rwsem(&ag->lock);
+	ag->id = atomic_inc_return(&autogroup_seq_nr);
+
+	return ag;
+
+out_fail:
+	kfree(ag);
+	WARN_ON_ONCE(1);
+
+	return autogroup_kref_get(&autogroup_default);
+}
+
+static inline bool
+task_wants_autogroup(struct task_struct *p, struct task_group *tg)
+{
+	if (tg != &root_task_group)
+		return false;
+
+	if (p->sched_class != &fair_sched_class)
+		return false;
+
+	/*
+	 * We can only assume the task group can't go away on us if
+	 * autogroup_move_group() can see us on ->thread_group list.
+	 */
+	if (p->flags & PF_EXITING)
+		return false;
+
+	return true;
+}
+
+static inline struct task_group *
+autogroup_task_group(struct task_struct *p, struct task_group *tg)
+{
+	int enabled = ACCESS_ONCE(sysctl_sched_autogroup_enabled);
+
+	if (enabled && task_wants_autogroup(p, tg))
+		return p->signal->autogroup->tg;
+
+	return tg;
+}
+
+static void
+autogroup_move_group(struct task_struct *p, struct autogroup *ag)
+{
+	struct autogroup *prev;
+	struct task_struct *t;
+
+	spin_lock(&p->sighand->siglock);
+
+	prev = p->signal->autogroup;
+	if (prev == ag) {
+		spin_unlock(&p->sighand->siglock);
+		return;
+	}
+
+	p->signal->autogroup = autogroup_kref_get(ag);
+	t = p;
+
+	do {
+		sched_move_task(t);
+	} while_each_thread(p, t);
+
+	spin_unlock(&p->sighand->siglock);
+
+	autogroup_kref_put(prev);
+}
+
+/* Allocates GFP_KERNEL, cannot be called under any spinlock */
+void sched_autogroup_create_attach(struct task_struct *p)
+{
+	struct autogroup *ag = autogroup_create();
+
+	autogroup_move_group(p, ag);
+	/* drop extra reference added by autogroup_create() */
+	autogroup_kref_put(ag);
+}
+EXPORT_SYMBOL(sched_autogroup_create_attach);
+
+/* Cannot be called under siglock.  Currently has no users */
+void sched_autogroup_detach(struct task_struct *p)
+{
+	autogroup_move_group(p, &autogroup_default);
+}
+EXPORT_SYMBOL(sched_autogroup_detach);
+
+void sched_autogroup_fork(struct signal_struct *sig)
+{
+	struct sighand_struct *sighand = current->sighand;
+
+	spin_lock(&sighand->siglock);
+	sig->autogroup = autogroup_kref_get(current->signal->autogroup);
+	spin_unlock(&sighand->siglock);
+}
+
+void sched_autogroup_exit(struct signal_struct *sig)
+{
+	autogroup_kref_put(sig->autogroup);
+}
+
+static int __init setup_autogroup(char *str)
+{
+	sysctl_sched_autogroup_enabled = 0;
+
+	return 1;
+}
+
+__setup("noautogroup", setup_autogroup);
+
+#ifdef CONFIG_PROC_FS
+
+static inline struct autogroup *autogroup_get(struct task_struct *p)
+{
+	struct autogroup *ag;
+
+	/* task may be moved after we unlock.. tough */
+	spin_lock(&p->sighand->siglock);
+	ag = autogroup_kref_get(p->signal->autogroup);
+	spin_unlock(&p->sighand->siglock);
+
+	return ag;
+}
+
+int proc_sched_autogroup_set_nice(struct task_struct *p, int *nice)
+{
+	static unsigned long next = INITIAL_JIFFIES;
+	struct autogroup *ag;
+	int err;
+
+	if (*nice < -20 || *nice > 19)
+		return -EINVAL;
+
+	err = security_task_setnice(current, *nice);
+	if (err)
+		return err;
+
+	if (*nice < 0 && !can_nice(current, *nice))
+		return -EPERM;
+
+	/* this is a heavy operation taking global locks.. */
+	if (!capable(CAP_SYS_ADMIN) && time_before(jiffies, next))
+		return -EAGAIN;
+
+	next = HZ / 10 + jiffies;
+	ag = autogroup_get(p);
+
+	down_write(&ag->lock);
+	err = sched_group_set_shares(ag->tg, prio_to_weight[*nice + 20]);
+	if (!err)
+		ag->nice = *nice;
+	up_write(&ag->lock);
+
+	autogroup_kref_put(ag);
+
+	return err;
+}
+
+void proc_sched_autogroup_show_task(struct task_struct *p, struct seq_file *m)
+{
+	struct autogroup *ag = autogroup_get(p);
+
+	down_read(&ag->lock);
+	seq_printf(m, "/autogroup-%ld nice %d\n", ag->id, ag->nice);
+	up_read(&ag->lock);
+
+	autogroup_kref_put(ag);
+}
+#endif /* CONFIG_PROC_FS */
+
+#ifdef CONFIG_SCHED_DEBUG
+static inline int autogroup_path(struct task_group *tg, char *buf, int buflen)
+{
+	return snprintf(buf, buflen, "%s-%ld", "/autogroup", tg->autogroup->id);
+}
+#endif /* CONFIG_SCHED_DEBUG */
+
+#endif /* CONFIG_SCHED_AUTOGROUP */
diff --git a/kernel/sched_autogroup.h b/kernel/sched_autogroup.h
new file mode 100644
index 0000000..40deaef
--- /dev/null
+++ b/kernel/sched_autogroup.h
@@ -0,0 +1,23 @@
+#ifdef CONFIG_SCHED_AUTOGROUP
+
+static inline struct task_group *
+autogroup_task_group(struct task_struct *p, struct task_group *tg);
+
+#else /* !CONFIG_SCHED_AUTOGROUP */
+
+static inline void autogroup_init(struct task_struct *init_task) {  }
+
+static inline struct task_group *
+autogroup_task_group(struct task_struct *p, struct task_group *tg)
+{
+	return tg;
+}
+
+#ifdef CONFIG_SCHED_DEBUG
+static inline int autogroup_path(struct task_group *tg, char *buf, int buflen)
+{
+	return 0;
+}
+#endif
+
+#endif /* CONFIG_SCHED_AUTOGROUP */
diff --git a/kernel/sched_debug.c b/kernel/sched_debug.c
index e6590e7..3e5b067 100644
--- a/kernel/sched_debug.c
+++ b/kernel/sched_debug.c
@@ -87,6 +87,20 @@ static void print_cfs_group_stats(struct seq_file *m, int cpu,
 }
 #endif
 
+#if defined(CONFIG_CGROUP_SCHED) && \
+	(defined(CONFIG_FAIR_GROUP_SCHED) || defined(CONFIG_RT_GROUP_SCHED))
+static void task_group_path(struct task_group *tg, char *buf, int buflen)
+{
+	/* may be NULL if the underlying cgroup isn't fully-created yet */
+	if (!tg->css.cgroup) {
+		if (!autogroup_path(tg, buf, buflen))
+			buf[0] = '\0';
+		return;
+	}
+	cgroup_path(tg->css.cgroup, buf, buflen);
+}
+#endif
+
 static void
 print_task(struct seq_file *m, struct rq *rq, struct task_struct *p)
 {
@@ -115,7 +129,7 @@ print_task(struct seq_file *m, struct rq *rq, struct task_struct *p)
 		char path[64];
 
 		rcu_read_lock();
-		cgroup_path(task_group(p)->css.cgroup, path, sizeof(path));
+		task_group_path(task_group(p), path, sizeof(path));
 		rcu_read_unlock();
 		SEQ_printf(m, " %s", path);
 	}
@@ -147,19 +161,6 @@ static void print_rq(struct seq_file *m, struct rq *rq, int rq_cpu)
 	read_unlock_irqrestore(&tasklist_lock, flags);
 }
 
-#if defined(CONFIG_CGROUP_SCHED) && \
-	(defined(CONFIG_FAIR_GROUP_SCHED) || defined(CONFIG_RT_GROUP_SCHED))
-static void task_group_path(struct task_group *tg, char *buf, int buflen)
-{
-	/* may be NULL if the underlying cgroup isn't fully-created yet */
-	if (!tg->css.cgroup) {
-		buf[0] = '\0';
-		return;
-	}
-	cgroup_path(tg->css.cgroup, buf, buflen);
-}
-#endif
-
 void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
 {
 	s64 MIN_vruntime = -1, min_vruntime, max_vruntime = -1,
diff --git a/kernel/sys.c b/kernel/sys.c
index 7f5a0cd..2745dcd 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1080,8 +1080,10 @@ SYSCALL_DEFINE0(setsid)
 	err = session;
 out:
 	write_unlock_irq(&tasklist_lock);
-	if (err > 0)
+	if (err > 0) {
 		proc_sid_connector(group_leader);
+		sched_autogroup_create_attach(group_leader);
+	}
 	return err;
 }
 
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 9b520d7..eb4b493 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -370,6 +370,17 @@ static struct ctl_table kern_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec,
 	},
+#ifdef CONFIG_SCHED_AUTOGROUP
+	{
+		.procname	= "sched_autogroup_enabled",
+		.data		= &sysctl_sched_autogroup_enabled,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
+#endif
 #ifdef CONFIG_PROVE_LOCKING
 	{
 		.procname	= "prove_locking",
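
Not part of the patch: a rough userspace model of the reference counting
described in the commit log (inherit on fork, new group on setsid(), drop on
exit, destroy on last put). Everything named model_* is a stand-in invented
for this sketch; a plain int replaces struct kref, there is no locking, and
the extra get/put pair around autogroup_move_group() is folded into
model_setsid().

```c
#include <stdlib.h>

/* Userspace stand-in for struct autogroup; an int replaces struct kref. */
struct model_autogroup {
	int refcount;
	long id;
};

/* Stand-in for the per-process signal_struct. */
struct model_process {
	struct model_autogroup *ag;
};

/* Plays the role of autogroup_default; static, never freed. */
static struct model_autogroup model_default = { 1, 0 };
static long model_next_id;
static int model_live_groups;	/* dynamically created groups still allocated */

static struct model_autogroup *model_get(struct model_autogroup *ag)
{
	ag->refcount++;
	return ag;
}

static void model_put(struct model_autogroup *ag)
{
	if (--ag->refcount == 0 && ag != &model_default) {
		model_live_groups--;
		free(ag);
	}
}

/* fork: the child inherits the parent's group and takes a reference. */
void model_fork(struct model_process *child, const struct model_process *parent)
{
	child->ag = model_get(parent->ag);
}

/* setsid: create a new group, move into it, drop the old reference. */
void model_setsid(struct model_process *p)
{
	struct model_autogroup *prev = p->ag;
	struct model_autogroup *ag = malloc(sizeof(*ag));

	if (ag) {
		ag->refcount = 1;
		ag->id = ++model_next_id;
		model_live_groups++;
	} else {
		/* like the patch, fall back to the default group */
		ag = model_get(&model_default);
	}
	p->ag = ag;
	model_put(prev);
}

/* exit: the process drops its reference; the last put frees the group. */
void model_exit(struct model_process *p)
{
	model_put(p->ag);
	p->ag = NULL;
}

/*
 * init -> shell (calls setsid) -> make.  Returns the number of dynamically
 * created groups still alive after shell and make have both exited.
 */
int model_demo(void)
{
	struct model_process init = { &model_default };
	struct model_process shell, make;

	model_fork(&shell, &init);	/* default group: refcount 2 */
	model_setsid(&shell);		/* new group: 1, default back to 1 */
	model_fork(&make, &shell);	/* new group: 2 */
	model_exit(&shell);		/* new group: 1, still alive */
	model_exit(&make);		/* new group: 0, freed */

	return model_live_groups;
}
```

The point of the walk-through in model_demo() is that the group outlives the
process that created it for as long as any descendant still holds a
reference, which is the behaviour the commit log describes.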

* Re: [PATCH v4] sched: automated per session task groups
  2010-11-21 13:39 ` Ingo Molnar
@ 2010-11-21 15:44   ` Oleg Nesterov
  2010-11-21 16:35     ` Mike Galbraith
  0 siblings, 1 reply; 79+ messages in thread
From: Oleg Nesterov @ 2010-11-21 15:44 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Mike Galbraith, Peter Zijlstra, Linus Torvalds, LKML

On 11/21, Ingo Molnar wrote:
>
> Btw., there's a small cleanup in the patch that i picked up (see below), and i also
> edited the commit log a bit - so you might want to pick up the version below.

I didn't read the patch in detail, but a couple of nits...

> +static void
> +autogroup_move_group(struct task_struct *p, struct autogroup *ag)
> +{
> +	struct autogroup *prev;
> +	struct task_struct *t;
> +
> +	spin_lock(&p->sighand->siglock);

This needs spin_lock_irq(), ->siglock is irq-safe.

The same for other lockers, but:

> +static inline struct autogroup *autogroup_get(struct task_struct *p)
> +{
> +	struct autogroup *ag;
> +
> +	/* task may be moved after we unlock.. tough */
> +	spin_lock(&p->sighand->siglock);

This is called by fs/proc. In this case nothing protects us from
release_task(), we can hit ->sighand == NULL (or we can race with
exec which changes ->sighand in theory).

This needs lock_task_sighand() (it can fail). Perhaps something
else has the same problem...

If the task is current and it is not exiting, or it is the new
child (sched_autogroup_fork), then it is safe to use ->siglock
directly.

And a purely cosmetic nit,

> +void sched_autogroup_fork(struct signal_struct *sig)
> +{
> +     struct sighand_struct *sighand = current->sighand;
> +
> +     spin_lock(&sighand->siglock);
> +     sig->autogroup = autogroup_kref_get(current->signal->autogroup);
> +     spin_unlock(&sighand->siglock);
> +}

This looks a bit confusing. We do not need current->sighand->siglock
to set sig->autogroup. Nobody except us can see this new signal_struct,
and in any case current->sighand->siglock can't help.

It is needed for autogroup_kref_get(), but we already have autogroup_get().
I'd suggest

	void sched_autogroup_fork(struct signal_struct *sig)
	{
		sig->autogroup = autogroup_get(current);	
	}

Oleg.
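
The "it can fail" point above can be illustrated with a small userspace
model. This is only a sketch: model_lock_sighand() mimics the one property of
lock_task_sighand() that matters here (it returns NULL once the task has been
released), the struct and function names are invented, and falling back to a
default group on failure is an assumption of the sketch rather than something
the patch under review does.

```c
#include <stddef.h>

/* Stand-in for struct autogroup; only the reference count is modelled. */
struct model_group {
	int refcount;
};

/* Stand-in for sighand_struct; `locked` models ->siglock being held. */
struct model_sighand {
	int locked;
};

/* Stand-in for task_struct; sighand is NULL once the task is released. */
struct model_task {
	struct model_sighand *sighand;
	struct model_group *group;
};

/* Models lock_task_sighand(): returns NULL if the task is already gone. */
static struct model_sighand *model_lock_sighand(struct model_task *t)
{
	struct model_sighand *sh = t->sighand;

	if (!sh)
		return NULL;
	sh->locked = 1;
	return sh;
}

static void model_unlock_sighand(struct model_sighand *sh)
{
	sh->locked = 0;
}

/*
 * Take a reference on the task's group only if the lock could be taken.
 * On failure, hand back a reference to the fallback group instead of
 * dereferencing a sighand that no longer exists.
 */
struct model_group *model_autogroup_get(struct model_task *t,
					struct model_group *fallback)
{
	struct model_sighand *sh = model_lock_sighand(t);
	struct model_group *g;

	if (!sh) {
		fallback->refcount++;
		return fallback;
	}

	g = t->group;
	g->refcount++;
	model_unlock_sighand(sh);

	return g;
}
```

Either way the caller ends up holding exactly one reference and drops it with
the usual put, so the /proc read and write paths would not need a separate
error branch.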


* Re: [PATCH v4] sched: automated per session task groups
  2010-11-21 13:37 [PATCH v4] sched: automated per session task groups Ingo Molnar
  2010-11-21 13:39 ` Ingo Molnar
@ 2010-11-21 16:15 ` Mike Galbraith
  2010-11-21 18:43 ` Gene Heskett
  2010-11-25 16:00 ` Mike Galbraith
  3 siblings, 0 replies; 79+ messages in thread
From: Mike Galbraith @ 2010-11-21 16:15 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Oleg Nesterov, Peter Zijlstra, Linus Torvalds, LKML

On Sun, 2010-11-21 at 14:37 +0100, Ingo Molnar wrote:
> Hello Mike,
> 
> * Mike Galbraith <efault@gmx.de> wrote:
> 
> > On Tue, 2010-11-16 at 18:28 +0100, Ingo Molnar wrote:
> > 
> > > Mike,
> > > 
> > > Mind sending a new patch with a separate v2 announcement in a new thread, once you 
> > > have something i could apply to the scheduler tree (for a v2.6.38 merge)?
> > 
> > Changes since last:
> > - switch to per session vs tty
> > - make autogroups visible in /proc/sched_debug
> > - make autogroups visible in /proc/<pid>/autogroup
> > - add nice level bandwidth tweakability to /proc/<pid>/autogroup
> 
> I tested it a bit, and autosched-v4 crashes on bootup with with attached config.

Oh crud.  I ran 37, but not tip, it's toxic with my own config.  So much
for darn thing being ready :(

	-Mike


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-11-21 15:44   ` Oleg Nesterov
@ 2010-11-21 16:35     ` Mike Galbraith
  0 siblings, 0 replies; 79+ messages in thread
From: Mike Galbraith @ 2010-11-21 16:35 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Ingo Molnar, Peter Zijlstra, Linus Torvalds, LKML

On Sun, 2010-11-21 at 16:44 +0100, Oleg Nesterov wrote:
> On 11/21, Ingo Molnar wrote:
> >
> > Btw., there's a small cleanup in the patch that i picked up (see below), and i also
> > edited the commit log a bit - so you might want to pick up the version below.
> 
> I didn't read the patch in details, but a couple of nits...
> 
> > +static void
> > +autogroup_move_group(struct task_struct *p, struct autogroup *ag)
> > +{
> > +	struct autogroup *prev;
> > +	struct task_struct *t;
> > +
> > +	spin_lock(&p->sighand->siglock);
> 
> This needs spin_lock_irq(), ->siglock is irq-safe.

Ok.

> The same for other lockers, but:
> 
> > +static inline struct autogroup *autogroup_get(struct task_struct *p)
> > +{
> > +	struct autogroup *ag;
> > +
> > +	/* task may be moved after we unlock.. tough */
> > +	spin_lock(&p->sighand->siglock);
> 
> This is called by fs/proc. In this case nothing protects us from
> release_task(), we can hit ->sighand == NULL (or we can race with
> exec which changes ->sighand in theory).

Oh my.  Thanks.

(gad, signal_struct is sooo much harder to get right)

> This needs lock_task_sighand() (it can fail). Perhaps something
> else has the same problem...
> 
> If the task is current and it is not exiting, or it is the new
> child (sched_autogroup_fork), then it is safe to use ->siglock
> directly.

Ok,

> And a pure cosmetic nit,
> 
> > +void sched_autogroup_fork(struct signal_struct *sig)
> > +{
> > +     struct sighand_struct *sighand = current->sighand;
> > +
> > +     spin_lock(&sighand->siglock);
> > +     sig->autogroup = autogroup_kref_get(current->signal->autogroup);
> > +     spin_unlock(&sighand->siglock);
> > +}
> 
> This looks a bit confusing. We do not need current->sighand->siglock
> to set sig->autogroup. Nobody except us can see this new signal_struct,
> and in any case current->sighand->siglock can't help.
> 
> It is needed for autogroup_kref_get(), but we already have autogroup_get().
> I'd suggest
> 
> 	void sched_autogroup_fork(struct signal_struct *sig)
> 	{
> 		sig->autogroup = autogroup_get(current);	
> 	}

I'll do that.. once the thing is non-toxic to tip.

Thanks a lot Oleg.

	-Mike


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-11-21 13:37 [PATCH v4] sched: automated per session task groups Ingo Molnar
  2010-11-21 13:39 ` Ingo Molnar
  2010-11-21 16:15 ` Mike Galbraith
@ 2010-11-21 18:43 ` Gene Heskett
  2010-11-25 16:00 ` Mike Galbraith
  3 siblings, 0 replies; 79+ messages in thread
From: Gene Heskett @ 2010-11-21 18:43 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Mike Galbraith, Oleg Nesterov, Peter Zijlstra, Linus Torvalds, LKML

On Sunday, November 21, 2010, Ingo Molnar wrote:
>Hello Mike,
>
>* Mike Galbraith <efault@gmx.de> wrote:
>> On Tue, 2010-11-16 at 18:28 +0100, Ingo Molnar wrote:
>> > Mike,
>> > 
>> > Mind sending a new patch with a separate v2 announcement in a new
>> > thread, once you have something i could apply to the scheduler tree
>> > (for a v2.6.38 merge)?
>> 
>> Changes since last:
>> - switch to per session vs tty
>> - make autogroups visible in /proc/sched_debug
>> - make autogroups visible in /proc/<pid>/autogroup
>> - add nice level bandwidth tweakability to /proc/<pid>/autogroup
>
>I tested it a bit, and autosched-v4 crashes on bootup with with attached
>config.
>
>Note: the box has serial logging enabled and there's UART code in the
>stacktrace - maybe it's related. Let me know if you need the full bootup
>log.
>
>Thanks,
>
>	Ingo
>
>[FAILED]
>Enabling local filesystem quotas:  [  OK  ]
>PPS event at 4294886381
>Enabling /etc/fstab swaps:  swapon: /dev/hda2: Function not implemented
>[FAILED]
>INIT: Entering runleveBUG: unable to handle kernel paging request at
>f548604c IP:l: 3 [<c10307f0>] update_cfs_shares+0x60/0x160
>*pdpt = 0000000002017001 *pde = 00000000029d4067 *pte = 8000000035486160
>Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
>last sysfs file: /sys/block/sr0/dev
>
>Pid: 1, comm: init Not tainted 2.6.37-rc2-tip+ #64308 A8N-E/System
>Product Name EIP: 0060:[<c10307f0>] EFLAGS: 00010086 CPU: 1
>EIP is at update_cfs_shares+0x60/0x160
>EAX: fffffffe EBX: f547603b ECX: 00000400 EDX: 00000002
>ESI: f5486000 EDI: 0000013b EBP: f6459d48 ESP: f6459d3c
> DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
>Process init (pid: 1, ti=f6458000 task=f6450000 task.ti=f6458000)
>Stack:
> f5475a80 f6f066c0 00000004 f6459d84 c103256f 00000002 00000001 00000000
> c10324d0 c200e6c0 00000001 f6f06b34 00000046 f5475a80 f5475ac8 f6f066c0
> 00000001 ffffffff f6459dfc c1b32820 f64a0010 f6459dc4 00000046 00000000
>Call Trace:
> [<c103256f>] update_shares+0x9f/0x170
> [<c10324d0>] ? update_shares+0x0/0x170
> [<c1b32820>] schedule+0x580/0x9d0
> [<c1039335>] ? sub_preempt_count+0xa5/0xe0
> [<c1b330e5>] schedule_timeout+0x125/0x2a0
> [<c104fe60>] ? process_timeout+0x0/0x10
> [<c15aef4f>] uart_close+0x17f/0x350
> [<c105fea0>] ? autoremove_wake_function+0x0/0x50
> [<c1471f72>] tty_release+0x102/0x500
> [<c1125fdf>] ? locks_remove_posix+0xf/0xa0
> [<c1119a43>] ? fsnotify+0x1e3/0x2f0
> [<c11198d3>] ? fsnotify+0x73/0x2f0
> [<c10ea1e1>] fput+0xb1/0x230
> [<c10e7e7e>] filp_close+0x4e/0x70
> [<c10e7f14>] sys_close+0x74/0xc0
> [<c1002b90>] sysenter_do_call+0x12/0x31
>Code: 00 00 00 8b 18 8b 79 1c 8b 49 18 2b b8 84 00 00 00 01 d3 89 d8 0f
>af c1 01 fb 74 07 89 c2 c1 fa 1f f7 fb 83 f8 02 ba 02 00 00 00 <8b> 5e
>4c 0f 4d d0 39 d1 0f 42 d1 8b 4e 1c 85 c9 0f 84 6a 00 00 EIP:
>[<c10307f0>] update_cfs_shares+0x60/0x160 SS:ESP 0068:f6459d3c CR2:
>00000000f548604c
>---[ end trace f0ad48f53e29a8fe ]---
>Kernel panic - not syncing: Fatal exception
>Pid: 1, comm: init Tainted: G      D     2.6.37-rc2-tip+ #64308
>Call Trace:
> [<c1b31ef1>] ? panic+0x66/0x15c
> [<c10065c3>] ? oops_end+0x83/0x90
> [<c10220fc>] ? no_context+0xbc/0x190
> [<c102225d>] ? __bad_area_nosemaphore+0x8d/0x130
> [<c10219a4>] ? vmalloc_fault+0x14/0x1c0
> [<c1021b64>] ? spurious_fault+0x14/0x110
> [<c1022317>] ? bad_area_nosemaphore+0x17/0x20
> [<c1022741>] ? do_page_fault+0x281/0x4c0
> [<c1008756>] ? native_sched_clock+0x26/0x90
> [<c1066033>] ? sched_clock_local+0xd3/0x1c0
> [<c10224c0>] ? do_page_fault+0x0/0x4c0
> [<c1b361e2>] ? error_code+0x5a/0x60
> [<c10224c0>] ? do_page_fault+0x0/0x4c0
> [<c10307f0>] ? update_cfs_shares+0x60/0x160
> [<c103256f>] ? update_shares+0x9f/0x170
> [<c10324d0>] ? update_shares+0x0/0x170
> [<c1b32820>] ? schedule+0x580/0x9d0
> [<c1039335>] ? sub_preempt_count+0xa5/0xe0
> [<c1b330e5>] ? schedule_timeout+0x125/0x2a0
> [<c104fe60>] ? process_timeout+0x0/0x10
> [<c15aef4f>] ? uart_close+0x17f/0x350
> [<c105fea0>] ? autoremove_wake_function+0x0/0x50
> [<c1471f72>] ? tty_release+0x102/0x500
> [<c1125fdf>] ? locks_remove_posix+0xf/0xa0
> [<c1119a43>] ? fsnotify+0x1e3/0x2f0
> [<c11198d3>] ? fsnotify+0x73/0x2f0
> [<c10ea1e1>] ? fput+0xb1/0x230
> [<c10e7e7e>] ? filp_close+0x4e/0x70
> [<c10e7f14>] ? sys_close+0x74/0xc0
> [<c1002b90>] ? sysenter_do_call+0x12/0x31
>Rebooting in 1 seconds..Press any key to enter the menu

And I just 2 hours ago got it working on 2.6.36.1(rc1) but had to learn and 
add to my 'makeit' script before I could make x work again.  Yeah, I'm a 
bad bad boy, I run the latest nvidia drivers.  A tail on the syslog is 
clean (so far anyway, uptime is 2:06).

So you can have (FWTW) my reviewed by: Gene Heskett

These patches are a definite keeper IMNSHO.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
On the whole, I'd rather be in Philadelphia.
		-- W.C. Fields' epitaph

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-11-21 13:37 [PATCH v4] sched: automated per session task groups Ingo Molnar
                   ` (2 preceding siblings ...)
  2010-11-21 18:43 ` Gene Heskett
@ 2010-11-25 16:00 ` Mike Galbraith
  2010-11-28 14:24   ` Mike Galbraith
  3 siblings, 1 reply; 79+ messages in thread
From: Mike Galbraith @ 2010-11-25 16:00 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Oleg Nesterov, Peter Zijlstra, Linus Torvalds, LKML

On Sun, 2010-11-21 at 14:37 +0100, Ingo Molnar wrote:

> I tested it a bit, and autosched-v4 crashes on bootup with with attached config.

Hah.  Took a while between vacation activities and my wimpy little
memory ordering knowledge, but I've finally got it fingered out.  Tip's
update_shares() changes exposed previously invisible (to me) memory
ordering woes nicely it seems.

My vacation is (sniff) over, so I won't get a fully tested patch out the
door for review until I get back home.

	-Mike



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-11-25 16:00 ` Mike Galbraith
@ 2010-11-28 14:24   ` Mike Galbraith
  2010-11-28 19:31     ` Linus Torvalds
                       ` (2 more replies)
  0 siblings, 3 replies; 79+ messages in thread
From: Mike Galbraith @ 2010-11-28 14:24 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Oleg Nesterov, Peter Zijlstra, Linus Torvalds, LKML

[-- Attachment #1: Type: text/plain, Size: 1586 bytes --]

On Thu, 2010-11-25 at 09:00 -0700, Mike Galbraith wrote: 

> My vacation is (sniff) over, so I won't get a fully tested patch out the
> door for review until I get back home.

Either I forgot to pack my eyeballs, or laptop is just too dinky and
annoying.  Now back home on beloved box, this little bugger poked me
dead in the eye.

Something else is seriously wrong though.  36.1 with attached (plus
sched, cgroup: Fixup broken cgroup movement) works a treat, whereas
37.git and tip with fixlet below both suck rocks.  With a make -j40
running, wakeup-latency is showing latencies of >100ms, amarok skips,
mouse lurches badly.. generally horrid.  Something went south.

sched: fix 3d4b47b4 typo.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
LKML-Reference: new submission
---
 kernel/sched.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -8087,7 +8087,6 @@ static inline void unregister_fair_sched
 {
 	struct rq *rq = cpu_rq(cpu);
 	unsigned long flags;
-	int i;
 
 	/*
 	* Only empty task groups can be destroyed; so we can speculatively
@@ -8097,7 +8096,7 @@ static inline void unregister_fair_sched
 		return;
 
 	raw_spin_lock_irqsave(&rq->lock, flags);
-	list_del_leaf_cfs_rq(tg->cfs_rq[i]);
+	list_del_leaf_cfs_rq(tg->cfs_rq[cpu]);
 	raw_spin_unlock_irqrestore(&rq->lock, flags);
 }
 #else /* !CONFG_FAIR_GROUP_SCHED */


[-- Attachment #2: sched_autogroup.36.1.diff --]
[-- Type: text/x-patch, Size: 19918 bytes --]

From: Mike Galbraith <efault@gmx.de>
Date: Sat, 20 Nov 2010 12:35:00 -0700
Subject: [PATCH] sched: Improve desktop interactivity: Implement automated per session task groups

A recurring complaint from CFS users is that parallel kbuild has a negative
impact on desktop interactivity.  This patch implements an idea from Linus,
to automatically create task groups.  Currently, only per session autogroups
are implemented, but the patch leaves the way open for enhancement.

Implementation: each task's signal struct contains an inherited pointer to
a refcounted autogroup struct containing a task group pointer, the default
for all tasks pointing to the init_task_group.  When a task calls setsid(),
a new task group is created, the process is moved into the new task group,
and a reference to the previous task group is dropped.  Child processes
inherit this task group thereafter, and increase its refcount.  When the
last thread of a process exits, the process's reference is dropped, such
that when the last process referencing an autogroup exits, the autogroup
is destroyed.
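
The lifetime rules above can be sketched in userspace roughly as
follows. This is an illustrative model only: the model_* names are
hypothetical, a plain int replaces struct kref, there is no locking,
and model_live_groups exists purely so the effect can be observed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct autogroup. */
struct model_autogroup {
	int refcount;
	long id;
};

int model_live_groups;		/* non-default groups currently allocated */

/* The default group starts with one reference that is never dropped. */
struct model_autogroup model_default = { 1, 0 };

struct model_autogroup *model_get(struct model_autogroup *ag)
{
	ag->refcount++;
	return ag;
}

void model_put(struct model_autogroup *ag)
{
	assert(ag->refcount > 0);
	if (--ag->refcount == 0) {
		/* last user gone: the group is destroyed */
		model_live_groups--;
		free(ag);
	}
}

/* setsid(): move to a fresh group and drop the reference to the old one. */
struct model_autogroup *model_setsid(struct model_autogroup *prev, long id)
{
	struct model_autogroup *ag = malloc(sizeof(*ag));

	if (!ag) {
		/* as in the patch, fall back to the default group */
		ag = model_get(&model_default);
	} else {
		ag->refcount = 1;
		ag->id = id;
		model_live_groups++;
	}
	model_put(prev);
	return ag;
}

/* fork(): the child inherits the parent's group and takes a reference. */
struct model_autogroup *model_fork(struct model_autogroup *parent_group)
{
	return model_get(parent_group);
}

/* Last thread of a process exits: the process's reference is dropped. */
void model_exit(struct model_autogroup *ag)
{
	model_put(ag);
}
```

A shell that calls setsid() and then forks one child leaves the new
group with two references; the group goes away only after both the
shell and the child have exited.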

At runqueue selection time, IFF a task has no cgroup assignment, its current
autogroup is used.

Autogroup bandwidth is controllable via setting its nice level through the
proc filesystem.  cat /proc/<pid>/autogroup displays the task's group and the
group's nice level.  echo <nice level> > /proc/<pid>/autogroup sets the task
group's shares to the weight of a nice <level> task.  Setting nice level is rate
limited for !admin users due to the abuse risk of task group locking.
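
As a rough illustration of what "the weight of a nice <level> task"
means for a group, the sketch below maps a nice level to a share value
and estimates the split between two competing groups. The table values
are those of the scheduler's prio_to_weight[] array; the model_*
function names and the two-group percentage helper are hypothetical
and only meant to show the arithmetic.

```c
#include <assert.h>

/* Same values as the scheduler's prio_to_weight[] table (nice -20 .. 19). */
static const int model_prio_to_weight[40] = {
	/* -20 */ 88761, 71755, 56483, 46273, 36291,
	/* -15 */ 29154, 23254, 18705, 14949, 11916,
	/* -10 */  9548,  7620,  6100,  4904,  3906,
	/*  -5 */  3121,  2501,  1991,  1586,  1277,
	/*   0 */  1024,   820,   655,   526,   423,
	/*   5 */   335,   272,   215,   172,   137,
	/*  10 */   110,    87,    70,    56,    45,
	/*  15 */    36,    29,    23,    18,    15,
};

/* Returns the share weight for a nice level, or -1 if out of range. */
int model_nice_to_shares(int nice)
{
	if (nice < -20 || nice > 19)
		return -1;
	return model_prio_to_weight[nice + 20];
}

/*
 * Integer percentage of CPU that group A would get if it competed only
 * with group B and both were always runnable.
 */
int model_cpu_percent(int nice_a, int nice_b)
{
	int a = model_nice_to_shares(nice_a);
	int b = model_nice_to_shares(nice_b);

	assert(a > 0 && b > 0);		/* both nice levels must be valid */
	return (100 * a) / (a + b);
}
```

Under these assumptions, two groups at nice 0 split the CPU evenly,
while a nice 0 group competing with a nice 5 group gets roughly three
quarters of it.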

The feature is enabled from boot by default if CONFIG_SCHED_AUTOGROUP=y is
selected, but can be disabled via the boot option noautogroup, and can also
be turned on/off on the fly via..
	echo [01] > /proc/sys/kernel/sched_autogroup_enabled.
..which will automatically move tasks to/from the root task group.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
LKML-Reference: <1290281700.28711.9.camel@maggy.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 Documentation/kernel-parameters.txt |    2 
 fs/proc/base.c                      |   79 ++++++++++++
 include/linux/sched.h               |   23 +++
 init/Kconfig                        |   12 +
 kernel/fork.c                       |    5 
 kernel/sched.c                      |   13 +
 kernel/sched_autogroup.c            |  235 ++++++++++++++++++++++++++++++++++++
 kernel/sched_autogroup.h            |   32 ++++
 kernel/sched_debug.c                |   29 ++--
 kernel/sys.c                        |    4 
 kernel/sysctl.c                     |   11 +
 11 files changed, 427 insertions(+), 18 deletions(-)

Index: linux-2.6.36/include/linux/sched.h
===================================================================
--- linux-2.6.36.orig/include/linux/sched.h
+++ linux-2.6.36/include/linux/sched.h
@@ -506,6 +506,8 @@ struct thread_group_cputimer {
 	spinlock_t lock;
 };
 
+struct autogroup;
+
 /*
  * NOTE! "signal_struct" does not have it's own
  * locking, because a shared signal_struct always
@@ -573,6 +575,9 @@ struct signal_struct {
 
 	struct tty_struct *tty; /* NULL if no tty */
 
+#ifdef CONFIG_SCHED_AUTOGROUP
+	struct autogroup *autogroup;
+#endif
 	/*
 	 * Cumulative resource counters for dead threads in the group,
 	 * and for reaped dead child processes forked by this group.
@@ -1900,6 +1905,24 @@ int sched_rt_handler(struct ctl_table *t
 
 extern unsigned int sysctl_sched_compat_yield;
 
+#ifdef CONFIG_SCHED_AUTOGROUP
+extern unsigned int sysctl_sched_autogroup_enabled;
+
+extern void sched_autogroup_create_attach(struct task_struct *p);
+extern void sched_autogroup_detach(struct task_struct *p);
+extern void sched_autogroup_fork(struct signal_struct *sig);
+extern void sched_autogroup_exit(struct signal_struct *sig);
+#ifdef CONFIG_PROC_FS
+extern void proc_sched_autogroup_show_task(struct task_struct *p, struct seq_file *m);
+extern int proc_sched_autogroup_set_nice(struct task_struct *p, int *nice);
+#endif
+#else
+static inline void sched_autogroup_create_attach(struct task_struct *p) { }
+static inline void sched_autogroup_detach(struct task_struct *p) { }
+static inline void sched_autogroup_fork(struct signal_struct *sig) { }
+static inline void sched_autogroup_exit(struct signal_struct *sig) { }
+#endif
+
 #ifdef CONFIG_RT_MUTEXES
 extern int rt_mutex_getprio(struct task_struct *p);
 extern void rt_mutex_setprio(struct task_struct *p, int prio);
Index: linux-2.6.36/kernel/sched.c
===================================================================
--- linux-2.6.36.orig/kernel/sched.c
+++ linux-2.6.36/kernel/sched.c
@@ -78,6 +78,7 @@
 
 #include "sched_cpupri.h"
 #include "workqueue_sched.h"
+#include "sched_autogroup.h"
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/sched.h>
@@ -268,6 +269,10 @@ struct task_group {
 	struct task_group *parent;
 	struct list_head siblings;
 	struct list_head children;
+
+#ifdef CONFIG_SCHED_AUTOGROUP
+	struct autogroup *autogroup;
+#endif
 };
 
 #define root_task_group init_task_group
@@ -612,11 +617,14 @@ static inline int cpu_of(struct rq *rq)
  */
 static inline struct task_group *task_group(struct task_struct *p)
 {
+	struct task_group *tg;
 	struct cgroup_subsys_state *css;
 
 	css = task_subsys_state_check(p, cpu_cgroup_subsys_id,
 			lockdep_is_held(&task_rq(p)->lock));
-	return container_of(css, struct task_group, css);
+	tg = container_of(css, struct task_group, css);
+
+	return autogroup_task_group(p, tg);
 }
 
 /* Change a task's cfs_rq and parent entity if it moves across CPUs/groups */
@@ -1913,6 +1921,7 @@ static void deactivate_task(struct rq *r
 #include "sched_idletask.c"
 #include "sched_fair.c"
 #include "sched_rt.c"
+#include "sched_autogroup.c"
 #ifdef CONFIG_SCHED_DEBUG
 # include "sched_debug.c"
 #endif
@@ -7742,7 +7751,7 @@ void __init sched_init(void)
 #ifdef CONFIG_CGROUP_SCHED
 	list_add(&init_task_group.list, &task_groups);
 	INIT_LIST_HEAD(&init_task_group.children);
-
+	autogroup_init(&init_task);
 #endif /* CONFIG_CGROUP_SCHED */
 
 #if defined CONFIG_FAIR_GROUP_SCHED && defined CONFIG_SMP
Index: linux-2.6.36/kernel/fork.c
===================================================================
--- linux-2.6.36.orig/kernel/fork.c
+++ linux-2.6.36/kernel/fork.c
@@ -173,8 +173,10 @@ static inline void free_signal_struct(st
 
 static inline void put_signal_struct(struct signal_struct *sig)
 {
-	if (atomic_dec_and_test(&sig->sigcnt))
+	if (atomic_dec_and_test(&sig->sigcnt)) {
+		sched_autogroup_exit(sig);
 		free_signal_struct(sig);
+	}
 }
 
 void __put_task_struct(struct task_struct *tsk)
@@ -900,6 +902,7 @@ static int copy_signal(unsigned long clo
 	posix_cpu_timers_init_group(sig);
 
 	tty_audit_fork(sig);
+	sched_autogroup_fork(sig);
 
 	sig->oom_adj = current->signal->oom_adj;
 	sig->oom_score_adj = current->signal->oom_score_adj;
Index: linux-2.6.36/kernel/sys.c
===================================================================
--- linux-2.6.36.orig/kernel/sys.c
+++ linux-2.6.36/kernel/sys.c
@@ -1080,8 +1080,10 @@ SYSCALL_DEFINE0(setsid)
 	err = session;
 out:
 	write_unlock_irq(&tasklist_lock);
-	if (err > 0)
+	if (err > 0) {
 		proc_sid_connector(group_leader);
+		sched_autogroup_create_attach(group_leader);
+	}
 	return err;
 }
 
Index: linux-2.6.36/kernel/sched_debug.c
===================================================================
--- linux-2.6.36.orig/kernel/sched_debug.c
+++ linux-2.6.36/kernel/sched_debug.c
@@ -87,6 +87,20 @@ static void print_cfs_group_stats(struct
 }
 #endif
 
+#if defined(CONFIG_CGROUP_SCHED) && \
+	(defined(CONFIG_FAIR_GROUP_SCHED) || defined(CONFIG_RT_GROUP_SCHED))
+static void task_group_path(struct task_group *tg, char *buf, int buflen)
+{
+	/* may be NULL if the underlying cgroup isn't fully-created yet */
+	if (!tg->css.cgroup) {
+		if (!autogroup_path(tg, buf, buflen))
+			buf[0] = '\0';
+		return;
+	}
+	cgroup_path(tg->css.cgroup, buf, buflen);
+}
+#endif
+
 static void
 print_task(struct seq_file *m, struct rq *rq, struct task_struct *p)
 {
@@ -115,7 +129,7 @@ print_task(struct seq_file *m, struct rq
 		char path[64];
 
 		rcu_read_lock();
-		cgroup_path(task_group(p)->css.cgroup, path, sizeof(path));
+		task_group_path(task_group(p), path, sizeof(path));
 		rcu_read_unlock();
 		SEQ_printf(m, " %s", path);
 	}
@@ -147,19 +161,6 @@ static void print_rq(struct seq_file *m,
 	read_unlock_irqrestore(&tasklist_lock, flags);
 }
 
-#if defined(CONFIG_CGROUP_SCHED) && \
-	(defined(CONFIG_FAIR_GROUP_SCHED) || defined(CONFIG_RT_GROUP_SCHED))
-static void task_group_path(struct task_group *tg, char *buf, int buflen)
-{
-	/* may be NULL if the underlying cgroup isn't fully-created yet */
-	if (!tg->css.cgroup) {
-		buf[0] = '\0';
-		return;
-	}
-	cgroup_path(tg->css.cgroup, buf, buflen);
-}
-#endif
-
 void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
 {
 	s64 MIN_vruntime = -1, min_vruntime, max_vruntime = -1,
Index: linux-2.6.36/fs/proc/base.c
===================================================================
--- linux-2.6.36.orig/fs/proc/base.c
+++ linux-2.6.36/fs/proc/base.c
@@ -1359,6 +1359,82 @@ static const struct file_operations proc
 
 #endif
 
+#ifdef CONFIG_SCHED_AUTOGROUP
+/*
+ * Print out autogroup related information:
+ */
+static int sched_autogroup_show(struct seq_file *m, void *v)
+{
+	struct inode *inode = m->private;
+	struct task_struct *p;
+
+	p = get_proc_task(inode);
+	if (!p)
+		return -ESRCH;
+	proc_sched_autogroup_show_task(p, m);
+
+	put_task_struct(p);
+
+	return 0;
+}
+
+static ssize_t
+sched_autogroup_write(struct file *file, const char __user *buf,
+	    size_t count, loff_t *offset)
+{
+	struct inode *inode = file->f_path.dentry->d_inode;
+	struct task_struct *p;
+	char buffer[PROC_NUMBUF];
+	long nice;
+	int err;
+
+	memset(buffer, 0, sizeof(buffer));
+	if (count > sizeof(buffer) - 1)
+		count = sizeof(buffer) - 1;
+	if (copy_from_user(buffer, buf, count))
+		return -EFAULT;
+
+	err = strict_strtol(strstrip(buffer), 0, &nice);
+	if (err)
+		return -EINVAL;
+
+	p = get_proc_task(inode);
+	if (!p)
+		return -ESRCH;
+
+	err = nice;
+	err = proc_sched_autogroup_set_nice(p, &err);
+	if (err)
+		count = err;
+
+	put_task_struct(p);
+
+	return count;
+}
+
+static int sched_autogroup_open(struct inode *inode, struct file *filp)
+{
+	int ret;
+
+	ret = single_open(filp, sched_autogroup_show, NULL);
+	if (!ret) {
+		struct seq_file *m = filp->private_data;
+
+		m->private = inode;
+	}
+	return ret;
+}
+
+static const struct file_operations proc_pid_sched_autogroup_operations = {
+	.open		= sched_autogroup_open,
+	.read		= seq_read,
+	.write		= sched_autogroup_write,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
+#endif /* CONFIG_SCHED_AUTOGROUP */
+
 static ssize_t comm_write(struct file *file, const char __user *buf,
 				size_t count, loff_t *offset)
 {
@@ -2679,6 +2755,9 @@ static const struct pid_entry tgid_base_
 #ifdef CONFIG_SCHED_DEBUG
 	REG("sched",      S_IRUGO|S_IWUSR, proc_pid_sched_operations),
 #endif
+#ifdef CONFIG_SCHED_AUTOGROUP
+	REG("autogroup",  S_IRUGO|S_IWUSR, proc_pid_sched_autogroup_operations),
+#endif
 	REG("comm",      S_IRUGO|S_IWUSR, proc_pid_set_comm_operations),
 #ifdef CONFIG_HAVE_ARCH_TRACEHOOK
 	INF("syscall",    S_IRUSR, proc_pid_syscall),
Index: linux-2.6.36/kernel/sched_autogroup.h
===================================================================
--- /dev/null
+++ linux-2.6.36/kernel/sched_autogroup.h
@@ -0,0 +1,32 @@
+#ifdef CONFIG_SCHED_AUTOGROUP
+
+struct autogroup {
+	struct kref		kref;
+	struct task_group	*tg;
+	struct rw_semaphore	lock;
+	unsigned long		id;
+	int			nice;
+};
+
+static inline struct task_group *
+autogroup_task_group(struct task_struct *p, struct task_group *tg);
+
+#else /* !CONFIG_SCHED_AUTOGROUP */
+
+static inline void autogroup_init(struct task_struct *init_task) {  }
+static inline void autogroup_free(struct task_group *tg) { }
+
+static inline struct task_group *
+autogroup_task_group(struct task_struct *p, struct task_group *tg)
+{
+	return tg;
+}
+
+#ifdef CONFIG_SCHED_DEBUG
+static inline int autogroup_path(struct task_group *tg, char *buf, int buflen)
+{
+	return 0;
+}
+#endif
+
+#endif /* CONFIG_SCHED_AUTOGROUP */
Index: linux-2.6.36/kernel/sched_autogroup.c
===================================================================
--- /dev/null
+++ linux-2.6.36/kernel/sched_autogroup.c
@@ -0,0 +1,235 @@
+#ifdef CONFIG_SCHED_AUTOGROUP
+
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include <linux/kallsyms.h>
+#include <linux/utsname.h>
+
+unsigned int __read_mostly sysctl_sched_autogroup_enabled = 1;
+static struct autogroup autogroup_default;
+static atomic_t autogroup_seq_nr;
+
+static void autogroup_init(struct task_struct *init_task)
+{
+	autogroup_default.tg = &init_task_group;
+	init_task_group.autogroup = &autogroup_default;
+	kref_init(&autogroup_default.kref);
+	init_rwsem(&autogroup_default.lock);
+	init_task->signal->autogroup = &autogroup_default;
+}
+
+static inline void autogroup_free(struct task_group *tg)
+{
+	kfree(tg->autogroup);
+}
+
+static inline void autogroup_destroy(struct kref *kref)
+{
+	struct autogroup *ag = container_of(kref, struct autogroup, kref);
+
+	sched_destroy_group(ag->tg);
+}
+
+static inline void autogroup_kref_put(struct autogroup *ag)
+{
+	kref_put(&ag->kref, autogroup_destroy);
+}
+
+static inline struct autogroup *autogroup_kref_get(struct autogroup *ag)
+{
+	kref_get(&ag->kref);
+	return ag;
+}
+
+static inline struct autogroup *autogroup_create(void)
+{
+	struct autogroup *ag = kzalloc(sizeof(*ag), GFP_KERNEL);
+	struct task_group *tg;
+
+	if (!ag)
+		goto out_fail;
+
+	tg = sched_create_group(&init_task_group);
+
+	if (IS_ERR(tg))
+		goto out_free;
+
+	kref_init(&ag->kref);
+	init_rwsem(&ag->lock);
+	ag->id = atomic_inc_return(&autogroup_seq_nr);
+	ag->tg = tg;
+	tg->autogroup = ag;
+
+	return ag;
+
+out_free:
+	kfree(ag);
+out_fail:
+	if (printk_ratelimit()) {
+		printk(KERN_WARNING "autogroup_create: %s failure.\n",
+			ag ? "sched_create_group()" : "kmalloc()");
+	}
+
+	return autogroup_kref_get(&autogroup_default);
+}
+
+static inline bool
+task_wants_autogroup(struct task_struct *p, struct task_group *tg)
+{
+	if (tg != &root_task_group)
+		return false;
+
+	if (p->sched_class != &fair_sched_class)
+		return false;
+
+	/*
+	 * We can only assume the task group can't go away on us if
+	 * autogroup_move_group() can see us on ->thread_group list.
+	 */
+	if (p->flags & PF_EXITING)
+		return false;
+
+	return true;
+}
+
+static inline struct task_group *
+autogroup_task_group(struct task_struct *p, struct task_group *tg)
+{
+	int enabled = ACCESS_ONCE(sysctl_sched_autogroup_enabled);
+
+	if (enabled && task_wants_autogroup(p, tg))
+		return p->signal->autogroup->tg;
+
+	return tg;
+}
+
+static void
+autogroup_move_group(struct task_struct *p, struct autogroup *ag)
+{
+	struct autogroup *prev;
+	struct task_struct *t;
+	unsigned long flags;
+
+	BUG_ON(!lock_task_sighand(p, &flags));
+
+	prev = p->signal->autogroup;
+	if (prev == ag) {
+		unlock_task_sighand(p, &flags);
+		return;
+	}
+
+	p->signal->autogroup = autogroup_kref_get(ag);
+	smp_mb();
+
+	t = p;
+	do {
+		sched_move_task(t);
+	} while_each_thread(p, t);
+
+	unlock_task_sighand(p, &flags);
+	autogroup_kref_put(prev);
+}
+
+/* Allocates GFP_KERNEL, cannot be called under any spinlock */
+void sched_autogroup_create_attach(struct task_struct *p)
+{
+	struct autogroup *ag = autogroup_create();
+
+	autogroup_move_group(p, ag);
+	/* drop extra reference added by autogroup_create() */
+	autogroup_kref_put(ag);
+}
+EXPORT_SYMBOL(sched_autogroup_create_attach);
+
+/* Cannot be called under siglock.  Currently has no users */
+void sched_autogroup_detach(struct task_struct *p)
+{
+	autogroup_move_group(p, &autogroup_default);
+}
+EXPORT_SYMBOL(sched_autogroup_detach);
+
+void sched_autogroup_fork(struct signal_struct *sig)
+{
+	struct task_struct *p = current;
+
+	spin_lock_irq(&p->sighand->siglock);
+	sig->autogroup = autogroup_kref_get(p->signal->autogroup);
+	spin_unlock_irq(&p->sighand->siglock);
+}
+
+void sched_autogroup_exit(struct signal_struct *sig)
+{
+	struct autogroup *ag;
+
+	rcu_read_lock();
+	ag = rcu_dereference(sig->autogroup);
+	rcu_read_unlock();
+	autogroup_kref_put(ag);
+}
+
+static int __init setup_autogroup(char *str)
+{
+	sysctl_sched_autogroup_enabled = 0;
+
+	return 1;
+}
+
+__setup("noautogroup", setup_autogroup);
+
+#ifdef CONFIG_PROC_FS
+
+/* Called with siglock held. */
+int proc_sched_autogroup_set_nice(struct task_struct *p, int *nice)
+{
+	static unsigned long next = INITIAL_JIFFIES;
+	struct autogroup *ag;
+	int err;
+
+	if (*nice < -20 || *nice > 19)
+		return -EINVAL;
+
+	err = security_task_setnice(current, *nice);
+	if (err)
+		return err;
+
+	if (*nice < 0 && !can_nice(current, *nice))
+		return -EPERM;
+
+	/* this is a heavy operation taking global locks.. */
+	if (!capable(CAP_SYS_ADMIN) && time_before(jiffies, next))
+		return -EAGAIN;
+
+	next = HZ / 10 + jiffies;
+	ag = autogroup_kref_get(p->signal->autogroup);
+
+	down_write(&ag->lock);
+	err = sched_group_set_shares(ag->tg, prio_to_weight[*nice + 20]);
+	if (!err)
+		ag->nice = *nice;
+	up_write(&ag->lock);
+
+	autogroup_kref_put(ag);
+
+	return err;
+}
+
+void proc_sched_autogroup_show_task(struct task_struct *p, struct seq_file *m)
+{
+	struct autogroup *ag = autogroup_kref_get(p->signal->autogroup);
+
+	down_read(&ag->lock);
+	seq_printf(m, "/autogroup-%ld nice %d\n", ag->id, ag->nice);
+	up_read(&ag->lock);
+
+	autogroup_kref_put(ag);
+}
+#endif /* CONFIG_PROC_FS */
+
+#ifdef CONFIG_SCHED_DEBUG
+static inline int autogroup_path(struct task_group *tg, char *buf, int buflen)
+{
+	return snprintf(buf, buflen, "%s-%ld", "/autogroup", tg->autogroup->id);
+}
+#endif /* CONFIG_SCHED_DEBUG */
+
+#endif /* CONFIG_SCHED_AUTOGROUP */
Index: linux-2.6.36/kernel/sysctl.c
===================================================================
--- linux-2.6.36.orig/kernel/sysctl.c
+++ linux-2.6.36/kernel/sysctl.c
@@ -384,6 +384,17 @@ static struct ctl_table kern_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec,
 	},
+#ifdef CONFIG_SCHED_AUTOGROUP
+	{
+		.procname	= "sched_autogroup_enabled",
+		.data		= &sysctl_sched_autogroup_enabled,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
+#endif
 #ifdef CONFIG_PROVE_LOCKING
 	{
 		.procname	= "prove_locking",
Index: linux-2.6.36/init/Kconfig
===================================================================
--- linux-2.6.36.orig/init/Kconfig
+++ linux-2.6.36/init/Kconfig
@@ -652,6 +652,18 @@ config DEBUG_BLK_CGROUP
 
 endif # CGROUPS
 
+config SCHED_AUTOGROUP
+	bool "Automatic process group scheduling"
+	select CGROUPS
+	select CGROUP_SCHED
+	select FAIR_GROUP_SCHED
+	help
+	  This option optimizes the scheduler for common desktop workloads by
+	  automatically creating and populating task groups.  This separation
+	  of workloads isolates aggressive CPU burners (like build jobs) from
+	  desktop applications.  Task group autogeneration is currently based
+	  upon task session.
+
 config MM_OWNER
 	bool
 
Index: linux-2.6.36/Documentation/kernel-parameters.txt
===================================================================
--- linux-2.6.36.orig/Documentation/kernel-parameters.txt
+++ linux-2.6.36/Documentation/kernel-parameters.txt
@@ -1610,6 +1610,8 @@ and is between 256 and 4096 characters.
 	noapic		[SMP,APIC] Tells the kernel to not make use of any
 			IOAPICs that may be present in the system.
 
+	noautogroup	Disable scheduler automatic task group creation.
+
 	nobats		[PPC] Do not use BATs for mapping kernel lowmem
 			on "Classic" PPC cores.
 

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-11-28 14:24   ` Mike Galbraith
@ 2010-11-28 19:31     ` Linus Torvalds
  2010-11-28 20:18       ` Ingo Molnar
  2010-11-29  5:45       ` Mike Galbraith
  2010-12-01  3:39     ` Paul Turner
  2010-12-01  3:39     ` Paul Turner
  2 siblings, 2 replies; 79+ messages in thread
From: Linus Torvalds @ 2010-11-28 19:31 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Ingo Molnar, Oleg Nesterov, Peter Zijlstra, LKML

On Sun, Nov 28, 2010 at 6:24 AM, Mike Galbraith <efault@gmx.de> wrote:
>
> Something else is seriously wrong though.  36.1 with attached (plus
> sched, cgroup: Fixup broken cgroup movement) works a treat, whereas
> 37.git and tip with fixlet below both suck rocks.  With a make -j40
> running, wakeup-latency is showing latencies of >100ms, amarok skips,
> mouse lurches badly.. generally horrid.  Something went south.

Can you test -rc3? Is that still ok? And are you perhaps using
Nouveau? There's a report of some graphics (?) regression since -rc3
about bad desktop performance:

   https://bugzilla.kernel.org/show_bug.cgi?id=23912

but it doesn't have any more information yet (so if -rc3 _is_ good for
you, and you can add anything to that report, it would be good. The
original reporter is hopefully bisecting it now)

                     Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-11-28 19:31     ` Linus Torvalds
@ 2010-11-28 20:18       ` Ingo Molnar
  2010-11-29 11:53         ` Peter Zijlstra
  2010-11-29  5:45       ` Mike Galbraith
  1 sibling, 1 reply; 79+ messages in thread
From: Ingo Molnar @ 2010-11-28 20:18 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Mike Galbraith, Oleg Nesterov, Peter Zijlstra, LKML


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Sun, Nov 28, 2010 at 6:24 AM, Mike Galbraith <efault@gmx.de> wrote:
> >
> > Something else is seriously wrong though.  36.1 with attached (plus
> > sched, cgroup: Fixup broken cgroup movement) works a treat, whereas
> > 37.git and tip with fixlet below both suck rocks.  With a make -j40
> > running, wakeup-latency is showing latencies of >100ms, amarok skips,
> > mouse lurches badly.. generally horrid.  Something went south.
> 
> Can you test -rc3? Is that still ok? And are you perhaps using
> Nouveau? There's a report of some graphics (?) regression since -rc3
> about bad desktop performance:
> 
>    https://bugzilla.kernel.org/show_bug.cgi?id=23912
> 
> but it doesn't have any more information yet (so if -rc3 _is_ good for
> you, and you can add anything to that report, it would be good. The
> original reporter is hopefully bisecting it now)

Mike, the last pure -rc3 -tip commit is 92c883adf03b - you could try to check that 
out too: it has most of the current sched/core commits, but has none of the post-rc3 
DRM changes.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-11-28 19:31     ` Linus Torvalds
  2010-11-28 20:18       ` Ingo Molnar
@ 2010-11-29  5:45       ` Mike Galbraith
  1 sibling, 0 replies; 79+ messages in thread
From: Mike Galbraith @ 2010-11-29  5:45 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ingo Molnar, Oleg Nesterov, Peter Zijlstra, LKML

On Sun, 2010-11-28 at 11:31 -0800, Linus Torvalds wrote:
> On Sun, Nov 28, 2010 at 6:24 AM, Mike Galbraith <efault@gmx.de> wrote:
> >
> > Something else is seriously wrong though.  36.1 with attached (plus
> > sched, cgroup: Fixup broken cgroup movement) works a treat, whereas
> > 37.git and tip with fixlet below both suck rocks.  With a make -j40
> > running, wakeup-latency is showing latencies of >100ms, amarok skips,
> > mouse lurches badly.. generally horrid.  Something went south.
> 
> Can you test -rc3? Is that still ok? And are you perhaps using
> Nouveau? There's a report of some graphics (?) regression since -rc3
> about bad desktop performance:

No Nouveau here, plain old boring nv.

>    https://bugzilla.kernel.org/show_bug.cgi?id=23912
> 
> but it doesn't have any more information yet (so if -rc3 _is_ good for
> you, and you can add anything to that report, it would be good. The
> original reporter is hopefully bisecting it now)

I'll hunt as soon as I can (inbox runneth over).

	-Mike


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-11-28 20:18       ` Ingo Molnar
@ 2010-11-29 11:53         ` Peter Zijlstra
  2010-11-29 12:30           ` Ingo Molnar
  2010-11-29 16:27           ` Linus Torvalds
  0 siblings, 2 replies; 79+ messages in thread
From: Peter Zijlstra @ 2010-11-29 11:53 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Mike Galbraith, Oleg Nesterov, LKML, Paul Turner

On Sun, 2010-11-28 at 21:18 +0100, Ingo Molnar wrote:
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> > On Sun, Nov 28, 2010 at 6:24 AM, Mike Galbraith <efault@gmx.de> wrote:
> > >
> > > Something else is seriously wrong though.  36.1 with attached (plus
> > > sched, cgroup: Fixup broken cgroup movement) works a treat, whereas
> > > 37.git and tip with fixlet below both suck rocks.  With a make -j40
> > > running, wakeup-latency is showing latencies of >100ms, amarok skips,
> > > mouse lurches badly.. generally horrid.  Something went south.
> > 
> > Can you test -rc3? Is that still ok? And are you perhaps using
> > Nouveau? There's a report of some graphics (?) regression since -rc3
> > about bad desktop performance:
> > 
> >    https://bugzilla.kernel.org/show_bug.cgi?id=23912
> > 
> > but it doesn't have any more information yet (so if -rc3 _is_ good for
> > you, and you can add anything to that report, it would be good. The
> > original reporter is hopefully bisecting it now)
> 
> Mike, the last pure -rc3 -tip commit is 92c883adf03b - you could try to check that 
> out too: it has most of the current sched/core commits, but has none of the post-rc3 
> DRM changes.

Well we totally re-wrote the cgroup load-balancer in -tip. The thing
currently in -linus is utter crap because it's very strongly serialized
across all cores (some people spend like 25% of their time in there).

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-11-29 11:53         ` Peter Zijlstra
@ 2010-11-29 12:30           ` Ingo Molnar
  2010-11-29 13:45             ` Mike Galbraith
  2010-11-29 16:27           ` Linus Torvalds
  1 sibling, 1 reply; 79+ messages in thread
From: Ingo Molnar @ 2010-11-29 12:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Mike Galbraith, Oleg Nesterov, LKML, Paul Turner


* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> On Sun, 2010-11-28 at 21:18 +0100, Ingo Molnar wrote:
> > * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > 
> > > On Sun, Nov 28, 2010 at 6:24 AM, Mike Galbraith <efault@gmx.de> wrote:
> > > >
> > > > Something else is seriously wrong though.  36.1 with attached (plus
> > > > sched, cgroup: Fixup broken cgroup movement) works a treat, whereas
> > > > 37.git and tip with fixlet below both suck rocks.  With a make -j40
> > > > running, wakeup-latency is showing latencies of >100ms, amarok skips,
> > > > mouse lurches badly.. generally horrid.  Something went south.
> > > 
> > > Can you test -rc3? Is that still ok? And are you perhaps using
> > > Nouveau? There's a report of some graphics (?) regression since -rc3
> > > about bad desktop performance:
> > > 
> > >    https://bugzilla.kernel.org/show_bug.cgi?id=23912
> > > 
> > > but it doesn't have any more information yet (so if -rc3 _is_ good for
> > > you, and you can add anything to that report, it would be good. The
> > > original reporter is hopefully bisecting it now)
> > 
> > Mike, the last pure -rc3 -tip commit is 92c883adf03b - you could try to check that 
> > out too: it has most of the current sched/core commits, but has none of the post-rc3 
> > DRM changes.
> 

> Well we totally re-wrote the cgroup load-balancer in -tip. The thing currently in 
> -linus is utter crap because it's very strongly serialized across all cores (some
> people spend like 25% of their time in there).

Yes, 92c883adf03b includes those changes:

 08f3c3065f4c: Merge branch 'sched/core'
 9437178f623a: sched: Update tg->shares after cpu.shares write
 d6b5591829bd: sched: Allow update_cfs_load() to update global load
 3b3d190ec368: sched: Implement demand based update_cfs_load()
 c66eaf619c0c: sched: Update shares on idle_balance
 a7a4f8a752ec: sched: Add sysctl_sched_shares_window
 67e86250f8ea: sched: Introduce hierarchal order on shares update list
 e33078baa4d3: sched: Fix update_cfs_load() synchronization
 f0d7442a5924: sched: Fix load corruption from update_cfs_shares()
 9e3081ca6114: sched: Make tg_shares_up() walk on-demand
 3d4b47b4b040: sched: Implement on-demand (active) cfs_rq list
 2069dd75c7d0: sched: Rewrite tg_shares_up)
 48c5ccae88dc: sched: Simplify cpu-hot-unplug task migration
 92fd4d4d67b9: Merge commit 'v2.6.37-rc2' into sched/core

I just wanted to give Mike a known-stable sha1 that has -rc3 but not the post-rc3 
DRM changes.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-11-29 12:30           ` Ingo Molnar
@ 2010-11-29 13:45             ` Mike Galbraith
  2010-11-29 13:47               ` Ingo Molnar
  0 siblings, 1 reply; 79+ messages in thread
From: Mike Galbraith @ 2010-11-29 13:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Linus Torvalds, Oleg Nesterov, LKML, Paul Turner

On Mon, 2010-11-29 at 13:30 +0100, Ingo Molnar wrote: 
> * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> 
> > On Sun, 2010-11-28 at 21:18 +0100, Ingo Molnar wrote:
> > > * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > > 
> > > > On Sun, Nov 28, 2010 at 6:24 AM, Mike Galbraith <efault@gmx.de> wrote:
> > > > >
> > > > > Something else is seriously wrong though.  36.1 with attached (plus
> > > > > sched, cgroup: Fixup broken cgroup movement) works a treat, whereas
> > > > > 37.git and tip with fixlet below both suck rocks.  With a make -j40
> > > > > running, wakeup-latency is showing latencies of >100ms, amarok skips,
> > > > > mouse lurches badly.. generally horrid.  Something went south.
> > > > 
> > > > Can you test -rc3? Is that still ok? And are you perhaps using
> > > > Nouveau? There's a report of some graphics (?) regression since -rc3
> > > > about bad desktop performance:
> > > > 
> > > >    https://bugzilla.kernel.org/show_bug.cgi?id=23912
> > > > 
> > > > but it doesn't have any more information yet (so if -rc3 _is_ good for
> > > > you, and you can add anything to that report, it would be good. The
> > > > original reporter is hopefully bisecting it now)
> > > 
> > > Mike, the last pure -rc3 -tip commit is 92c883adf03b - you could try to check that 
> > > out too: it has most of the current sched/core commits, but has none of the post-rc3 
> > > DRM changes.
> > 
> 
> > Well we totally re-wrote the cgroup load-balancer in -tip. The thing currently in 
> > -linus is utter crap because it's very strongly serialized across all cores (some
> > people spend like 25% of their time in there).
> 
> Yes, 92c883adf03b includes those changes:
> 
>  08f3c3065f4c: Merge branch 'sched/core'
>  9437178f623a: sched: Update tg->shares after cpu.shares write
>  d6b5591829bd: sched: Allow update_cfs_load() to update global load
>  3b3d190ec368: sched: Implement demand based update_cfs_load()
>  c66eaf619c0c: sched: Update shares on idle_balance
>  a7a4f8a752ec: sched: Add sysctl_sched_shares_window
>  67e86250f8ea: sched: Introduce hierarchal order on shares update list
>  e33078baa4d3: sched: Fix update_cfs_load() synchronization
>  f0d7442a5924: sched: Fix load corruption from update_cfs_shares()
>  9e3081ca6114: sched: Make tg_shares_up() walk on-demand
>  3d4b47b4b040: sched: Implement on-demand (active) cfs_rq list
>  2069dd75c7d0: sched: Rewrite tg_shares_up)
>  48c5ccae88dc: sched: Simplify cpu-hot-unplug task migration
>  92fd4d4d67b9: Merge commit 'v2.6.37-rc2' into sched/core
> 
> I just wanted to give Mike a known-stable sha1 that has -rc3 but not the post-rc3 
> DRM changes.

The good news is that the 37.git kernel was mislabeled in grub, was also
booting the _tip_ kernel, and is actually just fine.  It's only tip, and
tip 92fd4d4d67b9 is still bad.  I'll try a quick bisect before getting
back to backlog.

	-Mike


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-11-29 13:45             ` Mike Galbraith
@ 2010-11-29 13:47               ` Ingo Molnar
  2010-11-29 14:04                 ` Mike Galbraith
  0 siblings, 1 reply; 79+ messages in thread
From: Ingo Molnar @ 2010-11-29 13:47 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Peter Zijlstra, Linus Torvalds, Oleg Nesterov, LKML, Paul Turner


* Mike Galbraith <efault@gmx.de> wrote:

> On Mon, 2010-11-29 at 13:30 +0100, Ingo Molnar wrote: 
> > * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > 
> > > On Sun, 2010-11-28 at 21:18 +0100, Ingo Molnar wrote:
> > > > * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > > > 
> > > > > On Sun, Nov 28, 2010 at 6:24 AM, Mike Galbraith <efault@gmx.de> wrote:
> > > > > >
> > > > > > Something else is seriously wrong though.  36.1 with attached (plus
> > > > > > sched, cgroup: Fixup broken cgroup movement) works a treat, whereas
> > > > > > 37.git and tip with fixlet below both suck rocks.  With a make -j40
> > > > > > running, wakeup-latency is showing latencies of >100ms, amarok skips,
> > > > > > mouse lurches badly.. generally horrid.  Something went south.
> > > > > 
> > > > > Can you test -rc3? Is that still ok? And are you perhaps using
> > > > > Nouveau? There's a report of some graphics (?) regression since -rc3
> > > > > about bad desktop performance:
> > > > > 
> > > > >    https://bugzilla.kernel.org/show_bug.cgi?id=23912
> > > > > 
> > > > > but it doesn't have any more information yet (so if -rc3 _is_ good for
> > > > > you, and you can add anything to that report, it would be good. The
> > > > > original reporter is hopefully bisecting it now)
> > > > 
> > > > Mike, the last pure -rc3 -tip commit is 92c883adf03b - you could try to check that 
> > > > out too: it has most of the current sched/core commits, but has none of the post-rc3 
> > > > DRM changes.
> > > 
> > 
> > > Well we totally re-wrote the cgroup load-balancer in -tip. The thing currently in 
> > > -linus is utter crap because it's very strongly serialized across all cores (some
> > > people spend like 25% of their time in there).
> > 
> > Yes, 92c883adf03b includes those changes:
> > 
> >  08f3c3065f4c: Merge branch 'sched/core'
> >  9437178f623a: sched: Update tg->shares after cpu.shares write
> >  d6b5591829bd: sched: Allow update_cfs_load() to update global load
> >  3b3d190ec368: sched: Implement demand based update_cfs_load()
> >  c66eaf619c0c: sched: Update shares on idle_balance
> >  a7a4f8a752ec: sched: Add sysctl_sched_shares_window
> >  67e86250f8ea: sched: Introduce hierarchal order on shares update list
> >  e33078baa4d3: sched: Fix update_cfs_load() synchronization
> >  f0d7442a5924: sched: Fix load corruption from update_cfs_shares()
> >  9e3081ca6114: sched: Make tg_shares_up() walk on-demand
> >  3d4b47b4b040: sched: Implement on-demand (active) cfs_rq list
> >  2069dd75c7d0: sched: Rewrite tg_shares_up)
> >  48c5ccae88dc: sched: Simplify cpu-hot-unplug task migration
> >  92fd4d4d67b9: Merge commit 'v2.6.37-rc2' into sched/core
> > 
> > I just wanted to give Mike a known-stable sha1 that has -rc3 but not the post-rc3 
> > DRM changes.
> 
> The good news is that the 37.git kernel was mislabeled in grub, was also
> booting the _tip_ kernel, and is actually just fine.  It's only tip, and
> tip 92fd4d4d67b9 is still bad.  I'll try a quick bisect before getting
> back to backlog.

Just curious, what's the freshest still good -tip sha1?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-11-29 13:47               ` Ingo Molnar
@ 2010-11-29 14:04                 ` Mike Galbraith
  0 siblings, 0 replies; 79+ messages in thread
From: Mike Galbraith @ 2010-11-29 14:04 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Linus Torvalds, Oleg Nesterov, LKML, Paul Turner

On Mon, 2010-11-29 at 14:47 +0100, Ingo Molnar wrote:
> * Mike Galbraith <efault@gmx.de> wrote:

> > The good news is that the 37.git kernel was mislabeled in grub, was also
> > booting the _tip_ kernel, and is actually just fine.  It's only tip, and
> > tip 92fd4d4d67b9 is still bad.  I'll try a quick bisect before getting
> > back to backlog.
> 
> Just curious, what's the freshest still good -tip sha1?

I don't have a good tip yet.  My bisection started with a merge, which
tested bad, so it spat out...

marge:..git/linux-2.6 # git bisect bad
The merge base e53beacd23d9cb47590da6a7a7f6d417b941a994 is bad.
This means the bug has been fixed between e53beacd23d9cb47590da6a7a7f6d417b941a994 and [19650e8580987c0ffabc2fe2cbc16b944789df8b].

marge:..git/linux-2.6 # git bisect log
git bisect start
# good: [19650e8580987c0ffabc2fe2cbc16b944789df8b] Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
git bisect good 19650e8580987c0ffabc2fe2cbc16b944789df8b
# bad: [92fd4d4d67b945c0766416284d4ab236b31542c4] Merge commit 'v2.6.37-rc2' into sched/core
git bisect bad 92fd4d4d67b945c0766416284d4ab236b31542c4
# bad: [e53beacd23d9cb47590da6a7a7f6d417b941a994] Linux 2.6.37-rc2
git bisect bad e53beacd23d9cb47590da6a7a7f6d417b941a994
marge:..git/linux-2.6 #


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-11-29 11:53         ` Peter Zijlstra
  2010-11-29 12:30           ` Ingo Molnar
@ 2010-11-29 16:27           ` Linus Torvalds
  2010-11-29 16:44             ` Ingo Molnar
  2010-11-29 17:37             ` Peter Zijlstra
  1 sibling, 2 replies; 79+ messages in thread
From: Linus Torvalds @ 2010-11-29 16:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Mike Galbraith, Oleg Nesterov, LKML, Paul Turner

On Mon, Nov 29, 2010 at 3:53 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
> Well we totally re-wrote the cgroup load-balancer in -tip. The thing
> currently in -linus is utter crap because it's very strongly serialized
> across all cores (some people spend like 25% of their time in there).

Well, it seems that the rewrite is more crap than the "utter crap" in
current -git. What does that make -tip? Super-utter-crap?

Peter - getting the wrong answer quickly is not any better than strong
serialization.

Anyway, I'm happy to hear that the problem hasn't reached mainline yet.

                    Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-11-29 16:27           ` Linus Torvalds
@ 2010-11-29 16:44             ` Ingo Molnar
  2010-11-29 17:37             ` Peter Zijlstra
  1 sibling, 0 replies; 79+ messages in thread
From: Ingo Molnar @ 2010-11-29 16:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Mike Galbraith, Oleg Nesterov, LKML, Paul Turner


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Mon, Nov 29, 2010 at 3:53 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> >
> > Well we totally re-wrote the cgroup load-balancer in -tip. The thing currently 
> > in -linus is utter crap because it's very strongly serialized across all cores
> > (some people spend like 25% of their time in there).
> 
> Well, it seems that the rewrite is more crap than the "utter crap" in current 
> -git. What does that make -tip? Super-utter-crap?

Something along that line, or worse.

> Peter - getting the wrong answer quickly is not any better than strong 
> serialization.

Yeah, obviously.

> Anyway, I'm happy to hear that the problem hasn't reached mainline yet.

It won't.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-11-29 16:27           ` Linus Torvalds
  2010-11-29 16:44             ` Ingo Molnar
@ 2010-11-29 17:37             ` Peter Zijlstra
  2010-11-29 18:03               ` Ingo Molnar
  2010-11-29 19:06               ` Mike Galbraith
  1 sibling, 2 replies; 79+ messages in thread
From: Peter Zijlstra @ 2010-11-29 17:37 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Mike Galbraith, Oleg Nesterov, LKML, Paul Turner

On Mon, 2010-11-29 at 08:27 -0800, Linus Torvalds wrote:
> On Mon, Nov 29, 2010 at 3:53 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> >
> > Well we totally re-wrote the cgroup load-balancer in -tip. The thing
> > currently in -linus is utter crap because it's very strongly serialized
> > across all cores (some people spend like 25% of their time in there).
> 
> Well, it seems that the rewrite is more crap than the "utter crap" in
> current -git. What does that make -tip? Super-utter-crap?
> 
> Peter - getting the wrong answer quickly is not any better than strong
> serialization.

I know, from the testing so far we _thought_ it was fairly sane.
Apparently there's still some work to do.



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-11-29 17:37             ` Peter Zijlstra
@ 2010-11-29 18:03               ` Ingo Molnar
  2010-11-29 19:06               ` Mike Galbraith
  1 sibling, 0 replies; 79+ messages in thread
From: Ingo Molnar @ 2010-11-29 18:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Mike Galbraith, Oleg Nesterov, LKML, Paul Turner


* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> On Mon, 2010-11-29 at 08:27 -0800, Linus Torvalds wrote:
> > On Mon, Nov 29, 2010 at 3:53 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > >
> > > Well we totally re-wrote the cgroup load-balancer in -tip. The thing
> > > currently in -linus is utter crap because it's very strongly serialized
> > > across all cores (some people spend like 25% of their time in there).
> > 
> > Well, it seems that the rewrite is more crap than the "utter crap" in
> > current -git. What does that make -tip? Super-utter-crap?
> > 
> > Peter - getting the wrong answer quickly is not any better than strong
> > serialization.
> 
> I know, from the testing so far we _thought_ it was fairly sane.
> Apparently there's still some work to do.

Btw., i think it shows the conceptual power of Mike's patch that this cgroups 
scheduling suckage was exposed so clearly. Previously it took weeks (sometimes 
months) for bugs to reach those who are using cgroups.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-11-29 17:37             ` Peter Zijlstra
  2010-11-29 18:03               ` Ingo Molnar
@ 2010-11-29 19:06               ` Mike Galbraith
  2010-11-29 19:20                 ` Ingo Molnar
  1 sibling, 1 reply; 79+ messages in thread
From: Mike Galbraith @ 2010-11-29 19:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Ingo Molnar, Oleg Nesterov, LKML, Paul Turner

On Mon, 2010-11-29 at 18:37 +0100, Peter Zijlstra wrote:
> On Mon, 2010-11-29 at 08:27 -0800, Linus Torvalds wrote:
> > On Mon, Nov 29, 2010 at 3:53 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > >
> > > Well we totally re-wrote the cgroup load-balancer in -tip. The thing
> > > currently in -linus is utter crap because it's very strongly serialized
> > > across all cores (some people spend like 25% of their time in there).
> > 
> > Well, it seems that the rewrite is more crap than the "utter crap" in
> > current -git. What does that make -tip? Super-utter-crap?
> > 
> > Peter - getting the wrong answer quickly is not any better than strong
> > serialization.
> 
> I know, from the testing so far we _thought_ it was fairly sane.
> Apparently there's still some work to do.

Damn thing bisected to:

commit 92fd4d4d67b945c0766416284d4ab236b31542c4
Merge: fe7de49 e53beac
Author: Ingo Molnar <mingo@elte.hu>
Date:   Thu Nov 18 13:22:14 2010 +0100

    Merge commit 'v2.6.37-rc2' into sched/core

    Merge reason: Move to a .37-rc base.

    Signed-off-by: Ingo Molnar <mingo@elte.hu>

92fd4d4d67b945c0766416284d4ab236b31542c4 is the first bad commit

git bisect start
# good: [f6f94e2ab1b33f0082ac22d71f66385a60d8157f] Linux 2.6.36
git bisect good f6f94e2ab1b33f0082ac22d71f66385a60d8157f
# bad: [3a2b7f908d45fa45670e8ba9e7e24c0409ba43d8] Merge branch 'linus'
git bisect bad 3a2b7f908d45fa45670e8ba9e7e24c0409ba43d8
# good: [520045db940a381d2bee1c1b2179f7921b40fb10] Merge branches 'upstream/xenfs' and 'upstream/core' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen
git bisect good 520045db940a381d2bee1c1b2179f7921b40fb10
# good: [520045db940a381d2bee1c1b2179f7921b40fb10] Merge branches 'upstream/xenfs' and 'upstream/core' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen
git bisect good 520045db940a381d2bee1c1b2179f7921b40fb10
# good: [120a795da07c9a02221ca23464c28a7c6ad7de1d] audit mmap
git bisect good 120a795da07c9a02221ca23464c28a7c6ad7de1d
# good: [19650e8580987c0ffabc2fe2cbc16b944789df8b] Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
git bisect good 19650e8580987c0ffabc2fe2cbc16b944789df8b
# good: [11259d65a61b84ad954953a194c41fe84dff889a] Merge branch 'out-of-tree'
git bisect good 11259d65a61b84ad954953a194c41fe84dff889a
# good: [eae0932ceba16e7ee0b5690455a13ef8364845da] Merge branch 'x86/mm'
git bisect good eae0932ceba16e7ee0b5690455a13ef8364845da
# good: [0464a38aaca10e1a8afed003d16d25dca2168d86] Merge branch 'sched/urgent'
git bisect good 0464a38aaca10e1a8afed003d16d25dca2168d86
# good: [22d1b202a8d0e1dedc35086b8f3df0a7b37d1371] Merge branch 'x86/urgent'
git bisect good 22d1b202a8d0e1dedc35086b8f3df0a7b37d1371
# bad: [282810f891cf6587dfc04fc5e26ec7772330c8cb] Merge branch 'sched/core'
git bisect bad 282810f891cf6587dfc04fc5e26ec7772330c8cb
# bad: [2932e532dd8fbd699ce072a4badc7fbe69451be6] Merge branch 'out-of-tree'
git bisect bad 2932e532dd8fbd699ce072a4badc7fbe69451be6
# bad: [d6b5591829bd348a5fbe1c428d28dea00621cdba] sched: Allow update_cfs_load() to update global load
git bisect bad d6b5591829bd348a5fbe1c428d28dea00621cdba
# bad: [f0d7442a5924a802b66eef79b3708f77297bfb35] sched: Fix load corruption from update_cfs_shares()
git bisect bad f0d7442a5924a802b66eef79b3708f77297bfb35
# bad: [2069dd75c7d0f49355939e5586daf5a9ab216db7] sched: Rewrite tg_shares_up)
git bisect bad 2069dd75c7d0f49355939e5586daf5a9ab216db7
# bad: [48c5ccae88dcd989d9de507e8510313c6cbd352b] sched: Simplify cpu-hot-unplug task migration
git bisect bad 48c5ccae88dcd989d9de507e8510313c6cbd352b
# bad: [92fd4d4d67b945c0766416284d4ab236b31542c4] Merge commit 'v2.6.37-rc2' into sched/core
git bisect bad 92fd4d4d67b945c0766416284d4ab236b31542c4



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-11-29 19:06               ` Mike Galbraith
@ 2010-11-29 19:20                 ` Ingo Molnar
  2010-11-30  3:39                   ` Paul Turner
  2010-11-30  7:54                   ` Mike Galbraith
  0 siblings, 2 replies; 79+ messages in thread
From: Ingo Molnar @ 2010-11-29 19:20 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Peter Zijlstra, Linus Torvalds, Oleg Nesterov, LKML, Paul Turner


* Mike Galbraith <efault@gmx.de> wrote:

> > I know, from the testing so far we _thought_ it was fairly sane. Apparently 
> > there's still some work to do.
> 
> Damn thing bisected to:
> 
> commit 92fd4d4d67b945c0766416284d4ab236b31542c4
> Merge: fe7de49 e53beac
> Author: Ingo Molnar <mingo@elte.hu>
> Date:   Thu Nov 18 13:22:14 2010 +0100
> 
>     Merge commit 'v2.6.37-rc2' into sched/core
> 
>     Merge reason: Move to a .37-rc base.
> 
>     Signed-off-by: Ingo Molnar <mingo@elte.hu>
> 
> 92fd4d4d67b945c0766416284d4ab236b31542c4 is the first bad commit

Hm, i'd suggest to double check the two originator points:

  e53beac - is it really 'bad' ?
  fe7de49 - is it really 'good'?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-11-29 19:20                 ` Ingo Molnar
@ 2010-11-30  3:39                   ` Paul Turner
  2010-11-30  4:14                     ` Mike Galbraith
  2010-11-30  7:54                   ` Mike Galbraith
  1 sibling, 1 reply; 79+ messages in thread
From: Paul Turner @ 2010-11-30  3:39 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Mike Galbraith, Peter Zijlstra, Linus Torvalds, Oleg Nesterov, LKML

On Mon, Nov 29, 2010 at 11:20 AM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Mike Galbraith <efault@gmx.de> wrote:
>
>> > I know, from the testing so far we _thought_ it was fairly sane. Apparently
>> > there's still some work to do.
>>
>> Damn thing bisected to:
>>
>> commit 92fd4d4d67b945c0766416284d4ab236b31542c4
>> Merge: fe7de49 e53beac
>> Author: Ingo Molnar <mingo@elte.hu>
>> Date:   Thu Nov 18 13:22:14 2010 +0100
>>
>>     Merge commit 'v2.6.37-rc2' into sched/core
>>
>>     Merge reason: Move to a .37-rc base.
>>
>>     Signed-off-by: Ingo Molnar <mingo@elte.hu>
>>
>> 92fd4d4d67b945c0766416284d4ab236b31542c4 is the first bad commit
>
> Hm, i'd suggest to double check the two originator points:
>
>  e53beac - is it really 'bad' ?
>  fe7de49 - is it really 'good'?
>
> Thanks,
>
>        Ingo
>

https://lkml.org/lkml/2010/11/29/566

Should fix this.  We missed this in testing as the delay between
last-task-exit and group destruction was always sufficiently large as
to ensure that the task_group had aged out of shares updates (as
opposed to requiring explicit removal).

With autogroup obviously the window here is essentially instantaneous
which leads to the buggy removal code being executed.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-11-30  3:39                   ` Paul Turner
@ 2010-11-30  4:14                     ` Mike Galbraith
  2010-11-30  4:23                       ` Paul Turner
  0 siblings, 1 reply; 79+ messages in thread
From: Mike Galbraith @ 2010-11-30  4:14 UTC (permalink / raw)
  To: Paul Turner
  Cc: Ingo Molnar, Peter Zijlstra, Linus Torvalds, Oleg Nesterov, LKML

On Mon, 2010-11-29 at 19:39 -0800, Paul Turner wrote:
> On Mon, Nov 29, 2010 at 11:20 AM, Ingo Molnar <mingo@elte.hu> wrote:
> >
> > * Mike Galbraith <efault@gmx.de> wrote:
> >
> >> > I know, from the testing so far we _thought_ it was fairly sane. Apparently
> >> > there's still some work to do.
> >>
> >> Damn thing bisected to:
> >>
> >> commit 92fd4d4d67b945c0766416284d4ab236b31542c4
> >> Merge: fe7de49 e53beac
> >> Author: Ingo Molnar <mingo@elte.hu>
> >> Date:   Thu Nov 18 13:22:14 2010 +0100
> >>
> >>     Merge commit 'v2.6.37-rc2' into sched/core
> >>
> >>     Merge reason: Move to a .37-rc base.
> >>
> >>     Signed-off-by: Ingo Molnar <mingo@elte.hu>
> >>
> >> 92fd4d4d67b945c0766416284d4ab236b31542c4 is the first bad commit
> >
> > Hm, i'd suggest to double check the two originator points:
> >
> >  e53beac - is it really 'bad' ?
> >  fe7de49 - is it really 'good'?
> >
> > Thanks,
> >
> >        Ingo
> >
> 
> https://lkml.org/lkml/2010/11/29/566
> 
> Should fix this.

No, I had it in place where pertinent.  Problem with bisection is that
there are a couple of spots where X doesn't work.  With X, it's obvious,
less so in text console.  Looks like I must have miscalled one of those.

	-Mike
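
Note on miscalled bisection steps: bisection trusts every verdict it is
given, so one wrong "good" or "bad" steers all later steps into the wrong
half of the history. The sketch below models that on a linear history of
numbered commits. It is an illustration only: real git bisect walks a DAG
with merges, which this does not attempt, and first_bad(), honest() and
miscalled() are hypothetical names, as is the choice of commit 11 as the
true culprit.

```c
#include <assert.h>
#include <stdbool.h>

/* Verdict callback: returns true if commit index i tests "bad". */
typedef bool (*verdict_fn)(int i);

/*
 * Find the first bad commit in a linear history, given that commit
 * `good` is known good, commit `bad` is known bad, and good < bad.
 * Each probe halves the remaining range.
 */
int first_bad(int good, int bad, verdict_fn is_bad)
{
	assert(good < bad);

	while (bad - good > 1) {
		int mid = good + (bad - good) / 2;

		if (is_bad(mid))
			bad = mid;
		else
			good = mid;
	}
	return bad;
}

/* Ground truth for this example: the regression enters at commit 11. */
bool honest(int i)
{
	return i >= 11;
}

/* Same tester, but commit 8 is miscalled as bad (unrelated breakage). */
bool miscalled(int i)
{
	return i == 8 || i >= 11;
}
```

Over the range 0..16, first_bad() with honest() lands on commit 11, while
the single wrong verdict in miscalled() makes it land on commit 8. That is
why re-testing the reported commit and its parents is worth the time when a
bisection ends somewhere implausible, such as a merge.
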


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-11-30  4:14                     ` Mike Galbraith
@ 2010-11-30  4:23                       ` Paul Turner
  2010-11-30 13:18                         ` Mike Galbraith
  0 siblings, 1 reply; 79+ messages in thread
From: Paul Turner @ 2010-11-30  4:23 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Ingo Molnar, Peter Zijlstra, Linus Torvalds, Oleg Nesterov, LKML

On Mon, Nov 29, 2010 at 8:14 PM, Mike Galbraith <efault@gmx.de> wrote:
> On Mon, 2010-11-29 at 19:39 -0800, Paul Turner wrote:
>> On Mon, Nov 29, 2010 at 11:20 AM, Ingo Molnar <mingo@elte.hu> wrote:
>> >
>> > * Mike Galbraith <efault@gmx.de> wrote:
>> >
>> >> > I know, from the testing so far we _thought_ it was fairly sane. Apparently
>> >> > there's still some work to do.
>> >>
>> >> Damn thing bisected to:
>> >>
>> >> commit 92fd4d4d67b945c0766416284d4ab236b31542c4
>> >> Merge: fe7de49 e53beac
>> >> Author: Ingo Molnar <mingo@elte.hu>
>> >> Date:   Thu Nov 18 13:22:14 2010 +0100
>> >>
>> >>     Merge commit 'v2.6.37-rc2' into sched/core
>> >>
>> >>     Merge reason: Move to a .37-rc base.
>> >>
>> >>     Signed-off-by: Ingo Molnar <mingo@elte.hu>
>> >>
>> >> 92fd4d4d67b945c0766416284d4ab236b31542c4 is the first bad commit
>> >
>> > Hm, i'd suggest to double check the two originator points:
>> >
>> >  e53beac - is it really 'bad' ?
>> >  fe7de49 - is it really 'good'?
>> >
>> > Thanks,
>> >
>> >        Ingo
>> >
>>
>> https://lkml.org/lkml/2010/11/29/566
>>
>> Should fix this.
>
> No, I had it in place where pertinent.  Problem with bisection is that
> there are a couple of spots where X doesn't work.  With X, it's obvious,
> less so in text console.  Looks like I must have miscalled one of those.
>
>        -Mike

I've left some machines running tip + fix above + autogroup to see if
anything else emerges.  Hasn't crashed yet, I'll leave it going
overnight.

>
>

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-11-29 19:20                 ` Ingo Molnar
  2010-11-30  3:39                   ` Paul Turner
@ 2010-11-30  7:54                   ` Mike Galbraith
  2010-11-30 14:18                     ` Ingo Molnar
  1 sibling, 1 reply; 79+ messages in thread
From: Mike Galbraith @ 2010-11-30  7:54 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Linus Torvalds, Oleg Nesterov, LKML, Paul Turner

On Mon, 2010-11-29 at 20:20 +0100, Ingo Molnar wrote:
> * Mike Galbraith <efault@gmx.de> wrote:
> 
> > > I know, from the testing so far we _thought_ it was fairly sane. Apparently 
> > > there's still some work to do.
> > 
> > Damn thing bisected to:
> > 
> > commit 92fd4d4d67b945c0766416284d4ab236b31542c4
> > Merge: fe7de49 e53beac
> > Author: Ingo Molnar <mingo@elte.hu>
> > Date:   Thu Nov 18 13:22:14 2010 +0100
> > 
> >     Merge commit 'v2.6.37-rc2' into sched/core
> > 
> >     Merge reason: Move to a .37-rc base.
> > 
> >     Signed-off-by: Ingo Molnar <mingo@elte.hu>
> > 
> > 92fd4d4d67b945c0766416284d4ab236b31542c4 is the first bad commit
> 
> Hm, i'd suggest to double check the two originator points:
> 
>   e53beac - is it really 'bad' ?
>   fe7de49 - is it really 'good'?

Nope.   I did a bisection this morning in text mode with a pipe-test
based measurement proggy, and it bisected cleanly.

2069dd75c7d0f49355939e5586daf5a9ab216db7 is the first bad commit

commit 2069dd75c7d0f49355939e5586daf5a9ab216db7
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date:   Mon Nov 15 15:47:00 2010 -0800

    sched: Rewrite tg_shares_up)

	-Mike


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-11-30  4:23                       ` Paul Turner
@ 2010-11-30 13:18                         ` Mike Galbraith
  2010-11-30 13:48                           ` Peter Zijlstra
                                             ` (2 more replies)
  0 siblings, 3 replies; 79+ messages in thread
From: Mike Galbraith @ 2010-11-30 13:18 UTC (permalink / raw)
  To: Paul Turner
  Cc: Ingo Molnar, Peter Zijlstra, Linus Torvalds, Oleg Nesterov, LKML

On Mon, 2010-11-29 at 20:23 -0800, Paul Turner wrote:

> I've left some machines running tip + fix above + autogroup to see if
> anything else emerges.  Hasn't crashed yet, I'll leave it going
> overnight.

Thanks.  Below is the hopefully final version against tip.  The last I
sent contained a couple remnants.

From: Mike Galbraith <efault@gmx.de>
Date: Tue Nov 30 14:07:12 CET 2010
Subject: [PATCH] sched: Improve desktop interactivity: Implement automated per session task groups

A recurring complaint from CFS users is that parallel kbuild has a negative
impact on desktop interactivity.  This patch implements an idea from Linus,
to automatically create task groups.  Currently, only per session autogroups
are implemented, but the patch leaves the way open for enhancement.

Implementation: each task's signal struct contains an inherited pointer to
a refcounted autogroup struct containing a task group pointer, the default
for all tasks pointing to the init_task_group.  When a task calls setsid(),
a new task group is created, the process is moved into the new task group,
and a reference to the previous task group is dropped.  Child processes
inherit this task group thereafter, and increase its refcount.  When the
last thread of a process exits, the process's reference is dropped, such
that when the last process referencing an autogroup exits, the autogroup
is destroyed.
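
The reference counting described above can be sketched in plain userspace C.
This is only an illustrative model, not the kernel code: struct ag_model,
struct proc_model and the model_*() helpers are hypothetical names, a plain
int stands in for the kref, and all locking is omitted.  The default group
is assumed to hold one permanent reference so it is never freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct ag_model {
	int refs;		/* stands in for the kref */
	int *destroyed;		/* set to 1 on the final put; NULL for the default group */
};

struct proc_model {
	struct ag_model *ag;	/* stands in for signal->autogroup */
};

static struct ag_model *ag_get(struct ag_model *ag)
{
	ag->refs++;
	return ag;
}

/* Drop one reference; the last put destroys the (heap allocated) group. */
static void ag_put(struct ag_model *ag)
{
	assert(ag->refs > 0);
	if (--ag->refs > 0)
		return;
	if (ag->destroyed)
		*ag->destroyed = 1;
	free(ag);
}

/* fork: the child inherits the parent's group and takes a reference. */
static void model_fork(const struct proc_model *parent, struct proc_model *child)
{
	child->ag = ag_get(parent->ag);
}

/* Last thread of the process exits: drop the process's reference. */
static void model_exit(struct proc_model *p)
{
	ag_put(p->ag);
	p->ag = NULL;
}

/* setsid: create a new group, move the process, drop the old reference. */
static void model_setsid(struct proc_model *p, struct ag_model *fallback,
			 int *destroyed)
{
	struct ag_model *prev, *ag = malloc(sizeof(*ag));

	if (ag) {
		ag->refs = 1;		/* creation reference */
		ag->destroyed = destroyed;
	} else {
		ag = ag_get(fallback);	/* on failure, fall back to the default */
	}

	prev = p->ag;
	if (prev != ag) {
		p->ag = ag_get(ag);	/* the process's own reference */
		ag_put(prev);
	}
	ag_put(ag);			/* drop the creation reference */
}
```

In this model a shell that calls model_setsid() ends up as the sole holder
of a fresh group; each model_fork() adds a reference, and the group is
destroyed only when the last process holding it calls model_exit().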

At runqueue selection time, IFF a task has no cgroup assignment, its current
autogroup is used.

Autogroup bandwidth is controllable via setting its nice level through the
proc filesystem.  cat /proc/<pid>/autogroup displays the task's group and the
group's nice level.  echo <nice level> > /proc/<pid>/autogroup sets the task
group's shares to the weight of a nice <level> task.  Setting nice level is rate
limited for !admin users due to the abuse risk of task group locking.
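
To make the nice level to shares mapping concrete, here is a small userspace
sketch.  The table is meant to mirror the scheduler's prio_to_weight[] values
that the patch indexes with nice + 20; the helper names (nice_to_weight,
parse_autogroup_line, share_percent) are hypothetical and exist only for this
example.  The line format parsed is the "/autogroup-<id> nice <level>" string
printed by the patch.  share_percent() is a deliberate simplification: it
assumes exactly two always-runnable groups competing on one CPU.

```c
#include <assert.h>
#include <stdio.h>

/* Nice-to-weight table: index 0 is nice -20, index 39 is nice 19. */
static const int nice_weight[40] = {
	88761, 71755, 56483, 46273, 36291,	/* -20 .. -16 */
	29154, 23254, 18705, 14949, 11916,	/* -15 .. -11 */
	 9548,  7620,  6100,  4904,  3906,	/* -10 ..  -6 */
	 3121,  2501,  1991,  1586,  1277,	/*  -5 ..  -1 */
	 1024,   820,   655,   526,   423,	/*   0 ..   4 */
	  335,   272,   215,   172,   137,	/*   5 ..   9 */
	  110,    87,    70,    56,    45,	/*  10 ..  14 */
	   36,    29,    23,    18,    15,	/*  15 ..  19 */
};

/* Returns the weight for a nice level, or -1 if it is out of range. */
static int nice_to_weight(int nice)
{
	if (nice < -20 || nice > 19)
		return -1;
	return nice_weight[nice + 20];
}

/* Parses "/autogroup-<id> nice <level>"; returns 0 on success, -1 on error. */
static int parse_autogroup_line(const char *line, long *id, int *nice)
{
	if (sscanf(line, "/autogroup-%ld nice %d", id, nice) != 2)
		return -1;
	return nice_to_weight(*nice) < 0 ? -1 : 0;
}

/*
 * Integer percentage of one CPU that group A would get when competing
 * only with group B, both always runnable.  Returns -1 on a bad nice level.
 */
static int share_percent(int nice_a, int nice_b)
{
	int wa = nice_to_weight(nice_a);
	int wb = nice_to_weight(nice_b);

	if (wa < 0 || wb < 0)
		return -1;
	return 100 * wa / (wa + wb);
}
```

Under these assumptions, two groups at nice 0 split the CPU evenly, while a
group at nice 0 competing with a group at nice 5 gets roughly three quarters.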

The feature is enabled from boot by default if CONFIG_SCHED_AUTOGROUP=y is
selected, but can be disabled via the boot option noautogroup, and can also
be turned on/off on the fly via..
	echo [01] > /proc/sys/kernel/sched_autogroup_enabled.
..which will automatically move tasks to/from the root task group.
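
For completeness, a userspace program could check the runtime switch roughly
as sketched below.  The helper names are hypothetical; the only thing taken
from the patch is the sysctl path, and the sketch assumes the file holds a
single 0 or 1.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 0 or 1 for a valid flag string, -1 for anything else. */
static int parse_enabled(const char *text)
{
	int v;

	if (sscanf(text, "%d", &v) != 1 || (v != 0 && v != 1))
		return -1;
	return v;
}

/*
 * Reads the flag from the given path.  Returns 0 or 1, or -1 if the file
 * is missing (e.g. CONFIG_SCHED_AUTOGROUP=n), unreadable, or malformed.
 */
static int read_autogroup_enabled(const char *path)
{
	char buf[16];
	FILE *f = fopen(path, "r");
	int ret = -1;

	if (!f)
		return -1;
	if (fgets(buf, sizeof(buf), f))
		ret = parse_enabled(buf);
	fclose(f);
	return ret;
}
```

A caller would pass "/proc/sys/kernel/sched_autogroup_enabled" and treat -1
as "feature not available on this kernel".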

Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
LKML-Reference: <1290281700.28711.9.camel@maggy.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 Documentation/kernel-parameters.txt |    2 
 fs/proc/base.c                      |   79 ++++++++++++
 include/linux/sched.h               |   23 +++
 init/Kconfig                        |   12 +
 kernel/fork.c                       |    5 
 kernel/sched.c                      |   13 +-
 kernel/sched_autogroup.c            |  229 ++++++++++++++++++++++++++++++++++++
 kernel/sched_autogroup.h            |   32 +++++
 kernel/sched_debug.c                |   29 ++--
 kernel/sys.c                        |    4 
 kernel/sysctl.c                     |   11 +
 11 files changed, 421 insertions(+), 18 deletions(-)

Index: linux-2.6/include/linux/sched.h
===================================================================
--- linux-2.6.orig/include/linux/sched.h
+++ linux-2.6/include/linux/sched.h
@@ -513,6 +513,8 @@ struct thread_group_cputimer {
 	spinlock_t lock;
 };
 
+struct autogroup;
+
 /*
  * NOTE! "signal_struct" does not have it's own
  * locking, because a shared signal_struct always
@@ -580,6 +582,9 @@ struct signal_struct {
 
 	struct tty_struct *tty; /* NULL if no tty */
 
+#ifdef CONFIG_SCHED_AUTOGROUP
+	struct autogroup *autogroup;
+#endif
 	/*
 	 * Cumulative resource counters for dead threads in the group,
 	 * and for reaped dead child processes forked by this group.
@@ -1931,6 +1936,24 @@ int sched_rt_handler(struct ctl_table *t
 
 extern unsigned int sysctl_sched_compat_yield;
 
+#ifdef CONFIG_SCHED_AUTOGROUP
+extern unsigned int sysctl_sched_autogroup_enabled;
+
+extern void sched_autogroup_create_attach(struct task_struct *p);
+extern void sched_autogroup_detach(struct task_struct *p);
+extern void sched_autogroup_fork(struct signal_struct *sig);
+extern void sched_autogroup_exit(struct signal_struct *sig);
+#ifdef CONFIG_PROC_FS
+extern void proc_sched_autogroup_show_task(struct task_struct *p, struct seq_file *m);
+extern int proc_sched_autogroup_set_nice(struct task_struct *p, int *nice);
+#endif
+#else
+static inline void sched_autogroup_create_attach(struct task_struct *p) { }
+static inline void sched_autogroup_detach(struct task_struct *p) { }
+static inline void sched_autogroup_fork(struct signal_struct *sig) { }
+static inline void sched_autogroup_exit(struct signal_struct *sig) { }
+#endif
+
 #ifdef CONFIG_RT_MUTEXES
 extern int rt_mutex_getprio(struct task_struct *p);
 extern void rt_mutex_setprio(struct task_struct *p, int prio);
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -79,6 +79,7 @@
 
 #include "sched_cpupri.h"
 #include "workqueue_sched.h"
+#include "sched_autogroup.h"
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/sched.h>
@@ -271,6 +272,10 @@ struct task_group {
 	struct task_group *parent;
 	struct list_head siblings;
 	struct list_head children;
+
+#ifdef CONFIG_SCHED_AUTOGROUP
+	struct autogroup *autogroup;
+#endif
 };
 
 #define root_task_group init_task_group
@@ -603,11 +608,14 @@ static inline int cpu_of(struct rq *rq)
  */
 static inline struct task_group *task_group(struct task_struct *p)
 {
+	struct task_group *tg;
 	struct cgroup_subsys_state *css;
 
 	css = task_subsys_state_check(p, cpu_cgroup_subsys_id,
 			lockdep_is_held(&task_rq(p)->lock));
-	return container_of(css, struct task_group, css);
+	tg = container_of(css, struct task_group, css);
+
+	return autogroup_task_group(p, tg);
 }
 
 /* Change a task's cfs_rq and parent entity if it moves across CPUs/groups */
@@ -1869,6 +1877,7 @@ static void sched_irq_time_avg_update(st
 #include "sched_idletask.c"
 #include "sched_fair.c"
 #include "sched_rt.c"
+#include "sched_autogroup.c"
 #include "sched_stoptask.c"
 #ifdef CONFIG_SCHED_DEBUG
 # include "sched_debug.c"
@@ -7752,7 +7761,7 @@ void __init sched_init(void)
 #ifdef CONFIG_CGROUP_SCHED
 	list_add(&init_task_group.list, &task_groups);
 	INIT_LIST_HEAD(&init_task_group.children);
-
+	autogroup_init(&init_task);
 #endif /* CONFIG_CGROUP_SCHED */
 
 	for_each_possible_cpu(i) {
Index: linux-2.6/kernel/fork.c
===================================================================
--- linux-2.6.orig/kernel/fork.c
+++ linux-2.6/kernel/fork.c
@@ -174,8 +174,10 @@ static inline void free_signal_struct(st
 
 static inline void put_signal_struct(struct signal_struct *sig)
 {
-	if (atomic_dec_and_test(&sig->sigcnt))
+	if (atomic_dec_and_test(&sig->sigcnt)) {
+		sched_autogroup_exit(sig);
 		free_signal_struct(sig);
+	}
 }
 
 void __put_task_struct(struct task_struct *tsk)
@@ -904,6 +906,7 @@ static int copy_signal(unsigned long clo
 	posix_cpu_timers_init_group(sig);
 
 	tty_audit_fork(sig);
+	sched_autogroup_fork(sig);
 
 	sig->oom_adj = current->signal->oom_adj;
 	sig->oom_score_adj = current->signal->oom_score_adj;
Index: linux-2.6/kernel/sys.c
===================================================================
--- linux-2.6.orig/kernel/sys.c
+++ linux-2.6/kernel/sys.c
@@ -1080,8 +1080,10 @@ SYSCALL_DEFINE0(setsid)
 	err = session;
 out:
 	write_unlock_irq(&tasklist_lock);
-	if (err > 0)
+	if (err > 0) {
 		proc_sid_connector(group_leader);
+		sched_autogroup_create_attach(group_leader);
+	}
 	return err;
 }
 
Index: linux-2.6/kernel/sched_debug.c
===================================================================
--- linux-2.6.orig/kernel/sched_debug.c
+++ linux-2.6/kernel/sched_debug.c
@@ -87,6 +87,20 @@ static void print_cfs_group_stats(struct
 }
 #endif
 
+#if defined(CONFIG_CGROUP_SCHED) && \
+	(defined(CONFIG_FAIR_GROUP_SCHED) || defined(CONFIG_RT_GROUP_SCHED))
+static void task_group_path(struct task_group *tg, char *buf, int buflen)
+{
+	/* may be NULL if the underlying cgroup isn't fully-created yet */
+	if (!tg->css.cgroup) {
+		if (!autogroup_path(tg, buf, buflen))
+			buf[0] = '\0';
+		return;
+	}
+	cgroup_path(tg->css.cgroup, buf, buflen);
+}
+#endif
+
 static void
 print_task(struct seq_file *m, struct rq *rq, struct task_struct *p)
 {
@@ -115,7 +129,7 @@ print_task(struct seq_file *m, struct rq
 		char path[64];
 
 		rcu_read_lock();
-		cgroup_path(task_group(p)->css.cgroup, path, sizeof(path));
+		task_group_path(task_group(p), path, sizeof(path));
 		rcu_read_unlock();
 		SEQ_printf(m, " %s", path);
 	}
@@ -147,19 +161,6 @@ static void print_rq(struct seq_file *m,
 	read_unlock_irqrestore(&tasklist_lock, flags);
 }
 
-#if defined(CONFIG_CGROUP_SCHED) && \
-	(defined(CONFIG_FAIR_GROUP_SCHED) || defined(CONFIG_RT_GROUP_SCHED))
-static void task_group_path(struct task_group *tg, char *buf, int buflen)
-{
-	/* may be NULL if the underlying cgroup isn't fully-created yet */
-	if (!tg->css.cgroup) {
-		buf[0] = '\0';
-		return;
-	}
-	cgroup_path(tg->css.cgroup, buf, buflen);
-}
-#endif
-
 void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
 {
 	s64 MIN_vruntime = -1, min_vruntime, max_vruntime = -1,
Index: linux-2.6/fs/proc/base.c
===================================================================
--- linux-2.6.orig/fs/proc/base.c
+++ linux-2.6/fs/proc/base.c
@@ -1407,6 +1407,82 @@ static const struct file_operations proc
 
 #endif
 
+#ifdef CONFIG_SCHED_AUTOGROUP
+/*
+ * Print out autogroup related information:
+ */
+static int sched_autogroup_show(struct seq_file *m, void *v)
+{
+	struct inode *inode = m->private;
+	struct task_struct *p;
+
+	p = get_proc_task(inode);
+	if (!p)
+		return -ESRCH;
+	proc_sched_autogroup_show_task(p, m);
+
+	put_task_struct(p);
+
+	return 0;
+}
+
+static ssize_t
+sched_autogroup_write(struct file *file, const char __user *buf,
+	    size_t count, loff_t *offset)
+{
+	struct inode *inode = file->f_path.dentry->d_inode;
+	struct task_struct *p;
+	char buffer[PROC_NUMBUF];
+	long nice;
+	int err;
+
+	memset(buffer, 0, sizeof(buffer));
+	if (count > sizeof(buffer) - 1)
+		count = sizeof(buffer) - 1;
+	if (copy_from_user(buffer, buf, count))
+		return -EFAULT;
+
+	err = strict_strtol(strstrip(buffer), 0, &nice);
+	if (err)
+		return -EINVAL;
+
+	p = get_proc_task(inode);
+	if (!p)
+		return -ESRCH;
+
+	err = nice;
+	err = proc_sched_autogroup_set_nice(p, &err);
+	if (err)
+		count = err;
+
+	put_task_struct(p);
+
+	return count;
+}
+
+static int sched_autogroup_open(struct inode *inode, struct file *filp)
+{
+	int ret;
+
+	ret = single_open(filp, sched_autogroup_show, NULL);
+	if (!ret) {
+		struct seq_file *m = filp->private_data;
+
+		m->private = inode;
+	}
+	return ret;
+}
+
+static const struct file_operations proc_pid_sched_autogroup_operations = {
+	.open		= sched_autogroup_open,
+	.read		= seq_read,
+	.write		= sched_autogroup_write,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
+#endif /* CONFIG_SCHED_AUTOGROUP */
+
 static ssize_t comm_write(struct file *file, const char __user *buf,
 				size_t count, loff_t *offset)
 {
@@ -2733,6 +2809,9 @@ static const struct pid_entry tgid_base_
 #ifdef CONFIG_SCHED_DEBUG
 	REG("sched",      S_IRUGO|S_IWUSR, proc_pid_sched_operations),
 #endif
+#ifdef CONFIG_SCHED_AUTOGROUP
+	REG("autogroup",  S_IRUGO|S_IWUSR, proc_pid_sched_autogroup_operations),
+#endif
 	REG("comm",      S_IRUGO|S_IWUSR, proc_pid_set_comm_operations),
 #ifdef CONFIG_HAVE_ARCH_TRACEHOOK
 	INF("syscall",    S_IRUSR, proc_pid_syscall),
Index: linux-2.6/kernel/sched_autogroup.h
===================================================================
--- /dev/null
+++ linux-2.6/kernel/sched_autogroup.h
@@ -0,0 +1,32 @@
+#ifdef CONFIG_SCHED_AUTOGROUP
+
+struct autogroup {
+	struct kref		kref;
+	struct task_group	*tg;
+	struct rw_semaphore	lock;
+	unsigned long		id;
+	int			nice;
+};
+
+static inline struct task_group *
+autogroup_task_group(struct task_struct *p, struct task_group *tg);
+
+#else /* !CONFIG_SCHED_AUTOGROUP */
+
+static inline void autogroup_init(struct task_struct *init_task) {  }
+static inline void autogroup_free(struct task_group *tg) { }
+
+static inline struct task_group *
+autogroup_task_group(struct task_struct *p, struct task_group *tg)
+{
+	return tg;
+}
+
+#ifdef CONFIG_SCHED_DEBUG
+static inline int autogroup_path(struct task_group *tg, char *buf, int buflen)
+{
+	return 0;
+}
+#endif
+
+#endif /* CONFIG_SCHED_AUTOGROUP */
Index: linux-2.6/kernel/sched_autogroup.c
===================================================================
--- /dev/null
+++ linux-2.6/kernel/sched_autogroup.c
@@ -0,0 +1,229 @@
+#ifdef CONFIG_SCHED_AUTOGROUP
+
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include <linux/kallsyms.h>
+#include <linux/utsname.h>
+
+unsigned int __read_mostly sysctl_sched_autogroup_enabled = 1;
+static struct autogroup autogroup_default;
+static atomic_t autogroup_seq_nr;
+
+static void autogroup_init(struct task_struct *init_task)
+{
+	autogroup_default.tg = &init_task_group;
+	init_task_group.autogroup = &autogroup_default;
+	kref_init(&autogroup_default.kref);
+	init_rwsem(&autogroup_default.lock);
+	init_task->signal->autogroup = &autogroup_default;
+}
+
+static inline void autogroup_free(struct task_group *tg)
+{
+	kfree(tg->autogroup);
+}
+
+static inline void autogroup_destroy(struct kref *kref)
+{
+	struct autogroup *ag = container_of(kref, struct autogroup, kref);
+
+	sched_destroy_group(ag->tg);
+}
+
+static inline void autogroup_kref_put(struct autogroup *ag)
+{
+	kref_put(&ag->kref, autogroup_destroy);
+}
+
+static inline struct autogroup *autogroup_kref_get(struct autogroup *ag)
+{
+	kref_get(&ag->kref);
+	return ag;
+}
+
+static inline struct autogroup *autogroup_create(void)
+{
+	struct autogroup *ag = kzalloc(sizeof(*ag), GFP_KERNEL);
+	struct task_group *tg;
+
+	if (!ag)
+		goto out_fail;
+
+	tg = sched_create_group(&init_task_group);
+
+	if (IS_ERR(tg))
+		goto out_free;
+
+	kref_init(&ag->kref);
+	init_rwsem(&ag->lock);
+	ag->id = atomic_inc_return(&autogroup_seq_nr);
+	ag->tg = tg;
+	tg->autogroup = ag;
+
+	return ag;
+
+out_free:
+	kfree(ag);
+out_fail:
+	if (printk_ratelimit()) {
+		printk(KERN_WARNING "autogroup_create: %s failure.\n",
+			ag ? "sched_create_group()" : "kmalloc()");
+	}
+
+	return autogroup_kref_get(&autogroup_default);
+}
+
+static inline bool
+task_wants_autogroup(struct task_struct *p, struct task_group *tg)
+{
+	if (tg != &root_task_group)
+		return false;
+
+	if (p->sched_class != &fair_sched_class)
+		return false;
+
+	/*
+	 * We can only assume the task group can't go away on us if
+	 * autogroup_move_group() can see us on ->thread_group list.
+	 */
+	if (p->flags & PF_EXITING)
+		return false;
+
+	return true;
+}
+
+static inline struct task_group *
+autogroup_task_group(struct task_struct *p, struct task_group *tg)
+{
+	int enabled = ACCESS_ONCE(sysctl_sched_autogroup_enabled);
+
+	if (enabled && task_wants_autogroup(p, tg))
+		return p->signal->autogroup->tg;
+
+	return tg;
+}
+
+static void
+autogroup_move_group(struct task_struct *p, struct autogroup *ag)
+{
+	struct autogroup *prev;
+	struct task_struct *t;
+	unsigned long flags;
+
+	BUG_ON(!lock_task_sighand(p, &flags));
+
+	prev = p->signal->autogroup;
+	if (prev == ag) {
+		unlock_task_sighand(p, &flags);
+		return;
+	}
+
+	p->signal->autogroup = autogroup_kref_get(ag);
+
+	t = p;
+	do {
+		sched_move_task(t);
+	} while_each_thread(p, t);
+
+	unlock_task_sighand(p, &flags);
+	autogroup_kref_put(prev);
+}
+
+/* Allocates GFP_KERNEL, cannot be called under any spinlock */
+void sched_autogroup_create_attach(struct task_struct *p)
+{
+	struct autogroup *ag = autogroup_create();
+
+	autogroup_move_group(p, ag);
+	/* drop extra reference added by autogroup_create() */
+	autogroup_kref_put(ag);
+}
+EXPORT_SYMBOL(sched_autogroup_create_attach);
+
+/* Cannot be called under siglock.  Currently has no users */
+void sched_autogroup_detach(struct task_struct *p)
+{
+	autogroup_move_group(p, &autogroup_default);
+}
+EXPORT_SYMBOL(sched_autogroup_detach);
+
+void sched_autogroup_fork(struct signal_struct *sig)
+{
+	struct task_struct *p = current;
+
+	spin_lock_irq(&p->sighand->siglock);
+	sig->autogroup = autogroup_kref_get(p->signal->autogroup);
+	spin_unlock_irq(&p->sighand->siglock);
+}
+
+void sched_autogroup_exit(struct signal_struct *sig)
+{
+	autogroup_kref_put(sig->autogroup);
+}
+
+static int __init setup_autogroup(char *str)
+{
+	sysctl_sched_autogroup_enabled = 0;
+
+	return 1;
+}
+
+__setup("noautogroup", setup_autogroup);
+
+#ifdef CONFIG_PROC_FS
+
+/* Called with siglock held. */
+int proc_sched_autogroup_set_nice(struct task_struct *p, int *nice)
+{
+	static unsigned long next = INITIAL_JIFFIES;
+	struct autogroup *ag;
+	int err;
+
+	if (*nice < -20 || *nice > 19)
+		return -EINVAL;
+
+	err = security_task_setnice(current, *nice);
+	if (err)
+		return err;
+
+	if (*nice < 0 && !can_nice(current, *nice))
+		return -EPERM;
+
+	/* this is a heavy operation taking global locks.. */
+	if (!capable(CAP_SYS_ADMIN) && time_before(jiffies, next))
+		return -EAGAIN;
+
+	next = HZ / 10 + jiffies;
+	ag = autogroup_kref_get(p->signal->autogroup);
+
+	down_write(&ag->lock);
+	err = sched_group_set_shares(ag->tg, prio_to_weight[*nice + 20]);
+	if (!err)
+		ag->nice = *nice;
+	up_write(&ag->lock);
+
+	autogroup_kref_put(ag);
+
+	return err;
+}
+
+void proc_sched_autogroup_show_task(struct task_struct *p, struct seq_file *m)
+{
+	struct autogroup *ag = autogroup_kref_get(p->signal->autogroup);
+
+	down_read(&ag->lock);
+	seq_printf(m, "/autogroup-%ld nice %d\n", ag->id, ag->nice);
+	up_read(&ag->lock);
+
+	autogroup_kref_put(ag);
+}
+#endif /* CONFIG_PROC_FS */
+
+#ifdef CONFIG_SCHED_DEBUG
+static inline int autogroup_path(struct task_group *tg, char *buf, int buflen)
+{
+	return snprintf(buf, buflen, "%s-%ld", "/autogroup", tg->autogroup->id);
+}
+#endif /* CONFIG_SCHED_DEBUG */
+
+#endif /* CONFIG_SCHED_AUTOGROUP */
Index: linux-2.6/kernel/sysctl.c
===================================================================
--- linux-2.6.orig/kernel/sysctl.c
+++ linux-2.6/kernel/sysctl.c
@@ -370,6 +370,17 @@ static struct ctl_table kern_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec,
 	},
+#ifdef CONFIG_SCHED_AUTOGROUP
+	{
+		.procname	= "sched_autogroup_enabled",
+		.data		= &sysctl_sched_autogroup_enabled,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
+#endif
 #ifdef CONFIG_PROVE_LOCKING
 	{
 		.procname	= "prove_locking",
Index: linux-2.6/init/Kconfig
===================================================================
--- linux-2.6.orig/init/Kconfig
+++ linux-2.6/init/Kconfig
@@ -741,6 +741,18 @@ config NET_NS
 
 endif # NAMESPACES
 
+config SCHED_AUTOGROUP
+	bool "Automatic process group scheduling"
+	select CGROUPS
+	select CGROUP_SCHED
+	select FAIR_GROUP_SCHED
+	help
+	  This option optimizes the scheduler for common desktop workloads by
+	  automatically creating and populating task groups.  This separation
+	  of workloads isolates aggressive CPU burners (like build jobs) from
+	  desktop applications.  Task group autogeneration is currently based
+	  upon task session.
+
 config MM_OWNER
 	bool
 
Index: linux-2.6/Documentation/kernel-parameters.txt
===================================================================
--- linux-2.6.orig/Documentation/kernel-parameters.txt
+++ linux-2.6/Documentation/kernel-parameters.txt
@@ -1622,6 +1622,8 @@ and is between 256 and 4096 characters.
 	noapic		[SMP,APIC] Tells the kernel to not make use of any
 			IOAPICs that may be present in the system.
 
+	noautogroup	Disable scheduler automatic task group creation.
+
 	nobats		[PPC] Do not use BATs for mapping kernel lowmem
 			on "Classic" PPC cores.
 





^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-11-30 13:18                         ` Mike Galbraith
@ 2010-11-30 13:48                           ` Peter Zijlstra
  2010-11-30 13:59                             ` Ingo Molnar
  2010-11-30 14:13                           ` Ingo Molnar
  2010-11-30 15:17                           ` Vivek Goyal
  2 siblings, 1 reply; 79+ messages in thread
From: Peter Zijlstra @ 2010-11-30 13:48 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Paul Turner, Ingo Molnar, Linus Torvalds, Oleg Nesterov, LKML

On Tue, 2010-11-30 at 14:18 +0100, Mike Galbraith wrote:
> 
> From: Mike Galbraith <efault@gmx.de>
> Date: Tue Nov 30 14:07:12 CET 2010
> Subject: [PATCH] sched: Improve desktop interactivity: Implement automated per session task groups
> 
> A recurring complaint from CFS users is that parallel kbuild has a negative
> impact on desktop interactivity.  This patch implements an idea from Linus,
> to automatically create task groups.  Currently, only per session autogroups
> are implemented, but the patch leaves the way open for enhancement.
> 
> Implementation: each task's signal struct contains an inherited pointer to
> a refcounted autogroup struct containing a task group pointer, the default
> for all tasks pointing to the init_task_group.  When a task calls setsid(),
> a new task group is created, the process is moved into the new task group,
> and a reference to the previous task group is dropped.  Child processes
> inherit this task group thereafter, and increase its refcount.  When the
> last thread of a process exits, the process's reference is dropped, such
> that when the last process referencing an autogroup exits, the autogroup
> is destroyed.
> 
> At runqueue selection time, IFF a task has no cgroup assignment, its current
> autogroup is used.
> 
> Autogroup bandwidth is controllable via setting its nice level through the
> proc filesystem.  cat /proc/<pid>/autogroup displays the task's group and the
> group's nice level.  echo <nice level> > /proc/<pid>/autogroup sets the task
> group's shares to the weight of a nice <level> task.  Setting nice level is rate
> limited for !admin users due to the abuse risk of task group locking.
> 
> The feature is enabled from boot by default if CONFIG_SCHED_AUTOGROUP=y is
> selected, but can be disabled via the boot option noautogroup, and can also
> be turned on/off on the fly via..
>         echo [01] > /proc/sys/kernel/sched_autogroup_enabled.
> ..which will automatically move tasks to/from the root task group.
> 
> Signed-off-by: Mike Galbraith <efault@gmx.de>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> LKML-Reference: <1290281700.28711.9.camel@maggy.simson.net>
> Signed-off-by: Ingo Molnar <mingo@elte.hu> 

Looks good to me, Thanks Mike!

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-11-30 13:48                           ` Peter Zijlstra
@ 2010-11-30 13:59                             ` Ingo Molnar
  0 siblings, 0 replies; 79+ messages in thread
From: Ingo Molnar @ 2010-11-30 13:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mike Galbraith, Paul Turner, Linus Torvalds, Oleg Nesterov, LKML


* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> On Tue, 2010-11-30 at 14:18 +0100, Mike Galbraith wrote:
> > 
> > From: Mike Galbraith <efault@gmx.de>
> > Date: Tue Nov 30 14:07:12 CET 2010
> > Subject: [PATCH] sched: Improve desktop interactivity: Implement automated per session task groups
> > 
> > A recurring complaint from CFS users is that parallel kbuild has a negative
> > impact on desktop interactivity.  This patch implements an idea from Linus,
> > to automatically create task groups.  Currently, only per session autogroups
> > are implemented, but the patch leaves the way open for enhancement.
> > 
> > Implementation: each task's signal struct contains an inherited pointer to
> > a refcounted autogroup struct containing a task group pointer, the default
> > for all tasks pointing to the init_task_group.  When a task calls setsid(),
> > a new task group is created, the process is moved into the new task group,
> > and a reference to the previous task group is dropped.  Child processes
> > inherit this task group thereafter, and increase its refcount.  When the
> > last thread of a process exits, the process's reference is dropped, such
> > that when the last process referencing an autogroup exits, the autogroup
> > is destroyed.
> > 
> > At runqueue selection time, IFF a task has no cgroup assignment, its current
> > autogroup is used.
> > 
> > Autogroup bandwidth is controllable via setting its nice level through the
> > proc filesystem.  cat /proc/<pid>/autogroup displays the task's group and the
> > group's nice level.  echo <nice level> > /proc/<pid>/autogroup sets the task
> > group's shares to the weight of a nice <level> task.  Setting nice level is rate
> > limited for !admin users due to the abuse risk of task group locking.
> > 
> > The feature is enabled from boot by default if CONFIG_SCHED_AUTOGROUP=y is
> > selected, but can be disabled via the boot option noautogroup, and can also
> > be turned on/off on the fly via..
> >         echo [01] > /proc/sys/kernel/sched_autogroup_enabled.
> > ..which will automatically move tasks to/from the root task group.
> > 
> > Signed-off-by: Mike Galbraith <efault@gmx.de>
> > Cc: Oleg Nesterov <oleg@redhat.com>
> > Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > Cc: Linus Torvalds <torvalds@linux-foundation.org>
> > Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
> > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > LKML-Reference: <1290281700.28711.9.camel@maggy.simson.net>
> > Signed-off-by: Ingo Molnar <mingo@elte.hu> 
> 
> Looks good to me, Thanks Mike!
> 
> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>

Ok, great!

I've queued it up in tip:sched/core and started testing it - will push it out if it 
passes basic tests. Added Linus's Acked-by - i presume that's still valid for v4 as 
well, right?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-11-30 13:18                         ` Mike Galbraith
  2010-11-30 13:48                           ` Peter Zijlstra
@ 2010-11-30 14:13                           ` Ingo Molnar
  2010-11-30 16:41                             ` Mike Galbraith
  2010-11-30 15:17                           ` Vivek Goyal
  2 siblings, 1 reply; 79+ messages in thread
From: Ingo Molnar @ 2010-11-30 14:13 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Paul Turner, Peter Zijlstra, Linus Torvalds, Oleg Nesterov, LKML


* Mike Galbraith <efault@gmx.de> wrote:

> On Mon, 2010-11-29 at 20:23 -0800, Paul Turner wrote:
> 
> > I've left some machines running tip + fix above + autogroup to see if
> > anything else emerges.  Hasn't crashed yet, I'll leave it going
> > overnight.
> 
> Thanks.  Below is the hopefully final version against tip.  The last I
> sent contained a couple remnants.

Note, I removed this chunk:

>  kernel/sched_debug.c                |   29 ++--

> Index: linux-2.6/kernel/sched_debug.c
> ===================================================================
> --- linux-2.6.orig/kernel/sched_debug.c
> +++ linux-2.6/kernel/sched_debug.c
> @@ -87,6 +87,20 @@ static void print_cfs_group_stats(struct
>  }
>  #endif
>  
> +#if defined(CONFIG_CGROUP_SCHED) && \
> +	(defined(CONFIG_FAIR_GROUP_SCHED) || defined(CONFIG_RT_GROUP_SCHED))
> +static void task_group_path(struct task_group *tg, char *buf, int buflen)
> +{
> +	/* may be NULL if the underlying cgroup isn't fully-created yet */
> +	if (!tg->css.cgroup) {
> +		if (!autogroup_path(tg, buf, buflen))
> +			buf[0] = '\0';
> +		return;
> +	}
> +	cgroup_path(tg->css.cgroup, buf, buflen);
> +}
> +#endif
> +
>  static void
>  print_task(struct seq_file *m, struct rq *rq, struct task_struct *p)
>  {
> @@ -115,7 +129,7 @@ print_task(struct seq_file *m, struct rq
>  		char path[64];
>  
>  		rcu_read_lock();
> -		cgroup_path(task_group(p)->css.cgroup, path, sizeof(path));
> +		task_group_path(task_group(p), path, sizeof(path));
>  		rcu_read_unlock();
>  		SEQ_printf(m, " %s", path);
>  	}
> @@ -147,19 +161,6 @@ static void print_rq(struct seq_file *m,
>  	read_unlock_irqrestore(&tasklist_lock, flags);
>  }
>  
> -#if defined(CONFIG_CGROUP_SCHED) && \
> -	(defined(CONFIG_FAIR_GROUP_SCHED) || defined(CONFIG_RT_GROUP_SCHED))
> -static void task_group_path(struct task_group *tg, char *buf, int buflen)
> -{
> -	/* may be NULL if the underlying cgroup isn't fully-created yet */
> -	if (!tg->css.cgroup) {
> -		buf[0] = '\0';
> -		return;
> -	}
> -	cgroup_path(tg->css.cgroup, buf, buflen);
> -}
> -#endif
> -
>  void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
>  {
>  	s64 MIN_vruntime = -1, min_vruntime, max_vruntime = -1,

Because it didn't build (for obvious reasons - the CONFIG conditions don't match up), 
but more importantly it's quite ugly. Some existing 'path' variables are 64 byte, 
some are 128 byte - so there's pre-existing damage - i removed it all.
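
One way to keep the name formatting independent of the caller's buffer size is to have the helper report truncation itself. The following is a minimal userspace sketch, not the kernel code: format_autogroup_path() is a hypothetical stand-in for the patch's autogroup_path(), and the only thing taken from the patch is the "/autogroup-<id>" naming.

```c
#include <stdio.h>

/*
 * Hypothetical userspace stand-in for autogroup_path(): format the
 * "/autogroup-<id>" name into buf.  Returns the number of characters
 * written (excluding the terminating NUL), or -1 if the name did not
 * fit, in which case buf is left as the empty string.
 */
static int format_autogroup_path(unsigned long id, char *buf, size_t buflen)
{
	int n;

	if (!buf || buflen == 0)
		return -1;

	n = snprintf(buf, buflen, "/autogroup-%lu", id);
	if (n < 0 || (size_t)n >= buflen) {
		/* truncated or failed: hand back an empty path */
		buf[0] = '\0';
		return -1;
	}
	return n;
}
```

With a helper shaped like this, a 64 byte caller and a 128 byte caller behave the same way: either the full name is there or the path is empty.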

Could we do this debugging code in a bit saner way please? (as a delta patch on top 
of the -tip that i'll push out in the next hour or so.)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-11-30  7:54                   ` Mike Galbraith
@ 2010-11-30 14:18                     ` Ingo Molnar
  2010-11-30 14:53                       ` Ingo Molnar
  2010-11-30 16:28                       ` Mike Galbraith
  0 siblings, 2 replies; 79+ messages in thread
From: Ingo Molnar @ 2010-11-30 14:18 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Peter Zijlstra, Linus Torvalds, Oleg Nesterov, LKML, Paul Turner


* Mike Galbraith <efault@gmx.de> wrote:

> On Mon, 2010-11-29 at 20:20 +0100, Ingo Molnar wrote:
> > * Mike Galbraith <efault@gmx.de> wrote:
> > 
> > > > I know, from the testing so far we _thought_ it was fairly sane. Apparently 
> > > > there's still some work to do.
> > > 
> > > Damn thing bisected to:
> > > 
> > > commit 92fd4d4d67b945c0766416284d4ab236b31542c4
> > > Merge: fe7de49 e53beac
> > > Author: Ingo Molnar <mingo@elte.hu>
> > > Date:   Thu Nov 18 13:22:14 2010 +0100
> > > 
> > >     Merge commit 'v2.6.37-rc2' into sched/core
> > > 
> > >     Merge reason: Move to a .37-rc base.
> > > 
> > >     Signed-off-by: Ingo Molnar <mingo@elte.hu>
> > > 
> > > 92fd4d4d67b945c0766416284d4ab236b31542c4 is the first bad commit
> > 
> > Hm, i'd suggest to double check the two originator points:
> > 
> >   e53beac - is it really 'bad' ?
> >   fe7de49 - is it really 'good'?
> 
> Nope.   I did a bisection this morning in text mode with a pipe-test
> based measurement proggy, and it bisected cleanly.
> 
> 2069dd75c7d0f49355939e5586daf5a9ab216db7 is the first bad commit
> 
> commit 2069dd75c7d0f49355939e5586daf5a9ab216db7
> Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Date:   Mon Nov 15 15:47:00 2010 -0800
> 
>     sched: Rewrite tg_shares_up)

Ok. And has this fixed it:

  822bc180a7f7: sched: Fix unregister_fair_sched_group()

... or are there two bugs?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-11-30 14:18                     ` Ingo Molnar
@ 2010-11-30 14:53                       ` Ingo Molnar
  2010-11-30 15:01                         ` Peter Zijlstra
  2010-11-30 16:28                       ` Mike Galbraith
  1 sibling, 1 reply; 79+ messages in thread
From: Ingo Molnar @ 2010-11-30 14:53 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Peter Zijlstra, Linus Torvalds, Oleg Nesterov, LKML, Paul Turner


another detail is that i needed this fix:

--- linux.orig/init/Kconfig
+++ linux/init/Kconfig
@@ -790,6 +790,7 @@ endif # NAMESPACES
 
 config SCHED_AUTOGROUP
 	bool "Automatic process group scheduling"
+	select EVENTFD
 	select CGROUPS
 	select CGROUP_SCHED
 	select FAIR_GROUP_SCHED

Because CGROUPS depends on eventfd infrastructure.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-11-30 14:53                       ` Ingo Molnar
@ 2010-11-30 15:01                         ` Peter Zijlstra
  2010-11-30 15:11                           ` Ingo Molnar
  0 siblings, 1 reply; 79+ messages in thread
From: Peter Zijlstra @ 2010-11-30 15:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Mike Galbraith, Linus Torvalds, Oleg Nesterov, LKML, Paul Turner

On Tue, 2010-11-30 at 15:53 +0100, Ingo Molnar wrote:
> another detail is that i needed this fix:
> 
> --- linux.orig/init/Kconfig
> +++ linux/init/Kconfig
> @@ -790,6 +790,7 @@ endif # NAMESPACES
>  
>  config SCHED_AUTOGROUP
>  	bool "Automatic process group scheduling"
> +	select EVENTFD
>  	select CGROUPS
>  	select CGROUP_SCHED
>  	select FAIR_GROUP_SCHED
> 
> Because CGROUPS depends on eventfd infrastructure.

Shouldn't then cgroups select that?

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-11-30 15:01                         ` Peter Zijlstra
@ 2010-11-30 15:11                           ` Ingo Molnar
  0 siblings, 0 replies; 79+ messages in thread
From: Ingo Molnar @ 2010-11-30 15:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mike Galbraith, Linus Torvalds, Oleg Nesterov, LKML, Paul Turner


* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> On Tue, 2010-11-30 at 15:53 +0100, Ingo Molnar wrote:
> > another detail is that i needed this fix:
> > 
> > --- linux.orig/init/Kconfig
> > +++ linux/init/Kconfig
> > @@ -790,6 +790,7 @@ endif # NAMESPACES
> >  
> >  config SCHED_AUTOGROUP
> >  	bool "Automatic process group scheduling"
> > +	select EVENTFD
> >  	select CGROUPS
> >  	select CGROUP_SCHED
> >  	select FAIR_GROUP_SCHED
> > 
> > Because CGROUPS depends on eventfd infrastructure.
> 
> Shouldn't then cgroups select that?

It depends on it so selecting it is fine. !EVENTFD is a CONFIG_EMBEDDED-only thing 
anyway.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-11-30 13:18                         ` Mike Galbraith
  2010-11-30 13:48                           ` Peter Zijlstra
  2010-11-30 14:13                           ` Ingo Molnar
@ 2010-11-30 15:17                           ` Vivek Goyal
  2010-11-30 17:13                             ` Mike Galbraith
  2 siblings, 1 reply; 79+ messages in thread
From: Vivek Goyal @ 2010-11-30 15:17 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Paul Turner, Ingo Molnar, Peter Zijlstra, Linus Torvalds,
	Oleg Nesterov, LKML

On Tue, Nov 30, 2010 at 02:18:03PM +0100, Mike Galbraith wrote:
> On Mon, 2010-11-29 at 20:23 -0800, Paul Turner wrote:
> 
> > I've left some machines running tip + fix above + autogroup to see if
> > anything else emerges.  Hasn't crashed yet, I'll leave it going
> > overnight.
> 
> Thanks.  Below is the hopefully final version against tip.  The last I
> sent contained a couple remnants.
> 
> From: Mike Galbraith <efault@gmx.de>
> Date: Tue Nov 30 14:07:12 CET 2010
> Subject: [PATCH] sched: Improve desktop interactivity: Implement automated per session task groups
> 
> A recurring complaint from CFS users is that parallel kbuild has a negative
> impact on desktop interactivity.  This patch implements an idea from Linus,
> to automatically create task groups.  Currently, only per session autogroups
> are implemented, but the patch leaves the way open for enhancement.
> 
> Implementation: each task's signal struct contains an inherited pointer to
> a refcounted autogroup struct containing a task group pointer, the default
> for all tasks pointing to the init_task_group.  When a task calls setsid(),
> a new task group is created, the process is moved into the new task group,
> and a reference to the previous task group is dropped.  Child processes
> inherit this task group thereafter, and increase its refcount.  When the
> last thread of a process exits, the process's reference is dropped, such
> that when the last process referencing an autogroup exits, the autogroup
> is destroyed.
> 
> At runqueue selection time, IFF a task has no cgroup assignment, its current
> autogroup is used.
> 
> Autogroup bandwidth is controllable via setting its nice level through the
> proc filesystem.  cat /proc/<pid>/autogroup displays the task's group and the
> group's nice level.  echo <nice level> > /proc/<pid>/autogroup sets the task
> group's shares to the weight of a nice <level> task.  Setting nice level is rate
> limited for !admin users due to the abuse risk of task group locking.
> 
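The nice-to-shares mapping described in the quoted changelog can be sketched in userspace as follows. This is an illustration only: autogroup_nice_to_shares() is a hypothetical name, and the table is a copy of the scheduler's prio_to_weight[] array, which the patch indexes with nice + 20.

```c
#include <errno.h>

/* Copy of the scheduler's nice-to-weight table; nice 0 maps to 1024. */
static const int prio_to_weight[40] = {
	/* -20 */ 88761, 71755, 56483, 46273, 36291,
	/* -15 */ 29154, 23254, 18705, 14949, 11916,
	/* -10 */  9548,  7620,  6100,  4904,  3906,
	/*  -5 */  3121,  2501,  1991,  1586,  1277,
	/*   0 */  1024,   820,   655,   526,   423,
	/*   5 */   335,   272,   215,   172,   137,
	/*  10 */   110,    87,    70,    56,    45,
	/*  15 */    36,    29,    23,    18,    15,
};

/*
 * Hypothetical helper: return the group shares value that corresponds
 * to a nice level, using the same range check and table lookup as the
 * patch's proc_sched_autogroup_set_nice().  Returns -EINVAL for a nice
 * level outside -20..19.
 */
static int autogroup_nice_to_shares(int nice)
{
	if (nice < -20 || nice > 19)
		return -EINVAL;

	return prio_to_weight[nice + 20];
}
```

Each step in the table changes the weight by roughly 25%, so writing 5 to /proc/<pid>/autogroup would give that group about a third of the weight of a group left at nice 0.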

Hi Mike,

I was wondering if these autogroups can be made visible in the regular cgroup
hierarchy, so that once somebody mounts the cpu controller, they show up there?

I was also wondering why it is a good idea to create a separate interface for
autogroups through proc rather than integrating it with the cgroup interface.

Without that, any user space tool will have to either disable the
autogroup feature completely or also worry about the /proc interface,
where autogroups can only be looked up through a pid and there is no
direct way to access them.

IIUC, these autogroups create a flat setup: they are at the same level as
init_task_group and are not children of it. Currently the cpu cgroup
is hierarchical by default and any new cgroup is a child of init_task_group,
so that could lead to representation issues.

Well, will we not get the same kind of latency boost if we make these autogroups
children of root? If yes, then the hierarchical representation issue of autogroups
will be a moot point.

We already have the /proc/<pid>/cgroup interface, which points to a task's
cgroup. We can probably avoid creating /proc/<pid>/autogroup if there
is an associated cgroup which appears in the cgroup hierarchy; the user
can then change the weight of the group through the cgroup interface (instead of
introducing another interface).
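
To make the two-interface concern concrete, here is a minimal userspace sketch of what a tool would have to do to see both views of a task. The helper names read_proc_line() and show_task_grouping() are hypothetical; the only things assumed from the thread are the two file names, /proc/<pid>/autogroup and /proc/<pid>/cgroup.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read the first line of /proc/<pid>/<name> into buf.
 * Returns 0 on success, -1 if the file is missing or unreadable
 * (expected for "autogroup" when CONFIG_SCHED_AUTOGROUP is off).
 */
static int read_proc_line(pid_t pid, const char *name, char *buf, size_t len)
{
	char path[64];
	FILE *f;

	assert(buf && len > 0);
	buf[0] = '\0';

	snprintf(path, sizeof(path), "/proc/%ld/%s", (long)pid, name);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		buf[0] = '\0';
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';	/* strip trailing newline */
	return 0;
}

/* Print both views of where a task is grouped, for comparison. */
static void show_task_grouping(pid_t pid)
{
	char line[256];

	if (read_proc_line(pid, "autogroup", line, sizeof(line)) == 0)
		printf("autogroup: %s\n", line);
	else
		printf("autogroup: (not available)\n");

	if (read_proc_line(pid, "cgroup", line, sizeof(line)) == 0)
		printf("cgroup:    %s\n", line);
	else
		printf("cgroup:    (not available)\n");
}
```

Calling show_task_grouping(getpid()) prints whatever the running kernel exposes; the point is only that the autogroup has to be found per pid, since nothing enumerates autogroups the way a mounted cgroup hierarchy enumerates cgroups.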

Thanks
Vivek

> The feature is enabled from boot by default if CONFIG_SCHED_AUTOGROUP=y is
> selected, but can be disabled via the boot option noautogroup, and can also
> be turned on/off on the fly via..
> 	echo [01] > /proc/sys/kernel/sched_autogroup_enabled.
> ..which will automatically move tasks to/from the root task group.
> 
> Signed-off-by: Mike Galbraith <efault@gmx.de>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> LKML-Reference: <1290281700.28711.9.camel@maggy.simson.net>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> ---
>  Documentation/kernel-parameters.txt |    2 
>  fs/proc/base.c                      |   79 ++++++++++++
>  include/linux/sched.h               |   23 +++
>  init/Kconfig                        |   12 +
>  kernel/fork.c                       |    5 
>  kernel/sched.c                      |   13 +-
>  kernel/sched_autogroup.c            |  229 ++++++++++++++++++++++++++++++++++++
>  kernel/sched_autogroup.h            |   32 +++++
>  kernel/sched_debug.c                |   29 ++--
>  kernel/sys.c                        |    4 
>  kernel/sysctl.c                     |   11 +
>  11 files changed, 421 insertions(+), 18 deletions(-)
> 
> Index: linux-2.6/include/linux/sched.h
> ===================================================================
> --- linux-2.6.orig/include/linux/sched.h
> +++ linux-2.6/include/linux/sched.h
> @@ -513,6 +513,8 @@ struct thread_group_cputimer {
>  	spinlock_t lock;
>  };
>  
> +struct autogroup;
> +
>  /*
>   * NOTE! "signal_struct" does not have it's own
>   * locking, because a shared signal_struct always
> @@ -580,6 +582,9 @@ struct signal_struct {
>  
>  	struct tty_struct *tty; /* NULL if no tty */
>  
> +#ifdef CONFIG_SCHED_AUTOGROUP
> +	struct autogroup *autogroup;
> +#endif
>  	/*
>  	 * Cumulative resource counters for dead threads in the group,
>  	 * and for reaped dead child processes forked by this group.
> @@ -1931,6 +1936,24 @@ int sched_rt_handler(struct ctl_table *t
>  
>  extern unsigned int sysctl_sched_compat_yield;
>  
> +#ifdef CONFIG_SCHED_AUTOGROUP
> +extern unsigned int sysctl_sched_autogroup_enabled;
> +
> +extern void sched_autogroup_create_attach(struct task_struct *p);
> +extern void sched_autogroup_detach(struct task_struct *p);
> +extern void sched_autogroup_fork(struct signal_struct *sig);
> +extern void sched_autogroup_exit(struct signal_struct *sig);
> +#ifdef CONFIG_PROC_FS
> +extern void proc_sched_autogroup_show_task(struct task_struct *p, struct seq_file *m);
> +extern int proc_sched_autogroup_set_nice(struct task_struct *p, int *nice);
> +#endif
> +#else
> +static inline void sched_autogroup_create_attach(struct task_struct *p) { }
> +static inline void sched_autogroup_detach(struct task_struct *p) { }
> +static inline void sched_autogroup_fork(struct signal_struct *sig) { }
> +static inline void sched_autogroup_exit(struct signal_struct *sig) { }
> +#endif
> +
>  #ifdef CONFIG_RT_MUTEXES
>  extern int rt_mutex_getprio(struct task_struct *p);
>  extern void rt_mutex_setprio(struct task_struct *p, int prio);
> Index: linux-2.6/kernel/sched.c
> ===================================================================
> --- linux-2.6.orig/kernel/sched.c
> +++ linux-2.6/kernel/sched.c
> @@ -79,6 +79,7 @@
>  
>  #include "sched_cpupri.h"
>  #include "workqueue_sched.h"
> +#include "sched_autogroup.h"
>  
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/sched.h>
> @@ -271,6 +272,10 @@ struct task_group {
>  	struct task_group *parent;
>  	struct list_head siblings;
>  	struct list_head children;
> +
> +#ifdef CONFIG_SCHED_AUTOGROUP
> +	struct autogroup *autogroup;
> +#endif
>  };
>  
>  #define root_task_group init_task_group
> @@ -603,11 +608,14 @@ static inline int cpu_of(struct rq *rq)
>   */
>  static inline struct task_group *task_group(struct task_struct *p)
>  {
> +	struct task_group *tg;
>  	struct cgroup_subsys_state *css;
>  
>  	css = task_subsys_state_check(p, cpu_cgroup_subsys_id,
>  			lockdep_is_held(&task_rq(p)->lock));
> -	return container_of(css, struct task_group, css);
> +	tg = container_of(css, struct task_group, css);
> +
> +	return autogroup_task_group(p, tg);
>  }
>  
>  /* Change a task's cfs_rq and parent entity if it moves across CPUs/groups */
> @@ -1869,6 +1877,7 @@ static void sched_irq_time_avg_update(st
>  #include "sched_idletask.c"
>  #include "sched_fair.c"
>  #include "sched_rt.c"
> +#include "sched_autogroup.c"
>  #include "sched_stoptask.c"
>  #ifdef CONFIG_SCHED_DEBUG
>  # include "sched_debug.c"
> @@ -7752,7 +7761,7 @@ void __init sched_init(void)
>  #ifdef CONFIG_CGROUP_SCHED
>  	list_add(&init_task_group.list, &task_groups);
>  	INIT_LIST_HEAD(&init_task_group.children);
> -
> +	autogroup_init(&init_task);
>  #endif /* CONFIG_CGROUP_SCHED */
>  
>  	for_each_possible_cpu(i) {
> Index: linux-2.6/kernel/fork.c
> ===================================================================
> --- linux-2.6.orig/kernel/fork.c
> +++ linux-2.6/kernel/fork.c
> @@ -174,8 +174,10 @@ static inline void free_signal_struct(st
>  
>  static inline void put_signal_struct(struct signal_struct *sig)
>  {
> -	if (atomic_dec_and_test(&sig->sigcnt))
> +	if (atomic_dec_and_test(&sig->sigcnt)) {
> +		sched_autogroup_exit(sig);
>  		free_signal_struct(sig);
> +	}
>  }
>  
>  void __put_task_struct(struct task_struct *tsk)
> @@ -904,6 +906,7 @@ static int copy_signal(unsigned long clo
>  	posix_cpu_timers_init_group(sig);
>  
>  	tty_audit_fork(sig);
> +	sched_autogroup_fork(sig);
>  
>  	sig->oom_adj = current->signal->oom_adj;
>  	sig->oom_score_adj = current->signal->oom_score_adj;
> Index: linux-2.6/kernel/sys.c
> ===================================================================
> --- linux-2.6.orig/kernel/sys.c
> +++ linux-2.6/kernel/sys.c
> @@ -1080,8 +1080,10 @@ SYSCALL_DEFINE0(setsid)
>  	err = session;
>  out:
>  	write_unlock_irq(&tasklist_lock);
> -	if (err > 0)
> +	if (err > 0) {
>  		proc_sid_connector(group_leader);
> +		sched_autogroup_create_attach(group_leader);
> +	}
>  	return err;
>  }
>  
> Index: linux-2.6/kernel/sched_debug.c
> ===================================================================
> --- linux-2.6.orig/kernel/sched_debug.c
> +++ linux-2.6/kernel/sched_debug.c
> @@ -87,6 +87,20 @@ static void print_cfs_group_stats(struct
>  }
>  #endif
>  
> +#if defined(CONFIG_CGROUP_SCHED) && \
> +	(defined(CONFIG_FAIR_GROUP_SCHED) || defined(CONFIG_RT_GROUP_SCHED))
> +static void task_group_path(struct task_group *tg, char *buf, int buflen)
> +{
> +	/* may be NULL if the underlying cgroup isn't fully-created yet */
> +	if (!tg->css.cgroup) {
> +		if (!autogroup_path(tg, buf, buflen))
> +			buf[0] = '\0';
> +		return;
> +	}
> +	cgroup_path(tg->css.cgroup, buf, buflen);
> +}
> +#endif
> +
>  static void
>  print_task(struct seq_file *m, struct rq *rq, struct task_struct *p)
>  {
> @@ -115,7 +129,7 @@ print_task(struct seq_file *m, struct rq
>  		char path[64];
>  
>  		rcu_read_lock();
> -		cgroup_path(task_group(p)->css.cgroup, path, sizeof(path));
> +		task_group_path(task_group(p), path, sizeof(path));
>  		rcu_read_unlock();
>  		SEQ_printf(m, " %s", path);
>  	}
> @@ -147,19 +161,6 @@ static void print_rq(struct seq_file *m,
>  	read_unlock_irqrestore(&tasklist_lock, flags);
>  }
>  
> -#if defined(CONFIG_CGROUP_SCHED) && \
> -	(defined(CONFIG_FAIR_GROUP_SCHED) || defined(CONFIG_RT_GROUP_SCHED))
> -static void task_group_path(struct task_group *tg, char *buf, int buflen)
> -{
> -	/* may be NULL if the underlying cgroup isn't fully-created yet */
> -	if (!tg->css.cgroup) {
> -		buf[0] = '\0';
> -		return;
> -	}
> -	cgroup_path(tg->css.cgroup, buf, buflen);
> -}
> -#endif
> -
>  void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
>  {
>  	s64 MIN_vruntime = -1, min_vruntime, max_vruntime = -1,
> Index: linux-2.6/fs/proc/base.c
> ===================================================================
> --- linux-2.6.orig/fs/proc/base.c
> +++ linux-2.6/fs/proc/base.c
> @@ -1407,6 +1407,82 @@ static const struct file_operations proc
>  
>  #endif
>  
> +#ifdef CONFIG_SCHED_AUTOGROUP
> +/*
> + * Print out autogroup related information:
> + */
> +static int sched_autogroup_show(struct seq_file *m, void *v)
> +{
> +	struct inode *inode = m->private;
> +	struct task_struct *p;
> +
> +	p = get_proc_task(inode);
> +	if (!p)
> +		return -ESRCH;
> +	proc_sched_autogroup_show_task(p, m);
> +
> +	put_task_struct(p);
> +
> +	return 0;
> +}
> +
> +static ssize_t
> +sched_autogroup_write(struct file *file, const char __user *buf,
> +	    size_t count, loff_t *offset)
> +{
> +	struct inode *inode = file->f_path.dentry->d_inode;
> +	struct task_struct *p;
> +	char buffer[PROC_NUMBUF];
> +	long nice;
> +	int err;
> +
> +	memset(buffer, 0, sizeof(buffer));
> +	if (count > sizeof(buffer) - 1)
> +		count = sizeof(buffer) - 1;
> +	if (copy_from_user(buffer, buf, count))
> +		return -EFAULT;
> +
> +	err = strict_strtol(strstrip(buffer), 0, &nice);
> +	if (err)
> +		return -EINVAL;
> +
> +	p = get_proc_task(inode);
> +	if (!p)
> +		return -ESRCH;
> +
> +	err = nice;
> +	err = proc_sched_autogroup_set_nice(p, &err);
> +	if (err)
> +		count = err;
> +
> +	put_task_struct(p);
> +
> +	return count;
> +}
> +
> +static int sched_autogroup_open(struct inode *inode, struct file *filp)
> +{
> +	int ret;
> +
> +	ret = single_open(filp, sched_autogroup_show, NULL);
> +	if (!ret) {
> +		struct seq_file *m = filp->private_data;
> +
> +		m->private = inode;
> +	}
> +	return ret;
> +}
> +
> +static const struct file_operations proc_pid_sched_autogroup_operations = {
> +	.open		= sched_autogroup_open,
> +	.read		= seq_read,
> +	.write		= sched_autogroup_write,
> +	.llseek		= seq_lseek,
> +	.release	= single_release,
> +};
> +
> +#endif /* CONFIG_SCHED_AUTOGROUP */
> +
>  static ssize_t comm_write(struct file *file, const char __user *buf,
>  				size_t count, loff_t *offset)
>  {
> @@ -2733,6 +2809,9 @@ static const struct pid_entry tgid_base_
>  #ifdef CONFIG_SCHED_DEBUG
>  	REG("sched",      S_IRUGO|S_IWUSR, proc_pid_sched_operations),
>  #endif
> +#ifdef CONFIG_SCHED_AUTOGROUP
> +	REG("autogroup",  S_IRUGO|S_IWUSR, proc_pid_sched_autogroup_operations),
> +#endif
>  	REG("comm",      S_IRUGO|S_IWUSR, proc_pid_set_comm_operations),
>  #ifdef CONFIG_HAVE_ARCH_TRACEHOOK
>  	INF("syscall",    S_IRUSR, proc_pid_syscall),
> Index: linux-2.6/kernel/sched_autogroup.h
> ===================================================================
> --- /dev/null
> +++ linux-2.6/kernel/sched_autogroup.h
> @@ -0,0 +1,32 @@
> +#ifdef CONFIG_SCHED_AUTOGROUP
> +
> +struct autogroup {
> +	struct kref		kref;
> +	struct task_group	*tg;
> +	struct rw_semaphore	lock;
> +	unsigned long		id;
> +	int			nice;
> +};
> +
> +static inline struct task_group *
> +autogroup_task_group(struct task_struct *p, struct task_group *tg);
> +
> +#else /* !CONFIG_SCHED_AUTOGROUP */
> +
> +static inline void autogroup_init(struct task_struct *init_task) {  }
> +static inline void autogroup_free(struct task_group *tg) { }
> +
> +static inline struct task_group *
> +autogroup_task_group(struct task_struct *p, struct task_group *tg)
> +{
> +	return tg;
> +}
> +
> +#ifdef CONFIG_SCHED_DEBUG
> +static inline int autogroup_path(struct task_group *tg, char *buf, int buflen)
> +{
> +	return 0;
> +}
> +#endif
> +
> +#endif /* CONFIG_SCHED_AUTOGROUP */
> Index: linux-2.6/kernel/sched_autogroup.c
> ===================================================================
> --- /dev/null
> +++ linux-2.6/kernel/sched_autogroup.c
> @@ -0,0 +1,229 @@
> +#ifdef CONFIG_SCHED_AUTOGROUP
> +
> +#include <linux/proc_fs.h>
> +#include <linux/seq_file.h>
> +#include <linux/kallsyms.h>
> +#include <linux/utsname.h>
> +
> +unsigned int __read_mostly sysctl_sched_autogroup_enabled = 1;
> +static struct autogroup autogroup_default;
> +static atomic_t autogroup_seq_nr;
> +
> +static void autogroup_init(struct task_struct *init_task)
> +{
> +	autogroup_default.tg = &init_task_group;
> +	init_task_group.autogroup = &autogroup_default;
> +	kref_init(&autogroup_default.kref);
> +	init_rwsem(&autogroup_default.lock);
> +	init_task->signal->autogroup = &autogroup_default;
> +}
> +
> +static inline void autogroup_free(struct task_group *tg)
> +{
> +	kfree(tg->autogroup);
> +}
> +
> +static inline void autogroup_destroy(struct kref *kref)
> +{
> +	struct autogroup *ag = container_of(kref, struct autogroup, kref);
> +
> +	sched_destroy_group(ag->tg);
> +}
> +
> +static inline void autogroup_kref_put(struct autogroup *ag)
> +{
> +	kref_put(&ag->kref, autogroup_destroy);
> +}
> +
> +static inline struct autogroup *autogroup_kref_get(struct autogroup *ag)
> +{
> +	kref_get(&ag->kref);
> +	return ag;
> +}
> +
> +static inline struct autogroup *autogroup_create(void)
> +{
> +	struct autogroup *ag = kzalloc(sizeof(*ag), GFP_KERNEL);
> +	struct task_group *tg;
> +
> +	if (!ag)
> +		goto out_fail;
> +
> +	tg = sched_create_group(&init_task_group);
> +
> +	if (IS_ERR(tg))
> +		goto out_free;
> +
> +	kref_init(&ag->kref);
> +	init_rwsem(&ag->lock);
> +	ag->id = atomic_inc_return(&autogroup_seq_nr);
> +	ag->tg = tg;
> +	tg->autogroup = ag;
> +
> +	return ag;
> +
> +out_free:
> +	kfree(ag);
> +out_fail:
> +	if (printk_ratelimit()) {
> +		printk(KERN_WARNING "autogroup_create: %s failure.\n",
> +			ag ? "sched_create_group()" : "kmalloc()");
> +	}
> +
> +	return autogroup_kref_get(&autogroup_default);
> +}
> +
> +static inline bool
> +task_wants_autogroup(struct task_struct *p, struct task_group *tg)
> +{
> +	if (tg != &root_task_group)
> +		return false;
> +
> +	if (p->sched_class != &fair_sched_class)
> +		return false;
> +
> +	/*
> +	 * We can only assume the task group can't go away on us if
> +	 * autogroup_move_group() can see us on ->thread_group list.
> +	 */
> +	if (p->flags & PF_EXITING)
> +		return false;
> +
> +	return true;
> +}
> +
> +static inline struct task_group *
> +autogroup_task_group(struct task_struct *p, struct task_group *tg)
> +{
> +	int enabled = ACCESS_ONCE(sysctl_sched_autogroup_enabled);
> +
> +	if (enabled && task_wants_autogroup(p, tg))
> +		return p->signal->autogroup->tg;
> +
> +	return tg;
> +}
> +
> +static void
> +autogroup_move_group(struct task_struct *p, struct autogroup *ag)
> +{
> +	struct autogroup *prev;
> +	struct task_struct *t;
> +	unsigned long flags;
> +
> +	BUG_ON(!lock_task_sighand(p, &flags));
> +
> +	prev = p->signal->autogroup;
> +	if (prev == ag) {
> +		unlock_task_sighand(p, &flags);
> +		return;
> +	}
> +
> +	p->signal->autogroup = autogroup_kref_get(ag);
> +
> +	t = p;
> +	do {
> +		sched_move_task(t);
> +	} while_each_thread(p, t);
> +
> +	unlock_task_sighand(p, &flags);
> +	autogroup_kref_put(prev);
> +}
> +
> +/* Allocates GFP_KERNEL, cannot be called under any spinlock */
> +void sched_autogroup_create_attach(struct task_struct *p)
> +{
> +	struct autogroup *ag = autogroup_create();
> +
> +	autogroup_move_group(p, ag);
> +	/* drop extra refrence added by autogroup_create() */
> +	autogroup_kref_put(ag);
> +}
> +EXPORT_SYMBOL(sched_autogroup_create_attach);
> +
> +/* Cannot be called under siglock.  Currently has no users */
> +void sched_autogroup_detach(struct task_struct *p)
> +{
> +	autogroup_move_group(p, &autogroup_default);
> +}
> +EXPORT_SYMBOL(sched_autogroup_detach);
> +
> +void sched_autogroup_fork(struct signal_struct *sig)
> +{
> +	struct task_struct *p = current;
> +
> +	spin_lock_irq(&p->sighand->siglock);
> +	sig->autogroup = autogroup_kref_get(p->signal->autogroup);
> +	spin_unlock_irq(&p->sighand->siglock);
> +}
> +
> +void sched_autogroup_exit(struct signal_struct *sig)
> +{
> +	autogroup_kref_put(sig->autogroup);
> +}
> +
> +static int __init setup_autogroup(char *str)
> +{
> +	sysctl_sched_autogroup_enabled = 0;
> +
> +	return 1;
> +}
> +
> +__setup("noautogroup", setup_autogroup);
> +
> +#ifdef CONFIG_PROC_FS
> +
> +/* Called with siglock held. */
> +int proc_sched_autogroup_set_nice(struct task_struct *p, int *nice)
> +{
> +	static unsigned long next = INITIAL_JIFFIES;
> +	struct autogroup *ag;
> +	int err;
> +
> +	if (*nice < -20 || *nice > 19)
> +		return -EINVAL;
> +
> +	err = security_task_setnice(current, *nice);
> +	if (err)
> +		return err;
> +
> +	if (*nice < 0 && !can_nice(current, *nice))
> +		return -EPERM;
> +
> +	/* this is a heavy operation taking global locks.. */
> +	if (!capable(CAP_SYS_ADMIN) && time_before(jiffies, next))
> +		return -EAGAIN;
> +
> +	next = HZ / 10 + jiffies;
> +	ag = autogroup_kref_get(p->signal->autogroup);
> +
> +	down_write(&ag->lock);
> +	err = sched_group_set_shares(ag->tg, prio_to_weight[*nice + 20]);
> +	if (!err)
> +		ag->nice = *nice;
> +	up_write(&ag->lock);
> +
> +	autogroup_kref_put(ag);
> +
> +	return err;
> +}
> +
> +void proc_sched_autogroup_show_task(struct task_struct *p, struct seq_file *m)
> +{
> +	struct autogroup *ag = autogroup_kref_get(p->signal->autogroup);
> +
> +	down_read(&ag->lock);
> +	seq_printf(m, "/autogroup-%ld nice %d\n", ag->id, ag->nice);
> +	up_read(&ag->lock);
> +
> +	autogroup_kref_put(ag);
> +}
> +#endif /* CONFIG_PROC_FS */
> +
> +#ifdef CONFIG_SCHED_DEBUG
> +static inline int autogroup_path(struct task_group *tg, char *buf, int buflen)
> +{
> +	return snprintf(buf, buflen, "%s-%ld", "/autogroup", tg->autogroup->id);
> +}
> +#endif /* CONFIG_SCHED_DEBUG */
> +
> +#endif /* CONFIG_SCHED_AUTOGROUP */
> Index: linux-2.6/kernel/sysctl.c
> ===================================================================
> --- linux-2.6.orig/kernel/sysctl.c
> +++ linux-2.6/kernel/sysctl.c
> @@ -370,6 +370,17 @@ static struct ctl_table kern_table[] = {
>  		.mode		= 0644,
>  		.proc_handler	= proc_dointvec,
>  	},
> +#ifdef CONFIG_SCHED_AUTOGROUP
> +	{
> +		.procname	= "sched_autogroup_enabled",
> +		.data		= &sysctl_sched_autogroup_enabled,
> +		.maxlen		= sizeof(unsigned int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec,
> +		.extra1		= &zero,
> +		.extra2		= &one,
> +	},
> +#endif
>  #ifdef CONFIG_PROVE_LOCKING
>  	{
>  		.procname	= "prove_locking",
> Index: linux-2.6/init/Kconfig
> ===================================================================
> --- linux-2.6.orig/init/Kconfig
> +++ linux-2.6/init/Kconfig
> @@ -741,6 +741,18 @@ config NET_NS
>  
>  endif # NAMESPACES
>  
> +config SCHED_AUTOGROUP
> +	bool "Automatic process group scheduling"
> +	select CGROUPS
> +	select CGROUP_SCHED
> +	select FAIR_GROUP_SCHED
> +	help
> +	  This option optimizes the scheduler for common desktop workloads by
> +	  automatically creating and populating task groups.  This separation
> +	  of workloads isolates aggressive CPU burners (like build jobs) from
> +	  desktop applications.  Task group autogeneration is currently based
> +	  upon task session.
> +
>  config MM_OWNER
>  	bool
>  
> Index: linux-2.6/Documentation/kernel-parameters.txt
> ===================================================================
> --- linux-2.6.orig/Documentation/kernel-parameters.txt
> +++ linux-2.6/Documentation/kernel-parameters.txt
> @@ -1622,6 +1622,8 @@ and is between 256 and 4096 characters.
>  	noapic		[SMP,APIC] Tells the kernel to not make use of any
>  			IOAPICs that may be present in the system.
>  
> +	noautogroup	Disable scheduler automatic task group creation.
> +
>  	nobats		[PPC] Do not use BATs for mapping kernel lowmem
>  			on "Classic" PPC cores.
>  

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-11-30 14:18                     ` Ingo Molnar
  2010-11-30 14:53                       ` Ingo Molnar
@ 2010-11-30 16:28                       ` Mike Galbraith
  1 sibling, 0 replies; 79+ messages in thread
From: Mike Galbraith @ 2010-11-30 16:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Linus Torvalds, Oleg Nesterov, LKML, Paul Turner

On Tue, 2010-11-30 at 15:18 +0100, Ingo Molnar wrote:
> * Mike Galbraith <efault@gmx.de> wrote:
> 
> > On Mon, 2010-11-29 at 20:20 +0100, Ingo Molnar wrote:
> > > * Mike Galbraith <efault@gmx.de> wrote:
> > > 
> > > > > I know, from the testing so far we _thought_ it was fairly sane. Apparently 
> > > > > there's still some work to do.
> > > > 
> > > > Damn thing bisected to:
> > > > 
> > > > commit 92fd4d4d67b945c0766416284d4ab236b31542c4
> > > > Merge: fe7de49 e53beac
> > > > Author: Ingo Molnar <mingo@elte.hu>
> > > > Date:   Thu Nov 18 13:22:14 2010 +0100
> > > > 
> > > >     Merge commit 'v2.6.37-rc2' into sched/core
> > > > 
> > > >     Merge reason: Move to a .37-rc base.
> > > > 
> > > >     Signed-off-by: Ingo Molnar <mingo@elte.hu>
> > > > 
> > > > 92fd4d4d67b945c0766416284d4ab236b31542c4 is the first bad commit
> > > 
> > > Hm, i'd suggest to double check the two originator points:
> > > 
> > >   e53beac - is it really 'bad' ?
> > >   fe7de49 - is it really 'good'?
> > 
> > Nope.   I did a bisection this morning in text mode with a pipe-test
> > based measurement proggy, and it bisected cleanly.
> > 
> > 2069dd75c7d0f49355939e5586daf5a9ab216db7 is the first bad commit
> > 
> > commit 2069dd75c7d0f49355939e5586daf5a9ab216db7
> > Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > Date:   Mon Nov 15 15:47:00 2010 -0800
> > 
> >     sched: Rewrite tg_shares_up)
> 
> Ok. And has this fixed it:
> 
>   822bc180a7f7: sched: Fix unregister_fair_sched_group()
> 
> ... or are there two bugs?

Two bugs.  822bc180a7f7 fixes the explosions that were happening in tip.
The interactivity issue is some problem in the update_shares() stuff.

	-Mike


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-11-30 14:13                           ` Ingo Molnar
@ 2010-11-30 16:41                             ` Mike Galbraith
  0 siblings, 0 replies; 79+ messages in thread
From: Mike Galbraith @ 2010-11-30 16:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Paul Turner, Peter Zijlstra, Linus Torvalds, Oleg Nesterov, LKML

On Tue, 2010-11-30 at 15:13 +0100, Ingo Molnar wrote:
> * Mike Galbraith <efault@gmx.de> wrote:
> 
> > On Mon, 2010-11-29 at 20:23 -0800, Paul Turner wrote:
> > 
> > > I've left some machines running tip + fix above + autogroup to see if
> > > anything else emerges.  Hasn't crashed yet, I'll leave it going
> > > overnight.
> > 
> > Thanks.  Below is the hopefully final version against tip.  The last I
> > sent contained a couple remnants.
> 
> Note, I removed this chunk:
> 
> >  kernel/sched_debug.c                |   29 ++--
> 
> > Index: linux-2.6/kernel/sched_debug.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/sched_debug.c
> > +++ linux-2.6/kernel/sched_debug.c
> > @@ -87,6 +87,20 @@ static void print_cfs_group_stats(struct
> >  }
> >  #endif
> >  
> > +#if defined(CONFIG_CGROUP_SCHED) && \
> > +	(defined(CONFIG_FAIR_GROUP_SCHED) || defined(CONFIG_RT_GROUP_SCHED))
> > +static void task_group_path(struct task_group *tg, char *buf, int buflen)
> > +{
> > +	/* may be NULL if the underlying cgroup isn't fully-created yet */
> > +	if (!tg->css.cgroup) {
> > +		if (!autogroup_path(tg, buf, buflen))
> > +			buf[0] = '\0';
> > +		return;
> > +	}
> > +	cgroup_path(tg->css.cgroup, buf, buflen);
> > +}
> > +#endif
> > +
> >  static void
> >  print_task(struct seq_file *m, struct rq *rq, struct task_struct *p)
> >  {
> > @@ -115,7 +129,7 @@ print_task(struct seq_file *m, struct rq
> >  		char path[64];
> >  
> >  		rcu_read_lock();
> > -		cgroup_path(task_group(p)->css.cgroup, path, sizeof(path));
> > +		task_group_path(task_group(p), path, sizeof(path));
> >  		rcu_read_unlock();
> >  		SEQ_printf(m, " %s", path);
> >  	}
> > @@ -147,19 +161,6 @@ static void print_rq(struct seq_file *m,
> >  	read_unlock_irqrestore(&tasklist_lock, flags);
> >  }
> >  
> > -#if defined(CONFIG_CGROUP_SCHED) && \
> > -	(defined(CONFIG_FAIR_GROUP_SCHED) || defined(CONFIG_RT_GROUP_SCHED))
> > -static void task_group_path(struct task_group *tg, char *buf, int buflen)
> > -{
> > -	/* may be NULL if the underlying cgroup isn't fully-created yet */
> > -	if (!tg->css.cgroup) {
> > -		buf[0] = '\0';
> > -		return;
> > -	}
> > -	cgroup_path(tg->css.cgroup, buf, buflen);
> > -}
> > -#endif
> > -
> >  void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
> >  {
> >  	s64 MIN_vruntime = -1, min_vruntime, max_vruntime = -1,
> 
> Because it didn't build (for obvious reasons - the CONFIG conditions don't match up), 
> but more importantly it's quite ugly. Some existing 'path' variables are 64 byte, 
> some are 128 byte - so there's pre-existing damage - i removed it all.

Won't removing that hunk bring back oops if you cat /proc/sched_debug?

cfs_rq[0]:/autogroup-88
  .exec_clock                    : 0.228697
  .MIN_vruntime                  : 0.000001
  .min_vruntime                  : 0.819879
  .max_vruntime                  : 0.000001
  .spread                        : 0.000000
  .spread0                       : -22925903.100800
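
A rough user-space model of the fallback that hunk adds, for illustration
only.  The names model_task_group, model_autogroup and
model_task_group_path are hypothetical stand-ins for the kernel
structures, and the "/autogroup-<id>" format is inferred from the
sched_debug output above.  The point is the NULL guard: a group with no
cgroup behind it must not be handed to the cgroup path code.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the autogroup bookkeeping. */
struct model_autogroup {
	long id;
};

/* Hypothetical stand-in for struct task_group. */
struct model_task_group {
	const char *cgroup_path;           /* NULL: no cgroup backs this group */
	struct model_autogroup *autogroup; /* NULL: not an autogroup */
};

/*
 * Fill buf with a printable name for the group:
 *   1. the cgroup path when a cgroup exists,
 *   2. otherwise "/autogroup-<id>" for an autogroup,
 *   3. otherwise the empty string.
 */
void model_task_group_path(const struct model_task_group *tg,
			   char *buf, size_t buflen)
{
	if (buflen == 0)
		return;

	if (tg->cgroup_path) {
		snprintf(buf, buflen, "%s", tg->cgroup_path);
		return;
	}

	if (tg->autogroup) {
		snprintf(buf, buflen, "/autogroup-%ld", tg->autogroup->id);
		return;
	}

	buf[0] = '\0';
}
```

Using snprintf() keeps the result NUL-terminated whether the caller
passes a 64 byte or a 128 byte buffer.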

> Could we do this debugging code in a bit saner way please? (as a delta patch on top 
> of the -tip that i'll push out in the next hour or so.)

Guess I'll try.

	-Mike




^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-11-30 15:17                           ` Vivek Goyal
@ 2010-11-30 17:13                             ` Mike Galbraith
  2010-11-30 19:36                               ` Vivek Goyal
  0 siblings, 1 reply; 79+ messages in thread
From: Mike Galbraith @ 2010-11-30 17:13 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Paul Turner, Ingo Molnar, Peter Zijlstra, Linus Torvalds,
	Oleg Nesterov, LKML

On Tue, 2010-11-30 at 10:17 -0500, Vivek Goyal wrote:

> Hi Mike,

Hi,

> I was wondering if these autogroups can be visible in regular cgroup
> hierarchy so that once somebody mounts cpu controller, these are visible?

No, autogroup is not auto-cgroup.  You get zero whistles and zero bells
with autogroup.  Only dirt simple automated task groups.

> I was wondering why it is a good idea to create a separate interface for
> autogroups through proc and not try to integrate it with cgroup interface.

I only put an interface there at all because it was requested, and made
it a dirt simple 'nice level' interface because there's nothing simpler
than 'nice'.  The whole autogroup thing is intended for folks who don't
want to set up cgroups, shares yadayada, so tying it into the cgroups
interface seems kinda pointless.

> Without it now any user space tool shall have to either disable the
> autogroup feature completely or now also worry about /proc interface
> and there also autogroups are searchable through pid and there is no
> direct way to access these.

Maybe I should make it disable itself when you mount big brother.

> IIUC, these autogroups create flat setup and are at same level as
> init_task_group and are not children of it. Currently cpu cgroup 
> is hierarchical by default and any new cgroup is child of init_task_group
> and that could lead to representation issues.

Well, it's flat, but autogroup does..
	tg = sched_create_group(&init_task_group);

> Well, will we not get same kind of latency boost if we make these autogroups
> children of root? If yes, then hierarchical representation issue of autogroup
> will be a moot point.

No problem then.

> We already have /proc/<pid>/cgroup interface which points to task's
> cgroup. We probably can avoid creating /proc/<pid>/autogroup if there
> is an associated cgroup which appears in cgroup hierarchy and then user
> can change the weight of group through cgroup interface (instead of
> introducing another interface).

That's possible (for someone familiar with cgroups;), but I don't see
any reason for a wedding.

	-Mike


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-11-30 17:13                             ` Mike Galbraith
@ 2010-11-30 19:36                               ` Vivek Goyal
  2010-12-01  5:01                                 ` Américo Wang
  2010-12-01  5:57                                 ` Mike Galbraith
  0 siblings, 2 replies; 79+ messages in thread
From: Vivek Goyal @ 2010-11-30 19:36 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Paul Turner, Ingo Molnar, Peter Zijlstra, Linus Torvalds,
	Oleg Nesterov, LKML

On Tue, Nov 30, 2010 at 06:13:41PM +0100, Mike Galbraith wrote:
> On Tue, 2010-11-30 at 10:17 -0500, Vivek Goyal wrote:
> 
> > Hi Mike,
> 
> Hi,
> 
> > I was wondering if these autogroups can be visible in regular cgroup
> > hierarchy so that once somebody mounts cpu controller, these are visible?
> 
> No, autogroup is not auto-cgroup.  You get zero whistles and zero bells
> with autogroup.  Only dirt simple automated task groups.
> 
> > I was wondering why it is a good idea to create a separate interface for
> > autogroups through proc and not try to integrate it with cgroup interface.
> 
> I only put an interface there at all because it was requested, and made
> it a dirt simple 'nice level' interface because there's nothing simpler
> than 'nice'.  The whole autogroup thing is intended for folks who don't
> want to set up cgroups, shares yadayada, so tying it into the cgroups
> interface seems kinda pointless.
> 
> > Without it now any user space tool shall have to either disable the
> > autogroup feature completely or now also worry about /proc interface
> > and there also autogroups are searchable through pid and there is no
> > direct way to access these.
> 
> Maybe I should make it disable itself when you mount big brother.
> 
> > IIUC, these autogroups create flat setup and are at same level as
> > init_task_group and are not children of it. Currently cpu cgroup 
> > is hierarchical by default and any new cgroup is child of init_task_group
> > and that could lead to representation issues.
> 
> Well, it's flat, but autogroup does..
> 	tg = sched_create_group(&init_task_group);
> 
> > Well, will we not get same kind of latency boost if we make these autogroups
> > children of root? If yes, then hierarchical representation issue of autogroup
> > will be a moot point.
> 
> No problem then.
> 
> > We already have /proc/<pid>/cgroup interface which points to task's
> > cgroup. We probably can avoid creating /proc/<pid>/autogroup if there
> > is an associated cgroup which appears in cgroup hierarchy and then user
> > can change the weight of group through cgroup interface (instead of
> > introducing another interface).
> 
> That's possible (for someone familiar with cgroups;), but I don't see
> any reason for a wedding.

Few things.

- This /proc/<pid>/autogroup is good for doing this single thing but when
  I start thinking of possible extensions of it down the line, it creates
  issues. 

- Once we have some kind of upper limit support in cpu controller, these
  autogroups are beyond control. If you want to impose some kind of
  limits on them then you shall have to extend parallel interface
  /proc/<pid>/autogroup to also specify upper limit (like nice levels).

- Similarly if this autogroup notion is extended to other cgroup
  controllers, then you shall have to again extend /proc/<pid>/autogroup
  to be able to specify these additional parameters.

- If there is a monitoring tool which is monitoring the system for
  resource usage by the groups, then I think these autogroups are beyond
  reach and any stats exported by cgroup interface will not be available.
  (though right now I can't see any stats being exported by cgroup files
   in cpu controller but other controllers like block and memory do.).

- I am doing some testing with the patch and w.r.t. cgroup interface some
  things don't seem right.

  I have applied your patch and enabled CONFIG_AUTO_GROUP. Now I boot
  into the kernel and open a new ssh connection to the machine. 

  # echo $$
    3555
  # cat /proc/3555/autogroup
    /autogroup-63 nice 0

  IIUC, task 3555 has been moved into an autogroup. Now I mount the cpu
  controller and this task is visible in root cgroup.

  # mount -t cgroup -o cpu none /cgroup/cpu
  # cat /cgroup/cpu/tasks | grep 3555
    3555

  First of all this gives user a wrong impression that task 3555 is in
  root cgroup.

  Now I create a child group test1 and move the task there and also change
  the weight/shares of the cgroup to 10240.

  # mkdir test1
  # echo 3555 > test1/tasks
  # echo 10240 > test1/cpu.shares
  # cat /proc/3555/cgroup
    3:cpu:/test1
  # cat /proc/3555/autogroup
    /autogroup-63 nice 0

So again, user will think that task is in cgroup test1 and is being
controlled by the respective weight but that's not the case.
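
For a tool that wants to check a task's autogroup association alongside
/proc/<pid>/cgroup, a minimal parsing sketch follows.  It assumes the
single-line "/autogroup-<id> nice <n>" format shown in the transcript
above; parse_autogroup_line and struct autogroup_info are made-up names,
not part of the patch.

```c
#include <stdio.h>

/* Hypothetical container for the two fields the proc file exposes. */
struct autogroup_info {
	long id;   /* numeric suffix of "/autogroup-<id>" */
	int nice;  /* group nice level */
};

/*
 * Parse one line as read from /proc/<pid>/autogroup,
 * e.g. "/autogroup-63 nice 0".
 * Returns 0 on success, -1 if the line does not match that shape.
 */
int parse_autogroup_line(const char *line, struct autogroup_info *out)
{
	long id;
	int nice;

	if (sscanf(line, " /autogroup-%ld nice %d", &id, &nice) != 2)
		return -1;

	out->id = id;
	out->nice = nice;
	return 0;
}
```

A caller would fopen() the proc file, fgets() the one line and pass it
in; a -1 return covers both an unexpected format and a kernel without
the feature producing something else.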

Even if we prevent autogroup task from being visible in cpu controller
root group, then comes the question what happens if cpu and some other
controller is comounted. Say cpuset. Now in that case will task be 
visible in root group task file and can one operate on that. Now showing
up there does not make much sense as task should still be controllable
by other controllers and its policies.

So yes, creating a /proc/<pid>/autogroup is dirt cheap and makes the life
easier in terms of implementation of this patch and it should work well.
But it is also a new user interface which does not sound too extensible and
does not seem to cooperate well with cgroup interface.

It also introduces this new notion of niceness for task groups which is sort
of equivalent to cpu.shares in cpu controller. First of all why should we
not stick to shares notion even for autogroup. Even if we introduce the notion
of niceness for groups, IMHO, it should be through cgroup interface instead of
group niceness for autogroup and shares/weights for cgroup despite the
fact that in the background they do similar things.

I think above concerns can possibly be reason enough to think about
the wedding.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-11-28 14:24   ` Mike Galbraith
  2010-11-28 19:31     ` Linus Torvalds
  2010-12-01  3:39     ` Paul Turner
@ 2010-12-01  3:39     ` Paul Turner
  2010-12-01  6:16       ` Mike Galbraith
  2010-12-01 11:34       ` Peter Zijlstra
  2 siblings, 2 replies; 79+ messages in thread
From: Paul Turner @ 2010-12-01  3:39 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Ingo Molnar, Oleg Nesterov, Peter Zijlstra, Linus Torvalds, LKML

On 11/28/10 06:24, Mike Galbraith wrote:
> On Thu, 2010-11-25 at 09:00 -0700, Mike Galbraith wrote:
>
>> My vacation is (sniff) over, so I won't get a fully tested patch out the
>> door for review until I get back home.
>
> Either I forgot to pack my eyeballs, or laptop is just too dinky and
> annoying.  Now back home on beloved box, this little bugger poked me
> dead in the eye.
>
> Something else is seriously wrong though.  36.1 with attached (plus
> sched, cgroup: Fixup broken cgroup movement) works a treat, whereas
> 37.git and tip with fixlet below both suck rocks.  With a make -j40
> running, wakeup-latency is showing latencies of >100ms, amarok skips,
> mouse lurches badly.. generally horrid.  Something went south.
>

I'm looking at this.

The share:share ratios looked good in static testing, but perhaps we 
need a little more wake-up boost to improve interactivity.

Should have something tomorrow.

- Paul

> sched: fix 3d4b47b4 typo.
>
> Signed-off-by: Mike Galbraith <efault@gmx.de>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Ingo Molnar <mingo@elte.hu>
> LKML-Reference: new submission
> ---
>   kernel/sched.c |    3 +--
>   1 file changed, 1 insertion(+), 2 deletions(-)
>
> Index: linux-2.6/kernel/sched.c
> ===================================================================
> --- linux-2.6.orig/kernel/sched.c
> +++ linux-2.6/kernel/sched.c
> @@ -8087,7 +8087,6 @@ static inline void unregister_fair_sched
>   {
>   	struct rq *rq = cpu_rq(cpu);
>   	unsigned long flags;
> -	int i;
>
>   	/*
>   	* Only empty task groups can be destroyed; so we can speculatively
> @@ -8097,7 +8096,7 @@ static inline void unregister_fair_sched
>   		return;
>
>   	raw_spin_lock_irqsave(&rq->lock, flags);
> -	list_del_leaf_cfs_rq(tg->cfs_rq[i]);
> +	list_del_leaf_cfs_rq(tg->cfs_rq[cpu]);
>   	raw_spin_unlock_irqrestore(&rq->lock, flags);
>   }
>   #else /* !CONFG_FAIR_GROUP_SCHED */
>


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-11-30 19:36                               ` Vivek Goyal
@ 2010-12-01  5:01                                 ` Américo Wang
  2010-12-01  6:09                                   ` Mike Galbraith
                                                     ` (2 more replies)
  2010-12-01  5:57                                 ` Mike Galbraith
  1 sibling, 3 replies; 79+ messages in thread
From: Américo Wang @ 2010-12-01  5:01 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Mike Galbraith, Paul Turner, Ingo Molnar, Peter Zijlstra,
	Linus Torvalds, Oleg Nesterov, LKML

On Tue, Nov 30, 2010 at 02:36:22PM -0500, Vivek Goyal wrote:
>
>So again, user will think that task is in cgroup test1 and is being
>controlled by the respective weight but that's not the case.
>
>Even if we prevent autogroup task from being visible in cpu controller
>root group, then comes the question what happens if cpu and some other
>controller is comounted. Say cpuset. Now in that case will task be 
>visible in root group task file and can one operate on that. Now showing
>up there does not make much sense as task should still be controllable
>by other controllers and its policies.
>
>So yes, creating a /proc/<pid>/autogroup is dirt cheap and makes the life
>easier in terms of implementation of this patch and it should work well.
>But it is also a new user interface which does not sound too extensible and
>does not seem to cooperate well with cgroup interface.
>
>It also introduces this new notion of niceness for task groups which is sort
>of equivalent to cpu.shares in cpu controller. First of all why should we
>not stick to shares notion even for autogroup. Even if we introduce the notion
>of niceness for groups, IMHO, it should be through cgroup interface instead of
>group niceness for autogroup and shares/weights for cgroup despite the
>fact that in the background they do similar things.
>

Hmm, maybe we can make AUTO_GROUP depend on !CGROUPS?

It seems that autogroup only uses 'struct task_group', no other cgroup things,
so I think that is reasonable and doable.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-11-30 19:36                               ` Vivek Goyal
  2010-12-01  5:01                                 ` Américo Wang
@ 2010-12-01  5:57                                 ` Mike Galbraith
  2010-12-01 11:33                                   ` Peter Zijlstra
  2010-12-01 14:55                                   ` Vivek Goyal
  1 sibling, 2 replies; 79+ messages in thread
From: Mike Galbraith @ 2010-12-01  5:57 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Paul Turner, Ingo Molnar, Peter Zijlstra, Linus Torvalds,
	Oleg Nesterov, LKML

On Tue, 2010-11-30 at 14:36 -0500, Vivek Goyal wrote:

> Few things.
> 
> - This /proc/<pid>/autogroup is good for doing this single thing but when
>   I start thinking of possible extensions of it down the line, it creates
>   issues. 
> 
> - Once we have some kind of upper limit support in cpu controller, these
>   autogroups are beyond control. If you want to impose some kind of
>   limits on them then you shall have to extend parallel interface
>   /proc/<pid>/autogroup to also specify upper limit (like nice levels).
> 
> - Similarly if this autogroup notion is extended to other cgroup
>   controllers, then you shall have to again extend /proc/<pid>/autogroup
>   to be able to specify these additional parameters.

Yes, if it evolves, its interface will need to evolve as well.  It
could have been a directory containing buttons, knobs and statistics,
but KISS won.

> - If there is a monitoring tool which is monitoring the system for
>   resource usage by the groups, then I think these autogroups are beyond
>   reach and any stats exported by cgroup interface will not be available.
>   (though right now I can't see any stats being exported by cgroup files
>    in cpu controller but other controllers like block and memory do.).

If you're manually assigning bandwidth et al from userland, there's not
much point to in-kernel automation is there?

If I had married the two, the first thing that would have happened is
gripes about things appearing and disappearing in cgroups directories,
resulting in mayhem and confusion for scripts and tools.

> - I am doing some testing with the patch and w.r.t. cgroup interface some
>   things don't seem right.
> 
>   I have applied your patch and enabled CONFIG_AUTO_GROUP. Now I boot
>   into the kernel and open a new ssh connection to the machine. 
> 
>   # echo $$
>     3555
>   # cat /proc/3555/autogroup
>     /autogroup-63 nice 0
> 
>   IIUC, task 3555 has been moved into an autogroup. Now I mount the cpu
>   controller and this task is visible in root cgroup.
> 
>   # mount -t cgroup -o cpu none /cgroup/cpu
>   # cat /cgroup/cpu/tasks | grep 3555
>     3555
> 
>   First of all this gives user a wrong impression that task 3555 is in
>   root cgroup.

It is in the root cgroup.  It is just not in the root task group;
autogroup is not auto-cgroups.

>   Now I create a child group test1 and move the task there and also change
>   the weight/shares of the cgroup to 10240.
> 
>   # mkdir test1
>   # echo 3555 > test1/tasks
>   # echo 10240 > test1/cpu.shares
>   # cat /proc/3555/cgroup
>     3:cpu:/test1
>   # cat /proc/3555/autogroup
>     /autogroup-63 nice 0
> 
> So again, user will think that task is in cgroup test1 and is being
> controlled by the respective weight but that's not the case.

It is the case here.

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
 7573 root      20   0  7996  340  256 R   50  0.0   3:35.86 3 pert
 7572 root      20   0  7996  340  256 R   50  0.0   9:21.68 3 pert
...
marge:/cgroups/test # echo 7572 > tasks
marge:/cgroups/test # echo 4096 > cpu.shares

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
 7572 root      20   0  7996  340  256 R   80  0.0  10:06.92 3 pert
 7573 root      20   0  7996  340  256 R   20  0.0   4:05.80 3 pert

When you move a task into a cgroup, it still has an autogroup
association, as all tasks (processes actually) do, but it's not used.
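
The 80/20 split in the second top snapshot is what the share ratio
predicts, assuming the task left behind is competing at the same level
with the default weight of 1024 (an assumption; the transcript does not
show its group).  A small arithmetic sketch, with cpu_percent being a
made-up helper name:

```c
/*
 * Percentage of a contended CPU that a group with `shares` receives
 * when the competing groups at the same level sum to `total_shares`
 * (the group's own shares included).  Integer math, rounds down.
 */
unsigned int cpu_percent(unsigned long shares, unsigned long total_shares)
{
	if (total_shares == 0)
		return 0;

	return (unsigned int)(shares * 100 / total_shares);
}
```

With cpu.shares set to 4096 against a 1024 competitor this gives
4096 * 100 / 5120 = 80 for pid 7572 and 1024 * 100 / 5120 = 20 for
pid 7573, matching the numbers top reports.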

> Even if we prevent autogroup task from being visible in cpu controller
> root group, then comes the question what happens if cpu and some other
> controller is comounted. Say cpuset. Now in that case will task be 
> visible in root group task file and can one operate on that. Now showing
> up there does not make much sense as task should still be controllable
> by other controllers and its policies.

The user has to specifically ask for it in his config, can turn it on or
off on the fly or at boot..

> So yes, creating a /proc/<pid>/autogroup is dirt cheap and makes the life
> easier in terms of implementation of this patch and it should work well.
> But it is also a new user interface which does not sound too extensible and
> does not seem to cooperate well with cgroup interface

..it has a different mission, with different users being targeted, so
why does it need to hold hands?

> It also introduces this new notion of niceness for task groups which is sort
> of equivalent to cpu.shares in cpu controller. First of all why should we
> not stick to shares notion even for autogroup. Even if we introduce the notion
> of niceness for groups, IMHO, it should be through cgroup interface instead of
> group niceness for autogroup and shares/weights for cgroup despite the
> fact that in the background they do similar things.

IMHO, cgroups should have been 'nice' from the start, but the folks who
wrote it did what they thought best.  I like nice a lot better than
shares, so I used nice.
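
To make the nice/shares equivalence concrete, here is a sketch of the
translation, assuming a group nice level is looked up in the
scheduler's usual nice-to-weight table (prio_to_weight[] in
kernel/sched.c) and the result is used as the group's shares.  The
helper name nice_to_shares is hypothetical.

```c
/*
 * Nice-to-weight table as in kernel/sched.c.  Index 0 is nice -20,
 * index 20 is nice 0 (weight 1024, the default cpu.shares value),
 * index 39 is nice +19.  Each step is roughly a factor of 1.25.
 */
static const int prio_to_weight[40] = {
 /* -20 */ 88761, 71755, 56483, 46273, 36291,
 /* -15 */ 29154, 23254, 18705, 14949, 11916,
 /* -10 */  9548,  7620,  6100,  4904,  3906,
 /*  -5 */  3121,  2501,  1991,  1586,  1277,
 /*   0 */  1024,   820,   655,   526,   423,
 /*   5 */   335,   272,   215,   172,   137,
 /*  10 */   110,    87,    70,    56,    45,
 /*  15 */    36,    29,    23,    18,    15,
};

/*
 * Translate a nice level into a shares-style weight.
 * Returns -1 for values outside the valid range [-20, 19].
 */
int nice_to_shares(int nice)
{
	if (nice < -20 || nice > 19)
		return -1;

	return prio_to_weight[nice + 20];
}
```

Under that assumption "nice 0" and "cpu.shares = 1024" describe the
same weight, and the two interfaces differ only in which number the
user types.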

> I think above concerns can possibly be reason enough to think about
> the wedding.

Perhaps in future, they'll get married, and perhaps they should, but in
the here and now, I think they have similar but not identical missions.
If you turn on one, turn off the other.  Maybe that should be automated.

Systemd thingy may make autogroup short lived anyway.  I had a query
from an embedded guy (hm, which I spaced) suggesting autogroup may be
quite nice for handheld stuff though, so who knows.

	-Mike


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-12-01  5:01                                 ` Américo Wang
@ 2010-12-01  6:09                                   ` Mike Galbraith
  2010-12-01 11:36                                   ` Peter Zijlstra
  2010-12-01 22:12                                   ` Valdis.Kletnieks
  2 siblings, 0 replies; 79+ messages in thread
From: Mike Galbraith @ 2010-12-01  6:09 UTC (permalink / raw)
  To: Américo Wang
  Cc: Vivek Goyal, Paul Turner, Ingo Molnar, Peter Zijlstra,
	Linus Torvalds, Oleg Nesterov, LKML

On Wed, 2010-12-01 at 13:01 +0800, Américo Wang wrote:

> Hmm, maybe we can make AUTO_GROUP depend on !CGROUPS?
> 
> It seems that autogroup only uses 'struct task_group', no other cgroup things,
> so I think that is reasonable and doable.

Build time exclusion is not as flexible.  As is, the user can have one
kernel, and use whatever he likes.

	-Mike


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-12-01  3:39     ` Paul Turner
@ 2010-12-01  6:16       ` Mike Galbraith
  2010-12-03  5:11         ` Paul Turner
  2010-12-01 11:34       ` Peter Zijlstra
  1 sibling, 1 reply; 79+ messages in thread
From: Mike Galbraith @ 2010-12-01  6:16 UTC (permalink / raw)
  To: Paul Turner
  Cc: Ingo Molnar, Oleg Nesterov, Peter Zijlstra, Linus Torvalds, LKML

On Tue, 2010-11-30 at 19:39 -0800, Paul Turner wrote:
> On 11/28/10 06:24, Mike Galbraith wrote:
> >
> > Something else is seriously wrong though.  36.1 with attached (plus
> > sched, cgroup: Fixup broken cgroup movement) works a treat, whereas
> > 37.git and tip with fixlet below both suck rocks.  With a make -j40
> > running, wakeup-latency is showing latencies of >100ms, amarok skips,
> > mouse lurches badly.. generally horrid.  Something went south.
> 
> I'm looking at this.
> 
> The share:share ratios looked good in static testing, but perhaps we 
> need a little more wake-up boost to improve interactivity.

Yeah, feels like a wakeup issue.  I too did a (brief) static test, and
that looked ok.

	-Mike


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-12-01  5:57                                 ` Mike Galbraith
@ 2010-12-01 11:33                                   ` Peter Zijlstra
  2010-12-01 11:55                                     ` Mike Galbraith
  2010-12-01 14:55                                   ` Vivek Goyal
  1 sibling, 1 reply; 79+ messages in thread
From: Peter Zijlstra @ 2010-12-01 11:33 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Vivek Goyal, Paul Turner, Ingo Molnar, Linus Torvalds,
	Oleg Nesterov, LKML

On Wed, 2010-12-01 at 06:57 +0100, Mike Galbraith wrote:
> IMHO, cgroups should have been 'nice' from the start, but the folks who
> wrote it did what they thought best.  I like nice a lot better than
> shares, so I used nice. 

Agreed, but by the time I realized that the shares thing was already in
the wild. I did (probably still do) have a patch that adds a nice file
to the cgroup file.

Anyway, I think the whole proc/nice interface for autogroups is already
a tad too far. If people want control they can use cgroups, but I really
don't care enough to argue much about it.




^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-12-01  3:39     ` Paul Turner
  2010-12-01  6:16       ` Mike Galbraith
@ 2010-12-01 11:34       ` Peter Zijlstra
  1 sibling, 0 replies; 79+ messages in thread
From: Peter Zijlstra @ 2010-12-01 11:34 UTC (permalink / raw)
  To: Paul Turner
  Cc: Mike Galbraith, Ingo Molnar, Oleg Nesterov, Linus Torvalds, LKML

On Tue, 2010-11-30 at 19:39 -0800, Paul Turner wrote:
> On 11/28/10 06:24, Mike Galbraith wrote:
> > On Thu, 2010-11-25 at 09:00 -0700, Mike Galbraith wrote:
> >
> >> My vacation is (sniff) over, so I won't get a fully tested patch out the
> >> door for review until I get back home.
> >
> > Either I forgot to pack my eyeballs, or laptop is just too dinky and
> > annoying.  Now back home on beloved box, this little bugger poked me
> > dead in the eye.
> >
> > Something else is seriously wrong though.  36.1 with attached (plus
> > sched, cgroup: Fixup broken cgroup movement) works a treat, whereas
> > 37.git and tip with fixlet below both suck rocks.  With a make -j40
> > running, wakeup-latency is showing latencies of >100ms, amarok skips,
> > mouse lurches badly.. generally horrid.  Something went south.
> >
> 
> I'm looking at this.
> 
> The share:share ratios looked good in static testing, but perhaps we 
> need a little more wake-up boost to improve interactivity.
> 
> Should have something tomorrow.

Right, the previous thing cheated quite enormously with wakeups simply
because it was way too expensive to compute proper shares on wakeups.

Maybe we should re-instate some of that cheating.
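
For readers following along, a simplified model of what "proper shares"
means here: each CPU's slice of a group's shares is proportional to the
load the group has on that CPU, relative to its load on all CPUs.  This
is a sketch and not the kernel's code; cpu_local_shares is a made-up
name and the MIN_SHARES floor of 2 is assumed from kernel/sched.c.

```c
#define MIN_SHARES 2UL

/*
 * Portion of a task group's shares that lands on one CPU.
 *   tg_shares:  the group's configured shares (e.g. 1024)
 *   local_load: load weight the group has queued on this CPU
 *   total_load: load weight the group has queued on all CPUs
 * The result is clamped to [MIN_SHARES, tg_shares].
 */
unsigned long cpu_local_shares(unsigned long tg_shares,
			       unsigned long local_load,
			       unsigned long total_load)
{
	unsigned long shares = tg_shares * local_load;

	if (total_load)
		shares /= total_load;

	if (shares < MIN_SHARES)
		shares = MIN_SHARES;
	if (shares > tg_shares)
		shares = tg_shares;

	return shares;
}
```

The cost is in total_load: it is a sum over every CPU the group runs
on, so keeping it exact at each wakeup is expensive, and a group that
has just woken on an idle CPU starts from a tiny local_load.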


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-12-01  5:01                                 ` Américo Wang
  2010-12-01  6:09                                   ` Mike Galbraith
@ 2010-12-01 11:36                                   ` Peter Zijlstra
  2010-12-01 22:12                                   ` Valdis.Kletnieks
  2 siblings, 0 replies; 79+ messages in thread
From: Peter Zijlstra @ 2010-12-01 11:36 UTC (permalink / raw)
  To: Américo Wang
  Cc: Vivek Goyal, Mike Galbraith, Paul Turner, Ingo Molnar,
	Linus Torvalds, Oleg Nesterov, LKML

On Wed, 2010-12-01 at 13:01 +0800, Américo Wang wrote:
> 
> Hmm, maybe we can make AUTO_GROUP depend on !CGROUPS?
> 
> It seems that autogroup only uses 'struct task_group', no other cgroup things,
> so I think that is reasonable and doable. 

That's only going to create more #ifdefery in sched.c (and we already
got way too much of that), for little to no gain.

But yes, technically that could be done.


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-12-01 11:33                                   ` Peter Zijlstra
@ 2010-12-01 11:55                                     ` Mike Galbraith
  0 siblings, 0 replies; 79+ messages in thread
From: Mike Galbraith @ 2010-12-01 11:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vivek Goyal, Paul Turner, Ingo Molnar, Linus Torvalds,
	Oleg Nesterov, LKML

On Wed, 2010-12-01 at 12:33 +0100, Peter Zijlstra wrote:

> Anyway, I think the whole proc/nice interface for autogroups is already
> a tad too far. If people want control they can use cgroups, but I really
> don't care enough to argue much about it.

Agreed.  I've no intention of expanding it.

	-Mike


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-12-01  5:57                                 ` Mike Galbraith
  2010-12-01 11:33                                   ` Peter Zijlstra
@ 2010-12-01 14:55                                   ` Vivek Goyal
  2010-12-01 15:04                                     ` Mike Galbraith
  1 sibling, 1 reply; 79+ messages in thread
From: Vivek Goyal @ 2010-12-01 14:55 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Paul Turner, Ingo Molnar, Peter Zijlstra, Linus Torvalds,
	Oleg Nesterov, LKML

On Wed, Dec 01, 2010 at 06:57:48AM +0100, Mike Galbraith wrote:

[..]
> 
> > - I am doing some testing with the patch and w.r.t. cgroup interface some
> >   things don't seem right.
> > 
> >   I have applied your patch and enabled CONFIG_AUTO_GROUP. Now I boot
> >   into the kernel and open a new ssh connection to the machine. 
> > 
> >   # echo $$
> >     3555
> >   # cat /proc/3555/autogroup
> >     /autogroup-63 nice 0
> > 
> >   IIUC, task 3555 has been moved into an autogroup. Now I mount the cpu
> >   controller and this task is visible in root cgroup.
> > 
> >   # mount -t cgroup -o cpu none /cgroup/cpu
> >   # cat /cgroup/cpu/tasks | grep 3555
> >     3555
> > 
> >   First of all this gives user a wrong impression that task 3555 is in
> >   root cgroup.
> 
> It is in the root cgroup.  It is not in the root task group, though; an
> autogroup is not an auto-created cgroup.
> 
> >   Now I create a child group test1 and move the task there and also change
> >   the weight/shares of the cgroup to 10240.
> > 
> >   # mkdir test1
> >   # echo 3555 > test1/tasks
> >   # echo 10240 > test1/cpu.shares
> >   # cat /proc/3555/cgroup
> >     3:cpu:/test1
> >   # cat /proc/3555/autogroup
> >     /autogroup-63 nice 0
> > 
> > So again, user will think that task is in cgroup test1 and is being
> > controlled by the respective weight but that's not the case.
> 
> It is the case here.
> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
>  7573 root      20   0  7996  340  256 R   50  0.0   3:35.86 3 pert
>  7572 root      20   0  7996  340  256 R   50  0.0   9:21.68 3 pert
> ...
> marge:/cgroups/test # echo 7572 > tasks
> marge:/cgroups/test # echo 4096 > cpu.shares
> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
>  7572 root      20   0  7996  340  256 R   80  0.0  10:06.92 3 pert
>  7573 root      20   0  7996  340  256 R   20  0.0   4:05.80 3 pert
> 
> When you move a task into a cgroup, it still has an autogroup
> association, as all tasks (processes actually) do, but it's not used.

Ok, so I got confused with the fact that after moving a task into a
cgroup it is still associated with an autogroup.

So IIUC, if a task is in root cgroup, then it would not necessarily be driven
by cpu.shares of root cgroup (as task could be in its own autogroup). But
if I move the task into a non-root cgroup, then it will for sure be
subjected to rules imposed by non-root cgroup cpu.shares. That's not too
bad.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-12-01 14:55                                   ` Vivek Goyal
@ 2010-12-01 15:04                                     ` Mike Galbraith
  0 siblings, 0 replies; 79+ messages in thread
From: Mike Galbraith @ 2010-12-01 15:04 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Paul Turner, Ingo Molnar, Peter Zijlstra, Linus Torvalds,
	Oleg Nesterov, LKML

On Wed, 2010-12-01 at 09:55 -0500, Vivek Goyal wrote:

> So IIUC, if a task is in root cgroup, then it would not necessarily be driven
> by cpu.shares of root cgroup (as task could be in its own autogroup). But
> if I move the task into a non-root cgroup, then it will for sure be
> subjected to rules imposed by non-root cgroup cpu.shares. That's not too
> bad.

I think the normal case would be either one or the other being in use at
any given time, but yes, if both are active, that's how it'd work.
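The rule agreed on here can be written down as a tiny decision function.  This
is only a sketch in plain C, not kernel code: struct task_info,
effective_group() and the enum are hypothetical names, and it assumes autogroup
applies only to tasks still sitting in the root cgroup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Which grouping decides a task's CPU share (hypothetical names). */
enum cpu_group_source { FROM_CGROUP, FROM_AUTOGROUP };

struct task_info {
	bool in_root_cgroup;     /* task was never moved out of the root cgroup */
	bool autogroup_enabled;  /* assumed global autogroup on/off switch */
};

/*
 * Assumption taken from the discussion: the autogroup association always
 * exists, but it only takes effect while the task is in the root cgroup.
 */
enum cpu_group_source effective_group(const struct task_info *t)
{
	assert(t != NULL);
	if (t->autogroup_enabled && t->in_root_cgroup)
		return FROM_AUTOGROUP;
	return FROM_CGROUP;
}
```

Under these assumptions, moving a task into test1 flips the result to
FROM_CGROUP even though /proc/<pid>/autogroup still shows an autogroup.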

	-Mike


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-12-01  5:01                                 ` Américo Wang
  2010-12-01  6:09                                   ` Mike Galbraith
  2010-12-01 11:36                                   ` Peter Zijlstra
@ 2010-12-01 22:12                                   ` Valdis.Kletnieks
  2 siblings, 0 replies; 79+ messages in thread
From: Valdis.Kletnieks @ 2010-12-01 22:12 UTC (permalink / raw)
  To: Américo Wang
  Cc: Vivek Goyal, Mike Galbraith, Paul Turner, Ingo Molnar,
	Peter Zijlstra, Linus Torvalds, Oleg Nesterov, LKML

On Wed, 01 Dec 2010 13:01:29 +0800, Américo Wang said:

> Hmm, maybe we can make AUTO_GROUP depend on !CGROUPS?
> 
> It seems that autogroup only uses 'struct task_group', no other cgroup things,
> so I think that is reasonable and doable.

A non-starter if you have a Fedora Rawhide box that has systemd installed, as
that won't even make it to single-user if you don't have CGROUPS in the kernel config.


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-12-01  6:16       ` Mike Galbraith
@ 2010-12-03  5:11         ` Paul Turner
  2010-12-03  6:48           ` Mike Galbraith
  2010-12-04 23:55           ` James Courtier-Dutton
  0 siblings, 2 replies; 79+ messages in thread
From: Paul Turner @ 2010-12-03  5:11 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Ingo Molnar, Oleg Nesterov, Peter Zijlstra, Linus Torvalds, LKML

On 11/30/10 22:16, Mike Galbraith wrote:
> On Tue, 2010-11-30 at 19:39 -0800, Paul Turner wrote:
>> On 11/28/10 06:24, Mike Galbraith wrote:
>>>
>>> Something else is seriously wrong though.  36.1 with attached (plus
>>> sched, cgroup: Fixup broken cgroup movement) works a treat, whereas
>>> 37.git and tip with fixlet below both suck rocks.  With a make -j40
>>> running, wakeup-latency is showing latencies of >100ms, amarok skips,
>>> mouse lurches badly.. generally horrid.  Something went south.
>>
>> I'm looking at this.
>>
>> The share:share ratios looked good in static testing, but perhaps we
>> need a little more wake-up boost to improve interactivity.
>
> Yeah, feels like a wakeup issue.  I too did a (brief) static test, and
> that looked ok.
>
> 	-Mike
>

Hey Mike,

Does something like the below help?

We're quick to drive the load_contribution up (to avoid over-commit). 
However on sleepy workloads this results in lots of weight being 
stranded (since it reaches maximum contribution instantaneously but 
decays slowly) as the thread migrates between cpus.

Avoid this by averaging "up" in the wake-up direction as well as the sleep.

We also get a boost from the fact that we use the instantaneous weight 
in computing the actual received shares.

I actually don't have a desktop setup handy to test "interactivity" (sad 
but true -- working on grabbing one).  But it looks better under
synthetic load.

- Paul

===================================================================
--- tip.orig/kernel/sched_fair.c
+++ tip/kernel/sched_fair.c
@@ -743,12 +743,19 @@ static void update_cfs_load(struct cfs_r
                 return;

         now = rq_of(cfs_rq)->clock;
-       delta = now - cfs_rq->load_stamp;
+
+       if (likely(cfs_rq->load_stamp))
+               delta = now - cfs_rq->load_stamp;
+       else {
+               /* avoid large initial delta and initialize load_period */
+               delta = 1;
+               cfs_rq->load_stamp = 1;
+       }

         /* truncate load history at 4 idle periods */
         if (cfs_rq->load_stamp > cfs_rq->load_last &&
             now - cfs_rq->load_last > 4 * period) {
-               cfs_rq->load_period = 0;
+               cfs_rq->load_period = period/2;
                 cfs_rq->load_avg = 0;
         }

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-12-03  5:11         ` Paul Turner
@ 2010-12-03  6:48           ` Mike Galbraith
  2010-12-03  8:37             ` Paul Turner
  2010-12-04 23:55           ` James Courtier-Dutton
  1 sibling, 1 reply; 79+ messages in thread
From: Mike Galbraith @ 2010-12-03  6:48 UTC (permalink / raw)
  To: Paul Turner
  Cc: Ingo Molnar, Oleg Nesterov, Peter Zijlstra, Linus Torvalds, LKML

On Thu, 2010-12-02 at 21:11 -0800, Paul Turner wrote:
> On 11/30/10 22:16, Mike Galbraith wrote:
> > On Tue, 2010-11-30 at 19:39 -0800, Paul Turner wrote:
> >> On 11/28/10 06:24, Mike Galbraith wrote:
> >>>
> >>> Something else is seriously wrong though.  36.1 with attached (plus
> >>> sched, cgroup: Fixup broken cgroup movement) works a treat, whereas
> >>> 37.git and tip with fixlet below both suck rocks.  With a make -j40
> >>> running, wakeup-latency is showing latencies of >100ms, amarok skips,
> >>> mouse lurches badly.. generally horrid.  Something went south.
> >>
> >> I'm looking at this.
> >>
> >> The share:share ratios looked good in static testing, but perhaps we
> >> need a little more wake-up boost to improve interactivity.
> >
> > Yeah, feels like a wakeup issue.  I too did a (brief) static test, and
> > that looked ok.
> >
> > 	-Mike
> >
> 
> Hey Mike,
> 
> Does something like the below help?

Unfortunately not.  For example, Xorg+mplayer needs (30 sec refresh)..

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
 6487 root      20   0  366m  30m 5100 S   31  0.4   2:04.83 2 Xorg
 4454 root      20   0  318m  28m  15m S   29  0.4   0:38.06 3 mplayer

..but gets this when a heavy kbuild is running along with them.

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
 6487 root      20   0  366m  30m 5136 S   12  0.4   2:25.98 1 Xorg
 5595 root      20   0  318m  28m  15m R    8  0.4   0:09.31 3 mplayer

There are 4 task groups active at this time, Xorg, mplayer, Amarok and
konsole where the kbuild is running make -j40.

	-Mike



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-12-03  6:48           ` Mike Galbraith
@ 2010-12-03  8:37             ` Paul Turner
  0 siblings, 0 replies; 79+ messages in thread
From: Paul Turner @ 2010-12-03  8:37 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Ingo Molnar, Oleg Nesterov, Peter Zijlstra, Linus Torvalds, LKML

On Thu, Dec 2, 2010 at 10:48 PM, Mike Galbraith <efault@gmx.de> wrote:
> On Thu, 2010-12-02 at 21:11 -0800, Paul Turner wrote:
>> On 11/30/10 22:16, Mike Galbraith wrote:
>> > On Tue, 2010-11-30 at 19:39 -0800, Paul Turner wrote:
>> >> On 11/28/10 06:24, Mike Galbraith wrote:
>> >>>
>> >>> Something else is seriously wrong though.  36.1 with attached (plus
>> >>> sched, cgroup: Fixup broken cgroup movement) works a treat, whereas
>> >>> 37.git and tip with fixlet below both suck rocks.  With a make -j40
>> >>> running, wakeup-latency is showing latencies of >100ms, amarok skips,
>> >>> mouse lurches badly.. generally horrid.  Something went south.
>> >>
>> >> I'm looking at this.
>> >>
>> >> The share:share ratios looked good in static testing, but perhaps we
>> >> need a little more wake-up boost to improve interactivity.
>> >
>> > Yeah, feels like a wakeup issue.  I too did a (brief) static test, and
>> > that looked ok.
>> >
>> >     -Mike
>> >
>>
>> Hey Mike,
>>
>> Does something like the below help?
>
> Unfortunately not.  For example, Xorg+mplayer needs (30 sec refresh)..
>
>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
>  6487 root      20   0  366m  30m 5100 S   31  0.4   2:04.83 2 Xorg
>  4454 root      20   0  318m  28m  15m S   29  0.4   0:38.06 3 mplayer
>
> ..but gets this when a heavy kbuild is running along with them.
>
>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
>  6487 root      20   0  366m  30m 5136 S   12  0.4   2:25.98 1 Xorg
>  5595 root      20   0  318m  28m  15m R    8  0.4   0:09.31 3 mplayer
>
> There are 4 task groups active at this time, Xorg, mplayer, Amarok and
> konsole where the kbuild is running make -j40.
>

Hmm.. unfortunate.  Ok -- based on the traces of synthetic loads and
the traces of their share on wake-up I think this is the right track
at least, will refine it tomorrow.

Thanks for trying it.

>        -Mike
>
>
>

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-12-03  5:11         ` Paul Turner
  2010-12-03  6:48           ` Mike Galbraith
@ 2010-12-04 23:55           ` James Courtier-Dutton
  2010-12-05  5:11             ` Paul Turner
  1 sibling, 1 reply; 79+ messages in thread
From: James Courtier-Dutton @ 2010-12-04 23:55 UTC (permalink / raw)
  To: Paul Turner
  Cc: Mike Galbraith, Ingo Molnar, Oleg Nesterov, Peter Zijlstra,
	Linus Torvalds, LKML

On 3 December 2010 05:11, Paul Turner <pjt@google.com> wrote:
>
> I actually don't have a desktop setup handy to test "interactivity" (sad but
> true -- working on grabbing one).  But it looks better under synthetic
> load.
>

What tools are actually used to test "interactivity" ?
I posted a tool to the list some time ago, but I don't think anyone noticed.
My tool is very simple.
When you hold a key down, it should repeat. It should repeat at a
constant predictable interval.
So, my tool just waits for key presses and times when each one occurred.
The tester simply presses a key and holds it down.
If the time between each key press is constant, it indicates good
"interactivity". If the time between each key press varies a lot, it
indicates bad "interactivity".
You can reliably test if one kernel is better than the next using
actual measurable figures.
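As a rough illustration of the measurement described above, the jitter can be
computed from the recorded key-press timestamps.  This is a sketch only, not
the posted tool: struct jitter and interval_jitter() are hypothetical names,
and timestamps are assumed to be in milliseconds, in increasing order.

```c
#include <assert.h>
#include <stddef.h>

/* Summary of key-repeat timing (hypothetical type). */
struct jitter {
	double mean_ms;     /* average time between consecutive key presses */
	double max_dev_ms;  /* largest deviation of any interval from the mean */
};

/* t_ms holds n key-press timestamps in milliseconds, oldest first. */
struct jitter interval_jitter(const double *t_ms, size_t n)
{
	assert(t_ms != NULL && n >= 2);

	size_t intervals = n - 1;
	double mean = (t_ms[n - 1] - t_ms[0]) / (double)intervals;
	double max_dev = 0.0;

	for (size_t i = 1; i < n; i++) {
		double dev = (t_ms[i] - t_ms[i - 1]) - mean;
		if (dev < 0.0)
			dev = -dev;
		if (dev > max_dev)
			max_dev = dev;
	}

	struct jitter result = { mean, max_dev };
	return result;
}
```

A max_dev_ms near zero corresponds to the "constant interval" case; a large
value corresponds to the intervals varying a lot.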

Kind Regards

James

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-12-04 23:55           ` James Courtier-Dutton
@ 2010-12-05  5:11             ` Paul Turner
  2010-12-07 11:32               ` Paul Turner
  0 siblings, 1 reply; 79+ messages in thread
From: Paul Turner @ 2010-12-05  5:11 UTC (permalink / raw)
  To: James Courtier-Dutton
  Cc: Mike Galbraith, Ingo Molnar, Oleg Nesterov, Peter Zijlstra,
	Linus Torvalds, LKML

On Sat, Dec 4, 2010 at 3:55 PM, James Courtier-Dutton
<james.dutton@gmail.com> wrote:
> On 3 December 2010 05:11, Paul Turner <pjt@google.com> wrote:
>>
>> I actually don't have a desktop setup handy to test "interactivity" (sad but
>> true -- working on grabbing one).  But it looks better under synthetic
>> load.
>>
>
> What tools are actually used to test "interactivity" ?
> I posted a tool to the list some time ago, but I don't think anyone noticed.
> My tool is very simple.
> When you hold a key down, it should repeat. It should repeat at a
> constant predictable interval.
> So, my tool just waits for key presses and times when each one occurred.
> The tester simply presses a key and holds it down.
> If the time between each key press is constant, it indicates good
> "interactivity". If the time between each key press varies a lot, it
> indicates bad "interactivity".
> You can reliably test if one kernel is better than the next using
> actual measurable figures.
>
> Kind Regards
>
> James
>

Could you drop me a pointer?  I can certainly give it a try.  It would
be extra useful if it included any histogram functionality.

I've been using a combination of various synthetic wakeup and load
scripts and measuring the received bandwidth / wakeup latency.

They have not succeeded in reproducing the starvation or poor latency
observed by Mike above however.  (Although I've pulled a box to try
reproducing his exact conditions [ e.g. user environment ] on Monday).

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-12-05  5:11             ` Paul Turner
@ 2010-12-07 11:32               ` Paul Turner
  2010-12-15 12:10                 ` Paul Turner
  0 siblings, 1 reply; 79+ messages in thread
From: Paul Turner @ 2010-12-07 11:32 UTC (permalink / raw)
  To: James Courtier-Dutton
  Cc: Mike Galbraith, Ingo Molnar, Oleg Nesterov, Peter Zijlstra,
	Linus Torvalds, LKML

Desktop hardware came in today and I can now reproduce the issues
Mike's been seeing; tuning in progress.

On Sat, Dec 4, 2010 at 9:11 PM, Paul Turner <pjt@google.com> wrote:
> On Sat, Dec 4, 2010 at 3:55 PM, James Courtier-Dutton
> <james.dutton@gmail.com> wrote:
>> On 3 December 2010 05:11, Paul Turner <pjt@google.com> wrote:
>>>
>>> I actually don't have a desktop setup handy to test "interactivity" (sad but
>>> true -- working on grabbing one).  But it looks better under synthetic
>>> load.
>>>
>>
>> What tools are actually used to test "interactivity" ?
>> I posted a tool to the list some time ago, but I don't think anyone noticed.
>> My tool is very simple.
>> When you hold a key down, it should repeat. It should repeat at a
>> constant predictable interval.
>> So, my tool just waits for key presses and times when each one occurred.
>> The tester simply presses a key and holds it down.
>> If the time between each key press is constant, it indicates good
>> "interactivity". If the time between each key press varies a lot, it
>> indicates bad "interactivity".
>> You can reliably test if one kernel is better than the next using
>> actual measurable figures.
>>
>> Kind Regards
>>
>> James
>>
>
> Could you drop me a pointer?  I can certainly give it a try.  It would
> be extra useful if it included any histogram functionality.
>
> I've been using a combination of various synthetic wakeup and load
> scripts and measuring the received bandwidth / wakeup latency.
>
> They have not succeeded in reproducing the starvation or poor latency
> observed by Mike above however.  (Although I've pulled a box to try
> reproducing his exact conditions [ e.g. user environment ] on Monday).
>

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-12-07 11:32               ` Paul Turner
@ 2010-12-15 12:10                 ` Paul Turner
  0 siblings, 0 replies; 79+ messages in thread
From: Paul Turner @ 2010-12-15 12:10 UTC (permalink / raw)
  To: James Courtier-Dutton
  Cc: Mike Galbraith, Ingo Molnar, Oleg Nesterov, Peter Zijlstra,
	Linus Torvalds, LKML

This goose is now cooked.

It turns out the new shares code is doing the "right" thing, but in
the wrong order for tasks with very small time slices.  This made it
rather gnarly to track down since at all times the new evaluation /
the old evaluation / and an "OPT" evaluation were all in agreement!

As we hierarchically dequeue we now instantaneously adjust entity
weights to account for the new global state (good).  However, when we
then update on the parent (e.g. the group entity owning the
just-adjusted cfs_rq), the accrued unaccounted time is charged at the
new weight for that entity instead of the old.

For longer running processes, the periodic updates hide this.
However, for an interactive process, such as Xorg (which uses many
_small_ timeslices -- e.g. almost all accounting ends up being at
dequeue as opposed to periodic) this results in significant vruntime
over-charging and a loss of fairness.  In Xorg's case the loss of
fairness is compounded by the fact that having only one runnable
thread means we transition between NICE_0_LOAD and MIN_SHARES, which
magnifies the over-charging above.

This is fixed by charging the unaccounted time versus a group entity
before we manipulate its weight (as a result of child movement).
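The ordering problem can be shown with a toy model.  This is a sketch, not the
actual patch: struct group_entity and the three functions are hypothetical, the
charge formula is the simplified CFS rule (virtual time advances as
delta * NICE_0_LOAD / weight), and MIN_SHARES = 2 is an assumed value.

```c
#include <assert.h>
#include <stdint.h>

#define NICE_0_LOAD 1024ULL  /* weight of a nice-0 entity */
#define MIN_SHARES  2ULL     /* assumed floor for a group entity's weight */

/* Toy group entity: only the fields needed to show the ordering issue. */
struct group_entity {
	uint64_t weight;
	uint64_t vruntime;
	uint64_t unaccounted_ns;  /* runtime accrued but not yet charged */
};

/* Simplified charge: a lower weight makes virtual time advance faster. */
uint64_t vruntime_delta(uint64_t delta_exec_ns, uint64_t weight)
{
	assert(weight > 0);
	return delta_exec_ns * NICE_0_LOAD / weight;
}

/* Problematic order: pending time is charged at the NEW weight. */
void reweight_then_charge(struct group_entity *ge, uint64_t new_weight)
{
	ge->weight = new_weight;
	ge->vruntime += vruntime_delta(ge->unaccounted_ns, ge->weight);
	ge->unaccounted_ns = 0;
}

/* Fixed order: pending time is charged at the weight it was earned under. */
void charge_then_reweight(struct group_entity *ge, uint64_t new_weight)
{
	ge->vruntime += vruntime_delta(ge->unaccounted_ns, ge->weight);
	ge->unaccounted_ns = 0;
	ge->weight = new_weight;
}
```

With 1ms of pending runtime and a drop from NICE_0_LOAD to MIN_SHARES, the
first ordering charges 512 times as much virtual time as the second in this
model, which is the kind of over-charging described above.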

Thanks for your patience while I tracked this down.. it's been a few
sleepless nights while I cranked through a number of dead-end theories
(rather frustrating when the numbers are all right but the results are
all wrong! ;).  Cleaned up patch inbound in the morning.

- Paul

On Tue, Dec 7, 2010 at 3:32 AM, Paul Turner <pjt@google.com> wrote:
> Desktop hardware came in today and I can now reproduce the issues
> Mike's been seeing; tuning in progress.
>
> On Sat, Dec 4, 2010 at 9:11 PM, Paul Turner <pjt@google.com> wrote:
>> On Sat, Dec 4, 2010 at 3:55 PM, James Courtier-Dutton
>> <james.dutton@gmail.com> wrote:
>>> On 3 December 2010 05:11, Paul Turner <pjt@google.com> wrote:
>>>>
>>>> I actually don't have a desktop setup handy to test "interactivity" (sad but
>>>> true -- working on grabbing one).  But it looks better under synthetic
>>>> load.
>>>>
>>>
>>> What tools are actually used to test "interactivity" ?
>>> I posted a tool to the list some time ago, but I don't think anyone noticed.
>>> My tool is very simple.
>>> When you hold a key down, it should repeat. It should repeat at a
>>> constant predictable interval.
>>> So, my tool just waits for key presses and times when each one occurred.
>>> The tester simply presses a key and holds it down.
>>> If the time between each key press is constant, it indicates good
>>> "interactivity". If the time between each key press varies a lot, it
>>> indicates bad "interactivity".
>>> You can reliably test if one kernel is better than the next using
>>> actual measurable figures.
>>>
>>> Kind Regards
>>>
>>> James
>>>
>>
>> Could you drop me a pointer?  I can certainly give it a try.  It would
>> be extra useful if it included any histogram functionality.
>>
>> I've been using a combination of various synthetic wakeup and load
>> scripts and measuring the received bandwidth / wakeup latency.
>>
>> They have not succeeded in reproducing the starvation or poor latency
>> observed by Mike above however.  (Although I've pulled a box to try
>> reproducing his exact conditions [ e.g. user environment ] on Monday).
>>
>

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-12-05 20:47                                     ` Linus Torvalds
  2010-12-05 22:47                                       ` Colin Walters
@ 2010-12-07 18:51                                       ` Peter Zijlstra
  1 sibling, 0 replies; 79+ messages in thread
From: Peter Zijlstra @ 2010-12-07 18:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Colin Walters, Ray Lee, Mike Galbraith, Ingo Molnar,
	Oleg Nesterov, Markus Trippelsdorf, Mathieu Desnoyers, LKML

On Sun, 2010-12-05 at 12:47 -0800, Linus Torvalds wrote:
> Nice levels are _not_ about group scheduling. They're about
> priorities. And since the cgroup code doesn't even support priority
> levels for the groups, it's a really *horrible* match. 

It does, in fact: nice maps to a weight, and we then schedule so that each
entity (be it task or group) gets a proportional amount of time relative
to the other entities (of the same parent).

The scheduler basically solves the following differential equation:
  dt_i = w_i * dt / \Sum_j w_j


For tasks we map nice to weight like:

static const int prio_to_weight[40] = {
 /* -20 */     88761,     71755,     56483,     46273,     36291,
 /* -15 */     29154,     23254,     18705,     14949,     11916,
 /* -10 */      9548,      7620,      6100,      4904,      3906,
 /*  -5 */      3121,      2501,      1991,      1586,      1277,
 /*   0 */      1024,       820,       655,       526,       423,
 /*   5 */       335,       272,       215,       172,       137,
 /*  10 */       110,        87,        70,        56,        45,
 /*  15 */        36,        29,        23,        18,        15,
};

For groups we expose the weight directly in cgroupfs://cpu.shares with a
default equivalent to nice-0 (1024).

So 'nice make -j9' will run make and all its children with weight=110,
if this task hierarchy has ~9 runnable tasks it will get about as much
time as a single nice-0 competing task.

[ 9*110 = 990, 1*1024 = 1024, which gives: 49% vs 51% ]
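The arithmetic in the bracket can be checked with a few lines of C.  This is a
minimal sketch: the table is the one quoted above, while nice_to_weight() and
share_percent() are hypothetical helpers, and 'nice' without arguments is
assumed to mean nice level 10.

```c
#include <assert.h>

/* Nice-to-weight table as quoted above; index 0 is nice -20. */
static const int prio_to_weight[40] = {
 /* -20 */     88761,     71755,     56483,     46273,     36291,
 /* -15 */     29154,     23254,     18705,     14949,     11916,
 /* -10 */      9548,      7620,      6100,      4904,      3906,
 /*  -5 */      3121,      2501,      1991,      1586,      1277,
 /*   0 */      1024,       820,       655,       526,       423,
 /*   5 */       335,       272,       215,       172,       137,
 /*  10 */       110,        87,        70,        56,        45,
 /*  15 */        36,        29,        23,        18,        15,
};

/* Hypothetical helper: weight of a task at the given nice level. */
int nice_to_weight(int nice)
{
	assert(nice >= -20 && nice <= 19);
	return prio_to_weight[nice + 20];
}

/* Hypothetical helper: share of 'mine' in 'total', rounded to a percent. */
int share_percent(long mine, long total)
{
	assert(total > 0);
	return (int)((100 * mine + total / 2) / total);
}
```

Nine nice-10 tasks give 9 * nice_to_weight(10) = 990 against 1024 for the
single nice-0 task, so share_percent() yields 49 and 51 out of a total of 2014.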


Now group scheduling is in fact closely related to nice, the only thing
group scheduling does is:

  w_i = \unit * \Prod_j { w_i,j / \Sum_k w_k,j }, where:

     j \elem i and its parents
     k \elem entities of group j (where a task is a trivial group)

Where we compute a task's effective weight (w_i) by multiplying it with
the effective weight of its ancestors.

Suppose a grouped make -j9 against 1 competing task (all nice-0 or
equivalent), and make's 9 active children [a..i] in the group G:


        R
      /   \
     t     G
          / \
         a...i

So w_t = 1024, w_G = 1024 and w_[a..i] = 1024.

Now, per the above the effective weight (weight as in the root group) of
each grouped task is:

  w_[a..i] = 1024 * 1024/2048 * 1024/9216 ~= 56
  w_t      = 1024 * 1024/2048             = 512

[ \Sum w_[a..i] = 512, vs 512 gives: 50% vs 50% ]

So effectively: nice make -j9, and stuffing the make -j9 in a group are
roughly equivalent.
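The product formula can be sketched the same way.  Again this is only an
illustration under stated assumptions: struct level and effective_weight() are
hypothetical names, floating point is used for clarity, and each level records
an entity's weight plus the total weight of everything runnable beside it.

```c
#include <assert.h>
#include <stddef.h>

/* One step on the path from the root down to the task (hypothetical type). */
struct level {
	double weight;          /* this entity's weight at this level */
	double siblings_total;  /* sum of weights at this level, itself included */
};

/*
 * Effective weight as seen from the root:
 *   unit * product over levels of (weight / siblings_total)
 */
double effective_weight(double unit, const struct level *path, size_t depth)
{
	double w = unit;

	assert(path != NULL);
	for (size_t i = 0; i < depth; i++) {
		assert(path[i].siblings_total > 0.0);
		w *= path[i].weight / path[i].siblings_total;
	}
	return w;
}
```

For the tree above, task t has the single level {1024, 2048} and comes out at
512, while each of a..i has levels {1024, 2048} (group G under R) and
{1024, 9216} (the task inside G) and comes out just under 57, so the nine
together sum to 512.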

The only difference between groups and nice is the interface, with nice
you set the weight directly, with groups you set it implicitly,
depending on the runnable task state.


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-12-04 20:01                           ` Colin Walters
                                               ` (2 preceding siblings ...)
  2010-12-05 11:11                             ` Nikos Chantziaras
@ 2010-12-06  0:28                             ` Valdis.Kletnieks
  3 siblings, 0 replies; 79+ messages in thread
From: Valdis.Kletnieks @ 2010-12-06  0:28 UTC (permalink / raw)
  To: Colin Walters
  Cc: Linus Torvalds, Mike Galbraith, Ingo Molnar, Oleg Nesterov,
	Peter Zijlstra, Markus Trippelsdorf, Mathieu Desnoyers, LKML

On Sat, 04 Dec 2010 15:01:16 EST, Colin Walters said:

> Look around...where?  On what basis are you making that claim?  I did
> a quick web search for "unix background process", and this tutorial
> (in the first page of Google search results) aimed at grad students
> who use Unix at college definitely describes "nice make":
> http://acs.ucsd.edu/info/jobctrl.shtml

The fact that something is documented doesn't mean the documentation actually
is correct.

There exists a Linux guide written by somebody (who has enough of a rep that
you can safely say "should have known better") who didn't understand the
difference between traditional Unix and Linux, nor what the original concept
was, and it documented the proper way to take a system down quickly as:

# sync;sync;sync;halt

Of course, the *original* was:

# sync
# sync
# sync
# halt

And the whole point of 3 syncs was that the typing time of the second and third
syncs chewed up the time till the first sync finished.  Of course, sync;sync
doesn't start the first sync and then make you type.  And it overlooked that
the Linux sync is a lot more synchronous than the ATT Unix sync, which returned
as soon as the I/O was scheduled, not completed.


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-12-05 22:58                                         ` Jesper Juhl
@ 2010-12-05 23:05                                           ` Jesper Juhl
  0 siblings, 0 replies; 79+ messages in thread
From: Jesper Juhl @ 2010-12-05 23:05 UTC (permalink / raw)
  To: Colin Walters
  Cc: Linus Torvalds, Ray Lee, Mike Galbraith, Ingo Molnar,
	Oleg Nesterov, Peter Zijlstra, Markus Trippelsdorf,
	Mathieu Desnoyers, LKML

On Sun, 5 Dec 2010, Jesper Juhl wrote:

> On Sun, 5 Dec 2010, Colin Walters wrote:
> 
> > On Sun, Dec 5, 2010 at 3:47 PM, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > >
> > > The semantics of "nice" are not - and have never been - to put things
> > > into process scheduling groups of their own.
> > 
> > Again, I obviously understand that - the point is to explore the space
> > of changes here and consider what would (and wouldn't) break.  And
> > actually, what would improve.
> > 
> > > This is very much documented. People rely on it.
> > 
> > Well, we established my Fedora 14 system doesn't.  You said "no one"
> > uses "nice" interactively.  So...that leaves - who?  If you were
> > saying to me something like "I know Yahoo has some code in their data
> > centers which uses a range of nice values; if we made this change, all
> > of a sudden they'd get more CPU contention..."  Or like, "I'm pretty
> > sure Maemo uses very low nice values for some UI code".  But you so
> > far haven't really done that, it's just been (mostly)
> > assertions/handwaving.  Now you obviously have a lot more experience
> > that gives those assertions and handwaving a lot of credibility - but
> > all we need is one concrete example to shut me up =)
> > 
> [...]
> 
> I'll give you two real-world examples from two (closed source, but 
> still) apps we develop at my current employer.
> 
> The first one is a server/network monitoring app where there are lots of 
> child processes devoted to performing checks, storing data, displaying 
> results etc. Most of these processes just run at the default nice level. 
> One of the processes sometimes has a need for a cryptographic key pair and 
> it can generate this when it needs it, but it's better if one is readily 
> available, so we have a separate child process running that maintains a 
> small pool of new key pairs - this process runs at a high nice level since 
> it should not take CPU time away from the rest of the processes (it's not 
> important, it's just a small optimization), the need for key pairs comes 
> at large intervals, so the pool will almost never be depleted even if 
> this process doesn't get very much CPU time for a long time and besides, 
> if the pool ever gets depleted it's no disaster since the consumer will 
> then just generate a key pair when needed and burn the required CPU.
> 
> The second is a backup application where one child process is in charge of 
> doing background disk scanning, compression and encryption. This process 
> is not interactive, it must result in minimal interference with whatever 
> the user is currently using the machine for as his primary task and 
> time-to-completion is not really that important. So, this process runs at 
> a rather high nice level to avoid stealing CPU from the user's primary 
> task(s).
> 

Oh, and a third example. On my home laptop I got sufficiently annoyed with 
'updatedb' starting up from cron while I was in the middle of something 
so that cron job now runs updatedb with 'nice 19' and also uses ionice so 
it runs at the 'best effort' class and with priority 7 (lowest).


-- 
Jesper Juhl <jj@chaosbits.net>            http://www.chaosbits.net/
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please.


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-12-05 22:47                                       ` Colin Walters
@ 2010-12-05 22:58                                         ` Jesper Juhl
  2010-12-05 23:05                                           ` Jesper Juhl
  0 siblings, 1 reply; 79+ messages in thread
From: Jesper Juhl @ 2010-12-05 22:58 UTC (permalink / raw)
  To: Colin Walters
  Cc: Linus Torvalds, Ray Lee, Mike Galbraith, Ingo Molnar,
	Oleg Nesterov, Peter Zijlstra, Markus Trippelsdorf,
	Mathieu Desnoyers, LKML

On Sun, 5 Dec 2010, Colin Walters wrote:

> On Sun, Dec 5, 2010 at 3:47 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > The semantics of "nice" are not - and have never been - to put things
> > into process scheduling groups of their own.
> 
> Again, I obviously understand that - the point is to explore the space
> of changes here and consider what would (and wouldn't) break.  And
> actually, what would improve.
> 
> > This is very much documented. People rely on it.
> 
> Well, we established my Fedora 14 system doesn't.  You said "no one"
> uses "nice" interactively.  So...that leaves - who?  If you were
> saying to me something like "I know Yahoo has some code in their data
> centers which uses a range of nice values; if we made this change, all
> of a sudden they'd get more CPU contention..."  Or like, "I'm pretty
> sure Maemo uses very low nice values for some UI code".  But you so
> far haven't really done that, it's just been (mostly)
> assertions/handwaving.  Now you obviously have a lot more experience
> that gives those assertions and handwaving a lot of credibility - but
> all we need is one concrete example to shut me up =)
> 
[...]

I'll give you two real-world examples from two (closed source, but 
still) apps we develop at my current employer.

The first one is a server/network monitoring app where there are lots of 
child processes devoted to performing checks, storing data, displaying 
results etc. Most of these processes just run at the default nice level. 
One of the processes sometimes has a need for a cryptographic key pair and 
it can generate this when it needs it, but it's better if one is readily 
available, so we have a separate child process running that maintains a 
small pool of new key pairs - this process runs at a high nice level since 
it should not take CPU time away from the rest of the processes (it's not 
important, it's just a small optimization), the need for key pairs comes 
at large intervals, so the pool will almost never be depleted even if 
this process doesn't get very much CPU time for a long time and besides, 
if the pool ever gets depleted it's no disaster since the consumer will 
then just generate a key pair when needed and burn the required CPU.

The second is a backup application where one child process is in charge of 
doing background disk scanning, compression and encryption. This process 
is not interactive, it must result in minimal interference with whatever 
the user is currently using the machine for as his primary task and 
time-to-completion is not really that important. So, this process runs at 
a rather high nice level to avoid stealing CPU from the user's primary 
task(s).
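
The pattern in both examples (a parent at the default nice level with one child deliberately run at a higher nice value) might look roughly like the sketch below. It is not taken from either application; `spawn_niced` and the stand-in worker are invented for illustration, and a POSIX system is assumed.

```python
import os
import subprocess
import sys


def spawn_niced(argv, increment=10):
    """Run `argv` as a child whose nice value is raised by `increment`.

    os.nice() is called in the child between fork and exec, so the
    parent's own priority is untouched. Raising the nice value needs
    no special privileges.
    """
    return subprocess.run(
        argv,
        preexec_fn=lambda: os.nice(increment),
        capture_output=True,
        text=True,
        check=True,
    )


if __name__ == "__main__":
    # Stand-in for the background worker: it only reports its own niceness.
    worker = [sys.executable, "-c", "import os; print(os.nice(0))"]
    print("parent nice:", os.nice(0))
    print("child nice: ", spawn_niced(worker).stdout.strip())
```

The point of doing the renice in the child is that the rest of the process tree keeps running at the default level, which is exactly the behaviour the two applications depend on.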


-- 
Jesper Juhl <jj@chaosbits.net>            http://www.chaosbits.net/
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please.


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-12-05 20:47                                     ` Linus Torvalds
@ 2010-12-05 22:47                                       ` Colin Walters
  2010-12-05 22:58                                         ` Jesper Juhl
  2010-12-07 18:51                                       ` Peter Zijlstra
  1 sibling, 1 reply; 79+ messages in thread
From: Colin Walters @ 2010-12-05 22:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ray Lee, Mike Galbraith, Ingo Molnar, Oleg Nesterov,
	Peter Zijlstra, Markus Trippelsdorf, Mathieu Desnoyers, LKML

On Sun, Dec 5, 2010 at 3:47 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> The semantics of "nice" are not - and have never been - to put things
> into process scheduling groups of their own.

Again, I obviously understand that - the point is to explore the space
of changes here and consider what would (and wouldn't) break.  And
actually, what would improve.

> This is very much documented. People rely on it.

Well, we established my Fedora 14 system doesn't.  You said "no one"
uses "nice" interactively.  So...that leaves - who?  If you were
saying to me something like "I know Yahoo has some code in their data
centers which uses a range of nice values; if we made this change, all
of a sudden they'd get more CPU contention..."  Or like, "I'm pretty
sure Maemo uses very low nice values for some UI code".  But you so
far haven't really done that, it's just been (mostly)
assertions/handwaving.  Now you obviously have a lot more experience
that gives those assertions and handwaving a lot of credibility - but
all we need is one concrete example to shut me up =)

Playing around with Google code search a bit, hits for "nice" were
almost all duplicates of various C library headers/implementations.
"setpriority" was a bit more interesting, it appears Chromium has some
code to bump up the nice value by 5 for "background" processes:

http://google.com/codesearch/p?hl=en#OAMlx_jo-ck/src/base/process_linux.cc&q=setpriority&exact_package=chromium&l=21

But all my Chrome related processes here are 0, so who knows what
that's used for.   There are also hits for chromium's copy of embedded
cygwin+perl...terrifying.  I assume (hope, desperately) that
Cygwin+Perl is just used for building...

Another hit here in some random X screensaver code:
http://google.com/codesearch/p?hl=en#tJJawb1IJ20/driver/exec.c&q=setpriority%20file:.*.c&l=218

But I can't find a place where it's setting a non-zero value for that.

So...ah, here's one in Android's "development" git:
http://google.com/codesearch/p?hl=en#CRBM04-7BoA/simulator/wrapsim/Init.c&q=setpriority%20file:.*.c&l=91

Except it appears to be unused =/

Oh!  Here we go, one in the Android UI code:
http://google.com/codesearch/p?hl=en#uX1GffpyOZk/libs/rs/rsContext.cpp&q=setpriority%20file:.*.c&sa=N&cd=29&ct=rc
Pasting this one so people don't have to follow the link:

void * Context::threadProc(void *vrsc)
{
  ...
  setpriority(PRIO_PROCESS, rsc->mNativeThreadId, ANDROID_PRIORITY_DISPLAY);

}

Where ANDROID_PRIORITY_DISPLAY = -4.  Actually the whole enum is interesting:
http://google.com/codesearch/p?hl=en#uX1GffpyOZk/include/utils/threads.h&q=ANDROID_PRIORITY_DISPLAY&l=39

One interesting bit here is that they renice UI that the user is
presently interacting with:

    /* threads currently running a UI that the user is interacting with */
    ANDROID_PRIORITY_FOREGROUND     =  -2,

(Something "we" (and by "we" I mean GNOME) don't do, I believe Windows
does though).  Though, honestly I could whip up a
gnome-settings-daemon plugin to do this in about 10 minutes.  Maybe
after dinner.
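
For anyone who wants to poke at this from user space, here is a minimal sketch of the same setpriority() call using Python's wrapper. The helper name `try_set_nice` is made up, and the -4 simply mirrors the ANDROID_PRIORITY_DISPLAY value quoted above; on a normal desktop an unprivileged process should be refused when it asks for a negative value.

```python
import os


def try_set_nice(value, pid=0):
    """Try to set the nice value of `pid` (0 means the calling process).

    Returns True on success and False if the kernel refuses, which is
    the usual outcome when an unprivileged process asks for a lower
    (more favourable) value.
    """
    try:
        os.setpriority(os.PRIO_PROCESS, pid, value)
        return True
    except PermissionError:
        return False


if __name__ == "__main__":
    current = os.getpriority(os.PRIO_PROCESS, 0)
    print("current nice:", current)
    # Expect False here unless running as root, with CAP_SYS_NICE,
    # or with a suitably raised RLIMIT_NICE.
    print("set to -4 succeeded:", try_set_nice(-4))
```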

So...we've established that important released operating systems do
use negative nice values (not surprising). I can't offhand find any
uses of e.g. ANDROID_PRIORITY_BACKGROUND (i.e. a positive nice value)
in the "base" sources though.

> Different nice levels shouldn't get group scheduled together - they
> should be scheduled *less*.

But it seems obvious (right?) that putting them in one group *will*
ensure they get scheduled less, since that one group has to contend
with all other processes.

> And it's not about "make", since nobody
> really ever uses nice on make anyway, it's about things like
> pulseaudio (that wants higher priorities)

Note that pulse is actually using the RT scheduling class, so (I
think) its actual nice value is irrelevant.

Again using F14, the only things using negative nice besides pulse are
udev and auditd.

> Not very much (because they are mostly useless), but there really are
> people who use it.

Still trying to extract specific examples of "people who use it" from you...

> Do you *really* think that the person who niced the filesystem indexer
> down wants the indexer to get 50% of the CPU, just because it's
> scheduled separately from the parallel make?

Finally, an example!  I can work with this.  So let's assume I'm using
some JavaScript-intensive website in Firefox in GNOME, and
tracker-miner-fs kicks in after noticing I just saved a Word document
I want to look at later.  And an otherwise idle system.  You're
suggesting that, now tracker-miner-fs would be using a lot more CPU if
it was in an empty group than it would have before?

That does seem likely to be true.  But would it be a *problem*?  I
don't know, it's not obvious to me offhand.  Especially on any
hardware that's dual-core, where SpiderMonkey can be burning one core
(since that's all it will use, modulo Web Workers), and tracker on
another.

Anyways, I don't have the kernel-fu to make a patch myself here,
especially since the scheduler is probably one of the hardest parts of
the OS.  So ultimately I guess, if you just totally disagree, fine.
But I wasn't satisfied with the response - my engineering intuition is
to work through problems and try to really understand what would be
wrong.  It's hard to accept "just trust me, that's stupid".

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-12-05 10:18                                 ` Con Kolivas
  2010-12-05 11:36                                   ` Mike Galbraith
@ 2010-12-05 20:58                                   ` Ingo Molnar
  1 sibling, 0 replies; 79+ messages in thread
From: Ingo Molnar @ 2010-12-05 20:58 UTC (permalink / raw)
  To: Con Kolivas
  Cc: Colin Walters, Linus Torvalds, Mike Galbraith, Oleg Nesterov,
	Peter Zijlstra, Markus Trippelsdorf, Mathieu Desnoyers,
	linux-kernel


* Con Kolivas <kernel@kolivas.org> wrote:

> Greets.
> 
> I applaud your efforts to continue addressing interactivity and responsiveness 
> but, I know I'm going to regret this, I feel strongly enough to speak up about 
> this change.
> 
> On Sun, 5 Dec 2010 10:43:44 Colin Walters wrote:
> > On Sat, Dec 4, 2010 at 5:39 PM, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > > What's your point again? It's a heuristic.
> > 
> > So if it's a heuristic the OS can get wrong,
> 
> This is precisely what I see as the flaw in this approach. [...]

I think you are misunderstanding Mike's auto-group scheduling feature.

The scheduling itself is not 'heuristics'.

It is the _composition of a group_ that has a heuristic default. (We use the 'tty' 
to act as the grouping)

But that can be changed: the cgroup interfaces can be (and are) used by Gnome to 
create different groups. They can be used by users as well, using cgroup tooling.

What the kernel does is that it provides sane defaults.

> [...]
>
> Move away from the fragile heuristic tweaks and find a longer term robust 
> solution.

This is not some kernel heuristic that cannot be modified - which was the main 
problem of the O(1) scheduler. This is a common-sense default that can be overridden 
by user-space if it wants to.

So i definitely think you are confusing the two cases.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-12-05 19:22                                   ` Colin Walters
@ 2010-12-05 20:47                                     ` Linus Torvalds
  2010-12-05 22:47                                       ` Colin Walters
  2010-12-07 18:51                                       ` Peter Zijlstra
  0 siblings, 2 replies; 79+ messages in thread
From: Linus Torvalds @ 2010-12-05 20:47 UTC (permalink / raw)
  To: Colin Walters
  Cc: Ray Lee, Mike Galbraith, Ingo Molnar, Oleg Nesterov,
	Peter Zijlstra, Markus Trippelsdorf, Mathieu Desnoyers, LKML

On Sun, Dec 5, 2010 at 11:22 AM, Colin Walters <walters@verbum.org> wrote:
>
> For the purposes of this discussion again, let's say "fixing nice"
> means, say, "group schedule each nice level above 0".  There are
> obviously many possibilities here, but let's consider this one
> precisely.

THAT IS NOT HOW 'nice' WORKS!

For chrissake, how hard is it to understand?

The semantics of "nice" are not - and have never been - to put things
into process scheduling groups of their own.

When somebody says "nice xyzzy", they are explicitly stating that
"xyzzy" isn't as important as other processes. It's done for stuff
that you don't care about, and more specifically, for stuff that you
really don't want to impact anything else. So if there are other
things to be run, 'nice' means that those should get more CPU time.

(Obviously, negative nice levels work the other way around).

This is very much documented. People rely on it. Look at the man-page.
It talks about "most favorable" vs "least favorable" scheduling.

> Two people logged in would get their "make" jobs group scheduled
> together.  What is the problem?

The problem is that you don't know what the hell you are talking about.

Different nice levels shouldn't get group scheduled together - they
should be scheduled *less*. And it's not about "make", since nobody
really ever uses nice on make anyway, it's about things like
pulseaudio (that wants higher priorities) and random background
filesystem indexers etc (that want lower priorities).

Nice levels are _not_ about group scheduling. They're about
priorities. And since the cgroup code doesn't even support priority
levels for the groups, it's a really *horrible* match.

And the thing is, the nice semantics are traditional. They are also
*horrible*, but that doesn't allow you to change their semantics.
People rely on those crazy traditional and mostly useless semantics.
Not very much (because they are mostly useless), but there really are
people who use it.

And they use it knowing that positive nice levels means that something
is less important.

In contrast, giving processes a scheduling group doesn't imply "less
important". Not AT ALL. It doesn't really mean "more important"
either, it just means "somewhat insulated from other groups".

So let's say that you have a filesystem indexer, and you nice it up to
make sure that it doesn't steal CPU bandwidth from your "real work".
Now, let's say that you start a "make -j16" to build something
important.

Do you *really* think that the person who niced the filesystem indexer
down wants the indexer to get 50% of the CPU, just because it's
scheduled separately from the parallel make?

HELL NO!
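
To put rough numbers on that, here is a back-of-the-envelope sketch. The assumptions are mine, not from the message: a single CPU, every task always runnable, the indexer at nice 19, sixteen make jobs at nice 0, and the load weights 1024 (nice 0) and 15 (nice 19) from the scheduler's nice-to-weight table. In the grouped case the indexer's group and the make jobs are assumed to carry equal shares.

```python
# Load weights assumed for nice 0 and nice 19.
NICE_0_WEIGHT = 1024
NICE_19_WEIGHT = 15


def flat_share(task_weight, other_weights):
    """CPU fraction of one task when all tasks compete in a single runqueue."""
    return task_weight / (task_weight + sum(other_weights))


def grouped_share(n_groups):
    """CPU fraction of a group when `n_groups` equally weighted groups compete."""
    return 1 / n_groups


make_jobs = [NICE_0_WEIGHT] * 16

# Traditional nice semantics: the indexer competes with every make job.
flat = flat_share(NICE_19_WEIGHT, make_jobs)

# Indexer in a group of its own, competing against the make jobs as a whole.
grouped = grouped_share(2)

if __name__ == "__main__":
    print(f"indexer share, flat:    {flat:.4%}")
    print(f"indexer share, grouped: {grouped:.0%}")
```

Under these assumptions the niced indexer goes from well under one percent of the CPU to half of it, which is the opposite of what whoever niced it asked for.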

So stop this idiocy. "nice" has absolutely nothing to do with group
scheduling. It cannot. It must not. It's a legacy interface, and it
has real semantics.

> Since Linus appears to be more interested in talking about nipples
> than explaining exactly what it would break, but you appear to agree
> with him, hopefully you'll be able to explain...

The reason I was talking about male nipples should be clear by now.
Think "legacy interface". Think "don't mess with it, because people
are used to it".

They may be useless, but dammit, they do what they do.

Don't try to turn male nipples into something they aren't. And don't
try to turn 'nice' into something it isn't.

                     Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-12-05  7:47                                 ` Ray Lee
@ 2010-12-05 19:22                                   ` Colin Walters
  2010-12-05 20:47                                     ` Linus Torvalds
  0 siblings, 1 reply; 79+ messages in thread
From: Colin Walters @ 2010-12-05 19:22 UTC (permalink / raw)
  To: Ray Lee
  Cc: Linus Torvalds, Mike Galbraith, Ingo Molnar, Oleg Nesterov,
	Peter Zijlstra, Markus Trippelsdorf, Mathieu Desnoyers, LKML

On Sun, Dec 5, 2010 at 2:47 AM, Ray Lee <ray-lk@madrabbit.org> wrote:
> On Sat, Dec 4, 2010 at 3:43 PM, Colin Walters <walters@verbum.org> wrote:
>> So if it's a heuristic the OS can get wrong, wouldn't it be a good
>> idea to support a way for programs and/or interactive users to
>> explicitly specify things?
>
> Consider a multi-user machine. `nice` is an orthogonal concern in that
> case. Therefore, fixing nice doesn't address all issues.

For the purposes of this discussion again, let's say "fixing nice"
means, say, "group schedule each nice level above 0".  There are
obviously many possibilities here, but let's consider this one
precisely.

How, exactly, under what scenario in a "multi-user machine" does this
break?  How exactly is it orthogonal?

Two people logged in would get their "make" jobs group scheduled
together.  What is the problem?

Since Linus appears to be more interested in talking about nipples
than explaining exactly what it would break, but you appear to agree
with him, hopefully you'll be able to explain...

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-12-05 10:18                                 ` Con Kolivas
@ 2010-12-05 11:36                                   ` Mike Galbraith
  2010-12-05 20:58                                   ` Ingo Molnar
  1 sibling, 0 replies; 79+ messages in thread
From: Mike Galbraith @ 2010-12-05 11:36 UTC (permalink / raw)
  To: Con Kolivas
  Cc: Colin Walters, Linus Torvalds, Ingo Molnar, Oleg Nesterov,
	Peter Zijlstra, Markus Trippelsdorf, Mathieu Desnoyers,
	linux-kernel

On Sun, 2010-12-05 at 21:18 +1100, Con Kolivas wrote:
> Greets.
> 
> I applaud your efforts to continue addressing interactivity and responsiveness 
> but, I know I'm going to regret this, I feel strongly enough to speak up about 
> this change.
> 
> On Sun, 5 Dec 2010 10:43:44 Colin Walters wrote:
> > On Sat, Dec 4, 2010 at 5:39 PM, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > > What's your point again? It's a heuristic.
> > 
> > So if it's a heuristic the OS can get wrong,
> 
> This is precisely what I see as the flaw in this approach. The whole reason 
> you have CFS now is that we had a scheduler which was pretty good for all the 
> other things in the O(1) scheduler, but needed heuristics to get interactivity 
> right. I put them there.

Actually, Linus laid the foundation with sleeper fairness, Ingo expanded
it to requeue "interactive" tasks in the active array, and you tweaked
the result.

>  Then I spent the next few years trying to find a way 
> to get rid of them. The reason is precisely what Colin says above. Heuristics 
> get it wrong sometimes. So no matter how smart you think your heuristics are, 
> it is impossible to get it right 100% of the time. If the heuristics make it 
> better 99% of the time, and introduce disastrous corner cases, regressions and 
> exploits 1% of the time, that's unforgivable. That's precisely what we had 
> with the old O(1) scheduler and that's what you got rid of when you put CFS 
> into mainline. The whole reason CFS was better was it was mostly fair and 
> concentrated on ensuring decent latency rather than trying to guess what would 
> be right, so it was predictable and reliable.

And it still is Con.  I didn't rewrite the thing, I just added an
automated task grouping.  Session to session fairness is just as holy as
any sacred cow definition of fair you care to trot out.

> So if you introduce heuristics once again into the scheduler to try and 
> improve the desktop by unfairly distributing CPU, you will go back to where 
> you once were. Mostly better but sometimes really badly wrong. No matter how 
> smart you think you can be with heuristics they cannot be right all the time. 
> And there are regressions with these tty followed by per session group 
> patches. Search forums where desktop users go and you'll see that people are 
> afraid to speak up on lkml but some users are having mplayer and amarok 
> skipping under light load when trying them.

Shrug. I can't debug what isn't reported.

>  You want to program more 
> intelligence in to work around these regressions, you'll just get yourself 
> deeper and deeper into the same quagmire. The 'quick fix' you seek now is not 
> something you should be defending so vehemently. The "I have a solution now" 
> just doesn't make sense in this light. I for one do not welcome our new 
> heuristic overlords.

I for one don't welcome childish name calling.

> If you're serious about really improving the desktop from within the kernel, 
> as you seem to be with this latest change, then make a change that's 
> predictable and gets it right ALL the time and is robust for the future. Stop 
> working within all the old fashioned concepts and allow userspace to tell the 
> kernel what it wants, and give the user the power to choose. If you think this 
> is too hard and not doable, or that the user is too uninformed or want to 
> modify things themselves, then allow me to propose a relatively simple change 
> that can expedite this.
> 
> There are two aspects to getting good desktop behaviour, enough CPU and low 
> latency. 'nice' by your own admission is too crude and doesn't really describe 
> how either of these should really be modified. Furthermore there are 40 levels 
> of it and only about 4 or 5 are ever used. We also know that users don't even 
> bother using it. 
> 
> What I propose is a new syscall latnice for "latency nice". It only need have 
> 4 levels, 1 for default, 0 for latency insensitive, 2 for relatively latency 
> sensitive gui apps, and 3 for exquisitely latency sensitive uses such as 
> audio. These should not require extra privileges to use and thus should also 
> not be usable for "exploiting" extra CPU by default. It's simply a matter of 
> working with lower latencies yet shorter quota (or timeslices) which would 
> mean throughput on these apps is sacrificed due to cache thrashing but then 
> that's not what latency sensitive applications need. These can then be 
> encouraged to be included within the applications themselves, making this a 
> more long term change. 'Firefox' could set itself 2, 'Amarok' and 'mplayer' 3, 
> and 'make' - bless its soul - 0, and so on. Keeping the range simple and 
> defined will make it easy for userspace developers to cope with, and users to 
> fiddle with.

An automated per session task group is an evil heuristic, so we should
use kinda sorta sensitive, really REALLY sensitive, don't give a damn,
or no frickin' clue... to make 100% accurate non-heuristic scheduling
decisions instead?  Did I get that right?

Goodbye.

	-Mike


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-12-04 20:01                           ` Colin Walters
  2010-12-04 22:39                             ` Linus Torvalds
  2010-12-04 23:31                             ` david
@ 2010-12-05 11:11                             ` Nikos Chantziaras
  2010-12-06  0:28                             ` Valdis.Kletnieks
  3 siblings, 0 replies; 79+ messages in thread
From: Nikos Chantziaras @ 2010-12-05 11:11 UTC (permalink / raw)
  To: linux-kernel

On 12/04/2010 10:01 PM, Colin Walters wrote:
>[...]
> Speaking of the scheduler documentation - note that its sample shell
> code contains exactly the problem showing what's wrong with
> auto-grouping-by-tty, which is:
>
> # firefox&	# Launch firefox and move it to "browser" group
>
> As soon as you do that from the same terminal that you're going to
> launch the "make" from, you're back to total lossage.  Are you going
> to explain to a student that "oh, you need to create a new
> gnome-terminal tab and launch firefox from that"?

Btw, most people don't do that anymore.  They don't use terminals.  They 
click the application icons on their desktops and start menus or double 
click the executables in their file managers.  So it's not a matter of 
opening a second terminal tab, because the first one isn't even open.

To have a fluid desktop one shouldn't be required to hack with terminal 
commands.


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-12-04 23:43                               ` Colin Walters
  2010-12-05  0:31                                 ` Linus Torvalds
  2010-12-05  7:47                                 ` Ray Lee
@ 2010-12-05 10:18                                 ` Con Kolivas
  2010-12-05 11:36                                   ` Mike Galbraith
  2010-12-05 20:58                                   ` Ingo Molnar
  2 siblings, 2 replies; 79+ messages in thread
From: Con Kolivas @ 2010-12-05 10:18 UTC (permalink / raw)
  To: Colin Walters
  Cc: Linus Torvalds, Mike Galbraith, Ingo Molnar, Oleg Nesterov,
	Peter Zijlstra, Markus Trippelsdorf, Mathieu Desnoyers,
	linux-kernel

Greets.

I applaud your efforts to continue addressing interactivity and responsiveness 
but, I know I'm going to regret this, I feel strongly enough to speak up about 
this change.

On Sun, 5 Dec 2010 10:43:44 Colin Walters wrote:
> On Sat, Dec 4, 2010 at 5:39 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > What's your point again? It's a heuristic.
> 
> So if it's a heuristic the OS can get wrong,

This is precisely what I see as the flaw in this approach. The whole reason 
you have CFS now is that we had a scheduler which was pretty good for all the 
other things in the O(1) scheduler, but needed heuristics to get interactivity 
right. I put them there. Then I spent the next few years trying to find a way 
to get rid of them. The reason is precisely what Colin says above. Heuristics 
get it wrong sometimes. So no matter how smart you think your heuristics are, 
it is impossible to get it right 100% of the time. If the heuristics make it 
better 99% of the time, and introduce disastrous corner cases, regressions and 
exploits 1% of the time, that's unforgivable. That's precisely what we had 
with the old O(1) scheduler and that's what you got rid of when you put CFS 
into mainline. The whole reason CFS was better was it was mostly fair and 
concentrated on ensuring decent latency rather than trying to guess what would 
be right, so it was predictable and reliable.

So if you introduce heuristics once again into the scheduler to try and 
improve the desktop by unfairly distributing CPU, you will go back to where 
you once were. Mostly better but sometimes really badly wrong. No matter how 
smart you think you can be with heuristics they cannot be right all the time. 
And there are regressions with these tty followed by per session group 
patches. Search forums where desktop users go and you'll see that people are 
afraid to speak up on lkml but some users are having mplayer and amarok 
skipping under light load when trying them. You want to program more 
intelligence in to work around these regressions, you'll just get yourself 
deeper and deeper into the same quagmire. The 'quick fix' you seek now is not 
something you should be defending so vehemently. The "I have a solution now" 
just doesn't make sense in this light. I for one do not welcome our new 
heuristic overlords.

If you're serious about really improving the desktop from within the kernel, 
as you seem to be with this latest change, then make a change that's 
predictable and gets it right ALL the time and is robust for the future. Stop 
working within all the old fashioned concepts and allow userspace to tell the 
kernel what it wants, and give the user the power to choose. If you think this 
is too hard and not doable, or that the user is too uninformed or want to 
modify things themselves, then allow me to propose a relatively simple change 
that can expedite this.

There are two aspects to getting good desktop behaviour, enough CPU and low 
latency. 'nice' by your own admission is too crude and doesn't really describe 
how either of these should really be modified. Furthermore there are 40 levels 
of it and only about 4 or 5 are ever used. We also know that users don't even 
bother using it. 

What I propose is a new syscall latnice for "latency nice". It only need have 
4 levels, 1 for default, 0 for latency insensitive, 2 for relatively latency 
sensitive gui apps, and 3 for exquisitely latency sensitive uses such as 
audio. These should not require extra privileges to use and thus should also 
not be usable for "exploiting" extra CPU by default. It's simply a matter of 
working with lower latencies yet shorter quota (or timeslices) which would 
mean throughput on these apps is sacrificed due to cache thrashing but then 
that's not what latency sensitive applications need. These can then be 
encouraged to be included within the applications themselves, making this a 
more long term change. 'Firefox' could set itself 2, 'Amarok' and 'mplayer' 3, 
and 'make' - bless its soul - 0, and so on. Keeping the range simple and 
defined will make it easy for userspace developers to cope with, and users to 
fiddle with.
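
Purely as an illustration of how small the proposed interface is, the four levels could be written down as below. No latnice syscall exists; the names `LatNice`, `SUGGESTED` and `latnice_for` are invented here, and only the numeric levels and the per-application suggestions come from the paragraph above.

```python
from enum import IntEnum


class LatNice(IntEnum):
    """The four proposed 'latency nice' levels (hypothetical interface)."""
    INSENSITIVE = 0     # latency insensitive, e.g. make
    DEFAULT = 1         # what every task starts with
    SENSITIVE = 2       # relatively latency sensitive GUI apps
    VERY_SENSITIVE = 3  # exquisitely latency sensitive uses such as audio


# Levels the applications named above might choose for themselves.
SUGGESTED = {
    "firefox": LatNice.SENSITIVE,
    "amarok": LatNice.VERY_SENSITIVE,
    "mplayer": LatNice.VERY_SENSITIVE,
    "make": LatNice.INSENSITIVE,
}


def latnice_for(app):
    """Return the suggested level for `app`, falling back to the default."""
    return SUGGESTED.get(app.lower(), LatNice.DEFAULT)


if __name__ == "__main__":
    for name in ("Firefox", "Amarok", "mplayer", "make", "bash"):
        print(f"{name}: {latnice_for(name).name} ({int(latnice_for(name))})")
```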

But that would only be the first step. The second step is to take the plunge 
and accept that we DO want selective unfairness on the desktop, but where WE 
want it, not where the kernel thinks we might want it. It's not an exploit if 
my full screen HD video continues to consume 80% of the CPU while make is 
running - on a desktop. Take a leaf out of other desktop OSs and allow the 
user to choose say levels 0, 1, or 2 for desktop interactivity with a simple 
/proc/sys/kernel/interactive tunable, a bit like the "optimise for foreground 
applications" seen elsewhere. This could then be used to decide whether to use 
the scheduling hints from latnice to either just ensure low latency but keep 
the same CPU usage  - 0, or actually give progressively more CPU for latniced 
tasks as the interactive tunable is increased. Then distros can set this on 
installation and make it part of the many funky GUIs to choose between the 
different levels. This then takes the user out of the picture almost entirely, 
yet gives them the power to change it if they so desire.

The actual scheduler changes required to implement this are absurdly simple 
and doable now, and will not cost in overhead the way cgroups do. It also 
should cause no regressions when interactive mode is disabled and would have 
no effect till changes are made elsewhere, or the users use the latnice 
utility.

Move away from the fragile heuristic tweaks and find a longer term robust 
solution.

Regards,
Con

-- 
-ck

P.S. I'm very happy for someone else to do it. Alternatively you could include 
BFS and I'd code it up for that in my spare time.


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-12-04 23:43                               ` Colin Walters
  2010-12-05  0:31                                 ` Linus Torvalds
@ 2010-12-05  7:47                                 ` Ray Lee
  2010-12-05 19:22                                   ` Colin Walters
  2010-12-05 10:18                                 ` Con Kolivas
  2 siblings, 1 reply; 79+ messages in thread
From: Ray Lee @ 2010-12-05  7:47 UTC (permalink / raw)
  To: Colin Walters
  Cc: Linus Torvalds, Mike Galbraith, Ingo Molnar, Oleg Nesterov,
	Peter Zijlstra, Markus Trippelsdorf, Mathieu Desnoyers, LKML

On Sat, Dec 4, 2010 at 3:43 PM, Colin Walters <walters@verbum.org> wrote:
> So if it's a heuristic the OS can get wrong, wouldn't it be a good
> idea to support a way for programs and/or interactive users to
> explicitly specify things?

Consider a multi-user machine. `nice` is an orthogonal concern in that
case. Therefore, fixing nice doesn't address all issues. Also: Most
Linux systems are multi-user (root and the physical tty user.)
Further, even a single user wears multiple hats on a single system.
The idea is to infer those hats, and deal with them fairly.

No one is taking nice away from you. Keep using it if you like.

If you want to allow users to explicitly specify group scheduling,
then good news: we already have that feature. You just seem to not be
using it. Much like the other 99.993% of us.

The kernel is supposed to have *sane defaults*. That's what is under
discussion here.

~r.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4] sched: automated per session task groups
  2010-12-04 23:43                               ` Colin Walters
@ 2010-12-05  0:31                                 ` Linus Torvalds
  2010-12-05  7:47                                 ` Ray Lee
  2010-12-05 10:18                                 ` Con Kolivas
  2 siblings, 0 replies; 79+ messages in thread
From: Linus Torvalds @ 2010-12-05  0:31 UTC (permalink / raw)
  To: Colin Walters
  Cc: Mike Galbraith, Ingo Molnar, Oleg Nesterov, Peter Zijlstra,
	Markus Trippelsdorf, Mathieu Desnoyers, LKML

On Sat, Dec 4, 2010 at 3:43 PM, Colin Walters <walters@verbum.org> wrote:
> On Sat, Dec 4, 2010 at 5:39 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>
>> And it doesn't hurt you. If you're happy with "nice", go on and use
>> it. Why are you even discussing it?
>
> Because it seems to me like a bug if it isn't as good as group
> scheduling?  Most of your message is saying it's worthless, and I
> don't disagree that it's not very good *right now*.  I guess where we
> disagree is whether it's worth fixing.

It's not worth "fixing", because it works exactly like it's designed -
and supposed - to work.

There really isn't anything to fix. 'nice' is what it is. It's a
simple legacy interface to scheduler priority. The fact that it's also
almost totally useless is irrelevant. It's like male nipples. We
wouldn't be better off lactating, and they look like some odd wart
that doesn't do much good. But it would be worse to remove it.

'nice' is a bad idea. It's a bad idea that has perfectly
understandable historical reasons for it, but it's an _unfixably_ bad
idea.

                      Linus

* Re: [PATCH v4] sched: automated per session task groups
  2010-12-04 22:39                             ` Linus Torvalds
@ 2010-12-04 23:43                               ` Colin Walters
  2010-12-05  0:31                                 ` Linus Torvalds
                                                   ` (2 more replies)
  0 siblings, 3 replies; 79+ messages in thread
From: Colin Walters @ 2010-12-04 23:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Mike Galbraith, Ingo Molnar, Oleg Nesterov, Peter Zijlstra,
	Markus Trippelsdorf, Mathieu Desnoyers, LKML

On Sat, Dec 4, 2010 at 5:39 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:

> And it doesn't hurt you. If you're happy with "nice", go on and use
> it. Why are you even discussing it?

Because it seems to me like a bug if it isn't as good as group
scheduling?  Most of your message is saying it's worthless, and I
don't disagree that it's not very good *right now*.  I guess where we
disagree is whether it's worth fixing.

> What's your point again? It's a heuristic.

So if it's a heuristic the OS can get wrong, wouldn't it be a good
idea to support a way for programs and/or interactive users to
explicitly specify things?  Unfortunately the cgroups utilities don't
make this easy (and of course there's the issue that no major released
OS exports write permission to the cpu cgroup for a desktop session
uid).  I guess "nice" could be patched to, if the user has permission
to the cgroups, to auto-create a group.  Or...nice could be fixed.

On a more productive note, I see now
Documentation/scheduler/sched-nice-design.txt has a lot of really
useful history regarding "nice" and the complaints over time (I guess
this is where some of your assertions that it's failed/worthless come
from).

* Re: [PATCH v4] sched: automated per session task groups
  2010-12-04 20:01                           ` Colin Walters
  2010-12-04 22:39                             ` Linus Torvalds
@ 2010-12-04 23:31                             ` david
  2010-12-05 11:11                             ` Nikos Chantziaras
  2010-12-06  0:28                             ` Valdis.Kletnieks
  3 siblings, 0 replies; 79+ messages in thread
From: david @ 2010-12-04 23:31 UTC (permalink / raw)
  To: Colin Walters
  Cc: Linus Torvalds, Mike Galbraith, Ingo Molnar, Oleg Nesterov,
	Peter Zijlstra, Markus Trippelsdorf, Mathieu Desnoyers, LKML

On Sat, 4 Dec 2010, Colin Walters wrote:

> Speaking of the scheduler documentation - note that its sample shell
> code contains exactly the problem showing what's wrong with
> auto-grouping-by-tty, which is:
>
> # firefox &	# Launch firefox and move it to "browser" group
>
> As soon as you do that from the same terminal that you're going to
> launch the "make" from, you're back to total lossage.  Are you going
> to explain to a student that "oh, you need to create a new
> gnome-terminal tab and launch firefox from that"?

as someone who starts firefox from a terminal session all the time, I 
always want to start it from its own dedicated session, if for no other 
reason than that it spits out a TON of error messages over time, and I don't 
want them popping up in a window where I'm doing something else.

so this is a very bad example.

David Lang

* Re: [PATCH v4] sched: automated per session task groups
  2010-12-04 20:01                           ` Colin Walters
@ 2010-12-04 22:39                             ` Linus Torvalds
  2010-12-04 23:43                               ` Colin Walters
  2010-12-04 23:31                             ` david
                                               ` (2 subsequent siblings)
  3 siblings, 1 reply; 79+ messages in thread
From: Linus Torvalds @ 2010-12-04 22:39 UTC (permalink / raw)
  To: Colin Walters
  Cc: Mike Galbraith, Ingo Molnar, Oleg Nesterov, Peter Zijlstra,
	Markus Trippelsdorf, Mathieu Desnoyers, LKML

On Sat, Dec 4, 2010 at 12:01 PM, Colin Walters <walters@verbum.org> wrote:
>
> But then again here's a Berkeley "Unix Tutorial" that does cover it:
> http://people.ischool.berkeley.edu/~kevin/unix-tutorial/section13.html

What part of "nobody does that" didn't you understand?

I know about "nice". I think it was part of the first Unix course I
ever took, and it probably made more sense back then (sixteen people
at a time on a microvax, compiling stuff). And I never used it, afaik.
Nor does really anybody else.

But hey, whatever floats your boat. You can use it. And you can feel
special and better than the rest of us exactly because you know you
really _are_ special.

> Hmm...how many threads are we talking about here?  If it's just say
> one per core, then I doubt it needs nicing.

I think git defaults to a maximum of 20 for it. Remember: it's not
about "cores". It's about IO, and then 20 is a "let's not mess up
everybody else _too_ much when we're actually CPU-bound".

But that's not the point. The point is that "nice" is totally the
wrong thing to do. It's _always_ the wrong thing to do. The only
reason it's in tutorials and taught in intro Unix classes is that it's
the only thing there is in traditional unix.

And we can be better. We don't need to be stupid and traditional.

But you go right on and use it. Nobody stops you.

> Sure...though I imagine for "most" people that's totally I/O bound
> (either on ext4 journal or hard disk seeks).

Sure. And "most" people do something totally different. What's your
point? The fact is, the session-based group scheduling really does
work. It works on a lot of different loads. It's nice for things like
my use, but it's _also_ nice for things like me ssh'ing into my kids'
or wife's computers to update their kernel. And it's nice for things
like "make -j test" for git etc.

And it doesn't hurt you. If you're happy with "nice", go on and use
it. Why are you even discussing it? I'm telling you the FACT that
others aren't happy with nice, and that smart people consider nice to
be totally useless.

But none of that means that you can't go on using it. Comprende?

> # firefox &     # Launch firefox and move it to "browser" group
>
> As soon as you do that from the same terminal that you're going to
> launch the "make" from, you're back to total lossage.

"Mommy mommy, it hurts when I stick forks in my eyes!"

What's your point again? It's a heuristic. It works great for the
cases many normal people have. If you have a graphical desktop, most
sane people would tend to start the browser from that nice big browser
icon. But again, if you want to stick forks in your eyes, go right
ahead. It's not _my_ problem.

And similarly, it's not _your_ problem if other people want to do
saner things, is it?

                    Linus

* Re: [PATCH v4] sched: automated per session task groups
  2010-12-04 18:33                         ` Linus Torvalds
@ 2010-12-04 20:01                           ` Colin Walters
  2010-12-04 22:39                             ` Linus Torvalds
                                               ` (3 more replies)
  0 siblings, 4 replies; 79+ messages in thread
From: Colin Walters @ 2010-12-04 20:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Mike Galbraith, Ingo Molnar, Oleg Nesterov, Peter Zijlstra,
	Markus Trippelsdorf, Mathieu Desnoyers, LKML

On Sat, Dec 4, 2010 at 1:33 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> But the fundamental issue is that 'nice' is broken. It's very much
> broken at a conceptual and technical design angle (absolute priority
> levels, no fairness), but it's broken also from a psychological and
> practical angle (ie expecting people to manually do extra work is
> ridiculous and totally unrealistic).

I don't see it as ridiculous - for the simple reason that it really
has existed for so long and is documented (see below).

> Why would you want to do that? If you are willing to do group
> scheduling, do it on something sane and meaningful, and something that
> doesn't need user interaction or decisions. And do it on something
> that has more than 20 levels.

In this case, the "user interaction" component is pretty damn small.
We're talking about 4 extra characters.

> Nobody but morons ever "documented" that. Sure, you can find people
> saying it, but you won't be finding people actually _doing_ it. Look
> around.

Look around...where?  On what basis are you making that claim?  I did
a quick web search for "unix background process", and this tutorial
(in the first page of Google search results) aimed at grad students
who use Unix at college definitely describes "nice make":
http://acs.ucsd.edu/info/jobctrl.shtml

There are some that don't, like:
http://linux.about.com/od/itl_guide/a/gdeitl35t01.htm and
http://www.albany.edu/its/quickstarts/qs-common_unix.html

But then again here's a Berkeley "Unix Tutorial" that does cover it:
http://people.ischool.berkeley.edu/~kevin/unix-tutorial/section13.html

So, does your random Linux-using college student or professional
developer know about "nice"?  My guess is "likely".  Do they use it
for "make"?  No data.  The issue is that you really only have a bad
experience on *large* projects.  But if we just said to people who
come to us "Hey, when I compile webkit/linux/mozilla my system slows
down" we can tell them "use nice", especially since it's already
documented on the web, that seems to me like a pretty damn good
answer.

> Seriously. Nobody _ever_ does "nice make", unless they are seriously
> repressed beta-males (eg MIS people who get shouted at when they do
> system maintenance unless they hide in dark corners and don't get
> discovered). It just doesn't happen.

Heh.  Well, I do at least (or rather, my personal automagic build
wrapper script does (it detects Makefile/autotools etc. and tries to
DTRT)).

> But more fundamentally, it's still the wrong thing to do. What nice
> level should you use?

Doesn't matter - if they all got group-scheduled together, then the
default of 10 (0+10) is totally fine.

> Do you want to do "nice git" too? Especially as the reason the
> threaded lstat was implemented was that over NFS, you actually want
> the threads not because you're using lots of CPU, but because you want
> to fire up lots of concurrent network traffic - and you actually want
> low latency. So you do NOT want to mark these threads as
> "unimportant". They're not.

Hmm...how many threads are we talking about here?  If it's just say
one per core, then I doubt it needs nicing.  The reason people nice
make is because the whole thing alternates between being CPU bound and
I/O bound, so you need to start more jobs than cores (sometimes a lot
more) to ensure maximal utilization.

> But what you do want is a basic and automatic fairness. When I do "git
> grep", I want the full resources of the machine to do the grep for me,
> so that I can get the answer in half a second (which is about the
> limit at which point I start getting impatient). That's an _important_
> job for me. It should get all the resources it can, there is
> absolutely no excuse for nicing it down.

Sure...though I imagine for "most" people that's totally I/O bound
(either on ext4 journal or hard disk seeks).

> Now, I'm not saying that cgroups are necessarily the answer either.
> But using sessions as input to group scheduling is certainly _one_
> answer. And it's a hell of a better answer than 'nice' has ever been,
> or will ever be.

Well, the text of Documentation/scheduler/sched-design-CFS.txt
certainly seems to be claiming it was a big improvement in this kind
of situation from the previous scheduler.  If we're finding out there
are cases where it's not, it's definitely worth asking the question
why it's not working.

Speaking of the scheduler documentation - note that its sample shell
code contains exactly the problem showing what's wrong with
auto-grouping-by-tty, which is:

# firefox &	# Launch firefox and move it to "browser" group

As soon as you do that from the same terminal that you're going to
launch the "make" from, you're back to total lossage.  Are you going
to explain to a student that "oh, you need to create a new
gnome-terminal tab and launch firefox from that"?

* Re: [PATCH v4] sched: automated per session task groups
  2010-12-04 17:39                       ` Colin Walters
@ 2010-12-04 18:33                         ` Linus Torvalds
  2010-12-04 20:01                           ` Colin Walters
  0 siblings, 1 reply; 79+ messages in thread
From: Linus Torvalds @ 2010-12-04 18:33 UTC (permalink / raw)
  To: Colin Walters
  Cc: Mike Galbraith, Ingo Molnar, Oleg Nesterov, Peter Zijlstra,
	Markus Trippelsdorf, Mathieu Desnoyers, LKML

On Sat, Dec 4, 2010 at 9:39 AM, Colin Walters <walters@verbum.org> wrote:
>
> Why doesn't "nice" work for this?  On my Fedora 14 system, "ps alxf"
> shows almost everything in my session is running at the default nice
> 0.  The only exceptions are "/usr/libexec/tracker-miner-fs" at 19, and
> pulseaudio at -11.

"nice" doesn't work. It never has. Nobody ever uses it, and that has
always been true.

As you note, you can find occasional cases of it being used, but they
are either for things that are _so_ unimportant (and know they are)
and annoying cpu hogs that they wouldn't be allowed to live unless
they were niced down maximally (your tracker-miner example), or they
use nice not because they really want to, but because it is an
approximation for what they really do want (ie pulseaudio wants low
latencies, and is set up by the distro, so you'll find it niced up).

But the fundamental issue is that 'nice' is broken. It's very much
broken at a conceptual and technical design angle (absolute priority
levels, no fairness), but it's broken also from a psychological and
practical angle (ie expecting people to manually do extra work is
ridiculous and totally unrealistic).

> I don't know. What would happen if, say, the scheduler effectively
> group-scheduled each nice value?

Why would you want to do that? If you are willing to do group
scheduling, do it on something sane and meaningful, and something that
doesn't need user interaction or decisions. And do it on something
that has more than 20 levels.

You could, for example, decide to do it per session.

> Then, what we tell people to do is
> run "nice make".  Which in fact, has been documented as a thing to do
> for decades.

Nobody but morons ever "documented" that. Sure, you can find people
saying it, but you won't be finding people actually _doing_ it. Look
around.

Seriously. Nobody _ever_ does "nice make", unless they are seriously
repressed beta-males (eg MIS people who get shouted at when they do
system maintenance unless they hide in dark corners and don't get
discovered). It just doesn't happen.

But more fundamentally, it's still the wrong thing to do. What nice
level should you use?

And btw, it's not just "make". One of the things that originally
caused me to want something like this is that you can enable some
pretty aggressive threading with "git diff". If you use the
"core.preloadindex" setting, git will fire up 20 threads just to do
"lstat()" system calls as quickly as it humanly can. Or "git grep"
will happily use lots of threads and really mess with your system,
except it limits the threads to a smallish number just to not be
asocial.

Do you want to do "nice git" too? Especially as the reason the
threaded lstat was implemented was that over NFS, you actually want
the threads not because you're using lots of CPU, but because you want
to fire up lots of concurrent network traffic - and you actually want
low latency. So you do NOT want to mark these threads as
"unimportant". They're not.

But what you do want is a basic and automatic fairness. When I do "git
grep", I want the full resources of the machine to do the grep for me,
so that I can get the answer in half a second (which is about the
limit at which point I start getting impatient). That's an _important_
job for me. It should get all the resources it can, there is
absolutely no excuse for nicing it down.

But at the same time, if I just happen to have sound or something
going on at the same time, I would definitely like some amount of
fairness. Just because git is smart and can use lots of threads to do
its work quickly, it shouldn't be _unfair_. It should hog the machine
- but only up to a point of some fairness.
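
A rough way to put numbers on that, as a sketch only: assume a single CPU,
every task CPU-bound at the same weight, and two sessions (one running git's
20 threads, one playing sound). The function names are made up for
illustration and are not kernel code.

```c
#include <assert.h>

/*
 * No grouping: every runnable task competes directly at equal weight,
 * so a set of nr_mine tasks gets nr_mine / nr_total of the CPU.
 */
double flat_share(int nr_mine, int nr_total)
{
	assert(nr_mine > 0 && nr_total >= nr_mine);
	return (double)nr_mine / (double)nr_total;
}

/*
 * Per-session grouping: each session is one equal-weight group, so a
 * session's share depends only on how many sessions are runnable, not
 * on how many threads any of them starts.
 */
double grouped_share(int nr_sessions)
{
	assert(nr_sessions > 0);
	return 1.0 / (double)nr_sessions;
}
```

Under these assumptions, flat_share(1, 21) leaves the lone audio thread with
about 4.8% of the CPU against 20 git threads, while grouped_share(2) gives
each session 50% however many threads git fires up.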

That is something that "nice" can never give you. It's not what nice
was designed for, it's not how nice works. And if you ask people to
say "this work isn't important", you shouldn't expect them to actually
do it. If something isn't important, I certainly won't then spend
extra effort on it, for chrissake!

Now, I'm not saying that cgroups are necessarily the answer either.
But using sessions as input to group scheduling is certainly _one_
answer. And it's a hell of a better answer than 'nice' has ever been,
or will ever be.

                             Linus

* Re: [PATCH v4] sched: automated per session task groups
  2010-11-20 19:35                     ` [PATCH v4] sched: automated per session " Mike Galbraith
@ 2010-12-04 17:39                       ` Colin Walters
  2010-12-04 18:33                         ` Linus Torvalds
  0 siblings, 1 reply; 79+ messages in thread
From: Colin Walters @ 2010-12-04 17:39 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Ingo Molnar, Oleg Nesterov, Peter Zijlstra, Linus Torvalds,
	Markus Trippelsdorf, Mathieu Desnoyers, LKML

On Sat, Nov 20, 2010 at 2:35 PM, Mike Galbraith <efault@gmx.de> wrote:

> A recurring complaint from CFS users is that parallel kbuild has a negative
> impact on desktop interactivity.  This patch implements an idea from Linus,
> to automatically create task groups.  This patch only implements per session autogroups,
> but leaves the way open for enhancement.

Resurrecting this thread a bit, one question I didn't see discussed is simply:

Why doesn't "nice" work for this?  On my Fedora 14 system, "ps alxf"
shows almost everything in my session is running at the default nice
0.  The only exceptions are "/usr/libexec/tracker-miner-fs" at 19, and
pulseaudio at -11.

I don't know. What would happen if, say, the scheduler effectively
group-scheduled each nice value?  Then, what we tell people to do is
run "nice make".  Which in fact, has been documented as a thing to do
for decades.  Actually I tend to use "ionice" too, which is also
useful if any of your desktop applications happen to make the mistake
of doing I/O in the mainloop (emacs fsync()ing in UI thread, I'm
looking at you).

Quickly testing kernel-2.6.35.6-48.fc14.x86_64 on a "Intel(R)
Core(TM)2 Quad CPU    Q9400  @ 2.66GHz", the difference between "make
-j 128" and "nice make -j 128" is quite noticeable.  As you'd expect.
The CFS docs already say:

"The CFS scheduler has a much stronger handling of nice levels and SCHED_BATCH
than the previous vanilla scheduler: both types of workloads are isolated much
more aggressively"

Does it just need to be even more aggressive, and people use "nice"?

* [PATCH v4] sched: automated per session task groups
  2010-11-16 17:28                   ` Ingo Molnar
@ 2010-11-20 19:35                     ` Mike Galbraith
  2010-12-04 17:39                       ` Colin Walters
  0 siblings, 1 reply; 79+ messages in thread
From: Mike Galbraith @ 2010-11-20 19:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Oleg Nesterov, Peter Zijlstra, Linus Torvalds,
	Markus Trippelsdorf, Mathieu Desnoyers, LKML

On Tue, 2010-11-16 at 18:28 +0100, Ingo Molnar wrote:

> Mike,
> 
> Mind sending a new patch with a separate v2 announcement in a new thread, once you 
> have something i could apply to the scheduler tree (for a v2.6.38 merge)?

Changes since last:
- switch to per session vs tty
- make autogroups visible in /proc/sched_debug
- make autogroups visible in /proc/<pid>/autogroup
- add nice level bandwidth tweakability to /proc/<pid>/autogroup

Modulo "kill it" debate outcome...

A recurring complaint from CFS users is that parallel kbuild has a negative
impact on desktop interactivity.  This patch implements an idea from Linus,
to automatically create task groups.  This patch only implements per session autogroups,
but leaves the way open for enhancement.

Implementation: each task's signal struct contains an inherited pointer to a
refcounted autogroup struct containing a task group pointer, the default for
all tasks pointing to the init_task_group.  When a task calls setsid(), the
process wide reference to the default group is dropped, a new task group is
created, and the process is moved into the new task group.  Children thereafter
inherit this task group, and increase its refcount.  On exit, a reference to the
current task group is dropped when the last reference to each signal struct is
dropped.  The task group is destroyed when the last signal struct referencing
it is freed.   At runqueue selection time, IFF a task has no cgroup assignment,
its current autogroup is used.
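
The reference counting described above can be sketched as a small userspace
model. This is only an illustration under stated assumptions: the struct and
function names (ag_model, sig_model, sig_fork, sig_setsid, sig_exit) are
hypothetical stand-ins and are not the names or the locking used in the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the refcounted autogroup struct. */
struct ag_model {
	int refcount;		/* signal structs currently pointing here */
	int id;
};

/* Hypothetical stand-in for the per-process signal struct. */
struct sig_model {
	struct ag_model *ag;	/* inherited across fork */
};

int ag_live;			/* groups currently allocated */

struct ag_model *ag_create(void)
{
	static int next_id;
	struct ag_model *ag = malloc(sizeof(*ag));

	if (!ag)
		abort();
	ag->refcount = 1;
	ag->id = next_id++;
	ag_live++;
	return ag;
}

void ag_put(struct ag_model *ag)
{
	assert(ag->refcount > 0);
	if (--ag->refcount == 0) {	/* last reference: destroy the group */
		ag_live--;
		free(ag);
	}
}

/* fork: the child inherits the parent's group and takes a reference */
void sig_fork(struct sig_model *child, const struct sig_model *parent)
{
	parent->ag->refcount++;
	child->ag = parent->ag;
}

/* setsid: attach to a freshly created group, then drop the old one */
void sig_setsid(struct sig_model *sig)
{
	struct ag_model *old = sig->ag;

	sig->ag = ag_create();
	ag_put(old);
}

/* exit: the signal struct is freed, so its reference goes away */
void sig_exit(struct sig_model *sig)
{
	ag_put(sig->ag);
	sig->ag = NULL;
}
```

In this model a shell that forks from init and calls sig_setsid() ends up in
its own group, a make forked from that shell shares it, and the group is
freed only once both have gone through sig_exit().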

Autogroup bandwidth is controllable via setting its nice level through the
proc filesystem.  cat /proc/<pid>/autogroup displays the task's group and
the group's nice level.  echo <nice level> > /proc/<pid>/autogroup sets the
task group's shares to the weight of a nice <level> task.  Setting nice level
is rate limited for !admin users due to the abuse risk of task group locking.
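
For a feel of what "the weight of a nice <level> task" means, here is a
minimal sketch of the nice-to-weight lookup. The values are the CFS
prio_to_weight table from kernel/sched.c as best I recall it; the helper name
weight_for_nice is made up here, and how the patch itself applies the weight
is not shown in this sketch.

```c
#include <assert.h>

/*
 * CFS nice-to-weight table, nice -20 .. 19 (assumed to match the
 * kernel's prio_to_weight[]).  Nice 0 maps to 1024 and each step is
 * roughly a factor of 1.25.
 */
static const int nice_to_weight[40] = {
	/* -20 */ 88761, 71755, 56483, 46273, 36291,
	/* -15 */ 29154, 23254, 18705, 14949, 11916,
	/* -10 */  9548,  7620,  6100,  4904,  3906,
	/*  -5 */  3121,  2501,  1991,  1586,  1277,
	/*   0 */  1024,   820,   655,   526,   423,
	/*   5 */   335,   272,   215,   172,   137,
	/*  10 */   110,    87,    70,    56,    45,
	/*  15 */    36,    29,    23,    18,    15,
};

/* Hypothetical helper: weight a group would get for a given nice level. */
int weight_for_nice(int nice)
{
	assert(nice >= -20 && nice <= 19);
	return nice_to_weight[nice + 20];
}
```

On this reading, echoing 10 into the autogroup file would give the group a
weight of 110, i.e. roughly a tenth of what a nice 0 group gets when both are
runnable.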

The feature is enabled from boot by default if CONFIG_SCHED_AUTOGROUP is
selected, but can be disabled via the boot option noautogroup, and can
also be turned on/off on the fly via..
   echo [01] > /proc/sys/kernel/sched_autogroup_enabled.
..which will automatically move tasks to/from the root task group.

Signed-off-by: Mike Galbraith <efault@gmx.de>

---
 Documentation/kernel-parameters.txt |    2 
 fs/proc/base.c                      |   79 +++++++++++
 include/linux/sched.h               |   23 +++
 init/Kconfig                        |   12 +
 kernel/fork.c                       |    5 
 kernel/sched.c                      |   13 +
 kernel/sched_autogroup.c            |  243 ++++++++++++++++++++++++++++++++++++
 kernel/sched_autogroup.h            |   23 +++
 kernel/sched_debug.c                |   29 ++--
 kernel/sys.c                        |    4 
 kernel/sysctl.c                     |   11 +
 11 files changed, 426 insertions(+), 18 deletions(-)

Index: linux-2.6.37.git/include/linux/sched.h
===================================================================
--- linux-2.6.37.git.orig/include/linux/sched.h
+++ linux-2.6.37.git/include/linux/sched.h
@@ -509,6 +509,8 @@ struct thread_group_cputimer {
 	spinlock_t lock;
 };
 
+struct autogroup;
+
 /*
  * NOTE! "signal_struct" does not have it's own
  * locking, because a shared signal_struct always
@@ -576,6 +578,9 @@ struct signal_struct {
 
 	struct tty_struct *tty; /* NULL if no tty */
 
+#ifdef CONFIG_SCHED_AUTOGROUP
+	struct autogroup *autogroup;
+#endif
 	/*
 	 * Cumulative resource counters for dead threads in the group,
 	 * and for reaped dead child processes forked by this group.
@@ -1931,6 +1936,24 @@ int sched_rt_handler(struct ctl_table *t
 
 extern unsigned int sysctl_sched_compat_yield;
 
+#ifdef CONFIG_SCHED_AUTOGROUP
+extern unsigned int sysctl_sched_autogroup_enabled;
+
+extern void sched_autogroup_create_attach(struct task_struct *p);
+extern void sched_autogroup_detach(struct task_struct *p);
+extern void sched_autogroup_fork(struct signal_struct *sig);
+extern void sched_autogroup_exit(struct signal_struct *sig);
+#ifdef CONFIG_PROC_FS
+extern void proc_sched_autogroup_show_task(struct task_struct *p, struct seq_file *m);
+extern int proc_sched_autogroup_set_nice(struct task_struct *p, int *nice);
+#endif
+#else
+static inline void sched_autogroup_create_attach(struct task_struct *p) { }
+static inline void sched_autogroup_detach(struct task_struct *p) { }
+static inline void sched_autogroup_fork(struct signal_struct *sig) { }
+static inline void sched_autogroup_exit(struct signal_struct *sig) { }
+#endif
+
 #ifdef CONFIG_RT_MUTEXES
 extern int rt_mutex_getprio(struct task_struct *p);
 extern void rt_mutex_setprio(struct task_struct *p, int prio);
Index: linux-2.6.37.git/kernel/sched.c
===================================================================
--- linux-2.6.37.git.orig/kernel/sched.c
+++ linux-2.6.37.git/kernel/sched.c
@@ -78,6 +78,7 @@
 
 #include "sched_cpupri.h"
 #include "workqueue_sched.h"
+#include "sched_autogroup.h"
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/sched.h>
@@ -268,6 +269,10 @@ struct task_group {
 	struct task_group *parent;
 	struct list_head siblings;
 	struct list_head children;
+
+#ifdef CONFIG_SCHED_AUTOGROUP
+	struct autogroup *autogroup;
+#endif
 };
 
 #define root_task_group init_task_group
@@ -605,11 +610,14 @@ static inline int cpu_of(struct rq *rq)
  */
 static inline struct task_group *task_group(struct task_struct *p)
 {
+	struct task_group *tg;
 	struct cgroup_subsys_state *css;
 
 	css = task_subsys_state_check(p, cpu_cgroup_subsys_id,
 			lockdep_is_held(&task_rq(p)->lock));
-	return container_of(css, struct task_group, css);
+	tg = container_of(css, struct task_group, css);
+
+	return autogroup_task_group(p, tg);
 }
 
 /* Change a task's cfs_rq and parent entity if it moves across CPUs/groups */
@@ -2006,6 +2014,7 @@ static void sched_irq_time_avg_update(st
 #include "sched_idletask.c"
 #include "sched_fair.c"
 #include "sched_rt.c"
+#include "sched_autogroup.c"
 #include "sched_stoptask.c"
 #ifdef CONFIG_SCHED_DEBUG
 # include "sched_debug.c"
@@ -7979,7 +7988,7 @@ void __init sched_init(void)
 #ifdef CONFIG_CGROUP_SCHED
 	list_add(&init_task_group.list, &task_groups);
 	INIT_LIST_HEAD(&init_task_group.children);
-
+	autogroup_init(&init_task);
 #endif /* CONFIG_CGROUP_SCHED */
 
 #if defined CONFIG_FAIR_GROUP_SCHED && defined CONFIG_SMP
Index: linux-2.6.37.git/kernel/fork.c
===================================================================
--- linux-2.6.37.git.orig/kernel/fork.c
+++ linux-2.6.37.git/kernel/fork.c
@@ -174,8 +174,10 @@ static inline void free_signal_struct(st
 
 static inline void put_signal_struct(struct signal_struct *sig)
 {
-	if (atomic_dec_and_test(&sig->sigcnt))
+	if (atomic_dec_and_test(&sig->sigcnt)) {
+		sched_autogroup_exit(sig);
 		free_signal_struct(sig);
+	}
 }
 
 void __put_task_struct(struct task_struct *tsk)
@@ -904,6 +906,7 @@ static int copy_signal(unsigned long clo
 	posix_cpu_timers_init_group(sig);
 
 	tty_audit_fork(sig);
+	sched_autogroup_fork(sig);
 
 	sig->oom_adj = current->signal->oom_adj;
 	sig->oom_score_adj = current->signal->oom_score_adj;
Index: linux-2.6.37.git/kernel/sys.c
===================================================================
--- linux-2.6.37.git.orig/kernel/sys.c
+++ linux-2.6.37.git/kernel/sys.c
@@ -1080,8 +1080,10 @@ SYSCALL_DEFINE0(setsid)
 	err = session;
 out:
 	write_unlock_irq(&tasklist_lock);
-	if (err > 0)
+	if (err > 0) {
 		proc_sid_connector(group_leader);
+		sched_autogroup_create_attach(group_leader);
+	}
 	return err;
 }
 
Index: linux-2.6.37.git/kernel/sched_debug.c
===================================================================
--- linux-2.6.37.git.orig/kernel/sched_debug.c
+++ linux-2.6.37.git/kernel/sched_debug.c
@@ -87,6 +87,20 @@ static void print_cfs_group_stats(struct
 }
 #endif
 
+#if defined(CONFIG_CGROUP_SCHED) && \
+	(defined(CONFIG_FAIR_GROUP_SCHED) || defined(CONFIG_RT_GROUP_SCHED))
+static void task_group_path(struct task_group *tg, char *buf, int buflen)
+{
+	/* may be NULL if the underlying cgroup isn't fully-created yet */
+	if (!tg->css.cgroup) {
+		if (!autogroup_path(tg, buf, buflen))
+			buf[0] = '\0';
+		return;
+	}
+	cgroup_path(tg->css.cgroup, buf, buflen);
+}
+#endif
+
 static void
 print_task(struct seq_file *m, struct rq *rq, struct task_struct *p)
 {
@@ -115,7 +129,7 @@ print_task(struct seq_file *m, struct rq
 		char path[64];
 
 		rcu_read_lock();
-		cgroup_path(task_group(p)->css.cgroup, path, sizeof(path));
+		task_group_path(task_group(p), path, sizeof(path));
 		rcu_read_unlock();
 		SEQ_printf(m, " %s", path);
 	}
@@ -147,19 +161,6 @@ static void print_rq(struct seq_file *m,
 	read_unlock_irqrestore(&tasklist_lock, flags);
 }
 
-#if defined(CONFIG_CGROUP_SCHED) && \
-	(defined(CONFIG_FAIR_GROUP_SCHED) || defined(CONFIG_RT_GROUP_SCHED))
-static void task_group_path(struct task_group *tg, char *buf, int buflen)
-{
-	/* may be NULL if the underlying cgroup isn't fully-created yet */
-	if (!tg->css.cgroup) {
-		buf[0] = '\0';
-		return;
-	}
-	cgroup_path(tg->css.cgroup, buf, buflen);
-}
-#endif
-
 void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
 {
 	s64 MIN_vruntime = -1, min_vruntime, max_vruntime = -1,
Index: linux-2.6.37.git/fs/proc/base.c
===================================================================
--- linux-2.6.37.git.orig/fs/proc/base.c
+++ linux-2.6.37.git/fs/proc/base.c
@@ -1407,6 +1407,82 @@ static const struct file_operations proc
 
 #endif
 
+#ifdef CONFIG_SCHED_AUTOGROUP
+/*
+ * Print out autogroup related information:
+ */
+static int sched_autogroup_show(struct seq_file *m, void *v)
+{
+	struct inode *inode = m->private;
+	struct task_struct *p;
+
+	p = get_proc_task(inode);
+	if (!p)
+		return -ESRCH;
+	proc_sched_autogroup_show_task(p, m);
+
+	put_task_struct(p);
+
+	return 0;
+}
+
+static ssize_t
+sched_autogroup_write(struct file *file, const char __user *buf,
+	    size_t count, loff_t *offset)
+{
+	struct inode *inode = file->f_path.dentry->d_inode;
+	struct task_struct *p;
+	char buffer[PROC_NUMBUF];
+	long nice;
+	int err;
+
+	memset(buffer, 0, sizeof(buffer));
+	if (count > sizeof(buffer) - 1)
+		count = sizeof(buffer) - 1;
+	if (copy_from_user(buffer, buf, count))
+		return -EFAULT;
+
+	err = strict_strtol(strstrip(buffer), 0, &nice);
+	if (err)
+		return -EINVAL;
+
+	p = get_proc_task(inode);
+	if (!p)
+		return -ESRCH;
+
+	err = nice;
+	err = proc_sched_autogroup_set_nice(p, &err);
+	if (err)
+		count = err;
+
+	put_task_struct(p);
+
+	return count;
+}
+
+static int sched_autogroup_open(struct inode *inode, struct file *filp)
+{
+	int ret;
+
+	ret = single_open(filp, sched_autogroup_show, NULL);
+	if (!ret) {
+		struct seq_file *m = filp->private_data;
+
+		m->private = inode;
+	}
+	return ret;
+}
+
+static const struct file_operations proc_pid_sched_autogroup_operations = {
+	.open		= sched_autogroup_open,
+	.read		= seq_read,
+	.write		= sched_autogroup_write,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
+#endif /* CONFIG_SCHED_AUTOGROUP */
+
 static ssize_t comm_write(struct file *file, const char __user *buf,
 				size_t count, loff_t *offset)
 {
@@ -2733,6 +2809,9 @@ static const struct pid_entry tgid_base_
 #ifdef CONFIG_SCHED_DEBUG
 	REG("sched",      S_IRUGO|S_IWUSR, proc_pid_sched_operations),
 #endif
+#ifdef CONFIG_SCHED_AUTOGROUP
+	REG("autogroup",  S_IRUGO|S_IWUSR, proc_pid_sched_autogroup_operations),
+#endif
 	REG("comm",      S_IRUGO|S_IWUSR, proc_pid_set_comm_operations),
 #ifdef CONFIG_HAVE_ARCH_TRACEHOOK
 	INF("syscall",    S_IRUSR, proc_pid_syscall),
Index: linux-2.6.37.git/kernel/sched_autogroup.h
===================================================================
--- /dev/null
+++ linux-2.6.37.git/kernel/sched_autogroup.h
@@ -0,0 +1,23 @@
+#ifdef CONFIG_SCHED_AUTOGROUP
+
+static inline struct task_group *
+autogroup_task_group(struct task_struct *p, struct task_group *tg);
+
+#else /* !CONFIG_SCHED_AUTOGROUP */
+
+static inline void autogroup_init(struct task_struct *init_task) {  }
+
+static inline struct task_group *
+autogroup_task_group(struct task_struct *p, struct task_group *tg)
+{
+	return tg;
+}
+
+#ifdef CONFIG_SCHED_DEBUG
+static inline int autogroup_path(struct task_group *tg, char *buf, int buflen)
+{
+	return 0;
+}
+#endif
+
+#endif /* CONFIG_SCHED_AUTOGROUP */
Index: linux-2.6.37.git/kernel/sched_autogroup.c
===================================================================
--- /dev/null
+++ linux-2.6.37.git/kernel/sched_autogroup.c
@@ -0,0 +1,243 @@
+#ifdef CONFIG_SCHED_AUTOGROUP
+
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include <linux/kallsyms.h>
+#include <linux/utsname.h>
+
+unsigned int __read_mostly sysctl_sched_autogroup_enabled = 1;
+
+struct autogroup {
+	struct task_group	*tg;
+	struct kref		kref;
+	struct rw_semaphore	lock;
+	unsigned long		id;
+	int			nice;
+};
+
+static struct autogroup autogroup_default;
+static atomic_t autogroup_seq_nr;
+
+static void autogroup_init(struct task_struct *init_task)
+{
+	autogroup_default.tg = &init_task_group;
+	init_task_group.autogroup = &autogroup_default;
+	kref_init(&autogroup_default.kref);
+	init_rwsem(&autogroup_default.lock);
+	init_task->signal->autogroup = &autogroup_default;
+}
+
+static inline void autogroup_destroy(struct kref *kref)
+{
+	struct autogroup *ag = container_of(kref, struct autogroup, kref);
+	struct task_group *tg = ag->tg;
+
+	kfree(ag);
+	sched_destroy_group(tg);
+}
+
+static inline void autogroup_kref_put(struct autogroup *ag)
+{
+	kref_put(&ag->kref, autogroup_destroy);
+}
+
+static inline struct autogroup *autogroup_kref_get(struct autogroup *ag)
+{
+	kref_get(&ag->kref);
+	return ag;
+}
+
+static inline struct autogroup *autogroup_create(void)
+{
+	struct autogroup *ag = kzalloc(sizeof(*ag), GFP_KERNEL);
+
+	if (!ag)
+		goto out_fail;
+
+	ag->tg = sched_create_group(&init_task_group);
+
+	if (IS_ERR(ag->tg))
+		goto out_fail;
+
+	ag->tg->autogroup = ag;
+	kref_init(&ag->kref);
+	init_rwsem(&ag->lock);
+	ag->id = atomic_inc_return(&autogroup_seq_nr);
+
+	return ag;
+
+out_fail:
+	if (ag) {
+		kfree(ag);
+		WARN_ON(1);
+	} else
+		WARN_ON(1);
+
+	return autogroup_kref_get(&autogroup_default);
+}
+
+static inline bool
+task_wants_autogroup(struct task_struct *p, struct task_group *tg)
+{
+	if (tg != &root_task_group)
+		return false;
+
+	if (p->sched_class != &fair_sched_class)
+		return false;
+
+	/*
+	 * We can only assume the task group can't go away on us if
+	 * autogroup_move_group() can see us on ->thread_group list.
+	 */
+	if (p->flags & PF_EXITING)
+		return false;
+
+	return true;
+}
+
+static inline struct task_group *
+autogroup_task_group(struct task_struct *p, struct task_group *tg)
+{
+	int enabled = ACCESS_ONCE(sysctl_sched_autogroup_enabled);
+
+	if (enabled && task_wants_autogroup(p, tg))
+		return p->signal->autogroup->tg;
+
+	return tg;
+}
+
+static void
+autogroup_move_group(struct task_struct *p, struct autogroup *ag)
+{
+	struct autogroup *prev;
+	struct task_struct *t;
+
+	spin_lock(&p->sighand->siglock);
+
+	prev = p->signal->autogroup;
+	if (prev == ag) {
+		spin_unlock(&p->sighand->siglock);
+		return;
+	}
+
+	p->signal->autogroup = autogroup_kref_get(ag);
+	t = p;
+
+	do {
+		sched_move_task(t);
+	} while_each_thread(p, t);
+
+	spin_unlock(&p->sighand->siglock);
+
+	autogroup_kref_put(prev);
+}
+
+/* Allocates GFP_KERNEL, cannot be called under any spinlock */
+void sched_autogroup_create_attach(struct task_struct *p)
+{
+	struct autogroup *ag = autogroup_create();
+
+	autogroup_move_group(p, ag);
+	/* drop extra reference added by autogroup_create() */
+	autogroup_kref_put(ag);
+}
+EXPORT_SYMBOL(sched_autogroup_create_attach);
+
+/* Cannot be called under siglock.  Currently has no users */
+void sched_autogroup_detach(struct task_struct *p)
+{
+	autogroup_move_group(p, &autogroup_default);
+}
+EXPORT_SYMBOL(sched_autogroup_detach);
+
+void sched_autogroup_fork(struct signal_struct *sig)
+{
+	struct sighand_struct *sighand = current->sighand;
+
+	spin_lock(&sighand->siglock);
+	sig->autogroup = autogroup_kref_get(current->signal->autogroup);
+	spin_unlock(&sighand->siglock);
+}
+
+void sched_autogroup_exit(struct signal_struct *sig)
+{
+	autogroup_kref_put(sig->autogroup);
+}
+
+static int __init setup_autogroup(char *str)
+{
+	sysctl_sched_autogroup_enabled = 0;
+
+	return 1;
+}
+
+__setup("noautogroup", setup_autogroup);
+
+#ifdef CONFIG_PROC_FS
+
+static inline struct autogroup *autogroup_get(struct task_struct *p)
+{
+	struct autogroup *ag;
+
+	/* task may be moved after we unlock.. tough */
+	spin_lock(&p->sighand->siglock);
+	ag = autogroup_kref_get(p->signal->autogroup);
+	spin_unlock(&p->sighand->siglock);
+
+	return ag;
+}
+
+int proc_sched_autogroup_set_nice(struct task_struct *p, int *nice)
+{
+	static unsigned long next = INITIAL_JIFFIES;
+	struct autogroup *ag;
+	int err;
+
+	if (*nice < -20 || *nice > 19)
+		return -EINVAL;
+
+	err = security_task_setnice(current, *nice);
+	if (err)
+		return err;
+
+	if (*nice < 0 && !can_nice(current, *nice))
+		return -EPERM;
+
+	/* this is a heavy operation taking global locks.. */
+	if (!capable(CAP_SYS_ADMIN) && time_before(jiffies, next))
+		return -EAGAIN;
+
+	next = HZ / 10 + jiffies;
+	ag = autogroup_get(p);
+
+	down_write(&ag->lock);
+	err = sched_group_set_shares(ag->tg, prio_to_weight[*nice + 20]);
+	if (!err)
+		ag->nice = *nice;
+	up_write(&ag->lock);
+
+	autogroup_kref_put(ag);
+
+	return err;
+}
+
+void proc_sched_autogroup_show_task(struct task_struct *p, struct seq_file *m)
+{
+	struct autogroup *ag = autogroup_get(p);
+
+	down_read(&ag->lock);
+	seq_printf(m, "/autogroup-%ld nice %d\n", ag->id, ag->nice);
+	up_read(&ag->lock);
+
+	autogroup_kref_put(ag);
+}
+#endif /* CONFIG_PROC_FS */
+
+#ifdef CONFIG_SCHED_DEBUG
+static inline int autogroup_path(struct task_group *tg, char *buf, int buflen)
+{
+	return snprintf(buf, buflen, "%s-%ld", "/autogroup", tg->autogroup->id);
+}
+#endif /* CONFIG_SCHED_DEBUG */
+
+#endif /* CONFIG_SCHED_AUTOGROUP */
Index: linux-2.6.37.git/kernel/sysctl.c
===================================================================
--- linux-2.6.37.git.orig/kernel/sysctl.c
+++ linux-2.6.37.git/kernel/sysctl.c
@@ -382,6 +382,17 @@ static struct ctl_table kern_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec,
 	},
+#ifdef CONFIG_SCHED_AUTOGROUP
+	{
+		.procname	= "sched_autogroup_enabled",
+		.data		= &sysctl_sched_autogroup_enabled,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
+#endif
 #ifdef CONFIG_PROVE_LOCKING
 	{
 		.procname	= "prove_locking",
Index: linux-2.6.37.git/init/Kconfig
===================================================================
--- linux-2.6.37.git.orig/init/Kconfig
+++ linux-2.6.37.git/init/Kconfig
@@ -728,6 +728,18 @@ config NET_NS
 
 endif # NAMESPACES
 
+config SCHED_AUTOGROUP
+	bool "Automatic process group scheduling"
+	select CGROUPS
+	select CGROUP_SCHED
+	select FAIR_GROUP_SCHED
+	help
+	  This option optimizes the scheduler for common desktop workloads by
+	  automatically creating and populating task groups.  This separation
+	  of workloads isolates aggressive CPU burners (like build jobs) from
+	  desktop applications.  Task group autogeneration is currently based
+	  upon task session.
+
 config MM_OWNER
 	bool
 
Index: linux-2.6.37.git/Documentation/kernel-parameters.txt
===================================================================
--- linux-2.6.37.git.orig/Documentation/kernel-parameters.txt
+++ linux-2.6.37.git/Documentation/kernel-parameters.txt
@@ -1622,6 +1622,8 @@ and is between 256 and 4096 characters.
 	noapic		[SMP,APIC] Tells the kernel to not make use of any
 			IOAPICs that may be present in the system.
 
+	noautogroup	Disable scheduler automatic task group creation.
+
 	nobats		[PPC] Do not use BATs for mapping kernel lowmem
 			on "Classic" PPC cores.
 
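For anyone poking at the new /proc/<pid>/autogroup file from user space, here is
a minimal sketch of the two things the patch above defines: the line format
printed by proc_sched_autogroup_show_task() ("/autogroup-<id> nice <n>") and the
nice range check plus weight lookup done in proc_sched_autogroup_set_nice().
The helper names parse_autogroup_line() and autogroup_nice_to_shares() are
hypothetical and not part of the patch. The table is a copy of prio_to_weight[]
as I know it from kernel/sched.c, reproduced here only so the sketch stands
alone; check it against your tree.

```c
#include <stdio.h>

/*
 * Assumed copy of the kernel's prio_to_weight[] table (kernel/sched.c).
 * Index 0 corresponds to nice -20, index 39 to nice 19; nice 0 is 1024.
 */
static const int prio_to_weight[40] = {
	/* -20 */ 88761, 71755, 56483, 46273, 36291,
	/* -15 */ 29154, 23254, 18705, 14949, 11916,
	/* -10 */  9548,  7620,  6100,  4904,  3906,
	/*  -5 */  3121,  2501,  1991,  1586,  1277,
	/*   0 */  1024,   820,   655,   526,   423,
	/*   5 */   335,   272,   215,   172,   137,
	/*  10 */   110,    87,    70,    56,    45,
	/*  15 */    36,    29,    23,    18,    15,
};

/*
 * Hypothetical helper: the shares value an autogroup would be given for
 * a nice level, using the same range check as the patch.
 * Returns -1 if nice is outside [-20, 19].
 */
static int autogroup_nice_to_shares(int nice)
{
	if (nice < -20 || nice > 19)
		return -1;
	return prio_to_weight[nice + 20];
}

/*
 * Hypothetical helper: parse one line as printed by the patch's
 * seq_printf(m, "/autogroup-%ld nice %d\n", ...).
 * Returns 0 on success, -1 if the line does not match that format.
 */
static int parse_autogroup_line(const char *line, long *id, int *nice)
{
	if (sscanf(line, "/autogroup-%ld nice %d", id, nice) != 2)
		return -1;
	return 0;
}
```

A caller would read one line from /proc/<pid>/autogroup with fgets(), hand it to
parse_autogroup_line(), and could then use autogroup_nice_to_shares() to see
which weight a given nice value maps to before writing that value back to the
same file. This assumes a kernel built with CONFIG_SCHED_AUTOGROUP and this
patch applied.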




end of thread, other threads:[~2010-12-15 12:10 UTC | newest]

Thread overview: 79+ messages
2010-11-21 13:37 [PATCH v4] sched: automated per session task groups Ingo Molnar
2010-11-21 13:39 ` Ingo Molnar
2010-11-21 15:44   ` Oleg Nesterov
2010-11-21 16:35     ` Mike Galbraith
2010-11-21 16:15 ` Mike Galbraith
2010-11-21 18:43 ` Gene Heskett
2010-11-25 16:00 ` Mike Galbraith
2010-11-28 14:24   ` Mike Galbraith
2010-11-28 19:31     ` Linus Torvalds
2010-11-28 20:18       ` Ingo Molnar
2010-11-29 11:53         ` Peter Zijlstra
2010-11-29 12:30           ` Ingo Molnar
2010-11-29 13:45             ` Mike Galbraith
2010-11-29 13:47               ` Ingo Molnar
2010-11-29 14:04                 ` Mike Galbraith
2010-11-29 16:27           ` Linus Torvalds
2010-11-29 16:44             ` Ingo Molnar
2010-11-29 17:37             ` Peter Zijlstra
2010-11-29 18:03               ` Ingo Molnar
2010-11-29 19:06               ` Mike Galbraith
2010-11-29 19:20                 ` Ingo Molnar
2010-11-30  3:39                   ` Paul Turner
2010-11-30  4:14                     ` Mike Galbraith
2010-11-30  4:23                       ` Paul Turner
2010-11-30 13:18                         ` Mike Galbraith
2010-11-30 13:48                           ` Peter Zijlstra
2010-11-30 13:59                             ` Ingo Molnar
2010-11-30 14:13                           ` Ingo Molnar
2010-11-30 16:41                             ` Mike Galbraith
2010-11-30 15:17                           ` Vivek Goyal
2010-11-30 17:13                             ` Mike Galbraith
2010-11-30 19:36                               ` Vivek Goyal
2010-12-01  5:01                                 ` Américo Wang
2010-12-01  6:09                                   ` Mike Galbraith
2010-12-01 11:36                                   ` Peter Zijlstra
2010-12-01 22:12                                   ` Valdis.Kletnieks
2010-12-01  5:57                                 ` Mike Galbraith
2010-12-01 11:33                                   ` Peter Zijlstra
2010-12-01 11:55                                     ` Mike Galbraith
2010-12-01 14:55                                   ` Vivek Goyal
2010-12-01 15:04                                     ` Mike Galbraith
2010-11-30  7:54                   ` Mike Galbraith
2010-11-30 14:18                     ` Ingo Molnar
2010-11-30 14:53                       ` Ingo Molnar
2010-11-30 15:01                         ` Peter Zijlstra
2010-11-30 15:11                           ` Ingo Molnar
2010-11-30 16:28                       ` Mike Galbraith
2010-11-29  5:45       ` Mike Galbraith
2010-12-01  3:39     ` Paul Turner
2010-12-01  3:39     ` Paul Turner
2010-12-01  6:16       ` Mike Galbraith
2010-12-03  5:11         ` Paul Turner
2010-12-03  6:48           ` Mike Galbraith
2010-12-03  8:37             ` Paul Turner
2010-12-04 23:55           ` James Courtier-Dutton
2010-12-05  5:11             ` Paul Turner
2010-12-07 11:32               ` Paul Turner
2010-12-15 12:10                 ` Paul Turner
2010-12-01 11:34       ` Peter Zijlstra
  -- strict thread matches above, loose matches on Subject: below --
2010-11-15  1:13 [RFC/RFT PATCH v3] sched: automated per tty " Mike Galbraith
2010-11-15  8:57 ` Peter Zijlstra
2010-11-15 11:32   ` Mike Galbraith
2010-11-15 11:46     ` Mike Galbraith
2010-11-15 12:57       ` Oleg Nesterov
2010-11-15 21:25         ` Mike Galbraith
2010-11-16 13:04           ` Oleg Nesterov
2010-11-16 14:18             ` Mike Galbraith
2010-11-16 15:03               ` Oleg Nesterov
2010-11-16 15:41                 ` Mike Galbraith
2010-11-16 17:28                   ` Ingo Molnar
2010-11-20 19:35                     ` [PATCH v4] sched: automated per session " Mike Galbraith
2010-12-04 17:39                       ` Colin Walters
2010-12-04 18:33                         ` Linus Torvalds
2010-12-04 20:01                           ` Colin Walters
2010-12-04 22:39                             ` Linus Torvalds
2010-12-04 23:43                               ` Colin Walters
2010-12-05  0:31                                 ` Linus Torvalds
2010-12-05  7:47                                 ` Ray Lee
2010-12-05 19:22                                   ` Colin Walters
2010-12-05 20:47                                     ` Linus Torvalds
2010-12-05 22:47                                       ` Colin Walters
2010-12-05 22:58                                         ` Jesper Juhl
2010-12-05 23:05                                           ` Jesper Juhl
2010-12-07 18:51                                       ` Peter Zijlstra
2010-12-05 10:18                                 ` Con Kolivas
2010-12-05 11:36                                   ` Mike Galbraith
2010-12-05 20:58                                   ` Ingo Molnar
2010-12-04 23:31                             ` david
2010-12-05 11:11                             ` Nikos Chantziaras
2010-12-06  0:28                             ` Valdis.Kletnieks
