* [GIT, RFC] Killing the Big Kernel Lock @ 2010-03-24 21:40 Arnd Bergmann 2010-03-24 21:07 ` Andrew Morton ` (10 more replies) 0 siblings, 11 replies; 49+ messages in thread From: Arnd Bergmann @ 2010-03-24 21:40 UTC (permalink / raw) To: Frederic Weisbecker Cc: linux-kernel, Matthew Wilcox, Thomas Gleixner, jblunck, Alan Cox, Ingo Molnar I've spent some time continuing the work of the people on Cc and many others to remove the big kernel lock from Linux and I now have bkl-removal branch in my git tree at git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git that lets me run a kernel on my quad-core machine with the only users of the BKL being mostly obscure device driver modules. The oldest patch in this series is roughly eight years old and is Willy's patch to remove the BKL from fs/locks.c, and I took a series of patches from Jan that removes it from most of the VFS. The other non-obvious changes are: - all file operations that either have an .ioctl method or do not have their own .llseek method used to implicitly require the BKL. I've changed that so they need to explicitly set .llseek = default_llseek, .unlocked_ioctl = default_ioctl, and changed all the code that either has supplied a .ioctl method or looks like it needs the BKL somewhere else, meaning the default_llseek function might actually do something. - The block layer now has a global bkldev_mutex that is used in all block drivers in place of the BKL. The only recursive instance of the BKL was __blkdev_get(), which is now called with the blkdev_mutex held instead of grabbing the BKL. This has some possible performance implications that need to be looked into. - The init/main.c code no longer take the BKL. I figured that this was completely unnecessary because there is no other code running at the same time that takes the BKL. - The most invasive change is in the TTY layer, which has a new global mutex (sorry!). I know that Alan has plans of his own to remove the BKL from this subsystem, so my patches may not go anywhere, but they seem to work fine for me. I've called the new lock the 'Big TTY Mutex' (BTM), a name that probably makes more sense if you happen to speak German. The basic idea here is to make recursive locking and the release-on-sleep explicit, so every mutex_lock, wait_event, workqueue_flush and schedule in the TTY layer now explicitly releases the BTM before blocking. - All drivers that still require the BKL are now listed as 'depends on BKL' in Kconfig, and you can set that symbol to 'y', 'm' or 'n'. If the lock itself is a module, only other modules can use it, and /proc/modules will tell you exactly which ones those are. I've thought about adding a module_init function in that module that will taint the kernel, but so far I haven't done that. - Included is a debugfs file that gives statistics over the BKL usage from early boot on. This is now obsolete and will not get merged, but I'm including it for reference. Frederic has volunteered to help merging all of this upstream, which I very much welcome. The shape that the tree is in now is very inconsistent, especially some of the bits at the end are a bit dodgy and all of it needs more testing. I've built-tested an allmodconfig kernel with CONFIG_BKL disabled on x86_64, i386, powerpc64, powerpc32, s390 and arm to make sure I catch all the modules that depend on BKL, and I've been running various versions of this tree on my desktop machine over the last few weeks while adding stuff. Arnd --- Arnd Bergmann (44): input: kill BKL, fix input_open_file locking ptrace: kill BKL procfs: kill BKL in llseek random: forbid llseek on random chardev x86/microcode: use nonseekable_open perf_event: use nonseekable_open dm: use nonseekable_open vgaarb: use nonseekable_open kvm: don't require BKL nvram: kill BKL do_coredump: do not take BKL hpet: kill BKL, add compat_ioctl proc/pci: kill BKL autofs/autofs4: move compat_ioctl handling into fs usb/mon: kill BKL usage fat: push down BKL sunrpc: push down BKL pcmcia: push down BKL vfs: kill BKL in default_llseek BKL: introduce CONFIG_BKL. bkl-removal: make fops->ioctl and default_llseek optional x86: update defconfig to CONFIG_BKL=m bkl removal: make unlocked_ioctl mandatory bkl removal: use default_llseek in code that uses the BKL BKL removal: mark remaining users as 'depends on BKL' tty: replace BKL with a new tty_lock tty: make atomic_write_lock release tty_lock tty: make tty_port->mutex nest under tty_lock tty: make termios mutex nest under tty_lock tty: make ldisc_mutex nest under tty_lock tty: never hold tty_lock() while getting tty_mutex ppp: use big tty mutex tty: release tty lock when blocking tty: implement BTM as mutex instead of BKL briq_panel: do not use BTM affs: remove leftover unlock_kernel kvm: don't require BKL block: replace BKL with global mutex init: kill BKL usage debug: instrument big kernel lock BKL removal: make the BKL modular Matthew Wilcox (1): [RFC] Remove BKL from fs/locks.c Jan Blunck (19): JFS: Free sbi memory in error path BKL: Explicitly add BKL around get_sb/fill_super BKL: Remove outdated comment and include BKL: Remove BKL from Amiga FFS BKL: Remove BKL from BFS BKL: Remove BKL from CifsFS BKL: Remove BKL from ext3 fill_super() BKL: Remove BKL from ext3_put_super() and ext3_remount() BKL: Remove BKL from ext4 filesystem BKL: Remove smp_lock.h from exofs BKL: Remove BKL from HFS BKL: Remove BKL from HFS+ BKL: Remove BKL from JFS BKL: Remove BKL from NILFS2 BKL: Remove BKL from NTFS BKL: Remove BKL from cgroup BKL: Remove BKL from do_new_mount() ext2: Add ext2_sb_info s_lock spinlock BKL: Remove BKL from ext2 filesystem ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock 2010-03-24 21:40 [GIT, RFC] Killing the Big Kernel Lock Arnd Bergmann @ 2010-03-24 21:07 ` Andrew Morton 2010-03-25 10:26 ` Arnd Bergmann 2010-03-24 21:53 ` Roland Dreier ` (9 subsequent siblings) 10 siblings, 1 reply; 49+ messages in thread From: Andrew Morton @ 2010-03-24 21:07 UTC (permalink / raw) To: Arnd Bergmann Cc: Frederic Weisbecker, linux-kernel, Matthew Wilcox, Thomas Gleixner, jblunck, Alan Cox, Ingo Molnar On Wed, 24 Mar 2010 22:40:54 +0100 Arnd Bergmann <arnd@arndb.de> wrote: > I've spent some time continuing the work of the people on Cc and many others > to remove the big kernel lock from Linux <looks inside ptrace.c> Seems that there might be a few tricksy bits in here. Please do push at least the non-obvious parts out to the relevant people. It'd be interesting to see the overall diffstat? ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock 2010-03-24 21:07 ` Andrew Morton @ 2010-03-25 10:26 ` Arnd Bergmann 2010-03-28 20:33 ` Frederic Weisbecker 0 siblings, 1 reply; 49+ messages in thread From: Arnd Bergmann @ 2010-03-25 10:26 UTC (permalink / raw) To: Andrew Morton Cc: Frederic Weisbecker, linux-kernel, Matthew Wilcox, Thomas Gleixner, jblunck, Alan Cox, Ingo Molnar On Wednesday 24 March 2010, Andrew Morton wrote: > On Wed, 24 Mar 2010 22:40:54 +0100 Arnd Bergmann <arnd@arndb.de> wrote: > > > I've spent some time continuing the work of the people on Cc and many others > > to remove the big kernel lock from Linux > > <looks inside ptrace.c> > > Seems that there might be a few tricksy bits in here. Please do push > at least the non-obvious parts out to the relevant people. Sure, that is certainly the plan. Regarding the ptrace bits, this is one of a handful of places where the BKL was put in by someone a really long time ago but with the rest of the series applied, it becomes evident that there is nothing whatsoever that it serializes with, so removing the BKL here does not make the situation worse. It could still be a bug that needs to be fixed by adding a new serialization method no matter if the BKL is there or not. > It'd be interesting to see the overall diffstat? Let me give you three separate diffstats. One is for the trivial pushdown of the BKL into all the ioctl and llseek functions as well as marking all remaining users as 'depends on BKL' in Kconfig, the second one is for the TTY layer conversion to the Big TTY Mutex and the third one is all the rest. Arnd --- Big TTY Mutex conversion: drivers/char/Makefile | 2 + drivers/char/amiserial.c | 16 ++-- drivers/char/cyclades.c | 20 ++-- drivers/char/epca.c | 4 +- drivers/char/isicom.c | 10 +- drivers/char/istallion.c | 20 ++-- drivers/char/mxser.c | 10 +- drivers/char/n_hdlc.c | 16 ++-- drivers/char/n_r3964.c | 10 +- drivers/char/pty.c | 8 +- drivers/char/riscom8.c | 8 +- drivers/char/rocket.c | 8 +- drivers/char/serial167.c | 4 +- drivers/char/specialix.c | 10 +- drivers/char/stallion.c | 12 ++-- drivers/char/sx.c | 12 ++-- drivers/char/synclink.c | 10 ++- drivers/char/synclink_gt.c | 8 +- drivers/char/synclinkmp.c | 12 ++-- drivers/char/tty_buffer.c | 2 +- drivers/char/tty_io.c | 123 +++++++++++++++------------ drivers/char/tty_ioctl.c | 36 ++++---- drivers/char/tty_ldisc.c | 53 +++++++----- drivers/char/tty_lock.c | 100 ++++++++++++++++++++++ drivers/char/tty_port.c | 6 +- drivers/char/vc_screen.c | 4 +- drivers/char/vt.c | 4 +- drivers/char/vt_ioctl.c | 12 ++-- drivers/isdn/i4l/isdn_common.c | 20 ++-- drivers/isdn/i4l/isdn_tty.c | 8 +- drivers/net/irda/irtty-sir.c | 5 +- drivers/net/ppp_generic.c | 29 +++--- drivers/serial/68360serial.c | 4 +- drivers/serial/crisv10.c | 8 +- drivers/serial/serial_core.c | 42 +++++----- drivers/staging/strip/strip.c | 2 +- drivers/usb/class/cdc-acm.c | 2 +- drivers/usb/serial/usb-serial.c | 18 ++-- drivers/video/console/vgacon.c | 4 +- include/linux/init_task.h | 1 + include/linux/sched.h | 1 + include/linux/tty.h | 180 +++++++++++++++++++++++++++++++++++++++ kernel/fork.c | 1 + ioctl/llseek pushdown, Kconfig: 43 files changed, 590 insertions(+), 275 deletions(-) Documentation/DocBook/kernel-hacking.tmpl | 2 +- Documentation/filesystems/vfs.txt | 3 +- arch/arm/kernel/etm.c | 1 - arch/cris/arch-v10/drivers/ds1302.c | 3 - arch/cris/arch-v10/drivers/gpio.c | 2 - arch/cris/arch-v10/drivers/i2c.c | 2 - arch/cris/arch-v10/drivers/pcf8563.c | 3 - arch/cris/arch-v10/drivers/sync_serial.c | 4 +- arch/cris/arch-v32/drivers/cryptocop.c | 4 +- arch/cris/arch-v32/drivers/i2c.c | 2 - arch/cris/arch-v32/drivers/mach-a3/gpio.c | 2 - arch/cris/arch-v32/drivers/mach-fs/gpio.c | 2 - arch/cris/arch-v32/drivers/pcf8563.c | 5 +- arch/cris/arch-v32/drivers/sync_serial.c | 4 +- arch/ia64/kernel/perfmon.c | 2 - arch/ia64/sn/kernel/sn2/sn_hwperf.c | 2 - arch/m68k/bvme6000/rtc.c | 2 - arch/m68k/mvme16x/rtc.c | 2 - arch/powerpc/platforms/iseries/Kconfig | 1 - arch/s390/Kconfig | 1 - arch/um/drivers/harddog_kern.c | 2 - arch/um/drivers/hostaudio_kern.c | 3 - arch/um/drivers/mmapper_kern.c | 3 - arch/x86/Kconfig | 3 +- arch/x86/kvm/Kconfig | 1 - drivers/block/DAC960.c | 3 +- drivers/block/Kconfig | 6 +- drivers/block/paride/pg.c | 2 - drivers/block/paride/pt.c | 2 - drivers/block/pktcdvd.c | 3 - drivers/char/Kconfig | 25 +- drivers/char/apm-emulation.c | 2 - drivers/char/applicom.c | 2 - drivers/char/ds1302.c | 1 - drivers/char/ds1620.c | 2 - drivers/char/dtlk.c | 2 - drivers/char/generic_nvram.c | 2 - drivers/char/genrtc.c | 2 - drivers/char/i8k.c | 2 - drivers/char/ip2/ip2main.c | 1 - drivers/char/ipmi/Kconfig | 2 - drivers/char/ipmi/ipmi_devintf.c | 2 - drivers/char/ipmi/ipmi_watchdog.c | 2 - drivers/char/istallion.c | 1 - drivers/char/lp.c | 1 - drivers/char/mmtimer.c | 1 - drivers/char/nwflash.c | 1 - drivers/char/pcmcia/Kconfig | 4 +- drivers/char/raw.c | 4 - drivers/char/rio/rio_linux.c | 1 - drivers/char/stallion.c | 1 - drivers/char/sx.c | 1 - drivers/char/uv_mmtimer.c | 1 - drivers/char/viotape.c | 1 - drivers/crypto/Kconfig | 2 +- drivers/firewire/Kconfig | 1 - drivers/firewire/core-cdev.c | 2 - drivers/gpu/drm/Kconfig | 5 +- drivers/gpu/drm/i810/i810_dma.c | 2 - drivers/gpu/drm/i830/i830_dma.c | 2 - drivers/hid/Kconfig | 2 +- drivers/hid/usbhid/Kconfig | 2 +- drivers/hid/usbhid/hiddev.c | 3 +- drivers/hwmon/Kconfig | 2 +- drivers/hwmon/fschmd.c | 2 - drivers/ide/Kconfig | 1 - drivers/ide/ide-tape.c | 1 - drivers/ieee1394/Kconfig | 6 +- drivers/ieee1394/dv1394.c | 2 - drivers/ieee1394/raw1394.c | 2 - drivers/ieee1394/video1394.c | 4 +- drivers/infiniband/Kconfig | 6 +- drivers/infiniband/core/ucm.c | 2 - drivers/infiniband/core/ucma.c | 2 - drivers/infiniband/core/user_mad.c | 3 - drivers/infiniband/core/uverbs_main.c | 11 +- drivers/input/joystick/Kconfig | 2 +- drivers/input/misc/Kconfig | 13 +- drivers/input/misc/hp_sdc_rtc.c | 2 - drivers/input/misc/uinput.c | 1 - drivers/input/mouse/Kconfig | 4 +- drivers/input/serio/Kconfig | 1 - drivers/input/tablet/Kconfig | 8 +- drivers/input/touchscreen/Kconfig | 2 +- drivers/isdn/Kconfig | 2 - drivers/isdn/capi/Kconfig | 2 +- drivers/isdn/capi/capi.c | 1 - drivers/isdn/divert/divert_procfs.c | 2 - drivers/isdn/hardware/eicon/Kconfig | 2 +- drivers/isdn/hysdn/Kconfig | 2 +- drivers/isdn/i4l/Kconfig | 1 - drivers/isdn/i4l/isdn_common.c | 1 - drivers/isdn/mISDN/Kconfig | 1 - drivers/isdn/mISDN/timerdev.c | 3 - drivers/macintosh/Kconfig | 7 +- drivers/macintosh/ans-lcd.c | 2 - drivers/macintosh/nvram.c | 2 - drivers/macintosh/via-pmu.c | 2 - drivers/media/Kconfig | 5 +- drivers/media/dvb/bt8xx/Kconfig | 2 +- drivers/media/dvb/bt8xx/dst_ca.c | 1 - drivers/media/dvb/dvb-core/dmxdev.c | 3 - drivers/media/dvb/dvb-core/dvb_ca_en50221.c | 3 - drivers/media/dvb/dvb-core/dvb_frontend.c | 5 +- drivers/media/dvb/dvb-core/dvb_net.c | 3 - drivers/media/dvb/firewire/Kconfig | 2 +- drivers/media/dvb/firewire/firedtv-ci.c | 3 - drivers/media/dvb/ttpci/Kconfig | 2 +- drivers/media/dvb/ttpci/av7110.c | 3 - drivers/media/dvb/ttpci/av7110_av.c | 5 - drivers/media/dvb/ttpci/av7110_ca.c | 3 - drivers/media/video/Kconfig | 13 +- drivers/media/video/cpia.c | 2 - drivers/media/video/v4l2-dev.c | 2 - drivers/message/fusion/Kconfig | 2 +- drivers/message/i2o/Kconfig | 2 +- drivers/misc/Kconfig | 2 +- drivers/mtd/Kconfig | 1 - drivers/mtd/mtdchar.c | 1 - drivers/mtd/ubi/Kconfig | 2 +- drivers/mtd/ubi/cdev.c | 2 - drivers/net/Kconfig | 1 - drivers/net/appletalk/Kconfig | 1 - drivers/net/ppp_generic.c | 4 +- drivers/net/wan/Kconfig | 2 +- drivers/net/wireless/Kconfig | 4 +- drivers/net/wireless/airo.c | 9 - drivers/net/wireless/ray_cs.c | 3 - drivers/pci/hotplug/Kconfig | 2 +- drivers/pcmcia/Kconfig | 3 +- drivers/platform/x86/Kconfig | 1 - drivers/pnp/isapnp/Kconfig | 2 +- drivers/rtc/Kconfig | 1 - drivers/rtc/rtc-m41t80.c | 1 - drivers/s390/block/Kconfig | 2 +- drivers/s390/char/Kconfig | 2 +- drivers/s390/char/fs3270.c | 1 - drivers/s390/char/tape_char.c | 2 +- drivers/s390/cio/chsc_sch.c | 2 - drivers/s390/crypto/zcrypt_api.c | 1 - drivers/s390/scsi/zfcp_cfdc.c | 2 - drivers/sbus/char/envctrl.c | 1 - drivers/sbus/char/openprom.c | 1 - drivers/scsi/3w-9xxx.c | 2 - drivers/scsi/3w-sas.c | 2 - drivers/scsi/3w-xxxx.c | 2 - drivers/scsi/Kconfig | 28 +- drivers/scsi/aacraid/linit.c | 1 - drivers/scsi/dpt_i2o.c | 2 - drivers/scsi/gdth.c | 2 - drivers/scsi/megaraid.c | 2 - drivers/scsi/megaraid/Kconfig.megaraid | 6 +- drivers/scsi/megaraid/megaraid_mm.c | 2 - drivers/scsi/megaraid/megaraid_sas.c | 1 - drivers/scsi/mpt2sas/Kconfig | 2 +- drivers/scsi/mpt2sas/mpt2sas_ctl.c | 1 - drivers/scsi/osd/Kconfig | 2 +- drivers/scsi/osd/osd_uld.c | 2 - drivers/scsi/osst.c | 2 - drivers/scsi/pmcraid.c | 2 - drivers/scsi/st.c | 1 - drivers/spi/spidev.c | 2 - drivers/staging/comedi/comedi_fops.c | 2 - drivers/staging/dream/pmem.c | 3 - drivers/staging/dream/qdsp5/audio_aac.c | 2 - drivers/staging/dream/qdsp5/audio_mp3.c | 2 - drivers/staging/poch/poch.c | 3 - drivers/staging/sep/sep_driver.c | 2 - drivers/staging/vme/devices/vme_user.c | 2 - drivers/telephony/Kconfig | 2 +- drivers/telephony/ixj.c | 1 - drivers/usb/Kconfig | 2 +- drivers/usb/class/Kconfig | 1 - drivers/usb/class/usblp.c | 2 - drivers/usb/gadget/Kconfig | 2 - drivers/usb/gadget/printer.c | 1 - drivers/usb/host/Kconfig | 2 +- drivers/usb/misc/Kconfig | 8 +- drivers/usb/misc/idmouse.c | 2 - drivers/usb/misc/iowarrior.c | 1 - drivers/usb/misc/rio500.c | 1 - drivers/usb/misc/sisusbvga/Kconfig | 2 +- fs/adfs/Kconfig | 1 - fs/afs/Kconfig | 1 - fs/autofs/Kconfig | 1 - fs/autofs/root.c | 1 - fs/autofs4/Kconfig | 1 - fs/autofs4/dev-ioctl.c | 2 - fs/btrfs/super.c | 1 - fs/coda/Kconfig | 1 - fs/coda/pioctl.c | 3 - fs/coda/psdev.c | 2 - fs/ecryptfs/Kconfig | 1 - fs/ecryptfs/file.c | 2 - fs/ecryptfs/miscdev.c | 2 - fs/fat/Kconfig | 3 - fs/freevxfs/Kconfig | 1 - fs/hfsplus/Kconfig | 1 - fs/hfsplus/dir.c | 2 - fs/hfsplus/inode.c | 2 - fs/hpfs/Kconfig | 1 - fs/ioctl.c | 11 +- fs/isofs/Kconfig | 1 - fs/jffs2/Kconfig | 1 - fs/ncpfs/Kconfig | 1 - fs/ncpfs/dir.c | 2 - fs/ncpfs/file.c | 1 - fs/nfs/Kconfig | 2 +- fs/nfsd/Kconfig | 1 - fs/ocfs2/Kconfig | 1 - fs/qnx4/Kconfig | 1 - fs/read_write.c | 34 + fs/reiserfs/Kconfig | 1 - fs/smbfs/Kconfig | 1 - fs/smbfs/dir.c | 2 - fs/smbfs/file.c | 1 - fs/squashfs/Kconfig | 1 - fs/udf/Kconfig | 1 - fs/udf/dir.c | 2 - fs/udf/file.c | 1 - fs/ufs/Kconfig | 2 +- include/linux/fs.h | 5 + kernel/power/Kconfig | 2 +- lib/Kconfig.debug | 2 +- lib/kernel_lock.c | 37 +- net/bluetooth/hidp/Kconfig | 1 - net/ipx/Kconfig | 1 - net/irda/Kconfig | 2 +- net/irda/irnet/Kconfig | 2 +- net/socket.c | 1 - net/sunrpc/Kconfig | 4 +- net/wanrouter/Kconfig | 2 +- net/x25/Kconfig | 2 +- sound/Kconfig | 2 +- sound/core/control.c | 2 - sound/core/oss/pcm_oss.c | 2 - sound/core/pcm_native.c | 2 - sound/core/seq/seq_clientmgr.c | 2 - sound/oss/au1550_ac97.c | 30 +- sound/oss/dmasound/dmasound_core.c | 2 - sound/oss/msnd_pinnacle.c | 2 - sound/oss/sh_dac_audio.c | 3 - sound/oss/swarm_cs4297a.c | 3 - sound/oss/vwsnd.c | 2 - sound/soc/soc-core.c | 2 - virt/kvm/kvm_main.c | 1 - 248 files changed, 190 insertions(+), 516 deletions(-) The rest: arch/arm/lib/uaccess_with_memcpy.c | 1 + arch/x86/configs/x86_64_defconfig | 29 +- arch/x86/kernel/microcode_core.c | 6 +- block/bsg.c | 2 - block/compat_ioctl.c | 8 +- block/ioctl.c | 24 +- drivers/block/DAC960.c | 4 +- drivers/block/aoe/aoechr.c | 6 +- drivers/block/cciss.c | 4 +- drivers/block/paride/pg.c | 4 +- drivers/block/paride/pt.c | 16 +- drivers/char/briq_panel.c | 1 + drivers/char/dsp56k.c | 1 + drivers/char/hpet.c | 96 +++-- drivers/char/mwave/mwavedd.c | 1 + drivers/char/nvram.c | 7 +- drivers/char/pcmcia/cm4000_cs.c | 1 + drivers/char/pcmcia/cm4040_cs.c | 1 + drivers/char/random.c | 2 + drivers/char/snsc.c | 1 + drivers/char/tlclk.c | 1 + drivers/char/toshiba.c | 1 + drivers/char/xilinx_hwicap/xilinx_hwicap.c | 1 + drivers/gpu/vga/vgaarb.c | 4 +- drivers/hid/hidraw.c | 1 + drivers/input/serio/serio_raw.c | 1 + drivers/isdn/capi/capifs.c | 10 +- drivers/md/dm-ioctl.c | 2 + drivers/media/dvb/dvb-core/dvbdev.c | 1 + drivers/misc/phantom.c | 1 + drivers/pci/proc.c | 4 +- drivers/pcmcia/pcmcia_ioctl.c | 23 +- drivers/sbus/char/display7seg.c | 1 + drivers/sbus/char/jsflash.c | 20 +- drivers/scsi/aacraid/linit.c | 1 + drivers/scsi/ch.c | 1 + drivers/scsi/sg.c | 22 +- drivers/usb/core/file.c | 1 + drivers/usb/core/inode.c | 5 + drivers/usb/gadget/inode.c | 12 +- drivers/usb/misc/usblcd.c | 1 + drivers/usb/mon/mon_bin.c | 14 +- drivers/watchdog/cpwd.c | 1 + fs/adfs/super.c | 8 +- fs/affs/super.c | 14 +- fs/afs/super.c | 5 + fs/autofs/root.c | 67 +++- fs/autofs4/root.c | 69 +++- fs/bad_inode.c | 4 + fs/bfs/inode.c | 7 +- fs/block_dev.c | 20 +- fs/cifs/cifsfs.c | 9 +- fs/coda/inode.c | 8 +- fs/compat_ioctl.c | 43 +-- fs/ecryptfs/main.c | 4 + fs/exec.c | 6 - fs/exofs/super.c | 1 - fs/ext2/inode.c | 5 +- fs/ext2/super.c | 58 ++- fs/ext3/super.c | 12 - fs/ext4/super.c | 11 - fs/fat/dir.c | 11 +- fs/fat/fat.h | 2 +- fs/fat/file.c | 33 ++- fs/fat/namei_msdos.c | 7 +- fs/fat/namei_vfat.c | 7 +- fs/freevxfs/vxfs_lookup.c | 1 + fs/freevxfs/vxfs_super.c | 7 +- fs/hfs/super.c | 6 +- fs/hfsplus/super.c | 5 - fs/hpfs/super.c | 8 +- fs/hppfs/hppfs.c | 2 +- fs/ioctl.c | 2 + fs/isofs/dir.c | 1 + fs/isofs/inode.c | 8 +- fs/jffs2/super.c | 11 +- fs/jfs/super.c | 22 +- fs/locks.c | 110 +++-- fs/namespace.c | 2 - fs/ncpfs/inode.c | 8 +- fs/nfs/super.c | 24 + fs/nilfs2/ioctl.c | 1 - fs/nilfs2/super.c | 10 - fs/ntfs/super.c | 24 +- fs/ocfs2/stack_user.c | 1 + fs/ocfs2/super.c | 5 + fs/proc/base.c | 10 +- fs/proc/inode.c | 8 +- fs/qnx4/dir.c | 1 + fs/qnx4/inode.c | 8 +- fs/read_write.c | 8 + fs/reiserfs/super.c | 4 + fs/smbfs/inode.c | 5 + fs/squashfs/super.c | 6 + fs/super.c | 3 - fs/udf/super.c | 8 +- fs/ufs/super.c | 5 + include/linux/auto_fs.h | 1 + include/linux/blkdev.h | 6 + include/linux/ext2_fs_sb.h | 6 + include/linux/fs.h | 2 + include/linux/sched.h | 2 + include/linux/smp_lock.h | 57 ++- init/main.c | 5 - kernel/cgroup.c | 4 - kernel/perf_event.c | 2 + kernel/ptrace.c | 10 - kernel/sched.c | 17 + kernel/trace/blktrace.c | 14 +- kernel/trace/trace.c | 8 - lib/Kconfig.debug | 17 + lib/Makefile | 4 + lib/kernel_lock.c | 104 +----- lib/kernel_lock_core.c | 307 +++++++++++++ net/sunrpc/cache.c | 30 +- net/sunrpc/rpc_pipe.c | 9 +- sound/core/timer.c | 5 +- sound/sound_core.c | 1 + virt/kvm/kvm_main.c | 6 + 120 files changed, 1156 insertions(+), 540 deletions(-) ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock 2010-03-25 10:26 ` Arnd Bergmann @ 2010-03-28 20:33 ` Frederic Weisbecker 0 siblings, 0 replies; 49+ messages in thread From: Frederic Weisbecker @ 2010-03-28 20:33 UTC (permalink / raw) To: Arnd Bergmann Cc: Andrew Morton, linux-kernel, Matthew Wilcox, Thomas Gleixner, jblunck, Alan Cox, Ingo Molnar On Thu, Mar 25, 2010 at 11:26:56AM +0100, Arnd Bergmann wrote: > On Wednesday 24 March 2010, Andrew Morton wrote: > > On Wed, 24 Mar 2010 22:40:54 +0100 Arnd Bergmann <arnd@arndb.de> wrote: > > > > > I've spent some time continuing the work of the people on Cc and many others > > > to remove the big kernel lock from Linux > > > > <looks inside ptrace.c> > > > > Seems that there might be a few tricksy bits in here. Please do push > > at least the non-obvious parts out to the relevant people. > > Sure, that is certainly the plan. > > Regarding the ptrace bits, this is one of a handful of places where the BKL > was put in by someone a really long time ago but with the rest of the > series applied, it becomes evident that there is nothing whatsoever that > it serializes with, so removing the BKL here does not make the situation > worse. It could still be a bug that needs to be fixed by adding a new > serialization method no matter if the BKL is there or not. Yeah, the comment gives this: /* * This lock_kernel fixes a subtle race with suid exec */ But there is no lock_kernel() in the exec path, may be I missed it... so this may be an old lock_kernel() that doesn't exist anymore, and the bkl in the ptrace path is not going to help in any way. What remain to check are the possible unintended racy places that this bkl might protect. I'm going to check a first pass and if it looks fine, I'll just submit to Oleg and Roland. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock 2010-03-24 21:40 [GIT, RFC] Killing the Big Kernel Lock Arnd Bergmann 2010-03-24 21:07 ` Andrew Morton @ 2010-03-24 21:53 ` Roland Dreier 2010-03-24 21:59 ` Arnd Bergmann 2010-03-24 22:10 ` Alan Cox ` (8 subsequent siblings) 10 siblings, 1 reply; 49+ messages in thread From: Roland Dreier @ 2010-03-24 21:53 UTC (permalink / raw) To: Arnd Bergmann Cc: Frederic Weisbecker, linux-kernel, Matthew Wilcox, Thomas Gleixner, jblunck, Alan Cox, Ingo Molnar Hi Arnd, Interesting work. For the drivers/infiniband part, it seems maybe all these drivers should be using no_llseek instead of default_llseek? (or is it better style to use nonseekable_open()?) Certainly as far as I can tell, nothing in drivers/infiniband pays any attention to f_pos. Also, is there a reason why you add "#include <linux/smp_lock.h>" to all the files where you also do ".llseek = default_llseek"? In any case I can at least take care of the llseek stuff for 2.6.35. - R. -- Roland Dreier <rolandd@cisco.com> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/index.html ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock 2010-03-24 21:53 ` Roland Dreier @ 2010-03-24 21:59 ` Arnd Bergmann 2010-03-31 5:22 ` Roland Dreier 0 siblings, 1 reply; 49+ messages in thread From: Arnd Bergmann @ 2010-03-24 21:59 UTC (permalink / raw) To: Roland Dreier Cc: Frederic Weisbecker, linux-kernel, Matthew Wilcox, Thomas Gleixner, jblunck, Alan Cox, Ingo Molnar On Wednesday 24 March 2010 22:53:07 Roland Dreier wrote: > Interesting work. For the drivers/infiniband part, it seems maybe all > these drivers should be using no_llseek instead of default_llseek? (or > is it better style to use nonseekable_open()?) Certainly as far as I > can tell, nothing in drivers/infiniband pays any attention to f_pos. no_llseek makes it clear that you don't want the default_llseek semantics, while nonseekable_open also prevents pread/pwrite. Ideally, I'd just use both. There is a small chance that a random user space application actually tries to seek on the device (e.g. SEEK_END) and expects a zero return value, so when in doubt, I converted everything to default_llseek instead of no_llseek, just so I can be sure I don't change the semantics. > Also, is there a reason why you add "#include <linux/smp_lock.h>" to all > the files where you also do ".llseek = default_llseek"? The last patch in the series moves the default_llseek and default_ioctl function into the same loadable module that contains the BKL itself. Moving the declarations into the respective header seemed appropriate, but it could also stay in a VFS header if people prefer that. > In any case I can at least take care of the llseek stuff for 2.6.35. Ok, thanks! Arnd ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock 2010-03-24 21:59 ` Arnd Bergmann @ 2010-03-31 5:22 ` Roland Dreier 0 siblings, 0 replies; 49+ messages in thread From: Roland Dreier @ 2010-03-31 5:22 UTC (permalink / raw) To: Arnd Bergmann Cc: Frederic Weisbecker, linux-kernel, Matthew Wilcox, Thomas Gleixner, jblunck, Alan Cox, Ingo Molnar OK, I added the following to my tree, currently queued in my for-next branch for 2.6.35: IB: Explicitly rule out llseek to avoid BKL in default_llseek() Several RDMA user-access drivers have file_operations structures with no .llseek method set. None of the drivers actually do anything with f_pos, so this means llseek is essentially a NOP, instead of returning an error as leaving other file_operations methods unimplemented would do. This is mostly harmless, except that a NULL .llseek means that default_llseek() is used, and this function grabs the BKL, which we would like to avoid. Since llseek does nothing useful on these files, we would like it to return an error to userspace instead of silently grabbing the BKL and succeeding. For nearly all of the file types, we take the belt-and-suspenders approach of setting the .llseek method to no_llseek and also calling nonseekable_open(); the exception is the uverbs_event files, which are created with anon_inode_getfile(), which already sets f_mode the same way as nonseekable_open() would. This work is motivated by Arnd Bergmann's bkl-removal tree. Signed-off-by: Roland Dreier <rolandd@cisco.com> --- drivers/infiniband/core/ucm.c | 3 ++- drivers/infiniband/core/ucma.c | 4 +++- drivers/infiniband/core/user_mad.c | 12 ++++++++---- drivers/infiniband/core/uverbs_main.c | 11 +++++++---- 4 files changed, 20 insertions(+), 10 deletions(-) diff --git a/drivers/infiniband/core/ucm.c b/drivers/infiniband/core/ucm.c index 017d6e2..7ef3954 100644 --- a/drivers/infiniband/core/ucm.c +++ b/drivers/infiniband/core/ucm.c @@ -1180,7 +1180,7 @@ static int ib_ucm_open(struct inode *inode, struct file *filp) file->filp = filp; file->device = container_of(inode->i_cdev, struct ib_ucm_device, cdev); - return 0; + return nonseekable_open(inode, filp); } static int ib_ucm_close(struct inode *inode, struct file *filp) @@ -1228,6 +1228,7 @@ static const struct file_operations ucm_fops = { .release = ib_ucm_close, .write = ib_ucm_write, .poll = ib_ucm_poll, + .llseek = no_llseek, }; static ssize_t show_ibdev(struct device *dev, struct device_attribute *attr, diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c index b2e16c3..09d4a3b 100644 --- a/drivers/infiniband/core/ucma.c +++ b/drivers/infiniband/core/ucma.c @@ -1219,7 +1219,8 @@ static int ucma_open(struct inode *inode, struct file *filp) filp->private_data = file; file->filp = filp; - return 0; + + return nonseekable_open(inode, filp); } static int ucma_close(struct inode *inode, struct file *filp) @@ -1249,6 +1250,7 @@ static const struct file_operations ucma_fops = { .release = ucma_close, .write = ucma_write, .poll = ucma_poll, + .llseek = no_llseek, }; static struct miscdevice ucma_misc = { diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c index 04b585e..2bb9703 100644 --- a/drivers/infiniband/core/user_mad.c +++ b/drivers/infiniband/core/user_mad.c @@ -780,7 +780,7 @@ static int ib_umad_open(struct inode *inode, struct file *filp) { struct ib_umad_port *port; struct ib_umad_file *file; - int ret = 0; + int ret; port = container_of(inode->i_cdev, struct ib_umad_port, cdev); if (port) @@ -813,6 +813,8 @@ static int ib_umad_open(struct inode *inode, struct file *filp) list_add_tail(&file->port_list, &port->file_list); + ret = nonseekable_open(inode, filp); + out: mutex_unlock(&port->file_mutex); return ret; @@ -865,7 +867,8 @@ static const struct file_operations umad_fops = { .compat_ioctl = ib_umad_compat_ioctl, #endif .open = ib_umad_open, - .release = ib_umad_close + .release = ib_umad_close, + .llseek = no_llseek, }; static int ib_umad_sm_open(struct inode *inode, struct file *filp) @@ -902,7 +905,7 @@ static int ib_umad_sm_open(struct inode *inode, struct file *filp) filp->private_data = port; - return 0; + return nonseekable_open(inode, filp); fail: kref_put(&port->umad_dev->ref, ib_umad_release_dev); @@ -932,7 +935,8 @@ static int ib_umad_sm_close(struct inode *inode, struct file *filp) static const struct file_operations umad_sm_fops = { .owner = THIS_MODULE, .open = ib_umad_sm_open, - .release = ib_umad_sm_close + .release = ib_umad_sm_close, + .llseek = no_llseek, }; static struct ib_client umad_client = { diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c index 1444f95..a16a91e 100644 --- a/drivers/infiniband/core/uverbs_main.c +++ b/drivers/infiniband/core/uverbs_main.c @@ -385,7 +385,8 @@ static const struct file_operations uverbs_event_fops = { .read = ib_uverbs_event_read, .poll = ib_uverbs_event_poll, .release = ib_uverbs_event_close, - .fasync = ib_uverbs_event_fasync + .fasync = ib_uverbs_event_fasync, + .llseek = no_llseek, }; void ib_uverbs_comp_handler(struct ib_cq *cq, void *cq_context) @@ -639,7 +640,7 @@ static int ib_uverbs_open(struct inode *inode, struct file *filp) filp->private_data = file; - return 0; + return nonseekable_open(inode, filp); err_module: module_put(dev->ib_dev->owner); @@ -667,7 +668,8 @@ static const struct file_operations uverbs_fops = { .owner = THIS_MODULE, .write = ib_uverbs_write, .open = ib_uverbs_open, - .release = ib_uverbs_close + .release = ib_uverbs_close, + .llseek = no_llseek, }; static const struct file_operations uverbs_mmap_fops = { @@ -675,7 +677,8 @@ static const struct file_operations uverbs_mmap_fops = { .write = ib_uverbs_write, .mmap = ib_uverbs_mmap, .open = ib_uverbs_open, - .release = ib_uverbs_close + .release = ib_uverbs_close, + .llseek = no_llseek, }; static struct ib_client uverbs_client = { -- 1.7.0.3 -- Roland Dreier <rolandd@cisco.com> || For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/index.html ^ permalink raw reply related [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock 2010-03-24 21:40 [GIT, RFC] Killing the Big Kernel Lock Arnd Bergmann 2010-03-24 21:07 ` Andrew Morton 2010-03-24 21:53 ` Roland Dreier @ 2010-03-24 22:10 ` Alan Cox 2010-03-24 22:25 ` Arnd Bergmann 2010-03-24 22:23 ` Ingo Molnar ` (7 subsequent siblings) 10 siblings, 1 reply; 49+ messages in thread From: Alan Cox @ 2010-03-24 22:10 UTC (permalink / raw) To: Arnd Bergmann Cc: Frederic Weisbecker, linux-kernel, Matthew Wilcox, Thomas Gleixner, jblunck, Alan Cox, Ingo Molnar > - The most invasive change is in the TTY layer, which has a new global > mutex (sorry!). I know that Alan has plans of his own to remove the BKL > from this subsystem, so my patches may not go anywhere, but they seem > to work fine for me. > I've called the new lock the 'Big TTY Mutex' (BTM), a name that probably > makes more sense if you happen to speak German. Chuckle (I don't but I looked it up) > The basic idea here is to make recursive locking and the release-on-sleep > explicit, so every mutex_lock, wait_event, workqueue_flush and schedule > in the TTY layer now explicitly releases the BTM before blocking. I'm not sure if that is actually the path of sanity (yours at least), nor the right way to whack the other BKL users whose use is horrible but essentially private. It would be nice to get the other bits in first removing BKL from most of the kernel and building kernels which are non BKL except for the tty layer. That (after Ingo's box from hell has run it a bit) would reasonably test the assertion that the tty layer has no BKL requirements that are driven by external to tty layer code. That to me would test the biggest question of all and be a reasonably good base from which to then either apply the tty BTM patches or attack the problem properly with the BKL localised to one subtree. Alan ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock 2010-03-24 22:10 ` Alan Cox @ 2010-03-24 22:25 ` Arnd Bergmann 0 siblings, 0 replies; 49+ messages in thread From: Arnd Bergmann @ 2010-03-24 22:25 UTC (permalink / raw) To: Alan Cox Cc: Frederic Weisbecker, linux-kernel, Matthew Wilcox, Thomas Gleixner, jblunck, Alan Cox, Ingo Molnar, John Kacur On Wednesday 24 March 2010 23:10:16 Alan Cox wrote: > > The basic idea here is to make recursive locking and the release-on-sleep > > explicit, so every mutex_lock, wait_event, workqueue_flush and schedule > > in the TTY layer now explicitly releases the BTM before blocking. > > I'm not sure if that is actually the path of sanity (yours at least), nor > the right way to whack the other BKL users whose use is horrible but > essentially private. > > It would be nice to get the other bits in first removing BKL from most of > the kernel and building kernels which are non BKL except for the tty > layer. That (after Ingo's box from hell has run it a bit) would > reasonably test the assertion that the tty layer has no BKL requirements > that are driven by external to tty layer code. Yes, we can do that by applying all patches except 'tty: implement BTM as mutex instead of BKL', which is the only one in the tty section of my series that should really change the behaviour. Building a kernel with all other BKL users gone currently implies disabling usbcore, videodev, soundcore, i4l and capi, as well as a large number of obsolete device drivers. The only ones that I can imagine still interacting with the tty code are the ISDN drivers, and even those look pretty unlikely. > That to me would test the biggest question of all and be a reasonably > good base from which to then either apply the tty BTM patches or attack > the problem properly with the BKL localised to one subtree. We could also make the 'tty: implement BTM as mutex instead of BKL' patch a config option that makes it possible to test it out some more while conservative users just continue to get the BKL semantics. Arnd ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock 2010-03-24 21:40 [GIT, RFC] Killing the Big Kernel Lock Arnd Bergmann ` (2 preceding siblings ...) 2010-03-24 22:10 ` Alan Cox @ 2010-03-24 22:23 ` Ingo Molnar 2010-03-25 12:55 ` Jiri Kosina ` (6 subsequent siblings) 10 siblings, 0 replies; 49+ messages in thread From: Ingo Molnar @ 2010-03-24 22:23 UTC (permalink / raw) To: Arnd Bergmann Cc: Frederic Weisbecker, linux-kernel, Matthew Wilcox, Thomas Gleixner, jblunck, Alan Cox, Linus Torvalds, Andrew Morton, Jonathan Corbet * Arnd Bergmann <arnd@arndb.de> wrote: > I've built-tested an allmodconfig kernel with CONFIG_BKL disabled on x86_64, > i386, powerpc64, powerpc32, s390 and arm to make sure I catch all the > modules that depend on BKL, and I've been running various versions of this > tree on my desktop machine over the last few weeks while adding stuff. Very nice work! How about going one step further: - remove CONFIG_BKL altogether - remove all the remains of the BKL code in lib/kernel_lock.c and kernel/sched.c - turn lock_kernel() into a WARN_ONCE() and unlock_kernel() into a NOP. - ... - Celebrate! :-) Ingo ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock 2010-03-24 21:40 [GIT, RFC] Killing the Big Kernel Lock Arnd Bergmann ` (3 preceding siblings ...) 2010-03-24 22:23 ` Ingo Molnar @ 2010-03-25 12:55 ` Jiri Kosina 2010-03-25 13:06 ` Arnd Bergmann 2010-03-28 21:58 ` Andi Kleen 2010-03-25 13:40 ` [GIT, RFC] Killing the Big Kernel Lock Dan Carpenter ` (5 subsequent siblings) 10 siblings, 2 replies; 49+ messages in thread From: Jiri Kosina @ 2010-03-25 12:55 UTC (permalink / raw) To: Arnd Bergmann Cc: Frederic Weisbecker, linux-kernel, Matthew Wilcox, Thomas Gleixner, jblunck, Alan Cox, Ingo Molnar On Wed, 24 Mar 2010, Arnd Bergmann wrote: > I've spent some time continuing the work of the people on Cc and many others > to remove the big kernel lock from Linux and I now have bkl-removal branch > in my git tree at git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git > that lets me run a kernel on my quad-core machine with the only users of the BKL > being mostly obscure device driver modules. config USB tristate "Support for Host-side USB" depends on USB_ARCH_HAS_HCD && BKL Well, that's very interesting definition of "obscure" :) -- Jiri Kosina SUSE Labs, Novell Inc. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock 2010-03-25 12:55 ` Jiri Kosina @ 2010-03-25 13:06 ` Arnd Bergmann 2010-03-25 13:38 ` Arnd Bergmann 2010-03-28 21:58 ` Andi Kleen 1 sibling, 1 reply; 49+ messages in thread From: Arnd Bergmann @ 2010-03-25 13:06 UTC (permalink / raw) To: Jiri Kosina Cc: Frederic Weisbecker, linux-kernel, Matthew Wilcox, Thomas Gleixner, jblunck, Alan Cox, Ingo Molnar On Thursday 25 March 2010, Jiri Kosina wrote: > On Wed, 24 Mar 2010, Arnd Bergmann wrote: > > > I've spent some time continuing the work of the people on Cc and many others > > to remove the big kernel lock from Linux and I now have bkl-removal branch > > in my git tree at git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git > > that lets me run a kernel on my quad-core machine with the only users of the BKL > > being mostly obscure device driver modules. > > config USB > tristate "Support for Host-side USB" > depends on USB_ARCH_HAS_HCD && BKL > > Well, that's very interesting definition of "obscure" :) > That's why I said /mostly/ obscure modules. There are soundcore, usb-core, drm, vfat and a few other very common ones, along with many obscure ones. Arnd ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock 2010-03-25 13:06 ` Arnd Bergmann @ 2010-03-25 13:38 ` Arnd Bergmann 2010-03-26 23:47 ` Stefan Richter 0 siblings, 1 reply; 49+ messages in thread From: Arnd Bergmann @ 2010-03-25 13:38 UTC (permalink / raw) To: Jiri Kosina Cc: Frederic Weisbecker, linux-kernel, Matthew Wilcox, Thomas Gleixner, jblunck, Alan Cox, Ingo Molnar, John Kacur On Thursday 25 March 2010, Arnd Bergmann wrote: > On Thursday 25 March 2010, Jiri Kosina wrote: > > config USB > > tristate "Support for Host-side USB" > > depends on USB_ARCH_HAS_HCD && BKL > > > > Well, that's very interesting definition of "obscure" :) > > > > That's why I said /mostly/ obscure modules. There are soundcore, usb-core, drm, > vfat and a few other very common ones, along with many obscure ones. FWIW, this is the full list of 148 modules that require the BKL in an x86 allmodconfig, which is probably the configuration with the largest code coverage. Arnd sound/soundcore.ko sound/soc/snd-soc-core.ko sound/oss/sound.ko sound/oss/msnd_pinnacle.ko sound/oss/msnd_classic.ko sound/core/snd.ko sound/core/snd-pcm.ko sound/core/seq/snd-seq.ko sound/core/oss/snd-pcm-oss.ko net/x25/x25.ko net/wanrouter/wanrouter.ko net/sunrpc/sunrpc.ko net/irda/irnet/irnet.ko net/irda/irda.ko net/ipx/ipx.ko net/appletalk/appletalk.ko fs/ufs/ufs.ko fs/udf/udf.ko fs/squashfs/squashfs.ko fs/smbfs/smbfs.ko fs/reiserfs/reiserfs.ko fs/qnx4/qnx4.ko fs/ocfs2/ocfs2_stack_user.ko fs/ocfs2/ocfs2.ko fs/nfsd/nfsd.ko fs/nfs/nfs.ko fs/ncpfs/ncpfs.ko fs/lockd/lockd.ko fs/jffs2/jffs2.ko fs/isofs/isofs.ko fs/hpfs/hpfs.ko fs/hfsplus/hfsplus.ko fs/freevxfs/freevxfs.ko fs/fat/vfat.ko fs/fat/msdos.ko fs/fat/fat.ko fs/ecryptfs/ecryptfs.ko fs/coda/coda.ko fs/autofs4/autofs4.ko fs/autofs/autofs.ko fs/afs/kafs.ko fs/adfs/adfs.ko drivers/usb/misc/usblcd.ko drivers/usb/misc/sisusbvga/sisusbvga.ko drivers/usb/misc/rio500.ko drivers/usb/misc/iowarrior.ko drivers/usb/misc/idmouse.ko drivers/usb/host/uhci-hcd.ko drivers/usb/gadget/gadgetfs.ko drivers/usb/gadget/g_printer.ko drivers/usb/core/usbcore.ko drivers/usb/class/usblp.ko drivers/telephony/ixj.ko drivers/scsi/st.ko drivers/scsi/scsi_tgt.ko drivers/scsi/pmcraid.ko drivers/scsi/osst.ko drivers/scsi/osd/osd.ko drivers/scsi/mpt2sas/mpt2sas.ko drivers/scsi/megaraid/megaraid_sas.ko drivers/scsi/megaraid/megaraid_mm.ko drivers/scsi/megaraid.ko drivers/scsi/gdth.ko drivers/scsi/dpt_i2o.ko drivers/scsi/ch.ko drivers/scsi/aacraid/aacraid.ko drivers/scsi/3w-xxxx.ko drivers/scsi/3w-sas.ko drivers/scsi/3w-9xxx.ko drivers/rtc/rtc-m41t80.ko drivers/pci/hotplug/cpqphp.ko drivers/net/wireless/ray_cs.ko drivers/net/wireless/airo.ko drivers/net/wan/cosa.ko drivers/net/ppp_generic.ko drivers/mtd/ubi/ubi.ko drivers/mtd/mtdchar.ko drivers/misc/phantom.ko drivers/message/i2o/i2o_config.ko drivers/message/fusion/mptctl.ko drivers/media/video/zoran/zr36067.ko drivers/media/video/videodev.ko drivers/media/video/usbvision/usbvision.ko drivers/media/video/usbvideo/vicam.ko drivers/media/video/tlg2300/poseidon.ko drivers/media/video/stv680.ko drivers/media/video/stradis.ko drivers/media/video/stkwebcam.ko drivers/media/video/se401.ko drivers/media/video/s2255drv.ko drivers/media/video/pwc/pwc.ko drivers/media/video/dabusb.ko drivers/media/video/cx88/cx8800.ko drivers/media/video/cx88/cx88-blackbird.ko drivers/media/video/cx23885/cx23885.ko drivers/media/video/cpia.ko drivers/media/video/bt8xx/bttv.ko drivers/media/radio/si470x/radio-usb-si470x.ko drivers/media/dvb/ttpci/dvb-ttpci.ko drivers/media/dvb/firewire/firedtv.ko drivers/media/dvb/dvb-core/dvb-core.ko drivers/media/dvb/bt8xx/dst_ca.ko drivers/isdn/mISDN/mISDN_core.ko drivers/isdn/i4l/isdn.ko drivers/isdn/hysdn/hysdn.ko drivers/isdn/hardware/eicon/divas.ko drivers/isdn/hardware/eicon/diva_mnt.ko drivers/isdn/hardware/eicon/diva_idi.ko drivers/isdn/divert/dss1_divert.ko drivers/isdn/capi/capifs.ko drivers/isdn/capi/capi.ko drivers/input/serio/serio_raw.ko drivers/input/misc/uinput.ko drivers/infiniband/core/rdma_ucm.ko drivers/infiniband/core/ib_uverbs.ko drivers/infiniband/core/ib_umad.ko drivers/infiniband/core/ib_ucm.ko drivers/ieee1394/video1394.ko drivers/ieee1394/raw1394.ko drivers/ieee1394/dv1394.ko drivers/ide/ide-tape.ko drivers/hwmon/fschmd.ko drivers/hid/usbhid/usbhid.ko drivers/hid/hid.ko drivers/gpu/drm/i830/i830.ko drivers/gpu/drm/i810/i810.ko drivers/gpu/drm/drm.ko drivers/firewire/firewire-core.ko drivers/char/toshiba.ko drivers/char/tlclk.ko drivers/char/stallion.ko drivers/char/raw.ko drivers/char/ppdev.ko drivers/char/pcmcia/cm4040_cs.ko drivers/char/pcmcia/cm4000_cs.ko drivers/char/mwave/mwave.ko drivers/char/lp.ko drivers/char/istallion.ko drivers/char/ipmi/ipmi_watchdog.ko drivers/char/ipmi/ipmi_devintf.ko drivers/char/ip2/ip2.ko drivers/char/i8k.ko drivers/char/dtlk.ko drivers/char/applicom.ko drivers/block/pktcdvd.ko drivers/block/paride/pt.ko drivers/block/paride/pg.ko drivers/block/DAC960.ko ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock 2010-03-25 13:38 ` Arnd Bergmann @ 2010-03-26 23:47 ` Stefan Richter 2010-03-27 9:16 ` [PATCH] firewire: char device files are not seekable (BKL removal) Stefan Richter ` (3 more replies) 0 siblings, 4 replies; 49+ messages in thread From: Stefan Richter @ 2010-03-26 23:47 UTC (permalink / raw) To: Arnd Bergmann Cc: Jiri Kosina, Frederic Weisbecker, linux-kernel, Matthew Wilcox, Thomas Gleixner, jblunck, Alan Cox, Ingo Molnar, John Kacur Arnd Bergmann wrote: > On Thursday 25 March 2010, Arnd Bergmann wrote: >> On Thursday 25 March 2010, Jiri Kosina wrote: >>> config USB >>> tristate "Support for Host-side USB" >>> depends on USB_ARCH_HAS_HCD && BKL >>> >>> Well, that's very interesting definition of "obscure" :) >>> >> That's why I said /mostly/ obscure modules. There are soundcore, usb-core, drm, >> vfat and a few other very common ones, along with many obscure ones. > > FWIW, this is the full list of 148 modules that require the BKL in an x86 allmodconfig, > which is probably the configuration with the largest code coverage. ... > drivers/media/dvb/firewire/firedtv.ko ... > drivers/ieee1394/video1394.ko > drivers/ieee1394/raw1394.ko > drivers/ieee1394/dv1394.ko ... > drivers/firewire/firewire-core.ko firewire-core and raw1394 do not actually require the BKL, they only miss to declare their files as not seekable. I will post patches which change these accordingly. My guess is that there is nothing to seek in dv1394, video1394, and firedtv either. Needless to say, there may be other character device file interfaces which cannot be seeked but don't admit it yet. -- Stefan Richter -=====-==-=- --== ==-== http://arcgraph.de/sr/ ^ permalink raw reply [flat|nested] 49+ messages in thread
* [PATCH] firewire: char device files are not seekable (BKL removal) 2010-03-26 23:47 ` Stefan Richter @ 2010-03-27 9:16 ` Stefan Richter 2010-03-27 9:20 ` [PATCH] ieee1394: " Stefan Richter ` (2 subsequent siblings) 3 siblings, 0 replies; 49+ messages in thread From: Stefan Richter @ 2010-03-27 9:16 UTC (permalink / raw) To: linux1394-devel Cc: linux-kernel, Arnd Bergmann, Kristian Hoegsberg, Jay Fenlason The <linux/firewire-cdev.h> character device file ABI is based on - ioctl() to initiate actions, - read() to consume events, - mmap() for isochronous I/O DMA buffers. lseek(), pread(), pwrite() (or any kind of write() at all) on the other hand are not applicable to /dev/fw* device files. Alas, whereas for example file_operations.write == NULL causes write() to be failed with an appropriate error, file_operations.llseek == NULL causes fs/read_write.c::default_llseek to be called on lseek() per default. This looks like not doing any harm, but it grabs the Big Kernel Lock. We don't want that, and we should return an error on lseek() and friends. This is provided by fs/read_write.c::no_llseek which we get if we clear the FMODE_LSEEK (and FMODE_PREAD, FMODE_PWRITE) flag by means of nonseekable_open(). Side note: The firewire-cdev interface has always been free of any BKL usage apart from this oversight regarding default_llseek (and from involuntary BKL usage by open() in older kernels). Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> --- Somebody correct me if I got anything wrong in my patch description. This patch is motivated by Arnd's "bkl removal: make unlocked_ioctl mandatory" http://git.kernel.org/?p=linux/kernel/git/arnd/playground.git;a=blobdiff;f=drivers/firewire/core-cdev.c;h=4464b9dc01a8c69258a1e8880e9a390f17420b6c;hp=4eeaed57e2197a0dd5f0cab7cffa4713eaf2ec96;hb=05e7753338045e9ee3950b2da032c5e5774efa90;hpb=03165e1d096afb4b1d9cfccdad66eed038121cec "BKL removal: mark remaining users as 'depends on BKL'" http://git.kernel.org/?p=linux/kernel/git/arnd/playground.git;a=blobdiff;f=drivers/firewire/Kconfig;h=ae6d45900230dfbdc32c099d6a026ecfd0a6f5c2;hp=a9371b36a9b9c3074f5b31d7bdacf1bda72a15dd;hb=abb83d8fe5f8dcc8fca09bd9117429f73e1417e0;hpb=33c014b118f45516113d4b6823e40ea6f834dc6a drivers/firewire/core-cdev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Index: b/drivers/firewire/core-cdev.c =================================================================== --- a/drivers/firewire/core-cdev.c +++ b/drivers/firewire/core-cdev.c @@ -226,7 +226,7 @@ static int fw_device_op_open(struct inod list_add_tail(&client->link, &device->client_list); mutex_unlock(&device->client_list_mutex); - return 0; + return nonseekable_open(inode, file); } static void queue_event(struct client *client, struct event *event, -- Stefan Richter -=====-==-=- --== ==-== http://arcgraph.de/sr/ ^ permalink raw reply [flat|nested] 49+ messages in thread
* [PATCH] ieee1394: char device files are not seekable (BKL removal) 2010-03-26 23:47 ` Stefan Richter 2010-03-27 9:16 ` [PATCH] firewire: char device files are not seekable (BKL removal) Stefan Richter @ 2010-03-27 9:20 ` Stefan Richter 2010-03-27 10:40 ` [PATCH RFC] DVB: add dvb_generic_nonseekable_open, dvb_generic_unlocked_ioctl, use in firedtv Stefan Richter 2010-03-27 14:37 ` [GIT, RFC] Killing the Big Kernel Lock Arnd Bergmann 3 siblings, 0 replies; 49+ messages in thread From: Stefan Richter @ 2010-03-27 9:20 UTC (permalink / raw) To: linux1394-devel; +Cc: linux-kernel, Arnd Bergmann, Dan Dennedy The raw1394 character device file ABI is based on - write() or ioctl() to initiate actions, with write being used exactly like _IOW ioctls, - read() to consume events, - mmap() for isochronous I/O DMA buffers. The video1394 character device file ABI is based on - ioctl() to initiate actions, - mmap() for isochronous I/O DMA buffers. The dv1394 character device file ABI is based on - ioctl() to initiate actions, - read() and write() for simple serial reception and transmission of DV streams with a copy_to/from_user, - mmap() for isochronous I/O DMA buffers for I/O without user copy. lseek(), pread(), pwrite() on the other hand are not applicable to /dev/raw1394, /dev/video1394/*, and /dev/dv1394/* device files. Alas, file_operations.llseek == NULL causes lseek() to call fs/read_write.c::default_llseek per default. This looks like not doing any harm in either of the three drivers, but it grabs the Big Kernel Lock. We don't want that, and we should return an error on lseek() and friends. This is provided by fs/read_write.c::no_llseek which we get if we clear the FMODE_LSEEK (and FMODE_PREAD, FMODE_PWRITE) flag by means of nonseekable_open(). Side note: Apart from this oversight regarding default_llseek, the raw1394, video1394, and dv1394 interfaces have been converted away from BKL usage some time ago. Although all this is legacy code which should be left in peace until it is eventually removed (as it is superseded by firewire-core's <linux/firewire-cdev.h> ABI), this change seems still worth doing to further minimize the presence of BKL usage in the kernel. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> --- Somebody correct me if I misunderstood anything of these ABIs. This patch is motivated by Arnd's "bkl removal: make unlocked_ioctl mandatory" http://git.kernel.org/?p=linux/kernel/git/arnd/playground.git;a=blobdiff;f=drivers/ieee1394/raw1394.c;h=7c312331fc14facd8d3e2f8cf05daf090fd88e71;hp=8aa56ac07e2920d371a5af41eca7f09c8b752380;hb=05e7753338045e9ee3950b2da032c5e5774efa90;hpb=03165e1d096afb4b1d9cfccdad66eed038121cec etc., and "BKL removal: mark remaining users as 'depends on BKL'" http://git.kernel.org/?p=linux/kernel/git/arnd/playground.git;a=blobdiff;f=drivers/ieee1394/Kconfig;h=0d4954c2615233ccdd8ed28ba2e7b36ed1c7941d;hp=e02096cf7d95bfa383bddb703bbfbff6b782e688;hb=abb83d8fe5f8dcc8fca09bd9117429f73e1417e0;hpb=33c014b118f45516113d4b6823e40ea6f834dc6a drivers/ieee1394/dv1394.c | 2 +- drivers/ieee1394/raw1394.c | 2 +- drivers/ieee1394/video1394.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) Index: b/drivers/ieee1394/dv1394.c =================================================================== --- a/drivers/ieee1394/dv1394.c +++ b/drivers/ieee1394/dv1394.c @@ -1824,7 +1824,7 @@ static int dv1394_open(struct inode *ino "and will not be available in the new firewire driver stack. " "Try libraw1394 based programs instead.\n", current->comm); - return 0; + return nonseekable_open(inode, file); } Index: b/drivers/ieee1394/raw1394.c =================================================================== --- a/drivers/ieee1394/raw1394.c +++ b/drivers/ieee1394/raw1394.c @@ -2834,7 +2834,7 @@ static int raw1394_open(struct inode *in file->private_data = fi; - return 0; + return nonseekable_open(inode, file); } static int raw1394_release(struct inode *inode, struct file *file) Index: b/drivers/ieee1394/video1394.c =================================================================== --- a/drivers/ieee1394/video1394.c +++ b/drivers/ieee1394/video1394.c @@ -1239,7 +1239,7 @@ static int video1394_open(struct inode * ctx->current_ctx = NULL; file->private_data = ctx; - return 0; + return nonseekable_open(inode, file); } static int video1394_release(struct inode *inode, struct file *file) -- Stefan Richter -=====-==-=- --== ==-== http://arcgraph.de/sr/ ^ permalink raw reply [flat|nested] 49+ messages in thread
* [PATCH RFC] DVB: add dvb_generic_nonseekable_open, dvb_generic_unlocked_ioctl, use in firedtv 2010-03-26 23:47 ` Stefan Richter 2010-03-27 9:16 ` [PATCH] firewire: char device files are not seekable (BKL removal) Stefan Richter 2010-03-27 9:20 ` [PATCH] ieee1394: " Stefan Richter @ 2010-03-27 10:40 ` Stefan Richter 2010-03-28 14:47 ` [PATCH RFC v2] " Stefan Richter 2010-03-27 14:37 ` [GIT, RFC] Killing the Big Kernel Lock Arnd Bergmann 3 siblings, 1 reply; 49+ messages in thread From: Stefan Richter @ 2010-03-27 10:40 UTC (permalink / raw) To: linux-media Cc: linux-kernel, Mauro Carvalho Chehab, Arnd Bergmann, Henrik Kurelid In order to remove Big Kernel Lock usages from the DVB subsystem, we need to - provide .llseek file operations that do not grab the BKL (unlike fs/read_write.c::default_llseek) or mark files as not seekable, - provide .unlocked_ioctl file operations. Add two dvb_generic_ file operations for file interfaces which are not seekable and, respectively, do not require the BKL in their ioctl handlers. Use them in one driver of which I am sure of that these are applicable. (Affected code paths were not runtime-tested since I don't have a CAM.) Notes: - In order to maximize code reuse and minimize API churn, I pass a dummy NULL struct inode * through dvb_usercopy() to dvbdev->kernel_ioctl(). Very ugly; it may be better to convert everything from .ioctl to .unlocked_ioctl in one go and remove the inode pointer argument everywhere. - I suspect that actually all dvb_generic_open() users really want nonseekable_open --- then we should simply change dvb_generic_open() instead of adding dvb_generic_nonseekable_open() --- but I haven't checked other users of dvb_generic_open whether they require .llssek mehods other than fs/read_write.c::no_llseek. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> --- This patch is motivated by Arnd's "bkl removal: make unlocked_ioctl mandatory" http://git.kernel.org/?p=linux/kernel/git/arnd/playground.git;a=blobdiff;f=drivers/media/dvb/firewire/firedtv-ci.c;h=7ab89035c101240a860d6c72bf8541a0fdf3aed9;hp=853e04b7cb361d45257f80dcafdcf121cd340c63;hb=05e7753338045e9ee3950b2da032c5e5774efa90;hpb=03165e1d096afb4b1d9cfccdad66eed038121cec "BKL removal: mark remaining users as 'depends on BKL'" http://git.kernel.org/?p=linux/kernel/git/arnd/playground.git;a=blobdiff;f=drivers/media/dvb/firewire/Kconfig;h=ba8938e26414c8c82c41677f87c1361ed8fcebd0;hp=4afa29256df110d3383b6b469762cefbb46436b6;hb=abb83d8fe5f8dcc8fca09bd9117429f73e1417e0;hpb=33c014b118f45516113d4b6823e40ea6f834dc6a drivers/media/dvb/dvb-core/dvbdev.c | 20 ++++++++++++++++++++ drivers/media/dvb/dvb-core/dvbdev.h | 11 +++++++---- drivers/media/dvb/firewire/firedtv-ci.c | 4 ++-- 3 files changed, 29 insertions(+), 6 deletions(-) Index: b/drivers/media/dvb/dvb-core/dvbdev.c =================================================================== --- a/drivers/media/dvb/dvb-core/dvbdev.c +++ b/drivers/media/dvb/dvb-core/dvbdev.c @@ -135,6 +135,18 @@ int dvb_generic_open(struct inode *inode EXPORT_SYMBOL(dvb_generic_open); +int dvb_generic_nonseekable_open(struct inode *inode, struct file *file) +{ + int retval = dvb_generic_open(inode, file); + + if (retval == 0) + retval = nonseekable_open(inode, file); + + return retval; +} +EXPORT_SYMBOL(dvb_generic_nonseekable_open); + + int dvb_generic_release(struct inode *inode, struct file *file) { struct dvb_device *dvbdev = file->private_data; @@ -170,6 +182,14 @@ int dvb_generic_ioctl(struct inode *inod EXPORT_SYMBOL(dvb_generic_ioctl); +long dvb_generic_unlocked_ioctl(struct file *file, + unsigned int cmd, unsigned long arg) +{ + return dvb_generic_ioctl(NULL, file, cmd, arg); +} +EXPORT_SYMBOL(dvb_generic_unlocked_ioctl); + + static int dvbdev_get_free_id (struct dvb_adapter *adap, int type) { u32 id = 0; Index: b/drivers/media/dvb/dvb-core/dvbdev.h =================================================================== --- a/drivers/media/dvb/dvb-core/dvbdev.h +++ b/drivers/media/dvb/dvb-core/dvbdev.h @@ -136,10 +136,13 @@ extern int dvb_register_device (struct d extern void dvb_unregister_device (struct dvb_device *dvbdev); -extern int dvb_generic_open (struct inode *inode, struct file *file); -extern int dvb_generic_release (struct inode *inode, struct file *file); -extern int dvb_generic_ioctl (struct inode *inode, struct file *file, - unsigned int cmd, unsigned long arg); +extern int dvb_generic_open(struct inode *inode, struct file *file); +extern int dvb_generic_nonseekable_open(struct inode *inode, struct file *file); +extern int dvb_generic_release(struct inode *inode, struct file *file); +extern int dvb_generic_ioctl(struct inode *inode, struct file *file, + unsigned int cmd, unsigned long arg); +extern long dvb_generic_unlocked_ioctl(struct file *file, + unsigned int cmd, unsigned long arg); /* we don't mess with video_usercopy() any more, we simply define out own dvb_usercopy(), which will hopefully become Index: b/drivers/media/dvb/firewire/firedtv-ci.c =================================================================== --- a/drivers/media/dvb/firewire/firedtv-ci.c +++ b/drivers/media/dvb/firewire/firedtv-ci.c @@ -217,8 +217,8 @@ static unsigned int fdtv_ca_io_poll(stru static const struct file_operations fdtv_ca_fops = { .owner = THIS_MODULE, - .ioctl = dvb_generic_ioctl, - .open = dvb_generic_open, + .unlocked_ioctl = dvb_generic_unlocked_ioctl, + .open = dvb_generic_nonseekable_open, .release = dvb_generic_release, .poll = fdtv_ca_io_poll, }; -- Stefan Richter -=====-==-=- --== ==-== http://arcgraph.de/sr/ ^ permalink raw reply [flat|nested] 49+ messages in thread
* [PATCH RFC v2] DVB: add dvb_generic_nonseekable_open, dvb_generic_unlocked_ioctl, use in firedtv 2010-03-27 10:40 ` [PATCH RFC] DVB: add dvb_generic_nonseekable_open, dvb_generic_unlocked_ioctl, use in firedtv Stefan Richter @ 2010-03-28 14:47 ` Stefan Richter 0 siblings, 0 replies; 49+ messages in thread From: Stefan Richter @ 2010-03-28 14:47 UTC (permalink / raw) To: linux-media Cc: linux-kernel, Mauro Carvalho Chehab, Arnd Bergmann, Henrik Kurelid In order to remove Big Kernel Lock usages from the DVB subsystem, we need to - provide .llseek file operations that do not grab the BKL (unlike fs/read_write.c::default_llseek) or mark files as not seekable, - provide .unlocked_ioctl file operations. Add two dvb_generic_ file operations for file interfaces which are not seekable and, respectively, do not require the BKL in their ioctl handlers. Use them in one driver of which I am sure of that these are applicable. (Affected code paths in firedtv-ci were not runtime-tested since I don't have a CAM, but the frontend ioctls were of course runtime-tested.) Notes: - The dvb-core internal dvb_usercopy() API is changed to match .unlocked_ioctl() prototypes. - I suspect that all dvb_generic_open() users really want nonseekable_open --- then we should simply change dvb_generic_open() instead of adding dvb_generic_nonseekable_open() --- but I haven't checked other users of dvb_generic_open whether they require .llssek mehods other than fs/read_write.c::no_llseek. Applies to: drivers/media/dvb/ttpci/av7110.c drivers/media/dvb/ttpci/av7110_av.c drivers/media/dvb/ttpci/av7110_ca.c drivers/media/dvb/dvb-core/dvb_net.c drivers/media/dvb/dvb-core/dvb_frontend.c drivers/media/dvb/dvb-core/dvb_ca_en50221.c - To be done by those who know the code: Check all users of struct dvb_device.kernel_ioctl whether they really need the BKL. Convert to .unlocked_ioctl and remove .kernel_ioctl and the temporarily introduced dvbdev.c::legacy_usercopy(). Applies to: drivers/media/dvb/ttpci/av7110.c drivers/media/dvb/ttpci/av7110_av.c drivers/media/dvb/ttpci/av7110_ca.c drivers/media/dvb/dvb-core/dvb_frontend.c Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> --- Update: Split dvb_usercopy into one that follows the .unlocked_ioctl prototype form and another one that preserves the old form. drivers/media/dvb/dvb-core/dmxdev.c | 10 - drivers/media/dvb/dvb-core/dvb_ca_en50221.c | 6 drivers/media/dvb/dvb-core/dvb_net.c | 5 drivers/media/dvb/dvb-core/dvbdev.c | 190 +++++++++++++------- drivers/media/dvb/dvb-core/dvbdev.h | 23 +- drivers/media/dvb/firewire/firedtv-ci.c | 9 6 files changed, 148 insertions(+), 95 deletions(-) Index: b/drivers/media/dvb/dvb-core/dmxdev.c =================================================================== --- a/drivers/media/dvb/dvb-core/dmxdev.c +++ b/drivers/media/dvb/dvb-core/dmxdev.c @@ -963,8 +963,7 @@ dvb_demux_read(struct file *file, char _ return ret; } -static int dvb_demux_do_ioctl(struct inode *inode, struct file *file, - unsigned int cmd, void *parg) +static long dvb_demux_do_ioctl(struct file *file, unsigned int cmd, void *parg) { struct dmxdev_filter *dmxdevfilter = file->private_data; struct dmxdev *dmxdev = dmxdevfilter->dev; @@ -1087,7 +1086,7 @@ static int dvb_demux_do_ioctl(struct ino static int dvb_demux_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg) { - return dvb_usercopy(inode, file, cmd, arg, dvb_demux_do_ioctl); + return dvb_usercopy(file, cmd, arg, dvb_demux_do_ioctl); } static unsigned int dvb_demux_poll(struct file *file, poll_table *wait) @@ -1152,8 +1151,7 @@ static struct dvb_device dvbdev_demux = .fops = &dvb_demux_fops }; -static int dvb_dvr_do_ioctl(struct inode *inode, struct file *file, - unsigned int cmd, void *parg) +static long dvb_dvr_do_ioctl(struct file *file, unsigned int cmd, void *parg) { struct dvb_device *dvbdev = file->private_data; struct dmxdev *dmxdev = dvbdev->priv; @@ -1179,7 +1177,7 @@ static int dvb_dvr_do_ioctl(struct inode static int dvb_dvr_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg) { - return dvb_usercopy(inode, file, cmd, arg, dvb_dvr_do_ioctl); + return dvb_usercopy(file, cmd, arg, dvb_dvr_do_ioctl); } static unsigned int dvb_dvr_poll(struct file *file, poll_table *wait) Index: b/drivers/media/dvb/dvb-core/dvb_ca_en50221.c =================================================================== --- a/drivers/media/dvb/dvb-core/dvb_ca_en50221.c +++ b/drivers/media/dvb/dvb-core/dvb_ca_en50221.c @@ -1181,8 +1181,8 @@ static int dvb_ca_en50221_thread(void *d * * @return 0 on success, <0 on error. */ -static int dvb_ca_en50221_io_do_ioctl(struct inode *inode, struct file *file, - unsigned int cmd, void *parg) +static long dvb_ca_en50221_io_do_ioctl(struct file *file, + unsigned int cmd, void *parg) { struct dvb_device *dvbdev = file->private_data; struct dvb_ca_private *ca = dvbdev->priv; @@ -1258,7 +1258,7 @@ static int dvb_ca_en50221_io_do_ioctl(st static int dvb_ca_en50221_io_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg) { - return dvb_usercopy(inode, file, cmd, arg, dvb_ca_en50221_io_do_ioctl); + return dvb_usercopy(file, cmd, arg, dvb_ca_en50221_io_do_ioctl); } Index: b/drivers/media/dvb/dvb-core/dvb_net.c =================================================================== --- a/drivers/media/dvb/dvb-core/dvb_net.c +++ b/drivers/media/dvb/dvb-core/dvb_net.c @@ -1333,8 +1333,7 @@ static int dvb_net_remove_if(struct dvb_ return 0; } -static int dvb_net_do_ioctl(struct inode *inode, struct file *file, - unsigned int cmd, void *parg) +static long dvb_net_do_ioctl(struct file *file, unsigned int cmd, void *parg) { struct dvb_device *dvbdev = file->private_data; struct dvb_net *dvbnet = dvbdev->priv; @@ -1438,7 +1437,7 @@ static int dvb_net_do_ioctl(struct inode static int dvb_net_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg) { - return dvb_usercopy(inode, file, cmd, arg, dvb_net_do_ioctl); + return dvb_usercopy(file, cmd, arg, dvb_net_do_ioctl); } static int dvb_net_close(struct inode *inode, struct file *file) Index: b/drivers/media/dvb/dvb-core/dvbdev.c =================================================================== --- a/drivers/media/dvb/dvb-core/dvbdev.c +++ b/drivers/media/dvb/dvb-core/dvbdev.c @@ -135,6 +135,18 @@ int dvb_generic_open(struct inode *inode EXPORT_SYMBOL(dvb_generic_open); +int dvb_generic_nonseekable_open(struct inode *inode, struct file *file) +{ + int retval = dvb_generic_open(inode, file); + + if (retval == 0) + retval = nonseekable_open(inode, file); + + return retval; +} +EXPORT_SYMBOL(dvb_generic_nonseekable_open); + + int dvb_generic_release(struct inode *inode, struct file *file) { struct dvb_device *dvbdev = file->private_data; @@ -154,6 +166,102 @@ int dvb_generic_release(struct inode *in EXPORT_SYMBOL(dvb_generic_release); +#define STATIC_BUFFER_SIZE 128 + +/* If necessary, swap out static *buf by a slab-allocated buffer */ +static int get_arg(unsigned int cmd, unsigned long arg, void **parg, void **buf) +{ + switch (_IOC_DIR(cmd)) { + case _IOC_NONE: + /* + * For this command, the pointer is actually an integer + * argument. + */ + *parg = (void *)arg; + break; + case _IOC_READ: /* some v4l ioctls are marked wrong ... */ + case _IOC_WRITE: + case (_IOC_WRITE | _IOC_READ): + if (_IOC_SIZE(cmd) > STATIC_BUFFER_SIZE) { + *buf = kmalloc(_IOC_SIZE(cmd), GFP_KERNEL); + if (!*buf) + return -ENOMEM; + } + *parg = *buf; + + if (copy_from_user(*parg, (void __user *)arg, _IOC_SIZE(cmd))) + return -EFAULT; + } + return 0; +} + +static int put_arg(unsigned int cmd, unsigned long arg, void *parg) +{ + switch (_IOC_DIR(cmd)) { + case _IOC_READ: + case (_IOC_WRITE | _IOC_READ): + if (copy_to_user((void __user *)arg, parg, _IOC_SIZE(cmd))) + return -EFAULT; + } + return 0; +} + +/* + * Wrapper around ioctl handlers; copies arguments and results from/ to user. + * Could be changed into a "generic_usercopy()" for all kernel subsystems. + */ +long dvb_usercopy(struct file *file, unsigned int cmd, unsigned long arg, + long (*func)(struct file *file, unsigned int cmd, void *arg)) +{ + char sbuf[STATIC_BUFFER_SIZE]; + void *parg, *buf = sbuf; + long retval; + + retval = get_arg(cmd, arg, &parg, &buf); + if (retval < 0) + goto out; + + /* call driver */ + retval = func(file, cmd, parg); + if (retval == -ENOIOCTLCMD) + retval = -EINVAL; + if (retval < 0) + goto out; + + retval = put_arg(cmd, arg, parg); +out: + if (buf != sbuf) + kfree(buf); + return retval; +} + +static int legacy_usercopy(struct inode *inode, struct file *file, + unsigned int cmd, unsigned long arg, + int (*func)(struct inode *inode, struct file *file, + unsigned int cmd, void *arg)) +{ + char sbuf[STATIC_BUFFER_SIZE]; + void *parg, *buf = sbuf; + int retval; + + retval = get_arg(cmd, arg, &parg, &buf); + if (retval < 0) + goto out; + + /* call driver */ + retval = func(inode, file, cmd, parg); + if (retval == -ENOIOCTLCMD) + retval = -EINVAL; + if (retval < 0) + goto out; + + retval = put_arg(cmd, arg, parg); +out: + if (buf != sbuf) + kfree(buf); + return retval; +} + int dvb_generic_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg) { @@ -165,11 +273,27 @@ int dvb_generic_ioctl(struct inode *inod if (!dvbdev->kernel_ioctl) return -EINVAL; - return dvb_usercopy (inode, file, cmd, arg, dvbdev->kernel_ioctl); + return legacy_usercopy(inode, file, cmd, arg, dvbdev->kernel_ioctl); } EXPORT_SYMBOL(dvb_generic_ioctl); +long dvb_generic_unlocked_ioctl(struct file *file, + unsigned int cmd, unsigned long arg) +{ + struct dvb_device *dvbdev = file->private_data; + + if (!dvbdev) + return -ENODEV; + + if (!dvbdev->unlocked_ioctl) + return -EINVAL; + + return dvb_usercopy(file, cmd, arg, dvbdev->unlocked_ioctl); +} +EXPORT_SYMBOL(dvb_generic_unlocked_ioctl); + + static int dvbdev_get_free_id (struct dvb_adapter *adap, int type) { u32 id = 0; @@ -372,70 +496,6 @@ int dvb_unregister_adapter(struct dvb_ad } EXPORT_SYMBOL(dvb_unregister_adapter); -/* if the miracle happens and "generic_usercopy()" is included into - the kernel, then this can vanish. please don't make the mistake and - define this as video_usercopy(). this will introduce a dependecy - to the v4l "videodev.o" module, which is unnecessary for some - cards (ie. the budget dvb-cards don't need the v4l module...) */ -int dvb_usercopy(struct inode *inode, struct file *file, - unsigned int cmd, unsigned long arg, - int (*func)(struct inode *inode, struct file *file, - unsigned int cmd, void *arg)) -{ - char sbuf[128]; - void *mbuf = NULL; - void *parg = NULL; - int err = -EINVAL; - - /* Copy arguments into temp kernel buffer */ - switch (_IOC_DIR(cmd)) { - case _IOC_NONE: - /* - * For this command, the pointer is actually an integer - * argument. - */ - parg = (void *) arg; - break; - case _IOC_READ: /* some v4l ioctls are marked wrong ... */ - case _IOC_WRITE: - case (_IOC_WRITE | _IOC_READ): - if (_IOC_SIZE(cmd) <= sizeof(sbuf)) { - parg = sbuf; - } else { - /* too big to allocate from stack */ - mbuf = kmalloc(_IOC_SIZE(cmd),GFP_KERNEL); - if (NULL == mbuf) - return -ENOMEM; - parg = mbuf; - } - - err = -EFAULT; - if (copy_from_user(parg, (void __user *)arg, _IOC_SIZE(cmd))) - goto out; - break; - } - - /* call driver */ - if ((err = func(inode, file, cmd, parg)) == -ENOIOCTLCMD) - err = -EINVAL; - - if (err < 0) - goto out; - - /* Copy results into user buffer */ - switch (_IOC_DIR(cmd)) - { - case _IOC_READ: - case (_IOC_WRITE | _IOC_READ): - if (copy_to_user((void __user *)arg, parg, _IOC_SIZE(cmd))) - err = -EFAULT; - break; - } - -out: - kfree(mbuf); - return err; -} static int dvb_uevent(struct device *dev, struct kobj_uevent_env *env) { Index: b/drivers/media/dvb/dvb-core/dvbdev.h =================================================================== --- a/drivers/media/dvb/dvb-core/dvbdev.h +++ b/drivers/media/dvb/dvb-core/dvbdev.h @@ -118,6 +118,7 @@ struct dvb_device { /* don't really need those !? -- FIXME: use video_usercopy */ int (*kernel_ioctl)(struct inode *inode, struct file *file, unsigned int cmd, void *arg); + long (*unlocked_ioctl)(struct file *file, unsigned int cmd, void *arg); void *priv; }; @@ -136,19 +137,15 @@ extern int dvb_register_device (struct d extern void dvb_unregister_device (struct dvb_device *dvbdev); -extern int dvb_generic_open (struct inode *inode, struct file *file); -extern int dvb_generic_release (struct inode *inode, struct file *file); -extern int dvb_generic_ioctl (struct inode *inode, struct file *file, - unsigned int cmd, unsigned long arg); - -/* we don't mess with video_usercopy() any more, -we simply define out own dvb_usercopy(), which will hopefully become -generic_usercopy() someday... */ - -extern int dvb_usercopy(struct inode *inode, struct file *file, - unsigned int cmd, unsigned long arg, - int (*func)(struct inode *inode, struct file *file, - unsigned int cmd, void *arg)); +extern int dvb_generic_open(struct inode *inode, struct file *file); +extern int dvb_generic_nonseekable_open(struct inode *inode, struct file *file); +extern int dvb_generic_release(struct inode *inode, struct file *file); +extern int dvb_generic_ioctl(struct inode *inode, struct file *file, + unsigned int cmd, unsigned long arg); +extern long dvb_generic_unlocked_ioctl(struct file *file, + unsigned int cmd, unsigned long arg); +extern long dvb_usercopy(struct file *file, unsigned int cmd, unsigned long arg, + long (*func)(struct file *file, unsigned int cmd, void *arg)); /** generic DVB attach function. */ #ifdef CONFIG_MEDIA_ATTACH Index: b/drivers/media/dvb/firewire/firedtv-ci.c =================================================================== --- a/drivers/media/dvb/firewire/firedtv-ci.c +++ b/drivers/media/dvb/firewire/firedtv-ci.c @@ -175,8 +175,7 @@ static int fdtv_ca_send_msg(struct fired return err; } -static int fdtv_ca_ioctl(struct inode *inode, struct file *file, - unsigned int cmd, void *arg) +static long fdtv_ca_ioctl(struct file *file, unsigned int cmd, void *arg) { struct dvb_device *dvbdev = file->private_data; struct firedtv *fdtv = dvbdev->priv; @@ -217,8 +216,8 @@ static unsigned int fdtv_ca_io_poll(stru static const struct file_operations fdtv_ca_fops = { .owner = THIS_MODULE, - .ioctl = dvb_generic_ioctl, - .open = dvb_generic_open, + .unlocked_ioctl = dvb_generic_unlocked_ioctl, + .open = dvb_generic_nonseekable_open, .release = dvb_generic_release, .poll = fdtv_ca_io_poll, }; @@ -228,7 +227,7 @@ static struct dvb_device fdtv_ca = { .readers = 1, .writers = 1, .fops = &fdtv_ca_fops, - .kernel_ioctl = fdtv_ca_ioctl, + .unlocked_ioctl = fdtv_ca_ioctl, }; int fdtv_ca_register(struct firedtv *fdtv) -- Stefan Richter -=====-==-=- --== ===-- http://arcgraph.de/sr/ ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock 2010-03-26 23:47 ` Stefan Richter ` (2 preceding siblings ...) 2010-03-27 10:40 ` [PATCH RFC] DVB: add dvb_generic_nonseekable_open, dvb_generic_unlocked_ioctl, use in firedtv Stefan Richter @ 2010-03-27 14:37 ` Arnd Bergmann 2010-03-28 12:27 ` Stefan Richter 2010-04-10 15:13 ` Stefan Richter 3 siblings, 2 replies; 49+ messages in thread From: Arnd Bergmann @ 2010-03-27 14:37 UTC (permalink / raw) To: Stefan Richter Cc: Jiri Kosina, Frederic Weisbecker, linux-kernel, Matthew Wilcox, Thomas Gleixner, jblunck, Alan Cox, Ingo Molnar, John Kacur On Saturday 27 March 2010 00:47:33 Stefan Richter wrote: > firewire-core and raw1394 do not actually require the BKL, they only > miss to declare their files as not seekable. I will post patches which > change these accordingly. > > My guess is that there is nothing to seek in dv1394, video1394, and > firedtv either. > > Needless to say, there may be other character device file interfaces > which cannot be seeked but don't admit it yet. > Your patches look good, but it would be helpful to also set .llseek = no_llseek in the file operations, because that is much easier to grep for than only the nonseekable_open. While it's technically a NOP on the presence of nonseekable_open, it will help that I don't accidentally apply my patch on top of yours. Arnd ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock 2010-03-27 14:37 ` [GIT, RFC] Killing the Big Kernel Lock Arnd Bergmann @ 2010-03-28 12:27 ` Stefan Richter 2010-03-28 20:05 ` Arnd Bergmann 2010-04-10 15:13 ` Stefan Richter 1 sibling, 1 reply; 49+ messages in thread From: Stefan Richter @ 2010-03-28 12:27 UTC (permalink / raw) To: Arnd Bergmann Cc: Jiri Kosina, Frederic Weisbecker, linux-kernel, Matthew Wilcox, Thomas Gleixner, jblunck, Alan Cox, Ingo Molnar, John Kacur Arnd Bergmann wrote: > Your patches look good, but it would be helpful to also set .llseek = no_llseek > in the file operations, because that is much easier to grep for than > only the nonseekable_open. While it's technically a NOP on the presence of > nonseekable_open, it will help that I don't accidentally apply my patch on > top of yours. Sounds like a plan, but (a) if my .llseek = no_llseek and your .llseek = default_llseek are not within diff context range, you (or whoever else merges mine and yours) only get a compiler warning (Initializer entry defined twice) rather than a merge conflict which couldn't be missed, (b) there won't be a merge conflict in "BKL removal: mark remaining users as 'depends on BKL'". (c) While I don't mind adding more visual clutter to ieee1394/*, I prefer terse coding in firewire/*. How about I put my nonseekable_open additions into a release branch and send you a pull request after a few days exposure in linux-next? If you do not plan to respin your patch queue soon or at all, I could even let you pull a for-arnd branch with a semantically correct merge of yours and mine. General thoughts: ".llseek = NULL," so far meant "do the Right Thing on lseek() and friends, as far as the fs core can tell". Shouldn't we keep it that way? It's as close to other ".method = NULL," as it can get, which either mean "silently skip this method if it doesn't matter" (e.g. .flush) or "fail attempts to use this method with a fitting errno" (e.g. .write). Of course, as we have already seen with infiniband, firewire, ieee1394, .llseek = NULL is ambiguous in practice. Does the driver really want to use default_llseek, or should it rather use no_llseek and/or nonseekable_open, or should it even implement a dummy_llseek() { return 0; } which avoids the BKL but preserves ABI behaviour? This needs to be resolved for each and every case eventually, regardless of whether or when your addition of .llseek = default_llseek enters mainline. -- Stefan Richter -=====-==-=- --== ===-- http://arcgraph.de/sr/ ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock 2010-03-28 12:27 ` Stefan Richter @ 2010-03-28 20:05 ` Arnd Bergmann 2010-03-28 20:15 ` Frederic Weisbecker 2010-04-08 20:45 ` Jan Blunck 0 siblings, 2 replies; 49+ messages in thread From: Arnd Bergmann @ 2010-03-28 20:05 UTC (permalink / raw) To: Stefan Richter Cc: Jiri Kosina, Frederic Weisbecker, linux-kernel, Matthew Wilcox, Thomas Gleixner, jblunck, Alan Cox, Ingo Molnar, John Kacur On Sunday 28 March 2010, Stefan Richter wrote: > Arnd Bergmann wrote: > > Your patches look good, but it would be helpful to also set .llseek = no_llseek > > in the file operations, because that is much easier to grep for than > > only the nonseekable_open. While it's technically a NOP on the presence of > > nonseekable_open, it will help that I don't accidentally apply my patch on > > top of yours. > > Sounds like a plan, but (a) if my .llseek = no_llseek and your .llseek = > default_llseek are not within diff context range, you (or whoever else > merges mine and yours) only get a compiler warning (Initializer entry > defined twice) rather than a merge conflict which couldn't be missed, > (b) there won't be a merge conflict in "BKL removal: mark remaining > users as 'depends on BKL'". (c) While I don't mind adding more visual > clutter to ieee1394/*, I prefer terse coding in firewire/*. > > How about I put my nonseekable_open additions into a release branch and > send you a pull request after a few days exposure in linux-next? If you > do not plan to respin your patch queue soon or at all, I could even let > you pull a for-arnd branch with a semantically correct merge of yours > and mine. I can probably remember this specific one now, but for other people doing the same on their subsystems, adding no_llseek may help reduce the need for coordination. > General thoughts: > > ".llseek = NULL," so far meant "do the Right Thing on lseek() and > friends, as far as the fs core can tell". Shouldn't we keep it that > way? It's as close to other ".method = NULL," as it can get, which > either mean "silently skip this method if it doesn't matter" (e.g. > .flush) or "fail attempts to use this method with a fitting errno" (e.g. > .write). My series changes the default from 'default_llseek' to 'generic_file_llseek', which is almost identical, except for taking the inode mutex instead of the BKL. Another option that has been discussed before is to make no_llseek the default, but that might cause more serious problems wiht drivers that really require seeking. Since using default_llseek can only ever make a difference if the driver actually uses the BKL in any other function, I could go through the patches again and revert those that do no use the BKL anywhere else. > Of course, as we have already seen with infiniband, firewire, ieee1394, > .llseek = NULL is ambiguous in practice. Does the driver really want to > use default_llseek, or should it rather use no_llseek and/or > nonseekable_open, or should it even implement a dummy_llseek() { return > 0; } which avoids the BKL but preserves ABI behaviour? This needs to be > resolved for each and every case eventually, regardless of whether or > when your addition of .llseek = default_llseek enters mainline. Yes, that also sounds like a good idea. I believe that Jan actually posted a patch to do that at some point. Arnd ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock 2010-03-28 20:05 ` Arnd Bergmann @ 2010-03-28 20:15 ` Frederic Weisbecker 2010-03-28 21:34 ` Arnd Bergmann 2010-04-08 20:45 ` Jan Blunck 1 sibling, 1 reply; 49+ messages in thread From: Frederic Weisbecker @ 2010-03-28 20:15 UTC (permalink / raw) To: Arnd Bergmann Cc: Stefan Richter, Jiri Kosina, linux-kernel, Matthew Wilcox, Thomas Gleixner, jblunck, Alan Cox, Ingo Molnar, John Kacur On Sun, Mar 28, 2010 at 09:05:50PM +0100, Arnd Bergmann wrote: > > General thoughts: > > > > ".llseek = NULL," so far meant "do the Right Thing on lseek() and > > friends, as far as the fs core can tell". Shouldn't we keep it that > > way? It's as close to other ".method = NULL," as it can get, which > > either mean "silently skip this method if it doesn't matter" (e.g. > > .flush) or "fail attempts to use this method with a fitting errno" (e.g. > > .write). > > My series changes the default from 'default_llseek' to 'generic_file_llseek', > which is almost identical, except for taking the inode mutex instead of the > BKL. What if another file operation changes the file pointer while holding the bkl? You're not protected anymore in this case. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock 2010-03-28 20:15 ` Frederic Weisbecker @ 2010-03-28 21:34 ` Arnd Bergmann 2010-03-28 23:24 ` Frederic Weisbecker 0 siblings, 1 reply; 49+ messages in thread From: Arnd Bergmann @ 2010-03-28 21:34 UTC (permalink / raw) To: Frederic Weisbecker Cc: Stefan Richter, Jiri Kosina, linux-kernel, Matthew Wilcox, Thomas Gleixner, jblunck, Alan Cox, Ingo Molnar, John Kacur On Sunday 28 March 2010, Frederic Weisbecker wrote: > On Sun, Mar 28, 2010 at 09:05:50PM +0100, Arnd Bergmann wrote: > > > General thoughts: > > > > > > ".llseek = NULL," so far meant "do the Right Thing on lseek() and > > > friends, as far as the fs core can tell". Shouldn't we keep it that > > > way? It's as close to other ".method = NULL," as it can get, which > > > either mean "silently skip this method if it doesn't matter" (e.g. > > > .flush) or "fail attempts to use this method with a fitting errno" (e.g. > > > .write). > > > > My series changes the default from 'default_llseek' to 'generic_file_llseek', > > which is almost identical, except for taking the inode mutex instead of the > > BKL. > > > What if another file operation changes the file pointer while holding the bkl? > You're not protected anymore in this case. > Exactly, that's why I changed all the drivers to set default_llseek explicitly. Even this is very likely not needed in more than a handful of drivers (if any), for a number of reasons: - sys_read/sys_write *never* hold any locks while calling file_pos_write(), which is the only place they get updated for regular files. - concurrent llseek plus other file operations on the same file descriptor usually already have an undefined outcome. - when I started inspecting drivers that look at file->f_pos themselves (not the read/write operation arguments), I found that practically all of them are doing this in a totally broken way! - The only think we'd probably ever want to lock against in llseek is readdir, which is not used in any drivers, but only in file systems. Arnd ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock 2010-03-28 21:34 ` Arnd Bergmann @ 2010-03-28 23:24 ` Frederic Weisbecker 0 siblings, 0 replies; 49+ messages in thread From: Frederic Weisbecker @ 2010-03-28 23:24 UTC (permalink / raw) To: Arnd Bergmann Cc: Stefan Richter, Jiri Kosina, linux-kernel, Matthew Wilcox, Thomas Gleixner, jblunck, Alan Cox, Ingo Molnar, John Kacur On Sun, Mar 28, 2010 at 10:34:54PM +0100, Arnd Bergmann wrote: > On Sunday 28 March 2010, Frederic Weisbecker wrote: > > On Sun, Mar 28, 2010 at 09:05:50PM +0100, Arnd Bergmann wrote: > > > > General thoughts: > > > > > > > > ".llseek = NULL," so far meant "do the Right Thing on lseek() and > > > > friends, as far as the fs core can tell". Shouldn't we keep it that > > > > way? It's as close to other ".method = NULL," as it can get, which > > > > either mean "silently skip this method if it doesn't matter" (e.g. > > > > .flush) or "fail attempts to use this method with a fitting errno" (e.g. > > > > .write). > > > > > > My series changes the default from 'default_llseek' to 'generic_file_llseek', > > > which is almost identical, except for taking the inode mutex instead of the > > > BKL. > > > > > > What if another file operation changes the file pointer while holding the bkl? > > You're not protected anymore in this case. > > > > Exactly, that's why I changed all the drivers to set default_llseek explicitly. Ah ok. > Even this is very likely not needed in more than a handful of drivers (if any), > for a number of reasons: > > - sys_read/sys_write *never* hold any locks while calling file_pos_write(), > which is the only place they get updated for regular files. Yeah sure. But the pushdown (or step by step replacement with generic_file_llseek) is still necessary to ensure every places are fine. > - concurrent llseek plus other file operations on the same file descriptor > usually already have an undefined outcome. Yeah. > - when I started inspecting drivers that look at file->f_pos themselves (not > the read/write operation arguments), I found that practically all of them > are doing this in a totally broken way! Hehe :) > - The only think we'd probably ever want to lock against in llseek > is readdir, which is not used in any drivers, but only in file systems. Right. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock 2010-03-28 20:05 ` Arnd Bergmann 2010-03-28 20:15 ` Frederic Weisbecker @ 2010-04-08 20:45 ` Jan Blunck 2010-04-08 21:27 ` Arnd Bergmann 1 sibling, 1 reply; 49+ messages in thread From: Jan Blunck @ 2010-04-08 20:45 UTC (permalink / raw) To: Arnd Bergmann Cc: Stefan Richter, Jiri Kosina, Frederic Weisbecker, linux-kernel, Matthew Wilcox, Thomas Gleixner, Alan Cox, Ingo Molnar, John Kacur On Sun, Mar 28, Arnd Bergmann wrote: > On Sunday 28 March 2010, Stefan Richter wrote: > > Arnd Bergmann wrote: > > > Your patches look good, but it would be helpful to also set .llseek = no_llseek > > > in the file operations, because that is much easier to grep for than > > > only the nonseekable_open. While it's technically a NOP on the presence of > > > nonseekable_open, it will help that I don't accidentally apply my patch on > > > top of yours. > > > > Sounds like a plan, but (a) if my .llseek = no_llseek and your .llseek = > > default_llseek are not within diff context range, you (or whoever else > > merges mine and yours) only get a compiler warning (Initializer entry > > defined twice) rather than a merge conflict which couldn't be missed, > > (b) there won't be a merge conflict in "BKL removal: mark remaining > > users as 'depends on BKL'". (c) While I don't mind adding more visual > > clutter to ieee1394/*, I prefer terse coding in firewire/*. > > > > How about I put my nonseekable_open additions into a release branch and > > send you a pull request after a few days exposure in linux-next? If you > > do not plan to respin your patch queue soon or at all, I could even let > > you pull a for-arnd branch with a semantically correct merge of yours > > and mine. > > I can probably remember this specific one now, but for other people > doing the same on their subsystems, adding no_llseek may help reduce > the need for coordination. > > > General thoughts: > > > > ".llseek = NULL," so far meant "do the Right Thing on lseek() and > > friends, as far as the fs core can tell". Shouldn't we keep it that > > way? It's as close to other ".method = NULL," as it can get, which > > either mean "silently skip this method if it doesn't matter" (e.g. > > .flush) or "fail attempts to use this method with a fitting errno" (e.g. > > .write). > > My series changes the default from 'default_llseek' to 'generic_file_llseek', That is not that easy. generic_file_llseek() is testing against 'offset < inode->i_sb->s_maxbytes'. This is not necessarily true when you think about directories with random offset cookies. I know that seeking on directories is stupid but don't blame me. > which is almost identical, except for taking the inode mutex instead of the > BKL. Another option that has been discussed before is to make no_llseek > the default, but that might cause more serious problems wiht drivers that > really require seeking. > > Since using default_llseek can only ever make a difference if the driver > actually uses the BKL in any other function, I could go through the > patches again and revert those that do no use the BKL anywhere else. > > > Of course, as we have already seen with infiniband, firewire, ieee1394, > > .llseek = NULL is ambiguous in practice. Does the driver really want to > > use default_llseek, or should it rather use no_llseek and/or > > nonseekable_open, or should it even implement a dummy_llseek() { return > > 0; } which avoids the BKL but preserves ABI behaviour? This needs to be > > resolved for each and every case eventually, regardless of whether or > > when your addition of .llseek = default_llseek enters mainline. > > Yes, that also sounds like a good idea. I believe that Jan actually posted > a patch to do that at some point. Yes, it is in http://git.infradead.org/users/jblunck/linux-2.6.git bkl/default-lseek There are some other patches in that branch that are not upstream yet. Mind to take them for your bkl-removal branch? Cheers, Jan ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock 2010-04-08 20:45 ` Jan Blunck @ 2010-04-08 21:27 ` Arnd Bergmann 2010-04-08 21:30 ` Frederic Weisbecker 2010-04-09 11:02 ` Jan Blunck 0 siblings, 2 replies; 49+ messages in thread From: Arnd Bergmann @ 2010-04-08 21:27 UTC (permalink / raw) To: Jan Blunck Cc: Stefan Richter, Jiri Kosina, Frederic Weisbecker, linux-kernel, Matthew Wilcox, Thomas Gleixner, Alan Cox, Ingo Molnar, John Kacur On Thursday 08 April 2010 22:45:45 Jan Blunck wrote: > On Sun, Mar 28, Arnd Bergmann wrote: > > > General thoughts: > > > > > > ".llseek = NULL," so far meant "do the Right Thing on lseek() and > > > friends, as far as the fs core can tell". Shouldn't we keep it that > > > way? It's as close to other ".method = NULL," as it can get, which > > > either mean "silently skip this method if it doesn't matter" (e.g. > > > .flush) or "fail attempts to use this method with a fitting errno" (e.g. > > > .write). > > > > My series changes the default from 'default_llseek' to 'generic_file_llseek', > > That is not that easy. generic_file_llseek() is testing against 'offset < > inode->i_sb->s_maxbytes'. This is not necessarily true when you think about > directories with random offset cookies. I know that seeking on directories is > stupid but don't blame me. Oh, I see. Would it work if we extend generic_file_llseek to only check s_maxbytes if S_ISREG(inode->i_mode)))? > > Yes, that also sounds like a good idea. I believe that Jan actually posted > > a patch to do that at some point. > > Yes, it is in > > http://git.infradead.org/users/jblunck/linux-2.6.git bkl/default-lseek > > There are some other patches in that branch that are not upstream yet. Mind to > take them for your bkl-removal branch? Frederic is now collecting the new patches. Your default-lseek series looks good to me, except for the obvious one that says 'FIXME' in the subject. Maybe Frederic can add your series except for that one as another branch to get pulled into his kill-the-bkl master branch. Arnd ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock 2010-04-08 21:27 ` Arnd Bergmann @ 2010-04-08 21:30 ` Frederic Weisbecker 2010-04-09 11:02 ` Jan Blunck 1 sibling, 0 replies; 49+ messages in thread From: Frederic Weisbecker @ 2010-04-08 21:30 UTC (permalink / raw) To: Arnd Bergmann Cc: Jan Blunck, Stefan Richter, Jiri Kosina, linux-kernel, Matthew Wilcox, Thomas Gleixner, Alan Cox, Ingo Molnar, John Kacur On Thu, Apr 08, 2010 at 11:27:26PM +0200, Arnd Bergmann wrote: > On Thursday 08 April 2010 22:45:45 Jan Blunck wrote: > > On Sun, Mar 28, Arnd Bergmann wrote: > > > > General thoughts: > > > > > > > > ".llseek = NULL," so far meant "do the Right Thing on lseek() and > > > > friends, as far as the fs core can tell". Shouldn't we keep it that > > > > way? It's as close to other ".method = NULL," as it can get, which > > > > either mean "silently skip this method if it doesn't matter" (e.g. > > > > .flush) or "fail attempts to use this method with a fitting errno" (e.g. > > > > .write). > > > > > > My series changes the default from 'default_llseek' to 'generic_file_llseek', > > > > That is not that easy. generic_file_llseek() is testing against 'offset < > > inode->i_sb->s_maxbytes'. This is not necessarily true when you think about > > directories with random offset cookies. I know that seeking on directories is > > stupid but don't blame me. > > Oh, I see. Would it work if we extend generic_file_llseek to only check s_maxbytes > if S_ISREG(inode->i_mode)))? > > > > Yes, that also sounds like a good idea. I believe that Jan actually posted > > > a patch to do that at some point. > > > > Yes, it is in > > > > http://git.infradead.org/users/jblunck/linux-2.6.git bkl/default-lseek > > > > There are some other patches in that branch that are not upstream yet. Mind to > > take them for your bkl-removal branch? > > Frederic is now collecting the new patches. Your default-lseek series looks > good to me, except for the obvious one that says 'FIXME' in the subject. > > Maybe Frederic can add your series except for that one as another branch to > get pulled into his kill-the-bkl master branch. Ok, will have a look at this soon (I will also put a branch for the procfs series). Thanks. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock 2010-04-08 21:27 ` Arnd Bergmann 2010-04-08 21:30 ` Frederic Weisbecker @ 2010-04-09 11:02 ` Jan Blunck 1 sibling, 0 replies; 49+ messages in thread From: Jan Blunck @ 2010-04-09 11:02 UTC (permalink / raw) To: Arnd Bergmann Cc: Stefan Richter, Jiri Kosina, Frederic Weisbecker, linux-kernel, Matthew Wilcox, Thomas Gleixner, Alan Cox, Ingo Molnar, John Kacur On Thu, Apr 08, Arnd Bergmann wrote: > On Thursday 08 April 2010 22:45:45 Jan Blunck wrote: > > On Sun, Mar 28, Arnd Bergmann wrote: > > > > General thoughts: > > > > > > > > ".llseek = NULL," so far meant "do the Right Thing on lseek() and > > > > friends, as far as the fs core can tell". Shouldn't we keep it that > > > > way? It's as close to other ".method = NULL," as it can get, which > > > > either mean "silently skip this method if it doesn't matter" (e.g. > > > > .flush) or "fail attempts to use this method with a fitting errno" (e.g. > > > > .write). > > > > > > My series changes the default from 'default_llseek' to 'generic_file_llseek', > > > > That is not that easy. generic_file_llseek() is testing against 'offset < > > inode->i_sb->s_maxbytes'. This is not necessarily true when you think about > > directories with random offset cookies. I know that seeking on directories is > > stupid but don't blame me. > > Oh, I see. Would it work if we extend generic_file_llseek to only check s_maxbytes > if S_ISREG(inode->i_mode)))? > Yes and maybe rename generic_file_llseek to generic_llseek. Jan > > > Yes, that also sounds like a good idea. I believe that Jan actually posted > > > a patch to do that at some point. > > > > Yes, it is in > > > > http://git.infradead.org/users/jblunck/linux-2.6.git bkl/default-lseek > > > > There are some other patches in that branch that are not upstream yet. Mind to > > take them for your bkl-removal branch? > > Frederic is now collecting the new patches. Your default-lseek series looks > good to me, except for the obvious one that says 'FIXME' in the subject. > > Maybe Frederic can add your series except for that one as another branch to > get pulled into his kill-the-bkl master branch. > > Arnd ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock 2010-03-27 14:37 ` [GIT, RFC] Killing the Big Kernel Lock Arnd Bergmann 2010-03-28 12:27 ` Stefan Richter @ 2010-04-10 15:13 ` Stefan Richter 1 sibling, 0 replies; 49+ messages in thread From: Stefan Richter @ 2010-04-10 15:13 UTC (permalink / raw) To: Arnd Bergmann; +Cc: linux-kernel, linux1394-devel On 27 Mar, Arnd Bergmann wrote: > On Saturday 27 March 2010 00:47:33 Stefan Richter wrote: >> firewire-core and raw1394 do not actually require the BKL, they only >> miss to declare their files as not seekable. I will post patches which >> change these accordingly. > > Your patches look good, but it would be helpful to also set .llseek = no_llseek > in the file operations, because that is much easier to grep for than > only the nonseekable_open. While it's technically a NOP on the presence of > nonseekable_open, it will help that I don't accidentally apply my patch on > top of yours. I pushed modified versions of these patches out to linux1394-2.6.git now (master and for-next branch, on top of unrelated firewire updates). They contain the explicit .llseek = no_llseek now. http://git.kernel.org/?p=linux/kernel/git/ieee1394/linux1394-2.6.git;a=commitdiff;h=3ac26b2ee30005930117fe6a180c139c5f300faf http://git.kernel.org/?p=linux/kernel/git/ieee1394/linux1394-2.6.git;a=commitdiff;h=7cfe21aae155c26193fde617dc61d37a79a63f86 -- Stefan Richter -=====-==-=- -=-- -=-=- http://arcgraph.de/sr/ ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock 2010-03-25 12:55 ` Jiri Kosina 2010-03-25 13:06 ` Arnd Bergmann @ 2010-03-28 21:58 ` Andi Kleen 2010-03-29 1:07 ` [GIT, RFC] Killing the Big Kernel Lock II Andi Kleen 1 sibling, 1 reply; 49+ messages in thread From: Andi Kleen @ 2010-03-28 21:58 UTC (permalink / raw) To: Jiri Kosina Cc: Arnd Bergmann, Frederic Weisbecker, linux-kernel, Matthew Wilcox, Thomas Gleixner, jblunck, Alan Cox, Ingo Molnar, gregkh Jiri Kosina <jkosina@suse.cz> writes: > On Wed, 24 Mar 2010, Arnd Bergmann wrote: > >> I've spent some time continuing the work of the people on Cc and many others >> to remove the big kernel lock from Linux and I now have bkl-removal branch >> in my git tree at git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git >> that lets me run a kernel on my quad-core machine with the only users of the BKL >> being mostly obscure device driver modules. > > config USB > tristate "Support for Host-side USB" > depends on USB_ARCH_HAS_HCD && BKL > > Well, that's very interesting definition of "obscure" :) >From a quick grep at least EHCI doesn't seem to need it? Except for those two guys in core/*.c: /* keep API that guarantees BKL */ lock_kernel(); retval = driver->ioctl(intf, ctl->ioctl_code, buf); unlock_kernel(); if (retval == -ENOIOCTLCMD) retval = -ENOTTY; I guess that could be just moved into the low level modules with unlocked_ioctl And one use in the usbfs which seems quite bogus and can be probably removed. -Andi -- ak@linux.intel.com -- Speaking for myself only. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock II 2010-03-28 21:58 ` Andi Kleen @ 2010-03-29 1:07 ` Andi Kleen 2010-03-29 11:48 ` Arnd Bergmann 0 siblings, 1 reply; 49+ messages in thread From: Andi Kleen @ 2010-03-29 1:07 UTC (permalink / raw) To: Jiri Kosina Cc: Arnd Bergmann, Frederic Weisbecker, linux-kernel, Matthew Wilcox, Thomas Gleixner, jblunck, Alan Cox, Ingo Molnar, gregkh Andi Kleen <andi@firstfloor.org> writes: > Jiri Kosina <jkosina@suse.cz> writes: > >> On Wed, 24 Mar 2010, Arnd Bergmann wrote: >> >>> I've spent some time continuing the work of the people on Cc and many others >>> to remove the big kernel lock from Linux and I now have bkl-removal branch >>> in my git tree at git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git >>> that lets me run a kernel on my quad-core machine with the only users of the BKL >>> being mostly obscure device driver modules. >> >> config USB >> tristate "Support for Host-side USB" >> depends on USB_ARCH_HAS_HCD && BKL >> >> Well, that's very interesting definition of "obscure" :) > > From a quick grep at least EHCI doesn't seem to need it? As a followup: I killed some time by going through the various BKL uses in USB and came up with this git tree to address them . Feel free to integrate into your tree. With this only some obscure USB low level drivers still need to depend on BKL, the majority is clean. Gadgetfs also still needs it for now. I ended up also fixing some minor races in usb serial registration/ unregistration. Opens: - The usb serial ioctl entry needs to become a unlocked_ioctl, but I think that needs your tree first. The code below it doesn't need it anymore. - The seek function in uhci-debug.c probably is still racy. Only lightly tested. Some more reviewing would be appreciated -Andi The following changes since commit b72c40949b0f04728f2993a1434598d3bad094ea: Linus Torvalds (1): Merge branch 'for-linus' of git://git.kernel.org/.../jbarnes/pci-2.6 are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc-2.6.git usb-bkl Andi Kleen (12): USB-BKL: Remove lock_kernel in usbfs update_sb() USB-BKL: Remove BKL from usb serial drivers ioctl handlers USB-BKL: Convert usb_driver ioctl to unlocked_ioctl USB-BKL: Remove BKL use for usb serial driver probing USB-BKL: Make usb monitor depend on BKL USB-BKL: Make usb lcd driver depend on BKL USB-BKL: Make usb sisvga driver depend on BKL USB-BKL: Make usb rio500 driver depend on BKL USB-BKL: Make usb iowarrior driver depend on BKL USB-BKL: Remove BKL use in uhci-debug USB-BKL: Make usb gadget fs depend on BKL USB-BKL: Make usb gadget printer depend on BKL drivers/usb/core/devio.c | 7 +---- drivers/usb/core/hub.c | 3 +- drivers/usb/core/inode.c | 4 --- drivers/usb/gadget/Kconfig | 3 +- drivers/usb/host/uhci-debug.c | 17 +++++---------- drivers/usb/misc/Kconfig | 6 ++-- drivers/usb/misc/sisusbvga/Kconfig | 2 +- drivers/usb/misc/usbtest.c | 3 +- drivers/usb/mon/Kconfig | 2 +- drivers/usb/serial/ark3116.c | 3 +- drivers/usb/serial/ch341.c | 3 +- drivers/usb/serial/cypress_m8.c | 8 +++--- drivers/usb/serial/ftdi_sio.c | 4 +- drivers/usb/serial/io_tables.h | 8 +++--- drivers/usb/serial/io_ti.c | 5 ++- drivers/usb/serial/kobil_sct.c | 3 +- drivers/usb/serial/mos7720.c | 3 +- drivers/usb/serial/mos7840.c | 3 +- drivers/usb/serial/opticon.c | 3 +- drivers/usb/serial/oti6858.c | 3 +- drivers/usb/serial/pl2303.c | 3 +- drivers/usb/serial/spcp8x5.c | 2 +- drivers/usb/serial/ti_usb_3410_5052.c | 6 ++-- drivers/usb/serial/usb-serial.c | 36 ++++++++++++++++---------------- drivers/usb/serial/whiteheat.c | 4 +- include/linux/usb.h | 2 +- include/linux/usb/serial.h | 2 +- 27 files changed, 74 insertions(+), 74 deletions(-) -- ak@linux.intel.com -- Speaking for myself only. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock II 2010-03-29 1:07 ` [GIT, RFC] Killing the Big Kernel Lock II Andi Kleen @ 2010-03-29 11:48 ` Arnd Bergmann 2010-03-29 12:30 ` Andi Kleen 0 siblings, 1 reply; 49+ messages in thread From: Arnd Bergmann @ 2010-03-29 11:48 UTC (permalink / raw) To: Andi Kleen Cc: Jiri Kosina, Frederic Weisbecker, linux-kernel, Matthew Wilcox, Thomas Gleixner, jblunck, Alan Cox, Ingo Molnar, gregkh On Monday 29 March 2010, Andi Kleen wrote: > Andi Kleen <andi@firstfloor.org> writes: > > > > From a quick grep at least EHCI doesn't seem to need it? Right. > As a followup: > > I killed some time by going through the various BKL uses in USB and > came up with this git tree to address them . Feel free to integrate > into your tree. Great, I was hoping someone picks up the missing pieces. Maybe someone else can do the same for the sound drivers ;-). > With this only some obscure USB low level drivers still need to > depend on BKL, the majority is clean. Gadgetfs also still needs > it for now. Sure, that absolutely makes sense. > I ended up also fixing some minor races in usb serial registration/ > unregistration. > > Opens: > - The usb serial ioctl entry needs to become a unlocked_ioctl, > but I think that needs your tree first. The code below it doesn't > need it anymore. The usb-serial driver has a few instances where it takes the BKL in the mainline code, but this gets converted to the Big TTY Mutex in my series. The ioctl method in there is fine as far as I can tell. > - The seek function in uhci-debug.c probably is still racy. That function could be removed in favor of using generic_file_ioctl and setting i_size to up->size. Also, the race is only between concurrent calls of llseek on the same file descriptor, which is undefined anyway. The current code also doesn't protect you against partial updates of f_pos during ->read() on 32 bit systems (nothing ever does), and it even fails to protect against the concurrent llseek race because the assignment is done outside of the f_pos update. >commit 7160dc70d8a3e5a45c11eadfba05b9996966c6b4 >Author: Andi Kleen <ak@linux.intel.com> >Date: Mon Mar 29 01:26:35 2010 +0200 > > USB-BKL: Convert usb_driver ioctl to unlocked_ioctl > > And audit all the users. None needed the BKL. That was easy > because there was only very few around. > > Tested with allmodconfig build on x86-64 > > Signed-off-by: Andi Kleen <ak@linux.intel.com> >diff --git a/include/linux/usb.h b/include/linux/usb.h >index ce1323c..2803ca4 100644 >--- a/include/linux/usb.h >+++ b/include/linux/usb.h >@@ -846,7 +846,7 @@ struct usb_driver { > > void (*disconnect) (struct usb_interface *intf); > >- int (*ioctl) (struct usb_interface *intf, unsigned int code, >+ int (*unlocked_ioctl) (struct usb_interface *intf, unsigned int code, > void *buf); > > int (*suspend) (struct usb_interface *intf, pm_message_t message); The patch looks correct, but I probably wouldn't bother with the rename, and simply drop the BKL in the caller. >commit 89361b2e8d3dc3de83e105d60e49d5cdd9a68973 >Author: Andi Kleen <ak@linux.intel.com> >Date: Mon Mar 29 01:15:32 2010 +0200 > > USB-BKL: Remove BKL from usb serial drivers ioctl handlers > > I audited all the low level serial ioctl handlers and none of them > actually need the BKL. > > To make sure all code is checked change the usb_serial_driver ->ioctl > field to ->unlocked_ioctl > > Note this is still called for now with BKL held because tty drivers > don't have a ->unlocked_ioctl from the tty layer in mainline. > This could be trivially changed now though. > > Signed-off-by: Andi Kleen <ak@linux.intel.com> The serial_ioctl function is already called without the BKL, depite the name. tty_operations->ioctl was converted a long time ago, so I guess this patch can be dropped from your series. Arnd ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock II 2010-03-29 11:48 ` Arnd Bergmann @ 2010-03-29 12:30 ` Andi Kleen 2010-03-29 14:43 ` Arnd Bergmann 0 siblings, 1 reply; 49+ messages in thread From: Andi Kleen @ 2010-03-29 12:30 UTC (permalink / raw) To: Arnd Bergmann Cc: Jiri Kosina, Frederic Weisbecker, linux-kernel, Matthew Wilcox, Thomas Gleixner, jblunck, Alan Cox, Ingo Molnar, gregkh Arnd Bergmann <arnd@arndb.de> writes: >> - The seek function in uhci-debug.c probably is still racy. > > That function could be removed in favor of using generic_file_ioctl > and setting i_size to up->size. Does that lock against read in libfs? > Also, the race is only between concurrent calls of llseek on > the same file descriptor, which is undefined anyway. > The current code also doesn't protect you against partial updates > of f_pos during ->read() on 32 bit systems (nothing ever does), That is not what I meant. > and it even fails to protect against the concurrent llseek race > because the assignment is done outside of the f_pos update. I wasn't sure it would protect against parallel reads. Does it? > The patch looks correct, but I probably wouldn't bother with the rename, > and simply drop the BKL in the caller. I think a rename is better, I take compile errors over subtle breakage any day. >>Author: Andi Kleen <ak@linux.intel.com> >>Date: Mon Mar 29 01:15:32 2010 +0200 >> >> USB-BKL: Remove BKL from usb serial drivers ioctl handlers >> >> I audited all the low level serial ioctl handlers and none of them >> actually need the BKL. >> >> To make sure all code is checked change the usb_serial_driver ->ioctl >> field to ->unlocked_ioctl >> >> Note this is still called for now with BKL held because tty drivers >> don't have a ->unlocked_ioctl from the tty layer in mainline. >> This could be trivially changed now though. >> >> Signed-off-by: Andi Kleen <ak@linux.intel.com> > > The serial_ioctl function is already called without the BKL, depite the > name. tty_operations->ioctl was converted a long time ago, so I guess this > patch can be dropped from your series. Ok. -Andi -- ak@linux.intel.com -- Speaking for myself only. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock II 2010-03-29 12:30 ` Andi Kleen @ 2010-03-29 14:43 ` Arnd Bergmann 2010-03-29 20:11 ` Andi Kleen 0 siblings, 1 reply; 49+ messages in thread From: Arnd Bergmann @ 2010-03-29 14:43 UTC (permalink / raw) To: Andi Kleen Cc: Jiri Kosina, Frederic Weisbecker, linux-kernel, Matthew Wilcox, Thomas Gleixner, jblunck, Alan Cox, Ingo Molnar, gregkh On Monday 29 March 2010, Andi Kleen wrote: > Arnd Bergmann <arnd@arndb.de> writes: > > >> - The seek function in uhci-debug.c probably is still racy. > > > > That function could be removed in favor of using generic_file_ioctl > > and setting i_size to up->size. > > Does that lock against read in libfs? No. > > Also, the race is only between concurrent calls of llseek on > > the same file descriptor, which is undefined anyway. > > The current code also doesn't protect you against partial updates > > of f_pos during ->read() on 32 bit systems (nothing ever does), > > That is not what I meant. > > > and it even fails to protect against the concurrent llseek race > > because the assignment is done outside of the f_pos update. > > I wasn't sure it would protect against parallel reads. > > Does it? There is no way for any driver or file system right now to protect against that, nor has there been for a long time[1]. The sys_read and sys_write functions use file_pos_write() to update the file->f_pos without taking any lock, and they pass a local variable into the *ppos argument of the ->read/->write file operations, which means that the file operation itself cannot add locking to the update either. We never do in-place updates of file->f_pos, but on architectures where a 64 bit load can see incorrect data from a 64 bit store, any concurrent read/write/llseek combinations may cause problems, except for two concurrent lseek. Also, llseek is usually serialized with readdir/getdents for file systems. > > The patch looks correct, but I probably wouldn't bother with the rename, > > and simply drop the BKL in the caller. > > I think a rename is better, I take compile errors over subtle > breakage any day. ok, fine with me. Arnd [1] http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commitdiff;h=55f09ec0087c160533eab791607d92c9ce6222ae was merged in linux-2.6.8, which opened this race. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock II 2010-03-29 14:43 ` Arnd Bergmann @ 2010-03-29 20:11 ` Andi Kleen 2010-03-31 15:30 ` Arnd Bergmann 0 siblings, 1 reply; 49+ messages in thread From: Andi Kleen @ 2010-03-29 20:11 UTC (permalink / raw) To: Arnd Bergmann Cc: Andi Kleen, Jiri Kosina, Frederic Weisbecker, linux-kernel, Matthew Wilcox, Thomas Gleixner, jblunck, Alan Cox, Ingo Molnar, gregkh > > > The patch looks correct, but I probably wouldn't bother with the rename, > > > and simply drop the BKL in the caller. > > > > I think a rename is better, I take compile errors over subtle > > breakage any day. > > ok, fine with me. I updated the git tree removing the unneeded serial change. Feel free to pull. git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc-2.6.git usb-bkl -Andi -- ak@linux.intel.com -- Speaking for myself only. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock II 2010-03-29 20:11 ` Andi Kleen @ 2010-03-31 15:30 ` Arnd Bergmann 0 siblings, 0 replies; 49+ messages in thread From: Arnd Bergmann @ 2010-03-31 15:30 UTC (permalink / raw) To: Andi Kleen Cc: Jiri Kosina, Frederic Weisbecker, linux-kernel, Matthew Wilcox, Thomas Gleixner, jblunck, Alan Cox, Ingo Molnar, gregkh, linux-usb On Monday 29 March 2010, Andi Kleen wrote: > > > > The patch looks correct, but I probably wouldn't bother with the rename, > > > > and simply drop the BKL in the caller. > > > > > > I think a rename is better, I take compile errors over subtle > > > breakage any day. > > > > ok, fine with me. > > I updated the git tree removing the unneeded serial change. Feel free to pull. > > git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc-2.6.git usb-bkl Looks good to me. I guess it makes sense to merge this through Greg's USB tree. AFAICT the only prerequisite to this series is to have CONFIG_BKL enabled in Kconfig. Shall we add a that to 2.6.34 as 'def_bool y' for preparation so we can queue up patches like this in maintainer trees? Arnd ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock 2010-03-24 21:40 [GIT, RFC] Killing the Big Kernel Lock Arnd Bergmann ` (4 preceding siblings ...) 2010-03-25 12:55 ` Jiri Kosina @ 2010-03-25 13:40 ` Dan Carpenter 2010-03-25 14:14 ` Arnd Bergmann 2010-03-28 20:04 ` Frederic Weisbecker ` (4 subsequent siblings) 10 siblings, 1 reply; 49+ messages in thread From: Dan Carpenter @ 2010-03-25 13:40 UTC (permalink / raw) To: Arnd Bergmann Cc: Frederic Weisbecker, linux-kernel, Matthew Wilcox, Thomas Gleixner, jblunck, Alan Cox, Ingo Molnar There is a typo in this one: commit 12c8fcce56c0de4fdcacf048fe723c8778af940d Author: Arnd Bergmann <arnd@relay.de.ibm.com> Date: Wed Mar 24 20:08:55 2010 +0100 block: replace BKL with global mutex It doesn't seem to interact with much else, so give this a try. Signed-off-by: Arnd Bergmann <arnd@arndb.de> diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c index d9d6206..9c1277a 100644 --- a/kernel/trace/blktrace.c +++ b/kernel/trace/blktrace.c [ snip ] @@ -1641,7 +1643,7 @@ static ssize_t sysfs_blk_trace_attr_store(struct device *dev, ret = -ENXIO; - lock_kernel(); + mutex_unlock(&blkdev_mutex); p = dev_to_part(dev); bdev = bdget(part_devt(p)); if (bdev == NULL) ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock 2010-03-25 13:40 ` [GIT, RFC] Killing the Big Kernel Lock Dan Carpenter @ 2010-03-25 14:14 ` Arnd Bergmann 0 siblings, 0 replies; 49+ messages in thread From: Arnd Bergmann @ 2010-03-25 14:14 UTC (permalink / raw) To: Dan Carpenter Cc: Frederic Weisbecker, linux-kernel, Matthew Wilcox, Thomas Gleixner, jblunck, Alan Cox, Ingo Molnar On Thursday 25 March 2010, Dan Carpenter wrote: > There is a typo in this one: > @@ -1641,7 +1643,7 @@ static ssize_t sysfs_blk_trace_attr_store(struct device *dev, > > ret = -ENXIO; > > - lock_kernel(); > + mutex_unlock(&blkdev_mutex); > p = dev_to_part(dev); > bdev = bdget(part_devt(p)); > if (bdev == NULL) > Fixed now, thanks for reporting! Arnd ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock 2010-03-24 21:40 [GIT, RFC] Killing the Big Kernel Lock Arnd Bergmann ` (5 preceding siblings ...) 2010-03-25 13:40 ` [GIT, RFC] Killing the Big Kernel Lock Dan Carpenter @ 2010-03-28 20:04 ` Frederic Weisbecker 2010-03-28 20:11 ` Frederic Weisbecker ` (3 subsequent siblings) 10 siblings, 0 replies; 49+ messages in thread From: Frederic Weisbecker @ 2010-03-28 20:04 UTC (permalink / raw) To: Arnd Bergmann Cc: linux-kernel, Matthew Wilcox, Thomas Gleixner, jblunck, Alan Cox, Ingo Molnar On Wed, Mar 24, 2010 at 10:40:54PM +0100, Arnd Bergmann wrote: > I've spent some time continuing the work of the people on Cc and many others > to remove the big kernel lock from Linux and I now have bkl-removal branch > in my git tree at git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git > that lets me run a kernel on my quad-core machine with the only users of the BKL > being mostly obscure device driver modules. > > The oldest patch in this series is roughly eight years old and is Willy's patch > to remove the BKL from fs/locks.c, and I took a series of patches from Jan that > removes it from most of the VFS. > > The other non-obvious changes are: > > - all file operations that either have an .ioctl method or do not have their > own .llseek method used to implicitly require the BKL. I've changed that > so they need to explicitly set .llseek = default_llseek, .unlocked_ioctl = > default_ioctl, and changed all the code that either has supplied a .ioctl > method or looks like it needs the BKL somewhere else, meaning the > default_llseek function might actually do something. > > - The block layer now has a global bkldev_mutex that is used in all block > drivers in place of the BKL. The only recursive instance of the BKL was > __blkdev_get(), which is now called with the blkdev_mutex held instead of > grabbing the BKL. This has some possible performance implications that > need to be looked into. > > - The init/main.c code no longer take the BKL. I figured that this was > completely unnecessary because there is no other code running at the > same time that takes the BKL. > > - The most invasive change is in the TTY layer, which has a new global > mutex (sorry!). I know that Alan has plans of his own to remove the BKL > from this subsystem, so my patches may not go anywhere, but they seem > to work fine for me. > I've called the new lock the 'Big TTY Mutex' (BTM), a name that probably > makes more sense if you happen to speak German. > The basic idea here is to make recursive locking and the release-on-sleep > explicit, so every mutex_lock, wait_event, workqueue_flush and schedule > in the TTY layer now explicitly releases the BTM before blocking. > > - All drivers that still require the BKL are now listed as 'depends on BKL' > in Kconfig, and you can set that symbol to 'y', 'm' or 'n'. If the lock > itself is a module, only other modules can use it, and /proc/modules > will tell you exactly which ones those are. I've thought about adding > a module_init function in that module that will taint the kernel, but so > far I haven't done that. > > - Included is a debugfs file that gives statistics over the BKL usage from > early boot on. This is now obsolete and will not get merged, but I'm > including it for reference. > > Frederic has volunteered to help merging all of this upstream, which I > very much welcome. The shape that the tree is in now is very inconsistent, > especially some of the bits at the end are a bit dodgy and all of it needs > more testing. > > I've built-tested an allmodconfig kernel with CONFIG_BKL disabled > on x86_64, i386, powerpc64, powerpc32, s390 and arm to make sure I > catch all the modules that depend on BKL, and I've been running > various versions of this tree on my desktop machine over the last few > weeks while adding stuff. > > Arnd > > --- > > Arnd Bergmann (44): > input: kill BKL, fix input_open_file locking > ptrace: kill BKL > procfs: kill BKL in llseek > random: forbid llseek on random chardev > x86/microcode: use nonseekable_open > perf_event: use nonseekable_open I just queued the perf_event one. It looks pretty good. I'm also looking at some of the most trivials (ehm..less hards) in the list and see which we can submit right away. Thanks. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock 2010-03-24 21:40 [GIT, RFC] Killing the Big Kernel Lock Arnd Bergmann ` (6 preceding siblings ...) 2010-03-28 20:04 ` Frederic Weisbecker @ 2010-03-28 20:11 ` Frederic Weisbecker 2010-03-28 23:18 ` Frederic Weisbecker ` (2 subsequent siblings) 10 siblings, 0 replies; 49+ messages in thread From: Frederic Weisbecker @ 2010-03-28 20:11 UTC (permalink / raw) To: Arnd Bergmann Cc: linux-kernel, Matthew Wilcox, Thomas Gleixner, jblunck, Alan Cox, Ingo Molnar On Wed, Mar 24, 2010 at 10:40:54PM +0100, Arnd Bergmann wrote: > Arnd Bergmann (44): > input: kill BKL, fix input_open_file locking This one is upstream now. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock 2010-03-24 21:40 [GIT, RFC] Killing the Big Kernel Lock Arnd Bergmann ` (7 preceding siblings ...) 2010-03-28 20:11 ` Frederic Weisbecker @ 2010-03-28 23:18 ` Frederic Weisbecker 2010-03-28 23:38 ` Frederic Weisbecker 2010-03-29 12:45 ` John Kacur 2010-03-31 22:11 ` Roland Dreier 10 siblings, 1 reply; 49+ messages in thread From: Frederic Weisbecker @ 2010-03-28 23:18 UTC (permalink / raw) To: Arnd Bergmann Cc: linux-kernel, Matthew Wilcox, Thomas Gleixner, jblunck, Alan Cox, Ingo Molnar On Wed, Mar 24, 2010 at 10:40:54PM +0100, Arnd Bergmann wrote: > Arnd Bergmann (44): > [...] > procfs: kill BKL in llseek About this one, there is a "sensible" part: @@ -1943,7 +1949,7 @@ static ssize_t proc_fdinfo_read(struct file *file, char __user *buf, } static const struct file_operations proc_fdinfo_file_operations = { - .open = nonseekable_open, + .llseek = generic_file_llseek, .read = proc_fdinfo_read, }; Replacing default_llseek() by generic_file_llseek() as you did for most of the other parts is fine. But the above changes the semantics as it makes it seekable. Why not just keeping it as is? It just ends up in no_llseek(). ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock 2010-03-28 23:18 ` Frederic Weisbecker @ 2010-03-28 23:38 ` Frederic Weisbecker 2010-03-29 11:04 ` Arnd Bergmann 0 siblings, 1 reply; 49+ messages in thread From: Frederic Weisbecker @ 2010-03-28 23:38 UTC (permalink / raw) To: Arnd Bergmann Cc: linux-kernel, Matthew Wilcox, Thomas Gleixner, jblunck, Alan Cox, Ingo Molnar On Mon, Mar 29, 2010 at 01:18:48AM +0200, Frederic Weisbecker wrote: > On Wed, Mar 24, 2010 at 10:40:54PM +0100, Arnd Bergmann wrote: > > Arnd Bergmann (44): > > [...] > > procfs: kill BKL in llseek > > > About this one, there is a "sensible" part: > > > @@ -1943,7 +1949,7 @@ static ssize_t proc_fdinfo_read(struct file *file, char __user *buf, > } > > static const struct file_operations proc_fdinfo_file_operations = { > - .open = nonseekable_open, > + .llseek = generic_file_llseek, > .read = proc_fdinfo_read, > }; > > > Replacing default_llseek() by generic_file_llseek() as you > did for most of the other parts is fine. > > But the above changes the semantics as it makes it seekable. > Why not just keeping it as is? It just ends up in no_llseek(). > There is also the ioctl part that takes the bkl in procfs. I'll just check nothing weird happens there wrt file pos. We probably first need to pushdown the bkl in the procfs ioctl handlers. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock 2010-03-28 23:38 ` Frederic Weisbecker @ 2010-03-29 11:04 ` Arnd Bergmann 2010-03-29 17:59 ` Frederic Weisbecker 0 siblings, 1 reply; 49+ messages in thread From: Arnd Bergmann @ 2010-03-29 11:04 UTC (permalink / raw) To: Frederic Weisbecker Cc: linux-kernel, Matthew Wilcox, Thomas Gleixner, jblunck, Alan Cox, Ingo Molnar On Monday 29 March 2010, Frederic Weisbecker wrote: > On Mon, Mar 29, 2010 at 01:18:48AM +0200, Frederic Weisbecker wrote: > > @@ -1943,7 +1949,7 @@ static ssize_t proc_fdinfo_read(struct file *file, char __user *buf, > > } > > > > static const struct file_operations proc_fdinfo_file_operations = { > > - .open = nonseekable_open, > > + .llseek = generic_file_llseek, > > .read = proc_fdinfo_read, > > }; > > > > > > Replacing default_llseek() by generic_file_llseek() as you > > did for most of the other parts is fine. > > > > But the above changes the semantics as it makes it seekable. > > Why not just keeping it as is? It just ends up in no_llseek(). The default is default_llseek, which uses the BKL and cannot be used if procfs is builtin and the BKL is a module. > There is also the ioctl part that takes the bkl in procfs. > I'll just check nothing weird happens there wrt file pos. > We probably first need to pushdown the bkl in the procfs > ioctl handlers. The BKL in procfs is only for proc files that have registered their own .ioctl instead of .unlocked_ioctl method. Converting every file_operations instance to provide an unlocked_ioctl (as one of the other patches does) makes sure that this path is never taken. BTW, there are less than a handful of procfs files that provide an ioctl operation, and those probably should never have been merged. Arnd ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock 2010-03-29 11:04 ` Arnd Bergmann @ 2010-03-29 17:59 ` Frederic Weisbecker 2010-03-29 21:18 ` Arnd Bergmann 0 siblings, 1 reply; 49+ messages in thread From: Frederic Weisbecker @ 2010-03-29 17:59 UTC (permalink / raw) To: Arnd Bergmann Cc: linux-kernel, Matthew Wilcox, Thomas Gleixner, jblunck, Alan Cox, Ingo Molnar On Mon, Mar 29, 2010 at 12:04:24PM +0100, Arnd Bergmann wrote: > On Monday 29 March 2010, Frederic Weisbecker wrote: > > On Mon, Mar 29, 2010 at 01:18:48AM +0200, Frederic Weisbecker wrote: > > > @@ -1943,7 +1949,7 @@ static ssize_t proc_fdinfo_read(struct file *file, char __user *buf, > > > } > > > > > > static const struct file_operations proc_fdinfo_file_operations = { > > > - .open = nonseekable_open, > > > + .llseek = generic_file_llseek, > > > .read = proc_fdinfo_read, > > > }; > > > > > > > > > Replacing default_llseek() by generic_file_llseek() as you > > > did for most of the other parts is fine. > > > > > > But the above changes the semantics as it makes it seekable. > > > Why not just keeping it as is? It just ends up in no_llseek(). > > The default is default_llseek, which uses the BKL and cannot be > used if procfs is builtin and the BKL is a module. Yeah, but you removed the nonseekable_open and made generic_file_llseek in llseek on this one. This makes it seekable while it wasn't, changing its ABI. It wasn't taking the bkl before that as it was calling no_llseek(). May be its non seekable property is irrelevant, I don't know, but if this behaviour must be changed, it should be in a separate patch as that dosn't deal with the bkl. > > There is also the ioctl part that takes the bkl in procfs. > > I'll just check nothing weird happens there wrt file pos. > > We probably first need to pushdown the bkl in the procfs > > ioctl handlers. > > The BKL in procfs is only for proc files that have registered > their own .ioctl instead of .unlocked_ioctl method. Converting > every file_operations instance to provide an unlocked_ioctl > (as one of the other patches does) makes sure that this path > is never taken. BTW, there are less than a handful of procfs files > that provide an ioctl operation, and those probably should never > have been merged. There are three of them. I'm going to make them .unlocked_ioctl and push the bkl inside, and warn on further uses of .ioctl, without applying the bkl there anymore. That plus your bkl removal in proc seek, should totally remove the bkl from procfs. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock 2010-03-29 17:59 ` Frederic Weisbecker @ 2010-03-29 21:18 ` Arnd Bergmann 0 siblings, 0 replies; 49+ messages in thread From: Arnd Bergmann @ 2010-03-29 21:18 UTC (permalink / raw) To: Frederic Weisbecker Cc: linux-kernel, Matthew Wilcox, Thomas Gleixner, jblunck, Alan Cox, Ingo Molnar On Monday 29 March 2010 19:59:39 Frederic Weisbecker wrote: > On Mon, Mar 29, 2010 at 12:04:24PM +0100, Arnd Bergmann wrote: > > On Monday 29 March 2010, Frederic Weisbecker wrote: > > > On Mon, Mar 29, 2010 at 01:18:48AM +0200, Frederic Weisbecker wrote: > > > > @@ -1943,7 +1949,7 @@ static ssize_t proc_fdinfo_read(struct file *file, char __user *buf, > > > > } > > > > > > > > static const struct file_operations proc_fdinfo_file_operations = { > > > > - .open = nonseekable_open, > > > > + .llseek = generic_file_llseek, > > > > .read = proc_fdinfo_read, > > > > }; > > > > > > > > > > > > Replacing default_llseek() by generic_file_llseek() as you > > > > did for most of the other parts is fine. > > > > > > > > But the above changes the semantics as it makes it seekable. > > > > Why not just keeping it as is? It just ends up in no_llseek(). > > > > The default is default_llseek, which uses the BKL and cannot be > > used if procfs is builtin and the BKL is a module. > > Yeah, but you removed the nonseekable_open and made generic_file_llseek > in llseek on this one. > This makes it seekable while it wasn't, changing its ABI. > It wasn't taking the bkl before that as it was calling > no_llseek(). Ah, I see what you mean. That change was certainly not intentional an should be reverted. Thanks for pointing this out. > > The BKL in procfs is only for proc files that have registered > > their own .ioctl instead of .unlocked_ioctl method. Converting > > every file_operations instance to provide an unlocked_ioctl > > (as one of the other patches does) makes sure that this path > > is never taken. BTW, there are less than a handful of procfs files > > that provide an ioctl operation, and those probably should never > > have been merged. > > > There are three of them. I'm going to make them .unlocked_ioctl > and push the bkl inside, and warn on further uses of .ioctl, > without applying the bkl there anymore. > > That plus your bkl removal in proc seek, should totally remove the > bkl from procfs. Ok Arnd ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock 2010-03-24 21:40 [GIT, RFC] Killing the Big Kernel Lock Arnd Bergmann ` (8 preceding siblings ...) 2010-03-28 23:18 ` Frederic Weisbecker @ 2010-03-29 12:45 ` John Kacur 2010-03-31 22:11 ` Roland Dreier 10 siblings, 0 replies; 49+ messages in thread From: John Kacur @ 2010-03-29 12:45 UTC (permalink / raw) To: Arnd Bergmann Cc: Frederic Weisbecker, linux-kernel, Matthew Wilcox, Thomas Gleixner, jblunck, Alan Cox, Ingo Molnar On Wed, Mar 24, 2010 at 11:40 PM, Arnd Bergmann <arnd@arndb.de> wrote: > I've spent some time continuing the work of the people on Cc and many others > to remove the big kernel lock from Linux and I now have bkl-removal branch > in my git tree at git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git > that lets me run a kernel on my quad-core machine with the only users of the BKL > being mostly obscure device driver modules. > > The oldest patch in this series is roughly eight years old and is Willy's patch > to remove the BKL from fs/locks.c, and I took a series of patches from Jan that > removes it from most of the VFS. > > The other non-obvious changes are: > > - all file operations that either have an .ioctl method or do not have their > own .llseek method used to implicitly require the BKL. I've changed that > so they need to explicitly set .llseek = default_llseek, .unlocked_ioctl = > default_ioctl, and changed all the code that either has supplied a .ioctl > method or looks like it needs the BKL somewhere else, meaning the > default_llseek function might actually do something. > > - The block layer now has a global bkldev_mutex that is used in all block > drivers in place of the BKL. The only recursive instance of the BKL was > __blkdev_get(), which is now called with the blkdev_mutex held instead of > grabbing the BKL. This has some possible performance implications that > need to be looked into. > > - The init/main.c code no longer take the BKL. I figured that this was > completely unnecessary because there is no other code running at the > same time that takes the BKL. > > - The most invasive change is in the TTY layer, which has a new global > mutex (sorry!). I know that Alan has plans of his own to remove the BKL > from this subsystem, so my patches may not go anywhere, but they seem > to work fine for me. > I've called the new lock the 'Big TTY Mutex' (BTM), a name that probably > makes more sense if you happen to speak German. > The basic idea here is to make recursive locking and the release-on-sleep > explicit, so every mutex_lock, wait_event, workqueue_flush and schedule > in the TTY layer now explicitly releases the BTM before blocking. > > - All drivers that still require the BKL are now listed as 'depends on BKL' > in Kconfig, and you can set that symbol to 'y', 'm' or 'n'. If the lock > itself is a module, only other modules can use it, and /proc/modules > will tell you exactly which ones those are. I've thought about adding > a module_init function in that module that will taint the kernel, but so > far I haven't done that. > > - Included is a debugfs file that gives statistics over the BKL usage from > early boot on. This is now obsolete and will not get merged, but I'm > including it for reference. > > Frederic has volunteered to help merging all of this upstream, which I > very much welcome. The shape that the tree is in now is very inconsistent, > especially some of the bits at the end are a bit dodgy and all of it needs > more testing. > > I've built-tested an allmodconfig kernel with CONFIG_BKL disabled > on x86_64, i386, powerpc64, powerpc32, s390 and arm to make sure I > catch all the modules that depend on BKL, and I've been running > various versions of this tree on my desktop machine over the last few > weeks while adding stuff. > > Arnd > > --- > > Arnd Bergmann (44): > input: kill BKL, fix input_open_file locking > ptrace: kill BKL > procfs: kill BKL in llseek > random: forbid llseek on random chardev > x86/microcode: use nonseekable_open > perf_event: use nonseekable_open > dm: use nonseekable_open > vgaarb: use nonseekable_open > kvm: don't require BKL > nvram: kill BKL > do_coredump: do not take BKL > hpet: kill BKL, add compat_ioctl > proc/pci: kill BKL > autofs/autofs4: move compat_ioctl handling into fs > usb/mon: kill BKL usage > fat: push down BKL > sunrpc: push down BKL > pcmcia: push down BKL > vfs: kill BKL in default_llseek > BKL: introduce CONFIG_BKL. > bkl-removal: make fops->ioctl and default_llseek optional > x86: update defconfig to CONFIG_BKL=m > bkl removal: make unlocked_ioctl mandatory > bkl removal: use default_llseek in code that uses the BKL > BKL removal: mark remaining users as 'depends on BKL' > tty: replace BKL with a new tty_lock > tty: make atomic_write_lock release tty_lock > tty: make tty_port->mutex nest under tty_lock > tty: make termios mutex nest under tty_lock > tty: make ldisc_mutex nest under tty_lock > tty: never hold tty_lock() while getting tty_mutex > ppp: use big tty mutex > tty: release tty lock when blocking > tty: implement BTM as mutex instead of BKL > briq_panel: do not use BTM > affs: remove leftover unlock_kernel > kvm: don't require BKL > block: replace BKL with global mutex > init: kill BKL usage > debug: instrument big kernel lock > BKL removal: make the BKL modular > > Matthew Wilcox (1): > [RFC] Remove BKL from fs/locks.c > > Jan Blunck (19): > JFS: Free sbi memory in error path > BKL: Explicitly add BKL around get_sb/fill_super > BKL: Remove outdated comment and include > BKL: Remove BKL from Amiga FFS > BKL: Remove BKL from BFS > BKL: Remove BKL from CifsFS > BKL: Remove BKL from ext3 fill_super() > BKL: Remove BKL from ext3_put_super() and ext3_remount() > BKL: Remove BKL from ext4 filesystem > BKL: Remove smp_lock.h from exofs > BKL: Remove BKL from HFS > BKL: Remove BKL from HFS+ > BKL: Remove BKL from JFS > BKL: Remove BKL from NILFS2 > BKL: Remove BKL from NTFS > BKL: Remove BKL from cgroup > BKL: Remove BKL from do_new_mount() > ext2: Add ext2_sb_info s_lock spinlock > BKL: Remove BKL from ext2 filesystem > -- Great, Arnd, I like this. I also have a private but stale tree where I have collected some remove bkl patches (which I will review against your tree.) I think that it is important that we keep chipping away at it though, and that we all keep sending stuff upstream when it is ready. Thanks John ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock 2010-03-24 21:40 [GIT, RFC] Killing the Big Kernel Lock Arnd Bergmann ` (9 preceding siblings ...) 2010-03-29 12:45 ` John Kacur @ 2010-03-31 22:11 ` Roland Dreier 2010-03-31 22:20 ` Frederic Weisbecker 2010-04-01 8:50 ` Arnd Bergmann 10 siblings, 2 replies; 49+ messages in thread From: Roland Dreier @ 2010-03-31 22:11 UTC (permalink / raw) To: Arnd Bergmann Cc: Frederic Weisbecker, linux-kernel, Matthew Wilcox, Thomas Gleixner, jblunck, Alan Cox, Ingo Molnar Hi Arnd, Looking at your tree, I see you have commit 753dd249 ("perf_event: use nonseekable_open") that does: > --- a/kernel/perf_event.c > +++ b/kernel/perf_event.c > @@ -2515,6 +2515,8 @@ static int perf_fasync(int fd, struct file *filp, int on) > } > > static const struct file_operations perf_fops = { > + .open = nonseekable_open, > + .llseek = no_llseek, > .release = perf_release, > .read = perf_read, > .poll = perf_poll, But if I understand this correctly, the assignment to .open is at best useless -- these file_operations are only used via anon_inode_getfd() and so there is no possible path that can call the .open method. Or am I missing something? (The same applies to the kvm_main.c changes too) -- Roland Dreier <rolandd@cisco.com> || For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/index.html ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock 2010-03-31 22:11 ` Roland Dreier @ 2010-03-31 22:20 ` Frederic Weisbecker 2010-04-01 8:50 ` Arnd Bergmann 1 sibling, 0 replies; 49+ messages in thread From: Frederic Weisbecker @ 2010-03-31 22:20 UTC (permalink / raw) To: Roland Dreier Cc: Arnd Bergmann, linux-kernel, Matthew Wilcox, Thomas Gleixner, jblunck, Alan Cox, Ingo Molnar On Wed, Mar 31, 2010 at 03:11:03PM -0700, Roland Dreier wrote: > Hi Arnd, > > Looking at your tree, I see you have commit 753dd249 ("perf_event: use > nonseekable_open") that does: > > > --- a/kernel/perf_event.c > > +++ b/kernel/perf_event.c > > @@ -2515,6 +2515,8 @@ static int perf_fasync(int fd, struct file *filp, int on) > > } > > > > static const struct file_operations perf_fops = { > > + .open = nonseekable_open, > > + .llseek = no_llseek, > > .release = perf_release, > > .read = perf_read, > > .poll = perf_poll, > > But if I understand this correctly, the assignment to .open is at best > useless -- these file_operations are only used via anon_inode_getfd() > and so there is no possible path that can call the .open method. Or am > I missing something? > > (The same applies to the kvm_main.c changes too) Good point, I'll update that. Thanks. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [GIT, RFC] Killing the Big Kernel Lock 2010-03-31 22:11 ` Roland Dreier 2010-03-31 22:20 ` Frederic Weisbecker @ 2010-04-01 8:50 ` Arnd Bergmann 1 sibling, 0 replies; 49+ messages in thread From: Arnd Bergmann @ 2010-04-01 8:50 UTC (permalink / raw) To: Roland Dreier Cc: Frederic Weisbecker, linux-kernel, Matthew Wilcox, Thomas Gleixner, jblunck, Alan Cox, Ingo Molnar On Thursday 01 April 2010, Roland Dreier wrote: > Looking at your tree, I see you have commit 753dd249 ("perf_event: use > nonseekable_open") that does: > > > --- a/kernel/perf_event.c > > +++ b/kernel/perf_event.c > > @@ -2515,6 +2515,8 @@ static int perf_fasync(int fd, struct file *filp, int on) > > } > > > > static const struct file_operations perf_fops = { > > + .open = nonseekable_open, > > + .llseek = no_llseek, > > .release = perf_release, > > .read = perf_read, > > .poll = perf_poll, > > But if I understand this correctly, the assignment to .open is at best > useless -- these file_operations are only used via anon_inode_getfd() > and so there is no possible path that can call the .open method. Or am > I missing something? You're right. I did not consider this. Arnd ^ permalink raw reply [flat|nested] 49+ messages in thread
end of thread, other threads:[~2010-04-10 15:13 UTC | newest] Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2010-03-24 21:40 [GIT, RFC] Killing the Big Kernel Lock Arnd Bergmann 2010-03-24 21:07 ` Andrew Morton 2010-03-25 10:26 ` Arnd Bergmann 2010-03-28 20:33 ` Frederic Weisbecker 2010-03-24 21:53 ` Roland Dreier 2010-03-24 21:59 ` Arnd Bergmann 2010-03-31 5:22 ` Roland Dreier 2010-03-24 22:10 ` Alan Cox 2010-03-24 22:25 ` Arnd Bergmann 2010-03-24 22:23 ` Ingo Molnar 2010-03-25 12:55 ` Jiri Kosina 2010-03-25 13:06 ` Arnd Bergmann 2010-03-25 13:38 ` Arnd Bergmann 2010-03-26 23:47 ` Stefan Richter 2010-03-27 9:16 ` [PATCH] firewire: char device files are not seekable (BKL removal) Stefan Richter 2010-03-27 9:20 ` [PATCH] ieee1394: " Stefan Richter 2010-03-27 10:40 ` [PATCH RFC] DVB: add dvb_generic_nonseekable_open, dvb_generic_unlocked_ioctl, use in firedtv Stefan Richter 2010-03-28 14:47 ` [PATCH RFC v2] " Stefan Richter 2010-03-27 14:37 ` [GIT, RFC] Killing the Big Kernel Lock Arnd Bergmann 2010-03-28 12:27 ` Stefan Richter 2010-03-28 20:05 ` Arnd Bergmann 2010-03-28 20:15 ` Frederic Weisbecker 2010-03-28 21:34 ` Arnd Bergmann 2010-03-28 23:24 ` Frederic Weisbecker 2010-04-08 20:45 ` Jan Blunck 2010-04-08 21:27 ` Arnd Bergmann 2010-04-08 21:30 ` Frederic Weisbecker 2010-04-09 11:02 ` Jan Blunck 2010-04-10 15:13 ` Stefan Richter 2010-03-28 21:58 ` Andi Kleen 2010-03-29 1:07 ` [GIT, RFC] Killing the Big Kernel Lock II Andi Kleen 2010-03-29 11:48 ` Arnd Bergmann 2010-03-29 12:30 ` Andi Kleen 2010-03-29 14:43 ` Arnd Bergmann 2010-03-29 20:11 ` Andi Kleen 2010-03-31 15:30 ` Arnd Bergmann 2010-03-25 13:40 ` [GIT, RFC] Killing the Big Kernel Lock Dan Carpenter 2010-03-25 14:14 ` Arnd Bergmann 2010-03-28 20:04 ` Frederic Weisbecker 2010-03-28 20:11 ` Frederic Weisbecker 2010-03-28 23:18 ` Frederic Weisbecker 2010-03-28 23:38 ` Frederic Weisbecker 2010-03-29 11:04 ` Arnd Bergmann 2010-03-29 17:59 ` Frederic Weisbecker 2010-03-29 21:18 ` Arnd Bergmann 2010-03-29 12:45 ` John Kacur 2010-03-31 22:11 ` Roland Dreier 2010-03-31 22:20 ` Frederic Weisbecker 2010-04-01 8:50 ` Arnd Bergmann
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.