2.6.24-rc6-mm1
Andrew Morton
2007-12-23 07:40:12 UTC
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc6/2.6.24-rc6-mm1/

- This kernel doesn't work on i386!

It oopses late in boot due to an unrevertable change (e3c1b141) in git-x86
which I stared at for a while then I ran out of time and gave up.

I would have just abandoned this release until it was fixed but I'll be
largely offline for ten days starting tomorrow.

The culprits have been notified and hopefully we'll have a patch for
hot-fixes/ tomorrow.

x86_64 and powerpc work OK though.

- git-block is dropped due to more conflicts with git-scsi-misc than I'm
prepared to repair

- git-perfmon is dropped due to conflicts with git-x86

- git-kgdb is dropped due to conflicts with git-x86

- git-newsetup is dropped due to conflicts with git-x86

- Andi's x86 quilt tree is dropped due to conflicts with git-x86

- Someone broke suspend-to-RAM on the t61p again. It just instantly resumes
itself.



Boilerplate:

- See the `hot-fixes' directory for any important updates to this patchset.

- To fetch an -mm tree using git, use (for example)

git-fetch git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git tag v2.6.16-rc2-mm1
git-checkout -b local-v2.6.16-rc2-mm1 v2.6.16-rc2-mm1

- -mm kernel commit activity can be reviewed by subscribing to the
mm-commits mailing list.

echo "subscribe mm-commits" | mail ***@vger.kernel.org

- If you hit a bug in -mm and it is not obvious which patch caused it, it is
most valuable if you can perform a bisection search to identify which patch
introduced the bug. Instructions for this process are at

http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt

But beware that this process takes some time (around ten rebuilds and
reboots), so consider reporting the bug first and if we cannot immediately
identify the faulty patch, then perform the bisection search.

- When reporting bugs, please try to Cc: the relevant maintainer and mailing
list on any email.

- When reporting bugs in this kernel via email, please also rewrite the
email Subject: in some manner to reflect the nature of the bug. Some
developers filter by Subject: when looking for messages to read.

- Occasional snapshots of the -mm lineup are uploaded to
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/ and are announced on
the mm-commits list. These probably are at least compilable.

- More-than-daily -mm snapshots may be found at
http://userweb.kernel.org/~akpm/mmotm/. These are almost certainly not
compilable.
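
As a rough illustration of why the bisection search mentioned above costs around ten rebuilds: a binary search over an ordered patch series needs about log2(N) tests. The sketch below is not the procedure described in bisecting-mm-trees.txt. `is_bad` is a hypothetical callback standing in for "apply the first k patches, rebuild, reboot and test", and the series length of 1000 is an assumption chosen only for illustration.

```python
from typing import Callable, Sequence, Tuple


def bisect_series(patches: Sequence[str],
                  is_bad: Callable[[int], bool]) -> Tuple[str, int]:
    """Find the first patch in an ordered series that introduces a bug.

    is_bad(k) is a hypothetical callback: it should apply the first k
    patches, rebuild, reboot, and return True if the bug shows up.
    Assumes the tree is good with 0 patches applied, bad with all of
    them applied, and that the bug persists once introduced.
    """
    good, bad = 0, len(patches)  # invariant: good is known good, bad is known bad
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1               # one rebuild-and-reboot cycle
        if is_bad(mid):
            bad = mid
        else:
            good = mid
    # The bug first appears when the bad-th patch is applied.
    return patches[bad - 1], tests


if __name__ == "__main__":
    # Made-up series of 1000 patches; pretend index 637 is the culprit.
    series = [f"patch-{i:04d}.patch" for i in range(1000)]
    culprit, tests = bisect_series(series, lambda k: k > 637)
    print(culprit, "found after", tests, "tests")
```

Each test halves the range of suspects, so a series of roughly a thousand patches is narrowed to one in at most ten tests under these assumptions.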



Changes since 2.6.24-rc5-mm1:
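
The change list that follows can be read mechanically. The sketch below assumes the convention visible in this announcement: a line starting with `+` names a patch added since the previous -mm release, a line starting with `-` names a patch removed, an unprefixed `.patch` line is simply listed, and the first free-text line after a run of patches is the comment for that run. The name `parse_changes` and the tuple layout are illustrative choices, not part of any -mm tooling.

```python
from typing import List, Tuple

# (comment, [(marker, patch name), ...]); marker is "+", "-" or " "
Group = Tuple[str, List[Tuple[str, str]]]


def parse_changes(text: str) -> List[Group]:
    """Split an -mm style change list into (comment, patches) groups."""
    groups: List[Group] = []
    pending: List[Tuple[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Patch lines are a single token ending in ".patch"; comments such
        # as "Fix rejects in git-net.patch" contain spaces and are excluded.
        if " " not in line and line.endswith(".patch"):
            marker = line[0] if line[0] in "+-" else " "
            name = line[1:] if marker != " " else line
            pending.append((marker, name))
        elif pending:
            # First free-text line after a run of patches is its comment.
            groups.append((line, pending))
            pending = []
        # Further free-text lines (wrapped comments) are ignored here.
    if pending:
        groups.append(("", pending))  # trailing run with no comment
    return groups


if __name__ == "__main__":
    sample = """\
+git-alsa-fixup.patch
+sound-usb-usbaudioc-fix-build-with-config_pm=n.patch

Fix git-alsa

-arm-remove-dead-config-symbols-from-arm-code.patch

Dropped
"""
    for comment, patches in parse_changes(sample):
        print(comment, patches)
```

Only the first line of a multi-line comment is kept in this sketch; a fuller parser would join the continuation lines.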


origin.patch
git-acpi.patch
git-alsa.patch
git-agpgart.patch
git-arm.patch
git-avr32.patch
git-cpufreq.patch
git-powerpc.patch
git-drm.patch
git-dvb.patch
git-hwmon.patch
git-gfs2-nmw.patch
git-hid.patch
git-hrt.patch
git-ieee1394.patch
git-infiniband.patch
git-input.patch
git-jfs.patch
git-kbuild.patch
git-kvm.patch
git-lblnet.patch
git-leds.patch
git-libata-all.patch
git-md-accel.patch
git-mips.patch
git-mmc.patch
git-mtd.patch
git-ubi.patch
git-net.patch
git-net-fixup.patch
git-netdev-all.patch
git-battery.patch
git-nfsd.patch
git-ocfs2.patch
git-selinux.patch
git-s390.patch
git-sched.patch
git-sched-fixup.patch
git-sh.patch
git-scsi-misc.patch
git-unionfs.patch
git-v9fs.patch
git-watchdog.patch
git-watchdog-fixup.patch
git-wireless.patch
git-ipwireless_cs.patch
git-x86.patch
git-x86-fixup.patch
git-xfs.patch
git-cryptodev.patch
git-cryptodev-fixup.patch
git-xtensa.patch

git trees

-revert-hibernation-use-temporary-page-tables-for-kernel-text-mapping-on-x86_64.patch
-uml-stop-gdb-from-deleting-breakpoints-when-running-uml.patch
-alpha-strncpy-strncat-fixes.patch
-rtc-at32ap700x-fix-irq-init-oops.patch
-parport-dev-timeslice-is-an-unsigned-long-not-an-int.patch
-ecryptfs-initialize-new-auth_tokens-before-teardown.patch
-knfsd-change-mailing-list-for-nfsd-in-maintainers.patch
-fix-lguest-documentation.patch
-sparsemem-make-sparsemem_vmemmap-selectable.patch
-fs-kconfig-grammar-fix.patch
-ext3-ext4-avoid-divide-by-zero.patch
-alpha-build-fixes.patch
-git-acpi-ia64-build-fix.patch
-git-acpi-build-fix.patch
-acpi-add-reboot-mechanism.patch
-acpi-cleanup-linux-acpih.patch
-alsa-nopage.patch
-alsa-usx2y-nopage.patch
-drivers-char-remove-unnecessary-pci_dev_put.patch
-git-cpufreq-query_current_values_with_pending_wait-build-fix.patch
-agk-dm-dm-table-detect-io-beyond-device.patch
-agk-dm-dm-mpath-hp-requires-scsi.patch
-agk-dm-dm-crypt-fix-write-endio.patch
-agk-dm-dm-trigger-change-uevent-on-rename.patch
-agk-dm-dm-merge-max_hw_sector.patch
-agk-dm-dm-crypt-use-bio_add_page.patch
-agk-dm-dm-ioctl-move-compat-code-fix.patch
-dm-persistent_read_metadata-warning-fix.patch
-arch-powerpc-remove-duplicate-includes.patch
-arch-ppc-remove-duplicate-includes.patch
-arch-ppc-remove-an-unnecessary-pci_dev_put.patch
-powerpc-kill-non-existent-symbols-from-ksyms-and-commproch.patch
-powerpc-fix-typo-ifdef-ifndef.patch
-powerpc-add-support-for-porta-and-portb-odr-registers.patch
-powerpc-stop-the-toc-overflowing-for-large-builds.patch
-ppc-fix-missed-increment-on-device-interface-counter.patch
-ppc-chrp-fix-possible-null-pointer-dereference.patch
-ppc-chrp-fix-possible-null-pointer-dereference-checkpatch-fixes.patch
-powerpc-dont-cast-a-pointer-to-pointer-of-list_head.patch
-arch-powerpc-add-missing-of_node_put.patch
-arch-powerpc-platforms-cell-cbe_regsc-add-missing-of_node_put.patch
-gregkh-driver-kobject-fix-the-documentation-of-how-kobject_set_name-works.patch
-revert-gregkh-driver-pm-acquire-device-locks-prior-to-suspending.patch
-gregkh-driver-kset-convert-acpi-to-use-kset_create.patch
-gregkh-driver-kobject-remove-old-outdated-documentation.patch
-gregkh-driver-kobject-update-the-kobject-kset-documentation.patch
-gregkh-driver-kobject-add-sample-code-for-how-to-use-kobjects-in-a-simple-manner.patch
-gregkh-driver-kobject-add-sample-code-for-how-to-use-ksets-ktypes-kobjects.patch
-gregkh-driver-kobject-warn.patch
-mga_dma-return-err-not-just-zero-from-mga_do_cleanup_dma.patch
-drm-dont-cast-a-pointer-to-pointer-of-list_head.patch
-git-dvb-fix-build-in-drivers-media-dvb-frontends-tda18271h.patch
-git-dvb-one-videobuf_read_start-is-enough.patch
-git-dvb-drivers-media-dvb-frontends-zl10353c-avoid-64-bit-divide.patch
-git-dvb-drivers-media-video-et61x251-et61x251_corec-fix-warnings.patch
-media-video-usbvision-add-mutex_unlock-to-error-paths.patch
-media-video-usbvision-add-mutex_unlock-to-error-paths-fix.patch
-media-video-usbvision-remove-ctrlurblock.patch
-i2c-fix-drivers-media-video-bt866c.patch
-gfs2-avoid-64-bit-divide.patch
-ia64-slim-down-__clear_bit_unlock.patch
-ia64-signal-remove-redundant-code-in-setup_sigcontext.patch
-ia64-ia32-nopage.patch
-ieee1394-nopage.patch
-ib-nopage.patch
-fix-build-failure-when-config_infiniband_ipoib_cm-is-not-defined.patch
-fujitsu-application-panel-driver.patch
-apanel-free-input-device-on-close.patch
-apanel-change-name-of-led.patch
-apanel-detach-on-shutdown.patch
-apanel-use-generic-keycode-routines.patch
-fujitsu-application-panel-led-value.patch
-ads7846-stop-updating-dev-powerpower_state.patch
-drivers-ata-libata-ehc-fix-printk-warning.patch
-pata_hpt37x-fix-outstanding-bug-reports-on-the-hpt374-and-37x-cable-detect-checkpatch-fixes.patch
-pata_pcmcia-minor-cleanups-and-support-for-dual-channel-cards.patch
-ata-ahci-enclosure-management-via-led.patch
-pata_legacy-restructure-and-revamp.patch
-ide-mm-ide-scsi-add-ide_scsi_hex_dump-helper.patch
-ide-mm-ide-add-missing-checks-for-control-register-existence.patch
-ide-mm-ide-deprecate-config_blk_dev_offboard.patch
-ide-mm-ide-fix-ide_scan_pcibus-error-message.patch
-ide-mm-ide-coding-style-fixes-for-drivers-ide-setup-pci-c.patch
-ide-mm-ide-add-sys-bus-ide-devices-model-firmware-serial-sysfs-entries.patch
-ide-mm-ide-dma-reporting-and-validity-checking-fixes-take-3.patch
-ide-mm-ide-cd-remove-dead-post_transform_command.patch
-ide-mm-pdc202xx_new-fix-promise-tx4-support.patch
-ide-mm-hpt366-fix-hpt37x-pio-mode-timings-take-2.patch
-ide-mm-ide-remove-dead-code-from-__ide_dma_test_irq.patch
-ide-mm-ide-remove-stale-changelog-from-ide-disk-c.patch
-ide-mm-ide-remove-stale-changelog-from-ide-probe-c.patch
-md-balance-braces-in-raid5-debug-code.patch
-mips-fix-makefile-borkage.patch
-mips-remove-dead-config-symbols-from-mips-code.patch
-ipsec-fix-reversed-icmp6-policy-check.patch
-ipsec-do-not-let-packets-pass-when-icmp-flag-is-off.patch
-git-net-fix-drivers-net-ns83820c-build.patch
-updates-to-nfsroot-documentation-take-3.patch
-net-use-mutex_is_locked-for-assert_rtnl.patch
-tipc-fix-semaphore-handling.patch
-ppp-synchronous-tty-convert-dead_sem-to-completion.patch
-ucc_geth-fix-build-break-introduced-by-commit-09f75cd7bf13720738e6a196cc0107ce9a5bd5a0-checkpatch-fixes.patch
-pcmcia-net-use-roundup_pow_of_two-macro-instead-of-grotesque-loop.patch
-net-ibm_newemac-remove-spin_lock_unlocked.patch
-e100-free-irq-to-remove-warning-when-rebooting.patch
-net-smc911x-shut-up-compiler-warnings.patch
-bnx2x-depends-on-zlib_inflate.patch
-plip-driver-convert-killed_timer_sem-to-completion.patch
-pcie-fix-double-initialization-bug.patch
-pci-dont-load-acpi_php-when-acpi-is-disabled.patch
-pci-dont-load-acpi_php-when-acpi-is-disabled-fix.patch
-track-accurate-idle-time-with-tick_schedidle_sleeptime.patch
-merge-multiple-error-paths-in-alloc_uid-into-one.patch
-kernel-time-make-tick_do_broadcast-static.patch
-git-scsi-misc-fix-build-in-drivers-scsi-scsi_tgt_libc.patch
-initio-fix-conflict-when-loading-driver.patch
-ips-remove-ips_ha-members-that-duplicate-struct-pci_dev-members.patch
-ips-trim-trailing-whitespace.patch
-ips-pci-api-cleanups.patch
-ips-handle-scsi_add_host-failure-and-other-err-cleanups.patch
-scsi-gdth-kill-unneeded-irq-argument.patch
-scsi-sym53c416-kill-pointless-irq-handler-loop-and-test.patch
-scsi-ncr5380-minor-irq-handler-cleanups.patch
-advansys-fix-section-mismatch-warning.patch
-aic94-fix-section-mismatches.patch
-sym2-fix-section-mismatch-warning.patch
-aacraid-driver-fails-with-dell-poweredge-expandable-raid-controller-3-di.patch
-drivers-scsi-sgiwd93c-export-sgiwd93_reset.patch
-hptiop-add-more-adapter-models-and-other-fixes.patch
-hptiop-add-more-adapter-models-and-other-fixes-update.patch
-hptiop-add-more-adapter-models-and-other-fixes-fix-2.patch
-drivers-scsi-iprc-use-list_head-instead-of-list_head_init.patch
-libsas-convert-ata-bridge-to-use-new-eh-checkpatch-fixes.patch
-belkin_sa-clean-up-for-new-style-termios-and-speed.patch
-keyspan_pda-clean-up-speed-handling.patch
-mct232-speed-new-termios-and-compliance-cleanups.patch
-mct232-speed-new-termios-and-compliance-cleanups-fix.patch
-ohci-hcdcohci_irq-locking-fix.patch
-edgeport-usb-serial-converter-convert-es_sem-to-mutex.patch
-usb-testing-driver-convert-dev-sem-to-mutex.patch
-usb-testing-driver-dont-free-a-locked-mutex.patch
-usb-mon-nopage.patch
-txx9-watchdog-driver.patch
-net-mac80211-fix-inappropriate-memory-freeing.patch
-wireless-libertas-dont-cast-a-pointer-to-pointer-of-list_head.patch
-git-x86-__vdso_getcpu-warning-fix.patch
-git-x86-fix-allnoconfig-build.patch
-uml-add-asm-um-asmh.patch
-x86_64-add-acpi-reboot-option.patch
-clocksource-make-clocksource_mask-bullet-proof.patch
-time-fold-__get_realtime_clock_ts-into-getnstimeofday.patch
-mcheck-mce_64-mce_read_sem-to-mutex.patch
-x86_64-efi-runtime-service-support-efi-basic-runtime-service-support.patch
-x86_64-efi-runtime-service-support-efi-basic-runtime-service-support-fixes.patch
-x86_64-efi-runtime-service-support-efi-basic-runtime-service-support-calling-convention-fix.patch
-x86_64-efi-runtime-service-support-efi-runtime-services.patch
-x86_64-efi-runtime-service-support-document-for-efi-runtime-services.patch
-x86_64-efi-runtime-service-support-remove-duplicated-code-from-efi_32c.patch
-x86-boot-use-e820-memory-map-on-efi-32-platform.patch
-ieee80211_rate-missed-unlock.patch
-drivers-cpufreq-cpufreq_statsc-section-fix.patch
-bonding-locking-fix.patch
-bridge-assign-random-address.patch
-nfs-fix-an-oops-in-nfs-unmount.patch
-acpi-sbs-reset-alarm-bit.patch
-acpi-sbs-ignore-alarms-coming-from-unknown-devices.patch
-acpi-sbs-return-rate-in-mw-if-capacity-in-mwh.patch
-usb-use-irqf_disabled-for-hcd-interrupt-handlers.patch
-usb-at91_udc-correct-hanging-while-disconnecting-usb-cable.patch
-iwlwifi3945-4965-fix-rate-control-algo-reference-leak.patch
-iwlwifi3945-4965-fix-rate-control-algo-reference-leak-fix.patch
-mm-sparsec-check-the-return-value-of-sparse_index_alloc.patch
-mm-sparsec-improve-the-error-handling-for-sparse_add_one_section.patch
-mm-sparsec-improve-the-error-handling-for-sparse_add_one_section-fix.patch
-pktcdvd-add-kobject_put-when-kobject-register-fails.patch
-libertas-select-wireless_ext.patch
-bcm43xx_debugfs-sscanf-fix.patch
-apm_eventinfo_t-are-userspace-types.patch
-drivers-macintosh-via-pmuc-added-a-missing-iounmap.patch
-tmpfs-fix-mounts-when-size-is-less-than-the-page-size.patch
-shmem-factor-out-sbi-free_inodes-manipulations.patch
-shmem-factor-out-sbi-free_inodes-manipulations-fix.patch
-i-oat-fixups-from-code-comments.patch
-rcu-move-three-variables-to-__read_mostly-to-save-space.patch
-ext4-fix-mb_debug-format-warnings.patch
-jbd2-remove-printk-from-j_assert-macros.patch
-64-bit-i_version-afs-fixes.patch
-ext4-fix-freespace-accounting-with-mballoc-on-32bit-machines.patch
-ext4-fix-oops-with-jbd-stats-through-procfs-and-external.patch
-ext4-superc-fix-ifdefs.patch
-ext4-add-block-bitmap-validation.patch
-ext4-fix-up-ext4fs_debug-builds.patch
-jbd2-fix-assertion-failure-in-fs-jbd2-checkpointc.patch
-ext4-check-for-the-correct-error-return-from-ext4_ext_get_blocks.patch
-ext4-check-for-the-correct-error-return-from-ext4_ext_get_blocks-fix.patch
-drivers-dma-iop-admac-use-list_head-instead-of-list_head_init.patch

Merged into mainline or a subsystem tree

+quicklists-do-not-release-off-node-pages-early.patch
+ecryptfs-fix-string-overflow-on-long-cipher-names.patch
+fix-computation-of-skb-size-for-quota-messages.patch
+dont-send-quota-messages-repeatedly-when-hardlimit-reached.patch
+ecryptfs-fix-unlocking-in-error-paths.patch
+ecryptfs-redo-dgetmntget-on-dentry_open-failure.patch
+maintainers-mailing-list-archives-are-web-links.patch
+ps3-vuart-fix-error-path-locking.patch
+lib-proportion-fix-underflow-in-prop_norm_percpu.patch
+pcmcia-remove-pxa2xx_lubbock-build-warning.patch
+kconfig-obey-kconfig_allconfig-choices-with-randconfig.patch

2.6.24 queue

+fix-crash-with-flat_memory-and-arch_pfn_offset-=-0.patch
+hfs-handle-more-on-disk-corruptions-without-oopsing.patch
+hfs-handle-more-on-disk-corruptions-without-oopsing-fix.patch
+tty-fix-logic-change-introduced-by-wait_event_interruptible_timeout.patch

Maybe 2.6.24 queue

+timerfd-v3-new-timerfd-api-make-hrtimer_forward-to-return-a-u64.patch
+timerfd-v3-new-timerfd-api-make-the-returned-time-to-be-the-remaining-time-till-the-next-expiration.patch
+timerfd-v3-new-timerfd-api-make-the-returned-time-to-be-the-remaining-time-till-the-next-expiration-checkpatch-fixes.patch

update timerfd patches

+git-alsa-fixup.patch
+sound-usb-usbaudioc-fix-build-with-config_pm=n.patch

Fix git-alsa

+git-agpgart-intel-agp-dont-zero-an-already-registered-resource-during-resume.patch

Fix crash in git-agpgart.patch

-arm-remove-dead-config-symbols-from-arm-code.patch

Dropped

+agk-dm-dm-snapshot-use-uninitialized_var.patch
+agk-dm-dm-raid1-handle-write-failures.patch
+agk-dm-dm-raid1-report-fault-status.patch
+agk-dm-dm-raid1-fix-eio-after-log-failure.patch
+agk-dm-dm-raid1-handle-read-failures.patch
+agk-dm-dm-raid1-mark-and-clear-nosync-writes.patch

device-mapper tree updates

+powerpc-add-fixed-phy-support-for-fs_enet.patch

powerpc net driver fix

+gregkh-driver-kref-add-kref_set.patch
+gregkh-driver-kobject-convert-sys-firmware-acpi-to-use-kobject_create.patch
+gregkh-driver-kobject-change-net-bridge-to-use-kobject_create_and_add.patch
+gregkh-driver-kobject-change-gfs2-to-use-kobject_init_and_add.patch
+gregkh-driver-kobject-change-drivers-infiniband-to-use-kobject_init_and_add.patch
+gregkh-driver-kobject-change-drivers-firmware-eddc-to-use-kobject_init_and_add.patch
+gregkh-driver-kobject-change-drivers-firmware-efivarsc-to-use-kobject_init_and_add.patch
+gregkh-driver-kobject-change-drivers-cpufreq-cpufreqc-to-use-kobject_init_and_add.patch
+gregkh-driver-kobject-change-drivers-edac-to-use-kobject_init_and_add.patch
+gregkh-driver-kobject-change-drivers-cpuidle-sysfsc-to-use-kobject_init_and_add.patch
+gregkh-driver-kobject-change-drivers-pci-hotplug-pci_hotplug_corec-to-use-kobject_init_and_add.patch
+gregkh-driver-kobject-change-drivers-base-sysc-to-use-kobject_init_and_add.patch
+gregkh-driver-kobject-change-arch-x86-kernel-cpu-intel_cacheinfoc-to-use-kobject_init_and_add.patch
+gregkh-driver-kobject-change-drivers-acpi-systemc-to-use-kobject_create_and_add.patch
+gregkh-driver-kobject-change-drivers-block-pktcdvdc-to-use-kobject_init_and_add.patch
+gregkh-driver-kobject-change-arch-sh-kernel-cpu-sh4-sqc-to-use-kobject_init_and_add.patch
+gregkh-driver-kobject-change-drivers-net-ibmvethc-to-use-kobject_init_and_add.patch
+gregkh-driver-kobject-change-drivers-parisc-pdc_stablec-to-use-kobject_init_and_add.patch
+gregkh-driver-kobject-change-arch-ia64-kernel-topologyc-to-use-kobject_init_and_add.patch
+gregkh-driver-kobject-change-drivers-md-mdc-to-use-kobject_init_and_add.patch
+gregkh-driver-kobject-change-arch-x86-kernel-cpu-mcheck-mce_amd_64c-to-use-kobject_create_and_add.patch
+gregkh-driver-kobject-change-arch-x86-kernel-cpu-mcheck-mce_amd_64c-to-use-kobject_init_and_add.patch
+gregkh-driver-kobject-the-cris-iop_fw_loadc-code-is-broken.patch
+gregkh-driver-kobject-convert-drivers-base-classc-to-use-kobject_init-add_ng.patch
+gregkh-driver-kobject-convert-drivers-base-corec-to-use-kobject_init-add_ng.patch
+gregkh-driver-kobject-convert-drivers-net-iseries_vethc-to-use-kobject_init-add_ng.patch
+gregkh-driver-kobject-convert-fs-char_devc-to-use-kobject_init-add_ng.patch
+gregkh-driver-kobject-convert-kernel-paramsc-to-use-kobject_init-add_ng.patch
+gregkh-driver-kobject-convert-kernel-userc-to-use-kobject_init-add_ng.patch
+gregkh-driver-kobject-convert-mm-slubc-to-use-kobject_init-add_ng.patch
+gregkh-driver-kobject-convert-net-bridge-br_ifc-to-use-kobject_init-add_ng.patch
+gregkh-driver-driver-add-driver_add_kobj-for-looney-iseries_veth-driver.patch
+gregkh-driver-kobject-change-drivers-base-bus-to-use-kobject_init_and_add.patch
+gregkh-driver-kobject-convert-block-elevatorc-to-use-kobject_init-add_ng.patch
+gregkh-driver-kobject-convert-block-ll_rw_blkc-to-use-kobject_init-add_ng.patch
+gregkh-driver-kobject-convert-drivers-md-mdc-to-use-kobject_init-add_ng.patch
+gregkh-driver-kobject-convert-kernel-modulec-to-use-kobject_init-add_ng.patch
+gregkh-driver-kobject-remove-kobject_add-as-no-one-uses-it-anymore.patch
+gregkh-driver-kobject-rename-kobject_add_ng-to-kobject_add.patch
+gregkh-driver-kobject-remove-kobject_init-as-no-one-uses-it-anymore.patch
+gregkh-driver-kobject-rename-kobject_init_ng-to-kobject_init.patch
+gregkh-driver-kobject-remove-kobject_register.patch
+gregkh-driver-kset-remove-kset_add-function.patch
+gregkh-driver-kobject-auto-cleanup-on-final-unref.patch
+gregkh-driver-kobject-convert-arch-from-kobject_unregister-to-kobject_put.patch
+gregkh-driver-kobject-convert-drivers-from-kobject_unregister-to-kobject_put.patch
+gregkh-driver-kobject-convert-fs-from-kobject_unregister-to-kobject_put.patch
+gregkh-driver-kobject-convert-remaining-kobject_unregister-to-kobject_put.patch
+gregkh-driver-kobject-remove-kobject_unregister-as-no-one-uses-it-anymore.patch
+gregkh-driver-driver-core-change-sysdev-classes-to-use-dynamic-kobject-names.patch
+gregkh-driver-kobject-remove-old-outdated-documentation.patch
+gregkh-driver-kobject-update-the-kobject-kset-documentation.patch
+gregkh-driver-kobject-add-sample-code-for-how-to-use-kobjects-in-a-simple-manner.patch
+gregkh-driver-kobject-add-sample-code-for-how-to-use-ksets-ktypes-kobjects.patch
+gregkh-driver-driver-core-use-list_head-instead-of-call-to-init_list_head-in-__init.patch

driver tree updates

+revert-gregkh-driver-pm-acquire-device-locks-prior-to-suspending.patch
+drivers-pcmcia-i82092c-fix-up-after-pci_bus_region-changes.patch

Fix it

+driver-base-memory-semaphore-to-mutex.patch

mutex conversion

-git-drm-oops-fix.patch

Now unneeded

+jdelvare-i2c-i2c-omap-fix-reset-on-error.patch
+jdelvare-i2c-i2c-spelling-fixes.patch
+jdelvare-i2c-i2c-tps65010-move-header.patch
+jdelvare-i2c-i2c-i801-01-document-features.patch
+jdelvare-i2c-i2c-i801-02-features-as-a-bitfield.patch
+jdelvare-i2c-i2c-i801-03-clear-block-buffer-mode.patch
+jdelvare-i2c-i2c-i801-04-add-support-for-i2c-block-read.patch
+jdelvare-i2c-i2c-id-document-optional.patch
+jdelvare-i2c-i2c-id-delete-unused.patch

I2C tree updates

+ia64-remove-dead-code.patch
+ia64-honor-notify_die-returning-notify_stop.patch

ia64 updates

+git-infiniband-versus-driver-tree.patch

+input-handle-ev_pwr-in-input_set_capability.patch

Input fix

+kvm-ist-kaput.patch

Disable KVM due to large clashes with git-x86

+git-lblnet-fixup.patch

Fix rejects in git-lblnet.patch

+git-libata-all-fix-pata_winbond-borkage.patch
+git-libata-all-wtf.patch

Fix git-libata-all.

-libata-xfer_mask-is-unsigned-int-not-unsigned-long-fix.patch

Folded into libata-xfer_mask-is-unsigned-int-not-unsigned-long.patch

+ide-mm-ide-spelling-fixes.patch
+ide-mm-hpt366-merge-set_dma_mode-methods.patch
+ide-mm-ide-fix-build-break-caused-by-ide-remove-ideprobe_init.patch
+ide-mm-ide-fix-io_32bit-race-in-ide_taskfile_ioctl.patch
+ide-mm-ide-clear-hob-bit-for-req_type_ata_cmd-requests-in-ide_end_drive_cmd.patch
+ide-mm-ide-fix-final-status-check-in-task_in_intr.patch
+ide-mm-ide-tape-fix-handling-of-non-special-requests-in-end_request-method.patch
+ide-mm-ide-set-ide_tflag_in-flags-before-queuing-executing-command.patch
+ide-mm-ide-remove-needless-cursg-clearing-from-task_end_request.patch
+ide-mm-ide-use-rq-nr_sectors-in-task_end_request.patch
+ide-mm-ide-task_end_request-fix.patch
+ide-mm-ide-kill-data_ready-define.patch
+ide-mm-ide-use-wait_drive_not_busy-in-drive_cmd_intr-take-2.patch
+ide-mm-ide-initialize-rq-cmd_type-in-ide_init_drive_cmd-callers.patch
+ide-mm-ide-convert-empty-req_type_ata_cmd-requests-to-use-req_type_ata_taskfile.patch
+ide-mm-ide-dont-enable-local-irqs-for-pio-in-in-driver_cmd_intr-take-2.patch
+ide-mm-ide-check-busy-and-error-status-bits-before-reading-data-in-drive_cmd_intr.patch
+ide-mm-ide-fix-final-status-check-in-drive_cmd_intr.patch
+ide-mm-ide-switch-set_xfer_rate-to-use-req_type_ata_taskfile-requests.patch
+ide-mm-ide-switch-ide_cmd_ioctl-to-use-req_type_ata_taskfile-requests.patch
+ide-mm-ide-remove-req_type_ata_cmd.patch
+ide-mm-ide-cd-fix-samsung-cd-rom-scr-3231-quirk.patch
+ide-mm-ide-cd-fix-acer-aopen-24x-cdrom-speed-reporting-on-big-endian-machines.patch
+ide-mm-ide-cd-use-ide_cd_release-in-ide_cd_probe.patch
+ide-mm-ide-cd-fix-error-messages-in-cdrom_read-write_check_ireason.patch
+ide-mm-ide-cd-add-missing-ireason-masking-to-cdrom_write_intr.patch
+ide-mm-ide-cd-fix-error-messages-in-cdrom_write_intr.patch
+ide-mm-ide-cd-add-error-message-for-dma-error-to-cdrom_read_intr.patch
+ide-mm-ide-cd-fix-error-message-in-cdrom_pc_intr.patch
+ide-mm-ide-cd-fix-ireason-reporting-in-cdrom_pc_intr.patch
+ide-mm-ide-cd-use-xfer_func_t-in-cdrom_pc_intr.patch
+ide-mm-ide-cd-add-ide_cd_pad_transfer-helper.patch
+ide-mm-ide-cd-fix-missing-data-handling-in-cdrom_pc_intr.patch
+ide-mm-ide-cd-fix-dma-error-handling-in-cdrom_newpc_intr.patch
+ide-mm-ide-cd-fix-trailing-whitespaces-in-changelog.patch
+ide-mm-ide-cd-move-historical-changelog-to-documentation-ide-changelog-ide-cd-1994-2004.patch
+ide-mm-ide-cd-remove-stale-cdrom_transfer_packet_command-comment.patch
+ide-mm-ide-cd-remove-unused-defines-from-ide-cd-h.patch
+ide-mm-ide-cd-remove-dead-code-from-cdrom_pc_intr.patch
+ide-mm-ide-cd-remove-unused-struct-atapi_cdrom_subchnl.patch
+ide-mm-ide-cd-remove-needless-zeroing-of-info-fields-from-ide_cdrom_setup.patch
+ide-mm-ide-cd-remove-unused-and-write-only-struct-ide_cd_config_flags-fields.patch
+ide-mm-ide-cd-remove-struct-atapi_mechstat_header-changer_info-slot.patch
+ide-mm-ide-cd-cleanup-ide_cdrom_update_speed.patch
+ide-mm-ide-cd-add-ide_cd_capabilities-define.patch
+ide-mm-ide-cd-remove-redundant-config-flags.patch
+ide-mm-ide-cd-kill-cdrom_config_flags-macro.patch
+ide-mm-ide-cd-kill-cdrom_state_flags-macro.patch
+ide-mm-ide-cd-remove-struct-atapi_capabilities_page.patch
+ide-mm-ide-cd-remove-struct-ide_cd_config-state_flags.patch
+ide-mm-ide-cd-remove-no_door_locking-define.patch
+ide-mm-ide-cd-remove-standard_atapi-define.patch
+ide-mm-ide-cd-use-bcd2bin-bin2bcd-macros-from-linux-bcd-h.patch
+ide-mm-ide-cd-re-organize-handling-of-quirky-devices.patch
+ide-mm-ide-cd-remove-duplicate-sense-keys-definitions-from-ide-cd-h.patch
+ide-mm-ide-cd-coding-style-fixes-for-verbose_ide_cd_errors-code.patch
+ide-mm-ide-cd-move-verbose_ide_cd_errors-code-to-ide-cd_verbose-c.patch
+ide-mm-ide-cd-factor-out-ioctl-handlers-from-ide_cdrom_audio_ioctl.patch
+ide-mm-ide-cd-merge-cdrom_play_audio-into-ide_cd_fake_play_trkind.patch
+ide-mm-ide-cd-merge-cdrom_read_subchannel-into-ide_cdrom_get_mcn.patch
+ide-mm-ide-cd-merge-cdrom_select_speed-into-ide_cdrom_select_speed.patch
+ide-mm-ide-cd-move-lba_to_msf-and-msf_to_lba-to-linux-cdrom-h.patch
+ide-mm-ide-cd-coding-style-fixes-for-cdrom_get_toc_entry.patch
+ide-mm-ide-cd-rename-cdrom_-functions-to-ide_cd_.patch
+ide-mm-ide-cd-move-code-handling-cdrom-c-ioctls-to-ide-cd_ioctl-c.patch
+ide-mm-ide-cd-remove-bug_on-from-cdrom_newpc_intr.patch
+ide-mm-ide-cd-call-blk_dump_rq_flags-on-missing-data-in-cdrom_newpc_intr.patch
+ide-mm-ide-cd-factor-out-request-sense-fixup-from-cdrom_pc_intr.patch
+ide-mm-ide-cd-unify-request-end-exit-path-in-cdrom_pc_intr.patch
+ide-mm-ide-cd-merge-cdrom_pc_intr-and-cdrom_newpc_intr.patch
+ide-mm-ide-cd-remove-cdrom_do_pc_continuation.patch
+ide-mm-ide-cd-merge-cdrom_do_packet_command-and-cdrom_do_block_pc.patch
+ide-mm-ide-cd-add-ide_cd_drain_data-helper.patch
+ide-mm-ide-cd-factor-out-transfer-size-checking-from-cdrom_read_intr.patch
+ide-mm-ide-cd-merge-cdrom_read_intr-and-cdrom_write_intr.patch
+ide-mm-ide-cd-merge-cdrom_start_read_continuation-and-cdrom_start_write_cont.patch
+ide-mm-ide-cd-merge-cdrom_start_read-and-cdrom_start_write.patch
+ide-mm-ide-cd-unify-moving-to-the-next-buffer-in-cdrom_rw_intr.patch
+ide-mm-ide-cd-prepare-cdrom_rw_intr-and-cdrom_newpc_intr-to-be-merged.patch
+ide-mm-ide-cd-call-blk_dump_rq_flags-on-missing-data-in-cdrom_rw_intr.patch
+ide-mm-ide-cd-merge-cdrom_rw_intr-and-cdrom_newpc_intr.patch
+ide-mm-ide-cd-merge-cdrom_write_check_ireason-and-cdrom_read_check_ireason.patch
+ide-mm-ide-cd-unify-request-end-exit-path-in-cdrom_decode_status.patch
+ide-mm-ide-cd-update-driver-version-comments-and-copyrights.patch

IDE tree updates

+git-net-fixup.patch

Fix rejects in git-net.patch

+git-net-vs-git-lblnet-2.patch

Fix disagreement between git-lblnet and git-net.

+git-net-vs-git-netdev-all.patch

Fix disagreement between git-netdev-all and git-net.

-backlight-omap1-backlight-driver-fix.patch

Fix backlight-omap1-backlight-driver.patch

+pcmcia-3c574_cs-fix-dubious-bitfield-warning.patch
+pcmcia-3c574_cs-fix-shadow-variable-warning.patch
+pcmcia-axnet_cs-make-functions-static.patch
+pcmcia-axnet_cs-make-use-of-max-instead-of-handcrafted-one.patch
+pcmcia-fmvj18x_cs-fix-shadow-variable-warning.patch
+pcmcia-pcnet_cs-fix-shadow-variable-warning.patch

pcmcia fixes

+serial-add-addi-data-gmbh-communication-cardsin8250_pcic-and-pci_idsh.patch
+serial-add-addi-data-gmbh-communication-cardsin8250_pcic-and-pci_idsh-checkpatch-fixes.patch

Serial device support

+gregkh-pci-pcie-fix-pcie-hotplug-so-that-it-works-with-expresscard-slots-on-dell-notebooks-in-conjunction-with-modparam-of-pciehp_force-1.patch
+gregkh-pci-pci-more-fixes-for-pcie-hotplug-so-that-it-works-with-expresscard-slots-on-dell-notebooks-in-conjunction-with-modparam-of-pciehp_force-1.patch
+gregkh-pci-pcie-make-use-of-the-previously-split-out-pcie_init_enable_events-function.patch
+gregkh-pci-pcie-fix-double-initialization-bug.patch
+gregkh-pci-pci-hotplug-acpiphp-fix-trivial-typos.patch
+gregkh-pci-pci-hotplug-acpiphp-remove-unneeded-acpi_get_name-function-call.patch
+gregkh-pci-pci-hotplug-pciehp-remove-needless-members-from-struct-controller.patch
+gregkh-pci-pci-hotplug-pciehp-remove-needless-hp_slot-calculation.patch
+gregkh-pci-pci-hotplug-pciehp-use-generic-function-to-find-ext-capability.patch
+gregkh-pci-pci-hotplug-pciehp-fix-some-whitespace-damage.patch
+gregkh-pci-pci-fix-bus-resource-assignment-on-32-bits-with-64b-resources.patch
+gregkh-pci-pci-fix-warning-in-setup-resc-on-32-bit-platforms-with-64-bit-resources.patch
+gregkh-pci-pci-remove-default-pci-expansion-rom-memory-allocation.patch
+gregkh-pci-pci-quirk-enable-msi-mapping-on-ht1000.patch
+gregkh-pci-pci-drivers-pci-msic-move-arch-hooks-to-the-top.patch
+gregkh-pci-pci-kconfig-help-don-t-refer-to-the-pci-howto.patch
+gregkh-pci-pci-spelling-fixes.patch
+gregkh-pci-pci-fix-for-quirk_e100_interrupt.patch
+gregkh-pci-pci-print-quirk-name-in-debug-messages.patch
+gregkh-pci-pci-use-dev_printk-in-quirk-messages.patch
+gregkh-pci-pci-use-dev_printk-in-x86-quirk-messages.patch
+gregkh-pci-pci-fix-typo-in-pci_save_pcix_state.patch
+gregkh-pci-pci-correctly-initialize-a-structure-for-pcie_save_pcix_state.patch
+gregkh-pci-pci-avoid-save-the-same-type-of-cap-multiple-times.patch
+gregkh-pci-pci-add-pci_enable_device_-io-mem-intefaces.patch
+gregkh-pci-pci-remove-users-of-pci_enable_device_bars.patch
+gregkh-pci-pci-remove-pci_enable_device_bars.patch

PCI tree updates

-quirk-enable-msi-mapping-on-ht1000.patch
-quirk-enable-msi-mapping-on-ht1000-v2.patch


Dropped

+if-0-pci_cleanup_aer_correct_error_status.patch
+cleanup-gregkh-pci-pci-fix-bus-resource-assignment-on-32-bits-with-64b-resources.patch

PCI cleanups

-pci-hotplug-mm-pci-hotplug-pciehp-deal-with-pre-inserted-expresscards.patch
-pci-hotplug-mm-pci-hotplug-pciehp-split-out-hardware-init-from-pcie_init.patch
-pci-hotplug-mm-pci-hotplug-pciehp-reinit-hotplug-h-w-on-resume-from-suspend.patch

Not sure what happened to these - they don't apply by a mile and don't seem
to have been merged.

+git-sched-fixup.patch
+git-sched-fix-preempt-rcu-on-non-preemptible-architectures.patch

Repair git-sched.patch

+scsi-megaraidc-__devexit-annotation.patch
+scsi-aic94xx-cleanups.patch
+scsi-aic94xx-cleanups-checkpatch-fixes.patch
+scsi-aic94xx-cleanups-checkpatch-fixes-checkpatch-fixes.patch
+small-cleanups-for-scsi_hosth.patch

scsi stuff

+scsi-scsi_data_buffer.patch
+scsi-pending-arm-convert-to-accessors.patch
+scsi-bidi-support.patch

More scsi stuff

+gregkh-usb-usb-unbreak-fsl_usb2_udc.patch
+gregkh-usb-usb-vid-pid-update-for-sierra.patch
+gregkh-usb-usb-new-device-id-for-the-cp2101-driver.patch
+gregkh-usb-usb-convert-ohci-debug-files-to-use-debugfs-instead-of-sysfs.patch
+gregkh-usb-usb-convert-ehci-debug-files-to-use-debugfs-instead-of-sysfs.patch
+gregkh-usb-usb-remove-ohci-useless-masking-unmasking-of-wdh-interrupt.patch
+gregkh-usb-usb-repair-usbdevfs_connect-ioctl.patch
+gregkh-usb-usb-updates-to-usb_reset_composite_device.patch
+gregkh-usb-usb-edgeport-usb-serial-converter-convert-es_sem-to-mutex.patch
+gregkh-usb-usb-add-usbfs-stubs-for-suspend-and-resume.patch
+gregkh-usb-usb-ehci-add-separate-iaa-watchdog-timer.patch
+gregkh-usb-usb-dummy_hcd-change-the-default-power-budget.patch
+gregkh-usb-usb-pl2303-cleanup-fish-and-soup-macros-in-pl2303-driver.patch
+gregkh-usb-usb-pl2303-move-pl2303-vendor-specific-init-to-probe-function.patch
+gregkh-usb-usb-pl2303-add-autosuspend-support-to-pl2303-usb-serial-converter.patch
+gregkh-usb-usb-update-pxa27x-ohci-driver-to-use-clk-support.patch
+gregkh-usb-usb-belkin_sa-clean-up-for-new-style-termios-and-speed-handling-plus-style.patch
+gregkh-usb-usb-keyspan_pda-clean-up-speed-handling.patch
+gregkh-usb-usb-mct232-speed-new-termios-and-compliance-cleanups.patch
+gregkh-usb-usb-mon-nopage.patch
+gregkh-usb-usb-testing-driver-convert-dev-sem-to-mutex.patch
+gregkh-usb-usb-testing-driver-don-t-free-a-locked-mutex.patch
+gregkh-usb-usb-gadget-pxa2xx_udc-supports-inverted-vbus.patch
+gregkh-usb-usb-spelling-fixes.patch
+gregkh-usb-usb-ps3-fix-ehci-iso-transfer-bug.patch
+gregkh-usb-usb-usb-storage-initializersc-fix-signedness-difference.patch
+gregkh-usb-usb-usbdevfs_urb-__user-annotation.patch
+gregkh-usb-usb-ehci-hcd-fix-sparse-warning-about-shadowing-status-symbol.patch
+gregkh-usb-usb-add-marvell-orion-usb-host-support.patch
+gregkh-usb-usb-ehci-potential-oops-fix-on-arc-tdi-cores.patch
+gregkh-usb-usb-gadget-ethernet-error-path-potential-oops-fix.patch
+gregkh-usb-usb-fix-null-pointer-dereference-on-drivers-usb-serial-whiteheatc.patch
+gregkh-usb-usb-gadget-at91_udc-minor-fix.patch
+gregkh-usb-usb-fix-hcd-kconfig-goofage.patch
+gregkh-usb-usb-tosa_udc_use_gpio_vbuspatch.patch

USB tree updates

+ehci-hcd-fix-sparse-warning-about-shadowing-status-symbol-checkpatch-fixes.patch
+usb-microtek-remove-unused-semaphore.patch
+usb-libusual-locking-cleanup.patch

USB things

+git-watchdog-fixup.patch

Fix rejects in git-watchdog

-add-support-for-sb1-hardware-watchdog-fix.patch

Folded into add-support-for-sb1-hardware-watchdog.patch

+prism54-remove-questionable-down_interruptible-usage.patch

wireless fix

+git-x86-fixup.patch
+git-x86-arch-x86-math-emu-errorsc-fix-printk-warnings.patch
+git-x86-drivers-pnp-pnpbios-bioscallsc-build-fix.patch
+git-x86-fix-doubly-merged-patch.patch
+git-x86-export-leave_mm.patch

Partially repair git-x86

+pci-dont-load-acpi_php-when-acpi-is-disabled.patch
+arch-x86-kernel-cpu-mcheck-p4c-kernel-2624-rc5.patch
+arch-x86-kernel-cpu-mcheck-p4c-kernel-2624-rc5-checkpatch-fixes.patch
+arch-x86-kernel-cpu-mcheck-p4c-kernel-2624-rc5-checkpatch-fixes-checkpatch-fixes.patch

x86 stuff

+drm-i915-fix-oops-after-killing-x.patch
+usbtouchscreen-fix-buffer-overflow-make-more-egalax-work.patch
+usbtouchscreen-fix-buffer-overflow-make-more-egalax-work-checkpatch-fixes.patch
+fix-rtc_aie-with-config_hpet_emulate_rtc.patch
+cpufreq-initialise-default-governor-before-use.patch

Probably for 2.6.24, via subsystem trees

-slub-optimise-the-clearing-of-__gfp_zero.patch

Unneeded

+shmem-factor-out-sbi-free_inodes-manipulations.patch
+shmem-factor-out-sbi-free_inodes-manipulations-fix.patch
+tmpfs-fix-mounts-when-size-is-less-than-the-page-size.patch
+tmpfs-move-swap_state-stats-update.patch
+tmpfs-shuffle-add_to_swap_caches.patch
+tmpfs-move-swap-swizzling-into-shmem.patch
+tmpfs-allow-filepage-alongside-swappage.patch
+tmpfs-allocate-on-read-when-stacked.patch
+tmpfs-make-shmem_unuse-more-preemptible.patch
+tmpfs-open-a-window-in-shmem_unuse_inode.patch
+tmpfs-radix_tree_preloading.patch
+tmpfs-fix-shmem_swaplist-races.patch

MM things

+maps4-add-proc-kpagecount-interface-fix.patch
+maps4-add-proc-kpageflags-interface-fix.patch
+maps4-add-proc-kpageflags-interface-fix-2.patch
+maps4-add-proc-kpageflags-interface-fix-2-fix.patch

maps4 fixes

+page-allocator-clean-up-pcp-draining-functions-swsusp-fix.patch
+page-allocator-clean-up-pcp-draining-functions-swsusp-fix-fix.patch

Fix page-allocator-clean-up-pcp-draining-functions.patch

+mm-remove-fastcall-from-mm.patch
+mm-remove-fastcall-from-mm-checkpatch-fixes.patch
+set_page_refcounted-vm_bug_on-fix.patch
+fix-dirty-page-accounting-leak-with-ext3-data=journal.patch
+oom_kill-remove-uid==0-checks.patch

More mm things

+m68knommu-remove-duplicate-exports.patch

m68knommu cleanup

+arch-cris-arch-v10-vmlinuxldss-fix-boot-problem.patch
+cris-remove-unused-__dummy-const_addr-and-addr-from-bitopsh.patch

cris updates

+uml-header-untangling-fix.patch

Fix uml-header-untangling.patch

+ik8-add-dell-uk-6400-inspiron-model-mm061.patch
+parport_pc-detection-for-superio-it87xx-post.patch
+lib-extablec-removes-an-expensive-integer-divide-in-search_extable.patch
+kernel-paramsc-remove-sparse-warning-different-signedness.patch
+fix-missing-n-in-checkpatchpl.patch
+xen-fiddle_vdso-must-be-__init.patch
+calibrate_delay-must-be-__cpuinit.patch
+idle_regs-must-be-__cpuinit.patch
+kernel-sysc-get-rid-of-expensive-divides-in-groups_sort.patch
+debug_smp_processor_id-fixlets.patch
+use-ilog2-in-fs-namespacec.patch
+use-ilog2-in-fs-namespacec-fix.patch
+printk_ratelimit-functions-should-use-config_printk.patch
+w1-gpio-add-gpio-w1-bus-master-driver.patch
+docs-convert-kref-semaphore-to-mutex.patch
+fix-ixany-and-restart-after-signal-eg-ctrl-c-in-n_tty-line-discipline.patch
+maintainers-remove-adam-fritzler-update-his-email-address-in-other-sources.patch
+avoid-overflows-in-kernel-timec.patch
+avoid-overflows-in-kernel-timec-fix.patch
+export-iov_shorten-for-ext4s-use.patch
+export-iov_shorten-for-ext4s-use-fix.patch

Misc

+ser_gigaset-convert-mutex-to-completion.patch

gigaset cleanup

+ecryptfs-remove-debug-as-mount-option-and-warn-if-set-via-modprobe.patch
+ecryptfs-minor-fixes-to-printk-messages.patch
+ecryptfs-change-the-type-of-cipher_code-from-u16-to-u8.patch
+ecryptfs-load-each-file-decryption-key-only-once.patch

ecryptfs work

-logo-move-declarations-of-logos-to-linux_logoh.patch
-logo-move-declarations-of-logos-to-linux_logoh-fix.patch

Dropped these - it was too much work trying to make them work.

+drivers-video-pm3fbc-section-fix.patch
+neofb-avoid-overwriting-fb_info-fields.patch

fbdev things

+md-support-external-metadata-for-md-arrays.patch
+md-give-userspace-control-over-removing-failed-devices-when-external-metdata-in-use.patch
+md-allow-a-maximum-extent-to-be-set-for-resyncing.patch
+md-allow-devices-to-be-shared-between-md-arrays.patch
+md-lock-address-when-changing-attributes-of-component-devices.patch
+md-allow-an-md-array-to-appear-with-0-drives-if-it-has-external-metadata.patch

RAID updates

-ext4-mm-ext4_grpnum_t.patch
-ext4-mm-ext4_grpnum_t_int_fix.patch
-ext4-mm-ext4-cleanup.patch
-ext4-mm-ext4-cleanup-2.patch
-ext4-mm-ext4-cleanup-3.patch
-ext4-mm-ext4-cleanup-4.patch
+ext4-mm-ext4_extents_use_ext4_lblk_t_fix.patch
+ext4-mm-ext4_extents_remove_unneeded_casts.patch
+ext4-mm-ext4_grp_t.patch
+ext4-mm-ext4_grp_t_int_fix.patch
+ext4-mm-ext4_add_update_incompat_feature.patch
+ext4-mm-ext4-sparse-warning-fix.patch
+ext4-mm-ext4-rename-i_file_acl-to-i_file_acl_lo.patch
+ext4-mm-ext4-rename-i_dir_acl-to-i_size_high.patch
+ext4-mm-ext4_different_maxbytes_funcs_for_bitmap_and_extent_files.patch
+ext4-mm-ext4_store_maxbytes_for_bitmaped_files.patch
+ext4-mm-ext4_ifdef_fix.patch
+ext4-mm-ext4_fix_oops_on_corrupted_mount.patch
+ext4-mm-change-default-ext4-error.patch
+ext4-mm-ext4_add_block_bitmap_validation.patch
+ext4-mm-ext4-add-block-bitmap-validation-fix.patch
+ext4-mm-jbd2-remove-printk-from-j_assert-macros.patch
+ext4-mm-jbd2-fix-assertion-failure-in-fs-jbd2-checkpointc.patch
+ext4-mm-ext4-check-for-the-correct-error-return-from-ext4_ext_get_blocks.patch
+ext4-mm-ext4-check-for-the-correct-error-return-from-ext4_ext_get_blocks-fix.patch
+ext4-mm-remove-unused-code-from-ext4_find_entry.patch
+ext4-mm-jbd-stats-through-procfs-with-external-journal-oops-fix.patch
+ext4-mm-ext4_jbd2_stats_kmalloc_failure_fix.patch
+ext4-mm-ext4_jbd2_stats_comments_fix.patch
+ext4-mm-ext4_accumulated_jbd2_stats_in_jiffies.patch
+ext4-mm-ext4_open-code-jbd2_stats-union-references.patch
-ext4-mm-64-bit-i_version.patch
-ext4-mm-i_version_hi.patch
-ext4-mm-ext4_i_version_hi_2.patch
-ext4-mm-i_version_update_ext4.patch
+ext4-mm-ext4_journal_chksum_highmem_fix.patch
+ext4-mm-inode-version-vfs.patch
+ext4-mm-inode-version-ext4.patch
+ext4-mm-64-bit-i_version-afs-fixes.patch
-ext4-mm-mballoc-bug-workaround.patch
-ext4-mm-ext4_grpnumt-mballoc-fix.patch
-ext4-mm-mballoc-compilebench-fix.patch
+ext4-mm-ext4-fix-mb_debug-format-warnings.patch
+ext4-mm-ext4_mballoc_freespace_accounting_fix.patch
+ext4-mm-enable-delalloc-and-mballoc.patch
+ext4-mm-show-mballoc-delalloc-option.patch
+ext4-mm-fix-show-options.patch
+ext4-mm-ext4_fix_up_ext4fs_debug_builds.patch

Changes in the ext4 tree

+ext4-mm-ext4_store_maxbytes_for_bitmaped_files-warning-fix.patch

ext4 fix

+ext3-remove-unused-code-from-ext3_find_entry.patch

ext3 cleanup

+memory-controller-memory-accounting-v7-move-page_assign_page_cgroup-to-vm_bug_on-in-free_hot_cold_page.patch
+memory-controller-use-rcu_read_lock-in-mem_cgroup_cache_charge.patch
+memcgroup-tidy-up-mem_cgroup_charge_common.patch
+memcgroup-fix-hang-with-shmem-tmpfs.patch
+memory-controller-remove-control_type-feature.patch

Memory controller updates

+iget-stop-isofs-from-using-read_inode-fix-2.patch
+iget-stop-isofs-from-using-read_inode-fix-2-update.patch

Fix iget-stop-isofs-from-using-read_inode.patch

+pid-namespaces-vs-locks-interaction.patch

namespaces fix

+pid-fix-mips-irix-emulation-pid-usage-fix.patch

Fix pid-fix-mips-irix-emulation-pid-usage.patch

+aout-move-stack_top-to-asm-processorh-fix.patch

Fix aout-move-stack_top-to-asm-processorh.patch against git-x86 changes

+fs-remove-fastcall-it-is-always-empty.patch
+fs-remove-fastcall-it-is-always-empty-checkpatch-fixes.patch
+kernel-remove-fastcall-in-kernel.patch
+kernel-remove-fastcall-in-kernel-checkpatch-fixes.patch
+lib-remove-fastcall-from-lib.patch
+lib-remove-fastcall-from-lib-checkpatch-fixes.patch
+remove-fastcall-from-linux-include.patch
+remove-fastcall-from-linux-include-checkpatch-fixes.patch
+asm-generic-remove-fastcall.patch
+misc-removal-of-final-callers-using-fastcall.patch

Start removal of fastcall

+udf-remove-wrong-prototype-of-udf_readdir.patch
+udf-improve-readability-of-do_udf_readdir.patch
+udf-fix-coding-style-of-dirc.patch
+udf-fix-3-signedness-1-unitialized-variable-warnings.patch
+udf-fix-signedness-issue.patch

UDF fixlets

+constify-tables-in-kernel-sysctl_checkc.patch
+constify-tables-in-kernel-sysctl_checkc-fix.patch

Cleanups

+aoe-bring-driver-version-number-to-47.patch
+aoe-handle-multiple-network-paths-to-aoe-device.patch
+aoe-mac_addr-avoid-64-bit-arch-compiler-warnings.patch
+aoe-clean-up-udev-configuration-example.patch
+aoe-eliminate-goto-and-improve-readability.patch
+aoe-user-can-ask-driver-to-forget-previously-detected-devices.patch
+aoe-dynamically-allocate-a-capped-number-of-skbs-when-necessary.patch
+aoe-only-install-new-aoe-device-once.patch
+aoe-add-module-parameter-for-users-who-need-more-outstanding-i-o.patch
+aoe-the-aoeminor-doesnt-need-a-long-format.patch
+aoe-make-error-messages-more-specific.patch
+aoe-update-copyright-date.patch
+aoe-statically-initialise-devlist_lock.patch

AOE driver update

+use-pgoff_t-instead-of-unsigned-long.patch

MM cleanup

+ipc-convert-handmade-min-to-min.patch

IPC cleanup

+reiser4-replace-uid==0-check-with-capability.patch

reiser4 fix



6223 commits in 1825 patch files

All patches:

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc6/2.6.24-rc6-mm1/patch-list


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Ingo Molnar
2007-12-23 11:10:08 UTC
Post by Andrew Morton
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc6/2.6.24-rc6-mm1/
- This kernel doesn't work on i386!
It oopses late in boot due to an unrevertable change (e3c1b141) in
git-x86 which I stared at for a while then I ran out of time and
gave up.
hm, the fix for that is in x86.git already - perhaps you got an older
copy?

Ingo
--
Ingo Molnar
2007-12-23 11:20:13 UTC
Post by Ingo Molnar
Post by Andrew Morton
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc6/2.6.24-rc6-mm1/
- This kernel doesn't work on i386!
It oopses late in boot due to an unrevertable change (e3c1b141) in
git-x86 which I stared at for a while then I ran out of time and
gave up.
hm, the fix for that is in x86.git already - perhaps you got an older
copy?
hm, e3c1b141 is already the latest one.

Ingo
--
Andrew Morton
2007-12-23 11:40:04 UTC
Post by Ingo Molnar
Post by Ingo Molnar
Post by Andrew Morton
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc6/2.6.24-rc6-mm1/
- This kernel doesn't work on i386!
It oopses late in boot due to an unrevertable change (e3c1b141) in
git-x86 which I stared at for a while then I ran out of time and
gave up.
hm, the fix for that is in x86.git already - perhaps you got an older
copy?
hm, e3c1b141 is already the latest one.
"already in", I assume.


You can always tell what I have by looking at the patch:

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc6/2.6.24-rc6-mm1/broken-out/git-x86.patch

It includes the head commit ID on the first line, in machine-readable
form. (Matthias's scripts, which prepare the mm git tree, actually get the
git-foo.patch info directly from the original repo rather than by applying
the diff from broken-out/.)

Still. The crash is 100% repeatable and is the same every time. Happens
on both my i386 test boxes.

http://userweb.kernel.org/~akpm/config-sony.txt
http://userweb.kernel.org/~akpm/config-vmm.txt

and I bisected it down to e3c1b141.
--
Ingo Molnar
2007-12-23 12:00:20 UTC
Post by Andrew Morton
Still. The crash is 100% repeatable and is the same every time.
Happens on both my i386 test boxes.
http://userweb.kernel.org/~akpm/config-sony.txt
http://userweb.kernel.org/~akpm/config-vmm.txt
and I bisected it down to e3c1b141.
ok, can reproduce it - the patch below fixes it for me.

Ingo

------------------------->
Subject: x86: fix system gate related crash
From: Ingo Molnar <***@elte.hu>

on 32-bit, system gates are traps.

on 64-bit, they are interrupts (which disable hardirqs).

Signed-off-by: Ingo Molnar <***@elte.hu>
---
 include/asm-x86/desc.h |    4 ++++
 1 file changed, 4 insertions(+)

Index: linux-x86.q/include/asm-x86/desc.h
===================================================================
--- linux-x86.q.orig/include/asm-x86/desc.h
+++ linux-x86.q/include/asm-x86/desc.h
@@ -310,7 +310,11 @@ static inline void set_trap_gate(unsigne
 static inline void set_system_gate(unsigned int n, void *addr)
 {
 	BUG_ON((unsigned)n > 0xFF);
+#ifdef CONFIG_X86_32
+	_set_gate(n, GATE_TRAP, addr, 0x3, 0, __KERNEL_CS);
+#else
 	_set_gate(n, GATE_INTERRUPT, addr, 0x3, 0, __KERNEL_CS);
+#endif
 }
 
 static inline void set_task_gate(unsigned int n, unsigned int gdt_entry)
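
[Note: the distinction the patch relies on can be sketched in plain C. In
the x86 descriptor format, gate type 0xE is an interrupt gate (the CPU
clears the interrupt flag on entry) and 0xF is a trap gate (the interrupt
flag is left alone). The helper names gate_access_byte() and
irqs_enabled_after_entry() below are hypothetical and exist only for this
illustration; they are not kernel functions.]

```c
#include <stdbool.h>
#include <stdint.h>

/* Gate type encodings from the x86 descriptor format. */
enum gate_type {
	GATE_INTERRUPT = 0xE,	/* CPU clears the interrupt flag on entry */
	GATE_TRAP      = 0xF,	/* CPU leaves the interrupt flag unchanged */
};

/*
 * Hypothetical helper: build the access byte of an IDT entry.
 * Layout: bit 7 = present, bits 6..5 = DPL, bit 4 = 0, bits 3..0 = type.
 */
static uint8_t gate_access_byte(enum gate_type type, unsigned int dpl)
{
	return (uint8_t)(0x80 | ((dpl & 0x3) << 5) | (type & 0xF));
}

/*
 * Hypothetical model of the hardware behaviour: are interrupts still
 * enabled once the handler starts running?
 */
static bool irqs_enabled_after_entry(enum gate_type type, bool enabled_before)
{
	return type == GATE_TRAP ? enabled_before : false;
}
```

[With DPL 3, as passed by set_system_gate(), the two variants differ only
in the low bit of the access byte, but a handler written for one type and
entered through the other sees a different interrupt state.]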
--
Christoph Hellwig
2007-12-23 12:20:05 UTC
Post by Ingo Molnar
Post by Andrew Morton
Still. The crash is 100% repeatable and is the same every time.
Happens on both my i386 test boxes.
http://userweb.kernel.org/~akpm/config-sony.txt
http://userweb.kernel.org/~akpm/config-vmm.txt
and I bisected it down to e3c1b141.
ok, can reproduce it - the patch below fixes it for me.
Ingo
------------------------->
Subject: x86: fix system gate related crash
on 32-bit, system gates are traps.
on 64-bit, they are interrupts (which disable hardirqs).
---
include/asm-x86/desc.h | 4 ++++
1 file changed, 4 insertions(+)
Index: linux-x86.q/include/asm-x86/desc.h
===================================================================
--- linux-x86.q.orig/include/asm-x86/desc.h
+++ linux-x86.q/include/asm-x86/desc.h
@@ -310,7 +310,11 @@ static inline void set_trap_gate(unsigne
static inline void set_system_gate(unsigned int n, void *addr)
{
BUG_ON((unsigned)n > 0xFF);
+#ifdef CONFIG_X86_32
+ _set_gate(n, GATE_TRAP, addr, 0x3, 0, __KERNEL_CS);
+#else
_set_gate(n, GATE_INTERRUPT, addr, 0x3, 0, __KERNEL_CS);
+#endif
}
This would be a lot cleaner with entirely separate implementations
of set_system_gate for 32-bit vs 64-bit, especially if the file already
has a large ifdef block for 32 vs 64.
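
[Note: a minimal sketch of the layout suggested here, with one complete
definition per architecture instead of an #ifdef inside the function body.
It is not the code that was merged. stub_set_gate() and last_gate are
hypothetical stand-ins for the kernel's _set_gate() so that the fragment
compiles on its own; the BUG_ON() check and the IST and segment arguments
are left out for brevity.]

```c
enum gate_type {
	GATE_INTERRUPT = 0xE,
	GATE_TRAP      = 0xF,
};

/* Hypothetical stand-in for _set_gate(): just records the last request. */
struct gate_record {
	unsigned int vector;
	enum gate_type type;
	void *addr;
	unsigned int dpl;
};

static struct gate_record last_gate;

static void stub_set_gate(unsigned int n, enum gate_type type,
			  void *addr, unsigned int dpl)
{
	last_gate.vector = n;
	last_gate.type = type;
	last_gate.addr = addr;
	last_gate.dpl = dpl;
}

#ifdef CONFIG_X86_32
/* 32-bit: system gates are trap gates. */
static inline void set_system_gate(unsigned int n, void *addr)
{
	stub_set_gate(n, GATE_TRAP, addr, 0x3);
}
#else
/* 64-bit: system gates are interrupt gates (hardirqs disabled on entry). */
static inline void set_system_gate(unsigned int n, void *addr)
{
	stub_set_gate(n, GATE_INTERRUPT, addr, 0x3);
}
#endif
```

[Building with -DCONFIG_X86_32 selects the trap-gate variant; without it
the interrupt-gate variant is used.]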
--
Rafael J. Wysocki
2007-12-23 12:20:09 UTC
Post by Andrew Morton
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc6/2.6.24-rc6-mm1/
- This kernel doesn't work on i386!
It oopses late in boot due to an unrevertable change (e3c1b141) in git-x86
which I stared at for a while then I ran out of time and gave up.
I would have just abandoned this release until it was fixed but I'll be
largely offline for ten days starting tomorrow.
The culprits have been notified and hopefully we'll have a patch for
hot-fixes/ tomorrow.
x86_64 and powerpc work OK though.
Well it doesn't build on x86-64 for me:

CHK include/linux/compile.h
CC arch/x86/ia32/../../../fs/compat_binfmt_elf.o
Assembler messages:
Fatal error: can't create arch/x86/ia32/../../../fs/.tmp_compat_binfmt_elf.o: No such file or directory
make[2]: *** [arch/x86/ia32/../../../fs/compat_binfmt_elf.o] Error 2

I will post the .config if anyone is interested.

Thanks,
Rafael
--
Ingo Molnar
2007-12-23 13:10:07 UTC
Post by Rafael J. Wysocki
CHK include/linux/compile.h
CC arch/x86/ia32/../../../fs/compat_binfmt_elf.o
Fatal error: can't create arch/x86/ia32/../../../fs/.tmp_compat_binfmt_elf.o: No such file or directory
make[2]: *** [arch/x86/ia32/../../../fs/compat_binfmt_elf.o] Error 2
I will post the .config if anyone is interested.
yes, please send the .config.

Ingo
--
Rafael J. Wysocki
2007-12-23 13:40:12 UTC
Post by Ingo Molnar
Post by Rafael J. Wysocki
CHK include/linux/compile.h
CC arch/x86/ia32/../../../fs/compat_binfmt_elf.o
Fatal error: can't create arch/x86/ia32/../../../fs/.tmp_compat_binfmt_elf.o: No such file or directory
make[2]: *** [arch/x86/ia32/../../../fs/compat_binfmt_elf.o] Error 2
I will post the .config if anyone is interested.
yes, please send the .config.
Attached.
It also may be relevant that I compile the kernel with "make O=../build".
I ran the compilation once again and it worked. Strange.

Thanks,
Rafael
--
Sam Ravnborg
2007-12-23 20:10:18 UTC
Post by Rafael J. Wysocki
Post by Ingo Molnar
Post by Rafael J. Wysocki
CHK include/linux/compile.h
CC arch/x86/ia32/../../../fs/compat_binfmt_elf.o
Fatal error: can't create arch/x86/ia32/../../../fs/.tmp_compat_binfmt_elf.o: No such file or directory
make[2]: *** [arch/x86/ia32/../../../fs/compat_binfmt_elf.o] Error 2
I will post the .config if anyone is interested.
yes, please send the .config.
Attached.
It also may be relevant that I compile the kernel with "make O=../build".
I ran the compilation once again and it worked. Strange.
Try to delete your fs/ directory in your output dir.
Then I expect the same bug to surface again.

I guess it is because arch/x86/ia32/ is built before fs/, and
gcc cannot create directories for its output files. The
dependency file is what triggers the error, as it is the first
file to be generated.
The right fix is to move the build of compat_binfmt_elf to
fs/Makefile, as already discussed.

Sam
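
[Note: the failure mode described above can be reproduced outside Kbuild.
The sketch below assumes a POSIX system; create_output() and
create_output_in() are hypothetical helpers written for this illustration.
Like a compiler or assembler writing its output, fopen() does not create
missing parent directories, so the first write into a not-yet-created
directory fails with ENOENT ("No such file or directory").]

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Try to create an output file at 'path'.
 * Returns 0 on success, or the errno value if the file cannot be created.
 */
static int create_output(const char *path)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return errno;
	fclose(f);
	return 0;
}

/*
 * Create the directory first, then the file. An already existing
 * directory is not treated as an error.
 * Returns 0 on success, or the errno value of the step that failed.
 */
static int create_output_in(const char *dir, const char *path)
{
	if (mkdir(dir, 0755) != 0 && errno != EEXIST)
		return errno;
	return create_output(path);
}
```

[Calling create_output() on a path inside a directory that does not exist
yet returns ENOENT; once the directory has been created, the same call
succeeds. That would also be consistent with a rebuild working after the
first attempt failed.]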
--
Rafael J. Wysocki
2007-12-24 02:00:39 UTC
Post by Sam Ravnborg
Post by Rafael J. Wysocki
Post by Ingo Molnar
Post by Rafael J. Wysocki
CHK include/linux/compile.h
CC arch/x86/ia32/../../../fs/compat_binfmt_elf.o
Fatal error: can't create arch/x86/ia32/../../../fs/.tmp_compat_binfmt_elf.o: No such file or directory
make[2]: *** [arch/x86/ia32/../../../fs/compat_binfmt_elf.o] Error 2
I will post the .config if anyone is interested.
yes, please send the .config.
Attached.
It also may be relevant that I compile the kernel with "make O=../build".
I ran the compilation once again and it worked. Strange.
Try to delete your fs/ directory in your output dir.
Then I expect the same bug to surface again.
It does surface indeed.
Post by Sam Ravnborg
I guess it is because arch/x86/ia32/ is built before fs/ and
gcc cannot create directories for the output files and
it is the dependency files that triggers the error as this
is the first file to be generated.
I think you are right.

Greetings,
Rafael
--
Ingo Molnar
2008-01-02 20:10:18 UTC
Post by Rafael J. Wysocki
Try to delete your fs/ directory in your output dir. Then I expect
the same bug to surface again.
It does surface indeed.
could you try the patch from Sam below - does it fix the problem?
Thanks,

Ingo

---------->
Subject: x86 compat_binfmt_elf, Makefile fixes
From: Sam Ravnborg <***@ravnborg.org>

Fix the build rules of compat_binfmt_elf.

Signed-off-by: Ingo Molnar <***@elte.hu>
---
arch/x86/Kconfig | 1 +
arch/x86/ia32/Makefile | 5 ++---
fs/Kconfig.binfmt | 10 ++++++++++
fs/Makefile | 1 +
4 files changed, 14 insertions(+), 3 deletions(-)

Index: linux-x86.q/arch/x86/Kconfig
===================================================================
--- linux-x86.q.orig/arch/x86/Kconfig
+++ linux-x86.q/arch/x86/Kconfig
@@ -1546,6 +1546,7 @@ source "fs/Kconfig.binfmt"
config IA32_EMULATION
bool "IA32 Emulation"
depends on X86_64
+ select HAVE_COMPAT_BINFMT_ELF
help
Include code to run 32-bit programs under a 64-bit kernel. You should
likely turn this on, unless you're 100% sure that you don't have any
Index: linux-x86.q/arch/x86/ia32/Makefile
===================================================================
--- linux-x86.q.orig/arch/x86/ia32/Makefile
+++ linux-x86.q/arch/x86/ia32/Makefile
@@ -2,7 +2,8 @@
# Makefile for the ia32 kernel emulation subsystem.
#

-obj-$(CONFIG_IA32_EMULATION) := ia32entry.o sys_ia32.o ia32_signal.o
+obj-$(CONFIG_IA32_EMULATION) := ia32entry.o sys_ia32.o ia32_signal.o \
+ ia32_binfmt.o

sysv-$(CONFIG_SYSVIPC) := ipc32.o
obj-$(CONFIG_IA32_EMULATION) += $(sysv-y)
@@ -11,5 +12,3 @@ obj-$(CONFIG_IA32_AOUT) += ia32_aout.o

audit-class-$(CONFIG_AUDIT) := audit.o
obj-$(CONFIG_IA32_EMULATION) += $(audit-class-y)
-
-obj-$(CONFIG_IA32_EMULATION) += ../../../fs/compat_binfmt_elf.o
Index: linux-x86.q/fs/Kconfig.binfmt
===================================================================
--- linux-x86.q.orig/fs/Kconfig.binfmt
+++ linux-x86.q/fs/Kconfig.binfmt
@@ -23,6 +23,16 @@ config BINFMT_ELF
ld.so (check the file <file:Documentation/Changes> for location and
latest version).

+# Archs supporting compatibility binfmt_elf shall select HAVE_COMPAT_BINFMT_ELF
+config HAVE_COMPAT_BINFMT_ELF
+
+config COMPAT_BINFMT_ELF
+ bool "Bla"
+ depends on HAVE_COMPAT_BINFMT_ELF
+ depends on MMU
+ help
+ Bla
+
config BINFMT_ELF_FDPIC
bool "Kernel support for FDPIC ELF binaries"
default y
Index: linux-x86.q/fs/Makefile
===================================================================
--- linux-x86.q.orig/fs/Makefile
+++ linux-x86.q/fs/Makefile
@@ -42,6 +42,7 @@ obj-$(CONFIG_BINFMT_ELF) += binfmt_elf.o
obj-$(CONFIG_BINFMT_ELF_FDPIC) += binfmt_elf_fdpic.o
obj-$(CONFIG_BINFMT_SOM) += binfmt_som.o
obj-$(CONFIG_BINFMT_FLAT) += binfmt_flat.o
+obj-$(CONFIG_COMPAT_BINFMT_ELF) += compat_binfmt_elf.o

obj-$(CONFIG_FS_MBCACHE) += mbcache.o
obj-$(CONFIG_FS_POSIX_ACL) += posix_acl.o xattr_acl.o
--
Rafael J. Wysocki
2008-01-02 20:40:09 UTC
Post by Ingo Molnar
Post by Rafael J. Wysocki
Try to delete your fs/ directory in your output dir. Then I expect
the same bug to surface again.
It does surface indeed.
could you try the patch from Sam below - does it fix the problem?
Well, with this patch applied the compilation reliably fails with:

No rule to make target `arch/x86/ia32/ia32_binfmt.o', needed by `arch/x86/ia32/built-in.o'.

[Do you want the .config, btw?]

Rafael
--
Ingo Molnar
2008-01-02 20:50:12 UTC
Post by Rafael J. Wysocki
Post by Ingo Molnar
Post by Rafael J. Wysocki
Try to delete your fs/ directory in your output dir. Then I expect
the same bug to surface again.
It does surface indeed.
could you try the patch from Sam below - does it fix the problem?
No rule to make target `arch/x86/ia32/ia32_binfmt.o', needed by `arch/x86/ia32/built-in.o'.
[Do you want the .config, btw?]
i think i'll wait for Roland and Sam to sort it out.

Ingo
--
Rafael J. Wysocki
2007-12-23 13:40:10 UTC
Post by Ingo Molnar
Post by Rafael J. Wysocki
CHK include/linux/compile.h
CC arch/x86/ia32/../../../fs/compat_binfmt_elf.o
Fatal error: can't create arch/x86/ia32/../../../fs/.tmp_compat_binfmt_elf.o: No such file or directory
make[2]: *** [arch/x86/ia32/../../../fs/compat_binfmt_elf.o] Error 2
I will post the .config if anyone is interested.
yes, please send the .config.
Attached.

It also may be relevant that I compile the kernel with "make O=../build".

Thanks,
Rafael
H. Peter Anvin
2007-12-24 02:00:43 UTC
Post by Rafael J. Wysocki
Post by Andrew Morton
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc6/2.6.24-rc6-mm1/
- This kernel doesn't work on i386!
It oopses late in boot due to an unrevertable change (e3c1b141) in git-x86
which I stared at for a while then I ran out of time and gave up.
I would have just abandoned this release until it was fixed but I'll be
largely offline for ten days starting tomorrow.
The culprits have been notified and hopefully we'll have a patch for
hot-fixes/ tomorrow.
x86_64 and powerpc work OK though.
CHK include/linux/compile.h
CC arch/x86/ia32/../../../fs/compat_binfmt_elf.o
Fatal error: can't create arch/x86/ia32/../../../fs/.tmp_compat_binfmt_elf.o: No such file or directory
make[2]: *** [arch/x86/ia32/../../../fs/compat_binfmt_elf.o] Error 2
I will post the .config if anyone is interested.
It's a Kbuild race -- if you keep re-building it will eventually build
the right file.

Not excusable, but that's what's going on.

-hpa
--
Torsten Kaiser
2007-12-23 16:30:11 UTC
Post by Andrew Morton
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc6/2.6.24-rc6-mm1/
[snip]
Post by Andrew Morton
+agk-dm-dm-snapshot-use-uninitialized_var.patch
+agk-dm-dm-raid1-handle-write-failures.patch
+agk-dm-dm-raid1-report-fault-status.patch
+agk-dm-dm-raid1-fix-eio-after-log-failure.patch
+agk-dm-dm-raid1-handle-read-failures.patch
+agk-dm-dm-raid1-mark-and-clear-nosync-writes.patch
device-mapper tree updates
[snip]
Post by Andrew Morton
+gregkh-driver-kref-add-kref_set.patch
+gregkh-driver-kobject-convert-sys-firmware-acpi-to-use-kobject_create.patch
+gregkh-driver-kobject-change-net-bridge-to-use-kobject_create_and_add.patch
+gregkh-driver-kobject-change-gfs2-to-use-kobject_init_and_add.patch
+gregkh-driver-kobject-change-drivers-infiniband-to-use-kobject_init_and_add.patch
+gregkh-driver-kobject-change-drivers-firmware-eddc-to-use-kobject_init_and_add.patch
+gregkh-driver-kobject-change-drivers-firmware-efivarsc-to-use-kobject_init_and_add.patch
+gregkh-driver-kobject-change-drivers-cpufreq-cpufreqc-to-use-kobject_init_and_add.patch
+gregkh-driver-kobject-change-drivers-edac-to-use-kobject_init_and_add.patch
+gregkh-driver-kobject-change-drivers-cpuidle-sysfsc-to-use-kobject_init_and_add.patch
+gregkh-driver-kobject-change-drivers-pci-hotplug-pci_hotplug_corec-to-use-kobject_init_and_add.patch
+gregkh-driver-kobject-change-drivers-base-sysc-to-use-kobject_init_and_add.patch
+gregkh-driver-kobject-change-arch-x86-kernel-cpu-intel_cacheinfoc-to-use-kobject_init_and_add.patch
+gregkh-driver-kobject-change-drivers-acpi-systemc-to-use-kobject_create_and_add.patch
+gregkh-driver-kobject-change-drivers-block-pktcdvdc-to-use-kobject_init_and_add.patch
+gregkh-driver-kobject-change-arch-sh-kernel-cpu-sh4-sqc-to-use-kobject_init_and_add.patch
+gregkh-driver-kobject-change-drivers-net-ibmvethc-to-use-kobject_init_and_add.patch
+gregkh-driver-kobject-change-drivers-parisc-pdc_stablec-to-use-kobject_init_and_add.patch
+gregkh-driver-kobject-change-arch-ia64-kernel-topologyc-to-use-kobject_init_and_add.patch
+gregkh-driver-kobject-change-drivers-md-mdc-to-use-kobject_init_and_add.patch
+gregkh-driver-kobject-change-arch-x86-kernel-cpu-mcheck-mce_amd_64c-to-use-kobject_create_and_add.patch
+gregkh-driver-kobject-change-arch-x86-kernel-cpu-mcheck-mce_amd_64c-to-use-kobject_init_and_add.patch
+gregkh-driver-kobject-the-cris-iop_fw_loadc-code-is-broken.patch
+gregkh-driver-kobject-convert-drivers-base-classc-to-use-kobject_init-add_ng.patch
+gregkh-driver-kobject-convert-drivers-base-corec-to-use-kobject_init-add_ng.patch
+gregkh-driver-kobject-convert-drivers-net-iseries_vethc-to-use-kobject_init-add_ng.patch
+gregkh-driver-kobject-convert-fs-char_devc-to-use-kobject_init-add_ng.patch
+gregkh-driver-kobject-convert-kernel-paramsc-to-use-kobject_init-add_ng.patch
+gregkh-driver-kobject-convert-kernel-userc-to-use-kobject_init-add_ng.patch
+gregkh-driver-kobject-convert-mm-slubc-to-use-kobject_init-add_ng.patch
+gregkh-driver-kobject-convert-net-bridge-br_ifc-to-use-kobject_init-add_ng.patch
+gregkh-driver-driver-add-driver_add_kobj-for-looney-iseries_veth-driver.patch
+gregkh-driver-kobject-change-drivers-base-bus-to-use-kobject_init_and_add.patch
+gregkh-driver-kobject-convert-block-elevatorc-to-use-kobject_init-add_ng.patch
+gregkh-driver-kobject-convert-block-ll_rw_blkc-to-use-kobject_init-add_ng.patch
+gregkh-driver-kobject-convert-drivers-md-mdc-to-use-kobject_init-add_ng.patch
+gregkh-driver-kobject-convert-kernel-modulec-to-use-kobject_init-add_ng.patch
+gregkh-driver-kobject-remove-kobject_add-as-no-one-uses-it-anymore.patch
+gregkh-driver-kobject-rename-kobject_add_ng-to-kobject_add.patch
+gregkh-driver-kobject-remove-kobject_init-as-no-one-uses-it-anymore.patch
+gregkh-driver-kobject-rename-kobject_init_ng-to-kobject_init.patch
+gregkh-driver-kobject-remove-kobject_register.patch
+gregkh-driver-kset-remove-kset_add-function.patch
+gregkh-driver-kobject-auto-cleanup-on-final-unref.patch
+gregkh-driver-kobject-convert-arch-from-kobject_unregister-to-kobject_put.patch
+gregkh-driver-kobject-convert-drivers-from-kobject_unregister-to-kobject_put.patch
+gregkh-driver-kobject-convert-fs-from-kobject_unregister-to-kobject_put.patch
+gregkh-driver-kobject-convert-remaining-kobject_unregister-to-kobject_put.patch
+gregkh-driver-kobject-remove-kobject_unregister-as-no-one-uses-it-anymore.patch
+gregkh-driver-driver-core-change-sysdev-classes-to-use-dynamic-kobject-names.patch
+gregkh-driver-kobject-remove-old-outdated-documentation.patch
+gregkh-driver-kobject-update-the-kobject-kset-documentation.patch
+gregkh-driver-kobject-add-sample-code-for-how-to-use-kobjects-in-a-simple-manner.patch
+gregkh-driver-kobject-add-sample-code-for-how-to-use-ksets-ktypes-kobjects.patch
+gregkh-driver-driver-core-use-list_head-instead-of-call-to-init_list_head-in-__init.patch
[snip]
Post by Andrew Morton
+md-support-external-metadata-for-md-arrays.patch
+md-give-userspace-control-over-removing-failed-devices-when-external-metdata-in-use.patch
+md-allow-a-maximum-extent-to-be-set-for-resyncing.patch
+md-allow-devices-to-be-shared-between-md-arrays.patch
+md-lock-address-when-changing-attributes-of-component-devices.patch
+md-allow-an-md-array-to-appear-with-0-drives-if-it-has-external-metadata.patch
RAID updates
I have finally given up on using 2.6.24-rc3-mm2 with slub_debug=FZP to
get more information out of the random crashes I had seen with that
version. (Did not crash once with slub_debug, so no new information on
what the cause was)

2.6.24-rc6-mm1 does not boot for me.
It starts my initrd, but crashes when the initrd tries to start the md devices:
[ 12.900887] Freeing unused kernel memory: 356k freed
[ 15.290320] Clocksource tsc unstable (delta = -558415384 ns)
[ 34.284845] md: Autodetecting RAID arrays.
[ 34.154076] md: Scanned 5 and added 5 devices.
[ 34.154076] md: autorun ...
[ 34.154080] md: considering sdc2 ...
[ 34.155472] md: adding sdc2 ...
[ 34.156728] md: adding sdb2 ...
[ 34.164080] md: sdb1 has different UUID to sdc2
[ 34.165836] md: adding sda2 ...
[ 34.174080] md: sda1 has different UUID to sdc2
[ 34.175852] md: created md1
[ 34.176938] md: bind<sda2>
[ 34.184147] md: bind<sdb2>
[ 34.185219] md: bind<sdc2>
[ 34.186284] md: running: <sdc2><sdb2><sda2>
[ 34.194604] md: do_md_run() returned -22
[ 34.196123] md: md1 stopped.
[ 34.197267] md: unbind<sdc2>
[ 34.204105] md: export_rdev(sdc2)
[ 34.205426] md: unbind<sdb2>
[ 34.206548] md: export_rdev(sdb2)
[ 34.214102] md: unbind<sda2>
[ 34.215223] md: export_rdev(sda2)
[ 34.216544] md: considering sdb1 ...
[ 34.224083] md: adding sdb1 ...
[ 34.225337] md: adding sda1 ...
[ 34.226696] Unable to handle kernel paging request at 0000000034333545 RIP:
[ 34.228481] [<ffffffff803b49a1>] kref_put+0x31/0x80
[ 34.231378] PGD 7e402067 PUD 7e924067 PMD 0
[ 34.233084] Oops: 0002 [1] SMP
[ 34.234076] last sysfs file: /sys/devices/virtual/block/md1/dev
[ 34.234076] CPU 3
[ 34.234076] Modules linked in:
[ 34.234076] Pid: 18, comm: events/3 Not tainted 2.6.24-rc6-mm1 #1
[ 34.234076] RIP: 0010:[<ffffffff803b49a1>] [<ffffffff803b49a1>]
kref_put+0x31/0x80
[ 34.234076] RSP: 0018:ffff81007ffd5e00 EFLAGS: 00010202
[ 34.234076] RAX: 0000000000000000 RBX: 0000000034333545 RCX: ffffffff80606270
[ 34.234076] RDX: 0000000000000040 RSI: ffffffff803b38b0 RDI: 0000000034333545
[ 34.234076] RBP: ffff81007ffd5e10 R08: 0000000000000001 R09: 0000000000000000
[ 34.234076] R10: ffffffff8094c430 R11: 0000000000000000 R12: ffffffff803b38b0
[ 34.234076] R13: ffff81011ed434d8 R14: ffffffff804d7d50 R15: ffff81011ff220f0
[ 34.234076] FS: 0000000000c7f870(0000) GS:ffff81011ff20280(0000)
knlGS:0000000000000000
[ 34.234076] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 34.234076] CR2: 0000000034333545 CR3: 000000007e5bc000 CR4: 00000000000006e0
[ 34.234076] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 34.234076] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 34.234076] Process events/3 (pid: 18, threadinfo ffff81007ffd4000,
task ffff81007ffd2000)
[ 34.234076] Stack: ffff81011ed43460 ffff81011ff220c0
ffff81007ffd5e20 ffffffff803b37e9
[ 34.234076] ffff81007ffd5e40 ffffffff803b389b ffff81007ffd5e50
ffff81011ed434e0
[ 34.234076] ffff81007ffd5e50 ffffffff804d7d5d ffff81007ffd5eb0
ffffffff80249775
[ 34.234076] Call Trace:
[ 34.234076] [<ffffffff803b37e9>] kobject_put+0x19/0x20
[ 34.234076] [<ffffffff803b389b>] kobject_del+0x2b/0x40
[ 34.234076] [<ffffffff804d7d5d>] delayed_delete+0xd/0x10
[ 34.234076] [<ffffffff80249775>] run_workqueue+0x175/0x210
[ 34.234076] [<ffffffff8024a411>] worker_thread+0x71/0xb0
[ 34.234076] [<ffffffff8024d9e0>] autoremove_wake_function+0x0/0x40
[ 34.234076] [<ffffffff8024a3a0>] worker_thread+0x0/0xb0
[ 34.234076] [<ffffffff8024d5fd>] kthread+0x4d/0x80
[ 34.234076] [<ffffffff8020c4b8>] child_rip+0xa/0x12
[ 34.234076] [<ffffffff8020bbcf>] restore_args+0x0/0x30
[ 34.234076] [<ffffffff8024d5b0>] kthread+0x0/0x80
[ 34.234076] [<ffffffff8020c4ae>] child_rip+0x0/0x12
[ 34.234076]
[ 34.234076]
[ 34.234076] Code: f0 ff 0b 0f 94 c0 31 d2 84 c0 74 0b 48 89 df 41 ff d4 ba 01
[ 34.234076] RIP [<ffffffff803b49a1>] kref_put+0x31/0x80
[ 34.234076] RSP <ffff81007ffd5e00>
[ 34.234076] CR2: 0000000034333545
[ 34.234080] ---[ end trace 303353bd9dfe95b0 ]---
[ 34.236037] md: created md0
[ 34.237125] md: bind<sda1>
[ 34.244088] md: bind<sdb1>
[ 34.245152] md: running: <sdb1><sda1>
[ 34.246626] md: do_md_run() returned -22
[ 34.254078] md: md0 stopped.
[ 45.657898] SysRq : Resetting

# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid1 sdb1[1] sda1[0]
9775424 blocks [2/2] [UU]
bitmap: 0/150 pages [0KB], 32KB chunk

md1 : active raid5 sdc2[2] sdb2[1] sda2[0]
605586048 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
bitmap: 2/145 pages [8KB], 1024KB chunk

unused devices: <none>

Should I blame the raid1 changes or the kobject changes?

Torsten
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Andrew Morton
2007-12-24 02:00:17 UTC
Post by Torsten Kaiser
Post by Andrew Morton
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc6/2.6.24-rc6-mm1/
[snip]
Should I blame the raid1 changes or the kobject changes?
I don't know. It could even be that both patch series are OK but when they
are combined, things fail.

Greg, Alasdair: the above looks like a preview of 2.6.25-rc1 :(

Torsten Kaiser
2007-12-27 11:50:11 UTC
Post by Andrew Morton
Post by Torsten Kaiser
Post by Andrew Morton
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc6/2.6.24-rc6-mm1/
[snip]
Post by Andrew Morton
+md-support-external-metadata-for-md-arrays.patch
+md-give-userspace-control-over-removing-failed-devices-when-external-metdata-in-use.patch
+md-allow-a-maximum-extent-to-be-set-for-resyncing.patch
+md-allow-devices-to-be-shared-between-md-arrays.patch
+md-lock-address-when-changing-attributes-of-component-devices.patch
+md-allow-an-md-array-to-appear-with-0-drives-if-it-has-external-metadata.patch
RAID updates
Should I blame the raid1 changes or the kobject changes?
I don't know. It could even be that both patch series are OK but when they
are combined, things fail.
OK, I debugged this some more. It looks like two bugs meshed together.

One new bug: do_md_run() now returns -22 (-EINVAL), so I can no longer
start my RAID arrays.
The following part of md-allow-devices-to-be-shared-between-md-arrays
adds a new check to do_md_run() (drivers/md/md.c) that fails for my system:
@@ -3213,8 +3283,11 @@ static int do_md_run(mddev_t * mddev)
 	/*
 	 * Analyze all RAID superblock(s)
 	 */
-	if (!mddev->raid_disks)
+	if (!mddev->raid_disks) {
+		if (!mddev->persistent)
+			return -EINVAL;
 		analyze_sbs(mddev);
+	}

 	chunk_size = mddev->chunk_size;

The raid gets started normally with any other kernel I tried.
I did not investigate the cause of this failure further, because I was
looking into why a failure to start a RAID array causes events/3 to oops.

This looks like a second, but rather old, bug.
do_md_stop() (in drivers/md/md.c) does the following:
	/* make sure all delayed_delete calls have finished */
	flush_scheduled_work();

	export_array(mddev);

But only the call chain export_array() -> kick_rdev_from_array() ->
unbind_rdev_from_array() schedules the delayed_delete work items, so this
flush runs before any of them have been queued for the array being stopped.

After adding a second flush_scheduled_work() below the export_array() call,
the resulting kernel no longer oopses and my initrd asks for an alternative
root-fs as it normally would. Because of the first bug, the RAID still does
not get started.

I don't know whether this flush_scheduled_work() has been misplaced since it
was introduced, or whether it was only ever meant to flush delayed deletes
from previously stopped arrays.
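The ordering problem described above can be sketched in plain userspace C.
This is only a toy model, not the md code: the queue, the struct and every
toy_* name are made up for illustration, and "flushing" here simply runs
whatever has been queued so far.

```c
#include <stddef.h>

/* Toy stand-in for the shared kernel workqueue; all names are hypothetical. */
#define TOY_MAX_PENDING 8

struct toy_rdev {
	int deleted;	/* set once the deferred delete has run */
};

static struct toy_rdev *toy_pending[TOY_MAX_PENDING];
static size_t toy_npending;

/* Models scheduling a delayed delete: queue it, do not run it yet. */
static void toy_schedule_delete(struct toy_rdev *rdev)
{
	if (toy_npending < TOY_MAX_PENDING)
		toy_pending[toy_npending++] = rdev;
}

/* Models a flush: run everything queued so far, return how many items ran. */
static size_t toy_flush(void)
{
	size_t ran = toy_npending;

	for (size_t i = 0; i < toy_npending; i++)
		toy_pending[i]->deleted = 1;
	toy_npending = 0;
	return ran;
}

/* Models export_array(): unbinding each device queues one delayed delete. */
static void toy_export_array(struct toy_rdev *rdevs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		toy_schedule_delete(&rdevs[i]);
}

/* Stop path in the quoted order: flush, then export. Returns items left pending. */
static size_t toy_stop_flush_before(struct toy_rdev *rdevs, size_t n)
{
	toy_flush();
	toy_export_array(rdevs, n);
	return toy_npending;
}

/* Stop path with a second flush after the export. Returns items left pending. */
static size_t toy_stop_flush_after(struct toy_rdev *rdevs, size_t n)
{
	toy_flush();
	toy_export_array(rdevs, n);
	toy_flush();
	return toy_npending;
}
```

In this model the first variant returns with one delete per device still
queued, while the second returns with an empty queue. It says nothing about
whether a second flush is the right fix for the real code.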

While investigating this, I got the following debug output.
First try, with my second flush_scheduled_work() removed again:
[ 34.290576] md: Autodetecting RAID arrays.
[ 34.125649] md: Scanned 5 and added 5 devices.
[ 34.125649] md: autorun ...
[ 34.125649] md: considering sdc2 ...
[ 34.125658] md: adding sdc2 ...
[ 34.126914] md: adding sdb2 ...
[ 34.128170] md: sdb1 has different UUID to sdc2
[ 34.135654] md: adding sda2 ...
[ 34.137168] md: sda1 has different UUID to sdc2
[ 34.145669] md: created md1
[ 34.146755] md: bind<sda2>
[ 34.147879] md: bind<sdb2>
[ 34.155665] md: bind<sdc2>
[ 34.156730] md: running: <sdc2><sdb2><sda2>
[ 34.158427] mddev not persistent ???
[ 34.165651] md: do_md_run() returned -22
[ 34.167171] md: md1 stopped.
[ 34.168292] 1:flush_scheduled_work()
-> this is the original flush_scheduled_work()-call
[ 34.175675] md: unbind<sdc2>
[ 34.176795] md: remove sysfs-link 'block', schedule delayed_delete...
following output is from unbind_rdev_from_array:
[ 34.185662] XXX:unb:rdev == ffff81011ede4800
<3>XXX:unb:rdev->bdev == ffff81011f86b600
<3>XXX:unb:rdev->kobj == ffff81011ede4860
<6>md: export_rdev(sdc2)
[ 34.196600] md: unbind<sdb2>
[ 34.197720] md: remove sysfs-link 'block', schedule delayed_delete...
[ 34.205654] XXX:unb:rdev == ffff81011ede4600
<3>XXX:unb:rdev->bdev == ffff81011f86b080
<3>XXX:unb:rdev->kobj == ffff81011ede4660
<6>md: export_rdev(sdb2)
[ 34.217942] md: unbind<sda2>
[ 34.225651] md: remove sysfs-link 'block', schedule delayed_delete...
[ 34.228140] XXX:unb:rdev == ffff81011ede4400
<3>XXX:unb:rdev->bdev == ffff81011f86a000
<3>XXX:unb:rdev->kobj == ffff81011ede4460
<6>md: export_rdev(sda2)
[ 34.245664] 2:!flush_scheduled_work()
-> my second call is disabled for this run
[ 34.247101] md: considering sdb1 ...
[ 34.248492] md: adding sdb1 ...
[ 34.255654] md: adding sda1 ...
following output is from delayed_delete:
[ 34.257024] XXX:dd:rdev == ffff81011ede4800
<3>XXX:dd:rdev->kobj == ffff81011ede4860
-> sdc2 seems to get deleted normally
<3>XXX:dd:rdev == ffff81011ede4600
<3>XXX:dd:rdev->kobj == ffff81011ede4660
-> sdb2 too
<3>XXX:dd:rdev == ffff81011ede4400
<3>XXX:dd:rdev->bdev == 5441505645440064
-> but here is something strange, the other devices did not get a
bdev-line, because that value was zero
<6>md: created md0
[ 34.654554] md: bind<sda1>
[ 34.654554] md: bind<sdb1>
[ 34.654554] md: running: <sdb1><sda1>
[ 34.654554] mddev not persistent ???
[ 34.654554] md: do_md_run() returned -22
[ 34.654554] md: md0 stopped.
[ 34.654554] 1:flush_scheduled_work()
-> the flush from the second failed raid forces the delayed_deletes
from the first raid to finish
[ 34.285652] XXX:dd:rdev->kobj == ffff81011ede4460
-> the pointer itself still looks the same, but the kobject seems to be gone:
<1>Unable to handle kernel paging request at 0000000034333545 RIP:
[ 34.288794] [<ffffffff803b49a1>] kref_put+0x31/0x80
[ 34.291688] PGD 7e427067 PUD 7ed22067 PMD 0
[ 34.293394] Oops: 0002 [1] SMP
[ 34.294651] last sysfs file: /sys/devices/virtual/block/md0/dev
[ 34.295649] CPU 3
[ 34.295649] Modules linked in:
[ 34.295649] Pid: 18, comm: events/3 Not tainted 2.6.24-rc6-mm1 #9
[ 34.295649] RIP: 0010:[<ffffffff803b49a1>] [<ffffffff803b49a1>]
kref_put+0x31/0x80
[ 34.295649] RSP: 0000:ffff81007ffe5df0 EFLAGS: 00010202
[ 34.295649] RAX: 0000000000000000 RBX: 0000000034333545 RCX: ffffffff80606270
[ 34.295649] RDX: 0000000000000040 RSI: ffffffff803b38b0 RDI: 0000000034333545
[ 34.295649] RBP: ffff81007ffe5e00 R08: 0000000000000001 R09: 0000000000000000
[ 34.295649] R10: ffffffff8094c430 R11: 0000000000000000 R12: ffffffff803b38b0
[ 34.295650] R13: ffff81011ede44d8 R14: ffffffff804d7d50 R15: ffff81011ff210f0
[ 34.295650] FS: 0000000002024870(0000) GS:ffff81011ff0dd00(0000)
knlGS:0000000000000000
[ 34.295650] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 34.295650] CR2: 0000000034333545 CR3: 000000007e535000 CR4: 00000000000006e0
[ 34.295650] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 34.295650] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 34.295650] Process events/3 (pid: 18, threadinfo ffff81007ffe4000,
task ffff81007ffe2000)
[ 34.295650] Stack: ffff81011ede4460 ffff81011ede4400
ffff81007ffe5e10 ffffffff803b37e9
[ 34.295650] ffff81007ffe5e30 ffffffff803b389b ffffffff804d7d50
ffff81011ede4460
[ 34.295650] ffff81007ffe5e50 ffffffff804d7db9 ffff81011ede44e0
ffff81011ff210c0
[ 34.295650] Call Trace:
[ 34.295650] [<ffffffff803b37e9>] kobject_put+0x19/0x20
[ 34.295650] [<ffffffff803b389b>] kobject_del+0x2b/0x40
[ 34.295650] [<ffffffff804d7d50>] delayed_delete+0x0/0xb0
[ 34.295650] [<ffffffff804d7db9>] delayed_delete+0x69/0xb0
[ 34.295650] [<ffffffff80249775>] run_workqueue+0x175/0x210
[ 34.295650] [<ffffffff8024a411>] worker_thread+0x71/0xb0
[ 34.295650] [<ffffffff8024d9e0>] autoremove_wake_function+0x0/0x40
[ 34.295650] [<ffffffff8024a3a0>] worker_thread+0x0/0xb0
[ 34.295650] [<ffffffff8024d5fd>] kthread+0x4d/0x80
[ 34.295650] [<ffffffff8020c4b8>] child_rip+0xa/0x12
[ 34.295650] [<ffffffff8020bbcf>] restore_args+0x0/0x30
[ 34.295650] [<ffffffff8024d5b0>] kthread+0x0/0x80
[ 34.295650] [<ffffffff8020c4ae>] child_rip+0x0/0x12
[ 34.295650]
[ 34.295650]
[ 34.295650] Code: f0 ff 0b 0f 94 c0 31 d2 84 c0 74 0b 48 89 df 41
ff d4 ba 01
[ 34.295650] RIP [<ffffffff803b49a1>] kref_put+0x31/0x80
[ 34.295650] RSP <ffff81007ffe5df0>
[ 34.295650] CR2: 0000000034333545
[ 34.295653] ---[ end trace 60425fedd4d3ef22 ]---
Here the system hangs; the initrd does not prompt for an alternative root-fs.
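One way to look at the odd values in the log above (the faulting address
0000000034333545 and the rdev->bdev value 5441505645440064) is to check
whether their bytes read as text rather than as a kernel pointer. A minimal
sketch, assuming both values were printed in hex and the machine is
little-endian x86_64; the helper name is hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: write the 8 bytes of v, lowest byte first, into out
 * as printable ASCII, using '.' for anything non-printable.
 * out must have room for 9 chars (8 bytes plus the terminating NUL).
 */
static void toy_bytes_as_text(uint64_t v, char out[9])
{
	for (size_t i = 0; i < 8; i++) {
		unsigned char c = (unsigned char)(v >> (8 * i));

		out[i] = (c >= 0x20 && c < 0x7f) ? (char)c : '.';
	}
	out[8] = '\0';
}
```

For the two values from the log this gives "E534...." and "d.DEVPAT". Both
look more like string data than like pointers, which would fit memory that
has been freed and reused, but that reading is a guess and not something the
log proves.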

Second try, with the newly added flush:
[ 34.267403] md: Autodetecting RAID arrays.
[ 34.188217] md: Scanned 5 and added 5 devices.
[ 34.188220] md: autorun ...
[ 34.189306] md: considering sdc2 ...
[ 34.190699] md: adding sdc2 ...
[ 34.198224] md: adding sdb2 ...
[ 34.199480] md: sdb1 has different UUID to sdc2
[ 34.201237] md: adding sda2 ...
[ 34.208220] md: sda1 has different UUID to sdc2
[ 34.209993] md: created md1
[ 34.218220] md: bind<sda2>
[ 34.219341] md: bind<sdb2>
[ 34.220410] md: bind<sdc2>
[ 34.221476] md: running: <sdc2><sdb2><sda2>
[ 34.228963] mddev not persistent ???
[ 34.230350] md: do_md_run() returned -22
[ 34.238219] md: md1 stopped.
[ 34.239339] 1:flush_scheduled_work()
-> old call to flush_scheduled_work()
[ 34.240749] md: unbind<sdc2>
[ 34.248219] md: remove sysfs-link 'block', schedule delayed_delete...
[ 34.250716] XXX:unb:rdev == ffff81011ed6d800
<3>XXX:unb:rdev->bdev == ffff81011f86b600
<3>XXX:unb:rdev->kobj == ffff81011ed6d860
<6>md: export_rdev(sdc2)
[ 34.268262] md: unbind<sdb2>
[ 34.269382] md: remove sysfs-link 'block', schedule delayed_delete...
[ 34.278221] XXX:unb:rdev == ffff81011ed6de00
<3>XXX:unb:rdev->bdev == ffff81011f86b080
<3>XXX:unb:rdev->kobj == ffff81011ed6de60
<6>md: export_rdev(sdb2)
[ 34.289123] md: unbind<sda2>
[ 34.290244] md: remove sysfs-link 'block', schedule delayed_delete...
[ 34.298221] XXX:unb:rdev == ffff81011ed6da00
<3>XXX:unb:rdev->bdev == ffff81011f86a000
<3>XXX:unb:rdev->kobj == ffff81011ed6da60
<6>md: export_rdev(sda2)
[ 34.310502] 2:flush_scheduled_work()
-> newly added call to flush_scheduled_work()
[ 34.318253] XXX:dd:rdev == ffff81011ed6d800
<3>XXX:dd:rdev->kobj == ffff81011ed6d860
<3>XXX:dd:rdev == ffff81011ed6de00
<3>XXX:dd:rdev->kobj == ffff81011ed6de60
-> this time, rdev->bdev/rdev->kobj from sda2 seem to be still ok.
<3>XXX:dd:rdev == ffff81011ed6da00
<3>XXX:dd:rdev->kobj == ffff81011ed6da60
<6>md: considering sdb1 ...
[ 34.339255] md: adding sdb1 ...
[ 34.340511] md: adding sda1 ...
[ 34.348520] md: created md0
[ 34.349608] md: bind<sda1>
[ 34.350676] md: bind<sdb1>
[ 34.351741] md: running: <sdb1><sda1>
[ 34.358743] mddev not persistent ???
[ 34.360131] md: do_md_run() returned -22
-> same failure to start the second raid, but...
[ 34.368219] md: md0 stopped.
[ 34.369340] 1:flush_scheduled_work()
-> ... this time the work is already done and no oops happened
[ 34.370733] md: unbind<sdb1>
[ 34.378219] md: remove sysfs-link 'block', schedule delayed_delete...
[ 34.380705] XXX:unb:rdev == ffff81011ed6dc00
<3>XXX:unb:rdev->bdev == ffff81011f86a580
<3>XXX:unb:rdev->kobj == ffff81011ed6dc60
<6>md: export_rdev(sdb1)
[ 34.398254] md: unbind<sda1>
[ 34.399373] md: remove sysfs-link 'block', schedule delayed_delete...
[ 34.408221] XXX:unb:rdev == ffff81007ff51800
<3>XXX:unb:rdev->bdev == ffff81007f8a0580
<3>XXX:unb:rdev->kobj == ffff81007ff51860
<6>md: export_rdev(sda1)
[ 34.419120] 2:flush_scheduled_work()
[ 34.420510] XXX:dd:rdev == ffff81011ed6dc00
<3>XXX:dd:rdev->kobj == ffff81011ed6dc60
<3>XXX:dd:rdev == ffff81007ff51800
<3>XXX:dd:rdev->kobj == ffff81007ff51860
<6>md: ... autorun DONE.
Here the system now asks for a root-fs as the normal root /dev/md1 was
not started.

I hope this output helps; if more is needed, just ask.

Torsten
Torsten Kaiser
2007-12-27 14:40:12 UTC
[author CCed]
Post by Torsten Kaiser
Post by Torsten Kaiser
Post by Andrew Morton
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc6/2.6.24-rc6-mm1/
[snip]
Post by Andrew Morton
+md-allow-devices-to-be-shared-between-md-arrays.patch
[snip]
Post by Torsten Kaiser
OK, I debugged this some more. It looks like two bugs meshed together.
One new bug: "do_md_run() returned -22"
I can't seem to start my raid anymore.
The following part of md-allow-devices-to-be-shared-between-md-arrays
@@ -3213,8 +3283,11 @@ static int do_md_run(mddev_t * mddev)
 	/*
 	 * Analyze all RAID superblock(s)
 	 */
-	if (!mddev->raid_disks)
+	if (!mddev->raid_disks) {
+		if (!mddev->persistent)
+			return -EINVAL;
 		analyze_sbs(mddev);
+	}
 	chunk_size = mddev->chunk_size;
This hunk is indeed buggy.
analyze_sbs() calls load_super() and validate_super(), and only the
validate function sets mddev->persistent, so this new check needs to
come after the call to analyze_sbs(mddev).

Changing this allows my system to boot correctly, including starting KDE.

Please note that this is not a fix for the oops in delayed_delete.
The oops just no longer happens, because the buggy error path is no
longer taken.
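The effect of the check ordering can be sketched in userspace C. This is a
toy model under stated assumptions, not the md code: the struct, the toy_*
names and TOY_EINVAL are made up, and toy_analyze_sbs() only mimics the one
property described above, namely that the persistent flag is set while the
superblocks are being validated.

```c
/* Hypothetical, simplified stand-in for the relevant mddev fields. */
struct toy_mddev {
	int raid_disks;	/* 0 until superblocks have been analyzed */
	int persistent;	/* only set while validating a superblock */
};

#define TOY_EINVAL 22

/*
 * Stands in for analyze_sbs(): if superblocks were found, fill in the
 * geometry and mark the array persistent; otherwise leave m untouched.
 */
static void toy_analyze_sbs(struct toy_mddev *m, int disks_found)
{
	if (disks_found > 0) {
		m->raid_disks = disks_found;
		m->persistent = 1;
	}
}

/* Ordering as in the quoted hunk: the check runs before the analysis. */
static int toy_run_check_first(struct toy_mddev *m, int disks_found)
{
	if (!m->raid_disks) {
		if (!m->persistent)
			return -TOY_EINVAL;
		toy_analyze_sbs(m, disks_found);
	}
	return 0;
}

/* Ordering suggested above: analyze first, then check the flag. */
static int toy_run_check_after(struct toy_mddev *m, int disks_found)
{
	if (!m->raid_disks) {
		toy_analyze_sbs(m, disks_found);
		if (!m->persistent)
			return -TOY_EINVAL;
	}
	return 0;
}
```

With a freshly zeroed toy_mddev, the first ordering fails with -22 before the
superblocks are ever read, while the second only fails when no usable
superblock was found.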

Torsten
Torsten Kaiser
2007-12-28 23:00:19 UTC
Post by Torsten Kaiser
Post by Andrew Morton
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc6/2.6.24-rc6-mm1/
I have finally given up on using 2.6.24-rc3-mm2 with slub_debug=FZP to
get more information out of the random crashes I had seen with that
version. (Did not crash once with slub_debug, so no new information on
what the cause was)
Murphy's law: just after sending that mail, the system crashed twice with
slub_debug=FZP, but showed no new information.
There was no debug output from slub, only this stack trace (the same one I
already reported in the 2.6.24-rc3-mm2 thread):

[ 7620.673012] ------------[ cut here ]------------
[ 7620.676291] kernel BUG at lib/list_debug.c:33!
[ 7620.679440] invalid opcode: 0000 [1] SMP
[ 7620.682319] last sysfs file:
/sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
[ 7620.687845] CPU 0
[ 7620.689300] Modules linked in: radeon drm nfsd exportfs w83792d
ipv6 tuner tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx
tea5761 tvaudio msp3400 bttv ir_common compat_ioctl32 videobuf_dma_sg
videobuf_core btcx_risc tveeprom videodev usbhid v4l2_common
v4l1_compat hid i2c_nforce2 sg pata_amd
[ 7620.708561] Pid: 5698, comm: nfsv4-svc Not tainted 2.6.24-rc3-mm2 #2
[ 7620.713080] RIP: 0010:[<ffffffff803bae54>] [<ffffffff803bae54>]
__list_add+0x54/0x60
[ 7620.718667] RSP: 0018:ffff81011bca1dc0 EFLAGS: 00010282
[ 7620.722439] RAX: 0000000000000088 RBX: ffff81011c862c48 RCX: 0000000000000002
[ 7620.727504] RDX: ffff81011bc82ef0 RSI: 0000000000000001 RDI: ffffffff807590c0
[ 7620.732581] RBP: ffff81011bca1dc0 R08: 0000000000000001 R09: 0000000000000000
[ 7620.737658] R10: ffff810080058d48 R11: 0000000000000001 R12: ffff81011ed8d1c8
[ 7620.742711] R13: ffff81011ed8d200 R14: ffff81011ed8d200 R15: ffff81011cc0e578
[ 7620.747806] FS: 00007ffe400116f0(0000) GS:ffffffff807d4000(0000)
knlGS:00000000f73558e0
[ 7620.753535] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 7620.757607] CR2: 00000000017071dc CR3: 00000001188b5000 CR4: 00000000000006e0
[ 7620.762677] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7620.767748] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 7620.772808] Process nfsv4-svc (pid: 5698, threadinfo
FFFF81011BCA0000, task FFFF81011BC82EF0)
[ 7620.778872] Stack: ffff81011bca1e00 ffffffff805be26e
ffff81011ed8d1d0 ffff81011cc0e578
[ 7620.784626] ffff81011c862c48 ffff81011c8be000 ffff810054a8b060
ffff81011cc0e588
[ 7620.789913] ffff81011bca1e10 ffffffff805be367 ffff81011bca1ee0
ffffffff805bf0ac
[ 7620.795062] Call Trace:
[ 7620.796941] [<ffffffff805be26e>] svc_xprt_enqueue+0x1ae/0x250
[ 7620.801087] [<ffffffff805be367>] svc_xprt_received+0x17/0x20
[ 7620.805199] [<ffffffff805bf0ac>] svc_recv+0x39c/0x840
[ 7620.808851] [<ffffffff805bea3f>] svc_send+0xaf/0xd0
[ 7620.812374] [<ffffffff8022f590>] default_wake_function+0x0/0x10
[ 7620.816637] [<ffffffff803163ea>] nfs_callback_svc+0x7a/0x130
[ 7620.820712] [<ffffffff805cfea2>] trace_hardirqs_on_thunk+0x35/0x3a
[ 7620.825174] [<ffffffff80259f8f>] trace_hardirqs_on+0xbf/0x160
[ 7620.829335] [<ffffffff8020cbc8>] child_rip+0xa/0x12
[ 7620.832842] [<ffffffff8020c2df>] restore_args+0x0/0x30
[ 7620.836554] [<ffffffff80316370>] nfs_callback_svc+0x0/0x130
[ 7620.840564] [<ffffffff8020cbbe>] child_rip+0x0/0x12
[ 7620.844102]
[ 7620.845168] INFO: lockdep is turned off.
[ 7620.847964]
[ 7620.847965] Code: 0f 0b eb fe 0f 1f 84 00 00 00 00 00 55 48 8b 16
48 89 e5 e8
[ 7620.854334] RIP [<ffffffff803bae54>] __list_add+0x54/0x60
[ 7620.858255] RSP <ffff81011bca1dc0>
[ 7620.860724] Kernel panic - not syncing: Aiee, killing interrupt handler!


The reason I am resending this: I just got a crash with
2.6.24-rc6-mm1, which again looks network-related:

[93436.933356] WARNING: at include/net/dst.h:165 dst_release()
[93436.936685] Pid: 8079, comm: konqueror Not tainted 2.6.24-rc6-mm1 #11
[93436.939292]
[93436.939293] Call Trace:
[93436.939304] [<ffffffff80531d2d>] skb_release_all+0xdd/0x110
[93436.939307] [<ffffffff80531311>] __kfree_skb+0x11/0xa0
[93436.939309] [<ffffffff805313b7>] kfree_skb+0x17/0x30
[93436.939312] [<ffffffff805a0b48>] unix_release_sock+0x128/0x250
[93436.939315] [<ffffffff805a0c91>] unix_release+0x21/0x30
[93436.939318] [<ffffffff8052b144>] sock_release+0x24/0x90
[93436.939320] [<ffffffff8052b656>] sock_close+0x26/0x50
[93436.939324] [<ffffffff8029f921>] __fput+0xc1/0x230
[93436.939327] [<ffffffff8029fe46>] fput+0x16/0x20
[93436.939329] [<ffffffff8029c576>] filp_close+0x56/0x90
[93436.939331] [<ffffffff8029de46>] sys_close+0xa6/0x110
[93436.939335] [<ffffffff8020b57b>] system_call_after_swapgs+0x7b/0x80
[93436.939337]
[93436.947241] general protection fault: 0000 [1] SMP
[93436.947243] last sysfs file:
/sys/devices/pci0000:00/0000:00:0f.0/0000:01:00.1/irq
[93436.947245] CPU 1
[93436.947246] Modules linked in: radeon drm nfsd exportfs w83792d
ipv6 tuner tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx
tea5761 tvaudio msp3400 bttv ir_common compat_ioctl32 videobuf_dma_sg
videobuf_core btcx_risc tveeprom usbhid videodev v4l2_common hid
v4l1_compat pata_amd sg i2c_nforce2
[93436.947257] Pid: 8079, comm: konqueror Not tainted 2.6.24-rc6-mm1 #11
[93436.947259] RIP: 0010:[<ffffffff80531438>] [<ffffffff80531438>]
skb_drop_list+0x18/0x30
[93436.947262] RSP: 0018:ffff810005f4fda8 EFLAGS: 00010286
[93436.947263] RAX: ab1ed5ca5b74e7de RBX: ab1ed5ca5b74e7de RCX: 000000000000d135
[93436.947265] RDX: ffff81011d089a80 RSI: 0000000000000001 RDI: ffff81011d089a88
[93436.947266] RBP: ffff810005f4fdb8 R08: 0000000000000001 R09: 0000000000000006
[93436.947268] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8100de02c500
[93436.947269] R13: ffff81011c188a00 R14: 0000000000000001 R15: ffff81011c189198
[93436.947271] FS: 00007fb5bde0d700(0000) GS:ffff81007ff22000(0000)
knlGS:0000000000000000
[93436.947273] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[93436.947274] CR2: 00007fb5bdd76000 CR3: 00000000664d5000 CR4: 00000000000006e0
[93436.947276] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[93436.947277] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[93436.947279] Process konqueror (pid: 8079, threadinfo
ffff810005f4e000, task ffff8100a1dec000)
[93436.947281] Stack: ffff810005f4fdd8 ffff810116c86140
ffff810005f4fdd8 ffffffff805314ae
[93436.947284] ffff810116c86140 ffff8100de02c500 ffff810005f4fdf8
ffffffff80531cf0
[93436.947286] ffff8100de02c500 ffff81011c188b48 ffff810005f4fe18
ffffffff80531311
[93436.947288] Call Trace:
[93436.947290] [<ffffffff805314ae>] skb_release_data+0x5e/0xa0
[93436.947293] [<ffffffff80531cf0>] skb_release_all+0xa0/0x110
[93436.947295] [<ffffffff80531311>] __kfree_skb+0x11/0xa0
[93436.947297] [<ffffffff805313b7>] kfree_skb+0x17/0x30
[93436.947299] [<ffffffff805a0b48>] unix_release_sock+0x128/0x250
[93436.947302] [<ffffffff805a0c91>] unix_release+0x21/0x30
[93436.947304] [<ffffffff8052b144>] sock_release+0x24/0x90
[93436.947307] [<ffffffff8052b656>] sock_close+0x26/0x50
[93436.947309] [<ffffffff8029f921>] __fput+0xc1/0x230
[93436.947312] [<ffffffff8029fe46>] fput+0x16/0x20
[93436.947314] [<ffffffff8029c576>] filp_close+0x56/0x90
[93436.947316] [<ffffffff8029de46>] sys_close+0xa6/0x110
[93436.947319] [<ffffffff8020b57b>] system_call_after_swapgs+0x7b/0x80
[93436.947322]
[93436.947322]
[93436.947323] Code: 48 8b 18 48 89 c7 e8 5d ff ff ff 48 85 db 75 ed
48 83 c4 08
[93436.947328] RIP [<ffffffff80531438>] skb_drop_list+0x18/0x30
[93436.947330] RSP <ffff810005f4fda8>
[93436.947332] ---[ end trace befb7cc3528ab3b1 ]---

I don't know in which direction I should look.
I also can't easily reproduce this; it happened after several hours of
watching a WMV stream with mplayer...

Torsten
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Andrew Morton
2007-12-28 23:20:08 UTC
Post by Torsten Kaiser
Post by Torsten Kaiser
Post by Andrew Morton
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc6/2.6.24-rc6-mm1/
I have finally given up on using 2.6.24-rc3-mm2 with slub_debug=FZP to
get more information out of the random crashes I had seen with that
version. (Did not crash once with slub_debug, so no new information on
what the cause was)
Murphy: Just after sending that mail the system crashed two times with
slub_debug=FZP, but did not show any new information.
No debug output from slub, only this stack trace (it's the same one I
already reported in the 2.6.24-rc3-mm2 thread):
[ 7620.673012] ------------[ cut here ]------------
[ 7620.676291] kernel BUG at lib/list_debug.c:33!
[ 7620.679440] invalid opcode: 0000 [1] SMP
/sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
[ 7620.687845] CPU 0
[ 7620.689300] Modules linked in: radeon drm nfsd exportfs w83792d
ipv6 tuner tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx
tea5761 tvaudio msp3400 bttv ir_common compat_ioctl32 videobuf_dma_sg
videobuf_core btcx_risc tveeprom videodev usbhid v4l2_common
v4l1_compat hid i2c_nforce2 sg pata_amd
[ 7620.708561] Pid: 5698, comm: nfsv4-svc Not tainted 2.6.24-rc3-mm2 #2
[ 7620.713080] RIP: 0010:[<ffffffff803bae54>] [<ffffffff803bae54>]
__list_add+0x54/0x60
[ 7620.718667] RSP: 0018:ffff81011bca1dc0 EFLAGS: 00010282
[ 7620.722439] RAX: 0000000000000088 RBX: ffff81011c862c48 RCX: 0000000000000002
[ 7620.727504] RDX: ffff81011bc82ef0 RSI: 0000000000000001 RDI: ffffffff807590c0
[ 7620.732581] RBP: ffff81011bca1dc0 R08: 0000000000000001 R09: 0000000000000000
[ 7620.737658] R10: ffff810080058d48 R11: 0000000000000001 R12: ffff81011ed8d1c8
[ 7620.742711] R13: ffff81011ed8d200 R14: ffff81011ed8d200 R15: ffff81011cc0e578
[ 7620.747806] FS: 00007ffe400116f0(0000) GS:ffffffff807d4000(0000)
knlGS:00000000f73558e0
[ 7620.753535] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 7620.757607] CR2: 00000000017071dc CR3: 00000001188b5000 CR4: 00000000000006e0
[ 7620.762677] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7620.767748] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 7620.772808] Process nfsv4-svc (pid: 5698, threadinfo
FFFF81011BCA0000, task FFFF81011BC82EF0)
[ 7620.778872] Stack: ffff81011bca1e00 ffffffff805be26e
ffff81011ed8d1d0 ffff81011cc0e578
[ 7620.784626] ffff81011c862c48 ffff81011c8be000 ffff810054a8b060
ffff81011cc0e588
[ 7620.789913] ffff81011bca1e10 ffffffff805be367 ffff81011bca1ee0
ffffffff805bf0ac
[ 7620.796941] [<ffffffff805be26e>] svc_xprt_enqueue+0x1ae/0x250
[ 7620.801087] [<ffffffff805be367>] svc_xprt_received+0x17/0x20
[ 7620.805199] [<ffffffff805bf0ac>] svc_recv+0x39c/0x840
[ 7620.808851] [<ffffffff805bea3f>] svc_send+0xaf/0xd0
[ 7620.812374] [<ffffffff8022f590>] default_wake_function+0x0/0x10
[ 7620.816637] [<ffffffff803163ea>] nfs_callback_svc+0x7a/0x130
[ 7620.820712] [<ffffffff805cfea2>] trace_hardirqs_on_thunk+0x35/0x3a
[ 7620.825174] [<ffffffff80259f8f>] trace_hardirqs_on+0xbf/0x160
[ 7620.829335] [<ffffffff8020cbc8>] child_rip+0xa/0x12
[ 7620.832842] [<ffffffff8020c2df>] restore_args+0x0/0x30
[ 7620.836554] [<ffffffff80316370>] nfs_callback_svc+0x0/0x130
[ 7620.840564] [<ffffffff8020cbbe>] child_rip+0x0/0x12
[ 7620.844102]
[ 7620.845168] INFO: lockdep is turned off.
[ 7620.847964]
[ 7620.847965] Code: 0f 0b eb fe 0f 1f 84 00 00 00 00 00 55 48 8b 16
48 89 e5 e8
[ 7620.854334] RIP [<ffffffff803bae54>] __list_add+0x54/0x60
[ 7620.858255] RSP <ffff81011bca1dc0>
[ 7620.860724] Kernel panic - not syncing: Aiee, killing interrupt handler!
That looks like a sunrpc bug. git-nfsd has been mucking around in there a
bit.
Post by Torsten Kaiser
The reason I am resending this: I just got a crash with
[93436.933356] WARNING: at include/net/dst.h:165 dst_release()
[93436.936685] Pid: 8079, comm: konqueror Not tainted 2.6.24-rc6-mm1 #11
[93436.939292]
[93436.939304] [<ffffffff80531d2d>] skb_release_all+0xdd/0x110
[93436.939307] [<ffffffff80531311>] __kfree_skb+0x11/0xa0
[93436.939309] [<ffffffff805313b7>] kfree_skb+0x17/0x30
[93436.939312] [<ffffffff805a0b48>] unix_release_sock+0x128/0x250
[93436.939315] [<ffffffff805a0c91>] unix_release+0x21/0x30
[93436.939318] [<ffffffff8052b144>] sock_release+0x24/0x90
[93436.939320] [<ffffffff8052b656>] sock_close+0x26/0x50
[93436.939324] [<ffffffff8029f921>] __fput+0xc1/0x230
[93436.939327] [<ffffffff8029fe46>] fput+0x16/0x20
[93436.939329] [<ffffffff8029c576>] filp_close+0x56/0x90
[93436.939331] [<ffffffff8029de46>] sys_close+0xa6/0x110
[93436.939335] [<ffffffff8020b57b>] system_call_after_swapgs+0x7b/0x80
[93436.939337]
[93436.947241] general protection fault: 0000 [1] SMP
/sys/devices/pci0000:00/0000:00:0f.0/0000:01:00.1/irq
[93436.947245] CPU 1
[93436.947246] Modules linked in: radeon drm nfsd exportfs w83792d
ipv6 tuner tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx
tea5761 tvaudio msp3400 bttv ir_common compat_ioctl32 videobuf_dma_sg
videobuf_core btcx_risc tveeprom usbhid videodev v4l2_common hid
v4l1_compat pata_amd sg i2c_nforce2
[93436.947257] Pid: 8079, comm: konqueror Not tainted 2.6.24-rc6-mm1 #11
[93436.947259] RIP: 0010:[<ffffffff80531438>] [<ffffffff80531438>]
skb_drop_list+0x18/0x30
[93436.947262] RSP: 0018:ffff810005f4fda8 EFLAGS: 00010286
[93436.947263] RAX: ab1ed5ca5b74e7de RBX: ab1ed5ca5b74e7de RCX: 000000000000d135
[93436.947265] RDX: ffff81011d089a80 RSI: 0000000000000001 RDI: ffff81011d089a88
[93436.947266] RBP: ffff810005f4fdb8 R08: 0000000000000001 R09: 0000000000000006
[93436.947268] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8100de02c500
[93436.947269] R13: ffff81011c188a00 R14: 0000000000000001 R15: ffff81011c189198
[93436.947271] FS: 00007fb5bde0d700(0000) GS:ffff81007ff22000(0000)
knlGS:0000000000000000
[93436.947273] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[93436.947274] CR2: 00007fb5bdd76000 CR3: 00000000664d5000 CR4: 00000000000006e0
[93436.947276] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[93436.947277] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[93436.947279] Process konqueror (pid: 8079, threadinfo
ffff810005f4e000, task ffff8100a1dec000)
[93436.947281] Stack: ffff810005f4fdd8 ffff810116c86140
ffff810005f4fdd8 ffffffff805314ae
[93436.947284] ffff810116c86140 ffff8100de02c500 ffff810005f4fdf8
ffffffff80531cf0
[93436.947286] ffff8100de02c500 ffff81011c188b48 ffff810005f4fe18
ffffffff80531311
[93436.947290] [<ffffffff805314ae>] skb_release_data+0x5e/0xa0
[93436.947293] [<ffffffff80531cf0>] skb_release_all+0xa0/0x110
[93436.947295] [<ffffffff80531311>] __kfree_skb+0x11/0xa0
[93436.947297] [<ffffffff805313b7>] kfree_skb+0x17/0x30
[93436.947299] [<ffffffff805a0b48>] unix_release_sock+0x128/0x250
[93436.947302] [<ffffffff805a0c91>] unix_release+0x21/0x30
[93436.947304] [<ffffffff8052b144>] sock_release+0x24/0x90
[93436.947307] [<ffffffff8052b656>] sock_close+0x26/0x50
[93436.947309] [<ffffffff8029f921>] __fput+0xc1/0x230
[93436.947312] [<ffffffff8029fe46>] fput+0x16/0x20
[93436.947314] [<ffffffff8029c576>] filp_close+0x56/0x90
[93436.947316] [<ffffffff8029de46>] sys_close+0xa6/0x110
[93436.947319] [<ffffffff8020b57b>] system_call_after_swapgs+0x7b/0x80
[93436.947322]
[93436.947322]
[93436.947323] Code: 48 8b 18 48 89 c7 e8 5d ff ff ff 48 85 db 75 ed
48 83 c4 08
[93436.947328] RIP [<ffffffff80531438>] skb_drop_list+0x18/0x30
[93436.947330] RSP <ffff810005f4fda8>
[93436.947332] ---[ end trace befb7cc3528ab3b1 ]---
Yes, that looks more networking-related.
Post by Torsten Kaiser
Don't know in what direction I should look.
I also can't easily reproduce this, it happened after several hours of
watching a wmv stream with mplayer...
Torsten
Torsten Kaiser
2007-12-29 17:00:12 UTC
[snip]
Post by Andrew Morton
Post by Torsten Kaiser
[ 7620.708561] Pid: 5698, comm: nfsv4-svc Not tainted 2.6.24-rc3-mm2 #2
[snip]
Post by Andrew Morton
That looks like a sunrpc bug. git-nfsd has been mucking around in there a
bit.
Please note that this report is still against 2.6.24-rc3-mm2. The
only new thing about it is that slub_debug=FZP does not catch the
cause...
Post by Andrew Morton
Post by Torsten Kaiser
The reason I am resending this: I just got a crash with
[93436.933356] WARNING: at include/net/dst.h:165 dst_release()
[93436.936685] Pid: 8079, comm: konqueror Not tainted 2.6.24-rc6-mm1 #11
[93436.939292]
[93436.939304] [<ffffffff80531d2d>] skb_release_all+0xdd/0x110
[93436.939307] [<ffffffff80531311>] __kfree_skb+0x11/0xa0
[93436.939309] [<ffffffff805313b7>] kfree_skb+0x17/0x30
[93436.939312] [<ffffffff805a0b48>] unix_release_sock+0x128/0x250
[93436.939315] [<ffffffff805a0c91>] unix_release+0x21/0x30
[93436.939318] [<ffffffff8052b144>] sock_release+0x24/0x90
[93436.939320] [<ffffffff8052b656>] sock_close+0x26/0x50
[93436.939324] [<ffffffff8029f921>] __fput+0xc1/0x230
[93436.939327] [<ffffffff8029fe46>] fput+0x16/0x20
[93436.939329] [<ffffffff8029c576>] filp_close+0x56/0x90
[93436.939331] [<ffffffff8029de46>] sys_close+0xa6/0x110
[93436.939335] [<ffffffff8020b57b>] system_call_after_swapgs+0x7b/0x80
From code inspection I would blame the patch "[SKBUFF]: Free old skb
properly in skb_morph" from Herbert Xu. (CC added)

Mostly it only shuffles code around; the only real change seems to be this hunk:
@@ -441,7 +446,7 @@ static struct sk_buff *__skb_clone(struct sk_buff *n, struct sk_buff *skb)
  */
 struct sk_buff *skb_morph(struct sk_buff *dst, struct sk_buff *src)
 {
-	skb_release_data(dst);
+	skb_release_all(dst);
 	return __skb_clone(dst, src);
 }
 EXPORT_SYMBOL_GPL(skb_morph);

Using skb_release_all instead of only skb_release_data (which is called
automatically from the new skb_release_all) will add a new call to
dst_release(skb->dst); (first line in skb_release_all)
Could that explain the above underflow warning?

(I do not have any clue about the inner workings of the network core;
I just looked for code changes that might be relevant...)
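
To make the question concrete, here is a minimal userspace sketch of the
kind of check that could produce such a warning. It assumes the check at
include/net/dst.h:165 means "the reference count was already below 1 when
a release arrived", which is only my reading of the message. The names
toy_dst, toy_dst_hold and toy_dst_release are made up for illustration;
this is a model, not the kernel code.

```c
/* Userspace model only: toy_dst and its helpers are hypothetical names,
 * not the kernel's struct dst_entry API. */
#include <assert.h>
#include <stdio.h>

struct toy_dst {
	int refcnt;     /* number of holders of this entry */
	int warnings;   /* how often a release found refcnt < 1 */
};

/* Take one reference. */
static void toy_dst_hold(struct toy_dst *d)
{
	assert(d != NULL);
	d->refcnt++;
}

/* Drop one reference; complain (like a WARN_ON) if none was held. */
static void toy_dst_release(struct toy_dst *d)
{
	if (!d)
		return;         /* releasing "no dst" is a harmless no-op */
	if (d->refcnt < 1) {
		d->warnings++;
		fprintf(stderr, "WARNING: release with refcnt=%d\n", d->refcnt);
	}
	d->refcnt--;
}
```

In this model, one hold followed by two releases leaves refcnt at -1 and
counts one warning. That is the pattern an extra release on an already
released entry would produce, if the assumption above holds.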
Post by Andrew Morton
Post by Torsten Kaiser
[93436.947241] general protection fault: 0000 [1] SMP
/sys/devices/pci0000:00/0000:00:0f.0/0000:01:00.1/irq
[93436.947245] CPU 1
[93436.947246] Modules linked in: radeon drm nfsd exportfs w83792d
ipv6 tuner tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx
tea5761 tvaudio msp3400 bttv ir_common compat_ioctl32 videobuf_dma_sg
videobuf_core btcx_risc tveeprom usbhid videodev v4l2_common hid
v4l1_compat pata_amd sg i2c_nforce2
[93436.947257] Pid: 8079, comm: konqueror Not tainted 2.6.24-rc6-mm1 #11
[93436.947259] RIP: 0010:[<ffffffff80531438>] [<ffffffff80531438>]
skb_drop_list+0x18/0x30
[93436.947262] RSP: 0018:ffff810005f4fda8 EFLAGS: 00010286
[93436.947263] RAX: ab1ed5ca5b74e7de RBX: ab1ed5ca5b74e7de RCX: 000000000000d135
[93436.947265] RDX: ffff81011d089a80 RSI: 0000000000000001 RDI: ffff81011d089a88
[93436.947266] RBP: ffff810005f4fdb8 R08: 0000000000000001 R09: 0000000000000006
[93436.947268] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8100de02c500
[93436.947269] R13: ffff81011c188a00 R14: 0000000000000001 R15: ffff81011c189198
[93436.947271] FS: 00007fb5bde0d700(0000) GS:ffff81007ff22000(0000)
knlGS:0000000000000000
[93436.947273] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[93436.947274] CR2: 00007fb5bdd76000 CR3: 00000000664d5000 CR4: 00000000000006e0
[93436.947276] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[93436.947277] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[93436.947279] Process konqueror (pid: 8079, threadinfo
ffff810005f4e000, task ffff8100a1dec000)
[93436.947281] Stack: ffff810005f4fdd8 ffff810116c86140
ffff810005f4fdd8 ffffffff805314ae
[93436.947284] ffff810116c86140 ffff8100de02c500 ffff810005f4fdf8
ffffffff80531cf0
[93436.947286] ffff8100de02c500 ffff81011c188b48 ffff810005f4fe18
ffffffff80531311
[93436.947290] [<ffffffff805314ae>] skb_release_data+0x5e/0xa0
[93436.947293] [<ffffffff80531cf0>] skb_release_all+0xa0/0x110
[93436.947295] [<ffffffff80531311>] __kfree_skb+0x11/0xa0
[93436.947297] [<ffffffff805313b7>] kfree_skb+0x17/0x30
[93436.947299] [<ffffffff805a0b48>] unix_release_sock+0x128/0x250
[93436.947302] [<ffffffff805a0c91>] unix_release+0x21/0x30
[93436.947304] [<ffffffff8052b144>] sock_release+0x24/0x90
[93436.947307] [<ffffffff8052b656>] sock_close+0x26/0x50
[93436.947309] [<ffffffff8029f921>] __fput+0xc1/0x230
[93436.947312] [<ffffffff8029fe46>] fput+0x16/0x20
[93436.947314] [<ffffffff8029c576>] filp_close+0x56/0x90
[93436.947316] [<ffffffff8029de46>] sys_close+0xa6/0x110
[93436.947319] [<ffffffff8020b57b>] system_call_after_swapgs+0x7b/0x80
[93436.947322]
[93436.947322]
[93436.947323] Code: 48 8b 18 48 89 c7 e8 5d ff ff ff 48 85 db 75 ed
48 83 c4 08
[93436.947328] RIP [<ffffffff80531438>] skb_drop_list+0x18/0x30
[93436.947330] RSP <ffff810005f4fda8>
[93436.947332] ---[ end trace befb7cc3528ab3b1 ]---
Yes, that looks more networking-related.
I would hope this OOPS was caused by the same error, trying to release
the same list twice.

Torsten
Herbert Xu
2007-12-30 01:40:13 UTC
Post by Torsten Kaiser
Post by Torsten Kaiser
Post by Torsten Kaiser
The reason I am resending this: I just got a crash with
[93436.933356] WARNING: at include/net/dst.h:165 dst_release()
[93436.936685] Pid: 8079, comm: konqueror Not tainted 2.6.24-rc6-mm1 #11
[93436.939292]
[93436.939304] [<ffffffff80531d2d>] skb_release_all+0xdd/0x110
[93436.939307] [<ffffffff80531311>] __kfree_skb+0x11/0xa0
[93436.939309] [<ffffffff805313b7>] kfree_skb+0x17/0x30
[93436.939312] [<ffffffff805a0b48>] unix_release_sock+0x128/0x250
[93436.939315] [<ffffffff805a0c91>] unix_release+0x21/0x30
[93436.939318] [<ffffffff8052b144>] sock_release+0x24/0x90
[93436.939320] [<ffffffff8052b656>] sock_close+0x26/0x50
[93436.939324] [<ffffffff8029f921>] __fput+0xc1/0x230
[93436.939327] [<ffffffff8029fe46>] fput+0x16/0x20
[93436.939329] [<ffffffff8029c576>] filp_close+0x56/0x90
[93436.939331] [<ffffffff8029de46>] sys_close+0xa6/0x110
[93436.939335] [<ffffffff8020b57b>] system_call_after_swapgs+0x7b/0x80
From code inspection I would blame the patch "[SKBUFF]: Free old skb
properly in skb_morph" from Herbert Xu. (CC added)
I doubt it. skb_morph is only used on IP fragments so I don't see how
you could attribute an error from a Unix domain socket to this patch.

In any case, Unix socket packets should not have a dst at all so the
very fact that you're in that path means that you have some sort of
memory corruption.

Is this the very first OOPS/warning that you see? If not you should
ignore all but the very first one as that may have left your system
in an inconsistent state which may render all subsequent OOPSes and
warnings useless.
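
As a rough sketch of that triage rule, assuming the log has already been
split into lines: find the first line that starts a report, and read the
bracketed counter that follows the error code on an oops line (1 for the
first oops since boot). The marker strings are simply the ones that occur
in the logs quoted in this thread, not a complete list, and the function
names are hypothetical.

```c
/* Hypothetical log-triage helpers; not part of any kernel tool. */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Does this line start a warning/oops report? (marker list is an assumption) */
static int is_report_start(const char *line)
{
	static const char *const markers[] = {
		"WARNING: at ",
		"kernel BUG at ",
		"general protection fault:",
		"invalid opcode:",
	};
	size_t i;

	for (i = 0; i < sizeof(markers) / sizeof(markers[0]); i++)
		if (strstr(line, markers[i]))
			return 1;
	return 0;
}

/* Index of the first report line in lines[0..n-1], or -1 if there is none. */
static int first_report(const char *const *lines, int n)
{
	int i;

	assert(lines != NULL);
	for (i = 0; i < n; i++)
		if (is_report_start(lines[i]))
			return i;
	return -1;
}

/* Counter from a line like "... fault: 0000 [1] SMP", or -1 if absent.
 * Only the last '[' is examined, so the leading timestamp is skipped. */
static int oops_counter(const char *line)
{
	const char *p = strrchr(line, '[');
	int n;
	char close;

	if (p && sscanf(p, "[%d%c", &n, &close) == 2 && close == ']')
		return n;
	return -1;
}
```

Everything logged after the line found by first_report() should be read
with suspicion; a counter greater than 1 means an earlier oops already
happened since boot.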

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <***@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Torsten Kaiser
2007-12-30 03:40:06 UTC
Post by Herbert Xu
Post by Torsten Kaiser
Post by Torsten Kaiser
Post by Torsten Kaiser
The reason I am resending this: I just got a crash with
[93436.933356] WARNING: at include/net/dst.h:165 dst_release()
[93436.936685] Pid: 8079, comm: konqueror Not tainted 2.6.24-rc6-mm1 #11
[93436.939292]
[93436.939304] [<ffffffff80531d2d>] skb_release_all+0xdd/0x110
[93436.939307] [<ffffffff80531311>] __kfree_skb+0x11/0xa0
[93436.939309] [<ffffffff805313b7>] kfree_skb+0x17/0x30
[93436.939312] [<ffffffff805a0b48>] unix_release_sock+0x128/0x250
[93436.939315] [<ffffffff805a0c91>] unix_release+0x21/0x30
[93436.939318] [<ffffffff8052b144>] sock_release+0x24/0x90
[93436.939320] [<ffffffff8052b656>] sock_close+0x26/0x50
[93436.939324] [<ffffffff8029f921>] __fput+0xc1/0x230
[93436.939327] [<ffffffff8029fe46>] fput+0x16/0x20
[93436.939329] [<ffffffff8029c576>] filp_close+0x56/0x90
[93436.939331] [<ffffffff8029de46>] sys_close+0xa6/0x110
[93436.939335] [<ffffffff8020b57b>] system_call_after_swapgs+0x7b/0x80
From code inspection I would blame the patch "[SKBUFF]: Free old skb
properly in skb_morph" from Herbert Xu. (CC added)
I doubt it. skb_morph is only used on IP fragments so I don't see how
you could attribute an error from a Unix domain socket to this patch.
That's why I wrote that I do not know much about the network core...
Post by Herbert Xu
In any case, Unix socket packets should not have a dst at all so the
very fact that you're in that path means that you have some sort of
memory corruption.
... I did not know that there should not have been a dst.

It's just that this warning was the first good clue about the
networking-related memory corruption that I have seen since 2.6.24-rc3-mm2.
The date of the patch (Mon, 26 Nov 2007 15:11:19) even fits into the
window between -rc3-mm1 and -rc3-mm2.

I doubt that the memory corruption is a hardware problem, because the
system in question is using ECC RAM and I did not see any messages
about corrected/detected errors.
Post by Herbert Xu
Is this the very first OOPS/warning that you see? If not you should
ignore all but the very first one as that may have left your system
in an inconsistent state which may render all subsequent OOPSes and
warnings useless.
I looked into the log in question and the only other warning was a
circular locking dependency that lockdep detected about 1.5 hours
before this warning.

As reported in my original mail, immediately after the warning the
system OOPSed and hung:
[93436.947241] general protection fault: 0000 [1] SMP
-> first OOPS
[93436.947243] last sysfs file:
/sys/devices/pci0000:00/0000:00:0f.0/0000:01:00.1/irq
[93436.947245] CPU 1
[93436.947246] Modules linked in: radeon drm nfsd exportfs w83792d
ipv6 tuner tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx
tea5761 tvaudio msp3400 bttv ir_common compat_ioctl32 videobuf_dma_sg
videobuf_core btcx_risc tveeprom usbhid videodev v4l2_common hid
v4l1_compat pata_amd sg i2c_nforce2
[93436.947257] Pid: 8079, comm: konqueror Not tainted 2.6.24-rc6-mm1 #11
-> not tainted by a previous OOPS
[93436.947259] RIP: 0010:[<ffffffff80531438>] [<ffffffff80531438>]
skb_drop_list+0x18/0x30
[93436.947262] RSP: 0018:ffff810005f4fda8 EFLAGS: 00010286
[93436.947263] RAX: ab1ed5ca5b74e7de RBX: ab1ed5ca5b74e7de RCX: 000000000000d135
[93436.947265] RDX: ffff81011d089a80 RSI: 0000000000000001 RDI: ffff81011d089a88
[93436.947266] RBP: ffff810005f4fdb8 R08: 0000000000000001 R09: 0000000000000006
[93436.947268] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8100de02c500
[93436.947269] R13: ffff81011c188a00 R14: 0000000000000001 R15: ffff81011c189198
[93436.947271] FS: 00007fb5bde0d700(0000) GS:ffff81007ff22000(0000)
knlGS:0000000000000000
[93436.947273] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[93436.947274] CR2: 00007fb5bdd76000 CR3: 00000000664d5000 CR4: 00000000000006e0
[93436.947276] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[93436.947277] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[93436.947279] Process konqueror (pid: 8079, threadinfo
ffff810005f4e000, task ffff8100a1dec000)
[93436.947281] Stack: ffff810005f4fdd8 ffff810116c86140
ffff810005f4fdd8 ffffffff805314ae
[93436.947284] ffff810116c86140 ffff8100de02c500 ffff810005f4fdf8
ffffffff80531cf0
[93436.947286] ffff8100de02c500 ffff81011c188b48 ffff810005f4fe18
ffffffff80531311
[93436.947288] Call Trace:
[93436.947290] [<ffffffff805314ae>] skb_release_data+0x5e/0xa0
[93436.947293] [<ffffffff80531cf0>] skb_release_all+0xa0/0x110
[93436.947295] [<ffffffff80531311>] __kfree_skb+0x11/0xa0
[93436.947297] [<ffffffff805313b7>] kfree_skb+0x17/0x30
[93436.947299] [<ffffffff805a0b48>] unix_release_sock+0x128/0x250
[93436.947302] [<ffffffff805a0c91>] unix_release+0x21/0x30
[93436.947304] [<ffffffff8052b144>] sock_release+0x24/0x90
[93436.947307] [<ffffffff8052b656>] sock_close+0x26/0x50
[93436.947309] [<ffffffff8029f921>] __fput+0xc1/0x230
[93436.947312] [<ffffffff8029fe46>] fput+0x16/0x20
[93436.947314] [<ffffffff8029c576>] filp_close+0x56/0x90
[93436.947316] [<ffffffff8029de46>] sys_close+0xa6/0x110
[93436.947319] [<ffffffff8020b57b>] system_call_after_swapgs+0x7b/0x80
[93436.947322]
[93436.947322]
[93436.947323] Code: 48 8b 18 48 89 c7 e8 5d ff ff ff 48 85 db 75 ed 48 83 c4 08
[93436.947328] RIP [<ffffffff80531438>] skb_drop_list+0x18/0x30
[93436.947330] RSP <ffff810005f4fda8>
[93436.947332] ---[ end trace befb7cc3528ab3b1 ]---

Your patch just fit my problems so "well":
* it had the correct time frame for 2.6.24-rc3-mm2
* it looked guilty of changing the refcounting of __refcnt because of
the added dst_release()
* it added other release / freeing operations so that a use-after-free
memory corruption seemed possible

I just have no better idea as to what caused this OOPS and the other
hangs in -rc3-mm2.

Torsten
Randy Dunlap
2007-12-30 05:50:09 UTC
Post by Torsten Kaiser
Post by Herbert Xu
Post by Torsten Kaiser
Post by Torsten Kaiser
Post by Torsten Kaiser
The reason I am resending this: I just got a crash with
[93436.933356] WARNING: at include/net/dst.h:165 dst_release()
[93436.936685] Pid: 8079, comm: konqueror Not tainted 2.6.24-rc6-mm1 #11
[93436.939292]
[93436.939304] [<ffffffff80531d2d>] skb_release_all+0xdd/0x110
[93436.939307] [<ffffffff80531311>] __kfree_skb+0x11/0xa0
[93436.939309] [<ffffffff805313b7>] kfree_skb+0x17/0x30
[93436.939312] [<ffffffff805a0b48>] unix_release_sock+0x128/0x250
[93436.939315] [<ffffffff805a0c91>] unix_release+0x21/0x30
[93436.939318] [<ffffffff8052b144>] sock_release+0x24/0x90
[93436.939320] [<ffffffff8052b656>] sock_close+0x26/0x50
[93436.939324] [<ffffffff8029f921>] __fput+0xc1/0x230
[93436.939327] [<ffffffff8029fe46>] fput+0x16/0x20
[93436.939329] [<ffffffff8029c576>] filp_close+0x56/0x90
[93436.939331] [<ffffffff8029de46>] sys_close+0xa6/0x110
[93436.939335] [<ffffffff8020b57b>] system_call_after_swapgs+0x7b/0x80
From code inspection I would blame the patch "[SKBUFF]: Free old skb
properly in skb_morph" from Herbert Xu. (CC added)
I doubt it. skb_morph is only used on IP fragments so I don't see how
you could attribute an error from a Unix domain socket to this patch.
That's why I wrote that I do not know much about the network core...
Post by Herbert Xu
In any case, Unix socket packets should not have a dst at all so the
very fact that you're in that path means that you have some sort of
memory corruption.
... I did not know that there should not have been a dst.
It's just that this warning was the first good clue about the
networking-related memory corruption that I have seen since 2.6.24-rc3-mm2.
The date of the patch (Mon, 26 Nov 2007 15:11:19) even fits into the
window between -rc3-mm1 and -rc3-mm2.
I doubt that the memory corruption is a hardware problem, because the
system in question is using ECC RAM and I did not see any messages
about corrected/detected errors.
Post by Herbert Xu
Is this the very first OOPS/warning that you see? If not you should
ignore all but the very first one as that may have left your system
in an inconsistent state which may render all subsequent OOPSes and
warnings useless.
I looked into the log in question and the only other warning was a
circular locking dependency that lockdep detected about 1.5 hours
before this warning.
As reported in my original mail, immediately after the warning the
[93436.947241] general protection fault: 0000 [1] SMP
-> first OOPS                                  ^
FYI, that's what this counter is... -----------^
Post by Torsten Kaiser
/sys/devices/pci0000:00/0000:00:0f.0/0000:01:00.1/irq
[93436.947245] CPU 1
[93436.947246] Modules linked in: radeon drm nfsd exportfs w83792d
ipv6 tuner tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx
tea5761 tvaudio msp3400 bttv ir_common compat_ioctl32 videobuf_dma_sg
videobuf_core btcx_risc tveeprom usbhid videodev v4l2_common hid
v4l1_compat pata_amd sg i2c_nforce2
[93436.947257] Pid: 8079, comm: konqueror Not tainted 2.6.24-rc6-mm1 #11
-> not tainted by a previous OOPS
[93436.947259] RIP: 0010:[<ffffffff80531438>] [<ffffffff80531438>]
skb_drop_list+0x18/0x30
[93436.947262] RSP: 0018:ffff810005f4fda8 EFLAGS: 00010286
[93436.947263] RAX: ab1ed5ca5b74e7de RBX: ab1ed5ca5b74e7de RCX: 000000000000d135
[93436.947265] RDX: ffff81011d089a80 RSI: 0000000000000001 RDI: ffff81011d089a88
[93436.947266] RBP: ffff810005f4fdb8 R08: 0000000000000001 R09: 0000000000000006
[93436.947268] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8100de02c500
[93436.947269] R13: ffff81011c188a00 R14: 0000000000000001 R15: ffff81011c189198
[93436.947271] FS: 00007fb5bde0d700(0000) GS:ffff81007ff22000(0000)
knlGS:0000000000000000
[93436.947273] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[93436.947274] CR2: 00007fb5bdd76000 CR3: 00000000664d5000 CR4: 00000000000006e0
[93436.947276] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[93436.947277] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[93436.947279] Process konqueror (pid: 8079, threadinfo
ffff810005f4e000, task ffff8100a1dec000)
[93436.947281] Stack: ffff810005f4fdd8 ffff810116c86140
ffff810005f4fdd8 ffffffff805314ae
[93436.947284] ffff810116c86140 ffff8100de02c500 ffff810005f4fdf8
ffffffff80531cf0
[93436.947286] ffff8100de02c500 ffff81011c188b48 ffff810005f4fe18
ffffffff80531311
[93436.947290] [<ffffffff805314ae>] skb_release_data+0x5e/0xa0
[93436.947293] [<ffffffff80531cf0>] skb_release_all+0xa0/0x110
[93436.947295] [<ffffffff80531311>] __kfree_skb+0x11/0xa0
[93436.947297] [<ffffffff805313b7>] kfree_skb+0x17/0x30
[93436.947299] [<ffffffff805a0b48>] unix_release_sock+0x128/0x250
[93436.947302] [<ffffffff805a0c91>] unix_release+0x21/0x30
[93436.947304] [<ffffffff8052b144>] sock_release+0x24/0x90
[93436.947307] [<ffffffff8052b656>] sock_close+0x26/0x50
[93436.947309] [<ffffffff8029f921>] __fput+0xc1/0x230
[93436.947312] [<ffffffff8029fe46>] fput+0x16/0x20
[93436.947314] [<ffffffff8029c576>] filp_close+0x56/0x90
[93436.947316] [<ffffffff8029de46>] sys_close+0xa6/0x110
[93436.947319] [<ffffffff8020b57b>] system_call_after_swapgs+0x7b/0x80
[93436.947322]
[93436.947322]
[93436.947323] Code: 48 8b 18 48 89 c7 e8 5d ff ff ff 48 85 db 75 ed 48 83 c4 08
[93436.947328] RIP [<ffffffff80531438>] skb_drop_list+0x18/0x30
[93436.947330] RSP <ffff810005f4fda8>
[93436.947332] ---[ end trace befb7cc3528ab3b1 ]---
* it had the correct time frame for 2.6.24-rc3-mm2
* it looked guilty of changing the refcounting of __refcnt because of
the added dst_release()
* it added other release / freeing operations so that a use-after-free
memory corruption seemed possible
I just have no better idea as to what caused this OOPS and the other
hangs in -rc3-mm2.
---
~Randy
desserts: http://www.xenotime.net/linux/recipes/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Torsten Kaiser
2007-12-31 20:20:07 UTC
Permalink
Post by Torsten Kaiser
Post by Herbert Xu
Post by Torsten Kaiser
Post by Torsten Kaiser
Post by Torsten Kaiser
The cause, why I am resending this: I just got a crash with
[93436.933356] WARNING: at include/net/dst.h:165 dst_release()
[93436.936685] Pid: 8079, comm: konqueror Not tainted 2.6.24-rc6-mm1 #11
[93436.939292]
[93436.939304] [<ffffffff80531d2d>] skb_release_all+0xdd/0x110
[93436.939307] [<ffffffff80531311>] __kfree_skb+0x11/0xa0
[93436.939309] [<ffffffff805313b7>] kfree_skb+0x17/0x30
[93436.939312] [<ffffffff805a0b48>] unix_release_sock+0x128/0x250
[93436.939315] [<ffffffff805a0c91>] unix_release+0x21/0x30
[93436.939318] [<ffffffff8052b144>] sock_release+0x24/0x90
[93436.939320] [<ffffffff8052b656>] sock_close+0x26/0x50
[93436.939324] [<ffffffff8029f921>] __fput+0xc1/0x230
[93436.939327] [<ffffffff8029fe46>] fput+0x16/0x20
[93436.939329] [<ffffffff8029c576>] filp_close+0x56/0x90
[93436.939331] [<ffffffff8029de46>] sys_close+0xa6/0x110
[93436.939335] [<ffffffff8020b57b>] system_call_after_swapgs+0x7b/0x80
From code inspection I would blame the patch "[SKBUFF]: Free old skb
properly in skb_morph" from Herbert Xu. (CC added)
I doubt it. skb_morph is only used on IP fragments so I don't see how
you could attribute an error from a Unix domain socket to this patch.
That's why I wrote that I do not know much about the network core...
Post by Herbert Xu
In any case, Unix socket packets should not have a dst at all so the
very fact that you're in that path means that you have some sort of
memory corruption.
... I did not know that there should not have been a dst.
It's just that this warning was the first nice clue about the memory
corruption related to networking that I have seen since 2.6.24-rc3-mm2.
The time of the patch (Mon, 26 Nov 2007 15:11:19) even fits into the
window between -rc3-mm1 and -rc3-mm2.
I doubt that the memory corruption is a hardware problem, because the
system in question is using ECC ram and I did not see any messages
about corrected/detected errors.
Post by Herbert Xu
Is this the very first OOPS/warning that you see? If not you should
ignore all but the very first one as that may have left your system
in an inconsistent state which may render all subsequent OOPSes and
warnings useless.
I looked into the log in question and the only other warning was a
circular locking dependency that lockdep detected around 1.5 hours
before this warning.
As reported in my original mail immediately after the warning the
[93436.947241] general protection fault: 0000 [1] SMP
-> first OOPS
/sys/devices/pci0000:00/0000:00:0f.0/0000:01:00.1/irq
[93436.947245] CPU 1
[93436.947246] Modules linked in: radeon drm nfsd exportfs w83792d
ipv6 tuner tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx
tea5761 tvaudio msp3400 bttv ir_common compat_ioctl32 videobuf_dma_sg
videobuf_core btcx_risc tveeprom usbhid videodev v4l2_common hid
v4l1_compat pata_amd sg i2c_nforce2
[93436.947257] Pid: 8079, comm: konqueror Not tainted 2.6.24-rc6-mm1 #11
-> not tainted by a previous OOPS
[93436.947259] RIP: 0010:[<ffffffff80531438>] [<ffffffff80531438>]
skb_drop_list+0x18/0x30
[93436.947262] RSP: 0018:ffff810005f4fda8 EFLAGS: 00010286
[93436.947263] RAX: ab1ed5ca5b74e7de RBX: ab1ed5ca5b74e7de RCX: 000000000000d135
[93436.947265] RDX: ffff81011d089a80 RSI: 0000000000000001 RDI: ffff81011d089a88
[93436.947266] RBP: ffff810005f4fdb8 R08: 0000000000000001 R09: 0000000000000006
[93436.947268] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8100de02c500
[93436.947269] R13: ffff81011c188a00 R14: 0000000000000001 R15: ffff81011c189198
[93436.947271] FS: 00007fb5bde0d700(0000) GS:ffff81007ff22000(0000)
knlGS:0000000000000000
[93436.947273] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[93436.947274] CR2: 00007fb5bdd76000 CR3: 00000000664d5000 CR4: 00000000000006e0
[93436.947276] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[93436.947277] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[93436.947279] Process konqueror (pid: 8079, threadinfo
ffff810005f4e000, task ffff8100a1dec000)
[93436.947281] Stack: ffff810005f4fdd8 ffff810116c86140
ffff810005f4fdd8 ffffffff805314ae
[93436.947284] ffff810116c86140 ffff8100de02c500 ffff810005f4fdf8
ffffffff80531cf0
[93436.947286] ffff8100de02c500 ffff81011c188b48 ffff810005f4fe18
ffffffff80531311
[93436.947290] [<ffffffff805314ae>] skb_release_data+0x5e/0xa0
[93436.947293] [<ffffffff80531cf0>] skb_release_all+0xa0/0x110
[93436.947295] [<ffffffff80531311>] __kfree_skb+0x11/0xa0
[93436.947297] [<ffffffff805313b7>] kfree_skb+0x17/0x30
[93436.947299] [<ffffffff805a0b48>] unix_release_sock+0x128/0x250
[93436.947302] [<ffffffff805a0c91>] unix_release+0x21/0x30
[93436.947304] [<ffffffff8052b144>] sock_release+0x24/0x90
[93436.947307] [<ffffffff8052b656>] sock_close+0x26/0x50
[93436.947309] [<ffffffff8029f921>] __fput+0xc1/0x230
[93436.947312] [<ffffffff8029fe46>] fput+0x16/0x20
[93436.947314] [<ffffffff8029c576>] filp_close+0x56/0x90
[93436.947316] [<ffffffff8029de46>] sys_close+0xa6/0x110
[93436.947319] [<ffffffff8020b57b>] system_call_after_swapgs+0x7b/0x80
[93436.947322]
[93436.947322]
[93436.947323] Code: 48 8b 18 48 89 c7 e8 5d ff ff ff 48 85 db 75 ed 48 83 c4 08
[93436.947328] RIP [<ffffffff80531438>] skb_drop_list+0x18/0x30
[93436.947330] RSP <ffff810005f4fda8>
[93436.947332] ---[ end trace befb7cc3528ab3b1 ]---
* it had the correct time frame for 2.6.24-rc3-mm2
* it looked guilty of changing the refcounting of __refcnt because of
the added dst_release()
* it added other release / freeing operations so that a use-after-free
memory corruption seemed possible
I just have no better idea as to what caused this OOPS and the other
hangs in -rc3-mm2.
After testing the patch from http://lkml.org/lkml/2007/12/30/210 the
system hung again after building ~10 packages from the last kde4
release candidate. (see other mail)

I then tried to "fix" it with this suspect.
I changed "skb_release_all(dst);" back to "skb_release_data(dst);" in
skb_morph() (net/core/skbuff.c).

I'm now at 205 of 210 packages completed without a further hang. I
also do not see an obvious memory leak.

(All of these tests were done on 2.6.24-rc3-mm2, as I'm relatively
sure that doing these compiles will trigger the error on that kernel
version.)

Torsten
Herbert Xu
2008-01-01 12:10:12 UTC
Permalink
Post by Torsten Kaiser
I then tried to "fix" it with this suspect.
I changed "skb_release_all(dst);" back to "skb_release_data(dst);" in
skb_morph() (net/core/skbuff.c).
Check /proc/net/snmp to see if you're getting any fragments, if not
then skb_morph shouldn't even be getting called.
Post by Torsten Kaiser
I'm now at 205 of 210 packages completed without a further hang. I
also do not see an obvious memory leak.
In any case, I suspect the cause of your problem is that somebody
somewhere is doing a double-free on an skb.

Since you're the only person who can reproduce this, we really need
your help to track this down. Since bisecting the mm tree is not
practical, you could start by checking whether the bug is in mm only
or whether it affects rc6 too.

Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <***@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Torsten Kaiser
2008-01-01 13:00:15 UTC
Permalink
Post by Herbert Xu
Post by Torsten Kaiser
I then tried to "fix" it with this suspect.
I changed "skb_release_all(dst);" back to "skb_release_data(dst);" in
skb_morph() (net/core/skbuff.c).
Check /proc/net/snmp to see if you're getting any fragments, if not
then skb_morph shouldn't even be getting called.
OK, thanks for that hint.
I will look at this after my next tests.
Post by Herbert Xu
Post by Torsten Kaiser
I'm now at 205 of 210 packages completed without a further hang. I
also do not see an obvious memory leak.
In any case, I suspect the cause of your problem is that somebody
somewhere is doing a double-free on an skb.
Is there any debug option I could turn on to catch this?

Hmm... __alloc_skb() uses kmem_cache_alloc_node() and I did run
-rc3-mm2 a long time with slub_debug=FZP and that did not catch
anything. Shouldn't the poisoning catch that? (Sorry if this question
is stupid, but while I can read C, I'm not a kernel expert)
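On the question above: as far as the SLUB documentation describes the
flags, slub_debug=FZP enables sanity checks (F), red zoning (Z) and
poisoning (P). The toy model below is only a sketch of why such checks
can still miss a bug. struct toy_obj, toy_init(), toy_alloc() and
toy_free() are made-up names for illustration, not kernel APIs; only the
poison byte 0x6b is taken from the kernel's POISON_FREE value.

```c
#include <stddef.h>
#include <string.h>

#define POISON_FREE 0x6b   /* byte pattern written into freed objects */
#define OBJ_SIZE    32

/* Hypothetical single-object "cache", for illustration only. */
struct toy_obj {
    unsigned char data[OBJ_SIZE];
    int on_freelist;       /* stand-in for the allocator's bookkeeping */
};

/* Start with the object free and fully poisoned. */
static void toy_init(struct toy_obj *o)
{
    memset(o->data, POISON_FREE, OBJ_SIZE);
    o->on_freelist = 1;
}

/* Alloc: verify the poison is intact, i.e. nobody wrote after free. */
static int toy_alloc(struct toy_obj *o)
{
    for (size_t i = 0; i < OBJ_SIZE; i++)
        if (o->data[i] != POISON_FREE)
            return -1;     /* write after free detected */
    o->on_freelist = 0;
    return 0;
}

/* Free: a sanity check catches an immediate double free, then poison. */
static int toy_free(struct toy_obj *o)
{
    if (o->on_freelist)
        return -1;         /* double free detected */
    memset(o->data, POISON_FREE, OBJ_SIZE);
    o->on_freelist = 1;
    return 0;
}
```

In this model neither check fires if the object is handed out again
between the two frees, or if the stale pointer is only read and never
written. That would be consistent with slub_debug staying silent, but it
is an assumption about this bug, not a diagnosis.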
Post by Herbert Xu
Since you're the only person who can reproduce this, we really need
your help to track this down. Since bisecting the mm tree is not
practical, you could start by checking whether the bug is in mm only
or whether it affects rc6 too.
I will try -rc6-mm1 and vanilla -rc6 and report back.

Torsten
Torsten Kaiser
2008-01-01 18:30:12 UTC
Permalink
Post by Torsten Kaiser
Post by Herbert Xu
Post by Torsten Kaiser
I then tried to "fix" it with this suspect.
I changed "skb_release_all(dst);" back to "skb_release_data(dst);" in
skb_morph() (net/core/skbuff.c).
I can't explain why this seems to fix 2.6.24-rc3-mm2 for me, but at
least in 2.6.24-rc6-mm1 it does not seem to be involved.
Post by Torsten Kaiser
Post by Herbert Xu
Check /proc/net/snmp to see if you're getting any fragments, if not
then skb_morph shouldn't even be getting called.
OK, thanks for that hint.
I look at this after my next tests.
During normal work I did not see the frag counters increase.
I used ping -s 10000 to create some frags, worked perfectly.
I used netio -b 63k -u [target] to create around half a million frags,
worked too.
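
For anyone who wants to watch the fragment counters mentioned above
without reading the file by eye, here is a minimal sketch. It relies on
/proc/net/snmp storing each protocol as a header line of field names
followed by a value line in the same order. snmp_ip_counter() is a
hypothetical helper name; the caller is assumed to have read the file
contents into a string already.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Look up one counter (for example "ReasmReqds" or "FragCreates") in
 * the "Ip:" section of text in /proc/net/snmp format.
 * Returns 0 and stores the number in *value, or -1 if not found.
 */
static int snmp_ip_counter(const char *text, const char *field,
                           unsigned long long *value)
{
    /* Header line: the first line that starts with "Ip: ". */
    const char *names = text;
    while (names && strncmp(names, "Ip: ", 4) != 0) {
        names = strchr(names, '\n');
        if (names)
            names++;
    }
    if (!names)
        return -1;

    /* Value line: the line directly after the header. */
    const char *values = strchr(names, '\n');
    if (!values || strncmp(values + 1, "Ip: ", 4) != 0)
        return -1;
    values += 1 + 4;
    names += 4;

    size_t flen = strlen(field);

    /* Walk both lines in lockstep, one space-separated token at a time. */
    while (*names && *names != '\n' && *values && *values != '\n') {
        size_t nlen = strcspn(names, " \n");
        size_t vlen = strcspn(values, " \n");

        if (nlen == flen && strncmp(names, field, flen) == 0) {
            *value = strtoull(values, NULL, 10);
            return 0;
        }
        names += nlen;
        values += vlen;
        while (*names == ' ')
            names++;
        while (*values == ' ')
            values++;
    }
    return -1;
}
```

Reading ReasmReqds before and after a test run and comparing the two
values shows whether any fragments arrived in between.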

And what really is strange is that I changed skb_morph into this:
struct sk_buff *skb_morph(struct sk_buff *dst, struct sk_buff *src)
{
	printk(KERN_ERR "morph %p:%p",dst,src);
	WARN_ON(1);
	skb_release_all(dst);
	return __skb_clone(dst, src);
}
... that warning was not triggered once.
Post by Torsten Kaiser
Post by Herbert Xu
Post by Torsten Kaiser
I'm now at 205 of 210 packages completed without a further hang. I
also do not see an obvious memory leak.
In any case, I suspect the cause of your problem is that somebody
somewhere is doing a double-free on an skb.
Since you're the only person who can reproduce this, we really need
your help to track this down. Since bisecting the mm tree is not
practical, you could start by checking whether the bug is in mm only
or whether it affects rc6 too.
The problem with bisecting this is that I can't seem to trigger it on
demand. Today I was just about to give up on triggering it in -rc6-mm1
by doing package compiles when it did happen again. But that was after
more than 4 hours...
Post by Torsten Kaiser
I will try -rc6-mm1 and vanilla -rc6 and report back.
As noted above, my WARN_ON(1) in skb_morph did not trigger once before
the system died with this OOPS:
[18663.909931] Unable to handle kernel NULL pointer dereference at
0000000000000000 RIP:
[18663.915489] [<ffffffff8055f2e8>] tcp_read_sock+0x58/0x1b0
[18663.918652] PGD 73442067 PUD 7480e067 PMD 0
[18663.918652] Oops: 0000 [1] SMP
[18663.918652] last sysfs file:
/sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
[18663.918652] CPU 1
[18663.918652] Modules linked in: radeon drm nfsd exportfs w83792d
ipv6 tuner tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx
tea5761 tvaudio msp3400 bttv ir_common compat_ioctl32 videobuf_dma_sg
videobuf_core btcx_risc tveeprom usbhid videodev v4l2_common
v4l1_compat hid sg pata_amd i2c_nforce2
[18663.918652] Pid: 0, comm: swapper Not tainted 2.6.24-rc6-mm1 #13
[18663.918652] RIP: 0010:[<ffffffff8055f2e8>] [<ffffffff8055f2e8>]
tcp_read_sock+0x58/0x1b0
[18663.918652] RSP: 0018:ffff81007ff4fb60 EFLAGS: 00010286
[18663.918652] RAX: 0000000000000038 RBX: 0000000000000000 RCX: 0000000000000000
[18663.918652] RDX: ffff8100141a40b0 RSI: ffff81007ff4fbc0 RDI: 0000000000000000
[18663.918652] RBP: ffff81007ff4fbb0 R08: 0000000000000002 R09: 0000000000000000
[18663.918652] R10: ffffffff805b2afb R11: 000000000520cde8 R12: 00000000c05a019a
[18663.918652] R13: 000000000f26378b R14: ffff810066469d38 R15: ffff81004b4e4000
[18663.918652] FS: 00007f58ac9a0700(0000) GS:ffff81007ff12580(0000)
knlGS:0000000000000000
[18663.918652] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[18663.918652] CR2: 0000000000000000 CR3: 0000000073441000 CR4: 00000000000006e0
[18663.918652] DR0: 00007fffe1e55cbc DR1: 0000000000000000 DR2: 0000000000000000
[18663.918652] DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
[18663.918652] Process swapper (pid: 0, threadinfo ffff81011ff2c000,
task ffff81007ff4a000)
[18663.918652] Stack: ffff810066469d38 ffff81004b4e4148
ffffffff805b1ab0 ffff81007ff4fbc0
[18663.918652] 00000000805b2afb ffff81004b4e4000 ffff81004b4e4298
ffff810066469d00
[18663.918652] ffff810066469d38 0000000000000000 ffff81007ff4fbf0
ffffffff805b2b41
[18663.918652] Call Trace:
[18663.918652] <IRQ> [<ffffffff805b1ab0>] xs_tcp_data_recv+0x0/0x560
[18663.918652] [<ffffffff805b2b41>] xs_tcp_data_ready+0x71/0x90
[18663.918652] [<ffffffff80568bec>] __tcp_ack_snd_check+0x5c/0xa0
[18663.918652] [<ffffffff8056a458>] tcp_rcv_established+0x3c8/0x800
[18663.918652] [<ffffffff80571451>] tcp_v4_do_rcv+0x2e1/0x4e0
[18663.918652] [<ffffffff80573cb1>] tcp_v4_rcv+0x721/0x850
[18663.918652] [<ffffffff80553d63>] ip_local_deliver_finish+0xd3/0x250
[18663.918652] [<ffffffff8055433b>] ip_local_deliver+0x3b/0x90
[18663.918652] [<ffffffff80553988>] ip_rcv_finish+0x118/0x420
[18663.918652] [<ffffffff8022e313>] enqueue_task_fair+0x73/0xd0
[18663.918652] [<ffffffff80554236>] ip_rcv+0x226/0x2f0
[18663.918652] [<ffffffff80537576>] netif_receive_skb+0x1d6/0x280
[18663.918652] [<ffffffff8053a1ea>] process_backlog+0x8a/0xf0
[18663.918652] [<ffffffff80539e84>] net_rx_action+0xb4/0x130
[18663.918652] [<ffffffff8023d624>] __do_softirq+0x84/0x110
[18663.918652] [<ffffffff8020c82c>] call_softirq+0x1c/0x30
[18663.918652] [<ffffffff8020eaa5>] do_softirq+0x65/0xc0
[18663.918652] [<ffffffff8023d595>] irq_exit+0x95/0xa0
[18663.918652] [<ffffffff8020ebbf>] do_IRQ+0x8f/0x100
[18663.918652] [<ffffffff8020a4b0>] default_idle+0x0/0x80
[18663.918652] [<ffffffff8020bb26>] ret_from_intr+0x0/0xf
[18663.918652] <EOI> [<ffffffff80252310>]
__atomic_notifier_call_chain+0x0/0xa0
[18663.918652] [<ffffffff8020a4f3>] default_idle+0x43/0x80
[18663.918652] [<ffffffff8020a4f1>] default_idle+0x41/0x80
[18663.918652] [<ffffffff8020a4b0>] default_idle+0x0/0x80
[18663.918652] [<ffffffff8020a59c>] cpu_idle+0x6c/0xa0
[18663.918652] [<ffffffff808109b8>] start_secondary+0x2f8/0x420
[18663.918652]
[18663.918652]
[18663.918652] Code: 48 8b 3b 0f 18 0f 74 75 8b 93 a0 00 00 00 45 89 ec 44 2b 63
[18663.918652] RIP [<ffffffff8055f2e8>] tcp_read_sock+0x58/0x1b0
[18663.918652] RSP <ffff81007ff4fb60>
[18663.918652] CR2: 0000000000000000
[18663.918680] ---[ end trace 1dc6b1bf3734ac14 ]---

(gdb) list *0xffffffff8055f2e8
0xffffffff8055f2e8 is in tcp_read_sock (net/ipv4/tcp.c:1173).
1168 static inline struct sk_buff *tcp_recv_skb(struct sock *sk,
u32 seq, u32 *off)
1169 {
1170 struct sk_buff *skb;
1171 u32 offset;
1172
1173 skb_queue_walk(&sk->sk_receive_queue, skb) {
1174 offset = seq - TCP_SKB_CB(skb)->seq;
1175 if (tcp_hdr(skb)->syn)
1176 offset--;
1177 if (offset < skb->len || tcp_hdr(skb)->fin) {

(gdb) list *0xffffffff805b2b41
0xffffffff805b2b41 is in xs_tcp_data_ready (net/sunrpc/xprtsock.c:1079).
1074 goto out;
1075
1076 /* We use rd_desc to pass struct xprt to xs_tcp_data_recv */
1077 rd_desc.arg.data = xprt;
1078 rd_desc.count = 65536;
1079 tcp_read_sock(sk, &rd_desc, xs_tcp_data_recv);
1080 out:
1081 read_unlock(&sk->sk_callback_lock);
1082 }
1083

I will see what vanilla -rc6 will do...

Torsten
Torsten Kaiser
2008-01-02 18:40:42 UTC
Permalink
Post by Herbert Xu
In any case, I suspect the cause of your problem is that somebody
somewhere is doing a double-free on an skb.
Since you're the only person who can reproduce this, we really need
your help to track this down. Since bisecting the mm tree is not
practical, you could start by checking whether the bug is in mm only
or whether it affects rc6 too.
Vanilla 2.6.24-rc6 seems stable. I did not see any crash or warnings.

Torsten
J. Bruce Fields
2008-01-02 22:00:22 UTC
Permalink
Post by Torsten Kaiser
Vanilla 2.6.24-rc6 seems stable. I did not see any crash or warnings.
OK that's great. The next step would be to try excluding specific git
trees from mm to see if they make a difference.
The two specific trees of interest would be git-nfsd and git-net.
Also, if it's git-nfsd, it'd be useful to test with the current git-nfsd
from the for-mm branch at:

git://linux-nfs.org/~bfields/linus.git for-mm

and then any bisection results (even partial) from that tree would help
immensely....

--b.
Herbert Xu
2008-01-02 22:00:23 UTC
Permalink
Post by Torsten Kaiser
Vanilla 2.6.24-rc6 seems stable. I did not see any crash or warnings.
OK that's great. The next step would be to try excluding specific git
trees from mm to see if they make a difference.

The two specific trees of interest would be git-nfsd and git-net.

Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <***@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
J. Bruce Fields
2007-12-30 21:30:19 UTC
Permalink
Post by Andrew Morton
Post by Torsten Kaiser
Post by Torsten Kaiser
Post by Andrew Morton
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc6/2.6.24-rc6-mm1/
I have finally given up on using 2.6.24-rc3-mm2 with slub_debug=FZP to
get more information out of the random crashes I had seen with that
version. (Did not crash once with slub_debug, so no new information on
what the cause was)
Murphy: Just after sending that mail the system crashed two times with
slub_debug=FZP, but did not show any new information.
No debug output from slub, only this stacktrace: (It's the same one I
already reported in the 2.6.24-rc3-mm2 thread)
[ 7620.673012] ------------[ cut here ]------------
[ 7620.676291] kernel BUG at lib/list_debug.c:33!
[ 7620.679440] invalid opcode: 0000 [1] SMP
/sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
[ 7620.687845] CPU 0
[ 7620.689300] Modules linked in: radeon drm nfsd exportfs w83792d
ipv6 tuner tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx
tea5761 tvaudio msp3400 bttv ir_common compat_ioctl32 videobuf_dma_sg
videobuf_core btcx_risc tveeprom videodev usbhid v4l2_common
v4l1_compat hid i2c_nforce2 sg pata_amd
[ 7620.708561] Pid: 5698, comm: nfsv4-svc Not tainted 2.6.24-rc3-mm2 #2
[ 7620.713080] RIP: 0010:[<ffffffff803bae54>] [<ffffffff803bae54>]
__list_add+0x54/0x60
[ 7620.718667] RSP: 0018:ffff81011bca1dc0 EFLAGS: 00010282
[ 7620.722439] RAX: 0000000000000088 RBX: ffff81011c862c48 RCX: 0000000000000002
[ 7620.727504] RDX: ffff81011bc82ef0 RSI: 0000000000000001 RDI: ffffffff807590c0
[ 7620.732581] RBP: ffff81011bca1dc0 R08: 0000000000000001 R09: 0000000000000000
[ 7620.737658] R10: ffff810080058d48 R11: 0000000000000001 R12: ffff81011ed8d1c8
[ 7620.742711] R13: ffff81011ed8d200 R14: ffff81011ed8d200 R15: ffff81011cc0e578
[ 7620.747806] FS: 00007ffe400116f0(0000) GS:ffffffff807d4000(0000)
knlGS:00000000f73558e0
[ 7620.753535] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 7620.757607] CR2: 00000000017071dc CR3: 00000001188b5000 CR4: 00000000000006e0
[ 7620.762677] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7620.767748] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 7620.772808] Process nfsv4-svc (pid: 5698, threadinfo
FFFF81011BCA0000, task FFFF81011BC82EF0)
[ 7620.778872] Stack: ffff81011bca1e00 ffffffff805be26e
ffff81011ed8d1d0 ffff81011cc0e578
[ 7620.784626] ffff81011c862c48 ffff81011c8be000 ffff810054a8b060
ffff81011cc0e588
[ 7620.789913] ffff81011bca1e10 ffffffff805be367 ffff81011bca1ee0
ffffffff805bf0ac
[ 7620.796941] [<ffffffff805be26e>] svc_xprt_enqueue+0x1ae/0x250
[ 7620.801087] [<ffffffff805be367>] svc_xprt_received+0x17/0x20
[ 7620.805199] [<ffffffff805bf0ac>] svc_recv+0x39c/0x840
[ 7620.808851] [<ffffffff805bea3f>] svc_send+0xaf/0xd0
[ 7620.812374] [<ffffffff8022f590>] default_wake_function+0x0/0x10
[ 7620.816637] [<ffffffff803163ea>] nfs_callback_svc+0x7a/0x130
[ 7620.820712] [<ffffffff805cfea2>] trace_hardirqs_on_thunk+0x35/0x3a
[ 7620.825174] [<ffffffff80259f8f>] trace_hardirqs_on+0xbf/0x160
[ 7620.829335] [<ffffffff8020cbc8>] child_rip+0xa/0x12
[ 7620.832842] [<ffffffff8020c2df>] restore_args+0x0/0x30
[ 7620.836554] [<ffffffff80316370>] nfs_callback_svc+0x0/0x130
[ 7620.840564] [<ffffffff8020cbbe>] child_rip+0x0/0x12
[ 7620.844102]
[ 7620.845168] INFO: lockdep is turned off.
[ 7620.847964]
[ 7620.847965] Code: 0f 0b eb fe 0f 1f 84 00 00 00 00 00 55 48 8b 16
48 89 e5 e8
[ 7620.854334] RIP [<ffffffff803bae54>] __list_add+0x54/0x60
[ 7620.858255] RSP <ffff81011bca1dc0>
[ 7620.860724] Kernel panic - not syncing: Aiee, killing interrupt handler!
That looks like a sunrpc bug. git-nfsd has been mucking around in there a
bit.
Can you still reproduce this? Tom thought there was a chance the
following could fix it.

--b.

From: Tom Tucker <***@opengridcomputing.com>
Date: Sun, 30 Dec 2007 10:07:17 -0600

Bruce/Aime:

Here is what I believe to be the fix for the crashes/svc_xprt BUG_ON
that people are seeing. It would be great if those who have seen this
problem could apply this patch and see if it resolves their problem.

The common code calls svc_xprt_received on behalf of the transport.
Since the provider was calling it as well, this resulted in clearing the
busy bit/resetting xpt_pool when the BUSY bit wasn't held.
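
The effect described above can be sketched with a toy model. This is not
the sunrpc code: struct toy_xprt, toy_enqueue() and toy_received() are
hypothetical stand-ins, and the real functions do considerably more. The
sketch only assumes that "enqueue" refuses a transport whose BUSY bit is
set, and that "received" clears BUSY and then tries to enqueue.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for a transport. */
struct toy_xprt {
    bool busy;          /* models the BUSY bit */
    int  times_queued;  /* how often it was put on the ready list */
};

/* Queue the transport unless somebody already owns it (BUSY set). */
static void toy_enqueue(struct toy_xprt *x)
{
    if (x->busy)
        return;
    x->busy = true;
    x->times_queued++;
}

/* The owner is done: drop BUSY and make the transport eligible again. */
static void toy_received(struct toy_xprt *x)
{
    x->busy = false;
    toy_enqueue(x);
}
```

Starting from a transport with busy set, one toy_received() call queues
it once. A second call from a caller that does not hold BUSY clears the
bit while the transport is still queued, so it gets queued a second
time. In the kernel, linking an entry that is already on a list corrupts
that list, which list debugging may report later.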

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 4628881..4d39db1 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1272,7 +1272,6 @@ static struct svc_xprt *svc_create_socket(struct svc_serv *serv,
 
 	if ((svsk = svc_setup_socket(serv, sock, &error, flags)) != NULL) {
 		svc_xprt_set_local(&svsk->sk_xprt, newsin, newlen);
-		svc_xprt_received(&svsk->sk_xprt);
 		return (struct svc_xprt *)svsk;
 	}
 

-
Torsten Kaiser
2007-12-30 21:40:06 UTC
Permalink
Post by J. Bruce Fields
Post by Andrew Morton
Post by Torsten Kaiser
Post by Torsten Kaiser
Post by Andrew Morton
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc6/2.6.24-rc6-mm1/
I have finally given up on using 2.6.24-rc3-mm2 with slub_debug=FZP to
get more information out of the random crashes I had seen with that
version. (Did not crash once with slub_debug, so no new information on
what the cause was)
Murphy: Just after sending that mail the system crashed two times with
slub_debug=FZP, but did not show any new information.
No debug output from slub, only this stacktrace: (It's the same one I
already reported in the 2.6.24-rc3-mm2 thread)
[snip]
Post by J. Bruce Fields
Post by Andrew Morton
Post by Torsten Kaiser
[ 7620.708561] Pid: 5698, comm: nfsv4-svc Not tainted 2.6.24-rc3-mm2 #2
[snip]
Post by J. Bruce Fields
Post by Andrew Morton
That looks like a sunrpc bug. git-nfsd has been mucking around in there a
bit.
Can you still reproduce this? Tom thought there was a chance the
following could fix it.
Please see also http://lkml.org/lkml/2007/12/29/76

Just wanted to say that slub_debug did not help to get more information.

I will try to reproduce this with rc3-mm2 and the patch below tomorrow.
Without slub_debug this seemed to trigger rather reliably when trying
to update/upgrade packages on my system.
Post by J. Bruce Fields
Date: Sun, 30 Dec 2007 10:07:17 -0600
Here is what I believe to be the fix for the crashes/svc_xprt BUG_ON
that people are seeing. It would be great if those who have seen this
problem could apply this patch and see if it resolves their problem.
The common code calls svc_xprt_received on behalf of the transport.
Since the provider was calling it as well, this resulted in clearing the
busy bit/resetting xpt_pool when the BUSY bit wasn't held.
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 4628881..4d39db1 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1272,7 +1272,6 @@ static struct svc_xprt *svc_create_socket(struct svc_serv *serv,
if ((svsk = svc_setup_socket(serv, sock, &error, flags)) != NULL) {
svc_xprt_set_local(&svsk->sk_xprt, newsin, newlen);
- svc_xprt_received(&svsk->sk_xprt);
return (struct svc_xprt *)svsk;
}
I will send a mail, when I'm done with testing this...

Thanks for the patch.

Torsten
Torsten Kaiser
2007-12-31 13:20:12 UTC
Permalink
Post by Torsten Kaiser
Post by J. Bruce Fields
Date: Sun, 30 Dec 2007 10:07:17 -0600
Here is what I believe to be the fix for the crashes/svc_xprt BUG_ON
that people are seeing. It would be great if those who have seen this
problem could apply this patch and see if it resolves their problem.
The common code calls svc_xprt_received on behalf of the transport.
Since the provider was calling it as well, this resulted in clearing the
busy bit/resetting xpt_pool when the BUSY bit wasn't held.
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 4628881..4d39db1 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1272,7 +1272,6 @@ static struct svc_xprt *svc_create_socket(struct svc_serv *serv,
if ((svsk = svc_setup_socket(serv, sock, &error, flags)) != NULL) {
svc_xprt_set_local(&svsk->sk_xprt, newsin, newlen);
- svc_xprt_received(&svsk->sk_xprt);
return (struct svc_xprt *)svsk;
}
I will send a mail, when I'm done with testing this...
Removing this line from 2.6.24-rc3-mm2 does not solve my crash.
FYI, this is the part of svc_create_socket() in net/sunrpc/svcsock.c
where I removed it:
	if (protocol == IPPROTO_TCP) {
		if ((error = kernel_listen(sock, 64)) < 0)
			goto bummer;
	}

	if ((svsk = svc_setup_socket(serv, sock, &error, flags)) != NULL) {
		memcpy(&svsk->sk_xprt.xpt_local, newsin, newlen);
		//svc_xprt_received(&svsk->sk_xprt);
		return (struct svc_xprt *)svsk;
	}

bummer:
	dprintk("svc: svc_create_socket error = %d\n", -error);


The crash itself:
[11166.565362] ------------[ cut here ]------------
[11166.568595] kernel BUG at lib/list_debug.c:33!
[11166.571696] invalid opcode: 0000 [1] SMP
[11166.574527] last sysfs file:
/sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
[11166.580017] CPU 3
[11166.581442] Modules linked in: radeon drm nfsd exportfs w83792d
ipv6 tuner tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx
tea5761 tvaudio msp3400 bttv ir_common compat_ioctl32 videobuf_dma_sg
videobuf_core btcx_risc tveeprom videodev usbhid v4l2_common hid
v4l1_compat sg pata_amd i2c_nforce2
[11166.600470] Pid: 5548, comm: nfsv4-svc Not tainted 2.6.24-rc3-mm2 #3
[11166.604912] RIP: 0010:[<ffffffff803bae54>] [<ffffffff803bae54>]
__list_add+0x54/0x60
[11166.610408] RSP: 0000:ffff81007d83fdc0 EFLAGS: 00010282
[11166.614144] RAX: 0000000000000088 RBX: ffff81007f2e0400 RCX: 0000000000000002
[11166.619113] RDX: ffff81007dc6eed0 RSI: 0000000000000001 RDI: ffffffff807590c0
[11166.624130] RBP: ffff81007d83fdc0 R08: 0000000000000001 R09: 0000000000000000
[11166.629124] R10: ffff810080058d48 R11: 0000000000000001 R12: ffff81007e444680
[11166.634129] R13: ffff81007e4446b8 R14: ffff81007e4446b8 R15: ffff81011ff50100
[11166.639128] FS: 00007fb815abc6f0(0000) GS:ffff81011ff13280(0000)
knlGS:0000000000000000
[11166.644786] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[11166.648809] CR2: 0000000000441770 CR3: 0000000000201000 CR4: 00000000000006e0
[11166.653796] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[11166.658784] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[11166.663783] Process nfsv4-svc (pid: 5548, threadinfo
FFFF81007D83E000, task FFFF81007DC6EED0)
[11166.669776] Stack: ffff81007d83fe00 ffffffff805be25e
ffff81007e444688 ffff81011ff50100
[11166.675428] ffff81007f2e0400 ffff81007dd62000 ffff81010a138000
ffff81011ff50110
[11166.680660] ffff81007d83fe10 ffffffff805be357 ffff81007d83fee0
ffffffff805bf09c
[11166.685744] Call Trace:
[11166.687592] [<ffffffff805be25e>] svc_xprt_enqueue+0x1ae/0x250
[11166.691672] [<ffffffff805be357>] svc_xprt_received+0x17/0x20
[11166.695700] [<ffffffff805bf09c>] svc_recv+0x39c/0x840
[11166.699299] [<ffffffff805bea2f>] svc_send+0xaf/0xd0
[11166.702755] [<ffffffff8022f590>] default_wake_function+0x0/0x10
[11166.706983] [<ffffffff803163ea>] nfs_callback_svc+0x7a/0x130
[11166.710992] [<ffffffff805cfe92>] trace_hardirqs_on_thunk+0x35/0x3a
[11166.715377] [<ffffffff80259f8f>] trace_hardirqs_on+0xbf/0x160
[11166.719454] [<ffffffff8020cbc8>] child_rip+0xa/0x12
[11166.722919] [<ffffffff8020c2df>] restore_args+0x0/0x30
[11166.726578] [<ffffffff80316370>] nfs_callback_svc+0x0/0x130
[11166.730540] [<ffffffff8020cbbe>] child_rip+0x0/0x12
[11166.734024]
[11166.735072] INFO: lockdep is turned off.
[11166.737843]
[11166.737844] Code: 0f 0b eb fe 0f 1f 84 00 00 00 00 00 55 48 8b 16 48 89 e5 e8
[11166.744160] RIP [<ffffffff803bae54>] __list_add+0x54/0x60
[11166.748015] RSP <ffff81007d83fdc0>
[11166.750464] Kernel panic - not syncing: Aiee, killing interrupt handler!
-> then the system hung, no "---[ end trace xyz ]---"-output

Will it make a difference if I try it in -rc6-mm1?

Torsten
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Herbert Xu
2007-12-24 02:00:30 UTC
Permalink
No, the problem is that include/crypto/scatterwalk.h doesn't include enough
header files to support its inlining fetish. It needs sched.h.
I'll get it fixed in cryptodev.
Ingo, it's not good that we have cond_resched() definitions conditionally
duplicated in kernel.h - that's increasing the risk of bugs like this one.
Actually, why do we even have cond_resched when real preemption
is on? It seems to be a waste of space and time.

Any objections to something like this to remove cond_resched with
CONFIG_PREEMPT on (apart from the potential to uncover more bugs
like this one)?

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <***@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 94bc996..a7283c9 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -105,8 +105,8 @@ struct user;
  * supposed to.
  */
 #ifdef CONFIG_PREEMPT_VOLUNTARY
-extern int cond_resched(void);
-# define might_resched() cond_resched()
+extern int _cond_resched(void);
+# define might_resched() _cond_resched()
 #else
 # define might_resched() do { } while (0)
 #endif
diff --git a/include/linux/sched.h b/include/linux/sched.h
index ac3d496..ae8e9bd 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1863,7 +1863,18 @@ static inline int need_resched(void)
  * cond_resched_lock() will drop the spinlock before scheduling,
  * cond_resched_softirq() will enable bhs before scheduling.
  */
-extern int cond_resched(void);
+#ifdef CONFIG_PREEMPT
+static inline int cond_resched(void)
+{
+	return 0;
+}
+#else
+extern int _cond_resched(void);
+static inline int cond_resched(void)
+{
+	return _cond_resched();
+}
+#endif
 extern int cond_resched_lock(spinlock_t * lock);
 extern int cond_resched_softirq(void);

diff --git a/kernel/sched.c b/kernel/sched.c
index 3df84ea..2dc2bbf 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4683,7 +4683,8 @@ static void __cond_resched(void)
 	} while (need_resched());
 }

-int __sched cond_resched(void)
+#if !defined(CONFIG_PREEMPT) || defined(CONFIG_PREEMPT_VOLUNTARY)
+int __sched _cond_resched(void)
 {
 	if (need_resched() && !(preempt_count() & PREEMPT_ACTIVE) &&
 					system_state == SYSTEM_RUNNING) {
@@ -4692,7 +4693,8 @@ int __sched cond_resched(void)
 	}
 	return 0;
 }
-EXPORT_SYMBOL(cond_resched);
+EXPORT_SYMBOL(_cond_resched);
+#endif

 /*
  * cond_resched_lock() - if a reschedule is pending, drop the given lock,
Ingo Molnar
2007-12-30 13:20:10 UTC
Permalink
Ingo, it's not good that we have cond_resched() definitions
conditionally duplicated in kernel.h - that's increasing the risk of
bugs like this one.
Actually, why do we even have cond_resched when real preemption is on?
It seems to be a waste of space and time.
due to the BKL. cond_resched() in BKL code breaks up BKL latencies.

i dont mind not doing that though - we should increase the pain for BKL
users, so that subsystems finally get rid of it altogether.
lock_kernel() use within the kernel is still rampant - there are still
more than 400 callsites to lock_kernel().

Ingo
Nick Piggin
2008-01-02 10:40:14 UTC
Permalink
Post by Ingo Molnar
Ingo, it's not good that we have cond_resched() definitions
conditionally duplicated in kernel.h - that's increasing the risk of
bugs like this one.
Actually, why do we even have cond_resched when real preemption is on?
It seems to be a waste of space and time.
due to the BKL. cond_resched() in BKL code breaks up BKL latencies.
i dont mind not doing that though - we should increase the pain for BKL
users, so that subsystems finally get rid of it altogether.
lock_kernel() use within the kernel is still rampant - there are still
more than 400 callsites to lock_kernel().
It would be silly to potentially increase latency in some areas
for CONFIG_PREEMPT kernels, though.

Better may be to detect when there is CONFIG_PREEMPT and
CONFIG_PREEMPT_BKL, and ifdef away the cond_resched in that case
(or -- why do we even make CONFIG_PREEMPT_BKL an option? Are there
really workloads left where it causes throughput regressions?)
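
For anyone following along, the "ifdef away" idea can be tried out in userspace. This is only a rough sketch under stated assumptions, not code from any tree: SKETCH_PREEMPT and SKETCH_PREEMPT_BKL are made-up stand-ins for CONFIG_PREEMPT and CONFIG_PREEMPT_BKL, and the sketch_* helpers are hypothetical stand-ins for need_resched() and cond_resched().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-ins for CONFIG_PREEMPT / CONFIG_PREEMPT_BKL.
 * Build with -DSKETCH_PREEMPT -DSKETCH_PREEMPT_BKL to get the no-op.
 */

/* Stand-in for the need_resched() flag. */
static bool sketch_resched_pending;

void sketch_set_need_resched(void)
{
	sketch_resched_pending = true;
}

#if defined(SKETCH_PREEMPT) && defined(SKETCH_PREEMPT_BKL)
/* Everything is preemptible, BKL sections included: nothing to do. */
static inline int sketch_cond_resched(void)
{
	return 0;
}
#else
/* Out-of-line slow path: "reschedule" only if one is pending. */
static int sketch_cond_resched_slow(void)
{
	if (!sketch_resched_pending)
		return 0;
	sketch_resched_pending = false;	/* pretend we called schedule() */
	return 1;
}

static inline int sketch_cond_resched(void)
{
	return sketch_cond_resched_slow();
}
#endif
```

With neither macro defined, a pending reschedule is consumed by the first call and later calls return 0. With both defined, the call compiles away, which is the case where it would only cost space and time.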
Peter Zijlstra
2008-01-02 11:10:20 UTC
Permalink
Post by Nick Piggin
Post by Ingo Molnar
Ingo, it's not good that we have cond_resched() definitions
conditionally duplicated in kernel.h - that's increasing the risk of
bugs like this one.
Actually, why do we even have cond_resched when real preemption is on?
It seems to be a waste of space and time.
due to the BKL. cond_resched() in BKL code breaks up BKL latencies.
i dont mind not doing that though - we should increase the pain for BKL
users, so that subsystems finally get rid of it altogether.
lock_kernel() use within the kernel is still rampant - there are still
more than 400 callsites to lock_kernel().
It would be silly to potentially increase latency in some areas
for CONFIG_PREEMPT kernels, though.
Better may be to detect when there is CONFIG_PREEMPT and
CONFIG_PREEMPT_BKL, and ifdef away the cond_resched in that case
(or -- why do we even make CONFIG_PREEMPT_BKL an option? Are there
really workloads left where it causes throughput regressions?)
I've seen 1s+ desktop latencies due to PREEMPT_BKL when I was still
using reiserfs.

Both reiserfs and tty were fighting for the bkl and massive prio
inversion ensued. Turning PREEMPT_BKL off made the system usable again.

Nick Piggin
2008-01-02 11:20:18 UTC
Permalink
Post by Peter Zijlstra
Post by Nick Piggin
Post by Ingo Molnar
Ingo, it's not good that we have cond_resched() definitions
conditionally duplicated in kernel.h - that's increasing the risk
of bugs like this one.
Actually, why do we even have cond_resched when real preemption is
on? It seems to be a waste of space and time.
due to the BKL. cond_resched() in BKL code breaks up BKL latencies.
i dont mind not doing that though - we should increase the pain for BKL
users, so that subsystems finally get rid of it altogether.
lock_kernel() use within the kernel is still rampant - there are still
more than 400 callsites to lock_kernel().
It would be silly to potentially increase latency in some areas
for CONFIG_PREEMPT kernels, though.
Better may be to detect when there is CONFIG_PREEMPT and
CONFIG_PREEMPT_BKL, and ifdef away the cond_resched in that case
(or -- why do we even make CONFIG_PREEMPT_BKL an option? Are there
really workloads left where it causes throughput regressions?)
I've seen 1s+ desktop latencies due to PREEMPT_BKL when I was still
using reiserfs.
Fair enough; so the former ifdefery would be preferable for now then.
Post by Peter Zijlstra
Both reiserfs and tty were fighting for the bkl and massive prio
inversion ensued. Turning PREEMPT_BKL off made the system usable again.
Are either of those subsystems actually using the BKL to protect against
anything else (than themselves)? It would be sweet to have them use
private mutexes for the job instead (although even then it probably
wouldn't be a straight conversion)...
Peter Zijlstra
2008-01-02 11:30:20 UTC
Permalink
Post by Nick Piggin
Post by Peter Zijlstra
I've seen 1s+ desktop latencies due to PREEMPT_BKL when I was still
using reiserfs.
Fair enough; so the former ifdefery would be preferable for now then.
To be honest, I must mention that the load that did that was a kernel
build -j5 on a dual socket Athlon MP box. With a current kernel and XFS
that load is making the box slow but it's still very serviceable.
Post by Nick Piggin
Post by Peter Zijlstra
Both reiserfs and tty were fighting for the bkl and massive prio
inversion ensued. Turning PREEMPT_BKL off made the system usable again.
Are either of those subsystems actually using the BKL to protect against
anything else (than themselves)?
I doubt it.

IIRC Alan is working on getting tty BKL free.
Post by Nick Piggin
It would be sweet to have them use
private mutexes for the job instead (although even then it probably
wouldn't be a straight conversion)...
I tried a quick conversion of reiser3 at the time, but it really wants a
recursive lock and I couldn't be bothered to fix a 'legacy' filesystem
so I just gave up and converted the filesystem to XFS.

Ingo Molnar
2008-01-02 12:30:22 UTC
Permalink
Post by Peter Zijlstra
It would be sweet to have them use private mutexes for the job
instead (although even then it probably wouldn't be a straight
conversion)...
I tried a quick conversion of reiser3 at the time, but it really wants
a recursive lock and I couldn't be bothered to fix a 'legacy'
filesystem so I just gave up and converted the filesystem to XFS.
as long as the only requirement is recursion, and not any of the other
BKL properties, that could be wrapped. I guess fixing the TTY code to
have no BKL dependencies has a higher chance of success - given that
Alan is working on it :-)

Ingo
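
A minimal userspace sketch of what "wrapping" the recursion could look like, assuming recursion really is the only property needed. It uses pthreads and a per-thread depth counter, much like the BKL's per-task lock depth. All sketch_* names are hypothetical. It deliberately does not model the other BKL properties, such as the lock being dropped across schedule().

```c
#include <assert.h>
#include <pthread.h>

/* One global lock plus a per-thread nesting count (hypothetical names). */
static pthread_mutex_t sketch_big_lock = PTHREAD_MUTEX_INITIALIZER;
static _Thread_local int sketch_lock_depth;

/* Take the real mutex only on the outermost acquisition. */
void sketch_lock(void)
{
	if (sketch_lock_depth++ == 0)
		pthread_mutex_lock(&sketch_big_lock);
}

/* Release the real mutex only when the outermost holder unlocks. */
void sketch_unlock(void)
{
	assert(sketch_lock_depth > 0);
	if (--sketch_lock_depth == 0)
		pthread_mutex_unlock(&sketch_big_lock);
}

/* Current nesting depth of the calling thread. */
int sketch_lock_depth_now(void)
{
	return sketch_lock_depth;
}
```

Because the depth counter is thread-local, a nested sketch_lock() never touches the mutex, so a thread cannot deadlock against itself. Other threads still block on the outermost acquisition.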
Alan Cox
2008-01-02 13:40:23 UTC
Permalink
Post by Ingo Molnar
BKL properties, that could be wrapped. I guess fixing the TTY code to
have no BKL dependencies has a higher chance of success - given that
Alan is working on it :-)
Bit by bit when I can face it, and with a lot of other people
contributing parts. Right now the BKL mostly protects the open/close
paths and those are *really* ugly
Ingo Molnar
2008-01-02 16:20:12 UTC
Permalink
Post by Alan Cox
Post by Ingo Molnar
BKL properties, that could be wrapped. I guess fixing the TTY code
to have no BKL dependencies has a higher chance of success - given
that Alan is working on it :-)
Bit by bit when I can face it, and with a lot of other people
contributing parts. Right now the BKL mostly protects the open/close
paths and those are *really* ugly
could we perhaps just replace it with a tty_mutex? (possibly a recursive
one) I suspect by now most of the BKL dependencies there have become
local to the tty code? Or are there deep VFS dependencies as well? (if
yes, what type of dependencies?)

Ingo
Alan Cox
2008-01-02 23:00:20 UTC
Permalink
Post by Ingo Molnar
could we perhaps just replace it with a tty_mutex? (possibly a recursive
one) I suspect by now most of the BKL dependencies there have become
local to the tty code? Or are there deep VFS dependencies as well? (if
yes, what type of dependencies?)
The big problem is that nobody actually knows where all the dependencies
are. That is why I've started with the bits we can decipher so that it
simplifies the mess each time we clean up the locking of some lower level
aspect.

Almost all the serial drivers clone the same open and release methods (or
worse, older versions of them) so that also needs doing. Lots to do, so
little time.

Ingo Molnar
2008-01-02 11:10:30 UTC
Permalink
(or -- why do we even make CONFIG_PREEMPT_BKL an option? [...]
thanks for the reminder - i just zapped it. Was a pleasure ;-)

Ingo
Andrew Morton
2007-12-24 02:00:36 UTC
Permalink
Post by Andrew Morton
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc6/2.6.24-rc6-mm1/
- This kernel doesn't work on i386!
It oopses late in boot due to an unrevertable change (e3c1b141) in git-x86
which I stared at for a while then I ran out of time and gave up.
I would have just abandoned this release until it was fixed but I'll be
largely offline for ten days starting tomorrow.
The culprits have been notified and hopefully we'll have a patch for
hot-fixes/ tomorrow.
x86_64 and powerpc work OK though.
- git-block is dropped due to more conflicts that I'm prepared to repair
with git-scsi-misc
- git-perfmon is dropped due to conflicts with git-x86
- git-kgdb is dropped due to conflicts with git-x86
- git-newsetup is dropped due to conflicts with git-x86
- Andi's x86 quilt tree is dropped due to conflicts with git-x86
- Someone broke suspend-to-RAM on the t61p again. It just instantly resumes
itself.
Suspend is also broken on my HP nx6325 (hangs hard in the last phase of
suspend) and git-cpufreq.patch is responsible for that (as shown by bisection).
Reverting git-cpufreq.patch makes suspend work again,
ah. Thanks.
although it still is a
bit flaky (it takes well more than 5 seconds to suspend and the sound adapter
doesn't work right after the resume, but it starts to work again about 10s
later).
hm. There have been some suspend changes in the alsa tree.

And yes, I noticed that suspend has become slower too - looks like about ten
seconds, which is a pretty significant usability irritant.

Dave Jones
2007-12-24 02:00:37 UTC
Permalink
Post by Andrew Morton
Post by Andrew Morton
- Someone broke suspend-to-RAM on the t61p again. It just instantly resumes
itself.
Suspend is also broken on my HP nx6325 (hangs hard in the last phase of
suspend) and git-cpufreq.patch is responsible for that (as shown by bisection).
Reverting git-cpufreq.patch makes suspend work again,
ah. Thanks.
I'm not sure how this is 'new' breakage, because git-cpufreq hasn't changed
in a while, other than the integration of that missing #include diff
that sat in -mm. Maybe some bad interaction with something else that
changed perhaps. *shrug*.

I'm on vacation until the new year, so I'm going out of my way not to look
at bugs for a change. But I'm not ignoring this completely, I'll make a
note to look at it in January.

Dave
--
http://www.codemonkey.org.uk
Takashi Iwai
2007-12-24 13:50:11 UTC
Permalink
At Sun, 23 Dec 2007 14:50:03 -0800,
Post by Andrew Morton
although it still is a
bit flaky (it takes well more than 5 seconds to suspend and the sound adapter
doesn't work right after the resume, but it starts to work again about 10s
later).
hm. There have been some suspend changes in the alsa tree.
Not really. The usb-audio suspend support is the only addition on
mm. It should be irrelevant with on-board HD-audio...


Takashi
Rafael J. Wysocki
2007-12-24 14:00:07 UTC
Permalink
Post by Takashi Iwai
At Sun, 23 Dec 2007 14:50:03 -0800,
Post by Andrew Morton
although it still is a
bit flaky (it takes well more than 5 seconds to suspend and the sound adapter
doesn't work right after the resume, but it starts to work again about 10s
later).
hm. There have been some suspend changes in the alsa tree.
Not really. The usb-audio suspend support is the only addition on
mm. It should be irrelevant with on-board HD-audio...
Well, I'm suspecting some ACPI changes, but will be only able to debug it
further in a couple of days.

Rafael
Rafael J. Wysocki
2007-12-24 02:00:37 UTC
Permalink
Post by Andrew Morton
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc6/2.6.24-rc6-mm1/
- This kernel doesn't work on i386!
It oopses late in boot due to an unrevertable change (e3c1b141) in git-x86
which I stared at for a while then I ran out of time and gave up.
I would have just abandoned this release until it was fixed but I'll be
largely offline for ten days starting tomorrow.
The culprits have been notified and hopefully we'll have a patch for
hot-fixes/ tomorrow.
x86_64 and powerpc work OK though.
- git-block is dropped due to more conflicts that I'm prepared to repair
with git-scsi-misc
- git-perfmon is dropped due to conflicts with git-x86
- git-kgdb is dropped due to conflicts with git-x86
- git-newsetup is dropped due to conflicts with git-x86
- Andi's x86 quilt tree is dropped due to conflicts with git-x86
- Someone broke suspend-to-RAM on the t61p again. It just instantly resumes
itself.
Suspend is also broken on my HP nx6325 (hangs hard in the last phase of
suspend) and git-cpufreq.patch is responsible for that (as shown by bisection).

Reverting git-cpufreq.patch makes suspend work again, although it still is a
bit flaky (it takes well more than 5 seconds to suspend and the sound adapter
doesn't work right after the resume, but it starts to work again about 10s
later).

Thanks,
Rafael
Andreas Mohr
2007-12-25 22:00:18 UTC
Permalink
Hi,

another one most likely related to the recent NFS_V4 define build error
saga:

CC fs/nfs/super.o
fs/nfs/super.c: In function 'nfs_sb_deactive':
fs/nfs/super.c:338: error: 'TASK_NORMAL' undeclared (first use in this function)
fs/nfs/super.c:338: error: (Each undeclared identifier is reported only once
fs/nfs/super.c:338: error: for each function it appears in.)
fs/nfs/super.c: In function 'nfs_put_super':
fs/nfs/super.c:349: error: 'TASK_UNINTERRUPTIBLE' undeclared (first use in this function)
fs/nfs/super.c:349: error: implicit declaration of function 'schedule'
make[3]: *** [fs/nfs/super.o] Error 1
make[2]: *** [fs/nfs] Error 2
make[1]: *** [fs] Error 2
make[1]: Leaving directory `/usr/src/linux-2.6.24-rc6-mm1.system-gate-patch'
make: *** [debian/stamp-build-kernel] Error 2


This was hand-patched from earlier kernel versions, however I wouldn't
think there was any problem due to this (a cleanly extracted version
doesn't show any md5sum difference for fs/nfs/super.c).

[plus hotfix x86-fix-system-gate-related-crash.patch]

I'm circa 120% sure there must be a sched.h include missing there, given the
whereabouts of these APIs ;)


CONFIG_NETWORK_FILESYSTEMS=y
CONFIG_NFS_FS=y
CONFIG_NFS_V3=y
# CONFIG_NFS_V3_ACL is not set
CONFIG_NFS_V4=y
# CONFIG_NFS_DIRECTIO is not set
CONFIG_NFSD=y
CONFIG_NFSD_V3=y
# CONFIG_NFSD_V3_ACL is not set
CONFIG_NFSD_V4=y
CONFIG_NFSD_TCP=y
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
CONFIG_EXPORTFS=y
CONFIG_NFS_COMMON=y


i386 K6-***@150, gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)

Thanks,

Andreas Mohr
V***@vt.edu
2007-12-26 05:50:10 UTC
Permalink
Post by Andrew Morton
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc6/2.6.24-rc6-mm1/
I've bisected it down this far:

kvm-ist-kaput.patch GOOD
git-lblnet.patch
git-lblnet-fixup.patch
git-leds.patch
git-libata-all.patch
git-libata-all-fix-pata_winbond-borkage.patch
git-libata-all-wtf.patch BAD

and somehow, I doubt the leds or libata trees horked up networking. ;)

Symptoms - semi-sporadic failures in making network connections. The test
case that tripped it up was the 'make test' from the Tcl 8.5 - several of the
test cases will create a listening socket, and then try to connect to it.
Under 2.6.24-rc5-mm1, it works just fine, but I'm seeing hangs under -rc6-mm1.
Doing a 'netstat -n -a -A inet -p' while it's hung shows me this:

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address    Foreign Address  State     PID/Program name
tcp        0      0 127.0.0.1:34118  0.0.0.0:*        LISTEN    2236/tcltest
tcp        0      1 127.0.0.1:59460  127.0.0.1:34118  SYN_SENT  2236/tcltest

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address    Foreign Address  State     PID/Program name
tcp        0      0 127.0.0.1:47842  0.0.0.0:*        LISTEN    2352/tcltest
tcp        0      1 127.0.0.1:46510  127.0.0.1:47842  SYN_SENT  2352/tcltest

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address    Foreign Address  State     PID/Program name
tcp        0      0 127.0.0.1:47842  0.0.0.0:*        LISTEN    2352/tcltest
tcp        0      1 127.0.0.1:46510  127.0.0.1:47842  SYN_SENT  2352/tcltest

Pretty consistent failure mode - a socket is in 'listen', and the connection
gets hung in 'SYN_SENT'. There's 3 outputs listed - the first one from one run
of the test case, the second 2 are some 20 seconds apart on the same run.
It's pretty obvious that if you can't complete a 3-packet handshake to loopback
in 20 seconds, something is hosed. However, it's apparently some sort of
race/timing issue, as many *other* test cases in the Tcl test tree do in fact
work OK.

I already checked, it's not a slam-dunk to just 'patch -R' as there's 3 or 4
conflicts where later patches need massaging/reverting as well.

It's a problem with both 'classic RCU' and 'preempt RCU' (that was my *first*
guess as to the cause).

Any clues/hints/advice/patches?
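
For anyone who wants to poke at this without building Tcl, here is a rough standalone sketch of the listen-then-connect pattern described above. It is not the Tcl test itself and has not been shown to trigger the hang; the function name loopback_handshake() and the timeout handling are my own assumptions. It listens on an ephemeral loopback port, does a non-blocking connect to it, and reports whether the handshake completes within the timeout.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 0 if a loopback connect completes within timeout_ms, else -1. */
int loopback_handshake(int timeout_ms)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);
	int lfd, cfd, flags, rc = -1;

	lfd = socket(AF_INET, SOCK_STREAM, 0);
	if (lfd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;	/* let the kernel pick a free port */

	if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(lfd, 1) < 0 ||
	    getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
		goto out_listen;

	cfd = socket(AF_INET, SOCK_STREAM, 0);
	if (cfd < 0)
		goto out_listen;

	/* Non-blocking connect, so a stuck SYN_SENT shows up as a timeout. */
	flags = fcntl(cfd, F_GETFL, 0);
	if (flags < 0 || fcntl(cfd, F_SETFL, flags | O_NONBLOCK) < 0)
		goto out_client;

	if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) == 0) {
		rc = 0;
	} else if (errno == EINPROGRESS) {
		struct pollfd pfd = { .fd = cfd, .events = POLLOUT };
		int err = 0;
		socklen_t elen = sizeof(err);

		/* Writable with SO_ERROR == 0 means the handshake finished. */
		if (poll(&pfd, 1, timeout_ms) == 1 &&
		    getsockopt(cfd, SOL_SOCKET, SO_ERROR, &err, &elen) == 0 &&
		    err == 0)
			rc = 0;
	}

out_client:
	close(cfd);
out_listen:
	close(lfd);
	return rc;
}
```

On a healthy kernel a call such as loopback_handshake(2000) should return 0 almost immediately. A -1 after the full timeout would correspond to the SYN_SENT state in the netstat output. Since the failure looks timing dependent, it probably needs to be run in a loop.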
James Morris
2007-12-26 07:40:09 UTC
Permalink
Post by V***@vt.edu
I already checked, it's not a slam-dunk to just 'patch -R' as there's 3 or 4
conflicts where later patches need massaging/reverting as well.
It's a problem with both 'classic RCU' and 'preempt RCU' (that was my *first*
guess as to the cause).
Any clues/hints/advice/patches?
Can you post your .config ?

Also, is that the plain upstream Tcl package you're compiling, or a distro
package?
--
James Morris
<***@namei.org>
James Morris
2007-12-26 09:00:21 UTC
Permalink
Post by James Morris
Can you post your .config ?
The gzip'ed config as of when I quit bisecting is attached. It's probably
not directly usable unless you have a quilt tree that's positioned fairly
close to git-lblnet.patch.
What does the following say ?

# sestatus && rpm -q selinux-policy

Do you see anything unusual in the audit log or syslog?

Try

# ausearch -hn 127.0.0.1

and

# ausearch -x tcltest



- James
--
James Morris
<***@namei.org>
James Morris
2007-12-26 14:20:12 UTC
Permalink
Post by James Morris
What does the following say ?
# sestatus && rpm -q selinux-policy
Don't worry about that -- I reproduced it with Paul Moore's git tree:
git://git.infradead.org/users/pcmoore/lblnet-2.6_testing

(under current -mm, the e1000 driver doesn't find my ethernet card & the
tcl tests won't run without an external interface).

The offending commit is when SELinux is converted to the new ifindex
interface:

9c6ad8f6895db7a517c04c2147cb5e7ffb83a315 is first bad commit
commit 9c6ad8f6895db7a517c04c2147cb5e7ffb83a315
Author: Paul Moore <***@hp.com>
Date: Fri Dec 21 11:44:26 2007 -0500

SELinux: Convert the netif code to use ifindex values

[...]

In some cases (not yet fully identified -- also happens when avahi starts
up, although seemingly silently & without obvious issues), SELinux is
passed an ifindex of 1515870810, which corresponds to 0x5a5a5a5a, the slab
poison value, suggesting a race in the calling code where we're being
asked to check an skb which has been freed.

The SELinux code is erroring out before performing an access check
(perhaps there should be WARN_ON, at least), so this will affect both
permissive and enforcing mode without generating any log messages.

Andrew: I suggest dropping the patchset from -mm until Paul gets back from
vacation.


- James
--
James Morris
<***@namei.org>
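
The arithmetic behind the poison observation can be checked with a small userspace sketch. The two byte values are the ones defined in include/linux/poison.h; the helper names are hypothetical and nothing here is SELinux code. Note that 0x5a is POISON_INUSE (allocated but not yet initialised), while freed slab objects are filled with 0x6b (POISON_FREE). The value may therefore point at an uninitialised field rather than a freed skb; either way the ifindex is garbage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Byte values as defined in include/linux/poison.h. */
#define POISON_INUSE 0x5a	/* allocated but not yet initialised */
#define POISON_FREE  0x6b	/* freed */

/* True if every byte of v equals the given poison byte. */
static bool all_bytes_are(uint32_t v, uint8_t poison)
{
	return v == (uint32_t)poison * 0x01010101u;
}

/* Hypothetical sanity check for a suspicious ifindex value. */
bool ifindex_looks_poisoned(uint32_t ifindex)
{
	return all_bytes_are(ifindex, POISON_INUSE) ||
	       all_bytes_are(ifindex, POISON_FREE);
}
```

A check of this kind is only a debugging aid. It would make the silent error path visible, but it does nothing about the underlying bug in the caller.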
Andrew Morton
2007-12-26 22:50:07 UTC
Permalink
Post by James Morris
Post by James Morris
What does the following say ?
# sestatus && rpm -q selinux-policy
git://git.infradead.org/users/pcmoore/lblnet-2.6_testing
(under current -mm, the e1000 driver doesn't find my ethernet card & the
tcl tests won't run without an external interface).
You might need to enable CONFIG_E1000E.
Post by James Morris
The offending commit is when SELinux is converted to the new ifindex
9c6ad8f6895db7a517c04c2147cb5e7ffb83a315 is first bad commit
commit 9c6ad8f6895db7a517c04c2147cb5e7ffb83a315
Date: Fri Dec 21 11:44:26 2007 -0500
SELinux: Convert the netif code to use ifindex values
[...]
In some case (not yet fully identified -- also happens when avahi starts
up, although seemingly silently & without obvious issues), SELinux is
passed an ifindex of 1515870810, which corresponds to 0x5a5a5a5a, the slab
poison value, suggesting a race in the calling code where we're being
asked to check an skb which has been freed.
The SELinux code is erroring out before performing an access check
(perhaps there should be WARN_ON, at least), so this will affect both
permissive and enforcing mode without generating any log messages.
Andrew: I suggest dropping the patchset from -mm until Paul gets back from
vacation.
OK, thanks.
James Morris
2007-12-26 23:50:11 UTC
Permalink
Post by Andrew Morton
Post by James Morris
(under current -mm, the e1000 driver doesn't find my ethernet card & the
tcl tests won't run without an external interface).
You might need to enable CONFIG_E1000E.
Indeed, it works for me.



- James
--
James Morris
<***@namei.org>
V***@vt.edu
2007-12-26 16:50:07 UTC
Permalink
Post by James Morris
Post by James Morris
Can you post your .config ?
The gzip'ed config as of when I quit bisecting is attached. It's probably
not directly usable unless you have a quilt tree that's positioned fairly
close to git-lblnet.patch.
What does the following say ?
# sestatus && rpm -q selinux-policy
I'm running MLS in permissive mode, so there shouldn't be any SElinux
denials happening.
Dave Young
2007-12-26 08:40:07 UTC
Permalink
Mariusz Kozlowski
2007-12-26 12:30:14 UTC
Permalink
Hello,

WARNING: vmlinux.o(.text+0x46b04): Section mismatch: reference to .init.text:sun4v_ktsb_register (between 'smp_callin' and 'smp_fill_in_sib_core_maps')
WARNING: vmlinux.o(.text+0x4756c): Section mismatch: reference to .init.text:sun4v_register_mondo_queues (between 'after_lock_tlb' and 'hv_cpu_startup')
WARNING: vmlinux.o(.text+0x477ac): Section mismatch: reference to .init.text:sun4v_register_mondo_queues (between 'hv_cpu_startup' and 'sys32_exit')
WARNING: vmlinux.o(.text+0x55258): Section mismatch: reference to .init.text:__alloc_bootmem (between 'kernel_map_range' and 'kernel_map_pages')
WARNING: vmlinux.o(.text+0x55278): Section mismatch: reference to .init.text:__alloc_bootmem (between 'kernel_map_range' and 'kernel_map_pages')
WARNING: vmlinux.o(.text+0x1fdfe4): Section mismatch: reference to .init.text:sunserial_console_match (between 'hv_probe' and 'serial_in')
WARNING: vmlinux.o(.text+0x20011c): Section mismatch: reference to .init.text:sunserial_console_match (between 'su_probe' and 'sunsu_console_putchar')
WARNING: vmlinux.o(.sun4v_2insn_patch+0x3d8): Section mismatch: reference to .init.text:
WARNING: vmlinux.o(__ksymtab+0x62c0): Section mismatch: reference to .init.text:sunserial_console_match (between '__ksymtab_sunserial_console_match' and '__ksymtab_sunserial_unregister_minors')

Regards,

Mariusz
David Miller
2007-12-27 03:10:08 UTC
Permalink
From: Mariusz Kozlowski <***@tuxland.pl>
Date: Wed, 26 Dec 2007 13:29:07 +0100
Post by Mariusz Kozlowski
WARNING: vmlinux.o(.text+0x46b04): Section mismatch: reference to .init.text:sun4v_ktsb_register (between 'smp_callin' and 'smp_fill_in_sib_core_maps')
Well known and I see them every build and so does everyone
else on sparc64.

They are harmless and as time allows I try to find ways
to get rid of them but it's very low priority.
Adrian Bunk
2007-12-28 23:30:15 UTC
Permalink
Post by David Miller
Date: Wed, 26 Dec 2007 13:29:07 +0100
Post by Mariusz Kozlowski
WARNING: vmlinux.o(.text+0x46b04): Section mismatch: reference to .init.text:sun4v_ktsb_register (between 'smp_callin' and 'smp_fill_in_sib_core_maps')
Well known and I see them every build and so does everyone
else on sparc64.
They are harmless and as time allows I try to find ways
to get rid of them but it's very low priority.
At least the sunserial_console_match() one is an obvious Oops
(EXPORT_SYMBOL of an __init function).

The comment in the description of
commit 58d784a5c754cd66ecd4791222162504d3c16c74 that the warning was bogus
is bullshit.

I'm not sure whether this might count as a 2.6.24-rc regression or
whether 2.6.23 is simply differently but similarly broken (does anyone
actually use the Sun console drivers modular?).

cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

David Miller
2007-12-29 08:20:07 UTC
Permalink
From: Adrian Bunk <***@kernel.org>
Date: Sat, 29 Dec 2007 01:22:56 +0200
Post by Adrian Bunk
At least the sunserial_console_match() one is an obvious Oops
(EXPORT_SYMBOL of an __init function).
The comment in the description of
commit 58d784a5c754cd66ecd4791222162504d3c16c74 the warning was bogus
is bullshit.
I'm not sure whether this might count as a 2.6.24-rc regression or
whether 2.6.23 is simply differently but similarly broken (does anyone
actually use the Sun console drivers modular?).
You can't do that, the FOO_CONSOLE config options depend upon
FOO=y.

That's why I'm not worried about this issue and it's not critical at
all.
David Miller
2007-12-29 08:30:14 UTC
Permalink
From: David Miller <***@davemloft.net>
Date: Sat, 29 Dec 2007 00:14:11 -0800 (PST)
Post by David Miller
You can't do that, the FOO_CONSOLE config options depend upon
FOO=y.
That's why I'm not worried about this issue and it's not critical at
all.
Adrian, if you're interested in tackling this "fun" problem,
have a look at add_preferred_console() and find a way to make
that not marked __init. (it's called by sunserial_console_match)

That's what causes this dependency chain of __init problems for the
Sun serial console drivers.

It's problematic, furthermore, because even if one could call
add_preferred_console() from a module properly, it doesn't have the
desired effect of changing init's stdin/stdout/stderr.
Adrian Bunk
2007-12-29 08:50:06 UTC
Permalink
Post by David Miller
Date: Sat, 29 Dec 2007 01:22:56 +0200
Post by Adrian Bunk
At least the sunserial_console_match() one is an obvious Oops
(EXPORT_SYMBOL of an __init function).
The comment in the description of
commit 58d784a5c754cd66ecd4791222162504d3c16c74 the warning was bogus
is bullshit.
I'm not sure whether this might count as a 2.6.24-rc regression or
whether 2.6.23 is simply differently but similarly broken (does anyone
actually use the Sun console drivers modular?).
You can't do that, the FOO_CONSOLE config options depend upon
FOO=y.
Looking closer, the problem isn't the FOO_CONSOLE options themselves;
the problem is that with FOO_CONSOLE=n sunserial_console_match() still
gets called.
Post by David Miller
That's why I'm not worried about this issue and it's not critical at
all.
If a module calls sunserial_console_match() that's an Oops.

I removed the EXPORT_SYMBOL(sunserial_console_match), and this is the
result:
MODPOST 136 modules
ERROR: "sunserial_console_match" [drivers/serial/sunzilog.ko] undefined!
ERROR: "sunserial_console_match" [drivers/serial/sunsu.ko] undefined!
ERROR: "sunserial_console_match" [drivers/serial/sunsab.ko] undefined!

-ENOHARDWARE, but looking at the code you could call me _very_ surprised
if you manage to load a modular sunsab from 2.6.24-rc6 on a machine with
the hardware without getting an Oops.

cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

David Miller
2007-12-29 09:00:16 UTC
Permalink
From: Adrian Bunk <***@kernel.org>
Date: Sat, 29 Dec 2007 10:48:46 +0200
Post by Adrian Bunk
Post by David Miller
That's why I'm not worried about this issue and it's not critical at
all.
If a module calls sunserial_console_match() that's an Oops.
That's true.

I'm trying to figure out a way to fix this.
Adrian Bunk
2007-12-29 09:10:07 UTC
Permalink
Post by David Miller
Date: Sat, 29 Dec 2007 10:48:46 +0200
Post by Adrian Bunk
Post by David Miller
That's why I'm not worried about this issue and it's not critical at
all.
If a module calls sunserial_console_match() that's an Oops.
That's true.
I'm trying to figure out a way to fix this.
#ifdef FOO_CONSOLE around the sunserial_console_match() calls in the
drivers should work.

If you consider this too many #ifdef's, an alternative solution would be
doing the following in drivers/serial/suncore.h:

#ifndef MODULE
extern int sunserial_console_match(struct console *, struct device_node *,
struct uart_driver *, int);
#else
static inline int sunserial_console_match(struct console *con, struct device_node *dp,
struct uart_driver *drv, int line)
{ return 0; }
#endif

cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

David Miller
2007-12-29 09:20:07 UTC
Permalink
From: Adrian Bunk <***@kernel.org>
Date: Sat, 29 Dec 2007 11:06:19 +0200
Post by Adrian Bunk
Post by David Miller
Date: Sat, 29 Dec 2007 10:48:46 +0200
Post by Adrian Bunk
Post by David Miller
That's why I'm not worried about this issue and it's not critical at
all.
If a module calls sunserial_console_match() that's an Oops.
That's true.
I'm trying to figure out a way to fix this.
#ifdef FOO_CONSOLE around the sunserial_console_match() calls in the
drivers should work.
It absolutely doesn't work, I tried this, see my other reply.

The issue is that add_preferred_console() is __init, while driver probe
calls are __devinit, which is __init only when CONFIG_HOTPLUG is not set.

So even with the FOO_CONSOLE ifdef (or something similar like the
patch I posted) we'll still get section mismatch warnings.
Post by Adrian Bunk
If you consider this too many #ifdef's, an alternative solution would be
#ifndef MODULE
extern int sunserial_console_match(struct console *, struct device_node *,
struct uart_driver *, int);
#else
static inline int sunserial_console_match(struct console *con, struct device_node *dp,
struct uart_driver *drv, int line)
{ return 0; }
#endif
Just removing the __init tag from add_preferred_console() (and
subsequently sunserial_console_match()) is probably the easiest way to
fix all of this.
Adrian Bunk
2007-12-29 10:00:14 UTC
Permalink
Post by David Miller
Date: Sat, 29 Dec 2007 11:06:19 +0200
Post by Adrian Bunk
Post by David Miller
Date: Sat, 29 Dec 2007 10:48:46 +0200
Post by Adrian Bunk
Post by David Miller
That's why I'm not worried about this issue and it's not critical at
all.
If a module calls sunserial_console_match() that's an Oops.
That's true.
I'm trying to figure out a way to fix this.
#ifdef FOO_CONSOLE around the sunserial_console_match() calls in the
drivers should work.
It absolutely doesn't work, I tried this, see my other reply.
The issue is add_preferred_console() is __init, driver probe calls are
__devinit which are either __init or not __init.
So even with the FOO_CONSOLE ifdef (or something similar like the
patch I posted) we'll still get section mismatch warnings.
...
Sorry, I shouldn't suggest stuff I haven't tried myself... :-(

cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

David Miller
2007-12-29 09:20:06 UTC
Permalink
From: David Miller <***@davemloft.net>
Date: Sat, 29 Dec 2007 00:54:08 -0800 (PST)
Post by David Miller
Date: Sat, 29 Dec 2007 10:48:46 +0200
Post by Adrian Bunk
Post by David Miller
That's why I'm not worried about this issue and it's not critical at
all.
If a module calls sunserial_console_match() that's an Oops.
That's true.
I'm trying to figure out a way to fix this.
At the end of this email is one idea I came up with but it still
results in:

WARNING: vmlinux.o(.text+0x19e52c): Section mismatch: reference to .init.text:sunserial_console_match (between 'hv_probe' and 'sunzilog_get_mctrl')
WARNING: vmlinux.o(.text+0x19fd1c): Section mismatch: reference to .init.text:sunserial_console_match (between 'zs_probe' and 'serial_in')
WARNING: vmlinux.o(.text+0x19fd5c): Section mismatch: reference to .init.text:sunserial_console_match (between 'zs_probe' and 'serial_in')
WARNING: vmlinux.o(.text+0x1a19e0): Section mismatch: reference to .init.text:sunserial_console_match (between 'su_probe' and 'sunsu_console_putchar')
WARNING: vmlinux.o(.text+0x1a307c): Section mismatch: reference to .init.text:sunserial_console_match (between 'sab_probe' and 'sunsab_send_xchar')
WARNING: vmlinux.o(.text+0x1a3090): Section mismatch: reference to .init.text:sunserial_console_match (between 'sab_probe' and 'sunsab_send_xchar')
WARNING: vmlinux.o(.sun4v_2insn_patch+0x4f8): Section mismatch: reference to .init.text:

if CONFIG_HOTPLUG is set because driver initialization code has to be
marked with __devinit and with HOTPLUG that isn't __init.

This means it's impossible to call add_preferred_console() (either
directly or indirectly via a helper like sunserial_console_match())
from a driver init routine.

The only way I can think of to "work around" this is to mark
sunserial_console_match() as __init_refok, and use some static
variable in suncore which starts as "0" and gets set to "1" via a
late_initcall() to block the call to add_preferred_console().

But that's just gross.
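
For illustration only, the flag-based workaround described above could be
modelled in user space roughly as follows. This is a sketch, not kernel code:
every fake_* name and the init_memory_present flag are hypothetical stand-ins
for add_preferred_console(), the late_initcall() hook and the freeing of
.init.text.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: true while the init-only code is still in memory. */
static bool init_memory_present = true;

/* Counts calls that actually reached the init-only function. */
static int preferred_calls;

/* Stand-in for an __init function such as add_preferred_console(). */
int fake_add_preferred_console(void)
{
	/* In a real kernel, a call after init memory is freed would Oops. */
	assert(init_memory_present);
	preferred_calls++;
	return 0;
}

/* Starts as 0 and is set to 1 once late init has run. */
static int fake_init_done;

/* Stand-in for the late_initcall() hook that flips the flag. */
void fake_late_initcall(void)
{
	fake_init_done = 1;
}

/* Stand-in for the kernel discarding .init.text at the end of boot. */
void fake_free_initmem(void)
{
	init_memory_present = false;
}

/* Stand-in for sunserial_console_match() with the guard added. */
int fake_console_match(bool is_console)
{
	if (!is_console)
		return 0;
	if (!fake_init_done)
		fake_add_preferred_console();
	return 1;
}

/* Accessor so callers can check how often the init-only path ran. */
int fake_preferred_calls(void)
{
	return preferred_calls;
}
```

Once the flag is set, the helper still reports a match but never reaches the
init-only function. That avoids the Oops, at the price of silently skipping
the console registration for anything probed after boot.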

Probably the thing to do to untangle this is to make
add_preferred_console() not be __init. I just tested and that seems
to make everything happy. Again, below is the first thing I
tried just for reference.

diff --git a/drivers/serial/suncore.c b/drivers/serial/suncore.c
index 707c5b0..a4cbd17 100644
--- a/drivers/serial/suncore.c
+++ b/drivers/serial/suncore.c
@@ -21,6 +21,8 @@

#include <asm/prom.h>

+#define SUNCORE_CONSOLE
+
#include "suncore.h"

static int sunserial_current_minor = 64;
@@ -52,8 +54,9 @@ void sunserial_unregister_minors(struct uart_driver *drv, int count)
}
EXPORT_SYMBOL(sunserial_unregister_minors);

-int __init sunserial_console_match(struct console *con, struct device_node *dp,
- struct uart_driver *drv, int line)
+int __init sunserial_console_match(struct console *con,
+ struct device_node *dp,
+ struct uart_driver *drv, int line)
{
int off;

@@ -74,7 +77,6 @@ int __init sunserial_console_match(struct console *con, struct device_node *dp,

return 1;
}
-EXPORT_SYMBOL(sunserial_console_match);

void
sunserial_console_termios(struct console *con)
diff --git a/drivers/serial/suncore.h b/drivers/serial/suncore.h
index 042668a..de4dfb0 100644
--- a/drivers/serial/suncore.h
+++ b/drivers/serial/suncore.h
@@ -25,8 +25,17 @@ extern int suncore_mouse_baud_detection(unsigned char, int);
extern int sunserial_register_minors(struct uart_driver *, int);
extern void sunserial_unregister_minors(struct uart_driver *, int);

+#ifdef SUNCORE_CONSOLE
extern int sunserial_console_match(struct console *, struct device_node *,
struct uart_driver *, int);
extern void sunserial_console_termios(struct console *);
+#else
+static inline int sunserial_console_match(struct console *con,
+ struct device_node *dp,
+ struct uart_driver *drv, int line)
+{
+ return 0;
+}
+#endif

#endif /* !(_SERIAL_SUN_H) */
diff --git a/drivers/serial/sunhv.c b/drivers/serial/sunhv.c
index be0fe15..67a0b4c 100644
--- a/drivers/serial/sunhv.c
+++ b/drivers/serial/sunhv.c
@@ -24,6 +24,8 @@
#include <asm/of_device.h>
#include <asm/irq.h>

+#define SUNCORE_CONSOLE
+
#if defined(CONFIG_MAGIC_SYSRQ)
#define SUPPORT_SYSRQ
#endif
diff --git a/drivers/serial/sunsab.c b/drivers/serial/sunsab.c
index 543f937..955d54b 100644
--- a/drivers/serial/sunsab.c
+++ b/drivers/serial/sunsab.c
@@ -38,9 +38,12 @@
#include <asm/prom.h>
#include <asm/of_device.h>

-#if defined(CONFIG_SERIAL_SUNSAB_CONSOLE) && defined(CONFIG_MAGIC_SYSRQ)
+#ifdef CONFIG_SERIAL_SUNSAB_CONSOLE
+#define SUNCORE_CONSOLE
+#ifdef CONFIG_MAGIC_SYSRQ
#define SUPPORT_SYSRQ
#endif
+#endif

#include <linux/serial_core.h>

diff --git a/drivers/serial/sunsu.c b/drivers/serial/sunsu.c
index 4e2302d..cbd97eb 100644
--- a/drivers/serial/sunsu.c
+++ b/drivers/serial/sunsu.c
@@ -41,9 +41,12 @@
#include <asm/prom.h>
#include <asm/of_device.h>

-#if defined(CONFIG_SERIAL_SUNSU_CONSOLE) && defined(CONFIG_MAGIC_SYSRQ)
+#ifdef CONFIG_SERIAL_SUNSU_CONSOLE
+#define SUNCORE_CONSOLE
+#ifdef CONFIG_MAGIC_SYSRQ
#define SUPPORT_SYSRQ
#endif
+#endif

#include <linux/serial_core.h>

diff --git a/drivers/serial/sunzilog.c b/drivers/serial/sunzilog.c
index cb2e405..d18d145 100644
--- a/drivers/serial/sunzilog.c
+++ b/drivers/serial/sunzilog.c
@@ -38,9 +38,12 @@
#include <asm/prom.h>
#include <asm/of_device.h>

-#if defined(CONFIG_SERIAL_SUNZILOG_CONSOLE) && defined(CONFIG_MAGIC_SYSRQ)
+#ifdef CONFIG_SERIAL_SUNZILOG_CONSOLE
+#define SUNCORE_CONSOLE
+#ifdef CONFIG_MAGIC_SYSRQ
#define SUPPORT_SYSRQ
#endif
+#endif

#include <linux/serial_core.h>


Paul Moore
2007-12-26 16:00:18 UTC
Permalink
As James said I'm away right now and computer access is limited. However, I'm stuck in the airport right now and spent some time looking at the code ... Based on what has been found so far I wonder if the problem isn't a race but a problem of skb->iif never being initialized correctly? To my untrained eye it looks like __netdev_alloc_skb() should be setting skb->iif (like it does for skb->dev) but it currently doesn't.

Am I barking up the wrong tree here?

. paul moore
. linux security @ hp
-----Original Message-----
From: James Morris <***@namei.org>
Date: Wednesday, Dec 26, 2007 7:16 am
Subject: Re: 2.6.24-rc6-mm1 - git-lblnet.patch and networking horkage
Post by James Morris
Post by James Morris
What does the following say ?
# sestatus && rpm -q selinux-policy
Don't worry about that -- I reproduced it with Paul Moore's git tree: git://git.infradead.org/users/pcmoore/lblnet-2.6_testing
(under current -mm, the e1000 driver doesn't find my ethernet card & the
tcl tests won't run without an external interface).
The offending commit is when SELinux is converted to the new ifindex
9c6ad8f6895db7a517c04c2147cb5e7ffb83a315 is first bad commit
commit 9c6ad8f6895db7a517c04c2147cb5e7ffb83a315
Date: Fri Dec 21 11:44:26 2007 -0500
SELinux: Convert the netif code to use ifindex values
[...]
In some case (not yet fully identified -- also happens when avahi starts
up, although seemingly silently & without obvious issues), SELinux is
passed an ifindex of 1515870810, which corresponds to 0x5a5a5a5a, the slab
poison value, suggesting a race in the calling code where we're being
asked to check an skb which has been freed.
The SELinux code is erroring out before performing an access check
(perhaps there should be WARN_ON, at least), so this will affect both
permissive and enforcing mode without generating any log messages.
Andrew: I suggest dropping the patchset from -mm until Paul gets back from
vacation.
- James
--
James Morris
James Morris
2007-12-26 22:00:18 UTC
Permalink
Post by Paul Moore
As James said I'm away right now and computer access is limited.
However, I'm stuck in the airport right now and spent some time looking
at the code ... Based on what has been found so far I wonder if the
problem isn't a race but a problem of skb->iif never being initialized
correctly? To my untrained eye it looks like __netdev_alloc_skb()
should be setting skb->iif (like it does for skb->dev) but it currently
doesn't.
->iif will be zeroed during skb allocation, then set during
netif_receive_skb().


- James
--
James Morris
<***@namei.org>
Paul Moore
2007-12-28 19:30:13 UTC
Permalink
Post by James Morris
Post by Paul Moore
As James said I'm away right now and computer access is limited.
However, I'm stuck in the airport right now and spent some time looking
at the code ... Based on what has been found so far I wonder if the
problem isn't a race but a problem of skb->iif never being initialized
correctly? To my untrained eye it looks like __netdev_alloc_skb()
should be setting skb->iif (like it does for skb->dev) but it currently
doesn't.
->iif will be zeroed during skb allocation, then set during
netif_receive_skb().
So it is ... I didn't look at __alloc_skb() close enough. Thanks.
--
paul moore
linux security @ hp
Paul Moore
2007-12-31 17:20:12 UTC
Permalink
Post by James Morris
Post by Paul Moore
As James said I'm away right now and computer access is limited.
However, I'm stuck in the airport right now and spent some time looking
at the code ... Based on what has been found so far I wonder if the
problem isn't a race but a problem of skb->iif never being initialized
correctly? To my untrained eye it looks like __netdev_alloc_skb()
should be setting skb->iif (like it does for skb->dev) but it currently
doesn't.
->iif will be zeroed during skb allocation, then set during
netif_receive_skb().
I was able to reproduce this bug this morning by running avahi as James did
and did a little more digging. I don't have a fix yet, but thought I would
pass along what I've found in case this triggers a moment of clarity to
someone out there ...

The skb->iif value appears to be messed up as early as netif_receive_skb(). In
my case it is set to 196611 (trust me, I don't have that many interfaces in
my test machine), which causes the ->iif initialization code in
netif_receive_skb() to be skipped because ->iif is greater than zero. This
particular packet is locally generated and locally consumed.
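
The skipped initialization can be pictured with a small user-space model. It
assumes the receive path only fills in ->iif when it is still zero, as
described above. struct fake_skb, struct fake_net_device and both function
names are made up for the example and are not the kernel's definitions.

```c
#include <assert.h>

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct fake_net_device {
	int ifindex;
};

struct fake_skb {
	struct fake_net_device *dev;
	int iif;
};

/* Fill in the input interface only if nobody has set it yet. */
void fake_receive_set_iif(struct fake_skb *skb)
{
	if (!skb->iif)
		skb->iif = skb->dev->ifindex;
}

/* Convenience wrapper: returns ->iif after the receive step. */
int fake_iif_after_receive(int iif_in, int dev_ifindex)
{
	struct fake_net_device dev = { .ifindex = dev_ifindex };
	struct fake_skb skb = { .dev = &dev, .iif = iif_in };

	fake_receive_set_iif(&skb);
	return skb.iif;
}
```

With a zeroed ->iif the device's ifindex is used; a stale value such as 196611
passes through untouched.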

Hopefully I'll have a fix later this afternoon but if someone has a bright
idea I'd love to hear it. Backtrace is below:

WARNING: at security/selinux/hooks.c:3805 selinux_socket_sock_rcv_skb()
Pid: 1454, comm: avahi-daemon Not tainted 2.6.24-rc5 #4
[<c04aac4e>] selinux_socket_sock_rcv_skb+0x96/0x3ac
[<c041bddf>] printk+0x1b/0x1f
[<c04349c9>] __print_symbol+0x21/0x2a
[<c04a5ae8>] security_sock_rcv_skb+0xc/0xd
[<c05822c3>] sock_queue_rcv_skb+0x29/0xce
[<d08f34e9>] ipt_do_table+0x423/0x466 [ip_tables]
[<c05bf114>] udp_queue_rcv_skb+0x199/0x201
[<c04caf24>] vsnprintf+0x283/0x450
[<d08f93e8>] nf_conntrack_in+0x307/0x3d7 [nf_conntrack]
[<c05bf56a>] __udp4_lib_rcv+0x3ee/0x7a7
[<d08fc26f>] nf_ct_deliver_cached_events+0x8/0x90 [nf_conntrack]
[<d0984158>] ipv4_confirm+0x34/0x39 [nf_conntrack_ipv4]
[<c059e99a>] nf_iterate+0x3a/0x6e
[<c05a38d3>] ip_local_deliver_finish+0x0/0x191
[<c05a38d3>] ip_local_deliver_finish+0x0/0x191
[<c05a39e5>] ip_local_deliver_finish+0x112/0x191
[<c05a38b4>] ip_rcv_finish+0x254/0x273
[<c05a3660>] ip_rcv_finish+0x0/0x273
[<c05a3cd3>] ip_rcv+0x1cc/0x1fb
[<c05a3660>] ip_rcv_finish+0x0/0x273
[<c05a3b07>] ip_rcv+0x0/0x1fb
[<c0587fd7>] netif_receive_skb+0x37d/0x397
[<c058a111>] process_backlog+0x60/0x92
[<c0589e16>] net_rx_action+0x67/0x118
[<c041f164>] __do_softirq+0x35/0x75
[<c0404f02>] do_softirq+0x3e/0x8d
[<c041f06e>] local_bh_enable+0x6b/0x79
[<d08fc26f>] nf_ct_deliver_cached_events+0x8/0x90 [nf_conntrack]
[<d0984158>] ipv4_confirm+0x34/0x39 [nf_conntrack_ipv4]
[<d0984124>] ipv4_confirm+0x0/0x39 [nf_conntrack_ipv4]
[<c059e99a>] nf_iterate+0x3a/0x6e
[<c05a6ca9>] ip_finish_output+0x0/0x208
[<c059ea3f>] nf_hook_slow+0x4d/0xb5
[<c05a6ca9>] ip_finish_output+0x0/0x208
[<c05a7cb5>] ip_mc_output+0x172/0x18b
[<c05a6ca9>] ip_finish_output+0x0/0x208
[<c05a5b79>] ip_push_pending_frames+0x2be/0x311
[<c05a5790>] dst_output+0x0/0x7
[<c05bedb6>] udp_push_pending_frames+0x298/0x2d7
[<c05bfd8b>] udp_sendmsg+0x459/0x55c
[<c05c4bf9>] inet_sendmsg+0x3b/0x45
[<c057eead>] sock_sendmsg+0xc8/0xe3
[<c0429863>] autoremove_wake_function+0x0/0x33
[<c057eead>] sock_sendmsg+0xc8/0xe3
[<c0429863>] autoremove_wake_function+0x0/0x33
[<c04cbb78>] copy_from_user+0x32/0x5e
[<c04cbb78>] copy_from_user+0x32/0x5e
[<c057f05a>] sys_sendmsg+0x192/0x1f7
[<c041eb1b>] current_fs_time+0x13/0x15
[<c0470b14>] file_update_time+0x21/0x61
[<c04663f2>] pipe_write+0x3cc/0x3d8
[<c0460e91>] do_sync_write+0x0/0x109
[<c0460f57>] do_sync_write+0xc6/0x109
[<c0429863>] autoremove_wake_function+0x0/0x33
[<c058029c>] sys_socketcall+0x240/0x261
[<c0403c72>] syscall_call+0x7/0xb
=======================
--
paul moore
linux security @ hp
Paul Moore
2007-12-31 20:10:10 UTC
Permalink
Post by Paul Moore
Post by James Morris
Post by Paul Moore
As James said I'm away right now and computer access is limited.
However, I'm stuck in the airport right now and spent some time looking
at the code ... Based on what has been found so far I wonder if the
problem isn't a race but a problem of skb->iif never being initialized
correctly? To my untrained eye it looks like __netdev_alloc_skb()
should be setting skb->iif (like it does for skb->dev) but it currently
doesn't.
->iif will be zeroed during skb allocation, then set during
netif_receive_skb().
I was able to reproduce this bug this morning by running avahi as James did
and did a little more digging. I don't have a fix yet, but thought I would
pass along what I've found in case this triggers a moment of clarity to
someone out there ...
The skb->iif value appears to be messed up as early as netif_receive_skb(),
in my case it is set to 196611 (trust me, I don't have that many interfaces
in my test machine) which causes the ->iif initialization code in
netif_receive_skb() to be skipped because ->iif is greater than zero. This
particular packet is locally generated and locally consumed.
Hopefully I'll have a fix later this afternoon but if someone has a bright
idea I'd love to hear it ...
[NOTE: I added netdev to this thread to gather some input. @netdev folks, the
problem is that the skb->iif field contains garbage in some cases which is
causing problems for some new SELinux network code. The exact problem
probably isn't too important for this discussion, what is important is that
the skb->iif field contains a non-zero garbage value some of the time on
incoming packets.]

I'm pretty certain this is an uninitialized value problem now and not a
use-after-free issue. The invalid/garbage ->iif value seems to only happen
on packets that are generated locally and sent back into the stack for local
consumption, e.g. loopback. These local packets also need to have been
cloned at some point, either on the output or input path.

The problem appears to be a skb_clone() function which does not clear the skb
structure properly and fails to copy the ->iif value from the original skb to
the cloned skb. From what I can tell, there are two possible solutions to
this problem:

1. Clear all of the cloned skb fields in skb_clone() via memset()
2. Copy the ->iif field in __copy_skb_header()

I don't have a good enough understanding of all the details involving skb
memory management to know if option #1 is a Good Idea or not, but option #2
seems much simpler and solves the problem of garbage in the ->iif field. My
preference is to go with option #2 but before I submit a patch does anyone
think this is the wrong solution?
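
To make option #2 concrete, here is a rough user-space model. It assumes the
clone's memory is not zeroed before the header copy, so any field the copy
routine skips keeps whatever was there before; the 0x5a fill merely imitates
slab poison. struct fake_skb and all fake_* functions are invented for this
sketch and do not reflect the real struct sk_buff layout.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, reduced model of an skb header. */
struct fake_skb {
	int dev_ifindex;
	int iif;
	unsigned int mark;
};

/* Imitates getting a recycled object whose old contents are still there. */
void fake_alloc_dirty(struct fake_skb *skb)
{
	memset(skb, 0x5a, sizeof(*skb));
}

/* Header copy that forgets ->iif. */
void fake_copy_header_without_iif(struct fake_skb *dst,
				  const struct fake_skb *src)
{
	dst->dev_ifindex = src->dev_ifindex;
	dst->mark = src->mark;
}

/* Header copy that also carries ->iif over to the clone. */
void fake_copy_header_with_iif(struct fake_skb *dst,
			       const struct fake_skb *src)
{
	fake_copy_header_without_iif(dst, src);
	dst->iif = src->iif;
}

/* Returns the clone's ->iif for a given original ->iif. */
int fake_clone_iif(int orig_iif, int copy_iif)
{
	struct fake_skb orig = { .dev_ifindex = 1, .iif = orig_iif, .mark = 0 };
	struct fake_skb clone;

	fake_alloc_dirty(&clone);
	if (copy_iif)
		fake_copy_header_with_iif(&clone, &orig);
	else
		fake_copy_header_without_iif(&clone, &orig);
	return clone.iif;
}
```

Without the extra assignment the clone reports 0x5a5a5a5a (1515870810 with a
32-bit int) no matter what the original held; with it, the original value
survives the clone.
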
--
paul moore
linux security @ hp
James Morris
2007-12-31 21:50:10 UTC
Permalink
Post by Paul Moore
I'm pretty certain this is an uninitialized value problem now and not a
use-after-free issue. The invalid/garbage ->iif value seems to only happen
on packets that are generated locally and sent back into the stack for local
consumption, e.g. loopback. These local packets also need to have been
cloned at some point, either on the output or input path.
I think we need to find out exactly what's happening, first.
Post by Paul Moore
The problem appears to be a skb_clone() function which does not clear the skb
structure properly and fails to copy the ->iif value from the original skb to
the cloned skb. From what I can tell, there are two possible solutions to
1. Clear all of the cloned skb fields in skb_clone() via memset()
Sounds like it's not going to fly for performance reasons in any case.
Post by Paul Moore
2. Copy the ->iif field in __copy_skb_header()
Seems valid.


- James
--
James Morris
<***@namei.org>
Paul Moore
2007-12-31 22:10:08 UTC
Permalink
Post by James Morris
Post by Paul Moore
I'm pretty certain this is an uninitialized value problem now and not a
use-after-free issue. The invalid/garbage ->iif value seems to only
happen on packets that are generated locally and sent back into the stack
for local consumption, e.g. loopback. These local packets also need to
have been cloned at some point, either on the output or input path.
I think we need to find out exactly what's happening, first.
The more I've looked at the code this afternoon, the more certain I am that this is the case.
I've also been running a patched kernel (using option #2 from below) and all
of the skbs coming up the stack have valid ->iif values. Granted, I haven't
examined the code from the avahi daemon or the tcl test cases and traced the
entire code path through the kernel but I _am_ certain that at some point in
that code path the packet is cloned and due to a problem in skb_clone()
the ->iif field is not copied correctly causing the problems we have all
seen.

How much smoke needs to be coming from the gun? :)
Post by James Morris
Post by Paul Moore
The problem appears to be a skb_clone() function which does not clear the
skb structure properly and fails to copy the ->iif value from the
original skb to the cloned skb. From what I can tell, there are two
1. Clear all of the cloned skb fields in skb_clone() via memset()
Sounds like it's not going to fly for performance reasons in any case.
That was my gut feeling. I was also a little unsure where exactly the correct
placement should be for the memset() call.
Post by James Morris
Post by Paul Moore
2. Copy the ->iif field in __copy_skb_header()
Seems valid.
Okay, I'll stick with this approach. I'll post a patch based on
net-2.6.25 tomorrow as an RFC to see if anyone on netdev has any strong
feelings. If no one complains, I'll add it to the lblnet git tree.
--
paul moore
linux security @ hp
Joseph Fannin
2007-12-27 03:00:10 UTC
Permalink
Post by Andrew Morton
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc6/2.6.24-rc6-mm1/
This doesn't build on powerpc with my .config:

In file included from arch/powerpc/kernel/asm-offsets.c:17:
include/linux/sched.h: In function ‘spin_needbreak’:
include/linux/sched.h:1947: error: implicit declaration of function ‘__raw_spin_is_contended’

I don't see where __raw_spin_is_contended is defined for any arch
other than x86, so I guess this will happen on any non-x86 arch when
SMP=y and PREEMPT=y are set?

This comes from "spinlock: lockbreak cleanup" in git-x86.

--
Joseph Fannin
***@gmail.com
Nick Piggin
2007-12-27 05:30:11 UTC
Permalink
Post by Andrew Morton
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc6/2.6.24-rc6-mm1/
include/linux/sched.h:1947: error: implicit declaration of function ‘__raw_spin_is_contended’
I don't see where __raw_spin_is_contended is defined for any arch
other than x86, so I guess this will happen on any non-x86 arch when
SMP=y and PREEMPT=y are set?
And CONFIG_GENERIC_LOCKBREAK is not defined, which is what powerpc needs.

Thanks for reporting,
Nick

---

Index: linux-2.6/arch/powerpc/Kconfig
===================================================================
--- linux-2.6.orig/arch/powerpc/Kconfig
+++ linux-2.6/arch/powerpc/Kconfig
@@ -53,6 +53,11 @@ config RWSEM_XCHGADD_ALGORITHM
 	bool
 	default y
 
+config GENERIC_LOCKBREAK
+	bool
+	default y
+	depends on SMP && PREEMPT
+
 config ARCH_HAS_ILOG2_U32
 	bool
 	default y
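
For readers following along, here is a rough user-space sketch of why the
Kconfig symbol matters. It is not the kernel's spinlock code: the struct and
the sketch_* and arch_* names are made up for illustration, and the only thing
taken from the thread is the idea that the contention check either uses a
generic flag (when CONFIG_GENERIC_LOCKBREAK is set) or needs a helper that the
architecture has to provide.

```c
/* Pretend the Kconfig option from the patch above is enabled. */
#define CONFIG_GENERIC_LOCKBREAK 1

/* Simplified, hypothetical lock; not the kernel's spinlock_t. */
struct sketch_spinlock {
	int locked;
#ifdef CONFIG_GENERIC_LOCKBREAK
	int break_lock;	/* set by a waiter that wants the holder to yield */
#endif
};

#ifdef CONFIG_GENERIC_LOCKBREAK
/* Generic path: works without any architecture support. */
static int sketch_spin_is_contended(const struct sketch_spinlock *lock)
{
	return lock->break_lock != 0;
}
#else
/*
 * Arch path: the architecture must supply this helper. If it is not
 * even declared, the call below is an implicit declaration, which is
 * the kind of error reported above.
 */
int arch_spin_is_contended(const struct sketch_spinlock *lock);

static int sketch_spin_is_contended(const struct sketch_spinlock *lock)
{
	return arch_spin_is_contended(lock);
}
#endif

/* Caller that only cares whether it should drop the lock for a waiter. */
static int sketch_spin_needbreak(const struct sketch_spinlock *lock)
{
	return sketch_spin_is_contended(lock);
}
```

With the define at the top removed, the sketch only builds and links if
something provides arch_spin_is_contended(), which mirrors the situation on
architectures that have no such helper.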
V***@vt.edu
2007-12-27 06:10:11 UTC
Post by Andrew Morton
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc6/2.6.24-rc6-mm1/
Looks like an uninitialized variable dereference for SEPARATOR events:

# mount -t securityfs none /sys/kernel/security/
# ls /sys/kernel/security/
tpm0
# l /sys/kernel/security/tpm0/
total 0
0 -r--r----- 1 root root 0 2007-12-26 23:28 ascii_bios_measurements
0 -r--r----- 1 root root 0 2007-12-26 23:28 binary_bios_measurements
# cat /sys/kernel/security/tpm0/ascii_bios_measurements
0 0000000000000000000000000000000000000000 07 [S-CRTM Contents]
0 0000000000000000000000000000000000000000 07 [S-CRTM Contents]
0 0000000000000000000000000000000000000000 07 [S-CRTM Contents]
0 0000000000000000000000000000000000000000 07 [S-CRTM Contents]
4 c1e25c3f6b0dc78d57296aa2870ca6f782ccf80f 05 [Calling INT 19h]
0 85e53271e14006f0265921d02d4d736cdc580b0b 04 [ÿ]
1 85e53271e14006f0265921d02d4d736cdc580b0b 04 [ÿ]
2 85e53271e14006f0265921d02d4d736cdc580b0b 04 [ÿ]
3 85e53271e14006f0265921d02d4d736cdc580b0b 04 [ÿ]
4 85e53271e14006f0265921d02d4d736cdc580b0b 04 [ÿ]
5 85e53271e14006f0265921d02d4d736cdc580b0b 04 [ÿ]
6 85e53271e14006f0265921d02d4d736cdc580b0b 04 [ÿ]
7 85e53271e14006f0265921d02d4d736cdc580b0b 04 [ÿ]
4 38f30a0a967fcf2bfee1e3b2971de540115048c8 05 [Returned INT 19h]
4 f9d3a33e4ba6109fb60e8df6ec0f10330733c8b2 0c [Compact Hash]
5 9bd5c812613f67ce1c75d0ea48b9933a547683cb 0c [Compact Hash]

Looks like the problem is likely in get_event_name:

	case NONHOST_INFO:
		name = tcpa_event_type_strings[event->event_type];
		n_len = strlen(name);
		break;
	case SEPARATOR:
	case ACTION:
		if (MAX_TEXT_EVENT > event->event_size) {
			name = event_entry;
			n_len = event->event_size;
		}
		break;

Should there be a 'break;' after the SEPARATOR line? Given the name, it
probably doesn't have a name/length pair attached to an event, right?
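
To make the question concrete, here is a small user-space sketch of the same
control flow. It is not the TPM driver code: the enum, the struct, the value
of MAX_TEXT_EVENT and the function name pick_event_name are all invented for
illustration, and only the shape of the switch follows the excerpt above. It
shows two things: a "case SEPARATOR:" label with no body shares the ACTION
branch, and when the size check fails nothing in that branch assigns name or
n_len, so they keep whatever they held before the switch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the driver's definitions. */
enum event_type { NONHOST_INFO, SEPARATOR, ACTION };
#define MAX_TEXT_EVENT ((size_t)1000)

struct event {
	enum event_type type;
	size_t size;		/* payload length in bytes */
	const char *payload;	/* raw event data, not NUL-terminated */
};

/*
 * Returns the chosen name length and stores the name pointer in
 * *name_out. Both are initialised up front so the result is well
 * defined even when no case assigns them.
 */
static size_t pick_event_name(const struct event *ev, const char **name_out)
{
	const char *name = "";
	size_t n_len = 0;

	switch (ev->type) {
	case NONHOST_INFO:
		name = "Non-Host Info";
		n_len = strlen(name);
		break;
	case SEPARATOR:	/* no body: falls through into ACTION */
	case ACTION:
		if (MAX_TEXT_EVENT > ev->size) {
			name = ev->payload;
			n_len = ev->size;
		}
		break;
	}

	*name_out = name;
	return n_len;
}
```

In this sketch a separator event carrying four 0xff bytes is reported using
its raw payload. That would be consistent with the [ÿ] entries in the listing
(0xff is ÿ in Latin-1), but whether the real driver behaves this way is a
guess on my part.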
Kamalesh Babulal
2007-12-27 09:00:17 UTC
Hi Andrew,

The 2.6.24-rc6-mm1 kernel with hotfix x86-fix-system-gate-related-crash.patch applied
panics while booting on an x86_64 box:

Unable to handle kernel NULL pointer dereference at 0000000000000046 RIP:
[<ffffffff80369a0b>] rb_erase+0xe7/0x2a3
PGD 17ff65067 PUD 17f1c7067 PMD 0
Oops: 0000 [1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:0a.0/0000:02:04.0/host0/target0:0:6/0:0:6:0/type
CPU 0
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.24-rc6-mm1-autokern1 #1
RIP: 0010:[<ffffffff80369a0b>] [<ffffffff80369a0b>] rb_erase+0xe7/0x2a3
RSP: 0000:ffffffff80650e00 EFLAGS: 00010002
RAX: ffff8101fe9568c8 RBX: ffff8100010062a8 RCX: ffff8101fe9568b0
RDX: ffff8101fe9568c8 RSI: 0000000000000046 RDI: 0000000000000000
RBP: ffffffff80650e10 R08: ffff8101fe9568c8 R09: 0000000000000086
R10: 0000000000000000 R11: 00000000000001e8 R12: ffff8100010062b8
R13: 0000000000000002 R14: ffff810001006260 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffffffff805dc000(0000) knlGS:00000000f31ffbb0
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000046 CR3: 000000017f0ab000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffffffff805f6000, task ffffffff805a2080)
Stack: ffff8100010062a8 ffff8101fe9568b0 ffffffff80650e40 ffffffff8024be16
ffffffff80369d65 ffffffff80369d65 ffff8101fe9568b0 ffff8100010062a8
ffffffff80650eb0 ffffffff8024c1d5 ffffffffb88cc28e 0000000006e73eff
Call Trace:
<IRQ> [<ffffffff8024be16>] __remove_hrtimer+0x2e/0x3c
[<ffffffff80369d65>] __down_read_trylock+0x16/0x42
[<ffffffff80369d65>] __down_read_trylock+0x16/0x42
[<ffffffff8024c1d5>] hrtimer_run_queues+0x130/0x191
[<ffffffff8023fd09>] run_timer_softirq+0x28/0x1a7
[<ffffffff8023c018>] __do_softirq+0x55/0xc2
[<ffffffff8020c73c>] call_softirq+0x1c/0x28
[<ffffffff8020e719>] do_softirq+0x32/0x9d
[<ffffffff8023c0dd>] irq_exit+0x3f/0x41
[<ffffffff8021ff85>] smp_apic_timer_interrupt+0x92/0xa7
[<ffffffff8020c1e6>] apic_timer_interrupt+0x66/0x70
<EOI> [<ffffffff802095f5>] default_idle+0x36/0x5e
[<ffffffff802095f0>] default_idle+0x31/0x5e
[<ffffffff802095bf>] default_idle+0x0/0x5e
[<ffffffff802096b6>] cpu_idle+0x90/0xb2
[<ffffffff804b0126>] rest_init+0x5a/0x5c
[<ffffffff806017ee>] start_kernel+0x2b8/0x2c4
[<ffffffff8060112b>] _sinittext+0x12b/0x132


Code: 48 8b 06 83 e0 03 4c 09 c0 48 89 06 4d 85 c0 74 12 49 39 48
RIP [<ffffffff80369a0b>] rb_erase+0xe7/0x2a3
RSP <ffffffff80650e00>
CR2: 0000000000000046

The gdb listing for the above panic:

(gdb) l *0xffffffff80369a0b
0xffffffff80369a0b is in rb_erase (include/linux/rbtree.h:125).
120 #define rb_set_red(r) do { (r)->rb_parent_color &= ~1; } while (0)
121 #define rb_set_black(r) do { (r)->rb_parent_color |= 1; } while (0)
122
123 static inline void rb_set_parent(struct rb_node *rb, struct rb_node *p)
124 {
125 rb->rb_parent_color = (rb->rb_parent_color & 3) | (unsigned long)p;
126 }
127 static inline void rb_set_color(struct rb_node *rb, int color)
128 {
129 rb->rb_parent_color = (rb->rb_parent_color & ~1) | color;
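
For anyone not familiar with the helpers in that listing: they keep the parent
pointer and the node colour in a single word. Below is a minimal user-space
sketch of that encoding, assuming nodes are at least 4-byte aligned so the low
two bits of a node address are always zero. The names mirror the listing, but
this is a simplified stand-in and not the kernel header.

```c
#include <stdint.h>

#define RB_RED   0
#define RB_BLACK 1

struct rb_node {
	uintptr_t rb_parent_color;	/* parent address | colour in bit 0 */
	struct rb_node *rb_right;
	struct rb_node *rb_left;
};

/* Strip the low bits to recover the parent pointer. */
static struct rb_node *rb_parent(const struct rb_node *n)
{
	return (struct rb_node *)(n->rb_parent_color & ~(uintptr_t)3);
}

static int rb_color(const struct rb_node *n)
{
	return (int)(n->rb_parent_color & 1);
}

/* Keep the low bits (colour), replace the pointer part. */
static void rb_set_parent(struct rb_node *rb, struct rb_node *p)
{
	rb->rb_parent_color = (rb->rb_parent_color & 3) | (uintptr_t)p;
}

/* Keep the pointer part, replace the colour bit. */
static void rb_set_color(struct rb_node *rb, int color)
{
	rb->rb_parent_color = (rb->rb_parent_color & ~(uintptr_t)1) | (uintptr_t)color;
}
```

Note that rb_set_parent() reads and writes through rb itself, so it faults if
rb is not a valid node address. In the first oops RSI and CR2 are both 0x46,
which looks like a bogus node pointer rather than a NULL one, though that is
only my reading of the register dump.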


When I tried rebooting, I got the following traces one after another,
continuously, during the second boot:

Unable to handle kernel paging request at 000000000000407f RIP:
[<ffffffff804b2bd3>] _spin_lock_irqsave+0xc/0x1d
PGD 1ff102067 PUD ffff8101fe6e4000
Oops: 0002 [1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:0a.0/0000:02:04.0/host0/target0:0:6/0:0:6:0/type
CPU 3
Modules linked in:
Pid: 16511, comm: ,@ Tainted: G M 2.6.24-rc6-mm1-autokern1 #1
RIP: 0010:[<ffffffff804b2bd3>] [<ffffffff804b2bd3>] _spin_lock_irqsave+0xc/0x1d
RSP: 0000:ffff8101fe6e4178 EFLAGS: 00010046
RAX: 0000000000000046 RBX: 000000000000407b RCX: 0000000000000001
RDX: 0000000000000100 RSI: 0000000000000002 RDI: 000000000000407f
RBP: ffff8101fe6e4178 R08: 0000000000000001 R09: ffff8101fe6e43e0
R10: 0000000000000000 R11: 0000000000000008 R12: 0000000000000000
R13: 0000000000000002 R14: ffff8101fe6e4000 R15: ffff8101fe6e4298
FS: 0000000000000000(0000) GS:ffff8101fff13000(0063) knlGS:00000000f7d4a080
CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 000000000000407f CR3: 00000001ff1f2000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ,@ (pid: 16511, threadinfo 00000000ffffffff, task ffff8101fe6e4000)
Stack: ffff8101fe6e4198 ffffffff80369d65 000000000000407b 000000000000407f
ffff8101fe6e41a8 ffffffff8024c599 ffff8101fe6e4288 ffffffff8022473a
0000000000000000 0000000000000000 0000000000000000 000000000000401b
Call Trace:


Code: f0 66 0f c1 17 38 f2 74 06 f3 90 8a 17 eb f6 c9 c3 55 48 89
RIP [<ffffffff804b2bd3>] _spin_lock_irqsave+0xc/0x1d
RSP <ffff8101fe6e4178>
CR2: 000000000000407f

0xffffffff804b2bd3 is in _spin_lock_irqsave (include/asm/spinlock.h:75).
70 * and should be optimal for the uncontended case. Note the tail must
71 * be in the high byte, otherwise the 16-bit wide increment of the low
72 * byte would carry up and contaminate the high byte.
73 */
74
75 __asm__ __volatile__ (
76 LOCK_PREFIX "xaddw %w0, %1\n"
77 "1:\t"
78 "cmpb %h0, %b0\n\t"
79 "je 2f\n\t"
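
The asm in that listing is the ticket-lock acquire path. As a rough
illustration of the idea in portable C11, here is a sketch that uses the same
layout as the comment describes (tail in the high byte, head in the low byte
of a 16-bit word). The struct and function names are invented, it uses
<stdatomic.h> instead of inline asm, and it makes no attempt to match the
kernel's memory-ordering or pause-loop details.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * 16-bit lock word: low byte = ticket currently being served (head),
 * high byte = next ticket to hand out (tail).
 */
struct ticket_lock {
	_Atomic uint16_t slock;
};

static void ticket_lock(struct ticket_lock *l)
{
	/* Take a ticket: bump the tail and keep the previous word. */
	uint16_t old = atomic_fetch_add(&l->slock, 0x0100);
	uint8_t my_ticket = (uint8_t)(old >> 8);

	/* Spin until the head reaches our ticket. */
	while ((uint8_t)(atomic_load(&l->slock) & 0xff) != my_ticket)
		;
}

static void ticket_unlock(struct ticket_lock *l)
{
	uint16_t old = atomic_load(&l->slock);
	uint16_t new;

	/*
	 * Advance only the head byte. A wrap from 0xff to 0x00 must not
	 * carry into the tail, so the high byte is copied unchanged.
	 */
	do {
		new = (uint16_t)((old & 0xff00) | ((old + 1) & 0x00ff));
	} while (!atomic_compare_exchange_weak(&l->slock, &old, new));
}

/* More than one ticket outstanding means somebody is waiting. */
static int ticket_is_contended(struct ticket_lock *l)
{
	uint16_t v = atomic_load(&l->slock);

	return (uint8_t)((v >> 8) - (v & 0xff)) > 1;
}
```

Adding 0x0100 to the whole word can only overflow out of the top of the tail
byte, which is the reason the comment gives for putting the tail in the high
byte. In the second oops RDI is 0x407f, so the lock pointer itself appears to
be garbage; the locking code is probably just where the bad pointer was first
dereferenced.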

PGD 0
Oops: 0000 [3] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:0a.0/0000:02:04.0/host0/target0:0:6/0:0:6:0/type
CPU 2
Modules linked in:
Pid: 0, comm: swapper Tainted: G M D 2.6.24-rc6-mm1-autokern1 #1
RIP: 0010:[<ffffffff80369c2b>] [<ffffffff80369c2b>] rb_next+0x1e/0x4f
RSP: 0000:ffff81017ff3be10 EFLAGS: 00010002
RAX: 0000000000000002 RBX: ffff8101000332a8 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff8101000332a8 RDI: 0000000000000002
RBP: ffff81017ff3be10 R08: 00000000000001e8 R09: 0000000000000086
R10: 0000000000000001 R11: 00000000000001e8 R12: ffff8101fe71dec8
R13: 0000000000000002 R14: ffff810100033260 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffff81017ff0e000(0000) knlGS:00000000f7ea3b80
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000012 CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff8100e3b4a000, task ffff81007ff6c000)
Stack: ffff81017ff3be40 ffffffff8024be06 000000008020bb29 000000008020bb29
ffff8101fe71dec8 ffff8101000332a8 ffff81017ff3beb0 ffffffff8024c1d5
ffffffffb88cc1cc 000000000848661b ffffffffb88cc1cc 000000000848661b
Call Trace:
<IRQ> [<ffffffff8024be06>] __remove_hrtimer+0x1e/0x3c
[<ffffffff8024c1d5>] hrtimer_run_queues+0x130/0x191
[<ffffffff8023fd09>] run_timer_softirq+0x28/0x1a7
[<ffffffff8023c018>] __do_softirq+0x55/0xc2
[<ffffffff8020c73c>] call_softirq+0x1c/0x28
[<ffffffff8020e719>] do_softirq+0x32/0x9d
[<ffffffff8023c0dd>] irq_exit+0x3f/0x41
[<ffffffff8021ff85>] smp_apic_timer_interrupt+0x92/0xa7
[<ffffffff8020c1e6>] apic_timer_interrupt+0x66/0x70
<EOI> [<ffffffff802095f5>] default_idle+0x36/0x5e
[<ffffffff802095f0>] default_idle+0x31/0x5e
[<ffffffff802095bf>] default_idle+0x0/0x5e
[<ffffffff802096b6>] cpu_idle+0x90/0xb2
[<ffffffff8021f0c1>] start_secondary+0x3ad/0x3b9


Code: 48 83 7f 10 00 74 06 48 8b 7f 10 eb f3 48 89 f8 eb 1d 48 89
RIP [<ffffffff80369c2b>] rb_next+0x1e/0x4f
RSP <ffff81017ff3be10>
CR2: 0000000000000012


0xffffffff80369c2b is in rb_next (lib/rbtree.c:333).
328 /* If we have a right-hand child, go down and then left as far
329 as we can. */
330 if (node->rb_right) {
331 node = node->rb_right;
332 while (node->rb_left)
333 node=node->rb_left;
334 return node;
335 }
336
337 /* No right-hand children. Everything down and left is
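
The walk in that listing is the usual in-order successor search. A
self-contained sketch of it, using a plain node with explicit parent, left and
right pointers instead of the kernel's packed rb_node (the struct and the
function name are made up for illustration):

```c
#include <stddef.h>

struct node {
	struct node *parent;
	struct node *left;
	struct node *right;
};

/* Returns the next node in sort order, or NULL if n is the last one. */
static struct node *next_in_order(struct node *n)
{
	struct node *parent;

	/* Right subtree present: the successor is its leftmost node. */
	if (n->right) {
		n = n->right;
		while (n->left)
			n = n->left;
		return n;
	}

	/* Otherwise climb until we arrive at a parent from its left child. */
	while ((parent = n->parent) != NULL && n == parent->right)
		n = parent;
	return parent;
}
```

Every step dereferences a pointer read from the tree, so one corrupted child
pointer is enough to fault. In the rb_next oops RDI is 0x2 and CR2 is 0x12,
which would fit a bogus node value of 2 being followed, but I have not
confirmed the field offsets.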

Unable to handle kernel paging request at 000000008020bb81 RIP:
[<ffffffff80242abd>] exit_signals+0x27/0x10a
PGD 1ff102067 PUD 0
Oops: 0000 [5] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:0a.0/0000:02:04.0/host0/target0:0:6/0:0:6:0/type
CPU 3
Modules linked in:
Pid: 16511, comm: ,@ Tainted: G M D 2.6.24-rc6-mm1-autokern1 #1
RIP: 0010:[<ffffffff80242abd>] [<ffffffff80242abd>] exit_signals+0x27/0x10a
RSP: 0000:ffff8101fe6e3cf8 EFLAGS: 00010003
RAX: 000000008020bb29 RBX: 0000000000000046 RCX: 00000000ffffffff
RDX: ffff8101fe6e4000 RSI: 0000000000000000 RDI: ffff8101fe6e4000
RBP: ffff8101fe6e3d18 R08: 0000000000000000 R09: ffffffff80662540
R10: ffffffff80662540 R11: ffff810004ab9740 R12: ffff8101fe6e4000
R13: 0000000000000000 R14: ffff8101fe6e4000 R15: ffff8101fe6e3e88
FS: 0000000000000000(0000) GS:ffff8101fff13000(0063) knlGS:00000000f7d4a080
CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 000000008020bb81 CR3: 00000001ff1f2000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ,@ (pid: 16511, threadinfo 00000000ffffffff, task ffff8101fe6e4000)
Stack: ffff8101fe6e4000 0000000000000046 ffff8101fe6e4000 0000000000000009
ffff8101fe6e3d68 ffffffff80239b8c ffff8101fe6e3d48 ffffffff803c241d
0000000000000046 0000000000000046 ffff8101fe6e3e88 0000000000000009
Call Trace:


Code: f6 40 58 08 75 07 48 83 78 48 00 74 0b 41 83 4c 24 14 04 e9
RIP [<ffffffff80242abd>] exit_signals+0x27/0x10a
RSP <ffff8101fe6e3cf8>
CR2: 000000008020bb81

0xffffffff80242abd is in exit_signals (include/linux/sched.h:555).
550 #define SIGNAL_GROUP_EXIT 0x00000008 /* group exit in progress */
551
552 /* If true, all threads except ->group_exit_task have pending SIGKILL */
553 static inline int signal_group_exit(const struct signal_struct *sig)
554 {
555 return (sig->flags & SIGNAL_GROUP_EXIT) ||
556 (sig->group_exit_task != NULL);
557 }
558
559 /*
--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
Andrew Morton
2007-12-27 10:00:09 UTC
Post by Kamalesh Babulal
Hi Andrew,
The 2.6.24-rc6-mm1 kernel with hotfix x86-fix-system-gate-related-crash.patch applied
panics while booting on an x86_64 box
[<ffffffff80369a0b>] rb_erase+0xe7/0x2a3
PGD 17ff65067 PUD 17f1c7067 PMD 0
Oops: 0000 [1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:0a.0/0000:02:04.0/host0/target0:0:6/0:0:6:0/type
CPU 0
Pid: 0, comm: swapper Not tainted 2.6.24-rc6-mm1-autokern1 #1
RIP: 0010:[<ffffffff80369a0b>] [<ffffffff80369a0b>] rb_erase+0xe7/0x2a3
RSP: 0000:ffffffff80650e00 EFLAGS: 00010002
RAX: ffff8101fe9568c8 RBX: ffff8100010062a8 RCX: ffff8101fe9568b0
RDX: ffff8101fe9568c8 RSI: 0000000000000046 RDI: 0000000000000000
RBP: ffffffff80650e10 R08: ffff8101fe9568c8 R09: 0000000000000086
R10: 0000000000000000 R11: 00000000000001e8 R12: ffff8100010062b8
R13: 0000000000000002 R14: ffff810001006260 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffffffff805dc000(0000) knlGS:00000000f31ffbb0
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000046 CR3: 000000017f0ab000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffffffff805f6000, task ffffffff805a2080)
Stack: ffff8100010062a8 ffff8101fe9568b0 ffffffff80650e40 ffffffff8024be16
ffffffff80369d65 ffffffff80369d65 ffff8101fe9568b0 ffff8100010062a8
ffffffff80650eb0 ffffffff8024c1d5 ffffffffb88cc28e 0000000006e73eff
<IRQ> [<ffffffff8024be16>] __remove_hrtimer+0x2e/0x3c
[<ffffffff80369d65>] __down_read_trylock+0x16/0x42
[<ffffffff80369d65>] __down_read_trylock+0x16/0x42
[<ffffffff8024c1d5>] hrtimer_run_queues+0x130/0x191
[<ffffffff8023fd09>] run_timer_softirq+0x28/0x1a7
[<ffffffff8023c018>] __do_softirq+0x55/0xc2
[<ffffffff8020c73c>] call_softirq+0x1c/0x28
[<ffffffff8020e719>] do_softirq+0x32/0x9d
[<ffffffff8023c0dd>] irq_exit+0x3f/0x41
[<ffffffff8021ff85>] smp_apic_timer_interrupt+0x92/0xa7
[<ffffffff8020c1e6>] apic_timer_interrupt+0x66/0x70
<EOI> [<ffffffff802095f5>] default_idle+0x36/0x5e
[<ffffffff802095f0>] default_idle+0x31/0x5e
[<ffffffff802095bf>] default_idle+0x0/0x5e
[<ffffffff802096b6>] cpu_idle+0x90/0xb2
[<ffffffff804b0126>] rest_init+0x5a/0x5c
[<ffffffff806017ee>] start_kernel+0x2b8/0x2c4
[<ffffffff8060112b>] _sinittext+0x12b/0x132
It does seem to be mostly hrtimer-related. But surely the hrtimer system
is initialised by the time this happens.

The usual refrain: is it possible to run a bisection search?
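
For anyone who has not bisected a patch series before, the search itself is
just a binary search over the ordered list of patches. The sketch below is a
generic illustration and not the -mm bisection procedure: the function names
and the callback are invented, and it assumes the tree with no patches applied
is good, the tree with all patches applied is bad, and the bug stays present
once the culprit has been applied.

```c
#include <stddef.h>

/*
 * Hypothetical test callback: returns non-zero when a tree with the
 * first `applied` patches of the series shows the bug. In real life
 * this is a rebuild and a reboot.
 */
typedef int (*shows_bug_fn)(size_t applied, void *ctx);

/*
 * Returns the 0-based index of the first bad patch in a series of n
 * patches and stores the number of test runs in *steps.
 */
static size_t bisect_first_bad(size_t n, shows_bug_fn shows_bug,
			       void *ctx, unsigned *steps)
{
	size_t good = 0;	/* known good with this many patches applied */
	size_t bad = n;		/* known bad with this many patches applied */

	*steps = 0;
	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		(*steps)++;
		if (shows_bug(mid, ctx))
			bad = mid;
		else
			good = mid;
	}
	return bad - 1;
}

/* Example oracle: the bug appears once patch number *ctx is applied. */
static int bug_after(size_t applied, void *ctx)
{
	size_t culprit = *(size_t *)ctx;

	return applied > culprit;
}
```

Each test halves the remaining range, so a series of about a thousand patches
needs about ten builds.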
Kamalesh Babulal
2007-12-27 10:30:13 UTC
Post by Andrew Morton
Post by Kamalesh Babulal
Hi Andrew,
The 2.6.24-rc6-mm1 kernel with hotfix x86-fix-system-gate-related-crash.patch applied
panics while booting on an x86_64 box
It does seem to be mostly hrtimer-related. But surely the hrtimer system
is initialised by the time this happens.
The usual refrain: is it possible to run a bisection search?
I will do the bisect and update.
--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
Kamalesh Babulal
2007-12-28 09:20:07 UTC
Post by Andrew Morton
Post by Kamalesh Babulal
Hi Andrew,
The 2.6.24-rc6-mm1 kernel with hotfix x86-fix-system-gate-related-crash.patch applied
panics while booting on an x86_64 box
It does seem to be mostly hrtimer-related. But surely the hrtimer system
is initialised by the time this happens.
The usual refrain: is it possible to run a bisection search?
Hi Andrew,

While doing the git bisect, the following panic was seen:

Unable to handle kernel paging request at 000000000000401e RIP:
[<ffffffff80232ec8>] load_balance_monitor+0x15e/0x2a4
PGD 0
Oops: 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:0a.0/0000:02:04.0/host0/target0:0:6/0:0:6:0/type
CPU 1
Modules linked in:
Pid: 15, comm: load_balance_mo Not tainted 2.6.24-rc6-mm1-autokern1 #1
RIP: 0010:[<ffffffff80232ec8>] [<ffffffff80232ec8>] load_balance_monitor+0x15e/0x2a4
RSP: 0000:ffff81007ffb7eb0 EFLAGS: 00010297
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
RDX: 000000000000401e RSI: ffff81007ffb7ed8 RDI: 0000000000000000
RBP: ffff81007ffb7f20 R08: ffff81007ffb6000 R09: ffff81007ffb6000
R10: ffff81007ffb6000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000003 R14: 0000000000000800 R15: ffff8101fe997f00
FS: 0000000000000000(0000) GS:ffff8100e3b10000(0000) knlGS:00000000f73e1bb0
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000000000401e CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process load_balance_mo (pid: 15, threadinfo ffff81007ffb6000, task ffff81007ff94790)
Stack: 0000000000002000 0000000000000000 ffff810001009cc0 00000001e3b29d90
0000008000000000 000000000000000f ffff81007f0be780 000000000000000f
000000017ffb7f20 0000000000000000 00000000fffffffc ffffffffffffffff
Call Trace:
[<ffffffff80232d6a>] load_balance_monitor+0x0/0x2a4
[<ffffffff80247830>] kthread+0x3d/0x63
[<ffffffff8020c2b8>] child_rip+0xa/0x12
[<ffffffff802477f3>] kthread+0x0/0x63
[<ffffffff8020c2ae>] child_rip+0x0/0x12


Code: 48 8b 04 c2 48 8b 10 48 01 55 98 e8 ce 40 12 00 83 f8 07 41
RIP [<ffffffff80232ec8>] load_balance_monitor+0x15e/0x2a4
RSP <ffff81007ffb7eb0>
CR2: 000000000000401e


git-sched.patch is causing this panic. I am still searching for the patch
causing the hrtimer-related panic.
--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
Dhaval Giani
2007-12-28 13:10:14 UTC
Post by Kamalesh Babulal
While doing the git bisect, the following panic was seen:
[<ffffffff80232ec8>] load_balance_monitor+0x15e/0x2a4
PGD 0
Oops: 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:0a.0/0000:02:04.0/host0/target0:0:6/0:0:6:0/type
CPU 1
Pid: 15, comm: load_balance_mo Not tainted 2.6.24-rc6-mm1-autokern1 #1
RIP: 0010:[<ffffffff80232ec8>] [<ffffffff80232ec8>] load_balance_monitor+0x15e/0x2a4
RSP: 0000:ffff81007ffb7eb0 EFLAGS: 00010297
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
RDX: 000000000000401e RSI: ffff81007ffb7ed8 RDI: 0000000000000000
RBP: ffff81007ffb7f20 R08: ffff81007ffb6000 R09: ffff81007ffb6000
R10: ffff81007ffb6000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000003 R14: 0000000000000800 R15: ffff8101fe997f00
FS: 0000000000000000(0000) GS:ffff8100e3b10000(0000) knlGS:00000000f73e1bb0
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000000000401e CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process load_balance_mo (pid: 15, threadinfo ffff81007ffb6000, task ffff81007ff94790)
Stack: 0000000000002000 0000000000000000 ffff810001009cc0 00000001e3b29d90
0000008000000000 000000000000000f ffff81007f0be780 000000000000000f
000000017ffb7f20 0000000000000000 00000000fffffffc ffffffffffffffff
[<ffffffff80232d6a>] load_balance_monitor+0x0/0x2a4
[<ffffffff80247830>] kthread+0x3d/0x63
[<ffffffff8020c2b8>] child_rip+0xa/0x12
[<ffffffff802477f3>] kthread+0x0/0x63
[<ffffffff8020c2ae>] child_rip+0x0/0x12
Code: 48 8b 04 c2 48 8b 10 48 01 55 98 e8 ce 40 12 00 83 f8 07 41
RIP [<ffffffff80232ec8>] load_balance_monitor+0x15e/0x2a4
RSP <ffff81007ffb7eb0>
CR2: 000000000000401e
Hmmm. Looking into it :-).
--
regards,
Dhaval
V***@vt.edu
2007-12-27 17:20:09 UTC
Post by Andrew Morton
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc6/2.6.24-rc6-mm1/
Happened to be looking more closely than usual at my dmesg looking for something
else, and spotted this:

[ 6.079043] power_supply BAT0: 11 dynamic props
[ 6.079045] power_supply BAT0: prop STATUS=Full
[ 6.079047] power_supply BAT0: prop PRESENT=1
[ 6.079049] power_supply BAT0: prop TECHNOLOGY=Li-ion
[ 6.079052] power_supply BAT0: prop VOLTAGE_MIN_DESIGN=11100000
[ 6.079054] power_supply BAT0: prop VOLTAGE_NOW=11793000
[ 6.079056] power_supply BAT0: prop CURRENT_NOW=1000
[ 6.079058] power_supply BAT0: prop CHARGE_FULL_DESIGN=7800000
[ 6.079061] power_supply BAT0: prop CHARGE_FULL=3110000
[ 6.079063] power_supply BAT0: prop CHARGE_NOW=7800000
[ 6.079065] power_supply BAT0: prop TIME_TO_FULL_AVG=DELL FF2316
[ 6.079067] power_supply BAT0: prop MODEL_NAME=Sanyo
[ 6.079301] ACPI: SSDT 7FE82138, 0244 (r1 PmRef Cpu0Ist 3000 INTL 20050624)
[ 6.079488] ACPI: SSDT 7FE81EED, 01C6 (r1 PmRef Cpu0Cst 3001 INTL 20050624)

What's with that TIME_TO_FULL_AVG value? Is the battery on crack, or my BIOS,
or the driver? I expected time units, not a Dell part number ;) (Yes, I know
CHARGE_FULL is low, the battery is pretty beat, and Latitudes seem to always
report CHARGE_NOW as "design full" when running off the AC power brick)

(For the record, dmidecode says:

Handle 0x1600, DMI type 22, 26 bytes
Portable Battery
Location: Sys. Battery Bay
Manufacturer: Sanyo
Name: DELL FF2316A
Design Capacity: 78000 mWh
Design Voltage: 11100 mV
SBDS Version: 1.0
Maximum Error: 0%
SBDS Serial Number: 0355
SBDS Manufacture Date: 2006-10-26
SBDS Chemistry: LION
OEM-specific Information: 0x00000001

So it even managed to lose the trailing 'A'.. ;)
V***@vt.edu
2007-12-27 18:00:12 UTC
Post by Andrew Morton
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc6/2.6.24-rc6-mm1/
I seem to be on a roll here... :)

X86_64 kernel, Dell Latitude D820, Core2 T7200 processor...

(Yes, I know it's tainted. If I *have* to, I'll try to get an untainted one,
which will be a pain - the oss 'nv' driver is a tad busticated on my box at
the moment).

Took me a bunch of hours to find a way to repeat this one, but it seems
pretty consistent - I exit an 'Eterm' process and ka-blam. First guess
I have is that we hit a race and the timer popped right in the middle of
a process exiting and we try to update process times on an already-defunct
process table entry?

This ring any bells or brown-paper-bag D'Oh!s before I go bisecting?

[15345.901919] Unable to handle kernel paging request at 000000af008c00cd RIP:
[15345.901934] [<ffffffff802310d9>] scheduler_tick+0xdb/0x1c4
[15345.901952] PGD 0
[15345.901959] Oops: 0000 [1] PREEMPT SMP
[15345.901972] last sysfs file: /sys/devices/platform/coretemp.1/temp1_input
[15345.901978] CPU 1
[15345.901984] Modules linked in: irnet ppp_generic slhc irtty_sir sir_dev ircomm_tty ircomm irda crc_ccitt coretemp nf_conntrack_ftp xt_pkttype ipt_REJECT ipt_osf nf_conntrack_ipv4 xt_ipisforif ipt_recent ipt_LOG xt_u32 iptable_filter ip_tables xt_tcpudp nf_conntrack_ipv6 xt_state nf_conntrack ip6t_LOG xt_limit ip6table_filter ip6_tables x_tables sha256_generic aes_generic acpi_cpufreq tpm_tis pcmcia gspca(U) iwl3945 firmware_class iTCO_wdt yenta_socket compat_ioctl32 ohci1394 rsrc_nonstatic iTCO_vendor_support mac80211 ieee1394 pcmcia_core nvidia(P)(U) watchdog_core battery videodev ac watchdog_dev v4l2_common snd_hda_intel v4l1_compat thermal power_supply cfg80211 button intel_agp processor rtc
[15345.902170] Pid: 0, comm: Eterm Tainted: P 2.6.24-rc6-mm1 #4
[15345.902176] RIP: 0010:[<ffffffff802310d9>] [<ffffffff802310d9>] scheduler_tick+0xdb/0x1c4
[15345.902189] RSP: 0018:ffff81007f8a3eb8 EFLAGS: 00010083
[15345.902195] RAX: 000000af008c005d RBX: 00000df4fefac7d9 RCX: 0000000000000004
[15345.902202] RDX: 0000000000000004 RSI: ffff81007e405600 RDI: ffff810001011180
[15345.902208] RBP: ffff81007f8a3ed8 R08: 0000000000000010 R09: 0000000000000001
[15345.902214] R10: ffff8100808f4000 R11: ffffffff8071d180 R12: ffff810001011180
[15345.902219] R13: 0000000000000001 R14: ffff81007e405600 R15: 0000000000000001
[15345.902226] FS: 0000000000000000(0000) GS:ffff81007f86f9c0(0063) knlGS:00000000f7dab6c0
[15345.902233] CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
[15345.902238] CR2: 000000af008c00cd CR3: 00000000766c8000 CR4: 00000000000006e0
[15345.902244] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15345.902249] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[15345.902256] Process Eterm (pid: 0, threadinfo 4394404dffffffff, task ffff81007e405600)
[15345.902261] Stack: 0000000000000001 0000000000000000 ffff81007e405600 00000c71d7bbe7da
[15345.902279] ffff81007f8a3f08 ffffffff8023f913 ffff81007f8a3f08 ffff81000100e060
[15345.902293] ffff810077161b08 ffff81000100df60 ffff81007f8a3f38 ffffffff80251b7d
[15345.902306] Call Trace:
[15345.902312] <IRQ> [<ffffffff8023f913>] update_process_times+0x4a/0x5b
[15345.902334] [<ffffffff80251b7d>] tick_sched_timer+0x8e/0xcb
[15345.902345] [<ffffffff8024ce2c>] hrtimer_interrupt+0x111/0x1a1
[15345.902357] [<ffffffff8022733e>] ia32_setup_frame+0xb5/0x1b7
[15345.902367] [<ffffffff8021f174>] smp_apic_timer_interrupt+0x86/0xa6
[15345.902377] [<ffffffff8020cca6>] apic_timer_interrupt+0x66/0x70
[15345.902383] <EOI>
[15345.902389]
[15345.902390] Code: ff 50 70 4c 89 e7 e8 4a 2d 2f 00 44 89 ef e8 85 9f ff ff 41
[15345.902445] RIP [<ffffffff802310d9>] scheduler_tick+0xdb/0x1c4
[15345.902455] RSP <ffff81007f8a3eb8>
[15345.902461] CR2: 000000af008c00cd
Andrew Morton
2007-12-28 07:40:07 UTC
Post by V***@vt.edu
Post by Andrew Morton
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc6/2.6.24-rc6-mm1/
I seem to be on a roll here... :)
Yup. But please do try to get the cc's right. Especially when I'm lying
on a beach.
Post by V***@vt.edu
X86_64 kernel, Dell Latitude D820, Core2 T7200 processor...
(Yes, I know it's tainted. If I *have* to, I'll try to get an untainted one,
which will be a pain - the oss 'nv' driver is a tad busticated on my box at
the moment).
I doubt if the nvidia driver is involved in this.
Post by V***@vt.edu
Took me a bunch of hours to find a way to repeat this one, but it seems
pretty consistent - I exit an 'Eterm' process and ka-blam. First guess
I have is that we hit a race and the timer popped right in the middle of
a process exiting and we try to update process times on an already-defunct
process table entry?
This ring any bells or brown-paper-bag D'Oh!s before I go bisecting?
[15345.901934] [<ffffffff802310d9>] scheduler_tick+0xdb/0x1c4
[15345.901952] PGD 0
[15345.901959] Oops: 0000 [1] PREEMPT SMP
[15345.901972] last sysfs file: /sys/devices/platform/coretemp.1/temp1_input
[15345.901978] CPU 1
[15345.901984] Modules linked in: irnet ppp_generic slhc irtty_sir sir_dev ircomm_tty ircomm irda crc_ccitt coretemp nf_conntrack_ftp xt_pkttype ipt_REJECT ipt_osf nf_conntrack_ipv4 xt_ipisforif ipt_recent ipt_LOG xt_u32 iptable_filter ip_tables xt_tcpudp nf_conntrack_ipv6 xt_state nf_conntrack ip6t_LOG xt_limit ip6table_filter ip6_tables x_tables sha256_generic aes_generic acpi_cpufreq tpm_tis pcmcia gspca(U) iwl3945 firmware_class iTCO_wdt yenta_socket compat_ioctl32 ohci1394 rsrc_nonstatic iTCO_vendor_support mac80211 ieee1394 pcmcia_core nvidia(P)(U) watchdog_core battery videodev ac watchdog_dev v4l2_common snd_hda_intel v4l1_compat thermal power_supply cfg80211 button intel_agp processor rtc
[15345.902170] Pid: 0, comm: Eterm Tainted: P 2.6.24-rc6-mm1 #4
[15345.902176] RIP: 0010:[<ffffffff802310d9>] [<ffffffff802310d9>] scheduler_tick+0xdb/0x1c4
[15345.902189] RSP: 0018:ffff81007f8a3eb8 EFLAGS: 00010083
[15345.902195] RAX: 000000af008c005d RBX: 00000df4fefac7d9 RCX: 0000000000000004
[15345.902202] RDX: 0000000000000004 RSI: ffff81007e405600 RDI: ffff810001011180
[15345.902208] RBP: ffff81007f8a3ed8 R08: 0000000000000010 R09: 0000000000000001
[15345.902214] R10: ffff8100808f4000 R11: ffffffff8071d180 R12: ffff810001011180
[15345.902219] R13: 0000000000000001 R14: ffff81007e405600 R15: 0000000000000001
[15345.902226] FS: 0000000000000000(0000) GS:ffff81007f86f9c0(0063) knlGS:00000000f7dab6c0
[15345.902233] CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
[15345.902238] CR2: 000000af008c00cd CR3: 00000000766c8000 CR4: 00000000000006e0
[15345.902244] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15345.902249] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[15345.902256] Process Eterm (pid: 0, threadinfo 4394404dffffffff, task ffff81007e405600)
[15345.902261] Stack: 0000000000000001 0000000000000000 ffff81007e405600 00000c71d7bbe7da
[15345.902279] ffff81007f8a3f08 ffffffff8023f913 ffff81007f8a3f08 ffff81000100e060
[15345.902293] ffff810077161b08 ffff81000100df60 ffff81007f8a3f38 ffffffff80251b7d
[15345.902312] <IRQ> [<ffffffff8023f913>] update_process_times+0x4a/0x5b
[15345.902334] [<ffffffff80251b7d>] tick_sched_timer+0x8e/0xcb
[15345.902345] [<ffffffff8024ce2c>] hrtimer_interrupt+0x111/0x1a1
[15345.902357] [<ffffffff8022733e>] ia32_setup_frame+0xb5/0x1b7
[15345.902367] [<ffffffff8021f174>] smp_apic_timer_interrupt+0x86/0xa6
[15345.902377] [<ffffffff8020cca6>] apic_timer_interrupt+0x66/0x70
[15345.902383] <EOI>
[15345.902389]
[15345.902390] Code: ff 50 70 4c 89 e7 e8 4a 2d 2f 00 44 89 ef e8 85 9f ff ff 41
[15345.902445] RIP [<ffffffff802310d9>] scheduler_tick+0xdb/0x1c4
[15345.902455] RSP <ffff81007f8a3eb8>
[15345.902461] CR2: 000000af008c00cd
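
One way to read the register dump above: the faulting address in CR2 lies exactly 0x70 past the value in RAX, and the first three bytes on the Code: line (ff 50 70) decode as an indirect call through 0x70(%rax). That would be consistent with a call through a bad pointer held in RAX, but it is only a reading of the dump, not a diagnosis; it also assumes the Code: line as archived starts at the faulting instruction. The sketch below is user-space C with hypothetical helper names (fault_offset, is_call_disp8_rax); only the register and byte values come from the oops.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the oops above. */
#define OOPS_RAX 0x000000af008c005dULL
#define OOPS_CR2 0x000000af008c00cdULL

/* Distance from a base register value to the faulting address. */
static uint64_t fault_offset(uint64_t cr2, uint64_t base)
{
	assert(cr2 >= base);	/* only meaningful if the fault lies above base */
	return cr2 - base;
}

/*
 * Returns 1 if the three bytes encode "call *disp8(%rax)" and stores the
 * displacement: opcode 0xff, ModRM with mod=01 (disp8), reg=2 (CALL),
 * rm=0 (RAX). Returns 0 otherwise.
 */
static int is_call_disp8_rax(const uint8_t *code, uint8_t *disp)
{
	uint8_t modrm = code[1];

	if (code[0] != 0xff)
		return 0;
	if ((modrm >> 6) != 1 || ((modrm >> 3) & 7) != 2 || (modrm & 7) != 0)
		return 0;
	*disp = code[2];
	return 1;
}
```

Calling fault_offset(OOPS_CR2, OOPS_RAX) gives the displacement to compare against the one is_call_disp8_rax() extracts from the first three Code: bytes.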
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
V***@vt.edu
2007-12-30 03:10:07 UTC
Permalink
In case it makes a difference, the Eterm that causes the issue on exit is
a 32-bit binary, with a 64-bit kernel (though I did have one kernel lockup
with xpdf, which is a 64-bit binary, but I can't prove that was/wasn't this
same issue)....

Bisection says:

git-ipwireless_cs.patch GOOD
#
git-x86.patch
git-x86-fixup.patch
git-x86-arch-x86-math-emu-errorsc-fix-printk-warnings.patch
git-x86-drivers-pnp-pnpbios-bioscallsc-build-fix.patch
git-x86-fix-doubly-merged-patch.patch
git-x86-export-leave_mm.patch BAD

and that's where bisection comes to a halt...

Time to bisect through git-x86, or somebody got a better idea? Looking at
the commits listed in git-x86.patch, I didn't see anything that jumped out,
but I'm pretty sure the problem is in there somewhere...
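
The bisection reported above amounts to a binary search over the ordered patch series for the first patch at which the kernel goes bad. A minimal user-space sketch, assuming the series is ordered, the starting entry is known good, the ending entry is known bad, and the bug stays present once introduced. The names first_bad, probe_fn and bad_from are hypothetical; in practice each probe is a rebuild and reboot rather than a function call.

```c
#include <assert.h>
#include <stddef.h>

/* Probe callback: nonzero if the tree with patches [0..idx] applied shows the bug. */
typedef int (*probe_fn)(size_t idx, void *ctx);

/*
 * Returns the index of the first bad patch.
 * Precondition: patch `good` tests good, patch `bad` tests bad, good < bad.
 */
static size_t first_bad(size_t good, size_t bad, probe_fn is_bad, void *ctx)
{
	assert(good < bad);
	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (is_bad(mid, ctx))
			bad = mid;	/* bug already present at mid */
		else
			good = mid;	/* bug introduced after mid */
	}
	return bad;
}

/* Stand-in probe for demonstration: the bug appears at index *ctx and stays. */
static int bad_from(size_t idx, void *ctx)
{
	return idx >= *(size_t *)ctx;
}
```

With the seven entries listed above indexed 0 to 6, first_bad(0, 6, ...) needs at most three probes.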
Randy Dunlap
2007-12-31 18:10:10 UTC
Permalink
From: Randy Dunlap <***@oracle.com>

When CONFIG_PREEMPT_NONE=y, scatterwalk.h still uses cond_resched()
so it needs to include sched.h:

linux-2.6.24-rc6-mm1/include/crypto/scatterwalk.h:52: error: implicit declaration of function 'cond_resched'
make[2]: *** [crypto/digest.o] Error 1

Signed-off-by: Randy Dunlap <***@oracle.com>
---
 include/crypto/scatterwalk.h | 1 +
 1 file changed, 1 insertion(+)

--- linux-2.6.24-rc6-mm1.orig/include/crypto/scatterwalk.h
+++ linux-2.6.24-rc6-mm1/include/crypto/scatterwalk.h
@@ -23,6 +23,7 @@
 #include <linux/kernel.h>
 #include <linux/mm.h>
 #include <linux/scatterlist.h>
+#include <linux/sched.h>
 
 static inline enum km_type crypto_kmap_type(int out)
 {
Herbert Xu
2007-12-31 22:40:07 UTC
Permalink
Post by Randy Dunlap
When CONFIG_PREEMPT_NONE=y, scatterwalk.h still uses cond_resched()
Thanks. This is already in cryptodev-2.6.
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <***@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Randy Dunlap
2007-12-31 18:10:14 UTC
Permalink
From: Randy Dunlap <***@oracle.com>

When SYSFS=n and MODULES=y, build ends with:

linux-2.6.24-rc6-mm1/drivers/base/module.c: In function 'module_add_driver':
linux-2.6.24-rc6-mm1/drivers/base/module.c:49: error: 'module_kset' undeclared (first use in this function)
make[3]: *** [drivers/base/module.o] Error 1

Below is one possible fix.
Build-tested with all 4 config combinations of SYSFS & MODULES.

Signed-off-by: Randy Dunlap <***@oracle.com>
---
 drivers/base/Makefile | 2 ++
 drivers/base/base.h   | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

--- linux-2.6.24-rc6-mm1.orig/drivers/base/Makefile
+++ linux-2.6.24-rc6-mm1/drivers/base/Makefile
@@ -11,7 +11,9 @@ obj-$(CONFIG_FW_LOADER) += firmware_clas
 obj-$(CONFIG_NUMA) += node.o
 obj-$(CONFIG_MEMORY_HOTPLUG_SPARSE) += memory.o
 obj-$(CONFIG_SMP) += topology.o
+ifeq ($(CONFIG_SYSFS),y)
 obj-$(CONFIG_MODULES) += module.o
+endif
 obj-$(CONFIG_SYS_HYPERVISOR) += hypervisor.o
 
 ifeq ($(CONFIG_DEBUG_DRIVER),y)
--- linux-2.6.24-rc6-mm1.orig/drivers/base/base.h
+++ linux-2.6.24-rc6-mm1/drivers/base/base.h
@@ -79,7 +79,7 @@ extern char *make_class_name(const char
 
 extern int devres_release_all(struct device *dev);
 
-#ifdef CONFIG_MODULES
+#if defined(CONFIG_MODULES) && defined(CONFIG_SYSFS)
 extern void module_add_driver(struct module *mod, struct device_driver *drv);
 extern void module_remove_driver(struct device_driver *drv);
 #else
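
The base.h change above narrows the guard so that the real prototypes are visible only when both options are set; in every other configuration callers fall through to the #else branch. A minimal user-space sketch of that pattern, assuming the #else branch holds empty inline stubs (the hunk does not show it). The struct contents, the stub bodies and register_example are placeholders for illustration, not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder types; the real ones live in the kernel headers. */
struct module { int unused; };
struct device_driver { int unused; };

/*
 * Neither CONFIG_MODULES nor CONFIG_SYSFS is defined in this sketch,
 * so the #else branch below is the one that gets compiled.
 */
#if defined(CONFIG_MODULES) && defined(CONFIG_SYSFS)
extern void module_add_driver(struct module *mod, struct device_driver *drv);
extern void module_remove_driver(struct device_driver *drv);
#else
/* Assumed no-op stubs, so callers build without module.o being linked. */
static inline void module_add_driver(struct module *mod,
				     struct device_driver *drv)
{
	(void)mod;
	(void)drv;
}

static inline void module_remove_driver(struct device_driver *drv)
{
	(void)drv;
}
#endif

/* Hypothetical caller: the same source compiles in either configuration. */
static int register_example(struct module *mod, struct device_driver *drv)
{
	assert(mod != NULL && drv != NULL);
	module_add_driver(mod, drv);
	module_remove_driver(drv);
	return 0;
}
```

The Makefile half of the patch is what keeps module.o, which references module_kset, out of the build when SYSFS=n; the header guard then has to match it, otherwise callers would link against functions that are never built.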
David Brownell
2007-12-31 18:50:08 UTC
Permalink
CC drivers/input/keyboard/gpio_keys.o
include2/asm/gpio.h:4:18: error: gpio.h: No such file or directory
Find whatever broken patch selected (on x86_64)

CONFIG_GENERIC_GPIO=y

without actually providing that support (by providing <asm/gpio.h> and
an implementation backing it up). That's the patch which broke those
various GPIO-dependent drivers.

- Dave


Randy Dunlap
2007-12-31 19:20:06 UTC
Permalink
Post by David Brownell
CC drivers/input/keyboard/gpio_keys.o
include2/asm/gpio.h:4:18: error: gpio.h: No such file or directory
Find whatever broken patch selected (on x86_64)
CONFIG_GENERIC_GPIO=y
without actually providing that support (by providing <asm/gpio.h> and
an implementation backing it up). That's the patch which broke those
various GPIO-dependent drivers.
OK, thanks for the direction.

---

From: Randy Dunlap <***@oracle.com>

X86_RDC321X is X86_32, so make it depend on X86_32 so that
X86_64 random configs don't try to build RDC and fail.

Signed-off-by: Randy Dunlap <***@oracle.com>
---
 arch/x86/Kconfig | 1 +
 1 file changed, 1 insertion(+)

--- linux-2.6.24-rc6-mm1.orig/arch/x86/Kconfig
+++ linux-2.6.24-rc6-mm1/arch/x86/Kconfig
@@ -297,6 +297,7 @@ config X86_ES7000
 
 config X86_RDC321X
 	bool "RDC R-321x SoC"
+	depends on X86_32
 	select M486
 	select X86_REBOOTFIXUPS
 	select GENERIC_GPIO
Ingo Molnar
2008-01-01 16:00:19 UTC
Permalink
Post by Randy Dunlap
Post by David Brownell
Find whatever broken patch selected (on x86_64)
CONFIG_GENERIC_GPIO=y
without actually providing that support (by providing <asm/gpio.h> and
an implementation backing it up). That's the patch which broke those
various GPIO-dependent drivers.
OK, thanks for the direction.
---
X86_RDC321X is X86_32, so make it depend on X86_32 so that
X86_64 random configs don't try to build RDC and fail.
thanks Randy, i have applied your fix to x86.git.

Ingo
Michael Krufky
2007-12-31 19:10:10 UTC
Permalink
MODPOST 120 modules
ERROR: "i2c_attach_client" [drivers/media/video/v4l2-common.ko] undefined!
make[2]: *** [__modpost] Error 1
---
~Randy
desserts: http://www.xenotime.net/linux/recipes/
http://linuxtv.org/hg/v4l-dvb/rev/64e0c78821c4
Hmm, I apologize -- I think this was an unrelated issue. Sorry for the
confusion.

-Mike
Michael Krufky
2007-12-31 19:10:10 UTC
Permalink
MODPOST 120 modules
ERROR: "i2c_attach_client" [drivers/media/video/v4l2-common.ko] undefined!
make[2]: *** [__modpost] Error 1
---
~Randy
desserts: http://www.xenotime.net/linux/recipes/
I fixed this problem in this changeset:

http://linuxtv.org/hg/v4l-dvb/rev/64e0c78821c4

Mauro, can you send this upstream?

for mm: here is the raw patch:

http://linuxtv.org/hg/v4l-dvb/raw-rev/64e0c78821c4

Regards,

Mike
Randy Dunlap
2007-12-31 20:20:10 UTC
Permalink
Post by Andrew Morton
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc6/2.6.24-rc6-mm1/
With CONFIG_BLOCK=n:

LD drivers/block/built-in.o
/local/linsrc/linux-2.6.24-rc6-mm1/drivers/base/core.c: In function 'device_add_class_symlinks':
/local/linsrc/linux-2.6.24-rc6-mm1/drivers/base/core.c:707: error: 'part_type' undeclared (first use in this function)
/local/linsrc/linux-2.6.24-rc6-mm1/drivers/base/core.c: In function 'device_remove_class_symlinks':
/local/linsrc/linux-2.6.24-rc6-mm1/drivers/base/core.c:746: error: 'part_type' undeclared (first use in this function)
make[3]: *** [drivers/base/core.o] Error 1


and that is after fixing (in some sense) the first CONFIG_BLOCK=n
problem with the patch below. Please test lots of configs,
and/or use 'make randconfig' (automated, scripted, etc.).
Maybe check Documentation/SubmitChecklist. :)

---

From: Randy Dunlap <***@oracle.com>

Parts of driver core use blk_lookup_devt() when CONFIG_BLOCK=n,
so provide a short inline version of it for that case.

Signed-off-by: Randy Dunlap <***@oracle.com>
---
 include/linux/genhd.h | 7 +++++++
 1 file changed, 7 insertions(+)

--- linux-2.6.24-rc6-mm1.orig/include/linux/genhd.h
+++ linux-2.6.24-rc6-mm1/include/linux/genhd.h
@@ -10,6 +10,7 @@
  */
 
 #include <linux/types.h>
+#include <linux/kdev_t.h>
 
 #ifdef CONFIG_BLOCK
 
@@ -443,6 +444,12 @@ static inline struct block_device *bdget
 
 static inline void printk_all_partitions(void) { }
 
+static inline dev_t blk_lookup_devt(const char *name)
+{
+	dev_t devt = MKDEV(0, 0);
+	return devt;
+}
+
 #endif /* CONFIG_BLOCK */
 
 #endif
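
The stub added above returns MKDEV(0, 0), a device number whose major and minor are both zero, so callers built with CONFIG_BLOCK=n always get "no device" back. A user-space sketch of the packing, assuming the 20-bit minor layout used by include/linux/kdev_t.h. The typedef and macros are local copies for illustration, and kdev_t, blk_lookup_devt_stub and check_roundtrip are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t kdev_t;	/* stand-in for the kernel's dev_t */

#define MINORBITS	20
#define MINORMASK	((1U << MINORBITS) - 1)
#define MAJOR(dev)	((unsigned int)((dev) >> MINORBITS))
#define MINOR(dev)	((unsigned int)((dev) & MINORMASK))
#define MKDEV(ma, mi)	(((ma) << MINORBITS) | (mi))

/* Mirrors the CONFIG_BLOCK=n stub: no lookup is done, the result is always 0:0. */
static inline kdev_t blk_lookup_devt_stub(const char *name)
{
	(void)name;
	return MKDEV(0, 0);
}

/*
 * Sanity check: packing then unpacking returns the original pair.
 * Valid for ma < 4096 and mi < (1 << 20), the ranges that fit in 32 bits.
 */
static inline void check_roundtrip(unsigned int ma, unsigned int mi)
{
	assert(MAJOR(MKDEV(ma, mi)) == ma);
	assert(MINOR(MKDEV(ma, mi)) == mi);
}
```

Returning the all-zero device number lets the driver core code that calls blk_lookup_devt() compile unchanged, at the cost of the lookup silently finding nothing.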