Discussion:
2.6.14-rc2-mm1
Andrew Morton
2005-09-22 05:28:39 UTC
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.14-rc2/2.6.14-rc2-mm1/

- Added git tree `git-sas.patch': Luben Tuikov's SAS driver and its support.

- Various random other things - nothing major.




Changes since 2.6.14-rc1-mm1:

linus.patch
git-cifs.patch
git-cryptodev.patch
git-drm.patch
git-ia64.patch
git-jfs.patch
git-libata-all.patch
git-mtd.patch
git-netdev-all.patch
git-nfs.patch
git-nfs-oops-fix.patch
git-ocfs2-prep.patch
git-ocfs2.patch
git-scsi-misc.patch
git-sas.patch
git-watchdog.patch

Subsystem trees

-raid6-altivec-fix.patch
-sharpsl-add-missing-hunk-from-backlight-update.patch
-mtd-update-sharpsl-partition-definitions.patch
-s390-default-configuration.patch
-s390-bl_dev-array-size.patch
-s390-crypto-driver-patch-take-2.patch
-s390-show_cpuinfo-fix.patch
-s390-diag-0x308-reipl.patch
-remove-arch-arm26-boot-compressed-hw-bsec.patch
-cpu-hotplug-breaks-wake_up_new_task.patch
-s390-kernel-stack-corruption.patch
-uml-_switch_to-code-consolidation.patch
-uml-breakpoint-an-arbitrary-thread.patch
-uml-remove-a-useless-include.patch
-uml-remove-an-unused-file.patch
-uml-remove-some-build-warnings.patch
-uml-preserve-errno-in-error-paths.patch
-uml-move-libc-code-out-of-mem_userc-and-tempfilec.patch
-uml-merge-mem_userc-and-memc.patch
-uml-return-a-real-error-code.patch
-uml-remove-include-of-asm-elfh.patch
-fix-up-some-pm_message_t-types.patch
-fix-mm-kconfig-spelling.patch
-x86_64-e820c-needs-module-h.patch
-seclvl-use-securityfs-tidy.patch
-seclvl-use-securityfs-fix.patch
-hdaps-driver-update.patch
-driver-core-fix-bus_rescan_devices-race-2.patch
-i2c-kill-an-unused-i2c_adapter-struct-member.patch
-fix-buffer-overrun-in-rpadlpar_sysfsc.patch
-ibmphp-use-dword-accessors-for-pci_rom_address.patch
-pciehp-use-dword-accessors-for-pci_rom_address.patch
-shpchp-use-dword-accessors-for-pci_rom_address.patch
-qla2xxx-use-dword-accessors-for-pci_rom_address.patch
-pci-convert-kcalloc-to-kzalloc.patch
-gregkh-usb-usb-gotemp.patch
-more-device-ids-for-option-card-driver.patch
-pcnet32-set_ringparam-implementation.patch
-pcnet32-set-min-ring-size-to-4.patch
-add-smp_mb__after_clear_bit-to-unlock_kiocb.patch
-joystick-vs-xorg-fix.patch
-codingstyle-memory-allocation.patch
-files-fix-preemption-issues.patch
-files-fix-preemption-issues-tidy.patch
-fat-miss-sync-issues-on-sync-mount-miss-sync-on-write.patch
-fix-pf-request-handling.patch
-i2o-remove-class-interface.patch
-i2o-remove-i2o_device_class.patch
-driver-core-allow-nesting-classes.patch
-driver-core-make-parent-class-define-subsystem.patch
-driver-core-pass-interface-to-class-intreface-methods.patch
-driver-core-send-hotplug-event-before-adding-class-interfaces.patch
-input-kill-devfs-references.patch
-input-prepare-to-sysfs-integration.patch
-input-convert-net-bluetooth-to-dynamic-input_dev-allocation.patch
-input-convert-drivers-macintosh-to-dynamic-input_dev-allocation.patch
-input-convert-konicawc-to-dynamic-input_dev-allocation.patch
-input-convert-onetouch-to-dynamic-input_dev-allocation.patch
-drivers-input-mouse-convert-to-dynamic-input_dev-allocation.patch
-drivers-input-keyboard-convert-to-dynamic-input_dev-allocation.patch
-drivers-input-touchscreen-convert-to-dynamic-input_dev-allocation.patch
-drivers-usb-input-convert-to-dynamic-input_dev-allocation.patch
-input-convert-ucb1x00-ts-to-dynamic-input_dev-allocation.patch
-input-convert-sound-ppc-beep-to-dynamic-input_dev-allocation.patch
-input-convert-sonypi-to-dynamic-input_dev-allocation.patch
-input-convert-driver-input-misc-to-dynamic-input_dev-allocation.patch
-drivers-input-joystick-convert-to-dynamic-input_dev-allocation.patch
-drivers-media-convert-to-dynamic-input_dev-allocation.patch
-input-show-sysfs-path-in-proc-bus-input-devices.patch
-input-export-input_dev-data-via-sysfs-attributes.patch
-input-core-implement-class-hierachy.patch
-input-core-implement-class-hierachy-hdaps-fixes.patch
-input-core-remove-custom-made-hotplug-handler.patch
-input-convert-input-handlers-to-class-interfaces.patch
-input-convert-to-seq_file.patch
-ide-fix-null-request-pointer-for-taskfile-ioctl.patch

Merged

+proc_task_root_link-c99-fix.patch
+lpfc-build-fix.patch

old gcc fixes

+hostap-fix-kbuild-warning.patch

Wrongly fix Kconfig screwup

+reboot-comment-and-factor-the-main-reboot-functions.patch
+suspend-cleanup-calling-of-power-off-methods.patch

Power management fixes

+pci_fixup_parent_subordinate_busnr-fixes.patch

PCI enumeration fix

+kdumpx86-add-note-type-nt_kdumpinfo-to-kernel-core-dumps.patch

kdump feature

+acpi-handle-fadt-20-xpmtmr-address-0-case.patch

ACPI pm_timer fix

+update-maintainers-list-with-the-kprobes-maintainers.patch

MAINTAINERS update

+v9fs-make-conv-functions-to-check-for-conv-buffer-overflow.patch
+v9fs-allocate-the-rwalk-qid-array-from-the-right-conv-buffer.patch
+v9fs-make-copy-of-the-transport-prototype-instead-of-using-it-directly.patch
+v9fs-replace-strlen-on-newly-allocated-by-__getname-buffers-to-path_max.patch
+v9fs-dont-free-root-dentry-inode-if-error-occurs-in-v9fs_get_sb.patch

v9fs updates

+ppc64-smu-driver-update-i2c-support.patch
+ppc64-smu-driver-update-i2c-support-fix.patch

Big update to the pmac platform driver

+acpi-disable-c2-c3-for-_all_-ibm-r40e-laptops-for-2613-bug-3549-update.patch

Fix acpi-disable-c2-c3-for-_all_-ibm-r40e-laptops-for-2613-bug-3549.patch

+cs5535-audio-alsa-driver.patch
+cleanup-for-cs5535-audio-driver.patch

New audio driver

+gregkh-driver-driver-ide-tape-sysfs.patch
+gregkh-driver-driver-fix-bus_rescan_devices.patch
+gregkh-driver-driver-device_is_registered.patch
+gregkh-driver-driver-fix-class-symlinks.patch

Driver tree updates

+drm_addmap_ioctl-warning-fix.patch

drm warning fix

+gregkh-i2c-i2c-maintainer.patch
+gregkh-i2c-hwmon-adm9240-update-01.patch
+gregkh-i2c-hwmon-adm9240-update-02.patch
+gregkh-i2c-hwmon-via686a-save-memory.patch

i2c tree updates

+fix-broken-nvidia-device-id-in-sata_nv.patch

SATA driver fix

+gregkh-pci-pci-remove-unused-scratch.patch
+gregkh-pci-pci-kzalloc.patch
+gregkh-pci-pci-fix-probe-warning.patch
+gregkh-pci-pci-buffer-overrun-rpaldpar.patch

PCI tree updates

+areca-raid-linux-scsi-driver-update.patch

Update areca-raid-linux-scsi-driver.patch

-scsi-sas-makefile-and-kconfig.patch
-sas_class-include-files-in-include-scsi-sas.patch
-sas-class-core-files.patch
-aic94xx-the-aic94xx-sas-lldd.patch
+git-sas.patch

Adaptec Serial Attached Storage tree

+gregkh-usb-ub-burn-cd-fix.patch
+gregkh-usb-usb-option-new-ids.patch
+gregkh-usb-usb-ftdi_sio-baud-rate-change.patch
+gregkh-usb-usb-pxa2xx_udc-build-fix.patch
+gregkh-usb-usb-sl811-minor-fixes.patch
+gregkh-usb-devfs-remove-usb-mode.patch
+gregkh-usb-usb-handoff-merge.patch
+gregkh-usb-usb-power-state-01.patch
+gregkh-usb-usb-power-state-02.patch
+gregkh-usb-usb-power-state-03.patch
+gregkh-usb-usb-power-state-04.patch
+gregkh-usb-usb-power-state-05.patch
+gregkh-usb-usb-uhci-01.patch
+gregkh-usb-usb-uhci-02.patch
+gregkh-usb-usb-gotemp.patch

USB tree updates

+gregkh-usb-usb-power-state-03-fix.patch
+gregkh-usb-usb-handoff-merge-usb-Makefile-fix.patch
+pegasus-ethernet-over-usb-driver-fixes.patch
+st5481_usb-build-fix.patch

Various USB fixes and enhancements

+x86_64-defconfig-update.patch
-x86_64-dma32-iommu.patch
-x86_64-dma32-srat32.patch
-x86_64-vm-holes-reserved.patch
+x86_64-dma32-srat32.patch
+x86_64-vm-holes-reserved.patch
+x86_64-hpet-regs.patch
+x86_64-no-idle-tick.patch
+x86_64-nohpet.patch
+x86_64-mce-thresh.patch
+x86_64-pat-base.patch

Various x86_64 tree updates

+x86_64-no-idle-tick-fix.patch
+x86_64-no-idle-tick-fix-2.patch
+x86_64-mce-thresh-fix.patch
+x86_64-mce-thresh-fix-2.patch

Fix them up.

+mm-move_pte-to-remap-zero_page-fix.patch

Fix mm-move_pte-to-remap-zero_page.patch

+eeproc-module_param_array-cleanup.patch
+b44-fix-suspend-resume.patch
+r8169-call-proper-vlan-receive-function.patch

net driver updates

+ppc32-cleanup-amcc-ppc44x-eval-board-u-boot-support.patch
+ppc32-ifdef-out-altivec-specific-code-in-__switch_to.patch
+ppc32-handle-access-to-non-present-io-ports-on-8xx.patch

ppc32 updates

+x86-initialise-tss-io_bitmap_owner-to-something.patch
+intel_cacheinfo-remove-max_cache_leaves-limit.patch
+i386-little-pgtableh-consolidation-vs-2-3level.patch
+x86-hot-plug-cpu-to-support-physical-add-of-new-processors.patch

x86 updates

+x86_64-dont-use-shortcut-when-using-send_ipi_all-in-flat-mode.patch
+x86_64-init-and-zap-low-address-mappings-on-demand-for-cpu-hotplug.patch

More x86_64 updates

+introduce-valid-callback-for-pm_ops.patch

Power management fixlet

+uml-dont-remove-umid-files-in-conflict-case.patch
+strlcat-use-for-uml-umidc.patch
+uml-dont-redundantly-mark-pte-as-newpage-in-pte_modify.patch
+uml-fix-hang-in-tt-mode-on-fault.patch
+uml-fix-condition-in-tlb-flush.patch
+uml-run-mconsole-sysrq-in-process-context.patch
+uml-avoid-fixing-faults-while-atomic.patch
+uml-fix-gfp_-flags-usage.patch
+uml-use-gfp_atomic-for-allocations-under-spinlocks.patch
+uml-replace-printk-with-stack-friendly-printf-to-report-console-failure.patch

UML updates

+xtensa-remove-io_remap_page_range-and-minor-clean-ups.patch

xtensa fix

+cm4040-cardman-4040-driver-update.patch
+cm4000-cardman-4000-driver-update.patch

Update the cardman pcmcia drivers in -mm.

-invalidate_inode_pages2_range-clean-pages-fix.patch

Wrong, dropped.

+ext3-ext_debug-build-fixes.patch

ext3 fixlet

+fix-bd_claim-error-code.patch

swapon() return code fix

+reiserfs-free-checking-cleanup.patch

reiserfs cleanup

+remove-hardcoded-send_sig_xxx-constants.patch
+cleanup-the-usage-of-send_sig_xxx-constants.patch

Use the #defines

+little-de_thread-cleanup.patch
+introduce-setup_timer-helper.patch
+introduce-setup_timer-helper-x86_64-fix.patch
+move-tasklist-walk-from-cfq-iosched-to-elevatorc.patch

Various code cleanups

+add-kthread_stop_sem.patch

New workqueue featurette

+switch-sibyte-profiling-driver-to-compat_ioctl.patch
+switch-sibyte-profiling-driver-to-compat_ioctl-fix.patch
+remove-drm-ioctl32-translations-from-sparc-and-parisc.patch
+tioc-compat-ioctl-handling.patch

ioctl() cleanups

+ntp-shift_right-cleanup.patch

NTP cleanup

+delete-2-unreachable-statements-in-drivers-block-paride-pfc.patch
+clarify-help-text-for-init_env_arg_limit.patch
+moving-kprobes-and-oprofile-to-instrumentation-support-menu.patch

Little fixes

+keys-add-possessor-permissions-to-keys.patch

Key management enhancement

+fat-cleanup-and-optimization-of-checksum.patch
+fat-remove-the-unneeded-vfat_find-in-vfat_rename.patch
+fat-remove-duplicate-directory-scanning-code.patch

fatfs updates

+i4l-update-hfc_usb-driver.patch

ISDN driver update

+pcmcia-use-runtime-suspend-resume-support-to-unify-all-suspend-code-paths-fix.patch

Fix pcmcia-use-runtime-suspend-resume-support-to-unify-all-suspend-code-paths.patch

+pcmcia-yenta-add-support-for-more-ti-bridges.patch
+pcmcia-yenta-optimize-interrupt-handler.patch

Cardbus driver updates

+sched-modified-nice-support-for-smp-load-balancing.patch

CPU scheduler improvement

+reiser4-ver_linux-dont-print-reiser4progs-version-if-none-found.patch
+reiser4-atime-update-fix.patch
+reiser4-use-try_to_freeze.patch

reiser4 fixes

+ide-move-config_ide_max_hwifs-into-linux-ideh.patch

IDE cleanup

+add-dm-snapshot-tutorial-in-documentation.patch

Devicemapper documentation

+documentation-ioctl-messtxt-start-annotating-i-o.patch

Updates to ioctl documentation

+tty-layer-buffering-revamp-icom-fixes.patch
+tty-layer-buffering-revamp-isdn-layer.patch
+driver-char-n_hdlcc-remove-unused-declaration.patch

More tty layer fallout fixes




All 484 patches:



ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.14-rc2/2.6.14-rc2-mm1/patch-list


Joel Becker
2005-09-22 06:35:57 UTC
Post by Andrew Morton
git-ocfs2-prep.patch
git-ocfs2.patch
As the truncate_inode_pages patch is now in Linus' git, it is
no longer in git-ocfs2.patch. -rc2-mm1 is effectively reverting it.
git-ocfs2-prep.patch should be removed.

Joel
--
"There is no sincerer love than the love of food."
- George Bernard Shaw

Joel Becker
Principal Software Developer
Oracle
E-mail: ***@oracle.com
Phone: (650) 506-8127
Reuben Farrelly
2005-09-22 06:46:53 UTC
Hi,
Post by Andrew Morton
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.14-rc2/2.6.14-rc2-mm1/
- Added git tree `git-sas.patch': Luben Tuikov's SAS driver and its support.
- Various random other things - nothing major.
Overall boots up and looks fine, but still seeing this oops which comes up on
warm reboot intermittently:

ahci(0000:00:1f.2) AHCI 0001.0000 32 slots 4 ports 1.5 Gbps 0xf impl SATA mode
ahci(0000:00:1f.2) flags: 64bit ncq led slum part
ata1: SATA max UDMA/133 cmd 0xF8802D00 ctl 0x0 bmdma 0x0 irq 193
ata2: SATA max UDMA/133 cmd 0xF8802D80 ctl 0x0 bmdma 0x0 irq 193
ata3: SATA max UDMA/133 cmd 0xF8802E00 ctl 0x0 bmdma 0x0 irq 193
ata4: SATA max UDMA/133 cmd 0xF8802E80 ctl 0x0 bmdma 0x0 irq 193
ata1: dev 0 ATA-6, max UDMA/133, 156301488 sectors: LBA48
ata1: dev 0 configured for UDMA/133
scsi0 : ahci
ata2: dev 0 ATA-6, max UDMA/133, 156301488 sectors: LBA48
ata2: dev 0 configured for UDMA/133
scsi1 : ahci
ata3: no device found (phy stat 00000000)
scsi2 : ahci
ata4: no device found (phy stat 00000000)
scsi3 : ahci
Vendor: ATA Model: ST380817AS Rev: 3.42
Type: Direct-Access ANSI SCSI revision: 05
Vendor: ATA Model: ST380817AS Rev: 3.42
Type: Direct-Access ANSI SCSI revision: 05
scheduling while atomic: ksoftirqd/0/0x00000100/3
[<c0103ad0>] dump_stack+0x17/0x19
[<c031483a>] schedule+0x8ba/0xccb
[<c0315d17>] __down+0xe5/0x126
[<c0313f1a>] __down_failed+0xa/0x10
[<c0233f3d>] .text.lock.main+0x2b/0x3e
[<c022f90c>] device_del+0x35/0x5d
[<c025d71e>] scsi_target_reap+0x89/0xa3
[<c025ed5a>] scsi_device_dev_release+0x114/0x18b
[<c022f504>] device_release+0x1a/0x5a
[<c01e15c2>] kobject_cleanup+0x43/0x6b
[<c01e15f5>] kobject_release+0xb/0xd
[<c01e1e3c>] kref_put+0x2e/0x92
[<c01e160b>] kobject_put+0x14/0x16
[<c022f8d5>] put_device+0x11/0x13
[<c0256fd8>] scsi_put_command+0x7c/0x9e
[<c025b918>] scsi_next_command+0xf/0x19
[<c025b9db>] scsi_end_request+0x93/0xc5
[<c025bdd4>] scsi_io_completion+0x281/0x46a
[<c025c1c8>] scsi_generic_done+0x2d/0x3a
[<c0257746>] scsi_finish_command+0x7f/0x93
[<c025762b>] scsi_softirq+0xab/0x11c
[<c0121952>] __do_softirq+0x72/0xdc
[<c01219f3>] do_softirq+0x37/0x39
[<c0121eeb>] ksoftirqd+0x9f/0xf4
[<c012ff37>] kthread+0x99/0x9d
[<c01010b5>] kernel_thread_helper+0x5/0xb
Unable to handle kernel paging request<5>SCSI device sda: 156301488 512-byte
hdwr sectors (80026 MB)
SCSI device sda: drive cache: write back
SCSI device sda: 156301488 512-byte hdwr sectors (80026 MB)
SCSI device sda: drive cache: write back
sda: at virtual address 6b6b6b6b
printing eip:
c025b81f
*pde = 00000000
Oops: 0000 [#1]
SMP
last sysfs file:
Modules linked in:
CPU: 0
EIP: 0060:[<c025b81f>] Not tainted VLI
EFLAGS: 00010292 (2.6.14-rc2-mm1)
EIP is at scsi_run_queue+0x12/0xb8
eax: 6b6b6b6b ebx: f7c36b70 ecx: 00000000 edx: 00000001
esi: f7c4eb6c edi: 00000246 ebp: c1911eac esp: c1911e98
ds: 007b es: 007b ss: 0068
Process ksoftirqd/0 (pid: 3, threadinfo=c1910000 task=c1942a90)
Stack: c1baf5f8 f7c36b70 f7c36b70 f7c4eb6c 00000246 c1911eb8 c025b91f f7c386e8
c1911ed0 c025b9db f7c36b70 f7c4eb6c 00000000 00000000 c1911f28 c025bdd4
00000001 00004f80 00000100 00000001 c1807ac0 00000000 00000000 00040000
Call Trace:
[<c0103a83>] show_stack+0x94/0xca
[<c0103c2c>] show_registers+0x15a/0x1ea
[<c0103e4a>] die+0x108/0x183
[<c03166cd>] do_page_fault+0x1ed/0x63d
[<c0103753>] error_code+0x4f/0x54
[<c025b91f>] scsi_next_command+0x16/0x19
[<c025b9db>] scsi_end_request+0x93/0xc5
[<c025bdd4>] scsi_io_completion+0x281/0x46a
[<c025c1c8>] scsi_generic_done+0x2d/0x3a
[<c0257746>] scsi_finish_command+0x7f/0x93
[<c025762b>] scsi_softirq+0xab/0x11c
[<c0121952>] __do_softirq+0x72/0xdc
[<c01219f3>] do_softirq+0x37/0x39
[<c0121eeb>] ksoftirqd+0x9f/0xf4
[<c012ff37>] kthread+0x99/0x9d
[<c01010b5>] kernel_thread_helper+0x5/0xb
Code: fd ff 8b 4d ec 8b 41 44 e8 e4 a6 0b 00 89 45 f0 89 d8 e8 34 c1 ff ff eb
b2 55 89 e5 57 56 53 83 ec 08 89 45 f0 8b 80 10 01 00 00 <8b> 38 80 b8 85 01
00 00 00 0f 88 8b 00 00 00 8b 47 44 e8 af a6
<0>Kernel panic - not syncing: Fatal exception in interrupt
<0>Rebooting in 60 seconds..


This is not new to this -mm release (I had a screen dump of it 2 weeks ago but
I suspect it is actually a bit older than that even).

reuben
Andrew Morton
2005-09-22 07:03:50 UTC
Post by Reuben Farrelly
Hi,
Post by Andrew Morton
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.14-rc2/2.6.14-rc2-mm1/
- Added git tree `git-sas.patch': Luben Tuikov's SAS driver and its support.
- Various random other things - nothing major.
Overall boots up and looks fine, but still seeing this oops which comes up on warm reboot intermittently:
Nasty.
Post by Reuben Farrelly
ahci(0000:00:1f.2) AHCI 0001.0000 32 slots 4 ports 1.5 Gbps 0xf impl SATA mode
ahci(0000:00:1f.2) flags: 64bit ncq led slum part
ata1: SATA max UDMA/133 cmd 0xF8802D00 ctl 0x0 bmdma 0x0 irq 193
ata2: SATA max UDMA/133 cmd 0xF8802D80 ctl 0x0 bmdma 0x0 irq 193
ata3: SATA max UDMA/133 cmd 0xF8802E00 ctl 0x0 bmdma 0x0 irq 193
ata4: SATA max UDMA/133 cmd 0xF8802E80 ctl 0x0 bmdma 0x0 irq 193
ata1: dev 0 ATA-6, max UDMA/133, 156301488 sectors: LBA48
ata1: dev 0 configured for UDMA/133
scsi0 : ahci
ata2: dev 0 ATA-6, max UDMA/133, 156301488 sectors: LBA48
ata2: dev 0 configured for UDMA/133
scsi1 : ahci
ata3: no device found (phy stat 00000000)
scsi2 : ahci
ata4: no device found (phy stat 00000000)
scsi3 : ahci
Vendor: ATA Model: ST380817AS Rev: 3.42
Type: Direct-Access ANSI SCSI revision: 05
Vendor: ATA Model: ST380817AS Rev: 3.42
Type: Direct-Access ANSI SCSI revision: 05
scheduling while atomic: ksoftirqd/0/0x00000100/3
[<c0103ad0>] dump_stack+0x17/0x19
[<c031483a>] schedule+0x8ba/0xccb
[<c0315d17>] __down+0xe5/0x126
[<c0313f1a>] __down_failed+0xa/0x10
[<c0233f3d>] .text.lock.main+0x2b/0x3e
[<c022f90c>] device_del+0x35/0x5d
[<c025d71e>] scsi_target_reap+0x89/0xa3
[<c025ed5a>] scsi_device_dev_release+0x114/0x18b
[<c022f504>] device_release+0x1a/0x5a
[<c01e15c2>] kobject_cleanup+0x43/0x6b
[<c01e15f5>] kobject_release+0xb/0xd
[<c01e1e3c>] kref_put+0x2e/0x92
[<c01e160b>] kobject_put+0x14/0x16
[<c022f8d5>] put_device+0x11/0x13
[<c0256fd8>] scsi_put_command+0x7c/0x9e
[<c025b918>] scsi_next_command+0xf/0x19
[<c025b9db>] scsi_end_request+0x93/0xc5
[<c025bdd4>] scsi_io_completion+0x281/0x46a
[<c025c1c8>] scsi_generic_done+0x2d/0x3a
[<c0257746>] scsi_finish_command+0x7f/0x93
[<c025762b>] scsi_softirq+0xab/0x11c
[<c0121952>] __do_softirq+0x72/0xdc
[<c01219f3>] do_softirq+0x37/0x39
[<c0121eeb>] ksoftirqd+0x9f/0xf4
[<c012ff37>] kthread+0x99/0x9d
[<c01010b5>] kernel_thread_helper+0x5/0xb
There's a whole bunch of reasons why we cannot call scsi_target_reap() from
softirq context. klist_del() locking and whatever semaphore that's taking
are amongst them...
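
As a rough sketch (not the actual fix -- foo_target and foo_reap below are
made-up names), the usual way to get sleeping cleanup like that out of
softirq context is to punt it to process context via a workqueue:

#include <linux/workqueue.h>
#include <asm/semaphore.h>

/* Hypothetical example structure -- not from the SCSI code. */
struct foo_target {
	struct semaphore sem;
	struct work_struct reap_work;
};

/* Runs in keventd (process context), so sleeping is allowed here. */
static void foo_reap(void *data)
{
	struct foo_target *t = data;

	down(&t->sem);
	/* ... device_del(), free resources, etc. ... */
	up(&t->sem);
}

/* Called from the softirq completion path: only schedule, never sleep. */
static void foo_target_put_final(struct foo_target *t)
{
	INIT_WORK(&t->reap_work, foo_reap, t);
	schedule_work(&t->reap_work);
}
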
Post by Reuben Farrelly
Unable to handle kernel paging request<5>SCSI device sda: 156301488 512-byte
hdwr sectors (80026 MB)
SCSI device sda: drive cache: write back
SCSI device sda: 156301488 512-byte hdwr sectors (80026 MB)
SCSI device sda: drive cache: write back
sda: at virtual address 6b6b6b6b
c025b81f
*pde = 00000000
Oops: 0000 [#1]
SMP
CPU: 0
EIP: 0060:[<c025b81f>] Not tainted VLI
EFLAGS: 00010292 (2.6.14-rc2-mm1)
EIP is at scsi_run_queue+0x12/0xb8
eax: 6b6b6b6b ebx: f7c36b70 ecx: 00000000 edx: 00000001
esi: f7c4eb6c edi: 00000246 ebp: c1911eac esp: c1911e98
ds: 007b es: 007b ss: 0068
Process ksoftirqd/0 (pid: 3, threadinfo=c1910000 task=c1942a90)
Stack: c1baf5f8 f7c36b70 f7c36b70 f7c4eb6c 00000246 c1911eb8 c025b91f f7c386e8
c1911ed0 c025b9db f7c36b70 f7c4eb6c 00000000 00000000 c1911f28 c025bdd4
00000001 00004f80 00000100 00000001 c1807ac0 00000000 00000000 00040000
[<c0103a83>] show_stack+0x94/0xca
[<c0103c2c>] show_registers+0x15a/0x1ea
[<c0103e4a>] die+0x108/0x183
[<c03166cd>] do_page_fault+0x1ed/0x63d
[<c0103753>] error_code+0x4f/0x54
[<c025b91f>] scsi_next_command+0x16/0x19
[<c025b9db>] scsi_end_request+0x93/0xc5
[<c025bdd4>] scsi_io_completion+0x281/0x46a
[<c025c1c8>] scsi_generic_done+0x2d/0x3a
[<c0257746>] scsi_finish_command+0x7f/0x93
[<c025762b>] scsi_softirq+0xab/0x11c
[<c0121952>] __do_softirq+0x72/0xdc
[<c01219f3>] do_softirq+0x37/0x39
[<c0121eeb>] ksoftirqd+0x9f/0xf4
[<c012ff37>] kthread+0x99/0x9d
[<c01010b5>] kernel_thread_helper+0x5/0xb
Code: fd ff 8b 4d ec 8b 41 44 e8 e4 a6 0b 00 89 45 f0 89 d8 e8 34 c1 ff ff eb
b2 55 89 e5 57 56 53 83 ec 08 89 45 f0 8b 80 10 01 00 00 <8b> 38 80 b8 85 01
00 00 00 0f 88 8b 00 00 00 8b 47 44 e8 af a6
<0>Kernel panic - not syncing: Fatal exception in interrupt
<0>Rebooting in 60 seconds..
It oopsed as well. That might be a second bug.
Post by Reuben Farrelly
This is not new to this -mm release (I had a screen dump of it 2 weeks ago but
I suspect it is actually a bit older than that even).
Thanks.
Martin J. Bligh
2005-09-22 18:59:37 UTC
Build breaks with this config (x440/summit):
http://ftp.kernel.org/pub/linux/kernel/people/mbligh/config/abat/elm3b67

arch/i386/kernel/built-in.o(.init.text+0x389d): In function `set_nmi_ipi_callback':
/usr/local/autobench/var/tmp/build/arch/i386/kernel/traps.c:727: undefined reference to `usb_early_handoff'
arch/i386/kernel/built-in.o(.init.text+0x4ee0): In function `smp_read_mpc':
/usr/local/autobench/var/tmp/build/include/asm-i386/mach-summit/mach_mpparse.h:35: undefined reference to `usb_early_handoff'


Plus it panics on boot on Power-4 LPAR

Memory: 30962716k/31457280k available (4308k kernel code, 494564k reserved, 1112k data, 253k bss, 420k init)
Mount-cache hash table entries: 256
softlockup thread 0 started up.
Processor 1 found.
softlockup thread 1 started up.
Processor 2 found.
softlockup thread 2 started up.
Processor 3 found.
Brought up 4 CPUs
softlockup thread 3 started up.
NET: Registered protocol family 16
PCI: Probing PCI hardware
IOMMU table initialized, virtual merging disabled
PCI_DMA: iommu_table_setparms: /***@3fffde0a000/***@2,2 has missing tce entries !
Kernel panic - not syncing: iommu_init_table: Can't allocate 1729382256943765922 bytes

<7>RTAS: event: 3, Type: Internal Device Failure, Severity: 5
ibm,os-term call failed -1
Andrew Morton
2005-09-22 19:52:19 UTC
Post by Martin J. Bligh
http://ftp.kernel.org/pub/linux/kernel/people/mbligh/config/abat/elm3b67
/usr/local/autobench/var/tmp/build/arch/i386/kernel/traps.c:727: undefined reference to `usb_early_handoff'
/usr/local/autobench/var/tmp/build/include/asm-i386/mach-summit/mach_mpparse.h:35: undefined reference to `usb_early_handoff'
grr. David had a hack in there which caused my links to fail so I hacked
it out and broke yours.
Post by Martin J. Bligh
Plus it panics on boot on Power-4 LPAR
Memory: 30962716k/31457280k available (4308k kernel code, 494564k reserved, 1112k data, 253k bss, 420k init)
Mount-cache hash table entries: 256
softlockup thread 0 started up.
Processor 1 found.
softlockup thread 1 started up.
Processor 2 found.
softlockup thread 2 started up.
Processor 3 found.
Brought up 4 CPUs
softlockup thread 3 started up.
NET: Registered protocol family 16
PCI: Probing PCI hardware
IOMMU table initialized, virtual merging disabled
Kernel panic - not syncing: iommu_init_table: Can't allocate 1729382256943765922 bytes
<7>RTAS: event: 3, Type: Internal Device Failure, Severity: 5
ibm,os-term call failed -1
There are ppc64 IOMMU changes in Linus's tree...
Martin J. Bligh
2005-09-22 20:14:10 UTC
Post by Andrew Morton
Post by Martin J. Bligh
Plus it panics on boot on Power-4 LPAR
Memory: 30962716k/31457280k available (4308k kernel code, 494564k reserved, 1112k data, 253k bss, 420k init)
Mount-cache hash table entries: 256
softlockup thread 0 started up.
Processor 1 found.
softlockup thread 1 started up.
Processor 2 found.
softlockup thread 2 started up.
Processor 3 found.
Brought up 4 CPUs
softlockup thread 3 started up.
NET: Registered protocol family 16
PCI: Probing PCI hardware
IOMMU table initialized, virtual merging disabled
Kernel panic - not syncing: iommu_init_table: Can't allocate 1729382256943765922 bytes
<7>RTAS: event: 3, Type: Internal Device Failure, Severity: 5
ibm,os-term call failed -1
There are ppc64 IOMMU changes in Linus's tree...
Thanks. will retest with just linus.patch to confirm
Martin J. Bligh
2005-09-23 00:28:26 UTC
Post by Martin J. Bligh
Post by Andrew Morton
Post by Martin J. Bligh
Plus it panics on boot on Power-4 LPAR
Memory: 30962716k/31457280k available (4308k kernel code, 494564k reserved, 1112k data, 253k bss, 420k init)
Mount-cache hash table entries: 256
softlockup thread 0 started up.
Processor 1 found.
softlockup thread 1 started up.
Processor 2 found.
softlockup thread 2 started up.
Processor 3 found.
Brought up 4 CPUs
softlockup thread 3 started up.
NET: Registered protocol family 16
PCI: Probing PCI hardware
IOMMU table initialized, virtual merging disabled
Kernel panic - not syncing: iommu_init_table: Can't allocate 1729382256943765922 bytes
<7>RTAS: event: 3, Type: Internal Device Failure, Severity: 5
ibm,os-term call failed -1
There are ppc64 IOMMU changes in Linus's tree...
Thanks. will retest with just linus.patch to confirm
Yeah, it's broken there too. Borkage in mainline! ;-)

http://test.kernel.org/13316/debug/console.log

if someone wants to look ...

M.
Badari Pulavarty
2005-09-22 22:28:53 UTC
Hi Andrew,

My ide-based AMD64 machine doesn't boot 2.6.14-rc2-mm1.
Known issue ?

Thanks,
Badari
Andrew Morton
2005-09-22 23:39:30 UTC
Post by Badari Pulavarty
My ide-based AMD64 machine doesn't boot 2.6.14-rc2-mm1.
Known issue ?
Nope. How does that dmesg output differ from 2.6.14-rc2's?
Alexey Dobriyan
2005-09-22 19:50:29 UTC
I see regression in tty update speed with ADOM (ncurses based
roguelike) [1].

Messages at the top ("goblin hits you") are printed slowly. An eye can
notice letter after letter printing.

2.6.14-rc2 is OK.

I'll try to revert tty-layer-buffering-revamp*.patch pieces and see if
it'll change something.

[1] http://adom.de/adom/download/linux/adom-111-elf.tar.gz (binary only)
Alexey Dobriyan
2005-09-22 21:49:26 UTC
Post by Alexey Dobriyan
I see regression in tty update speed with ADOM (ncurses based
roguelike) [1].
Messages at the top ("goblin hits you") are printed slowly. An eye can
notice letter after letter printing.
2.6.14-rc2 is OK.
I'll try to revert tty-layer-buffering-revamp*.patch pieces and see if
it'll change something.
[1] http://adom.de/adom/download/linux/adom-111-elf.tar.gz (binary only)
Scratch TTY revamp, the sucker is
fix-sys_poll-large-timeout-handling.patch

HZ=250 here.
------------------------------------------------------------------------
From: Nishanth Aravamudan <***@us.ibm.com>

The @timeout parameter to sys_poll() is in milliseconds but we compare it
to (MAX_SCHEDULE_TIMEOUT / HZ), which is (jiffies/jiffies-per-sec) or
seconds. That seems blatantly broken. This led to improper overflow
checking for @timeout. As Andrew Morton pointed out, the best fix is to to
check for potential overflow first, then either select an indefinite value
or convert @timeout.

To achieve this and clean up the code, change the prototype of sys_poll
to make it clear that the parameter is in milliseconds and introduce a
variable, timeout_jiffies, to hold the corresponding jiffies value.

Signed-off-by: Nishanth Aravamudan <***@us.ibm.com>
Signed-off-by: Andrew Morton <***@osdl.org>
---

fs/select.c | 36 ++++++++++++++++++++++++++----------
include/linux/syscalls.h | 2 +-
2 files changed, 27 insertions(+), 11 deletions(-)

diff -puN fs/select.c~fix-sys_poll-large-timeout-handling fs/select.c
--- devel/fs/select.c~fix-sys_poll-large-timeout-handling 2005-09-10 02:35:19.000000000 -0700
+++ devel-akpm/fs/select.c 2005-09-10 03:26:17.000000000 -0700
@@ -464,15 +464,18 @@ static int do_poll(unsigned int nfds, s
return count;
}

-asmlinkage long sys_poll(struct pollfd __user * ufds, unsigned int nfds, long timeout)
+asmlinkage long sys_poll(struct pollfd __user *ufds, unsigned int nfds,
+ long timeout_msecs)
{
struct poll_wqueues table;
- int fdcount, err;
+ int fdcount, err;
+ int overflow;
unsigned int i;
struct poll_list *head;
struct poll_list *walk;
struct fdtable *fdt;
int max_fdset;
+ unsigned long timeout_jiffies;

/* Do a sanity check on nfds ... */
rcu_read_lock();
@@ -482,13 +485,26 @@ asmlinkage long sys_poll(struct pollfd _
if (nfds > max_fdset && nfds > OPEN_MAX)
return -EINVAL;

- if (timeout) {
- /* Careful about overflow in the intermediate values */
- if ((unsigned long) timeout < MAX_SCHEDULE_TIMEOUT / HZ)
- timeout = (unsigned long)(timeout*HZ+999)/1000+1;
- else /* Negative or overflow */
- timeout = MAX_SCHEDULE_TIMEOUT;
- }
+ /*
+ * We compare HZ with 1000 to work out which side of the
+ * expression needs conversion. Because we want to avoid
+ * converting any value to a numerically higher value, which
+ * could overflow.
+ */
+#if HZ > 1000
+ overflow = timeout_msecs >= jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT);
+#else
+ overflow = msecs_to_jiffies(timeout_msecs) >= MAX_SCHEDULE_TIMEOUT;
+#endif
+
+ /*
+ * If we would overflow in the conversion or a negative timeout
+ * is requested, sleep indefinitely.
+ */
+ if (overflow || timeout_msecs < 0)
+ timeout_jiffies = MAX_SCHEDULE_TIMEOUT;
+ else
+ timeout_jiffies = msecs_to_jiffies(timeout_msecs) + 1;

poll_initwait(&table);

@@ -519,7 +535,7 @@ asmlinkage long sys_poll(struct pollfd _
}
i -= pp->len;
}
- fdcount = do_poll(nfds, head, &table, timeout);
+ fdcount = do_poll(nfds, head, &table, timeout_jiffies);

/* OK, now copy the revents fields back to user space. */
walk = head;
diff -puN include/linux/syscalls.h~fix-sys_poll-large-timeout-handling include/linux/syscalls.h
--- devel/include/linux/syscalls.h~fix-sys_poll-large-timeout-handling 2005-09-10 02:35:19.000000000 -0700
+++ devel-akpm/include/linux/syscalls.h 2005-09-10 02:35:19.000000000 -0700
@@ -420,7 +420,7 @@ asmlinkage long sys_socketpair(int, int,
asmlinkage long sys_socketcall(int call, unsigned long __user *args);
asmlinkage long sys_listen(int, int);
asmlinkage long sys_poll(struct pollfd __user *ufds, unsigned int nfds,
- long timeout);
+ long timeout_msecs);
asmlinkage long sys_select(int n, fd_set __user *inp, fd_set __user *outp,
fd_set __user *exp, struct timeval __user *tvp);
asmlinkage long sys_epoll_create(int size);
_
Nishanth Aravamudan
2005-09-23 00:08:15 UTC
Post by Alexey Dobriyan
Post by Alexey Dobriyan
I see regression in tty update speed with ADOM (ncurses based
roguelike) [1].
Messages at the top ("goblin hits you") are printed slowly. An eye can
notice letter after letter printing.
2.6.14-rc2 is OK.
I'll try to revert tty-layer-buffering-revamp*.patch pieces and see if
it'll change something.
[1] http://adom.de/adom/download/linux/adom-111-elf.tar.gz (binary only)
Scratch TTY revamp, the sucker is
fix-sys_poll-large-timeout-handling.patch
HZ=250 here.
Alexey,

Thanks for the report. I will take a look on my Thinkpad with HZ=250
under -mm2. I have some ideas for debugging it if I see the same
problem.

Thanks,
Nish
Nish Aravamudan
2005-09-23 17:12:11 UTC
Post by Nishanth Aravamudan
Post by Alexey Dobriyan
Post by Alexey Dobriyan
I see regression in tty update speed with ADOM (ncurses based
roguelike) [1].
Messages at the top ("goblin hits you") are printed slowly. An eye can
notice letter after letter printing.
2.6.14-rc2 is OK.
I'll try to revert tty-layer-buffering-revamp*.patch pieces and see if
it'll change something.
[1] http://adom.de/adom/download/linux/adom-111-elf.tar.gz (binary only)
Scratch TTY revamp, the sucker is
fix-sys_poll-large-timeout-handling.patch
HZ=250 here.
Alexey,
Thanks for the report. I will take a look on my Thinkpad with HZ=250
under -mm2. I have some ideas for debugging it if I see the same
problem.
I did not see any tty refresh problems on my TP with HZ=250 under
2.6.14-rc2-mm1 (excuse the typo in my previous response) under the
adom binary you sent me. I even played two games just to make sure ;)

Is there any chance you can do an strace of the process while it is
slow to redraw your screen? Just to verify how poll() is being called
[if my patch is the problem, then poll() must be being used somewhat
differently than I expected -- e.g. a dependency on the broken
behavior]. The only thing I can think of right now is that I made
timeout_jiffies unsigned, when schedule_timeout() will treat it as
signed, but I'm not sure if that is the problem.

We may want to contact the adom author eventually to figure out how
poll() is being used in the Linux port, if strace is unable to help
further.

Thanks,
Nish
Alexey Dobriyan
2005-09-23 18:42:16 UTC
Post by Nish Aravamudan
I did not see any tty refresh problems on my TP with HZ=250 under
2.6.14-rc2-mm1 (excuse the typo in my previous response) under the
adom binary you sent me. I even played two games just to make sure ;)
The slowdown is HZ dependent:
* HZ=1000 - game is playable. If I would not know slowdown is there I
wouldn't notice it.
* HZ=100 - messages at the top are printed r e a l l y s l o w.
* HZ=250 - somewhere in the middle.
Post by Nish Aravamudan
Is there any chance you can do an strace of the process while it is
slow to redraw your screen?
Typical pattern is:

rt_sigaction(SIGTSTP, {SIG_IGN}, {0xb7f1e578, [], SA_RESTART}, 8) = 0
poll([{fd=0, events=POLLIN}], 1, 0) = 0
poll([{fd=0, events=POLLIN}], 1, 0) = 0
write(1, "\33[11;18H\33[37m\33[40m[g] Gnome\r\33[12"..., 58) = 58
rt_sigaction(SIGTSTP, {0xb7f1e578, [], SA_RESTART}, NULL, 8) = 0
rt_sigaction(SIGTSTP, {SIG_IGN}, {0xb7f1e578, [], SA_RESTART}, 8) = 0
poll([{fd=0, events=POLLIN}], 1, 0) = 0
poll([{fd=0, events=POLLIN}], 1, 0) = 0
write(1, "\33[12;18H\33[37m\33[40m[h] Hurthling\r"..., 62) = 62
rt_sigaction(SIGTSTP, {0xb7f1e578, [], SA_RESTART}, NULL, 8) = 0
rt_sigaction(SIGTSTP, {SIG_IGN}, {0xb7f1e578, [], SA_RESTART}, 8) = 0
poll([{fd=0, events=POLLIN}], 1, 0) = 0
poll([{fd=0, events=POLLIN}], 1, 0) = 0
write(1, "\33[13;18H\33[37m\33[40m[i] Orc\r\33[14d\33"..., 56) = 56
rt_sigaction(SIGTSTP, {0xb7f1e578, [], SA_RESTART}, NULL, 8) = 0
rt_sigaction(SIGTSTP, {SIG_IGN}, {0xb7f1e578, [], SA_RESTART}, 8) = 0

I can send full strace log if needed.
Nishanth Aravamudan
2005-09-23 19:07:49 UTC
Post by Alexey Dobriyan
Post by Nish Aravamudan
I did not see any tty refresh problems on my TP with HZ=250 under
2.6.14-rc2-mm1 (excuse the typo in my previous response) under the
adom binary you sent me. I even played two games just to make sure ;)
* HZ=1000 - game is playable. If I would not know slowdown is there I
wouldn't notice it.
* HZ=100 - messages at the top are printed r e a l l y s l o w.
* HZ=250 - somewhere in the middle.
Post by Nish Aravamudan
Is there any chance you can do an strace of the process while it is
slow to redraw your screen?
rt_sigaction(SIGTSTP, {SIG_IGN}, {0xb7f1e578, [], SA_RESTART}, 8) = 0
poll([{fd=0, events=POLLIN}], 1, 0) = 0
poll([{fd=0, events=POLLIN}], 1, 0) = 0
write(1, "\33[11;18H\33[37m\33[40m[g] Gnome\r\33[12"..., 58) = 58
rt_sigaction(SIGTSTP, {0xb7f1e578, [], SA_RESTART}, NULL, 8) = 0
rt_sigaction(SIGTSTP, {SIG_IGN}, {0xb7f1e578, [], SA_RESTART}, 8) = 0
poll([{fd=0, events=POLLIN}], 1, 0) = 0
poll([{fd=0, events=POLLIN}], 1, 0) = 0
write(1, "\33[12;18H\33[37m\33[40m[h] Hurthling\r"..., 62) = 62
rt_sigaction(SIGTSTP, {0xb7f1e578, [], SA_RESTART}, NULL, 8) = 0
rt_sigaction(SIGTSTP, {SIG_IGN}, {0xb7f1e578, [], SA_RESTART}, 8) = 0
poll([{fd=0, events=POLLIN}], 1, 0) = 0
poll([{fd=0, events=POLLIN}], 1, 0) = 0
write(1, "\33[13;18H\33[37m\33[40m[i] Orc\r\33[14d\33"..., 56) = 56
rt_sigaction(SIGTSTP, {0xb7f1e578, [], SA_RESTART}, NULL, 8) = 0
rt_sigaction(SIGTSTP, {SIG_IGN}, {0xb7f1e578, [], SA_RESTART}, 8) = 0
I can send full strace log if needed.
Nope, that helped tremendously! I think I know what the issue is (and
why it's HZ dependent).

In the current code, (2.6.13.2, e.g) we allow 0 timeout poll-requests to
be resolved as 0 jiffy requests. But in my patch, those requests become
1 jiffy (which of course depends on HZ and gets quite long if HZ=100)!
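
Rough numbers, assuming the two zero-timeout poll() calls per write() seen
in your strace, and that each such poll now sleeps up to one full tick:

	1 jiffy = 1000/HZ ms
	HZ=1000: 2 polls * ~1 ms  = up to ~2 ms extra per redrawn line
	HZ=250:  2 polls * ~4 ms  = up to ~8 ms extra per redrawn line
	HZ=100:  2 polls * ~10 ms = up to ~20 ms extra per redrawn line

which matches the HZ dependence you saw.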

Care to try the following patch?

Note: I would be happy to not do the conditional and just have the patch
change the msecs_to_jiffies() line when assigning to timeout_jiffies.
But I figured it would be best to avoid *all* computations if we know
the resulting value is going to be 0. Hence all the tab changing.

Thanks,
Nish

Description: Modifying sys_poll() to handle large timeouts correctly
resulted in 0 being treated just like any other millisecond request,
while the current code treats it as an optimized case. Do the same in
the new code. Most of the code change is tabbing due to the inserted if.

Signed-off-by: Nishanth Aravamudan <***@us.ibm.com>

---

fs/select.c | 41 +++++++++++++++++++++++++----------------
1 files changed, 25 insertions(+), 16 deletions(-)

diff -urpN 2.6.14-rc2-mm1/fs/select.c 2.6.14-rc2-mm1-dev/fs/select.c
--- 2.6.14-rc2-mm1/fs/select.c 2005-09-23 11:52:36.000000000 -0700
+++ 2.6.14-rc2-mm1-dev/fs/select.c 2005-09-23 12:04:03.000000000 -0700
@@ -485,26 +485,35 @@ asmlinkage long sys_poll(struct pollfd _
if (nfds > max_fdset && nfds > OPEN_MAX)
return -EINVAL;

- /*
- * We compare HZ with 1000 to work out which side of the
- * expression needs conversion. Because we want to avoid
- * converting any value to a numerically higher value, which
- * could overflow.
- */
+ if (timeout_msecs) {
+ /*
+ * We compare HZ with 1000 to work out which side of the
+ * expression needs conversion. Because we want to
+ * avoid converting any value to a numerically higher
+ * value, which could overflow.
+ */
#if HZ > 1000
- overflow = timeout_msecs >= jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT);
+ overflow = timeout_msecs >=
+ jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT);
#else
- overflow = msecs_to_jiffies(timeout_msecs) >= MAX_SCHEDULE_TIMEOUT;
+ overflow = msecs_to_jiffies(timeout_msecs) >=
+ MAX_SCHEDULE_TIMEOUT;
#endif

- /*
- * If we would overflow in the conversion or a negative timeout
- * is requested, sleep indefinitely.
- */
- if (overflow || timeout_msecs < 0)
- timeout_jiffies = MAX_SCHEDULE_TIMEOUT;
- else
- timeout_jiffies = msecs_to_jiffies(timeout_msecs) + 1;
+ /*
+ * If we would overflow in the conversion or a negative
+ * timeout is requested, sleep indefinitely.
+ */
+ if (overflow || timeout_msecs < 0)
+ timeout_jiffies = MAX_SCHEDULE_TIMEOUT;
+ else
+ timeout_jiffies = msecs_to_jiffies(timeout_msecs) + 1;
+ } else {
+ /*
+ * 0 millisecond requests become 0 jiffy requests
+ */
+ timeout_jiffies = 0;
+ }

poll_initwait(&table);
Alexey Dobriyan
2005-09-23 19:42:53 UTC
Post by Nishanth Aravamudan
Post by Alexey Dobriyan
poll([{fd=0, events=POLLIN}], 1, 0) = 0
I can send full strace log if needed.
Nope, that helped tremendously! I think I know what the issue is (and
why it's HZ dependent).
In the current code, (2.6.13.2, e.g) we allow 0 timeout poll-requests to
be resolved as 0 jiffy requests. But in my patch, those requests become
1 jiffy (which of course depends on HZ and gets quite long if HZ=100)!
Care to try the following patch?
It works! Now, even with HZ=100, gameplay is smooth.

Andrew, please, apply.
Post by Nishanth Aravamudan
Description: Modifying sys_poll() to handle large timeouts correctly
resulted in 0 being treated just like any other millisecond request,
while the current code treats it as an optimized case. Do the same in
the new code. Most of the code change is tabbing due to the inserted if.
diff -urpN 2.6.14-rc2-mm1/fs/select.c 2.6.14-rc2-mm1-dev/fs/select.c
--- 2.6.14-rc2-mm1/fs/select.c 2005-09-23 11:52:36.000000000 -0700
+++ 2.6.14-rc2-mm1-dev/fs/select.c 2005-09-23 12:04:03.000000000 -0700
@@ -485,26 +485,35 @@ asmlinkage long sys_poll(struct pollfd _
if (nfds > max_fdset && nfds > OPEN_MAX)
return -EINVAL;
- /*
- * We compare HZ with 1000 to work out which side of the
- * expression needs conversion. Because we want to avoid
- * converting any value to a numerically higher value, which
- * could overflow.
- */
+ if (timeout_msecs) {
+ /*
+ * We compare HZ with 1000 to work out which side of the
+ * expression needs conversion. Because we want to
+ * avoid converting any value to a numerically higher
+ * value, which could overflow.
+ */
#if HZ > 1000
- overflow = timeout_msecs >= jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT);
+ overflow = timeout_msecs >=
+ jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT);
#else
- overflow = msecs_to_jiffies(timeout_msecs) >= MAX_SCHEDULE_TIMEOUT;
+ overflow = msecs_to_jiffies(timeout_msecs) >=
+ MAX_SCHEDULE_TIMEOUT;
#endif
- /*
- * If we would overflow in the conversion or a negative timeout
- * is requested, sleep indefinitely.
- */
- if (overflow || timeout_msecs < 0)
- timeout_jiffies = MAX_SCHEDULE_TIMEOUT;
- else
- timeout_jiffies = msecs_to_jiffies(timeout_msecs) + 1;
+ /*
+ * If we would overflow in the conversion or a negative
+ * timeout is requested, sleep indefinitely.
+ */
+ if (overflow || timeout_msecs < 0)
+ timeout_jiffies = MAX_SCHEDULE_TIMEOUT;
+ else
+ timeout_jiffies = msecs_to_jiffies(timeout_msecs) + 1;
+ } else {
+ /*
+ * 0 millisecond requests become 0 jiffy requests
+ */
+ timeout_jiffies = 0;
+ }
poll_initwait(&table);
Nishanth Aravamudan
2005-09-23 21:32:41 UTC
Post by Alexey Dobriyan
Post by Nishanth Aravamudan
Post by Alexey Dobriyan
poll([{fd=0, events=POLLIN}], 1, 0) = 0
I can send full strace log if needed.
Nope, that helped tremendously! I think I know what the issue is (and
why it's HZ dependent).
In the current code, (2.6.13.2, e.g) we allow 0 timeout poll-requests to
be resolved as 0 jiffy requests. But in my patch, those requests become
1 jiffy (which of course depends on HZ and gets quite long if HZ=100)!
Care to try the following patch?
It works! Now, even with HZ=100, gameplay is smooth.
Andrew, please, apply.
Great! Thanks for the testing, Alexey.

-Nish
Mattia Dongili
2005-09-24 17:43:17 UTC
Post by Andrew Morton
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.14-rc2/2.6.14-rc2-mm1/
[...]
Post by Andrew Morton
+reiser4-ver_linux-dont-print-reiser4progs-version-if-none-found.patch
+reiser4-atime-update-fix.patch
+reiser4-use-try_to_freeze.patch
reiser4 fixes
Runs well, except that reiser4 seems to do bad things in do_sendfile.
I have apache2 running here and it refuses to serve my ~/public_html
homepage. /home is on a reiser4 partition; apache2 serves pages fine
from other filesystems, but stracing the process while requesting my
homepage, I get:

stat64("/home/mattia/public_html/index.html", {st_mode=S_IFREG|0644, st_size=2315, ...}) = 0
open("/home/mattia/public_html/index.html", O_RDONLY) = 12
setsockopt(11, SOL_TCP, TCP_NODELAY, [0], 4) = 0
setsockopt(11, SOL_TCP, TCP_CORK, [1], 4) = 0
writev(11, [{"HTTP/1.1 200 OK\r\nDate: Sat, 24 S"..., 328}], 1) = 328
sendfile(11, 12, [0], 2315) = -1 EINVAL (Invalid argument)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
setsockopt(11, SOL_TCP, TCP_CORK, [0], 4) = 0
setsockopt(11, SOL_TCP, TCP_NODELAY, [1], 4) = 0
read(11, 0x82297f0, 8000) = -1 EAGAIN (Resource temporarily unavailable)
write(10, "127.0.0.1 - - [24/Sep/2005:10:13"..., 95) = 95
close(11) = 0
read(5, 0xbfe4c4e3, 1) = -1 EAGAIN (Resource temporarily unavailable)
close(12) = 0
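
If it helps narrow things down, a quick stand-alone check along these lines
(untested sketch -- and note that on 2.6 the sendfile() destination has to
be a socket, hence the socketpair) should show whether a plain sendfile()
from a file on the reiser4 partition fails the same way outside apache:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/sendfile.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
	int sv[2], fd;
	struct stat st;
	off_t off = 0;
	ssize_t n;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <file-on-reiser4>\n", argv[0]);
		return 1;
	}
	/* sendfile()'s out_fd must be a socket on 2.6, so fake one up. */
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
		perror("socketpair");
		return 1;
	}
	fd = open(argv[1], O_RDONLY);
	if (fd < 0 || fstat(fd, &st) < 0) {
		perror(argv[1]);
		return 1;
	}
	/* Fine for small files; a big file would fill the socket buffer. */
	n = sendfile(sv[0], fd, &off, st.st_size);
	if (n < 0)
		perror("sendfile");	/* EINVAL is what apache is hitting */
	else
		printf("sendfile copied %d bytes\n", (int)n);
	return 0;
}
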
--
mattia
:wq!
Mattia Dongili
2005-09-24 17:58:48 UTC
Post by Andrew Morton
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.14-rc2/2.6.14-rc2-mm1/
Herm... running almost fine :) I just got the allocation failure below
(including /proc/slabinfo and /proc/vmstat -- useful? I can provide more
info if it happens again; exim is only running here for local delivery).
I had previously seen it only in .14-rc1-mm1, but didn't find enough time
to report it properly.

Linux version 2.6.14-rc2-mm1-1 (***@inferi) (gcc version 4.0.1 (Debian 4.0.1-2)) #1 PREEMPT Fri Sep 23 20:56:05 CEST 2005
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009e800 (usable)
BIOS-e820: 000000000009e800 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000c0000 - 00000000000d0000 (reserved)
BIOS-e820: 00000000000d8000 - 00000000000e0000 (reserved)
BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000000fef0000 (usable)
BIOS-e820: 000000000fef0000 - 000000000feff000 (ACPI data)
BIOS-e820: 000000000feff000 - 000000000ff00000 (ACPI NVS)
BIOS-e820: 000000000ff00000 - 000000000ff80000 (usable)
BIOS-e820: 000000000ff80000 - 0000000010000000 (reserved)
BIOS-e820: 00000000ff800000 - 00000000ffc00000 (reserved)
BIOS-e820: 00000000fffffc00 - 0000000100000000 (reserved)
255MB LOWMEM available.
On node 0 totalpages: 65408
DMA zone: 4096 pages, LIFO batch:2
DMA32 zone: 0 pages, LIFO batch:2
Normal zone: 61312 pages, LIFO batch:32
HighMem zone: 0 pages, LIFO batch:2
DMI present.
[...]
exim4: page allocation failure. order:1, mode:0x80000020
[<c0143698>] __alloc_pages+0x328/0x450
[<c0147150>] kmem_getpages+0x30/0xa0
[<c01480cf>] cache_grow+0xbf/0x1f0
[<c0148446>] cache_alloc_refill+0x246/0x280
[<c0148793>] __kmalloc+0x73/0x80
[<c0291cd8>] pskb_expand_head+0x58/0x150
[<c0297143>] skb_checksum_help+0x103/0x120
[<d0c6d1cc>] ip_nat_fn+0x1cc/0x240 [iptable_nat]
[<d0c763e8>] ip_conntrack_in+0x188/0x2c0 [ip_conntrack]
[<d0c6d45e>] ip_nat_local_fn+0x7e/0xc0 [iptable_nat]
[<c02b2670>] dst_output+0x0/0x30
[<c02b2670>] dst_output+0x0/0x30
[<c02e7c2b>] nf_iterate+0x6b/0xa0
[<c02b2670>] dst_output+0x0/0x30
[<c02b2670>] dst_output+0x0/0x30
[<c02e7cc4>] nf_hook_slow+0x64/0x140
[<c02b2670>] dst_output+0x0/0x30
[<c02b2670>] dst_output+0x0/0x30
[<c02b35ae>] ip_queue_xmit+0x23e/0x550
[<c02b2670>] dst_output+0x0/0x30
[<c01e1b9a>] __copy_to_user_ll+0x4a/0x90
[<c0293a6e>] memcpy_toiovec+0x6e/0x90
[<c02c4c75>] tcp_cwnd_restart+0x35/0xf0
[<c02c5276>] tcp_transmit_skb+0x426/0x780
[<c02c332e>] tcp_rcv_established+0x6e/0x8c0
[<c02c657d>] tcp_write_xmit+0x12d/0x3d0
[<c02c6855>] __tcp_push_pending_frames+0x35/0xb0
[<c02bad3c>] tcp_sendmsg+0xa3c/0xb50
[<c028c67f>] sock_aio_write+0xcf/0x120
[<c016029d>] do_sync_write+0xcd/0x130
[<c0131ed0>] autoremove_wake_function+0x0/0x60
[<c016047f>] vfs_write+0x17f/0x190
[<c016055b>] sys_write+0x4b/0x80
[<c01032a1>] syscall_call+0x7/0xb
Mem-info:
DMA per-cpu:
cpu 0 hot: low 0, high 12, batch 2 used:8
cpu 0 cold: low 0, high 4, batch 1 used:3
DMA32 per-cpu: empty
Normal per-cpu:
cpu 0 hot: low 0, high 192, batch 32 used:14
cpu 0 cold: low 0, high 64, batch 16 used:51
HighMem per-cpu: empty
Free pages: 4112kB (0kB HighMem)
Active:46238 inactive:10857 dirty:16 writeback:0 unstable:0 free:1028 slab:4078 mapped:39343 pagetables:316
DMA free:1224kB min:128kB low:160kB high:192kB active:6812kB inactive:3684kB present:16384kB pages_scanned:36 all_unreclaimable? no
lowmem_reserve[]: 0 0 239 239
DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 239 239
Normal free:2888kB min:1916kB low:2392kB high:2872kB active:178140kB inactive:39744kB present:245248kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 300*4kB 1*8kB 1*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1224kB
DMA32: empty
Normal: 722*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2888kB
HighMem: empty
Swap cache: add 33, delete 33, find 0/0, race 0+0
Free swap = 248864kB
Total swap = 248996kB
Free swap: 248864kB
65408 pages of RAM
0 pages of HIGHMEM
1529 reserved pages
46307 pages shared
0 pages swap cached
16 pages dirty
0 pages writeback
39343 pages mapped
4078 pages slab
316 pages pagetables


cat /proc/slabinfo
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
nfs_write_data 36 36 448 9 1 : tunables 54 27 0 : slabdata 4 4 0
nfs_read_data 32 36 448 9 1 : tunables 54 27 0 : slabdata 4 4 0
nfs_inode_cache 3 14 560 7 1 : tunables 54 27 0 : slabdata 2 2 0
nfs_page 0 0 64 59 1 : tunables 120 60 0 : slabdata 0 0 0
rpc_buffers 8 8 2048 2 1 : tunables 24 12 0 : slabdata 4 4 0
rpc_tasks 8 20 192 20 1 : tunables 120 60 0 : slabdata 1 1 0
rpc_inode_cache 8 9 416 9 1 : tunables 54 27 0 : slabdata 1 1 0
ip_conntrack_expect 0 0 96 40 1 : tunables 120 60 0 : slabdata 0 0 0
ip_conntrack 1 15 256 15 1 : tunables 120 60 0 : slabdata 1 1 0
scsi_cmd_cache 1 11 352 11 1 : tunables 54 27 0 : slabdata 1 1 0
d_cursor 0 0 64 59 1 : tunables 120 60 0 : slabdata 0 0 0
file_fsdata 71 75 256 15 1 : tunables 120 60 0 : slabdata 5 5 0
dentry_fsdata 2188 3658 64 59 1 : tunables 120 60 0 : slabdata 62 62 0
fq 0 0 64 59 1 : tunables 120 60 0 : slabdata 0 0 0
jnode 1869 4480 96 40 1 : tunables 120 60 0 : slabdata 112 112 0
txn_handle 0 0 32 113 1 : tunables 120 60 0 : slabdata 0 0 0
txn_atom 1 15 256 15 1 : tunables 120 60 0 : slabdata 1 1 0
plugin_set 73 118 64 59 1 : tunables 120 60 0 : slabdata 2 2 0
znode 4704 7888 224 17 1 : tunables 120 60 0 : slabdata 464 464 0
reiser4_inode 4057 4144 512 7 1 : tunables 54 27 0 : slabdata 592 592 0
sgpool-128 32 32 2048 2 1 : tunables 24 12 0 : slabdata 16 16 0
sgpool-64 32 32 1024 4 1 : tunables 54 27 0 : slabdata 8 8 0
sgpool-32 32 32 512 8 1 : tunables 54 27 0 : slabdata 4 4 0
sgpool-16 32 45 256 15 1 : tunables 120 60 0 : slabdata 3 3 0
sgpool-8 32 60 128 30 1 : tunables 120 60 0 : slabdata 2 2 0
dm_tio 0 0 16 203 1 : tunables 120 60 0 : slabdata 0 0 0
dm_io 0 0 16 203 1 : tunables 120 60 0 : slabdata 0 0 0
uhci_urb_priv 1 92 40 92 1 : tunables 120 60 0 : slabdata 1 1 0
UNIX 77 77 352 11 1 : tunables 54 27 0 : slabdata 7 7 0
tcp_bind_bucket 15 203 16 203 1 : tunables 120 60 0 : slabdata 1 1 0
inet_peer_cache 1 59 64 59 1 : tunables 120 60 0 : slabdata 1 1 0
ip_fib_alias 9 113 32 113 1 : tunables 120 60 0 : slabdata 1 1 0
ip_fib_hash 9 113 32 113 1 : tunables 120 60 0 : slabdata 1 1 0
ip_dst_cache 31 45 256 15 1 : tunables 120 60 0 : slabdata 3 3 0
arp_cache 3 30 128 30 1 : tunables 120 60 0 : slabdata 1 1 0
RAW 2 9 448 9 1 : tunables 54 27 0 : slabdata 1 1 0
UDP 8 9 448 9 1 : tunables 54 27 0 : slabdata 1 1 0
tw_sock_TCP 0 0 96 40 1 : tunables 120 60 0 : slabdata 0 0 0
request_sock_TCP 0 0 64 59 1 : tunables 120 60 0 : slabdata 0 0 0
TCP 15 16 960 4 1 : tunables 54 27 0 : slabdata 4 4 0
cfq_ioc_pool 0 0 48 78 1 : tunables 120 60 0 : slabdata 0 0 0
cfq_pool 0 0 96 40 1 : tunables 120 60 0 : slabdata 0 0 0
crq_pool 0 0 44 84 1 : tunables 120 60 0 : slabdata 0 0 0
deadline_drq 0 0 48 78 1 : tunables 120 60 0 : slabdata 0 0 0
as_arq 24 189 60 63 1 : tunables 120 60 0 : slabdata 3 3 0
mqueue_inode_cache 1 7 512 7 1 : tunables 54 27 0 : slabdata 1 1 0
reiser_inode_cache 622 1450 392 10 1 : tunables 54 27 0 : slabdata 145 145 0
dnotify_cache 0 0 20 169 1 : tunables 120 60 0 : slabdata 0 0 0
eventpoll_pwq 0 0 36 101 1 : tunables 120 60 0 : slabdata 0 0 0
eventpoll_epi 0 0 96 40 1 : tunables 120 60 0 : slabdata 0 0 0
inotify_event_cache 0 0 28 127 1 : tunables 120 60 0 : slabdata 0 0 0
inotify_watch_cache 0 0 36 101 1 : tunables 120 60 0 : slabdata 0 0 0
kioctx 0 0 160 24 1 : tunables 120 60 0 : slabdata 0 0 0
kiocb 0 0 128 30 1 : tunables 120 60 0 : slabdata 0 0 0
fasync_cache 2 203 16 203 1 : tunables 120 60 0 : slabdata 1 1 0
shmem_inode_cache 748 756 408 9 1 : tunables 54 27 0 : slabdata 84 84 0
posix_timers_cache 0 0 96 40 1 : tunables 120 60 0 : slabdata 0 0 0
uid_cache 6 59 64 59 1 : tunables 120 60 0 : slabdata 1 1 0
blkdev_ioc 51 127 28 127 1 : tunables 120 60 0 : slabdata 1 1 0
blkdev_queue 2 10 380 10 1 : tunables 54 27 0 : slabdata 1 1 0
blkdev_requests 25 78 152 26 1 : tunables 120 60 0 : slabdata 3 3 0
biovec-(256) 260 260 3072 2 2 : tunables 24 12 0 : slabdata 130 130 0
biovec-128 264 265 1536 5 2 : tunables 24 12 0 : slabdata 53 53 0
biovec-64 272 275 768 5 1 : tunables 54 27 0 : slabdata 55 55 0
biovec-16 272 280 192 20 1 : tunables 120 60 0 : slabdata 14 14 0
biovec-4 272 295 64 59 1 : tunables 120 60 0 : slabdata 5 5 0
biovec-1 279 406 16 203 1 : tunables 120 60 0 : slabdata 2 2 0
bio 279 354 64 59 1 : tunables 120 60 0 : slabdata 6 6 0
file_lock_cache 21 44 88 44 1 : tunables 120 60 0 : slabdata 1 1 0
sock_inode_cache 110 110 352 11 1 : tunables 54 27 0 : slabdata 10 10 0
skbuff_fclone_cache 0 0 320 12 1 : tunables 54 27 0 : slabdata 0 0 0
skbuff_head_cache 696 696 160 24 1 : tunables 120 60 0 : slabdata 29 29 0
acpi_operand 828 828 40 92 1 : tunables 120 60 0 : slabdata 9 9 0
acpi_parse_ext 61 84 44 84 1 : tunables 120 60 0 : slabdata 1 1 0
acpi_parse 41 127 28 127 1 : tunables 120 60 0 : slabdata 1 1 0
acpi_state 28 78 48 78 1 : tunables 120 60 0 : slabdata 1 1 0
proc_inode_cache 215 360 332 12 1 : tunables 54 27 0 : slabdata 30 30 0
sigqueue 4 26 148 26 1 : tunables 120 60 0 : slabdata 1 1 0
radix_tree_node 3568 4046 276 14 1 : tunables 54 27 0 : slabdata 289 289 0
bdev_cache 7 9 416 9 1 : tunables 54 27 0 : slabdata 1 1 0
sysfs_dir_cache 4059 4140 40 92 1 : tunables 120 60 0 : slabdata 45 45 0
mnt_cache 27 40 96 40 1 : tunables 120 60 0 : slabdata 1 1 0
inode_cache 1113 1272 316 12 1 : tunables 54 27 0 : slabdata 106 106 0
dentry_cache 5085 7569 136 29 1 : tunables 120 60 0 : slabdata 261 261 0
filp 1512 1632 160 24 1 : tunables 120 60 0 : slabdata 68 68 0
names_cache 11 11 4096 1 1 : tunables 24 12 0 : slabdata 11 11 0
idr_layer_cache 93 116 136 29 1 : tunables 120 60 0 : slabdata 4 4 0
buffer_head 3942 20592 48 78 1 : tunables 120 60 0 : slabdata 264 264 0
mm_struct 77 77 576 7 1 : tunables 54 27 0 : slabdata 11 11 0
vm_area_struct 3512 3740 88 44 1 : tunables 120 60 0 : slabdata 85 85 0
fs_cache 77 113 32 113 1 : tunables 120 60 0 : slabdata 1 1 0
files_cache 78 99 448 9 1 : tunables 54 27 0 : slabdata 11 11 0
signal_cache 99 99 352 11 1 : tunables 54 27 0 : slabdata 9 9 0
sighand_cache 84 84 1312 3 1 : tunables 24 12 0 : slabdata 28 28 0
task_struct 93 93 1328 3 1 : tunables 24 12 0 : slabdata 31 31 0
anon_vma 1504 1695 8 339 1 : tunables 120 60 0 : slabdata 5 5 0
pgd 64 64 4096 1 1 : tunables 24 12 0 : slabdata 64 64 0
size-131072(DMA) 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0
size-131072 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0
size-65536(DMA) 0 0 65536 1 16 : tunables 8 4 0 : slabdata 0 0 0
size-65536 0 0 65536 1 16 : tunables 8 4 0 : slabdata 0 0 0
size-32768(DMA) 0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0 0
size-32768 2 2 32768 1 8 : tunables 8 4 0 : slabdata 2 2 0
size-16384(DMA) 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0
size-16384 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0
size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0
size-8192 95 95 8192 1 2 : tunables 8 4 0 : slabdata 95 95 0
size-4096(DMA) 0 0 4096 1 1 : tunables 24 12 0 : slabdata 0 0 0
size-4096 100 100 4096 1 1 : tunables 24 12 0 : slabdata 100 100 0
size-2048(DMA) 0 0 2048 2 1 : tunables 24 12 0 : slabdata 0 0 0
size-2048 310 328 2048 2 1 : tunables 24 12 0 : slabdata 164 164 0
size-1024(DMA) 0 0 1024 4 1 : tunables 54 27 0 : slabdata 0 0 0
size-1024 176 176 1024 4 1 : tunables 54 27 0 : slabdata 44 44 0
size-512(DMA) 0 0 512 8 1 : tunables 54 27 0 : slabdata 0 0 0
size-512 624 624 512 8 1 : tunables 54 27 0 : slabdata 78 78 0
size-256(DMA) 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0
size-256 150 150 256 15 1 : tunables 120 60 0 : slabdata 10 10 0
size-128(DMA) 0 0 128 30 1 : tunables 120 60 0 : slabdata 0 0 0
size-128 1702 1800 128 30 1 : tunables 120 60 0 : slabdata 60 60 0
size-64(DMA) 0 0 64 59 1 : tunables 120 60 0 : slabdata 0 0 0
size-32(DMA) 0 0 32 113 1 : tunables 120 60 0 : slabdata 0 0 0
size-64 2641 2891 64 59 1 : tunables 120 60 0 : slabdata 49 49 0
size-32 3020 3616 32 113 1 : tunables 120 60 0 : slabdata 32 32 0
kmem_cache 160 160 96 40 1 : tunables 120 60 0 : slabdata 4 4 0

and
cat /proc/vmstat
nr_dirty 6
nr_writeback 0
nr_unstable 0
nr_page_table_pages 299
nr_mapped 39613
nr_slab 4128
pgpgin 853871
pgpgout 697604
pswpin 0
pswpout 33
pgalloc_high 0
pgalloc_normal 7729542
pgalloc_dma 739299
pgfree 8475900
pgactivate 194732
pgdeactivate 167948
pgfault 4652531
pgmajfault 2200
pgrefill_high 0
pgrefill_normal 921490
pgrefill_dma 53701
pgsteal_high 0
pgsteal_normal 225142
pgsteal_dma 32821
pgscan_kswapd_high 0
pgscan_kswapd_normal 218790
pgscan_kswapd_dma 31262
pgscan_direct_high 0
pgscan_direct_normal 63855
pgscan_direct_dma 10391
pginodesteal 888
slabs_scanned 1641984
kswapd_steal 196892
kswapd_inodesteal 17749
pageoutrun 5595
allocstall 1531
pgrotated 71
nr_bounce 0
--
mattia
:wq!
Andrew Morton
2005-09-24 18:23:39 UTC
Permalink
Post by Mattia Dongili
Post by Andrew Morton
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.14-rc2/2.6.14-rc2-mm1/
Herm... running almost good :) I just got the below allocation failure
(including /proc/slabinfo and /proc/vmstat, useful? can provide more
info if happens again - ah, exim is just running for the local delivery
purpose only). I did see it previously in .14-rc1-mm1 only but I didn't
find time enough to report it properly.
...
exim4: page allocation failure. order:1, mode:0x80000020
Yes, it's expected that
mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk.patch will cause more
fragmentation and will hence cause higher-order allocation attempts to
fail.

I think I'll drop that one.
Seth, Rohit
2005-09-26 19:33:57 UTC
Permalink
Post by Andrew Morton
Post by Mattia Dongili
...
exim4: page allocation failure. order:1, mode:0x80000020
Yes, it's expected that
mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk.patch will cause more
fragmentation and will hence cause higher-order allocation attempts to
fail.
I think I'll drop that one.
It seems from the log messages that quite a few pages are sitting in the CPU's cold pcp list even under low memory conditions. Below is a patch to reduce the upper bound of the cold pcp list (...this got increased by my previous change).

I think we should also drain the CPU's hot and cold pcps for GFP_KERNEL page requests (in the event the higher order request cannot be serviced otherwise). This will still only drain the current CPU's pcps in an MP environment (leaving the other CPUs' lists intact). I will send this patch later today.

[PATCH]: Reduce the high mark in cpu's cold pcp list.

Signed-off-by: Rohit Seth <***@intel.com>


--- linux-2.6.13.old/mm/page_alloc.c 2005-09-26 10:57:07.000000000 -0700
+++ linux-2.6.13.work/mm/page_alloc.c 2005-09-26 10:47:57.000000000 -0700
@@ -1749,7 +1749,7 @@
pcp = &p->pcp[1]; /* cold*/
pcp->count = 0;
pcp->low = 0;
- pcp->high = 2 * batch;
+ pcp->high = batch / 2;
pcp->batch = max(1UL, batch/2);
INIT_LIST_HEAD(&pcp->list);
}
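
To put rough numbers on the change (borrowing the Normal-zone per-cpu values from the dmesg dumps that appear later in this thread, where the zone batch is 64):

    old cold high = 2 * batch       = 2 * 64 = 128 pages  (512kB per CPU, per zone)
    new cold high = batch / 2       = 64 / 2 =  32 pages  (128kB per CPU, per zone)
    cold batch    = max(1, batch/2)          =  32 pages

i.e. with the patch the cold list spills back to the buddy allocator once it holds a single batch worth of pages, instead of being allowed to accumulate four batches first.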
Martin J. Bligh
2005-09-27 18:57:50 UTC
Permalink
Post by Seth, Rohit
Seems like from the log messages that quite a few pages are hanging in the cpu's cold pcp list even with the low memory conditions. Below is the patch to reduce the higher bound in cold pcp list (...this got increased with my previous change).
I think we should also drain the CPU's hot and cold pcps for the GFP_KERNEL page requests (in the event the higher order request is not able to get serviced otherwise). This will still only drains the current CPUs pcps in an MP environment (leaving the other CPUs with their lists intact). I will send this patch later today.
[PATCH]: Reduce the high mark in cpu's cold pcp list.
--- linux-2.6.13.old/mm/page_alloc.c 2005-09-26 10:57:07.000000000 -0700
+++ linux-2.6.13.work/mm/page_alloc.c 2005-09-26 10:47:57.000000000 -0700
@@ -1749,7 +1749,7 @@
pcp = &p->pcp[1]; /* cold*/
pcp->count = 0;
pcp->low = 0;
- pcp->high = 2 * batch;
+ pcp->high = batch / 2;
pcp->batch = max(1UL, batch/2);
INIT_LIST_HEAD(&pcp->list);
}
-
I don't understand. How can you set the high watermark at half the batch
size? Makes no sense to me.

And can you give a stricter definition of what you mean by "low memory
conditions"? I agree we ought to empty the lists before going OOM or
anything, but not at the slightest feather of pressure ... the answer lies
somewhere in between ... but where?

M.
Rohit Seth
2005-09-27 20:05:02 UTC
Permalink
Post by Seth, Rohit
Post by Seth, Rohit
Seems like from the log messages that quite a few pages are hanging
in the cpu's cold pcp list even with the low memory conditions. Below
is the patch to reduce the higher bound in cold pcp list (...this got
increased with my previous change).
Post by Seth, Rohit
I think we should also drain the CPU's hot and cold pcps for the
GFP_KERNEL page requests (in the event the higher order request is not
able to get serviced otherwise). This will still only drains the
current CPUs pcps in an MP environment (leaving the other CPUs with
their lists intact). I will send this patch later today.
Post by Seth, Rohit
[PATCH]: Reduce the high mark in cpu's cold pcp list.
--- linux-2.6.13.old/mm/page_alloc.c 2005-09-26 10:57:07.000000000
-0700
Post by Seth, Rohit
+++ linux-2.6.13.work/mm/page_alloc.c 2005-09-26 10:47:57.000000000
-0700
Post by Seth, Rohit
@@ -1749,7 +1749,7 @@
pcp = &p->pcp[1]; /* cold*/
pcp->count = 0;
pcp->low = 0;
- pcp->high = 2 * batch;
+ pcp->high = batch / 2;
pcp->batch = max(1UL, batch/2);
INIT_LIST_HEAD(&pcp->list);
}
-
I don't understand. How can you set the high watermark at half the
batch size? Makes no sense to me.
The batch size for the cold pcp list is initialized to batch/2 in the
code snippet above. So this change sets the high water mark for the cold
list to the same value as the pcp's batch number.
Post by Martin J. Bligh
And can you give a stricter definiton of what you mean by "low memory
conditions"? I agree we ought to empty the lists before going OOM or
anything, but not at the slightest feather of pressure ... answer lies
somewhere inbetween ... but where?
In the specific case of the dump information that Mattia sent earlier,
there was only 4MB of free memory available at the time the order-1
request was failing.

In general, I think that if a specific higher order (> 0) request that
has GFP_KERNEL set fails, then at the least we should drain the pcps.
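
A minimal sketch of what "drain the pcps" would mean here, pieced together
from the 2.6.13-era allocator (the per_cpu_pages fields match the hunk
quoted above; free_pages_bulk() is assumed to be that era's helper for
handing pages back to the buddy free lists - names are approximate, and
this is only an illustration, not the actual patch):

static void drain_zone_pcps(struct zone *zone, struct per_cpu_pageset *pset)
{
	unsigned long flags;
	int i;

	/* pcp[0] is the hot list, pcp[1] the cold list */
	for (i = 0; i < ARRAY_SIZE(pset->pcp); i++) {
		struct per_cpu_pages *pcp = &pset->pcp[i];

		local_irq_save(flags);
		/*
		 * Hand every page parked on this per-cpu list back to the
		 * buddy allocator, where it can merge with its buddies and
		 * satisfy order > 0 requests again.
		 */
		pcp->count -= free_pages_bulk(zone, pcp->count,
					      &pcp->list, 0);
		local_irq_restore(flags);
	}
}

Calling something like this for each zone, for the local CPU only, just
before the allocator would otherwise fail a higher order GFP_KERNEL
request, is what is being proposed above.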

-rohit
Martin J. Bligh
2005-09-27 21:18:14 UTC
Permalink
Post by Rohit Seth
Post by Martin J. Bligh
Post by Seth, Rohit
--- linux-2.6.13.old/mm/page_alloc.c 2005-09-26 10:57:07.000000000
-0700
Post by Seth, Rohit
+++ linux-2.6.13.work/mm/page_alloc.c 2005-09-26 10:47:57.000000000
-0700
Post by Seth, Rohit
@@ -1749,7 +1749,7 @@
pcp = &p->pcp[1]; /* cold*/
pcp->count = 0;
pcp->low = 0;
- pcp->high = 2 * batch;
+ pcp->high = batch / 2;
pcp->batch = max(1UL, batch/2);
INIT_LIST_HEAD(&pcp->list);
}
-
I don't understand. How can you set the high watermark at half the
batch size? Makes no sense to me.
The batch size for the cold pcp list is getting initialized to batch/2
in the code snip above. So, this change is setting the high water mark
for cold list to same as pcp's batch number.
I must be being particularly dense today ... but:

pcp->high = batch / 2;

Looks like half the batch size to me, not the same?
Post by Rohit Seth
Post by Martin J. Bligh
And can you give a stricter definiton of what you mean by "low memory
conditions"? I agree we ought to empty the lists before going OOM or
anything, but not at the slightest feather of pressure ... answer lies
somewhere inbetween ... but where?
In the specific case of dump information that Mattia sent earlier, there
is only 4M of free mem available at the time the order 1 request is
failing.
In general, I think if a specific higher order ( > 0) request fails that
has GFP_KERNEL set then at least we should drain the pcps.
Mmmm. So every time we fork a process with 8K stacks, or allocate a frame
for jumbo ethernet, or NFS, you want to drain the lists? That seems to
wholly defeat the purpose.

Could you elaborate on what the benefits were from this change in the
first place? Some page colouring thing on ia64? It seems to have way more
downside than upside to me.

M.
Rohit Seth
2005-09-27 21:51:59 UTC
Permalink
Post by Seth, Rohit
Post by Rohit Seth
Post by Martin J. Bligh
Post by Seth, Rohit
--- linux-2.6.13.old/mm/page_alloc.c 2005-09-26 10:57:07.000000000
-0700
Post by Seth, Rohit
+++ linux-2.6.13.work/mm/page_alloc.c 2005-09-26 10:47:57.000000000
-0700
Post by Seth, Rohit
@@ -1749,7 +1749,7 @@
pcp = &p->pcp[1]; /* cold*/
pcp->count = 0;
pcp->low = 0;
- pcp->high = 2 * batch;
+ pcp->high = batch / 2;
pcp->batch = max(1UL, batch/2);
INIT_LIST_HEAD(&pcp->list);
}
-
I don't understand. How can you set the high watermark at half the
batch size? Makes no sense to me.
The batch size for the cold pcp list is getting initialized to batch/2
in the code snip above. So, this change is setting the high water mark
for cold list to same as pcp's batch number.
pcp->high = batch / 2;
Looks like half the batch size to me, not the same?
pcp->batch = max(1UL, batch/2); is the line of code that sets the batch
value for the cold pcp list. batch is just a number that we computed
from some parameters earlier.
Post by Seth, Rohit
Post by Rohit Seth
Post by Martin J. Bligh
And can you give a stricter definiton of what you mean by "low memory
conditions"? I agree we ought to empty the lists before going OOM or
anything, but not at the slightest feather of pressure ... answer lies
somewhere inbetween ... but where?
In the specific case of dump information that Mattia sent earlier, there
is only 4M of free mem available at the time the order 1 request is
failing.
In general, I think if a specific higher order ( > 0) request fails that
has GFP_KERNEL set then at least we should drain the pcps.
Mmmm. so every time we fork a process with 8K stacks, or allocate a frame
for jumbo ethernet, or NFS, you want to drain the lists? that seems to
wholly defeat the purpose.
Not every time there is a request for higher order pages - that would
surely defeat the purpose of pcps. My suggestion is only to drain when
the global pool is not able to service the request. In the pathological
case where higher order and zero order requests alternate, you could see
thrashing, with pages moving to the pcp only to move back to the global
list.
Post by Seth, Rohit
Could you elaborate on what the benefits were from this change in the
first place? Some page colouring thing on ia64? It seems to have way more
downside than upside to me.
The original change was to try to allocate a higher order page to
service a batch-sized bulk request, in the hope that better physical
contiguity would spread the data better across big caches.

-rohit
Martin J. Bligh
2005-09-27 21:59:45 UTC
Permalink
Post by Rohit Seth
Post by Seth, Rohit
pcp->high = batch / 2;
Looks like half the batch size to me, not the same?
pcp->batch = max(1UL, batch/2); is the line of code that is setting the
batch value for the cold pcp list. batch is just a number that we
counted based on some parameters earlier.
Ah, OK, so I am being dense. Fair enough. But if there's a reason to do
that max, perhaps:

pcp->batch = max(1UL, batch/2);
pcp->high = pcp->batch;

would be more appropriate? The tradeoff is more frequent dump/fill
against better fragmentation, I suppose (at least if we don't refill
using higher order allocs ;-)), which seems fair enough.
Post by Rohit Seth
Post by Seth, Rohit
Post by Rohit Seth
In general, I think if a specific higher order ( > 0) request fails that
has GFP_KERNEL set then at least we should drain the pcps.
Mmmm. so every time we fork a process with 8K stacks, or allocate a frame
for jumbo ethernet, or NFS, you want to drain the lists? that seems to
wholly defeat the purpose.
Not every time there is a request for higher order pages. That surely
will defeat the purpose of pcps. But my suggestion is only to drain
when the the global pool is not able to service the request. In the
pathological case where the higher order and zero order requests are
alternating you could have thrashing in terms of pages moving to pcp for
them to move back to global list.
OK, seems fair enough. But there are multiple "harder and harder"
attempts within __alloc_pages to do that ... which one are you going for?
Just before we OOM / fail the alloc? That'd be hard to argue with, though
I'm unsure what the locking is to dump out other CPUs' queues - are you
going to global IPI and ask them to do it? That'd seem to cause a race to
refill (as you mention).
Post by Rohit Seth
Post by Seth, Rohit
Could you elaborate on what the benefits were from this change in the
first place? Some page colouring thing on ia64? It seems to have way more
downside than upside to me.
The original change was to try to allocate a higher order page to
service a batch size bulk request. This was with the hope that better
physical contiguity will spread the data better across big caches.
OK ... but it has an impact on fragmentation. How much benefit are you
getting?

M.
Rohit Seth
2005-09-27 22:49:06 UTC
Permalink
Post by Seth, Rohit
pcp->batch = max(1UL, batch/2);
pcp->high = pcp->batch;
would be more appropriate? Tradeoff is more frequent dump / fill against
better frag, I suppose (at least if we don't refill using higher order
allocs ;-)) which seems fair enough.
There are a couple of small changes to this initialization routine,
including this one, that I will be sending out.
Post by Seth, Rohit
Post by Rohit Seth
Not every time there is a request for higher order pages. That surely
will defeat the purpose of pcps. But my suggestion is only to drain
when the the global pool is not able to service the request. In the
pathological case where the higher order and zero order requests are
alternating you could have thrashing in terms of pages moving to pcp for
them to move back to global list.
OK, seems fair enough. But there's multiple "harder and harder" attempts
within __alloc_pages to do that ... which one are you going for? just
before we OOM / fail the alloc? That'd be hard to argue with, though I'm
unsure what the locking is to dump out other CPUs queues - you going to
global IPI and ask them to do it - that'd seem to cause it to race to
refill (as you mention).
I'm thinking of initiating this drain operation after the swapper daemon
is woken up. Hopefully that will allow other pages to be put back on the
freelist and reduce the possible thrashing of pages between the free
memory pool and the pcps.

As a first step, I will be draining the local CPU's pcp. An IPI or lazy
purging of pcps could be used as a very last resort to drain other CPUs'
pcps in scenarios where nothing else has worked to get more pages. Under
such extreme low memory conditions I'm not sure we should worry about
thrashing any more than about having free pages lying around and not
getting used.
Post by Seth, Rohit
Post by Rohit Seth
Post by Martin J. Bligh
Could you elaborate on what the benefits were from this change in the
first place? Some page colouring thing on ia64? It seems to have way more
downside than upside to me.
The original change was to try to allocate a higher order page to
service a batch size bulk request. This was with the hope that better
physical contiguity will spread the data better across big caches.
OK ... but it has an impact on fragmentation. How much benefit are you
getting?
The benefit is reduced run-to-run performance variation (and expected
throughput) of certain workloads on the same kernel.

-rohit
Martin J. Bligh
2005-09-27 22:49:29 UTC
Permalink
Post by Rohit Seth
Post by Martin J. Bligh
Post by Rohit Seth
Not every time there is a request for higher order pages. That surely
will defeat the purpose of pcps. But my suggestion is only to drain
when the the global pool is not able to service the request. In the
pathological case where the higher order and zero order requests are
alternating you could have thrashing in terms of pages moving to pcp for
them to move back to global list.
OK, seems fair enough. But there's multiple "harder and harder" attempts
within __alloc_pages to do that ... which one are you going for? just
before we OOM / fail the alloc? That'd be hard to argue with, though I'm
unsure what the locking is to dump out other CPUs queues - you going to
global IPI and ask them to do it - that'd seem to cause it to race to
refill (as you mention).
Thinking of initiating this drain operation after the swapper daemon is
woken up. hopefully that will allow other possible pages to be put back
on freelist and reduce the possible thrash of pages between freemem pool
and pcps.
OK, but waking up kswapd doesn't indicate a low memory condition.
It's standard procedure .... we'll have to wake it up whenever we dip
below the high watermarks. Perhaps before dropping into direct reclaim
would be more appropriate?
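
For reference, the slow path in __alloc_pages() around 2.6.13 went roughly
as below (heavily abbreviated and from memory; try_zones(),
wake_kswapd_zones(), drain_local_pcps() and direct_reclaim() are
placeholder names, not real kernel functions - only __GFP_WAIT is a real
flag), with the proposed drain slotted in just ahead of direct reclaim:

struct page *alloc_pages_slowpath_sketch(unsigned int gfp_mask,
					 unsigned int order)
{
	struct page *page;

	/* 1. fast-ish path: allocate while zones sit above their low mark */
	page = try_zones(gfp_mask, order /*, pages_low */);
	if (page)
		return page;

	/* 2. standard procedure: kick kswapd, then retry against pages_min */
	wake_kswapd_zones();
	page = try_zones(gfp_mask, order /*, pages_min */);
	if (page)
		return page;

	/* 3. proposed: flush this CPU's hot/cold pcps back to the buddy
	 *    lists and retry, before paying for direct reclaim */
	drain_local_pcps();
	page = try_zones(gfp_mask, order /*, pages_min */);
	if (page)
		return page;

	/* 4. direct reclaim (callers that may sleep), then retry again */
	if (gfp_mask & __GFP_WAIT) {
		direct_reclaim();	/* try_to_free_pages() and friends */
		page = try_zones(gfp_mask, order /*, pages_min */);
	}

	/* 5. still nothing: warn ("page allocation failure"), OOM or fail */
	return page;
}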
Post by Rohit Seth
As a first step, I will be draining the local cpu's pcp. IPI or lazy
purging of pcps could be used as a a very last resort to drain other
CPUs pcps for the scnearios where nothing else has worked to get more
pages. For these extreme low memory conditions I'm not sure if we
should worry about thrashing any more than having free pages lying
around and not getting used.
Sounds fair.
Post by Rohit Seth
Post by Martin J. Bligh
Post by Rohit Seth
Post by Martin J. Bligh
Could you elaborate on what the benefits were from this change in the
first place? Some page colouring thing on ia64? It seems to have way more
downside than upside to me.
The original change was to try to allocate a higher order page to
service a batch size bulk request. This was with the hope that better
physical contiguity will spread the data better across big caches.
OK ... but it has an impact on fragmentation. How much benefit are you
getting?
Benefit is in terms of reduced performance variation (and expected
throughput) of certain workloads from run to run on the same kernel.
Mmmm. How much are you talking about in terms of throughput, and on what
platforms? All previous attempts to measure page colouring seemed to
indicate it did nothing at all - maybe some specific types of h/w are
more susceptible?

M.
Rohit Seth
2005-09-27 23:16:12 UTC
Permalink
Post by Martin J. Bligh
Post by Rohit Seth
Thinking of initiating this drain operation after the swapper daemon is
woken up. hopefully that will allow other possible pages to be put back
on freelist and reduce the possible thrash of pages between freemem pool
and pcps.
OK, but waking up kswapd doesn't indicate a low memory condition.
It's standard procedure .... we'll have to wake it up whenever we dip
below the high watermarks. Perhaps before dropping into direct reclaim
would be more appropriate?
Agreed. That is a better place.
Post by Martin J. Bligh
Post by Rohit Seth
Post by Martin J. Bligh
Post by Rohit Seth
Post by Martin J. Bligh
Could you elaborate on what the benefits were from this change in the
first place? Some page colouring thing on ia64? It seems to have way more
downside than upside to me.
The original change was to try to allocate a higher order page to
service a batch size bulk request. This was with the hope that better
physical contiguity will spread the data better across big caches.
OK ... but it has an impact on fragmentation. How much benefit are you
getting?
Benefit is in terms of reduced performance variation (and expected
throughput) of certain workloads from run to run on the same kernel.
Mmmm. how much are you talking about in terms of throughput, and on what
platforms? all previous attempts to measure page colouring seemed to
indicate it did nothing at all - maybe some specific types of h/w are
more susceptible?
In terms of percentages, between 10-15% variation. Nothing out of the
ordinary about the platforms. Do you remember what workloads were run in
the previous attempts to see if there is any coloring effect? I agree
that with 2.6.x based kernels there is a better handle on the variation
(as compared to 2.4), and the best results of 2.6 match the best results
of any coloring patch.

-rohit
Paul Blazejowski
2005-09-25 22:00:37 UTC
Permalink
Folks,

Upon quick testing of the latest -mm kernel, it appears there's some kind
of race condition when using a dual core CPU, especially under XORG with
a USB keyboard (although PS2 has the same issue): the keyboard repeat
rate becomes too fast.

The same behaviour happens on vanilla 2.6.13 kernel. Reporting this also
to XORG list in hopes to help debug this issue.

The platform is nForce4 SLI from ASUS (A8N-SLI Premium) with dual core
X2 Athlon 3800+ processor.

XORG version is 6.8.2 under Slackware 10.2.

uname -a reports: Linux blaze 2.6.14-rc2-mm1 #1 SMP Sun Sep 25 17:03:22
EDT 2005 i686 AMD Athlon(tm) 64 X2 Dual Core Processor 3800+
AuthenticAMD GNU/Linux

kernel config, dmesg output and lspci -vvv will be attached below.

I have confirmed this with another fellow who is using the same setup
and is having the same issue. Also worth noting is that the SATA
performance is very poor: hdparm gives ~33MB/s, whereas on nforce2 the
rates were previously in the ~58MB/s range. For comparison, SCSI rates
are in the ~52MB/s range. This happens on both the sata_nv and sata_sil
controllers on this mainboard.

One of the workarounds for me is to adjust the keyboard rate in the
gnome-keyboard tools, which helps. Also, when browsing websites the USB
mouse has problems with scrolling and window painting seems very slow -
typing www. in the URL bar can take up to 10 seconds before the bar
shows previously entered URLs. Playing an mp3 makes the music skip very
badly. I have not tried a UP kernel, but from the reports I've read the
issue is gone when using X. As noted here:
http://lists.freedesktop.org/archives/xorg/2005-September/010148.html

I can help debug this, and if more info is needed please CC me on the
responses.

Best Regards,

Paul B.
Andrew Morton
2005-09-25 23:44:21 UTC
Permalink
Post by Paul Blazejowski
Upon quick testing the latest mm kernel it appears there's some kind of
race condition when using dual core cpu esp when using XORG and USB
(although PS2 has same issue) kebyboard rate being too fast.
The same behaviour happens on vanilla 2.6.13 kernel. Reporting this also
to XORG list in hopes to help debug this issue.
Is it possible to narrow this down a bit further? Was 2.6.12 OK?

If we can identify two reasonably-close-in-time versions either side of the
regression then the next step would be to run `dmesg -s 1000000' under both
kernel versions, then run `diff -u dmesg.good dmesg.bad'.
Carlo Calica
2005-09-26 04:32:09 UTC
Permalink
I had the same problem with 2.6.12. I'll run some tests with older kernels.
Post by Andrew Morton
Post by Paul Blazejowski
Upon quick testing the latest mm kernel it appears there's some kind of
race condition when using dual core cpu esp when using XORG and USB
(although PS2 has same issue) kebyboard rate being too fast.
The same behaviour happens on vanilla 2.6.13 kernel. Reporting this also
to XORG list in hopes to help debug this issue.
Is it possible to narrow this down a bit further? Was 2.6.12 OK?
If we can identify two reasonably-close-in-time versions either side of the
regression then the next step would be to run `dmesg -s 1000000' under both
kernel versions, then run `diff -u dmesg.good dmesg.bad'.
--
Carlo J. Calica
Paul Blazejowski
2005-09-28 04:56:32 UTC
Permalink
Post by Andrew Morton
Post by Paul Blazejowski
Upon quick testing the latest mm kernel it appears there's some kind of
race condition when using dual core cpu esp when using XORG and USB
(although PS2 has same issue) kebyboard rate being too fast.
The same behaviour happens on vanilla 2.6.13 kernel. Reporting this also
to XORG list in hopes to help debug this issue.
Is it possible to narrow this down a bit further? Was 2.6.12 OK?
If we can identify two reasonably-close-in-time versions either side of the
regression then the next step would be to run `dmesg -s 1000000' under both
kernel versions, then run `diff -u dmesg.good dmesg.bad'.
No, 2.6.12 is not OK. I don't think there's any regression between the
recent kernels; it just does not work on any of the 3 I have tried so far.

I am attaching a diff of 2.6.12/2.6.13 against 2.6.14-rc2-mm1.
Carlo Calica
2005-09-28 19:07:59 UTC
Permalink
Post by Paul Blazejowski
No 2.6.12 is not OK. I don't think there's any regression between the
recent kernels. It just does not work on 3 of them i tried so far.
Another data point:

I'm unable to reproduce it on a PATA install - specifically, booting on a
PATA HD with sata_nv as a module. When booting on a SATA HD with sata_nv
compiled in, I get the race. Setting the IRQ 1 and 5 (keyboard and
libata) handlers' affinity to cpu0, and X's affinity to cpu0, solves the
problem.
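
For anyone who wants to apply that workaround from a script rather than by
hand, the mask lives in /proc/irq/<n>/smp_affinity; a minimal sketch in C
(IRQs 1 and 5 are the ones named above, and the mask value 1 assumes CPU0
is bit 0 - the X server itself can be pinned with taskset /
sched_setaffinity):

/* Minimal sketch: pin IRQs 1 (keyboard) and 5 (libata, per the report
 * above) to CPU0 by writing a hex CPU mask to /proc/irq/<n>/smp_affinity.
 * Must be run as root. */
#include <stdio.h>

static int pin_irq_to_cpu0(int irq)
{
	char path[64];
	FILE *f;

	snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);
	f = fopen(path, "w");
	if (!f) {
		perror(path);
		return -1;
	}
	fprintf(f, "1\n");		/* CPU mask: bit 0 = CPU0 */
	fclose(f);
	return 0;
}

int main(void)
{
	int err = 0;

	err |= pin_irq_to_cpu0(1);
	err |= pin_irq_to_cpu0(5);
	return err ? 1 : 0;
}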

I haven't had time to try booting SATA with sata_nv as a module in initrd.

--
Carlo J. Calica
Tim Schmielau
2005-09-26 07:14:02 UTC
Permalink
Post by Paul Blazejowski
Upon quick testing the latest mm kernel it appears there's some kind of
race condition when using dual core cpu esp when using XORG and USB
(although PS2 has same issue) kebyboard rate being too fast.
Does the following patch by John Stultz fix the problem?

Tim
Date: Mon, 19 Sep 2005 12:16:43 -0700
From: john stultz <***@us.ibm.com>
To: Andrew Morton <***@osdl.org>
Cc: lkml <linux-***@vger.kernel.org>, Andi Kleen <***@suse.de>
Subject: [PATCH] x86-64: Fix bad assumption that dualcore cpus have synced
TSCs

Andrew,
This patch should resolve the issue seen in bugme bug #5105, where it
is assumed that dualcore x86_64 systems have synced TSCs. This is not
the case, and alternate timesources should be used instead.

For more details, see:
http://bugzilla.kernel.org/show_bug.cgi?id=5105


Please consider for inclusion in your tree.

thanks
-john

diff --git a/arch/x86_64/kernel/time.c b/arch/x86_64/kernel/time.c
--- a/arch/x86_64/kernel/time.c
+++ b/arch/x86_64/kernel/time.c
@@ -959,9 +959,6 @@ static __init int unsynchronized_tsc(voi
are handled in the OEM check above. */
if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
return 0;
- /* All in a single socket - should be synchronized */
- if (cpus_weight(cpu_core_map[0]) == num_online_cpus())
- return 0;
#endif
/* Assume multi socket systems are not synchronized */
return num_online_cpus() > 1;
Paul Blazejowski
2005-09-28 05:01:48 UTC
Permalink
Post by Tim Schmielau
Post by Paul Blazejowski
Upon quick testing the latest mm kernel it appears there's some kind of
race condition when using dual core cpu esp when using XORG and USB
(although PS2 has same issue) kebyboard rate being too fast.
Does the following patch by John Stultz fix the problem?
Tim
Tim,

No, it does not. From my understanding it only pertains to x86_64, but I
currently run an i386 SMP-enabled kernel on the dual core X2 processor.

Also worth noting is that I do not see any failures or errors in dmesg
related to lost timers. Perhaps this is something new? I even ran the
script from the bugzilla and the output matched on both CPUs.

Thanks,

Paul
Post by Tim Schmielau
Date: Mon, 19 Sep 2005 12:16:43 -0700
Subject: [PATCH] x86-64: Fix bad assumption that dualcore cpus have synced
TSCs
Andrew,
This patch should resolve the issue seen in bugme bug #5105, where it
is assumed that dualcore x86_64 systems have synced TSCs. This is not
the case, and alternate timesources should be used instead.
http://bugzilla.kernel.org/show_bug.cgi?id=5105
Please consider for inclusion in your tree.
thanks
-john
diff --git a/arch/x86_64/kernel/time.c b/arch/x86_64/kernel/time.c
--- a/arch/x86_64/kernel/time.c
+++ b/arch/x86_64/kernel/time.c
@@ -959,9 +959,6 @@ static __init int unsynchronized_tsc(voi
are handled in the OEM check above. */
if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
return 0;
- /* All in a single socket - should be synchronized */
- if (cpus_weight(cpu_core_map[0]) == num_online_cpus())
- return 0;
#endif
/* Assume multi socket systems are not synchronized */
return num_online_cpus() > 1;
Reuben Farrelly
2005-09-27 07:13:58 UTC
Permalink
Hi again,
Post by Andrew Morton
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.14-rc2/2.6.14-rc2-mm1/
- Added git tree `git-sas.patch': Luben Tuikov's SAS driver and its support.
- Various random other things - nothing major.
Just noticed this oops from about 4am this morning. This would have been at
about the time when the normal daily cronjobs run, but the machine shouldn't
have been doing much else.


Sep 27 04:04:28 tornado kernel: smbd: page allocation failure. order:1,
mode:0x80000020
Sep 27 04:04:28 tornado kernel: [<c0103ad0>] dump_stack+0x17/0x19
Sep 27 04:04:28 tornado kernel: [<c013f84a>] __alloc_pages+0x2d8/0x3ef
Sep 27 04:04:28 tornado kernel: [<c0142b32>] kmem_getpages+0x2c/0x91
Sep 27 04:04:28 tornado kernel: [<c0144136>] cache_grow+0xa2/0x1aa
Sep 27 04:04:28 tornado kernel: [<c0144810>] cache_alloc_refill+0x279/0x2bb
Sep 27 04:04:28 tornado kernel: [<c0144da9>] __kmalloc+0xc7/0xe7
Sep 27 04:04:28 tornado kernel: [<c02ab386>] pskb_expand_head+0x4b/0x11a
Sep 27 04:04:28 tornado kernel: [<c02afd34>] skb_checksum_help+0xcb/0xe5
Sep 27 04:04:28 tornado kernel: [<c0302b0d>] ip_nat_fn+0x16d/0x1bf
Sep 27 04:04:28 tornado kernel: [<c0302cdc>] ip_nat_local_fn+0x57/0x8d
Sep 27 04:04:28 tornado kernel: [<c03068ef>] nf_iterate+0x59/0x7d
Sep 27 04:04:28 tornado kernel: [<c030695d>] nf_hook_slow+0x4a/0x109
Sep 27 04:04:28 tornado kernel: [<c02ca035>] ip_queue_xmit+0x23c/0x4f5
Sep 27 04:04:28 tornado kernel: [<c02da477>] tcp_transmit_skb+0x3ce/0x713
Sep 27 04:04:29 tornado kernel: [<c02db53b>] tcp_write_xmit+0x124/0x37b
Sep 27 04:04:29 tornado kernel: [<c02db7b3>] __tcp_push_pending_frames+0x21/0x70
Sep 27 04:04:29 tornado kernel: [<c02d0b45>] tcp_sendmsg+0x9cc/0xabc
Sep 27 04:04:29 tornado kernel: [<c02ed3dd>] inet_sendmsg+0x2e/0x4c
Sep 27 04:04:29 tornado kernel: [<c02a6691>] sock_sendmsg+0xbf/0xe3
Sep 27 04:04:29 tornado kernel: [<c02a77be>] sys_sendto+0xa5/0xbe
Sep 27 04:04:29 tornado kernel: [<c02a780d>] sys_send+0x36/0x38
Sep 27 04:04:29 tornado kernel: [<c02a7ef7>] sys_socketcall+0x134/0x251
Sep 27 04:04:29 tornado kernel: [<c0102b5b>] sysenter_past_esp+0x54/0x75
Sep 27 04:04:29 tornado kernel: Mem-info:
Sep 27 04:04:29 tornado kernel: DMA per-cpu:
Sep 27 04:04:29 tornado kernel: cpu 0 hot: low 0, high 12, batch 2 used:10
Sep 27 04:04:29 tornado kernel: cpu 0 cold: low 0, high 4, batch 1 used:3
Sep 27 04:04:29 tornado kernel: cpu 1 hot: low 0, high 12, batch 2 used:10
Sep 27 04:04:29 tornado kernel: cpu 1 cold: low 0, high 4, batch 1 used:3
Sep 27 04:04:29 tornado kernel: DMA32 per-cpu: empty
Sep 27 04:04:30 tornado kernel: Normal per-cpu:
Sep 27 04:04:30 tornado kernel: cpu 0 hot: low 0, high 384, batch 64 used:346
Sep 27 04:04:30 tornado kernel: cpu 0 cold: low 0, high 128, batch 32 used:115
Sep 27 04:04:30 tornado kernel: cpu 1 hot: low 0, high 384, batch 64 used:324
Sep 27 04:04:30 tornado kernel: cpu 1 cold: low 0, high 128, batch 32 used:112
Sep 27 04:04:30 tornado kernel: HighMem per-cpu:
Sep 27 04:04:30 tornado kernel: cpu 0 hot: low 0, high 96, batch 16 used:38
Sep 27 04:04:30 tornado kernel: cpu 0 cold: low 0, high 32, batch 8 used:27
Sep 27 04:04:30 tornado kernel: cpu 1 hot: low 0, high 96, batch 16 used:36
Sep 27 04:04:30 tornado kernel: cpu 1 cold: low 0, high 32, batch 8 used:5
Sep 27 04:04:30 tornado kernel: Free pages: 38404kB (2720kB HighMem)
Sep 27 04:04:31 tornado kernel: Active:139410 inactive:49515 dirty:135
writeback:1 unstable:0 free:9601 slab:54525 mapped:88304 pagetables:776
Sep 27 04:04:31 tornado kernel: DMA free:5828kB min:68kB low:84kB high:100kB
active:100kB inactive:944kB present:16384kB pages_scanned:0 all_unreclaimable? no
Sep 27 04:04:31 tornado kernel: lowmem_reserve[]: 0 0 880 1006
Sep 27 04:04:31 tornado kernel: DMA32 free:0kB min:0kB low:0kB high:0kB
active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Sep 27 04:04:31 tornado kernel: lowmem_reserve[]: 0 0 880 1006
Sep 27 04:04:31 tornado kernel: Normal free:29856kB min:3756kB low:4692kB
high:5632kB active:446760kB inactive:188768kB present:901120kB pages_scanned:0
all_unreclaimable? no
Sep 27 04:04:31 tornado kernel: lowmem_reserve[]: 0 0 0 1009
Sep 27 04:04:32 tornado kernel: HighMem free:2720kB min:128kB low:160kB
high:192kB active:110784kB inactive:8344kB present:129212kB pages_scanned:0
all_unreclaimable? no
Sep 27 04:04:32 tornado kernel: lowmem_reserve[]: 0 0 0 0
Sep 27 04:04:32 tornado kernel: DMA: 803*4kB 167*8kB 50*16kB 15*32kB 0*64kB
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 5828kB
Sep 27 04:04:32 tornado kernel: DMA32: empty
Sep 27 04:04:32 tornado kernel: Normal: 6744*4kB 360*8kB 0*16kB 0*32kB 0*64kB
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 29856kB
Sep 27 04:04:32 tornado kernel: HighMem: 654*4kB 13*8kB 0*16kB 0*32kB 0*64kB
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2720kB
Sep 27 04:04:32 tornado kernel: Swap cache: add 40, delete 40, find 1/2, race 0+0
Sep 27 04:04:32 tornado kernel: Free swap = 497820kB
Sep 27 04:04:32 tornado kernel: Total swap = 497936kB
Sep 27 04:04:32 tornado kernel: Free swap: 497820kB
Sep 27 04:04:32 tornado kernel: 261679 pages of RAM
Sep 27 04:04:32 tornado kernel: 32303 pages of HIGHMEM
Sep 27 04:04:32 tornado kernel: 3160 reserved pages
Sep 27 04:04:32 tornado kernel: 160186 pages shared
Sep 27 04:04:32 tornado kernel: 0 pages swap cached
Sep 27 04:04:33 tornado kernel: 135 pages dirty
Sep 27 04:04:33 tornado kernel: 1 pages writeback
Sep 27 04:04:33 tornado kernel: 88304 pages mapped
Sep 27 04:04:33 tornado kernel: 54527 pages slab
Sep 27 04:04:33 tornado kernel: 776 pages pagetables
Sep 27 04:04:59 tornado kernel: smtpd: page allocation failure. order:1,
mode:0x80000020
Sep 27 04:04:59 tornado kernel: [<c0103ad0>] dump_stack+0x17/0x19
Sep 27 04:04:59 tornado kernel: [<c013f84a>] __alloc_pages+0x2d8/0x3ef
Sep 27 04:04:59 tornado kernel: [<c0142b32>] kmem_getpages+0x2c/0x91
Sep 27 04:04:59 tornado kernel: [<c0144136>] cache_grow+0xa2/0x1aa
Sep 27 04:04:59 tornado kernel: [<c0144810>] cache_alloc_refill+0x279/0x2bb
Sep 27 04:04:59 tornado kernel: [<c0144da9>] __kmalloc+0xc7/0xe7
Sep 27 04:04:59 tornado kernel: [<c02ab386>] pskb_expand_head+0x4b/0x11a
Sep 27 04:04:59 tornado kernel: [<c02afd34>] skb_checksum_help+0xcb/0xe5
Sep 27 04:04:59 tornado kernel: [<c0302b0d>] ip_nat_fn+0x16d/0x1bf
Sep 27 04:04:59 tornado kernel: [<c0302cdc>] ip_nat_local_fn+0x57/0x8d
Sep 27 04:04:59 tornado kernel: [<c03068ef>] nf_iterate+0x59/0x7d
Sep 27 04:04:59 tornado kernel: [<c030695d>] nf_hook_slow+0x4a/0x109
Sep 27 04:05:00 tornado kernel: [<c02ca035>] ip_queue_xmit+0x23c/0x4f5
Sep 27 04:05:00 tornado kernel: [<c02da477>] tcp_transmit_skb+0x3ce/0x713
Sep 27 04:05:00 tornado kernel: [<c02db53b>] tcp_write_xmit+0x124/0x37b
Sep 27 04:05:00 tornado kernel: [<c02db7b3>] __tcp_push_pending_frames+0x21/0x70
Sep 27 04:05:01 tornado kernel: [<c02d0b45>] tcp_sendmsg+0x9cc/0xabc
Sep 27 04:05:01 tornado kernel: [<c02ed3dd>] inet_sendmsg+0x2e/0x4c
Sep 27 04:05:01 tornado kernel: [<c02a69c6>] sock_aio_write+0xbd/0xf6
Sep 27 04:05:01 tornado kernel: [<c0159767>] do_sync_write+0xbb/0x10a
Sep 27 04:05:01 tornado kernel: [<c01598f7>] vfs_write+0x141/0x148
Sep 27 04:05:02 tornado kernel: [<c015999f>] sys_write+0x3d/0x64
Sep 27 04:05:02 tornado kernel: [<c0102b5b>] sysenter_past_esp+0x54/0x75
Sep 27 04:05:02 tornado kernel: Mem-info:
Sep 27 04:05:02 tornado kernel: DMA per-cpu:
Sep 27 04:05:02 tornado kernel: cpu 0 hot: low 0, high 12, batch 2 used:4
Sep 27 04:05:02 tornado kernel: cpu 0 cold: low 0, high 4, batch 1 used:3
Sep 27 04:05:02 tornado kernel: cpu 1 hot: low 0, high 12, batch 2 used:10
Sep 27 04:05:03 tornado kernel: cpu 1 cold: low 0, high 4, batch 1 used:3
Sep 27 04:05:03 tornado kernel: DMA32 per-cpu: empty
Sep 27 04:05:03 tornado kernel: Normal per-cpu:
Sep 27 04:05:03 tornado kernel: cpu 0 hot: low 0, high 384, batch 64 used:23
Sep 27 04:05:04 tornado kernel: cpu 0 cold: low 0, high 128, batch 32 used:115
Sep 27 04:05:04 tornado kernel: cpu 1 hot: low 0, high 384, batch 64 used:383
Sep 27 04:05:04 tornado kernel: cpu 1 cold: low 0, high 128, batch 32 used:120
Sep 27 04:05:04 tornado kernel: HighMem per-cpu:
Sep 27 04:05:04 tornado kernel: cpu 0 hot: low 0, high 96, batch 16 used:89
Sep 27 04:05:04 tornado kernel: cpu 0 cold: low 0, high 32, batch 8 used:3
Sep 27 04:05:05 tornado kernel: cpu 1 hot: low 0, high 96, batch 16 used:5
Sep 27 04:05:05 tornado kernel: cpu 1 cold: low 0, high 32, batch 8 used:27
Sep 27 04:05:05 tornado kernel: Free pages: 39608kB (2144kB HighMem)
Sep 27 04:05:05 tornado kernel: Active:132565 inactive:56281 dirty:100
writeback:1 unstable:0 free:9902 slab:54546 mapped:88341 pagetables:776
Sep 27 04:05:05 tornado kernel: DMA free:4704kB min:68kB low:84kB high:100kB
active:224kB inactive:948kB present:16384kB pages_scanned:0 all_unreclaimable? no
Sep 27 04:05:05 tornado kernel: lowmem_reserve[]: 0 0 880 1006
Sep 27 04:05:05 tornado kernel: DMA32 free:0kB min:0kB low:0kB high:0kB
active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Sep 27 04:05:05 tornado kernel: lowmem_reserve[]: 0 0 880 1006
Sep 27 04:05:05 tornado kernel: Normal free:32760kB min:3756kB low:4692kB
high:5632kB active:418168kB inactive:216412kB present:901120kB pages_scanned:0
all_unreclaimable? no
Sep 27 04:05:05 tornado kernel: lowmem_reserve[]: 0 0 0 1009
Sep 27 04:05:05 tornado kernel: HighMem free:2144kB min:128kB low:160kB
high:192kB active:111868kB inactive:7764kB present:129212kB pages_scanned:0
all_unreclaimable? no
Sep 27 04:05:05 tornado kernel: lowmem_reserve[]: 0 0 0 0
Sep 27 04:05:05 tornado kernel: DMA: 936*4kB 108*8kB 6*16kB 0*32kB 0*64kB
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 4704kB
Sep 27 04:05:05 tornado kernel: DMA32: empty
Sep 27 04:05:05 tornado kernel: Normal: 7484*4kB 349*8kB 2*16kB 0*32kB 0*64kB
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 32760kB
Sep 27 04:05:05 tornado kernel: HighMem: 510*4kB 13*8kB 0*16kB 0*32kB 0*64kB
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2144kB
Sep 27 04:05:05 tornado kernel: Swap cache: add 40, delete 40, find 1/2, race 0+0
Sep 27 04:05:05 tornado kernel: Free swap = 497820kB
Sep 27 04:05:05 tornado kernel: Total swap = 497936kB
Sep 27 04:05:05 tornado kernel: Free swap: 497820kB
Sep 27 04:05:05 tornado kernel: 261679 pages of RAM
Sep 27 04:05:05 tornado kernel: 32303 pages of HIGHMEM
Sep 27 04:05:05 tornado kernel: 3160 reserved pages
Sep 27 04:05:06 tornado kernel: 165825 pages shared
Sep 27 04:05:06 tornado kernel: 0 pages swap cached
Sep 27 04:05:06 tornado kernel: 100 pages dirty
Sep 27 04:05:06 tornado kernel: 1 pages writeback
Sep 27 04:05:06 tornado kernel: 88341 pages mapped
Sep 27 04:05:06 tornado kernel: 54546 pages slab
Sep 27 04:05:06 tornado kernel: 776 pages pagetables


reuben
Andrew Morton
2005-09-27 07:44:10 UTC
Permalink
Post by Reuben Farrelly
Post by Andrew Morton
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.14-rc2/2.6.14-rc2-mm1/
- Added git tree `git-sas.patch': Luben Tuikov's SAS driver and its support.
- Various random other things - nothing major.
Just noticed this oops from about 4am this morning. This would have been at
about the time when the normal daily cronjobs are run, but shouldn't have been
doing much else.
Sep 27 04:04:28 tornado kernel: smbd: page allocation failure. order:1,
mode:0x80000020
Sep 27 04:04:28 tornado kernel: [<c0103ad0>] dump_stack+0x17/0x19
Sep 27 04:04:28 tornado kernel: [<c013f84a>] __alloc_pages+0x2d8/0x3ef
Sep 27 04:04:28 tornado kernel: [<c0142b32>] kmem_getpages+0x2c/0x91
Sep 27 04:04:28 tornado kernel: [<c0144136>] cache_grow+0xa2/0x1aa
Sep 27 04:04:28 tornado kernel: [<c0144810>] cache_alloc_refill+0x279/0x2bb
Sep 27 04:04:28 tornado kernel: [<c0144da9>] __kmalloc+0xc7/0xe7
Sep 27 04:04:28 tornado kernel: [<c02ab386>] pskb_expand_head+0x4b/0x11a
Sep 27 04:04:28 tornado kernel: [<c02afd34>] skb_checksum_help+0xcb/0xe5
Sep 27 04:04:28 tornado kernel: [<c0302b0d>] ip_nat_fn+0x16d/0x1bf
Sep 27 04:04:28 tornado kernel: [<c0302cdc>] ip_nat_local_fn+0x57/0x8d
Sep 27 04:04:28 tornado kernel: [<c03068ef>] nf_iterate+0x59/0x7d
Sep 27 04:04:28 tornado kernel: [<c030695d>] nf_hook_slow+0x4a/0x109
Sep 27 04:04:28 tornado kernel: [<c02ca035>] ip_queue_xmit+0x23c/0x4f5
Sep 27 04:04:28 tornado kernel: [<c02da477>] tcp_transmit_skb+0x3ce/0x713
Sep 27 04:04:29 tornado kernel: [<c02db53b>] tcp_write_xmit+0x124/0x37b
Sep 27 04:04:29 tornado kernel: [<c02db7b3>] __tcp_push_pending_frames+0x21/0x70
Sep 27 04:04:29 tornado kernel: [<c02d0b45>] tcp_sendmsg+0x9cc/0xabc
Sep 27 04:04:29 tornado kernel: [<c02ed3dd>] inet_sendmsg+0x2e/0x4c
Sep 27 04:04:29 tornado kernel: [<c02a6691>] sock_sendmsg+0xbf/0xe3
Sep 27 04:04:29 tornado kernel: [<c02a77be>] sys_sendto+0xa5/0xbe
No, this is simply a warning - the kernel ran out of order-1 pages in the
page allocator. There have been several reports of this after
mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk.patch was merged,
which was rather expected.

I've dropped that patch. Joel Schopp is working on Mel Gorman's patches
which address fragmentation at this level. If that code gets there then we
can take another look at
mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk.patch.
Martin J. Bligh
2005-09-27 18:59:16 UTC
Permalink
Post by Andrew Morton
No, this is simply a warning - the kernel ran out of 1-order pages in the
page allocator. There have been several reports of this after
mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk.patch was merged,
which was rather expected.
I've dropped that patch. Joel Schopp is working on Mel Gorman's patches
which address fragmentation at this level. If that code gets there then we
can take another look at
mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk.patch.
Me no understand. We're going to deliberately cause fragmentation in order
to defragment it again later ???

M.
Paul Jackson
2005-10-02 17:13:19 UTC
Permalink
Post by Martin J. Bligh
Post by Andrew Morton
I've dropped that patch. Joel Schopp is working on Mel Gorman's patches
which address fragmentation at this level. If that code gets there then we
can take another look at
mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk.patch.
Me no understand. We're going to deliberately cause fragmentation in order
to defragment it again later ???
I thought that the patches of Mel Gorman and Joel Schopp were reducing
fragmentation, not causing it.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <***@sgi.com> 1.925.600.0401
Martin J. Bligh
2005-10-02 21:31:09 UTC
Permalink
Post by Paul Jackson
Post by Martin J. Bligh
Post by Andrew Morton
I've dropped that patch. Joel Schopp is working on Mel Gorman's patches
which address fragmentation at this level. If that code gets there then we
can take another look at
mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk.patch.
Me no understand. We're going to deliberately cause fragmentation in order
to defragment it again later ???
I thought that the patches of Mel Gorman and Joel Schopp were reducing
fragmentation, not causing it.
They were, but mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk
seems to be going in the opposite direction.

M.
Rohit Seth
2005-10-03 17:20:42 UTC
Permalink
Post by Martin J. Bligh
Post by Paul Jackson
Post by Martin J. Bligh
Post by Andrew Morton
I've dropped that patch. Joel Schopp is working on Mel Gorman's patches
which address fragmentation at this level. If that code gets there then we
can take another look at
mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk.patch.
Me no understand. We're going to deliberately cause fragmentation in order
to defragment it again later ???
I thought that the patches of Mel Gorman and Joel Schopp were reducing
fragmentation, not causing it.
They were. but mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk
seems to be going in the opposite direction.
The mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk patch tries to
allocate physically contiguous pages for the pcp. This causes some extra
fragmentation at the higher orders, but has the potential benefit of
spreading data more uniformly across caches. I agree, though, that for
this scheme to work nicely we should have the capability of draining the
pcps so that higher order requests can be serviced whenever possible.

-rohit
Martin J. Bligh
2005-10-03 17:56:57 UTC
Permalink
Post by Rohit Seth
mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk patch tries to
allocate more physical contiguous pages for pcp. This would cause some
extra fragmentation at the higher orders but has the potential benefit
of spreading more uniformly across caches. I agree though that for this
scheme to work nicely we should have the capability of draining the pcps
so that higher order requests can be serviced whenever possible.
Unfortunately, I don't think it's that simple. We'll end up taking the
higher order elements from the buddy into the caches, and using them
all piecemeal - i.e. fragmenting it all.

If we take lists of 0-order pages from the buddy, we're trying to use up
whatever dross was left over in there (from a fragmentation point of
view) first, before breaking into the more precious stuff (the physically
contiguous bits).

That was why I wrote it that way in the first place - it wasn't
accidental ;-)
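
To make that concrete with made-up numbers: if a zone's free lists hold 64
scattered 0-order pages plus one free order-6 (64-page) block, filling a
64-page per-cpu batch from the scattered 0-order pages leaves the order-6
block intact for a later order-1 or order-3 caller, whereas filling it by
splitting the order-6 block hands the zone's only contiguous run out as 64
independent pages that will come back in arbitrary order, if at all.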
From the direction the thread was going in previously, it sounded like
you were finding other ways to alleviate the colouring issue you were
seeing ... I was hoping that would fix things up enough that the desire
for higher order allocations would disappear.

To be blunt about it ... making sure that we don't fall over on higher
order allocs seems to me to be more important than a bit of variability
in benchmark runs ...

M.
