October 4, 2009

[vine-users:079726] Re: Memory leak at boot in an AMD64 environment

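Nomiya-san, thank you.

I realized my earlier check had indeed been half-hearted, so I have since run a full-pass memory test and confirmed that there are no memory errors.
After that, during a test run, the same symptom (the link repeatedly going up and down) suddenly recurred.
I rolled the kernel back to 2.6.27, swapped in a spare adapter of the same model, and updated the eth0 driver to the latest version (e1000-8.0.16). While test-running with about 500 connections opened to HTTPd from another machine, it happened yet again...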
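
For anyone who wants to reproduce a similar load, here is a minimal sketch of holding a few hundred TCP connections open against a web server. The post does not say which tool was actually used, so the function names, the timeout, and the stop-at-first-failure behaviour below are all assumptions made for illustration.

```python
import socket


def open_connections(host, port, count, timeout=5.0):
    """Open up to `count` TCP connections to host:port and return the sockets.

    Hypothetical helper: stops at the first failure (for example when the
    link drops) and returns whatever connections were established so far.
    """
    conns = []
    for _ in range(count):
        try:
            conns.append(socket.create_connection((host, port), timeout=timeout))
        except OSError:
            break
    return conns


def close_connections(conns):
    """Close every socket returned by open_connections()."""
    for sock in conns:
        sock.close()
```

Calling `open_connections("server-under-test", 80, 500)` from a second machine and holding the result for a while would roughly approximate the test described above; the host name is a placeholder.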

The log shows:

Oct 4 04:54:57 localhost kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Oct 4 04:54:57 localhost kernel: Tx Queue <0>
Oct 4 04:54:57 localhost kernel: TDH <b4>
Oct 4 04:54:57 localhost kernel: TDT <fa>
Oct 4 04:54:57 localhost kernel: next_to_use <fa>
Oct 4 04:54:57 localhost kernel: next_to_clean <b1>
Oct 4 04:54:57 localhost kernel: buffer_info[next_to_clean]
Oct 4 04:54:57 localhost kernel: time_stamp <100028ea8>
Oct 4 04:54:57 localhost kernel: next_to_watch <b6>
Oct 4 04:54:57 localhost kernel: jiffies <100029109>
Oct 4 04:54:57 localhost kernel: next_to_watch.status <0>
Oct 4 04:54:59 localhost kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Oct 4 04:54:59 localhost kernel: Tx Queue <0>
Oct 4 04:54:59 localhost kernel: TDH <b4>
Oct 4 04:54:59 localhost kernel: TDT <fa>
Oct 4 04:54:59 localhost kernel: next_to_use <fa>
Oct 4 04:54:59 localhost kernel: next_to_clean <b1>
Oct 4 04:54:59 localhost kernel: buffer_info[next_to_clean]
Oct 4 04:54:59 localhost kernel: time_stamp <100028ea8>
Oct 4 04:54:59 localhost kernel: next_to_watch <b6>
Oct 4 04:54:59 localhost kernel: jiffies <1000292fd>
Oct 4 04:54:59 localhost kernel: next_to_watch.status <0>
Oct 4 04:55:01 localhost kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Oct 4 04:55:01 localhost kernel: Tx Queue <0>
Oct 4 04:55:01 localhost kernel: TDH <b4>
Oct 4 04:55:01 localhost kernel: TDT <fa>
Oct 4 04:55:01 localhost kernel: next_to_use <fa>
Oct 4 04:55:01 localhost kernel: next_to_clean <b1>
Oct 4 04:55:01 localhost kernel: buffer_info[next_to_clean]
Oct 4 04:55:01 localhost kernel: time_stamp <100028ea8>
Oct 4 04:55:01 localhost kernel: next_to_watch <b6>
Oct 4 04:55:01 localhost kernel: jiffies <1000294f1>
Oct 4 04:55:01 localhost kernel: next_to_watch.status <0>
Oct 4 04:55:03 localhost kernel: ------------[ cut here ]------------
Oct 4 04:55:03 localhost kernel: WARNING: at net/sched/sch_generic.c:219 dev_watchdog+0x136/0x1d8()
Oct 4 04:55:03 localhost kernel: NETDEV WATCHDOG: eth0 (e1000): transmit timed out
Oct 4 04:55:03 localhost kernel: Modules linked in: xt_length ipt_REJECT xt_limit ipt_LOG ipt_recent xt_tcpudp nf_conntrack_ipv4 xt_state nf_conntrack xt_multiport iptable_filter ip_tables x_tables cpufreq_ondemand powernow_k8 freq_table dm_mod firewire_ohci firewire_core crc_itu_t i2c_piix4 i2c_core ohci1394 thermal e1000 ieee1394 sg shpchp processor pcspkr button wmi usb_storage ahci pata_atiixp libata dock ide_cd_mod cdrom sd_mod scsi_mod crc_t10dif uhci_hcd ohci_hcd ehci_hcd
Oct 4 04:55:03 localhost kernel: Pid: 0, comm: swapper Not tainted 2.6.27-43vl5 #1
Oct 4 04:55:03 localhost kernel:
Oct 4 04:55:03 localhost kernel: Call Trace:
Oct 4 04:55:03 localhost kernel: <IRQ> [<ffffffff80239eb1>] warn_slowpath+0xb4/0xe0
Oct 4 04:55:03 localhost kernel: [<ffffffffa0170696>] ipt_do_table+0x501/0x56b [ip_tables]
Oct 4 04:55:03 localhost kernel: [<ffffffff8022ed69>] source_load+0x2a/0x4f
Oct 4 04:55:03 localhost kernel: [<ffffffff8022edb8>] target_load+0x2a/0x4f
Oct 4 04:55:03 localhost kernel: [<ffffffff8022f79b>] place_entity+0x6c/0x9a
Oct 4 04:55:03 localhost kernel: [<ffffffff80230236>] enqueue_entity+0x9c/0xbd
Oct 4 04:55:03 localhost kernel: [<ffffffff8022eb50>] enqueue_task+0x13/0x1e
Oct 4 04:55:03 localhost kernel: [<ffffffff8022fa6f>] resched_task+0x2d/0x74
Oct 4 04:55:03 localhost kernel: [<ffffffff802338a2>] try_to_wake_up+0x175/0x187
Oct 4 04:55:03 localhost kernel: [<ffffffff8024ba00>] autoremove_wake_function+0x9/0x2e
Oct 4 04:55:03 localhost kernel: [<ffffffff8022f049>] __wake_up_common+0x41/0x75
Oct 4 04:55:03 localhost kernel: [<ffffffff80469bba>] dev_watchdog+0x136/0x1d8
Oct 4 04:55:03 localhost kernel: [<ffffffff8022fbeb>] __wake_up+0x38/0x4e
Oct 4 04:55:03 localhost kernel: [<ffffffff80469a84>] dev_watchdog+0x0/0x1d8
Oct 4 04:55:03 localhost kernel: [<ffffffff802421e3>] run_timer_softirq+0x16f/0x1ec
Oct 4 04:55:03 localhost kernel: [<ffffffff8023e5e3>] __do_softirq+0x65/0xdb
Oct 4 04:55:03 localhost kernel: [<ffffffff8021189c>] call_softirq+0x1c/0x28
Oct 4 04:55:03 localhost kernel: [<ffffffff802139d3>] do_softirq+0x3c/0x81
Oct 4 04:55:03 localhost kernel: [<ffffffff8023e538>] irq_exit+0x3f/0x85
Oct 4 04:55:03 localhost kernel: [<ffffffff8021fc0b>] smp_apic_timer_interrupt+0x8f/0xa8
Oct 4 04:55:03 localhost kernel: [<ffffffff802110a3>] apic_timer_interrupt+0x83/0x90
Oct 4 04:55:03 localhost kernel: <EOI> [<ffffffff8021f9b4>] lapic_next_event+0x0/0x13
Oct 4 04:55:03 localhost kernel: [<ffffffff80223c3c>] native_safe_halt+0x2/0x3
Oct 4 04:55:03 localhost kernel: [<ffffffff8024f194>] notifier_call_chain+0x29/0x4c
Oct 4 04:55:03 localhost kernel: [<ffffffff80217454>] default_idle+0x2a/0x46
Oct 4 04:55:03 localhost kernel: [<ffffffff80217682>] c1e_idle+0x10a/0x10f
Oct 4 04:55:03 localhost kernel: [<ffffffff8020eca5>] cpu_idle+0x9e/0xc8
Oct 4 04:55:03 localhost kernel:
Oct 4 04:55:03 localhost kernel: ---[ end trace 003e90f6d4b5fa06 ]---
Oct 4 04:55:06 localhost kernel: e1000: eth0: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Oct 4 04:55:12 localhost kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Oct 4 04:55:12 localhost kernel: Tx Queue <0>
Oct 4 04:55:12 localhost kernel: TDH <7f>
Oct 4 04:55:12 localhost kernel: TDT <ba>
Oct 4 04:55:12 localhost kernel: next_to_use <ba>
Oct 4 04:55:12 localhost kernel: next_to_clean <7c>
Oct 4 04:55:12 localhost kernel: buffer_info[next_to_clean]
Oct 4 04:55:12 localhost kernel: time_stamp <100029dbe>
Oct 4 04:55:12 localhost kernel: next_to_watch <80>
Oct 4 04:55:12 localhost kernel: jiffies <100029fb2>
Oct 4 04:55:12 localhost kernel: next_to_watch.status <0>
Oct 4 04:55:14 localhost kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Oct 4 04:55:14 localhost kernel: Tx Queue <0>
Oct 4 04:55:14 localhost kernel: TDH <7f>
Oct 4 04:55:14 localhost kernel: TDT <ba>
Oct 4 04:55:14 localhost kernel: next_to_use <ba>
Oct 4 04:55:14 localhost kernel: next_to_clean <7c>
Oct 4 04:55:14 localhost kernel: buffer_info[next_to_clean]
Oct 4 04:55:14 localhost kernel: time_stamp <100029dbe>
Oct 4 04:55:14 localhost kernel: next_to_watch <80>
Oct 4 04:55:14 localhost kernel: jiffies <10002a1a6>
Oct 4 04:55:14 localhost kernel: next_to_watch.status <0>
Oct 4 04:55:16 localhost kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Oct 4 04:55:16 localhost kernel: Tx Queue <0>
Oct 4 04:55:16 localhost kernel: TDH <7f>
Oct 4 04:55:16 localhost kernel: TDT <ba>
Oct 4 04:55:16 localhost kernel: next_to_use <ba>
Oct 4 04:55:16 localhost kernel: next_to_clean <7c>
Oct 4 04:55:16 localhost kernel: buffer_info[next_to_clean]
Oct 4 04:55:16 localhost kernel: time_stamp <100029dbe>
Oct 4 04:55:16 localhost kernel: next_to_watch <80>
Oct 4 04:55:16 localhost kernel: jiffies <10002a39a>
Oct 4 04:55:16 localhost kernel: next_to_watch.status <0>
Oct 4 04:55:18 localhost kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Oct 4 04:55:18 localhost kernel: Tx Queue <0>
Oct 4 04:55:18 localhost kernel: TDH <7f>
Oct 4 04:55:18 localhost kernel: TDT <ba>
Oct 4 04:55:18 localhost kernel: next_to_use <ba>
Oct 4 04:55:18 localhost kernel: next_to_clean <7c>
Oct 4 04:55:18 localhost kernel: buffer_info[next_to_clean]
Oct 4 04:55:18 localhost kernel: time_stamp <100029dbe>
Oct 4 04:55:18 localhost kernel: next_to_watch <80>
Oct 4 04:55:18 localhost kernel: jiffies <10002a58e>
Oct 4 04:55:18 localhost kernel: next_to_watch.status <0>
Oct 4 04:55:20 localhost kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Oct 4 04:55:20 localhost kernel: Tx Queue <0>
Oct 4 04:55:20 localhost kernel: TDH <7f>
Oct 4 04:55:20 localhost kernel: TDT <ba>
Oct 4 04:55:20 localhost kernel: next_to_use <ba>
Oct 4 04:55:20 localhost kernel: next_to_clean <7c>
Oct 4 04:55:20 localhost kernel: buffer_info[next_to_clean]
Oct 4 04:55:20 localhost kernel: time_stamp <100029dbe>
Oct 4 04:55:20 localhost kernel: next_to_watch <80>
Oct 4 04:55:20 localhost kernel: jiffies <10002a782>
Oct 4 04:55:20 localhost kernel: next_to_watch.status <0>

<after this, the log keeps repeating the same link up/down cycle>
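
As a reading aid for the "Tx Unit Hang" dumps, the sketch below pulls the hex fields out of one dump and computes two numbers: how many descriptors sit between TDH and TDT, and how many jiffies the oldest pending buffer has been waiting. The ring size of 256 is an assumption (no ring size appears in the log, though every index shown is below 0x100), and the function names are made up for illustration. Consecutive dumps logged 2 seconds apart differ by 0x1f4 = 500 jiffies, which suggests HZ is around 250 on this kernel, so 609 jiffies would be roughly 2.4 seconds.

```python
import re

# Only the four fields needed for the summary; values in the log are hex.
FIELD_RE = re.compile(r"kernel:\s+(TDH|TDT|time_stamp|jiffies)\s+<([0-9a-fA-F]+)>")


def parse_hang_dump(lines):
    """Return {field_name: int} for the fields found in one hang dump."""
    fields = {}
    for line in lines:
        match = FIELD_RE.search(line)
        if match:
            fields[match.group(1)] = int(match.group(2), 16)
    return fields


def summarize(fields, ring_size=256):
    """ring_size=256 is an assumed Tx ring size, not taken from the log."""
    return {
        "pending_descriptors": (fields["TDT"] - fields["TDH"]) % ring_size,
        "stalled_jiffies": fields["jiffies"] - fields["time_stamp"],
    }


# First dump from the log above (04:54:57).
sample = [
    "Oct 4 04:54:57 localhost kernel: TDH <b4>",
    "Oct 4 04:54:57 localhost kernel: TDT <fa>",
    "Oct 4 04:54:57 localhost kernel: time_stamp <100028ea8>",
    "Oct 4 04:54:57 localhost kernel: jiffies <100029109>",
]
print(summarize(parse_hang_dump(sample)))
# {'pending_descriptors': 70, 'stalled_jiffies': 609}
```

In the first three dumps TDH, TDT and time_stamp stay the same while jiffies advances; this matches a transmit unit that has stopped making progress and does not depend on the assumed ring size.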

Then, once I closed the connections, the link stayed up and stopped going down.

...

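Next, suspecting a problem with the PCI NIC, I connected through the onboard Ethernet adapter (Realtek 8111C) instead, and the problem does not occur.
I checked the 1000MT Desktop Adapter in another machine and found no particular problem, so I don't think this is a hardware fault in the NIC itself.
According to the motherboard specifications, PCI is controlled by the southbridge (SB700) and PCI-E by the northbridge (780G), and the onboard Ethernet adapter (Realtek 8111C) appears to be northbridge-controlled.
This is starting to look less like a software problem and more like a hardware issue arising from the motherboard's design, so I will get hold of a PCI-E NIC and try it.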

Sorry for the noise...


Posted by xml-rpc : October 4, 2009 07:31