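Yesterday a customer reported that the archive log directory on one RAC node had become read-only. Archived logs could no longer be created, so the redo logs could not switch either. Fortunately this was a RAC setup. The customer said a reboot fixed it, but after a while the directory went read-only again. My first guess was a mount problem, so I went through the root user's command history. There was indeed some confusion in how things had been mounted, but after ruling out the suspicious parts the directory was still read-only. So I turned to the system log. Because of Oracle bug 5722352, the log was full of entries like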
Feb 12 10:16:57 su(pam_unix)[28104]: session opened for user oracle by (uid=0)
Feb 12 10:16:57 su(pam_unix)[28104]: session closed for user oracle
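There was nothing else for it: I had the customer extract 300,000 lines of the log, and only then did I manage to locate the boot messages and, from there, some useful information: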
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1): ext3_free_blocks_sb: bit already cleared for block 1110988
Jul 9 16:15:38 dbrac2 kernel: Aborting journal on device sdh1.
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1): ext3_free_blocks_sb: bit already cleared for block 1110989
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1): ext3_free_blocks_sb: bit already cleared for block 1110990
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1): ext3_free_blocks_sb: bit already cleared for block 1110991
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1): ext3_free_blocks_sb: bit already cleared for block 1110992
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1): ext3_free_blocks_sb: bit already cleared for block 1110993
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1): ext3_free_blocks_sb: bit already cleared for block 1110994
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1): ext3_free_blocks_sb: bit already cleared for block 1110995
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1): ext3_free_blocks_sb: bit already cleared for block 1110996
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1): ext3_free_blocks_sb: bit already cleared for block 1110997
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1): ext3_free_blocks_sb: bit already cleared for block 1110998
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1): ext3_free_blocks_sb: bit already cleared for block 1110999
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1) in ext3_reserve_inode_write: Journal has aborted
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1) in ext3_truncate: Journal has aborted
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1) in ext3_reserve_inode_write: Journal has aborted
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1) in ext3_orphan_del: Journal has aborted
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1) in ext3_reserve_inode_write: Journal has aborted
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1) in ext3_delete_inode: Journal has aborted
Jul 9 16:15:38 dbrac2 kernel: ext3_abort called.
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1): ext3_journal_start_sb: Detected aborted journal
Jul 9 16:15:38 dbrac2 kernel: Remounting filesystem read-only
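As you can see, it was the kernel that remounted sdh1 (/arch02) read-only. The lines above it show why: the ext3 filesystem on that disk hit errors and its journal was aborted. This is a protection mechanism managed by the Linux kernel itself. So why did a reboot make things work again?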
Jul 8 00:20:37 dbrac2 kernel: EXT3-fs warning (device sdh1): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure
Jul 8 00:20:37 dbrac2 kernel: EXT3-fs warning (device sdh1): ext3_clear_journal_err: Marking fs in need of filesystem check.
Jul 8 00:20:37 dbrac2 kernel: EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
Jul 8 00:20:37 dbrac2 kernel: EXT3 FS on sdh1, internal journal
Jul 8 00:20:37 dbrac2 kernel: EXT3-fs: recovery complete.
Jul 8 00:20:37 dbrac2 kernel: EXT3-fs: mounted filesystem with ordered data mode.
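The only explanation I could find is in these boot messages: at boot the journal was recovered and the filesystem was mounted again even though it was flagged as having errors, so it stayed usable until the kernel ran into the damage once more.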
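The log triage described above can be sketched in a few lines of Python. This is only a rough sketch, assuming the syslog format shown in the excerpts; the function name `triage` and the regular expressions are my own illustration, not part of any tool. It skips the `su(pam_unix)` session flood, counts the EXT3-fs errors and warnings per device, and collects the lines where the kernel remounted a filesystem read-only.

```python
import re
from collections import Counter

# The su/pam_unix session lines that flood the log (assumed format, taken from the excerpt).
NOISE = re.compile(r"su\(pam_unix\)\[\d+\]: session (opened|closed) for user")
# Kernel ext3 messages that name a device, e.g. "EXT3-fs error (device sdh1)".
EXT3 = re.compile(r"kernel: EXT3-fs (error|warning) \(device (\w+)\)")
REMOUNT_RO = "Remounting filesystem read-only"


def triage(lines):
    """Return (Counter keyed by (device, severity), list of read-only remount lines)."""
    counts = Counter()
    remounts = []
    for line in lines:
        if NOISE.search(line):
            continue  # drop the session opened/closed noise
        match = EXT3.search(line)
        if match:
            severity, device = match.group(1), match.group(2)
            counts[(device, severity)] += 1
        elif REMOUNT_RO in line:
            remounts.append(line.rstrip("\n"))
    return counts, remounts


# A few lines copied from the log excerpts in this post.
sample = [
    "Feb 12 10:16:57 su(pam_unix)[28104]: session opened for user oracle by (uid=0)",
    "Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1): ext3_free_blocks_sb: bit already cleared for block 1110988",
    "Jul 9 16:15:38 dbrac2 kernel: Aborting journal on device sdh1.",
    "Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1) in ext3_truncate: Journal has aborted",
    "Jul 9 16:15:38 dbrac2 kernel: Remounting filesystem read-only",
    "Jul 8 00:20:37 dbrac2 kernel: EXT3-fs warning (device sdh1): ext3_clear_journal_err: Marking fs in need of filesystem check.",
]

counts, remounts = triage(sample)
for (device, severity), n in sorted(counts.items()):
    print(device, severity, n)
for line in remounts:
    print(line)
```

For a real log, pass an open file object instead of the sample list; the function reads it line by line, so a 300,000-line extract is not a problem.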
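I did not try to repair the filesystem with fsck. The errors were fairly serious, the archived logs on it all dated from before the 7th, and they could not be copied off anyway. In the end, for the sake of stable operation going forward, I reformatted sdh1 and mounted it again, and everything was fine.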
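When you run into this kind of problem, there are several things to check:
1. Whether there is enough free space.
2. Whether there are enough free inodes.
3. Whether the permissions or owner of the directory have been changed.
4. Whether the mount itself is wrong. Filesystems are mounted read-write by default (mount -o rw); a filesystem that is already mounted can be switched back to read-write with mount -o remount,rw followed by the mount point.
5. Whether the system log contains disk or filesystem errors.
6. When this error shows up, a hardware fault is fairly likely.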
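Items 1 to 4 of the checklist can be gathered in one pass. The following is a minimal sketch for Linux using only the Python standard library; the helper names `parse_mount_options` and `check_directory` are hypothetical, and the mount table text is expected in the format of /proc/mounts.

```python
import os


def parse_mount_options(mounts_text, mount_point):
    """Return the option list for mount_point from /proc/mounts-style text, or None."""
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass.
        if len(fields) >= 4 and fields[1] == mount_point:
            options = fields[3].split(",")  # the last matching entry wins
    return options


def check_directory(path, mounts_text=None):
    """Collect free space, free inodes, ownership and mount state for path."""
    vfs = os.statvfs(path)
    info = os.stat(path)
    report = {
        "free_bytes": vfs.f_bavail * vfs.f_frsize,   # item 1: space
        "free_inodes": vfs.f_favail,                 # item 2: inodes
        "owner_uid": info.st_uid,                    # item 3: owner
        "mode": oct(info.st_mode & 0o7777),          # item 3: permissions
        "writable_by_me": os.access(path, os.W_OK),
    }
    if mounts_text is not None:
        # item 4: only meaningful when path is itself a mount point
        options = parse_mount_options(mounts_text, path)
        report["read_only_mount"] = None if options is None else "ro" in options
    return report


# Demonstration on the current directory; on the affected node one would pass
# the archive mount point and the contents of /proc/mounts instead.
print(check_directory("."))
```

On the node from this post the call would look like `check_directory("/arch02", open("/proc/mounts").read())`. A `read_only_mount` value of `True` together with plenty of free space and inodes points away from items 1 to 3 and toward the system log and the hardware.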