Monday, December 12, 2016

A leap second bug in the RHEL7 kernel

The next leap second insertion (8:59:59 on January 1, 2017, Japan time) is coming up, and last week Red Hat published details of a newly found RHEL7 kernel bug along with a fixed kernel, so I'm writing this down as a note to myself.

http://rhn.redhat.com/errata/RHBA-2016-2862.html
https://access.redhat.com/solutions/2766351

The fix landed in kernel 3.10.0-514.2.2.el7. If you cannot update right away, running ntpd in slew mode is a sufficient workaround; temporarily stopping ntpd is another option. Incidentally, the chance of actually hitting this RHEL7 kernel bug seems fairly low (the write-up describes it as roughly 1 in 500 attempts).
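If you take the slew-mode route, the only change needed should be adding -x to ntpd's startup options; a minimal sketch for RHEL7/CentOS7, assuming the stock ntp package and the default /etc/sysconfig/ntpd:

# /etc/sysconfig/ntpd -- add -x so ntpd slews the clock instead of stepping it
OPTIONS="-g -x"
# then restart the service
systemctl restart ntpd.service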
An excerpt from the ChangeLog:
* Wed Nov 16 2016 Frantisek Hrbata <fhrbata@redhat.com> [3.10.0-514.2.2.el7]
- [kernel] timekeeping: Copy the shadow-timekeeper over the real timekeeper last (Prarit Bhargava) [1395577 1344747]

■ Related post
Observed ntpd behavior during the leap second adjustment on July 1, 2012

Saturday, October 29, 2016

How to list and remove large unused RPMs

For maintaining a Windows PC (backups and the like), I installed CentOS 7 in a multi-boot setup, but I skimped on the SSD allocation and installed the "Server with GUI" option into a 5 GB partition.
As a result, a mere yum update left almost no free space, so I wanted to list the largest RPMs and remove the ones I don't need. Here is the command line, for the record.
[root@hoge ~]# rpm -qa --qf '%{size}  %{name}-%{version}-%{release}.%{arch}\n' | sort -rn | head
149412110  firefox-45.4.0-1.el7.centos.x86_64
142763489  kernel-3.10.0-327.36.3.el7.x86_64
120273417  glibc-common-2.17-106.el7_2.8.x86_64
72282987  linux-firmware-20150904-43.git6ebf5d5.el7.noarch
68134425  texlive-cm-super-svn15878.0-38.el7.noarch
63095469  gimp-2.8.10-3.el7.x86_64
56222854  gnome-getting-started-docs-3.14.1.0.2-1.el7.noarch
49972399  kernel-doc-3.10.0-327.36.3.el7.noarch
45656942  webkitgtk3-2.4.9-5.el7.x86_64
41249234  texlive-lm-svn28119.2.004-38.el7.noarch
From this list, check what each package is (gimp, for example) and remove it if you don't need it.
[root@hoge ~]# rpm -qi gimp
Name        : gimp
Epoch       : 2
Version     : 2.8.10
Release     : 3.el7
Architecture: x86_64
Install Date: 2014年05月25日 18時00分07秒
Group       : Applications/Multimedia
Size        : 63095469
License     : GPLv3+ and GPLv3
Signature   : RSA/SHA256, 2014年04月02日 01時39分59秒, Key ID 199e2f91fd431d51
Source RPM  : gimp-2.8.10-3.el7.src.rpm
Build Date  : 2014年01月25日 08時24分39秒
Build Host  : x86-025.build.eng.bos.redhat.com
Relocations : (not relocatable)
Packager    : Red Hat, Inc. 
Vendor      : Red Hat, Inc.
URL         : http://www.gimp.org/
Summary     : GNU Image Manipulation Program
Description :
GIMP (GNU Image Manipulation Program) is a powerful image composition and
editing program, which can be extremely useful for creating logos and other
graphics for webpages. GIMP has many of the tools and filters you would expect
to find in similar commercial offerings, and some interesting extras as well.
GIMP provides a large image manipulation toolbox, including channel operations
and layers, effects, sub-pixel imaging and anti-aliasing, and conversions, all
with multi-level undo.
In my case, removing libreoffice and the java-related packages, which I will definitely never use, freed up plenty of space.
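For the record, the removal itself is just yum remove; a sketch of what I mean (the package globs are only illustrative, and yum shows the dependency list for confirmation before doing anything):

yum remove 'libreoffice*' 'java-*-openjdk*'
df -h /          # check how much space came back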

Saturday, September 17, 2016

Needless I/O load from updatedb on a CentOS6 + ZFS on Linux box

On a machine running ZFS on Linux on CentOS 6, I noticed that updatedb was doing an awful lot of work during cron.daily.
Looking at mlocate.cron, I found the following:
[root@hoge ~]# cat /etc/cron.daily/mlocate.cron 
#!/bin/sh
nodevs=$(< /proc/filesystems awk '$1 == "nodev" && $2 != "zfs" { print $2 }')
renice +19 -p $$ >/dev/null 2>&1
ionice -c2 -n7 -p $$ >/dev/null 2>&1
/usr/bin/updatedb -f "$nodevs"
I could hardly believe it, but it seems there are cases where Red Hat does make allowances for ZFS. An excerpt from the mlocate changelog:
[root@hoge ~]# rpm -q --changelog mlocate | head -8
* Mon Jan 26 2015 Michal Sekletar <msekleta@redhat.com> - 0.22.2-6
- mlocate.db is ghost file created with non-default attrs, list them explicitly so rpm --verify doesn't report errors (#1182304)

* Wed Jan 07 2015 Michal Sekletar <msekleta@redhat.com> - 0.22.2-5
- index zfs filesystems despite the fact they are marked as nodev (#1023779)
- use more strict permissions for cron script and mark it as config (#1012534)
- add gpfs to PRUNEFS (#1168301)

[root@hoge ~]# 
According to the corresponding Bugzilla (#1023779),
https://bugzilla.redhat.com/show_bug.cgi?id=1023779
there was a problem where ZFS areas were not covered by updatedb; it was fixed in Fedora first, and the fix was then brought into RHEL 6.
A Bugzilla for RHEL 7 (#1304416) has also been opened:
https://bugzilla.redhat.com/show_bug.cgi?id=1304416
That one is in Status: VERIFIED for RHEL 7.3, which suggests the fix has made it into the beta.

Personally, I'm glad Red Hat doesn't ignore ZFS (which presumably means there are at least some customers running RHEL + ZFS?).
That said, for the way I use it, my ZFS storage area (which holds a huge number of files) is better left out of updatedb.
So I changed the settings as follows:
[root@hoge ~]# vi /etc/updatedb.conf 
PRUNE_BIND_MOUNTS = "yes"
PRUNEFS = "zfs 9p afs anon_inodefs auto autofs bdev binfmt_misc cgroup cifs coda configfs cpuset debugfs devpts ecryptfs exofs fuse fusectl gfs gfs2 gpfs hugetlbfs inotifyfs iso9660 jffs2 lustre mqueue ncpfs nfs nfs4 nfsd pipefs proc ramfs rootfs rpc_pipefs securityfs selinuxfs sfs sockfs sysfs tmpfs ubifs udf usbfs"
PRUNENAMES = ".git .hg .svn"
PRUNEPATHS = "/afs /media /net /sfs /tmp /udev /var/cache/ccache /var/spool/cups /var/spool/squid /var/tmp"
[root@hoge ~]#
That is the configuration I ended up with (zfs added at the head of PRUNEFS).
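To sanity-check without waiting for the next cron.daily run, the cron script can simply be run by hand and timed:

time /etc/cron.daily/mlocate.cron

With zfs in PRUNEFS, the run time should drop from tens of minutes to seconds; the cron log below shows the same effect.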
[root@hoge ~]# grep mlocate /var/log/cron
...
Sep 10 03:41:09 hoge run-parts(/etc/cron.daily)[6431]: starting mlocate.cron
Sep 10 04:19:02 hoge run-parts(/etc/cron.daily)[13231]: finished mlocate.cron
Sep 11 03:48:11 hoge run-parts(/etc/cron.daily)[12147]: starting mlocate.cron
Sep 11 04:22:57 hoge run-parts(/etc/cron.daily)[18782]: finished mlocate.cron
Sep 12 03:17:06 hoge run-parts(/etc/cron.daily)[11282]: starting mlocate.cron
Sep 12 03:43:40 hoge run-parts(/etc/cron.daily)[16158]: finished mlocate.cron
Sep 13 03:22:05 hoge run-parts(/etc/cron.daily)[16992]: starting mlocate.cron
Sep 13 03:50:24 hoge run-parts(/etc/cron.daily)[22578]: finished mlocate.cron
Sep 14 03:08:05 hoge run-parts(/etc/cron.daily)[20083]: starting mlocate.cron ★after the config change, from here on
Sep 14 03:08:22 hoge run-parts(/etc/cron.daily)[20318]: finished mlocate.cron
Sep 15 03:13:08 hoge run-parts(/etc/cron.daily)[25880]: starting mlocate.cron
Sep 15 03:13:14 hoge run-parts(/etc/cron.daily)[26074]: finished mlocate.cron
Sep 16 03:07:06 hoge run-parts(/etc/cron.daily)[29745]: starting mlocate.cron
Sep 16 03:07:09 hoge run-parts(/etc/cron.daily)[29947]: finished mlocate.cron
As you can see, updatedb runs that used to take 30-40 minutes now finish in about 10 seconds, cutting out the wasted I/O load.

Tuesday, August 2, 2016

A WD GREEN drive nearly died

One of the two WD GREEN 3 TB HDDs I had been using in a ZFS mirror started producing a flood of I/O errors (media errors) in the middle of a scrub.

The scrub slowed down so much it looked like it would never finish, so I cancelled it (zpool scrub -s tankX) and checked the S.M.A.R.T. values. At that point the drive was abnormally unresponsive; a single smartctl -A took about 4 seconds (measured with time). It may have been stuck in a so-called DRC (deep recovery cycle).

After that, drawing on past experience, I refreshed the drive with an ATA Secure Erase.
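I'll skip the details of the Secure Erase procedure here; it is the usual hdparm sequence, roughly as below (a sketch, assuming the target is /dev/sde, the drive is not in the "frozen" state, and the password is a throwaway):

hdparm -I /dev/sde | grep -A 8 Security      # confirm "not locked" / "not frozen"
hdparm --user-master u --security-set-pass pass /dev/sde
time hdparm --user-master u --security-erase pass /dev/sde

Here is the smartctl -a output after the Secure Erase completed.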
[root@hoge ~]# smartctl -a /dev/sde
smartctl 5.43 2012-06-30 r3573 [i686-linux-2.6.32-642.3.1.el6.nonpae.i686] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (Adv. Format)
Device Model:     WDC WD30EZRX-00DC0B0
Serial Number:    WD-WMC1T0xxxxx2
LU WWN Device Id: 5 0014ee 6yyyyyyyf
Firmware Version: 80.00A80
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ACS-2 (revision not indicated)
Local Time is:    Tue Aug  2 20:32:42 2016 JST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
     was completed without error.
     Auto Offline Data Collection: Enabled.
Self-test execution status:      (  73) The previous self-test completed having
     a test element that failed and the test
     element that failed is not known.
Total time to complete Offline 
data collection:   (39360) seconds.
Offline data collection
capabilities:     (0x7b) SMART execute Offline immediate.
     Auto Offline data collection on/off support.
     Suspend Offline collection upon new
     command.
     Offline surface scan supported.
     Self-test supported.
     Conveyance Self-test supported.
     Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
     power-saving mode.
     Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
     General Purpose Logging supported.
Short self-test routine 
recommended polling time:   (   2) minutes.
Extended self-test routine
recommended polling time:   ( 395) minutes.
Conveyance self-test routine
recommended polling time:   (   5) minutes.
SCT capabilities:         (0x70b5) SCT Status supported.
     SCT Feature Control supported.
     SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       8587
  3 Spin_Up_Time            0x0027   173   172   021    Pre-fail  Always       -       6316
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       135
  5 Reallocated_Sector_Ct   0x0033   140   140   140    Pre-fail  Always   FAILING_NOW 1763
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   058   058   000    Old_age   Always       -       30747
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       131
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       38
193 Load_Cycle_Count        0x0032   159   159   000    Old_age   Always       -       125435
194 Temperature_Celsius     0x0022   113   106   000    Old_age   Always       -       37
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always       -       357
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       5
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   187   187   000    Old_age   Offline      -       5319

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: unknown failure    90%     30716         -
# 2  Short offline       Completed without error       00%     30647         -
# 3  Short offline       Completed without error       00%     28800         -
# 4  Short offline       Completed without error       00%     15839         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
After this, I ran a full-sector read test over the drive with dd, and to my surprise it completed without a single error.
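The read test itself was nothing fancy, roughly the following (the block size is arbitrary; dd reports any read errors and the overall throughput):

time dd if=/dev/sde of=/dev/null bs=64M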

So that's where things ended up: it feels too good to throw away, and yet I hesitate to attach it to the zpool again.
If the other WD GREEN failed, the RAID would be lost, so in the end I played it safe and ordered a replacement disk.

WD GREEN is often said to be unsuitable for RAID, and indeed the way it slows to a crawl when I/O errors occur (the DRC behavior?) does feel ill-suited to RAID. Then again, it lasted three and a half years, and considering the price, that's not so bad.

Incidentally, the replacement I ordered is a WD Purple, which I've never used before. I've already used WD Red, so I wanted to try something different.

Added 2016-08-04
According to my paper notebook, the failing disk was purchased on December 23, 2012, and I disabled its idle3 timer on February 23, 2015. I only learned about the idle3 timer later, and by the time I noticed, Load_Cycle_Count was already at 125426, according to my notes. Whether that really relates to drive life is unclear, but I can't help thinking the drive might have lasted longer had I disabled it from the start.
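For reference, the idle3 timer can be disabled from Linux with idle3-tools; a sketch, assuming the target is /dev/sdX (the change takes effect after the drive is powered off and on again):

idle3ctl -g /dev/sdX     # show the current idle3 timer value
idle3ctl -d /dev/sdX     # disable the idle3 timer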

Added 2016-08-22
The replace onto the new disk succeeded, so I attached this WD GREEN again to form a three-way mirror. The resilver succeeded, and a subsequent scrub also completed, with only a few errors along the way. In the process, Current_Pending_Sector dropped to 1. For now I plan to keep using it in this three-way mirror configuration.
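For completeness, the commands involved are the standard ZFS ones; a sketch with placeholder device names (tankX being the pool mentioned above):

zpool replace tankX OLD_WD_GREEN_DEV NEW_WD_PURPLE_DEV   # resilver onto the new disk
zpool attach  tankX NEW_WD_PURPLE_DEV OLD_WD_GREEN_DEV   # re-attach the old disk -> three-way mirror
zpool scrub   tankX
zpool status  tankX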

Sunday, July 31, 2016

Rising Load_Cycle_Count on a 2.5-inch HDD

Among the S.M.A.R.T. attributes there is Load_Cycle_Count, and it is well known that on WD GREEN drives the IntelliPark feature makes this counter climb steadily.
I had assumed this was peculiar to WD GREEN, but when I happened to check a 2.5-inch HDD I have on hand, its value was also quite high. A little investigation suggests the APM feature is responsible.

[root@hoge ~]# smartctl -g apm /dev/sdc
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-642.3.1.el6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

APM level is:     128 (minimum power consumption without standby)
[root@hoge ~]# smartctl -s apm,253 /dev/sdc
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-642.3.1.el6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

APM set to level 253 (intermediate level without standby)
[root@hoge ~]# smartctl -A /dev/sdc
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-642.3.1.el6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   062    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   040    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0007   173   173   033    Pre-fail  Always       -       2
  4 Start_Stop_Count        0x0012   099   099   000    Old_age   Always       -       1873
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   040    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0012   093   093   000    Old_age   Always       -       3451
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1716
191 G-Sense_Error_Rate      0x000a   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   097   097   000    Old_age   Always       -       393906
193 Load_Cycle_Count        0x0012   097   097   000    Old_age   Always       -       37845
194 Temperature_Celsius     0x0002   166   166   000    Old_age   Always       -       36 (Min/Max 7/53)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       259
223 Load_Retry_Count        0x000a   100   100   000    Old_age   Always       -       0

[root@hoge ~]# smartctl -A /dev/sdc
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-642.3.1.el6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   062    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   040    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0007   173   173   033    Pre-fail  Always       -       2
  4 Start_Stop_Count        0x0012   099   099   000    Old_age   Always       -       1873
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   040    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0012   093   093   000    Old_age   Always       -       3452
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1716
191 G-Sense_Error_Rate      0x000a   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   097   097   000    Old_age   Always       -       393906
193 Load_Cycle_Count        0x0012   097   097   000    Old_age   Always       -       37845
194 Temperature_Celsius     0x0002   162   162   000    Old_age   Always       -       37 (Min/Max 7/53)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       259
223 Load_Retry_Count        0x000a   100   100   000    Old_age   Always       -       0
With APM set to 253, the counter stopped rising. The drive temperature runs a little higher, though. 254 is "maximum performance", but 254 seemed to push the temperature even further, so 253 felt like the better compromise.
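Incidentally, hdparm can set the same thing (both smartctl -s apm and hdparm -B program the drive's ATA APM feature), and on many drives the setting does not survive a power cycle, so it may need to be reapplied at boot (from rc.local or a udev rule, for example). A sketch:

hdparm -B /dev/sdc       # show the current APM level
hdparm -B 253 /dev/sdc   # set APM to 253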

Monday, July 18, 2016

Initial check of a used HDD, the fourth one (July 2016)

Another initial check of a used HDD. Following the previous one, I obtained a fourth drive from the same shop, so here is the usual memo.

This used HDD is, like the previous one, a Seagate Barracuda ES.2 1TB. It was the most expensive of the ones I've bought so far, but according to the listing it was in good condition (five stars).
Incidentally, the average price across the four drives was 3,083 yen.

First, a look at S.M.A.R.T.
[root@hoge ~]# smartctl -A /dev/sdb
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-642.1.1.el6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   115   081   006    Pre-fail  Always       -       86548384
  3 Spin_Up_Time            0x0003   091   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       157
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       369
  7 Seek_Error_Rate         0x000f   068   060   030    Pre-fail  Always       -       94635312555
  9 Power_On_Hours          0x0032   075   075   000    Old_age   Always       -       22600
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       1
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       140
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       301
188 Command_Timeout         0x0032   100   089   000    Old_age   Always       -       171801313320
189 High_Fly_Writes         0x003a   096   096   000    Old_age   Always       -       4
190 Airflow_Temperature_Cel 0x0022   064   046   045    Old_age   Always       -       36 (Min/Max 28/36)
194 Temperature_Celsius     0x0022   036   054   000    Old_age   Always       -       36 (0 14 0 0 0)
195 Hardware_ECC_Recovered  0x001a   031   016   000    Old_age   Always       -       86548384
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   193   000    Old_age   Always       -       8172
Unfortunately, it's not in especially good condition.
But judging from my experience so far, it's not terrible either.

The operating time (attribute 9, Power_On_Hours), in years:
[root@hoge ~]# echo "22600 / 24 / 365" | bc -l
2.57990867579908675799

Next, the output of hdparm -i.
[root@hoge ~]# hdparm -i /dev/sdb

/dev/sdb:

 Model=ST31000340NS, FwRev=SN04, SerialNo=5QJ0D8RA
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=unknown, BuffSize=unknown, MaxMultSect=16, MultSect=16
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=1953525168
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4 
 DMA modes:  mdma0 mdma1 mdma2 
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6 
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: ATA/ATAPI-6 T13 1410D revision 2:  ATA/ATAPI-4,5,6

 * signifies the current active mode

The firmware version is SN04. Even within the same model there are quite a few versions: the first drive was NA02, the second FSC9, and the third SN06.

Next, the output of smartctl -a.
[root@hoge ~]# smartctl -a /dev/sdb
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-642.1.1.el6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda ES.2
Device Model:     ST31000340NS
Serial Number:    5xxxxxxA
LU WWN Device Id: 5 000c50 0yyyyyyyf
Firmware Version: SN04
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   6
ATA Standard is:  ATA/ATAPI-6 T13 1410D revision 2
Local Time is:    Mon Jun 27 15:25:19 2016 JST

==> WARNING: There are known problems with these drives,
see the following Seagate web pages:
http://knowledge.seagate.com/articles/en_US/FAQ/207931en
http://knowledge.seagate.com/articles/en_US/FAQ/207963en  ★the warning about that famous firmware problem is shown

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
     was completed without error.
     Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
     without error or no self-test has ever 
     been run.
Total time to complete Offline 
data collection:   (  642) seconds.
Offline data collection
capabilities:     (0x7b) SMART execute Offline immediate.
     Auto Offline data collection on/off support.
     Suspend Offline collection upon new
     command.
     Offline surface scan supported.
     Self-test supported.
     Conveyance Self-test supported.
     Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
     power-saving mode.
     Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
     General Purpose Logging supported.
Short self-test routine 
recommended polling time:   (   1) minutes.
Extended self-test routine
recommended polling time:   ( 239) minutes.
Conveyance self-test routine
recommended polling time:   (   2) minutes.
SCT capabilities:         (0x003d) SCT Status supported.
     SCT Error Recovery Control supported.
     SCT Feature Control supported.
     SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   115   081   006    Pre-fail  Always       -       86548384
  3 Spin_Up_Time            0x0003   091   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       157
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       369
  7 Seek_Error_Rate         0x000f   068   060   030    Pre-fail  Always       -       94635312564
  9 Power_On_Hours          0x0032   075   075   000    Old_age   Always       -       22600
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       1
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       140
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       301
188 Command_Timeout         0x0032   100   089   000    Old_age   Always       -       171801313320
189 High_Fly_Writes         0x003a   096   096   000    Old_age   Always       -       4
190 Airflow_Temperature_Cel 0x0022   064   046   045    Old_age   Always       -       36 (Min/Max 28/36)
194 Temperature_Celsius     0x0022   036   054   000    Old_age   Always       -       36 (0 14 0 0 0)
195 Hardware_ECC_Recovered  0x001a   031   016   000    Old_age   Always       -       86548384
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   193   000    Old_age   Always       -       8172

SMART Error Log Version: 1
ATA Error Count: 301 (device log contains only the most recent five errors)
 CR = Command Register [HEX]
 FR = Features Register [HEX]
 SC = Sector Count Register [HEX]
 SN = Sector Number Register [HEX]
 CL = Cylinder Low Register [HEX]
 CH = Cylinder High Register [HEX]
 DH = Device/Head Register [HEX]
 DC = Device Command Register [HEX]
 ER = Error register [HEX]
 ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 301 occurred at disk power-on lifetime: 16966 hours (706 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00   3d+15:16:57.119  READ FPDMA QUEUED
  61 00 08 ff ff ff 4f 00   3d+15:16:57.119  WRITE FPDMA QUEUED
  2f 00 01 10 00 00 40 00   3d+15:16:57.035  READ LOG EXT
  60 00 08 ff ff ff 4f 00   3d+15:16:55.303  READ FPDMA QUEUED
  2f 00 01 10 00 00 40 00   3d+15:16:55.187  READ LOG EXT

Error 300 occurred at disk power-on lifetime: 16966 hours (706 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00   3d+15:16:55.303  READ FPDMA QUEUED
  2f 00 01 10 00 00 40 00   3d+15:16:55.187  READ LOG EXT
  60 00 08 ff ff ff 4f 00   3d+15:16:53.463  READ FPDMA QUEUED
  61 00 08 40 0f 60 40 00   3d+15:16:53.463  WRITE FPDMA QUEUED
  2f 00 01 10 00 00 40 00   3d+15:16:53.386  READ LOG EXT

Error 299 occurred at disk power-on lifetime: 16966 hours (706 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00   3d+15:16:53.463  READ FPDMA QUEUED
  61 00 08 40 0f 60 40 00   3d+15:16:53.463  WRITE FPDMA QUEUED
  2f 00 01 10 00 00 40 00   3d+15:16:53.386  READ LOG EXT
  60 00 08 ff ff ff 4f 00   3d+15:16:50.814  READ FPDMA QUEUED
  2f 00 01 10 00 00 40 00   3d+15:16:50.738  READ LOG EXT

Error 298 occurred at disk power-on lifetime: 16966 hours (706 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00   3d+15:16:50.814  READ FPDMA QUEUED
  2f 00 01 10 00 00 40 00   3d+15:16:50.738  READ LOG EXT
  60 00 08 ff ff ff 4f 00   3d+15:16:48.997  READ FPDMA QUEUED
  2f 00 01 10 00 00 40 00   3d+15:16:48.857  READ LOG EXT
  61 00 08 ff ff ff 4f 00   3d+15:16:47.118  WRITE FPDMA QUEUED

Error 297 occurred at disk power-on lifetime: 16966 hours (706 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00   3d+15:16:48.997  READ FPDMA QUEUED
  2f 00 01 10 00 00 40 00   3d+15:16:48.857  READ LOG EXT
  61 00 08 ff ff ff 4f 00   3d+15:16:47.118  WRITE FPDMA QUEUED
  61 00 08 ff ff ff 4f 00   3d+15:16:47.117  WRITE FPDMA QUEUED
  60 00 08 ff ff ff 4f 00   3d+15:16:47.116  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Errors are logged. Still, at this level I think it will serve for a while.

Next, the output of hdparm -I.
[root@hoge ~]# hdparm -I /dev/sdb

/dev/sdb:

ATA device, with non-removable media
 Model Number:       ST31000340NS                            
 Serial Number:      5xxxxxxA
 Firmware Revision:  SN04    
Standards:
 Used: ATA/ATAPI-6 T13 1410D revision 2 
 Supported: 6 5 4 
Configuration:
 Logical  max current
 cylinders 16383 16383
 heads  16 16
 sectors/track 63 63
 --
 CHS current addressable sectors:   16514064
 LBA    user addressable sectors:  268435455
 LBA48  user addressable sectors: 1953525168
 Logical/Physical Sector size:           512 bytes
 device size with M = 1024*1024:      953869 MBytes
 device size with M = 1000*1000:     1000204 MBytes (1000 GB)
 cache/buffer size  = unknown
 Nominal Media Rotation Rate: 7200
Capabilities:
 LBA, IORDY(can be disabled)
 Queue depth: 32
 Standby timer values: spec'd by Standard, no device specific minimum
 R/W multiple sector transfer: Max = 16 Current = 16
 Recommended acoustic management value: 254, current value: 0
 DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6 
      Cycle time: min=120ns recommended=120ns
 PIO: pio0 pio1 pio2 pio3 pio4 
      Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
 Enabled Supported:
    * SMART feature set
      Security Mode feature set
    * Power Management feature set
    * Write cache
    * Look-ahead
    * Host Protected Area feature set
    * WRITE_BUFFER command
    * READ_BUFFER command
    * DOWNLOAD_MICROCODE
    * SET_MAX security extension
    * 48-bit Address feature set
    * Device Configuration Overlay feature set
    * Mandatory FLUSH_CACHE
    * FLUSH_CACHE_EXT
    * SMART error logging
    * SMART self-test
      General Purpose Logging feature set
    * 64-bit World wide name
    * Write-Read-Verify feature set
    * WRITE_UNCORRECTABLE_EXT command
    * Gen1 signaling speed (1.5Gb/s)
    * Native Command Queueing (NCQ)
    * Phy event counters
    * Software settings preservation
    * SMART Command Transport (SCT) feature set
    * SCT Write Same (AC2)
    * SCT Error Recovery Control (AC3)
    * SCT Features Control (AC4)
    * SCT Data Tables (AC5)
Security: 
 Master password revision code = 65534
  supported
 not enabled
 not locked
 not frozen
 not expired: security count
  supported: enhanced erase
Logical Unit WWN Device Identifier: 5000c500yyyyyyyf
 NAA  : 5
 IEEE OUI : 000c50
 Unique ID : 0yyyyyyyf
Checksum: correct

After this I put it into a 4-disk RAIDZ; resilver and scrub produced no errors, so it feels perfectly usable.

Wednesday, July 6, 2016

How to suppress "systemd: Starting Session" on CentOS7 (filtering with rsyslogd)

On CentOS7 / RHEL7, every time cron runs you get a "systemd: Starting Session ... of user root." message, which I had always found to be annoying noise. One way to suppress it is to lower systemd's own log level, but then important messages might be lost too, so despite the annoyance I kept running with the defaults...

I discovered that the Red Hat knowledge base has a method for suppressing it with an rsyslogd filter (knowledge base article 1564823). Noting it here as a pointer to that pointer.
No doubt Red Hat has also received plenty of inquiries about this message.

I took it as-is and applied the settings on my own machine.
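I won't reproduce the knowledge base article itself here, but the gist is an rsyslog rule that drops those messages; a sketch in rsyslog 7 RainerScript (the file name and match strings below are my own choice, not necessarily identical to the article):

# /etc/rsyslog.d/ignore-systemd-session.conf
if $programname == "systemd" and ($msg contains "Starting Session" or $msg contains "Started Session") then stop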
※after applying the knowledge base settings and running systemctl restart rsyslog.service
[root@hoge ~]# grep Session /var/log/messages | head
Jul  5 04:11:01 hoge systemd: Started Session 32 of user root.
Jul  5 04:11:01 hoge systemd: Starting Session 32 of user root.
Jul  5 04:12:01 hoge systemd: Started Session 33 of user root.
Jul  5 04:12:01 hoge systemd: Starting Session 33 of user root.
Jul  5 04:13:01 hoge systemd: Started Session 34 of user root.
Jul  5 04:13:01 hoge systemd: Starting Session 34 of user root.
Jul  5 04:14:01 hoge systemd: Started Session 35 of user root.
Jul  5 04:14:01 hoge systemd: Starting Session 35 of user root.
Jul  5 04:15:01 hoge systemd: Started Session 36 of user root.
Jul  5 04:15:01 hoge systemd: Starting Session 36 of user root.
[root@hoge ~]# date ; tail -f /var/log/messages
2016年  7月  6日 水曜日 07:02:45 JST
Jul  6 06:55:06 hoge systemd: Configuration file /usr/lib/systemd/system/wpa_supplicant.service is marked executable. Please remove executable permission bits. Proceeding anyway.
Jul  6 06:55:07 hoge systemd: [/usr/lib/systemd/system/firstboot-graphical.service:14] Support for option SysVStartPriority= has been removed and it is ignored
Jul  6 06:55:07 hoge systemd: [/usr/lib/systemd/system/initial-setup-text.service:21] Support for option SysVStartPriority= has been removed and it is ignored
Jul  6 06:55:07 hoge systemd: Configuration file /usr/lib/systemd/system/ebtables.service is marked executable. Please remove executable permission bits. Proceeding anyway.
Jul  6 06:55:07 hoge systemd: Configuration file /usr/lib/systemd/system/wpa_supplicant.service is marked executable. Please remove executable permission bits. Proceeding anyway.
Jul  6 06:55:07 hoge rsyslogd: [origin software="rsyslogd" swVersion="7.4.7" x-pid="1851" x-info="http://www.rsyslog.com"] exiting on signal 15.
Jul  6 06:55:07 hoge rsyslogd: [origin software="rsyslogd" swVersion="7.4.7" x-pid="10193" x-info="http://www.rsyslog.com"] start
Jul  6 06:55:07 hoge systemd: Stopping System Logging Service...
Jul  6 06:55:07 hoge systemd: Starting System Logging Service...
Jul  6 06:55:07 hoge systemd: Started System Logging Service.

^C
[root@hoge ~]# date
2016年  7月  6日 水曜日 07:03:15 JST
With that, the noise is gone and I feel much better... I had grown completely used to it (resigned to "that's just how CentOS7/RHEL7 is"), but two lines of largely meaningless messages for a cron job that runs every minute really was too much output.

Tuesday, June 28, 2016

Using ZFS for the root filesystem on CentOS 6

About two years ago, I wrote about using ZFS for the CentOS 6 root filesystem:
Trying ZFS as the root filesystem on CentOS 6
At the time, though, it was only an experiment and I never used that environment day to day. The biggest obstacle was that the maintenance procedure became cumbersome whenever the kernel or ZFS itself had to be updated.

More recently, kmod-zfs packages have become available for CentOS 7 and CentOS 6, which should make this easier to maintain than before, so I decided to build a permanent environment.

Fortunately, someone has already written up a procedure for CentOS 7, which I used as a reference.

Last time, because CentOS 6's GRUB1 cannot handle ZFS directly, I kept /boot (ext4) on a separate partition.
This time, I used CentOS 7's GRUB2, which does support ZFS, to boot the CentOS 6 kernel directly from ZFS.

The machine is a ThinkPad W520 with an SSD (Crucial MX200) added in the second hard disk bay, partitioned as follows.
[root@hoge ~]# fdisk -l /dev/sd[ab]

Disk /dev/sda: 500.1 GB, 500107862016 bytes, 976773168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk label type: dos
Disk identifier: 0x153c11d0

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1            2048     3074047     1536000    7  HPFS/NTFS/exFAT  ※these three are the Windows environment
/dev/sda2         3074048   101378047    49152000    7  HPFS/NTFS/exFAT
/dev/sda3       101378048   134146047    16384000    7  HPFS/NTFS/exFAT
/dev/sda4       134146048   976773167   421313560    5  Extended
/dev/sda5       134148096   175108095    20480000   83  Linux    ※CentOS 7 already installed (Btrfs mirror)
/dev/sda6       207878144   248838143    20480000   83  Linux    ※CentOS 6 already installed on ext4

Disk /dev/sdb: 500.1 GB, 500107862016 bytes, 976773168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk label type: dos
Disk identifier: 0x869474c7

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *        2048     3074047     1536000    7  HPFS/NTFS/exFAT  ※these three are the Windows environment
/dev/sdb2         3074048   101378047    49152000    7  HPFS/NTFS/exFAT
/dev/sdb3       101378048   134146047    16384000    7  HPFS/NTFS/exFAT
/dev/sdb4       134146048   976773167   421313560    5  Extended
/dev/sdb5       134148096   175108095    20480000   83  Linux    ※CentOS 7 Btrfs mirror
/dev/sdb6       175110144   207878143    16384000   83  Linux    ※free; CentOS 6 + ZFS will be created here

First, install kmod-zfs-0.6.5.7-1 (the latest at the time of writing) on both CentOS 7 and CentOS 6.
See here for how to install kmod-zfs.
At this point, install zfs-dracut on CentOS 7, but do not install zfs-dracut on CentOS 6.
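Concretely, the install step is something like the following on each side (a sketch, assuming the ZFS on Linux yum repository is already configured as in the linked instructions, using the kmod flavor of the packages):

# CentOS 7 (with zfs-dracut)
yum install kmod-zfs zfs zfs-dracut
# CentOS 6 (without zfs-dracut)
yum install kmod-zfs zfs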

Next, from the CentOS 7 environment, I created rpool on sdb6. Feature flags that GRUB2 does not support must be avoided here.
[root@cent7 ~]# zpool create -d -o feature@async_destroy=enabled -o feature@empty_bpobj=enabled \
               -o feature@lz4_compress=enabled -o ashift=12 -O compression=lz4 rpool /dev/sdb6
[root@cent7 ~]# zfs create rpool/ROOT
[root@cent7 ~]# cd /rpool/ROOT
[root@cent7 ROOT]# dump -0uf - /dev/sda6 | restore -rf -    ※copy the CentOS 6 environment into rpool/ROOT
[root@cent7 ROOT]# rpm -ql zfs-dracut
/usr/lib/dracut/modules.d/90zfs
/usr/lib/dracut/modules.d/90zfs/export-zfs.sh
/usr/lib/dracut/modules.d/90zfs/module-setup.sh
/usr/lib/dracut/modules.d/90zfs/mount-zfs.sh
/usr/lib/dracut/modules.d/90zfs/parse-zfs.sh
/usr/lib/dracut/modules.d/90zfs/zfs-lib.sh  ※copy these
/usr/share/doc/zfs-dracut-0.6.5.7
/usr/share/doc/zfs-dracut-0.6.5.7/README.dracut.markdown
[root@cent7 ROOT]# cp -r /usr/lib/dracut/modules.d/90zfs ./usr/share/dracut/modules.d
[root@cent7 ROOT]# cd
[root@cent7 ~]# zfs set mountpoint=legacy rpool/ROOT
[root@cent7 ~]# zpool export rpool
After this, reboot into the CentOS 6 on sda6, import rpool, and continue the work there.
[root@cent6 ~]# zpool import rpool
[root@cent6 ~]# mkdir /mnt_rpool_ROOT
[root@cent6 ~]# mount -t zfs rpool/ROOT /mnt_rpool_ROOT
[root@cent6 ~]# cd /mnt_rpool_ROOT/usr/share/dracut/modules.d/90zfs/
[root@cent6 90zfs]# cp module-setup.sh check
[root@cent6 90zfs]# cp module-setup.sh install
As in the earlier article, the copied scripts need a small tweak. The diffs are as follows.
--- module-setup.sh 2016-06-21 01:45:30.846098657 +0900
+++ check 2016-06-21 01:44:42.865095421 +0900
@@ -59,3 +59,5 @@
  DD=`hostid | cut -b 7,8`
  printf "\x${DD}\x${CC}\x${BB}\x${AA}" > "${initdir}/etc/hostid"
 }
+
+check
--- module-setup.sh 2016-06-21 01:45:30.846098657 +0900
+++ install 2016-06-21 01:45:07.471099948 +0900
@@ -59,3 +59,5 @@
  DD=`hostid | cut -b 7,8`
  printf "\x${DD}\x${CC}\x${BB}\x${AA}" > "${initdir}/etc/hostid"
 }
+
+install
Next, apply fixes to the three scripts zfs-lib.sh, parse-zfs.sh, and module-setup.sh.
--- ./zfs-lib.sh.org 2016-05-13 12:19:59.000000000 +0900
+++ ./zfs-lib.sh 2016-06-21 01:34:14.991037019 +0900
@@ -6,6 +6,22 @@
 NEWLINE="
 "
 
+# copied from Fedora19's /usr/lib/dracut/modules.d/99base/dracut-lib.sh
+getargbool() {
+    local _b
+    unset _b
+    local _default
+    _default=$1; shift
+    _b=$(getarg "$@")
+    [ $? -ne 0 -a -z "$_b" ] && _b=$_default
+    if [ -n "$_b" ]; then
+        [ $_b = "0" ] && return 1
+        [ $_b = "no" ] && return 1
+        [ $_b = "off" ] && return 1
+    fi
+    return 0
+}
+
 ZPOOL_IMPORT_OPTS=""
 if getargbool 0 zfs_force -y zfs.force -y zfsforce ; then
  warn "ZFS: Will force-import pools if necessary."
--- ./parse-zfs.sh.org 2016-05-13 12:19:59.000000000 +0900
+++ ./parse-zfs.sh 2016-06-21 01:36:26.200528591 +0900
@@ -55,5 +55,6 @@
 # modules to settle before mounting.
 if [ ${wait_for_zfs} -eq 1 ]; then
  ln -s /dev/null /dev/root 2>/dev/null
- echo '[ -e /dev/zfs ]' > "${hookdir}/initqueue/finished/zfs.sh"
+# echo '[ -e /dev/zfs ]' > "${hookdir}/initqueue/finished/zfs.sh"
+ echo '[ -e /dev/zfs ]' > "/initqueue-finished/zfs.sh"
 fi
--- ./module-setup.sh.org 2016-05-13 12:19:59.000000000 +0900
+++ ./module-setup.sh 2016-06-21 01:45:30.846098657 +0900
@@ -28,20 +28,20 @@
 }
 
 install() {
- inst_rules /usr/lib/udev/rules.d/90-zfs.rules
- inst_rules /usr/lib/udev/rules.d/69-vdev.rules
- inst_rules /usr/lib/udev/rules.d/60-zvol.rules
+ inst_rules /lib/udev/rules.d/90-zfs.rules
+ inst_rules /lib/udev/rules.d/69-vdev.rules
+ inst_rules /lib/udev/rules.d/60-zvol.rules
  dracut_install /sbin/zfs
  dracut_install /sbin/zpool
- dracut_install /usr/lib/udev/vdev_id
- dracut_install /usr/lib/udev/zvol_id
+ dracut_install /lib/udev/vdev_id
+ dracut_install /lib/udev/zvol_id
  dracut_install mount.zfs
  dracut_install hostid
  dracut_install awk
  dracut_install head
  inst_hook cmdline 95 "${moddir}/parse-zfs.sh"
  inst_hook mount 98 "${moddir}/mount-zfs.sh"
- inst_hook shutdown 30 "${moddir}/export-zfs.sh"
+# inst_hook shutdown 30 "${moddir}/export-zfs.sh"
 
  inst_simple "${moddir}/zfs-lib.sh" "/lib/dracut-zfs-lib.sh"
  if [ -e /etc/zfs/zpool.cache ]; then
Then rewrite /mnt_rpool_ROOT/etc/fstab as follows.
#UUID=xxxxxxxx-yyyy-zzzz-uuuu-vvvvvvvvvvvv /  ext4  defaults  1 1
rpool/ROOT                                 /  zfs   defaults  1 0
Now rebuild the initramfs.
[root@cent6 90zfs]# cd
[root@cent6 ~]# mount -t devtmpfs devtmpfs /mnt_rpool_ROOT/dev
[root@cent6 ~]# mount -t devpts devpts /mnt_rpool_ROOT/dev/pts
[root@cent6 ~]# mount -t sysfs sysfs /mnt_rpool_ROOT/sys
[root@cent6 ~]# mount -t proc proc /mnt_rpool_ROOT/proc
[root@cent6 ~]# chroot /mnt_rpool_ROOT /bin/bash
[root@cent6 /]# dracut -f /boot/initramfs-2.6.32-642.1.1.el6.x86_64.img 2.6.32-642.1.1.el6.x86_64
After this, a boot entry for CentOS 6 needs to be added to GRUB2, so switch back to CentOS 7.
[root@cent6 /]# exit
exit
[root@cent6 ~]# umount /mnt_rpool_ROOT/proc
[root@cent6 ~]# umount /mnt_rpool_ROOT/sys
[root@cent6 ~]# umount /mnt_rpool_ROOT/dev/pts
[root@cent6 ~]# umount /mnt_rpool_ROOT/dev
[root@cent6 ~]# umount /mnt_rpool_ROOT
[root@cent6 ~]# zpool export rpool
[root@cent6 ~]# shutdown -r now
Edit /etc/grub.d/40_custom and create an entry like the following.
#!/bin/sh
exec tail -n +3 $0
# This file provides an easy way to add custom menu entries.  Simply type the
# menu entries you want to add after this comment.  Be careful not to change
# the 'exec tail' line above.

menuentry 'CentOS 6 on ZFS' --class red --class gnu-linux --class gnu --class os $menuentry_id_option 'CentOS 6 on ZFS' {
    load_video
    insmod gzio
    insmod part_msdos
    insmod zfs
    set root='hd1,msdos6'
    if [ x$feature_platform_search_hint = xy ]; then
        search --no-floppy --label --set=root --hint='hd1,msdos6' rpool
    else
        search --no-floppy --label --set=root rpool
    fi
    linux16 /ROOT@/boot/vmlinuz-2.6.32-642.1.1.el6.x86_64 ro root=ZFS=rpool/ROOT rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=128M  KEYBOARDTYPE=pc KEYTABLE=jp106 rd_NO_LVM rd_NO_DM elevator=deadline nouveau.modeset=0 rdblacklist=nouveau
    initrd16 /ROOT@/boot/initramfs-2.6.32-642.1.1.el6.x86_64.img
}
Finally, regenerate grub.cfg.
[root@cent7 ~]# grub2-mkconfig -o /boot/grub2/grub.cfg 
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-3.10.0-327.18.2.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-327.18.2.el7.x86_64.img
Found linux image: /boot/vmlinuz-3.10.0-327.13.1.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-327.13.1.el7.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-17561ed66271e2f6923ff3c901624b3a
Found initrd image: /boot/initramfs-0-rescue-17561ed66271e2f6923ff3c901624b3a.img
Found memtest image: /boot/elf-memtest86+-4.20
done
Here is the state after rebooting into the "CentOS 6 on ZFS" entry.
[root@cent6 ~]# date
Tue Jun 28 01:11:19 JST 2016
[root@cent6 ~]# df -hT
Filesystem     Type   Size  Used Avail Use% Mounted on
rpool/ROOT     zfs     16G  5.0G   11G  33% /
tmpfs          tmpfs  7.8G   76K  7.8G   1% /dev/shm
rpool          zfs     11G  128K   11G   1% /rpool
[root@cent6 ~]# uname -r
2.6.32-642.1.1.el6.x86_64
[root@cent6 ~]# zpool status
  pool: rpool
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(5) for details.
  scan: scrub repaired 0 in 0h0m with 0 errors on Tue Jun 21 03:54:22 2016
config:

        NAME                                             STATE     READ WRITE CKSUM
        rpool                                            ONLINE       0     0     0
          ata-Crucial_CT500MX200SSD3_xxxxxxxxxxxx-part6  ONLINE       0     0     0

errors: No known data errors

Added 2016-06-30 (kdump setup)
I wanted to be able to capture a kdump in case something goes wrong, but of course kdump has no notion of ZFS, so I put together a quick hack.

First, I figured dumping to ext4 on top of a zvol would do, so I prepared rpool/kdump.
[root@hoge ~]# zfs create -s -V 8g rpool/kdump
[root@hoge ~]# mkfs -t ext4 /dev/rpool/kdump
[root@hoge ~]# tune2fs -i 0 /dev/rpool/kdump
[root@hoge ~]# tune2fs -c 0 /dev/rpool/kdump
[root@hoge ~]# tune2fs -l /dev/rpool/kdump | egrep "^(Max|Check)"
Maximum mount count:      -1
Check interval:           0 (<none>)
[root@hoge ~]# blkid /dev/rpool/kdump 
/dev/rpool/kdump: UUID="a108ed6c-787d-40ff-9971-7db60962ebff" TYPE="ext4"
[root@hoge ~]# vi /etc/fstab
...
[root@hoge ~]# grep /var/crash /etc/fstab
UUID=a108ed6c-787d-40ff-9971-7db60962ebff /var/crash  ext4  defaults,discard  1 1
[root@hoge ~]# mount /var/crash
[root@hoge ~]# mount | grep /var/crash
/dev/zd0 on /var/crash type ext4 (rw,discard)
[root@hoge ~]# df -hT /var/crash
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/zd0       ext4  7.8G   18M  7.4G   1% /var/crash
With that, specifying the following in /etc/kdump.conf got the kdump service to start.
[root@hoge ~]# cat /etc/kdump.conf
ext4 UUID=a108ed6c-787d-40ff-9971-7db60962ebff
path /
core_collector makedumpfile -l --message-level 23 -d 31
[root@hoge ~]# service kdump restart
Stopping kdump:                                            [  OK  ]
Detected change(s) the following file(s):
  
  /etc/kdump.conf
Rebuilding /boot/initrd-2.6.32-642.1.1.el6.x86_64kdump.img
Warning: There might not be enough space to save a vmcore.
         The size of UUID=a108ed6c-787d-40ff-9971-7db60962ebff should be greater than 16157076 kilo bytes.
Starting kdump:                                            [  OK  ]
However, with just these settings the dump could not be captured: when the second (crash) kernel runs, the ZFS-aware steps such as zpool import are missing.
In the spirit of "if the cuckoo won't sing, make it sing", I wrote my own kdump_pre script.
#! /sbin/busybox msh
#modprobe zfs
mknod /dev/zfs c `awk -F: '{print $1,$2}' /sys/devices/virtual/misc/zfs/dev`
zpool import -f rpool
sleep 3        # added 2016-07-16: without a short wait for the sysfs entries to appear, the mknod below can fail
for D in `cd /sys/block ; echo zd*`
do
        mknod /dev/$D b `awk -F: '{print $1,$2}' /sys/block/$D/dev`
done
exit 0
To create the needed devices (/dev/zfs, /dev/zd0), it picks up the major:minor numbers from sysfs.
[root@hoge ~]# ls -l /root/bin/kdump-pre.sh 
-rwxr-xr-x 1 root root 246 Jun 29 07:21 /root/bin/kdump-pre.sh
[root@hoge ~]# cat /etc/kdump.conf
extra_bins /sbin/fsck.zfs /sbin/mount.zfs /sbin/zfs /sbin/zpool
kdump_pre /root/bin/kdump-pre.sh
ext4 UUID=a108ed6c-787d-40ff-9971-7db60962ebff
path /
core_collector makedumpfile -l --message-level 23 -d 31
blacklist nvidia ipv6 kvm e1000e
[root@hoge ~]# service kdump restart
Stopping kdump:                                            [  OK  ]
Detected change(s) the following file(s):
  
  /etc/kdump.conf
Rebuilding /boot/initrd-2.6.32-642.1.1.el6.x86_64kdump.img
Warning: There might not be enough space to save a vmcore.
         The size of UUID=a108ed6c-787d-40ff-9971-7db60962ebff should be greater than 16157076 kilo bytes.
Starting kdump:                                            [  OK  ]
[root@hoge ~]# 
From another angle: the second kernel has little memory to work with (I specify crashkernel=256M on this ThinkPad), so to keep ZFS memory usage down I set primarycache to none. Dump compression is handled by makedumpfile -l (LZO compression), so I turned ZFS compression off.
[root@hoge ~]# egrep -o crashkernel=...M /proc/cmdline 
crashkernel=256M
[root@hoge ~]# zfs set primarycache=none rpool/kdump
[root@hoge ~]# zfs set compression=off   rpool/kdump
[root@hoge ~]# zfs get all -s local
NAME         PROPERTY              VALUE                  SOURCE
rpool        compression           lz4                    local
rpool/ROOT   mountpoint            legacy                 local
rpool/kdump  volsize               8G                     local
rpool/kdump  compression           off                    local
rpool/kdump  primarycache          none                   local
With this, I was able to capture a dump (tested with echo c > /proc/sysrq-trigger). It takes about 2 minutes; main memory is 16 GB, for reference.
[root@hoge ~]# ls -l /var/crash/
total 20
drwxr-xr-x 2 root root  4096 Jun 30 23:11 127.0.0.1-2016-06-30-23:09:50
drwx------ 2 root root 16384 Jun 29 05:19 lost+found
[root@hoge ~]# ls -l --full-time /var/crash/127.0.0.1-2016-06-30-23\:09\:50/vmcore*
-rw------- 1 root root 589946534 2016-06-30 23:11:50.981999996 +0900 /var/crash/127.0.0.1-2016-06-30-23:09:50/vmcore
-rw-r--r-- 1 root root     62643 2016-06-30 23:09:51.006000001 +0900 /var/crash/127.0.0.1-2016-06-30-23:09:50/vmcore-dmesg.txt
[root@hoge ~]# free -oom
             total       used       free     shared    buffers     cached
Mem:         15778       2300      13478          4          0        295
Swap:            0          0          0
[root@hoge ~]# df -h /var/crash
Filesystem      Size  Used Avail Use% Mounted on
/dev/zd0        7.8G  612M  6.8G   9% /var/crash
[root@hoge ~]# zfs get compressratio
NAME         PROPERTY       VALUE  SOURCE
rpool        compressratio  1.56x  -
rpool/ROOT   compressratio  1.65x  -
rpool/kdump  compressratio  1.00x  -

Added 2016-07-03
Although kdump capture now works, I found that after taking a dump and booting back up, the pool looks like this:
[root@hoge ~]# zpool status
  pool: rpool
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
 still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
 the pool may no longer be accessible by software that does not support
 the features. See zpool-features(5) for details.
  scan: scrub repaired 0 in 0h0m with 0 errors on Wed Jun 29 08:06:37 2016
config:

 NAME        STATE     READ WRITE CKSUM
 rpool       ONLINE       0     0     0
   sdb6      ONLINE       0     0     0

errors: No known data errors
Looking for a way to fix this, I found rdbreak=pre-mount in the dracut manual; booting with it stops the boot just before rpool/ROOT is mounted. At that break point, importing as follows brought things back to normal.
# zpool import -d /dev/disk/by-id/ rpool
[root@hoge ~]# zpool status
  pool: rpool
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
 still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
 the pool may no longer be accessible by software that does not support
 the features. See zpool-features(5) for details.
  scan: scrub repaired 0 in 0h0m with 0 errors on Wed Jun 29 08:06:37 2016
config:

 NAME                                             STATE     READ WRITE CKSUM
 rpool                                            ONLINE       0     0     0
   ata-Crucial_CT500MX200SSD3-xxxxxxxxxxxx-part6  ONLINE       0     0     0

errors: No known data errors

Added 2016-07-23
[root@hoge ~]# grep ZPOOL_IMPORT_OPTS= /usr/share/dracut/modules.d/90zfs/zfs-lib.sh 
ZPOOL_IMPORT_OPTS="-d /dev/disk/by-id"
Modifying 90zfs/zfs-lib.sh like this (adding the -d /dev/disk/by-id option) and rebuilding the initramfs avoided the problem of the device path turning into sdb6 after a kdump capture.
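The initramfs rebuild after that edit is the same dracut invocation as before, roughly:

dracut -f /boot/initramfs-$(uname -r).img $(uname -r)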

Added 2016-07-24 (converting to a mirror)
After about a month of use, during which a kernel update (2.6.32-642.1.1.el6 to 2.6.32-642.3.1.el6) also went through without problems, I finally converted the pool to a mirror.
[root@hoge ~]# zpool attach rpool ata-Crucial_CT500MX200SSD3_xxxxxxxxxxxx-part6 ata-Crucial_CT500MX200SSD3_yyyyyyyyyyyy-part6
[root@hoge ~]# zpool status
  pool: rpool
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
 still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
 the pool may no longer be accessible by software that does not support
 the features. See zpool-features(5) for details.
  scan: resilvered 9.64G in 0h11m with 0 errors on Sat Jul 23 18:49:15 2016
config:

 NAME                                               STATE     READ WRITE CKSUM
 rpool                                              ONLINE       0     0     0
   mirror-0                                         ONLINE       0     0     0
     ata-Crucial_CT500MX200SSD3_xxxxxxxxxxxx-part6  ONLINE       0     0     0
     ata-Crucial_CT500MX200SSD3_yyyyyyyyyyyy-part6  ONLINE       0     0     0

errors: No known data errors

Added 2016-09-19 (update to zfs v0.6.5.8)
The first ZFS update (kmod-zfs v0.6.5.7 to v0.6.5.8) on an environment using ZFS for /.
In theory it should just work, but taking advantage of ZFS, I created a snapshot before running the update.
[root@hoge ~]# zfs snapshot rpool/ROOT@2016-09-19-1809
[root@hoge ~]# zfs list -t snapshot
NAME                         USED  AVAIL  REFER  MOUNTPOINT
rpool/ROOT@2016-08-02-1052   105M      -  4.91G  -
rpool/ROOT@2016-09-02-0541  55.7M      -  4.90G  -
rpool/ROOT@2016-09-19-1809      0      -  4.89G  -
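Just in case, the fallback plan was simply to roll back to the snapshot just taken (from the ext4 CentOS 6 on sda6 or rescue media, if the ZFS root itself no longer booted):

zfs rollback rpool/ROOT@2016-09-19-1809

With that safety net in place, on to the update.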
[root@hoge ~]# yum update --enablerepo=zfs
Loaded plugins: fastestmirror, nvidia, priorities, refresh-packagekit, security
Setting up Update Process
Loading mirror speeds from cached hostfile
 * base: ftp.riken.jp
 * extras: ftp.riken.jp
 * updates: ftp.riken.jp
Resolving Dependencies
--> Running transaction check
---> Package kmod-spl.x86_64 0:0.6.5.7-1.el6 will be updated
---> Package kmod-spl.x86_64 0:0.6.5.8-1.el6 will be an update
---> Package kmod-zfs.x86_64 0:0.6.5.7-1.el6 will be updated
---> Package kmod-zfs.x86_64 0:0.6.5.8-1.el6 will be an update
---> Package libnvpair1.x86_64 0:0.6.5.7-1.el6 will be updated
---> Package libnvpair1.x86_64 0:0.6.5.8-1.el6 will be an update
---> Package libuutil1.x86_64 0:0.6.5.7-1.el6 will be updated
---> Package libuutil1.x86_64 0:0.6.5.8-1.el6 will be an update
---> Package libzfs2.x86_64 0:0.6.5.7-1.el6 will be updated
---> Package libzfs2.x86_64 0:0.6.5.8-1.el6 will be an update
---> Package libzpool2.x86_64 0:0.6.5.7-1.el6 will be updated
---> Package libzpool2.x86_64 0:0.6.5.8-1.el6 will be an update
---> Package spl.x86_64 0:0.6.5.7-1.el6 will be updated
---> Package spl.x86_64 0:0.6.5.8-1.el6 will be an update
---> Package zfs.x86_64 0:0.6.5.7-1.el6 will be updated
---> Package zfs.x86_64 0:0.6.5.8-1.el6 will be an update
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
 Package             Arch            Version                 Repository    Size
================================================================================
Updating:
 kmod-spl            x86_64          0.6.5.8-1.el6           zfs          111 k
 kmod-zfs            x86_64          0.6.5.8-1.el6           zfs          636 k
 libnvpair1          x86_64          0.6.5.8-1.el6           zfs           33 k
 libuutil1           x86_64          0.6.5.8-1.el6           zfs           38 k
 libzfs2             x86_64          0.6.5.8-1.el6           zfs          119 k
 libzpool2           x86_64          0.6.5.8-1.el6           zfs          408 k
 spl                 x86_64          0.6.5.8-1.el6           zfs           27 k
 zfs                 x86_64          0.6.5.8-1.el6           zfs          330 k

Transaction Summary
================================================================================
Upgrade       8 Package(s)

Total download size: 1.7 M
Is this ok [y/N]: y
Downloading Packages:
(1/8): kmod-spl-0.6.5.8-1.el6.x86_64.rpm                 | 111 kB     00:00     
(2/8): kmod-zfs-0.6.5.8-1.el6.x86_64.rpm                 | 636 kB     00:00     
(3/8): libnvpair1-0.6.5.8-1.el6.x86_64.rpm               |  33 kB     00:00     
(4/8): libuutil1-0.6.5.8-1.el6.x86_64.rpm                |  38 kB     00:00     
(5/8): libzfs2-0.6.5.8-1.el6.x86_64.rpm                  | 119 kB     00:00     
(6/8): libzpool2-0.6.5.8-1.el6.x86_64.rpm                | 408 kB     00:00     
(7/8): spl-0.6.5.8-1.el6.x86_64.rpm                      |  27 kB     00:00     
(8/8): zfs-0.6.5.8-1.el6.x86_64.rpm                      | 330 kB     00:00     
--------------------------------------------------------------------------------
Total                                           456 kB/s | 1.7 MB     00:03     
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
  Updating   : libuutil1-0.6.5.8-1.el6.x86_64                              1/16 
  Updating   : libnvpair1-0.6.5.8-1.el6.x86_64                             2/16 
  Updating   : libzpool2-0.6.5.8-1.el6.x86_64                              3/16 
  Updating   : kmod-spl-0.6.5.8-1.el6.x86_64                               4/16 
  Updating   : spl-0.6.5.8-1.el6.x86_64                                    5/16 
  Updating   : libzfs2-0.6.5.8-1.el6.x86_64                                6/16 
  Updating   : kmod-zfs-0.6.5.8-1.el6.x86_64                               7/16 
  Updating   : zfs-0.6.5.8-1.el6.x86_64                                    8/16 
  Cleanup    : kmod-zfs-0.6.5.7-1.el6.x86_64                               9/16 
  Cleanup    : zfs-0.6.5.7-1.el6.x86_64                                   10/16 
  Cleanup    : libzfs2-0.6.5.7-1.el6.x86_64                               11/16 
  Cleanup    : libzpool2-0.6.5.7-1.el6.x86_64                             12/16 
  Cleanup    : libnvpair1-0.6.5.7-1.el6.x86_64                            13/16 
  Cleanup    : spl-0.6.5.7-1.el6.x86_64                                   14/16 
  Cleanup    : kmod-spl-0.6.5.7-1.el6.x86_64                              15/16 
  Cleanup    : libuutil1-0.6.5.7-1.el6.x86_64                             16/16 
  Verifying  : libnvpair1-0.6.5.8-1.el6.x86_64                             1/16 
  Verifying  : libzfs2-0.6.5.8-1.el6.x86_64                                2/16 
  Verifying  : zfs-0.6.5.8-1.el6.x86_64                                    3/16 
  Verifying  : spl-0.6.5.8-1.el6.x86_64                                    4/16 
  Verifying  : kmod-zfs-0.6.5.8-1.el6.x86_64                               5/16 
  Verifying  : libuutil1-0.6.5.8-1.el6.x86_64                              6/16 
  Verifying  : libzpool2-0.6.5.8-1.el6.x86_64                              7/16 
  Verifying  : kmod-spl-0.6.5.8-1.el6.x86_64                               8/16 
  Verifying  : kmod-zfs-0.6.5.7-1.el6.x86_64                               9/16 
  Verifying  : spl-0.6.5.7-1.el6.x86_64                                   10/16 
  Verifying  : libzpool2-0.6.5.7-1.el6.x86_64                             11/16 
  Verifying  : zfs-0.6.5.7-1.el6.x86_64                                   12/16 
  Verifying  : libzfs2-0.6.5.7-1.el6.x86_64                               13/16 
  Verifying  : libnvpair1-0.6.5.7-1.el6.x86_64                            14/16 
  Verifying  : kmod-spl-0.6.5.7-1.el6.x86_64                              15/16 
  Verifying  : libuutil1-0.6.5.7-1.el6.x86_64                             16/16 

Updated:
  kmod-spl.x86_64 0:0.6.5.8-1.el6         kmod-zfs.x86_64 0:0.6.5.8-1.el6       
  libnvpair1.x86_64 0:0.6.5.8-1.el6       libuutil1.x86_64 0:0.6.5.8-1.el6      
  libzfs2.x86_64 0:0.6.5.8-1.el6          libzpool2.x86_64 0:0.6.5.8-1.el6      
  spl.x86_64 0:0.6.5.8-1.el6              zfs.x86_64 0:0.6.5.8-1.el6            

Complete!
[root@hoge ~]# 
Unlike a plain kernel update, the initramfs images of the existing kernels were regenerated, so it took a little longer.
[root@hoge ~]# ls -l /boot/initramfs-2.6.32-*
-rw------- 1 root root 21810840 Sep 19 18:11 /boot/initramfs-2.6.32-573.22.1.el6.x86_64.img
-rw------- 1 root root 22336843 Sep 19 18:12 /boot/initramfs-2.6.32-642.4.2.el6.x86_64.img
After that I rebooted, and the system came up with no problems.

Addendum 2017-04-06 (updated the kernel to CentOS 6.9 kernel-2.6.32-696.el6.x86_64 ... and it failed to boot)
CentOS 6.9 had come out, so I casually updated to the latest kernel, but the system would not boot.
Rebooting into the old kernel 642.15.1.el6 and investigating, I found that zfs.ko was missing from the new initramfs. As a result the root filesystem could not be mounted, which is why the boot failed.
When I forcibly inserted zfs.ko into the initramfs and tried booting again, the symbol posix_acl_equiv_mode could not be resolved and zfs.ko would not load. A Google search for "posix_acl_equiv_mode" turned up a report that had already been filed.
https://github.com/zfsonlinux/zfs/issues/5930
Apparently kABI compatibility was broken. Well, a fixed build will probably come out before long, and until then I can just stay on the 642 series. I doubt anyone is copying this setup from this blog, but this is here for reference and as a memo.
Addendum 2017-04-27
Issue #5930 has already been fixed. My thanks to the reporter and to the developers. I promptly updated kmod-zfs and was able to boot from ZFS with kernel-2.6.32-696.1.1.el6.x86_64.
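For reference, a minimal way to confirm that zfs.ko actually made it into an initramfs, and to regenerate the image after installing a matching kmod-zfs, might look like the following (the kernel version string is just the one from this post, used as an example):
# list the ZFS-related modules packed into the initramfs for the new kernel
lsinitrd /boot/initramfs-2.6.32-696.1.1.el6.x86_64.img | grep -E 'zfs|spl'
# if they are missing, rebuild the initramfs for that kernel version
dracut -f /boot/initramfs-2.6.32-696.1.1.el6.x86_64.img 2.6.32-696.1.1.el6.x86_64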

Addendum 2017-07-23: tried zfs-0.7.0-rc5, and it failed
According to the roadmap, 0.7.0 is almost there (99% complete), so I tried 0.7.0-rc5 in this environment, but the machine became unbootable.
[root@cent6 ~]# yum update --enablerepo=zfs-testing-kmod
Loaded plugins: fastestmirror, nvidia, priorities, refresh-packagekit, security
Setting up Update Process
Loading mirror speeds from cached hostfile
 * base: mirror.0x.sg
 * extras: mirror.nus.edu.sg
 * updates: centos.usonyx.net
zfs-testing-kmod                                         | 2.9 kB     00:00     
zfs-testing-kmod/primary_db                              | 168 kB     00:00     
Resolving Dependencies
--> Running transaction check
---> Package kmod-spl.x86_64 0:0.6.5.11-1.el6 will be updated
---> Package kmod-spl.x86_64 0:0.7.0-rc5.el6 will be an update
---> Package kmod-zfs.x86_64 0:0.6.5.11-1.el6 will be updated
---> Package kmod-zfs.x86_64 0:0.7.0-rc5.el6 will be an update
---> Package libnvpair1.x86_64 0:0.6.5.11-1.el6 will be updated
---> Package libnvpair1.x86_64 0:0.7.0-rc5.el6 will be an update
---> Package libuutil1.x86_64 0:0.6.5.11-1.el6 will be updated
---> Package libuutil1.x86_64 0:0.7.0-rc5.el6 will be an update
---> Package libzfs2.x86_64 0:0.6.5.11-1.el6 will be updated
---> Package libzfs2.x86_64 0:0.7.0-rc5.el6 will be an update
---> Package libzpool2.x86_64 0:0.6.5.11-1.el6 will be updated
---> Package libzpool2.x86_64 0:0.7.0-rc5.el6 will be an update
---> Package spl.x86_64 0:0.6.5.11-1.el6 will be updated
---> Package spl.x86_64 0:0.7.0-rc5.el6 will be an update
---> Package zfs.x86_64 0:0.6.5.11-1.el6 will be updated
---> Package zfs.x86_64 0:0.7.0-rc5.el6 will be an update
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
 Package          Arch         Version             Repository              Size
================================================================================
Updating:
 kmod-spl         x86_64       0.7.0-rc5.el6       zfs-testing-kmod       108 k
 kmod-zfs         x86_64       0.7.0-rc5.el6       zfs-testing-kmod       804 k
 libnvpair1       x86_64       0.7.0-rc5.el6       zfs-testing-kmod        26 k
 libuutil1        x86_64       0.7.0-rc5.el6       zfs-testing-kmod        32 k
 libzfs2          x86_64       0.7.0-rc5.el6       zfs-testing-kmod       126 k
 libzpool2        x86_64       0.7.0-rc5.el6       zfs-testing-kmod       544 k
 spl              x86_64       0.7.0-rc5.el6       zfs-testing-kmod        27 k
 zfs              x86_64       0.7.0-rc5.el6       zfs-testing-kmod       402 k

Transaction Summary
================================================================================
Upgrade       8 Package(s)

Total download size: 2.0 M
Is this ok [y/N]: y
After this, rpool/ROOT could not be mounted and the system would no longer boot.
Dropping into the dracut shell (the rdshell option) to find out why the mount failed, I saw that the zpool command was segfaulting. For the time being I gave up and did a zfs rollback.
[root@cent7 ~]# zfs list -t snapshot
NAME                         USED  AVAIL  REFER  MOUNTPOINT
rpool/ROOT@2017-04-06-1022   891M      -  4.82G  -
rpool/ROOT@2017-06-22-0013   445M      -  4.90G  -
rpool/ROOT@2017-07-23-1841  69.2M      -  4.97G  -
rpool/ROOT@2017-07-23-2123  22.5M      -  5.10G  -
[root@cent7 ~]# zfs rollback rpool/ROOT@2017-07-23-2123
The rollback was done from the dual-boot CentOS 7 side. I got to enjoy one of the benefits of ZFS.
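The import steps were not captured above; a rollback from a second OS or rescue environment can be sketched roughly like this (the altroot /mnt is an example, and the dataset and snapshot names are the ones shown in this post):
# import the pool under an alternate root without mounting its datasets
zpool import -f -N -R /mnt rpool
# roll the root dataset back to the last known-good snapshot
zfs rollback rpool/ROOT@2017-07-23-2123
# release the pool again before rebooting into the original environment
zpool export rpool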

Addendum 2017-07-27: the update to zfs-0.7.0-1 went through without a hitch
0.7.0-1 has been released, so I tried updating again, and this time it succeeded easily.
[root@hoge ~]# rpm -qi kmod-zfs
Name        : kmod-zfs                     Relocations: (not relocatable)
Version     : 0.7.0                             Vendor: (none)
Release     : 1.el6                         Build Date: Thu 27 Jul 2017 08:02:31 AM JST
Install Date: Thu 27 Jul 2017 11:26:42 PM JST      Build Host: centos-6-repo
Group       : System Environment/Kernel     Source RPM: zfs-kmod-0.7.0-1.el6.src.rpm
Size        : 3833208                          License: CDDL
Signature   : RSA/SHA1, Thu 27 Jul 2017 08:04:56 AM JST, Key ID a9d5a1c0f14ab620
URL         : http://zfsonlinux.org/
Summary     : zfs kernel module(s)
Description :
This package provides the zfs kernel modules built for
the Linux kernel 2.6.32-696.6.3.el6.x86_64 for the x86_64
family of processors.
[root@hoge ~]# df -T
Filesystem     Type  1K-blocks      Used Available Use% Mounted on
rpool/ROOT     zfs    13072896   5303040   7769856  41% /
tmpfs          tmpfs   8078524     43608   8034916   1% /dev/shm
/dev/zd0       ext4    8125880     50432   7656020   1% /KDUMP
rpool          zfs     7774464      4608   7769856   1% /rpool
Even with the same workload, memory usage appears to be lower than with 0.6, judging from the munin data.
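The munin graphs are not reproduced here, but on ZFS on Linux the ARC size can also be read directly from /proc/spl/kstat/zfs/arcstats; a quick check might look like this (a sketch, not part of the original measurement):
# print the current ARC size and its configured maximum, in MiB
awk '$1 == "size" || $1 == "c_max" { printf "%-6s %10.1f MiB\n", $1, $3 / 1048576 }' /proc/spl/kstat/zfs/arcstats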

Addendum 2017-12-17: update to zfs-0.7.4-1
I have been running this setup for about a year and a half, updating as new releases come out, with no particular problems.
[root@hoge ~]# rpm -qi kmod-zfs
Name        : kmod-zfs                     Relocations: (not relocatable)
Version     : 0.7.4                             Vendor: (none)
Release     : 1.el6                         Build Date: Fri 08 Dec 2017 06:41:07 AM JST
Install Date: Sun 17 Dec 2017 07:51:05 PM JST      Build Host: centos-6-repo
Group       : System Environment/Kernel     Source RPM: zfs-kmod-0.7.4-1.el6.src.rpm
Size        : 3841048                          License: CDDL
Signature   : RSA/SHA1, Fri 08 Dec 2017 06:41:46 AM JST, Key ID a9d5a1c0f14ab620
URL         : http://zfsonlinux.org/
Summary     : zfs kernel module(s)
Description :
This package provides the zfs kernel modules built for
the Linux kernel 2.6.32-696.16.1.el6.x86_64 for the x86_64
family of processors.
[root@hoge ~]# df -T
Filesystem     Type  1K-blocks      Used Available Use% Mounted on
rpool/ROOT     zfs    12766336   4995456   7770880  40% /
tmpfs          tmpfs   8078524     43368   8035156   1% /dev/shm
/dev/zd0       ext4    8125880     50432   7656020   1% /KDUMP
rpool          zfs     7778432      7552   7770880   1% /rpool

Sunday, June 26, 2016

Initial check of a used HDD: the third one (June 2016)

Another initial check of a used HDD. Following the previous ones, I picked up a third drive from a certain shop, so here is another memo.

The used HDD this time is, like the last one, a Seagate Barracuda ES.2 1TB. The previous one was sold as having S.M.A.R.T. errors and was quite cheap; this unit cost about 600 yen more than that.

First, the S.M.A.R.T. check.
[root@hoge ~]# smartctl -A /dev/sdf
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-642.1.1.el6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   082   063   044    Pre-fail  Always       -       187036203
  3 Spin_Up_Time            0x0003   099   092   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   099   099   020    Old_age   Always       -       1247
  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
  7 Seek_Error_Rate         0x000f   061   053   030    Pre-fail  Always       -       343714111331
  9 Power_On_Hours          0x0032   076   076   000    Old_age   Always       -       21800
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   037   020    Old_age   Always       -       125
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   096   096   000    Old_age   Always       -       4
188 Command_Timeout         0x0032   100   092   000    Old_age   Always       -       472453611632
189 High_Fly_Writes         0x003a   087   087   000    Old_age   Always       -       13
190 Airflow_Temperature_Cel 0x0022   062   046   045    Old_age   Always       -       38 (Min/Max 34/39)
194 Temperature_Celsius     0x0022   038   054   000    Old_age   Always       -       38 (0 15 0 0 0)
195 Hardware_ECC_Recovered  0x001a   043   026   000    Old_age   Always       -       187036203
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   170   000    Old_age   Always       -       119834
188 Command_Timeout is an absurdly large value, but 5 Reallocated_Sector_Ct is about half of what the first and second drives showed. Going by my experience so far, this drive feels perfectly usable.
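When comparing several drives, it can be handy to pull out only the attributes of interest; for example, something like this one-liner (added here for illustration, not part of the original check):
# show ID, name and raw value for the attributes I usually watch
smartctl -A /dev/sdf | awk '$1 ~ /^(5|9|187|197|198)$/ { print $1, $2, $10 }'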
As for the operating time (9 Power_On_Hours):
[root@hoge ~]# echo "21800 / 24 / 365" | bc -l
2.48858447488584474885
That is roughly two and a half years. Incidentally, the label on the HDD says Date Code: 08385, which points to a March 2008 manufacture date.

Next, the output of hdparm -i.
[root@hoge ~]# hdparm -i /dev/sdf

/dev/sdf:

 Model=ST31000340NS, FwRev=SN06, SerialNo=5xxxxxxC
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=unknown, BuffSize=unknown, MaxMultSect=16, MultSect=16
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=1953525168
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4 
 DMA modes:  mdma0 mdma1 mdma2 
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6 
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: unknown:  ATA/ATAPI-4,5,6,7

 * signifies the current active mode

The firmware version was NA02 on the first drive and FSC9 on the second; this third one is SN06.

Next, the output of smartctl -a.
[root@hoge ~]# smartctl -a /dev/sdf
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-642.1.1.el6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda ES.2
Device Model:     ST31000340NS
Serial Number:    5xxxxxxC
LU WWN Device Id: 5 000c50 0yyyyyyy4
Firmware Version: SN06
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Thu Jun 23 17:54:27 2016 JST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
     was completed without error.
     Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
     without error or no self-test has ever 
     been run.
Total time to complete Offline 
data collection:   (  642) seconds.
Offline data collection
capabilities:     (0x7b) SMART execute Offline immediate.
     Auto Offline data collection on/off support.
     Suspend Offline collection upon new
     command.
     Offline surface scan supported.
     Self-test supported.
     Conveyance Self-test supported.
     Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
     power-saving mode.
     Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
     General Purpose Logging supported.
Short self-test routine 
recommended polling time:   (   1) minutes.
Extended self-test routine
recommended polling time:   ( 222) minutes.
Conveyance self-test routine
recommended polling time:   (   2) minutes.
SCT capabilities:         (0x103d) SCT Status supported.
     SCT Error Recovery Control supported.
     SCT Feature Control supported.
     SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   082   063   044    Pre-fail  Always       -       187036203
  3 Spin_Up_Time            0x0003   099   092   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   099   099   020    Old_age   Always       -       1247
  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
  7 Seek_Error_Rate         0x000f   061   053   030    Pre-fail  Always       -       343714111331
  9 Power_On_Hours          0x0032   076   076   000    Old_age   Always       -       21800
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   037   020    Old_age   Always       -       125
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   096   096   000    Old_age   Always       -       4
188 Command_Timeout         0x0032   100   092   000    Old_age   Always       -       472453611632
189 High_Fly_Writes         0x003a   087   087   000    Old_age   Always       -       13
190 Airflow_Temperature_Cel 0x0022   062   046   045    Old_age   Always       -       38 (Min/Max 34/39)
194 Temperature_Celsius     0x0022   038   054   000    Old_age   Always       -       38 (0 15 0 0 0)
195 Hardware_ECC_Recovered  0x001a   043   026   000    Old_age   Always       -       187036203
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   170   000    Old_age   Always       -       119834

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     21566         -
# 2  Short offline       Completed without error       00%         7         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
There are no entries in the error log (No Errors Logged), and running the extended offline self-test (smartctl -t long) did not detect any errors either.
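For reference, the self-tests can be kicked off and their results read back roughly as follows (a sketch; the device name is the one used in this post):
# start a short or an extended (long) offline self-test
smartctl -t short /dev/sdf
smartctl -t long /dev/sdf
# after the recommended polling time has passed, read the self-test log
smartctl -l selftest /dev/sdf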

Next, the output of hdparm -I.
[root@hoge ~]# hdparm -I /dev/sdf

/dev/sdf:

ATA device, with non-removable media
 Model Number:       ST31000340NS                            
 Serial Number:      5xxxxxxC
 Firmware Revision:  SN06    
 Transport:          Serial
Standards:
 Used: unknown (minor revision code 0x0029) 
 Supported: 8 7 6 5 
 Likely used: 8
Configuration:
 Logical  max current
 cylinders 16383 16383
 heads  16 16
 sectors/track 63 63
 --
 CHS current addressable sectors:   16514064
 LBA    user addressable sectors:  268435455
 LBA48  user addressable sectors: 1953525168
 Logical/Physical Sector size:           512 bytes
 device size with M = 1024*1024:      953869 MBytes
 device size with M = 1000*1000:     1000204 MBytes (1000 GB)
 cache/buffer size  = unknown
 Nominal Media Rotation Rate: 7200
Capabilities:
 LBA, IORDY(can be disabled)
 Queue depth: 32
 Standby timer values: spec'd by Standard, no device specific minimum
 R/W multiple sector transfer: Max = 16 Current = 16
 Recommended acoustic management value: 254, current value: 0
 DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6 
      Cycle time: min=120ns recommended=120ns
 PIO: pio0 pio1 pio2 pio3 pio4 
      Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
 Enabled Supported:
    * SMART feature set
      Security Mode feature set
    * Power Management feature set
    * Write cache
    * Look-ahead
    * Host Protected Area feature set
    * WRITE_BUFFER command
    * READ_BUFFER command
    * DOWNLOAD_MICROCODE
    * SET_MAX security extension
    * 48-bit Address feature set
    * Device Configuration Overlay feature set
    * Mandatory FLUSH_CACHE
    * FLUSH_CACHE_EXT
    * SMART error logging
    * SMART self-test
    * General Purpose Logging feature set
    * 64-bit World wide name
      Write-Read-Verify feature set
    * WRITE_UNCORRECTABLE_EXT command
    * {READ,WRITE}_DMA_EXT_GPL commands
    * Segmented DOWNLOAD_MICROCODE
    * Gen1 signaling speed (1.5Gb/s)
    * Gen2 signaling speed (3.0Gb/s)
    * Native Command Queueing (NCQ)
    * Phy event counters
    * Software settings preservation
    * SMART Command Transport (SCT) feature set
    * SCT Write Same (AC2)
    * SCT Error Recovery Control (AC3)
    * SCT Features Control (AC4)
    * SCT Data Tables (AC5)
      unknown 206[12] (vendor specific)
Security: 
 Master password revision code = 65534
  supported
 not enabled
 not locked
 not frozen
 not expired: security count
  supported: enhanced erase
 190min for SECURITY ERASE UNIT. 190min for ENHANCED SECURITY ERASE UNIT. 
Logical Unit WWN Device Identifier: 5000c500yyyyyyy4
 NAA  : 5
 IEEE OUI : 000c50
 Unique ID : 0yyyyyyy4
Checksum: correct

After this I put the drive into a 4-disk RAIDZ; the resilver and a scrub produced no errors, so it seems perfectly usable.
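The replace and scrub steps themselves are not shown above; as a rough sketch (the pool name and the by-id device names below are placeholders, not the real ones):
# swap the old disk for the newly checked one; ZFS resilvers automatically
zpool replace tank ata-OLD_DISK_SERIAL ata-NEW_DISK_SERIAL
# once the resilver has finished, verify every block with a scrub
zpool scrub tank
# watch the progress and the error counters
zpool status -v tank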


Addendum 2018-10-23
The drive finally seems to have reached its limit, so I bought a sixth used HDD and did a zpool replace. The OS can still see the old drive, so before disposing of it I collected its smartctl data.
smartctl 5.43 2016-09-28 r4347 [x86_64-linux-2.6.32-754.el6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda ES.2
Device Model:     ST31000340NS
Serial Number:    5xxxxxxC
LU WWN Device Id: 5 000c50 0yyyyyyy4
Firmware Version: SN06
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Mon Oct 22 13:05:36 2018 JST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
     was completed without error.
     Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
     without error or no self-test has ever 
     been run.
Total time to complete Offline 
data collection:   (  642) seconds.
Offline data collection
capabilities:     (0x7b) SMART execute Offline immediate.
     Auto Offline data collection on/off support.
     Suspend Offline collection upon new
     command.
     Offline surface scan supported.
     Self-test supported.
     Conveyance Self-test supported.
     Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
     power-saving mode.
     Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
     General Purpose Logging supported.
Short self-test routine 
recommended polling time:   (   1) minutes.
Extended self-test routine
recommended polling time:   ( 222) minutes.
Conveyance self-test routine
recommended polling time:   (   2) minutes.
SCT capabilities:         (0x103d) SCT Status supported.
     SCT Error Recovery Control supported.
     SCT Feature Control supported.
     SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   048   048   044    Pre-fail  Always       -       31115640
  3 Spin_Up_Time            0x0003   099   092   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   099   099   020    Old_age   Always       -       1270
  5 Reallocated_Sector_Ct   0x0033   001   001   036    Pre-fail  Always   FAILING_NOW 2044
  7 Seek_Error_Rate         0x000f   063   053   030    Pre-fail  Always       -       348075415785
  9 Power_On_Hours          0x0032   052   052   000    Old_age   Always       -       42139
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   037   020    Old_age   Always       -       148
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   038   038   000    Old_age   Always       -       62
188 Command_Timeout         0x0032   100   092   000    Old_age   Always       -       472453611636
189 High_Fly_Writes         0x003a   083   083   000    Old_age   Always       -       17
190 Airflow_Temperature_Cel 0x0022   068   046   045    Old_age   Always       -       32 (Min/Max 32/33)
194 Temperature_Celsius     0x0022   032   054   000    Old_age   Always       -       32 (0 15 0 0 0)
195 Hardware_ECC_Recovered  0x001a   016   016   000    Old_age   Always       -       31115640
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       3
199 UDMA_CRC_Error_Count    0x003e   200   170   000    Old_age   Always       -       119834

SMART Error Log Version: 1
ATA Error Count: 4
 CR = Command Register [HEX]
 FR = Features Register [HEX]
 SC = Sector Count Register [HEX]
 SN = Sector Number Register [HEX]
 CL = Cylinder Low Register [HEX]
 CH = Cylinder High Register [HEX]
 DH = Device/Head Register [HEX]
 DC = Device Command Register [HEX]
 ER = Error register [HEX]
 ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 4 occurred at disk power-on lifetime: 38827 hours (1617 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 ff ff ff 4f 00   5d+00:53:17.265  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00   5d+00:53:17.263  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00   5d+00:53:17.261  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00   5d+00:53:17.257  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00   5d+00:53:17.253  READ FPDMA QUEUED

Error 3 occurred at disk power-on lifetime: 38811 hours (1617 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 88 ff ff ff 4f 00   4d+09:28:47.666  READ FPDMA QUEUED
  60 00 e0 ff ff ff 4f 00   4d+09:28:47.663  READ FPDMA QUEUED
  60 00 18 ff ff ff 4f 00   4d+09:28:47.661  READ FPDMA QUEUED
  60 00 28 ff ff ff 4f 00   4d+09:28:47.388  READ FPDMA QUEUED
  60 00 28 ff ff ff 4f 00   4d+09:28:47.387  READ FPDMA QUEUED

Error 2 occurred at disk power-on lifetime: 38811 hours (1617 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 b0 ff ff ff 4f 00   4d+09:19:32.814  READ FPDMA QUEUED
  60 00 b0 ff ff ff 4f 00   4d+09:19:31.195  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00   4d+09:19:30.826  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00   4d+09:19:30.425  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00   4d+09:19:28.712  READ FPDMA QUEUED

Error 1 occurred at disk power-on lifetime: 36767 hours (1531 days + 23 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00   9d+07:30:59.896  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00   9d+07:30:59.795  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00   9d+07:30:59.794  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00   9d+07:30:59.785  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00   9d+07:30:59.776  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: unknown failure    90%     41712         0
# 2  Extended offline    Completed: unknown failure    90%     41637         0
# 3  Extended offline    Completed: unknown failure    90%     41637         0
# 4  Short offline       Completed without error       00%     39497         -
# 5  Extended offline    Completed without error       00%     38853         -
# 6  Short offline       Completed without error       00%     38849         -
# 7  Short offline       Completed without error       00%     37103         -
# 8  Extended offline    Completed without error       00%     21566         -
# 9  Short offline       Completed without error       00%         7         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

As shown above, Reallocated_Sector_Ct fell below its threshold (THRESH) and the drive is now in the FAILING_NOW state. Running smartctl -t short or long also ends in an unknown failure. I think this is valuable data.
Since I had been collecting smartctl -A data once a week via cron, here is the progression of Reallocated_Sector_Ct as well.

[root@hoge ~]# grep Reallocated_Sector_Ct ata-ST31000340NS_5xxxxxxC-201*
ata-ST31000340NS_5xxxxxxC-2016-06-22-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2016-06-29-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2016-07-13-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2016-07-20-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2016-07-27-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2016-08-03-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2016-08-10-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2016-08-17-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2016-08-24-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2016-08-31-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2016-09-07-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2016-09-14-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2016-09-21-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2016-09-28-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2016-10-05-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2016-10-12-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2016-10-19-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2016-10-26-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2016-11-02-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2016-11-09-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2016-11-16-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2016-11-23-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2016-11-30-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2016-12-07-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2016-12-14-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2016-12-21-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2016-12-28-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2017-01-04-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2017-01-11-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2017-01-18-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2017-01-25-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2017-02-01-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2017-02-08-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2017-02-15-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2017-02-22-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2017-03-01-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2017-03-08-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2017-03-15-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2017-03-22-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2017-03-29-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2017-04-05-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2017-04-12-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2017-04-19-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2017-04-26-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2017-05-03-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2017-05-10-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2017-05-17-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2017-05-24-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2017-05-31-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2017-06-07-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2017-06-14-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2017-06-21-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2017-06-28-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2017-07-05-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2017-07-12-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2017-07-19-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2017-07-26-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2017-08-02-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2017-08-09-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2017-08-16-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2017-08-23-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2017-08-30-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2017-09-06-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2017-09-13-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2017-09-20-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2017-09-27-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2017-10-04-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2017-10-11-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       150
ata-ST31000340NS_5xxxxxxC-2017-10-18-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       151
ata-ST31000340NS_5xxxxxxC-2017-10-25-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       152
ata-ST31000340NS_5xxxxxxC-2017-11-01-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       152
ata-ST31000340NS_5xxxxxxC-2017-11-08-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       152
ata-ST31000340NS_5xxxxxxC-2017-11-15-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       152
ata-ST31000340NS_5xxxxxxC-2017-11-22-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       152
ata-ST31000340NS_5xxxxxxC-2017-11-29-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       152
ata-ST31000340NS_5xxxxxxC-2017-12-06-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       153
ata-ST31000340NS_5xxxxxxC-2017-12-13-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       153
ata-ST31000340NS_5xxxxxxC-2017-12-20-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       153
ata-ST31000340NS_5xxxxxxC-2017-12-27-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       153
ata-ST31000340NS_5xxxxxxC-2018-01-03-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       153
ata-ST31000340NS_5xxxxxxC-2018-01-10-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       153
ata-ST31000340NS_5xxxxxxC-2018-01-17-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       153
ata-ST31000340NS_5xxxxxxC-2018-01-24-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       153
ata-ST31000340NS_5xxxxxxC-2018-01-31-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       153
ata-ST31000340NS_5xxxxxxC-2018-02-07-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       153
ata-ST31000340NS_5xxxxxxC-2018-02-14-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       153
ata-ST31000340NS_5xxxxxxC-2018-02-21-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       153
ata-ST31000340NS_5xxxxxxC-2018-02-28-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       153
ata-ST31000340NS_5xxxxxxC-2018-03-07-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       153
ata-ST31000340NS_5xxxxxxC-2018-03-14-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       153
ata-ST31000340NS_5xxxxxxC-2018-03-21-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       153
ata-ST31000340NS_5xxxxxxC-2018-03-28-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       153
ata-ST31000340NS_5xxxxxxC-2018-04-04-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       153
ata-ST31000340NS_5xxxxxxC-2018-04-11-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       153
ata-ST31000340NS_5xxxxxxC-2018-04-18-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       153
ata-ST31000340NS_5xxxxxxC-2018-04-25-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       153
ata-ST31000340NS_5xxxxxxC-2018-05-02-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       153
ata-ST31000340NS_5xxxxxxC-2018-05-09-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       154
ata-ST31000340NS_5xxxxxxC-2018-05-16-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       154
ata-ST31000340NS_5xxxxxxC-2018-05-23-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       154
ata-ST31000340NS_5xxxxxxC-2018-05-30-1206.log:  5 Reallocated_Sector_Ct   0x0033   093   093   036    Pre-fail  Always       -       154
ata-ST31000340NS_5xxxxxxC-2018-06-06-1206.log:  5 Reallocated_Sector_Ct   0x0033   089   089   036    Pre-fail  Always       -       228
ata-ST31000340NS_5xxxxxxC-2018-06-13-1206.log:  5 Reallocated_Sector_Ct   0x0033   089   089   036    Pre-fail  Always       -       245
ata-ST31000340NS_5xxxxxxC-2018-06-20-1206.log:  5 Reallocated_Sector_Ct   0x0033   089   089   036    Pre-fail  Always       -       245
ata-ST31000340NS_5xxxxxxC-2018-06-27-1206.log:  5 Reallocated_Sector_Ct   0x0033   089   089   036    Pre-fail  Always       -       245
ata-ST31000340NS_5xxxxxxC-2018-07-11-1206.log:  5 Reallocated_Sector_Ct   0x0033   084   084   036    Pre-fail  Always       -       346
ata-ST31000340NS_5xxxxxxC-2018-07-18-1206.log:  5 Reallocated_Sector_Ct   0x0033   073   073   036    Pre-fail  Always       -       568
ata-ST31000340NS_5xxxxxxC-2018-07-25-1206.log:  5 Reallocated_Sector_Ct   0x0033   062   062   036    Pre-fail  Always       -       786
ata-ST31000340NS_5xxxxxxC-2018-08-01-1206.log:  5 Reallocated_Sector_Ct   0x0033   045   045   036    Pre-fail  Always       -       1128
ata-ST31000340NS_5xxxxxxC-2018-08-08-1206.log:  5 Reallocated_Sector_Ct   0x0033   007   007   036    Pre-fail  Always   FAILING_NOW 1923
ata-ST31000340NS_5xxxxxxC-2018-08-15-1206.log:  5 Reallocated_Sector_Ct   0x0033   001   001   036    Pre-fail  Always   FAILING_NOW 2044
ata-ST31000340NS_5xxxxxxC-2018-08-22-1206.log:  5 Reallocated_Sector_Ct   0x0033   001   001   036    Pre-fail  Always   FAILING_NOW 2044
ata-ST31000340NS_5xxxxxxC-2018-08-29-1206.log:  5 Reallocated_Sector_Ct   0x0033   001   001   036    Pre-fail  Always   FAILING_NOW 2044
ata-ST31000340NS_5xxxxxxC-2018-09-12-1206.log:  5 Reallocated_Sector_Ct   0x0033   001   001   036    Pre-fail  Always   FAILING_NOW 2044
ata-ST31000340NS_5xxxxxxC-2018-09-26-1206.log:  5 Reallocated_Sector_Ct   0x0033   001   001   036    Pre-fail  Always   FAILING_NOW 2044
ata-ST31000340NS_5xxxxxxC-2018-10-03-1206.log:  5 Reallocated_Sector_Ct   0x0033   001   001   036    Pre-fail  Always   FAILING_NOW 2044
ata-ST31000340NS_5xxxxxxC-2018-10-10-1206.log:  5 Reallocated_Sector_Ct   0x0033   001   001   036    Pre-fail  Always   FAILING_NOW 2044
ata-ST31000340NS_5xxxxxxC-2018-10-17-1206.log:  5 Reallocated_Sector_Ct   0x0033   001   001   036    Pre-fail  Always   FAILING_NOW 2044
As you can see, the value rose sharply at the end. I think this too is valuable data. If you track Reallocated_Sector_Ct periodically, it looks like a useful hint for deciding whether to do a preventive replacement or take an emergency backup.
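The weekly collection mentioned above can be done with a very small cron job. A sketch along the lines of the log file names shown here (the output directory and the exact script are assumptions, not the script actually used):
#!/bin/sh
# dump smartctl -A for every ATA disk into a per-drive, per-date log file
OUTDIR=/var/log/smart
mkdir -p "$OUTDIR"
for dev in /dev/disk/by-id/ata-*; do
    case "$dev" in *-part*) continue ;; esac   # skip partition symlinks
    smartctl -A "$dev" > "$OUTDIR/$(basename "$dev")-$(date +%Y-%m-%d-%H%M).log"
done
With logs accumulated like this, the grep shown above gives the trend at a glance.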