Monday, October 12, 2015

Initial check of a used HDD, unit #2 (October 2015)

Initial check of a used HDD. Following on from the previous post, I picked up a second unit from a certain shop, so here are my notes once again.

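The used HDD I got this time is a Seagate Barracuda ES.2 1TB, the same model as unit #1. It was listed as having S.M.A.R.T. errors, so it cost about 1,000 yen less than unit #1. Unit #1 showed no errors right after purchase, but errors piled up as soon as I started using it; since I dealt with the bad sectors, it has run stably for about three months. Based on that experience, I figured that even a drive with errors (bad sectors) from the start should be reasonably usable after the same treatment, and so I bought on price.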

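So it was something of a gamble, but the drive was at least detected (I suppose they would not sell a unit that is not even detected), so the first step is a S.M.A.R.T. check.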
[root@hoge ~]# smartctl -A /dev/sdf
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.10.0-229.14.1.el7.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   065   039   044    Pre-fail  Always   In_the_past 3585868
  3 Spin_Up_Time            0x0003   099   099   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       35
  5 Reallocated_Sector_Ct   0x0033   082   082   036    Pre-fail  Always       -       379
  7 Seek_Error_Rate         0x000f   070   057   030    Pre-fail  Always       -       434963003265
  9 Power_On_Hours          0x0032   070   070   000    Old_age   Always       -       26851
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   037   020    Old_age   Always       -       35
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       574
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       8590065669
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   074   054   045    Old_age   Always       -       26 (Min/Max 26/26)
194 Temperature_Celsius     0x0022   026   046   000    Old_age   Always       -       26 (0 15 0 0 0)
195 Hardware_ECC_Recovered  0x001a   024   024   000    Old_age   Always       -       3585868
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
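As the shop described, the drive is fairly well worn. Compared with unit #1, the strikingly large raw value of 188 Command_Timeout is what worries me most.
The operating time (9 Power_On_Hours) works out to: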
[root@hoge ~]# echo "26851 / 24 / 365" | bc -l
3.06518264840182648401
That is almost exactly three years.
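A note on reading the very large raw values above. On Seagate drives, some raw values are commonly interpreted as several packed fields rather than one counter. This is a community convention, not something the shop or this firmware confirms, so treat it as an assumption; smartctl's `-v` option can display a raw value as three 16-bit words (`raw16`) for the same reason. A minimal sketch of that split, with hypothetical helper names `split_raw16` and `split_err_total`:

```shell
# Hypothetical helpers. The field layout is an ASSUMPTION (a commonly cited
# reading of Seagate raw values), not something confirmed for this drive.

# Split a 48-bit raw value into three 16-bit words: high, middle, low.
split_raw16() {
    echo "$(( ($1 >> 32) & 0xFFFF )) $(( ($1 >> 16) & 0xFFFF )) $(( $1 & 0xFFFF ))"
}

# Split a 48-bit raw value into an upper 16-bit count and a lower 32-bit total.
split_err_total() {
    echo "$(( ($1 >> 32) & 0xFFFF )) $(( $1 & 0xFFFFFFFF ))"
}

split_raw16 8590065669         # 188 Command_Timeout -> 2 2 5
split_err_total 434963003265   # 7 Seek_Error_Rate   -> 101 1171306369
split_err_total 3585868        # 1 Raw_Read_Error_Rate -> 0 3585868
```

Read this way, 188 Command_Timeout would hold the small counts 2, 2 and 5 rather than 8.5 billion timeouts, and 7 Seek_Error_Rate would mean 101 errors over roughly 1.17 billion seeks. Whether that reading applies to firmware FSC9 is not verified here.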

Next, the output of hdparm -i.
[root@hoge ~]# hdparm -i /dev/sdf

/dev/sdf:

 Model=ST31000340NS, FwRev=FSC9, SerialNo=9xxxxxx6
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=unknown, BuffSize=unknown, MaxMultSect=16, MultSect=off
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=1953525168
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4 
 DMA modes:  mdma0 mdma1 mdma2 
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6 
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: unknown:  ATA/ATAPI-4,5,6,7

 * signifies the current active mode

On unit #1 the write cache was disabled, but on unit #2 it showed WriteCache=enabled without my touching anything.
The firmware version also differs: unit #1 has NA02, while this one has FSC9.

Next, the output of smartctl -a.
[root@hoge ~]# smartctl -a /dev/sdf
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.10.0-229.14.1.el7.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda ES.2
Device Model:     ST31000340NS
Serial Number:    9xxxxxx6
LU WWN Device Id: 5 000c50 0yyyyyyy5
Firmware Version: FSC9
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Fri Oct  9 12:47:55 2015 JST

==> WARNING: There are known problems with these drives,
see the following Seagate web pages:
http://knowledge.seagate.com/articles/en_US/FAQ/207931en
http://knowledge.seagate.com/articles/en_US/FAQ/207963en

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
     was completed without error.
     Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
     without error or no self-test has ever 
     been run.
Total time to complete Offline 
data collection:   (  642) seconds.
Offline data collection
capabilities:     (0x7b) SMART execute Offline immediate.
     Auto Offline data collection on/off support.
     Suspend Offline collection upon new
     command.
     Offline surface scan supported.
     Self-test supported.
     Conveyance Self-test supported.
     Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
     power-saving mode.
     Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
     General Purpose Logging supported.
Short self-test routine 
recommended polling time:   (   1) minutes.
Extended self-test routine
recommended polling time:   ( 225) minutes.
Conveyance self-test routine
recommended polling time:   (   2) minutes.
SCT capabilities:         (0x103d) SCT Status supported.
     SCT Error Recovery Control supported.
     SCT Feature Control supported.
     SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   065   039   044    Pre-fail  Always   In_the_past 3585868
  3 Spin_Up_Time            0x0003   099   099   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       35
  5 Reallocated_Sector_Ct   0x0033   082   082   036    Pre-fail  Always       -       379
  7 Seek_Error_Rate         0x000f   070   057   030    Pre-fail  Always       -       434963004045
  9 Power_On_Hours          0x0032   070   070   000    Old_age   Always       -       26852
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   037   020    Old_age   Always       -       35
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       574
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       8590065669
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   068   054   045    Old_age   Always       -       32 (Min/Max 26/32)
194 Temperature_Celsius     0x0022   032   046   000    Old_age   Always       -       32 (0 15 0 0 0)
195 Hardware_ECC_Recovered  0x001a   024   024   000    Old_age   Always       -       3585868
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 15 (device log contains only the most recent five errors)
 CR = Command Register [HEX]
 FR = Features Register [HEX]
 SC = Sector Count Register [HEX]
 SN = Sector Number Register [HEX]
 CL = Cylinder Low Register [HEX]
 CH = Cylinder High Register [HEX]
 DH = Device/Head Register [HEX]
 DC = Device Command Register [HEX]
 ER = Error register [HEX]
 ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 15 occurred at disk power-on lifetime: 26830 hours (1117 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 0b 8d 78 00  Error: UNC at LBA = 0x00788d0b = 7900427

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  42 00 01 0b 8d 78 e0 00      00:08:23.338  READ VERIFY SECTOR(S) EXT
  25 00 01 00 00 00 e0 00      00:08:23.338  READ DMA EXT
  42 00 02 0d 8d 78 e0 00      00:08:23.241  READ VERIFY SECTOR(S) EXT
  42 00 02 0b 8d 78 e0 00      00:08:20.264  READ VERIFY SECTOR(S) EXT
  25 00 01 00 00 00 e0 00      00:08:20.192  READ DMA EXT

Error 14 occurred at disk power-on lifetime: 26830 hours (1117 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 0b 8d 78 00  Error: UNC at LBA = 0x00788d0b = 7900427

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  42 00 02 0b 8d 78 e0 00      00:08:20.264  READ VERIFY SECTOR(S) EXT
  25 00 01 00 00 00 e0 00      00:08:20.192  READ DMA EXT
  42 00 04 0b 8d 78 e0 00      00:08:17.248  READ VERIFY SECTOR(S) EXT
  25 00 01 00 00 00 e0 00      00:08:17.192  READ DMA EXT
  42 00 04 07 8d 78 e0 00      00:08:17.102  READ VERIFY SECTOR(S) EXT

Error 13 occurred at disk power-on lifetime: 26830 hours (1117 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 0b 8d 78 00  Error: UNC at LBA = 0x00788d0b = 7900427

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  42 00 04 0b 8d 78 e0 00      00:08:17.248  READ VERIFY SECTOR(S) EXT
  25 00 01 00 00 00 e0 00      00:08:17.192  READ DMA EXT
  42 00 04 07 8d 78 e0 00      00:08:17.102  READ VERIFY SECTOR(S) EXT
  42 00 08 07 8d 78 e0 00      00:08:14.161  READ VERIFY SECTOR(S) EXT
  42 00 08 ff 8c 78 e0 00      00:08:14.102  READ VERIFY SECTOR(S) EXT

Error 12 occurred at disk power-on lifetime: 26830 hours (1117 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 0b 8d 78 00  Error: UNC at LBA = 0x00788d0b = 7900427

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  42 00 08 07 8d 78 e0 00      00:08:14.161  READ VERIFY SECTOR(S) EXT
  42 00 08 ff 8c 78 e0 00      00:08:14.102  READ VERIFY SECTOR(S) EXT
  25 00 01 00 00 00 e0 00      00:08:14.102  READ DMA EXT
  42 00 10 0f 8d 78 e0 00      00:08:14.020  READ VERIFY SECTOR(S) EXT
  42 00 10 ff 8c 78 e0 00      00:08:11.071  READ VERIFY SECTOR(S) EXT

Error 11 occurred at disk power-on lifetime: 26830 hours (1117 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 0b 8d 78 00  Error: UNC at LBA = 0x00788d0b = 7900427

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  42 00 10 ff 8c 78 e0 00      00:08:11.071  READ VERIFY SECTOR(S) EXT
  25 00 01 00 00 00 e0 00      00:08:11.070  READ DMA EXT
  42 00 20 1f 8d 78 e0 00      00:08:10.985  READ VERIFY SECTOR(S) EXT
  25 00 01 00 00 00 e0 00      00:08:10.947  READ DMA EXT
  42 00 20 ff 8c 78 e0 00      00:08:07.923  READ VERIFY SECTOR(S) EXT

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

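The error log has entries as well: 15 ATA errors in total, and the five most recent are all UNC (uncorrectable read) errors at LBA 7900427, logged at 26830 power-on hours.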
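The LBA that smartctl prints in each error entry can be rebuilt from the register bytes on the same line. The sketch below assumes the standard 28-bit ATA layout (low nibble of DH, then CH, CL, SN) and ignores the extra high-order bytes that 48-bit EXT commands carry. The helper name `lba_from_regs` is hypothetical.

```shell
# Rebuild a 28-bit LBA from ATA register bytes.
# Arguments are hex bytes WITHOUT a 0x prefix, in the order: SN CL CH DH.
# lba_from_regs is a hypothetical helper name used only for illustration.
lba_from_regs() {
    echo "$(( ((0x$4 & 0x0f) << 24) | (0x$3 << 16) | (0x$2 << 8) | 0x$1 ))"
}

# Registers from "40 51 00 0b 8d 78 00" (ER ST SC SN CL CH DH):
lba_from_regs 0b 8d 78 00   # -> 7900427
```

The result matches the "UNC at LBA = 0x00788d0b = 7900427" shown in the log entries.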

Next, the output of hdparm -I.
[root@hoge ~]# hdparm -I /dev/sdf

/dev/sdf:

ATA device, with non-removable media
 Model Number:       ST31000340NS                            
 Serial Number:      9xxxxxx6
 Firmware Revision:  FSC9    
 Transport:          Serial
Standards:
 Used: unknown (minor revision code 0x0029) 
 Supported: 8 7 6 5 
 Likely used: 8
Configuration:
 Logical  max current
 cylinders 16383 16383
 heads  16 16
 sectors/track 63 63
 --
 CHS current addressable sectors:   16514064
 LBA    user addressable sectors:  268435455
 LBA48  user addressable sectors: 1953525168
 Logical/Physical Sector size:           512 bytes
 device size with M = 1024*1024:      953869 MBytes
 device size with M = 1000*1000:     1000204 MBytes (1000 GB)
 cache/buffer size  = unknown
 Nominal Media Rotation Rate: 7200
Capabilities:
 LBA, IORDY(can be disabled)
 Queue depth: 32
 Standby timer values: spec'd by Standard, no device specific minimum
 R/W multiple sector transfer: Max = 16 Current = ?
 Recommended acoustic management value: 128, current value: 0
 DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6 
      Cycle time: min=120ns recommended=120ns
 PIO: pio0 pio1 pio2 pio3 pio4 
      Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
 Enabled Supported:
    * SMART feature set
      Security Mode feature set
    * Power Management feature set
    * Write cache
    * Look-ahead
    * Host Protected Area feature set
    * WRITE_BUFFER command
    * READ_BUFFER command
    * DOWNLOAD_MICROCODE
      SET_MAX security extension
    * 48-bit Address feature set
    * Device Configuration Overlay feature set
    * Mandatory FLUSH_CACHE
    * FLUSH_CACHE_EXT
    * SMART error logging
    * SMART self-test
    * General Purpose Logging feature set
    * 64-bit World wide name
    * Write-Read-Verify feature set
    * WRITE_UNCORRECTABLE_EXT command
    * {READ,WRITE}_DMA_EXT_GPL commands
    * Segmented DOWNLOAD_MICROCODE
    * Gen1 signaling speed (1.5Gb/s)
    * Gen2 signaling speed (3.0Gb/s)
    * Native Command Queueing (NCQ)
    * Phy event counters
    * Software settings preservation
    * SMART Command Transport (SCT) feature set
    * SCT Write Same (AC2)
    * SCT Error Recovery Control (AC3)
    * SCT Features Control (AC4)
    * SCT Data Tables (AC5)
      unknown 206[12] (vendor specific)
Security: 
 Master password revision code = 65534
  supported
 not enabled
 not locked
 not frozen
 not expired: security count
  supported: enhanced erase
 192min for SECURITY ERASE UNIT. 192min for ENHANCED SECURITY ERASE UNIT. 
Logical Unit WWN Device Identifier: 5000c50015b19765
 NAA  : 5
 IEEE OUI : 000c50
 Unique ID : 0yyyyyyy5
Checksum: correct

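As before, the drive will be used with ZFS redundancy (RAIDZ across four drives), so I am putting it into service as-is, without a full-surface scan or similar. A write test is running right now.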

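Update, 2015-10-13
I wrote test data until the ZFS pool was about 70% full and then ran a scrub; not a single error came up. The S.M.A.R.T. counters showed no notable change either. If it keeps going like this for about two years, I think I will be able to look back on it as a bargain.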

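Update, 2017-04-21
On 2017-04-11 it failed spectacularly. There was a scraping noise, and I first suspected that one of the cooling fans was on its way out; even with my ear right up close, it is surprisingly hard to tell where a sound is coming from. Looking at zpool status told me that it was this HDD that had failed. Perhaps a so-called head crash? The noise was tremendous.
I had been collecting smartctl output once a week, so for reference here is the last snapshot taken before the failure (from 2017-04-05).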
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   070   039   044    Pre-fail  Always   In_the_past 12104840
  3 Spin_Up_Time            0x0003   099   099   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       83
  5 Reallocated_Sector_Ct   0x0033   082   082   036    Pre-fail  Always       -       389
  7 Seek_Error_Rate         0x000f   070   057   030    Pre-fail  Always       -       435030066631
  9 Power_On_Hours          0x0032   055   055   000    Old_age   Always       -       39600
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   037   020    Old_age   Always       -       83
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       574
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       8590065669
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   066   054   045    Old_age   Always       -       34 (Min/Max 32/43)
194 Temperature_Celsius     0x0022   034   046   000    Old_age   Always       -       34 (0 15 0 0 0)
195 Hardware_ECC_Recovered  0x001a   034   024   000    Old_age   Always       -       12104840
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
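To see what moved between two saved snapshots, the raw values can be compared per attribute. This is a minimal sketch, assuming each snapshot is a text file containing a `smartctl -A` style table; the helper name `smart_diff` and the file names in the usage line are hypothetical.

```shell
# Print ID, name, old raw value and new raw value for every attribute whose
# RAW_VALUE (10th column) differs between two saved smartctl attribute tables.
# smart_diff is a hypothetical helper name; $1 = older file, $2 = newer file.
smart_diff() {
    awk '
        # Only attribute rows: first field is a numeric ID, at least 10 fields.
        $1 ~ /^[0-9]+$/ && NF >= 10 {
            if (NR == FNR) { old[$1] = $10; next }   # first file: remember
            if (($1 in old) && old[$1] != $10)       # second file: compare
                print $1, $2, old[$1], "->", $10
        }
    ' "$1" "$2"
}

# Usage (hypothetical file names):
#   smart_diff smart-2015-10-09.txt smart-2017-04-05.txt
```

Comparing the 2015-10-09 table with the 2017-04-05 one by eye gives the same picture: 5 Reallocated_Sector_Ct went from 379 to 389 and 9 Power_On_Hours from 26852 to 39600 (12,748 hours, about 531 days of running), while 187 Reported_Uncorrect, 188 Command_Timeout and 197 Current_Pending_Sector did not change at all.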
As for the replacement... I have once again ordered a used HDD of the same model. The shop rated it five stars, so we will see how it turns out.