Continuing my quest on RAID 6, and following my previous post on HDS's RAID 6, here is a post about NetApp's RAID-DP (Double Parity, an enhanced RAID 6). In upcoming posts, we will talk about RAID 6 technology and its use by EMC and HP. If possible, I will try to write a final post comparing each OEM's products and how they have leveraged RAID 6 technology.
The following are links to my previous posts about RAID technology:
Hitachi’s RAID 6
Raid Technology Continued
NetApp Business Case
Like HDS, NetApp's argument has been about the use of high-capacity FC and SATA disk drives (250GB, 300GB, 450GB, 500GB, 750GB and 1TB), which take a long time to rebuild and have higher failure rates. During a rebuild, the RAID group might hit a MEDR (Media Errors During Reconstruction), which can halt the rebuild completely and possibly create a DL (data loss) situation.
As you know, the parity information stored on the disks is used to reconstruct the data on the replacement drive. For these larger drives, the time to replace the failed disk plus the time to reconstruct its data can be anywhere from 4 to 30 hours. During that window there is a real probability of a hiccup, either a bad sector/block or a genuine second drive failure in the same RAID group, which can cause data loss. The graphs related to failure probability appear later in the post.
With RAID 6 (RAID-DP), two drives in the same RAID group can fail without data loss. NetApp has traditionally supported RAID 0, RAID 1+0 and RAID 4, and now RAID 6 (RAID-DP). NetApp's RAID 6 adaptation came in 2006, after HDS and HP had already offered RAID 6 to their customers.
Here is a short summary of RAID 4 (widely used by NetApp):
Technology: Block level parity
Data Loss: No data loss with one drive failure. With multiple drive failures in the same RAID group, data loss occurs.
Advantages: Very high read transaction rate and a medium write transaction rate. Data is striped across the disks, giving high efficiency and a good aggregate transfer rate. Parity is stored on a separate, dedicated disk.
Disadvantages: A disk failure has a medium impact on throughput. Rebuilding after a disk failure is often harder than with RAID 1, and the transfer rate for an individual block is the same as for a single disk.
Here is a summary of RAID 6 (remember that RAID 6 is not exactly RAID-DP; RAID-DP is NetApp's adapted version):
Technology: Striping Data with Double Parity, Independent Data Disk with Double Parity
Overhead: 20% to 30%; adding drives to the RAID group brings the overhead down.
Data Loss: No data loss with one or two drive failures in the same RAID group.
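As a quick sanity check on those overhead figures, the capacity overhead of a double-parity group is simply the two parity drives divided by the total number of drives in the group. This is a minimal sketch assuming equal-size drives; the function name is hypothetical, and it ignores hot spares and any vendor-specific reserved space.

```python
def parity_overhead(total_drives: int, parity_drives: int = 2) -> float:
    """Fraction of raw capacity consumed by parity in one RAID group.

    Assumes equal-size drives and ignores hot spares.
    """
    if total_drives <= parity_drives:
        raise ValueError("a RAID group needs at least one data drive")
    return parity_drives / total_drives


# Double parity (RAID 6) across a few group sizes.
for n in (6, 7, 8, 10, 16):
    print(f"{n:2d} drives: {parity_overhead(n):.1%} overhead")
```

Groups of roughly 7 to 10 drives land in the 20% to 30% range quoted above (28.6% down to 20.0%), and larger groups push the ratio lower still.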
Advantages: RAID 6 is essentially an extension of RAID 5 that adds fault tolerance through a second, independent distributed parity scheme (two-dimensional parity). Data is striped at the block level across a set of drives, just as in RAID 5, and a second set of parity is calculated and written across all the drives. RAID 6 provides extremely high data fault tolerance and can sustain two simultaneous drive failures, which typically makes it a good fit for mission-critical applications.
Disadvantages: Poor write performance, and it requires N+2 drives because of the two-dimensional parity scheme.
Because of RAID 6's poor random-write performance, NetApp modified the technology and incorporated it into its Data ONTAP OS as RAID-DP (Double Parity). NetApp's main pitch for RAID-DP is better performance, lower overhead and cost-effective capacity utilization. OEMs tend to modify RAID technology to better suit their products, or enhance it based on factors like speed, rebuild times and efficiency.
RAID-DP is available in NetApp Data ONTAP version 6.5 and above and is offered across all NetApp platform series. It requires no additional licensing, configuration or special hardware.
NetApp's published data loss statistics compare RAID 5 and RAID-DP as follows.
According to NetApp, the data protection offered by RAID-DP is 3800 times better than that of its closest competitor, RAID 5.
NetApp RAID-DP Technology
RAID-DP provides double parity protection within a RAID group. RAID-DP on NetApp is supported using 14 data drives and 2 parity drives. A traditional RAID 4 uses a horizontal (row) parity structure. RAID-DP uses the same principle to calculate row parity, and then calculates the Double Parity (DP) diagonally across the row components.
With a single disk failure, RAID-DP treats it as a normal failure and rebuilds the new disk, giving data reconstruction normal priority. With a double disk failure, RAID-DP prioritizes reconstruction of the new disks and finishes the process in less time than a single-disk rebuild would take.
Read Dave Hitz's (Vice President of NetApp's Engineering Division) blog post about why Double Protection RAID (RAID-DP) doesn't waste extra disk space.
Consider the following simple example of single parity (RAID 4):
Suppose data blocks (possibly 4KB each) are written to 7 separate drives D0, D1, D2, D3, D4, D5 and D6. Using a simple equation, the parity is generated by adding all the data: the elements of a single row are added together to generate the P (Parity) information (follow the color scheme).
The sum of all elements of row 1 is 23. Now suppose your D1 drive goes belly up. Using the same equation, D1 is reconstructed as 23 – 1 – 1 – 2 – 8 – 7 – 1, that is, P – D6 – D5 – D4 – D3 – D2 – D0, yielding 3.
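The toy arithmetic above can be written out as a short Python sketch. The helper names are hypothetical, and D1's value of 3 is taken from the reconstruction result in the text; real arrays use XOR rather than addition, as the next step shows.

```python
def add_parity(blocks):
    """Toy parity: the sum of all data elements in a row."""
    return sum(blocks)


def rebuild_by_subtraction(parity, surviving):
    """Recover the single missing element: P minus every surviving element."""
    return parity - sum(surviving)


# Row 1 from the example (D1 = 3 is the element we will "lose").
row1 = {"D0": 1, "D1": 3, "D2": 7, "D3": 8, "D4": 2, "D5": 1, "D6": 1}
p = add_parity(row1.values())
print(p)  # 23

# D1 fails: rebuild it from P and the six surviving drives.
surviving = [v for name, v in row1.items() if name != "D1"]
print(rebuild_by_subtraction(p, surviving))  # 3
```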
To make things a little more complex, let's look at the actual formula used to generate the parity.
P = D0 XOR D1 XOR D2 XOR D3 XOR D4 XOR D5 XOR D6
For row 1, the XOR (Exclusive OR) result is 15.
Now let's take the same example and add DP (Double Parity) to see how that works. (Again, for simplicity, we will keep the parity calculations based on addition rather than XOR.)
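The XOR version can be checked the same way. This minimal Python sketch reuses the row 1 values from the addition example; it confirms the result of 15 and shows why XOR is convenient: XOR is its own inverse, so rebuilding a lost element uses the same operation as generating the parity. The helper name and the 4-byte blocks are illustrative assumptions.

```python
from functools import reduce
from operator import xor


def xor_parity(blocks):
    """Row parity: XOR of every element in the row."""
    return reduce(xor, blocks, 0)


# Row 1 values for D0..D6.
row1 = [1, 3, 7, 8, 2, 1, 1]
p = xor_parity(row1)
print(p)  # 15

# The lost D1 (index 1) is the XOR of P with every surviving element.
surviving = row1[:1] + row1[2:]
print(xor_parity([p] + surviving))  # 3

# The same rule applies byte-by-byte to whole blocks (4KB in practice).
blocks = [bytes([v] * 4) for v in row1]
parity_block = bytes(reduce(xor, col) for col in zip(*blocks))
print(parity_block)  # b'\x0f\x0f\x0f\x0f'
```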
The same process as discussed earlier is used to generate the P (Parity) which is the addition of all the elements of a single row D0, D1, D2, D3, D4, D5 and D6.
To generate the DP (Double Parity), one element from each row is added together along a diagonal. In the example above, to generate DP (1), elements D0 (1), D1 (2), D2 (3), D3 (4), D4 (5), D5 (6) and D6 (7) are added together.
The color scheme above shows how the elements are added to create DP. If you stare at it for a moment, it will all make sense.
Note: The number in parentheses ( ) denotes the row number.
As pointed out earlier, in practice the DP, like the P, is generated by XORing the elements together.
The equation would look like
DP (1) = D0 (1) XOR D1 (2) XOR D2 (3) XOR D3 (4) XOR D4 (5) XOR D5 (6) XOR D6 (7)
For DP (1), the XOR result should be 13.
Notice that the DP also covers the P (Parity) drive. However, any given DP is generated from only N drives: in this 9-drive configuration, only 7 drives are used to generate each DP. This is because, after a double failure, both DP and P are used to reconstruct the new drives.
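To see how a diagonal can cover the P drive while still touching only some of the drives, here is a small sketch of the row-diagonal parity (RDP) layout that NetApp engineers published as the basis for RAID-DP. It differs from the simplified 7+2 example above: RDP assumes a prime p with p - 1 data drives, and this sketch picks p = 5 (4 data drives, P and DP) purely for readability. All names are illustrative.

```python
P_PRIME = 5            # assumed prime; RDP requires p to be prime
ROWS = P_PRIME - 1     # rows per stripe
DISKS = P_PRIME        # data drives 0..p-2 plus the row-parity drive (index p-1)


def diagonal_of(row: int, disk: int) -> int:
    """Diagonal number of the block at (row, disk); diagonal p-1 is not stored."""
    return (row + disk) % P_PRIME


# Grid view: each cell shows which diagonal that block belongs to.
labels = [f"D{d}" for d in range(DISKS - 1)] + ["P"]
print("row " + " ".join(f"{name:>3}" for name in labels))
for r in range(ROWS):
    print(f"{r:>3} " + " ".join(f"{diagonal_of(r, d):>3}" for d in range(DISKS)))

# Each stored diagonal (0..p-2) skips exactly one of the data-plus-P drives.
for i in range(P_PRIME - 1):
    members = [d for d in range(DISKS) if (i - d) % P_PRIME != P_PRIME - 1]
    skipped = set(range(DISKS)) - set(members)
    print(f"DP({i}) uses {len(members)} of {DISKS} drives, skips {labels[skipped.pop()]}")
```

In the post's 7+2 example, the analogous statement is that each DP draws on 7 of the 8 data-plus-P drives.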
Let’s talk about a drive failure with RAID-DP.
With a single drive failure, the process is exactly like the earlier example: the parity information is used to reconstruct the new drive.
With 2 drive failures, both row parity and diagonal parity data are used to reconstruct the new drives. ONTAP also gives the data rebuild priority when 2 drives fail in the same RAID group.
With 2 drive failures, the rebuild starts at the DP level. When it reaches a given DP row, it first reconstructs the missing data from DP and then uses row parity to reconstruct the block on the second drive, continuing in this alternating order. This is much easier to explain with a visual or video.
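The alternating diagonal-then-row rebuild can be sketched in a few dozen lines. This is a minimal model of the published row-diagonal parity (RDP) scheme, assuming p = 5 and single-byte "blocks"; the function names are hypothetical and this is not Data ONTAP's actual code. The decoder keeps solving any stored diagonal or row that has exactly one missing block, which reproduces the chain described above.

```python
import random
from itertools import combinations

P_PRIME = 5         # assumed prime; RDP requires p to be prime
ROWS = P_PRIME - 1  # rows per stripe
DATA = P_PRIME - 1  # data drives 0..p-2; drive p-1 holds row parity


def cells_on_diagonal(i):
    """(disk, row) cells of diagonal i across the data and row-parity drives."""
    return [(d, (i - d) % P_PRIME) for d in range(P_PRIME)
            if (i - d) % P_PRIME < ROWS]


def encode(data):
    """data[d][r] -> (columns including row parity, stored diagonal parity)."""
    row_parity = [0] * ROWS
    for r in range(ROWS):
        for d in range(DATA):
            row_parity[r] ^= data[d][r]
    cols = [list(c) for c in data] + [row_parity]
    diag = [0] * (P_PRIME - 1)  # diagonal p-1 is never stored
    for i in range(P_PRIME - 1):
        for d, r in cells_on_diagonal(i):
            diag[i] ^= cols[d][r]
    return cols, diag


def reconstruct(cols, diag, failed):
    """Rebuild up to two failed drives among 0..p-1 (data or row parity)."""
    cols = [[None] * ROWS if d in failed else list(c) for d, c in enumerate(cols)]
    unknown = {(d, r) for d in failed for r in range(ROWS)}
    while unknown:
        progress = False
        # Diagonal step: XOR of a stored diagonal's cells equals its DP block.
        for i in range(P_PRIME - 1):
            cells = cells_on_diagonal(i)
            missing = [c for c in cells if c in unknown]
            if len(missing) == 1:
                d0, r0 = missing[0]
                value = diag[i]
                for d, r in cells:
                    if (d, r) != (d0, r0):
                        value ^= cols[d][r]
                cols[d0][r0] = value
                unknown.discard((d0, r0))
                progress = True
        # Row step: XOR of the data drives and row parity in a row is zero.
        for r in range(ROWS):
            missing = [d for d in range(P_PRIME) if (d, r) in unknown]
            if len(missing) == 1:
                d0 = missing[0]
                value = 0
                for d in range(P_PRIME):
                    if d != d0:
                        value ^= cols[d][r]
                cols[d0][r] = value
                unknown.discard((d0, r))
                progress = True
        if not progress:
            raise ValueError("more failures than double parity can repair")
    return cols


random.seed(42)
data = [[random.randrange(256) for _ in range(ROWS)] for _ in range(DATA)]
cols, diag = encode(data)
for pair in combinations(range(P_PRIME), 2):
    assert reconstruct(cols, diag, set(pair)) == cols
print("every double failure among the data and P drives was rebuilt")
```

The sketch only models failures among the data and P drives. If the DP drive itself is one of the two failures, the other drive can be rebuilt from row parity alone and DP recomputed afterwards.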
Traditionally, RAID 6 is known to carry a 25% to 33% performance overhead on random writes. One of NetApp's reasons for creating RAID-DP was to overcome this performance penalty.
According to NetApp, RAID-DP performance stays between 98% and 100% of its base performance.
Disk space overhead for RAID 4 on NetApp is between 18% and 25%, depending on the number of drives in the RAID group. With RAID-DP, the overhead is as low as 7.5% with a 28-drive RAID group. The overhead increases with fewer drives, for example with 16 drives in a RAID group.
If you are using SyncMirror with RAID-DP, up to 4 drive failures can occur before there is any data loss.
If you are currently using RAID 4 with Data ONTAP 7G you can upgrade your volumes to RAID-DP.
A spare drive is used to create the DP drive in a RAID-DP system.
These days, we see a lot of customers running RAID 6 with their larger SATA and FC drives. Again, I don't think RAID 6 is the future of the storage industry, but it surely is the present, and because of that, quite a few OEMs have jumped on implementing RAID 6 in their products.
Note: One graph above has been extracted from NetApp’s Whitepaper on RAID-DP and Implementation of RAID-DP.