Archive for the ‘Storage’ Category

NetApp’s RAID-DP (Enhanced RAID 6)

February 11th, 2009

Continuing the quest on RAID 6, following my previous post on HDS's RAID 6, here is a post about NetApp's RAID-DP (Double Parity, an enhanced RAID 6). In upcoming posts we will talk about RAID 6 technology and its usage by EMC and HP. If possible, I will try to write a final post comparing each of the OEM products and how they have leveraged RAID 6 technology.


The following are links to my previous posts about RAID Technology

Hitachi’s RAID 6

Raid Technology Continued

Raid Types

NetApp Business Case

Similar to HDS, NetApp's argument has been about the use of high-capacity FC and SATA disk drives (250GB, 300GB, 450GB, 500GB, 750GB and 1TB), which take quite a long time to rebuild and have higher failure rates. During a rebuild, the RAID group might hit a MEDR (Media Error During Reconstruction), which can bring the rebuild to a complete halt and possibly create a DL (data loss) situation.


As you know, the parity information stored on disk is used to reconstruct the data onto the replacement drive. The time to replace the failed disk plus the time to reconstruct the data can be anywhere from 4 to 30 hours for these larger drives. There is a high probability that, during this window, a bad sector/block or a second legitimate drive failure will occur in the same RAID group, which can cause data loss. The graphs related to failure probability are in the later part of the post.
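
To see where a 4-to-30-hour window comes from, here is a minimal back-of-the-envelope sketch; the rebuild rates are assumptions on my part (heavily loaded vs mostly idle array), not NetApp figures.

```python
# Back-of-the-envelope rebuild window (illustrative rebuild rates, not vendor numbers)

def rebuild_hours(capacity_gb, rebuild_mb_per_s):
    """Hours to copy capacity_gb at a sustained rebuild rate in MB/s."""
    return (capacity_gb * 1024) / rebuild_mb_per_s / 3600

# A 1TB drive rebuilt on a busy array (~10 MB/s) vs a mostly idle one (~70 MB/s)
for rate in (10, 70):
    print(f"1TB at {rate} MB/s -> {rebuild_hours(1000, rate):.1f} hours")
```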


With the use of RAID 6 (RAID-DP), two drive failures can occur in the same RAID group without data loss. NetApp has traditionally supported RAID 0, RAID 1+0, RAID 4, and now RAID 6 (RAID-DP). NetApp's RAID 6 adaptation came in 2006, after HDS and HP had already offered it to their customers.

Here is a little extract about RAID 4 (Widely used with NetApp)

Technology: Block level parity

Performance: Medium

Data Loss: With one drive failure, no data loss. With multiple drive failures in the same RAID group, data loss is certain.

Advantages: It has the highest read data transaction rate and a medium write data transaction rate. Data is striped across disks, creating high efficiency along with a good aggregate transfer rate. Parity is stored on a separate, dedicated disk.

Disadvantages: Disk failure has a medium impact on throughput. It is often difficult to rebuild in the event of a disk failure (as compared to RAID level 1), and the individual block data transfer rate is the same as a single disk.

Here is an extract about RAID 6 (remember, RAID 6 is not exactly RAID-DP; RAID-DP is NetApp's adapted version)

Technology: Striping Data with Double Parity, Independent Data Disk with Double Parity

Performance: Medium

Overhead: 20% to 30%; with additional drives in the RAID group you can bring the overhead down.

Data Loss: No data loss with one or even two drive failures in the same RAID group.

Advantages: RAID 6 is essentially an extension of RAID 4 that allows for additional fault tolerance by using a second independent distributed parity scheme (two-dimensional parity). Data is striped on a block level across a set of drives, just as in RAID 4, and a second set of parity is calculated and written across all the drives. RAID 6 provides extremely high data fault tolerance and can sustain multiple simultaneous drive failures, which typically makes it a good fit for mission-critical applications.

Disadvantages: Poor write performance, and it requires N+2 drives to implement because of the two-dimensional parity scheme.

Because of RAID 6's low random-write performance, NetApp modified the technology and incorporated it into its Data ONTAP OS as RAID-DP (Double Parity). NetApp's big pitch for double parity is better performance, less overhead and cost-effective capacity utilization. OEMs tend to modify RAID technology to better suit their products, or to enhance it based on factors like speed, rebuild times and efficiency.

RAID-DP is available with NetApp's Data ONTAP operating system version 6.5 and above and is offered across all NetApp platform series. There are no added licensing, configuration or special hardware requirements for RAID-DP.

NetApp’s published data loss stats show the following with RAID 5 and RAID-DP.


The data protection offered with RAID-DP is 3,800 times better than the closest competition, which is RAID 5.

NetApp RAID-DP Technology

RAID-DP provides double parity protection within a RAID group. On NetApp, RAID-DP is supported using 14 data drives and 2 parity disks. Traditional RAID 4 is implemented with a horizontal (row) parity structure. RAID-DP uses the same principle to calculate the row parity, while the double parity (DP) calculations are done diagonally across the row elements.


With a single disk failure, RAID-DP treats it as a normal failure and rebuilds the replacement disk at normal priority. With a double disk failure, RAID-DP prioritizes the reconstruction of the replacement disks so the process finishes in a shorter duration.


Read Dave Hitz's (Vice President of Engineering at NetApp) blog post about why double-protection RAID (RAID-DP) doesn't waste extra disk space.


Consider the following as a simple example of single Parity (RAID 4)



You have data blocks (possibly 4KB each) written across 7 separate drives: D0, D1, D2, D3, D4, D5 and D6. Using a simple mathematical equation, the parity is generated as an addition of all the data; in this case, the elements of a single row are added together to generate the P (parity) information (follow the color scheme).


The sum of all elements of row 1 is 23. Now let's say your D1 drive goes belly up. Using the same equation, the missing D1 value is reconstructed as 23 - 1 - 1 - 2 - 8 - 7 - 1, i.e. P - D6 - D5 - D4 - D3 - D2 - D0, yielding the final result of 3.


To make things a little more concrete, let's look at the actual formula used to generate the parity.

P = D0 XOR D1 XOR D2 XOR D3 XOR D4 XOR D5 XOR D6

For row 1, the XOR (exclusive OR) result is 15.
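
Here is a minimal Python sketch of that row-parity math, using the same row-1 values as the example above; it shows the simplified addition view, the real XOR parity, and the reconstruction of a lost D1.

```python
from functools import reduce
from operator import xor

# Row-1 values from the example: D0..D6
row = {"D0": 1, "D1": 3, "D2": 7, "D3": 8, "D4": 2, "D5": 1, "D6": 1}

p_add = sum(row.values())            # simplified "addition" parity -> 23
p_xor = reduce(xor, row.values())    # real parity: P = D0 XOR ... XOR D6 -> 15

# Lose D1 and rebuild it from the surviving data plus the XOR parity
surviving = [v for k, v in row.items() if k != "D1"]
d1_rebuilt = reduce(xor, surviving, p_xor)

print(p_add, p_xor, d1_rebuilt)      # 23 15 3
```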


Now let’s take this same example but let’s add DP (Double Parity) to it and see how that works. Again for the math purposes we will leave the parity calculations based on additions rather than XOR).


The same process as discussed earlier is used to generate the P (Parity) which is the addition of all the elements of a single row D0, D1, D2, D3, D4, D5 and D6.

To generate the DP (double parity), one element from each row is added together diagonally. In the example above, to generate DP (1), elements D0 (1), D1 (2), D2 (3), D3 (4), D4 (5), D5 (6) and D6 (7) are added together.

The color scheme used above shows how the elements are added to create DP. As you stare at it, it will all make sense….

Note: the number in brackets ( ) above denotes the row number.

As pointed out earlier, in real life the DP, like the P, is generated by XORing (exclusive OR) the row elements together.

The equation would look like

DP (1) = D0 (1) XOR D1 (2) XOR D2 (3) XOR D3 (4) XOR D4 (5) XOR D5 (6) XOR D6 (7)

For DP (1) our XOR results should yield 13.

As you will notice, the DP calculation also includes the P (parity) drive. But for any given DP, only N drives are used to generate the double parity; in short, with the 9-drive configuration here, we only used 7 drives to generate the DP. This is because, with a double failure, both the DP and the P will be used to reconstruct the replacement drives.
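
Below is a schematic Python sketch, with made-up data, of how a diagonal parity set is XORed together one element per row (including the row-parity column). It only illustrates the mechanics; the exact diagonals NetApp uses, and which column each diagonal skips, follow the RAID-DP whitepaper rather than this simplified loop.

```python
from functools import reduce
from operator import xor

# Hypothetical rows across drives D0..D6 (values invented for illustration)
rows = [
    [1, 3, 7, 8, 2, 1, 1],
    [5, 2, 9, 4, 6, 3, 7],
    [8, 1, 2, 6, 4, 9, 5],
    [3, 7, 5, 2, 8, 6, 4],
]
row_parity = [reduce(xor, r) for r in rows]          # the P column
grid = [r + [p] for r, p in zip(rows, row_parity)]   # columns: D0..D6, P

def diagonal_parity(grid, d):
    """XOR one element from each row, walking diagonally:
    row i contributes column (d + i) mod number_of_columns."""
    cols = len(grid[0])
    return reduce(xor, (grid[i][(d + i) % cols] for i in range(len(grid))))

dp = [diagonal_parity(grid, d) for d in range(len(grid[0]))]
print(dp)    # one DP value per diagonal, stored on the second parity disk
```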

Let’s talk about a drive failure with RAID-DP.

With a single drive failure, the process is exactly like the earlier example: the parity information is used to reconstruct the replacement drive.

With 2 drive failures, both the row-level parity and the diagonal parity are used to reconstruct the replacement drives. ONTAP also raises the priority of the rebuild when 2 drives have failed in the same RAID group.

With 2 drive failures, the rebuild process starts at the DP level. For a given diagonal, it first reconstructs the missing data from the DP, then uses the row parity to reconstruct the second drive's element, and continues in this alternating, consecutive order. This is much easier to explain with a visual or a video.

Overhead

Traditionally, RAID 6 is known to have a 25% to 33% performance overhead on random writes. One of NetApp's arguments for creating RAID-DP was to overcome this performance penalty.


NetApp’s base performance varies between 98% to 100% with RAID-DP.


Disk space overhead for RAID 4 on NetApp is between 18% and 25%, depending on the number of drives used in a RAID group. With RAID-DP, the overhead is as low as 7.5% with a 28-drive RAID group. The overhead increases with fewer drives, for example with a 16-drive RAID group.
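
The overhead numbers come roughly from the ratio of parity disks to total disks in the group. A quick sketch follows; NetApp's published figures also account for spares and right-sizing, so they differ slightly from this naive ratio.

```python
# Parity capacity overhead as a simple parity-disks / total-disks ratio
def parity_overhead(total_disks, parity_disks):
    return parity_disks / total_disks * 100

print(f"RAID 4,  8-disk group:  {parity_overhead(8, 1):.1f}%")    # 12.5%
print(f"RAID-DP, 16-disk group: {parity_overhead(16, 2):.1f}%")   # 12.5%
print(f"RAID-DP, 28-disk group: {parity_overhead(28, 2):.1f}%")   # ~7.1%
```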

Additional

If you are using SyncMirror with RAID-DP, up to 4 drive failures can be tolerated before there is any data loss.

If you are currently using RAID 4 with Data ONTAP 7G you can upgrade your volumes to RAID-DP.

A spare drive is used to hold the DP (diagonal parity) when a RAID group is converted to RAID-DP.

These days we do see a lot of customers running RAID 6 with their larger SATA and FC drives. Again, I don't think RAID 6 is the future of the storage industry, but it surely is the present, and because of that quite a few OEMs have implemented RAID 6 in their products.


Note: One graph above has been extracted from NetApp’s Whitepaper on RAID-DP and Implementation of RAID-DP.

Hitachi's (HDS) RAID 6

February 9th, 2009


Hitachi (HDS) has been one of the pioneers in implementing RAID 6 in its storage products. I believe the necessity of RAID 6 was initially realized at HDS back in 2004 with the release of high-capacity disk drives; they started implementing it in 2005 with the USPs and later in the TagmaStore modular storage products.

In upcoming posts, we will talk about RAID 6 technology and its usage by different OEMs like HDS, EMC and NetApp. If possible, I will try to write a final post comparing these OEMs and how they have leveraged RAID 6.

All the OEM’s tend to modify RAID Technology in their microcode / software / firmware to better fit their product or enhance it based on various factors like speed, rebuild times, etc, prime example will be EMC’s implementation of RAID S with Symmetrix products. NetApp’s implementation of RAID DP with its products.


HDS’s Business Case

RAID 6 is available in Hitachi’s USP, WMS and the AMS disk arrays.

System and storage administrators are all very well versed in RAID 5 and have been using it as a standard RAID technology across servers and mid-tier storage. Storage disk arrays need some form of RAID configuration, for example RAID 1, RAID 1+0, RAID 3, RAID 5, RAID 6, RAID 10, RAID S, etc.

Hitachi products support RAID 0, RAID 1, RAID 5 and RAID 6.

RAID 5 has been common practice for the last 10 to 15 years. Drive sizes during these years varied from 4GB to 146GB SCSI or Fibre Channel disks (including sizes like 4.3GB, 9GB, 18GB, 36GB, 50GB, 73GB and 146GB). These days you seldom see drives of that size; customers are talking about disks that start at 300GB (FC or SATA) and go up to 1TB. Over the next 2 to 3 years, we will almost certainly see disk sizes between 3TB and 4TB.


Here is an abstract about RAID 5

Technology: Striping Data with Distributed Parity, Block Interleaved Distributed Parity

Performance: Medium

Overhead: 15% to 20%; with additional drives in the RAID group you can substantially bring down the overhead.

Minimum Number of Drives: 3

Data Loss: With one drive failure, no data loss. With multiple drive failures in the same RAID group, data loss is certain.

Advantages: It has the highest read data transaction rate and a medium write data transaction rate. A low ratio of ECC (parity) disks to data disks converts to high efficiency along with a good aggregate transfer rate.

Disadvantages: Disk failure has a medium impact on throughput. It also has the most complex controller design. It is often difficult to rebuild in the event of a disk failure (as compared to RAID level 1), and the individual block data transfer rate is the same as a single disk.

RAID 5 also relies on parity information to provide redundancy and fault tolerance using independent data disks with distributed parity blocks. Each entire data block is written onto a data disk; parity for blocks in the same rank is generated on Writes, recorded in a distributed location and checked on Reads.

This would classify as one of the most favorite RAID Technologies of the past.


The rebuild time for drive sizes from 4.3GB to 146GB can be about 18 to 24 hours during production hours, and closer to 4 to 8 hours during off-production hours. There is a risk associated with RAID 5 if any additional drive fails in the same RAID group.

Let’s say you have a single drive failure in your RAID 5. The vendor picks up the error using the call home feature and dispatches an engineer to come onsite to replace the drive. It’s now 4 hours since the drive has failed. You as a customer haven’t seen any performance impact yet. The drive is replaced and it will take 24 hours to completely sync (rebuild) with (from) its partners in the same RAID group. So now it’s really 28 to 30 hours since your initial drive failure. During this time, if you hit one more roadblock (Read / Write hiccup or a bad sector) in the same RAID group, the data in the RAID group will be lost.

These days the normal drive size is at least 300GB. With FC and SATA, the sizes vary across 250GB, 300GB, 450GB, 500GB and 750GB, with 1TB being the latest addition. With these larger SATA drives, rebuild times can run 30 to 45 hours, or in some cases even 100 hours. The window where things can really go wrong is now much larger, and that is one of the reasons quite a few vendors have introduced RAID 6.


Here is an abstract about RAID 6

Technology: Striping Data with Double Parity, Independent Data Disk with Double Parity

Performance: Medium

Overhead: 20% to 30%; with additional drives in the RAID group you can bring the overhead down.

Minimum Number of Drives: 4

Data Loss: No data loss with one or even two drive failures in the same RAID group.

Advantages: RAID 6 is essentially an extension of RAID 5 that allows for additional fault tolerance by using a second independent distributed parity scheme (two-dimensional parity). Data is striped on a block level across a set of drives, just as in RAID 5, and a second set of parity is calculated and written across all the drives. RAID 6 provides extremely high data fault tolerance and can sustain multiple simultaneous drive failures, which typically makes it a good fit for mission-critical applications.

Disadvantages: Poor write performance, and it requires N+2 drives to implement because of the two-dimensional parity scheme.


Note:

Hitachi does not recommend using RAID 6 with high-performance applications where extreme random writes are being performed; in some cases, the use of RAID 1 or RAID 1+0 is essential. There is a performance overhead associated with the use of RAID 6, which we will talk about later in the post.

Probability of Data Loss with RAID 5 and RAID 6


As you see in the graph, the probability (the percentage of exposure) related to RAID 5 double failures is as much as 7.5%, while the chance of a triple failure in a RAID 6 configuration is effectively 0%. As drive sizes increase, the usage of RAID 6 will become more prominent.


HDS’s Technology


Let's take the above as an example: we have 8 disk drives in a USP system.

D1, D2, D3, D4, D5 and D6 represent the data blocks, and P1 and P2 the dual parity.

The data blocks are followed by the parity, and after the last parity block the next set of data blocks starts writing again. This rotating, sequential layout is where much of the performance improvement of this technology comes from.

To make things a bit more concrete and learn this technology, let's introduce the mathematical formulas behind the RAID 6 implementation.


In the above, D0, D1, D2, D3, D4 and D5 are the data blocks (stripes), P is the calibration data (row parity) and Q is the secondary parity.

Using XOR (exclusive OR) across the data stripes (D0, D1, D2, D3, D4 and D5), the P (calibration) data is generated.

P = D0 XOR D1 XOR D2 XOR D3 XOR D4 XOR D5

Q is generated by multiplying each data stripe (D0 through D5) by a coefficient and XORing the products together:

Q = A0 * D0 XOR A1 * D1 XOR A2 * D2 XOR A3 * D3 XOR A4 * D4 XOR A5 * D5

Typically, with one drive failure the P (calibration) data is used to rebuild the replacement drive; with two drive failures, both the P and Q data are used to rebuild the replacement drives.
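
Here is a minimal Python sketch of the textbook P+Q construction these formulas describe, assuming byte-wise arithmetic in GF(2^8) with powers of 2 as the coefficients; Hitachi's actual coefficients and microcode implementation may differ, and double-failure recovery (solving for two unknowns from P and Q) is left out for brevity.

```python
def gf_mul(a, b, poly=0x11d):
    """Multiply two bytes in GF(2^8) using the usual reducing polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result

def pq_parity(stripes):
    """Return (P, Q) for one byte position across the data stripes D0..Dn."""
    p, q, coeff = 0, 0, 1
    for d in stripes:
        p ^= d                    # P = D0 XOR D1 XOR ... XOR Dn
        q ^= gf_mul(coeff, d)     # Q = A0*D0 XOR A1*D1 XOR ... with Ai = 2^i
        coeff = gf_mul(coeff, 2)
    return p, q

data = [0x11, 0x22, 0x33, 0x44, 0x55, 0x66]   # D0..D5, one byte each
p, q = pq_parity(data)

# Single drive failure: any one stripe (here D2) is recovered from P alone
d2 = p ^ data[0] ^ data[1] ^ data[3] ^ data[4] ^ data[5]
assert d2 == data[2]
print(hex(p), hex(q), hex(d2))
```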


Risk

Here is a nice chart that shows the risk associated with RAID 5 and RAID 6.


As time elapses after a drive failure on RAID 5 (time to respond plus rebuild time), the associated risk tends to increase.

With RAID 6 and a single drive failure, the associated risk tends to stay flat, essentially at 0%.

Based on different RAID group sizes, here is the risk of data loss versus rebuild time.

As you can see in both the graphs, the risk associated with RAID 6 is pretty much zero percent.


Overhead

As discussed earlier, there is additional overhead with RAID 6 compared to RAID 5, but the risk associated with using RAID 5 is much higher than the extra overhead consumed by RAID 6. Here is a graph that shows the overhead associated with RAID 6.

As you see in the graph, the overhead with 6 data drives and 2 parity drives is only 25%. If you were running mirroring or some other RAID variation, the overhead could be anywhere between 25% and 50%. So, in short, even with 2 parity drives the advantages of RAID 6 are considerable.

From a performance standpoint, RAID 5 and RAID 6 are pretty similar for random read, sequential read and sequential write workloads. There is an added penalty for random write workloads because of the two-dimensional parity: compared to RAID 5, RAID 6 takes roughly a 33% performance hit on Hitachi with random writes.
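
The ~33% figure falls out of the classic read-modify-write accounting for small random writes (4 back-end I/Os per write for RAID 5, 6 for RAID 6). A hedged sketch follows; the spindle count and per-spindle IOPS are assumptions, and real arrays with large write caches will do better than this naive model.

```python
# Rule-of-thumb random-write penalty (generic accounting, not Hitachi-specific):
#   RAID 5: read data + read parity + write data + write parity          = 4 I/Os
#   RAID 6: read data + read P + read Q + write data + write P + write Q = 6 I/Os

spindles, iops_per_spindle = 8, 180        # assumed back-end capability
backend_iops = spindles * iops_per_spindle

raid5_writes = backend_iops / 4
raid6_writes = backend_iops / 6
hit = (1 - raid6_writes / raid5_writes) * 100

print(f"RAID 5: {raid5_writes:.0f} random writes/s")
print(f"RAID 6: {raid6_writes:.0f} random writes/s ({hit:.0f}% lower)")
```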

To sum up, if you are using high-capacity disk drives on your Hitachi systems and are looking to mitigate the risk of multiple drive failures, it is highly recommended that you use RAID 6 on these systems.

RAID 6 is a great technology, maybe the technology of the present, but the future of RAID will go in a different direction. Imagine you have a 20TB drive (the expected reality for SATA in 2012): how long will it take to rebuild, and what is the risk of a triple fault?



Note: The graphs above have been obtained from two different documents – Hitachi’s RAID 6 Protection and Hitachi’s RAID 6 Storage Doc.

RAID Technology Continued

January 27th, 2009



RAID [Redundant Array of Independent (Inexpensive) Disks]

After reading a couple of blogs from last week about RAID technology from StorageSearch and StorageIO, I decided to elaborate more on the technology behind RAID and its functionality across storage platforms.

After I had almost finished writing this post, I ran into a Wikipedia article explaining RAID technology at much greater length, covering different types of RAID like RAID 2, RAID 4, RAID 10, RAID 50, etc.

For example purposes, let's say we need 5TB of usable space; each disk in this example is 1TB.

RAID 0

Technology: Striping Data with No Data Protection.

Performance: Highest

Overhead: None

Minimum Number of Drives: 2 (since data is striped)

Data Loss: Upon one drive failure

Example: 5TB of usable space can be achieved through 5 x 1TB of disk.

Advantages: High performance.

Disadvantages: Guaranteed data loss upon any drive failure.

Hot Spare: Upon a drive failure, a hot spare can be invoked, but there will be no data to copy over. Hot Spare is not a good option for this RAID type.

Supported: Clariion, Symmetrix, Symmetrix DMX (Meta BCV’s or DRV’s)

In RAID 0, the data is written / striped across all of the disks. This is great for performance, but if one disk fails the data is lost, because there is no protection of that data. A simple sketch of the block-to-disk mapping follows.
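
Here is a minimal sketch of that round-robin placement, using the 5-disk example above; real arrays stripe by a chunk (stripe element) size rather than by single blocks, which this ignores.

```python
# RAID 0: logical block b lands on disk (b mod N), stripe row (b div N).
# No second copy and no parity, hence no protection.

def raid0_location(block, num_disks=5):
    return block % num_disks, block // num_disks    # (disk, offset on that disk)

for b in range(8):
    disk, offset = raid0_location(b)
    print(f"logical block {b} -> disk {disk}, stripe {offset}")
```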

RAID 1

Technology: Mirroring and Duplexing

Performance: Highest

Overhead: 50%

Minimum Number of Drives: 2

Data Loss: A single drive failure causes no data loss. If both drives of the mirrored pair fail, all the data is lost.

Example: 5TB of usable space can be achieved through 10 x 1TB of disk.

Advantages: Highest performance; one of the safest RAID types.

Disadvantages: High overhead and additional load on the storage subsystem. Upon a drive failure, the surviving copy runs unprotected (effectively RAID 0).

Hot Spare: A Hot Spare can be invoked and data can be copied over from the surviving paired drive using Disk copy.

Supported: Clariion, Symmetrix, Symmetrix DMX

The exact same data is written to two disks at the same time. Upon a single drive failure there is no data loss and no degradation, performance or data integrity issues. It is one of the safest forms of RAID, but with high overhead. In the old days, all Symmetrix systems supported RAID 1 and RAID S. Highly recommended for high-end, business-critical applications.

The controller must be able to perform two concurrent separate Reads per mirrored pair or two duplicate Writes per mirrored pair. One Write or two Reads are possible per mirrored pair. Upon a drive failure only the failed disk needs to be replaced.


RAID 1+0

Technology: Mirroring and Striping Data

Performance: High

Overhead: 50%

Minimum Number of Drives: 4

Data Loss: Upon a single drive (M1) failure, no issues. Even with multiple drive failures across the stripe, as long as only one member (M1) of each mirrored pair is affected, no issues. With the failure of both the M1 and M2 of the same pair, data loss is certain.

Example: 5TB of usable space can be achieved through 10 x 1TB of disk.

Advantages: Similar fault tolerance to RAID 5; because of striping, high I/O rates are achievable.

Disadvantages: Upon a drive failure, the affected mirrored pair effectively runs as RAID 0 (unprotected).

Hot Spare: Hot Spare is a good option with this RAID type, since with a failure the data can be copied over from the surviving paired device.

Supported: Clariion, Symmetrix, Symmetrix DMX

RAID 1+0 is implemented as a striped array whose segments are mirrored (RAID 1) pairs, i.e. a stripe of mirrors. A sketch of the block placement follows.
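
A small sketch of that stripe-of-mirrors placement, using the 5 mirrored pairs from the 10 x 1TB example; the pair naming and chunk sizing are made up for illustration.

```python
# RAID 1+0: blocks are striped across mirrored pairs, and each block is
# written to both members (M1 and M2) of its pair.

def raid10_location(block, num_pairs=5):
    pair = block % num_pairs            # which mirrored pair gets this block
    offset = block // num_pairs         # stripe row within that pair
    return (f"pair{pair}-M1", f"pair{pair}-M2"), offset

for b in range(6):
    members, offset = raid10_location(b)
    print(f"block {b} -> {members[0]} and {members[1]}, stripe {offset}")
```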


RAID 3

Technology: Striping Data with dedicated Parity Drive.

Performance: High

Overhead: 33% with parity (in the example above); more drives in a RAID 3 configuration will bring the overhead down.

Minimum Number of Drives: 3

Data Loss: Upon a single drive failure, parity is used to rebuild the data. Two drive failures in the same RAID group will cause data loss.

Example: 5TB of usable space would be achieved through 9 x 1TB disks.

Advantages: Very high Read data transfer rate. Very high Write data transfer rate. Disk failure has an insignificant impact on throughput. Low ratio of ECC (Parity) disks to data disks which converts to high efficiency.

Disadvantages: The transaction rate is equal to that of a single spindle.

Hot Spare: A hot spare can be configured and invoked upon a drive failure, and rebuilt from the parity and surviving data devices. Upon drive replacement, the hot spare can be used to rebuild the replaced drive.

Supported: Clariion

RAID 5

Technology: Striping Data with Distributed Parity, Block Interleaved Distributed Parity

Performance: Medium

Overhead: 20% in our example; with additional drives in the RAID group you can substantially bring down the overhead.

Minimum Number of Drives: 3

Data Loss: With one drive failure, no data loss. With multiple drive failures in the same RAID group, data loss will occur.

Example: For 5TB of usable space, we might need 6 x 1 TB drives

Advantages: It has the highest read data transaction rate and a medium write data transaction rate. A low ratio of ECC (parity) disks to data disks converts to high efficiency along with a good aggregate transfer rate.

Disadvantages: Disk failure has a medium impact on throughput. It also has the most complex controller design. It is often difficult to rebuild in the event of a disk failure (as compared to RAID level 1), and the individual block data transfer rate is the same as a single disk. Ask the PSEs about RAID 5 issues and data loss.

Hot Spare: Similar to RAID 3: a hot spare can be configured and invoked upon a drive failure, and rebuilt from the parity and surviving data devices. Upon drive replacement, the hot spare can be used to rebuild the replaced drive.

Supported: Clariion, Symmetrix DMX code 71

RAID Level 5 also relies on parity information to provide redundancy and fault tolerance using independent data disks with distributed parity blocks. Each entire data block is written onto a data disk; parity for blocks in the same rank is generated on Writes, recorded in a distributed location and checked on Reads.

This would classify as the most popular RAID technology in use today. A sketch of the rotating parity placement follows.
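
To show what "distributed parity" looks like, here is a tiny sketch of one common rotation (left-symmetric) for a 6-disk group; the exact rotation pattern varies by vendor, so treat it as illustrative.

```python
# RAID 5: the parity block rotates to a different disk on each stripe.

def raid5_stripe(stripe, num_disks=6):
    parity_disk = (num_disks - 1) - (stripe % num_disks)
    return ["P" if d == parity_disk else "D" for d in range(num_disks)]

for s in range(6):
    print(f"stripe {s}: {' '.join(raid5_stripe(s))}")
```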



RAID 6

Technology: Striping Data with Double Parity, Independent Data Disk with Double Parity

Performance: Medium

Overhead: 28% in our example; with additional drives you can bring down the overhead.

Minimum Number of Drives: 4

Data Loss: No data loss with one or even two drive failures in the same RAID group. Very reliable.

Example: For 5 TB of usable space, we might need 7 x 1TB drives

Advantages: RAID 6 is essentially an extension of RAID level 5 that allows for additional fault tolerance by using a second independent distributed parity scheme (two-dimensional parity). Data is striped on a block level across a set of drives, just as in RAID 5, and a second set of parity is calculated and written across all the drives. RAID 6 provides extremely high data fault tolerance and can sustain multiple simultaneous drive failures, which typically makes it a good fit for mission-critical applications.

Disadvantages: Very poor write performance, and it requires N+2 drives to implement because of the two-dimensional parity scheme.

Hot Spare: A hot spare can be invoked against a drive failure, rebuilt from the parity and data drives, and then, upon drive replacement, used to rebuild the replaced drive.

Supported: Clariion Flare 26, 28, Symmetrix DMX Code 72, 73

Clariion FLARE code 26 supports RAID 6. It is also being implemented with the 72 code on the Symmetrix DMX. The simplest explanation of RAID 6 is double the parity, which allows a RAID 6 RAID group to sustain two drive failures while maintaining access to the data.

RAID S (3+1)

Technology: RAID Symmetrix

Performance: High

Overhead: 25%

Minimum Number of Drives: 4

Data Loss: Upon two drive failures in the same Raid Group

Example: For 5 TB of usable space, 8 x 1 TB drives

Advantages: High Performance on Symmetrix Environment

Disadvantages: Proprietary to EMC. RAID S can be implemented on the Symmetrix 8000, 5000 and 3000 series. Known to have back-end issues with director replacements, SCSI chip replacements and back-end DA replacements, causing DU (data unavailable) situations or requiring offline procedures.

Hot Spare: Hot Spare can be invoked against a failed drive, data can be built from the parity or the data drives and upon a successful drive replacement, the hot spare can be used to rebuild the replaced drive.

Supported: Symmetrix 8000, 5000, 3000. With the DMX platform it is just called RAID (3+1)

EMC Symmetrix / DMX disk arrays use an alternate, proprietary method for parity RAID that they call RAID-S: three data drives along with one parity device. RAID-S is proprietary to EMC but seems to be similar to RAID 5, with some performance enhancements as well as the enhancements that come from having a high-speed disk cache on the array.

The data protection feature is based on a Parity RAID (3+1) volume configuration (three data volumes to one parity volume).

RAID (7+1)

Technology: RAID Symmetrix

Performance: High

Overhead: 12.5%

Minimum Number of Drives: 8

Data Loss: Upon two drive failures in the same Raid Group

Example: For 5TB of usable space, 8 x 1TB drives (you will actually get 7TB usable).

Advantages: High Performance on Symmetrix Environment

Disadvantages: Proprietary to EMC. Available only on the Symmetrix DMX series. Known to have a lot of back-end issues with director replacements and back-end DA replacements, since you have to verify the spindle locations; a cause for concern with DU (data unavailable) situations.

Hot Spare: Hot Spare can be invoked against a failed drive, data can be built from the parity or the data drives and upon a successful drive replacement, the hot spare can be used to rebuild the replaced drive.

Supported: On the DMX platform it is just called RAID (7+1). Not supported on the older Symmetrix models.

EMC DMX disk arrays use an alternate, proprietary method for parity RAID, simply called RAID (7+1): seven data drives along with one parity device. It is proprietary to EMC but seems to be similar to RAID-S or RAID 5, with some performance enhancements as well as the enhancements that come from having a high-speed disk cache on the array.

The data protection feature is based on a Parity RAID (7+1) volume configuration (seven data volumes to one parity volume).

EMC Symmetrix / DMX SRDF Setup

January 26th, 2009




This post talks about setting up basic SRDF functionality on Symmetrix / DMX machines using EMC Solutions Enabler SYMCLI.

For this setup, let's have two different hosts: our local host will see the R1 (source) volumes and our remote host will see the R2 (target) volumes.

A mix of R1 and R2 volumes can reside on the same Symmetrix; in short, you can configure SRDF between two Symmetrix machines so that each acts as local for some volumes and as remote for others.


Step 1

Create SYMCLI Device Groups. Each group can have one or more Symmetrix devices specified in it.

SYMCLI device group information (the name of the group, its type, members and any associations) is maintained in the SYMAPI database.

In the following we will create a device group that includes two SRDF volumes.

SRDF operations can be performed from the local host that has access to the source volumes or the remote host that has access to the target volumes. Therefore, both hosts should have device groups defined.

Complete the following steps on both the local and remote hosts.

a) Identify the SRDF source and target volumes available to your assigned hosts. Execute the following commands on both the local and remote hosts.

# symrdf list pd (execute on both local and remote hosts)

or

# syminq

b) To view all the RDF volumes configured in the Symmetrix, use the following:

# symrdf list dev

c) Display a synopsis of the symdg command and reference it in the following steps.

# symdg -h

d) List all device groups that are currently defined.

# symdg list

e) On the local host, create a device group of type RDF1. On the remote host, create a device group of type RDF2.

# symdg -type RDF1 create newsrcdg (on local host)

# symdg -type RDF2 create newtgtdg (on remote host)

f) Verify that your device group was added to the SYMAPI database on both the local and remote hosts.

# symdg list

g) Add your two devices to your device group using the symld command. Again, use -h for a synopsis of the command syntax.

On local host:

# symld -h

# symld -g newsrcdg add dev ###

or

# symld -g newsrcdg add pd Physicaldrive#

On remote host:

# symld -g newtgtdg add dev ###

or

# symld -g newtgtdg add pd Physicaldrive#

h) Using the syminq command, identify the gatekeeper devices. Determine whether each is currently defined in the SYMAPI database; if not, define it, and associate it with your device group.

On local host:

# syminq

# symgate list (Check SYMAPI)

# symgate define pd Physicaldrive# (to define)

# symgate -g newsrcdg associate pd Physicaldrive# (to associate)

On remote host:

# syminq

# symgate list (Check SYMAPI)

# symgate define pd Physicaldrive# (to define)

# symgate -g newtgtdg associate pd Physicaldrive# (to associate)

i) Display your device groups. The output is verbose so pipe it to more.

On local host:

# symdg show newsrcdg |more

On remote host:

# symdg show newtgtdg | more

j) Display a synopsis of the symld command.

# symld -h

k) Rename DEV001 to NEWVOL1

On local host:

# symld -g newsrcdg rename DEV001 NEWVOL1

On remote host:

# symld -g newtgtdg rename DEV001 NEWVOL1

l) Display the device group on both the local and remote hosts.

On local host:

# symdg show newsrcdg |more

On remote host:

# symdg show newtgtdg | more

Step 2

Use the SYMCLI to display the status of the SRDF volumes in your device group.

a) If on the local host, check the status of your SRDF volumes using the following command:

# symrdf -g newsrcdg query

Step 3

Set the default device group. You can use the SYMCLI_DG environment variable.

# set SYMCLI_DG=newsrcdg (on the local host)

# set SYMCLI_DG=newtgtdg (on the remote host)

a) Check the SYMCLI environment.

# symcli -def (on both the local and remote hosts)

b) Test to see if the SYMCLI_DG environment variable is working properly by performing a “query” without specifying the device group.

# symrdf query (on both the local and remote hosts)

Step 4

Changing Operational mode. The operational mode for a device or group of devices can be set dynamically with the symrdf set mode command.

a) On the local host, change the mode of operation for one of your SRDF volumes to enable semi-synchronous operations. Verify results and change back to synchronous mode.

# symrdf set mode semi NEWVOL1

# symrdf query

# symrdf set mode sync NEWVOL1

# symrdf query

b) Change mode of operation to enable adaptive copy-disk mode for all devices in the device group. Verify that the mode change occurred and then disable adaptive copy.

# symrdf set mode acp disk

# symrdf query

# symrdf set mode acp off

# symrdf query


Step 5

Check the communications link between the local and remote Symmetrix.

a) From the local host, verify that the remote Symmetrix is "alive". If the host is attached to multiple Symmetrix arrays, you may have to specify the Symmetrix serial number (SSN) through the -sid option.

# symrdf ping [ -sid xx ] (xx=last two digits of the remote SSN)

b) From the local host, display the status of the Remote Link Directors.

# symcfg -RA all list

c) From the local host, display the activity on the Remote Link Directors.

# symstat -RA all -i 10 -c 2

Step 6

Create a partition on each disk, format the partition and assign a filesystem to it. Add data to the R1 volumes defined in the newsrcdg device group.

Step 7

Suspend RDF Link and add data to filesystem. In this step we will suspend the SRDF link, add data to the filesystem and check for invalid tracks.

a) Check that the R1 and R2 volumes are fully synchronized.

# symrdf query

b) Suspend the link between the source and target volumes.

# symrdf suspend

c) Check link status.

# symrdf query

d) Add data to the filesystems.

e) Check for invalid tracks using the following command:

# symrdf query

f) Invalid tracks can also be displayed using the symdev show command. Execute the following command on one of the devices in your device group. Look at the Mirror set information.

On the local host:

# symdev show ###

g) From the local host, resume the link and monitor invalid tracks.

# symrdf resume

# symrdf query

In upcoming posts, we will set up some flags for SRDF, director types, etc.

Happy SRDF’ing!!!!!