One article to understand SAN network composition and daily operation and maintenance

1. What is SAN network?

  • Baidu Encyclopedia:

A Storage Area Network (SAN) uses Fibre Channel (FC; spelled "Fibre", not "Fiber") technology to connect storage arrays and server hosts through FC switches, forming a dedicated network for data storage. After more than ten years of development, SAN has matured and become the de facto standard in the industry. However, each manufacturer's Fibre Channel switching technology is not exactly the same, so compatibility between servers and SAN storage has to be checked.

SAN focuses on the unique problems of enterprise-level storage. The two root causes of the problems encountered by current enterprise storage solutions are: the structural limitations caused by the close integration of data and application systems, and the limitations of the Small Computer System Interface (SCSI) standard. Most analysts believe that SAN is the future enterprise-level storage solution because SAN is easy to integrate, can improve data availability and network performance, and can also reduce management operations.

  • Personal understanding:

Baidu Encyclopedia's explanations tend to be obscure and hard to follow, but this is the official definition after all. What follows is how I see a SAN myself:

A SAN (Storage Area Network) is, as the name implies, a network dedicated to storage. The term originally referred mainly to FC-SAN. Today the common types are FC-SAN, IP-SAN and IB-SAN. FC-SAN carries the SCSI protocol over the Fibre Channel protocol, while IP-SAN carries SCSI over TCP/IP. This article focuses on FC-SAN.

When I think of a SAN, one picture comes to mind: multiple hosts connected to back-end storage and tape libraries through FC switches. How exactly they are connected depends on the specific requirements, as shown in the figure.

All subsequent content is based on this diagram, and SAN operation and maintenance work should improve and optimize on the basis of it.

2. SAN network composition

Following the diagram at the end of Section 1, a SAN consists of four parts: the host layer, the switching layer, the storage layer, and SAN monitoring. The sections below go through each part in detail.

2.1 Host layer

The ultimate goal of any SAN is to provide a platform that supports the upper-level business systems and stores their data. Business systems run on an operating system, and beneath it sits the host hardware, whether in a virtualized environment or on a physical machine. The connection to the SAN is always made through physical hardware first; any virtualization platform settings come after that.

The host layer is mainly divided into two camps: x86 and minicomputers. As technology matures and business solutions gain more and better options, the x86 camp has made great progress and may well become the overwhelming trend.

Many brands of host equipment are on the market and are not listed here one by one. To connect a host to the SAN, an HBA card must be installed on the host side. Common manufacturers include Emulex, QLogic, IBM, DELL and Brocade. Cards from Emulex and QLogic are shown below.

As shown in the figure:

The HBA cards used in enterprises generally have LC connectors. Since most switches also use LC connectors, the fiber cables we use are mostly LC-LC.

HBA cards have developed over many years. Interface rates grew from the original 1 Gb/s and 2 Gb/s to 4 Gb/s and 8 Gb/s in earlier years, and then to 16 Gb/s more recently. The mainstream in the market is still 8 Gb/s and 16 Gb/s HBA cards, and cards have moved from a single port to dual-port and quad-port designs. New orders rarely include single-port cards, unless they come from existing stock. Below is a picture of the development of FC. I found it in another article and saved it because it is very clear.
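To put these port speeds in perspective, the following minimal sketch converts an FC generation and port count into aggregate nominal bandwidth. The throughput table uses the commonly quoted rounded figures per port and direction (about 100 MB/s per 1 Gb/s of nominal speed), not measurements, and the helper name `hba_bandwidth_mbps` is purely illustrative.

```python
# Commonly quoted nominal throughput per port and direction (MB/s) for each
# Fibre Channel generation. Rounded figures, not measured values.
FC_THROUGHPUT_MBPS = {1: 100, 2: 200, 4: 400, 8: 800, 16: 1600, 32: 3200}


def hba_bandwidth_mbps(speed_gbps: int, ports: int) -> int:
    """Aggregate nominal one-direction bandwidth of an HBA (hypothetical helper)."""
    if speed_gbps not in FC_THROUGHPUT_MBPS:
        raise ValueError(f"unknown FC generation: {speed_gbps} Gb/s")
    if ports < 1:
        raise ValueError("an HBA needs at least one port")
    return FC_THROUGHPUT_MBPS[speed_gbps] * ports


if __name__ == "__main__":
    # Compare a few typical card layouts.
    for speed, ports in [(8, 2), (16, 2), (16, 4)]:
        total = hba_bandwidth_mbps(speed, ports)
        print(f"{ports}-port {speed} Gb/s HBA: {total} MB/s nominal")
```

In practice the usable figure also depends on the host bus, multipathing policy and the storage behind the fabric, so treat the result as an upper bound.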


Fiber Optic Cable:

In order for the HBA card to connect to the fiber optic switch, there must be a cable in the middle. Here is a fiber optic cable with an LC-LC interface type commonly used in enterprises.

Other fiber connector types exist, such as SC, FC and ST, but multimode LC-LC cables are the most commonly used. I will not go into the difference between single-mode and multimode here; if you are interested, it is easy to look up.

2.2 Switching Layer

After the host, HBA card and fiber cable, we come to the switching layer, whose core components are the FC switch and the SFP module. The FC switch market is crowded: almost every hardware manufacturer has its own products, some built on the manufacturer's own technology and some rebadged OEM models. The following figure shows the products of the major mainstream manufacturers and how they relate (please visit the corresponding official websites for the latest products):

Here is a separate picture of Brocade's product line. The latest products can be found on the official website.

Most readers have probably come across Ethernet switches with fiber ports. A small SAN can basically be served by one or a few FC switches, working independently or cascaded. In a very large SAN, relying on cascading alone is inefficient and carries a high risk of failure, and a router is needed. Just as large TCP/IP networks need routers, so do large SANs. Routing lets switches interconnect and communicate to form a large and complex network. However, because the standards do not regulate this clearly, routing between products of different manufacturers has many problems, and good compatibility is usually found only between products of the same manufacturer.

The mainstream FC switches on the market have interface rates of 16 Gb/s and 32 Gb/s and offer 8 to 80 ports. Higher port counts require a chassis-based switch that accepts additional port blades in its backplane; see the previous picture for products. FC switches can also be purchased, activated and expanded on demand. In other words, you can initially buy a configuration with only 1/4 or 1/2 of the ports enabled, enough to meet business needs, and later add SFP modules and port activation licenses as needed.

Here is a screenshot of an SFP module:

Small enterprises use FC switches in two main ways: independent and cascaded. An independent switch is not interconnected with other switches and suits environments that are small and simple. When requirements are mixed, for example many hosts need to connect to the switch and redundancy and sharing are required, cascading should be considered to expand the number of switch ports.

2.2.1 Cascade:

Brocade switches are cascaded using ISLs. ISL stands for Inter-Switch Link, a connection between two SAN switches made through E_Ports.

Before cascading FC switches, consider the following points and prepare in advance.

1. The license required for cascading

2. Switch firmware (microcode) version

3. Domain ID settings

4. Choice of cascade mode
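The Domain ID item in the checklist above lends itself to a simple pre-check, because switches that share a Domain ID cannot merge into one fabric. The sketch below validates a planned assignment before any cable is connected. The function name and the sample plan are hypothetical; the 1 to 239 range is the Domain ID range used by Fibre Channel fabrics.

```python
from collections import Counter

# Fibre Channel switch Domain IDs are limited to the range 1-239.
MIN_DOMAIN_ID, MAX_DOMAIN_ID = 1, 239


def check_domain_ids(switches: dict[str, int]) -> list[str]:
    """Return a list of problems found in a {switch_name: domain_id} plan."""
    problems = []
    # Range check for every switch in the plan.
    for name, domain_id in switches.items():
        if not MIN_DOMAIN_ID <= domain_id <= MAX_DOMAIN_ID:
            problems.append(f"{name}: domain ID {domain_id} is out of range")
    # Uniqueness check across the whole planned fabric.
    counts = Counter(switches.values())
    for domain_id, count in sorted(counts.items()):
        if count > 1:
            names = sorted(n for n, d in switches.items() if d == domain_id)
            problems.append(f"domain ID {domain_id} is shared by {', '.join(names)}")
    return problems


if __name__ == "__main__":
    # Hypothetical plan: two switches were left at the same ID by mistake.
    plan = {"B80": 1, "B5K": 2, "B24": 1}
    for problem in check_domain_ids(plan):
        print(problem)
```

An empty result means the plan passes these two checks only; license and firmware compatibility still have to be verified against the vendor documentation.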

A project I have done:

The B80 is cascaded to each of the other switches. The B80 and B5K carry core services, and the B24 carries development and testing. Each pair of switches is connected by two cables, as shown in the figure.

The port types on the B80 after cascading are shown in the figure:

When cascading switches from different manufacturers, first check the device compatibility documentation. To maintain good compatibility, try to avoid directly interconnecting switches of different brands.
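When deciding how many cables to use between cascaded switches, a rough oversubscription figure is a useful sanity check. The sketch below divides the bandwidth of the edge ports whose traffic may cross the ISLs by the total ISL bandwidth. The function and the numbers in the example are illustrative assumptions, not values from the project described above.

```python
def isl_oversubscription(edge_ports: int, edge_speed_gbps: float,
                         isl_count: int, isl_speed_gbps: float) -> float:
    """Ratio of edge-port bandwidth to ISL bandwidth (hypothetical helper)."""
    if isl_count < 1 or isl_speed_gbps <= 0:
        raise ValueError("at least one ISL with a positive speed is required")
    if edge_ports < 0 or edge_speed_gbps < 0:
        raise ValueError("edge port count and speed must not be negative")
    return (edge_ports * edge_speed_gbps) / (isl_count * isl_speed_gbps)


if __name__ == "__main__":
    # Assumed example: 20 host ports at 8 Gb/s behind two 8 Gb/s ISLs.
    ratio = isl_oversubscription(edge_ports=20, edge_speed_gbps=8,
                                 isl_count=2, isl_speed_gbps=8)
    print(f"oversubscription {ratio:.1f}:1")
```

What ratio is acceptable depends on the workload; the calculation only shows how quickly two cables become a bottleneck as hosts are added.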

2.2.2 Zone Settings

The most common operation at the switch level is zoning. Common zoning methods are alias-based, port-based and WWN-based. Here is a brief introduction to zone settings; for detailed usage, refer to the relevant documentation.
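For WWN-based zoning, WWNs copied from hosts, arrays and spreadsheets often arrive in different notations. The following sketch normalizes a 64-bit WWN to lowercase colon-separated hex before it is used in a zone definition. The function name and the sample WWN are made up for illustration.

```python
import re

# A WWN is 64 bits, i.e. 16 hexadecimal digits once separators are removed.
_HEX16 = re.compile(r"[0-9a-f]{16}")


def normalize_wwn(raw: str) -> str:
    """Return a WWN as lowercase colon-separated hex pairs."""
    digits = re.sub(r"[:\-\s]", "", raw).lower()
    if not _HEX16.fullmatch(digits):
        raise ValueError(f"not a 64-bit WWN: {raw!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 16, 2))


if __name__ == "__main__":
    # Hypothetical WWPN in three notations; all normalize to the same string.
    for raw in ("10000000C9123456", "10-00-00-00-c9-12-34-56", "10:00:00:00:C9:12:34:56"):
        print(normalize_wwn(raw))
```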

Zoning involves the following steps:

1. Set the Domain ID

2. Create a zone with zonecreate

3. Create a zone configuration with cfgcreate

4. Add the zone created in step 2 to the configuration created in step 3

5. Save the configuration with cfgsave

6. Enable the configuration with cfgenable so that it takes effect

The principle and implementation are similar across vendors. Zoning on Brocade and Cisco products is the current mainstream. The configuration methods differ slightly but are not difficult to understand. Cisco zoning is built mainly around the concepts of VSAN and zoneset, which will be familiar to anyone with Cisco experience. Here are two zoning examples, one for Brocade and one for Cisco, for reference:

(1) Brocade

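zonecreate "power750_ds8100", "1,20; 1,0; 1,1"
cfgcreate "b80_config", "power750_ds8100"
cfgsave
cfgenable "b80_config"

Note that cfgcreate expects at least one member, so the zone is passed when the configuration is created. To add further zones to an existing configuration, use cfgadd "b80_config", "<zone_name>".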
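When many zones have to be created, generating the command list from a small data structure helps keep names and members consistent. The sketch below builds the Brocade command sequence for port-based zones. It only produces text and does not talk to a switch, and it assumes the form in which cfgcreate receives the zone list directly. The function name is hypothetical; the zone and configuration names are the ones from the example above.

```python
def brocade_zoning_commands(cfg_name: str, zones: dict[str, list[str]]) -> list[str]:
    """Build zoning commands for port-based ("domain,port") zones.

    Returns plain command strings for review; nothing is sent to a switch.
    """
    if not zones:
        raise ValueError("a zone configuration needs at least one zone")
    commands = []
    for zone_name, members in zones.items():
        if not members:
            raise ValueError(f"zone {zone_name!r} has no members")
        member_list = "; ".join(members)
        commands.append(f'zonecreate "{zone_name}", "{member_list}"')
    # Assumption: all zones are passed to cfgcreate as its member list.
    zone_list = "; ".join(zones)
    commands.append(f'cfgcreate "{cfg_name}", "{zone_list}"')
    commands.append("cfgsave")
    commands.append(f'cfgenable "{cfg_name}"')
    return commands


if __name__ == "__main__":
    plan = {"power750_ds8100": ["1,20", "1,0", "1,1"]}
    for line in brocade_zoning_commands("b80_config", plan):
        print(line)
```

Reviewing the generated list before pasting it into a session is the point of the exercise, since enabling a configuration changes the active zoning of the fabric.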

(2) Cisco:

Create a zone:

conf t
zone name crm vsan 100
member interface fc1/20
member interface fc1/21
member interface fc1/22
exit

Create a zoneset and add the zone to it:

zoneset name zoneset_crm vsan 100
member crm
exit

Activate the zoneset and save the configuration:

zoneset activate name zoneset_crm vsan 100
copy running-config startup-config

These two examples cover only a small zoning scenario from daily work; I hope they are a useful reference. Switch operation and maintenance involves many commands, and knowing them well saves time when diagnosing faults. Build up knowledge of the specific switch series you run, and make use of monitoring tools as well.

2.3 Storage Layer

The storage layer is a collective term for the devices attached to the FC switches to provide data storage, such as disk arrays, tape libraries and NAS devices. Most storage devices have two or more controllers, which connect to the FC switches by fiber. Zones are configured on the switches, the storage identifies the hosts, and LUNs are mapped to them, after which the hosts can use the storage, tape libraries and other devices.

Connecting storage to the FC switch:

  • Storage SFPs usually run at 8 Gb/s or 16 Gb/s, which is relatively high. It is recommended to connect storage to the core switch, where it serves a large number of hosts.
  • The links between the storage controllers and the FC switches must be redundant. Connect at least 2 fiber jumpers to each controller; 4 is ideal.

There is not much more to say about the devices at this layer. Their main tasks in the SAN are to connect to the switch, identify hosts, carve out LUNs and map them to hosts. Other functions such as logical groups, LUN stripe size and snapshots are not covered in detail here.
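The link-count recommendation above can be turned into a quick audit of a cabling plan. The sketch below classifies each controller by its number of links to the fabric, using the thresholds from the text (at least 2, ideally 4). The function name, status labels and controller names are illustrative assumptions.

```python
def check_controller_links(links: dict[str, int],
                           minimum: int = 2, ideal: int = 4) -> dict[str, str]:
    """Classify each controller by its number of links to the FC switches."""
    status = {}
    for controller, count in links.items():
        if count < minimum:
            status[controller] = "insufficient"  # no redundancy on this controller
        elif count < ideal:
            status[controller] = "redundant"     # meets the minimum
        else:
            status[controller] = "ideal"
    return status


if __name__ == "__main__":
    # Hypothetical dual-controller array plus one miscabled controller.
    plan = {"ctrl_a": 4, "ctrl_b": 2, "ctrl_c": 1}
    for controller, state in check_controller_links(plan).items():
        print(f"{controller}: {state}")
```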

2.4 Monitoring Layer

Monitoring is a powerful tool for operation and maintenance, and using it well should be a priority for any operations team, because it keeps us one step ahead of problems. In an era where data is king, the quality of service directly affects the road ahead.

Monitoring alone is neither fully accurate nor complete, but it is indispensable. Outside large companies and the Internet industry, few enterprises have a SAN monitoring solution in place. Here I would like to share the good experiences of community members in this area.

Product selection:

  1. Open source software includes STOR2RRD; commercial software includes Brocade Network Advisor, IBM TPC and SolarWinds Storage Resource Monitor.
  2. Brocade Network Advisor + IBM TPC, using the free edition of Brocade Network Advisor.
  3. IBM TPC and HP SOM: foreign products are expensive and lack customization support. Domestic products are more flexible and work on the same principle, collecting device configuration data through SMI-S, SNMP and similar interfaces.
  4. Nagios, Zabbix, Cacti.

For monitoring FC switches, the tools provided by the switch manufacturer are the best choice where available. Some products can monitor a complete path across host links, FC switches and storage. They cover more ground but are more complicated to implement. Choose according to the needs of your company.

Storage Management Monitoring:

Storage management and monitoring products currently all collect device data (configuration, capacity, performance, alarms and so on) through SMI-S, SNMP, CLI and similar methods. There are many such products; they differ mainly in device support and compatibility.

Functions:

  1. Automatic SAN topology discovery
  2. Configuration and asset information
  3. Capacity information
  4. Performance information
  5. Alarm information
  6. Reports, etc.
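As an illustration of the capacity and alarm functions listed above, the sketch below turns collected capacity readings into warnings once a pool crosses a usage threshold. How the readings are collected (SMI-S, SNMP or CLI) is outside the sketch. The data class, its field names, the 80% default and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PoolSample:
    """One capacity reading for a storage pool (hypothetical record layout)."""
    array: str
    pool: str
    used_gib: float
    total_gib: float


def capacity_alerts(samples: list[PoolSample], warn_pct: float = 80.0) -> list[str]:
    """Return one warning line per pool at or above the usage threshold."""
    alerts = []
    for sample in samples:
        if sample.total_gib <= 0:
            continue  # skip malformed readings instead of dividing by zero
        used_pct = 100.0 * sample.used_gib / sample.total_gib
        if used_pct >= warn_pct:
            alerts.append(f"{sample.array}/{sample.pool}: {used_pct:.1f}% used")
    return alerts


if __name__ == "__main__":
    # Made-up readings for demonstration only.
    readings = [
        PoolSample("array1", "pool0", used_gib=850, total_gib=1000),
        PoolSample("array1", "pool1", used_gib=300, total_gib=1000),
    ]
    for line in capacity_alerts(readings):
        print(line)
```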

3. Common Problems in SAN Network Operation and Maintenance

The following are several typical problems raised in community exchanges. Thanks to ACDante, Pan Yansheng, aix7, crystalwmagic, oniontech, fuwangrong and others for sharing.

1. How to avoid SAN network cable chaos and how to plan line connections

Cabling cannot be separated from the installation planning of machine room cabinets, switches and equipment, and the most suitable plan depends on the environment. Points to consider include where equipment is installed, whether the switches follow an EOR, MOR or TOR architecture, whether a high-count fiber distribution frame is available, whether label names are standardized, and whether cabinets, equipment and switches are named consistently.
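Standardized label names are easier to enforce when they are generated rather than typed by hand. The sketch below builds a cable label from the cabinet, device and port at each end. The CABINET-DEVICE-PORT scheme, the allowed characters and the sample names are assumptions for illustration; adapt them to your own naming standard.

```python
import re

# Assumed rule: label fields may contain letters, digits and "/" (for slot/port).
_FIELD = re.compile(r"[A-Za-z0-9/]+")


def endpoint_label(cabinet: str, device: str, port: str) -> str:
    """Build one cable end as CABINET-DEVICE-PORT in upper case."""
    for field in (cabinet, device, port):
        if not _FIELD.fullmatch(field):
            raise ValueError(f"invalid label field: {field!r}")
    return f"{cabinet}-{device}-{port}".upper()


def cable_label(end_a: tuple[str, str, str], end_b: tuple[str, str, str]) -> str:
    """Describe a cable by both of its endpoints."""
    return f"{endpoint_label(*end_a)} <-> {endpoint_label(*end_b)}"


if __name__ == "__main__":
    # Hypothetical link from a host HBA port to a switch port.
    print(cable_label(("A03", "power750", "fcs0"), ("B01", "b80", "20")))
```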

2. Smooth data migration in SAN environment

When conditions permit, use the available tools to complete the migration, such as:

1) Commands

mirrorvg, migratepv, migratelp, mklvcopy, cplv, backup, restore, etc. on the AIX platform

2) Storage functions

Snapshots, storage replication, virtual storage gateways, etc.

3. SAN environment troubleshooting

1) Indicator lights (understand the meaning of various indicator lights)

2) Is the problem common or isolated? Take an I/O problem as an example: I once saw I/O performance degrade on multiple nodes of a cluster, and the root cause turned out to be a controller fault on a DS8000.

3) HBA card failure

Example: the alarm light on the storage was on. Logging in to the storage manager showed that a link had failed over. Having seen something similar before, I suspected an abnormal HBA card in a host of the VMware cluster. Checking the link status, the HBA hardware status and the switch ports quickly located the faulty HBA.

4) Scanning issues

For example, I once configured a three-node cluster in a VMware environment. When storage was added, the rescan was abnormally slow: an operation that should finish in a few seconds took nearly a minute. The system also performed poorly in use, with I/O and response problems. After the HBA card was replaced, the scan immediately returned to normal and everything was fine.

5) Abnormal switch connection

For example, I once configured a B24 over SSH. The device was already in service; the connection dropped immediately after it was established and I could not connect at all. The serial connection worked. Probing the telnet port and pinging the IP address both succeeded, but neither SSH nor telnet could log in. After I replaced the network cable, everything returned to normal.

Sometimes the switch's telnet service stops responding and you need to log in through the serial port. Restarting the telnet service also solves the problem.
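A first step in cases like these is to separate "the port is unreachable" from "the port answers but login fails". The sketch below checks whether the SSH and telnet ports of a management address accept TCP connections, using only the Python standard library. The function names and the sample address are hypothetical. As the B24 case shows, an open port does not prove that login will work, so a positive result only narrows the search.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_management_ports(host: str, ports: tuple[int, ...] = (22, 23)) -> dict[int, bool]:
    """Probe the SSH (22) and telnet (23) ports of a switch management address."""
    return {port: tcp_port_open(host, port) for port in ports}


if __name__ == "__main__":
    # Placeholder address; replace with the switch management IP.
    for port, is_open in check_management_ports("192.0.2.10").items():
        print(f"port {port}: {'open' if is_open else 'closed or filtered'}")
```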

6) Two application tips for SAN networks

http://www.talkwithtrend.com/Question/408427

4. Summary

This article has introduced SAN networks from the whole down to the details, covered both composition and principles, and shared a number of operation and maintenance problems through real examples. The key points to take away are:

Good planning is the basis of good operation and maintenance

Good monitoring is the guarantee of operation and maintenance

Every detail deserves attention during operation and maintenance

Every failure and every experience should be summarized and learned from

For the technologies and products mentioned in this article, the latest official information takes precedence. If you have any questions, please leave a comment in the community.

Recommended materials/articles:

SAN Complete Manual, very valuable

http://www.talkwithtrend.com/Document/detail/tid/162771
