Wednesday, September 23, 2015

[Cluster Computing]: How to Configure Linux Cluster with 2 Nodes on RedHat and CentOS

In an active-standby Linux cluster configuration, all critical services, including the IP address and the filesystem, fail over from one node to the other node in the cluster.
This tutorial explains in detail how to create and configure a two-node Red Hat cluster using command-line utilities.
The following are the high-level steps involved in configuring a Linux cluster on Red Hat or CentOS:
  • Install and start RICCI cluster service
  • Create cluster on active node
  • Add a node to cluster
  • Add fencing to cluster
  • Configure failover domain
  • Add resources to cluster
  • Sync cluster configuration across nodes
  • Start the cluster
  • Verify failover by shutting down an active node

1. Required Cluster Packages

First, make sure the following cluster packages are installed. If any of them are missing, install them using the yum command.
[root@rh1 ~]# rpm -qa | egrep -i "ricci|luci|cluster|ccs|cman"
modcluster-0.16.2-28.el6.x86_64
luci-0.26.0-48.el6.x86_64
ccs-0.16.2-69.el6.x86_64
ricci-0.16.2-69.el6.x86_64
cman-3.0.12.1-59.el6.x86_64
clusterlib-3.0.12.1-59.el6.x86_64
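If you want to script the check, a minimal sketch might look like the following. The package list simply mirrors the rpm output above, and the function names (pkg_installed, missing_pkgs) are made up for this example.

```shell
# Sketch: report which of the expected cluster packages are not installed.
# PKGS mirrors the rpm output above; function names are illustrative only.

PKGS="ricci luci ccs cman modcluster clusterlib"

# Returns 0 if the given package is installed according to rpm.
pkg_installed() {
    rpm -q "$1" >/dev/null 2>&1
}

# Prints the name of every argument that is not an installed package.
missing_pkgs() {
    local p
    for p in "$@"; do
        pkg_installed "$p" || echo "$p"
    done
}

# Usage (as root, on each node):
#   missing=$(missing_pkgs $PKGS)
#   [ -n "$missing" ] && yum install -y $missing
```

Nothing is installed until you run the yum line yourself, so the functions are safe to try on a live node.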

2. Start RICCI service and Assign Password

Next, start the ricci service on both nodes.
[root@rh1 ~]# service ricci start
Starting oddjobd:                                          [  OK  ]
generating SSL certificates...  done
Generating NSS database...  done
Starting ricci:                                            [  OK  ]
You also need to assign a password to the ricci user on both nodes.
[root@rh1 ~]# passwd ricci
Changing password for user ricci.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
Also, if you are running the iptables firewall, keep in mind that you need appropriate firewall rules on both nodes so that they can talk to each other.
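As a rough starting point, the sketch below prints (but does not apply) iptables rules for the ports commonly listed for RHEL 6 clustering. The port list is an assumption taken from the Red Hat cluster documentation and should be checked against your release. The peer address 192.168.1.11 is a placeholder and the function name is made up.

```shell
# Sketch: print (not apply) iptables rules for the cluster ports.
# Port list is an assumption based on RHEL 6 cluster documentation:
#   udp 5404,5405 (cman/corosync), tcp 11111 (ricci),
#   tcp 21064 (dlm), tcp 16851 (modclusterd)

cluster_fw_rules() {
    local peer="$1" port
    echo "iptables -I INPUT -s $peer -p udp -m multiport --dports 5404,5405 -j ACCEPT"
    for port in 11111 21064 16851; do
        echo "iptables -I INPUT -s $peer -p tcp --dport $port -j ACCEPT"
    done
}

# 192.168.1.11 is a placeholder for the other node's address.
cluster_fw_rules 192.168.1.11
```

Review the printed rules, run them as root on each node with the other node's address, and then persist them with "service iptables save".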

3. Create Cluster on Active Node

From the active node, run the command below to create a new cluster.
The command creates the cluster configuration file /etc/cluster/cluster.conf. If the file already exists, it is replaced with the newly created cluster.conf.
[root@rh1 ~]# ccs -h rh1.mydomain.net --createcluster mycluster
rh1.mydomain.net password:

[root@rh1 ~]# ls -l /etc/cluster/cluster.conf
-rw-r-----. 1 root root 188 Sep 26 17:40 /etc/cluster/cluster.conf
Also keep in mind that we are running these commands from only one node in the cluster, and we are not yet ready to propagate the changes to the other node.

4. Initial Plain cluster.conf File

After creating the cluster, the cluster.conf file will look like the following:
[root@rh1 ~]# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="1" name="mycluster">
  <fence_daemon/>
  <clusternodes/>
  <cman/>
  <fencedevices/>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>

5. Add a Node to the Cluster

Once the cluster is created, we need to add the participating nodes to it using the ccs command.
First, add the first node, rh1, to the cluster as shown below.
[root@rh1 ~]# ccs -h rh1.mydomain.net --addnode rh1.mydomain.net
Node rh1.mydomain.net added.
Next, add the second node rh2 to the cluster as shown below.
[root@rh1 ~]# ccs -h rh1.mydomain.net --addnode rh2.mydomain.net
Node rh2.mydomain.net added.
Once the nodes are added, you can use the following command to view all the nodes in the cluster. It also displays the node ID of each node.
[root@rh1 ~]# ccs -h rh1 --lsnodes
rh1.mydomain.net: nodeid=1
rh2.mydomain.net: nodeid=2

6. cluster.conf File After Adding Nodes

The commands above also add the nodes to the cluster.conf file, as shown below.
[root@rh1 ~]# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="3" name="mycluster">
  <fence_daemon/>
  <clusternodes>
    <clusternode name="rh1.mydomain.net" nodeid="1"/>
    <clusternode name="rh2.mydomain.net" nodeid="2"/>
  </clusternodes>
  <cman/>
  <fencedevices/>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>
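One thing the file above does not show: cman normally needs more than half of the votes to keep quorum, so two-node clusters are usually run with two_node="1" and expected_votes="1" on the cman element. The sketch below is a simple line-oriented check for that. It assumes the one-element-per-line layout that ccs writes, and the function name is hypothetical.

```shell
# Sketch: warn when a cluster.conf defines exactly two nodes but cman is not
# in two-node mode. Line-oriented check, not a real XML parser.

check_two_node() {
    local conf="$1" nodes
    nodes=$(grep -c '<clusternode ' "$conf")
    if [ "$nodes" -eq 2 ] && ! grep -q '<cman[^>]*two_node="1"' "$conf"; then
        echo "warning: 2 nodes but two_node is not set"
        return 1
    fi
    echo "ok"
}

# Usage: check_two_node /etc/cluster/cluster.conf
# Possible fix (run once, from one node):
#   ccs -h rh1 --setcman two_node=1 expected_votes=1
```

Whether you need this depends on your quorum setup, so treat the warning as a prompt to check rather than a hard error.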

7. Add Fencing to Cluster

Fencing is the disconnection of a node from shared storage. Fencing cuts off I/O from shared storage, thus ensuring data integrity.
A fence device is a hardware device that can be used to cut a node off from shared storage.
This can be accomplished in a variety of ways: powering off the node via a remote power switch, disabling a Fibre Channel switch port, or revoking a host’s SCSI-3 reservations.
A fence agent is a software program that connects to a fence device in order to ask the fence device to cut off access to a node’s shared storage (via powering off the node or removing access to the shared storage by other means).
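Execute the following commands to set the fence daemon properties. Note that each --setfencedaemon call resets any property it does not mention to its default, which is why only post_join_delay appears in cluster.conf afterwards (post_fail_delay=0 is the default anyway).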
[root@rh1 ~]# ccs -h rh1 --setfencedaemon post_fail_delay=0
[root@rh1 ~]# ccs -h rh1 --setfencedaemon post_join_delay=25
Next, add a fence device. There are different types of fencing devices available. If you are building the cluster on virtual machines, use the fence_virt device as shown below.
[root@rh1 ~]# ccs -h rh1 --addfencedev myfence agent=fence_virt
Next, add a fencing method. After creating the fencing device, you need to create the fencing method and add the hosts to it.
[root@rh1 ~]# ccs -h rh1 --addmethod mthd1 rh1.mydomain.net
Method mthd1 added to rh1.mydomain.net.

[root@rh1 ~]# ccs -h rh1 --addmethod mthd1 rh2.mydomain.net
Method mthd1 added to rh2.mydomain.net.
Finally, associate the fence device with the method created above, as shown below:
[root@rh1 ~]# ccs -h rh1 --addfenceinst myfence rh1.mydomain.net mthd1
[root@rh1 ~]# ccs -h rh1 --addfenceinst myfence rh2.mydomain.net mthd1
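With more than a couple of nodes, the per-node steps above get repetitive. A minimal sketch of a loop, using only the ccs options already shown (the function name fence_nodes is made up for this example):

```shell
# Sketch: repeat the per-node fencing steps for a list of nodes.
# Uses only the ccs options shown above; the function name is illustrative.
#
# fence_nodes HOST DEVICE METHOD NODE...

fence_nodes() {
    local host="$1" device="$2" method="$3" node
    shift 3
    for node in "$@"; do
        # Create the method on the node, then attach the device to it.
        ccs -h "$host" --addmethod "$method" "$node" &&
        ccs -h "$host" --addfenceinst "$device" "$node" "$method" || return 1
    done
}

# Usage: fence_nodes rh1 myfence mthd1 rh1.mydomain.net rh2.mydomain.net
```

The loop stops at the first ccs failure, so a typo in a node name does not leave later nodes half-configured without you noticing.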

8. cluster.conf File after Fencing

Your cluster.conf will look like the following after the fencing device and methods are added.
[root@rh1 ~]# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="10" name="mycluster">
  <fence_daemon post_join_delay="25"/>
  <clusternodes>
    <clusternode name="rh1.mydomain.net" nodeid="1">
      <fence>
        <method name="mthd1">
          <device name="myfence"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="rh2.mydomain.net" nodeid="2">
      <fence>
        <method name="mthd1">
          <device name="myfence"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman/>
  <fencedevices>
    <fencedevice agent="fence_virt" name="myfence"/>
  </fencedevices>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>
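A quick sanity check at this point is to confirm that every node actually has a fence device attached. The sketch below is a line-oriented awk check that assumes the one-element-per-line layout ccs writes (as in the listing above). It is not a real XML parser, and the function name is hypothetical.

```shell
# Sketch: list cluster nodes that have no fence device in a cluster.conf.
# Assumes the one-element-per-line layout that ccs writes.

nodes_without_fence() {
    awk '
        /<clusternode / {
            # Extract the value of the name attribute.
            name = $0
            sub(/.*name="/, "", name)
            sub(/".*/, "", name)
            has_dev = 0
            # A self-closing clusternode element has no fence block at all.
            if ($0 ~ /\/>/) { print name; name = "" }
            next
        }
        /<device / { has_dev = 1 }
        /<\/clusternode>/ {
            if (name != "" && !has_dev) print name
            name = ""
        }
    ' "$1"
}

# Usage: nodes_without_fence /etc/cluster/cluster.conf
```

For the configuration shown above, the function should print nothing, since both nodes reference myfence.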

9. Types of Failover Domain

A failover domain is an ordered subset of cluster members to which a resource group or service may be bound.
The following are the different types of failover domains:
  • Restricted failover-domain: Resource groups or services bound to the domain may only run on cluster members that are also members of the failover domain. If no members of the failover domain are available, the resource group or service is placed in the stopped state.
  • Unrestricted failover-domain: Resource groups bound to this domain may run on all cluster members, but will run on a member of the domain whenever one is available. This means that if a resource group is running outside of the domain and a member of the domain transitions online, the resource group or service will migrate to that cluster member.
  • Ordered domain: Nodes in an ordered domain are assigned a priority level from 1 to 100, with 1 being the highest and 100 the lowest. The online node with the highest priority runs the resource group. If the resource group was running on node 2, it will migrate to node 1 when node 1 comes online.
  • Unordered domain: Members of the domain have no order of preference. Any member may run the resource group. Resource groups will always migrate to members of their failover domain whenever possible.
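To make the rules above concrete, here is a small shell sketch that mimics how a node would be chosen for an ordered domain, restricted or not. It is only an illustration of the rules as described here, not rgmanager's actual logic, and the function and argument names are hypothetical.

```shell
# Sketch of the placement rules described above; NOT rgmanager code.
#
# pick_node RESTRICTED "name:prio ..." "online nodes ..."
#   RESTRICTED  1 for a restricted domain, 0 for unrestricted
#   members     domain members as name:priority (1 = highest priority)
#   online      cluster nodes that are currently up
# Prints the node that should run the service, or "stopped".

pick_node() {
    local restricted="$1" members="$2" online="$3"
    local best="" best_prio=101 m name prio n

    for m in $members; do
        name=${m%%:*}
        prio=${m##*:}
        for n in $online; do
            if [ "$n" = "$name" ] && [ "$prio" -lt "$best_prio" ]; then
                best=$name
                best_prio=$prio
            fi
        done
    done

    if [ -n "$best" ]; then
        echo "$best"                 # an online domain member wins
    elif [ "$restricted" = "1" ] || [ -z "$online" ]; then
        echo "stopped"               # restricted: nowhere else to go
    else
        echo "${online%% *}"         # unrestricted: any online node will do
    fi
}

pick_node 0 "rh1:1 rh2:2" "rh1 rh2"   # prints rh1
pick_node 0 "rh1:1 rh2:2" "rh2"       # prints rh2
pick_node 1 "rh1:1" "rh2"             # prints stopped
pick_node 0 "rh1:1" "rh2"             # prints rh2
```

The second call corresponds to rh1 going down, and the first to rh1 coming back online and taking the service back.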

10. Add a Failover Domain

To add a failover domain, execute the following command. In this example, I created a domain named “webserverdomain”:
[root@rh1 ~]# ccs -h rh1 --addfailoverdomain webserverdomain ordered
Once the failover domain is created, add both the nodes to the failover domain as shown below:
[root@rh1 ~]# ccs -h rh1 --addfailoverdomainnode webserverdomain rh1.mydomain.net priority=1

[root@rh1 ~]# ccs -h rh1 --addfailoverdomainnode webserverdomain rh2.mydomain.net priority=2
You can view all the nodes in the failover domain using the following command.
[root@rh1 ~]# ccs -h rh1 --lsfailoverdomain
webserverdomain: restricted=0, ordered=1, nofailback=0
  rh1.mydomain.net: 1
  rh2.mydomain.net: 2

11. Add Resources to Cluster

Now it is time to add resources. These are the services that should also fail over, along with the IP address and filesystem, when a node fails. For example, the Apache web server can be part of the failover in the Red Hat Linux cluster.
When you are ready to add resources, there are two ways to do it.
You can add a resource as a global resource, or add it directly to a resource group or service.
The advantage of a global resource is that if you want to use the resource in more than one service group, you can simply reference it from each service or resource group.
In this example, we add the filesystem on shared storage as a global resource and reference it from the service.
[root@rh1 ~]# ccs -h rh1 --addresource fs name=web_fs device=/dev/cluster_vg/vol01 mountpoint=/var/www fstype=ext4
To add a service to the cluster, create a service and add the resource to the service.
[root@rh1 ~]# ccs -h rh1 --addservice webservice1 domain=webserverdomain recovery=relocate autostart=1
Now add the following lines inside the webservice1 service element in cluster.conf to reference the resources from the service. In this example, we also add a failover IP to our service.
  <fs ref="web_fs"/>
  <ip address="192.168.1.12" monitor_link="yes" sleeptime="10"/>
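As an alternative to editing the file by hand, the same references can probably be added with the --addsubservice option of ccs. The sketch below wraps that in a function. The function name is made up, and passing the ip attributes inline in this way is an assumption, so verify the result before relying on it. If you do edit cluster.conf manually instead, remember to increment config_version.

```shell
# Sketch: attach the resources to the service with ccs instead of editing
# cluster.conf by hand. --addsubservice is a ccs option; passing the ip
# attributes inline like this is an assumption, so check the result with
# "ccs -h rh1 --lsservices" before relying on it.
#
# Either way, the <rm> section should end up containing roughly:
#   <service autostart="1" domain="webserverdomain" name="webservice1" recovery="relocate">
#     <fs ref="web_fs"/>
#     <ip address="192.168.1.12" monitor_link="yes" sleeptime="10"/>
#   </service>

add_service_refs() {
    local host="$1" service="$2"
    ccs -h "$host" --addsubservice "$service" fs ref=web_fs &&
    ccs -h "$host" --addsubservice "$service" ip address=192.168.1.12 monitor_link=yes sleeptime=10
}

# Usage: add_service_refs rh1 webservice1
```

The commented XML shows where the two lines belong: between the opening and closing tags of the service, not at the top level of the rm section.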
In the 2nd part of this tutorial (tomorrow), we’ll explain how to sync the configurations across multiple nodes in a cluster, and how to verify the failover scenario in a cluster setup.


 
