InterWorx Clustering Overview

What is clustering?

Clustering is a very generic term in computing, but in general:

A computer cluster is a group of linked computers, working together closely thus in many respects forming a single computer. - Wikipedia "Computer Cluster"


In the web hosting industry, clustering allows hosting companies to use multiple servers to serve HTTP requests, run FTP services, and handle email, while giving the impression that all of these services run on one server. Historically, the problem with clustering for most hosting companies has been setup and maintenance. Setting up and maintaining a cluster traditionally requires a system administrator to architect, install, and monitor it. One must consider questions such as:

  • How is data synchronized across servers?
  • Where is data stored?
  • How are incoming requests delegated to the servers in the cluster?
  • What happens if a server requires maintenance?
  • How is account setup handled across all servers?
  • Where do users manage their website from? Can they?

Some of these questions can be answered by purchasing a dedicated hardware solution. These solutions work well, but at a cost that is exorbitant for the average web host. And while a hardware solution may solve the technical problems, the long-term management problems remain.

Why InterWorx?

The InterWorx Control Panel greatly simplifies this process. Our solution lets multiple InterWorx servers work in cooperation to present a single HTTP, HTTPS, SMTP, IMAP, POP3, and FTP server to the Internet. These services appear to be served from one IP/machine, regardless of which server in the cluster actually handles the request. The benefits of using our cluster include:

  • A cluster presented as a single InterWorx instance can handle much heavier traffic loads on cheaper hardware.
  • A cluster presented as a single InterWorx instance can host a much larger number of sites on cheaper hardware.
  • If desired, each server can be dedicated to handling a single service.
  • Alternatively, incoming requests can be load balanced across all machines in the cluster, so the workload is shared and there is redundancy should any server other than the Cluster Manager go down.
  • The cluster can be expanded later as demand rises, simply by adding more servers to it.
  • Easy setup and maintenance through the convenience of our control panel.

This documentation explains how an InterWorx cluster operates and demonstrates valid cluster configurations.

Definitions

  • InterWorx Cluster: An InterWorx Cluster is two or more servers running the InterWorx Control Panel software, with one of the servers playing the role of Cluster Manager, and the other server(s) playing the role of Cluster Node.
  • Cluster Manager (CM): In an InterWorx Cluster, the Cluster Manager is a server running InterWorx software that has been set up in Cluster Manager mode. This is the master server for the InterWorx Cluster, and it is usually the server you log into when managing the cluster through InterWorx. The CM also serves as the load balancer for the cluster. Cluster Managers can play other roles as well, as described in the Cluster Manager section below.
  • Cluster Node (Node): In an InterWorx Cluster, a Cluster Node is a server running InterWorx software that has been set up in Cluster Node mode. Cluster Nodes are added to a cluster via the NodeWorx interface of a Cluster Manager server. Cluster Nodes are usually AppServers.
  • AppServer: The term "AppServer" refers to a server in the cluster that is tasked with running hosting services (web, mail, DNS, etc.). Cluster Nodes are usually AppServers. The Cluster Manager can also be an AppServer (usually referred to as AppServer0), but this is not required.
  • Load Balancer: The software that distributes incoming requests to Nodes according to the policies configured via NodeWorx on the Cluster Manager server.
  • Load Balancer Policy: The rules the load balancer follows to distribute incoming requests. A policy specifies the IP address on which to listen for requests, the service and port it applies to, which Nodes in the cluster receive load-balanced requests, and the scheduling rule used to decide which Node receives each request.
  • Command Queue: The Command Queue is the mechanism that replicates and synchronizes changes made via InterWorx on the Cluster Manager to the Cluster Nodes. The Command Queue process runs on Cluster Nodes only.
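The Load Balancer Policy definition above bundles a listening IP, a service and port, a set of Nodes, and a scheduling rule. A minimal Python sketch of that idea follows. The class and field names are hypothetical (not InterWorx's internal schema), and only round-robin, the rule LVS calls `rr`, is implemented.

```python
from dataclasses import dataclass, field
from itertools import cycle
from typing import Iterator, List


@dataclass
class LoadBalancerPolicy:
    """Hypothetical model of a load balancer policy (not InterWorx's schema)."""
    listen_ip: str          # IP address the load balancer listens on
    service: str            # e.g. "http"
    port: int               # port the policy applies to
    nodes: List[str]        # Nodes that receive load-balanced requests
    scheduler: str = "rr"   # rule for choosing a Node; "rr" = round-robin
    _cycle: Iterator[str] = field(init=False, repr=False, compare=False)

    def __post_init__(self) -> None:
        if not self.nodes:
            raise ValueError("a policy needs at least one node")
        self._cycle = cycle(self.nodes)

    def pick_node(self) -> str:
        """Return the Node that should receive the next request."""
        if self.scheduler != "rr":
            raise NotImplementedError("this sketch only implements round-robin")
        return next(self._cycle)


policy = LoadBalancerPolicy("192.0.2.10", "http", 80, ["node1", "node2", "node3"])
print([policy.pick_node() for _ in range(5)])
# ['node1', 'node2', 'node3', 'node1', 'node2']
```

Round-robin is the simplest rule to show; LVS also offers weighted and connection-count based schedulers, which would replace `pick_node` without changing the policy's other fields.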

InterWorx Cluster Setup

There is a wide array of options for how to set up your cluster. To address them fully, we have a section dedicated to cluster setup.

Data and Service Synchronization

A critical part of a functional cluster is data and service synchronization among the servers in the cluster. InterWorx expects all servers in the cluster to share a storage device for users' website and mail storage. This device can be the Cluster Manager server (shared via NFS), or it can be an external shared storage device.

System service configurations are kept in sync via the InterWorx Command Queue. Because system service configurations differ slightly from server to server, changes to system services must be made on a per-server basis. A change to the port SSH uses, for example, can't be made by editing a file on a shared storage device. The Command Queue daemon runs on all Cluster Nodes and performs tasks on each Node when necessary to keep system configurations in sync.

The Command Queue can also trigger events such as web server restarts on all Cluster Nodes, in response to a restart request made from the Cluster Manager.
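The Command Queue flow described above can be pictured as a per-Node queue that the Cluster Manager appends to and each Node's daemon drains, applying every change locally. The sketch below is hypothetical: the command names (`set_ssh_port`, `restart_web`), the in-memory `deque` standing in for the real transport, and the handler logic are assumptions made for illustration, not InterWorx's implementation.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

Command = Tuple[str, dict]  # (action name, arguments); format is hypothetical


class ClusterManager:
    """Hypothetical CM side: fans each change out to every Node's queue."""

    def __init__(self, node_names: List[str]) -> None:
        self.queues: Dict[str, Deque[Command]] = {n: deque() for n in node_names}

    def broadcast(self, action: str, **args) -> None:
        for queue in self.queues.values():
            queue.append((action, args))


class NodeDaemon:
    """Hypothetical Node side: applies queued commands to local config, in order."""

    def __init__(self, name: str, queue: Deque[Command]) -> None:
        self.name = name
        self.queue = queue
        self.config = {"ssh_port": 22}   # stands in for per-server config files
        self.log: List[str] = []
        self.handlers: Dict[str, Callable[[dict], None]] = {
            "set_ssh_port": self._set_ssh_port,
            "restart_web": self._restart_web,
        }

    def _set_ssh_port(self, args: dict) -> None:
        # Per-server change: cannot be done by editing one file on shared storage.
        self.config["ssh_port"] = args["port"]
        self.log.append(f"ssh port -> {args['port']}")

    def _restart_web(self, args: dict) -> None:
        # A real daemon would restart the local web server here.
        self.log.append("web server restarted")

    def process_pending(self) -> None:
        while self.queue:
            action, args = self.queue.popleft()
            self.handlers[action](args)


cm = ClusterManager(["node1", "node2"])
nodes = [NodeDaemon(n, q) for n, q in cm.queues.items()]
cm.broadcast("set_ssh_port", port=2222)
cm.broadcast("restart_web")
for node in nodes:
    node.process_pending()
    print(node.name, node.config, node.log)
```

Processing commands in FIFO order matters: each Node ends up with the same sequence of changes the Cluster Manager issued.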

Load Balancing

The load balancer, provided by the Linux Virtual Server Project (LVS), is a robust software load balancing solution. LVS is used in many large installations and is a proven technology. If you're already familiar with LVS, you'll be happy to know that you can manage the LVS configuration from the command line, as you can with most InterWorx services. We also provide a graphical interface on the Cluster Manager for configuring all aspects of the load balancing setup, including RRD graphs that let you track load balancer statistics over time.

InterWorx Control Panel is currently capable of load balancing the following services:

  • HTTP
  • HTTPS
  • IMAP
  • SMTP
  • POP3
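NodeWorx manages the LVS configuration for you, but it can help to see roughly what an LVS virtual service for one of the services above looks like when expressed with the standard `ipvsadm` tool. The sketch below only builds command strings and does not run them. The `wlc` scheduler, direct-routing forwarding (`-g`), weight 1, and the example addresses are assumptions, not InterWorx defaults.

```python
from typing import List

# Standard well-known TCP ports for the services listed above.
SERVICE_PORTS = {"http": 80, "https": 443, "imap": 143, "smtp": 25, "pop3": 110}


def ipvsadm_commands(vip: str, service: str, real_servers: List[str],
                     scheduler: str = "wlc", forwarding: str = "-g") -> List[str]:
    """Build ipvsadm command lines for one load-balanced TCP service.

    Assumptions: TCP virtual service, direct routing ("-g") by default,
    weight 1 for every real server. Nothing is executed.
    """
    port = SERVICE_PORTS[service]
    vs = f"{vip}:{port}"
    # -A adds the virtual service; -s selects the scheduling rule.
    cmds = [f"ipvsadm -A -t {vs} -s {scheduler}"]
    for rs in real_servers:
        # -a adds a real server (Node) behind the virtual service.
        cmds.append(f"ipvsadm -a -t {vs} -r {rs}:{port} {forwarding} -w 1")
    return cmds


for line in ipvsadm_commands("192.0.2.10", "http", ["10.0.0.11", "10.0.0.12"]):
    print(line)
# ipvsadm -A -t 192.0.2.10:80 -s wlc
# ipvsadm -a -t 192.0.2.10:80 -r 10.0.0.11:80 -g -w 1
# ipvsadm -a -t 192.0.2.10:80 -r 10.0.0.12:80 -g -w 1
```

Each virtual service gets one `-A` line plus one `-a` line per Node, which mirrors the policy fields: listening IP and port, scheduler, and the Nodes that receive requests.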

For details, see the load balancer documentation.

Clustering Mini-FAQ

Q. Is this a true load balancing solution?

A. Yes. Domains are served from every Cluster Node, and the Cluster Manager acts as the load balancer for the cluster using LVS.

Q. Don't session based scripts break when used in a cluster?

A. No. Sessions are handled by LVS persistence (node affinity): requests from a given client IP are directed to the same Cluster Node for a period of time, so session state stored on that node is preserved. Applications that store their sessions in a database do not require persistence.
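Persistence as described in this answer can be sketched as a table mapping each client IP to the Node it was last sent to, plus the time of that request. This is a simplified model, not LVS's actual implementation: the 300-second window, the round-robin fallback, and the injectable clock are assumptions chosen to keep the example deterministic.

```python
import time
from itertools import cycle
from typing import Callable, Dict, List, Tuple


class PersistentScheduler:
    """Simplified source-IP persistence (node affinity) on top of round-robin.

    A client IP keeps going to the same Node until `timeout` seconds pass
    without a request from that IP; after that it is scheduled afresh.
    """

    def __init__(self, nodes: List[str], timeout: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rr = cycle(nodes)
        self.timeout = timeout
        self.clock = clock
        self._affinity: Dict[str, Tuple[str, float]] = {}  # ip -> (node, last seen)

    def route(self, client_ip: str) -> str:
        now = self.clock()
        entry = self._affinity.get(client_ip)
        if entry is not None and now - entry[1] < self.timeout:
            node = entry[0]          # still inside the persistence window
        else:
            node = next(self._rr)    # new or expired client: pick the next Node
        self._affinity[client_ip] = (node, now)
        return node


now = [0.0]  # fake clock so the example is deterministic
sched = PersistentScheduler(["node1", "node2", "node3"], timeout=300, clock=lambda: now[0])
first = sched.route("198.51.100.7")   # node1
other = sched.route("203.0.113.9")    # node2
now[0] = 100
again = sched.route("198.51.100.7")   # node1: same client inside the window
now[0] = 500
later = sched.route("198.51.100.7")   # node3: window expired, rescheduled
print(first, other, again, later)
```

Because the same client lands on the same Node while its window is open, session files written on that Node stay reachable; once the window lapses, the client may move, which is why database-backed sessions do not depend on persistence.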

Q. What box do my users connect to for mail, FTP, etc?

A. This is up to you. Users can connect to any AppServer, including the Cluster Manager, to use IMAP/POP3/FTP services. Users can also send mail from any server in the cluster. It comes down to an architectural decision: which IPs you publish to your users for a given service, and which load balancing rules you set up.

Q. How does the load balancer work?

A. See the load balancer documentation.

Q. What platforms does the clustering solution run on?

A. Red Hat Enterprise Linux 4, CentOS 4, Red Hat Enterprise Linux 5, and CentOS 5 are all supported.

Q. Can the servers in my cluster run different operating systems?

A. No, all servers (CM and Nodes) MUST have the same OS.

Q. Can servers in the cluster be of different architectures?

A. No, all servers in a cluster MUST be either i386 OR x86_64.

Q. Does the clustering solution cost more than the base product?

A. No, any non-VPS unlimited domain license can run the clustering solution.

Q. How is DNS handled? Is there a master/slave setup between the nodes?

A. InterWorx Control Panel uses a database back-end to store all DNS data. This means that each node in the cluster, including the Cluster Manager, sees exactly the same DNS data. You can use any of the nodes as a full-fledged DNS server for sites in the cluster without having to set up a master/slave system. It is a truly clustered DNS setup.
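The point of a database-backed DNS store is that every node reads the same records, so no zone transfer from a master to slaves is needed. A toy illustration follows, with one in-memory SQLite connection standing in for the shared database. The `records` table, its columns, and the `DnsNode` class are hypothetical and do not reflect InterWorx's actual schema or DNS server software.

```python
import sqlite3

# Hypothetical stand-in for the shared DNS database: one table of records.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE records (name TEXT, type TEXT, value TEXT)")


class DnsNode:
    """A cluster node that answers lookups straight from the shared database."""

    def __init__(self, name: str, conn: sqlite3.Connection) -> None:
        self.name = name
        self.conn = conn

    def lookup(self, qname: str, qtype: str = "A") -> list:
        rows = self.conn.execute(
            "SELECT value FROM records WHERE name = ? AND type = ? ORDER BY value",
            (qname, qtype),
        )
        return [value for (value,) in rows]


manager, node1 = DnsNode("cm", db), DnsNode("node1", db)
# A change written once (e.g. through the control panel) ...
db.execute("INSERT INTO records VALUES ('example.com', 'A', '192.0.2.10')")
db.commit()
# ... is visible to every node immediately, with no master/slave sync step.
print(manager.lookup("example.com"), node1.lookup("example.com"))
```

In a real cluster each node would reach the database over the network rather than share one connection, but the consequence is the same: there is a single copy of the DNS data, so there is nothing to replicate between nodes.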

Q. Does each cluster node require a license?

A. Yes, every node in the cluster requires an unlimited domain InterWorx Control Panel license.

Q. Is this a high-availability solution?

A. Currently, the Cluster Manager server is still a single point of failure in an InterWorx cluster setup: if the Cluster Manager goes down, the entire cluster goes down. However, if one or more services on a Cluster Node become unavailable, the Cluster Manager detects the failure and removes that node from the relevant load balancing policy until the service on that node is restored. This means you can temporarily shut down services, or even an entire node, without disrupting the cluster, provided the remaining servers can handle the load normally handled by the downed server. High-availability support is in active development.
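The failure handling described above amounts to periodically health-checking each node's service port and keeping only the passing nodes in the policy. How InterWorx performs its checks is not described here; the sketch below assumes a plain TCP connect check (a common approach in LVS health monitoring), and the function names are hypothetical.

```python
import socket
from typing import Callable, List


def tcp_check(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def active_nodes(nodes: List[str], port: int,
                 check: Callable[[str, int], bool] = tcp_check) -> List[str]:
    """Nodes that currently pass the health check for one policy's port."""
    return [n for n in nodes if check(n, port)]


# Usage with a fake check, so the example runs without real servers.
down = {"node2"}  # pretend node2's web server is stopped


def fake_check(host: str, port: int) -> bool:
    return host not in down


before = active_nodes(["node1", "node2", "node3"], 80, fake_check)
down.clear()      # service on node2 restored
after = active_nodes(["node1", "node2", "node3"], 80, fake_check)
print(before, after)
# ['node1', 'node3'] ['node1', 'node2', 'node3']
```

A real monitor would run this on a timer for every policy. A node whose check fails simply drops out of the returned list and rejoins once its service answers again, which is why the remaining nodes must have enough capacity to absorb its share of the load.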