Wednesday, January 11, 2006

How Multiple Server Hosting impacts your website's uptime

by: Godfrey E. Heron

This article describes the technology behind multiple server hosting and how you can use it to maximize your site's security and uptime.

Hosting of web sites has essentially become a commodity. There is very little distinguishing one hosting company from the next. Core plans and features are the same, and price is no longer a true differentiator. In fact, choosing a host based on the cheapest price can be more expensive in the long term because of reliability issues and possible loss of sales caused by website downtime. Selecting a host from the thousands of providers and resellers can be a daunting task, and it often turns into a hit-and-miss approach. But although hosting may have become a commodity, one feature you must always look for is reliability. At the heart of any hosting company's reliability is redundancy: if a problem occurs at one point, there is an alternative that keeps the service running as seamlessly and transparently as possible.

Most hosts do employ redundant network connections: the high-speed pipes that route data from the server to your web browser. Redundant 'multiple web servers', however, have been extremely rare and very expensive, requiring costly routing equipment that was previously used only in the mission-critical applications of Fortune 500 companies. But a very neat, little-known Domain Name Server (DNS) feature called 'round robin' allows a particular IP address to be selected and served from a 'pool' of addresses each time a DNS request arrives.

To understand what this has to do with server reliability, remember that the DNS database maps a host name to its IP address. Instead of typing a hard-to-remember series of numbers (the IP address), you just type www.yourdomain.com into your web browser to get to your website. Typically it takes at least 2 to 3 days to propagate, or 'spread the word' about, your DNS information throughout the internet. That is why a domain name isn't immediately available to people browsing the web when you register or transfer it. This delay has stymied the benefits of hosting your site on multiple servers: if something went awry with one server, your site would be down for a couple of days while you changed your DNS to point at your second server and waited for the change to be picked up by routers across the internet.

The round robin DNS strategy solves this predicament by mapping your domain name to more than one IP address. Select hosting companies now employ the round robin technique in conjunction with 'failover monitoring'. The process starts with the hosting company setting up your site on two or more independent web servers (preferably with different IP blocks assigned to them), so your domain name has two or more IP addresses assigned to it. The failover monitor then watches your web servers by sending a request to a URL you specify and looking for particular text in the result. When the system detects that one of your IP addresses is returning an error and the others aren't, it pulls that IP address out of the list, and the DNS points your domain name only at the working IP address(es). If any of your IPs come back online, they are restored to the pool. This effectively and safely keeps your site online even if one of your web servers is down.

The average failure detection and recovery time with a system like this can be as low as 15 minutes. The exact time varies with the speed of your site, the nature of the failure, and how long other ISPs cache (save) your DNS information. That caching can be influenced from the failover monitor by lowering the 'time to live' (TTL) setting, which tells other ISPs how long to cache your DNS information. You must also bear in mind how frequently data is synchronized between your website's servers. This is the hosting company's responsibility, and it can become complicated where databases and user sessions are involved.

The very expensive hardware-based failover systems that present a single virtual IP address to other ISPs, while juggling a number of unique IP addresses on different servers behind the scenes, are of course the most 'elegant' solution to multi-server hosting; with them, the whole issue of ISPs caching your information never comes into play.
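
As a rough illustration of the failover monitoring described above (and not any particular host's actual system), the sketch below checks each server in a small pool by fetching a health-check URL and looking for an expected piece of text. The IP addresses, path, and text are invented for the example.

# Hypothetical sketch of a round-robin DNS failover monitor.
# The IP addresses, check path, and expected text are invented for
# illustration; a real hosting company's system would differ.
import urllib.request

SERVER_IPS = ["203.0.113.10", "198.51.100.20"]  # example pool of A records
CHECK_PATH = "/health.html"                     # URL path the customer specifies
EXPECTED_TEXT = "OK"                            # text to look for in the response

def healthy(ip):
    """Fetch the check URL from one server and look for the expected text."""
    try:
        with urllib.request.urlopen("http://%s%s" % (ip, CHECK_PATH), timeout=10) as resp:
            return EXPECTED_TEXT in resp.read().decode(errors="replace")
    except OSError:
        return False

# IPs that pass the check stay in the pool; a real system would publish this
# list as the domain's A records and pull failing IPs until they come back.
working = [ip for ip in SERVER_IPS if healthy(ip)]
print("Working IP addresses to publish in DNS:", working if working else "none - raise an alert")

Lowering the zone's TTL, as described above, is what lets other ISPs pick up such changes within minutes rather than days.
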
Therefore, for sites that need true 99.99995% uptime without huge outlays of money, the technology is readily available, and certain proprietary failover monitoring systems are now relatively cheap to deploy.

About The Author: Godfrey Heron is the Website Manager of the Irieisle Multiple Domain Hosting Services company. Sign up for your free trial and host multiple web sites on one account: http://www.irieisle-online.com

Tuesday, January 10, 2006

Buffer Underrun and Overrun Scenarios

By Stephen Bucaro

Buffer underrun and buffer overrun are occurrences that
can result in some very frustrating errors. This is not a
"how-to" article about fixing buffer underrun and buffer
overrun errors, but a basic description of what a buffer
is, why we need buffers, and what causes buffer underrun
and buffer overrun.

Buffer Underrun

Buffer underrun occurs most commonly with CD recorders.
Let's imagine an example of a CD recording session. The
computer has an ATA hard drive capable of transferring data
at a rate of 8 MBps (megabytes per second). The CD recorder
has a recording rate of 8 MBps.
Everything should work fine, right?

Note: The data transfer rates mentioned in this article do
not apply to any specific device. They're just for purposes
of discussion.

The 8 MBps specification for the hard drive is for "burst"
mode. In other words, it can transfer data at a rate of
8 MBps for only a few seconds. Then the transfer rate drops
much lower, and if the hard drive hasn't been maintained,
for example it has not been defragmented recently, the
transfer rate can drop even lower.

Whereas a hard drive can skip from cluster to cluster
while reading and writing, a CD recorder must burn the data
track in a continuous stream without stopping. The design
of a CD recorder requires a "sustained" transfer rate.

When two devices that operate at different transfer rates
must communicate, we can make them work together by placing
a buffer between them. A buffer is a block of memory, like
a bucket for bytes. When you start the CD recording session,
the hard drive begins filling the buffer. When the buffer
is almost full, the CD recorder begins drawing bytes out of
the buffer.

If everything goes smoothly, the hard drive will be able
to keep enough bytes in the buffer so that the speedy CD
recorder won't empty the buffer. If the buffer runs dry,
the CD recorder has no data to burn into the CD, so it
stops. Buffer underrun error.
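
To make the bucket picture concrete, here is a toy
simulation of the scenario above. The buffer size and
transfer rates are invented numbers, not figures from any
real drive or recorder.

# Toy simulation of buffer underrun: a "hard drive" that delivers data
# in a short burst and then slows down feeds a fixed-size buffer, while
# a "CD recorder" drains it at a steady rate.  All numbers are invented.

BUFFER_SIZE = 8                              # buffer capacity, in blocks
drive_delivery = [8, 8, 2, 2, 2, 2, 2, 2]    # blocks the drive delivers each tick
RECORDER_RATE = 4                            # blocks the recorder burns each tick

buffer_level = 0
for tick, delivered in enumerate(drive_delivery, start=1):
    buffer_level = min(BUFFER_SIZE, buffer_level + delivered)  # drive fills the bucket
    if buffer_level < RECORDER_RATE:
        print("tick %d: buffer underrun - only %d blocks left, recorder stops"
              % (tick, buffer_level))
        break
    buffer_level -= RECORDER_RATE                              # recorder drains the bucket
    print("tick %d: %d blocks remain in the buffer" % (tick, buffer_level))

Once the drive's burst is over, the steady four-block drain
outruns the two-block refill and the bucket runs dry.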

We can reduce the chances of buffer underrun by configuring
a larger buffer. Then the hard drive will be able to put
more bytes in the bucket before the CD recorder starts
drawing them out. However, sometimes you can't increase the
size of the buffer because the computer doesn't have a
large amount of RAM installed. When the computer needs more
RAM, it uses "virtual" RAM. That is, it allocates part of
the hard disk and pretends like that's RAM. Now, even though
you've increased the size of the buffer, you have caused
the hard drive to work even slower.

Buffer Overrun

Buffer overrun occurs most commonly with video recorders.
Let's imagine an example of a video camera
connected to a computer. The video camera records at a data
rate of 168 MBps. The computer monitor is capable of
displaying data at a rate of only 60 MBps. We have a big
problem, right?

Thanks to MPEG compression, we might not have as big a
problem as first appears. With MPEG compression, the video
camera does not have to send the entire image for every
frame. It sends only the data for the part of the image
that changed, and it compresses that part.

If the image doesn't change much, and the part that changed
compresses well, the video camera might need to transfer at
a rate of only a few MBps. But if the entire image changes
every frame and the image does not compress well, the video
camera might transfer data at a higher rate than the
computer monitor is capable of displaying.

Again, we have two devices that operate at different
transfer rates that must communicate. We can make them work
together by placing a buffer between them. When you start
recording video, the video camera starts filling the
buffer. The computer display immediately begins pulling
data out of the buffer to compose display frames.

If everything goes smoothly, the computer display will be
pulling data out of the buffer fast enough so that the
buffer never completely fills. If the buffer fills up, the
video camera can't put any more data in, so it stops.
Buffer overrun error.
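
Mirroring the underrun sketch earlier, here is a toy
simulation of this scenario. Again, the buffer size and
data rates are invented numbers rather than figures from
any real camera or display.

# Toy simulation of buffer overrun: a "video camera" whose data rate
# jumps when the whole image changes feeds a fixed-size buffer, while
# the "display" drains it at a steady rate.  All numbers are invented.

BUFFER_SIZE = 16                          # buffer capacity, in blocks
camera_output = [2, 2, 12, 12, 12, 2]     # blocks the camera produces each tick
DISPLAY_RATE = 4                          # blocks the display consumes each tick

buffer_level = 0
for tick, produced in enumerate(camera_output, start=1):
    if buffer_level + produced > BUFFER_SIZE:
        print("tick %d: buffer overrun - no room for %d blocks, camera stops"
              % (tick, buffer_level + produced - BUFFER_SIZE))
        break
    buffer_level += produced                               # camera fills the bucket
    buffer_level = max(0, buffer_level - DISPLAY_RATE)     # display drains the bucket
    print("tick %d: %d blocks in the buffer" % (tick, buffer_level))

A few low-motion frames, as described below, would give the
display a chance to catch up before the bucket fills.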

We can reduce the chances of buffer overrun by defining a
larger buffer. Then the video camera will be able to put
more bytes in the bucket before it fills up. Hopefully,
the video camera will run into a few frames where the
entire image doesn't change, reducing its data transfer
rate enough so the computer display can catch up.

Underrun, Overrun Protection

Today, CD recorder buffer underrun is much less common.
Computers come with much more RAM than they did before,
and CD recorders have learned to monitor the buffer and
reduce the recording speed if the buffer starts to run low.

Video camera buffer overrun is also less common. Video uses
a program called a "codec" (coder/decoder). A smart
codec can monitor the buffer and reconfigure itself when
the buffer gets too full. It might for example automatically
reduce the color depth of the video, or drop frames, until
the computer display catches up.

Underrun and overrun protection doesn't completely solve
the problem. If underrun protection activates, a CD
recording session will take much longer. If overrun
protection activates, the video quality will be reduced.
The only way to solve underrun and overrun problems, after
increasing the size of the buffer, is to match the data
transfer rates of the devices that need to communicate.
You can upgrade to a faster hard drive or install a
high-performance video card.

Now, if you need to troubleshoot buffer underrun or buffer
overrun errors, at least you know what a buffer is, why we
need buffers, and what causes buffer underrun and buffer
overrun errors.

----------------------------------------------------------
Resource Box:
Copyright (C) 2004 Bucaro TecHelp. To learn how to maintain
your computer and use it more effectively to design a Web
site and make money on the Web, visit bucarotechelp.com.
To subscribe to the Bucaro TecHelp Newsletter, send a blank
email to subscribe@bucarotechelp.com
----------------------------------------------------------

Wednesday, June 15, 2005

Looking Ahead

I've made a series of changes to the IT infrastructure recently. We have acquired some new server hardware and I'm converting from Debian to CentOS 4 (Redhat EL4).

My goal is to provide a 2-server HA cluster using DRBD (to keep data in sync) and Heartbeat on CentOS. This improves the server situation in 3 ways:

1) Real time data backup to a hot spare.
2) Automatic failover to the hot spare.
3) CentOS 4 is based on Redhat EL 4, so future support is easier.

So far I have the new hardware assembled. CentOS is installed on one machine and being installed on the second as of now. I also have DRBD compiled on the primary machine. As this work is being done I am documenting everything.

I am used to being the solo IT department, but I realize that making plans for future support is good. Maybe someday I can even take a vacation without worrying too much.

The openMosix stuff was fun to play with, but it's really unnecessary in our situation. The CPU demands on our server are extremely low. There is no problem using a P3-500 to do the job. We have even run on a P2-300 during an emergency.

Eventually I would like to be able to configure a cluster with n nodes that can be seamlessly added or removed. Currently I do not know how this is done. The only way I have seen it done is with a load-balancing front end, and to me that seems like a single-point-of-failure situation.

Wednesday, January 12, 2005

Firefox - The World's Best Web Browser

or

Better (than bad), safer (than tragic), taller (yes)

This post is a work in progress.

Firefox is your run-of-the-mill web browser, not unlike MS Internet Explorer. However, it comes to us from the open source community, and with that come a number of advantages. It was built on the ashes of the old Netscape browser. It has a few nice new features, runs a bit better, and is safer to use than IE.

History
Netscape open-sourced its Communicator source code in 1998, in an effort to "harness the power of thousands of open-source coders around the world".

Features
One of the major problems with Microsoft "winning" the earlier browser wars was that they essentially stopped developing their web browser. Since Firefox is an open source effort, lots and lots of people can offer their suggestions and ideas, and assist in making them happen. See the bottom of the Firefox main page for more details on the browser's features.

Extensions allow third-party developers to create a huge variety of plug-ins that add all sorts of new functionality. The plug-ins page has an enormous list of new things Firefox can do, from ad blocking to web development tools to an egg timer...

Accessibility
Although there seems to be a lot of activity in this area, so far support for accessibility software is limited. There are some details on this here.

Beyond the above info, one thing that is nice to know is that since this is an open source project, developers will have direct access to the source code. This should make development work in these areas MUCH easier than dealing with Microsoft.

The most serious problem I see is that official JAWS support doesn't seem to exist. I did come across a Firefox extension (or at least talk about one), so it might be possible that a third party will release "JAWS support" before Freedom Scientific does.

I'll keep an eye out to see what develops in the accessibility area. The moral of the story, I guess, is that for the most part we need to keep IE available to people who rely on accessibility software. I'll encourage Firefox use for everyone else, though.

Wednesday, January 05, 2005

This looks interesting

Pen

"This is pen, a load balancer for "simple" tcp based protocols such as http or smtp. It allows several servers to appear as one to the outside and automatically detects servers that are down and distributes clients among the available servers. This gives high availability and scalable performance."

Tuesday, January 04, 2005

Switching gears: Heartbeat

After achieving minor initial success with Chaos and openMosix (which I have yet to document here), I started to work on something more directly and immediately beneficial: Heartbeat.

What my organization really needs is high availability (aka HA). Better performance would be a bonus, but HA is what I am really after. The closer we get to no single point of failure, the better. So what I wanted was a cluster or cluster-like environment where the specific machines were more or less disposable. With fairly demanding server needs and no budget, the goal is reliability through redundancy.

After reading around the web it became clear that Heartbeat would be worth spending some time with. Heartbeat is a software package that allows one machine to take over for another in the event of failure. In its basic configuration it doesn't allow multiple servers to share the work, but rather has one machine take over for another when the first fails. Sort of a hot-spare setup.

I used 2 of my cluster machines to experiment with Heartbeat. Heartbeat is available through apt, so installing was simply:


apt-get install heartbeat


Then I roughly followed this guide: Getting Started with Linux-HA (heartbeat). I used different IP addresses to suit my network and more or less followed the examples at that site. I set up Samba and Apache to test availability. In Samba I set up a home share with different files on each machine to indicate which machine was serving them, and I did the same for Apache, i.e. I changed the default index.html page to include the hostname.

This all worked very well. More to come on this subject.

Thursday, December 30, 2004

Linux Clustering: First Experiment

Now that I had 4 computers built that would be available for clustering, I wanted to do some experimenting. This turned out to be a failure, but I'll describe what I did anyway.

For my first experiment I wanted to create a simple 2 node cluster.

To make sure the hardware / networking was working properly, I did some testing with Debian.


Testing Hardware / Networking

1) I connected a network card from each computer together directly with a crossover cable.

2) Booted Debian, and logged in as root.

3) Once in Debian, I configured networking:

On the first computer:
ifconfig eth0 192.168.1.10
route add -net 0.0.0.0 gw 192.168.1.1

And on the second computer:
ifconfig eth0 192.168.1.20
route add -net 0.0.0.0 gw 192.168.1.1

4) Then I tested network connectivity from the first computer (192.168.1.10):

ping 192.168.1.20

And this confirmed a working connection.



Everything was working perfectly in Debian, so I concluded the hardware and networking were in good working order. The next step was to do some clustering. I originally wanted to follow the IBM article I mentioned earlier on using clusterKnoppix, but as you will see it didn't work out.

The article is interesting and has some useful info, but it has at least 1 typo that I know of. So if you try to follow it, pay attention here:


Initializing the drone node
Setting up a drone is not very different from setting up a master. Repeat the first three steps above for initializing the master node. Try configuring the network card on the drone yourself with the values mentioned previously (ifconfig eth0 192.168.1.20 and ifconfig eth0 192.168.1.20).


I believe that last bit should be:

ifconfig eth0 192.168.1.20
and
route add -net 0.0.0.0 gw 192.168.1.1


This experiment ended up being a big headache (as well as not working) and I am not sure what the problem was. I found a much easier way to do an experiment like this that I will describe in the next post.


First Clustering Experiment



1. Downloaded clusterKnoppix v3.6 and burned CDs.
2. Booted the first machine from the CD into KDE.
3. Following the instructions from the IBM article:
4. Opened a shell window and switched to root

su -
ifconfig eth0 192.168.1.10
route add -net 0.0.0.0 gw 192.168.1.1
tyd -f init (produced errors and ran very slowly)
tyd (after a minute it gave me a status message)

5. Booted the second machine from CD into textmode using "knoppix 2" as a boot option
6. Again following the IBM article

su -
ifconfig eth0 192.168.1.20
route add -net 0.0.0.0 gw 192.168.1.1
tyd -f init
tyd -m 192.168.1.10


And then... the deafening sound of nothing. According to the article, "That's it! Your cluster's up and running." I tried to check on the cluster status with the openMosix tools (openMosixview, mosmon, etc.) and could find no results. I tried many different options and configurations with no results, and Knoppix was running really terribly on my machine (P2-400, 384MB RAM).

I googled furiously and read through forum support and found a lot of interesting stuff, but nothing related to my problem. While searching I noticed someone talking about a distro called Chaos and its compatibility with clusterKnoppix. Apparently Chaos 1.5 is compatible with clusterKnoppix v3.4 (both use the 2.4.26-om kernel, whereas clusterKnoppix v3.6 uses 2.4.27).

I downloaded Chaos 1.5 and clusterKnoppix v3.4 and burned them to CD. In the next post I'll explain what happened with these distros.

One more interesting thing I learned about during this experiment:

Both clusterKnoppix and Chaos offer something called PXE booting. This essentially lets you boot a computer across a network, so instead of booting from a floppy disk, CD, or hard drive you can boot directly from another machine. This sounded really useful to me, so I decided to look into it.

As it turns out, you need a network card that supports PXE as well as an agreeable BIOS. I have neither. But there is an alternative.

Rom-O-Matic

This is a cool site that assists you in creating (in my case) a bootable floppy disk that allows my non-PXE network card to PXE boot. The tricky part is determining which specific NIC type you have. Since I have very poor eyesight, I took a macro photo of my card.

rom-o-matic has a chart to determine which driver a specific card uses; the Broadcom 5702 uses the tg3 driver.
From there I used the "wizard" page at rom-o-matic to download a disk image. I selected:

tg3:tg3-5702 -- [0x14e4,0x1646]
and
Bootable floppy ROM image (zdsk)

When it downloaded, I renamed it something simple, like "image".

I also downloaded RAWRITE, a DOS utility used to write the image to a floppy. I opened a command prompt:


cd c:/downloads
rawrite
entered "image"
entered "a"

rawrite created the disk quickly.

When I tried to boot from the disk, there was a line about "probing pci nic" and it listed my network card as "5702X", so I went back to rom-o-matic and downloaded the ROM for that, then re-ran rawrite to create a new bootable floppy.

There is more info at Etherboot.