Even five years ago, a startup's approach to infrastructure would have been different. Building a large-scale web system was a black-magic exercise that started with buying hardware and building a farm in your basement. Fast forward to almost 2008 and those days are gone. You no longer need to be an expert in large-scale computing; instead, you can rely on common sense and existing solutions.
1. Use the Best Hosting Provider Your Money Can Buy
As a startup, you are looking for ways to keep costs down. One of the first places where it seems sensible to save money is the hosting company. This is a mistake that will cost you a lot of time, which is more valuable than the money you will spend.
It is okay to go with a cheaper provider while you are just developing the code, but your production environment needs to run on a rock-solid system.
Here is an example from our experience at AdaptiveBlue. We worked with one hosting provider for the first year. We chose them because we had used them in the past, and we regretted the choice over and over again. There were constant problems, ranging from configuration issues to downtime to unresponsive technical staff. Finally, we had had enough.
To pick the best hosting company, we tapped into the community and posted a question on LinkedIn: What is the best hosting provider you’ve ever had?
The majority of the 19 replies said: Rackspace.
All of those replies said that Rackspace was very expensive, but worth the money. The next day we called Rackspace and found out that they were almost twice as expensive as the provider we had at the time. I was reluctant, but in the end decided to give it a try. And we never looked back.
Since we moved to Rackspace, we have had only a single outage. The machines are running smoothly, and we have never had to reboot them. Best of all, we get Rackspace's Fanatical Support. With the previous provider it took days to get to the bottom of an issue, primarily because technical support did not respond quickly enough.
With Rackspace we get nearly instant support. Each ticket is automatically entered and tracked through our dashboard. Replies are very prompt, and the whole process feels good and simple because people are responsive.
The most important thing about the hosting company is the responsiveness of their support team. Think about this. When everything is running smoothly you do not care. When your production system is not working right, you need help and you need it fast.
2. Use Amazon Web Services
You are still likely to need a regular hosting provider, but you should be aware of an increasingly important alternative - Amazon Web Services. This offering from the e-commerce giant is a must-consider piece of infrastructure for any startup. Specifically, four services make it easier to build large-scale web applications.
1. Amazon S3 - The Simple Storage Service
This web service provides a simple API for reading and writing objects in a virtual hash. Each object, identified by a unique key, can be written or read using a REST interface. The service has been used by many startups, including AdaptiveBlue. One of the best uses of S3 is storing large media files. In this case, S3 becomes particularly economical because read access is 10 times cheaper than write access. For example, a popular photo sharing service, SmugMug, uses S3 to store its users’ photos. To learn more about S3, please read this post on my tech blog.
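The "virtual hash" model can be pictured with a small in-memory stand-in. This is only a sketch of the idea, not the S3 API: the `ObjectStore` class, its `put` and `get` methods, and the example key are all hypothetical names chosen for illustration. The counters show the write-once, read-many pattern that makes media storage a good fit.

```python
class ObjectStore:
    """Hypothetical in-memory stand-in for a key/object store (not the S3 API)."""

    def __init__(self):
        self._objects = {}  # key -> raw bytes
        self.writes = 0     # number of put() calls
        self.reads = 0      # number of get() calls

    def put(self, key: str, data: bytes) -> None:
        """Store (or overwrite) the object under its unique key."""
        self._objects[key] = data
        self.writes += 1

    def get(self, key: str) -> bytes:
        """Return the object stored under the key."""
        self.reads += 1
        return self._objects[key]


# A photo is written once and then read many times.
store = ObjectStore()
photo_key = "photos/user42/beach.jpg"  # hypothetical key
store.put(photo_key, b"raw image bytes")
for _ in range(100):
    photo = store.get(photo_key)
```

A real client would send each `put` and `get` over HTTP rather than to a local dictionary, but the shape of the interface is the same: one key, one object.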
2. EC2 - The Elastic Compute Cloud Service
EC2 is essentially a grid in a box. It is a service that directly competes with your hosting provider, since it allows you to create and dynamically provision identical instances of your software.
You define what operating system you need, what web server you want to run, what database to use, etc. Then you create an image of all of that, including your own software, and roll it out onto the grid. You can then ask EC2 to start instances with your configuration on the fly.
EC2 is a powerful service, but it is still evolving. We considered it about four months ago and chose to wait until it is more mature and more tools are available for it. A management console that makes tasks easier is one essential that we are waiting for. The second is the ability to have a dedicated host, since right now the instances share the physical hardware. But we are definitely planning to come back and re-evaluate EC2 in the near future, as it is an excellent approach to scaling your application.
3. Simple Queue Service
The Simple Queue Service has been around for some time, but it has not gotten as much attention as the other services. The idea is rather simple - allow messages to be queued by producers and picked up later by consumers. This service is useful for processing-intensive data, such as site analytics. The information is added to the queue by a writer and then taken off by one or more readers. This approach decouples read and write access to the data store and is a common technique for dealing with database bottlenecks.
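The writer/reader decoupling can be sketched locally with Python's standard `queue` and `threading` modules. This is an in-process illustration of the pattern only, not the Simple Queue Service API; the function names, the event format, and the dictionary standing in for the data store are assumptions made for the example.

```python
import queue
import threading


def writer(q, events):
    """Producer: enqueue events without touching the data store."""
    for event in events:
        q.put(event)


def reader(q, store, lock):
    """Consumer: take events off the queue and update the store."""
    while True:
        event = q.get()
        if event is None:  # sentinel: no more work for this reader
            break
        with lock:  # the dict stands in for a shared database
            store[event["page"]] = store.get(event["page"], 0) + 1


def run_pipeline(events, n_readers=2):
    """Run one writer and several readers, then return the page-view counts."""
    q = queue.Queue()
    store = {}
    lock = threading.Lock()
    readers = [
        threading.Thread(target=reader, args=(q, store, lock))
        for _ in range(n_readers)
    ]
    for t in readers:
        t.start()
    writer(q, events)
    for _ in readers:  # one sentinel per reader, queued after all events
        q.put(None)
    for t in readers:
        t.join()
    return store


page_views = run_pipeline(
    [{"page": "/home"}, {"page": "/pricing"}, {"page": "/home"}]
)
```

The writer never waits on the store, and the number of readers can be changed without touching the writer, which is the property that helps with database bottlenecks.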
4. SimpleDB Service
This newest offering from Amazon is still in private beta. SimpleDB is another interesting storage service, but it is different from S3. SimpleDB is a kind of database in which adding an object to a table automatically creates indexes for each attribute of the object. This approach allows quick lookup of an object by any attribute and is handy in many situations where you need to search through a large data set. In addition to being a database, SimpleDB can be used as a companion service for S3. The objects stored in S3 can be indexed in SimpleDB simply by using the same object key in both services.
Collectively, Amazon Web Services offer a powerful tool set for startups to build on. You can learn more about the Amazon Web Services stack from my post on Read/WriteWeb.
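The idea of indexing every attribute automatically can be sketched with a small in-memory table. This is an illustration of the concept only and does not use the SimpleDB API; the `AttributeIndexedTable` class, its methods, and the example keys and attributes are hypothetical.

```python
from collections import defaultdict


class AttributeIndexedTable:
    """Hypothetical in-memory table that indexes every attribute on write."""

    def __init__(self):
        self._items = {}  # key -> attribute dict
        # attribute name -> attribute value -> set of item keys
        self._index = defaultdict(lambda: defaultdict(set))

    def put(self, key, attributes):
        """Store an item and index each of its attributes."""
        old = self._items.get(key)
        if old is not None:  # drop stale index entries on overwrite
            for name, value in old.items():
                self._index[name][value].discard(key)
        self._items[key] = dict(attributes)
        for name, value in attributes.items():
            self._index[name][value].add(key)

    def find(self, name, value):
        """Return the sorted keys of all items whose attribute equals value."""
        return sorted(self._index.get(name, {}).get(value, set()))


# The item keys could be the same keys used for the stored objects.
table = AttributeIndexedTable()
table.put("photos/user42/beach.jpg", {"owner": "user42", "tag": "beach"})
table.put("photos/user42/city.jpg", {"owner": "user42", "tag": "city"})
by_owner = table.find("owner", "user42")
by_tag = table.find("tag", "city")
```

Because the item key matches the key of the stored object, a lookup by attribute returns exactly the keys needed to fetch the objects themselves.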
3. Use Google Analytics in Standard And Creative Ways
Early on startups need to track things. Tracking results is useful for metrics, which, in turn, help measure growth and success of the company. Without tracking, it is difficult to determine what is going on.
At the very least you need to track the visitors to your site. Where are they coming from? What are they doing on your site? Are they clicking where you need them to click? All of these questions are addressed by any web site analytics package. We have been using StatCounter and Google Analytics to keep track of things. Personally, I am used to StatCounter, but the truth is that Google Analytics, which is free, does as good a job on site analytics, if not better.
Google Analytics is packed with features, but more importantly it has an API. This matters because you can build your own dashboard that offers a different, customized view of the same information.
Beyond that, you can use Google Analytics for any kind of tracking. For example, if your service offers a widget, you can use Google Analytics to track which sites the widget is on. To do that, just call Google Analytics from the widget code. Not only is it simple, it is also fast. While building your own tracking system might not be complicated, building one that works very quickly and does not slow down the browser might be tricky.
Google Analytics is very quick to call and is flexible when you need to get the data out, so all in all it's a great piece of infrastructure that you should have in your pocket.
4. Start With Defaults Then Tune the System
In 99.9% of cases you are better off starting with defaults. In 99.9% of cases you are not going to end up where you started. The trick is to walk the path from defaults to custom settings in the right way.
Probably the worst thing you can do is premature tuning. Like premature optimization in source code, this leads to ugliness. Why guess before you even know what is going to happen to the system?
The default LAMP stack comes with a set of parameters that works fine in the general case. As your system begins to grow, you will need to make trade-offs. For example, if you know that all of your requests are one-time REST calls, then Keep-Alive is not needed. Or if you know that you have 500 concurrent users, then a default limit of 30 concurrent MySQL connections is not going to be enough.
But beware that once you discover the need to change one parameter, you will be tempted to tweak the others. Do not do it. Tweaking many parameters at once is useless because you will not know what really caused the change in your system. Fixing all settings and changing just one will give you more confidence in how the system will behave once multiple changes are made.
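The one-change-at-a-time discipline can be sketched as a small harness. Everything here is an assumption for illustration: the setting names are labels rather than real configuration keys, and `fake_measure` is a made-up toy model standing in for a real load test, so its numbers carry no meaning beyond the example.

```python
def tune_one_at_a_time(baseline, candidates, measure):
    """Try each candidate change alone against the baseline and record the result."""
    results = {"baseline": measure(baseline)}
    for name, value in candidates.items():
        trial = dict(baseline)  # copy, so every other setting stays fixed
        trial[name] = value     # change exactly one parameter
        results[f"{name}={value}"] = measure(trial)
    return results


def fake_measure(settings):
    """Toy stand-in for a load test: returns made-up 'requests served'."""
    served = min(settings["max_connections"], 500)
    if not settings["keep_alive"]:
        served += 50
    return served


baseline = {"max_connections": 30, "keep_alive": True}
candidates = {"max_connections": 600, "keep_alive": False}
results = tune_one_at_a_time(baseline, candidates, fake_measure)
```

Because each trial differs from the baseline in exactly one setting, any difference in the measured value can be attributed to that setting alone.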
5. Hire or Contract a Good System Administrator
This one is the last and the simplest tip of all. Like programming, business development, and accounting, system management is a specialty best left to professionals. I know my way around Unix, and I even used to be a system administrator 15 years ago, but I am not up to par. When you reach a certain size and scale, you need a dedicated person running the hardware and OS show.
Even a seemingly simple task of writing a system script might not be the best use of development time. System programming is very different from application development and is not straightforward.
For early-stage startups it may be a challenge to recruit a dedicated sysadmin, but any good hosting company would be happy to take on this function for you, likely for an additional fee.
Because they know what they are doing, it will only take them a few minutes to write the script, and it will not cost you much. If you try to do it yourself instead, your total cost might end up much higher.
Increasingly, infrastructure is becoming less of a challenge for a software startup. With next-generation tools on the market and world-class support from solid hosting companies, infrastructure, at least in the early stages, is no longer a big deal. The trick is to know the tools on the market and then pick the right ones to solve your problems.