Thursday, January 10, 2013

Experiment on Linux: A Web Server in Just One Line of Code

So, you want to learn about web applications? You have just installed Linux, and you figured out how to open a terminal and install web server software like Apache. But have you ever wondered exactly what the Apache server is doing? What magical change has it made to your Linux system so that your computer suddenly acts like a web server? Well, Apache launches several background programs that we call "daemons," which do little more than sit and listen for TCP connections on port 80, the default HTTP port. The operating system checks whether the destination IP address and TCP port number of each incoming packet match; if they do, it wakes up one of your Apache daemons and hands it the data that the packet carried. So anyone who places a TCP packet with the correct IP address and port number onto a network that your computer's network interface can reach will trigger these web server daemons into action.

Here's a trick you can try as soon as you install Linux, even before you install a web server.

Open a terminal and enter this command:

nc -l 50080

Nothing will happen right away; just let it sit there. What this command means is "open a TCP socket and wait for something to try to connect to it on channel 50080." (Some versions of netcat want the port given with -p, as in nc -l -p 50080.) HTTP normally uses channel 80, but Linux reserves channels below 1024 for programs run by root, so use channel 50080 instead. Then open a web browser and type "http://127.0.0.1:50080". 127.0.0.1 is the loopback address, so your browser will talk to your own computer instead of going out to the Internet, and of course ":50080" means to connect on channel 50080 instead of the default channel 80.
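To see what "open a TCP socket and wait" amounts to, here is a minimal sketch in Python of roughly what the nc command does. The function names open_listener and receive_once are made up for this illustration, and the sketch assumes that a single read is enough to capture the browser's whole request, which is usually but not always true.

```python
import socket

def open_listener(port=50080, host="127.0.0.1"):
    """Create a TCP socket that listens on the given port (hypothetical helper)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Let the port be reused right away if the script is restarted.
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    return server

def receive_once(server):
    """Wait for one connection and return the first chunk of bytes it sends."""
    conn, _addr = server.accept()  # blocks until something connects
    with conn:
        return conn.recv(65536)
```

Calling print(receive_once(open_listener()).decode(errors="replace")) and then visiting http://127.0.0.1:50080 should print the browser's request, much like the nc terminal does.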

Then, in the terminal window where you typed the nc -l 50080 command, you will see exactly the information that your browser sends to any website it tries to communicate with, written out in the HTTP language. The browser will not show anything; it will act like it is waiting for a web page to load. It will probably get tired of waiting and eventually show you an error message. At any time you can go back to the nc -l 50080 terminal and cancel it by pressing "Control-C" (C for cancel).

So here is roughly what my Firefox says to every web server it meets:

GET / HTTP/1.1
Host: 127.0.0.1:50080
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20100101 Firefox/17.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
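The request above has a simple shape: a first line with the method, path, and version, followed by "Name: value" header lines. As a rough sketch of how a server might pull it apart, here is a small Python function; parse_request is a name invented for this example, and it ignores many details that a real server has to handle.

```python
def parse_request(raw):
    """Split HTTP/1.x request text into (method, path, version, headers)."""
    # The headers end at the first blank line; accept either line-ending style.
    head = raw.replace("\r\n", "\n").split("\n\n", 1)[0]
    lines = head.split("\n")
    # First line looks like: GET / HTTP/1.1
    method, path, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if not line:
            continue
        # Split on the first colon only, since values may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers
```

Header names are lowercased here because HTTP treats them as case-insensitive.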

If you were a genius hacker, then instead of cancelling the nc -l 50080 terminal, you could type a reply in the HTTP language directly into it: a status line such as "HTTP/1.1 200 OK", any header lines, a blank line, and then the web page document written in the HTML language. This will be fed back to your browser and cause it to display the HTML document, assuming you made no mistakes and typed everything before the browser gets tired of waiting (how long depends on the browser and its settings, but it is typically a matter of minutes).
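If typing against the clock sounds stressful, the same reply can be assembled ahead of time. The following Python sketch builds the bytes you would otherwise type by hand; build_response is a hypothetical helper, and the particular headers it sends are one reasonable choice, not the only valid one.

```python
def build_response(html, status="200 OK"):
    """Return a complete HTTP/1.1 response, as bytes, for an HTML page."""
    body = html.encode("utf-8")
    head = (
        f"HTTP/1.1 {status}\r\n"                      # status line
        "Content-Type: text/html; charset=utf-8\r\n"  # what the body is
        f"Content-Length: {len(body)}\r\n"            # body size in bytes
        "Connection: close\r\n"                       # one reply, then hang up
        "\r\n"                                        # blank line ends the headers
    )
    return head.encode("ascii") + body
```

The Content-Length header tells the browser how many bytes of page to expect, so it knows when the document is complete.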

The point of this is to demonstrate that the basic technology of the Web is all very straightforward: TCP sockets are like two-way radios, and everyone agrees that channel 80 is the HTTP language channel (like how in your local community, channel 52 might be the Spanish-speaking TV channel). HTTP/1.1 is human-readable text. Whenever you connect to a website, your browser is speaking this language over a TCP socket, and the server on the other end of the Internet is talking right back. It all happens in an instant, but you can make it all happen by hand if you want to.
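The browser's side of the conversation can be done by hand too. As an illustration, this Python sketch opens a TCP socket, sends a bare-bones GET request, and collects whatever comes back. The names build_request and fetch are invented for this example, and it assumes a plain HTTP server (not HTTPS) that closes the connection after replying.

```python
import socket

def build_request(host, port=80, path="/"):
    """Return a minimal HTTP/1.1 GET request as bytes."""
    # The Host header carries the port only when it is not the default.
    host_header = host if port == 80 else f"{host}:{port}"
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host_header}\r\n"
        "Connection: close\r\n"  # ask the server to hang up when done
        "\r\n"
    ).encode("ascii")

def fetch(host, port=80, path="/"):
    """Send one GET request and return the raw reply, headers included."""
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(build_request(host, port, path))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # empty read means the server closed the socket
                break
            chunks.append(data)
    return b"".join(chunks)
```

Running fetch("127.0.0.1", 50080) while nc -l 50080 is listening should make the request show up in the nc terminal, just as it does with a browser.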

We often work with grand server software like Apache, but the job of the server software is really just to give you an elegant tool for writing HTTP code onto a TCP socket in a way that responds intelligently to the web browsers communicating with it. The function of the TCP socket is simply to carry that code across the Internet for you. So a server can be anything that lets you read and write on a TCP socket, even a single command like netcat.
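To make "read and write on a TCP socket" concrete, here is about the smallest thing that could be called a web server, sketched in Python. The name serve_one is made up for this example; it answers exactly one connection with the same page no matter what was asked, and it assumes the socket passed in is already bound and listening.

```python
import socket

def serve_one(server, html):
    """Answer one connection on a listening socket with a single HTML page."""
    conn, _addr = server.accept()        # wait for a browser to connect
    with conn:
        request = conn.recv(65536)       # read what the browser sent
        body = html.encode("utf-8")
        head = (
            "HTTP/1.1 200 OK\r\n"
            "Content-Type: text/html; charset=utf-8\r\n"
            f"Content-Length: {len(body)}\r\n"
            "Connection: close\r\n"
            "\r\n"
        )
        conn.sendall(head.encode("ascii") + body)  # write the reply
    return request
```

Everything a server like Apache adds (many connections at once, files on disk, logging, security) is built around this same read-then-write loop.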

Linux gives you the tools to invent your own space in the Web, right down to the smallest detail, but still provides a simple way to freely install enterprise-quality server software that handles all the details for you.

One final note: don't ever say "TCP channel 80"; say "TCP port 80," or port 50080. Even though ports are almost exactly like TV or radio channels, they are called "ports." Also, don't refer to HTTP as a "language" (even though that's what it is); you should call it a "protocol." HTTP means Hypertext Transfer Protocol. HTML is a language, the HyperText Markup Language. What's the difference between a language and a protocol? Not much; that's just how the jargon has evolved.
