James Marshall - HTTP Made Really Easy

HTTP uses the client-server model: a browser is an HTTP client, and a Web server is an HTTP server.

The formats of the request and response messages are similar, and both are English-oriented.

Both kinds of messages consist of:

<initial line, different for request vs. response>
Header1: value1
Header2: value2
Header3: value3

<optional message body goes here, like file contents or query data;
 it can be many lines long, or even binary data $&*%@!^$@>

Example

To retrieve the file at the URL

http://www.somehost.com/path/file.html

first open a socket to the host www.somehost.com, port 80.

Then, send something like the following through the socket:

GET /path/file.html HTTP/1.0
From: someuser@jmarshall.com
User-Agent: HTTPTool/1.0
[blank line here]

The server should respond with something like:

HTTP/1.0 200 OK
Date: Fri, 31 Dec 1999 23:59:59 GMT
Content-Type: text/html
Content-Length: 1354

<html>
<body>
<h1>Happy New Millennium!</h1>
(more file contents)
  .
  .
  .
</body>
</html>

After sending the response, the server closes the socket.
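
The exchange above can be sketched in Python. This is only a minimal sketch, not a full client: the names build_get_request and fetch are hypothetical, and it assumes an HTTP/1.0 server that closes the socket after responding, as described above.

```python
import socket


def build_get_request(path, from_addr, user_agent):
    """Return the bytes of a simple HTTP/1.0 GET request."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"From: {from_addr}",
        f"User-Agent: {user_agent}",
        "",  # the blank line that ends the header section
        "",
    ]
    # Each line of an HTTP message ends with CRLF.
    return "\r\n".join(lines).encode("ascii")


def fetch(host, path, port=80):
    """Send a GET request and return the raw response bytes (hypothetical helper)."""
    request = build_get_request(path, "someuser@jmarshall.com", "HTTPTool/1.0")
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(request)
        chunks = []
        # Keep reading until the server closes the socket.
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling fetch("www.somehost.com", "/path/file.html") would return the status line, headers, and body as one byte string, provided the host is reachable.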

Status Codes

  • The status code is a three-digit integer, and the first digit identifies the general category of response (1xx informational, 2xx success, 3xx redirection, 4xx client error, 5xx server error). The most common status codes are:
    200 OK
    The request succeeded, and the resulting resource (e.g. file or script output) is returned in the message body.
    404 Not Found
    The requested resource doesn't exist.
    301 Moved Permanently
    302 Moved Temporarily
    303 See Other
    (HTTP 1.1 only)
    The resource has moved to another URL (given by the Location: response header), and should be automatically retrieved by the client. This is often used by a CGI script to redirect the browser to an existing file.
    500 Server Error
    An unexpected server error. The most common cause is a server-side script that has bad syntax, fails, or otherwise can't run correctly.
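
As an illustration of reading the status line, here is a small sketch that splits an initial response line and maps the code to its category by first digit. The function names are hypothetical, and the category labels are the conventional ones rather than text from this tutorial.

```python
def parse_status_line(line):
    """Split a line like 'HTTP/1.0 200 OK' into (version, code, reason)."""
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason


def status_category(status_code):
    """Map a three-digit status code to its general category by first digit."""
    categories = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    if not 100 <= status_code <= 599:
        raise ValueError(f"not a valid HTTP status code: {status_code}")
    return categories[status_code // 100]
```

A client can usually decide what to do from the category alone, and only needs the exact code for special cases such as following a redirect.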

    Header Lines

    For Net-politeness, consider including these headers in your requests:

    These headers help webmasters troubleshoot problems. They also reveal information about the user. When you decide which headers to include, you must balance the webmasters' logging needs against your users' needs for privacy.

    If you're writing servers, consider including these headers in your responses:


    The Message Body

    An HTTP message may have a body of data sent after the header lines. In a response, this is where the requested resource is returned to the client (the most common use of the message body), or perhaps explanatory text if there's an error. In a request, this is where user-entered data or uploaded files are sent to the server.

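    If an HTTP message includes a body, there are usually header lines in the message that describe the body. In particular, the Content-Type: header gives the MIME type of the data in the body (such as text/html), and the Content-Length: header gives the number of bytes in the body.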
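
To illustrate how the header lines describe the body, the following sketch splits a raw message into its initial line, headers, and body, and uses Content-Length to decide how many body bytes to keep. The function name split_message is hypothetical, and the sketch assumes the whole message is already in memory.

```python
def split_message(raw):
    """Split raw HTTP message bytes into (initial_line, headers, body)."""
    # The first blank line separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    initial_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    # If Content-Length is present, keep exactly that many body bytes.
    if "content-length" in headers:
        body = body[: int(headers["content-length"])]
    return initial_line, headers, body
```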


    HTTP Proxies

    An HTTP proxy is a program that acts as an intermediary between a client and a server. It receives requests from clients, and forwards those requests to the intended servers. The responses pass back through it in the same way. Thus, a proxy has functions of both a client and a server.

    When a client uses a proxy, it typically sends all requests to that proxy, instead of to the servers in the URLs. Requests to a proxy differ from normal requests in one way: in the first line, they use the complete URL of the resource being requested, instead of just the path. For example,

    GET http://www.somehost.com/path/file.html HTTP/1.0
    

    That way, the proxy knows which server to forward the request to (though the proxy itself may use another proxy).
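
The difference between the two request forms can be sketched as follows. The function request_line and its via_proxy flag are hypothetical names used only for illustration.

```python
from urllib.parse import urlsplit


def request_line(url, via_proxy=False, version="HTTP/1.0"):
    """Build the first line of a GET request for the given URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # A proxy needs the complete URL; the destination server needs only the path.
    target = url if via_proxy else path
    return f"GET {target} {version}"
```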


    The POST Method

    A POST request is used to send data to the server to be processed in some way, like by a CGI script. A POST request is different from a GET request in the following ways:

    The most common use of POST, by far, is to submit HTML form data to CGI scripts.
    In this case:

    Here's a typical form submission, using POST:
    POST /path/script.cgi HTTP/1.0
    From: frog@jmarshall.com
    User-Agent: HTTPTool/1.0
    Content-Type: application/x-www-form-urlencoded
    Content-Length: 32
    
    home=Cosby&favorite+flavor=flies
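
A request like the one above could be assembled with a sketch along these lines. The function name build_post_request is hypothetical, and the body is assumed to be already URL-encoded.

```python
def build_post_request(path, body, from_addr, user_agent):
    """Return the bytes of an HTTP/1.0 POST carrying URL-encoded form data."""
    body_bytes = body.encode("ascii")
    header_lines = [
        f"POST {path} HTTP/1.0",
        f"From: {from_addr}",
        f"User-Agent: {user_agent}",
        "Content-Type: application/x-www-form-urlencoded",
        # Content-Length counts only the bytes of the body.
        f"Content-Length: {len(body_bytes)}",
    ]
    head = "\r\n".join(header_lines).encode("ascii")
    # A blank line separates the headers from the body.
    return head + b"\r\n\r\n" + body_bytes
```

Computing Content-Length from the encoded body, rather than typing it by hand, avoids the common mistake of a length that does not match what is actually sent.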
    

    HTTP 1.1

    Improvements include:

    HTTP 1.1 Clients

    To comply with HTTP 1.1, clients must


    Host: Header

    Starting with HTTP 1.1, one server at one IP address can be multi-homed, i.e. the home of several Web domains. For example, "www.host1.com" and "www.host2.com" can live on the same server.

    A complete HTTP 1.1 request might be

    GET /path/file.html HTTP/1.1
    Host: www.host1.com:80
    [blank line here]
    
    except the ":80" isn't required, since that's the default HTTP port.
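
A minimal sketch of building such a request, leaving the port out of the Host: header when it is the default 80. The function names are hypothetical.

```python
def host_header(hostname, port=80):
    """Return the Host: header line, omitting the default port 80."""
    if port == 80:
        return f"Host: {hostname}"
    return f"Host: {hostname}:{port}"


def build_http11_get(hostname, path, port=80):
    """Return the bytes of a minimal HTTP/1.1 GET request."""
    # The two empty strings produce the blank line that ends the headers.
    lines = [f"GET {path} HTTP/1.1", host_header(hostname, port), "", ""]
    return "\r\n".join(lines).encode("ascii")
```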

    Chunked Transfer-Encoding

    If a server wants to start sending a response before knowing its total length (like with long script output), it might use the simple chunked transfer-encoding, which breaks the complete response into smaller chunks and sends them in series. You can identify such a response because it contains the "Transfer-Encoding: chunked" header. All HTTP 1.1 clients must be able to receive chunked messages.
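
The paragraph above does not spell out the chunk format, so the following sketch relies on an assumption taken from the HTTP 1.1 specification: each chunk is a hexadecimal size on its own line, followed by that many bytes of data and a CRLF, and a chunk of size zero ends the series. The function name decode_chunked is hypothetical, and any footers after the last chunk are ignored.

```python
def decode_chunked(data):
    """Reassemble the body of a chunked message from its raw bytes."""
    body = b""
    pos = 0
    while True:
        # Each chunk begins with its size in hex, on a line ending in CRLF.
        line_end = data.index(b"\r\n", pos)
        size_field = data[pos:line_end].split(b";", 1)[0]  # drop any extensions
        size = int(size_field.decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:
            break  # a zero-size chunk marks the end of the body
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return body
```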


    HTTP 1.1 Servers

    To comply with HTTP 1.1, servers must:


    Requiring the Host: Header

    Because of the urgency of implementing the new Host: header, servers are not allowed to tolerate HTTP 1.1 requests without it. If a server receives such a request, it must return a "400 Bad Request" response, like

    HTTP/1.1 400 Bad Request
    Content-Type: text/html
    Content-Length: 111
    
    <html><body>
    <h2>No Host: header received</h2>
    HTTP 1.1 requests must include the Host: header.
    </body></html>
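
On the server side, the check might look like the sketch below. The function name check_host_header is hypothetical, and the request is assumed to have been read already as text.

```python
def check_host_header(request_text):
    """Return a 400 response if an HTTP/1.1 request lacks Host:, else None."""
    head = request_text.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    # The HTTP version is the last word of the request line.
    version = lines[0].rsplit(" ", 1)[-1]
    header_names = {line.split(":", 1)[0].strip().lower() for line in lines[1:]}
    if version == "HTTP/1.1" and "host" not in header_names:
        body = (
            "<html><body>\n"
            "<h2>No Host: header received</h2>\n"
            "HTTP 1.1 requests must include the Host: header.\n"
            "</body></html>\n"
        )
        return (
            "HTTP/1.1 400 Bad Request\r\n"
            "Content-Type: text/html\r\n"
            f"Content-Length: {len(body)}\r\n"
            "\r\n" + body
        )
    return None
```

HTTP 1.0 requests pass through unchanged, since the requirement applies only to requests that claim HTTP 1.1.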
    

    Persistent Connections and the "Connection: close" Header


    The Date: Header

    Caching is an important improvement in HTTP 1.1, and can't work without timestamped responses. So, servers must timestamp every response with a Date: header containing the current time, in the form

    Date: Fri, 31 Dec 1999 23:59:59 GMT
    

    All time values in HTTP use Greenwich Mean Time.
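
In Python, one way to produce this format is the standard library's email.utils.formatdate with usegmt=True. The wrapper names below are hypothetical.

```python
from email.utils import formatdate


def http_date(timestamp=None):
    """Format a Unix timestamp (default: the current time) as an HTTP date in GMT."""
    return formatdate(timeval=timestamp, usegmt=True)


def date_header(timestamp=None):
    """Return a complete Date: header line."""
    return "Date: " + http_date(timestamp)
```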


    Handling Requests with If-Modified-Since: or If-Unmodified-Since: Headers

    To avoid sending resources that don't need to be sent, thus saving bandwidth, HTTP 1.1 defines the If-Modified-Since: and If-Unmodified-Since: request headers. The former says "only send the resource if it has changed since this date"; the latter says the opposite, "only send the resource if it has not changed since this date". Clients aren't required to use them, but HTTP 1.1 servers are required to honor requests that do use them.

    Unfortunately, due to earlier HTTP versions, the date value may be in any of three possible formats:

    If-Modified-Since:  Fri, 31 Dec 1999 23:59:59 GMT
    If-Modified-Since:  Friday, 31-Dec-99 23:59:59 GMT
    If-Modified-Since:  Fri Dec 31 23:59:59 1999
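
A server can accept all three formats by trying each in turn, as in this sketch. The names DATE_FORMATS, parse_http_date, and modified_since are hypothetical, and the sketch assumes an English (C) locale, since the day and month names are matched as text.

```python
from datetime import datetime

# The three formats shown above, written as strptime patterns.
DATE_FORMATS = (
    "%a, %d %b %Y %H:%M:%S GMT",  # Fri, 31 Dec 1999 23:59:59 GMT
    "%A, %d-%b-%y %H:%M:%S GMT",  # Friday, 31-Dec-99 23:59:59 GMT
    "%a %b %d %H:%M:%S %Y",       # Fri Dec 31 23:59:59 1999
)


def parse_http_date(value):
    """Parse an HTTP date in any of the three formats; the result is in GMT."""
    value = value.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized HTTP date: {value!r}")


def modified_since(last_modified, header_value):
    """Return True if the resource changed after the date in the header."""
    return last_modified > parse_http_date(header_value)
```

Here last_modified is assumed to be a naive datetime in GMT, for example taken from the file's modification time.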
    

    URL-encoding

    HTML form data is usually URL-encoded to package it in a GET or POST submission. In a nutshell, here's how you URL-encode the name-value pairs of the form data:

    1. Convert all "unsafe" characters in the names and values to "%xx", where "xx" is the ASCII value of the character, in hex. "Unsafe" characters include =, &, %, +, non-printable characters, and any others you want to encode; there's no danger in encoding too many characters. For simplicity, you might encode all non-alphanumeric characters.
    2. Change all spaces to pluses.
    3. String the names and values together with = and &, like
      name1=value1&name2=value2&name3=value3
      
    4. This string is your message body for POST submissions, or the query string for GET submissions.
    For example, if a form has a field called "name" that's set to "Lucy", and a field called "neighbors" that's set to "Fred & Ethel", the URL-encoded form data would be
    name=Lucy&neighbors=Fred+%26+Ethel
    
    with a length of 34.
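
The steps above can be sketched directly in Python. This version takes the simple route of encoding every non-alphanumeric character; the function names encode_component and encode_form are hypothetical.

```python
def encode_component(text):
    """URL-encode one name or value following the steps above."""
    out = []
    for byte in text.encode("utf-8"):
        char = chr(byte)
        if char.isascii() and char.isalnum():
            out.append(char)            # safe character: leave as-is
        elif char == " ":
            out.append("+")             # step 2: spaces become pluses
        else:
            out.append(f"%{byte:02X}")  # step 1: everything else becomes %xx
    return "".join(out)


def encode_form(fields):
    """Join the encoded name=value pairs with & (step 3)."""
    return "&".join(
        f"{encode_component(name)}={encode_component(value)}"
        for name, value in fields
    )
```

In practice, Python's standard urllib.parse.urlencode does the same job and leaves a few more punctuation characters unencoded.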