James Marshall - HTTP Made Really Easy

HTTP uses the client-server model:

An HTTP client opens a connection and sends a request message to an HTTP server;
the server then returns a response message.
After delivering the response, the server closes the connection (HTTP 1.0).
This makes HTTP a stateless protocol, i.e. one that does not maintain any connection information between transactions.
This is one of the biggest problems that WWW applications involving transactions must solve (Célio G.).

A browser is an HTTP client and the Web server is an HTTP server:

The client sends requests to an HTTP server (Web server).
The Web server then sends responses back to the client.
The standard port for HTTP servers to listen on is 80.

The formats of the request and response messages are similar and English-oriented.

Both kinds of messages consist of:

an initial line,
zero or more header lines,
a blank line (i.e. a CRLF by itself), and
an optional message body (e.g. a file, or query data, or query output).

<initial line, different for request vs. response>
Header1: value1
Header2: value2
Header3: value3

<optional message body goes here, like file contents or query data;
 it can be many lines long, or even binary data $&*%@!^$@>
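
The message layout above can be sketched in a few lines of Python. This is a minimal illustration, not a full parser: the function name `parse_message` is made up for this example, and it assumes one header per line (no repeated or folded headers).

```python
def parse_message(raw: bytes):
    """Split a raw HTTP message into (initial_line, headers, body)."""
    # The blank line (a CRLF by itself) separates the head from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    initial_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return initial_line, headers, body


raw = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<html></html>"
)
initial_line, headers, body = parse_message(raw)
print(initial_line)
print(headers["content-type"], len(body))
```

The body is kept as bytes because, as noted above, it may be binary data.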

Example

To retrieve the file at the URL

http://www.somehost.com/path/file.html

first open a socket to the host www.somehost.com, port 80.

Then, send something like the following through the socket:

GET /path/file.html HTTP/1.0
From: someuser@jmarshall.com
User-Agent: HTTPTool/1.0
[blank line here]

The server should respond with something like:

HTTP/1.0 200 OK
Date: Fri, 31 Dec 1999 23:59:59 GMT
Content-Type: text/html
Content-Length: 1354

<html>
<body>
<h1>Happy New Millennium!</h1>
(more file contents)
  .
  .
  .
</body>
</html>

After sending the response, the server closes the socket.
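
As a rough sketch, the exchange above might look like this in Python. The helper names `build_get_request` and `fetch` are invented for this example, and www.somehost.com is only a placeholder host, so the network call is left commented out.

```python
import socket


def build_get_request(path: str, headers: dict) -> bytes:
    """Build an HTTP/1.0 GET request: initial line, headers, blank line."""
    lines = [f"GET {path} HTTP/1.0"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends in CRLF; the extra CRLF is the blank line.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def fetch(host: str, path: str, port: int = 80) -> bytes:
    """Send the request and read until the server closes the socket."""
    request = build_get_request(path, {"User-Agent": "HTTPTool/1.0"})
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # empty read: the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


request = build_get_request(
    "/path/file.html",
    {"From": "someuser@jmarshall.com", "User-Agent": "HTTPTool/1.0"},
)
print(request.decode("ascii"))
# fetch("www.somehost.com", "/path/file.html") would perform the exchange;
# it is not called here because www.somehost.com is only a placeholder.
```

Reading until `recv` returns nothing works here because an HTTP 1.0 server closes the socket after sending the response.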

Status Codes

The status code is a three-digit integer, and the first digit identifies the general category of response:

1xx indicates an informational message only
2xx indicates success of some kind
3xx redirects the client to another URL
4xx indicates an error on the client's part
5xx indicates an error on the server's part

The most common status codes are:

200 OK: The request succeeded, and the resulting resource (e.g. file or script output) is returned in the message body.
404 Not Found: The requested resource doesn't exist.
301 Moved Permanently, 302 Moved Temporarily, 303 See Other (HTTP 1.1 only): The resource has moved to another URL (given by the Location: response header), and should be automatically retrieved by the client. This is often used by a CGI script to redirect the browser to an existing file.
500 Server Error: An unexpected server error. The most common cause is a server-side script that has bad syntax, fails, or otherwise can't run correctly.
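
The first-digit rule is easy to apply in code. Here is a small sketch; the function name `status_category` and the category labels are chosen for this example only.

```python
CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_category(status_line: str) -> str:
    """Return the general category for a status line like 'HTTP/1.0 200 OK'."""
    # The status line is: HTTP version, three-digit code, reason phrase.
    _version, code, *_reason = status_line.split(" ", 2)
    return CATEGORIES.get(int(code) // 100, "unknown")


print(status_category("HTTP/1.0 404 Not Found"))
```

A client that meets a code it doesn't recognize can still fall back on the category this way.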

Header Lines

For Net-politeness, consider including these headers in your requests:

The From: header gives the email address of whoever's making the request, or running the program doing so. (This must be user-configurable, for privacy concerns.)
The User-Agent: header identifies the program that's making the request, in the form "Program-name/x.xx", where x.xx is the (mostly) alphanumeric version of the program. For example, Netscape 3.0 sends the header "User-agent: Mozilla/3.0Gold".

These headers help webmasters troubleshoot problems. They also reveal information about the user. When you decide which headers to include, you must balance the webmasters' logging needs against your users' needs for privacy.

If you're writing servers, consider including these headers in your responses:

The Server: header is analogous to the User-Agent: header: it identifies the server software in the form "Program-name/x.xx". For example, one beta version of Apache's server returns "Server: Apache/1.2b3-dev".
The Last-Modified: header gives the modification date of the resource that's being returned. It's used in caching and other bandwidth-saving activities. Use Greenwich Mean Time, in the format:
```
    Last-Modified: Fri, 31 Dec 1999 23:59:59 GMT
```
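
If you're generating this header in Python, the standard library's `email.utils.formatdate` produces the format above when called with `usegmt=True`. The timestamp below is just the example date expressed as seconds since the Unix epoch.

```python
from email.utils import formatdate

# 946684799 is the Unix timestamp for 31 Dec 1999 23:59:59 GMT.
last_modified = formatdate(946684799, usegmt=True)
print(f"Last-Modified: {last_modified}")
```

For a real file, you would pass its modification time (for instance from `os.path.getmtime`) instead of a fixed number.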

The Message Body

An HTTP message may have a body of data sent after the header lines. In a response, this is where the requested resource is returned to the client (the most common use of the message body), or perhaps explanatory text if there's an error. In a request, this is where user-entered data or uploaded files are sent to the server.

If an HTTP message includes a body, there are usually header lines in the message that describe the body. In particular,

The Content-Type: header gives the MIME-type of the data in the body, such as text/html or image/gif.
The Content-Length: header gives the number of bytes in the body.

HTTP Proxies

An HTTP proxy is a program that acts as an intermediary between a client and a server. It receives requests from clients, and forwards those requests to the intended servers. The responses pass back through it in the same way. Thus, a proxy has functions of both a client and a server.

When a client uses a proxy, it typically sends all requests to that proxy, instead of to the servers in the URLs. Requests to a proxy differ from normal requests in one way: in the first line, they use the complete URL of the resource being requested, instead of just the path. For example,

GET http://www.somehost.com/path/file.html HTTP/1.0

That way, the proxy knows which server to forward the request to (though the proxy itself may use another proxy).
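
Here is a sketch of how a proxy might pull the target server out of such a request line. The function name `proxy_target` is hypothetical, and a real proxy would need much more error handling.

```python
from urllib.parse import urlsplit


def proxy_target(request_line: str):
    """Return (host, port, path) for a proxy-style request line."""
    _method, url, _version = request_line.split(" ")
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # Port 80 is the default when the URL does not name one.
    return parts.hostname, parts.port or 80, path


print(proxy_target("GET http://www.somehost.com/path/file.html HTTP/1.0"))
```

The proxy can then open its own connection to that host and port and ask for just the path, like a normal client.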

The POST Method

A POST request is used to send data to the server to be processed in some way, like by a CGI script. A POST request is different from a GET request in the following ways:

There's a block of data sent with the request, in the message body.
There are usually extra headers to describe this message body, like Content-Type: and Content-Length:.
The request URI is not a resource to retrieve; it's usually a program name to handle the data you're sending.
The HTTP response is normally program output, not a static file.

The most common use of POST, by far, is to submit HTML form data to CGI scripts.
In this case:

the Content-Type: header is usually application/x-www-form-urlencoded,
the Content-Length: header gives the length of the URL-encoded form data, and
the CGI script receives the message body through STDIN, and decodes it.

Here's a typical form submission, using POST:

POST /path/script.cgi HTTP/1.0
From: frog@jmarshall.com
User-Agent: HTTPTool/1.0
Content-Type: application/x-www-form-urlencoded
Content-Length: 32

home=Cosby&favorite+flavor=flies
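
The form submission above can be assembled in Python along these lines. This is only a sketch: it builds the request bytes without sending them, and it leans on the standard library's `urllib.parse.urlencode` for the encoding.

```python
from urllib.parse import urlencode

# The form fields from the example submission.
form = {"home": "Cosby", "favorite flavor": "flies"}
body = urlencode(form).encode("ascii")

request = (
    b"POST /path/script.cgi HTTP/1.0\r\n"
    b"From: frog@jmarshall.com\r\n"
    b"User-Agent: HTTPTool/1.0\r\n"
    b"Content-Type: application/x-www-form-urlencoded\r\n"
    # Content-Length counts the bytes of the encoded body, not the raw form.
    + f"Content-Length: {len(body)}\r\n".encode("ascii")
    + b"\r\n"
    + body
)
print(request.decode("ascii"))
```

Computing Content-Length from the encoded body, rather than typing it by hand, avoids the classic mistake of a length that doesn't match the data.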

HTTP 1.1

Improvements include:

Faster response, by allowing multiple transactions to take place over a single persistent connection.
Faster response and great bandwidth savings, by adding cache support.
Faster response for dynamically-generated pages, by supporting chunked encoding, which allows a response to be sent before its total length is known.
Efficient use of IP addresses, by allowing multiple domains to be served from a single IP address.

HTTP 1.1 Clients

To comply with HTTP 1.1, clients must:

include the Host: header with each request
accept responses with chunked data
either support persistent connections, or include the "Connection: close" header with each request
handle the "100 Continue" response

Host: Header

Starting with HTTP 1.1, one server at one IP address can be multi-homed, i.e. the home of several Web domains. For example, "www.host1.com" and "www.host2.com" can live on the same server.

A complete HTTP 1.1 request might be

GET /path/file.html HTTP/1.1
Host: www.host1.com:80
[blank line here]

except the ":80" isn't required, since that's the default HTTP port.

Chunked Transfer-Encoding

If a server wants to start sending a response before knowing its total length (like with long script output), it might use the simple chunked transfer-encoding, which breaks the complete response into smaller chunks and sends them in series. You can identify such a response because it contains the "Transfer-Encoding: chunked" header. All HTTP 1.1 clients must be able to receive chunked messages.
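
The text above doesn't spell out the chunk format, so the following sketch relies on the HTTP 1.1 specification: each chunk is a hexadecimal size on its own line, then that many bytes of data and a CRLF, and a chunk of size zero ends the body. The function name `decode_chunked` is made up here, and the sketch ignores any footers that may follow the last chunk.

```python
def decode_chunked(data: bytes) -> bytes:
    """Reassemble a chunked message body into the original bytes."""
    body = b""
    pos = 0
    while True:
        # Each chunk starts with its size in hex, terminated by CRLF.
        line_end = data.index(b"\r\n", pos)
        size_field = data[pos:line_end].split(b";")[0]  # drop any extensions
        size = int(size_field.strip(), 16)
        if size == 0:  # a zero-size chunk marks the end of the body
            break
        start = line_end + 2
        body += data[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data
    return body


chunked = (
    b"1a\r\nabcdefghijklmnopqrstuvwxyz\r\n"
    b"10\r\n1234567890abcdef\r\n"
    b"0\r\n\r\n"
)
print(decode_chunked(chunked))
```

Here the first chunk is 0x1a = 26 bytes and the second is 0x10 = 16 bytes, so the decoded body is 42 bytes long.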

HTTP 1.1 Servers

To comply with HTTP 1.1, servers must:

require the Host: header from HTTP 1.1 clients
accept absolute URLs in a request
accept requests with chunked data
either support persistent connections, or include the "Connection: close" header with each response
use the "100 Continue" response appropriately
include the Date: header in each response
handle requests with If-Modified-Since: or If-Unmodified-Since: headers
support at least the GET and HEAD methods
support HTTP 1.0 requests

Requiring the Host: Header

Because of the urgency of implementing the new Host: header, servers are not allowed to tolerate HTTP 1.1 requests without it. If a server receives such a request, it must return a "400 Bad Request" response, like

HTTP/1.1 400 Bad Request
Content-Type: text/html
Content-Length: 111

<html><body>
<h2>No Host: header received</h2>
HTTP 1.1 requests must include the Host: header.
</body></html>
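
A server-side check for this rule might be sketched as follows. The function name `check_host` is invented for the example, and it assumes the request head has already been read and decoded into a string.

```python
def check_host(request_head: str):
    """Return a 400 response if an HTTP/1.1 request lacks Host:, else None."""
    lines = request_head.split("\r\n")
    # The HTTP version is the last word of the request line.
    version = lines[0].rsplit(" ", 1)[-1]
    header_names = {
        line.split(":", 1)[0].strip().lower() for line in lines[1:] if line
    }
    if version == "HTTP/1.1" and "host" not in header_names:
        body = (
            "<html><body>\n"
            "<h2>No Host: header received</h2>\n"
            "HTTP 1.1 requests must include the Host: header.\n"
            "</body></html>\n"
        )
        return (
            "HTTP/1.1 400 Bad Request\r\n"
            "Content-Type: text/html\r\n"
            f"Content-Length: {len(body)}\r\n"
            "\r\n"
            f"{body}"
        )
    return None  # the request passes this particular check


print(check_host("GET /path/file.html HTTP/1.1\r\nUser-Agent: HTTPTool/1.0"))
```

HTTP 1.0 requests pass the check untouched, since only HTTP 1.1 clients are required to send Host:.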

Persistent Connections and the "Connection: close" Header

In HTTP 1.0 and before, TCP connections are closed after each request and response, so each resource to be retrieved requires its own connection.
Opening and closing TCP connections takes a substantial amount of CPU time, bandwidth, and memory.
Most Web pages consist of several files on the same server, so much can be saved by allowing several requests and responses to be sent through a single persistent connection.
Persistent connections are the default in HTTP 1.1. Just open a connection and send several requests in series (called pipelining), and read the responses in the same order as the requests were sent.
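
To give a feel for pipelining, here is a sketch that builds two requests back to back and then splits a simulated stream of responses. The paths, the sample responses, and the function name `split_responses` are all made up for illustration, and the sketch assumes every response carries a Content-Length: header (it does not handle chunked responses).

```python
def split_responses(stream: bytes):
    """Split back-to-back responses using each one's Content-Length."""
    responses = []
    while stream:
        head, _, rest = stream.partition(b"\r\n\r\n")
        head_lines = head.split(b"\r\n")
        length = 0
        for line in head_lines[1:]:
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        # Keep the status line and exactly `length` bytes of body.
        responses.append((head_lines[0], rest[:length]))
        stream = rest[length:]  # whatever follows is the next response
    return responses


# Two requests sent in series over one connection, without waiting between.
requests = b"".join(
    f"GET {path} HTTP/1.1\r\nHost: www.host1.com\r\n\r\n".encode("ascii")
    for path in ("/index.html", "/logo.gif")
)

# Simulated bytes read back from the same connection, in request order.
stream = (
    b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nfirst"
    b"HTTP/1.1 200 OK\r\nContent-Length: 6\r\n\r\nsecond"
)
print(split_responses(stream))
```

Because the connection stays open, the client can no longer rely on the socket closing to find the end of a response; it needs the length (or chunked encoding) instead.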

The Date: Header

Caching is an important improvement in HTTP 1.1, and can't work without timestamped responses. So, servers must timestamp every response with a Date: header containing the current time, in the form

Date: Fri, 31 Dec 1999 23:59:59 GMT

All time values in HTTP use Greenwich Mean Time.

Handling Requests with If-Modified-Since: or If-Unmodified-Since: Headers

To avoid sending resources that don't need to be sent, thus saving bandwidth, HTTP 1.1 defines the If-Modified-Since: and If-Unmodified-Since: request headers. The former says "only send the resource if it has changed since this date"; the latter says the opposite, "only send the resource if it has not changed since this date". Clients aren't required to use them, but HTTP 1.1 servers are required to honor requests that do use them.

Unfortunately, due to earlier HTTP versions, the date value may be in any of three possible formats:

If-Modified-Since:  Fri, 31 Dec 1999 23:59:59 GMT
If-Modified-Since:  Friday, 31-Dec-99 23:59:59 GMT
If-Modified-Since:  Fri Dec 31 23:59:59 1999
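
One way a server might cope with the three formats is to try each in turn. This sketch uses `datetime.strptime`; the names `parse_http_date` and `is_modified` are invented here, and it assumes the default English day and month names (strptime is locale-dependent).

```python
from datetime import datetime

# One strptime pattern per format listed above.
DATE_FORMATS = (
    "%a, %d %b %Y %H:%M:%S GMT",  # Fri, 31 Dec 1999 23:59:59 GMT
    "%A, %d-%b-%y %H:%M:%S GMT",  # Friday, 31-Dec-99 23:59:59 GMT
    "%a %b %d %H:%M:%S %Y",       # Fri Dec 31 23:59:59 1999
)


def parse_http_date(value: str) -> datetime:
    """Parse a date in any of the three formats; the result is in GMT."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt)
        except ValueError:
            continue  # not this format, try the next one
    raise ValueError(f"unrecognized HTTP date: {value!r}")


def is_modified(last_modified: datetime, if_modified_since: str) -> bool:
    """True if the resource changed after the date the client sent."""
    return last_modified > parse_http_date(if_modified_since)


print(parse_http_date("Friday, 31-Dec-99 23:59:59 GMT"))
```

The returned datetimes carry no timezone; the sketch simply treats every value as GMT, as HTTP requires.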

URL-encoding

HTML form data is usually URL-encoded to package it in a GET or POST submission. In a nutshell, here's how you URL-encode the name-value pairs of the form data:

Convert all "unsafe" characters in the names and values to "%xx", where "xx" is the ASCII value of the character, in hex. "Unsafe" characters include =, &, %, +, non-printable characters, and any others you want to encode; there's no danger in encoding too many characters. For simplicity, you might encode all non-alphanumeric characters.
Change all spaces to pluses.
String the names and values together with = and &, like
```
name1=value1&name2=value2&name3=value3
```
This string is your message body for POST submissions, or the query string for GET submissions.

For example, if a form has a field called "name" that's set to "Lucy", and a field called "neighbors" that's set to "Fred & Ethel", the URL-encoded form data would be

name=Lucy&neighbors=Fred+%26+Ethel

with a length of 34.
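
The steps above can be written out by hand in Python, as in this sketch. It takes the "encode all non-alphanumeric characters" shortcut; the function names `encode_component` and `url_encode` are made up for the example. The standard library's `urllib.parse.urlencode` does the same job and leaves a few more punctuation characters unencoded.

```python
from urllib.parse import urlencode


def encode_component(text: str) -> str:
    """URL-encode one name or value, byte by byte."""
    out = []
    for byte in text.encode("utf-8"):
        char = chr(byte)
        if char.isascii() and char.isalnum():
            out.append(char)            # alphanumerics are left as-is
        elif char == " ":
            out.append("+")             # spaces become pluses
        else:
            out.append(f"%{byte:02X}")  # everything else becomes %xx in hex
    return "".join(out)


def url_encode(pairs) -> str:
    """Join encoded names and values with = and &."""
    return "&".join(
        f"{encode_component(name)}={encode_component(value)}"
        for name, value in pairs
    )


fields = [("name", "Lucy"), ("neighbors", "Fred & Ethel")]
encoded = url_encode(fields)
print(encoded, len(encoded))
# For this input the standard library produces the same string.
print(urlencode(fields) == encoded)
```

The length printed here is the value to put in Content-Length: for a POST submission.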