The World Wide Web is a network of information resources. The Web relies on three mechanisms to make these resources readily available to the widest possible audience:
- A uniform naming scheme for locating resources on the Web, e.g. URLs
- Protocols, for access to named resources over the Web, e.g. HTTP
- Hypertext, for easy navigation among resources, e.g. HTML
HTML documents utilize URLs for specifying hypertext links. The following provides a brief introduction to URLs.
Every resource available on the Web --- HTML document, image, video clip, program, etc. --- has an address that may be encoded by a Uniform Resource Locator, or "URL" (defined in ).
URLs typically consist of three pieces:
- The scheme identifying the protocol used to access the resource.
- The name of the machine hosting the resource.
- The name of the resource itself, given as a path.
Consider the URL that designates the current HTML specification:
http://www.w3.org/TR/WD-html4/cover.html
This URL may be read as follows: Use the HTTP protocol (see ) to transfer the data residing on the machine www.w3.org in the file "/TR/WD-html4/cover.html". Other schemes you may see in HTML documents include "mailto" for email and "ftp" for FTP.
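The three pieces described above can be pulled apart programmatically. The following is a minimal sketch using Python's standard `urllib.parse` module; the helper name `describe_url` is hypothetical and chosen only for illustration.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into the three pieces named in the text.

    `describe_url` is an illustrative helper, not part of any specification.
    """
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,     # protocol used to access the resource
        "machine": parts.hostname,  # machine hosting the resource
        "path": parts.path,         # name of the resource itself
    }


if __name__ == "__main__":
    print(describe_url("http://www.w3.org/TR/WD-html4/cover.html"))
    # {'scheme': 'http', 'machine': 'www.w3.org', 'path': '/TR/WD-html4/cover.html'}
```

For schemes such as "mailto" there is no machine component, so `hostname` is `None` in that case.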
URLs in general are case-sensitive (with the exception of machine names). There may be URLs, or parts of URLs, where case doesn't matter, but identifying these may not be easy. Users should always consider that URLs are case-sensitive.
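As a rough illustration of this rule, the sketch below compares two URLs while ignoring case only in the machine name. The function name `same_url` is hypothetical. Note one assumption beyond the text: Python's `urlsplit` also lowercases the scheme, so schemes are compared case-insensitively here as well.

```python
from urllib.parse import urlsplit


def same_url(a: str, b: str) -> bool:
    """Compare two URLs, treating only the machine name as case-insensitive.

    Illustrative helper; urlsplit additionally lowercases the scheme.
    """
    pa, pb = urlsplit(a), urlsplit(b)
    return (
        pa.scheme == pb.scheme
        and pa.hostname == pb.hostname  # hostname is returned in lowercase
        and pa.path == pb.path          # path is compared case-sensitively
        and pa.query == pb.query
        and pa.fragment == pb.fragment
    )


if __name__ == "__main__":
    print(same_url("http://WWW.W3.ORG/TR/", "http://www.w3.org/TR/"))  # True
    print(same_url("http://www.w3.org/TR/", "http://www.w3.org/tr/"))  # False
```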
The character set of URLs that appear in HTML is specified in .
Some URLs refer to a location within a resource. As specified in , this kind of URL ends with "#" followed by an anchor identifier (called the "fragment identifier"). For instance, here is a URL pointing to an anchor named section_2:
//somesite.com/html/top.html#section_2
A relative URL (defined in ) doesn't contain any protocol or machine information. Its path generally refers to a resource on the same machine as the current document. Relative URLs may contain relative path components (".." means one level up in the hierarchy defined by the path), and may contain fragment identifiers.
Relative URLs are resolved to full URLs using a base URL. defines the normative algorithm for this process.
As an example of relative URL resolution, assume we have the base URL "//www.acme.com/support/intro.html". The relative URL in the following markup for a hypertext link:
Suppliers
would expand to the full URL "//www.acme.com/support/suppliers.html", while the relative URL in the following markup for an image
would expand to the full URL "//www.acme.com/icons/logo.gif".
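The resolution described in this example can be reproduced with Python's `urllib.parse.urljoin`. This is a sketch under stated assumptions: the base URL is given an explicit "http" scheme, and the relative references "suppliers.html", "../icons/logo.gif", and "#section_2" are illustrative guesses rather than values taken from the markup. `urljoin` follows Python's own reading of the URL RFCs and may differ in corner cases from the normative algorithm the text refers to.

```python
from urllib.parse import urljoin

# Assumption: the base URL from the example, written with an explicit http scheme.
BASE = "http://www.acme.com/support/intro.html"


def resolve(relative: str, base: str = BASE) -> str:
    """Resolve a relative URL against a base URL (illustrative helper)."""
    return urljoin(base, relative)


if __name__ == "__main__":
    # A sibling document in the same directory as the base document.
    print(resolve("suppliers.html"))
    # ".." climbs one level in the path hierarchy before descending into icons/.
    print(resolve("../icons/logo.gif"))
    # A bare fragment identifier points into the base document itself.
    print(resolve("#section_2"))
```

The first two calls yield the two full URLs given in the example (with the assumed scheme prepended); the third keeps the base document and appends the fragment identifier.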
5.1.3 URLs in HTML
In HTML, URLs play a role in these situations:
- linking to another document or resource, (see the and elements).
- linking to an external style sheet or script (see the and elements).
- images, objects and applets for inclusion in a page, (see the , , and elements).
- image maps (see the and elements).
- form submission (see ).
- frames (see the and elements).
- citing an external reference (see the , , and elements).
- referring to metadata conventions describing a document (see the element).
User agents should calculate the base URL for resolving relative URLs according to the . The following is a summary of how applies to HTML. User agents should calculate the base URL according to the following precedences (highest priority to lowest):
- The base URL is set by the element.
- The base URL is given by an HTTP header (see ).
- By default, the base URL is that of the current document.
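The precedence list above amounts to a simple fallback chain. The sketch below assumes the caller has already extracted the candidate values; the function and parameter names (`effective_base_url`, `base_element_href`, `http_header_base`) are hypothetical, and how each value is obtained is outside the scope of this example.

```python
from typing import Optional


def effective_base_url(
    document_url: str,
    base_element_href: Optional[str] = None,
    http_header_base: Optional[str] = None,
) -> str:
    """Pick the base URL by precedence, highest priority first.

    All names are illustrative; they are not defined by the specification.
    """
    if base_element_href:   # 1. base URL set by an element in the document
        return base_element_href
    if http_header_base:    # 2. base URL given by an HTTP header
        return http_header_base
    return document_url     # 3. default: the URL of the current document


if __name__ == "__main__":
    doc = "http://www.acme.com/support/intro.html"
    print(effective_base_url(doc))
    print(effective_base_url(doc, http_header_base="http://www.acme.com/mirror/"))
```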
Additionally, the and elements define attributes that take precedence over the value set by the element. Please consult the definitions of these elements for more information about URL issues specific to them.
Link elements specified by HTTP headers are handled exactly as elements that appear explicitly in a document.
MAILTO URLs
In addition to HTTP URLs, authors might want to include MAILTO URLs (see ) in their documents. MAILTO URLs cause email to be sent to some email address. For instance, the author might create a link that, when activated, causes the user agent to open a mail program with the destination address in the "To:" field.
MAILTO URLs have the following syntax:
mailto:email-address
User agents may support MAILTO URL extensions that are not yet Internet standards (e.g., appending subject information to a URL with the syntax "?Subject=my%20subject" where any space characters are replaced by "%20"). Some user agents also support "?Cc=email-address".
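A MAILTO URL of this form can be assembled with standard percent-encoding. The following is a minimal sketch; the helper name `build_mailto` and the address used are hypothetical, and joining several extensions with "&" is an assumption, since the text does not say how "Subject" and "Cc" combine. As noted above, user agents are not required to honor these extensions.

```python
from typing import Optional
from urllib.parse import quote


def build_mailto(address: str, subject: Optional[str] = None,
                 cc: Optional[str] = None) -> str:
    """Build a MAILTO URL with optional Subject and Cc extensions.

    Illustrative helper; combining extensions with "&" is an assumption.
    """
    url = "mailto:" + quote(address, safe="@")
    params = []
    if subject is not None:
        # quote() replaces each space with "%20", as the text describes.
        params.append("Subject=" + quote(subject, safe=""))
    if cc is not None:
        params.append("Cc=" + quote(cc, safe="@"))
    if params:
        url += "?" + "&".join(params)
    return url


if __name__ == "__main__":
    print(build_mailto("joe@example.com", subject="my subject"))
    # mailto:joe@example.com?Subject=my%20subject
```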