Posted on Jul 26, 2007 - 1:50am by John P. in Tutorials, Wordpress
Although my partners and I have shared HTML authoring information with tens of millions of people on HTMLHelp.com over the previous decade, this is the first time that I’ve ever made available the brief introduction to HTML that I actually use when I’m teaching friends and family or giving lectures at universities or other speaking engagements.
I chose to share it in the hopes that it will be of use both to beginners and to educators looking for a good brief introductory document. In addition to the Web-based version below, here is a PDF version that is formatted for printing (I think it’s easier to read).
Technically speaking, HTML stands for HyperText Markup Language. HTML is an SGML application. SGML, or Standard Generalized Markup Language, is a system of text rules that had its roots in the late 1960s and was pioneered by Dr. Charles Goldfarb.
All systems of working with text have a basic set of rules. These rules can be referred to as a markup language. To ease into this difficult concept, let’s take a simple letter as an example and attempt to describe its markup language.
If we look at this sample letter:
November 3, 1996
John Q. Student
12345 UTD Drive
Richardson, TX 75000
Dear John,
I am sorry that it has to end this way, but I have met a brilliant Internet expert who I wish to follow to the ends of Cyberspace. I am afraid that I can no longer be your E-mail girlfriend.
Things I like about him:
1. He is kind of funny (looking).
2. He is tall.
3. He has a computer.
Please don’t bother to write back as I will never speak to you again.
Yours always,
Suzy Q. Byte
Your ex-girlfriend
encl: The floppy disk you gave me for our anniversary.
We could break the letter down into logical sections:
<Date>
<Address>
<Salutation>
<Body>
<Closing>
And within each section we could further subdivide:
<Date>
<Address>
- <Name>
- <Street>
- <Town, State, Zip>
<Salutation>
<Body>
- <Paragraph>
- <List>
- <Paragraph>
<Closing>
- <Name>
- <Title>
- <Enclosures>
Now, if we were to define a set of rules for each section we would have a simple markup language. For example:
That is really all there is to a markup language. It is concerned with the exact layout and formatting of a document as it is to occur in its final output.
It seems simple enough; however, there is one glaring problem with this method of defining a document: it is very platform dependent. This means that it can only be used on identical systems. For example, what would happen if this document and its related markup language were transplanted to another system that did not have the font Times New Roman? Something would definitely not work right.
This is exactly the problem that Dr. Goldfarb was interested in addressing. He reasoned that a better markup language would not be quite as specific and would hence be platform independent. In other words, we are not interested that the address is Times New Roman 12 point, but rather that the address appears at that particular spot in the document. This way, a document could be moved to different systems and each system could use its own built-in set of rules on how to display the corresponding general markup.
In 1986, the International Organization for Standardization (ISO) in Geneva adopted SGML as “Information Processing - Text and Office Systems - Standard Generalized Markup Language.”
So HTML is an SGML-based markup language, meaning that it is not concerned with the specifics of how a document is rendered, merely where each part of the document begins and ends.
Since HTML is a standardized markup language, we use a limited set of elements or “tags” as identifiers to define the structure of a document. A tag is always contained within the “<” and “>” symbols to denote that it is an identifier, and is often paired with a closing tag, which is contained within the “</” and “>” symbols. (The “/” differentiates it as an ending tag.)
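As a small sketch of the two cases, the lines below show one element that uses an opening and a closing tag and one that uses only a single tag. The heading text is made up for illustration.

```html
<!-- H1 uses an opening tag and a matching closing tag -->
<H1>My First Heading</H1>

<!-- HR (a horizontal rule) has no content, so it has no closing tag -->
<HR>
```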
An interesting note about HTML tags is that they may be, and quite often are, contained within one another. For example, the text attribute BOLD is represented by the set of tags:
<b>Everything within is rendered in bold</b>
And the text attribute ITALICS is represented by the set of tags:
<i>Everything within is rendered in italics</i>
If a bold, italics font were desired, the tags would simply be used in conjunction with one another to produce:
<b><i>Render in bold, italics</i></b>
This use of multiple tags within one another is known as nesting. There is one thing to remember about nested tags: internal tags should be closed before external ones. In the preceding example, the bold tags completely surround the italics tags.
If the closing italics tag had been placed after the closing bold tag, the page might still display OK in this particular case, but the author would certainly look like an amateur. In addition, if some tags are not nested properly, they may destroy the way the document is displayed.
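To make the nesting rule concrete, here is a small sketch contrasting the two orderings. The sentence text is made up for illustration; only the order of the closing tags matters.

```html
<!-- Properly nested: the inner <i> is closed before the outer <b> -->
<b>Bold text with <i>bold italics</i> inside</b>

<!-- Improperly nested: </b> arrives while <i> is still open -->
<b>Bold text with <i>bold italics</b> inside</i>
```

A quick way to check your own markup is to read the closing tags from the inside out: they should appear in the reverse order of the opening tags.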
All HTML documents must follow a very basic format. It is quite simple, but must be strictly adhered to. There are five specific tags that must be contained within each document and their format is as follows:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<TITLE>The title of the document goes here</TITLE>
…other heading elements go here.
</HEAD>
<BODY>
The BODY of the document goes here.
…
</BODY>
</HTML>
The 5 required tags are explained as follows:
For more information regarding the use of any HTML tag visit the Web Design Group’s site at http://www.htmlhelp.com. There are online and offline references available with complete definitions and usage examples.
<meta name="author" content="John Pozadzides">
<meta name="description" content="Short site description...">
<meta name="keywords" content="No, more, than, 20, keywords, go, here">
The BODY of a document consists of multiple block level elements. If plain text is found inside the body, it is assumed to be inside a paragraph P.
Block level elements are elements that imply a paragraph break, or the start of a new line. Some Block level elements are:
<UL>
<LI>Walk the Car
<LI>Polish the Dog
<LI>Wash the Cat
</UL>
<OL>
<LI>Walk the Car
<LI>Polish the Dog
<LI>Wash the Cat
</OL>
<CENTER>
<IMG SRC="foo.gif">
This is a picture of Foo.
<A HREF="mailto:foo@mail.com">Send mail to Foo. </A>
</CENTER>
<H1 ALIGN=CENTER>Heading goes here</H1>
<HR SIZE="4" WIDTH="75%">
Text level elements are contained within Block level elements and are used for non-breaking markup such as:
Anchors are used to create hyperlinks to other documents or media within Web pages. The A element denotes an anchor: a hypertext link or the destination of a link.
The HREF attribute specifies a hypertext link to another resource, such as an HTML document or a JPEG image. Examples:
<A HREF="album.html">My photo album</A>
<A HREF="../images/me.jpg">Picture of me</A>
<A HREF="mailto:john@spammenot.com" TITLE="Feedback on HTML Reference">john@spammenot.com</A>
The value of the HREF attribute is the URI of the link. The TITLE attribute can be used to briefly describe the contents of the link and is rendered as a “tooltip” by some visual browsers. With mailto links, some browsers use the TITLE attribute value as a subject for the e-mail message.
The content of an A element used as a link should be as context-free as possible. In other words, a user should be able to pull all A elements from a document and still have an idea what lies behind each link. Link text that contains “Click here” or simply “here” is extremely bad form.
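As a rough illustration of context-free link text, compare the two lines below. The file name album.html is borrowed from the earlier examples, and the surrounding sentences are made up.

```html
<!-- Poor: pulled out of the page, "here" says nothing about the destination -->
To see my photo album, click <A HREF="album.html">here</A>.

<!-- Better: the link text itself describes what lies behind the link -->
Have a look at <A HREF="album.html">my photo album</A>.
```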
The NAME attribute defines a destination for a link. For example, a document containing:
<H1><A NAME="foo">My Heading</A></H1>
defines a link destination named “foo” at the indicated heading. One could then use HREF="#foo" in an A element within the same document or HREF="somedoc.html#foo" from within another document.
An A element cannot contain another A element, so one must be careful that named anchors do not contain link anchors. Authors can use both the NAME and HREF attributes in a single A element to avoid this problem.
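The following sketch shows the problem and the workaround side by side. It reuses the “foo” name and the somedoc.html file name from the text above purely as placeholders.

```html
<!-- Not allowed: a link anchor nested inside a named anchor -->
<H1><A NAME="foo"><A HREF="somedoc.html">My Heading</A></A></H1>

<!-- Allowed: one A element that is both a destination and a link -->
<H1><A NAME="foo" HREF="somedoc.html">My Heading</A></H1>

<!-- Elsewhere in the same document, a link that jumps to the heading -->
<A HREF="#foo">Jump to My Heading</A>
```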
HTML 4.0’s ID attribute is intended to eliminate the need for A NAME. The ID attribute can be used with almost any element to define a link destination, so that the following could be used in place of the previous example:
<H1 ID=foo>My heading</H1>
NAME and ID values must be unique in any document, and different values must differ by more than just the case. Values must begin with a letter in the range A-Z or a-z, and may be followed by A-Z, a-z, 0-9, hyphens, underscores, colons, or periods. When linking to a named anchor, the name is treated as case sensitive.
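A few hypothetical values may help make these rules concrete. The ID values and heading text below are invented for illustration.

```html
<!-- Valid: begins with a letter, followed by letters, digits, a hyphen, and a period -->
<H1 ID="section-2.1">My heading</H1>

<!-- Invalid: begins with a digit -->
<H1 ID="2nd-section">Another heading</H1>

<!-- Invalid together: these two values differ only by case -->
<H2 ID="intro">Introduction</H2>
<H2 ID="Intro">Introduction, continued</H2>

<!-- The link should match the case of the destination exactly -->
<A HREF="#section-2.1">Jump to my heading</A>
```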
Why Validate My HTML?
One of the important maxims of computer programming is:
‘Be conservative in what you produce; be liberal in what you accept.’
Browsers follow the second half of this maxim by accepting Web pages and trying to display them even if they’re not legal HTML. Usually this means that the browser will try to make educated guesses about what you probably meant.
The problem is that different browsers (or even different versions of the same browser) will make different guesses about the same illegal construct; worse, if your HTML is really pathological, the browser could get hopelessly confused and produce a mangled mess, or even crash.
That’s why you want to follow the first half of the maxim by making sure your pages are legal HTML. The best way to do that is by running your documents through one or more HTML validators.
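As an illustration of the kind of illegal construct described above, the sketch below shows a line with a missing closing quotation mark next to its legal form. The file name is borrowed from the anchor examples earlier; how any particular browser recovers from the first line is not something this sketch can promise.

```html
<!-- Illegal: the HREF value is never closed, so the browser must guess where it ends -->
<A HREF="album.html>My photo album</A>

<!-- Legal: the attribute value is properly quoted -->
<A HREF="album.html">My photo album</A>
```

A validator would flag the first line, whereas a browser may silently display something unexpected.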
Two of the most reputable HTML validators can be found at:
Once you’ve mastered this document there is much greater depth to be found at http://htmlhelp.com/reference/html40/.
Additionally, there are volunteers staffing the HTMLHelp.com Forums who will happily help anyone making a genuine attempt to learn HTML.
The first principle of Web authoring is to convey information to the reader about a particular topic. The manner in which it is conveyed should work as intended with all Web browsers. Remember, information is only useful if it can be interpreted. As a means to that end, here is a list of very important edicts to follow when designing for the Web.
It just so happens that strict adherence to these rules will not only benefit your visitors, it will also help ensure excellent search engine results via Google and other major providers.
Finally, if you notice any errors or omissions please let me know and I’ll get them corrected immediately.
John, in spite of your great introduction into HTML, what do you think about current standards like XHTML and CSS?
All widely used browsers are capable of handling XHTML 1.0 and CSS 2.0. I am of the opinion that a clear boundary between content (XHTML) and design (CSS) will be more effective in important areas like search engine optimization, barrier-free access (WAI) to web content, etc.
Roy,
I believe that you are correct that XHTML and CSS are important steps in the evolution of Web design. I don’t actually believe that XHTML is more search friendly than HTML (I think they are equal) but I absolutely believe that CSS should be used for design as opposed to HTML.
Having said all of that, I believe that the best first step for people learning is to start with simple HTML, then after getting an understanding of how that works they should move on to CSS and then perhaps XHTML if necessary.
John
I’ve always thoroughly enjoyed the HTMLhelp site you developed. I remember when I was a teenager trying to figure out HTML and referencing sites like yours, including the W3C and other popular ones. Thanks!
One Man, you are the One. Thank you. I don’t know nothing about HTML so I’ll look into your site. But this first lesson was a gift.
Thereza,
Thanks for the kind comments. I am glad to hear that you found this intro useful.
Although I can’t really even remember what it’s like not to know HTML, I feel like this is a very useful intro because I’ve had so many people tell me that it clears up the mystery for them. I think that after you study this a bit and let the concepts sink in it’s much easier to go over to HTMLHelp.com and use any of the HTML tags with minimal effort.
Take care,
John