A year ago, I wrote up a brief primer on HTML for some colleagues that was intended for writers and editors. By this I mean I tried to keep technical jargon to a minimum and focused on appealing to writers’ and editors’ love of semantics and grammar.
I was just forwarding it to someone who didn’t know any HTML, and I figured it makes sense to post it here for the benefit of anyone else who stumbles upon it.
Why You Should Care About HTML
For writers and editors, HTML probably seems at best like a minor nuisance. The fact that it was written by (and to some extent conceived for) programmers is probably a big reason why.
In fact, HTML is pretty easy to learn conceptually. And these days, HTML has mostly been stripped of what once made it a “programming” or design language. Properly used, HTML is now a way for content authors to define the meaning of various aspects of a web document. Indeed, content authors and editors are really the only people who can properly mark up a document and ensure it’s being done correctly.
Here are just a few of the reasons why you should care (passionately) about HTML:
- Both people and computers need to read your documents. In general, we write and edit for people. But on the web, people aren’t the only ones who read your document — machines read it as well. Proper HTML is like good grammar for machines.
- Good HTML makes it easier for designers and programmers to do their jobs. Designers are charged with making your documents look right and programmers are charged with making them work right. When you play by the HTML rules, you give them the freedom to focus on what they do best.
- Your documents will last forever. The web is constantly evolving. Programmers come up with new tricks for working with web documents every day, and designers constantly have new design tools added to their arsenal. Good HTML is forward compatible, meaning your documents themselves can evolve as well, no matter which way technology goes in the future.
- You can add a whole new level of meaning to your documents for readers. Used properly, HTML allows you to give readers more information about what various parts of your documents mean. For example, you can seamlessly define abbreviations, cite quotations, or identify people and places without interfering with the readability of your documents.
What HTML Is and Isn’t
In simple terms, HTML is a markup language that allows content authors and editors to explicitly define the semantic meaning of various aspects of a web document. That’s a simple and true definition, but it conflicts with what most people think HTML is and how most people use it. Unless you grasp this definition of HTML, you’re doomed to use it improperly.
Here’s an easy way to think about it:
HTML is not a way of making a document look or function in a certain way. Instead, HTML is a way of describing what a document is.
Here’s a quick little example that illustrates where most people go astray in their thinking.
Our CEO was featured in Time magazine today.
When preparing to mark this sentence up in HTML, most writers or editors would think: “Okay, I just need to put the title of the publication ‘Time’ in italics.” This is a natural reaction, since you’ve been trained to put publications in italics throughout your life, and most of us are accustomed to pressing the little “I” button in Microsoft Word when we want something to appear in italics.
While that line of thinking isn’t entirely wrong, it isn’t entirely right either, at least from a semantics perspective. The problem is that it conflates two related but separate concepts: presentation (in this case, the italics) and meaning (in this case, the fact that “Time” is the title of a publication being cited, which is what the italics are actually conveying).
We can see the difference in practice. One way to mark up the document (in an older version of HTML) was to use the italics tag:
Our CEO was featured in <i>Time</i> magazine today.
The newer version of HTML, however, insists upon defining this text not by how it should appear, but by what it means, in this case using the <cite> (citation) tag:
Our CEO was featured in <cite>Time</cite> magazine today.
Citations, by default, are rendered as italics in most browsers, so for readers, the two ways of coding this document appear the same. If that makes the distinction seem merely academic, it’s not. There are some real ramifications to the two different approaches. To name a few:
- A content author might decide to provide a link to the home page of the publication being cited.
- A designer might choose to render citations with a different look — using the color green or a small graphic rather than just italics.
- A programmer might build a function that provides a complete list of references cited in an article at the bottom of the article, which can be generated on the fly.
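As a rough sketch of the first two possibilities, the page below links the cited title to a home page and restyles every citation in green. The URL is a placeholder, and the page skeleton and style rule are assumptions added only so the example can be opened in a browser.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8" />
  <title>Citation example</title>
  <style>
    /* A designer's rule (assumed for illustration): show citations in green, not italics. */
    cite { color: green; font-style: normal; }
  </style>
</head>
<body>
  <!-- The link target is a placeholder, not the publication's real address. -->
  <p>Our CEO was featured in <cite><a href="http://www.example.com/">Time</a></cite> magazine today.</p>
</body>
</html>
```

Because the sentence says what “Time” is rather than how it should look, neither change required touching the wording itself.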
There are many examples similar to this one — some with big implications, and others that are largely academic.
Nevertheless, good HTML is like good grammar. Although, at the end of the day, you can very likely ignore the rules of grammar and still get your point across, consistent use of good grammar makes your documents more readable and your image more professional. Good HTML has plenty of its own benefits, both obvious and hidden.
Now that you understand the proper way to conceive of HTML, and are as convinced as I am of the importance of separating meaning from presentation and functionality, it’s time to get down to brass tacks and describe how to actually use HTML.
HTML is a markup language; i.e. it’s used to “mark up” a document. Think of it like a highlighter pen. When you mark up a word or phrase with HTML, it’s like drawing a yellow line through it. The only difference is, you add a bit of description when using HTML to explain what the yellow line actually means.
Fortunately, HTML is quite simple. As with any foreign language, there are some vocabulary words you’ll eventually need to learn, but conceptually HTML boils down to three concepts:
- Tags
- Elements
- Attributes and values
Let’s briefly put these terms in context, and then tackle them one by one. Following is a phrase marked up with generic HTML:
<element attribute="value">The cat is black.</element>
In the context of the example above:
- A tag is everything that comes between, and inclusive of, the angle brackets (< and >). HTML phrases always start with an opening tag (e.g. <element>) and end with a closing tag (e.g. </element>). To use our metaphor from above, they define where you start putting your highlighter marker on the page and where you stop. (That’s why you must close all tags. If you don’t, a browser won’t be able to know where you wanted to stop your highlighter.)
- An element is a word that describes what the phrase you’re marking up is. Common elements include a paragraph (<p>), a list item (<li>), a hyperlink (<a>), an abbreviation (<abbr>), or an image (<img>). There are several dozen elements, but only about 15 or 20 are commonly used.
- An attribute/value pair provides more information about the phrase being marked up and is not always required. Attribute/value pairs are used, for example, to define the destination of a hyperlink (e.g. <a href="http://www.yahoo.com">), the source of an image file (e.g. <img src="http://www.photo.com/my_picture.jpg">), or the meaning of an abbreviation (e.g. <abbr title="United Nations">U.N.</abbr>).
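To make those three terms concrete, here is one way the Yahoo! hyperlink mentioned above might look as a complete phrase, with each part labeled in comments. The link text “Visit Yahoo!” is an assumption added for illustration.

```html
<!-- Opening tag: <a href="http://www.yahoo.com">             -->
<!--   element:   a (a hyperlink)                             -->
<!--   attribute: href, with the value "http://www.yahoo.com" -->
<!-- Closing tag: </a>                                        -->
<a href="http://www.yahoo.com">Visit Yahoo!</a>
```

Everything between the opening and closing tags (here, the words “Visit Yahoo!”) is the phrase being highlighted.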
That’s really all there is to it. HTML is no fancier than that. There are just a few syntax rules you need to learn and abide by to ensure your HTML is correct, and a few element definitions you need to know to start using HTML.
Let’s start with syntax.
The example from above…
<element attribute="value">The cat is black.</element>
…is properly formatted HTML. The rules are pretty straightforward and easy to follow.
- Treat HTML as case sensitive. Always use lowercase for element and attribute names.
- A tag always starts with < and ends with >. If you forget to put a > at the end of your tag, you’re going to confuse the computer: it won’t know where your tag ends!
- An HTML phrase always opens with a start tag (<element>) and closes with an end tag (</element>). The forward slash / is what makes a tag an end tag. Exception: There are a few tags called “empty tags” that don’t have a phrase between them. An example is the img tag, which generally looks like this: <img src="[image URL]">. Empty tags still must be closed. They are closed by adding a space and a forward slash at the end, as follows in our example: <img src="[image URL]" />.
- The value of an attribute must always be placed in quotes. For example, <a href="http://www.yahoo.com"> and not <a href=http://www.yahoo.com>. Don’t forget to close your quotation marks! The incorrect <a href="http://www.yahoo.com> will cause you nightmares.
- Multiple attribute/value pairs should be separated by a space. For example, <a href="http://www.yahoo.com" title="Yahoo! home page" class="external_link">.
- HTML phrases must be properly nested. This is best demonstrated by example. Correct: <p>That statement is <em>ridiculous.</em></p> Incorrect: <p>That statement is <em>ridiculous.</p></em>
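Putting the rules together, a short fragment that follows all of them might look like the sketch below. The image file name and the alt text are placeholders invented for this example, and the alt attribute (a text description of the image) is not covered above.

```html
<!-- Lowercase names, quoted values, closed tags, and proper nesting throughout. -->
<p>
  That statement is <em>ridiculous.</em>
  See the <a href="http://www.yahoo.com" title="Yahoo! home page" class="external_link">Yahoo! home page</a>.
</p>
<p>
  <!-- An empty tag, closed with a space and a forward slash. -->
  <img src="my_picture.jpg" alt="A black cat" />
</p>
```

Each opening tag has its matching closing tag, and tags opened inside a paragraph are closed before the paragraph itself is.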
Those are the ground rules (the grammar of HTML, if you will), and that’s all you really need to know. Everything else is just a matter of definitions and specifics. A good reference guide, like the guide at htmlhelp.com, will give you easy definitions for every element and attribute. (Be forewarned, however, that most guides are written for an earlier version of HTML and might imply that you can ignore the above syntax rules, which you cannot.)
There are plenty of tools out there that will help you code HTML more quickly than you can by hand. But if you take the time to learn the actual mechanics of HTML, and if you internalize the concepts behind it, you’ll find yourself with much more control over the meaning of your documents and in a position to provide much more value to your readers.