Mastering the Web

Character Data in HTML

When you write the text of your HTML page, the characters you use must belong to a known character set. This character set is specified by the "charset" parameter of the "text/html" media type, and it is usually "ISO-8859-1", although it could also be the more restricted "US-ASCII". The character set (charset) ISO-8859-1 is also known as Latin Alphabet No. 1, or simply Latin-1. To specify this charset you should include the following in the HEAD section of the page:

<META HTTP-EQUIV="content-type" CONTENT=
"text/html; charset=ISO-8859-1">

Latin-1 includes characters from most Western European languages, as well as a number of control characters. Control characters are non-printable characters that are typically used for communication and device control, as format markers, and as data delimiters.
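
As a rough illustration of what it means for text to belong to a character set, the following Python sketch checks whether a string can be encoded in ISO-8859-1. Python is used here only for illustration and is not part of the original tutorial; the helper name fits_latin1 is hypothetical.

```python
def fits_latin1(text: str) -> bool:
    """Return True if every character of text exists in ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        # At least one character has no code in Latin-1.
        return False


# Accented Western European letters are part of Latin-1 ...
western = fits_latin1("café")
# ... but the euro sign, for example, is not.
euro = fits_latin1("€")
```

A page whose text fails such a check would need either a different charset or character references for the offending characters.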

Control characters

In HTML the use of control characters is limited in order to maximize the chance of successful interchange over heterogeneous networks and operating systems. Only three control characters are used: Horizontal Tab (HT, encoded as 9 decimal in US-ASCII and ISO-8859-1), Line Feed (LF, 10 decimal), and Carriage Return (CR, 13 decimal).

Horizontal Tab is interpreted as a space in all contexts except pre-formatted text. Within pre-formatted text, the tab should be interpreted to shift the horizontal column position to the next position which is a multiple of 8 on the same line.
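
The tab rule can be sketched in a few lines of Python. This is only an approximation of what a browser does, and the function name render_tabs is hypothetical; it relies on the standard str.expandtabs method, which advances to the next column that is a multiple of the given tab size.

```python
def render_tabs(text: str, preformatted: bool) -> str:
    """Approximate how a Horizontal Tab is treated in HTML text."""
    if preformatted:
        # Inside pre-formatted text: move to the next multiple of 8.
        return text.expandtabs(8)
    # Everywhere else a tab counts as a plain space.
    return text.replace("\t", " ")


in_pre = render_tabs("ab\tc", preformatted=True)    # "ab" + 6 spaces + "c"
in_text = render_tabs("ab\tc", preformatted=False)  # "ab c"
```

In the pre-formatted case the letter c ends up in column 8 (counting from 0), regardless of how many characters precede the tab on that line, as long as there are fewer than eight.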

Carriage Return and Line Feed are conventionally used to represent end of line. For media types defined as "text/*", the sequence CR LF is used to represent an end of line. In practice, text/html documents are frequently represented and transmitted using an end-of-line convention that depends on the source of the document; that representation may consist of CR only, LF only, or the CR LF combination. Documents prepared on computers running Microsoft's operating systems have their ends of line marked with both characters, those prepared on Unix systems use only LF, and those prepared on Apple computers running the classic Mac OS use only CR.

In HTML, end of line in any of its variations is interpreted as a space in all contexts except pre-formatted text. Within pre-formatted text, browsers are expected to treat any of the three common representations of end-of-line as starting a new line.
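
A minimal Python sketch of the two end-of-line rules follows. The function names are hypothetical, and real browsers do more (for example, they also collapse runs of spaces), so this should be read as an illustration rather than a rendering algorithm.

```python
import re

# CR LF must be tried first so that it counts as ONE end of line.
_EOL = re.compile(r"\r\n|\r|\n")


def eol_to_space(text: str) -> str:
    """Outside pre-formatted text: every end of line becomes a space."""
    return _EOL.sub(" ", text)


def split_pre_lines(text: str) -> list[str]:
    """Inside pre-formatted text: every end of line starts a new line."""
    return _EOL.split(text)


mixed = "one\r\ntwo\rthree\nfour"
as_text = eol_to_space(mixed)      # "one two three four"
as_lines = split_pre_lines(mixed)  # ["one", "two", "three", "four"]
```

Because all three conventions are matched by the same pattern, the result does not depend on which operating system produced the document.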

Special characters

Certain characters have special meaning in HTML documents. There are two printing characters which may be interpreted by an HTML application to have an effect on the format of the text:

Space

· Interpreted as a word space (place where a line can be broken) in all contexts except the Pre-formatted Text element.

· Interpreted as a non-breaking space within the Pre-formatted Text element.

Hyphen

· Interpreted as a hyphen symbol in all contexts.

· Interpreted as a potential word space when hyphenating the document.

Certain characters are part of the HTML markup, and when used in the content's text should be replaced by entity references, always prefaced with ampersand (&) and followed by a semicolon. These characters are as follows:

	Symbol	Entity Name	Description
	<	lt		Less than sign
	>	gt		Greater than sign
	&   	amp		Ampersand
	"   	quot		Double quote sign

So that these characters will not be interpreted as markup, they must be represented by entity references. For example, this line of a program written in the C language,

if ( var > 125 && var < 250 ) {

when included in an HTML page, should be coded as:

if ( var &gt; 125 &amp;&amp; var &lt; 250 ) {
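
When the page text is generated by a program, the same replacement can be automated. As a minimal sketch, assuming the text is produced by a Python script (an assumption made only for this illustration), the standard library function html.escape performs the substitutions listed in the table above:

```python
import html

c_line = "if ( var > 125 && var < 250 ) {"

# quote=False replaces only &, < and >, which is enough for text
# that appears between tags.
escaped = html.escape(c_line, quote=False)
# escaped == "if ( var &gt; 125 &amp;&amp; var &lt; 250 ) {"

# quote=True (the default) also replaces quotation marks, which
# matters when the text is placed inside an attribute value.
attr_value = html.escape('say "hi"')
# attr_value == "say &quot;hi&quot;"
```

The ampersand has to be replaced before the other characters; otherwise the ampersands introduced by &lt; and &gt; would themselves be escaped a second time. html.escape takes care of that ordering.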

Character Entities

Many of the Latin alphabet No. 1 set of printing characters may be represented within the text of an HTML document by a character entity. The reasons for using a character entity are:

· the keyboard does not provide a key for the character, such as on U.S. keyboards which do not provide European characters

· the character may be interpreted as markup, such as the ampersand (&), the double quote ("), and the less-than (<) and greater-than (>) signs

The HTML DTD includes a character entity for each of the printing characters in the character set Latin-1, so that one may reference them by name if it is inconvenient to enter them directly. To ensure that a string of characters is not interpreted as markup, represent all occurrences of <, >, and & by character or entity references.

The following entity names are used in HTML, always prefaced with ampersand (&) and followed by a semicolon.

Table of character entities
Name Symbol Description
Aacute Á Capital A, acute accent
aacute á Small a, acute accent
Acirc Â Capital A, circumflex accent
acirc â Small a, circumflex accent
AElig Æ Capital AE diphthong (ligature)
aelig æ Small ae diphthong (ligature)
Agrave À Capital A, grave accent
agrave à Small a, grave accent
Aring Å Capital A, ring
aring å Small a, ring
Atilde Ã Capital A, tilde
atilde ã Small a, tilde
Auml Ä Capital A, dieresis or umlaut mark
auml ä Small a, dieresis or umlaut mark
Ccedil Ç Capital C, cedilla
ccedil ç Small c, cedilla
copy © Copyright
Eacute É Capital E, acute accent
eacute é Small e, acute accent
Ecirc Ê Capital E, circumflex accent
ecirc ê Small e, circumflex accent
Egrave È Capital E, grave accent
egrave è Small e, grave accent
ETH Ð Capital Eth, Icelandic
eth ð Small eth, Icelandic
Euml Ë Capital E, dieresis or umlaut mark
euml ë Small e, dieresis or umlaut mark
Iacute Í Capital I, acute accent
iacute í Small i, acute accent
Icirc Î Capital I, circumflex accent
icirc î Small i, circumflex accent
Igrave Ì Capital I, grave accent
igrave ì Small i, grave accent
Iuml Ï Capital I, dieresis or umlaut mark
iuml ï Small i, dieresis or umlaut mark
Ntilde Ñ Capital N, tilde
ntilde ñ Small n, tilde
Oacute Ó Capital O, acute accent
oacute ó Small o, acute accent
Ocirc Ô Capital O, circumflex accent
ocirc ô Small o, circumflex accent
Ograve Ò Capital O, grave accent
ograve ò Small o, grave accent
Oslash Ø Capital O, slash
oslash ø Small o, slash
Otilde Õ Capital O, tilde
otilde õ Small o, tilde
Ouml Ö Capital O, dieresis or umlaut mark
ouml ö Small o, dieresis or umlaut mark
reg ® Registered trademark
szlig ß Small sharp s, German (sz ligature)
THORN Þ Capital THORN, Icelandic
thorn þ Small thorn, Icelandic
trade ™ Trademark (not part of Latin-1)
Uacute Ú Capital U, acute accent
uacute ú Small u, acute accent
Ucirc Û Capital U, circumflex accent
ucirc û Small u, circumflex accent
Ugrave Ù Capital U, grave accent
ugrave ù Small u, grave accent
Uuml Ü Capital U, dieresis or umlaut mark
uuml ü Small u, dieresis or umlaut mark
Yacute Ý Capital Y, acute accent
yacute ý Small y, acute accent
yuml ÿ Small y, dieresis or umlaut mark
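
Entity names are case-sensitive: the capital and small forms of a letter differ only in the case of the first letter of the name. The following Python sketch looks up a few names from the table; it uses the standard html.entities.name2codepoint mapping, and the helper entity_to_char is a hypothetical name introduced for this illustration.

```python
import html
from html.entities import name2codepoint


def entity_to_char(name: str) -> str:
    """Return the character for an entity name given without & and ;."""
    return chr(name2codepoint[name])


capital = entity_to_char("Aacute")  # "Á"
small = entity_to_char("aacute")    # "á"

# html.unescape resolves complete references inside a string.
word = html.unescape("Espa&ntilde;a")  # "España"
```

Looking up a name with the wrong case either returns the other form of the letter or fails with a KeyError, which is a quick way to catch a mistyped entity.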

Numeric Character Entities

Numeric character entities are represented in an HTML document as entities whose name is the number sign (#) followed by a decimal number in the range 32-126 or 161-255. The HTML DTD includes a numeric character entity for each of the printing characters of the ISO-8859-1 encoding, so that one may reference them by number if it is inconvenient to enter them directly. The following numeric names are used in HTML, always prefaced with ampersand (&) and followed by a semicolon.


Table of numeric character entities
Name Description Symbol
#00-#08 Unused N/A
#09 Horizontal tab N/A
#10 Line feed N/A
#11-#12 Unused N/A
#13 Carriage return N/A
#14-#31 Unused N/A
#32 Space N/A
#33 Exclamation mark !
#34 Quotation mark "
#35 Number sign #
#36 Dollar sign $
#37 Percent sign %
#38 Ampersand &
#39 Apostrophe '
#40 Left parenthesis (
#41 Right parenthesis )
#42 Asterisk *
#43 Plus sign +
#44 Comma ,
#45 Hyphen -
#46 Period (full stop) .
#47 Solidus (slash) /
#48-#57 Digits 0-9 0-9
#58 Colon :
#59 Semi-colon ;
#60 Less than <
#61 Equals sign =
#62 Greater than >
#63 Question mark ?
#64 Commercial at @
#65-#90 Letters A-Z A-Z
#91 Left square bracket [
#92 Reverse solidus (backslash) \
#93 Right square bracket ]
#94 Caret ^
#95 Underscore (horizontal bar) _
#96 Grave accent `
#97-#122 Letters a-z a-z
#123 Left curly brace {
#124 Vertical bar |
#125 Right curly brace }
#126 Tilde ~
#127-#159 Unused N/A
#160 Non-breaking space N/A
#161 Inverted exclamation ¡
#162 Cent sign ¢
#163 Pound sterling £
#164 General currency sign ¤
#165 Yen sign ¥
#166 Broken vertical bar ¦
#167 Section sign §
#168 Umlaut (dieresis) ¨
#169 Copyright ©
#170 Feminine ordinal ª
#171 Left angle quote, guillemet left «
#172 Not sign ¬
#173 Soft hyphen N/A
#174 Registered trademark ®
#175 Macron accent ¯
#176 Degree sign °
#177 Plus or minus ±
#178 Superscript two ²
#179 Superscript three ³
#180 Acute accent ´
#181 Micro sign µ
#182 Paragraph sign ¶
#183 Middle dot ·
#184 Cedilla ¸
#185 Superscript one ¹
#186 Masculine ordinal º
#187 Right angle quote, guillemet right »
#188 Fraction one-fourth ¼
#189 Fraction one-half ½
#190 Fraction three-fourths ¾
#191 Inverted question mark ¿
#192 Capital A, grave accent À
#193 Capital A, acute accent Á
#194 Capital A, circumflex accent Â
#195 Capital A, tilde Ã
#196 Capital A, dieresis or umlaut mark Ä
#197 Capital A, ring Å
#198 Capital AE diphthong (ligature) Æ
#199 Capital C, cedilla Ç
#200 Capital E, grave accent È
#201 Capital E, acute accent É
#202 Capital E, circumflex accent Ê
#203 Capital E, dieresis or umlaut mark Ë
#204 Capital I, grave accent Ì
#205 Capital I, acute accent Í
#206 Capital I, circumflex accent Î
#207 Capital I, dieresis or umlaut mark Ï
#208 Capital Eth, Icelandic Ð
#209 Capital N, tilde Ñ
#210 Capital O, grave accent Ò
#211 Capital O, acute accent Ó
#212 Capital O, circumflex accent Ô
#213 Capital O, tilde Õ
#214 Capital O, dieresis or umlaut mark Ö
#215 Multiply sign ×
#216 Capital O, slash Ø
#217 Capital U, grave accent Ù
#218 Capital U, acute accent Ú
#219 Capital U, circumflex accent Û
#220 Capital U, dieresis or umlaut mark Ü
#221 Capital Y, acute accent Ý
#222 Capital THORN, Icelandic Þ
#223 Small sharp s, German (sz ligature) ß
#224 Small a, grave accent à
#225 Small a, acute accent á
#226 Small a, circumflex accent â
#227 Small a, tilde ã
#228 Small a, dieresis or umlaut mark ä
#229 Small a, ring å
#230 Small ae diphthong (ligature) æ
#231 Small c, cedilla ç
#232 Small e, grave accent è
#233 Small e, acute accent é
#234 Small e, circumflex accent ê
#235 Small e, dieresis or umlaut mark ë
#236 Small i, grave accent ì
#237 Small i, acute accent í
#238 Small i, circumflex accent î
#239 Small i, dieresis or umlaut mark ï
#240 Small eth, Icelandic ð
#241 Small n, tilde ñ
#242 Small o, grave accent ò
#243 Small o, acute accent ó
#244 Small o, circumflex accent ô
#245 Small o, tilde õ
#246 Small o, dieresis or umlaut mark ö
#247 Division sign ÷
#248 Small o, slash ø
#249 Small u, grave accent ù
#250 Small u, acute accent ú
#251 Small u, circumflex accent û
#252 Small u, dieresis or umlaut mark ü
#253 Small y, acute accent ý
#254 Small thorn, Icelandic þ
#255 Small y, dieresis or umlaut mark ÿ
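
The numbers in this table are simply the ISO-8859-1 codes of the characters, so a numeric reference can be computed rather than looked up. The Python sketch below is one possible way to do that; the function name to_numeric_refs is hypothetical, and the choice to convert only codes 160-255 (leaving plain ASCII text untouched) is an assumption made for this example.

```python
import html


def to_numeric_refs(text: str) -> str:
    """Replace each character with code 160-255 by its &#N; reference."""
    return "".join(
        f"&#{ord(ch)};" if 160 <= ord(ch) <= 255 else ch
        for ch in text
    )


encoded = to_numeric_refs("Año")  # "A&#241;o"

# html.unescape performs the reverse conversion.
decoded = html.unescape(encoded)  # "Año"
```

Text converted this way contains only US-ASCII characters, so it survives transmission even when the charset of the page is misreported.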

© 1999-2008 Hector Castro -- All rights reserved


www.great-web-info.com