HTML5 Draft Specification Explained
Updated 2024-07-23
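"A PDF of the HTML5 draft specification. It covers the core HTML5 vocabulary and associated APIs and is the draft standard for HTML and XHTML, edited by experts from the W3C and the WHATWG. It defines new features for Web application authors, introduces new elements based on existing authoring practice, and emphasizes interoperability criteria for user agents."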
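HTML5 is a key milestone in Web development and has had a far-reaching influence on page design and application development. This draft document details the main revisions in HTML5, including new semantic elements, enhanced multimedia support, offline storage, and improved forms and APIs.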
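1. **Semantic elements**: HTML5 introduces more semantic elements, such as `<header>`, `<footer>`, `<article>`, `<section>`, and `<aside>`. These help developers organize content and improve the readability and accessibility of the page structure.
2. **Multimedia support**: HTML5 adds native `<video>` and `<audio>` elements, so embedding multimedia content no longer depends on Flash or other third-party plugins. It also provides the `<canvas>` element for drawing 2D graphics, which makes games and data visualization possible.
3. **Offline storage**: Through the `Application Cache` mechanism, HTML5 lets a page keep access to some of its data while offline, improving the user experience. (Application Cache has since been deprecated; Service Workers are its successor, not another name for it.)
4. **Improved form controls**: HTML5 updates form controls with new input types (for example `date`, `email`, `url`) and better validation, and introduces the `placeholder` and `autofocus` attributes to improve form usability.
5. **API extensions**: The HTML5 platform includes the WebSocket API for two-way real-time communication, Web Storage for client-side data (`localStorage` persists across sessions, while `sessionStorage` lasts only for the page session), the Geolocation API for obtaining the device's location, and Web Workers for background processing. Several of these are defined in separate specifications rather than in the HTML5 document itself.
6. **Stronger error handling and compatibility**: The HTML5 specification emphasizes interoperability between user agents (browsers). By defining explicit error-handling rules, it aims for different browsers to parse HTML5 and behave consistently.
7. **Editors and contributors**: This draft was edited jointly by experts from the W3C and the WHATWG, including Robin Berjon, Steve Faulkner, Travis Leithead, Erika Doyle Navara, Edward O'Connor, Silvia Pfeiffer, and Ian Hickson.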
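These changes greatly advanced Web technology and made modern Web applications richer, more interactive, and easier to maintain. For developers, this draft is an important reference for understanding HTML5's features and best practices.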
In general, due to the Internet's architecture, a user can be distinguished from another by the user's IP address.
IP addresses do not perfectly match to a user; as a user moves from device to device, or from network to
network, their IP address will change; similarly, NAT routing, proxy servers, and shared computers enable
packets that appear to all come from a single IP address to actually map to multiple users. Technologies such
as onion routing can be used to further anonymize requests so that requests from a single user at one node on
the Internet appear to come from many disparate parts of the network.
However, the IP address used for a user's requests is not the only mechanism by which a user's requests could
be related to each other. Cookies, for example, are designed specifically to enable this, and are the basis of
most of the Web's session features that enable you to log into a site with which you have an account.
There are other mechanisms that are more subtle. Certain characteristics of a user's system can be used to
distinguish groups of users from each other; by collecting enough such information, an individual user's
browser's "digital fingerprint" can be computed, which can be as good as, if not better than, an IP address in
ascertaining which requests are from the same user.
Grouping requests in this manner, especially across multiple sites, can be used for both benign (and even
arguably positive) purposes, as well as for malevolent purposes. An example of a reasonably benign purpose
would be determining whether a particular person seems to prefer sites with dog illustrations as opposed to
sites with cat illustrations (based on how often they visit the sites in question) and then automatically using the
preferred illustrations on subsequent visits to participating sites. Malevolent purposes, however, could include
governments combining information such as the person's home address (determined from the addresses they
use when getting driving directions on one site) with their apparent political affiliations (determined by examining
the forum sites that they participate in) to determine whether the person should be prevented from voting in an
election.
Since the malevolent purposes can be remarkably evil, user agent implementors are encouraged to consider
how to provide their users with tools to minimize leaking information that could be used to fingerprint a user.
Unfortunately, as the first paragraph in this section implies, sometimes there is great benefit to be derived from
exposing the very information that can also be used for fingerprinting purposes, so it's not as easy as simply
blocking all possible leaks. For instance, the ability to log into a site to post under a specific identity requires
that the user's requests be identifiable as all being from the same user, more or less by definition. More subtly,
though, information such as how wide text is, which is necessary for many effects that involve drawing text onto
a canvas (e.g. any effect that involves drawing a border around the text) also leaks information that can be used
to group a user's requests. (In this case, by potentially exposing, via a brute force search, which fonts a user
has installed, information which can vary considerably from user to user.)
Features in this specification which can be used to fingerprint the user are marked as this paragraph is.
Other features in the platform can be used for the same purpose, though, including, though not limited to:
The exact list of which features a user agent supports.
The maximum allowed stack depth for recursion in script.
Features that describe the user's environment, like Media Queries and the Screen object. [MQ] [CSSOMVIEW]
The user's time zone.
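As a rough illustration of two of the items above, the sketch below reads the recursion depth limit and the time zone from script. The helper names are hypothetical, and the measured depth depends on the engine and on the size of each stack frame, so this only approximates what a fingerprinting script might observe.

```javascript
// Hypothetical helper: count how deep recursion can go before the engine gives up.
function measureStackDepth() {
  let depth = 0;
  function recurse() {
    depth += 1;
    recurse();
  }
  try {
    recurse();
  } catch (err) {
    // Engines signal stack exhaustion with an exception (a RangeError in V8).
  }
  return depth;
}

// Hypothetical helper: gather a few environment traits into one object.
function collectTraits() {
  return {
    stackDepth: measureStackDepth(),
    timeZone: Intl.DateTimeFormat().resolvedOptions().timeZone,
  };
}

console.log(collectTraits());
```

Neither value identifies a user on its own; the concern described in this section is that many such values combined can.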
1.9 A quick introduction to HTML
This section is non-normative.
A basic HTML document looks like this:
<!DOCTYPE html>
<html>
<head>
<title>Sample page</title>
</head>
<body>
<h1>Sample page</h1>
<p>This is a <a href="demo.html">simple</a> sample.</p>
<!-- this is a comment -->
</body>
</html>
HTML documents consist of a tree of elements and text. Each element is denoted in the source by a start tag,
such as "<body>", and an end tag, such as "</body>". (Certain start tags and end tags can in certain cases be
omitted and are implied by other tags.)
Tags have to be nested such that elements are all completely within each other, without overlapping:
<p>This is <em>very <strong>wrong</em>!</strong></p>
<p>This <em>is <strong>correct</strong>.</em></p>
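The nesting rule can be checked mechanically with a stack of open elements. The following is a minimal sketch, not how an HTML parser works: it assumes every element has both a start and an end tag, ignores void elements and omitted optional tags, and `isProperlyNested` is a hypothetical helper name.

```javascript
// Minimal sketch: checks that start and end tags are properly nested.
// Assumes every element has both tags; void elements (e.g. <br>) and
// omitted optional tags are deliberately not handled.
function isProperlyNested(markup) {
  const tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>/g;
  const open = []; // names of elements whose end tag has not been seen yet
  let match;
  while ((match = tagPattern.exec(markup)) !== null) {
    const isEndTag = match[1] === '/';
    const name = match[2].toLowerCase();
    if (!isEndTag) {
      open.push(name);
    } else if (open.pop() !== name) {
      return false; // end tag does not match the innermost open element
    }
  }
  return open.length === 0;
}

console.log(isProperlyNested('<p>This is <em>very <strong>wrong</em>!</strong></p>')); // false
console.log(isProperlyNested('<p>This <em>is <strong>correct</strong>.</em></p>'));    // true
```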
This specification defines a set of elements that can be used in HTML, along with rules about the ways in which
the elements can be nested.
Elements can have attributes, which control how the elements work. In the example below, there is a hyperlink,
formed using the a element and its href attribute:
<a href="demo.html">simple</a>
Attributes are placed inside the start tag, and consist of a name and a value, separated by an "=" character. The
attribute value can remain unquoted if it doesn't contain space characters or any of " ' ` = < or >. Otherwise, it
has to be quoted using either single or double quotes. The value, along with the "=" character, can be omitted
altogether if the value is the empty string.
<!-- empty attributes -->
<input name=address disabled>
<input name=address disabled="">
HTML5 http://www.w3.org/html/wg/drafts/html/CR/single-page.html
Page 16 of 318, 2014-7-17 15:03
<!-- attributes with a value -->
<input name=address maxlength=200>
<input name=address maxlength='200'>
<input name=address maxlength="200">
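The quoting rules above can be expressed as a small serializer. This is a sketch under stated assumptions: `canBeUnquoted` and `serializeAttribute` are hypothetical helper names, "space characters" is taken to mean space, tab, line feed, form feed and carriage return, and ampersands are always escaped so that they cannot be read as character references.

```javascript
// Characters that force quoting: HTML space characters and " ' ` = < >
const NEEDS_QUOTES = /[ \t\n\f\r"'`=<>]/;

function canBeUnquoted(value) {
  // The empty string is excluded: it is written with the empty attribute syntax instead.
  return value.length > 0 && !NEEDS_QUOTES.test(value);
}

function serializeAttribute(name, value) {
  if (value === '') return name; // empty attribute syntax, e.g. "disabled"
  const escaped = value.replace(/&/g, '&amp;');
  if (canBeUnquoted(value)) return `${name}=${escaped}`;
  // Otherwise fall back to double quotes, escaping any double quote in the value.
  return `${name}="${escaped.replace(/"/g, '&quot;')}"`;
}

console.log(serializeAttribute('maxlength', '200'));   // maxlength=200
console.log(serializeAttribute('disabled', ''));       // disabled
console.log(serializeAttribute('title', 'two words')); // title="two words"
```

Always quoting is also valid and is often the simpler policy; the unquoted branch is shown only to mirror the rule in the text.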
HTML user agents (e.g. Web browsers) then parse this markup, turning it into a DOM (Document Object Model)
tree. A DOM tree is an in-memory representation of a document.
DOM trees contain several kinds of nodes, in particular a DocumentType node, Element nodes, Text nodes,
Comment nodes, and in some cases ProcessingInstruction nodes.
The markup snippet at the top of this section would be turned into the DOM tree shown at the end of this
introduction.
The root element of this tree is the html element, which is the element always found at the root of HTML
documents. It contains two elements, head and body, as well as a Text node between them.
There are many more Text nodes in the DOM tree than one would initially expect, because the source contains
a number of spaces (represented here by "␣") and line breaks ("⏎") that all end up as Text nodes in the DOM.
However, for historical reasons not all of the spaces and line breaks in the original markup appear in the DOM.
In particular, all the whitespace before the head start tag ends up being dropped silently, and all the whitespace
after the body end tag ends up placed at the end of the body.
The head element contains a title element, which itself contains a Text node with the text "Sample page".
Similarly, the body element contains an h1 element, a p element, and a comment.
This DOM tree can be manipulated from scripts in the page. Scripts (typically in JavaScript) are small programs
that can be embedded using the script element or using event handler content attributes. For example, here is
a form with a script that sets the value of the form's output element to say "Hello World":
<form name="main">
Result: <output name="result"></output>
<script>
document.forms.main.elements.result.value = 'Hello World';
</script>
</form>
Each element in the DOM tree is represented by an object, and these objects have APIs so that they can be
manipulated. For instance, a link (e.g. the a element in the tree above) can have its "href" attribute changed in
several ways:
var a = document.links[0]; // obtain the first link in the document
a.href = 'sample.html'; // change the destination URL of the link
a.protocol = 'https'; // change just the scheme part of the URL
a.setAttribute('href', 'http://example.com/'); // change the content attribute directly
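The href and protocol properties used above mirror the components of a URL. The same decomposition can be tried outside a page with the standard `URL` class, which is available in browsers and in Node.js. The base address below is an assumption made for the example.

```javascript
// Resolve a relative reference the way a link's href would be resolved.
// 'http://example.com/docs/' is an assumed base URL for illustration.
const url = new URL('sample.html', 'http://example.com/docs/');
console.log(url.href);     // http://example.com/docs/sample.html

// Change just the scheme; host and path are left alone.
url.protocol = 'https';
console.log(url.href);     // https://example.com/docs/sample.html
console.log(url.pathname); // /docs/sample.html
```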
Since DOM trees are used as the way to represent HTML documents when they are processed and presented
by implementations (especially interactive implementations like Web browsers), this specification is mostly
phrased in terms of DOM trees, instead of the markup described above.
HTML documents represent a media-independent description of interactive content. HTML documents might be
rendered to a screen, or through a speech synthesizer, or on a braille display. To influence exactly how such
rendering takes place, authors can use a styling language such as CSS.
In the following example, the page has been made yellow-on-blue using CSS.
<!DOCTYPE html>
<html>
<head>
<title>Sample styled page</title>
<style>
body { background: navy; color: yellow; }
</style>
</head>
<body>
<h1>Sample styled page</h1>
<p>This page is just a demo.</p>
</body>
</html>
For more details on how to use HTML, authors are encouraged to consult tutorials and guides. Some of the
examples included in this specification might also be of use, but the novice author is cautioned that this
specification, by necessity, defines the language with a level of detail that might be difficult to understand at
first.
DOM tree for the markup snippet at the top of this section ("␣" marks a space, "⏎" a line break):
DOCTYPE: html
html
  head
    #text: ⏎␣␣
    title
      #text: Sample page
    #text: ⏎␣
  #text: ⏎␣
  body
    #text: ⏎␣␣
    h1
      #text: Sample page
    #text: ⏎␣␣
    p
      #text: This is a
      a href="demo.html"
        #text: simple
      #text: sample.
    #text: ⏎␣␣
    #comment: this is a comment
    #text: ⏎␣
1.9.1 Writing secure applications with HTML
This section is non-normative.
When HTML is used to create interactive sites, care needs to be taken to avoid introducing vulnerabilities
through which attackers can compromise the integrity of the site itself or of the site's users.
A comprehensive study of this matter is beyond the scope of this document, and authors are strongly
encouraged to study the matter in more detail. However, this section attempts to provide a quick introduction to
some common pitfalls in HTML application development.
The security model of the Web is based on the concept of "origins", and correspondingly many of the potential
attacks on the Web involve cross-origin actions. [ORIGIN]
Not validating user input
Cross-site scripting (XSS)
SQL injection
When accepting untrusted input, e.g. user-generated content such as text comments, values in URL
parameters, messages from third-party sites, etc., it is imperative that the data be validated before use, and
properly escaped when displayed. Failing to do this can allow a hostile user to perform a variety of
attacks, ranging from the potentially benign, such as providing bogus user information like a negative age,
to the serious, such as running scripts every time a user looks at a page that includes the information,
potentially propagating the attack in the process, to the catastrophic, such as deleting all data on the
server.
When writing filters to validate user input, it is imperative that filters always be whitelist-based, allowing
known-safe constructs and disallowing all other input. Blacklist-based filters that disallow known-bad
inputs and allow everything else are not secure, as not everything that is bad is yet known (for example,
because it might be invented in the future).
Code Example:
For example, suppose a page looked at its URL's query string to determine what to display, and the
site then redirected the user to that page to display a message, as in:
<ul>
<li><a href="message.cgi?say=Hello">Say Hello</a>
<li><a href="message.cgi?say=Welcome">Say Welcome</a>
<li><a href="message.cgi?say=Kittens">Say Kittens</a>
</ul>
If the message was just displayed to the user without escaping, a hostile attacker could then craft a
URL that contained a script element:
http://example.com/message.cgi?say=%3Cscript%3Ealert%28%27Oh%20no%21%27%29%3C/script%3E
If the attacker then convinced a victim user to visit this page, a script of the attacker's choosing would
run on the page. Such a script could do any number of hostile actions, limited only by what the site
offers: if the site is an e-commerce shop, for instance, such a script could cause the user to
unknowingly make arbitrarily many unwanted purchases.
This is called a cross-site scripting attack.
There are many constructs that can be used to try to trick a site into executing code. Here are some that
authors are encouraged to consider when writing whitelist filters:
When allowing harmless-seeming elements like img, it is important to whitelist any provided
attributes as well. If one allowed all attributes then an attacker could, for instance, use the onload
attribute to run arbitrary script.
When allowing URLs to be provided (e.g. for links), the scheme of each URL also needs to be
explicitly whitelisted, as there are many schemes that can be abused. The most prominent example
is "javascript:", but user agents can implement (and indeed, have historically implemented) others.
Allowing a base element to be inserted means any script elements in the page with relative links
can be hijacked, and similarly that any form submissions can get redirected to a hostile site.
Cross-site request forgery (CSRF)
If a site allows a user to make form submissions with user-specific side-effects, for example posting
messages on a forum under the user's name, making purchases, or applying for a passport, it is important
to verify that the request was made by the user intentionally, rather than by another site tricking the user
into making the request unknowingly.
This problem exists because HTML forms can be submitted to other origins.
Sites can prevent such attacks by populating forms with user-specific hidden tokens, or by checking
Origin headers on all requests.
Clickjacking
A page that provides users with an interface to perform actions that the user might not wish to perform
needs to be designed so as to avoid the possibility that users can be tricked into activating the interface.
One way that a user could be so tricked is if a hostile site places the victim site in a small iframe and then
convinces the user to click, for instance by having the user play a reaction game. Once the user is playing
the game, the hostile site can quickly position the iframe under the mouse cursor just as the user is about
to click, thus tricking the user into clicking the victim site's interface.
To avoid this, sites that do not expect to be used in frames are encouraged to only enable their interface if
they detect that they are not in a frame (e.g. by comparing the window object to the value of the top
attribute).
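For the cross-site scripting scenario described in this section, the two defenses it names (escaping on display and whitelisting URL schemes) could look roughly like the sketch below. The function names and the particular set of allowed schemes are assumptions chosen for illustration; a real site would pick its own whitelist and would normally rely on a maintained templating or sanitizing library.

```javascript
// Escape the characters that are significant in HTML text and attribute values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Whitelist of URL schemes (an assumed set); everything else, including
// javascript:, is rejected rather than individually blacklisted.
const ALLOWED_SCHEMES = new Set(['http:', 'https:', 'mailto:']);

function isAllowedUrl(candidate, base) {
  try {
    return ALLOWED_SCHEMES.has(new URL(candidate, base).protocol);
  } catch (err) {
    return false; // unparseable input is not known-safe
  }
}

// The hostile query-string value from the message.cgi example.
const say = decodeURIComponent('%3Cscript%3Ealert%28%27Oh%20no%21%27%29%3C/script%3E');
console.log(escapeHtml(say));
// &lt;script&gt;alert(&#39;Oh no!&#39;)&lt;/script&gt;
console.log(isAllowedUrl('javascript:alert(1)', 'http://example.com/')); // false
console.log(isAllowedUrl('demo.html', 'http://example.com/'));           // true
```

Escaping applies at the point where data is written into a page, while the scheme check applies when a user-supplied URL is accepted; neither replaces the other.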
1.9.2 Common pitfalls to avoid when using the scripting APIs
This section is non-normative.
Scripts in HTML have "run-to-completion" semantics, meaning that the browser will generally run the script
uninterrupted before doing anything else, such as firing further events or continuing to parse the document.
On the other hand, parsing of HTML files happens asynchronously and incrementally, meaning that the parser
can pause at any point to let scripts run. This is generally a good thing, but it does mean that authors need to
be careful to avoid hooking event handlers after the events could have possibly fired.
There are two techniques for doing this reliably: use event handler content attributes, or create the element and
add the event handlers in the same script. The latter is safe because, as mentioned earlier, scripts are run to
completion before further events can fire.
Code Example:
One way this could manifest itself is with img elements and the load event. The event could fire as soon as
the element has been parsed, especially if the image has already been cached (which is common).
Here, the author uses the onload handler on an img element to catch the load event:
<img src="games.png" alt="Games" onload="gamesLogoHasLoaded(event)">
If the element is being added by script, then so long as the event handlers are added in the same script,
the event will still not be missed:
<script>
var img = new Image();
img.src = 'games.png';
img.alt = 'Games';
img.onload = gamesLogoHasLoaded;
// img.addEventListener('load', gamesLogoHasLoaded, false); // would work also
</script>
However, if the author first created the img element and then in a separate script added the event listeners,
there's a chance that the load event would be fired in between, leading it to be missed:
<!-- Do not use this style, it has a race condition! -->
<img id="games" src="games.png" alt="Games">
<!-- the 'load' event might fire here while the parser is taking a
break, in which case you will not see it! -->
<script>
var img = document.getElementById('games');
img.onload = gamesLogoHasLoaded; // might never fire!
</script>
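When markup like the last example cannot be avoided, one commonly used workaround, which this section does not describe, is to check the image's `complete` property before attaching the listener. The sketch below assumes an object shaped like an img element. `whenLoaded` is a hypothetical helper name, the callback is invoked without an event object in the already-loaded case, and `complete` is also true for images that failed to load.

```javascript
// Hypothetical helper: run callback once the image is available,
// whether or not its load event has already fired.
function whenLoaded(img, callback) {
  if (img.complete) {
    // The load event may already have fired; do not wait for it.
    callback();
  } else {
    // Scripts run to completion, so the event cannot fire between
    // the check above and this registration.
    img.addEventListener('load', callback, { once: true });
  }
}

// Usage in a page (gamesLogoHasLoaded is the handler from the examples above):
// whenLoaded(document.getElementById('games'), gamesLogoHasLoaded);
```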
1.9.3 How to catch mistakes when writing HTML: validators and conformance checkers
This section is non-normative.
Authors are encouraged to make use of conformance checkers (also known as validators) to catch common
mistakes. The W3C provides a number of online validation services, including the Nu Markup Validation Service.
1.10 Conformance requirements for authors
This section is non-normative.
Unlike previous versions of the HTML specification, this specification defines in some detail the required
processing for invalid documents as well as valid documents.
However, even though the processing of invalid content is in most cases well-defined, conformance
requirements for documents are still important: in practice, interoperability (the situation in which all
implementations process particular content in a reliable and identical or equivalent way) is not the only goal of
document conformance requirements. This section details some of the more common reasons for still
distinguishing between a conforming document and one with errors.
1.10.1 Presentational markup
This section is non-normative.
The majority of presentational features from previous versions of HTML are no longer allowed. Presentational
markup in general has been found to have a number of problems:
The use of presentational elements leads to poorer accessibility
While it is possible to use presentational markup in a way that provides users of assistive technologies
(ATs) with an acceptable experience (e.g. using ARIA), doing so is significantly more difficult than doing so
when using semantically-appropriate markup. Furthermore, even using such techniques doesn't help
make pages accessible for non-AT non-graphical users, such as users of text-mode browsers.
Using media-independent markup, on the other hand, provides an easy way for documents to be authored
in such a way that they work for more users (e.g. text browsers).
Higher cost of maintenance
It is significantly easier to maintain a site written in such a way that the markup is style-independent. For
example, changing the color of a site that uses <font color=""> throughout requires changes across the
entire site, whereas a similar change to a site based on CSS can be done by changing a single file.
Larger document sizes
Presentational markup tends to be much more redundant, and thus results in larger document sizes.
For those reasons, presentational markup has been removed from HTML in this version. This change should not
come as a surprise; HTML4 deprecated presentational markup many years ago and provided a mode (HTML4
Transitional) to help authors move away from presentational markup; later, XHTML 1.1 went further and
obsoleted those features altogether.
The only remaining presentational markup features in HTML are the style attribute and the style element. Use
of the style attribute is somewhat discouraged in production environments, but it can be useful for rapid
prototyping (where its rules can be directly moved into a separate style sheet later) and for providing specific
styles in unusual cases where a separate style sheet would be inconvenient. Similarly, the style element can
be useful in syndication or for page-specific styles, but in general an external style sheet is likely to be more
convenient when the styles apply to multiple pages.
It is also worth noting that some elements that were previously presentational have been redefined in this
specification to be media-independent: b, i, hr, s, small, and u.
1.10.2 Syntax errors
This section is non-normative.
The syntax of HTML is constrained to avoid a wide variety of problems.
Unintuitive error-handling behavior
Certain invalid syntax constructs, when parsed, result in DOM trees that are highly unintuitive.
Code Example:
For example, the following markup fragment results in a DOM with an hr element that is an earlier
sibling of the corresponding table element:
<table><hr>...
Errors with optional error recovery
To allow user agents to be used in controlled environments without having to implement the more bizarre
and convoluted error handling rules, user agents are permitted to fail whenever encountering a parse error.
Errors where the error-handling behavior is not compatible with streaming user agents
Some error-handling behavior, such as the behavior for the <table><hr>... example mentioned above,
is incompatible with streaming user agents (user agents that process HTML files in one pass, without
storing state). To avoid interoperability problems with such user agents, any syntax resulting in such
behavior is considered invalid.
Errors that can result in infoset coercion
When a user agent based on XML is connected to an HTML parser, it is possible that certain invariants
that XML enforces, such as comments never containing two consecutive hyphens, will be violated by an
HTML file. Handling this can require that the parser coerce the HTML DOM into an XML-compatible
infoset. Most syntax constructs that require such handling are considered invalid.
Errors that result in disproportionally poor performance
Certain syntax constructs can result in disproportionally poor performance. To discourage the use of such
constructs, they are typically made non-conforming.
Code Example:
For example, the following markup results in poor performance, since all the unclosed i elements
have to be reconstructed in each paragraph, resulting in progressively more elements in each
paragraph:
<p><i>He dreamt.
<p><i>He dreamt that he ate breakfast.
<p><i>Then lunch.
<p><i>And finally dinner.
The resulting DOM for this fragment would be:
p
  i
    #text: He dreamt.
p
  i
    i
      #text: He dreamt that he ate breakfast.
p
  i
    i
      i
        #text: Then lunch.
p
  i
    i
      i
        i
          #text: And finally dinner.
Errors involving fragile syntax constructs
There are syntax constructs that, for historical reasons, are relatively fragile. To help reduce the number of
users who accidentally run into such problems, they are made non-conforming.
Code Example:
For example, the parsing of certain named character references in attributes happens even with the
closing semicolon being omitted. It is safe to include an ampersand followed by letters that do not
form a named character reference, but if the letters are changed to a string that does form a named
character reference, they will be interpreted as that character instead.
In this fragment, the attribute's value is "?bill&ted":
<a href="?bill&ted">Bill and Ted</a>