翻译于 2014/11/07 08:48
1 人 顶 此译文
The first time I sat across a table from Jacob (@fat), he asked bluntly, “How do you write a text editor?”
I drew a tree structure on the whiteboard, waved my hands, and said “This is a shitty editing surface.” Then I drew a column of boxes with arrows pointing to arrays, waved my hands some more, and said “This is a good editing surface.”
Jacob raised an eyebrow.
This post is what I would have said instead, if I had a year to think about it.
我第一次坐在Jacob(@fat)桌子对面时,他直率的问:“你是怎么写一个文字编辑器的?”
我在白板上画了一个树结构,挥舞着手臂,说“这个是如shit一般的编辑界面。“然后我画了一列方框,用箭头指向数组,更多的挥舞手臂,说”这个是一个好的编辑界面。”
Jacob扬了一下眉。
这篇文章是我本应该那时说的,如果我有一年的时间去思考的话。
ContentEditable is the native widget for editing rich text in a web browser. It is…sad.
I’m going to try to prove to you, with some hand-wavey math, that the current approach of ContentEditable is broken. This is not because I think math is a persuasive way to make this argument. It actually makes the argument more alienating.
But I do think that text editors lead to lots of fuzzy, ill-defined questions like “What does What-You-See-Is-What-You-Get (WYSIWYG) even mean?” and “What happens when you select this text and hit Enter?” Axiomatic math is the best toolkit I know to take fuzzy, ill-defined questions and sharpen them.
ContentEditable 是一种在web浏览器上进行富文本编辑的本地原生组件. 它是那样让人…伤感.
我会用一些顺手拈来的数学方法来想你证明,目前的ContentEditable的方法是不好的. 这并不是因为我觉得数学是解释这个论点的一个有说服力的方式。它实际上使得这个论点更加的异类话.
但是我真的觉得文本编辑器导致了太多的模棱两可, 就像“所见即所得 (WYSIWYG) 是啥意思?”还有 “当你选择了这段文本并敲下了Enter会发生什么?”这样不明确的问题。公理化的数学是我所知道的能解决模棱两可的问题并对它们进行明确的最佳工具.
So what does WYSIWYG mean? A good WYSIWYG editor should satisfy the following 3 axioms:
The mapping between DOM content and Visible content should be well-behaved.
The mapping between DOM selection and Visible selection should be well-behaved.
All visible edits should map onto an algebraically closed and complete set of visible content.
First, I’ll explain what each of these 3 axioms mean, and why a good editor should obey these rules. But let’s be clear: they’re axioms. The weakest part of any proof. We’re assuming they’re OK unless we have evidence otherwise.
Second, I’ll show that ContentEditable fails all 3 axioms.
Third, we’ll talk about how new browser features and libraries try to address these issues, and how we handle them in the Medium editor.
DOM space is the set of all web pages that you can express in HTML. All pages can be represented as a tree of elements, with text nodes as the leaves of those trees.
Visible space (“what-you-see-is-what-you-get”) is the set of all visible pages — what you actually see on the screen when a browser renders a page. We say that two pages are the same in Visible space if they look exactly the same.
The browser’s rendering engine is a mapping from DOM space onto Visible space. By “onto,” we mean that all Visible pages are the output of Render(x) for some DOM tree x.
那么所见即所得是什么意思呢?一个好的所见即所得编辑器应该满足下面3个定理:
1.DOM内容和可视化(Visible)内容能够很好地进行映射。
2.DOM选择和可视化(Visible)选择能够很好地进行映射。
3.所有的可视化编辑都能够映射到一个从代数上来说封闭的和完整的可视化内容集合上面。
首先,我会解释这3个定理所表达的意思,并且一个好的编辑器为什么要遵守这些规则。但是我们首先要明白:他们是定理。他们最弱的部分是缺少证明。但是我们假设他们是没问题的除非我们能拿出明显的证据来。
其次,我会证明ContentEditable不满足这3个定理。
最后,我会讨论浏览器新的特性和库是怎样针对这些问题的,以及我们怎样在普通的编辑器中处理他们的。
DOM空间是我们能在HTML中表述的所有网页页面的集合。所有页面都能够被表示成一个元素树,而这些树把文本节点作为叶子。
可视化空间(“所见即所得”)是所有可视化页面的集合 — 也就是我们在屏幕上看到的浏览器渲染的一个页面。我们常常会把看起来一样的两个页面的可视化空间认为是一样的。When we say a mapping is well-behaved in an editor, we mean that the mapping preserves all edit operations (see footnote 1). More precisely, if Render is well-defined, then
for all edit operations E, and DOM pages x and y Render(x) = Render(y) implies Render(E(x)) = Render(E(y))
This is a way of formalizing the “what you get” part after “what you see.” If two pages look the same, and we make the same edit on them, then the two results should look the same. (again, see 1)
I’ve been surprised how many “WYSIWYG” editors on the web break this rule. It may sound like an obvious principle. But it leads you into weird existential questions about what “same” means, which are best explored with examples.
当我们说一个映射在编辑器中表现良好时,我们的意思是这个映射保留了所有的编辑操作(见注1)。更准确的说,如果渲染的定义是明确的话,那么
for all edit operations E, and DOM pages x and y Render(x) = Render(y) implies Render(E(x)) = Render(E(y))
这是一种在“你看到的是什么”之后确定“你得到了什么”的方式。如果两个部分看起来是一样的,而我们对它们进行了相同的编辑,那么两种结果应该看起来是一样的. (请再看看第 1 条)
我已经很惊奇的看到网络上那么多的 “WYSIWYG” 编辑器都打破了这个规则. 这听起来应该是一个理所当然的规则。但是它会导致你陷入有关“相同”是什么意思这个有点怪异的问题,而对这个问题的问题的最好探讨就是示例了.
Consider a sample sentence:
The hobbit was a very well-to-do hobbit, and his name was Baggins.
The Medium editor renders this sentence as, roughly, below.
The <a href=”http://en.wikipedia.org/wiki/The_Hobbit">hobbit</a> was a very well-to-do hobbit, and his name was <strong><em>Baggins</em></strong>.
There are many, many ways to encode that last word, Baggins, as both italicized and bold. (see footnote 2)
<strong><em>Baggins</em></strong> <em><strong>Baggins</strong></em> <em><strong>Bagg</strong><strong>ins</strong></em> <em><strong>Bagg</strong></em><strong><em>ins</em></strong>
These forms should be equivalent, editor-wise. Any edits you make to this post need to treat all these forms the same. It is surprisingly tricky to write an edit action that knows about all the different DOM forms.
看看下面这个例句:
The hobbit was a very well-to-do hobbit, and his name was Baggins.
编辑器里面呈现这句话,大致上如下所示.
The <a href=”http://en.wikipedia.org/wiki/The_Hobbit">hobbit</a> was a very well-to-do hobbit, and his name was <strong><em>Baggins</em></strong>.
有许多许多的方式可以用来对最后一个词—— Baggins —— 进行编码. (见注2)
<strong><em>Baggins</em></strong> <em><strong>Baggins</strong></em> <em><strong>Bagg</strong><strong>ins</strong></em> <em><strong>Bagg</strong></em><strong><em>ins</em></strong>
编辑器应该明智的认为这些形式是等价的。你对这篇文章所进行的任何编辑都应该对所有这种形式同等对待。编写一个编辑动作来了解所有这些不同的DOM形式是需要令人极其惊讶的技巧.
For many ContentEditable implementations on the web, some invisible character or empty span tag may slip into the HTML, so that two ContentEditable elements behave totally differently (even though they look the same). The experience can be maddening to users, and hard for engineers to debug.
Even if we knew how to write an edit action that is well-behaved, how would we check it? If we limit our HTML to simple tags, proving that two forms are visually equivalent is…complicated. Your best bet is to iterate through each letter, assign it a style, and compare the results.
In an ideal world, we would have some high-level API for making “visual edits” to the DOM. Each operation would guarantee that it is well-behaved, and does the “same” thing for all visually equivalent pages. Then, as long as your editor only used these APIs, you could guarantee that it is well-behaved.
The mapping between DOM content and Visible content is ugly, but at least it’s many-to-one. One DOM representation has exactly one visible representation.
Selections are worse because the mapping is many-to-many.
It’s easy enough to see that one visible selection can have many DOM representations. If you have the HTML,
his name was <strong><em>Baggins</em></strong>
then a cursor before “Baggins” can be in one of three DOM positions: before the strong start tag, between the strong start tag and the em start tag, and after the em start tag. If you place your cursor before “Baggins” and start typing, will your characters be bold, or italicized, or neither?
对于网上大量的可编辑内容的实现,一些不显眼的字符或者空的span标签可能会进入到HTML中,因此两个可编辑内容的元素的表现完全不同(即使看起来是一样的)。这样的体验会激怒用户,而工程师也很难去调试。
即使我们知道如何编写一个表现良好的编辑动作,但我们怎么检查它呢?如果我们把HTML限制到一些简单的标签,证明两个表单视觉上是相等的将。。。非常复杂。你最好能够在每个字符上迭代,分配一个样式,并比较结果。
在理想世界里,我们对于DOM的“可视编辑”会有一些高级的API。每个操作将保证它的有效性,对于所有视觉上相等的页面做“相同”的操作。这样,只要你的编辑器仅仅使用这些API,你就能保证它的有效性。
DOM和可见内容的映射是很丑陋的,但至少是多对一的关系。一个DOM表示有一个准确的可见表示。
选择更加丑陋,因为映射是多对多的关系。
你可以很轻松的看到一个可见的选择可以有很多的DOM表示。如果你有HTML,
his name was <strong><em>Baggins</em></strong>
那么“Baggins”之前的指针可以在三个DOM位置之一:在 strong开始标签前,在 strong开始标签和 em开始标签中间,以及在 em开始标签之后。如果你把指针放在“Baggins”之前开始输入,你的字符会是粗体,斜体,或者都不是?
More subtly, one DOM selection can have multiple visual representations. Consider the case where “well-to-do” breaks after “to-”, as in the image above. A cursor at the end of the first line and at the beginning of the second line have the same DOM position, but different visual positions. As far as I know, there is no way to tell the browser to prefer one visual position over the other.
When designing editor commands, we want selections that look the same to behave the same. But because that mapping is messy, this is a pain too.
One day a couple years ago, my friend Julie sent me a Gchat message:
We can remove Apple Style Span…Oh happy day!
Ryosuke Niwa wrote a lovely post on the WebKit blog about the quest to remove apple-style-span. If you’ve read this far, many of the issues he raises will sound familiar. WebKit’s ContentEditable editor was adding loads of “bookkeeping” HTML markup that didn’t change anything visually, but made the editor behave differently.
He also points out that WebKit’s ContentEditable implementation has to be able to deal with HTML created by any other CMS, or any other browser’s ContentEditable implementation. Our editor should be a good citizen in this ecosystem. That means we ought to produce HTML that’s easy to read and understand. And on the flip side, we need to be aware that our editor has to deal with pasted content that can’t possibly be created in our editor.
几年前的某一天, 我的朋友Julie在Gchat上给我发了一个消息:
We can remove Apple Style Span…Oh happy day!
Ryosuke Niwa 在WebKit的博客上发表了一个友好的帖子 ,这篇帖子请求移除苹果风格的span(apple-style-span)。如果你之前读过这篇文章的话,那么他提出的许多问题听上去很耳熟。WebKit的ContentEditable 编辑器增加许多“bookkeeping”HTML标签,这种标签不会改变任何的可视化效果,仅仅是使编辑器表现的不同。
他也指出WebKit的ContentEditable的实现必须能够处理由其他任何的CMS或其他任何的浏览器的ContentEditable 的实现创建的HTML。我们的编辑器在这种生态环境下应该是一个好的公民。这意味着我们应该制作易读易懂的HTML。但另一面,我们要意识到我们的编辑器必须处理那些我们不能在编辑器中创建的拷贝内容。
I’ve seen classes of bugs where the only way to reproduce is to write text in Firefox, switch to Chrome to make an edit, then switch back to Firefox. This is frustrating — for both developers and users.
To avoid this class of bugs, we say that the contents of a good WYSIWYG editor should be algebraically closed under its edits. This means that the contents of the editor should always be something that I could create by typing “normally.” I shouldn’t be able to break out into different types of documents by pasting some HTML in, or editing in another browser.
A bare ContentEditable element is a bad WYSIWYG editor, because it breaks all of these axioms. So how would we build a good WYSIWYG editor?
For the Medium editor, there are 4 key pieces.
Create a model of the document, with a simple way to tell if two models are visually equivalent
Create a mapping between the DOM and our model
Define well-behaved edit operations on this model
Translating all key presses and mouse clicks into sequence of these operations
I’ll walk you briefly through each of these pieces, and how we make changes to them. At the end, I’ll discuss how browser engineers are making ContentEditable better, and may make some of these components obsolete.
The Medium editor model has two fields: a list of paragraphs, and a list of sections.
Each paragraph contains
text, a string of plain text
markups, a list of formatting text ranges, like “bold from char 1 to 5”
metadata for images or embeds
layout, a description of how we should position the paragraph
A section describes a background for a sublist of paragraphs.
Any selection in the Medium editor is expressed as two points. Each point is a paragraph index and text offset into that paragraph, and a type. Most selections are text-type selections. We also have media-type selections (when the tooltip is on the image), and section-type selections (when the tooltip is on the section background).
The advantage of this model is that two models have the same visual rendering if and only if the models are equal. Any change to a model translates to a well-defined visual change.
Next, we define a mapping from DOM space to the model space. We break this into two separate cases: “indoor” mappings and “outdoor” mappings.
An indoor mapping is when we take content inside the editor and translate it back and forth between DOM and model. We expect an indoor mapping to be one-to-one.
An outdoor mapping is when we have HTML from outside the editor, like when the user pastes HTML from Word into a Medium post. We need to translate it to our paragraphs-and-sections model. We expect outdoor mappings to be lossy. We prioritize plain text first, then bold/italic/link markup, then images and other miscellaneous formatting.
我见过许多种问题,复现这种问题的唯一方式是在Firefox中写文本,然后切换到Chrome中做编辑操作,再然后切回到Firefox中。这对开发者和用户来说是非常令人沮丧的。
一个最基本的 ContentEditable 元素是一个非常差的编辑器, 因为它打破了前面提到的所有定理.。那么我们怎样构建一个好的所见即所得的编辑器呢?
对一个编辑器来说,有4个关键点。
创建一个文档模型,并且能够用一种简单的方式去区分两个模型是否在视觉上相等
创建一个在DOM与我们的模型之间的映射
在这种模型上能够定义表现良好的编辑操作
能够把所有的按键操作和鼠标点击转换成相应操作的序列
我会简要的介绍每个关键点以及我们怎样对他们做出改变。最后,我会讨论浏览器工程师怎样才能把ContentEditable 做的更好,并且去掉这些组件中不好的部分
编辑器模型有两个领域:一连串的段落以及一连串的区域。
每一个段落包括下面这些内容
文本,一个普通文本的字符串
标记,一连串格式化好的文本范围,比如“对位置1到5的字符进行加粗”
图像或嵌入的元数据
布局,一种我们怎样放置段落的描述
一个区域描述一个子列表段落的背景。
在编辑器中的任何选择操作都会被表述成两个点。每个点代表的是一个段落的索引和那么段落的文本偏移,以及一个类别。大多数选择操作都是文本类型的选择。我们也有媒介类型的选择(当提示信息显示在图片上时)和区域类型的选择(当提示信息显示在区域背景上面)。
这种模型的优点是如果仅当这两种模型相等时他们会有同样的可视化渲染效果。对模型的任何改变都能转换到一个定义很好的可视化改变上来。
下面我会定义DOM空间到这种模型空间的映射。我们把这种映射分成两种:“室内”映射和“室外”映射。
室内映射是指我们从编辑器中取出内容并且来回的在DOM空间和这种模型空间进行映射。我们希望室内映射是一对一的。
室外映射是指我们从编辑器外获取HTML,就像用户从Word中拷贝HTML到一篇帖子中一样。我们需要把它转换到我们的段落和区域模型中。我们希望室外映射是有损的。我们优先处理普通文本,然后加粗/倾斜/超链接标签,然后图片以及其他各种各样的格式。
When we map our model to the DOM, the tree looks like this:
<div> <!-- root --> <section> <!-- section --> <!-- section-inner --> <div class="section-inner layout-column"> <p> <!-- paragraph --> <strong><em>Baggins</em></strong> <!-- text -->
The section node is generated from the section model, and applies background images or colors to a list of paragraphs.
The section-inner node is generated from the paragraph’s layout property, and determines the width of the main column. On most paragraphs, it’s narrow and centered. On full-width image paragraphs, it’s 100%. On row grids, it’s half-way outset.
The next node is the semantic type of the paragraph: P, H2, H3, PRE, FIGURE, BLOCKQUOTE, OL-LI (ordered list item), and UL-LI (unordered list item).
When we translate markup ranges into DOM nodes, we sort them by type: A, then STRONG, then EM. We will never print a STRONG tag containing an anchor. We break it up so that the anchor contains the STRONG tag.
The Medium body editor has exactly 6 edit operations: InsertParagraph, RemoveParagraph, UpdateParagraph, InsertSection, RemoveSection, and UpdateSection.
They do just what they sound like. The paragraph operations take a paragraph model and an index. The section operations take a section model and an index.
All possible editor contents can be expressed by a sequence of these operations, and it’s usually trivial to construct such a sequence.
It’s easy to see that the content is well-behaved under these edit operations. They act directly on our model, not on the DOM, and the model makes it easy to tell when two things are visually equivalent.
When you interact with the Medium editor, we have to translate your key presses and mouse clicks into a sequence of those 6 operations.
This is the trickiest part. We don’t want to keep around some huge list of every possible key sequence. It would be a crazy long list for English-speaking users, never mind for languages with non-Latin characters and keyboards.
The key insight is that we can enumerate all the ways to insert and remove paragraphs with normal ContentEditable keyboard commands. They are: carriage return (enter, ctrl-m, etc.), delete (delete, backspace, etc.), type-over (select-text-and-type), and paste. So we capture, cancel, and manually translate those keyboard events into our internal editor operations.
For all other keyboard events, we let the native ContentEditable behavior kick in. After the keyboard event finishes, we map the paragraph DOM back to a paragraph model, and compare the model to what we had before. If the DOM changed, we create a new UpdateParagraph op and flush it through the editor pipeline, bringing the DOM and model back in sync.
If we had infinite computing power, applying these edit operations would be straightforward. We would apply them to the model, re-render the whole post, and be done with it.
But in the real world, re-rendering the whole post on every keypress would be too slow. And you would see lots of ugly flickering, because iframes and images would be continuously reloading. Instead, we listen on changes to the model, and try to make the minimal possible change to the DOM.
As I type this now, I can see the Chrome spellchecker’s red-underline on the word “keypress” flicker. That’s because the Medium editor is changing the whole paragraph at once, rather than only changing a piece of the paragraph. If we made a more narrow DOM change, the flicker would go away, but the code would be more complicated.
There have been some rumblings lately from some Chromium contributors (Levi Weintraub, Julie Parent, and Jelte Liebrand) that they want to redo ContentEditable on top of Polymer Elements and Shadow DOM. The proposal wrestles with many of the same high-level architecture issues that the Medium editor tries to solve.
Create an editor model made out of custom Polymer elements
Define a mapping between the editor model and the real DOM with a Shadow DOM
All keypresses and mouse clicks in ContentEditable would be translated into an abstract edit intent, expressed as a JSON object like {editIntent: ‘delete’}
Polymer elements could define handers for edit intents
If the Medium editor got some sort of edit intent API, we would be able to throw away a lot of custom code for translating keypresses into abstract edit operations. It would be an interesting experiment to express our paragraph models as Polymer/ShadowDOM elements.
Whenever I explain this to people who work on text editors, they call me on my sleight-of-hand here.
“Of course the Medium editor is better than ContentEditable. You cheated. ContentEditable tries to be a general-purpose WYSIWYG HTML editor. The Medium editor drops the ‘general-purpose’ requirement, so you can pick and choose what HTML structures you want to handle.”
This is true. But it’s misguided.
A good WYSIWYG editor is axiomatically inconsistent with a good general-purpose HTML editor. It’s impossible to build what ContentEditable wants to be, because they have conflicting requirements.
我们的模型映射到DOM树后,看起来像下面这样:
<div> <!-- root --> <section> <!-- section --> <!-- section-inner --> <div class="section-inner layout-column"> <p> <!-- paragraph --> <strong><em>Baggins</em></strong> <!-- text -->
这个区域(section)节点是从区域模型中产生的,并且会将背景图片和颜色应用到一连串的段落中。
这个区域内(section-inner)节点是根据段落排版属性而产生的,并且决定了主列的宽度。对于大部分段落来说,它是狭窄并且居中的。对于全宽的图片段落来说,它是100%宽的。对于上面的网格来说,它是原始的一般。
下一个节点是段落的语义类别: P, H2, H3, PRE, FIGURE, BLOCKQUOTE, OL-LI (有序列表项), 和 UL-LI (无序列表项).
当我们把标签类别转换到DOM节点的时候,我们会按照类别排序它们:A,然后 STRONG,再然后 EM。我们永远不会打印一个包含锚定的STRONG标签。我们会拆散它而让锚定包含STRONG标签。
编辑器主要包括6个编辑操作:插入段落,移除段落,更新段落,插入区域,移除区域,以及更新区域。
这些操作表现的会和描述的那样。段落操作会接受一个段落模型和一个索引。区域操作会接受一个区域模型和一个索引。
所有可能的编辑器内容都能够被一系列的这些操作所表述,并且构造这样一个序列通常是比较简单的。
显而易见,内容在这些编辑操作下能够表现的很好。这些操作是直接应用于我们的模型上面的,而不是DOM上面,并且这个模型能够更容易地区分两件东西在视觉上是否是相等的。
当你和编辑器交互的时候,我们必须将你的按键操作和鼠标点击操作转换成那6个操作的一个序列。
这是最复杂的部分。我们不会对那么多的每种可能按键的序列履行职责。这对于一个以英语为语言的用户来说是一个非常巨大的列表,永远不考虑非拉丁字符和键盘。
通过观察,我们能够用常规的ContentEditable键盘操作枚举所有对段落插入和移除的操作方式。他们是:回车(enter,ctrl-m,等。),删除(delete,backspace,等。),悬浮输入(type-over)(在一段选择的文本上输入),以及拷贝。所以我们能够捕获,取消,以及手动将这些键盘事件转换到我们的编辑器内部操作上来。
对于所有其他键盘事件,我们让原有的ContentEditable 行为生效。在键盘事件结束之后,我们把段落的DOM映射回到段落的模型中来,并且和我们之前的模型进行比较。如果DOM改变了,我们会创建一个新的更新段落操作并且通过编辑器管线应用它,保持DOM和模型同步。
如果我们有无限计算的能力,那么直接应用这些编辑操作就可以了。我们应用这些操作到模型上,重新渲染的整个页面,最后结束操作。
但是在现实生活中,对每个按键操作都重新渲染整个页面是非常慢的。并且你会看到许多丑陋的闪烁现象,因为内嵌框架(iframes)和图片会一直处于加载中。相反,我们会对模型的改变事件进行监听,并且尽最大可能减少对DOM的改变。
当我在写篇文章时,我可以看到Chrome拼写检查程序在“keypress”单词下面所加的红色下划线在闪烁。这是因为编辑器正在同时对整个段落进行改变,而不是仅仅改变这个段落的一小块。如果我们仅仅对DOM进行相对很小的改变,那么闪烁就会消失,但是这样的代码会相对的更复杂。
最近有一些来自Chromium贡献者 (Levi Weintraub, Julie Parent, and Jelte Liebrand) 的流言说这些贡献者想去基于聚合元素(Polymer Elements)和Shadow DOM特性重做ContentEditable。这个方案也会像编辑器一样去尝试解决许多同样高级架构方面的问题。
创建一个由自定义 聚合元素(Polymer elements)构成的编辑器模型
定义编辑器模型与真实的具有 Shadow DOM特性的DOM之间的映射
所有在ContentEditable中的按键操作和鼠标点击操作都会被转换成一种抽象的编辑含义,被表示成像{editIntent: ‘delete’}这样的JSON对象。
聚合元素(Polymer Elements)对这样的编辑含义操作定义相应的处理方法
如果编辑器能够获取某种编辑含义的API,那么我们就能够抛弃许多转换按键操作到抽象编辑操作的自定义代码了。将我们的段落模型当作聚合(Polymer)/ ShadowDOM元素是一件很有趣的尝试。
不管我什么时候向那些从事于文本编辑器工作的人解释的时候,他们都认为我在玩花招。
“当然编辑器要比ContentEditable更棒一些。你错了。ContentEditable努力的去成为一个通用的所见即所得HTML编辑器。而一般的编辑器放弃了'通用目的'的需求,所以你能够挑选你想去处理的任何HTML结构。”
这是事实。但是确是误导的。
一个好的所见即所得编辑器和一个好的具有通用目的HTML编辑器在理论上是不一致的。不可能把ContentEditable 构建成那样的,因为它们在需求上是冲突的。