The history of the web word processor (or "web word") began earlier than one might expect. It became widely known to the public in 2006 when Google Docs was introduced, so this kind of software is more than a decade old. ThinkFree aimed for a web word running in a web browser as a Java Applet. However, since it was not written in a web language like JavaScript, it cannot strictly be called a web word. Native word processors are implemented for each OS with standalone graphics toolkits and programming languages like C++ or Java. Web words, on the other hand, run in a web browser and can therefore be used for editing on any computer, as long as a supported browser is available.
This document introduces the basic principles of creating a web word.
First, let’s look at why a web word is necessary, the criteria for classifying something as a web word, and its types.
One big advantage of the web is that content can be expressed as you want on any OS, as long as a browser is available. Document editing, however, is restricted: in most cases a particular OS, software, or document format is required. In Korea in particular, the Windows-based MS Office and Hancom Office are most commonly used, and DOC and HWP are the most frequently used document formats. From the user’s perspective, such an environment is a nuisance: a supported OS may not be available, and editing is simply impossible without a computer that has the software installed. Sometimes you may even have to pay for software you would not otherwise need.
With the development of web technology, many things are now available on the web. A web word can run on any OS where a browser is supported and can be used to write documents on any computer, without installing particular office software. As a document is represented with web technologies, such as HTML, JavaScript, SVG, and Canvas, the user doesn’t have to care about the document format.
Judging by the standard of native word processors, I believe a web word must have page representation and editing functions. This may just be a personal opinion, but I set this criterion because what many native word processor users find most foreign about a web word is the lack of page representation. Most documents we encountered during our education (including exam papers), and those frequently used in public institutions or companies, tend to have page representation, since even their digital versions originate from books. They were written in word processors equipped with page representation and editing functions.
While developing a web word, I benchmarked web words that support page representation and editing, including Google Docs and Hancom Netffice, which took over ThinkFree. They provide many more functions than expected but are still not so easy to use.
Many issues arise when implementing a word processor on the web, and document representations like the following are quite hard to achieve, even without format conversion.
Representing documents in similar forms to native word processors
Even though more functions are supported than expected, these products are still not as easy to use as expected. Representational differences from native word processors stand out even more for users or organizations that consider a document’s representation more important than its content. OpenDocument Text (ODT), part of the OpenDocument Format (ODF), seems to focus more on content, and its representation differs slightly between the software that implements it. At the heart of representation lie page representation and editing.
Given that representing and editing pages are necessary to create a web word, let me now share the basic principles of implementation.
As I mentioned earlier, I believe the core criterion and the starting point of a web word is to have page representation and editing functions. If page representation is implemented after all the other features, such as entering, deleting, inserting, and pasting content, as well as images and tables, many parts need to be reimplemented to fit page representation.
Another important choice is the method of character entry and representation.
Some editors handle character entry and document representation in Canvas, while the ODT editor of WebODF uses SVG. TOAST UI Editor separates the implementation of cursor entry from cursor rendering, although both are based on HTML. contentEditable is controversial but is undoubtedly the fastest way to support editing.
The advantages of contentEditable include:
Its disadvantages stem from its dependency on browser implementations:
Here, HTML is described as a tool for representing pages. An HTML document, in general, is laid out as one continuous vertical flow; "document" here does not mean paper such as an A4 printout. Properties like page-break-before and page-break-after are very useful for implementing printing and page division, but it is difficult to represent a page on screen with them.
Then, let me define simple requirements for page representation:
Now, based on these requirements, let’s delve into the stage where an HTML document is divided into pages (so-called page layout).
Page layout can be performed in the following steps:
First, define the width and height of a page to create a space for representation (210mm x 297mm for A4; the actual representation area excludes the margins on the top, bottom, left and right. The prerequisite is that the DOM has tags for a page container). Then,
Once page layout has been performed, the web word displays the document in pages, as below.
(The example is a demo screen for page layout of POLARIS Web Editor: the markdown article is laid out in 7 pages.)
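The idea of carving a page's representation area out of the A4 size and then filling pages can be sketched roughly as follows. This is only a minimal illustration, not the layout code of any product mentioned here: the 20mm margins, the function names `contentBox` and `paginate`, and the rule that a block is never split across pages are all assumptions made for the example. The only fixed fact used is the CSS definition that 1in equals 96px and 25.4mm.

```javascript
// CSS defines 1in = 96px and 1in = 25.4mm, so 1mm = 96 / 25.4 px.
const PX_PER_MM = 96 / 25.4;

function mmToPx(mm) {
  return mm * PX_PER_MM;
}

// Hypothetical page description: A4 with assumed 20mm margins on each side.
const A4 = {
  widthMm: 210,
  heightMm: 297,
  marginMm: { top: 20, bottom: 20, left: 20, right: 20 },
};

// The representation area is the page size minus the margins.
function contentBox(page) {
  return {
    width: mmToPx(page.widthMm - page.marginMm.left - page.marginMm.right),
    height: mmToPx(page.heightMm - page.marginMm.top - page.marginMm.bottom),
  };
}

// Greedy pagination over block heights (in px, in document order).
// Returns an array of pages, each holding the indexes of its blocks.
// Simplification: a block is never split, so an oversized block
// simply gets a page of its own.
function paginate(blockHeights, pageHeight) {
  const pages = [[]];
  let used = 0;
  blockHeights.forEach((height, index) => {
    const current = pages[pages.length - 1];
    if (current.length > 0 && used + height > pageHeight) {
      pages.push([index]); // block does not fit: start a new page
      used = height;
    } else {
      current.push(index);
      used += height;
    }
  });
  return pages;
}
```

In a browser, the block heights would come from the measured paragraphs, and each returned group would be moved into its own page container. A real web word also has to split a long paragraph across pages, which this sketch deliberately leaves out.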
Some editing operations require the page layout described above, such as:
Page layout must be carried out whenever these events occur.
Not many web word products provide page representation and editing functions, and one of the reasons is performance. Page representation based on the principles above requires the browser to compute a huge number of layouts and DOM operations. In particular, wrapping each character in a span tag in order to divide a paragraph into lines takes a lot of DOM operations. In addition, the browser's layout result must be read back to obtain the coordinates of each character, which is quite costly. Considering that this process has to be repeated on every keystroke, you can imagine how enormous the job is.
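To make the line-division step more concrete, here is a small sketch of what can be done once the coordinates have been read back. It assumes that every character of a paragraph has already been wrapped in a span and that the top coordinate of each span has been collected into an array (in a browser, for example with `getBoundingClientRect().top`). The function name `groupCharsIntoLines` and the 1px tolerance are hypothetical, and the sketch assumes a uniform font size within the paragraph.

```javascript
// Groups consecutive characters into lines by their top coordinate.
// charTops: top coordinate (px) of each character, in text order.
// A character whose top differs from the current line's top by more
// than `tolerance` is taken to start a new line.
function groupCharsIntoLines(charTops, tolerance = 1) {
  const lines = [];
  let lineTop = null;
  charTops.forEach((top, index) => {
    if (lineTop === null || Math.abs(top - lineTop) > tolerance) {
      lines.push({ top, start: index, end: index }); // new line begins here
      lineTop = top;
    } else {
      lines[lines.length - 1].end = index; // extend the current line
    }
  });
  return lines;
}

// Example: six characters laid out by the browser on three lines.
const lines = groupCharsIntoLines([0, 0, 0, 18, 18, 36]);
console.log(lines.length); // 3
```

The grouping itself is cheap. The expensive part is what happens before it: creating one span per character and forcing the browser to lay them out so that the coordinates can be read.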
Although I haven’t tested every web word processor, in many cases this type of implementation makes typing each character slow, and the more you enter, the slower it gets. When I tested Google Docs with plain characters only, typing felt slightly harder once the document went over 10 pages.
Let me share two tips to reduce the amount of calculation:
Another point to note when implementing with performance in mind is to make the most of JavaScript optimization techniques. Also be sure to minimize forced layouts. Learn the anti-patterns and remove code that hurts performance. By understanding the concept of layout, or reflow, you can avoid triggering unnecessary layouts.
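One well-known anti-pattern behind forced layouts is interleaving layout reads and style writes in a loop: reading a property such as `offsetHeight` right after a style change makes the browser recalculate layout before it can answer. The sketch below contrasts that with a batched version that does all reads first and all writes afterwards. The function names and the choice of `minHeight` as the written property are only illustrative assumptions.

```javascript
// Anti-pattern: each read follows a write, so in a browser every
// iteration after the first may force a synchronous layout.
function resizeInterleaved(elements) {
  for (const el of elements) {
    const height = el.offsetHeight;      // read (needs up-to-date layout)
    el.style.minHeight = height + 'px';  // write (invalidates layout)
  }
}

// Batched version: read everything first, then write everything.
// The browser needs to compute layout at most once for the reads.
function resizeBatched(elements) {
  const heights = elements.map((el) => el.offsetHeight); // all reads
  elements.forEach((el, i) => {
    el.style.minHeight = heights[i] + 'px';              // all writes
  });
}
```

Both functions accept any array of DOM elements, for example `Array.from(document.querySelectorAll('p'))`. The results are the same; only the order of reads and writes differs, which is exactly what decides how many layouts the browser has to perform.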
In the next article, I will show how to implement simple page representation and editing with actual code: implementing page editing with contentEditable, as well as the page layout process after change events in a document (e.g. insertion or deletion of characters).
The emergence of new APIs like WebGL, WebAssembly, and the File and DB APIs of HTML5 indicates that the web is now considered a platform on which traditional work, functions and services can be handled. And the development is still ongoing. The transition from native word processors to web words is part of this process. Some environments, like Chromebook, provide only a web word through Google Docs, without a native word processor.
Korea is no exception to this trend. Many public institutions have begun to define ActiveX-based native word processors as non-standard and are trying hard to introduce web words. Nevertheless, given that Korean users tend to focus more on the presentation of a document than on its content, I suppose we’ve only just joined the flow toward web words, and front-end developers have much to contribute.
Presumably, if the majority of Korean users move away from IE11 and lower versions to Edge, the transition to web words will gain more momentum.