In computer science, the separation of content and form is an important principle in creating and managing documents with software tools.

Separating content from form means separating the message of a document from its presentation. On one side is the body of the document, for example the text of a Wikipedia article; on the other is the form, such as the presentation of titles and paragraphs.

With the advent of digital technology in document creation, writing practices have changed. Computing adds a layer of abstraction to writing and to document form, and changes the definition of what is commonly called a document. The techniques it offers lead authors to change their practices.

A single document can, for example, be multi-media, meaning that it can be presented on different media. Typically, a course given by a teacher can be presented to students in various ways: as a paper handout, as a slide show (PowerPoint, Beamer or other) and/or as an interactive website.[1] Each of these media has different presentation requirements. For greater consistency and efficiency, authors have an interest in producing content that can be published on several media.

These new constraints lead authors to separate the content of their documents from the form. Hence the emergence of WYSIWYM editorial chains, which require this approach, unlike WYSIWYG word processors, which merely allow it and, for lack of training, are most often misused.

The separation of content and form is not a necessity in itself; it is not linked to a software or hardware constraint.

Traditionally, the creation of a document (article, book, poster, leaflet, etc.) was carried out by several specialized trades: the author, who wrote the text; the proofreader or proofreaders; and the typographer, who handled the layout. When the content is clearly marked up and separated from the form, the software performs these tasks automatically, as a by-product of the writing. Because the two aspects are dissociated, they can be modified separately: it is thus easy to try several presentations of the same document.

The separation of content and form facilitates these tests and the intervention of several authors (multi-authoring).

The author focuses on the content of the document. Formatting is handled beforehand (an on-screen layout for comfortable writing) or afterwards (for publication in various forms), and is often delegated to someone else, such as a secretary or graphic designer.

It also makes it possible to search for all elements of the same type, for example to build a list (table of contents from the titles, list of figures, list of tables) or to support referencing (cross-references, search engines, inclusion in a database).
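As a minimal sketch of this idea, assume a document stored as a sequence of (role, text) pairs. The role names and the sample content below are hypothetical, chosen only for illustration. Because each element carries its function, collecting every element of one type is a simple filter:

```python
# Hypothetical document model: each element is a (role, text) pair.
document = [
    ("chapter_title", "Introduction"),
    ("paragraph", "Some opening text."),
    ("figure", "Workflow of an editorial chain"),
    ("chapter_title", "Methods"),
    ("paragraph", "More text."),
    ("figure", "Sample style sheet"),
]


def list_elements(elements, role):
    """Return the text of every element tagged with the given role, in document order."""
    return [text for element_role, text in elements if element_role == role]


# The same function yields a table of contents or a list of figures.
table_of_contents = list_elements(document, "chapter_title")
list_of_figures = list_elements(document, "figure")

print(table_of_contents)
print(list_of_figures)
```

If titles were identified only by their appearance (say, bold 14-point text), the same query would also return anything else that happened to share that appearance.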

Separating content and form also makes it easier to migrate a document to other formats (single sourcing): turning a file meant for print into a web page or vice versa, generating online help from a printed document, and so on. It also facilitates interpretation by other software, for example braille output or voice reading software (accessibility).

The separation of content and form quickly becomes necessary with large documents. It is nevertheless a good habit to get into, and some recommend it for small documents as well.

The titles of a text are distinguished, for the reader, from the body of the text by their typography (font, size, weight, etc.) and their layout (rules, spacing, alignment, etc.). We could define the typography each time we come across a title, but:

  • there would be a risk of not strictly defining the same typography each time, and therefore of not having a homogeneous document;
  • if we want to change this formatting, we would have to find all the titles and do it individually.

Another example: within running text, program code, program names, keyboard input and screen messages are often all set in a fixed-width font (such as Courier). If we want to change the font of only one of these kinds of element, a search by formatting would return all of them, which would be inefficient. The task is all the more tedious as the occurrences are frequent.

The problem is multiplied:

  • when several people edit documents from the same collection: how to ensure consistency and, moreover, how to check it?
  • when a document exists and its format or the medium of the document must change: switching from A4 to A5 for example, or publication on paper and online;
  • when the documents have several recipients who each require their own standard, for example Snecma supplying several aircraft manufacturers, or an arsenal supplying the army, the air force and the navy, not to mention sales to foreign armed forces;
  • when readers want to personalize the appearance: without separation, only the author’s preferences apply.

If the software allows it, it is therefore better to indicate the function of each text element (in our example “chapter title”, “section title”, “program name”, …) and to specify only once in the document, or better yet outside the document (this is the notion of a profile or style sheet), how elements of each type are formatted. The author tells the machine, in the author's own terminology, what is being written. The machine then derives the formatting from this formal indication of the nature of the information, whether as the text is typed, after compilation (e.g. LaTeX) or at interpretation time (e.g. XHTML/CSS), doing work formerly carried out when the manuscript was transcribed. The machine simply demands more rigor of expression than a typist or a typesetter would. To simplify collaboration, it is preferable that the terminology used to describe the content be standardized.
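The principle can be sketched in a few lines, under stated assumptions: the content is a list of (role, text) pairs, and each style sheet is a plain dictionary mapping a role to a formatting function. The role names, both style sheets and the sample content are hypothetical, and real systems such as LaTeX or CSS are far richer than this.

```python
from html import escape

# Content: what the author writes, tagged by function only.
content = [
    ("chapter_title", "Getting started"),
    ("paragraph", "Run the installer."),
    ("program_name", "setup.exe"),
]

# Form: two interchangeable (hypothetical) style sheets, defined outside the content.
plain_text_style = {
    "chapter_title": lambda text: text.upper(),
    "paragraph": lambda text: text,
    "program_name": lambda text: f"'{text}'",
}
html_style = {
    "chapter_title": lambda text: f"<h1>{escape(text)}</h1>",
    "paragraph": lambda text: f"<p>{escape(text)}</p>",
    "program_name": lambda text: f"<code>{escape(text)}</code>",
}


def render(elements, style_sheet):
    """Format each element according to its role; the content itself is never modified."""
    return "\n".join(style_sheet[role](text) for role, text in elements)


print(render(content, plain_text_style))
print(render(content, html_style))
```

Changing how every chapter title looks then means editing one entry in a style sheet, not hunting for each title in the text.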

This consideration was at the origin of GML (Generalized Markup Language), devised by Goldfarb and colleagues at IBM in 1969, which later became the SGML standard. CERN, which used GML extensively, derived from it a markup language for the Web that met with worldwide success: HTML.

Creation of web pages

For the creation of web pages, it is recommended to write the content in XHTML (or in HTML), and to describe the formatting in CSS, preferably placed in a separate file.

Good habits to adopt include:

  • use the tag <em>…</em> – emphasis, which indicates the function of the text portion, and can be transcribed in italics, in color (for display in text mode) or by a particular intonation of the voice for a vocal reader – rather than <i>…</i> (put in italics);
  • use the tag <strong>…</strong> (strong emphasis) rather than <b>…</b> (bold characters) ;
  • use the tag <code>…</code> (program source code) rather than <tt>…</tt> (typewriter style);
  • use the tag <q>quotation</q> rather than "quotation" or « quotation ».
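As a rough illustration, the first three substitutions in this list could be applied to legacy markup with a small script. This is only a sketch: the function name `to_semantic` is hypothetical, tags with attributes or uppercase names are not handled, and a mechanical mapping cannot know the author's intent (an `<i>` may mark a foreign word or a title, not emphasis), so the result would still need review.

```python
import re

# Presentational tag -> functional tag, following the list above.
SEMANTIC = {"i": "em", "b": "strong", "tt": "code"}

# Matches bare opening or closing tags only, e.g. <i> or </tt>.
_TAG = re.compile(r"<(/?)(i|b|tt)>")


def to_semantic(html):
    """Replace bare <i>, <b> and <tt> tags with <em>, <strong> and <code>."""
    return _TAG.sub(lambda m: f"<{m.group(1)}{SEMANTIC[m.group(2)]}>", html)


print(to_semantic("<b>Note</b>: run <tt>make</tt> <i>now</i>"))
```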

Web sites

Some websites ask the user to fill out a form and use the entries to produce a document. This is the case, for example, of sites that offer to put a CV online: a number of fields are filled in (personal details, diplomas, experience…) and the formatting is done by the site engine.
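A minimal sketch of such an engine, assuming three form fields and a plain-text layout (the field names, the sample values and the template are all hypothetical): the user supplies only the content, and the layout lives in a template owned by the site.

```python
from string import Template

# Content: the fields as a user might fill them in (sample values).
cv_fields = {
    "name": "A. Example",
    "degree": "MSc Computer Science",
    "experience": "3 years as a technical writer",
}

# Form: a layout defined once by the site, independent of any one user.
CV_TEMPLATE = Template("$name\n\nEducation: $degree\nExperience: $experience\n")


def render_cv(fields):
    """Fill the site's template with the user's fields; raises KeyError if one is missing."""
    return CV_TEMPLATE.substitute(fields)


print(render_cv(cv_fields))
```

Swapping `CV_TEMPLATE` for another layout restyles every CV on the site without touching what users entered.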

Word processor

Modern word processing software generally allows you to define styles.

A portion of text is thus assigned a given style, corresponding to the functional designation of the information, and the formatting of that style is set in the style definition dialog box.

Care should be taken to ensure that style names have a functional rather than a typographical meaning. For example, we write “Chapter title” and not “Centered title”. Several styles may well have the same typographic appearance at a given time, but keeping them distinct makes it possible to format them differently in the future.

For example, when IBM introduced GML in 1979, the “title” tag (:h1) differed in its effects from the “highlight” tag (:hi1) only by a conditional line feed; otherwise, the printer simply produced bold by double striking. When the manufacturer's first typographic printers appeared four years later, :hi1 was rendered in italics of the same body size and :h1 titles in 14-point bold roman, without a single comma of the texts themselves having to change. Moreover, these texts remained printable on conventional printers. GML thus proved to be a Trojan horse that encouraged IBM's customers to switch to typographic printing at once, since the benefits were immediate.

With word processing, the separation of content and form requires not inserting any unnecessary layout characters: empty paragraphs, tabs at the beginning of a line (or anywhere else, for that matter), page breaks not required by function, multiple spaces, etc. Any layout character that is not essential is “form” polluting the “content”. Some programs try to remove such characters automatically, but they are not perfectly reliable.

LaTeX

The LaTeX document preparation language and system provides environments and commands that support the content/form separation: \chapter{…}, \section{…}, \begin{tabular}…\end{tabular}, … You can also define macros (personal commands) in the preamble, for example a command such as \newcommand{\filename}[1]{\texttt{#1}} (the name \filename is only illustrative), to set file names in a fixed-width font.

MediaWiki

With the MediaWiki software, which Wikipedia uses, titles are marked with equals signs (see Help: Syntax > Title), which makes it possible to generate a table of contents for the page; links are marked with two pairs of square brackets (see Help: Syntax > Links); and tables are written with brace/pipe pairs (see Help: Syntax > Tables). This describes the content. The formatting is determined automatically; registered users can customize it (see Help: User Preferences).

The content/form separation also involves the creation of templates (see Help: Model): in the body of the text, we simply invoke the template, and the formatting is defined on the template page. This is the case, for example, of the template {{API}} for the International Phonetic Alphabet, the template {{s}} for centuries, and the template {{Chiffrage mesure}} for musical time signatures on Wikibooks…

Since the software ultimately generates HTML with a CSS style sheet, classes can also be used, such as the class explain for explanations:

<span class="explain" title="Société nationale des chemins de fer">SNCF</span>

  1. Crozat, Stéphane, Editorial channels and re-editorialization of digital content
