Document file format

From Wikipedia, the free encyclopedia

A document file format is a text or binary file format for storing documents on storage media, especially for use by computers. A multitude of mutually incompatible document file formats currently exist.

Examples of XML-based open standards are DocBook, XHTML, and, more recently, the ISO/IEC standards OpenDocument (ISO 26300:2006) and Office Open XML (ISO 29500:2008).

In 1993, the ITU-T tried to establish a standard for document file formats, known as the Open Document Architecture (ODA), which was supposed to replace all competing document file formats. It is described in ITU-T documents T.411 through T.421, which are equivalent to ISO 8613. The effort did not succeed.

Page description languages such as PostScript and PDF have become the de facto standard for documents that a typical user should only be able to create and read, not edit. In 2001, a series of ISO/IEC standards for PDF began to be published, including the specification for PDF itself, ISO 32000.

HTML is the most widely used open international standard, and it is also used as a document file format. It has also become an ISO/IEC standard (ISO 15445:2000).

The default binary file format used by Microsoft Word (.doc) has become a widespread de facto standard for office documents, but it is a proprietary format and is not always fully supported by other word processors.

Common document file formats

Below is a list of some of the more common document file formats, with the common file extensions used by each format in parentheses:

  • ASCII, UTF-8 (.txt, others) — any of a number of plain text encodings that may have differing line endings depending on what system they were created or edited on
  • Amigaguide (.guide) — a hypertext document format designed for the Amiga that is used to document Amiga programs
  • Microsoft Word (.doc, .docx) — structural binary (.doc) and XML-based text formats (.docx) developed primarily by Microsoft, both of which are subject to the Microsoft Open Specification Promise and are used to store word processing documents[1][2]
  • DjVu (.djv, .djvu) — a file format designed primarily to store scanned documents, especially ones containing a mixture of text, line drawings, and images[3]
  • DocBook (.dbk, .xml) — an XML-based format intended for writing technical documentation
  • HTML (.html, .htm) — an ad hoc hypertext document format originally created for the World Wide Web, initially developed as an open standard by the W3C and currently being developed as one by the WHATWG
  • FictionBook (.fb2, .fb3) — an open, XML-based e-book format that originated and gained popularity in Russia
  • Markdown (.md) — a simple, plain text markup language with a number of different implementations that is popular on blogs and content management systems
  • OpenDocument (.odt, .fodt) — an open, XML-based standard for office documents, including word processing documents, spreadsheets, presentations, and graphics
  • OpenOffice.org XML (.sxw, .sxc, .sxd, .sxi, others) — an open, XML-based format for office documents including word processing documents, spreadsheets, presentations, graphics, and formulas
  • Open XML Paper Specification (.xps, .oxps) — an XAML-based page description format designed by Microsoft (.xps) that was intended to compete with the Portable Document Format (PDF) and was later standardized by Ecma International as ECMA-388 (.oxps)
  • PalmDOC (.pdb) — a special version of the PDB record database format used by Palm OS to store e-books and other text documents for handheld devices
  • Pages (.pages) — a document file format used to store word processing documents for Apple's Pages app, as a part of its iWork office suite
  • Portable Document Format (.pdf) — a now standardized (ISO 32000), open format based on PostScript, developed by Adobe in 1992, that is able to store documents, forms, rich media, and graphics (PDF and PDF/UA) for document exchange (PDF/X and PDF/VT), archival (PDF/A), and engineering (PDF/E)
  • PostScript (.ps) — a page description and programming language designed by Adobe for use with printing, display systems, and storing documents
  • Rich Text Format (.rtf) — a proprietary document format developed by Microsoft for cross-platform document interchange with Microsoft products[4]
  • Symbolic Link (.slk) — a plain text ASCII format created by Microsoft in the 1980s to exchange data between spreadsheet applications
  • Scalable Vector Graphics (.svg) — an XML-based vector image format for defining two-dimensional graphics that has support for animations and interactive content
  • TeX (.tex) — a plain text format for describing complex types and page layouts that is often used for mathematical, technical, and academic publications
  • Text Encoding Initiative (.xml) — a primarily XML-based format for semantically marking up text, used primarily in the field of digital humanities to provide detailed representations of the components and concepts that make up a document
  • troff (.tmac, .man, others) — short for "typesetter roff", a typesetting markup language developed by Bell Labs from the original roff program for Unix
  • Uniform Office Format (.uof, .uot, .uos, .uop) — a standardized, XML-based open format designed for use with office applications developed in China, with support for word processing documents, presentations, and spreadsheets
  • WordPerfect (.wpd, .wp, .wp7) — a proprietary format now owned by Alludo used to store and represent word processing documents
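Because several formats in the list share an extension (for example, .xml is listed for both DocBook and the Text Encoding Initiative), a file extension alone can only narrow down the candidate formats. The following is a minimal Python sketch of such a lookup, assuming a hand-transcribed subset of the list above; the names FORMATS_BY_EXTENSION and candidate_formats are hypothetical and chosen only for illustration.

```python
from pathlib import PurePath

# Extension-to-format table transcribed from a subset of the list above.
# Each value is a set because one extension can belong to several formats.
FORMATS_BY_EXTENSION = {
    ".txt": {"Plain text"},
    ".doc": {"Microsoft Word"},
    ".docx": {"Microsoft Word"},
    ".djvu": {"DjVu"},
    ".dbk": {"DocBook"},
    ".xml": {"DocBook", "Text Encoding Initiative"},
    ".html": {"HTML"},
    ".htm": {"HTML"},
    ".md": {"Markdown"},
    ".odt": {"OpenDocument"},
    ".pdf": {"Portable Document Format"},
    ".ps": {"PostScript"},
    ".rtf": {"Rich Text Format"},
    ".tex": {"TeX"},
}


def candidate_formats(filename: str) -> set[str]:
    """Return the listed formats whose extensions match the file name.

    The comparison is case-insensitive; an unknown or missing extension
    yields an empty set.
    """
    suffix = PurePath(filename).suffix.lower()
    # Copy the set so callers cannot mutate the shared table.
    return set(FORMATS_BY_EXTENSION.get(suffix, set()))
```

Calling candidate_formats("thesis.xml") returns both DocBook and Text Encoding Initiative, which illustrates why reliable identification generally requires inspecting the file's content rather than its name.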

References

  1. ^ "Microsoft Office Binary (doc, xls, ppt) File Formats". Microsoft. 2008-02-15. Archived from the original on 2009-03-08. Retrieved 2010-03-18.
  2. ^ Microsoft Corporation (2010-07-23). "MS-DOC - Word Binary File Format (.doc) Structure Specification". Retrieved 2010-08-08.
  3. ^ "What is DjVu - DjVu.org". DjVu.org. Archived from the original on 2019-01-21. Retrieved 2009-03-05.
  4. ^ "Rich Text Format (RTF) Specification Version 1.9.1" (PDF). Archived from the original (PDF) on 8 July 2019.