February 9, 2009

The Hidden Data in MS Office Documents

Most people are unaware that the documents they create and edit using Microsoft’s
Office suite of products contain a large amount of data about the document’s life cycle.
While usually benign and not very interesting, this data can become quite valuable
in a forensic investigation. It can help establish a timeline of when a file was last
accessed or modified. An examiner can even extract the names of the last few users who
edited the file and the locations where the document was previously stored.


Examples of metadata fields available in Office documents:

  • Your name
  • Your initials
  • Your company or organization name
  • The name of your computer
  • The name of the network server or hard disk where you saved the document
  • Other file properties and summary information
  • Non-visible portions of embedded OLE objects
  • The names of previous document authors
  • Document revisions
  • Document versions
  • Template information
  • Hidden text or cells
  • Personalized views
  • Comments

Extracting metadata

There are several ways to extract the metadata from a document. The simplest method
is to view or modify many of the fields by using the Office applications themselves.
In Word or Excel, under the File menu, the Properties option will display a dialog
window containing many of the editable metadata fields.

However, not all metadata fields are this easy to access. Third-party tools must be
used to extract certain fields. Many tools are available to extract these additional
fields, such as MetaDiscover or MetaViewer by PinPoint Labs. MetaDiscover is able to
extract the last 10 authors of a document and the locations where it was stored.

An additional benefit of using a third-party extraction utility is that most open
the document in read-only mode when retrieving the metadata. When the native Office
applications are used to view the data, the document is typically opened in read-write
mode, which may cause certain fields, such as last accessed, to be updated, thereby
altering the file. In a forensic investigation, it is imperative that the data remain
unaltered.

Figure 1: MetaViewer showing Word 2003 metadata
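
One way to check that an examination step did not alter a document's contents is to hash the file before and after the step. The sketch below is only an illustration: the helper names sha256_of_file and unchanged_after are hypothetical and do not come from any of the tools mentioned above. A content hash also says nothing about file system timestamps, so those would still need to be recorded separately, and in practice an examiner would normally work on a copy of the evidence.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file opened in binary read-only mode."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large documents do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unchanged_after(path, examine):
    """Hash the file, run examine(path), hash again, and compare the digests.

    `examine` is any callable supplied by the caller (hypothetical here),
    for example a function that invokes a metadata viewer on the file.
    Returns True only if the file contents are byte-for-byte identical.
    """
    before = sha256_of_file(path)
    examine(path)
    after = sha256_of_file(path)
    return before == after
```

If unchanged_after returns False, the examination step modified the file and the extracted metadata should be treated with caution.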


Removing metadata

There are two primary techniques used to ensure metadata is not included in documents
when they are shared or published: limiting the creation of the metadata and scrubbing
or redacting the data before publication.

The first technique is to configure the Office applications to not create and store
the metadata fields in the document in the first place. If the data never existed,
there is no need to remove it at a later point. While this technique would appear
to solve all the issues surrounding the inclusion of metadata, in practice there are
still some fields the Office applications will create and populate even with all
user-available options configured. A good guide on configuring Word 2003 to limit the amount
of metadata it stores in documents can be found

The other way to ensure that metadata cannot be extracted from a document is to run
a utility which edits the document and scrubs or redacts the information. Microsoft
has created an add-in for the Office 2003 suite that will remove most metadata fields
from a document. There are also many third-party programs available, such as iScrub
by Esquire Innovations.

Comment about Office 2007

The latest version of Microsoft Office, version 2007, utilizes new file formats. These
new file formats still contain metadata but store it in a different structure than
all previous versions of Office. Therefore, most extraction or scrubber utilities
will not operate correctly on these new formats. However, Microsoft has included a
new feature in Office 2007 called the Document Inspector. Microsoft claims this new
feature will allow companies to control the metadata within the documents they publish.
For more details on its operation, Randall Farrar of Esquire Innovations has written
a detailed whitepaper on it.
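
To illustrate the different structure: an Office 2007 document (.docx, .xlsx, .pptx) is a ZIP package, and its core properties are conventionally stored as XML in a part named docProps/core.xml. The sketch below reads a few of those fields using only the Python standard library. It is a minimal example, not a full extractor: the function name read_core_properties and the choice of fields are assumptions made for illustration, and it assumes the part sits at the conventional path rather than resolving it through the package relationships.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core properties part of an Office 2007 package.
NAMESPACES = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Output key -> qualified element name. The selection is illustrative only.
FIELDS = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
    "revision": "cp:revision",
}


def read_core_properties(path):
    """Return selected core properties from an Office 2007 file as a dict.

    The package is opened read-only. Fields that are absent are returned as
    None. A KeyError is raised if the package has no docProps/core.xml part
    (assumed location; see the note above).
    """
    with zipfile.ZipFile(path, "r") as package:
        xml_bytes = package.read("docProps/core.xml")
    root = ET.fromstring(xml_bytes)
    result = {}
    for name, tag in FIELDS.items():
        element = root.find(tag, NAMESPACES)
        result[name] = element.text if element is not None else None
    return result
```

Calling read_core_properties("sample.docx"), where sample.docx is a hypothetical file name, would return a dictionary with the author, the last editor, the creation and modification timestamps, and the revision count, where present.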

Final Thoughts

The metadata associated with Office documents can provide many clues into the history
of a document. Presented with this information, many organizations may choose to either
remove this data or convert the files to another format (such as PDF) before they
publish them. From a forensic investigation standpoint, the metadata can be very
important and should always be reviewed for all documents related to an incident.