Learning About EPUB: Structure and Content


E-books have revolutionized how a growing number of people consume written content, allowing convenient access to myriad publications on e-readers, mobile devices, and computers. But have you ever wondered about the technology that makes it all possible?

Various formats are used to create e-books, and one of the most popular is the standards-based EPUB format. Let’s take a look at how EPUB works by creating our own EPUB e-book, then repurposing some existing HTML pages to flesh out the content.

What is EPUB?

EPUB® (electronic publication) is an open standard from the International Digital Publishing Forum (IDPF) for creating and distributing digital publications such as e-books. EPUB content is “reflowable,” which means the text adapts to the size of the display, so it can be read on any of the numerous e-readers that support the standard (Sony Reader, Nook, Kobo, etc.), as well as most smartphones and tablets.

An EPUB document is built on three specifications, OPS (Open Publication Structure), OPF (Open Packaging Format), and OCF (Open Container Format), and bundles XHTML, CSS, SVG, image, and other file types in a single, interoperable file for easy distribution and publication.

Editing and Validating EPUB Content

The components that make up an EPUB document are packaged in a zipped archive.  XMLSpy includes an EPUB editor for viewing, adding, deleting, validating, and editing these files and folders. XMLSpy even ships with an example EPUB book so you can explore this functionality easily. Let’s create an e-book from scratch so that we can get a better look at the structure and components in each document. In this example, we’ll be creating a cookbook.

Let’s start by selecting New from the File menu, then clicking .epub Electronic Publication. After we enter a name for our new e-book and save it, it opens in the XMLSpy Archive View as a skeleton that includes all the files and folders required to create a valid EPUB document.

 

[Screenshot: the new e-book skeleton in XMLSpy Archive View]

As shown above, each EPUB archive has the following structure and key components:

|– Mimetype file (Archive)
|– META-INF folder
|             — container.xml
|– DOCUMENT folder (In the screenshot above, OEBPS is the Document folder.)
|             — contains HTML, CSS, image files, plus OPF and NCX files

The OPF file, traditionally named content.opf, contains the digital book’s metadata. It is based on the Open Packaging Format (OPF) specification.

The NCX file (Navigation Control file for XML applications), traditionally named toc.ncx, contains the e-book’s table of contents. It is based on the NCX part of the OPF specification.

The folder named META-INF must contain the file container.xml, which points to the file defining the content of the book (the OPF file). The way the files in the archive are organized follows the rules in the Open Container Format (OCF) specification.
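
To make the layout concrete, here is a minimal Python sketch that assembles the same skeleton with the standard zipfile module. It is not what XMLSpy does internally, and the result is not a complete, valid book; the function name build_skeleton and the placeholder OPF content are assumptions for illustration only.

```python
import io
import zipfile

# container.xml points readers at the OPF file inside the document folder.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_skeleton(opf_text: str = "<package/>") -> bytes:
    """Return the bytes of a bare EPUB-style archive (hypothetical helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # OCF expects the mimetype entry to come first and to be stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        # Placeholder only: a real content.opf carries metadata, manifest, and spine.
        zf.writestr("OEBPS/content.opf", opf_text)
    return buf.getvalue()


if __name__ == "__main__":
    with zipfile.ZipFile(io.BytesIO(build_skeleton())) as zf:
        for info in zf.infolist():
            print(info.filename)
```

Writing the mimetype entry first and uncompressed is the one detail that general-purpose zip tools tend to get wrong, which is why the sketch sets it explicitly.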

XMLSpy provides context-sensitive entry helpers and useful editing guidance for all these standards-based files. Let’s start creating our e-book content by double-clicking the title.html file. When it opens in the XMLSpy HTML editor, we can immediately see what needs to be updated first.

 

Let’s add some initial content based on the placeholders in the file, as well as an <h2> and <p> for the welcome message.

After saving these changes, we can switch back to Archive view to preview the content of our EPUB document so far. Clicking the Preview button generates an HTML file from the content in the EPUB archive and displays it in XMLSpy’s integrated Browser View.

 

As you can see, there is still some required information missing. Let’s double-click the content.opf file to add the e-book metadata. We can switch to Grid View to enter the data this time. It’s easy to move between text-based and graphical editing methods, depending on your preference.
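
For readers who want to see what that metadata looks like outside an editor, the following Python sketch reads the Dublin Core fields from a small OPF document. The sample values (title, author, identifier) are invented for illustration, and read_metadata is a hypothetical helper, not part of XMLSpy or of any EPUB library.

```python
import xml.etree.ElementTree as ET

# Namespaces used by OPF 2.0 package files and their Dublin Core metadata.
NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample data; a real content.opf also fills in the manifest and spine.
SAMPLE_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="BookId">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Sample Cookbook</dc:title>
    <dc:creator>Example Author</dc:creator>
    <dc:language>en</dc:language>
    <dc:identifier id="BookId">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
  </metadata>
  <manifest/>
  <spine/>
</package>
"""


def read_metadata(opf_text: str) -> dict:
    """Return selected Dublin Core fields, or None for any that are absent."""
    root = ET.fromstring(opf_text.encode("utf-8"))
    metadata = root.find("opf:metadata", NS)
    result = {}
    for field in ("title", "creator", "language", "identifier"):
        element = metadata.find(f"dc:{field}", NS) if metadata is not None else None
        result[field] = element.text if element is not None else None
    return result


if __name__ == "__main__":
    for key, value in read_metadata(SAMPLE_OPF).items():
        print(f"{key}: {value}")
```

A field that comes back as None here is a reasonable hint about what still needs to be filled in before the book will validate.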

 

Clicking Preview again shows that our initial information is complete.

 

Before proceeding, let’s press F8 to validate the EPUB file and ensure interoperability.

 

Our file is valid, and we can continue adding the rest of the content to the EPUB archive to finish our e-book. When errors are present, the XMLSpy validation window lists and describes each instance with a link to where it occurs in the file to aid in troubleshooting.
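
As a rough illustration of the kind of structural rules a validator looks at, here is a Python sketch of a few container-level checks. It covers only a tiny fraction of real EPUB validation and does not reproduce what XMLSpy checks when you press F8; basic_checks and make_archive are hypothetical helpers written for this example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def make_archive(entries: list) -> bytes:
    """Build an in-memory zip from (name, text) pairs, in the given order."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, text in entries:
            zf.writestr(name, text)
    return buf.getvalue()


def basic_checks(epub_bytes: bytes) -> list:
    """Return a list of problem descriptions; an empty list means these checks passed."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        names = zf.namelist()
        if not names or names[0] != "mimetype":
            problems.append("mimetype is not the first entry")
        elif zf.read("mimetype") != b"application/epub+zip":
            problems.append("unexpected mimetype content")
        if "META-INF/container.xml" not in names:
            problems.append("META-INF/container.xml is missing")
            return problems
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        if rootfile is None:
            problems.append("container.xml has no rootfile element")
        elif rootfile.get("full-path") not in names:
            problems.append("rootfile target is missing from the archive")
    return problems
```

Passing these checks does not make a book valid; it only means the container is laid out the way the sections above describe.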

Repurposing Existing Content in EPUB

With the number of options readers have for consuming written content these days – from e-books to PDF files to Web pages – it’s become a common requirement to have the same content available for consumption via multiple channels. And the easier it is to do that, the better.

It’s easy to repurpose HTML content from a Web site in an EPUB document in XMLSpy. Let’s continue with our example by adding some existing HTML pages from the author’s cooking blog to build out the content of the e-book.

We can add those blog articles by clicking the Add Document button in Archive View, and browsing to select the files.

 

We also need to add the images included in the HTML pages. Since these are binary files, the best way to do this is to open the EPUB document using WinZip or WinRAR and add the required files to the archive.

 

Once we save the zip archive, the EPUB document in XMLSpy reflects the changes.
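
If you would rather script this step than use a zip utility, the same thing can be sketched in Python with the standard zipfile module opened in append mode. The OEBPS/images/ folder and the add_images function name are assumptions for this example; use whatever paths your HTML pages actually reference.

```python
import io
import zipfile


def add_images(epub_file, images: dict) -> None:
    """Append image bytes to an existing EPUB archive (hypothetical helper).

    epub_file may be a path or a seekable file-like object; images maps a
    file name to its raw bytes. The OEBPS/images/ target is an assumed layout.
    """
    with zipfile.ZipFile(epub_file, "a", zipfile.ZIP_DEFLATED) as zf:
        existing = set(zf.namelist())
        for name, data in images.items():
            arcname = f"OEBPS/images/{name}"
            if arcname in existing:
                continue  # skip rather than write a duplicate entry
            zf.writestr(arcname, data)


if __name__ == "__main__":
    # Demonstration on an in-memory archive with made-up image bytes.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
    add_images(buf, {"soup.jpg": b"not a real image"})
    with zipfile.ZipFile(buf) as zf:
        print(zf.namelist())
```

Appending leaves the existing entries, including the leading mimetype file, where they are. Keep in mind that EPUB validators generally expect every added file, images included, to be declared in the content.opf manifest as well.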

Next, let’s update the table of contents file (toc.ncx) to point to the HTML pages by creating a separate navPoint and navLabel for each HTML page, and update content.opf to include our HTML pages as part of the EPUB manifest.
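
The entries involved are small and regular, so they are easy to picture in code. This Python sketch builds one navPoint for toc.ncx and the matching manifest item for content.opf, plus a spine itemref, since EPUB 2 takes the reading order from the spine. The page titles, file names, and ids are invented, and the application/xhtml+xml media type assumes the pages are well-formed XHTML.

```python
import xml.etree.ElementTree as ET

NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"
OPF_NS = "http://www.idpf.org/2007/opf"


def qname(namespace: str, tag: str) -> str:
    """Return the '{namespace}tag' form ElementTree uses for qualified names."""
    return f"{{{namespace}}}{tag}"


def nav_point(order: int, label: str, src: str) -> ET.Element:
    """Build a toc.ncx navPoint with its navLabel and content target."""
    point = ET.Element(
        qname(NCX_NS, "navPoint"),
        {"id": f"navPoint-{order}", "playOrder": str(order)},
    )
    nav_label = ET.SubElement(point, qname(NCX_NS, "navLabel"))
    ET.SubElement(nav_label, qname(NCX_NS, "text")).text = label
    ET.SubElement(point, qname(NCX_NS, "content"), {"src": src})
    return point


def manifest_item(item_id: str, href: str) -> ET.Element:
    """Build a content.opf manifest entry for an XHTML page (assumed media type)."""
    return ET.Element(
        qname(OPF_NS, "item"),
        {"id": item_id, "href": href, "media-type": "application/xhtml+xml"},
    )


def spine_itemref(item_id: str) -> ET.Element:
    """Build the spine entry that places the page in the reading order."""
    return ET.Element(qname(OPF_NS, "itemref"), {"idref": item_id})


if __name__ == "__main__":
    # Invented pages; in a real book, playOrder continues after existing navPoints.
    pages = [("Tomato Soup", "tomato-soup.html"), ("Apple Pie", "apple-pie.html")]
    for order, (label, href) in enumerate(pages, start=1):
        item_id = f"page{order}"
        for element in (nav_point(order, label, href),
                        manifest_item(item_id, href),
                        spine_itemref(item_id)):
            print(ET.tostring(element, encoding="unicode"))
```

The id on each manifest item is what ties the pieces together: the spine refers to it through idref, while the navPoint refers to the page by its path.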

 

This time, when we click Preview, we see the two articles we added and can navigate to each one.

 

Our e-book is coming together!

This is, of course, a very simple example, yet it’s useful for understanding the structure of EPUB documents and demonstrating how easy it is to repurpose existing content in EPUB. In addition to the easy-to-use Archive View, XMLSpy provides intelligent editing support for the technologies required for the most sophisticated e-book presentation: XML, XHTML, HTML, CSS, etc.

To see the structure of a complete EPUB book, open the TheCantervilleGhost.epub file in the XMLSpy Examples project – or access one of the free EPUB books available on the Internet. A great source is Project Gutenberg.

If you’re not already an XMLSpy customer, you can download a free trial of XMLSpy to try this out now.

1 reply
  1. Roberto
    Roberto says:

    Thanks Erin for this incredibly informative post. Greatly appreciated. There is a paucity of detailed, accurate information on EPUB structure/function. Or maybe I just haven't known where to look.
