Insight Web Service

What is Insight?
The Penguin Random House Digital Page Initiative is an ongoing project to index, digitize, and distribute book content online, and to set the terms for its use. As part of that initiative, Penguin Random House has developed Insight, a service that gives search engines and online retailers access to digitized book content over the Web. Publishers need to manage their published content, and an increasingly large percentage of this content is digital. The Insight service was developed to address the relationship between book publishers, their digital content, and the Web at large.

Insight addresses this relationship specifically by maintaining the publisher's proper ownership and management of the content and giving business partners access to easy-to-use, browser-compatible tools to search, view and retrieve digitized book content over the Web.

Middleware
Fundamentally, Insight is a form of "middleware." Middleware is a type of transparent software that brings other software together. In the case of Insight, it connects software from partner websites or search engines to the web-ready content archives made available by publishers. The publishers can then offer or sell the content to the partner when end users request it on the partner's website.

How does it work?
Insight is a set of transactions for requesting book content, delivered in XML, an online industry standard. These XML transactions are defined by URL requests that can be downloaded from this website and embedded into a partner's website or a search engine. For a more detailed description of these transactions, see the Insight Service Specification (html or PDF).

The embedded transactions are submitted over the web to a publisher's Insight server in the form of simple URL requests. The publisher's Insight server authenticates the user, tracks the request, and responds with the appropriate data, formatted in either XML or as page media such as JPEG images.
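The request and response cycle described above can be sketched roughly as follows. This is a minimal illustration only: the host name, the parameter names (keyword, partner), and the XML element names (searchSummary, bookCount, pageCount) are hypothetical placeholders, not taken from the Insight Service Specification, which defines the actual transaction formats.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical values: the real host, path, and parameter names are defined
# in the Insight Service Specification, not here.
BASE_URL = "https://insight.example.com/search"


def build_request_url(base_url, params):
    """Encode a transaction as a simple URL request."""
    return base_url + "?" + urlencode(params)


def parse_search_summary(xml_text):
    """Read book and page counts from an assumed XML response layout."""
    root = ET.fromstring(xml_text)
    return {
        "keyword": root.findtext("keyword"),
        "books": int(root.findtext("bookCount")),
        "pages": int(root.findtext("pageCount")),
    }


url = build_request_url(BASE_URL, {"keyword": "Ulysses", "partner": "demo"})

# Stand-in for the body a publisher's server might return.
sample_response = """<searchSummary>
  <keyword>Ulysses</keyword>
  <bookCount>12</bookCount>
  <pageCount>678</pageCount>
</searchSummary>"""

summary = parse_search_summary(sample_response)
print(url)
print(summary)
```

In a real integration the URL would be fetched over HTTP and the server would authenticate the partner before responding; both steps are left out here because their details depend on the specification and on the partnership agreement.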

[Figure: Insight diagram]

Insight for Web Developers
For a web developer at a retail partner, Insight is a lightweight and browser-compatible tool, requiring no special updates or plug-ins for the end-user. Insight uses familiar industry standards like JPEG images and XML formatting to display actual book page views as well as to provide keyword searches in the text of a title.

Insight for Publishers
For the publisher, Insight is a tool to get the publisher's digital content onto the websites of retail partners, search engines, publicity outlets, authors, blogs, and readers. With Insight, the publisher's digital book content remains in the hands of the publisher. Insight leverages existing industry tools like ONIX to work with partners; it implements business rules to guarantee that ownership and management of the digitized content remain with the publisher; and it manages access to the content from third-party websites.

To those ends, Insight codifies the following:

  • the rules of access to a publisher's digital archive,
  • the tools a developer can use to search and view a publisher's archive, and
  • the format in which the developer can expect to get the data returned.

Who uses Insight?
The tools of Insight target the following types of users:

  • Online Retail Partners - In conjunction with existing book title resources like ONIX, Insight enhances the websites of online retail publishing partners by providing page views of actual book pages as well as "search inside" functionality for text.
  • Search Engines - Insight provides a secure gateway for the spiders of search engines like Google to crawl book content at the publisher's discretion.
  • Social Networks - The simple URL requests of Insight can easily be wrapped into a tool for use in online social communities like MySpace.com and personal blogs.

Insight Use Case Examples
Read the transactions below for an overview of what Insight can do. For exact specifications on how to implement these use cases, see the Insight Service Specification (html or PDF).

  1. Whole Archive Keyword Search Summary
    Search the entire publisher's archive to get a total count of books and pages that contain the keyword.
    Request:
    How many books and pages from the archive contain the word "Ulysses"?
    Response:
    The keyword Ulysses appears in 12 books and on 678 pages of the archive.

  2. Whole Archive Keyword Search Results
    Search the entire archive to get a list of book titles and excerpts that contain the keyword.
    Request:
    Which books contain the keyword, "Ulysses," and what is the context in which it first appears?
    Response:
    A list of 12 book titles in which the keyword Ulysses appears as well as the excerpted text and pageID in which it first appears.

  3. Book Keyword Search
    Search a specific book title to get a list of search results and links to pages that contain the keyword, within the specified range.
    Request:
    What pages of the book, The Iliad, contain the keyword, "Ulysses," and what is the context in which it appears?
    Response:
    A list of links, with pageIDs, to the 58 pages of The Iliad on which the keyword Ulysses appears, as well as the excerpted text in which it appears.

  4. Book Full-Page Transaction
    Get full-page media (e.g., JPEG) of a specified book by page number. These pages are of sufficient quality for reading, but the publisher decides the quality, size, and media type.
    Request:
    The full-page image of pageID 256 of The Iliad.
    Response:
    A full page representation of the page corresponding to pageID 256 of The Iliad. Insight responds with JPEG images, but the service could respond with another media type.

  5. Book Thumbnail-Page Transaction
    Get thumbnail media (e.g., JPEG) of a specified book by page number. The thumbnails are useful for displaying search results, to indicate the kind of content on the page (full text, pictures, etc.). They are not intended for reading.
    Request:
    The thumbnail image of page 256 of The Iliad.
    Response:
    A thumbnail representation of the page corresponding to page 256 of The Iliad. Again, Insight currently responds with JPEG images for this request, but it could respond with another media type.

  6. Book Page Context Transaction
    Get a list of links to thumbnail and full-page images for a specified number of pages before and after a specific page. This use case enables browsing forward and back, or jumping a few pages in either direction.
    Request:
    Where can I find links to the five pages before and after page 256 of The Iliad?
    Response:
    A list of thumbnail and full-page URLs to pages 251-255 and 257-261 of The Iliad.

  7. Sample Book Page Transaction
    Get a list of links to thumbnail and full-page images for a group of pre-determined sample pages available for the specified book (e.g., cover, backcover, TOC, etc.), as chosen by the publisher.
    Request:
    Where can I find links to all of the sample pages made available from The Iliad?
    Response:
    A list of thumbnail and full-page URLs to the front cover, table of contents, first index page, and first pages from sections of The Iliad.
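As an illustration of the Book Page Context Transaction (use case 6), the page arithmetic behind "five pages before and after page 256" can be sketched as below. The function name and the clamping at the first and last page are assumptions made for this example, as is the treatment of pages as consecutive integers; the Insight Service Specification defines how a server actually handles pageIDs and book boundaries.

```python
def context_pages(page, count, first_page=1, last_page=None):
    """Return the pages before and after `page`, excluding `page` itself.

    Pages outside the book are dropped (an assumption for this sketch).
    """
    before = [p for p in range(page - count, page) if p >= first_page]
    after_end = page + count
    if last_page is not None:
        after_end = min(after_end, last_page)
    after = list(range(page + 1, after_end + 1))
    return before, after


before, after = context_pages(256, 5)
print(before)  # [251, 252, 253, 254, 255]
print(after)   # [257, 258, 259, 260, 261]
```

A partner site could use lists like these to build the "previous" and "next" links around the page currently displayed.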

How do I get started?

  • Publishers:
    • The management and ownership of the content remains in the hands of the publisher.
    • Prepare the content data in the desired format; e.g., JPG, PDF, indexed text, etc.
    • Read the Insight Service Specification (html or PDF) and load the content onto an Insight-compatible content server.
    • Decide which parts of the book, and how much of it, will be made available to Insight and at what price.
    • Send this information to an Insight-compatible content server.
  • Development partners:
    • Set up an Insight partnership with the publisher to define and authenticate access to the Insight service.
    • Read the Insight Service Specification (html or PDF) and begin enhancing website or search engine code with features for keyword searches and full-page or thumbnail page views.

RH.BIZ - Penguin Random House LLC

Bertelsmann Media Worldwide