Last edited by Mek
March 15, 2022 | History

Search inside individual book API

WARNING: This is an experimental API and can change in future.

Here is an example of searching inside a book using the API.

This API depends on data node hosts, which can change: an archive.org item's files live on a data host that may move over time. To find the current data node host of an item, go to archive.org/metadata/{identifier} and replace the prefix ia800204 in the URL below with the value of d1 or d2 accordingly. The path parameter in the URL may also have to change to the dir value from the metadata:

https://ia800204.us.archive.org/fulltext/inside.php?item_id=designevaluation25clin&doc=designevaluation25clin&path=/27/items/designevaluation25clin&q=%22library%20science%22
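The steps above can be sketched in Python. This is a minimal illustration, not an official client: build_search_url is a hypothetical helper, it assumes the d1 (or d2) value in the metadata is a full hostname, and it assumes doc takes the same value as item_id, as in the example URL. The metadata dictionary here is written by hand to match that URL rather than fetched from archive.org.

```python
from urllib.parse import quote, urlencode


def build_search_url(metadata: dict, identifier: str, query: str) -> str:
    """Build a search-inside URL from an item's metadata record.

    Assumes `metadata` is the parsed JSON from archive.org/metadata/{identifier}
    and that its "d1" (or "d2") value is a full hostname.
    """
    host = metadata.get("d1") or metadata.get("d2")
    if not host:
        raise ValueError("metadata has no d1 or d2 data node host")
    params = {
        "item_id": identifier,
        "doc": identifier,  # assumption: doc matches the identifier, as in the example
        "path": metadata["dir"],
        "q": query,
    }
    # quote (not quote_plus) encodes spaces as %20; safe="/" leaves the path slashes as-is.
    query_string = urlencode(params, quote_via=quote, safe="/")
    return f"https://{host}/fulltext/inside.php?{query_string}"


# Hand-written metadata values matching the example URL above.
sample_metadata = {
    "d1": "ia800204.us.archive.org",
    "dir": "/27/items/designevaluation25clin",
}
print(build_search_url(sample_metadata, "designevaluation25clin", '"library science"'))
```

Because the host and path can change, it is probably safer to rebuild the URL from fresh metadata before each search than to store it.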

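Information you need to search inside a book, with an example from the above search:

hostname: ia800204.us.archive.org
item_id: designevaluation25clin
doc: designevaluation25clin
path: /27/items/designevaluation25clin
q: "library science" (URL-encoded as %22library%20science%22)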

You can find the hostname and path using the archive.org locator service.

Example of output from API call:

reply( {
    "ia": "designevaluation25clin",
    "q": "\"library science\"",
    "page_count": 224,
    "body_length": 475677,
    "leaf0_missing": true,
    "matches": [
       ...
    ]
} )

The reply includes page_count, which is the number of pages that were passed to the OCR.
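A reply like the one above could be read in Python roughly as follows. This sketch assumes the response body is a JSON object wrapped in a callback call, reply( ... ), exactly as in the example; parse_reply is a hypothetical helper name, and the sample body uses an empty matches list in place of the elided entries.

```python
import json


def parse_reply(body: str) -> dict:
    """Strip a callback wrapper such as reply( ... ) and parse the JSON inside."""
    start = body.index("(") + 1   # first "(" opens the callback call
    end = body.rindex(")")        # last ")" closes it
    return json.loads(body[start:end])


# Sample body modelled on the example output, with no matches.
sample_body = r'''reply( {
    "ia": "designevaluation25clin",
    "q": "\"library science\"",
    "page_count": 224,
    "body_length": 475677,
    "leaf0_missing": true,
    "matches": []
} )'''

result = parse_reply(sample_body)
print(result["ia"], result["page_count"], len(result["matches"]))
```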

Example of a match:

{
    "text": "The first Clinic on Library Applications of Data Processing was held at the Illini Union on the Urbana-Champaign campus of the University of Illinois, April 28 - May 1, 1963 under the sponsorship of the University of Illinois Graduate School of {{{Library}}} {{{Science}}}. Writing in the Foreword to the Clinic proceedings, Herbert Goldhor (1964) provides the rationale for sponsoring such a Clinic:",
    "par": [
        {
            "page": 14, "page_width": 2134, "page_height": 3328,
            "b": 1090, "t": 700, "r": 2024, "l": 192,
            "boxes": [
                { "r": 1560, "b": 957, "t": 899, "l": 1378 },
                { "r": 1767, "b": 957, "t": 899, "l": 1587 }
            ]
        }
    ]
}

Each match contains a 'text' field. This is usually a complete paragraph. The matched words are surrounded by three braces on either side, like {{{this}}}.
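One way to work with the triple-brace markers is a small regular expression. The helper names below (matched_terms, strip_markers) are illustrative only and not part of the API.

```python
import re

# Non-greedy capture of whatever sits between {{{ and }}}.
MATCH_MARKER = re.compile(r"\{\{\{(.*?)\}\}\}")


def matched_terms(text: str) -> list[str]:
    """Return the marked words in the order they appear."""
    return MATCH_MARKER.findall(text)


def strip_markers(text: str) -> str:
    """Return the text with the triple-brace markers removed."""
    return MATCH_MARKER.sub(r"\1", text)


sample_text = "Graduate School of {{{Library}}} {{{Science}}}."
print(matched_terms(sample_text))
print(strip_markers(sample_text))
```

The same pattern could be used to wrap each match in highlighting markup instead of removing the markers.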

The other field, par, contains details of every page that is part of this match, since paragraphs can cross pages. Each par object provides a page number, the page width and height, and coordinates for the paragraph on the page. The boxes field lists the coordinates of the boxes to draw around each word, or part of a word, in the match.

Hyphenation means words can break across lines and across pages.
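To draw the highlight boxes on a page image shown at a different size, the coordinates in a par entry can be scaled by the ratio of the display width to page_width. The sketch below assumes the coordinates are pixels measured from the top-left corner of the page image, which is consistent with t being smaller than b in the example but is not stated by the API. scale_boxes is a hypothetical helper.

```python
def scale_boxes(par_entry: dict, display_width: float) -> list[dict]:
    """Convert l/t/r/b boxes to left/top/width/height at a given display width."""
    scale = display_width / par_entry["page_width"]
    return [
        {
            "left": box["l"] * scale,
            "top": box["t"] * scale,
            "width": (box["r"] - box["l"]) * scale,
            "height": (box["b"] - box["t"]) * scale,
        }
        for box in par_entry["boxes"]
    ]


# The par entry from the example match above.
sample_par = {
    "page": 14, "page_width": 2134, "page_height": 3328,
    "b": 1090, "t": 700, "r": 2024, "l": 192,
    "boxes": [
        {"r": 1560, "b": 957, "t": 899, "l": 1378},
        {"r": 1767, "b": 957, "t": 899, "l": 1587},
    ],
}

# Render the page at half its scanned width.
for rect in scale_boxes(sample_par, display_width=1067):
    print(rect)
```

Since a hyphenated word can be split across lines or pages, a single matched word may correspond to more than one box, so it is safer to draw every box in every par entry rather than assume one box per word.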

History

March 15, 2022 Edited by Mek Edited without comment.
March 15, 2022 Edited by Mek Edited without comment.
October 26, 2016 Edited by Brenton Cheng Edited without comment.
January 7, 2011 Edited by Edward Betts host and path of sample book changed
October 22, 2010 Created by Edward Betts started page