Introduction
The Rothschild Foundation Hanadiv Europe (RFHE) is a charitable foundation which funds projects that fulfil its mission of: “Strengthening Jewish communal life and ensuring cultural heritage is accessible for future generations”.
The ITalYA project had just completed a successful pilot and was about to embark on a project to create a union catalogue of an estimated 35,000-40,000 Hebrew books in over 40 libraries across Italy. The project faced a significant challenge: to organise the work, identify and catalogue the books, transliterate the Hebrew titles into Latin script, and then export the combined images and metadata for upload to TECA, an online catalogue provided by the National Central Library of Rome.
The project is a partnership between the Union of Italian Jewish Communities (UCEI), the National Central Library of Rome (BNCR) and the National Library of Israel (NLI).
Scan Data Experts (SDE) were commissioned by the RFHE to act as consultants for the project and, working with intranda, they designed and implemented a technical solution for the project utilising Goobi Workflow to manage the production of the union catalogue and preparation of the data for uploading to TECA.
At the time of writing there are already over 11,000 books featured on the TECA catalogue:
http://digitale.bnc.roma.sbn.it/tecadigitale/progettoVolumiEbraici
The Story of the ITalYA Project
The project was conceived as a source for researchers about old Hebrew books in Italian libraries. Early print publications in Hebrew were created by a huge number of different publishers in various editions and bindings. These publications have various provenance information recorded in them, often carry unique marginalia and have been subject to censorship.
Before the project, it was impossible for researchers to get an overview of where these books were located in Italy. In response to this, Dr.ssa Gloria Arbib from UCEI developed a plan to create a union catalogue of the books in partnership with the National Central Library of Rome (BNCR) and the National Library of Israel (NLI).
The vision of the project was to identify early Hebrew printed books in libraries, photograph a selection of pages from them (such as the cover, title page, a selection of interior pages and the end matter) and then use these images to create a catalogue record for each book.
The catalogue record needed to contain title and author information (in Hebrew and in transliterated Latin characters), a reference to the full catalogue record of the book in the National Library of Israel catalogue system, a reference to the transliterated record of the book in the OCLC WorldCat system, and other information about provenance, censorship and marginalia.
The project challenges
The pilot of the project had been successful, but the process was very time-consuming: all the cataloguing work was carried out manually using Google Sheets, and all the data had to be checked and cleaned to generate a set of data suitable for uploading into the TECA system. This was the point where the RFHE asked SDE to step in and see if they could help.
It became clear early in the pilot that the logistics of the project were the main challenge.
- There were 40 libraries, all with books to be included in the catalogue.
- There were 35,000-40,000 books to be included over the length of the project.
- Books needed to be assessed after photography to see if they were suitable for inclusion in the project.
- The Hebrew title, publisher and author information needed to be copied from the full catalogue record for each book in the NLI catalogue.
- Some books would not be in the NLI catalogue, and these would need to be catalogued by NLI experts before their information could be copied.
- The transliterated information had to be either copied from the OCLC system or manually transliterated by experts working in different parts of Italy.
- The images captured had to be quality checked and cropped.
- The transliterated author information had to be looked up in the NLI system and then manually copied into the data.
- The photographers had to name the image files for each book manually.
- The data had to be 100% accurate and correctly formatted for it to be imported correctly into the TECA system.
With so many different steps involved, the work would have been impossible to complete in the time frame of the funded project without either more team members than the project could afford or a radical rethink of the methodology.
Why RFHE Chose Scan Data Experts
At the time SDE was approached by the foundation to assist with this project, they had already been supporting another project, Yerusha, for some time. The Yerusha project (subject of another SDE case study) was set up to collect and publish collection level records of archival holdings about the Jewish people from across Europe. SDE had set up the technical infrastructure and online delivery system for that project using Goobi Workflow and Goobi Viewer.
Prior to that project, SDE had been recommended to the foundation on the strength of the work it had carried out for the Wiener Holocaust Library.
How Scan Data Experts were able to help
When SDE was first engaged for the project, some significant time was spent understanding the pilot project and the logistics of the project. Geoff Laycock visited the National Central Library in Rome with the project team and gained real insight into the project requirements. It was always the intention to use Goobi Workflow as the production management software, so that as many steps as possible could be automated. This automation would dramatically reduce the time that each of the steps had taken in the pilot and therefore would make the project more feasible.
Once the steps were agreed and defined, SDE worked with intranda to install and configure the software, along with defining the development of some innovative new features.
The first challenge was to increase the efficiency of the initial inventory by the host libraries and thereby speed up the photography process. The concept we developed was that the libraries would create a simple spreadsheet of the books in their collections that would be included in the project. This spreadsheet assigned each book a unique identifier made up of the library identifier (the Italian SBN number) and a running number. Each library was managed as a dedicated project in Goobi. At the time the spreadsheet and instructions were sent to the library, Goobi generated PDF ‘Dockets’ to be printed and inserted into the front of each book on the shelf. These would be printed at the library on A4 paper and featured a prominent barcode of the book identifier. The library was then responsible for recording basic information about the book into the spreadsheet, and then placing the docket page inside the front cover. They then sent the spreadsheet to the project manager, Diletta Cesana. Diletta uploaded the spreadsheet into Goobi which created a Goobi process for each book to be photographed.
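The identifier scheme and spreadsheet import described above can be sketched roughly as follows. This is only an illustration: the separator, the four-digit padding, the column names `book_id` and `shelfmark`, and the placeholder library code `XX0000` are all assumptions, and the functions are hypothetical rather than part of Goobi.

```python
import csv
import io


def make_book_id(library_code: str, running_number: int) -> str:
    # Separator and four-digit padding are assumptions, not the project format.
    return f"{library_code}-{running_number:04d}"


def docket_ids(library_code: str, count: int) -> list[str]:
    """Identifiers to print as barcodes on the dockets sent to one library."""
    return [make_book_id(library_code, n) for n in range(1, count + 1)]


def processes_from_inventory(csv_text: str) -> list[dict]:
    """Create one process stub per completed row of the returned spreadsheet."""
    processes = []
    seen = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        book_id = row["book_id"].strip()
        shelfmark = row["shelfmark"].strip()
        if not shelfmark:
            continue  # unused docket: the library left this row blank
        if book_id in seen:
            raise ValueError(f"duplicate identifier {book_id}")
        seen.add(book_id)
        processes.append({"book_id": book_id, "shelfmark": shelfmark})
    return processes


# Hypothetical spreadsheet content returned by a library.
returned = "book_id,shelfmark\nXX0000-0001,A.1.12\nXX0000-0002,\n"
processes = processes_from_inventory(returned)
print(docket_ids("XX0000", 2))
print(processes)
```

Issuing the identifiers before the inventory is filled in is what lets the same value appear on the printed docket, in the spreadsheet row and in the process created on import.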
The photographers were then scheduled to visit the libraries. When the photographer arrived, the task was simpler because all they had to do was photograph the books with the docket pages inserted in them. The docket page was photographed first and then 5-7 pages of the book, before repeating the process with the next book. Importantly SDE specified that the books must be photographed on a pure black background which was achieved by supplying artificial black velveteen fabric to the photographers, along with their standard equipment of camera, laptop, rulers, and copy stand.
At the end of each day the photographers did not need to do any file naming; they simply had to upload all the images in bulk to Goobi online. Goobi then automatically recognised the barcodes and linked the images to the correct book processes already in Goobi.
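One way to picture the linking step is as a single pass over the upload in shooting order, where every docket photo starts a new group. The sketch below assumes the barcode has already been decoded by some other component (shown here as a plain string or `None`). It is not Goobi's implementation, and `group_by_docket` is a hypothetical name.

```python
from typing import Iterable, Optional


def group_by_docket(
    images: Iterable[tuple[str, Optional[str]]],
) -> dict[str, list[str]]:
    """Assign each image to the most recent docket barcode seen before it.

    Each item is (filename, barcode). The barcode is the decoded book
    identifier for a docket photo and None for an ordinary page photo.
    The docket photo itself is kept as the first file of its group.
    """
    groups: dict[str, list[str]] = {}
    current: Optional[str] = None
    for filename, barcode in images:
        if barcode is not None:
            current = barcode
        if current is None:
            raise ValueError(f"{filename} appears before any docket photo")
        groups.setdefault(current, []).append(filename)
    return groups


# Hypothetical upload: two books, identifiers are placeholders.
upload = [
    ("IMG_001.jpg", "XX0000-0001"),
    ("IMG_002.jpg", None),
    ("IMG_003.jpg", None),
    ("IMG_004.jpg", "XX0000-0002"),
    ("IMG_005.jpg", None),
]
groups = group_by_docket(upload)
print(groups)
```

Because the grouping depends only on order, the photographers' original filenames never need to carry any meaning.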
Goobi then automatically renamed the images to correspond to the correct project specification for the filenames. A team at the National Central Library in Rome carried out image quality assurance on all the books and passed any errors back to the photographers for correction in Goobi if needed. After QA, the barcode image for the books was removed by Goobi leaving only the images (in an uncropped form) that would eventually be featured on the TECA system.
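As a rough illustration of the renaming and docket removal, the sketch below builds a rename plan for one book. The filename pattern `<book_id>_<NNN><ext>` is an assumption, since the project's filename specification is not given here, and `rename_plan` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def rename_plan(book_id: str, files: list[str]) -> dict[str, str]:
    """Map original page filenames to sequential names, skipping the docket.

    Assumes files[0] is the docket photo and the remaining files are the
    book pages in shooting order.
    """
    plan = {}
    for index, name in enumerate(files[1:], start=1):
        suffix = PurePosixPath(name).suffix.lower()
        plan[name] = f"{book_id}_{index:03d}{suffix}"
    return plan


plan = rename_plan("XX0000-0001", ["IMG_001.JPG", "IMG_002.JPG", "IMG_003.JPG"])
print(plan)
```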
From these images, the expert project cataloguers were able to start their work from their home offices across Italy. Their first task was to find the book in the National Library of Israel's catalogue, based on the information they could read on the images of the title pages. Once they found the book, they simply copied the unique NLI book identifier into Goobi and finished the task. Importantly, at this stage they also weeded out books that did not fit the project criteria for inclusion, such as books that were too modern. If they could not find the book on the NLI system, or if they had other questions, Goobi routed these to a team of expert Hebrew cataloguers at the NLI working from Israel. If needed, the NLI team catalogued the book into their system from the images in Goobi, and the project cataloguers were then able to include the NLI identifier in the Goobi process.
After this, Goobi performed some automated steps. It used the NLI identifier to query the NLI Catalogue API and pull in the entire MARC record for the book, thereby completing the Hebrew project fields with 100% accuracy. It also took the Hebrew author and place of publication information from the imported NLI data and used this to automatically query the Virtual International Authority File (VIAF). VIAF is an online service that combines the authority files for persons, corporate bodies and places maintained by national libraries and other institutions around the world. The system was able to find the correct author and place of publication records and retrieve both the Latin name forms and the alternative name forms into the Goobi process metadata automatically.
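The enrichment step can be sketched as two small operations: pull the author and place out of the imported record, then pass each to an authority lookup. The sketch below uses a simplified stand-in for a MARC record and a stubbed lookup in place of any real NLI or VIAF call. The tags used (100 $a for the author, 260 or 264 $a for the place) are standard MARC 21 fields, but whether the project read exactly these fields is an assumption, and the stub data is illustrative only.

```python
from typing import Callable, Optional

# Simplified stand-in for a MARC record: a list of (tag, {subfield: value}).
Record = list[tuple[str, dict[str, str]]]


def first_subfield(record: Record, tag: str, code: str) -> Optional[str]:
    """Return the first matching subfield value, without ISBD punctuation."""
    for field_tag, subfields in record:
        if field_tag == tag and code in subfields:
            return subfields[code].strip(" ,:;/")
    return None


def enrich(record: Record, lookup: Callable[[str], Optional[dict]]) -> dict:
    """Extract author and place, then attach whatever the lookup returns."""
    author = first_subfield(record, "100", "a")
    place = first_subfield(record, "260", "a") or first_subfield(record, "264", "a")
    return {
        "author_hebrew": author,
        "place_hebrew": place,
        "author_authority": lookup(author) if author else None,
        "place_authority": lookup(place) if place else None,
    }


# Stub authority data standing in for a real service (illustrative values).
STUB_AUTHORITY = {
    "קארו, יוסף": {"latin": "Karo, Joseph", "alternatives": ["Caro, Joseph"]},
    "ונציה": {"latin": "Venice", "alternatives": ["Venezia"]},
}

sample = [
    ("100", {"a": "קארו, יוסף,"}),
    ("260", {"a": "ונציה :"}),
]
result = enrich(sample, STUB_AUTHORITY.get)
print(result["author_authority"], result["place_authority"])
```

Passing the lookup in as a function keeps the extraction logic testable without network access; a real integration would replace the stub with a call to the authority service.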
The next step was the transliteration of the Hebrew title information and the adding of the publisher information from a controlled vocabulary stored in the Goobi vocabulary manager. Here, the cataloguers would look up the book in the OCLC WorldCat system and copy the OCLC ID and the Latin title into Goobi. While this took more time than the Hebrew steps, Goobi was able to facilitate this process.
Given that the project was cataloguing printed books as opposed to unique items, the same book could feature many times in one library and across the different libraries. All the books still needed to be captured as they had differences in binding, provenance and marginalia, but they were all linked together virtually by the same NLI identifier.
A feature was developed by intranda to enable the cataloguers to check if the transliteration of each new book had already been done elsewhere on the system. When they opened a new book for transliteration, they clicked a button which pulled up other transliterations already in Goobi for other books with the same NLI identifier. For these books, all the cataloguers had to do was accept one of the other transliterations and import it into the metadata of the book that they were working on. As more and more books were transliterated, the number that needed to go through the manual process reduced dramatically, thereby speeding up this part of the process.
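The idea behind this feature is essentially an index keyed by NLI identifier. The following is a minimal sketch of that idea, not the intranda implementation; the class name, the identifier `NLI-0001` and the sample title are hypothetical.

```python
from collections import defaultdict


class TransliterationIndex:
    """Remember finished transliterations, keyed by NLI identifier."""

    def __init__(self) -> None:
        self._by_nli_id: dict[str, list[str]] = defaultdict(list)

    def record(self, nli_id: str, latin_title: str) -> None:
        """Store a finished transliteration, ignoring exact duplicates."""
        titles = self._by_nli_id[nli_id]
        if latin_title not in titles:
            titles.append(latin_title)

    def suggestions(self, nli_id: str) -> list[str]:
        """Return earlier transliterations for the same NLI identifier."""
        return list(self._by_nli_id.get(nli_id, []))


index = TransliterationIndex()
index.record("NLI-0001", "Shulhan arukh")
index.record("NLI-0001", "Shulhan arukh")  # second copy of the same edition
print(index.suggestions("NLI-0001"))
print(index.suggestions("NLI-0002"))
```

An empty suggestion list means the book still needs manual transliteration; anything else can be offered to the cataloguer to accept.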
After transliteration, the book images all went through the Goobi Layout Wizard engine. Layout Wizard is an automated cropping tool that calculates where the crop should be made on each image by identifying the edges of the book against the high contrast black background. After this analysis, the National Central Library team in Rome were able to quickly assess the crop positions and accept them or make minor adjustments in the Goobi system. This was several times faster than manual cropping would have been, while still retaining control over the crop positions and thereby maintaining quality.
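To show why the pure black background matters, here is a deliberately simple sketch of threshold-based crop detection on a grayscale pixel grid. It is not the Layout Wizard algorithm, which is not described here; the threshold value and the function name `crop_box` are assumptions.

```python
from typing import Optional


def crop_box(
    pixels: list[list[int]], threshold: int = 40, margin: int = 0
) -> Optional[tuple[int, int, int, int]]:
    """Bounding box (left, top, right, bottom), inclusive, of bright pixels.

    pixels is a grid of grayscale values (0 = black). Anything brighter
    than threshold is treated as part of the book. Returns None when the
    image contains only background.
    """
    height, width = len(pixels), len(pixels[0])
    rows = [y for y, row in enumerate(pixels) if any(v > threshold for v in row)]
    if not rows:
        return None
    cols = [
        x for x in range(width) if any(row[x] > threshold for row in pixels)
    ]
    return (
        max(cols[0] - margin, 0),
        max(rows[0] - margin, 0),
        min(cols[-1] + margin, width - 1),
        min(rows[-1] + margin, height - 1),
    )


# Tiny synthetic image: a bright 2x2 "book" on a black background.
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 200, 210, 0, 0],
    [0, 0, 190, 220, 5, 0],
    [0, 0, 0, 0, 0, 0],
]
print(crop_box(image))
print(crop_box(image, margin=1))
```

A proposed box like this is only a starting point, which matches the workflow above: a person reviews it and accepts or adjusts it before the crop is applied.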
Goobi then cropped the images, and after a final metadata quality check by the cataloguers each book was complete. At the end of each library project, Goobi then exported the metadata and images to a zip file in exactly the correct format for uploading to the TECA system. This was downloaded by the National Central Library team and uploaded into TECA automatically.
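The export step amounts to packaging each book's metadata and images in a predictable layout. The sketch below builds such a zip in memory; the folder layout and the use of a JSON metadata file are assumptions for illustration, since the format TECA actually requires is not specified here.

```python
import io
import json
import zipfile


def build_export(books: list[dict]) -> bytes:
    """Pack metadata and images for each book into one zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for book in books:
            folder = book["book_id"]
            archive.writestr(
                f"{folder}/metadata.json",
                json.dumps(book["metadata"], ensure_ascii=False, indent=2),
            )
            for name, data in book["images"].items():
                archive.writestr(f"{folder}/images/{name}", data)
    return buffer.getvalue()


# Hypothetical book with placeholder image bytes.
payload = build_export(
    [
        {
            "book_id": "XX0000-0001",
            "metadata": {"title_latin": "Shulhan arukh", "nli_id": "NLI-0001"},
            "images": {"XX0000-0001_001.jpg": b"placeholder image bytes"},
        }
    ]
)
names = zipfile.ZipFile(io.BytesIO(payload)).namelist()
print(names)
```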
Throughout the process, SDE was responsible for system design, problem solving, managing the intranda development process, supporting the project manager, reporting, training all the different teams, documenting the processes, creating training manuals and providing first line technical support.
The Results
Working with SDE and Goobi to deliver this highly complex project significantly reduced the effort needed by the team to create consistent and high-quality content to be featured in the union catalogue. The real credit for the project goes to the dedicated team members and experts that worked on the books in the libraries, in countless locations in Italy and in Israel. The online nature of Goobi and its ability, with intranda’s expert skills, to automate much of the process was the key to making best use of the expertise of the project team.
At the time of writing, over 11,000 books feature on the TECA system, and work is continuing apace on this valuable research resource for Hebrew printed books in Italian libraries.
http://digitale.bnc.roma.sbn.it/tecadigitale/progettoVolumiEbraici
Want to find out more?
If you would like to hear more about the project and how SDE was able to help, then please get in touch with us here.
We have a variety of case studies available to show how we have designed and implemented a large range of different heritage digitisation and digital strategy projects for our customers. Please feel free to browse through them here.
If you are facing similar challenges or are considering projects like the one described here, then get in touch. We would be happy to talk through your plans on an informal basis to see if we can point you in the right direction. Having an initial consultation is free and comes with no obligation. We look forward to hearing from you.