150 years ago a team of independent reporters began writing down everything politicians said in debates at Parliament. They stood for truth and accountability, and their work continues to this day. The team and their written record are known as Hansard.
Over the month of July the Office of the Clerk acknowledged and celebrated the sesquicentenary of the official record of Parliamentary Debates, which included the digitisation of every word ever spoken in the House. With the election coming up, we at NZMS thought we’d take some time to share a bit of history about Hansard and some of the digitisation specifics of the project, because we are interested in, and have been involved in, the (mass-) conversion of New Zealand’s documented history into discoverable, searchable, repurposable formats.
Key Facts about Hansard
- The records are known as Hansard, after Thomas Curson Hansard, the first official printer in the UK Parliament.
- New Zealand’s parliamentary records date from 9 July 1867, when independent reporters began recording the debates. Records prior to this were published in newspapers, though these were subject to reporter bias.
- MPs cannot improve on what they said in the House unless the editor of Hansard approves their changes
- Parliamentary speeches in te reo Māori have been recorded since 1997, together with their English translation
- 723 volumes of Hansard, dating from 1869-1985 and totalling around 750,000 pages, have been digitised
- 17 reporters and editors record today’s House debates, which are usually available online within 2 hours
- Circa 500,000,000 spoken words have been recorded in the debating chamber
- Hansard records from 1854-2003 are available as PDFs; those from 2003 onwards are available on https://drive.google.com/drive/folders/0B1Iwfzv-Mt3CRGZkMWNfeXoybmc
About the Digitisation Process
Our Managing Director, Andy Fenton, recently participated in a (NZ Libraries) listserv discussion on the celebration of the sesquicentenary of Hansard. He discussed the specifics of the digitisation of Hansard and raised some observations, many compliments, and some questions about the newly available online version. Peter Riches from the Office of the Clerk kindly responded with the following information.
- To make the records searchable, Optical Character Recognition (OCR) was performed on all historical Hansard volumes up to volume 482; volumes 483 onwards were already in a digital format. The OCR was undertaken using state-of-the-art technology provided by Google and achieved relatively high accuracy. As a result, the OCR transcripts were not manually corrected or cleaned up, and some te reo Māori words with macrons have not been accurately rendered.
- The entire collection can be searched either by reading volumes up to 482 online at the HathiTrust site here: https://babel.hathitrust.org/cgi/mb?a=listis&c=71329709 or by downloading volumes 483 to 605 from https://drive.google.com/drive/folders/0B1Iwfzv-Mt3CRGZkMWNfeXoybmc and performing a full-text search in Adobe Reader. The OCR is not perfect, however: even a character accuracy of 96% can leave roughly one word in five wrong (about 80% accuracy by word), since every character in a word has to be recognised correctly. Because the current search method only returns a ‘hit’ on an exactly matched word, it is fortunate that multiple sets (five, in fact) were digitised: a search conducted across all of them would raise the effective accuracy by word again. These different digital versions can be found here: https://en.wikipedia.org/wiki/Parliamentary_Debates_(Hansard).
- Quirky search mechanisms. For volumes up to 482 the HathiTrust search functionality appears to be whole-word only at this time (at least for non-HathiTrust-members). So a search for ‘Launcelot’ indicates that there are 23 volumes that contain at least one occurrence of that word, whereas no volumes contain ‘Launcel’. For volumes 483 to 605 you can choose whether you want your search to be whole-word and/or case sensitive from within the PDF reader software you are using.
- The transcripts of the debates are ‘near verbatim’: not every word is recorded, as word repetitions, fillers and other disfluencies are removed to provide more readable content.
- A Māori language translation of Hansard, called Nga Korero Paremete, was produced from 1881-1906 (our Parliamentary Library holds copies of these). Nga Korero Paremete contained Māori and Pākehā members’ speeches on legislation considered particularly relevant to Māori. Since 1997 speeches delivered in te reo Māori, together with their English translation, have been included in Hansard.
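The accuracy figures in the search notes above can be checked with a little arithmetic. The sketch below is ours, not part of the Office of the Clerk's response, and it rests on assumptions: that character errors are independent of one another, that an average word is five characters long, and that the five digitised sets have independent OCR errors. The function names are purely illustrative.

```python
def word_accuracy(char_accuracy: float, avg_word_length: int = 5) -> float:
    """Chance that every character of a word is recognised correctly.

    Assumes independent character errors and a fixed average word
    length (5 characters is an assumption, not a measured figure).
    """
    return char_accuracy ** avg_word_length


def combined_word_accuracy(word_acc: float, copies: int) -> float:
    """Chance that at least one of several OCR'd copies has the word right.

    Assumes the copies' errors are independent, which is optimistic:
    faint or damaged print tends to fail in every copy.
    """
    return 1 - (1 - word_acc) ** copies


per_word = word_accuracy(0.96)
across_five = combined_word_accuracy(per_word, copies=5)

print(f"Word accuracy at 96% character accuracy: {per_word:.1%}")
print(f"Effective word accuracy across five sets: {across_five:.2%}")
```

Under these assumptions a single set lands close to the 80% by word quoted above, and searching all five sets would miss a given word only rarely. Real-world results will be less rosy, because OCR errors cluster on the same difficult passages.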
If anyone has additional queries you are welcome to contact the Parliamentary Information Service firstname.lastname@example.org.
The Hansard records are not only important for historical researchers, lawyers, courts and political science students; they also assure the New Zealand public that the MPs they voted for speak on the issues we, the public, voted for. Their content is a seminal record of New Zealand’s history. We think it’s intriguing that this material was digitised in the US over a decade ago yet was unavailable online until very recently (we are grateful!). The fact that it had been digitised overseas (albeit inaccessibly) essentially put on hold the decision to digitise it in New Zealand (if we could have afforded to do so). Andy Fenton has talked about the value proposition behind the digitising and OCR methodologies, and we feel this merits further discussion for other similar cultural heritage material.
The Hansard records have been published courtesy of The University of California and Google and shared by The Hathi Trust. We at NZMS offer our thanks to the staff of the Office of the Clerk of the House of Representatives, the Parliamentary Library, The National Library and University of Canterbury Library whose collective efforts resulted in this material being made available to the public.
We at NZMS also believe The Hathi Trust deserves a massive shoutout: they now have ~6M volumes (of 15.75M volumes digitised) available in the public domain … and that is a wonderful contribution to humankind.
To read more about the 150th anniversary of Hansard visit:
The importance of NZ Newspapers:
“More difficult was finding an accurate account of the debates between 1854 and 1867, before Hansard was created. During this period the only record was provided by the press, and reporting was politicised and patchy. “If the newspaper didn’t like the members, they just didn’t report them,” Riches said. Nevertheless, five volumes of Hansard have been patched together from newspaper reports over these 13 years.”
About NZ newspapers (a personal plea to save them):
The New Zealand Herald Article:
There’s a fantastic selection of Highlights uttered over the decades provided: