The documents in the residential school database were obtained by Ed Sadowski, a retired researcher with the Shingwauk Residential Schools Centre in Sault Ste. Marie, Ont.
They are the result of more than a decade of requests to the National Centre for Truth and Reconciliation (NCTR) through Manitoba’s freedom of information framework, and to Crown-Indigenous Relations and Northern Affairs Canada (CIRNAC) and the Department of Justice under federal access to information legislation.
The federal government agreed to release the documents in its care only after multiple complaints to the Office of the Information Commissioner of Canada. Sadowski’s complaints led the commissioner to order the Department of Justice in 2021 and CIRNAC in 2024 to comply with the law, conduct reasonable searches and release the records Sadowski had requested.
Duplication
In most cases, our database contains two versions of each residential school narrative: one obtained from CIRNAC and the other from NCTR. There are important differences between these two sets of documents. In general, the CIRNAC narratives are longer, containing both the narrative itself and supplementary primary source material. However, they also contain numerous redactions. The NCTR narratives are often shorter but contain fewer redactions. Both versions are needed to recover as much information about each school as possible.
Data Cleaning
PDF documents often include selectable text, which makes it possible to copy, paste and search for text within a document. The documents Sadowski received from CIRNAC did not include selectable text. To make our database searchable and accessible, we used the AWS Textract Optical Character Recognition (OCR) service to scan all files obtained from CIRNAC and generate a text transcript of each.
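As an illustration only, the transcript step might look something like the Python sketch below. It is not the code used to build the database. The function names `transcript_from_textract` and `ocr_single_page` are hypothetical, and the sketch assumes the synchronous Textract call, which handles a single page image at a time; multi-page PDFs would need Textract's asynchronous workflow, which is not shown.

```python
def transcript_from_textract(response):
    """Join the LINE blocks of a Textract response into a plain-text transcript.

    Textract returns a list of "Blocks"; those with BlockType "LINE"
    each carry one line of recognized text.
    """
    lines = [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]
    return "\n".join(lines)


def ocr_single_page(image_bytes):
    """Send one scanned page (PNG or JPEG bytes) to Textract and return its text.

    Hypothetical helper: requires the boto3 package and configured AWS
    credentials, so it is imported here rather than at module level.
    """
    import boto3

    client = boto3.client("textract")
    response = client.detect_document_text(Document={"Bytes": image_bytes})
    return transcript_from_textract(response)
```

Keeping the parsing separate from the network call means the transcript logic can be checked against a saved response without contacting AWS.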
Limitations
IAP residential school narrative documents contain a range of primary source material, including handwritten and typewritten letters, maps, newspaper clippings and copies of reports. AWS Textract uses machine learning to identify text in low-fidelity scans, but some of these documents are of such poor quality that not all of the text they contain could be accurately identified.
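One way a reader working with similar OCR output might gauge this limitation is to look at the confidence score, from 0 to 100, that Textract attaches to each block of recognized text. The sketch below is an assumption about how such a check could be done, not a description of our process. The function name `low_confidence_lines` and the threshold of 80 are hypothetical choices.

```python
def low_confidence_lines(response, threshold=80.0):
    """Return (text, confidence) pairs for LINE blocks scoring below threshold.

    The threshold is an arbitrary illustrative value, not a recommended one.
    A block with no Confidence field is treated as a score of 0 and flagged.
    """
    flagged = []
    for block in response.get("Blocks", []):
        if block.get("BlockType") != "LINE":
            continue
        confidence = block.get("Confidence", 0.0)
        if confidence < threshold:
            flagged.append((block.get("Text", ""), confidence))
    return flagged
```

Lines flagged this way, such as faded handwriting, are candidates for manual review against the original scan.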