DIALLED: Distributed Index of All Library Location and Event Data
This project maintains a linked open dataset of the libraries, archives, and
museums in Canada (all of those we are aware of), along with their locations,
geographic coordinates, hours, and other institutional metadata. We
periodically harvest the linked data embedded in the canonical web page for
each site, then aggregate and redistribute that data for easy public
consumption and reuse.
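As a rough illustration of the harvest-and-aggregate step, the following Python sketch pulls schema.org data out of `<script type="application/ld+json">` blocks using only the standard library. It assumes the pages have already been fetched into a dictionary keyed by canonical URL, and the names `JsonLdExtractor` and `harvest` are hypothetical. JSON-LD is only one way to embed linked data (RDFa and microdata are also common), and this is not the actual DIALLED crawler, which uses RDFLib.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect parsed JSON from <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                # Skip a malformed block instead of failing the whole site.
                pass


def harvest(pages):
    """Map each canonical URL to its JSON-LD entities.

    Hypothetical input: a dict of canonical URL -> already-fetched HTML.
    """
    dataset = {}
    for url, html in pages.items():
        extractor = JsonLdExtractor()
        extractor.feed(html)
        extractor.close()
        dataset[url] = extractor.blocks
    return dataset
```

Keeping the results keyed by canonical URL makes it straightforward to attribute each entity to the site it came from when the aggregated data is redistributed.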
Last harvest
Data was last harvested and analyzed on 2016-09-11.
State of the Canadian library, archive, and museum linked open data union
- Number of sites crawled (attempted): 4,801
- Number of sites with schema.org Library entities: 5
- Number of sites with OpenGraph Protocol title attributes: 525
- Number of sites with schema.org name attributes: 215
- Number of sites with Twitter card attributes: 10
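Counts like these require deciding, page by page, which kinds of metadata are present. Below is a minimal sketch of one way to tally them with Python's `html.parser`. The detection rules are assumptions for illustration, not necessarily the rules behind the figures above: an Open Graph title is a `<meta property="og:title">`, a Twitter card is a `<meta name="twitter:card">`, a schema.org name is any element with `itemprop="name"` (microdata) or `property="name"` (RDFa), and a Library entity is an `itemtype` of `http(s)://schema.org/Library` or a `typeof` containing `Library`. Each page counts at most once per category; `MetadataSurvey` and `tally` are hypothetical names.

```python
from collections import Counter
from html.parser import HTMLParser

# Assumed detection rules; not necessarily those used for the published counts.
LIBRARY_TYPES = {"http://schema.org/Library", "https://schema.org/Library"}


class MetadataSurvey(HTMLParser):
    """Record which kinds of embedded metadata appear anywhere on one page."""

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        # Treat valueless attributes (e.g. itemscope) as empty strings.
        a = {name: value or "" for name, value in attrs}
        if tag == "meta" and a.get("property") == "og:title":
            self.found.add("og_title")
        if tag == "meta" and a.get("name") == "twitter:card":
            self.found.add("twitter_card")
        if a.get("itemprop") == "name" or a.get("property") == "name":
            self.found.add("schema_name")
        microdata_library = LIBRARY_TYPES & set(a.get("itemtype", "").split())
        rdfa_library = "Library" in a.get("typeof", "").split()
        if microdata_library or rdfa_library:
            self.found.add("schema_library")


def tally(pages):
    """Count how many pages contain each kind of metadata (once per page)."""
    counts = Counter()
    for html in pages:
        survey = MetadataSurvey()
        survey.feed(html)
        survey.close()
        counts.update(survey.found)
    return counts
```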
Data dumps
The current dataset is available at /datadumps/dialled.ttl.zip.
Previous datasets are archived in /datadumps/archives.
Help build DIALLED
- Fast and easy: Ensure each location for your institution is listed with the correct URL. We can't
include your institution if we don't know about it!
- More involved: Create or maintain lists of institutions for your
province or sector. For example, Manitoba lists all of its public libraries,
but only in a PDF file that is not machine-readable. Creating an HTML version
of that list would let us ensure that all of those libraries are properly
represented in DIALLED.
- More effort, but really valuable: Add linked data to your institution's home page!
My article "White Hat Search Engine Optimization (SEO): Structured Web Data
for Libraries", published in Partnership: The Canadian Journal of Library and
Information Practice and Research, gently walks you through the process of
augmenting an existing library web page with rich schema.org linked data. It
also discusses the importance of robots.txt and sitemap files for ensuring the
visibility of your library website.
- Enhance the dialled.ca aesthetic. You might have noticed it's stark. If you
have design skills, I welcome them!
- Write HTML scrapers. Some sets of institutions have easily recognized patterns
for how they represent their contact information, hours, etc. We don't have to
be purists about how we generate our linked open dataset; if the source websites
can't or won't add structured data, then we can figure it out ourselves!
- Contribute to the DIALLED crawler source code.
We use Python 3 with the RDFLib library for the bulk of our work, and our code is
licensed under the GPLv3.
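To give a flavour of the home-page markup described in the item on adding linked data, here is a small Python sketch that emits a schema.org `Library` description as a JSON-LD `<script>` block that could be pasted into a page's `<head>`. The function name, its parameters, and the sample values are hypothetical, and JSON-LD is used only for brevity; the article itself may use a different syntax such as RDFa. `PostalAddress`, `GeoCoordinates`, and the `openingHours` text format (for example `Mo-Fr 09:00-17:00`) are standard schema.org vocabulary.

```python
import json


def library_jsonld(name, url, street, city, region, postal_code, lat, lon, hours):
    """Return a JSON-LD <script> block for one library location (hypothetical helper)."""
    data = {
        "@context": "https://schema.org",
        "@type": "Library",
        "name": name,
        "url": url,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": city,
            "addressRegion": region,
            "postalCode": postal_code,
            "addressCountry": "CA",
        },
        "geo": {"@type": "GeoCoordinates", "latitude": lat, "longitude": lon},
        # schema.org openingHours text format, e.g. "Mo-Fr 09:00-20:00".
        "openingHours": hours,
    }
    # Escape "</" so no value can close the <script> element early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    # Hypothetical sample values for illustration only.
    print(library_jsonld(
        "Example Public Library", "https://library.example.org/",
        "123 Example Street", "Sudbury", "ON", "P3E 0A1",
        46.49, -80.99, ["Mo-Fr 09:00-20:00", "Sa 10:00-17:00"],
    ))
```

One block per branch page keeps each location's address, coordinates, and hours attached to the canonical URL a harvester would visit.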
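The same item stresses robots.txt and sitemap files. A well-behaved crawler skips pages that robots.txt disallows, so it is worth checking that your own file does not hide the pages you want found. Python's standard `urllib.robotparser` module can check this offline when given the file's text. In the sketch below, the function name `robots_summary`, the user-agent string, and the sample rules are hypothetical; `site_maps()` requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser


def robots_summary(robots_txt, page_url, user_agent="ExampleCrawler"):
    """Return (may_fetch, sitemap_urls) for page_url under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    may_fetch = parser.can_fetch(user_agent, page_url)
    # site_maps() returns None when there are no Sitemap lines.
    sitemaps = parser.site_maps() or []
    return may_fetch, sitemaps


# Hypothetical robots.txt for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /staff/

Sitemap: https://library.example.org/sitemap.xml
"""
```

With these sample rules, public pages such as `/hours` remain fetchable while `/staff/` is excluded, and the advertised sitemap tells crawlers where to find every branch page.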
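For the item on writing HTML scrapers, here is a minimal sketch of a pattern-based scraper. It assumes a hypothetical group of sites that publish hours as `<dl class="hours"><dt>Monday</dt><dd>09:00-17:00</dd>...</dl>`; that markup, the class name, and `HoursScraper` are illustrative only. The sketch converts each day into the schema.org `openingHours` text format (`Mo 09:00-17:00`) so the result could sit alongside harvested structured data.

```python
from html.parser import HTMLParser

# schema.org openingHours uses two-letter day codes.
DAY_CODES = {
    "monday": "Mo", "tuesday": "Tu", "wednesday": "We", "thursday": "Th",
    "friday": "Fr", "saturday": "Sa", "sunday": "Su",
}


class HoursScraper(HTMLParser):
    """Scrape hours from hypothetical <dl class="hours"><dt>Day</dt><dd>Times</dd></dl> markup."""

    def __init__(self):
        super().__init__()
        self._in_hours = False
        self._field = None  # "dt" or "dd" while inside one of those elements
        self._day = None    # day code from the most recent <dt>
        self.opening_hours = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "dl" and "hours" in classes:
            self._in_hours = True
        elif self._in_hours and tag in ("dt", "dd"):
            self._field = tag

    def handle_endtag(self, tag):
        if tag == "dl":
            self._in_hours = False
        elif tag in ("dt", "dd"):
            self._field = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._field is None:
            return
        if self._field == "dt":
            # Unrecognized day names yield None, so their <dd> is skipped.
            self._day = DAY_CODES.get(text.lower())
        elif self._field == "dd" and self._day:
            self.opening_hours.append(f"{self._day} {text}")
            self._day = None
```

A real scraper would likely need one such class per recognized pattern, and should return nothing for pages that do not match rather than guess.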