"data about data" -- like that helps!
Submitted by erikhatcher on Mon, 2008-01-14 15:45
Submitted by erikhatcher on Mon, 2008-01-14 13:37
I'm honored to be speaking at the upcoming OLA Super Conference 2008. I'm presenting "Collex: Collecting and exhibiting scholarly materials" (session #406, Thursday, January 31, 10:40am) and "Blacklight: the University of Virginia's Catalog on Solr" (session #1203, Friday, February 1, 2:10pm). I'm looking forward to demonstrating these projects and sharing their latest news.
Jon Phipps - NSDL Metadata Registry, Cornell University Libraries
An introduction to the Metadata Registry, an open source vocabulary, metadata schema, and DC application profile manager and registry. The Registry bridges the XML and RDF worlds: it provides its output in XML Schema and SKOS/OWL, and it offers managed namespace services, URI design, permanent URLs with content negotiation, support for multi-user ontology design, and change history and version management tools.
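"Permanent URLs with content negotiation" means one URI can serve HTML to a browser and RDF to a harvester, chosen from the HTTP `Accept` header. The sketch below shows that selection step under stated assumptions: the offered media types and the function names are hypothetical, this is not the Registry's code, and it deliberately simplifies HTTP's full precedence rules (it ranks only by `q` value and header order).

```python
# Hypothetical representations a vocabulary term URI might offer.
OFFERED = {
    "text/html": "html",
    "application/rdf+xml": "rdf",
    "application/xml": "xsd",
}

def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, highest q first."""
    prefs = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media = pieces[0].lower()
        if not media:
            continue
        q = 1.0
        for param in pieces[1:]:
            key, _, value = param.partition("=")
            if key.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media, q))
    # sort() is stable, so equal q values keep their header order
    prefs.sort(key=lambda pair: pair[1], reverse=True)
    return prefs

def negotiate(accept_header, default="text/html"):
    """Pick the offered media type that best matches the Accept header."""
    if not accept_header:
        return default
    for media, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media in OFFERED:
            return media
        if media == "*/*":
            return default
        if media.endswith("/*"):
            major = media[:-1]  # e.g. "application/"
            for offered in OFFERED:
                if offered.startswith(major):
                    return offered
    return None  # the caller would answer 406 Not Acceptable
```

For example, `negotiate("text/html;q=0.5, application/rdf+xml")` prefers the RDF representation, while a plain browser request falls back to HTML.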
Submitted by edsu on Wed, 2007-06-13 10:08
CrossRef is now accepting proposals for the most original and promising use of CrossRef's article-level metadata as part of a scholarly research/authoring tool or related library service. Free access to participating publishers' metadata and DOIs will be granted to the researcher or librarian whose proposal is selected.
Deadline for proposals: July 15, 2007
What to submit:
- A short but thorough description of the service you would like to create using the CrossRef metadata, including estimated implementation time
- Links to software/services/protocols/standards that you may have implemented in the past
- Your CV
Contact: Amy Brand, PhD, Director of Business & Product Development
email: abrand at crossref dot org
More details are available at CrossRef.
Background on CrossRef Metadata and Web Services:
CrossRef is an independent membership association, founded and directed by publishers. CrossRef helps connect users to primary research content by enabling publishers to work collectively. Our citation-linking network today covers over 27 million articles and other content items from several hundred scholarly and professional publishers.
CrossRef Web Services, or CWS, offers an easy-to-use suite of tools for authorized partners to collect metadata on a cross-publisher basis to streamline their own crawling, indexing, and linking services. One way to access the 27 million+ metadata records now registered in CrossRef is via our OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) interface. The CrossRef OAI-PMH repository interface serves as the central point for the distribution of metadata from participating publishers, using a robust and widely adopted technology targeted at consumers of large quantities of metadata. Access to CrossRef’s metadata repository is controlled by IP authentication and can be tailored to provide specific content from select publishers to each authorized recipient.
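OAI-PMH harvesting is a simple paging loop: request `ListRecords` with a `metadataPrefix`, then keep re-requesting with only the `resumptionToken` from the previous page until the token comes back empty. A minimal sketch of the two pieces of that loop, assuming a hypothetical endpoint and the `oai_dc` format (which every OAI-PMH 2.0 repository must support; CrossRef's own prefixes may differ). IP authentication and the HTTP fetch itself are left out, and the sample page is invented for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def list_records_url(base_url, metadata_prefix=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    The first request names a metadataPrefix; follow-up requests carry only
    the verb and the resumptionToken returned by the previous page.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)

def parse_page(xml_text):
    """Return (record identifiers, next resumptionToken or None) for one page."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.findall(
        "oai:ListRecords/oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find("oai:ListRecords/oai:resumptionToken", OAI_NS)
    token = (token_el.text or "").strip() if token_el is not None else ""
    # An empty (or absent) resumptionToken marks the last page.
    return ids, token or None

# Invented response page, for illustration only.
SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example:1</identifier></header></record>
    <record><header><identifier>oai:example:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""
```

A harvester would fetch `list_records_url("https://oai.example.org/oai", "oai_dc")`, pass the body to `parse_page`, and loop with `resumption_token=` until the token is `None`.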
The metadata in CrossRef’s database consists of basic bibliographic information for each item, including author(s), title, journal name, ISSN, volume, issue, page, and DOI-URL pair. Different but similarly basic metadata is collected for other types of content. In addition, many publishers now also deposit references as part of their metadata. Here are links to some sample CWS metadata: http://www.crossref.org/oai-handler_sample.xml, http://www.crossref.org/oai-pmh_sample2.xml.
CrossRef's data repository is extensive but highly variable in quality from publisher to publisher, and CrossRef has recently begun to take measures to ensure higher-quality metadata. Any suggestions you may have regarding how best to achieve a higher level of data quality and consistency, as part of your proposal or otherwise, would be greatly appreciated.
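One inexpensive kind of quality check is to test each record for the basic fields listed above and to verify identifiers that carry their own checksum. ISSNs do: the eighth character is a mod-11 check digit over the first seven digits (weights 8 down to 2, with 10 written as `X`). A minimal sketch, assuming a flat record dict; the field names are hypothetical, not CrossRef's schema, and the DOI test only checks the `10.` prefix.

```python
import re

# Hypothetical field names for the basic bibliographic data listed above.
REQUIRED = ("authors", "title", "journal", "issn", "volume", "issue", "page", "doi", "url")

def issn_is_valid(issn):
    """Check the format and the mod-11 check digit of an ISSN such as 0378-5955."""
    m = re.fullmatch(r"(\d{4})-?(\d{3})([\dXx])", issn.strip())
    if not m:
        return False
    digits = m.group(1) + m.group(2)
    # Weights 8, 7, ..., 2 applied to the first seven digits.
    total = sum(int(d) * w for d, w in zip(digits, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return m.group(3).upper() == expected

def check_record(record):
    """Return a list of human-readable problems found in one record."""
    problems = [f"missing {field}" for field in REQUIRED if not record.get(field)]
    issn = record.get("issn")
    if issn and not issn_is_valid(issn):
        problems.append(f"bad ISSN check digit: {issn}")
    doi = record.get("doi")
    if doi and not doi.startswith("10."):
        problems.append(f"DOI does not start with '10.': {doi}")
    return problems
```

Running `check_record` over a publisher's whole deposit and counting problems per field would give a rough per-publisher quality score.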
Submitted by erikhatcher on Wed, 2007-01-17 07:56
(my first code4lib post)
Ed Summers and William Groppe have jumped in headfirst to craft a Ruby DSL for Solr. solrb is coming along nicely thanks to our collaboration.
Another early adopter asked a question on the solr-user e-mail list, and I replied with a lot of juicy tidbits to whet your appetite.
We're moving as fast as we can to use this infrastructure as the basis for Solr Flare, which is destined to be a set of faceted browsing plugins for Rails that let your Rails application easily benefit from what Solr and solrb offer. Solr Flare will debut at the code4lib pre-conference event in whatever form it happens to be in by then.
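Underneath faceted browsing, the client asks Solr's `select` handler for facet counts (`facet=true` plus one `facet.field` per field) and reads them back from `facet_counts.facet_fields`, which the JSON writer returns by default as a flat `[term, count, term, count, ...]` list. The sketch below shows that round trip in Python rather than Ruby, so it is not solrb's API; the Solr base URL, field names, and sample response are hypothetical.

```python
import json
from urllib.parse import urlencode

def facet_query_url(solr_base, query, facet_fields, rows=10):
    """Build a Solr select URL that also requests facet counts."""
    params = [("q", query), ("rows", rows), ("wt", "json"), ("facet", "true")]
    params += [("facet.field", field) for field in facet_fields]
    return solr_base.rstrip("/") + "/select?" + urlencode(params)

def facet_counts(response_json, field):
    """Turn Solr's flat [term, count, term, count, ...] list into pairs."""
    flat = json.loads(response_json)["facet_counts"]["facet_fields"][field]
    return list(zip(flat[0::2], flat[1::2]))

# Hypothetical response fragment, for illustration only.
SAMPLE = '{"facet_counts": {"facet_fields": {"format": ["Book", 12, "Journal", 3]}}}'
```

A browsing UI would render each `(term, count)` pair as a link that adds a filter on that term to the next query.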
Submitted by edsu on Mon, 2006-11-27 21:27
In light of the change in the README file at MIT:
What happened to the data?
We are currently evaluating legal issues about ownership and licensing possibilities
and hope to be able to be back online with it soon. Wish us luck.
The torrent has been temporarily (hopefully) disabled. Thanks to MIT for continuing to investigate how to make large bibliographic data sets available to the general public.
Submitted by edsu on Tue, 2006-11-21 16:57
The Simile folks made MARC, MODS and RDF versions of Barton (the MIT Library Catalog) available as a test data set.
To show code4lib's support for Simile and open-library-data we've set up some torrents for the data.
Please help us seed this data far and wide.
Attached is the first version of the unAPI spec. Excerpting:
"unAPI is a simple website API convention. There are many wonderful APIs and protocols for syndicating, searching, and harvesting content from diverse services on the web. They're all great, and they're all already widely used, but they're all different. We want one API for the most basic operations necessary to perform simple clipboard-copy functions across all sites. We also want this API to be able to be easily layered on top of other well-known APIs."
"The objective of unAPI is to enable web sites with HTML interfaces to information-rich objects to simultaneously publish richly structured metadata for those objects, or those objects themselves, in a predictable and consistent way for machine processing."
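To make the clipboard-copy idea concrete, here is a minimal sketch of an unAPI server's dispatch logic. It assumes the conventions of the later published unAPI 1.0 spec, which revision 0 attached here may not match in every detail: a page points to its server with `<link rel="unapi-server" ...>`, marks each object with `<abbr class="unapi-id" title="...">`, and the server takes `id` and `format` query parameters (no parameters lists all formats; `id` alone answers 300 Multiple Choices with that object's formats; `id` plus `format` returns the object, with 404 for an unknown id and 406 for an unavailable format). The catalog, the format list, and the 400 answer for a format without an id are hypothetical choices of this sketch.

```python
from xml.sax.saxutils import quoteattr

# Hypothetical catalog: object id -> {format name: serialized object}
CATALOG = {
    "record-1": {"oai_dc": "<dc>...</dc>", "mods": "<mods>...</mods>"},
}
# Hypothetical format registry: format name -> MIME type
FORMATS = {"oai_dc": "application/xml", "mods": "application/xml"}

def formats_xml(names, obj_id=None):
    """Serialize a <formats> list, tagged with the object id when given."""
    id_attr = f" id={quoteattr(obj_id)}" if obj_id is not None else ""
    items = "".join(
        f"<format name={quoteattr(n)} type={quoteattr(FORMATS[n])}/>" for n in names)
    return f"<formats{id_attr}>{items}</formats>"

def unapi(obj_id=None, fmt=None):
    """Return (HTTP status, body) for an unAPI-style request."""
    if obj_id is None:
        if fmt is not None:
            return 400, "format requires an id"  # this sketch's own choice
        return 200, formats_xml(FORMATS)
    if obj_id not in CATALOG:
        return 404, "unknown id"
    available = CATALOG[obj_id]
    if fmt is None:
        return 300, formats_xml(available, obj_id)
    if fmt not in available:
        return 406, "format not available for this id"
    return 200, available[fmt]
```

A client that finds `title="record-1"` on a page would call the server with `?id=record-1`, pick a format from the list, and then request `?id=record-1&format=mods`.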
This specification is being developed under the ROGUE 05 process. The next revision, revision 1, is due on Thursday, 16 Feb 2006, and is dependent on at least three working implementations to move forward. That date is also day 2 of code4lib 2006; I will be speaking about unAPI on day 1, and hopefully we'll have a breakout session following up the talk which will be a great time to discuss the spec, to try implementing it, and to work on revision 1 for release the next day. :)
We are tracking milestones, notes, and document revisions within a private Basecamp project, but interested parties are welcome to participate (contact me for access).
The list of record for unAPI development is the gcs-pcs-list. Please send questions/comments/implementation feedback there.
[Update (2006-05-21)]: Please note that the official unAPI site now lives at unapi.info.
Submitted by edsu on Fri, 2006-01-13 16:41
With a bit of Python, the code4lib 2006 schedule has been encoded using the hCalendar microformat. hCalendar lets you bundle up event information so that it is available downstream to machines that crawl the content, while keeping the content readable for us humans. It is very easy to grok; the only reason for the script was to avoid repetitive typing.
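The script itself isn't reproduced here, but a generator of this kind can be very small. A minimal sketch, assuming the schedule is a list of `(start, end, title)` tuples: the class names `vevent`, `dtstart`, `dtend`, and `summary` come from the hCalendar microformat, while the date 2006-02-15 and the helper names are hypothetical.

```python
from datetime import datetime
from html import escape

# Hypothetical schedule data: (start, end, title). The date is an assumption.
SCHEDULE = [
    (datetime(2006, 2, 15, 11, 5), datetime(2006, 2, 15, 11, 25),
     "Connecting Everything with unAPI and OPA - Dan Chudnov"),
]

def hcalendar_event(start, end, summary):
    """Render one talk as an hCalendar vevent.

    Machine-readable ISO 8601 times go in the abbr title attributes;
    the human-readable HH:MM stays in the visible text.
    """
    return (
        '<div class="vevent">'
        f'<abbr class="dtstart" title="{start.isoformat()}">{start:%H:%M}</abbr>-'
        f'<abbr class="dtend" title="{end.isoformat()}">{end:%H:%M}</abbr> - '
        f'<span class="summary">{escape(summary)}</span>'
        "</div>"
    )

def render_schedule(schedule):
    """Render every talk in the schedule, one vevent per line."""
    return "\n".join(hcalendar_event(*talk) for talk in schedule)

print(render_schedule(SCHEDULE))
```

Keeping the ISO timestamps in `title` attributes is what lets crawlers read exact times while visitors see only "11:05-11:25".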
Here's what a sample event looks like in hCalendar:
11:05-11:25 - Connecting Everything with unAPI and OPA - Dan Chudnov
This is equivalent to the following iCal:
SUMMARY;LANGUAGE=en:Connecting Everything with unAPI and OPA - Dan Chudnov
The page includes a 'subscribe' link (at the top) to Brian Suda's x2v, which extracts hCalendar from a page and spits back iCal for your calendaring application to ingest.
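The extraction direction can be sketched the same way. This is not x2v's implementation; it is a simplified Python illustration that collects `summary`, `dtstart`, and `dtend` from `class="vevent"` elements and emits iCalendar text with CRLF line endings and escaped TEXT values. It assumes summaries contain no nested markup and skips long-line folding, the `LANGUAGE` parameter, and the `UID`/`DTSTAMP` properties that strict consumers expect; the `PRODID` value is a placeholder.

```python
from datetime import datetime
from html.parser import HTMLParser

class HCalendarParser(HTMLParser):
    """Collect summary/dtstart/dtend from class="vevent" elements (simplified)."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._in_summary = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "vevent" in classes:
            self.events.append({"summary": ""})
        elif self.events:
            # Simplification: properties attach to the most recent vevent.
            event = self.events[-1]
            if "dtstart" in classes:
                event["dtstart"] = attrs.get("title")
            elif "dtend" in classes:
                event["dtend"] = attrs.get("title")
            elif "summary" in classes:
                self._in_summary = True

    def handle_endtag(self, tag):
        self._in_summary = False

    def handle_data(self, data):
        if self._in_summary and self.events:
            self.events[-1]["summary"] += data

def ical_text(value):
    """Escape a TEXT value: backslash, semicolon, comma, newline."""
    return (value.replace("\\", "\\\\").replace(";", "\\;")
                 .replace(",", "\\,").replace("\n", "\\n"))

def ical_stamp(iso):
    """Convert an ISO 8601 timestamp to iCalendar's basic local-time form."""
    return datetime.fromisoformat(iso).strftime("%Y%m%dT%H%M%S")

def to_ical(html_text):
    """Translate the hCalendar events on a page into a VCALENDAR string."""
    parser = HCalendarParser()
    parser.feed(html_text)
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0", "PRODID:-//example//hcal-sketch//EN"]
    for ev in parser.events:
        lines.append("BEGIN:VEVENT")
        if ev.get("dtstart"):
            lines.append("DTSTART:" + ical_stamp(ev["dtstart"]))
        if ev.get("dtend"):
            lines.append("DTEND:" + ical_stamp(ev["dtend"]))
        lines.append("SUMMARY:" + ical_text(ev["summary"].strip()))
        lines.append("END:VEVENT")
    lines.append("END:VCALENDAR")
    return "\r\n".join(lines) + "\r\n"
```

Fed a vevent whose `dtstart` title is `2006-02-15T11:05:00`, `to_ical` emits `DTSTART:20060215T110500` alongside the `SUMMARY` line.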