
Feed aggregator

LITA: Call for submissions for the LITA / Ex Libris Student Writing Award

planet code4lib - Fri, 2018-01-26 22:41

The Library and Information Technology Association (LITA), a division of the American Library Association (ALA), is pleased to offer an award for the best unpublished manuscript submitted by a student or students enrolled in an ALA-accredited graduate program. Sponsored by LITA and Ex Libris, the award consists of $1,000, publication in LITA’s refereed journal, Information Technology and Libraries (ITAL), and a certificate. The deadline for submission of the manuscript is February 28, 2018.

The purpose of the award is to recognize superior student writing and to enhance the professional development of students. The manuscript can be written on any aspect of libraries and information technology. Examples include digital libraries, metadata, authorization and authentication, electronic journals and publishing, open source software, distributed systems and networks, computer security, intellectual property rights, technical standards, online catalogs, and universal access to technology.

To be eligible, applicants must be enrolled in an ALA-accredited program in library and information studies at the masters or PhD level, follow the detailed guidelines, and fill out the application form (PDF). Submission guidelines are available on the LITA website. Send the signed, completed forms electronically no later than February 28, 2018, to the Award Committee Chair, Eric Phetteplace.

The award will be presented at the LITA President’s Program during the 2018 ALA Annual Conference in New Orleans, Louisiana.

Library of Congress: The Signal: Digital Scholarship Resource Guide: Text analysis (part 4 of 7)

planet code4lib - Fri, 2018-01-26 15:39

This is part four in a seven-part resource guide for digital scholarship by Samantha Herron, our 2017 Junior Fellow. Part one is available here, part two (about making digital documents) is here, part three is about tools to work with data, and part four (below) is all about doing text analysis. The full guide is available as a PDF download.

Text analysis

Clean OCR, good metadata, and richly encoded text open up the possibility for different kinds of computer-assisted text analysis. With instructions from humans (“code”), computers can identify information and patterns across large sets of texts that human researchers would be hard-pressed to discover unaided. For example, computers can find out which words in a corpus are used most and least frequently, which words occur near each other often, what linguistic features are typical of a particular author or genre, or how the mood of a plot changes throughout a novel. Franco Moretti describes this kind of analysis as “distant reading”, a play on the traditional critical method “close reading”. Distant reading implies not the page-by-page study of a few texts, but the aggregation and analysis of large amounts of data.

The strategies of distant reading require scholars to “operationalize” certain concepts–to make abstract ideas quantifiable. For more on text mining see Ted Underwood’s post on what text mining can do and what some of the obstacles are.

Some important text analysis concepts:

  1.  Stylometry

Stylometry is the practice of using linguistic study to attribute authorship to an anonymous text. Though some of stylometry’s methods and conclusions (both digital and analog) have been disputed, the practice speaks to some of the kinds of evidence researchers hope to surface using text analysis.

One of the early successes of stylometry was in 1964 when Frederick Mosteller and David Wallace used linguistic cues to assign probable authorship to disputed Federalist Papers. Patrick Juola for Scientific American describes it: “[The researchers] showed that the writing style of Alexander Hamilton and James Madison differed in subtle ways. For example, only Madison used the word ‘whilst’ (Hamilton used ‘while’ instead). More subtly, while both Hamilton and Madison used the word ‘by,’ Madison used it much more frequently, enough that you could guess who wrote which papers by looking at how frequently the word was used.” Using these methods, they discovered that the disputed papers were likely written by Madison.

Today, computers can perform these kinds of comparative tasks quickly, and keep track of a many different features difficult to track by hand (e.g. not only the relative presence of the word ‘by’ but the relative presence of ‘by’ at the beginning vs. the end of a sentence, or the ratio of ‘by’ to other prepositions, etc.).
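To make this concrete, here is a minimal Python sketch (not part of the original guide) of that kind of feature counting: the rate at which a function word like 'by' appears in two bodies of text. The file names are placeholders for plain-text collections of each author's known writings.

```python
import re
from collections import Counter

def relative_frequency(path, word):
    """Return how often `word` occurs per 1,000 words of the text at `path`."""
    with open(path, encoding="utf-8") as f:
        tokens = re.findall(r"[a-z']+", f.read().lower())
    counts = Counter(tokens)
    return 1000 * counts[word] / len(tokens)

# Placeholder file names: plain-text files of each author's known writings.
for author, path in [("Hamilton", "hamilton.txt"), ("Madison", "madison.txt")]:
    rate = relative_frequency(path, "by")
    print(f"{author} uses 'by' {rate:.2f} times per 1,000 words")
```

A real stylometric study would track many such features at once (word lengths, sentence lengths, ratios of prepositions, and so on), but the counting machinery is essentially this simple.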

Example: Stylometry was recently used by researchers to look into the works of Hildegard of Bingen, a female author from the Middle Ages. Because she was not entirely fluent in Latin, she dictated her texts to secretaries who corrected her grammar. Her last collaborator, Guibert of Gembloux, seems to have made many changes to her dictation while he was secretary. The researchers used digital stylometry methods to show that collaborative works are often styled very differently from works penned by either author individually.

Stylometric methods and assumptions can also be applied beyond author attribution. If stylometry assumes that underlying linguistic features can function as ‘fingerprints’ for certain authors, linguistic features might also be fingerprints for certain years or genres or national origin of author, and so on. For example, are there linguistically significant identifiers for mystery novels? Can a computer use dialogue to determine if a book was written before 1800? Can computers discover previously unidentified genres? This pamphlet from Stanford Literary Lab gives a good overview of their research into the question of whether computers can determine genre.

  2. Word counts and topic models

Stylometry deals with the attribution and categorization of texts’ style. Other distant reading research looks at semantic content, taking into account the meanings of words as opposed to their linguistic role.

Word frequency – One of the simplest kinds of text analysis is word frequency. Computers can count up and rank which words appear most often in a text or set of texts. Though not computationally complicated, term frequency is often an interesting jumping off point for further analysis, and a useful introduction into some of digital humanities’ debates. Word frequency is the basis for somewhat more sophisticated analyses like topic modeling, sentiment analysis, and ngrams.

To the right is a word cloud for Moby Dick.

A word cloud is a simple visualization that uses font size to represent the relative frequency of words–the bigger the font, the more frequently a word is used.

The word cloud is based on this data: (First column is word, second is word count, third is word frequency)

word         count   frequency
whale          466   0.004317093
like           315   0.0029182069
ye             283   0.002621754
man            251   0.0023253013
ship           227   0.0021029618
sea            216   0.0020010562
old            212   0.0019639996
captain        210   0.0019454713
dick           199   0.0018435656
moby           199   0.0018435656
said           188   0.00174166
ahab           180   0.0016675468
time           169   0.0015656411
little         165   0.0015285845
white          164   0.0015193204
queequeg       162   0.0015007921
long           150   0.0013896223
great          146   0.0013525657
men            138   0.0012784525
way            134   0.001241396
say            132   0.0012228676
whales         132   0.0012228676
head           124   0.0011487544
good           116   0.0010746412
boat           111   0.0010283205
thought        110   0.0010190563
round          106   0.0009819998
sort           101   0.000935679
hand            98   0.0009078866
world           92   0.00085230166
come            90   0.0008337734
sperm           89   0.00082450925
look            88   0.0008152451
whaling         88   0.0008152451
deck            86   0.0007967168
night           84   0.00077818846
chapter         82   0.0007596602
seen            82   0.0007596602
day             78   0.0007226036
know            78   0.0007226036
tell            78   0.0007226036
things          78   0.0007226036
right           77   0.0007133394
water           76   0.0007040753
away            74   0.000685547
bildad          74   0.000685547
far             74   0.000685547
god             74   0.000685547

You’ll notice that this particular word count (completed using Voyant Tools) doesn’t include certain stop words: ‘fluff’ words like pronouns, articles, conjunctions, and prepositions (e.g. she, that, the, any, but…), keeping only ‘meaning’ words–names, nouns, verbs, adjectives, adverbs.
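The same sort of count can be reproduced outside Voyant with a few lines of code. The sketch below is an illustration (not part of the original guide); it assumes a plain-text copy of Moby Dick under a placeholder file name, and the tiny stop-word list is only indicative, since Voyant's actual stop list is much longer.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; Voyant's actual list is much longer.
STOP_WORDS = {"the", "and", "of", "a", "to", "in", "that", "it", "his", "i",
              "he", "but", "with", "as", "is", "was", "for", "all", "this", "at"}

with open("moby_dick.txt", encoding="utf-8") as f:  # placeholder file name
    tokens = re.findall(r"[a-z']+", f.read().lower())

total = len(tokens)
counts = Counter(t for t in tokens if t not in STOP_WORDS)

# Print the most frequent 'meaning' words with raw counts and relative frequencies.
for word, count in counts.most_common(10):
    print(f"{word:<12} {count:>6} {count / total:.7f}")
```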

Mostly, this data aligns with what we already know or would assume about Moby Dick: that it concerns a whale and an old captain at sea. But with this data, we can ask new questions: Is it significant that ‘whale,’ the most frequent word, is used 150 more times than the runner-up (or even more times if we include the plural ‘whales’ or the verb ‘whaling’)? Why is ‘like’ used so often? Can we safely assume that word count says anything at all about the book’s content or meaning? How does Moby Dick’s word frequency compare to Melville’s other works? To the works of his contemporaries?

Voyant is a set of out-of-the-box tools that allows you to manipulate and compare texts. Given a corpus (it is preloaded with two corpora, Jane Austen’s novels and Shakespeare’s plays, but users can also supply their own), Voyant displays word counts and clouds, comparative frequencies over time, concordances, and other visual displays. There are plenty of other more sophisticated and customizable tools available that do similar tasks, but Voyant is one of the most accessible, because it requires no coding by the user.

Here is a link to a list of clean demo corpora to play around with.

Google Books Ngram Viewer is also a powerful example of how word frequencies can be used as a jumping off point for scholarly inquiry. Using Google Books as its massive database, users can track the relative presence of words in books across time.

Here’s what a Google ngram looks like:

This ngram compares the (case-insensitive) frequency of ‘internet’, ‘television’, ‘radio’, ‘telephone’, and ‘telegram’ across the entire Google Books collection from 1850-2000. This graph (we suppose) reflects a lot of interesting historical information: the birth and quick rise of radio, the birth and quicker rise of the Internet, the birth and steady increase of television, which appears to level out in the 1990s. However, ngrams like this also allow us to ask questions: Does the 1944 peak in frequency of the word ‘radio’ in books reflect a historical peak in radio popularity? If not, is there some reason why people might be writing more about radios than using them? Or, why was the telegram so infrequently written of in books? Would running this same ngram on a corpus of newspapers rather than books return different results? And so on.

Here are some interesting and silly ngrams from webcomic xkcd.

Word frequency data at both the scale of a single book, and of very many books, asks as many questions as it answers, but can be an interesting jumping off point for beginning to envision texts as data.

Another popular text analysis method is topic modeling. A ‘topic’ is a set of words that frequently colocate in a set of texts (meaning that they occur near each other). In general, a topic modeling tool looks through a corpus and spits out clusters of words that are related to each other. So, in a very hypothetical example, if you fed a topic modeling tool the text of every ecology textbook you could find, it might return topics like ‘dirt rock soil porous’ and ‘tree leaf branch root’ etc.

The significance of such a tool is more obvious at a large scale. A human can read an article on bananas and state with confidence that the article is about bananas and perhaps that the key words are ‘bananas’ ‘fruit’ ‘yellow’ ‘potassium’… But when working with a corpus that is, say, the text of 100 years of a newspaper, or the text mined from every thread on a subreddit page, the ‘topics’ become more difficult to discern.

Example: Robert K. Nelson at the Digital Scholarship Lab at the University of Richmond authored Mining the Dispatch, a project that uses topic modeling to look at nearly the full run of a newspaper, the Richmond Daily Dispatch, in the early 1860s. For example, one of the topics his model returned was characterized by the words ‘negro years reward boy man named jail delivery give left black paid pay ran color richmond subscriber high apprehension age ranaway free feet delivered.’ Then, by looking at articles where this topic was most prominent, he determined that the topic most often corresponds to fugitive slave ads. By tracking the relative presence of this topic through time, one can track the relative presence of fugitive slave ads through time. Other topics identified by the model and named by Nelson include ‘Poetry and Patriotism’, ‘Anti-Northern Diatribes’, ‘Deserters’, ‘Trade’, ‘War Reports’, ‘Death Notices’, and ‘Humor’, among others.

Topic models can reveal latent relationships and track hidden trends. Especially for unindexed corpora (like old newspapers, often, that do not have article-level metadata), topic modeling can be used to identify the incidence of certain kinds of content that would take years to tag by hand, if it were possible at all.

A popular topic modeling tool is MALLET, for those comfortable working in the command line. Programming Historian has a tutorial for getting started using MALLET for topic modeling. If you’re not comfortable in the command line, there is a GUI (graphical user interface) tool for implementing MALLET here (meaning you can input files and output topics without entering code yourself), and a blog post from Miriam Posner on how to interpret the output.
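If you would rather stay in Python than use MALLET or its GUI, scikit-learn offers a comparable workflow. The sketch below is a generic illustration of the technique (LDA over a bag-of-words matrix), not the MALLET pipeline from the tutorials above, and the toy corpus is a placeholder for real documents.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Placeholder corpus: in practice this would be thousands of documents
# (newspaper issues, forum threads, etc.), one string per document.
docs = [
    "dirt rock soil porous sediment layers of soil and rock",
    "tree leaf branch root bark the root systems of trees",
    "soil nutrients and dirt erosion porous rock and sediment",
    "leaf canopy branch growth tree bark and leaf litter",
]

# Build a document-term matrix, dropping English stop words.
vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(docs)

# Fit an LDA topic model; the number of topics is a judgment call.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(dtm)

# Print the top words per topic, roughly what MALLET's topic keys report.
terms = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [terms[j] for j in topic.argsort()[-5:][::-1]]
    print(f"Topic {i}: {' '.join(top)}")
```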


  • AntConc: Concordance tool.
  • DiRT Directory: Digital Research Tools directory.
  • From the Page: Crowdsourcing manuscript transcription software
  • Google Ngram Viewer: Explore ngrams in Google books corpus.
  • Juxta: For textual criticism (identification of textual variants). Look at base and witness texts side by side, locate variations easily. Offers analytic visualizations like heat maps.
  • Natural Language Toolkit: Computational linguistics platform for building Python programs that work with language data. “It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and an active discussion forum.”
  • MALLET: (Machine Learning for Language Toolkit) – Java-based package for sophisticated text analysis. Often used for topic modeling. See description above.
  • Programming Historian: Peer-reviewed, novice-friendly tutorials for digital tools and techniques.
  • R: Statistical software. Often used for text manipulation, but the language is less user-friendly than other coding languages (say, Python).
  • Stanford Natural Language Processing Group: Set of Natural Language Processing tools
  • Stylo: Suite of stylometric tools for R.
  • Voyant: Web-based text explorer (See above).
  • WordHoard: WordHoard contains the entire canon of Early Greek epics, as well as all of Chaucer, Shakespeare, and Spenser. The texts are annotated or tagged by ‘morphological, lexical, prosodic, and narratological criteria’. It has a user interface that allows non-technical users to explore textual data.


Evergreen ILS: Evergreen Highlights Opportunities for Modern Workflows at ALA Midwinter

planet code4lib - Thu, 2018-01-25 19:00

With the changing services libraries offer to their patrons, staff workflows are quickly changing as we strive to keep up with our customers’ high expectations. Library systems also need to adapt to support these modern workflows.

The Evergreen community is leading a session at the 2018 American Library Association Midwinter meeting to show how Evergreen 3.0 is helping libraries serve their users in the modern age. Evergreen at the desk, on the road, and everywhere in between is scheduled for 1 to 2 p.m. at the Colorado Convention Center, Mile High 1E.

Released this fall, Evergreen 3.0 features a new client that can run in a web browser on a desktop, tablet, or mobile phone. In addition to a new modern interface and new way of doing business, Evergreen 3.0 brings us many new features like digital bookplates, batch-editing of user records, public catalog support for linking to maps of shelving locations, and more robust infrastructure.

The session will also highlight new features expected in the March 2018 Evergreen release and provide opportunities for attendees to ask questions about Evergreen and the community that supports it.

David Rosenthal: Magical Thinking At The New York Times

planet code4lib - Thu, 2018-01-25 16:00
Steven Johnson's Beyond The Bitcoin Bubble in the New York Times Magazine is a 9000-word explanation of how the blockchain can decentralize the Internet that appeared 5 days after my It Isn't About The Technology. Which is a good thing, because otherwise my post would have had to be much longer to address his tome. Follow me below the fold for the part I would have had to add to it.

I'm not the first to critique Johnson's work.  Bryan Clark's What the NY Times got wrong about Bitcoin is obviously written by a Bitcoin believer, because he missed the whole point of the article. It isn't about Bitcoin, it is about decentralizing the Web.

Johnson understands that the driving force behind the centralization of the Web has been economies of scale:
The self-reinforcing feedback loops that economists call “increasing returns” or “network effects” kicked in, and after a period of experimentation in which we dabbled in social-media start-ups like Myspace and Friendster, the market settled on what is essentially a proprietary standard for establishing who you are and whom you know. That standard is Facebook.But Johnson appears to believe that somehow, magically, blockchain-based systems are immune from economies of scale, increasing returns or network effects despite the fact that they are visibly, powerfully in effect in the Bitcoin ecosystem. He writes:
What would prevent a new blockchain-based identity standard from following Tim Wu’s Cycle, the same one that brought Facebook to such a dominant position? Perhaps nothing. But imagine how that sequence would play out in practice. Someone creates a new protocol to define your social network via Ethereum. ... That way of defining your social network might well take off and ultimately supplant the closed systems that define your network on Facebook.

The best he can offer is that it "might well take off". Given Facebook's likely reaction to losing its user base, it won't. And even if it did, as Roger McNamee points out, it would be unlikely to cause users to abandon Facebook:
consumers, not the platforms, should own their own data. In the case of Facebook, this includes posts, friends, and events—in short, the entire social graph. Users created this data, so they should have the right to export it to other social networks. Given inertia and the convenience of Facebook, I wouldn’t expect this reform to trigger a mass flight of users. Instead, the likely outcome would be an explosion of innovation and entrepreneurship. Facebook is so powerful that most new entrants would avoid head-on competition in favor of creating sustainable differentiation. Start-ups and established players would build new products that incorporate people’s existing social graphs, forcing Facebook to compete again.

And, given network effects, even if the users did abandon Facebook, they would congregate at some other centralized site.

Johnson repeats the blockchain enthusiasts' claim about security:
In this one respect, the Bitcoin story is actually instructive: It may never be stable enough to function as a currency, but it does offer convincing proof of just how secure a distributed ledger can be. “Look at the market cap of Bitcoin or Ethereum: $80 billion, $25 billion, whatever,” Dixon says. “That means if you successfully attack that system, you could walk away with more than a billion dollars. You know what a ‘bug bounty’ is? Someone says, ‘If you hack my system, I’ll give you a million dollars.’ So Bitcoin is now a nine-year-old multibillion-dollar bug bounty, and no one’s hacked it. It feels like pretty good proof.”

There are at least four reasons why this claim is complete BS:
  • The Bitcoin blockchain has been hacked. At least one successful block withholding attack has taken place. There have been times when a single pool controlled 51% or more of the mining power. The Ethereum blockchain has been hacked; someone found a vulnerability and stole "$60M".
  • The "market cap" of a cryptocurrency is a joke. It's the result of multiplying the number of coins by the price of the last trade. Let's ignore that the price is subject to manipulation. Assume someone hacks the Bitcoin blockchain, and steals Satoshi Nakamoto's 1M Bitcoins. Now that the blockchain is vulnerable, what is the price of the next trade? So what are the 1M Bitcoins worth to the miscreant? By hacking the blockchain the miscreant has destroyed the value of the loot.
  • Let's assume that the miscreant can steal the 1M Bitcoins and, despite the fact that Nakamoto's wallet is now empty and this is visible in the blockchain, no-one notices and the price isn't affected. Since you can't buy Lamborghinis (or pretty much anything else legal, even registration at a Bitcoin conference) with Bitcoins, in order to enjoy his ill-gotten gains the miscreant now needs to obtain fiat currency by selling the 1M Bitcoins. How big an effect on the price would a sell order for 1M Bitcoins have? Once the Bitfinex theft of 120K BTC hit the headlines, Izabella Kaminska wrote Time to reevaluate blockchain hype:
    The mark-to-market value of the stolen coins is roughly $70m, but again who can really tell their true worth. Bitcoin is an asset class where the liquidation of 119,756 (approximately 0.8 per cent of the total bitcoin circulation) can move the market more than 20 per cent, suggesting a certain fantastical element to the valuation.
  • Finally, the bad guys don't need to hack the blockchain to steal money from the general populace. They're already hacking wallets, manipulating the markets, conducting Ponzi schemes (PonziCoin!), peddling ICOs and printing their own "US Dollars". Why kill the goose that lays the golden egg?
Clark is more perceptive:

Blockchain technology isn’t inherently more secure than a third-party server. In the case of corporate titans like Facebook and Google, it’s almost certainly not. Cryptography is cryptography, and whether its principles are used to secure a server or a blockchain, one isn’t necessarily better than the other.

It’s also not, not better than the other. There are simply too many factors involved to make this claim.

But he too drinks the Kool-Aid:

It does, however, offer a compelling proof of concept as the world’s largest bug bounty program.

They should pay attention to Vitalik Buterin (a co-founder of Ethereum) when he writes in The Meaning of Decentralization:
In the case of blockchain protocols, the mathematical and economic reasoning behind the safety of the consensus often relies crucially on the uncoordinated choice model, or the assumption that the game consists of many small actors that make decisions independently. If any one actor gets more than 1/3 of the mining power in a proof of work system, they can gain outsized profits by selfish-mining. However, can we really say that the uncoordinated choice model is realistic when 90% of the Bitcoin network’s mining power is well-coordinated enough to show up together at the same conference?

The security of a blockchain depends upon an assumption that is impossible to verify in the real world (and by Murphy's Law is therefore false).

The level of magical thinking throughout Johnson's piece is well illustrated by this paragraph:
As many critics have observed, ordinary users on social-media platforms create almost all the content without compensation, while the companies capture all the economic value from that content through advertising sales. A token-based social network would at least give early adopters a piece of the action, rewarding them for their labors in making the new platform appealing. “If someone can really figure out a version of Facebook that lets users own a piece of the network and get paid,” Dixon says, “that could be pretty compelling.”

So why exactly is it better for the mass of the users for the value to be captured by a few early adopters than by a huge corporation? Even if "users own a piece of the network and get paid", experience with Bitcoin and ICOs shows that the early adopters leave only crumbs for the mass of users.

District Dispatch: CopyTalk: copyright librarian starter kit

planet code4lib - Thu, 2018-01-25 15:49

Many librarians are finding themselves in the position of being the local copyright expert. Some of these librarians are professionals who applied for a formal copyright librarian posting. However, other librarians are tasked with taking on copyright, to fill a growing yet unclear need in their organization, while retaining their other job responsibilities.

The purpose of this webinar is to help incoming copyright librarians know what to expect, and to equip them with a basic knowledge of user needs to ease them into their new role. This CopyTalk will provide specific guidance and include “hands-on” best practices. Topics covered include finding collaborators within and beyond the library, starting to build the foundation for an education program, understanding what advocacy looks like, and getting a handle on the kinds of questions a copyright librarian answers.

Our speaker Emilie Algenio will share what she has learned in her first years of service, in her full-time librarian position as the Copyright/Fair Use Librarian for an American research university.

Emilie Algenio is the Copyright/Fair Use Librarian for the Texas A&M University Libraries, and she focuses on copyright education and outreach. She recently co-presented on their program for graduate students at the International Federation of Library Associations and Institutions’ annual meeting in Poland. She started her career as the Library Resident at the University of Massachusetts Amherst, followed by her position as the Consortia Resources Coordinator for the University of Texas System Libraries. Emilie graduated with a Bachelor of Arts from Guilford College, has a Masters of Library and Information Science from Simmons College, and is a 2014 graduate of Harvard University’s “CopyrightX.”

Join us on Thursday, February 1st, at 2pm Eastern/11am Pacific for our hour-long free webinar. Go to the webinar link and sign in as a guest.

The post CopyTalk: copyright librarian starter kit appeared first on District Dispatch.

District Dispatch: NCWIT AspireIT funding open for summer

planet code4lib - Thu, 2018-01-25 15:36

This is a guest post from Jennifer Manning, Program Director of AspireIT Partnerships at the National Center for Women & Information Technology. The ALA Washington Office’s Libraries Ready to Code initiative and NCWIT AspireIT are partners in connecting young women program leaders to public libraries to design and implement coding programs for K-12 girls in an exciting pilot project.

Girls and women in the U.S. are avid users of technology, but they are significantly underrepresented in its creation. NCWIT AspireIT can help libraries inspire more girls to become technology innovators.

AspireIT connects high school and college-aged members of our Aspirations in Computing community with K-12 girls interested in computing. Using a near-peer model, AspireIT Leaders teach younger girls fundamentals in programming and computational thinking in fun creative environments that are supported by Partner Organizations from the NCWIT community. The relationship between the AspireIT Leaders and their AspireIT Partner Organizations fosters mentoring with technical professionals, increases young women’s confidence in their computing abilities, and develops valuable leadership skills.

To date, NCWIT has gifted more than $800,000 to 300 programs, providing an estimated 240,000 instruction hours to nearly 8,000 girls in 40 states, the District of Columbia, and the U.S. Virgin Islands since 2013. AspireIT aims to engage more than 10,000 girls by 2018.

The AspireIT Partner Organization role connects you with AspireIT Leaders from our Aspirations in Computing community, who share their passion by facilitating K-12 computing education in your local community, inspiring future innovators, and giving back. Not to mention – we will provide non-profit organizations with up to $3,000 in support to run these programs! Visit our AspireIT Toolkit to view the current RFP and application.

We are in the process of opening our next round of AspireIT funding, for programs occurring between June 15, 2018, and October 14, 2018. Matching is open now; applications will officially open on February 1, 2018, and are due on March 11, 2018. Please fill out a new interest request so that potential AspireIT Leaders can connect with you.


  • Join us via Zoom on Monday, February 5, 2018, at 3pmET for a Q&A session for more information on being an AspireIT Partner organization.
  • If you missed the “Overview for AspireIT Partner Organization” Webinar on January 18 and would like to know what it means to be a Partner Organization, review these Slides and the webinar recording.
  • Email the team:

The post NCWIT AspireIT funding open for summer appeared first on District Dispatch.

Open Knowledge Foundation: OpenEdu Policies reports: JRC Research Centre

planet code4lib - Thu, 2018-01-25 12:04

This blog has been reposted from the Open Education Working Group blog and has been written as a joint effort by Javiera Atenas and  Paul Bacsich, co-coordinators of the Open Education Working Group. 

Hot off the press: OpenEdu Policies reports. These reports are the final outcome of one and a half intense years of research into open education policies involving many stakeholders, particularly ministries of education, research and science across Europe. ‘Going Open’ is a report bringing policy recommendations on open education at regional, national and EU levels. ‘Policy Approaches to open education’ is a report covering the 28 EU Member States, presenting case studies about how each country approaches open education policies. Both reports are part of the JRC’s OpenEdu Policies project.

The Joint Research Centre of the European Commission has just published a comprehensive overview report (164 pages) on Policy Approaches to Open Education across all of the 28 EU Member States.

The Foreword to the report, by Yves Punie (Deputy Head of Unit DG JRC Unit Human Capital and Employment) summarises the conclusions as follows: “The diversity of policies and approaches presented herein reflect the diversity that is intrinsic to the European Union. Each Member State has specific goals for education and priority areas to address when formulating its policies. However, this research shows that Member States are aware of open education issues and that in one way or another nearly all of them have implemented some sort of initiative or action plan in relation to open education, even though that goal is not explicit in some cases.” He goes on to describe the report as “another step taken by the European Commission (DG EAC and JRC) to meet Members States’ requirements for more research and evidence on open education in support of policy-making in Europe.”

The work for the overview report was carried out by the European Commission’s Joint Research Centre (JRC) in collaboration with the Research Institute for Innovation & Technology in Education (UNIR iTED) at the Universidad Internacional de la Rioja (UNIR) in Logroño, Spain. An international team based in Spain (Daniel Burgos), Italy (Fabio Nascimbeni and Stefania Aceto) and the UK (Javiera Atenas and Paul Bacsich) carried out the work, with assistance from 28 ministry officials and other experts who agreed to be interviewed. The interview work was supported by substantial desk research across all Member States, for which a further large number of experts on open education were consulted, along with outputs from key projects such as OER World Map, OERup!, D-TRANSFORM, ADOERUP (for the European Parliament), POERUP and earlier JRC projects and reports on open education.

In particular all identified policies were analysed using the OpenEdu Framework produced by JRC, which identifies six core dimensions of open education (Access, Content, Pedagogy, Recognition, Collaboration and Research) and four transversal dimensions (Strategy, Technology, Quality, Leadership). The report is available here.

The report, together with additional research and expert consultations, forms the basis for the also just released JRC report “Going Open: Policy Recommendations on Open Education in Europe (OpenEdu Policies)”, which highlights policy options to further open up education in Europe.

Our long report is, we believe, the first one of its kind to bring together at a detailed level policy work in open education for a complete geopolitical region. The team will be happy to explain the methodology to other interested research groups. We can see no reason why the approach, including use of the OpenEdu Framework for analysis, cannot be replicated for other geopolitical groupings such as Latin America, Asia-Pacific, Commonwealth of Nations, La Francophonie and more widely across Europe. Regarding the last, it would perhaps be most immediately useful if funding could be found for those countries in the European Economic Area and the European Neighbourhood to carry out similar work.

Inevitably in such a detailed report, there will be items at the Member State level that get rapidly out of date. Indeed, we hope that such reports as this and the overview reports from JRC will foster an increased climate of policy formation and creation of initiatives at Member State level, not only at EU level. As part of its ongoing work, the Open Education Working Group will continue to make its email list and blog available to interested researchers and specifically to encourage them to produce similar and updated material for their countries. For more details see this recent update blog.

DuraSpace News: Registration Open: "Getting Started with Fedora" Webinar

planet code4lib - Thu, 2018-01-25 00:00

DuraSpace presents a community webinar,

“Getting Started with Fedora”

Wednesday, February 28, 2018

12:00 p.m. Eastern

Presented by David Wilcox, Fedora Product Manager

LITA: Jobs in Information Technology: January 24, 2018

planet code4lib - Wed, 2018-01-24 20:54

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week

Boston College, Senior Research Librarian/Bibliographer, Chestnut Hill, MA

California Digital Library of the University of California, Discovery and Delivery Director, Oakland, CA

Brazosport College, Librarian – Reference and Outreach, Lake Jackson, TX

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

Tim Ribaric: Gettin' Vizy With it

planet code4lib - Wed, 2018-01-24 19:38


That list of LIS Peer Reviewed journals was a great idea. Now there is a CSV version of the data.


Lucidworks: Increase Retail Sales with Recommendations

planet code4lib - Wed, 2018-01-24 14:10

Retailers know that it is harder and more expensive to acquire new customers than to sell new things to existing customers. That’s why they spend a lot on loyalty programs and Customer 360/Customer Journey programs. One of the best tools a retailer has for selling products to customers is recommendations.

Recommendations are simply that, suggestions by the retailer on other things the customer may be interested in. In order to do this, a retailer needs to know the customer. If I’ve never purchased anything pink and frilly, don’t recommend things to me that are pink and frilly. Instead, recommend something that suits my interests and preferences, like boxing gloves, a new cooking apron, or a nice oak table.

Strictly “brick and mortar” retailers are a rarity these days. I look at (see REI’s talk from the last Revolution) before I go into the store to get a general idea of what I’m going to purchase. I shop there especially for cycling gear because of their focus on customer service. In general, employees are there to help me decide and explain the differences between products. However, depending on how I got there that day (by bike, or convertible, or the family sedan), I may still end up making my purchase on their website.

Your Homepage: Items for User Recommender

Regardless of how I get to a store, for loyal customers, recommendations should start on the homepage. Lucidworks Fusion gives you an Items for User Recommender based on a user’s past interests, purchases, or other information you may have captured.

Anyone who has been involved in the Internet at all knows that most users go to your homepage and then leave. Real estate on your homepage is precious. Sure you may have a promotion, but very soon after, I should see something that speaks to me. The Items for User Recommender is a great way to generate these kinds of recommendations.

A Product Page: Items for Items Recommender

Once I click on one of those items you recommended on your homepage or that I found in a search, I should quickly see other recommended items. This is important for a number of reasons. I might click on an item that is close to what I need, but isn’t exactly right. For instance, I may need a 4G phone that is Android not iPhone or a Quick-Dri shirt that is black rather than white. In other cases, you may have a recommendation for an add-on product.

This is where the Lucidworks Fusion Items for Items Recommender comes into play. It provides a recommendation for other items based on what other users have purchased or viewed when buying that item.

When buying products from video game machines, to coffee machines, to cycling gear, there is often a complementary product that goes with the purchase. If you buy a coffee press, you may want a grinder. If you buy a bike, you may need a tire pump or spare tube. Offering these at the time of purchase or on the product page is a good way to ensure the customer buys everything they would need from your site. In the physical world, it is a way to make offers at the register. “Hey did you get a spare tube for this bike?”

A Promotion: Users for Items Recommenders

Whether it is an offer a user sees when they return to your site, or a flash sale alert sent by email, it is important to target your promotions. The Users for Items Recommender lets me find which users are interested in an item that I’m trying to promote.

I frequently get events in the mail from Ticketmaster. These are obviously strictly geographically targeted, as they sent me WWE and The Eagles. There is nothing in my browsing, search, or purchase history that should cause either of these to be recommended to me. I don’t like The Eagles and am not interested in WWE, so I’ve stopped looking at their emails as a result. If they used Fusion, Ticketmaster could have looked at what events I’ve actually purchased tickets for in order to better target events I’d be interested in. Promotions can boost sales — but only if they are targeted. Anything else is a waste of resources and risks reducing the effectiveness of future promotions.


Capturing Signals

Signals are data about what users have done, things like purchases, or queries, or clicks. Lucidworks Fusion has built-in signal capture capability. For custom applications this just means posting a Javascript object to a URL. However, you can do all of this automagically by using Fusion App Studio. App Studio allows you to put together search applications with pre-written components. Through App Studio you can automatically capture signals with just one html-like tag.
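As a rough illustration of what "posting a Javascript object to a URL" amounts to, here is a sketch in Python. The endpoint path and the signal fields below are placeholders, not the documented Fusion signals API, so check the Fusion documentation for the real endpoint and schema.

```python
import requests

# Hypothetical endpoint and field names, for illustration only; the real
# Fusion signals endpoint and signal schema are documented by Lucidworks.
FUSION_SIGNALS_URL = "https://fusion.example.com/api/signals/products"

signal = {
    "type": "click",                # what the user did
    "params": {
        "userId": "u-12345",
        "docId": "sku-98765",       # the product the user clicked
        "query": "coffee grinder",  # the search that led to the click
    },
}

# Sending a list allows batching several signals in one request (illustrative).
resp = requests.post(FUSION_SIGNALS_URL, json=[signal], timeout=5)
resp.raise_for_status()
```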

In Depth

All of this works using signal capture, where an application sends Fusion a little data about what a user has done, combined with a machine learning algorithm and your data. Specifically, recommendations are based on a machine learning algorithm called Alternating Least Squares (ALS). Roughly speaking, ALS learns a compact numerical profile for each user and each item from the interaction data, alternating between refining the user profiles and the item profiles; users and items whose behavior is "most similar" end up with similar profiles, and those matches drive the recommendations.
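This is not Fusion's implementation, just a bare-bones sketch of the alternating least squares idea: factor a user-by-item interaction matrix into user factors and item factors, alternating between solving for one while the other is held fixed.

```python
import numpy as np

def als(ratings, factors=2, iterations=20, reg=0.1):
    """Toy alternating least squares on a dense user-by-item matrix."""
    n_users, n_items = ratings.shape
    rng = np.random.default_rng(0)
    user_f = rng.normal(size=(n_users, factors))
    item_f = rng.normal(size=(n_items, factors))
    eye = reg * np.eye(factors)
    for _ in range(iterations):
        # Solve for user factors with item factors fixed, then vice versa.
        user_f = np.linalg.solve(item_f.T @ item_f + eye, item_f.T @ ratings.T).T
        item_f = np.linalg.solve(user_f.T @ user_f + eye, user_f.T @ ratings).T
    return user_f, item_f

# Rows are users, columns are items; 1 = purchased/clicked, 0 = no interaction.
interactions = np.array([[1, 1, 0, 0],
                         [1, 0, 1, 0],
                         [0, 0, 1, 1]], dtype=float)

users, items = als(interactions)
scores = users @ items.T          # predicted affinity for every user/item pair
print(np.round(scores, 2))        # recommend the highest-scoring unseen items
```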

One of the most powerful aspects of Fusion’s recommendations is that they are “just search”, which means that your developers can slice and dice and combine them with other criteria (geography, stated preferences, “not the Eagles”).

Next Steps

“We’ve seen our conversions increase more than we had initially hoped for,” said Andy House, IT Director of Menards. If conversions are what you’re after, consider looking into Lucidworks Fusion and its advanced recommendations and machine learning capabilities. If you’re interested in rapidly developing search applications that are ready to take advantage of these capabilities, consider adding Fusion App Studio.



The post Increase Retail Sales with Recommendations appeared first on Lucidworks.

Ed Summers: Structure

planet code4lib - Wed, 2018-01-24 05:00

From Chaffee & Lemert (2009), p. 126:

It is often said that the most important tool of structural methods in all fields is the archive – that is, the discovered (or maintained) residue of mental or cultural events held in libraries and other social forms of memory. The structural method is often referred to as archeological or geological in nature – a systematic reconstruction of evidently dead and buried events and objects.

This is the last mention of the archive that Chaffee and Lemert make in this overview of Structuralism and Poststructuralism, so we’re kind of left to fill in the blanks about who often said it. But the subsequent discussion of Derrida and Foucault is a good indicator of who they might be thinking of.

For someone who studies information it’s quite surprising to see how quickly this little passage moves from talking about libraries to forms of social memory, to physical processes like geology. It reminds me a bit of Jenkinson’s idea of naturalness which Caravaca (2017) relates to the Italian concept of sedimentation in archives. How do our archival practices shape that sedimentation process?

As I write this on a computer, in the form of a blog post, by placing a text file on the file system as Markdown, running a piece of software to convert the text to HTML, and publishing it to the web using another piece of software, it’s interesting to think about how the structuring method of the archive is present in the form of computation. Does computation disrupt or distort this sedimentation process? Does the debate around the roles of la parole (speaking) and la langue (a community’s language) in Structuralism and Poststructuralism have anything useful to say about archival materials and their deployment on the web?

I guess I’ll find out more when I read Foucault (1966) and Bourdieu (1977) later this semester.


Bourdieu, P. (1977). Outline of a theory of practice (Vol. 16). Cambridge University Press.

Caravaca, M. M. (2017). The concept of archival “sedimentation”: Its meaning and use in the italian context. Archival Science, 17(2), 113–124.

Chaffee, D., & Lemert, C. (2009). The new Blackwell companion to social theory. In B. S. Turner (Ed.). Wiley-Blackwell.

Foucault, M. (1966). The order of things: An archaeology of the human sciences. Pantheon.

District Dispatch: Advocating for libraries: Tips for talking to legislators

planet code4lib - Tue, 2018-01-23 20:39

The FY 2019 federal budget proposal will be released in the coming weeks: what’s your strategy to help #FundLibraries? Year-round advocate and Idaho State Librarian Ann Joslin offers valuable tips for effectively engaging your members of Congress.

Few legislators will go on record saying they don’t like libraries, so enlisting legislative support should be simple, right? But engaging with elected officials doesn’t always equate to consistent support for libraries and library issues. With members of Congress heading back to their districts soon for the holiday break, what are the most effective ways to talk with them about libraries?

Idaho State Librarian Ann Joslin first got to know Representative Mike Moyle by sharing an Idaho book with him in the rotunda of the state capitol building. Rep. Moyle is now the House Majority Leader.
Photo credit: Idaho Commission for Libraries

I have been Idaho’s state librarian since 2005 and held several positions in the state library agency for 26 years before that. The Idaho Commission for Libraries—the state’s library development agency—relies on a mixture of state and federal funding to execute its mission of assisting the more than 850 public, school, academic, and special libraries in Idaho. And with Idaho libraries being as varied as the state’s landscape in terms of size, remoteness, and relevance in their community, creating effective statewide library programs and services remains challenging.

Speaking to Idaho’s legislators was not easy or comfortable at first, but through repetition and years of practice, I’ve learned some things that have increased my effectiveness:

  • Don’t overestimate what legislators know (or understand) about your cause. Be ready to educate them each legislative session, and do so in engaging ways. Legislators are busy people who get a lot of information thrown at them, often about unfamiliar subjects and issues. Make it simple for legislators to absorb—and remember—your library’s story and why it’s important to their constituents and your state. For example, we have created eye-catching brochures and infographics that clearly illustrate facts, statistics, and issues in a way that is easy to understand and retain. And these materials can be customized with data about a legislator’s district or a particular part of the state.
  • Research your legislators. Focus your efforts on senators and representatives whose stated goals and interests align the best with the library mission. For example, one of our agency’s strategic goals concerns workforce development, which is also a stated priority for one of our key legislators. In our correspondence with this legislator, we demonstrate specific library and agency programs, services, and assistance that the community uses to spur workforce and economic development. We also include a success story that highlights the issue and the effect the agency and library have on the topic and the community. The story personalizes the subject and makes it more relevant and memorable. And we customize a version of a brochure or infographic to reinforce the topic.
  • Tailor your message. Find out what messages might resonate with the people who have control over the purse strings and tailor a pitch to each one. This can go a long way toward securing the funding you’re looking for. And be prepared to share that targeted message—anywhere. If you unexpectedly encounter a legislator—maybe in line at your favorite coffee place—introduce yourself and bring up the library.

In Idaho, we are working to get the library message out in new and different ways, such as through direct, consistent follow-up with each of our senators and representatives throughout the year. In addition, we conduct personalized outreach to them arising from specific opportunities, like media coverage of a member of Congress reading to school children.

We are also helping a group of library directors to be more effective spokespeople in a variety of settings, from speaking in front of the Rotary Club to doing a live TV interview. In our next training session, we plan to include library staff members, because one thing we learned is that the director is not always the best choice to represent the library. A young staff member who is passionate about social media might be the right person to handle those duties—as long as clear guidelines, responsibilities, and expectations have been established in advance.

Advocacy, engagement, outreach, public relations, and good, old-fashioned schmoozing are all components of disseminating the library message to the appropriate audience. And they are ongoing and ever-changing. So be persistent in your efforts and unafraid to alter your course. Don’t let the “because we’ll never get his or her support” mentality limit your future achievements.

Ann Joslin is the Idaho State Librarian. This post first appeared in American Libraries’ blog The Scoop

If you haven’t already, register here for National Library Legislative Day 2018 (May 7-8) – you can come to Washington or participate from home!

The post Advocating for libraries: Tips for talking to legislators appeared first on District Dispatch.

Islandora: Islandora Camp HRM 2018 - Call for Proposals

planet code4lib - Tue, 2018-01-23 16:06

Doing something great with Islandora that you want to share with the community? Have a recent project that the world just needs to know about? Send us your proposals to present at iCamp HRM! Presentations should be roughly 20-25 minutes in length (with time after for questions) and deal with Islandora in some way.

You can see examples of previous Islandora camp sessions (albeit from a long time ago!) on our YouTube channel.

The Call for Proposals for iCampHRM in Halifax will be open until April 1st.


David Rosenthal: Herbert Van de Sompel's Paul Evan Peters Award Lecture

planet code4lib - Tue, 2018-01-23 16:00
In It Isn't About The Technology, I wrote about my friend Herbert Van de Sompel's richly-deserved Paul Evan Peters award lecture entitled Scholarly Communication: Deconstruct and Decentralize?, but only in the context of the push to "decentralize the Web". I believe Herbert's goal for this lecture was to spark discussion. In that spirit, below the fold, I have some questions about Herbert's vision of a future decentralized system for scholarly communications built on existing Web protocols. They aren't about the technology but about how it would actually operate.

My questions fall into two groups: questions about how the collaborative nature of today's research maps onto the decentralized vision, and questions about how the decentralized vision copes with the abuse and fraud that are regrettably so prevalent in today's scholarly communication. For some of the questions I believe I spotted the answer in Herbert's talk; my answers are in italics.
Collaboration

Tabby's Star Flux

I was one of the backers of the "Where's The Flux" Kickstarter, astronomer Tabetha Boyajian's successful effort to crowdfund monitoring the bizarre variations of "Tabby's Star" (KIC 8462852) which, alas, are no longer thought to be caused by "alien megastructures".

A 2016 paper arising from this research is Planet Hunters X. KIC 8462852 – Where’s the flux?, with Boyajian as primary author and 48 co-authors. 11 of the authors list their affiliation as "Amateur Astronomer"; the discovery was made by the Planet Hunters citizen science project. The affiliations of the remaining 38 authors list 28 different institutions. 7 of the authors list multiple affiliations, one of them lists 3. The paper acknowledges support from at least 25 different grants from, by my count, 10 different funders. The paper combines observations from at least 7 different instruments and 2 major astronomical databases. (There is actually a more recent paper about this star with authors listing 114 different affiliations, but I was lazy).

As I understand it, Herbert's vision is to have researchers in control of access to and the content of their own output on the Web, in "pods" hosted by their institution. This sounds like a laudable goal. But in this case the output is from 49 authors who each have from 0 to 3 institutions. The question is, where is the paper? Is it:
  • In Boyajian's pod at Yale, the primary author's sole institution? She was post-doc there at the time of publication. Or at Louisiana State, where she is now faculty? Do we really want one person to have sole control over access to and the content of this collaborative work? What, for example, is to stop the primary author of a popular paper erecting a paywall? Or preventing access by a hated rival?
  • In each institutional author's pod at each of their institutions? That's 47 copies. But what about the citizen scientists? Are only credentialed researchers entitled to credit for and control of their work? Is the Planet Hunters project an institution?
  • In each author's pod at each of their 46 institutions, plus 11 personal pods some place else? That's a total of 57 copies of one paper. Lots Of Copies Keep Stuff Safe, but how can we ensure that all the copies remain the same? Presumably, each change to one copy notifies the others via their inbox, but this seems to be a security risk. To which of the putatively identical copies does the DOI resolve?
    • The primary author's? If so, what is the point of the other 56 copies?
    • One chosen at random?
    • All 57? What is the user to do with a DOI multiple resolution page with a list of 57 entries?
  •  At a single pod created specifically to hold the collaborative work? Is the primary author in control of the pod? Is there some method by which all authors vote on control of the pod?
  • At one pod at each of the 10 funders of the research? What about the citizen scientists? Which of the multiple researchers funded by each funder is in control of the pod at that funder? Or is the funder in control of the pod? How are the copies maintained in sync?
Presumably, the raw data resides at the instruments and databases from which it was sourced. Is this raw data linked by Signposting to the processed data reported in the paper? To the paper itself? How do the instruments and databases know to create the links, and from which parts of the data? Presumably, this requires the databases and instruments to implement their own pods which implement the inbox and notification protocols.

If there are multiple copies of the processed data, are they linked by Signposting? If so, how does this happen? Presumably again via the pods notifying each other.

Where does the processed data reside?
  • In Boyajian's pod at Yale? Or at Louisiana State? Much if not all the work of processing was done by others, so why does Boyajian get to control it, and move it around with her?
  • In each of the pod(s) of the authors who did the processing? The data might be voluminous, and we're now storing many copies of it, so who gets to pay the storage costs (for ever)? Which of these pods does which copy of the paper point to? If its the copy in each researcher's pod, and those who did the processing point to their own copy of the processed data, the copies of the paper in the multiple pods are no longer the same, so how are they kept in sync?
  • In a single pod created to hold the processed data (and the paper itself)? Again, who gets to control this pod and, presumably, pay the storage costs (for ever)?
These questions relate to a paper with a lot of authors. A different set of questions relate to a paper with a single author, but these are left as an exercise for the reader.
Fraud and Abuse

Digital information is malleable. Let's assume for the sake of argument that papers (and data) once published are intended to be immutable. Do authors sign their papers (and the data)? If so, there are a number of previously unsolved problems:
  • How do we implement a decentralized Public Key Infrastructure? 
  • How do we validate signatures from the long past? Doing so requires a trusted database of the time periods during which keys were valid (see Petros Maniatis' Ph.D. thesis).
  • How do we update signatures (which are based on hashes) when the hash algorithm becomes vulnerable? Do we need to get all 49 authors to update their signatures, and if so, how? Or is it Boyajian's responsibility to update her signature? She's not a computer scientist; how does she know which hash algorithm was used and that it is now vulnerable? And now 48 of the signatures are outdated. This is easy to detect, so does it matter?
  • How do we detect, and recover from, leakage of a researcher's private key?
Signatures on papers and data allow for post-publication changes to be detected. If such a change is detected, we would like to recover the previous state. Herbert envisages that publication would automatically cause the published version to be archived, creating a copy that is outside the control of the author. In the case where a copy resides in each author's pod, does publication trigger archiving of each of them? If not, how does the archive know which pod's copy to archive? I believe Herbert views signatures as impractical, and depends on archiving at the point of publication as the way to detect and recover from post-publication changes.
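To make the signature questions concrete, here is a minimal sketch of signing and verifying a paper's bytes with an Ed25519 key pair, using the widely available cryptography package. It deliberately sidesteps the hard parts listed above: distributing keys, trusting them decades later, and rotating hash and signature algorithms.

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# In a real system each author would hold a long-lived key pair; here we
# just generate one for the sketch.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

paper = b"Planet Hunters IX. KIC 8462852 ... (published bytes)"
signature = private_key.sign(paper)

# A later reader (or an archive) verifies whichever copy it holds.
tampered = paper + b" [quietly edited after publication]"
for label, candidate in [("original", paper), ("tampered", tampered)]:
    try:
        public_key.verify(signature, candidate)
        print(label, "verifies")
    except InvalidSignature:
        print(label, "does NOT verify")
```

Verification only answers "was this signed by this key?"; whether the key was really the author's, whether it had leaked, and whether the algorithm is still sound are exactly the unsolved questions above.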

Since archives are required to have mechanisms to detect and recover from damage or loss, does not the archived copy become in effect the canonical one? If so, why does not the DOI resolve to the reliable, persistent, canonical copy in the archive? If so, what is the role of the copies in the authors' pods? How do we implement and sustain a decentralized archiving infrastructure? If, as is highly likely, the archiving infrastructure is or becomes centralized, what has the decentralized pod structure achieved?

If authors do not sign their papers, how are post-publication changes detected? By comparing the copies in the pods with the copy in the archive(s)? Who is responsible for regularly making this comparison, and how are they funded? What happens when a change to a copy in an author's pod is detected? How does this result in the status quo being restored?
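Absent signatures, detecting change reduces to fixity checking: some process has to fetch the copies and compare them against the archived version. A minimal sketch follows, with invented URLs; who runs this, how often, and who pays for it are the open questions.

```python
import hashlib
import requests

# Hypothetical locations of the "same" paper; all URLs are made up.
COPIES = {
    "archive": "https://archive.example.org/papers/kic8462852.pdf",
    "author-pod": "https://pods.example.edu/boyajian/kic8462852.pdf",
}

def sha256_of(url):
    """Download one copy and return its SHA-256 digest."""
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return hashlib.sha256(resp.content).hexdigest()

digests = {name: sha256_of(url) for name, url in COPIES.items()}
if len(set(digests.values())) > 1:
    print("Copies have diverged:", digests)
else:
    print("All copies match:", digests)
```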

In the real world, papers and data, once published, are not immutable. According to whether the changes are in the author's or the public interest, there are four different kinds of change, shown in the table.

Four kinds of post-publication change

                          Public interest
                          Yes      No
  Author's      Yes        1        3
  interest      No         4        2
Let's look at each of these kinds of change:
  1. There are changes that the author wishes to make and are in the public interest that they should make, such as errata. Each author is in control of their own pod and its content, so they can make the change there. Does this change trigger a new version at the archive? But in the case where the paper resides in each author's pod, how are these changes made consistently? Does each one cause a new version at the archive, requiring the archive to deduplicate versions? I believe Herbert would answer yes, the archive should deduplicate the multiple versions.
  2. There are changes that are in the interest of neither the author nor the public. For example, suppose some malefactor guesses the author's password and alters some content. If the papers aren't signed, how is this distinguished from an authorized change? Even if they are signed, how are readers and the archive to know that the author's private key hasn't leaked?
  3. There are changes that the author wishes to make but which are not in the public interest. For example, the author may wish to make changes to conceal fraud. How can such changes be distinguished from those that are in the public interest (case 1)? Do we depend on the author's pod notifying the world about a change that the author does not want the world to know about?
  4. There are changes that should be made in the public interest but which are against the wishes of the author, for example the analog in this vision of a retraction. If the DOI resolves to the copy in the author's pod, how are readers made aware that the paper has been retracted?
The last case is the most interesting. Herbert's example of Bob commenting in his pod on a document Alice published mentions that Alice gets notified and "if all is well" links to Bob's comment. What if Alice doesn't like Bob's comment? Alice is in control of her own storage, so there is no way to force her to link to Bob's comment. Presumably, in this vision, the "certification" function happens because there are entities (journals in the current system) that do or do not "like" papers. What's to stop authors claiming that their paper has a "like" from Nature? Of course, Nature would not have a link "like"-ing the paper. But who is going to detect the conflict between the author and Nature? And how is it to be resolved, given that only the author(s) can remove the false claim?

Herbert envisages that reviews would also be archived at the point of publication. Would the archive link from Alice's paper to Bob's unfavorable review that the copy in Alice's pod did not link to? This does not seem archive-like behavior.
Conclusion

I've raised a lot of questions here. It seems to me that there are several fundamental issues underlying them:
  • The vision depends on a trusted archival infrastructure which, as I've pointed out, we have so far failed to implement adequately, or in a decentralized enough manner.
  • The vision depends on automated processes run by some trusted institution that regularly scan looking for changes, missing links, etc.
  • The pods' inboxes are discoverable, so they will get spammed and attacked. Thus they will need anti-spam and anti-virus capabilities. Incoming messages will have to be reviewed by the owner before being acted upon.
  • In general, like the original (and decentralized) Web, the vision depends on everyone involved acting from the best motives. Which, in some sense, is how we got where we are.
I agree with Herbert that his vision, as it stands, has little chance of being implemented. But he has raised a very large number of interesting questions. It is worth noting that many of them are not in fact specific to a decentralized system.

Terry Reese: MarcEdit 7 Updates

planet code4lib - Tue, 2018-01-23 13:35

I posted updates yesterday that spanned all versions of MarcEdit. The updates were primarily based on the thoughts I’ve been having around Unicode normalizations, and the need to provide a way to formalize and standardize normalizations across records.

Unicode Normalizations

In all versions of MarcEdit, there is a new setting in the preferences window:

When the option Enforce Defined Normalization is selected, the tool will ensure that all input and output created through the MarcEditor and MarcEngine conforms to the selected normalization. This allows users to find and replace data without needing to worry about normalization, and ensures that all UTF-8 data that comes out of MarcEdit conforms to a specific normalization.
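The underlying problem is that visually identical UTF-8 strings can be byte-for-byte different depending on their normalization form, so a find-and-replace can silently miss matches. The snippet below is not MarcEdit code, just a small Python illustration of why enforcing one normalization matters.

```python
import unicodedata

# "é" can be one precomposed code point (NFC) or "e" plus a combining
# accent (NFD); both render identically but are different code point sequences.
nfc = "P\u00e9rez"                        # precomposed é
nfd = unicodedata.normalize("NFD", nfc)   # e + U+0301 combining acute

print(nfc == nfd)                   # False
print(nfc in "Author: " + nfd)      # False: a naive find-and-replace misses it

# Normalizing all input and output to one form makes matching predictable.
print(unicodedata.normalize("NFC", nfd) == nfc)   # True
```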

Executable tasks

I had a question asking if MarcEdit’s Tasks could be created as executable files (like scripts). The answer is: kind of. I’ve added a new option to the tool – in the MarcEdit Task Manager, you can select a task for export and choose: Task Executable (*.exe)

What this does is allow you to create a new program that wraps execution of the task. This doesn’t make the task portable or a stand-alone file: the executable assumes MarcEdit is installed and that the task lives in your task store – but it gives you a shortcut of sorts that can be used to just drag and drop records for processing. On the desktop, your new executable task will look like:

You can drag files to process onto the icon. The program will run the task, writing its output to a file with the same name as the input, with .rev.[extension] appended.
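Conceptually, the generated executable is a thin wrapper: take each dropped file, run the stored task against it, and write the result next to the input with .rev inserted into the name. A rough Python sketch of that idea follows; run_marcedit_task is a hypothetical stand-in for the real call into the installed MarcEdit, and the exact output naming should be adjusted to match what the tool actually produces.

```python
import sys
from pathlib import Path

def run_marcedit_task(task_name: str, source: Path, output: Path) -> None:
    """Hypothetical stand-in for invoking the installed MarcEdit task engine."""
    raise NotImplementedError("replace with the real MarcEdit invocation")

def output_path_for(source: Path) -> Path:
    # e.g. "records.mrc" becomes "records.rev.mrc", mirroring the
    # .rev.[extension] convention described above.
    return source.with_name(source.stem + ".rev" + source.suffix)

if __name__ == "__main__":
    # Files dragged onto the icon arrive as command-line arguments.
    for dropped in map(Path, sys.argv[1:]):
        run_marcedit_task("MyTask", dropped, output_path_for(dropped))
```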

This is still somewhat of a proof of concept – your mileage may vary. But if you try it, let me know how it goes.


District Dispatch: FCC finds broadband speed standards sufficient

planet code4lib - Tue, 2018-01-23 11:53

Last week, the Federal Communications Commission (FCC) put out a fact sheet on its (almost) yearly report on broadband: how broadband is being deployed, and what speeds and characteristics the FCC will use to measure that deployment. The report keeps in place some crucial standards that ALA had been tracking.

FCC Chairman Ajit Pai had previously suggested that the commission’s current standard for home broadband of 25 Mbps down and 3 Mbps up, defined under a previous FCC chairman, was perhaps unnecessarily high. The Chairman suggested that the emergence of mobile services, combined with traditional wireline broadband, provides sufficient connections for people. So he had proposed a combined fixed and mobile broadband standard of 10 Mbps down and 1 Mbps up. ALA, several members of Congress and other advocates were concerned, and as we noted in our comments to the FCC: rather than ensuring commercial ISPs are meeting consumers’ needs and holding carriers accountable to existing standards, the FCC may just be choosing to “make the test easier” for those providers.

In the end, the FCC’s report keeps the definition of broadband at 25 Mbps down and 3 Mbps up and opted to continue evaluating mobile and fixed services separately. The report also says that the FCC should consider what broadband access means “holistically,” though the fact sheet released provides little indication of what that means. On the one hand, the fact sheet indicates that broadband is being deployed in a “reasonable and timely manner” (a departure from the previous FCC). But two days earlier, Chairman Pai proposed an order to give small carriers over $500 million to deploy broadband in rural areas, indicating that, at least in these areas, broadband is not being deployed in a timely enough fashion.

This is a dynamic we will be following closely. As a policy matter, it seems the majority at the FCC would like to continue pushing for deregulation as the fact sheet argues that the reason broadband is now being deployed in a timely fashion is because of certain deregulatory actions of this FCC. But when funds are made available to support broadband, it seems the FCC’s preference is to focus on rural areas.

We look forward to the release of the full report to address additional issues. In our comments, we asked the FCC to maintain the benchmarks for broadband to libraries set in 2014 as part of the modernization of the E-rate program: for libraries serving fewer than 50,000 people, the FCC recommended a minimum broadband speed of 100 Mbps; for libraries serving more than 50,000, it recommended a speed of at least 1 Gbps. We also hope the FCC will work with us to find other metrics that might help our shared policy goals of ensuring well-connected anchor institutions.

The post FCC finds broadband speed standards sufficient appeared first on District Dispatch.

Code4Lib: Seeking a Local D.C. Host

planet code4lib - Mon, 2018-01-22 21:02

Looking for any D.C. locals who could host a guest for the 2018 C4L. I'll be traveling from Philadelphia by car and am hoping to bike or metro to the Omni Shoreham Hotel. I have a potential registration that could be transferred to me if I can find lodging. Our travel budget is tight, so I need to reduce costs as much as possible. Thanks all!

Lucidworks: Keeping Retail Sites Up 24x7x365

planet code4lib - Mon, 2018-01-22 19:21

In a global economy there is no downtime. Even for national or local retailers, downtime has become a thing of the past. People buy at odd hours. Customers who have a bad experience have other options that are up 24×7. Search is the core of the Internet and even more so the core of how people buy things on the Internet.

24×7 Core Infrastructure

Despite this, many e-commerce platforms and even dedicated search solutions treat availability as an afterthought. One well-known search solution actually requires you to store everything and scale a NAS in order to achieve availability. Not only will that not scale, but it isn’t very dependable.

Lucidworks Fusion is built upon a Solr core architecture. This architecture is tried and true: it is the same general approach the Internet giants use to scale to great size while maintaining levels of uptime that used to be uncommon.

Meanwhile the world is becoming more volatile. Whether it is climate change making weather patterns more erratic, violence around the world, or just plain old fiber cuts, you need to make sure you can handle a data center or cloud “Availability Zone” going down. This is known by many names such as Disaster Recovery, WAN replication, and CDCR. The bottom line is you need to stay up, no matter what!
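To make the availability point concrete, the sketch below shows the kind of client-side failover a Solr-based deployment makes straightforward: ping the primary cluster and fall back to a replica in another data center or availability zone. The cluster URLs and collection name are invented, and a production deployment would lean on SolrCloud replication, CDCR and a load balancer rather than ad-hoc client code.

```python
import requests

# Hypothetical Solr endpoints in two availability zones.
CLUSTERS = [
    "https://solr-us-east.example.com/solr",
    "https://solr-us-west.example.com/solr",
]
COLLECTION = "products"

def healthy(base_url: str) -> bool:
    """Use Solr's ping handler to check that a cluster can serve queries."""
    try:
        r = requests.get(f"{base_url}/{COLLECTION}/admin/ping", timeout=2)
        return r.ok
    except requests.RequestException:
        return False

def search(query: str) -> dict:
    """Query the first healthy cluster, failing over if the primary is down."""
    for base in CLUSTERS:
        if healthy(base):
            r = requests.get(
                f"{base}/{COLLECTION}/select",
                params={"q": query, "wt": "json"},
                timeout=5,
            )
            r.raise_for_status()
            return r.json()
    raise RuntimeError("no healthy Solr cluster available")

print(search("running shoes")["response"]["numFound"])
```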

24x7x365 Analytics

Recently, everyone has been getting into the personalization and analytics/data science business. Supposedly you’re going to profile your customers, send that off to a team of data scientists who will load it into notebooks, and they’ll send you back something actionable. There are a lot of crickets in that.

With Fusion Insights you can see this kind of customer profile information out of the box, in real time, whenever you want. Combined with our advanced AI technology, you can also automate away most of what you’d otherwise do with this data by hand. From promotions to recommendations, you can automatically give each user exactly what they’re looking for.

And yes, if you want, you can just pull the user events into plain old Excel (or your favorite analytics tool) or query them with plain old SQL.
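As a plain illustration of that last point, once click and purchase events are exported into any SQL-capable store they can be analyzed with ordinary queries; the event schema below is invented for the example.

```python
import sqlite3

# Invented event schema: one row per user interaction.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, query TEXT, action TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("u1", "running shoes", "click"),
        ("u1", "running shoes", "purchase"),
        ("u2", "running shoes", "click"),
        ("u3", "rain jacket", "click"),
    ],
)

# Plain old SQL: which queries convert clicks into purchases?
rows = conn.execute(
    """
    SELECT query,
           SUM(action = 'purchase') * 1.0 / SUM(action = 'click') AS conversion
    FROM events
    GROUP BY query
    """
)
for query, conversion in rows:
    print(query, conversion)
```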

24x7x365 Updates

Stuff is happening, inventory is being received. You need a system that doesn’t need to go down for batch updates. You need an architecture that can be updated in real-time. If you have other systems that operate in batch, you need to be able to share the load and get it done as soon as inhumanly possible. If not, you’re losing real money.

Fusion’s Solr architecture is built like the Internet. It can take streams of data at real-time speeds and make them readily available to your customers. Meanwhile, Fusion’s ingestion architecture can take and transform data using distributed computing technology so that as many nodes as possible are involved in getting the work done. This means your data is updated and ready as fast as you need it to be.
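In practice, "updated in real time" means pushing documents to the index as they change instead of waiting for a nightly batch. Here is a minimal sketch against a plain Solr update endpoint; the URL, collection name and document fields are invented, and Fusion's ingestion pipelines add transformation and distributed processing on top of this.

```python
import requests

# Hypothetical Solr collection for product data.
SOLR_UPDATE = "https://solr.example.com/solr/products/update/json/docs"

def index_inventory_change(doc: dict) -> None:
    """Send one changed product document; commitWithin makes it searchable
    within about a second instead of waiting for a batch commit."""
    r = requests.post(
        SOLR_UPDATE,
        params={"commitWithin": 1000},
        json=doc,
        timeout=5,
    )
    r.raise_for_status()

index_inventory_change(
    {"id": "sku-1234", "name": "Trail Running Shoe", "in_stock": 17}
)
```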

24x7x365 Changes

Data sources change, customers change, and therefore systems change. A modern retailer is constantly tweaking its algorithms for product search relevance, customer personalization, and everything in between. A modern e-commerce search solution needs to be open to change at any time.

Fusion’s Query Workbench lets your search developers see what they’re doing and test it. Fusion’s Index Workbench even lets you change the way you import data and see what those changes will mean before they go live. Fusion’s Experiments engine lets you do real A/B testing, so you can see which version of a query or customer-targeting method yields more sales.
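Under the hood, this kind of A/B testing comes down to deterministically assigning each user to a variant and comparing an outcome metric between the groups. The sketch below is a generic illustration of that idea, not the Fusion Experiments engine.

```python
import hashlib

def assign_variant(user_id: str, experiment: str = "relevance-v2") -> str:
    """Deterministically bucket a user so they always see the same variant."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "B" if int(digest, 16) % 2 else "A"

# Toy outcome data: (user_id, purchased?)
events = [("u1", True), ("u2", False), ("u3", True), ("u4", False), ("u5", True)]

totals = {"A": [0, 0], "B": [0, 0]}          # [purchases, visitors]
for user_id, purchased in events:
    bucket = totals[assign_variant(user_id)]
    bucket[0] += int(purchased)
    bucket[1] += 1

for variant, (purchases, visitors) in totals.items():
    rate = purchases / visitors if visitors else 0.0
    print(f"variant {variant}: {purchases}/{visitors} = {rate:.0%}")
```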

24x7x365 Search UI Development

Supposedly, every time you want to make a change you need a JavaScript developer to wind through a mountain of code and add some new functionality (type-ahead, recommendations, promotions, whatever). This is the way we’ve always done it in the Interweb era. However, it is a lot slower than the client-server days, when you could drag and drop in a WYSIWYG interface. Besides, surely someone has developed nearly everything we can think of by now. Why can’t your e-commerce search be “Legoware,” letting you rapidly re-arrange the blocks when it is time to update the interface?

As it turns out, you’ve got one up on the whiz-kid JavaScript developer. Lucidworks has already written nearly any advanced feature of your search UI that you can think of. It is a matter of wiring the right components together and putting your own skin on. When it is time to change, move stuff around or add new components. Again, you need to be able to change your site at any moment and with rapid turnaround. Fusion App Studio lets you do just that.

Fusion is your 24x7x365 Search Solution

You’ve got to stay up and accept changes to your data, your system, and your UI at any time, all while providing the most personalized, tailored customer experience the Internet allows. Lucidworks Fusion is built on the right architecture and has the right tools to let you do that.

Take Your Next Steps:

The post Keeping Retail Sites Up 24x7x365 appeared first on Lucidworks.

Karen G. Schneider: Keeping Council

planet code4lib - Sat, 2018-01-20 15:52

Editorial note: Over half of this post was composed in July 2017. At the time, this post could have been seen as politically neutral (where ALA is the political landscape I’m referring to) but tilted toward change and reform. Since then, Events Have Transpired. I revised this post in November, but at the time hesitated to post it because Events Were Still Transpiring. Today, in January 2018, I believe even more strongly in what I write here, but take note that the post didn’t have a hidden agenda when I wrote it, and, except where noted, it still reflects my thoughts from last July, regardless of ensuing events. My agendas tend to be fairly straightforward. — KGS


Original Post, in which Councilors are Urged to Council

Edits in 2018 noted with bolding.

As of July 2017, I am back on ALA Council for my fifth (non-consecutive) term since joining the American Library Association in 1991. In June I attended Council Orientation, and though it was excellent–the idea that Councilors would benefit from an introduction to the process is a welcome one that emerged over the last two decades–it did make me reflect on what I would add if there had been a follow-on conversation with sitting Councilors called “sharing the wisdom.” I was particularly alerted to that by comments during Orientation which pointed up a traditional view of the Council process where ALA’s largest governing body is largely inactive for over 350 days a year, only rousing when we prepare to meet face to face.

Take or leave what I say here, or boldly contradict me, but it does come from an abundance of experience.

You are a Councilor year-round

Most newly-elected Councilors “take their seats” immediately after the annual conference following their election — a factoid with significance. Council, as a body, struggles with being a year-round entity that takes action twice a year during highly-condensed meetings during a conference with many other things happening. I have written about this before, in a dryly wonky post from 2012 that also addresses Council’s composition and the role of chapters. I proposed that Council meet four times a year, in a solstice-and-equinox model. Two of those meetings (the “solstice” meetings) could  be online. (As far back as 2007 I was hinting around about the overhead and carbon footprint of Midwinter.) I doubt Midwinter will go to an online format even within the next decade–it’s a moneymaker for ALA, if less so than before, and ALA’s change cycle is glacial–but the proposal was intended to get people thinking about how Council does, and doesn’t, operate.

In lieu of any serious reconsideration of Council, here are some thoughts.

First, think of yourself as a year-round Councilor, even if you do not represent a constituency such as a state chapter or a division that meets and takes action outside of ALA. Have at least a passing familiarity with the ALA Policy Manual. Bookmark it and be prepared to reference it. Get familiar with ALA’s financial model through the videos that explain things such as the operating agreement. Read and learn about ALA. Share news. Read the reports shared on the list, and post your thoughts and your questions. Think critically about what you’re reading. It’s possible to love your Association, believe with your heart that it has a bright future, and still raise your eyebrows about pat responses to budget questions, reassurances that membership figures and publishing revenue will rebound, and glib responses about the value of units such as the Planning and Budget Assembly.

Come to Council prepared. Read everything you can in advance, speak with other Councilors, and apply solid reflection, and research if needed, before you finish packing for your trip. Preparation requires an awareness that you will be deluged with reading just as you are struggling to button up work at your library and preparing to be away for nearly a week, so skimming is essential. I focus on issues where I know I can share expertise, and provide input when I can. Also, I am proud we do memorial resolutions and other commemorations but I don’t dwell on them in advance unless I have helped write them or had close familiarity with the people involved.

Fee, Fie, Foe, Forum

Coming prepared to Council is one of those values Council has struggled with. Looking at the Council list for the week prior to Annual 2017, the only conversation was a discussion about the relocation of the Council Forum meeting room from one hotel to another, complete with an inquiry asking if ALA could rent a special bus to tote Councilors to and from the Forum hotel.

Council Forum is an informal convening that has taken place for decades to enable Council to discuss resolutions and other actions outside of the strictures of parliamentary procedure. It meets three times during ALA, in the evening, and though it is optional, I agree with the Councilor who noted that important work happens at this informal gathering.

I am conflicted about Forum. It allows substantive discussion about key resolutions to happen outside of the constrictive frameworks of parliamentary procedure. Forum is also well-run, with volunteer Councilors managing the conversation. But Forum also appears to have morphed into a substitute for reading and conversation in advance. It also means that Councilors have to block out yet more time to do “the work of the Association,” which in turn takes us away from other opportunities during the few days we are together as an Association. I don’t say this to whine about the sacrifice of giving up dinners and networking with ALA colleagues, though those experiences are important to me, but rather to point out that Forum as a necessary-but-optional Council activity takes a silo–that Brobdingnagian body that is ALA Council–and further silos it. That can’t be good for ALA. As Councilors, we benefit from cross-pollination with the work of the Association.

Resolved: To tread lightly with resolutions

New Councilors, and I was one of them once, are eager to solve ALA’s problems by submitting resolutions. Indeed, there are new Councilors who see resolutions as the work of Council, and there have been round tables and other units that clearly saw their work as generating reams of lightly-edited, poorly-written resolutions just prior to and during the conference.

There are at least three questions to ask before submitting a resolution (other than memorial and other commemorative resolutions):

  • Can the resolution itself help solve a problem?
  • Has it been coordinated with the units and people involved in the issue it addresses?
  • Is it clear and well-written?

There are other questions worth considering, such as: if the issue this resolution proposes to address cropped up a month after Council met, would you still push it online with your Council colleagues, or ask the ALA Executive Board to address it? Which is another way to ask, is it important?

Tread lightly with Twitter

Overall, since coming through the stress of living through the Santa Rosa fires, I’m feeling weary, and perhaps wary, of social media. Though I appreciate the occasional microbursts taking on idiots insulting libraries and so on, right now much of social media feels at once small and overwrought. If I seem quieter on social media, that’s true. (But I have had more conversations with neighbors and area residents during and after the fires than I have since we moved to Santa Rosa in early 2015, and those convos are the real thing.)

More problematically, as useful as Twitter can be for following real-world issues–including ALA–Twitter also serves as a place where people go to avoid the heavy lifting involved with crucial conversations. I find I like #alacouncil Twitter best when it is gently riffing on itself or amplifying action that the larger ALA body would benefit from hearing about. [the following, to the end of this post, is all new content] I like #alacouncil Twitter least when it is used as a substitute for authentic conversation, to insult other Councilors, or otherwise to undermine the discourse taking place in the meatware world. Twitter is also particularly good at the unthinking pile-on, and many people have vulnerabilities in this area that are easily exploited.

Sometimes those pile-ons hit me close to home, as happened a little over a year ago. Other times these pile-ons serve only to amuse the minx in me, such as when a Famous Author recently scolded me for “trafficking in respectability politics” because I was recommending a list of books written by writers from what our fearless leader calls “s–thole countries.” Guilty as charged! Indeed, I have conducted two studies where a major theme was “Do I look too gay?” I basically have a Ph.D. in respectability politics. And like all writers–including Famous Author–I traffic in them. I chuckled and walked on by.

Walking on by, on Twitter, takes different forms. As an administrator, I practice a certain pleasant-but-not-sugary facial expression that stays on my face regardless of what’s going on in my head. I’m not denying my emotions, which would be the sugary face; I’m managing them. It’s a kind of discipline that also helps me fjord difficult conversations, in which the discipline of managing my face also helps me manage my brain.

The equivalent of my Admin Face for me for #alacouncil Twitter is to exercise the mute button. I have found it invaluable. People don’t know they are muted (or unmuted). If only real life had mute buttons–can you imagine how much better some meetings would be if you could click a button and the person speaking would be silenced, unaware that you couldn’t hear them? Everyone wins. But that aside, I have yet to encounter a situation on Twitter when–for me–muting was the wrong call. It’s as if you stepped off the elevator and got away from that person smacking gum. Another car will be along momentarily.

My last thought on this post has to do with adding the term “sitting” before Councilors in the first part of this post. When I was not on Council I tried very hard not to be “that” former Councilor who is always kibitzing behind the scenes, sending Councilors messages about how things should be and how, in the 1960s, ALA did something bad and therefore we can never vote online because nobody knows how to find ALA Connect and it’s all a nefarious plot hatched by the ALA President, his dimwitted sycophants, and the Executive Board, and why can’t MY division have more representation because after all we’re the 800-pound gorilla (ok, I just got political, but you’ll note I left out anything about what should or should not be required for a Very Special Job).

Yes, once in a while I sent a note if I thought it was helpful, the way some of my very ALA-astute friends will whisper in my ear about policy and process I may be unfamiliar with. Michael Golrick, a very connected ALA friend of mine, must have a third brain hemisphere devoted to the ALA policy manual and bylaws. And during a time when I was asking a lot of questions about the ALA budget (boiling down to one question: who do you think you’re fooling?), I was humbled by the pantheon of ALA luminaries whispering in my ear, providing encouragement as well as crucial guidance and information.

But when I am no longer part of something, I am mindful that things can and should change and move on, and that I may not have enough information to inform that change. We don’t go to ALA in horse-and-buggies any more, but we conduct business as if we do, and when we try to change that, the fainting couches are rolled out and the smelling salts waved around as if we had, say, attempted to change the ALA motto, which is, I regret to inform you, “The best reading, for the largest number, at the least cost”–and yes, attempts to change that have been defeated. My perennial question is, if you were starting an association today, how would it function? If the answer is “as it did in 1893” (when that motto was adopted), perhaps your advice on a current situation is less salient than you fancy. You may succeed at what you’re doing, but that doesn’t make you right.

And with that, I go off to Courthouse Square today to make exactly that point about events writ much, much larger, and of greater significance, than our fair association. But I believe how we govern makes a difference, and I believe in libraries and library workers, and I believe in ALA. Especially today.


