Feed aggregator

State Library of Denmark: Changing field type in Lucene/Solr

planet code4lib - Mon, 2014-12-15 12:06
The problem

We have 25 shards of 900GB / 250M documents. It took us 25 * 8 days = half a year to build them. Three fields did not have DocValues enabled when we built the shards:

  • crawl_date (TrieDateField): Unknown number of unique values, 256M values.
  • links_domains (multi value Strings): 3M unique values, 675M references.
  • links_hosts (multi value Strings): 6M unique values, 841M references.

We need DocValues on those fields for faceting, not just for speed and memory, but because Solr is technically unable to do faceting without them, at least on the links_domains & links_hosts fields: the internal structures for field cache faceting do not allow for the number of references we have in our index.

The attempted solution

Faced with the daunting task of re-indexing all shards, we put the challenge of avoiding it to Hoss at Stump the Chump. He suggested building a custom Lucene FilterReader with on-the-fly conversion, then using that to perform a full index conversion. Eureka, DVEnabler was born.

DVEnabler takes an index and a list of fields to adjust, then writes a corrected index. It is still very much Here Be Dragons territory and requires the user to be explicit about how the conversion should be performed. Sadly the Lucene index format does not contain the information required for a more automatic conversion (see SOLR-6005 for the status on that). Nevertheless it seems to have reached its first usable incarnation.
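
For readers curious what such a conversion looks like in code, here is a minimal sketch of the general idea: wrap the existing index with a filter reader that synthesizes DocValues for the listed fields, then copy everything out through an IndexWriter. This is not DVEnabler's actual code; it uses Lucene 4.10's stock UninvertingReader, which un-inverts via the field cache and would therefore likely hit the same limits on our high-cardinality link fields that DVEnabler works around. Paths and the exact type mappings are assumptions.

    import java.io.File;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.lucene.analysis.core.KeywordAnalyzer;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.uninverting.UninvertingReader;
    import org.apache.lucene.uninverting.UninvertingReader.Type;
    import org.apache.lucene.util.Version;

    public class DocValuesConversionSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical locations of the existing shard and the converted copy.
            Directory source = FSDirectory.open(new File("/index/shard-old"));
            Directory target = FSDirectory.open(new File("/index/shard-docvalues"));

            // Declare how each indexed-only field should be exposed as DocValues.
            Map<String, Type> mapping = new HashMap<>();
            mapping.put("crawl_date", Type.LONG);                  // TrieDateField stores a long
            mapping.put("links_domains", Type.SORTED_SET_BINARY);  // multi-valued String
            mapping.put("links_hosts", Type.SORTED_SET_BINARY);    // multi-valued String

            // Wrap the old index so the listed fields appear to carry DocValues.
            DirectoryReader wrapped =
                UninvertingReader.wrap(DirectoryReader.open(source), mapping);

            // Copying through an IndexWriter materializes the synthetic DocValues on disk.
            IndexWriterConfig cfg = new IndexWriterConfig(Version.LATEST, new KeywordAnalyzer());
            try (IndexWriter writer = new IndexWriter(target, cfg)) {
                writer.addIndexes(wrapped);   // single-threaded, hence the long running time
                writer.forceMerge(1);         // optional: finish with one optimized segment
            }
            wrapped.close();
        }
    }

In practice DVEnabler handles the per-field bookkeeping for you; the sketch mainly shows why the conversion amounts to a full, single-threaded rewrite of the index.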

We tried converting one of our shards with DVEnabler. The good news is that it seemed to work: our fields were converted to DocValues, we could perform efficient faceting, and casual inspection indicated they had the right values; a proper test is pending. The bad news is that the conversion took 2 days! For comparison, a non-converting plain optimize took just 8 hours.

Performance breakdown

Our initial shard building is extremely CPU-heavy: 8 days with 24 cores running 40 Tika-processes at 90%+ CPU utilization. The 8 days of wall-clock time correspond to 192 CPU-core days. Solr merge/optimize is single-threaded, so the conversion to DocValues takes 2 CPU-core days, or roughly 1/100 of the CPU resources needed for full indexing.
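
Spelled out, with the figures above (the stated 1/100 is a rounding of 2/192):

    \[
      \frac{\text{DocValues conversion}}{\text{full index build}}
      = \frac{2\ \text{CPU-core days}}{8\ \text{days} \times 24\ \text{cores}}
      = \frac{2}{192} \approx \frac{1}{100}
    \]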

At the current time it is not realistic to make the conversion multi-threaded to take advantage of the 24 cores. But it does mean that we can either perform multiple conversions in parallel or use the machine for building new shards while converting the old ones. Due to limited local storage, we can run 2 conversions in parallel, while moving unconverted & converted indexes to and from the machine. This gives us an effective conversion speed of 1 shard per day.


FOSS4Lib Upcoming Events: Islandora Conference

planet code4lib - Sun, 2014-12-14 21:56
Date: Monday, August 3, 2015 - 08:00 to Friday, August 7, 2015 - 17:00. Supports: Islandora, Fedora Repository, Drupal

Last updated December 14, 2014. Created by Peter Murray on December 14, 2014.

August 3 - 7, 2015, we invite Islandorians from the world over to join us in the birthplace of Islandora (Charlottetown, PEI) for a week of great food, (hopefully) beautiful weather, and all the Islandora you can handle.

Mark E. Phillips: What is a use?

planet code4lib - Sun, 2014-12-14 13:28

One of the metrics that we use for the various digital library systems that we run at work is the idea of an item “use”.

This post will hopefully explain a bit more about how a use is calculated and presented.

The different digital library systems that we operate (The Portal to Texas History, the UNT Digital Library, and the Gateway to Oklahoma History) make use of Google Analytics to log and report on access to these systems.  Below is a screenshot of the Google Analytics data for the last month related to The Portal to Texas History.

Google Analytics Screenshot for The Portal to Texas History

From Google Analytics we are able to get a rough idea of the number of users, sessions, and pageviews as well as a whole host of information that is important for running a large website like a digital library.

There are a number of features of Google Analytics that we can take advantage of that allow us to understand how users are interacting with our systems and interfaces.

One of the challenges we have with this kind of analytics is that it collects information only when triggered by JavaScript on the page, which can happen when the page is loaded or when something is clicked. This is sometimes not enough for our reporting because much of the content in our various digital libraries is linked to directly from outside resources, either embedded in discussion forums or by directing users straight to the PDF representation of the item.

A few years ago we decided to start accounting for this kind of usage of our systems in addition to the data that Google Analytics provides. In order to do this we developed a set of scripts that we run each night against the previous day's worth of log files on the application servers that serve our digital library content. These log files are aggregated to a single place, parsed, and then filtered to leave us with the information we are interested in for the day. The resulting data are the unique uses that an item has had from a given IP address during a 30-minute window. This allows us to report on uses of theses and dissertations that may be linked to directly from a Google search result, or an image that was embedded in another site's blog post that pertains to one of our digital libraries.
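
As a rough illustration of that counting rule (this is a sketch, not our actual scripts), the code below assumes a hypothetical pre-parsed, tab-separated record format of timestamp in milliseconds, client IP, and item identifier, and counts at most one use per item per IP within a 30-minute window.

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.HashMap;
    import java.util.Map;

    public class UseCounterSketch {
        private static final long WINDOW_MS = 30L * 60L * 1000L;   // 30-minute window

        public static void main(String[] args) throws IOException {
            Map<String, Long> lastCounted = new HashMap<>();   // "item|ip" -> last counted timestamp
            Map<String, Long> usesPerItem = new HashMap<>();   // item -> number of uses

            try (BufferedReader in = Files.newBufferedReader(
                    Paths.get(args[0]), StandardCharsets.UTF_8)) {
                String line;
                while ((line = in.readLine()) != null) {
                    String[] f = line.split("\t");   // hypothetical: timestampMillis, ip, itemId
                    long ts = Long.parseLong(f[0]);
                    String key = f[2] + "|" + f[1];

                    Long prev = lastCounted.get(key);
                    if (prev == null || ts - prev >= WINDOW_MS) {
                        usesPerItem.merge(f[2], 1L, Long::sum);  // a new "use" for this item
                        lastCounted.put(key, ts);
                    }
                }
            }
            usesPerItem.forEach((item, n) -> System.out.println(item + "\t" + n));
        }
    }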

Once we have the data for a given object we are able to aggregate that usage information to the collection and partner to which the item belongs. This allows us to show information about usage at the collection or partner level. Finally, the item use information is aggregated at the system level so that you can see the information for The Portal to Texas History, the UNT Digital Library, or The Gateway to Oklahoma History in one place.

Item page in the UNT Digital Library

The above image shows how an end user can see the usage data for an item on the item's About page. This shows up in the “Usage” section, which displays total uses, uses in the last 30 days, and uses yesterday.

Usage Statistics for item in the UNT Digital Library

If a user clicks on the stats tab they are taken to the item's stats page. They can see the most recent 30 days or select a month or year from the table below the graph.

Referral data for item in the UNT Digital Library

A user can view the referral traffic for a selected month or year by clicking on the referral tab.

Collection Statistics for the UNT Scholarly Works Repository in the UNT Digital Library

Each item use is also aggregated to the collection and partner level.

System statistics for the UNT Digital Library

And finally a user is able to view statistics for the entire system.  At this time we have usage data for the systems going back to 2009 when we switched over to our current architecture.

I will probably write another post detailing the specifics of what we do and don’t count when we are calculating a “use”.  So more later.

Ranti Junus: Digital Collections and Accessibility

planet code4lib - Sat, 2014-12-13 06:06

[This is a crosspost from the Digital Scholarship Collaborative Sandbox blog from the MSU Libraries.  The original blog post can be read there.  Do visit the blog and read the other posts written by my colleagues as well.]

Like many other academic libraries, our collection consists not only of print materials, but also of electronic collections. Typical electronic resources can be those we subscribe to through a vendor (ProQuest, JSTOR, Elsevier, etc.), or ones that we produce in-house, such as our exhibits site (https://www.lib.msu.edu/exhibits/).

We digitize a lot of stuff. The Library was busy working on digitization projects even before I joined in 2001, from the Making of the Modern Michigan, the Cookbooks project and Sliker Collection, and Sunday School Books, to, more recently, historic images from the Chicago Tribune. Or consider digital collections from other institutions such as the New York Public Library, the Library of Congress, the Smithsonian National Museum of Natural History, the World Digital Library, or the Digital Public Library of America (DPLA). There are a lot of digital collections produced by various libraries, archives, museums, and other institutions.

The typical outcomes from these digitization projects are images, metadata, and text, represented either as an image of printed or handwritten material or as a transcript. We then create a Web presence for these outcomes, including features like search, browse, and perhaps some additional application to display and interact with the images. User interaction with these digital collections should be straightforward: users should be able to visit the site, search or browse, and read the information presented on the page with ease. We also want to make the presentation of these collections pleasing to the eye, with background color or images, font type and color, and consistent placement of the images with the associated metadata (image on the top with metadata on the bottom, image on the left with metadata on the right, or whatever design decision we make to present the collection). We also want to make sure that our institution’s branding is visible. So we add the banner, image or logo of our institution, some navigation so visitors can also go to our main website, and footers to provide visitors with contact information, acknowledgement of the funder, a link to the privacy statement, etc.

Eventually, we produce a set of rich interfaces, chock full of images, text, and links. And probably some audio, too, for a sound project.

Given the ubiquitous nature of digital collections, the goal that these collections would be used as part of scholarly activities, and the library’s mission to disseminate the information as widely as possible, there is one aspect that many of us need to address when we plan for a digitization project: how do people with disabilities access these collections without getting lost? Can they also get the same access and benefit of our collections if they only rely on their screen readers (or refreshable Braille, or any other assistive technology)? Can people move around our website easily using just a keyboard (for those with hand-coordination difficulty who cannot use a mouse)?

Consider these questions when you begin working on any digital humanities project. Data visualization is now being used a lot. Sighted users can review the image representations easily; we can distinguish the information by shape and colors. Mundane data that used to be presented as text can now have a pretty face. Information can be conveyed faster because we can see the charts and colors right away without having to go through lengthy text. But how can those who rely on sound infer the information from those charts? Can color-blind people distinguish the color palette that you use? How are you going to explain the conclusion of your charts “verbally”? These are areas that have yet to be fully addressed. We still have a lot of work to do.

Some resources:

 

Galen Charlton: Our move, by the numbers

planet code4lib - Sat, 2014-12-13 02:14

We’ve landed in Atlanta, having completed our cross-country drive from Seattle.  Here are some numbers to ponder, with apologies to Harper’s magazine.

  • Humans: 2
  • Cats: 3
  • Miles as the car rolls: 3,600
  • Miles per gallon: 42.1
  • Average speed of the car: 174,720 furlongs per fortnight
  • Seconds spent pondering whether to use furlongs or smoots for the previous measure: 15
  • Cracked windshields: 1
  • Cats who forgot that if the tail is visible, the cat is visible: 1
  • Mornings that the cats were foiled by platform beds: 5
  • Mornings that the cats were foiled by an air mattress: 2
  • Mornings that the humans were foiled by a bed with an underneath: 2
  • Number of cats disappointed that said beds turned out to be moveable: 3
  • Hours spent experiencing the thrills of Los Angeles rush hour traffic: 3
  • Calls from a credit card fraud monitoring department: 1
  • Hotel hot tubs dipped into: 2
  • Restaurant restrooms with disconcerting signs: 1
  • Progress of feline excavation to China: no report
  • Fueling stops: 10
  • Net timezone difference: +3.0
  • Number of moving company staff involved: 9
  • Host cats consternated by the arrival of three interlopers: 4
  • Cats who decided to spend a few hours under the covers to bring down the number of whelms: 1
  • Tweets sent using the #SEAtoATL hashtag, including this post’s tweet: 23
  • Nights spent in California: 2
  • Nights spent in Texas: 3
  • Humans and cats happy to have arrived: 5

District Dispatch: How many pizzas can you get for $1.5 billion?

planet code4lib - Fri, 2014-12-12 23:09

Photo by Robert D Peyton via Flickr

Yesterday the Federal Communications Commission (FCC) brought the E-rate modernization proceeding to a conclusion with all the bravado it deserved. To a packed room, including library directors, teachers, a superintendent, a school principal and a handful of school students from D.C. public schools, the FCC staff presented the E-rate Order that the Chairman had circulated to his colleagues three weeks ago.

We at the American Library Association (ALA) had a pretty good understanding of what would be included in yesterday’s Order through our numerous briefings from the Chairman’s staff. Thankfully, it looks like things have not changed since the last update before the Commission’s sunshine period began (all hush-hush negotiations among the Commissioners when they no longer take public comment but finish digesting the public record to make sure the final item is commensurate with that record and the Commission’s own data and goals).

Despite interruptions during the open meeting from net neutrality protesters, the meeting and subsequent vote went forward smoothly. Richard Reyes-Gavilan spoke eloquently about the many services the D.C. Public Library provides all D.C. residents. These span the gamut from “basic human services such as applying for health benefits and communicating with loved ones” to those that change lives, as shown in a story from The Washington Post that he related:

“[A]bout a 69-year old man who was set on rebuilding his life after 40 years of incarceration. With few other places to turn, he began taking computer classes at the Martin Luther King Jr. (MLK) Library. He learned to apply for jobs online and now he’s employed full-time at the University of the District of Columbia.”

Reyes-Gavilan also showed the Commission where libraries with high-capacity broadband are heading, describing the DC Tech Meetup held monthly in the MLK Library where “hundreds of technologists gather to pitch ideas, demonstrate new products, and network with potential collaborators and funders.”

While Richard spoke on behalf of libraries during the meeting, he was joined by colleague Nicholas Kerelchuck, manager of the MLK Library Digital Commons, as well as Andrea Berstler, executive director of the Wicomico Public Library, and Rose Dawson, director of libraries at the Alexandria Library, each of whom could add similar examples of why high-capacity broadband is the currency of libraries today and why what the Commission accomplished can make a difference for so many libraries across the country.

What’s in the Order?

The shiny object that tops the list of all the press headlines and most of the organizations’ statements issued yesterday, including ALA’s, is the additional $1.5 billion that will be added to the program, immediately increasing available funding to $3.9 billion (plus annual inflation) from here on out.

Equally important to the additional funding are the policy changes the Commission adopted to address library broadband capacity issues, geared to help more libraries get the speeds they desperately need.

Throughout the E-rate proceeding, ALA pushed the Commission to take up this issue. Yesterday we were rewarded for our efforts. While we do not have the Order in hand yet to read the exact details, FCC staff described the changes (pdf) during the meeting. They will “maximize the options schools and libraries have for purchasing affordable high-speed broadband” and include:

  • Additional flexibility for libraries and schools seeking to purchase high-speed broadband by suspending the amortization requirements for new construction and allowing applicants to pay for the non-discounted portion over multiple years;
  • In 2016, equalizing the Commission’s treatment of dark and lit fiber and allowing for self-construction when these options are the most cost-effective solutions;
  • Providing up to a 10 percent match to state funding for construction with special consideration for tribal libraries and schools; and
  • Requiring carriers that receive Connect America funding to offer high-speed broadband to libraries and schools located in their service areas.

Larra, Marijke, and Alan with Chairman Wheeler

These rule changes have very real potential for libraries that have been struggling to increase broadband capacity. But do not fear. We at ALA, with the help of the E-rate task force, our telecommunications subcommittee, Bob Bocher, an OITP Fellow, and our consultants, are rolling up our sleeves to determine the most effective outreach and support activities. We will also be working with the Public Library Association (PLA), the Chief Officers of State Library Agencies (COSLA), as well as the Association for Rural and Small Libraries (ARSL), and other groups to make sure we address the most pressing concerns in the coming weeks.

What do pizza and E-rate have in common?

A number of my E-rate posts refer to my kids (they can tell you what universal service is, what the Senate Commerce Committee does, the difference between category 1 and 2, and what amortization means; we’re working on what a form 470 is—poor things). At least one of my posts refers to food. This one involves both.

My son texts me, “what’s for dinner” while we at the office are reflecting on the events of the day. So, because my fridge has been empty (really empty) for the last several weeks, I text back a math problem: how many pizzas can you buy for $1.5 billion? Well, about 150 million, which may be any teenager’s dream come true. But we won’t get any, no matter how big the check is, if we don’t place the order. The same holds true for libraries and E-rate. It would not matter if we had an additional $150 billion unless libraries place their order for E-rate eligible services—and in a big way.

It is true that the changes the Commission made to the program between the July Order and yesterday’s Order will take some real effort to navigate and it’s true that there are wrinkles that need ironing out by the Commission and USAC. This next year will undoubtedly be rocky while those who support E-rate applicants struggle to make sense of the changes and figure out how best to support individual libraries. It takes a village, and libraries should well understand that phrase if the stories I have been collecting are truly indicative of the tenacity of librarians today.

The Commission removed barriers in its rules to open the door for more libraries to get more funding for more broadband. It is now up to the library community to walk through those doors.

As always, more in the near future. Next post could very well have cocktails involved.

The post How many pizzas can you get for $1.5 billion? appeared first on District Dispatch.

Code4Lib: Code4Lib 2015 Diversity Scholarships

planet code4lib - Fri, 2014-12-12 21:35

The Code4Lib Scholarship Committee will award 5 diversity scholarships based on merit and need. Each scholarship will provide up to $1,000 to cover travel costs and conference fees for a qualified attendee to attend the 2015 Code4Lib Conference, which will be held in Portland, Oregon, from February 9 - 12, 2015.

Applications are due by December 31, 2014 at 5 PM EST (see below for more details).

DIVERSITY SCHOLARSHIP ELIGIBILITY, CRITERIA, AND REQUIREMENTS

To qualify for a scholarship, an applicant must be interested in actively contributing to the mission and goals of the Code4Lib Conference.

  • Two scholarships will be awarded to any woman or transgender person.
  • Two scholarships will be awarded to any person of Hispanic or Latino, Black or African-American, Asian, Native Hawaiian or Pacific Islander, or American Indian or Alaskan Native descent.
  • One scholarship will be awarded to the best remaining candidate who
    meets any of the previously mentioned eligibility requirements.

Eligible applicants may apply based on multiple criteria, but no applicant will receive more than one scholarship. Past winners of any Code4Lib scholarship are not eligible for a scholarship.

The scholarship recipients will be selected based upon their merit and financial needs.

Registration spots are being held for scholarship recipients. If you can attend only if you receive a scholarship, there is no need to register for the conference at this point. Scholarship recipients will receive a special link for free registration, or will be reimbursed if they have already registered.

Scholarship recipients are required to write and submit a brief trip report to the Code4Lib 2015 Scholarships Committee by April 1, 2015 to be posted to the Code4Lib wiki. The report should address: (a) what kind of experience they had at the conference, (b) what they have learned, (c) what suggestions they have for future attendees and conference organizers.

All reimbursement forms and receipts must be received by May 26, 2015.

HOW TO APPLY

To apply, please send an email to Francis Kayiwa (francis.kayiwa_at_gmail.com) with the subject heading Code4Lib 2015 Diversity Scholarship Application containing the following (combined into a single attached PDF, if possible):

  1. A brief letter of interest, which:
    • Identifies your eligibility for a diversity scholarship
    • Describes your interest in the conference and how you intend to
      participate
    • Discusses your merit and needs for the scholarship
  2. A resume or CV
  3. Contact information for two professional or academic references

The application deadline is Dec. 31, 2014, 5pm EST. The scholarship committee will notify successful candidates the week of Jan. 9, 2015.

SPONSORS

We would like to thank our sponsors for supporting the Code4Lib 2015 Diversity Scholarships. All sponsors have left how to award these diversity scholarships to the discretion of the Code4Lib 2015 Scholarship Committee.

CONFERENCE INFORMATION

For more information on the Code4Lib Conference, please see the conference website at http://code4lib.org/conference/2015. Write-ups of previous Code4Lib Conferences are also available.

Nicole Engard: Bookmarks for December 12, 2014

planet code4lib - Fri, 2014-12-12 20:30

Today I found the following resources and bookmarked them:

  • TAGS: TAGS is a free Google Sheet template that lets you set up and run automated collection of search results from Twitter.

Digest powered by RSS Digest

The post Bookmarks for December 12, 2014 appeared first on What I Learned Today....

Related posts:

  1. If This Then That
  2. LibraryThing adds another neat feature
  3. New Addition to Google

District Dispatch: National STEM Video Game Challenge kicks off

planet code4lib - Fri, 2014-12-12 19:49

The National STEM Video Game Challenge begins a series of workshops tomorrow, December 13, 2014, at the Free Library of Philadelphia as part of Computer Science Education Week. The Challenge is the result of a partnership between the Smithsonian, the Joan Ganz Cooney Center, and E-Line Media, and is also supported by IMLS, the Grable Foundation, AARP, and Mentor Up. It was inspired by President Obama’s “Educate to Innovate” campaign, which focuses on science, technology, engineering and math education.

The first 14 workshops are confirmed at the following locations:

  • December 13, 2014, 1-4pm: Free Library of Philadelphia, Lillian Marrero Branch
  • December 20, 2014, 1-4pm: Science Museum of Virginia, Richmond, VA
  • January 3, 2015, 1-4pm: Pima County Public Library, Tucson, AZ
  • January 14, 2015, 4-7pm: City of New Braunfels Public Library, New Braunfels, TX
  • January 24, 2015, 10am-2pm: Museum of History & Industry, Seattle, WA
  • January 25, 2015, 12-4pm: Port Townsend Public Library, Port Townsend, WA
  • January 26, 2015, 10am-1pm: Port Townsend Public Library, Port Townsend, WA
  • February 5, 2015, 6-8:30 pm: Billings Public Library, Teen Game Design Workshop, Billings, MT
  • February 6, 2015, 1-3:30 pm: Billings Public Library, Librarian Workshop, Billings, MT
  • February 7, 2015, 1-3:30 pm: Billings Public Library, Librarian Workshop, Billings, MT
  • February 7, 2015, 1-4pm: Solomon R. Guggenheim Museum Teen Workshop, New York, NY
  • March 6, 2015, 9am-2pm: SciTech Days, Carnegie Science Center, Pittsburgh, PA
  • March 7, 2015, 1-4pm: Carnegie Science Center, Teen Workshop, Pittsburgh, PA
  • March 7, 2015: Carnegie Science Center, Teacher Workshop, Pittsburgh, PA

Additional dates and locations can be found on their website. Librarians and museum staff are encouraged to visit the website and take advantage of the mentor resource kit, a hands-on workshop guide, game design resources and additional guidance.

Entries for the Video Game Challenge will be accepted through February 25, 2015. Only entries submitted by students in grades five through twelve will be accepted. For more information about the contest, visit the STEM Challenge website.

The post National STEM Video Game Challenge kicks off appeared first on District Dispatch.

William Denton: The sandbar

planet code4lib - Fri, 2014-12-12 18:08

This metaphor for life and death from Pale Gray for Guilt (1968), one of John D. MacDonald’s Travis McGee novels, came to mind the other day. I looked it up, and here it is for easy reference.

Picture a very swift torrent, a river rushing down between rocky walls. There is a long, shallow bar of sand and gravel that runs right down the middle of the river. It is under water. You are born and you have to stand on that narrow, submerged bar, where everyone stands. The ones born before you, the ones older than you, are upriver from you. The younger ones stand braced on the bar downriver. And the whole long bar is slowly moving down that river of time, washing away at the upstream end and building up downstream.

Your time, the time of all your contemporaries, schoolmates, your loves and your adversaries, is that part of the shifting bar on which you stand. And it is crowded at first. You can see the way it thins out, upstream from you. The old ones are washed away and their bodies go swiftly by, like logs in the current. Downstream where the younger ones stand thick, you can see them flounder, lose footing, wash away. Always there is more room where you stand, but always the swift water grows deeper, and you feel the shift of the sand and the gravel under your feet as the river wears it away. Someone looking for a safer place can nudge you off balance, and you are gone. Someone who has stood beside you for a long time gives a forlorn cry and you reach to catch their hand, but the fingertips slide away and they are gone.

There are the sounds in the rocky gorge, the roar of the water, the shifting, gritty sound of sand and gravel underfoot, the forlorn cries of despair as the nearby ones, and the ones upstream, are taken by the current. Some old ones who stand on a good place, well braced, understanding currents and balance, last a long time. A Churchill, fat cigar atilt, sourly amused at his own endurance and, in the end, indifferent to rivers and the rage of waters. Far downstream from you are the thin, startled cries of the ones who never got planted, never got set, never quite understood the message of the torrent.

District Dispatch: No luck on FOIA this Congress

planet code4lib - Fri, 2014-12-12 17:42

The FOIA Improvement Act (S. 2520) has been on a roller coaster the last few weeks! We were hopeful. Then we thought it wasn’t possible. Then the bill, which would give private citizens greater access to government information, passed the Senate and hope was revived. But sadly, FOIA reform is now reliably reported to be over and done for this Congress.

Photo by Rich Hill via Flickr

In the House of Representatives, with its 435 Members, the rules of the chamber are deliberately designed so that individual Representatives generally have no say in what bills come up for votes, when they’ll be considered, or what amendments might be in order. In sharp contrast, the Rules of the Senate empower every one of the 100 Senators to profoundly affect what’s considered, when and on what terms. What’s more, the Senate has long honored the unofficial practice of allowing any Senator to place a “hold” on a bill—for any reason or no reason—effectively preventing it from even being considered by the body.

On Monday, in the last few hours of the 113th Congress (which is likely to permanently adjourn today), Sen. Jay Rockefeller (D-WV) placed just such a hold on S. 2520, critical FOIA reform legislation by outgoing Judiciary Committee Chairman Patrick Leahy (D-VT).

Sen. Leahy’s bipartisan FOIA Improvement Act of 2014, co-authored with Republican Sen. John Cornyn of Texas, proposed a host of badly needed improvements in FOIA. The bill was unanimously approved by the Senate Judiciary Committee in November and, after Sen. Rockefeller removed his hold, the bill was unanimously approved by the full Senate this week and sent to the House where FOIA reform took a backseat to last minute appropriations discussions.

We have worked with OpenTheGovernment.org and others on this bill and will be sad to see its official death at the close of this Congress.

On the decidedly “plus” side, S. 2520 would:

  • Recodify pro-transparency standards that existed in FOIA itself as recently as the Clinton Administration, but which were rolled back during the Bush Administration. The Clinton-era standards were temporarily reinstated by Executive Order of the President. Sen. Leahy and other FOIA champions believe, however, that this standard of openness must be permanently preserved in statute rather than being subject to the political whims of whomever may be President in the future;
  • Address the overuse of “Exemption 5” to FOIA, which covers “inter-and intra-agency records,” by creating a 25-year limit on withholding pre-decisional agency documents;
  • Tackle some of the procedural inefficiencies of FOIA by encouraging proactive disclosure of frequently requested documents, and by clarifying when agencies can and cannot charge fees when they exceed statutory deadlines;
  • Strengthen the Office of Government Information Services (OGIS), the public’s FOIA ombudsman, by requiring that agencies notify information requestors regarding the availability of alternative dispute resolution options through OGIS as an alternative to litigation.

While it now looks like the 113th Congress will adjourn without adopting S.2520, we will join with other like-minded groups next Congress to begin the process of attempting to pass these common sense reforms all over again. We thank Sens. Leahy and Cornyn for their admirable attempt to improve the FOIA process and look forward to working with them again during the 114th Congress.

The post No luck on FOIA this Congress appeared first on District Dispatch.

LITA: Tech Yourself Before You Wreck Yourself – Vol. 4

planet code4lib - Fri, 2014-12-12 17:22

Art by Jason Garrattley

What a year it’s been, TYBYWYers! Last month, I talked about gratitude. This month, I’m focusing on the future. Let’s put a pretty bow on 2014 and take a peek at all the shiny opportunities 2015 has to offer. I promise I won’t tell your mom you looked at your presents.

If this is your first time stumbling upon this monthly feature of the LITA Blog, Tech Yourself Before You Wreck Yourself is a curated assortment of online education opportunities for aspiring library technologists at all levels of experience. I focus on webinars, MOOCs, and other free/low cost options for learning, growing, and increasing tech proficiency. I’m glad you’re here!

Monthly MOOCs

This coming year, you may have made a few tech education resolutions, and I’m going to help you keep them!

If you want to learn to code, the University of Michigan’s Programming for Everybody is a great place to start. The course aims to teach everyone the basics of programming computers using Python; it “has no pre-requisites and avoids all but the simplest mathematics. Anyone with moderate computer experience should be able to master the materials in this course.”  Get in there and start coding, TYBYWYers! I love avoiding all but the simplest mathematics.

I would also be remiss if I didn’t mention a slightly less techy MOOC presented by Northwestern University, Content Strategy for Professionals: Engaging Audiences for Your Organization. We are all challenged by the demands of creating and curating compelling content (she wrote as she curated content she hoped was compelling). Go learn from some masters, and tell me what you think.

Worthwhile Webinars

Tech On, TYBYWYers!

Have a wonderful holiday season, and I’ll see you in the new year.

 

M. Ryan Hess: The People Wide Web

planet code4lib - Fri, 2014-12-12 17:14

The debate around Net Neutrality has taken an interesting spin of late. Just as foes to Net Neutrality have gotten closer to their goal of setting up tollways and traffic controls on the information superhighway, some drivers are beginning to build their own transportation system altogether.

Net Neutrality is a concept that has been the norm on the Internet since its inception: the idea that every website gets equal treatment by Internet Service Providers (ISPs). But of course, media companies and the ISPs could conceivably benefit greatly if surcharges for access to higher bandwidth were allowed on the Net. For example, let’s say that Cable Company A offers priority bandwidth to Media Company X, allowing it to serve super high-def streaming video to users at lightning speed. However, Startup Company Z will then be obligated to compete against Media Company X for that bandwidth in order to provide the same quality service. Same goes for Blogger Y.

Fat chance of that. Indeed, given the pace at which media consolidation continues to go unchecked by regulators, were Net Neutrality abandoned, the Internet would quickly resemble something akin to how Network Television dominated communication in the years before high-speed Internet arrived.

And this is what concerns many people since a free, open web has so clearly promoted innovation. So far, the battle is not lost and Net Neutrality is still the norm. Nevertheless, some are creating back up plans.

This past week, BitTorrent, the people behind the popular torrent app uTorrent, announced they are exploring the creation of a new Internet which takes back control of the web and distributes access to websites across peer-to-peer networks.

Called Project Maelstrom, this torrent-based Internet would be powered by a new browser which would effectively rework the Internet into a much freer network with pretty much no gatekeepers.

Details are sparse at the moment, but essentially websites would be served as torrents, and thus not from a single server. Instead, the sites would exist across the peer-to-peer network, in small, redundant bits living on people’s computers. Essentially, it’s the same technique used for torrent-based file sharing. When you try to access a site, your computer queries the torrent network and dozens of computers begin sending you the packets you need to rebuild the web page in question in your browser. And even as the web page is partially assembled, your computer begins sharing what it already has with other people trying to access the site.

The result could likely be a much faster Internet, with much greater assurances of privacy. But technical questions remain and this does sound like it could take some time. But wow, what a revolution it would be.

Of course, this could get tricky to pull off. As you may have heard, the infamous torrent website Pirate Bay was taken down by authorities in Sweden this week. Pirate Bay serves up links to torrents allowing people to download everything from freeware applications to Hollywood movies that haven’t even been released yet, and so has been targeted by law enforcement for years now. Even on today’s Internet, Pirate Bay could conceivably come back online at any time. But if BitTorrent’s peer-to-peer Internet were realized, Pirate Bay would be back up instantaneously. Indeed, it would probably never come down in the first place. Same goes for Dark Net sites that sell everything from drugs to human beings, which have also been recently taken offline.

Bottom line is: Project Maelstrom is another example of how a free and open Internet is unlikely to ever go away. Question is, how much freedom is a good thing?

My own personal take is that taking back control of the Internet from media companies and ISPs would, on balance, be a great thing. Bad people do bad things in the physical world and that’s why we have never defeated crime 100%. As long as there is an Internet, there will be those that abuse it.

But even more importantly, innovation, freedom of speech and freedom to access information are core to advancing society. So I welcome Project Maelstrom.

So here’s a toast to the People-wide Web!

