Feed aggregator

Nick Ruest: 14,478,518 #WomensMarch tweets January 12-28, 2017

planet code4lib - Mon, 2017-01-30 21:38
Overview

A couple Saturday mornings ago, I was on the couch listening to records and reading a book when Christina Harlow and MJ Suhonos asked me about collecting #WomensMarch tweets. Little did I know at the time #WomensMarch would be the largest volume collection I have ever seen. By the time I stopped collecting a week later, we'd amassed 14,478,518 unique tweet ids from 3,582,495 unique users, and at one point hit around 1 million tweets in a single hour.

(Generated with Peter Binkley's twarc-report)

This put #WomensMarch well over 1% of the overall Twitter stream, which causes dropped tweets if you're collecting from the Filter API, so I collected with both the Filter and Search APIs. (If you're curious to learn more about this, check out Kevin Driscoll and Shawn Walker's "Big Data, Big Questions | Working Within a Black Box: Transparency in the Collection and Production of Big Twitter Data", and Jiaul H. Paik and Jimmy Lin's "Do Multiple Listeners to the Public Twitter Sample Stream Receive the Same Tweets?") I've included the search and filter logs in the dataset. If you run grep "WARNING" WomensMarch_filter.log or grep "WARNING" WomensMarch_filter.log | wc -l you'll get a sense of the scale of dropped tweets. For a number of hours on January 22, I was seeing around 1.6 million cumulative dropped tweets!

I collected from around 11AM EST on January 21, 2017 to 11AM EST January 28, 2017 with the Filter API, and did two Search API queries. Final count before deduplication looked like this:

$ wc -l WomensMarch_filter.json WomensMarch_search_01.json WomensMarch_search_02.json
   7906847 WomensMarch_filter.json
   1336505 WomensMarch_search_01.json
   9602777 WomensMarch_search_02.json
  18846129 total

Final stats: 14,478,518 tweets in a 104GB json file!
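The deduplication step is not shown in the post. As a rough sketch of how the 18,846,129 collected lines could be reduced to 14,478,518 unique tweets (a difference of 4,367,611 duplicates), the following assumes the files are line-oriented JSON with one tweet per line and that duplicates are identified by the tweet's id_str field. The function name unique_tweets is made up for illustration and is not part of twarc.

```python
import json


def unique_tweets(lines):
    """Yield each tweet once, keyed on id_str, in first-seen order.

    Assumption: `lines` is an iterable of JSON strings, one tweet per line,
    as in the filter and search files described above.
    """
    seen = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        tweet = json.loads(line)
        tweet_id = tweet["id_str"]
        if tweet_id not in seen:
            seen.add(tweet_id)
            yield tweet


# Tiny illustration: the same tweet captured by both the filter and search runs.
sample = [
    '{"id_str": "1", "text": "from filter"}',
    '{"id_str": "2", "text": "from filter"}',
    '{"id_str": "1", "text": "from search"}',
]
for tweet in unique_tweets(sample):
    print(tweet["id_str"], tweet["text"])
```

Because the function only needs an iterable of lines, the three files could be chained together with itertools.chain and streamed through it. The set of seen ids is the only thing held in memory.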

This puts us in the same range as what Ryan Gallagher projected in "A Bird's-Eye View of #WomensMarch."

Below I'll give a quick overview of the dataset using utilities from Documenting the Now's twarc, and utilities described inline. This is the same approach Ian Milligan and I used in our 2016 Code4Lib Journal article, "An Open-Source Strategy for Documenting Events: The Case Study of the 42nd Canadian Federal Election on Twitter." This is probably all that I'll have time to do with the dataset. Please feel free to use it in your own research. It's licensed CC-BY, so please have at it! :-)

...and if you want access to other Twitter datasets to analyse, check out http://www.docnow.io/catalog/.

Users

Tweets    Username
5,375     paparcura
4,703     latinagirlpwr
1,903     ImJacobLadder
1,236     unbreakablepenn
1,212     amForever44
1,178     BassthebeastNYC
1,170     womensmarch
1,017     WhyIMarch
982       TheLifeVote
952       zerocomados


3,582,495 unique users.
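A table like the one above can be produced by tallying the screen name on each tweet. This is only a sketch, assuming each tweet is a parsed JSON object carrying the standard user.screen_name field; top_users is an illustrative name, not a twarc utility.

```python
from collections import Counter


def top_users(tweets, n=10):
    """Return the n most active screen names as (screen_name, tweet_count) pairs."""
    counts = Counter(tweet["user"]["screen_name"] for tweet in tweets)
    return counts.most_common(n)


def unique_user_count(tweets):
    """Count distinct screen names across the tweets."""
    return len({tweet["user"]["screen_name"] for tweet in tweets})


# Made-up sample data for illustration only.
sample = [
    {"user": {"screen_name": "alice"}},
    {"user": {"screen_name": "bob"}},
    {"user": {"screen_name": "alice"}},
]
print(top_users(sample, n=2))
print(unique_user_count(sample))
```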

Retweets

146,370 Retweets

I'm a nasty girl #WomensMarch pic.twitter.com/GjFriucGUY

— Nick Offerman (@Nick_Offerman) January 21, 2017


141,111 Retweets

Yes we can.
Yes we did.
Thank you for being a part of the past eight years. pic.twitter.com/mjmr4RkxpV

— White House Archived (@ObamaWhiteHouse) January 20, 2017


109,865 Retweets

Thanks for standing, speaking & marching for our values @womensmarch. Important as ever. I truly believe we're always Stronger Together.

— Hillary Clinton (@HillaryClinton) January 21, 2017


84,161 Retweets

'Hope Not Fear'
Indeed.
And what a beautiful piece by Louisa Cannell. #womensmarch pic.twitter.com/7h3Bzx79nB

— Hillary Clinton (@HillaryClinton) January 21, 2017


70,600 Retweets

I'm here today to honor our democracy & its enduring values. I will never stop believing in our country & its future. #Inauguration

— Hillary Clinton (@HillaryClinton) January 20, 2017


62,591 Retweets

People chanting: "Love not hate, makes America great." #WomensMarchdenver pic.twitter.com/Fukn4pYq1k

— Larry Ryckman (@larryryckman) January 21, 2017


59,366 Retweets

Congratulations to the women marching today. We must go forward to ensure full reproductive justice for all women. #WomensMarch

— Bernie Sanders (@SenSanders) January 21, 2017


56,365 Retweets

The #WomensMarch in downtown St. Louis! This is what democracy looks like. pic.twitter.com/mzb5xMZUKC

— Antonio French (@AntonioFrench) January 21, 2017


52,125 Retweets

Hi everybody! Back to the original handle. Is this thing still on? Michelle and I are off on a quick vacation, then we'll get back to work.

— Barack Obama (@BarackObama) January 20, 2017


50,944 Retweets

Was going on Good Morning Britain, didn't realise @piersmorgan was host. Won't go on with him after his comments about #WomensMarch

— Ewan McGregor (@mcgregor_ewan) January 24, 2017


Clients

Tweets       Client
7,098,145    Twitter for iPhone
3,718,467    Twitter for Android
2,066,773    Twitter for iPad
634,054      Twitter Web Client
306,225      Mobile Web (M5)
127,622      TweetDeck
59,463       Instagram
54,851       Tweetbot for iOS
47,556       Twitter for Windows
36,404       IFTTT

URLs

Tweets    URL
29,223    https://www.facebook.com/cnn/videos/10155945796281509/
27,435    http://www.cnn.com/2017/01/21/politics/womens-march-donald-trump-inauguration-sizes/index.html?sr=twCNN012117womens-march-donald-trump-inauguration-sizes0205PMStoryGal
24,854    http://www.independent.co.uk/news/world/americas/womens-march-antarctica-donald-trump-inauguration-women-hate-donald-trump-so-much-they-are-even-a7538856.html
21,189    https://twitter.com/kayleighmcenany/status/822979246205403136
20,902    https://twitter.com/mcgregor_ewan/status/823805815488331776
14,857    http://www.cnn.com/2017/01/21/politics/womens-march-donald-trump-inauguration-sizes/index.html?sr=twpol012117womens-march-donald-trump-inauguration-sizes0832PMVODtopLink&linkId=33643748
12,630    https://www.womensmarch.com/sisters
11,244    https://twitter.com/tomilahren/status/822852245532319744
9,761     https://twitter.com/mstharrington/status/823190136200593408
9,585     http://www.cnn.com/2017/01/21/politics/womens-march-protests-live-coverage/index.html?sr=twCNN012117womens-march-protests-live-coverage1208PMVODtop


2,403,637 URLs tweeted, with 527,350 of those being unique urls.
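For readers who want to reproduce figures like "URLs tweeted" versus "unique urls", here is a minimal sketch. It assumes the links are read from each tweet's entities.urls list, using expanded_url where present, and that per-domain totals are rolled up by host name with a leading "www." dropped. Both function names are illustrative, and the post's own numbers may have come from a different pipeline (for example one that also unshortens links).

```python
from collections import Counter
from urllib.parse import urlsplit


def url_counts(tweets):
    """Tally every URL occurrence found in entities.urls across the tweets."""
    counts = Counter()
    for tweet in tweets:
        for url in tweet.get("entities", {}).get("urls", []):
            expanded = url.get("expanded_url") or url.get("url")
            if expanded:
                counts[expanded] += 1
    return counts


def domain_counts(url_counter):
    """Roll URL tallies up to their host names, ignoring a leading 'www.'."""
    domains = Counter()
    for url, n in url_counter.items():
        host = urlsplit(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        if host:
            domains[host] += n
    return domains


# Made-up sample data for illustration only.
sample = [
    {"entities": {"urls": [{"expanded_url": "http://www.cnn.com/a"}]}},
    {"entities": {"urls": [{"expanded_url": "http://www.cnn.com/a"},
                           {"expanded_url": "https://twitter.com/x/status/1"}]}},
    {},  # a tweet without links
]
counts = url_counts(sample)
print(sum(counts.values()), "URLs tweeted,", len(counts), "unique")
print(domain_counts(counts).most_common())
```

The total is the sum of the tallies and the unique count is the number of distinct keys, which mirrors the two figures reported above.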

I've also set up a little bash script to feed all the unique urls to Internet Archive:

#!/bin/bash

URLS=/path/to/WomensMarch_urls_uniq.txt
index=0

# Submit each unique URL to the Wayback Machine, pausing a second between requests.
while read -r line; do
  curl -s -S "https://web.archive.org/save/$line" > /dev/null
  index=$((index + 1))
  echo "$index/527350 submitted to Internet Archive"
  sleep 1
done < "$URLS"

I've also set up a crawl with Heritrix, and I'll make that data available here once it is complete.

Domains

Tweets       Domain
1,219,747    twitter.com
159,087      instagram.com
134,309      cnn.com
68,479       facebook.com
50,561       womensmarch.com
43,219       youtube.com
36,946       nytimes.com
30,201       huffingtonpost.com
21,520       paper.li
21,476       cbsnews.com

Embedded Images

Tweets     Image
146,442    [image]
81,139     [image]
71,877     [image]
64,149     [image]
59,214     [image]
58,599     [image]
51,439     [image]
44,611     [image]
43,845     [image]
41,436     [image]


6,153,894 embedded image URLs tweeted, with 390,298 of those being unique urls.
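The embedded image figures can be approximated in the same spirit as the other tallies. The sketch below assumes images are read from each tweet's media entities (extended_entities when present, otherwise entities), keeps only items of type "photo", and keys on media_url_https. The name image_counts is illustrative, and the original counts may have been produced differently.

```python
from collections import Counter


def image_counts(tweets):
    """Tally embedded photo URLs across the tweets."""
    counts = Counter()
    for tweet in tweets:
        # Prefer extended_entities, which lists every attached photo.
        entities = tweet.get("extended_entities") or tweet.get("entities", {})
        for media in entities.get("media", []):
            if media.get("type") == "photo" and media.get("media_url_https"):
                counts[media["media_url_https"]] += 1
    return counts


# Made-up sample data for illustration only.
sample = [
    {"entities": {"media": [{"type": "photo", "media_url_https": "https://pbs.twimg.com/media/A.jpg"}]}},
    {"extended_entities": {"media": [
        {"type": "photo", "media_url_https": "https://pbs.twimg.com/media/A.jpg"},
        {"type": "photo", "media_url_https": "https://pbs.twimg.com/media/B.jpg"},
    ]}},
    {"text": "no media here"},
]
counts = image_counts(sample)
print(sum(counts.values()), "image URLs tweeted,", len(counts), "unique")
```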

I'll be creating an image montage similar to what I did for #elxn42 and #panamapapers for #WomensMarch. It'll take some time, and I have to gather resources to make it happen since we're looking at about 5 times the amount of images for #WomensMarch.

tags: #womensmarch, twarc, twitter, activism, politics

LITA: Meet Your Candidates for the 2017 LITA Election

planet code4lib - Mon, 2017-01-30 21:06

The LITA Board is pleased to announce the slate of candidates for the 2017 spring election:

Vice-President/President-Elect

Director-at-Large, 3 elected, each for a 3-year term

Voting in the 2017 ALA election will run March 13 – April 5, and results will be announced on April 12. Note that eligible members will be sent their voting credentials via email over a three-day period, March 13-15. Check the main ALA website for information about the general ALA election.

The Board thanks the LITA Nominating Committee for all of their work: Rachel Vacek (Chair), Emily E. Clasper, and Melissa S. Stoner. Thank you to the candidates for agreeing to serve.

Code4Lib Journal: Editorial: Introspection as Activism, or, Getting Our Houses in Order

planet code4lib - Mon, 2017-01-30 18:46
Those of us in libraries like to trace our history to Alexandria or to the French governmental system of record-keeping, but the construction of the modern GLAM world is far more recent, almost as new as coding. It has evolved almost as rapidly. And its future is on us, whether we choose to passively accept a status quo others build or to act and grow and develop ourselves and our workplaces.

Code4Lib Journal: Bridging Technologies to Efficiently Arrange and Describe Digital Archives: the Bentley Historical Library’s ArchivesSpace-Archivematica-DSpace Workflow Integration Project

planet code4lib - Mon, 2017-01-30 18:46
In recent years, ArchivesSpace and Archivematica have emerged as two of the most exciting open source platforms for working with digital archives. The former manages accessions and collections and provides a framework for entering descriptive, administrative, rights, and other metadata. The latter ingests digital content and prepares information packages for long-term preservation and access. In October 2016, the Bentley Historical Library wrapped up a two-year, $355,000 grant from the Andrew W. Mellon Foundation to partner with the University of Michigan Library on the integration of these two systems in an end-to-end workflow that will include the automated deposit of content into a DSpace repository. This article provides context of the project and offers an in-depth exploration of the project’s key development tasks, all of which were provided by Artefactual Systems, the developers of Archivematica (code available at https://github.com/artefactual-labs/appraisal-tab).

Code4Lib Journal: The Devil’s Shoehorn: A case study of EAD to ArchivesSpace migration at a large university

planet code4lib - Mon, 2017-01-30 18:46
A band of archivists and IT professionals at Harvard took on a project to convert nearly two million descriptions of archival collection components from marked-up text into the ArchivesSpace archival metadata management system.  Starting in the mid-1990s, Harvard was an alpha implementer of EAD, an SGML (later XML) text markup language for electronic inventories, indexes, and finding aids that archivists use to wend their way through the sometimes quirky filing systems that bureaucracies establish for their records or the utter chaos in which some individuals keep their personal archives.  These pathfinder documents, designed to cope with messy reality, can themselves be difficult to classify.  Portions of them are rigorously structured, while other parts are narrative.  Early documents predate the establishment of the standard; many feature idiosyncratic encoding that had been through several machine conversions, while others were freshly encoded and fairly consistent.  In this paper, we will cover the practical and technical challenges involved in preparing a large (900MiB) corpus of XML for ingest into an open-source archival information system (ArchivesSpace). This case study will give an overview of the project, discuss problem discovery and problem solving, and address the technical challenges, analysis, solutions, and decisions and provide information on the tools produced and lessons learned.  The authors of this piece are Kate Bowers, Collections Services Archivist for Metadata, Systems, and Standards at the Harvard University Archives, and Dave Mayo, a Digital Library Software Engineer for Harvard’s Library and Technology Services.  Kate was heavily involved in both metadata analysis and later problem solving, while Dave was the sole full-time developer assigned to the migration project.

Code4Lib Journal: Participatory Design Methods for Collaboration and Communication

planet code4lib - Mon, 2017-01-30 18:46
Website redesigns can be contentious and fraught in any type of organization, and libraries are no exception. Coming to consensus on priorities and design decisions is nearly impossible, as different groups compete to ensure their subject or specialty area is represented. To keep projects on track and on time, libraries may give a few staff members the authority to make all of the decisions, while keeping user research limited to a small number of usability tests. While these tactics are sometimes necessary, at best they can leave many feeling left out of the process, and at worst, can result in major oversights in the final design. Participatory design methods can bring users and stakeholders into the design process and ultimately lead to a better design and less friction in the project. The authors share their experience and lessons learned using participatory design techniques in a website redesign project at a large, multi-location academic library, and how these techniques facilitated communication, shaped design decisions, and kept a complex, difficult project on track.

Code4Lib Journal: Python, Google Sheets, and the Thesaurus for Graphic Materials for Efficient Metadata Project Workflows

planet code4lib - Mon, 2017-01-30 18:46
In 2017, the University of Virginia (U.Va.) will launch a two year initiative to celebrate the bicentennial anniversary of the University’s founding in 1819. The U.Va. Library is participating in this event by digitizing some 20,000 photographs and negatives that document student life on the U.Va. grounds in the 1960s and 1970s. Metadata librarians and archivists are well-versed in the challenges associated with generating digital content and accompanying description within the context of limited resources. This paper describes how technology and new approaches to metadata design have enabled the University of Virginia’s Metadata Analysis and Design Department to rapidly and successfully generate accurate description for these digital objects. Python’s pandas module improves efficiency by cleaning and repurposing data recorded at digitization, while the lxml module builds MODS XML programmatically from CSV tables. A simplified technique for subject heading selection and assignment in Google Sheets provides a collaborative environment for streamlined metadata creation and data quality control.

Code4Lib Journal: Supporting Oral Histories in Islandora

planet code4lib - Mon, 2017-01-30 18:46
Since 2014, the University of Toronto Scarborough Library’s Digital Scholarship Unit (DSU) has been working on an Islandora-based solution for creating and stewarding oral histories (the Oral Histories solution pack). Although regular updates regarding the status of this work have been presented at Open Repositories conferences, this is the first article to describe the goals and features associated with this codebase, as well as the roadmap for development. An Islandora-based approach is appropriate for addressing the challenges of Oral History, an interdisciplinary methodology with complex notions of authorship and audience that both brings a corresponding complexity of use cases and roots Oral Histories projects in the ever-emergent technical and preservation challenges associated with multimedia and born digital assets. By leveraging Islandora, those embarking on Oral Histories projects benefit from existing community-supported code. By writing and maintaining the Oral Histories solution pack, the library seeks to build on common ground for those supporting Oral Histories projects and encourage a sustainable solution and feature set.

Code4Lib Journal: Building a Scalable and Flexible Library Data Dashboard

planet code4lib - Mon, 2017-01-30 18:46
Data dashboards provide libraries with the means to demonstrate their ongoing activities and usage in an engaging and communicative fashion. Yet, due to the number of service platforms used by libraries, and the wide-ranging technical specifications they entail, bringing all of this content together in a sustainable way is a significant challenge. This article describes Portland State University’s project to design and build a data dashboard based on a scalable and flexible infrastructure that would enable them to present data in a visually compelling and dynamic interface.

Code4Lib Journal: What’s New? Deploying a Library New Titles Page with Minimal Programming

planet code4lib - Mon, 2017-01-30 18:46
With a new titles web page, a library has a place to show faculty, students, and staff the items they are purchasing for their community. However, many times heavy programming knowledge and/or a LAMP stack (Linux, Apache, MySQL, PHP) or APIs separate a library’s data from making a new titles web page a reality. Without IT staff, a new titles page can become nearly impossible or not worth the effort. Here we will demonstrate how a small liberal arts college took its acquisition data and combined it with a Google Sheet, HTML, and a little JavaScript to create a new titles web page that was dynamic and engaging to its users.

Code4Lib Journal: OPRM: Challenges to Including Open Peer Review in Open Access Repositories

planet code4lib - Mon, 2017-01-30 18:46
The peer review system is the norm for many publications. It involves an editor and several experts in the field providing comments for a submitted article. The reviewer remains anonymous to the author, with only the editor knowing the reviewer’s identity. This model is now being challenged and open peer review (OPR) models are viewed as the new frontier of the review process. OPR is a term that encompasses diverse variations in the traditional review process. Examples of this are modifications in the way in which authors and reviewers are aware of each other’s identity (open identities), the visibility of the reviews carried out (open reviews) or the opening up of the review to the academic community (open participation). We present the project for the implementation of an Open Peer Review Module in two major Spanish repositories, DIGITAL.CSIC and e-IEO, together with some promising initial results and challenges in the take-up process. The OPR module, designed for integration with DSpace repositories, enables any scholar to provide a qualitative and quantitative evaluation of any research object hosted in these repositories.

Code4Lib Journal: Adopting a Distributed Model for Data Services

planet code4lib - Mon, 2017-01-30 18:46
This article describes how the Saint Edward’s University Library implemented a distributed model for the Institutional Repository. Based on cloud-based platforms and APIs, the Library has created an Institutional Repository that is scalable and modular, considerably lowering its implementation and maintenance costs, while lowering its technical complexity.

Code4Lib Journal: Developing an online platform for gamified library instruction

planet code4lib - Mon, 2017-01-30 18:46
Gamification is a concept that has been catching fire for a while now in education, particularly in libraries. This article describes a pilot effort to create an online gamified platform for use in the Woodbury University Library’s information literacy course. The objectives of this project were both to increase student engagement and learning, and to serve as an opportunity for me to further develop my web development skills. The platform was developed using the CodeIgniter web framework and consisted of several homework exercises ranging from a top-down two-dimensional library exploration game to a tutorial on cleaning up machine-generated APA citations. This article details the project’s planning and development process, the gamification concepts that helped guide the conceptualization of each exercise, reflections on the platform’s implementation in four course sections, and aspirations for the future of the project. It is hoped that this article will serve as an example of the opportunities–and challenges–that await both librarians and instructors who wish to add coding to their existing skill set.

ACRL TechConnect: #1Lib1Ref Edit (2017)

planet code4lib - Mon, 2017-01-30 17:00

I participated in the “#1Lib1Ref” campaign again this year, recording my experience and talking through why I think it’s important.

Andromeda Yelton: the highest level of service

planet code4lib - Mon, 2017-01-30 15:53

I. We provide the highest level of service to all library users… ALA Code of Ethics

That’s what public libraries do, right? Provide service to everyone, respectfully and professionally — and without conditioning that respect on checking your papers. If you walk through those doors, you’re welcome here.

When you’re standing in the international arrivals area at Logan, you’re in a waiting area between a pair of large double doors, exiting from Customs, and then the doors to the outside world. We stood in a crowd of hundreds, chanting “Let Them In!” Sometimes, some mysterious number of minutes after a flight arrival, the doors would open, and tired people and their luggage pour through, from Zurich, Port-au-Prince, Heathrow, anywhere.

And the Code of Ethics ran through my head because that’s what we were chanting, wasn’t it? That anyone who walks through those doors is welcome here. Let them in.

Library values are American values. And if you have a stake in America, don’t let anyone build an America that’s less than what we as a profession stand for.


Hydra Project: IMPORTANT – Change of dates for Hydra Connect 2017

planet code4lib - Mon, 2017-01-30 15:37

Apologies, but after our announcement, just before Christmas, of dates for Hydra Connect 2017 it became apparent that they clashed with a PASIG conference which, at that point, had not been widely advertised. This would have represented a conflict of interest for a significant number of our Hydra community.

Accordingly, the dates for Hydra Connect 2017 have been changed.  It will still be hosted by Northwestern University but the dates are now Monday November 6th – Thursday November 9th, 2017.  This year we have made the decision to use a conference hotel and the event will take place at the Hilton Orrington near the University.  Please update your calendars!

Further information via emails and the Hydra wiki in due course!

Open Knowledge Foundation: Brazil’s Public Spending project is looking for leaders in various regions of Brazil to increase participation in the budgeting process.

planet code4lib - Mon, 2017-01-30 06:48

On the 11th of January, OK Brazil launched its new Public Spending website.

The website is part of a wider campaign to find, recruit, and support new leaders who wish to work on transparency, mainly of public spending, in Brazilian municipalities, and it uses OKI’s OpenSpending technical architecture. The support will be provided by mentors specializing in law, transparency, technology and open data. The goal here is to increase transparency in the budget execution, bidding processes and contract management of cities.

So that leaders can achieve concrete results, the OK Brazil team will develop a timeline with each one of them, using the existing legal framework, the support of mentors and digital tools to increase transparency and participation in the budgeting process.

“The new website demonstrates how to organize the missions and actions of the new leaders, empower civil society so that it can monitor public spending, and give both academics and journalists access to cities’ budgeting data”, says Lucas Ansei, developer and one of the mentors of the new website.

According to Thiago Rondon, coordinator of the OK Brazil team, the mentors will play a fundamental role in training the leaders. “They’re specialists with experience on the matter at hand and will support the leaders with online conferences that will offer directions so that the impact of the actions of these new leaders is meaningful.”

Another goal of this new phase of the project is to reach out to city mayors all over the country, with the aim of getting them both to sign the Public Spending Brazil Commitment Letter and to carry out the concrete actions set out in the letter.

Be a leader of the Open Spending project in 2017

According to Thiago, there will be an initial agenda of actions that functions like a step-by-step manual so that anyone can help increase transparency in the city where they live. “We want to empower people so that they can do that on their own. To amplify outreach, we will have local leaders in pilot cities who will receive direct support from OK Brazil.”

Those who want to participate as a local leader of the Public Spending project can apply on the website. During this first phase, the OK Brazil team will select 15 local leaders based on the answers submitted through the registration form.

LibUX: What users expect

planet code4lib - Mon, 2017-01-30 04:56

An overview of a Libre book-detail page, which one tester referred to as “dark Amazon.” We took that as a good thing.

Users have high expectations these days. The hours spent in elegant web apps like Netflix and Spotify seem to be sharpening the collective sense of design. What was once the pinnacle is now the convention, and as Don Norman said, “Conventions are slow to be adopted and, once adopted, slow to go away.” So we thought it would be fun to emulate some of our favorite sites in a lightweight concept discovery layer we call Libre. Below are some of the expectations we prioritized in the design. 1

#1: Things worth doing also look cool

First, we wanted to elevate books to the same “cool status” of other media. Thanks to Netflix and Spotify, that meant choosing a dark theme with white lettering and neon trim. Because of the ready association with the national library symbol, we chose blue for the secondary color.

Search bar and category navigation within Libre. These categories could later be personalized.

#2: The most useful things are also the most visible

The intent in a known-item search (33-60% of all queries 2) is rapid visual confirmation, so we highlighted title, author, and cover image. In more serendipitous browsing, the intent is evaluation, so average rating and a synopsis are prioritized second. 3

Central section of a Libre book-detail page. Other proposed tabs were Reviews (from Goodreads) and Library Info.

#3: All the answers are here

Several friends of mine have revealed, at one point or another, that they didn’t know the library was free. While this can seem shocking, it’s bad design to assume that the user knows everything they need: immigrants may never have had access to a public library before, and the less tech-savvy might need to know that borrowing ebooks is legal. Hence, we avoided jargon like “Place Hold,” listed requirements, and explained the basic premise of a library in fine print beneath the main call-to-action.

#4: Browsing is always assisted

Other sites deliver personalized recommendations by capturing reams of personal data. Content-based recommendations like “Nebula Award Winners” or “NYT Bestsellers – Fiction” assist users in a similar way, though. Offering a compelling alternative is more important at the library than anywhere else online, since the title a user came looking for could be out on loan already. We wanted to keep our users from leaving in frustration if they encountered an unavailable title.

In our model, content-based recommendation lists like this one are curated by local librarians.

#5: I can bring friends

A site without sharing is a city without roads. Even if the features aren’t used too often, we decided that it was important to offer up multiple options for users to save, share, and otherwise show off their discoveries. We distinguish subtly between casual users, who might know to post or tweet, and the power user, who may want to embed a free link on his book review blog, for instance.

What other expectations are important? What features did we miss? Feel free to play around with the Libre prototype and let us know on Twitter at @readlibre.

Endnotes

1: Our work in this article focuses on a popular reading use case, and will therefore seem more applicable to public libraries. Still, we hope our friends in academics get something out of it too.

2: EBSCO and Ex Libris are at odds over this figure. EBSCO says “Just under 30” and Ex Libris “over 50.” Both of them exclude author searches from their definition of “known-item” entirely, which seems to me a mistake. Often an author search is an easier route to a known item: for instance, when the title is so long as to be annoying to type or so short as to be ambiguous. Therefore, I inflate their estimates by about 5%.

3: Notably absent are Format and Availability. These are currently displayed after the user clicks “See at the Library.” A more robust implementation might have them both appear on the page.

Ed Summers: Twitter Replies

planet code4lib - Sun, 2017-01-29 05:00

It can be difficult to have a conversation in Twitter but people somehow seem to manage. You can reply to someone’s tweet, and other people can reply to your replies, which forms a conversation thread of sorts. But the display of the thread is difficult to interpret.

What’s worse is that there is no Twitter API call to get the replies to a given tweet. If you have the JSON for a tweet in hand you can use the in_reply_to_status_id property to fetch the tweet that it is responding to. But the converse is not true: there is no straightforward way to get the tweets that are in response to a given tweet. If I’m wrong about that please let me know. For a much more thorough discussion and analysis of these constraints see Alexander Nwala’s Tweet Visibility Dynamics in a Tweet Conversation Graph.

It’s a bit of a hack but you can use Twitter’s Search API to programmatically scan through tweets directed at a given user (e.g. to:barackobama), and inspect them to see if any are in response to a given tweet. You can also stop scanning when you arrive at tweets that are older than the tweet you are looking for responses to, since to my knowledge it’s impossible to reply to a tweet from the future. Yeah, that was my dry attempt at a joke. The big caveat here is that Twitter’s Search API only allows you to retrieve tweets from the last week. So this technique will only work for fetching conversation threads from the last week.
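The heuristic described above can be sketched without touching the network. The function below is not twarc's implementation. It assumes the caller already has the results of a to:username search as parsed tweet objects ordered newest first, and it uses the numeric tweet id as a stand-in for age, which relies on ids increasing over time. The name find_replies is made up for illustration.

```python
def find_replies(search_results, target):
    """Return tweets from search_results that reply directly to target.

    Assumptions: search_results is ordered newest first, and a larger
    id_str means a newer tweet.
    """
    target_id = int(target["id_str"])
    replies = []
    for tweet in search_results:
        if int(tweet["id_str"]) <= target_id:
            break  # everything from here on predates the target tweet
        if tweet.get("in_reply_to_status_id_str") == target["id_str"]:
            replies.append(tweet)
    return replies


# Made-up sample data for illustration only.
target = {"id_str": "100"}
results = [
    {"id_str": "130", "in_reply_to_status_id_str": "100"},
    {"id_str": "120", "in_reply_to_status_id_str": "90"},
    {"id_str": "110", "in_reply_to_status_id_str": "100"},
    {"id_str": "95", "in_reply_to_status_id_str": None},
]
print([t["id_str"] for t in find_replies(results, target)])
```

Stopping at the first tweet that is not newer than the target is what keeps the scan short for recent tweets. The one-week limit of the Search API still applies to whatever feeds search_results.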

In the Documenting the Now project we are building tools to help researchers study Twitter. We’ve added a command to twarc that performs this heuristic to rebuild a given reply thread for a given tweet identifier. So to get the replies to this tweet:

let’s make this shit huge https://t.co/iP8IOY3CqB

— laura olin, January 25, 2017

you can run this command:

% twarc replies 82407791092769177 > replies.json

This will only get the initial set of replies to the tweet. If you want to get the entire conversation thread you can use the --recursive option:

% twarc replies 82407791092769177 --recursive > replies.json

That will get the replies to the replies, and will also walk up the conversation chain if the supplied tweet identifier is itself a reply to another tweet. In addition it will follow tweets that are quotes.
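To illustrate the "replies to the replies" part only, here is a small recursive walk. It is a sketch, not twarc's code: fetch_replies is a hypothetical callable that returns the direct replies to a tweet (for instance a wrapper around a search-based lookup), and walking up the chain or following quotes is left out.

```python
def walk_thread(tweet, fetch_replies, seen=None):
    """Yield tweet and, depth first, every reply reachable beneath it.

    fetch_replies is a hypothetical callable: tweet -> iterable of direct replies.
    """
    if seen is None:
        seen = set()
    if tweet["id_str"] in seen:
        return  # guard against visiting the same tweet twice
    seen.add(tweet["id_str"])
    yield tweet
    for reply in fetch_replies(tweet):
        yield from walk_thread(reply, fetch_replies, seen)


# Made-up conversation: 1 has replies 2 and 3, and 2 has reply 4.
children = {"1": ["2", "3"], "2": ["4"]}


def fake_fetch(tweet):
    return [{"id_str": child} for child in children.get(tweet["id_str"], [])]


print([t["id_str"] for t in walk_thread({"id_str": "1"}, fake_fetch)])
```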

To demonstrate that it’s working we’ve added a little utility called network.py that will read a set of tweets and write out the network of conversation as a GEXF for loading into Gephi, or DOT for use with Graphviz or as a standalone HTML file that uses D3 to visualize the conversation in your browser. Here’s how you run it:

% ./network.py replies.json replies.html

and here’s what the D3 visualization looks like for that tweet above. Try clicking on the nodes in the graph to see the tweets that the node represents. You can see the quote is colored yellow, and the original tweet (the one with no parent) is colored red.
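As a rough idea of how a set of tweets becomes a conversation graph, the sketch below writes Graphviz DOT text from the in_reply_to_status_id_str and quoted_status_id_str fields. It is not the network.py utility itself; the function name to_dot and the edge labels are illustrative choices.

```python
def to_dot(tweets):
    """Return a DOT digraph with an edge from each parent tweet to its reply or quote."""
    lines = ["digraph replies {"]
    for tweet in tweets:
        child = tweet["id_str"]
        lines.append(f'  "{child}";')
        parent = tweet.get("in_reply_to_status_id_str")
        if parent:
            lines.append(f'  "{parent}" -> "{child}" [label="reply"];')
        quoted = tweet.get("quoted_status_id_str")
        if quoted:
            lines.append(f'  "{quoted}" -> "{child}" [label="quote"];')
    lines.append("}")
    return "\n".join(lines)


# Made-up sample data for illustration only.
sample = [
    {"id_str": "1"},
    {"id_str": "2", "in_reply_to_status_id_str": "1"},
    {"id_str": "3", "quoted_status_id_str": "1"},
]
print(to_dot(sample))
```

The printed text can be saved to a file and rendered with Graphviz, for example with the dot command.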

Paul Butler also recently added the ability to drag and drop a file of tweets generated with the twarc replies command in his Treeverse. Treeverse is a Chrome plugin which provides a much more usable display of a conversation thread. Here’s a screenshot of looking at that same set of replies. (https://paulgb.github.io/Treeverse/).

The nice thing about the D3 visualization is that it’s possible to restyle the presentation using CSS. You can also use it to visualize the network of tweets that were not acquired using the replies command. For example here is a visualization that was generated from a search for the #datarefuge hashtag a few days ago. I recorded it as a video on a large screen because there were so many nodes.

If you get a chance to try any of this or have any thoughts about it I’d love to hear from you.

Ed Summers: Hidden

planet code4lib - Sat, 2017-01-28 05:00
> What the profession has deemed archival "theories" are perhaps more akin to sets of rules and procedures or belief systems (albeit with hidden ontological and epistemological viewpoints).
>
> @Trace:2010
