Writer wanted: Better Conflict Bulletin


I’m looking for a professional writer to produce a weekly newsletter for those who care about making the U.S. political conflict less violent and more productive. The ideal candidate is someone who is both fascinated and horrified by the polarization of American politics. You’re already reading widely across the political spectrum, suspicious of ideology and tribalism, nerdy about conflict dynamics, and more into the idea of learning to live together than winning.

This is a new publication which will offer news and informed perspectives on the U.S. civil conflict. Topics will include polarization, the culture war, disinformation, institutional trust, censorship, algorithms, extremism, protest movements, and so on. We are writing for professionals whose work touches this conflict in some way, including mediators, technologists, researchers, policy-makers, and journalists. Conflict is an essential part of how society changes, so we’re not trying to end or “resolve” it. Rather, we’re trying to have better conflict — a good fight. Our orientation is practical, as we are always asking the question: what can I, as a professional whose work is touched by this conflict, do differently to make things better?

The Better Conflict Bulletin will be in newsletter form and include:

A weekly roundup of the best news, essays, and research in this space. This will be your main task.
Interviews with people working on conflict-related problems.
Original articles that summarize important trends or research.

We might discuss content like:

Complicate The Narrative by Amanda Ripley, the best advice I know for how to do journalism on polarized topics
Braver Angels and Living Room Conversations, two organizations scaling small in-person mediated gatherings
Chloe Valdary, who does a different type of anti-racism training
To Reason with a Madman by Charles Eisenstein, an essay on the informational consequences of conflict
ACLED US crisis monitor, for reliable data on US civic violence
Designing Recommenders to Depolarize, a research paper

The newsletter will be edited by Jonathan Stray, a former AP journalist who researches media algorithms and their effects on conflict at UC Berkeley. Thoughtful people from across the political spectrum are encouraged to apply. We are starting with a one-year grant from the Mercatus Center at George Mason University to provide a valuable service to the rapidly growing bridge-building community.

To apply AI for good, think form extraction


Folks who want to use AI/ML for good generally think of things like building predictive models, but smart methods for extracting data from forms would do more for journalism, climate science, medicine, democracy etc. than almost any other application. Since March, I’ve been working with a small team on applying deep learning to a gnarly campaign finance dataset that journalists have been struggling with for years. Today we are announcing Deepform, a baseline ML model, training data set, and public benchmark where anyone can submit their solution. I think this type of technology is important, not just for campaign finance reporters but for everyone.

This post has four parts:

Why form extraction is an important problem
Why it’s hard
The state of the art of deep learning for form extraction
The Deepform model and dataset, our contribution to this problem

Form extraction is incredibly useful

Form extraction could help climate science, for example, because a lot of old weather data is locked in forms. These forms come in a huge variety of formats from all over the world and across centuries.


What tools do we have to combat disinformation?


What types of defenses against disinformation are possible? And which of these would we actually want to use in a democracy, where approaches like censorship can impinge on important freedoms? To try to answer these questions, I looked at what three counter-disinformation organizations are actually doing today, and categorized their tactics.

The EU East StratCom Task Force is a contemporary government counter-propaganda agency. Facebook has made numerous changes to its operations to try to combat disinformation, and is a good example of what platforms can do. The Chinese information regime is a marvel of networked information control, and provokes questions about what a democracy should and should not do.

The result is the paper Institutional Counter-disinformation Strategies in a Networked Democracy (pdf). Here’s a video of me presenting this work at the recent Misinfo workshop.


An Introduction to Algorithmic Bias and Quantitative Fairness


There are many kinds of questions about discrimination, fairness, or bias where data is relevant. Who gets stopped on the road by the police? Who gets admitted to college? Who gets approved for a loan, and who doesn’t? The data-driven analysis of fairness has become even more important as we start to deploy algorithmic decision making across society.

I attempted to synthesize an introductory framework for thinking about what fairness means in a quantitative sense, and how these mathematical definitions connect to legal and moral principles and our real world institutions of criminal justice, employment, lending, and so on. I ended up with two talks.

This short talk (20 minutes), part of a panel at the Investigative Reporters & Editors conference, has no math. (Slides)

This longer talk (50 minutes), presented at Code for America SF, gets into a lot more depth, including the mathematical definitions of different types of fairness, and the whole tricky issue of whether or not algorithms should be “blinded” to attributes like race and gender. It also includes several case studies of real algorithmic systems, and discusses how we might design such systems to reduce bias. (Slides)
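To make the quantitative definitions concrete, here is a minimal sketch of two of the fairness measures the longer talk covers: demographic parity (equal selection rates across groups) and a piece of equalized odds (equal false positive rates). The function names and toy data are my own illustration, not taken from the talks.

```python
# Illustrative sketch: two common group fairness metrics computed from
# binary predictions. Names and toy data are hypothetical.

def rate(flags):
    """Fraction of 1s in a list of 0/1 values (0.0 if the list is empty)."""
    return sum(flags) / len(flags) if flags else 0.0

def fairness_gaps(y_true, y_pred, group):
    """Return (demographic parity gap, false positive rate gap) between two groups."""
    groups = sorted(set(group))
    sel, fpr = {}, {}
    for g in groups:
        idx = [i for i, gi in enumerate(group) if gi == g]
        sel[g] = rate([y_pred[i] for i in idx])                      # P(pred=1 | group)
        sfpr = [y_pred[i] for i in idx if y_true[i] == 0]
        fpr[g] = rate(sfpr)                                          # P(pred=1 | y=0, group)
    a, b = groups
    return abs(sel[a] - sel[b]), abs(fpr[a] - fpr[b])

# Toy example: a classifier that selects group "a" more often.
y_true = [0, 1, 0, 1, 0, 1, 0, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]
group  = ["a", "a", "a", "a", "b", "b", "b", "b"]
dp_gap, fpr_gap = fairness_gaps(y_true, y_pred, group)
```

A nonzero gap on either measure flags a disparity, but as the talk discusses, the two criteria generally cannot both be zero at once when base rates differ, which is exactly where the hard tradeoffs begin.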

My favorite resources on these topics:

The Workbench workflow analyzing Massachusetts traffic ticket data.
Sandra Mayson, Bias In, Bias Out. One of my favorite overall discussions of algorithmic bias.
Megan Stevenson, Assessing Risk Assessment in Action. What happens with criminal justice risk assessment in the real world?
Corbett-Davies and Goel, The Measure and Mismeasure of Fairness. A well-done, more mathematical discussion of fairness measures.
Open Policing Project findings. A very clearly thought-out analysis of US national traffic stop data.
Workbench Open Policing Project tutorial. An interactive introduction to working with this data.
Arvind Narayanan, 21 Definitions of Fairness and Their Politics. More on the connection between quantitative and political concepts of fairness.

Extracting campaign finance data from gnarly PDFs using deep learning


Update, Oct 2020: we’ve done a lot more since this post! If you want to try working on this problem, Weights and Biases is very kindly hosting a public benchmark.

I’ve just completed an experiment to extract information from TV station political advertising disclosure forms using deep learning. In the process I’ve produced a challenging journalism-relevant dataset for NLP/AI researchers. Original data from ProPublica’s Free The Files project.

The resulting model achieves 90% accuracy extracting total spending from the PDFs in the (held-out) test set, which shows that deep learning can generalize surprisingly well to previously unseen form types. I expect it could be made much more accurate through some feature engineering (see below).
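One way to picture the extraction task: treat every token on the page as a candidate, score it from features of the token and its context, and pick the best-scoring one. The sketch below uses hand-written features and weights as a stand-in for the learned model; it is illustrative only and is not the actual Deepform architecture.

```python
# Illustrative sketch (not the actual Deepform model): framing total-amount
# extraction as scoring every token and picking the best candidate.
# Features and weights here are hypothetical stand-ins for learned parameters.
import re

def token_features(token, context):
    """Two toy features: does the token look like a dollar amount,
    and does a word like 'total' appear just before it?"""
    looks_like_money = bool(re.fullmatch(r"\$?[\d,]+(\.\d{2})?", token))
    near_total = any(w.lower() in ("total", "gross") for w in context)
    return [1.0 if looks_like_money else 0.0, 1.0 if near_total else 0.0]

def extract_total(tokens, weights=(1.0, 2.0)):
    """Score each token; return the highest-scoring money-like token."""
    best, best_score = None, float("-inf")
    for i, tok in enumerate(tokens):
        context = tokens[max(0, i - 3):i]        # a few preceding tokens
        f = token_features(tok, context)
        score = sum(w * x for w, x in zip(weights, f))
        if f[0] and score > best_score:          # only money-like candidates
            best, best_score = tok, score
    return best

tokens = "Advertiser ACME Order 123 Gross Total $12,500.00 Date 10/01".split()
```

In the real system the hand-written features and fixed weights are replaced by learned representations, which is what lets the model generalize to form layouts it has never seen.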

You can find the code and documentation here. Full thanks to my collaborator Nicholas Bardy of Weights & Biases.


TV stations are required to disclose their sale of political advertising, but there is no requirement that this disclosure is machine readable. Every election, tens of thousands of PDFs are posted to the FCC Public File, available at https://publicfiles.fcc.gov/. All of these contain essentially the same information, but in hundreds of different formats, like these:


Ethical Software Engineering Lab Course


There is now, at long last, wide concern over the negative effects of technology, along with calls to teach ethics to engineers. But critique is not enough. What tools are available to the working engineer to identify and mitigate the potential harms of their work?

I’ve been teaching the effects of technology on society for some time, and we cover a lot of it in my computational journalism course. This is an outline for a broader hands-on course, which I’m calling the Ethical Engineering Lab.

This eight-week course is a hands-on introduction to the practice of what you might call harm-aware software engineering. I’ve structured it around the Institute for the Future’s Ethical OS, a framework I’ve found useful for categorizing the places where technology intersects with personal and social harm. Each class is three hours long, split between lecture and lab time. Students must complete a project investigating actual or potential harms from technology, and their mitigations.

Each lecture is structured around a set of issues, cases where technology is or could be involved in harm, and tools, methods for mitigating these harms. The goal is to train students in the current state-of-the-art of these problems, which often requires a deep dive into both the social and technical perspectives. We will study both differential privacy algorithms and HIPAA health data privacy. In many cases there is disagreement over the potential for certain harms and their seriousness, so we will explore the tradeoffs of possible design choices.


Introducing Workbench


Some of you may have heard about my new data journalism project — the Computational Journalism Workbench. This is an integrated platform for data journalism, combining scraping, analysis, and visualization in one easy tool. It works by assembling simple modules into a “workflow,” a repeatable, sharable, automatically updating pipeline that produces a publishable chart or a live API endpoint.
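The workflow idea can be sketched in a few lines: each module is a small function that transforms tabular data, and a workflow is just an ordered list of modules with their parameters, re-runnable whenever the source data updates. The module names and data below are my own illustration, not Workbench’s actual API.

```python
# Hypothetical sketch of the "workflow" idea: small modules chained into a
# repeatable pipeline. Module names are illustrative, not Workbench's API.

def load_csv_rows(text):
    """Parse simple CSV text into a list of dicts keyed by the header row."""
    header, *rows = [line.split(",") for line in text.strip().splitlines()]
    return [dict(zip(header, r)) for r in rows]

def filter_rows(rows, column, value):
    return [r for r in rows if r[column] == value]

def sort_rows(rows, column):
    return sorted(rows, key=lambda r: r[column])

def run_workflow(data, steps):
    """Apply each (module, params) step in order -- re-runnable on fresh data."""
    for module, params in steps:
        data = module(data, **params)
    return data

raw = "city,year\nOakland,2017\nBerkeley,2016\nOakland,2016"
workflow = [
    (filter_rows, {"column": "city", "value": "Oakland"}),
    (sort_rows, {"column": "year"}),
]
result = run_workflow(load_csv_rows(raw), workflow)
```

Because the workflow is data rather than code, it can be saved, shared, and re-executed against a live source — which is what makes the pipeline “automatically updating.”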

I demonstrated a prototype at the NICAR conference. UPDATE: Workbench is now in production at workbenchdata.com and has been used in teaching in dozens of schools.

I’ll be working on Workbench for at least the next few years. My previous large data journalism project is the Overview document mining system, which continues active development.

Defense Against the Dark Arts: Networked Propaganda and Counter-Propaganda


In honor of MisinfoCon this weekend, it’s time for a brain dump on propaganda — that is, getting large numbers of people to believe something for political gain. Many of my journalist and technologist colleagues have started to think about propaganda in the wake of the US election, and related issues like “fake news” and organized trolling. My goal here is to connect this new wave of enthusiasm to history and research.

This post is about persuasion. I’m not going to spend much time on the ethics of these techniques, and even less on the question of who is actually right on any particular point. That’s for another conversation. Instead, I want to talk about what works. All of these methods are just tools, and some are more just than others. Think of this as Defense Against the Dark Arts.

Let’s start with the nation states. Modern intelligence services have been involved in propaganda for a very long time and they have many names for it: information warfare, political influence operations, disinformation, psyops. Whatever you want to call it, it pays to study the masters.


What do Journalists do with Documents?


Many people have realized that natural language processing (NLP) techniques could be extraordinarily helpful to journalists who need to deal with large volumes of documents or other text data. But although there have been many experiments and much speculation, almost no one has built NLP tools that journalists actually use. In part, this is because computer scientists haven’t had a good description of the problems journalists actually face. This talk and paper, presented at the Computation + Journalism Symposium, are one attempt to remedy that. (Talk slides here.)

This all comes out of my experience both building and using Overview, an open source document mining system built specifically for investigative journalists. The paper summarizes every story completed with Overview, and also discusses the five cases I know where journalists used custom NLP code to get the story done.


The Dark Clouds of Financial Cryptography


I feel we’re on the precipice of some delightfully weird and possibly very alarming developments at the intersection of code and money.  There is something deep in the rules that is getting rewritten, only we can’t quite see how yet. I’ve had this feeling before, as a self-described Cypherpunk in the 1990s. We knew or hoped that encrypted communication would change global politics, but we didn’t quite know how yet. And then Wikileaks happened. As Bruce Sterling wrote at the time,

At last — at long last — the homemade nitroglycerin in the old cypherpunks blast shack has gone off.

That was exactly how I felt when that first SIGACT dump hit the net; I was by then a newly hired editor at the Associated Press. Now I’m studying finance, and I can’t shake the feeling that cryptocurrencies — and their abstracted cousins, “smart contracts” and other computational financial instruments — are another explosion of weirdness waiting to happen.

I’m hardly alone in this. Lots of technologists think the “block chain” pioneered by bitcoin is going to be consequential. But I think they think this for the wrong reasons. Bitcoin itself is never going to replace our current system of money transfer and clearing; it’s much slower than existing payment systems, often more expensive, uses far too much energy, and doesn’t scale well. Rather, bitcoin is just a taste, a hint: it shows that we can mix computers and money in surprising and consequential ways. And there are more ominous portents, such as contracts that are actually code and the very first “distributed autonomous organizations.” But we’ll get to that.

What is clear is that we are turning capitalism into code — trading systems, economic policy, financial instruments, even money itself — and this is going to change a lot of things.