Tuesday, January 16, 2018

Archive of poisonous truths

In ToxicDocs.org, a Treasure Trove of Industry Secrets
THE HISTORIANS Gerald Markowitz and David Rosner took notice when, in 2004, a colleague wrote a 41-page report lambasting their work. 

The two New York professors (Markowitz at the City University of New York and Rosner at Columbia University) had spent decades working together at the intersection of history and public health, and much of their research focused on the consequences of corporate wrongdoing, so attacks weren’t uncommon — or even surprising.

This one, though, was particularly scathing.

Philip Scranton, a historian at Rutgers University, had taken aim at their book “Deceit and Denial: The Deadly Politics of Industrial Pollution” — and at Markowitz in particular. Scranton accused him of everything from “overgeneralization and failure to corroborate” to “selectively appropriat[ing] information,” among a list of other alleged misdeeds.

Rosner and Markowitz’s peers quickly came to their defense, calling Scranton a “hired gun” for the chemical industry. (Scranton had in fact been hired by a group of companies to review two chapters in the book, along with a report Markowitz had prepared for a court case involving job-related chemical exposure.) 

But Rosner and Markowitz knew there would be more rounds to the stressful, time-consuming, and seemingly never-ending fight.

“We didn’t know how to respond,” said Rosner.

One of Rosner’s undergraduate students, Merlin Chowkwanyun, gave them the answer. Why not, he asked, just post all of their source documents — secret company memos, the minutes of internal meetings, industry letters, and more — online and let people decide for themselves? Rosner and Markowitz agreed. 


Together with Chowkwanyun, they started by creating a website and uploading the maligned chapters of “Deceit and Denial,” with each footnote linked to the original supporting documents in their entirety.

“It was an incredibly liberating moment,” Rosner recalls, adding that Chowkwanyun had “taught two old guys the possibilities of what can be done with the web.”

Since then, Chowkwanyun has expanded that early effort into what is now called ToxicDocs.org, a searchable public archive of the many documents that Rosner and Markowitz have gathered in their research over the years, as well as an ever-expanding host of others. 

The site officially launched January 5 with an initial 20 million pages of material focused on six toxic substances: asbestos, benzene, lead, polychlorinated biphenyls (PCBs), polyvinyl chloride, and silica. Millions more pages are coming. “There is no other toxic substances database like this,” said Chowkwanyun, who now teaches at Columbia.

It took some time to get here. 

After completing his undergraduate degree, Chowkwanyun went on to graduate school at the University of Pennsylvania, and while he kept thinking about the work he’d done with Markowitz and Rosner, he wasn’t able to focus on the project in earnest until after he finished his Ph.D. and started his postdoctoral work at the University of Wisconsin-Madison. 

And even then, there were substantial hurdles. Many of the documents arrived unscanned, and many others weren’t in a format that could be read and searched electronically. The first issue was solvable with many hours spent at a scanner. The second was more stubborn.

Chowkwanyun estimated that more than 5 million pages needed to be run through optical character recognition (OCR) to become searchable text, and commercial software took about 30 seconds per page. At that rate, the job would have taken nearly five years of round-the-clock processing. Far too slow.

So Chowkwanyun began tinkering with natural language processing technology and UW-Madison’s high-speed computer to develop a faster method. After about a year, he had found one. A recent batch of about 1.5 million pages required only about three days to process, “which was nothing,” he said.
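The figures quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the article's rounded numbers (5 million pages, 30 seconds per page, 1.5 million pages in three days), and the function and variable names are invented for this example. The per-page figure for the faster method is an aggregate throughput, since the article does not say how the work was spread across the university's machines.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def days_to_convert(pages: int, seconds_per_page: float) -> float:
    """Days of nonstop processing needed at a fixed cost per page."""
    return pages * seconds_per_page / SECONDS_PER_DAY


# Commercial software: about 5 million pages at about 30 seconds each.
commercial_days = days_to_convert(5_000_000, 30)
commercial_years = commercial_days / 365

# Faster method: about 1.5 million pages in about three days.
# This is overall throughput, not the speed of any single machine.
fast_seconds_per_page = 3 * SECONDS_PER_DAY / 1_500_000
speedup = 30 / fast_seconds_per_page

print(f"Commercial software: {commercial_days:.0f} days (~{commercial_years:.1f} years)")
print(f"Faster method: ~{fast_seconds_per_page:.2f} seconds per page overall")
print(f"Approximate speedup: {speedup:.0f}x")
```

On these rounded inputs, the commercial route works out to roughly 1,700 days, in line with the "nearly five years" estimate, while the faster method averages well under a second per page.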

A test version of the website went live last February, with about 100,000 fully searchable pages. (In some other systems, only the document titles are indexed, not the text on the page.) Chowkwanyun has spent the last few months working out the kinks in the site. Some of those have been additions, such as the inclusion of a bookmarking button. Most, though, have been subtractions.

“We originally had a much more elaborate set of features,” he said. But they found that users preferred simplicity, and that a more streamlined site also performed better on mobile devices. “It goes back to the idea of keeping things as slim as possible.”

The project does have similarities to the Truth Tobacco Industry Documents maintained by the University of California San Francisco — an archive of tobacco company advertising, manufacturing plans, marketing campaigns, scientific research, and political activities — as well as The Poison Papers collection, a project of The Bioscience Resource Project and The Center for Media and Democracy that does much the same with the chemical and pesticide industries.

But, Chowkwanyun said, ToxicDocs is designed to be broader in scope and content, as well as much easier for laypeople, like journalists and community members, to use. 

The goal is not only to take the wind out of critics’ sails, but also to encourage a deeper public exploration of the documents. The hope is that it could foster a better understanding of the industries and their impact on communities, and perhaps even lead to new discoveries.

That’s what happened with the tobacco documents, said Susan Polan, who is with the American Public Health Association and has decades of advocacy experience. There are still new tobacco finds to this day, she says, and the same potential exists with ToxicDocs, which she described as user-friendly. “That offers a phenomenal opportunity to the public-health community.”

FOR HIS PART, Chowkwanyun is careful to differentiate ToxicDocs from sites like WikiLeaks, which publishes confidential or personal information often provided by anonymous sources. All the documents on ToxicDocs, he said, are already publicly available. The site just centralizes what would otherwise be scattered in courtrooms and law offices around the country.

With the official launch last week, the range of documents on ToxicDocs is expansive. There are notes from a 1969 Monsanto meeting where the company discusses plans to “sell the hell out of [PCBs].” 

There’s another document, from 1973, that shows the chemical industry debating whether it would be “illegal” to withhold the findings of a medical study from the government. There are some half a million more in the archive.

Sen. Sheldon Whitehouse, a Democrat from Rhode Island, is a staunch supporter of the project. 

When he was his state’s attorney general in the early 2000s, he worked with Markowitz and Rosner on a case against the lead paint industry. They’ve kept in touch since, and Whitehouse is excited about the doors that ToxicDocs might open. 

The new database combines “the reach of discovery with the capabilities of big data technology to provide a rich new resource for researchers, journalists, and public health advocates,” he said in a statement. “Unlocking the secrets of corporate archives can level the field between polluters and victims of industrial contamination.”

Of course, the potential impact of ToxicDocs is still largely theoretical. The earlier iterations were not heavily publicized, and the bulk of the documents were uploaded only recently. It will take time for people to comb through them.

But Chowkwanyun is already looking ahead, and hopes that the site will continue to grow and adapt. In addition to Markowitz and Rosner’s documents, he says, the archive could eventually include others. For example, he’s currently processing documents about the Flint, Michigan, water crisis that were released through Freedom of Information Act requests.

Rosner and Markowitz would love to see the site thrive as well. But, for them, it’s also becoming the refuge they’d first hoped for. Instead of having to defend themselves to every critic, or respond to every document request, they have started referring people to the website. They thank Chowkwanyun for that.

The duo had always wanted to make the material available to the public, Rosner said. “But we didn’t know how.”

Tik Root is a freelance journalist whose work has been published by The Washington Post, Newsweek, The New Yorker, and PBS, among other outlets.