In ToxicDocs.org, a Treasure Trove of Industry Secrets
The two New York professors (Markowitz at the City University of New York and Rosner at Columbia University) had spent decades working together at the intersection of history and public health, and much of their research focused on the consequences of corporate wrongdoing, so attacks weren’t uncommon — or even surprising.
This one, though, was particularly
scathing.
Philip Scranton, a historian at
Rutgers University, had taken aim at their book “Deceit and Denial: The Deadly
Politics of Industrial Pollution” — and at Markowitz in particular. Scranton
accused him of everything from “overgeneralization and failure to corroborate”
to “selectively appropriat[ing] information,” among other alleged
misdeeds.
Rosner and Markowitz’s peers quickly
came to their defense, calling Scranton a “hired gun” for the chemical industry.
(Scranton had in fact been hired by a group of companies to review two chapters
in the book, along with a report Markowitz had prepared for a court case
involving job-related chemical exposure.)
But Rosner and Markowitz knew there would be more rounds to the stressful, time-consuming, and seemingly never-ending fight.
“We didn’t know how to respond,”
said Rosner.
One of Rosner’s undergraduate
students, Merlin Chowkwanyun, gave them the answer. Why not, he asked, just
post all of their source documents — secret company memos, the minutes of
internal meetings, industry letters, and more — online and let people decide
for themselves? Rosner and Markowitz agreed.
Together with Chowkwanyun, they started by creating a website and uploading the maligned chapters of “Deceit and Denial,” with each footnote linked to the original supporting documents in their entirety.
“It was an incredibly liberating
moment,” Rosner recalls, adding that Chowkwanyun had “taught two old guys the
possibilities of what can be done with the web.”
Since then, Chowkwanyun has expanded
that early effort into what is now called ToxicDocs.org, a searchable public
archive of the many documents that Rosner and Markowitz have gathered in their
research over the years, as well as an ever-expanding host of others.
The site officially launched January 5 with an initial 20 million pages of material focused on six toxic substances: asbestos, benzene, lead, polychlorinated biphenyls (PCBs), polyvinyl chloride, and silica. Millions more pages are coming. “There is no other toxic substances database like this,” said Chowkwanyun, who now teaches at Columbia.
It took some time to get here.
After completing his undergraduate degree, Chowkwanyun went on to graduate school at the University of Pennsylvania. Although he kept thinking about the work he’d done with Markowitz and Rosner, he wasn’t able to focus on the project in earnest until he had finished his Ph.D. and started his postdoctoral work at the University of Wisconsin-Madison.
And even then, there were substantial hurdles. Many of the documents had never been scanned, and many others weren’t in a format that could be read and searched electronically. The first problem could be solved with many hours at a scanner. The second was more stubborn.
Chowkwanyun estimated that more than
5 million pages needed to be run through optical character recognition (OCR)
software to make their text searchable, and commercial software took about 30 seconds per page. At that rate,
it would have taken him nearly five years of 24/7 operations to complete. Far
too slow.
So Chowkwanyun began tinkering with natural language processing technology and UW-Madison’s high-speed computer to develop a faster method. After about a year, he had found one. A recent batch of about 1.5 million pages required only about three days to run through OCR, “which was nothing,” he said.
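To put those figures side by side, here is a back-of-the-envelope calculation in Python. It is only a sketch using the article's rounded numbers, under stated assumptions: "more than 5 million" is treated as exactly 5 million pages, a year is taken as 365 days, and the "speedup" compares overall wall-clock throughput, not per-machine performance. It says nothing about how Chowkwanyun's faster method actually works.

```python
# Back-of-the-envelope check of the OCR throughput figures (rounded inputs).
SECONDS_PER_DAY = 24 * 60 * 60

def days_needed(pages: int, seconds_per_page: float) -> float:
    """Days of nonstop (24/7) processing for `pages` at a fixed per-page time."""
    return pages * seconds_per_page / SECONDS_PER_DAY

def seconds_per_page(pages: int, days: float) -> float:
    """Average time per page implied by finishing `pages` in `days`."""
    return days * SECONDS_PER_DAY / pages

# Commercial software: about 30 seconds per page, assumed 5 million pages.
commercial_days = days_needed(5_000_000, 30)
commercial_years = commercial_days / 365  # assumes a 365-day year

# Faster method: about 1.5 million pages in about three days.
fast_rate = seconds_per_page(1_500_000, 3)
speedup = 30 / fast_rate  # wall-clock throughput ratio, not per-machine speed

print(f"Commercial: {commercial_days:.0f} days (~{commercial_years:.2f} years)")
print(f"Faster method: {fast_rate:.3f} s/page, roughly {speedup:.0f}x faster")
```

Run as written, it prints about 1,736 days (roughly 4.76 years) for the commercial route, consistent with "nearly five years," and an implied average of about 0.17 seconds per page for the faster method.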
A test version of the website went
live last February, with about 100,000 fully searchable pages. (In some other
systems, only the document titles are indexed, not the text on the page.)
Chowkwanyun has spent the last few months working out the kinks in the site.
Some of the changes have been additions, such as a bookmarking
button. Most, though, have been subtractions.
“We originally had a much more
elaborate set of features,” he said. But they found that users preferred
simplicity, and that a more streamlined site also performed better on mobile
devices. “It goes back to the idea of keeping things as slim as possible.”
The project does have similarities
to the Truth Tobacco Industry Documents maintained
by the University of California, San Francisco — an archive of tobacco company
advertising, manufacturing plans, marketing campaigns, scientific research, and
political activities — as well as The Poison Papers collection, a
project of The Bioscience Resource Project and The Center for Media and
Democracy that does much the same with the chemical and pesticide industries.
But, Chowkwanyun said, ToxicDocs is
designed to be broader in scope and content, as well as much easier for
laypeople, like journalists and community members, to use.
The goal is not only to take the wind out of critics’ sails, but also to encourage a deeper public exploration of the documents. The hope is that it could foster a better understanding of the industries, their impact on communities, and perhaps even lead to new discoveries.
That’s what happened with the
tobacco documents, said Susan Polan, who is with the American Public Health
Association and has decades of advocacy experience. There are still new tobacco
finds to this day, she says, and the same potential exists with ToxicDocs,
which she described as user-friendly. “That offers a phenomenal opportunity to
the public-health community.”
For his part, Chowkwanyun is careful to differentiate
ToxicDocs from sites like WikiLeaks, which publishes confidential or personal
information often provided by anonymous sources. All the documents on
ToxicDocs, he said, are already publicly available. The site just centralizes
what would otherwise be scattered in courtrooms and law offices around the
country.
With the official launch last week,
the range of documents on ToxicDocs is expansive. There are notes from a 1969
Monsanto meeting where the company discusses plans to “sell the hell out of
[PCBs].”
There’s another document, from 1973, that shows the chemical industry debating whether it would be “illegal” to withhold the findings of a medical study from the government. There are some half a million more in the archive.
Sen. Sheldon Whitehouse, a Democrat
from Rhode Island, is a staunch supporter of the project.
When he was his state’s attorney general in the early 2000s, he worked with Markowitz and Rosner on a case against the lead paint industry. They’ve kept in touch since, and Whitehouse is excited about the doors that ToxicDocs might open.
The new database combines “the reach of discovery with the capabilities of big data technology to provide a rich new resource for researchers, journalists, and public health advocates,” he said in a statement. “Unlocking the secrets of corporate archives can level the field between polluters and victims of industrial contamination.”
Of course, the potential impact of
ToxicDocs is still largely theoretical. The earlier iterations were not heavily
publicized and the bulk of the documents were only uploaded recently. It will
take time for people to comb through them.
But Chowkwanyun is already looking ahead, and hopes that the site will continue to grow and adapt. In addition to Markowitz and Rosner’s documents, he says, the archive could eventually include others. For example, he’s currently processing documents about the Flint, Michigan, water crisis released through Freedom of Information Act requests.
Rosner and Markowitz would love to
see the site thrive as well. But, for them, it’s also becoming the refuge
they’d first hoped for. Instead of having to defend themselves to every critic,
or respond to every document request, they have started referring people to the
website. They thank Chowkwanyun for that.
The duo had always wanted to make
the material available to the public, Rosner said. “But we didn’t know how.”
Tik Root is a freelance journalist
whose work has been published by The Washington Post, Newsweek, The New Yorker,
and PBS, among other outlets.