Big Data & the Public Good: A Conversation about Array of Things at SAIC

On April 4th, Illinois Humanities hosted “Big Data & the Public Good” at the School of the Art Institute of Chicago. The event overviewed the Array of Things urban sensing project and facilitated a conversation on the role of technology in contemporary society. Smart Chicago’s Executive Director Dan O’Neil moderated the event. The featured presenters were Douglas Pancoast and Marissa Lee Benedict.

This program was organized by the School of the Art Institute of Chicago and was supported in part by the Robert R. McCormick Foundation. The event website listed the framing questions on big data, technology, and democracy:

What is the relationship between information technology, urban space, and the public good in the age of big data? Where do “smart cities” initiatives like the Array of Things – which doesn’t collect any information about individuals – fit into contemporary conversations about privacy and surveillance? How can the arts and humanities help our society think through these issues?


Douglas Pancoast, an Associate Professor of Interior Architecture and Designed Objects at the School of the Art Institute of Chicago, designed the Array of Things sensor enclosures along with Satya Mark Basu. Pancoast gave background on the project and shared the evolving iterations of the design.

Pancoast overviewed the functionality of the Array of Things nodes — that they will measure air quality, standing water, noise pollution, wind, light, pedestrian traffic, and other environmental factors. Wired Magazine called Array of Things a “Fitbit for the City.”

The second speaker, Marissa Lee Benedict, gave an artist’s perspective on Array of Things. Benedict is a lecturer at the School of the Art Institute of Chicago in Sculpture and Fiber & Material Studies, and works as the Program Coordinator for the Arts, Science & Culture Initiative at the University of Chicago. As an artist, Benedict can approach technology with a different perspective — she can see the art in data and fiber-optic cables. She can also assist with activist gestures in a way other people working with technology cannot.

What is the value of open data from urban sensors?

Several themes arose from the audience questions at the event. The first theme centered on the expected benefits of Array of Things and the data it would produce.


Pancoast pointed out that society has always placed value in creating and investing in archived, searchable collections of information. Organizing and sharing the data produced by an urban sensing project arguably has the same societal value as building a library and filling it with books.

Still, communicating the value of collecting data through urban sensors also means articulating compelling, relevant use cases of the data. The value of open data from this smart city infrastructure is less clear unless there are specific examples of how data can be turned into local action.

Pancoast shared several problems that could be identified and illuminated by Array of Things data. Some examples:

  • Understanding how exhaust activity at O’Hare impacts surrounding property values
  • Understanding how noise pollution in certain areas of the city should impact zoning
  • Understanding the source of standing water

One specific case highlighted was Albany Park, which has a high incidence of flooding. If we could watch the flooding, monitor where the water goes and how long it takes to evaporate, and see how it correlates with other environmental factors, the problem could be better defined.

Engagement & Participation

Of course, problem identification isn’t enough to catalyze change. Communities have to be involved and empowered to act on the new information. In this light, another main theme from the event was resident engagement: what type of engagement is needed, who to engage with, and how to do it well. Specifically, there was an interest in how Chicagoans might broadly engage with the Array of Things project outside of targeted efforts in schools and youth programs. At Smart Chicago, we are committed to this broad engagement with urban sensing and the Internet of Things.

Benedict shared a set of thought-provoking questions with the audience.

Dan O’Neil shared some of the best practices that Smart Chicago has gleaned: do engagement work as openly as possible, document your process and planning, invite everyone, and “fetishize the outputs.” One recent example of this model of engagement is Smart Chicago’s work with the Chicago Police Accountability Task Force Community Forums.

This event facilitated an interesting conversation about data, participation, and urban sensors — a conversation that needs to be continued openly, interactively, and across different venues in Chicago. Smart Chicago is committed to broad community engagement on the Array of Things project. To learn more about this work, visit our project page.

Chicago Hospitals Win $8.75 Million to Launch Data Network

A data-sharing network of 10 Chicago hospitals could make medical research more reliable and less expensive. It’s a big-data project that keeps patients’ records locked up, but lets researchers search for trends.

An $8.75 million grant will fund the three-year launch of the Chicago Area Patient-Centered Outcomes Research Network. Awarded July 21, the contract taps money set aside in the Affordable Care Act for medical research.

Terry Mazany, The Chicago Community Trust

“What’s unique about CAPriCORN is that it brings together these 10 institutions that historically have been competitors, or at least disinterested in each other,” says Terry Mazany, chief executive of The Chicago Community Trust and the project’s principal investigator. (The trust is also a Smart Chicago funder.)

“This brings them together in a very formal organization across the entire region,” Mazany says, “with a patient population of upwards of 5 million patients potentially available for research, and in particular a patient population that is very diverse.”

The Chicago network and clinical networks in 10 other regions will allow health advocates to monitor even rare conditions and prove how well current treatments work.

Their first test will be Duke University’s nationwide study to prove whether taking children’s aspirin to prevent a heart attack is as effective as an adult dose, which carries potential side effects. Researchers in Chicago and five other cities will study 20,000 at-risk heart patients, a large sample size that allows fine-tuned analysis.

Richard Kennedy, Loyola University Chicago

“They contacted us and said, you’ve got the numbers that we need, would you be able to participate?” says Richard Kennedy, vice provost for research and graduate studies at the Loyola University Chicago health sciences division in Maywood. “We had a significant number of patients that would fit nicely in the cohort.” Kennedy and Frances Weaver are Loyola’s head researchers for the data network.

Hospitals now are collaborating on how to conduct the trial and manage the data. Other studies will track obese patients after bariatric surgery and children on antibiotics to treat immune disorders. Mazany sees Chicago hospitals as active participants. “When the national level is looking at need and expertise in an area, we have a far broader and deeper bench than any of the other systems,” he says. “That’s a real strength.”

In a $7 million startup phase, CAPriCORN built out a system to connect the medical centers without exposing patient information. The next phase explores its real-world uses, as well as a funding model that puts patients’ interests first.

The aspirin study “is going to answer a question of great clinical concern,” Kennedy says, “but the importance is truly we’re testing the infrastructure we’ve been building for the past 18 months. All right, you’ve put together what seems to be a very impressive informatics system with all the security we would want for our patients. Now let’s see if it works.”

Privacy starts with keeping personal identifiers off the network. Researchers query data in a small, separate access layer, with names and addresses reduced to a cryptographic hash. “We’re currently having it validated by a security firm that’s one of the top in the region to make sure it protects subjects,” Kennedy says.

A novel algorithm links the anonymous patients’ records across all hospitals, giving public health researchers a more reliable count of how common their condition is and where to find hot spots. “You have the ability to look for rare diseases and aggregate an adequate sample size to do statistically significant studies,” Mazany says.
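To make the idea of linking hashed records concrete, here is a minimal sketch. It is not CAPriCORN's algorithm, which the article does not describe; it only illustrates the general technique of replacing identifiers with a keyed hash and grouping records that share the same hash. The key, the field choices, and all sample records are invented for illustration.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical shared secret. A real network would need careful key management.
SECRET_KEY = b"example-key-not-for-production"


def pseudonym(name: str, address: str) -> str:
    """Return a keyed SHA-256 hash of normalized identifiers (illustrative fields)."""
    normalized = f"{name.strip().lower()}|{address.strip().lower()}"
    return hmac.new(SECRET_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def link_records(records):
    """Group (hospital, name, address, condition) rows by pseudonym.

    The returned mapping contains hashes only, never names or addresses.
    """
    linked = defaultdict(list)
    for hospital, name, address, condition in records:
        linked[pseudonym(name, address)].append((hospital, condition))
    return dict(linked)


# Invented sample data: the same person appears at two hospitals.
records = [
    ("Hospital A", "Jane Doe", "12 Example St", "asthma"),
    ("Hospital B", " jane doe ", "12 example st", "asthma"),
    ("Hospital B", "John Roe", "34 Sample Ave", "diabetes"),
]
linked = link_records(records)
unique_patients = len(linked)  # counts people, not hospital visits
```

Exact-match hashing like this breaks on typos and changed addresses, which is one reason real record linkage tends to be considerably more involved than this sketch.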

“There’s a next step in some of the research designs,” he adds. Instead of just counting how many patients share a condition, studies that pass an ethics review will reach out to them.

“Let’s say you’re looking at exploring treatments for sickle cell, and you’re specifically looking at teenagers as a population,” he explains. “Then you can do a query to identify the total population and where they’re distributed among institutions.”

Hospitals then can ask patients to join clinical trials that will log treatment details. “It still protects patient privacy but is able to more efficiently identify candidates for the research study,” Mazany says.

Researchers see the network as a low-cost way to recruit trial subjects. “Instead of tens of thousands of dollars per participant, then it’s dollars per participant,” Mazany says. “You leverage the efficiency of large data systems so each researcher doesn’t independently have to enroll institutions.

“What makes this in someone’s interest? Lowering the cost of research, speeding up research, creating greater effectiveness. Those three standards are part of a health system that’s learning and evolving rapidly.”

The focus likely will improve data handling as well. “One interesting byproduct could be if there is unevenness across institutions that may become apparent,” Mazany says.

Research on the network will be subject to more thorough advance review. “It’s patient centered,” Loyola’s Kennedy says. “It includes a lot of patient input into the design of the study, the importance of the study to the subjects, the patients, the community.”

Like other clinical trials, research must pass muster with an institutional review board. Feedback also comes from a doctor-patient advisory panel that includes advocates for treating asthma, arthritis and other diseases.

“There’s also a pastor’s group on the South Side that’s very active,” Mazany says. The advisory group “totals about 30 people — it’s a pretty large group.”

The extra review should put important research on a fast track, and prime doctors and patients to follow its recommendations.

“Oftentimes research truly answers medical questions for the people that ran it, yet the results don’t get distributed and implemented as well as we would like,” Kennedy says. “We hope that by engaging the community and the patients – and the clinicians who are taking care of those patients – the results will be implemented much more quickly, because they will be designed in part by input from these subjects.”

The aspirin study also will look into the benefits of mobile health devices. A University of California-San Francisco team will give some participants apps to send reminders and record activity. In Chicago, Kennedy says investigators are considering how they might manage frequent readings from blood sugar monitors in a diabetes trial.

The network is “more open and accessible for that type of data collection,” Mazany says. “Who knows where that will lead as far as the efficacy of the research?”

Hospitals will have to consider a long-term funding model after federal funding runs out in 2018. “We’ve been contacted by an industry sponsor, who would very much like to think that there was a Chicago network they could access without working individually with the 10 institutions,” Kennedy says. “That’s going to take some time to create that kind of trust.”

Mazany wants to make sure patient advocates can propose research on the network, but they’ll need to be thoroughly vetted. “They’ll come up with their own queries, but there won’t be an open-data hack night,” he says. “There are just too many privacy and security concerns with these types of data. But in a sense, the hack night would be communities and patients identifying questions that could interrogate data sets through the mechanism of the queries.”

The network has no data portal, but researchers will be encouraged to find ways to show their work outside of medical journals. That may include websites such as Smart Chicago’s Chicago Health Atlas, a past collaborator with hospital networks.

“The Health Atlas is an example of a good partner both on the front end of identifying important trends in the data that can help to frame priorities, and then on the back end as a distribution system for communications outward,” Mazany says. “I look for the Health Atlas to be a very valuable partner, but none of this has been formalized.”

The big-data approach also might spread beyond hospitals. “I don’t know how that’s going to play out,” Mazany says. The network already includes community health centers that store electronic health records centrally. He envisions opening up the network to more health providers.

“That line of thinking is an exciting frontier,” Mazany says. “Right now everybody is up to their necks in alligators draining the swamp. The analogy with Walt Disney envisioning Disney World and Florida in the midst of the swamp I think is appropriate here. We have a vision and are laying the infrastructure to have arise a Magic Kingdom.”

Private Data and Public Health: How Chicago Health Atlas Protects Identities

Public health information is anonymous. In the day of the data breach, identity theft and wearable health trackers, data scientists have procedures in place to keep it that way.

“Health information presents a huge risk,” says security researcher Larry Ponemon. In a report for the Medical Identity Fraud Alliance, he estimates that 2.3 million Americans have been victims of medical identity theft.

The biggest danger to consumers comes from others using their insurance or other identification to run up medical bills. “You want to take whatever steps you can to protect yourself,” Ponemon says. “In the hands of a criminal, that could be really valuable.”

Data provided to researchers, such as the statistics in the Chicago Health Atlas, are stripped of private data beforehand. What’s left is information designed to compare groups, not individuals.

The goal in data handling is to make sure identities can’t be guessed.

“There is a difference between privacy and security issues,” says Brad Malin, vice chair of biomedical informatics at Vanderbilt University School of Medicine in Nashville. Malin advised on safe handling of atlas data.

“The power of opening up the data is giving people some quick intuition about issues that deserve study,” Malin says. “You can take diabetes and look for a correlation where there are food deserts.”

Chicago Health Atlas: Adult diabetes rates, 2006 through 2012.

The Chicago Health Atlas map shows high diabetes levels across a large swath of Chicago’s South and West sides. Hospital records suggest the highest prevalence in North Lawndale’s 60623 ZIP code, with the most hospitalizations. An animation shows hospitalizations year by year, with the highest recent rate in Calumet Heights’ 60619 area.

Before giving statistics to outsiders, hospitals remove names and other identifiers, such as birthdays or treatment dates. “You may see residents of one neighborhood with an increased chance of having that diagnosis,” Malin says, “but this system will not allow you to drill down on any factors. There’s no individual-level data.”

Health workers also withhold unique cases, where a patient might be identified from a combination of sources and guesswork. “We did not investigate rare disorders in the Health Atlas,” Malin says. “You never disclose information on less than five people.”
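The "fewer than five" rule Malin describes is often called small-cell suppression. The sketch below is an illustration of that general idea, not the Health Atlas's actual code; the threshold constant reflects the quote, while the function name and the counts are invented.

```python
# Threshold taken from the quote above: never disclose fewer than five people.
MIN_CELL_SIZE = 5


def suppress_small_cells(counts, threshold=MIN_CELL_SIZE):
    """Replace any count below the threshold with None (meaning "not disclosed").

    Note that this simple version also hides zero counts.
    """
    return {area: (n if n >= threshold else None) for area, n in counts.items()}


# Invented case counts keyed by ZIP code, for illustration only.
counts = {"60623": 42, "60619": 17, "60601": 3}
published = suppress_small_cells(counts)
```

Only the third area is withheld here, because its count of 3 falls below the threshold.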

This can be a sticky issue for agencies that tackle public health emergencies. In a privacy panel at last fall’s Chicago School of Data conference, City of Chicago informatics project manager Matthew Roberts noted that information like date, sex, county and age might be enough to reveal the identity of a West Nile virus victim.

“If you take a look at the obituaries in a small county,” Roberts said, “for any of those given days where the date of death was mentioned, you could pretty quickly figure out who was the 84-year-old male who had died from disease x.”

Federal guidelines recommend making some data more general to protect privacy. In a case like a West Nile virus death, health workers will give an age range, or a wider area such as northern Illinois.
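Generalization of this kind can be sketched in a few lines. This is an assumed illustration rather than any agency's procedure: the ten-year age bands, the county-to-region table, and the sample case are all hypothetical.

```python
def generalize_age(age: int, width: int = 10) -> str:
    """Map an exact age to a band such as '80-89' (band width is an assumption)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


# Hypothetical lookup table; a real one would cover every county.
REGION_BY_COUNTY = {"Lake": "northern Illinois", "McHenry": "northern Illinois"}


def generalize_case(case):
    """Return a coarser copy of a case record, dropping the exact age and county."""
    return {
        "age_range": generalize_age(case["age"]),
        "area": REGION_BY_COUNTY.get(case["county"], "Illinois"),
        "sex": case["sex"],
    }


# Invented case, echoing the obituary example in the text.
released = generalize_case({"age": 84, "county": "Lake", "sex": "male"})
```

The released record keeps enough detail for counting cases by region and age band while making it harder to match a single obituary.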

Data mining also figures into how much information is released. Health workers consider whether identities can be pieced together from multiple sources. That’s a real danger in data breaches: Hackers mine social media profiles to work up enough information to make a false credit application or tax refund filing.

To study medical outcomes by neighborhood, several years of data might be combined to cut the chances that individuals might be re-identified.

“It’s safety in numbers,” Malin says. “You put your faith in that a certain number of individuals are enough to protect the anonymity of everyone in the group. As you get more specific, the risks go up.”
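Pooling several years of data, as described above, can be sketched as a simple sum of yearly counts per neighborhood. The neighborhood labels and numbers below are invented, and the function is a hypothetical helper rather than anything taken from the Health Atlas.

```python
from collections import Counter


def pool_years(yearly_counts):
    """Add up per-neighborhood counts across all years."""
    pooled = Counter()
    for counts in yearly_counts.values():
        pooled.update(counts)  # Counter.update adds counts instead of replacing them
    return dict(pooled)


# Invented data: neighborhood "A" never reaches five cases in any single year.
yearly = {
    2010: {"A": 2, "B": 9},
    2011: {"A": 3, "B": 7},
    2012: {"A": 1, "B": 8},
}
pooled = pool_years(yearly)
```

In this example neighborhood "A" has six pooled cases, so a three-year figure could be reported even though each single-year figure would fall under a five-person minimum.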

Still, there are dangers to being profiled as a group. Chicago community activists fought for years against insurers identifying whole neighborhoods as bad risks. Battles against home insurance redlining ultimately were resolved in court.

Health care reform bans insurers from denying coverage for pre-existing conditions. However, the Affordable Care Act still allows higher rates by location. The rules require broad areas, no smaller than an entire county. But higher costs still may keep some insurers out of urban areas.

“What are the risks? It’s not quite clear,” Malin says. “In this situation the dangers are group-based. The regulations are defined with respect to individuals.”

Citizens give up bits of their privacy every day to stores or websites tracking their habits, with few complaints if it keeps prices low. But we treat medical care as a public good. We accept that some small piece of our health interactions is for the greater good, whether it’s teaching interns on hospital rounds or stopping infectious disease outbreaks. Our medical care is confidential, but not exactly secret.

“For 150 years, there’s been the expectation that medical information will be used for the public benefit,” Malin says. “In any teaching hospital, or any for-profit hospital for that matter, the information can be reused, unless you decide to be an anonymous patron who pays out of pocket.

“At the end of the day it’s a risk-utility tradeoff,” Malin says. “Unless somebody is actually harmed, they’re not going to see this as a risky situation. These are questions on the table as we move into a data-based society.”

Dan O’Neil on WBEZ re: the Limits of Open Data

On Monday, Smart Chicago Executive Director Dan O’Neil went on WBEZ’s Tech Shift to talk about what the Homan Square story says about open data in Chicago.

Dan wrote a blog post on both the Smart Chicago blog and his own personal blog with his thoughts on the issue.

Dan spoke about how he’s been a big fan of the open data policy, but that we’ve run right against the limits of open data. Some things are just not publishable, and data that does get published has limited utility. The crime incident data, for instance, has always been limited and the city’s always been upfront about it. (More here.)

Dan also spoke about how the Open Data movement has generally assumed that if we release data, steps 2-10 (the civic innovation) will occur all on their own, and how this may not be true.

Dan stated that we can’t have a data-first solution for civic tech. We have to start with people and with what they know.

Boodhoo also asked about the missing crime data on the data portal, recently uncovered by the Crime in Wrigleyville and Boystown blog. Dan responded that the people who run the blog did a great service: they used the portal to look for a specific case, spoke out about what they found, and their work resulted in more data being added.

Here’s the whole interview:

Add your own data sets to

Today, the team released a new feature that allows you to add your own data sets to the platform.

Brett Goldstein presenting at Code for America Summit 2014

The platform was conceived as a centralized hub for open datasets from around the country. Funded by the National Science Foundation and the MacArthur Foundation, and led by a team of prominent open data scientists, researchers, and developers, it is a collaborative, open-source solution to the problems inherent in the rapid growth of government data portals.

Today, the team added a new feature that allows people to submit their own datasets. Currently, the platform accepts any URL to a comma-separated value (CSV) file, or a link to a dataset on a Socrata data portal, as long as the dataset has fields with the following attributes:

  • Unique id: a field that is guaranteed to contain a unique number for every row in the dataset, even if rows are updated
  • Observation date: a date or datetime field for each observation
  • Latitude/Longitude or Location: either two fields with latitude and longitude, or a single field with both of them formatted (latitude, longitude)

If you have a dataset that has these fields, you can submit it on the website and it will be reviewed by the team.
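Before submitting, it may help to check a CSV against the three requirements listed above. The sketch below is a rough, assumed pre-check, not the platform's actual review process. The column names are hypothetical, it assumes ISO-formatted dates, and it handles only the two-field latitude/longitude layout.

```python
import csv
import io
from datetime import datetime


def check_dataset(csv_text, id_field, date_field, lat_field, lon_field):
    """Return a list of problems found; an empty list means these basic checks passed."""
    problems = []
    seen_ids = set()
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        # Unique id: present and not seen before.
        row_id = row.get(id_field)
        if not row_id or row_id in seen_ids:
            problems.append(f"row {row_number}: missing or duplicate id")
        seen_ids.add(row_id)
        # Observation date: must parse as an ISO date or datetime (an assumption).
        try:
            datetime.fromisoformat(row.get(date_field) or "")
        except ValueError:
            problems.append(f"row {row_number}: unparseable date")
        # Latitude/longitude: numeric and within valid ranges.
        try:
            lat = float(row.get(lat_field) or "")
            lon = float(row.get(lon_field) or "")
            if not (-90 <= lat <= 90 and -180 <= lon <= 180):
                raise ValueError("coordinates out of range")
        except ValueError:
            problems.append(f"row {row_number}: invalid coordinates")
    return problems


# Invented sample: the second data row breaks all three rules.
sample = (
    "id,observed,lat,lon\n"
    "1,2014-09-23,41.88,-87.63\n"
    "1,not-a-date,41.90,-200\n"
)
problems = check_dataset(sample, "id", "observed", "lat", "lon")
```

A dataset that uses a single combined location field would need an extra parsing step before the coordinate check.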

Tonight on Smart Chicago Live: The City of Big Data at OpenGov Hack Night

At tonight’s OpenGov Hack Night, Bo Rodda will talk about his work with the interactive 3D city model at the City of Big Data Exhibition.

Rodda is currently working on a platform to allow anyone to submit their own visualization for public display. Learn more tonight at hack night (every Tuesday at 6:00pm at 1871) or by tuning into our live stream at

City of Big Data exhibit at the Chicago Architecture Foundation. Photo by John Tolva

We’ll post the live stream below the fold at start time!
