Civic Hax

A blog probably about FOIA and civic hacking.

Losing a 5-year-long Illinois FOIA lawsuit for database schemas

March 2, 2025 — Matt Chapman

Thomas Ptacek, a friend and expert witness in this lawsuit, summed it up best in the court's hallway while walking within three feet of opposing counsel: "This is fucking stupid".

His companion post explains why.

Intro

Working with the City of Chicago's parking ticket data—which I've received through FOIA—has always been a pain, especially in terms of knowing what exactly to request. In August 2018, I attempted to generally solve that problem, by submitting a request for the following:

An index of the tables and columns within each table of CANVAS.
Please include the column data type as well.

Per the CANVAS specification, the database in question is Oracle, 
so the below SQL query will likely yield the records pursuant to this request:

select utc.column_name as colname, uo.object_name as tablename, utc.data_type as type
from user_objects uo
join user_tab_columns utc on uo.object_name = utc.table_name
where uo.object_type = 'TABLE'

CANVAS Database Schema request on Muckrock
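
If you don't have an Oracle instance lying around, here's a minimal sketch of what that kind of table-and-column index looks like, using Python's built-in sqlite3 module as a stand-in. The two tables and their columns are made up for illustration and are not the real CANVAS schema. I'm assuming SQLite's sqlite_master and pragma_table_info as rough equivalents of Oracle's user_objects and user_tab_columns.

```python
import sqlite3


def schema_index(conn):
    """Return (table, column, declared type) rows for every table."""
    query = """
        SELECT m.name AS tablename, p.name AS colname, p.type AS type
        FROM sqlite_master AS m
        JOIN pragma_table_info(m.name) AS p
        WHERE m.type = 'table'
        ORDER BY m.name, p.cid
    """
    return conn.execute(query).fetchall()


# Hypothetical stand-in tables; NOT the real CANVAS schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tickets (ticket_number TEXT, issue_date TEXT, fine_amount INTEGER);
    CREATE TABLE boots (boot_id INTEGER, ticket_number TEXT);
""")

index = schema_index(conn)
for tablename, colname, coltype in index:
    print(f"{tablename}.{colname}: {coltype}")
```

The output is only names and types: no rows, no ticket data, nothing about who can log in.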

After the City initially denied the request with an argument that the records' release would compromise network security, I took the denial to court, where we won at trial. The City then appealed, and we won that as well. The case ultimately went up to the Illinois Supreme Court, where we lost unanimously. Better Government Association did a good explainer of the consequences of that loss, which boils down to a significant broadening of public agencies' leeway to apply exemptions (i.e., withhold records or redact information) in response to FOIA requests.

Why Go Through All of This?

Under Illinois FOIA case law, if a request's responsive documents—the set of records or information within the scope of that request—are stored in a queryable database, a query must be written. The requester is not required to write the query. The law even requires the agency to give you the data in a format of your choice (protip: "excel format"). When it works, it's freaking great. Reality makes it difficult for a number of reasons, though:

  • The FOIA officer will likely need to defer any querying to a colleague who is a "data person."
  • You can't just ask a question: "FOIA does not compel the agency to provide answers to questions posed by the inquirer."
  • From the requester's perspective, "Is X column requestable?" isn't answerable without first attempting to request that column's data.
  • Requesting too many columns will likely lead to time-consuming back-and-forth, or a flat-out denial.
  • Even though Illinois FOIA requires that a requester be given a chance to narrow their request, FOIA officers sometimes just stop responding during this "conferral" process.

To generally work through this problem, many folk will spend hours surfing through PDFs, reports, contracts, work products, etc., just to get a sense of what data might exist. This process is frustrating and often yields incomplete results. Let's walk through my attempt with CANVAS.

First Attempts for Parking Ticket Data

My very first FOIA request was pretty narrow and sought the City's towing data. The City was unable to get me what I requested for reasons I can't seem to find, but it painted a picture that Chicago doesn't really track how cars are towed.

A month later, the project began shifting towards parking ticket data in addition to towing data, so I requested:

all raw towing and parking violation records available in the CANVAS system and any records that are from imported/interpolated from non-CANVAS systems.

This request was denied. The Department of Finance argued that the request would take between 140 and 200 hours to complete:

There are 55 million ticket records and 928K seizure records in CANVAS. As far as tow information, we only have knowledge of when a vehicle is towed due to a boot and released. The Department of Finance's application support vender estimates a minimum of 60-80 hours to design/develop/test and run the program.

In addition, since this is like a conversion to another system, we are not sure how long it would take to transfer so much data, a rough estimate would be an additional 80-120 hours to design a solution to get all the data on some kind of media for retrieval. Compliance with this request as currently written would take approximately 140-200 hours utilizing our vendor's resources to the exclusion of other work assignments.

A couple months and some phone calls later, I submitted a narrower request, which was successfully fulfilled, because I included an explicit list of fields. After honing the request language a bit more, I was eventually able to get the data used in the analysis of my first blog post.

But Wait, Is There More?

Despite getting the limited information I had requested, I still wanted to expand my analysis, which required knowing what other information exists within CANVAS. So, I submitted another request for high-level and low-level system information:


1. Code for CANVAS
2. Logs of CANVAS and/or CANVAS log analysis. 
3. Documentation for the operation of CANVAS, including how information is stored, what kind of database is used, along with any other technical documentation or generic documentation.
4. Any Wiki page related to CANVAS.
5. Any analysis of City parking ticket levels or trends.

The only record the City sent in response was a lackluster spreadsheet with just 100 rows, broken down by ward. I'm still not sure if this was the only analysis ever done at the time, but let's get back to the meat of this blog post.

Items 1, 2, and 3 were denied because:

[The records] could be used in a security breach against CANVAS and jeopardize the security of the system, therefore it is being withheld.

But with the goal of just figuring out what information exists, the request was extremely wide and could have been narrowed to something more akin to a "data dictionary". To this day, I've never been able to get anything like a data dictionary from the City, though there is a contractual obligation—as described in the RFP spec for this $200 million system—for the City to maintain something like that! But alas, at least in 2018, the City claimed they don't have anything like it.

https://www.documentcloud.org/documents/25537825-document/#document/p180/a2624483
—Professional Services Agreement Between the City of Chicago Department of Finance and Department of Administrative Hearings and IBM Corporation: City of Chicago Violation, Noticing and Adjudication Business Process and System Support, p. 180 (2012)

Requesting Database Records from All City Databases

Sensing a pattern of a general failure to maintain data dictionaries, despite the City's public support for launching one, I submitted a FOIA request to every City agency for the following:

1. A short description of the database.
2. The names of the applications that are able to run queries/inserts.
3. All usernames and permissions
4. All database table names.
5. All column names in each table.
6. A description of each column.
7. Number of rows in each table.

A couple weeks later, Chicago's Department of Law sent me a letter on behalf of every agency and denied all parts, 1 through 7, of that request.

First, they argued that they would need to "create a new document":

First, no City Department is in possession of a document which contains the information you seek. The only way to compile the requested information, to the extent it is not exempt for one or more of the reasons noted below, would be to create a document.

Then, they requested a pedantic clarification about what "database" means:

Your request does not provide a definition of the term database. A commonly accepted definition of "database" is collection of pieces of information that is organized and used on a computer. http://www.merriam-webster.com/dictionary/database. Such a broad definition would include Excel spreadsheets. It would be unduly burdensome to the operations of each of the City's Departments to search every computer in use by its personnel in order to identify, open, review and catalogue each database and every Excel spreadsheet in the manner you request.

But even with all of that, they offered a helpful suggestion, and pointed to the City's "data dictionary":

Please note that in late 2013, the City of Chicago launched a publically available Data Dictionary which can be found at http://datadictionary.cityofchicago.org/. It is described as “a resource for anyone who is interested in understanding what data is held by City agencies and departments, how and if it may be accessed, and in what formats it may be accessed.”

Cool! It's a damn shame the system shut down less than a year later, though.

"Metalicious": Chicago's Failed Data Dictionary

A lot of government agencies have absolutely recognized the problem of the public not knowing what information exists, including Chicago. One such attempt at fixing this problem is to voluntarily make the columns and table names of their databases open to the public, like the Department of Justice's PDFs of table names, column names, and descriptions of both. There's even an open specification for government database schemas!

But even with agencies voluntarily making schema information public, such releases are effectively discretionary and are outside of the realm of FOIA.

One such release of discretionary information, as the Department of Law mentioned in their denial letter, is the 2013-released city-wide data dictionary project called "Metalicious". That's the actual name.

Metalicious was funded by a $300,000 John D. and Catherine T. MacArthur Foundation grant to UChicago's Chapin Hall, with the intended purpose of making table names, column names and descriptions of both publicly accessible. It's the City's "data dictionary".

Schema information of the Chicago Budget System on Metalicious (2016)

One system whose database schema information was released is the Chicago Budget System (CBS). A total of 110 tables are listed, with descriptions and a link to each table's columns. An interesting table worth investigating on its own is BOOK_ALDERMANIC_PAYRATE, which is described as "data used for creating pay schedule for aldermanic staff published in the Budget Book". Good to know!

Metalicious received some attention in civic data circles:

Journalists and civic inquisitors can use it to determine what information is available when composing Freedom of Information Act requests. Based on my own experience, knowing what to even ask for has been a challenge. All that is over.

All That Is Over: Its Inevitable Shutdown

Within a few short years, the project ostensibly shut down and its front page was replaced with a message about being down for "temporary maintenance". That temporary maintenance has been ongoing for about nine years now.

Down For Maintenance

Back in 2018, I asked the City's now-former Chief Data Officer Tom Schenk why it was shut down, and he explained:

Metalicious was retired because of lack of resources to expand it (originally grant funded). It had some, but very, very small proportion of databases. There was security review of any published data and some information was withheld if we felt it could undermine the application security. By Info Sec policy, it is confidential information until a review deems it appropriate for public release--same as the open data workflow which mirrors the FOIA workflow.

RIP.

Down For Maintenance | Last-Known Running | Metalicious GitHub

Requesting Metalicious

Okay, that's not surprising, but since the first goal here was to figure out whether column and table names are requestable, I submitted my request for the MySQL dump of Metalicious. As these things go, that request was also denied:

Please be advised the Department of Innovation and Technology neither maintains nor possesses any records that are responsive to your FOIA request.

So, I submitted another request and was sure to include a quote from a press release that was explicit about the Department's ownership of Metalicious.

They eventually sent me a copy of a MySQL dump with about 150 databases' columns and table names, including their descriptions. Neat! Progress!

To me, this reasonably shows that the City can provide table names and column names of City databases under IL FOIA.

The CANVAS Request and Trial

This brings us back to the FOIA request for the CANVAS database schema, which was twice appealed and died at the Illinois Supreme Court.

The request included a SQL statement for the City to run in order to fulfil the request. I made some small mistakes that bit me later, a topic ripe for a whole other post. Essentially, the City denied the request by arguing that the release of this information would jeopardize the security of Chicago's systems:

Your request seeks a copy of tables or columns within each table of CANVAS. The dissemination of these pieces of network information could jeopardize the security of the systems of the City of Chicago. Please be advised that even if you were to narrow your request, certain records may be withheld from disclosure under the exemptions enumerated in the FOIA, including but not limited to the exemption set forth in 5 ILCS 140/7(1)(o).

I disagree wholeheartedly and Thomas Ptacek goes into more detail in his companion post.

Upon receiving this denial, I reached out to my attorneys at Loevy & Loevy, who agreed to sue.

"Civic Hacker"

Eventually there was a trial in January 2020. During the trial, the City's attorneys argued that my intent was nefarious:

They are seeking the ability to have information that helps Mr. Chapman, civic hacker, go into the system and manipulate the data for whatever means he sees fit. That is not something that FOIA requires the City to do.

I have no idea where they came up with the idea that I wanted to manipulate their data, especially considering that just four months earlier, I was asked to help the City with parking ticket reform.

While we were waiting for the trial date, Kate LeFurgy, Director of Comms for the Office of the Mayor, reached out to me and asked if I could help with some parking ticket analysis (for free). I agreed, and compiled a spreadsheet detailing how a large number of vehicles received a disproportionate number of tickets—groupings that highlight, for example, one vehicle which received at least three tickets per week for 41 continuous weeks.

This is incredible. I can't thank you enough as to how helpful this was. I truly appreciate your time and talents on this work. It has been invaluable in shaping the reform measures we hope to put in place later this year.
-Kate LeFurgy | Fri, Aug 23, 2019

Those good spirits did not last long, and LeFurgy did not respond to my emails asking for thoughts on the CANVAS litigation.

Privacy When It's Convenient

Chicago's expert witness, Bruce Coffing, said in court:

In this particular case we are saying, I'm saying that from defending this, our constituents' information, their private information, one of the things that helps us defend that system is not making this [schema information] available.

It is not the only thing we do. We do many things. But I don't want to make it easier for the bad guys and bad gals out there to attack our system and let— put our constituents' private data at risk.

This argument is striking to me, because the City has already shared so much private data through FOIA.

For instance, in 2018, when I requested parking ticket data from the Department of Finance, their FOIA officer told me that they could not include both license plates and the vehicles' registered addresses. To resolve this issue, they offered to remove license plate data and only provide addresses.

However, they had already given me the license plate data of millions of ticketed vehicles, in response to a different, earlier FOIA request. So, I received registered home addresses from one request, and license plates from another.

The responsive records from these two separate FOIA requests can easily be paired.

To demonstrate the extent of this problem, I created this visualization which shows the scale of private information disclosed by the Department of Finance: vehicle addresses from every U.S. state, including 11,057 unique addresses of Texas vehicles and 48,707 from Michigan.

I've been told by a reliable source that the Department of Finance no longer sends license plates or registered addresses in response to FOIA requests.

Next Steps

The whole point of this entire thing was to make it easier to request data through FOIA. Ultimately, the goal is to simply send a SQL statement to an agency for them to run, and avoid so much of the usual nonsense. Basically, an API.
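
As a rough sketch of what that "API" could look like on the agency's side, the snippet below runs a requester-supplied query with writes disabled and hands back CSV. It uses Python's built-in sqlite3 as a stand-in for whatever database an agency actually runs. The run_request function, the tickets table, and its three rows are all hypothetical, and a real agency would still need to review the output for exemptions before sending it.

```python
import csv
import io
import sqlite3


def run_request(conn, sql):
    """Run one requester-supplied statement with writes disabled; return CSV text."""
    conn.execute("PRAGMA query_only = ON")  # SQLite refuses writes while this is set
    try:
        cursor = conn.execute(sql)  # execute() accepts only a single statement
        if cursor.description is None:  # statement produced no result columns
            return ""
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow([col[0] for col in cursor.description])  # header row
        writer.writerows(cursor)
        return out.getvalue()
    finally:
        conn.execute("PRAGMA query_only = OFF")


# Hypothetical stand-in data; not real ticket records.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tickets (ticket_number TEXT, ward INTEGER, fine_amount INTEGER);
    INSERT INTO tickets VALUES ('T1', 1, 50), ('T2', 1, 100), ('T3', 2, 75);
""")

result = run_request(
    conn,
    "SELECT ward, COUNT(*) AS tickets, SUM(fine_amount) AS fines "
    "FROM tickets GROUP BY ward ORDER BY ward",
)
print(result)
```

The requester never touches the system. They send text, the agency runs it, and a spreadsheet-friendly file comes back.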

Relatedly, these two bills from last year were interesting, and sought to fix the IL Supreme Court's bad decision. But they didn't go anywhere during last year's session.

Fortunately this year, a new bill was filed with the addition of this language:

[...] and shall include the identification and a plain-text description of each of the types or categories of information of each field of each database of the public body. [...] and shall provide a sufficient description of the structures of all databases under the control of the public body to allow a requester to request the public body to perform specific database queries.

That's pretty neat! I hope it passes.