July 12, 2014

Ruby / D3js / JSON

Reading time:
3 min. | 541 words

I took a moment yesterday morning to start work on a Ruby script to convert CSV files to JSON. My main motivation for writing the script was to get data into a format that more easily works with D3.js.

D3 has two main ways to load data: JSON or CSV. Although CSV has its benefits, the data is untyped. Because of that, D3 loads all the data as strings rather than differentiating between strings and integers. I could do the conversion on the fly with D3 pretty easily using built-in functions or coerce the data with JavaScript. But sometimes I like prepping data before I begin working with it.

The script builds JSON using the first line of the CSV file to determine the appropriate elements and the corresponding data, as well as checking that integers and strings are converted to their appropriate (typed) formats. And the script seems fairly robust; I noticed hardly a pause when throwing a 14,000-line CSV file at it. There are no built-in specifics to the data mapping; you’ll need to handle that either by modifying the script or handling it in D3.

The script:

#!/usr/bin/env ruby

# Parse CSV files and convert them to JSON.
# Mostly used for preparing data for D3.js. I don't like using untyped
# CSV files, so this script begins to clean things up for me.

# Usage: ./parse.rb /path/to/input.csv /path/to/output.json

# Jason A. Heppler | jason@jasonheppler.org | jasonheppler.org
# MIT License <http://heppler.mit-license.org/>
# Created: 2014-07-10

require 'rubygems'
require 'json'
require 'csv'

def is_int(str)
  # Check if a string should be an integer
  return !!(str =~ /^[-+]?[1-9]([0-9]*)?$/)

lines = CSV.open(ARGV[0]).readlines
keys = lines.delete lines.first

File.open(ARGV[1], "w") do |f|
  data = lines.map do |values|
    is_int(values) ? values.to_i : values.to_s
  f.puts JSON.pretty_generate(data)

Say I give it an input like this (postwar population of Bay Area cities):

date,San Jose,Palo Alto,Sunnyvale,Mountain View,Santa Clara

And in return I get this:

    "date": "1940",
    "San Jose": "68457",
    "Palo Alto": "16774",
    "Sunnyvale": "4373",
    "Mountain View": "3946",
    "Santa Clara": "6650"
    "date": "1950",
    "San Jose": "95280",
    "Palo Alto": "25475",
    "Sunnyvale": "9829",
    "Mountain View": "6563",
    "Santa Clara": "11702"
    "date": "1960",
    "San Jose": "204196",
    "Palo Alto": "52287",
    "Sunnyvale": "52898",
    "Mountain View": "30889",
    "Santa Clara": "58880"
    "date": "1970",
    "San Jose": "445779",
    "Palo Alto": "55966",
    "Sunnyvale": "95408",
    "Mountain View": "51092",
    "Santa Clara": "87717"
    "date": "1980",
    "San Jose": "629442",
    "Palo Alto": "55225",
    "Sunnyvale": "106618",
    "Mountain View": "58655",
    "Santa Clara": "87746"
    "date": "1990",
    "San Jose": "782225",
    "Palo Alto": "55900",
    "Sunnyvale": "117229",
    "Mountain View": "67460",
    "Santa Clara": "93613"
    "date": "2000",
    "San Jose": "894943",
    "Palo Alto": "58598",
    "Sunnyvale": "131760",
    "Mountain View": "70708",
    "Santa Clara": "102361"
    "date": "2010",
    "San Jose": "945942",
    "Palo Alto": "64403",
    "Sunnyvale": "140081",
    "Mountain View": "72510",
    "Santa Clara": "112466"

Note that the data transformation for this particular example probably isn’t the best. A more useful JSON construction would group the data together with child nodes, for example:

    "name": "San Jose",
    "values": [
        "date": 1940,
        "population": 945942
        "date": 1950,
        "population": 95280

But if you need a simple CSV to JSON transformation, the script starts getting you there.

All Posts by Date · All Posts by Category


Greetings! My name is Jason Heppler. I am a Digital Engagement Librarian and Assistant Professor of History at the University of Nebraska at Omaha and a scholar of the twentieth-century United States. I often write here about the history of the North American West, technology, the environment, politics, culture, and coffee. You can follow me on Twitter, or learn more about me.