Categories:
ruby / scripting / terminal

I was playing around with Ruby the other night and wrote a simple n-gram counter. It tallies the bi-grams and tri-grams in a text file and prints the most frequent ones. In case anyone is interested, here is the script:


#!/usr/bin/env ruby
# r_ngram.rb
# Count the bi-grams and tri-grams in a text file and print the most frequent ones to STDOUT
# Usage: ruby r_ngram.rb file.txt (add -w after ruby to enable warnings)
# To save the output to a file: ruby r_ngram.rb file.txt > output.txt

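abort "Usage: ruby #{$PROGRAM_NAME} file.txt" if ARGV.empty?

# Lowercase the text and split it into runs of ASCII letters.
words = File.read(ARGV[0]).downcase.scan(/[a-z]+/)

bi_grams = Hash.new(0)
tri_grams = Hash.new(0)

# each_cons yields every window of n consecutive words, so the final
# bi-gram of the file is counted too.
words.each_cons(2) { |pair| bi_grams[pair.join(' ')] += 1 }
words.each_cons(3) { |triple| tri_grams[triple.join(' ')] += 1 }

# Print about one entry per ten words; never a negative count for tiny files.
limit = [(words.length - 2) / 10, 0].max

puts "## -- bi-grams -- ##"
# first(limit) stops early when there are fewer distinct n-grams than limit.
bi_grams.sort_by { |_, count| -count }.first(limit).each { |gram, count| puts "#{gram} : #{count}" }
puts "\n"
puts "## -- tri-grams -- ##"
tri_grams.sort_by { |_, count| -count }.first(limit).each { |gram, count| puts "#{gram} : #{count}" }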
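
The same idea extends to n-grams of any size. Here is a rough sketch of how that might look, not something from the script itself: the helper names `ngram_counts` and `top_ngrams` are made up for illustration, and it assumes the same letters-only tokenizing as above. Ties are broken alphabetically so the output order is predictable.

```ruby
# Hypothetical helper (not in the original script): count n-grams of size n.
def ngram_counts(text, n)
  words = text.downcase.scan(/[a-z]+/) # same tokenizer as the script
  counts = Hash.new(0)
  # each_cons(n) yields every run of n consecutive words.
  words.each_cons(n) { |window| counts[window.join(' ')] += 1 }
  counts
end

# Hypothetical helper: the `limit` most frequent n-grams, highest count first,
# with ties broken alphabetically.
def top_ngrams(text, n, limit)
  ngram_counts(text, n).sort_by { |gram, count| [-count, gram] }.first(limit)
end

sample = "the cat sat on the mat and the cat ran"
top_ngrams(sample, 2, 3).each { |gram, count| puts "#{gram} : #{count}" }
```

Passing 2 or 3 as `n` gives the bi-gram and tri-gram tables from the script; a text shorter than `n` words simply produces an empty table.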

About

Greetings! My name is Jason Heppler. I am a Digital Engagement Librarian and Assistant Professor of History at the University of Nebraska at Omaha and a scholar of the twentieth-century United States. I often write here about the history of the North American West, technology, the environment, cities, politics, and coffee. You can follow me on Twitter, or learn more about me.
