I am sure everyone who took COSC 2320 with Dr. Anderson remembers the word count program. Well, after going through some of the Python tutorials, here is the same program in Python.
Not only is it smaller, but it's also easier to understand. Hopefully more to follow.
import re
import string

# dictionary to store words and their counts
word_count = {}

# character class matching one whitespace, punctuation, or digit character
# (re.escape keeps characters such as ']' and '\' literal inside the class)
delimiters = "[" + re.escape(string.whitespace + string.punctuation + string.digits) + "]"

# read in text document line by line
with open("trial.txt") as f:
    for line in f:
        # remove leading and trailing whitespace
        line = line.strip()
        # split the string into words
        # based on whitespace, punctuation, digits
        for word in re.split(delimiters, line):
            # make the word lower case
            word = word.lower()
            # check if it is actually a word
            if re.match("^[" + string.ascii_lowercase + "]+$", word):
                # increment count if true
                if word in word_count:
                    word_count[word] += 1
                # else add entry
                else:
                    word_count[word] = 1

for w in word_count:
    print(w, ":", word_count[w])
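To see why the "is it actually a word" check is needed, here is a small sketch of what the split step returns for a single line. The helper name split_words and the sample sentence are made up for illustration and are not part of the program above; the code assumes Python 3.

```python
import re
import string

# Same delimiter set as the program: whitespace, punctuation, digits.
# re.escape keeps characters such as ']' and '\' literal inside the class.
DELIMITERS = "[" + re.escape(string.whitespace + string.punctuation + string.digits) + "]"

def split_words(line):
    """Hypothetical helper: strip, lower-case, and split one line."""
    return re.split(DELIMITERS, line.strip().lower())

pieces = split_words("Hello, world! 42 times")
print(pieces)
# -> ['hello', '', 'world', '', '', '', '', 'times']

# keep only the pieces made up entirely of ASCII letters
words = [p for p in pieces if re.match("^[a-z]+$", p)]
print(words)
# -> ['hello', 'world', 'times']
```

Every delimiter character splits on its own, so two delimiters in a row (a comma followed by a space, or the digits in "42") leave empty strings in the list. The letters-only match is what throws those away.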
Edit: Some more playing around. This version reads the whole file at once and takes each count straight from the list of words:
import re
import string

word_count = {}
with open("trial.txt") as f:
    text = f.read()

# list of words delimited by whitespace, punctuation and digits
# lower case all the words in the text before splitting
delimiters = "[" + re.escape(string.whitespace + string.punctuation + string.digits) + "]"
words = re.split(delimiters, text.lower())

# go through the list (every entry, including the last one)
for word in words:
    # as long as the entry in the list is a word and is not already a key
    if re.match("^[" + string.ascii_lowercase + "]+$", word) and word not in word_count:
        # add to the dictionary and get the count from the list
        word_count[word] = words.count(word)

for w in word_count:
    print(w, ":", word_count[w])
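One more variation, offered only as a rough sketch: the standard library's collections.Counter can do the counting in a single pass, instead of calling words.count once per distinct word. The function name count_words and the sample sentence are made up for illustration. It also assumes the text is plain ASCII; for other text, pulling out runs of a-z is not quite the same as the split-and-filter approach above.

```python
import re
from collections import Counter

def count_words(text):
    """Hypothetical helper: count runs of ASCII letters, ignoring case."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

counts = count_words("The cat and the hat. The end!")

# most_common() lists the words from most to least frequent
for word, n in counts.most_common():
    print(word, ":", n)
```

To run it on the same input as the programs above, pass in the contents of trial.txt instead of the sample sentence.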
1 comment:
Thanks, Kashif. Just learning Python and surfed the internet for a word count program until I finally found your excellent script. best, la