WikiPoetry

I wanted to use webpages as source text for my RWET final project. This turned out to be harder than I had imagined: many websites now use formatting and frameworks that make it difficult to get at the actual text of a page. I ended up settling on Wikipedia, since its pages remain spare and simple.

I extended and tweaked my epigrammatic poetic form to include words up to 14 letters long. I am not entirely convinced that this was a good decision, because many of the results read like a pretentious fourteen-year-old trying to sound smart. There were a few gems, though.

This poem comes from the entry for epigram (how could I resist?). I love the alliteration and the Latin words that sneak in.

contemporary inscriptional copied
epigrams epigramma

selectively the strokes
edit contemporaries
in considered longa

This one was generated from the entry for Klezmer. I feel like there’s a real story behind it.

Transylvania knowledgeable Mickey
bursting generally

traditional the Yiddish
Also transcriptions
of pejorative Moshe

This one was generated from the entry for Nabokov:

Nevertheless consciousness object
children invention

Comparative not English
They lepidopterists
He descending essay

Here’s one from Rosalind Franklin’s entry:

laboratories understanding became
proposal knowledge

significant and studied
upon acknowledgment
at importance space

And here is the code that makes it all possible:

# Kim Ash
# wikipoem.py
# creates poem from Wikipedia page for command line input
# (Python 2)

from bs4 import BeautifulSoup
import urllib
import sys
import re
import random

sourceText = ''

# list of 14 empty lists, one for each word length
words_by_len = [[] for _ in range(14)]

# list for words that will be in the poem
poem_words = []

# recursively extract text from a tag, skipping lists and tables
def extract_text(tag):
    if hasattr(tag, "name") and tag.name in ["ul", "ol", "table"]:
        return ""
    tag_string = tag.string
    if tag_string is None:
        result = ''
        for child in tag.contents:
            result += extract_text(child) + ' '
        return result
    return tag_string.strip()

# here's how to fake a user agent string with urllib --
# necessary to access articles on Wikipedia
class FakeMozillaOpener(urllib.FancyURLopener):
    version = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11'
urllib._urlopener = FakeMozillaOpener()

# get input from command line, retrieve Wikipedia entry for input term
term = sys.argv[1]
url = 'http://en.wikipedia.org/wiki/' + term

data = urllib.urlopen(url).read()
soup = BeautifulSoup(data)

sourceText += extract_text(soup.p)
for sibling in soup.p.next_siblings:
    sourceText += extract_text(sibling)
# sourceText = re.sub(r"\[\s\w{1,}\s\]", "", sourceText)  # optionally strip footnote markers like "[ 1 ]"

# find words of each length (i+1 because range() starts at 0)
for i in range(len(words_by_len)):
    regexp = r"\b\w{" + str(i+1) + r"}\b"
    for match in re.findall(regexp, sourceText):
        words_by_len[i].append(match)

# randomly select one word of each length for use in the poem
for i in range(len(words_by_len)):
    poem_words.append(random.choice(words_by_len[i]))

print poem_words[11] + " " + poem_words[12] + " " + poem_words[5]
print poem_words[7] + " " + poem_words[8] + "\n"
print poem_words[10] + " " + poem_words[2] + " " + poem_words[6]
print poem_words[3] + " " + poem_words[13]
print poem_words[1] + " " + poem_words[9] + " " + poem_words[4]
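The script above is Python 2 and depends on a live page fetch, so here is a minimal sketch of its core word-bucketing step in Python 3 that can be tried on any string. The function names (`bucket_words`, `pick_poem_words`) are my own, and unlike the original, `pick_poem_words` returns `None` for a length with no candidate words instead of raising an error.

```python
import re
import random

def bucket_words(text, max_len=14):
    """Return a list of max_len lists; bucket i holds the words of length i+1."""
    buckets = [[] for _ in range(max_len)]
    for word in re.findall(r"\b\w+\b", text):
        if len(word) <= max_len:
            buckets[len(word) - 1].append(word)
    return buckets

def pick_poem_words(buckets):
    """Randomly pick one word per length; None where a bucket is empty."""
    return [random.choice(b) if b else None for b in buckets]

sample = "An epigram is a brief interesting memorable and surprising statement"
buckets = bucket_words(sample)
print(buckets[1])  # all two-letter words: ['An', 'is']
```

A single pass with `\b\w+\b` replaces the original's fourteen separate `\b\w{n}\b` searches, but the resulting buckets are the same.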