WikiPoetry
I wanted to use webpages as a source text for my RWET final project. This turned out to be more difficult than I had imagined, as many websites now use formatting and frameworks that make it hard to get at the actual text of the page. I ended up settling on Wikipedia, since it remains spare and simple.
I extended and tweaked my epigrammatic poetic form to include words up to 14 letters long. I am not entirely convinced that this was a good decision, because many of the results sound like a pretentious fourteen-year-old trying to sound smart. There were a few gems, though.
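The core rule of the form is simple: bucket every word of the source text by length, from one to fourteen letters, then draw one word at random from each bucket. Here is a minimal sketch of just that rule, separate from the full script below; the sample sentence is invented for illustration, and empty buckets are skipped rather than crashed on.

```python
import random
import re

MAX_LEN = 14
source = "An epigram is a brief interesting memorable and sometimes surprising statement"

# bucket words by exact length: \b\w{n}\b matches only whole n-letter words
buckets = {n: re.findall(r"\b\w{%d}\b" % n, source)
           for n in range(1, MAX_LEN + 1)}

# draw one word per length, skipping lengths the text doesn't contain
picks = [random.choice(words) for n, words in sorted(buckets.items()) if words]
print(" ".join(picks))
```

On a real Wikipedia page nearly every bucket is full, which is why the full script doesn't bother guarding against empty ones.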
This poem comes from the entry for epigram (how could I resist?). I love the alliteration and the Latin words that sneak in.
contemporary inscriptional copied epigrams epigramma selectively the strokes edit contemporaries in considered longa
This one was generated from the entry for Klezmer. I feel like there’s a real story behind it.
Transylvania knowledgeable Mickey bursting generally traditional the Yiddish Also transcriptions of pejorative Moshe
This one was generated from the entry for Nabokov:
Nevertheless consciousness object children invention Comparative not English They lepidopterists He descending essay
Here’s one from Rosalind Franklin’s entry:
laboratories understanding became proposal knowledge significant and studied upon acknowledgment at importance space
And here is the code that makes it all possible:
# Kim Ash
# wikipoem.py
# creates poem from Wikipedia page for command line input

from bs4 import BeautifulSoup
import urllib
import sys
import re
import random

sourceText = ''

# 14 empty lists, one for each word length (1 through 14 letters)
words_by_len = [[] for _ in range(14)]

# list for words that will be in the poem
poem_words = list()
poem = ''

def extract_text(tag):
    # skip lists and tables; recurse into everything else
    if hasattr(tag, "name") and tag.name in ["ul", "ol", "table"]:
        return ""
    else:
        tag_string = tag.string
        if tag_string is None:
            children = tag.contents
            result = ''
            for child in children:
                child_text = extract_text(child)
                result += child_text + ' '
            return result
        else:
            return tag_string.strip()

# here's how to fake a user agent string with urllib --
# necessary to access articles on Wikipedia
class FakeMozillaOpener(urllib.FancyURLopener):
    version = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11'

urllib._urlopener = FakeMozillaOpener()

# get input from command line, retrieve Wikipedia entry for input term
term = sys.argv[1]
url = 'http://en.wikipedia.org/wiki/' + term
data = urllib.urlopen(url).read()
soup = BeautifulSoup(data)

sourceText += extract_text(soup.p)
for sibling in soup.p.next_siblings:
    sourceText += extract_text(sibling)

# note: re.sub returns a new string rather than modifying in place, so this
# line as written has no effect; it would need sourceText = re.sub(...)
#re.sub(r"\[\s\w{1,}\s\]", "", sourceText)

# find words of each length (i+1 because range() starts at 0)
for i in range(len(words_by_len)):
    regexp = r"\b\w{" + str(i+1) + r"}\b"
    for match in re.findall(regexp, sourceText):
        words_by_len[i].append(match)

# randomly select one word of each length for use in the poem
# (random.choice raises IndexError if a page has no words of some length)
for i in range(len(words_by_len)):
    poem_words.append(random.choice(words_by_len[i]))

print poem_words[11] + " " + poem_words[12] + " " + poem_words[5]
print poem_words[7] + " " + poem_words[8] + "\n"
print poem_words[10] + " " + poem_words[2] + " " + poem_words[6]
print poem_words[3] + " " + poem_words[13]
print poem_words[1] + " " + poem_words[9] + " " + poem_words[4]
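The real work above happens in extract_text, which walks the parse tree and drops anything inside a list or table. For readers without BeautifulSoup installed, the same skip-lists-and-tables idea can be sketched with only the standard library's html.parser; the sample markup here is invented, and ProseExtractor is my own name for the hypothetical class, not part of the original script.

```python
from html.parser import HTMLParser

class ProseExtractor(HTMLParser):
    """Collect text content, skipping everything inside ul/ol/table."""
    SKIP = {"ul", "ol", "table"}

    def __init__(self):
        super().__init__()
        self.depth = 0    # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # once inside a skipped element, count every nested tag too,
        # so the matching end tags unwind the depth correctly
        if tag in self.SKIP or self.depth:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if not self.depth and data.strip():
            self.chunks.append(data.strip())

parser = ProseExtractor()
parser.feed("<p>Epigrams are short.</p><ul><li>skip me</li></ul><p>Very short.</p>")
print(" ".join(parser.chunks))  # → Epigrams are short. Very short.
```

Unlike the recursive BeautifulSoup version, this flat parser tracks nesting with a counter, so a paragraph buried inside a table is still skipped; note it assumes well-matched tags, since void elements like br would throw the count off.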