Thursday, June 7, 2007

OlinDocs database generation (part 1)

All right. I'm finally going to talk a bit more about OlinDocs. (btw, I'm sad that I haven't gotten, like, anything)

Let's talk about making a database with Python. In order for this site to work, I need to put the variables that define the documents and their metadata into an easy-to-access array. My setup is basically two lines: one holds all the names and one holds all the metadata. For example:

data_names=['my car','my pen'];
// brand,name,weight
data=[['Honda','Accord','like a ton'],['Pilot','G2','something in ounces']];
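
As a small preview, here's one way two lines like these could be produced from Python lists. Treat it as a sketch and not necessarily how the generator does it: the records list is made up to match the example above, and using json.dumps is an assumption that works because its output for plain strings like these is also valid JavaScript.

```python
import json

# Hypothetical records: (name, [brand, model, weight]), matching the example above.
records = [
    ('my car', ['Honda', 'Accord', 'like a ton']),
    ('my pen', ['Pilot', 'G2', 'something in ounces']),
]

# Split the records into one list of names and one list of metadata lists.
names = [name for name, meta in records]
metadata = [meta for name, meta in records]

# json.dumps writes the lists with double quotes, which JavaScript accepts too.
names_line = 'data_names=' + json.dumps(names) + ';'
data_line = 'data=' + json.dumps(metadata) + ';'

print(names_line)
print(data_line)
```

The two printed lines have the same shape as the ones above, just with double quotes instead of single quotes.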


I think I'll talk about some Python stuff today and put it all together tomorrow. That being said, here's a link to the text file for my generator. And if you want to get the .py file you can right-click here and save it.

File management in Python:
-First we need to import os. This lets us use all the other commands that we'll need.
-Now try
location='/Documents and Settings/'
for file in os.listdir(location):
    print(file)


Nice. This lets us see files. We can pretty much do anything like copy files, remove files, make directories, rename files etc. but I won't put that all here; that's what the internet's for.
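
That said, here's a quick taste of a few of them, as a rough sketch. It works in a throwaway temporary directory so nothing real gets touched, and the file and folder names (notes.txt, archive, notes_old.txt) are made up for illustration.

```python
import os
import shutil
import tempfile

# Work inside a scratch directory so no real files are affected.
scratch = tempfile.mkdtemp()

# Make a directory.
archive = os.path.join(scratch, 'archive')
os.mkdir(archive)

# Create a small file to play with.
original = os.path.join(scratch, 'notes.txt')
f = open(original, 'w')
f.write('hello')
f.close()

# Copy it into the new directory, then rename the copy.
copied = os.path.join(archive, 'notes.txt')
shutil.copy(original, copied)
renamed = os.path.join(archive, 'notes_old.txt')
os.rename(copied, renamed)

# Remove the original file.
os.remove(original)

remaining = sorted(os.listdir(scratch))  # what is left at the top level
archived = os.listdir(archive)           # what ended up in the new directory

# Clean up the scratch directory and everything in it.
shutil.rmtree(scratch)
```

Copying lives in the shutil module; making directories, renaming and removing are all in os.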

Making files:
Making text files in Python is useful for myriad reasons. For example, they're persistent (thus useful for saving data) and usable by other programs. The way you make a text file is just by opening it:
new = open(location+'/newfile.txt','w')

The w means open it in write mode. You can also open it in append mode or read mode (a and r respectively).

Now that it's open we can put stuff in it:
new.write('Hello World')

If you need tabs or new lines use \t or \n. If you need a \ you will have to escape that to \\.

If it were open in read mode, we could do new.read() or new.readline() to get a string that has the entire file or the next line of it.
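
To see the three modes side by side, here's a small sketch. It writes to a temporary directory instead of the location above, and the file name modes.txt is just made up for the example.

```python
import os
import tempfile

# Hypothetical file in a scratch directory.
folder = tempfile.mkdtemp()
name = os.path.join(folder, 'modes.txt')

# 'w' creates the file (or wipes an existing one) and writes to it.
out = open(name, 'w')
out.write('name\tweight\n')   # \t is a tab, \n ends the line
out.close()                   # closing makes sure the text reaches the disk

# 'a' adds to the end instead of starting over.
out = open(name, 'a')
out.write('pen\tsomething in ounces\n')
out.close()

# 'r' reads: readline() gives the next line, read() gives whatever is left.
src = open(name, 'r')
first_line = src.readline()
rest = src.read()
src.close()

# Tidy up the scratch file and directory.
os.remove(name)
os.rmdir(folder)
```

Note that opening an existing file with 'w' throws away what was in it, so use 'a' when you want to keep the old contents.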

String Tricks
This is likely old hat to a lot of you, but strings can be manipulated in a lot of powerful ways. For example, we can use replace to find a string and swap it for another string by using string.replace('s1','s2'). And there's also one of my favorite things in Python: string.split('s3'). This returns a list of the items that were separated by some marker (e.g. the commas in comma-separated value files [.csv]). These can be chained in-line to give you a lot of firepower for very little real estate. A cute thing that my program does is
path_file=open('/Documents and Settings/bdieseldorff/My Documents/OlinDocs/path.txt','r')
path=path_file.read().replace('C:','').replace('\\','/').split('\n')

This baby takes a text document that has a path with all the directories I need to look at, in a form that I can copy and paste from Windows Explorer, and turns it into a list of locations that Python understands.
-First it reads the whole file
-Then it replaces C: with nothing to leave just \dir\subdir\subsub etc.
-Then it changes all of the \ to / (remember \ is escaped)
-Finally it makes a list of locations with every line break defining a new location in the list.
Pretty neat. So much stuff in so little space. Sweet.
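
If you want to watch each step happen, here's the same chain broken up and run on a string instead of a file. The two folder paths are invented for the example and just stand in for whatever path.txt contains.

```python
# Made-up contents standing in for path.txt: two paths pasted from Windows Explorer.
text = 'C:\\Docs\\Math\nC:\\Docs\\Physics\\Labs'

no_drive = text.replace('C:', '')       # drop the drive letter
slashes = no_drive.replace('\\', '/')   # '\\' in the source is one real backslash
paths = slashes.split('\n')             # one list entry per line

print(paths)   # ['/Docs/Math', '/Docs/Physics/Labs']
```

One thing to watch: if the file ends with a line break, split('\n') leaves an empty string as the last item in the list.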

Cool, now my program knows where to look and we know how to write stuff in files. Tune in tomorrow for more on how we get from this to a complete database.

In other news I played a little bit of soccer today. I am incredibly out of shape. I'm gonna start running daily (starting tomorrow evening actually).
