Saturday, January 14, 2006

Relation-chaining gimmick in Python

This is just a piece of a whole idea, but I thought you might find it amusing.

There is an RDF notation called N3 that has an interesting syntax for quickly describing a chain of relations linking one node to another.

Here are some examples from the N3 paper:

:joe!fam:mother!loc:office!loc:zip   Joe's mother's office's zipcode
:joe!fam:mother^fam:mother Anyone whose mother is Joe's mother.
:joe ("Colon Joe" -- sounds like a brand of medicinal coffee, doesn't it?) is an RDF node. The other strings with colons in them, fam:mother, loc:office, and loc:zip, represent binary relations. I think the part before the colon tells you the namespace it's from, but ignore that for now.

The ! is like the "." operator in most object-oriented languages. :joe!fam:mother means "Joe's Mother", and you might say this as joe.mother in some language like Python or C.

The ^ is the converse of !; so if joe!mother == sue, then sue^mother == joe. You can think of !mother as meaning "mother" and ^mother as meaning "child". ^employer means employee, ^owner means possession. "hello, world"!length == 12, so 12^length == "hello, world" (among many other strings).

My mind was brought back to this when I was working on a project with a bunch of related objects in Python, and I was looking for a clean way of expressing complicated queries among them. In my day job I do a lot of ad-hoc SQL querying, and while I don't love SQL, it is nice to be able to fluently, readably, ask some complex questions of a bunch of relations. I'd read that list comprehensions were more or less isomorphic to SQL, but I found when playing with them that my queries were inefficient and not very readable; they were no better than just writing explicit code to traverse the object tree and collect the information I wanted.

I'm not sure if these N3-style paths will be useful in practice, yet, but it was worth playing around with them. The biggest hurdle with them, as I see it, is that there's no easy way when presented with 12^length to know what string is wanted, so you have to have a well-defined universe of objects to work with, and be willing to accept multiple results from every query.

So anyway, here's some code to give you an idea what I've come up with so far:

import pprint

class relation:
def __init__(self):
self.forward = dict()
self.backward = dict()

def add(self, arg1, arg2):
if not self.forward.has_key(arg1):
self.forward[arg1] = dict()
if not (arg2 in self.forward[arg1]):
self.forward[arg1][arg2] = True
if not self.backward.has_key(arg2):
self.backward[arg2] = dict()
if not (arg1 in self.backward[arg2]):
self.backward[arg2][arg1] = True

def __call__(self, arg1, arg2):
return self.forward[arg1][arg2]
except Exception, e:
return false

def get_all_keys(self, hash, key):
return hash[key].keys()
except Exception, e:
return []

def __rlshift__(self, other):
result = []
for item in other:
result = result + self.get_all_keys(self.forward, item)
return result

def __rrshift__(self, other):
result = []
for item in other:
result = result + self.get_all_keys(self.backward, item)
return result

father = relation()
phone = relation()

class frank: pass
class jeff: pass
class chris: pass
class andy: pass

father.add(frank, jeff)
father.add(jeff, chris)
father.add(jeff, andy)
phone.add(andy, "101-555-1249")
assert [chris] >> father == [jeff]
assert [jeff] << father == [chris, andy]
assert [chris] >> father == [jeff]
assert [jeff] << father == [chris, andy]
assert [andy] << phone == ["101-555-1249"]
assert [chris] << phone == []
assert [andy, chris] << phone == ["101-555-1249"]
assert (([chris] >> father) << father) << phone == ["101-555-1249"]
assert [chris] >> father << father << phone == ["101-555-1249"]

Some things to note about this implementation:
  • I used classes like "jeff" and "chris" for nodes. You could use any Python objects; the "class X: pass" syntax is just a cheesy quick way to create a dummy object to play around with. In Python a class is a kind of object.
  • I used >> and <<>
  • nstead of the RDF "node", I use a list with one or more python objects in it, in this case strings. I'm using a list as kind of a half-assed monad. Since a >> or <<>
  • I define a relation object which holds all its pairs of related things in hashes. It's probably not very efficient. I originally thought I'd just use the members of objects as the relations, so that the natural joe.mother would be the basis for the relation joe >> mother. But that didn't fit in very well with the list thing. If Heather has two mommies, you can have:

    mother("Ann", "Heather")
    mother("Patricia", "Heather")
    assert ["Heather"] >> mother == ["Ann", "Patricia"]
    as opposed to

    heather.mother = ann
    heather.mother = patricia
    where Patricia just overwrites Ann. This makes Ann sad.

    Maybe there's a better way around this.

  • I'm not sure I've got the precedence the same as N3 has it.
  • It should be sets, not lists. ["Ann", "Patricia"] should be considered the same result as ["Patricia", "Ann"].
Next time I'll try to apply this to a more interesting problem to see if it holds up.

No comments: