Sunday, January 22, 2006

A Different Implementation of Relation-Chaining

OK, I wasn't really happy with the implemenation the relation-chaining thing I did last week, so I rewrote it. This won't make a lot of sense if you haven't read last week's installment, so go read it if you haven't yet...

This time I have the forward-linking operator sort of equivalent to the python "." operator. If romulus.father == mars, then romulus >> father == rset([mars]); in other words, the set containing mars. rset is a class derived from set. It's convenient to work with sets, because in the case of the reverse-linking operator, there could be more than one result, and I wanted some symmetry between the two cases.

To do the reverse linking, I use gc.get_objects(), which is python's garbage collector's master list of all the objects in the system. I guess that's not very efficient for a large program. In a large program I'd recommend creating a "universe" dictionary, put the objects you really care about it in it, then search that instead of get_objects().

A nice result of using sets is shown in the brother() function; the set operator "-" means to remove items from a set, so (x >> "father" << "father") - x means: find the father of x, then find the people x is a father to, then subtract x from that set, leaving only the siblings.

The only reason I create my own set class, or my own relational_object class, is to get the syntax of << and >> to work. Really, this is a demonstration of a feature I think could be central to a language, and in that hypothetical language you'd build this feature in so it could be used on any objects.

Well, here's the code. I promise I'll move on to something different next week! I just had to get this out of my system.

#!/usr/local/bin/python2.4
from gc import get_objects

class relational_object(object):
def __repr__(self):
return ""

def get_conv_gen(self, att):
for other in get_objects():
try:
if other.__dict__[att] == self:
yield other
except Exception, e: pass

def get_conv_any(self, att): return self.get_conv_gen(att).next()
def get_conv_all(self, att): return rset([ x for x in self.get_conv_gen(att)])

def __rshift__(self, att): return rset([self.__dict__[att]])
def __lshift__(self, att): return rset(self.get_conv_all(att))

class rset(set):
def __rshift__(self, att):
result = rset()
for x in self:
result = result | rset([x.__dict__[att]])
return result
def __lshift__(self, att):
result = rset()
for x in self:
result = result | x.get_conv_all(att)
return result

class person(relational_object):
def __init__(self, theName):
self.name = theName
def __repr__(self): return "<>"

mars = person("mars")
romulus = person("romulus")
remus = person("remus")
assert mars == mars
assert mars != remus

romulus.father = mars
remus.father = mars

assert mars << "father" == rset([romulus, remus])

Saturday, January 14, 2006

Relation-chaining gimmick in Python

This is just a piece of a whole idea, but I thought you might find it amusing.

There is an RDF notation called N3 that has an interesting syntax for quickly describing a chain of relations linking one node to another.

Here are some examples from the N3 paper:

:joe!fam:mother!loc:office!loc:zip   Joe's mother's office's zipcode
:joe!fam:mother^fam:mother Anyone whose mother is Joe's mother.
:joe ("Colon Joe" -- sounds like a brand of medicinal coffee, doesn't it?) is an RDF node. The other strings with colons in them, fam:mother, loc:office, and loc:zip, represent binary relations. I think the part before the colon tells you the namespace it's from, but ignore that for now.

The ! is like the "." operator in most object-oriented languages. :joe!fam:mother means "Joe's Mother", and you might say this as joe.mother in some language like Python or C.

The ^ is the converse of !; so if joe!mother == sue, then sue^mother == joe. You can think of !mother as meaning "mother" and ^mother as meaning "child". ^employer means employee, ^owner means possession. "hello, world"!length == 12, so 12^length == "hello, world" (among many other strings).

My mind was brought back to this when I was working on a project with a bunch of related objects in Python, and I was looking for a clean way of expressing complicated queries among them. In my day job I do a lot of ad-hoc SQL querying, and while I don't love SQL, it is nice to be able to fluently, readably, ask some complex questions of a bunch of relations. I'd read that list comprehensions were more or less isomorphic to SQL, but I found when playing with them that my queries were inefficient and not very readable; they were no better than just writing explicit code to traverse the object tree and collect the information I wanted.

I'm not sure if these N3-style paths will be useful in practice, yet, but it was worth playing around with them. The biggest hurdle with them, as I see it, is that there's no easy way when presented with 12^length to know what string is wanted, so you have to have a well-defined universe of objects to work with, and be willing to accept multiple results from every query.

So anyway, here's some code to give you an idea what I've come up with so far:

import pprint

class relation:
def __init__(self):
self.forward = dict()
self.backward = dict()

def add(self, arg1, arg2):
if not self.forward.has_key(arg1):
self.forward[arg1] = dict()
if not (arg2 in self.forward[arg1]):
self.forward[arg1][arg2] = True
if not self.backward.has_key(arg2):
self.backward[arg2] = dict()
if not (arg1 in self.backward[arg2]):
self.backward[arg2][arg1] = True

def __call__(self, arg1, arg2):
try:
return self.forward[arg1][arg2]
except Exception, e:
return false

def get_all_keys(self, hash, key):
try:
return hash[key].keys()
except Exception, e:
return []

def __rlshift__(self, other):
result = []
for item in other:
result = result + self.get_all_keys(self.forward, item)
return result

def __rrshift__(self, other):
result = []
for item in other:
result = result + self.get_all_keys(self.backward, item)
return result

father = relation()
phone = relation()

class frank: pass
class jeff: pass
class chris: pass
class andy: pass

father.add(frank, jeff)
father.add(jeff, chris)
father.add(jeff, andy)
phone.add(andy, "101-555-1249")
assert [chris] >> father == [jeff]
assert [jeff] << father == [chris, andy]
assert [chris] >> father == [jeff]
assert [jeff] << father == [chris, andy]
assert [andy] << phone == ["101-555-1249"]
assert [chris] << phone == []
assert [andy, chris] << phone == ["101-555-1249"]
assert (([chris] >> father) << father) << phone == ["101-555-1249"]
assert [chris] >> father << father << phone == ["101-555-1249"]

Some things to note about this implementation:
  • I used classes like "jeff" and "chris" for nodes. You could use any Python objects; the "class X: pass" syntax is just a cheesy quick way to create a dummy object to play around with. In Python a class is a kind of object.
  • I used >> and <<>
  • nstead of the RDF "node", I use a list with one or more python objects in it, in this case strings. I'm using a list as kind of a half-assed monad. Since a >> or <<>
  • I define a relation object which holds all its pairs of related things in hashes. It's probably not very efficient. I originally thought I'd just use the members of objects as the relations, so that the natural joe.mother would be the basis for the relation joe >> mother. But that didn't fit in very well with the list thing. If Heather has two mommies, you can have:

    mother("Ann", "Heather")
    mother("Patricia", "Heather")
    assert ["Heather"] >> mother == ["Ann", "Patricia"]
    as opposed to

    heather.mother = ann
    heather.mother = patricia
    where Patricia just overwrites Ann. This makes Ann sad.

    Maybe there's a better way around this.

  • I'm not sure I've got the precedence the same as N3 has it.
  • It should be sets, not lists. ["Ann", "Patricia"] should be considered the same result as ["Patricia", "Ann"].
Next time I'll try to apply this to a more interesting problem to see if it holds up.