del.icio.us schema

April 24th, 2005

Just read this discussion about possible del.icio.us database schema solutions. Interesting postulation. Here’s how I’d do it, and in fact, that’s how I set it up for tags in my blog’s previous incarnation.

I would have three principle tables, much like the completely normalized version in Philipp Keller’s blog. Of course, some modifications would need to be made to normalize this schema further to make it efficient for multiple users bookmarking the same URLs, etc. However, that’s a topic for another day. Here’s the MySQL-compatible table structures:

Items
CREATE TABLE Items (
item_id int NOT NULL AUTO_INCREMENT,
item_description varchar(255) NOT NULL,
item_url varchar(255) NOT NULL,
item_extended varchar(255) NOT NULL,
item_tags varchar(255) NOT NULL,
PRIMARY KEY(item_id),
KEY(item_url)
)
Items_x_Tags
CREATE TABLE Items_x_Tags (
item_id int NOT NULL,
tag_id int NOT NULL,
PRIMARY KEY(x_item_id,x_tag_id)
)
Tags
CREATE TABLE Tags (
tag_id int NOT NULL AUTO_INCREMENT,
tag_name varchar(255) NOT NULL,
tag_item_count int NOT NULL,
PRIMARY KEY(tag_id),
KEY(tag_name),
KEY(tag_item_count)
)

Here are some design decisions that were involved in this design. First of all, we want to be able to run statistics on tags, such as count the number of times they’ve been mentioned, etc. That’s why the Tags table is separate, assigning a unique integer identifier to each tag as a primary key. We’ll keep a count of the number of times that tag is used in this table as well, so it’ll be easy to figure out which tags are most-used. For multiple-user situations, we’d want to do the same thing with a table that has URLs, their hash and a count of the number of times that URL has been bookmarked. Then the user’s private URL name, tags, etc will just link to this table keeping track of hashes/counts for each URL.

When we add new Items to the database, we clean up the plain text string of tags the user submits, then check to see if each tag already exists (and get its tag_id, incrementing the tag_item_count). Otherwise we create a new Tags record with a count of one and take the newly created tag_id and use that. For each tag the user supplied, we insert a record into the Items_x_Tags with the newly inserted Item’s item_id. Note that there’s a field in the Items table that keeps the tag list in its character form. Why? First of all, its nice to have the original tag order. Second, this way we can display items without lots of joins, saving on database thrashing. Yes, its not 100% normalized. But then again, integer primary keys are excessive as well as far as normalization goes when the URL would serve as a good primary key for instance. I prefer to take a mid-line approach to normalization – just enough to make it practical, fast and well-suited for our purpose.

When modifying entries, we just unlink all tags, decrementing counts, then start over with the same procedure as for new items – link them again through the map, incrementing counts.

Ok, so now we have those three tables, populated with our tags. Lets try to design the querries that Philipp Keller’s blog entry mentions – Intersection, Union and Minus.

Intersection (AND)
SELECT *
FROM (Items AS i INNER JOIN Items_x_Tags AS x ON i.item_id=x.item_id)
INNER JOIN Tags AS t ON x.tag_id=t.tag_id
WHERE t.tag_name IN (’bookmark’, ‘webservice’, ’semweb’)
GROUP BY i.id
HAVING COUNT(i.id)
Union (OR)
SELECT *
FROM (Items AS i INNER JOIN Items_x_Tags AS x ON i.item_id=x.item_id)
INNER JOIN Tags AS t ON x.tag_id=t.tag_id
WHERE t.tag_name IN (’bookmark’, ‘webservice’, ’semweb’)
GROUP BY i.i
Minus
SELECT *
FROM ((Items AS i INNER JOIN Items_x_Tags AS x ON i.item_id=x.item_id)
INNER JOIN Tags AS t ON x.tag_id=t.tag_id)
INNER JOIN Tags AS m ON x.tag_id=m.tag_id)
WHERE t.tag_name IN (’bookmark’, ‘webservice’, ’semweb’) AND NOT (m.tag_name IN (’semweb’))
GROUP BY i.id

Haven’t tested the Minus query, but in theory it should work – do a four-table join, since its an INNER JOIN, we’ll have only Items that have ‘bookmark’ and ‘webservice’ tags, but that don’t have the tab ’semweb’. This might take some debugging, but in theory that’s the way I’d approach that kind of Minus query.

Here’s some extra reading: Philipp’s original entry, some code snippets that deal with tags.

O-Zone – Dragostea Din Tei / Maiyahi

January 15th, 2005

maiyahiA while ago FUSE posted a link to a Flash animation entitled Maiyahi (in the FUSE What the F*ck section – highly recommended, go to their site and click on the menu option Timekillas and then on What the F*ck). Needless to say this resulted in numerous replays of said Flash animation, to everybody learning the song, etc, etc. For a while I thought the song was in Japanese, but apparently its actually in Romanian. Here are the lyrics to Dragostea Din Tei by Haiducii (or here without translation. I’m pretty sure the MP3 is floating around on file sharing networks, just search for that or some permutation of “nu ma, nu ma iei” or “ma-ia-hii” or something like that. Of course, not to be outdone, other people made their own renditions of the song. Here’s an especially bizzare animation. In terms of absurdity, this last one is up there with the Star Wars kid, but not quite as funny.

numanumaUpdate: O-Zone CD on Amazon.fr, apparently not available from the US Amazon site. And a Lego video for the song has surfaced. Truly bizzare, and I’m amazed that somebody would spend that much time doing frame-by-frame animation with Legos for ANY song. (via Anticlown Daily)

o-zoneUpdate 2: Ok, just for sake of completeness, here’s the real video and their official site under Universal Germany and a Romanian official site and a fansite. Apparently the Lego version is pretty close to the way the real video looks, who would’ve figured. Except the real video is basically a flashback to the era of boyband videos – ouch, maybe that’s still popular over in Europe.

And here are the lyrics:

Read the rest of this entry »

Everyone Else Has Had More Sex Than Me

December 8th, 2004

If you haven’t seen this flash animation, you’re missing out.

And here are the lyrics…

Read the rest of this entry »

The Killers – Somebody Told Me

August 15th, 2004

thekillersThere’s a whole lot of music that I really like right now. Just today, was listening to the radio at the office and heard “Somebody Told Me” by The Killers. Really fun song. Jasmine said it sounds like something she heard at 80’s night at this club in Pittsburgh. So, the question is – is this a cover of an 80’s song, or is it an original by The Killers.

Island Records has a cool eCard for them, which plays the song, go and listen to it here.

And here’s a link to their CD on Amazon.com.

And the lyrics..

Read the rest of this entry »

A Perfect Circle : The Nurse Who Loved Me

January 6th, 2004

Listening to A Perfect Circle right now, and the song “The Nurse Who Loved Me” struck me as being at the same time odd and comfortably fascinating. Crazy stuff. Apparently its a cover of a song by Failure, and the lyrics are only a little bit different. The Failure song talks about a “rock’s topography”, whereas I think this version talks about a “rug’s topography”. Also there Failure version has “she’s got everything i need; some pills that are red on top“, while this version has “she’s got everything i need; some pills in a little cup“. Interesting differences and the song’s meaning is a mystery to me as well, although I have my theories. Anybody have any theories they’d like to share?

And the lyrics…

Read the rest of this entry »