Sometimes we are given a task where we need to model a hierarchy of different complexity levels and not really sure how to do that properly in the most efficient, reliable, and flexible way. Let’s review one of the data modeling patterns that give us some answers for that.
Consider we have a hierarchy with a ragged variable depth, like on a picture below.
The requirements are that we need to:
This is where a closure table comes into the picture.
A closure table gives you the ability to work with tree structure hierarchies. It involves storing all paths through a tree, not just those with a direct parent-child relationship. For example: things like family tree(Parent – Kids – etc), company hierarchy(Owner – CEO – Manager – Employee), or file sytem(/users/folauk/Documents/test.mp4). There are other ways such as Path Enumeration and Nested Sets to keep track of this kind of information but using a closure is the simplest and most efficient way of doing it.
The main building block of the pattern is an additional structure(table) or a mapping table that keeps the tree relationships — a set of node pairs for each path from the ancestor to a descendant or parent to child, including a path to itself.
Our example here is a social media site where a user can put up a post. Other users can comment on that post and can comment on comments from the same post. The tree here is comments under comments. We have to keep track of which comments go under which comments and their parents.
CREATE TABLE `user` ( `id` bigint(20) NOT NULL AUTO_INCREMENT, `date_of_birth` date DEFAULT NULL, `email` varchar(255) DEFAULT NULL, `first_name` varchar(255) DEFAULT NULL, `gender` varchar(255) DEFAULT NULL, `last_name` varchar(255) DEFAULT NULL, `phone_number` varchar(255) DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=21 DEFAULT CHARSET=latin1; CREATE TABLE `post` ( `id` bigint(20) NOT NULL AUTO_INCREMENT, `content` longtext, `created_at` datetime(6) DEFAULT NULL, `last_updated_at` datetime(6) DEFAULT NULL, `uuid` varchar(255) NOT NULL, `author_id` bigint(20) DEFAULT NULL, PRIMARY KEY (`id`), UNIQUE KEY `UK_b1a5vs6t5vi32stck5kjw0tgf` (`uuid`), KEY `FK12njtf8e0jmyb45lqfpt6ad89` (`author_id`), CONSTRAINT `FK12njtf8e0jmyb45lqfpt6ad89` FOREIGN KEY (`author_id`) REFERENCES `user` (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=41 DEFAULT CHARSET=latin1; CREATE TABLE `comment` ( `id` bigint(20) NOT NULL AUTO_INCREMENT, `content` longtext, `created_at` datetime(6) DEFAULT NULL, `last_updated_at` datetime(6) DEFAULT NULL, `author_id` bigint(20) DEFAULT NULL, `post_id` bigint(20) DEFAULT NULL, PRIMARY KEY (`id`), KEY `FKh1gtv412u19wcbx22177xbkjp` (`author_id`), KEY `FKs1slvnkuemjsq2kj4h3vhx7i1` (`post_id`), CONSTRAINT `FKh1gtv412u19wcbx22177xbkjp` FOREIGN KEY (`author_id`) REFERENCES `user` (`id`), CONSTRAINT `FKs1slvnkuemjsq2kj4h3vhx7i1` FOREIGN KEY (`post_id`) REFERENCES `post` (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=201 DEFAULT CHARSET=latin1; CREATE TABLE `comment_tree` ( `id` bigint(20) NOT NULL AUTO_INCREMENT, `depth` int(11) DEFAULT NULL, `child_id` bigint(20) DEFAULT NULL, `parent_id` bigint(20) DEFAULT NULL, PRIMARY KEY (`id`), KEY `FKhosfyxi6cpykkhyrccfxq2k2x` (`child_id`), KEY `FKq2283664y5ywd961iojgjk98b` (`parent_id`), CONSTRAINT `FKhosfyxi6cpykkhyrccfxq2k2x` FOREIGN KEY (`child_id`) REFERENCES `comment` (`id`), CONSTRAINT `FKq2283664y5ywd961iojgjk98b` FOREIGN KEY (`parent_id`) REFERENCES `comment` (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=1201 DEFAULT CHARSET=latin1;
Here we have a comment table that represents our comment, author of the comment, and the post it belongs to.
Insert comments(rows) into the comment_tree(closure) table
INSERT INTO comment_tree(parent_id, child_id, depth) SELECT p.parent_id, c.child_id, p.depth+c.depth+1 FROM comment_tree as p, comment_tree as c WHERE p.child_id = :parentId and c.parent_id = :childId;
Delete comments(rows) in the comment_tree(closure) table
DELETE link FROM comment_tree as parent, comment_tree as link, comment_tree as child WHERE parent.parent_id = link.parent_id AND child.child_id = link.child_id -- AND parent.child_id = {parentId} and child.parent_id = {childId} AND parent.child_id = ? AND child.parent_id = ?
Retrieve comments under a comment.
Get all comments under comment with id 4.
SELECT com.* FROM comment AS com JOIN comment_tree AS comTree ON com.id = comTree.child_id WHERE comTree.parent_id = 4;
Retrieve parent comments of a comment
Get all parent comments of comment with id 4.
-- retrieve parent comments of comment with id 4 SELECT com.*, comTree.* FROM comment AS com JOIN comment_tree AS comTree ON com.id = comTree.parent_id WHERE comTree.child_id = 4;