Home page | Back

Joomy Korkut's blog
An impractical pseudo-Prolog
Posted on December 5, 2014, tags: prolog, logic, programming, haskell

As my final project for COMP360 Computational Logic and Logic Programming, Fall 2014, Prof. Lipton, I decided to implement a pseudo-Prolog that tries to answer queries by building the least Herbrand model1. A project like this is not only very impractical and impossible to use in a real life situation, but also technically problematic.

(Link to the project on GitHub)

In this blog post, I will discuss the problems of building this project and the limitations we encounter in terms of how pseudo this Prolog is.

Implementation Problems

Note that trying to find answers for queries by building the least Herbrand model is problematic in certain ways. First of all, for many programs, the least Herbrand model is infinite. For example,

add(o,X,X).
add(s(X), Y, s(Z)) :- add(X,Y,Z).

will have an infinite least Herbrand model, which is like this:

\[T_P(\emptyset) = \{add(o,o,o), add(o,s(o), s(o)), add(o, s(s(o)), s(s(o))), …\} \] \[T_P(T_P(\emptyset)) = \{add(s(o), o, s(o)), add(s(o), s(o), s(s(o))), add(s(o), s(s(o)), s(s(s(o)))), …\} \] \[\mathbb{M}_P = T_P(\emptyset) \cup T_P(T_P(\emptyset)) \cup … \]

or with another notation,

\[ \mathbb M_P = \bigcup_{i=1}^{n} (T_P)^{i} (\emptyset)\]

Note that it’s not only that the result \(T_P\) on any set is infinite, but also \(\mathbb M_P\) is constructed by the union of infinite number of sets. In this case, we need to make sure that our implementation doesn’t get stuck in an infinite list.

Since our program is going to proceed by trying to find the query in the created Herbrand base, we should make sure that it doesn’t try to construct the entire Herbrand base first. This would cause the program to fail whenever the program has a predicate like the example above. (\(add/3\)) Haskell’s lazy evaluation is a perfect environment to avoid this. However, we cannot use the built-in Data.Set type (which is not lazy) in Haskell, we need another data type to represent an infinite set. We also need this infinite set structure to handle set unions with lazy evaluation, and we need to check membership without getting stuck on the first element of a set union.

For example, for the \(add/3\) example above, if the query happens to be add(s(o), o, s(o)), the program should not get stuck in \(T_P(\emptyset)\), which is an infinite set. The answer is in \(T_P(T_P(\emptyset))\), which is the second set in the union \(\mathbb{M}_P\). Our infinite set data type should be able to handle this kind of issue.

However, what makes this project a pseudo-Prolog is that, it can never say no for a predicate like add. It will check membership in \(\mathbb{M}_P\) forever and it will never find one for an incorrect query, such as add(s(o), o, o). A real Prolog implementation would get the result no with SLD resolution, which is not what this project does. (again, pseudo-Prolog)

Unpacking the problem

As I mentioned in the previous section, Haskell’s built-in set structure doesn’t allow infinite sets, since it is not lazily evaluated. Data.Set also requires the type to be an instance of the type class Ord, which doesn’t make much sense for the types in question.

I looked for an implementation of infinite sets in Haskell, but the one I found (Data.Searchable) was not handling set unions the way this problem required. Our infinite set structure should be able to handle the union of different infinite sets. By that, I mean that when looking up for an element in a union of infinite sets, it shouldn’t get stuck in the first infinite list; it should go step by step. But it cannot just check the first elements, because there is a chance that the set is a union of an infinite number of elements, so in that case it would get stuck looking at the first elements of infinite numbers of sets.

My infinite set implementation

To solve, the problems I encountered, I needed to implement my own infinite set data type. In order to allow a union of sets to be handled lazily, I created a set type (Set a) with two data constructors: Set [a] and Union [Set a]. Since a list is a data type that allows lazy evaluation, this implementation should work for our purposes. Of course, there are two crucial things that my implementation fails to satisfy:

  1. Repeating elements! How is this possibly a set if there are recurring elements?! To be honest, it’s not, but it’s possible to make it one. Since this is an experimental project, I don’t mind having the same element more than once in the set. Also, checking for repeated elements is very costly in a list with infinite number of elements.

  2. Complexity. Note that an atomic set is a list, after all. This requires quadratic time, \( O(n^2) \), if you want to check for repeating elements as well. Omitting repeating element check means that we only need linear time, \( O(n) \). This was a trade-off I made since this was only an experimental project.

Membership check in an infinite set

I previously described the problems with membership check that we need for this project. Since Union [Set a] contains a list that allows lazy evaluation, we can go one step at a time. Likewise, Set [a] contains a list, so we can check the elements one element at a time as well. Let’s draw a table of how we can check the elements in a union: (@ denotes the index of the set in the union, # denotes the index of the element in the corresponding set, ! represents at which lookup step the element is checked)

Table @0 @1 @2 @3 .. \(\infty\)
#0 !1 !2 !3 !4 ..
#1 !2 !3 !4 !5 ..
#2 !3 !4 !5 !6 ..
#3 !4 !5 !6 !7 ..
…\(\infty\) .. .. .. .. ..

As you can see in the table, membership check goes diagonally, so that it doesn’t get stuck in an infinite set. This might not be very efficient, but it should work for our problem.


Implementing the \(T_P\) operator

The \(T_P\) operator would not be very hard to implement if the Horn clauses didn’t have any variables. If there weren’t any variables then it would be enough to check if the argument set 2 contains the tail formulae for each Horn clause in the program. However, that is not the case.

Variables require us to replace the variables in the Horn clause with ground terms, because that is the only way we can possibly check for membership in the argument set, because the argument set only contains ground formulae.

However, if there are functions3 in our language, that means we will have an infinite number of ground terms. The golden question is: Can we build the result of a \(T_P\) operator if there are infinite number of ground terms?

Let’s think about this issue. Infinite number of ground terms means that we have to build the ground formulae, say, for every relation \(rel\), we have to add possible combinations of Formula rel termsList, where termsList has \(arity(rel)\) terms. Considering that there are infinite number of combinations, there are going to be an infinite number of ground formulae for every relation. Therefore, the result of the \(T_P\) operator will be a union of infinite sets. Can our lazy infinite sets handle this?

Let’s leave this issue for a later time for now and how our program is supposed to handle variables.

Handling the variables in the Horn clause

Let’s remember the definition of the \(T_P\) operator at this point.

\[ T_P(X) = \{ A \ | \ (A \leftarrow B_1,…,B_n) \in P, \{ B_1\theta,…,B_n\theta \} \subseteq X \} \]

where \( A \leftarrow B_1,…,B_n \) is a ground instance of a clause in \(P\), which means that there are no variables inside. How can we compute a ground instance of a Horn clause that has variables? We can basically plug in all possible combinations of the ground terms and then, for the \(T_P\) operator, we can check if the tail of the new ground clause is a subset of \(X\).

Is there a way to construct only the correct ground formulae? I don’t see a way to do this, so I decided to construct all possible ground formulae from all possible ground terms, and then filter them to get the correct ones. Of course, this approach makes infinite sets almost impossible. There must be a way to handle them carefully with lazy evaluation, but I cannot see how. If I try to fix the program to work with infinite sets, I’ll have the basic structures that will facilitate that.

The rest

After creating a function to generate all ground instances of a non-ground Horn clause, we can create a ground program of a given language. This basically means that when we construct \(T_P\), we just have to check for membership in the ground program. (groundProgram in src/Herbrand/Model.hs)

Completing the \(T_P\) operator makes the least Herbrand model trivial. Since we have given up the infinite sets now, we can assume that the least Herbrand model is finite.

What we have to do now is to write a parser to parse facts and predicates, and a REPL to check queries. These are relatively insignificant tasks, so you can see the source code if you want to see how I did it. But I’m sure mine is definitely not the most elegant one.

Conclusion

I started the project with the intent of creating a pseudo-Prolog that handles not only finite programs, but also infinite programs like the \(add/3\) predicate we talked about. However, I couldn’t do this, because when a variable is used in a predicate, there was no way for me to get only the correct ones. I could work backwards and do unification, like a real Prolog does, but that would defeat the whole experimental purpose. My only solution to build all possible formulae and take combinations of \(arity(rel)\) from that set. There might be a lazy way to do this, but I couldn’t figure it out, so I ended up building a pseudo-Prolog that doesn’t allow functions. I can call it a Datalog, but it’s probably too slow to be actually used for anything. Nevertheless, it was a fun experiment for me.

(Link to the project on GitHub)



  1. The least Herbrand model is basically the set of all atomic formulae that are true.

  2. The argument that \(T_P\) takes, for example, for \(T_P(\emptyset)\), it would be \(\emptyset\).

  3. When I say function, I mean the s in s(o) in add. add is a relation. o and s(o) are terms, and s is a function.