In my last post, I showed you how you can make a very simplistic computer algebra system which can expand trigonometric expressions (but is trivially expandable to support whichever expressions you like.) It's probably worth reading that before reading this post, because otherwise it might not make a lot of sense.

The thing was, there were some pretty severe limitations to what it could do. I'll address one of these in this post and show you how you could ammend that.

Suppose you have the transformation rule,

$$ sin^2(x) + cos^2(x) \rightarrow 1 $$and you enter this into the program and tell it to expand the expression

$$ sin^2(\pi) + cos^2(\pi) $$You would expect that it is "expanded" into $1$, and indeed that is what happens. But, what if you ask it to expand this?

$$ sin^2(\pi) + cos^2(2\pi) $$This clearly shouldn't match (because $\pi \neq 2\pi$), but in its current state, the program will happily tell you that it too is equal to $1$. The reason for this becomes obvious when you take a look at the current definition of `match`

, which if you recall is the function which is in charge of taking a pattern (the left-hand-side of a rule) and checking if it matches another expression, while simultaneously returning an associative list of the matched "holes" and their corresponding values.

To clarify, suppose it was given the rule I described earlier, $ sin^2(x) + cos^2(x) \rightarrow 1 $. The pattern would be $ sin^2(x) + cos^2(x) $, with two holes. The holes denote a variable-like expression - $x$, in this case - which can be substituted for any other value. This value which is substituted for it in the expression will be considered the hole's "value". I hope that makes sense, but if not, have a quick read of the `match`

function section in my previous post.

Anyway, this is its implementation at the moment:

match :: Pattern -> Term -> Maybe Match match (Hole x) term = Just [(x, term)] match (Op patternOp patternL patternR) (Op termOp termL termR) | patternOp /= termOp = Nothing | otherwise = (++) <$> (match patternL termL) <*> (match patternR termR) match (Prefix patternOp patternX) (Prefix termOp termX) | patternOp /= termOp = Nothing | otherwise = match patternX termX match (Call patternFn patternArgs) (Call termFn termArgs) | patternFn /= termFn = Nothing | length patternArgs /= length termArgs = Nothing | otherwise = concat <$> sequence (zipWith match patternArgs termArgs) match a b = if a == b then Just [] else Nothing

The first clause, when the pattern is a `Hole`

type, is where the problem lies. It never checks if a hole is already defined, and as a result, both values will be added to the associative list. This is not a problem if they share the same value, but if they are different, the pattern should not match, as demonstrated above.

I can think of two ways to solve this problem - one is relatively easy and requires a small amount of changed code, and the other is more complicated and, in a way, more clever. Here they are, see if you can guess which is which:

- After the match has finished, check the returned associative list element-by-element to ensure that it is consistent. This can be done by going through the list and, for each element, checking if its value is equal to the first entry of the same key in the match result.
- Or, a solution involving a redefinition of
`Match`

, semigroups, and other things.- Redefining
`Match`

as an ADT which has two constructors, one producing an associative list like what a`Just Match`

currently is, and one representing a failed match, similar to`Nothing`

. - Making
`Match`

an instance of`Semigroup`

, where the binary operation is defined such that:- $ \textit{Match}\ a \diamond \textit{Match}\ b = \textbf{if} \ \ a\ \textit{`conflicts`}\ b \ \ \textbf{then} \ \ \textit{NoMatch} \ \ \textbf{else} \ \ \textit{Match}\ (a \diamond b) $
- $ \textit{otherwise} = NoMatch $

- Using the $\diamond$ operator in the
`match`

function instead of`++`

in order to properly concatenate the matches.

- Redefining

And while I really wanted to try the second option, I couldn't convince myself that it was better than the first one, which is much easier to implement, likely takes less time to run (at the end of the match instead of checking at each step) and also probably uses less memory (because it doesn't need to hold the extra wrapping of matches, and again the point about checking at the end instead of throughout).

One nice thing about the second option is that it would be able to terminate the matching early if a hole conflict was detected, but that shouldn't make much of a difference unless operating on absolutely huge expression trees.

So I went with the first one, despite it being marginally less "elegant". Here's what I came up with:

match' :: Pattern -> Term -> Maybe Match match' (Hole x) term = Just [(x, term)] match' (Op patternOp patternL patternR) (Op termOp termL termR) | patternOp /= termOp = Nothing | otherwise = (++) <$> (match' patternL termL) <*> (match' patternR termR) match' (Prefix patternOp patternX) (Prefix termOp termX) | patternOp /= termOp = Nothing | otherwise = match' patternX termX match' (Call patternFn patternArgs) (Call termFn termArgs) | patternFn /= termFn = Nothing | length patternArgs /= length termArgs = Nothing | otherwise = concat <$> sequence (zipWith match' patternArgs termArgs) match' a b = if a == b then Just [] else Nothing match :: Pattern -> Term -> Maybe Match match pattern term = case match' pattern term of Nothing -> Nothing Just vars -> if consistent vars then Just vars else Nothing consistent :: Match -> Bool consistent ms = all check ms where check (name, val) = (fromJust $ lookup name ms) == val

I first renamed `match`

to `match'`

, and replaced all the instances of it inside itself with the new name too. Then, everything works the same, but it leaves the name `match`

for another function. I defined `match`

such that, at an abstract level, it just calls `match'`

and checks for consistency before returning it again. If there is inconsistency, `Nothing`

will be returned instead, as if the match was unsuccessful (and indeed it was).

To do this, another function, `consistent`

, had to be defined. It checks if a given match result is consistent with itself, by checking, for all elements, if the value of that element is the same as the value of the first element in the list with the same hole name. This may not be an optimal solution, since it has a complexity of $O(n^2)$, and to be honest something could probably be done quite easily with a hash-set to get it down to $O(n)$, but since there will generally be less than, say, 10 match results, this won't matter. But perhaps as an exercise for the reader, see if you could improve this function.

And with that, the problem is fixed! Nothing else in the code needs to be fixed, because the type signature of `match`

hasn't changed.

For fun, I decided to have a go at implementing the match-consistency mechanism using the other method I mentioned earlier on, and then I came to the conclusion that it's actually much nicer and probably even faster, due to how I did it (you'll see later). I'll go over it at a high level first:

- Redefine the
`Match`

type as an ADT with two options. The first option is also called`Match`

and takes one parameter - a`Map String Term`

, which, if you're unfamiliar with`Data.Map(.String)`

, is a fast mapping between strings and terms. This is practically equivalent to a`[(String, Term)]`

associative list, but is much faster to access values from. The other option is`NoMatch`

, which represents the fact that a pattern didn't match with an expression. This means that the new`Match`

type is actually analogous to`Maybe Match`

, with the old definition of`Match`

. So, you might be wondering why I even bothered to define this type. - The reason I redefined
`Match`

is that`Maybe [(String, Term)]`

(or a type synonym of that) can't be made an instance of`Semigroup`

(to my knowledge, at least. I'm sure that there is a way). So, the next thing to do is to make`Match`

an instance of`Semigroup`

, where the semigroup operator $\diamond$ concatenates two matches as long as there are no conflics. If there are any conflics,`NoMatch`

will be returned. - Lastly, rewrite a load of code to work with the new
`Match`

type.

Now let's get started on the implementation.

First off, we need to import `Data.Map.Strict`

. Plain old `Data.Map`

could also be used, but the strict version is probably more efficient in this situation.

import Data.Map.Strict (Map) import qualified Data.Map.Strict as M

The new definition of `Match`

then looks like this:

data Match = Match { getMatch :: Map String Term } | NoMatch deriving (Show, Eq)

Pretty simple. The first constructor contains a mapping from strings to terms, just like the associative list used to do. The second constructors takes no arguments and represents a failed match. This type derives `Show`

and `Eq`

, which we'll need to use later. (Though, `Show`

isn't strictly necessary, but it's useful for debugging in ghci.)

Then, the semigroup instance of `Match`

can be defined like so:

instance Semigroup Match where -- It can be assumed, when joining two matches, that each individual -- map is already consistent with itself. This should be the case when -- a match is generated from the `match` function. Match a <> Match b | consistent = Match (a <> b) | otherwise = NoMatch where consistent = all checkConsistency (M.toList b) checkConsistency (k, v) = case M.lookup k a of Just prev -> v == prev Nothing -> True _ <> _ = NoMatch

The first clause basically says that to append two matches, you must first check if they are consistent with each other. If they are, then the two maps can be appended with the semigroup $\diamond$ operator and then wrapped in a `Match`

. If they aren't consistent, though, `NoMatch`

will be returned to signify the fact that the pattern and the expression are not compatible because they have conflicting hole values.

The consistency of a single element in `b`

can be confirmed by showing that there is no key-value pair in `a`

which conflicts with it. The consistency of `b`

, then, can be confirmed trivially by showing that all elements of it are consistent. This also implicitly confirms the consistency of `a`

- it should be clear that `b`

being consistent with respect to `a`

implies the fact that `a`

is also consistent with respect to `b`

. The proof for this would be trivial, maybe give it a go.

The second clause says that if two matches are appended and either one of them (or both) is not a `Match`

(and thus is a `NoMatch`

), then the result is just `NoMatch`

.

This implementation, using `Map`

s, will be incredibly efficient, even with very large amounts of match results.

We can confirm that this works by opening up the file in ghci and trying a few expressions against it:

ghci> Match (M.fromList [("a", Number 1)]) <> Match (M.fromList [("b", Number 2)]) Match {getMatch = fromList [("a",Number 1.0),("b",Number 2.0)]} ghci> Match (M.fromList [("a", Number 1)]) <> Match (M.fromList [("b", Number 2)]) <> Match (M.fromList [("a", Number 1)]) Match {getMatch = fromList [("a",Number 1.0),("b",Number 2.0)]} ghci> Match (M.fromList [("a", Number 1)]) <> Match (M.fromList [("b", Number 2)]) <> Match (M.fromList [("a", Number 2)]) NoMatch

The first example shows the basic fact that two matches can be successfuly appended. The second example shows that if two values of the same value are present but are consistent with each other, it still matches. The third examples shows that if two conflicting values are present, `NoMatch`

will be returned.

With the new definition of `Match`

, a few other things will have to change - but don't worry, they will actually become simpler since it encapsulates the idea of a value *maybe* existing with its actual value in one type.

This is the new definition of `match`

. Note that the type signature has changed: the return type is now `Match`

instead of `Maybe Match`

.

match :: Pattern -> Term -> Match match (Hole x) term = Match $ M.fromList [(x, term)] match (Op patternOp patternL patternR) (Op termOp termL termR) | patternOp /= termOp = NoMatch | otherwise = (<>) (match patternL termL) (match patternR termR) match (Prefix patternOp patternX) (Prefix termOp termX) | patternOp /= termOp = NoMatch | otherwise = match patternX termX match (Call patternFn patternArgs) (Call termFn termArgs) | patternFn /= termFn = NoMatch | length patternArgs /= length termArgs = NoMatch | otherwise = foldl (<>) (Match M.empty) (zipWith match patternArgs termArgs) match a b = if a == b then Match M.empty else NoMatch

The changes are small. Notably:

- The applicative style is no longer used to join together two matches (e.g. on line 5), because the matches are no longer wrapped up in
`Maybe`

s. - Where
`Just x`

was used before,`Match $ M.fromList x`

is now used. - Where
`Nothing`

was returned previously,`NoMatch`

is now returned.

We'll also need to make a couple of other tiny changes to the `substitute`

and `applyRule`

functions:

substitute :: Map String Term -> Term -> Term substitute vars term@(Hole x) = fromMaybe term (M.lookup x vars) substitute vars term = applyInside (substitute vars) term applyRule :: Rule -> Term -> Term applyRule rule@(pattern, replacement) term = case match pattern term of NoMatch -> applyInside (applyRule rule) term Match vars -> substitute vars replacement

The change in `substitute`

is that the first parameter, which used to be a `Match`

, is now a `Map String Term`

. It can't remain a `Match`

type, because that would allow it to be `NoMatch`

, and then substitute wouldn't be able to do anything. Since we know, at the time when `substitute`

is called, that the match has succeeded, we can just take the string-term map out of the match and feed it straight to the function. Due to this change, `M.lookup`

has to be used instead of the regular `lookup`

for associative lists.

The changes in `applyRule`

are even simpler - literally, `Nothing`

is changed to `NoMatch`

and `Just`

is changed to `Match`

.

And that's it! It's a bit more long winded than the other solution (about 10 more lines of code), but I would say that it is much *nicer*, and I'm sure that you'd agree.

Now we have a *more* functional CAS than at the end of my last post, so that's cool. There are a few more problems, so I guess I'll fix them at some point in the future. I reckon the next thing I'll do is a parser so I don't have to type out the ridiculously verbose constructors to make even simple expressions, which will help with writing rule sets as well as actually using the system (from a REPL, most likely).

At that point, it would also be trivial to load rules from files, which would be a nice little addition too.