Friendly Polymorphism in Functional Languages

Before we begin, let me make it clear that this is not supposed to be a groundbreaking new feature, nor is it described formally. This is me tinkering with ideas to improve on Haskell's type system, for me or someone else to perhaps implement in the future.

Background

Polymorphism is the ability for entities to have many forms, generally to make nice abstractions. In Object Oriented languages, this generally refers to subclassing and overriding: a Cat class can take methods from an Animal class and override them to do something more cat-like, while still fitting within the contract defined by the Animal class.

In Functional languages, polymorphism is handled differently depending on the language. This article will primarily focus on Haskell, which uses "type classes" for its polymorphism. Type classes are similar to interfaces in OO languages, except they are not strongly tied to the class or an object (since there are no classes!). This means that type classes can perform polymorphic dispatch without needing a value to dispatch over.

Type Classes vs Interfaces

For example, let's say we're writing code to deserialize some JSON input and then map the input to some arbitrary type. However, we need to put some constraints on the types we can accept - deserializing a Function doesn't make much sense.

In Haskell we can define a type class to do this:

class FromJSON t where
	fromJSON :: JSON -> t
This code defines a type class FromJSON which has a single method fromJSON

And then we can make an instance of the type class for some type:

instance FromJSON Int where
	fromJSON (JSONInt i) = i
    fromJSON other = error "Not an int"

This binds the t defined at the class declaration to Int and thus fromJSON must have type JSON -> Int

Now we can use the class's function like so:

readJSON :: (FromJSON a) => String -> a
readJSON input = do
	let json =  parseJSON input
    fromJSON json
    
 main :: IO ()
 main = do
 	let json = "3"
    let parsed = readJSON json :: Int
    print parsed

Notice how we never specify what a is - readJSON is "parametrically polymorphic" and will work for all types a such that there is a FromJSON instance defined for that type.

This is great, because we can write functions that work for multiple different types at once, while not compromising type safety.

Now let's compare this to how you'd do it in a statically typed Object Oriented language. I'll be using Java to demonstrate:

interface FromJSON<T extends FromJSON<T>> {
	T fromJSON(JSON input);
 }

This is straight forward enough, but requires a slightly messy generic declaration so that fromJSON always returns the implementing type

Now we have a problem: we can't edit the types int or Integer... so how do we make a FromJSON implementation?

The short answer is that we can't - this pattern in Java simply isn't possible. But what about types we do control? Let's define some Person type and make it implement FromJSON:

public class Person implements FromJSON<Person> {
	private final String name;
    private int age;
    
    public Person(String name, int age) {
    	this.name = name;
        this.age = age;
    }
    
    @Override
    public Person fromJSON(JSON input) {
    	// implementation irrelevant
    }
    // getters and setters etc here
 }

Looks good, but now we have a problem - how do we actually call fromJSON? It's not a static method, so it requires an instance of Person to call - but we don't have an instance yet! We have a chicken vs egg problem.

So even when we control the type's source, this pattern isn't really possible. There are a few workarounds, of course, but none of these are ideal:

  • Add an no-args constructor to Person that sets all of the fields to null. This is pretty bad as now all your fields are nullable even if they shouldn't be, and you could potentially have objects with invalid state floating around
  • We could make the no-args constructor private, and then define a static method which calls this constructor exclusively to call fromJSON. This lessens the problem, but doesn't fix it. We still either have to make everything nullable or violate the expected state, which could easily result in bugs. It's also quite verbose, and makes any type of function like the previously mentioned readJSON impossible - as Java can't do dynamic dispatch with static methods
  • The best "solution" would be to move the FromJSON implementation to a separate class - this is what libraries like Gson and Jackson do. This is more of a workaround though, as we now have 2 classes rather than 1, one of which only exists to allow dynamic dispatch because of the type system's weaknesses.

From this short (ish) example, we can see the benefits of type classes over interfaces - although they are not perfect by any means.

Type Classes' weaknesses (in Haskell)

While type classes are much more flexible in their usages, this flexibility can lead to quite a few problems.

Ambiguous Types

It's fairly common to run into problems with ambiguous types in Haskell, where there are multiple possible types to use and so the compiler doesn't know which.

Consider a slight modification of the previous code:

main :: IO ()
main = do
	let json = "3"
    let parsed = readJSON json
    print parsed

The compiler will complain here because there's no way of it knowing what FromJSON instance we want to use - and so there's no way of knowing which function overload to call. This could be solved with an explicit type declaration: let parsed = readJSON json :: Int or a language extension called TypeApplications: let parsed = readJSON @Int json, but both of these are slightly messier.

Complex Errors

The previous ambiguity issue leads nicely into the next problem, which is that the error messages generated for type class errors are often very unpleasant and confusing. The error generated for the previous code will look something like this:

JSONBlog.hs:24:3: error:
    • Ambiguous type variable ‘a0’ arising from a use of ‘print’
      prevents the constraint ‘(Show a0)’ from being solved.
      Relevant bindings include parsed :: a0 (bound at JSONBlog.hs:23:7)
      Probable fix: use a type annotation to specify what ‘a0’ should be.
      These potential instances exist:
        instance Show Ordering -- Defined in ‘GHC.Show’
        instance Show a => Show (Maybe a) -- Defined in ‘GHC.Show’
        instance Show Integer -- Defined in ‘GHC.Show’
        ...plus 23 others
        ...plus 13 instances involving out-of-scope types
        (use -fprint-potential-instances to see them all)
    • In a stmt of a 'do' block: print parsed
      In the expression:
        do let json = "3"
           let parsed = readJSON json
           print parsed
      In an equation for ‘main’:
          main
            = do let json = ...
                 let parsed = ...
                 print parsed
   |
24 |   print parsed
   |   ^^^^^

Looking at the error, it's not immediately clear what we've actually done wrong. GHC seems to imply that using print is what's causing the issue - while this is true, it doesn't show the whole picture.

The expression let parsed = readJSON json has the type parsed :: (FromJSON a) => a, meaning parsed currently can have the type of any type a with a FromJSON instance (remember that Haskell is lazy and so nothing has actually been parsed yet).

When we call print parsed, GHC tries to unify the type (FromJSON a) => a with the type of print :: (Show s) => s -> IO (), and throws an error because this isn't possible. parsed can still have infinitely many types, and so the Show constraint can't be solved.

This makes sense, but the error message doesn't tell us most of that. It just says that there's some ambiguity - it's up to the programmer to figure out why that ambiguity exists. A nicer, Elm-style error might look more like this:

JSONBlog.hs:24:3: error:
    • Function ‘print’ cannot be called with the argument ‘parsed’
      because ‘parsed’ has ambiguous type ‘(FromJSON a) => a’
      ‘print’ requires its argument to have type ‘(Show s) => s’,
      but ‘s’ cannot be determined for a polymorphic type, as infinitely
      many instances could exist.
      
      Fix: specify the expected type of ’parsed’ explicitly, either
      at the point of creation or at the call site.
      For example:
	      let parsed = fromJSON json :: T
          or 
	      print (parsed :: T)
      where ‘T’ explicitly satisfies any of the ‘Show’ instances:
        instance Show Ordering -- Defined in ‘GHC.Show’
        instance Show a => Show (Maybe a) -- Defined in ‘GHC.Show’
        instance Show Integer -- Defined in ‘GHC.Show’
      	...plus 23 others

This is slightly naive, of course, as in practice many type ambiguity errors are far more complex than this, so generating pretty error messages is harder than it may seem - but there is still a big room for improvement.

Too much flexibility?

One of Haskell's nice quality of life feature is that it allows numbers, string, and list literals to be polymorphic. If we want to use a binary string ByteString, a Set instead of a list, or a complex number instead of an integer, we can do so without needing to explicitly convert:

num = 3 :: Complex Double
bs = "Hello" :: ByteString
set = [1, 2, 3] :: Set Int
All of these declarations should compile (assuming the required types are imported & the OverloadedLists and OverloadedStrings extensions are used)

This is achieved using type classes - the value 3 has type (Num a) => a, the value "Hello" has type (IsString s) => s), et cetera. These classes have methods to convert from the default types into the specified type s

Unfortunately this comes at a price - firstly, the literals desugar into function calls, which adds a bit of extra overhead. Our "set literal" has to be converted from a list to a set at runtime, an O(n) operation in both time and space. Secondly, it's quite easy to run into ambiguous types yet again. Having to figure out the cause of the type error and adding a manual type declaration in the right place is time consuming and annoying.

This problem exists because type classes don't have any concept of a "default instance". They are either fully polymorphic, or not at all.

A real world example of this issue is when parsing with the library Megaparsec, which provides the combinator oneOf :: (Foldable f, MonadParsec e s m) => f (Token s) -> m (Token s) which takes some Foldable of tokens and matches any one of the tokens.

Naturally you might use it like this: operatorChar = oneOf ("!#$%&*+./<=>?@\\^|-~"), which matches any one of the symbol characters. Unfortunately, with polymorphic strings, this expression is ambiguous because GHC has to unify (Foldable f) and (IsString s), which again isn't possible.

Type Class Defaults as a Solution

I mentioned earlier that type classes don't have default instances. This is slightly untrue - GHC has a special edge case to apply for numeric type classes (Num, Integral, Rational, etc) to make them default to either Integer or Double depending on their type. Unfortunately, that's as far as it goes (except in GHCi). All the defaults are hardcoded in the compiler and we can't add or remove any.

This system is designed so that the defaults are only used if a type ambiguity error would occur. Given a function f :: (Num a) => a -> a, we can still write f 3 and it will use the polymorphic type rather than a default. This way, we don't lose the flexibility when we want it.

One potential improvement to this would be to default to instances in the Prelude module - [] for IsList, String for IsString, et cetera. This intuitively makes sense, but what if there are multiple instances in Prelude? Or none? This also puts more of a "magic" dependency in Prelude, which is not really the case - the only special thing about it, is that it's implicitly imported

So the best way would be letting the user define their own defaults according to their needs. We could also allow the declaration of the type class to define a default:

class IsList l where
	default instance []
    fromList :: [a] -> l a

But then override this default elsewhere:

default instance IsList Set

Default instances could be exported/imported by a module, or not:

module Test (default instance IsList) where
	-- exports the default IsList instance

module Y where
import Test (default instance (..)) -- imports all default instances from Test

Ambiguities (i.e. multiple defaults imported) would have to be resolved by the import declarations

I can't see many major flaws in this system, apart from the slight added mental complexity in having to figure out which default instance is being used, but IDEs could quite easily make that a non-issue.

Conclusion

Haskell is a very cool language, but it could use some modernisation to make it more beginner friendly and approachable. It's unfortunate that the only modern-feeling pure functional language so far is Elm, which only works on websites. Hopefully in the future Haskell will consider looking at flaws like this to make the language better for everyone.

Simon, if you need more genius ideas, feel free to drop me an email.