Friendly Polymorphism in Functional Languages
Before we begin, let me make it clear that this is not supposed to be a groundbreaking new feature, nor is it described formally. This is me tinkering with ideas to improve on Haskell's type system, for me or someone else to perhaps implement in the future.
Background
Polymorphism is the ability for entities to have many forms, generally to make nice abstractions. In Object Oriented languages, this generally refers to subclassing and overriding: a Cat
class can take methods from an Animal
class and override them to do something more cat-like, while still fitting within the contract defined by the Animal
class.
In Functional languages, polymorphism is handled differently depending on the language. This article will primarily focus on Haskell, which uses "type classes" for its polymorphism. Type classes are similar to interfaces in OO languages, except they are not strongly tied to the class or an object (since there are no classes!). This means that type classes can perform polymorphic dispatch without needing a value to dispatch over.
Type Classes vs Interfaces
For example, let's say we're writing code to deserialize some JSON input and then map the input to some arbitrary type. However, we need to put some constraints on the types we can accept - deserializing a Function doesn't make much sense.
In Haskell we can define a type class to do this:
And then we can make an instance of the type class for some type:
instance FromJSON Int where
fromJSON (JSONInt i) = i
fromJSON other = error "Not an int"
This binds the t
defined at the class declaration to Int
and thus fromJSON
must have type JSON -> Int
Now we can use the class's function like so:
readJSON :: (FromJSON a) => String -> a
readJSON input = do
let json = parseJSON input
fromJSON json
main :: IO ()
main = do
let json = "3"
let parsed = readJSON json :: Int
print parsed
Notice how we never specify what a
is - readJSON
is "parametrically polymorphic" and will work for all types a
such that there is a FromJSON
instance defined for that type.
This is great, because we can write functions that work for multiple different types at once, while not compromising type safety.
Now let's compare this to how you'd do it in a statically typed Object Oriented language. I'll be using Java to demonstrate:
interface FromJSON<T extends FromJSON<T>> {
T fromJSON(JSON input);
}
This is straight forward enough, but requires a slightly messy generic declaration so that fromJSON
always returns the implementing type
Now we have a problem: we can't edit the types int
or Integer
... so how do we make a FromJSON
implementation?
The short answer is that we can't - this pattern in Java simply isn't possible. But what about types we do control? Let's define some Person
type and make it implement FromJSON
:
public class Person implements FromJSON<Person> {
private final String name;
private int age;
public Person(String name, int age) {
this.name = name;
this.age = age;
}
@Override
public Person fromJSON(JSON input) {
// implementation irrelevant
}
// getters and setters etc here
}
Looks good, but now we have a problem - how do we actually call fromJSON
? It's not a static method, so it requires an instance of Person
to call - but we don't have an instance yet! We have a chicken vs egg problem.
So even when we control the type's source, this pattern isn't really possible. There are a few workarounds, of course, but none of these are ideal:
- Add an no-args constructor to
Person
that sets all of the fields tonull
. This is pretty bad as now all your fields are nullable even if they shouldn't be, and you could potentially have objects with invalid state floating around - We could make the no-args constructor private, and then define a static method which calls this constructor exclusively to call
fromJSON
. This lessens the problem, but doesn't fix it. We still either have to make everything nullable or violate the expected state, which could easily result in bugs. It's also quite verbose, and makes any type of function like the previously mentionedreadJSON
impossible - as Java can't do dynamic dispatch with static methods - The best "solution" would be to move the
FromJSON
implementation to a separate class - this is what libraries like Gson and Jackson do. This is more of a workaround though, as we now have 2 classes rather than 1, one of which only exists to allow dynamic dispatch because of the type system's weaknesses.
From this short (ish) example, we can see the benefits of type classes over interfaces - although they are not perfect by any means.
Type Classes' weaknesses (in Haskell)
While type classes are much more flexible in their usages, this flexibility can lead to quite a few problems.
Ambiguous Types
It's fairly common to run into problems with ambiguous types in Haskell, where there are multiple possible types to use and so the compiler doesn't know which.
Consider a slight modification of the previous code:
main :: IO ()
main = do
let json = "3"
let parsed = readJSON json
print parsed
The compiler will complain here because there's no way of it knowing what FromJSON
instance we want to use - and so there's no way of knowing which function overload to call. This could be solved with an explicit type declaration: let parsed = readJSON json :: Int
or a language extension called TypeApplications
: let parsed = readJSON @Int json
, but both of these are slightly messier.
Complex Errors
The previous ambiguity issue leads nicely into the next problem, which is that the error messages generated for type class errors are often very unpleasant and confusing. The error generated for the previous code will look something like this:
JSONBlog.hs:24:3: error:
• Ambiguous type variable ‘a0’ arising from a use of ‘print’
prevents the constraint ‘(Show a0)’ from being solved.
Relevant bindings include parsed :: a0 (bound at JSONBlog.hs:23:7)
Probable fix: use a type annotation to specify what ‘a0’ should be.
These potential instances exist:
instance Show Ordering -- Defined in ‘GHC.Show’
instance Show a => Show (Maybe a) -- Defined in ‘GHC.Show’
instance Show Integer -- Defined in ‘GHC.Show’
...plus 23 others
...plus 13 instances involving out-of-scope types
(use -fprint-potential-instances to see them all)
• In a stmt of a 'do' block: print parsed
In the expression:
do let json = "3"
let parsed = readJSON json
print parsed
In an equation for ‘main’:
main
= do let json = ...
let parsed = ...
print parsed
|
24 | print parsed
| ^^^^^
Looking at the error, it's not immediately clear what we've actually done wrong. GHC seems to imply that using print
is what's causing the issue - while this is true, it doesn't show the whole picture.
The expression let parsed = readJSON json
has the type parsed :: (FromJSON a) => a
, meaning parsed
currently can have the type of any type a
with a FromJSON
instance (remember that Haskell is lazy and so nothing has actually been parsed yet).
When we call print parsed
, GHC tries to unify the type (FromJSON a) => a
with the type of print :: (Show s) => s -> IO ()
, and throws an error because this isn't possible. parsed
can still have infinitely many types, and so the Show
constraint can't be solved.
This makes sense, but the error message doesn't tell us most of that. It just says that there's some ambiguity - it's up to the programmer to figure out why that ambiguity exists. A nicer, Elm-style error might look more like this:
JSONBlog.hs:24:3: error:
• Function ‘print’ cannot be called with the argument ‘parsed’
because ‘parsed’ has ambiguous type ‘(FromJSON a) => a’
‘print’ requires its argument to have type ‘(Show s) => s’,
but ‘s’ cannot be determined for a polymorphic type, as infinitely
many instances could exist.
Fix: specify the expected type of ’parsed’ explicitly, either
at the point of creation or at the call site.
For example:
let parsed = fromJSON json :: T
or
print (parsed :: T)
where ‘T’ explicitly satisfies any of the ‘Show’ instances:
instance Show Ordering -- Defined in ‘GHC.Show’
instance Show a => Show (Maybe a) -- Defined in ‘GHC.Show’
instance Show Integer -- Defined in ‘GHC.Show’
...plus 23 others
This is slightly naive, of course, as in practice many type ambiguity errors are far more complex than this, so generating pretty error messages is harder than it may seem - but there is still a big room for improvement.
Too much flexibility?
One of Haskell's nice quality of life feature is that it allows numbers, string, and list literals to be polymorphic. If we want to use a binary string ByteString
, a Set
instead of a list, or a complex number instead of an integer, we can do so without needing to explicitly convert:
This is achieved using type classes - the value 3
has type (Num a) => a
, the value "Hello"
has type (IsString s) => s)
, et cetera. These classes have methods to convert from the default types into the specified type s
Unfortunately this comes at a price - firstly, the literals desugar into function calls, which adds a bit of extra overhead. Our "set literal" has to be converted from a list to a set at runtime, an O(n)
operation in both time and space. Secondly, it's quite easy to run into ambiguous types yet again. Having to figure out the cause of the type error and adding a manual type declaration in the right place is time consuming and annoying.
This problem exists because type classes don't have any concept of a "default instance". They are either fully polymorphic, or not at all.
A real world example of this issue is when parsing with the library Megaparsec, which provides the combinator oneOf :: (Foldable f, MonadParsec e s m) => f (Token s) -> m (Token s)
which takes some Foldable
of tokens and matches any one of the tokens.
Naturally you might use it like this: operatorChar = oneOf ("!#$%&*+./<=>?@\\^|-~")
, which matches any one of the symbol characters. Unfortunately, with polymorphic strings, this expression is ambiguous because GHC has to unify (Foldable f)
and (IsString s)
, which again isn't possible.
Type Class Defaults as a Solution
I mentioned earlier that type classes don't have default instances. This is slightly untrue - GHC has a special edge case to apply for numeric type classes (Num
, Integral
, Rational
, etc) to make them default to either Integer
or Double
depending on their type. Unfortunately, that's as far as it goes (except in GHCi). All the defaults are hardcoded in the compiler and we can't add or remove any.
This system is designed so that the defaults are only used if a type ambiguity error would occur. Given a function f :: (Num a) => a -> a
, we can still write f 3
and it will use the polymorphic type rather than a default. This way, we don't lose the flexibility when we want it.
One potential improvement to this would be to default to instances in the Prelude
module - []
for IsList
, String
for IsString
, et cetera. This intuitively makes sense, but what if there are multiple instances in Prelude
? Or none? This also puts more of a "magic" dependency in Prelude
, which is not really the case - the only special thing about it, is that it's implicitly imported
So the best way would be letting the user define their own defaults according to their needs. We could also allow the declaration of the type class to define a default:
class IsList l where
default instance []
fromList :: [a] -> l a
But then override this default elsewhere:
default instance IsList Set
Default instances could be exported/imported by a module, or not:
module Test (default instance IsList) where
-- exports the default IsList instance
module Y where
import Test (default instance (..)) -- imports all default instances from Test
Ambiguities (i.e. multiple defaults imported) would have to be resolved by the import declarations
I can't see many major flaws in this system, apart from the slight added mental complexity in having to figure out which default instance is being used, but IDEs could quite easily make that a non-issue.
Conclusion
Haskell is a very cool language, but it could use some modernisation to make it more beginner friendly and approachable. It's unfortunate that the only modern-feeling pure functional language so far is Elm, which only works on websites. Hopefully in the future Haskell will consider looking at flaws like this to make the language better for everyone.
Simon, if you need more genius ideas, feel free to drop me an email.