Pattern Matching – Make the Compiler Work for You

rubiquity · on May 3, 2014

I've recently started learning Erlang and Elixir, and the feature that made me fall in love is pattern matching, combined with great support for recursion. Yes, some people will say pattern matching is just the same as if/switch statements, but I think it's so much better. Pattern matching lets you push that conditional logic one level lower (similar to polymorphism and duck typing), and it makes reasoning about code so much easier.

dozzie · on May 3, 2014

Pattern matching is 1) conditional on values, 2) conditional on structure (not something you can get in imperative languages), 3) data extractor from data structures, all at the same time. Yes, it's helluva convenient. Pity it didn't make it to imperative languages (yet).
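
The three roles listed above can be seen in a single match. This is a minimal sketch in Rust; the `Message` type and the `describe` function are hypothetical names invented for illustration, not something from the article.

```rust
// Hypothetical message type, invented for this sketch.
enum Message {
    Quit,
    Move { x: i32, y: i32 },
    Write(String),
}

fn describe(msg: &Message) -> String {
    match msg {
        // 2) conditional on structure: which variant is this?
        Message::Quit => String::from("quit"),
        // 1) conditional on values: matches only when both fields are 0
        Message::Move { x: 0, y: 0 } => String::from("stay put"),
        // 3) extraction: bind the fields to local names
        Message::Move { x, y } => format!("move to ({}, {})", x, y),
        Message::Write(text) => format!("write {:?}", text),
    }
}
```

Each arm tests the shape of the data, optionally tests literal values inside it, and names the pieces it wants, all in one pattern.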

DanWaterworth · on May 3, 2014

It's in rust.

cromwellian · on May 3, 2014

Scala has it.

_mikz · on May 3, 2014

And do they support this kind of smart warning on unmatched possibilities?

alco · on May 3, 2014

There are no enums in Erlang and Elixir. If you have a case expression and there is no match with the given value, you'll get a run time error. This is deliberate and is part of the general approach to error handling in those languages.

Statically typed languages like Haskell and Rust require the programmer to list all possible cases when matching on a value.

IsTom · on May 3, 2014

Haskell doesn't require you to do that by default. There is a warning flag for that, but in general the language is okay with defining partial functions, more so than most languages (because laziness lets you cheat death).

kibwen · on May 4, 2014

  > Statically typed languages like Haskell and Rust require 
  > the programmer to list all possible cases when matching 
  > on a value.

This isn't a consequence of static vs. dynamic, it's just a question of philosophy. In fact, Rust didn't use to require you to exhaustively list out all cases, though it does today in order to help catch errors at compile time rather than at runtime.

I should also mention that if you don't want to list out all variants, that's easily done by using the _ pattern (the "ignore this" pattern):

  match some_direction {
      East => print("You're going east!"),
      _ => print("You're going in some direction that isn't east!")
  }
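
For comparison, here is a self-contained sketch in current Rust syntax that puts the exhaustive form next to the wildcard form. The `Direction` enum and the function names `name` and `report` are assumptions; the snippet above does not show how `some_direction` is defined.

```rust
// Hypothetical enum; the original snippet does not show its definition.
enum Direction {
    North,
    East,
    South,
    West,
}

// Exhaustive: adding a variant to Direction makes this function fail to
// compile until a matching arm is written.
fn name(d: &Direction) -> &'static str {
    match d {
        Direction::North => "north",
        Direction::East => "east",
        Direction::South => "south",
        Direction::West => "west",
    }
}

// Wildcard: always compiles, but a newly added variant silently falls into `_`.
fn report(d: &Direction) -> &'static str {
    match d {
        Direction::East => "You're going east!",
        _ => "You're going in some direction that isn't east!",
    }
}
```

The trade-off is the same one discussed elsewhere in this thread for `default` labels: the wildcard is convenient, but it gives up the compile-time reminder when the enum grows.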

alco · on May 4, 2014

What I mean is that it's not possible in a dynamically typed language to enforce at compile time enumeration of all possible cases for a variable, the type of which will be known at run time.

dozzie · on May 3, 2014

Actually, it's done some other way. You have Dialyzer, which tells you that your case clause will never match the data you provided or the case will surely blow up because clauses didn't catch some possibility that will occur.

kyrra · on May 3, 2014

Not sure about other IDEs, but Eclipse for Java has a warning/error flag for the enum/switch issues described here.

http://help.eclipse.org/juno/index.jsp?topic=%2Forg.eclipse....

> Incomplete 'switch' cases on enum

> When enabled, the compiler will issue an error or a warning whenever it encounters a switch statement which does not contain case statements for every enum constant of the referenced enum nor a default case. A default case is assumed to cover missing case statements for any enum constant, so the compiler is silent in the presence of default cases unless the option "Signal even if 'default' case exists" (see below) is enabled.

EDIT: oddly, I thought the blog post's first example was Java, not C#. Man, I forget how similar the syntax is.

dllthomas · on May 3, 2014

As a side note, traditional enum completeness checking in C is broken, in that you can only choose either to be warned statically when you need to add a new case (no default case) or to handle things at runtime if you get something unexpected (a default case).

Fairly recent gcc (last few years) now has a -Wswitch-enum flag which says "warn me even when there is a default label" which is the safest option (I assume clang has something similar though I haven't actually checked).

nhaehnle · on May 3, 2014

This option is also really annoying when you have switches over really large enum types, such as key codes, where it is usually legitimate to have a default case.

I guess even more annotations are needed ;)

dllthomas · on May 3, 2014

Heh, doubtless. If you have a switch statement where adding a case to the enum doesn't mean you need to consider whether to add it to that switch, it sounds like an appropriate place to use pragmas to disable the warning for the one switch (http://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Pragmas.html). If you do need to consider that switch, then I'd say you should still suck it up and list every case (though possibly collapsing similar groups of cases into a single macro... "#define NUMERIC_KEYS case KEY_KP0: case KEY_KP1"...).

_ejd · on May 3, 2014

GNU C has the Case Range C extension (http://gcc.gnu.org/onlinedocs/gcc/Case-Ranges.html#Case-Rang...) which allows you to write "case KEY_KP0 ... KEY_KP10:".

dllthomas · on May 3, 2014

Oooo, very nice!

pjmlp · on May 3, 2014

Lots of things in C are broken, not only enums.

dllthomas · on May 3, 2014

Thanks.

platz · on May 3, 2014

Welcome to the expression problem!

http://en.m.wikipedia.org/wiki/Expression_problem

benjiweber · on May 3, 2014

I was coincidentally just blogging about doing this in Java

http://benjiweber.co.uk/blog/2014/05/03/pattern-matching-in-...

cmplxen · on May 3, 2014

F#. If you're working in .NET, you might also be interested in PEX (http://research.microsoft.com/en-us/projects/pex/) - it can catch edge cases like these, automatically generate high-coverage unit tests, etc., and it integrates nicely into VS.

BruceIV · on May 3, 2014

Wouldn't the OP get the same result by staying in C# and changing the outer if-else to a switch? I'm not super familiar with C#, but many C++ and Java compilers do that. (Pattern matching is still a cool feature, as some of his later examples show, I just don't think his first one motivates it.)

radicalbyte · on May 3, 2014

Yes, I'd never let his outer if-else get through code review. Visual Studio even has special support for generating a correct switch statement (all values + default) for enums.

Still, I'd love to have first-class pattern matching in C# and of course algebraic data types...

JackMorgan · on May 4, 2014

The correctly generated switch case isn't the point. It's not that it can't be generated the first time, it's that it can't detect the accuracy of the switch case every time there is an edit.

Also, the enum in a switch is the least powerful thing F#'s pattern matching gives you. Consider the second to last example, where I turn three nested switch statements into a single match. Now the compiler checks on the _combination_ of those three for anything missing. That is far more powerful than any static analysis tool I've seen can detect. Think, any time you need polymorphism, instead of reaching for classes and interfaces, you can just have a "combined" match that not only checks that you have all the types, but that the conditional work each type was doing is "all there". That's some powerful stuff.
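
A rough illustration of checking the combination, written in Rust rather than F# and using two small enums instead of the post's three nested switches. `Language`, `Tone`, and `greeting` are hypothetical names made up for this sketch.

```rust
// Hypothetical types for illustration; not taken from the blog post.
#[derive(Clone, Copy)]
enum Language {
    English,
    Spanish,
}

#[derive(Clone, Copy)]
enum Tone {
    Formal,
    Casual,
}

fn greeting(lang: Language, tone: Tone) -> &'static str {
    // One match over the pair replaces a switch nested inside a switch.
    // The compiler checks all 2 x 2 combinations: removing any arm below
    // is rejected at compile time as a non-exhaustive match.
    match (lang, tone) {
        (Language::English, Tone::Formal) => "Good day",
        (Language::English, Tone::Casual) => "Hi",
        (Language::Spanish, Tone::Formal) => "Buenos días",
        (Language::Spanish, Tone::Casual) => "Hola",
    }
}
```

Adding a third `Language` or `Tone` variant would flag every combination this function has not handled yet, which is the property a per-enum switch warning cannot give you.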

What that leads to is the inverse of traditional polymorphism: now the "behavior" for a type can reside where it's needed, all together right where it's used. Not only is this safer, but it's much more likely to be the axis on which it will change.

In my experience, the traditional interface and class polymorphism changes in such a way that I need to edit every class. Usually, it's an extra parameter one class needs, or a different return type. Changes like that mean editing every subclass. Instead, if I used "match" for my polymorphism, those changes would just be all together, next to each other.

What you lose with this inverted polymorphism is a single place to see all the behavior for a single type. We've changed the grain of the code so that seeing the behavior of a single function for all types is easy, but seeing all the functions for a single subclass requires looking in several places. The good news here is _you can choose which style you want for every single function_. You can even have this function be a match on the type in place, while this other lives on the subclass! Now you can choose the best axis based on the likely change pattern. Going to be changing the interface a lot? Make it a match. Going to be only changing the internals of a single subclass a lot? Put it in the interface. And it's easy to change one to the other, so you don't even have to get it right the first time!

That being the case, and seeing how powerful the match is at detecting missing edge cases, I'd suggest always starting with match for polymorphism, then converting to an interface method when needed.
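
The two styles can sit side by side in one program. The sketch below uses Rust, with a trait standing in for the interface; `Shape`, `area`, `Named`, `Circle`, and `Rect` are all hypothetical names chosen for illustration.

```rust
// Hypothetical shapes, invented for this sketch.
enum Shape {
    Circle { radius: f64 },
    Rect { width: f64, height: f64 },
}

// Style 1 (match): the behavior for every type lives in one function.
// Adding a Shape variant breaks this match at compile time.
fn area(shape: &Shape) -> f64 {
    match shape {
        Shape::Circle { radius } => std::f64::consts::PI * radius * radius,
        Shape::Rect { width, height } => width * height,
    }
}

// Style 2 (interface): the behavior lives with each type, behind a trait.
trait Named {
    fn name(&self) -> &'static str;
}

struct Circle;
struct Rect;

impl Named for Circle {
    fn name(&self) -> &'static str {
        "circle"
    }
}

impl Named for Rect {
    fn name(&self) -> &'static str {
        "rect"
    }
}
```

With the match style, changing the signature of `area` touches one function; with the trait style, changing the signature of `name` touches every implementation, but everything about `Circle` stays in one place.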

gordaco · on May 3, 2014

(EDIT: Funny, I made the same mistake as kyrra, thinking it was Java. At least my last line still applies...)

The true solution, that doesn't involve using a new language, is to include the data for the switch inside each element of the enum, instead of treating it like C enums. That forces you to define all you need for "German" and give it to a constructor; otherwise the compiler complains about it. Come on, it's one of the very few things from Java that I miss when I do C++.

By the way, in Spanish it's "cero" not "zero" :).

JackMorgan · on May 3, 2014

Sure, but something still has to link the enum to the concrete class, then conditionally instantiate the concrete class, right? Do you have an example of what you are talking about?

As to cero, yep, whoops!

agency · on May 3, 2014

At least in Java, Enums can implement interfaces. The following is valid:

  public interface ILanguage {
    String convert(int number);
  }

  public enum Language implements ILanguage {
    English {
      // Interface methods are implicitly public, so the override must be too.
      public String convert(int number) {
        switch(number) {
          case 0: return "zero";
          case 1: return "one";
          default: return "...";
        }
      }
    },
    Spanish {
      public String convert(int number) {
        switch(number) {
          case 0: return "cero";
          case 1: return "uno";
          default: return "~~~";
        }
      }
    }
  }

gordaco · on May 3, 2014

Ok, let's see... this is a draft and I don't have a compiler here, so I may write something wrong, but it would be something like this. Assuming that you don't want to simply use a translation table, and you really need different code for each element of the enum, you would first create an interface:

  public interface ILanguage {
    String convert(int num);
  }

Then you would subclass for each language:

  public class SpanishLanguage implements ILanguage {
    public String convert(int num) {
      String ret;
      switch (num) {
        case 0: ret = "cero"; break;
        case 1: ret = "uno"; break;
        default: ret = "~~~";
      }
      return ret;
    }
  }

And the same for English, German or whatever you want. Yes, it gets too verbose, that's why I'd prefer just a translation table... in any case, you can abstract common code for languages that only implement translation for 1, 2 and others, and create more complex subclasses for different behaviours. But I digress...

Now that you have the interface, you would declare the enum as follows:

  public enum Language {
    SPANISH(new SpanishLanguage()), ENGLISH(new EnglishLanguage()), GERMAN(new GermanLanguage());

    // This is an instance member.
    private final ILanguage translationImpl;

    // And this is the one and only constructor.
    // (Enum constructors cannot be declared public in Java.)
    Language(ILanguage translationImpl) {
      this.translationImpl = translationImpl;
    }

    public String convert(int number) {
      return translationImpl.convert(number);
    }
  }

If you do this, you need to supply the implementation of the language to the constructor, otherwise your code fails to compile. So you can't add FRENCH or JAPANESE after GERMAN(new GermanLanguage()); you need FRENCH(new FrenchLanguage()), JAPANESE(new JapaneseLanguage()) and so on. Of course you can do JAPANESE(null), but it's immediately clear that you're doing it wrong.

I'm completely aware that there is a lot of Java-ness (in the bad sense) in this code, so maybe in C# you can do it more elegantly. I haven't touched C# for a good 10 years...

EDIT: formatting.

CyberShadow · on May 3, 2014

D solves this particular problem in a simpler way: it has a `final switch` block, which forces the case set to be complete. It doesn't have arbitrary boolean expressions as seen here, but it does add an interval syntax.

http://dlang.org/statement.html#FinalSwitchStatement

http://dlang.org/statement.html#CaseRangeStatement

pjmlp · on May 3, 2014

First time I used pattern matching was in 1996 with Caml Light, followed by Prolog clauses.

Miss their power ever since. Thankfully FP is finally spreading into the industry.

masklinn · on May 3, 2014

Although completeness checking is a good idea, it is not a feature of pattern matching (ghc still does not enable it by default as of 7.8; conversely both Eclipse and IntelliJ IDEA are able to detect and warn about missing enum cases in switches).

JackMorgan · on May 3, 2014

That is true, but checking all the values of an enum is just the tip of the iceberg. Consider something as sophisticated as the second to last example in the post, where, because of pattern matching, I am able to "flatten" an if and a switch into a single match. Since it is a single expression, the compiler can provide completeness checking on the _combination_ of the two. I am very curious if there are any static analysis tools that can provide that level of safety without pattern matching.

ritonlajoie · on May 3, 2014

I encountered a similar issue at work. How would one do compile-time completeness validation in C++ without using some extra stuff like Boost or the new features of C++ 0xXX?