Wednesday 13 March 2013

Mypy Development Update #2

This post is about the latest developments in the mypy project. A lot has happened since the last update in December, even though my family was hit hard by the flu this winter and I lost perhaps two weeks of development time.

Latest Changes

  • There are many new type checker and Python back end features. Here are some of the more important ones:
    • Added package support (modules with names of the form foo.bar). The mypy implementation now uses packages. As a side effect, the mypy driver is now named driver.py instead of mypy.py!
    • A module can be run as a script using the '-m' driver command line argument.
    • Arbitrary statements and references to class attributes are supported in the class body.
    • Added support for nested functions and classes.
    • Import statements can be used anywhere, not just at the top level of a file.
    • Added support for function decorators. I will write about using statically typed function decorators in another post.
    • Added support for 'with' statements. Also updated various library classes to support the with statement.
    • Special attributes such as __name__, __doc__ and __dict__ are supported.
    • Implemented chained assignments such as x = y = z.
    • Type checking of boolean operator expressions with non-boolean values works as expected (for example, s = s or 'x').
    • Set literals {a, b, ...} work.
    • Various minor conveniences now work, including raise without an argument and multiple exception types in a single except block.
  • I have adapted several Python standard library modules to static typing. Currently there are around 3000 non-empty, non-comment lines of adapted code plus around 7000 lines of related unit tests. I will write a separate blog post about my experiences, but generally the process was fairly smooth, with the notable exception of the dozens of mypy bugs and missing features that I encountered along the way. However, fixing these has improved the compiler front end tremendously, and I'm going to continue with more Python modules. An interesting result is that the type checker helped find several bugs in the CPython 3.2 standard library. This was somewhat unexpected. When I started the mypy project, I primarily wanted to improve programmer productivity and runtime efficiency. I wasn't really expecting to find bugs in debugged and tested code, so this is a very welcome result. Here's a link to the adapted code. Note that some unit tests still fail; there's more fixing to do.
  • There are several new library stubs, including socket and time (thanks to Ron Murawski) and shlex. There are also dozens of fixes and additions to existing stubs, and several new partial stubs.
  • Much of the work since the last update has targeted the C back end. Development has progressed well, though it will still take a lot of effort before the back end is usable for real programs. The biggest implementation changes are not user-visible and so far only affect the first stages of the back end. However, this groundwork will make future progress much faster.
  • We've started to use the GitHub issue tracker more actively. There are now 165 issues in total; 85 issues have been closed. Many changes and fixes still have no associated issue, though. As the project grows in size and complexity, the issue tracker will become more central to development.
  • Ashley Hewson started working on an automatic library stub generator. It could make it easy for mypy programs to access many Python modules.
  • I've written more about the compiler internals in the wiki.
  • Several potential new features have been described in the wiki, including immutability and fixed-width integer types.
  • Several bugs have been fixed. Credit goes to me and Ashley.
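To make some of the language features listed above more concrete, here is a small sketch in ordinary Python 3 that exercises several of them: chained assignment, s = s or 'x', set literals, a with statement, nested functions, statements in a class body, multiple exception types in one except block, and raise without an argument. It uses present-day Python annotation syntax rather than mypy's own syntax, and all the function and class names are hypothetical examples, not code from the mypy repository.

```python
import io
from typing import Callable, Set, Tuple


def label_or_default(s: str) -> str:
    # 'or' with non-boolean operands evaluates to s if s is non-empty,
    # otherwise to 'x'; either way the result is a str.
    s = s or 'x'
    return s


def parse_port(text: str) -> int:
    try:
        return int(text)
    except (ValueError, TypeError):  # several exception types in one except block
        # A raise without an argument re-raises the exception being handled.
        raise


def make_counter() -> Callable[[], int]:
    count = 0

    # Nested function that updates a variable of the enclosing function.
    def increment() -> int:
        nonlocal count
        count += 1
        return count

    return increment


class ServerConfig:
    DEFAULT_PORT = 8080
    # Arbitrary statements in a class body can refer to earlier class attributes.
    if DEFAULT_PORT < 1024:
        NEEDS_ROOT = True
    else:
        NEEDS_ROOT = False


def demo() -> Tuple[int, int, Set[int], str]:
    # Chained assignment binds both names to the same value.
    a = b = 0
    # Set literal.
    seen = {1, 2, 3}
    # 'with' statement; the in-memory stream is closed when the block exits.
    with io.StringIO('hello world') as f:
        text = f.read()
    return a, b, seen, text
```

A type checker has to infer, for example, that label_or_default always returns a str even though the or expression mixes a possibly empty value with a literal fallback.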

Next Steps

Mypy development focus will shift more and more to the C back end (though porting more Python standard library code is important as well). There is still a lot of work to do before we can run interesting programs. For example, the current implementation has no garbage collector. The efficiency of the garbage collector is very important for mypy, as typical programs construct millions of objects. Instead of developing a garbage collector from scratch, I'm going to port the gc from the Alore VM. It's fast, it has worked well for me, and it supports multithreading. However, it does not support parallel collection yet, which is a minor downside.

The next minor C back end milestone is being able to run the well-known Pystone benchmark. Another small but important milestone is being able to run unit tests using the native back end. This will speed up development.

The major long-term goal is to support a baseline compiler for a good subset of mypy and some standard library functionality, and to support self-compilation. This will significantly speed up translation and compilation. I will concentrate on adding language features and making the compiler stable before turning to more complex optimizations. Even so, the baseline compiler will give a good speedup over CPython due to static typing, optimised semantics and native code compilation, and it will also provide more powerful runtime type checking.

As before, there will also be incremental improvements to the compiler front end (parser and type checker) and the Python back end. The highest-priority features include properties, static and class methods and named tuples.
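For readers less familiar with these Python features, the sketch below shows each of them in ordinary Python 3: a property, a static method, a class method used as an alternative constructor, and a named tuple. The class and field names are made up for illustration; this is the kind of code these planned features would let mypy type check, not code taken from mypy itself.

```python
from collections import namedtuple

# A named tuple: a tuple subclass whose fields are also accessible by name.
Point = namedtuple('Point', ['x', 'y'])


class Temperature:
    def __init__(self, celsius: float) -> None:
        self._celsius = celsius

    @property
    def fahrenheit(self) -> float:
        # Computed, read-only attribute accessed without parentheses.
        return self._celsius * 9 / 5 + 32

    @staticmethod
    def is_valid(celsius: float) -> bool:
        # Receives neither self nor cls; a plain function in the class namespace.
        return celsius >= -273.15

    @classmethod
    def from_fahrenheit(cls, f: float) -> 'Temperature':
        # Alternative constructor; cls is the class the method was called on.
        return cls((f - 32) * 5 / 9)
```

Each construct needs special handling in a type checker: a property is accessed like an attribute but defined like a method, a static method takes no implicit first argument, a class method's first argument is the class rather than an instance, and a named tuple's fields must be known by both position and name.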