https://xcancel.com/charliermarsh/status/1884651482009477368

We’re building a new static type checker for Python, from scratch, in Rust.

From a technical perspective, it’s probably our most ambitious project yet. We’re about 800 PRs deep!

Like Ruff and uv, there will be a significant focus on performance.

The entire system is designed to be highly incremental so that it can eventually power a language server (e.g., only re-analyze affected files on code change).
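
As a rough illustration of what "only re-analyze affected files" can mean, here is a minimal sketch that walks an import graph backwards from a changed file. It is not the project's actual design: the file names, the `imports` mapping, and the `affected_files` helper are all hypothetical.

```python
from __future__ import annotations

from collections import defaultdict, deque


def affected_files(imports: dict[str, set[str]], changed: str) -> set[str]:
    """Return `changed` plus every file that imports it, directly or transitively."""
    # Invert the import graph: for each file, record which files import it.
    importers: dict[str, set[str]] = defaultdict(set)
    for file, deps in imports.items():
        for dep in deps:
            importers[dep].add(file)

    # Breadth-first walk over the reversed edges, starting at the changed file.
    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for importer in importers[current]:
            if importer not in seen:
                seen.add(importer)
                queue.append(importer)
    return seen


# Hypothetical project layout (assumption for illustration only).
imports = {
    "app.py": {"models.py", "utils.py"},
    "models.py": {"utils.py"},
    "utils.py": set(),
    "cli.py": set(),
}

print(sorted(affected_files(imports, "models.py")))
```

In this sketch, editing `models.py` marks only `models.py` and `app.py` for re-analysis, while `cli.py` is left alone.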

Performance is just one of many goals, though.

For example: we’re investing heavily in strong theoretical foundations and a consistent model of Python’s typing semantics.

(We’re lucky to have @carljm and @AlexWaygood on the team for many reasons; this is one of them.)

Another goal: minimizing false positives, especially on untyped code, to make it easier for projects to adopt a type checker and expand coverage gradually over time, without being swamped in bogus type errors from the start.

We haven’t publicized it to date, but all of this work has been happening in the open, in the Ruff repository.

All driven by a uniquely great team: @carljm, @AlexWaygood, @sharkdp86, @MichaReiser, @DhruvManilawala, @ibraheemdev, @dcreager.

I’m learning so much from them.

Warning: this project is not ready for real-world user testing, and certainly not for production use (yet). The core architecture is there, but we’re still lacking support for some critical features.

Right now, I’d only recommend trying it out if you’re looking to contribute.

For now, we’re working towards an initial alpha release. When it’s ready, I’ll make sure you know :)

    • logging_strict@programming.dev · 1 upvote · 3 hours ago (edited)

      I use stub files and mypy, but have concerns about behavior.

      I thought the point was to move the static type checking stuff into a separate file. This makes the code much easier to read.

      1. When a particular stub file goes out of date and its contents no longer reflect what’s going on in the code, there is no warning.

      2. Inner functions are ignored.

      3. A function’s contents are ignored.

      I’m reluctant to use a library running node (gh actions aside) or Rust. My opinion is that speed and correctness are insufficient arguments to introduce another tech stack. If something breaks, suddenly the onus is on me to understand why. That’s complicated if the additional tech stack is in a coding language I’m unfamiliar with.

      This takes out: ruff, uv, and pyright. And whatever else comes out.

      I have seven published Python packages.

      Trying to be open-minded. Please lay out other arguments for why I should be open to using other tech stacks.

  • UFO@programming.dev · 4 upvotes · 24 hours ago

    I don’t think a powerful type system can be added to python effectively. Even more convinced of this after reading “minimize false positives”.

    Otoh, how strong of a type system is required for effective development? Probably what can be shoehorned into python tbh.

    • RecallMadness@lemmy.nz · 2 upvotes · 3 hours ago (edited)

      In my experience, it’s a manageable trade off.

      You allow for Python “magic” at the cost of type safety. Or you forgo magic for types, and the resiliency that comes with it.

      Day to day, you don’t need magic. With good application of hinting you can stop many bugs before they appear.

      When you do need magic, you can usually construct it to work within the type system, or at the very least easily ringfence the tainted typeless code the magic introduced.

      The sync/async contradiction is much worse to wrangle.

    • FizzyOrange@programming.dev · 4 upvotes · 18 hours ago

      How powerful do you want it? Python’s type system is actually pretty good already and relatively sound when checked with Pyright (not Mypy though).

      It’s not Typescript-level, but it’s better than e.g. Java or C++.

      The main problem is Python developers writing code that can’t be statically type checked, e.g. using magically generated method names via __dict__ or whatever (I think lxml does that).
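
To make the "magically generated method names" point concrete, here is a minimal sketch of the pattern. The `DynamicClient` class and its endpoint names are hypothetical and are not taken from lxml or any other real library.

```python
from __future__ import annotations


class DynamicClient:
    """Hypothetical client whose methods are created at runtime."""

    def __init__(self, endpoints: list[str]) -> None:
        for name in endpoints:
            # Each method exists only as an entry in the instance __dict__;
            # no `def get_users` ever appears in the source code.
            self.__dict__[f"get_{name}"] = self._make_getter(name)

    @staticmethod
    def _make_getter(name: str):
        def getter() -> str:
            return f"GET /{name}"

        return getter


client = DynamicClient(["users", "orders"])
# Works at runtime, but nothing in the class body declares `get_users`,
# so a static checker has no attribute to look up here.
result = client.get_users()
print(result)
```

Because the attribute names are built from strings, a checker reading the class definition generally cannot know which methods exist, which is why code in this style tends to be reported as an error or fall back to untyped access.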

      • UFO@programming.dev · 1 upvote · 10 hours ago

        I’m on the more powerful the better side. So for me, Scala is the weakest type system I like working with.

        Practically tho: aside from the issues you mention, the type checker for python would be a great aid for a broader range of developers than myself!

    • vrighter@discuss.tchncs.de · 1 upvote · 24 hours ago

      I don’t believe it’s possible either. For example, the tree walker of the ast module takes the node passed to it, checks its type, gets its name, then looks for a method with that dynamically looked-up name in your implementation of the tree walker. If that method exists (the user might not have implemented a visit method for that type of node), it calls it and passes the node to it. All of this happens at runtime.
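
The lookup described above can be sketched as follows. `ast.NodeVisitor`, `generic_visit`, and `ast.parse` are real standard-library names; the `dispatch` helper and the `FunctionNameCollector` class are written here only for illustration and approximate what `NodeVisitor.visit` does.

```python
import ast


class FunctionNameCollector(ast.NodeVisitor):
    def __init__(self) -> None:
        self.names = []

    # Found at runtime only because the node's class is named "FunctionDef".
    def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
        self.names.append(node.name)
        self.generic_visit(node)  # keep walking into nested functions


def dispatch(visitor: ast.NodeVisitor, node: ast.AST):
    """Approximation of NodeVisitor.visit: build a method name, then look it up."""
    method_name = "visit_" + type(node).__name__
    # Fall back to generic_visit when the user defined no matching method.
    method = getattr(visitor, method_name, visitor.generic_visit)
    return method(node)


source = "def outer():\n    def inner():\n        pass\n"
collector = FunctionNameCollector()
dispatch(collector, ast.parse(source))
print(collector.names)
```

The method name is assembled from a string and resolved with `getattr`, so the link between a node class and its handler is only established while the program runs.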

  • onlinepersona@programming.dev · English · 14 upvotes, 3 downvotes · 2 days ago (edited)

    I had a quick look and am already afraid that they are redoing what RustPython (parses python) is doing in order to build their type checker.

    After looking at it longer, yep, they forked RustPython. I’m not going to go through the history to find out why, but my first impression is that it’s a shame. Now two projects will be doing very similar work, IINM. However, it’s about time mypy had competition. It works fine for many cases, but sometimes just is a very frustrating experience.

    Anti Commercial-AI license

    • solrize@lemmy.world · 5 upvotes, 2 downvotes · 2 days ago

      I don’t like the current landscape of python type checkers.

      I figure that Python itself is at the bottom of this. It simply wasn’t designed for static types. Mypy is still of some use but if you want a statically typed language, trying to graft a type system onto a unityped language hasn’t worked out well as far as I know. See also: the Erlang dialyzer, Typed Racket, and whatever that Clojure extension is called. Even Scala has its problems because the JVM has its own type system that isn’t that great a fit for Scala.

      Also, why Rust as the implementation language? Just for speed? It seems a shame to not use Python/PyPy.

      • esa@discuss.tchncs.de · 5 upvotes · 1 day ago

        Astral is already a Rust shop; uv and ruff are written in Rust, and it makes sense for them to expand on what’s already considered very successful.

        Rust can enable a lot of speed and “fearless concurrency”; it also has a pretty good type system and a focus on correctness. They’d rather be correct than fast (C made the other choice, but is also from another age), but also show that that extra correctness comes with little runtime speed cost (compilation is another story).

      • Kogasa@programming.dev · 3 upvotes · 1 day ago

        Yes, speed and the benefits of all the tooling and static analysis they’re bringing to Python. Python is great for many things but “analyzing Python” isn’t necessarily one of them.

      • Ephera@lemmy.ml · English · 2 upvotes · 1 day ago

        I mean, we’ll probably disagree on this, but in my not so humble opinion, Python is very unsuited for this large of a project, whereas Rust excels at large projects. I imagine these folks might have a similar opinion, given that they’re building this tool in the first place. 🙃

        But execution speed is also not something I’d ignore in a tool like that. I remember having to work with Pipenv and Poetry, and it was just cruel, having to wait more than a minute for it to tell you whether it can resolve dependencies for a fairly small project. And you’ll want to run a type checker a lot more often than that.

  • alsimoneau@lemmy.ca · 1 upvote, 3 downvotes · 1 day ago

    I never understood the need for a Python type checker. If I wanted static typing I would code in Fortran.

    • FizzyOrange@programming.dev · 2 upvotes · 13 hours ago

      Just in case that’s a genuine question, the reasons people like static types are:

      1. Catch more bugs.
      2. Catch bugs earlier (while you are writing the code). This is sometimes called “shift left”.
      3. Fewer tests needed.
      4. Code is easier to understand.
      5. Code is easier to navigate.
      6. Refactoring is much easier.
      7. Development speed is faster (due to the above points).

      Often people say it slows development down but it’s actually the opposite. Especially for large projects or ones involving multiple people.

      The only downside really is that sometimes the types can get more complicated than they’re worth, but in that case you have an escape hatch via the Any type.
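
A minimal sketch of that escape hatch, using `typing.Any` on a loosely structured dictionary. The `order` data and both function names are hypothetical.

```python
from __future__ import annotations

from typing import Any


def total_price(order: dict[str, Any]) -> float:
    # The values are Any, so a checker accepts the indexing and multiplication
    # without question; a mistake on this line would only surface at runtime.
    return order["quantity"] * order["unit_price"]


def describe(order: dict[str, Any]) -> str:
    # Callers of this function see a plain `str`, not Any.
    return f"{order['quantity']} x {order['unit_price']:.2f} = {total_price(order):.2f}"


order = {"quantity": 3, "unit_price": 2.5}
print(describe(order))
```

Keeping `Any` inside small helpers with precise return annotations limits how far the unchecked values spread into the rest of the code.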

      • alsimoneau@lemmy.ca · 1 upvote · 8 hours ago

        Thanks for the answer. It is a genuine question.

        But don’t you lose polymorphism? It seems like a big trade-off. For context, I’m a scientist doing data analysis and modeling, so my viewpoint is potentially significantly different from most of “the industry”.

        Your points 1-3 are handled by running the code and reading the error messages, if any. For 4-5, “ugly” code will be unreadable whether it’s typed or not. For 6, refactoring now requires changing the types everywhere, which I imagine could be error-prone and increase code inertia. And for 7, it would definitely slow down development until you get familiar with the libraries and have tooling to automate stuff.

        I can understand the appeal for enterprise code but that kind of project seems doomed to go against the Zen of Python anyways, so it’s probably not the best language for that.

        • FizzyOrange@programming.dev · 1 upvote · 3 hours ago

          But don’t you lose polymorphism?

          No. You’ll have to be more specific about what kind of polymorphism you mean (it’s an overloaded term), but you can have type unions, like int | str.
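
As a small sketch of how typed Python can stay polymorphic, the block below shows a union parameter and, as one further option not mentioned above, a structural `typing.Protocol`. All function and class names are hypothetical.

```python
from __future__ import annotations

from typing import Protocol


def normalize_id(value: int | str) -> int:
    """Accept an ID as either an int or a numeric string."""
    if isinstance(value, str):
        # In this branch a checker narrows `value` to str.
        return int(value.strip())
    # Only int remains here.
    return value


class HasArea(Protocol):
    # Any class with a matching `area` method satisfies this, no inheritance needed.
    def area(self) -> float: ...


class Square:
    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side ** 2


def total_area(shapes: list[HasArea]) -> float:
    return sum(shape.area() for shape in shapes)


print(normalize_id(" 42 "), normalize_id(7))
print(total_area([Square(2.0), Square(3.0)]))
```

`Square` never mentions `HasArea`, so this is still duck typing; the annotation only writes down which methods the function relies on.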

          Your points 1-3 are handled by running the code and reading the error messages, if any

          Not unless you have ridiculously exhaustive tests, which you definitely don’t. And running tests is still slower than your editor telling you of your mistake immediately.

          I probably didn’t explain 4-6 well enough if you haven’t actually ever used static types.

          They make it easier to navigate because your IDE now understands your code and you can do things like “find all references”, and “go to definition”. With static types you can e.g. ctrl-click on mystruct.myfield and it will go straight to the definition of myfield.

          They make the code easier to understand because knowing the types of variables tells you a lot of information about what they are and how to use them. You’ll often see in untyped code people add comments saying what type things are anyway.

          Refactoring is easier because your IDE understands your code, so you can do things like renaming variables and moving code and it will update all the things it needs to correctly. Refactoring is also one of those areas where it tends to catch a lot of mistakes. E.g. if you change the type of something or the parameters of a function, it’s very easy to miss one place where it was used.

          I don’t think “you need to learn it” really counts as slowing down development. It’s not that hard anyway.

          I can understand the appeal for enterprise code but that kind of project seems doomed to go against the Zen of Python anyways, so it’s probably not the best language for that.

          It’s probably best not to use Python for anything, but here we are.

          I will grant that data science is probably one of the very few areas where you may not want to bother, since I would imagine most of your code is run exactly once. So that might explain why you don’t see it as worthwhile. For code that is long-lived it is very very obviously worth it.