• thevoidzero@lemmy.world
    2 hours ago

    I thought most sane, modern languages use the Unicode block classification to determine whether something can be used in a valid identifier or not. For example, all the ‘numeric’ Unicode characters can’t be at the beginning of an identifier, similar to how you can’t have ‘3var’.

    So once your programming language supports Unicode, it will automatically support any language whose characters fall in those blocks.
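
    As an illustration (my own example, not from the thread): Python already works this way. PEP 3131 defines identifiers via the Unicode XID_Start / XID_Continue properties, so non-Latin scripts are accepted without any per-language support. A minimal check using only the standard library:

    ```python
    # Hypothetical identifier names, used purely for illustration.
    for name in ["var3", "3var", "ตัวแปร", "переменная", "変数"]:
        print(f"{name!r}: valid identifier = {name.isidentifier()}")
    # 'var3' and the non-Latin names are valid; '3var' is rejected
    # because a digit cannot start an identifier in any script.
    ```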

      • toastal@lemmy.ml
        32 minutes ago

        OCaml’s old m17n compiler plugin solved this by requiring you to pick one block per ‘word’, and you can only switch to another block if the words are separated by an underscore. As such you can write print_แมว but not pℝint_c∀t. This is a totally reasonable solution.
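
        A rough sketch of how that ‘one block per word’ rule could be checked (my own approximation, not the actual m17n implementation; the block table is a hand-picked subset, since Python’s standard library has no block-lookup function):

        ```python
        # Hand-picked subset of Unicode blocks, for illustration only.
        BLOCKS = {
            "Basic Latin": (0x0000, 0x007F),
            "Thai": (0x0E00, 0x0E7F),
            "Letterlike Symbols": (0x2100, 0x214F),      # contains ℝ
            "Mathematical Operators": (0x2200, 0x22FF),  # contains ∀
        }

        def block_of(ch):
            cp = ord(ch)
            for name, (lo, hi) in BLOCKS.items():
                if lo <= cp <= hi:
                    return name
            return None

        def one_block_per_word(identifier):
            # Each underscore-separated 'word' must stay within a single block.
            return all(len({block_of(ch) for ch in word}) <= 1
                       for word in identifier.split("_"))

        print(one_block_per_word("print_แมว"))  # True: Latin word, then Thai word
        print(one_block_per_word("pℝint_c∀t"))  # False: blocks mixed within a word
        ```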

      • thevoidzero@lemmy.world
        1 hour ago

        Sorry, I forgot about this. I meant to say that any sane modern language that allows Unicode should use the block specifications (e.g. to determine the alphabetic, numeric, symbol, and alphanumeric code points) to apply rules similar to the ASCII ones, so that they don’t have to support each language individually.
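
        A small sketch of what consulting those specifications looks like in practice (my own example, using Python’s unicodedata module; the ‘general categories’ it returns, like Nd for decimal digits or Ll/Lu/Lo for letters, are the language-independent classification a compiler would consult):

        ```python
        import unicodedata

        # The category says what kind of character this is,
        # regardless of which script it comes from.
        for ch in ["a", "З", "แ", "٣", "3", "∀"]:
            print(f"{ch!r}: category = {unicodedata.category(ch)}")
        # 'a' -> Ll, 'З' -> Lu, 'แ' -> Lo, '٣' -> Nd, '3' -> Nd, '∀' -> Sm

        # A toy rule in the ASCII spirit of rejecting '3var':
        # an identifier may not start with a decimal digit in any script.
        def can_start_identifier(ch):
            return unicodedata.category(ch) != "Nd"
        ```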

        • NeatNit@discuss.tchncs.de
          1 hour ago

          Oh, that I agree with. But then there’s the mess of Unicode updates: if you’re using an old compiler that was built against an old version of Unicode, it might not recognize every character you use…
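
          One concrete consequence (my own illustration): in Python, identifier checks and category lookups are only as current as the Unicode database compiled into the interpreter, which you can inspect:

          ```python
          import unicodedata

          # Unicode version this interpreter build ships with, e.g. "15.0.0".
          print(unicodedata.unidata_version)

          # A character assigned in a newer Unicode version than this is
          # unknown here: unicodedata.name() raises ValueError for it, its
          # category comes back as Cn (unassigned), and str.isidentifier()
          # rejects identifiers containing it.
          ```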