COBOL to Kotlin via Formal Models (IR and Alloy and Golden Master)

(marcoeg.medium.com)

61 points | by marcoeg 8 days ago

19 comments

  • djoldman 2 days ago

    > 1. Both systems run with the same fixed input files (data/accounts.dat, data/txns.dat).

    > 2. Each writes its results to out/accounts_out_*.dat.

    > 3. Python scripts convert fixed-width output to CSV and compute SHA-256 checksums.

    > 4. If the hashes match — behavior is proven identical.

    Step 3 above introduces the possibility that the Python scripts alter the output in such a way that the raw outputs don't actually match before the conversion.

    I'm curious why step 3 is not "If the two outputs match — behavior is proven identical."
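
    A minimal sketch of the direct comparison suggested above: hash the raw fixed-width files byte for byte, with no CSV conversion in between. The function names and the example file suffixes are hypothetical and not taken from the repo; only the `out/accounts_out_*.dat` pattern comes from the article.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path) -> str:
    """Hash the file's raw bytes exactly as the program wrote them."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def outputs_match(legacy_out: Path, modern_out: Path) -> bool:
    """True when both output files are byte-for-byte identical."""
    return sha256_of_file(legacy_out) == sha256_of_file(modern_out)


def first_difference(legacy_out: Path, modern_out: Path):
    """Return (1-based record number, legacy record, modern record) for the
    first differing record, or None when the files are identical."""
    legacy = legacy_out.read_bytes().splitlines()
    modern = modern_out.read_bytes().splitlines()
    for i in range(max(len(legacy), len(modern))):
        a = legacy[i] if i < len(legacy) else None
        b = modern[i] if i < len(modern) else None
        if a != b:
            return (i + 1, a, b)
    return None


if __name__ == "__main__":
    # Hypothetical file names following the article's out/accounts_out_*.dat pattern.
    cobol_out = Path("out/accounts_out_cobol.dat")
    kotlin_out = Path("out/accounts_out_kotlin.dat")
    if cobol_out.exists() and kotlin_out.exists():
        print("match" if outputs_match(cobol_out, kotlin_out)
              else first_difference(cobol_out, kotlin_out))
```

    Even when the hashes match, this only shows that the two programs agree on this particular pair of input files, not on every possible input.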

  • drob518 2 days ago

    From the article:

    > This enduring reliance exists not out of nostalgia, but necessity: COBOL’s reliability, stability, and the prohibitive cost and risk of replacing decades of deeply integrated logic make it one of the most mission-critical technologies ever built.

    That sentence struck me as odd. Is COBOL any more "reliable" or "stable" than any other language? I'm no COBOL expert, but when I've looked at it and read about how it works, it seems rather verbose and mundane. That's not unexpected; it was developed in a different era with different sensibilities.

    • skissane 2 days ago

      Historically, COBOL lacked dynamic memory allocation: all data structures were fixed size and allocated at program startup. Although COBOL now has the equivalent of malloc/free, its long-time absence encouraged a coding style of using it sparingly, which does make a whole class of bugs less common in COBOL programs.

      • bhawks 2 days ago

        Yes, no dynamic memory allocation; however, there are still many ways to ABEND your COBOL program. The reliability aspect comes from the fact that these systems have been running for 40+ years, and places where it could have ABEND'd have probably been fixed [hopefully].

        • drob518 2 days ago

          Okay, sure, but neither of those things is specific to COBOL. You can write C programs that allocate all memory statically and chase down every core dump over time and have a very reliable C program. Or better yet use Lisp or even Java with GC, if you find C too unsafe.

          • skissane 2 days ago

            Programming languages are a bit like natural languages: they aren’t purely systems of formal rules, they are also usage patterns. There are lots of sentences which are formally correct English but which few English speakers would ever construct: valid syntax and semantics, but stylistically and pragmatically abnormal. In the same way, a programming language is more than just the set of strings accepted by its compiler; it is a culture. Language A may produce (in practice) more reliable code than B, not because of its feature set, but due to the cultural baggage that comes with it. In a broader sense of “language”, that culture is part of the language too.

          • rdc12 2 days ago

            With C in the embedded world it is very common to write entire applications that only ever use static memory and the stack. Sometimes programmers will allow dynamic memory during init only, other times not even then (I tend to favour the never approach, as I can verify that malloc is never called anywhere).

  • bigdatajs 2 days ago

    The problem I have with all COBOL translation models is that they completely ignore the actual modernization of the system. You've traded one type of syntactic sugar for another.

    • agumonkey 2 days ago

      you mean the COBOL 2002+ revisions?

      • mike_hearn 2 days ago

        I think they mean that "COBOL" is often used as a synonym for old mainframe-based software. The language isn't the biggest issue with such systems, usually. Any programmer can learn COBOL; just translating one syntax to another doesn't buy you much. It's also about the hardware the stuff runs on, the database systems, the job schedulers, etc.

        • agumonkey a day ago

          oh right, fair point

  • bhawks 2 days ago

    Having COBOL sources which match what's running in production is a load-bearing assumption :).

  • marcoeg 8 days ago

    I’ve been experimenting with formal, verifiable modernization: taking a small COBOL batch program and translating it through an intermediate representation and an Alloy formal model into Kotlin, while proving equivalence with the legacy output.

    Repo: https://github.com/marcoeg/cobol-modernization-playbook

    Would love feedback from people who’ve worked on reverse engineering or legacy transformations at scale.

    • kvemkon 2 days ago

      > formal, verifiable modernization

      Would it be possible to do the same to modernize a Kotlin program becoming legacy in the future to something even more modern?

    • tshanmu 2 days ago

      how are you creating the IR?

    • tadfisher 2 days ago

      The code is slop, correct?

      All the inputs and outputs are hardcoded. The code doesn't do anything except write hardcoded strings to files. Am I mistaken?

  • dfboyd 2 days ago

    Isn't the first code sample pasted in there twice?

    • Jtsummers 2 days ago

      Yes, starting at:

        STOP RUN.```cobol
      
      Then the code repeats.
  • marcoeg 8 days ago