Translate Fortran to C++ with AI and RAG
43 comments
April 16, 2025 · jedimastert
AndrewGaspar
What you want isn't really "output C++ code that is pedantically equivalent to this Fortran code but with the array indexing fixed up". It's usually more like "translate this Fortran module into something that I can offload to a GPU using CUDA/ROCm/etc. with the same high-level semantics, but GPU-friendly low-level optimizations", and the exact composition of those low-level bits probably doesn't look like a loop-by-loop translation.
pjmlp
CUDA supports Fortran, which is one of the reasons OpenCL lost.
IshKebab
Yeah I've used FORTRAN to C transpilers before too and they worked fine. There were some downsides though like it has to add and subtract 1 everywhere to deal with FORTRAN's 1-based indexing.
In theory AI could do a more idiomatic translation, but I think I would still prefer the janky but correct translation over the looks nice but probably subtly buggy AI one.
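A minimal sketch of that index bookkeeping in C++, assuming a hypothetical `FortranArray` wrapper (not taken from any particular transpiler) that keeps Fortran's 1-based subscripts at the call site and applies the `- 1` offset in exactly one place. The `saxpy` function is likewise an illustrative name for a literal loop translation:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 1-based array wrapper: callers keep Fortran-style subscripts
// 1..n, and the "- 1" offset lives here instead of in every translated loop.
template <typename T>
class FortranArray {
public:
    explicit FortranArray(std::size_t n) : data_(n) {}

    T& operator()(std::size_t i) {
        assert(i >= 1 && i <= data_.size());  // Fortran bounds are 1..n
        return data_[i - 1];
    }
    const T& operator()(std::size_t i) const {
        assert(i >= 1 && i <= data_.size());
        return data_[i - 1];
    }
    std::size_t size() const { return data_.size(); }

private:
    std::vector<T> data_;
};

// Literal translation of:
//   do i = 1, n
//     y(i) = y(i) + a*x(i)
//   end do
inline void saxpy(float a, const FortranArray<float>& x, FortranArray<float>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 1; i <= y.size(); ++i) {
        y(i) = y(i) + a * x(i);
    }
}
```

The alternative the comment describes is rewriting every subscript as `x[i - 1]`; centralizing the offset keeps the translated loop bounds identical to the Fortran source.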
SanjayMehta
I don’t know what the transpiled code would look like vs that rendered by an LLM, but maybe the hope is that latter will be more readable?
int_19h
Yeah, but there's also N% chance that it's silently wrong.
Surac
There is no place for AI or C++ in this game. Just use a Fortran-to-C transpiler. But I get it: anything AI sounds modern, and C++ because of reasons.
nevi-me
Microsoft demoed a version of their GraphRAG that translated C code to (I believe) mostly idiomatic Rust, and it ran without errors.
I tried to find reference to how they did it, does anyone know?
It sounds like this approach of translating old code could help speed up teams that are looking at rewrites. I also have some old code that's in Kotlin that I'd like to move to something else. I had a bad NullPointerException take down my application after missing a breaking change in a Kotlin update.
vlovich123
From the looks of [1] they have a graph DB storing the code structure and acting as the RAG for an LLM.
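The thread doesn't describe the system's internals. As a purely hypothetical C++ sketch of the general idea of using a code graph as retrieval context: translate callees before their callers, and when translating a function, retrieve its direct callees (whose translations then already exist) as context for the prompt. `CallGraph`, `translationOrder`, and `retrieveContext` are illustrative names, not GraphRAG APIs, and the LLM call itself is omitted:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical code graph: each function name maps to the functions it calls.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Post-order DFS: every callee is emitted before its callers. Recursive
// cycles are broken by skipping functions already on the DFS stack.
void visit(const CallGraph& g, const std::string& fn,
           std::set<std::string>& done, std::set<std::string>& onStack,
           std::vector<std::string>& order) {
    if (done.count(fn) || onStack.count(fn)) return;
    onStack.insert(fn);
    auto it = g.find(fn);
    if (it != g.end()) {
        for (const auto& callee : it->second) visit(g, callee, done, onStack, order);
    }
    onStack.erase(fn);
    done.insert(fn);
    order.push_back(fn);
}

// Order in which functions would be handed to the translator.
std::vector<std::string> translationOrder(const CallGraph& g) {
    std::set<std::string> done, onStack;
    std::vector<std::string> order;
    for (const auto& entry : g) visit(g, entry.first, done, onStack, order);
    assert(order.size() == done.size());  // each function emitted exactly once
    return order;
}

// "Retrieval" step in this sketch: the context for a function is simply its
// direct callees, looked up in the graph.
std::vector<std::string> retrieveContext(const CallGraph& g, const std::string& fn) {
    auto it = g.find(fn);
    return it == g.end() ? std::vector<std::string>{} : it->second;
}
```

A real system would store far richer structure (types, globals, module boundaries) in the graph; this only shows why a graph is a natural retrieval index for code.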
vrighter
I do not believe it ran without errors on all cases.
npalli
[flagged]
nevi-me
I mentioned neither of the things you're shouting at me about.
> I also have some old code that's in Kotlin that I'd like to move to something else.
Something else was left unmentioned because I'm not even talking about Rust. My reference to it was that I've seen the approach before where a RAG is used to aid in translating C code, and it's an interesting approach that, with more examples, might become more accessible to non-experts like me.
Translating languages is of great interest to various communities. I have friends stuck with a Scala codebase written by geniuses who are no longer around, and they want to move it to something else that the team is comfortable with.
npalli
My bad. Sorry. Didn't read the full comment and you didn't deserve the accusations from me.
vlovich123
Or he’s just mentioning the other major transpiler he’s heard of recently that happens to be C to Rust and wondering how it works and if it could be adapted to other language pairs. You’re the one that’s taken the conversation in a super weird direction all by your lonesome.
musicale
Good idea. I'd much rather write
do concurrent (i = 1:n)
y(i) = y(i) + a*x(i)
enddo
and then let a compiler translate it into
std::transform(std::execution::par, x, x + n, y, y,
    [=](float x, float y) { return y + a * x; });
if C++ is required for some reason.
jll29
As a slight tangent, a rewrite in another language is also an opportunity for the human engineer to redesign parts of the software that were clunky before, or to adopt the idioms of the new target language.
Using automatic tools, whether AI-based or transpilers, leaves that opportunity unused, and both approaches are likely to create some additional technical debt (errors in translation, odd non-idiomatic ways of doing things introduced by the automation, etc.).
WalterBright
A member of our community accidentally discovered that the D compiler could translate C code to D code.
D has a module system, where you import a module and it pulls the global declarations out of it. To speed up this process, there's a compiler switch to output just the global declarations to another file, a .di file, that functions much like a .h file in C.
Then there came along ImportC, where a C lexer/parser was welded onto the D compiler logic.
aaaand it wasn't long before the switch was thrown to generate a .di file, and voila, the C code got translated to D!
This resulted in making it dramatically easier to use existing C headers and source files with D.
npalli
This is a great effort; I wonder how it compares to Fortran2Cpp.
pankajdoharey
LLMs as translators of COBOL code to Java or Go should be attempted, and used to shut down the IBM mainframe rent-seeking business for good.
creatonez
No, not for the foreseeable future. In fact, this is the absolute hardest possible code translation task you can give an LLM.
COBOL varies greatly; the dialect depends on the mainframe. Chatbots will get quite confused about this. AI training data doesn't have much true COBOL; the internet is polluted with GnuCOBOL, which is a mishmash of a bunch of different dialects, minus all the things that make a mainframe a mainframe. So it will assume the COBOL code is more modern than it is. In terms of generating COBOL (e.g. for adding some debugging code to an existing system to analyze its behavior), it won't be able to stay within the 80-column limit due to tokenization, and the output will just be riddled with syntax errors.
Data matters, and mainframes have a rather specific way they store and retrieve data. Just operating the mainframe to get the data out of an old system and into a new database in a workable & well-architected format will be its own chore.
Finally, the reason these systems haven't been ported is because requirements for how the system needs to work are tight. The COBOL holdouts are exclusively financial, government, and healthcare -- no one else is stuck on old mainframes for any other reason. The new system to replace it needs to exactly match the behavior of the old system, the developer has to know how to figure out the exact confines of the laws and regulations or they are not qualified to do the task of porting it. All an LLM will do is hallucinate a new set of requirements and ignore the old ones. And aside from just knowing the requirements on paper, you'd need to spend a good chunk of time just checking what the existing system is even doing, because there will be plenty of surprises in such an old system.
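On the 80-column point, generated fixed-format COBOL can at least be checked mechanically before it reaches a compiler. A hypothetical C++ sketch, assuming the classic fixed reference format (program text in columns 8-72, columns 73-80 an identification area the compiler ignores); `checkFixedFormat` and `ColumnIssue` are illustrative names, and real dialects and free-format sources differ:

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One finding per offending source line.
struct ColumnIssue {
    std::size_t line;     // 1-based line number
    std::string problem;  // human-readable description
};

// Flags lines longer than an 80-column record, and lines whose text spills
// into columns 73-80, where it would be silently ignored rather than
// reported as a syntax error.
std::vector<ColumnIssue> checkFixedFormat(const std::string& source) {
    std::vector<ColumnIssue> issues;
    std::istringstream in(source);
    std::string text;
    std::size_t lineNo = 0;
    while (std::getline(in, text)) {
        ++lineNo;
        if (text.size() > 80) {
            issues.push_back({lineNo, "longer than 80 columns"});
        } else if (text.size() > 72 &&
                   text.find_first_not_of(' ', 72) != std::string::npos) {
            issues.push_back({lineNo, "text in columns 73-80 is ignored"});
        }
    }
    return issues;
}
```

A check like this only catches layout mistakes; it says nothing about whether the generated code matches the dialect or the business rules.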
jabl
The soon-to-be-released GCC 15 will contain a COBOL frontend. Other non-mainframe compilers, both proprietary and FOSS, have also existed for a long time.
Thus, availability of a compiler is but a small piece of the puzzle. The real problem is the spider web of dependencies on the mainframe environment, as enterprises' business processes have been intertwined with the mainframe system over decades.
pankajdoharey
Which is why I think cross-compiling to other dependencies and porting to other languages is a better solution. Many of these dependencies could be hardware-specific. If the core business logic could be ported, it would be a win for everyone stuck in decades of vendor lock-in.
alexchamberlain
I think the point was that you could do that in COBOL; the vendor lock-in won't go away just because you change language. It goes away when you decide to refactor the code toward vendor-agnostic solutions.
pjmlp
There have been COBOL compilers targeting the JVM and .NET for as long as those technologies have existed.
There are also modern compilers for IBM mainframes, including Go, C++, Java, PHP, ...
Also, outside the DevOps and CNCF application space, very few people bother with Go, especially not the kind of customers that buy IBM mainframes.
KaiserPro
Besides, COBOL is only part of the reason for running on a mainframe. The other part is the orchestration and "resilience" of the mainframe platform.
You can run COBOL on x86; there are at least two compilers.
pankajdoharey
Resilience == redundancy, and it has been successfully replicated by almost every organisation without mainframes. M-MANGA (Meta, Microsoft, Apple, Netflix, Google, Amazon) infrastructure is quite resilient.
almostgotcaught
What is the point of this? Fortran is both faster than cpp and easier to write than cpp. It's also by no means a dead or dying or whatever language. Smells like literally "your scientists were so busy they forgot to ask why".
jandrewrogers
Seems pretty obvious to me, and I’ve written my fair share of both Fortran and C++. I think it is mostly that very few people know Fortran anymore and even fewer people want to maintain it. A vast number of people in 2025 will happily work in C++ and are skilled at it.
Fortran also hasn’t been faster than C++ for a very long time. This was demonstrable even back when I worked in HPC, and Fortran can be quite a bit worse for some useful types of performance engineering. The only reason we continued to use it was that a lot of legacy numerics code was written in it. Almost all new code was written in C++ because it was easier to maintain. I actually worked in Fortran before I worked in HPC; it was already dying in HPC by the time I got there. Nothing has changed in the interim. If anything, C++ is a much stronger language today than it was back then.
pjmlp
Fortran is still quite modern despite its age, and relevant enough that the availability of Fortran in the CUDA SDK was one of CUDA's success factors, and the LLVM project also started a Fortran frontend project.
Also, it seems to me that people who enjoy Fortran in HPC are more likely to switch to Chapel than to C++.
walleeee
> Almost all new code was written in C++ because it was easier to maintain.
What makes you say so? See musicale's comment above. I have a hard time seeing C++ as easier to maintain, if we are just talking about the language. The ecosystem is a different story.
jandrewrogers
For pure number crunching, Fortran is nicer. Unfortunately, performance for most codes became about optimizing memory bandwidth utilization at some point in the 2000s, and systems languages are much more effective at expressing that in a sane way. It was easier to make C/C++ do numerics code than to make Fortran do systems code. Some popular HPC workloads were also quite light on numerics code generally, being more about parallelization of data processing.
This was before modern C++ existed, so most of the code was “C with classes” style C++. If you can read C then you can read that code. I don’t consider that to be particularly maintainable by modern standards but neither is Fortran.
Modern C++ dialects, on the other hand, are much more maintainable than either. Better results with a fraction of the code. The article doesn’t say but I would expect at least idiomatic C++11, if not later dialects.
jabl
Some people at LANL seem to be on a holy crusade to replace Fortran with C++. They occasionally produce stuff like papers saying Fortran is dying and whatever. Perhaps it makes sense for their in-house applications and libraries, but one shouldn't read too much into it outside their own bubble.
jeffbee
I wonder if they feel that the toolchains are just rotting.
mkoubaa
This. If someone can't correctly articulate the advantages of Fortran they shouldn't be migrating away from it. This is not to say that migrations should never happen.
greenavocado
Chesterton's Fence
I'm trying to think of a reason this couldn't be done more directly with a pretty run-of-the-mill transpiler. Like I understand if this is a technical demo and there is a LOT of Fortran code, but...?
I've actually had to do this with a couple of different Fortran projects when I was in college, I translated them to C for various reasons.
Maybe it's because it was specifically code written by scientists (i.e. somewhat brute-force and very straightforward), but there really weren't very many features that I can recall that didn't have a direct C counterpart, other than column-major ordering and arrays starting at 1.
Was I just blissfully unaware?
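For the two differences mentioned, a minimal hypothetical C++ sketch (`FortranMatrix` and `fillSequential` are illustrative names, not from any library) that keeps 1-based subscripts and column-major storage, so `a(i, j)` and `a(i + 1, j)` are adjacent in memory as they are in Fortran:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2-D array with Fortran semantics: subscripts start at 1 and
// storage is column-major (the first index varies fastest in memory).
class FortranMatrix {
public:
    FortranMatrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols) {}

    double& operator()(std::size_t i, std::size_t j) {
        assert(i >= 1 && i <= rows_ && j >= 1 && j <= cols_);
        return data_[(i - 1) + (j - 1) * rows_];  // column-major offset
    }
    double operator()(std::size_t i, std::size_t j) const {
        assert(i >= 1 && i <= rows_ && j >= 1 && j <= cols_);
        return data_[(i - 1) + (j - 1) * rows_];
    }
    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }
    const double* data() const { return data_.data(); }

private:
    std::size_t rows_, cols_;
    std::vector<double> data_;
};

// Traverses in Fortran's natural order: the inner loop runs over i (rows),
// so consecutive iterations write consecutive memory locations.
inline void fillSequential(FortranMatrix& a) {
    double v = 0.0;
    for (std::size_t j = 1; j <= a.cols(); ++j)
        for (std::size_t i = 1; i <= a.rows(); ++i)
            a(i, j) = v++;
}
```

A straight row-major C translation has to either transpose the data or swap the loop nest to keep the same memory access pattern; keeping column-major storage avoids both, at the cost of C code that indexes differently from ordinary 2-D C arrays.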