Notebooks as reusable Python programs
39 comments
March 19, 2025
abdullahkhalids
cantdutchthis
(someone from the marimo team here)
The `export` command can generate a rendered artifact if that's what you're after, but there is also another avenue here: have you seen the caching feature? The one that caches to disk and persists?
https://docs.marimo.io/guides/expensive_notebooks/?h=cache#d...
This can automatically store the output of expensive functions, keeping the previous state of the cells in mind. If a re-compute ever needs to happen it will just load it straight from cache.
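Roughly, usage looks like this (a minimal sketch based on the linked docs; `run_expensive_step` is a placeholder for your own slow code):

    import marimo as mo

    # Values computed inside the block are saved to disk; on a later
    # run, marimo loads them from the cache instead of recomputing.
    with mo.persistent_cache(name="expensive_step"):
        results = run_expensive_step()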
Another option is to run in lazy mode, documented here:
https://docs.marimo.io/guides/expensive_notebooks/?h=cache#l...
This will prevent the notebook from rerunning cells by accident.
We're thinking about adding features that would make marimo great for long-running batch work, but there's not a whole lot I can share about it yet. If you have specific thoughts or concerns, though, feel free to join our Discord!
abdullahkhalids
The caching is a very nice feature, and will stop me from keeping my computer running for days/weeks while I work on a notebook.
If I understand it correctly, `@mo.persistent_cache(name="my_cache")` creates a binary file `my_cache` that I should commit as well if I don't want others to repeat the computation?
This kinda solves the problem, except that you end up with two files per notebook, and marimo notebooks are no longer viewable with outputs directly on GitHub.
mscolnick
The default "store" is a local FileStore. In your case, it will save the outputs to a file on disk called `my_cache`.
We plan to add more stores like Redis, S3-bucket, or an external server, since you may not always want to commit this file, but like you said want others to avoid the computation.
mscolnick
> One design decision they made is that outputs are not stored
This is not quite true. Outputs are not stored... *in the Python file*. marimo does store outputs in the `/__marimo__` folder when the corresponding settings are enabled.
> writing the boilerplate to let the reader load the results.
There are some primitives to do this for you, such as `mo.persistent_cache`. This can be an annotation or a `with` block. It intelligently knows when either the source code or the inputs change.
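The annotation form looks roughly like this (a sketch; exact signature per the marimo docs, and `summarize` is a placeholder):

    import marimo as mo

    # Annotation form: the cached result is keyed on the function's
    # source code and its inputs, and persisted across sessions.
    @mo.persistent_cache
    def summarize(df):
        return df.groupby("key").mean()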
The plan is to take this one step further than storing just the output. Because marimo knows the dependencies of each cell, in a future version, it will store each output AND let you know which are stale based on code changes. This is being built by Dylan (from the blog) and inspired by Nix.
akshayka
It’s true that we don’t store outputs in the file format. This is the main tradeoff, as discussed in the blog. But that doesn’t mean marimo notebooks aren’t suitable for heavy computation.
marimo lets you automatically snapshot outputs in an auxiliary file while you work. We also have a persistent cache that lets you pick up where you left off — saving not just outputs but also computed data.
Many of our users do very heavy computation.
duped
> Other people are not expected to run minutes/hours long computation to see what the author intended
Arguably this is a good thing. You shouldn't distribute things you can't prove have the same results, and one way to do that is to require others to run the same computations.
0cf8612b2e1e
You lose out on so many use cases without the stored output. The first that comes to mind is all of the learning resources that are now presented in notebooks.
Most learners do not need to fact check the instructor, but do want to see the operations which were run. Those that are curious can run/edit the notebook themselves.
Edit: The JupyterBook ecosystem (https://executablebooks.org/en/latest/gallery/) is an example of what is possible with stored plots/calculations. Most learners are just going to follow along with the material, but being able to optionally play with the data is the superpower of the platform, with minimal friction.
mscolnick
The parent comment wasn't fully correct. marimo doesn't store outputs in the notebook file, but there are several ways to store outputs alongside the notebook, or remotely if you'd like: HTML, ipynb, pickle.
epistasis
It's hard for me to imagine the use case where this is appropriate.
I have looked at marimo several times, and while it's great for interactive computing, and it has a fantastic team, it's not a replacement for notebooks, and I find their use of the term "notebook" confusing. As a scientist, I don't understand what use case they are exploring, but I do know it's not the use case for which Jupyter was created, and it's not my current use case for Jupyter on the teams I work with.
aaplok
> You shouldn't distribute things you can't prove have the same results
Why not? Can you expand on this because I don't see why this is not a good thing.
Besides, if you distribute your code alongside your output, aren't you providing that proof anyway? People can run your code and check that they get the same result.
Kydlaw
I discovered Marimo a couple of weeks/months ago, here iirc. It really lands on a sweet spot for me for data exploration. For me the features that really nail it are the easy imports from other modules, the integrated UI components, and the app mode.
Being able to build models/simulations easily and share them with others, who can then even interact with the results, has truly motivated me to try more stuff and build more. I've been deploying more and more of these apps as PoCs to prospects, and people really like them as well.
Big thanks to the team!
jdaw0
I wanted to like marimo, but the best notebook interface I've tried so far is VS Code's interactive window [0]. The important thing is that it's a Python file first, but you can divide the code into cells to run in the Jupyter kernel, either all at once or interactively.
0: https://code.visualstudio.com/docs/python/jupyter-support-py
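For anyone who hasn't tried it: cells are just `# %%` comment markers in an ordinary .py file, e.g. (a minimal sketch; `data.csv` is a placeholder):

    # %%
    import pandas as pd
    df = pd.read_csv("data.csv")

    # %%
    df.describe()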
0cf8612b2e1e
This is also where I have landed. Gives you all of your nice IDE tooling alongside the REPL environment. No need for separate notebook-aware code formatters/linters/etc. That they version cleanly is just the cherry on top.
darkteflon
Looks very interesting. Could you elaborate on why you prefer this over the .ipynb notebook interface built into VS Code? The doc you linked mentions debugging, but I have found that the VS Code debugger is already fairly well-integrated into .ipynb notebooks. Is it mainly the improved diffing and having a REPL?
aaplok
Spyder also has these, possibly for longer than VS Code [0]. I don't know who had this idea first, but I remember some vim plugins doing that long ago, so maybe the vim community?
[0] https://docs.spyder-ide.org/current/panes/editor.html#code-c...
kylebarron
Agreed, I find this to be a super productive environment, because you get all of VS Code's IDE features plus the niceties of Jupyter and IPython.
I wrote a small vscode extension that builds upon this to automatically infer code blocks via indentation, so that you don't have to select them manually: [0]
cantdutchthis
Out of curiosity, does this approach also allow for interactive widgets?
luke-stanley
Yes, though it is split into a code section and an interactive section, like with Markdown previews. It really is driven by the code cells.
epistasis
I have looked at Marimo in the past, and read this blog post with great interest, but I still don't "get" Marimo. What it does well: provide a sane way to create and interact with widgets. Lots of widget authors and tooling authors, people I respect a lot, admire Marimo and like how it does things.
However, I'm not sure what the use case is for Marimo. I see Jupyter notebooks being used in two primary use cases: 1) prototyping new code and interactions with services, databases, and datasets, as a record of the REPL used to understand something, with interactive notes, plots, pasted-in images from docs, etc.; 2) a record of how a calculation was performed or experimental data analyzed, in a permanent artifact that others can look up later. For both of these, outputs and markdown/image cells are just as important as the code cells. These are both "write once" kinds of things, where changes in git are rare and ideally would never happen.
With Marimo, can I check the outputs directory into version control in a reasonable way and have it stored for posterity? Is that .ipynb?
Is there a way to convert a stored .ipynb checkpoint back into the marimo format?
And why does a small .ipynb change lead to many lines of change in the git diff? It's because the outputs changed. Deciding to not store outputs in version control and counting it as a win for pretty git diffs is saying "this core feature of .ipynb should be ignored because it's inconvenient". I'd much rather educate people about turning on GitHub's visual Jupyter diff rather than switch to an environment where I can no longer store outputs inline.
Similarly, being able to import one cell into a different notebook seems like the wrong direction to solve the problem of "it's time to turn the prototype notebook into a reusable module." If it's time to reuse a cell, it's time to make a cleaned-up Python code module file, not have the code interspersed with all the rest of the stuff.
I'd like to learn more about the use cases where Marimo is useful. As a scientist, it's not useful to me. I don't care about smaller git diffs on a notebook; in fact, if a notebook is getting changed and re-checked into version control, then a big awkward diff is not a problem and probably a feature, because notebooks should not be getting changed. They are a notebook: something that you write in once, and then it's done!
ayhanfuat
Unfortunately they don’t have Jupyter’s command mode. I wanted to switch a few times but not being able to create/delete/copy/move cells as easily is a big issue for me.
cjohnson318
I sometimes use notebooks mostly for taking notes, with a few code samples. In these cases, dealing with ipykernel and firing up a notebook is kind of a pain. Being able to open a "notebook" and make changes in vim sounds great.
florbnit
> When working with Jupyter, too often you end up with directories strewn with spaghetti-code notebooks, counting up to Untitled12.ipynb or higher. You the notebook author don’t know what’s in these notebooks
This is such a small UX thing, but it's so damn important. The simple fix is to not auto-name notebooks untitled-#: when the user clicks "new notebook", just ask for the name straight away, and if they can't name it, don't create it. It might add the smallest amount of friction to the UX, but it's so damn important.
Also, the choice of JSON as the file format is just plain wrong. Why the project hasn't abandoned it entirely and done a JSON-to-Python round trip when writing to file is beyond me. There are extensions that do this, but that's a really clunky interface, and while I can set it up for myself, it's difficult to force upon others in a corporate environment.
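(For reference, the best-known such extension is jupytext, which pairs each .ipynb with a plain script via e.g. `jupytext --set-formats ipynb,py:percent notebook.ipynb`, but it remains an add-on rather than the default.)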
Great to see someone taking up the seemingly small things, because they make a world of difference to the overall ecosystem.
BrenBarn
> The simple fix is to not auto-name notebooks untitled-#: when the user clicks "new notebook", just ask for the name straight away, and if they can't name it, don't create it.
The even simpler fix is to just not name them until the user does. That's the way other programs work. If you create a new document in a word processor, it will say "Untitled" at the top of the window, but it doesn't create a file called untitled.doc on disk until you do "Save as" and choose a filename. It has always irritated me that Jupyter insists on having an on-disk file right from the beginning.
cantdutchthis
(someone from the marimo team here)
The way you start a marimo notebook, via
`marimo edit must-give-name-to-this-file.py`
is indeed one of my teeny but favourite features of it. When you start a new notebook, you're kind of forced to name it immediately.
TheAlchemist
This looks really very very neat.
One (not great) workflow I have is that I use notebooks as quick UIs to visualize some results:
1. Run a simulation that outputs results to some file.
2. Load the results in a notebook and do some quick processing + visualization.
Very often I want to quickly compare two different runs, so I end up copying the visualization cell down, then re-running the data load + processing + visualization and comparing the two.
My understanding is that this would not be possible with marimo, since it will automatically re-run the cell with my previous data, right?
cantdutchthis
(marimo team-member here)
I have had a similar situation, and my "hack" for this was to start the same notebook twice and have two tabs open. This worked for some things ...
Other times I just bit the bullet and made two forms bound to two variables, so that everything would fit in a single notebook. By having two variables that contain all the inputs, you can make sure that only the cells that need to update actually update. It takes a bit more effort, but it makes a lot of sense for some apps that you want to share with a colleague.
If you share more details about your setup I might be able to give better advice/think along more.
mscolnick
It may be preferable to create a variable tied to a UI element that can be used as a toggle to view each analysis, for example:
    import marimo as mo

    # Dropdown whose value drives the cells below; changing it re-runs them.
    choice = mo.ui.dropdown(['train', 'split'])
    data = load(choice.value)
    processed = process(data)
    visualize(processed)
This way, you can toggle between more than just two options if needed. If you need to see both at once, you'd want to refactor the processing and visualizing steps into functions, and then just duplicate the final cell(s).
marimo has a multi-column mode, so you can view them side by side.
dchuk
I've been tinkering with Marimo; it's pretty sweet (and you can use Cursor or other AI IDEs pretty easily with it).
On running notebooks as scripts: I can't find in the docs what happens if you have plotting and other notebook-oriented code. Like, I'm using pygwalker to explore data through transformation steps, ending with saving to CSV. If I just run the notebook as a script, is all of the plotting automatically skipped?
cantdutchthis
(someone from the marimo team here)
It depends a bit on how the notebook is written. There is `mo.app_meta()`, which allows you to detect how the notebook is running. This can be "app mode", "edit mode", or "script mode".
https://docs.marimo.io/api/app/?h=meta#marimo.app_meta
Effectively this could allow you to do things like "only run this bit when not in script mode" if you want to skip things.
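A sketch of that guard (`render_plots` is a placeholder for your pygwalker/plotting code):

    import marimo as mo

    # Only render interactive visualizations outside of script mode.
    if mo.app_meta().mode != "script":
        render_plots()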
Alternatively you can also run the notebook via the `marimo export` command if you care about the charts and want to have a rendered notebook as an artifact.
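For example, something like `marimo export html notebook.py -o notebook.html` should produce a standalone HTML snapshot (see `marimo export --help` for the available formats).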
Gotta ask out of curiosity, anything you can share about your Cursor/marimo workflow? Are you using the LLM tools from within marimo or outside of it?
idanp
For lightweight calculations, there's jupad, which re-executes everything while you type cells: https://github.com/idanpa/jupad
paddy_m
I develop an open source notebook widget. Working with marimo has been a joy compared to developing on top of any other notebook environment.
The team is responsive and they care about getting it right. Having a sane file format for serializing notebooks is an example of this. They are thinking about core problems. They are also building in the open.
The core Jupyter team is very unresponsive and unfocused. When you have a bug, you need to figure out which one of many, many interrelated projects caused it, and issues go weeks without a response. It's a mess.
Then there are the proprietary notebook-like environments, VS Code notebooks and Google Colab in particular. They frequently rely on opaque, undocumented APIs and are also very unresponsive.
cantdutchthis
(someone from the marimo team here)
Happy to hear it! Got a link to your widget? I am assuming it's an anywidget?
dmadisetti
Very excited for the top-level functions change coming up!!
One design decision they made is that outputs are not stored. This means these notebooks are not a suitable replacement for heavy computation routines, where the notebook is a record of the final results. Other people are not expected to run minutes/hours long computation to see what the author intended.
You can work your way around it by storing the results in separate file(s) and writing the boilerplate to let the reader load the results. Or they let you export to ipynb - which is still sharing two files.
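That boilerplate is usually a small load-or-compute guard along these lines (a sketch; names and formats are placeholders):

    from pathlib import Path
    import pandas as pd

    CACHE = Path("results.parquet")
    if CACHE.exists():
        results = pd.read_parquet(CACHE)  # readers just load the stored artifact
    else:
        results = run_hours_long_simulation()  # the author pays the compute cost once
        results.to_parquet(CACHE)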
Presumably the reason for this decision is making git diffs short. But to me the solution is to fix git diff to operate on JSON nicely, rather than changing the entire notebook format.