Scallop – A Language for Neurosymbolic Programming
33 comments · March 22, 2025
versteegen
Wow, I'm currently reading the Scallop paper, so funny to see it posted here!
I really love the concept. This isn't just differentiable neurosymbolic declarative probabilistic programming; Scallop also lets you use various (18 included) or custom provenance semirings to, e.g., track "proofs" of why a relational fact holds, not just assign it a probability. Sounds cool, but I'm still trying to figure out the practicality.
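(For the curious, here is a toy illustration of the semiring idea in plain Python, not Scallop's actual API; the rule, fact tags, and function names are all my own invention. The point is that the same rule evaluation tracks probabilities or proof sets just by swapping the "plus"/"times" operations.)

```python
# Toy provenance-semiring demo for the rule
#   path(a, c) :- edge(a, M), edge(M, c).
# evaluated under two different kinds of tags. Not the real Scallop API.

# Max-times probability semiring: tags are probabilities.
prob_sr = {"plus": max, "times": lambda x, y: x * y}

# Proof semiring: tags are sets of proofs, each proof a frozenset of fact ids.
proof_sr = {
    "plus": lambda x, y: x | y,                          # alternative derivations
    "times": lambda x, y: {p | q for p in x for q in y}, # joint facts in one derivation
}

def path_a_to_c(edges, sr):
    """Combine every two-hop derivation a -> mid -> c under semiring `sr`."""
    result = None
    for (src, mid), t1 in edges.items():
        if src != "a":
            continue
        t2 = edges.get((mid, "c"))
        if t2 is None:
            continue
        deriv = sr["times"](t1, t2)
        result = deriv if result is None else sr["plus"](result, deriv)
    return result

prob_edges = {("a", "b1"): 0.9, ("b1", "c"): 0.8,
              ("a", "b2"): 0.5, ("b2", "c"): 0.5}
print(path_a_to_c(prob_edges, prob_sr))   # probability of the best derivation (0.9 * 0.8)

proof_edges = {k: {frozenset([k])} for k in prob_edges}
print(path_a_to_c(proof_edges, proof_sr)) # two proofs, one per mid-point
```

Swapping `prob_sr` for `proof_sr` changes *what* is tracked without touching the rule itself, which is (as I understand it) the appeal of parameterizing the whole engine by a semiring.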
Also worth pointing out that a lot of serious engineering work seems to have gone into Scallop. It has an interpreter and a JIT compiler that compiles down to Rust, which is then compiled and dynamically loaded as a Python module.
Because a Scallop program can be differentiable, it can be used anywhere in an end-to-end learning system; it doesn't have to take input data from a NN and produce your final outputs, as in all the examples they give (as far as I can see). For example, you could probably create a hybrid transformer that runs some Scallop code in an internal layer, reading from and writing to the residual stream. A simpler/more realistic example is computing features that are fed into a NN, e.g. an agent's policy function.
The limitation of Scallop is that the programs themselves are human-coded, not learnt, although they can implement interpreters/evaluators (e.g. the example of evaluating expressions).
alankarmisra
I'm wondering if this is a limitation though. If it can be learnt from training data, would it not be part of the neural network training data? I imagine we use Scallop to bridge the gap where we can't readily learn certain rules based on available data or perhaps we would prefer to enforce certain rules?
mark_l_watson
I looked at Scallop a year ago and decided that it was not a replacement for Prolog - for me.
I may re-evaluate now, thinking of smoother LLM integration as well as differentiability.
Has anyone here used Scallop for a large application? I ask because in the 1980s I wrote a medium large application in Prolog and it was a nice developer experience.
anonzzzies
Not Scallop related, but did you try Mercury? It is Prolog with types and flagging of deterministic functions; the Prolog we ported saw a very large (...vague, I know) performance boost, and that is a lot of code. Porting is mostly gradual.
FloorEgg
Ever since I learned about category theory and its relationship with symbolic reasoning I've suspected that AGI will come from elegantly combining symbolic reasoning and probabilistic reasoning. This is the first project I've seen that seems to be positioned that way. Very cool.
dr_dshiv
When LLMs code in order to reason, isn’t that a combination of probabilistic reasoning and symbolic reasoning?
sigmoid10
Neural networks are actually somewhere in between. They don't directly operate on symbolic expressions or explicit logical rules. And while they rely on probabilistic aspects for training (and sometimes for inference), they rely more on continuous-valued transformations in extremely high dimensional spaces. But if your goal is human-like intelligence, they are a pretty good bet, because we know the human brain also doesn't perform symbolic reasoning at its core and these things only emerge as high-level behaviour from a sufficiently complex system. But it also makes neural networks (and us too) prone to failure modes that you would not see in strictly symbolic reasoning processes.
VinLucero
If you like Scallop, you are gonna love Lobster:
versteegen
Unfortunately it doesn't seem to be available yet. Scallop and Lobster are both from UPenn, and the Scallop website says "We are still in the process of open sourcing Scallop," so I assume it's a matter of time.
integralof6y
The Scallop source code is on GitHub: https://github.com/scallop-lang/
sgt101
Thank you.
You seem to be more in the know than me :) Could you please sketch out a few bullets explaining the relationship between Scallop and Lobster, and what you think is going on?
alankarmisra
I read the Lobster paper a little bit. Scallop does its reasoning on the CPU, whereas Lobster is an attempt to move that reasoning logic to the GPU. That way the entire neurosymbolic pipeline stays on the GPU and the whole thing runs much faster.
JFingleton
A bit over my head - but can't Prolog achieve similar results?
sgt101
Anything can do anything else given enough time and power, but I think: no, not without shenanigans. This has primitives for interfacing to NNs, including foundation models, so you can ask it (for example) to label images of cats and dogs using CLIP, and then you can reason over the results.
So it's intended to combine nn reasoning and logical reasoning cleanly.
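(A minimal sketch of that pattern in plain Python, with a stub standing in for the CLIP call; all names and scores here are invented for illustration:)

```python
# Pattern: a perception model produces probabilistic facts,
# then a logical rule reasons over them.

def stub_classifier(image_id):
    # Stand-in for a real CLIP call; returns made-up label probabilities.
    fake_scores = {"img1": {"cat": 0.9, "dog": 0.1},
                   "img2": {"cat": 0.2, "dog": 0.8}}
    return fake_scores[image_id]

def same_species(img_a, img_b):
    # Rule: same_species(A, B) :- label(A, L, p1), label(B, L, p2).
    # Probability of the most likely shared label, assuming independence.
    sa, sb = stub_classifier(img_a), stub_classifier(img_b)
    return max(sa[label] * sb[label] for label in sa)

print(same_species("img1", "img2"))  # roughly 0.18: cat*cat beats dog*dog
```

The value proposition, as I read it, is that Scallop lets you write the `same_species` part declaratively while keeping the probabilities (and gradients) flowing through it.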
light_hue_1
The problem with Scallop is that no one has ever shown a single program that wasn't easier to write without it. Their papers usually contain almost no examples, and the ones they do include are a heck of a lot more complicated than asking the model to do it directly. The programming-languages world lets them get away with this because they're excited to be relevant in the LLM universe. But I wouldn't accept any of this work if I were their reviewer; they need to write real programs and compare against baselines. Many of them. And show that their language provides any value.
Just look at the examples on their website. All 3 are lame and far easier without their language.
It's like publishing that you have a new high-performance systems language and never including any benchmarks. They would be rejected for that. Things just haven't caught up in the ML+PL world.
revalo
This is amazing. I've been looking forward to such a thing for a while now.
tempodox
The tutorial claims that fib(0) = 1, which is wrong.
https://en.wikipedia.org/wiki/Fibonacci_sequence
This one was easy to spot and would have been easy to get right. Makes me wonder…
fib0equal1
From the first paragraph on Wiki:
> Many writers begin the sequence with 0 and 1, although some authors start it from 1 and 1[1][2] and some (as did Fibonacci) from 1 and 2.
ginko
From the first paragraph in the article you linked:
> Many writers begin the sequence with 0 and 1, although some authors start it from 1 and 1[1][2] and some (as did Fibonacci) from 1 and 2.
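(Concretely, the two conventions in that quoted sentence differ only in the seed values, i.e. an offset of one; a quick sketch:)

```python
def fib(n, f0=0, f1=1):
    # General two-term recurrence; the disagreement is only over the seeds.
    a, b = f0, f1
    for _ in range(n):
        a, b = b, a + b
    return a

print([fib(n) for n in range(7)])        # [0, 1, 1, 2, 3, 5, 8]   -- F(0)=0, F(1)=1
print([fib(n, 1, 1) for n in range(7)])  # [1, 1, 2, 3, 5, 8, 13]  -- the fib(0)=1 convention
```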
LoganDark
I wish this website explained what neurosymbolic means.
alankarmisra
It's a combination of neural networks and symbolic reasoning. You can use a neurosymbolic approach by combining deep learning and logical reasoning:
A neural network (PyTorch) detects objects and actions in the image, recognizing "Jim" and "eating a burger" with a confidence score.
A symbolic reasoning system (Scallop) takes this detection along with past data (e.g., "Jim ate burgers 5 times last month") and applies logical rules like:
likes(X, Food) :- frequently_eats(X, Food).
frequently_eats(X, Food) :- ate_count(X, Food, N), N > 3.  // "ate it more than 3 times recently"
The system combines the image-based probability with past symbolic facts to infer "Jim likely likes burgers" (e.g., with 85% confidence). This allows for both visual perception and logical inference in decision-making.
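(Roughly, the fusion step looks like this in plain Python; the numbers, threshold, and combination rule are all invented for illustration, not Scallop output:)

```python
# Sketch: combine a NN confidence score with a symbolic rule.

def frequently_eats(history, person, food, threshold=3):
    # Symbolic rule: ate the food more than `threshold` times recently.
    return history.get((person, food), 0) > threshold

def likes_confidence(nn_confidence, history, person, food):
    # Combine perception (NN score for "Jim eating a burger") with the
    # symbolic fact, here by treating the rule as a made-up prior boost.
    if frequently_eats(history, person, food):
        return min(1.0, nn_confidence + 0.2)
    return nn_confidence

history = {("Jim", "burgers"): 5}
print(likes_confidence(0.65, history, "Jim", "burgers"))  # boosted toward 0.85
```

In real Scallop the combination would come from the chosen provenance semiring rather than a hand-written boost, but the shape of the computation is the same.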
f1shy
Also can be used to verify NN decisions. In autonomous driving, a NN can make “instinctive” decisions, and a GOFAI system can verify they work and don’t break civil or physical laws. You can have many parallel NN giving recommendations, and let a symbolic system take the final decision.
eternauta3k
Is the reasoning strictly downstream of the image recognition? Or can prior knowledge impact how objects are recognized? E.g. I'm driving on the road at night so the two incoming lights are probably a car.
The SCC example is interesting; I wonder what behavior that generates. Reminds me of Lean; I have to suspect it may make the processor quite spicy, like Lean. Also, I don't see a clear indication that this benefits from heterogeneous compute resources.
Oh, boy, it's written in Rust!