TradeExpert, a trading framework that employs Mixture of Expert LLMs

134 comments

June 2, 2025

ta12653421

Good one, happy to add my perspective here:

DISCLAIMER: I've spent the last 8 months working heavily on building a quant-based asset management app (though it's still not live; I'm currently in the final steps of syncing processes with the broker)

a) I tried to leverage some of this AI-voodoo stuff, though not at the level of the paper; my findings are clear (at least for me): AI-driven trading does not give you a bigger/better edge than any of the other well-known approaches

b) In fact, AI-based approaches are at best on par with traditional approaches, and in a lot of scenarios not even that; I haven't seen any setup from anyone that actually outperformed one of the classic approaches. BUT: the AI guys have much higher costs, be it infra, processing time / waiting time in front of the screen, etc. So you have to pick carefully which one you choose.

c) Today I'm doing only "standard approaches" with volume/statistics/vola/price action, as this approach is super cost-efficient (I need only one cheap data stream and a lightweight machine for 10-20 USD a month)

d) It is clearly possible to outperform the market, though these approaches are not infinitely scalable. Ex: depending on the instruments used, there may not be enough liquidity to buy continuously for 100k, but maybe for 10k only. Apply leverage of 5-10 on an asset that moved 5% in the last 10 days on a 10k position: is this outperforming? A clear >yes< in my perception.

e) People who have built & found a stable approach do not share it or talk about it, and there is no real community; you will get details of working approaches only from people you are really "friends with". There are a lot of unshared but working business tactics in the field.
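To put rough numbers on point (d), here is a minimal sketch of the leverage arithmetic. It assumes the 10k is the trader's own capital, that leverage simply multiplies exposure, and that financing costs, fees and slippage are zero. None of that is stated in the comment, so treat the function names and the reading of "10k position" as illustrative assumptions.

```python
def leveraged_pnl(equity: float, leverage: float, asset_move: float) -> float:
    """Profit or loss in currency units for a leveraged position.

    Assumes exposure = equity * leverage and ignores financing costs,
    fees and slippage (simplifications for illustration only).
    """
    return equity * leverage * asset_move


def return_on_equity(leverage: float, asset_move: float) -> float:
    """Return relative to the trader's own capital under the same assumptions."""
    return leverage * asset_move


if __name__ == "__main__":
    equity = 10_000  # the 10k from point (d), read here as own capital (assumption)
    move = 0.05      # the asset moved 5% over the period
    for leverage in (5, 10):
        gain = leveraged_pnl(equity, leverage, move)
        loss = leveraged_pnl(equity, leverage, -move)
        roe = return_on_equity(leverage, move)
        print(f"{leverage}x: gain {gain:,.0f}, return on equity {roe:.0%}, "
              f"loss on an equal adverse move {loss:,.0f}")
```

Under these assumptions a 5% move becomes 25% to 50% on equity, and a 5% move against the position costs the same amount.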

flessner

I have been interested in algorithmic trading for quite a while now. Everything you've said resonates with me as I ran into similar issues. I hope you don't mind that I add my own couple cents here.

(a-c) LLMs are especially difficult to use due to their knowledge cutoffs and "unpredictability". A self-trained "old-school" machine learning model can go a long way though.

(d) With Crypto the volatility is great for trading, but liquidity can quickly become a problem (even at $1000 non-leveraged positions). For me, the ultimate goal is to find a strategy that is profitable in all market conditions. I personally value consistency and reliability more than absolute profit.

(e) There's some chatting about risk management, but absolutely no discussion on profitable strategies. Resources are incredibly scarce - Systematic Trading by Robert Carver is the only book that was actually useful.

ta12653421

Thanks for your question:

Regarding LLMs: I do use them to write code in less time; I do not use them for anything related to trade analysis / execution / etc.

Regarding crypto: No, in my location I can't use crypto as an underlying, since the crypto market is open 24/7, but the instruments I'm using are available only between 0800-2200 on workdays

Risk management is key: if you have a solid hit rate, it's mathematically impossible to ruin the account; most people get too greedy and have no patience, but if you stick strictly to your risk plan, there is not that much that can go wrong.
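Whether ruin is strictly impossible depends on the sizing rule, which the comment does not spell out. As a sketch only, assume fixed-fractional sizing (risking a constant fraction of current equity per trade) and a fixed reward-to-risk ratio; both are my assumptions. The first function gives the expected profit per trade in units of risk, the second shows how far equity falls during a losing streak.

```python
def expectancy(hit_rate: float, reward_to_risk: float) -> float:
    """Expected profit per trade, in units of the amount risked.

    hit_rate is the fraction of winning trades; a winner earns
    reward_to_risk units and a loser costs one unit (assumed model).
    """
    return hit_rate * reward_to_risk - (1.0 - hit_rate)


def equity_after_losing_streak(equity: float, risk_fraction: float, losses: int) -> float:
    """Equity left after `losses` consecutive losers with fixed-fractional sizing.

    Each loss removes risk_fraction of the *current* equity, so the balance
    shrinks geometrically and never reaches exactly zero.
    """
    return equity * (1.0 - risk_fraction) ** losses


if __name__ == "__main__":
    print(f"expectancy at 60% hit rate, 1:1 payoff: {expectancy(0.60, 1.0):+.2f} R")
    remaining = equity_after_losing_streak(10_000, 0.01, 20)
    print(f"equity after 20 straight losses at 1% risk: {remaining:,.2f}")
```

With 1% risk per trade, twenty losers in a row leave roughly 82% of the account, which is the sense in which a strict risk plan keeps drawdowns survivable. It says nothing about whether the hit rate itself holds up.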

gavinray

I have zero knowledge of finance and trading, but when I got curious about algorithmic systems, it seemed like sentiment-based trading using current events was a more viable strategy than forecasting/regression-based analysis.

NotAnOtter

For HFT - yes, and that has been the case since before the dot-com bubble. Trading on news is what fuels HFT. And value trading is primarily based on insider trading / corruption.

What's left over is ETFs or luck.

ta12653421

Not really anymore, and not for a long time: this article is more than 10 years old and describes the impressive tech & metrics they were already using back then:

https://queue.acm.org/detail.cfm?id=2536492

NotAnOtter

With cutting-edge stuff you cannot make statements like "I tried X and found that it is not good at Y".

You could not get X to be good at Y, but it's not impossible someone else can.

pdabbadabba

> AI-based approaches are at best on par with traditional approaches

This seems like quite the generalization! Wouldn't it completely depend on the approach and model?

ta12653421

That's what I meant with "in my experience": for sure, I see & read a lot of content from people who claim that they are using AI for their approaches, though I haven't met anyone or seen anyone showing some "hard facts".

It is very likely possible to find a "stable setup" with this, but it didn't work out for me; maybe I had the wrong perspective :)

r0b05

Are you referring to LLMs when you mention AI here?

ta12653421

Thanks for the question:

Not really in the field of trading, rather prediction-based approaches etc.; I'm not sure if LLMs could be of any use here? The approaches based on statistical arbitrage are purely math/number models. From my own experience, LLMs are absolutely useless when it comes to "trading ideas" (I use them for code generation instead), because their hallucinations lead them to dice together values in their output that are not really related.

Also, for fundamental analysis they are too often incorrect - so running an auto-approach based on LLM-fundamental output would be an "interesting" idea :))

r0b05

Interesting, so you're doing more fundamental quantitative analysis and prediction. Yes, the issue with using an LLM is that it is too often incorrect; however, a human in the loop could solve this at the expense of automation.

Your project sounds really cool though. If you ever feel like collaborating, give me a shout at drknyt05@gmail.com.

throwawaymaths

You could certainly encode data into a transformer using custom tokens and fine-tune, but that's not trivial.

bguberfain

So they used an LLM with a knowledge cutoff in mid-2023 to evaluate 2023? Seems like a classic leakage problem.

From paper: "testing set: January 1, 2023, to December 31, 2023"

From the Llama 2 doc: "(...) some tuning data is more recent, up to July 2023."
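A quick way to quantify the possible overlap between the two quotes is sketched below. The Llama 2 doc only says "up to July 2023", so the cutoff date of July 31, 2023 is an assumed upper bound, and `overlap_days` is a hypothetical helper rather than anything from the paper.

```python
from datetime import date


def overlap_days(test_start: date, test_end: date, data_cutoff: date) -> int:
    """Number of test-set days that fall on or before the model's data cutoff."""
    if data_cutoff < test_start:
        return 0
    return (min(test_end, data_cutoff) - test_start).days + 1


# Test window as quoted from the paper.
TEST_START = date(2023, 1, 1)
TEST_END = date(2023, 12, 31)
# Assumed upper bound for "up to July 2023" (not an exact published date).
ASSUMED_CUTOFF = date(2023, 7, 31)

exposed = overlap_days(TEST_START, TEST_END, ASSUMED_CUTOFF)
total = (TEST_END - TEST_START).days + 1

if __name__ == "__main__":
    print(f"{exposed} of {total} test days ({exposed / total:.0%}) "
          "precede the assumed cutoff")
```

Under that assumption a bit more than half of the test year could in principle have been seen in tuning data. Whether any market-relevant text actually was is a separate question.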

mhmmmmmm

Removing the "Market expert" which uses OHLCV (Open, High, Low, Close, Volume) also drops the Sharpe ratio from 5.01 to 1.88 while also increasing the max drawdown to 13.29% (vs. 9.70% for the index). I'd be very surprised if the pre-training of the base model was the only source of leakage...
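For anyone unfamiliar with the two metrics quoted above, here is a minimal sketch of how an annualized Sharpe ratio and a maximum drawdown are commonly computed from daily returns. The 252-day annualization and the zero risk-free rate are conventions assumed here; the paper may compute them differently.

```python
import math
from statistics import mean, stdev


def sharpe_ratio(daily_returns, risk_free_daily=0.0, periods_per_year=252):
    """Annualized Sharpe ratio: mean excess return over its sample
    standard deviation, scaled by sqrt(periods per year)."""
    excess = [r - risk_free_daily for r in daily_returns]
    return mean(excess) / stdev(excess) * math.sqrt(periods_per_year)


def max_drawdown(daily_returns):
    """Largest peak-to-trough decline of the compounded equity curve,
    returned as a positive fraction (0.2 means a 20% drawdown)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in daily_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


if __name__ == "__main__":
    toy_returns = [0.01, -0.02, 0.015, 0.003, -0.007, 0.012]  # made-up numbers
    print(f"Sharpe: {sharpe_ratio(toy_returns):.2f}")
    print(f"Max drawdown: {max_drawdown(toy_returns):.2%}")
```

A Sharpe ratio above 5 over a full year is extremely high by most practitioners' standards, which is part of why the number invites suspicion.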

flowerthoughts

> Alpha Factors incorporates 108 technical indicators and factors with their expressions, which are believed to possess predictive power regarding stock price movements.

Examples of the indicators are in Figure 15. The ablation studies in Table 4 suggest that market and news information made a much bigger impact than the magic indicators. Makes sense if the indicators are simple enough that the LLM can reproduce them without losing processing power.

I somewhat like that they used DJI and not SPX, but 2023 was a sideways bull year with DJI +12% and SPX +23%. One year is way too short of a study.

> Hardware: NVIDIA A5000 GPU x 4, AMD Ryzen Threadripper PRO 3975WX CPU, 256 GB RAM

Seems approachable.

> The proposed TradExpert framework utilizes a Mixture of Experts (MoE) approach, where four LLMs are specialized in processing distinct sources of financial data. All these LLMs are based on the LLaMA-2-7B Touvron et al. (2023b) model and fine-tuned using the LoRA mechanism Hu et al. (2022)

Relatively small LLM.

Overall, this does seem like an interesting study, even for just comparing data sources.

blitzar

> 7B, Relatively small LLM.

Possibly too large a model. (Daily) finance data is finite - 7B parameters is potentially order(s) of magnitude more than the training data.
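A back-of-the-envelope version of that comparison is sketched below. It assumes the 30 Dow Jones constituents, 252 trading days a year, five OHLCV fields per day, and 20 years of history. The 20 years is purely an illustrative assumption, and the count covers raw price data only, not news or other text.

```python
import math


def daily_values(tickers: int, days_per_year: int, fields: int, years: int) -> int:
    """Count of raw numeric values in a daily OHLCV history."""
    return tickers * days_per_year * fields * years


PARAMETERS = 7_000_000_000  # the 7B model size discussed above
values = daily_values(tickers=30, days_per_year=252, fields=5, years=20)
gap = math.log10(PARAMETERS / values)

if __name__ == "__main__":
    print(f"{values:,} daily values vs {PARAMETERS:,} parameters "
          f"(about {gap:.1f} orders of magnitude apart)")
```

With these inputs the price history comes to well under a million numbers, so the gap to 7B parameters is around four orders of magnitude.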

flowerthoughts

Yes, but these are LLMs where the point is they understand English prompts and information. They aren't just fed market data.

niemandhier

If I understand this correctly we have come full circle on what MoE means.

MoE started out as some form of multi model approach.

Afaik in current architectures it’s basically a load-balancing method that, while it increases latency, makes the model better suited to distributed operation.

To me this reads as if the author uses the term closer to its original meaning than its current one.
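For contrast, the "current" usage usually refers to sparse routing inside one network: a gate scores the experts, only the top k run, and their outputs are mixed with renormalized gate weights. Below is a minimal pure-Python sketch of that top-k gating. The toy experts and gate logits are made up, and this is not how the paper combines its four LLMs.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_logits, k=2):
    """Pick the k experts with the highest gate probability and
    renormalize their weights so they sum to 1."""
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    return {i: probs[i] / total for i in top}


def moe_output(x, experts, gate_logits, k=2):
    """Weighted sum of the selected experts' outputs; unselected experts
    are never evaluated, which is where the compute saving comes from."""
    weights = top_k_route(gate_logits, k)
    return sum(w * experts[i](x) for i, w in weights.items())


if __name__ == "__main__":
    toy_experts = [lambda x: x, lambda x: 2 * x, lambda x: 3 * x, lambda x: 4 * x]
    toy_logits = [2.0, 0.0, 1.0, -1.0]  # in a real model these come from a learned gate
    print(top_k_route(toy_logits))
    print(moe_output(1.0, toy_experts, toy_logits))
```

In the older multi-model sense, every expert is a separate model and all of them typically run on every input, with something else combining their answers.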

ArtTimeInvestor

How do people on HN think about the market?

Do you think the market is so efficient that anyone who outperforms it is merely lucky?

Or do you think the market is inefficient enough for a person smart enough to be able to outperform it by thinking?

In other words: Do you think a single person can rationally decide to invest their time into thinking about the stock market? Or would that always be a fallacy, and whatever the outcome is - we can't decide if it was just good or bad luck?

_heimdall

It's hard to find efficiency in markets when company values seem completely unhinged from reality.

Historically, companies were valued with a heavy weight put on their financials. That doesn't seem to be the case anymore, and without any clear approximation of how we are valuing companies it feels to me more like pure gambling.

ArtTimeInvestor

From this and your other replies it seems that you assume the average P/E ratio of the S&P over the last 75 years is somehow the "correct" one and therefore today's S&P P/E is "unhinged from reality".

What would be an argument that supports this?

To me, it seems fairly easy to imagine that the p/e of the S&P could increase forever. Either because the increase of the money supply is constantly accelerating or because our ability to increase productivity is constantly accelerating.

_heimdall

I'm not trying to claim that there is any "correct" P/E ratio, it's just an indicator or signal like anything else.

The P/E ratio is an interesting indicator because it more directly ties together how much money the company actually makes relative to what I'm paying for it.

P/E ratios over time can be good indicators of relative value of the company. I'm not exactly inventing economic or investment theory here to point to high p/e ratios as an indicator of overvalued companies.
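As a concrete illustration of what the ratio ties together, here is a minimal sketch of the P/E ratio and its inverse, the earnings yield. The share price and earnings per share are hypothetical numbers, not data for any real company.

```python
def pe_ratio(price_per_share: float, earnings_per_share: float) -> float:
    """Price paid per unit of annual earnings."""
    if earnings_per_share <= 0:
        raise ValueError("P/E is not meaningful for zero or negative earnings")
    return price_per_share / earnings_per_share


def earnings_yield(price_per_share: float, earnings_per_share: float) -> float:
    """Inverse of P/E: annual earnings as a fraction of the price paid."""
    return earnings_per_share / price_per_share


if __name__ == "__main__":
    price, eps = 150.0, 5.0  # hypothetical values for illustration
    print(f"P/E: {pe_ratio(price, eps):.1f}")
    print(f"Earnings yield: {earnings_yield(price, eps):.2%}")
```

A P/E of 30 means paying 30 units of price for each unit of current annual earnings, or an earnings yield of about 3.3%.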

infecto

There are certainly cases where I would agree (TSLA) that I would never be able to underwrite the investment with my mindset. I don’t think your statement is correct though. There have always been hype growth companies; appetite for risk has increased, but the mental model is not foreign. The market is betting on a major shift that will in some way leave the company doing a lot better than it currently is. This applies to a lot of tech, but for the rest of the market most companies trade within their expectations based on current financials.

_heimdall

I put a lot of weight behind P/E ratios. Today those ratios are very high compared to historic trends, and are in the ballpark of what we saw in the dot-com bubble [1].

By no means am I saying that is predictive, maybe the ratio isn't as useful as I think or maybe it is less applicable today for some reason. I do have a hard time finding any stocks today that I can justify the market valuation for, though.

[1] https://www.currentmarketvaluation.com/models/price-earnings...

rafaelero

Maybe it's your understanding of reality that is flawed.

_heimdall

It certainly is, everyone's understanding of reality is flawed.

Show your work here though. I'd point to P/E ratios over time [1]. That's a very common signal used when valuing a company and today those ratios are much higher than historic averages, nearing the ratios we saw before the dotcom bubble burst.

[1] https://www.currentmarketvaluation.com/models/price-earnings...

weego

There is a case that the markets are highly efficient, just that the "information" part of the market hypothesis isn't the information a layman thinks are the market signals, and that only the HFT firms have the "correct" signals available to them. That certainly accounts for the seeming dissociation between company performance and market performance on certain "blue chip" stocks.

barchar

HFTs chase two sided order flow, not valuations that are "correct" in the long term. They generally don't add information to the market.

Quantitative traders do add information.

HFTs probably don't like trading with people that are adding information, that's why order flow from retail traders is valuable.

_heimdall

I think you're coming in with an assumption that HFTs are generally correct in their valuations, and that they are chasing efficiency or proper valuations rather than short term gains.

Maybe those would be roughly analogous in a stable situation, but my read is that the big players in the markets today are chasing short term gains despite the signals. I'd point to the housing crisis as a recent example: the biggest banks and funds were massively overvaluing real estate and real estate derivatives because they were blind to the actual risks and saw what appeared to be free money.

moritonal

I know that, because I'm interested in a specific industry, let's say video games, I can combine a general understanding of the environment with a specific understanding of certain companies and the games they're making. If I see in a low-key mailing list that a company is making a new game that I know will line up well with the direction the industry is going, then I can invest in that company and likely be ahead of the market.

My hunch is that if you spend more than 6 hours a week studying the mood of an industry, you would likely be "luckier" than the market (although all risks still apply). I also believe an LLM could do exactly what I do with enough investment.

miningape

This is exactly how I make money shorting (or buying puts on) companies like Ubisoft and EA. They're pretty much hated by the entire gaming community and their games do not perform well. Investors actually believe the stuff they say in press conferences, so the stock prices are often overinflated and get corrected about 1 week post launch.

iTokio

Be careful, shorts are generally not for retail investors.

Shorting is about predicting when a stock will go down.

Not that it will go down.

Because

- there is no limit to how high a stock price can go

- the market can stay irrational longer than you can remain solvent
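The first bullet can be made concrete with a small sketch of the payoff asymmetry. It ignores borrow fees, margin calls and dividends, which in practice make shorts harder still; the prices and share count are hypothetical.

```python
def long_pnl(entry: float, exit_price: float, shares: int) -> float:
    """Profit or loss from buying at entry and selling at exit_price."""
    return (exit_price - entry) * shares


def short_pnl(entry: float, exit_price: float, shares: int) -> float:
    """Profit or loss from selling short at entry and buying back at exit_price.

    Ignores borrow fees and margin requirements (simplifying assumption).
    """
    return (entry - exit_price) * shares


if __name__ == "__main__":
    entry, shares = 100.0, 10  # hypothetical position
    # Worst case for a long: the stock goes to zero, loss capped at the stake.
    print(f"long, price -> 0:   {long_pnl(entry, 0.0, shares):,.0f}")
    # Best case for a short: the stock goes to zero, gain capped at the stake.
    print(f"short, price -> 0:  {short_pnl(entry, 0.0, shares):,.0f}")
    # A short has no floor on losses: each further rise adds to the loss.
    for price in (200.0, 300.0, 500.0):
        print(f"short, price -> {price:.0f}: {short_pnl(entry, price, shares):,.0f}")
```

The long position can lose at most what was paid, while the short position's loss keeps growing as the price rises, so timing matters more than being right eventually.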

OtherShrezzing

Markets aren't perfectly information-efficient. They're just _very_ information-efficient. So there's at least some room for non-luck-based strategies to outperform the market.

More important than information efficiency is the subjective nature of investments and the inter-sectoral complexity of the modern economy. There are lots of meaningful calculations you can do to predict the long-term value of a company, but massive and mostly unseen shifts in expectations happen regularly.

A fairly small number of people in Feb 2020 saw that 1/3 of students globally were suddenly home-schooled, and connected the dots to turn that into an ultra-strong signal to invest in Moderna.

Similarly, lots of people saw that Nvidia's CUDA granted it a near absolute-monopoly over anything that requires matrix multiplication. They were able to connect the dots to turn that into an ultra-strong signal that NVDA would turn from a relatively obscure company into a significant company.

Domain specialists have an edge over the aggregate-market within their narrow field of interest.

TeMPOraL

N=1, I think that:

- You'd have to be either extremely lucky, or spend a lot of money and effort (e.g. to get to the bleeding edge of the quant or HFT game), to be in a position where you could use your knowledge and brainpower directly to give yourself an edge over the market, and:

- Any such edge is extremely short-lived - the moment you take advantage of it, the market itself will start adjusting to correct, and on top of that, other smart players will notice the irregularity, work to exploit it or reverse engineer your approach; in the end, the "pattern" quickly disappears.

Whether one can become able to continuously find such short-lived edges and profit off them, by means other than pure random chance, I don't know, but I highly doubt it. The space is way too competitive; a sustained miracle advantage would eventually attract regulatory attention, and/or unscrupulous parties willing to lie and cheat to bury you legally, or literally. But that's just me speculating.

jamespattn

If the "edge" is ephemeral (I agree that it is), I always wonder how quant/HFT firms like Jane Street, RenTech continuously make insane profits out of such strategies.

I suppose it could be a combination of things that isn't necessarily just related to finding edge over markets. Having an entrenched market position or access to data faster perhaps?

TeMPOraL

My intuition: to the extent it's possible to systematically create an edge for yourself, someone will be doing it, and they'll have a bunch of competitors trying to outdo it, perhaps swapping the "first place" amongst themselves, but also continuously raising the costs of maintaining the edge.

There's an equilibrium there, where the top players have some profits - not necessarily insane, but enough for them to continue the race and take some of it home. Meanwhile, the up-front costs of getting all the advanced tech and expertise to try and compete with them is high enough, that you're not likely to break even for years - so you won't bother, and no one else will either, and the pressure pulling those "insane profits" down disappears. Those profits look like money being left on the table, but no one can afford to reach for it.

cmcaleer

Not all edge is necessarily ephemeral, if you're thoroughly entrenched and have a deep understanding of a market you will usually just outperform your competitors. Take Jane Street's infamously long-lasting (and profitable!) Indian options trade for example. They had clearly put in the work and effort into understanding a market that other firms didn't do to the same extent, and resources and minds who are good at figuring this stuff out are finite.

You're totally right that edge isn't just knowing if number is more likely to go up, as an example since you mentioned faster access to data: some of the best of the best companies will hire meteorologists so that they know how reliable their microwave towers for transmitting data between e.g. Chicago and NY are (and they can lean in or widen their spreads according to how current and up-to-date their information is).

It sounds like crazy stuff to do, but when your data could be up to 10ms slower than you expect due to weather and you're so sensitive to latency you hire FPGA engineers because normal high performance CPUs aren't enough for you, it's not that crazy.

I sometimes feel like HFT is a waste of good talent and wonder what some of the people I've met who work at JS or CitSec could have done in other industries, but at the same time HFT is often the only industry that correctly prices these people's minds. Ultimately having a smoother financial system where risk is more correctly priced is a good thing, even if it's not the best thing they could be doing.

mhh__

They have the best infra, lots of good properly cleaned data, and market access.

I don't know if RenTech ever makes markets, but one way to make a _lot_ of money in finance is to provide liquidity while also having good alpha models. The faster the better. This way you are earning the spread while also getting into the positions you want, and maybe even getting paid for it by the exchange.

jcfrei

Jane Street and others are essentially market makers, which means that in one way or another they get paid for providing liquidity. They have a different payoff profile from hedge funds, retail traders or mutual funds, which usually make a directional bet. For example, they benefit from exclusive order flow (for instance from Robinhood traders), lower trading fees (often rebates), access to better execution (via dark pools), cheaper financing (for leveraged trades), tighter spreads in OTC deals, etc.

mtillman

Rentech is an old customer of mine, do you know if they’ve still not had a down year on the employee fund?

_Algernon_

IMO efficient market hypothesis is not true (assuming you have all available information you could beat it), but for an ordinary person there are so many ways the market is stacked against them that they should still just assume that it is true.

The question isn't "is it possible to outperform the market by thinking?", it is "will whatever advantage I can get by thinking make up for all the disadvantages I have from not being able to insider trade, being further from the exchange than other market participants, having lower ability to sustain losses than big market players, trading fees, taxes, opportunity cost, etc.?"

For any single individual, the answer to the second question is almost certainly no (unless you happen to sit in Congress).

davedx

“In the short run, the market is a voting machine but in the long run, it is a weighing machine.”

Short term movements are very much sentiment driven, both humans and algorithms. Long term movements generally trend towards the consensus of fair value of any given security, which is usually modelled somehow. You can see stocks where the models struggle to reach consensus by their long term volatility.

mtillman

I spent 6 years in algo trading. I think that the market is a thing people call the US stock market and it’s 98% machine traded volume. Whenever I hear people talking about their alpha on an equity I assume it’s either insider trading or ignorance of the billions in PhDs, super computing, and dark fiber they’re up against.

Fade_Dance

There is alpha in markets. I did a lot of relatively unsophisticated SPAC relative value warrant trading during the 2020 period, and I think a big part of it is the question of scale and slippage. If you're a prop firm that makes "real money", your universe is limited by these constraints, i.e. "not worth the time" would be a common response to trading opportunities.

For example, there were high sophistication players in the merger/stat arb phase of the game, and they would layer out their warrants like an onion of dark liquidity (the orders were hidden/not directly listed on ARCA/etc). They were involved in just about every SPAC name out there. But when an SEC filing came out, or you learned some specifics about a certain sponsor team (maybe they have very high quality lockup partners who don't dump shares on lockup date, as shown by their last 3 SPACs), then maybe that implies a higher warrant valuation, or maybe that should be priced into the option chain. And they will happily sell to those pricing in that "hair" and sell their inventory because they don't want to deal with some 5 dollar warrant that trades 50,000 volume per day.

The profitable futures traders I know are also more or less just riding off the back of the machine volume and participating when machine trading from option flow is moving/pinning markets. They are of course just exacerbating the situation, which is partly why we see this increasingly bifurcated market where robots/options are entirely in control, interspersed with violent price discovery/mini vol events.

Tycho

It’s pretty efficient at assimilating obvious revelations. You have to be quick to trade on the news, and you’re not going to find many arbitrage opportunities. There’s no reason why it would be particularly efficient at pricing long term outcomes. It reflects some sort of consensus or balance of opinions, but those things could simply be wrong. That said, even if you’re right about something, there’s so many factors to account for, it’s not going to be easy to profit. Personally I have most of my savings in retirement tracker funds (global equities), but I’m also building a portfolio of handpicked stocks/bets, aiming for 20, each with the same starting investment. Maybe it won’t be successful but at least it will be fun and minimise the regret of not acting on my insights/convictions. The diversification should stop it from being a disaster.

16th_hop

Does anyone understand how the Market Expert works? It takes in numerical OHLC data and converts it to embeddings for use by the LLM… but embeddings are also numbers so I don’t see how that’s any easier for the LLM to process since it’s a language model.

> The Market Analyst LLM focuses on analyzing historical OHLCV (Open, High, Low, Close, Volume) data to predict stock movements. However, time series data is inherently continuous and lacks the discrete token structure that LLMs are designed to process. This misalignment poses a significant challenge in effectively utilizing LLMs on time series. To this end, we utilize a reprogramming mechanism Jin et al. (2024) to reprogram the input financial time series into text prototype representations.
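A rough sketch of the general idea behind such reprogramming, as I understand it: the series is cut into patches, and each patch attends over a small set of "text prototype" vectors that live in the LLM's embedding space, so the result is a sequence of vectors shaped like ordinary token embeddings. An LLM never sees words directly either, only embedding vectors, so the question becomes whether the numbers land in the space the model was trained on. The code below is a simplified, single-head, pure-Python illustration with random weights. The dimensions, the random prototypes and the missing key/value projections are all my assumptions, and it is not the paper's implementation.

```python
import math
import random


def make_patches(series, patch_len, stride):
    """Split a 1-D series into overlapping windows (patches)."""
    return [series[i:i + patch_len]
            for i in range(0, len(series) - patch_len + 1, stride)]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def reprogram(patches, query_proj, prototypes):
    """Cross-attention: each patch becomes a weighted mix of text prototypes."""
    dim = len(prototypes[0])
    out = []
    for patch in patches:
        query = matvec(query_proj, patch)  # patch -> embedding-sized query
        scores = [sum(q * p for q, p in zip(query, proto)) / math.sqrt(dim)
                  for proto in prototypes]
        weights = softmax(scores)
        out.append([sum(w * proto[d] for w, proto in zip(weights, prototypes))
                    for d in range(dim)])
    return out


random.seed(0)
PATCH_LEN, STRIDE, DIM, N_PROTOTYPES = 8, 4, 16, 5   # illustrative sizes
closes = [100 + math.sin(t / 3.0) for t in range(32)]  # toy closing prices
center = sum(closes) / len(closes)
normalized = [c - center for c in closes]              # simple mean-centering
patches = make_patches(normalized, PATCH_LEN, STRIDE)
# Random stand-ins for learned weights and for prototypes derived from word embeddings.
query_proj = [[random.gauss(0, 0.1) for _ in range(PATCH_LEN)] for _ in range(DIM)]
prototypes = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_PROTOTYPES)]
embedded = reprogram(patches, query_proj, prototypes)

if __name__ == "__main__":
    print(f"{len(embedded)} patch embeddings of dimension {len(embedded[0])}")
```

Each output vector is a convex combination of the prototypes, so it stays inside the region of embedding space spanned by them, which is the part that is supposed to make the input look "language-like" to the frozen model.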

csantini

The market is like an ecosystem. There are huge mammals (investment banks, hedge funds) that look at certain types of prey, and there are smaller rodents that only eat very tiny worms.

In high volatility regimes, i.e. stocks with low market cap, the market is far from efficient. Hedge funds are not even looking at stocks with 100M market cap.

There are traders that act in these regimes that beat the market, exactly because they play small.

Anyhow, most people would be better off assuming the market is completely efficient.

thunder-blue-3

fwiw, I tried something similar about 5–10 years ago. I wasn’t using LLMs like the abstract here suggests, and honestly, I’m not sure how you'd act on a signal fast enough with them. When I gave it a shot, there was some slight predictive value, but in the end it felt like noise and gambling, so I moved on.

mkoryak

So is it "tradExpert" or "tradeExpert"?

Shortening variable names from words by a vowel or two is a hge pt peev.

languagehacker

How did it perform against a boglehead portfolio? Were fees and commissions included? Seems weird to evaluate performance over a single year for trades. Much more interested in long-term growth over one or more market cycles.

Fade_Dance

I think if you're talking at the portfolio level, a lot of these things are used as signals and parts of a greater whole.

In that sense, they can indeed add value. My current project is a modern version of a classic Harry Browne portfolio with even asset allocations to gold/bonds/equity/commodities, with optional layers of sophistication according to spec needs.

Something like systematic macro could be analyzed as a standalone return stream, but it's more useful when considered as an input into allocation/leverage adjustments (ex: if geopolitical uncertainty readings are high, cut down the trend following exposure). Even the more robot/quant stuff like vol trading feeds back into the wider portfolio management and portfolio construction level to some degree.

AIorNot

So did it make any money?

jsheard

If it did, I think it's safe to assume they would have sold it to a quant firm and we never would have heard about it.

TechDebtDevin

This. It is a gimmick.

kinnth

It doesn't look like it. There isn't a clear comparison to a "human trader".

IronyMan100

One drawback here is the backtesting. They backtested only one year, 2023.

infecto

Not a very good study. I did not look at any of the researchers' backgrounds, but it’s like they did not consult their respective finance school departments.