Building a BitTorrent client from the ground up in Go (2020)

pzmarzly

It is easy to make a torrent client, but very hard to make a good torrent client. A very good, or let's say "perfect", one needs to support multiple transport protocols (TCP, "uTP" aka UDP, "WebTorrent" aka WebRTC), multiple discovery mechanisms (DHT, PEX, HTTP trackers, WebSocket trackers), multiple torrent formats (v1, v2, hybrid), should use the network optimally (max the speed without overloading the network - IIRC some clients measure average packet latency and if it starts going up, put some backpressure), resolve magnet URLs, set up port forwarding, reconfigure firewalls, offer an API for the *arr stack, be a good netizen (report stats correctly, send packets within the specs, do not spam - otherwise other clients will blocklist you in their code or config), implement many BEPs (mutable torrents are cool), be able to recover from interrupted state based on only the data that's on disk, have configurable downloading order (people want to start playing videos before they finish downloading, so you may want to e.g. download header and footer of each file first), and ideally detect duplicates between torrents (cross-seeding). And then there will be people throwing 2TB+ torrents at it (e.g. TLMC) to benchmark it, and saying your client is "literally unusable" if it doesn't handle it.

So, building a "perfect" torrent client from the ground up is a daunting task. But the "good" news is that nobody built such a "perfect" client just yet, so if you have some spare months of your time, you can take a shot at it. Or even better yet, open the issue tracker for one of the popular clients or libraries, and add one of the missing features from the list above.

blacklion

Also, a good disk I/O scheduler is needed, or you can thrash even striped NVMe disks, not to mention spinning rust.

Good multithreading without any global locks is a must, or your UI/remote control becomes unresponsive with a large number of active torrents.

Also, it needs a daemon mode, which can be compiled & installed without any GUI libs (like Qt, GTK, Wayland, libx11, anything like this), and a fast, responsive remote GUI for all platforms which allows managing thousands of torrents simultaneously (I'm looking at you, <any-existing-torrent-client>). It could be a Web UI, but so far I haven't seen a usable one. The only thing that semi-works is Transmission + Transmission-Remote-GUI written in Pascal. But it is effectively abandonware and works worse and worse with each Transmission update.

And it should be scriptable, to allow automating things like a torrent being replaced on the tracker (you need to detect it by the tracker-specific "Torrent not registered" error response and re-download the torrent, building the URL from, for example, the "comment" field of the torrent, again in a tracker-specific way).

bool3max

> But the "good" news is that nobody built such a "perfect" client just yet,

Which of the qualities you listed are lacking from the currently most popular torrent clients (qBittorrent, etc.)?

pzmarzly

I am using qBittorrent, and off the top of my head:

- WebTorrent and WebSocket patch for qBittorrent is ready but not merged (waiting on libtorrent's decision),

- cross-seeding support is poor (a separate "cross-seed" binary can be used to set up hardlinks to fool qBittorrent into cross-seeding, but it cannot detect duplicates on its own)

- when it comes to torrent management, there is no way to group torrents into groups with common settings (important if you use multiple private trackers) - people recommend having multiple installations of qBittorrent side by side

- when it comes to reconfiguring NAT and firewall, qBittorrent supports the UPnP IGD protocol, but I am not sure about NAT-PMP and PCP

- I have never seen qBittorrent connect to a single IPv6 peer - so I don't know if the support is there

- download order - you can choose "download in order" or "download rarest first". I don't think "download in order" downloads footers, so mp4 files won't work (IIRC mp4 stores metadata in the footer, mkv in the header)

Marsymars

> - when it comes to torrent management, there is no way to group torrents into groups with common settings (important if you use multiple private trackers) - people recommend having multiple installations of qBittorrent side by side

Probably easiest to do this via docker-qbittorrent-nox.

> - I have never seen qBittorrent connect to a single IPv6 peer - so I don't know if the support is there

The Linux ISOs (not a euphemism) I'm seeding probably get about a third of their peer connections via IPv6.

anacrolix

See my other comments, but my client fully supports Webtorrent and webtrackers.

Not much support for port forwarding.

IPv6 definitely works.

Download order is handled by providing readers directly into torrent data and using that for prioritization. So basically request what you need when you need it. No arbitrary list of algorithms.

silotis

> NAT-PMP and PCP

qBittorrent (via libtorrent) supports NAT-PMP and PCP

> IPv6

qBittorrent supports IPv6

WithinReason

qbittorrent has "download first and last pieces first" (in addition to "download in order") so the footer is retrieved
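A "first and last pieces first, then in order" schedule like the one described above can be sketched as a simple piece ordering. This is an illustration only, not qBittorrent's or libtorrent's code: `pieceOrder` and its `edge` parameter are hypothetical, and the sketch treats the torrent as a single file, whereas a real client would presumably apply this per file.

```go
package main

import "fmt"

// pieceOrder returns piece indices in a streaming-friendly order: the first
// and last `edge` pieces come first (header and footer), then everything
// else sequentially.
func pieceOrder(numPieces, edge int) []int {
	order := make([]int, 0, numPieces)
	seen := make([]bool, numPieces)
	add := func(i int) {
		if i >= 0 && i < numPieces && !seen[i] {
			seen[i] = true
			order = append(order, i)
		}
	}
	for i := 0; i < edge; i++ {
		add(i)                 // head of the file
		add(numPieces - 1 - i) // tail of the file
	}
	for i := 0; i < numPieces; i++ {
		add(i) // remaining pieces in order
	}
	return order
}

func main() {
	fmt.Println(pieceOrder(8, 2)) // prints [0 7 1 6 2 3 4 5]
}
```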

andai

Personally, "being as small as uTorrent used to be", but clearly that's not a deal-breaker. (Then again, neither is like two thirds of the stuff GP mentioned!)

pzmarzly

To be fair, circa 2010 all torrent clients got "good enough", so my wishlist above is just a wishlist (because torrents are cool and I like to see progress). I use (and sponsor) qBittorrent myself and have no plans to change.

wincy

Last I tried it their hybrid torrent implementation was very broken and would fail half the time.

thrdbndndn

For me, the hardest part of building a good BT client (or any software, really) is getting the UX right.

These days, I use QB after uTorrent's downfall, but even after all these years, its UX still isn’t quite there.

szvsw

As someone who stopped torrenting around 2015 or maybe 2016, what happened to uTorrent? Also RIP what.cd, I still am sad that it is gone.

kohbo

Became adware

pzmarzly

qBittorrent + VueTorrent is not perfect, but it's hard to find anything better. If you haven't tried it, take a look.

https://github.com/VueTorrent/VueTorrent

Eikon

> but very hard to make a good torrent client

> So, building a "perfect" torrent client from the ground up is a daunting task.

Don’t you think you’re exaggerating a bit? It’s not daunting by any stretch. The feature set you described is fairly straightforward and something even a beginner developer could tackle without too much hassle.

I’m honestly kind of tired of seeing people act like doing anything these days is some impossible feat.

dgroshev

It's not impossible, it's just hard to be better than libtorrent, otherwise a new client only makes sense as an exercise.

Arvid Norberg is hardly a "beginner developer", yet look how hard it is to make a performant implementation of uTP: https://github.com/arvidn/libtorrent/issues/3542

anacrolix

My client anacrolix/torrent has most of that, including streaming, hybrid, v2, all trackers and protocols.

I don't do much port forwarding, it's too fiddly. I think I provide upnp out of the box and few other variants that were contributed.

I also have all the DHT extensions, like mutable torrents and get/put etc.

Large torrents do pop up occasionally. It used to happen every few months that someone found a performance issue there, but it's been a long time since the last one. Lots of optimisation.

GitHub.com/anacrolix/torrent

a-french-anon

Any clean room impl. is good news to me, but is the support for private torrents/trackers complete (https://github.com/anacrolix/torrent/issues/531)?

pzmarzly

I know your library does a lot, I have been your GH Sponsor for months ;) Thanks for reminding me about it, I wanted to up the amount. And to thank you for working on such cool stuff and making the ecosystem better.

I missed the upnp support, thanks for correcting my beliefs.

Which client with WebUI would you recommend the most? I tried using exatorrent and distribyted, but was running into bugs.

dgroshev

Few more challenges:

- seeding a million active torrents (this is entirely possible and needed but drives a lot of architecture)

- a UI for a million active torrents

- be nice to trackers (keep-alive, batched reports; again, architecture gets tricky, iirc libtorrent can't do that still)

- correct pipelining

- congestion control on uTP

- I/O scheduling/batching

- testing all of the above

It is quite a lot.

blacklion

Looks like we have same experience with torrents :-)

dgroshev

Yeah. I was working on [0] for a while (I might get back to it when I have more time) and implemented a chunk of the protocol with an eye for a more scalable libtorrent replacement, and it's not entirely trivial. Even things like the piece picker get trickier at scale.

[0]: https://dgroshev.com/blog/torrentdyne-encryption/

6510

Alternatively, a minimal client with few features that can seed large numbers of torrents without crashing or freezing. Settings for how many it may seed at a time, prioritizing what needs seeding and ignoring overly seeded things. Options to seed only blocks no one has. Remotely controlled.
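The "seed at most N at a time, prioritize what needs seeding, ignore overly seeded things" policy above might look roughly like this. It is a sketch under assumptions: the `torrentStat` fields, the `maxSeeders` cutoff, and ranking by seeders per leecher are illustrative choices, and it assumes seeder and leecher counts are available from tracker scrapes.

```go
package main

import (
	"fmt"
	"sort"
)

// torrentStat holds swarm counts for one torrent (assumed to come from the tracker).
type torrentStat struct {
	Name     string
	Seeders  int // other seeders in the swarm
	Leechers int // peers still downloading
}

// pickToSeed returns up to maxActive torrent names. It skips torrents with no
// demand or with more than maxSeeders seeders, and prefers the torrents with
// the fewest seeders per leecher.
func pickToSeed(stats []torrentStat, maxActive, maxSeeders int) []string {
	var cands []torrentStat
	for _, s := range stats {
		if s.Leechers > 0 && s.Seeders <= maxSeeders {
			cands = append(cands, s)
		}
	}
	sort.SliceStable(cands, func(i, j int) bool {
		// Compare seeders/leechers ratios without floats: a/b < c/d <=> a*d < c*b.
		return cands[i].Seeders*cands[j].Leechers < cands[j].Seeders*cands[i].Leechers
	})
	if len(cands) > maxActive {
		cands = cands[:maxActive]
	}
	names := make([]string, 0, len(cands))
	for _, c := range cands {
		names = append(names, c.Name)
	}
	return names
}

func main() {
	stats := []torrentStat{
		{"A", 50, 5}, // overly seeded, skipped
		{"B", 1, 10},
		{"C", 0, 3}, // nobody else seeds it
		{"D", 2, 0}, // no demand, skipped
		{"E", 4, 4},
	}
	fmt.Println(pickToSeed(stats, 2, 20)) // prints [C B]
}
```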

anacrolix

I had a super seeder implementation that seeded over 400k torrents simultaneously using some fancy callbacks at certain points in the protocol.

If this is actually a common use case, I can resurrect it into a usable form for the public if there's interest or funding. https://github.com/anacrolix/torrent

6510

https://www.reddit.com/r/DataHoarder/comments/wcusy1/torrent...

The datahoarders run multiple clients. It's a workable solution but far from ideal.

newsclues

Given the state of software quality, it is remarkable that torrent clients are SO GOOD, given they are free!

anacrolix

There's only a handful of actually complete ones. For a few, it's surprising they even work: they're crazy complicated and weird, but once they're stable they need very little maintenance. They're all also written in very different styles due to a mix of C/C++ and very high concurrency.

ktpsns

This looks simple not only because the article is written well but also because Go is the go-to-language for complex networking situations. Doing things in parallel, even pipelining? This would make quite some spaghetti algorithm in C/C++, even async rust/python world would not look so clean as in Go. This is clearly a big strength of the language.

pkolaczk

> even async rust/python world would not look so clean as in Go.

That’s quite debatable and my experience is different. There is a whole lot of high level stuff that can be expressed with eg async streams and functional transformation chains in Rust, that Go has no counter offer for. Same for being able to use any future in select/join not just channels. Also I find cleanup / error handling in Rust much cleaner.

z0ltan

[dead]

cherryteastain

Agree with C and old style C++, but going off the flowchart [1] in the article this could be done quite cleanly with boost::asio and C++20 coroutines as well.

[1] https://blog.jse.li/torrent/download.png

johnisgood

> Go is the go-to-language for complex networking situations

And Elixir / Erlang for serious ones.

cybrox

Rarely on the client side.

johnisgood

Why though, especially with Phoenix? (HTML HEEx, etc.)

dataviz1000

Unfortunately, BEAM VM is very slow compared to multiple other languages including Go. It's great when starting out with a few developers, however, since Go and Rust are much more performant, it is possible to hire several more Go or Rust developers with the server cost savings instead of being tied to Elixir. There is always a trade off between easy to use and high performance.

But, yeah, if I was going to bootstrap a startup at the seed level, Elixir is the best choice for backend. If I'm spending $500K+ a year on infrastructure, I'll be looking at Go and Rust.

cultofmetatron

> Unfortunately, BEAM VM is very slow compared to multiple other languages including Go.

for cpu bound tasks? sure. but we are talking in the context of networking. elixir is going to absolutely smoke go for applications requiring a lot of simultaneous connections. We can already see it in Action Cable vs Phoenix Channels. Channels supports an order of magnitude more simultaneous websocket connections per machine.

cultofmetatron

> Go is the go-to-language for complex networking situations.

laughs in elixir

neonsunset

And Go is not even the best or fastest language at this either - Goroutines are quite a caveman API, with footguns solved by better languages.

lqet

Very cool, and surprisingly simple. I was always mystified as a kid how these file-sharing protocols actually work, but if you just look at a torrent as a collection of fixed-size pieces you are asking peers managed by a tracker for, it becomes really straightforward to implement.

Now I wonder how clients protect themselves against abusers (i.e. people who never upload a single bit but only download). I often noticed that when I set the maximum upstream to 0, clients would stop sending me pieces. Do clients share a predefined list of configuration parameters with each other or with the tracker (max upstream, max downstream, etc.)? Or is it something more sophisticated?
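The "collection of fixed-size pieces" view above comes down to a little arithmetic. As a minimal sketch (the function names and the example sizes are arbitrary, not taken from the article):

```go
package main

import "fmt"

// numPieces is the ceiling of totalLen / pieceLen.
func numPieces(totalLen, pieceLen int64) int {
	return int((totalLen + pieceLen - 1) / pieceLen)
}

// pieceBounds returns the byte range [begin, end) covered by piece `index`
// in a torrent of totalLen bytes split into pieces of pieceLen bytes.
func pieceBounds(totalLen, pieceLen int64, index int) (begin, end int64) {
	begin = int64(index) * pieceLen
	end = begin + pieceLen
	if end > totalLen {
		end = totalLen // the final piece is usually shorter than the rest
	}
	return begin, end
}

func main() {
	const total, piece = 1_000_000, 262_144 // example sizes, chosen arbitrarily
	n := numPieces(total, piece)
	for i := 0; i < n; i++ {
		b, e := pieceBounds(total, piece, i)
		fmt.Printf("piece %d: bytes [%d, %d) len=%d\n", i, b, e, e-b)
	}
}
```

With these example sizes there are four pieces, and the last one is only 213568 bytes long.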

rakoo

Sufficiently evolved torrent clients follow a tit-for-tat protocol:

- If you're nice with me I'll be nice with you

- If you're mean with me I'll be mean with you

This is the best answer for the Prisoner's dilemma (https://en.wikipedia.org/wiki/Prisoner's_dilemma) in the long run, i.e. a situation where peers don't trust each other but will both gain if both cooperate.

In bittorrent it's typically implemented as follows: peers start by sending a very minimal content, and see what the other replies. If they reply with low enough latency then slowly increase the amount of content that is sent and see if they reply with the same increased amount of content; if they do, continue up to the max of what the network link allows (in combination with other peers of course). If at some point the other peer doesn't send something equivalent (even though we know they have it and we asked for it) then that peer can be cut off for not being cooperative.

Situation is different for seeds of course, because they have everything and want nothing, but they can have a similar behaviour -- start sending a little, increase slowly over time

__MatrixMan__

It's only the best answer if the other party is bothering to seek the best answer in real time, and in ways that you can't predict. If your opponent is governed by some algorithm, which is the case for torrent clients, you may need to depart from tit for tat in order to break them out of a needlessly adversarial mode. Or, if you're feeling like the bad guy today, you might need to depart from tit for tat to put them in an irrationally cooperative mode.

andai

> [tit for tat] is the best answer for the Prisoner's dilemma in the long run

This is fascinating. Have there been any simulations about this? I'm sure they've looked into this in game theory, but I'm wondering if you have a big swarm of torrent clients, does the scale of the population change the outcome, or is it the same?

(It seems like it would be wise when designing the torrent clients to run such simulations, so I'm thinking this probably has been done at some point.)

salviati

Check this out for a nice explanation of why tit for tat is a good strategy for the repeated Prisoner's dilemma: https://ncase.me/trust/

BonoboIO

Didn't know that clients do this tit-for-tat thing.

In an ideal world the network would prioritize uploading to high-speed peers such as webseeds, so the webseeds can distribute even faster for everyone else.

Or am I mistaken, and the network already distributes efficiently so this isn't needed?

jedimastert

> Now I wonder how clients protect themselves against abusers (i.e. people who never upload a single bit but only download).

It's called leeching, and it depends. It's typically considered good etiquette to upload as much as you download and that's usually enough, but it can be enforced

https://en.wikipedia.org/wiki/Leecher_(computing)

hoseja

I have the opposite problem, many things I download seem so well-seeded (seedboxes?) that I rarely achieve any sort of respectable ratio even if my uplink idles most of the time. It might be a client problem but I haven't found many people discussing this. TAKE MY BITS!

__MatrixMan__

I've often considered writing the equivalent of a robotrader for this. If you're paying for a seedbox and want to use it to the fullest re: maximizing your ratio then some automated attempt to predict demand and start seeding the right thing early would likely go a long way.

Unlike predicting the stock market, competition wouldn't be very fierce, and I think the tracker API would give you most of what you need.

It's in the long list of things I'll do if I ever find a big pile of time.

anacrolix

See my other comment, good seeders carry most of the network these days. Upload bandwidth has become huge since BitTorrent first debuted so true P2P for resource reasons is not really a necessity anymore.

Akronymus

Over here (Austria), you are pretty free to leech as much as you want, but as soon as you actually start seeding at any capacity, people start paying very close attention.

ratorx

IIRC, when both peers only have some subset of pieces, they are meant to “trade” with each other, and so will eventually drop low value peers for higher value ones. Aside from the adversarial case, this helps optimise for being nearer to the people you are sharing with, since torrents are global.

In practice, BitTorrent really needs seeders who have downloaded the entirety of the file to be fast for everything except really popular downloads. Seeders don’t really check for fairness and will typically upload to whoever they can the fastest (with a limit on number of peers).

There’s an even more adversarial case because the unit of validation (a piece with a fixed hash from the spec) might be bigger than the chunks that are being shared individually. So it’s possible for a peer to fake having pieces and upload garbage data instead, and they wouldn’t be caught since different chunks came from different peers.

anacrolix

For the regular case, 5-10 seeders will hold up an entire torrent indefinitely and without any issues. The contribution leechers make drops off very quickly if good seeders exist.

My algorithm did favour the best peers (both upload and download) but a few years ago switched to a "seeded" ordering to prevent bad behaviour clients dumping or starving new peers.

The garbage uploaders are not an issue. You can isolate peers to pieces and remove them with certainty, or use a technique called smartban which uncovers bad peers very quickly.

ratorx

Sure, I didn’t mean to suggest that it was impossible to handle, just that it was an edge case that you needed to explicitly handle and which a minimal implementation of the protocol might be likely to leave out.

__MatrixMan__

Wouldn't the peer uploading garbage data be caught immediately because the garbage wouldn't hash to the correct value? I imagine most clients would stop talking to such a peer rather quickly.

ratorx

Torrents are split into pieces, each of which has a hash that allows it to be validated independently.

However, these pieces can themselves be large, so it is often recommended to split them into smaller chunks that are sent one at a time. You cannot validate a chunk independently, and if you assemble a piece from chunks from multiple origins, you can’t immediately tell which origin is bad.

You need to explicitly handle this case, like the sibling comment mentioned by isolating peers who may have sent bad data and forcing validation.

Trung0246

There are private trackers which force their users to upload.

kamranjon

I was just thinking about how incredible BitTorrent was the other day, how it really does outperform non-distributed solutions in many cases and how it is one of the rare examples of that.

One thing I was thinking though was that the finding of Peers seems to be the bottleneck - that if peer resolution could be almost instantaneous, the BT protocol could be used for so many more use cases.

Does anyone know if this part of the process could ever be improved or does that just come with the territory?

snalty

CodeCrafters has a BitTorrent client building challenge that I watched Jon Gjengset do on a stream, if someone wants to try and build their own client in an iterative and checked way.

https://app.codecrafters.io/courses/bittorrent/overview

DrammBA

It's worth noting that this is a paid service offering some free courses for a limited time. The BitTorrent course does not appear to be free at the moment.

donatj

This was post number 5 when I started reading it and 45 when I finished, I am guessing some people from the RIAA and MPA found their way to the flag button.

SoKamil

This looks surprisingly simple. A perfect project to learn programming language by creating thing you can interact with and not get overwhelmed by domain complexity.

Do you know more such problems?

jedimastert

There's an ad at the end of the post for something called CodeCrafters that appears to be a repository of such interesting problems

TheDong

Creating a simple ray tracer for learning a language’s vector math libraries and parallelization tooling.

An FTP client and server.

An IRC client for the older protocol version.

A gemtext parser and gemini client.

A redis clone supporting only the simplest operations.

A brainfuck interpreter.

Twitter.

A todo list program (maybe a little too hard for this list)

HumblyTossed

Oh wow. I remember all the "I wrote a twitter in a weekend. Why does twitter need a thousand engineers?" posts everywhere.

imbusy111

One thing that I find hard to understand is how you're able to connect to a random peer online with just an IP and port combination. The peer is likely behind a NAT proxy, and never communicated with you to begin with. In my head, NAT shouldn't allow this connection through, or be able to associate the port with the individual peer machine since the connection source (you) is arbitrary.

This is possible with port forwarding. But that's a niche set of peers, who have the power to configure port forwarding on a NAT proxy.

shellac

Most torrent clients I know support UPnP and/or NAT-PMP to request port forwarding from routers. As you say, it just isn't going to work for many people without this.

I run a BitTorrent service for an academic institution, to disseminate research data. We have a regular routable IP address, but still need to navigate the institutional firewall.

miki123211

There are a couple ways to do this:

1. Manual port forwarding, it's likely that there are at least a few power users who already have the torrent and are seeding who have this set up. Seedboxes are a notable example, they're often simple servers that actually have a public IP.

2. UPNP, a protocol that lets you ask your router to set up a temporary port forward for you. Again, not all peers support this, but some do, and you can just connect to those.

3. Hole Punching. Imagine Alice is sending data to Bob, and her router ends up sending it from port 1234. Her router needs to send the packets it receives on port 1234 back to Alice's computer, to allow her to receive Bob's responses. Some routers will do this no matter which IP the packets are coming from. If Bob tells Alice her router is sending from port 1234, she can spread that fact to others and let them contact her that way.

I don't know if BitTorrent clients take advantage of this specifically, but it's a very common way of doing NAT traversal in general.

NAT is why private trackers have the concept of "connectability": if you're "connectable", it means you can accept connections from other clients. Crucially, if just one of the peers is connectable, they can both communicate, so connectability is heavily encouraged but not required.

anacrolix

There's a less commonly known feature of NAT (maybe called cone NAT or something), where the remote endpoints are not checked for incoming packets and connections. You communicate with trackers and DHT and as usual your outbound packets are mapped to some port. But everyone else sees your public IP and port and talks about your client using that pair. They then also communicate with you over UDP to your public port. As long as you have regular outbound traffic running through that mapping, NAT will keep the hole alive. I think with testing I found at least half of people on NAT had this. It's less common on mobile and fibre.

pzmarzly

Most clients try to set up port forwarding using the UPnP IGD, NAT-PMP, or PCP protocol; a lot of residential routers support one of them.

If you enable WebTorrent as a transport protocol (enabled in gotorrent, disabled by default in libtorrent), it should be possible to use existing public STUN/TURN infrastructure, but I don't know if any client does it yet.

In practice, you just have to accept that many connections will simply fail, and make your client move on to try a different peer.

anacrolix

Yeah I believe WebRTC connections use stun/turn. The feature was contributed and just has a static list somewhere that works fine.

anacrolix

The other answers are correct about only needing a single peer among any pair of peers to be addressable.

However there is a built-in hole punching mechanism in BitTorrent where peers ask for a third peer to assist in hole punching.

It's implemented in my client. It was very painful to implement. I think someone privately funded the feature which was very nice.

untitaker_

the answer to this problem in general is NAT hole punching, but BitTorrent doesn't actually have an answer to your problem. if you are behind a NAT, you can only connect to peers that are not behind a NAT or have port forwarding set up. for popular torrents this is good enough because you don't have to connect to all peers.

> This is possible with port forwarding. But that's a niche set of peers, who have the power to configure port forwarding on a NAT proxy.

yes it's niche but I guess this means BitTorrent isn't as P2P in practice as one wants it to be, but held up by seedboxes.

indulona

i wanted to do the same when i got fed up with bt clients. but then i found there already are go clients and i found i like qbt. but it is a good read nevertheless.
