
Backblaze seemingly does not support files greater than 1 TB

aosaigh

I like the log message "BadBadBadChunkRecord". I wonder what BadChunkRecord and BadBadChunkRecord are? Is there a VeryBadBadBadChunkRecord?

I've been trying to replace all my various backups (Time Machine, Backblaze, CCC) with a single tool (Arq - https://www.arqbackup.com/).

The Backblaze client is the next to go. To be honest, I haven't had too many issues with it, but the restore interface in particular is pretty poor and slow.

jagged-chisel

I think it’s a triple negative. A BadChunkRecord is, well, a bad chunk record. A BadBadChunkRecord is a chunk record that’s bad at being bad, so it’s good. One can see the logical progression that leads BadBadBad… to be a bad one.

Of course, being bad at being bad could just be a different kind of bad. A good BadChunkRecord explains the problem with the chunk. A bad BadChunkRecord might have too little information to be a good BadChunkRecord. A bad BadBadChunkRecord could be an unhandled exception with no other details and the fact that a ChunkRecord is even involved is assumed and therefore questionable.

smatija

That's a base-4 number system. Next in the progression is WorseChunkRecord, then BadWorseChunkRecord, and so on up to WorseWorseWorseChunkRecord, until you finally get to HorribleChunkRecord.

Moru

Nah, NGU standard is bad, badbad, badbadbad, x4 bad, x5 bad and so on

Crosseye_Jack

At least the file wasn't Michael Jackson bad (bad, bad, really, really bad).

redleader55

I'm speculating here based on my experience working on a storage fabric product in the past.

First - it's sensible to have limits on the size of the block list. These systems usually have a metadata layer and a block layer. In order to meet a latency SLO, some reasonable limit on the number of blocks per file needs to be imposed. This also prevents creating files out of many small blocks. More below.

Second - blocks are usually designed to match some physical characteristic of the storage device, but 10 MB doesn't match anything. 8MB might be a block size for an SSD or an HDD, or 72MB=9*8MB might be a block size with 9:16 erasure coding, which they likely have for backup use-cases. That being said, it's likely the block size is not fixed at 10 MB and could be increased (or decreased). Whether or not the client program that does the upload is aware of that is a different problem.
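
For a rough sense of where a 1 TB ceiling could come from, here's a back-of-the-envelope sketch. The 10 MB chunk size is the figure mentioned above; the 100,000-entry cap on the per-file chunk list is purely an assumed value for illustration, not anything Backblaze has confirmed.

    # Back-of-the-envelope: a fixed chunk size times a cap on the per-file
    # chunk list gives a hard ceiling on file size.
    CHUNK_SIZE_BYTES = 10 * 1000 * 1000      # 10 MB chunks (as mentioned above)
    MAX_CHUNKS_PER_FILE = 100_000            # assumed cap, not confirmed anywhere

    max_file_bytes = CHUNK_SIZE_BYTES * MAX_CHUNKS_PER_FILE
    print(max_file_bytes)                    # 1_000_000_000_000 -> exactly 1 TB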

Aeolun

This seems reasonable (no files larger than 1TB sounds fair; I don't think anyone reads 'no file size restrictions' and imagines a one-terabyte file), and at the same time not, because support should realize that the file size might have something to do with the issue and get someone technical involved.

davidt84

I read "no file size restrictions" and assume I can create a file as large as the storage space I can afford.

What else would I assume?

If there's a 1TB limit I would expect that to be described as "create files up to 1TB in size".

eterevsky

I would assume some limit no higher than 2^64, since all common file systems have file size limits: https://en.wikipedia.org/wiki/Comparison_of_file_systems

swiftcoder

Undocumented limits are a classic way to blow up traffic to your support centre. Please folks, if your service has limits, document them!

OtherShrezzing

Documenting also helps your own engineers understand that there are limitations. The back-end team might understand that there's a 1TB limit. If the front-end team doesn't, they could cost you a tonne of bandwidth uploading 99% of a 1.001TB file to the server before it gets rejected.
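
A documented limit also lets the client refuse oversized files before any bytes leave the machine. A minimal sketch of such a pre-flight check (the 1 TB constant here is hypothetical, not a documented Backblaze value):

    import os

    MAX_FILE_BYTES = 1_000_000_000_000  # hypothetical documented limit: 1 TB

    def should_upload(path: str) -> bool:
        """Skip files the server would reject, before burning any bandwidth."""
        size = os.path.getsize(path)
        if size > MAX_FILE_BYTES:
            print(f"skipping {path}: {size} bytes exceeds the {MAX_FILE_BYTES}-byte limit")
            return False
        return True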

madeofpalk

If it actually wasn't supported, surely the client would know this and wouldn't attempt to continuously upload it.

Seems like there is a bug somewhere - either in artificially imposing a file size limit, or in not enforcing the limit correctly.

jagged-chisel

> … surely the client would know this

"Should", most certainly. However, client teams often don't work so closely with backend teams. Or the client team is working from incorrect documentation. Or the client team rightfully assumes that when it reports a file size before uploading, the backend will produce an acceptable error to notify the client.

baobabKoodaa

Oh my, this almost sounds like it just might have unintended bad effects when you lie in your marketing about what limits your service has.

mort96

It's a back-up service, not a cloud storage service. If I have a 1TB file on my machine, and I want to back up my machine, I want that 1TB file to be backed up.

londons_explore

And a 1TB file is pretty common if you take a backup of a 1TB disk drive into a disk image.

mort96

Certainly - I've regularly had single image files and tar.gz files of that magnitude on my machines, which are precisely backups of old laptops from when I got a new one, etc.

kaivi

These limits are not reasonable at all. You are going to curse Backblaze or AWS S3 before you learn never to pipe the output of pg_dump into an object store like that.

dpacmittal

Why couldn't they just say there's a 1TB limit?

baobabKoodaa

This is what bothers me as well. I don't think you would find a single customer who would go "what, only 1TB files?"

I guess different kinds of people are drawn to different careers, and people with loose morals, hubris, and a propensity to lie are drawn to marketing. Don't even need an incentive to lie, it's just who they are.

pbalau

> I don't think you would find a single customer who would go "what, only 1TB files?"

I think you will find that there are people who see "limit: <humongous>" and pick the "no limit" (or no limit stated) option instead.

> people with loose morals, hubris, and a propensity to lie are drawn to marketing

You are jumping the gun here.

tobyhinloopen

I'm going to ask the wrong question here, but... What files are >1TB? Raw video footage?

earth-adventure

Encrypted disk volumes from virtual machines for example.

bifftastic

Backblaze does not back up disk images. It's a documented limitation.

I just found this out the hard way.

dataflow360

There are certain file types that are excluded by default, but you can adjust these in BB's Prefs.

(Disclaimer: I'm a BB customer with 70TB backed up with them.)

baobabKoodaa

That can't be right. After all, "Backblaze automatically finds your photos, music, documents, and data and backs it up. So you never have to worry."

chlorion

A disk image is just a file though. Does it do some sort of analysis of files and block disk images somehow?

Maybe you mean that it doesn't do image-level backups by default?

TrackerFF

With any GHz sampling system, you can rack up TB of data very fast if the system/your setup allows for it.

Now imagine how much data the Large Hadron Collider can generate in a second.
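
Rough numbers for a single channel (the sample rate and sample width below are illustrative assumptions, not LHC figures):

    # A 1 GS/s digitizer at 2 bytes per sample writes 2 GB every second,
    # so a single capture passes the 1 TB mark in under ten minutes.
    SAMPLE_RATE_HZ = 1_000_000_000      # 1 GS/s (assumed)
    BYTES_PER_SAMPLE = 2                # 16-bit samples (assumed)

    bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE
    seconds_to_1tb = 1_000_000_000_000 / bytes_per_second
    print(seconds_to_1tb)               # 500.0 seconds, i.e. roughly 8.3 minutes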

eptcyka

Disk images, archives.

Symbiote

I've handled scientific data sets of that size (per file).

vman81

I'm guessing poorly configured log files?

baobabKoodaa

VeraCrypt containers

Borg3

The question is where that limit comes from. It sounds weird. A 40-bit file size record? Why?

I was recently fixing my own tool here and had to shuffle some fields around. I settled on 48-bit file sizes, so 256TB for a single file. Should be enough for everyone ;)
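
For reference, the ceilings those field widths imply are just powers of two:

    # A 40-bit length field tops out at 1 TiB; 48 bits raises that to 256 TiB.
    print(2 ** 40)   # 1_099_511_627_776 bytes   = 1 TiB
    print(2 ** 48)   # 281_474_976_710_656 bytes = 256 TiB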

spuz

It sounds like an arbitrary limit set by some engineer without too much thought and without any incentive to document the decision. I do this sometimes as well when dealing with untrusted data. Sometimes you don't want to have to think about the knock-on effects of supporting large file sizes, and it's easier to just tell the client please don't do that.

raverbashing

Honestly? Be a good customer. Help them help you, and help yourself in the meantime.

Having a 1TB file sucks in way more ways than just "your backup provider doesn't support it". Believe me.

aleph_minus_one

The problem is that it is often possible to solve the problem going forward, but you cannot get rid of the existing large files (if only for legacy reasons).

One example: A Git repository that I created contained files that are above GitHub's file size limits. The reason was a bad design in my program. I could fix this design mistake, so that all files in the current revision of my Git repository are now "small". But I still cannot use GitHub because some old commit still contains these large files. So, I use(d) BitBucket instead of GitHub.

Symbiote

You can rewrite the history of the repository to remove the huge files, if you wish.

Of course, all the commit identifiers change.

aleph_minus_one

> You can rewrite the history of the repository to remove the huge files, if you wish.

These old files nevertheless contain valuable data that I want to keep. Converting them to the new, much improved format would take serious effort.

rschiavone

If the repository is a personal one managed only by you, you can squash the commits so the large files disappear from the history.

You can do that on shared repos too, but it would cause a bad headache for the other maintainers.

znpy

On the other hand this means shifting responsibility to deal with past mistakes onto somebody else.

Are you really sure you cannot rebase/edit/squash the commits in that git repository?

Yes, commit hashes will change, but will it actually be a problem?

aleph_minus_one

> On the other hand this means shifting responsibility to deal with past mistakes onto somebody else.

This was just one example. Let me give another one:

In some industries, for audit reasons, you have to keep lots of old data in case some auditor or regulator asks questions. In such a situation, the company cannot say "hey, sorry, we changed our data format, so we cannot provide the data you asked for" - that would immediately get the company into legal trouble. Instead, while one is allowed to improve things, perfect capability to handle the old data has to be ensured.

davidt84

Perhaps you could say something concrete, rather than vague waffle and assertions?

542354234235

Perhaps you could say something concrete, rather than vague criticism?
