Hacker News

It's wrong. The whole point of cat is to concatenate files. But if you concatenate two files with different encodings, you end up with an unreadable file. So you want cat to error out if one of the files passed to it has a different encoding from the one you told it to use, which is exactly what the Python cat will do.
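For illustration, here is a minimal sketch of such a text-mode cat in Python 3 (`text_cat` is a hypothetical name, not the linked article's code). Each file is opened with one declared encoding and open()'s default strict error handling, so bytes that are invalid in that encoding raise `UnicodeDecodeError` instead of ending up in the output. Note the limit: it only catches bytes that are invalid in the declared encoding. A Latin-1 file that happens to be pure ASCII decodes fine as UTF-8, and any byte sequence at all decodes as Latin-1.

```python
from typing import Iterable


def text_cat(paths: Iterable[str], encoding: str = "utf-8") -> str:
    """Concatenate files as text, decoding every file with one encoding.

    open() uses errors="strict" by default, so a file containing bytes that
    are invalid in `encoding` raises UnicodeDecodeError rather than
    producing mixed-encoding output.
    """
    parts = []
    for path in paths:
        with open(path, encoding=encoding) as f:
            parts.append(f.read())
    return "".join(parts)
```

For example, concatenating a UTF-8 file with a Latin-1 file containing "café\n" fails on the second file: Latin-1 stores é as the single byte 0xE9, and 0xE9 followed by a newline is not valid UTF-8.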


> The whole point of cat is to concatenate files.

Yes.

> So you want cat to error out if one of the files that was passed has a different encoding.

No. I expect it to read bits from stream A until it is exhausted and then read bits from stream B until it is exhausted, all the while just writing the ones and zeros it reads to the output stream (of bits). And no, a byte does not have to be "8 bits" (http://www.lispworks.com/documentation/HyperSpec/Body/f_by_b...).
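To make the bit-stream view concrete, a minimal sketch (the names `concat_bits` and `pack_units` are hypothetical): each stream is modelled as an integer plus its length in bits, most significant bit first, so units of any width, such as 9-bit bytes, can be concatenated without assuming 8-bit bytes. It only models the bits in memory; it does not read files.

```python
from typing import Iterable, Tuple

# A bit stream as (value, length_in_bits), most significant bit first.
BitStream = Tuple[int, int]


def concat_bits(streams: Iterable[BitStream]) -> BitStream:
    """Append each stream's bits after the bits of the previous streams."""
    value, length = 0, 0
    for bits, nbits in streams:
        if nbits < 0 or not 0 <= bits < (1 << nbits):
            raise ValueError("value does not fit in the given number of bits")
        value = (value << nbits) | bits
        length += nbits
    return value, length


def pack_units(units: Iterable[int], width: int) -> BitStream:
    """Pack fixed-width units (e.g. 9-bit bytes) into one bit stream."""
    return concat_bits((u, width) for u in units)


nine_bit = pack_units([0x1FF, 0x001], 9)   # two 9-bit bytes: 18 bits
eight_bit = pack_units([0xAB], 8)          # one 8-bit byte: 8 bits
combined = concat_bits([nine_bit, eight_bit])
print(combined[1])  # 26
```

The concatenation itself never needs to know what the units mean; it only needs each stream's length in bits.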


And concatenating a file of 9-bit bytes with a file of 8-bit bytes will produce something useful? No. If you don't know what your bits represent then you will corrupt them. Python does not need to faithfully reproduce all the historical oddities of Unix.


It might - depending on what I intend to do with the file (I still have the offsets of the individual files because I know the original file sizes).
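That recovery step is easy to sketch when the concatenation was done at the byte level: slice the result at the known sizes, and each piece can then be interpreted on its own terms, including with its own encoding. `split_by_sizes` is a hypothetical helper, assuming the sizes are given in bytes and in the original order.

```python
from typing import List, Sequence


def split_by_sizes(blob: bytes, sizes: Sequence[int]) -> List[bytes]:
    """Cut a byte-level concatenation back into its original pieces."""
    if any(size < 0 for size in sizes) or sum(sizes) != len(blob):
        raise ValueError("sizes must be non-negative and add up to len(blob)")
    pieces, offset = [], 0
    for size in sizes:
        pieces.append(blob[offset:offset + size])
        offset += size
    return pieces


# A Latin-1 "é" (1 byte) followed by a UTF-8 "é" (2 bytes).
blob = "\u00e9".encode("latin-1") + "\u00e9".encode("utf-8")
first, second = split_by_sizes(blob, [1, 2])
print(first, second)  # b'\xe9' b'\xc3\xa9'
```

Each piece then decodes to "é" with its own encoding: `first.decode("latin-1")` and `second.decode("utf-8")`.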

API-wise, in my humble experience, it's hell to deal with operations that are supposed to work on bit-streams but try to be smart and ask me for encodings of those - this is information I might not even have when building on those basic operations. The "oddities", as you call them, are the result of not over-abstracting the code to handle yet-unknown problems.

You want to concatenate text files with different encodings? Convert them. Expecting a basic tool to do this for you (or to carp about "problems") inevitably leads to cat converting image formats for you and demanding to know how to combine those images (addition, subtraction, appending to the left/top/right/bottom of the first image, etc.).
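A minimal sketch of the "convert them" step, assuming you already know each input's encoding (which is the hard part, and exactly the information a byte-level cat never needs). `convert_and_concat` is a hypothetical helper: it decodes each input with its own encoding and re-encodes everything into one target encoding before joining.

```python
from typing import Iterable, Tuple


def convert_and_concat(sources: Iterable[Tuple[bytes, str]],
                       target: str = "utf-8") -> bytes:
    """Decode each (data, encoding) pair, then re-encode all text as `target`."""
    return b"".join(data.decode(enc).encode(target) for data, enc in sources)


latin1_part = "caf\u00e9 ".encode("latin-1")
utf16_part = "na\u00efve".encode("utf-16")  # the utf-16 codec writes a BOM
result = convert_and_concat([(latin1_part, "latin-1"), (utf16_part, "utf-16")])
print(result.decode("utf-8") == "caf\u00e9 na\u00efve")  # True
```

The byte-level primitive stays ignorant of encodings; the conversion lives in a separate layer that the caller chooses to use.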


> API-wise, in my humble experience, it's hell to deal with operations that are supposed to work on bit-streams but try to be smart and ask me for encodings of those - this is information I might not even have when building on those basic operations. The "oddities", as you call them, are the result of not over-abstracting the code to handle yet-unknown problems.

That's how C ended up as the security nightmare that it is. There are a lot of things you can do in C that you can't do in Python - you can reinterpret strings or floats as integers, you can subtract integers from pointers, you can write to random memory addresses... Sometimes these things are useful, but most of the time they just lead to your program breaking in an unhelpful, nonobvious way.

Python is not that kind of language; it will go to some pains to help you not make mistakes. If you want to do bit-twiddling in Python there are APIs for it, and you could implement a "bit-level cat" using them, but it's never going to be the primary use case. Arguably there should be better support for accessing stdin/stdout in binary mode, but that would make it very easy to accidentally interleave text and binary output which would again result in (probably silent) corruption. (Writing a "binary cat" that concatenates two files of bytes would not lead to any of the problems in the linked article - it's only trying to use stdin/stdout that's causing the trouble in the link).
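A minimal sketch of such a binary cat in Python 3, assuming inputs are given as paths and `-` means standard input, as with cat(1). It copies raw bytes with `shutil.copyfileobj` and never decodes anything; for standard input it reads `sys.stdin.buffer`, the binary stream underneath Python's text-mode `sys.stdin`. `binary_cat` is a hypothetical name.

```python
import shutil
import sys
from typing import BinaryIO, Iterable


def binary_cat(paths: Iterable[str], out: BinaryIO) -> None:
    """Copy each input's raw bytes to `out`, in order, without decoding."""
    for path in paths:
        if path == "-":
            # Binary stream underneath the text-mode sys.stdin.
            shutil.copyfileobj(sys.stdin.buffer, out)
        else:
            with open(path, "rb") as f:
                shutil.copyfileobj(f, out)
```

From a script it would be called as `binary_cat(sys.argv[1:], sys.stdout.buffer)` followed by `sys.stdout.buffer.flush()`; unlike cat(1), this sketch does nothing when given no arguments. If the same program also prints text, calling `sys.stdout.flush()` before writing to `sys.stdout.buffer` keeps the text and the bytes in order, which is the interleaving hazard mentioned above.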


> That's how C ended up as the security nightmare that it is.

And that's how I ended up completely wasted at the party a friend threw after a GNU version of a common Unix tool turned out to accept his first name as valid parameters.

And no, C's problems are not rooted in encoding issues. Those are not even a first-class symptom.

> Python is not that kind of language

Maybe. I don't care much, even though I do speak Python fluently. What I do care about are minimal functional units whose documentation I can grasp in minutes, not hours.

> [Python] will go to some pains to help you not make mistakes.

This is not specific to Python; it is specific to [language] developers. If all you do is text processing, you will think in characters and their encodings. I don't. A lot of programmers don't, because they deal with real-world data that is almost never most efficiently encoded as text.

To reiterate: we are all dealing with bit-streams. Their semantics depend on their context. If your context is "human-readable text", deal with it. But please don't make me jump through hoops if I actually just want to deal with bit-streams. If you need magic to make your specific use case easier, wrap the basic ops in a library and use it.

Last but not least: all of this is completely off-topic when the question is "why python3". It's great for your use case, but from an abstract point of view, Python 3 was just the rational continuation of Python 2, cleaning up a lot of inherited debt. Though it might fit your worldview, that's not necessarily what it was about.


Do you actually work with systems that don't use 8-bit bytes, or are you just being pedantic?


Yes, I do. But the point I was trying to make was "it's bits, not bytes" (because "byte" has only recently been defined to mean 8 bits).



