This seems novel until one realizes public channels are already available on the web.[1] Also, I'm surprised Paul didn't go with Pyrogram[2] for creating the user accounts (which have unlimited cloud storage and 1.5GB file limits, btw).
The words 'mass surveillance' in the project title seem sensationalist to me. From his repository list, I suppose the author is involved in ads and marketing, which would figure.
Had I parsed Usenet feeds into a relational database and called it 'mass surveillance', I'd rightly have been ridiculed.
Regardless, it can be useful to have message data in this format.
I simply wasn't aware of Pyrogram; thank you for pointing it out.
And yes, I admit the headline is a bit charged, and the implementation doesn't live up to it as much as I would like.
That doesn't change the fact that I looked for a Telethon-based implementation for a use case I had, didn't find one, and so built it and shared it.
My involvement in ads and marketing had nothing to do with it. The reason is less nefarious: it just sounded cooler, and in retrospect it was an innocent mistake.
As mentioned above, I appreciate the mod(s) taking the time and energy to edit the title so it reflects the implementation. This is my first shared project on HN, and I am excited to finally participate and see the feedback/energy.
Sounds like a good old IRC channel logger, but for Telegram. The title is some serious clickbait.
That said, I haven’t checked in depth, but if this bot is using MTProto rather than the Bot API, that’s pretty significant: it appears as a normal user and not as a bot (bot usernames on Telegram are required to end in “bot”).
Yes, it does use MTProto, and yes, it is a real user. And yes, admittedly the title was a bit charged in retrospect, but the applications remain the same if the project is extended.
What prevents anyone from doing the same with Signal or any messaging service that allows one to build a client? This program isn’t pretending to be a bot on Telegram, and it works as a normal user in every way (including the requirement for a working phone number, even if it’s a burner number).
You can't ever assume that clients aren't logging everything that goes through them, even if there's no official documentation/API for custom clients. If a human can read a message, for all intents and purposes assume that a machine can too. For instance, things like Snapchat's self-destructing messages rely more on social norms than on technology.
Well, Signal comes with a system for verifying a person's identity, so you can be sure it's really someone you know and not an imposter. But sure, for semi-public channels that will let anyone in without verification, something like this would allow you to monitor it. Lesson: If you're using Signal to run a dissident network and organize protests, be sure to verify everyone before adding them to groups.
This doesn’t verify an identity; it verifies that somebody has access to a key that, at some point in the past, you chose to trust. Signal has no mechanism at all for verifying identity; verifying the authenticity of the safety number is entirely up to you.
Maybe I'm missing the point, and if so please correct me, but when I add new people on Signal, we do so in person, and there's a QR code with which one can "verify" the person. It obviously links the verified user with their phone, so theoretically someone could steal their phone and pretend to be them, I guess. But it does make them a "verified" user; it's just up to me to do the verifying.
The safety number is essentially equivalent to a self-signed certificate. You can use it to consistently identify a key holder, but it doesn’t offer you any way to identify who that key holder is. If you want to trust a self-signed cert, then figuring out whether you should or not is entirely up to you. This is the problem that CAs address with CA-signed X.509 certificates.
If you want to validate identity as a service, then the only options you really have are a central authority or a web of trust. Both have serious downsides, and Signal offers neither.
That's a generally correct statement, but the comment you replied to mentioned a specific case for which it's false.
If you can reduce the trust problem to every person verifying the identity of each of their contacts out of band, it becomes a lot simpler. For some Signal users, such as the person you replied to, this appears to be the case.
Marking a safety code as trusted in Signal is no different from adding a self-signed TLS certificate to the trust store on your computer. You can perform exactly the same out-of-band verification of the self-signed cert. Signal does not have a single feature that verifies the identity of your contacts, they don’t pretend to have such a feature, and frankly I’d say it’s dangerous to assert that they do. How you do that is 100% your own problem to solve. The only thing it does is provide you with a trust store to keep track of which contacts you have personally decided to trust.
I agree completely. I merely wanted to point out that your comment saying the problem is really hard was a reply to a person who has solved the trust problem with the tools Signal provides, by having a simple version of the problem (they always meet people in person first) and doing some work (verifying safety numbers).
This is effectively true: at baseline it is indistinguishable from a "real" person, which calls into question the security model of openness and raises the potential for mass social engineering. There is a reason why creating "real" phone numbers at scale is deliberately difficult.
I wholly admit the title was a bit charged, and I appreciate the correction from the mod(s). That said, it raises questions about the security model of privacy apps and the nature of open source. As I mentioned above, there is a reason account creation is made hard to scale.
It represents more than just a crawler if you have an imagination. That said, I was looking for a solution for one of the listed use cases and none existed, so I did something about it, shared it, and now one exists. It takes no creativity to deride a work, but it takes some to make one and put it out there.
Getting both sides of the feedback has been a fun learning experience, and I'm looking forward to putting more out given the amount of feedback! Thank you both :)
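To make the "trust store" framing concrete, here is a minimal trust-on-first-use sketch in Python. It is not Signal's implementation: TrustStore, check and mark_verified are hypothetical names, and a plain SHA-256 hex digest of the public key stands in for a real safety number. Note that nothing in it establishes who holds a key; the only identity input is the fingerprint you confirmed out of band and pass to mark_verified.

```python
import hashlib
from dataclasses import dataclass, field


def fingerprint(public_key: bytes) -> str:
    """Hex SHA-256 of a public key (a stand-in, not Signal's safety number format)."""
    return hashlib.sha256(public_key).hexdigest()


@dataclass
class TrustStore:
    """Hypothetical per-contact store: contact -> (fingerprint, verified flag)."""
    entries: dict = field(default_factory=dict)

    def check(self, contact: str, public_key: bytes) -> str:
        fp = fingerprint(public_key)
        known = self.entries.get(contact)
        if known is None:
            # First contact: remember the key, but do not mark it verified.
            self.entries[contact] = (fp, False)
            return "new"
        stored_fp, verified = known
        if stored_fp != fp:
            # Key changed: any earlier out-of-band verification no longer applies.
            self.entries[contact] = (fp, False)
            return "changed"
        return "verified" if verified else "unverified"

    def mark_verified(self, contact: str, fp_seen_out_of_band: str) -> bool:
        """Mark verified only if the fingerprint compared in person matches."""
        known = self.entries.get(contact)
        if known is None or known[0] != fp_seen_out_of_band:
            return False
        self.entries[contact] = (known[0], True)
        return True
```

In use, check("alice", key) first returns "new"; after comparing fingerprints in person and calling mark_verified, it returns "verified". If Alice's key later changes, check reports "changed" and the verified flag is dropped, roughly like Signal's safety-number-change warning. The store only remembers your decisions; it never makes them.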
I maintain some medium-sized TG channels and constantly have what we've dubbed “surveillance” accounts joining daily. One of the channel's admins implemented a simple Turing-test bot that requires solving a basic math problem within 60 seconds, or the account gets kicked. These accounts were getting past the “click this button to verify” challenge, but seemingly none of them know how to solve 6+4, or they can’t read English quickly enough.
Yes, the button click is on the to-do list. I have encountered these, and the common pattern is a button click followed by basic arithmetic. That's solvable, but anything beyond it would "defeat" the tool.
Security vs. Usability. Multi-device E2E requires that ONE device be the source of truth for the private key. That's why for WhatsApp Desktop to work, you need your phone to be connected. This defeats the purpose for most people.
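The challenge logic described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the admin's actual bot: Challenge, issue_challenge and should_kick are made-up names, and hooking it into Telegram join events and kicking is left out because that depends on the client library used.

```python
import random
import time
from dataclasses import dataclass
from typing import Optional

CHALLENGE_TIMEOUT_S = 60  # the 60-second window described above


@dataclass
class Challenge:
    a: int
    b: int
    issued_at: float  # seconds, same clock as the answer timestamp

    @property
    def question(self) -> str:
        return f"What is {self.a}+{self.b}?"


def issue_challenge(rng: random.Random, now: Optional[float] = None) -> Challenge:
    """Create a single-digit addition problem for a newly joined account."""
    issued = time.time() if now is None else now
    return Challenge(rng.randint(1, 9), rng.randint(1, 9), issued)


def should_kick(ch: Challenge, answer: Optional[str], answered_at: float) -> bool:
    """Kick on no answer, a late answer, or a wrong or non-numeric answer."""
    if answer is None or answered_at - ch.issued_at > CHALLENGE_TIMEOUT_S:
        return True
    try:
        return int(answer.strip()) != ch.a + ch.b
    except ValueError:
        return True
```

Passing an explicit now and answered_at keeps the timing logic testable; a live bot would pass time.time() for both.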
Does it, though? I feel like most people just use their phone, and that's it, and people who use the web version don't mind so much that they need their phone on the internet, because it pretty much always is anyway.
Sure, there are edge cases, like being on a plane and only having bought internet access for your phone, phone battery dead and no charging cable, phone lost/stolen, but those things seem rare enough that most people just live with it.
I would prefer that I didn't need my phone to use WhatsApp Web, but in practice it hasn't kept me from using WhatsApp (mobile or web).
For me, it's one of the reasons why I prefer Telegram over WhatsApp. There have been numerous cases when my phone was dead or offline (being abroad/roaming, battery dead, etc.) and I wanted to use my laptop to finish an important conversation. The other one is the lack of a native Linux desktop client for WhatsApp.
Right, I get that, and even mentioned one of the failure modes you mention. But I think for the average user (which more or less disqualifies most of our experiences here), it's not even remotely a deal-breaker.
> Multi-device E2E requires ONE device be the source of truth for the private key.
To this:
> That's why for WhatsApp Desktop to work, you need your phone to be connected.
Is just an implementation detail. Signal gets around this by letting the server know about a person's devices so that each device can sync independently of the others. The phone holds the ultimate key, but messages do not need to be routed through the phone.
The way Wire does this, if I recall correctly, is that logging into your account (proving to the server you can authenticate) allows you to push up a new device public key, which you can see as your device fingerprint in the UI. A conversation between two users with multiple devices is really a group chat, with pairwise ("client fan-out") double ratchet sessions going on.
When you do this, your other devices are informed of the account change by the server, as are people you communicate with (if they've previously marked your account as trusted, changing any devices on your account changes that). This isn't much different to Signal: ultimately the server acts as a key directory in both cases.
The problem with this approach is that it doesn't scale well at all. This is why Facebook, Wire, etc. are working on MLS (Messaging Layer Security), which is basically "add trees" so group chat scales better.
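The fan-out cost is easy to see in a sketch. Below, a hypothetical server-side key directory maps each user to their device IDs, and fan_out_targets lists every device that needs its own pairwise-encrypted copy of a message, including the sender's other devices. The names and data layout are assumptions for illustration, not Wire's or Signal's actual code.

```python
from typing import Dict, List, Tuple

# Hypothetical key directory kept by the server: user -> device IDs.
Directory = Dict[str, List[str]]


def fan_out_targets(directory: Directory, sender: str, sender_device: str,
                    recipients: List[str]) -> List[Tuple[str, str]]:
    """Every (user, device) that needs its own pairwise-encrypted copy.

    The sender's *other* devices are included so they stay in sync, which is
    why a 1:1 chat between multi-device users behaves like a small group chat.
    """
    targets = []
    for user in [sender] + [r for r in recipients if r != sender]:
        for device in directory.get(user, []):
            if (user, device) != (sender, sender_device):
                targets.append((user, device))
    return targets


def encryptions_per_message(n_users: int, devices_per_user: int) -> int:
    """Pairwise fan-out cost: one encryption per device except the sending one."""
    return n_users * devices_per_user - 1
```

With Alice on two devices and Bob on three, one message from Alice's phone already needs four separate encryptions; a 100-member group where everyone has three devices needs 299 per message. As I understand it, MLS avoids this by deriving a shared group secret from a tree of keys, so each message is encrypted once and membership changes cost roughly logarithmic work in the group size in the best case.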
Thanks for explaining, that makes much more sense.
It seems to me that the double ratchet is really to blame here. Without it, you could simply share a single key across all devices. With it, your choice is to either deal with this sort of complexity or set up a trusted proxy in the middle.
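Why a ratchet resists naive key sharing can be shown with its symmetric half, the KDF chain. Each step derives a one-time message key and replaces the chain key, so the state advances with every message. The byte constants 0x01 and 0x02 follow the example suggested in Signal's Double Ratchet specification; the full double ratchet also mixes in fresh Diffie-Hellman outputs, which this sketch omits.

```python
import hashlib
import hmac
from typing import Tuple


def kdf_chain_step(chain_key: bytes) -> Tuple[bytes, bytes]:
    """One symmetric-ratchet step: returns (next_chain_key, message_key)."""
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return next_chain_key, message_key


# Two devices naively start from a copy of the same chain state.
shared = bytes(32)
a_ck, b_ck = shared, shared

# Device A sends two messages; device B has not synced and sends one.
a_ck, a_mk1 = kdf_chain_step(a_ck)
a_ck, a_mk2 = kdf_chain_step(a_ck)
b_ck, b_mk1 = kdf_chain_step(b_ck)
```

After this, B's message key equals A's first one (a key reused across two messages) and the two chain keys have diverged. That is the complexity the comment refers to: either every device keeps its own sessions, or something in the middle serializes the shared state.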
It's a bit strange actually. There's this constant mantra of having to pick either security or usability. We now have readily available means for usable _and_ reasonably secure E2E, but the crypto nuts go and add additional "must haves" that once again make it difficult for the average person to use.
An aside: Instead of authenticating with a central server to add a key (as you've described Wire doing), why not handle this client side via X.509 certificate chains? This is very mature crypto and seems far more flexible. It would enable use of standard PKI token hardware for managing your root identity, allow fully offline enrollment of new devices, and provide cross signing for various purposes (changing your root identity, setting up a web of trust with a group, integrating with a corporate environment, etc).
No, it does not. Keybase does it pretty well. Signal is pretty close. Even the WhatsApp approach (require phone to be online, which is how it differs from Signal) is better than what Telegram has today (not even possible).
The whole point of cloud chats would be broken if E2E were implemented this way. Search, for example, would be basically impossible to implement anywhere near the way Telegram does it.
Also, of all my group chats (and I have hundreds), 99% are public or semi-public (not searchable in Telegram but findable on the internet); it makes no sense to encrypt them.
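The search point becomes clearer with the standard structure behind server-side full-text search: an inverted index mapping terms to message IDs. This is a generic sketch, not Telegram's actual implementation (which isn't described here). The key property is that add needs the plaintext; a server that only stores E2E ciphertext cannot build it, so search would have to run on each device over its local copy.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

TOKEN_RE = re.compile(r"\w+")


def tokenize(text: str) -> List[str]:
    """Lowercased word tokens; a deliberately naive tokenizer."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


class InvertedIndex:
    """Term -> set of message IDs. Building it requires the plaintext."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)

    def add(self, message_id: int, text: str) -> None:
        for term in tokenize(text):
            self.postings[term].add(message_id)

    def search(self, query: str) -> Set[int]:
        """IDs of messages containing every query term (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result
```

For example, after indexing "Meetup moved to Friday" as message 1 and "Friday release notes" as message 2, searching "friday" returns {1, 2} and "friday meetup" returns {1}.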
Twilio has a WhatsApp integration. It's paid per message, but it's pretty cheap if you're just forwarding a single user's messages. Probably a bit expensive to do at scale though.
I'm not sure what kind of open source apps leverage it, but I would guess there is something.
Yeah, I mean they really ironed cookies out something fierce, with every website now spamming us, probably multiple times a week, asking if we want to store cookies. So ironed out.
Eliminating information asymmetry by showing the rest of the world what, given the simplicity, many other people have undoubtedly already figured out (but kept to themselves).
For that to hold reasonable ethical weight, you’d need to spread wide awareness of this tool. I’d say you qualify for that if, for example, you were to run this bot in some kind of “warning mode” that would join unprotected groups and let participants know of their danger.
Not a fan of Telegram, but as far as I understand, this only lets you scrape messages from public channels. People shouldn't really expect anything else when they write a message in a group with hundreds of other people.
Exactly, and this tool even joins the channel. I mean what else do you expect, it's not a surprise that anyone who joins a channel can read the messages.
Not untrue. And it's one nginx EC2 instance away from silently going from "anyone who joins a channel can read the messages" to "the whole world can search an index of all those messages."
Gathering intelligence on - and data from - bad actors.
Do a search for "site:telegram.me" including a keyword from any illegal activity, such as carding, and you'll find hundreds of channels with interesting behavior.
The most obvious negative ethical aspect I would point out is that in chat apps like Discord or TG, we have a reasonable expectation of privacy that is not necessarily reflected in the APIs. It could be one EC2 free-tier instance away from making all that information indexable.
It would be great if someone could build an open-source tool like that. As others have mentioned, Telegram is not privacy-friendly, but having access to the content that spreads on such a platform can be very valuable.
Sweet! The point of this project was that I found no boilerplate for one of the listed use cases, so I just built it in 2 days and later shared it. I hope you do the same :)
This is so stupid. Public channels and groups have a web frontend; you don't even need a Telegram account to view or crawl them. Example link: https://t.me/s/durov/110
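To illustrate crawling the public web preview without any account, here is a stdlib-only Python sketch. It assumes message bodies on t.me/s/<channel> pages sit in a div whose class list includes tgme_widget_message_text; that class name is my assumption about the current markup and may change. preview_url and MessageTextParser are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import quote

MESSAGE_CLASS = "tgme_widget_message_text"  # assumption about the page markup


def preview_url(channel: str) -> str:
    """Public web preview URL, following the t.me/s/<channel> pattern above."""
    return f"https://t.me/s/{quote(channel)}"


class MessageTextParser(HTMLParser):
    """Collects the text of each message div; <br> becomes a newline."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: List[str] = []
        self._depth = 0  # nesting depth of <div>s inside the current message
        self._buf: List[str] = []

    def handle_starttag(self, tag, attrs):
        if self._depth:
            if tag == "div":
                self._depth += 1
            elif tag == "br":
                self._buf.append("\n")
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and MESSAGE_CLASS in classes:
            self._depth = 1
            self._buf = []

    def handle_endtag(self, tag):
        if self._depth and tag == "div":
            self._depth -= 1
            if self._depth == 0:
                self.messages.append("".join(self._buf).strip())

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)
```

To run it against a live page, fetch preview_url("durov") with urllib.request.urlopen, decode the body, and feed it to a MessageTextParser; the page only shows recent posts, so a real crawler would also need to follow pagination.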
That is actually an awesome and "ethical" use case for this project. Thank you for sharing who this journalist is and sharing ideas on ethical application.
[1] https://telegram.org/blog/privacy-discussions-web-bots#view-...
[2] https://github.com/pyrogram/pyrogram