
It could be argued you already had the phone number of your victim.

If mobile numbers in your country are in the 2________ range, how feasible is it to add millions of phone numbers to your contact list to find out the number of someone? I think this is nonsensical.


>If mobile numbers in your country are in the 2________ range, how feasible is it to add millions of phone numbers to your contact list to find out the number of someone? I think this is nonsensical.

If you're a state actor, probably pretty easy. Get a thousand rooted, remotely controllable Android devices (which you probably already have for other projects) and have them automatically add 10k phone numbers each. Then have them join public Telegram groups and check for matches. Now you have gone through 10 million phone numbers. Run it in a loop 10 times and you have 100 million. Might take a few days to set up and run.

I don't see why this is infeasible in any way if you have a moderate budget (i.e. a state actor).

edit: And if your target is in your jurisdiction then you probably have a good mapping of names to phone numbers already.
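The arithmetic above can be sketched in a few lines (the device count, batch size, and the +852 Hong Kong number format are illustrative assumptions, not anything Telegram-specific):

```python
# Back-of-the-envelope sketch of the enumeration the parent describes.
DEVICES = 1_000   # rooted devices (assumed; "a thousand")
BATCH = 10_000    # numbers added per device per pass
PASSES = 10

per_pass = DEVICES * BATCH   # 10,000,000 numbers checked per pass
total = per_pass * PASSES    # 100,000,000 after looping 10 times

def batches(start, devices=DEVICES, batch=BATCH):
    """Yield one contiguous block of candidate numbers per device."""
    for d in range(devices):
        lo = start + d * batch
        # Hypothetical format: Hong Kong +852 with 8-digit subscriber numbers
        yield [f"+852{n:08d}" for n in range(lo, lo + batch)]
```

Each device uploads its block as "contacts" and reports back which ones resolved to accounts.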


All this to get an app to make "do any of my contacts also use Signal" requests? You could probably just figure out what endpoint the mobile client calls and imitate it yourself, avoiding all the overhead of setting up the mobile devices. If you have to register to make the request, just provision a bunch of VOIP numbers and go to town.

Point being, if "who is using signal" is a question you want answered, it's far more trivial than having to acquire actual devices. Your oppressive regime could go from zero to black bag list in an afternoon.


I don't think you need a single device. Just bots with virtual numbers.


The impact is specifically related to Hong Kong, where the protesters are using Telegram to coordinate, and where, according to the bug report, the telephone number range is limited.


There's apparently at least one private company that gathered a database of account-to-number correlations precisely by adding over ten million numbers to Telegram's address books. Here's an article in Russian where one account is deanonymised: https://meduza.io/feature/2019/08/10/kto-takoy-tovarisch-may...

Dunno if Telegram has patched this in any way by now. However, I don't see why it would be difficult for a program to add numbers to the contact list incrementally. To my knowledge, computers have so far been pretty good at incrementing numbers. And if the contact list length is limited, the question is just how many phone numbers a company can buy.
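Incrementing through a range and chunking it to fit a per-account contact-list limit really is a few lines (the 5,000-contact limit and the number format here are made up for illustration):

```python
def contact_chunks(start, count, limit=5_000):
    """Split `count` sequential phone numbers into contact-list-sized chunks."""
    nums = (f"+{n}" for n in range(start, start + count))
    chunk = []
    for n in nums:
        chunk.append(n)
        if len(chunk) == limit:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # final partial chunk
```

Each chunk would be imported under a fresh account, so the real cap is the number of accounts (i.e. bought numbers), not the per-account limit.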


Cell phones work by registering with a cell tower, so all they have to do is look at which phones were in the vicinity of the cell towers where the protest took place.


It could be argued you already had the phone number of your victim.

But you have no correlation between it and a Telegram user. This bug is about exactly that correlation.


Right, the key trick here is that Telegram is easily used as an oracle.

Telegram has essentially agreed to tell you whether any phone number is correct, so you can just guess all the phone numbers. Never allow this unless the thing an adversary has to guess is both _completely random_ and from a _very large keyspace_ (128 bits is where you can start to feel safe). If you find you're cornered into doing this anyway (e.g. a typical email + password login), aggressively rate limit it, so the adversary has to work harder and longer to take advantage and maybe gives up.

Phone numbers are neither random nor from a large keyspace; it's maybe 10^12 worldwide or something? Much too small.
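For a sense of scale (my arithmetic, using the 10^12 estimate above):

```python
phone_keyspace = 10 ** 12   # rough worldwide phone-number space
safe_keyspace = 2 ** 128    # the "start to feel safe" threshold

# How many times larger a 128-bit keyspace is than every phone number on Earth
ratio = safe_keyspace / phone_keyspace
print(f"2^128 is ~{ratio:.1e} times larger")   # ~3.4e+26
```

Even an adversary who can test millions of guesses per day exhausts 10^12 quickly relative to any country-sized sub-range, while 2^128 stays untouchable.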


They say they managed to add 0.1 million people at once. If you're after a group of people and getting only one of them is enough, the limits look pretty feasible to me, especially in small communities.


It's written in Python, so as soon as they have enough users it will be as slow as any webpage with tons of JS. Thankfully, that CPU load won't be on the client side... so it's still an improvement.


I recommend you relax your convictions about Python performance a bit.

Server-side bottlenecks are more often than not database reads/writes and other IO. And the few CPU-intensive operations can be delegated to libraries written in C.

Pure Python is slow for CPU-intensive tasks, but that doesn't mean that a Python webserver is necessarily slow.
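A quick way to see the "delegate to C" point (hashlib's SHA-256 runs its tight loop in C; the byte-by-byte loop below is a stand-in for CPU-intensive pure-Python work):

```python
import hashlib
import timeit

data = b"x" * 200_000

def c_backed():
    # The hashing loop executes inside a C extension
    return hashlib.sha256(data).hexdigest()

def pure_python():
    # The same volume of data, touched byte by byte in the interpreter
    acc = 0
    for b in data:
        acc = (acc * 31 + b) & 0xFFFFFFFF
    return acc

t_c = timeit.timeit(c_backed, number=5)
t_py = timeit.timeit(pure_python, number=5)
print(f"C-backed: {t_c:.4f}s  pure Python: {t_py:.4f}s")
```

The gap is typically one to two orders of magnitude, which is why a Python webserver that mostly shuffles bytes between the DB and C-backed libraries can still be fast.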


So what you're saying is that the author of Sourcehut is expected to rewrite Flask, Jinja2, etc. in C?

As an example, https://git.sr.ht/~sircmpwn/git.sr.ht/tree/master/gitsrht takes 600-800 ms to generate, and it's probably heavily cached already. What will happen when they get more users? The site will be unbearably slow, unless the guy starts spending thousands on servers.


I imagine that the CPU-intensive parts of Flask and Jinja2 are already written in C. Much of Python's standard library is.

800ms is a reasonable response time, and if they scale up according to their userbase, they will hopefully maintain that time.

(Also, we don't know how much of that 800ms is Python vs. IO)


https://github.com/pallets/flask

https://github.com/pallets/jinja

0% C

Also, 800 ms is NOT a reasonable response time to generate what is basically a bunch of text; that is absurd, but I guess this is the baseline in 2019.

I trust all IO is cached. The author can confirm it. This is just how slow Python is.


Almost none of the IO of that page is cached, actually [0]. The only things I'm sure are cached are the templates themselves. I'm pretty sure that neither git lookups nor DB accesses are cached, which is where you could save time. And mind you, this is served from a single data center in the USA, so latency can already eat up a lot of that. I'm in Europe and have a 300 ms ping to it, so it might be that you are simply far away from the physical location.

[0] https://git.sr.ht/~sircmpwn/git.sr.ht/tree/master/gitsrht/bl...


Switching to PyPy is likely to improve the overall performance if it's not database or I/O bound.


This. There are also several optimized versions of Python, other than stock, that let you increase performance based on your needs. I've shipped with many of them over the years.


I used to run an image board written in Python with around 15M PV/month on a Pentium 4 machine in 2004 or so. The first bottleneck I found was the database. Some query optimization and caching fixed it very quickly. It goes a really long way before Python becomes a bottleneck, and when it does, that can also be fixed relatively easily (image processing in my case, which was fixed by moving the whole thing to a worker).
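The caching that fixed it for me was nothing fancier than memoizing hot queries with a TTL. A minimal sketch (the decorator and the `thread_page` query it wraps are hypothetical, not from the actual board):

```python
import time

def ttl_cache(ttl=60):
    """Memoize a function's results for `ttl` seconds, keyed by args."""
    def deco(fn):
        store = {}
        def wrapper(*args):
            hit = store.get(args)
            if hit is not None and time.monotonic() - hit[1] < ttl:
                return hit[0]          # fresh cached result
            result = fn(*args)
            store[args] = (result, time.monotonic())
            return result
        return wrapper
    return deco

calls = {"n": 0}

@ttl_cache(ttl=30)
def thread_page(board, page):
    calls["n"] += 1                    # stand-in for the real DB hit
    return f"{board}/{page}"
```

With most page views hitting a handful of hot threads, even a naive cache like this collapses the query load.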


Are you sure? Looks like static pre-generated page to me:

  $ curl -s https://sourcehut.org/|egrep 'meta.*gen'
  <meta name="generator" content="Hugo 0.57.2" />
[0] https://gohugo.io/



The website/blog (https://sourcehut.org) is static, the Sourcehut app itself (https://sr.ht) is Python-based.

https://git.sr.ht/~sircmpwn/?search=sr.ht


Python isn't slow. Running a lot of Python is slow.


If it shards well and you have the capital to buy hardware, there's nothing wrong with using Python.

