eLod · on July 12, 2015

out of curiosity:

  require 'numbers_and_words'

  # Map each number to the length (in characters) of its name in the locale.
  def convert(locale, numbers)
    I18n.with_locale locale do
      numbers.map { |i| i.to_words.length }
    end
  end

  # Replace the set with its distinct name lengths, and repeat until every
  # remaining number is itself one of those lengths.
  def eliminate(locale, numbers = (1..99).to_a)
    eliminated = convert(locale, numbers).uniq
    if (numbers - eliminated).empty?
      eliminated
    else
      eliminate(locale, eliminated)
    end
  end

  %i[en es ru fr ua hu lt lv et tr de it nl se].each do |locale|
    print "#{locale}: #{eliminate(locale).join ', '}\n"
  end

results:

  en: 4
  es: 4, 6, 5
  ru: 6, 3, 5, 4, 11
  fr: 3, 5, 4, 6
  ua: 5, 3
  hu: 5, 2, 4
  lt: 4, 2, 6, 5, 7
  lv: 5, 7, 11
  et: 4
  tr: 2, 3, 4
  de: 4
  it: 3
  nl: 4
  se: 3, 4

so if i'm not mistaken, 4 is four letters long in most of these languages, but you can see there are some other destinations in other languages
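
A minimal sketch of the same idea for English only, without the gem. The hand-written `ENGLISH_WORDS` table and the `chain` helper are assumptions made for illustration; they are not part of numbers_and_words. It follows a number to the length of its name until a value repeats:

```ruby
# Hypothetical hand-written table of English number names (1..12 only).
# The code above gets these from the numbers_and_words gem instead.
ENGLISH_WORDS = {
  1 => 'one', 2 => 'two', 3 => 'three', 4 => 'four',
  5 => 'five', 6 => 'six', 7 => 'seven', 8 => 'eight',
  9 => 'nine', 10 => 'ten', 11 => 'eleven', 12 => 'twelve'
}.freeze

# Follow n -> length of n's name until a value we have already seen comes up.
def chain(start)
  seen = [start]
  loop do
    nxt = ENGLISH_WORDS.fetch(seen.last).length
    return seen if seen.include?(nxt)
    seen << nxt
  end
end

(1..12).each { |n| puts "#{n}: #{chain(n).join(' -> ')}" }
```

For every start from 1 to 12 the chain ends at 4, which agrees with the `en: 4` line above.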

eLod · on Nov 14, 2013

i myself heavily favor pg over mysql as well, but just for the record mysql has some kind of support for that, see http://dev.mysql.com/doc/refman/5.0/en/server-sql-mode.html#...

eLod · on July 20, 2012

i think the article misses the main point. the articles 'bashing' PHP do not say no one ever wrote good code in PHP or ever will, but rather that we should change the situation where PHP is the de facto std for beginners on the web, because in their opinion there are superior alternatives. that's a valid concern imho (i mean it is 'not valid' to question whether there is good PHP code, but it is 'valid' to question whether PHP should be the std beginners' language). sry for my english.

smsm42 · on July 20, 2012

De facto standards are de facto standards not because somebody decided them to be. It's because people choose them. So discussing what people "should" choose is pointless - they choose what they like, not what somebody thinks they "should".

eLod · on July 21, 2012

i disagree. first you copy others, discover and then you may alter/fine tune your decision. de facto standards are standards because "we" decide them to be.

eLod · on Sept 7, 2011

i found the pro git book a very good read for understanding git for the first time http://progit.org/book/

re_todd · on Sept 7, 2011

Great book. It really helped me overcome git's learning curve.

eLod · on Aug 11, 2011

i think you are missing the point: passwords should be hard to guess first and should be easy to remember second. the former is the stronger need.

let's say there are 500,000 english words to choose from and you use 4 words. that gives you 500,000^4 possibilities. let's assume the words average about 5 characters, so we will compare this to a 20 character (= 4 words * 5 characters) password made of 26 types of character (english alphabet, no numbers or other special characters), which gives you 26^20 possibilities. and 26^20 - 500,000^4 ~= 2x10^28, or put another way: (26^20) / (500,000^4) ~= 318,850.38.

i know a random sequence of 20 characters is very hard to remember, but 500,000 is an overestimate too. let's say we use special symbols too (50 characters) and the word dictionary has 100,000 words. (50^12) / (100,000^4) = 2.44, so we can say it is better to have a 12 character password (made of alphanums + symbols) than 4 random words concatenated (i think 12 is somewhat of a 'standard' for 'sensitive' passwords). and i would argue that in the long term multiple concatenated passwords are very hard to remember. i'm not saying this is a terrible approach, just not the silver bullet for the 'password problem' (which xkcd never claimed of course, and for 'non sensitive', 'reused'/'throwaway' passwords it may be a viable option).

edit: and i forgot about case sensitivity too.
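
The arithmetic above can be checked with a short sketch. The helper names `search_space` and `bits` are made up for illustration, and it assumes every character or word is picked uniformly at random, which is the same assumption the comparison already makes:

```ruby
# Number of candidates when each of `length` positions has `choices` options.
def search_space(choices, length)
  choices**length
end

# The same quantity expressed as entropy in bits.
def bits(choices, length)
  length * Math.log2(choices)
end

# 20 lowercase letters vs. 4 words out of 500,000.
ratio_letters = search_space(26, 20).fdiv(search_space(500_000, 4))
puts format('%.3f', ratio_letters) # 318850.382

# 12 characters out of 50 symbols vs. 4 words out of 100,000.
ratio_symbols = search_space(50, 12).fdiv(search_space(100_000, 4))
puts format('%.2f', ratio_symbols) # 2.44

puts format('12 chars of 50: %.1f bits', bits(50, 12))
puts format('4 words of 100,000: %.1f bits', bits(100_000, 4))
```

In bits the second pair differs by only a little over one bit, so that comparison is sensitive to the assumed dictionary size.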

ddlatham · on Aug 11, 2011

I'll grant you that it's more important to have a password be difficult to guess, but that doesn't obviate the importance of it also being easy to remember. Even better, let's look for password schemes that are harder to guess and easier to remember at the same time.

It's easy to compare the entropy of two schemes as you're doing in your comment, but it's more difficult to objectively claim which is easier to remember. You argue that a random sequence of 12 characters is easier to remember than 4 words. If so, then I'd agree it would be a better scheme. However, I don't think that's the case. To really settle the argument, we should do some experiments - maybe someone already has?

Here are some examples I used a generator to create:

gangster insert madden quartic

overlong cage figurine hardship

trimmer wholly movie nadir

Bt].iu@0Soc*

Vf+pIW;C>\vp

'.}]Ba,g%@vI

Which do you think are easier to remember?

dpark · on Aug 11, 2011

Are you serious? You think it's easier to remember 12 random characters than to remember four English words?

g6M;`Zt3^,d" vs selected aardvark badminton winnings

The way the human brain works, it would be at least as easy to remember 12 random words as 12 random characters.

barrkel · on Aug 12, 2011

The trouble is that by human intuitions, you think there's a strong inverse correlation between being hard to guess and easy to remember; but that's not always the case.

It's hard for humans to remember meaningless conjunctions of symbols, so we think they are hard to guess; so we err on the side of making them too short. Contrariwise, we think a sequence of just four words couldn't possibly be hard to guess because it's so easy to remember, but it's only easy to remember because we can use the meanings of the words to form an idea or image, something our brains are built for (unlike strings of meaningless characters).

This is why the word technique is better: it corresponds better with how we remember, while reducing two other risks: the risk of losing your password - non-trivial - and choosing too short a password.

eykanal · on Aug 11, 2011

Don't forget spaces. And Poland.

Another point is that letter placement within words is significantly non-random. By intelligently choosing which letters to try in each position, the hacker could at the very least minimize the number of tries by an order of magnitude for the first word.

artmageddon · on Aug 11, 2011

I probably shouldn't announce, in a forum, that using Don't Forget About Poland! as a passphrase seems awfully tempting for someone like me :)

(American by birth, Polish by heritage)

Speaking of the example I just presented, how much more effective would it be to include special characters within these long passphrases? Obviously the goal is to be able to remember them, but surely most, if not all, of us are already using special characters in our passwords.

pyre · on Aug 12, 2011

When counting the entropy you would probably count each word as a single entry, and each special character as an entry (and disregard spaces).

* By capitalizing the words you've doubled the search space for words (assuming that the search space starts with all words lowercased)

* You could increase the search space for each word by 200% (from the space of all lowercase words) by including the possibility of words in all caps (it's unlikely for people to start using alternating case in the middle of words).

* The ' in "Don't" doesn't increase the search space that much because there are a small number of (common) contractions like that, and each of them would only break down into 3 permutations:

  don't
  dont
  don t

(though the last one is highly unlikely). So you're adding maybe 30 more words to a search space much larger than that.

* As far as the special character is concerned, it probably doesn't add too much to the search space. You can break down your phrase like so:

  Don't Forget About Poland!
              ||
              \/
  {item} {item} {item} {item}{item}
              ||
              ||  Disregard whitespace (acquire entropy!)
              \/
  {item}{item}{item}{item}{item}

So now you've got 5 items. Each item could be either a word or punctuation. The search space for words is huge. The search space for punctuation is small. Your algorithm just has to realize that if it chooses punctuation for one of the items, then it doesn't bother to use whitespace to separate it from the preceding word ("word," vs "word ,").

* You can also further reduce the effects of punctuation on the search space by realizing that punctuation will almost always follow a word, and not other punctuation. This also discounts punctuation as the first item in the passphrase too.
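
A rough sketch of how the item-based counting above could be done. The function name and the dictionary sizes are hypothetical, picked only for illustration. It counts sequences in which punctuation never comes first and never follows other punctuation:

```ruby
# Count item sequences of length `items` (at least 1) where each item is one
# of `words` words or one of `punctuation` marks, punctuation is never the
# first item, and punctuation never directly follows punctuation.
def count_sequences(items, words, punctuation)
  ending_in_word = words # a one-item sequence has to be a word
  ending_in_punct = 0
  (items - 1).times do
    ending_in_word, ending_in_punct =
      (ending_in_word + ending_in_punct) * words, # a word may follow anything
      ending_in_word * punctuation                # punctuation only after a word
  end
  ending_in_word + ending_in_punct
end

# Hypothetical sizes: 100,000 words and 10 punctuation marks, 5 items.
constrained = count_sequences(5, 100_000, 10)
unconstrained = (100_000 + 10)**5
puts constrained
puts unconstrained
```

With a large dictionary the two counts are close to each other, since almost all of the search space comes from the words.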

Edit:

Upon further thought, if the attacker uses a simplified algorithm to account for upper-/lowercase, then it may not have that much of an effect on the search of each individual item (i.e. n!4 instead of (n+4)!). An attacker could break the common instances of case down into:

  * All words lowercased  "don't forget about poland!"
  * All words uppercased  "DON'T FORGET ABOUT POLAND!"
  * All words titlecased  "Don't Forget About Poland!"
  * First word titlecased "Don't forget about poland!"

This discounts the possibility of people alternating titlecase across words, because that's probably as likely to happen as people alternating case within words (e.g. WoRdS lIkE ThIs). Granted, this also discounts proper nouns in the middle of the passphrase (things that don't require extra effort for people to remember to capitalize).
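
The four casing patterns listed above could be generated with something like the following. `case_variants` is a made-up helper name, and treating an apostrophe as part of a word is an assumption so that "don't" becomes "Don't":

```ruby
# Produce the four whole-phrase casing patterns for a candidate passphrase.
def case_variants(phrase)
  lower = phrase.downcase
  [
    lower,                                             # all words lowercased
    lower.upcase,                                      # all words uppercased
    lower.gsub(/[[:alpha:]']+/) { |w| w.capitalize },  # all words titlecased
    lower.capitalize                                   # first word titlecased
  ]
end

puts case_variants("Don't Forget About Poland!")
```

Under this model each candidate phrase costs four guesses instead of one, whatever the number of words in it.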

eLod · on July 28, 2011

i researched a few php frameworks (sy2, zend, cake, li3, kohana, etc.) for a little while, and i'm by far most satisfied with lithium (li3, http://lithify.me/). depending on 5.3 seems to open up a wide range of possibilities and oftentimes i feel like 'this is the way to do it', whereas i felt like 'the framework is getting in my way' with sy2 for example.

eLod · on June 7, 2011

this was already submitted & commented: http://news.ycombinator.com/item?id=2620870

eLod · on June 5, 2011

maybe i'm missing something here, but the biggest issue i see with this is that it does not effectively prevent spamming/robots. the libraries this is based on were not created with distinguishing humans from robots in mind. especially after checking the $1 gesture recognizer (https://github.com/josscrowcroft/MotionCAPTCHA/blob/master/j...) i don't see how this could be used in real life (for captchas).

eLod · on June 3, 2011

I wanted to give this a shot. I first thought of making a chrome/safari/etc. extension/plugin/etc. to filter your feed on the github dashboard (it may be a viable option), but i settled on making api calls (and using public data only).

I pushed it to http://gfnf.heroku.com (source is at https://github.com/eLod/gfnf). The interface obviously lacks some detail, but it is still usable i think. You can create filters and set what to include based on languages, users and repositories. Without requirements, any event that matches at least one of the conditions is selected (i.e. OR). When requiring user(s), for example, only those events whose actor is in the filter get selected.

Of course backed with the right data this should be much easier.

eLod · on May 24, 2011

  barnabas:btch pota$ time python btch.py 
  found: 7555794

  real	1m18.464s
  user	1m18.417s
  sys	0m0.031s

(i5 2500k)

joss82 · on June 1, 2011

Nice! Would you show us the source?

anonymoushn · on June 2, 2011

It might be similar to mine: http://pastie.org/2007710

Run on a 2.4GHz Core 2 Duo:

  $time python oranges.py 
  7555794

  real	1m24.252s
  user	1m23.841s
  sys	0m0.144s

You can still get this to be much faster if you stuff this information into bitmasks, especially if you do it in one of the other languages. You can also get a lot of speedup out of more clever pruning if you're willing to precompute a bunch of stuff.

Edit: It seems like a lot of the function calls this program makes are unnecessary. This version runs about 37 seconds faster without any of the trickery I mentioned: http://pastie.org/2007699