In pursuit of the perfect phishing that would trick even you

Monday, March 4, 2019

Imagine that you toss a fair coin six times in a row. Which of the following three sequences do you think is most likely to appear, with "heads" represented by 1 and "tails" by 0?

    1. 1 0 1 0 1 0
    2. 1 1 1 1 1 1
    3. 1 0 1 1 0 1

Most people choose the third sequence (1 0 1 1 0 1) because it seems the most random. The first two look too regular to match our intuitive idea of randomness. In fact, all three sequences are equally probable, each with a probability of (1/2)^6 = 1/64. However, because we are more used to seeing irregular sequences than regular ones (there are indeed far more of them), the third sequence better represents our preconceived idea of what randomness should look like.
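As a quick sanity check, here is a minimal Python sketch that enumerates every possible outcome of six tosses. It assumes a fair coin and independent tosses; the variable names are mine and purely illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Every possible outcome of six coin tosses (1 = heads, 0 = tails).
all_sequences = list(product((0, 1), repeat=6))

# Assuming a fair coin and independent tosses, any one specific
# sequence of six tosses has probability (1/2)**6.
p_single = Fraction(1, 2) ** 6

candidates = [
    (1, 0, 1, 0, 1, 0),
    (1, 1, 1, 1, 1, 1),
    (1, 0, 1, 1, 0, 1),
]
for seq in candidates:
    # Each candidate is just one of the equally likely outcomes.
    print(seq, "->", p_single)

# Why the third one "looks" more random: many sequences share its mix
# of four heads and two tails, while only one sequence is all heads.
heads_histogram = Counter(sum(seq) for seq in all_sequences)
print("sequences with 4 heads:", heads_histogram[4])
print("sequences with 6 heads:", heads_histogram[6])
```

The individual sequences are equally likely; what differs is how many sequences "look like" each one, which is exactly what our intuition latches onto.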

This error of thinking is known as the representativeness heuristic: we judge whether an example belongs to a class by how well it matches our stereotype (preconceived idea) of that class. For instance, if you see a man in a leather jacket with a punk bracelet, you will find it easier to imagine that he likes heavy metal than if he wore a suit and tie and used hair gel.

Remember that a heuristic is a mental shortcut: rather than answering a complex question, we answer a related, but simpler, one. In the coin example, we asked ourselves "which sequence seems more random?" rather than the question of interest, "what is the likelihood of each sequence?". The representativeness heuristic reveals the implicit assumption that like goes with like. We mistakenly believe that a member of a category must match the stereotype of that category, and that an effect must resemble its cause.

As with all heuristics, representativeness is useful in most situations: a thin sportsman more than 2 meters tall is more likely to be a basketball player than a footballer; a man traveling with a guitar-shaped case is more likely to be carrying a guitar than a machine gun; and an e-mail that appears to come from your boss is more likely to have been sent by your boss than to be a phishing attack.

Unfortunately, that is not always the case.

When you hear hoofbeats, do not expect a zebra
Marcos is a 30-year-old man from Spain. He dresses scruffily, wears glasses and black T-shirts with geeky cartoons, and uses Linux. Which option do you think is more likely?

    A. He works as a developer at ElevenPaths in Madrid.
    B. He works as a storekeeper in a supermarket.

Many people, swayed by the representativeness heuristic, are tempted to choose option A as the more likely one. However, far more people in Spain work in supermarkets than at ElevenPaths in Madrid. So even if the proportion of Linux enthusiasts is probably higher among ElevenPaths employees than among storekeepers, their absolute number is still smaller than the number of Linux enthusiasts working in supermarkets.


The problem arises because a strong stereotype blinds us to other sources of information, such as the base rate: the relative frequency of a subgroup within the total population. The closer an individual is to the stereotype of the class, the more stubbornly we ignore the base rate.
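To see how the base rate can outweigh a strong stereotype, here is a small sketch with made-up numbers. The headcounts and the proportions of people matching Marcos's description are purely hypothetical, chosen only to illustrate the reasoning, and the sketch pretends that developers and storekeepers are the only two options.

```python
from fractions import Fraction

# All figures below are invented for illustration only.
developers = 500          # hypothetical number of developers
storekeepers = 200_000    # hypothetical number of supermarket storekeepers

# Hypothetical share of each group that matches the description
# (glasses, geeky T-shirts, Linux user).
p_match_given_developer = Fraction(30, 100)
p_match_given_storekeeper = Fraction(1, 100)

# Absolute number of people matching the description in each group.
matching_developers = developers * p_match_given_developer
matching_storekeepers = storekeepers * p_match_given_storekeeper

# Among everyone who matches the description, the fraction who are developers.
p_developer_given_match = matching_developers / (
    matching_developers + matching_storekeepers
)

print("matching developers:  ", matching_developers)
print("matching storekeepers:", matching_storekeepers)
print("P(developer | matches description):", float(p_developer_given_match))
```

With these invented figures the description is thirty times more common among developers, yet a person who fits it is still far more likely to be a storekeeper, simply because there are so many more storekeepers.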

In fact, the following aphorism circulates in US medical schools: "When you hear hoofbeats, don't expect a zebra". It means that, when diagnosing a disease, the most probable causes must be considered first, before the most exotic ones. In other words, when evaluating the probability of an event we cannot overlook the base rate.

If you have seen the baseball movie Moneyball, you will remember how Billy Beane builds a brilliant team by signing players with great batting statistics who had been passed over by other scouts because they did not fit the stereotype of a great batter. To the astonishment of the other clubs, Beane gets excellent results at very low cost, simply because he is the only one not swayed by the illusion of representativeness.

As you can see, this heuristic makes us forget what (little) we know about probability and statistics and pushes us into decisions with often catastrophic consequences, as the following example shows.

The fact that something is plausible does not imply that it is very probable
Sergio is a 35-year-old man. He studied computer science at university and has been drawn to hacking and programming since his teenage years. After his studies, he worked for two years at an antivirus company and then for three years as a pentester at an international cybersecurity firm, where he was promoted to head of department. He then obtained a master's degree in Risk Fraud Management and wrote his master's thesis on Game Theory for malware detection. Which option do you think is more likely?

    A. Sergio works for a large bank.
    B. Sergio works for a large bank, at the cybersecurity unit, where he heads the computer emergency response team.

I have posed this question in dozens of training sessions, and almost all my students have chosen the same option: B. As you may have guessed by this point in the article, option B is not the right one. The employees of a bank's cybersecurity unit are a very small subgroup of all the bank's employees, and it is mathematically impossible for a subgroup to be more probable than the whole group that contains it.


For example, you are more likely to draw a heart, or an ace, from a deck of cards than to draw the ace of hearts. If you also chose option B, you have fallen victim to another common error of thinking closely related to the previous one: the conjunction fallacy.
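The card example is easy to verify by counting. The following sketch assumes a standard 52-card deck and a single card drawn uniformly at random; the helper `probability` is a name I made up for this illustration.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]

# A standard 52-card deck as (rank, suit) pairs.
deck = list(product(ranks, suits))


def probability(event):
    """Probability that one card drawn uniformly at random satisfies `event`."""
    favourable = sum(1 for card in deck if event(card))
    return Fraction(favourable, len(deck))


p_heart = probability(lambda card: card[1] == "hearts")
p_ace = probability(lambda card: card[0] == "A")
p_ace_of_hearts = probability(lambda card: card == ("A", "hearts"))

print("P(heart)         =", p_heart)
print("P(ace)           =", p_ace)
print("P(ace of hearts) =", p_ace_of_hearts)

# The conjunction can never be more probable than either of its parts.
assert p_ace_of_hearts <= min(p_heart, p_ace)
```

The ace of hearts is both a heart and an ace, so it can only be as likely as, or less likely than, each of those broader events.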


The probability that two events occur together (in conjunction) is always less than or equal to the probability that each one occurs separately: P(A and B) ≤ P(A) and P(A and B) ≤ P(B). Yet the way the question is framed decides whether intuition or logic prevails:

  • When I present Sergio's story before posing the question, intuition wins and gives a wrong answer.
  • When I pose the question directly, without first presenting Sergio's story, no one chooses option B; everybody selects option A. Without a plausible story, logic wins.

If you think about it carefully, every new detail added to a scenario can only decrease its probability. Its representativeness, however, and consequently its apparent probability, increases. What a paradox! The more plausible and detailed a scenario is, the less probable it is in statistical terms and yet the more probable it seems to us.
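A short sketch makes the effect of piling on details concrete. Each added detail multiplies the probability of the scenario by a conditional probability that is at most 1, so the running product can only stay the same or shrink. The three probabilities below are invented for illustration and are not estimates of anything real.

```python
from fractions import Fraction
from itertools import accumulate
from operator import mul

# Hypothetical conditional probabilities, invented for illustration only.
# Each entry is the chance of the detail given all the previous ones.
details = [
    ("works for a large bank", Fraction(1, 20)),
    ("... in its cybersecurity unit", Fraction(1, 50)),
    ("... heading the emergency response team", Fraction(1, 10)),
]

# Running product: probability of the scenario after each added detail.
cumulative = list(accumulate((p for _, p in details), mul))

for (label, _), p in zip(details, cumulative):
    print(f"{label}: {float(p):.6f}")

# Every additional detail leaves the probability equal or lower, never higher.
assert all(later <= earlier for earlier, later in zip(cumulative, cumulative[1:]))
```

Whatever numbers you plug in, as long as each one is a probability between 0 and 1, the more detailed story can never come out ahead of the simpler one.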

That said, the most coherent stories are not necessarily the most probable ones, but they are plausible. We thus confuse coherence, plausibility and probability. The most representative outcomes, combined with a personality description, produce the most coherent stories.

This misjudgment can be particularly pernicious when hypothetical scenarios are built to make forecasts. Consider the following two scenarios and assess their probability:

    A. During the upcoming election on 28 April, a large-scale cyberattack on Twitter against Spanish politicians is expected, resulting in the hijacking of more than 100 accounts.
    B. During the upcoming election on 28 April, a former member of a political party who was expelled, and who keeps in touch with hacktivist groups, is expected to orchestrate in retaliation a large-scale cyberattack on Twitter against their former colleagues, resulting in the hijacking of more than 100 accounts.

Scenario B offers a much more vivid and exciting story, that is, a more plausible one, although its probability is lower. In fact, the second scenario is contained within the first: the politicians of one Spanish political party are a subgroup of all Spanish politicians. Adding details to a scenario makes it more compelling, even though it makes it less likely to be true. This is a very common trap when assessing scenarios.

Here is another example: which scenario is more probable?

    A. Telefónica will suffer a cyberattack.
    B. Telefónica will suffer a large-scale cyberattack from North Korea intended to disrupt communications across a large area of our country and cause chaos.

Now you have been warned and know that scenario A is the more probable one. Try it on your friends and you will see how most of them choose option B.

Looking for the perfect malware
So, how can a cybercriminal take advantage of the representativeness heuristic? By exploiting our tendency to believe a rich, detailed scenario more firmly than a sparse one. When a story includes the "must be" details (the ones we expect to find), we tend to believe it more firmly than if it lacked them, even though, mathematically, the more details a story includes, the less probable it is compared with a less detailed one.

For instance, the cybercriminal may call the victim and impersonate the security admin, explaining that they are calling from the data center because they have received alerts about unusual network behavior from the victim's computer. They therefore need remote access to carry out a security assessment, and for that they require the victim's credentials to log in as that user and check what is happening.

Since all these details match our stereotyped image of a security admin's everyday work, the victim is more likely to disclose their credentials than if the attacker simply called, claimed to be the security admin and asked for the victim's password. Notice that, according to probability theory, this second scenario is much more probable than the first one… but less plausible!

The presence of plausible facts appears to increase our belief in the implausible request. In other words, to exploit representativeness and conjunction, the attacker first lays out as many plausible assumptions and details as possible, so that the story becomes richer, and only then makes the implausible request. The more plausible the initial assumptions are, the easier it is for the victim to accept the final request, no matter how implausible it may be.

This strategy may be found in:


  • Social engineering attacks by telephone, e-mail, and even face to face, in which the attacker impersonates someone within the organization in order to obtain sensitive information.
  • Baits using infected physical devices, such as USB drives, convincingly labeled and deliberately left in places where the victim is likely to find them and be tempted to plug them in.
  • Well-written phishing messages that offer a number of plausible details before asking you to click on a link, which typically leads to a familiar-looking log-in page. These attacks are so targeted that they usually fall into the category of "spear phishing".
  • Fake windows displaying virus alerts that pretend to come from the computer's antivirus and ask you to run the "antivirus" to clean your computer because it has supposedly detected some kind of malware.


Self-defense against attacks on our representativeness heuristic
In summary, if you want to avoid being swayed by the representativeness heuristic, keep the following guidelines in mind, some of them adapted from the book Irrationality:

  • If a scenario requires an additional condition to be met, no matter how plausible that condition seems, the scenario becomes less probable, never more.
  • Don't judge by appearance alone. Even if something looks more like an X than a Y, it may still be more likely to be a Y if there are more Ys than Xs. Find out the base rate!
  • A statement containing two or more claims is never more probable than a statement containing just one of them, no matter how plausible they all seem.
  • Be wary of believing that a statement is true just because you know that part of it is true.
  • The fact that a story is very plausible does not imply that it is very probable.

When making a security decision, discard tales and do the maths.


Gonzalo Álvarez Marañón
Innovation and Labs (ElevenPaths)
