A semantic graph of cyber threats lets a reasoning system discover which attack vectors your organization is exposed to. So what exactly is machine reasoning, and what challenges do you have to solve to make it work?
One common mistake is using machine learning and artificial intelligence as synonyms. In fact, machine learning is just one method under the umbrella of artificial intelligence. Another method accepted in the world of AI is machine reasoning.
While machine learning is based on the statistical identification of hidden patterns within a large amount of data, machine reasoning is based on using facts and drawing conclusions from those facts. That is, machine learning is based on the analysis of many examples (preferably categorized) of the phenomenon you want to learn, and the machine independently builds a model that allows automatic classification of new examples. In machine reasoning, the system receives the semantic model and reasoning methods externally, and then the machine draws conclusions about new examples. Another difference between the two methods is that machine learning deals with pattern recognition, compared to machine reasoning which deals with understanding relationships and drawing conclusions from facts.
Machine reasoning uses concepts and ideas coded as symbols and draws logical conclusions from them, in an attempt to approximate common sense. Reasoning systems represent data as semantic knowledge graphs. The semantics encoded in the graph let the machine understand the meaning of the data and draw conclusions about new data by analyzing the graph of concepts and projecting it onto that data.
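To make "drawing conclusions from facts" concrete, here is a minimal sketch of forward chaining over symbolic facts. The facts, the predicate names (`member_of`, `can_access`) and the single rule are hypothetical and chosen only for illustration. A real reasoning system would use a far richer rule language.

```python
# Facts are (subject, predicate, object) tuples; all names are illustrative.
facts = {
    ("alice", "member_of", "db_admins"),
    ("db_admins", "can_access", "customer_db"),
}

def infer(facts):
    """Apply one rule until no new facts appear:
    if X member_of G and G can_access R, then X can_access R."""
    known = set(facts)
    while True:
        new = {
            (x, "can_access", r)
            for (x, p1, g) in known if p1 == "member_of"
            for (g2, p2, r) in known if p2 == "can_access" and g2 == g
        }
        if new <= known:  # nothing new was derived, so stop
            return known
        known |= new

derived = infer(facts) - facts
print(derived)  # {('alice', 'can_access', 'customer_db')}
```

The derived fact was never stated explicitly. It follows from the two given facts and the rule, which is also why the system can explain how it reached the conclusion.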
The standard method for representing a semantic graph is RDF (Resource Description Framework), which describes a directed graph as a set of triples. A triple in an RDF graph has three components:
- A node for the subject
- An arc for the predicate, linking the subject to the object
- A node for the object
Take, for example, the concept of a user account. Intuitively, each of us understands what a user account is, but a computer has no intuition, so the concepts and the relationships between them must be spelled out in full detail before it can draw conclusions.
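As a rough illustration, the user account concept could be written as subject-predicate-object triples like the ones below. This is a sketch in plain Python rather than an RDF library. `rdf:type` and `rdfs:subClassOf` are standard RDF/RDFS terms, while every `ex:` name (and the helper `objects`) is made up for this example and does not come from a published vocabulary.

```python
# Each triple is (subject, predicate, object).
# The "ex:" names are hypothetical; only rdf:type and rdfs:subClassOf are standard.
triples = [
    ("ex:UserAccount", "rdfs:subClassOf", "ex:Account"),
    ("ex:jsmith", "rdf:type", "ex:UserAccount"),
    ("ex:jsmith", "ex:hasCredential", "ex:password_1"),
    ("ex:jsmith", "ex:memberOf", "ex:Finance"),
    ("ex:jsmith", "ex:canLogOnTo", "ex:workstation_7"),
]

def objects(subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects("ex:jsmith", "ex:canLogOnTo"))  # ['ex:workstation_7']
```

Each statement a human takes for granted ("an account has a credential", "an account can log on to a machine") becomes an explicit arc that the machine can follow.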
This simple and flexible data model has considerable expressive power: it can represent complex situations and relationships while remaining abstract. RDF is considered one of the fundamental technologies of the Semantic Web. Reasoning systems excel at explaining the “thought” process that led to a conclusion (explainability), an ability that most machine learning systems lack. Semantic graph technologies also make it possible to combine information of different types, formats and sources into a common language, and to apply semantic and logical operations to the integrated information.
Figure 1: A simple semantic graph describing basic IT concepts relevant to attackers
Teach an AI model to behave like an attacker
Semantic graphs also have great value in the cyber world. A semantic graph of cyber threats can be built from the information and concepts found in standard sources such as MITRE ATT&CK and NVD CVE. Attack techniques can be analyzed to define the “requirements” of the attackers: if you combine a semantic graph of cyber threats with a graph describing features of an organization’s IT systems, the reasoning system can deduce what information is needed to enable each technique and build a “virtual attacker” that can explain how, in principle, to attack the organization. The purpose, of course, is to better protect the organization.
Once there is an accurate description of the organization’s IT systems, the connectivity between them, and their identity and access information, the reasoning system can build attack scenarios specific to that organization, just as a real attacker would. If we also add semantic information about defenses (mitigations), as defined by MITRE D3FEND, the system can suggest ways to reduce the risk from those attacks. For example, given an attack on a certain port, the system can conclude whether the attack will affect the organization and, if so, which systems it will affect.
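The port example can be sketched in a few lines. The inventory below (host names and open ports) and the function `affected_systems` are hypothetical. In practice the facts would come from the organization's semantic graph rather than a hand-written dictionary.

```python
# Hypothetical inventory: host -> set of open TCP ports.
open_ports = {
    "web01": {80, 443},
    "db01": {1433},
    "files01": {445, 3389},
    "hr-laptop": {3389},
}

def affected_systems(port):
    """Return the hosts that expose `port`, sorted for stable output."""
    return sorted(h for h, ports in open_ports.items() if port in ports)

print(affected_systems(3389))  # ['files01', 'hr-laptop']
print(affected_systems(23))    # []
```

An empty result means that, as far as the recorded facts go, an attack on that port does not affect the organization.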
For these reasons, machine reasoning is particularly well suited to assessing an attacker’s chances of successfully attacking the organization without actually carrying out the attack. It also makes it possible to assess the organization’s resilience against cyberattacks.
So how does the virtual attacker work?
The system checks the prerequisites of each attack technique against what is known about the organization and determines which techniques are relevant to it. The more accurate the information about the organization, the more relevant the answer, and the clearer the picture of which attack techniques the organization is susceptible to.
For example, a basic prerequisite for an SQL injection attack is that the system includes an SQL database. Likewise, a brute-force attack against passwords requires weak passwords that can be cracked. Many MITRE ATT&CK TTPs may be irrelevant to a given organization because their prerequisites are missing.
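A minimal sketch of this prerequisite check follows. The technique names, the prerequisite labels (including `has_web_input` and `has_login_service`, which are assumptions added for the example) and the organization's facts are all hypothetical, not taken from MITRE ATT&CK.

```python
# Hypothetical prerequisites per technique, and facts known about one organization.
prerequisites = {
    "sql_injection": {"has_sql_database", "has_web_input"},
    "password_brute_force": {"has_weak_passwords", "has_login_service"},
}
org_facts = {"has_sql_database", "has_web_input", "has_login_service"}

def unmet_prerequisites(prereqs, facts):
    """Map each technique to the prerequisites the organization does not meet."""
    return {name: needed - facts for name, needed in prereqs.items()}

missing = unmet_prerequisites(prerequisites, org_facts)
# A technique is relevant only when nothing is missing.
relevant = sorted(name for name, gaps in missing.items() if not gaps)

print(relevant)                         # ['sql_injection']
print(missing["password_brute_force"])  # {'has_weak_passwords'}
```

Keeping the unmet prerequisites, rather than just a yes/no answer, preserves the explanation of why a technique was ruled out.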
There are three major challenges in building a virtual attacker. The first is the precise semantic analysis of attack techniques, such as those in MITRE ATT&CK, which are written for human readers and are not suitable as input to a reasoning system. The solution is simple to understand but difficult to implement: the techniques must be rewritten using consistent, precisely defined basic concepts (that is, an appropriate semantic model). Only then can a reasoning system be built.
Take, for example, MITRE ATT&CK technique T1210, “Exploitation of Remote Services”. One of the accepted methods is to use a CVE that allows a remote service to be exploited. The reasoning system therefore needs a way to check whether the CVE exists on a system (SCAP can help) and a classification of vulnerabilities by whether they enable exploitation of a remote service. A further prerequisite for exploiting a CVE is the ability to connect to that computer over the network, that is, physical and logical connectivity that allows the vulnerability to be triggered. These two facts are a starting point for reasoning about the T1210 technique: “Find a system with the vulnerability that has connectivity that allows the exploitation of the CVE”.
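The T1210 rule quoted above combines two facts: the host has a remotely exploitable vulnerability, and the attacker can reach it. A rough sketch of that rule, assuming a hypothetical connectivity map and placeholder CVE identifiers, might look like this:

```python
from collections import deque

# Hypothetical network: a directed edge means "can open a connection to".
can_reach = {
    "internet": ["web01"],
    "web01": ["app01"],
    "app01": ["db01"],
    "db01": [],
    "hr-laptop": ["files01"],
    "files01": [],
}
# Hosts with a remotely exploitable vulnerability; the CVE IDs are placeholders.
remote_vulns = {"app01": "CVE-XXXX-0001", "files01": "CVE-XXXX-0002"}

def reachable_from(start):
    """Breadth-first search over `can_reach`, excluding the start node."""
    seen, queue = set(), deque([start])
    while queue:
        for nxt in can_reach.get(queue.popleft(), []):
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def t1210_candidates(attacker_position):
    """Hosts that are both reachable and remotely exploitable."""
    return sorted(reachable_from(attacker_position) & set(remote_vulns))

print(t1210_candidates("internet"))  # ['app01']
```

Here `files01` is vulnerable but is not a candidate for an attacker starting from the internet, because no connectivity path leads to it. This sketch treats any multi-hop path as connectivity, which is a simplifying assumption.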
The second challenge is to create a language (an ontology) that connects concepts from different attack domains, such as permissions, vulnerabilities and configurations, and to build the semantic graph from it. Some detailed ontologies already describe the relationships between cyber concepts, such as UCO (Unified Cybersecurity Ontology) from the University of Maryland, Baltimore County, and MITRE D3FEND.
Figure 2: An example of a simple semantic graph connecting IT concepts to attack concepts
The third challenge is collecting relevant information from the organization’s systems. This can be done by interfacing with existing systems and translating the information into the common language or by a dedicated scanner.
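Translating collected information into the common language can be pictured as a small mapping step. The sketch below assumes an inventory record arrives as a Python dictionary. The record fields, the `ex:` vocabulary and the function `host_record_to_triples` are all hypothetical.

```python
def host_record_to_triples(record):
    """Translate one inventory record (a dict) into (subject, predicate, object) triples."""
    host = f"ex:{record['hostname']}"
    triples = [
        (host, "rdf:type", "ex:Host"),
        (host, "ex:runsOS", record["os"]),
    ]
    # One triple per open port, so each port can be queried on its own.
    triples += [(host, "ex:hasOpenPort", port) for port in record["open_ports"]]
    return triples

# Example record, as it might come from an asset inventory or a scanner.
record = {"hostname": "web01", "os": "Ubuntu 22.04", "open_ports": [80, 443]}
for triple in host_record_to_triples(record):
    print(triple)
```

Each data source would need its own small translator of this kind, all emitting triples in the same vocabulary so that the reasoning system sees one integrated graph.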
Once the system has gathered all the information, it can simulate millions of cyberattacks to determine specific attack scenarios against the organization and calculate the risk from those attacks. The goal is to determine courses of action that mitigate attack scenarios, reduce risk and build cyber resilience.
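One simple way to picture "determining attack scenarios" is to enumerate the paths an attacker could take through a graph of possible compromise steps. The graph below and the function `attack_paths` are hypothetical. How each scenario is scored for risk is left out, since that depends on a risk model not described here.

```python
# Hypothetical attack graph: an edge means "an attacker on A can compromise B".
attack_steps = {
    "internet": ["web01", "vpn"],
    "web01": ["app01"],
    "vpn": ["app01", "files01"],
    "app01": ["db01"],
    "files01": [],
    "db01": [],
}

def attack_paths(node, target, path=()):
    """Yield every cycle-free path from `node` to `target` (depth-first)."""
    path = path + (node,)
    if node == target:
        yield path
        return
    for nxt in attack_steps.get(node, []):
        if nxt not in path:  # never revisit a host within one scenario
            yield from attack_paths(nxt, target, path)

for p in attack_paths("internet", "db01"):
    print(" -> ".join(p))
```

Removing a single edge, for example by applying a mitigation that blocks `app01 -> db01`, and re-running the enumeration shows which scenarios that mitigation eliminates.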
Reasoning systems, along with machine learning systems, will see increasing use in cyber defense, especially in risk analysis and management. As with any development in the cyber world, the concern is that these systems will also end up in the hands of attackers.