Machine learning technique predicts likely accounting fraud in supply chains

FraudGCN approach overview. The researchers constructed three types of “subgraphs” based on the type of relationships between firms: with accounting firms, along supply chains, and across an industry. The direction of the model’s machine learning is represented by red arrows. Gray circles (“nodes”) represent fraudulent firms, and white circles represent normal firms. Credit: Big Data Mining and Analytics, Tsinghua University Press

As accounting fraudsters become more sophisticated in their techniques, fraud detection needs to improve as well. Fortunately, a group of researchers has developed a new machine learning “detective” that can not only analyze fraud within a single company but also predict likely fraud across supply chains and industries.

A paper describing the team’s approach was published in the journal Big Data Mining and Analytics on August 28.

Financial statement fraud, more commonly known as accounting fraud, may be a less common form of corporate fraud, but it is by far the most costly type of crime worldwide. Many of the most famous white-collar crimes are cases of accounting fraud, in which a company manipulates the numbers on its financial statements or other valuation data to make itself appear more profitable than it is.

The collapse of the American energy company Enron, at the time the largest bankruptcy in US history, was due to the company’s manipulation of its accounts in collusion with its accounting firm. In 2008, Lehman Brothers declared bankruptcy after hiding around $50 billion in debt through balance sheet fraud. In the late 2000s, American investment advisor Bernie Madoff was found to have defrauded his clients of a colossal $65 billion.

Investors are not the only ones affected by financial fraud. Hundreds of thousands of jobs can be lost, communities can be devastated, and in the most extreme cases, the repercussions can threaten the stability of national economies.

Despite the threat of such fraud, authorities have a hard time detecting it. Red flags such as a company’s sudden spike in performance just before the end of a reporting period, or a surge in sales while competitors’ sales are slow, may turn out to be nothing more than the result of good luck or a superior product. That’s why for decades, forensic auditors have used statistical analysis to spot manipulation.

But these efforts require a large workforce and the examination of vast volumes of data. As a result, authorities tend to rely on random audits, meaning most companies go unchecked.

“Worse still, in recent years, fraudsters have used increasingly sophisticated techniques,” said Chenxu Wang, lead author of the study and associate professor at the School of Software Engineering and Key Laboratory of Intelligent Networks and Network Security at Xi’an Jiaotong University. “It’s a never-ending mathematical arms race between authorities and fraudsters.”

“What is needed is an efficient and accurate algorithm to automatically identify accounting fraud and leave behind the era of random audits,” said Mengqin Wang, also of Xi’an Jiaotong University.

Some mathematicians and computer scientists specializing in this field have achieved positive results using machine learning. But so far, this approach has only been applied to individual companies.

“This ignores the often complex relationships between different companies, which can also reveal clues to fraud,” said Yi Long, another member of the team, who works at the Shenzhen Finance Institute at the Chinese University of Hong Kong in Shenzhen. “An accounting firm that partners with one company to falsify its financial statements has an increased likelihood of engaging in fraudulent activities with other companies.”

And fraudulent relationships don’t just spread between accounting firms and their clients. Accounting fraud practices can also travel up and down supply chains or spread horizontally across an industry.

But integrating data beyond a single company means a proportional increase in computational costs. Additionally, existing machine learning approaches suffer from a severe imbalance in the samples used to train the model to classify a company as fraudulent: normal, non-fraudulent samples significantly outnumber actual fraud cases. This imbalance can lead to biased models that favor the majority class (the non-fraudulent cases), making it difficult to accurately detect fraudulent activity.
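
To make the imbalance problem concrete, here is a minimal Python sketch of one common, generic remedy: weighting each class inversely to its frequency so that rare fraud cases count for more during training. The article does not say how FraudGCN itself handles imbalance, so this is an illustration of the problem rather than the authors’ method; the function name and the toy labels are hypothetical.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical toy labels: 1 = fraudulent, 0 = normal (95 normal, 5 fraudulent).
labels = [0] * 95 + [1] * 5
weights = inverse_frequency_weights(labels)

for cls in sorted(weights):
    print(cls, round(weights[cls], 3))
```

With these toy labels, each fraudulent sample ends up weighted 19 times as heavily as a normal one, which offsets the 95-to-5 imbalance in a weighted loss.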

To overcome all these challenges, the research team developed a machine learning technique combined with mathematical methods from the field of graph theory.

The AI financial fraud detective they designed uses a graph, a structure that mathematically represents the connections or relationships (described as edges) between different companies, individuals, and products (described as nodes). A multi-relational graph permits several types of edges, so it can represent different kinds of relationships between the same nodes and gives a more complete picture of how they are connected.
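
As a rough illustration of the data structure, a multi-relational graph can be stored as one adjacency map per relation type. The sketch below is an assumption-laden toy, not the researchers’ code: the class name, the relation names, and the firm names are all hypothetical.

```python
from collections import defaultdict

class MultiRelationalGraph:
    """Undirected graph whose edges are grouped by relation type."""

    def __init__(self):
        # relation name -> node -> set of neighboring nodes
        self.adjacency = defaultdict(lambda: defaultdict(set))

    def add_edge(self, relation, u, v):
        self.adjacency[relation][u].add(v)
        self.adjacency[relation][v].add(u)

    def neighbors(self, relation, node):
        # .get avoids creating empty entries as a side effect of a lookup.
        return set(self.adjacency.get(relation, {}).get(node, set()))

# Hypothetical firms linked by the three relation types the article mentions.
g = MultiRelationalGraph()
g.add_edge("same_auditor", "FirmA", "FirmB")
g.add_edge("supply_chain", "FirmA", "FirmC")
g.add_edge("same_industry", "FirmB", "FirmC")

print(sorted(g.neighbors("same_auditor", "FirmA")))
```

Keeping the relations separate means a later step can treat a shared auditor differently from a supplier link instead of lumping all edges together.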

The detective itself, called FraudGCN, is a graph convolutional network, or GCN, a type of neural network designed to work on graph-structured data. Unlike traditional neural networks that work on grid-like data such as images, GCNs can work on data represented as graphs.
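
For readers curious what a graph convolution does, the sketch below implements one layer of the widely used textbook rule, ReLU(D^-1/2 (A + I) D^-1/2 X W), in plain Python. It is a generic GCN layer under stated assumptions, not FraudGCN’s architecture, which the article does not detail; the three-node graph, the features, and the weights are made up.

```python
import math

def gcn_layer(adj, features, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    adj:      n x n list of 0/1 rows (symmetric adjacency matrix A)
    features: n x f list of node feature rows (X)
    weight:   f x h list of weight rows (W), which training would learn
    """
    n = len(adj)
    f, h = len(weight), len(weight[0])
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # Aggregate features over each node's neighborhood.
    agg = [[sum(norm[i][k] * features[k][c] for k in range(n)) for c in range(f)]
           for i in range(n)]
    # Linear transform followed by ReLU.
    return [[max(0.0, sum(agg[i][c] * weight[c][o] for c in range(f)))
             for o in range(h)] for i in range(n)]

# Hypothetical path graph 0 - 1 - 2, where only node 0 carries a signal.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
features = [[1.0], [0.0], [0.0]]
weight = [[1.0]]

out = gcn_layer(adj, features, weight)
print(out)
```

After one layer, node 1 has picked up part of node 0’s signal while node 2, two hops away, has not; stacking layers extends the reach by one hop each time.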

FraudGCN builds a multi-relational graph representing industry connections, supply chain links, and audit practices shared by accounting firms. In doing so, it captures rich information from these relationships, particularly details found in the “neighborhoods” of nodes in the graph. By aggregating this information, FraudGCN improves the ability not only to identify patterns that indicate potential fraudulent activity, but also to predict where fraud is likely to occur.
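
One simple way to picture neighborhood aggregation across several relation types is to average a firm’s neighbors separately for each relation and then concatenate the results with the firm’s own features. The article does not describe how FraudGCN combines relations, so the concatenation below is only one plausible choice; the function names, firms, and feature values are hypothetical.

```python
def mean_vector(vectors, dim):
    """Element-wise mean of equal-length vectors; zeros if there are none."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def aggregate_by_relation(node, features, relations):
    """Concatenate a node's own features with one neighbor mean per relation."""
    dim = len(features[node])
    out = list(features[node])
    for name in sorted(relations):  # fixed order keeps the layout stable
        neighbors = relations[name].get(node, [])
        out.extend(mean_vector([features[n] for n in neighbors], dim))
    return out

# Hypothetical two-dimensional features for three firms.
features = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
relations = {
    "auditor": {"A": ["B"]},
    "industry": {"A": ["B", "C"]},
    "supply_chain": {},
}

vector_a = aggregate_by_relation("A", features, relations)
print(vector_a)
```

The resulting vector for firm A has one slot per relation, so a downstream classifier can learn, for example, that signals arriving through a shared auditor matter more than those arriving through industry peers.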

Finally, unlike previous efforts in machine learning-assisted fraud detection, FraudGCN is able to handle the addition of new nodes without the need to retrain the model, improving its adaptability and scalability.
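
The general idea behind handling new nodes without retraining can be sketched as follows: if the model’s parameters act on a node’s features and its neighbors’ features, rather than on a fixed lookup entry per node, the same parameters can score a firm that was never seen in training. This is a toy illustration of that principle, not FraudGCN’s model; the weights, bias, firms, and features are all invented.

```python
import math

# Hypothetical "already trained" parameters; in practice these come from training.
W_SELF = [0.8, -0.4]
W_NEIGH = [1.2, 0.3]
BIAS = -1.0

def fraud_score(node, features, neighbors):
    """Sigmoid of a linear function of a node's features and its neighbors' mean."""
    own = features[node]
    nbrs = [features[n] for n in neighbors.get(node, [])]
    dim = len(own)
    mean = [sum(v[i] for v in nbrs) / len(nbrs) if nbrs else 0.0 for i in range(dim)]
    z = (BIAS
         + sum(w * x for w, x in zip(W_SELF, own))
         + sum(w * x for w, x in zip(W_NEIGH, mean)))
    return 1.0 / (1.0 + math.exp(-z))

features = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
neighbors = {"A": ["B"], "B": ["A"]}

# A new firm joins the graph; the parameters above are left untouched.
features["New"] = [0.5, 0.5]
neighbors["New"] = ["A"]
neighbors["A"].append("New")

score_new = fraud_score("New", features, neighbors)
print(round(score_new, 4))
```

Only the graph and the feature table grow here; no parameter is refit, which is what makes this style of model cheap to extend.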

The team tested FraudGCN on a real-world dataset from Chinese listed companies to evaluate its performance and found that it outperformed state-of-the-art approaches by 3.15% to 3.86%.

In the future, the team hopes to expand its approach to be able to deal with mid-sized companies, not just larger ones.

More information:
Chenxu Wang et al., Learning multi-relational graph representation for financial statement fraud detection, Big Data Mining and Analytics (2024). DOI: 10.26599/BDMA.2024.9020013

Provided by Tsinghua University Press
