Has Your Code Leaked?


As recently as a few weeks ago, an extortion group known as Lapsus$ was on every security analyst’s lips: they had stolen and leaked source code from some of the biggest tech companies in the world, including nearly 200GB of source code from Samsung and portions of Microsoft’s Bing, Bing Maps and Cortana projects. This took place only a few months after the popular streaming platform Twitch suffered the same fate.

We’ve seen a surge in corporate repositories being leaked online in recent years. While all the leaks mentioned above were the work of malicious actors (an attacker’s motive is to put pressure on a corporation by proving they breached it and had access to its data), let’s not forget that mistakes happen more often than we like to think, and without making any noise: a developer inadvertently pushing to a public repository while believing they are working in a private environment is enough.

Unfortunately, beyond the apparent reputational damage, the problem of source code leaks is sometimes misunderstood. This article therefore summarizes the risks incurred when source code leaks and, above all, gives you the means to find out whether your company has been a victim. You will be invited to evaluate your GitHub footprint today with a dedicated, free tool.

Source Code and Intellectual Property

In a world where 95% of the IT sector relies in some way or another on open-source software, it is easy to forget that source code is considered intellectual property from the moment the first line of code is written. It can be copyrighted, and in fact, it is quite common to see source code files that include a header stating the copyright owner and permissions.

Being digital by nature, source code is a leaky asset. Distributed version control systems and code-sharing platforms have made it far easier for entire codebases to be cloned and spread across the Internet. This, in turn, has fueled the open-source community and led to great achievements, but it remains problematic for a growing number of companies for which source code is a core asset. As a result, enforcing source code copyrights on a global scale has become an almost impossible task.

Source code can also hold trade secrets, which copyrights, patents or trademarks cannot protect. In an industry where innovation and speed are crucial, failing to keep this information secret could be the end of the game for some young startups. For bigger businesses, avoiding the legal fees of lawsuits is also a motive to better monitor and control what part of the source code has been shared and with whom.

On another level, published source code gives hackers an inside view of the workloads deployed in production, sometimes serving millions of users. This can be problematic as well.

Threats Lurking in Source Code

Code is comparable to blueprints for physical products. For the technically literate, having complete access to the inner workings of large systems or pieces of software means they can easily spot vulnerabilities such as hardcoded credentials. For instance, GitGuardian found more than 6,600 secrets upon scanning Samsung code:

“Of the more than 6,600 keys found in Samsung source code roughly 90% are for Samsung’s internal services and infrastructure, whilst the other 10%, critically, could grant access to Samsung’s external services or tools such as AWS, GitHub, Artifactory and Google,” remarked Mackenzie Jackson, developer advocate at GitGuardian.

This doesn’t necessarily mean hackers can exploit secrets, access control flaws and other security vulnerabilities right after a leak. However, it is still another asset in their toolbox. Some weak signals might even be more valuable to an attacker than the business logic itself, such as developers’ personally identifiable information (PII), habits, language and, more broadly, company culture. Thousands of lines of code can reveal much more information than we think.
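
To make the idea of spotting hardcoded credentials concrete, here is a minimal Python sketch of a naive pattern-based scan. It is an illustration only and is not how GitGuardian detects secrets: the two patterns, the `PATTERNS` table and the `find_candidate_secrets` function are hypothetical names chosen for this example, and a real scanner would use far more detectors plus validation to limit false positives.

```python
import re

# Assumption: two illustrative patterns only. The first matches the common
# shape of an AWS access key ID ("AKIA" followed by 16 uppercase letters or
# digits); the second matches a quoted value assigned to a suspicious name.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "generic_assignment": re.compile(
        r"(?:password|secret|api_key|token)\s*[:=]\s*['\"][^'\"]{8,}['\"]",
        re.IGNORECASE,
    ),
}


def find_candidate_secrets(text):
    """Return (line_number, pattern_name) pairs for lines that look suspicious."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, name))
    return findings


if __name__ == "__main__":
    sample = (
        'db_password = "hunter2hunter2"\n'
        'key = "AKIAIOSFODNN7EXAMPLE"\n'
        'print("hello")\n'
    )
    for line_number, name in find_candidate_secrets(sample):
        print(f"line {line_number}: possible {name}")
```

Anyone holding a leaked codebase can run this kind of search in seconds, which is why a credential that has appeared in leaked code should be treated as compromised and rotated.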

How to Find Out if Your Source Code Has Leaked?

As already mentioned, no matter if your organization has suffered a data breach in the past or not, your proprietary code could be present publicly on GitHub without your knowledge. Even if your company doesn’t have an official presence on the platform, chances are your developers do contribute to open-source projects or use personal repositories to share code.

Therefore, mapping your organization’s footprint on GitHub should be taken seriously. Threat intelligence should be able to answer the following: what assets are publicly available on the Internet? Do they include user data? Do they provide critical and still valid information that an attacker could leverage? Is it still possible to contain the leaked assets (by initiating a DMCA takedown process, for instance) or not?

Technically, the only way to identify a source code leak is to fingerprint proprietary code and compare the fingerprints against a database of public files. This is precisely the purpose of HasMyCodeLeaked, a free tool made available by GitGuardian to help anyone quickly identify the most critical matches concerning intellectual property. The tool works by performing an exact lookup of your code fingerprints (checksums identifying your files, including all their revisions) in GitHub’s public history.

It will generate a searchable report for you on all repositories containing your source code, with smart filtering to identify high-risk repositories.
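
The exact-lookup principle described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the tool’s actual implementation: the article does not say which checksum HasMyCodeLeaked uses, so SHA-256 is an assumption here, the function names are hypothetical, the "public index" is just a dictionary standing in for a database of public files, and only the current files on disk are fingerprinted rather than every revision.

```python
import hashlib
from pathlib import Path


def fingerprint(data: bytes) -> str:
    """Checksum of a file's bytes (SHA-256 is an assumption for this sketch)."""
    return hashlib.sha256(data).hexdigest()


def fingerprint_tree(root):
    """Map fingerprint -> relative path for every file under root."""
    result = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            result[fingerprint(path.read_bytes())] = str(path.relative_to(root))
    return result


def find_leaks(private_fingerprints, public_index):
    """Exact lookup: keep the fingerprints present on both sides.

    private_fingerprints maps fingerprint -> local path.
    public_index maps fingerprint -> public location (hypothetical stand-in
    for a database of public files).
    """
    shared = private_fingerprints.keys() & public_index.keys()
    return {fp: (private_fingerprints[fp], public_index[fp]) for fp in shared}
```

Because only checksums are compared, the proprietary code itself never has to leave your machine. The trade-off is that an exact lookup only matches byte-identical files: a copy with a single changed character produces a different fingerprint.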


Source code leaks are to be taken seriously. They damage your brand’s reputation and can put at risk some of the most valuable assets a company owns. While DMCA takedown requests can ensure copyrighted content is withdrawn from public platforms, trade secrets may already have been revealed, jeopardizing the most innovative aspects of your business. On the security side, leaks offer what attackers value the most: a complete picture of the inner workings of a piece of software, including its flaws and the processes and people involved.

With the explosive growth of the code-sharing platform GitHub, monitoring a company’s public footprint has become one of the most challenging tasks for threat intelligence analysts. This is what motivated the creation of HasMyCodeLeaked, a free tool specially designed to help you figure out if your intellectual property has leaked.

