How to hide the source code of a website

A guide to the privacy and protection of your website

The Problem

While this site's purpose is to make it simple and convenient to view the source code of a particular website, there are times when the very opposite is required. You have likely spent a lot of time and effort planning, designing, and creating your website, and this is something you would want to protect.

Furthermore, you may have spent a lot of resources on acquiring or creating other intellectual property, such as images and videos, for your website. It can feel quite terrible to see others using those without your permission. In this article, I will focus primarily on how you can hide your source code, but also give pointers on protecting other media.

So is it possible to hide your source code? If yes, then how can you do it?

Short Answer

You can't. Not all of it anyway.

Long Answer

How Software Works

We come across many applications on our laptops, PCs, and mobile devices. Even your operating system is essentially an application. At their core, these applications are sets of instructions for your hardware. For example, just to open a simple file on your laptop, your operating system has to first read that file from the disk, process it, and output it on your display through an appropriate application.

Even the simplest smartphone apps are sets of instructions that tell your smartphone CPU what to do and how to process data. As far as any CPU is concerned, these are in the form of ones and zeroes, or binary codes, that flip its switches to an on or off state.

Software can consist of a great many instructions, sometimes numbering in the millions, and often relies on other software, as well as special libraries. These instructions can be written by human programmers in one of many programming languages, such as C++ or Java. The choice of language depends on various factors, such as the platform that the application will run on, or how deeply it needs to interact with the hardware.

Programmers can usually understand these languages pretty easily. However, for a computer to understand them, the instructions need to be converted to a format your particular computing hardware can understand. This conversion also reduces an application's size, and increases its execution speed.

Almost all languages require the instruction set, also known as a program, or simply code, to be written in plain text files. Think of this code as a recipe for a particular food, while running the code is analogous to cooking and eating the food.

This particular set of instructions, or source code, is what you would usually like to protect. Large software companies take some extreme measures to do this.

Compiled Languages

For certain languages, such as C++, the source code needs to be compiled into binary machine code, also called an executable file. Another program, a compiler, performs this conversion process. The CPU can then execute the instructions in this binary file directly. This compilation is a one-way process. That is, you can turn the source code into a binary, but it isn't possible to convert the binary back into its exact source code.

Coming back to the food analogy, compiled languages are like a fully cooked, ready-to-eat meal, that is prepared with fresh ingredients. You might not know what went into it, or how it was cooked, but it is easy to eat.

As you can see, distributing your software this way can protect your source code from theft or prying eyes. Just like a cooked meal, you can't easily tell what raw ingredients went into making it.

Compiled software is also very fast and efficient since the CPU can run this code directly. However, it does come at a cost. Code that is compiled this way will only run on the type of hardware or operating system that it is compiled for. This is why you can't just copy your favorite smartphone game to your PC and run it directly.

Bytecode Languages

Certain languages, such as Java, can compile their code into a special type of program called bytecode. Think of it as a halfway point between the source code, and fully compiled binary code. Although not as fast and efficient as a fully compiled binary, bytecode applications have some advantages.

Bytecode can be run without modification on any CPU or operating system for which a runtime application (such as the Java Virtual Machine) is available. This allows the code to be highly portable. You can write, compile, and test your code on your Windows PC, and run the same compiled bytecode on a different device that has a compatible runtime.

This bytecode is also in binary form, so while it won't match the speed of a fully compiled binary, it can still execute very fast. And since it is binary, you can distribute it without exposing your original source code.

Bytecode can be likened to instant noodles, or other processed food, that is already cooked. You just need to add heat or water when you need to eat it.

Interpreted or Scripting Languages

The third type of programming language, and the most relevant type in our case, is the interpreted or scripting language. With these, the code is also written in plain text files, but instead of being compiled before distribution, the instructions are executed directly by an interpreter. Examples of such languages are PHP, JavaScript, Perl, and Bash.

Think of scripted or interpreted languages as a quick salad that you prepare from scratch and eat, whenever you get hungry, or a raw dish such as sushi that the chef prepares in front of you. You know exactly what went into making it, and how it got onto the plate.

Why would you use this type of language for your application? Speed might not be a major factor for you, while compatibility, ease of maintenance, and transparency likely are. It is this openness and convenience that caused so much of the Internet to be built using plain-text languages such as Bash, Perl, and HTML.

Some scripting languages, such as Python, can also be compiled into bytecode or binary for better performance.

How Web Pages Work

Web pages are written in HyperText Markup Language, or HTML for short. At its most basic, this language tells your web browser how text, images, and other elements should be displayed to the visitor. This is done via simple tags, such as <p> to define a paragraph, or <img> for an image.

HTML is interpreted by the browser directly from plain text, much like the scripting languages above. This is really useful in allowing a web page to be displayed on virtually any device. The same page can work on both a smartphone and a laptop, without having to be rewritten. Some sites do serve different pages depending on the platform, but this is usually done for design or optimization purposes.

While HTML defines the structure of a web page, other details, such as the colors, width, font type, and font size, are handled by cascading style sheets, also known as CSS. These are typically stored in a separate file, and called upon by the HTML page when needed.

In addition, much of a webpage's computation and other functionality is handled by JavaScript. This can also be stored in separate files, and loaded upon request.

Unfortunately for our purposes, HTML, CSS, and JavaScript files are all written in plain text, and are stored in the web browser's cache when first accessed. This makes it virtually impossible to completely hide the source code.

Similarly, your browser also loads other content, such as images (e.g. PNG, JPEG, GIF, SVG) and video, and your visitors can easily copy these.

Good Ways To Hide Your Source Code

Now that we know what we are up against, we need to find ways of getting around these limitations. Or at least make it harder for someone to reuse your work. Here are some good ways to do this:

Obfuscating Source Code

The most basic protection you can add is to obfuscate much of a website's code. With JavaScript, for example, you can replace all instances of a function or variable named calculate_time with a single letter, say 't', as long as that letter isn't already used elsewhere in the code. This hides some of the functionality, and makes it very inconvenient for someone to figure out what your code is doing.
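As a rough illustration of this kind of renaming, the sketch below shows a small function before and after obfuscation by hand. The function body and parameter names are made up for this example (only the name calculate_time comes from the text above); in practice, a minifier or obfuscator tool would do the renaming for you.

```javascript
// Readable original: the names explain what the code does.
// (Hypothetical example function, not taken from any real site.)
function calculate_time(start_ms, end_ms) {
  var elapsed_ms = end_ms - start_ms;
  return Math.round(elapsed_ms / 1000); // elapsed time in whole seconds
}

// Obfuscated version: identical logic, but the names carry no meaning.
function t(a, b) {
  var c = b - a;
  return Math.round(c / 1000);
}
```

Both functions return the same result for the same input, so the page behaves identically; only a human reader is slowed down.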

Although this type of obfuscation is not possible with HTML, you could remove any unnecessary data, such as newlines, and comments, from a web page's source code. This will also make it quite difficult for others to copy and modify your code. For example, this source code of a sample web page:

<!DOCTYPE html>
<html>
  <head>
    <title>Test page</title>
  </head>
  <body>
    <h1>This is an awesome site</h1>
    <!-- This is just a comment -->
    <p>This is a paragraph.</p>
  </body>
</html>

can be replaced with just this code:

<!DOCTYPE html><html><head><title>Test page</title></head><body><h1>This is an awesome site</h1><p>This is a paragraph.</p></body></html>

The web page will function the same, since your web browser already ignores the parts we left out, but removing the comment, as well as the newlines and whitespace, will make it a bit more difficult for a person intending to steal your code. It will also reduce the size of your web page, making it load just a tiny bit faster. In this example, the savings are about 33%. This is a marginal improvement on such a small web page, but can make a lot of difference on large objects, especially CSS or JavaScript files.
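The clean-up described above can be sketched in a few lines of JavaScript. This is a deliberately naive version, assuming a simple page like the sample: the function name minifyHtml is hypothetical, and the approach would damage pages that rely on whitespace inside elements such as pre or textarea, or that contain inline scripts. Dedicated minifier tools handle those cases.

```javascript
// Naive HTML minifier: a sketch only. It does not protect <pre>,
// <textarea>, or inline <script> content, which real tools must preserve.
function minifyHtml(html) {
  return html
    .replace(/<!--[\s\S]*?-->/g, "") // drop HTML comments
    .replace(/>\s+</g, "><")         // drop whitespace between tags
    .trim();
}

// A well-formed copy of the sample page, one source line per entry.
const page = [
  "<!DOCTYPE html>",
  "<html>",
  "  <head>",
  "    <title>Test page</title>",
  "  </head>",
  "  <body>",
  "    <h1>This is an awesome site</h1>",
  "    <!-- This is just a comment -->",
  "    <p>This is a paragraph.</p>",
  "  </body>",
  "</html>",
].join("\n");

const small = minifyHtml(page);
const saved = 1 - small.length / page.length; // fraction of bytes removed
console.log(small);
console.log("saved " + Math.round(saved * 100) + "%");
```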

Disable File and Folder Listing

The contents of your website are stored in folders on your web server, the same way you store your data on your computer. If you access a folder on your computer, you get a list of all the files and folders within that folder. On a web server, an index file or a home page is usually displayed instead of this list. Depending on how your web server is configured, this can be served from a file such as index.html, main.html, or index.php. Application servers might not even look for a file, and display other information instead.

In any case, if your visitors could get a list of files and folders on your website, it makes it much easier to steal or copy your content. For certain sites, such as those serving open source software, this is actually desirable. Kernel.org, for example, allows you to directly traverse its public folder.

For most sites, however, it makes sense to hide such a list by disabling folder listing. Many web servers and hosting setups disable this by default. However, check your website just to be sure, and consult your web server's documentation.

For Nginx, make sure the autoindex directive is either absent, or set to off:

location /somedir {
  autoindex off;
}

For Apache, make sure the Indexes option is either absent, or disabled with a '-' sign ('+Indexes' means it is enabled):

<Directory /usr/local/apache2/htdocs/dontlistme>
  Options -Indexes
</Directory>

Referer Header

A useful HTTP header for protecting your content is the Referer. The browser attaches it to a request that is generated by clicking a link on another page, or by loading a resource that a web page references. For example, if a webpage at https://www.example.org/page1.html contains a link to https://www.example.com/page5.html, clicking the link will include the first URL in the Referer header. Similarly, if the second page loads a stylesheet, the request for that stylesheet will carry the second page's URL in this header.

How is this useful for us? You can configure your server to deny any request from certain referring URLs, or any request where the referer is absent. So let's say the image 'background.jpg' is loaded by your page. You can allow your page's URL, and deny all other requests. So if someone types the image URL directly in the browser, or tries to load or hotlink it from another site, your server will deny that request.

For Nginx, you can use the ngx_http_referer_module module for this purpose:

valid_referers example.com *.example.com;

if ($invalid_referer) {
    return 403;
}

For Apache, you would use something like this in your .htaccess file, or the web server's configuration:

RewriteEngine On
RewriteBase /

# Allow any requests from example.com and any of its sub-domains,
# whether the referring page was loaded over HTTP or HTTPS
RewriteCond %{HTTP_REFERER} ^https?://(?:[^/]+\.)?example\.com(?:[/:]|$)
RewriteRule ^ - [L]

# Deny everyone else, by sending a Forbidden (403) response
RewriteRule ^ - [F]

# Document to show
ErrorDocument 403 /forbidden.html

Like any request header, the Referer header can be faked, but this can act as one additional layer of protection for your intellectual property.
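To make the logic behind these server rules easier to follow, here is a minimal sketch of the same check written as a plain JavaScript function. The function name isAllowedReferer and its allowedDomain parameter are assumptions for illustration; your web server performs the equivalent test internally when you use the configurations above.

```javascript
// Decide whether a request's Referer header points at an allowed site.
// allowedDomain is a hypothetical setting, for example "example.com".
function isAllowedReferer(refererHeader, allowedDomain) {
  if (!refererHeader) {
    return false; // absent Referer: deny, as described in the text
  }
  let host;
  try {
    host = new URL(refererHeader).hostname;
  } catch (err) {
    return false; // malformed header value
  }
  // Accept the domain itself and any of its sub-domains.
  return host === allowedDomain || host.endsWith("." + allowedDomain);
}
```

Comparing the parsed host name, rather than searching the whole header for the domain, avoids accepting look-alike referers such as https://example.com.evil.org/.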

Source Code Legal Protection

Admittedly, when it comes to law, many aspects of a website are still a little murky. In most parts of the world, the source code that you write yourself, as well as any images or other content that you create, belongs to you. You can give this away, license it, or legally protect it via copyright.

It gets very complicated when you include other people's source code, design, text, images, video, and audio. Furthermore, your website may be hosted on a server in one country, while most of your visitors are from another country, with entirely different laws that govern websites and their content.

With over 1.6 billion websites on the Internet, there is little you can do in practice about somebody stealing your content, especially if the offending website is hosted in a region where copyright laws are lax or non-existent. However, legal action is still possible in some cases.

I am far from a legal expert on this, so my advice would be to consult a local lawyer, who specializes in digital copyright law.

Encrypting Your Website

Though websites are written in HyperText Markup Language (HTML), they use the HyperText Transfer Protocol (HTTP) to deliver pages, as well as other content, to a visitor's browser. HTTP is very efficient for this purpose, but it has a crucial flaw. It transfers data in plain text. This means that anyone can easily intercept your data while in transit, making it unsuitable for any sensitive data, including passwords, credit card details, or mailing addresses. Even the email address you entered to sign up somewhere is valuable to someone, and they will try to get it.

This is where HTTPS, the secure version of HTTP, comes in. This protocol uses Transport Layer Security (TLS, the successor to Secure Sockets Layer, or SSL) to encrypt all data being transferred, allowing you to serve content to your users without fear of it being intercepted.

Until a few years ago, HTTPS was rarely used due to the cost of encryption certificates, complicated setup procedures, and performance issues. Even large websites only used it for sensitive parts of the website, such as login or payment pages. Today, performance is no longer an issue, it can be set up very easily, and a number of services provide free certificates. Furthermore, search engines, such as Google, favor websites with HTTPS.

Stopping Automated Website Scrapers

People often employ automated scripts, bots, or other tools to scrape data from your website. Besides obfuscation, there are some other ways to stop this type of access.

One method you may want to consider for protecting your source code is to restrict the user agent accessing your site. Web browsers include a User-Agent header with each request. In case of a PC browser, such as Firefox, running on a Windows system, this may look like this:

Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0

On the other hand, automated bots may use tools such as "Wget" or "Curl" to scrape your data.

Such activity can be curtailed by configuring your web server to reject requests from such user agents. However, do note that this header can be modified. So a script using "curl" on a Linux system could still appear as Chrome running on a macOS computer.
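The decision your server makes here can be sketched as a small JavaScript function. The block list and the function name isBlockedUserAgent are assumptions for illustration, as is the choice to reject requests that send no User-Agent at all; real web servers express the same rule in their own configuration syntax.

```javascript
// Substrings of User-Agent values to reject (an illustrative list).
const BLOCKED_AGENTS = ["wget", "curl"];

function isBlockedUserAgent(userAgent) {
  if (!userAgent) {
    return true; // assumption: treat a missing header as suspicious
  }
  const ua = userAgent.toLowerCase();
  // Block if any listed tool name appears anywhere in the header.
  return BLOCKED_AGENTS.some((name) => ua.includes(name));
}
```

A request carrying the Firefox header shown earlier passes this check, while one sent by an unmodified command-line tool does not. As noted above, a scraper that fakes a browser's header will pass too.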

Another way to restrict such attempts is to put limits on how often someone can access your server. Refer to your web server's documentation to configure this according to your needs.
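As a sketch of what such a limit does, the JavaScript below counts requests per client in fixed time windows. The names createRateLimiter and allow, and the idea of keying clients by IP address, are assumptions for illustration; production web servers ship their own rate-limiting features, which you should prefer over hand-written code.

```javascript
// Fixed-window rate limiter: allow at most `limit` requests per client
// in each window of `windowMs` milliseconds. Sketch only: entries are
// never evicted, so a real implementation would also clean up old keys.
function createRateLimiter(limit, windowMs) {
  const windows = new Map(); // client key -> { start, count }

  return function allow(clientKey, nowMs) {
    const entry = windows.get(clientKey);
    if (!entry || nowMs - entry.start >= windowMs) {
      // First request from this client, or the old window has expired.
      windows.set(clientKey, { start: nowMs, count: 1 });
      return true;
    }
    entry.count += 1;
    return entry.count <= limit;
  };
}
```

For example, createRateLimiter(2, 1000) returns a function that accepts two requests per second from each client key (such as an IP address) and rejects the rest until the next window begins.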

Methods That Should Be Avoided

That said, we should also mention some bad ways to hide your source code. These could help protect your source, but they cause a number of other problems, and it is best to avoid these:

JavaScript HTML Generation

You can use JavaScript code to generate your web page in your visitor's browser. This code can be obfuscated, and as we saw above, obfuscation is one way to deter theft of your code. If anyone looks at your source code, all they will see is a large JavaScript file, with code that is very difficult to read. Something like this, but much, much longer:

function f(t,e,n,i,s){for(var o,a=[],r=0,l=t.length,c=null!=e;l>r;r++)(o=t[r])&&(n&&!n(o,i,s)||(a.push(o),c&&e.push(r)));return a}

This may sound good, but hiding your code like this causes a number of problems. Firstly, it adds lots of complexity. You will need to generate this code every time you need to change something on the website. The potential for your code to break is also a lot higher.

Secondly, search engines may ignore your content or have trouble indexing it. Not only might you have problems ranking for your keywords, but any visitors or crawlers with JavaScript disabled will not find anything useful on your site.

Lastly, it still doesn't protect you fully, and will likely end up annoying legitimate users.

Disable Right Click

One way to deter people from viewing your site's source code is to disable the context menu when you right-click on a web page. The context menu usually contains options to view the page's source, or to open the inspector that developers usually use.

Please note that this is not recommended, since there are pretty simple workarounds to this, and it mostly just serves to annoy users who might want the context menu for legitimate reasons. In case you still want to go ahead with it, one way to achieve this is to simply add this code to your body tag:

<body oncontextmenu="return false;">

Other HTML Tricks

Other tricks you can employ include adding lots of whitespace to pad your pages. When someone views the source in a viewer that doesn't wrap long lines, all they will see is a blank page. They can of course scroll to the right and see the code, but this may not be apparent to some people.

The problem with this is that the whitespace does little to really hide your code, yet it greatly increases the file size. Even with compression, a file that may be twice as large as it needs to be has to be read, transferred, and loaded by your browser, taking up more resources than a compact one.

Tools for Source Code Protection

Here are certain tools that may be useful, if you would like to save your source code from prying eyes:

Webpage Minify

This online tool removes extra whitespace from the HTML of a given web page.

CSS Obfuscator

This online tool obfuscates CSS, making it very compact, without losing functionality.

Conclusion

Protecting your intellectual property (source code, design, images, etc.) is very important, but can be a challenge when it comes to websites. However, by following the tips mentioned here, you can achieve some level of protection, with the added benefit of a faster site and a better user experience for your visitors.

What are your favorite tips for hiding your source code? Let us know in the comments.
