How to hide the source code of a website

A guide to the privacy and protection of your website

Posted On: June 17, 2019, Updated On: April 5, 2022

Overview

The Problem with HTML

While this site's purpose is to make it simple and convenient to view the source code of a particular website, there are times when the very opposite is required. You have likely spent a lot of time and effort in planning, designing, and creating your website. And this is something you would want to protect.

Furthermore, you may have spent a lot of resources on acquiring or creating other intellectual property, such as images and videos for your website. It can feel quite terrible to see others using those without your permission. In this article, I will focus primarily on how you can hide your source code, but also give pointers on protecting other media.

So is it possible to hide your source code? If yes, then how can you do it?

How Software Works

We come across many applications on our laptops, PCs, and mobile devices. Even your operating system is essentially an application. At their core, these applications are sets of instructions for your hardware. For example, just to open a simple file on your laptop, your operating system has to first read that file from the disk, process it, and output it on your display through an appropriate application.

Even the simplest smartphone apps are sets of instructions that tell your smartphone CPU what to do and how to process data. As far as any CPU is concerned, these are in the form of ones and zeroes, or binary codes, that flip its switches to an on or off state.

Software can consist of a great many instructions, sometimes numbering in the millions, and often relies on other software, as well as special libraries. These instructions can be written in one of many programming languages, such as C++ or Java, by human programmers. The choice of language depends on various factors, such as the platform that the application will run on, or how deeply it needs to interact with the hardware.

Programmers can usually understand these languages pretty easily. However, for a computer to understand them, the instructions need to be converted to a format your particular computing hardware can understand. This conversion usually also reduces an application's size, and increases its execution speed.

Almost all languages require the instruction set, also known as a program, or simply code, to be written in plain text files. Think of this code as a recipe for a particular food, while running the code is analogous to cooking and eating the food.

This particular set of instructions, or source code, is what you would usually like to protect. Large software companies take some extreme measures to do this.

Compiled Languages

For certain languages, such as C++, the source code needs to be compiled into binary machine code, also called an executable file. Another program, a compiler, performs this conversion process. The CPU can then execute the instructions in this binary file directly. This compilation is a one-way process. That is, you can turn the source code into a binary, but it isn't possible to convert the binary back into its exact source code.

Coming back to the food analogy, compiled languages are like a fully cooked, ready-to-eat meal that is prepared with fresh ingredients. You might not know what went into it, or how it was cooked, but it is easy to eat.

As you can see, distributing your software this way can protect your source code from theft or prying eyes. Just like a cooked meal, you can't easily tell what raw ingredients went into making it.

Compiled software is also very fast and efficient since the CPU can run this code directly. However, it does come at a cost. Code that is compiled this way will only run on the type of hardware or operating system that it is compiled for. This is why you can't just copy your favorite smartphone game to your PC and run it directly.

Bytecode Languages

Certain languages, such as Java, can compile their code into a special type of program called bytecode. Think of it as a halfway point between the source code and fully compiled binary code. Although not as fast and efficient as a fully compiled binary, bytecode applications have some advantages.

Bytecode can be run without modification on any CPU, or in any environment, for which a runtime application is available. This allows the code to be highly portable. You can write, compile, and test your code on your Windows PC, and run the same compiled bytecode on a mobile device.

This bytecode is also in binary form, so while it won't match the speed of a fully compiled binary, it can still execute very fast. And since it is binary, you can distribute it without exposing your original source code.

Bytecode can be likened to instant noodles, or other processed food, that is already cooked. You just need to add heat or water when you need to eat it.

Interpreted or Scripting Languages

The third type of programming languages, and the most relevant type in our case, are interpreted or scripted languages. With these, the code is also written in plain text files, but instead of compiling it before distribution, an interpreter directly executes the instructions. Examples of such languages are PHP, JavaScript, Perl, and Bash.

Think of scripted or interpreted languages as a quick salad that you prepare from scratch and eat whenever you get hungry, or a raw dish such as sushi that the chef prepares in front of you. You know exactly what went into making it, and how it got onto the plate.

Why would you use this type of language for your application? Speed might not be a major factor for you, while compatibility, ease of maintenance, and transparency likely are. It is this openness, and convenience, that caused so much of the Internet to be built using such languages as Bash, Perl, and HTML.

Some scripting languages, such as Python, can also be compiled into bytecode or binary for better performance.

How Web Pages Work

Web pages are written in HyperText Markup Language, or HTML for short. At its most basic, this language tells your web browser how text, images, and other elements should be displayed to the visitor. This is done via simple tags, such as <p> to define a paragraph, or <img> for an image.

HTML is interpreted by the browser rather than compiled. This is really useful in allowing a web page to be displayed on virtually any device. The same page can work on both a smartphone and a laptop without having to be rewritten. Some sites do serve different pages, depending on the platform, but this is usually done for design or optimization purposes.

While HTML defines the structure of a web page, other details, such as the colors, width, font type, and font size, are handled by cascading style sheets, also known as CSS. These are typically stored in a separate file, and called upon by the HTML page when needed.

In addition, much of a webpage's computation and other functionality is handled by JavaScript. This can also be stored in separate files, and loaded upon request.

Unfortunately for our purposes, HTML, CSS, and JavaScript files are all written in plain text, and stored in the web browser's cache when first accessed. This makes it virtually impossible to completely hide the source code.

Similarly, your browser also loads other content, such as images (e.g. PNG, JPEG, GIF, SVG) and video, and your visitors can easily copy these.

Good and Bad Ways To Hide Your Source Code

Now that we know what we are up against, we need to find ways of getting around these limitations. Or at least make it harder for someone to reuse your work. Here are some good ways to do this, as well as others that should be avoided:

Code Obfuscation

The most basic protection you can add is to obfuscate much of a web site's code. With JavaScript, for example, you can replace all instances of a function or variable named calculate_time with a single letter, say 't', as long as that letter isn't already used elsewhere in the code. This hides some of the functionality, and makes it very inconvenient for someone to figure out what your code is doing.
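As a small illustration of this kind of renaming, the sketch below places a readable function next to a hand-obfuscated equivalent. Only the name calculate_time comes from the text above. The function body, its parameters, and the short names are made up for the example, and real obfuscation tools automate this step.

```javascript
// Readable original. Only the name calculate_time comes from the article;
// the body and parameter names are hypothetical.
function calculate_time(distanceKm, speedKmh) {
  const hours = distanceKm / speedKmh;
  return Math.round(hours * 60); // travel time in whole minutes
}

// Hand-obfuscated equivalent: same logic, meaningless names.
function t(a, b) {
  return Math.round((a / b) * 60);
}

// Both versions behave identically; only readability differs.
console.log(calculate_time(120, 80) === t(120, 80));
```

The behavior is unchanged, so anyone willing to step through the code can still work out what it does. Renaming only raises the effort required.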

Although this type of obfuscation is not possible with HTML, you could remove any unnecessary data, such as newlines and comments, from a web page's source code. This will also make it quite difficult for others to copy and modify your code. For example, this source code of a sample web page:

<!DOCTYPE html>
<head>
    <title>Test page</title>
</head>
<body>
    <h1>This is an awesome site</h1>
    <!-- This is just a comment -->
    <p>This is a paragraph.</p>
</body>
</html>

can be replaced with just this code:

<!DOCTYPE html><head><title>Test page</title></head><body><h1>This is an awesome site</h1><p>This is a paragraph.</p></body></html>

The web page will function the same, since your web browser already ignores the parts we left out, but removing the comment, as well as the newlines and whitespace, will make it a bit more difficult for a person intending to steal your code. It will also reduce the size of your web page, making it load just a tiny bit faster. In this example, the savings are roughly a third. This is a marginal improvement on such a small web page, but can make a lot of difference on large objects, especially CSS or JavaScript files.
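The same clean-up can be automated. The following is a minimal sketch and not a full HTML parser: the function name minifyHtml is hypothetical, and it assumes the page contains no <pre>, <textarea>, or inline script content whose whitespace or comment-like text must be preserved.

```javascript
// Naive HTML minifier: a sketch, not a full HTML parser.
// Assumes no <pre>, <textarea>, or inline scripts whose whitespace
// or comment-like text has to survive untouched.
function minifyHtml(html) {
  return html
    .replace(/<!--[\s\S]*?-->/g, "") // drop comments
    .replace(/>\s+</g, "><")         // drop whitespace between tags
    .trim();
}

// A page like the sample above, written out with newlines and indentation.
const page = [
  "<!DOCTYPE html>",
  "<head>",
  "    <title>Test page</title>",
  "</head>",
  "<body>",
  "    <h1>This is an awesome site</h1>",
  "    <!-- This is just a comment -->",
  "    <p>This is a paragraph.</p>",
  "</body>",
  "</html>",
].join("\n");

const small = minifyHtml(page);
console.log(small);
console.log(`${page.length} -> ${small.length} characters`);
```

Whitespace inside text content is left alone, so the visible text of the page does not change. For production use, a dedicated minifier that understands HTML is a safer choice than regular expressions.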

Disabling File and Folder Listing

The contents of your website are stored in folders on your web server, the same way you store your data on your computer. If you access a folder on your computer, you get a list of all the files and folders within that folder. On a web server, instead of the list, an index file or a home page is usually displayed. Depending on how your web server is configured, this can be served from a file such as index.html, main.html, or index.php. Application servers might not even look for a file and display other information instead.

In any case, if your visitors could get a list of files and folders on your website, it makes it much easier to steal or copy your content. For certain sites, such as those serving open source software, this is actually desirable. Kernel.org, for example, allows you to directly traverse its public folder.

For most sites however, it makes sense to hide such a list by disabling folder listing. Almost all web servers today disable this by default. However, check your website just to be sure, and consult your web server's documentation.

For Nginx, make sure the autoindex directive is either absent, or set to off:

location /somedir {
    autoindex off;
}

For Apache, make sure the Indexes option is either absent, or disabled with a '-' sign ('+Indexes' means it is enabled):

<Directory /usr/local/apache2/htdocs/dontlistme>
    Options -Indexes
</Directory>

Referer Header

Another useful header is the Referer. This is attached to each HTTP request that is generated by clicking a link on another page, or when a resource is referenced in a web page. For example, if a webpage at https://www.example.org/page1.html contains a link to https://www.example.com/page5.html, clicking the link will include the first URL within the Referer header. Or if a stylesheet is loaded by the second page, the second URL will be in this header.

How is this useful for us? You can configure your server to deny any request from certain referring URLs, or if a referer is absent. So let's say image 'background.jpg' is loaded by your page. You can allow your page's URL, and deny any requests other than this. So if someone types the image URL directly in the browser, or tries to load or hotlink it from another site, your server will deny that request.

For Nginx, you can use the ngx_http_referer_module module for this purpose:

valid_referers example.com *.example.com;
if ($invalid_referer) {
    return 403;
}

For Apache, you would use something like this in your .htaccess file, or the web server's configuration:

RewriteEngine On
RewriteBase /
# Allow any requests from example.com and any of its sub-domains,
# over either http or https
RewriteCond %{HTTP_REFERER} ^https?://(?:[^/]+\.)?example\.com(?:/|$)
RewriteRule ^ - [L]
# Deny everyone else, by sending a Forbidden (403) response
RewriteRule ^ - [F]
# Document to show
ErrorDocument 403 /forbidden.html

Like other request headers, the Referer header can be faked, but this can act as one additional layer of protection for your intellectual property.
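The allow-or-deny decision that these server rules make can be sketched in a few lines of JavaScript. The helper names isAllowedReferer and statusFor are hypothetical, and example.com stands in for your own domain. This only illustrates the logic and is not a replacement for the server configuration.

```javascript
// Hypothetical helper mirroring the allow/deny rule: accept requests whose
// Referer host is example.com or one of its sub-domains, and reject
// everything else, including requests with no Referer at all.
function isAllowedReferer(referer, allowedDomain = "example.com") {
  if (!referer) return false; // header absent
  let host;
  try {
    host = new URL(referer).hostname; // URL is built into browsers and Node.js
  } catch {
    return false; // not a valid absolute URL
  }
  return host === allowedDomain || host.endsWith("." + allowedDomain);
}

// Map the decision to the HTTP status the server would send.
function statusFor(referer) {
  return isAllowedReferer(referer) ? 200 : 403;
}

console.log(statusFor("https://www.example.com/page5.html"));
console.log(statusFor("https://www.example.org/page1.html"));
```

Comparing the parsed host name, rather than searching the whole URL for the domain, avoids accepting look-alike hosts such as example.com.evil.org.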

Legal Protection of Your Website

Admittedly, when it comes to law, many aspects of a website are still a little murky. In most parts of the world, the source code that you write yourself, as well as any images or other content that you create, belongs to you. You can give this away, license it, or legally protect it via copyright.

It gets very complicated when you include other people's source code, design, text, images, video, and audio. Furthermore, your website may be hosted on a server in one country, while most of your visitors are from another country, with entirely different laws that govern websites and their content.

With over 1.6 billion websites on the Internet, there is, practically speaking, little you can do about somebody stealing your content. Especially if the offending website is hosted in a region where copyright laws are lax, or non-existent. However, it is still possible.

I am far from a legal expert on this, so my advice would be to consult a local lawyer who specializes in digital copyright law.

Encrypting Your Website

Though websites are written in HyperText Markup Language (HTML), they use the HyperText Transfer Protocol (HTTP) to deliver pages, as well as other content, to a visitor's browser. HTTP is very efficient for this purpose, but it has a crucial flaw. It transfers data in plain text. This means that anyone positioned along the network path can intercept your data while in transit, making it unsuitable for any sensitive data, including passwords, credit card details, or mailing addresses. Even the email address you entered to sign up somewhere is valuable to someone, and they will try to get it.

This is where HTTPS, the secure version of HTTP, comes in. This protocol uses Transport Layer Security (TLS, the successor to Secure Sockets Layer, or SSL) to encrypt all data being transferred, allowing you to serve content to your users without fear of it being intercepted.

Until a few years ago, HTTPS was rarely used due to the cost of encryption certificates, complicated setup procedures, and performance issues. Even large websites only used it for sensitive parts of the website, such as login or payment pages. Today, performance is no longer an issue, it can be set up very easily, and a number of services provide free certificates. Furthermore, search engines, such as Google, favor websites with HTTPS.

Stopping Automated Website Scrapers

People often employ automated scripts, bots, or other tools to scrape data from your website. Besides obfuscation, there are some other ways to stop this type of access.

One method you may want to consider for protecting your source code is to restrict the user agent accessing your site. Web browsers include a User-Agent header with each request. In case of a PC browser, such as Firefox, running on a Windows system, this may look like this:

Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0

On the other hand, automated bots may use tools such as "Wget" or "curl" to scrape your data.

Such activity can be curtailed by configuring your web server to reject requests from such user agents. However, do note that this header can be modified. So a script using "curl" on a Linux system could still appear as Chrome running on a Mac.
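The check itself is simple, which is also why it is easy to get around. Below is a minimal sketch of the idea. The blocklist contents, the function name isBlockedUserAgent, and the choice to reject requests with no User-Agent at all are assumptions made for the example.

```javascript
// Hypothetical blocklist of substrings found in the default User-Agent
// of common command-line tools. A real deployment would tune this list.
const BLOCKED_AGENTS = ["wget", "curl"];

function isBlockedUserAgent(userAgent) {
  if (!userAgent) return true; // assumption: treat a missing header as suspicious
  const ua = userAgent.toLowerCase();
  return BLOCKED_AGENTS.some((name) => ua.includes(name));
}

const firefox =
  "Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0";

console.log(isBlockedUserAgent("curl/7.68.0")); // a tool announcing itself
console.log(isBlockedUserAgent(firefox));       // a browser, or a tool pretending to be one
```

Because the function only sees the string the client chooses to send, a script that copies the Firefox value passes the check unchanged.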

Another way to restrict such attempts is to put limits on how often someone can access your server. Refer to your web server's documentation to configure this according to your needs.
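Web servers implement rate limiting in different ways. As a rough illustration of the idea, here is a fixed-window limiter sketch. The function name createRateLimiter, the limit of three requests per second, and the use of the client IP address as the key are all assumptions for the example.

```javascript
// Fixed-window rate limiter sketch. Assumes maxRequests is at least 1.
// Each client key (here an IP address) gets its own counter, which
// resets once windowMs milliseconds have passed.
function createRateLimiter(maxRequests, windowMs) {
  const windows = new Map(); // client key -> { start, count }

  return function allow(clientKey, now = Date.now()) {
    const entry = windows.get(clientKey);
    if (!entry || now - entry.start >= windowMs) {
      windows.set(clientKey, { start: now, count: 1 }); // start a new window
      return true;
    }
    entry.count += 1;
    return entry.count <= maxRequests;
  };
}

// Allow 3 requests per second per client. Timestamps are passed
// explicitly here so the example does not depend on the clock.
const allowRequest = createRateLimiter(3, 1000);
for (const time of [0, 100, 200, 300, 1200]) {
  console.log(time, allowRequest("203.0.113.7", time));
}
```

A request that is refused would typically receive a 429 (Too Many Requests) response. Limits that are too strict can also lock out legitimate visitors who share an IP address, so they need some tuning.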

Enabling HTTP Authentication

If all else fails, you could enable HTTP Auth for certain pages, or even your whole website. This will prompt anyone attempting to access your pages for a username and password. If they enter the wrong credentials, they will simply receive a 401 (Unauthorized) HTTP client error code.

Granted, this is a pretty extreme way of protecting yourself. After all, a website's purpose is to serve visitors. However, this is a great way to restrict access to selected people. This is often used by organizations for in-house services, or for test versions of their production site.

HTTP Authentication is a widely accepted feature and can be enabled in your web server.
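To make the exchange more concrete, here is a sketch of the check a server performs for the Basic scheme of HTTP authentication. It assumes a Node.js environment (for Buffer). The function name checkBasicAuth, the user list, and the realm text are placeholders, not part of any particular web server.

```javascript
// Sketch of the server-side check behind HTTP Basic authentication.
// The username, password, and realm below are placeholders.
const USERS = { alice: "s3cret" };

function checkBasicAuth(authorizationHeader) {
  // Response that makes the browser prompt for a username and password.
  const challenge = {
    status: 401,
    headers: { "WWW-Authenticate": 'Basic realm="Restricted"' },
  };
  if (!authorizationHeader || !authorizationHeader.startsWith("Basic ")) {
    return challenge;
  }
  // The credentials arrive as "username:password", base64-encoded.
  const decoded = Buffer.from(authorizationHeader.slice(6), "base64").toString("utf8");
  const separator = decoded.indexOf(":");
  if (separator === -1) return challenge;
  const user = decoded.slice(0, separator);
  const password = decoded.slice(separator + 1);
  const known = Object.prototype.hasOwnProperty.call(USERS, user);
  return known && USERS[user] === password
    ? { status: 200, headers: {} }
    : challenge;
}

const header = "Basic " + Buffer.from("alice:s3cret").toString("base64");
console.log(checkBasicAuth(header).status);
console.log(checkBasicAuth(undefined).status);
```

Base64 is an encoding and not encryption, so Basic authentication should only be used over HTTPS. In practice you would enable the feature in your web server, and store hashed passwords, instead of writing this check yourself.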

Methods That Should Be Avoided

That said, we should also mention some bad ways to hide your source code. These could help protect your source, but they cause a number of other problems with usability, SEO, and website security. It is best to avoid these:

Javascript HTML Generation

You can use JavaScript code to generate your web page in your visitor's browser. This code can be obfuscated, and as we saw above, obfuscation is one way to deter theft of your code. If anyone looks at your source code, all they will see is a large JavaScript file, with code that is very difficult to read. Something like this, but much, much longer:

function f(t,e,n,i,s){for(var o,a=[],r=0,l=t.length,c=null!=e;l>r;r++)(o=t[r])&&(n&&!n(o,i,s)||(a.push(o),c&&e.push(r)));return a}

This may sound good, but hiding your code like this causes a number of problems. Firstly, it adds lots of complexity. You will need to generate this code every time you need to change something on the website. The potential for your code to break is also a lot higher.

Secondly, search engines may index your content poorly, or ignore it altogether. Not only will you have problems ranking for your keywords, any visitors or crawlers with JavaScript disabled will not find anything useful on your site.

Lastly, it still doesn't protect you fully, and will likely end up annoying legitimate users.

Disabling Right Click

One way to deter people from viewing your site's source code is to disable the context menu when you right-click on a web page. The context menu usually contains options to view the page's source, or to open the inspector that developers usually use.

Please note that this is not recommended, since there are pretty simple workarounds to this, and it mostly just serves to annoy users who might want the context menu for legitimate reasons. In case you still want to go ahead with it, one way to achieve this is to simply add this code to your body tag:

<body oncontextmenu="return false;">

Other HTML Tricks

Other tricks you can employ include adding lots of whitespace to pad your pages. When someone views the source in a viewer that doesn't wrap long lines, all they will see is a blank page. They can of course scroll to the right and see it, but this may not be apparent to some people.

The problem with this is that the whitespace does little to really hide your code, yet it greatly increases the file size. Even with compression, a file that may be twice as large as it should be has to be read, transferred, and loaded by your browser, taking up a little more resources than an already compact one.

Tools for Source Code Protection

Here are certain tools that may be useful, if you would like to save your source code from prying eyes:

CSS Obfuscator

This online tool obfuscates CSS, making it very compact, without losing functionality.

Conclusion

Protecting your intellectual property (source code, design, images, etc.) is very important, but can be a challenge when it comes to websites. However, by following the tips mentioned here, you can achieve some level of protection, with the added benefit of a faster site, with a better user experience for your visitors.

What are your favorite tips for hiding your source code? Let us know in the comments.