The Blockchain Notary: 5-minute file certification without corruptible intermediaries

Hello everyone, this article is going to be cool, because I finally got to write some code and put things into practice instead of just talking. Remember that in the blockchain ecosystem there are 2 kinds of people:

1- The ones that just talk talk talk (literally almost everyone)

2- The ones that study, experiment, fail, get headaches, and finally build concrete solutions (a few engineers and entrepreneurs).

You can imagine how happy I am to finally say that I got my hands dirty and joined group #2 by actually engineering a decentralized, blockchain-powered web application. Let’s dive into the topic together!

The problem

I want to illustrate how blockchain technology can solve the problem of digital asset notarization. By that I mean certifying that a computer file was created (and therefore existed) at a specific date and time.

Not only did I want to solve that problem, I also wanted to provide a way to undeniably prove that the file has not changed since the date it was certified.

So, if you are really interested in solving this notarization problem using the blockchain, read on! You will get a complete overview, from the problem to crafting a functioning web application that solves it. Let’s go!

When I’m not working I enjoy learning about computer forensics, and sometimes I do related CTFs (Capture-The-Flag challenges: programming and software security puzzles). Because practice is the best way to learn something, right?

One thing I realized after doing a few forensics-based CTFs is that every property of a file can be easily changed or tampered with, even by an unskilled programmer.

This is because it is trivial to open a file with the right editor or tool and simply change its “metadata” (the creation date, the file size, the last modified date, etc).

So if I want to fake a document and, for example, backdate it, nobody stops me from doing it!

Specifically, I can open the file’s metadata, change the creation date to one year ago, and save everything. Anyone who later right-clicks the file and inspects its properties will see a creation date of one year ago.
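To make this concrete, here is a minimal Node.js sketch of backdating a file. It is only an illustration under a few assumptions: it rewrites the last-modified timestamp (not the creation date, which lives in different places depending on the file system and file format), it uses Node's built-in `fs.utimesSync`, and the file name `report.txt` is made up.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Create a throwaway file in a temp directory (hypothetical file name).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "backdate-"));
const filePath = path.join(dir, "report.txt");
fs.writeFileSync(filePath, "quarterly numbers");

// Pick a date one year in the past.
const oneYearAgo = new Date();
oneYearAgo.setFullYear(oneYearAgo.getFullYear() - 1);

// Overwrite the access and modification times with the fake date.
fs.utimesSync(filePath, oneYearAgo, oneYearAgo);

// The file system now reports the file as last modified a year ago.
const reportedMtime = fs.statSync(filePath).mtime;
console.log("Reported last-modified date:", reportedMtime.toISOString());
```

Three lines of setup and one call are enough, which is exactly why file metadata alone proves nothing.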

This gives scammers and high-profile fraudsters plenty of room to carry out their dirty schemes against companies and/or government entities. One example that comes to mind is tampering with digital photos for insurance fraud.

So if files are corruptible and malleable, how can we guarantee their integrity or authenticity over time? How can you put a tamper-proof stamp on a file that says “this document/photo/file/spreadsheet existed today, November 19 2019”?

The solution

First of all, you need to know that we can use a cryptographic function called a “hash function” to take a file as input and output a fixed-length string of letters and numbers called a “hash” or “digest”, which in practice uniquely identifies that file. The hash function is one-way, so it cannot feasibly be reversed, meaning you cannot get a file back from its hash… Just like you cannot get potatoes back from hash browns! Strictly speaking, a “digital signature” is something different (it also involves a private key), but throughout the article I’ll loosely use the term digital signature for the file’s hash.

a file can produce a digital signature thanks to the use of a hashing function
This is what “taking the digital signature of a file” means.
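Here is a minimal sketch of what that looks like in code. The article does not depend on a particular hash function, so SHA-256 is an assumption here, computed with Node's built-in `crypto` module; `sha256Hex` is a helper name invented for this example.

```javascript
const crypto = require("crypto");

// Return the SHA-256 digest of some bytes as a lowercase hex string.
function sha256Hex(data) {
  return crypto.createHash("sha256").update(data).digest("hex");
}

const original = Buffer.from("hello");
const tampered = Buffer.from("hellp"); // a single character changed

console.log(sha256Hex(original));
// 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
console.log(sha256Hex(tampered) === sha256Hex(original)); // false
```

The output always has the same length no matter how big the input is, and changing a single character produces a completely different value.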

Now that you know what a file digital signature is, let’s try to come up with a solution for file certification together!

Hmm, maybe we can take a human, give him some authority and an office, and then let him receive files by email! He can inspect them, take the digital signature, and register these signatures in an Excel sheet.

The problem with this approach is that it is massively inefficient and it’s gonna cost you a fortune to scale, let alone all the privacy and corruption problems that might come with it.

No problem, as a programmer I know very well how to make things fast and efficient! I can definitely replace the human with a 100% software-based robot! It will again accept files, maybe through a website UI or email attachment, hash them, and automatically archive their digital signature with a timestamp in a database.

Very efficient, but unfortunately still not a fail-proof solution, because a skilled hacker can attack the server where the bot lives, knock it offline, and/or delete or modify the data stored in that database, causing serious damage.

Where am I going with this? I am showing you that this specific problem cannot be solved with a centralized intermediary, because whenever we are trusting a single central point we open ourselves to manipulation and corruption. We need to be 100% sure that once we certify a file nobody can tamper with it.

We literally need a ledger that can record information and that, once the information is recorded, does not allow anyone to delete it. We need a ledger that is available 24/7, a ledger that is supported by thousands of computers/servers distributed around the world. You guessed it right, we are going to solve this problem using the Blockchain!

Why did it take so long to get here? Because I wanted to show you the exact thought process I had when evaluating whether it actually made sense for this idea to be blockchain-based. Do not make the mistake of thinking that blockchain solves EVERY problem; it is not a magic pill!

Ok, so now I have this technology that can accept any data and record it permanently on thousands of computers. I know that once that data is stored nobody can delete it, by design (if you are not convinced here you need to go back to the basics and first understand how the blockchain works, it’s all covered in my first articles).

The final solution can therefore be summarized with these main points:

  1. Take a file.
  2. Take the file’s digital signature.
  3. Write that digital signature together with today’s date into the blockchain.
  4. Wait for the transaction to be mined and permanently included in the blockchain, then take note of the transaction and block number.
  5. Store the file certification data so it can be retrieved later, in case you need to check the file’s existence/integrity.
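The five steps can be sketched end to end without touching a real chain. In this sketch, which is an illustration and not the dApp's actual code, a plain array named `ledger` stands in for the blockchain, SHA-256 is an assumed hash function, and `certify` and `checkIntegrity` are hypothetical helper names.

```javascript
const crypto = require("crypto");

// Steps 1-2: take a file (as bytes) and compute its hash.
function sha256Hex(fileBytes) {
  return crypto.createHash("sha256").update(fileBytes).digest("hex");
}

// Steps 3-4, stubbed: append the hash and the date to `ledger`, a plain array
// standing in for the blockchain. The array index plays the role of the
// transaction/block reference you would note down on a real chain.
// Step 5: return the certification data so it can be stored and checked later.
function certify(fileBytes, ledger, now = new Date()) {
  const entry = Object.freeze({
    fileHash: sha256Hex(fileBytes),
    certifiedAt: now.toISOString(),
  });
  const position = ledger.push(entry) - 1;
  return { ...entry, position };
}

// Later: re-hash the file in hand and compare it with the recorded entry.
function checkIntegrity(fileBytes, certificate, ledger) {
  const recorded = ledger[certificate.position];
  return recorded !== undefined && recorded.fileHash === sha256Hex(fileBytes);
}

const ledger = [];
const original = Buffer.from("contract draft, final version");
const certificate = certify(original, ledger);

console.log(checkIntegrity(original, certificate, ledger)); // true
console.log(checkIntegrity(Buffer.from("contract draft, final version!"), certificate, ledger)); // false
```

An array in memory is obviously the opposite of tamper-proof; the point is only to show which pieces of data move where. The real implementation replaces `ledger` with a smart contract.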

The best way to accomplish this is by building a web interface hosted on a publicly accessible URL. Now let’s engineer this solution and make it a reality!

The technical implementation

Where do we start? Well, the first step is to actually pick a specific blockchain that we can use to engineer our product. There are 3 types of blockchains that we can develop on:

  1. Globally distributed, public, non-programmable (Turing-incomplete) blockchains: the prime example is Bitcoin, which is “dumb” (but extremely resilient) because it cannot run general-purpose programs; it just performs calculator-like functions and stores little information. It was the first blockchain ever invented.
  2. Globally distributed, public, programmable (Turing-complete) blockchains: here I’m mainly referring to the blockchains that can run “smart contracts”. This means you can use a dedicated programming language like Solidity to code a complete set of rules with complex logic and have it run entirely on the blockchain. They also allow storage of any kind of data, since you can declare variables and data structures. These are the blockchains of Ethereum, EOS, Tron and many more.
  3. Federated, private/consortium blockchains: from an engineering perspective, these blockchains are not that different from a well structured centralized SQL database cluster. The computers supporting these blockchains are not globally distributed, but usually located in datacenters owned by corporations/companies. So it is basically centralized data storage with a fancy name and innovative branding that gets the VC investors excited (I will maybe rant about this in a different article).

So since I definitely did not want to use #3, I was left to choose between #1 and #2. Both are well suited for this project because I literally just need to store some data; there is no need for complex logic (if this happens then do that, else do that other thing).

In the end I went with #2 because it was way easier to get started with development. Specifically I picked Ethereum, the oldest and most established programmable blockchain.

Also, the most established framework for building dApps (decentralized applications) is Truffle Suite, so I decided to pick up its documentation and learn it. Truffle is amazing because it keeps everything in one place, and it has a lot of documentation and helpful troubleshooting resources in case s**t hits the fan.

The architecture for my dApp was going to be the following:

The Frontend

The front end is, as the name implies, the front of the application: the interface through which users interact with it. The front end is implemented in React, but it could have been Angular, Vue, EmberJS or even vanilla HTML+CSS+JavaScript. It doesn’t matter what you choose.

What matters is that the front end JavaScript uses the Web3 library (web3.js) to communicate with the blockchain.

Specifically for this project, the front end allows a user to upload a file that needs to be certified, hashes it, and writes the hash to the blockchain through a small cryptocurrency transaction using the MetaMask wallet. The wallet transaction is invoked through the Web3 library.

The front end’s job is not only to accept files, hash them and send their metadata to the backend, but also to retrieve past interactions and to generate a PDF certificate for each file that was certified in the past.
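Here is a rough sketch of the certification call, not the actual NOTARX front end code. It assumes `contract` behaves like a web3.js 1.x contract instance (created with `new web3.eth.Contract(abi, address)`) exposing the `certifyFile` function from the smart contract shown later in this article, and that `account` is the address supplied by the wallet. The hash function (SHA-256) and the helper name `certifyFile` on the JavaScript side are assumptions, and Node's `crypto` module is used so the sketch runs on its own; in a browser you would hash with the Web Crypto API instead.

```javascript
const crypto = require("crypto");

// Hash the file and submit its metadata to the contract's certifyFile function.
async function certifyFile(contract, account, fileBytes, fileName) {
  const fileHash = crypto.createHash("sha256").update(fileBytes).digest("hex");
  const dot = fileName.lastIndexOf(".");
  const fileExtension = dot === -1 ? "" : fileName.slice(dot + 1);

  // Argument order follows the Solidity signature:
  // certifyFile(uint fileSize, string fileHash, string fileExtension)
  const receipt = await contract.methods
    .certifyFile(fileBytes.length, fileHash, fileExtension)
    .send({ from: account });

  // Keep the transaction reference so the certificate can be looked up later.
  return {
    fileHash,
    transactionHash: receipt.transactionHash,
    blockNumber: receipt.blockNumber,
  };
}

// Dry run with a fake contract object, so the sketch runs without a node or a wallet.
const fakeContract = {
  methods: {
    certifyFile: (fileSize, fileHash, fileExtension) => ({
      send: async ({ from }) => ({ transactionHash: "0xfake", blockNumber: 1 }),
    }),
  },
};
const dryRun = certifyFile(
  fakeContract,
  "0x0000000000000000000000000000000000000001", // placeholder address
  Buffer.from("hello"),
  "notes.txt"
);
dryRun.then((certificate) => console.log(certificate));
```

Passing the contract in as a parameter keeps the hashing and bookkeeping logic testable without a blockchain connection; with a real web3.js contract, `send` triggers the MetaMask confirmation prompt.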

The Backend

In this project the backend is the real superstar: in fact, the backend is the actual Ethereum blockchain!

In a regular non-blockchain app, the backend is usually a piece of software (so a set of instructions written in computer code) that runs on a server and remembers things (data) thanks to a connected database like SQL or MongoDB.

So how can our backend now be the blockchain? Where does the code run if we don’t have a server?

Well, to use Ethereum as an app backend you need to write and publish a smart contract. A smart contract is a set of rules written in a programming language called Solidity, and will be responsible for handling application backend logic and for recalling/storing data.

Writing a smart contract is a bit similar to writing software for a rocket that will be launched into space, no joke! Once you deploy it to the blockchain, it cannot be deleted or erased. Always remember that ANY data that is written to a distributed blockchain (smart contracts included) cannot be changed or tampered with! So how do you improve the backend and write new code? Deploy a new smart contract to a new address and point the frontend to that new address instead of the old one.

So here’s the smart contract source code with comments:

// Declare the Solidity compiler version
pragma solidity ^0.5.12;

// Declare the contract
contract Authenticity {

  // Event fired every time a file is certified
  event FileCertified(address author, string fileHash, uint timestamp, uint fileSize, string fileExtension);

  // Structured data that describes a certified file
  struct FileCertificate {
    address author;
    string fileHash;
    uint timestamp;
    uint fileSize;
    string fileExtension;
  }

  // Storage that maps a file hash to its certificate
  mapping (string => FileCertificate) fileCertificatesMap;

  // Certify a file: store its certificate under its hash and emit the event.
  // Note: calling this again with the same hash overwrites the stored entry.
  function certifyFile(uint fileSize, string memory fileHash, string memory fileExtension) public payable {
    FileCertificate memory newFileCertificate = FileCertificate(msg.sender, fileHash, block.timestamp, fileSize, fileExtension);
    fileCertificatesMap[fileHash] = newFileCertificate;
    emit FileCertified(msg.sender, fileHash, block.timestamp, fileSize, fileExtension);
  }

  // Look up a certificate by hash. An unknown hash returns empty/zero values.
  function verifyFile(string memory fileHash) public view returns (address, string memory, uint, uint, string memory) {
    FileCertificate memory certificate = fileCertificatesMap[fileHash];
    return (certificate.author, certificate.fileHash, certificate.timestamp, certificate.fileSize, certificate.fileExtension);
  }
}

(GitHub Link)

Again, the functions will be called by the front-end, and the FileCertified event is what gets fired every time a file is certified. Events are super useful for keeping a history of certified files.
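As a hedged sketch of the read side, the two helpers below show how a front end might call `verifyFile` and rebuild a history from `FileCertified` events. They assume `contract` behaves like a web3.js 1.x contract instance (`methods.verifyFile(...).call()` and `getPastEvents(...)`); the helper names `lookupCertificate` and `certifiedFilesBy` are invented for this example and are not taken from the NOTARX source.

```javascript
// Look up a certificate by file hash. Returns null when the hash is unknown.
async function lookupCertificate(contract, fileHash) {
  const result = await contract.methods.verifyFile(fileHash).call();
  // Unnamed Solidity return values come back keyed by position.
  const timestamp = Number(result[2]);
  if (timestamp === 0) {
    return null; // the mapping returns zero values for hashes never certified
  }
  return {
    author: result[0],
    fileHash: result[1],
    certifiedAt: new Date(timestamp * 1000), // block.timestamp is in seconds
    fileSize: Number(result[3]),
    fileExtension: result[4],
  };
}

// List the files certified by one address, based on past FileCertified events.
// The event parameters are not indexed, so filtering happens client-side.
async function certifiedFilesBy(contract, author) {
  const events = await contract.getPastEvents("FileCertified", {
    fromBlock: 0,
    toBlock: "latest",
  });
  return events
    .filter((event) => event.returnValues.author.toLowerCase() === author.toLowerCase())
    .map((event) => ({
      fileHash: event.returnValues.fileHash,
      certifiedAt: new Date(Number(event.returnValues.timestamp) * 1000),
      transactionHash: event.transactionHash,
      blockNumber: event.blockNumber,
    }));
}
```

Because `certifyFile` overwrites the stored entry when the same hash is submitted again, the event log is also the place to look if you need the earliest certification of a file.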

As for the DevOps side of things, I detailed how I deployed the contract to the Ethereum testnet in this article.

The dApp

I called this project NOTARX, which stands for Notary eXpress. Here’s finally a demo:

Quick demonstration!

You can try it yourself too! Just visit notarx.com! Make sure you have MetaMask installed and connected to the Ropsten Testnet network.

If you are a programmer and you want to see the source code of this dApp you can check it out here.

I deployed it on testnet instead of mainnet because it’s an alpha version. I want to test it extensively before moving it to the main Ethereum network and involving real money.

Final thoughts

It was a fun ride, but creating a product with such young technologies is no easy feat. In fact, things kept breaking, and I had to digest lots of documentation, ask a few questions on StackOverflow/Reddit and figure out how to tie everything together.

I hope you enjoyed the ride! Remember, whenever you are presented with a problem you should follow more or less the same flow to understand:

  • If the blockchain is really necessary to solve the problem.
  • How to structure/architect your dApp so that you can bring it to market fast.

Would you have done things differently? Do you want to know more details? Leave a comment!! Do it!

Thanks for reading all the way down here <3 !