1. Introduction
A blockchain is an online global database accessible to anyone via the Internet, at any time [1]. In blockchain databases, the information is recorded, maintained and shared by a community. The blocks within a blockchain contain information about transactions. A typical transaction is a data structure that defines a transfer of information or value. In this sense, a transaction can be the operation of storing information in a block, extracting information from a block, transferring value from one entity to another, a contract, and so on. Blockchain technology combines many other technologies, such as cryptography, peer-to-peer networks, smart contracts, and consensus mechanisms, to make it nearly impossible to hack or tamper with the transactions and information stored within the blocks. Cryptocurrencies such as Bitcoin [2] are an application of blockchain to day-to-day digital payment systems. In order to create a new block or to verify a transaction, computers must solve complicated mathematical problems in a process called mining. The node that completes the mathematical puzzle the fastest receives a small Bitcoin reward and gets the right to certify the transaction. Blocks are then appended to the blockchain in chronological order, as they are created.
The main issue with blockchain technology is the ever-increasing size of the blockchain, which makes its storage and validation more complicated. A single block in the Bitcoin blockchain is around 1-2 MB in size, while the entire Bitcoin blockchain was 578.48 GB on 12 June 2024, making it almost impossible to run on conventional laptops and desktops [3]. Moreover, its daily growth rate is around
, and its annual growth rate is
.
Another issue is the increasing computational power required to mine blocks and to perform the validation. The proof-of-work consensus mechanism uses a lot of power. For example, the electricity consumption of Bitcoin is estimated to be 172.26 terawatt-hours (TWh) per year (on 12 June 2024), which amounts to 96.08 Mt CO2 equivalent. The average energy consumption of Bitcoin per transaction is 766.64 kilowatt-hours (kWh), which has a carbon footprint of 427.60 kg CO2 [4]. These huge numbers have a direct impact on the power requirements to run blockchains, and they also have a measurable environmental impact.
In this article, we propose a novel method that improves the scalability of blockchain databases in the download, validation and confidentiality processes, by developing a lightweight blockchain technology called Entropic Blockchain, which provides simultaneous data compression and encryption, while also facilitating a unique methodology for reaching consensus in real time. This novel method uses Shannon’s Information Entropy (IE) [5] to generate the Entropic Barcode of a dataset, and it is the subject of a recent patent application [6].
2. Calculating the Information Entropy
Let us assume that a given set $X$ contains $N$ characters, each chosen from a set with radix $U$. Let $N_j$, where $j = 1, 2, \ldots, U$, be the number of $x_j$ characters in the set $X$. The fractions $p_j = N_j / N$ can be defined for the set $X$, where $p_j$ is the fraction of the occurrence of a character $x_j$ within the set $X$. According to Shannon [5], the average information extracted per character, or the number of bits of information per character, or the information entropy of the set $X$ is:
$$H(X) = -\sum_{j=1}^{U} p_j \log_b p_j \tag{1}$$
where the base $b$ gives the units of information (e.g., $b = 2$ for bits), and the total bit content of the set is $N \cdot H(X)$.
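Relation (1) translates directly into a few lines of code. The following is a minimal Python sketch, not the authors' implementation; the function names `information_entropy` and `max_information_entropy` and the 16-character sample set are illustrative assumptions.

```python
import math
from collections import Counter

def information_entropy(symbols, base=2):
    """Shannon IE per character: H = sum_j p_j * log_b(1 / p_j)."""
    n = len(symbols)
    counts = Counter(symbols)  # N_j for every distinct character x_j
    return sum((c / n) * math.log(n / c, base) for c in counts.values())

def max_information_entropy(radix, base=2):
    """Upper bound log_b(U), reached when all fractions equal 1/U."""
    return math.log(radix, base)

# Hypothetical binary set with nine 0s and seven 1s (N = 16).
sample = "0" * 9 + "1" * 7
h = information_entropy(sample)
print(f"IE per character: {h:.3f} bits")
print(f"total bit content N*H: {len(sample) * h:.2f} bits")
print(f"maximum IE for U = 2: {max_information_entropy(2):.1f} bit")
```

Writing the sum with $\log_b(1/p_j)$ instead of $-\log_b p_j$ is equivalent and avoids returning a negative zero for a set made of a single repeated character.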
For a given set, the maximum IE is obtained when the fractions are equal to each other (as is the case for normal numbers), that is $p_j = 1/U$, so:
$$H_{\max}(X) = \log_b U \tag{2}$$
The IE given by (1) is computed when the fractions refer to single characters within a set. However, a useful extension is the generalization of relation (1) to the m-block information entropy [7]:
$$H_m(X) = -\sum_{j} p_j^{(m)} \log_b p_j^{(m)} \tag{3}$$
where instead of single characters, combinations of $m$ characters are used to define a new set of characters, called m-blocks. In this case, $p_j^{(m)}$ are the fractions of the m-block characters, and the summation extends over all possible combinations of distinct m-blocks. For a given set of $N$ characters, each chosen from a set of radix $U$, the maximum number of distinct m-blocks of the newly constructed set of m-block characters is:
$$U_m = U^m \tag{4}$$
The maximum value of the IE theoretically permitted for the new set of m-blocks is:
$$H_{m,\max}(X) = \log_b U^m = m \log_b U \tag{5}$$
Combining (2) and (5), we deduce that $H_{m,\max}(X) = m \, H_{\max}(X)$, so using m-blocks increases the maximum IE value by a factor of $m$ relative to the set of single characters. In order to clarify the methodology proposed here, it is useful to show a few examples. Let us assume that a given set of characters contains characters from a set of radix $U = 2$ (only two distinct single characters). Using bits, $b = 2$, and relations (4) and (5), then:

If $m = 1$, $U_1 = 2$, indicating that we have two possible states and each state encodes $1$ bit per character: $\{0, 1\}$.

If $m = 2$, $U_2 = 4$, indicating that we have four possible states and each state encodes $2$ bits per character: $\{00, 01, 10, 11\}$.

If $m = 3$, $U_3 = 8$, indicating that we have eight possible states and each state encodes $3$ bits per character: $\{000, 001, 010, 011, 100, 101, 110, 111\}$.

If $m = 4$, $U_4 = 16$, indicating that we have 16 possible states and each state encodes $4$ bits per character: $\{0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111\}$.

If the set has $N$ characters and we take $m = N$, then $U_N = 2^N$ is the number of possible states and each state encodes $N$ bits per character.
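The enumeration of m-block states for a binary alphabet can be reproduced programmatically. The sketch below is only an illustration of relations (4) and (5); the helper names `mblock_states` and `max_mblock_entropy` are assumptions introduced here.

```python
import math
from itertools import product

def mblock_states(alphabet, m):
    """Return all U**m distinct m-blocks that can be built from the alphabet."""
    return ["".join(block) for block in product(alphabet, repeat=m)]

def max_mblock_entropy(radix, m, base=2):
    """Maximum m-block IE, relation (5): m * log_b(U)."""
    return m * math.log(radix, base)

for m in (1, 2, 3, 4):
    states = mblock_states("01", m)  # binary alphabet, U = 2
    print(f"m={m}: {len(states)} states, max IE = {max_mblock_entropy(2, m)} bits")
    print("   ", states)
```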
3. Data Segmentation into Windows
Consider again our set of $N$ characters, $X = \{x_1, x_2, \ldots, x_N\}$. First, we create a subset called a “window”, containing a number of characters called the “window size”, WS. Taking the characters from $x_1$ to $x_{WS}$, where $WS \le N$, creates the first window. Moving from left to right, one slides the segment of size WS across the whole set, where the position of each new window is obtained by sliding WS from left to right by a fixed number of characters, called the “step size”, SS. In order to ensure that all sections of the set are captured by this process, the SS must be at least 1 and at most WS, so $1 \le SS \le WS$. By doing this, a given set of $N$ characters will result in a new set of $N_W$ windows, $W = \{W_1, W_2, \ldots, W_{N_W}\}$, where $N_W$ is given by the formula:
$$N_W = \frac{N - WS}{SS} + 1 \tag{6}$$
Ideally, the SS is taken to ensure the ratio $(N - WS)/SS$ is an integer, by selecting SS to satisfy the relation $N - WS = A \cdot SS$, where $A$ is a positive integer. However, in order to ensure that $N_W$ is an integer, another alternative is to augment the dataset by $\Delta N$ additional blank characters to cover the entire dataset with windows:
$$\Delta N = SS \left\lceil \frac{N - WS}{SS} \right\rceil - (N - WS)$$
The new augmented set has $N + \Delta N$ characters.
Figure 1 shows this procedure applied to a set of bits. In the case illustrated in
Figure 1,
and
. Starting from left to right, the first window is formed, then sliding this to the right by SS, the second window is obtained, and so on until the whole set is split into
windows, forming a new set of
elements.
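The link between the index $k$ of a given window $W_k$ and the index $i$ of the first character in the set corresponding to the window $W_k$ is given by the formula:
$$i = (k - 1) \cdot SS + 1$$
where $k = 1, 2, \ldots, N_W$ and $1 \le i \le N - WS + 1$, with $N_W$ given by relation (6).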
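The segmentation procedure of this section can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the function names, the choice of a space as the blank padding character, and the sample strings are not taken from the original work.

```python
def segment_into_windows(chars, ws, ss, blank=" "):
    """Split a string into windows of size WS taken every SS characters.

    If (N - WS) is not a multiple of SS, the set is augmented with blank
    characters so that the last window is complete.
    """
    n = len(chars)
    if not (1 <= ss <= ws <= n):
        raise ValueError("need 1 <= SS <= WS <= N")
    steps = -(-(n - ws) // ss)        # integer ceiling of (N - WS) / SS
    n_windows = steps + 1             # relation (6) on the augmented set
    padding = steps * ss - (n - ws)   # number of blank characters added
    padded = chars + blank * padding
    return [padded[k * ss : k * ss + ws] for k in range(n_windows)]

def first_char_index(k, ss):
    """1-based index of the first character of the k-th window (k >= 1)."""
    return (k - 1) * ss + 1

# Hypothetical 16-bit set, WS = 4, SS = 2: (16 - 4) / 2 + 1 = 7 windows.
windows = segment_into_windows("0110100111010010", ws=4, ss=2)
print(len(windows), windows)

# Hypothetical 10-character set, WS = SS = 4: two blanks are appended.
print(segment_into_windows("0123456789", ws=4, ss=4))
```

Note that blank padding introduces an extra symbol into the last window; whether this is acceptable depends on the application, and it is only one possible way to realize the augmentation.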
4. Generating a Set of m-Blocks
We already established that a new set constructed using m-blocks from a given set of $N$ single characters, containing $U$ distinct characters, will have $U^m$ distinct m-blocks/characters. We now work out how many m-block elements will be in the newly formed set of m-blocks. Just as the “windows” segmentation of the set of $N$ characters resulted in a new set containing $N_W$ windows, the same set could be transformed into a new set of m-blocks containing $N_m$ elements. The set of m-blocks is constructed using the same procedure used for generating the set of windows. However, instead of sliding the segment of size WS from left to right, in step size SS, where $1 \le SS \le WS$, we now slide the m-block segment from left to right in step size ss, where the condition on ss is now $1 \le ss \le m$. When applied to each window, the newly formed set of m-block elements within a window of size WS contains a number of characters given by:
$$N_m = \frac{WS - m}{ss} + 1 \tag{10}$$
It is important to observe that the values of $m$ and ss are selected so that the sliding procedure produces a set of $N_m$
integer elements. To clarify this procedure, let’s observe again a few examples. Let’s assume a random set of
single characters (this could be a window with
) and U = 2:
Taking $m = 2$ and $ss = 1$, we generate the new set of m-block characters by sliding the m-block segment of two single characters from left to right in steps of 1. This results in the following set containing
elements, as dictated by (
10):
Constructing another
m-block set with
and
, according to (
10) we obtain the following set of
elements:
Using this procedure, a new set of any m-block size, with $1 \le m \le WS$, can be generated. For sets of single characters, the window size has to satisfy the condition $WS \ge U$, in order to ensure that the IE per window can take all possible values between zero and the maximum value theoretically permitted, $\log_b U$. In the case of sets of m-blocks, to ensure that the IE per set can take all possible values between zero and the maximum value theoretically permitted, $m \log_b U$, the new imposed condition is $N_m \ge U^m$. Using (10) and solving for WS we obtain:
$$WS \ge ss \left( U^m - 1 \right) + m \tag{11}$$
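The m-block construction and the resulting m-block IE can be illustrated with a short Python sketch. The function names, the bound $1 \le ss \le m$ enforced in the code, and the 16-bit sample window are assumptions made for illustration only, and the minimum window size is obtained by requiring at least $U^m$ m-block elements per window.

```python
import math
from collections import Counter

def mblocks(window, m, ss=1):
    """Slide an m-character segment across the window in steps of ss."""
    if not (1 <= ss <= m):
        raise ValueError("need 1 <= ss <= m")
    return [window[i:i + m] for i in range(0, len(window) - m + 1, ss)]

def mblock_entropy(window, m, ss=1, base=2):
    """m-block IE of a window, relation (3)."""
    blocks = mblocks(window, m, ss)
    n = len(blocks)
    return sum((c / n) * math.log(n / c, base) for c in Counter(blocks).values())

def min_window_size(radix, m, ss=1):
    """Smallest WS for which the window holds at least U**m m-blocks."""
    return ss * (radix ** m - 1) + m

window = "0110100111010010"            # hypothetical window, WS = 16
print(mblocks(window, m=2, ss=1))      # (16 - 2) / 1 + 1 = 15 elements
print(mblocks(window, m=4, ss=4))      # (16 - 4) / 4 + 1 = 4 elements
print(mblock_entropy(window, m=4, ss=4))
print(min_window_size(radix=2, m=3))
```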
5. Generating the Entropic Barcode of a Digital File
Using the dataset window segmentation procedure described above, together with the m-block procedure, the Entropic Barcode of a set is obtained by computing the information entropy (IE) value of each window, and plotting the IE values as a function of the window index location within the new set. The IE of each window is computed identically using (1) for single-character sets, or (3) for sets of m-block characters, with each window containing WS characters and a number of $U$ distinct characters. The Entropic Barcoding technology is the subject of a recent patent application (GB2404348.1) and allows the conversion of the information contained within any dataset into a compressed numerical set that can be used for further data processing on its own, or as a graphical, optically readable barcode [8,9]. The generated Entropic Barcode is a representation of the dataset that is irreversibly encrypted and has a massively compressed digital footprint. In addition to the one-way encryption and data compression, the Entropic Barcode method offers a representation of the original dataset that could be used for data integrity checking, fraud detection, data labeling and fast identification via laser barcode scanning. This technology is applicable to any dataset, including genomes, but one of the main applications of the Entropic Barcoding technology is to digital files. Any digital file is composed of 0s and 1s in machine code/binary language. This can be seen as a set of
$N$ characters containing two distinct characters, 0 and 1, so $U = 2$, $x_j \in \{0, 1\}$, and the fractions are $p_0 = N_0/N$ and $p_1 = N_1/N$. Using (1), the IE of the set can be easily calculated. To demonstrate this process, let us use a random set of $N = 16$ bits. If the bits within this set occurred with equal fractions ($p_0 = p_1 = 1/2$), then the set would have $H = 1$ bit per character, and a total entropy of $N \cdot H = 16$ bits of information. However, counting the single bits, one bit value occurs 9 times and the other 7 times, so the fractions are $9/16$ and $7/16$, resulting in:
$$H = -\left( \frac{9}{16} \log_2 \frac{9}{16} + \frac{7}{16} \log_2 \frac{7}{16} \right) \approx 0.989$$
Hence, the IE of this binary set is 0.989 bits instead of 1 bit, and the total IE of the set is 15.82 bits instead of 16 bits. This is the basis of calculating the IE of each window within a set to generate an Entropic Barcode of the set, including a set of bits. The proposed Entropic Barcoding technology can only be implemented using fully automated computer software, and we created a proof-of-concept software called ENtropic BARCoding (ENBARC), which is freely available by contacting the authors. For any given digital file, one can compute the Entropic Barcode representation of the original digital file by implementing the following steps:
1. The file targeted for barcoding is decomposed from its own format into a string of bytes.
2. The bytes are then converted into a text file containing a long string of bits, 0s and 1s. All other characters are removed.
3. The string of 1s and 0s is split into windows.
4. The IE of each window is calculated.
5. The Entropic Barcode of the file is generated, which is a text file containing the IE values per window versus window location index.
6. The graphical representation of the Entropic Barcode of the file is obtained by plotting the IE values per window versus window location index.
Figure 2 shows diagrammatically the proposed methodology to generate the Entropic Barcode of a digital file, including an example of a graphical Entropic Barcode representation.
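The steps above can be sketched in Python as follows. This is a simplified illustration and not the ENBARC software: the function names, the tab-separated text layout, and the decision to ignore trailing bits that do not fill a complete window (instead of augmenting the set with blank characters) are all assumptions.

```python
import math
from collections import Counter

def information_entropy(symbols, base=2):
    """Shannon IE per character (or per m-block) of a sequence."""
    n = len(symbols)
    return sum((c / n) * math.log(n / c, base) for c in Counter(symbols).values())

def bytes_to_bits(data):
    """Steps 1-2: turn raw bytes into a string of 0s and 1s."""
    return "".join(f"{byte:08b}" for byte in data)

def entropic_barcode(data, ws, ss=None, m=1, mss=1):
    """Steps 3-5: IE of each window; m and mss set the m-block size and step."""
    bits = bytes_to_bits(data)
    ss = ws if ss is None else ss
    barcode = []
    # Assumption: trailing bits that do not fill a whole window are ignored.
    for start in range(0, len(bits) - ws + 1, ss):
        window = bits[start:start + ws]
        blocks = [window[i:i + m] for i in range(0, ws - m + 1, mss)]
        barcode.append(information_entropy(blocks))
    return barcode

def barcode_to_text(barcode):
    """Text form of the barcode: window index versus IE value."""
    return "\n".join(f"{k}\t{ie:.4f}" for k, ie in enumerate(barcode, start=1))

# In practice the bytes would come from a file, e.g. open(path, "rb").read().
data = bytes([0x00, 0x0F, 0xFF])       # hypothetical 24-bit "file"
print(barcode_to_text(entropic_barcode(data, ws=8)))
```

Step 6 then amounts to plotting the second column against the first with any plotting tool.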
6. Application to Blockchain – Entropic Blockchain
Blockchains are decentralized databases. The blocks within a blockchain contain data/transactions in digital format, as well as the hash keys and time stamps. Blocks are then appended to the blockchain in chronological order, sequentially as they are created.
Figure 3 shows a diagrammatic representation of a blockchain.
Since blockchains are composed of blocks of digital information, the Entropic Barcoding technology described in the previous section for digital files can be applied directly to blockchain databases. This process is explained here and the result is a superior blockchain, called Entropic Blockchain. The method involves the conversion of all of a block’s content into a string of bytes. The result is another block, which is a unique expression of the initial block, but with its information content expressed as a string of digital bytes. The block of bytes is then processed into a string of bits, 0s and 1s. The result is a new block represented as a binary string, which is again a unique representation of the initial block’s content. At this stage, the method allows for two possible implementation options, as shown diagrammatically in Figure 4:
Option 1: To compress the entire block’s information into a single IE numerical value by selecting the window size equal to the full length of the block, $WS = N$, and calculating the IE value of the block.
Option 2: To compress the entire block’s information into an Entropic Barcode.
Both proposed options result in massive one-way data compression and simultaneous encryption. However, the encryption achieved in this way is irreversible, because there is no process of reconstructing the block’s information from the IE value or from the Entropic Barcode. In this sense, the IE value or the Entropic Barcode of the block are similar to a block’s Hash key, because any changes of the data inside the block, will result in changes of the IE value, or changes of its Entropic Barcode.
Figure 5 shows diagrammatically the methodology proposed for the implementation of these two options to an entire blockchain database.
Repeating the procedure described in Figure 4 for the entire blockchain, i.e. for each block within the blockchain, Option 1 will produce an Entropic Barcode of IE values versus the block numbers, which is a unique 2D barcode of the entire blockchain. Deploying Option 2 will convert the entire blockchain into a chain of Entropic Barcodes, in which an Entropic Barcode represents each block, so a chain of N blocks will result in a chain of N Entropic Barcodes. Both proposed options drastically reduce the size of a blockchain, while simultaneously encrypting its content. However, the new Entropic Barcode representation of the entire blockchain (Option 1) is particularly attractive because it could be used for fast peer-to-peer validation via the Entropic Differential Barcoding technique (see Section 7).
The details of mining are beyond the scope of this article and will not be discussed here, but essentially each time a new block is created, a copy is sent to all nodes together with the information entropy value (Option 1) or the Entropic Barcode (Option 2) of the last block. The validation is performed on the Entropic Barcode of the blockchain (Option 1), or on the chain of Entropic Barcodes (Option 2), instead of the entire blockchain.
Once the majority of the nodes agree on validating the Entropic Blockchain, the newly created block is added to the blockchain’s master copy, while the IE value of the newly created block is added to the master blockchain’s Entropic Barcode (Option 1), or the Entropic Barcode of the newly created block is added to the chain of Entropic Barcodes (Option 2). The benefit is that the newly created Entropic Blockchain is small enough to be emailed or transferred rapidly from terminal to terminal, and the validation process is extremely fast, saving time, energy and data storage. Hence, the proposed Entropic Blockchain technology addresses the scalability of blockchain data by facilitating more effective download, transfer, validation and confidentiality processes. However, some of the decentralized features of blockchain technology are lost, because the Entropic Blockchain requires a blockchain master copy to be kept in a digital vault, while all operations are performed on its reduced Entropic Blockchain version.
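The two options can be illustrated with a short Python sketch. It is only a schematic reading of the method: the block payloads are hypothetical, each block is assumed to be already serialized to bytes, the function names are invented for this example, and in Option 2 the final window of a block is simply left shorter when the block length is not a multiple of WS.

```python
import math
from collections import Counter

def to_bits(block: bytes) -> str:
    """Express a serialized block as a string of 0s and 1s."""
    return "".join(f"{byte:08b}" for byte in block)

def information_entropy(symbols) -> float:
    """Shannon IE in bits per character."""
    n = len(symbols)
    return sum((c / n) * math.log2(n / c) for c in Counter(symbols).values())

def block_ie(block: bytes) -> float:
    """Option 1: a single window spanning the whole block (WS = N)."""
    return information_entropy(to_bits(block))

def block_barcode(block: bytes, ws: int) -> list:
    """Option 2: one IE value per non-overlapping window (SS = WS)."""
    bits = to_bits(block)
    return [information_entropy(bits[i:i + ws]) for i in range(0, len(bits), ws)]

# Hypothetical chain of three already-serialized blocks.
chain = [b"genesis", b"alice->bob:5", b"bob->carol:2"]

ie_per_block = [block_ie(block) for block in chain]              # Option 1
barcode_chain = [block_barcode(block, ws=16) for block in chain]  # Option 2

for number, (ie, barcode) in enumerate(zip(ie_per_block, barcode_chain), start=1):
    print(number, round(ie, 4), [round(v, 4) for v in barcode])
```

When a new block is validated, Option 1 appends one number to `ie_per_block`, whereas Option 2 appends one list to `barcode_chain`.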
7. Entropic Differential Barcode (EDB)
For any given digital asset, one can use the Entropic Barcode technology to generate an Entropic Barcode of the original asset, not only for the purpose of labeling, compressing and encrypting, but also for detecting any changes in it at a later time. Detecting changes in digital assets is a very powerful application because it allows one to check the integrity of a digital asset without accessing any information within it, maintaining full privacy of the data. Essentially, after the Entropic Barcode is generated, any changes to the original digital asset, as small as just one bit, will be easily detected regardless of its size. The method involves generating the Entropic Barcode of the digital asset again and then comparing it to the original Entropic Barcode. The comparison between the two barcodes is performed via the Entropic Differential Barcode (EDB), which is obtained by subtracting one barcode from the other. In a previous study using a similar method applied to genetic sequences, the Entropic Ratio was used instead of the Entropic Differential Barcode to detect genetic mutations [10,11]. However, it is quite possible that sections of the dataset have IE = 0. This creates a problem when taking the ratio of the two spectra/barcodes, because division by zero is undefined. Hence, to avoid this problem, a better method based on the Entropic Differential Barcode (EDB) is introduced. If the asset is unchanged, the EDB will be 0 everywhere. If the asset has been changed, the EDB will show deviations from 0 at the locations where the changes occurred. This method could therefore be used to check the integrity of any digital asset and to detect any possible changes to it, however small. The applications are manifold and include validation of digital data files, blockchains and crypto tokens. To demonstrate the methods of Entropic Barcoding and Entropic Differential Barcoding of digital files, we show an example here.
Figure 6a shows an example of an Excel spreadsheet digital file, for which we produced the unique Entropic Barcode representation shown in Figure 6b. In this example, the Excel file has
bits. For this particular example we imposed a barcode size of 1000 points, i.e. the number of windows created equals 1000, $N_W = 1000$, obtaining WS = 655. The SS was taken equal to the WS, WS = SS = 655. The above example was generated using an $m = 3$ characters m-block with step size
. The maximum possible IE per window is therefore 3. The entire barcode contains data with maximum IE = 2.9765, minimum IE = 0 and average
. For a 1000-point Entropic Barcode, the resulting spectrum text file is always 15 kilobytes (KB), regardless of the type or size of the original file. Hence, this technique can achieve significant data compression, from multiple gigabytes (GB) to 15 KB, while the Entropic Barcode is a true and unique representation of the original file, which does not disclose any of the file’s content.
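The relation between the imposed number of windows and the window size quoted above can be checked against relation (6). As a sketch, assuming the set is augmented by $\Delta N$ blank characters so that the windows tile it exactly, and writing $N_W$ for the number of windows, setting SS = WS gives:

```latex
N_W = \frac{(N + \Delta N) - WS}{SS} + 1 = \frac{N + \Delta N}{WS}
\quad\Longrightarrow\quad
N + \Delta N = N_W \cdot WS = 1000 \times 655 = 655\,000 \ \text{bits}
```

Under this assumption, the 1000 windows of 655 bits cover an augmented set of 655,000 bits in total.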
After the barcode is generated, any changes or alterations to the original file, as small as just one bit, will be easily picked up. This method could be used to check for digital fraud and to verify the integrity of any digital file, including financial data files, banking records, image files, any type of document or software, blockchains, non-fungible tokens (NFTs), etc. Moreover, the method allows one to preserve the digital content of the file in this format while maintaining the full confidentiality of the file’s content, i.e. the Entropic Barcode does not reveal any information contained within the file. For example, one can have a digital copy of the Entropic Barcode of the Excel spreadsheet file analyzed here, and there is no way of extracting any numerical or financial data contained in the file from the barcode itself. However, after the original file has been Entropic Barcoded, any future changes to the file itself can be detected by barcoding the file again and using the Entropic Differential Barcode (EDB) method to check the file’s integrity. To demonstrate this, we intentionally tampered with the file by changing a single bit (0 into 1) and we ran the EDB program to validate the file.
Figure 7 shows the Entropic Barcodes of the original file, the 1-bit altered file and the EDB verification spectrum. The data clearly shows that a single bit alteration, which was a 0 changed into a 1, has been detected via the EDB validation check, showing a spectrum that contains a non-zero value at the location index 522. This means that the window index 522 contains a modification/alteration to the file. The original file’s IE value of the window index 522 was 1.831 bits. The change of a 0 into a 1 resulted in a change of the IE value from 1.831 bits to 1.847 bits. Hence the non-zero spike in the EDB spectrum that indicates the alteration has an absolute value of 0.016 bits. Any substantial changes to the file that result in a change of the total number of bits will show up as an EDB spectrum that is non-zero everywhere. This is a powerful technique that could be deployed to perform forensic verification of any form of digital file/asset including performing financial audits, and validating blockchains, crypto tokens or NFTs.
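The EDB check can be sketched in Python as follows. This is an illustrative sketch rather than the authors' EDB program: the function names, the 64-byte sample "file", the single-bit (m = 1) windows of 32 bits, and the sign convention (new barcode minus original barcode) are assumptions.

```python
import math
from collections import Counter

def barcode(data: bytes, ws: int) -> list:
    """IE of each non-overlapping window of ws bits (SS = WS, m = 1)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    values = []
    for start in range(0, len(bits), ws):
        window = bits[start:start + ws]
        n = len(window)
        values.append(sum((c / n) * math.log2(n / c) for c in Counter(window).values()))
    return values

def entropic_differential_barcode(reference, candidate):
    """Window-by-window difference between two barcodes of equal length."""
    if len(reference) != len(candidate):
        raise ValueError("barcodes must have the same number of windows")
    return [new - old for old, new in zip(reference, candidate)]

def changed_windows(edb, tolerance=0.0):
    """1-based indices of the windows whose IE value has changed."""
    return [k for k, delta in enumerate(edb, start=1) if abs(delta) > tolerance]

original = bytes(range(64))      # hypothetical 512-bit file
tampered = bytearray(original)
tampered[20] ^= 0b00000001       # flip one bit (a 0 becomes a 1) in byte 20

edb = entropic_differential_barcode(barcode(original, 32), barcode(bytes(tampered), 32))
print(changed_windows(edb))      # → [6]
```

Byte 20 occupies bits 160 to 167, which fall in the sixth 32-bit window, so the EDB is zero everywhere except at that index.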
8. Conclusions
The Entropic Barcoding technology facilitates the creation of new data processing tools with unique properties. The proposed method makes use of Shannon’s information theory to simultaneously compress, encrypt and barcode any dataset. Here we proposed applying this technology to digital data files and blockchain databases, to achieve a next-generation blockchain technology, called Entropic Blockchain. Using Entropic Barcoding, the datasets are massively compressed in terms of their digital footprint, allowing savings in digital data storage and data transmission. At the same time, while condensing a whole dataset into an Entropic Barcode, the barcode acts as a one-way encryption of the data, because there is no mechanism to reconstruct the set from its Entropic Barcode. When this method is combined with the Entropic Differential Barcoding technique also proposed here, the methodology allows fast detection of any change to the digital asset, facilitating an effective, fast and energy-efficient blockchain validation method.
9. Patents
The Entropic Barcoding technology described in this study is the subject of the patent application GB2404348.1.
Author Contributions
Conceptualization M.M.V.; All authors contributed equally to developing this project and writing the article. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by private funding from the Information Physics Institute.
Data Availability Statement
Software ENBARC, supporting reported results, is freely available by contacting the authors.
Acknowledgments
We are grateful for the financial support received for this research from the University of Portsmouth and the Information Physics Institute.
Conflicts of Interest
The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| Symbol | Meaning |
|---|---|
| N | the total number of characters in a given set X |
| X | the set of unique characters |
| U | the radix of the set each character belongs to |
| W | window |
| WS | window size, or the number of characters in a window |
| SS | the fixed number of characters of sliding WS, $1 \le SS \le WS$ |
| $N_W$ | the total number of windows obtained from the original set of N characters |
| m | the m-block size, or the number of single characters combined to form a new character |
| ss | the fixed number of characters of sliding the m-block, $1 \le ss \le m$ |
| $U_m$ | the number of distinct m-blocks in a set X |
| IE | information entropy |
References
1. G. Habib, S. Sharma, S. Ibrahim, I. Ahmad, S. Qureshi, M. Ishfaq, Blockchain Technology: Benefits, Challenges, Applications, and Integration of Blockchain Technology with Cloud Computing, Future Internet 14, 341 (2022).
2. S. Nakamoto, Bitcoin: A Peer-to-Peer Electronic Cash System (2008).
3. Bitcoin Blockchain Size, 2024. https://ycharts.com/indicators/bitcoin_blockchain_size.
4. Bitcoin Energy Consumption Index, 2024. https://digiconomist.net/bitcoin-energy-consumption.
5. C.E. Shannon, A mathematical theory of communication, The Bell System Technical Journal, Vol. 27, pp. 379–423 (1948). https://doi.org/10.1002/j.1538-7305.1948.tb01338.x.
6. M. Vopson, S. Lukaszyk, A. Vopson, The Entropic Barcoding Technology, UK Patent Application Number: GB2404348.1, 26 March 2024.
7. A.O. Schmitt, H. Herzel, Estimating the Entropy of DNA Sequences, Journal of Theoretical Biology, Vol. 188 (3), pp. 369–377 (1997).
8. Norman J. Woodland, Bernard Silver, Classifying Apparatus and Method, Patent US2612994, Oct. 7 (1952).
9. Method and System for Verification and Authentication Using Optically Encoded QR Codes, US2015/0295711 A1, Oct. 15 (2015).
10. M. Vopson, S.C. Robson, A new method to study genome mutations using the information entropy, Physica A: Statistical Mechanics and its Applications, Vol. 584, 126383 (2021).
11. M. Vopson, A possible information entropic law of genetic mutations, Applied Sciences, Vol. 12 (14), 6912 (2022).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).