Content Addressing | Lesson 4 of 5

Cryptographic hashing and Content Identifiers (CIDs)

So far we've exclusively been discussing adorable images for the sake of our general merriment, but content addressing can be used on all different types of files and data, from JSON objects to term papers to videos. For cryptographic hashing to work, we need to know what data format we're working with and use an appropriate tool.

Decoding data structures

A CID (Content Identifier) is a particular form of content addressing used on the decentralized web. It was developed for IPFS (a decentralized web protocol which we'll discuss in later tutorials), but has very broad implications.

A CID is a single identifier that contains both a cryptographic hash and a codec, which holds information about how to interpret that data. Codecs encode and decode data in certain formats.

+-------+------------------------------+
| Codec | Multihash                    |
+-------+------------------------------+

Many formats and protocols use content addressing already. Tools like Git and protocols like Ethereum and Bitcoin are among them, but they differ in how to interpret the data and in what cryptographic function they use for hashing. CID allows us to create a universal identifier for any of these systems.

Every CID is an identifier that contains the codec to interpret the data and a multihash which is a self-describing hash (a hash that tells you what type of hashing function was used to create it).

+------------------------------+
| Codec                        |
+------------------------------+
|                              |
| Multihash                    |
| +----------+---------------+ |
| |Hash Type | Hash Value    | |
| +----------+---------------+ |
|                              |
+------------------------------+

For lots more detail on how CIDs are constructed in IPFS, check out our Anatomy of a CID tutorial.