IPFS - Cloudflare Distributed Web Gateway
Adventure! Danger! Immutable Archives!
InterPlanetary File System is a peer-to-peer distributed file system that stores immutable, content-addressed blocks of data using Merkle DAG objects. If that description doesn’t resonate, it’s fair to think of IPFS as a versioned content delivery network where each node in the network can act as both a client and a server. IPFS is the brainchild of Juan Benet and one of several projects by Protocol Labs, whose Filecoin ICO raised more than $250 million USD in 2017 [1].
IPFS takes concepts from well-known technologies including Distributed Hash Tables (DHT), BitTorrent, Git and Self-Certifying File Systems (SFS) [2]. Building on these, IPFS introduces a novel technology dubbed BitSwap, a BitTorrent-like protocol for sharing blocks of data.
The BitTorrent protocol lets peers exchange data in which they are directly interested through a tit-for-tat sharing scheme [3]. This sharing scheme is governed by a choking algorithm which clusters together peers of similar bandwidth [4]. This works fine when peers share a mutual interest in a file, but it fails to capture more complex relationships.
BitSwap iterates on this idea by enabling peers to negotiate block exchange for the purpose of building credit. IPFS is designed to allow clients to model sophisticated trade strategies with BitSwap, making it possible to build credit-based overlay networks on IPFS [5]. The implication is that peers can seek remuneration for network activities such as storing or transmitting a file [6].
So what problem does IPFS solve?
News articles periodically draw attention to the loss of our early internet culture. Brilliant efforts by projects such as The Internet Archive and Archive Team work to preserve the Internet but when a site or service closes down there may be no opportunity to save it. Even so, the magnitude of the data is not for the faint of heart. The Internet Archive claims “a single copy of the Internet Archive library collection occupies 45+ Petabytes of server space (and we store at least 2 copies of everything)”. Hardly a casual homelab. But if we rely on the backups of the Internet Archive and/or benevolent rogue archivists, on whose archives do they depend? Who archives the archivists?
It’s actually an effort that we can democratize. Take some corner of Reddit like /r/netsec; at any given time there are several hundred concurrent users who are all browsing the same, slowly moving content. In the current model each user makes requests to the websites that Reddit links to. Despite a few hundred users all browsing the same content, the total number of available online copies does not change and the load on the content server grows in proportion to the number of requests. If the number of requests is too high, such as when a small site or blog blows up on Reddit’s front page, it may fall victim to a hug of death, making it temporarily unavailable.
The alternative model of IPFS is that, in addition to the original source, each user locally caches a copy and makes it available to all other users on the network. So when a little site suddenly becomes popular it also becomes more widely distributed across the network, as everyone browsing it is caching and serving a copy. That’s much better than a hug that DDoSes a site.
With the introduction of an incentive market, while sharing content with peers, one can farm Internet Archive points [7] and/or digital currency for hosting some small, small percentage of the Internet. Recall that the trade strategies can be arbitrarily defined, so it’s on us to get creative in how and why we choose to share data.
The IPFS whitepaper is an interesting read. It’s accessible to anyone with basic knowledge of Git and P2P networking.
IPFS Content Addressing
Objects on the IPFS network are identified and addressed by their Content Identifier (CID). Normally one loads a website by requesting the data hosted at a particular IP address. TLS can verify that the server has a valid certificate for the expected website, but there’s no way to validate that the data it sends is what was expected. If an attacker has compromised a site to serve malicious JavaScript, a valid certificate only indicates that the malicious JavaScript was not tampered with en route.
IPFS addresses content using the cryptographic hash of that content. So when a user requests an object they are asking the IPFS network to return the data which hashes to that CID. The result is that it’s not important from where the content is served nor whether the host is trusted. If the hash is the same then the data to which it refers is necessarily the same.
An IPFS CID is a self-describing format for addressing content on the IPFS network. A CID contains the hash value of the object to which it refers, along with several prepended fields of object metadata. Each field, including the hash, is a multiformat.
Multiformats are objects containing self-describing data. They’re intended to future-proof a protocol by permitting easy extensibility. When a file is added to IPFS it is hashed to construct a multihash. The multihash contains the message digest but also the name of the hash algorithm used and the length of the message digest. The multihash is concatenated with the other multiformats to create a CID.
# A CID is formed by the concatenation of 4 multiformats.
CID = multibase||cid-version||multicodec||multihash
# Each multiformat is itself concatenated data.
multihash = hash-function||digest-size||hash-value
If it seems like a lot of trouble to simply address a file, consider that multiformats provide a means to identify the use of insecure hash algorithms and to retire them.
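To make the multihash layout concrete, here is a minimal sketch that builds one by hand. It assumes sha2-256 (code 0x12, digest length 0x20) and hashes the raw bytes "abc" directly with sha256sum; the variable names are mine, and this is only the concatenation step, not what ipfs add does to a file.

```shell
# Build a sha2-256 multihash by plain concatenation:
#   multihash = hash-function || digest-size || hash-value
message="abc"

# printf '%s' writes the bytes without a trailing newline.
digest=$(printf '%s' "$message" | sha256sum | cut -d ' ' -f 1)

hash_function="12"   # multicodec code for sha2-256
digest_size="20"     # 0x20 = 32 bytes

multihash="${hash_function}${digest_size}${digest}"
echo "multihash = $multihash"
```

Swapping in a different hash algorithm would only change the two prefix bytes and the digest, which is the extensibility the format is after.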
What does a CID look like? How can it be used to understand the content it addresses?
# Add a string object to IPFS.
echo "sphinx of black quartz judge my vow" | ipfs add -n
# The first value is the CID of the string. The second is the name of the added object.
# When data is piped in from stdin there is no file name, so the CID is printed in its place.
added QmebT2bY4NozXFcY1VBc4dicHmSUrUEUVLLxVB1ajjdy4M QmebT2bY4NozXFcY1VBc4dicHmSUrUEUVLLxVB1ajjdy4M
All CIDs 46 characters long and starting with “Qm” are CIDv0. CIDv0 does not explicitly state the multibase, cid-version, and multicodec; these values are static and implied. A CIDv0 is therefore just a multihash.
Decode the CIDv0 to hex using CyberChef to view the multihash data.
# Hex value of the decoded CIDv0.
12 20 f1 85 7e b6 31 09 b5 b3 97 56 84 64 98 b8 ef 19 1c d7 a6 96 87 c9 dd b3 fb 99 16 c0 1a c9 f3 62
The multicodec table contains the multiformat codes of IPFS and it’s used to understand what data the multiformat describes. The first byte of the CIDv0 is 0x12, which is the code for sha2-256. The second byte is 0x20, which is the length of the message digest in bytes. The remaining data is the message digest f1 85 7e b6 31 09 b5 b3 97 56 84 64 98 b8 ef 19 1c d7 a6 96 87 c9 dd b3 fb 99 16 c0 1a c9 f3 62.
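The same split can be done mechanically. This is a small sketch that takes the decoded hex from above (spaces removed) and cuts it into its three fields; it assumes both prefix fields fit in a single byte, which holds for sha2-256.

```shell
# Hex of the decoded CIDv0 from above, with the spaces removed.
multihash_hex="1220f1857eb63109b5b39756846498b8ef191cd7a69687c9ddb3fb9916c01ac9f362"

# Byte 1 is the hash function code, byte 2 the digest length, the rest the digest.
hash_function=$(printf '%s' "$multihash_hex" | cut -c 1-2)
digest_size_hex=$(printf '%s' "$multihash_hex" | cut -c 3-4)
digest=$(printf '%s' "$multihash_hex" | cut -c 5-)

# Compare the declared length with the number of digest bytes that follow.
digest_size=$(( 0x$digest_size_hex ))
digest_bytes=$(( ${#digest} / 2 ))

echo "hash function code:   0x$hash_function"
echo "declared digest size: $digest_size bytes"
echo "actual digest size:   $digest_bytes bytes"
```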
The integrity of the digest can be checked by hashing the original data with the sha2-256 algorithm and comparing the results.
# Hash the data added to IPFS in the previous step.
echo "sphinx of black quartz judge my vow" | sha256sum
0b0675fba09e344193415664e8ce542c51c7fcd98c2febe03bc126a9f1548beb -
What happened?
# Get the object and pipe it to a hex editor.
ipfs object get QmebT2bY4NozXFcY1VBc4dicHmSUrUEUVLLxVB1ajjdy4M | xxd
00000000: 7b22 4c69 6e6b 7322 3a5b 5d2c 2244 6174 {"Links":[],"Dat
00000010: 6122 3a22 5c75 3030 3038 5c75 3030 3032 a":"\u0008\u0002
00000020: 5c75 3030 3132 2473 7068 696e 7820 6f66 \u0012$sphinx of
00000030: 2062 6c61 636b 2071 7561 7274 7a20 6a75 black quartz ju
00000040: 6467 6520 6d79 2076 6f77 5c6e 5c75 3030 dge my vow\n\u00
00000050: 3138 2422 7d0a 18$"}.
# Or print it as JSON with jq.
ipfs object get QmebT2bY4NozXFcY1VBc4dicHmSUrUEUVLLxVB1ajjdy4M | jq .
{
"Links": [],
"Data": "\b\u0002\u0012$sphinx of black quartz judge my vow\n\u0018$"
}
IPFS stores the message as a DAG, so it makes sense that there’s some extra data there to represent the structure of the DAG and any links, but in this case it isn’t the reason for the mismatched digests. To view only the block data use the ipfs block get command.
# Get the block and pipe it to a hex editor.
ipfs block get QmebT2bY4NozXFcY1VBc4dicHmSUrUEUVLLxVB1ajjdy4M | xxd
00000000: 0a2a 0802 1224 7370 6869 6e78 206f 6620 .*...$sphinx of
00000010: 626c 6163 6b20 7175 6172 747a 206a 7564 black quartz jud
00000020: 6765 206d 7920 766f 770a 1824 ge my vow..$
There are still extra bytes around the original message, which explains why the resulting message digest is not the expected value. These bytes are an artifact of serializing the data to the DagProtobuf format used with CIDv0, and they represent some information about the folder and file [8, 9].
Hashing the data contained in the block returns a digest matching that of the CID.
ipfs block get QmebT2bY4NozXFcY1VBc4dicHmSUrUEUVLLxVB1ajjdy4M | sha256sum
f1857eb63109b5b39756846498b8ef191cd7a69687c9ddb3fb9916c01ac9f362
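The block can also be rebuilt without an IPFS node, which makes it clearer where the digest comes from. The sketch below copies the eight framing bytes straight from the hex dump above and wraps them around the message. The field names in the comments are my reading of the DagProtobuf and UnixFS layouts, so treat them as an assumption rather than a reference.

```shell
message="sphinx of black quartz judge my vow"

# Rebuild the block byte for byte, following the hex dump above (octal escapes):
#   0a 2a  outer field 1 (node Data), 42 bytes long
#   08 02  inner field 1 (type) = 2, a file
#   12 24  inner field 2 (data), 36 bytes: the message plus its newline
#   18 24  inner field 3 (file size) = 36
block_digest=$(
  {
    printf '\012\052\010\002\022\044'
    printf '%s\n' "$message"
    printf '\030\044'
  } | sha256sum | cut -d ' ' -f 1
)

echo "$block_digest"
```

If the framing is right, the printed digest is the one carried in the CIDv0 multihash rather than the digest of the bare string.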
CIDv1 does it a little differently. The multiformats are explicitly stated, as one would expect, and the block data does not contain the ugly protobuf data.
Generate a CIDv1 using the --cid-version switch.
# Generate a CIDv1 using its command switch.
echo "sphinx of black quartz judge my vow" | ipfs add --cid-version=1
added bafkreialaz27xie6grazgqkwmtum4vbmkhd7zwmmf7v6ao6be2u7cvel5m bafkreialaz27xie6grazgqkwmtum4vbmkhd7zwmmf7v6ao6be2u7cvel5m
# Notice that the protobuf data is no longer present.
ipfs block get bafkreialaz27xie6grazgqkwmtum4vbmkhd7zwmmf7v6ao6be2u7cvel5m | xxd
00000000: 7370 6869 6e78 206f 6620 626c 6163 6b20 sphinx of black
00000010: 7175 6172 747a 206a 7564 6765 206d 7920 quartz judge my
00000020: 766f 770a vow.
For those curious, a CID viewer is available here for parsing multiformat data.
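For a rough idea of what such a viewer does, the CIDv1 above can be picked apart with coreutils. This sketch assumes GNU base32 is available, and that the version and codec each fit in one byte (true for this CID, not for every codec). The leading "b" is the multibase prefix for lowercase base32.

```shell
cid="bafkreialaz27xie6grazgqkwmtum4vbmkhd7zwmmf7v6ao6be2u7cvel5m"

# Drop the multibase prefix "b" (lowercase base32, no padding).
encoded=${cid#b}

# GNU base32 expects uppercase input padded to a multiple of 8 characters.
padded=$(printf '%s' "$encoded" | tr 'a-z' 'A-Z')
while [ $(( ${#padded} % 8 )) -ne 0 ]; do
  padded="${padded}="
done

cid_hex=$(printf '%s' "$padded" | base32 -d | od -An -v -tx1 | tr -d ' \n')

# cid-version || multicodec || multihash
cid_version=$(printf '%s' "$cid_hex" | cut -c 1-2)
multicodec=$(printf '%s' "$cid_hex" | cut -c 3-4)
multihash=$(printf '%s' "$cid_hex" | cut -c 5-)

echo "cid-version: 0x$cid_version"
echo "multicodec:  0x$multicodec"
echo "multihash:   $multihash"
```

The multicodec 0x55 is the code for raw binary, and the digest inside the multihash is the same value sha256sum printed for the bare string earlier.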
IPFS Web Gateway
An IPFS web gateway is a reverse proxy for serving files on the IPFS network to a web client. The gateway is transparent to users; there’s no requirement for users to install an IPFS client nor to be aware that IPFS magic is happening on the back end. From the perspective of the web client, IPFS content is just another web resource.
Where accessing IPFS content using the go-ipfs client looks like this:
ipfs cat /ipfs/QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG/readme
Requesting content from an IPFS gateway looks like this:
https://ipfs.io/ipfs/QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG/readme
There’s an official IPFS web gateway available here, or for the DIY crowd a good tutorial is available here, and without Nginx here.
Cloudflare Distributed Web Gateway
Cloudflare’s Distributed Web Gateway is an IPFS gateway distributed across their 150+ data centres. The choice to replicate data through Cloudflare’s IPFS nodes comes with attractive advantages; it’s very fast, it’s free for non-commercial use, and sudden huge spikes in traffic are much less likely to adversely affect the IPFS nodes hosting a website’s content.
Normally, IPFS web gateways simply act as proxies; if IPFS nodes responsible for pinning content are not available, and no other nodes in the network have that content, the content is not available.
Cloudflare does it a little differently. Rather than serving a website’s content directly from IPFS, Cloudflare’s massive network of data centres caches the website content. In a situation where the IPFS nodes hosting content suffer an outage the content will remain available through Cloudflare until removed or expired from the cache.
To make a site available through their gateway, first configure a CNAME record which points to their gateway and then a TXT record to specify the IPFS resource to request. Note that my CNAME sits at the root of the vvx7.io domain; if not using Cloudflare’s DNS, make sure the DNS provider supports CNAME flattening (or otherwise use a subdomain).
CNAME vvx7.io www.cloudflare-ipfs.com TTL
TXT _dnslink dnslink=/ipfs/QmW2WQi7j6c7UgJTarActp7tDNikE4B2qXtFCfLPdsgaTQ TTL
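A quick way to sanity-check the TXT value before publishing it is to parse it the way a gateway would. This is a sketch using the record value from above; the variable names are mine, and the dig command in the comment is just one way to fetch the live record.

```shell
# Value of the TXT record from the example above.
txt_value="dnslink=/ipfs/QmW2WQi7j6c7UgJTarActp7tDNikE4B2qXtFCfLPdsgaTQ"

# Against live DNS the value could be fetched with something like:
#   dig +short TXT _dnslink.vvx7.io
# (dig wraps the value in double quotes, which would need stripping first.)

case "$txt_value" in
  dnslink=/ipfs/*)
    ipfs_path=${txt_value#dnslink=}
    cid=${ipfs_path#/ipfs/}
    echo "gateway will request: $ipfs_path"
    ;;
  *)
    echo "not an /ipfs/ dnslink record" >&2
    ;;
esac
```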
The last step is to generate a Cloudflare SSL certificate.
Test everything works by navigating to the site. It should be serving the IPFS content from the TXT record.
Disclaimer: I’m not a Cloudflare shill. Know a different/better company offering IPFS services? Message me.
Ansible IPFS Cluster
This Ansible playbook will set up IPFS and IPFS Cluster.
Each node in a cluster will run an IPFS daemon and an IPFS Cluster daemon. Running a cluster is a good idea if fault-tolerance is important but it’s not a requirement as even a single node will perform well with the Cloudflare gateway.
Clone this repository to the ansible controller.
git clone https://github.com/hsanjuan/ansible-ipfs-cluster
cd ansible-ipfs-cluster
Add each node’s FQDN to the inventory.yml file.
echo "vps1.cloud" >> inventory.yml
echo "vps2.cloud" >> inventory.yml
echo "vps3.cloud" >> inventory.yml
Generate an IPFS Cluster secret to replace the default value of ipfs_cluster_secret in group_vars/ipfs.yml.
od -vN 32 -An -tx1 /dev/urandom | tr -d ' \n' ; echo
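It’s easy to mangle the secret when pasting it into a YAML file, so a quick check doesn’t hurt. This sketch captures the output of the command above and confirms it is 32 random bytes written as 64 lowercase hex characters; the secret_ok variable is only for illustration.

```shell
# Capture the secret instead of printing it.
secret=$(od -vN 32 -An -tx1 /dev/urandom | tr -d ' \n')

# 32 bytes should come out as exactly 64 lowercase hex characters.
case "$secret" in
  *[!0-9a-f]*) secret_ok=no ;;
  *) [ "${#secret}" -eq 64 ] && secret_ok=yes || secret_ok=no ;;
esac

echo "secret length: ${#secret}, valid: $secret_ok"
```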
In the host_vars folder create a file named after each node.
touch host_vars/vps1.cloud
touch host_vars/vps2.cloud
touch host_vars/vps3.cloud
Install ipfs-key. This is a Go package for generating IPFS key material.
go get github.com/whyrusleeping/ipfs-key
Generate unique keys for each host_vars file.
ipfs-key | base64 -w 0
Each host_vars file must contain the following keys. Replace each value with the public or private key from the last step.
|
|
Before executing the playbook we first need to open firewall ports to allow IPFS traffic. IPFS ports are defined in the IPFS config file roles/ipfs/templates/home/ipfs/ipfs_default_config.
|
|
IPFS Cluster ports are defined in the IPFS Cluster service file roles/ipfs-cluster/templates/service.json.
|
|
Open the IPFS swarm and IPFS cluster ports.
sudo ufw allow 4001/tcp
sudo ufw allow 9096/tcp
Finally, run make to execute ansible-playbook, which will install IPFS and IPFS Cluster.
make
Or without IPFS Cluster.
make ipfs
IPFS is configured to run under the user ipfs. Request a file from the network to verify it’s connecting to peers.
su - ipfs
ipfs cat /ipfs/QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG/readme
If something has gone wrong refer to the IPFS logs.
ipfs log tail
Hosting on IPFS
IPFS is intended to store versioned, immutable content. So it makes a lot of sense that IPFS pairs well with a static website generator. This site is built using Hugo. Thank you Djordje Atlialp for the beautiful theme.
Setting up a Hugo site is outside the scope of this post, so it’s assumed that Hugo is already working and has compiled a site to the public/ folder.
Normally this folder would be copied to /var/www/vvx7.io or wherever Nginx looks for sites. Instead, simply add the folder to IPFS.
# -r : recurse
# -Q : only display CID of root folder
ipfs add -rQ public
QmVSi4och46hZukxULidtVXGB9DQ7w1Hr9kjEktfrUUtmi
That’s it. The site is now available on the IPFS network. Use the ipfs ls command to list the links from the folder object and see that the directory structure is identical to the public/ folder.
ipfs ls QmVSi4och46hZukxULidtVXGB9DQ7w1Hr9kjEktfrUUtmi
QmcfdGwaHtWmqh8DotFhWKy7m9vMCuXBsBy5ZEtTcpxsN6 - about/
QmYZjmtKZ5CECVtUmAEvK4o7fju5tmm7DhQDN371isNZu7 - contact/
QmQJHDDnQDtsDgTA7wSorPA3qFnayaZyujeD8dJoLQXMN9 - css/
Qmf7VuEyBx7w48gwK2TbDsXB7izaDJTNpXa8KhvbvmHhyq - fonts/
QmfDvjaNnkE5zXUZGJnVpFZJqn7Q2M4xGtHLNzCzUfUiMd - html/
QmZaxafB5XjCQvvGQPC4S56sqR55pgS1FgrLded7QyF9Py 7273 index.html
QmeKf7P7XT1FmUnDRgGgAqPYYH1e1XVF4wMMyoGz75jNGm - js/
QmSnhB63QJhCpqQouvvQv4HQzmw3M42Aos3UwN3qhJFzSs - pgp/
QmdzKeiyauN28b1WtKcf426rr8VwstuAcwar9qHuJLhR7V - scss/
QmfUdX6xn7ACfXNwCtRYdb8NFBCjYqPshVo6HnNbrhV8Ru - tags/
Refer to the previous section, Cloudflare Distributed Web Gateway, and copy the CID of the root folder into a Cloudflare DNS TXT record to make the site accessible through a browser.
Security Considerations
A technical analysis of IPFS is outside the scope of this post but it’s worth mentioning why this site is hosted on IPFS and what could go wrong.
Compromised IPFS Node
The premise of IPFS is that content is uniquely identified and addressed by its hash. What happens if an attacker has compromised one or more of the IPFS nodes hosting this site’s content? How does it affect users?
If an attacker defaces the site or modifies a file to contain malicious JavaScript, the modified file will hash to a different CID. This is also true of a directory; changing any file in the tree will also result in a different directory CID. One can be sure that if the root CID is unchanged, the content it identifies must also be unchanged [10].
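The effect is easy to reproduce with a toy model. The sketch below hashes every file in a throwaway directory and then hashes the sorted list of those digests to get a root value. It is a simplification that only mimics the idea; it is not the DagProtobuf directory format IPFS actually uses, and root_hash is a made-up helper.

```shell
# Toy root hash: hash each file, then hash the sorted "digest  path" listing.
root_hash() {
  (
    cd "$1" || exit 1
    find . -type f | LC_ALL=C sort | while IFS= read -r file; do
      sha256sum "$file"
    done | sha256sum | cut -d ' ' -f 1
  )
}

site=$(mktemp -d)
mkdir "$site/js"
echo "<h1>hello</h1>" > "$site/index.html"
echo "console.log('hi')" > "$site/js/app.js"
before=$(root_hash "$site")

# Simulate a defacement by changing one nested file.
echo "console.log('malicious')" > "$site/js/app.js"
after=$(root_hash "$site")

# Restore the original bytes and the root value comes back.
echo "console.log('hi')" > "$site/js/app.js"
restored=$(root_hash "$site")

rm -rf "$site"
echo "before:   $before"
echo "after:    $after"
echo "restored: $restored"
```

A change anywhere in the tree surfaces at the root, so a client that knows the expected root value has something to compare against.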
So if a site is defaced, or modified to serve malware, the changes won’t be pushed to the users requesting the site content because it’s the Cloudflare DNS TXT record, and not the IPFS nodes, responsible for telling clients which CID to request. This means that nodes can be trusted to fail and/or be compromised while at the same time content remains available on the network through unaffected peers.
Integrity
In a web browser, IPFS blocks are not validated to match the CID, so browsers must trust the gateway is serving the requested content, and also that the content was not manipulated in transit. The IPFS Companion browser plugin for Firefox and Chrome solves this by redirecting IPFS requests to a locally running IPFS node but asking users to locally install IPFS and a browser plugin is pretty unreasonable.
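For the simplest case the check a browser skips can be done by hand. This is a sketch, assuming the content is a single raw block addressed by a sha2-256 multihash (as with the CIDv1 example earlier); verify_sha256_multihash is a hypothetical helper, and the gateway fetch is simulated with echo rather than a real HTTP request.

```shell
# verify_sha256_multihash MULTIHASH_HEX < data
# Succeeds only if the data on stdin hashes to the given sha2-256 multihash.
verify_sha256_multihash() {
  expected=$1
  case "$expected" in
    1220*) ;;
    *) echo "unsupported multihash" >&2; return 2 ;;
  esac
  actual="1220$(sha256sum | cut -d ' ' -f 1)"
  [ "$actual" = "$expected" ]
}

# Multihash carried by the CIDv1 of the test string.
multihash="12200b0675fba09e344193415664e8ce542c51c7fcd98c2febe03bc126a9f1548beb"

# Stand-ins for a gateway response: one genuine, one modified.
if echo "sphinx of black quartz judge my vow" | verify_sha256_multihash "$multihash"; then
  genuine=ok
else
  genuine=mismatch
fi

if echo "sphinx of black quartz judge my vows" | verify_sha256_multihash "$multihash"; then
  tampered=ok
else
  tampered=mismatch
fi

echo "genuine: $genuine, tampered: $tampered"
```

Content that is chunked or wrapped in DagProtobuf needs the whole DAG walked and verified, which is the work a local IPFS node does.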
A better solution is the js-ipfs implementation which will make it possible for browsers to natively run IPFS in JavaScript. The implementation is currently in alpha.
DDoS
IPFS content is only as resistant to DDoS as the combined power of the nodes hosting the content. If content is hosted by a single node, it’s unlikely that IPFS will provide any better DDoS resistance than would a typical webserver. If the content is popular and widely distributed, it becomes more difficult to DDoS, but low-level attacks against IPFS are theoretically possible if not yet seen in the wild [11, 12].
The Cloudflare Distributed Web Gateway can mitigate the risk of a DDoS by shielding the IPFS nodes pinning content behind Cloudflare’s DDoS resistant infrastructure, but be aware that this isn’t a property of IPFS.
XSS
Web browsers enforce the same-origin policy to sandbox sites at the domain level. IPFS web gateways use DNS TXT records to serve all files from the same domain, which means that all sites served through the gateway share the same origin and are therefore vulnerable to cross-site scripting [13].
For a static site without user accounts, no backend API, and no user input, this is less of an issue but it prohibits the use of a shared web gateway for most other web applications.
Cloudflare can mitigate this with custom subdomains on their gateway but it requires DNSSEC and some extra steps [14].
HTTP Headers
Cloudflare doesn’t allow custom headers on the web gateway so it’s not possible to obfuscate the server signature or deploy any other protection that might be used on a self hosted Nginx server.
Performance Considerations
In theory, performance should improve as content becomes more widely distributed on the IPFS network. In practice, according to this paper, it appears that increased distribution actually has a negative effect on performance [11]. The paper uses a maximum replication count of 10, but the performance degradation is so severe that the authors go on to speculate that a DDoS attack might be possible by highly replicating blocks of data [15]. Essentially it’s the exact opposite of how things are supposed to work with IPFS.
The authors of that paper aren’t wrong. It’s a serious design flaw and something the developers are aware of and working on [16].
The underlying cause of the performance penalty is a consequence of the current BitSwap implementation [17]. When a peer requests data it publishes its want list to the network. Nodes with the requested block reply by sending the block, and the requesting client can’t stop the transfer until the block data is read and verified [18]. As the replication count grows, the client must process an increasing number of duplicate blocks sent from the replicating nodes.
Until it gets fixed, performance degradation is just something users who want highly replicated data will have to live with, but it can be mitigated through caching.
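A back-of-the-envelope model shows why this scales badly. It assumes the worst case, where every node holding a wanted block sends it and the client keeps exactly one copy; real behaviour depends on timing and which peers respond, so the numbers are illustrative only.

```shell
# Worst-case share of received block transfers that are duplicates,
# as a whole-number percentage, for a given replication count.
wasted_percent() {
  replicas=$1
  duplicates=$(( replicas - 1 ))
  echo $(( 100 * duplicates / replicas ))
}

for count in 1 2 5 10; do
  echo "replicas=$count duplicates=$(( count - 1 )) wasted=$(wasted_percent "$count")%"
done
```

Under that assumption, at the replication count of 10 used in the paper nine of every ten transfers would be thrown away.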
Closing Thoughts
IPFS is a promising distributed web technology which is likely to become more widely adopted as the js-ipfs
implementation matures.
The performance degradation caused by IPFS spamming blocks is something I just learned about while writing this post. It’s pretty surprising given how fundamentally important high replication is to a distributed file system. I haven’t found any publications of attacks on the IPFS network that exploit this issue but it’s certainly worth exploring. DM me if you know something about it.
The Cloudflare web gateway is simple to set up and the cached site is much faster than the Nginx web gateway I initially configured. Depending on the use case it may not be suitable for hosting a web service so consider using subdomains to enforce same-origin or a private web gateway.
Thanks for reading!
1. Stan Higgins, CoinDesk, “$200 Million In 60 Minutes: Filecoin ICO Rockets to Record Amid Tech Issues”, Available: https://www.coindesk.com/200-million-60-minutes-filecoin-ico-rockets-record-amid-tech-issues. [Accessed: June 13, 2019].
2. Juan Benet, “IPFS - Content Addressed, Versioned, P2P File System (DRAFT 3)”, pt. 1, Available: https://ipfs.io/ipfs/QmR7GSQM93Cx5eAg6a6yRzNde1FQv7uL6X1o4k7zrJa3LX/ipfs.draft3.pdf. [Accessed: June 13, 2019].
3. Bram Cohen, “The BitTorrent Protocol Specification”, Available: https://web.archive.org/web/20190606091654/http://bittorrent.org/beps/bep_0003.html. [Accessed: June 13, 2019].
4. Arnaud Legout, Nikitas Liogkas, Eddie Kohler and Lixia Zhang, “Clustering and Sharing Incentives in BitTorrent Systems”, pt. 7, Available: https://web.archive.org/web/20170809091029/http://read.seas.harvard.edu/~kohler/pubs/legout07clustering.pdf. [Accessed: June 13, 2019].
5. Juan Benet, GitHub, Available: https://github.com/ipfs/ipfs/issues/26#issuecomment-53341631. [Accessed: June 13, 2019].
6. Protocol Labs, “Filecoin: A Decentralized Storage Network”, pt. 5, Available: https://web.archive.org/web/20190603151159/https://filecoin.io/filecoin.pdf. [Accessed: June 13, 2019].
7. This isn’t a thing but it should be.
8. Steballen, GitHub, Available: https://web.archive.org/web/20190622165600/https://github.com/ipfs/specs/issues/162. [Accessed: June 22, 2019].
9. Taras Emelyanenk, Medium, Available: https://web.archive.org/web/20190622172208/https://medium.com/@php.laboratory/you-are-right-that-mysterious-u0008-u0001-came-from-protobuf-its-serialized-directory-1-from-dc3a42b59db8. [Accessed: June 22, 2019].
10. whyrusleeping, GitHub, Available: https://web.archive.org/web/20190619153143/https://github.com/ipfs/faq/issues/24. [Accessed: June 13, 2019].
11. Oscar Wennergren, Mattias Vidhall, Jimmy Sörensen, “TRANSPARENCY ANALYSIS OF DISTRIBUTED FILE SYSTEMS: With a focus on Interplanetary File System”, pt. 6.1.2, Available: https://web.archive.org/web/20190623120007/https://pdfs.semanticscholar.org/5fb4/c414e912ef0579e8434a27cc30b60a61ed02.pdf. [Accessed: June 23, 2019].
12. cobookman, discuss.ipfs.io, Available: https://web.archive.org/web/20190623164514/https://discuss.ipfs.io/t/mallicous-peers-spamming-corrupted-data/1591/3. [Accessed: June 23, 2019].
13. flyingzumwalt, discuss.ipfs.io, Available: https://web.archive.org/web/20190623180557/https://discuss.ipfs.io/t/what-can-be-done-to-prevent-xss-attacks-against-ipfs-sites/333/8. [Accessed: June 23, 2019].
14. Brendan McMillion, Cloudflare, “End-to-End Integrity with IPFS”, Available: https://web.archive.org/web/20190623093600/https://blog.cloudflare.com/e2e-integrity/. [Accessed: June 23, 2019].
15. Oscar Wennergren, Mattias Vidhall, Jimmy Sörensen, “TRANSPARENCY ANALYSIS OF DISTRIBUTED FILE SYSTEMS: With a focus on Interplanetary File System”, pt. 7.9, Available: https://web.archive.org/web/20190623120007/https://pdfs.semanticscholar.org/5fb4/c414e912ef0579e8434a27cc30b60a61ed02.pdf. [Accessed: June 23, 2019].
16. whyrusleeping, GitHub, Available: https://web.archive.org/web/20190623142435/https://github.com/ipfs/go-ipfs/issues/3786. [Accessed: June 23, 2019].
17. davinci26, GitHub, Available: https://web.archive.org/web/20190623132442/https://github.com/ipfs/go-ipfs/issues/5226. [Accessed: June 23, 2019].
18. ivan386, GitHub, Available: https://web.archive.org/web/20190623141626/https://github.com/ipfs/go-ipfs/issues/5083. [Accessed: June 23, 2019].