FoundationDB is a distributed, scalable key-value database with ACID transactions and famously thorough testing. Three years ago the entire company was acquired by Apple and the product was taken off the market. Despite that unfortunate fact, plenty of companies still use FDB today:

  • Snowflake: FoundationDB is a significant part of their architecture and has allowed them to build some truly amazing and differentiating features.
  • Wavefront: VMware's cloud monitoring and analytics service uses FoundationDB extensively, running roughly 50 clusters spanning petabytes of data in production.
  • An online warehouse management system uses FoundationDB as a distributed event store, holding events and coordinating the nodes in its cluster.
  • Data Model

FDB has an unusual design for a key-value database. You can think of it as one giant sorted dictionary in which both keys and values are byte arrays. You can perform the usual operations on that dictionary, chaining many of them together inside a single ACID transaction.
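The "giant sorted dictionary with transactions" model can be sketched in a few lines of Python. This is a toy illustration of the concept, not the real FDB client API: `ToyStore` and `Transaction` are invented names, and the buffering-then-commit scheme is a simplification.

```python
# Toy model of FDB's data model: a sorted dictionary of byte-array keys
# and values, mutated atomically via transactions. Illustrative only.
import bisect

class ToyStore:
    def __init__(self):
        self._data = {}   # bytes -> bytes
        self._keys = []   # sorted list of keys

    def _apply(self, writes, deletes):
        """Apply a transaction's buffered mutations in one step."""
        for k in deletes:
            if k in self._data:
                del self._data[k]
                self._keys.remove(k)
        for k, v in writes.items():
            if k not in self._data:
                bisect.insort(self._keys, k)
            self._data[k] = v

    def get(self, key):
        return self._data.get(key)

    def get_range(self, begin, end):
        """Return (key, value) pairs with begin <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, begin)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._data[k]) for k in self._keys[lo:hi]]

class Transaction:
    """Buffers writes/clears and applies them atomically on commit."""
    def __init__(self, store):
        self._store = store
        self._writes = {}
        self._deletes = set()

    def set(self, key, value):
        self._deletes.discard(key)
        self._writes[key] = value

    def clear(self, key):
        self._writes.pop(key, None)
        self._deletes.add(key)

    def commit(self):
        self._store._apply(self._writes, self._deletes)

# Both writes become visible together, at commit time:
db = ToyStore()
tr = Transaction(db)
tr.set(b"user/1", b"alice")
tr.set(b"user/2", b"bob")
tr.commit()
```

Because keys stay sorted, a range read such as `db.get_range(b"user/", b"user0")` returns every key under the `user/` prefix in order, which is exactly the property the layers below rely on.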

The interface is very low-level, which lets you build your own data layers on top of it and host them all on a single cluster:

  • Tables
  • Object Storage
  • Lists
  • Graphs
  • Indexes
  • Blob Storage
  • Distributed commit-logs
  • High-contention queues
  • pub/sub

In this sense, FoundationDB is a database constructor.
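To make the "layers" idea concrete, here is a sketch of how a simple table layer could be encoded on top of a plain sorted key-value store. The null-separated packing scheme below is a deliberately simplified stand-in for FDB's real tuple layer (which escapes embedded nulls and handles typed elements); all names here are illustrative.

```python
# Sketch: encoding a "table" layer as ordered byte keys on top of a
# key-value store. Simplified packing; not FDB's actual tuple layer.

def pack(*parts):
    """Join path segments into one byte key that sorts hierarchically."""
    return b"\x00".join(p.encode() if isinstance(p, str) else p for p in parts)

store = {}  # stand-in for the sorted key-value store

def table_set(table, row, column, value):
    store[pack("T", table, row, column)] = value

table_set("users", "42", "name", b"alice")
table_set("users", "42", "email", b"alice@example.com")

# Reading a whole row becomes a prefix range scan:
prefix = pack("T", "users", "42") + b"\x00"
row = {k: v for k, v in store.items() if k.startswith(prefix)}
```

Tables, indexes, queues, and the other layers listed above all boil down to key-encoding conventions like this one, plus transactions to keep the derived keys consistent.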

  • Testing

The database is developed inside a testing environment that FoundationDB calls deterministic simulation. I/O operations such as disk and network access are abstracted away, which permits injecting various faults while running clusters under load in accelerated time.

Here are some examples of faults injected in such environments:

  1. Buggy routers
  2. Network outages
  3. Disk outages
  4. Machine reboots and freezes
  5. Human errors

The simulation is deterministic because an entire cluster is simulated within a single thread.

Whenever a bug manifests itself, you can replay that simulation as many times as you want. It will always unfold in exactly the same way, as long as you keep the initial random seed.
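A minimal sketch of why keeping the seed makes runs replayable: if every "random" decision in the simulation (fault injection, message ordering, timing) is drawn from a single seeded PRNG, then the same seed reproduces the same run, event for event. The `simulate` function and its event names are invented for illustration.

```python
# Sketch: deterministic replay via a single seeded PRNG. Illustrative.
import random

def simulate(seed, steps=20):
    rng = random.Random(seed)   # all nondeterminism flows through rng
    trace = []
    for step in range(steps):
        event = rng.choice(["deliver", "drop", "delay", "reboot"])
        trace.append((step, event))
    return trace

run1 = simulate(seed=1234)
run2 = simulate(seed=1234)   # "replay" with the same seed
assert run1 == run2          # identical, event for event
```

This is why a crash found once in simulation can be debugged at leisure: the failing seed is the whole repro case.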

The simulated system runs inside a custom scheduler that lets you force time to move forward, just like in any discrete-event simulation. If you know that nothing interesting is going to happen for the next 10 ms, you can instantly fast-forward the world to that point in time.
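The fast-forwarding trick can be shown with a minimal discrete-event scheduler holding a virtual clock: instead of sleeping, the clock jumps straight to the next scheduled event. This is a generic sketch of the technique, not FoundationDB's actual scheduler.

```python
# Minimal discrete-event scheduler with a virtual clock. Illustrative.
import heapq

class Scheduler:
    def __init__(self):
        self.now = 0.0
        self._queue = []   # (fire_time, seq, callback)
        self._seq = 0      # tie-breaker keeps ordering deterministic

    def call_later(self, delay, callback):
        heapq.heappush(self._queue, (self.now + delay, self._seq, callback))
        self._seq += 1

    def run(self):
        while self._queue:
            fire_time, _, callback = heapq.heappop(self._queue)
            self.now = fire_time   # fast-forward: no real waiting
            callback()

log = []
sched = Scheduler()
sched.call_later(10.0, lambda: log.append(("tick", sched.now)))
sched.call_later(0.5, lambda: log.append(("ping", sched.now)))
sched.run()
# log == [("ping", 0.5), ("tick", 10.0)], with no wall-clock waiting
```

Ten simulated milliseconds, or ten simulated hours, cost the same near-zero wall-clock time; that is what makes accelerated-time fault testing practical.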

  • Simplified simulation of TCP/IP: this includes reorder buffers, SEQ/ACK numbers, and connection handshakes, plus a proper shutdown sequence, but no packet re-transmissions.
  • Durable node storage: each machine folder is backed by an LMDB database.
  • Simulation plans: these specify how we want to run the simulated topology, including a graceful chaos monkey.
  • Simulating power outages: by cancelling the futures of the affected systems.

Network profiles: the ability to configure latency and packet-loss ratio, plus logging, per network connection.
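Such a network profile might look like the sketch below: a simulated link with configurable latency and loss ratio, driven by a seeded PRNG so that runs stay reproducible. `LinkProfile` and its parameters are invented names for illustration, not an API from FDB or any simulator.

```python
# Sketch: a per-connection network profile (latency + loss + logging),
# seeded for reproducibility. Names are illustrative.
import random

class LinkProfile:
    def __init__(self, latency_ms, loss_ratio, seed=0):
        self.latency_ms = latency_ms
        self.loss_ratio = loss_ratio
        self._rng = random.Random(seed)

    def transmit(self, packet, log):
        """Drop or deliver a packet; record the decision in the log."""
        if self._rng.random() < self.loss_ratio:
            log.append(("dropped", packet))
            return None                          # packet lost
        log.append(("delivered", packet, self.latency_ms))
        return self.latency_ms                   # simulated delivery delay

log = []
lossy = LinkProfile(latency_ms=80, loss_ratio=0.3, seed=42)
for i in range(10):
    lossy.transmit(f"pkt-{i}", log)
```

Because the drop decisions come from the seeded PRNG, rebuilding the profile with the same seed replays the identical loss pattern, tying this back to the deterministic-replay property above.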



Reference site: abdullin