

MySQL has been deployed extensively for more than a decade, and its support community is active and mature. However, it was not designed to support petabytes of data. To make their MySQL servers more scalable, the company, like most other large-scale MySQL users, aggressively sharded their databases to alleviate the load on each instance. But sharding is not a robust solution, and it usually results in systems that are brittle and hard to scale. Beyond scalability, the MySQL-based pipeline suffered from two further problems.

Rigid Schema: Relational databases, including MySQL, require a well-defined schema upfront. While a rigid schema can help you organize and document data, it is not well suited to a fast-moving, data-driven company where the underlying data can change weekly, if not daily.

Up to 24 Hours of Delay: Because the first step of the ETL process (copying log files from the Rails servers to the MySQL servers) ran only once a day, data refreshes could take up to 24 hours. Because of this delay, the team couldn't evaluate the effectiveness of new features or the popularity of new content in a timely fashion. The long feedback loop meant slower product development.

By introducing Treasure Data, the company transformed its infrastructure in two fundamental ways. First, Treasure Data's Cloud Data Warehouse replaced the cluster of MySQL servers; scheduled jobs on Treasure Data now update the MySQL aggregation server that powers the in-house dashboard. Second, instead of relying on the default file-based logging, td-agent was installed on each Rails server to automatically forward logs to Treasure Data.
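As a sketch of what the td-agent side of that setup can look like from the application, the fluent-logger gem lets a Rails process hand structured records to the local td-agent daemon, which buffers them and uploads them to Treasure Data in batches. The database name (myapp), table name (events), and record fields below are hypothetical, not taken from the article.

```ruby
# Gemfile: gem 'fluent-logger'
require 'fluent-logger'

# Talk to the td-agent daemon on this Rails server; td-agent listens
# for forwarded records on port 24224 by default. The 'td.myapp' tag
# prefix routes records to the (hypothetical) 'myapp' database.
logger = Fluent::Logger::FluentLogger.new('td.myapp',
                                          host: 'localhost',
                                          port: 24224)

# Each post becomes one record in the myapp.events table. If the call
# cannot reach td-agent it returns false rather than raising, so the
# request path is never blocked by logging.
logger.post('events', user_id: 42, action: 'page_view', path: '/videos/123')
```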

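The scheduled-job half could then be a periodic query whose result is written straight into the MySQL aggregation table. Below is a rough sketch using Treasure Data's td command-line tool, with every name (dashboard_rollup, myapp, the MySQL host and table) invented for illustration; the article does not specify the actual queries.

```
# Run a rollup every five minutes and push the result into the MySQL
# table behind the dashboard. mysql://user:password@host/database/table
# is Treasure Data's result-output URL convention.
td sched:create dashboard_rollup "*/5 * * * *" \
  -d myapp \
  --result "mysql://dashboard:secret@agg-server/dashboard/pageviews_5min" \
  "SELECT TD_TIME_FORMAT(time, 'yyyy-MM-dd HH:mm:00') AS minute,
          COUNT(1) AS page_views
   FROM events
   GROUP BY TD_TIME_FORMAT(time, 'yyyy-MM-dd HH:mm:00')"
```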

Those changes essentially solved all three of the problems the company was facing.

Scalability is no longer an issue: Unlike MySQL, Treasure Data was designed from the ground up to scale. In the team's words, "adding more storage or CPU is only a few keystrokes away."

Flexible Schema: Treasure Data's proprietary columnar database implements a flexible schema model that lets you add or remove columns at any time, so the company no longer has to worry about changes in the underlying data model breaking its ETL process (see the sketch below).

Updates every 5 Minutes, not every 24 Hours: td-agent is a versatile, robust logger that can handle up to 17,000 messages per second per instance, so data that used to arrive in a once-a-day batch now refreshes every five minutes.
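To make the flexible-schema point concrete: in MySQL, a new attribute means an ALTER TABLE plus a coordinated change to every ETL step that touches the table. Under a schema-on-read model like Treasure Data's, the producer can simply start sending the new field. A hypothetical continuation of the earlier logging sketch:

```ruby
require 'fluent-logger'

logger = Fluent::Logger::FluentLogger.new('td.myapp',
                                          host: 'localhost',
                                          port: 24224)

# Shipping a brand-new attribute ('referrer') needs no migration: the
# record is stored as-is, and the new column becomes visible in the
# table's schema when it is next queried.
logger.post('events', user_id: 42, action: 'page_view',
            path: '/videos/123', referrer: 'newsletter')
```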
