AHL is a systematic hedge fund where data is central to the business. Challenged by performance and scalability problems when storing and retrieving time series data using traditional data stores, we built our own.
Arctic is the result of that work. It’s a high performance time series column store built with Python on MongoDB. With compression and chunking, Arctic gives query performance orders of magnitude better than commercial (and open source) dedicated time series databases. We ingest 800M ticks per day, and read data at millions of rows per second (in pure Python). Our aim is to efficiently ship data to cheap compute, rather than run all computation on expensive (in software/hardware terms) dedicated database servers.
The talk explores the solution space of existing time series data stores, and the route we’ve taken to build a simple library with a beautiful API for numeric data storage.
How simple ideas can be used to build a high throughput time series processing system using off-the-shelf open source components, and how important the data model is when scaling a system.
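The chunk-and-compress idea behind a column store like Arctic can be sketched in a few lines. This is a deliberately tiny illustration of the technique, not Arctic’s actual API — the class and its methods are invented for this sketch. Columns are split into fixed-size chunks and each chunk is compressed independently, so a range read only decompresses the chunks it touches:

```python
# Toy sketch of chunked, compressed column storage.
# All names here are illustrative, not Arctic's API.
import struct
import zlib

CHUNK_ROWS = 4  # tiny for demonstration; real stores use thousands of rows


class ChunkedColumn:
    def __init__(self):
        self.chunks = []  # list of independently compressed byte blobs

    def write(self, values):
        # Split the column into fixed-size chunks and compress each one.
        for i in range(0, len(values), CHUNK_ROWS):
            chunk = values[i:i + CHUNK_ROWS]
            raw = struct.pack('%dd' % len(chunk), *chunk)
            self.chunks.append(zlib.compress(raw))

    def read(self, start_chunk=0, end_chunk=None):
        # A range read only decompresses the chunks it needs.
        out = []
        for blob in self.chunks[start_chunk:end_chunk]:
            raw = zlib.decompress(blob)
            out.extend(struct.unpack('%dd' % (len(raw) // 8), raw))
        return out


col = ChunkedColumn()
col.write([float(x) for x in range(10)])
print(col.read(1, 2))  # only the second chunk is decompressed → [4.0, 5.0, 6.0, 7.0]
```

The point of the design is that decompression cost is proportional to the data actually requested, which is what lets a pure-Python reader keep up with millions of rows per second.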
James is the lead technologist on the market data platform at Man AHL. Recently he’s been involved in open sourcing AHL’s time series database, Arctic.
The entire world of IT is shifting, and the job of database administration is rapidly losing relevancy. In this talk, Laine discusses the job and role of the Database Engineer, their place in the worlds of reliability engineering and devops, and the skills needed to stay ahead of the curve. By the end of this, you should have a feel for what is required of today’s database engineer, and a path to get there.
The career path for today’s DBA, what needs to be developed, focused and broadened. What paradigms in IT are shifting the job of the database engineer.
Laine is currently the CTO of OrderWithMe, formerly AVP of Pythian’s open-source database practice, CEO and co-founder of Blackbird, and a founder of PalominoDB. Laine has been an Oracle, MySQL and Cassandra DBA, architect and designer for 16 years with such organizations as Obama for America, Travelocity, Zappos, Chegg, LiveJournal, Disney Mobile, and Adobe. Laine is also an open-source proponent, and advocate for bringing technology, job opportunities, and privileges to underserved populations.
Laine is co-author of O’Reilly Media’s Databases at Scale.
The database landscape has changed a lot in recent years. The NoSQL movement has taken the world by storm and you may wonder if there is still room for relational databases. In this talk we will learn about the strengths that make PostgreSQL more relevant than ever. We’ll survey its architecture, availability tradeoffs, durability with one or more servers, and yes, even SQL!
From this talk you will learn to appreciate how versatile SQL really is, and the many use-cases it can be applied to and solve elegantly. You will also be introduced to the basics of PostgreSQL High Availability and Scaling, a must-know in this age of always-on services.
Dimitri is a PostgreSQL Major Contributor (Extensions, Event Triggers, Bi Directional Replication). Dimitri also develops pgloader and other PostgreSQL related software.
Writing to a database is easy, but getting the data out again is surprisingly hard.
Of course, if you just want to query the database and get some results, that’s fine. But what if you want a copy of your database contents in some other system — for example, to make it searchable in Elasticsearch, or to pre-fill caches so that they’re nice and fast, or to load it into a data warehouse for analytics, or if you want to migrate to a different database technology?
As the data is constantly changing, a one-off snapshot of the database is not enough: you need to tap into the ongoing stream of writes to the database. This technique is called Change Data Capture (CDC). At companies like LinkedIn and Facebook, this is how caches and indexes are kept up-to-date.
This talk explains why change data capture is so useful, and how it prevents race conditions and other ugly problems. Martin will explore the practical details of implementing CDC with PostgreSQL and Apache Kafka, and discuss the approaches you can use to do the same with various other databases.
How you can use change data capture to reliably keep several databases, indexes and caches in sync.
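The core idea of change data capture can be shown with a toy example: every write to the primary store is appended to an ordered change log, and downstream consumers replay that log to stay in sync. This is a sketch of the technique, not a real implementation — in practice the log would come from PostgreSQL logical decoding or a Kafka topic of row changes, and all names here are invented:

```python
# Toy illustration of change data capture (CDC): an ordered change log
# drives derived stores so they converge on the primary's state.
# Illustrative only — real systems use e.g. Postgres logical decoding + Kafka.

class Primary:
    def __init__(self):
        self.rows = {}
        self.log = []  # ordered stream of (op, key, value) events

    def upsert(self, key, value):
        self.rows[key] = value
        self.log.append(('upsert', key, value))

    def delete(self, key):
        self.rows.pop(key, None)
        self.log.append(('delete', key, None))


def apply_changes(log, offset, target):
    """Replay events from `offset` onward, returning the new offset.

    Because every consumer applies events in the same log order, the
    target converges on the primary's state — no races between
    concurrent writers, no lost updates."""
    for op, key, value in log[offset:]:
        if op == 'upsert':
            target[key] = value
        else:
            target.pop(key, None)
    return len(log)


db = Primary()
cache = {}
db.upsert('user:1', 'Ada')
db.upsert('user:2', 'Grace')
db.delete('user:1')
offset = apply_changes(db.log, 0, cache)
print(cache)  # {'user:2': 'Grace'} — the cache matches the primary
```

The single ordered log is what prevents the race conditions mentioned above: two consumers that read the same log to the same offset are guaranteed to hold identical state.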
Martin co-founded Rapportive, worked on scalable data systems at LinkedIn, and is writing an O’Reilly book on Data-Intensive Applications.
Upgrading databases can be terrifying and perilous, and for good reason: you can totally screw yourself! Every workload is unique and standardised test suites will never give you enough information to evaluate how an upgrade will perform for your query set. We will talk about how paranoid you should be about various types of workloads and upgrades, how to balance risk vs engineering effort, and how to safely execute the most challenging upgrades by capturing and replaying real production workloads. The principles apply to any db, but we’ll go particularly deep into war stories and tooling options for MongoDB and MySQL.
You’ll learn how to evaluate the riskiness of any db upgrade and migration, as well as how to realistically assess your organisational appetite for risk. You will also learn how to gain confidence in really scary, high-risk upgrades using strategies like shadowing production traffic or capturing and replaying workloads offline.
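The capture-and-replay strategy can be sketched simply: record real queries against the current database version, replay them against the candidate version offline, and diff the results before committing to the upgrade. This is an illustrative toy under invented names (`old_engine` and `new_engine` stand in for two database versions), not any particular replay tool:

```python
# Hedged sketch of offline capture-and-replay: run each captured query
# on both engine versions and report any divergence. Names are invented.

def shadow_replay(workload, old_engine, new_engine):
    """Replay captured queries on both engines, collecting mismatches."""
    mismatches = []
    for query in workload:
        expected = old_engine(query)
        actual = new_engine(query)
        if expected != actual:
            mismatches.append((query, expected, actual))
    return mismatches


# Toy "engines": the candidate version changes integer-division behaviour.
old_engine = lambda q: eval(q)                       # current server
new_engine = lambda q: eval(q.replace('//', '/'))    # candidate server

workload = ['1 + 1', '7 // 2', '10 * 3']  # captured production queries
print(shadow_replay(workload, old_engine, new_engine))
# [('7 // 2', 3, 3.5)] — only the risky query diverges
```

The value of replaying a real workload, rather than a standardised suite, is exactly this: the one query that diverges is one a synthetic benchmark would likely never have issued.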
Charity is an Engineering Manager at Parse/Facebook, the best way to build great mobile apps. She loves whiskey and taming chaos.
As Uber scales up fast, we’ve run into problems keeping all of our databases working reliably. Our solution has been to adopt Chaos Monkey-style failure testing for all production systems, even databases. This talk will cover our experience with production database failures, failure testing, and the fault-tolerant systems we are building to resist these failures.
After this talk, you’ll be able to better assess the risk from the different failure modes of databases in your system’s architecture.
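The essence of Chaos Monkey-style failure testing is deliberately injecting faults and verifying the surrounding system tolerates them. A minimal sketch of the pattern, with invented names and probabilities (this is not Uber’s tooling): wrap a database call so it fails some fraction of the time, then check that the retry logic still returns correct results.

```python
# Toy fault-injection harness: inject random ConnectionErrors into a
# "database" call and verify the retry wrapper absorbs them.
# Names, rates, and retry counts are illustrative only.
import random


def flaky(fn, failure_rate, rng, stats):
    """Return a version of fn that raises ConnectionError at random."""
    def wrapper(*args):
        if rng.random() < failure_rate:
            stats['injected'] += 1
            raise ConnectionError('injected failure')
        return fn(*args)
    return wrapper


def with_retries(fn, attempts=10):
    """Retry fn on ConnectionError, re-raising after the last attempt."""
    def wrapper(*args):
        for attempt in range(attempts):
            try:
                return fn(*args)
            except ConnectionError:
                if attempt == attempts - 1:
                    raise
    return wrapper


rng = random.Random(42)  # seeded so the experiment is reproducible
stats = {'injected': 0}
query = flaky(lambda key: {'user:1': 'Ada'}[key], 0.25, rng, stats)
safe_query = with_retries(query)

results = [safe_query('user:1') for _ in range(100)]
print(all(r == 'Ada' for r in results), stats['injected'])
```

Running failures like this continuously in production, rather than only in a test suite, is what turns "we think this is fault-tolerant" into observed behaviour.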
Matt works on architecture, performance, and distributed systems at Uber. Before Uber, he was co-founder of Voxer.
When considering the next improvement in your infrastructure, it’s easy to forget that significant business value can be created by making unglamorous changes that introduce no new technologies or involve potentially hazardous upgrades. This talk is the story of a database migration of Intercom’s customer data stored in MongoDB databases from a third party provider to self-managed infrastructure hosted inside Amazon EC2, and shows how a relatively drab data migration significantly improved both our bottom line and security posture – both of which definitely matter.
Along with some low-level MongoDB details, you’ll gain insight into prioritising and executing work to maximise business impact.
Brian Scanlan is an engineer with Intercom, based out of Dublin, Ireland. He works on their platform team, processing user data and building and operating Intercom’s APIs, SDKs and integrations. He tends to work somewhere in the overlap of systems engineering, software development and fixing large scale outages.
Bloomberg LP began development 11 years ago on an internally used database system called Comdb2. In this talk, Alex Scotti, the original author and head of this project, will give insight into the decisions that led us down this path at the time, and the lessons we have learned going through this effort. We will discuss the changes in the landscape of database products that occurred in the background during this time, and ask whether the problems we initially set out to solve are still such hard problems today.
From this talk, you will gain insight into the high level architecture of Bloomberg’s database system, and the specific types of problems Bloomberg faced (and still does) which makes custom implementation a viable strategy.
Alex Scotti is the original architect and programmer of Bloomberg’s proprietary Comdb2 database system.
In this talk Rongrong Zhong will give a brief introduction to the MySQL deployment at Facebook: the problems they solved in order to run their database at large scale, how they solved them, and some improvements they contributed to the MySQL project. Rongrong will also cover case studies of things that seem simple at small scale but become interesting problems as the scale grows.
How to scale up and manage your database deployments, as well as potential problems when running a system at large scale that you should watch out for.
Rongrong works on the WebScaleSQL team to improve storage efficiency, and solve other problems to make MySQL more manageable and performant at Facebook.