Principal (Senior) Engineer
The Explanation Company is hiring a Principal Engineer who can also act as a Site Reliability Engineer until the day that we hire a dedicated SRE. We’re a small team so it’s important that we hire people who can play multiple roles. We’re looking for someone to bring deep experience in planning for scale, diagnosing performance issues, and site security, while still being able to contribute code, both for new features & refactoring, when appropriate. Our backend is built on Ruby on Rails so we’re looking for someone with this experience, but part of scaling may be implementing key features in other languages. This is a remote role joining a fully remote team.
We’re building an internet app for children that enables them to video call their friends and search the web. There are a billion children in the world who now have an internet device and the existing communication & browser apps don’t work well for kids. This is an opportunity to scale a new video calling app from the ground up with an experienced team who knows how to build amazing products. You’ll be joining at the fun moment where our product is getting traction with users and we’ve started to experience growing pains as things scale up.
About The Work
In this role, you’ll be working alongside three other senior engineers building a mobile app in React Native with a heavy server component. All three engineers have deep experience as full-stack product engineers, working across web and mobile development. Our engineers have well-organized code, a solid continuous integration pipeline, and are mindful about slow queries and excessive API calls. You are not stepping into a mess that needs cleaning up. Instead, we’re looking to add someone to the team who can help with the next layer of performance optimization and reliability. In this role you’ll take the lead on diagnosing contention issues within the database, optimizing HTTP request routing to avoid request queuing, and figuring out a failover strategy to avoid downtime when an availability zone goes down.
A big part of your job is considering and mitigating all the things that could go wrong before they do, so that our system operations remain as boring as possible. We know that making it so that systems just run is challenging work that can benefit from deep experience in the domain.
Part of the work includes being available for on-call duty, for those times when something we didn’t anticipate happens. With good planning, it will be rare that you’re woken up in the middle of the night. But the peak usage of our product is on evenings and weekends, so when we do have issues they usually occur during these hours. For this role it’s important that your work hours can flex depending on what we have going on and that you can often keep an eye on things, even when off work. This is not a good role for someone who needs highly regular hours and wants to fully unplug when off work. We’ll be working to stay one step ahead, but if things go well, there will be busy times.
To give you a clearer picture of what the job could entail, our server backend is an API-only Ruby app using Rails. Our frontend is React Native using TypeScript and the Expo library. We’re currently hosted on Heroku with Postgres and Redis. We use Docker for key systems such as our GitHub Actions CI pipeline.
Here are real examples of work we’re prioritizing in the near future:
Dig into New Relic (or your preferred tool) to come up with a plan for keeping 95% of API calls under 500 ms and 99% of our API calls under 1 second. Also, develop a good understanding of the outliers and why they occur.
Figure out why some of our queries occasionally take a surprisingly long time to return, even though they’re fast 99% of the time and we’ve covered all the basics (e.g. SQL EXPLAIN to confirm indexes, confirm indexes are loaded in RAM, and no table locks).
Evaluate our video calling and texting API providers to confirm they can meet our capacity needs and reliability goals, and start including them in our capacity planning.
Develop a plan for spreading our traffic across multiple regions, for example by considering a master-master PostgreSQL setup.
Decide when we should move to on-prem servers that we control to avoid some of the complexity of building in the cloud.
Refactor our mobile push notifications to include payload data in order to reduce the API calls we make to our server. Currently, we send a push notification to a user, they open our app, and we query our server to retrieve data. This is an example of a task that would touch both backend Rails code and front-end React Native code, which would receive the push payload and proactively update a front-end data store.
Architect and implement an internal library for sending lots of SMS messages. We will need to send periodic blasts to a large number of people. This would be implemented in Rails, most likely on top of Sidekiq for queuing and PostgreSQL for message history. We also need an internal interface that other parts of the application can use to send one-off SMS messages and handle replies as an interactive experience. For example, kids might take an action in the app and we text their parents to say, "Jonny wants to do video call X, please reply Yes if this is okay."
You’ve also had years of your career where you were primarily focused on scaling systems and figuring out how to work around old architecture decisions that are hard to undo. You’ve done basic load testing and security audits—and know when to bring in more experienced people. You’re comfortable setting up a dedicated server from scratch and managing it in a production environment.
You’re comfortable with remote work since this is a remote role. Our full team of six people is spread across four time zones in the United States and Canada. To make sure time zones overlap with our team, we’re looking to hire someone based in the continental US or Canada.
You’re a strong communicator. You are good at talking through problems verbally with teammates and explaining complex ideas. And you’re also a clear and concise writer, both in long-form proposals and in code.
At The Explanation Company, we are setting out to tackle a big challenge: we’re building out the missing internet tools for children. As a grown-up, it’s easy to take for granted that whenever you wonder something or want to ask a friend a question, you pull out your phone and have an answer in seconds. Children who are 10 years old and younger have been left behind by this and we are going to fix this. We’re making it possible for children to independently look up anything they're curious about and to independently communicate with friends.
We're a well-funded startup backed by one of the best investment funds so we have the runway to pull this off, but we're early in building the team and shaping our culture.
Our culture is fast-paced and focused on getting things done. We make big plans, identify the core assumptions in those plans, and take small steps quickly to try and validate those assumptions. The amount of work we need to get done with the small team we have is a little nuts, so we have to find clever shortcuts to pull this off. If you're looking for work-life balance, this isn't the place. At this early stage you'll have a big scope of responsibility, but it also means that things will come up at odd hours and you'll be the go-to person for those.
Our culture is one where the fun of the work comes from solving hard problems with amazingly talented colleagues. If your core motivation is to "work on something that's good for the world", this is not the place for that. That's not our lens so it would be better to find an education or healthcare startup. We are a proud for-profit company, although with a very long-term mindset. Our goal is to reach as many children as we can, while generating a healthy revenue in the process and building a great business, as our means to maximize impact.
You can expect to find teammates who trust you and support you, so you can do your best work. You’ll be given a high degree of autonomy and be expected to figure things out, both when focused on long-term projects and when ad hoc requests come up.
As a team, we are eager to learn about things we don’t understand. We challenge each other's ideas, but we do so respectfully. We have a bias for action, so we’re quick to try an idea rather than spend too much time debating it in the abstract. We appreciate people who have strong opinions and share those, but we also commit to moving forward after a decision is made even if everyone does not agree. As a company, we avoid having political conversations internally and do not weigh in on politics publicly, unless directly related to our business. We’re looking for teammates who are here to do great work on building great products for kids, and who reserve advocating for societal change for outside of the workplace.
A little background on some of your key colleagues:
Keith Schacht, our CEO
Keith was the co-founder and CEO of Mystery Science, which sold to Discovery Education in 2020. Prior to this, he was a product manager at Facebook, first leading News Feed and then leading Messenger. Keith is an experienced entrepreneur who has started and had exits for multiple companies. As a colleague, he's an engineer involved in a lot of key architecture decisions and occasionally steps in as a designer and growth hacker.
David Vinca, our President
David leads operations at the company. He was previously the founder and CEO of eSpark Learning, a software company that helps elementary school kids learn reading and math. eSpark grew to serve 1 out of every 4 elementary schools in the United States. Prior to this, he was a management consultant. David has two kids, Devin and Nyla, who enthusiastically use the company’s product.
Anand Chhatpar, our Founding Engineer
Anand is a full-stack product engineer. He was previously at Mystery Science as the growth lead. There, he was instrumental in developing core aspects of the company’s strategy for acquiring customers, which contributed to it becoming the most widely used science resource in schools across the country. Prior to Mystery, Anand founded three companies, including a consumer app company that acquired 20M users. He's a generalist engineer with a special talent for quick solutions to problems and finding creative ways for a product to spread.
Nick Bonatsakis, our Founding Engineer
Nick is also a full-stack product engineer who specializes on the mobile side. He previously founded a company building a collection of iOS and Android apps which acquired more than a million installs. He was also an early engineer at Brightcove with experience in video and audio. Nick has a strong product sense and with young children of his own, he's well-calibrated on kids.
If you're intrigued by this opportunity, we'd love to hear from you!