FounderQuest

Technical Debt - Our Approach to Building Cool Tech Profitably

Full Transcript:

Ben:                If I start selling my kidneys, I'll let you know.

Starr:              Okay, that's cool. Do I get a friends and family discount?

Announcer:          You are in a maze of twisty little passages, all alike. Time to start a fire. Crack open a can of Tab, and settle in for Founder Quest.

Starr:              I thought originally that this podcast would be called something like Hard Technical Decisions, which sounds pretty hard-hitting to me. Like, I like that, 'cause I think of myself as a dyed-in-the-wool realist.

Josh:               Mm-hmm (affirmative)-

Starr:              Who's unafraid to face hard facts, you know? But, then we were talking about it in Slack and it turns out that some of the things I was originally going to highlight, as tough Hard Technical Decisions, were mistakes. Rather, they were actually forms of technical debt that we kind of took on on purpose, that we knew what we were getting into. Maybe not exactly to what depth we were getting into them. But, we knew that something was happening.

Starr:              And, so, this week's topic has kind of blossomed into something a little bit more interesting I think. So, yeah, personally, running a business and being an engineer means there's this sort of constant struggle between the engineer in me and the business man in me. But, what I mean by that is that there's this constant desire to want to do things right, the engineering way. But, then, you always have to trade that off between, what is the return on investment?

Starr:              Like, what are my business outcomes that I'm trying to achieve by doing this engineering.

Josh:               The engineering in you wants to achieve technical perfection but the business person in you wants to make money.

Starr:              Yeah. Exactly. So, I guess maybe we should go into like, what is technical debt. Let's talk a little bit about technical debt and stuff like this in general and then maybe we can go onto some specifics about our

Ben:                When I think of technical debt, the first phrase that comes to my mind is: "it seemed like a good idea at the time."

Ben:                Right? I think it's those things that you do with good intentions that just over time, didn't continue to scale, which is just a normal outgrowth of scaling. Or, over time became an obviously bad decision based on new information. So, you just have to change your mind and go back and fix it.

Josh:               Yeah. I tend to also throw maintenance costs in there. There's extra things that you have to do that come with the technical decisions that you have to make. And so, like, things like putting off, deferring some things like maintenance costs. For instance, like on Rails, like a Rails upgrade for instance.

Josh:               I know you can kind of get behind on those, or like push them down the road, and those can build up like a large overhead that you have to think about all at once, which to me is technical debt. And, there's all kinds of maintenance costs, I think, associated with software or infrastructure.

Starr:              That's interesting because my take on technical debt is maybe a little more specific? I've always considered technical debt as a way to buy time with shitty code. Right?

Starr:              When we first launched Honeybadger, I felt that the market was super ripe for a competitor in the space. So, I felt like we really needed to ship something out very fast. And, as a result we made some decisions that made us able to get to market much more quickly than we would have otherwise. But then maybe, a year or two later, we came to regret those decisions. Maybe we didn't really regret them. Maybe we just had to come back and clean them up a little bit.

Starr:              So, one thing that we did, that I think falls definitely into the category of technical debt is that when we started ... Well, our service for people who don't know, is an exception monitoring service, right? We have a little snippet ... well, it's not a snippet, it's a library, that goes into your application. And, it sends us information whenever errors happen.

Starr:              And, what we did when we first started out, is that we actually, kind of, you know, used the library of the main competitor, which was totally legal, because it was MIT licensed. And, we always knew we were going to replace this with our own library. And, we got a little bit of flak for it.

Starr:              But, in the end, I think it was sort of the right decision. What do you guys think about that decision?

Josh:               I think it definitely bought us some time of not having to like figure out or reinvent that wheel, basically, because it was a pretty well-established pattern. And, if you look at those same libraries, today, of everyone who does this, basically, they're pretty much all doing ... they're all basically copies of each other. They're all basically doing the exact same thing. It's a pretty well-established pattern of code.

Josh:               So, it definitely helped us get to market quicker. And, as you said, it's MIT licensed. So, we included attribution and all that stuff.

Starr:              I suppose we should say it was MIT licensed before we did that.

Josh:               Yeah.

Starr:              And, then they changed the license. Which I don't really blame them for. But, it took a while. Josh, you were the one who was in charge of version 2 of the [gem] that was 100% developed by you. How long did that take? That took a while, didn't it?

Josh:               It took a number of months. It was not a small project. I know by the time we got to that point obviously there were a number of reasons we wanted to re-write that code, or re-implement it. We wanted to kind of custom-tailor some things to our particular service. By that time, we had made enough decisions with the service, where we knew where we wanted to go and we could bring that to the client side and it made sense to re-do it. And, it also gave us the opportunity to re-think some of the decisions that we didn't get to make as a result of using that code from someone else up front.

Josh:               So, that probably added a little time to the project. But, I think, overall, it was useful to us.

Starr:              Yeah. So, it saved us several months maybe coming out of the gate.

Josh:               I think it would have been ... I mean we, obviously, wouldn't have started with probably all the features that came with it initially, either. So, I think that a lot of the time it probably saved us ... it would have been incrementally over the first year or two of building the product that we didn't have to think about building in or coming up with how this thing should work. It just kind of worked out of the box initially anyway.

Starr:              We no longer support the original notifier. But, how long did we keep support of that in our application? Do we still have that?

Josh:               Probably too long. I think we still have some code for it. I don't know that anyone's actually using it.

Starr:              Like, if I sent an error payload from like 2012, would it work?

Josh:               That's a ... I don't know. What do you think, Ben?

Ben:                There's a good chance it would actually work, yeah.

Starr:              That's crazy.

Ben:                Yeah, I think we do have the ... the way we supported it was we basically implemented the same API server-side and then the [gem] just changes the host name that it connects to and it works.
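
A rough sketch of what Ben describes, for readers who want to picture it: if the server implements the same API as the original service, the client library only needs a different host name. Everything named here (`NotifierConfig`, `build_request`, the `/v1/notices` path, the `X-API-Key` header and both host names) is a made-up placeholder, not the actual gem or either service's real API.

```python
import json
from dataclasses import dataclass
from urllib.parse import urlunsplit


@dataclass
class NotifierConfig:
    """Client settings. The path and header name are hypothetical."""
    host: str
    api_key: str
    path: str = "/v1/notices"


def build_request(config: NotifierConfig, error_class: str, message: str) -> dict:
    """Build (but do not send) the HTTP request for one error report."""
    url = urlunsplit(("https", config.host, config.path, "", ""))
    body = json.dumps({"error_class": error_class, "message": message})
    headers = {"Content-Type": "application/json", "X-API-Key": config.api_key}
    return {"url": url, "headers": headers, "body": body}


# Two services exposing the same API: only the host differs.
original = NotifierConfig(host="api.original-service.test", api_key="abc123")
compatible = NotifierConfig(host="api.compatible-service.test", api_key="abc123")

req_a = build_request(original, "RuntimeError", "something broke")
req_b = build_request(compatible, "RuntimeError", "something broke")
```

The payload and headers are the same in both requests, which is why an old client can keep working against a server that mirrors the API it was written for.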

Starr:              Yeah, and I think you became really well acquainted with some of the problems with this original library, which started out, I guess, as a much simpler thing. All legacy software grows and becomes more complicated. And, it's always kind of nice, and it's a ...

Starr:              I guess engineers always want to re-write their big thing. And, you kind of got a chance to do that. Because, we started with something that we knew we were going to have to re-write in the beginning.

Ben:                I like what you said there about technical debt, from the perspective that it's a shortcut that we choose to make that saves us some time. You know? But, one of those examples, I think, that from our experience of, that [I] was talking about, like, it's a good idea at the time, was when we started off with Backbone and we had ...

Starr:              Oh, you're not going to remind me of this, Ben. This is ... I've blocked this out.

Ben:                But, those were fun times. Like, everyone at the time, [inaudible 00:07:44], of course it was like the new hotness to build a single-page app. Like, React wasn't a thing yet, as far as I know. It was a brave new world. Like, let's try this new thing. And, so, we ... you, Starr, actually, because you did all the JavaScript.

Starr:              I'll own it. I'll own it. Sure.

Starr:              I make mistakes, Ben. I'm not perfect. Yeah.

Josh:               Well, and as developers, we don't want to fall behind the times. We want to be able to learn and grow. And, stay current and all that. So ... I'm sure it was a fun frontier for you to explore.

Ben:                Yeah. Well, that's the point I was trying to make. It wasn't a mistake. Right? It was a good idea at the time. Like, it sounded like a good idea because it was fun to build, and it was what other people were doing and it was a good experiment. And, it worked. It's just, over time, we decided, "it's not working so well for us." And, we had to go back and replace that.

Starr:              Oh, see. I'm much more harsher in my ... I'm much harsher in my evaluation of my decision there.

Starr:              I think it did sound like a good idea at the time. But, it was the wrong decision.

Josh:               I tend to agree. I think that for us it was the wrong decision. Just because what we know now of the extra overhead that it takes to maintain. Especially, at the time, like, Backbone was not ... it had a lot to work out that today's frameworks have solved, I think. So, there was a lot to struggle with, in addition. But, you know, that, compared to if we had just gone with basically vanilla JavaScript in the Rails way, which isn't flashy, but works year after year. I think that's what would have made it a better decision for us to, kind of, go the more standard approach just because we're a small team and everyone knows.

Starr:              I mean, maybe, I should back up a little and give a little perspective for people who are listening. So, when we first started out, we wanted a really awesome application and so we chose to build it as a single-page application using this library, Backbone.js. This was before React. This was before, you know, Angular, Vue. Any of the modern front-end libraries.

Starr:              So, the problem with building our application as a single-page application, using Backbone, wasn't necessarily that it was hard to build, using Backbone. The problem was that we had built this system which only one person, me, knew how to, sort of, use and work with.

Starr:              And, so as time went on, I had to move onto other things. Because we're three developers, we're all doing a lot of different things. We all have to be able to interact with all parts of the system. And, it became just more and more obvious that it was very hard for everybody to be able to work with this front-end application code that I had built. Because, it was so complex.

Starr:              It was like its own separate application. Having a separate, single-page application was incredibly difficult to maintain for three Full Stack developers and no independent, full-time front-end guys. So, that was our main mistake. So, I guess, maybe, this one canceled out whatever time we gained from using the Airbrake [Gem 00:10:41].

Josh:               It probably did. Could I just make one point? And, I might, probably get on a soapbox here, so I'll try not to.

Starr:              That's fine. Do it. Do it.

Josh:               So, if we went Backbone ... if you look today, at the number of people that are building Backbone applications how many would you say are doing that?

Starr:              New Backbone applications?

Josh:               Exactly.

Starr:              Nobody. Nobody.

Josh:               They're like ... I mean, it seems to me that anybody who was building a Backbone application back then has probably ported to a different framework entirely, I would guess. Like, React or something like that.

Josh:               You know, if you're going to use the latest, hot, front end thing, there's a good chance that you're going to want to use the next hot thing that comes along to replace it.

Starr:              Okay. Josh, now, now, now. We've got to fight about this, right?

Starr:              Backbone was the way, right? Backbone was the way. It was the way to write front-end apps. And, it turned out we were just wrong, you know? And, then, it turned out that Angular was the way, right?

Josh:               Right.

Starr:              [crosstalk] So, nobody's perfect. Nobody's perfect. So, still wrong.

Starr:              Now, React. But,

Josh:               I'm pretty sure that React is definitely the way, though. I mean, we can all agree on that.

Starr:              It's yeah, 100%.

Josh:               Meanwhile, JavaScript and Rails still works. You know, it's not super fancy.

Starr:              Maybe if you're an old man, Josh.

Josh:               Maybe I am an old man, Starr.

Starr:              I'm older than you, so.

Josh:               Well, maybe we're both old men, then. In any case, Rails and JavaScript has treated us pretty well. You know? I think we could all agree that it's treated us pretty well over the years, in that, we all know it. It's all basic ... it's built on basic web technologies that don't really have a whole lot of overhead. There's a little bit. But, comparatively, it's done pretty well for us, I think.

Starr:              And, so, the punchline for this, the end result, is that I went back. I ported the whole single-page application into sort of a vanilla Rails application with ... it's got a pretty extensive JavaScript layer on top of it. But, for the most part, it works without JavaScript. I think that's the perfect thing for us because it's easy for us to maintain. It's easy for us to understand and onboard people who need to work on it.

Starr:              And, frankly, I don't think our application is a very good fit for a single-page application. Why? Because, well, a single-page in our application has a huge amount of data that we have to go in and sort of extract and format and everything. And, just sending that over the wire and having that happen front-end is just kind of weird to me. You know?

Josh:               Yeah, and the other thing that I've told people in the past is that the usage pattern of how people like to interact with Honeybadger is ... our strategy is to keep people out of Honeybadger for the most part. In that, it's issue-based. So, if they get a notification that an error has happened, then they want to come in and look at Honeybadger to see what the error is and debug it.

Josh:               Ideally, they're not going to have to be, like, in there all the time. Hopefully, they're going to be able to be working on code and deploying features and that sort of thing. So, it's not the sort of app that people have open in a tab, 24/7, or whatever.

Josh:               So, what they're doing is usually clicking on a link from a notification and landing on Honeybadger and if you're putting all of your ... if all of your performance benefits are being loaded up front ... So, with a single-page app, a lot of the time, you'll load a lot of the application logic up front. And, if you're in the app for a long time, it seems faster.

Ben:                Yeah, people get in and get out. Instead of loading up all this data and this supporting structure that a single-page app requires. We just load the page they want to view, the data they want to see, and then they move on with their life.

Ben:                And, on that note, on the data thing, I think. Maybe a better example to make my point: that would be like Postgres, right? We decided, early on, to go with Postgres as our data store. It's been a great decision. We're still using Postgres. But, we've changed how we use it over time, right?

Ben:                Because, initially, we stored everything in Postgres.

Starr:              Yeah. We just had a basic Rails [CRUD] app, right? Our highly data-intensive application where we handle thousands of inbound errors and everything. Originally, our first version of that was a Rails CRUD app with everything ... all the default Postgres settings, pretty much everything vanilla, right?

Josh:               Mm-hmm (affirmative).

Ben:                Yeah, we didn't even have a queue in the beginning. We took a request from a payload. We put it right in the database.

Josh:               Straight to the database.

Ben:                But, you know, over time that didn't scale. And, so we had to make changes. We had to add a queue. And, then over time we decided, you know what, maybe putting these 10-megabyte payloads inside a database isn't a good idea anymore. And, so we split that out to S3, right?

Ben:                Again, we just stored our reference to that, in Postgres. [crosstalk]
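
The progression Ben outlines (accept the request, put it on a queue, store the large payload outside the database, keep only a reference in Postgres) can be sketched roughly as follows. This is an illustration under loud assumptions: a `deque` stands in for the queue, a plain dict stands in for an S3 bucket, and an in-memory SQLite database stands in for Postgres. The table layout, key format and function names are invented for the example and are not Honeybadger's actual schema or code.

```python
import json
import sqlite3
import uuid
from collections import deque

queue = deque()   # stand-in for a real message queue
blob_store = {}   # stand-in for an S3 bucket: key -> raw payload

# Stand-in for Postgres: the row holds a reference, not the payload itself.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE notices ("
    "id INTEGER PRIMARY KEY, error_class TEXT NOT NULL, payload_key TEXT NOT NULL)"
)


def receive(payload: dict) -> None:
    """Intake: do as little as possible, just enqueue the raw payload."""
    queue.append(json.dumps(payload))


def process_one() -> int:
    """Worker: write the payload to blob storage, save only its key in the DB."""
    raw = queue.popleft()
    payload = json.loads(raw)
    key = f"payloads/{uuid.uuid4()}.json"  # hypothetical key format
    blob_store[key] = raw
    cur = db.execute(
        "INSERT INTO notices (error_class, payload_key) VALUES (?, ?)",
        (payload["error_class"], key),
    )
    db.commit()
    return cur.lastrowid


def load_payload(notice_id: int) -> dict:
    """Read path: look up the reference, then fetch the payload from blob storage."""
    row = db.execute(
        "SELECT payload_key FROM notices WHERE id = ?", (notice_id,)
    ).fetchone()
    return json.loads(blob_store[row[0]])


receive({"error_class": "RuntimeError", "backtrace": ["app.rb:10"] * 1000})
notice_id = process_one()
restored = load_payload(notice_id)
```

The database row stays small no matter how large the payload is, which is the property that keeps backups and restores manageable.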

Starr:              How big was our database? I remember it got pretty big.

Ben:                It was about 2 terabytes.

Starr:              Yeah, that was right after I did a RailsConf talk about scaling up your database.

Ben:                Yeah.

Starr:              Like, "Everybody talks about, you know, scaling out your database and moving things onto different servers. But, you know. Just buy a bigger database. It's cool."

Ben:                Which we did, a few times.

Starr:              Yeah, we did that a couple times.

Ben:                And, then, I think the real breaking point for me was when I realized it would take a full 24 hours to restore a database to production if we had to do a whole restore. I'm like, "You know what? We need to do something about that."

Josh:               You know, doing that, splitting out those larger payloads has allowed us to maintain that scaling strategy for Postgres for the most part. We're still ... we're not sharding or anything like that. We basically are just buying a bigger database. It's just that our database happens to be smaller.

Starr:              Yeah, and one other thing that ... I think I pushed for this, too, in the beginning I didn't really want to be on AWS. Because at the time we launched Honeybadger, there were all these really public AWS outages and it seemed kind of sketchy to me. They hadn't launched this entire ecosystem of cloud services. It was pretty much just EC2. So, we decided to go with a dedicated server, or several dedicated servers, at a host that ...

Starr:              I don't know, can we legally name them?

Ben:                No, let's not. Because, it wouldn't be nice.

Starr:              Okay, if we legally name them, we're going to get sued. Because, they were so incredibly terrible.

Ben:                You know, I think that was a great decision early on. Because, A: it was cheap. B: it was easy to get started and they did support our needs for quite a while. It was really only once we scaled a bit that we discovered the limits of their service, right.

Ben:                They were great for a while. Until, they weren't. So. I mean, it [crosstalk]

Josh:               They were pretty bad.

Ben:                Yeah. At the end, there, it was like "Ugh!", pulling teeth and stuff. But, it wasn't a universally bad experience.

Starr:              No, but it took us forever to get off of them. We eventually moved everything over to EC2. We use a lot of AWS services, along with it.

Starr:              And, how long did it take you, Ben? Like, that took a really long time.

Ben:                Yeah. I mean, I think several months. I think the thing about AWS is everything has to be automated. When you're using EC2, those instances can disappear at any time. And, the one good thing about that hosting company is that those servers just didn't disappear. Right? They were rock solid. They lasted forever. So, like, automation isn't so important if you know your server's going to be there. But,

Starr:              They disappeared occasionally for a few seconds at a time.

Ben:                Yeah.

Starr:              They were still there. But, we just couldn't see them.

Ben:                So, I mean, it took definitely a bit of work to move to AWS. So, if we were to do it again, yeah, the math might be a little different. Like, knowing how long, knowing how much work it took to actually make that change. Having done that up front, it would have been nice. But, then, how long would that have delayed our launch? That's the question.

Starr:              Do you think it would have delayed our launch to launch on like EC2 or AWS? [crosstalk 00:18:37]

Ben:                No, because we probably would have done it naively. Like, "Let's just throw some instances. It's fine." And, then like one day, they would have disappeared and we'd be like, "Oh, crud. It's not fine anymore."

Starr:              Yeah. I think that's exactly what we would have done. That does sound like us.

Starr:              Did I mention that we've learned a few things?

Starr:              So, one of the benefits of AWS for us now is that we have this super awesome scaling setup so that if we have more notices come in than we can handle, we just spin up more servers. And, it all just kind of happens automatically.
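
As a rough picture of the kind of rule that sits behind "spin up more servers automatically": the sketch below sizes a worker pool from the backlog of queued notices. It is only an illustration. The function name, the per-worker capacity and the minimum and maximum limits are assumptions made up for the example, not Honeybadger's real configuration or an AWS API.

```python
import math


def desired_workers(
    queue_depth: int,
    per_worker_capacity: int,
    min_workers: int = 1,
    max_workers: int = 20,
) -> int:
    """Return how many workers to run for the current backlog.

    per_worker_capacity is the number of queued notices one worker is
    assumed to clear in one scaling interval (a hypothetical figure).
    """
    if per_worker_capacity <= 0:
        raise ValueError("per_worker_capacity must be positive")
    needed = math.ceil(queue_depth / per_worker_capacity)
    # Clamp so the pool never drops to zero or grows without bound.
    return max(min_workers, min(max_workers, needed))


quiet = desired_workers(queue_depth=0, per_worker_capacity=100)
busy = desired_workers(queue_depth=250, per_worker_capacity=100)
spike = desired_workers(queue_depth=10_000, per_worker_capacity=100)
```

The upper bound keeps a traffic spike from producing an unbounded server bill, and the lower bound keeps at least one worker running when the queue is empty.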

Starr:              Yeah. One of the most intense evenings of my life was where we're in the middle of all these ... it was some sort of scaling problem with our old servers, our old dedicated servers. And, I was at some conference that night. I couldn't sleep. I eventually finally got to sleep. I was woken up by an alarm call from my home alarm company, saying somebody was trying to break into my house.

Starr:              So, then, I call my wife, who's at the house. And, we're freaking out and she's trying to figure out what's going on. And, then, okay. So, that's done and I go back to sleep. And, then I get a page. Because, I was on call. I get paged by our PagerDuty setup because the database is acting weird or something. And, then, okay. We have to get that done. And, then, an hour later, I go back to sleep for an hour. I get up and I have to give a presentation.

Starr:              I'm like the first person in the morning. And, on my personal timezone, it's like 4:00 a.m.

Josh:               So, that's my absolute nightmare.

Starr:              And, it turned out okay. The presentation went okay I think. But, man, that was pretty stressful.

Ben:                One of the lessons learned from that is that now we don't assign on call when you're traveling for a conference. Right?

Josh:               Right. I don't know if I've heard that ... knew it was that bad. But, sorry, Starr.

Starr:              It's okay. It wasn't like that all the time. It just was this perfect storm of ...

Ben:                We've definitely had the benefits from the automation that we had to do for Amazon. So, now we don't get pages in the middle of the night when we're traveling on a conference, right. It just handles itself.

Starr:              We don't hardly get any pages. At all. It just works.

Ben:                A+. Would do again.

Josh:               In order to get to this point, though, obviously we didn't have ... we weren't to this point back then or we would have built it like this in the first place. And, so, what we have today has come out of a lot of the mistakes and the experience that we've gained from doing things the wrong way. And so, I assume that we would have probably made our share of mistakes on any platform that we would have started on and we'd still be here talking about what we did wrong and what we wished we would have done in the beginning.

Ben:                Yeah. Well, you know, I think part of it comes down to, what is the pragmatic thing that gets the job done today versus what is the ideal solution that I would love to have for the next ten years.

Ben:                I mean, when we started, it was really one server. It cost us $75/month. Boom. We launched.

Ben:                Because, we had ...

Josh:               Yeah. And that was our entire infrastructure cost.

Ben:                Yeah. Totally.

Josh:               We had no idea, right?

Starr:              Now, we spend, what, like six figures on servers.

Ben:                Yeah. But, you know, at the time when we launched, we had no idea if people would actually pay us for this service. We were just like, "We don't know. Will this work? Let's do the cheapest thing that could possibly work, right?"

Ben:                And, as soon as we started making $100/month we started talking about, "Okay, now, do we buy another server?"

Josh:               I remember thinking you were such a baller for dropping like $75/month on that server, Ben.

Ben:                Good times. Good memories.

Josh:               I'm a big fan of the idea of building to solve your problem today, not building to solve your problem of, like, five or ten years from now or something. I think that goes into what you were saying about taking the pragmatic approach. And, that's part of pragmatism, like you're not solving for needs you don't have yet, basically.

Ben:                I was thinking about that approach when it comes to software development. Less from the outside and more from the software side. The concept of "you aren't going to need it," right?

Ben:                It took me a while in my career to accept that particular mentality that you don't need to build in all the extensibility from Day 1. You can just build the thing that works today. And, then rebuild it when you need to, if you need to, as opposed to spending a whole bunch of time up front trying to anticipate every potential variation that you have to deal with.

Josh:               That also has helped us focus on our customers because we didn't build a bunch of stuff that we thought they wanted. We waited and built things they specifically told us they wanted. So, if it doesn't do exactly what they want from Day 1, at least they can tell you what they want it to do, versus you kind of making those decisions for them.

Starr:              So, yeah. In the end, y'all guys are saying that ... like how do you think we did? [crosstalk] Did we do okay?

Josh:               Yeah.

Ben:                Hey, we're still here, right? We're still making ... we're still in business. So, yeah, I think we did alright.

Josh:               Yeah. We're still learning. But, that's like we were saying that kind of goes with the territory.

Starr:              For the past several years, we've sort of kicked around the idea of doing another product. I think this is going to be maybe something that we have to watch ourselves on, if we ever do develop another product. That we want to do everything right. We want to apply all the lessons that we learned while building Honeybadger. But, we also maybe need to realize that we're not going to be able to get everything right from the beginning and that it's okay to do things with ... to take on a little bit of technical debt at the beginning.

Starr:              Alright, well. It was really great talking with you guys about the whole technical debt thing. And, I'll catch you next week and we can talk about something else that sounds [crosstalk] cool and fun.

Announcer:          Founder Quest is a weekly podcast by the founders of Honeybadger. Zero instrumentation. 360 degree coverage of errors, outages and service degradations for your web-apps. If you have a web app, you need it. Available at Honeybadger.io. Want more from the founders? Go to Founderquestpodcast.com. That's one word. You can access our huge back-catalog or sign-up for our newsletter to get exclusive VIP content. Founder Quest is available on iTunes, Spotify and other purveyors of fine podcasts. We'll see you next week.
