Serverless Chats

Episode #49: Things I Wish I Knew Before Migrating to the Cloud with Jared Short

About Jared Short
Jared has been building and operating serverless technologies in production at scale since 2015, and is laser focused on helping companies deliver business value with a serverless mindset. Jared is currently Senior Cloud Engineer, Developer Accelerator, at Trek10, Inc. but was formerly Head of Developer Experience and Relations at Serverless, Inc. and an early contributor to the Serverless Framework. In his current role, Jared's day-to-day is serverless all the time, as he helps people build and operate cloud native architectures.

Watch this episode on YouTube: https://youtu.be/rA4eVtpFnVs

Transcript:
Jeremy: Hi everyone, I'm Jeremy Daly and this is Serverless Chats. Today I'm chatting with Jared Short. Hey Jared, thanks for joining me.

Jared: Hey, pleasure to be here. Thanks.

Jeremy: So you are a Senior Cloud Engineer and Developer Accelerator at Trek10, Inc. So why don't you tell the listeners a little bit about your background and what you do at Trek10, Inc?

Jared: Sure. So my background, I think starts similar to a lot of people, where I dabbled in the basement on the old Apple II. I learned how to program actually from a book from the library on that Apple II. And then throughout college... Well, high school and college, I kept keeping up with technology and building things and exploring and learning. And eventually that led me to kind of the cloud back in 2014 or so. I was big into Docker in the early days, in the cloud, and eventually found serverless while I was at Trek10.

So Trek10 is of course an AWS consulting partner. And as part of that, I get to help companies design and build serverless and cloud-native systems, with different kind of verticals all over the world. SaaS companies, enterprise companies, all of that kind of stuff. So that's where I'm at today. And I'm mostly focused on helping people learn and understand the cloud through our developer acceleration program. So taking all of those things that I've learned while helping people build things, and now helping people just learn what all they need to learn to build successfully in the cloud.

Jeremy: Awesome. Alright, well, so I've been following you for a very long time. I mean, you and I have known each other now for a while. Met up at a few conferences and so forth, and you always do great stuff. So I love the Trek10 blog, love the stuff that you've been working on. You've done a lot of stuff I know with Forrest Brazeal and some other things that have been very popular. There's a whole bunch of great stuff out there by you. So definitely search for Jared Short, serverless and go check out your stuff.

But I saw an article from you a couple of weeks ago. That was the three big things I wish I knew before I started working with AWS, or something like that. And that just struck a chord with me, because as I was reading through these things, I was like, "Oh man, this was the article I wish I had when I started working with the cloud way back in 2009." And since then, it's like exploded a thousand times over. So this is a great article and I'm going to put the link in the show notes, because I do want people to go read it. But I think it'd be awesome to just go through and talk about this article and kind of hit on some of these points.

The article is very in depth and goes deep into some of these things, but this is something that really warrants a conversation. So the first point that you made, the first learning or the thing that you wish you had known, was this idea that AWS is just this massive ecosystem and it's basically pretty much impossible to understand all of it.

Jared: Right. Yeah. It's a massive ecosystem that shows no signs of slowing down. It's pretty similar to the ever-expanding edge of the universe, it just keeps growing and consuming.

Jeremy: It's like, S3 was the big bang and then it just kept growing from that point. Right, right. So you point out a couple of things though about this that I thought was sort of really interesting. Where it's like, there are all of these different services and you had said, you could explain what most of these services do, at a high level. Like what is Amazon Sumerian or AWS Sumerian, who even knows the names of some of these things. You can explain that at a high level, but then understanding the nuances and the limits. And that's like a graduate level course in and of itself.

Jared: Yep. Yeah. Right. And in fact, the fact that I can't even tell you how they name Amazon versus AWS in front of something tells you a little bit of something. Right? I think I would guess it's Amazon Sumerian, I have no idea. And the fact that I can tell you a little bit about this, I can tell you at a high level what it does, is I think something you have to know in many situations, if you're an architect or someone building stuff on AWS. Because you need to know at least which tools you need to go read the docs on to understand if you need to use it, or if it could be useful in your particular scenario.

What I can't do, with for instance SageMaker, I can tell you it's their machine learning product and things like that. I couldn't tell you what models are preexisting in SageMaker. I can't tell you what limits might apply to SageMaker endpoints that I've deployed. Things like that. If I were to need to build a machine learning product or have some feature for that, I know I could go look at that, and then I would have to learn those specifics.

And I think that applies to the vast majority of services that exist in AWS. You can certainly know what they do. You might not know how or why you should use them. But knowing the what for the core services, it's at least I think a starting point, right?

Jeremy: So one of the things you mentioned too, is that again, reading the docs, right? This is something that you've publicized on Twitter. And I think it's a brilliant idea, and if only we all had the time to do this. Where you take a different service and you read through all the documentation once a week, which is... I probably should be doing this too. But this idea of being able to read the docs and get a really good understanding of a single service. I mean, obviously there are hundreds of services, and even beyond that, I mean, there's sort of hundreds of sub-services, right?

And like things to understand and then the interconnectivity between them. So what's the suggestion there? Like, do you try to learn it all or do you just pick a few things that's going to work best for you?

Jared: Yeah, I think I would start at least in terms of consuming documentation. I always suggest people start with the stuff that's relevant to them right now that they're looking at, right, so Lambda or S3. I think S3 is probably the most applicable service that exists in cloud today. But I would consume the documentation. Now it's important to realize that there are multiple kinds of documentation out there that exist for AWS at varying levels for each service, right?

You kind of have your narrative documentation, which is the one that I think most people read where it kind of goes through, it explains the features, the services, how to use them. You have the technical documentation, which you only read if you're really intending to implement something or low-level API docs, things like that. I would consider the Boto docs, the AWS SDK for Node.js, things like that. And then you have the blog posts, the examples, the explainers, the how tos.

You have to, I think, read and synthesize all three of those sources to be able to construct in your head a cohesive or nearly complete model of what that service can do for you. And that's what I try to do when I'm going out and reading the docs for a particular service, and that's an extremely time-consuming process, right? Even just finding all of that documentation can take a couple hours to really build all of that out.

And I guess my suggestion to folks is like, you don't have to do that for every service, do it for the ones that you use regularly, do it for the one that might make the difference in your particular product. So for instance, if you're doing machine learning, I would go do that for SageMaker, to understand if it meets my use case. And if it doesn't necessarily meet my entire use case, what problems am I going to run into, or limits am I going to run into? And understanding those makes it easier to build to your use case.

So I guess ultimately look, nobody has the time to read all those docs. I get that. But I think the investment upfront is absolutely worth it for those core services. And you'll just have an easier time later on if you're willing to do that small upfront investment.

Jeremy: Right, now and one of the things about investing time in anything outside of actually programming or doing something that is maybe making money for the company you're working for, obviously is this learning time is a huge investment, right? So digging into some of these document... You know, the docs and going through like you said, three sets of docs, probably for every service that's out there. Plus other like non-specific AWS affiliated stuff where other people are writing examples and things like that, open source libraries. What effect does this extra learning have on not only developer productivity, but maybe on like team productivity in general?

Jared: So I think it's compounding. I think the more folks that you have spread across various services, you quickly establish subject matter experts, right? SMEs. Even inside of Trek10, for instance, we have something we called the SME matrix, where we all kind of go through and individually rank our familiarity and skill level with particular services or technology. And ours is on a scale of: have no idea to have multiple production systems I've built that are using that service, right?

Like three in the middle is I've read all of the documentation, I understand it, I've experimented with it. Finding people that are fives in stuff that's outside of Trek10's core competency, right? You can expect if you were to put a five on Macie or something crazy like that, we would be going to that person right now and saying, "Okay, tell me everything you know." Right? And having those people on your team and kind of distributing knowledge of more niche services across your team is super valuable.

Being able to go to somebody and say, "Tell me about API Gateway WebSockets, because I want to build a real-time product." Having that person that's invested that time, that's actually built something, is invaluable. The 30 hours, the 40 hours that they spent reading documentation or prototyping something, I think, pays back exponential dividends over the course of history.
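As a rough illustration of the SME matrix idea, here is a minimal Python sketch. The people, services, and scores are hypothetical, and the 1-to-5 scale is an assumption inferred from the description above (three meaning "read the docs and experimented", five meaning "built multiple production systems"). It is not Trek10's actual tool, just one way such a matrix could be queried.

```python
# Hypothetical SME matrix: person -> service -> self-assessed score.
# Assumed scale: 1 = "have no idea" ... 5 = "built multiple production systems".
SME_MATRIX: dict[str, dict[str, int]] = {
    "alice": {"Lambda": 5, "DynamoDB": 3, "API Gateway WebSockets": 1},
    "bob": {"Lambda": 3, "DynamoDB": 5, "API Gateway WebSockets": 4},
}


def experts_for(
    matrix: dict[str, dict[str, int]], service: str, min_score: int = 4
) -> list[str]:
    """Return people who rate themselves at least min_score on a service.

    People who did not rate the service are treated as a 1 (no idea).
    Results are ordered by score (highest first), then by name.
    """
    ranked = sorted(
        ((scores.get(service, 1), person) for person, scores in matrix.items()),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return [person for score, person in ranked if score >= min_score]


if __name__ == "__main__":
    # Who should I ask before building a real-time product?
    print(experts_for(SME_MATRIX, "API Gateway WebSockets"))
```

Keeping the matrix as plain data makes it easy to spot gaps: a service where nobody scores three or higher is a candidate for someone's next week of documentation reading.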

Jeremy: Right.

Jared: It takes time to make back that investment, which I understand.

Jeremy: Yeah. And I think this idea of SMEs is a really, really interesting concept when you look at an ecosystem like AWS. So you have mentioned in the past this idea of the T-Shaped Engineer, right?

Jared: Yep.

Jeremy: And I'd love for you to explain that because I think that, that might take the pressure off of some people thinking they need to learn everything as deeply as possible.

Jared: Yeah. So the concept of a T-Shaped Engineer is really go wide on many topics, right? If you're coming into the AWS ecosystem, you're not going to know a ton right off the bat, but it's valuable to go wide in a lot of topics and gain a wide breadth of topical knowledge. Understanding what Lambda is and does, and maybe how to deploy Lambda functions, SAM, Serverless Framework, things like that. Understanding this is how they work. This is what they do. Understanding EC2, understanding S3, Dynamo, RDS, all this stuff.

Understand at a fairly minimal level just what they do and how to use them; that's the top of your T. And then you have the vertical, where that's your specialty. That's the thing you're really good at, the thing where you know the limits: DynamoDB On-Demand capacity will work for zero to X amount of spike, right? Zero to 12,000 will just work. If you go above that in the span of 60 seconds, you might have some throttling or something like that. Right?

Most people don't know that, but having those people that do know that for their respective services, and being that person as an engineer, makes you invaluable in that particular vertical. And then having that topical knowledge really just helps you understand what questions you need to be asking of people as you start investigating other areas.

Jeremy: Right. And that idea of going really deep on specific things. I think this is where, sort of the next point you brought up in the blog post was, is that understanding a service's use case versus sort of just the service itself or the baseline, like you said, those right questions to ask versus exactly how they work. That's another thing that you sort of pointed out is that for every service out there, there are probably a ton of different use cases. And if you just learn one service, then everything starts looking like a nail.

Jared: Right. Yeah. I think part of that is the consequence of AWS being very good at building service primitives. Right? You have SNS, SQS, Kinesis, DynamoDB streams, EventBridge. I can make them all do very similar things.

Jeremy: Right.

Jared: The question then becomes what are the constraints of my business systems or my business logic, and that's how I start picking the service that actually fits the best. Right? Does my event need to go to multiple people? Well, okay. Maybe that's SNS. SQS probably won't work anymore, because that's pretty much a one-to-one. Does my event or a thing that I'm publishing need ordering guarantees? Well, if I need ordering guarantees, I'm pretty much stuck on Kinesis or something like that.

But those are the questions that going and asking those subject matter experts or things like that... Like, I wouldn't expect somebody to know that if I had strict ordering guarantees, that Kinesis is pretty much my only option. Unless I need SQS FIFO, which is now an option, right? That wasn't an option until very recently because it didn't have a native integration with Lambda.
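The constraint questions Jared walks through can be written down as a small lookup. The Python sketch below encodes only the rules of thumb from this conversation (fan-out points at SNS, strict ordering points at Kinesis or SQS FIFO, plain SQS is roughly one-to-one). The `EventRequirements` and `candidate_services` names are hypothetical, and this is a simplification for illustration, not a complete decision procedure for AWS messaging.

```python
from dataclasses import dataclass


@dataclass
class EventRequirements:
    """Hypothetical summary of what a workload needs from its messaging layer."""

    fan_out: bool          # must one event reach multiple consumers?
    strict_ordering: bool  # must consumers see events in publish order?


def candidate_services(req: EventRequirements) -> list[str]:
    """Return the services worth reading the docs for, per the rules of thumb above."""
    if req.strict_ordering:
        # Ordering narrows the field to Kinesis or SQS FIFO. Since SQS is
        # described above as roughly one-to-one, drop it when fan-out is needed.
        return ["Kinesis"] if req.fan_out else ["Kinesis", "SQS FIFO"]
    if req.fan_out:
        # One event to many consumers points at SNS rather than plain SQS.
        return ["SNS"]
    # No special constraint: a plain queue is a reasonable starting point.
    return ["SQS"]


if __name__ == "__main__":
    print(candidate_services(EventRequirements(fan_out=True, strict_ordering=False)))
```

The point of a table like this is less the answer than the questions: it records which business constraints should be settled before picking a primitive.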

Jeremy: Right.

Jared: That of course evolved, but...

Jeremy: But speaking of evolving, right? Things that change over time, I mean, that's one of the things that's nice about using any of these services that even if there are limits in place, even if there are other types of limitations that prevent you from using it for certain use cases, eventually over time, it seems like these products just keep getting better and new features are added and next thing, you know, you can use it for some other use case.

Jared: Yeah. And I mean, that's I think part of the beauty of building on the cloud. There's very few other instances or circumstances I can think of, where building some piece of technology and letting it sit for a year or two, means it gets better and cheaper without touching it. And we've legitimately had, at Trek10, systems that we've built, where we kind of just let it sit. We had built it, we might do some occasional dependency level package management, you know, addressing CVEs or things like that.

But largely, we just left the architecture untouched and we would go and look at metrics. And over time the metrics just got better. It got more performant, it got cheaper over time and that's kind of weird. But that's one of the benefits of building, I would say, not just serverless, but cloud-native in general. You know, Macie just got cheaper by some crazy number, things like that, right? If you're already using these technologies, they tend to get cheaper in most circumstances.

Jeremy: So the other thing, I think that's interesting, especially we put this in the context of things I wish I knew. These service limits exist and there's been a very, I don't want to say a... I guess, a dogma in a sense around this idea, "Oh, well, serverless is infinitely scalable. And if I use this service and I use that service, then I can just scale up whatever." But there are soft concurrency limits for Lambda, there's Kinesis shard limits, there's throughput limits for, you know DynamoDB and these other things.

So there are a lot of limits in place. And I guess my frustration around this is you really do need to go deep on some of these topics in order to understand what those limitations are. And then it really hurts when you hit those limitations in production. And you're like, "I didn't even know this existed." I mean, that might be on you, but is that something that cloud maybe needs to do a better job of? I mean, not just Amazon or AWS in particular, but other clouds, do they need to do a better job at managing these service limits for us?

Jared: I think it's fair to say that those limits are not put there maliciously. They're put there to stop people from doing silly things and protect, I think, the broader ecosystem in the cloud. Right? There can still be a noisy neighbor. Like if somebody would accidentally infinitely call a Lambda function that recursively called itself, I feel like letting someone eat up the entire capacity of Lambda in a particular region, because we didn't have those established limits, would be a scary thought that somebody could do that either intentionally or on accident. But you have to pick some arbitrary number.

Jeremy: Right.

Jared: And I think the cloud... Many of the providers could do a better job at providing guide rails while still managing where those guide rails are. And I think in a lot of cases, they do do a pretty good job at that with certain services. And I'm of course speaking from the Amazon perspective, the AWS perspective, I can't speak too credibly to most of the other clouds. Once again, I kind of know what the other clouds do, but I know Amazon really well. But I think it's absolutely fair to say that the cloud providers could be doing more there. And I think that's just, we're all learning this stuff as we go.

Jeremy: Right. We're like learning those limits as we go, sort of.

Jared: I would argue that the cloud providers are learning where that number is, right? The default limits for Lambda have been lifted, I think at least once, maybe twice. And default limits change as we're discovering, "Okay, most valid use cases probably can get up to this amount." And they do good jobs at, at least, providing the support that you need if your cases are scaling beyond those defaults. They don't do a terribly proactive job in my experience of some of that, but they do a very good job in most cases.

Jeremy: Right. Well, one of the things I like what you said there is that they didn't put those in place maliciously. And one of those reasons that they're there I think is to protect against costs, right?

Jared: Yeah.

Jeremy: And like out of control costs. This denial of wallet thing that has sort of become a standard saying, I guess, in the serverless community, I think is very, very real. Right? And that's actually the third thing you said you wish you knew, is that cost is just really, really hard to understand. And not only just understand what certain things cost, but that dynamic flexibility, like what effect that has on the procurement departments and the business decision makers that have to understand these costs.

Jared: Yeah. Yeah. I think there's a couple points to that and it's really hard to do. Like people like to do apples to apples comparisons. Well, I can buy this particular server or some similar server that could run Lambda functions like this, or whatever, put it in my data center. And my capital expenditure is going to be like X, right? Versus Y for the cloud, and X is less than Y, therefore cloud bad. Right? I have yet to see an actual good calculation, because it's so darn hard to quantify, not just the server expense.

Sure. That's easy. Like apples to apples, you can get close, sure. But the power, the electricity, the real estate, you can kind of put numbers on that. Sure. You can get close, but then you say, "Okay, so what's the opportunity cost of someone having worked on building out that server or putting stuff on that server, patching that server, versus what features could they have built if they weren't working on that?" Right? What's the opportunity business cost there? I have no idea. Right?

So I wish we could put numbers on that. And then I would imagine, I don't have any really anecdotal evidence, let alone empirical evidence, that shows that serverless is of course always cheaper.

Jeremy: Right. And I think you've got this thing too, where what I really like about the fine grain billing aspect of serverless, or just as the cloud in general. I mean if you're just using EC2 instances. You know, with pretty good certainty, if you measure the number of users that 20 of these servers can support, or how many invocations we have on a given month or given week, and the number of users that supports. Being able to take those numbers and break them down, and then being able to see, "Okay, every time a user is in our system and they do X, it costs this much money."

And knowing that even though that may be variable, I think is really handy to know. Because that's some of the stuff you would never know if you were either on-prem or even in some cases just using virtual machines.
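The per-action arithmetic Jeremy describes is simple division, sketched below in Python. The function names and the dollar and usage figures are made up for illustration; in practice the inputs would come from your own bill and your own usage metrics.

```python
def cost_per_action(monthly_cost_usd: float, monthly_actions: int) -> float:
    """Average cost of one user action, given a month's bill and action count."""
    if monthly_actions <= 0:
        raise ValueError("monthly_actions must be positive")
    return monthly_cost_usd / monthly_actions


def projected_monthly_cost(unit_cost_usd: float, expected_actions: int) -> float:
    """Order-of-magnitude projection: assumes cost scales linearly with usage."""
    return unit_cost_usd * expected_actions


if __name__ == "__main__":
    # Hypothetical figures: a $1,000 bill covering 50,000 user actions.
    unit = cost_per_action(monthly_cost_usd=1_000.0, monthly_actions=50_000)
    print(f"cost per action: ${unit:.4f}")
    print(f"projected at 500,000 actions: ${projected_monthly_cost(unit, 500_000):,.2f}")
```

The linear projection is the weakest assumption here, since free tiers, volume discounts, and fixed costs all bend the curve, so it is best read as an order-of-magnitude estimate rather than a forecast.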

Jared: Yeah. No, I think it's definitely a point I've heard actually over the past couple of weeks, in really interesting conversations with people that are trying to get essentially to this idea of zero unallocated costs within their bill. They know for every penny that passes through their billing framework, try to understand which tenant or which feature that that is allocated to, which is very cool. And it's I think, you know... That's a hard problem to solve.

Just, "Okay. AWS billing or any cloud billing," that's a hard problem to solve. Yeah, I will say one of the most interesting things I have seen, I think my record right now for crazy Lambda spend, where I wish we would have had some of those soft concurrency limits (we had raised them for other testing in an account), I think the highest I've seen was like $12,000, like an hour. Because of like an infinite recursion thing. And we're like, "Whoops." Right?

Now to the cloud providers' credit, most of them are pretty good about being like-

Jeremy: Right, yeah, we know you made a mistake, yeah.

Jared: No sane person would do that. We're sorry. Here's some money back. Right?

Jeremy: It would be nice for some of those other things, those out of control cost things, where even if the limits are raised, that there'd be something that would detect that and would be like, "Well, hang on. This probably isn't right." So then the other thing about this though, is we're talking about costs, right? Which is funny because if you think about most developers, they're writing code and they're uploading into a server or checking it into their code repository, and then somebody else takes care of it.

How much it costs to run that code is generally not a huge... It's not the main focus for a developer, but you shift to serverless. And then all of a sudden, it's like the choice between using Kinesis and using SQS, or SNS, where there's a big cost difference depending on what that scale is. Right? If you said, "Hey." Your boss comes to you and they say, "We need to make sure we do detection in our S3 data, to get rid of all of our credit cards or any PII in there."

And you're like, "Oh great. I'll just flip on Macie." $200,000 later, right? You're like, "Well, wait a minute, maybe we could have done this a better way." So how much should developers be thinking about costs now that they have so much control over the services that they use?

Jared: I think developers should be cost aware. It's interesting, there's kind of been this FinDevOps or whatever you want to call it. And I'm not sure that's necessarily the answer. I think developers should absolutely be cost aware, and at the very least be able to make cost effective decisions. Now that gets hard at scale, right? At a very low scale, Kinesis is exponentially more expensive, right? It's like a per shard thing where if I'm sending one SQS message per month, or using one Kinesis shard per month, that's not comparable.

Now, if I'm sending thousands of messages per second, Kinesis is probably a much more interesting option, because there's additional features and functionality that also come out of that. So I don't want to say that a developer's core job should be cost optimization or anything like that. I think in any sufficiently large organization, you approach the point where having those subject matter experts or architecture experts, or even dare I say, cost optimization experts or cloud economists.

Jeremy: Yeah, there you go.

Jared: Few of those, but I'd argue some of those. I think they are positioned to be able to help developers understand the cost impact of decisions they're making in their architecture. I wouldn't pin that all on the developers, I think that's unfair, but I think sufficiently large organizations should have those people in place.
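The scale argument about SQS versus Kinesis comes down to a per-message price versus a fixed per-shard price. The Python sketch below shows that shape only. The two prices are placeholders chosen for round numbers, not real AWS pricing, and the model deliberately ignores per-record charges, shard throughput limits, and retention costs.

```python
# Placeholder prices for illustration only. These are NOT real AWS prices.
QUEUE_PRICE_PER_MILLION_USD = 0.50   # hypothetical per-message pricing
STREAM_PRICE_PER_SHARD_MONTH = 10.0  # hypothetical fixed cost per shard


def monthly_queue_cost(messages: int, price_per_million: float) -> float:
    """Pay-per-message model: cost grows linearly from zero."""
    return messages / 1_000_000 * price_per_million


def monthly_stream_cost(shards: int, price_per_shard_month: float) -> float:
    """Provisioned model: a fixed cost per shard regardless of traffic."""
    return shards * price_per_shard_month


def break_even_messages(price_per_shard_month: float, price_per_million: float) -> float:
    """Message volume at which one shard costs the same as the queue."""
    return price_per_shard_month / price_per_million * 1_000_000


if __name__ == "__main__":
    for messages in (1, 1_000_000, 100_000_000):
        queue = monthly_queue_cost(messages, QUEUE_PRICE_PER_MILLION_USD)
        stream = monthly_stream_cost(1, STREAM_PRICE_PER_SHARD_MONTH)
        print(f"{messages:>11,} msgs/month: queue ${queue:,.2f} vs one shard ${stream:,.2f}")
```

At one message a month the fixed shard cost dominates, and well past the break-even volume the comparison flips, which is why the same architecture question can have different answers at different scales.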

Jeremy: So what about organizations though that maybe aren't that large? I mean, even small... I mean, I've worked in very small startups before where it's like... I mean, after I finished building the CI/CD pipeline and writing some new facial recognition thing, I go, and I empty the trash. I've been at that level of diversity of job tasks. But for maybe midsize companies that do have a little bit of a separation, especially a separation between the developers and the technical people, and then some of those business decision makers, is there a good way or an effective way that you know of that developers can communicate costs sort of up that chain to those business leaders?

Jared: I think you can at least take some of those usage costs and say at this order of magnitude, this is what this will cost, right? And you can kind of put some of those rough numbers together. Especially in this granular world of serverless and many of these managed services, you can at least get an order of magnitude and you can kind of get close with the numbers. You can say, if one user is using this, this is what it costs us, if 10,000 users or whatever that is.

But I would say what... And this is kind of a little bit like the scream test, but it's also my general suggestion for choosing between DynamoDB On-Demand capacity and reserved capacity: just keep using a thing until it hurts. And this is not great advice in the enterprise ecosystem necessarily, because no one's going to come to you and be like, "Why are we spending $10 million on S3?" And you're like, "Well, because we just put everything in normal storage and we're not glaciering or life cycling anything."

Jeremy: Right.

Jared: But you have a pretty good idea of what hurts your wallet. And if you go look and say, "Why does my DynamoDB cost 10 times what everything else does?" Well, we could probably go make a more cost-effective decision at this point. And the reason that I like the "just pay for this thing until it hurts" approach is that it's a moving target, and it also depends on your company, right? For a startup, spending $1,000 on DynamoDB might hurt, and we can go figure out how we can solve that problem. For an enterprise, spending $100,000 on DynamoDB might be perfectly fine. Right?

And the nice part about most cloud-native architectures is it does give developers the knobs and turns and buttons to, for the most part, pivot their architectures or adjust their architectures to cost optimize when they're ready to. You don't have to prematurely optimize in most cases.
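One way to picture "use it until it hurts" is as a threshold that each company sets for itself. In the Python sketch below, the `services_that_hurt` function, the bill, and the thresholds are all hypothetical; the only figures taken from the conversation are the idea that $1,000 might hurt a startup while far larger spend might be fine for an enterprise.

```python
def services_that_hurt(
    monthly_costs: dict[str, float], pain_threshold_usd: float
) -> list[str]:
    """Return services whose monthly cost meets or exceeds the team's pain threshold.

    The threshold is a business judgment, not a technical constant, so it is
    passed in rather than hard-coded. Results are ordered most expensive first.
    """
    hurting = [
        name for name, cost in monthly_costs.items() if cost >= pain_threshold_usd
    ]
    return sorted(hurting, key=lambda name: monthly_costs[name], reverse=True)


if __name__ == "__main__":
    # Hypothetical monthly bill, in USD.
    bill = {"DynamoDB": 1_000.0, "Lambda": 100.0, "S3": 50.0}
    print("startup view:", services_that_hurt(bill, pain_threshold_usd=1_000.0))
    print("enterprise view:", services_that_hurt(bill, pain_threshold_usd=100_000.0))
```

The same bill produces a to-do list for one organization and nothing for another, which is the sense in which the target moves with the company.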

Jeremy: Right. Yeah. Good point. Alright. So there were a few other things you mentioned in this article too. So those were sort of the three big ones. But you had a couple of other ones you mentioned in here. Just things like good AWS account hygiene, what's that about?

Jared: Yeah. So one of the most frequent things that I'd say we as a consulting partner end up doing when we come into new engagements is, let's do a quick audit of what our AWS accounts landscape looks like. Are we all in one AWS account or do we have multiple accounts? Does each environment have its own account? Does each product have its own suite of accounts? Things like that. And I would say there's different levels of maturity that organizations can pass through. There's different tools out there.

Organization formation is a pretty cool new one. There's of course Control Tower from AWS themselves who have recognized that this is a struggle. But I would say investing in your core AWS account landscape is probably one of the most core things that you can do to set yourself up for success later. And that's where I really... It's just important. It's so hard to reverse bad decisions.

Jeremy: Right, yeah. Especially once you have things in production, start moving things around, and re-pointing things.

Jared: I'd say it's your best utility for that security blast radius that you have.

Jeremy: Right, right. Good point. Alright, so you have another one in here: follow new products and services. I think that's pretty straight forward. Proof of concept early and often, that's another I think interesting thing. Just this idea of staying up to date with new products and trying something new and seeing if that can help. Also this idea of preparing for failure and regional outages, that's an entire podcast in and of itself. But another one that I thought was really good was this idea of having a plan to turn people into cloud-natives.

Jared: Yeah. I think I'm biased towards that because that's very much my day job. Is helping people accomplish that. But it's very difficult to succeed when you are adopting AWS. If you have as Forrest likes to put it, one of those cutely named like centralized cloud teams, like the Cumulus Nimbus team or something like that. If you're trying to disseminate knowledge... Or actually I should not say to disseminate knowledge, I should say disseminate architectural decisions and authority from a central point. It's very, very hard because you haven't won the respect of the people that are not on that team, to actually adopt and follow those practices.

Jeremy: Right.

Jared: They're going to say I'm familiar with EC2, and maybe I am familiar with Ansible or other of those solutions like that. From my data center, they work just fine in EC2, you can take your SAM template and just go shove it in an S3 bucket, I don't care. I'm going to use the things that I know work. And I think you really have to invest and train in folks that are helping you on your cloud journey. So you can't just do that with a centralized team, without having that hard-won knowledge, like pushed across your entire team, it's just very difficult to succeed.

Jeremy: Right, yeah. I think you need to get buy-in from people and it's easy for people to get comfortable. And you're right, if my Chef scripts and OpsWorks and all these other things are working just fine, why am I going to change over to something else? So I totally agree with that one. But that actually I think is another mindset that is very damaging for people that are trying to do these cloud journeys. And those are the people who think this runs this way in my data center, and basically Amazon or Google or Microsoft or whatever, one of these cloud, these public cloud providers, they're just a big data center, and so I'm just going to take my stuff, I'm going to lift it, I'm going to shift it into the cloud and everything's going to be perfect. You think that is a terrible idea?

Jared: Yeah. So I would say in my short tenure in doing some of these lift and shift things, about half a decade or so at this point in doing some of those practices or kind of being called in after that was attempted. It's never a short-term strategy. It always ends up being a long painful slog of, "Okay, can we make this very specially designed server work in EC2?" "Oh no, it turns out they don't have the correct processors to run this optimized code that we have." That's a very edge case example, but my goodness it never works.

You can do it short-term and spend a ridiculous amount of money doing so, and still not have what I would argue is as good of a solution as you would if you took a little bit more time, a little bit more money up front, and had a better solution long-term. You can have it fast, you can have it cheap, and you can have it good: pick two.

Jeremy: Right. Right. Yeah. I mean, and that's one of those things too, with lift and shift where, I mean, I don't think you have to get everybody to embrace microservices. Right? You can build a lot of distributed monoliths if you need to do that. I mean, just switching over to something that already exists like RDS versus trying to run your own database cluster or any of those things, just starting to use more cloud-native services, I think is a huge step in the right direction, even if you're still running your application on EC2 instances.

But the one last thing I wanted to mention on that article, and you brought it up a little bit earlier, and I thought this was really good advice is especially for someone like me, I do a podcast, I write a newsletter. I try to keep my finger on the pulse of serverless. That includes Cloudflare, and AWS, and Microsoft Azure, and GCP, and Fastly, and Kubernetes and Kubeless, or Kubeless or however you pronounce it. There's just so many... Who knows? There's just so many, OpenFaaS, right?

There's just so many cloud providers that are doing this now. The Adobe I/O Runtime, I mean, there's just so many of these that are doing this now. And trying to keep up with these just from an information dissemination standpoint is really tough. I know nothing, nothing about some of the services in these other clouds, other than just a tiny bit of it. So if you asked me, "Hey, you're reporting on all this stuff on GCP and on Azure, can you show me how to set this up?" "No, I can't. I don't even know the first thing about it."

And I think that is good advice for people that are trying to build something. You can't learn AWS completely, so don't try to learn four different public cloud providers either.

Jared: No, I mean in the very early days of Trek10, both personally, and then also as a company, we were asked to do work on Azure, I believe. And we attempted it and it worked, right? Like what we built worked, and then we pretty quickly decided AWS is already huge. The market share is plenty, and I think that's true for most of the other providers as well. You could probably make a living consulting on GCP or consulting on Azure but doing a good job in delivering services or even your own product, on all three of the cloud providers, the main ones I should say, there's tons more, Azure, CloudFlare, all of those.

It's extremely difficult. And I think you just have to optimize for the minimal time and brain capacity you have, pick one and commit. And if you're wrong, okay, just pick another one and commit. I'm sorry, you wasted the time, but... Even today, if you were to tell me Jared pick one, it can't be AWS. That's fine. I would just go pick one and that's where I'd go. I don't think I would try to distribute around too many of them right now. You can't do what I would consider a good job by spreading yourself so thin.

Jeremy: Yeah. And I think you've got all of these major cloud providers and it doesn't matter if it's Tencent or Alibaba or any of the big U.S. ones. There's just a growing ecosystem around every single one of these. So yeah, so AWS great. If you like it. I mean, it's got a lot of stuff. I love AWS because there's so much stuff there, but Azure is pretty cool. Right? And they've been doing a lot of stuff around that. And GCP has Cloud Run, which is a very cool way to do serverless containers.

So I totally agree with that. But I think if somebody is looking at this overwhelming number of clouds, that's just really good advice. Do not try to learn them all, pick one, go deep on a few services and learn that.

Jared: You'll never build anything, you'll spend all of your time learning. Right. So...

Jeremy: Which is important, but at some point you're going to have to write some code. Alright. So I want to move on to another article that you wrote that was about the guiding principles for building a SaaS. And I thought this just tied in nicely to this other article that you wrote recently. And one thing reading this article too, is I don't think this is just about building SaaS. This is about building any application in the cloud. I know Trek10 just recently got your SaaS competency with AWS, which is awesome by the way.

And by the way, I love these new Lambda ready programs and some of these things where it's just basically like certifying providers and consultants and partners that know what they're doing and kind of have that sign off from AWS. I think that's super important that those exist. So back to this article though this was... You know I guess there was about three principles. What are the three guiding principles? So I want to go through these, because I think this is super important for any company building an application.

Obviously I think it's very much a bias here towards building an application cloud-native and more so serverless cloud-native which is... I mean, again, this is the advice that I would give as well. So let's talk about these because this first one was really interesting, build as if you may sell at any time. What did you mean by that?

Jared: So I've actually experienced this. A couple folks have come to us and said, "Hey, we're spinning off this product, or we're selling the company, we're selling this branch. It is so tightly bound to all of our other infrastructure and practice, that spinning this off is like its own entire effort, that is nontrivial when it comes to actually needing to sell that product or branch or whatever." So I think the guiding principle there ultimately is consider your dependencies back to the company and things like that.

So if I'm building a new product line or a new branch of the company, I'm giving them their own AWS accounts, their own segment of an organization where, I can hand off the keys fairly easily, right? Even in terms of financials, I would run if I could through their own bank account or something like that. So auditing the books, right? Like it's good financial hygiene to be able to audit those books kind of independently for that product.

And then come back to your mainline business as well, and you have a better understanding of what my cost allocations are. Being able to do the hand off the keys, say, "Here you go. Here's the keys to your brand new car or product or whatever it is." That's ultimately where I'm trying to get is auditability and understanding, and segmentation away from anything else in your company. Try not to entangle too many things.

Jeremy: Right.

Jared: And that's really where the guidance there comes from. And the reason that I think that's important, even if you never plan on selling that thing, is there are security benefits. And you get your own internal audit benefits, and you get so many compounding benefits that I think it's worth that what I would consider small upfront investment if done correctly at the beginning.

Jeremy: Right, yeah. And then the other thing that I think this ties into, and you mentioned it in the article, is also the idea of sort of setting up the developer. I don't know if we'd call it the developer experience or just sort of the developer I guess interface into this piece of the cloud. And you quoted Ben Kehoe in the article, when he said "Move your development environment towards the cloud, do not try to move the cloud down to your dev environment."

And this is me interpreting this and you can correct me if I'm wrong, but I see this as basically saying, "Don't build some sort of overly complex local development system that is going to be really hard to migrate if somebody else takes over. Utilize as many tools as you can to again, move that development experience more towards the cloud."

Jared: Yeah. Right. I mean, if you can sufficiently mock AWS on one machine, you should be starting your own company. But I would say you can mock well enough locally to do some very fast unit tests or things like that. But you cannot really sufficiently mock or simulate the cloud locally in such a way that I would consider it enough, even if you would run end-to-end tests or something like that. Locally, I would never consider that sufficient as compared to doing it in the cloud.

There're so many things that are interesting about your application running in the cloud, whether it's network or even IAM permissions or things like that. There's so much complexity up there that I would much prefer my developers are working kind of with those resources natively for their end-to-end or integration tests and things like that, all of that, that should just all be happening up there. Now of course, it can be painful if you're using SAM or things like that. And you're like: type a line of code, try to push it, type line of code, build, push, I get that, that's painful.

And that's where I think fast, local unit tests are acceptable. That's fine. I'm never going to ask to take that away from a developer. But giving developers their own AWS accounts, or their own kind of small team environment, ephemeral AWS accounts, there's some cool tooling out there, that's kind of enabling that, that stuff's really cool. And that's where I would invest company and some engineering time into providing that better engineering experience for the rest of my teams.
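As a rough illustration of the "fast, local unit test" side of that split, here is a minimal Python sketch. The `save_report` function, the bucket name, and the key layout are hypothetical names made up for this example; the `put_object(Bucket=..., Key=..., Body=...)` call shape follows the boto3 S3 client, but the test never touches AWS. It only checks the code's own logic, which is all a local mock can really vouch for; permissions, networking, and the service itself still need tests that run in the cloud.

```python
from unittest.mock import Mock


def save_report(storage, bucket, report_id, text):
    """Store a report and return the object key that was written.

    `storage` is any client exposing a boto3-style put_object method.
    """
    key = f"reports/{report_id}.txt"  # hypothetical key layout
    storage.put_object(Bucket=bucket, Key=key, Body=text.encode("utf-8"))
    return key


def test_save_report_builds_expected_key():
    storage = Mock()  # stands in for the real client; no network involved
    key = save_report(storage, "example-bucket", "2020-05", "hello")
    assert key == "reports/2020-05.txt"
    storage.put_object.assert_called_once_with(
        Bucket="example-bucket", Key="reports/2020-05.txt", Body=b"hello"
    )


test_save_report_builds_expected_key()
```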

Jeremy: Right. So then the other thing that I guess, ties to this idea of being able to sell it at any time is, what you title it as "build as if you may open source at any time." And I can tell you, I write a lot of open source projects and it's a bit scary at first when you write an open source project. And you're writing documentation and you're letting people look at your code, because I've worked for a lot of organizations where you would not want to pull back that curtain and see what was behind there.

Lots of duct tape, lots of Popsicle sticks, hamsters in wheels, keeping things running. And I think that is true of a lot of companies. And I think the way you get there is because technical debt builds up over time, right? You're moving fast. Like, "Oh, we have a proof of concept." Next thing you know it's productized, and we were missing some things there. But this is I think a really, really good point is, you should build your company in a way that says, "Look, we could be transparent tomorrow if we needed to be."

Jared: Sure. Yeah. And I mean, a lot of companies are building towards, I'd say short-term right now value versus long-term stability. And I get that. It makes a lot of sense, especially financially in certain cases. If I need something right now versus, what's this going to look like in six months, when we circle back to it.

But ultimately building as if you're going to open source at any time, I think forces you to at least think if somebody was looking over my shoulder right now, right? If I was building something and say, "If Jeremy's looking over my shoulder right now, am I really going to put like my GitHub magic key, or whatever into this line of code and just hard code and deploy. And be like, 'I'll fix that later?' I feel like Jeremy's going to be back there and be like, 'Really man? I respected you and now no, like there's nothing.'"

So I think it helps you justify the few extra minutes or in some cases, hours or days, to make the right technical decision. And I get tech debt's a thing. Look, go look at open source code. There's tons of stuff out there that's like, "TODO: actually make this work appropriately or optimize this thing." That's fine. Nobody's going to judge you. We all get it. People write software and they understand software is hard. But they are going to judge you pretty hard if you make poor security decisions or poor architectural decisions where it just doesn't make sense. And I think having that fictitious open source gazer over your shoulder, it just helps you make those decisions and kind of think to yourself.
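One way to picture the alternative to hard-coding a key is to read it from the environment and fail loudly when it is missing. The sketch below is only an assumption about how that might look in Python: the `GITHUB_TOKEN` variable name, the `load_token` and `redact` helpers, and the error type are all hypothetical, and a real project might use a secrets manager instead.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not configured."""


def load_token(env=None, name="GITHUB_TOKEN"):
    """Read a secret from the environment instead of from source code.

    `name` is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    token = env.get(name, "").strip()
    if not token:
        # Fail fast rather than silently falling back to a baked-in value.
        raise MissingSecretError(f"{name} is not set")
    return token


def redact(token):
    """Return a log-safe form that shows at most the last four characters."""
    if len(token) <= 4:
        return "*" * len(token)
    return "*" * (len(token) - 4) + token[-4:]
```

Passing `env` explicitly keeps the function easy to unit test, and `redact` is there so the value never needs to appear whole in logs.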

Jeremy: Right. And it's beyond just code though. I mean, I like to litter my open source stuff with TODOs, because I know if I don't put it in there, I won't go back to it. And I also feel like you put some TODOs in an open source project, somebody that has... Excuse me, that has a little bit extra time might come through and be like, "Oh, hey. I can do a PR for it. Great." But internally in your own company, I mean I still think that's a good practice.

I mean, if you say, "Look, this thing doesn't check the string the right way, or it needs more parsing or more validation." Great, then just put a TODO in there and say that. But I think another thing with almost every company I work with, and every company I've sort of dealt with that isn't an open source company, that's publishing closed source software: they are terrible when it comes to documentation.

Jared: Yep. And I would argue, I mean, even most companies that aren't just all internal or anything, many open source projects have terrible documentation.

Jeremy: It's very true.

Jared: You've just never heard of them, because they have terrible documentation, because nobody knows what they do. Can you tell the theme here? Documentation. It's very important. So I think that when you're building... most successful open source projects, if you go look, they're very good at the technical documentation. You can go and understand how the product works from the technical documentation. And then also they usually have decent narrative documentation.

You can understand what the product does, how you can leverage the product, how it solves your use cases, things like that. And I think having some of those internally as well, can help with onboarding a new employee faster. They can help with, when a client comes along and says, "Hey, can you explain more to me about feature X? How does this feature work?" Right. If you can hand over some decent documentation around that particular feature, even if it's not the technical documentation, but it's the narrative docs and say, "Here's how this thing works. Here's how it's designed internally a little bit."

Give them a little peek behind the sheet. That's fine. And customers will value that, your employees will value that, people that are trying to build that contextual awareness of your product and how it works, they're going to value that documentation. And I think with building as if you may open source at any time, there's the embarrassment aspect of the code decisions that might not be great. But you wouldn't want to open source with no explanation. And I think that documentation is part of your explanation.

Jeremy: Right, yeah. And one of the things I like to... And I don't think I'm the only one who does this, but if I make a decision, if I say, "Okay, I'm going to use SQS versus Kinesis or something like that." You make that decision and then six months later you go back and you're like, "Why did I choose SQS again?" Or, "what was the one..." I think you do that a lot. And so oftentimes I try to put in just justifications of why I made a certain decision. I do this a lot too.

I mean, I know recursive functions generally are pretty bad if they go very deep and they can cause all these kinds of overflows and whatever. Stack overflow, where the site got its name from. But the thing that I will do sometimes, if it doesn't go that deep and I know there's a limitation to it, I'll write a recursive function because it's faster, it's more compact. It's easier to probably reason about, especially if you had to write out some long iterative or imperative version of it.

So justifying that though, and putting a note saying, "I did this, this way because," and "I didn't do it this way because," I think those notes can be really helpful as well.
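A small Python sketch of what such a note might look like next to a deliberately recursive function. The `flatten` helper, the `MAX_DEPTH` value of 20, and the wording of the note are all assumptions made up for this example, not anything from the article being discussed.

```python
MAX_DEPTH = 20  # hypothetical limit; chosen well below Python's recursion limit


def flatten(node, prefix="", depth=0):
    """Flatten nested dicts into a single dict with dotted keys.

    Decision note: this is recursive on purpose. The input is expected to be
    shallow configuration data, so the compact recursive form was preferred
    over an explicit stack. The depth guard below makes that assumption
    visible and turns a surprise deep input into a clear error.
    """
    if depth > MAX_DEPTH:
        raise ValueError(f"nesting deeper than {MAX_DEPTH} levels is not supported")
    if not isinstance(node, dict):
        return {prefix: node}
    flat = {}
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, path, depth + 1))
    return flat


print(flatten({"a": {"b": 1, "c": {"d": 2}}, "e": 3}))
# {'a.b': 1, 'a.c.d': 2, 'e': 3}
```

The comment records both the choice and the condition under which it stops being a good one, which is the part that tends to be forgotten six months later.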

Jared: Yeah. And I mean, I think even kind of going back to this case of the cloud improves around you, or things improve around you. Even going back in and saying, "This tactical decision was made before this other thing existed," is completely valid to do, right? Someone might be like, "Why in the world would you have used Kinesis here, when you could totally have used SQS FIFO?" And you'd be like, "I made this decision two years ago. I can't be held liable for improvements made to the cloud while I wasn't doing this."

Jeremy: Yeah, dates in your code. I guess the other thing, I always date comments in my code, because it's helpful to have those when you do that. This other concept or this idea where people think... Or I've heard a lot of companies where they're like, "Well, the code is the documentation," right? And we're not talking about well-commented code that can use a doc generator, which we can talk about in a second. But the thing that I tend to see, especially when people write code is they like to get cute. They find their own shortcuts, right?

They find their own different ways to do it. I was into functional programming for JavaScript for quite some time. And then I realized, "Okay, the speed benefit of the interpreter probably doesn't matter. It just looks more compact and it's much more confusing when I go back and look at it later." So even looking at my old stuff from like two years ago, I have no idea what that even does. So speaking of that, do you think that the code itself is good enough documentation if it's well-written, or what are your thoughts on that, and what are your thoughts on adding a doc generator?

Jared: Yeah. So I think that the best written most elegant code in the world cannot compare even remotely to well-done natural language documentation when it comes to communicating the intention and context of what was built, right? How this thing works? Why we built this thing? Sure, the code can explain what it does and at a technical level, how it does it. But it doesn't explain the business reasons for some of that code, and the business impact, or even necessarily the other systems that might depend or how they depend on that.

And of course you can get some of that in, I think doc generators, can get part of the way. You can explain definitely the technical library API or the technical documentation can be codegen, or docgen for a lot of those. Sure. I think you know Py docstrings and JavaScript docstrings, all that stuff fantastic, we need it, it's important. It does not replace the narrative documentation that helps people actually read for context and understanding. And I think even what a lot of people underestimate is the value to the doc writer, of having to sit down and write out those docs.

Jeremy: It's not easy.

Jared: It's not easy. I'd say it's not easy, but also you as an individual sitting down and writing that documentation might discover things about what you've just built, or you're going to build, that you're like, "Oh, wait, I missed something," right? Like, it could even be like trivial things. "Oh, you know what? We have this input for color, but it's actually enumerated and you have to give us certain kinds of colors. But that's not explained anywhere. Are we just going to let people pass in a hex code for colors or are we going to expect strings for colors?"

Things like that, where it's like, you have to explain that. And also just in general, here's limits of the service or things like that. That's not always explained terribly well in the technical documentation, whereas narrative docs, as we all know from AWS, you have to go to the limits page or something like that to really find where that is.
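The color example can be made concrete with a short Python sketch. The `Color` values and the `parse_color` helper are hypothetical; the point is only that writing the docstring forces the "named colors or hex codes?" question to be answered somewhere a caller can find it.

```python
from enum import Enum


class Color(Enum):
    """The only colors this hypothetical API accepts."""

    RED = "red"
    GREEN = "green"
    BLUE = "blue"


def parse_color(value):
    """Convert user input into a Color.

    Accepts the color names listed in Color, case-insensitively.
    Hex codes such as "#ff0000" are not accepted in this sketch.
    """
    try:
        return Color(value.strip().lower())
    except ValueError:
        allowed = ", ".join(color.value for color in Color)
        raise ValueError(
            f"unsupported color {value!r}; expected one of: {allowed}"
        ) from None
```

With this in place, `parse_color(" Red ")` returns `Color.RED`, while a hex code produces an error message that lists the accepted names.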

Jeremy: Yeah, totally agree. So, alright. Last thing on this subject, because this is another thing that is painfully obvious with most companies you work with, is the fact that they don't write any tests or if they do, they write very few tests.

Jared: Yeah. So, I mean, that's definitely true. And I think this is open source gazer over the shoulder, but oh my goodness, you have to have some tests in there and this is just pure embarrassment aspect. And I think tests are another level of documentation that most people don't consider to be documentation. But I can tell you that at least me personally, probably the first way that I go and understand if I should use an open source project, is to go look at the test folder, the test directory. Because it's going to tell me a couple of things.

It's going to tell me, A, how well tested are they, how solid is this system. But beyond that, the test code and the tests themselves explain to me the API of the system and how the library works in most cases. I can go look at that and say, "Okay, this is what the system is capable of. This is what they're testing for. This is how I interact with this library or this system. This is how they think I should be interacting with this library or system." Things like that. Right? And that's stuff that you can't get from any other form of documentation. But also, system stability is just so critical.

Jeremy: Right. Especially with I mean, just regression testing and any changes they made to the code. And I think I just said this on the last episode was, I see code where you change something and you're like, "I have no idea if this will break everything or what? Because there's no way for me to thoroughly test it." So I think that's hugely important. And I mean, even just some level of testing even at a high level, even if you're not going deep on unit testing, even testing at a function level and more complex things happening under that.

But it is just such an important thing that... But I get it, it takes time, right? It's an investment. It's another thing you have to do that takes away from feature building.

Jared: And I do think that it's kind of interesting once you move more to this cloud-native managed services world, I would say that end-to-end testing is more valuable than ever, right? I can't unit test, or I shouldn't really be unit testing, "Does S3 write?"

Jeremy: Right.

Jared: I think I can safely assume it probably writes. And I'm not going to like unit test the core functionality of that. But what I can test is if I call my API that's supposed to write an object to S3, then can I call the other API endpoint that is supposed to mutate or map that object and give me some kind of response out of it. And if I can do that, if I can end-to-end test, like a few different things, through a few different endpoints, my goodness, I can get to like 80% code coverage with probably 10 tests in a sufficiently large system.

And it takes you a day, two days, let's say a week, because you have to learn a whole new end-to-end testing framework to get to a pretty high level of confidence that my system is at least working in the happy path. And that's invaluable. So that's where I would start at least these days.

In fact, when I go to work on new client systems, if I don't know how they work, and I don't have the docs, and I don't have even doc generator docs or anything like that, the first thing I do is sit down and go, "Cool, I'm going to run some end-to-end tests, just so I understand how your API works or your system works." Then the side effect is I understand this thing and we have end-to-end tests, so I can start changing stuff and at least know if I broke something big.
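The shape of that happy-path test (write through one endpoint, read the transformed result through another) can be sketched in Python. Everything about the API here is hypothetical: the `/objects/<key>` and `/upper/<key>` routes, the uppercase "transform", and the in-process `StubApi` server, which exists only so the sketch runs on its own. In a real end-to-end test the stub would be dropped and `happy_path` would be pointed at the deployed API in the cloud.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

_STORE = {}  # in-memory stand-in for object storage (hypothetical)


class StubApi(BaseHTTPRequestHandler):
    """Hypothetical API: PUT /objects/<key> stores text, GET /upper/<key> returns it uppercased."""

    def do_PUT(self):
        length = int(self.headers.get("Content-Length", 0))
        key = self.path[len("/objects/"):]
        _STORE[key] = self.rfile.read(length).decode("utf-8")
        self.send_response(204)
        self.end_headers()

    def do_GET(self):
        key = self.path[len("/upper/"):]
        if not self.path.startswith("/upper/") or key not in _STORE:
            self.send_error(404)
            return
        body = json.dumps({"result": _STORE[key].upper()}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep test output quiet


def call(host, port, method, path, body=None):
    """Make one HTTP request and return (status, response bytes)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request(method, path, body=body)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()


def happy_path(host, port):
    """Write an object through one endpoint, then read it back through another."""
    status, _ = call(host, port, "PUT", "/objects/greeting", b"hello")
    assert status == 204, status
    status, payload = call(host, port, "GET", "/upper/greeting")
    assert status == 200, status
    return json.loads(payload)


def run_against_stub():
    """Start the stub on a free local port, run the happy path, then shut down."""
    server = HTTPServer(("127.0.0.1", 0), StubApi)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return happy_path("127.0.0.1", server.server_port)
    finally:
        server.shutdown()
        server.server_close()
```

The test only knows the public endpoints, so it doubles as a short description of how the API is meant to be used.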

Jeremy: Yeah. So I totally, totally agree with that. Alright. So last thing, and this was the one I was really looking forward to getting to. And I think this ties back in with the note you said earlier about just sort of building cloud-native people. Is this idea of just building with a cloud-native mindset, right? I mean, because this is the thing where your opinions on lift and shift, I totally agree with that. I think it's a bad idea, might be a great maybe onboard thing, but you know that stuff's going to get left that way.

So if you start thinking about building things in the cloud and using those native services and you actually had a quote in there, something like, "If CloudFormation doesn't exist for it, then is it even available," or something like that.

Jared: Right, right. Yeah. If it's not supported in CloudFormation, does it actually exist? Questionable.

Jeremy: Right, right. But yeah. But you tie it into a bunch of other things and you've always had this really great, I don't know, sort of like your serverless credo or something like that. Where it's: if the platform has it, use it; if the market has it, buy it; if you can reconsider requirements, do it; if you have to build it, then own it. And I love that because I think that is such a... It takes what we're trying to do with serverless and just wraps it up into four quick sentences, which is great. But your thoughts on that overall, what are your thoughts on this building with a cloud-native mindset?

Jared: I think it takes practice, right? You're giving up a lot of fundamental control that I think people are used to having, right? I can't walk into my data center, open a rack and turn off or turn on a server or pull wires or things. That's a huge fundamental shift for a lot of folks. And as we're migrating to people now, these days, that have never even walked into a rack of servers, we're having people that are coming out of college for whom going into AWS, into EC2, and clicking Launch Instance is their concept of a server.

I think what we're starting to build towards in terms of this cloud-native mindset is, we fundamentally can trust these larger providers to provide mostly good experiences, let me be careful there. Mostly good experiences around these cloud primitive services. And we have S3, which has kind of been referred to as the seventh or eighth wonder of the world. It's like this modern marvel, right? That thing holds so much data and performs so well, and it's so scalable.

When it goes down the internet is just basically done. That's incredible that they have this service and we're trusting it. As cloud-natives, we're trusting these providers. I don't care if it's Azure or GCP or anybody, to provide these primitives that we can build on top of it. I think cloud-natives look at those primitives and you have an implied level of trust, and you're willing to build businesses and business value on top of them.

And I think it's control and being able to trust somebody else with giving up that control, so you can accelerate what you're doing and looking to build in terms of business value, is more of a cloud-native mindset than anything else.

Jeremy: And I'll bring Forrest into this again, because you brought him into this. He's becoming very popular as a side topic on this podcast. But I think you included one of his cartoons about the regret index. And that is brilliant, because it perfectly captures exactly what it is. If I write something myself for me to be like, "No, I'm just going to get rid of it," is so hard. If I just buy something for me to switch from X to Z, for some product that I bought much, much easier, and it's no skin off my teeth to do that. But if I built it myself, I really, really want to hang on to it. And that is, I think just a huge problem.

Jared: Yeah. I mean, we see organizations and when I say we, I don't just mean Trek10, I mean, universally. I think all of us have seen organizations that have built something that some platform does better, some product does better, some open source product does better. And pretty much everybody says, "Why in the world do you keep using that thing?" And it's like the not-built-here syndrome, right? Interestingly, I think Netflix suffers from the not-built-here thing, but also they have this side effect of the stuff they build is also really good.

And all other people are using it, so outlier. But to take it to an individual level, right? If you're willing to build something or invest in a hobby, or craft brew, or something like that, you're not just going to throw that out if it's not great.

Jeremy: Right.

Jared: Right? You're going to be like, "Well, I've made this thing, I'm going to drink it now. This is terrible." But you take a sip. You're like, "Wow, that's bad. Hey, Jeremy, do you want to try this thing that I made?" Whereas if you go buy a six pack or whatever from the store, and you take a drink and you're like, "That's terrible." You're like, "Eh."

Jeremy: Yeah right, exactly, exactly.

Jared: I'm not going to drink that. I'm going to go get another drink.

Jeremy: Just dump it out and yeah.

Jared: And I think that just generalizes to building stuff as a company. And Forrest, I think really brought that to light when he was like, "If I buy something off the shelf and it doesn't work out, my regret is, 'Well, I spent money on something that I could have spent on something else.' If I invest engineering resources and build my own thing that turns out to be not great and the wrong thing, I'm much more invested in saying, 'Well, I've already made that decision. I don't want to look like an idiot to my company. I don't want to waste company resources. We're just doubling down.'" Right?

Jeremy: We're going to make it even worse. We're going to keep on working on it until this thing is even. Yeah. Well, so I think there's a couple of key takeaways you had in here. I mean, one of the things was just like this idea of cloud-native mindset, or building with that, is you're giving your team that autonomy so they can build things on their own, right? That they have more flexibility, right? You're not trapped into some system. You can try new things. You can experiment.

You can use products that get better around... They get better like you said, they just get better with time, because somebody is upgrading those things and you don't have to do anything. And then the greatest piece of that whole bit of using somebody else's stuff, is you don't have to write documentation for it. Right? Because they've already written the documentation. So anyways, Jared, thank you so much for being here.

Jared: Thank you.

Jeremy: Those articles are awesome. I will put those in the show notes because I do think you need to go check out not only those, but also everything else that you're working on. So if people do want to find out the other things you're working on, how do they get ahold of you?

Jared: Yeah, I'd say the best way is of course Twitter, the universal complaint box. So @shortjared on Twitter, and then you can also go to my website jaredshort.com, which just uses a Notion page because the market had it, I just used it. And then you can always email me as well, I guess we can put that in the show notes. But yeah. Thanks Jeremy. It's been an absolute pleasure.

Jeremy: Awesome. Alright. Well, I will get all that information into the show notes. Thanks again.

Jared: Alright, thanks.

THIS EPISODE IS SPONSORED BY: Amazon Web Services (Serverless-First Function May 21 & 28, 2020)
