How Did We Choose Our Next Tech Stack at Hashnode?

The following is a republished summary of research I was tasked with at Hashnode. The perspective here is me talking to my colleagues; I chose to keep that perspective to preserve the raw quality this write-up has. So keep that in mind as you read.

It's also worth noting that what we ended up choosing depended on our specific needs. Not choosing a certain tool doesn't necessarily mean we think it's bad. We at Hashnode have great love and appreciation for the open source community. And we are thankful to have so many great choices.

So buckle up, comrades. This might be a long one 😀

Background

Currently our APIs are scattered across three repos. Some APIs are written as typical REST routes, while some parts of our web apps access our database directly through a homegrown abstraction. There's also no consistency in how we access our database: two repos use mongoose, while another uses the bare-bones node driver. It's also worth noting that the mongoose schema is duplicated across two repos.

All of the above pushed us to look for ways to improve how we build Hashnode. One of these efforts was researching the best way to migrate all of our REST APIs into a centralized GraphQL API, and to improve how we build, deploy, and expose them.

In the first week of this research project, I tried to gather as much information about as many tools as possible. That included reading the docs, GitHub issues, and going through their hello-world examples in code. That was enough to give me an idea of the "theoretical" pros and cons of each tool, but it wasn't enough for us to decide how we were going to handle this migration.

For us to be able to make a decision, I had to get a better understanding of each tool. An understanding that goes beyond each tool's promised advantages and obvious disadvantages. I needed to get a feel for the limitations and the rough edges of these tools.

The approach I took was to list a few areas of concern to test each tool against. These areas (in no particular order) are:

  • Performance
  • Developer experience
  • Maintainability
  • Limitations
  • Maturity
  • Documentation

TLDR

Here's the TLDR. For our use case, and considering the fact that we use mongodb, the recommended stack is the following:

  • GQL Server: apollo-server
  • GQL Schema: SDL (the classic way) + Codegen
  • ORM/ODM: Papr
  • To solve the N+1 problem: Facebook's data-loader. It's magical 😍
  • Deployment: Lambda for serverless or a containerized managed solution.
  • Cache: HTTP edge caching as a start.

Now onto why I'm recommending that stack:

Performance

When it comes to performance, most available material focuses on how fast a tool is in a certain benchmark. And while that can be important, it is usually more important to look into the options a tool gives you when you hit a performance bottleneck. Does the tool provide you with APIs to debug? Does it provide you with an escape hatch in case its API can't fulfill your use case? Or does it just leave you in front of a tall concrete wall without a solution?

ORM/ODM:

TypeORM / Mikro-ORM / Mongoose:

All of these had comparable performance. They all allow you to debug and see what calls they're making to the database, and they all allow you to talk to the database driver directly.
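
As a quick illustration, here's what those debugging hooks look like. This is a minimal sketch under typical setups; the connection options are illustrative:

```typescript
import mongoose from "mongoose";
import { DataSource } from "typeorm";

// Mongoose: log every operation it sends to MongoDB.
mongoose.set("debug", true);

// TypeORM: enable query logging on the data source
// (Mikro-ORM has an equivalent `debug` option in its init config).
const dataSource = new DataSource({
  type: "mongodb",
  url: process.env.MONGO_URL ?? "mongodb://localhost:27017/test",
  logging: true,
});
```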

Prisma:

It saddens me to say that Prisma was the worst offender when it comes to performance. Their Rust query engine seems to be very lacking with mongodb. For example, when I tried to fetch a single user by username, the query took about 5 seconds(!) compared to ~150 ms with all the other ORMs. That's because, for some reason (probably their built-in data-loader), the query engine translated one prisma.findUnique call into a mongodb.aggregate call instead of a mongodb.findOne call. That was compounded by the fact that Prisma doesn't support mongodb collation natively, which was needed for that particular query.

And while Prisma allows you to see what calls it's making and also lets you hit the database driver directly, if the query engine can't figure out how to map a simple findOne call, I imagine we're gonna be hitting that driver a lot if we go with Prisma.
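
To make that escape hatch concrete, here's a hedged sketch of both paths, assuming a User model with a unique username field (model and field names are illustrative):

```typescript
import { PrismaClient } from "@prisma/client";

// log: ["query"] prints what the engine actually sends to the database.
const prisma = new PrismaClient({ log: ["query"] });

// Goes through the Rust query engine (the call that got translated
// into an aggregate for us):
const viaEngine = await prisma.user.findUnique({
  where: { username: "sandeep" },
});

// The escape hatch: raw queries that bypass the engine's translation.
const viaRaw = await prisma.user.findRaw({
  filter: { username: "sandeep" },
});
```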

Papr:

Enter Papr. A paper-thin layer (sorry) on top of the mongodb driver. It offloads schema validation to mongodb itself using native JSON schema validation, instead of doing it in the application layer like mongoose does. It's by far the closest performer to hitting the mongodb driver directly.

N+1 problem

It's as old as data itself. If you're not familiar with it, read this. It's very easy to fall into when writing GraphQL resolvers. And who else would be able to solve that issue but the creators of GraphQL themselves?

Facebook's data-loader is an elegant solution to the N+1 problem in GraphQL. It works by collecting the keys a resolver requests during a single tick of the event loop, batching them all into one big request, and returning a promise for each key. In essence, you're converting N database hits for a single document each into 1 database hit for N documents. This dramatically reduces network round trips. And the best part is that you don't have to write any gnarly code to achieve it.
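
Here's roughly what that looks like in practice. A minimal sketch, assuming a Users model module with a driver-style find; the names are illustrative:

```typescript
import DataLoader from "dataloader";
import { Users } from "./models"; // hypothetical model module

// One loader per request: all keys requested during a single tick of the
// event loop are coalesced into one batch.
const userLoader = new DataLoader(async (ids: readonly string[]) => {
  const users = await Users.find({ _id: { $in: [...ids] } });
  // DataLoader expects results in the same order as the incoming keys.
  const byId = new Map(users.map((u) => [String(u._id), u]));
  return ids.map((id) => byId.get(id));
});

// In a resolver:
//   Post: { author: (post) => userLoader.load(post.authorId) }
// Resolving N posts now costs one find() with $in, not N findOne() calls.
```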

There are more ways to solve the N+1 and over-fetching problems. Read this article for a more advanced approach that uses GraphQL's AST to fetch exactly the fields the user asked for. It's safe to say we don't need to go that far, but it's good to know we have more options for optimization down the line.

GraphQL resolvers and garbage collection:

This one is a bit nit-picky. But it was very bizarre to discover that some GraphQL libraries (namely TypeGraphQL) can have issues resolving the graph, to the point where JavaScript garbage collection kicks in more than it should. Here's a video explaining the issue.

It's worth noting that the performance hit wasn't that big. And they have since patched the issue. But it's interesting to see how deep the rabbit hole of optimization can go.

Caching

For caching, our options include HTTP caching, in-memory LRU caching, and Redis. As with any optimization, we only do it when we know we need it. So as a start, we'll only utilize HTTP caching on an edge network like Vercel or GraphCDN.

For that to happen, we need a GraphQL server that supports editing the HTTP response headers. And while graphql-yoga (my initial choice) supports setting the cache response imperatively within resolvers, Apollo took the crown with their amazing schema-level cache hints and their support for automatic persisted queries. (By the way, I contributed to their documentation while doing this research 😃)
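
Here's a taste of those schema-level cache hints, as a hedged sketch (in Apollo Server 3 the @cacheControl directive definition itself has to be included in the SDL; the types below are illustrative):

```typescript
import { gql } from "apollo-server";

const typeDefs = gql`
  enum CacheControlScope {
    PUBLIC
    PRIVATE
  }

  directive @cacheControl(
    maxAge: Int
    scope: CacheControlScope
    inheritMaxAge: Boolean
  ) on FIELD_DEFINITION | OBJECT | INTERFACE | UNION

  # Every Post is cacheable at the edge for 5 minutes...
  type Post @cacheControl(maxAge: 300) {
    id: ID!
    title: String
    # ...except its comments, which go stale faster.
    comments: [Comment] @cacheControl(maxAge: 60)
  }

  type Comment @cacheControl(inheritMaxAge: true) {
    id: ID!
    body: String
  }

  type Query {
    post(id: ID!): Post
  }
`;
```

Apollo then computes the response's overall Cache-Control header from the most restrictive hint among the fields in that response.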

And since we'll be using directives in our schema to enable caching, the library we use to write the schema must support custom directives. That made nexus no longer viable for our use case, as their support for custom directives was still an open PR at the time of writing this report.

Developer experience and maintainability

DX and maintainability go hand in hand in helping reduce the surface area of mistakes developers make. That's why we can't discuss one of them in isolation from the other.

ORM/ODM:

Mikro-ORM / TypeORM:

Both ORMs provide a somewhat familiar way of representing our models. They're both heavy on the OOP way of doing things, where every model is represented as a class and the database jargon is added via TypeScript decorators on each field. This approach is battle-tested in other ecosystems, such as Hibernate, Doctrine, and Entity Framework.
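
For a flavor of that style, here's a hedged sketch of a TypeORM entity for mongodb (Mikro-ORM's looks very similar; the fields are illustrative):

```typescript
import { Entity, ObjectIdColumn, Column } from "typeorm";
import { ObjectId } from "mongodb";

// The model is a class; the database jargon lives in decorators.
@Entity()
export class User {
  @ObjectIdColumn()
  _id!: ObjectId;

  @Column()
  username!: string;

  @Column()
  followersCount!: number;
}
```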

For type safety, they both rely on reflection, which means our dev environment would need to constantly run the TypeScript compiler. I tried it with the Vercel CLI's hot-reloading TypeScript environment and it was somewhat buggy.

Mikro-ORM provides ts-morph as an alternative to reflect-metadata, but it needs the types to be shipped in the final bundle, which is basically code generation with extra steps. Read Metadata Providers for more information.

And after going through all that hassle, the type system of both ORMs isn't foolproof. You'll still be able to do things you shouldn't be allowed to do, and the compiler won't complain.

Mongoose + (Typegoose or interfaces):

There are two ways to add type safety to mongoose. The first is by writing an interface alongside your schema. Here's how. The second is by using Typegoose, a package made to create the mongoose schema and models using classes and decorators.
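
Here's the first approach sketched out with illustrative fields. Notice the duplication: the shape lives in both the interface and the schema:

```typescript
import { Schema, model } from "mongoose";

// The shape, written once as a TypeScript interface...
interface IUser {
  name: string;
  email: string;
}

// ...and again as a mongoose schema.
const userSchema = new Schema<IUser>({
  name: { type: String, required: true },
  email: { type: String, required: true },
});

export const User = model<IUser>("User", userSchema);
```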

While Typegoose is more DRY because you write the types once, both result in the same level of type safety. Which is full of holes. For example, running User.create({name: 6}) should fail because name is a string. It actually passes, because for some reason mongoose wraps every type you write in the interface/class with any 🙂.

Papr:

Enter Papr. The alpha giga chad of mongodb ODMs. It's the only one of the above that is truly DRY: you only write the schema once. And it's fully type safe*. The only time the type system might yield unexpected results is when using the aggregation pipeline. Read this issue. It also doesn't rely on reflection, so you can use any dev environment you desire. It is truly the embodiment of the phrase "simple is powerful".
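
Here's what that single source of truth looks like, as a minimal sketch (field names are illustrative):

```typescript
import Papr, { schema, types } from "papr";

const papr = new Papr();

// Written once: MongoDB enforces this as a native JSON schema, and
// TypeScript derives the document types from the same definition.
const userSchema = schema({
  name: types.string({ required: true }),
  email: types.string({ required: true }),
  followersCount: types.number(),
});

export const User = papr.model("users", userSchema);

// After connecting the driver: papr.initialize(client.db("hashnode"));
// User.find, User.insertOne, etc. are thin, fully typed driver wrappers.
```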

Prisma:

Prisma is 100% type safe. And thanks to their introspection, you don't even have to write the schema yourself: you run a terminal command and Prisma takes a sample of your database's data, analyzes it, and writes the schema for you. At least that's the idea.

With mongodb, Prisma is not able to introspect relations/references, which means you'll have to go through your generated schema file and add relations one by one manually. It can't infer default values either. And it has a hard time dealing with the ObjectId type, especially in arrays.

Schemas are written in Prisma's own SDL, which is very simple, but it adds to the complexity of the codebase. It's worth noting that our schema file was more than 3000 lines of code, and Prisma doesn't support splitting or importing schemas. All the community solutions for splitting are clunky at best.

After the schema is done, we generate a Prisma client. The client is a fully type-safe way to access the database. It is safe to say that Prisma has the best type safety of all the previous solutions; Papr is the only one that comes close.

GraphQL schema:

Pothos:

Since nexus became unviable for us due to its lack of custom directives, we turned to its next of kin: Pothos. It tries to achieve the same pattern as nexus: instead of defining the schema by writing SDL, you write it in code. And while that might seem counterproductive, since GraphQL's SDL is very simple, it should result in better maintainability in the long run. Because you're writing the schema in code, it's very easy to reuse common bits (relay connections, for example). Pothos also provides type safety for our resolvers without the need for code generation.

The asterisks start here, unfortunately 😞 In Pothos, the default way to define an object type in our schema is by writing a class and then a resolver that exposes the fields of that class. In essence, you're writing the type twice, as the sketch below shows. Read here for an example. This results in non-DRY code at best, and at worst encourages developers to start coupling their ORM classes with their API classes. Which is a recipe for disaster.
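
Here's a hedged sketch of that duplication (the shape is illustrative):

```typescript
import SchemaBuilder from "@pothos/core";

const builder = new SchemaBuilder({});

// The shape, declared once as a class...
class UserShape {
  constructor(public id: string, public name: string) {}
}

// ...and declared again when exposing its fields to the schema.
builder.objectType(UserShape, {
  name: "User",
  fields: (t) => ({
    id: t.exposeID("id"),
    name: t.exposeString("name"),
  }),
});
```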

Pothos does support an alternative way that is DRY: the simple-objects plugin, which has a similar syntax to nexus. But it has some limitations when dealing with nested data in resolvers, which we deal with a lot.

TypeGraphQL:

It has a similar syntax to TypeORM and Mikro-ORM: it expresses the GraphQL schema as a class with some decorators on top, and it also depends on reflection. I can't say much more about it, because by that point I had decided to forgo code-first ways of writing GraphQL schemas, as they will always (at least for now) have more limitations down the line compared to the classic SDL-based approach.

SDL + codegen:

We've come full circle. The classic way of writing a GraphQL schema leaves very little to be desired. It's much more concise and much less verbose than any code-first option. I mean, just look at this picture:

[Screenshot comparing the two approaches side by side; the SDL version is on the left]

The approach on the left leaves one thing to be desired: how do we type these resolvers? The answer is code generation. Yes, it's an extra step compared to Pothos. But it's super seamless, and in return we get code that is a blast to write.
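
For illustration, here's a hedged sketch of the payoff from graphql-codegen's typescript-resolvers plugin (the generated-file path, context loaders, and field names are assumptions):

```typescript
// The Resolvers type is generated from the SDL by graphql-codegen's
// typescript-resolvers plugin; the output path below is illustrative.
import type { Resolvers } from "./generated/graphql";

export const resolvers: Resolvers = {
  Query: {
    post: (_parent, args, ctx) => ctx.loaders.post.load(args.id),
  },
  Post: {
    // A misspelled field name or a wrong return type is a compile error.
    author: (post, _args, ctx) => ctx.loaders.user.load(post.authorId),
  },
};
```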

We're nearly there. The following sections are much more concise. You don't have to endure this much longer 😅

Limitations

Prisma:

  • You're forced to choose between DX and performance
  • Doesn't support collation in mongodb
  • No announced plan to add mongodb support for prisma migrate
  • No native ObjectId support
  • No modular schema support

Papr:

It doesn't support populate. Which is not a big deal, as we're going to use resolvers and data-loaders to fetch relations/references anyway.

Mikro-ORM / TypeORM:

No support for projection.

Code-first schema:

You'll end up hitting a wall once you try doing anything outside their hello-world examples and plugin systems. For example, there was no way to support custom directives in nexus.

Maturity + The Open Source Graveyard

In my opinion, whether a certain tool is ready for production or not depends on the following factors:

  • How big is the community around it?
  • How long has it been around? That is to say: how much time have they had to fix bugs and stabilize their API?
  • Who's maintaining it? Is it a company, a group, or an individual? This matters because often the tools that die are the tools maintained by individuals.

Prisma:

  • The community around Prisma is huge. It has ~22k stars on GitHub, and with TypeScript growing into the way developers choose to write JavaScript, Prisma's type safety will attract more and more developers.
  • Prisma 2 has only been around for two years, which is not long but decent. However, the mongodb connector has only been around for about a month. See the version 3.12 release. That means it's not unlikely you'd find a bug or get a breaking update while using it. Which I did. Twice 🙂
  • A company is behind Prisma.

Mongoose:

  • Mongoose is the gold standard for accessing mongodb in node. The community around it is massive.
  • It has been around since 2010 so they've had time to fix bugs and stabilize their API.
  • The same guys that worked on Temporal are behind mongoose.

TypeORM:

  • It has a good community around it. But the mongodb support is still new and limited.
  • Also, one might argue that the future of the package is hazy: there is no company behind it, only a group of developers.

Mikro-ORM:

It's only maintained by a single developer. And while the community around it is very impressive, it's much smaller than Prisma's or TypeORM's.

Papr:

  • As Papr is only about a year old, it hasn't yet garnered a big following.
  • However, there's a company behind it (Plex), which gives a decent confidence boost in its future.
  • It doesn't add any restrictions or abstractions to how we access mongodb; it only provides type safety. That's not much to maintain compared to what other ORMs promise. Papr doesn't even define itself as an ORM/ODM.

Apollo:

Apollo as a company has been shaping the GraphQL ecosystem for a long time. It is safe to say Apollo isn't going anywhere anytime soon.

Documentation

Most of the documentation I came across during this research is generally good and usable. The only docs that left something to be desired were Mikro-ORM's. It felt like pages were taped together, and there were a lot of ad-hoc paragraphs. It's not ideal.

The recommendation

  • GraphQL server: apollo-server
  • GraphQL schema: SDL + codegen
  • ORM/ODM: Papr
  • To solve the N+1 problem: data-loader

Appendix A: Lambda or not?

We still need to have a discussion about how we deploy our API. The stack mentioned in the recommendation can be deployed on both server and lambda environments.

I personally lean more towards a containerized solution. I feel it's a good middle ground between having to maintain a VM and the completely stateless Lambda. We can scale it with ease, and we don't have to deal with the performance hit of cold starts.

Conclusion

This is it. A tale of 2 weeks of research. I lay it all in front of you. We at Hashnode used this research as a starting point for a discussion and a decision on how we're going to build a glorious Hashnode 2.0 ❤️. If you want to join us in building the next generation of blogging, we're hiring!