DDD is an approach to software development that focuses on the core domain of the business. Subdomains built with DDD should be treated as products rather than projects: they are the parts of the business that matter most, they will grow over time, and they need special care from everyone. Not every subdomain needs to be implemented with DDD; DDD fits the core domain best.
A shared/ubiquitous language (UL) helps to connect the software model and the analysis model through the same terminology. By using UL with the business, communication becomes more explicit and both models feed back into and enrich each other. This is a key part of DDD that enables domain experts and development teams to collaborate.
Domain experts are usually people within the organization who can offer expertise in the problem domain; they may even be users of an existing system. When talking with them, keep the conversation focused on the problem space and do not jump to implementation details.
The information here is not even 1% of what you will get from reading the book. If you are new to DDD, or you would like to reinforce your knowledge, I encourage you to give it a try.
The preferable option when extracting new microservices is also to move their data apart, but that is not always possible to achieve; these patterns add options when facing this kind of scenario.
Multiple microservices use the same database as a source of truth. This might be useful when the data to be accessed is read-only and highly stable, or when the database is exposed as a read-only endpoint to fulfil reporting needs or serve clients joining large amounts of data.
Exposing a different database view to each service might be useful to limit the data that services can access. Applying this pattern could help when the existing schema is difficult to decompose. If the end goal is to push for a full schema decomposition, this pattern should be avoided.
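As a rough illustration of the idea, a read-only view can expose only the columns a given service is allowed to see. This is a sketch; the connection details, table and column names below are assumptions.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Hypothetical example: schema, columns and connection details are assumptions.
public class CreateCatalogView {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/monolith", "app", "secret");
             Statement stmt = conn.createStatement()) {
            // The catalog service only ever reads this projection,
            // never the underlying items table.
            stmt.execute("CREATE VIEW catalog_items_view AS "
                       + "SELECT item_id, name, price FROM items");
        }
    }
}
```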
It might be useful to add a service as a wrapper around a database schema that is difficult to pull apart. This pattern hides the database behind a service and introduces well-defined APIs. Clients that need the data can now consume the new service. The pattern intends to encourage people to stop making changes directly to the database and to go through the service instead.
With this pattern, there is a copy of the database in read-only mode, and the new database is exposed directly to clients. This might be useful when dealing with reporting. An important obstacle that needs to be taken into consideration is the need for a mapping engine that will populate the new database from the source database.
Given that the number of database schemas is going to increase when splitting the monolith, there are two options to consider if the databases share the same technology:
Usually, within the monolith, the data access component is one big component. Instead of having that single repository layer, there is one repository component per bounded context. This technique helps to realise what data each bounded context needs, and it is useful when looking into splitting the monolith, since it gives a better understanding of which tables need to be moved apart into a new schema.
Each bounded context has its own database schema. This pattern is useful when the application is expected to be split into microservices in the future. It is recommended to follow this approach when building new systems, since with little effort developers will have more options available later on.
This pattern might be useful if the extracted microservice needs data owned by the monolith. Instead of accessing the data directly, the monolith exposes an API that is consumed by the new microservice. It might also help to discover other candidates for extraction.
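A minimal sketch of the idea, assuming a hypothetical /customers/{id} endpoint exposed by the monolith: the new microservice calls that API instead of reaching into the monolith's tables.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch only: the monolith URL and endpoint are assumptions.
public class MonolithCustomerClient {
    private final HttpClient http = HttpClient.newHttpClient();

    // The new microservice fetches customer data through the monolith's API
    // instead of querying the monolith's database directly.
    public String fetchCustomerJson(String customerId) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://monolith.internal/customers/" + customerId))
                .GET()
                .build();
        HttpResponse<String> response =
                http.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```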
This pattern fits the scenario where the data owned by the microservice still lives under the monolith. It can be applied when there is a new business capability for the microservice that needs to add data to the schema. The new data should be added to the microservice's schema instead of to the monolith's. This could open the door to moving the entire data set out of the monolith.
This is a recurrent problem that can be solved with several approaches; let’s take a look.
It might be useful when not all services need to have the same data. Answering these questions should give a hint as to whether we could apply it: how often does the data change, and how consistent does it need to be across services?
The data is placed under the same schema and is accessed by multiple services. Use it when changes are unlikely to happen. Take into account that a change in the schema might impact multiple services.
Sharing the static data as a library might be an option for small amounts of data, where it is acceptable for different services to see different versions of that data.
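For instance, a small, rarely-changing data set such as country codes could ship as a versioned library that every service depends on. This is a hypothetical example; the library name and values are illustrative.

```java
// Hypothetical shared library (e.g. published as static-data-1.2.0): each service
// picks up new values only when it upgrades the dependency, so different services
// may temporarily see different versions of the data.
public enum Country {
    IE("Ireland"),
    ES("Spain"),
    US("United States");

    private final String displayName;

    Country(String displayName) {
        this.displayName = displayName;
    }

    public String displayName() {
        return displayName;
    }
}
```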
This option is obvious: the shared static data lives under a dedicated microservice that owns it. It might be feasible for organizations where creating a new microservice does not involve too much overhead.
In the next post, I’ll try to describe a few techniques to migrate data from the monolith database into the new schemas.
In the third chapter of the book, Sam enumerates and explains a few patterns that will help when extracting microservices:
It helps to move functionality out of the monolith without doing a big bang release.
It’s recommended to extract a vertical slice of the functionality. To use it with success, the functionality being extracted has to map clearly to an inbound call.
Use it to move functionality to the new service without touching or making any changes to that functionality.
The user interface is composed of a group of pages or widgets exposed by different microservices. It allows vertical slices to be migrated.
Use it when composing the UI from different microservices makes sense. It must be possible to modify the existing user interface so that the new functionality can be added.
It allows adding a different implementation of the functionality you want to extract that will coexist with the old one until you are ready to migrate.
It can be combined with feature toggles to change between implementations. When switching to the new implementation, keeping the old one for some time can be useful as a fallback mechanism if something goes wrong with the new code.
Use it if the functionality to be decomposed is deeper inside the system and it is not possible to intercept the calls at the monolith's boundaries.
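A minimal sketch of branch by abstraction combined with a toggle, using hypothetical names: both implementations sit behind the same interface and the toggle decides which one handles the call.

```java
// Sketch with hypothetical names: both implementations coexist behind one abstraction.
interface NotificationSender {
    void send(String userId, String message);
}

class LegacyMonolithSender implements NotificationSender {
    public void send(String userId, String message) {
        // existing code path inside the monolith
    }
}

class NotificationServiceClient implements NotificationSender {
    public void send(String userId, String message) {
        // calls the new, extracted microservice
    }
}

class NotificationSenderFactory {
    // A simple feature toggle chooses the implementation; keeping the old one
    // around makes it easy to fall back if the new service misbehaves.
    static NotificationSender create(boolean useNewService) {
        return useNewService ? new NotificationServiceClient() : new LegacyMonolithSender();
    }
}
```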
It is used to mitigate the risk that the new implementation introduces errors. The pattern compares the results of the new implementation with the old one to verify that the new one behaves correctly. Only one implementation is the source of truth at any given time.
Use it when the functionality to be replaced is high risk.
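A sketch of the idea, with hypothetical names: both implementations are invoked, the old one remains the source of truth, and mismatches are only recorded.

```java
import java.util.function.Function;

// Sketch only: the old implementation stays authoritative while results are compared.
class ParallelRunPricer {
    private final Function<String, Long> oldPricer;   // trusted implementation
    private final Function<String, Long> newPricer;   // candidate implementation

    ParallelRunPricer(Function<String, Long> oldPricer, Function<String, Long> newPricer) {
        this.oldPricer = oldPricer;
        this.newPricer = newPricer;
    }

    long price(String basketId) {
        long trusted = oldPricer.apply(basketId);
        try {
            long candidate = newPricer.apply(basketId);
            if (candidate != trusted) {
                System.err.printf("Mismatch for %s: old=%d new=%d%n", basketId, trusted, candidate);
            }
        } catch (Exception e) {
            System.err.println("New implementation failed: " + e.getMessage());
        }
        return trusted; // only the old result is ever returned
    }
}
```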
Introducing a proxy that can intercept inbound requests to, or responses from, the monolith allows adding new behaviour that lives in a new microservice.
Use it when the monolith code is complex or cannot be modified.
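A sketch of a decorating collaborator with hypothetical names: the decorator lets the monolith handle the request untouched and then triggers new behaviour in a separate microservice based on the response.

```java
// Hypothetical sketch: the decorator wraps the existing call, leaving the monolith untouched.
interface OrderPlacer {
    String placeOrder(String orderJson);
}

interface LoyaltyServiceClient {
    void awardPoints(String orderResponseJson);
}

class LoyaltyDecorator implements OrderPlacer {
    private final OrderPlacer monolith;          // existing monolith endpoint
    private final LoyaltyServiceClient loyalty;  // new microservice

    LoyaltyDecorator(OrderPlacer monolith, LoyaltyServiceClient loyalty) {
        this.monolith = monolith;
        this.loyalty = loyalty;
    }

    public String placeOrder(String orderJson) {
        String response = monolith.placeOrder(orderJson); // monolith behaviour unchanged
        loyalty.awardPoints(response);                     // new behaviour added on top
        return response;
    }
}
```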
Instead of intercepting calls to the monolith, the pattern intercepts changes in the data store. A few options to implement this pattern can be:
Use it if there is a need to react to a data change in the monolith and there is no other way to intercept it.
Useful articles about this topic
In my opinion, whether you are not yet familiar with microservices or you already use them, the book will help you to understand the pros and cons of microservices as an architectural style. It explains well-known patterns about how to do microservices properly that you can benefit from.
I’ve created this conceptual map collecting the most important ideas and patterns explained in the book.
Cmap is the tool I’ve used to model the map.
“Hey buddy we need to do X, Y, Z.”
It happens all the time, and I think we, the software developers, are responsible for educating people that we are not only solution implementors. We are also problem thinkers.
Why? Are we not paid to build software? Yes, we are. Are others not supposed to think about the problems and leave us the implementation? Of course not. I truly believe that developers must be involved in the context of the problems.
For us, the issue starts when we do not understand the problem and we are given a solution. Usually, thinking only about the solution leads to overcomplicated code, misjudged complexity, unused features and detachment from what the customer wants.
In recent years, I have realised that when developers think first about problems, not solutions, the result is far better. And this is not something I am making up; this idea is in almost every agile book. Exploring the problem space is key for a company trying to build the right thing, even more so when the company does not have loads of money to spend.
Perhaps the outcome of the problem exploration process is that the company does not need software to solve the problem. Trying a manual process first might be enough until we discover more about the problem. Maybe we just need to gather data and postpone the decision until we are in a better position.
We probably do not want to go through that thinking process alone. What if, instead of just developers, we could speak about problems together with CTOs, product managers and stakeholders? Again, this is not something new; the Agile Manifesto's fourth principle says:
Business people and developers must work together daily throughout the project.
This might look like basic stuff, but companies get lost with this kind of reasoning.
As my last reflection/open question: are we responsible for being detached from the problems? Do we put so much focus on technology that it blinds us to what matters, which is helping people? Would we be able to take the lead on this matter?
I decided to prepare for the exam when my company, Ding, asked me if I would like to attend a course at the Amazon offices about Architecting on AWS. Before attending the course I had no idea what cloud services AWS offered; I knew only about EC2 and Lambda, so yes, I learnt a lot during the five days. Thanks to Ding for the opportunity.
As Sandro Mancuso mentions in his book, we, the developers, should own our learning path ourselves. For that reason, I committed to learning more about what AWS provides and how it can help developers build products.
I started by signing up to a learning platform called A Cloud Guru, which provides several courses to prepare for different cloud vendor certifications. I liked the way the course is structured. The best part is that they encourage you to practise a lot with the AWS console, and some of the lessons are purely practical, which is very good. They also have an exam simulator where you can practise and prepare for the exam.
After finishing the course I continued reading a few AWS whitepapers:
From the multiple AWS cloud services, I would say that the most relevant for the exam are S3, EC2, RDS, DynamoDB, ELB, EBS and VPC. Reading the FAQs for these services will give you an overview of their purpose and how they work.
Another important point is that you should be able to create a VPC from scratch, with subnets, internet gateways, NAT instances / NAT gateways and route tables. That is only achievable by practising with the AWS Console or the AWS CLI.
During the almost two months of preparation, I took some notes to study the various services. The notes are mostly extracted from the A Cloud Guru platform and the FAQ of each service: AWS Solution Architect Associate Exam Notes
I imagine the notes will become obsolete at some point, since AWS changes its services very fast, although the fundamentals should not change too much.
As my final piece of advice, do not focus just on getting the certification done. For me, the journey along the learning path was much more rewarding than the actual certification.
I’d like to highlight the two talks I liked most. Although they don’t showcase the newest tech, I think it is interesting how understanding a few principles can help you to prevent errors in production and to deliver software with shorter feedback loops.
The first one is What Breaks Our Systems: A Taxonomy of Black Swans by Laura Nolan @lauralifts. Her presentation covered a few major outages that big companies have suffered and how they could have been avoided.
Soft and hard dependencies
She explained that most of the time, when a new dependency is introduced into the software, we, the developers, don’t think enough about whether that dependency should be a hard or a soft one. For instance, she gave the example of a system that relies on an external dependency at initialization time: it could make our software unavailable if measures are not put in place. When adding new dependencies, developers should consider whether the system can work without the dependency, and if it can, the software should not fail when the dependency is unavailable.
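A minimal sketch of treating a dependency as soft, with hypothetical names: if the optional recommendation service fails, the system degrades gracefully instead of becoming unavailable.

```java
import java.util.Optional;

// Sketch with hypothetical names: the recommendation service is a soft dependency.
interface RecommendationClient {
    Optional<String> forProduct(String productId);
}

class ProductPageService {
    private final RecommendationClient recommendations; // optional collaborator

    ProductPageService(RecommendationClient recommendations) {
        this.recommendations = recommendations;
    }

    String render(String productId) {
        String page = "Product " + productId;
        // Degrade gracefully: the page still renders if the soft dependency fails.
        try {
            Optional<String> extra = recommendations.forProduct(productId);
            if (extra.isPresent()) {
                page += " | You may also like: " + extra.get();
            }
        } catch (RuntimeException e) {
            // log and continue; the core feature stays available
        }
        return page;
    }
}
```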
Retries
Normally, when a system talks to other systems, problems like connectivity and network issues can happen. A few alternatives for handling them are retries with exponential backoff and the circuit breaker pattern. A lot of failures, such as oversized queues or exhausted systems, happen because retries are not well implemented. Ignoring problems in this area might lead to serious system outages.
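A small sketch of a retry with exponential backoff (the attempt count and delays are illustrative): a bounded number of attempts with increasing waits keeps retries from overwhelming an already struggling downstream system.

```java
import java.util.concurrent.Callable;

// Illustrative sketch: retries a call a bounded number of times, doubling the wait each time.
class RetryWithBackoff {
    static <T> T call(Callable<T> operation, int maxAttempts, long initialDelayMillis) throws Exception {
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up instead of retrying forever
                }
                Thread.sleep(delay);
                delay *= 2; // exponential backoff
            }
        }
    }
}
```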
The second talk, Immutable Deployments – Implementing a wicked fast deployment strategy by @grandazz, is about practices for deploying software into production.
He talked about how the team he is part of uses feature flags for every single feature they put into production, how feature flags have changed the way they make software, and how their codebase is better than before because they always keep that constraint in mind.
He categorised the feature flags into two categories: business and ops feature flags. The former are short-lived and used by business people to decide which feature to activate and when. The latter are owned by developers, enabling them to refactor or even deactivate features on demand.
Last but not least, he showed how feature branches go directly to production without exposing the functionality to the end user. That allows them to run tests in production. They also have a way to roll back quickly, because the application lets them switch between different versions in case of any problem.
The day was very enjoyable in terms of content. Jason started the course by introducing what TDD is and explaining “basic” concepts (many of us say we already know them, but that is not always the case) such as the mechanics of TDD and why to do TDD.
During the day we had time to cover a great many concepts, some related to test design, simple design, refactoring, or advice for choosing good names. It seems like a lot for a single day, but since there were not too many participants there was time to cover everything. One of the things I liked a lot is that between one concept and the next there was a practical part (similar to a kata) to apply what you had just learnt.
Below I will comment on some of the practices and concepts that I found interesting after having done the course.
A test should only check one example/rule and therefore should only fail for one reason.
Just as we wait to see duplication three times1 before removing it from the code, the same rule should also apply to tests. Being patient when refactoring tests is important, since a premature generalisation could obscure their meaning.
Creating generalisations in the tests can help minimise the impact when the production code changes. A couple of examples would be creating a facade for building instances of an object, or extracting the configuration of a mock into a single place.
Parameterising tests, as long as it makes sense, can be a good option to increase code coverage or to do mutation testing.
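For example, with JUnit 5 (assuming that is the framework in use; the FizzBuzz class here is purely illustrative), a parameterised test runs the same assertion over several examples:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.CsvSource;

// Hypothetical production class, included so the example is self-contained.
class FizzBuzz {
    static String convert(int n) {
        if (n % 15 == 0) return "FizzBuzz";
        if (n % 3 == 0) return "Fizz";
        if (n % 5 == 0) return "Buzz";
        return String.valueOf(n);
    }
}

// Illustrative example assuming JUnit 5: one test body, several input/expected pairs.
class FizzBuzzTest {

    @ParameterizedTest
    @CsvSource({"3, Fizz", "5, Buzz", "15, FizzBuzz", "7, 7"})
    void converts_numbers(int input, String expected) {
        assertEquals(expected, FizzBuzz.convert(input));
    }
}
```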
When writing a test, we should start with the assertion and keep completing the test upwards. This practice helps us focus on what we want to verify and avoid creating unnecessary setup.
One of the phrases Jason mentioned during the course that I found very revealing:
Don’t leave the path. Don’t go to the deep forest.
Who hasn't found themselves in the refactoring phase, changing a ton of code and files, and by the time we realise it, things have got out of hand? Another very common situation is that we don't run the tests with every small change, and later they are red and we don't know why.
Jason's phrase refers to these two examples: it is very important to make very small commits during refactoring and to run the tests very often, to check at all times that the production code has not been broken.
One of the hardest tasks in programming is giving good names to the abstractions we create. For components that we believe have a conceptual correlation with the business, we can lean on tools like a tag cloud generator2. For example, using the text of the user story to generate tags can help us discover words and names that may be useful.
Another thing we did was enumerate what simple design is, in the following points:
They are basically the same points that Kent Beck3 listed in his book Extreme Programming, with a greater level of detail.
Another piece of advice I'm taking away from the course:
Classes that create objects should not use them; classes that use them should not create them.
To wrap up the post, I would like to recommend the course to everyone who wants to learn TDD or improve their technique. I found it very useful, and I will come back to complete it if there is a second and third part.
In shared-nothing systems, the network is the only way machines can communicate with each other. The network cannot guarantee when a message will arrive, or even whether it will arrive at all. A few things can go wrong:
These issues are impossible to tell apart; you only know you have not received a response. The usual way of handling that is through timeouts. With timeouts, some questions arise, e.g. how long should the timeout be? Should we retry after a while?
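For instance, a request timeout in plain Java might look like the sketch below (the URL and duration are illustrative); the open question remains what to do when it fires.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpTimeoutException;
import java.time.Duration;

// Illustrative sketch: the timeout value is a guess, and a timeout tells us nothing
// about whether the remote machine actually processed the request.
public class TimeoutExample {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://example.com/health"))
                .timeout(Duration.ofSeconds(2)) // how long is long enough?
                .build();
        try {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("Status: " + response.statusCode());
        } catch (HttpTimeoutException e) {
            System.out.println("No response in time - retry? declare the node dead?");
        }
    }
}
```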
Communication in distributed systems is tricky because it is not always immediate; it takes time for a message to reach the destination machine.
Moreover, each machine on the network has its own clock, which is not a perfectly accurate hardware component. The most common option to synchronize clocks is the Network Time Protocol (NTP), where the clock adjusts itself taking into account the time reported by a group of servers, although there are more reliable options such as GPS receivers.
The methods for getting a clock to tell the right time are not as accurate or reliable as we might think. But it is also true that most data processing systems do not need real-time guarantees, and the effort to build them taking those details into account may be neither appropriate nor economical.
In summary, a distributed system cannot rely on a single node, because that node can fail unexpectedly. Distributed systems rely on quorum algorithms: the nodes need an absolute majority of more than half of them to take a decision (there are other kinds of quorums). For instance, if a quorum decides that a node is dead, the node will be declared dead, even if it is in fact still alive.
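A toy sketch of a majority quorum decision (purely illustrative): each node votes on whether a peer is dead, and the verdict only stands if strictly more than half agree.

```java
import java.util.List;

// Toy illustration of a majority quorum: the decision needs strictly more than half the votes.
class MajorityQuorum {
    static boolean declareNodeDead(List<Boolean> votes) {
        long votesForDead = votes.stream().filter(v -> v).count();
        return votesForDead > votes.size() / 2;
    }

    public static void main(String[] args) {
        // 3 of 5 nodes think the peer is dead, so it is declared dead,
        // even if it is in fact still running.
        List<Boolean> votes = List.of(true, true, true, false, false);
        System.out.println("Declared dead: " + declareNodeDead(votes));
    }
}
```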
Provided by @mintxelas
It is important to define what ACID means since it is used to describe the safety guarantees provided by transactions.
Atomicity is the ability to abort a transaction on error and have all the writes from that transaction discarded.
Consistency relies in part on the application to preserve certain statements about the data (invariants). Specific invariants can be checked by the database, e.g. foreign keys.
Isolation means that concurrent transactions are isolated from each other. Each transaction can pretend that it is the only one running on the database.
Durability promises that once the data has been committed successfully, it will remain there.
It is worth mentioning that ACID implementations differ between database vendors.
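As a rough JDBC sketch of atomicity in practice (connection details and table/column names are assumptions): either both writes commit together or, on error, both are rolled back.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Illustrative sketch: connection details and table/column names are assumptions.
public class TransferFunds {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/bank", "app", "secret")) {
            conn.setAutoCommit(false); // start an explicit transaction
            try (PreparedStatement debit = conn.prepareStatement(
                         "UPDATE accounts SET balance = balance - 100 WHERE id = 1");
                 PreparedStatement credit = conn.prepareStatement(
                         "UPDATE accounts SET balance = balance + 100 WHERE id = 2")) {
                debit.executeUpdate();
                credit.executeUpdate();
                conn.commit();   // both writes become durable together
            } catch (SQLException e) {
                conn.rollback(); // atomicity: neither write survives
                throw e;
            }
        }
    }
}
```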
Databases have tried to hide concurrency problems from application developers by providing transaction isolation. In practice, it is not that simple. There is a level of isolation called serializable isolation that comes at the price of worse performance. For that reason, database systems also implement weaker levels of isolation.
Commonly, databases prevent dirty writes using row-level locks, where only one transaction can hold the lock for a given object.
Most databases avoid dirty reads by remembering both the old committed value and the new value of a written object. With that, the database can return the old value for reads until the new value is committed.
This isolation level, read committed, is the default option in Oracle 11g, PostgreSQL, SQL Server 2012 and MemSQL.
With the snapshot isolation level, transactions read from a consistent snapshot of the database. This technique prevents a transaction from seeing data that is being changed by another transaction; each transaction sees only the data as it was at that particular point in time.
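With JDBC, the closest standard knob is the transaction isolation level; in PostgreSQL, for example, REPEATABLE READ is implemented as snapshot isolation. Connection details and table names below are assumptions.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Illustrative sketch: in PostgreSQL, REPEATABLE READ behaves as snapshot isolation,
// so every read in this transaction sees the same consistent snapshot.
public class SnapshotRead {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/bank", "app", "secret")) {
            conn.setAutoCommit(false);
            conn.setTransactionIsolation(Connection.TRANSACTION_REPEATABLE_READ);
            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT SUM(balance) FROM accounts")) {
                rs.next();
                System.out.println("Total at snapshot time: " + rs.getLong(1));
            }
            conn.commit();
        }
    }
}
```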
The lost update problem can happen when the application reads a value from the database, modifies it and writes it back. If two transactions do this concurrently, one of the updates might be lost because the second write does not include the first modification.
In order to avoid this issue, a few solutions can be applied:
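One common remedy, shown as a sketch below (connection details and table names are assumptions), is to turn the read-modify-write cycle into a single atomic statement so the database applies both increments safely.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// Illustrative sketch of one remedy: an atomic update avoids the read-modify-write race,
// so two concurrent increments cannot overwrite each other.
public class AtomicIncrement {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/app", "app", "secret");
             PreparedStatement update = conn.prepareStatement(
                     "UPDATE counters SET value = value + 1 WHERE name = ?")) {
            update.setString(1, "page_views");
            update.executeUpdate();
        }
    }
}
```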
Serializable isolation is the strongest isolation level. It guarantees that, even if transactions run in parallel, the result will be as if they had executed one at a time. All race conditions are prevented by the database.
Most of the databases today that implement serializable isolation use one of these techniques:
Since this post only scratches the surface of the book chapter, I would encourage you to read the full chapter to better understand how transactions work behind the scenes.
Provided by @mintxelas