
Microservices Data Architecture: Design Principles, Best Practices, and Tools

As the trend of building microservices-based applications continues to gain traction, so does the need for a data architecture that can support them. In traditional monolithic applications, the data model is tightly coupled with the application code, making it difficult to scale and evolve independently. In contrast, microservices-based architectures allow for more flexibility and scalability, but this also requires a different approach to data architecture. In this blog post, we will explore the principles, best practices, and tools for designing a data architecture that supports microservices.

Design Principles

When designing a data architecture for microservices-based applications, there are several design principles that should be considered:

  1. Data isolation: Each microservice should have its own database or schema to prevent data coupling and enable independent scaling.
  2. Decentralization: Data should be decentralized, meaning that each microservice owns and manages its own data, rather than relying on a central database or data store.
  3. Service-oriented: The data architecture should be designed around the services, rather than the data. This means that the data model should reflect the services and their interactions, rather than trying to create a single unified data model.
  4. Event-driven: An event-driven architecture can help decouple services and enable asynchronous communication. Events can be used to notify other services of changes to the data.
  5. Security and privacy: Data security and privacy should be considered at all stages of the architecture design. This includes data encryption, access controls, and auditing.

Best Practices

Along with the design principles, there are several best practices that can help ensure a successful data architecture for microservices:

  1. Use a polyglot persistence approach: This means that each microservice can choose the best database technology for its specific needs, rather than being limited to a single technology.
  2. Implement API gateways: API gateways can help manage the communication between services, including authentication and authorization, rate limiting, and caching.
  3. Use a message broker: A message broker can help enable asynchronous communication between services, and can also provide features such as message queuing, retries, and dead letter queues (see the sketch after this list).
  4. Implement data versioning: Since each microservice owns its own data, it’s important to have a strategy for versioning the data schema to ensure compatibility between services.
  5. Monitor and analyze data usage: Understanding how data is being used across services can help optimize performance and identify potential issues.
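
As an illustration of practice 3 and the event-driven principle above, here is a minimal sketch that publishes a JSON "OrderCreated" event to a topic that other services consume asynchronously. It assumes a local Apache Kafka broker on the default port and uses Kafka's bundled console tools; the topic and event names are invented for the example.

# consume events from the orders topic (run in one terminal)
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic orders --from-beginning

# publish an OrderCreated event (run in another terminal)
echo '{"event":"OrderCreated","orderId":42}' | kafka-console-producer.sh --broker-list localhost:9092 --topic orders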

Tools

Finally, there are several tools that can help implement a data architecture for microservices:

  1. Database technologies: There are a variety of database technologies that can be used for microservices, including traditional relational databases, NoSQL databases, and in-memory databases.
  2. API gateways: Popular API gateway tools include Kong, Apigee, and AWS API Gateway.
  3. Message brokers: Popular message brokers include Apache Kafka, RabbitMQ, and Amazon SQS.
  4. Schema versioning tools: Tools such as Flyway and Liquibase can help manage database schema changes and versioning (see the sketch after this list).
  5. Analytics tools: Tools such as Prometheus and Grafana can help monitor and analyze data usage across services.
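
As a quick sketch of tool 4, the commands below run Flyway against a hypothetical orders database owned by a single microservice; the connection string, credentials, and file names are placeholders.

# migrations live in sql/ as versioned files, e.g. sql/V1__create_orders.sql, sql/V2__add_status.sql
flyway -url=jdbc:postgresql://localhost:5432/orders -user=orders_svc -password=secret -locations=filesystem:sql migrate

# check which schema versions have been applied
flyway -url=jdbc:postgresql://localhost:5432/orders -user=orders_svc -password=secret -locations=filesystem:sql info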

Conclusion

Designing a data architecture for microservices-based applications requires a different approach than traditional monolithic applications. By following design principles, best practices, and using appropriate tools, it’s possible to build a scalable and flexible data architecture that supports the needs of microservices.

Comprehensive Guide to Data Architectures: From Monolithic to Data Mesh

As organizations continue to collect and generate vast amounts of data, they need a robust and scalable data architecture that can support their data needs. A data architecture is a set of rules, policies, and models that govern how data is stored, organized, and managed within an organization. There are several different types of data architectures, each with its own strengths and weaknesses. In this article, we will provide a comprehensive guide to data architectures, including their features, advantages, and challenges.

Part 1: Monolithic Data Architecture

The monolithic data architecture is a centralized approach to data management, where all data is stored in a single database or data warehouse. This architecture is simple to implement and manage, but it can quickly become inflexible and difficult to scale as the organization’s data needs grow. We will discuss the features, advantages, and challenges of monolithic data architecture in detail.

Part 2: Service-Oriented Data Architecture

The service-oriented data architecture is a distributed approach to data management, where data is stored in multiple databases or data warehouses that are connected by APIs. This architecture enables organizations to scale their data systems more effectively and provides greater flexibility and agility. However, it can also introduce additional complexity and require more resources to manage effectively. We will discuss the features, advantages, and challenges of service-oriented data architecture in detail.

Part 3: Lambda Architecture

The lambda architecture is a hybrid approach to data management that combines batch processing and real-time processing. This architecture enables organizations to process large amounts of data quickly and efficiently while also providing real-time insights into their data. However, it can also introduce additional complexity and require more resources to manage effectively. We will discuss the features, advantages, and challenges of lambda architecture in detail.

Part 4: Microservices Data Architecture

The microservices data architecture is a distributed approach to data management that uses small, modular services to manage data. This architecture enables organizations to scale their data systems more effectively and provides greater flexibility and agility. However, it can also introduce additional complexity and require more resources to manage effectively. We will discuss the features, advantages, and challenges of microservices data architecture in detail.

Part 5: Data Mesh Architecture

The data mesh architecture is a distributed, domain-oriented, and self-organizing approach to data management that aims to improve the scalability, agility, and flexibility of data systems. This architecture enables organizations to manage their data more effectively by decentralizing data ownership and governance and establishing clear data contracts between different domains. However, it can also introduce additional complexity and require more resources to manage effectively. We will discuss the features, advantages, and challenges of data mesh architecture in detail.

Conclusion

A data architecture is a critical component of any organization’s data management strategy. There are several different types of data architectures, each with its own strengths and weaknesses. By understanding the features, advantages, and challenges of each architecture, organizations can choose the one that best meets their data needs. From the simple and centralized monolithic data architecture to the distributed and self-organizing data mesh architecture, there is a data architecture that can support any organization’s data requirements.

Data Mesh: A New Paradigm for Managing Complex Data Systems

Data Mesh is a new paradigm for managing complex data systems that seeks to overcome the limitations of traditional centralized approaches. It is a distributed, domain-oriented, and self-organizing model that enables organizations to scale their data systems while maintaining agility, flexibility, and autonomy. In this article, we will provide an overview of the Data Mesh concept, its principles, and its benefits. We will also discuss the challenges and risks associated with implementing a Data Mesh architecture and provide some practical recommendations for organizations interested in adopting this paradigm.

In today’s digital world, data is the lifeblood of modern organizations. Companies use data to gain insights into their customers’ behavior, optimize their operations, and develop new products and services. However, as data volumes and complexity continue to grow, managing data has become a major challenge for many organizations. Traditional centralized approaches to data management, such as data warehouses and data lakes, are struggling to keep up with the pace of change and the growing demands for data access and agility. This is where Data Mesh comes in.

What is Data Mesh?

Data Mesh is a new paradigm for managing complex data systems that was introduced by Zhamak Dehghani, a principal consultant at ThoughtWorks. Data Mesh is a distributed, domain-oriented, and self-organizing model that seeks to overcome the limitations of traditional centralized approaches to data management.

The Data Mesh model is based on four key principles:

  1. Domain-oriented decentralized data ownership and architecture: In a Data Mesh system, data ownership and architecture are decentralized and domain-specific. Each domain is responsible for managing its own data and making it available to other domains as needed. This enables organizations to scale their data systems while maintaining agility, flexibility, and autonomy.
  2. Data as a product: In a Data Mesh system, data is treated as a product that is designed, built, and operated by dedicated data teams. These teams are responsible for ensuring the quality, reliability, and availability of the data products they create.
  3. Self-serve data infrastructure as a platform: In a Data Mesh system, data infrastructure is treated as a platform that enables self-serve data access and consumption. This platform provides a set of standardized APIs, tools, and services that enable data teams to create and manage their data products.
  4. Federated governance: In a Data Mesh system, governance is federated and domain-specific. Each domain is responsible for defining and enforcing its own governance policies and standards. This enables organizations to maintain consistency and compliance across their data systems while allowing for flexibility and autonomy at the domain level.

Benefits of Data Mesh

Data Mesh offers several benefits over traditional centralized approaches to data management. These include:

  1. Scalability: Data Mesh enables organizations to scale their data systems by decentralizing data ownership and architecture. This allows for more efficient data processing and faster data access.
  2. Agility: Data Mesh enables organizations to be more agile by empowering domain-specific teams to manage their own data. This reduces dependencies and enables faster decision-making.
  3. Flexibility: Data Mesh enables organizations to be more flexible by allowing for the use of different data technologies and tools within each domain. This enables teams to choose the best tools for their specific needs.
  4. Autonomy: Data Mesh enables organizations to maintain autonomy by allowing domain-specific teams to manage their own data and make their own decisions about data architecture, governance, and technology.

Challenges of Data Mesh

  1. Complexity:

Data Mesh architecture introduces additional complexity into the data system, which can be difficult to manage and understand. In a Data Mesh system, each domain is responsible for managing its own data, which can lead to duplication, inconsistency, and fragmentation of data across the organization. This can make it difficult to ensure data quality, maintain data lineage, and establish a common understanding of data across different domains.

  2. Integration:

Data Mesh architecture requires a high degree of integration between different domains to ensure data interoperability and consistency. However, integrating data across different domains can be challenging, as it requires establishing common data models, APIs, and protocols that are agreed upon by all domains. This can be time-consuming and resource-intensive, especially if there are multiple data sources and technologies involved.

  3. Governance:

Data Mesh architecture introduces a federated governance model, where each domain is responsible for defining and enforcing its own governance policies and standards. While this approach allows for more autonomy and flexibility at the domain level, it can also lead to inconsistencies and conflicts in data governance across the organization. Establishing a common set of governance policies and standards that are agreed upon by all domains can be challenging, especially if there are different regulatory requirements and data privacy concerns.

Risks of Data Mesh

  1. Data Security:

Data Mesh architecture requires a high degree of data sharing and collaboration between different domains, which can increase the risk of data breaches and unauthorized access. Ensuring data security and privacy across different domains can be challenging, especially if there are different security protocols and access controls in place. Organizations need to establish a robust data security framework that addresses the specific security requirements of each domain and ensures that data is protected at all times.

  2. Data Ownership:

Data Mesh architecture introduces a decentralized data ownership model, where each domain is responsible for managing its own data. While this approach enables more autonomy and flexibility at the domain level, it can also lead to disputes over data ownership and control. Establishing clear data ownership and control policies that are agreed upon by all domains can help mitigate this risk and ensure that data is used appropriately and ethically.

  3. Vendor Lock-in:

Data Mesh architecture requires a high degree of flexibility and interoperability between different technologies and platforms. However, using multiple vendors and technologies can increase the risk of vendor lock-in, where organizations become dependent on a specific vendor or technology for their data needs. Organizations need to establish a vendor management strategy that ensures they have the flexibility to switch vendors and technologies as needed without disrupting their data systems.

Conclusion

Data Mesh architecture offers many benefits, including improved scalability, agility, and flexibility of data systems. However, it also presents several challenges and risks that organizations need to consider before adopting this approach. Organizations need to establish a clear data governance framework, address data security and privacy concerns, establish clear data ownership and control policies, and develop a vendor management strategy that ensures they have the flexibility to switch vendors and technologies as needed. By addressing these challenges and risks, organizations can successfully implement a Data Mesh architecture that enables them to effectively manage their complex data systems.

RBAC in Azure and how to consult the configuration

RBAC (role-based access control) is a security feature used to control access based on user roles in an organization, that is, according to each user's functions within the organization. In large organizations it is a classic way to organize permissions, based on the competences, authority, and responsibility of each job.

One attribute of RBAC is its dynamism: access control is granted to a role, and a person's membership in that role can change over time, as can the permissions associated with the role. This contrasts with classical access methods, where permissions are granted or revoked user by user, object by object.

Azure provides an RBAC implementation for its resources, along with a number of predefined roles. Roles in Azure can be assigned to users, groups, and applications, at the level of subscriptions, resource groups, or individual resources. As you can see, the options are vast.

20160524_RBAC_AZURE_Paso01

There are three basic roles: Owner, Contributor, and Reader. The Owner has full access to resources, including permission to delegate access to others. The Contributor has the same access as the Owner but cannot grant access to others. The Reader can only view resources.

From these three roles derive further sets of roles for specific resources. In this link there is a full list of Azure built-in roles and their functions.

However, you can create as many roles with custom permissions as necessary. They can be created via Azure PowerShell, the Azure command-line interface (CLI), or the REST API. In this link you have more information and examples of how to do it.
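
As a sketch with the current Azure CLI (az; this post predates it, and the original examples used Azure PowerShell and the classic CLI), a custom role can be created from a JSON definition. The role name, actions, and subscription ID below are placeholders.

# describe the custom role in a JSON file
cat > customrole.json <<'EOF'
{
  "Name": "VM Operator (custom)",
  "Description": "Can start and restart virtual machines",
  "Actions": [
    "Microsoft.Compute/virtualMachines/start/action",
    "Microsoft.Compute/virtualMachines/restart/action"
  ],
  "AssignableScopes": ["/subscriptions/00000000-0000-0000-0000-000000000000"]
}
EOF

# register the role definition in the subscription
az role definition create --role-definition @customrole.json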

Access to the list of permissions for each role

One way to check the permissions of each role is through the Azure portal. Open a subscription, resource group, or resource, and you will see an icon that looks like two people at the top right:

20160524_RBAC_AZURE_Paso02

Select it and the Users panel appears. Click Roles:

20160524_RBAC_AZURE_Paso03

And the list of available roles will appear:

20160524_RBAC_AZURE_Paso04

Select the role whose permissions you want to check, and its Members tab appears, with a button to see the list of permissions:

20160524_RBAC_AZURE_Paso05

Once on the list, we can expand the information for each group of actions by clicking on the corresponding entry:

20160524_RBAC_AZURE_Paso06

And, within each group, each individual action:

20160524_RBAC_AZURE_Paso08

At this level, the information icon is useful to learn more about each entry, with an explanation of what each action represents:

20160524_RBAC_AZURE_Paso09
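
If you prefer the command line to the portal, here is a sketch with the current Azure CLI that lists the actions a built-in role allows (the role name is just an example):

az role definition list --name "Contributor" --query "[].permissions[].actions" --output json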

To learn more about how to create, delete, or list the members of each role, you can consult the following link.

Load balancing two Azure WebApps with nginx

In the previous post we saw how to install an nginx server. One of nginx's capabilities is to act as a powerful proxy server, which can be used as a load balancer. In this post we will see how to use it to balance the load between two WebApps (there could be as many as necessary). This scenario has a peculiarity that requires slightly modifying the normal procedure for this operation.

We start from a Linux machine with nginx installed, as seen in the previous post.

In addition, we will create two simple WebApps, each with a message that differentiates it, as shown in the following images:

20160505_NGINX_WebAPP_Paso02

20160505_NGINX_WebAPP_Paso03

Then we will set up nginx following the normal guidelines. Enter the Linux server console and edit the configuration file, with nano for example:

sudo nano /etc/nginx/nginx.conf

And modify the configuration so it looks like the following code:

user www-data;
worker_processes auto;
pid /run/nginx.pid;

events {
     worker_connections 768;
     # multi_accept on;
}

http {
     upstream bloqueprimerproxy {
          server xxURL1xx.azurewebsites.net;
          server xxURL2xx.azurewebsites.net;
     }

     server {
          listen 80;
          server_name   localhost;

          location / {
               proxy_pass http://bloqueprimerproxy;
               proxy_set_header  X-Real-IP  $remote_addr;
          }
     }
}

Where xxURL1xx.azurewebsites.net and xxURL2xx.azurewebsites.net are the URLs of the two WebAPPs to balance.

We save the file and restart the nginx service:

sudo service nginx restart

The above configuration would be the normal way to balance two webs with nginx. But if we try it now, we get the following error:

20160505_NGINX_WebAPP_Paso01

This is because Azure App Service uses ARR (Application Request Routing) cookies and identifies each site by its host name. You need to ensure that the proxy sends the correct Host header to each WebApp so that the request is identified correctly.

To do this, we edit the configuration file again and leave it as follows:

user www-data;
worker_processes auto;
pid /run/nginx.pid;

events {
     worker_connections 768;
     # multi_accept on;
}

http {
     upstream bloqueprimerproxy {
         # entry pool: balance between the two local proxy blocks below
         server localhost:8001;
         server localhost:8002;
     }

     upstream servidor1 {
         server xxURL1xx.azurewebsites.net;
     }

     upstream servidor2 {
         server xxURL2xx.azurewebsites.net;
     }

     server {
          listen 80;
          server_name   localhost;

          location / {
               proxy_pass http://bloqueprimerproxy;
               proxy_set_header    X-Real-IP    $remote_addr;
          }
     }

     server {
          listen 8001;
          server_name   servidor1;

          location / {
               # send the WebApp its real host name so that Azure App Service
               # routes the request to the right site
               proxy_set_header Host xxURL1xx.azurewebsites.net;
               proxy_pass http://servidor1;
          }
     }

     server {
          listen 8002;
          server_name   servidor2;

          location / {
               proxy_set_header Host xxURL2xx.azurewebsites.net;
               proxy_pass http://servidor2;
          }
     }
}

Where, as before, xxURL1xx.azurewebsites.net and xxURL2xx.azurewebsites.net are the URLs of the two WebApps to balance.

In this configuration we apply a double proxy: incoming traffic is balanced across ports 8001 and 8002 of the same nginx instance, and each of those ports forwards to one of the WebApps, setting the Host header to that WebApp's real URL.

After saving the file and restarting the nginx service, if we navigate to the nginx server we will see that requests are balanced from one web to the other without problems.
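
To confirm the balancing from a console, you can request the page repeatedly and watch the distinguishing message alternate. A sketch, assuming you run it on the nginx server itself:

# adjust the grep pattern to the distinguishing message you gave each WebApp
for i in 1 2 3 4; do curl -s http://localhost/ | grep -o 'WebAPP [12]'; done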

To learn more about the balancing modes available in nginx, you can see this link.


Installing Nginx on an Azure Linux Ubuntu 16.04 VM

In this post we will see how to install nginx on an Ubuntu 16.04 LTS Linux virtual machine in Azure. nginx is one of the best HTTP servers and reverse proxies, and it can also act as an IMAP/POP3 proxy. It is open source.

Let’s assume that we have already deployed the Linux virtual machine in a basic state. Otherwise, in summary, the steps are:

– Create a virtual machine from the gallery with Ubuntu 16.04. You can see my post about creating a Linux VM.
– Change the default SSH port. You have instructions for doing it in Azure in my post about it.
– Upgrade the system by connecting to a console session and running:

sudo apt-get update
sudo apt-get upgrade

This step is always recommended before installing a package (except on production servers with packages already in production, where you have to consider whether or not it is convenient).

Since we are going to install an HTTP server, if you have a previous HTTP server such as Apache, you should uninstall it to prevent conflicts.

Once the machine is ready, install nginx from the SSH console by running:

sudo apt-get install nginx

And finally we start the nginx service with:

sudo systemctl start nginx

Check that the service is active with:

sudo service nginx status

It provides service information that will be similar to the following screen:

20160505_Install_NGINX_Paso02

We now have nginx installed with its default settings, listening on port 80. If we browse to the machine through that port, the following page appears:

20160505_Install_NGINX_Paso03
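
You can also check from the SSH console itself, without a browser, that nginx is answering:

curl -I http://localhost
# the response headers should include something like "Server: nginx/1.10.0 (Ubuntu)"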

You can find more information about nginx at this link.

The 25 largest files with PowerShell

We have all run out of space on our hard disks more than once, usually at the most inopportune moment, and had to set about deleting something in order to keep working.

Here I present a small PowerShell script with which you can find the largest files on your disk. It accepts two parameters:

  • The path where we want to check the size of the files. It can be an entire drive such as c:\ or, which is often more effective, a specific path such as the path where we keep our documents. The script searches recursively in the specified folder and its subfolders.
  • The number of files to display, starting with the largest. Normally the 25 or 50 largest are more than enough.

To use the script, copy the following code:

# ask for the path to scan and the number of files to show
$Ruta = Read-Host 'Please, enter the path'
$NumFicheros = Read-Host 'Number of files to return'

# list all files recursively (including hidden ones), project the full path and the size in MB,
# sort by size in descending order and keep only the first $NumFicheros entries
Get-ChildItem -Path $Ruta -WarningAction SilentlyContinue -ErrorAction SilentlyContinue -Recurse -Force -File |
 Select-Object @{Name='Ruta';Expression={($_.FullName)}},@{Name='Tamaño';Expression={($_.Length/1MB)}} |
 Sort-Object -Property Tamaño -Descending |
 Select-Object -First $NumFicheros | Format-Table Ruta, {$_.Tamaño.ToString("000000.00")} -HideTableHeaders
pause

Paste it into an empty text file and name it 25Ficheros.ps1.

The important thing is that it has the .ps1 extension. You also need PowerShell installed on your system; if you have Windows 10, it is already included. If not, you can install it following this link.

To execute it, right-click on the file you created and select the option Run with PowerShell.

20160420_25Ficheros_Paso01

If this is the first time you execute a PowerShell script, you will be asked whether you want to change the execution policy, since script execution is not allowed by default. Answer yes, and you will then be prompted for the two execution parameters.

20160420_25Ficheros_Paso02
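
If you prefer to launch it from a console without changing the machine-wide execution policy, you can bypass it for a single run (a sketch; adjust the path to wherever you saved the file):

powershell -ExecutionPolicy Bypass -File .\25Ficheros.ps1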

Enter both values, pressing ENTER after each one. The script will begin its work and, after a moment (the broader the path searched, the longer it takes), you will get the results in two columns: on the left, the file name with its full path, and on the right, the size of the file in megabytes.

20160420_25Ficheros_Paso03

For example, for my directory C:\Windows\System32 (and all subfolders), these are my top 25 files:

20160420_25Ficheros_Paso04

Press Enter again to close the window.

I hope you find the script useful. For any comments, do not hesitate to contact me via the social network links or email.


Assign the OneDrive for Business folder to a removable drive

OneDrive, Microsoft's cloud storage service, currently offers 1 TB of storage in its business edition. That is a lot of space for regular use. However, unlike the personal version, it does not allow you to place the local folder on a removable drive: by default, the business version stores files under the user's path or, if you modify it, in a path on a non-removable drive. In the personal version, during the installation process, you can select a local folder either on or off a removable drive.

In Microsoft's words, the two OneDrives are really different products that share a name, hence the different behavior.

It might not seem like a problem at first, but with a terabyte of possible storage, on devices such as Windows tablets with 32 GB or 64 GB storage units, moving 10 or 20 gigabytes to a memory card can mean the difference between being able to use the system or not.

In such cases, an SD card permanently inserted in the tablet is essential as a support unit, and it is ideal as the local OneDrive storage location.

The following solution makes this possible, but I must clarify that, although functional, it is not an official Microsoft solution, with everything that implies. I am not responsible for any problems that may arise, either.

The first step is to create a folder on a non-removable drive of our system, for example at the root of C:, called SD:


20160406_OnedriveSD_Paso01


The folder must be empty to continue, so do not copy anything into it. Now open Disk Management and look for the removable drive. Right-click on it and select the option to change the drive letter and paths:

20160406_OnedriveSD_Paso02

Click on the option to add a path:

20160406_OnedriveSD_Paso03


Select the folder created at the beginning and press OK:

20160406_OnedriveSD_Paso04


Now the folder is a mount point for the removable drive. We just have to unlink OneDrive, if it is linked, and link it again, changing the local path during the initialization process to use the folder (not the removable drive). OneDrive places no constraint on the use of that folder and begins synchronizing normally. Files are sent to the removable drive and take up no space on the non-removable one.
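
The same mount point can also be created from an elevated command prompt with mountvol; a sketch, where the volume GUID is a placeholder you must take from the output of running mountvol with no arguments:

mountvol
mountvol C:\SD \\?\Volume{xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}\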

If we want to remove the drive, we must unlink the OneDrive account first.

Changing the SSH and XRDP ports in an Azure Linux virtual machine


A basic security recommendation is to change a system's default connection ports for the various available communications services. Let’s see how to change the SSH and XRDP ports on an Azure Linux virtual machine.

Changing the SSH port

Immediately after creating the virtual machine, the default SSH port is 22. You can connect to the machine on that port through its public IP or DNS with a client like PuTTY. Edit the configuration file, with nano for example:

sudo nano /etc/ssh/sshd_config

And change where it says Port 22 to the value we want (e.g., I used 40167):

20160401_CambioSSH_Paso02

Now restart the SSH service by running:

sudo service ssh restart

We close the current remote session, which still goes through port 22. Now we need to edit the security rule in the virtual machine's control panel to reflect the port change. To do this, look for the machine in your Azure subscription; in my case it is called f23uh4733:

20160401_CambioSSH_Paso04

Click on the inbound security rules option:

20160401_CambioSSH_Paso05

And we double click on the current rule for port 22:

20160401_CambioSSH_Paso06

Modify the port value from 22 to the port defined in the configuration file:

20160401_CambioSSH_Paso07

Press Save after the modification. The rule will take a few seconds to be applied.
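
The same rule change can be made from the command line; a sketch with the current Azure CLI (the resource group, NSG, and rule names are placeholders to replace with your own):

az network nsg rule update --resource-group myRG --nsg-name myVM-nsg --name default-allow-ssh --destination-port-ranges 40167

Once the rule is applied, reconnect specifying the new port:

ssh -p 40167 youruser@your-vm-ip-or-dns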

Installing a remote desktop and changing the XRDP port

Now we will install a remote desktop. This will be necessary if the Linux image is a server image, for example. Keep in mind that since Ubuntu 12.04 LTS, xrdp does not support the GNOME desktop, so we’ll use xfce.

First we install xrdp, executing the following command at the terminal:

sudo apt-get install xrdp

20160401_CambioSSH_Paso08

After the installation of xrdp, we must install xfce, running the command:

sudo apt-get install xfce4

20160401_CambioSSH_Paso09

The next step is to configure xrdp to use xfce. Run the following command:

echo xfce4-session >~/.xsession

20160401_CambioSSH_Paso10

Once the desktop is installed, we will change the default remote connection port. We use an editor, for example nano, to modify the xrdp configuration file. Run the command:

sudo nano /etc/xrdp/xrdp.ini

And modify the port to the desired value, in this example port 40168:

20160401_CambioSSH_Paso12

We save the changes and restart the xrdp service for them to take effect, using the following command:

sudo service xrdp restart

20160401_CambioSSH_Paso13

Once the port is configured, as before, we need to create the security rule that allows us access. To do this, return to the list of inbound rules and click the Add button:

20160401_CambioSSH_Paso14

And we add a rule indicating the destination port that we have set in the previous step:

20160401_CambioSSH_Paso15

Press the Save button and wait for the rule to apply. Afterwards, we can open a remote desktop connection to the machine on that port:

20160401_CambioSSH_Paso16

We have to identify ourselves with a UNIX user. If you have not created one, the administrator user will do:

20160401_CambioSSH_Paso17

And we access the Linux machine’s desktop:

20160401_CambioSSH_Paso18


Azure VM’s public address

Each virtual machine we deploy in Azure has, by default, a public IP assigned, through which we can access it. You can later modify the access ports as well as restrict, in certain cases, public access.

IP and DNS of a virtual machine

To find the public IP of a virtual machine created in the ARM model, open the machine’s panel from the list of virtual machines:

20160326_IPDNSPublico_Paso01

The public IP appears in the main panel and, if it was configured, its DNS name. If the DNS appears undefined, you can specify one by clicking on the link:

20160326_IPDNSPublico_Paso02

In the Public IP panel, we can see the address and easily copy both IP and DNS.

20160326_IPDNSPublico_Paso03

If you click Settings, you will access the specific IP options. We can assign a static IP to the virtual machine (it is dynamic by default) and define a DNS name within our geographic region’s domain:

20160326_IPDNSPublico_Paso04
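
The same information can be consulted, and the allocation changed, from the command line; a sketch with the current Azure CLI, where the resource group and resource names are placeholders:

# show the public IPs assigned to the VM
az vm list-ip-addresses --resource-group myRG --name myVM --output table

# switch the public IP from dynamic to static allocation
az network public-ip update --resource-group myRG --name myVMPublicIP --allocation-method Static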

Once the changes are saved, they are applied within seconds and the address will be publicly available.