Automation | Cloud | DevOps | Containers | OpenShift | Kubernetes | GCP | Linux

Basics of Architecture, OS and Tool Sets

Wednesday, January 3, 2018

Best Practices: Optimizing the Number of Application Servers


A production stack commonly includes multiple application servers distributed across multiple Availability Zones. However, the number of incoming requests can vary substantially depending on the time of day or the day of the week. You could just run enough servers to handle the maximum anticipated load, but then much of the time you will end up paying for more server capacity than you need. To run your site efficiently, the recommended practice is to match the number of servers to the current request volume.
AWS OpsWorks Stacks provides three ways to manage the number of server instances.
  • 24/7 instances are started manually and run until they are manually stopped.
  • Time-based instances are automatically started and stopped by AWS OpsWorks Stacks on a user-specified schedule.
  • Load-based instances are automatically started and stopped by AWS OpsWorks Stacks when they cross a threshold for a user-specified load metric such as CPU or memory utilization.
Note
After you have created and configured your stack's time and load-based instances, AWS OpsWorks Stacks automatically starts and stops them based on the specified configuration. You don't have to touch them again unless you decide to change the configuration or number of instances.
Recommendation: If you are managing stacks with more than a few application server instances, we recommend using a mix of all three instance types. The following example shows how to manage a stack's server capacity for a variable daily request volume with these characteristics.
  • The average request volume varies sinusoidally over the day.
  • The minimum average request volume requires five application server instances.
  • The maximum average request volume requires sixteen application server instances.
  • Spikes in request volume can usually be handled by one or two application server instances.
This is a convenient model for the purposes of discussion, but you can easily adapt it to any variation in request volume and also extend it to handle weekly variations. The following diagram shows how to use the three instance types to manage this request volume.
This example has the following characteristics:
  • The stack has three 24/7 instances, which are always on and handle the base load.
  • The stack has 12 time-based instances, which are configured to handle the average daily variation.
    One runs from 10 PM to 2 AM, two more run from 8 PM to 10 PM and 2 AM to 4 AM, and so on. For simplicity, the diagram modifies the number of time-based instances every two hours, but you can modify the number every hour if you want finer-grained control.
  • The stack has enough load-based instances to handle traffic spikes that exceed what can be handled by the 24/7 and time-based instances.
    AWS OpsWorks Stacks starts load-based instances only when the load across all of the currently running servers exceeds the specified metrics. The cost for nonrunning instances is minimal (Amazon EBS-backed instances) or nothing (instance store-backed instances), so the recommended practice is to create enough of them to comfortably handle your maximum anticipated request volumes. For this example, the stack should have at least three load-based instances.
Note
Make sure you have all three instance types distributed across multiple Availability Zones to mitigate the impact of any service disruptions.
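If you manage stack configuration with scripts, both scaling types can also be set up through the AWS OpsWorks Stacks API. The following is a minimal sketch using the AWS SDK for Python (boto3); the region, layer ID, instance ID, thresholds, and schedule are placeholder assumptions, not values taken from the example above.

  import boto3

  opsworks = boto3.client("opsworks", region_name="us-east-1")

  # Enable load-based scaling for a layer (IDs are placeholders).
  # One instance is started when average CPU across the layer exceeds 80%,
  # and one is stopped again when it falls below 30%.
  opsworks.set_load_based_auto_scaling(
      LayerId="my-app-layer-id",
      Enable=True,
      UpScaling={"InstanceCount": 1, "ThresholdsWaitTime": 5,
                 "IgnoreMetricsTime": 5, "CpuThreshold": 80.0},
      DownScaling={"InstanceCount": 1, "ThresholdsWaitTime": 10,
                   "IgnoreMetricsTime": 10, "CpuThreshold": 30.0},
  )

  # Schedule a time-based instance to run from 10 PM to 2 AM.
  # Hours are given in UTC, as "on" entries per weekday.
  opsworks.set_time_based_auto_scaling(
      InstanceId="my-instance-id",
      AutoScalingSchedule={
          "Monday": {"22": "on", "23": "on"},
          "Tuesday": {"0": "on", "1": "on"},
      },
  )

After this one-time configuration, AWS OpsWorks Stacks handles the starting and stopping on its own, exactly as described in the note above.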

Tuesday, January 2, 2018


Big data architecture style



In this article

  1. When to use this architecture
  2. Benefits
  3. Challenges
  4. Best practices
A big data architecture is designed to handle the ingestion, processing, and analysis of data that is too large or complex for traditional database systems.
Big data solutions typically involve one or more of the following types of workload:
  • Batch processing of big data sources at rest.
  • Real-time processing of big data in motion.
  • Interactive exploration of big data.
  • Predictive analytics and machine learning.
Most big data architectures include some or all of the following components:
  • Data sources: All big data solutions start with one or more data sources. Examples include:
    • Application data stores, such as relational databases.
    • Static files produced by applications, such as web server log files.
    • Real-time data sources, such as IoT devices.
  • Data storage: Data for batch processing operations is typically stored in a distributed file store that can hold high volumes of large files in various formats. This kind of store is often called a data lake. Options for implementing this storage include Azure Data Lake Store or blob containers in Azure Storage.
  • Batch processing: Because the data sets are so large, often a big data solution must process data files using long-running batch jobs to filter, aggregate, and otherwise prepare the data for analysis. Usually these jobs involve reading source files, processing them, and writing the output to new files. Options include running U-SQL jobs in Azure Data Lake Analytics, using Hive, Pig, or custom Map/Reduce jobs in an HDInsight Hadoop cluster, or using Java, Scala, or Python programs in an HDInsight Spark cluster (a minimal PySpark sketch of such a job appears after this list).
  • Real-time message ingestion: If the solution includes real-time sources, the architecture must include a way to capture and store real-time messages for stream processing. This might be a simple data store, where incoming messages are dropped into a folder for processing. However, many solutions need a message ingestion store to act as a buffer for messages, and to support scale-out processing, reliable delivery, and other message queuing semantics. Options include Azure Event Hubs, Azure IoT Hub, and Kafka.
  • Stream processing: After capturing real-time messages, the solution must process them by filtering, aggregating, and otherwise preparing the data for analysis. The processed stream data is then written to an output sink. Azure Stream Analytics provides a managed stream processing service based on perpetually running SQL queries that operate on unbounded streams. You can also use open source Apache streaming technologies like Storm and Spark Streaming in an HDInsight cluster (a streaming sketch appears below, after the Azure service categories).
  • Analytical data store: Many big data solutions prepare data for analysis and then serve the processed data in a structured format that can be queried using analytical tools. The analytical data store used to serve these queries can be a Kimball-style relational data warehouse, as seen in most traditional business intelligence (BI) solutions. Alternatively, the data could be presented through a low-latency NoSQL technology such as HBase, or an interactive Hive database that provides a metadata abstraction over data files in the distributed data store. Azure SQL Data Warehouse provides a managed service for large-scale, cloud-based data warehousing. HDInsight supports Interactive Hive, HBase, and Spark SQL, which can also be used to serve data for analysis.
  • Analysis and reporting: The goal of most big data solutions is to provide insights into the data through analysis and reporting. To empower users to analyze the data, the architecture may include a data modeling layer, such as a multidimensional OLAP cube or tabular data model in Azure Analysis Services. It might also support self-service BI, using the modeling and visualization technologies in Microsoft Power BI or Microsoft Excel. Analysis and reporting can also take the form of interactive data exploration by data scientists or data analysts. For these scenarios, many Azure services support analytical notebooks, such as Jupyter, enabling these users to leverage their existing skills with Python or R. For large-scale data exploration, you can use Microsoft R Server, either standalone or with Spark.
  • Orchestration: Most big data solutions consist of repeated data processing operations, encapsulated in workflows, that transform source data, move data between multiple sources and sinks, load the processed data into an analytical data store, or push the results straight to a report or dashboard. To automate these workflows, you can use an orchestration technology such as Azure Data Factory or Apache Oozie and Sqoop.
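To make the batch processing component concrete, here is a minimal PySpark sketch of the kind of long-running job described above, as it might run on an HDInsight Spark cluster. The storage paths, field names, and aggregation are illustrative assumptions rather than part of any particular solution.

  from pyspark.sql import SparkSession
  from pyspark.sql import functions as F

  spark = SparkSession.builder.appName("daily-log-aggregation").getOrCreate()

  # Read raw web server logs from the data lake (path is a placeholder).
  logs = spark.read.json("adl://mydatalake.azuredatalakestore.net/raw/weblogs/")

  # Filter and aggregate the source data to prepare it for analysis.
  daily_counts = (logs
      .filter(F.col("status") == 200)
      .groupBy(F.to_date("timestamp").alias("day"), "url")
      .count())

  # Write the prepared output to new files in a queryable format.
  daily_counts.write.mode("overwrite").parquet(
      "adl://mydatalake.azuredatalakestore.net/curated/daily_counts/")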
Azure includes many services that can be used in a big data architecture. They fall roughly into two categories:
  • Managed services, including Azure Data Lake Store, Azure Data Lake Analytics, Azure SQL Data Warehouse, Azure Stream Analytics, Azure Event Hubs, Azure IoT Hub, and Azure Data Factory.
  • Open source technologies based on the Apache Hadoop platform, including HDFS, HBase, Hive, Pig, Spark, Storm, Oozie, Sqoop, and Kafka. These technologies are available on Azure in the Azure HDInsight service.
These options are not mutually exclusive, and many solutions combine open source technologies with Azure services.
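As a companion to the stream processing component above, the sketch below uses Spark Structured Streaming, which is available in HDInsight Spark clusters, to consume messages from Kafka and aggregate them over one-minute windows. The broker address and topic are assumptions, and the console sink stands in for a real output sink such as an analytical store.

  from pyspark.sql import SparkSession
  from pyspark.sql import functions as F

  spark = SparkSession.builder.appName("stream-aggregation").getOrCreate()

  # Subscribe to an unbounded stream of messages from Kafka.
  stream = (spark.readStream.format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "telemetry")
      .load())

  # Count events per one-minute window; the "timestamp" column is
  # attached to each record by the Kafka source.
  counts = (stream
      .groupBy(F.window(F.col("timestamp"), "1 minute"))
      .count())

  # Write the running aggregate to the console; a real solution would
  # target an analytical data store or another event hub instead.
  query = counts.writeStream.outputMode("complete").format("console").start()
  query.awaitTermination()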

When to use this architecture

Consider this architecture style when you need to:
  • Store and process data in volumes too large for a traditional database.
  • Transform unstructured data for analysis and reporting.
  • Capture, process, and analyze unbounded streams of data in real time, or with low latency.
  • Use Azure Machine Learning or Microsoft Cognitive Services.

Benefits

  • Technology choices. You can mix and match Azure managed services and Apache technologies in HDInsight clusters, to capitalize on existing skills or technology investments.
  • Performance through parallelism. Big data solutions take advantage of parallelism, enabling high-performance solutions that scale to large volumes of data.
  • Elastic scale. All of the components in the big data architecture support scale-out provisioning, so that you can adjust your solution to small or large workloads, and pay only for the resources that you use.
  • Interoperability with existing solutions. The components of the big data architecture are also used for IoT processing and enterprise BI solutions, enabling you to create an integrated solution across data workloads.

Challenges

  • Complexity. Big data solutions can be extremely complex, with numerous components to handle data ingestion from multiple data sources. It can be challenging to build, test, and troubleshoot big data processes. Moreover, there may be a large number of configuration settings across multiple systems that must be used in order to optimize performance.
  • Skillset. Many big data technologies are highly specialized, and use frameworks and languages that are not typical of more general application architectures. On the other hand, big data technologies are evolving new APIs that build on more established languages. For example, the U-SQL language in Azure Data Lake Analytics is based on a combination of Transact-SQL and C#. Similarly, SQL-based APIs are available for Hive, HBase, and Spark.
  • Technology maturity. Many of the technologies used in big data are evolving. While core Hadoop technologies such as Hive and Pig have stabilized, emerging technologies such as Spark introduce extensive changes and enhancements with each new release. Managed services such as Azure Data Lake Analytics and Azure Data Factory are relatively young, compared with other Azure services, and will likely evolve over time.
  • Security. Big data solutions usually rely on storing all static data in a centralized data lake. Securing access to this data can be challenging, especially when the data must be ingested and consumed by multiple applications and platforms.

Best practices

  • Leverage parallelism. Most big data processing technologies distribute the workload across multiple processing units. This requires that static data files are created and stored in a splittable format. Distributed file systems such as HDFS can optimize read and write performance, and the actual processing is performed by multiple cluster nodes in parallel, which reduces overall job times.
  • Partition data. Batch processing usually happens on a recurring schedule, for example weekly or monthly. Partition data files, and data structures such as tables, based on temporal periods that match the processing schedule (see the sketch after this list). That simplifies data ingestion and job scheduling, and makes it easier to troubleshoot failures. Also, partitioning tables that are used in Hive, U-SQL, or SQL queries can significantly improve query performance.
  • Apply schema-on-read semantics. Using a data lake lets you combine storage for files in multiple formats, whether structured, semi-structured, or unstructured. Use schema-on-read semantics, which project a schema onto the data when the data is being processed, not when it is stored. This builds flexibility into the solution, and prevents bottlenecks during data ingestion caused by data validation and type checking.
  • Process data in-place. Traditional BI solutions often use an extract, transform, and load (ETL) process to move data into a data warehouse. With larger volumes of data, and a greater variety of formats, big data solutions generally use variations of ETL, such as transform, extract, and load (TEL). With this approach, the data is processed within the distributed data store, transforming it to the required structure, before moving the transformed data into an analytical data store.
  • Balance utilization and time costs. For batch processing jobs, it's important to consider two factors: the per-unit cost of the compute nodes, and the per-minute cost of using those nodes to complete the job. For example, a batch job may take eight hours with four cluster nodes. However, it might turn out that the job uses all four nodes only during the first two hours, and after that, only two nodes are required. In that case, running the entire job on two nodes would increase the total job time, but would not double it, so the total cost would be less. In some business scenarios, a longer processing time may be preferable to the higher cost of using underutilized cluster resources.
  • Separate cluster resources. When deploying HDInsight clusters, you will normally achieve better performance by provisioning separate cluster resources for each type of workload. For example, although Spark clusters include Hive, if you need to perform extensive processing with both Hive and Spark, you should consider deploying separate dedicated Spark and Hadoop clusters. Similarly, if you are using HBase and Storm for low latency stream processing and Hive for batch processing, consider separate clusters for Storm, HBase, and Hadoop.
  • Orchestrate data ingestion. In some cases, existing business applications may write data files for batch processing directly into Azure storage blob containers, where they can be consumed by HDInsight or Azure Data Lake Analytics. However, you will often need to orchestrate the ingestion of data from on-premises or external data sources into the data lake. Use an orchestration workflow or pipeline, such as those supported by Azure Data Factory or Oozie, to achieve this in a predictable and centrally manageable fashion.
  • Scrub sensitive data early. The data ingestion workflow should scrub sensitive data early in the process, to avoid storing it in the data lake.
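As a brief illustration of the partitioning practice above, the following PySpark sketch writes output partitioned by year and month so that a monthly batch job reads only the slice it needs. The paths and column names are assumptions.

  from pyspark.sql import SparkSession
  from pyspark.sql import functions as F

  spark = SparkSession.builder.appName("partitioned-write").getOrCreate()

  events = spark.read.parquet(
      "adl://mydatalake.azuredatalakestore.net/curated/events/")

  # Partitioning by temporal columns that match the processing schedule
  # lets queries and downstream jobs prune whole directories.
  (events
      .withColumn("year", F.year("timestamp"))
      .withColumn("month", F.month("timestamp"))
      .write.partitionBy("year", "month")
      .mode("append")
      .parquet("adl://mydatalake.azuredatalakestore.net/curated/events_by_month/"))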

Deployment and Release Strategies & Best Practices in Cloud



In this article we’ll be covering various options and considerations for deploying code and releasing features. We’ll discuss patterns for deploying to a fixed set of servers as well as variations where multiple groups of servers can be utilized. We’ll wrap up with strategies for releasing features to targeted groups of users.
While some of the strategies provide better capabilities than others, this is not meant to be a review or comparison, but rather a guide to the options at hand. Every situation is unique and may require a different implementation based on its constraints.

  • Single Server Group Deployments

In many situations you may have a set of dedicated servers running your application. This may be a traditional data center where procuring new servers is difficult, a deployment to devices in the field such as point-of-sale terminals, or any other case where you need to deploy to a fixed set of servers in place.

  1. Highlander: 

The most traditional deployment pattern is the Highlander strategy. In this pattern all instances running a version of an application are upgraded to the new version at the same time. This is common for apps that don't have strict uptime requirements, such as lower-lifecycle development servers or hobby applications.
This is a simple but high-risk strategy that will impact all users, not only in the event of a failure but as part of the deployment itself. Even with a successful deploy, the servers will need to stop taking traffic when switching to the new code.

2. Canary Deployment: 

A safer pattern than Highlander is the Canary deployment, which deploys to only a small portion of the available servers. This pattern allows the new code to be introduced into a live environment and monitored for any aberrant behavior. Any issues with the code or deployment are limited to a smaller set of users.
While this pattern does provide a safer option, having multiple versions running for a length of time brings its own set of challenges, ranging from operational overhead to user-facing inconsistencies.

3. Rolling Deploy:

This is the safest pattern so far, limiting user downtime and impact, but it requires more sophisticated deployment tooling.

The rolling deploy is simply the continuation of the canary deploy. In this case you would update one server after another until your whole bank of servers has been upgraded.
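To make the mechanics concrete, here is a minimal sketch of a rolling deploy loop in Python. The server list, the load balancer and deployment hooks, and the /health endpoint are all assumptions standing in for your own tooling.

  import time
  import urllib.request

  SERVERS = ["app1.example.com", "app2.example.com", "app3.example.com"]

  def set_in_rotation(server, enabled):
      # Placeholder for a call to your load balancer's API.
      print(("adding " if enabled else "removing ") + server)

  def deploy(server, version):
      # Placeholder for your configuration management or deploy tool.
      print("deploying " + version + " to " + server)

  def healthy(server):
      try:
          with urllib.request.urlopen("http://%s/health" % server, timeout=5) as r:
              return r.getcode() == 200
      except OSError:
          return False

  def rolling_deploy(version):
      for server in SERVERS:
          set_in_rotation(server, False)   # drain traffic from this server
          deploy(server, version)
          deadline = time.time() + 300     # allow up to 5 minutes to pass checks
          while not healthy(server):
              if time.time() > deadline:
                  raise RuntimeError(server + " failed health check; halting rollout")
              time.sleep(10)
          set_in_rotation(server, True)    # restore traffic, move to the next

The health check gate is what makes the pattern safe: a bad build stops the rollout after one server instead of taking down the whole bank.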

  • Multiple Server Group Deployments

With the adoption of virtualization and cloud computing, the need to limit ourselves to a fixed set of servers is gone. Instead we can spin up whole new sets of servers whenever we choose.
This helps enable a deployment best practice where you separate the deployment from the release, or usage, of the deployment. For example, with multiple server groups you can deploy to a new set of servers but never activate it to receive user traffic.

1. Blue / Green

The Blue/Green pattern (or Red/Black depending on your camp) is the Highlander pattern for multiple server groups. In this strategy a new server group with the new version of code is stood up with no traffic. Once all the servers are ready, all the traffic is directed to the new bank of servers.
This technique allows rapid rollback in the event of a failure, since we've removed the deployment from the equation and are only directing traffic to one version or the other.
While this is a great example of separating the deploy from traffic, and while it provides rapid rollback, it is still an all-or-nothing switch.
As with the Highlander strategy, all users are impacted during the switch. Even though we don't need to deploy the code, applications often have a bit of startup time while connections are built and objects are cached, which will impact users.
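On AWS, for example, the traffic switch can be a single API call against an Application Load Balancer listener. A minimal boto3 sketch follows, where the listener and target group ARNs are placeholders; rolling back is the same call pointed at the blue group.

  import boto3

  elbv2 = boto3.client("elbv2")

  # Placeholder ARNs for your listener and the new ("green") server group.
  LISTENER_ARN = "arn:aws:elasticloadbalancing:us-east-1:123456789012:listener/app/my-lb/abc/def"
  GREEN_TG_ARN = "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/green/123"

  # Direct all traffic to the green group in one atomic change.
  elbv2.modify_listener(
      ListenerArn=LISTENER_ARN,
      DefaultActions=[{"Type": "forward", "TargetGroupArn": GREEN_TG_ARN}],
  )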

2. Canary with two groups

The Canary deploy with multiple groups works very similarly to the Canary in a single group. The main difference is that we've separated the deploy from the traffic.
The most straightforward way is to introduce a new group with one server and add the group to the load balancer. If you had three servers for version one, adding this would introduce a fourth server and direct a quarter of the traffic to the new instance.
This allows you to monitor the new code under live conditions before serving it to all your users.
A variation, depending on your capabilities, would be to deploy three new servers with the new version and direct a small percentage of all traffic to the new servers. This allows more fine-grained control over how many of the users are impacted and provides a warmup period for all the new servers.
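If your load balancer supports weighted target groups (Application Load Balancers do), this fine-grained variation can be expressed as listener weights. A sketch with boto3, using placeholder ARNs:

  import boto3

  elbv2 = boto3.client("elbv2")

  # Send roughly 5% of traffic to the canary group, 95% to the stable group.
  elbv2.modify_listener(
      ListenerArn="arn:aws:elasticloadbalancing:us-east-1:123456789012:listener/app/my-lb/abc/def",
      DefaultActions=[{
          "Type": "forward",
          "ForwardConfig": {
              "TargetGroups": [
                  {"TargetGroupArn": "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/stable/111",
                   "Weight": 95},
                  {"TargetGroupArn": "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/canary/222",
                   "Weight": 5},
              ]
          },
      }],
  )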

3. Rolling Deploy with two groups

Again, as with the single-group patterns, the rolling deploy for multiple groups is just a continuation of the Canary deploy. We deploy our code and servers in one step, then add traffic separately.
Here, though, we have to add a new technique: as we continue to add servers to the mix, we'll need to take servers out of rotation in the old group.
This would be an ideal strategy for a CI/CD stack where robust health checking and operational monitoring allow code to automatically roll live.

Feature Release Strategies

Much of the focus with deployment strategies is on the act of putting code into the environment. We've talked briefly about patterns that separate the deployment from the user traffic, but those were still focused on the code.
Multiple issues can crop up when focusing only on the code. User sessions may be dropped mid-stream, and users may see V1 of a page on one click, then V2 on refresh, and V1 again on yet another refresh.
In this section we’ll review strategies targeted toward the user experience.

Environment Separation

The most basic pattern for providing a consistent user experience and testing new code is to build a new environment or site. You may offer your users an option to try out the new site at http://beta.yourcompany.com. You might use this for a major redesign, or as part of your regular process where every code deploy goes to beta for a time before moving to production.
The clean separation allows for simple management and clear operation.

Feature Toggles

Feature toggles are a technique where both versions of your feature are included in the same code base, but are surrounded by logic to execute one or the other based on external factors, such as a property value or database switch.
This is a useful technique to separate the deploy from usage in any setup: multiple server groups, a single group, or legacy monoliths.
Ideally these are dynamic in nature, managed by a backend datastore. Operators would toggle a feature on or off by updating the setting in the database, not by deploying code or manipulating traffic.
This also acts as a safety shutoff: if some external dependency or service provider starts to impact your site, just flick the switch and shut off the feature that uses them.
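A minimal sketch of a database-backed toggle in Python follows, using SQLite as a stand-in for whatever backend datastore you actually run; the table layout and feature name are assumptions.

  import sqlite3

  conn = sqlite3.connect("toggles.db")
  conn.execute(
      "CREATE TABLE IF NOT EXISTS toggles (name TEXT PRIMARY KEY, enabled INTEGER)")

  def is_enabled(name, default=False):
      # Read the switch at request time so operators can flip it in the
      # database without deploying code or manipulating traffic.
      row = conn.execute(
          "SELECT enabled FROM toggles WHERE name = ?", (name,)).fetchone()
      return bool(row[0]) if row else default

  if is_enabled("new_checkout_flow"):
      print("render the new checkout")   # version 2 of the feature
  else:
      print("render the old checkout")   # version 1 of the feature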

User Targeted Testing

Feature toggles are useful, but by default they're all-or-nothing and don't provide the ability to test a new feature for a group of users.
Small enhancements to the Toggle pattern allow the switch to be related to users instead of the system as a whole.
For example, instead of the toggle using a database on/off value to show version A or B, you might utilize a cookie value. All users with a cookie value ending in an odd number would get one version of the feature, while those with an even number would get version 2 of the feature.
This technique can be as simple or as complicated as you wish. You might set a random cookie and split 50/50 as described above, or get more sophisticated and break down to smaller percentages. You might also begin to utilize user data, such as location, to target users on the east coast. You could even tap into the customer's profile to tailor their experience.
Entire companies and products are built around this technique but there is a lot of value in even the simplest implementation.
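The simplest implementation really can be a few lines: hash a stable cookie value and compare it to the rollout percentage. The cookie values and splits below are illustrative.

  import hashlib

  def in_test_group(cookie_value, percent_in_test):
      # Hashing spreads users evenly across 100 buckets, and the same
      # cookie always lands in the same bucket across requests.
      digest = hashlib.md5(cookie_value.encode("utf-8")).hexdigest()
      return int(digest, 16) % 100 < percent_in_test

  print(in_test_group("session-cookie-abc123", 50))  # 50/50 split
  print(in_test_group("session-cookie-abc123", 5))   # 5% rollout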

Conclusion

There are many techniques for deploying code into an environment. Depending on your use case one may fit better than another.
Balancing the technical complexity with the customer impact and overall business needs will ultimately drive which pattern works best for you.
