Monday, December 31, 2018

New Year Greetings 2019



GitHub vs. Bitbucket vs. AWS CodeCommit


GitHub

Description: GitHub is the best place to share code with friends, co-workers, classmates, and complete strangers. Over three million people use GitHub to build amazing things together.

Pros:
• Open source friendly
• Easy source control
• Nice UI
• Great for team collaboration
• Easy setup
• Issue tracker
• Remote team collaboration
• Great community
• Great way to share
• Pull request and features planning

Cons:
• Expensive for lone developers that want private repos
• Relatively slow product/feature release cadence
• Owned by Microsoft
• API scoping could be better

Bitbucket

Description: Bitbucket gives teams one place to plan projects, collaborate on code, test and deploy, all with free private Git repositories. Teams choose Bitbucket because it has a superior Jira integration, built-in CI/CD, and is free for up to 5 users.

Pros:
• Free private repos
• Simple setup
• Nice UI and tools
• Unlimited private repositories
• Affordable Git hosting
• Integrates with many APIs and services
• Reliable uptime
• Nice GUI
• Pull requests and code reviews
• Very customisable

Cons:
• Not much community activity
• Difficult to review PRs because of confusing UI
• Quite buggy
• Managed by enterprise Java company

AWS CodeCommit

Description: CodeCommit eliminates the need to operate your own source control system or worry about scaling its infrastructure. You can use CodeCommit to securely store anything from source code to binaries, and it works seamlessly with your existing Git tools.

Pros:
• Free private repos
• IAM integration
• Pay-as-you-go pricing
• Repo data encrypted at rest
• Amazon feels the most secure
• I can create a repository myself if I have an AWS account
• Faster deployments when using other AWS services
• AWS CodePipeline integration
• CodeBuild integration

Cons:
• Does not support webhooks yet
• UI sucks
• No fork

Distributed Version Control System

Snapshots, Not Differences - The major difference between Git and any other VCS (Subversion and friends included) is the way Git thinks about its data. Conceptually, most other systems store information as a list of file-based changes. These other systems (CVS, Subversion, Perforce, Bazaar, and so on) think of the information they store as a set of files and the changes made to each file over time (this is commonly described as delta-based version control). Git, by contrast, thinks of its data as a series of snapshots: every time you commit, Git records what all your files look like at that moment and stores a reference to that snapshot.
Nearly Every Operation Is Local - Most operations in Git need only local files and resources to operate — generally no information is needed from another computer on your network. If you’re used to a CVCS where most operations have that network latency overhead, this aspect of Git will make you think that the gods of speed have blessed Git with unworldly powers. Because you have the entire history of the project right there on your local disk, most operations seem almost instantaneous.
DVCS clients not only check out the latest snapshot of the directory, they also fully mirror the repository. If the server goes down, the repository from any client can be copied back to the server to restore it. Every checkout is a full backup of the repository. Git does not rely on a central server, which is why you can perform many operations while offline: you can commit changes, create branches, view logs, and more. You need a network connection only to publish your changes and pull in the latest changes from others.
DVCS Terminologies
Local Repository
Every VCS tool provides a private workspace as a working copy. Developers make changes in their private workspace, and after a commit these changes become part of the repository. Git takes this a step further by giving each developer a private copy of the whole repository. Users can perform many operations against this repository, such as adding, removing, renaming and moving files, committing changes, and many more.
Working Directory and Staging Area or Index
The working directory is the place where files are checked out. In other CVCSs, developers generally make modifications and commit their changes directly to the repository. Git uses a different strategy: it does not commit every modified file. When you commit, Git looks at the files present in the staging area (also called the index); only those files are included in the commit, not all of the modified files.

Friday, December 21, 2018

Snowflake Architecture

Snowflake’s architecture is a hybrid of traditional shared-disk database architectures and shared-nothing database architectures. Similar to shared-disk architectures, Snowflake uses a central data repository for persisted data that is accessible from all compute nodes in the data warehouse. But similar to shared-nothing architectures, Snowflake processes queries using MPP (massively parallel processing) compute clusters where each node in the cluster stores a portion of the entire data set locally. This approach offers the data management simplicity of a shared-disk architecture, but with the performance and scale-out benefits of a shared-nothing architecture.

Snowflake’s unique architecture consists of three key layers:
  • Database Storage
  • Query Processing
  • Cloud Services
Database Storage When data is loaded into Snowflake, Snowflake reorganizes that data into its internal optimized, compressed, columnar format. Snowflake stores this optimized data in cloud storage. Snowflake manages all aspects of how this data is stored — the organization, file size, structure, compression, metadata, statistics, and other aspects of data storage are handled by Snowflake. The data objects stored by Snowflake are not directly visible nor accessible by customers; they are only accessible through SQL query operations run using Snowflake.
Query Processing Query execution is performed in the processing layer. Snowflake processes queries using “virtual warehouses”. Each virtual warehouse is an MPP compute cluster composed of multiple compute nodes allocated by Snowflake from a cloud provider. Each virtual warehouse is an independent compute cluster that does not share compute resources with other virtual warehouses. As a result, each virtual warehouse has no impact on the performance of other virtual warehouses.
Cloud Services The cloud services layer is a collection of services that coordinate activities across Snowflake. These services tie together all of the different components of Snowflake in order to process user requests, from login to query dispatch. The cloud services layer also runs on compute instances provisioned by Snowflake from the cloud provider. Among the services in this layer:
  • Authentication
  • Infrastructure management
  • Metadata management
  • Query parsing and optimization
  • Access control

Tuesday, November 27, 2018

.NET Core 3

.NET Core is a free and open-source managed computer software framework for the Windows, macOS and Linux operating systems. It consists of CoreCLR, a complete runtime implementation of CLR, the virtual machine that manages the execution of .NET programs. CoreCLR comes with an improved just-in-time compiler, called RyuJIT. .NET Core also includes CoreFX, which is a partial fork of FCL. While .NET Core shares a subset of .NET Framework APIs, it comes with its own API that is not part of .NET Framework. Further, .NET Core contains CoreRT, the .NET Native runtime optimized to be integrated into AOT compiled native binaries. A variant of the .NET Core library is used for UWP. .NET Core's command-line interface offers an execution entry point for operating systems and provides developer services like compilation and package management.
.NET Core 3.0 adds support for Windows desktop applications, specifically Windows Forms, Windows Presentation Foundation (WPF), and UWP XAML. There are many benefits that .NET Core will bring to desktop applications. Some of them are mentioned below.
  • Performance improvements and other runtime updates
  • Easy to use or test a new version of .NET Core
  • Enables both machine-global and application-local deployment
  • Support for the .NET Core CLI tools and SDK-style projects in Visual Studio
New features in .NET Core 3.0
  • Side-by-side and App-local Deployment: The .NET Core deployment model is one of the biggest benefits that Windows desktop developers will experience with .NET Core 3. In short, you can install .NET Core in pretty much any way you want; it comes with a lot of deployment flexibility. Deployment of .NET Core desktop applications can either use a global install of the .NET Core runtime (similar to how .NET Framework is deployed), or side-by-side deployment so that each application uses its own version of the runtime.
  • Easily convert existing Desktop applications to .NET Core 3: The conversion of an existing desktop application to .NET Core 3.0 will be pretty straightforward.
  • Improvements to Project Files: .NET Core has adopted the SDK-based project structure, which offers many advantages:
    • Much smaller and cleaner project files
    • Much friendlier to source control (fewer changes and smaller diffs)
    • Edit project files in Visual Studio without unloading
    • NuGet is part of the build and responsive to changes like target framework update
    • Supports multi-targeting
  • Continue to support Controls, NuGet Packages, and Existing Assembly References: Desktop applications often have many dependencies, perhaps from a control vendor, from NuGet, or from binaries that no longer have source available. .NET Core 3.0 will continue to support these dependencies as-is, without requiring developers to rewrite functionality.

Monday, November 26, 2018

Splunk

Over the last decade, the growing number of machines in IT infrastructure and the use of IoT devices have driven exponential growth in machine data. This machine data holds a lot of valuable information that can drive efficiency, productivity and visibility for the business. Splunk was founded in 2003 for one purpose: to make sense of machine-generated log data. Splunk captures, indexes, and correlates real-time data in a searchable repository from which it can generate graphs, reports, alerts, dashboards, and visualizations. Splunk's mission is to make machine data accessible across an organization by identifying data patterns, providing metrics, diagnosing problems, and providing intelligence for business operations. Splunk is a horizontal technology used for application management, security and compliance, as well as business and web analytics.
Splunk's core collects and analyzes high volumes of machine-generated data. It uses a standard API to connect directly to applications and devices. It was developed in response to the demand for comprehensible and actionable data reporting for executives outside a company's IT department.
Splunk Enterprise Security (ES) is a security information and event management (SIEM) solution that provides insight into machine data generated from security technologies such as network, endpoint, access, malware, vulnerability and identity information. It's a premium application that is licensed independently from Splunk core.
Splunkbase is a community hosted by Splunk where users can go to find apps and add-ons for Splunk which can improve the functionality and usefulness of Splunk, as well as provide a quick and easy interface for specific use-cases and/or vendor products. Splunk apps and add-ons can be developed by anyone, including Splunk themselves.
Real-time processing is Splunk's biggest selling point; the other benefits of implementing Splunk are:
  • Your input data can be in almost any format, e.g. CSV, JSON or other formats
  • You can configure Splunk to send alerts/event notifications when a particular machine state occurs
  • You can accurately predict the resources needed for scaling up the infrastructure
  • You can create knowledge objects for Operational Intelligence

Monday, October 22, 2018

Big Data: Spark

Apache Spark is an open-source distributed general-purpose cluster-computing framework. Apache Spark has as its architectural foundation the resilient distributed dataset (RDD), a read-only multiset of data items distributed over a cluster of machines, that is maintained in a fault-tolerant way.
Spark uses cluster computing for its computational (analytics) power and works alongside distributed storage. This means it can use resources from many computer processors linked together for its analytics. It's a scalable solution: if more oomph is needed, you can simply introduce more processors into the system. With distributed storage, the huge datasets gathered for Big Data analysis can be stored across many smaller individual physical hard discs. This speeds up read/write operations, because the "head" which reads information from the discs has less physical distance to travel over the disc surface. As with processing power, more storage can be added when needed, and the fact it uses commonly available commodity hardware (any standard computer hard discs) keeps down infrastructure costs.
Unlike Hadoop, Spark does not come with its own file system - instead it can be integrated with many file systems including Hadoop's HDFS, MongoDB and Amazon's S3 system.
Another element of the framework is Spark Streaming, which allows applications to perform analytics on streaming, real-time data, such as automatically analyzing video or social media data on the fly.
In fast changing industries such as marketing, real-time analytics has huge advantages, for example ads can be served based on a user's behavior at a particular time, rather than on historical behavior, increasing the chance of prompting an impulse purchase.

Sunday, October 21, 2018

Snowflake : Cloud Data Warehouse

Snowflake allows corporate users to store and analyze data using cloud-based hardware and software. The data is stored in Amazon S3. It is built for speed, even with the most intense workloads. Its patented architecture separates compute from storage so you can scale up and down on the fly, without delay or disruption. You get the performance you need exactly when you need it.
Snowflake is a fully columnar database with vectorized execution, making it capable of addressing even the most demanding analytic workloads. Snowflake’s adaptive optimization ensures queries automatically get the best performance possible – no indexes, distribution keys or tuning parameters to manage. Snowflake can support unlimited concurrency with its unique multi-cluster, shared data architecture. This allows multiple compute clusters to operate simultaneously on the same data without degrading performance. Snowflake can even scale automatically to handle varying concurrency demands with its multi-cluster virtual warehouse feature, transparently adding compute resources during peak load periods and scaling down when loads subside.
Snowflake’s mission is to enable every organization to be data-driven with instant elasticity, secure data sharing and per-second pricing, across multiple clouds. Snowflake combines the power of data warehousing, the flexibility of big data platforms and the elasticity of the cloud at a fraction of the cost of traditional solutions.

Wednesday, April 25, 2018

Jest : Javascript Unit Testing Framework

Jest is a delightful JavaScript testing framework. It has the features below; a minimal example test follows the list.

  • Easy Setup: Complete and easy to set-up JavaScript testing solution. Works out of the box for any React project.
  • Instant Feedback: Fast interactive watch mode runs only test files related to changed files and is optimized to give signal quickly.
  • Snapshot Testing: Capture snapshots of React trees or other serializable values to simplify testing and to analyze how state changes over time.
  • Zero configuration testing platform: Jest is used by Facebook to test all JavaScript code including React applications. Jest is already configured when you use create-react-app or react-native init to create your React and React Native projects. Place your tests in a __tests__ folder, or name your test files with a .spec.js or .test.js extension. Whatever you prefer, Jest will find and run your tests.
  • Fast and sandboxed: Jest parallelizes test runs across workers to maximize performance. Console messages are buffered and printed together with test results. Sandboxed test files and automatic global state resets for every test so no two tests conflict with each other.
  • Built-in code coverage reports: Easily create code coverage reports using --coverage. No additional setup or libraries needed! Jest can collect code coverage information from entire projects, including untested files.
  • Powerful mocking library: Jest has a powerful mocking library for functions and modules. Mock React Native components using jest-react-native.
  • Works with TypeScript: Jest works with any compile-to-JavaScript language and integrates seamlessly with Babel and with TypeScript through ts-jest.
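
Here is a minimal sketch of a Jest test for a hypothetical sum module; Jest picks up the test file automatically because of its .test.js suffix.

// sum.js - a hypothetical module under test
function sum(a, b) {
  return a + b;
}
module.exports = sum;

// sum.test.js - Jest finds this file automatically via the .test.js suffix
const sum = require('./sum');

test('adds 1 + 2 to equal 3', () => {
  expect(sum(1, 2)).toBe(3);
});

Run it with npx jest (add --coverage for the built-in coverage report).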

Monday, April 16, 2018

Queues - RabbitMQ vs SQS

RabbitMQ
RabbitMQ supports powerful message routing via exchanges. This is very important when we need to run the same job on a specific server, a group of servers or all servers. Our application sends one message and the exchange routes it. RabbitMQ also has vhosts so that multiple applications can share the same RabbitMQ server but be isolated from each other (we can create unique logins for separate applications to access their vhosts). RabbitMQ can be set up in clusters for redundancy/failover and will acknowledge receipt of messages.
RabbitMQ has a powerful management GUI, accessible at http://localhost:15672/. Hosting for RabbitMQ offers fewer choices and is more expensive.

AWS SQS
AWS SQS takes care of managing our queues. The AWS SDK offers low-level access, while shoryuken is a higher-level integration (the shoryuken author acknowledges sidekiq as inspiration). Retrying failed jobs happens automatically unless the message is explicitly deleted. We can only delay jobs for 15 minutes (or we get The maximum allowed delay is 15 minutes (RuntimeError)). Recurring jobs are not supported, but there are workarounds with AWS Lambda and CloudWatch. The AWS SQS UI is decent and we can use the AWS SDK to access data directly. SQS has other interesting features such as long polling, batch operations and dead letter queues. SQS also has FIFO queues which guarantee the order in which messages are sent and received (and do not allow duplicates). However, FIFO queues only allow 300 TPS (much less than regular SQS). Shoryuken works with standard and FIFO queues.
Hosting - easy to set up and cheap (pay for what you use) but obviously only available on AWS. SQS is a great choice when we need to run LOTS of jobs or when we do not care about more advanced options such as scheduling.
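
As a sketch using the AWS SDK for JavaScript (v2), with a placeholder region and queue URL, sending, long-polling and deleting a message looks like this:

// sqs-example.js - sketch only; region, queue URL and message body are placeholders
const AWS = require('aws-sdk');
const sqs = new AWS.SQS({ region: 'us-east-1' });
const QueueUrl = 'https://sqs.us-east-1.amazonaws.com/123456789012/my-queue';

async function main() {
  // Delay delivery of a job; SQS caps DelaySeconds at 900 (15 minutes).
  await sqs.sendMessage({ QueueUrl, MessageBody: 'process-order-42', DelaySeconds: 60 }).promise();

  // Long-poll for up to 20 seconds instead of busy polling.
  const { Messages = [] } = await sqs
    .receiveMessage({ QueueUrl, MaxNumberOfMessages: 1, WaitTimeSeconds: 20 })
    .promise();

  for (const msg of Messages) {
    console.log('received:', msg.Body);
    // Deleting marks the job as done; if we skip this, SQS redelivers the message
    // after the visibility timeout, which is how automatic retries happen.
    await sqs.deleteMessage({ QueueUrl, ReceiptHandle: msg.ReceiptHandle }).promise();
  }
}

main().catch(console.error);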


RabbitMQ

Description: RabbitMQ gives your applications a common platform to send and receive messages, and your messages a safe place to live until received.

Pros:
• It's fast and it works with good metrics/monitoring
• Ease of configuration
• I like the admin interface
• Easy to set up and start with
• Intuitive to work with through Python
• Standard protocols
• Durable
• Written primarily in Erlang
• Simply superb
• Completeness of messaging patterns

Cons:
• Needs Erlang runtime; need ops good with Erlang runtime
• Too complicated cluster/HA config and management
• Configuration must be done first, not by your code

Amazon SQS

Description: Transmit any volume of data, at any level of throughput, without losing messages or requiring other services to be always available. With SQS, you can offload the administrative burden of operating and scaling a highly available messaging cluster, while paying a low price for only what you use.

Pros:
• Easy to use, reliable
• Low cost
• Simple
• Doesn't need to be maintained
• Delayed delivery up to 12 hours

Cons:
• Has a max message size (currently 256K)
• Delayed delivery up to 15 mins only
• Guaranteed delivery, but messages can be delivered more than once

Friday, March 16, 2018

NoSQL : CAP

Distributed
– Sharding: splitting data over servers "horizontally"
– Replication

Lower-level than RDBMS/SQL
– Simpler ad hoc APIs
– But you build the application (programming not querying)
– Operations simple and cheap

Different flavours (for different scenarios)
– Different CAP emphasis
– Different scalability profiles
– Different query functionality
– Different data models
NoSQL: CAP (not ACID)
Consistency: each client always has the same view of the data.
Availability: all clients can always read and write.
Partition tolerance: the system works well across physical network partitions.

Source: AWS NoSQL DynamoDB

Monday, March 12, 2018

Node Version Manager (NVM)

Node Version Manager (NVM) is a neat little bash script that allows you to manage multiple versions of Node.js on the same box. A version manager really helps when testing applications under different versions of the related software. nvm is a tool that allows you to download and install Node.js. You don't need nvm unless you want to keep multiple versions of Node.js installed on your system or you'd like to upgrade your current version.
Command to check NVM version:
nvm --version

Bash command to install NVM:
curl -o- https://raw.githubusercontent.com/creationix/nvm/v0.26.1/install.sh | bash
How to use NVM:
1. Install Node.js
If you have not already installed Node.js on your machine, it is time to install the latest or a specific version of Node.js using NVM. The command below installs version 0.12.7 (the trailing 64 selects the 64-bit build on nvm-windows):
$ nvm install 0.12.7 64

2. List the available Node.js versions
In order to list all the Node.js versions installed on your machine, simply run the command:
$ nvm list

3. Use a specific Node.js version
To switch to a specific Node.js version, run:
$ nvm use 4.2.1 64
As Node.js is still a go-to solution, many versions of it have been released and new versions will be released in the future. That is where testing an application against various Node.js versions comes in handy.

Wednesday, February 28, 2018

Microservices

Microservices is a variant of the service-oriented architecture (SOA) architectural style that structures an application as a collection of loosely coupled services. In a microservices architecture, services should be fine-grained and the protocols should be lightweight. The benefit of decomposing an application into different smaller services is that it improves modularity and makes the application easier to understand, develop and test.
Microservice architecture, or simply microservices, is a distinctive method of developing software systems that has grown in popularity in recent years.  In fact, even though there isn’t a whole lot out there on what it is and how to do it, for many developers it has become a preferred way of creating enterprise applications.  Thanks to its scalability, this architectural method is considered particularly ideal when you have to enable support for a range of platforms and devices—spanning web, mobile, Internet of Things, and wearables—or simply when you’re not sure what kind of devices you’ll need to support in an increasingly cloudy future.
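
As a rough sketch (not a prescribed stack), a single fine-grained service can be little more than a small Node.js process exposing a lightweight HTTP/JSON API; Express is assumed here, and the service name, routes and port are made up for illustration.

// orders-service.js - a minimal sketch of one fine-grained microservice
const express = require('express');
const app = express();
app.use(express.json());

// In-memory store for illustration only; a real service would own its own data store.
const orders = [];

app.get('/health', (req, res) => res.json({ status: 'ok' }));  // probed by load balancers/orchestrators

app.post('/orders', (req, res) => {
  const order = { id: orders.length + 1, ...req.body };
  orders.push(order);
  res.status(201).json(order);  // lightweight JSON protocol between services
});

app.listen(3000, () => console.log('orders service listening on :3000'));

Each such service can be developed, tested, deployed and scaled independently, which is where the modularity benefits above come from.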
Microservices Pros and Cons:
Among their advantages for software development, microservices:
    Are easily deployed
    Require less production time
    Can scale quickly
    Can be reused among different projects
    Work well with containers, such as Docker
    Complement cloud activities

However, there are also drawbacks with microservices, such as:
    Potentially too granular
    Latency during heavy use
    Testing can be complex

Tuesday, February 27, 2018

SSL : Secure Sockets Layer

What is SSL? SSL stands for Secure Sockets Layer, an encryption technology that was originally created by Netscape in the 1990s. SSL creates an encrypted connection between your web server and your visitors' web browser allowing for private information to be transmitted without the problems of eavesdropping, data tampering, and message forgery.
To enable SSL on a website, you will need to get an SSL certificate that identifies you and install it on your web server. When a web browser connects to a site over SSL it usually displays a padlock icon, and it may also display a green address bar. Once you have installed an SSL certificate, you can access the site securely by changing the URL from http:// to https://. If SSL is properly deployed, the information transmitted between the web browser and the web server (whether it is contact or credit card information) is encrypted and only seen by the organization that owns the website.
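For example, here is a minimal sketch of installing a certificate on a Node.js web server; the key and certificate paths are placeholders for files issued by your CA.

// https-server.js - minimal sketch of serving a site over HTTPS in Node.js
const https = require('https');
const fs = require('fs');

const options = {
  key: fs.readFileSync('/etc/ssl/private/example.key'),  // private key (placeholder path)
  cert: fs.readFileSync('/etc/ssl/certs/example.crt')    // certificate issued by the CA (placeholder path)
};

https.createServer(options, (req, res) => {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('Hello over HTTPS\n');
}).listen(443);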
SSL vs. TLS
SSL and TLS generally mean the same thing. TLS 1.0 was defined by RFC 2246 in January 1999 as the successor to SSL 3.0. Most people are familiar with the term SSL, so that is usually the term used even when the system is actually using the newer TLS protocol.
What is a certificate authority (CA)?
A certificate authority is an entity which issues digital certificates to organizations or people after validating them. Certification authorities have to keep detailed records of what has been issued and the information used to issue it, and are audited regularly to make sure that they are following defined procedures. Every certification authority provides a Certification Practice Statement (CPS) that defines the procedures that will be used to verify applications. There are many commercial CAs that charge for their services (such as VeriSign). Institutions and governments may have their own CAs, and there are also free certificate authorities.

Wednesday, January 31, 2018

Javascript Module Definition

Over the years there's been a steadily growing ecosystem of JavaScript components to choose from. The sheer number of choices is fantastic, but it also infamously presents a difficulty when components are mixed and matched. And it doesn't take too long for budding developers to find out that not all components are built to play nicely together.

To address these issues, the competing module specs AMD and CommonJS have appeared on the scene, allowing developers to write their code in an agreed-upon sandboxed and modularized way, so as not to “pollute the ecosystem”.

Asynchronous module definition (AMD)
Asynchronous module definition (AMD) is a specification for the programming language JavaScript. It defines an application programming interface (API) for declaring code modules and their dependencies, and for loading them asynchronously if desired. Implementations of AMD provide the following benefits:
  • Website performance improvements. AMD implementations load smaller JavaScript files, and then only when they are needed.
  • Fewer page errors. AMD implementations allow developers to define dependencies that must load before a module is executed, so the module does not try to use outside code that is not available yet.
In addition to loading multiple JavaScript files at runtime, AMD implementations allow developers to encapsulate code in smaller, more logically-organized files, in a way similar to other programming languages such as Java. For production and deployment, developers can concatenate and minify JavaScript modules based on an AMD API into one file, the same as traditional JavaScript.
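
For illustration, a minimal AMD module and its consumer might look like the sketch below; the module names are made up and a loader such as RequireJS is assumed to provide define and require.

// circle.js - an AMD module that declares a dependency and returns its exports
define(['./constants'], function (constants) {
  return {
    area: function (r) {
      return constants.PI * r * r;
    }
  };
});

// main.js - the loader fetches ./circle and its dependencies asynchronously
require(['./circle'], function (circle) {
  console.log(circle.area(2));
});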

CommonJS
CommonJS is a style you may be familiar with if you've written anything in Node (which uses a slight variant). It's also been gaining traction on the frontend with Browserify.
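
A minimal CommonJS version of the same hypothetical modules, loaded synchronously with require(), looks like this sketch:

// constants.js - exports are whatever is assigned to module.exports
module.exports = { PI: 3.14159 };

// circle.js - dependencies are pulled in synchronously at runtime
const constants = require('./constants');
module.exports = {
  area: (r) => constants.PI * r * r
};

// main.js
const circle = require('./circle');
console.log(circle.area(2));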

Universal Module Definition (UMD)
Since CommonJS and AMD styles have both been equally popular, it seems there’s yet no consensus. This has brought about the push for a “universal” pattern that supports both styles, which brings us to none other than the Universal Module Definition.
The pattern is admittedly ugly, but is both AMD and CommonJS compatible, as well as supporting the old-style “global” variable definition.
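
A typical UMD wrapper looks roughly like this sketch: it checks for an AMD loader first, then for CommonJS, and finally falls back to a browser global (the global name circle is arbitrary here).

// circle.js - UMD wrapper
(function (root, factory) {
  if (typeof define === 'function' && define.amd) {
    define([], factory);              // AMD loader present
  } else if (typeof module === 'object' && module.exports) {
    module.exports = factory();       // CommonJS / Node
  } else {
    root.circle = factory();          // old-style browser global
  }
}(typeof self !== 'undefined' ? self : this, function () {
  return {
    area: function (r) { return Math.PI * r * r; }
  };
}));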

Thursday, January 25, 2018

webpack : module bundler

Webpack is the latest and greatest in front-end development tools. It is a module bundler that works great with modern front-end workflows, including Babel, ReactJS and CommonJS, among others.
At its core, webpack is a static module bundler for modern JavaScript applications. When webpack processes your application, it recursively builds a dependency graph that includes every module your application needs, then packages all of those modules into one or more bundles.
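
A minimal webpack.config.js sketch, with placeholder entry and output paths, looks like this:

// webpack.config.js - minimal sketch; entry/output paths are placeholders
const path = require('path');

module.exports = {
  mode: 'development',
  entry: './src/index.js',                 // webpack starts building the dependency graph here
  output: {
    path: path.resolve(__dirname, 'dist'),
    filename: 'bundle.js'                  // every reachable module is bundled into this file
  }
};

Running npx webpack with this config bundles ./src/index.js and everything it imports into dist/bundle.js.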

Link: https://webpack.js.org/