Monday, December 31, 2018

New Year Greetings 2019



GitHub vs. Bitbucket vs. AWS CodeCommit


GitHub

Bitbucket

AWS CodeCommit
Description GitHub is the best place to share code with friends, co-workers, classmates, and complete strangers. Over three million people use GitHub to build amazing things together. Bitbucket gives teams one place to plan projects, collaborate on code, test and deploy, all with free private Git repositories. Teams choose Bitbucket because it has a superior Jira integration, built-in CI/CD, & is free for up to 5 users. CodeCommit eliminates the need to operate your own source control system or worry about scaling its infrastructure. You can use CodeCommit to securely store anything from source code to binaries, and it works seamlessly with your existing Git tools.
Pros • Open source friendly
• Easy source control
• Nice UI
• Great for team collaboration
• Easy setup
• Issue tracker
• Remote team collaboration
• Great community
• Great way to share
• Pull request and features planning
• Free private repos
• Simple setup
• Nice ui and tools
• Unlimited private repositories
• Affordable git hosting
• Integrates with many apis and services
• Reliable uptime
• Nice gui
• Pull requests and code reviews
• Very customisable
• Free private repos
• IAM integration
• Pay-As-You-Go Pricing
• Repo data encrypted at rest
• Amazon feels the most Secure
• I can make repository by myself if I have AWS account
• Faster deployments when using other AWS services
• Does not support web hooks yet!
• AWS CodePipeline integration
• Codebuild integration
Cons • Expensive for lone developers that want private repos
• Relatively slow product/feature release cadence
• Owned by micrcosoft
• API scoping could be better
• Not much community activity
• Difficult to review prs because of confusing ui
• Quite buggy
• Managed by enterprise Java company
• UI sucks
• No fork

Distributed Version Control System

Snapshots, Not Differences - The major difference between Git and any other VCS (Subversion and friends included) is the way Git thinks about its data. Conceptually, most other systems store information as a list of file-based changes. These other systems (CVS, Subversion, Perforce, Bazaar, and so on) think of the information they store as a set of files and the changes made to each file over time (this is commonly described as delta-based version control).
Nearly Every Operation Is Local - Most operations in Git need only local files and resources to operate — generally no information is needed from another computer on your network. If you’re used to a CVCS where most operations have that network latency overhead, this aspect of Git will make you think that the gods of speed have blessed Git with unworldly powers. Because you have the entire history of the project right there on your local disk, most operations seem almost instantaneous.
DVCS clients not only check out the latest snapshot of the directory but they also fully mirror the repository. If the server goes down, then the repository from any client can be copied back to the server to restore it. Every checkout is a full backup of the repository. Git does not rely on the central server and that is why you can perform many operations when you are offline. You can commit changes, create branches, view logs, and perform other operations when you are offline. You require network connection only to publish your changes and take the latest changes.
DVCS Terminologies
Local Repository
Every VCS tool provides a private workplace as a working copy. Developers make changes in their private workplace and after commit, these changes become a part of the repository. Git takes it one step further by providing them a private copy of the whole repository. Users can perform many operations with this repository such as add file, remove file, rename file, move file, commit changes, and many more.
Working Directory and Staging Area or Index
The working directory is the place where files are checked out. In other CVCS, developers generally make modifications and commit their changes directly to the repository. But Git uses a different strategy. Git doesn’t track each and every modified file. Whenever you do commit an operation, Git looks for the files present in the staging area. Only those files present in the staging area are considered for commit and not all the modified files.

Friday, December 21, 2018

Snowflake Architecture

Snowflake’s architecture is a hybrid of traditional shared-disk database architectures and shared-nothing database architectures. Similar to shared-disk architectures, Snowflake uses a central data repository for persisted data that is accessible from all compute nodes in the data warehouse. But similar to shared-nothing architectures, Snowflake processes queries using MPP (massively parallel processing) compute clusters where each node in the cluster stores a portion of the entire data set locally. This approach offers the data management simplicity of a shared-disk architecture, but with the performance and scale-out benefits of a shared-nothing architecture.

Snowflake’s unique architecture consists of three key layers:
  • Database Storage
  • Query Processing
  • Cloud Services
Database Storage When data is loaded into Snowflake, Snowflake reorganizes that data into its internal optimized, compressed, columnar format. Snowflake stores this optimized data in cloud storage. Snowflake manages all aspects of how this data is stored — the organization, file size, structure, compression, metadata, statistics, and other aspects of data storage are handled by Snowflake. The data objects stored by Snowflake are not directly visible nor accessible by customers; they are only accessible through SQL query operations run using Snowflake.
Query Processing Query execution is performed in the processing layer. Snowflake processes queries using “virtual warehouses”. Each virtual warehouse is an MPP compute cluster composed of multiple compute nodes allocated by Snowflake from a cloud provider. Each virtual warehouse is an independent compute cluster that does not share compute resources with other virtual warehouses. As a result, each virtual warehouse has no impact on the performance of other virtual warehouses.
Cloud Services The cloud services layer is a collection of services that coordinate activities across Snowflake. These services tie together all of the different components of Snowflake in order to process user requests, from login to query dispatch. The cloud services layer also runs on compute instances provisioned by Snowflake from the cloud provider. Among the services in this layer:
  • Authentication
  • Infrastructure management
  • Metadata management
  • Query parsing and optimization
  • Access control