Tuesday, July 26, 2016

U-SQL

U-SQL is a data processing language that unifies the benefits of SQL with the expressive power of your own code to process all data at any scale. U-SQL’s scalable distributed query capability enables you to efficiently analyze data in the store and across relational stores such as Azure SQL Database. It enables you to process unstructured data by applying schema on read, insert custom logic and UDF's, and includes extensibility to enable fine grained control over how to execute at scale.
U-SQL is the new big data query language of the Azure Data Lake Analytics service. It evolved out of Microsoft's internal Big Data language called SCOPE and combines a familiar SQL-like declarative language with the extensibility and programmability provided by C# types and the C# expression language and big data processing concepts such as “schema on reads”, custom processors and reducers. It also provides the ability to query and combine data from a variety of data sources, including Azure Data Lake Storage, Azure Blob Storage, and Azure SQL DB, Azure SQL Data Warehouse, and SQL Server instances running in Azure VMs. It is however not ANSI SQL.
U-SQL script:
The main unit of a U-SQL “program” is a U-SQL script. A script consists of an optional script prolog and a sequence of U-SQL statements.
@t = EXTRACT date string
        , time string
        , author string
        , tweet string
    FROM "/input/MyTwitterHistory.csv"
    USING Extractors.Csv();

@res = SELECT author
    , COUNT(*) AS tweetcount
    FROM @t
    GROUP BY author;

OUTPUT @res TO "/output/MyTwitterAnalysis.csv"
ORDER BY tweetcount DESC
USING Outputters.Csv();
The above U-SQL script shows the three major steps of processing data with U-SQL:
  • Extract data from your source, using EXTRACT statement in query. The datatypes are based on C# datatypes and it use the built-in Extractors library to read and schematize the CSV file.
  • Transform using SQL and/or custom user defined operators.
  • Output the result either into a file or into a U-SQL table to store it for further processing.
U-SQL combines some familiar concepts from a variety of languages: It is a declarative language like SQL, it follows a dataflow-like composition of statements and expressions like Cascading, and provides simple ways to extend the language with user-defined operators, user-defined aggregators and user-defined functions using C#, and provides a SQL database-like metadata object model to manage, discover and secure structured data and user-code.

Sunday, July 10, 2016

SQL Server Management Objects (SMO)

SQL Server Management Objects (SMO) are .NET objects introduced by Microsoft as of Microsoft SQL Server 2005, designed to allow for easy and simple programmatic management of Microsoft SQL Server. You can use SMO to build customized SQL Server management applications. The SMO object model extends and supersedes the Distributed Management Objects (SQL-DMO) object model. Compared to SQL-DMO, SMO increases performance, control, and ease of use. Most SQL-DMO functionality is included in SMO, and there are various new classes that support new features in SQL Server. The object model is intuitive and uses SQL-DMO terminology, where it is possible, to help transfer your skills.
New features in SMO SQL Sever 2016 include the following:
  • Cached object model and optimized object instance creation. Objects are loaded only when specifically referenced. Object properties are only partially loaded when the object is created. The remaining objects and properties are loaded when they are referenced directly.
  • Batched execution of Transact-SQL statements. Statements are batched to improve network performance.
  • Capture Transact-SQL statements. Allows any operation to be captured into a script. Management Studio uses this capability to script an operation instead of executing it immediately.
  • Management of SQL Server services with the WMI Provider. SQL Server services can be started, stopped, and paused programmatically.
  • Advanced Scripting. Transact-SQL scripts can be generated to re-create SQL Server objects that describe relationships to other objects on the instance of SQL Server.
  • Use of Unique Resource Names (URNs). A URN allows you to create instances of and reference SMO objects.
SMO also represents as new objects or properties many features and components that were introduced in SQL Server 2005. These new features and components include the following:
  • Table and index partitioning for storage of data on a partition scheme.
  • HTTP endpoints for managing SOAP requests.
  • Snapshot isolation and row level versioning for increased concurrency.
  • XML Schema collection, XML indexes and XML datatype provide validation and storage of XML data. For more information, see XML Schema Collections (SQL Server) and Using XML Schemas.
  • Snapshot databases for creating read-only copies of databases.
  • Service Broker support for message-based communication.
  • Synonym support for multiple names of SQL Server database objects.
  • The management of Database Mail that lets you create e-mail servers, e-mail profiles, and e-mail accounts in SQL Server.
  • Registered Servers support for registering connection information.
  • Trace and replay of SQL Server events.
  • Support for certificates and keys for security control.
  • DDL triggers for adding functionality when DDL events occur.
The SMO namespace is Microsoft.SqlServer.Management.Smo. SMO is implemented as a Microsoft.NET Framework assembly. This means that the common language runtime from the Microsoft.NET Framework version 2.0 must be installed before using the SMO objects. The SMO assemblies are installed by default into the Global Assembly Cache (GAC) with the SQL Server SDK option.