Tuesday, July 26, 2016

U-SQL

U-SQL is a data processing language that unifies the benefits of SQL with the expressive power of your own code, so that you can process data at any scale. Its scalable distributed query capability lets you efficiently analyze data in the Data Lake store and across relational stores such as Azure SQL Database. It lets you process unstructured data by applying schema on read, insert custom logic and UDFs, and it includes extensibility points that give you fine-grained control over how a query executes at scale.
U-SQL is the new big data query language of the Azure Data Lake Analytics service. It evolved out of Microsoft's internal big data language, SCOPE. It combines a familiar SQL-like declarative language with the extensibility and programmability of C# types and the C# expression language, and with big data processing concepts such as “schema on read”, custom processors and reducers. It can also query and combine data from a variety of data sources, including Azure Data Lake Storage, Azure Blob Storage, Azure SQL DB, Azure SQL Data Warehouse, and SQL Server instances running in Azure VMs. It is, however, not ANSI SQL.
U-SQL script:
The main unit of a U-SQL “program” is a U-SQL script. A script consists of an optional script prolog and a sequence of U-SQL statements.
@t = EXTRACT date string
        , time string
        , author string
        , tweet string
    FROM "/input/MyTwitterHistory.csv"
    USING Extractors.Csv();

@res = SELECT author
    , COUNT(*) AS tweetcount
    FROM @t
    GROUP BY author;

OUTPUT @res TO "/output/MyTwitterAnalysis.csv"
ORDER BY tweetcount DESC
USING Outputters.Csv();
The above U-SQL script shows the three major steps of processing data with U-SQL:
  • Extract data from your source using the EXTRACT statement. The column datatypes are C# datatypes, and the statement uses the built-in Extractors library to read and schematize the CSV file.
  • Transform using SQL and/or custom user-defined operators.
  • Output the result either into a file or into a U-SQL table to store it for further processing.
U-SQL combines familiar concepts from a variety of languages. It is a declarative language like SQL. It follows a dataflow-like composition of statements and expressions, like Cascading. It provides simple ways to extend the language with user-defined operators, user-defined aggregators and user-defined functions written in C#. It also provides a SQL database-like metadata object model to manage, discover and secure structured data and user code.
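As a rough illustration of how C# expressions mix with the SQL-like statements, the script above could be varied as in the sketch below. It is only a sketch: the rowset name @clean, the columns tweetlength and longesttweet, and the output path are hypothetical names made up for this example, and it assumes the same input file and columns as the original script.

```usql
// Illustrative variant of the script above.
// @clean, tweetlength, longesttweet and the output path are hypothetical names.
@t = EXTRACT date string
           , time string
           , author string
           , tweet string
     FROM "/input/MyTwitterHistory.csv"
     USING Extractors.Csv();

// C# string members and operators are used directly in SELECT and WHERE.
@clean = SELECT author.ToLowerInvariant() AS author
              , tweet.Length AS tweetlength
         FROM @t
         WHERE !string.IsNullOrEmpty(author) && !string.IsNullOrEmpty(tweet);

// The declarative part stays SQL-like: group and aggregate per author.
@res = SELECT author
            , COUNT(*) AS tweetcount
            , MAX(tweetlength) AS longesttweet
       FROM @clean
       GROUP BY author;

OUTPUT @res TO "/output/MyTwitterAnalysisByLength.csv"
ORDER BY tweetcount DESC
USING Outputters.Csv();
```

The null checks in the WHERE clause are there because calling a C# member such as ToLowerInvariant() on a null string would fail at run time, so rows with missing values are filtered out first.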
