Michael Rys Principal Program Manager, Big Data @ Microsoft @MikeDoesBigData, {mrys, usql}@microsoft.com U-SQL Does SQL
“Top 5”s Surprises for SQL Users AS is not as • C# keywords and SQL keywords overlap • Costly to make case-insensitive -> Better build capabilities than tinker with syntax = != == • Remember: C# expression language null IS NOT NULL • C# nulls are two-valued PROCEDURES but no WHILE No UPDATE nor MERGE • Transform/Recook instead
@customers = SELECT Customer.ToUpper() AS Customer FROM @orders WHERE Customer.Contains("Contoso"); C# Expression Transforming Rowsets C# Expression Use WHERE for filtering
@rows = SELECT Customer, SUM(Amount) AS TotalAmount FROM @orders GROUP BY Customer; Many other aggregations are possible. You can define your own aggregator with C#! Grouping & Aggregation
@rows = SELECT Customer, SUM(Amount) AS TotalAmount FROM @orders GROUP BY Customer HAVING TotalAmount > 1000000; HAVING filters the output of a GROUP BY Grouping & Aggregation (2)
Sorting a rowset @customers SELECT * FROM @customers ORDER BY Amount ASC FETCH FIRST 3 ROWS; SELECT with ORDER BY requires a FETCH FIRST!
Sorting on OIUTPUT OUTPUT @customers TO @"/output.tsv" ORDER BY Amount ASC USING Outputters.Tsv();
Creating Constant Rowsets in Script @departments = SELECT * FROM (VALUES (31, "Sales"), (33, "Engineering"), (34, "Clerical"), (35, "Marketing") ) AS D( DepID, DepName );
@m = SELECT new ARRAY<string>(tweet.Split(' ').Where(x => x.StartsWith("@"))) AS refs FROM @t; @t = SELECT author, "authored" AS category FROM @t UNION ALL SELECT r.Substring(1) AS r, "referenced" AS category FROM @m CROSS APPLY EXPLODE(refs) AS Refs(r); category, , category @m(refs) @me, @you @him, @her Refs(r) @me @you @him @her @me, @you @me @you
U-SQL Joins Join operators • INNER JOIN • LEFT or RIGHT or FULL OUTER JOIN • CROSS JOIN • SEMIJOIN • equivalent to IN subquery • ANTISEMIJOIN • Equivalent to NOT IN subquery Notes • ON clause comparisons need to be of the simple form: rowset.column == rowset.column or AND conjunctions of the simple equality comparison • If a comparand is not a column, wrap it into a column in a previous SELECT • If the comparison operation is not ==, put it into the WHERE clause • turn the join into a CROSS JOIN if no equality comparison Reason: Syntax calls out which joins are efficient
U-SQL Analytics Windowing Expression Window_Function_Call 'OVER' '(' [ Over_Partition_By_Clause ] [ Order_By_Clause ] [ Row _Clause ] ')'. Window_Function_Call := Aggregate_Function_Call | Analytic_Function_Call | Ranking_Function_Call. Windowing Aggregate Functions ANY_VALUE, AVG, COUNT, MAX, MIN, SUM, STDEV, STDEVP, VAR, VARP Analytics Functions CUME_DIST, FIRST_VALUE, LAST_VALUE, PERCENTILE_CONT, PERCENTILE_DISC, PERCENT_RANK; soon: LEAD/LAG Ranking Functions DENSE_RANK, NTILE, RANK, ROW_NUMBER
INSERT • INSERT constant values • INSERT from queries • Multiple INSERTs INSERT constant values INSERT INTO T VALUES (1, "text", new SQL.MAP<string,string>("key","value")); INSERT from queries INSERT INTO T SELECT col1, col2, col3 FROM @rowset; Multiple INSERTs into same table • Is supported • Generates separate file per insert in physical storage: • Can lead to performance degradation • Recommendations: • Try to avoid small inserts • Rebuild table after frequent insertions with: ALTER TABLE T REBUILD;
Additional Resources Documentation U-SQL Reference Doc: http://aka.ms/usql_reference Sample Projects https://github.com/Azure/usql/tree/master/Examples/Ambulan ceDemos/AmbulanceDemos/2-Ambulance-Structured%20Data https://github.com/Azure/usql/tree/master/Examples/TweetAn alysis
http://aka.ms/AzureDataLake

U-SQL Does SQL (SQLBits 2016)

  • 1.
    Michael Rys Principal ProgramManager, Big Data @ Microsoft @MikeDoesBigData, {mrys, usql}@microsoft.com U-SQL Does SQL
  • 2.
    “Top 5”s Surprises forSQL Users AS is not as • C# keywords and SQL keywords overlap • Costly to make case-insensitive -> Better build capabilities than tinker with syntax = != == • Remember: C# expression language null IS NOT NULL • C# nulls are two-valued PROCEDURES but no WHILE No UPDATE nor MERGE • Transform/Recook instead
  • 4.
    @customers = SELECT Customer.ToUpper()AS Customer FROM @orders WHERE Customer.Contains("Contoso"); C# Expression Transforming Rowsets C# Expression Use WHERE for filtering
  • 5.
    @rows = SELECT Customer, SUM(Amount) ASTotalAmount FROM @orders GROUP BY Customer; Many other aggregations are possible. You can define your own aggregator with C#! Grouping & Aggregation
  • 6.
    @rows = SELECT Customer, SUM(Amount) ASTotalAmount FROM @orders GROUP BY Customer HAVING TotalAmount > 1000000; HAVING filters the output of a GROUP BY Grouping & Aggregation (2)
  • 7.
    Sorting a rowset @customers SELECT* FROM @customers ORDER BY Amount ASC FETCH FIRST 3 ROWS; SELECT with ORDER BY requires a FETCH FIRST!
  • 8.
    Sorting on OIUTPUT OUTPUT@customers TO @"/output.tsv" ORDER BY Amount ASC USING Outputters.Tsv();
  • 9.
    Creating Constant Rowsetsin Script @departments = SELECT * FROM (VALUES (31, "Sales"), (33, "Engineering"), (34, "Clerical"), (35, "Marketing") ) AS D( DepID, DepName );
  • 10.
    @m = SELECTnew ARRAY<string>(tweet.Split(' ').Where(x => x.StartsWith("@"))) AS refs FROM @t; @t = SELECT author, "authored" AS category FROM @t UNION ALL SELECT r.Substring(1) AS r, "referenced" AS category FROM @m CROSS APPLY EXPLODE(refs) AS Refs(r); category, , category @m(refs) @me, @you @him, @her Refs(r) @me @you @him @her @me, @you @me @you
  • 11.
    U-SQL Joins Join operators • INNERJOIN • LEFT or RIGHT or FULL OUTER JOIN • CROSS JOIN • SEMIJOIN • equivalent to IN subquery • ANTISEMIJOIN • Equivalent to NOT IN subquery Notes • ON clause comparisons need to be of the simple form: rowset.column == rowset.column or AND conjunctions of the simple equality comparison • If a comparand is not a column, wrap it into a column in a previous SELECT • If the comparison operation is not ==, put it into the WHERE clause • turn the join into a CROSS JOIN if no equality comparison Reason: Syntax calls out which joins are efficient
  • 12.
    U-SQL Analytics Windowing Expression Window_Function_Call 'OVER''(' [ Over_Partition_By_Clause ] [ Order_By_Clause ] [ Row _Clause ] ')'. Window_Function_Call := Aggregate_Function_Call | Analytic_Function_Call | Ranking_Function_Call. Windowing Aggregate Functions ANY_VALUE, AVG, COUNT, MAX, MIN, SUM, STDEV, STDEVP, VAR, VARP Analytics Functions CUME_DIST, FIRST_VALUE, LAST_VALUE, PERCENTILE_CONT, PERCENTILE_DISC, PERCENT_RANK; soon: LEAD/LAG Ranking Functions DENSE_RANK, NTILE, RANK, ROW_NUMBER
  • 14.
    INSERT • INSERT constant values •INSERT from queries • Multiple INSERTs INSERT constant values INSERT INTO T VALUES (1, "text", new SQL.MAP<string,string>("key","value")); INSERT from queries INSERT INTO T SELECT col1, col2, col3 FROM @rowset; Multiple INSERTs into same table • Is supported • Generates separate file per insert in physical storage: • Can lead to performance degradation • Recommendations: • Try to avoid small inserts • Rebuild table after frequent insertions with: ALTER TABLE T REBUILD;
  • 15.
    Additional Resources Documentation U-SQL Reference Doc:http://aka.ms/usql_reference Sample Projects https://github.com/Azure/usql/tree/master/Examples/Ambulan ceDemos/AmbulanceDemos/2-Ambulance-Structured%20Data https://github.com/Azure/usql/tree/master/Examples/TweetAn alysis
  • 16.