CREATE FUNCTION (SQL and Python)

Applies to: check marked yes Databricks SQL check marked yes Databricks Runtime

Creates a SQL scalar or table function that takes a set of arguments and returns a scalar value or a set of rows.

Applies to: check marked yes Databricks SQL check marked yes Databricks Runtime 13.3 LTS and above

Creates a Python scalar function that takes a set of arguments and returns a scalar value.

Python UDFs require Unity Catalog on serverless or pro SQL warehouses, or a shared or single user Unity Catalog cluster.

Applies to: check marked yes Databricks SQL check marked yes Databricks Runtime 14.1 and above

In addition to positional parameter invocation, you can also invoke SQL and Python UDF using named parameter invocation.

Syntax

CREATE [OR REPLACE] [TEMPORARY] FUNCTION [IF NOT EXISTS]
    function_name ( [ function_parameter [, ...] ] )
    { [ RETURNS data_type ] |
      RETURNS TABLE [ ( column_spec [, ...]) ] }
    [ characteristic [...] ]
    { AS dollar_quoted_string | RETURN { expression | query } }

function_parameter
    parameter_name data_type [DEFAULT default_expression] [COMMENT parameter_comment]

column_spec
    column_name data_type [COMMENT column_comment]

characteristic
  { LANGUAGE { SQL | PYTHON } |
    [NOT] DETERMINISTIC |
    COMMENT function_comment |
    [CONTAINS SQL | READS SQL DATA] }

Parameters

  • OR REPLACE

    If specified, the function with the same name and signature (number of parameters and parameter types) is replaced. You cannot replace an existing function with a different signature. This is mainly useful to update the function body and the return type of the function. You cannot specify this parameter with IF NOT EXISTS.

  • TEMPORARY

    The scope of the function being created. When you specify TEMPORARY, the created function is valid and visible in the current session. No persistent entry is made in the catalog.

  • IF NOT EXISTS

    If specified, creates the function only when it does not exist. The creation of the function succeeds (no error is thrown) if the specified function already exists in the system. You cannot specify this parameter with OR REPLACE.

  • function_name

    A name for the function. For a permanent function, you can optionally qualify the function name with a schema name. If the name is not qualified the permanent function is created in the current schema.

  • function_parameter

    Specifies a parameter of the function.

    • parameter_name

      The parameter name must be unique within the function.

    • data_type

      Any supported data type. For Python, data_type is cast to a Python data type according to this language mapping.

    • DEFAULT default_expression

      Applies to: check marked yes Databricks SQL check marked yes Databricks Runtime 10.4 LTS and above

      An optional default to be used when a function invocation does not assign an argument to the parameter. default_expression must be castable to data_type. The expression must not reference another parameter or contain a subquery.

      When you specify a default for one parameter, all following parameters must also have a default.

      DEFAULT is supported for LANGUAGE SQL only.

    • COMMENT comment

      An optional description of the parameter. comment must be a STRING literal.

  • RETURNS data_type

    The return data type of the scalar function. For Python UDFs, return values must exactly match the data type as specified in data_type. Otherwise, to prevent unanticipated type conversions, the function will fail.

    For SQL UDF this clause is optional. The data type will be derived from the function body if it is not provided.

  • RETURNS TABLE [ (column_spec [,…] ) ]

    This clause marks the function as a table function. Optionally it also specifies the signature of the result of the table function. If no column_spec is specified it will be derived from the body of the SQL UDF.

    RETURNS TABLE is supported for LANGUAGE SQL only.

    • column_name

      The column name must be unique within the signature.

    • data_type

      Any supported data type.

    • COMMENT column_comment

      An optional description of the column. comment must be a STRING literal.

  • RETURN { expression | query }

    The body of the function. For a scalar function, it can either be a query or an expression. For a table function, it can only be a query. The expression cannot contain:

    Within the body of the function you can refer to parameter by its unqualified name or by qualifying the parameter with the function name.

  • AS dollar_quoted_definition

    dollar_quoted_definition is the Python function body enclosed by two matching $[tag]$body$[tag]$. tag can be an empty string.

    Examples:

    $$
      return “Hello world”
    $$
    
    $py$
      return "Hello World"
    $py$
    
  • characteristic

    All characteristic clauses are optional. You can specify any number of them in any order, but you can specify each clause only once.

    • LANGUAGE SQL or LANGUAGE PYTHON

      The language of the function implementation.

    • [NOT] DETERMINISTIC

      Whether the function is deterministic. A function is deterministic when it returns only one result for a given set of arguments. You may mark a function as DETERMINISTIC when its body is not and vice versa. A reason for this may be to encourage or discourage query optimizations such as constant folding or query caching. If you do not specify ths option it is derived from the function body.

    • COMMENT function_comment

      A comment for the function. function_comment must be String literal.

    • CONTAINS SQL or READS SQL DATA

      Whether a function reads data directly or indirectly from a table or a view. When the function reads SQL data, you cannot specify CONTAINS SQL. If you don’t specify either clause, the property is derived from the function body.

Supported Libraries in Python UDFs

To use any dependencies, use import <package> within the function body. For example, see the following:

CREATE FUNCTION […]
AS $$
   import json
   [... (rest of function definition)]
$$

Dependencies are limited to the standard Python library and the following libraries:

Package Version
bleach 4.0.0
chardet 4.0.0
charset-normalizer 2.0.4
defusedxml 0.7.1
googleapis-common-protos 1.56.4
grpcio 1.47.0
grpcio-status 1.47.0
jmespath 0.10.0
joblib 1.1.0
numpy 1.20.3
packaging 21.3
pandas 1.3.4
patsy 0.5.2
protobuf 4.21.5
pyarrow 7.0.0
pyparsing 3.0.9
python-dateutil 2.8.2
pytz 2021.3
scikit-learn 0.24.2”
scipy 1.7.1”
setuptools 65.2.0
six 1.16.0
threadpoolctl 3.1.0
webencodings 0.5.1
user-agents 2.2.0
cryptography 38.0.4

Examples

Create and use a SQL scalar function

> CREATE VIEW t(c1, c2) AS VALUES (0, 1), (1, 2);
-- Create a temporary function with no parameter.
> CREATE TEMPORARY FUNCTION hello() RETURNS STRING
    RETURN 'Hello World!';

> SELECT hello();
  Hello World!

-- Create a permanent function with parameters.
> CREATE FUNCTION area(x DOUBLE, y DOUBLE) RETURNS DOUBLE RETURN x * y;

-- Use a SQL function in the SELECT clause of a query.
> SELECT area(c1, c2) AS area FROM t;
 0.0
 2.0

-- Use a SQL function in the WHERE clause of a query.
> SELECT * FROM t WHERE area(c1, c2) > 0;
 1  2

-- Compose SQL functions.
> CREATE FUNCTION square(x DOUBLE) RETURNS DOUBLE RETURN area(x, x);

> SELECT c1, square(c1) AS square FROM t;
  0  0.0
  1  1.0

-- Create a non-deterministic function
> CREATE FUNCTION roll_dice()
    RETURNS INT
    NOT DETERMINISTIC
    CONTAINS SQL
    COMMENT 'Roll a single 6 sided die'
    RETURN (rand() * 6)::INT + 1;
-- Roll a single 6-sided die
> SELECT roll_dice();
 3

Create and use a function that uses DEFAULTs

-- Extend the function to support variable number of sides and dice.
-- Use defaults to support a variable number of arguments
> DROP FUNCTION roll_dice;
> CREATE FUNCTION roll_dice(num_dice  INT DEFAULT 1 COMMENT 'number of dice to roll (Default: 1)',
                            num_sides INT DEFAULT 6 COMMENT 'number of sides per die (Default: 6)')
    RETURNS INT
    NOT DETERMINISTIC
    CONTAINS SQL
    COMMENT 'Roll a number of n-sided dice'
    RETURN aggregate(sequence(1, roll_dice.num_dice, 1),
                     0,
                     (acc, x) -> (rand() * roll_dice.num_sides)::int,
                     acc -> acc + roll_dice.num_dice);

-- Roll a single 6-sided die still works
> SELECT roll_dice();
 3

-- Roll 3 6-sided dice
> SELECT roll_dice(3);
 15

-- Roll 3 10-sided dice
> SELECT roll_dice(3, 10)
 21

-- Roll 3 10-sided dice using named parameter invocation
> SELECT roll_dice(10 => num_sides, num_dice => 3)
 17

-- Create a SQL function with a scalar subquery.
> CREATE VIEW scores(player, score) AS VALUES (0, 1), (0, 2), (1, 2), (1, 5);

> CREATE FUNCTION avg_score(p INT) RETURNS FLOAT
    COMMENT 'get an average score of the player'
    RETURN SELECT AVG(score) FROM scores WHERE player = p;

> SELECT c1, avg_score(c1) FROM t;
 0  1.5
 1  3.5

Create a SQL table function

-- Produce all weekdays between two dates
> CREATE FUNCTION weekdays(start DATE, end DATE)
    RETURNS TABLE(day_of_week STRING, day DATE)
    RETURN SELECT extract(DAYOFWEEK_ISO FROM day), day
             FROM (SELECT sequence(weekdays.start, weekdays.end)) AS T(days)
                  LATERAL VIEW explode(days) AS day
             WHERE extract(DAYOFWEEK_ISO FROM day) BETWEEN 1 AND 5;

-- Return all weekdays
> SELECT weekdays.day_of_week, day
    FROM weekdays(DATE'2022-01-01', DATE'2022-01-14');
  1     2022-01-03
  2     2022-01-04
  3     2022-01-05
  4     2022-01-06
  5     2022-01-07
  1     2022-01-10
  2     2022-01-11
  3     2022-01-12
  4     2022-01-13
  5     2022-01-14

-- Return weekdays for date ranges originating from a LATERAL correlation
> SELECT weekdays.*
    FROM VALUES (DATE'2020-01-01'),
                (DATE'2021-01-01'),
                (DATE'2022-01-01') AS starts(start),
         LATERAL weekdays(start, start + INTERVAL '7' DAYS);
  3     2020-01-01
  4     2020-01-02
  5     2020-01-03
  1     2020-01-06
  2     2020-01-07
  3     2020-01-08
  5     2021-01-01
  1     2021-01-04
  2     2021-01-05
  3     2021-01-06
  4     2021-01-07
  5     2021-01-08
  1     2022-01-03
  2     2022-01-04
  3     2022-01-05
  4     2022-01-06
  5     2022-01-07

Replace a SQL function

-- Replace a SQL scalar function.
> CREATE OR REPLACE FUNCTION square(x DOUBLE) RETURNS DOUBLE RETURN x * x;

-- Replace a SQL table function.
> CREATE OR REPLACE FUNCTION getemps(deptno INT)
    RETURNS TABLE (name STRING)
    RETURN SELECT name FROM employee e WHERE e.deptno = getemps.deptno;

-- Describe a SQL table function.
> DESCRIBE FUNCTION getemps;
 Function: default.getemps
 Type:     TABLE
 Input:    deptno INT
 Returns:  id   INT
           name STRING

Note

You cannot replace an existing function with a different signature.

Describe a SQL function

> DESCRIBE FUNCTION hello;
 Function: hello
 Type:     SCALAR
 Input:    ()
 Returns:  STRING

> DESCRIBE FUNCTION area;
 Function: default.area
 Type:     SCALAR
 Input:    x DOUBLE
           y DOUBLE
 Returns:  DOUBLE

> DESCRIBE FUNCTION roll_dice;
 Function: default.roll_dice
 Type:     SCALAR
 Input:    num_dice  INT
           num_sides INT
 Returns:  INT

> DESCRIBE FUNCTION EXTENDED roll_dice;
 Function:      default.roll_dice
 Type:          SCALAR
 Input:         num_dice  INT DEFAULT 1 'number of dice to roll (Default: 1)'
                num_sides INT DEFAULT 6 'number of sides per dice (Default: 6)'
 Returns:       INT
 Comment:       Roll a number of m-sided dice
 Deterministic: false
 Data Access:   CONTAINS SQL
 Configs:       ...
 Owner:         the.house@always.wins
 Create Time:   Sat Feb 12 09:29:02 PST 2022
 Body:          aggregate(sequence(1, roll_dice.num_dice, 1),
                      0,
                      (acc, x) -> (rand() * roll_dice.num_sides)::int,
                      acc -> acc + roll_dice.num_dice)

Create Python functions

—- Hello World-like functionality using Python UDFs
> CREATE FUNCTION main.default.greet(s STRING)
  RETURNS STRING
  LANGUAGE PYTHON
  AS $$
    def greet(name):
      return "Hello " + name + "!"

    return greet(s) if s else None
  $$

—- Can import functions from std library and environment
> CREATE FUNCTION main.default.isleapyear(year INT)
  RETURNS BOOLEAN
  LANGUAGE PYTHON
  AS $$
    import calendar
    return calendar.isleap(year) if year else None
  $$

—- Must return the correct type. Otherwise will fail at runtime.
> CREATE FUNCTION main.default.a_number()
  RETURNS INTEGER
  LANGUAGE PYTHON
  AS $$
    # does not work: return "10"
    # does not work: return 3.14
    return 10
  $$

—- Deal with exceptions.
> CREATE FUNCTION main.default.custom_divide(n1 INT, n2 INT)
  RETURNS FLOAT
  LANGUAGE PYTHON
  AS $$
    try:
      return n1/n2
    except ZeroDivisionException:
    # in case of 0, we can return NULL.
    return None
  $$