共用方式為


histogram_numeric 聚合函數

適用於: 檢查標示為是 Databricks SQL 檢查標示為是 Databricks Runtime 10,2 和更新版本

使用 numBins bin 計算 上的expr直方圖。

語法

histogram_numeric ( [ALL | DISTINCT ] expr, numBins ) [ FILTER ( WHERE cond ) ]

引數

  • expr:函式取用的數值、 TIMESTAMPDATEINTERVAL 運算式,並計算其上的直方圖。
  • numBinsINTEGER:必須大於 1 的常值,指定直方圖計算的 bin 數目。
  • cond:選擇性 BOOLEAN 表達式,可篩選匯總的數據列。

傳回

傳回值是 ARRAY STRUCTS 的 ,具有字段x,並y代表直方圖量化的中心。的型別與 的類型xexpr相同,且的型別yDOUBLE。 增加 的值 numBins 會精簡直方圖近似值,使其更精細。 不過,它可以在極端值周圍引進成品。 一般而言,20-40 個量化對直方圖有效,不過扭曲或較小的數據集可能需要更多量化。請注意,此函式會建立具有非統一量化寬度的直方圖。 在直方圖的均方誤差方面,它不提供任何保證,但實際上與其他運算套件所產生的直方圖相當。

指定 DISTINCT 可讓函式只在唯一的 expr 一組值上運作。

範例

> SELECT histogram_numeric(col, 5)
    FROM VALUES (0), (1), (2), (10) AS tab(col);
 [{"x":0.0,"y":1.0},{"x":1.0,"y":1.0},{"x":2.0,"y":1.0},{"x":10.0,"y":1.0}]

> SELECT histogram_numeric(col, 5)
    FROM VALUES (0L), (1L), (2L), (10L) AS tab(col);
 [{"x":0,"y":1.0},{"x":1,"y":1.0},{"x":2,"y":1.0},{"x":10,"y":1.0}]

> SELECT histogram_numeric(col, 5)
    FROM VALUES (0F), (1F), (2F), (10F) AS tab(col);
 [{"x":0.0,"y":1.0},{"x":1.0,"y":1.0},{"x":2.0,"y":1.0},{"x":10.0,"y":1.0}]

> SELECT histogram_numeric(col, 5)
    FROM VALUES (0D), (1D), (2D), (10D) AS tab(col);
 [{"x":0.0,"y":1.0},{"x":1.0,"y":1.0},{"x":2.0,"y":1.0},{"x":10.0,"y":1.0}]

> SELECT histogram_numeric(col, 5)
    FROM VALUES (INTERVAL 0 YEAR), (INTERVAL 1 YEAR), (INTERVAL 2 YEAR),
    (INTERVAL 3 YEAR) AS tab(col);
 [{"x":0-0,"y":1.0},{"x":1-0,"y":1.0},{"x":2-0,"y":1.0},{"x":3-0,"y":1.0}]

> SELECT histogram_numeric(col, 5)
    FROM VALUES (INTERVAL 0 DAY), (INTERVAL 1 DAY), (INTERVAL 2 DAY),
    (INTERVAL 3 DAY) AS tab(col);
  [{"x":0 00:00:00.000000000,"y":1.0},{"x":1 00:00:00.000000000,"y":1.0},{"x":2 00:00:00.000000000,"y":1.0},{"x":3 00:00:00.000000000,"y":1.0}]

> SELECT histogram_numeric(col, 5)
    FROM VALUES (TIMESTAMP '2020-01-01'), (TIMESTAMP'2020-02-01'),
    (TIMESTAMP'2020-03-01'), (TIMESTAMP'2020-10-01') AS tab(col)
  [{"x":2020-01-01 00:00:00,"y":1.0},{"x":2020-02-01 00:00:00,"y":1.0},{"x":2020-03-01 00:00:00,"y":1.0},{"x":2020-10-01 00:00:00,"y":1.0}]