Release notes for Soda Library

[soda-library] 1.8.0

25 October 2024

1.8.0 Features and Fixes

  • CLOUD-8739: orchestrate observability by @jzalucki in #339

[soda-library] 1.7.1

25 October 2024

1.7.1 Features and Fixes

  • Anomaly Detection check: add alert_directionality
  • Comparison row count check: secondary datasource filter fix (#2165) by @m1n0 in #337
  • Reconciliation row: fix multiple checks with filters by @m1n0 in #338

[soda-library] 1.7.0

17 October 2024

1.7.0 Features and Fixes

  • Support both pydantic v1 and v2 by @m1n0 in #328

[soda-library] 1.6.5

17 October 2024

1.6.5 Features and Fixes

  • Feature: changed auto_exclude_anomaly under TrainingDatasetParameters by @teresama in #333
  • Chore: performance testing CI pipeline

[soda-library] 1.6.4

08 October 2024

1.6.4 Features and Fixes

  • Chore: Changed pandas version to be compatible with python 3.8 by @teresama in #323
  • Add tracing to bunch of classes + allow 1.6.4.dev0 by @jzalucki in #325
  • Update obs extreme values test after pandas2 by @m1n0 in #329
  • Run perf nightly, add more dd tags, fix none hostname. by @jzalucki in #330
  • Feature: offer user control to automatically exclude classified anomalies from training auto_exclude_anomalies in anomaly checks by @teresama in #327
  • Fix obs test for py38 by @m1n0 in #331
  • [CLOUD-8480] Revert “Put back global 10k query limit temporarily (#324)” by @dirkgroenen in #332

[soda-library] 1.6.3

26 September 2024

1.6.3 Features and Fixes

  • Put back global 10k query limit temporarily by @m1n0 in #324

[soda-library] 1.6.2

24 September 2024

1.6.2 Features and Fixes

  • Remove global hard limit on queries by @m1n0 in #321
  • Fix: exclude NaNs (as NULLS) from aggregate queries in databricks by @bastienboutonnet in #322

[soda-library] 1.6.1

17 September 2024

1.6.1 Features and Fixes

  • Dataset level configuration for attributes and samples columns. by @jzalucki in #313
  • Do not collect samples if collecting of default samples were disabled in the cloud. by @jzalucki in #314
  • Use default cloud samples columns. by @jzalucki in #315
  • Fix: Handle cases where database returns NaN instead of NULL in aggs and frequent values queries by @bastienboutonnet in #316
  • Add support for collect failed rows table and checks level. by @jzalucki in #317
  • CLOUD-8251 - Fix Oracle in CI by @dakue-soda in #302
  • Add custom message to DefaultSampler depending on samples disabled reason. by @jzalucki in #319
  • Send soda library version during file upload, if fileId not present mark sample as not persisted with a message. by @jzalucki in #318

[soda-library] 1.6.0

04 September 2024

1.6.0 Features and Fixes

  • Comparison check: Fix “other” table filter by @jzalucki in #304
  • Oracle: Bug fixes by @m1n0 in #260
  • Failed rows: Always expose failing sql for failed rows and user defined metric if failing query is available. by @jzalucki in #305
  • Failed rows: Add column property to failed rows and user defined metric checks. by @jzalucki in #307
  • Failed rows: Templatize rerouted sample message. by @jzalucki in #310
  • Observability: Catch issues when orchestrating profiling by @m1n0 in #311
  • Observability: Fall back to 1M if partition fails by @m1n0 in #312
  • Observability: Set duplicate percentage to None when zero rows in partition by @jzalucki in #301
  • Observability: Warn user when 24h partition is empty. by @jzalucki in #306

[soda-library] 1.5.25

14 August 2024

1.5.25 Fixes

  • Spark: Replicate implicit ‘include all’ in profiling by @m1n0 in #300 and #303
  • Freshness: Support variables in thresholds by @m1n0

[soda-library] 1.5.24

13 August 2024

1.5.24 Fixes

  • Adapt anomaly detector outcome messages for observability by @bastienboutonnet in #296
  • Always clean DB even on GH. by @jzalucki in #295
  • Group evolution: fix group changes not being detected by @m1n0 in #297
  • Feature: smaller min confidence interval ratio by @bastienboutonnet in #298
  • Execute observability checks outside of regular flow. by @jzalucki in #299

[soda-library] 1.5.23

02 August 2024

1.5.23 Fixes

  • Attempt to always show freshness even if last 24 hours partition does not return data. by @jzalucki in #291
  • Observability: Always add partition column to profiling result by @m1n0 in #293

[soda-library] 1.5.22

01 August 2024

1.5.22 Fixes

  • Observability: minimize metadata retrieval, do not push data into dis… by @m1n0 in #282
  • Handle SQL exception nicely for failed rows and user-defined check. by @jzalucki in #286
  • Spark: send discovery data despite errors. by @jzalucki in #290
  • Quote column names during observability partition detection. by @jzalucki in #288
  • Spark: failed rows should not be limited to max 100 total results. by @jzalucki in #292

[soda-library] 1.5.21

31 July 2024

1.5.21 Fixes

  • Add nchar, nvarchar and binary to text types for profiling. by @jzalucki in #281
  • CLOUD 8061: alias table names in sql queries by @jzalucki in #280
  • Oracle data source properties prefix should be None instead of “None” when no service name is provided. by @jzalucki in #283
  • Sqlserver: use appropriate aggregate methods to build queries by @jzalucki in #284
  • Cross row count check should support custom identity. by @jzalucki in #285
  • Copyedit on frequency detection error message by @janet-can in #287
  • Chore: update auto-assignments by @milanaleksic in #289

[soda-library] 1.5.20

24 July 2024

1.5.20 Fixes

  • Fix: make sure labelling incorrect anomalies always returns something by @bastienboutonnet in #265

[soda-library] 1.5.19

23 July 2024

1.5.19 Fixes

  • Always reset logger when new Scan instance is created. by @jzalucki in #277
  • Use SHOW TABLES and SHOW VIEWS instead of spark session catalog API. by @jzalucki in #278
  • Fix: apply cast to numerical for ms sqlserver by @bastienboutonnet in #279

[soda-library] 1.5.18

22 July 2024

1.5.18 Fixes

  • Use spark session catalog to get all table names including temporary views. by @jzalucki in #276

[soda-library] 1.5.17

17 July 2024

1.5.17 Fixes

  • Reconciliation: support custom source/target query with deepdiff strategy by @jzalucki in #269
  • Observability: apply 1M rows limit with time partition. by @jzalucki in #270
  • Snowflake: support custom hostname and port (#2109) by @m1n0 in #271
  • Add sslmode support to postgres and denodo (#2066) by @m1n0 in #273
  • Add Scan Context to read/write data from/to a scan (#2134) by @m1n0 in #272
  • Better user provided queries sanitize. (#2131) by @jzalucki in #275

[soda-library] 1.5.16

16 July 2024

1.5.16 Fixes

  • Observability: get all metric history by @m1n0 in #268

[soda-library] 1.5.15

15 July 2024

1.5.15 Fixes

  • Fix: cast SUM query to NUMERIC in BQ by @bastienboutonnet in #266
  • Fix: output outlier holidays even when no country holiday by @bastienboutonnet in #267

[soda-library] 1.5.14

02 July 2024

1.5.14 Fixes

  • Http Sampler: Do not invoke when no failed rows by @m1n0 in #264
  • Missing Count: Fix sample query by @m1n0 in #264
  • Between threshold: Fix error when using variables by @m1n0 in #264
  • Profiling: Fix discovery metadata bug by @m1n0 in #263

[soda-library] 1.5.13

28 June 2024

1.5.13 Fixes

  • Databricks: run tests in CI by @m1n0 in #239
  • Fix CI by @m1n0 in #261
  • Set minimum version of the freshness detector to 0.0.7 by @bastienboutonnet in #262

[soda-library] 1.5.12

27 June 2024

1.5.12 Fixes

  • SAS-3334 For duckdb do not use database as filter at all. by @jzalucki in #259
  • Fix: explicitly construct country holiday df and concat with outliers by @bastienboutonnet in #256

[soda-library] 1.5.11

24 June 2024

1.5.11 Fixes and features

  • Fix: leave warning bounds and only make level be pass when warn by @bastienboutonnet in #257
  • Fix: handle overflowing timestamps for nanosecond precision overflow issues by @bastienboutonnet in #258

[soda-library] 1.5.10

21 June 2024

1.5.10 Fixes and features

  • CLOUD-7426 Add scan_time to http payload. by @jzalucki in #255
  • Feature: exclude outliers from training via holiday interface by @bastienboutonnet in #254

[soda-library] 1.5.9

20 June 2024

1.5.9 Fixes and features

  • Oracle: fix profiling/discovery queries by @m1n0 in #253

[soda-library] 1.5.7 & 1.5.8

18 June 2024

Fixes and features

  • Observability: Fix Anomaly Detection check history retrieval by @m1n0 in #246
  • Spark: profiling support more text types (#2099) by @m1n0 in #250
  • Feature: use fail_only flag instead of warning 0 by @bastienboutonnet in #249
  • Oracle: fix queries, profiling and other by @m1n0 in #251

  • Duplicate check: support sample exclude columns fully by @m1n0 in #241
  • Spark: profiling support more numeric types by @m1n0 in #242
  • Fix: make gap removal wait for at least 5 days and implement simpler thresholds by @bastienboutonnet in #243
  • Profiling: support casting numericals to large data type by @m1n0 in #244
  • Cloud: better error handling and logging by @m1n0 in #245

[soda-library] 1.5.6

10 June 2024

Fixes and features

  • Freshness in obs: add log msg when no data by @m1n0 in #238
  • Obs: get all metadata only when enabled, make tests more robust by @m1n0 in #240

[soda-library] 1.5.5

05 June 2024

Fixes and features

  • SAS-3519 CLOUD-7769: Correctly map statuses from remote scans by @dirkgroenen in #232
  • Bump requests and tox/docker by @m1n0 in #236
  • Feature: make all non critical error messages be warnings in profiling by @bastienboutonnet in #235
  • Feature: observability anomalies are considered correctly classified unless negative feedback given by @bastienboutonnet in #234
  • Duplicate check: fail gracefully in case of error in query by @m1n0 in #237

[soda-library] 1.5.4

29 May 2024

Fixes and features

  • CLOUD-7751 - fix nightly CI pipeline, use Snowflake CI account config… by @dakue-soda in #229
  • Feature: use partition row count (via aggregates) and use in duplicate percent by @bastienboutonnet in #230

[soda-library] 1.5.3

28 May 2024

Fixes and features

  • CLOUD-7400: Improve memory usage for Queries by @dirkgroenen in #227
  • CLOUD-7702: Add Snowflake CI account to pipeline for soda-library by @dakue-soda in #223
  • CLOUD-7725: use newer thrift 0.20.0 by @milanaleksic in #228

[soda-library] 1.5.2

24 May 2024

Fixes and features

  • Observability: handle one row scenario by @m1n0 in #224
  • User defined metric check: support failed rows query by @m1n0 in #226

[soda-library] 1.5.1

22 May 2024

Fixes and features

  • Fix float comparisons in tests by @m1n0 in #222
  • Observability: hash metric identities by @m1n0 in #225

[soda-library] 1.5.0

20 May 2024

Fixes and features

  • Observability beta (behind feature flag) @m1n0 in #198

[soda-library] 1.4.10

17 May 2024

Fixes and features

  • Failed rows: fix warn/fail thresholds for fail condition (#2084) by @m1n0 in #221
  • upgrade sqlparse version inside soda base package by @Antoninj in #220

[soda-library] 1.4.9

14 May 2024

Fixes and features

  • CLOUD-7400 Stream query data through memory, reducing memory footprint by @dirkgroenen in #210

[soda-library] 1.4.8

07 May 2024

Fixes and features

  • CLOUD-7362: add base exception to error log messages cloud payload by @Antoninj in #212
  • Fix automated monitoring, prevent duplicate queries by @m1n0 in #90
  • Denodo: fix connection timeout attribute (#2065) by @m1n0 in #215
  • DB2: Update db2_data_source.py (#2063) by @m1n0 in #216
  • Update autoflake precommit by @m1n0 in #214
  • SAS-3361: upgrade to latest version of ibm-db python client by @Antoninj in #213
  • Hive: support scheme by @m1n0 in #217
  • Bump dev requirements by @m1n0 in #218

[soda-library] 1.4.7

10 April 2024

Fixes and features

  • Rename argument in set_scan_results_file method (#2047)
  • Dremio: support disableCertificateVerification option (#2049)

[soda-library] 1.4.5 & 1.4.6

04 April 2024

Fixes and features

  • SAS-3165 Only reset sampler when originally SodaCloudSampler by @dirkgroenen in #207
  • Feature: enable new anomaly detection algo in group by checks by @bastienboutonnet in #208

[soda-library] 1.4.4

23 March 2024

Fixes and features

  • Failed rows: fix warn/fail thresholds by @m1n0 in #204
  • Bump opentelemetry to 1.22 by @m1n0 in #205

[soda-library] 1.4.3

20 March 2024

Fixes and features

  • Add missing import for type annotations backwards compatibility by @Antoninj in #196
  • Refactor: Parse access_url from dbt config for new multicell org by @bastienboutonnet in #194
  • Timestamp conversion fixes by @Antoninj in #200
  • SAS-2966 Remove scan reference exception throw in local mode by @dirkgroenen in #199
  • Add test for checks level attributes by @m1n0 in #201
  • Fix: Attribute handler timezone test by @m1n0 in #202
  • Feature: Better legend wording and nicer tooltip formatting by @bastienboutonnet in #203

[soda-library] 1.4.2

05 March 2024

Fixes

  • Dremio: fix token support (#2028) by @m1n0 in #195

[soda-library] 1.4.1

01 March 2024

Fixes and features

  • IA-533: Implement daily and monthly seasonality to external regressor by @baturayo in #189
  • Support GMT (Zulu) and microseconds time format by @m1n0 in #193

[soda-library] 1.4.0

28 February 2024

Fixes and features

  • Cloud 6550: remote scans by @m1n0 in #192

[soda-library] 1.3.4

28 February 2024

Fixes and features

  • Fix: timezone mismatch between the recent and historical ad results by @baturayo in #188
  • Feature: in anomaly detection simulator use soda core historic check results endpoint instead of test results by @baturayo in #190
  • Update dask-sql by @m1n0 in #191

[soda-library] 1.3.3

13 February 2024

Fixes and features

  • Fix: include simulator assets folder into the setup.py by @baturayo in #186

[soda-library] 1.3.2

13 February 2024

Fixes and features

  • Fix: simulator import and streamlit path by @m1n0 in #182
  • Oracle: create dsn if not provided (#2012) by @m1n0 in #183
  • Oracle: cast config to str/int to prevent oracledb errors (#2018) by @m1n0 in #184
  • Oracle: fix Cloud integration by @m1n0 in #185

[soda-library] 1.3.1

09 February 2024

Fixes and features

  • Feature: correctly identified anomalies are excluded from training data by @baturayo in #178
  • Fix: show more clearly the detected frequency using warning message first by @baturayo in #180
  • Pin segment analytics and typing-extensions by @m1n0 in #181

[soda-library] 1.3.0

08 February 2024

Fixes and features

  • Feature: anomaly detection simulator by @baturayo in #163
  • Feature: added dremio token support (#2009) by @m1n0 in #179
  • Temporarily affix Segment Analytics version by @dirkgroenen in #177
  • Cloud 6693 improve group by by @m1n0 in #176

[soda-library] 1.2.4

31 January 2024

Fixes and features

  • Feature: implement severity level parameters by @baturayo in #169
  • Fix for min_confidence_interval_ratio parameter by @baturayo in #170
  • Always use datasource specific COUNT expression (#2003) by @m1n0 in #172
  • Send result to Cloud if data source connection issue by @m1n0 in #171
  • CLOUD-6805: avoid sending empty error location when logging configuration file parsing errors by @Antoninj in #173
  • CLOUD-6817: Catch Cloud exceptions (failed insertions) properly by @dirkgroenen in #174

[soda-library] 1.2.2 & 1.2.3

26 January 2024

Fixes and features

  • Hive data source improvements by @robertomorandeira in sodadata/soda-core#1982
  • Feature: Implement migrate from anomaly score check config by @baturayo in sodadata/soda-core#1998
  • Bump Prophet by @m1n0 in sodadata/soda-core#2000
  • Tests: Use approx comparison for floats by @m1n0 in sodadata/soda-core#1999


  • Support token auth by @m1n0 in #159
  • Schema check: Support custom identity (#1988) by @m1n0 in #161
  • CLI: Omit exception if no cli args by @m1n0 in #162
  • Add semver release for major, minor and latest by @dirkgroenen in #164
  • Bug: Handle null values for continuous dist by @baturayo in #165
  • IA-486: implement new anomaly detection logic and syntax by @baturayo in #153
  • Fix Python3.8 type issues for new AD syntax by @baturayo in #166
  • Feature: Support built in prophet public holidays by @baturayo in #167

[soda-library] 1.2.0

16 January 2024

Fixes and features

  • dbt: improve parsing logs by @m1n0 in #157
  • Sampler: fix link href by @m1n0 in #158
  • BREAKING: Row Reconciliation, new simple strategy for batch processing by @m1n0 in #155

[soda-library] 1.2.1

14 January 2024

Fixes and features

  • Recon row fixes by @m1n0 in #160

[soda-library] 1.1.29

03 January 2024

Fixes and features

  • Feature: implement warn_only for anomaly score by @baturayo in #156

[soda-library] 1.1.28

15 December 2023

Fixes and features

  • Fix frequency aggregation bug for anomaly detection by @baturayo in #152
  • Bump pydantic from v1 to v2 by @baturayo in #151
  • Adding support for authentication via a chained list of delegate accounts by @m1n0 in #154

[soda-library] 1.1.27

15 December 2023

Fixes and features

  • Group by: support anomaly/cot, better names by @m1n0 in #147

[soda-library] 1.1.26

04 December 2023

Fixes and features

  • Freshness: support in-check filters (#1970) by @m1n0 in #150. Documentation to follow shortly.

[soda-library] 1.1.24 & 1.1.25

24 November 2023

Fixes and features

  • Reconciliation row: expose deepdiff config, lower sensitivity by @m1n0 in #149

  • Make custom identity fixed as v4 by @m1n0 in #143
  • Reconciliation row: fix key cols mapping, bugfixes by @m1n0 in #148

[soda-library] 1.1.23

19 November 2023

Fixes and features

  • Align usage of database/catalog and implement fallback by @dirkgroenen in #142
  • Remove segment logs by @m1n0 in #145
  • Align usage of exit codes and add exit_code(4) by @dirkgroenen in #146

[soda-library] 1.1.22

14 November 2023

Fixes and features

  • Cloud: Add ScanId by @dirkgroenen in #137
  • Athena: Set default catalog name by @dirkgroenen in #139
  • Sqlserver: remove % from pattern (#1956) by @m1n0 in #140
  • Sqlserver: support quoting tables with brackets, “quote_tables” mode by @m1n0 in #141

[soda-library] 1.1.20 & 1.1.21

02 November 2023

Fixes and features

  • Freshness: support mixed thresholds by @m1n0 in #134
  • Duckdb: Rename path to database by @dirkgroenen in #135

  • Failed rows: new ‘empty’ type, handle no rows scenario better by @m1n0 in #132
  • Extend Data Source identity migration to spark_df by @dirkgroenen in #133

[soda-library] 1.1.19

23 October 2023

Fixes and features

  • Fix: compute value counts in DB rather than in python for categoric distribution checks by @baturayo in #116
  • Run scientific unit tests in CI by @baturayo in #121
  • Raise a warning instead of exception when dataset name is incorrect in suggestions by @baturayo in #126
  • Add support for custom dask data source name by @dirkgroenen in #120

[soda-library] 1.1.17 & 1.1.18

12 October 2023

Fixes and features

  • Remove quotes from dataset name in check payload by @m1n0 in #124

  • Fix package specific tests by @m1n0 in #123
  • Cloud 4311 nightly dev builds by @vijaykiran in #125
  • Add threshold support to failed row query/condition checks by @vijaykiran in #127

[soda-library] 1.1.15 & 1.1.16

11 October 2023

Fixes and features

  • Change dbt version marker in extras_require. To install soda-dbt, use either pip install -i https://pypi.cloud.soda.io "soda-dbt[ver16]" or pip install -i https://pypi.cloud.soda.io "soda-dbt[ver15]".
  • Fix error on invalid check attributes by @vijaykiran in #117
  • CLOUD-5705 Fix schema attributes validation by @vijaykiran in #118
  • Add attributes to checks level by @vijaykiran in #119
  • Add tests for reconciliation checks, minor bugfixes by @m1n0 in #122

[soda-library] 1.1.14

05 October 2023

Fixes and features

  • Chore: rename Soda Library docker image build step by @Antoninj in #109
  • Fix threshold cloud payload for freshness checks by @vijaykiran in #112
  • Fix schema reconciliation config parsing by @m1n0 in #113
  • Allow to specify virtual file name for add sodacl string by @m1n0 in #115
  • Check type segment tracking by @m1n0 in #114

[soda-library] 1.1.13

27 September 2023

Fixes and features

  • Reconciliation schema: support type mapping by @m1n0 in #110
  • Fix databricks numeric types profiling by @m1n0 in #111

[soda-library] 1.1.12

21 September 2023

Fixes and features

  • Add PR auto assign reviewer GH workflow by @Antoninj in #103
  • Fix: nofile payload when http sampler is used by @m1n0 in #104
  • Trino: fix dataset prefix by @m1n0 in #105
  • Row reconciliation improve sample by @m1n0 in #106
  • Schema reconciliation improve diagnostics by @m1n0 in #107
  • Add thresholds and diagnostics to scan result by @m1n0 in #108

[soda-library] 1.1.11

19 September 2023

Fixes and features

  • Feature: Support dbt 1.5 and 1.6 by @vijaykiran in #99
  • Feature: Reference check: support must NOT exist by @m1n0 in #100
  • Fix: Reconciliation variables support by @m1n0 in #93
  • Fix: Catch exceptions while building results file by @dirkgroenen in #63
  • Fix: Row diff: fix python 3.8 compatibility by @vijaykiran in #101
  • Improvement: Reconciliation row diff better config logging by @m1n0 in #102

[soda-library] 1.1.10

13 September 2023

Fixes

  • Reconciliation row diff handle incompatible schema by @m1n0 in #97
  • Fix: api_key_id tracking in check suggestions by @baturayo in #98

[soda-library] 1.1.9

12 September 2023

Fix

Soda Library 1.1.9 includes a fix for reconciliation check results that have been overwriting historical results data in Soda Cloud.

Upon upgrading, Soda Cloud archives any existing check history for reconciliation checks only. With 1.1.9, reconciliation check results start collecting a fresh history of results with an improved check identity algorithm that properly retains check history.

Action

  1. Upgrade to Soda Library 1.1.9 to leverage the fix.
  2. Initiate a new scan that involves your reconciliation checks.
  3. Review the refreshed check results in Soda Cloud, which mark the start of a new, properly retained history of results.

[soda-library] 1.1.6 - 1.1.8

11 September 2023

Fixes and features

  • Discussion scan type by @m1n0 in #91
  • Reconciliation schema remove warn, adjust pass graph numbers by @m1n0 in #92
  • Apply filter in row reconciliation by @m1n0 in #94
  • Reconciliation schema check by @m1n0 in #89
  • Reconciliation freshness check fix cloud ingest by @m1n0 in #87

[soda-library] 1.1.0 - 1.1.5

31 August 2023

Fixes and features

  • Remove label from recon checks by @m1n0 in #82
  • Recon row sample more intuitive header by @m1n0 in #83
  • Handle recon metric division by zero by @m1n0 in #84
  • WIP row recon column mapping by @m1n0 in #85
  • Add Presto support by @vijaykiran in #86

  • Push recon metric diagnostics to Soda Cloud by @m1n0 in #77
  • Fix key columns related issue by @vijaykiran in #78
  • CLOUD-4549 change source/target column to source/target columns by @vijaykiran in #79
  • Fix recon row column handling by @m1n0 in #80
  • Fix divide by zero when the metric value is 0 by @vijaykiran in #81

  • Fix recon group type by @m1n0 in #72
  • Update recon label behaviour by @m1n0 in #73

  • Row reconciliation samples by @m1n0 in #70

  • Support custom identity for failed rows check type by @m1n0 in #65
  • Row recon metric send count only by @m1n0 in #66
  • Source and target key columns support by @m1n0 in #67
  • Ingest recon checks as groups, add summary and diagnostics by @m1n0 in #68
  • Row reconciliation samples by @m1n0 in #69

[soda-library] 1.0.6 - 1.0.8

11 August 2023

Fixes

  • Metrics-based recon checks WIP by @m1n0 in #52
  • Fix typo in recon check name construction by @m1n0 in #57
  • CLOUD-4314: Make abs default for reconciliation checks by @vijaykiran in #58
  • CLOUD-4320: Fix between thresholds for reconciliation by @vijaykiran in #59
  • Build cleanup by @vijaykiran in #60
  • CLOUD-4319: Add support for metric expressions by @vijaykiran in #61
  • CLOUD-3993: Apply the CI/CD fix from soda-core to CI/CD by @milanaleksic in #62
  • Reconciliation row diff checks WIP by @vijaykiran in #64
  • Recon row diff: fix threshold-based outcome WIP

[soda-library] 1.0.5

26 July 2023

Fixes

  • Trino connector has new options: source, client_tags
  • Fix for optional schema_name property added to schema checks (743811c)

[soda-library] 1.0.3 & 1.0.4

21 July 2023

Fixes and features

  • CLOUD-4112 pass scan reference by @gregkaczan in #37
  • Source owner property in scan insert payload by @m1n0 in #39
  • Remove code that was originally copied over from core by @m1n0 in #41
  • CLOUD-4144 add attributes to cross-checks by @vijaykiran in #42
  • Evaluate group evolution conditions if no historical data is present by @m1n0 in #40
  • Add dict as an overridable field by @vijaykiran in #43
  • Samples columns support by @m1n0 in #38
  • Fix filter in failed rows samples with parenthesis by @m1n0 in #44
  • Add app identifier to datasources by @vijaykiran in #45
  • Bug: skipping partition suggestion were causing the app to fail by @baturayo in #46
  • CLOUD-4170 expose cloud url by @gregkaczan in #49
  • [sqlserver] fix port configuration by @vijaykiran in #50
  • Set metric for failed rows check by @m1n0 in #48
  • Block soda suggest if cloud config is missing by @m1n0 in #51
  • Fix templates for failed rows by @vijaykiran in #53
  • Introduce schema_name property for schema checks by @vijaykiran in #54
  • Bump requirements by @vijaykiran in #55
  • Bug: fix keyboard interrupt tracking in check suggestions by @baturayo in #21
  • CLOUD-3967 merge soda scientific into main package by @vijaykiran in #20
  • Include template definition in check definition by @m1n0 in #23
  • DB prefix set to None if no info available by @m1n0 in #27
  • Improve templates not found/provided msgs by @m1n0 in #26
  • Update check suggestion links by @janet-can in #25
  • Fix boolean attributes+add tests by @m1n0 in #28
  • Update PR Workflow for merge queue support by @vijaykiran in #29
  • Templates support for failed rows check by @m1n0 in #30
  • Feature: track supported and unsupported data sources by @baturayo in #32
  • CLOUD-3862 push ci info file contents to cloud scan results by @gregkaczan in #31
  • Fix link to attributes by @vijaykiran in #33
  • TRINO: add http_headers option by @vijaykiran in #35
  • HIVE: add configuration parameters by @vijaykiran in #36
  • CLOUD-3861 pass scanType with cicd option by @gregkaczan in #34

[soda-library] 1.0.1 & 1.0.2

23 June 2023

Fixes

  • Add dispatch pipeline for pushing to Dockerhub by @dakue-soda in #10
  • Fix container build, the reference to our own pypi was m… by @dakue-soda in #11
  • Allow newer version of pyyaml by @m1n0 in #13
  • Handle scenario where schema cannot be obtained by @m1n0 in #14
  • Set default for group by name by @vijaykiran in #16
  • Include checks metadata in scan result by @m1n0 in #17
  • Upgrade BigQuery client to 3.x by @m1n0 in #19
  • Include basic data source info in scan payload by @m1n0 in #15

[soda-library] 1.0.0

15 June 2023

General availability release

Introducing Soda Library, a Python library and CLI tool for testing data quality.

Built on top of Soda Core, Soda Library leverages all the features and functionality of the open-source tool, with newly added features. Install Soda Library from the command line, then configure it to connect to Soda Cloud using API keys that are valid for a free, 45-day trial.

pip install -i https://pypi.cloud.soda.io soda-postgres

If you already use Soda Core, you can seamlessly upgrade to Soda Library without changing any configurations, checks, or integrations. See Migrate from Soda Core for details.

Features

  • Soda Library supports SodaCL’s newest checks: Group By and Group Evolution.
    • For an individual dataset, add a Group By configuration to specify the categories into which Soda must group the check results. When you run a scan, Soda groups the results according to the unique values in the column you identified.
    • Use a Group Evolution check to validate the presence or absence of a group in a dataset, or to check for changes to groups in a dataset relative to their previous state.
  • Soda Library supports Check Suggestions, a helpful CLI tool that assists you in generating basic data quality checks. Instead of writing your own data quality checks from scratch, the Check Suggestions assistant profiles your dataset, then prompts you through a series of questions so that it can leverage the built-in Soda metrics and auto-generate quality checks tailored to your data.
  • Soda Library supports Check template configurations that enable you to prepare a user-defined metric that you can reuse in checks in multiple checks YAML files.
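
The Group By and Group Evolution behavior described above can be illustrated with a small, plain-Python sketch. This is not Soda Library code or SodaCL syntax. The function names `group_row_counts` and `group_changes` and the sample rows are hypothetical, chosen only to show the idea: one result per unique value in the chosen column, then a comparison of the current groups against a previous state.

```python
from collections import defaultdict


def group_row_counts(rows, group_column):
    """Hypothetical helper: count rows per unique value of group_column."""
    counts = defaultdict(int)
    for row in rows:
        counts[row[group_column]] += 1
    return dict(counts)


def group_changes(previous_groups, current_groups):
    """Hypothetical helper: report groups added or removed since the previous state."""
    previous, current = set(previous_groups), set(current_groups)
    return {
        "added": sorted(current - previous),
        "removed": sorted(previous - current),
    }


# Illustrative data only; not taken from any real dataset.
rows = [
    {"country": "BE", "amount": 10},
    {"country": "BE", "amount": 5},
    {"country": "US", "amount": 7},
]

# Group By idea: results are split by the unique values in "country".
counts = group_row_counts(rows, "country")

# Group Evolution idea: compare today's groups with an assumed earlier state.
changes = group_changes(["BE", "FR"], counts)

print(counts)
print(changes)
```

In this sketch, a group evolution style check would flag that the `US` group appeared and the `FR` group disappeared relative to the assumed previous state. How Soda stores and compares that state is not covered here.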

Last modified on 15-Jun-23