my_code_base.stats.xarray_utils

Classes

StatsAccessor

This class provides statistical operations on xarray data.

Functions

xr_linregress(x, y[, dim, dof, deseasonalize])

Calculate linear regression statistics between two xarray.DataArray along a specified dimension.

Module Contents

class my_code_base.stats.xarray_utils.StatsAccessor(obj)

This class provides statistical operations on xarray data.

Parameters:
obj : xarray.Dataset or xarray.DataArray

The xarray object on which statistical operations will be performed.

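Methods:

weighted_mean(dim)

Calculate the weighted annual mean along the given dimension.

fill_months_with_annual_value()

Fill the xarray object with annual values for each month.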

fill_months_with_annual_value()

Expand a yearly time series to monthly resolution by assigning each year's value to every month of that year.

Returns:

The xarray object at monthly resolution, with each month holding the value of its year.

Return type:

xarray.Dataset or xarray.DataArray

Raises:

ValueError – If the time series is not of yearly frequency.
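The expansion can be pictured with a small plain-Python sketch. It is not the method's implementation: it assumes the yearly series is a dict mapping integer years to values (a stand-in for the time coordinate) and approximates the yearly-frequency check by requiring consecutive years. The function name mirrors the method only for readability.

```python
from typing import Dict, Tuple


def fill_months_with_annual_value(
    yearly: Dict[int, float],
) -> Dict[Tuple[int, int], float]:
    """Hypothetical sketch: copy each year's value into all 12 of its months."""
    years = sorted(yearly)
    # Assumed stand-in for the frequency check: years must be consecutive.
    if any(later - earlier != 1 for earlier, later in zip(years, years[1:])):
        raise ValueError("time series is not of yearly frequency")
    # Key each monthly value by (year, month).
    return {
        (year, month): yearly[year]
        for year in years
        for month in range(1, 13)
    }


monthly = fill_months_with_annual_value({2000: 1.5, 2001: 2.0})
print(len(monthly), monthly[(2000, 7)], monthly[(2001, 12)])  # 24 1.5 2.0
```

Two years of input yield 24 monthly entries, and any gap between years triggers the ValueError, mirroring the documented behavior.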

weighted_mean(dim)

Calculate the annual mean, weighting each month by its number of days.

Parameters:
dim : str

The dimension along which to calculate the weighted mean.

Returns:

weighted_mean – The weighted annual mean.

Return type:

xarray.Dataset or xarray.DataArray
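Day-of-month weighting reduces to a weighted average in which each month counts in proportion to its length. A minimal plain-Python sketch for a single year follows; it assumes the proleptic Gregorian calendar of Python's `calendar` module and ignores `dim`, which the real method uses to select the xarray dimension. With xarray, such weights are commonly taken from `time.dt.days_in_month`, though whether weighted_mean does exactly that is an assumption.

```python
import calendar
from typing import List


def weighted_annual_mean(year: int, monthly_values: List[float]) -> float:
    """Annual mean of 12 monthly values, each weighted by its month length.

    Hypothetical helper; month lengths follow the proleptic Gregorian
    calendar of the `calendar` module, so February has 29 days in leap years.
    """
    if len(monthly_values) != 12:
        raise ValueError("expected exactly 12 monthly values")
    # calendar.monthrange(year, month) returns (first weekday, days in month).
    days = [calendar.monthrange(year, month)[1] for month in range(1, 13)]
    weighted_sum = sum(d * v for d, v in zip(days, monthly_values))
    return weighted_sum / sum(days)  # sum(days) is 365 or 366


print(weighted_annual_mean(2001, [1.0] + [0.0] * 11) == 31 / 365)  # True
```

With all weight on January the result is 31/365, rather than the 1/12 an unweighted mean would give.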

my_code_base.stats.xarray_utils.xr_linregress(x, y, dim='time', dof=None, deseasonalize: bool = True)

Calculate linear regression statistics between two xarray.DataArray along a specified dimension.

Parameters:
x : xarray.DataArray

The independent variable.

y : xarray.DataArray

The dependent variable.

dim : str, optional

The dimension along which to perform the regression. Defaults to ‘time’.

dof : int, str, or tuple, optional

The degrees of freedom for the t-distribution. Defaults to None. Accepted values:

  • None: n - 2, where n is the sample size.

  • ‘integral_timescale’: the integral timescale is calculated and used to determine the degrees of freedom.

  • ‘effective_sample_size’: the effective sample size is calculated and used to determine the degrees of freedom.

  • ('integral_timescale', '1/e'): like ‘integral_timescale’, but the integral timescale is computed with the 1/e decay threshold instead of the first zero-crossing.

deseasonalize : bool, optional

Whether to deseasonalize the data. Defaults to True.

Returns:

A dataset containing the following regression statistics:

  • ‘sample_size’: The number of non-null values along the specified dimension.

  • ‘slope’: The slope of the regression line.

  • ‘intercept’: The intercept of the regression line.

  • ‘r_value’: The correlation coefficient between x and y.

  • ‘p_value’: The two-tailed p-value for the null hypothesis that the slope is zero.

  • ‘std_err’: The standard error of the slope.

Return type:

xarray.Dataset
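To make the formulas explicit, the returned statistics can be sketched for a single pair of 1-D series in plain Python. This sketch assumes the default dof=None (n - 2), drops pairs where either value is NaN, and obtains the two-tailed p-value from the t-statistic slope / std_err through the regularized incomplete beta function, evaluated with the textbook continued fraction (modified Lentz method). Helper names are hypothetical, and whether xr_linregress computes std_err exactly this way is an assumption.

```python
import math
from typing import Dict, List

_TINY = 1e-300  # guards against division by zero in Lentz's method


def _betacf(a: float, b: float, x: float) -> float:
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > _TINY else _TINY)
    h = d
    for m in range(1, 301):
        m2 = 2 * m
        for aa in (
            m * (b - m) * x / ((qam + m2) * (a + m2)),  # even term
            -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2)),  # odd term
        ):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > _TINY else _TINY)
            c = 1.0 + aa / c
            c = c if abs(c) > _TINY else _TINY
            h *= d * c
        if abs(d * c - 1.0) < 1e-14:  # last factor close to 1: converged
            break
    return h


def _betainc(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(
        math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
        + a * math.log(x) + b * math.log1p(-x)
    )
    # Symmetry I_x(a, b) = 1 - I_{1-x}(b, a) keeps the fraction fast-converging.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_two_tailed_p(t: float, dof: float) -> float:
    """P(|T| >= |t|) for Student's t with `dof` degrees of freedom."""
    return _betainc(0.5 * dof, 0.5, dof / (dof + t * t))


def linregress_stats(x: List[float], y: List[float]) -> Dict[str, float]:
    """OLS statistics named like the documented output fields (dof = n - 2)."""
    # Drop pairs where either value is NaN.
    pairs = [(a, b) for a, b in zip(x, y) if not (math.isnan(a) or math.isnan(b))]
    n = len(pairs)
    if n < 3:
        raise ValueError("need at least 3 non-NaN pairs")
    x_mean = sum(a for a, _ in pairs) / n
    y_mean = sum(b for _, b in pairs) / n
    sxx = sum((a - x_mean) ** 2 for a, _ in pairs)
    syy = sum((b - y_mean) ** 2 for _, b in pairs)
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in pairs)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("x or y is constant")
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    dof = n - 2
    # Standard error of the slope: sqrt(residual variance / sxx).
    std_err = math.sqrt(max(1.0 - r * r, 0.0) * syy / sxx / dof)
    p = 0.0 if std_err == 0.0 else t_two_tailed_p(slope / std_err, dof)
    return {
        "sample_size": n,
        "slope": slope,
        "intercept": y_mean - slope * x_mean,
        "r_value": r,
        "p_value": p,
        "std_err": std_err,
    }


stats = linregress_stats([0, 1, 2, 3, float("nan")], [1, 3, 5, 7, 100])
print(stats["sample_size"], stats["slope"], stats["intercept"])  # 4 2.0 1.0
```

The NaN pair is excluded, so sample_size is 4; a perfect linear fit gives std_err 0 and, by the convention chosen here, a p-value of 0.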

Notes

  • NaN values are automatically excluded from the calculations.

  • The p-value is computed from the t-distribution, using the degrees of freedom selected by dof.

  • If dof is ‘integral_timescale’, the integral timescale is calculated as the sum of autocorrelation values until the first zero-crossing. The effective sample size is then calculated as the total sample size divided by the integral timescale.

  • If dof is ‘effective_sample_size’, the effective sample size is calculated as n * (1 - r1*r2) / (1 + r1*r2), where r1 and r2 are the lag-1 autocorrelation coefficients of x and y, respectively.