my_code_base.stats.xarray_utils
Classes
StatsAccessor | This class provides statistical operations on xarray data.
Functions
xr_linregress | Calculate linear regression statistics between two xarray.DataArray objects along a specified dimension.
Module Contents
- class my_code_base.stats.xarray_utils.StatsAccessor(obj)
This class provides statistical operations on xarray data.
- Parameters:
- obj : xarray.Dataset or xarray.DataArray
The xarray object on which statistical operations will be performed.
- fill_months_with_annual_value()
Fill every month of each year with that year's annual value.
- Returns:
The xarray object with each annual value extended to every month of its year.
- Return type:
xarray.Dataset or xarray.DataArray (the same type as obj)
- Raises:
ValueError – If the time series is not of yearly frequency.
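The following stdlib-only sketch illustrates the behaviour described for fill_months_with_annual_value on plain Python data rather than an xarray object. The helper fill_months_with_annual_value_sketch, its {year: value} input format, and the consecutive-year check used to detect a yearly series are assumptions for illustration, not the accessor's actual implementation.

```python
from datetime import date


def fill_months_with_annual_value_sketch(annual):
    """Hypothetical stand-in: expand a {year: value} mapping into monthly entries.

    Every month of a year receives that year's value. Raises ValueError when
    the keys are not consecutive years (assumed test for "yearly frequency").
    """
    years = sorted(annual)
    # Assumption: a yearly series must step by exactly one year between entries.
    if any(b - a != 1 for a, b in zip(years, years[1:])):
        raise ValueError("time series is not of yearly frequency")
    # Key each month by its first day and repeat the annual value twelve times.
    return {
        date(year, month, 1): annual[year]
        for year in years
        for month in range(1, 13)
    }


monthly = fill_months_with_annual_value_sketch({2000: 1.5, 2001: 2.0})
print(len(monthly))               # 24
print(monthly[date(2000, 7, 1)])  # 1.5
```

A gap in the years (for example {2000: 1.0, 2002: 2.0}) raises ValueError in this sketch, mirroring the documented error for non-yearly input.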
- my_code_base.stats.xarray_utils.xr_linregress(x, y, dim='time', dof=None, deseasonalize: bool = True)
Calculate linear regression statistics between two xarray.DataArray objects along a specified dimension.
- Parameters:
- x : xarray.DataArray
The independent variable.
- y : xarray.DataArray
The dependent variable.
- dim : str, optional
The dimension along which to perform the regression. Defaults to ‘time’.
- dof : int, str, or tuple, optional
The degrees of freedom for the t-distribution. If None, it is calculated as n - 2, where n is the sample size. If ‘integral_timescale’, the integral timescale is calculated and used to determine the degrees of freedom. If ‘effective_sample_size’, the effective sample size is calculated and used to determine the degrees of freedom. Can also be the tuple ('integral_timescale', '1/e') to use the 1/e decay threshold instead of the first zero-crossing when computing the integral timescale. Defaults to None.
- deseasonalize : bool, optional
Defaults to True.
- Returns:
A dataset containing the following regression statistics:
  - ‘sample_size’: The number of non-null values along the specified dimension.
  - ‘slope’: The slope of the regression line.
  - ‘intercept’: The intercept of the regression line.
  - ‘r_value’: The correlation coefficient between x and y.
  - ‘p_value’: The two-tailed p-value for the null hypothesis that the slope is zero.
  - ‘std_err’: The standard error of the slope.
- Return type:
xarray.Dataset
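As a rough illustration of how the returned statistics relate to each other, the sketch below computes them for a single pair of 1-D sequences using only the standard library. linregress_sketch and its extra t_stat entry are hypothetical; the sketch assumes the chosen dof replaces n - 2 in the residual variance, and it omits the p-value, which would require a t-distribution survival function (for example scipy.stats.t.sf).

```python
import math


def linregress_sketch(x, y, dof=None):
    """Hypothetical stdlib-only sketch of the per-series regression statistics.

    Works on plain Python sequences, not xarray objects, and omits the p-value.
    """
    # Keep only pairs where neither x nor y is NaN (mirrors the NaN exclusion in Notes).
    pairs = [(a, b) for a, b in zip(x, y) if not (math.isnan(a) or math.isnan(b))]
    n = len(pairs)
    if dof is None:
        dof = n - 2  # default degrees of freedom
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_value = sxy / math.sqrt(sxx * syy)
    # Residual sum of squares; clamp tiny negative round-off to zero.
    sse = max(syy - slope * sxy, 0.0)
    # Assumption: dof replaces n - 2 in the residual variance.
    std_err = math.sqrt(sse / dof / sxx)
    # t statistic for H0: slope == 0. A p-value would follow as
    # 2 * scipy.stats.t.sf(abs(t_stat), dof); omitted to stay stdlib-only.
    t_stat = slope / std_err if std_err > 0 else math.copysign(math.inf, slope)
    return {
        "sample_size": n,
        "slope": slope,
        "intercept": intercept,
        "r_value": r_value,
        "std_err": std_err,
        "t_stat": t_stat,
    }


stats = linregress_sketch([0, 1, 2, 3, float("nan")], [1, 2, 4, 5, 9])
print(stats["sample_size"], round(stats["slope"], 3))  # 4 1.4
```

With dof left at n - 2, t_stat equals r_value * sqrt(dof / (1 - r_value**2)), the usual t statistic for testing a zero slope.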
Notes
NaN values are automatically excluded from the calculations.
The correlation coefficient, p-value, and standard error are calculated using the t-distribution.
If dof is ‘integral_timescale’, the integral timescale is calculated as the sum of autocorrelation values until the first zero-crossing (or, with the tuple ('integral_timescale', '1/e'), until the autocorrelation first decays to 1/e). The effective sample size is then calculated as the total sample size divided by the integral timescale.
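A minimal sketch of the integral-timescale approach for one NaN-free series follows. The helpers autocorr, integral_timescale, and effective_n_from_timescale are hypothetical; the sketch assumes the sum starts at lag 0 (so a series with no positive autocorrelation gets a timescale of 1 and keeps its full sample size), and the actual function may use a different autocorrelation estimator or another convention, such as 1 plus twice the sum over positive lags.

```python
import math


def autocorr(series, lag):
    """Sample autocorrelation at `lag`, normalised by the lag-0 sum of squares."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    return sum(dev[i] * dev[i + lag] for i in range(n - lag)) / sum(d * d for d in dev)


def integral_timescale(series, threshold="zero"):
    """Sum autocorrelations from lag 0 until they first drop to the cutoff.

    threshold="zero" stops at the first zero-crossing; threshold="1/e" stops
    once the autocorrelation has decayed to 1/e or below.
    """
    cutoff = 0.0 if threshold == "zero" else 1 / math.e
    total = 0.0
    for lag in range(len(series)):
        rho = autocorr(series, lag)
        if rho <= cutoff:
            break
        total += rho
    return total


def effective_n_from_timescale(series, threshold="zero"):
    # Effective sample size = total sample size / integral timescale.
    return len(series) / integral_timescale(series, threshold)


ramp = [float(v) for v in range(1, 11)]
print(round(integral_timescale(ramp), 3))          # 2.261
print(round(effective_n_from_timescale(ramp), 2))  # 4.42
```

Because the 1/e cutoff sums a prefix of the same positive terms, it never yields a longer timescale than the zero-crossing rule, so it gives an equal or larger effective sample size.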
If dof is ‘effective_sample_size’, the effective sample size is calculated as n * (1 - r1*r2) / (1 + r1*r2), where r1 and r2 are the lag-1 autocorrelation coefficients of x and y, respectively.
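The lag-1 formula above can be sketched in a few lines. lag1_autocorr and effective_sample_size are hypothetical helpers that assume equal-length, NaN-free inputs; the actual implementation may estimate the lag-1 autocorrelation differently.

```python
def lag1_autocorr(series):
    """Lag-1 sample autocorrelation, normalised by the lag-0 sum of squares."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    return sum(dev[i] * dev[i + 1] for i in range(n - 1)) / sum(d * d for d in dev)


def effective_sample_size(x, y):
    """n * (1 - r1*r2) / (1 + r1*r2), with r1, r2 the lag-1 autocorrelations of x and y."""
    n = len(x)
    r1, r2 = lag1_autocorr(x), lag1_autocorr(y)
    return n * (1 - r1 * r2) / (1 + r1 * r2)


ramp = [float(v) for v in range(1, 11)]
print(round(effective_sample_size(ramp, ramp), 2))  # 3.42
```

Because the correction depends on the product r1*r2, series whose lag-1 autocorrelations have opposite signs yield an effective sample size larger than n.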