summary_cont()
Returns a summary table as a Pandas DataFrame that includes the variable name, the number of non-missing observations, the mean, standard deviation, standard error, and the confidence interval (95% by default). It accepts Pandas Series, DataFrame, and GroupBy objects.
Arguments
summary_cont(group1, conf = 0.95, decimals = 4)
- group1: must be a Pandas Series, a DataFrame with the columns of interest selected, or a GroupBy object.
- conf: the confidence level, entered as a decimal (for example, 0.95). The default is 0.95, which gives a 95% confidence interval.
- decimals: the number of decimal places the output table is rounded to.
Returns
Pandas DataFrame
Examples
import numpy, pandas, researchpy
numpy.random.seed(12345678)
df = pandas.DataFrame(numpy.random.randint(10, size=(100, 2)),
                      columns=['healthy', 'non-healthy'])
df['tx'] = ""
df.loc[0:50, 'tx'] = "Placebo"
df.loc[50:101, 'tx'] = "Experimental"
df['dose'] = ""
df.loc[0:26, 'dose'] = "10 mg"
df.loc[26:51, 'dose'] = "25 mg"
df.loc[51:76, 'dose'] = "10 mg"
df.loc[76:101, 'dose'] = "25 mg"
# Summary statistics for a Series (single variable)
researchpy.summary_cont(df['healthy'])
| | Variable | N | Mean | SD | SE | 95% Conf. | Interval |
|---|---|---|---|---|---|---|---|
| 0 | healthy | 100.0 | 4.59 | 2.749086 | 0.274909 | 4.044522 | 5.135478 |
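The SE and interval columns in the row above can be checked by hand. The sketch below assumes SE = SD / sqrt(N) and a two-sided Student's t interval with N - 1 degrees of freedom. That assumption reproduces the printed numbers, but it is inferred from the output rather than taken from the researchpy source. `summary_by_hand` is a hypothetical helper name, and SciPy is assumed to be installed.

```python
import math
import statistics

from scipy import stats


def summary_by_hand(values, conf=0.95):
    """Hypothetical helper: N, mean, SD, SE and a t-based confidence interval."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 in the denominator)
    se = sd / math.sqrt(n)
    # Two-sided critical value from Student's t with n - 1 degrees of freedom
    t_crit = stats.t.ppf(1 - (1 - conf) / 2, df=n - 1)
    return {"N": n, "Mean": mean, "SD": sd, "SE": se,
            "lower": mean - t_crit * se, "upper": mean + t_crit * se}


# Recompute SE and the interval from the published N, Mean and SD for 'healthy'
n, mean, sd = 100, 4.59, 2.749086
se = sd / math.sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)
lower, upper = mean - t_crit * se, mean + t_crit * se
print(round(se, 6), round(lower, 4), round(upper, 4))
```

Passing a different `conf` to the helper only changes the critical value, which is why a higher confidence level widens the interval while N, Mean, SD, and SE stay the same.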
# Summary statistics for multiple Series
researchpy.summary_cont(df[['healthy', 'non-healthy']])
| | Variable | N | Mean | SD | SE | 95% Conf. | Interval |
|---|---|---|---|---|---|---|---|
| 0 | healthy | 100.0 | 4.59 | 2.749086 | 0.274909 | 4.044522 | 5.135478 |
| 1 | non-healthy | 100.0 | 4.16 | 3.132495 | 0.313250 | 3.538445 | 4.781555 |
# The result is a Pandas DataFrame, so it can be assigned to a variable
# and exported with any DataFrame method
results = researchpy.summary_cont(df[['healthy', 'non-healthy']])
results.to_csv("results.csv", index=False)
# This works with GroupBy objects as well
researchpy.summary_cont(df['healthy'].groupby(df['tx']))
| tx | N | Mean | SD | SE | 95% Conf. | Interval |
|---|---|---|---|---|---|---|
| Experimental | 50 | 4.66 | 2.560373 | 0.362091 | 3.943096 | 5.376904 |
| Placebo | 50 | 4.52 | 2.950199 | 0.417221 | 3.693944 | 5.346056 |
# GroupBy objects with a hierarchical index work as well
# (select multiple columns with a list; tuple-style selection fails in pandas 2.0+)
researchpy.summary_cont(df.groupby(['tx', 'dose'])[['healthy', 'non-healthy']])
| tx | dose | healthy count | healthy mean | healthy std | healthy sem | healthy 95% Conf. | healthy Interval | non-healthy count | non-healthy mean | non-healthy std | non-healthy sem | non-healthy 95% Conf. | non-healthy Interval |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Experimental | 10 mg | 25 | 4.360000 | 2.514624 | 0.502925 | 3.374267 | 5.345733 | 25 | 4.160000 | 3.197395 | 0.639479 | 2.906621 | 5.413379 |
| | 25 mg | 25 | 4.960000 | 2.621704 | 0.524341 | 3.932292 | 5.987708 | 25 | 4.240000 | 3.205204 | 0.641041 | 2.983560 | 5.496440 |
| Placebo | 10 mg | 26 | 4.115385 | 2.984318 | 0.585273 | 2.968250 | 5.262520 | 26 | 3.961538 | 3.143002 | 0.616393 | 2.753407 | 5.169670 |
| | 25 mg | 24 | 4.958333 | 2.911434 | 0.594294 | 3.793517 | 6.123150 | 24 | 4.291667 | 3.168859 | 0.646841 | 3.023859 | 5.559474 |
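The group sizes in the table above are unequal (26 and 24 in the Placebo arm) because `.loc` slices are label-based and include both endpoints, so each later assignment overwrites the boundary row set by the previous one. A minimal sketch, rebuilding only the `tx` and `dose` columns from the setup code and counting the rows per group:

```python
import pandas

# Same labelling steps as the setup code, without the random data columns
df = pandas.DataFrame(index=range(100))

df['tx'] = ""
df.loc[0:50, 'tx'] = "Placebo"        # labels 0..50 inclusive
df.loc[50:101, 'tx'] = "Experimental"  # overwrites row 50; stops at the last row (99)

df['dose'] = ""
df.loc[0:26, 'dose'] = "10 mg"
df.loc[26:51, 'dose'] = "25 mg"   # overwrites row 26
df.loc[51:76, 'dose'] = "10 mg"   # overwrites row 51
df.loc[76:101, 'dose'] = "25 mg"  # overwrites row 76

# Number of rows in each tx/dose combination
sizes = df.groupby(['tx', 'dose']).size()
print(sizes)
```

The counts come out as 25 and 25 for Experimental and 26 and 24 for Placebo, matching the count columns above. Positional slicing with `.iloc`, which excludes the end point, would be one way to get non-overlapping blocks, but it would change the group sizes and therefore the published numbers.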
# Above is the default output. To stack the results for each variable
# above/below each other within each group, use .apply()
df.groupby(['tx', 'dose'])[['healthy', 'non-healthy']].apply(researchpy.summary_cont)
| tx | dose | | Variable | N | Mean | SD | SE | 95% Conf. | Interval |
|---|---|---|---|---|---|---|---|---|---|
| Experimental | 10 mg | 0 | healthy | 25.0 | 4.360000 | 2.514624 | 0.502925 | 3.322014 | 5.397986 |
| | | 1 | non-healthy | 25.0 | 4.160000 | 3.197395 | 0.639479 | 2.840180 | 5.479820 |
| | 25 mg | 0 | healthy | 25.0 | 4.960000 | 2.621704 | 0.524341 | 3.877814 | 6.042186 |
| | | 1 | non-healthy | 25.0 | 4.240000 | 3.205204 | 0.641041 | 2.916957 | 5.563043 |
| Placebo | 10 mg | 0 | healthy | 26.0 | 4.115385 | 2.984318 | 0.585273 | 2.909992 | 5.320777 |
| | | 1 | non-healthy | 26.0 | 3.961538 | 3.143002 | 0.616393 | 2.692052 | 5.231024 |
| | 25 mg | 0 | healthy | 24.0 | 4.958333 | 2.911434 | 0.594294 | 3.728942 | 6.187724 |
| | | 1 | non-healthy | 24.0 | 4.291667 | 3.168859 | 0.646841 | 2.953575 | 5.629758 |