summary_cont()

Returns a Pandas DataFrame of summary statistics that includes the variable name, the number of non-missing observations, the mean, the standard deviation, the standard error, and the confidence interval (95% by default). It is compatible with Pandas Series, DataFrame, and GroupBy objects.
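
As a rough illustration of how the reported columns relate to one another, the sketch below computes N, the mean, the sample standard deviation, the standard error (SD divided by the square root of N), and an interval of the form mean ± critical value × SE, using only the Python standard library. The helper name describe and its crit parameter are hypothetical and are not part of researchpy. The normal (z) critical value used by default is an assumption of this sketch; a t critical value can be passed in instead.

```python
import math
import statistics


def describe(values, conf=0.95, crit=None):
    """Return N, mean, SD, SE and a confidence interval for a list of numbers.

    Hypothetical helper for illustration only. `crit` is the critical value
    that multiplies the standard error; when omitted, a normal (z) critical
    value is used, which only approximates a t-based interval.
    """
    # Keep non-missing observations only (drop None and NaN).
    data = [v for v in values if v is not None and not math.isnan(v)]
    n = len(data)
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)      # sample SD (n - 1 in the denominator)
    se = sd / math.sqrt(n)           # standard error of the mean
    if crit is None:
        # Two-sided interval: conf = 0.95 leaves 0.025 in each tail.
        crit = statistics.NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return {"N": n, "Mean": mean, "SD": sd, "SE": se,
            "lower": mean - crit * se, "upper": mean + crit * se}


stats = describe([2, 4, 4, 4, 5, 5, 7, 9, float("nan")])
print(stats)
```

Raising conf widens the interval, because a larger critical value multiplies the same standard error.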

Arguments

summary_cont(group1, conf = 0.95, decimals = 4)

group1, must be a Pandas Series, a DataFrame with the columns of interest selected, or a GroupBy object.

conf, the confidence level of the interval, entered in decimal format (for example, 0.95). The default, 0.95, calculates a 95% confidence interval.

decimals, the number of decimal places to which the output table is rounded. The default is 4.

Returns

Pandas DataFrame

Examples

import numpy, pandas, researchpy

numpy.random.seed(12345678)

df = pandas.DataFrame(numpy.random.randint(10, size= (100, 2)),
                  columns= ['healthy', 'non-healthy'])
df['tx'] = ""
df.loc[0:50, 'tx'] = "Placebo"
df.loc[50:101, 'tx'] = "Experimental"

df['dose'] = ""
df.loc[0:26, 'dose'] = "10 mg"
df.loc[26:51, 'dose'] = "25 mg"
df.loc[51:76, 'dose'] = "10 mg"
df.loc[76:101, 'dose'] = "25 mg"
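
The group sizes that appear in the grouped outputs further down (50 and 50 by tx; 25, 25, 26, and 24 by tx and dose) follow from how .loc slices by label: both endpoints are included, so when two ranges share a boundary label the later assignment overwrites the earlier one. A minimal sketch on a six-row frame, where the name toy is just for illustration:

```python
import pandas as pd

toy = pd.DataFrame({"tx": [""] * 6})    # default integer index 0..5

toy.loc[0:3, "tx"] = "Placebo"          # labels 0, 1, 2, 3 (end label included)
toy.loc[3:5, "tx"] = "Experimental"     # labels 3, 4, 5; row 3 is overwritten

# Count how many rows ended up in each group.
print(toy["tx"].value_counts().sort_index())
```

Each group ends up with three rows, even though the first assignment touched four.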

# Summary statistics for a Series (single variable)
researchpy.summary_cont(df['healthy'])

	Variable	N	Mean	SD	SE	95% Conf.	Interval
0	healthy	100.0	4.59	2.749086	0.274909	4.044522	5.135478

# Summary statistics for multiple Series
researchpy.summary_cont(df[['healthy', 'non-healthy']])

	Variable	N	Mean	SD	SE	95% Conf.	Interval
0	healthy	100.0	4.59	2.749086	0.274909	4.044522	5.135478
1	non-healthy	100.0	4.16	3.132495	0.313250	3.538445	4.781555

# Results are easy to export: assign the returned Pandas DataFrame
# to a variable, then write it to a file
results = researchpy.summary_cont(df[['healthy', 'non-healthy']])

results.to_csv("results.csv", index= False)

# This works with GroupBy objects as well
researchpy.summary_cont(df['healthy'].groupby(df['tx']))

	N	Mean	SD	SE	95% Conf.	Interval
tx
Experimental	50	4.66	2.560373	0.362091	3.943096	5.376904
Placebo	50	4.52	2.950199	0.417221	3.693944	5.346056

# It also works with a GroupBy object that has a hierarchical index
# (select multiple columns with a list, i.e. double brackets)
researchpy.summary_cont(df.groupby(['tx', 'dose'])[['healthy', 'non-healthy']])

		healthy						non-healthy
		count	mean	std	sem	95% Conf.	Interval	count	mean	std	sem	95% Conf.	Interval
tx	dose
Experimental	10 mg	25	4.360000	2.514624	0.502925	3.374267	5.345733	25	4.160000	3.197395	0.639479	2.906621	5.413379
Experimental	25 mg	25	4.960000	2.621704	0.524341	3.932292	5.987708	25	4.240000	3.205204	0.641041	2.983560	5.496440
Placebo	10 mg	26	4.115385	2.984318	0.585273	2.968250	5.262520	26	3.961538	3.143002	0.616393	2.753407	5.169670
Placebo	25 mg	24	4.958333	2.911434	0.594294	3.793517	6.123150	24	4.291667	3.168859	0.646841	3.023859	5.559474
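
The count, mean, std, and sem labels in the table above match the names of standard pandas aggregations. As a sketch only, and without implying that this is how researchpy builds its output, a comparable table without the interval columns can be produced with plain pandas. The frame toy and its values are made up for illustration.

```python
import pandas as pd

# Small illustrative frame; the values are invented for this sketch.
toy = pd.DataFrame({
    "tx": ["Placebo", "Placebo", "Placebo",
           "Experimental", "Experimental", "Experimental"],
    "healthy": [1, 2, 3, 4, 6, 8],
    "non-healthy": [2, 2, 5, 1, 3, 5],
})

# Double brackets select several columns from the GroupBy object;
# agg() with a list of names yields two-level column labels.
table = toy.groupby("tx")[["healthy", "non-healthy"]].agg(
    ["count", "mean", "std", "sem"]
)
print(table)
```

The interval columns are not part of this aggregation and would have to be computed separately from the mean and the standard error.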

# Above is the default output. To stack the results for each variable
# above/below each other for comparison, use .apply() instead

df.groupby(['tx', 'dose'])[['healthy', 'non-healthy']].apply(researchpy.summary_cont)

			Variable	N	Mean	SD	SE	95% Conf.	Interval
tx	dose
Experimental	10 mg	0	healthy	25.0	4.360000	2.514624	0.502925	3.322014	5.397986
	10 mg	1	non-healthy	25.0	4.160000	3.197395	0.639479	2.840180	5.479820
	25 mg	0	healthy	25.0	4.960000	2.621704	0.524341	3.877814	6.042186
	25 mg	1	non-healthy	25.0	4.240000	3.205204	0.641041	2.916957	5.563043
Placebo	10 mg	0	healthy	26.0	4.115385	2.984318	0.585273	2.909992	5.320777
	10 mg	1	non-healthy	26.0	3.961538	3.143002	0.616393	2.692052	5.231024
	25 mg	0	healthy	24.0	4.958333	2.911434	0.594294	3.728942	6.187724
	25 mg	1	non-healthy	24.0	4.291667	3.168859	0.646841	2.953575	5.629758