Take the 'Statistical Business Analysis Quiz! Hardest Trivia Questions' to test and build your knowledge of ROC curves, data partitioning, logistic regression, and more. It is aimed at aspiring business analysts and data scientists who want to sharpen their analytical skills.
The sample means from the validation data set are applied to the training and test data sets.
The sample means from the training data set are applied to the validation and test data sets.
The sample means from the test data set are applied to the training and validation data sets.
The sample means from each partition of the data are applied to their own partition.
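One of the options above describes estimating the sample means on one partition and reusing them on the others. As a minimal Python sketch of that idea (not SAS), assuming a single numeric input with missing values coded as `None`; the data and the names `train`, `valid`, `test`, and `impute_with` are hypothetical and do not come from the quiz:

```python
from statistics import mean

def impute_with(values, fill):
    """Replace missing entries (None) with a precomputed fill value."""
    return [fill if v is None else v for v in values]

# Hypothetical partitions of one numeric input (illustrative data only).
train = [10.0, None, 14.0, 12.0]
valid = [None, 20.0]
test = [8.0, None]

# The fill value is estimated once, from the training partition only.
train_mean = mean(v for v in train if v is not None)

# The same training mean is then applied to every partition.
train_imputed = impute_with(train, train_mean)
valid_imputed = impute_with(valid, train_mean)
test_imputed = impute_with(test, train_mean)
```

Because the validation and test rows never contribute to the fill value here, they stay untouched by the estimation step.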
Rate this question:
Score data=valid1 out=roc;
Score data=valid1 outroc=roc;
Model resp(event='1') = gender region / outroc=roc;
Model resp(event='1') = gender region / out=roc;
Rate this question:
Simple random sampling without replacement
Simple random sampling with replacement
Stratified random sampling without replacement
Sequential random sampling with replacement
Rate this question:
Proc surveyselect data=SASUSER.DATABASE samprate=0.6 out=sample; strata county; run;
Proc sort data=SASUSER.DATABASE; by county; run; proc surveyselect data=SASUSER.DATABASE samprate=0.6 out=sample outall; run;
Proc sort data=SASUSER.DATABASE; by county; run; proc surveyselect data=SASUSER.DATABASE samprate=0.6 out=sample outall; strata county; run;
Proc sort data=SASUSER.DATABASE; by county; run; proc surveyselect data=SASUSER.DATABASE samprate=0.6 out=sample; strata county; run;
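The options above differ in whether the data are sorted and whether a strata variable is named. As a rough Python sketch of what stratified sampling without replacement at a 60% rate means (this is not a reimplementation of PROC SURVEYSELECT, and its rounding of stratum sizes is a simplification), with hypothetical rows and a hypothetical helper `stratified_sample`:

```python
import random
from collections import defaultdict

def stratified_sample(rows, key, rate, seed=0):
    """Draw `rate` of the rows from each stratum, without replacement."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[row[key]].append(row)  # group rows by stratum value
    sample = []
    for name in sorted(strata):
        members = strata[name]
        n = round(len(members) * rate)  # simplified per-stratum sample size
        sample.extend(rng.sample(members, n))  # no row is drawn twice
    return sample

# Hypothetical data: 10 rows in county "A", 20 rows in county "B".
rows = [{"id": i, "county": "A" if i < 10 else "B"} for i in range(30)]
sample = stratified_sample(rows, "county", 0.6)
```

Each county contributes the same fraction of its rows, so the sample keeps the county proportions of the full data.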
Rate this question:
Selecting the top 10% of the population scored by the model should result in 3.14 times more events than a random draw of 10%.
Selecting the observations with a response probability of at least 10% should result in 3.14 times more events than a random draw of 10%.
Selecting the top 10% of the population scored by the model should result in 3.14 times greater accuracy than a random draw of 10%.
Selecting the observations with a response probability of at least 10% should result in 3.14 times greater accuracy than a random draw of 10%.
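The options above all turn on how a lift value at a given depth is read. As a minimal sketch of one common definition (event rate among the top-scored fraction divided by the overall event rate), with made-up scores and outcomes and a hypothetical helper `lift_at_depth`:

```python
def lift_at_depth(scores, events, depth):
    """Event rate in the top `depth` fraction (ranked by score) over the overall event rate."""
    ranked = sorted(zip(scores, events), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, int(len(ranked) * depth))  # number of cases in the top fraction
    top_rate = sum(e for _, e in ranked[:n_top]) / n_top
    overall_rate = sum(events) / len(events)
    return top_rate / overall_rate

# Illustrative data only: 10 scored cases, 3 of which are events.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
events = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

lift = lift_at_depth(scores, events, 0.2)  # lift for the top 20% of cases
```

Under this definition the depth refers to a share of the ranked population, not to a probability cutoff.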
Rate this question:
The predicted lift for the best 50% of validation data cases
The predicted lift if the entire population is scored as event cases
The predicted lift if none of the population are scored as event cases
The predicted lift if 50% of the population are randomly scored as event cases
Rate this question:
Depth
Sensitivity
Specificity
Positive predictive value
Rate this question:
Model A. It is more complex with a higher accuracy than model B on training data.
Model A. It performs better on the boundary for the training data.
Model B. It is more complex with a higher accuracy than model A on validation data.
Model B. It is simpler with a higher accuracy than model A on validation data.
Rate this question:
Profit=(P_R>0.05)*Purch*200-(P_R>.05)*(1-Purch)*10;
Profit=(P_R.05)*(1-Purch)*10;
If P_R> 0.05; profit=(P_R>0.05)*Purch*200-(P_R>.05)*(1-Purch)*10;
If P_R> 0.05; profit=(P_R>0.05)*Purch*200+(P_R
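The first option above can be read as a per-case profit formula. A small Python translation of that one expression follows; treating 200 as the gain from a purchaser and 10 as the cost of a non-purchaser is an assumption, since the question text is not shown, and the function name `profit` and its parameter names are hypothetical:

```python
def profit(p_r, purch, cutoff=0.05, gain=200, cost=10):
    """Mirror of (P_R>0.05)*Purch*200-(P_R>.05)*(1-Purch)*10 for one case.

    p_r   : predicted response probability
    purch : 1 if the case purchased, 0 otherwise
    The meaning of gain and cost is assumed, not taken from the quiz.
    """
    selected = p_r > cutoff  # True/False acts as 1/0 in arithmetic
    return selected * purch * gain - selected * (1 - purch) * cost

above_cutoff_buyer = profit(0.10, 1)
above_cutoff_nonbuyer = profit(0.10, 0)
below_cutoff_buyer = profit(0.01, 1)
```

Cases at or below the cutoff contribute zero either way, because both terms are multiplied by the same comparison.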
Rate this question:
Training: 50% Validation: 0% Testing: 50%
Training: 100% Validation: 0% Testing: 0%
Training: 0% Validation: 100% Testing: 0%
Training: 50% Validation: 50% Testing: 0%
Rate this question:
Candidate 1, because the area outside the curve is greater
Candidate 2, because the area outside the curve is greater
Candidate 1, because it is closer to the diagonal reference curve
Candidate 2, because it shows less overfitting than Candidate 1
Rate this question:
Sensitivity and PV+
Specificity and PV-
PV+ and PV-
Sensitivity and Specificity
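For reference, the four measures named in the options above can be computed from a 2x2 confusion matrix. A minimal sketch using the standard definitions, with made-up counts and a hypothetical helper `classification_rates`:

```python
def classification_rates(tp, fp, tn, fn):
    """Standard rates from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of actual events predicted as events
        "specificity": tn / (tn + fp),  # share of actual non-events predicted as non-events
        "pv_plus": tp / (tp + fp),      # share of predicted events that are actual events
        "pv_minus": tn / (tn + fn),     # share of predicted non-events that are actual non-events
    }

# Illustrative counts only.
rates = classification_rates(tp=40, fp=10, tn=45, fn=5)
```

Sensitivity and specificity condition on the actual outcome, while PV+ and PV- condition on the predicted outcome.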
Rate this question:
X=40, Y=10
X=.05, Y=10
X=.05, Y=.40
X=.10, Y=.05
Rate this question:
To provide an unbiased measure of assessment for the final model.
To compare models and select and fine-tune the final model.
To reduce total sample size to make computations more efficient.
To build the predictive models.
Rate this question:
Training data
Total data
Test data
Validation data
Rate this question:
It violates assumptions of the model.
It requires extra computational effort and time.
It omits the training (and test) data sets from the benefits of the cleansing methods.
There is no ability to compare the effectiveness of different cleansing methods.
Rate this question:
More high value customers are found in some regions than others.
The difference between average purchases for medium and high value customers depends on the region.
Regions with higher average purchases have more high value customers.
Regions with higher average purchases have more medium value customers.
Rate this question:
All groups are significantly different from each other.
2XL is significantly different from all other groups.
Only XL and 2XL are not significantly different from each other.
No groups are significantly different from each other.
Rate this question:
35%
65%
76%
Rate this question:
Normality, because Prob > F < .0001.
Normality, because the interquartile ranges are different in different ad campaigns.
Constant variance, because Prob > F < .0001.
Constant variance, because the interquartile ranges are different in different ad campaigns.
Rate this question:
Medium wrist size is significantly different than small wrist size.
Large wrist size is significantly different than medium wrist size.
Large wrist size is significantly different than small wrist size.
There is no significant difference due to wrist size.
Rate this question:
Proc glm data=salary; class gender; model pay=gender; run;
Proc ttest data=salary; class gender; var pay; run;
Proc glm data=salary; class pay; model pay=gender; run;
Proc ttest data=salary; class gender; model pay=gender; run;
Rate this question:
School*Gender should be removed because it is non-significant.
Gender should be removed because it is non-significant.
School should be removed because it is significant.
Gender should not be removed due to its involvement in the significant interaction.
Rate this question:
A scatter plot of binary response versus a predictor variable.
A trend plot of empirical logit versus a predictor variable.
A logistic regression plot of predicted probability values versus a predictor variable.
A box plot of the odds ratio values versus a predictor variable.
Rate this question:
Option A
Option B
Option C
Option D
Rate this question:
There is quasi-complete separation in the data.
There is collinearity among the predictors.
There are missing values in the data.
There are too many observations in the data.
Rate this question:
Eliminate store_id as a predictor in the model because it has too many levels to be feasible.
Cluster by using Greenacre's method to combine stores that are similar.
Use subject matter expertise to combine stores that are similar.
Randomly combine the stores into five groups to keep the stochastic variation among the observations intact.
Rate this question:
Stabilize parameter estimates and increase the risk of overfitting.
Destabilize parameter estimates and increase the risk of overfitting.
Stabilize parameter estimates and decrease the risk of overfitting.
Destabilize parameter estimates and decrease the risk of overfitting.
Rate this question:
Stabilize parameter estimates and increase the risk of overfitting.
Destabilize parameter estimates and increase the risk of overfitting.
Stabilize parameter estimates and decrease the risk of overfitting.
Destabilize parameter estimates and decrease the risk of overfitting.
Rate this question:
Collinearity
Influential observations
Quasi-complete separation
Problems that arise due to missing values
Rate this question:
The association between the continuous predictor and the binary response is quadratic.
The association between the continuous predictor and the log-odds is quadratic.
The association between the continuous predictor and the continuous response is quadratic.
The association between the binary predictor and the log-odds is quadratic.
Rate this question:
OUTPUT=estimates
OUTP=estimates
OUTSTAT=estimates
OUTCORR=estimates
Rate this question:
The model will likely be overfit.
There will be a high rate of collinearity among input variables.
Complete case analysis means that fewer observations will be used in the model building process.
New cases with missing values on input variables cannot be scored without extra data processing.
Rate this question:
Concordant and discordant pairs of ranked observations
Logit link (log(p/(1-p)))
Rank-ordered values of the variables
Weighted sum of chi-square statistics for 2x2 tables
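The first option above mentions concordant and discordant pairs. As a small sketch of how such pairs are usually counted (every event case is compared with every non-event case on predicted probability), with made-up probabilities and a hypothetical helper `pair_counts`:

```python
def pair_counts(probs, outcomes):
    """Count concordant, discordant, and tied event/non-event pairs."""
    event_probs = [p for p, y in zip(probs, outcomes) if y == 1]
    nonevent_probs = [p for p, y in zip(probs, outcomes) if y == 0]
    concordant = discordant = tied = 0
    for pe in event_probs:
        for pn in nonevent_probs:
            if pe > pn:
                concordant += 1  # the event case has the higher predicted probability
            elif pe < pn:
                discordant += 1  # the non-event case has the higher predicted probability
            else:
                tied += 1
    return concordant, discordant, tied

# Illustrative data only: two events and three non-events.
probs = [0.9, 0.4, 0.6, 0.4, 0.2]
outcomes = [1, 1, 0, 0, 0]

c, d, t = pair_counts(probs, outcomes)
# A common summary: concordant pairs plus half the ties, over all pairs.
c_stat = (c + 0.5 * t) / (c + d + t)
```

The nested loop is quadratic in the number of cases, which is fine for a small illustration but not for large data sets.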
Rate this question:
An increase in R-Square
A decrease in R-Square
A decrease in Mean Square Error
No change in R-Square
Rate this question:
An increase in R-Square
A decrease in R-Square
A decrease in Mean Square Error
No change in R-Square
Rate this question:
The errors are correlated, normally distributed with constant mean and zero variance.
The errors are correlated, normally distributed with zero mean and constant variance.
The errors are independent, normally distributed with constant mean and zero variance.
The errors are independent, normally distributed with zero mean and constant variance.
Rate this question:
A
B
C
D
R-Square
Coeff Var
Adj R-Sq
Error DF
Rate this question:
A
B
C
D
Rate this question:
Quiz Review Timeline (Updated): Mar 21, 2023
Our quizzes are rigorously reviewed, monitored and continuously updated by our expert board to maintain accuracy, relevance, and timeliness.