1.
What are OOB (out-of-bag) samples:
Correct Answer
D. A random sample whose size equals one-third of the length of the training set
Explanation
OOB (out-of-bag) samples are the training examples that a given bootstrap sample does not select. When N examples are drawn with replacement from a training set of size N, each example is left out with probability (1 - 1/N)^N, which approaches 1/e ≈ 0.368 for large N. The out-of-bag set is therefore a random sample of roughly one-third of the training set, which is what the correct answer states.
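The one-third figure can be checked with a small simulation. This is only a sketch: the function name `oob_fraction`, the set size and the seed are illustrative choices, not part of the quiz.

```python
import random

def oob_fraction(n, seed=0):
    """Draw one bootstrap sample of size n and return the share of indices never drawn."""
    rng = random.Random(seed)
    # Indices that ended up "in the bag" at least once.
    drawn = {rng.randrange(n) for _ in range(n)}
    return (n - len(drawn)) / n

fraction = oob_fraction(10_000)
# Theory: (1 - 1/n)^n, which tends to 1/e (about 0.368) for large n.
print(f"out-of-bag fraction: {fraction:.3f}")
```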
2.
The trade-off between variance and bias means:
Correct Answer
B. That one of these quantities can be reduced only at the expense of the other
Explanation
Lowering one of the two quantities tends to raise the other. A more flexible model typically has lower bias but higher variance, while a more constrained model has lower variance but higher bias, so the two cannot be minimized independently.
3.
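Does the resampling method used in the bootstrap method change the prior class probabilities during the procedure:
Correct Answer
B. No
Explanation
Bootstrap resampling draws examples uniformly at random with replacement, so every example, and hence every class, is selected in proportion to its frequency in the original set. Class proportions in an individual bootstrap sample fluctuate slightly, but in expectation they equal the original priors.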
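A quick way to see this is to resample a labelled set many times and average the class shares. The sketch below assumes a hypothetical 70/30 two-class set; the helper names are illustrative.

```python
import random
from collections import Counter

def class_shares(labels):
    """Relative frequency of each class label."""
    counts = Counter(labels)
    return {c: counts[c] / len(labels) for c in counts}

def bootstrap(labels, rng):
    """Resample len(labels) items uniformly with replacement."""
    return [rng.choice(labels) for _ in labels]

rng = random.Random(1)
labels = ["a"] * 700 + ["b"] * 300  # assumed priors: 0.7 and 0.3
shares_b = [class_shares(bootstrap(labels, rng)).get("b", 0.0) for _ in range(500)]
mean_share_b = sum(shares_b) / len(shares_b)
print(f"average share of class b over 500 bootstrap samples: {mean_share_b:.3f}")
```

Single samples deviate a little from 0.3, but the average stays close to the original prior.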
4.
In a decision tree, the root and internal nodes: ???
Correct Answer
B. Test individual attributes
Explanation
The root and every internal node hold a test on a single attribute, and the outcome of that test decides which branch an example follows. Leaf nodes, in contrast, carry no test; they assign the final class reached after the tests along the path.
5.
The Random Forest algorithm will not work well when the ratio k/p (where p is the dimension of the feature vector and k is the number of relevant features, k<=p) is: ???
Correct Answer
A. Close to 1
Explanation
The answer key marks "close to 1", which means that nearly every feature in the vector is relevant. The usual argument about random feature selection points to the opposite regime: each split considers only m randomly chosen features, so when k/p is small most candidate sets contain few or no relevant features, the individual trees become weak, and the forest is swamped by irrelevant features. When k/p is close to 1, almost any random subset contains relevant features. The marked answer should therefore be treated with caution, as the "???" in the question also suggests.
6.
Why is the dimensionality of the feature space (the dimensionality of the analyzed data) reduced:
Correct Answer(s)
B. 5. It allows simpler explanations
C. 2. It allows better visualization
Explanation
With fewer dimensions, the patterns and relationships in the data are easier to interpret and explain. Visualization also improves, because data can be plotted directly only in two or three dimensions.
7.
Evaluation functions in feature selection methods can be:
Correct Answer(s)
A. Wrappers
D. Filters
Explanation
Feature selection methods score candidate feature subsets with either wrappers or filters. A wrapper trains and tests a model on each candidate subset and uses the model's performance as the score. A filter scores features by a criterion computed from the data alone, without involving a specific model.
8.
Which of the listed methods are suboptimal feature selection methods:
Correct Answer(s)
B. Sequential forward selection
C. Sequential backward selection
E. "Add and discard r" selection
Explanation
Sequential forward selection, sequential backward selection, and "add and discard r" selection (known in the literature as plus-l take-away-r) are suboptimal because they are greedy: they add or remove features step by step according to a criterion and never examine all subsets. They are much cheaper than an exhaustive search, but the subset they return is not guaranteed to be the best one.
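A minimal sketch of sequential forward selection shows how the greedy step can miss the best subset. The criterion `toy_score` and its values are hypothetical and chosen only to make the effect visible.

```python
def sequential_forward_selection(features, score, d):
    """Greedily grow a subset to size d, adding the feature that most improves score."""
    selected = []
    remaining = list(features)
    while len(selected) < d and remaining:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical criterion: "a" is the best single feature,
# but "b" and "c" are much stronger as a pair.
SOLO = {"a": 0.6, "b": 0.5, "c": 0.5}

def toy_score(subset):
    bonus = 1.0 if set(subset) == {"b", "c"} else 0.0
    return sum(SOLO[f] for f in subset) + bonus

greedy = sequential_forward_selection(["a", "b", "c"], toy_score, 2)
print(greedy, toy_score(greedy), toy_score(["b", "c"]))
```

The greedy search commits to "a" in the first step and can no longer reach the pair with the highest score.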
9.
In the random forest algorithm, the operation first applied to each terminal node is:
Correct Answer
A. Selecting m variables out of p in total, m<p, at random
Explanation
While a tree is being grown, the first operation applied to each terminal node is the random selection of m variables out of a total of p, where m < p. The best split is then searched only among these m candidates.
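The random selection of m out of p variables can be sketched as below. The function name and the values of p and m are assumptions for illustration; the quiz does not specify them.

```python
import random

def candidate_features(p, m, rng):
    """Return m distinct feature indices chosen uniformly at random from range(p)."""
    if not 0 < m <= p:
        raise ValueError("need 0 < m <= p")
    return sorted(rng.sample(range(p), m))

rng = random.Random(42)
p = 16
m = 4  # assumed value; the square root of p is a common choice for classification
subset = candidate_features(p, m, rng)
print(subset)
```

A fresh subset is drawn at every node, so different nodes of the same tree usually see different candidate variables.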
10.
Feature extraction is:
Correct Answer
B. Projecting the original data dimensions (features) into new dimensions larger than the original space
Explanation
Feature extraction transforms the original features into a new set of features computed from them, instead of picking a subset of the originals. Methods such as PCA and MDS project the data into a space of lower dimension than the original, which is why feature extraction counts as a dimensionality reduction technique. The wording of the marked option ("larger than the original space") is at odds with this usual definition and should be read with care.
11.
The maximum value of the Gini index is:
Correct Answer
A. 1-1/nc, where nc is the number of classes
Explanation
The Gini index measures the impurity of a node in a decision tree: Gini = 1 - sum(p_i^2), where p_i is the proportion of samples in class i. It equals 0 for a pure node (all samples in one class) and is largest when the classes are evenly represented, p_i = 1/nc, which gives 1 - nc*(1/nc)^2 = 1 - 1/nc. The maximum is therefore 0.5 for two classes and approaches, but never reaches, 1 as the number of classes grows.
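The maximum can be verified numerically. This short sketch evaluates the formula 1 - sum(p^2) for uniform class proportions and compares it with 1 - 1/nc; the function name `gini` is an illustrative choice.

```python
def gini(proportions):
    """Gini impurity of a node given its class proportions (which should sum to 1)."""
    return 1.0 - sum(p * p for p in proportions)

for nc in (2, 3, 10):
    uniform = [1.0 / nc] * nc
    # The two printed values should agree for every nc.
    print(nc, round(gini(uniform), 4), round(1 - 1 / nc, 4))
```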
12.
Can the Random Forest algorithm lead to overfitting when the number of trees in the ensemble is increased:
Correct Answer
C. No
Explanation
Increasing the number of trees in a Random Forest ensemble does not lead to overfitting. A Random Forest combines many decision trees, each grown independently of the others, and combining their outputs reduces variance and improves generalization. A larger number of trees makes the ensemble more diverse and more stable, so the model is less likely to adapt excessively to the training data.
13.
Two decision trees S1 and S2 give the same classification error on the training set. If S1 has more nodes than S2, then with high probability the error on the test set will be:
Correct Answer
A. Error (S1, Test) < Error (S2, Test)
Explanation
S1 has more nodes, so it is the more complex tree and is more likely to have fitted noise in the training data. Because both trees have the same training error, the simpler tree S2 is expected to generalize better, which means S1 would have the higher test error: Error (S1, Test) > Error (S2, Test). The inequality in the marked option points the other way, so the answer as recorded conflicts with this reasoning.
14.
If attribute x is a nominal attribute with 3 distinct values, the potential number of splits into _____ at each node of Hunt's algorithm can be:
Correct Answer
D. 3
Explanation
A nominal attribute with 3 distinct values can be split at a node in a limited number of ways. A multiway split creates one branch per value, that is 3 branches. If the blank in the question refers to binary splits, a nominal attribute with k values admits 2^(k-1) - 1 groupings into two non-empty subsets, which for k = 3 is also 3.
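Assuming the question is about binary splits of a nominal attribute, the possible groupings can be enumerated directly. The function below is an illustrative sketch, not part of any particular tree library.

```python
from itertools import combinations

def binary_splits(values):
    """All ways to divide a set of nominal values into two non-empty groups."""
    values = list(values)
    first, rest = values[0], values[1:]
    splits = []
    # Keep the first value in the left group so that each split is counted once;
    # stopping before len(rest) keeps the right group non-empty.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            left = {first, *extra}
            right = set(values) - left
            splits.append((left, right))
    return splits

for left, right in binary_splits(["red", "green", "blue"]):
    print(sorted(left), "|", sorted(right))
```

For k values the count is 2^(k-1) - 1, so three values give three binary splits and four values give seven.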
15.
The variance of an estimate tells us:
Correct Answer
C. How much the estimate changes from one sample to another
Explanation
The variance of an estimate measures how much the estimate changes when it is recomputed on a different sample drawn from the same population. A high variance means the estimate is unstable across samples.
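The idea can be illustrated by repeating an estimate on many samples and measuring its spread. The sketch assumes standard normal data and uses the sample mean as the estimate; both are illustrative choices.

```python
import random
import statistics

def sample_mean_estimates(n, repeats, rng):
    """Estimate the mean on `repeats` independent samples of size n."""
    return [
        statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n))
        for _ in range(repeats)
    ]

rng = random.Random(0)
var_small_n = statistics.pvariance(sample_mean_estimates(10, 2000, rng))
var_large_n = statistics.pvariance(sample_mean_estimates(100, 2000, rng))
# Theory for the sample mean: variance = sigma^2 / n, here 0.1 and 0.01.
print(f"n=10: {var_small_n:.4f}   n=100: {var_large_n:.4f}")
```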
16.
What is the test set used for:
Correct Answer
C. To assess the error of an already trained system
Explanation
The test set is used only after training is finished. The trained system's outputs on the test examples are compared with the expected outputs, which gives an estimate of its error on data it has not seen.
17.
Bootstrap is a method that forms training sets by:
Correct Answer
B. From a set of N data points, selecting N data points with replacement selected......... set
Explanation
The bootstrap method forms each training set by drawing N data points with replacement from a set of N data points. A data point can therefore be selected several times or not at all, and the new set has the same size as the original set.
18.
If x is an ordinal attribute with 4 distinct values, the number of splits at the nodes of Hunt's algorithm can potentially be:
Correct Answer
A. 2,3,4
Explanation
An ordinal attribute with 4 distinct values can be divided into 2, 3, or 4 groups at a node, provided the grouping preserves the order of the values. The extreme case of 4 groups puts every value on its own branch.
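Order-preserving groupings of an ordinal attribute can be enumerated by choosing cut points between neighbouring values. The value names below are hypothetical, and the function is a sketch rather than part of a tree library.

```python
from itertools import combinations

def ordered_partitions(values, groups):
    """Split an ordered list into `groups` contiguous, non-empty blocks."""
    n = len(values)
    result = []
    # Each choice of groups-1 cut positions defines one order-preserving partition.
    for cuts in combinations(range(1, n), groups - 1):
        bounds = (0, *cuts, n)
        result.append([values[a:b] for a, b in zip(bounds, bounds[1:])])
    return result

levels = ["low", "medium", "high", "very high"]
for g in (2, 3, 4):
    print(g, "groups:", len(ordered_partitions(levels, g)), "ways")
```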
19.
The generalization properties of trees can be improved: ???
Correct Answer
A. By stopping training even though the training error would continue to decrease, without testing on a validation set
Explanation
Stopping tree growth early keeps the tree from becoming complex enough to fit noise in the training data, even though further splits would still lower the training error. In the marked option the stopping decision does not rely on a separate validation set; it is made from the training process alone, for example through a limit on depth or on the number of samples per node.
20.
The Random Forest algorithm is about: ??
Correct Answer
B. An ensemble of decision trees
Explanation
The Random Forest algorithm builds an ensemble of decision trees. Each tree is built independently using a random subset of the training data and features, and the final prediction is made by aggregating the predictions of all the individual trees. Combining many trees reduces overfitting and improves the accuracy and robustness of the model.
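For classification, the aggregation step is commonly a majority vote over the trees' predictions. A minimal sketch, with hypothetical per-tree predictions:

```python
from collections import Counter

def majority_vote(predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical outputs of five trees for one example.
tree_predictions = ["cat", "dog", "cat", "cat", "dog"]
print(majority_vote(tree_predictions))
```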
21.
The classification error has a maximum value of:
Correct Answer
C. 1-1/nc, where nc is the number of classes
Explanation
The classification error of a node is 1 - max(p_i), where p_i is the proportion of samples in class i. The largest class proportion can never fall below 1/nc, and it equals 1/nc exactly when all classes are equally represented, so the maximum error is 1 - 1/nc. With more classes this maximum grows, since even the most frequent class can cover a smaller share of the samples.
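A short check of this maximum, assuming the classification error of a node is defined as 1 minus the largest class proportion (the usual impurity definition; the function name is illustrative):

```python
def classification_error(proportions):
    """Node classification error: share of samples outside the majority class."""
    return 1.0 - max(proportions)

for nc in (2, 4, 10):
    uniform = [1.0 / nc] * nc
    # For uniform proportions the error should equal 1 - 1/nc.
    print(nc, round(classification_error(uniform), 4), round(1 - 1 / nc, 4))
```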
22.
In the Random Forest algorithm, each decision tree is built on:
Correct Answer
D. A single bootstrap sample drawn from the training set
Explanation
In the Random Forest algorithm, each decision tree is constructed on a bootstrap sample drawn from the training set. A bootstrap sample is created by randomly selecting data points from the training set with replacement. This means that some data points may be selected multiple times while others may not be selected at all. By constructing each tree on a different bootstrap sample, the Random Forest algorithm introduces randomness and diversity in the decision trees, which helps to reduce overfitting and improve the overall performance of the model.
23.
The bias of an estimate tells how:
Correct Answer
B. Close it is to the true value
Explanation
Bias is the difference between the expected value of the estimate and the true value being estimated. A small bias means the estimate is, on average, close to the true value; a large bias means it is systematically off.
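Bias can be made concrete with a classic example that is not part of the quiz: the variance estimator that divides by n underestimates the true variance, while the one that divides by n - 1 does not. The sample size, number of repeats and seed below are arbitrary.

```python
import random
import statistics

rng = random.Random(0)
n, repeats, true_var = 5, 5000, 1.0

biased, unbiased = [], []
for _ in range(repeats):
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    biased.append(statistics.pvariance(xs))   # divides by n
    unbiased.append(statistics.variance(xs))  # divides by n - 1

# Bias = average of the estimates minus the true value.
bias_biased = statistics.fmean(biased) - true_var
bias_unbiased = statistics.fmean(unbiased) - true_var
# Theory: the first bias is -true_var / n = -0.2, the second is 0.
print(f"divide by n: {bias_biased:+.3f}   divide by n-1: {bias_unbiased:+.3f}")
```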
24.
How does a small k affect the K-fold cross-validation estimate of performance:
Correct Answer(s)
A. Computational complexity decreases
C. Variance is small
E. The bias of the estimate is very large
Explanation
With a small k the computational cost drops, because only k models have to be trained and evaluated.
The variance of the estimate is small: with few folds the training sets overlap less, so the fold results are less correlated than with a large k, where nearly identical training sets make the averaged estimate more variable.
The bias of the estimate is large: each model is trained on only (k-1)/k of the data (half of it for k = 2), so it performs worse than a model trained on the full set and the estimated error is pessimistic.
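The effect of k on cost and on training-set size can be seen from the fold indices alone. This is a plain-Python sketch with contiguous, unshuffled folds; the function name and the data size of 100 are assumptions.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds over range(n)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

for k in (2, 5, 10):
    folds = list(k_fold_indices(100, k))
    # Number of models to train, and how many examples each model sees.
    print(f"k={k}: {len(folds)} fits, {len(folds[0][0])} training examples per fit")
```

A small k means fewer fits but also smaller training sets, which is where the pessimistic bias comes from.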
25.
The bootstrap method has:
Correct Answer
B. Good precision for both variance and bias
Explanation
The marked answer states that the bootstrap method is accurate in both respects: its estimates have low variance and low bias.
26.
Multidimensional scaling (MDS) is:
Correct Answer
C. A feature extraction method
Explanation
Multidimensional scaling (MDS) is a feature extraction method. It is not a feature selection method, because it does not choose a subset of the existing features. Instead, it builds a lower-dimensional representation of the data that preserves the pairwise distances between data points as well as possible.
27.
Optimal feature selection methods are:
Correct Answer(s)
D. Exhaustive search
E. Branch and bound method (BB)
Explanation
Exhaustive search and the branch and bound method are optimal because both are guaranteed to return the best subset under the chosen criterion. Exhaustive search evaluates every possible subset of features. Branch and bound reaches the same result more efficiently by discarding, without evaluating them, the subsets that are guaranteed to be suboptimal, which requires a criterion that is monotonic in the number of features.
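Exhaustive search is simple to write down, which also makes its cost visible. The criterion `toy_score` below is hypothetical and exists only to give the search something to maximize.

```python
from itertools import combinations
from math import comb

def exhaustive_search(features, score, d):
    """Evaluate every subset of size d and return the best one as a tuple."""
    return max(combinations(features, d), key=score)

# Hypothetical criterion: "b" and "c" are only strong together.
SOLO = {"a": 0.6, "b": 0.5, "c": 0.5}

def toy_score(subset):
    bonus = 1.0 if set(subset) == {"b", "c"} else 0.0
    return sum(SOLO[f] for f in subset) + bonus

best = exhaustive_search(["a", "b", "c"], toy_score, 2)
print(best)
# The number of subsets to evaluate grows quickly with the number of features.
print(comb(3, 2), comb(20, 5), comb(50, 10))
```

Because every subset is scored, the best pair is found even when no single feature in it looks best on its own.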
28.
Principal component analysis (PCA) is a method of:
Correct Answer
C. Feature extraction
Explanation
Principal component analysis (PCA) is a feature extraction method. It reduces the dimensionality of a dataset by transforming the original features into a new set of uncorrelated variables called principal components. These components are linear combinations of the original features, ordered so that the first one explains the most variance in the data. PCA is not a feature selection method, because it does not choose a subset of the original features; it creates new features that capture the most important information in the data.
29.
_________ (overfitting) of a tree occurs when the trees are:
Correct Answer
B. More complex than necessary
Explanation
Overfitting occurs when a decision tree is more complex than necessary. A tree with too many branches or nodes memorizes the training data instead of learning the underlying patterns, so it performs well on the training data but poorly on new, unseen data.