AML Quiz 4 Section B

Questions: 10 | Attempts: 81

• 2.

Which of the following is/are true about the Perceptron classifier? (Choose multiple options)

• A.

It can learn an OR function

• B.

It can learn an AND function

• C.

The obtained separating hyperplane depends on the order in which the points are presented in the training process.

• D.

For a linearly separable problem, there exists some initialization of the weights which might lead to non-convergent cases.

A. It can learn an OR function
B. It can learn an AND function
C. The obtained separating hyperplane depends on the order in which the points are presented in the training process.
Explanation
The Perceptron can learn both the OR and the AND function, because both are linearly separable: a single hyperplane can split the positive from the negative inputs, and the perceptron learning rule will find one by adjusting its weights and bias. The separating hyperplane it arrives at is not unique, and which one it finds depends on the order in which the training points are presented. Option D, however, is false: the perceptron convergence theorem guarantees that on any linearly separable problem the algorithm converges in a finite number of updates regardless of how the weights are initialized, so no initialization can produce a non-convergent case.
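The convergence and order-dependence claims can be seen in a minimal perceptron sketch (the step activation, folded-in bias, learning rate, and epoch count are illustrative choices):

```python
# Minimal perceptron sketch: step activation, with the bias folded in
# as the weight on a constant-1 input. Hyperparameters are illustrative.

def train_perceptron(data, epochs=20, lr=1.0):
    w = [0.0, 0.0, 0.0]  # [bias, w1, w2]
    for _ in range(epochs):
        for (x1, x2), target in data:
            x = [1.0, x1, x2]
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
            # perceptron rule: move the hyperplane only on mistakes
            w = [wi + lr * (target - pred) * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x1, x2):
    return 1 if w[0] + w[1] * x1 + w[2] * x2 > 0 else 0

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
OR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

w_and = train_perceptron(AND)  # learns the AND truth table
w_or = train_perceptron(OR)    # learns the OR truth table
w_and_rev = train_perceptron(list(reversed(AND)))  # same points, new order
```

Both truth tables end up classified perfectly, while `w_and` and `w_and_rev` come out different, illustrating that the final hyperplane depends on the order of presentation.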

• 3.

Suppose you run the K-means clustering algorithm on a given dataset. On which of the following factors do the final clusters depend? I. The value of K. II. The initial cluster seeds chosen. III. The distance function used.

• A.

I only

• B.

II only

• C.

I & II only

• D.

I, II and III

D. I, II and III
Explanation
The final clusters in K-means clustering depend on the value of K, which determines the number of clusters to be formed. The initial cluster seeds chosen also affect the final clusters as they determine the starting points for the algorithm. Additionally, the distance function used plays a crucial role in calculating the similarity between data points and assigning them to clusters. Therefore, all three factors - the value of K, the initial cluster seeds, and the distance function used - impact the final clusters in K-means clustering.
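The seed dependence in particular can be demonstrated with a tiny 1-D sketch (the data points and seed pairs are made up for illustration; Euclidean distance is assumed, and the sketch assumes no cluster ever empties):

```python
def kmeans_1d(points, seeds, iters=20):
    """Plain K-means on a line: assign to nearest center, recompute means."""
    centers = list(seeds)
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        centers = [sum(c) / len(c) for c in clusters]  # recompute the means
    return clusters

points = [0, 1, 10, 11, 20, 21]
run_a = kmeans_1d(points, seeds=(0, 1))   # converges to {0,1} vs the rest
run_b = kmeans_1d(points, seeds=(0, 21))  # converges to a different split
```

Both runs use the same K and the same distance function, yet the two seedings converge to different final partitions.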

• 4.

After training an SVM, we can discard all examples which are not support vectors and still classify new examples.

• A.

True

• B.

False

A. True
Explanation
After training an SVM, we can discard every example that is not a support vector and still classify new examples, because the support vectors alone determine the decision boundary. They are the training points lying on (or inside) the margin; every other training point has zero weight in the decision function. Discarding the non-support vectors therefore simplifies the model without changing its predictions.
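A hand-built 1-D sketch can make this concrete (the alphas below are hand-derived for this toy maximum-margin problem rather than produced by a solver):

```python
# Training set as (x, y, alpha): alpha > 0 marks a support vector.
# For margin points -1 and +1 on the line, the hard-margin solution
# has w = 1, b = 0, alpha = 0.5 on each; the far points get alpha = 0.
train = [(-5.0, -1, 0.0), (-1.0, -1, 0.5), (1.0, 1, 0.5), (5.0, 1, 0.0)]
b = 0.0

def decision(x, examples):
    # f(x) = sum_i alpha_i * y_i * <x_i, x> + b  (linear kernel)
    return sum(a * y * xi * x for xi, y, a in examples) + b

# discard everything with alpha == 0: only support vectors remain
support_vectors = [e for e in train if e[2] > 0]
```

Since the alpha = 0 examples contribute nothing to the sum, `decision(x, train)` and `decision(x, support_vectors)` agree for every input x.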

• 5.

If g(z) is the sigmoid function, then its derivative with respect to z may be written in term of g(z) as

• A.

g(z)(1-g(z))

• B.

g(z)(1+g(z))

• C.

-g(z)(1+g(z))

• D.

g(z)(g(z)-1)

A. g(z)(1-g(z))
Explanation
The sigmoid is g(z) = 1 / (1 + e^(-z)). Differentiating with the chain rule gives g'(z) = e^(-z) / (1 + e^(-z))^2. Split this into two factors: e^(-z) / (1 + e^(-z))^2 = [1 / (1 + e^(-z))] * [e^(-z) / (1 + e^(-z))]. The first factor is g(z) itself, and the second is 1 - g(z), since 1 - g(z) = (1 + e^(-z) - 1) / (1 + e^(-z)) = e^(-z) / (1 + e^(-z)). Hence g'(z) = g(z)(1 - g(z)).
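The identity is easy to sanity-check numerically with a central finite difference (the test points and step size h are arbitrary):

```python
import math

def g(z):
    """The sigmoid function."""
    return 1.0 / (1.0 + math.exp(-z))

def g_prime_formula(z):
    """Closed-form derivative: g(z) * (1 - g(z))."""
    return g(z) * (1.0 - g(z))

def g_prime_numeric(z, h=1e-6):
    """Central finite-difference estimate of g'(z)."""
    return (g(z + h) - g(z - h)) / (2 * h)
```

At any z the two agree to within the finite-difference error, confirming answer A.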

• 6.

The back-propagation learning algorithm applied to a two layer neural network

• A.

Always finds the globally optimal solution

• B.

Finds a locally optimal solution which may be globally optimal.

• C.

Never finds the globally optimal solution.

• D.

Finds a locally optimal solution which is never globally optimal.

B. Finds a locally optimal solution which may be globally optimal.
Explanation
The back-propagation learning algorithm applied to a two-layer neural network finds a locally optimal solution which may be globally optimal. This means that while the algorithm may not guarantee the absolute best solution, it is able to find a solution that is locally optimal and could potentially be the globally optimal solution. This is because the back-propagation algorithm iteratively adjusts the weights of the neural network based on the error between the predicted and actual outputs, gradually improving the network's performance. However, it is important to note that the algorithm may still get stuck in local optima depending on the specific problem and network architecture.
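The local-vs-global distinction can be illustrated in one dimension (the function and learning rate are made up; a real network's loss surface is high-dimensional but poses the same problem): plain gradient descent started at two different points lands in two different minima, only one of which is global.

```python
def f(x):
    # a toy non-convex "loss" with two minima of different depths
    return (x * x - 1) ** 2 + 0.2 * x

def f_prime(x):
    return 4 * x * (x * x - 1) + 0.2

def gradient_descent(x, lr=0.01, steps=2000):
    for _ in range(steps):
        x -= lr * f_prime(x)
    return x

x_right = gradient_descent(0.8)   # settles near +0.97: a local minimum
x_left = gradient_descent(-0.8)   # settles near -1.02: the global minimum
```

Both end points are locally optimal; only the left one happens to be globally optimal, which is exactly the situation described by answer B.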

• 7.

Which of the following is true?

• A.

In batch gradient descent we update the weights and biases of the neural network after a forward pass over each training example.

• B.

In batch gradient descent we update the weights and biases of our neural network after a forward pass over all the training examples.

• C.

Each step of stochastic gradient descent takes more time than each step of batch gradient descent.

• D.

None of these three options is correct

B. In batch gradient descent we update the weights and biases of our neural network after a forward pass over all the training examples.
Explanation
In batch gradient descent, the weights and biases are updated once per iteration, after a forward pass over all the training examples: the update uses the gradient averaged over the whole training set. This gives a smoother, less noisy estimate of the gradient than updating after each individual example (as stochastic gradient descent does), at the cost of a full pass through the data for every single update.
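A one-parameter sketch (the toy data and learning rate are illustrative) contrasting the two update schemes for fitting y = w * x by squared error:

```python
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (x, y) pairs; true w = 2

def batch_step(w, lr=0.05):
    # ONE update, from the gradient averaged over ALL examples
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad

def sgd_epoch(w, lr=0.05):
    # one update PER example, applied immediately
    for x, y in data:
        w -= lr * 2 * (w * x - y) * x
    return w

w_batch = 0.0
for _ in range(100):
    w_batch = batch_step(w_batch)

w_sgd = 0.0
for _ in range(100):
    w_sgd = sgd_epoch(w_sgd)
```

On this noiseless toy problem both schemes recover w = 2; the difference is when the update happens, which is the point of answer B.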

• 8.

In a neural network, which one of the following techniques is NOT useful to reduce overfitting?

• A.

Dropout

• B.

Regularization

• C.

Batch normalization

• D.

Adding more layers

D. Adding more layers
Explanation
Adding more layers is not useful to reduce overfitting in a neural network. Overfitting occurs when a model becomes too complex and starts to memorize the training data instead of learning the underlying patterns. Adding more layers can increase the complexity of the model, potentially exacerbating overfitting. Techniques like dropout, regularization, and batch normalization, on the other hand, are specifically designed to combat overfitting by introducing regularization constraints, reducing co-adaptation between neurons, and normalizing the inputs, respectively.
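For contrast with the wrong answer, here is a sketch of one technique that does help, inverted dropout (the drop probability and seeded RNG are illustrative): surviving activations are rescaled so the layer's expected output is unchanged.

```python
import random

def dropout(activations, p_drop, rng):
    # zero each unit with probability p_drop; scale survivors by
    # 1 / (1 - p_drop) so the expected activation stays the same
    keep = 1.0 - p_drop
    return [a / keep if rng.random() >= p_drop else 0.0 for a in activations]

rng = random.Random(0)  # seeded for reproducibility
out = dropout([1.0] * 10000, p_drop=0.5, rng=rng)
```

Roughly half the units are zeroed and the rest are doubled, so the mean activation stays close to 1.0; the random masking discourages co-adaptation between neurons.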

• 9.

For an image recognition problem (such as recognizing a cat in a photo), which neural network architecture has been found to be best suited for the task?

• A.

Multi-layer perceptron

• B.

Recurrent neural network

• C.

Convolutional neural network

• D.

Perceptron

C. Convolutional neural network
Explanation
Convolutional neural networks (CNNs) have been found to be better suited for image recognition problems such as recognizing a cat in a photo. CNNs are specifically designed to process grid-like data, like images, and are able to automatically learn and extract features from the input data. They consist of multiple layers of interconnected neurons, including convolutional layers that apply filters to the input data, pooling layers that downsample the data, and fully connected layers for classification. This architecture allows CNNs to efficiently capture spatial hierarchies and patterns in images, making them highly effective for image recognition tasks.
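The filter idea at the heart of a CNN can be sketched with a minimal "valid" cross-correlation (the toy image, kernel, and edge-detection framing are illustrative):

```python
def conv2d(image, kernel):
    # 'valid' cross-correlation: slide the kernel over every position
    # where it fits entirely inside the image
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(len(image[0]) - kw + 1)]
            for i in range(len(image) - kh + 1)]

# toy image: bright left half, dark right half
image = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
edge_kernel = [[1, -1]]  # responds to a bright-to-dark transition
response = conv2d(image, edge_kernel)
```

The output is non-zero only at the vertical edge, showing how one small shared filter detects the same local pattern everywhere in the image.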

• 10.

The Bayes Optimal Classifier

• A.

Is an ensemble of some selected hypotheses in the hypothesis space

• B.

Is an ensemble of all the hypotheses in the hypothesis space

• C.

Is the hypothesis that gives the best results on test instances

• D.

None of the above