AML Quiz 4 Section B

Questions: 10 | Attempts: 81

• 2.

Which of the following is/are true about the Perceptron classifier? (Choose multiple options)

• A.

It can learn an OR function

• B.

It can learn an AND function

• C.

The obtained separating hyperplane depends on the order in which the points are presented in the training process.

• D.

For a linearly separable problem, there exists some initialization of the weights which might lead to non-convergent cases.

A. It can learn an OR function
B. It can learn an AND function
C. The obtained separating hyperplane depends on the order in which the points are presented in the training process.
Explanation
The Perceptron can learn both the OR and the AND function, because both are linearly separable: a single hyperplane can split the positive from the negative inputs, and the perceptron learning rule will find one by adjusting its weights and bias. The separating hyperplane it arrives at is not unique, and which one it finds depends on the order in which the training points are presented. Option D, however, is false: the perceptron convergence theorem guarantees that on any linearly separable problem the algorithm converges in a finite number of updates regardless of how the weights are initialized, so no initialization can produce a non-convergent case.
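The convergence and order-dependence claims can be seen in a minimal perceptron sketch (the step activation, folded-in bias, learning rate, and epoch count are illustrative choices):

```python
# Minimal perceptron sketch: step activation, with the bias folded in
# as the weight on a constant-1 input. Hyperparameters are illustrative.

def train_perceptron(data, epochs=20, lr=1.0):
    w = [0.0, 0.0, 0.0]  # [bias, w1, w2]
    for _ in range(epochs):
        for (x1, x2), target in data:
            x = [1.0, x1, x2]
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
            # perceptron rule: move the hyperplane only on mistakes
            w = [wi + lr * (target - pred) * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x1, x2):
    return 1 if w[0] + w[1] * x1 + w[2] * x2 > 0 else 0

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
OR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

w_and = train_perceptron(AND)  # learns the AND truth table
w_or = train_perceptron(OR)    # learns the OR truth table
w_and_rev = train_perceptron(list(reversed(AND)))  # same points, new order
```

Both truth tables end up classified perfectly, while `w_and` and `w_and_rev` come out different, illustrating that the final hyperplane depends on the order of presentation.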

• 3.

Suppose you run the K-means clustering algorithm on a given dataset. On which of the following factors do the final clusters depend? I. The value of K. II. The initial cluster seeds chosen. III. The distance function used.

• A.

I only

• B.

II only

• C.

I & II only

• D.

I, II and III

D. I, II and III
Explanation
The final clusters in K-means clustering depend on the value of K, which determines the number of clusters to be formed. The initial cluster seeds chosen also affect the final clusters as they determine the starting points for the algorithm. Additionally, the distance function used plays a crucial role in calculating the similarity between data points and assigning them to clusters. Therefore, all three factors - the value of K, the initial cluster seeds, and the distance function used - impact the final clusters in K-means clustering.
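The seed dependence in particular can be demonstrated with a tiny 1-D sketch (the data points and seed pairs are made up for illustration; Euclidean distance is assumed, and the sketch assumes no cluster ever empties):

```python
def kmeans_1d(points, seeds, iters=20):
    """Plain K-means on a line: assign to nearest center, recompute means."""
    centers = list(seeds)
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        centers = [sum(c) / len(c) for c in clusters]  # recompute the means
    return clusters

points = [0, 1, 10, 11, 20, 21]
run_a = kmeans_1d(points, seeds=(0, 1))   # converges to {0,1} vs the rest
run_b = kmeans_1d(points, seeds=(0, 21))  # converges to a different split
```

Both runs use the same K and the same distance function, yet the two seedings converge to different final partitions.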

• 4.

After training an SVM, we can discard all examples which are not support vectors and still classify new examples.

• A.

True

• B.

False

A. True
Explanation
After training an SVM, we can discard every example that is not a support vector and still classify new examples, because the support vectors alone determine the decision boundary. They are the training points lying on (or inside) the margin; every other training point has zero weight in the decision function. Discarding the non-support vectors therefore simplifies the model without changing its predictions.
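A hand-built 1-D sketch can make this concrete (the alphas below are hand-derived for this toy maximum-margin problem rather than produced by a solver):

```python
# Training set as (x, y, alpha): alpha > 0 marks a support vector.
# For margin points -1 and +1 on the line, the hard-margin solution
# has w = 1, b = 0, alpha = 0.5 on each; the far points get alpha = 0.
train = [(-5.0, -1, 0.0), (-1.0, -1, 0.5), (1.0, 1, 0.5), (5.0, 1, 0.0)]
b = 0.0

def decision(x, examples):
    # f(x) = sum_i alpha_i * y_i * <x_i, x> + b  (linear kernel)
    return sum(a * y * xi * x for xi, y, a in examples) + b

# discard everything with alpha == 0: only support vectors remain
support_vectors = [e for e in train if e[2] > 0]
```

Since the alpha = 0 examples contribute nothing to the sum, `decision(x, train)` and `decision(x, support_vectors)` agree for every input x.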

• 5.

If g(z) is the sigmoid function, then its derivative with respect to z may be written in term of g(z) as

• A.

g(z)(1-g(z))

• B.

g(z)(1+g(z))

• C.

-g(z)(1+g(z))

• D.

g(z)(g(z)-1)

A. g(z)(1-g(z))
Explanation
The sigmoid is g(z) = 1 / (1 + e^(-z)). Differentiating with the chain rule gives g'(z) = e^(-z) / (1 + e^(-z))^2. Split this into two factors: e^(-z) / (1 + e^(-z))^2 = [1 / (1 + e^(-z))] * [e^(-z) / (1 + e^(-z))]. The first factor is g(z) itself, and the second is 1 - g(z), since 1 - g(z) = (1 + e^(-z) - 1) / (1 + e^(-z)) = e^(-z) / (1 + e^(-z)). Hence g'(z) = g(z)(1 - g(z)).
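The identity is easy to sanity-check numerically with a central finite difference (the test points and step size h are arbitrary):

```python
import math

def g(z):
    """The sigmoid function."""
    return 1.0 / (1.0 + math.exp(-z))

def g_prime_formula(z):
    """Closed-form derivative: g(z) * (1 - g(z))."""
    return g(z) * (1.0 - g(z))

def g_prime_numeric(z, h=1e-6):
    """Central finite-difference estimate of g'(z)."""
    return (g(z + h) - g(z - h)) / (2 * h)
```

At any z the two agree to within the finite-difference error, confirming answer A.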

• 6.

The back-propagation learning algorithm applied to a two layer neural network

• A.

Always finds the globally optimal solution

• B.

Finds a locally optimal solution which may be globally optimal.

• C.

Never finds the globally optimal solution.

• D.

Finds a locally optimal solution which is never globally optimal.

B. Finds a locally optimal solution which may be globally optimal.
Explanation
The back-propagation learning algorithm applied to a two-layer neural network finds a locally optimal solution which may be globally optimal. This means that while the algorithm may not guarantee the absolute best solution, it is able to find a solution that is locally optimal and could potentially be the globally optimal solution. This is because the back-propagation algorithm iteratively adjusts the weights of the neural network based on the error between the predicted and actual outputs, gradually improving the network's performance. However, it is important to note that the algorithm may still get stuck in local optima depending on the specific problem and network architecture.
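The local-vs-global distinction can be illustrated in one dimension (the function and learning rate are made up; a real network's loss surface is high-dimensional but poses the same problem): plain gradient descent started at two different points lands in two different minima, only one of which is global.

```python
def f(x):
    # a toy non-convex "loss" with two minima of different depths
    return (x * x - 1) ** 2 + 0.2 * x

def f_prime(x):
    return 4 * x * (x * x - 1) + 0.2

def gradient_descent(x, lr=0.01, steps=2000):
    for _ in range(steps):
        x -= lr * f_prime(x)
    return x

x_right = gradient_descent(0.8)   # settles near +0.97: a local minimum
x_left = gradient_descent(-0.8)   # settles near -1.02: the global minimum
```

Both end points are locally optimal; only the left one happens to be globally optimal, which is exactly the situation described by answer B.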

• 7.

Which of the following is true?

• A.

In batch gradient descent we update the weights and biases of the neural network after a forward pass over each training example.

• B.

In batch gradient descent we update the weights and biases of our neural network after a forward pass over all the training examples.

• C.

Each step of stochastic gradient descent takes more time than each step of batch gradient descent.

• D.

None of these three options is correct

B. In batch gradient descent we update the weights and biases of our neural network after a forward pass over all the training examples.
Explanation
In batch gradient descent, the weights and biases are updated once per iteration, after a forward pass over all the training examples: the update uses the gradient averaged over the whole training set. This gives a smoother, less noisy estimate of the gradient than updating after each individual example (as stochastic gradient descent does), at the cost of a full pass through the data for every single update.
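A one-parameter sketch (the toy data and learning rate are illustrative) contrasting the two update schemes for fitting y = w * x by squared error:

```python
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (x, y) pairs; true w = 2

def batch_step(w, lr=0.05):
    # ONE update, from the gradient averaged over ALL examples
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad

def sgd_epoch(w, lr=0.05):
    # one update PER example, applied immediately
    for x, y in data:
        w -= lr * 2 * (w * x - y) * x
    return w

w_batch = 0.0
for _ in range(100):
    w_batch = batch_step(w_batch)

w_sgd = 0.0
for _ in range(100):
    w_sgd = sgd_epoch(w_sgd)
```

On this noiseless toy problem both schemes recover w = 2; the difference is when the update happens, which is the point of answer B.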

• 8.

In a neural network, which one of the following techniques is NOT useful to reduce overfitting?

• A.

Dropout

• B.

Regularization

• C.

Batch normalization

• D.

Adding more layers

D. Adding more layers
Explanation
Adding more layers is not useful to reduce overfitting in a neural network. Overfitting occurs when a model becomes too complex and starts to memorize the training data instead of learning the underlying patterns. Adding more layers can increase the complexity of the model, potentially exacerbating overfitting. Techniques like dropout, regularization, and batch normalization, on the other hand, are specifically designed to combat overfitting by introducing regularization constraints, reducing co-adaptation between neurons, and normalizing the inputs, respectively.
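For contrast with the wrong answer, here is a sketch of one technique that does help, inverted dropout (the drop probability and seeded RNG are illustrative): surviving activations are rescaled so the layer's expected output is unchanged.

```python
import random

def dropout(activations, p_drop, rng):
    # zero each unit with probability p_drop; scale survivors by
    # 1 / (1 - p_drop) so the expected activation stays the same
    keep = 1.0 - p_drop
    return [a / keep if rng.random() >= p_drop else 0.0 for a in activations]

rng = random.Random(0)  # seeded for reproducibility
out = dropout([1.0] * 10000, p_drop=0.5, rng=rng)
```

Roughly half the units are zeroed and the rest are doubled, so the mean activation stays close to 1.0; the random masking discourages co-adaptation between neurons.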

• 9.

For an image recognition problem (such as recognizing a cat in a photo), which neural network architecture has been found to be best suited for the task?

• A.

Multi-layer perceptron

• B.

Recurrent neural network

• C.

Convolutional neural network

• D.

Perceptron

C. Convolutional neural network
Explanation
Convolutional neural networks (CNNs) have been found to be better suited for image recognition problems such as recognizing a cat in a photo. CNNs are specifically designed to process grid-like data, like images, and are able to automatically learn and extract features from the input data. They consist of multiple layers of interconnected neurons, including convolutional layers that apply filters to the input data, pooling layers that downsample the data, and fully connected layers for classification. This architecture allows CNNs to efficiently capture spatial hierarchies and patterns in images, making them highly effective for image recognition tasks.
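The filter idea at the heart of a CNN can be sketched with a minimal "valid" cross-correlation (the toy image, kernel, and edge-detection framing are illustrative):

```python
def conv2d(image, kernel):
    # 'valid' cross-correlation: slide the kernel over every position
    # where it fits entirely inside the image
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(len(image[0]) - kw + 1)]
            for i in range(len(image) - kh + 1)]

# toy image: bright left half, dark right half
image = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
edge_kernel = [[1, -1]]  # responds to a bright-to-dark transition
response = conv2d(image, edge_kernel)
```

The output is non-zero only at the vertical edge, showing how one small shared filter detects the same local pattern everywhere in the image.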

• 10.

The Bayes Optimal Classifier

• A.

Is an ensemble of some selected hypotheses in the hypothesis space

• B.

Is an ensemble of all the hypotheses in the hypothesis space

• C.

Is the hypothesis that gives the best results on test instances

• D.

None of the above