DATABRICKS-CERTIFIED-PROFESSIONAL-DATA-SCIENTIST: Databricks Certified Professional Data Scientist Exam
Free Practice Exam Questions (page: 3)
Updated On: 2-Jan-2026

What is a notable practical difference between L1 and L2 regularization?

  1. L1 regularization produces a more accurate resulting model
  2. Size of the model can be much smaller in L1 regularization than that produced by L2-regularization
  3. L2-regularization can be of vital importance when the application is deployed in resource-tight environments such as cell-phones.
  4. All of the above are correct

Answer(s): B

Explanation:

The two most common regularization methods are called L1 and L2 regularization. L1 regularization penalizes the weight vector for its L1-norm (i.e. the sum of the absolute values of the weights), whereas L2 regularization uses its L2-norm. There is usually not a considerable difference between the two methods in terms of the accuracy of the resulting model (Gao et al 2007), but L1 regularization has a significant advantage in practice. Because many of the feature weights become exactly zero as a result of L1-regularized training, the size of the model can be much smaller than that produced by L2-regularization. Compact models require less space in memory and on storage, and enable the application to start up quickly. These merits can be of vital importance when the application is deployed in resource-tight environments such as cell-phones.

Regularization works by adding the penalty associated with the coefficient values to the error of the hypothesis. This way, an accurate hypothesis with unlikely (large) coefficients is penalized, while a somewhat less accurate but more conservative hypothesis with small coefficients is not penalized as much.
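The sparsity effect described above can be seen directly in code. The following is a minimal sketch (assuming NumPy and scikit-learn are available; the synthetic data, the number of features, and the alpha value are illustrative choices, not part of the exam question): the same data is fit with an L1 penalty (Lasso) and an L2 penalty (Ridge), and the non-zero weights are counted.

# Sketch: L1 (Lasso) drives many coefficients exactly to zero, while L2 (Ridge)
# only shrinks them, which is why an L1-regularized model can be stored much
# more compactly.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))              # 50 features, most of them irrelevant
true_w = np.zeros(50)
true_w[:5] = [3.0, -2.0, 1.5, 4.0, -1.0]    # only 5 features actually matter
y = X @ true_w + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)          # L1 penalty
ridge = Ridge(alpha=0.1).fit(X, y)          # L2 penalty

print("non-zero weights with L1:", np.sum(lasso.coef_ != 0))   # typically close to 5
print("non-zero weights with L2:", np.sum(ridge.coef_ != 0))   # typically all 50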



Regularization is a very important technique in machine learning to prevent overfitting. Mathematically speaking, it adds a regularization term that prevents the coefficients from fitting the training data so perfectly that the model overfits. The difference between L1 and L2 is...

  1. L2 is the sum of the squares of the weights, while L1 is the sum of the absolute values of the weights
  2. L1 is the sum of the squares of the weights, while L2 is the sum of the absolute values of the weights
  3. L1 gives non-sparse outputs while L2 gives sparse outputs
  4. None of the above

Answer(s): A

Explanation:

Regularization is a very important technique in machine learning to prevent overfitting. Mathematically speaking, it adds a regularization term that prevents the coefficients from fitting the training data so perfectly that the model overfits. The difference between L1 and L2 is that L2 penalizes the sum of the squares of the weights, while L1 penalizes the sum of the absolute values of the weights. For least squares, the two regularized objectives are as follows:

L1 regularization on least squares (lasso): minimize ||Xw - y||^2 + lambda * sum_i |w_i|
L2 regularization on least squares (ridge): minimize ||Xw - y||^2 + lambda * sum_i w_i^2
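As a small illustration of the two penalty terms, the objectives above can be written out directly. This is a sketch in plain NumPy; the function names and the lam parameter are purely illustrative and not part of the exam material.

# The two penalized least-squares objectives, matching the formulas above.
import numpy as np

def l1_objective(w, X, y, lam):
    residual = X @ w - y
    return residual @ residual + lam * np.sum(np.abs(w))   # ||Xw - y||^2 + lam * ||w||_1

def l2_objective(w, X, y, lam):
    residual = X @ w - y
    return residual @ residual + lam * np.sum(w ** 2)      # ||Xw - y||^2 + lam * ||w||_2^2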



Select all of the options that apply to L2 regularization

  1. Computationally efficient due to having an analytical solution
  2. Non-sparse outputs
  3. No feature selection

Answer(s): A,B,C

Explanation:

The difference between their properties can be summarized as follows:

L2 regularization: computationally efficient because a closed-form (analytical) solution exists; non-sparse outputs; no built-in feature selection.
L1 regularization: computationally less efficient because there is no analytical solution; sparse outputs; built-in feature selection.
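The "analytical solution" property can be illustrated with a short sketch (assuming NumPy and scikit-learn; the data and the lam value are illustrative): the L2-regularized least-squares problem is solved exactly by w = (X^T X + lam*I)^(-1) X^T y, which matches scikit-learn's Ridge, whereas the L1 problem has no such formula and must be solved iteratively (e.g. by coordinate descent, as scikit-learn's Lasso does).

# Closed-form ridge solution versus scikit-learn's Ridge estimator.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))
y = rng.normal(size=100)
lam = 0.5

w_closed_form = np.linalg.solve(X.T @ X + lam * np.eye(10), X.T @ y)
w_sklearn = Ridge(alpha=lam, fit_intercept=False).fit(X, y).coef_
print(np.allclose(w_closed_form, w_sklearn))   # True: same analytical solution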



Regularization is a very important technique in machine learning to prevent overfitting. Optimizing with an L1 regularization term is harder than with an L2 regularization term because

  1. The penalty term is not differentiable
  2. The second derivative is not constant
  3. The objective function is not convex
  4. The constraints are quadratic

Answer(s): A

Explanation:

Regularization is a very important technique in machine learning to prevent overfitting. Mathematically speaking, it adds a regularization term that prevents the coefficients from fitting the training data so perfectly that the model overfits. The difference between L1 and L2 is that L2 penalizes the sum of the squares of the weights, while L1 penalizes the sum of the absolute values of the weights.
Much of optimization theory has historically focused on convex loss functions because they are much easier to optimize than non-convex functions: a convex function has no local minima other than its global minimum, so following the gradient of the function at each point leads to that minimum no matter where you start. For non-convex functions, on the other hand, where you start matters a great deal; if you start in a bad position and follow the gradient, you are likely to end up in a local minimum that is not the global minimum. You can think of convex functions as cereal bowls: anywhere you start in the cereal bowl, you roll down to the bottom. A non-convex function is more like a skate park: lots of ramps, dips, ups and downs. It is a lot harder to find the lowest point in a skate park than in a cereal bowl.
Both the L1- and L2-regularized least-squares objectives are convex, so convexity is not the obstacle here. The difficulty with L1 is that the absolute-value penalty |w_i| is not differentiable at zero, so plain gradient-based methods cannot be applied directly; subgradient, proximal, or coordinate-descent methods are used instead. The L2 penalty, by contrast, is smooth everywhere.
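A tiny numerical sketch of the non-differentiability (plain Python; the helper functions are illustrative): the L1 penalty |w| has different left and right derivatives at w = 0, while the L2 penalty w^2 is smooth there.

# Finite-difference check of the one-sided derivatives at w = 0.
def right_derivative(f, x, h=1e-6):
    return (f(x + h) - f(x)) / h

def left_derivative(f, x, h=1e-6):
    return (f(x) - f(x - h)) / h

l1 = abs                      # L1 penalty of a single weight
l2 = lambda w: w ** 2         # L2 penalty of a single weight

print(left_derivative(l1, 0.0), right_derivative(l1, 0.0))   # -1.0 vs 1.0: a kink
print(left_derivative(l2, 0.0), right_derivative(l2, 0.0))   # both ~0.0: smooth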





