
Automate selection of number of features #155

@sandervh14

Description


Task Title

Task: Automate selection of number of features

Task Description

Part of #134, but can be delivered independently and would already bring great value by allowing models to be re-fit automatically.

Cf. message to Nick of 21/3:

Hey Nick,

Benoît is right, you can just script one run of your model creation for future re-fits.

The only big hurdle is indeed the selection of parameters. If you fix the number of features to a reasonable value (for example, keep it equal to the number of features currently selected), the client can re-fit the model on a recurring basis.

This solution is somewhat suboptimal, though: new data, or even new features, may cause the model to under- or overfit.
The manual process we follow as data scientists to select the optimal number of features is easy, though: it is just an elbow-curve problem. Algorithms already exist to select the optimal number automatically, mainly for elbow curves in clustering (or you could implement a simple check yourself on how the slope of the curve evolves with each newly added feature). This could be solved on the project of the client you mention in about 2 days of work.
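A minimal sketch of both ideas, assuming the feature-selection step yields a list of validation scores (e.g. AUC) where entry i is the score of the model built with the first i + 1 selected features. The function names and the `min_gain` threshold are hypothetical, not an existing API of this project. The chord-distance rule is a simplified take on the "kneedle" idea (pick the point furthest from the line joining the curve's endpoints), not the exact published algorithm; the second function is the simple slope check mentioned above.

```python
from typing import Sequence


def elbow_by_chord_distance(scores: Sequence[float]) -> int:
    """Hypothetical helper: number of features at the elbow of a score curve.

    Assumes scores[i] is the validation score with i + 1 features.
    The elbow is the point furthest from the straight line (chord)
    joining the first and last points of the curve.
    """
    n = len(scores)
    if n < 3:
        return n  # too few points to define an elbow: keep them all
    # Normalise both axes to [0, 1] so the distance does not depend on scale.
    x = [i / (n - 1) for i in range(n)]
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0  # avoid division by zero on a flat curve
    y = [(s - lo) / span for s in scores]
    # Direction of the chord from the first to the last point.
    dx, dy = x[-1] - x[0], y[-1] - y[0]
    norm = (dx ** 2 + dy ** 2) ** 0.5
    # Perpendicular distance of every point to the chord.
    dists = [abs(dy * (xi - x[0]) - dx * (yi - y[0])) / norm
             for xi, yi in zip(x, y)]
    best = max(range(n), key=lambda i: dists[i])
    return best + 1  # convert 0-based index to a number of features


def elbow_by_min_gain(scores: Sequence[float], min_gain: float = 0.01) -> int:
    """Hypothetical helper: simple slope check.

    Keep adding features while each extra feature improves the score by at
    least `min_gain` (an assumed, tunable threshold); stop at the first
    feature whose gain falls below it.
    """
    for i in range(1, len(scores)):
        if scores[i] - scores[i - 1] < min_gain:
            return i  # the (i + 1)-th feature added too little: keep i
    return len(scores)
```

On an illustrative curve such as `[0.70, 0.78, 0.82, 0.835, 0.84, 0.842, 0.843]`, the chord rule picks 3 features and the slope check with `min_gain=0.01` picks 4. The chord rule needs no threshold; the slope check is easier to explain to a client but its threshold has to be chosen per use case.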

Metadata


Assignees

No one assigned

    Labels

    enhancement (New feature or request)

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests
