Description
Hello,
I ran into a problem while deploying a model through the core API.
Note that this is not blocking; the end user simply has to be careful.
Still, it looks like some kind of bug (or, to put it politely, "weak conformity to the specification").
In short: when deploying a model on a table that references an external table, one has to explicitly add the "SNB_" prefix to the name of the reference table.
This is inconsistent with how the other secondary tables are specified.
I elaborate below with an example.
What, Why
Say you are using the Khiops Python library (core module) to run a training task and then a deployment task.
The root table of the analysis looks something like this:
Root Dictionary subscriptions(subs_Id) {
    Categorical subs_Id;
    Categorical offer_name;
    Categorical targetCol;
    Table(Subs_Payments) payments;
    Entity(Subs_Offer) offer [offer_name];
};
This main dictionary references two tables: a secondary table that lists the payments related to each subscription, and an external reference table that provides generic information on the offers that can be subscribed to.
This implies that the .kdic file contains something along the following lines:
Root Dictionary Subs_Offer(offer_name) {
    Categorical offer_name;
    // various fields
};

Dictionary Subs_Payments(subs_Id) {
    Categorical subs_Id;
    // various fields
};
OK, we're all set.
Now the API call to trigger training through the Khiops core library should look something like this (focus on the additional tables):
from khiops import core as kh

kh.train_predictor(
    subscriptions_kdic_path,
    "subscriptions",
    subscriptions_filepath,
    "targetCol",
    supervised_report_file_path,
    additional_data_tables={
        "payments": subs_payments_filepath,
        "/Subs_Offer": offers_with_payment_options_filepath,
    },
    max_trees=0,
)
Notice that, above, regular secondary tables are referred to by their variable name, whereas external reference tables are referred to by a slash followed by their dictionary name ("/Subs_Offer", not "offer").
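To make the convention explicit, here is a minimal sketch of how the keys of `additional_data_tables` are formed in the training call above. The `SchemaTable` type and the `training_table_keys` helper are hypothetical illustrations, not part of the khiops library, and the file names are placeholders; the code only encodes the rule just described: secondary tables are keyed by their variable name, external reference tables by "/" plus their dictionary name.

```python
from dataclasses import dataclass


@dataclass
class SchemaTable:
    """Hypothetical description of a table referenced from the root dictionary."""
    variable_name: str    # name of the Table/Entity variable in the root dictionary
    dictionary_name: str  # name of the dictionary describing the table
    external: bool        # True for an external reference table (Entity with [key])
    file_path: str


def training_table_keys(tables):
    """Build an additional_data_tables mapping following the convention above."""
    mapping = {}
    for table in tables:
        if table.external:
            # External reference table: slash + dictionary name
            key = "/" + table.dictionary_name
        else:
            # Regular secondary table: variable name
            key = table.variable_name
        mapping[key] = table.file_path
    return mapping


tables = [
    SchemaTable("payments", "Subs_Payments", False, "payments.txt"),
    SchemaTable("offer", "Subs_Offer", True, "offers.txt"),
]
print(training_table_keys(tables))
# {'payments': 'payments.txt', '/Subs_Offer': 'offers.txt'}
```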
Now here is the tricky part. When calling the API for deployment, we would expect the additional tables to keep the same names; only the dictionary to deploy should need the "SNB_" prefix.
Alas, the reference table must also be given the "SNB_" prefix.
So the current syntax is:
kh.deploy_model(
    model_kdic_path,
    "SNB_subscriptions",
    subscriptions_filepath,
    scored_subs_path,  # Output file
    additional_data_tables={
        "payments": subs_payments_filepath,
        "/SNB_Subs_Offer": offers_with_payment_options_filepath,
    },
)
"/Subs_Offer" has turned into "/SNB_Subs_Offer".
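Until this changes, one workaround is to derive the deployment mapping from the training one instead of editing it by hand. The sketch below assumes, as observed here, that only external-table keys (those starting with "/") need the "SNB_" prefix. `to_deployment_tables` is a hypothetical helper, not a khiops function, and the file names are placeholders.

```python
def to_deployment_tables(train_tables, prefix="SNB_"):
    """Rewrite a training additional_data_tables mapping for deploy_model.

    Keys of external reference tables ("/" + dictionary name) get the model
    prefix; keys of regular secondary tables (variable names) are kept as-is.
    """
    deploy_tables = {}
    for key, path in train_tables.items():
        if key.startswith("/"):
            dictionary_name = key[1:]
            # Avoid double-prefixing a key that was already rewritten
            if not dictionary_name.startswith(prefix):
                key = "/" + prefix + dictionary_name
        deploy_tables[key] = path
    return deploy_tables


train_tables = {"payments": "payments.txt", "/Subs_Offer": "offers.txt"}
print(to_deployment_tables(train_tables))
# {'payments': 'payments.txt', '/SNB_Subs_Offer': 'offers.txt'}
```

Applied to the CustomerExtended case, the same helper would turn a "/City" key into "/SNB_City".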
Note that I have illustrated the problem with a small ad hoc dictionary, inspired by the data I am currently working on.
However, one of the Khiops sample datasets could be used to work on this issue: CustomerExtended.
Its main table Customer has a secondary table Address, which points to the external reference table City.
Hence the additional-table key for City is "/City" when calling for training, but it becomes "/SNB_City" when calling for deployment.
Questions/Ideas
- A potential solution
OK, so the template suggests that I write down a potential solution.
Alas, I am not sure about this one.
When specifying paths to the additional tables in the core API, I understand that external reference tables are referred to by their dictionary name.
For instance, in the example I provided, the entity variable offer is of type Subs_Offer, hence the key "/Subs_Offer".
But after training, all dictionaries in the model .kdic file have been renamed with the "SNB_" prefix, so there is no longer a Subs_Offer dictionary.
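One possible direction, offered only as a sketch: at deployment time, an unprefixed external-table key could be resolved against the dictionaries actually present in the model file, falling back to the "SNB_"-prefixed name when the original one no longer exists. The `resolve_external_key` function and the set of dictionary names passed to it are hypothetical. This is not how the khiops library currently behaves, and it only handles simple "/DictName" keys.

```python
def resolve_external_key(key, model_dictionary_names, prefix="SNB_"):
    """Map a "/DictName" key to a dictionary that exists in the model file."""
    if not key.startswith("/"):
        return key  # secondary-table keys (variable names) need no resolution
    name = key[1:]
    if name in model_dictionary_names:
        return key
    if prefix + name in model_dictionary_names:
        return "/" + prefix + name
    raise ValueError(f"no dictionary named {name!r} or {prefix + name!r} in the model")


model_dictionaries = {"SNB_subscriptions", "SNB_Subs_Payments", "SNB_Subs_Offer"}
print(resolve_external_key("/Subs_Offer", model_dictionaries))
# /SNB_Subs_Offer
```

With this kind of lookup, users could pass the same `additional_data_tables` mapping to both training and deployment.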