CalcRandom #33

IlCingalese · 2017-07-25T10:42:04Z

Hi,
Is it possible for the calcRandom function to accept an EventsNames parameter like the other algorithm functions?
I think it's a bug.

Claudio

pferrel · 2017-07-25T16:20:12Z

calcRandom creates a random ranking of all items, which is used when there is no other basis for a recommendation, such as other events. It is therefore independent of events.

It is also seldom used. It is for situations where a large number of items do not have any events associated with them and gives very poor results (random recommendations?) but will expose more items to the user and then get events. I would not use it unless you have a good reason.

IlCingalese · 2017-07-25T17:28:59Z

I use a set of events for new products, and I use random to pick random new products to show to users. If you pass None to peventstore.find you get all events back, including $set and predict events, which increases my Spark output from 2.1 GB to 260 GB. I think this is a good reason for this OPTIONAL parameter.

pferrel · 2017-07-25T17:37:22Z

There is no need for events if the ranking is random and no need to increase event storage.

IlCingalese · 2017-07-25T17:46:11Z

Using an optional parameter that corresponds to the standard structure of your engine.json is a feature. Loading useless data while training on a large amount of data is a bad implementation. But as you wish, and thanks for your time.

pferrel · 2017-07-25T17:52:49Z

I think you misunderstand: scoring items randomly requires no data to be sent to the EventStore. During training, each item in the model is given a random number that is used to rank it.

If you want to set a "custom" ranking this requires you to send $set events with your ranking.

Using "random" is extremely efficient, not sure why you would say it loads useless data.

IlCingalese · 2017-07-25T18:08:29Z

I use it in production. People can share the same data storage across many engines. I use random for showing news because I need item scores to change on every train, so that a variety of products is shown. I am telling you that using random without any event name set can break a train run, because it loads all events present in the storage DB. That's my point of view. Maybe I am not using it for its real purpose, and I'm sorry for that, but you can't load all records in a DB for nothing.

pferrel · 2017-07-25T19:58:06Z

It must load all events to calculate the model even without random ranking.

Sorry I still don't understand what you are saying is wrong. All data must be loaded, random ranking or not. This is a big-data application and so works on very large datasets. There are ways to trim the data but this has nothing to do with random ranking.

The random ranking should be part of the normal train operation, not a separate task. If you are doing 2 trains, there is no need. Updating the model is integral to changing the random ranking.

Are you trying to change the random ranking more often than you update the model?

IlCingalese · 2017-07-26T09:58:57Z

OK, I'll try to explain better with a real example. Maybe you don't know that PredictionIO stores every query in the database with the special event name "predict" and entity type "pio_pr". This means that in production, with 2-3 million queries a day, the events in the DB grow quickly. And if you use the find function without specifying an event name, you load useless data for nothing, which is a dumb programming style.

pferrel · 2017-07-28T00:59:51Z

@IlCingalese no you are wrong about this. Queries are not stored.

FYI I am a committer and PMC member on PIO and have worked on it for several years now. I also wrote the UR so I do know a bit about all this :-)

IlCingalese · 2017-07-28T07:15:54Z

I'll stop talking with you. Really, you scared me, and don't be afraid about all the forks. Your mind is so closed. Have a good life.

dszeto · 2017-07-28T22:10:18Z

@IlCingalese It looks like your engine(s) have feedback turned on. Unless you are conducting online evaluation, having feedback turned on has no value and will only eat up event storage. Please turn it off by dropping --feedback from your pio deploy commands. Event feedback is not turned on by default.
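
One way to check whether stored queries are what is filling the event store is to count events by name in an export. The sketch below assumes the export is JSON with one event per line and that each record has "event" and "entityType" fields; the sample records are made up, and the "predict" / "pio_pr" names are the ones reported earlier in this thread, not verified here.

```python
import json
from collections import Counter

def count_events(lines):
    """Count records per (event, entityType) pair in JSON-lines input."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line:
            record = json.loads(line)
            counts[(record.get("event"), record.get("entityType"))] += 1
    return counts

# Made-up sample standing in for the lines of an exported file.
sample = [
    '{"event": "predict", "entityType": "pio_pr", "entityId": "q1"}',
    '{"event": "predict", "entityType": "pio_pr", "entityId": "q2"}',
    '{"event": "buy", "entityType": "user", "entityId": "u1"}',
]
for key, n in count_events(sample).most_common():
    print(key, n)
```

If the "predict" pair dominates the counts, feedback is the likely cause of the growth.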

pferrel · 2017-07-28T22:12:23Z

@IlCingalese there is an undocumented parameter to pio deploy called --feedback that will turn on storage of queries. It is off by default and not meant for casual use. https://github.com/apache/incubator-predictionio/blob/develop/core/src/main/scala/org/apache/predictionio/workflow/CreateServer.scala#L132

Could you or someone else have accidentally used this param in your pio deploy ... command?

As to calcRandom, it iterates through all the items in the Model to be written to Elasticsearch and assigns a random number to each. This happens during pio train. So no events need to be known by calcRandom.
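
A minimal sketch of that idea, in Python rather than the UR's Scala and not the UR's actual code: each item id already in the model gets an independent random score, so no event data is read. The function name assign_random_ranks and the item ids are hypothetical.

```python
import random

def assign_random_ranks(item_ids, seed=None):
    """Give each item an independent random score in [0, 1)."""
    rng = random.Random(seed)  # seed is only for reproducible demos
    return {item_id: rng.random() for item_id in item_ids}

# Hypothetical item ids taken from an already-built model.
ranks = assign_random_ranks(["item-1", "item-2", "item-3"])

# Items ordered by their random score, highest first.
ordering = sorted(ranks, key=ranks.get, reverse=True)
print(ordering)
```

Calling it again without a seed draws fresh scores, which is why the ranking changes on each train.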

pferrel · 2017-07-28T22:48:39Z

BTW if you have turned on --feedback for the UR it does nothing, the UR does not support its use.

Furthermore, the queries will never be deleted from the database. To clean up the DB:

1. do a pio export
2. write a program to drop the queries and keep only the events you want
3. pio app data-delete to drop all data
4. pio-import... to import the cleaned up data

This will preserve your appName and access key. But it's not very safe if you are not completely sure the cleaned data is formatted correctly. To do it safely, create a new app and access key for the cleaned data and test the import and predictions before switching to it. Then once everything is switched over, drop the old appName with pio app delete...
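
The "write a program to drop the queries" step could look roughly like the sketch below. It assumes the export is JSON with one event per line, and that stored queries are the records with event "predict" and entityType "pio_pr" as reported earlier in this thread; both are assumptions to verify against your own export. The function names and file paths are hypothetical.

```python
import json

# Assumed markers of a stored query, taken from the report earlier in this thread.
QUERY_EVENT = "predict"
QUERY_ENTITY_TYPE = "pio_pr"

def is_stored_query(record):
    """True only when both the event name and the entity type match."""
    return (record.get("event") == QUERY_EVENT
            and record.get("entityType") == QUERY_ENTITY_TYPE)

def drop_stored_queries(lines):
    """Yield the non-blank lines that are not stored queries."""
    for line in lines:
        line = line.strip()
        if line and not is_stored_query(json.loads(line)):
            yield line

def clean_export(src_path, dst_path):
    """Copy src_path to dst_path without stored queries; return lines kept."""
    kept = 0
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in drop_stored_queries(src):
            dst.write(line + "\n")
            kept += 1
    return kept
```

Requiring both fields to match is deliberately conservative, so an ordinary event that happens to be named "predict" is kept. Running clean_export on each exported file, for example clean_export("events.json", "events.clean.json"), and importing the result into a new test app first follows the safer path described above.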

IlCingalese · 2017-07-29T01:10:18Z

Maybe I'm a great idiot. I use the script linked here: https://predictionio.incubator.apache.org/deploy/#retrain-and-deploy-script I spotted that it sets a --feedback parameter. I'm so sorry, but I have used this script since I started with PredictionIO 8 months ago, when I was a complete newbie, and I never checked or changed it.

That said, for production use I have implemented some other changes in your engine, which I list here so you can evaluate them.

In the query request:
- a "must" fields property, useful for real filtering on some properties
- an "order by" field, so the user can order results by a specific ranking score

In train:
- changing the ranking grouping from ranking type to a user-defined ranking name, so people can create several custom rankings of the same type, for example top clicks for the last three days and top clicks for the last month, or a random ranking only for some event, for example a "new" event, so the news ranking changes on every train

This is useful because with one engine I can support a great number of query types. Have fun, and thanks. And if you can, please change that script or remove it from that page.

pferrel closed this as completed Jul 25, 2017

pferrel reopened this Jul 28, 2017
