CalcRandom #33
calcRandom creates a random ranking of all items, which is used when there is no reason to recommend any other way, such as from other events. It is therefore independent of events. It is also seldom used: it is meant for situations where a large number of items have no events associated with them. It gives very poor results (random recommendations?) but will expose more items to users, who then generate events for them. I would not use it unless you have a good reason.
I use a set of events for new products, and I use random ranking to pick random new products to show to users. If you pass None as the event names to PEventStore.find, you get all events back, including $set and predict events, which increases my Spark output from 2.1 GB to 260 GB.
I think this is a good reason for this OPTIONAL parameter.
(In reply to Pat Ferrel, 25 Jul 2017, 6:24 PM.)
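The optional filter being asked for can be sketched in Python over an in-memory list of event dicts. This is not PredictionIO's PEventStore.find; the function find_events, the dict fields, and the sample events are hypothetical, chosen to mirror the case above where passing no event names also returns $set and predict events.

```python
from typing import Dict, Iterable, List, Optional, Sequence


def find_events(events: Iterable[Dict], event_names: Optional[Sequence[str]] = None) -> List[Dict]:
    """Return stored events, optionally restricted to the given event names.

    event_names=None means "no filter": every stored event comes back,
    including $set events and any logged-query events.
    """
    if event_names is None:
        return list(events)
    wanted = set(event_names)
    return [e for e in events if e.get("event") in wanted]


# Hypothetical contents of one app's event store.
store = [
    {"event": "new", "entityType": "user", "entityId": "u1", "targetEntityId": "i1"},
    {"event": "$set", "entityType": "item", "entityId": "i2"},
    {"event": "predict", "entityType": "pio_pr", "entityId": "q1"},
]

print(len(find_events(store)))           # 3: everything, including $set and predict
print(len(find_events(store, ["new"])))  # 1: only the configured event
```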
There is no need for events if the ranking is random, and no need to increase event storage.
Using an optional parameter that follows the standard structure of your engine.json is a feature. Loading useless data while training on a large amount of data is a bad implementation. But as you wish, and thanks for your time.
(In reply to Pat Ferrel, 25 Jul 2017, 19:39.)
I think you misunderstand: scoring items randomly requires no data to be sent to the EventStore. During training, each item in the model is given a random number to rank it. If you want to set a "custom" ranking, you need to send $set events with your ranking. Using "random" is extremely efficient; I'm not sure why you say it loads useless data.
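As described here, a random ranking is just a per-item score assigned during training, with no EventStore input. A minimal Python sketch of that idea follows; random_ranking and top_n are hypothetical names, and the UR's actual implementation (which writes scores along with the model to Elasticsearch) is not reproduced.

```python
import random
from typing import Dict, Iterable, List, Optional


def random_ranking(item_ids: Iterable[str], seed: Optional[int] = None) -> Dict[str, float]:
    """Give every model item an independent random score in [0.0, 1.0).

    No events are read: the only input is the item IDs already in the model.
    With seed=None each call (i.e. each train) produces a new ordering.
    """
    rng = random.Random(seed)
    return {item: rng.random() for item in item_ids}


def top_n(scores: Dict[str, float], n: int) -> List[str]:
    """Item IDs ordered by descending random score."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


model_items = ["i1", "i2", "i3", "i4"]
scores = random_ranking(model_items)
print(top_n(scores, 2))  # two items; which two changes from run to run
```

Leaving the seed unset is what makes the ordering rotate on every train, which is the behavior wanted for showing a variety of new products.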
I use it in production. People can share the same data storage and use many engines. I use random for showing news because I need item scores to change on every train so that a variety of products is shown. What I'm telling you is that using random without any event names set can break a train action, because it loads all events present in the storage DB. That's my point of view. Maybe I'm not using it for its real purpose, and I'm so sorry. But you can't load every record in a DB for nothing.
(In reply to Pat Ferrel, 25 Jul 2017, 19:56.)
It must load all events to calculate the model, even without random ranking. Sorry, I still don't understand what you are saying is wrong. All data must be loaded, random ranking or not. This is a big-data application and so works on very large datasets. There are ways to trim the data, but this has nothing to do with random ranking. The random ranking should be part of the normal train operation, not a separate task. If you are doing 2 trains, there is no need. Updating the model is integral to changing the random ranking. Are you trying to change the random ranking more often than you update the model?
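The point that the random ranking belongs to the normal training pass can be illustrated with a toy Python sketch. It assumes, hypothetically, that training reads only the configured indicator events and that per-item counts stand in for the real model (the UR computes something far richer); train, indicator_names, and the sample events are made up. The random rank is attached to the items that same pass already produced, so it needs no second read of the event store.

```python
import random
from typing import Dict, Iterable, Optional, Sequence


def train(events: Iterable[Dict], indicator_names: Sequence[str],
          seed: Optional[int] = None) -> Dict[str, Dict]:
    """One training pass: per-item indicator counts, then a random rank per item.

    Events whose name is not in indicator_names are skipped. The random rank
    is assigned to the items found in this pass, not from a separate query.
    """
    wanted = set(indicator_names)
    model: Dict[str, Dict] = {}
    for e in events:
        if e.get("event") not in wanted:
            continue
        item = model.setdefault(e["targetEntityId"], {"counts": {}})
        item["counts"][e["event"]] = item["counts"].get(e["event"], 0) + 1
    rng = random.Random(seed)
    for item in model.values():
        item["random_rank"] = rng.random()
    return model


events = [
    {"event": "buy", "entityId": "u1", "targetEntityId": "i1"},
    {"event": "view", "entityId": "u1", "targetEntityId": "i2"},
    {"event": "view", "entityId": "u2", "targetEntityId": "i2"},
    {"event": "predict", "entityId": "q1"},  # skipped: not a configured indicator
]
model = train(events, ["buy", "view"])
print(sorted(model))  # ['i1', 'i2']
```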
OK, let me try to explain better with a real example. Maybe you don't know that PredictionIO stores every query in the database with the special event name "predict" and entity type "pio_pr". This means that in production, with 2-3 million queries a day, the events in the DB grow much faster. And if you use the find function without specifying event names, you load useless data for nothing, which is a dumb programming style.
(In reply to Pat Ferrel, 25 Jul 2017, 21:58.)
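Since the thread disagrees on whether query events end up in the store, one way to check a given app is to tally its events by name and entity type. The Python sketch below is hypothetical: event_breakdown and the sample dicts are made up, and in practice the input would be the parsed output of an export of the app's data.

```python
from collections import Counter
from typing import Dict, Iterable


def event_breakdown(events: Iterable[Dict]) -> Counter:
    """Count stored events per (event name, entityType) pair."""
    return Counter((e.get("event"), e.get("entityType")) for e in events)


# Hypothetical sample of parsed events.
sample = [
    {"event": "view", "entityType": "user"},
    {"event": "view", "entityType": "user"},
    {"event": "predict", "entityType": "pio_pr"},
]

for (name, entity_type), count in event_breakdown(sample).most_common():
    print(f"{name}\t{entity_type}\t{count}")
```

A large count for the ("predict", "pio_pr") pair would show that logged queries are what is inflating storage.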
@IlCingalese no, you are wrong about this. Queries are not stored. FYI, I am a committer and PMC member on PIO and have worked on it for several years now. I also wrote the UR, so I do know a bit about all this :-)
I'll stop talking with you. Really, you scared me, and don't be afraid of all the forks. Your mind is so closed.
Have a good life.
(In reply to Pat Ferrel, 28 Jul 2017, 2:59 AM.)
@IlCingalese It looks like your engine(s) have feedback turned on. Unless you are conducting online evaluation, having feedback turned on has no value and will only eat up event storage. Please turn it off by dropping …
@IlCingalese there is an undocumented parameter to … Could you or someone else have accidentally used this param in your …? As to calcRandom, it iterates through all the items in the model to be written to Elasticsearch and assigns a random number to each. This happens during …
BTW, if you have turned on --feedback for the UR, it does nothing; the UR does not support its use.
Furthermore, the queries will never be deleted from the database. To clean up the DB:
1. do a pio export
2. write a program to drop the queries and keep only the events you want
3. pio app data-delete to drop all data
4. pio import... to import the cleaned-up data
This will preserve your appName and access key. But it's not very safe if you are not completely sure the cleaned data is formatted correctly. To do it safely, create a new app and access key for the cleaned data and test the import and predictions before switching to it. Then, once everything is switched over, drop the old appName with pio app delete...
Maybe I'm a great idiot... I used the script linked here:
https://predictionio.incubator.apache.org/deploy/#retrain-and-deploy-script
I spotted that it sets a --feedback parameter. I'm so sorry, but I started using this script when I began using PredictionIO 8 months ago, when I was a complete newbie, and I never checked or changed it.
Besides that, for production use I made some other changes to your engine, which I describe here so you can evaluate them.
In the query request:
- a "must" fields property, useful for really filtering on some properties
- an order-by field, so users can order results by a specific ranking score
In train:
- change the ranking grouping from ranking type to a user-defined ranking name, so people can create custom similar ranking systems, for example top clicks in the last three days and top clicks in the last month, or a random ranking only for some events (for example a "new" event), so the news ranking changes on every train.
This is useful because with one engine I can support a large number of query types.
Have fun, and thanks. And if you can, change that script or remove it from that page.
(In reply to Pat Ferrel, 29 Jul 2017, 00:48.)
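Step 2 of the cleanup described earlier (a program that drops the queries and keeps only the events you want) could look like the Python sketch below. It assumes the export is newline-delimited JSON with "event" and "entityType" fields and uses the "predict"/"pio_pr" names reported earlier in the thread; clean_export and the sample lines are hypothetical. Kept lines are passed through unchanged to avoid reformatting data that will be re-imported.

```python
import json
from typing import Iterable, List, Optional, Set

QUERY_EVENT = "predict"       # event name reported in this thread for logged queries
QUERY_ENTITY_TYPE = "pio_pr"  # entityType reported in this thread for logged queries


def clean_export(lines: Iterable[str], keep_events: Optional[Set[str]] = None) -> List[str]:
    """Drop logged-query events from newline-delimited JSON export lines.

    If keep_events is given, also keep only those event names (include "$set"
    there if item properties are needed). Blank lines are skipped; kept lines
    are returned exactly as read, minus surrounding whitespace.
    """
    kept = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("event") == QUERY_EVENT or event.get("entityType") == QUERY_ENTITY_TYPE:
            continue
        if keep_events is not None and event.get("event") not in keep_events:
            continue
        kept.append(line)
    return kept


sample = [
    '{"event":"buy","entityType":"user","entityId":"u1","targetEntityType":"item","targetEntityId":"i1"}',
    '{"event":"predict","entityType":"pio_pr","entityId":"q1"}',
    '',
    '{"event":"$set","entityType":"item","entityId":"i1"}',
]
print(len(clean_export(sample)))          # 2: the query line and the blank line are dropped
print(len(clean_export(sample, {"buy"}))) # 1: only "buy" is kept
```

As the comment above warns, it is safer to import the cleaned lines into a new app and test predictions there before deleting anything.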
Hi,
is it possible for the calcRandom function to accept an EventsNames parameter, like the other algorithm functions?
I think it's a bug.
Claudio