Hi, I've read all the user guides, but none of them gives clear instructions.
How can I write a script that runs normally in Omniscope?
If you're looking for a completely automated way to deal with this scenario (checking if records from a dataset are already in your DB) you could use the Join block.
You can pick multiple fields to define what constitutes a match, e.g. Name (source1) = First name (source2) as well as ID = ID #, while allowing mismatches in other fields.
You could create 2 outputs: one for matches and one for non-matches. Then apply the Fuzzy match block (Community blocks section) to the non-matches, where you can use our R code, or even amend it, to pick up the fuzzy cases.
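If it helps to think of it in Python terms, the matches / non-matches split can be sketched roughly like this. It is only an illustration of the idea, not how the Join block is implemented; the field names come from the example above, while `split_matches` and the sample rows are made up.

```python
def split_matches(source1, source2, key_pairs):
    """Split source1 rows into (matches, non_matches) against source2.

    key_pairs is a list of (field_in_source1, field_in_source2) tuples;
    a row counts as a match only if every paired field is equal.
    """
    # Collect the key tuples present in source2 for fast lookup.
    known = {tuple(row[f2] for _, f2 in key_pairs) for row in source2}
    matches, non_matches = [], []
    for row in source1:
        key = tuple(row[f1] for f1, _ in key_pairs)
        (matches if key in known else non_matches).append(row)
    return matches, non_matches


# Made-up sample data, for illustration only.
incoming = [
    {"Name": "David", "ID": 1, "City": "Leeds"},
    {"Name": "Anna", "ID": 2, "City": "York"},
]
database = [
    # City differs from the incoming row, but City is not a join field.
    {"First name": "David", "ID #": 1, "City": "London"},
]

matches, non_matches = split_matches(
    incoming, database, [("Name", "First name"), ("ID", "ID #")]
)
```

Here `matches` holds the David row and `non_matches` holds the Anna row; the second list is what you would pass on to a fuzzy-matching step.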
This article explains the Custom Block API for Python and R: https://help.visokio.com/support/solutions/articles/42000071109-custom-block-r-python-api-reference . Please read it thoroughly, together with some of the Custom Blocks on our GitHub repo.
Here is another example you can play with on our sandbox: http://daily.omniscope.me/Nils/Blocks/ExampleCustomBlock.iox/ It has a Custom block written in Python which reads some options and outputs some data. You could expand it and slot in your own script as a starting point.
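On the question in this thread about `input_data=output_data` in the default script: a plausible reading, assuming the block hands the script its input as a pandas DataFrame, is that the default simply passes the input through unchanged, and `output_data` is the table the block sends to whatever is connected downstream. The sketch below shows where custom logic would slot in. It deliberately leaves out the Omniscope calls that read the input and write the output (those are covered in the API reference linked above); `transform`, the `Name` column and `Name_clean` are hypothetical.

```python
import pandas as pd


def transform(input_data: pd.DataFrame) -> pd.DataFrame:
    """Return the table the block should output."""
    # The default script does the equivalent of `output_data = input_data`,
    # i.e. every input row is passed through unchanged.
    output_data = input_data.copy()

    # Custom logic goes here, for example adding a cleaned-up column.
    output_data["Name_clean"] = (
        output_data["Name"].str.lower().str.replace("_", " ", regex=False)
    )
    return output_data


# Stand-in for the data the block would receive on its input.
input_data = pd.DataFrame({"Name": ["David_zhong", "Zhong_David"]})
output_data = transform(input_data)
```

Keeping the logic in a plain function like this also means you can develop and test it in a Jupyter notebook first, then paste it into the block.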
Thanks for your reply.
However, I already know how to use the Record Filter.
Let's go back to the example. For a small number of records from one table, it is certainly doable to enter the values manually in the filter block, but that isn't efficient if you have 100K records.
Is there any way to run this automatically? Also, the values are not 100% equal, because there are lots of entries containing something like "David_zhong" or "Zhong_David".
Hence I was thinking I could write a script in Python to do it for me. I'm still trying to get my head around the Custom block, and there are not many tutorials or discussions online. It's a real headache to make it work. Can you explain a little why input_data=output_data in the default script, and what output_data refers to?
I'm familiar with writing scripts in Jupyter notebooks.
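For the "David_zhong" vs "Zhong_David" case, one possible approach in plain Python is to normalise each name before comparing, then fall back to a similarity score. This is a minimal sketch using only the standard library; the helper names and the 0.8 threshold are arbitrary assumptions, and the simple letter-based split only handles ASCII names.

```python
import re
from difflib import SequenceMatcher


def normalise(name):
    """Lower-case a name, split on non-letters, and sort the parts."""
    parts = re.split(r"[^a-z]+", name.lower())
    return " ".join(sorted(p for p in parts if p))


def similarity(a, b):
    """Return a 0..1 similarity ratio between two normalised names."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def best_match(name, known_names, threshold=0.8):
    """Return (candidate, score); candidate is None below the threshold.

    known_names must not be empty.
    """
    score, candidate = max((similarity(name, k), k) for k in known_names)
    return (candidate if score >= threshold else None), score
```

With this, `similarity("David_zhong", "Zhong_David")` comes out as 1.0, because both names normalise to the same string, and `best_match` gives a score you could report as a rough "how likely is this already in the database" figure.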
"to check whether a data entry is in our established database"
For this you can indeed write your custom SQL query in the Database Source block, and pass whatever data the query returns to a Data Validation block, which can, for instance, check whether a record exists or not.
Otherwise, if you don't want to use SQL at all, you can load the whole DB table into Omniscope and use Record Filter, Data Validation and other Omniscope preparation blocks to filter the data and check whether an entry exists.
E.g. in the Record Filter block you can define a rule that filters records in or out by a particular field value, and even use advanced fuzzy match rules that are not available in the SQL language.
An example for you: http://daily.omniscope.me/Demo/filter+data.iox/
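To make the SQL route a bit more concrete, here is a small sketch of the kind of query that checks whether an entry exists. It uses an in-memory SQLite database purely as a stand-in so that it runs anywhere; with MySQL the details (placeholder syntax in the Python driver, case sensitivity of comparisons) will differ. The table `view_authors` and column `Name` are borrowed from the snippet posted elsewhere in this thread, and the sample rows and function names are made up.

```python
import sqlite3

# In-memory SQLite stands in for the real MySQL database here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE view_authors (Name TEXT)")
conn.executemany(
    "INSERT INTO view_authors (Name) VALUES (?)",
    [("David_zhong",), ("Anna_Smith",)],
)


def record_exists(conn, name):
    """Return True if an exact Name match exists in view_authors."""
    row = conn.execute(
        "SELECT EXISTS(SELECT 1 FROM view_authors WHERE Name = ?)", (name,)
    ).fetchone()
    return bool(row[0])


def names_like(conn, fragment):
    """Return all Names that contain the given fragment, using LIKE."""
    rows = conn.execute(
        "SELECT Name FROM view_authors WHERE Name LIKE ? ORDER BY Name",
        (f"%{fragment}%",),
    ).fetchall()
    return [r[0] for r in rows]
```

An exact check such as `record_exists(conn, "Zhong_David")` misses the reordered name, while `names_like(conn, "zhong")` still finds "David_zhong". That gap is where the fuzzy match rules mentioned above come in.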
Thanks for your reply, it was really helpful.
Correct me if I'm wrong: according to the article you shared, you can write SQL queries against the data once the database connection has been created (in my case, once I have filled in the host, etc. for MySQL). Regarding SQL queries, I'm wondering how well they are supported. Can I write any SQL query?
Let me give you more details. Any advice would be super appreciated.
In the article you shared, the objective is to create a table that filters out the records that are in the categories. However, what if the objective is to check whether a data entry is in our established database?
My idea is to use SQL LIKE to match them and return some sort of result that shows me the % likelihood that they are in the database. The difference between the Fifa 2019 football player dataset from Kaggle and the social media dataset we're dealing with is whether the data is factual or not. In other words, any Fifa dataset you get matches the data stored in the database nearly 100%, because it contains fields such as player name, age, nationality, value and club.
What if the dataset contains double identities, as social media data does?
Do you have any idea based on what I described?
I would recommend using the Database block to connect to MySQL, either pointing to a table or executing your custom query.
Here's some info on how to do that: https://help.visokio.com/support/discussions/topics/42000001523
Once you have the Database block set up, you can then connect it to other workflow blocks for further data transformation and other operations, even to a custom Python script if you want.
I'm not a developer, but I'm exploring some ideas for a recent project. One of them is to write a script to automate part of the process.
I've read all the guidelines, and they are not very applicable to my project. Is it not like writing in a Jupyter notebook?
For the connector, I was asking about writing something like:
!pip install ipython-sql
!pip install pandas

# load the SQL module you just installed
%load_ext sql
import pandas as pd

%sql mysql://[IP address]

# select one column from the database
real_name = %sql SELECT Name FROM view_authors
df = real_name.DataFrame()
So I can write SQL via Python, but is there actually a way, as you said, to write SQL directly to interact with the database, given that we don't have a problem connecting to it?
To answer your question, it's MySQL.
Just to add that, in general, to connect to a database you can use any JDBC driver and let the Omniscope Database Source block deal with the connection, querying, etc. That is simpler than writing your own Python connector, unless of course no JDBC driver is available for that DB.
Please let us know if you still need some help.
Are you referring to this guide? https://help.visokio.com/support/solutions/articles/42000071270-getting-started-custom-scripts-in-omniscope-evo
Also note that this is our GitHub repo for custom blocks, some of which are written in Python. You can check the code there for examples: https://github.com/visokio/omniscope-custom-blocks
By the way, which database are you trying to connect to?
Sorry, I realized I have to provide more details. I want to write a script to connect to our database, and I'm not clear how that works. For that situation, the user guide for writing Python scripts is not that useful.