I have to do something that requires, I think, two scripting exercises.
The first part is to build a correspondence table.
For example: a file contains several thousand addresses (see the extract in the attached Excel file). I need to group the addresses by similar sequences in their wording.
The attached drawing explains what I need.
This will save time when building a mapping table, which can then be used in a join, for which I need another block (another script).
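A minimal sketch of one way to build such a table in plain Python, assuming "similar sequences in the wording" means runs of n consecutive words (here n = 2) that appear in at least two addresses. The function names and sample addresses are hypothetical, and this is not an Omniscope block.

```python
import re
from collections import defaultdict


def word_ngrams(text, n=2):
    """Lower-case the text, split it into words and return the set of n-word sequences."""
    words = re.findall(r"\w+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def group_by_shared_sequence(addresses, n=2, min_size=2):
    """Map each n-word sequence to the addresses that contain it,
    keeping only sequences shared by at least min_size addresses."""
    groups = defaultdict(list)
    for address in addresses:
        for gram in sorted(word_ngrams(address, n)):
            groups[gram].append(address)
    return {gram: members for gram, members in groups.items() if len(members) >= min_size}


# Hypothetical sample data
addresses = [
    "12 rue de la Paix Paris",
    "Rue de la Paix 12, 75002 Paris",
    "5 avenue Victor Hugo Lyon",
    "Avenue Victor Hugo 5 Lyon",
]
for gram, members in sorted(group_by_shared_sequence(addresses).items()):
    print(f"{gram!r}: {members}")
```

Each (sequence, address) pair can become one row of the mapping table. Very common sequences such as "de la" also form groups, so a stop-word list or a larger n may be needed on real data.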
I will write another ticket (called Help to create a join block by "one text contains another text") for the join.
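For the second step, a join on "one text contains another text" can be sketched in plain Python as below. It is a naive, case-insensitive substring test (every text checked against every key); the names and data are hypothetical, and it is not the join block requested in the other ticket.

```python
def contains_join(texts, keys):
    """Pair each text with every key it contains (case-insensitive substring test).
    Texts that contain no key are kept with None, as in a left join."""
    rows = []
    for text in texts:
        lowered = text.lower()
        matched = [key for key in keys if key.lower() in lowered]
        if matched:
            rows.extend((text, key) for key in matched)
        else:
            rows.append((text, None))
    return rows


# Hypothetical mapping-table keys and addresses
keys = ["rue de la paix", "victor hugo"]
texts = ["12 Rue de la Paix, Paris", "5 avenue Victor Hugo", "1 place Bellecour"]
for row in contains_join(texts, keys):
    print(row)
```

Plain substring matching also fires inside longer words (a key "paix" would match "paixx"), and it costs one comparison per text/key pair.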
Thanks in advance for help.
Best Regards
Magali
Paola Tomei posted 6 months ago Admin
Hi Magali,
It would be good to know whether you've had a chance to review the solution and whether it meets the requirement.
Thanks
Paola
Paola Tomei posted about 3 years ago Admin
You can use the ETL process to break the string into individual words, merge them with the keywords list, then join the result with the original data.
A nice feature here is that if the text contains multiple keywords, all of them are captured and listed, comma-separated, in the aggregated field.
See the IOZ attached.
Attachments (1)
Key words CO....ioz
401 KB
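The word-splitting approach Paola describes (split into words, merge with the keyword list, aggregate the matches back onto each record) can be sketched in plain Python as follows. This illustrates the idea only, not the attached IOZ workflow; it matches single-word keywords only, and the names and data are hypothetical.

```python
import re


def tag_with_keywords(records, keywords):
    """Split each record into words, keep the words found in the keyword list,
    and return (record, matches) where matches is a comma-separated string."""
    keyword_set = {kw.lower() for kw in keywords}
    tagged = []
    for text in records:
        words = re.findall(r"\w+", text.lower())
        # dict.fromkeys drops repeated matches while keeping first-seen order
        matches = list(dict.fromkeys(w for w in words if w in keyword_set))
        tagged.append((text, ", ".join(matches)))
    return tagged


# Hypothetical records and keyword list
records = ["Hotel de Ville, Paris", "Gare de Lyon Paris", "Marseille"]
keywords = ["Paris", "Lyon"]
for text, matches in tag_with_keywords(records, keywords):
    print(f"{text!r} -> {matches!r}")
```

The second record shows the multi-keyword case: both matches end up in one comma-separated field, while a record with no keyword gets an empty string.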
Magali Colin - Avizua posted about 3 years ago
I tried a "left" join instead of a "right" one, but in the end it does not work :-(
Any other ideas?
Magali
Magali Colin - Avizua posted about 3 years ago
Hi Antonio,
I understand why the result is not satisfying: it keeps only one address line, and I need to keep all the addresses.
I made a short example in the attached IOZ to explain:
Best Regards,
Magali
Attachments (1)
Test Fuzzy M....ioz
365 KB
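If the goal is to keep every address line, one option is to return every candidate whose similarity clears a threshold, rather than only the single best match, and to keep unmatched rows as a left join would. The sketch below uses Python's standard `difflib.SequenceMatcher.ratio()` as the similarity measure; that is a swapped-in measure, not the scoring the Fuzzy Join block uses, and the 0.6 threshold and all names are assumptions.

```python
from difflib import SequenceMatcher


def all_fuzzy_matches(left, right, threshold=0.6):
    """Return (left, right, score) for every pair whose similarity is >= threshold.
    Left entries with no match are kept as (left, None, 0.0), like a left join."""
    rows = []
    for a in left:
        matched = False
        for b in right:
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score >= threshold:
                rows.append((a, b, round(score, 2)))
                matched = True
        if not matched:
            rows.append((a, None, 0.0))
    return rows


# Hypothetical data: one left address matches two right entries, the other matches none
left = ["rue de la paix", "zzz"]
right = ["Rue de la Paix", "12 rue de la Paix"]
for row in all_fuzzy_matches(left, right):
    print(row)
```

Because every pair is compared, the cost grows with the product of the two table sizes; for a few thousand rows that is usually still workable.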
Magali Colin - Avizua posted about 3 years ago
Thanks Antonio.
However, the "fuzzy match" finds matches for only 313 of the 1609 records. The idea of isolating a repeated sequence may work better when the two entries are not always in the same language.
Magali
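The "isolate a repeated sequence" idea can be sketched with the standard library: treat each entry as a list of words and ask `difflib.SequenceMatcher` for the longest run of consecutive words the two entries share. It assumes that proper names (street or place names) stay unchanged across languages; the function name and sample strings are hypothetical.

```python
import re
from difflib import SequenceMatcher


def longest_common_words(text_a, text_b):
    """Return the longest run of consecutive words shared by both texts (case-insensitive)."""
    a = re.findall(r"\w+", text_a.lower())
    b = re.findall(r"\w+", text_b.lower())
    # autojunk=False disables the heuristic that ignores very frequent items in long sequences
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return " ".join(a[match.a:match.a + match.size])


# A French and an English wording of the same place (hypothetical examples)
print(longest_common_words("Centre commercial Les Halles, Paris",
                           "Les Halles shopping centre, Paris"))
```

The shared run (or its word count, as a confidence measure) could then serve as the grouping key in the mapping table.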
Antonio Poggi posted about 3 years ago Admin
As per our screenshare, you can exploit the community Fuzzy Join block to merge datasets based on similar text. It performs a join between the first (left) and second (right) inputs. The field on which the join is performed must be text containing multiple terms.
The result contains records joined according to how many terms they share, with each term weighted by its inverse document frequency (https://en.wikipedia.org/wiki/Tf%E2%80%93idf).
You can also use the fuzzy match in the Record filter, but if you have many rules, that could be tedious to set up and maintain.
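To make the IDF weighting concrete: a term found in few records gets a high weight, log(N / df), and a term found in most records a low one, so sharing a rare word counts for more than sharing a common one. The sketch below scores a pair by summing the IDF weights of the terms it shares. This is a simplification for illustration, not the Fuzzy Join block's actual formula, and the names and data are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lower-cased set of words in the text."""
    return set(re.findall(r"\w+", text.lower()))


def idf_weights(documents):
    """idf(t) = log(N / df(t)), where df(t) is the number of documents containing t."""
    df = Counter(term for doc in documents for term in tokens(doc))
    n = len(documents)
    return {term: math.log(n / count) for term, count in df.items()}


def shared_term_score(a, b, idf):
    """Sum of the IDF weights of the terms both texts contain."""
    return sum(idf.get(term, 0.0) for term in tokens(a) & tokens(b))


# Hypothetical corpus: "rue" is common, "victor" and "hugo" are rarer
docs = ["rue de la paix", "rue victor hugo", "avenue victor hugo", "rue de lyon"]
idf = idf_weights(docs)
print(round(shared_term_score("rue victor hugo", "avenue victor hugo", idf), 3))
print(round(shared_term_score("rue de la paix", "rue de lyon", idf), 3))
```

The first pair shares two rarer words and scores higher than the second, which shares the common "rue" and "de". A term present in every record gets weight log(1) = 0 and contributes nothing.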
Magali Colin - Avizua posted about 3 years ago
These are the files:
Attachments (2)
file for new....xlsx
19 KB
Help to reco....png
71.9 KB