Parser giving unicode characters for Arabic language #47
Comments
Looks like one of the pipeline steps has introduced spurious UTF escaping of your characters. Easily fixable but likely to be an annoyance. To help pin-point the cause, could you replace the |
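For readers unfamiliar with what "UTF escaping" looks like, here is a minimal sketch. It assumes, purely for illustration, that the escaping resembles what an ASCII-only JSON encoder produces; the thread does not confirm which pipeline step is responsible. The function names are hypothetical and nothing here uses Pathway.

```python
import json

# The Arabic word from this issue's sample output, written with explicit
# code points so the example does not depend on the editor's encoding.
SAMPLE = "\u0645\u0645\u0644\u0643\u0629"


def escape_like_ascii_json(text: str) -> str:
    """Serialize text as an ASCII-only JSON string (non-ASCII becomes \\uXXXX)."""
    # json.dumps uses ensure_ascii=True by default.
    return json.dumps(text)


def keep_unicode_json(text: str) -> str:
    """Serialize text as JSON while leaving non-ASCII characters intact."""
    return json.dumps(text, ensure_ascii=False)


def unescape(escaped: str) -> str:
    """Reverse the escaping by parsing the JSON string literal."""
    return json.loads(escaped)


if __name__ == "__main__":
    escaped = escape_like_ascii_json(SAMPLE)
    print(escaped)                    # ASCII-only, full of \u escapes
    print(keep_unicode_json(SAMPLE))  # readable Arabic inside quotes
    print(unescape(escaped))          # original text recovered
```

If an output file contains sequences like these, the text itself is usually intact and only its serialization is escaped, which is consistent with the "easily fixable" remark above.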
Yeah, I will try and update you.
ﻣﻣﻠﻛﺔ\n\nﺗﻛون\n\nاﻟﺗﻲ\n\nاﻟدوﻟﯾﺔ\n\nواﻻﺗﻔﺎﻗﯾﺎت\n\nواﻟﻣﻌﺎھدات\n\nاﻷﻧظﻣﺔ\n\nﺑﮫ\n\nﺗﻘﺿﻲ\n\nﻣﺎ\n\nﻣراﻋﺎة\n\nﻣﻊ\n\nﻋﻠﯾﮭم. When I set the mode to static and used pw.debug.compute_and_print(documents), the terminal output displays the Arabic correctly. How can I fix this in streaming mode while using pw.io.csv.write?
Hey @abdul756, streaming mode is only usable when you are running the app with In this case, it seems the parser is working OK.

If you want to dump the text into a file and keep the app running (so that when new content or a new file arrives, the new data is written to your CSV file), you can run your code with streaming mode enabled. You can achieve this with the addition of So, it will look like this:

    documents = folder.select(text=parser(pw.this.data))
    pw.io.csv.write(documents, "output_stream_en_7.csv")
    pw.run()

You can run this in a notebook or in a regular Python file. This starts the pipeline, which keeps running until you close it. After calling pw.run, you will see the output file being created.

If you are interested in taking a one-time dump in a static manner, you can run the following:

    df = pw.debug.table_to_pandas(documents)
    df.to_csv("outputs_en.csv")

This puts the content into a CSV file. In this case, we load the data into a Pandas DataFrame and then write it to a file. This one doesn't need a |
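To check whether a CSV dump preserved the Arabic text, a round trip through a file written with an explicit UTF-8 encoding is a useful baseline. The sketch below uses only Python's standard `csv` module rather than Pathway or pandas; the helper names, the `text` column header, and the file name `check.csv` are assumptions made for illustration.

```python
import csv
import os
import tempfile


def write_text_csv(path: str, texts: list[str]) -> None:
    """Write one text value per row under a 'text' header, encoded as UTF-8."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["text"])
        for value in texts:
            writer.writerow([value])


def read_text_csv(path: str) -> list[str]:
    """Read the 'text' column back, decoding the file as UTF-8."""
    with open(path, newline="", encoding="utf-8") as f:
        return [row["text"] for row in csv.DictReader(f)]


def has_unicode_escapes(path: str) -> bool:
    """Report whether the raw file contains literal backslash-u sequences."""
    with open(path, "rb") as f:
        return b"\\u" in f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "check.csv")  # hypothetical file name
        write_text_csv(path, ["\u0645\u0645\u0644\u0643\u0629"])
        print(read_text_csv(path))
        print(has_unicode_escapes(path))
```

Running `has_unicode_escapes` on a streaming output file is a quick way to tell escaped output apart from a viewer that simply fails to render Arabic.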
In streaming mode the output is not coming through; it gives only unicode characters. You can refer to the file I attached to the issue.
Yes, I just replicated the issue with another file. The static mode works ok (refer to the |
Sure, thanks.
@abdul756 thanks for reporting this. The problem will be fixed in the next release (it'll be released this week).
Hey @abdul756, |
I will test and update |
The parser is giving unicode characters for Arabic. How do I parse files for languages other than English?
I am attaching the output.
Code
نظام التكاليف القضائية.pdf
output_stream_ar_7.csv
Please help in resolving this issue
I would request help; I would like to see an example; I would like to understand the cause of the issue.