minseok0809/korean-sentence-segementation

Korean Sentence Segmentation

  • AIHub Korean Dataset
  • JSON Parsing
  • Data Cleaning
  • Regular Expression
  • TXT Merge
  • Multiprocessing
  • Exploratory Data Analysis
    • Length of Source List
    • Number of Characters
    • Capacity
    • Preprocessing Runtime Calculator
    • Preprocessing Memory, Processes & Threads
  • Future Work
    • DataFrame (pandas or Polars) and Dictionary Optimization
    • Source JSON Searcher
      Compare each source JSON with the TXT extracted from it
      File naming and storage system (match file names before and after preprocessing in Excel)
      Which source TXT file does each line of the preprocessed TXT file come from? (DataFrame)
    • Suppress the kss 3.7.3 warning: "[Korean Sentence Splitter]: Too long text! turn off quotes calibration!"
    • Cython Multithreading
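The preprocessing steps outlined above (JSON parsing, data cleaning with regular expressions, sentence splitting, TXT merge, multiprocessing) could fit together roughly as in the sketch below. It is not the repository's code. The JSON schema (a list of objects with a "text" field), the character whitelist, and every function name are hypothetical. The standard `re` module replaces `regex`, and `naive_split_sentences` is a crude punctuation-based stand-in for kss so the example runs without extra dependencies; a real run would use `kss.split_sentences` instead. The sketch also merges sentences straight into one output file rather than writing per-file TXTs and concatenating them afterwards.

```python
"""JSON -> cleaned, sentence-split TXT (a hypothetical sketch, not the repo's code).

Assumptions: each source JSON file holds a list of objects with a "text"
field, and naive_split_sentences stands in for kss.split_sentences.
"""
import json
import re
from multiprocessing import Pool

# Assumed whitelist: Hangul jamo and syllables, Latin letters, digits,
# whitespace and . , ? !  Everything else is replaced by a space.
DISALLOWED = re.compile(r"[^ㄱ-ㅎㅏ-ㅣ가-힣A-Za-z0-9\s.,?!]")
SPACES = re.compile(r"\s+")
# Sentence boundary: whitespace that follows ., ? or ! (much cruder than kss).
BOUNDARY = re.compile(r"(?<=[.?!])\s+")


def parse_json(path):
    """Return the non-empty "text" values from one JSON file."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return [item["text"] for item in data if item.get("text")]


def clean_text(text):
    """Replace disallowed characters with spaces and collapse whitespace."""
    return SPACES.sub(" ", DISALLOWED.sub(" ", text)).strip()


def naive_split_sentences(text):
    """Stand-in for kss.split_sentences: split after terminal punctuation."""
    return [s for s in BOUNDARY.split(text) if s]


def process_file(path):
    """Parse, clean and split every text in one JSON file."""
    sentences = []
    for raw in parse_json(path):
        sentences.extend(naive_split_sentences(clean_text(raw)))
    return sentences


def preprocess_files(paths, out_path, workers=4):
    """Process JSON files and merge all sentences into one TXT file.

    With workers > 1 the files are handled by a process pool; workers=1
    runs in the current process, which is easier to debug.
    Returns the number of sentences written.
    """
    if workers > 1:
        with Pool(processes=workers) as pool:
            results = pool.map(process_file, paths)
    else:
        results = [process_file(p) for p in paths]
    # Pool.map keeps input order, so the merged file follows `paths`.
    with open(out_path, "w", encoding="utf-8") as out:
        for sentences in results:
            out.writelines(s + "\n" for s in sentences)
    return sum(len(s) for s in results)
```

Each worker process handles whole files, so no state is shared between processes; only the per-file sentence lists travel back to the parent. On platforms that start workers with spawn (Windows, macOS), call `preprocess_files` from under an `if __name__ == "__main__":` guard.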


Libraries

  • kss
  • regex
  • pandas
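Since pandas is among the listed libraries, the EDA metrics named earlier (length of source list, number of characters, capacity) could be tabulated roughly as follows. This is a sketch under assumptions, not the repository's analysis: each TXT file is assumed to hold one sentence per line, "capacity" is read as the UTF-8 byte size of the sentence text (newlines excluded), and the function names `sentence_stats` and `summarize` are hypothetical.

```python
import os

import pandas as pd


def sentence_stats(txt_paths):
    """One row per non-empty line: source file name, characters, UTF-8 bytes."""
    rows = []
    for path in txt_paths:
        with open(path, encoding="utf-8") as f:
            for line in f:
                sentence = line.rstrip("\n")
                if sentence:
                    rows.append({
                        "source": os.path.basename(path),
                        "chars": len(sentence),
                        "bytes": len(sentence.encode("utf-8")),
                    })
    return pd.DataFrame(rows, columns=["source", "chars", "bytes"])


def summarize(stats):
    """Per-source sentence count, character totals, mean length and byte size."""
    return stats.groupby("source").agg(
        sentences=("chars", "size"),
        total_chars=("chars", "sum"),
        mean_chars=("chars", "mean"),
        total_bytes=("bytes", "sum"),
    )
```

Character and byte counts diverge for Korean text because each Hangul syllable takes three bytes in UTF-8, so "안녕하세요!" is 6 characters but 16 bytes; tracking both helps when estimating storage capacity.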