Danmaku (bullet comment) analysis #1003

Open
ubh0927 opened this issue Nov 12, 2023 · 2 comments

ubh0927 commented Nov 12, 2023

纪录片弹幕.csv
Analyze the Chinese text in column F and delete the duplicated text.


ubh0927 commented Nov 12, 2023

import pandas as pd

# Load the CSV file
danmu_df = pd.read_csv('path_to_your_file.csv')

# Remove duplicates in column 'F' and keep the first occurrence
danmu_df_unique = danmu_df.drop_duplicates(subset=['F'])

# Save the DataFrame with duplicates removed to a new CSV file
danmu_df_unique.to_csv('path_to_your_new_file.csv', index=False)
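
One caveat with `subset=['F']`: it only works if the CSV header really contains a column named `F`. If "column F" instead means the sixth column as shown in a spreadsheet (an assumption about the attached file, which is not visible here), a positional lookup is one possible sketch. The helper name `drop_duplicates_by_position` and the sample data are made up for illustration.

```python
import io

import pandas as pd


def drop_duplicates_by_position(df, position=5):
    """Drop rows whose value in the column at `position` repeats.

    position=5 is the sixth column, i.e. spreadsheet column F
    (assumption: "column F" refers to a position, not a header name).
    """
    column_name = df.columns[position]
    return df.drop_duplicates(subset=[column_name], keep="first")


# Hypothetical sample data standing in for the real CSV
sample_csv = io.StringIO(
    "a,b,c,d,e,text\n"
    "1,1,1,1,1,好看\n"
    "2,2,2,2,2,好看\n"
    "3,3,3,3,3,泪目\n"
)
sample_df = pd.read_csv(sample_csv)
unique_df = drop_duplicates_by_position(sample_df)
print(unique_df)
```

Looking the name up by position keeps the rest of the script unchanged, since `drop_duplicates` still receives a real column name.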


ubh0927 commented Nov 12, 2023

# Load the CSV file and remove duplicate entries in column 'F'
import pandas as pd

# Attempt to load the CSV file, trying a different encoding if necessary
try:
    # Try pandas' default encoding (UTF-8) first
    danmu_df = pd.read_csv('/mnt/data/纪录片弹幕.csv')
except UnicodeDecodeError:
    # If UTF-8 fails, fall back to 'gbk', which is commonly used for Chinese text
    danmu_df = pd.read_csv('/mnt/data/纪录片弹幕.csv', encoding='gbk')

# Remove duplicates in column 'F' and keep the first occurrence
danmu_df_unique = danmu_df.drop_duplicates(subset=['F'])

# Save the DataFrame with duplicates removed to a new CSV file
output_path = '/mnt/data/纪录片弹幕_no_duplicates.csv'
danmu_df_unique.to_csv(output_path, index=False)

# Show where the result was written
print(output_path)
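
The try/except fallback above covers two encodings. If more candidates are needed, the same idea can be written as a loop. This is a minimal sketch, not part of the original script: the helper name `read_csv_with_fallback` and the candidate list are assumptions.

```python
import pandas as pd


def read_csv_with_fallback(path, encodings=("utf-8", "gbk", "gb18030")):
    """Try each encoding in order and return the first DataFrame that loads.

    The candidate list is an assumption; gb18030 is a superset of gbk and
    is tried last as the broadest option for Chinese text.
    """
    last_error = None
    for encoding in encodings:
        try:
            return pd.read_csv(path, encoding=encoding)
        except UnicodeDecodeError as error:
            # Remember the failure and move on to the next candidate
            last_error = error
    raise ValueError(
        f"could not decode {path!r} with any of {encodings}"
    ) from last_error
```

The result can then be passed to `drop_duplicates` exactly as before, for example `danmu_df = read_csv_with_fallback('/mnt/data/纪录片弹幕.csv')`.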
