Danmaku analysis #1003

ubh0927 · 2023-11-12T14:39:45Z

纪录片弹幕.csv
Analyze the Chinese text in column F and remove the duplicated text.

ubh0927 · 2023-11-12T14:40:32Z

import pandas as pd

# Load the CSV file
danmu_df = pd.read_csv('path_to_your_file.csv')

# Remove duplicates in column 'F' and keep the first occurrence
danmu_df_unique = danmu_df.drop_duplicates(subset=['F'])

# Save the DataFrame with duplicates removed to a new CSV file
danmu_df_unique.to_csv('path_to_your_new_file.csv', index=False)

ubh0927 · 2023-11-12T14:41:04Z

# Load the CSV file and remove duplicate entries in column 'F'
import pandas as pd

# Attempt to load the CSV file, trying different encodings if necessary
try:
    # Try the default encoding (UTF-8) first
    danmu_df = pd.read_csv('/mnt/data/纪录片弹幕.csv')
except UnicodeDecodeError:
    # If the default encoding fails, fall back to 'gbk', which is commonly used for Chinese text
    danmu_df = pd.read_csv('/mnt/data/纪录片弹幕.csv', encoding='gbk')

# Remove duplicates in column 'F' and keep the first occurrence
danmu_df_unique = danmu_df.drop_duplicates(subset=['F'])

# Save the DataFrame with duplicates removed to a new CSV file
output_path = '/mnt/data/纪录片弹幕_no_duplicates.csv'
danmu_df_unique.to_csv(output_path, index=False)

output_path
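
One caveat with the snippets in this thread: subset=['F'] only works if the CSV header row actually contains a column named F. If "column F" means the sixth column as shown in a spreadsheet, selecting it by position may be safer. Below is a minimal sketch under that assumption; the helper name drop_duplicate_text and its col_index parameter are hypothetical and not part of the original code, and the whitespace stripping is an extra assumption about what should count as "the same text".

```python
import pandas as pd


def drop_duplicate_text(df: pd.DataFrame, col_index: int = 5) -> pd.DataFrame:
    """Keep only the first row for each distinct text in the column at col_index.

    col_index=5 is the sixth column (spreadsheet column F). This is an
    assumption about the file layout, not something the thread confirms.
    """
    col = df.columns[col_index]
    # Strip surrounding whitespace so "好看" and "好看 " are treated as the same text.
    normalized = df[col].astype("string").str.strip()
    # duplicated(keep="first") marks every repeat after the first occurrence.
    return df.loc[~normalized.duplicated(keep="first")]
```

It could then be applied to the loaded frame as drop_duplicate_text(danmu_df).to_csv(output_path, index=False). The original rows are returned unmodified; the stripped text is used only for comparison.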
