Generation

generate function
Tue, 24 Jan 2023

There is a pandas DataFrame:

   userId  movieId  rating   timestamp
0       1       31     2.5  1260759144
1       1     1029     3.0  1260759179
2       1     1061     3.0  1260759182
3       1     1129     2.0  1260759185
4       1     1172     4.0  1260759205

Keep in this DataFrame only the users who have 100 or more ratings in the rating column.

# import data
import pandas as pd

data = pd.read_csv("ml-100k/u.data", sep="\t", names=["user_id", "item_id", "rating", "timestamp"])
data.head()

# create a new dataframe that counts the number of ratings per user
user_data = data.groupby('user_id').count()
user_data.head()

# create a new column that is True if a user has 100 or more ratings
user_data['hasrated'] = user_data['rating'] >= 100
user_data.head()

# create a new dataframe from the original data, keeping only the rows whose
# user_id belongs to a user with 100 or more ratings
rating_data = data[data['user_id'].isin(user_data[user_data['hasrated']].index)]
rating_data.head()
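The generated answer loads a MovieLens file and uses the column names user_id and item_id, while the prompt's DataFrame uses userId and movieId. As a minimal sketch of the same filter applied directly to a DataFrame shaped like the one in the prompt, one option is a grouped transform. The function name keep_active_users, the min_ratings parameter, and the small demo data are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd


def keep_active_users(df: pd.DataFrame, min_ratings: int = 100) -> pd.DataFrame:
    """Return only the rows of users who have at least `min_ratings` ratings.

    Hypothetical helper; assumes the columns "userId" and "rating" from the prompt.
    """
    # Count the non-null ratings per user and broadcast that count back
    # onto every row belonging to the same user.
    counts = df.groupby("userId")["rating"].transform("count")
    # Keep the rows whose user reaches the threshold.
    return df[counts >= min_ratings]


# Tiny made-up demo with a threshold of 2 instead of 100.
demo = pd.DataFrame({
    "userId": [1, 1, 1, 2, 2, 3],
    "movieId": [31, 1029, 1061, 31, 1129, 1172],
    "rating": [2.5, 3.0, 3.0, 4.0, 2.0, 4.0],
})
filtered = keep_active_users(demo, min_ratings=2)
print(filtered["userId"].unique().tolist())
```

With the real data, calling keep_active_users(df) with the default threshold would keep every row of each user who has 100 or more ratings. The transform approach avoids building a separate per-user table and the isin lookup, at the cost of computing one count per row.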
